MATH 3210 Metric spaces

University of Leeds, School of Mathematics

November 29, 2017

1. Definition and fundamental properties of a metric space. Open sets, closed sets,
closure and interior. Convergence of sequences. Continuity of mappings. (6)
2. Real inner-product spaces, orthonormal sequences, perpendicular distance to a
subspace, applications in approximation theory. (7)
3. Cauchy sequences, completeness of R with the standard metric; uniform convergence
and completeness of C[a, b] with the uniform metric. (3)
4. The contraction mapping theorem, with applications in the solution of equations
and differential equations. (5)
5. Connectedness and path-connectedness. Introduction to compactness and sequential
compactness, including subsets of R^n. (6)

1 Metrics, open and closed sets

We want to generalise the idea of distance between two points in the real line, given by
d(x, y) = |x − y|,
and the distance between two points in the plane, given by
\[ d(x, y) = d((x_1, x_2), (y_1, y_2)) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}, \]
to other settings.


This will include the ideas of distances between functions, for example.

1.1 Definition
Let X be a non-empty set. A metric on X, or distance function, associates to each
pair of elements x, y ∈ X a real number d(x, y) such that
(i) d(x, y) ≥ 0; and d(x, y) = 0 ⇐⇒ x = y (positive definite);
(ii) d(x, y) = d(y, x) (symmetric);
(iii) d(x, z) ≤ d(x, y) + d(y, z) (triangle inequality).

(i) X = R. The standard metric is given by d(x, y) = |x − y|. There are many other
metrics on R, for example

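\[ d(x, y) = |e^x - e^y|; \]

\[ d(x, y) = \begin{cases} |x - y| & \text{if } |x - y| \le 1, \\ 1 & \text{if } |x - y| \ge 1. \end{cases} \]

Let X be any set whatsoever; then we can define

\[ d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases} \qquad \text{(the discrete metric).} \]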

(ii) X = R^2. The standard metric is the Euclidean metric: if x = (x_1, x_2) and y = (y_1, y_2) then
\[ d_2(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}. \]
This is linked to the inner product (scalar product), x.y = x_1 y_1 + x_2 y_2, since it is
just $\sqrt{(x - y).(x - y)}$. We will study inner products more carefully later, so for the
moment we won’t prove the (well-known) fact that it is indeed a metric.
Other possible metrics include

d∞ (x, y) = max{|x1 − y1 |, |x2 − y2 |}.

Let’s check the axioms. In fact (i) and (ii) are easy (i.e., the distance is positive definite,
symmetric); for (iii) let’s write |x1 −y1 | = p, |x2 −y2 | = q, |y1 −z1 | = r and |y2 −z2 | = s.
Then |x1 − z1 | ≤ p + r and |x2 − z2 | ≤ q + s; so

d∞ (x, z) = max{|x1 − z1 |, |x2 − z2 |}

≤ max{p + r, q + s}
≤ max{p, q} + max{r, s} = d∞ (x, y) + d∞ (y, z).

The last inequality holds by inspection, since p + r and q + s are each at most max{p, q} + max{r, s}.

Another metric on R2 comes from d1 (x, y) = |x1 − y1 | + |x2 − y2 |. These metrics

are all translation-invariant (i.e., d(x + z, y + z) = d(x, y)), and homogeneous (i.e.,
d(kx, ky) = |k|d(x, y)).

(iii) Take X = C[a, b]. Here are three metrics:
\[ d_2(f, g) = \left( \int_a^b (f(x) - g(x))^2 \, dx \right)^{1/2}. \]

Again, this is linked to the idea of an inner product, so we will delay proving that it is
a metric.
\[ d_1(f, g) = \int_a^b |f(x) - g(x)| \, dx, \]

the area between two curves [DIAGRAM].

\[ d_\infty(f, g) = \max\{|f(x) - g(x)| : a \le x \le b\}, \]

the maximum separation between two curves [DIAGRAM].

Example: on C[0, 1] take f(x) = x and g(x) = x^2 and calculate

\[ d_2(f, g) = \left( \int_0^1 (x - x^2)^2 \, dx \right)^{1/2} = \frac{1}{\sqrt{30}}, \]
\[ d_1(f, g) = \int_0^1 |x - x^2| \, dx = \frac{1}{6}, \quad \text{and} \]
\[ d_\infty(f, g) = \max_{0 \le x \le 1} |x - x^2| = \frac{1}{4}. \]
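
As a numerical sanity check on the three distances between f(x) = x and g(x) = x^2, here is a minimal Python sketch. The helper names (`integrate`, `d1`, `d2`, `d_inf`) are illustrative choices, and the midpoint rule and the uniform grid are assumptions made only to approximate the integrals and the maximum.

```python
import math

def integrate(h, a, b, n=20000):
    """Approximate the integral of h over [a, b] by the composite midpoint rule."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def d1(f, g, a, b):
    """Area between the two curves."""
    return integrate(lambda x: abs(f(x) - g(x)), a, b)

def d2(f, g, a, b):
    """Square root of the integral of the squared difference."""
    return math.sqrt(integrate(lambda x: (f(x) - g(x)) ** 2, a, b))

def d_inf(f, g, a, b, n=20000):
    """Largest separation on a uniform grid (an approximation to the true maximum)."""
    points = (a + i * (b - a) / n for i in range(n + 1))
    return max(abs(f(x) - g(x)) for x in points)

f = lambda x: x
g = lambda x: x * x

print(d2(f, g, 0.0, 1.0))     # close to 1/sqrt(30)
print(d1(f, g, 0.0, 1.0))     # close to 1/6
print(d_inf(f, g, 0.0, 1.0))  # close to 1/4
```

The three numbers differ, which illustrates that the same pair of functions can be "close" in one metric and less close in another.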

1.2 Definition
A set X together with a metric d is called a metric space, sometimes written (X, d). If
A ⊆ X then we can use d to measure distances between points of A, and (A, d) is also
a metric space, called a subspace of (X, d).

1. The interval [a, b] with d(x, y) = |x − y| is a subspace of R.
2. The unit circle {(x_1, x_2) ∈ R^2 : x_1^2 + x_2^2 = 1} with $d(x, y) = \sqrt{(x_1 - y_1)^2 + (x_2 - y_2)^2}$
is a subspace of R^2.
3. The space of polynomials P is a metric space with any of the metrics inherited from
C[a, b] above.

1.3 Definition

Let (X, d) be a metric space, let x ∈ X and let r > 0. The open ball centred at x, with
radius r, is the set
B(x, r) = {y ∈ X : d(x, y) < r},
and the closed ball is the set

B[x, r] = {y ∈ X : d(x, y) ≤ r}.

Note that in R with the usual metric the open ball is B(x, r) = (x − r, x + r), an open
interval, and the closed ball is B[x, r] = [x − r, x + r], a closed interval.

For the d2 metric on R^2, the unit ball, B(0, 1), is the disc of radius 1 centred at the origin,
excluding the boundary. You may like to think about what you get for other metrics on R^2.

1.4 Definition
A subset U of a metric space (X, d) is said to be open, if for each point x ∈ U there is
an r > 0 such that the open ball B(x, r) is contained in U (“room to swing a cat”).

Clearly X itself is an open set, and by convention the empty set ∅ is also considered
to be open.

1.5 Proposition
Every “open ball” B(x, r) is an open set.

Proof: If y ∈ B(x, r), choose δ = r − d(x, y), so that δ > 0. We claim that B(y, δ) ⊂ B(x, r).
If z ∈ B(y, δ), i.e., d(z, y) < δ, then by the triangle inequality

d(z, x) ≤ d(z, y) + d(y, x) < δ + d(x, y) = r.

So z ∈ B(x, r).

1.6 Definition
A subset F of (X, d) is said to be closed, if its complement X \ F is open.

Note that closed does not mean “not open”. In a metric space the sets ∅ and X are
both open and closed. In R we have:
(a, b) is open.
[a, b] is closed, since its complement (−∞, a) ∪ (b, ∞) is open.
[a, b) is not open, since there is no open ball B(a, r) contained in the set. Nor is it
closed, since its complement (−∞, a) ∪ [b, ∞) isn’t open (no ball centred at b can be
contained in the set).

1.7 Example

If we take the discrete metric,

\[ d(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y, \end{cases} \]

then each one-point set {x} = B(x, 1/2), so it is an open set. Hence every set U is open, since
for x ∈ U we have B(x, 1/2) ⊆ U.

Hence, by taking complements, every set is also closed.

1.8 Proposition
In a metric space, every one-point set {x0 } is closed.

Proof: We need to show that the set U = {x ∈ X : x ≠ x0} is open, so take
a point x ∈ U. Now d(x, x0) > 0, and the ball B(x, r) is contained in U for every
0 < r < d(x, x0), since then x0 ∉ B(x, r).

1.9 Theorem
Let (Uα)α∈A be any collection of open subsets of a metric space (X, d) (not necessarily
finite!). Then $\bigcup_{\alpha \in A} U_\alpha$ is open. Let U and V be open subsets of a metric space (X, d).
Then U ∩ V is open. Hence (by induction) any finite intersection of open subsets is open.

Proof: If $x \in \bigcup_{\alpha \in A} U_\alpha$ then there is an α with x ∈ Uα. Now Uα is open, so
B(x, r) ⊂ Uα for some r > 0. Then $B(x, r) \subset \bigcup_{\alpha \in A} U_\alpha$, so the union is open.

If now U and V are open and x ∈ U ∩ V , then ∃r > 0 and s > 0 such that B(x, r) ⊂ U
and B(x, s) ⊂ V , since U and V are open. Then B(x, t) ⊂ U ∩ V if t ≤ min(r, s).

So the collection of open sets is preserved by arbitrary unions and finite intersections.

However, an arbitrary intersection of open sets is not always open; for example (−1/n, 1/n)
is open for each n = 1, 2, 3, . . ., but $\bigcap_{n=1}^{\infty} (-\tfrac{1}{n}, \tfrac{1}{n}) = \{0\}$, which is not an open set.

For closed sets we swap union and intersection.

1.10 Theorem
Let (Fα)α∈A be any collection of closed subsets of a metric space (X, d) (not necessarily
finite!). Then $\bigcap_{\alpha \in A} F_\alpha$ is closed. Let F and G be closed subsets of a metric space
(X, d). Then F ∪ G is closed. Hence (by induction) any finite union of closed
subsets is closed.

To prove this we recall de Morgan’s laws. We use the notation S c for the complement
X \ S of a set S ⊂ X.

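\[ x \notin \bigcup_\alpha A_\alpha \iff x \notin A_\alpha \text{ for all } \alpha, \quad \text{so} \quad \Big( \bigcup_\alpha A_\alpha \Big)^c = \bigcap_\alpha A_\alpha^c. \]
\[ x \notin \bigcap_\alpha A_\alpha \iff x \notin A_\alpha \text{ for some } \alpha, \quad \text{so} \quad \Big( \bigcap_\alpha A_\alpha \Big)^c = \bigcup_\alpha A_\alpha^c. \]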

Proof: Write $U_\alpha = F_\alpha^c = X \setminus F_\alpha$, which is open. So $\bigcup_{\alpha \in A} U_\alpha$ is open by Theorem
1.9. Now, by de Morgan’s laws, $\big( \bigcap_{\alpha \in A} F_\alpha \big)^c = \bigcup_{\alpha \in A} F_\alpha^c$. This is just $\bigcup_{\alpha \in A} U_\alpha$. Since
its complement is open, $\bigcap_{\alpha \in A} F_\alpha$ is closed.
Similarly, the complement of F ∪ G is F^c ∩ G^c, which is the intersection of two open
sets and hence open by Theorem 1.9. Hence F ∪ G is closed.

Infinite unions of closed sets do not need to be closed. An example is
$\bigcup_{n=1}^{\infty} [\tfrac{1}{n}, \infty) = (0, \infty)$, which is open but not closed.

1.11 Definition
The closure of S, written $\overline{S}$, is the smallest closed set containing S, and is contained
in all other closed sets containing S. Also S is dense if $\overline{S} = X$.
A smallest closed set containing S does exist, because we can define
\[ \overline{S} = \bigcap \{F : F \supset S, \ F \text{ closed}\}, \]

the intersection of all closed sets containing S. There is at least one such set, namely X itself.

1.12 Example in R
The closure of S = [0, 1) is [0, 1]. This is closed, and there is nothing smaller that is
closed and contains S.

1.13 Theorem
The set Q of rationals is dense in R, with the usual metric.

Proof: Suppose that F is a closed subset of R which contains Q: we claim that
F = R.
For U = R \ F is open and contains no points of Q. But an open set U (unless it is
empty) must contain an interval B(x, r) for some x ∈ U, and hence a rational number.
The only possible conclusion is that U = ∅ and F = R, so that $\overline{Q} = R$.

1.14 Proposition

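Let S, T ⊂ X. Then:
(i) $S \subset \overline{S}$.
(ii) $S = \overline{S}$ ⇐⇒ S is closed (so $\overline{\overline{S}} = \overline{S}$).
(iii) $S \subset T \Rightarrow \overline{S} \subset \overline{T}$.
(iv) $\overline{\emptyset} = \emptyset$, $\overline{X} = X$.
(v) $\overline{S \cup T} = \overline{S} \cup \overline{T}$.
(vi) $\overline{S \cap T} \subset \overline{S} \cap \overline{T}$.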

Proof: All these are quite easy except (v) and (vi) (CHECK).

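For (v) note that $S \subset \overline{S}$ and $T \subset \overline{T}$ so $S \cup T \subset \overline{S} \cup \overline{T}$, which is closed, so $\overline{S \cup T} \subset \overline{S} \cup \overline{T}$.

Also $\overline{S} \subset \overline{S \cup T}$ and $\overline{T} \subset \overline{S \cup T}$ by (iii), so $\overline{S} \cup \overline{T} \subset \overline{S \cup T}$. So the two sets are equal.

For (vi), we have $\overline{S \cap T} \subset \overline{S}$ and $\overline{S \cap T} \subset \overline{T}$ by (iii), so $\overline{S \cap T} \subset \overline{S} \cap \overline{T}$.

But we don’t need to have equality; for example X = R, S = (0, 1), T = (1, 2). Then
$\overline{S \cap T} = \overline{\emptyset} = \emptyset$, whereas $\overline{S} \cap \overline{T} = [0, 1] \cap [1, 2] = \{1\}$.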

1.15 Definition
We say that V is a neighbourhood (nhd) of x if there is an open set U such that
x ∈ U ⊆ V ; this means that ∃δ > 0 s.t. B(x, δ) ⊆ V . Thus a set is open precisely
when it is a neighbourhood of each of its points.

1.16 Example
The half-open interval [0, 1) is a neighbourhood of every point in it except for 0.

1.17 Theorem
For a subset S of a metric space X, we have $x \in \overline{S}$ iff V ∩ S ≠ ∅ for all nhds V of x
(i.e., all neighbourhoods of x meet S).

Proof: If there is a neighbourhood of x that doesn’t meet S, then there is an open
subset U with x ∈ U and U ∩ S = ∅. [DIAGRAM?]
But then X \ U is a closed set containing S and so $\overline{S} \subset X \setminus U$, and then $x \notin \overline{S}$ because
x ∈ U.
Conversely, if every neighbourhood of x does meet S, then $x \in \overline{S}$, as otherwise $X \setminus \overline{S}$
is an open neighbourhood of x that doesn’t meet S.


1.18 Definition

The interior of S, int S, is the largest open set contained in S, and can be written as
\[ \operatorname{int} S = \bigcup \{U : U \subset S, \ U \text{ open}\}, \]

the union of all open sets contained in S. There is at least one such set, namely ∅.

We see that S is open exactly when S = int S, otherwise int S is smaller.

1.19 Examples in R
int[0, 1) = (0, 1); clearly this is open and there is no larger open set contained in [0, 1).

int Q = ∅. For any non-empty open set must contain an interval B(x, r) and then it
contains an irrational number, so isn’t contained in Q.

1.20 Proposition
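\[ \operatorname{int} S = X \setminus \overline{(X \setminus S)}. \]

Proof: By De Morgan’s laws,

\[ \begin{aligned} \operatorname{int} S &= \bigcup \{U : U \subset S, \ U \text{ open}\} \\ &= X \setminus \bigcap \{U^c : U \subset S, \ U \text{ open}\} \\ &= X \setminus \bigcap \{F : F \supset X \setminus S, \ F \text{ closed}\} = X \setminus \overline{(X \setminus S)}. \end{aligned} \]

This is because U ⊂ S if and only if U^c = X \ U ⊃ X \ S. Also F = U^c is closed precisely
when U is open. That is, there is a correspondence between open sets contained in S
and closed sets containing its complement.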

1.21 Corollary
(i) int S ⊂ S.
(ii) int S = S ⇐⇒ S is open.
(iii) S ⊂ T ⇒ int S ⊂ int T .
(iv) int (int S) = int S.
(v) int(S ∪ T ) ⊃ int S ∪ int T .
(vi) int(S ∩ T ) = int S ∩ int T .

Proof: Easy, or take complements and use Propositions 1.14 and 1.20.

1.22 Definition
The boundary or frontier of S is $\partial S = \overline{S} \setminus \operatorname{int} S = \overline{S} \cap \overline{X \setminus S}$.
This writes ∂S as the intersection of two closed sets, so it is also closed.

1.23 Examples in R

For S = [0, 1) we have int S = (0, 1) and $\overline{S} = [0, 1]$ so ∂S = {0, 1}.

For S = Q we have int S = ∅ and $\overline{S} = R$, so ∂S = R.

1.24 Examples in R^2
For S = {(x, y) : x^2 + y^2 < 1}, we have int S = S and $\overline{S}$ = {(x, y) : x^2 + y^2 ≤ 1}, so
∂S is the circle {(x, y) : x^2 + y^2 = 1}.

For S = [0, 1) regarded as the subset {(x, y) : 0 ≤ x < 1, y = 0} of R^2, we have
$\overline{S}$ = {(x, y) : 0 ≤ x ≤ 1, y = 0} and int S = ∅ so $\partial S = \overline{S}$.

2 Convergence and continuity

Let (xn ) be a sequence in a metric space (X, d), i.e., x1 , x2 , . . .. (Sometimes we may
start counting at x0 .)

2.1 Definition
We say xn → x (i.e., xn tends to x or converges to x) if d(xn, x) → 0 as n → ∞. That
is, for all ε > 0 there is an N such that d(xn, x) < ε for n ≥ N (“for n sufficiently
large”).

This is the usual notion of convergence if we think of points in R^m with the Euclidean
metric.

2.2 Theorem
(i) The sequence (xn ) tends to x if and only if for every open U with x ∈ U , ∃n0 s.t.
xn ∈ U for all n ≥ n0 .

(ii) Let S be a subset of the metric space X. Then $x \in \overline{S}$ if and only if there is a
sequence (xn) of points of S with xn → x.

Proof: (i) If xn → x and x ∈ U , then there is a ball B(x, ε) ⊂ U , since U is open.

But xn → x so d(xn , x) < ε for n sufficiently large, i.e., xn ∈ U for n sufficiently large.

Conversely, if the “open set” condition works, and ε > 0, choose U = B(x, ε). Then
xn ∈ U for n sufficiently large, and so d(xn , x) < ε for n large.

(ii) If $x \in \overline{S}$, then for each n we have B(x, 1/n) ∩ S ≠ ∅ by Theorem 1.17. So choose
xn ∈ B(x, 1/n) ∩ S. Clearly d(xn, x) → 0, i.e., xn → x.

Conversely, if $x \notin \overline{S}$, then there is a neighbourhood U of x with U ∩ S = ∅. Now no
sequence in S can get into U so it cannot converge to x.

2.3 Examples
1. Take (R^2, d1), where d1(x, y) = |x1 − y1| + |x2 − y2|, where x = (x1, x2) and
y = (y1, y2), and consider the sequence $\left( \frac{1}{n}, \frac{2n+1}{n+1} \right)$. We guess its limit is (0, 2). To see if
this is right, look at
\[ d_1\left( \left( \frac{1}{n}, \frac{2n+1}{n+1} \right), (0, 2) \right) = \left| \frac{1}{n} \right| + \left| \frac{2n+1}{n+1} - 2 \right| = \frac{1}{n} + \frac{1}{n+1} \to 0 \]

as n → ∞. So the limit is (0, 2).

2. In C[0, 1] let fn(t) = t^n and f(t) = 0 for 0 ≤ t ≤ 1. Does fn → f, (a) in d1, and (b)
in d∞?

(a)
\[ d_1(f_n, f) = \int_0^1 t^n \, dt = \frac{1}{n+1} \to 0 \]
as n → ∞. So fn → f in d1.

(b)
\[ d_\infty(f_n, f) = \max\{t^n : 0 \le t \le 1\} = 1 \not\to 0 \]
as n → ∞. So fn does not converge to f in d∞.

Note: say gn → g pointwise on [a, b] as n → ∞ if gn(x) → g(x) for all x ∈ [a, b]. If we
define
\[ g(x) = \begin{cases} 0 & \text{for } 0 \le x < 1, \\ 1 & \text{for } x = 1, \end{cases} \]
then fn → g pointwise on [0, 1]. But g ∉ C[0, 1], as
it is not continuous at 1.

3. Take the discrete metric

\[ d_0(x, y) = \begin{cases} 1 & \text{if } x \ne y, \\ 0 & \text{if } x = y. \end{cases} \]

Then xn → x ⇐⇒ d0(xn, x) → 0. But since d0(xn, x) = 0 or 1, this happens if and
only if d0(xn, x) = 0 for n sufficiently large. That is, there is an n0 such that xn = x
for all n ≥ n0.
All convergent sequences in this metric are eventually constant. So, for example, the
sequence (1/n) does not converge to 0 in (R, d0), since d0(1/n, 0) = 1 for every n.
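
The contrast between the metrics in Examples 2 and 3 can be seen numerically. The following Python sketch is only illustrative: the function names are hypothetical, the integral is approximated by a midpoint rule, and the maximum is taken over a finite grid that is assumed to contain the point t = 1.

```python
def integrate(h, a, b, n=20000):
    """Approximate the integral of h over [a, b] by the composite midpoint rule."""
    step = (b - a) / n
    return step * sum(h(a + (i + 0.5) * step) for i in range(n))

def d1_to_zero(power):
    """d_1 distance in C[0, 1] between t**power and the zero function."""
    return integrate(lambda t: t ** power, 0.0, 1.0)

def d_inf_to_zero(power, grid=1000):
    """d_inf distance between t**power and the zero function, on a grid."""
    # The grid contains t = 1, where t**power attains its maximum on [0, 1].
    return max((i / grid) ** power for i in range(grid + 1))

def d0(x, y):
    """The discrete metric."""
    return 0 if x == y else 1

for power in (1, 10, 100):
    print(power, d1_to_zero(power), d_inf_to_zero(power))

print([d0(1 / n, 0) for n in range(1, 6)])  # every entry is 1
```

The d1 column shrinks roughly like 1/(n + 1) while the d∞ column stays at 1, matching parts (a) and (b) above.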

A result on convergence in R2 .

2.4 Proposition

Take R^2 with any of the metrics d1, d2 and d∞. Then a sequence xn = (an, bn)
converges to x = (a, b) if and only if an → a and bn → b.

Proof: We have d1 (xn , x) = |an − a| + |bn − b|. This tends to zero as n → ∞ if and
only if each of the terms |an − a| and |bn − b| does. And that’s the same as saying that
an → a and bn → b.

Also d2(xn, x) = (|an − a|^2 + |bn − b|^2)^{1/2}, which tends to zero if and only if |an − a|^2 +
|bn − b|^2 does; this happens if and only if |an − a|^2 and |bn − b|^2 tend to zero, which is
the same as an → a and bn → b.

Finally, d∞(xn, x) = max{|an − a|, |bn − b|}. If this tends to zero then so do |an − a|
and |bn − b|, as they are no larger and are non-negative; and if they both tend to zero then
so does their maximum, which is at most their sum. Again this is the same as saying
an → a and bn → b.

A similar result holds for Rk in general.

Now let’s look at continuous functions again.

2.5 Theorem
If fn → f in (C[a, b], d∞ ), then fn → f in (C[a, b], d1 ).

(d∞ convergence is stronger than d1 convergence.)

Proof: d∞(fn, f) = max{|fn(x) − f(x)| : a ≤ x ≤ b} → 0 as n → ∞, so, given ε > 0
there is an N so that d∞(fn, f) < ε for n ≥ N. It follows that if n ≥ N then
\[ d_1(f_n, f) = \int_a^b |f_n(x) - f(x)| \, dx \le \int_a^b \varepsilon \, dx = \varepsilon (b - a), \]

so d1(fn, f) → 0 as n → ∞.

Note: It is also true that if d∞ (fn , f ) → 0 then fn → f pointwise on [a, b]. The
converse is FALSE.

Now we look at continuous functions between general metric spaces.

2.6 Definition
Let f : (X, dX ) → (Y, dY ) be a map between metric spaces. We say that f is continuous
at x0 ∈ X if for each ε > 0 there is a δ > 0 such that dY (f (x), f (x0 )) < ε whenever
dX (x, x0 ) < δ.

We say that f is continuous if it is continuous at all points of X.

2.7 Proposition
For f as above, f is continuous at x0 if and only if, whenever a sequence xn → x0, we have
f(xn) → f(x0) (“sequential continuity”).

Proof: Same proof as in real analysis, more or less. Suppose f is continuous at x0 and
xn → x0. Then for each ε > 0 we have a δ > 0 such that dY(f(x), f(x0)) < ε whenever
dX(x, x0) < δ.
Then there’s an n0 with dX(xn, x0) < δ for all n ≥ n0, and so dY(f(xn), f(x0)) < ε for all
n ≥ n0. Thus f(xn) → f(x0).

Conversely, if f is not continuous at x0 , then there is an ε for which no δ will do,

so we can find xn with d(xn , x0 ) < 1/n but d(f (xn ), f (x0 )) ≥ ε. Then xn → x0 but
f (xn ) 6→ f (x0 ).

But there is a nicer way to define continuity. For a mapping f : X → Y and a set
U ⊂ Y, let f^{-1}(U) be the set

f^{-1}(U) = {x ∈ X : f(x) ∈ U}.

This makes sense even if f^{-1} is not defined as a function.

2.8 Theorem
A function f : X → Y is continuous if and only if f^{-1}(U) is open in X for every open
subset U ⊂ Y.

(“The inverse image of an open set is open.” Note that for f continuous we do not
expect f (U ) to be open for all open subsets of X, for example f : R → R, f ≡ 0, then
f (R) = {0}, not open.)

Proof: Suppose that f is continuous, that U is open, and that x0 ∈ f^{-1}(U), so
f(x0) ∈ U. Now there is a ball B(f(x0), ε) ⊂ U, since U is open, and then by
continuity there is a δ > 0 such that dY(f(x), f(x0)) < ε whenever dX(x, x0) < δ. This
means that for dX(x, x0) < δ, f(x) ∈ U and so x ∈ f^{-1}(U). That is, B(x0, δ) ⊂ f^{-1}(U),
so f^{-1}(U) is open.

Conversely, if the inverse image of an open set is open, and x0 ∈ X, let ε > 0 be given.
We know that B(f(x0), ε) is open, so f^{-1}(B(f(x0), ε)) is open, and contains x0. So it
contains some B(x0, δ) with δ > 0.

But now if dX(x, x0) < δ, we have x ∈ B(x0, δ) ⊂ f^{-1}(B(f(x0), ε)) so f(x) ∈ B(f(x0), ε)
and we have dY(f(x), f(x0)) < ε.

2.9 Example
Let X = R with the discrete metric, and Y any metric space. Then all functions
f : X → Y are continuous!

(i) Because the inverse image of an open set is an open set, since all sets are open.
(ii) Because whenever xn → x0 we have xn = x0 for n large, so obviously f (xn ) → f (x0 ).

2.10 Proposition
(i) A function f : X → Y is continuous if and only if f^{-1}(F) is closed whenever F is
a closed subset of Y.

(ii) If f : X → Y and g : Y → Z are continuous, then so is the composition
g ◦ f : X → Z defined by (g ◦ f)(x) = g(f(x)).

Proof: (i) We can do this by complements, as if F is closed, then U = F^c is open,
and f^{-1}(F) = f^{-1}(U)^c (a point is mapped into F if and only if it isn’t mapped into
U).
Then f^{-1}(F) is always closed when F is closed ⇐⇒ f^{-1}(U) is always open when U
is open.

(ii) Take U ⊂ Z open; then (g ◦ f)^{-1}(U) = f^{-1}(g^{-1}(U)); for these are the points which
map under f into g^{-1}(U) so that they map under g ◦ f into U.
Now g^{-1}(U) is open in Y, as g is continuous, and then f^{-1}(g^{-1}(U)) is open in X since
f is continuous.

2.11 Definition
A function f : X → Y is a homeomorphism between metric spaces if it is a bijection
s.t. f and f^{-1} are continuous. Then we say X and Y are homeomorphic, or X ∼ Y.

2.12 Example
The real line R is homeomorphic to the open interval (0, 1). For if we take y =
tan^{-1} x this maps R homeomorphically onto (−π/2, π/2), and this can be mapped
homeomorphically onto (0, 1), e.g. by z = (1/π)(y + π/2).
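
The composite map in this example and its inverse can be written down explicitly. The Python sketch below is a small illustration under the assumption that floating-point `atan` and `tan` are accurate enough for the inputs tried; the function names are hypothetical.

```python
import math

def to_unit_interval(x):
    """Map R onto (0, 1): y = arctan(x), then z = (y + pi/2) / pi."""
    return (math.atan(x) + math.pi / 2) / math.pi

def from_unit_interval(z):
    """Inverse map (0, 1) -> R: x = tan(pi*z - pi/2)."""
    return math.tan(math.pi * z - math.pi / 2)

for x in (-100.0, -1.0, 0.0, 2.5, 100.0):
    z = to_unit_interval(x)
    # z lies strictly between 0 and 1, and the round trip recovers x.
    print(x, z, from_unit_interval(z))
```

Both maps are compositions of continuous functions, which is the content of the homeomorphism claim; the code only checks that they undo each other on a few sample points.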

3 Real inner-product spaces
Notation: vectors written u, v, w, etc. (Sometimes just u, v, w).
Scalars written a, b, c, etc.
Functions written f , g, h.
Coordinates of a vector u normally written u1 , u2 , u3 , etc.

3.1 Inner product in Rn

For vectors u = (u1, u2) and v = (v1, v2) in R^2 we write ⟨u, v⟩ for the standard inner
product
⟨u, v⟩ = u1 v1 + u2 v2;
sometimes written u.v or (u, v).
We can do similarly for vectors in R^n where n = 1, 2, 3, . . . (i.e., n components), so if
u = (u1, . . . , un) and v = (v1, . . . , vn) we have

⟨u, v⟩ = u1 v1 + u2 v2 + . . . + un vn.

For example,

⟨(1, 2, 3, 4), (0, −1, 5, 2)⟩ = 1·0 − 2·1 + 3·5 + 4·2 = 21.

3.2 Standard properties of the scalar product

I. ⟨au + bv, w⟩ = a⟨u, w⟩ + b⟨v, w⟩,
for a, b real and u, v, w vectors.

II. ⟨u, v⟩ = ⟨v, u⟩.

III. ⟨u, u⟩ ≥ 0 for all u, and we have ⟨u, u⟩ = 0 if and only if u = 0.

The first two are easy to check. For III note that ⟨u, u⟩ = u1^2 + . . . + un^2 ≥ 0, and it
will be zero if and only if u1 = . . . = un = 0.

3.3 Definition of a general (real) inner product

Let V be a real vector space and suppose that we have for each pair of vectors u, v in
V a real number written ⟨u, v⟩, such that properties I, II and III of (3.2) hold. Then
we call V a real inner product space, and ⟨u, v⟩ the inner product of u and v.

N.B. In quantum mechanics and elsewhere people use complex inner products. Not in
this course.


3.4 Examples
1. The usual inner product on R^n.
2. We can define a new inner product on R^2 by
⟨u, v⟩ = 2u1 v1 + 3u2 v2.
This is easily checked to be linear (do it!) and symmetric. For positive definiteness, note that
⟨u, u⟩ = 2u1^2 + 3u2^2 ≥ 0
and is > 0 unless u1 = u2 = 0.
The following alternative is not an inner product: define
⟨u, v⟩ = 2u1 v1 − 3u2 v2,
so ⟨u, u⟩ = 2u1^2 − 3u2^2, which is negative if u = (0, 1), say.
3. For a < b define C[a, b] to be the vector space of all continuous real functions on
[a, b].
For f, g ∈ C[a, b] define
\[ \langle f, g \rangle = \int_a^b f(x) g(x) \, dx. \]
Example: in C[0, 1], let f(x) = x + 1 and g(x) = 2x. Then
\[ \langle f, g \rangle = \int_0^1 (x + 1)(2x) \, dx = \int_0^1 (2x^2 + 2x) \, dx = \left[ \frac{2x^3}{3} + x^2 \right]_0^1 = \frac{5}{3}. \]
3.5 Other properties of inner products
(a) ⟨u, av + bw⟩ = ⟨av + bw, u⟩ (rule II) = a⟨v, u⟩ + b⟨w, u⟩ (rule I) = a⟨u, v⟩ + b⟨u, w⟩
(rule II again).
So it is linear in the second argument as well as the first.
(b) ⟨0, u⟩ = ⟨0u + 0u, u⟩ = 0⟨u, u⟩ + 0⟨u, u⟩ = 0 for all u, using rule I.
Also ⟨u, 0⟩ = ⟨0, u⟩ = 0, using rule II. This is for any u ∈ V.

(c) More generally we can check that

⟨a1 u1 + a2 u2 + . . . + aN uN, b1 v1 + b2 v2 + . . . + bM vM⟩
behaves like multiplication, and we get
\[ \sum_{i=1}^{N} \sum_{j=1}^{M} a_i b_j \langle u_i, v_j \rangle. \]

4 Lengths, angles, orthogonality
4.1 Definition
In an inner product space we define the length of a vector v (sometimes called its size
or norm) by
\[ \|v\| = \sqrt{\langle v, v \rangle}. \]
Note that ⟨v, v⟩ is always ≥ 0; also by property III, ‖v‖ = 0 if and only if v = 0.
This agrees with what we usually do in R^n, e.g. if v = (3, 4, −12), then ‖v‖^2 = 3^2 + 4^2 +
(−12)^2 = 9 + 16 + 144 = 169, so ‖v‖ = √169 = 13.
Example: in C[−1, 1] let f(x) = x. Then
\[ \|f\|^2 = \int_{-1}^{1} x^2 \, dx = \left[ \frac{x^3}{3} \right]_{-1}^{1} = \frac{2}{3}, \]
so ‖f‖ = √(2/3).
Note that if v ∈ V and a ∈ R, then

‖av‖^2 = ⟨av, av⟩ = a^2 ⟨v, v⟩ = a^2 ‖v‖^2,

so ‖av‖ = √(a^2) ‖v‖ = |a| ‖v‖, taking the positive square root.
For example (−2)v is twice as big as v, but with direction reversed.

4.2 Definition
The angle between two non-zero vectors u and v is the unique solution θ to

⟨u, v⟩ = ‖u‖ ‖v‖ cos θ

in the range 0 ≤ θ ≤ π (radians!). It is easy to check that the angle between u and u
is 0, and the angle between u and −u is π.
We say u and v are orthogonal if ⟨u, v⟩ = 0. This is because the angle between them
satisfies cos θ = 0 so θ = π/2. This is sometimes written u ⊥ v.
To make sense of our definition we will need to know that
\[ \cos \theta = \frac{\langle u, v \rangle}{\|u\| \, \|v\|} \]
lies between −1 and 1; see later.

Example: in C[0, 1] find the number a such that the functions f(t) = t and g(t) =
3t + a are orthogonal.
Solution:
\[ \langle f, g \rangle = \int_0^1 t(3t + a) \, dt = \left[ t^3 + \frac{a t^2}{2} \right]_0^1 = 1 + \frac{a}{2}, \]

so ⟨f, g⟩ = 0 ⇐⇒ 1 + a/2 = 0, or a = −2.
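
This calculation can be checked with exact arithmetic. In the Python sketch below, polynomials are represented by coefficient lists (entry k multiplies t^k); this representation and the helper names are assumptions made for illustration only.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def integrate_poly(p, lo, hi):
    """Exact integral over [lo, hi] of the polynomial with coefficients p."""
    return sum(
        c * (Fraction(hi) ** (k + 1) - Fraction(lo) ** (k + 1)) / (k + 1)
        for k, c in enumerate(p)
    )

def inner(p, q, lo=0, hi=1):
    """Inner product <p, q> = integral of p(t) q(t) over [lo, hi]."""
    return integrate_poly(poly_mul(p, q), lo, hi)

f = [Fraction(0), Fraction(1)]            # f(t) = t

def g(a):
    return [Fraction(a), Fraction(3)]     # g(t) = 3t + a

print(inner(f, g(0)))     # 1, i.e. 1 + a/2 with a = 0
print(inner(f, g(-2)))    # 0, so f and g are orthogonal when a = -2
```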
More generally, a set of vectors {u1, . . . , uN} is an orthogonal set if ⟨ui, uj⟩ = 0
whenever i ≠ j.

4.3 Pythagoras’s theorem

If ⟨u, v⟩ = 0 then ‖u + v‖^2 = ‖u‖^2 + ‖v‖^2.

[DIAGRAM – square on the hypotenuse etc.]

Proof:
‖u + v‖^2 = ⟨u + v, u + v⟩
= ⟨u, u⟩ + ⟨v, u⟩ + ⟨u, v⟩ + ⟨v, v⟩
= ‖u‖^2 + 0 + 0 + ‖v‖^2,

using orthogonality.

4.4 Parallelogram identity

‖u + v‖^2 + ‖u − v‖^2 = 2‖u‖^2 + 2‖v‖^2.

[DIAGRAM – draw a parallelogram.] The sum of the squares of the two diagonals
equals the sum of the squares of the four sides.
Proof: expand the inner products; see the example sheets.

5 Cauchy–Schwarz and its consequences

In order to make sense of (4.2) we need the following.

5.1 Cauchy–Schwarz inequality

For u and v in an inner-product space,

⟨u, v⟩^2 ≤ ⟨u, u⟩ ⟨v, v⟩,

i.e., |⟨u, v⟩| ≤ ‖u‖ ‖v‖.

Example: if u1, . . . , un and v1, . . . , vn are real numbers, then

\[ \left| \sum_{i=1}^{n} u_i v_i \right| \le \left( \sum_{i=1}^{n} u_i^2 \right)^{1/2} \left( \sum_{i=1}^{n} v_i^2 \right)^{1/2}. \]

Note that the LHS is |⟨u, v⟩| and the RHS is ‖u‖ ‖v‖, where u = (u1, . . . , un),
v = (v1, . . . , vn) and we use the standard inner product in R^n.

We give two proofs, and in each we assume that u ≠ 0 and v ≠ 0 (otherwise the
inequality is obvious).

Proof 1: For all real a and b we have

‖au − bv‖^2 = a^2 ‖u‖^2 − 2ab⟨u, v⟩ + b^2 ‖v‖^2 ≥ 0.
Take a = ⟨u, v⟩ and b = ‖u‖^2. We get

‖u‖^2 (⟨u, v⟩^2 − 2⟨u, v⟩^2 + ‖u‖^2 ‖v‖^2) ≥ 0,

which gives the result, since ‖u‖^2 > 0.

Proof 2:

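For real t we have

⟨tu + v, tu + v⟩ ≥ 0,
i.e.,
t^2 ⟨u, u⟩ + 2t⟨u, v⟩ + ⟨v, v⟩ ≥ 0.
We’ll minimize this over t, so by differentiation this is where

2t⟨u, u⟩ + 2⟨u, v⟩ = 0.

So we put t = −⟨u, v⟩/⟨u, u⟩, and we get

\[ \frac{\langle u, v \rangle^2}{\langle u, u \rangle} - 2 \frac{\langle u, v \rangle^2}{\langle u, u \rangle} + \langle v, v \rangle \ge 0. \]
This simplifies to
\[ -\frac{\langle u, v \rangle^2}{\langle u, u \rangle} + \langle v, v \rangle \ge 0, \]
i.e.,
\[ \frac{\langle u, v \rangle^2}{\langle u, u \rangle} \le \langle v, v \rangle, \]
which is what is required.

NOW we know that $\dfrac{\langle u, v \rangle}{\|u\| \, \|v\|}$ lies between −1 and 1, and so the definition of angle
makes sense.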
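
To see the inequality and the resulting angle in action in R^n, here is a short Python sketch. It is a numerical illustration only, not a proof: the function names are hypothetical, the vectors are random samples, and the cosine is clamped to [−1, 1] as a guard against floating-point rounding.

```python
import math
import random

def inner(u, v):
    """Standard inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))

def norm(u):
    return math.sqrt(inner(u, u))

def angle(u, v):
    """Angle in [0, pi] between two non-zero vectors."""
    cosine = inner(u, v) / (norm(u) * norm(v))
    cosine = max(-1.0, min(1.0, cosine))  # guard against rounding error
    return math.acos(cosine)

rng = random.Random(0)
for _ in range(5):
    u = [rng.uniform(-1, 1) for _ in range(4)]
    v = [rng.uniform(-1, 1) for _ in range(4)]
    # First entry reports whether |<u, v>| <= ||u|| ||v|| held for this sample.
    print(abs(inner(u, v)) <= norm(u) * norm(v), angle(u, v))
```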

5.2 Triangle inequality
In an inner product space we have

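‖u + v‖ ≤ ‖u‖ + ‖v‖.

For example, in R^n this gives

\[ \left( \sum_{i=1}^{n} (u_i + v_i)^2 \right)^{1/2} \le \left( \sum_{i=1}^{n} u_i^2 \right)^{1/2} + \left( \sum_{i=1}^{n} v_i^2 \right)^{1/2}. \]

[DIAGRAM – triangle of vectors]

Proof: Using the Cauchy–Schwarz inequality in the last step,
‖u + v‖^2 = ⟨u + v, u + v⟩
= ‖u‖^2 + 2⟨u, v⟩ + ‖v‖^2
≤ ‖u‖^2 + 2‖u‖ ‖v‖ + ‖v‖^2 = (‖u‖ + ‖v‖)^2.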

5.3 Theorem
In an inner-product space the norm (length) of a vector satisfies
(i) ‖u‖ ≥ 0, and ‖u‖ = 0 if and only if u = 0;
(ii) ‖au‖ = |a| ‖u‖;
(iii) ‖u + v‖ ≤ ‖u‖ + ‖v‖.

5.4 Corollary
Let V be an inner-product space, and define d(x, y) = ‖x − y‖. Then d is a metric.

Proof: From Theorem 5.3, we see easily that d(x, y) ≥ 0 and d(x, y) = 0 if and
only if x − y = 0, i.e., x = y.
Also d(x, y) = ‖x − y‖ = ‖y − x‖ = d(y, x).
Finally,
d(x, z) = ‖x − z‖ = ‖(x − y) + (y − z)‖ ≤ ‖x − y‖ + ‖y − z‖ = d(x, y) + d(y, z).

So every inner-product space is a metric space.

5.5 The space ℓ^2

The elements of the space ℓ^2 (also written ℓ_2) are real sequences $(u_k)_{k=1}^{\infty}$ such that
$\sum_{k=1}^{\infty} u_k^2 < \infty$.

So, for example (1/2, 1/4, 1/8, . . .) ∈ ℓ^2, since $\sum_{k=1}^{\infty} \left( \frac{1}{2^k} \right)^2 < \infty$ (geometric series); but
(1, 2, 3, 4, . . .) ∉ ℓ^2, since $\sum_{k=1}^{\infty} k^2 = \infty$.

We shall get a vector space by adding sequences term-wise; if u = (uk) and v = (vk),
then u + v = (uk + vk) and au = (auk), just like vectors with an infinite sequence of
components.
How do we know that (uk + vk) is still in ℓ^2?

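Proof: for each N,

\[ \begin{aligned} \left( \sum_{k=1}^{N} (u_k + v_k)^2 \right)^{1/2} &\le \left( \sum_{k=1}^{N} u_k^2 \right)^{1/2} + \left( \sum_{k=1}^{N} v_k^2 \right)^{1/2} \\ &\le \left( \sum_{k=1}^{\infty} u_k^2 \right)^{1/2} + \left( \sum_{k=1}^{\infty} v_k^2 \right)^{1/2} = A, \end{aligned} \]

say, where we used first the triangle inequality in R^N. Since this holds for every N we
let N → ∞ to see that $\sum_{k=1}^{N} (u_k + v_k)^2$ converges, and its limit is at most A^2.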

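In fact ℓ^2 is an inner-product space; define

\[ \langle u, v \rangle = \sum_{k=1}^{\infty} u_k v_k. \]

To see that this sum converges, use Cauchy–Schwarz in R^N:

\[ \begin{aligned} \sum_{k=1}^{N} |u_k v_k| &\le \left( \sum_{k=1}^{N} u_k^2 \right)^{1/2} \left( \sum_{k=1}^{N} v_k^2 \right)^{1/2} \\ &\le \left( \sum_{k=1}^{\infty} u_k^2 \right)^{1/2} \left( \sum_{k=1}^{\infty} v_k^2 \right)^{1/2} = B, \end{aligned} \]

say. Hence $\sum_{k=1}^{\infty} |u_k v_k|$ converges to a limit which is at most B. So $\sum_{k=1}^{\infty} u_k v_k$ is
absolutely convergent.

It is easy now to check that this defines an inner product. Also $\|u\|^2 = \langle u, u \rangle = \sum_{k=1}^{\infty} u_k^2$,
so it is like R^n with n = ∞. It is an infinite-dimensional vector space, but a
very useful one.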
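
The membership condition for ℓ^2 can be explored through partial sums. The Python sketch below uses hypothetical names and only looks at finitely many terms, so it suggests convergence or divergence rather than proving it; the closed forms quoted in the comments are the standard geometric-series and sum-of-squares formulas.

```python
def partial_sum_of_squares(term, n_terms):
    """Sum of term(k)**2 for k = 1, ..., n_terms."""
    return sum(term(k) ** 2 for k in range(1, n_terms + 1))

halving = lambda k: 0.5 ** k     # the sequence (1/2, 1/4, 1/8, ...)
counting = lambda k: k           # the sequence (1, 2, 3, ...)

for n_terms in (5, 10, 20):
    # First column settles near 1/3 (geometric series with ratio 1/4);
    # second column is N(N+1)(2N+1)/6, which grows without bound.
    print(n_terms,
          partial_sum_of_squares(halving, n_terms),
          partial_sum_of_squares(counting, n_terms))
```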


6 Orthonormal sets
6.1 Definition

A set of vectors {e1, . . . , en} in an inner product space is orthonormal if it is orthogonal
and each vector has norm 1. So

\[ \langle e_i, e_j \rangle = \begin{cases} 0 & \text{if } i \ne j, \\ 1 & \text{if } i = j. \end{cases} \]

If it’s also a basis for the inner product space, then we call it an orthonormal basis.

(i) (1, 0, 0), (0, 1, 0), (0, 0, 1) is an orthonormal basis of R^3 (the standard basis);
(ii) An unusual orthonormal basis of R^2 is e1 = (3/5, 4/5) and e2 = (−4/5, 3/5).
[DIAGRAM – draw the vectors]

6.2 Proposition
If {e1, . . . , en} is orthonormal, then

\[ \left\| \sum_{i=1}^{n} a_i e_i \right\|^2 = \sum_{i=1}^{n} a_i^2, \]

for any scalars a1, . . . , an, and so the vectors {e1, . . . , en} are linearly independent.
Proof:
\[ \left\langle \sum_{i=1}^{n} a_i e_i, \sum_{j=1}^{n} a_j e_j \right\rangle = \sum_{i=1}^{n} \sum_{j=1}^{n} a_i a_j \langle e_i, e_j \rangle, \]
by (3.5). All terms except for those with i = j are zero, and we get $\sum_{i=1}^{n} a_i^2$, as required.

Also, if $\sum_{i=1}^{n} a_i e_i = 0$, then $\sum_{i=1}^{n} a_i^2 = 0$, and so a1 = . . . = an = 0; i.e., the vectors
are independent.

6.3 The Gram–Schmidt process

We start with a sequence v1, . . . , vn of independent vectors and end up with a
sequence e1, . . . , en of orthonormal vectors such that for each 1 ≤ k ≤ n the set
{e1, . . . , ek} spans the same subspace as {v1, . . . , vk}.

Define w1 = v1 and e1 = w1/‖w1‖.

Let w2 = v2 − ⟨v2, e1⟩e1, and e2 = w2/‖w2‖.

Then w3 = v3 − ⟨v3, e1⟩e1 − ⟨v3, e2⟩e2, and e3 = w3/‖w3‖.

In general
\[ w_{k+1} = v_{k+1} - \sum_{i=1}^{k} \langle v_{k+1}, e_i \rangle e_i, \quad \text{and} \quad e_{k+1} = w_{k+1} / \|w_{k+1}\|. \]

Then {e1, . . . , en} are orthonormal and for each k the vectors e1, . . . , ek span the same
space as v1, . . . , vk.

Proof: Basically, the orthonormality property is shown by induction.

Suppose that we know that e1, . . . , ek are orthonormal (k = 1 is already done).

Then we work out ⟨w_{k+1}, e_j⟩ for j ≤ k. So
\[ \langle w_{k+1}, e_j \rangle = \langle v_{k+1}, e_j \rangle - \sum_{i=1}^{k} \langle v_{k+1}, e_i \rangle \langle e_i, e_j \rangle = \langle v_{k+1}, e_j \rangle - \langle v_{k+1}, e_j \rangle = 0. \]

So each new vector w_{k+1} and hence also e_{k+1} is orthogonal to the earlier e_j. It isn’t
zero since v_{k+1} is independent of v1, . . . , vk.

Also e_{k+1} = w_{k+1}/‖w_{k+1}‖ implies that ‖e_{k+1}‖ = ‖w_{k+1}‖/‖w_{k+1}‖ = 1.

The span of e1, . . . , ek is k-dimensional and contained in span{v1, . . . , vk}, so must
equal it.

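Example: Take v1 = (1, 0, 0, 1), v2 = (2, 3, 2, 0) and v3 = (0, 7, −2, 2) in R^4.

Set w1 = v1 and
\[ e_1 = w_1 / \|w_1\| = \frac{1}{\sqrt{2}} (1, 0, 0, 1). \]
Next,
\[ w_2 = v_2 - \langle v_2, e_1 \rangle e_1 = (2, 3, 2, 0) - \frac{2}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (1, 0, 0, 1) = (1, 3, 2, -1). \]
Note that w2 ⊥ e1. Then
\[ e_2 = w_2 / \|w_2\| = \frac{1}{\sqrt{15}} (1, 3, 2, -1). \]

\[ \begin{aligned} w_3 &= v_3 - \langle v_3, e_1 \rangle e_1 - \langle v_3, e_2 \rangle e_2 \\ &= (0, 7, -2, 2) - \frac{2}{\sqrt{2}} \cdot \frac{1}{\sqrt{2}} (1, 0, 0, 1) - \frac{15}{\sqrt{15}} \cdot \frac{1}{\sqrt{15}} (1, 3, 2, -1) \\ &= (0, 7, -2, 2) - (1, 0, 0, 1) - (1, 3, 2, -1) = (-2, 4, -4, 2). \end{aligned} \]

\[ e_3 = w_3 / \|w_3\| = \frac{(-2, 4, -4, 2)}{\sqrt{40}} = \frac{1}{\sqrt{10}} (-1, 2, -2, 1). \]
Having done this, CHECK that

\[ \langle e_i, e_j \rangle = \begin{cases} 1 & \text{if } i = j, \\ 0 & \text{if } i \ne j. \end{cases} \]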
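
The suggested CHECK can also be carried out by machine. The following Python sketch implements the recursion of (6.3) for vectors in R^n with the standard inner product; the function names are hypothetical, and floating-point arithmetic is assumed to be accurate enough for this small example.

```python
import math

def inner(u, v):
    """Standard inner product in R^n."""
    return sum(a * b for a, b in zip(u, v))

def gram_schmidt(vectors):
    """Return orthonormal e_1, ..., e_n built from independent v_1, ..., v_n."""
    basis = []
    for v in vectors:
        w = list(v)
        for e in basis:
            coeff = inner(v, e)                      # <v_{k+1}, e_i>
            w = [wi - coeff * ei for wi, ei in zip(w, e)]
        length = math.sqrt(inner(w, w))
        if length < 1e-12:
            raise ValueError("vectors are not linearly independent")
        basis.append([wi / length for wi in w])
    return basis

e1, e2, e3 = gram_schmidt([(1, 0, 0, 1), (2, 3, 2, 0), (0, 7, -2, 2)])

print(e3)  # close to (-1, 2, -2, 1) / sqrt(10)
for ei in (e1, e2, e3):
    # Each row is close to a row of the 3 x 3 identity matrix.
    print([round(inner(ei, ej), 12) for ej in (e1, e2, e3)])
```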

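Example (Legendre polynomials):
Take the functions 1, t, t^2, t^3, . . . in C[−1, 1] with inner product
\[ \langle f, g \rangle = \int_{-1}^{1} f(t) g(t) \, dt. \]
Now $\|1\|^2 = \int_{-1}^{1} 1 \, dt = 2$, so $e_1(t) = 1/\sqrt{2}$.

Next take
\[ w_2(t) = t - \langle t, e_1 \rangle e_1 = t - \frac{1}{\sqrt{2}} \int_{-1}^{1} \frac{t}{\sqrt{2}} \, dt = t - 0 = t. \]
Also
\[ \|w_2\|^2 = \int_{-1}^{1} t^2 \, dt = \frac{2}{3}, \quad \text{so} \quad e_2(t) = w_2(t) / \|w_2\| = \sqrt{\frac{3}{2}} \, t. \]

\[ \begin{aligned} w_3(t) &= t^2 - \langle t^2, e_1 \rangle e_1(t) - \langle t^2, e_2 \rangle e_2(t) \\ &= t^2 - \frac{1}{2} \int_{-1}^{1} t^2 \, dt - \frac{3}{2} \, t \int_{-1}^{1} t^3 \, dt \\ &= t^2 - \frac{1}{3} - 0 = t^2 - \frac{1}{3}. \end{aligned} \]
\[ \begin{aligned} \|w_3\|^2 &= \int_{-1}^{1} \left( t^2 - \frac{1}{3} \right)^2 dt = \int_{-1}^{1} \left( t^4 - \frac{2}{3} t^2 + \frac{1}{9} \right) dt \\ &= \frac{2}{5} - \frac{4}{9} + \frac{2}{9} = \frac{8}{45}, \end{aligned} \]
so
\[ e_3(t) = \sqrt{\frac{45}{8}} \left( t^2 - \frac{1}{3} \right) = \sqrt{\frac{5}{8}} \, (3t^2 - 1). \]
In general, e_n(t) has degree n − 1. Lots of useful systems of polynomials are obtained
by orthonormalizing 1, t, t^2, t^3, . . . with respect to different inner products (e.g.
Chebyshev, Hermite, Laguerre, . . . ).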
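
The unnormalised polynomials w1, w2, w3 above can be reproduced with exact rational arithmetic. In this Python sketch, polynomials are coefficient lists (entry k multiplies t^k) and the normalisation step is deliberately skipped so that everything stays rational; both choices, and the helper names, are assumptions made for illustration.

```python
from fractions import Fraction

def poly_mul(p, q):
    """Multiply two polynomials given as coefficient lists."""
    out = [Fraction(0)] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def inner(p, q):
    """<p, q> = integral of p(t) q(t) over [-1, 1], computed exactly."""
    prod = poly_mul(p, q)
    # The integral of t**k over [-1, 1] is 0 for odd k and 2/(k+1) for even k.
    return sum(c * Fraction(2, k + 1) for k, c in enumerate(prod) if k % 2 == 0)

def orthogonalise(polys):
    """Gram-Schmidt without normalising: returns w_1, w_2, ... in order."""
    ws = []
    for v in polys:
        w = [Fraction(c) for c in v]
        for u in ws:
            coeff = inner(v, u) / inner(u, u)   # equals <v, e> / ||u|| for e = u/||u||
            w = [wc - coeff * (u[k] if k < len(u) else 0) for k, wc in enumerate(w)]
        ws.append(w)
    return ws

ws = orthogonalise([[1], [0, 1], [0, 0, 1]])   # 1, t, t^2
print(ws[2])                  # coefficients of t^2 - 1/3
print(inner(ws[2], ws[2]))    # 8/45
```

Subtracting (⟨v, u⟩/⟨u, u⟩) u is the same as subtracting ⟨v, e⟩e with e = u/‖u‖, so the w's agree with those of (6.3).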


7 Orthogonal projections and best approximation

Many approximation problems consist of taking a vector v and a subspace W of an
inner-product space, and then finding the closest element w in W to v, i.e., minimizing
the size of the error v − w.

1. Take R^3 with the usual inner product and W a plane through the origin.
The closest point of W is obtained by “dropping a perpendicular onto W”.


2. Find the best approximation to the function f(t) = |t| on [−1, 1] by a quadratic
g(t) = a + bt + ct^2, in the sense of minimizing
\[ \|f - g\|^2 = \int_{-1}^{1} (f(t) - g(t))^2 \, dt. \]

7.1 Theorem
Let W be a (finite-dimensional) subspace of an inner-product space V, let v ∈ V, and
let w ∈ W satisfy
⟨v − w, z⟩ = 0 for all z ∈ W.
Then ‖v − y‖ ≥ ‖v − w‖ for all y ∈ W. That is, w is the closest point in W to v, and
it is unique.

[DIAGRAM: plot v, w, y.]

Proof: for y ∈ W write v − y = (v − w) + (w − y) and note that v − w is orthogonal
to w − y, since w − y is in W.
By Pythagoras’s theorem (4.3),

‖v − y‖^2 = ‖v − w‖^2 + ‖w − y‖^2 ≥ ‖v − w‖^2,

as required. Note that if y ≠ w, then ‖v − y‖ > ‖v − w‖ so the closest point is unique.

7.2 Definition
If W is a subspace of an inner product space V , then its orthogonal complement, W ⊥ ,
is the set of all vectors u that are orthogonal to every vector of W .
Clearly 0 ∈ W ⊥ , and indeed W ⊥ is a subspace, since if u1 and u2 are orthogonal to
everything in W , then ⟨a1 u1 + a2 u2 , w⟩ = a1 ⟨u1 , w⟩ + a2 ⟨u2 , w⟩ = 0 for all w ∈ W .

Example: if W is the 1-dimensional subspace of R3 spanned by the vector w = (3, 5, 7)
then x = (x1 , x2 , x3 ) is in W ⊥ if and only if ⟨x, w⟩ = 0, i.e., 3x1 + 5x2 + 7x3 = 0. This
is the plane perpendicular to W .

It can be checked that (W ⊥ )⊥ is W again when W is finite-dimensional.

Now in (7.1) we have that if v − w lies in W ⊥ , then w is the best approximation to v
by vectors in W .

7.3 The normal equations

Suppose that w1 , . . . , wn is a basis for W . Then the best approximant w to v is found
by solving
⟨v − w, wi ⟩ = 0 for each i,
because this makes v − w orthogonal to all linear combinations of the wi . Hence we require
⟨w, wi ⟩ = ⟨v, wi ⟩ for each i.
Suppose now that $w = \sum_{k=1}^{n} c_k w_k$ is the best approximant. Then we have
\[
\sum_{k=1}^{n} c_k \langle w_k, w_i\rangle = \langle v, w_i\rangle
\]
for each i = 1, . . . , n.

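In C[−1, 1] we take f (t) = |t|; to approximate it by a quadratic take $w_1(t) = 1$,
$w_2(t) = t$ and $w_3(t) = t^2$.
The best approximant $c_0 + c_1 t + c_2 t^2$ to |t| satisfies:
\[
\begin{aligned}
c_0\langle 1, 1\rangle + c_1\langle t, 1\rangle + c_2\langle t^2, 1\rangle &= \langle f, 1\rangle,\\
c_0\langle 1, t\rangle + c_1\langle t, t\rangle + c_2\langle t^2, t\rangle &= \langle f, t\rangle,\\
c_0\langle 1, t^2\rangle + c_1\langle t, t^2\rangle + c_2\langle t^2, t^2\rangle &= \langle f, t^2\rangle.
\end{aligned}
\]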

Now we can easily check that

\[
\int_{-1}^{1} t^k\,dt = \begin{cases} 0 & \text{if } k \text{ is odd},\\[2pt] \dfrac{2}{k+1} & \text{if } k \text{ is even}, \end{cases}
\]

so we can soon calculate inner products and get

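\[
\begin{aligned}
2c_0 + 0 + \frac{2}{3}c_2 &= \int_{-1}^{1} |t|\,dt = 1,\\
0 + \frac{2}{3}c_1 + 0 &= \int_{-1}^{1} |t|\,t\,dt = 0,\\
\frac{2}{3}c_0 + 0 + \frac{2}{5}c_2 &= \int_{-1}^{1} t^2|t|\,dt = \frac{1}{2}.
\end{aligned}
\]

Note that
\[
\int_{-1}^{1} |t|\,dt = \int_{-1}^{0} (-t)\,dt + \int_{0}^{1} t\,dt,
\]
and similarly for the other two integrals.
The solution to these equations is $c_0 = \frac{3}{16}$, $c_1 = 0$ and $c_2 = \frac{15}{16}$, giving the approximation
\[
|t| \approx \frac{3}{16} + \frac{15}{16}\,t^2.
\]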
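
As a check on this answer, one can verify the condition of Theorem 7.1 directly: the error f − g should be orthogonal to 1, t and t². The sketch below does this exactly; the helper names `moment` and `abs_moment` are illustrative, and the closed form used for the integral of |t|·tᵏ (namely 2/(k+2) for even k and 0 for odd k) is a standard calculation rather than something stated in the notes.

```python
from fractions import Fraction

def moment(k):
    """Integral of t**k over [-1, 1]."""
    return Fraction(2, k + 1) if k % 2 == 0 else Fraction(0)

def abs_moment(k):
    """Integral of |t| * t**k over [-1, 1]."""
    return Fraction(2, k + 2) if k % 2 == 0 else Fraction(0)

coeffs = [Fraction(3, 16), Fraction(0), Fraction(15, 16)]   # c0, c1, c2

# <f - g, t**i> for i = 0, 1, 2, where g(t) = c0 + c1 t + c2 t**2
residual_products = [
    abs_moment(i) - sum(c * moment(i + j) for j, c in enumerate(coeffs))
    for i in range(3)
]
```

All three entries of `residual_products` come out as zero, which is exactly the statement that the normal equations hold.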

7.4 Corollary
Suppose that e1 , . . . , en is an orthonormal basis for W . Then the best approximant of
v ∈ V by an element of W is
\[
w = \sum_{k=1}^{n} \langle v, e_k\rangle e_k.
\]

Proof: Let $w = \sum_{k=1}^{n} c_k e_k$. Then the normal equations become
\[
\sum_{k=1}^{n} c_k \langle e_k, e_i\rangle = \langle v, e_i\rangle,
\]
which reduces to $c_i = \langle v, e_i\rangle$ using orthonormality.

Thus we could have solved the example of approximating f (t) = |t| by using an or-
thonormal basis for the quadratic polynomials, e.g. the Legendre functions.

7.5 Definition
The orthogonal projection of v onto W , written PW v, is the closest vector w ∈ W to
v. In particular,
\[
P_W v = \sum_{k=1}^{n} \langle v, e_k\rangle e_k,
\]
if {e1 , . . . , en } is an orthonormal basis of W . Note that PW : V → W is a linear
mapping.


Example: the plane W = {(x1 , x2 , x3 ) ∈ R3 : x1 + x2 + x3 = 0} is a 2-dimensional
subspace with orthonormal basis $e_1 = \frac{1}{\sqrt{2}}(1, -1, 0)$ and $e_2 = \frac{1}{\sqrt{6}}(1, 1, -2)$. CHECK
that these are orthonormal and lie in W (so, since dim W = 2, they are also a basis for
W ).

Calculate PW (1, 0, 0). It is
\[
\begin{aligned}
P_W(1, 0, 0) &= \langle (1, 0, 0), e_1\rangle e_1 + \langle (1, 0, 0), e_2\rangle e_2 \\
&= \frac{1}{2}(1, -1, 0) + \frac{1}{6}(1, 1, -2) = \Big(\frac{2}{3}, -\frac{1}{3}, -\frac{1}{3}\Big).
\end{aligned}
\]
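
A short sketch of this projection in code may help. It is an illustration only: the function `project` is a hypothetical helper, and it works with the unnormalised orthogonal vectors (1, −1, 0) and (1, 1, −2) so that the arithmetic stays exact, using ⟨v, e⟩e = (⟨v, u⟩/⟨u, u⟩)u for e = u/‖u‖.

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def project(v, orthogonal_basis):
    """Orthogonal projection of v onto span(orthogonal_basis).

    The basis vectors must be mutually orthogonal but need not have
    length 1: <v, e>e with e = u/||u|| equals (<v, u>/<u, u>) u.
    """
    result = [Fraction(0)] * len(v)
    for u in orthogonal_basis:
        coeff = Fraction(dot(v, u), dot(u, u))
        result = [r + coeff * a for r, a in zip(result, u)]
    return result

basis = [[1, -1, 0], [1, 1, -2]]
v = [1, 0, 0]
p = project(v, basis)
error = [a - b for a, b in zip(v, p)]
```

The vector `p` is (2/3, −1/3, −1/3), and `error` = v − p is orthogonal to both basis vectors, as Theorem 7.1 requires.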
Now for some more serious applications of the theory.

7.6 Least squares approximation

Problem: find the line through (0, 0) (to be varied later) which “best approximates”
the data (x1 , y1 ), . . . , (xn , yn ). We would like yi = cxi for each i, but we don’t know c
and the points won’t always lie exactly on a line.

We decide to minimize $\sum_{i=1}^{n} (y_i - c x_i)^2$; this is least squares approximation, useful in
statistical applications.

This is the same as taking x = (x1 , . . . , xn ) and y = (y1 , . . . , yn ) in Rn and minimizing
‖y − cx‖.
Take V to be Rn , usual inner product, and W to be the one-dimensional subspace
{ax : a ∈ R}. This is the same as finding the closest point to y in W .

Solution: take
\[
c = \frac{\langle y, x\rangle}{\langle x, x\rangle},
\]
since this is the orthogonal projection onto W . In detail, w = cx, and the normal
equation is ⟨w, x⟩ = ⟨y, x⟩, or c⟨x, x⟩ = ⟨y, x⟩. Thus
\[
c = \frac{x_1 y_1 + \ldots + x_n y_n}{x_1^2 + \ldots + x_n^2}.
\]
Example: find the best fit to the data
x   y
2   3
1   2
3   3
4   5
\[
c = \frac{2\cdot 3 + 1\cdot 2 + 3\cdot 3 + 4\cdot 5}{2^2 + 1^2 + 3^2 + 4^2} = \frac{37}{30}.
\]
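
The formula for c is a one-line computation. As a small sketch (the function name `slope_through_origin` is illustrative, and integer data are assumed so that the result is an exact fraction):

```python
from fractions import Fraction

def slope_through_origin(xs, ys):
    """Least-squares c for y = c x: c = <y, x> / <x, x>."""
    return Fraction(sum(x * y for x, y in zip(xs, ys)),
                    sum(x * x for x in xs))

c = slope_through_origin([2, 1, 3, 4], [3, 2, 3, 5])
```

For the data in the table this gives c = 37/30.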
7.7 Generalization
Suppose that y is known/guessed to be a linear combination of m variables x1 , . . . , xm ,
y = c1 x1 + . . . + cm xm , so we have experimental data
x1    x2    ...   xm    y
x11   x21   ...   xm1   y1
 ⋮     ⋮           ⋮     ⋮
x1n   x2n   ...   xmn   yn

Set up the problem in Rn and choose c1 , . . . , cm to minimize ‖y − (c1 x1 + . . . + cm xm )‖.

If W = span{x1 , . . . , xm }, then we want the closest point in W to y.
We know from (7.3) that the constants c1 , . . . , cm are determined by the normal equations:
\[
\Big\langle \sum_{k=1}^{m} c_k x_k, x_i \Big\rangle = \langle y, x_i\rangle \quad\text{for each } i,
\]
i.e.,
\[
\begin{aligned}
c_1\langle x_1, x_1\rangle + \ldots + c_m\langle x_m, x_1\rangle &= \langle y, x_1\rangle\\
\vdots\qquad\qquad &\ \ \vdots\\
c_1\langle x_1, x_m\rangle + \ldots + c_m\langle x_m, x_m\rangle &= \langle y, x_m\rangle
\end{aligned}
\]

To get a unique solution we need the vectors x1 , . . . , xm to be independent, which
requires n ≥ m.

Example: Use the method of least squares approximation to find the best relation of
the form y = c1 x1 + c2 x2 fitting the following experimental data:
x1 x2 y
i) 1 0 2
ii) 0 1 3
iii) 1 1 2
iv) 1 −1 0

Solution: We work in R4 and take x1 = (1, 0, 1, 1), x2 = (0, 1, 1, −1) and y =
(2, 3, 2, 0). Normal equations are

3c1 + 0c2 = 4
0c1 + 3c2 = 5,

so c1 = 4/3 and c2 = 5/3. So the best relation is y = (4/3)x1 + (5/3)x2 , giving

x1    x2    y (experimental)    y (theoretical)
1     0     2                   4/3
0     1     3                   5/3
1     1     2                   3
1     −1    0                   −1/3
7.8 Curve fitting
Given (x1 , y1 ), . . . , (xn , yn ) find a (polynomial) curve which fits these points well in the
sense of least squares approximation.
Example: Find the parabola y = c0 + c1 x + c2 x² which best fits the points (0, 0),
(1, 4), (−1, 1), (−2, 5).
Solution: Apply the method of least squares approximation to y = c0 x0 + c1 x1 + c2 x2
with x0 = 1, x1 = x and x2 = x².
Put x0 = (1, 1, 1, 1), x1 = (0, 1, −1, −2), x2 = (0, 1, 1, 4), y = (0, 4, 1, 5). Note that x0
is the vector with all components 1, x1 the vector of x values, and x2 the vector
of x² values.
Normal equations are

4c0 − 2c1 + 6c2 = 10,

−2c0 + 6c1 − 8c2 = −7,
6c0 − 8c1 + 18c2 = 25,

from which c0 = 3/10, c1 = 8/5 and c2 = 2.

Example: Find the line y = c0 + c1 x which best fits the points (2, 3), (1, 2), (3, 3) and
(4, 5). (Data used earlier to get y = cx only.)

Solution: let x0 = (1, 1, 1, 1), x1 = (2, 1, 3, 4) and y = (3, 2, 3, 5). So we want
y ≈ c0 x0 + c1 x1 .
Normal equations are
4c0 + 10c1 = 13
10c0 + 30c1 = 37,
giving c0 = 1 and c1 = 9/10, or y = 1 + (9/10)x.
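
The worked examples of 7.7 and 7.8 all reduce to forming and solving the normal equations, which is easy to mechanise. The sketch below is one possible way to do it, under the assumption that the vectors x1 , . . . , xm are linearly independent; the function name `solve_normal_equations` is illustrative, and exact Gauss-Jordan elimination over the rationals is used purely for convenience (the notes do not prescribe a solution method).

```python
from fractions import Fraction

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_normal_equations(columns, y):
    """Solve sum_k c_k <x_k, x_i> = <y, x_i> exactly by Gauss-Jordan elimination.

    `columns` holds the vectors x_1, ..., x_m, assumed linearly independent.
    """
    m = len(columns)
    # Augmented matrix [G | b] with G[i][k] = <x_k, x_i>, b[i] = <y, x_i>.
    rows = [[Fraction(dot(xk, xi)) for xk in columns] + [Fraction(dot(y, xi))]
            for xi in columns]
    for col in range(m):
        pivot = next(r for r in range(col, m) if rows[r][col] != 0)
        rows[col], rows[pivot] = rows[pivot], rows[col]
        rows[col] = [entry / rows[col][col] for entry in rows[col]]
        for r in range(m):
            if r != col:
                factor = rows[r][col]
                rows[r] = [a - factor * b for a, b in zip(rows[r], rows[col])]
    return [row[m] for row in rows]

# Example of 7.7: y = c1 x1 + c2 x2
two_variable = solve_normal_equations([[1, 0, 1, 1], [0, 1, 1, -1]], [2, 3, 2, 0])

# Examples of 7.8: the columns are the powers x**0, x**1, ... of the x values
xs, ys = [0, 1, -1, -2], [0, 4, 1, 5]
parabola = solve_normal_equations([[x ** p for x in xs] for p in range(3)], ys)
xs, ys = [2, 1, 3, 4], [3, 2, 3, 5]
line = solve_normal_equations([[x ** p for x in xs] for p in range(2)], ys)
```

The three results are (4/3, 5/3), (3/10, 8/5, 2) and (1, 9/10), in agreement with the solutions found by hand.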


8 Cauchy sequences and completeness

Recall that if (X, d) is a metric space, then a sequence (xn ) of elements of X converges
to x ∈ X if d(xn , x) → 0, i.e., if given ε > 0 there exists N such that d(xn , x) < ε
whenever n ≥ N .

Often we think of convergent sequences as ones where xn and xm are close together
when n and m are large. This is almost, but not quite, the same thing.

8.1 Definition
A sequence (xn ) in a metric space (X, d) is a Cauchy sequence if for any ε > 0 there is
an N such that d(xn , xm ) < ε for all n, m ≥ N .

Example: take xn = 1/n in R with the usual metric. Now d(xn , xm ) = |1/n − 1/m|.
Suppose that n and m are both at least as big as N ; then d(xn , xm ) ≤ 1/N .

[DIAGRAM, showing the points]

Hence if ε > 0 and we take N > 1/ε, we have d(xn , xm ) ≤ 1/N < ε whenever n and m
are both ≥ N .

In fact all convergent sequences are Cauchy sequences, by the following result.

8.2 Theorem
Suppose that (xn ) is a convergent sequence in a metric space (X, d), i.e., there is a
limit point x such that d(xn , x) → 0. Then (xn ) is a Cauchy sequence.

Proof: take ε > 0. Then there is an N such that d(xn , x) < ε/2 whenever n ≥ N .
Now suppose both n ≥ N and m ≥ N . Then
d(xn , xm ) ≤ d(xn , x) + d(x, xm ) = d(xn , x) + d(xm , x) < ε/2 + ε/2 = ε,

and we are done.

8.3 Proposition
Every subsequence of a Cauchy sequence is a Cauchy sequence.

Proof: if (xn ) is Cauchy and (xnk ) is a subsequence, then given ε > 0 there is an N
such that d(xn , xm ) < ε whenever n, m ≥ N . Now there is a K such that nk ≥ N
whenever k ≥ K. So d(xnk , xnl ) < ε whenever k, l ≥ K.

Does every Cauchy sequence converge?

Examples: 1. (X, d) = Q, as a subspace of R with the usual metric. Take x0 = 2 and
define xn+1 = xn /2 + 1/xn . The sequence continues 3/2, 17/12, 577/408, . . . and indeed
xn → x where x = x/2 + 1/x, i.e., x² = 2. But this isn’t in Q.

Thus (xn ) is Cauchy in R, since it converges to √2 when we think of it as a sequence
in R. So it is Cauchy in Q, but doesn’t converge to a point of Q.

2. Easier. Take (X, d) = (0, 1). Then (1/n) is a Cauchy sequence in X (since it is
Cauchy in R, as seen above), and has no limit in X.

In each case there are “points missing from X”.
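
The first example can be explored with exact rational arithmetic, which makes it visible that every term really lies in Q. This is only a sketch (the function name `iterates` is illustrative); it computes the first few terms and the gaps between consecutive terms.

```python
from fractions import Fraction

def iterates(count):
    """First `count` terms of x_0 = 2, x_{n+1} = x_n/2 + 1/x_n, as exact rationals."""
    x = Fraction(2)
    terms = [x]
    for _ in range(count - 1):
        x = x / 2 + 1 / x
        terms.append(x)
    return terms

terms = iterates(5)
gaps = [abs(b - a) for a, b in zip(terms, terms[1:])]
squares_minus_two = [t * t - 2 for t in terms]
```

The terms are 2, 3/2, 17/12, 577/408, 665857/470832; the gaps shrink rapidly, while each entry of `squares_minus_two` is small but never zero, so no term is a rational square root of 2.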

8.4 Definition
A metric space (X, d) is complete if every Cauchy sequence in X converges to a limit
in X.

Is R complete? What do we mean by R? We could regard it as the set of all infinite
decimal numbers; but since there is an ambiguity e.g. 0.999. . . = 1.000. . . , we have
to allow for this, e.g. by regarding all the recurring-9 numbers as the same as the
corresponding recurring-0 numbers.

Cauchy sequences can be awkward, e.g. $x_n = \frac{1}{2} + (-1)^n \frac{1}{10^n}$, i.e., 0.4, 0.51, 0.499, 0.5001,
0.49999, . . . , will converge to 0.5, even though the individual digits do not converge.

8.5 Theorem
R is complete.

We do this in several stages.

A: Every bounded increasing or decreasing sequence in R converges. Increasing means
x1 ≤ x2 ≤ . . . and you can guess what decreasing means. Monotone means either
increasing or decreasing.
B: Every Cauchy sequence in R is bounded.
C: Every sequence in R has a monotone subsequence.
D: If a Cauchy sequence has a convergent subsequence, then the original sequence
converges.
E: R is complete.

Proof of E: let (xn ) be a Cauchy sequence in R. By (C) it has a monotone subsequence
(xnk ), which is also Cauchy by (8.3). By (B) this sequence is bounded. So by
(A) it converges. Now by (D) the original sequence converges.

Proof of A: we can take this as an axiom of R, or observe that if the numbers are
increasing and bounded, then eventually the integer parts are constant, then the first
digit after the decimal point, then the second, . . . , so it is clear what number we want
as our limit. But if xn agrees with x to k decimal places then $|x_n - x| < 10^{-k}$; this
shows that xn → x.
Example, 1.0, 1.2, 1.4, 1.41, 1.412, 1.414, 1.4141, 1.4142, ... is homing in on √2.

Proof of B: if (xn ) is Cauchy, then with ε = 1 we know that |xm − xn | < 1 when-
ever m, n ≥ N . Now |xn | ≤ |xn − xN | + |xN | < 1 + |xN | for all n ≥ N . Let
K = max{|x1 |, |x2 |, . . . , |xN −1 |, 1 + |xN |}. Then |xn | ≤ K for all n.

Proof of D: suppose that (xn ) is Cauchy in (X, d) and limk→∞ xnk = y. Take ε > 0.
Then there exists N such that d(xm , xn ) < ε/2 whenever m, n ≥ N ; and K such that
d(xnk , y) < ε/2 whenever k ≥ K. Choose k ≥ K such that nk ≥ N . Then for n ≥ N ,

d(xn , y) ≤ d(xn , xnk ) + d(xnk , y) < ε,

and so d(xn , y) → 0 as n → ∞.

Proof of C: let (xn ) be a sequence in R. We say that xm is a peak point of the sequence
if xm ≥ xn for all n > m.


Case 1: only finitely many peak points. Choose n1 large so that xn is not a peak point
for any n ≥ n1 .
Since xn1 is not a peak point we can find n2 > n1 with xn2 > xn1 ;
since xn2 is not a peak point we can find n3 > n2 with xn3 > xn2 ; and so on.

Now (xnk ) is strictly increasing.

Case 2: (xn ) has infinitely many peak points, say, xn1 , xn2 , . . . , with n1 < n2 < . . ..
Now xn1 ≥ xn2 ≥ . . ., so (xnk ) is a decreasing subsequence.

We have finished the proof that R is complete.

8.6 Corollary
A subset X ⊂ R is complete if and only if it is closed.

Proof: If X is not closed, then X̄ ≠ X (where X̄ denotes the closure of X), so there is
a point y ∈ R such that y ∈ X̄ \ X.
There is a sequence (xn ) in X that converges to y, by Theorem 2.2. Then (xn ) is a
Cauchy sequence by Theorem 8.2, but it does not have a limit in X, so X is not complete.

Conversely, if X is closed and (xn ) is a Cauchy sequence in X, then it has a limit y in
R, since R is complete, by Theorem 8.5. But then y ∈ X̄ by Theorem 2.2, so y ∈ X
since X is closed. Hence X is complete.

Examples: open intervals in R are not complete; closed intervals are complete.

What about C[a, b] with d1 , d2 or d∞ ?

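Define fn in C[0, 2] by
\[
f_n(x) = \begin{cases} x^n & \text{for } 0 \le x \le 1,\\ 1 & \text{for } 1 \le x \le 2. \end{cases}
\]
Then
\[
\begin{aligned}
d_1(f_n, f_m) &= \int_0^2 |f_n(x) - f_m(x)|\,dx \\
&= \int_0^1 |x^n - x^m|\,dx \\
&= \int_0^1 (x^m - x^n)\,dx \quad\text{if } n \ge m \\
&= \frac{1}{m+1} - \frac{1}{n+1} \le \frac{1}{m+1} \to 0,
\end{aligned}
\]
and hence (fn ) is Cauchy in (C[0, 2], d1 ). Does the sequence converge?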

If there is an f ∈ C[0, 2] with fn → f as n → ∞, then $\int_0^2 |f_n(x) - f(x)|\,dx \to 0$, so
\[
\int_0^1 |f_n(x) - f(x)|\,dx \quad\text{and}\quad \int_1^2 |f_n(x) - f(x)|\,dx
\]
both tend to zero. So fn → f in (C[0, 1], d1 ), which means that f (x) = 0
on [0, 1] (from an example we did earlier). Likewise, f = 1 on [1, 2], which doesn’t give
a continuous limit.

Similarly, (C[a, b], d1 ) is incomplete in general. Also it is incomplete in the d2 metric,

as the same example shows (a similar calculation with squares of functions).

What about d∞ ?

8.7 Definition
A sequence (fn ) of (not necessarily continuous) functions defined on [a, b] is said to
converge uniformly to f if sup{|fn (x) − f (x)| : x ∈ [a, b]} → 0 as n → ∞. (If these
are continuous functions, then this is just convergence in the d∞ metric.)

8.8 Theorem
If (fn ) are continuous functions and fn → f uniformly, then f is also continuous.

Proof: Take ε > 0 and a point x ∈ [a, b]. Then there is an N such that |fn (t) − f (t)| <
ε/3 for all t ∈ [a, b] whenever n ≥ N . Now fN is continuous, so we can choose δ > 0
such that |fN (t) − fN (x)| < ε/3 for all t ∈ [a, b] with |t − x| < δ. Then

|f (t) − f (x)| ≤ |f (t) − fN (t)| + |fN (t) − fN (x)| + |fN (x) − f (x)|
≤ ε/3 + ε/3 + ε/3 = ε

whenever t ∈ [a, b] and |t − x| < δ. Hence f is continuous at x.

Thus, for example, the functions $f_n(t) = t^n$ converge pointwise on [0, 1] to
\[
g(t) = \begin{cases} 0 & \text{for } 0 \le t < 1,\\ 1 & \text{for } t = 1, \end{cases}
\]
but g is not continuous, so the convergence isn’t uniform.


8.9 Theorem
(C[a, b], d∞ ) is a complete metric space.

Proof: take a Cauchy sequence (fn ) in (C[a, b], d∞ ). The proof goes in two steps.

I: For each x ∈ [a, b], (fn (x)) is a Cauchy sequence in R, and so has a limit, which we
call f (x).

II: fn → f uniformly; hence f ∈ C[a, b] and d∞ (fn , f ) → 0.

Step I: given ε > 0 there is an N with d∞ (fn , fm ) < ε for n, m ≥ N , since (fn ) is
Cauchy. But |fn (x) − fm (x)| ≤ d∞ (fn , fm ) and so this is also < ε for n, m ≥ N . So
(fn (x)) is a Cauchy sequence in R. Since R is complete by (8.5), we see that there is
a limiting value f (x).

Step II: take ε > 0 and N as in Step I. Then |fn (x) − fm (x)| < ε for each x, provided
that n, m ≥ N . Fix n ≥ N and let m → ∞. We conclude that |fn (x) − f (x)| ≤ ε for
each x, provided that n ≥ N . This is just the uniform convergence of fn to f . So f is
continuous, i.e., f ∈ C[a, b] by (8.8), and d∞ (fn , f ) → 0.

8.10 Remark
Note that R2 is also complete with any of the metrics d1 , d2 and d∞ ; since a Cauchy/
convergent sequence (vn ) = (xn , yn ) in R2 is just one in which both (xn ) and (yn ) are
Cauchy/ convergent sequences in R (cf. Prop. 2.4).

Similar arguments show that Rk is also complete for k = 1, 2, 3, . . ., and (with the same
proof as for Corollary 8.6) all closed subsets of Rk are complete.

9 Contraction mappings
Our aim is to use metric spaces to solve equations by using an iterative method to get
approximate solutions.

9.1 Examples
1. x³ + 2x² − 8x + 4 = 0. Rewrite this as x = (1/8)(x³ + 2x² + 4).

Consider the function φ : R → R given by φ(x) = (1/8)(x³ + 2x² + 4).

Then x is a root of our equation if and only if φ(x) = x, i.e., x is a fixed point of φ.

Guess a solution x0 ; then let x1 = φ(x0 ), x2 = φ(x1 ), . . .. This gives a sequence of

numbers x0 , x1 , x2 , . . . , xn , xn+1 = φ(xn ), . . ..
If these terms converge to a limit, then this limit should be a solution.
e.g. take x0 = 0, then x1 = 0.5, x2 = 0.578, x3 = 0.608, x4 = 0.621, x5 = 0.626,
x6 = 0.629, x7 = 0.630, x8 = 0.630.
2. $\frac{dy}{dx} = x(x + y)$, for 0 ≤ x ≤ 1, with y(0) = 0.

Rewrite as
\[
y(x) = \int_0^x t(t + y(t))\,dt.
\]
Define
\[
\phi(f)(x) = \int_0^x t(t + f(t))\,dt.
\]
So y = f (x) solves the original equation if and only if φ(f ) = f .
Again, try to find the solution as the limit of a sequence. Take f0 (x) = 0 for 0 ≤ x ≤ 1.
Then
\[
\begin{aligned}
f_1 = \phi(f_0), \text{ i.e., } f_1(x) &= \int_0^x t(t + f_0(t))\,dt = \int_0^x t^2\,dt = \frac{x^3}{3},\\
f_2 = \phi(f_1), \text{ i.e., } f_2(x) &= \int_0^x t(t + f_1(t))\,dt = \int_0^x t\Big(t + \frac{t^3}{3}\Big)dt = \frac{x^3}{3} + \frac{x^5}{15},\\
f_3 = \phi(f_2), \text{ i.e., } f_3(x) &= \int_0^x t\Big(t + \frac{t^3}{3} + \frac{t^5}{15}\Big)dt = \frac{x^3}{3} + \frac{x^5}{15} + \frac{x^7}{105}.
\end{aligned}
\]
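
Both examples follow the same pattern: start somewhere and apply φ repeatedly. The sketch below makes that explicit with a single generic routine applied once to numbers and once to functions. The names `iterate`, `phi_number` and `phi_function` are illustrative, and representing a polynomial as a dictionary {power: coefficient} is just one convenient choice.

```python
from fractions import Fraction

def iterate(phi, start, steps):
    """Return [x_0, ..., x_steps] where x_0 = start and x_{n+1} = phi(x_n)."""
    terms = [start]
    for _ in range(steps):
        terms.append(phi(terms[-1]))
    return terms

def phi_number(x):
    """Example 1: phi(x) = (x^3 + 2x^2 + 4) / 8 on real numbers."""
    return (x ** 3 + 2 * x ** 2 + 4) / 8

def phi_function(f):
    """Example 2: phi(f)(x) = integral from 0 to x of t (t + f(t)) dt.

    A polynomial is stored as a dict {power: coefficient}.
    """
    integrand = {2: Fraction(1)}                       # the term t * t
    for power, coeff in f.items():                     # the terms t * f(t)
        integrand[power + 1] = integrand.get(power + 1, 0) + coeff
    return {p + 1: c / (p + 1) for p, c in integrand.items()}

numbers = iterate(phi_number, 0.0, 60)
functions = iterate(phi_function, {}, 3)
```

Here `numbers` begins 0, 0.5, 0.578..., and `functions[3]` is the polynomial x³/3 + x⁵/15 + x⁷/105 found above.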
Suppose we have a metric space (X, d) and a function φ : X → X. Choose x0 ∈ X
and define xn = φ(xn−1 ) for n ≥ 1. This gives a sequence (xn ); if it is Cauchy and
(X, d) is complete, then x = limn→∞ xn exists and x should solve x = φ(x). How can
we guarantee that (xn ) will be Cauchy?

Note that d(xn , xn+1 ) = d(φ(xn−1 ), φ(xn )), so to get (xn ) Cauchy we want φ to shrink
distances. Let’s call φ : X → X a shrinking map if d(φ(y), φ(z)) < d(y, z) for all
y, z ∈ X with y ≠ z.


Take X = [1, ∞), regarded as a subspace of R, usual metric. It is complete. Define
φ : X → X by φ(x) = x + 1/x. How can we check that φ is a shrinking map? Answer:
use the Mean Value Theorem (MVT)!
\[
\frac{|\phi(x) - \phi(y)|}{|x - y|} = |\phi'(c)|
\]
for some c between x and y. Now φ′(c) = 1 − 1/c², so |φ′(c)| < 1 for all c ∈ X.
Hence |φ(x) − φ(y)| < |x − y| for any x ≠ y, and so φ is a shrinking map.

Take x0 = 1, x1 = φ(x0 ) = 2, x2 = φ(x1 ) = 2 + 1/2 = 5/2, x3 = φ(x2 ) = 29/10,
x4 = φ(x3 ) = 941/290, . . . . Clearly (xn ) is increasing. If it remains bounded, then it
has a limit, ℓ, say. Then we shall have ℓ = ℓ + 1/ℓ, which is impossible. So (xn ) is
unbounded, doesn’t converge, isn’t Cauchy. Too bad!
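
A quick numerical experiment illustrates this failure. The sketch below (illustrative only) computes the first few iterates exactly and then runs the iteration for many steps in floating point. It relies on the elementary observation, not stated in the notes, that x_{n+1}² = x_n² + 2 + 1/x_n² ≥ x_n² + 2, so x_n² ≥ 1 + 2n.

```python
from fractions import Fraction

def phi(x):
    return x + 1 / x

# The first few iterates, computed exactly.
exact = [Fraction(1)]
for _ in range(4):
    exact.append(phi(exact[-1]))

# A long run in floating point: the iterates keep growing.
x = 1.0
for n in range(10_000):
    x = phi(x)
```

After 10 000 steps the iterate has passed 141 (since its square is at least 20 001), even though consecutive iterates differ by less than 0.01.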


9.2 Definition
Let (X, d) be a metric space. A map φ : X → X is a contraction mapping, if there
exists a constant k < 1 such that
d(φ(x), φ(y)) ≤ kd(x, y) for all x, y ∈ X.
1. Take X = [0, 1], usual metric, and φ(x) = x²/3. Then
\[
d(\phi(x), \phi(y)) = \Big|\frac{x^2}{3} - \frac{y^2}{3}\Big| = \frac{1}{3}\,|(x + y)(x - y)| \le \frac{2}{3}\,|x - y|.
\]
So φ is a contraction mapping, with k = 2/3.

2. Take X = R and φ(x) = (1/4) sin 3x. So |φ(x) − φ(y)| = (1/4)| sin 3x − sin 3y|.
Use MVT! φ′(x) = (3/4) cos 3x, so |φ′(x)| ≤ 3/4, and
|φ(x) − φ(y)| = |φ′(c)| |x − y| ≤ (3/4)|x − y|, etc.
3. Take X = [1, ∞) and φ(x) = x + 1/x. Suppose x = n and y = n + 1; then
\[
\phi(y) - \phi(x) = n + 1 + \frac{1}{n+1} - n - \frac{1}{n} = 1 - \frac{1}{n(n+1)},
\]
so |φ(y) − φ(x)|/|y − x| can be made as close to 1 as we like by taking x = n and
y = n + 1 for n large. Thus φ (which is a shrinking mapping) is not a contraction
mapping.

9.3 Theorem
Let (X, d) be a metric space and let φ : X → X be a contraction mapping. Then
for each x0 ∈ X the sequence defined by xn = φ(xn−1 ) (for each n ≥ 1) is a Cauchy
sequence.

Proof: for some k < 1 we have d(φ(x), φ(y)) ≤ kd(x, y). So
\[
\begin{aligned}
d(x_2, x_1) &\le k\,d(x_1, x_0) = kd, \text{ say},\\
d(x_3, x_2) &\le k\,d(x_2, x_1) \le k^2 d,\\
&\ \ \vdots\\
d(x_{n+1}, x_n) &\le k^n d, \text{ and so on.}
\end{aligned}
\]
Hence for n > m we have
\[
\begin{aligned}
d(x_n, x_m) &\le d(x_m, x_{m+1}) + d(x_{m+1}, x_{m+2}) + \ldots + d(x_{n-1}, x_n)\\
&\le k^m d + k^{m+1} d + \ldots + k^{n-1} d \quad\text{from the above}\\
&\le k^m d\,(1 + k + k^2 + \ldots) = \frac{k^m d}{1 - k},
\end{aligned}
\]
which tends to 0 as m → ∞. Thus (xn ) is Cauchy.

Note: in the above theorem, if (X, d) is complete, then (xn ) will converge to a limit
x ∈ X. Note that x is a fixed point of φ, i.e., φ(x) = x, since

d(x, φ(x)) ≤ d(x, xn ) + d(xn , φ(xn )) + d(φ(xn ), φ(x))

≤ d(x, xn ) + d(xn , xn+1 ) + kd(xn , x),

and each term tends to 0 as n → ∞. So d(x, φ(x)) = 0, i.e., x = φ(x).

9.4 Theorem (Banach’s Contraction Mapping Theorem) (CMT)

Let (X, d) be a complete metric space, and let φ : X → X be a contraction mapping.
Then φ has a unique fixed point. If x0 is any point in X then the sequence defined by
xn = φ(xn−1 ) (for n ≥ 1) converges to the fixed point.

Proof: by Theorem 9.3 and the note following it, we have proved everything except
the fact that there is only one fixed point for φ. But if x and y are fixed points, then

d(x, y) = d(φ(x), φ(y)) ≤ kd(x, y),

with k < 1; this can only happen if d(x, y) = 0, i.e., x = y.

How to apply the CMT: suppose we want to solve the equation φ(x) = x, where φ
is a contraction mapping. Take x0 ∈ X and construct (xn ) as above. Then (xn ) tends
to a solution x.

Note that in an incomplete metric space, there may be problems. For example take
X = (0, 1) ⊂ R and φ(x) = x/2. The iterates form a Cauchy sequence but the limit,
0, isn’t in the space, and there is no fixed point in the space.

9.5 How fast does it converge?

Answer: d(x1 , x) = d(φ(x0 ), φ(x)) ≤ kd(x0 , x), and in general $d(x_n, x) \le k^n d(x_0, x)$.
Also d(x0 , x) ≤ d(x0 , x1 )+d(x1 , x) ≤ d(x0 , x1 )+kd(x0 , x), so (1−k)d(x0 , x) ≤ d(x0 , x1 ),
and we conclude that
\[
d(x_n, x) \le \frac{k^n}{1 - k}\,d(x_0, x_1),
\]
so we can choose n large to make this as small as we wish.
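
This bound can be turned into a stopping rule chosen in advance: given k, the first step size d(x0 , x1 ) and a tolerance, find the first n for which the bound drops below the tolerance. The following is a rough sketch; the function name `steps_needed` is illustrative, and the sample values k = 7/8 and d(x0 , x1 ) = 1/2 are those of the cubic example treated next in the notes (with x0 = 0).

```python
def steps_needed(k, d01, tol):
    """Smallest n with k**n / (1 - k) * d01 <= tol (the bound of 9.5)."""
    n, bound = 0, d01 / (1 - k)
    while bound > tol:
        bound *= k
        n += 1
    return n

n = steps_needed(k=7 / 8, d01=0.5, tol=1e-6)
```

For these values the rule asks for n = 114 steps. This is only a guarantee: the bound uses the worst-case constant k, so the iteration itself may well reach the tolerance sooner.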


1. Show that x³ + 2x² − 8x + 4 = 0 has a unique solution in [−1, 1], and find it correct
to within ±10⁻⁶.

Solution: write equation as x = (1/8)(x³ + 2x² + 4), and let φ(x) = (1/8)(x³ + 2x² + 4) for
−1 ≤ x ≤ 1. Note that if |x| ≤ 1 then
\[
|\phi(x)| \le \frac{1}{8}\big(|x|^3 + 2|x|^2 + 4\big) \le \frac{7}{8},
\]
so φ does map [−1, 1] to itself. Then
\[
|\phi'(x)| = \frac{1}{8}\,|3x^2 + 4x| \le \frac{7}{8},
\]
for x ∈ [−1, 1], so φ is a contraction mapping with k = 7/8. It has a unique fixed
point, as required.

Take x0 = 0. Defining xn = φ(xn−1 ), we get x1 = 0.5, x2 = 0.578, etc. as in Examples
9.1. The sequence converges to 0.6308976, although convergence is slow, since k = 7/8,
so the error after n steps is only bounded by
\[
|x_n - x| \le \frac{k^n}{1 - k}\,|x_0 - x_1| = 4\Big(\frac{7}{8}\Big)^n.
\]

2. Define φ : C[0, 1] → C[0, 1] by
\[
\phi(f)(x) = \int_0^x t(t + f(t))\,dt.
\]
Show φ is a contraction mapping for the metric d∞ , and use φ to find an approximate
solution y to the differential equation
\[
\frac{dy}{dx} = x(x + y), \quad y(0) = 0, \quad (0 \le x \le 1).
\]
We have
\[
d_\infty(\phi(f), \phi(g)) = \max\{|\phi(f)(x) - \phi(g)(x)| : 0 \le x \le 1\},
\]
and
\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Big|\int_0^x t(t + f(t))\,dt - \int_0^x t(t + g(t))\,dt\Big| \\
&= \Big|\int_0^x t(f(t) - g(t))\,dt\Big| \\
&\le \int_0^x t\,|f(t) - g(t)|\,dt \\
&\le \int_0^x t\,d_\infty(f, g)\,dt = \frac{1}{2}x^2\,d_\infty(f, g).
\end{aligned}
\]

Thus d∞ (φ(f ), φ(g)) ≤ (1/2) d∞ (f, g), and φ is a contraction map with k = 1/2.

If y = f (x) is a solution of the differential equation then f ′(t) = t(t + f (t)) and f (0) = 0. Integrate
from 0 to x:
\[
\int_0^x f'(t)\,dt = \int_0^x t(t + f(t))\,dt,
\]
i.e., f (x) = φ(f )(x). So f = φ(f ) and f is a fixed point of φ.

So CMT says that the d.e. has a unique solution, which we can obtain by iteration.
We did this in Examples 9.1 as well. Note that f0 = 0, f1 (x) = x³/3, so d∞ (f0 , f1 ) = 1/3,
and in general $d_\infty(f_n, f) \le \frac{1}{3\cdot 2^{n-1}}$, by 9.5.
Another example: f ′(t) = t(1 + f (t)), for t ∈ [0, 1], with f (0) = 0; here $f(t) = e^{t^2/2} - 1$
is the actual (unique) solution.
Take f0 (x) = 0,
\[
\begin{aligned}
f_1(x) &= \int_0^x t(1 + f_0(t))\,dt = \frac{x^2}{2},\\
f_2(x) &= \int_0^x t(1 + f_1(t))\,dt = \frac{x^2}{2} + \frac{x^4}{8}, \text{ etc.}
\end{aligned}
\]

9.6 General method

Consider the differential equation
\[
\frac{dy}{dx} = F(x, y), \quad y(a) = c, \quad a \le x \le b,
\]
where F (x, y) is a real-valued function defined for a ≤ x ≤ b and y ∈ R.

If y = f (x) is a solution, then
\[
f'(t) = F(t, f(t)), \quad f(a) = c, \quad (t \in [a, b]), \tag{1}
\]
or, equivalently,
\[
f(x) = c + \int_a^x F(t, f(t))\,dt. \tag{2}
\]

Define φ : C[a, b] → C[a, b] by
\[
\phi(f)(x) = c + \int_a^x F(t, f(t))\,dt.
\]

So f is a solution of (1) ⇐⇒ f is a solution of (2) ⇐⇒ f is a fixed point of φ.

If φ is a contraction for the d∞ metric on C[a, b], then by CMT (1) has a unique
solution, which is the limit of a sequence (fn ), where f0 ∈ C[a, b] is arbitrary and
fn = φ(fn−1 ) for n ≥ 1.
Also $d_\infty(f_n, f) \le \frac{k^n}{1-k}\,d_\infty(f_0, f_1)$, where k is the contraction constant of φ.

So when is φ a contraction map?

Note that some differential equations don’t have solutions everywhere we might want
them; e.g. f ′(t) = −f (t)² for t ∈ [0, 2], with f (0) = −1. The only solution is f (t) =
1/(t − 1), which is discontinuous at t = 1.

9.7 Theorem
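With F and φ as above, suppose there is a constant k < 1 such that
\[
|F(x, y_1) - F(x, y_2)| \le \frac{k}{b - a}\,|y_1 - y_2| \quad\text{for all } x \in [a, b],\ y_1, y_2 \in \mathbb{R}.
\]
Then φ is a contraction mapping on (C[a, b], d∞ ).

Proof: For f, g ∈ C[a, b],
\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Big|\int_a^x \big(F(t, f(t)) - F(t, g(t))\big)\,dt\Big| \\
&\le \int_a^x |F(t, f(t)) - F(t, g(t))|\,dt \\
&\le \frac{k}{b - a}\int_a^x |f(t) - g(t)|\,dt \\
&\le \frac{k}{b - a}\int_a^x d_\infty(f, g)\,dt \\
&= k\,\frac{x - a}{b - a}\,d_\infty(f, g) \le k\,d_\infty(f, g),
\end{aligned}
\]
so d∞ (φ(f ), φ(g)) ≤ kd∞ (f, g), as required.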

9.8 Definition
A function f : [a, b] → R satisfies the Lipschitz condition with constant m if
|f (x1 ) − f (x2 )| ≤ m|x1 − x2 | for all x1 , x2 ∈ [a, b].

If f is differentiable on [a, b] and m = max{|f ′(t)| : t ∈ [a, b]}, then f satisfies the
Lipschitz condition with constant m, since the Mean Value Theorem gives, for some c
between x1 and x2 ,

|f (x1 ) − f (x2 )| = |(x1 − x2 )f ′(c)| ≤ m|x1 − x2 |.

Similarly, if we have a function F (x, y), we say that it satisfies the Lipschitz condition
in y with constant m if

|F (x, y1 ) − F (x, y2 )| ≤ m|y1 − y2 |

for all x and for all y1 and y2 for which the above is defined.
If we have partial derivatives, we can take
\[
m = \max\Big\{\Big|\frac{\partial F}{\partial y}\Big| : a \le x \le b,\ y \in \mathbb{R}\Big\}.
\]
9.9 Theorem
If F satisfies the Lipschitz condition in y with a constant m < 1/(b − a), then the differential
equation y′ = F (x, y), y(a) = c has a unique solution for a ≤ x ≤ b.

Proof: use Theorem 9.7 writing m = k/(b − a) with k < 1.

In fact if it satisfies the Lipschitz condition with any constant m at all, we can still
solve the equation. What we do is to solve it in C[a, a + δ], where m < 1/δ, and obtain
a value y(a+δ). We then solve it in C[a+δ, a+2δ], and keep going until we get to b.

[DIAGRAM: lots of pieces joined together.]

1. $\frac{dy}{dx} = \cos(x^2 y)$, y(0) = 2, for 0 ≤ x ≤ 1.
Here F (x, y) = cos(x²y), and ∂F/∂y = −x² sin(x²y), so
\[
\Big|\frac{\partial F}{\partial y}\Big| \le x^2 \le 1. \tag{3}
\]

Thus F satisfies the Lipschitz condition in y with constant m = 1. Not good enough
for the theorem to apply on [0, 1] (although we could use [0, 1/2] and [1/2, 1], as above).

But if
\[
\phi(f)(x) = 2 + \int_0^x \cos(t^2 f(t))\,dt,
\]
then
\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &= \Big|\int_0^x \big(\cos(t^2 f(t)) - \cos(t^2 g(t))\big)\,dt\Big| \\
&\le \int_0^x |F(t, f(t)) - F(t, g(t))|\,dt \\
&\le \int_0^x t^2\,|f(t) - g(t)|\,dt \quad\text{by (3)},\\
&\le \int_0^x t^2\,d_\infty(f, g)\,dt = \frac{1}{3}x^3\,d_\infty(f, g).
\end{aligned}
\]
So d∞ (φ(f ), φ(g)) ≤ (1/3) d∞ (f, g), and so φ is a contraction map.

2.
\[
\frac{dy}{dx} = \sqrt{y}, \quad y(0) = 0, \quad 0 \le x \le 1. \tag{4}
\]
Here $F(x, y) = y^{1/2}$ and $\frac{\partial F}{\partial y} = \frac{1}{2}y^{-1/2}$, unbounded.
So F does not satisfy a Lipschitz condition in y at all. For any c ∈ (0, 1] we can define
\[
f_c(x) = \begin{cases} 0 & \text{if } x \le c,\\ \frac{1}{4}(x - c)^2 & \text{if } c \le x \le 1. \end{cases}
\]
[DIAGRAM: constant on [0, c], parabola rising on [c, 1].]

Then fc (x) satisfies (4), so there is no unique solution.

A new type of example: consider φ : R → R defined by φ(x) = cos x. Then φ is a
shrinking map but not a contraction map, since
\[
\frac{|\phi(x) - \phi(y)|}{|x - y|} = |\sin z|
\]
for some z between x and y, by the Mean Value Theorem. This is at most 1 (so shrinking),
but can be close to 1 if x and y are close to π/2, for example.

[DIAGRAM: plot y = x, y = cos x; curves meet once between 0 and π/2.]

We shall see that nevertheless φ has a unique fixed point.

We see that φ² : R → R, where φ²(x) = φ(φ(x)), is cos(cos(x)), and this is a contraction
map, since
\[
\frac{|\cos(\cos x) - \cos(\cos y)|}{|x - y|} = |\sin(\cos z)\cdot\sin z|,
\]
by the MVT, and this is at most sin 1, since cos z lies between −1 and 1. But sin 1 is
about 0.8415, anyway it’s less than 1.

The following theorem shows that φ has a unique fixed point, given by iteration:
x0 = 0, x1 = φ(x0 ) = 1, x2 = φ(x1 ) = 0.54, etc. (keep hitting cos button on calculator,
working in radians), and this converges to 0.7390851 . . ..
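
The "keep hitting the cos button" experiment is easy to reproduce. The sketch below is illustrative only: it iterates cos from x0 = 0 and also samples |sin(cos z) sin z| on a grid over [0, 2π] (one full period) as a numerical sanity check, not a proof, that sin 1 bounds the derivative of cos(cos x).

```python
import math

# Iterate phi(x) = cos x starting from x_0 = 0.
x = 0.0
for n in range(100):
    x = math.cos(x)

# Sample |d/dz cos(cos z)| = |sin(cos z) sin z| on a grid over one period.
largest_slope = max(
    abs(math.sin(math.cos(z)) * math.sin(z))
    for z in (i * math.pi / 1000 for i in range(2001))
)
```

After 100 steps `x` agrees with 0.7390851 to the digits shown, and `largest_slope` stays below sin 1 ≈ 0.8415.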

9.10 Theorem
If (X, d) is a complete metric space and φ : X → X is a map such that some iterate
φ^m of φ is a contraction map, then φ has a unique fixed point. For any x0 ∈ X the
sequence (xn ) = (φ^n (x0 )) converges to the fixed point.

Proof: by the CMT applied to φ^m , we get a unique fixed point x for φ^m . So x = φ^m (x).
Apply φ, then
φ(x) = φ(φ^m (x)) = φ^(m+1) (x) = φ^m (φ(x)),

so that φ(x) is also a fixed point of φ^m . By the uniqueness, φ(x) = x, so x is a fixed
point of φ as well.

If x and y are fixed points of φ, then x and y are fixed points of φ^m , which is a
contraction mapping, and so x = y. Hence φ has a unique fixed point.

Sketch of last assertion: let’s do m = 3 for illustration (the general case is similar, with
more complicated notation). We have, by the CMT for φ^m :

x0 , x3 , x6 , . . . , x3k , . . . → x,
x1 , x4 , x7 , . . . , x3k+1 , . . . → x,
x2 , x5 , x8 . . . , x3k+2 , . . . → x.

This implies that the single sequence x0 , x1 , x2 , . . . tends to x, since given ε > 0, we
have d(x3k , x) < ε for k > k0 , say, d(x3k+1 , x) < ε for k > k1 , say, and d(x3k+2 , x) < ε
for k > k2 , say. So d(φ^N (x0 ), x) < ε for N > max{3k0 , 3k1 + 1, 3k2 + 2}.


9.11 The final word on differential equations (calculation non-examinable)

Given the differential equation
\[
\frac{dy}{dx} = F(x, y), \quad y(a) = c, \quad (a \le x \le b), \tag{5}
\]
we define
\[
\phi(f)(x) = c + \int_a^x F(t, f(t))\,dt,
\]
as usual. Suppose that F satisfies the Lipschitz condition in y with constant m. We
shall see that some iterate of φ is a contraction mapping.

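As before we calculate
\[
\begin{aligned}
|\phi(f)(x) - \phi(g)(x)| &\le \int_a^x |F(t, f(t)) - F(t, g(t))|\,dt \\
&\le m\int_a^x |f(t) - g(t)|\,dt, \quad\text{by the Lipschitz condition} \qquad (6)\\
&\le m\int_a^x d_\infty(f, g)\,dt \\
&= m\,d_\infty(f, g)\int_a^x dt = m\,d_\infty(f, g)(x - a). \qquad (7)
\end{aligned}
\]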


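\[
\begin{aligned}
|\phi^2(f)(x) - \phi^2(g)(x)| &= |\phi(\phi(f))(x) - \phi(\phi(g))(x)| \\
&\le m\int_a^x |\phi(f)(t) - \phi(g)(t)|\,dt, \quad\text{by (6) replacing } f, g \text{ by } \phi(f), \phi(g) \\
&\le m\int_a^x m\,d_\infty(f, g)(t - a)\,dt, \quad\text{by (7)} \\
&= m^2 d_\infty(f, g)\int_a^x (t - a)\,dt = m^2 d_\infty(f, g)\,\frac{(x - a)^2}{2}. \qquad (8)
\end{aligned}
\]
Once more:
\[
\begin{aligned}
|\phi^3(f)(x) - \phi^3(g)(x)| &\le m\int_a^x |\phi^2(f)(t) - \phi^2(g)(t)|\,dt, \quad\text{by (6) replacing } f, g \text{ by } \phi^2(f), \phi^2(g) \\
&\le m\int_a^x m^2 d_\infty(f, g)\,\frac{(t - a)^2}{2}\,dt, \quad\text{by (8)} \\
&= m^3 d_\infty(f, g)\,\frac{(x - a)^3}{3!}.
\end{aligned}
\]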
In general we obtain
\[
|\phi^n(f)(x) - \phi^n(g)(x)| \le \frac{m^n (x - a)^n}{n!}\,d_\infty(f, g),
\]
so that
\[
d_\infty(\phi^n(f), \phi^n(g)) \le \frac{m^n (b - a)^n}{n!}\,d_\infty(f, g).
\]
Now $\sum x^n/n!$ converges for any x, so the terms tend to zero, and so $m^n(b-a)^n/n!$
tends to zero for all choices of m, a and b. If we choose N so that $m^N(b-a)^N/N! < 1$,
then φ^N is a contraction mapping. We thus have:

9.12 Theorem
If F (x, y) satisfies the Lipschitz condition in y for some constant m, where a ≤ x ≤ b
and y ∈ R, then the differential equation (5) has a unique solution which can be ap-
proximated by iteration.
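
To get a feel for how large the iterate N in 9.11 needs to be, one can search for the first N with mᴺ(b − a)ᴺ/N! < 1. The following is a small sketch (the function name `contraction_power` is illustrative); it uses exact rational arithmetic so that the comparison with 1 is not affected by rounding.

```python
from fractions import Fraction

def contraction_power(m, length):
    """Smallest N >= 1 with (m * length)**N / N! < 1, where length = b - a."""
    ratio = Fraction(m) * Fraction(length)
    n, term = 1, ratio
    while term >= 1:
        n += 1
        term *= ratio / n          # term is now ratio**n / n!
    return n
```

For instance, with m = 1 on an interval of length 1 (as in the cos(x²y) example) this gives N = 2, while m(b − a) = 3 needs N = 7.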

10 Connectedness
10.1 Definition
A metric space X is disconnected if there exist open, disjoint, nonempty sets U and V such that
X = U ∪ V . Note that U and V will also be closed, as their complements are
open. Otherwise X is connected. A subset is connected/disconnected if it is
connected/disconnected when we restrict the metric to the subset to get a (smaller) metric
space.

[DIAGRAM in R2 ]

Examples: (i) X with the discrete metric is disconnected if #X > 1.

(ii) In R, consider A = [0, 1] ∪ [2, 3]. This splits into A ∩ (−∞, 3/2) and A ∩ (3/2, ∞).
(iii) Q is disconnected; splits into Q ∩ (−∞, √2) and Q ∩ (√2, ∞).
(iv) R is connected; see later.

10.2 Definition
An interval in R is a set S such that if s, t ∈ S then [s, t] ⊂ S.

Examples are (a, b), [a, b], (a, b], [a, b), (−∞, b), (−∞, b], (a, ∞), [a, ∞), with a, b finite;
also ∅ and R. These are all the examples possible.

We want to show that the connected subsets of R (usual metric) are precisely the
intervals.
10.3 Lemma
Let x, y ∈ R with x < y, let U, V be disjoint open sets in R with x ∈ U and y ∈ V .
Then there is a z ∈ (x, y) with z ∉ U ∪ V .

Proof: Let T = {t < y : t ∈ U }. Now x ∈ T and so T ≠ ∅, and it is bounded above
by y, so it has a least upper bound, z = sup T .

We can’t have z ∈ U or else a neighbourhood (z − δ, z + δ) is contained in U , which
means that z isn’t the least upper bound of T .

We can’t have z ∈ V , or else a neighbourhood (z − δ, z + δ) is contained in V (and so
doesn’t meet U ), and again z isn’t the least upper bound of T .

Thus x < z < y and z ∉ U ∪ V .

N.B. The same result holds assuming only that [x, y] ∩ U and [x, y] ∩ V are disjoint,
which is the form we shall require.

10.4 Theorem
A subset S of R is connected if and only if it is an interval.

Proof: (i) If S is not an interval, then there are x, y ∈ S with [x, y] ⊄ S, so there is
a z ∈ (x, y), z ∉ S.

Now take U = S ∩ (−∞, z) and V = S ∩ (z, ∞); we see that this disconnects S.

(ii) Suppose that S is an interval and U, V are open in R with (U ∩ S) ∩ (V ∩ S) = ∅,
but U ∩ S and V ∩ S nonempty, and S ⊂ U ∪ V .
Take x ∈ U ∩ S and y ∈ V ∩ S. By Lemma 10.3, there is a z ∈ (x, y) with z ∉ U ∪ V .
But z ∈ S ⊂ U ∪ V , a contradiction. Hence the result.

The intersection of two connected sets needn’t be connected [picture in R²], and nor
need the union of two connected sets be (e.g. (0, 1) ∪ (2, 3)). (In R, however, the
intersection of two connected sets is connected, since these are just intervals.)

Unions are OK if there is a point in common, as we see next.


10.5 Remark
Let Y ⊂ X, where X is a metric space. Then a subset S ⊂ Y is open (regarded as a
subset of Y ) if and only if S = U ∩ Y , where U is an open subset of X. This follows
easily on noting that every open subset in Y is a union of balls BY (s, ε) for s ∈ S, and
BY (s, ε) = BX (s, ε) ∩ Y .
So, for example, we can say that [0, 1] and [2, 3] are open subsets of the metric space
Y = [0, 1] ∪ [2, 3], although not open when regarded as subsets of X = R, since, for
example, [0, 1] = (−∞, 3/2) ∩ Y.

10.6 Theorem

Let (X, d) be a metric space and x ∈ X. Let {Sλ : λ ∈ Λ} be a family of connected
sets, each containing x. Then ⋃_{λ∈Λ} Sλ is connected.

Proof: Let U, V be open, with ⋃_{λ∈Λ} Sλ ⊂ U ∪ V, and U ∩ V ∩ ⋃_{λ∈Λ} Sλ = ∅. WLOG,
x ∈ U and x ∉ V.

For each λ,
Sλ = (U ∩ Sλ) ∪ (V ∩ Sλ),
where U ∩ Sλ is nonempty, as it contains x;
so since Sλ is connected, V ∩ Sλ = ∅ for each λ, and so V ∩ ⋃_{λ∈Λ} Sλ = ∅.
Hence ⋃_{λ∈Λ} Sλ is connected.

So every point x ∈ X is contained in a maximal connected subset, the connected
component containing x, namely
Cx = ⋃{S ⊂ X : x ∈ S, S connected}.

Of course {x} itself is one such connected set S, so this is not an empty union.

Now for x, y ∈ X, either Cx = Cy or else Cx ∩ Cy = ∅. For otherwise Cx and Cy are
distinct but share a point, so Cx ∪ Cy is connected by Theorem 10.6. It contains both
x and y and is strictly larger than at least one of Cx and Cy, which contradicts the
definition of that component.
Hence, we can write X as a disjoint union of connected components, and these are the
maximal connected subsets.

If we apply this to an open subset of R, we end up by seeing that it is necessarily a
countable union of disjoint open intervals (countably many, since each one contains a
different rational point and there are only countably many to go round).
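As a small illustration of the decomposition into components, here is a minimal Python sketch. It assumes the open set is handed to us as a finite list of open intervals (a, b) with a < b, and the function name components is hypothetical. Overlapping intervals are merged, so the output lists the connected components from left to right.

```python
def components(intervals):
    """Return the connected components of a finite union of open intervals.

    Each pair (a, b) with a < b stands for the open interval (a, b).
    """
    merged = []
    for a, b in sorted(intervals):
        # Open intervals overlap only if the new left end lies strictly
        # inside the current component: (0, 1) and (1, 2) do not share
        # the point 1, so they stay separate components.
        if merged and a < merged[-1][1]:
            merged[-1] = (merged[-1][0], max(merged[-1][1], b))
        else:
            merged.append((a, b))
    return merged


print(components([(0, 1), (0.5, 2), (2, 3), (5, 6)]))
```

Note that (0, 2) and (2, 3) are kept apart, since the point 2 belongs to neither interval.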

10.7 Theorem
Let f : X → Y be a continuous mapping between metric spaces, and suppose that X
is connected. Then the image f (X) := {f (x) : x ∈ X} is also connected.

Proof: If f(X) = U ∪ V with U, V open (in f(X)) and disjoint, then X = f⁻¹(U) ∪
f⁻¹(V), with f⁻¹(U) and f⁻¹(V) disjoint, and open (since f is continuous). Since X
is connected this can only happen if one of f⁻¹(U) and f⁻¹(V) is empty, which means
that one of U and V is empty (as both are subsets of f(X)). So f(X) is connected.

10.8 Corollary (Intermediate Value Theorem)

Let f : [a, b] → R be continuous. Then for each y between f(a) and f(b) there is an x
between a and b with f(x) = y.

Proof: By Theorem 10.7 we have that f([a, b]) is connected, and hence, by
Theorem 10.4, it is an interval. This interval contains f(a) and f(b), so it contains
every y between them, which is the result.
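The proof above is an existence argument only. To actually locate such an x numerically, one standard technique (not the method of these notes) is bisection. The sketch below is a minimal Python version; the function name bisect_root and the tolerance parameter tol are assumptions made for the illustration.

```python
def bisect_root(f, a, b, y, tol=1e-12):
    """Approximate some x in [a, b] with f(x) = y.

    Assumes f is continuous on [a, b] and y lies between f(a) and f(b).
    """
    lo, hi = a, b
    if (f(lo) - y) * (f(hi) - y) > 0:
        raise ValueError("y is not between f(a) and f(b)")
    while hi - lo > tol:
        mid = (lo + hi) / 2
        # Keep the half on which f - y still changes sign (or vanishes),
        # so the Intermediate Value Theorem applies to it again.
        if (f(lo) - y) * (f(mid) - y) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


x = bisect_root(lambda t: t**3 + t, 0.0, 2.0, 5.0)
print(x, x**3 + x)
```

Each step halves an interval on whose end-points f − y has opposite signs (or is zero), so the theorem guarantees that a solution remains inside it.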

10.9 Definition
Let (X, d) be a metric space. Then X is path-connected if, for all x, y ∈ X, there is a
continuous f : [0, 1] → X with f (0) = x, f (1) = y (i.e., a path joining them).

(Of course we can also talk about path-connected subsets of a metric space, as they
are metric spaces too.)

10.10 Proposition
Let X be a path-connected metric space. Then X is connected.

Proof: Suppose that X = U ∪ V , with U and V open, disjoint and nonempty. Take
x ∈ U and y ∈ V . Then there is a path f : [0, 1] → X joining x to y.
Hence [0, 1] = f −1 (U )∪f −1 (V ), as open disjoint sets in the metric space [0, 1]. But [0, 1]
is connected so one of f −1 (U ) and f −1 (V ) is empty. But 0 ∈ f −1 (U ) and 1 ∈ f −1 (V ),
so we have a contradiction. So X is connected.

The converse is false. Take X = G ∪ I, where G = {(x, sin 1/x) : x > 0} and
I = {(0, y) : −1 ≤ y ≤ 1}. Then G ∪ I is connected, but not path-connected. See the


10.11 Remark
For open subsets of Rn it is true that connected and path-connected are the same.
Suppose that S is open and connected, and take x ∈ S. Then

U = {y ∈ S : we can join x to y by a path}

is open [DIAGRAM]. So is

V = {y ∈ S : we can’t join x to y by a path}.

Since S = U ∪ V (open, disjoint) and U is nonempty, since it contains x, we see that
by connectedness V = ∅ and U = S.

10.12 Theorem
(i) Let n ≥ 2. Then Rn (usual metric) isn’t homeomorphic to R.
(ii) Moreover, no two out of (0, 1), [0, 1) and [0, 1] are homeomorphic.

Proof: (i) Suppose that f : R → Rn were a homeomorphism. Let U = R \ {0}, and
V = Rn \ {f(0)}.
Then g = f|U is a homeomorphism from U onto V, and hence V is disconnected, as it
splits into f((−∞, 0)) ∪ f((0, ∞)).
But V is path-connected (as n ≥ 2), hence connected. Contradiction.

(ii) Similarly, if we delete any point from (0, 1) it becomes disconnected; this is not
true for the others if we delete 0. So (0, 1) is not homeomorphic to the others. If we
delete any 2 points from [0, 1) it becomes disconnected; this is not true for [0, 1] if
we delete the end-points. So the other two aren’t homeomorphic, either.

Similarly we can see that [0, 1] is not homeomorphic to the square [0, 1] × [0, 1], since
removing any three points will disconnect [0, 1]. This is in spite of the fact that there exist
“space-filling curves”, i.e., continuous (non-bijective) maps from [0, 1] onto [0, 1] × [0, 1].
There also exist discontinuous bijections between the two sets.

11 Compactness
Recall that any real continuous function on a closed bounded interval [a, b] is bounded
and attains its bounds. We look at this in a more general context.

11.1 Definition
Let K ⊆ X, where (X, d) is a metric space. An open cover of K is a family of open
sets (Uλ)λ∈Λ such that K ⊂ ⋃_{λ∈Λ} Uλ. We say that K is compact if whenever (Uλ)λ∈Λ is
an open cover of K, there is a finite subcover Uλ1, . . . , UλN such that K ⊂ Uλ1 ∪ . . . ∪ UλN.

“Every open cover has a finite subcover.”

It doesn’t matter whether we cover K with open sets in K or open sets in X, since
open sets in K are just the intersection with K of open sets in X.

11.2 Examples
Clearly R is not compact, as Un = (−n, n) for n = 1, 2, . . . form an open cover, but we
cannot cover the whole of R by taking only finitely many of these sets.
Similarly, nor is (0, 1), since (0, 1) = ⋃_{n=2}^∞ (1/n, 1), but it is not a finite union of
any of these sets.
It will be shown later that [0, 1] is compact. More generally, it turns out that the com-
pact subsets of Rn with the Euclidean metric are just the closed bounded ones. Thus,
compact subsets of R include finite sets, and finite unions of closed intervals such as
[0, 1] ∪ [2, 3]. But NOT (0, 1), R itself, or Q.
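To make the failure of compactness for (0, 1) concrete, consider the open sets (1/n, 1) for n = 2, 3, . . ., whose union is (0, 1). The following Python sketch (the function name uncovered_point is hypothetical) takes any finite choice of indices n and returns a point of (0, 1) that none of the chosen sets contains.

```python
from fractions import Fraction


def uncovered_point(indices):
    """Given finitely many indices n >= 2, return a point of (0, 1)
    missed by every chosen set (1/n, 1)."""
    m = max(indices)
    # The union of the chosen sets is (1/m, 1), so 1/(2m) is left out.
    return Fraction(1, 2 * m)


chosen = [2, 5, 17]
p = uncovered_point(chosen)
print(p, all(not (Fraction(1, n) < p < 1) for n in chosen))
```

Whatever finite family is chosen, a point is always missed, so no finite subcover exists.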

11.3 Theorem
Let f : (K, d) → R be continuous with K ⊂ X compact. Then f is bounded on K
and it attains its bounds (so that ∃x ∈ K with f (x) = sup{f (k) : k ∈ K} < ∞ and
similarly for inf).

Proof: Let Un = {x ∈ K : |f(x)| < n} for n = 1, 2, 3, . . ., which is f⁻¹((−n, n))
and hence open; we have K ⊂ ⋃_{n=1}^∞ Un. By compactness K ⊂ Un1 ∪ . . . ∪ UnN for some
Un1, . . . , UnN, and now |f(x)| ≤ max{n1, . . . , nN} for x ∈ K.

Also, if s = supx∈K f (x), we have either that f (x) = s for some x, or else that
1/(s − f (x)) is a continuous function on K and hence bounded by M > 0, say. This
means that s − f (x) ≥ 1/M for all x ∈ K; i.e., f (x) ≤ s − 1/M , contradicting the
definition of s as the sup.

11.4 Theorem
Let (X, d) be a metric space; then every compact subset K ⊂ X is closed and bounded.

Proof: Let x be a point of X \ K. For each k ∈ K consider the balls Bk = B(k, rk/2)
and Ck = B(x, rk/2) where rk = d(x, k) > 0. For each k these two balls are disjoint,
and the Bk form an open cover of K. By compactness we can find k1, . . . , kN such that
K ⊂ Bk1 ∪ . . . ∪ BkN. But now Ck1 ∩ . . . ∩ CkN is an open ball containing x which is
disjoint from Bk1 ∪ . . . ∪ BkN and hence from K. So X \ K is open, that is, K is closed.
[DIAGRAM]

Also, let x be any point of X and note that K ⊂ ⋃_{n=1}^∞ B(x, n). By compactness
∃n1, . . . , nN such that K ⊂ B(x, n1) ∪ . . . ∪ B(x, nN), and thus d(k, x) < max{n1, . . . , nN}
for all k ∈ K.

An alternative way to see why a compact set K is necessarily closed and bounded is to
take any x ∈ X \K and consider the continuous function on K given by f (k) = d(k, x).
By Theorem 11.3, f attains its lower bound δ ≥ 0. But δ cannot be 0 as then we would
have k ∈ K with k = x. Thus δ > 0 and B(x, δ) ∩ K = ∅ so K is closed.
Also, since f is bounded we see that K is bounded.

11.5 Example
The infinite set S = {1, 1/2, 1/3, . . .} ∪ {0} is compact. For any open cover (Uλ) of S,
there will be a set, say Uλ0, containing 0. Since Uλ0 is open, there is an N such that
Uλ0 will also contain 1/n for all n ≥ N. But then we only need finitely-many more Uλ
to cover the whole set.
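The counting in this example can be sketched in Python. The assumption here is that the open set containing 0 includes an interval (−r, r) for some r > 0; the function name points_outside is hypothetical. It lists the points 1/n of S that are not caught by that interval, and there are always only finitely many.

```python
from fractions import Fraction
import math


def points_outside(r):
    """List the points 1/n of S that lie outside the open interval (-r, r)."""
    # 1/n >= r exactly when n <= 1/r, so only finitely many n qualify.
    n_max = math.floor(1 / Fraction(r))
    return [Fraction(1, n) for n in range(1, n_max + 1)]


left_over = points_outside(Fraction(1, 10))
print(len(left_over), left_over[-1])
```

Each of these finitely many left-over points needs at most one further set from the cover.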

New compact sets from old ones.

11.6 Theorem
(i) Let X be a compact metric space and F a closed subset of X. Then F is compact.
(ii) Let X be a compact metric space and Y an arbitrary metric space. Suppose that
f : X → Y is continuous. Then f (X) is compact.

Proof: (i) If we have an open cover of F, say, F ⊂ ⋃_{λ∈Λ} Uλ, then by adding the set
X \ F , which is open, we have an open cover of X. Since X is compact we only need
finitely many sets, say X ⊂ (X \ F ) ∪ Uλ1 ∪ . . . ∪ UλN , and now F ⊂ Uλ1 ∪ . . . ∪ UλN ,
so F is compact.

(ii) Given an open cover f(X) ⊂ ⋃_{λ∈Λ} Uλ, we see that X ⊂ ⋃_{λ∈Λ} f⁻¹(Uλ) since for
each point x ∈ X there is a λ with f(x) ∈ Uλ, meaning that x ∈ f⁻¹(Uλ). Since f is
continuous, this is an open cover of X.
But now we have a finite subcover of X, say X ⊂ f⁻¹(Uλ1) ∪ . . . ∪ f⁻¹(UλN), which means
that f(X) ⊂ Uλ1 ∪ . . . ∪ UλN. Hence f(X) is compact.

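This gives us another way to prove Theorem 11.3. For if K is compact then f(K) ⊂ R
is compact, by part (ii), and hence closed and bounded, by Theorem 11.4. Being
bounded, it has a least upper bound, and being closed implies that the least upper
bound is in the set (similarly for the greatest lower bound).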

11.7 Theorem (Heine–Borel)

Any closed bounded real interval [a, b] ⊂ R is compact (in the usual metric).
Proof: Given an open cover [a, b] ⊂ ⋃_{λ∈Λ} Uλ, let

S = {x ∈ [a, b] : [a, x] is covered by some finite subcollection of the Uλ}.

Now a ∈ S as there is a Uλ containing a, so S is a nonempty bounded set; indeed it’s
an interval [a, y) or [a, y], where y = sup S, since if x1 < x2 and x2 ∈ S then also x1 ∈ S.
If we can show that y = b and y ∈ S, then we have the result. But if y < b, then y lies
in some Uλ0 so that (y − δ, y + δ) ⊂ Uλ0 for some δ > 0 with y + δ ≤ b. Now either
y − δ/2 < a or y − δ/2 ∈ S, so [a, y − δ/2] is covered by finitely many Uλ (none are needed
in the first case), and then adding Uλ0 to the collection covers [a, y + δ/2]
by finitely many, contradicting the definition of y.
The same argument also shows that y ∈ S.
Putting this together, we see that we can cover [a, b] by finitely many sets.

Now here is a concept that, for metric spaces, is equivalent to compactness and a little
easier to understand.

11.8 Definition
A subset K of a metric space is sequentially compact if every sequence in K has a
convergent subsequence with limit in K.

The classical Bolzano–Weierstrass theorem in R says that every bounded sequence has
a convergent subsequence, and this implies that all closed bounded subsets F ⊂ R are
sequentially compact (“closed” guarantees that the limit is in F ).

11.9 Example
The closed unit ball B in ℓ² is not sequentially compact (although closed and bounded).
Recall that
B = {(xn) : Σ_{n=1}^∞ |xn|² ≤ 1}.

For let e1 = (1, 0, 0, . . .), e2 = (0, 1, 0, 0, . . .), e3 = (0, 0, 1, 0, . . .), etc. Then (en) is a
sequence in B with no convergent subsequence since d(en, em) = ‖en − em‖ = √2 for
all n ≠ m.
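The distance computation can be checked numerically on truncated vectors. The following Python sketch assumes we keep only the first dim coordinates of each unit vector (harmless here, since the remaining coordinates are all zero); the helper names e and dist are hypothetical.

```python
import math


def e(n, dim):
    """The n-th standard unit vector, truncated to dim coordinates (n counts from 1)."""
    return [1.0 if k == n else 0.0 for k in range(1, dim + 1)]


def dist(x, y):
    """Euclidean distance between two finite vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


dim = 6
distances = {
    dist(e(n, dim), e(m, dim))
    for n in range(1, dim + 1)
    for m in range(1, dim + 1)
    if n != m
}
print(distances)
```

Every pair of distinct unit vectors is the same distance apart, so no subsequence of them can be Cauchy.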

We need one more definition before we state the final big theorem of the course.

11.10 Definition
A subset K of a metric space is precompact or totally bounded if for each ε > 0 it can
be covered with finitely many balls B(xk , ε).
[Think of employing finitely-many short-sighted guards to watch over your set.]

Easily, every compact set is precompact, since we can cover K with open balls B(x, ε),
where x varies over the whole of K. By compactness we only need finitely many. [DIAGRAM]

But the closed unit ball of ℓ² isn’t precompact, since if it were covered by balls of radius
1/2 then each en would have to be in a different one – for if d(x, en) < 1/2 and
d(x, em) < 1/2 we get d(en, em) < 1, which is a contradiction for n ≠ m. So it isn’t
compact either.

11.11 Example - the Hilbert cube

Consider the subset C ⊂ `2 defined by
C = {(xn)_{n=1}^∞ : |xn| ≤ 1/n for each n}.

We claim that C is precompact. For given ε > 0 choose N such that Σ_{n=N+1}^∞ 1/n² < ε²/4.
Now the set
D = {x = (x1, . . . , xN) ∈ R^N : |xn| ≤ 1/n for each n}
is easily seen to be precompact: we can cover it with balls of radius ε/2 simply by
taking enough centres (y1, . . . , yN) such that |xj − yj| < ε/(2N) for each xj ∈ [−1/j, 1/j].


Now think of vectors in RN as being padded with zeroes, so that they lie in `2 .

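That is, D ⊂ ⋃_{k=1}^K B(zk, ε/2) for suitable centres z1, . . . , zK. But now we have that
C ⊂ ⋃_{k=1}^K B(zk, ε), since for every point c ∈ C its truncation c′ to N coordinates
lies in D; thus d(c′, zk) < ε/2 for some k. Moreover d(c, c′)² = Σ_{n=N+1}^∞ |cn|² ≤
Σ_{n=N+1}^∞ 1/n² < ε²/4, so d(c, c′) < ε/2, and
hence d(c, zk) ≤ d(c, c′) + d(c′, zk) < ε by the triangle inequality.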
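The choice of N in this example can be made explicit. The sketch below is a minimal Python illustration, assuming the elementary bound Σ_{n>N} 1/n² < 1/N (compare each term with 1/(n(n−1)), which telescopes); the function names truncation_index and tail are hypothetical, and the N returned is sufficient rather than smallest possible.

```python
import math


def truncation_index(eps):
    """Return an N with sum_{n > N} 1/n^2 < eps^2 / 4."""
    # The tail is less than 1/N, so it is enough to have 1/N <= eps^2 / 4.
    return math.ceil(4 / eps**2)


def tail(N):
    """Tail sum_{n > N} 1/n^2, computed as pi^2/6 minus the partial sum."""
    return math.pi**2 / 6 - sum(1 / n**2 for n in range(1, N + 1))


eps = 0.5
N = truncation_index(eps)
print(N, tail(N), eps**2 / 4)
```

With this N, every point of C lies within ε/2 of its truncation to the first N coordinates.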

In fact the Hilbert cube is also compact, and this is a consequence of the big result
that follows.

11.12 Theorem
The following are equivalent in a metric space (X, d):
(1) X is compact.
(2) If (En) are nonempty closed sets in X with E1 ⊇ E2 ⊇ . . ., then ⋂_{n=1}^∞ En ≠ ∅.
(3) X is sequentially compact.
(4) X is complete and precompact.
Proof: (1) ⇒ (2). Suppose that ⋂_{n=1}^∞ En = ∅; then ⋃_{n=1}^∞ (X \ En) = X (de Morgan’s
law); this is an open cover of X so there is a finite subcover, by compactness. So
X = (X \ E1) ∪ . . . ∪ (X \ EN) = X \ EN, as the En are decreasing so their complements
are increasing. This means that EN = ∅, a contradiction.

(2) ⇒ (3). Let (xn) be any sequence in X and let En be the closure of {xn, xn+1, . . .};
these are decreasing nonempty closed sets. Thus there is a point y in ⋂_{n=1}^∞ En.
Take B(y, 1): this meets {x1, x2, . . .} since y is in its closure. Pick xn1 ∈ B(y, 1).
Now B(y, 1/2) meets {xn1+1, xn1+2, . . .} since y is in its closure. Pick xn2 ∈ B(y, 1/2),
and note that n2 > n1.
Continuing this way we find xnk in B(y, 1/k), so the subsequence (xnk) converges to y.

(3) ⇒ (4). If X is sequentially compact it is certainly complete, for if (xn) is a Cauchy
sequence, let (xnk) be a convergent subsequence, converging to y. Now the original
Cauchy sequence also converges to y (see Part D of Theorem 8.5).

Also X will be precompact, since if not then we can find ε > 0 with no finite covering
by balls of radius ε; choose x1 ∈ X, and then inductively we obtain (xn) such that
xn ∉ ⋃_{k=1}^{n−1} B(xk, ε). Now it’s clear that d(xk, xn) ≥ ε > 0 for all k < n, which means
that the (xn) have no convergent subsequence.
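The inductive choice in this argument is a greedy selection, and on a finite set of points it can be run directly. The Python sketch below is an illustration under that assumption (a finite sample of points in the plane with the Euclidean metric); the function name greedy_net is hypothetical. A point becomes a new centre only if it lies outside all balls of radius ε around the centres chosen so far.

```python
import math


def greedy_net(points, eps):
    """Pick centres so that every point lies within eps of some centre."""
    centres = []
    for p in points:
        # Mirror the inductive step: p is chosen only if it is not in the
        # union of the balls B(c, eps) around the earlier centres.
        if all(math.dist(p, c) >= eps for c in centres):
            centres.append(p)
    return centres


grid = [(i / 10, j / 10) for i in range(11) for j in range(11)]
eps = 0.35
centres = greedy_net(grid, eps)
print(len(centres))
```

On a precompact set the selection must stop after finitely many centres, which then give a finite cover by balls of radius ε; in the proof, failure of precompactness is exactly what lets the selection continue for ever.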

We’ll postpone (4) ⇒ (1) until the next lecture, as it is long.

11.13 Corollary

A subset K ⊂ RN is compact if and only if it is closed and bounded.

Proof: We saw in Theorem 11.4 that all compact sets (in any metric space) are
closed and bounded.

For the converse, we can show that all closed bounded sets K in RN are sequentially
compact. If (x^n), with x^n = (x^n_1, . . . , x^n_N), is a sequence in K, then by passing to a
subsequence we can ensure that the sequence (x^n_1) of first coordinates converges (since
every bounded sequence in R has a convergent subsequence); then to a further
subsequence to ensure that the sequence (x^n_2) of 2nd coordinates converges, and so on.

After finitely many steps we have a subsequence (y^n), with y^n = (y^n_1, . . . , y^n_N), such
that y^n_k → zk, say, as n → ∞ for each 1 ≤ k ≤ N. This implies that y^n → z = (z1, . . . , zN)
(cf. Proposition 2.4). Also z ∈ K since K is closed.

Thus K is sequentially compact, and hence compact by Theorem 11.12.

(4) ⇒ (1) in Theorem 11.12. This is the hardest bit and the proof is definitely not
examinable. We suppose that X is complete and precompact, and show it is compact.
So take an open cover (Uλ)λ∈Λ of X.
First we reduce it to a countable cover. For each n we can cover X by finitely many
balls B(an,1 , 1/n) ∪ . . . ∪ B(an,rn , 1/n), using precompactness. Let A denote the set of
centres (this is countable and dense because for each n the set A comes within 1/n of
every point of X) and consider all the balls B(a, 1/k) for a ∈ A and k = 1, 2, . . .. We
claim that every open set U is a union of some of the B(a, 1/k). For if U is open and
x ∈ U , there is a ball B(x, 1/j) ⊂ U and a point a ∈ A contained in B(x, 1/2j). But
now x ∈ B(a, 1/2j) ⊂ B(x, 1/j) ⊂ U [DIAGRAM].

Thus we can cover X with a countable subcollection of the Uλ , since for each x ∈ X
there is a ball B(a, 1/k) with x ∈ B(a, 1/k) ⊂ some Uλ . There are only countably
many balls to choose from so select one Uλ for each ball we used.

The upshot, after relabelling, is that we may assume that X = U1 ∪ U2 ∪ . . .. If there
is a p such that X = U1 ∪ . . . ∪ Up, we are finished. If not, then for each i we may find
xi ∉ U1 ∪ . . . ∪ Ui. We select a Cauchy subsequence as follows (a “diagonal” argument).

Cover X by finitely-many balls of radius 1. Then for at least one of these balls there
is an infinite subsequence, say x11 , x12 , . . ., all in the same ball B(y1 , 1).
Now cover X by finitely-many balls of radius 1/2. Then for at least one of these balls
there is an infinite subsequence of (x1k ), say x21 , x22 , . . . all in the same ball B(y2 , 1/2).

Repeat. We obtain nested subsequences (xnk)k all in the same ball B(yn, 1/n). But now
the diagonal subsequence (xnn) is Cauchy, since for m ≤ n, d(xmm, ym) < 1/m
and also d(xnn, ym) < 1/m as the (xnk) are a subsequence of the (xmk). Hence
d(xmm, xnn) < 2/m for n > m, so (xnn) is a Cauchy sequence.

By completeness, xnn → z, say. Now z ∈ Uj for some j so xnn ∈ Uj for n ≥ some n0.
But this contradicts the construction of our (xi) as for i ≥ j we have xi ∉ U1 ∪ . . . ∪ Uj.

11.14 Final Example – the Cantor set

We define the Cantor set C ⊂ [0, 1] as follows.

Let C0 = [0, 1], C1 = [0, 1/3] ∪ [2/3, 1], C2 = [0, 1/9] ∪ [2/9, 1/3] ∪ [2/3, 7/9] ∪ [8/9, 1],
and so on, at each stage deleting the middle open third of every interval that remains.

Then C = ⋂_{n=0}^∞ Cn. This is the intersection of closed subsets of R, and is hence a
closed (even compact) set. Remarkably, it is uncountable: indeed it consists of all
numbers of the form x = Σ_{j=1}^∞ aj/3^j, where aj = 0 or 2 for each j (and not 1). Note that
we regard 1/3 = 0.02222 . . . (in base 3).

One can use a Cantor diagonal argument (as one does to prove that R is uncountable)
to show that C is uncountable.

Note that in fact there is a surjection f : C → [0, 1] defined by

f : Σ_{j=1}^∞ aj/3^j ↦ Σ_{j=1}^∞ (aj/2)/2^j.
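The map f halves each ternary digit (0 or 2) and reads the result as a binary expansion. Here is a minimal Python sketch restricted to finite digit lists, which is an assumption made so that exact rational arithmetic can be used; the function names cantor_point and f_image are hypothetical.

```python
from fractions import Fraction


def cantor_point(digits):
    """The point sum_j a_j / 3^j of C, for a finite list of digits a_j in {0, 2}."""
    assert all(a in (0, 2) for a in digits)
    return sum(Fraction(a, 3**j) for j, a in enumerate(digits, start=1))


def f_image(digits):
    """The image under f: halve each digit and read the result in base 2."""
    return sum(Fraction(a // 2, 2**j) for j, a in enumerate(digits, start=1))


for digits in ([2], [0, 2], [2, 2], [2, 0, 2]):
    print(cantor_point(digits), "->", f_image(digits))
```

The map is not injective: 1/3 = 0.0222 . . . (base 3) is sent to 0.0111 . . . (base 2) = 1/2, which is also the image of 2/3 = 0.2 (base 3).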

Paradoxically, the complement of the Cantor set is an open set and so just a countable
union of intervals. If one calculates the total length of the intervals removed from [0, 1]
it is Σ_{j=1}^∞ 2^{j−1}/3^j, since at the jth stage we removed 2^{j−1} intervals of length 3^{−j}.
This sums to 1, but there are still many points left!
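The construction and the length count can be checked with exact arithmetic. The following Python sketch builds the closed intervals making up the stage Cn and reports how much length remains and how much has been removed; the function name cantor_stage is hypothetical.

```python
from fractions import Fraction


def cantor_stage(n):
    """Return the closed intervals making up C_n, as (left, right) pairs."""
    intervals = [(Fraction(0), Fraction(1))]
    for _ in range(n):
        next_intervals = []
        for a, b in intervals:
            third = (b - a) / 3
            # Delete the open middle third, keeping the two outer closed thirds.
            next_intervals += [(a, a + third), (b - third, b)]
        intervals = next_intervals
    return intervals


for n in range(4):
    stage = cantor_stage(n)
    remaining = sum(b - a for a, b in stage)
    print(n, len(stage), remaining, 1 - remaining)
```

After n stages there are 2^n intervals of total length (2/3)^n, so the removed length 1 − (2/3)^n tends to 1.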

The set C is “totally disconnected” – it clearly doesn’t contain any intervals, so all its
subsets consisting of more than one point are disconnected. That is, every component
of C consists of a single point.

In a technical sense (outside the scope of this course) C is a fractal set – its “dimension”
is log 2/ log 3 or about 0.63.
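One informal way to see where this number comes from, offered here as an assumption rather than as the definition intended in the notes: C is made of 2 copies of itself, each scaled by 1/3, and the exponent s with 2 · (1/3)^s = 1 is log 2/ log 3. A quick Python check:

```python
import math

# Assumption: the "dimension" s is taken to solve 2 * (1/3)**s == 1,
# reflecting that C consists of 2 copies of itself scaled by 1/3.
s = math.log(2) / math.log(3)
print(round(s, 4), 2 * (1 / 3) ** s)
```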


The exercises on the sheet Extra Examples will be done in lectures, but the solutions
will not be put online (if necessary, you can watch the videos).


