
MATH 441: Advanced Calculus

Lecture notes.

M.McIntyre

July 24, 2017


Contents

Details of the course

1 The Fréchet Derivative for maps f : R → R
1.1 Linear and Affine functions from R to R
1.2 The Fréchet derivative for maps f : R → R

2 Linear and Affine Maps
2.1 The Matrix of a Linear Map
2.2 Affine Maps between arbitrary Vector Spaces

3 Continuity and Limits
3.1 Continuity and limits
3.2 Continuity for Linear Maps

4 Spaces of Linear Maps
4.1 The vector space L(V, W)

5 Tangency of Maps
5.1 Tangency and Affine Maps

6 Concept of Derivative

7 Differentiation of Composites

8 Differentiability of real-valued functions

9 Partial derivatives and Jacobian matrices
9.1 Fréchet differentiability and partial derivatives
9.2 General Partial Derivatives
9.3 Jacobian Matrix

10 Inverse Maps
10.1 Some techniques
10.2 Existence of local inverses

11 Implicit functions
11.1 An Application of the Implicit Function Theorem

12 Proof of the Theorems

13 Appendix A - Vector Spaces

14 Appendix B - Complete Spaces

15 Appendix C - Higher Order Fréchet Derivatives

Details of the course

These notes were compiled by Dr. McIntyre. Much material has been adapted from lecture notes written by Arthur Jones and Alistair Gray of La Trobe University, Australia.
The text by Lang, S. “Analysis 1” is an appropriate supplement. In this book,
the presentation and the order of introducing ideas varies from that taken by
the lecturer, nevertheless the student is advantaged by reading from this text.
Differential calculus for functions mapping from R to R has been studied in calculus. Such a function is differentiable at a ∈ R if lim_{h→0} (f(a + h) − f(a))/h exists. The derivative of f at a is then defined to be this number and is denoted f'(a).
Unfortunately however this idea does not carry over to higher dimensions.
For example division is not defined for an element of Rm , which is a vector
rather than a number.
The way out of this dilemma was discovered by the French mathematician
Fréchet in 1911. His idea was to think of differentiation as a process of ap-
proximating the function f near a, by a linear map. This linear map is called
the Fréchet derivative of f at a. The main aim of this course is to give an
understanding of this differentiation.
1. At first we briefly consider functions f : R → R and define Fréchet differ-
entiation in this familiar context. The Fréchet derivative at a point a ∈ R
is just a linear map from R to R, i.e. it is a constant (in fact the ordinary
derivative at a) times the identity map.
2. We define linear and affine maps between arbitrary vector spaces, our
scalars always being from the field R and we examine the geometry of
such maps when the vector spaces are Euclidean. We review the matrix of
a linear map from Linear Algebra and prove the result that an affine map
between arbitrary vector spaces has an inverse if and only if the same is
true of its linear part.
3. Having reviewed linear and affine maps, the other prerequisite for under-
standing the Fréchet derivative of maps between normed vector spaces is
the study of limits for such maps. We recall from 353 and 356, some topol-
ogy of normed vector spaces. A normed vector space is a metric space,
the metric being determined by the norm. Thus as basis for a topology,
we have the set of all open balls. The study of limits is of course closely related to the study of continuity. We will digress to study some function spaces at this point in the course. We give examples to recall the notion of a complete normed vector space. We will consider some function spaces which are complete and we'll see some that are not complete.
In particular we study the space L(V, W ) of linear maps between vector
spaces V and W , equipped with the supremum norm.
We will see that for linear maps between normed vector spaces, continuity
assumes a particularly simple form. If a linear map is continuous at the
origin, then it is continuous at every point of its domain. All linear maps
Rm → Rn are continuous. These theorems recur frequently in later proofs.

4. Next we study the concept of tangency for maps between normed vec-
tor spaces. This is our main application of the idea of limits developed
previously.
Then we study the derivative for functions which map one normed vector
space to another. The basic idea is very simple. A function is differentiable
at a point if there is an affine map which approximates the function very
closely near this point. The derivative of the function is then the linear
part of that affine map.

5. In calculus of several variables, differentiation of maps from Rn to R and maps from Rn to Rm was studied using the notion of partial differentia-
tion. Partial differentiation played a conspicuous role in the history of the
development of calculus, but it has since been realized that it should not
be overemphasised at the expense of Fréchet differentiation. Neverthe-
less we will take time to understand the connection between the Jacobian
matrix of partial derivatives and the Fréchet derivative.

6. A generalised notion of partial derivative (where the partial derivative is also a linear map) plays a role in results such as the Implicit Function
Theorem. And so having briefly examined some aspects of the topology
of product spaces, we introduce the notion of Fréchet partial derivatives.
Again we examine the connection between these derivatives and ordinary
partial derivatives.

7. Next we state and apply an important theorem of analysis, the Inverse Map Theorem, which gives sufficient conditions for the local invertibility of
a map which is differentiable. En route we study a version of the Mean
Value theorem which is valid for maps between normed vector spaces. We
recall the completeness of R and Rn and the notion of Banach spaces. We
examine the contraction map theorem. With these tools we can prove the
Inverse Map theorem.

8. Then we study the Implicit Function theorem, which is considered to provide one of the most important tools in modern analysis. It has wide applications, particularly to existence theorems in the study of differential equations. We use the Inverse Map Theorem to prove the Implicit Function Theorem. As an application we could examine the notion of charts for manifolds.
Chapter 1

The Fréchet Derivative for maps f : R → R

The primary aim of this course is to extend the notion of differentiability to maps f : V → W, where V, W are normed vector spaces.
Recall that a vector space V is a set with two operations called addition and scalar multiplication, each operation satisfying four axioms: (A1) commuta-
tivity (A2) associativity (A3) existence of zero (additive identity) (A4) exis-
tence of (additive) inverse; (M1) multiplication by one (the unit of the field)
(M2) distributivity of a scalar (M3) distributivity of a vector and (M4) for
λ, µ ∈ F, a ∈ V we have λ(µa) = (λµ)a.
The scalars belong to a field (check the appendix), most commonly R, C, or Q.
Recall that a norm on a vector space is a map || || : V → R satisfying three
properties: positivity, the triangle inequality and homogeneity (check the ap-
pendix).
For example: for V = R^n, we have the Euclidean norm given for each x = (x_1, x_2, . . . , x_n) ∈ R^n by ||x|| = (∑_{i=1}^n x_i^2)^{1/2}. Notice that the inner (dot) product satisfies x · x = ||x||^2.
This norm also satisfies the Cauchy-Schwarz inequality. In analysis we saw that the Cauchy-Schwarz inequality implies the triangle inequality. Here we prove the Cauchy-Schwarz inequality.
Proposition (1.0): ∀x, y ∈ R^n, |∑_{i=1}^n x_i y_i| ≤ ||x|| ||y||.

Sketch of proof: If either x = 0 or y = 0 then we have zero on each side. In addition, if x = λy for some λ ∈ R then the left hand side equals |λ| ∑_{i=1}^n y_i^2, which equals the right hand side. So suppose ||x − λy|| > 0 for all λ ∈ R. And so

0 < ||x − λy||^2 = ∑_{i=1}^n (x_i − λy_i)^2        (definition of || || on R^n)
                 = ∑_{i=1}^n x_i^2 − 2λ ∑_{i=1}^n x_i y_i + λ^2 ∑_{i=1}^n y_i^2,

a quadratic in λ having no real root. Thus the discriminant is negative, i.e.

4(∑_{i=1}^n x_i y_i)^2 − 4 (∑_{i=1}^n y_i^2)(∑_{i=1}^n x_i^2) < 0,

from which the Cauchy-Schwarz inequality follows.
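(As a quick numerical aside, added here and not part of the original notes, the inequality is easy to test on random data; the short NumPy sketch below compares |∑ x_i y_i| with ||x|| ||y||. The variable names are purely illustrative.)

```python
import numpy as np

# Numerical check of the Cauchy-Schwarz inequality: |sum_i x_i y_i| <= ||x|| ||y||.
rng = np.random.default_rng(0)
for _ in range(1000):
    n = int(rng.integers(1, 10))
    x = rng.normal(size=n)
    y = rng.normal(size=n)
    lhs = abs(np.dot(x, y))                       # |sum_i x_i y_i|
    rhs = np.linalg.norm(x) * np.linalg.norm(y)   # ||x|| ||y||
    assert lhs <= rhs + 1e-12
print("Cauchy-Schwarz held in every random trial.")
```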


So far you have studied differentiability of functions (i) f : R → R and (ii) f : R^n → R (vectors to numbers) and f : R^n → R^m (vectors to vectors).
In your study of (ii) following the historical development of calculus, the notion
of partial differentiation was introduced.
For f : R^n → R and x ∈ R^n (x = (x_1, . . . , x_n)), f is differentiable at a = (a_1, . . . , a_n) with respect to x_i when

lim_{h→0} [f(a_1, a_2, . . . , a_i + h, . . . , a_n) − f(a_1, . . . , a_n)] / h

exists. The value of the limit is the ith partial derivative ∂f/∂x_i of f at a.
Each partial derivative ∂f/∂x_i determines a function from R to R, so partial differentiation reduces the problem in (ii) to that of (i). Next consider differentiability of functions f : R^n → R^m : since f(x) ∈ R^m, f determines m component functions, i.e. f(x) = (f_1(x), . . . , f_m(x)), each of which is a function from R^n to R. Again the question of differentiability becomes a question of existence of partial derivatives of real valued functions. This led to the definition of the Jacobian matrix (of partial derivatives), to which we will return later.
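(As an illustration of the definition above, added to these notes rather than taken from them, each partial derivative, and hence the whole Jacobian matrix, can be approximated by difference quotients with a small h; the map f and the helper name jacobian below are just examples.)

```python
import numpy as np

def f(x):
    # Example map f : R^2 -> R^2 with f(x, y) = (x*y, x + y^2).
    return np.array([x[0] * x[1], x[0] + x[1] ** 2])

def jacobian(f, a, h=1e-6):
    # Approximate the matrix of partial derivatives of f at a by difference quotients.
    a = np.asarray(a, dtype=float)
    cols = []
    for i in range(a.size):
        e_i = np.zeros_like(a)
        e_i[i] = 1.0
        cols.append((f(a + h * e_i) - f(a)) / h)   # i-th column: partials with respect to x_i
    return np.column_stack(cols)

print(jacobian(f, (2.0, 3.0)))   # the exact Jacobian at (2, 3) is [[3, 2], [1, 6]]
```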
Considering our definition of f'(a) for f : R → R, i.e. f'(a) = lim_{h→0} (f(a + h) − f(a))/h, it is obvious that this cannot apply for functions f : R^n → R^m because division by h ∈ R^n is not defined.

The ideas of elementary calculus do not carry over to higher dimensions or to arbitrary vector spaces. The way out of this dilemma was discovered by the French mathematician Fréchet in 1911. His idea was that we should think of differentiation at a point a as a process of approximating the function f near to a by a linear map. This map will be the Fréchet derivative of f at a.
So we have to adjust our thinking from considering derivatives as numbers to
considering derivatives as maps.
To assist this adjustment we will begin by exploring what this means for func-
tions from R to R.
Notation: we will make use of the following notations:
Id : V → V is the identity function for any vector space V, i.e. Id(x) = x for all x ∈ V, and
c : V → W is the constant function which assigns the value c ∈ W to each x ∈ V; i.e. c(x) = c for each x ∈ V.
Recall that the set F(R, R) of functions from R to R is a vector space when the
vector addition and scalar multiplication are defined as follows:
for all f, g ∈ F(R, R) the function f + g is defined at each x ∈ R by (f + g)(x) =
f (x) + g(x)
and for all scalars λ ∈ R the function λf is defined at each x ∈ R by (λf )(x) =
λf (x).
You should verify that F(R, R) is closed under these two operations; i.e. f + g ∈
F(R, R), and λf ∈ F(R, R), where f, g ∈ F(R, R) and λ ∈ R.
We can also define a product of vectors (functions) by (f.g)(x) = f (x)g(x) for
each x ∈ R.
Recall what it means for two functions to be equal:

f = g ⇔ ∀x ∈ R, f (x) = g(x), where f, g ∈ F(R, R).

1.1 Linear and Affine functions from R to R.


In the context of functions from R to R a linear map is just a constant multiple
of the identity map. More precisely:

Definition 1.1.1 A function f : R → R is linear ⇔


∀x ∈ R, f (x) = mx for some m ∈ R: i.e. f = mId.

Definition 1.1.2 A function f : R → R is affine ⇔


∀x ∈ R, f (x) = mx + c for some m, c ∈ R: i.e. f = mId + c.

Geometrically the graphs of linear and affine maps from R to R are lines.
[Figure: the graph of a linear map is a line through the origin; the graph of an affine map is a line with vertical intercept c.]

Notation: L(R, R) will denote the space of linear maps from R to R. It is closed
under the vector space operations of pointwise addition and scalar multiplica-
tion. Moreover it is possible to define a norm on this vector space. For example:

|| || : L(R, R) → R defined by ||mId|| = |m| (i.e. the absolute value of the slope
of the line) is a norm on L(R, R).
We will say more about linear and affine maps when we consider more general
functions.

1.2 The Fréchet derivative for maps f : R → R.

We have f'(a) = lim_{h→0} (f(a + h) − f(a))/h, which measures the slope of the tangent to the graph of f at a. Translate the tangent line so that it passes through the origin.
[Figure: the tangent to the graph of f at a has gradient f'(a); translated to the origin it becomes the graph of f'(a)Id.]

The linear map given by f'(a)Id is the Fréchet derivative of f at a. More precisely:

Definition 1.2.1 The linear map Df(a) : R → R defined by Df(a) = f'(a)Id is the Fréchet derivative of f at a.

Thus for each x ∈ R, Df(a)(x) = (f'(a)Id)(x) = f'(a)x.

Notice that Df (a) ∈ L(R, R), the space of linear maps from R to R.
Example (1): f = Id^2 (i.e. f(x) = x^2 for each x ∈ R).
We have for a ∈ R, f(a) = a^2, so f'(a) = 2a and Df(a) = 2aId.
The notation suggests an underlying map Df . Suppose f is differentiable at
each a ∈ R, then
Df : R → L(R, R)
is the map which assigns to each a ∈ R, the linear function Df (a) ∈ L(R, R).
Returning to the example, we had Df (a) = 2aId i.e. Df (a)(h) = (2aId)(h) =
2ah.
Df : R → L(R, R)
a 7→ Df (a)
For f = Id^2, the map Df is 2Id·Id, since to each a ∈ R, Df assigns the map Df(a), which is 2aId.
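(A small numerical sketch of this example, added here as an illustration and not part of the original notes: the affine approximation f(a) + Df(a)(h) matches f(a + h) up to an error that shrinks faster than |h|, which is exactly the sense in which Df(a) = 2aId is the derivative.)

```python
# Fréchet derivative of f = Id^2 at a: the linear map Df(a) : h -> 2*a*h.
def f(x):
    return x ** 2

def Df(a):
    return lambda h: 2 * a * h    # Df(a) = f'(a) Id with f'(a) = 2a

a = 1.5
for h in [1e-1, 1e-2, 1e-3, 1e-4]:
    remainder = f(a + h) - (f(a) + Df(a)(h))   # error of the affine approximation
    print(h, remainder / abs(h))               # tends to 0 as h -> 0 (here it equals h)
```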

Example (2): Let f : R → R be given by f = 3Id^2 · cos (i.e. f(x) = 3x^2 cos(x)).
f is differentiable at each a ∈ R; by elementary calculus we have
f' = 6Id · cos − 3Id^2 · sin (i.e. at each a ∈ R, f'(a) = 6a cos(a) − 3a^2 sin(a)),
Df(a) = f'(a)Id = (6a cos(a) − 3a^2 sin(a))Id,
Df(a)(h) = 6ah cos(a) − 3a^2 h sin(a),
Df = (6Id · (cos ◦ Id) − 3Id^2 · (sin ◦ Id))Id, where conventionally the composite cos ◦ Id = cos and sin ◦ Id = sin, so Df = (6Id · cos − 3Id^2 · sin)Id.

Repeated differentiation: higher derivatives:
In the elementary calculus, if f : R → R is differentiable then we get a function f' : R → R, and if f' : R → R is differentiable then we get a function f'' : R → R, and so on; but with Fréchet's approach the derivative function Df no longer has R for its codomain; we have Df : R → L(R, R). But Df(a) : R → R, and so if Df is differentiable we should be able to differentiate again, so that D(Df)(a) = D^2 f(a) exists. More about higher derivatives is available as Appendix C.
Exercise 1:

1. Let f = Id + 3, g = −2Id. Find f + g and 7f .

2. Check that L(R, R) is closed under the operation of


(i) adding two functions, and (ii) multiplying a function by a real num-
ber.

3. Describe the zero vector in the vector space L(R, R).

4. Check that ||mId|| = |m| is a norm on L(R, R).

5. Let f, g ∈ L(R, R). Show that f ◦ g ∈ L(R, R) and express its slope in terms
of the slopes of f and g.

6. Let a, h ∈ R and f : R → R. In each case find f', f'(a), Df(a), Df(a)(h) and Df.

(a) f = (-2Id+4)2 . Show the geometric significance of Df (1) on a graph.


(b) f = Id · exp   (c) f = cos^3 ◦ Id   (d) f = arctan^2 / (Id + 1).

7. Let f : R → R be differentiable at each a ∈ R. Match all of the appropriate qualifiers (a), (b), (c) and (d) on the second row to each entity on the first row.
Df    f'(a)    f'    Df(a)    Df(a)(h)    Id
(a) ∈ L(R, R)    (b) : R → L(R, R)    (c) ∈ R    (d) : R → R

8. Let f : R → R be differentiable at each a ∈ R.
(a) Show that f'(a) = Df(a)(1).
(b) Find Df(a)(2) in terms of f'(a).
(c) Show that Df = f' · Id.


9. Let f : R → R and g : R → R and let
g be differentiable at a ∈ R
f be differentiable at g(a) ∈ R

(a) What does the chain rule of elementary calculus tell you about (f ◦ g)'(a)?
(b) Hence prove that

D(f ◦ g)(a) = Df (g(a)) ◦ Dg(a).

This is the form the chain rule takes for Fréchet derivatives.
10.

(a) Let f : R → R be a linear map and let a ∈ R.


(i) Show that Df (a) = f . (ii) Find Df . (iii) Is Df a constant
map? What about Df (a)?
(b) Let f : R → R be an affine map and let a ∈ R. Find Df (a), Df and
repeat question (a)(iii) above.
Chapter 2

Linear and Affine Maps

Next we generalize the notion of linear (and affine) maps to arbitrary vector
spaces; a preliminary step to generalizing the Fréchet derivative. For simplicity
all of our vector spaces will be over the field R.
There are two aspects to linearity, the algebraic and the geometric. First we
consider the algebraic: each vector space has two operations; addition and scalar
multiplication, so a linear map should preserve these operations.
Definition 2.0.2 Let V and W be vector spaces over the same field of scalars.
By saying that a map L : V → W is linear, we mean that for all vectors x, y ∈ V
and all scalars λ ∈ R:
(i) L(x + y) = L(x) + L(y) and
(ii) L(λx) = λL(x)
Notice: x + y ∈ V , L(x) + L(y) ∈ W
λx ∈ V , L(λx) ∈ W and L(x) ∈ W .
(†) Verify that in the case V = W = R, this definition is equivalent to Definition (1.1.1): i.e. prove that if L = mId then L satisfies Definition (2.0.2), and conversely prove that if L is a map which satisfies Definition (2.0.2) then L = mId for some choice of m ∈ R.
Indeed we can show the following three statements are true:
(1) If V = W = R, it can be shown that
a map L is linear ⇔ it has the form L = mId for some m ∈ R
(2) If V = R2 and W = R, it can be shown that
a map L : R2 → R is linear ⇔ it has the form L(x, y) = ax + by for some
a, b ∈ R
(3) If V = R2 and W = R2 , it can be shown that
a map L : R2 → R2 is linear ⇔ it has the form L(x, y) = (ax + by, cx + dy)
for some a, b, c, d ∈ R


To show that a map is not linear, we have to find a counterexample: a contradiction to either statement (i) or statement (ii) in Definition (2.0.2).
To show that the map L : R^2 → R given by L(x, y) = x^2 + y^2 is not linear, consider x = (x, y) = (1, 2) and λ = 2, and we'll see that scalar multiplication is not preserved. We have L(λx) = L((2, 4)) = 4 + 16 = 20 whilst λL(x) = 2 · L((1, 2)) = 2(1 + 4) = 10 ≠ 20. Thus L is not linear.
Alternatively, having proved (2) in the paragraph above, we could have used the fact that x^2 + y^2 is not of the form ax + by for any real numbers a, b.
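(A brief computational sketch of this kind of checking, added by me and not from the notes: testing L(x + λy) = L(x) + λL(y) on random inputs gives quick evidence; it exposes the failure of L(x, y) = x^2 + y^2 immediately, while a map of the form ax + by passes every trial. Passing random trials is evidence, not a proof.)

```python
import numpy as np

def looks_linear(L, dim, trials=200, seed=0):
    # Test L(x + lam*y) == L(x) + lam*L(y) on random samples (necessary, not sufficient).
    rng = np.random.default_rng(seed)
    for _ in range(trials):
        x, y = rng.normal(size=dim), rng.normal(size=dim)
        lam = rng.normal()
        if not np.allclose(L(x + lam * y), L(x) + lam * L(y)):
            return False
    return True

L_linear = lambda v: 3 * v[0] - 2 * v[1]     # of the form ax + by
L_not = lambda v: v[0] ** 2 + v[1] ** 2      # the counterexample above

print(looks_linear(L_linear, 2))   # True
print(looks_linear(L_not, 2))      # False
```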
Geometrically for linear maps L : R2 → R2 there are three possibilities. We
cannot draw the graph of L since Gr(L) = {(x, L(x)) : x ∈ R2 } ⊆ R4 , but
we can draw an arrow diagram which indicates where points in the domain are
mapped to in the codomain. The three possibilities are:

(a) L maps R2 onto R2 : in this case it can be shown that each “grid” (equally
spaced parallel lines) maps onto a grid.

(†) Given L : R^2 → R^2 with L(x) = L(x, y) = (x − y, x + y), see where a grid through the points (1, 1), (−1, 1), (−1, −1) and (1, −1) is mapped to. Notice that the kernel ker L of L is {(0, 0)}, the set containing only the zero of R^2; thus the map L is injective.

(b) L maps R^2 onto a line through the origin.
(†) Given L : R^2 → R^2 with L(x) = L(x, y) = (y, 2y), notice that the x-axis maps to (0, 0), the line {(x, 1) : x ∈ R} maps to the point (1, 2), and ker L = {(x, 0) : x ∈ R}, a 1-dimensional set.

(c) L maps R2 onto the origin. Obviously this is the case a, b, c, d = 0.

2.1 The Matrix of a Linear Map


Suppose L : Rm → Rn is linear. For any x = (x1 , . . . , xm ) ∈ Rm we have

(*) x = x1 e1 + · · · + xm em ,

where {e1 , e2 , . . . , em } is the usual orthonormal basis for Rm i.e.


e1 = (1, 0, . . . , 0), e2 = (0, 1, . . . , 0) . . . em = (0, 0, . . . , 1).
Now applying L to each side of (*) and using the property of linearity we have:
L(x) = L(x1 e1 + · · · + xm em )
= L(x1 e1 ) + · · · + L(xm em )
= x1 L(e1 ) + · · · + xm L(em ) ∈ Rn
Also L(x) = (L1(x), . . . , Ln(x)). Thus

(L(x))^T = (L1(x), L2(x), . . . , Ln(x))^T
         = x1 (L1(e1), L2(e1), . . . , Ln(e1))^T + · · · + xm (L1(em), L2(em), . . . , Ln(em))^T
         = [Li(ej)] x^T
         = [L] x^T,

where [Li(ej)] denotes the n × m matrix whose ijth entry is Li(ej).

Definition 2.1.1 The matrix with ijth entry Li (ej ) is called the matrix of the
linear map L (relative to the usual basis). We denote this matrix [L].
Example: A linear map L : R^2 → R^2 sends the points e1 and e2 to the points (2, 2) and (1, 3). Find [L] (relative to the usual basis) and use it to calculate L((3, 1)).
We have L(e1) = (2, 2) = (L1(e1), L2(e1)) and L(e2) = (1, 3) = (L1(e2), L2(e2)), so

[L] = [ 2  1 ]
      [ 2  3 ]

Hence

L((3, 1))^T = [ 2  1 ] [ 3 ]  =  [ 7 ]
              [ 2  3 ] [ 1 ]     [ 9 ]

Indeed in this example, for x = (x, y), L(x) = (2x + y, 2x + 3y).
The map L is onto when it has full rank.
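(The worked example above translates directly into a short NumPy sketch, an added illustration rather than part of the notes: the columns of [L] are the images of the basis vectors, and L((3, 1)) is then a matrix-vector product.)

```python
import numpy as np

# Columns of [L] are the images of the basis vectors, written as columns.
L_e1 = np.array([2.0, 2.0])   # L(e1) = (2, 2)
L_e2 = np.array([1.0, 3.0])   # L(e2) = (1, 3)
L_matrix = np.column_stack([L_e1, L_e2])

print(L_matrix)                          # [[2. 1.], [2. 3.]]
print(L_matrix @ np.array([3.0, 1.0]))   # [7. 9.] = L((3, 1))
```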
To assist with the proof of the statement: “ If V = W = R2 , it can be shown
that a map L is linear ⇔ it has the form L(x, y) = (ax + by, cx + dy) for some
a, b, c, d ∈ R”: let L : R2 → R2 be linear and suppose that L maps e1 and e2 to
the points (1, 1) and (−1, 1). Now find a formula for L(x) valid for each x ∈ R2 ,
i.e. let x = (x, y) so x = xe1 + ye2 then
L(x) = L(xe1 ) + L(ye2 )
= x(1, 1) + y(−1, 1) = (x − y, x + y)
which tells you what to choose for a, b, c and d.
Returning now to matrices and linear maps:

Theorem 2.1.2 Given an n × m matrix A with real entries, define a map


L : Rm → Rn by putting for each x ∈ Rm ,

L(x)T = AxT .

The map L is then linear and its matrix (relative to the usual basis) is given by
[L] = A.

Theorem 2.1.3 If L : Rm → Rn and M : Rn → Rp are linear, then so is


M ◦ L : Rm → Rp and the matrix [M ◦ L] of the composite is [M ][L], where [M ]
(resp. [L]) are the matrices for M (resp. L).

Proof shortly. In the meantime a few more examples to assist with the exercises.
There is an equivalent formulation of the notion of linearity given in Definition
(2.0.2).
Definition (2.0.2)’: A map L : V → W is linear ⇔ L(x+λy) = L(x)+λL(y),
for all x, y ∈ V and λ ∈ R (a field).
You will find both definitions useful to address different problems.
Let’s see that a map L : R2 → R2 which reflects each point across the vertical
axis is linear.

[Figure: the points x, y and x + y and their images L(x), L(y) and L(x + y) under the reflection L.]

L sends x, y and x + y to L(x), L(y) and L(x + y) respectively. Since triangles


are preserved under reflections (proof?) we have L(x + y) = L(x) + L(y).
Given a scalar λ ∈ R, x and λx lie on the same straight line, L sends x and λx
to L(x) and L(λx). Since a reflection sends lines to lines (proof?) we have that
L(x) and L(λx) lie on the same straight line. In addition reflection preserves
the lengths of lines (proof?) and so L(λx) = λL(x).
To find the matrix of L (reflection in the vertical axis): we have L(e1) = −e1 and L(e2) = e2. Thus

[L] = [ −1  0 ]
      [  0  1 ]
To prove that a linear map sends lines to lines:
For fixed vectors u, v ∈ R2 we have l = {u + δv : δ ∈ R} is a line in R2 . (We
need to see that L(l) is a line.)
For δ1 , δ2 ∈ R, u + δ1 v and u + δ2 v are points in l. Now

L(u + δ1 v) = L(u) + (δ1 )L(v) and L(u + δ2 v) = L(u) + (δ2 )L(v)



by the linearity of L. So the image of the two points in l is two points on the
line through L(u) parallel to L(v). Thus L(l) is a line in R2 .
Proof of Theorem (2.1.3): Suppose L : Rm → Rn , M : Rn → Rp are linear.
Let x, y ∈ Rm , λ ∈ R, so x + λy ∈ Rm (it’s a vector space). Now

(M ◦ L)(x + λy) = M (L(x + λy))


= M (L(x) + λL(y)) linearity of L
= M (L(x)) + λM (L(y)) linearity of M
= (M ◦ L)(x) + λ(M ◦ L)(y).

So M ◦ L is linear.
Let [L] (resp. [M ]) be the matrix for L (resp. M ). Then

((M ◦ L)(x))T = [M ◦ L]xT by Theorem (2.1.2) applied to M ◦ L.

Also

((M ◦ L)(x))T = ((M (L(x)))T


= [M ](L(x))T by Theorem (2.1.2) applied to M
= [M ]([L](x)T ) by Theorem (2.1.2) applied to L
= [M ][L]xT since the matrix multiplication
is associative.

Hence [M ◦ L] = [M ][L] as required.
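(Theorem (2.1.3) is easy to confirm numerically on a random example; the sketch below is added here and is not part of the proof: applying L and then M to a vector agrees with applying the single matrix [M][L].)

```python
import numpy as np

rng = np.random.default_rng(1)
L = rng.normal(size=(3, 2))   # matrix of a linear map R^2 -> R^3
M = rng.normal(size=(4, 3))   # matrix of a linear map R^3 -> R^4
x = rng.normal(size=2)

applied_in_turn = M @ (L @ x)   # (M o L)(x): apply L, then M
single_matrix = (M @ L) @ x     # [M][L] applied to x

print(np.allclose(applied_in_turn, single_matrix))   # True: [M o L] = [M][L]
```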

2.2 Affine Maps between arbitrary Vector Spaces

An affine map is just a linear map with the addition of a constant map.

Definition 2.2.1 An affine map A : V → W is a map with values given by A(x) = c + L(x), where L : V → W is a linear map and c ∈ W.

Every linear map L : V → W sends the zero vector in V to the zero vector in
W . Thus c = A(0), where 0 ∈ V .
In fact A(x) − A(0) = L(x) i.e. the linear map is uniquely determined by the
affine map.
Examples:

(1) Let A : R2 → R2 given by A(x, y) = (x + y + 1, x − y + 1). The linear part


of A is L : R2 → R2 with L(x, y) = (x + y, x − y) (we have a = 1, b =
1, c = 1, d = −1) while A(0, 0) = (1, 1) = c (∈ R2 ).
Geometrically a grid of the unit square maps as below:

[Figure: the images A((1, 1)), A((1, −1)), A((−1, 1)) and A((−1, −1)) of the corners of the unit-square grid.]

(2) Let A : R^2 → R^2 be given by A(x, y) = (2x + (1/2)y + 2, x + y). The linear part of A is L : R^2 → R^2 with L(x, y) = (2x + (1/2)y, x + y) (we have a = 2, b = 1/2, c = 1, d = 1) while A(0, 0) = (2, 0) = c (∈ R^2).
Geometrically a grid of the unit square maps as below:

[Figure: the images A((1, 1)), A((1, −1)), A((−1, 1)) and A((−1, −1)) of the corners of the unit-square grid.]
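(A short computational sketch, added here, of the earlier remark that the linear part of an affine map is recovered as L(x) = A(x) − A(0), using example (1).)

```python
import numpy as np

def A(v):
    # Affine map of example (1): A(x, y) = (x + y + 1, x - y + 1).
    x, y = v
    return np.array([x + y + 1.0, x - y + 1.0])

c = A((0.0, 0.0))              # constant part: (1, 1)
L = lambda v: A(v) - c         # linear part: L(x, y) = (x + y, x - y)

v = np.array([2.0, -3.0])
print(c, L(v), A(v) - (c + L(v)))   # the last vector is zero: A = c + L
```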

To generalize differentiation to arbitrary vector spaces, in addition to a knowl-


edge of linear and affine maps we need a notion of continuity and limits for
functions between normed vector spaces. We address this in the next chapter.
Exercise 2

1. Prove (from the definition) that every linear map L : V → W maps the zero
element of V to the zero element of W .

2.

(a) Generalise the statements (1), (2) and (3) on page 7 to give the form
of a linear map L : Rm → R with respect to the usual basis for Rm .
Show that your choice of L is indeed linear.
(b) Now generalise to give the form of a linear map L : Rm → Rn with
respect to the usual basis for Rm . Again show that your choice of L
is linear.

3. In each case state whether the map L with domain R2 is linear. If not, give
a counterexample. If so, give a proof.
(a) L(x) = L(x, y) = (2y, 4x − y),
(b) L(x) = L(x, y) = x + 4y + 1,
(c) L(x) = L(x, y) = x2 + 2x − y,
(d) L(x) = L(x, y) = (x + y, −3xy),
(e) L(x) = L(x, y) = (x^2 − y^2)/(x + y) if x + y ≠ 0, and L(x, y) = 2x otherwise.

(f) L : Mn×n (R) → Mn×n (R) with L(A) = AT + BA, where B is a fixed
n × n matrix.
4. Sketch the graph of the linear map L : R2 → R with L(x, y) = −3x + 2y.
5. Let L : R2 → R2 be a linear map and let {e1 , e2 } be the usual orthonormal
basis for R2 . Suppose that L(e1 ) = (4, 1) and L(e2 ) = (1, 4).
(a) On an arrow diagram, show the effect of L on a unit grid in the
domain.
(b) Find a formula giving the value of L at an arbitrary point x in R2 .
6. Prove that if L : V → W is linear, then its kernel {x ∈ V : L(x) = 0W } is
a vector subspace of V .
7. A linear map L : R^2 → R^2 has its matrix relative to the usual basis given as

[L] = [ 2  6 ]
      [ 1  4 ]

Find the point to which this map sends (1, −1).

8. A map L : R2 → R3 is defined by putting L(x, y) = (2x + y, x + 4y, 6x − 7y).


Say why L is linear and give [L] relative to the usual basis.
9. Let L : R^2 → R^2 and M : R^2 → R^3 be linear maps with matrices

[L] = [ 2  2 ]
      [ 3  5 ]

and

[M] = [ 1  2 ]
      [ 3  4 ]
      [ 5  6 ]

Say why M ◦ L is linear and give its matrix.


10. Given θ ∈ R, let Lθ : R2 → R2 be the map which rotates points in the
plane through an angle θ about the origin.
(a) Give a geometric argument to show that Lθ is linear.
(b) Find the matrix of Lθ relative to the usual basis.
(c) Given also ψ ∈ R, which linear map is Lθ ◦ Lψ ?

Now use theorem (2.1.3) to find two different expressions for its matrix and thence derive the elementary addition formulae for cos and sin.
11. Verify that the formula A(x) = −1 + 5x defines an affine map A : R → R
which is not linear.

12. Verify that the formula A(x, y) = 1 − x + 2y defines an affine map A :


R2 → R.
13. Let L be the linear part of an affine map A. Verify the identity A(a + h) =
A(a) + L(h).
14. Show that each affine map A satisfies the identities

A(x) + A(y) = A(0) + A(x + y)


aA(x) = (a − 1)A(0) + A(ax).

15. Recall that a map has an inverse if and only if the map is 1-1 and onto.
Prove that an affine map A : V → W has an inverse if and only if the
same is true of its linear part L. [Hint: Express A−1 in terms of L−1 .]
Chapter 3

Continuity and Limits

For linear maps, the study of continuity assumes a particularly simple form: if a
linear map is continuous at the origin (the zero vector) then it is continuous at
every point of its domain. Moreover, all linear maps between Euclidean spaces
L : Rn → Rm are continuous.
Definition 3.0.2 An open ball in a normed vector space (V, || ||) is a set of
the form
{x ∈ V : ||x − a|| < r}
for some a ∈ V and real r > 0.
a is the centre of the ball, r its radius.
{x ∈ V : ||x − a|| < r} is denoted Br (a) (the open ball)
{x ∈ V : ||x − a|| ≤ r} is denoted B r (a) (the closed ball)
Proposition 3.0.3 (a) Each open ball contains its centre.
(b) Each open ball in V is a subset of V .
(c) For each a ∈ V and r > 0, Br (a) ⊆ B r (a).
(d) If 0 < r ≤ s then Br (a) ⊆ Bs (a).

Definition 3.0.4 A subset S of V is open if ∀a ∈ S : ∃r > 0 : Br (a) ⊆ S.

Proposition 3.0.5 V \{b} is open.

Proof: Let b ∈ V . Let a ∈ V \{b}.


[have to find r; then show Br (a) ⊂ S = V \{b}]
Choose r = ||b − a|| (which is greater than zero because a 6= b.)
Let x ∈ Br (a), i.e. ||x − a|| < r i.e. ||x − a|| < ||b − a||.
Now if x = b you get a contradiction, so x ∈ V \{b} as required. Thus V \{b}
is open.
In each normed vector space V , (i) every open ball is an open set, (ii) V is open
and (iii) ∅ is open. So together with the fact that


Theorem 3.0.6 (1) the union of an arbitrary collection of open sets is open
and
(2) the intersection of finitely many open sets is open
our definition of open set determines a topology on V .

3.1 Continuity and limits.


Compare the statements “f is continuous at a” and “f has a limit l at a”
An important contrast between the two ideas comes when we consider the do-
main of the function f . Implicit in the statement of continuity is the fact that
f (a) exists i.e. a is in the domain of f .
This is not the case in the statement about limits.
In the study of limits we compensate for the fact that a need not belong to
S = domf , by imposing some other conditions on the point a and the set S.
The simplest condition is to require that S ∪ {a} is an open set. Thus a is not
an isolated point of S ∪ {a}. Notice that if a is an isolated point, the point x
cannot approach a whilst remaining in the domain S of f ; there is an open set
containing a and no other point of S ∪ {a}.
For a similar reason we want to avoid the case in which the vector space V
consists of just the origin (the zero vector) and so we assume that V is not {0}.
Definition 3.1.1 By saying that the function f : S → W has a limit l at a, we
mean that

(∀B)(∃A) such that f (A ∩ S) ⊆ B,


where A and B are open balls with centres a and l respectively.
Notice that if a 6∈ S then a 6∈ A ∩ S.
In terms of inequalities definition (3.1.1) says
(∀ε > 0)(∃δ > 0)(∀x ∈ dom f) if ||x − a|| < δ then ||f(x) − l|| < ε.
The idea being

[Figure: a point x within δ of a in the domain is sent by f to a point f(x) within ε of l in the codomain.]

Let f : R^2 → R^2 be defined by putting

f(x) = 0 if x = 0,   and   f(x) = x/||x|| if x ≠ 0.

Thus f (x) lies on the unit circle when x 6= 0; i.e. f (R2 \{0}) is the unit circle
in R2 .

(a) Notice that f(B_{1/2}((0, 0))) is the unit circle S^1 together with the point {0}.
Proof: Let y ∈ f(B_{1/2}((0, 0))), i.e. y = f(x) for some x with ||x|| < 1/2, i.e. y = 0 or y = x/||x|| for some x with ||x|| < 1/2. In the first case y ∈ {0} and in the second case ||y|| = 1, and so y ∈ S^1 ∪ {0}.
Conversely, suppose y ∈ S^1 ∪ {0}, i.e. y = 0 or ||y|| = 1.
In the first case choose x = 0 = (0, 0) ∈ B_{1/2}((0, 0)) (in the domain). Then f(x) = y and so there's an x ∈ B_{1/2}((0, 0)) such that f(x) = y; i.e. y ∈ f(B_{1/2}((0, 0))).
In the second case choose x = (1/4)y. Then x ∈ B_{1/2}((0, 0)), since ||x|| = ||(1/4)y|| = 1/4 < 1/2. And so y = x/||x|| = f(x) for this choice of x ∈ B_{1/2}((0, 0)).
Thus f(B_{1/2}((0, 0))) = S^1 ∪ {0}.

(b) Also f(Ba((a, 0))), a > 0, is the right half of the unit circle, i.e. {(y1, y2) ∈ R^2 : ||(y1, y2)|| = 1 and y1 > 0}.
Proof: Let y ∈ f(Ba((a, 0))), i.e. y = f(x) for some x with ||(x1 − a, x2)|| < a. Now x ≠ 0 and x1 > 0, since |x1 − a| < a. Thus by the definition of f, y = x/||x||, which gives ||y|| = 1, and since x1 > 0 we have y1 > 0; i.e. y ∈ {(y1, y2) ∈ R^2 : ||(y1, y2)|| = 1 and y1 > 0}.
Conversely suppose y ∈ {(y1, y2) ∈ R^2 : ||(y1, y2)|| = 1 and y1 > 0}. Choose x = ay1 (y1, y2); then

||x − (a, 0)|| = ||(ay1^2 − a, ay1 y2)||
             = ((ay1^2 − a)^2 + (ay1 y2)^2)^{1/2}
             = (a^2 y1^4 + a^2 y1^2 y2^2 − 2a^2 y1^2 + a^2)^{1/2}
             = (a^2 y1^2 − 2a^2 y1^2 + a^2)^{1/2}      since y1^2 + y2^2 = 1
             = (a^2 (1 − y1^2))^{1/2}
             = a(1 − y1^2)^{1/2} < a      since 0 < y1 < 1, so 0 < 1 − y1^2 < 1.

We have x = ay1 (y1, y2) ∈ Ba((a, 0)). Now f(x) ∈ f(Ba((a, 0))) and f(x) = x/||x||. So

f(ay1 (y1, y2)) = ay1 (y1, y2) / (ay1 ||(y1, y2)||) = (y1, y2) = y

(since a > 0, y1 > 0 and ||(y1, y2)|| = 1); i.e. y ∈ f(Ba((a, 0))).
Thus {(y1, y2) ∈ R^2 : ||(y1, y2)|| = 1 and y1 > 0} = f(Ba((a, 0))) as required.
To express the statement that a function f : S ⊆ V → W is continuous at a we
simply replace l by f (a) in Definition (3.1.1). We have

Definition 3.1.2 By saying that the function f : S → W is continuous at a,


we mean that

(∀B)(∃A) such that f (A ∩ S) ⊆ B,


where A and B are open balls with centres a and f (a) respectively.

We show that f : R^2 → R^2, defined as before by putting f(x) = 0 if x = 0 and f(x) = x/||x|| if x ≠ 0, is not continuous at zero.
We are to prove the negation of the statement
(∀B)(∃A) such that f(A ∩ S) ⊆ B, where A and B are open balls with centres 0 and f(0) = 0 respectively; i.e. we are to show
(∃B)(∀A) such that f(A ∩ S) ⊄ B.
We know that f(Ba((0, 0))) = S^1 ∪ {0} for any a > 0, so ||f(x)|| = 1 for all non-zero x; we can choose the radius of B to be less than 1.
Choose B = B_{1/2}((0, 0)).
Let A = Bδ((0, 0)), so A ∩ R^2 = Bδ((0, 0)).
Consider x = (δ/2, 0) ∈ Bδ((0, 0)); then f((δ/2, 0)) = (δ/2, 0)/||(δ/2, 0)||, which is a point on the unit circle. We have ||f((δ/2, 0))|| = 1, so f((δ/2, 0)) ∉ B_{1/2}((0, 0)); i.e.
(∃B)(∀A) such that f(A ∩ R^2) ⊄ B, where A and B are each open balls with centre 0. So f is not continuous at 0.
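(Numerically, an added illustration and not part of the notes, this failure of continuity is plain to see: points arbitrarily close to the origin are sent onto the unit circle, so ||f(x) − f(0)|| stays equal to 1 and never drops below 1/2.)

```python
import numpy as np

def f(x):
    # f(0) = 0 and f(x) = x / ||x|| otherwise, as in the example above.
    x = np.asarray(x, dtype=float)
    n = np.linalg.norm(x)
    return x if n == 0 else x / n

for delta in [1e-1, 1e-3, 1e-6, 1e-9]:
    x = np.array([delta / 2, 0.0])                       # a point of B_delta((0, 0))
    print(delta, np.linalg.norm(f(x) - f([0.0, 0.0])))   # always 1.0
```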

Theorem 3.1.3 If V 6= {0} and S ∪ {a} is an open set in V then the function
f : S ⊂ V → W has at most one limit l at a.

Lemma (A): If V 6= {0} and S ∪ {a} is an open set in V then every open ball
Br (a) in V contains some point different from a.
Lemma (B): Every open ball Br (a) of centre a in V contains a point of S
different from a.
The proof of Lemma (A) is an exercise.
Proof of Lemma (B): Let Br (a) be an open ball of centre a in V .
We are to show that Br (a) ∩ S contains a point other than a, where S =
domf .

[Figure: the ball Br(a) together with a smaller ball Bδ1(a) contained in S ∪ {a}.]

From the diagram we see that choosing δ = min{r, δ1 } where δ1 is the radius
of the open ball centred on a and contained in S ∪ {a} (which exists because
S ∪ {a} is open) will work.

Let Bδ1 (a) be an open ball centred on a with Bδ1 (a) ⊆ S ∪ {a} (open).
Let δ = min{r, δ1 } then Bδ (a) ⊆ Br (a) and Bδ (a) ⊆ Bδ1 (a) ⊆ S ∪ {a}.
Thus Bδ (a) ⊆ Br (a) ∩ S ∪ {a}.
By Lemma (A), Bδ (a) contains a point of V other than a. Let y 6= a, be in
Bδ (a).
Then y ∈ S ∪ {a}, indeed y ∈ S and y 6= a.
Thus every open ball of centre a contains a point of S other than a.
Proof of Theorem (3.1.3): Suppose V ≠ {0} and S ∪ {a} is an open set in V. Let f : S ⊆ V → W and suppose l1 and l2 ∈ W are both limits of f at a.
Let ε > 0.
Let δ1 > 0 be the number for which

(∀x ∈ S ∪ {a}), x ∈ Bδ1(a) ⇒ ||f(x) − l1|| < ε/2   (i)

Let δ2 > 0 be the number for which

(∀x ∈ S ∪ {a}), x ∈ Bδ2(a) ⇒ ||f(x) − l2|| < ε/2   (ii)

Choose δ = min{δ1, δ2}; so Bδ(a) = Bδ1(a) ∩ Bδ2(a). By Lemma (B), Bδ(a) contains a point of S other than a.
Let y ∈ S, y ≠ a and y ∈ Bδ(a), so that y ∈ Bδ1(a) and y ∈ Bδ2(a).
So by (i) ||f(y) − l1|| < ε/2 and by (ii) ||f(y) − l2|| < ε/2. Thus ||l1 − l2|| ≤ ||f(y) − l1|| + ||f(y) − l2|| < ε.
But since this holds for all ε, we have ||l1 − l2|| = 0, i.e. l1 = l2 as required.
Theorem 3.1.4 Properties of limits (and continuity).

(1) Translation of the origin:
(a) (in the domain)  lim_{x→a} f(x) = l ⇔ lim_{h→0} f(a + h) = l
(b) (in the codomain)  lim_{x→a} f(x) = l ⇔ lim_{x→a} ||f(x) − l|| = 0

(2) A sort of squeeze (sandwich) principle: Suppose that ||g(x)|| ≤ ||f(x)|| for x ∈ dom g ⊆ dom f; if lim_{x→a} f(x) = 0 then lim_{x→a} g(x) = 0.

(3) Linearity: Let c, d ∈ R. If lim_{x→a} f(x) = l and lim_{x→a} g(x) = m then lim_{x→a} (cf(x) + dg(x)) = cl + dm.

(4) Components: here we have V, W = R^2. Let f : S ⊆ R^2 → R^2 be given in terms of its components f = (f1, f2), where fi : S → R for i = 1, 2. Let l = (l1, l2) ∈ R^2; then lim_{x→a} f(x) = l ⇔ lim_{x→a} f1(x) = l1 and lim_{x→a} f2(x) = l2.

(5) Restriction of domain: Sometimes it will be necessary to take the limit of a function which has been restricted to a smaller domain; i.e. for f : S ⊆ V → W suppose we wish to consider the limit of f restricted to a domain T ⊂ S. Again T may be any set such that T ∪ {a} is open in V. Let v ∈ V where v ≠ 0. If lim_{x→a} f(x) = l then lim_{t→0} f(a + tv) = l, i.e. lim_{t→0} (f ◦ (a + vId))(t) = l.

(6) Left and Right Limits: (note the domain) Let f : S ⊆ R → W and l ∈ W. Then lim_{t→0} f(t) = l if and only if lim_{t→0, t<0} f(t) = l and lim_{t→0, t>0} f(t) = l.

(7) Continuity via limits: Let f : S ⊆ U → V and g : T ⊆ V → W and l ∈ T. Then g is continuous at l ∈ T if and only if lim_{x→a} f(x) = l ⇒ lim_{x→a} (g ◦ f)(x) = g(l).

Some proofs for parts of theorem (3.1.4):


(1) Translation of the origin (in the codomain).
First suppose that lim_{x→a} f(x) = l. Set g = || || ◦ (f − l) with dom g = dom f.
Let ε > 0.
Required to prove: lim_{x→a} ||f(x) − l|| = 0.
But setting g = || || ◦ (f − l) we have, for all x ∈ S,

g(x) = (|| || ◦ (f − l))(x) = ||f(x) − l||.

Choose δ > 0 for which, for all x ∈ S with ||x − a|| < δ, we have ||f(x) − l|| < ε. Thus g(x) < ε and so ||g(x)|| < ε, since g ≥ 0; i.e. lim_{x→a} g(x) = 0, i.e. lim_{x→a} ||f(x) − l|| = 0 as required.
Conversely suppose that lim_{x→a} ||f(x) − l|| = 0.
Let ε > 0 and choose δ > 0 for which, for all x ∈ S with ||x − a|| < δ, we have ||f(x) − l|| < ε (i.e. | ||f(x) − l|| − 0 | < ε). That is, lim_{x→a} f(x) = l.
(2) Suppose that ||g(x)|| ≤ ||f(x)|| for x ∈ dom g ⊆ dom f. And suppose that lim_{x→a} f(x) = 0.
(It is necessary to assume that dom g ∪ {a} is open.)
Required to prove: ∀B = Bε(0) there is A = Bδ(a) such that ∀x ∈ A ∩ dom g, g(x) ∈ Bε(0).
Let B = Bε(0). Choose A = Bδ(a) where ∀x ∈ (A ∩ dom f), f(x) ∈ Bε(0).
Let y ∈ g(A ∩ dom g), i.e. y = g(x) for some x ∈ A ∩ dom g.
Now x ∈ A ∩ dom g ⇒ x ∈ A ∩ dom f. And so f(x) ∈ Bε(0) by the assumption that lim_{x→a} f(x) = 0, i.e. ||f(x)|| < ε. This means that ||g(x)|| < ε since ||g(x)|| ≤ ||f(x)|| (for all x). And so g(x) = y ∈ Bε(0).
Hence g(A ∩ dom g) ⊆ B, i.e. lim_{x→a} g(x) = 0 as required.
(3) Linearity: Suppose that lim_{x→a} f(x) = l and lim_{x→a} g(x) = m.
Required to prove: lim_{x→a} (cf(x) + dg(x)) = cl + dm.
Let ε > 0. Let B = Bε(cl + dm). Let S = dom(cf) = dom f and T = dom(dg) = dom g.
Choose A = A1 ∩ A2 where
A1 = Bδ1(a) with (cf)(A1 ∩ S) ⊆ B_{ε/2}(cl), and
A2 = Bδ2(a) with (dg)(A2 ∩ T) ⊆ B_{ε/2}(dm).
(Now I have to prove cf(x) + dg(x) ∈ B, for all x ∈ (A ∩ S ∩ T).)
Let y = cf(x) + dg(x) where x ∈ A ∩ S ∩ T;
⇒ x ∈ A ∩ S and x ∈ A ∩ T
⇒ x ∈ A1 ∩ S and x ∈ A2 ∩ T.
So cf(x) ∈ B_{ε/2}(cl) and dg(x) ∈ B_{ε/2}(dm).
We have

||cf(x) + dg(x) − (cl + dm)|| = ||cf(x) − cl + dg(x) − dm||
                              ≤ ||cf(x) − cl|| + ||dg(x) − dm||
                              < ε/2 + ε/2 = ε.

Thus y = cf(x) + dg(x) ∈ Bε(cl + dm), i.e. cf(x) + dg(x) ∈ B, for all x ∈ (A ∩ S ∩ T).
Hence lim_{x→a} (cf(x) + dg(x)) = cl + dm.
(4) Components: Let f : S → R^2 be written in terms of its components as f = (f1, f2) where fi : S → R, i = 1, 2. Then we are to prove that lim_{x→a} f(x) = l ⇔ lim_{x→a} f1(x) = l1 and lim_{x→a} f2(x) = l2.
The key to the proof is the following lemma:
Each ball contains a box. If B is an open ball in R^2, then there are open balls β1 and β2 in R such that β1 × β2 ⊆ B. Moreover if B has centre (l1, l2) and radius r > 0, then β1 and β2 can be chosen to have centres at l1 and l2 respectively and radius r/√2 > 0.
Proof: Let B = Br((l1, l2)). Choose β1 = B_{r/√2}(l1) and β2 = B_{r/√2}(l2).
Let y ∈ β1 × β2 = {(y1, y2) ∈ R^2 : y1 ∈ B_{r/√2}(l1), y2 ∈ B_{r/√2}(l2)}; so

||y − (l1, l2)|| = ||(y1 − l1, y2 − l2)|| = ((y1 − l1)^2 + (y2 − l2)^2)^{1/2} < (r^2/2 + r^2/2)^{1/2} = r,

i.e. y ∈ Br((l1, l2)) = B. Thus β1 × β2 ⊆ B as required.
To prove the theorem is left as an exercise.

(5) Restriction of domain: Let v ∈ V, v ≠ 0. If lim_{x→a} f(x) = l then lim_{t→0} f(a + tv) = l, i.e. lim_{t→0} (f ◦ (a + vId))(t) = l.
Proof: Let ε > 0. Let B = Bε(l). Choose A = Bδ(0) ⊆ R, where δ = δ1/||v|| and δ1 is the positive real number such that ∀x ∈ Bδ1(a) ∩ dom f we have f(x) ∈ Bε(l).
Let y ∈ (f ◦ (a + vId))(A ∩ dom f), so y = (f ◦ (a + vId))(t) where t ∈ A (since dom(f ◦ (a + vId)) = A ∩ R = A). So |t| < δ = δ1/||v||, i.e. ||v|| |t| < δ1 ⇒ ||tv|| < δ1 by homogeneity. That is, ||a + tv − a|| < δ1, which means that a + tv ∈ Bδ1(a). Also t ∈ dom(f ◦ (a + vId)) ⇒ a + tv ∈ dom f.
Thus a + tv ∈ Bδ1(a) ∩ dom f. And so f(a + tv) ∈ Bε(l), i.e. lim_{t→0} f(a + tv) = l as required.

3.2 Continuity for Linear Maps:


Theorem 3.2.1 If a map L : V → W is linear then for each a ∈ V , L is
continuous at a ⇔ L is continuous at 0.

Theorem 3.2.2 If a map L : V → W is linear then L is continuous at 0 ⇔


there is a constant c such that for all x ∈ V ||L(x)|| ≤ c||x||.

Theorem 3.2.3 Every linear map L : Rm → Rn is continuous.



In fact one can replace Rn and Rm with any Banach space of finite dimension.
Proof of 3.2.1: We have L : V → W and a ∈ V. Suppose that L is continuous at a ∈ V; i.e.

∀ε > 0 ∃δ > 0 such that ∀x ∈ V with ||x − a|| < δ, ||L(x) − L(a)|| < ε.

Required to prove: L is continuous at 0. Note that every v ∈ V is of the form x − a for some x ∈ V (namely x = v + a) because V is a vector space. Also L(x) − L(a) = L(x − a), since L is linear.
Let ε > 0. Choose the real number δ > 0 for which ||x − a|| < δ implies ||L(x) − L(a)|| < ε. Since x ∈ V, we have x − a ∈ V, so let v = x − a. Then ||x − a|| < δ ⇒ ||v|| < δ and ||L(x) − L(a)|| < ε ⇒ ||L(x − a)|| = ||L(v)|| < ε;
i.e. ∀ε > 0, ∃δ > 0 such that ∀v ∈ V, ||v|| < δ ⇒ ||L(v)|| < ε. So ||L(v) − L(0)|| < ε, since L(0) = 0.
Thus L is continuous at 0.
Conversely suppose L is continuous at 0. We have for each x ∈ V, x − a ∈ V, i.e. x − a = v for some v ∈ V. (Also for v ∈ V, v + a ∈ V, so x = v + a ∈ V.)
Required to prove: L is continuous at a ∈ V.
Let ε > 0. Choose the real number δ > 0 for which ||v|| < δ implies ||L(v)|| < ε for all v ∈ V. Then v = x − a for some x ∈ V and ||v|| < δ ⇒ ||x − a|| < δ. Hence ||L(x − a)|| < ε.
Now ||L(x − a)|| = ||L(x) − L(a)|| since L is linear. Thus L is continuous at a.
Proof of 3.2.2: Suppose that L is continuous at 0. Let x ∈ V.
To find: c ∈ R such that ||L(x)|| ≤ c||x||. Consider B1(0) ⊆ W.
(So if ||L(x)|| < 1 when ||x|| < δ, then for x ≠ 0 the point δx/(2||x||) ∈ Bδ(0) and ||L(δx/(2||x||))|| = (δ/(2||x||)) ||L(x)|| < 1; and if x = 0, then L(x) = 0; so choose c = 2/δ.)
Let δ > 0 be the radius of an open ball centred on 0 such that ∀x ∈ Bδ(0) we have L(x) ∈ B1(0). (This is OK because L is assumed continuous at 0.) Choose c = 2/δ. Now if x = 0, then L(0) = 0 and so 0 = ||L(x)|| ≤ (2/δ)||x|| = 0.
Suppose x ≠ 0. Then δx/(2||x||) ∈ V, since V is a vector space. And ||δx/(2||x||)|| = (δ/(2||x||)) ||x|| = δ/2 < δ. And so ||L(δx/(2||x||))|| < 1, i.e. (δ/(2||x||)) ||L(x)|| < 1, i.e. ||L(x)|| ≤ c||x|| as required.
Conversely suppose that there exists c ∈ R such that ∀x ∈ V, ||L(x)|| ≤ c||x||.
Let Bε(0) be an open ball centred on 0 for some ε > 0. Choose δ = ε/c; then x ∈ B_{ε/c}(0) ⇒ ||x|| < ε/c ⇒ c||x|| < ε ⇒ ||L(x)|| < ε and so L(x) ∈ Bε(0).
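(As a numerical aside added here, Theorem 3.2.2 can be illustrated for a map L(x) = Ax: the constant c = (∑ a_ij^2)^{1/2}, which Exercise 11 below extracts from the Cauchy-Schwarz inequality, does satisfy ||Ax|| ≤ c||x|| on random samples. The matrix A below is just an example.)

```python
import numpy as np

rng = np.random.default_rng(2)
A = rng.normal(size=(3, 2))        # matrix of a linear map L : R^2 -> R^3
c = np.sqrt((A ** 2).sum())        # c = (sum_ij a_ij^2)^(1/2)

for _ in range(1000):
    x = rng.normal(size=2)
    assert np.linalg.norm(A @ x) <= c * np.linalg.norm(x) + 1e-12
print("||L(x)|| <= c ||x|| held on every sample, c =", c)
```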

To complete the chapter we examine the notion of uniform continuity. The


definition of continuity was given as follows
f : S ⊆ V → W is continuous on S ⇐⇒
∀y ∈ S
∀B = Bε (f (y)) (in the codomain)
∃A = Bδ (y) (in the domain)
such that f (A) ⊆ B
which is equivalent to
f : S ⊆ V → W is continuous on S ⇐⇒
∀y ∈ S
∀ε > 0
∃δ > 0
such that ∀x ∈ S for which x ∈ Bδ (y) we have f (x) ∈ Bε (f (y)).
Compare this to the statement of uniform continuity
f : S ⊆ V → W is uniformly continuous on S ⇐⇒
∀ε > 0
∃δ > 0
such that ∀x and ∀y ∈ S, with ||x − y||V < δ we have ||f (x) − f (y)||W < ε.
In the definition of continuity the δ depends on both ε and y, whilst in the
definition of uniform continuity the δ is independent of y. Consequently
Proposition 3.2.4 If f : S ⊆ V → W is uniformly continuous then f is
continuous on S.
The proof is left to you. It depends only on the fact that if there exists a δ
which can satisfy any choice of y, then for any fixed y we can certainly find a
suitable δ (the very same one will do for all points in the domain.)
The converse is false. Consider, for example, the function Id^{-1} : (0, 1] → R (i.e. x ↦ 1/x). It is continuous on (0, 1] but it is not uniformly continuous on (0, 1].
Proof: We leave the proof of continuity to the reader and show that Id^{-1} is not uniformly continuous. Choose ε = 1. Let δ > 0 and consider two cases: (i) δ ≥ 1 and (ii) 0 < δ < 1. In the first case (i) we can choose x = 1/2 and y = 1/4; then |x − y| = 1/4 < δ, whilst |1/x − 1/y| = 2 > 1 = ε.
And if 0 < δ < 1, then for n ≥ (δ + √(δ^2 + 8δ))/(2δ) (so that n^2 δ − nδ − 2 ≥ 0) choose x = 1/n and y = 1/n + δ/2. Then |x − y| = δ/2 < δ and |1/x − 1/y| = |n − 2n/(2 + nδ)| = |n^2 δ/(2 + nδ)| ≥ 1, by choice of n.
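(The argument above can be traced numerically; this is an added sketch, not part of the proof: for ε = 1 and any small δ, taking n a little beyond the stated threshold gives a pair x = 1/n, y = 1/n + δ/2 in (0, 1] with |x − y| < δ but |1/x − 1/y| ≥ 1.)

```python
import math

eps = 1.0
for delta in [0.5, 0.1, 0.01, 0.001]:
    # n chosen one step beyond the threshold, so n^2*delta - n*delta - 2 > 0 comfortably.
    n = math.ceil((delta + math.sqrt(delta ** 2 + 8 * delta)) / (2 * delta)) + 1
    x, y = 1.0 / n, 1.0 / n + delta / 2.0
    print(delta, abs(x - y) < delta, abs(1 / x - 1 / y) >= eps)   # True, True
```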
Exercise 3

1. Sketch each of the following sets and decide if they are open sets. Give brief
reasons for your decision.
(a) The open interval (−1, 1) in the normed vector space (R, | |).
(b) The closed interval [−1, 1] in R.
(c) The interval [0, ∞) in R.
(d) The line {(2x, x) : x ∈ R} in R2 .
(e) The square (0, 1] × [0, 1) in R2 .

(f) The set {(0, 0, 0)} ∪ (0, 1) × (0, 1) × (0, 1) in R3 .

2. Prove that each of the following sets is open in the appropriate vector space
with usual norm.

(a) {(x, y) ∈ R2 : (x − 1)2 + (y + 2)2 < 1}.


(b) {x ∈ R : |x − 3| < 1}.
(c) (0, ∞) in R.
(d) R\{0}.

3. By considering the collection of intervals {(−1/n, 1/n) : n ∈ N} show that the intersection of an infinite number of open sets need not be open.

4. Prove the following statements. In each normed vector space V ,

(a) each open ball is an open set.


(b) the subset V itself is open.
(c) the empty set ∅ is open.

5. Prove that in any normed vector space

(i) the union of an arbitrary collection of open sets is open and


(ii) the intersection of finitely many open sets is open.

6. Find the derivative of the function Id^2 at the point a ∈ R from first principles. What is the domain S of the Newton quotient (Id^2(x) − Id^2(a))/(x − a)? Is S ∪ {a} an open subset of R?

7. Verify that each of the following functions have the stated limits.

(i) Identity function: Id : V → V : limx→a Id(x) = limx→a x = a,


(ii) Constant function: c : V → W , c ∈ W , limx→a c(x) = limx→a c = c.

8. Let L : R^2 → R be given by L(x) = L(x, y) = (x^2 − y^2)/(x + y) if x + y ≠ 0, and L(x, y) = 2x otherwise.
Decide whether L is continuous at 0 = (0, 0) ∈ R^2. Justify your claim.

9. Assume that S ∪ {a} is an open subset of V and that V 6= {0}. Use the
fact that V must contain some point different from 0, to prove Lemma A:
that every open ball Br (a) in V contains some point different from a.

10. (i) In theorem (3.1.4), prove: translation of origin in the domain, (ii)
complete the proof of components and (iii) prove the statement on limit
of a composite.
(iv) Extend the statement on limit of a composite to: Let f : U → V ,
g : V → W and suppose that f is continuous. Then g is continuous if and
only if limx→a g ◦ f (x) = g(limx→a f (x)). (continuity via limits) Prove it.

11. In this question we will prove Theorem 3.2.3.
Let y = Ax where y ∈ R^n, x ∈ R^m and A is a real matrix with ijth entry a_ij. By applying the Cauchy-Schwarz inequality to y_i = ∑_{j=1}^m a_ij x_j (1 ≤ i ≤ n), show that there are constants b_i such that |y_i| ≤ b_i (x_1^2 + · · · + x_m^2)^{1/2}.
Deduce that there is a constant c such that ||Ax|| ≤ c||x||, where || || denotes the Euclidean norm on either space.
Hence prove Theorem 3.2.3.
12. Prove that an affine map is continuous if and only if its linear part is
continuous.

13. Once you get away from vector spaces of finite dimension, it is no longer
true that every linear map is automatically continuous. For example, let
C00 = {f ∈ F(N, R) : f (n) = 0 for all sufficiently large n}, where F(N, R)
is the vector space of all functions from N to R with the usual pointwise
operations. Another way of expressing the condition for f to belong to
C00 is that {n : f (n) 6= 0} is finite.
Then define L : C00 → R by L(f) = ∑_{n=1}^∞ n f(n).
Notice that the sum on the right hand side is in fact a finite sum. Now
(a) Check that L is linear.
(b) Show that {|L(f )| : ||f || ≤ 1} is not bounded above. Then deduce
that L is not continuous, from the following theorem:

For each continuous linear map L : V → W the set {||L(x)||W :


||x||V ≤ 1} is bounded above.

[Hint: consider the functions e_r defined by e_r(n) = 1 if n = r and e_r(n) = 0 otherwise, and use the norm ||f|| := sup{|f(n)| : n ∈ N}. Show that for all r ∈ N, ||e_r|| ≤ 1. On the other hand {|L(e_r)| : r ∈ N} is not bounded above.]

14. Prove Proposition 3.2.4.


15. The function f : (1/2, ∞) → R given by f(x) = 1/(2x − 1) is continuous on (1/2, ∞); prove that it is not uniformly continuous on (1/2, ∞).
Chapter 4

Spaces of Linear Maps

4.1 The vector space L(V, W )

L(V, W ) is the set of all continuous linear maps L : V → W (when V, W are


Euclidean spaces all linear maps are continuous, but this is not true for general
vector spaces.)
Define the vector addition by:
(L + M )(x) = L(x) + M (x), for all L, M ∈ L(V, W ), ∀x ∈ V
and the scalar multiplication by:
(λL)(x) = λL(x), for all L ∈ L(V, W ), i.e x ∈ V and ∀λ ∈ R.
[When studying the whole space L(V, W ) it is often best to think of L ∈ L(V, W )
as a point rather than as an arrow L : V → W .]

Theorem 4.1.1 L(V, W ) is a vector space with respect to the usual operations.

Proof: The zero vector in L(V, W ) is the zero map 0 : V → W : v 7→ 0. We


must show that this map is a continuous linear map.
Let u, v ∈ V and λ ∈ R; then 0(u + λv) = 0 and 0(u) + λ0(v) = 0. Thus 0 is a linear map. Let a ∈ V. To show that 0 is continuous at a, let B = Bε(0) ⊆ W. Choose A = Bε(a) ⊆ V. Now 0(A) = {0}, since 0(v) = 0 for all v ∈ V. And since every ball contains its centre we have 0(A) = {0} ⊆ Bε(0). Thus 0 is a continuous map. So 0 is the zero vector in L(V, W).
Let L, M ∈ L(V, W ), we need to show that L + M , λL ∈ L(V, W ). By the
theorem on limits (3.1.4)–linearity, we have that L + M and λL are continuous;
since L, M are continuous being in L(V, W ).
To show: L + M and λL are linear.


Let x, y ∈ V , µ ∈ R. Then

(L + M )(x + µy) = L(x + µy) + M (x + µy)


by definition of vector addition in L(V, W )
= L(x) + µL(y) + M (x) + µM (y)
since L, M are linear
= L(x) + M (x) + µL(y) + µM (y)
W is a vector space
= (L + M )(x) + µ(L + M )(y)
definition of vector addition and
scalar multiplication in L(V, W )

Hence L + M is linear.
Similarly
(λL)(x + µy) = λ(L(x + µy))
= λ(L(x) + µL(y))
= λL(x) + λµL(y)
and so λL is linear.
Finally dom(L + M ) = V = dom(λL) and codom(L + M ) = W = codom(λL).
Thus L + M and λL are in L(V, W ).
From 3.2.3 we have that every linear map from Rm to Rn is continuous. Hence

L(Rm , Rn ) = {L : Rm → Rn : L is linear}.

• Given a linear map L : R2 → R, we know from earlier that there are


unique a, b ∈ R: for all x, y ∈ R, L(x, y) = ax + by; i.e. L = aId1 + bId2 .
Thus the pair of functions {Id1 , Id2 } is a basis for L(R2 , R) over R. It is
a 2-dimensional vector space over R.

• Consider a linear map L : R2 → R2 , we know that

∃a, b, c, d ∈ R : for all x, y ∈ R, L(x, y) = (ax + by, cx + dy);

i.e. L = (aId1 + bId2 , cId1 + dId2 ) = (a, c)Id1 + (b, d)Id2 =


(a, 0)Id1 + (0, c)Id1 + (b, 0)Id2 + (0, d)Id2 . Thus the set

{(1, 0)Id1 , (0, 1)Id1 , (1, 0)Id2 , (0, 1)Id2 } = {(Id1 , 0), (0, Id1 ), (Id2 , 0), (0, Id2 )}

is a basis for L(R2 , R2 ).

To make meaningful ideas of closeness and approximation in the vector


space L(V, W ), we introduce a norm. (This generalises from L = mId ∈ L(R, R)
and ||L|| = |m| see Exercise 1.)
We could define || || : L(V, W) → R to be the map which assigns to each L ∈ L(V, W) the value sup_{||x||≤1} ||L(x)|| (:= sup{||L(x)||_W : ||x||_V ≤ 1}).
But before we accept this definition we should check that sup_{||x||≤1} ||L(x)|| exists. We have

Theorem 4.1.2 For each continuous linear map L : V → W , the set {||L(x)|| :
||x|| ≤ 1} is bounded above.

Notice that by theorem 3.2.2, {||L(x)|| : ||x|| ≤ 1} ⊆ R, is bounded above and


it's not empty, since ||L(0)|| is in the set (0 ∈ V = dom L and L(0) = 0 since
L is linear.(*)) Thus by the least upper bound property of R, sup||x||≤1 ||L(x)||
exists.
(*) Notice that if L were defined on a subset S of V not containing 0, then
we just need to establish that there is another point in {||L(x)|| : ||x|| ≤ 1}.
But by lemma (A) we have a point y 6= 0 in Bδ (0) where δ is large enough
y
so that Bδ (0) ∩ S 6= ∅ and ||y|| ∈ B1 (0), thus there exists x:||x|| ≤ 1 with
||L(x)|| ∈ {||L(x)|| : ||x|| ≤ 1}.
To prove 4.1.2: Let L : V → W be a continuous linear map. Let c ∈ R be the
real number for which ||L(x)|| ≤ c||x|| (which exists by Theorem (3.2.2)).
Let r ∈ {||L(x)||W : ||x||V ≤ 1}. So r = ||L(x)||W for some x ∈ V with
||x||V ≤ 1. And r ≤ c||x|| ≤ c.
i.e. ∀r ∈ {||L(x)|| : ||x|| ≤ 1}, r ≤ c. Thus {||L(x)||W : ||x||V ≤ 1} is bounded
above.

Definition 4.1.3 The map || || : L(V, W ) → R defined by ||L|| = sup{||L(x)|| :


||x|| ≤ 1} defines the supremum norm on L(V, W ).

(It is an exercise to verify that the map is a norm.)

Theorem 4.1.4 For each continuous linear map L ∈ L(V, W ),


∀x ∈ V , ||L(x)||W ≤ ||L||L(V, W ) ||x||V .

Proof: Let L : V → W ∈ L(V, W ). We’ll consider the two cases x = 0 and


x 6= 0.
If x = 0 then

||L(x)||W = ||L(0)||W = ||(0)||W since a linear map sends the zero to zero
= 0 by positivity

Since || || is a norm we have ||L||L(V, W ) ≥ 0.


Hence ||L||L(V, W ) ≥ ||L(x)||W .
Next suppose x ≠ 0; then ||x/||x||||_V = (1/||x||) ||x||_V = 1 by homogeneity of || ||, and so x/||x|| lies in the closed unit ball {v ∈ V : ||v|| ≤ 1}. Furthermore

||L(x/||x||)||_W = ||(1/||x||) L(x)||_W

by linearity, which equals (1/||x||) ||L(x)||_W by homogeneity.

Now for all v ∈ V with ||v|| ≤ 1, we have

||L(v)|| ≤ ||L||L(V, W ) since ||L||L(V, W ) is an upper bound of


{||L(x)||W : ||x||V ≤ 1}.
And so (1/||x||) ||L(x)||_W ≤ ||L||_{L(V,W)}, i.e. ||L(x)||_W ≤ ||L||_{L(V,W)} ||x||.
Thus in both cases we have the required result
i.e. ∀x ∈ V , ||L(x)||W ≤ ||L||L(V, W ) ||x||.
To think a little about the geometry we mention an interpretation for ||L|| in
the case L ∈ L(R2 , R2 ). The key fact is that such a map L sends the unit circle
in the domain into a (possibly skewed) ellipse in the codomain (or exceptionally
into a line or a point).

The maximum distance of L(x) from the origin occurs at a vertex of the ellipse
i.e. ||L|| := sup||x||≤1 ||L(x)|| = maximum value of ||L(x)|| as x moves around
the unit circle= length of the semi-major axis of the ellipse.
More generally ||L|| is the radius of the image L(B1 (0)) of the unit ball of V i.e.
||L|| is the radius of the smallest closed ball in W with centre 0, which contains
the image of the closed unit ball in V .
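(A short sketch of this geometric picture, added here as an illustration: sampling the unit circle estimates sup_{||x||≤1} ||L(x)||, and for a 2 × 2 matrix this agrees with the largest singular value, i.e. the length of the semi-major axis of the image ellipse. The matrix below is just an example.)

```python
import numpy as np

A = np.array([[3.0, 1.0],
              [0.0, 2.0]])          # matrix of some L in L(R^2, R^2)

theta = np.linspace(0.0, 2.0 * np.pi, 100_000)
circle = np.vstack([np.cos(theta), np.sin(theta)])   # points x with ||x|| = 1
print(np.linalg.norm(A @ circle, axis=0).max())      # sampled sup of ||L(x)||
print(np.linalg.norm(A, 2))                          # largest singular value of A
```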
Exercise 4

1. State which of the following projection functions from R2 to R belong to


L(R2 , R).
Id1 , Id1 + Id2 , Id21 , Id1 · Id2 .

2. Write down a basis for the vector space L(R2 , R2 ).


3. Show that || || : V → R : v 7→ ||v|| is continuous at each point of V , where
V is a normed vector space.
4. Write down a basis for the vector space L(Rm , R).
5. In each case find the norm of the linear map L directly from the definition:
(a) L : R → R with L(x) = −2x
(b) L : R2 → R with L(x, y) = x
(c) L : R2 → R2 with L(x, y) = (3x, y).
6. Prove that the map defined in 4.1.3 is a norm on L(V, W).
7.
(a) Let L and M be two continuous linear maps such that the composite M ◦ L is defined. By quoting appropriate theorems show that M ◦ L is a continuous linear map and prove that
||M ◦ L|| ≤ ||M|| ||L||.
(b) Prove the following statement (a numerical sanity check of the bound is sketched after this exercise list):
If L ∈ L(Rm, Rn) with matrix given by

[L] = [ a11  a12  . . .  a1m ]
      [  ..   ..           .. ]
      [ an1  an2  . . .  anm ],

then ||L|| ≤ √( Σ_{i=1}^{n} Σ_{j=1}^{m} aij² ).

8. Find an upper bound for ||L|| where L : R2 → R3 with
L(x, y) = (2x + y, x + 3y, 3x − 2y).
9. By a product V × W → U we mean a map ∗ : V × W → U | (v, w) ↦ v ∗ w, satisfying the following:
P1. If v, v′ ∈ V and w ∈ W, then (v + v′) ∗ w = v ∗ w + v′ ∗ w, and if v ∈ V and w, w′ ∈ W, then v ∗ (w + w′) = v ∗ w + v ∗ w′.
P2. If c ∈ R, then (cv) ∗ w = c(v ∗ w) = v ∗ (cw).
P3. For all v, w we have ||v ∗ w||_U ≤ ||v||_V ||w||_W.
Let V, W be normed vector spaces. Show that the evaluation map
L(V, W) × V → W given by (L, v) ↦ L(v)
is a product.
10. Prove the statement: if L : Rm → Rn is a linear map, then it is uniformly continuous.
[Hint: first show that ∃c such that for all x, y ∈ Rm we have
||L(x) − L(y)|| ≤ c||x − y||.
This says that L satisfies a Lipschitz inequality with Lipschitz constant c.]
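The following is a small numerical sanity check of the bound in question 7(b) (a sketch only, assuming NumPy and the Euclidean norms on domain and codomain; the random matrices are merely illustrative): the operator norm never exceeds the square root of the sum of the squared entries.

import numpy as np

rng = np.random.default_rng(0)
for _ in range(5):
    A = rng.normal(size=(3, 2))             # the matrix of a linear map L : R^2 -> R^3
    op_norm = np.linalg.norm(A, 2)          # ||L|| (largest singular value)
    frobenius = np.sqrt((A ** 2).sum())     # sqrt of the sum of a_ij^2
    print(op_norm <= frobenius + 1e-12)     # should always print True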
Chapter 5

Tangency Of Maps

The next step is to understand tangency of maps between normed vector spaces.
This is our main application of the notion of limit.
In elementary calculus, a function f : R → R is said to be differentiable at a
point a of its domain, if there is a line which is tangent to the graph of the
function at the point (a, f (a)). This line is the graph of some affine function
A : R → R.
Recall linear functions L : R → R | x ↦ mx, x ∈ R, and
affine functions A : R → R | x ↦ mx + c, m, c ∈ R.
For a pair of functions f : R → R and g : R → R, if g is tangent to f at a then the ratio |f(x) − g(x)| / |x − a| is very small as x approaches a.
[Figure: the graphs of f and g near the point a, with the values f(x) and g(x) marked above a nearby point x.]
Definition 5.0.5 A map g : A ⊂ V → W is tangent to a map f : B ⊆ V → W at a point a ∈ A ∩ B if and only if
f(a) = g(a) and lim_{x→a} ||f(x) − g(x)|| / ||x − a|| = 0.

Examples.
(1) Id² : R → R : x ↦ x² and 0 : R → R : x ↦ 0 are tangent at 0 ∈ R.
(2) Id³ : R → R : x ↦ x³ and 3Id − 2 : R → R : x ↦ 3x − 2 are tangent at 1 ∈ R.
(3) Id⁴ : R → R : x ↦ x⁴ and −Id² : R → R : x ↦ −x² are tangent at 0 ∈ R.
It's an exercise to verify these claims.
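As a rough numerical illustration (not a proof, and assuming Python is available) of example (2): the quotient |f(x) − g(x)| / |x − a| shrinks as x approaches a = 1, where f(x) = x³ and g(x) = 3x − 2.

f = lambda x: x ** 3
g = lambda x: 3 * x - 2
a = 1.0
for h in [0.1, 0.01, 0.001, 0.0001]:
    x = a + h
    print(h, abs(f(x) - g(x)) / abs(x - a))   # the quotients tend to 0 with h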

5.1 Tangency and Affine Maps


If two straight lines are tangent at some point then it is intuitively obvious that they must be identical.
Theorem 5.1.1 If two affine maps are tangent at some point then the two maps are identical.
We will use the following lemma to prove the theorem.
Lemma 5.1.2 If L : V → W is a linear map which is tangent to the zero map from V to W at 0 ∈ V then L itself must be the zero map.
Proof of Theorem 5.1.1:
Let A : V → W be the affine map given by
A(x) = cA + LA(x)
and let B : V → W be the affine map given by
B(x) = cB + LB(x).
Suppose that A is tangent to B at the point a ∈ V.
Thus (LB − LA)(a) = cA − cB and
lim_{x→a} ||A(x) − B(x)|| / ||x − a|| = 0.
Now
||A(x) − B(x)|| = ||LA(x) − LB(x) + (cA − cB)||
  = ||(LA − LB)(x) + (LB − LA)(a)||
  = ||LA(x − a) − LB(x − a)||.
Thus
0 = lim_{x→a} ||A(x) − B(x)|| / ||x − a||
  = lim_{x→a} ||LA(x − a) − LB(x − a)|| / ||x − a||
  = lim_{x→a} ||(LA − LB)(x − a)|| / ||x − a||,
and setting h = x − a this says
lim_{h→0} ||(LA − LB)(h)|| / ||h|| = 0.
In addition (LA − LB)(0) = 0, since LA − LB is a linear map, so LA − LB agrees with the zero map at 0.
Thus LA − LB is a linear map which is tangent to the zero map at zero.
Now by the lemma this means that LA − LB = 0, i.e. LA = LB, i.e. LA(x) = LB(x) for all x ∈ V. Finally, since 0 = (LB − LA)(a) = cA − cB, we have cA = cB. Thus A = B as required.
The proof of the lemma is an exercise.
Exercise 5
1. Give a pair of functions f : R → R and g : R → R such that g is not tangent to f at 0 and yet lim_{x→0} |f(x) − g(x)| / |x| = 0. Illustrate with a sketch.
2. The three examples given earlier were:
Id² : R → R : x ↦ x² and 0 : R → R : x ↦ 0, tangent at 0 ∈ R;
Id³ : R → R : x ↦ x³ and 3Id − 2 : R → R : x ↦ 3x − 2, tangent at 1 ∈ R;
Id⁴ : R → R : x ↦ x⁴ and −Id² : R → R : x ↦ −x², tangent at 0 ∈ R.
Verify that these examples do indeed satisfy the definition of tangency.
3. Suppose that f : R → R and g : R → R are related by the formula f(x) = g(x) + (x − a)^n. For which values of n > 0 (n an integer) is f tangent to g at a?
4. Verify that the relation ". . . is tangent to . . . at a" is an equivalence relation on the functions which map neighbourhoods of the point a in V into W. (Recall that an equivalence relation "∼" on a set S is a relation such that for all x, y and z in S:
(i) x ∼ x; (ii) x ∼ y ⇒ y ∼ x; (iii) x ∼ y and y ∼ z ⇒ x ∼ z.)
5. Suppose that a continuous map g : U → W is tangent at a to a map f : U → W. Prove that f is also continuous at a.
6. Let f : R → R be the absolute value function and let g : R → R be an affine map for which g(0) = 0. Show that it is not possible for g to be tangent to f at 0.
7. Prove lemma 5.1.2. (Hint: Given a non-zero vector h, there is a non-zero vector v such that h = tv for some scalar t. Furthermore h → 0 as t → 0.)
Chapter 6

Concept of Derivative

The basic idea of a derivative for maps f : V → W is simple.


A function is differentiable at a point if there is an affine map which approxi-
mates the function very closely near to this point.
The derivative will then be the linear part of that affine map.
In the previous exercises you proved that there is no affine map which is tangent
to the absolute value function abs : R → R at 0 ∈ R. Equivalently we say that
abs : R → R is not differentiable at 0 ∈ R.

Definition 6.0.3 Let f : U ⊆ V → W, with U an open subset of V, V, W normed vector spaces and a ∈ U.
f is differentiable at a ∈ U ⇔ there exists a continuous affine map A : V → W such that A is tangent to f at a; i.e.
f(a) = A(a) and lim_{x→a} ||f(x) − A(x)|| / ||x − a|| = 0.
Geometrically, for a function f : R2 → R, the graph of f, i.e. {(x, f(x)) : x = (x, y) ∈ R2}, is a surface in R3. The map f is differentiable at a ∈ R2 if there exists a continuous affine map A : R2 → R with f(a) = A(a) and lim_{x→a} ||f(x) − A(x)|| / ||x − a|| = 0; that is to say, if there is a plane which is tangent to f at a.
Notice that in the example sketched in lectures there does not exist a plane which is tangent to f at 0.

To prove that a given function is differentiable at a point a in its domain, you must find a suitable affine map and verify that the map you have chosen is tangent to f at a.
Example
Consider f : R2 → R with f(x, y) = x² + y² + 1. I want to prove that f is differentiable at (0, 0) ∈ R2.
From the diagram an appropriate choice of A is
A : R2 → R, given by A(x, y) = 1.
To Prove: A is tangent to f at (0, 0).
Proof: We have A(0, 0) = 1 = f(0, 0).
Now
lim_{(x,y)→(0,0)} ||f(x, y) − A(x, y)|| / ||(x, y) − (0, 0)||
  = lim_{(x,y)→(0,0)} |x² + y² + 1 − 1| / ||(x, y)||
  = lim_{(x,y)→(0,0)} (x² + y²) / √(x² + y²)
  = lim_{(x,y)→(0,0)} √(x² + y²) = 0.
Hence f is differentiable at (0, 0) ∈ R2.
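A quick numerical check of this limit (an illustration only, assuming NumPy; the sample points along the diagonal are an arbitrary choice): the tangency quotient for f(x, y) = x² + y² + 1 and A(x, y) = 1 goes to zero as (x, y) approaches (0, 0).

import numpy as np

f = lambda x, y: x ** 2 + y ** 2 + 1.0
A = lambda x, y: 1.0
for r in [0.1, 0.01, 0.001]:
    x, y = r / np.sqrt(2.0), r / np.sqrt(2.0)            # a point at distance r from (0, 0)
    print(r, abs(f(x, y) - A(x, y)) / np.hypot(x, y))    # equals r, so it tends to 0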


Suppose f is differentiable at a point a ∈ V; i.e. there exists an affine map A : V → W such that A is tangent to f at a; i.e. A(x) = c + L(x) for some linear map L : V → W and
f(a) = A(a) = c + L(a), whilst lim_{x→a} ||f(x) − A(x)|| / ||x − a|| = 0.
By translation of the origin, this says that
lim_{h→0} ||f(a + h) − A(a + h)|| / ||h|| = 0.
From which we have
lim_{h→0} ||f(a + h) − f(a) − L(h)|| / ||h|| = 0,
since
A(a + h) = c + L(a + h) = A(a) + L(h) = f(a) + L(h).
And so we have an alternative characterisation of differentiability.
Proposition 6.0.4 f : U ⊆ V → W is differentiable at a ∈ U if and only if there exists a continuous linear map L : V → W with
lim_{h→0} ||f(a + h) − f(a) − L(h)|| / ||h|| = 0.

Definition 6.0.5 (Fréchet Derivative)
For a map f : U ⊆ V → W which is differentiable at a point a ∈ U, we define its Fréchet derivative at a to be the unique continuous linear map La : V → W satisfying
lim_{h→0} ||f(a + h) − f(a) − La(h)|| / ||h|| = 0.

Notation: We denote the Fréchet derivative of f at a by Df(a). Thus Df(a) = La : V → W.
In fact from now on we will drop the adjective Fréchet, so derivative will mean Fréchet derivative; otherwise we will use the phrase the ordinary (usual) derivative.
That is, when V = W = R we have two sorts of derivative:
(a) the ordinary (usual) derivative
f′(a) = lim_{h→0} (f(a + h) − f(a)) / h
(it is a real number), and
(b) the (Fréchet) derivative Df(a), which satisfies
lim_{h→0} ||f(a + h) − f(a) − Df(a)(h)|| / ||h|| = 0;
it is a linear map from R to R. In fact we saw earlier that Df(a)(h) = f′(a)h.

Finding Df(a).
Example 1.
Let f : V → W be a continuous linear map. We will show that this map is differentiable at each point a ∈ V and we will find Df(a).
Let a ∈ V.
Choose La = f.
Then (i) La : V → W is a continuous linear map (by assumption) and (ii)
lim_{h→0} ||f(a + h) − f(a) − La(h)|| / ||h|| = lim_{h→0} ||f(a) + f(h) − f(a) − f(h)|| / ||h||,
since f is linear and La = f; and the right hand side is clearly 0. Thus Df(a) = f for every a ∈ V.


Example 2.
Let f : R2 → R with f(x, y) = x² + y². We will show that f is differentiable at each point (a, b) ∈ R2 by finding the linear map Df(a, b).
Let a = (a, b) and h = (h, k) ∈ R2.
We have to find a linear map L such that
lim_{h→0} ||f(a + h) − f(a) − L(h)|| / ||h|| = 0.
Now
f(a + h) − f(a) = f(a + h, b + k) − f(a, b) = (a + h)² + (b + k)² − a² − b² = 2ah + 2bk + h² + k²,
where 2ah + 2bk is the part which is linear in h and k. So choose La(h) = 2ah + 2bk. You should check that this is a linear map; then by theorem 3.2.3 it is continuous.
Also
lim_{h→0} ||f(a + h) − f(a) − La(h)|| / ||h|| = lim_{h→0} |h² + k²| / ||(h, k)|| = lim_{h→0} √(h² + k²) = 0.
Thus our choice of La satisfies the definition of the derivative of f at a.
So Df(a)(h) = 2ah + 2bk.
Note that in this example the derivative depends upon the point (a, b) = a
unlike in the first example.
We could write Df(a) = 2a·Id1 + 2b·Id2, where Id1 : R2 → R | (x, y) ↦ x and Id2 : R2 → R | (x, y) ↦ y.
So that, evaluating at h = (h, k), we have
Df(a)(h) = (2a·Id1 + 2b·Id2)(h, k) = 2ah + 2bk.
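The following is a numerical illustration (a sketch only, assuming NumPy; the base point a = (1, 2) and direction are arbitrary choices): the remainder quotient ||f(a + h) − f(a) − Df(a)(h)|| / ||h|| for f(x, y) = x² + y² shrinks with ||h||.

import numpy as np

f = lambda v: v[0] ** 2 + v[1] ** 2
a = np.array([1.0, 2.0])
Df_a = lambda h: 2 * a[0] * h[0] + 2 * a[1] * h[1]    # Df(a)(h) = 2a h1 + 2b h2

for t in [1e-1, 1e-2, 1e-3]:
    h = t * np.array([0.6, -0.8])                     # ||h|| = t
    quotient = abs(f(a + h) - f(a) - Df_a(h)) / np.linalg.norm(h)
    print(t, quotient)                                # decreases roughly like t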

Example 3.
Let f : R2 → R2 be given by f(x, y) = (x² + y² + 1, xy + 1). Let U be an open rectangle in R2, a = (a1, a2) a point in U, h = (h1, h2) ∈ R2. Then
f(a + h) − f(a) = (2a1h1 + 2a2h2 + h1² + h2², a1h2 + a2h1 + h1h2),
of which (2a1h1 + 2a2h2, a1h2 + a2h1) is linear in h. So choosing Df(a) : R2 → R2 defined by Df(a)(h) = (2a1h1 + 2a2h2, a1h2 + a2h1), we have a continuous linear map from R2 to R2. In addition,
||f(a + h) − f(a) − Df(a)(h)|| / ||h|| = ||(h1² + h2², h1h2)|| / ||h||.
And ||(h1² + h2², h1h2)|| = √((h1² + h2²)² + (h1h2)²) ≤ √(2(h1² + h2²)²) = √2 ||h||². So
lim_{h→0} ||f(a + h) − f(a) − Df(a)(h)|| / ||h|| ≤ lim_{h→0} √2 ||h||² / ||h|| = lim_{h→0} √2 ||h|| = 0.
So the Fréchet derivative of f at a is given by Df(a)(h1, h2) = (2a1h1 + 2a2h2, a2h1 + a1h2).


Example 4.
Assume that the set Mn×n(R) of all n × n matrices with real entries can be made into a normed vector space in such a way that
∀A, B ∈ Mn×n(R) we have ||AB|| ≤ ||A|| ||B||.
Assume furthermore that ∀A, ||A^T|| = ||A||.
Let f : Mn×n(R) → Mn×n(R) be the map with f(X) = XX^T for X ∈ Mn×n(R).
We will show that f is differentiable at each point A ∈ Mn×n(R) and we'll find Df(A).
Let A, H ∈ Mn×n(R); we have
f(A + H) − f(A) = (A + H)(A + H)^T − AA^T = AH^T + HA^T + HH^T
(remembering that (A + H)^T = A^T + H^T).
So choose LA(H) = AH^T + HA^T. You check that it is linear.
To prove that LA is continuous, we will show that it is continuous at 0 and then, by theorem 3.2.1, it is continuous everywhere.
Let A ∈ Mn×n(R), and suppose A ≠ 0 (if A = 0 then LA is the zero map, which is certainly continuous). Let ε > 0 and let Bε(0) be the open ball centred on 0 (the zero matrix) in the codomain.
Choose δ = ε/(2||A||) and let Bδ(0) be the open ball centred on 0 in the domain.
Let H ∈ Bδ(0), so ||H|| < δ = ε/(2||A||). Now ||LA(H)|| = ||AH^T + HA^T|| ≤ ||AH^T|| + ||HA^T|| ≤ 2||A|| ||H|| < ε. Thus LA is continuous at zero and so continuous on Mn×n(R).
Finally
lim_{H→0} ||f(A + H) − f(A) − LA(H)|| / ||H|| = lim_{H→0} ||HH^T|| / ||H|| ≤ lim_{H→0} ||H|| = 0.
Thus LA satisfies the definition of derivative; i.e. Df(A)(H) = AH^T + HA^T.
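A numerical sketch of this example (assuming NumPy and using the Frobenius norm, which is submultiplicative and invariant under transposition, so it satisfies the assumptions; the matrices below are random, purely for illustration): the remainder for f(X) = XXᵀ with Df(A)(H) = AHᵀ + HAᵀ is exactly HHᵀ, and the quotient shrinks with ||H||.

import numpy as np

rng = np.random.default_rng(1)
A = rng.normal(size=(3, 3))
H0 = rng.normal(size=(3, 3))
f = lambda X: X @ X.T

for t in [1e-1, 1e-2, 1e-3]:
    H = t * H0
    remainder = f(A + H) - f(A) - (A @ H.T + H @ A.T)        # equals H Hᵀ
    print(t, np.linalg.norm(remainder) / np.linalg.norm(H))  # shrinks like t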
(*) To close the chapter, notice that for a function f : V → W which is differ-
entiable at a ∈ V , the continuous affine map A : V → W which is tangent to f
at a is given by
A(x) = f (a) + Df (a)(x − a).
Exercise 6

1. Let V and W be normed vector spaces. In each case show that the map
is differentiable at each point a of its domain, by finding an affine map
which is tangent to the given map at a:
(a) a constant map c : V → W
(b) a continuous affine map B : V → W and
(c) a continuous linear map L : V → W.
2. Deduce from question 1 and other results that every linear map and every
affine map from Rm to Rn is differentiable at every point of its domain.
For each of the given maps in question 1, find its Fréchet derivative at a.
3. Let f : R2 → R with f (x, y) = (x − 1)2 + (y − 2)2 + 3. Sketch the graph
of f and hence guess an affine map which is tangent to f at (1, 2). Hence
prove that f is differentiable at the point (1, 2).

4. Let f : R2 → R with f(x, y) = √(x² + y²). Sketch the graph of f. Prove that f is not differentiable at the point (0, 0).


5. Prove that if f : V → W is differentiable at a point a ∈ V it must be
continuous there.
6. Let V and W be normed vector spaces and let f : V → W be differentiable
at a ∈ V . Let h ∈ V . In each case decide whether the statement is always
true:
a Df (a)(h) is a linear map from V to W
b Df (a) ∈ W
c Df (a)(h) ∈ W
d Df is a continuous map from V to W .
7. Let f : R → R with f (x) = 3x2 + 2. Prove that f is differentiable at the
point −1 and find its Fréchet derivative Df (−1) at −1.
8. Let f : R2 → R with f (x, y) = x2 + y. Prove that f is differentiable at
the point (0, 0) and find its Fréchet derivative Df (0, 0) : R2 → R, at this
point.
9. Show in each case that f is differentiable at each a = (a, b) ∈ R2 :
(a) f : R2 → R with f (x, y) = x + y 3 ,
(b) f : R2 → R2 with f (x, y) = (x2 + y 2 , x2 − y 2 ),
(c) f : R2 → R3 with f (x, y) = (2xy, x2 + y 2 , x2 − y 2 ).
10. Prove statement (*), the last statement of the chapter.
11. For each of (a), (b) and (c) in question 9, write down an affine map which
is tangent to the given map at (1, 1).
Chapter 7

Differentiation of
Composites

Let U, V and W be normed vector spaces; let f : U → V and g : V → W be


composable maps. Have in mind the following picture
[Diagram: a point a in U is sent by f to f(a) in V, which is sent by g to g(f(a)) in W; the composite g ◦ f maps U directly to W.]

and remember that by saying that two maps are tangent at a point we mean
(roughly) that one map is a good approximation to the other map near that
point.

Theorem 7.0.6 Consider maps f : U → V and g : V → W and suppose that


there are continuous affine maps A : U → V and B : V → W such that A is
tangent to f at a and B is tangent to g at f (a).
The composite map B ◦ A is then a continuous affine map which is tangent to
g ◦ f at the point a.

Proof: That the composite of two affine maps is an affine map follows from the
result that the composite of two linear maps is a linear map.
Since g(f (a)) = B(f (a)) (B is tangent to g at f (a))
and B(f (a)) = B(A(a)) (A is tangent to f at a)
we have g(f(a)) = B(A(a)). It remains to prove that
lim_{h→0} ||(g ◦ f)(a + h) − (B ◦ A)(a + h)|| / ||h|| = 0.
The strategy is to find a quantity not less than the given quotient and use the squeeze principle. The steps are:
(1) define two functions ε and η (satisfying (2)) and show that they are continuous;
(2) ε and η are so chosen to satisfy
(g ◦ f)(a + h) = (B ◦ A)(a + h) + ||h|| LB(ε(h)) + ||LA(h) + ||h||ε(h)|| η(LA(h) + ||h||ε(h)),
where LB, LA are the linear parts of B and A respectively;
(3) prove that lim_{h→0} LB(ε(h)) = 0;
(4) prove that lim_{h→0} ||LA(h) + ||h||ε(h)|| = 0 and that
lim_{h→0} ||η(LA(h) + ||h||ε(h))|| · ||LA(h) + ||h||ε(h)|| / ||h|| = 0.
The result will follow by the squeeze principle, since we have
0 ≤ ||(g ◦ f)(a + h) − (B ◦ A)(a + h)|| / ||h||
  ≤ ||LB(ε(h))|| + ||LA(h) + ||h||ε(h)|| · ||η(LA(h) + ||h||ε(h))|| / ||h||,
and by (3) and (4) the limit of both summands on the right is zero; thus
lim_{h→0} ||(g ◦ f)(a + h) − (B ◦ A)(a + h)|| / ||h|| = 0.

(1) Define ε : U → V and η : V → W by
ε(h) = (f(a + h) − A(a + h)) / ||h|| if h ≠ 0, and ε(0) = 0,
for each h ∈ U, and respectively
η(k) = (g(f(a) + k) − B(f(a) + k)) / ||k|| if k ≠ 0, and η(0) = 0,
for each k ∈ V. The functions ε and η are clearly continuous away from 0, so we just need to show that ε : U → V is continuous at 0 ∈ U and η : V → W is continuous at 0 ∈ V.
Proof: Because f is tangent to A at a we know that
lim_{x→a} ||f(x) − A(x)|| / ||x − a|| = 0.
With h = x − a and translation of the origin this says
lim_{h→0} ||f(a + h) − A(a + h)|| / ||h|| = 0.   (†)
But for h ≠ 0, (†) says that
lim_{h→0} ε(h) = 0;
i.e. given any ε′ > 0, with B_{ε′}(0) the open ball in V centred on 0, we can choose δ > 0, the radius of an open ball in U centred on 0, such that for all h ∈ Bδ(0) we have ε(h) ∈ B_{ε′}(0). In addition, if h = 0 then ε(h) = ε(0) = 0 and so again ε(h) ∈ B_{ε′}(0). Thus ε is continuous at 0.
By a similar argument (using the tangency of B to g at f(a)) we find that η is continuous at 0 ∈ V.
(2) We show that ε and η defined in (1) satisfy the given equation.
Let
k = LA(h) + ||h||ε(h)
  = LA(h) + f(a + h) − A(a + h)   by definition of ε
  = f(a + h) − f(a)               since A(a + h) = f(a) + LA(h).
Thus the equation in (2) can be rewritten as
(g ◦ f)(a + h) = (B ◦ A)(a + h) + ||h|| LB(ε(h)) + ||k|| η(k).
Now f(a) + k = f(a + h), ||h|| LB(ε(h)) = LB(||h||ε(h)) = LB(f(a + h) − A(a + h)) by linearity of LB, and ||k|| η(k) = g(f(a + h)) − B(f(a + h)), so on the right we have
B(A(a + h)) + LB(f(a + h) − A(a + h)) + g(f(a + h)) − B(f(a + h))
= B(A(a + h)) − LB(A(a + h)) + LB(f(a + h)) − B(f(a + h)) + g(f(a + h))
= g(f(a + h)),
since B − LB is the constant map with value cB, so the first four terms cancel in pairs.


(3) Next we prove that lim_{h→0} LB(ε(h)) = 0.
Since ε and LB are continuous, by continuity via limits ((7) of Theorem 3.1.4) we have
lim_{h→0} LB(ε(h)) = LB(lim_{h→0} ε(h)) = LB(0) = 0.

(4) First we'll show that
||k|| ≤ c||h|| + ||ε(h)|| ||h|| = ||h||(c + ||ε(h)||),
where k = LA(h) + ||h||ε(h).
By the triangle inequality
||k|| ≤ ||LA(h)|| + || ||h||ε(h) ||.
Now LA is continuous, since A is continuous and LA differs from A by a constant, and so LA is continuous at 0 (Theorem 3.2.1); hence there exists c ∈ R such that ||LA(x)|| ≤ c||x|| for all x ∈ U (Theorem 3.2.2). Thus
||k|| ≤ c||h|| + ||h|| ||ε(h)|| = (c + ||ε(h)||)||h||.
But since lim_{h→0} (c + ||ε(h)||)||h|| = 0, by the squeeze principle we have lim_{h→0} ||k|| = 0.
Now g is tangent to B at f(a), and so by translation of the origin (y = f(a) + k, so y → f(a) ⇔ k → 0 in V) we have
lim_{k→0} ||g(f(a) + k) − B(f(a) + k)|| / ||k|| = 0.
But the left hand side is lim_{k→0} ||η(k)||, and so
lim_{k→0} ||η(k)|| = 0.
By the first part
0 ≤ ||η(k)|| ||k|| / ||h|| ≤ ||η(k)|| (c + ||ε(h)||)||h|| / ||h|| = ||η(k)|| (c + ||ε(h)||).
But lim_{h→0} ||η(k)|| (c + ||ε(h)||) = 0 (since k → 0 as h → 0, so ||η(k)|| → 0, while c + ||ε(h)|| → c), and so by the squeeze principle again we have
lim_{h→0} ||η(k)|| ||k|| / ||h|| = 0, as required.
This completes the proof that B ◦ A is tangent to g ◦ f.

Theorem 7.0.7 (Chain Rule)


Let maps f : U → V and g : V → W be differentiable at points a ∈ U and
f (a) ∈ V respectively. The composite map g ◦ f : U → W is then differentiable
at a and
D(g ◦ f )(a) = Dg(f (a)) ◦ Df (a).
Proof: By the differentiability of f and that of g, we have continuous affine
maps A : U → V and B : V → W , say, which are tangent to f and g at a and
f (a) respectively. So by theorem 7.0.6 B ◦ A is a continuous affine map which
is tangent to g ◦ f at a.
Thus g ◦ f is differentiable at a and by definition of derivative we have
D(g ◦ f)(a) = the linear part of B ◦ A
  = (the linear part of B) ◦ (the linear part of A)
  = Dg(f(a)) ◦ Df(a)

as required.
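The chain rule can be checked numerically in concrete coordinates, using the Jacobian matrices introduced formally in Chapter 9. The following is a sketch only (assuming NumPy; the maps f : R2 → R3 and g : R3 → R below are invented purely for illustration): the finite-difference Jacobian of g ◦ f at a should agree with the product of the Jacobians of g at f(a) and of f at a.

import numpy as np

f = lambda v: np.array([v[0] * v[1], v[0] ** 2, np.sin(v[1])])
g = lambda w: w[0] + w[1] * w[2]
a = np.array([1.0, 2.0])

def jacobian(F, x, eps=1e-6):
    # forward-difference approximation of the Jacobian matrix of F at x
    Fx = np.atleast_1d(F(x))
    J = np.zeros((Fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x); e[j] = eps
        J[:, j] = (np.atleast_1d(F(x + e)) - Fx) / eps
    return J

lhs = jacobian(lambda v: g(f(v)), a)
rhs = jacobian(g, f(a)) @ jacobian(f, a)
print(lhs, rhs)          # the two 1x2 matrices agree to finite-difference accuracy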

Exercise 7
1. Simplify the chain rule D(g ◦ f )(a) = Dg(f (a)) ◦ Df (a) in each of the
following cases:
(a) g is continuous linear (b) f is continuous linear.
(c) g is a constant map (d) f is a constant map.
2. Let f : R2 → R3 and g : R3 → R be differentiable at the points a ∈ R2 and
b = f (a) ∈ R3 respectively. Suppose the matrices of the respective linear
maps are given by
[Dg(b)] = [2 4 3]

[Df(a)] = [ 2  3 ]
          [ 1  2 ]
          [ 5  6 ].

Find the matrix of the linear map D(g ◦ f)(a) and hence find the number D(g ◦ f)(a)(h), where h = (2, 3).
3. Find the value at the point (x, y) of the function
Id1 + Id1³·Id2,
where Id1 and Id2 are the projections from R2 to R.
4. Express in terms of the projection functions the function f : R2 → R with values given by
f(x1, x2) = (x1² + x1x2) / (1 + x2³).
5. Find the value at (x1, x2) of the function cos ◦ (Id1 + Id2²).


6. Sketch an outline of the proof, if you are to prove theorem 7.0.7, using
definition 6.0.5.
Chapter 8

Differentiability of
real-valued functions

Since we won’t always want to have to calculate derivatives from first principles,
just as in the elementary calculus, we will develop some rules for differentiation.
The rules only apply for real-valued functions; i.e. functions f : V → R. In
general this is still not an efficient way to calculate derivatives and so we will
in the next chapter examine the connection between the derivative Df (a), at a
point a, of a function f : V → W and the matrix of partial derivatives of f .
The usual algebraic operations on R induce corresponding pointwise operations
on real-valued functions. Let S and T be arbitrary sets and consider functions
f : S → R and g : T → R. We have
(f + g)(x) = f(x) + g(x); (cf)(x) = c·f(x) (c ∈ R);
(f·g)(x) = f(x)·g(x) and (f/g)(x) = f(x)/g(x).
Each of f + g and f·g has S ∩ T as its domain (cf has domain S), except that the quotient f/g has S ∩ T \ {x | g(x) = 0} as its domain.
We can use the projection functions Idj : Rm → R |x = (x1 , x2 , . . . , xm ) 7→ xj ;
1 ≤ j ≤ m to build all of the polynomial and rational functions using the four
operations described in the previous paragraph. For example:

Id1 Id2 : Rm → R is described by Id1 Id2 (x1 , . . . , xm ) = x1 x2 .

And
f : Rm → R described by f(x) = x1² + x2²
is just the function Id1² + Id2².
The rules of differentiation for real-valued functions are similar to those of ele-
mentary calculus. And the following lemma is useful in proving the rules:
Lemma 8.0.8 Let f, g : Rm → R be differentiable at the point a ∈ Rm. Then the function
(f, g) : Rm → R2 | x ↦ (f(x), g(x))
is differentiable at a ∈ Rm, with derivative the linear map (Df(a), Dg(a)) : Rm → R2 given by (Df(a), Dg(a))(h) = (Df(a)(h), Dg(a)(h)).
In fact the converse is also true, i.e. if (f, g) : Rm → R2 is differentiable at a ∈ Rm then f : Rm → R is differentiable at a and g : Rm → R is differentiable at a ∈ Rm.
Proof of lemma 8.0.8:
Suppose the maps f : Rm → R and g : Rm → R are differentiable at the point a ∈ Rm; i.e. there exists a continuous linear map Df(a) : Rm → R such that
lim_{h→0} ||f(a + h) − f(a) − Df(a)(h)|| / ||h|| = 0
and there exists a continuous linear map Dg(a) : Rm → R such that
lim_{h→0} ||g(a + h) − g(a) − Dg(a)(h)|| / ||h|| = 0.
Now choose the map (Df(a), Dg(a)), which is clearly linear, and
lim_{h→0} ||(f, g)(a + h) − (f, g)(a) − (Df(a), Dg(a))(h)|| / ||h||
  = lim_{h→0} ||(f(a + h) − f(a) − Df(a)(h), g(a + h) − g(a) − Dg(a)(h))|| / ||h||
  = lim_{h→0} √[(f(a + h) − f(a) − Df(a)(h))² + (g(a + h) − g(a) − Dg(a)(h))²] / ||h||.
Now
0 ≤ √[(f(a + h) − f(a) − Df(a)(h))² + (g(a + h) − g(a) − Dg(a)(h))²] / ||h||
  ≤ ||f(a + h) − f(a) − Df(a)(h)|| / ||h|| + ||g(a + h) − g(a) − Dg(a)(h)|| / ||h||,
and by assumption the summands on the right both have limit zero as h → 0. So by the squeeze (sandwich) principle for limits, we have
lim_{h→0} ||(f, g)(a + h) − (f, g)(a) − (Df(a), Dg(a))(h)|| / ||h|| = 0.
Thus (Df(a), Dg(a)) is the derivative of (f, g) at a, as required.
Lemma 8.0.9 (a) Id1·Id2 : R2 → R is differentiable everywhere in R2 and
D(Id1·Id2)(a) : R2 → R is given by D(Id1·Id2)(a1, a2) = a2·Id1 + a1·Id2.
(b) Id1 + Id2 : R2 → R is differentiable everywhere in R2 and
D(Id1 + Id2)(a) : R2 → R is given by D(Id1 + Id2)(a1, a2) = Id1 + Id2.
(c) Id⁻¹ : R\{0} → R (the reciprocal function) is differentiable everywhere on its domain and
D(Id⁻¹)(a) : R → R is given by D(Id⁻¹)(a) = −(1/a²)·Id.
Theorem 8.0.10 Let f, g : Rm → R be differentiable at a ∈ Rm. The functions cf, f + g, f·g, where c ∈ R, are differentiable at a ∈ Rm and
(i) D(cf)(a) = cDf(a)
(ii) D(f + g)(a) = Df(a) + Dg(a)
(iii) D(f·g)(a) = f(a)Dg(a) + g(a)Df(a)
(iv) If g(a) ≠ 0 then f/g is differentiable at a and
D(f/g)(a) = (g(a)Df(a) − f(a)Dg(a)) / g(a)².

We will prove some parts of the lemma and theorem; the rest are to be done as exercises.
Proof of lemma 8.0.9(a):
Let a = (a1, a2) ∈ R2, h = (h1, h2) ∈ R2. Notice that
Id1·Id2(a + h) − Id1·Id2(a) = a1h2 + a2h1 + h1h2.
Choose La(h) = a1h2 + a2h1. It is a continuous linear map from R2 to R and
lim_{h→0} |Id1·Id2(a + h) − Id1·Id2(a) − La(h)| / ||h|| = lim_{h→0} |h1h2| / ||h||.
Now |h1h2| ≤ |h1| |h2| ≤ ||h||², and so lim_{h→0} |h1h2| / ||h|| = 0.
Thus La satisfies the definition of the Fréchet derivative of Id1·Id2 at a,
i.e. D(Id1·Id2)(a1, a2)(h1, h2) = a2h1 + a1h2, as required. (In variable-free notation we have D(Id1·Id2)(a1, a2) = a1·Id2 + a2·Id1.)
Now consider Theorem 8.0.10(iii). We can rewrite f·g as the composite (Id1·Id2) ◦ (f, g).
And so by the chain rule
D(f·g)(a) = D(Id1·Id2)((f, g)(a)) ◦ D(f, g)(a)
  = (f(a)·Id2 + g(a)·Id1) ◦ (Df(a), Dg(a))   by 8.0.9(a) and 8.0.8
  = f(a)Dg(a) + g(a)Df(a),
as required.
The idea of the proof of the other parts of theorem 8.0.10 is similar: express the given function as a composite, then use lemmas 8.0.8 and 8.0.9 as appropriate.
And so to find the derivative of a given function at a point you now have 3
methods:
(a) find a continuous affine map which is tangent to the given map at the given
point, then find the linear part of that affine map,
(b) find a continuous linear map satisfying the limit condition
(c) express the given function in an appropriate form and use the results lemma
8.0.8, 8.0.9 and theorem 8.0.10.

Using the Rules: Lemma 8.0.8,8.0.9 and theorem 8.0.10.


(1) Let f : R2 → R | (x, y) ↦ x + xy. We will show that f is differentiable at a = (a1, a2) ∈ R2 and give Df(a).
We have f = Id1 + Id1·Id2 : R2 → R. Each projection function is differentiable (being a continuous linear map) and by 8.0.9(a), Id1·Id2 is differentiable, so by 8.0.10(ii) we have that f is differentiable.
To find Df(a):
Df(a) = D(Id1 + Id1·Id2)(a)
  = DId1(a) + D(Id1·Id2)(a)   by 8.0.10(ii)
  = DId1(a) + Id1(a)DId2(a) + Id2(a)DId1(a)   by 8.0.10(iii)
  = Id1 + a1·Id2 + a2·Id1,
since the projection functions are continuous linear maps.
Thus Df(a)(h) = (1 + a2)h1 + a1h2.


You should check this from the definition of derivative; i.e. from first principles.
(2) Let f : R2 → R | (x, y) ↦ arctan(2x² + y).
Rewriting f we have f = arctan ◦ (2Id1² + Id2).
Now by 8.0.9 and 8.0.10, 2Id1² + Id2 is differentiable, and arctan : R → R is differentiable on R, with
D arctan(a) = arctan′(a)·Id = (1/(1 + a²))·Id.
By the chain rule we have
D(arctan ◦ (2Id1² + Id2))(a) = D arctan((2Id1² + Id2)(a)) ◦ D(2Id1² + Id2)(a)
  = D arctan(2a1² + a2) ◦ D(2Id1² + Id2)(a)   where a = (a1, a2)
  = (1/(1 + (2a1² + a2)²))·Id ◦ D(2Id1² + Id2)(a).
Now
D(2Id1² + Id2)(a) = 2DId1²(a) + DId2(a) = 2·2a1·Id1 + Id2 = 4a1·Id1 + Id2,
thus
D(arctan ◦ (2Id1² + Id2))(a) = (1/(1 + (2a1² + a2)²))(4a1·Id1 + Id2).
And so
D(arctan ◦ (2Id1² + Id2))(a)(h) = (1/(1 + (2a1² + a2)²))(4a1·Id1 + Id2)(h) = (4a1h1 + h2)/(1 + (2a1² + a2)²).

(3) Define f : Rn → R by putting
f(x) = <x, x> = x1² + · · · + xn²,
where x = (x1, x2, . . . , xn). We show that f is differentiable at a ∈ Rn and find Df(a)(h).
Let h ∈ Rn.
f(a + h) − f(a) = <a + h, a + h> − <a, a>
  = (a1 + h1)² + · · · + (an + hn)² − (a1² + · · · + an²)
  = 2(a1h1 + · · · + anhn) + (h1² + · · · + hn²).
So choose La(h) = 2(a1h1 + · · · + anhn).
Then La is linear, since it is a linear combination of projection functions with real scalars, and by theorem 3.2.3 La is continuous (being a linear map between Euclidean spaces).
In addition
lim_{h→0} ||f(a + h) − f(a) − La(h)|| / ||h|| = lim_{h→0} (h1² + · · · + hn²) / ||h|| = lim_{h→0} ||h|| = 0.
Thus f is differentiable at a ∈ Rn: La(h) = 2(a1h1 + · · · + anhn) satisfies the definition of derivative, and so Df(a)(h) = 2(a1h1 + · · · + anhn); i.e. Df(a) = 2(a1·Id1 + · · · + an·Idn).
More generally the map
< , > : R^{2n} = Rn × Rn → R given by <x, y> = x1y1 + · · · + xnyn
is differentiable at (a, b) ∈ R^{2n}, and it is an exercise to show that
D< , >(a, b) = a1·Id_{n+1} + · · · + an·Id_{2n} + b1·Id1 + · · · + bn·Idn,
where we write (h, k) ∈ R^{2n} as (h1, h2, . . . , hn, k1, . . . , kn), so that for 1 ≤ i ≤ n we have Idi(h, k) = hi and Id_{n+i}(h, k) = ki.
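A brief numerical sketch of example (3) (assuming NumPy; the point a and direction below are arbitrary choices): the derivative of f(x) = <x, x> applied to h is 2<a, h>, and the remainder quotient shrinks with ||h||.

import numpy as np

a = np.array([1.0, 2.0, 3.0])
f = lambda x: np.dot(x, x)
Df_a = lambda h: 2.0 * np.dot(a, h)

h0 = np.array([0.5, -1.0, 0.25])
for t in [1e-1, 1e-2, 1e-3]:
    h = t * h0
    print(t, abs(f(a + h) - f(a) - Df_a(h)) / np.linalg.norm(h))   # tends to 0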
Exercise 8
1. Let < , > denote the usual inner product in Rn, i.e. for x = (x1, x2, . . . , xn) and y = (y1, y2, . . . , yn),
< , > : R^{2n} → R | <x, y> = x1y1 + x2y2 + · · · + xnyn.
(a) From first principles show that < , > is differentiable and find the derivative of this map at the point (a, b) = (a1, a2, . . . , an, b1, . . . , bn) ∈ R^{2n}.
(b) Suppose f, g : Rm → Rn are differentiable at z ∈ Rm and define <f, g> : Rm → R by
<f, g>(z) = <f(z), g(z)> = f1(z)g1(z) + · · · + fn(z)gn(z).
Notice that <f, g> = < , > ◦ (f, g).
Now show that the map <f, g> is differentiable at a ∈ Rm and find its derivative.
3. Prove Theorem 8.0.10(i) from first principles; i.e. find a continuous affine map tangent to cf at a, then find the linear part; or find a continuous linear map La that satisfies
lim_{h→0} ||cf(a + h) − cf(a) − La(h)|| / ||h|| = 0.
4. Prove Lemma 8.0.9 (b), (c) from first principles.
5. Now prove theorem 8.0.10(ii).
6. Notice that 1/g = Id⁻¹ ◦ g. Find D(1/g)(a), where g(a) ≠ 0. Now prove Theorem 8.0.10(iv).
7. Show (using the theorems) that the following function is differentiable at each point (a, b) ∈ R2 and find its derivative at this point:
2Id1 + 3Id2².
8. Use Theorem 8.0.10 to prove by induction that if f : Rm → R is differentiable at a ∈ Rm then so is f^r for each positive integer r and
Df^r(a) = r f^{r−1}(a) Df(a).
9. Note that f^r = Id^r ◦ f, where Id : R → R is the identity function. Use the chain rule to get the same result as in the exercise above.
10. Write the function (x, y) ↦ sin(4x²) as a composite and then use the chain rule to find the derivative of this function.
11. Write the function (x, y) ↦ cos(2x² + y) as a composite and then use the chain rule to find the derivative of this function.
12. Show that the following function is differentiable at each point (a, b, c) ∈ R3 and find its derivative at this point:
Id1 + 3Id2² − Id3.


Chapter 9

Partial derivatives and


Jacobian matrices

Our main use for partial derivatives will be to provide another (easier) method
of computing derivatives. Let f : U ⊆ Rm → R, U an open subset of Rm . By
allowing one of the xi in x = (x1 , x2 , . . . , xm ) to roam free while holding all of
the others fixed, we get a function xi 7→ f (x1 , . . . , xi , . . . , xm ); a function from
R to R. So its ordinary derivative is
Definition 9.0.11
f/i(x) := lim_{h→0} (f(x + hei) − f(x)) / h,
where ei = (0, 0, . . . , 1, 0, . . . , 0), the 1 being in the ith place.
(More conventionally f/i(x) is written as ∂f/∂xi, but in this course we will reserve the ∂ notation for another sort of partial derivative.)
Example
Let f(x1, x2) = x1² + x2³. Fix x2 ∈ R and consider the function which maps x1 to f(x1, x2); i.e.
x1 ↦ x1² + x2³.
Its derivative (with respect to x1) is 2Id. So f/1(x1, x2) = 2Id(x1) = 2x1.
Now fix x1 and consider the function which maps x2 to f(x1, x2); i.e.
x2 ↦ x1² + x2³.
Its derivative (with respect to x2) is 3Id². So f/2(x1, x2) = 3Id²(x2) = 3x2².
Notice that f/i(x) ∈ R; it is not a linear map.
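A quick numerical sketch of these two partial derivatives (assuming Python; the point (2, 3) and the step size are arbitrary choices), using one-variable difference quotients:

f = lambda x1, x2: x1 ** 2 + x2 ** 3
x1, x2, h = 2.0, 3.0, 1e-6
print((f(x1 + h, x2) - f(x1, x2)) / h)   # ≈ f/1(2, 3) = 2*2 = 4
print((f(x1, x2 + h) - f(x1, x2)) / h)   # ≈ f/2(2, 3) = 3*3^2 = 27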
Graphs
Let f : R2 → R. The graph of f is {(x1 , x2 , f (x1 , x2 ))| (x1 , x2 ) ∈ R2 }. The
collection of points forms a surface in R3 . One can sketch a contour map of the
surface.

[Figure: a contour map of the surface given by f(x, y) = x² + y², and a sketch illustrating the interpretation of f/i(a, b).]

9.1 Fréchet differentiability and partial derivatives.
All partial derivatives for a function f may exist at a point a ∈ domf but f may
not be differentiable at a; i.e. existence of all partial derivatives is not sufficient
for differentiability. On the other hand, if a function is differentiable then all
partial derivatives exist; i.e. existence of partial derivatives is necessary.
The classic example is given by f : R2 → R with
f(x, y) = −|y| if |x| ≥ |y|, and f(x, y) = −|x| if |x| < |y|
(i.e. f(x, y) = −min(|x|, |y|)).
From the sketch it is clear that there does not exist an affine map which is tangent to f at (0, 0). But f/1(0, 0) = 0 and f/2(0, 0) = 0, since f vanishes identically on both axes. In fact one can show that f/1 : (x, y) ↦ f/1(x, y) (and by symmetry f/2), where it is defined, is not continuous at (0, 0); to do this we have to find B = Bε(0) such that for every A = Bδ(0, 0) we have f/1(Bδ(0, 0)) ⊄ B.
Choose B = B_{1/2}(f/1(0, 0)) = B_{1/2}(0) ⊂ R. Let A = Bδ(0, 0) for some δ > 0.
Now consider the point (δ/4, δ/2), which lies in Bδ(0, 0). Near this point we have |x| < |y|, so f(x, y) = −|x| = −x there, and hence
f/1(δ/4, δ/2) = −1.
Thus
|f/1(δ/4, δ/2) − f/1(0, 0)| = |−1 − 0| = 1 > 1/2,
so f/1(δ/4, δ/2) ∉ B_{1/2}(0), as required.

Theorem 9.1.1 Let f : U → R (where U is an open set in Rn) be differentiable at x. Then all partial derivatives of f exist at x, and for 1 ≤ i ≤ n
f/i(x) = Df(x)(ei).
Proof: Since f is differentiable at x, there exists the linear map Df(x) such that
lim_{h→0} ||f(x + h) − f(x) − Df(x)(h)|| / ||h|| = 0.
Let i be an integer with 1 ≤ i ≤ n. By the properties of limits (restriction of domain) we may restrict the variable h and replace it with tei, t ∈ R. Rewriting the limit condition we have
0 = lim_{t→0} ||f(x + tei) − f(x) − Df(x)(tei)|| / ||tei|| = lim_{t→0} || (f(x + tei) − f(x))/t − Df(x)(ei) ||,
which says that
lim_{t→0} (f(x + tei) − f(x)) / t = Df(x)(ei),
i.e. f/i(x) = Df(x)(ei).
Theorem 9.1.2 Let f : U → R (where U is an open set in Rn ) be continuous.
If all n partial derivatives of f exist at each point of U and if each of the partial
derivatives is continuous on U , then f is differentiable at each point of U and

Df (x) = f /1 (x)Id1 + · · · + f /n (x)Idn


for each x ∈ U .
We’ll generalise this theorem and examine its proof.

9.2 General Partial Derivatives.


We saw that the partial derivative f/i is a real-valued function on U ⊆ Rn: its value f/i(x) at each point x ∈ U is a number, not a linear map.
The appropriate generalisation of Rm is the product V1 × V2 × · · · × Vm of m
normed vector spaces.
Definition 9.2.1 The product V1 × V2 × · · · × Vm of m normed vector spaces
is the set of all m-tuples (v1 , v2 , . . . , vm ) where vi ∈ Vi for each i, equipped with
componentwise addition and scalar multiplication and a norm.
When Vi = R for all i the product is just Rm .
The simplest cases are when each space Vi is of the form Rki so that
V1 × V2 × · · · × Vm = Rk1 +k2 +···+km .
Thus Rm may be considered as a product space in many different ways.
For example
R5 = R × R × R × R × R, R5 = R3 × R2 , R5 = R × R3 × R.
The context in which the usual partial derivatives are defined is when f : U →
R where U is an open subset of Rm . The generalisation is that the partial
derivatives can be defined when f : U → W , U an open subset of V1 × V2 ×
· · · × Vm .
Definition 9.2.2 Let f : U → W be differentiable, U an open subset of V1 ×
V2 × · · · × Vm , Vi normed vector spaces. Then the ith partial derivative ∂i f (u)
of f at u ∈ U is the linear map from Vi to W defined by
∂i f (u)(vi ) = Df (u)(0, 0, . . . , vi , 0, . . . , 0).
Examples.
(1) Let V1 = V2 = · · · = Vm = R. Then for each u ∈ U , by theorem 9.1.1, we
have
Df (u)(ei ) = f /i (u)
and so by linearity
Df (u)(vi ei ) = vi f /i (u).
But by definition 9.2.2
Df (u)(vi ei ) = Df (u)(0, 0, . . . , vi , 0, . . . , 0) = ∂i f (u)(vi ).
I.e.
∂i f (u)(vi ) = vi f /i (u).
This example will be useful in the sequel.
(2) V1 = R2 , V2 = R and f : V1 × V2 → R with f ((x, y), z) = x2 + y 2 + xz.
For each (r, s) ∈ R2 , t ∈ R we have
Df ((x, y), z)((r, s), t) = Df ((x, y), z)((r, s), 0) + Df ((x, y), z)(0, 0, t)
= ∂1 f ((x, y), z)(r, s) + ∂2 f ((x, y), z)(t).

Now

∂1 f ((x, y), z) : R2 → R
(r, s) 7→ Df ((x, y), z)(re1 ) + Df ((x, y), z)(se2 ).

So

∂1 f ((x, y), z)(r, s) = Df ((x, y), z)(re1 ) + Df ((x, y), z)(se2 )


= rf /1 ((x, y), z) + sf /2 ((x, y), z) by (1) above
= r(2x + z) + s(2y).

And

∂2 f ((x, y), z) : R → R
t 7→ Df ((x, y), z)(t) = tf /3 ((x, y), z) = tx.
So
Df ((x, y), z)((r, s), t) = r(2x + z) + s(2y) + tx.

(3) Let f : R2 × R2 → R with f (w, x, y, z) = z cos(xw) + (wx)2 y.


Then setting x = (w, x, y, z), for each (r, s) ∈ R2 and (p, q) ∈ R2 we have

Df (x)((p, q), (r, s)) = Df (x)((p, q), (0, 0)) + Df (x)((0, 0), (r, s))
= ∂1 f (x)((p, q)) + ∂2 f (x)((r, s)).

Now
∂1f(x) : R2 → R
(p, q) ↦ Df(x)(pe1) + Df(x)(qe2).
So
∂1f(x)(p, q) = Df(x)(pe1) + Df(x)(qe2)
  = p f/1(x) + q f/2(x)   by (1) above
  = p(−zx sin(xw) + 2x²yw) + q(−zw sin(xw) + 2w²yx).
And
∂2f(x) : R2 → R
(r, s) ↦ Df(x)((0, 0), (r, s)) = Df(x)(re3) + Df(x)(se4).
So
∂2f(x)(r, s) = r(wx)² + s cos(xw).
Thus
Df(x)((p, q), (r, s)) = p(−zx sin(xw) + 2x²yw) + q(−zw sin(xw) + 2w²yx) + r(wx)² + s cos(xw).
(We have f/1(x) = −zx sin(xw) + 2x²yw; f/2(x) = −zw sin(xw) + 2w²yx; f/3(x) = (xw)² and f/4(x) = cos(xw).)

9.3 Jacobian Matrix


Recall the matrix of a linear map L : Rm → Rn in definition 2.1.1 and theorem
2.1.2.
Let f : U → Rn , U an open subset of Rm , f = (f1 , f2 , . . . , fn ). Let x ∈ U , be a
point where all of the partial derivatives of all of the component functions exist.
Definition 9.3.1 The Jacobian matrix of f at x is the following matrix of partial derivatives:

[ f1/1(x)  f1/2(x)  . . .  f1/m(x) ]
[    ..       ..              ..   ]
[ fn/1(x)  fn/2(x)  . . .  fn/m(x) ]

which we will denote f′(x). This is consistent with our notation for functions f : R → R, where we had Df(a)(h) = f′(a)·h.
Example
Let f : R2 → R2 | (x, y) ↦ (x² + y², xy).
So f = (f1, f2) where f1(x, y) = x² + y² and f2(x, y) = xy.
Thus f1/1(x, y) = 2x; f1/2(x, y) = 2y; f2/1(x, y) = y and f2/2(x, y) = x. I.e.

f′(x) = [ 2x  2y ]
        [  y   x ].
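The following is a numerical sketch of this example (assuming NumPy; the point (1.5, −0.5) and the finite-difference step are arbitrary choices): the Jacobian of f(x, y) = (x² + y², xy) approximated by forward differences agrees closely with the formula just computed.

import numpy as np

def f(v):
    x, y = v
    return np.array([x ** 2 + y ** 2, x * y])

def jacobian(F, x, eps=1e-6):
    # forward-difference approximation of the Jacobian matrix of F at x
    Fx = F(x)
    J = np.zeros((Fx.size, x.size))
    for j in range(x.size):
        e = np.zeros_like(x); e[j] = eps
        J[:, j] = (F(x + e) - Fx) / eps
    return J

x = np.array([1.5, -0.5])
print(jacobian(f, x))
print(np.array([[2 * x[0], 2 * x[1]], [x[1], x[0]]]))   # the two matrices agree closely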
The next theorem shows that the matrix of the linear map Df(x) is just the Jacobian matrix f′(x). The result will be useful in calculating derivatives.
Theorem 9.3.2 Let f : U → Rn be differentiable at x ∈ U (an open subset of Rm). The Jacobian matrix f′(x) then exists and the Fréchet derivative Df(x) : Rm → Rn has its values given by
(Df(x)(h))^T = f′(x)·h^T,
where · means matrix multiplication.
The significance of the theorem is that it provides another way of calculating derivatives, but you still need to establish the existence of the derivative either from first principles or from the rules (Lemmas 8.0.8, 8.0.9 and theorem 8.0.10).
In the case of a map f : R → R, the Jacobian matrix is a 1 × 1 matrix and hence may be regarded as a real number and identified with f′(x). In this case (Df(x)(h))^T = f′(x)·h^T reduces to Df(x)(h) = f′(x)·h, where · is multiplication of real numbers.
Proof of 9.3.2: Let f : U → Rn be differentiable at x ∈ U (an open subset of Rm). We need to show that the matrix [Df(x)] of the linear map Df(x) is f′(x); the result then follows from theorem 2.1.2.
We have f : U → Rn, f = (f1, . . . , fn), and
Df(x) : Rm → Rn, Df(x) = (Df1(x), . . . , Dfn(x)), by the generalisation of lemma 8.0.8.
Now the ij-th element of [Df(x)] is (Df(x))i(ej) = Dfi(x)(ej). And by theorem 9.1.1, Dfi(x)(ej) = fi/j(x). Thus the ij-th element of [Df(x)] is fi/j(x); i.e.

[Df(x)] = [ f1/1(x)  . . .  f1/m(x) ]
          [    ..              ..   ]
          [ fi/1(x)  . . .  fi/m(x) ]
          [    ..              ..   ]
          [ fn/1(x)  . . .  fn/m(x) ]
        := f′(x).
Many theorems of elementary calculus rely on continuity of f′; the analogue here will be theorems which depend on the continuity of Df. This notion uses
the norm in L(V, W ), the space of continuous linear maps with supremum norm.
It is different to discussing the continuity of Df (x) which is by definition a con-
tinuous linear map. The next theorem gives necessary and sufficient conditions
for Df to be continuous.

Theorem 9.3.3 Let f : U → Rn be differentiable, where U is an open subset of Rm. Then the map Df from U to L(Rm, Rn) is continuous if and only if each of its partial derivatives fj/i is continuous as a map from U to R.

We’ll omit the proof here.

Definition 9.3.4 (Continuously differentiable) When f is differentiable on U


and Df is continuous, we say that f is continuously differentiable.

We can now state the generalisation of theorems 9.1.1, 9.1.2 and 9.3.2.

Theorem 9.3.5 Let f : U → Rn be continuous, where U is an open subset of Rm. Then f is continuously differentiable on U if and only if all of the partial derivatives fj/i exist and are continuous on U.
The partial derivatives fj/i(x) are described in terms of the derivative Df(x) by
fj/i(x) = Dfj(x)(ei),
and Df(x) can be described in terms of the partial derivatives as
Df(x) = (f1/1(x)Id1 + · · · + f1/m(x)Idm,
         f2/1(x)Id1 + · · · + f2/m(x)Idm,
         . . .
         fn/1(x)Id1 + · · · + fn/m(x)Idm).

Theorem 9.3.6 (Chain Rule I for partial derivatives)


Let E1, E2, E, F be normed vector spaces, U1 open in E1, U2 open in E2, f : U1 × U2 → E differentiable at (a, b) ∈ U1 × U2 and g : E → F differentiable at f(a, b) ∈ E.
Then for i = 1, 2

∂i (g ◦ f )(a, b) = Dg(f (a, b)) ◦ ∂i f (a, b)

(which easily generalises to a function of n variables.)

Theorem 9.3.7 (Chain Rule II for partial derivatives) Let h : Rn → Rp, g : Rm → Rn be differentiable. Then for each i = 1, 2, . . . , m
(h ◦ g)/i = Σ_{j=1}^{n} (h/j ◦ g)·gj/i,
i.e. for each x ∈ Rm
(h ◦ g)/i(x) = Σ_{j=1}^{n} h/j(g(x))·gj/i(x).

Proof: The details are left as an exercise. Let a ∈ Rm . Then for each i =
1, 2, . . . , m

(h ◦ g)/i(a) = ∂i(h ◦ g)(a)(1)   by example (1) after def 9.2.2, with vi = 1
  = (Dh(g(a)) ◦ ∂i g(a))(1)   by theorem 9.3.6
  = Dh(g(a))(g/i(a))   by example (1).
Now note that g/i(a) = (g1/i(a), . . . , gn/i(a)) and apply definition 9.2.2 to complete the proof.
Exercise 9.
1. Find each of f /1 (x, y), f /2 (x, y) ∂1 f (x, y) and ∂2 f (x, y) where f : R2 → R
is given by
(a) f (x, y) = cos(x) sin(y) (b) f (x, y) = 2x2 sin(y) + 3y 3 x.
2. Find f /1 (x, y, z), f /2 (x, y, z) and f /3 (x, y, z) where

f : R3 → R|(x, y, z) 7→ x2 cos(x + y + z).

Consider R3 as the product R × R2 and write down the continuous linear


maps ∂1 f (a, b, c) : R → R and ∂2 f (a, b, c) : R2 → R.
3. Let f : R4 → R be given by f(x1, x2, x3, x4) = cos(x1²) + cos²(x2) + e^{x3} x1² + x1x2x3x4. Let (a, b, c, d) ∈ R4. Find f/1(a, b, c, d), f/2(a, b, c, d), f/3(a, b, c, d) and f/4(a, b, c, d).
4. Let f(x, y) = x^(x^(x^y)) + (log(x))(arctan(arctan(arctan(sin(x)·y)))) − log(x + y). Find f/2(1, y).
5. Let f : R2 → R with f(x, y) = 0 if x = y = 0, and f(x, y) = xy / (4x² + y²) otherwise.
Show that both partial derivatives f/1(0, 0) and f/2(0, 0) exist, by first finding f(x, 0) and f(0, y).
Show that nevertheless f is not continuous, hence not differentiable, at (0, 0), by considering the values of f at points of the line {(x, y) | y = x}.

6. Let f : R4 → R be a function for which

Df((x1, x2, x3, x4)) = x1x2·Id1 + cos(x3)·Id2 + e^{sin(x2)}·Id3 + Id4.

Find f /i (x1 , x2 , x3 , x4 ) for i = 1, 2, 3, 4.

7. Let f : R3 → R be a continuous function for which

f /1 (x, y, z) = x3 − 4y, f /2 (x, y, z) = y cos(z) and f /3 (x, y, z) = x.

Say why f is continuously differentiable and write down Df (x, y, z)(h, k, l),
where (h, k, l) ∈ R3 .

8. Let f : R2 → R be a continuous function for which

f /1 (x, y) = x2 + 2y, f /2 (x, y) = y sin(x).

Use Theorem 9.1.2 to show that f is Fréchet differentiable on R2 and write down
Df(x, y)(h, k), where (h, k) ∈ R2.

9. Let f : R2 × R3 → R with

Df((x1, x2), (y1, y2, y3))((h1, h2), (k1, k2, k3)) = x1²k2 + y1y2h1.

Find (∂1 f )((x1 , x2 ), (y1 , y2 , y3 ))(h1 , h2 ) and


(∂2 f )((x1 , x2 ), (y1 , y2 , y3 ))(k1 , k2 , k3 ).

10. For each of the following four functions:

(a) Assume that the function is differentiable and calculate its Jacobian
matrix at (x, y) ∈ R2 .
(b) Use the results, 8.0.8, 8.0.9, 8.0.10 and the chain rule to show that f
is differentiable at (x, y). (First express f in variable free notation.)
(c) Use theorem 9.3.5 to find the Fréchet derivative of f at (x, y).
(d) Now calculate f directly using the expression you wrote for f in (b)
and results numbered 8.0.8, 8.0.9, 8.0.10 and the chain rule.

(i) f : R2 → R2 with f (x, y) = (x sin(y), y sin(x)).


(ii) f : R2 → R2 with f (x, y) = (ex cos(y), ex sin(y)).
(iii) f : R2 → R3 with f(x, y) = (e^{x²+y²}, e^{xy}, x).
(iv) f : R2 → R with f(x, y) = sin(x³ + 2y⁴).

11. Let f : R3 → R2 and g : R2 → R be differentiable functions. Suppose that

f′(2, 1, 2) = [ 1  2  −1 ]
              [ 0  4   5 ],

f(2, 1, 2) = (0, 1); g/1(0, 1) = 1; g/2(0, 1) = −1 and g(0, 1) = 12.


(i) Find the matrix (g ◦ f )0 (2, 1, 2) and write down an expression for the
continuous linear map D(g ◦ f )(2, 1, 2). Evaluate D(g ◦ f )(2, 1, 2) at
the point (0, −1, 2).
(ii) Given the decomposition of R3 as R × R2 , write down the continuous
linear map ∂2 (g ◦ f )(2, 1, 2) and give an upper bound for ||∂2 (g ◦
f )(2, 1, 2)||.
(iii) Decide whether or not the inverse functions (∂1 (g ◦ f )(2, 1, 2))−1 and
(∂2 (g ◦f )(2, 1, 2))−1 exist. If so, write down the inverse functions and
justify your claim.
Chapter 10

Inverse Maps

We next develop the background to understand two important theorems of


analysis: the Inverse Map Theorem and the Implicit Function Theorem.
Recall some basic facts about inverses.

• A map f : A → B which is injective and surjective has an inverse map


f ← : B → A defined by putting

∀x ∈ A ∀y ∈ B y = f (x) ⇐⇒ x = f ← (y).

• Even if f : A → B is not injective, it is often of use to obtain a map which


is injective by restricting the domain.

• From Linear algebra, a linear map L : V → W ( V, W finite dimensional


vector spaces) has an inverse only when V and W have the same dimen-
sion.

• A linear map L : Rn → Rn is injective ⇐⇒ it is surjective. Furthermore


the inverse exists if the matrix (L) for L, has full rank. For low dimension
a convenient test is that det(L) 6= 0.

• The matrix of the inverse of a linear map, is the inverse of the matrix of
that linear map.

10.1 Some techniques


In this section we give some techniques for determining a maximal subset of the domain of a function on which the function is injective.
Let f : R2 → R2 be defined by f(x, y) = (x² + y², 2xy).


(a) The circles with centre (0, 0) and radius r = 1, 2, 3 respectively in the domain have as images the vertical segments {(r², v) : |v| ≤ r²}, i.e. segments at u = 1, 4 and 9.
(b) Moving around the circles of centre (0, 0) and radius r = 1, 2, 3 respectively in the domain, when restricted say to
{(r cos(θ), r sin(θ)) | 0 ≤ θ ≤ π/2},
we see that f is not injective; for example
f(√3/2, 1/2) = (1, √3/2) = f(1/2, √3/2),
but (√3/2, 1/2) ≠ (1/2, √3/2).

(c) To find a maximal subset X of R2 such that f : X → R2 is injective, sketch an arrow diagram. Here we have for example
X = {(x, y) ∈ R2 | 0 ≤ |y| ≤ x}, or equivalently
X = {(r cos(θ), r sin(θ)) ∈ R2 | r ≥ 0, −π/4 ≤ θ ≤ π/4}.
The choice of X is not unique. . . find another possibility.
(d) Now find the subset Y of R2 such that f : X → Y is surjective. With X as in (c) we have
f(X) = {(x² + y², 2xy) | (x, y) ∈ X}.
Other alternatives are
f(X) = {(r², r² sin(2θ)) | r ≥ 0, −π/4 ≤ θ ≤ π/4}
and f(X) = {(x, y) ∈ R2 | 0 ≤ |y| ≤ x}.
Now set Y = f(X).
(e) Now we are in a position to talk about an inverse for f|X : X → Y. We want a formula for f← : Y → X.
Let (u, v) ∈ Y; we must find (x, y) ∈ X with f←(u, v) = (x, y).
Set u = x² + y², v = 2xy (*), so that
u + v = (x + y)² and u − v = (x − y)².
Now for (x, y) ∈ X we have 0 ≤ |y| ≤ x, and so
x + y ≥ 0 and x − y ≥ 0; thus x + y = √(u + v) and x − y = √(u − v),
so the choice at (*) makes sense.
Notice now that 2x = √(u + v) + √(u − v), i.e. x = ½(√(u + v) + √(u − v)), and 2y = √(u + v) − √(u − v), i.e. y = ½(√(u + v) − √(u − v)).
Thus
f←(u, v) = ½(√(u + v) + √(u − v), √(u + v) − √(u − v)),
which is well-defined since we have 0 ≤ |v| ≤ u (which is just the statement that 0 ≤ |2xy| ≤ x² + y², i.e. (x + y)² ≥ 0 and (x − y)² ≥ 0).
It remains to demonstrate that this really is the inverse, i.e. prove that f ◦ f← = Id|Y and f← ◦ f = Id|X.

(f) Finally suppose we want to find the set of all points (x, y) such that the continuous linear map Df(x) = Df((x, y)) : R2 → R2 has a continuous linear inverse.
We have f = (Id1² + Id2², 2Id1·Id2), and since each component involves sums/products of projection functions, it is differentiable on R2. For x = (x, y) ∈ R2 we have

f′((x, y)) = [ 2x  2y ]
             [ 2y  2x ].

We have det(f′(x, y)) = 4x² − 4y² ≥ 0, since 0 ≤ |y| ≤ x. So Df(x, y) is invertible everywhere except on S = {(x, y) | x = ±y}, since if x = ±y then det(f′(x, y)) = 0. That is, Df(x, y) is invertible on {(x, y) | 0 ≤ |y| < x}.
Having checked that for f : X → Y and f← : Y → X we have f ◦ f← = Id|Y and f← ◦ f = Id|X, notice
Id|X = DId(x, y) = D(f← ◦ f)(x, y) = Df←(f(x, y)) ◦ Df(x, y),   (1)
for each (x, y) ∈ X, and
Id|Y = DId(u, v) = D(f ◦ f←)(u, v) = Df(f←(u, v)) ◦ Df←(u, v),   (2)
for each (u, v) ∈ Y.
The corresponding matrix equations for the linear maps Id, D(f ◦ f←)(u, v) and D(f← ◦ f)(x, y) are
(1) [Id] = [Df←(f(x, y))][Df(x, y)]
(2) [Id] = [Df(f←(u, v))][Df←(u, v)],
and replacing (u, v) with f(x, y),
(2)′ [Id] = [Df(x, y)][Df←(f(x, y))],
so
(1)  [ 1  0 ] = (f←)′(f(x, y))·f′(x, y)
     [ 0  1 ]
(2)′ [ 1  0 ] = f′(x, y)·(f←)′(f(x, y)),
     [ 0  1 ]
which says that (f←)′(f(x, y)) is the inverse matrix of f′(x, y); i.e. the Jacobian matrix for f← is the inverse of the Jacobian for f.
Here we have (f←)′(f(x, y)) = (f←)′(x² + y², 2xy). Now
f←(u, v) = ½(√(u + v) + √(u − v), √(u + v) − √(u − v)),
so
(f←)′(u, v) = ¼ [ 1/√(u+v) + 1/√(u−v)    1/√(u+v) − 1/√(u−v) ]
                [ 1/√(u+v) − 1/√(u−v)    1/√(u+v) + 1/√(u−v) ].
Finally, replacing u with x² + y² and v with 2xy,
(f←)′(x² + y², 2xy) = ¼ [ 1/(x+y) + 1/(x−y)    1/(x+y) − 1/(x−y) ]
                        [ 1/(x+y) − 1/(x−y)    1/(x+y) + 1/(x−y) ],
from which you can check the matrix equation
(f←)′(x² + y², 2xy)·f′(x, y) = [ 1  0 ] = f′(x, y)·(f←)′(x² + y², 2xy).
                               [ 0  1 ]
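The matrix identity above can be checked numerically. The following is a sketch only (assuming NumPy; the point (x, y) = (2, 0.5), chosen so that 0 ≤ |y| < x, is arbitrary): the Jacobian of the inverse at f(x, y) is the matrix inverse of the Jacobian of f at (x, y), and the inverse formula recovers (x, y).

import numpy as np

x, y = 2.0, 0.5                               # a point with 0 <= |y| < x
u, v = x ** 2 + y ** 2, 2 * x * y             # (u, v) = f(x, y)
sp, sm = np.sqrt(u + v), np.sqrt(u - v)

J_f = np.array([[2 * x, 2 * y],
                [2 * y, 2 * x]])              # Jacobian of f at (x, y)
J_finv = 0.25 * np.array([[1/sp + 1/sm, 1/sp - 1/sm],
                          [1/sp - 1/sm, 1/sp + 1/sm]])   # Jacobian of the inverse at f(x, y)

print(J_finv @ J_f)                           # ≈ 2x2 identity matrix
print(0.5 * np.array([sp + sm, sp - sm]))     # f←(u, v), recovers (2.0, 0.5)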

10.2 Existence of local inverses.


The Fréchet derivative Df (a) of a map f at a point a of its domain gives a lot
of information about the local behaviour of f .
In particular, it should come as no surprise that if Df (a) has an inverse then we
can find open sets X and Y with a ∈ X and f (a) ∈ Y , such that the restricted
map f : X → Y has an inverse.
Morally this is the content of the Inverse Map Theorem, although we need an
additional hypothesis. In fact we also need that Df is continuous. Shortly we
will give an example to demonstrate this.

Theorem 10.2.1 Inverse Map Theorem


Let f map an open subset A of a Banach space V into a normed vector space
W and let f be differentiable on A.
Suppose Df is continuous at a ∈ A.
If the linear map Df (a) : V → W has a continuous inverse then
there is an open set X ⊆ A containing a and
an open set Y ⊆ W containing f (a) such that
the restricted map f : X → Y has a continuous inverse f ← : Y → X.
Furthermore the inverse map is differentiable at f (a) and

(f←)′(f(a)) = (f′(a))⁻¹;

i.e. the Jacobian of the inverse at f (a) equals the inverse matrix of the Jacobian
of f at a.
The inverse map theorem thus gives a sufficient condition for local invertibility
of a function f : A ⊆ V → W .
Remarks
• A Banach space is a complete normed vector space; i.e. a normed vector
space in which every Cauchy sequence converges to a point in the space.
For a sequence {f_n}_{n=1}^∞ of points (maps) in V (a Banach space) we have:
if {f_n}_{n=1}^∞ is a Cauchy sequence then ∃ l ∈ V with lim_{n→∞} f_n = l.

• The following example demonstrates why we need the hypothesis Df :


A → L(V, W ) continuous at a ∈ A.
The Roll and Wiggle Function. Let f : R → R be defined by
f(x) = x/2 + x² sin(1/x) if x ≠ 0, and f(0) = 0.
We will show that Df(0) has a continuous inverse, but that f fails to be injective on every open interval containing 0 and so cannot have an inverse. This demonstrates that existence of the inverse of the derivative is not sufficient by itself. We will also demonstrate that Df is not continuous at 0.
Since f : R → R we have Df(x) = f′(x)·Id. Now for x ∈ R\{0},
f′(x) = 1/2 + 2x sin(1/x) − cos(1/x),
and f′(0) = 1/2, since
f′(0) := lim_{h→0} (f(h) − f(0))/h = lim_{h→0} (h/2 + h² sin(1/h))/h = lim_{h→0} (1/2 + h sin(1/h)) = 1/2,
because |h sin(1/h)| ≤ |h| → 0.
Thus
Df(x) = (1/2 + 2x sin(1/x) − cos(1/x))·Id if x ≠ 0, and Df(0) = (1/2)·Id.
And so Df(0) = (1/2)·Id, which clearly has continuous inverse (Df(0))⁻¹ = 2·Id.
We will now prove that
Df : R → L(R, R) | x ↦ Df(x)
is not continuous at 0.
Consider ε = 1/2 and let δ > 0. Since {1/(2nπ)}_{n=1}^∞ is a decreasing sequence with limit 0, we can choose n large enough that 1/(2nπ) < δ. Notice that Df(x) − Df(0) = (2x sin(1/x) − cos(1/x))·Id, so we want x with sin(1/x) = 0 and cos(1/x) = 1; take x = 1/(2nπ). Now |x| = x < δ and
||Df(x) − Df(0)|| = ||(1/2 + 2x sin(2nπ) − cos(2nπ))·Id − (1/2)·Id||
  = ||cos(2nπ)·Id||
  = ||Id|| = 1 > ε.
Thus Df is not continuous at 0.
Finally we show that f fails to be injective on every open set containing 0, and so it cannot have an inverse there.
First notice that
• |f(x) − x/2| = |x² sin(1/x)| = |x²| |sin(1/x)| ≤ x². So x/2 − x² ≤ f(x) ≤ x/2 + x².
• If x ∈ (0, 1/4) then x/2 − x² is increasing, since (Id/2 − Id²)′(x) = 1/2 − 2x (which is zero when x = 1/4 and positive when 0 < x < 1/4).
• There is a decreasing sequence of points s_n = 2/((4n − 1)π) such that f(s_n) = s_n/2 − s_n² for all n, and s_1 = 2/(3π) < 1/4. This means that for all n > 1, f(s_n) < f(s_1), and since f(s_n) is a local minimum for each n, there exists r ∈ R such that s_n < r < s_{n−1} and f(r) = f(s_{n−1}).
Now I want to prove that for all δ > 0, f is not injective on Bδ(0).
Let δ > 0; Bδ(0) is an open ball in R.
If δ ≥ 1/4, then choose x1 = 2/(3π) and x2 the point between 2/(7π) and 2/(3π) such that
f(x2) = 1/(3π) − (2/(3π))².
Then
f(x1) = 1/(3π) + (2/(3π))² sin(3π/2) = 1/(3π) − (2/(3π))² = f(x2).
And if δ < 1/4, then choose for x1 the first point s_ν of the sequence lying in Bδ(0); then f(x1) = f(s_ν) = s_ν/2 − (s_ν)². Now f(s_{ν+1}) < f(s_ν) and both points are local minima, so there exists x2 with s_{ν+1} < x2 < s_ν for which f(x2) = s_ν/2 − (s_ν)² = f(x1).
So in both cases f is not injective on Bδ(0).

• We should also stress the “local” nature of the conclusion. It is possible


to give a function f such that Df (a) exists at every a ∈ domf , but f has
no inverse until it is restricted to a proper subset of domf .
72 CHAPTER 10. INVERSE MAPS

For example, consider the function f : R2 → R2 with

f (x, y) = (ex cos y, ex sin y).

So
f = ((exp ◦Id1 ) cos ◦Id2 , (exp ◦Id1 ) sin ◦Id2 )
which is componentwise a product of differentiable functions and so is differen-
tiable. We have that the Jacobian matrix at (x, y) is
 x
e cos y −ex sin y

0
f (x, y) = .
ex sin y ex cos y

Since all of the partial derivatives are continuous, by Theorem 9.1.2 f is con-
tinuously differentiable, which by definition means that Df is continuous.
Also det(f 0 (x, y)) = e2x which is not zero. And so the inverse of the linear map
Df (x, y) exists everywhere.
On the other hand, we have

    f←(u, v) = (x, y)  ⇐⇒  f(x, y) = (u, v) = (e^x cos y, e^x sin y).

So setting u = e^x cos y, v = e^x sin y, we have u² + v² = e^{2x} and so x = (1/2) ln(u² + v²).
If cos y ≠ 0, i.e. y ≠ (2n+1)π/2, n ∈ Z, then

    v/u = tan y,   so   y = arctan(v/u).

Thus y must be restricted to one branch of the domain of tan; say −π/2 < y < π/2. This says that the restricted f has an inverse only for −π/2 < y < π/2. We find that if

    X = {(x, y) ∈ R² : x ∈ R, −π/2 < y < π/2}

then Y = f(X) = {(u, v) ∈ R² : u > 0}, and

    f←(u, v) = ( (1/2) ln(u² + v²), arctan(v/u) ).
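As a quick sanity check (a minimal numerical sketch with ad hoc function names, not part of the notes), the formula for f← really does undo f on sample points of the strip X:

```python
import numpy as np

def f(x, y):
    # f(x, y) = (e^x cos y, e^x sin y)
    return np.exp(x) * np.cos(y), np.exp(x) * np.sin(y)

def f_inv(u, v):
    # candidate inverse on Y = {(u, v) : u > 0}
    return 0.5 * np.log(u**2 + v**2), np.arctan(v / u)

rng = np.random.default_rng(0)
for _ in range(5):
    x = rng.uniform(-2, 2)
    y = rng.uniform(-np.pi / 2 + 0.1, np.pi / 2 - 0.1)   # stay inside the strip X
    u, v = f(x, y)
    assert u > 0                                          # the image lies in the right half-plane
    xr, yr = f_inv(u, v)
    assert np.allclose((xr, yr), (x, y))
print("f_inv(f(x, y)) == (x, y) on all samples")
```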

In the very special case where V = W = R, we can give an improvement on the


Inverse Map Theorem, in the sense that there is no need to restrict the domain;
thus we get a global result.

If f : I → R and f 0 > 0 on I, a real interval, then f is onto an


interval J, say (since by assumption f 0 exists so f is continuous on
I, thus f (I) is a connected subset of R; which means that f (I) is an
interval) and f ← : J → I exists.

Finally in the case when V = Rn and W = Rn we have



Theorem 10.2.2 Let f map an open subset A of R^n into R^n and let f and all of its partial derivatives f_j^{/i} be continuous on A.
If det(f'(a)) ≠ 0 then
    ∃ open X ⊆ R^n with a ∈ X,
    ∃ open Y ⊆ R^n with f(a) ∈ Y,
such that f : X → Y has a continuous inverse f← : Y → X.
Furthermore the inverse map is differentiable at f (a) and

Df ← (f (a)) = (Df (a))← .

Proof: We assume Theorem 10.2.1. Recall that f 0 (a) is the matrix of Df (a),
so the assumption that det(f 0 (a)) 6= 0 means that (Df (a))← exists and it is
linear, being the inverse of a linear map.
By assumption all partial derivatives exist and are continuous and so Df is
continuous.
Thus the hypothesis of Theorem 10.2.1 is satisfied and so Theorem 10.2.2 follows.
Now to prove theorem 10.2.1 we will need a generalisation to arbitrary normed
vector spaces, of the mean-value theorem. Recall that for functions f : R → R
continuous on [a, b], a closed real interval, and differentiable on (a, b)
    ∃x ∈ (a, b) such that f'(x) = (f(b) − f(a))/(b − a).
Corollary 10.2.3 Let f be as in the mean-value theorem for real-valued functions with real domain; then |f(b) − f(a)| ≤ sup_{t∈[a,b]} |f'(t)| (b − a).

Proof: Assuming the mean-value theorem we have |f(b) − f(a)| = |f'(x)|(b − a) for some x ∈ (a, b). Now |f'(x)| ≤ sup{|f'(t)| : t ∈ [a, b]}, since the sup is an upper bound, and so |f(b) − f(a)| ≤ sup_{t∈[a,b]} |f'(t)| (b − a).
It is the corollary which we can generalise to arbitrary normed vector spaces,
once we have found a replacement for the notion of closed real interval.
Definition 10.2.4 Let a, b ∈ V , a normed vector space, we define

[a, b] := {a + t(b − a) : t ∈ [0, 1]}

to be the “closed segment” joining the 2 points a and b.


More generally one has a notion of convex set in a normed vector space:
Definition 10.2.5 A subset S of V is convex ⇐⇒ ∀ pairs of points a, b ∈ S;
[a, b] ⊆ S.

Theorem 10.2.6 Mean-Value Theorem


Let f map an open set containing the segment [a, b] in the normed vector space
V , into W . If f has a derivative defined and bounded on [a, b] then

    ||f(b) − f(a)||_W ≤ sup_{x∈[a,b]} {||Df(x)||_{L(V,W)}} ||b − a||_V.

Lemma 10.2.7 Let ϕ map an open set on the real line containing the interval
[0, 1] into W . If ϕ has a derivative which exists and is bounded on [0, 1], then

    ||ϕ(1) − ϕ(0)|| ≤ sup_{t∈[0,1]} {||Dϕ(t)||}.

Proof: First we show that for any ε > 0 and each t ∈ [0, 1], ∃δ > 0 such that for all h ∈ R

    |h| < δ ⇒ ||ϕ(t + h) − ϕ(t)|| < (||Dϕ(t)|| + ε)|h|.

Since ϕ is differentiable on [0, 1] we have

    lim_{h→0} ||ϕ(t + h) − ϕ(t) − Dϕ(t)(h)|| / |h| = 0.

Let ε > 0.
Choose δ > 0 such that

    |h| < δ ⇒ ||ϕ(t + h) − ϕ(t) − Dϕ(t)(h)|| / |h| < ε
            ⇒ ||ϕ(t + h) − ϕ(t) − Dϕ(t)(h)|| < ε|h|
            ⇒ ||ϕ(t + h) − ϕ(t)|| − ||Dϕ(t)(h)|| < ε|h|       (triangle inequality)
            ⇒ ||ϕ(t + h) − ϕ(t)|| < ε|h| + ||Dϕ(t)(h)||
                                  ≤ ε|h| + ||Dϕ(t)|| |h|       (Theorem 4.1.4)
                                  = (||Dϕ(t)|| + ε)|h|          as required. (*)
Next we define a set S_ε which is not empty and bounded above; we will show that its least upper bound is 1 and that 1 belongs to S_ε.
Define

    S_ε = {s ∈ [0, 1] : ||ϕ(s) − ϕ(0)|| ≤ sup_{t∈[0,s]} (||Dϕ(t)|| + ε) |s|}.

Now if 1 ∈ S_ε, then ||ϕ(1) − ϕ(0)|| ≤ sup_{t∈[0,1]} (||Dϕ(t)|| + ε), and since ε can be made arbitrarily small,

    ||ϕ(1) − ϕ(0)|| ≤ sup_{t∈[0,1]} ||Dϕ(t)||,

which is what we need to prove in this special case.
0 ∈ S_ε, since 0 ∈ [0, 1] and

    0 = ||ϕ(0) − ϕ(0)|| ≤ sup_{t∈[0,0]} (||Dϕ(t)|| + ε) · |0|.

Thus S_ε ≠ ∅ and it is bounded above (since S_ε ⊆ [0, 1]), so it has a least upper bound.
Let A = sup S_ε. We show that A ∈ S_ε and then that A = 1.
Let {s_n}_{n=1}^∞ be a sequence of real numbers in S_ε such that lim_{n→∞} s_n = A.
Since ϕ is continuous at A (being, by assumption, differentiable on [0, 1]) we have, by the theorem "continuity via sequences", lim_{n→∞} ϕ(s_n) = ϕ(A).
And so

    lim_{n→∞} ϕ(s_n) − ϕ(0) = ϕ(A) − ϕ(0)
    ⇒ lim_{n→∞} (ϕ(s_n) − ϕ(0)) = ϕ(A) − ϕ(0)       since ϕ(0) is independent of n, and so
    || lim_{n→∞} (ϕ(s_n) − ϕ(0)) || = ||ϕ(A) − ϕ(0)||,
    i.e. ||ϕ(A) − ϕ(0)|| = lim_{n→∞} ||ϕ(s_n) − ϕ(0)||   since || · || is continuous. (*)

But s_n ∈ S_ε, so for each n ∈ N

    ||ϕ(s_n) − ϕ(0)|| ≤ sup_{t∈[0,s_n]} {(||Dϕ(t)|| + ε)|s_n|}
                      ≤ sup_{t∈[0,A]} {(||Dϕ(t)|| + ε)|A|}   since for all n ∈ N, s_n ≤ A. (*)

Putting the starred statements together we have

    ||ϕ(A) − ϕ(0)|| ≤ sup_{t∈[0,A]} {(||Dϕ(t)|| + ε)|A|}.

Hence A ∈ S_ε.
Now by definition A ≤ 1. If A = 1 then we are done. So suppose that A < 1.
Choose 0 < δ ≤ 1 − A such that, by the first (*), |h| < δ implies ||ϕ(A + h) − ϕ(A)|| < (||Dϕ(A)|| + ε)|h|; in particular

    ||ϕ(A + δ/2) − ϕ(A)|| < (||Dϕ(A)|| + ε) δ/2.

Now
    ||ϕ(A + δ/2) − ϕ(0)|| = ||ϕ(A + δ/2) − ϕ(A) + ϕ(A) − ϕ(0)||
                          ≤ ||ϕ(A + δ/2) − ϕ(A)|| + ||ϕ(A) − ϕ(0)||     (triangle inequality)
                          ≤ (||Dϕ(A)|| + ε) δ/2 + sup_{t∈[0,A]} {(||Dϕ(t)|| + ε)|A|}
                          ≤ sup_{t∈[0,A+δ/2]} {(||Dϕ(t)|| + ε)|A + δ/2|}.

So A + δ/2 ∈ S_ε, which contradicts the assumption that A is the least upper bound of S_ε. Hence A = 1, as required.

This completes the proof of the lemma.


We now prove the mean value theorem from the special case.

Proof of Theorem 10.2.6: Let f map an open set containing the segment
[a, b] in the normed vector space V into W . Suppose f has a derivative which
is defined and bounded on [a, b].
Put ϕ(t) = f (a + t(b − a)) = (f ◦ (a + Id(b − a)))(t). Then ϕ(1) = f (b),
ϕ(0) = f (a) and a + t(b − a) ∈ [a, b], for all t ∈ [0, 1]. So ϕ maps an open set
containing [0, 1] into W . Since f is differentiable at a+t(b−a) and a+Id(b−a)
is differentiable (being an affine map) the composite ϕ is defined on [0, 1] and
for any t ∈ [0, 1],
Dϕ(t) = D(f ◦ (a + Id(b − a)))(t)
= Df (a + Id(b − a)(t)) ◦ D(a + Id(b − a))(t)
= Df (a + t(b − a)) ◦ (b − a)Id.

So ||Dϕ(t)|| = ||Df (a + t(b − a))||||(b − a)Id|| by an exercise in chapter 5, and


so is bounded on [0, 1]. Let s ∈ R, |s| ≤ 1

||Dϕ(t)(s)|| = ||Df (a + t(b − a))(b − a)s||


≤ ||Df (a + t(b − a))||||(b − a)s|| by 4.1.4
= ||Df (a + t(b − a))||||(b − a)|||s| by homogeneity of norm
≤ ||Df (a + t(b − a))||||(b − a)|| since |s| ≤ 1

i.e. ||Df (a + t(b − a))||||(b − a)|| is an upper bound for {||Dϕ(t)(s)|| : |s| ≤ 1}.
But ||Dϕ(t)|| is the least upper bound of {||Dϕ(t)(s)|| : |s| ≤ 1}. So

||Dϕ(t)|| ≤ ||Df (a + t(b − a))||||(b − a)||.

Now
    ||f(b) − f(a)|| = ||ϕ(1) − ϕ(0)||
                    ≤ sup_{t∈[0,1]} ||Dϕ(t)||                          by the special case
                    ≤ sup_{t∈[0,1]} ||Df(a + t(b − a))|| ||b − a||
                    ≤ sup_{x∈[a,b]} ||Df(x)|| ||b − a||,

completing the proof of the Mean Value Theorem.
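As an illustration (a rough numerical sketch with an arbitrarily chosen map f, not part of the proof), the mean-value inequality can be checked for a concrete map f : R² → R², taking ||Df(x)|| to be the operator 2-norm of the Jacobian:

```python
import numpy as np

def f(p):
    x, y = p
    return np.array([np.sin(x) + y**2, x * y])

def jacobian(p):
    x, y = p
    return np.array([[np.cos(x), 2 * y],
                     [y,         x    ]])

a, b = np.array([0.2, -0.5]), np.array([1.1, 0.7])

# approximate sup of ||Df|| over the segment [a, b] by sampling
ts = np.linspace(0.0, 1.0, 2001)
sup_norm = max(np.linalg.norm(jacobian(a + t * (b - a)), 2) for t in ts)

lhs = np.linalg.norm(f(b) - f(a))
rhs = sup_norm * np.linalg.norm(b - a)
print(f"||f(b)-f(a)|| = {lhs:.4f} <= {rhs:.4f} = sup||Df|| * ||b-a||")
assert lhs <= rhs
```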


We will also need the definition of a contraction mapping and the contraction
map theorem.

Definition 10.2.8 ( Contraction Map):


A function g : S → S is said to be a contraction map on S if

(∃k < 1)(∀x, y ∈ S)||g(x) − g(y)|| ≤ k||x − y||.

Theorem 10.2.9 Contraction Map Theorem:


Let S be a complete subset of a normed vector space and let x1 ∈ S.
If g is a contraction map of S into itself, then the sequence of iterates of x1
under g converges to an element x ∈ S, satisfying x = g(x).
There is no other element of S which satisfies this equation.

To prove the theorem we need the result

If S is a closed ball in a complete normed vector space (Banach


space) then S is itself complete.

Plan of Proof of the Contraction Map Theorem:

A. Show that the distance between successive iterates drops down by a factor
of k or more at each step.

B. Deduce that {x_n}_{n=1}^∞ is a Cauchy sequence and hence that it has a limit in S, because S is complete.

C. Show that the limit x is a fixed point of g, and that it is the only one.

Proof of the Contraction Map Theorem:


A. By the construction of the sequence {xn }∞
n=1 we have

xn = g(xn−1 ) ∀n ≥ 2
and so xn+1 = g(xn )
so that xn+1 − xn = g(xn ) − g(xn−1 ).

From the contraction condition on the map g we have

||g(xn ) − g(xn−1 )|| = ||xn+1 − xn || ≤ k||xn − xn−1 ||(†).

Since (†) holds for all n ≥ 2, by repeatedly applying (†) we may drop the value
of n on the right hand side, back to 2; i.e.

||xn+1 − xn || ≤ k||xn − xn−1 ||


≤ k.k||xn−1 − xn−2 ||
≤ k.k.k||xn−2 − xn−3 ||
..
.
≤ k n−2 ||xn−(n−3) − xn−(n−2) || = k n−2 ||x3 − x2 ||
≤ k.k n−2 ||x2 − x1 ||
= k n−1 ||x2 − x1 ||

and so we see that the distance between successive iterates is dropping down at
least as rapidly as the terms of a geometric series with ratio k.
B. To show {x_n}_{n=1}^∞ is a Cauchy sequence.
Let m < n. Note that

    ||x_n − x_m|| = ||x_n − x_{n−1} + x_{n−1} − x_{n−2} + · · · + x_{m+2} − x_{m+1} + x_{m+1} − x_m||
                  ≤ ||x_n − x_{n−1}|| + ||x_{n−1} − x_{n−2}|| + · · · + ||x_{m+2} − x_{m+1}|| + ||x_{m+1} − x_m||   (triangle inequality)
                  ≤ (k^{n−2} + k^{n−3} + · · · + k^{m−1}) ||x_2 − x_1||

by part A. Since ||x_2 − x_1|| is a constant, the partial sums of the geometric series,

    t_n = (1 + k + k² + · · · + k^{n−2}) ||x_2 − x_1||,

satisfy ||x_n − x_m|| ≤ |t_n − t_m|; i.e. {t_n}_{n=1}^∞ dominates {x_n}_{n=1}^∞.
Now since 0 ≤ k < 1 the geometric series converges, so {t_n}_{n=1}^∞ converges and hence is a Cauchy sequence. Thus {x_n}_{n=1}^∞ is a Cauchy sequence, because it is dominated by a Cauchy sequence. Now since S is complete, {x_n}_{n=1}^∞ converges in S.
Thus there exists x ∈ S such that lim_{n→∞} x_n = x.
C. We need to see that x is a fixed point of g.
We have ∀n ≥ 2, xn = g(xn−1 ) and so

lim xn = lim g(xn−1 ).


n→∞ n→∞

But g is a contraction map, so g is continuous (see the exercises) thus

lim g(xn−1 ) = g( lim xn−1 )


n→∞ n→∞
= g( lim xn )by theorem on shifting sequences
n→∞
= g(x); thus x = g(x).

Finally to see that there is no other fixed point:


suppose that x 6= y are both fixed points of g, so x = g(x) and y = g(y).
Now ||g(x) − g(y)|| = ||x − y||. But by the contraction condition we have

    ||g(x) − g(y)|| ≤ k||x − y||   for some k < 1.

Since x ≠ y we have ||x − y|| > 0, so this gives ||x − y|| ≤ k||x − y|| < ||x − y||, a contradiction. And so the fixed point is unique.
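To make the iteration concrete, here is a small numerical sketch (ours, with an arbitrarily chosen contraction g on R²): the iterates converge to the unique fixed point, and the distances between successive iterates decay at least geometrically.

```python
import numpy as np

def g(p):
    # a contraction on R^2: the Jacobian has operator norm at most 0.25 everywhere
    x, y = p
    return np.array([0.25 * np.cos(y) + 1.0, 0.25 * np.sin(x) - 0.5])

x = np.array([10.0, -7.0])           # arbitrary starting point x_1
prev_gap = None
for n in range(1, 30):
    x_next = g(x)
    gap = np.linalg.norm(x_next - x)
    if prev_gap is not None and prev_gap > 0:
        assert gap <= 0.5 * prev_gap + 1e-12   # geometric decay of successive gaps
    prev_gap, x = gap, x_next

print("fixed point approx:", x, " residual:", np.linalg.norm(g(x) - x))
```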
Finally in this chapter we give an example of the use of the Inverse Map Theo-
rem.

Let A = {(x_1, x_2) : −1/2 < x_1 < 1/2} and define

    f : A → R² by f(x_1, x_2) = (e^{−x_1} cos(x_2), e^{−x_1} sin(x_2)).

First we check that the hypotheses of the inverse map theorem apply; i.e.

• f maps an open subset A of a Banach space V into a normed vector space

• f should be Fréchet differentiable on A

• Df must be continuous at a ∈ A and

• the linear map Df (a) : V → W must have a continuous inverse.

A is an open subset of R2 a complete normed vector space.


In variable free form

f = ((exp ◦−Id1 ).(cos ◦Id2 ), (exp ◦−Id1 ).(sin ◦Id2 ))

the theorems assert differentiability of the projection functions; the functions


cos, sin, exp : R → R are all differentiable; composites are differentiable and a
function is differentiable if its component functions are, so the theorems give
differentiability of f on R2 and therefore on A.
Since f maps an open subset of R2 to R2 , Df is continuous at a = (a, b) ∈ A
⇐⇒ all of the partial derivatives fji are continuous at (a, b). Now

    f_1^{/1}(a, b) = −e^{−a} cos(b)      f_1^{/2}(a, b) = −e^{−a} sin(b)
    f_2^{/1}(a, b) = −e^{−a} sin(b)      f_2^{/2}(a, b) = e^{−a} cos(b)

each of which is differentiable on R2 and so is continuous on R2 . Thus Df is


continuous at (a, b) ∈ A.
Df(a) : R² → R² is the continuous linear map given by

    (Df(a)) = ( −e^{−a} cos(b)   −e^{−a} sin(b) ) (Id_1)
              ( −e^{−a} sin(b)    e^{−a} cos(b) ) (Id_2)

            = ( −e^{−a} cos(b) Id_1 − e^{−a} sin(b) Id_2 )
              ( −e^{−a} sin(b) Id_1 + e^{−a} cos(b) Id_2 )

so Df(a) = (−e^{−a} cos(b) Id_1 − e^{−a} sin(b) Id_2, −e^{−a} sin(b) Id_1 + e^{−a} cos(b) Id_2), which conforms with the form of a linear map from R² to R². The linear map Df(a) : R² → R² has an inverse ⇐⇒ its matrix f'(a) has non-zero determinant.
Now

    f'(a, b) = ( −e^{−a} cos(b)   −e^{−a} sin(b) )
               ( −e^{−a} sin(b)    e^{−a} cos(b) )

and so the determinant det(f'(a)) = −e^{−2a} ≠ 0 for any a ∈ R. Thus f satisfies the hypotheses of the theorem on R² and so on A.
Thus the conclusion to the theorem applies; i.e.
• there is an open set X ⊆ A containing a and an open set Y ⊆ W containing
f (a) such that the restricted map f : X → Y has a continuous inverse
f ← : Y → X.
• The inverse map is Fréchet differentiable at f (a) and

(f ← )0 (f (a)) = (f 0 (a))−1 .

But before we choose the sets X and Y we can assume, by the inverse map
theorem that an inverse exists on some open set and so we will find the inverse
and then try to choose sets X and Y so that f and f ← are bijective.
For f(x, y) = (e^{−x} cos(y), e^{−x} sin(y)) = (u, v) we find that x = ln((u² + v²)^{−1/2}) and y = arctan(v/u). And so

    f←(u, v) = ( ln((u² + v²)^{−1/2}), arctan(v/u) ).
Try to find sets X and Y = f (X) which are suitable in this example.
The Jacobian of f← at (u, v) is the matrix

    ( −u/(u² + v²)   −v/(u² + v²) )
    ( −v/(u² + v²)    u/(u² + v²) )

which gives (f←)'(f(x, y)) as

    ( −e^x cos(y)   −e^x sin(y) )
    ( −e^x sin(y)    e^x cos(y) ).

Check that the matrix product f'(x, y).(f←)'(f(x, y)) gives the identity 2 × 2 matrix, so that the composite Df(x, y) ◦ Df←(f(x, y)) is the identity map on Y, and that the matrix product (f←)'(f(x, y)).f'(x, y) is also the identity 2 × 2 matrix, so that the composite Df←(f(x, y)) ◦ Df(x, y) is the identity map on X. It is also easy to check that (f←)'(f(a)) = (f'(a))⁻¹, as asserted by the theorem.
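The check suggested above is easy to carry out numerically; the sketch below (ours, with ad hoc function names) builds the two Jacobians at a sample point and confirms that their product is the identity.

```python
import numpy as np

def jac_f(x, y):
    # Jacobian of f(x, y) = (e^{-x} cos y, e^{-x} sin y)
    return np.array([[-np.exp(-x) * np.cos(y), -np.exp(-x) * np.sin(y)],
                     [-np.exp(-x) * np.sin(y),  np.exp(-x) * np.cos(y)]])

def jac_finv(u, v):
    # Jacobian of f_inv(u, v) = (ln((u^2+v^2)^{-1/2}), arctan(v/u))
    r2 = u**2 + v**2
    return np.array([[-u / r2, -v / r2],
                     [-v / r2,  u / r2]])

x, y = 0.3, -0.4                                   # a sample point in A
u, v = np.exp(-x) * np.cos(y), np.exp(-x) * np.sin(y)

assert np.allclose(jac_f(x, y) @ jac_finv(u, v), np.eye(2))
assert np.allclose(jac_finv(u, v) @ jac_f(x, y), np.eye(2))
assert np.allclose(jac_finv(u, v), np.linalg.inv(jac_f(x, y)))
print("(f_inv)'(f(a)) equals (f'(a))^{-1} at the sample point")
```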
Exercise 10

1. Let f : R2 → R2 with f (x, y) = (x2 − y 2 , 2xy)

(a) Sketch the images of the hyperbolae of centre (0, 0) and distance to
a vertex of 1,2,3 respectively.
(b) Is f 1-1? Give reasons.
(c) Find a set X such that the restricted map f : X → R2 is 1-1.
(d) Now find the set Y such that the map f : X → Y is onto.
(e) Find a formula for the inverse f ← : Y → X of the restricted map.
(f) Now find the set of points for which the linear map Df (x, y) : R2 → R2
has an inverse. Is the inverse of Df (x, y) continuous?

2. Let g : R³ → R³ be given by g(x, y, z) = (1/7)(1 − y³, 1 + z², x³). Find an upper


bound for sup(x,y,z)∈[−1,1]3 {||Dg(x, y, z)||}. Use the mean value theorem
to show that g is a contraction map.

3. Use the mean value theorem (for real valued functions) to prove that the
restriction of the function 1−cos : [0, π4 ] → [0, π4 ] is a contraction mapping.

4. For f : R² → R² with f(x, y) = (x² − y², x² + y²), show that there is a constant k < 1 such that for all x, x′ in the square [−1/8, 1/8] × [−1/8, 1/8],

       ||f(x) − f(x′)|| ≤ k ||x − x′||.

5. Let V be a Banach space. Let L : V → V be a continuous linear map


and let Id : V → V be the identity map on V . Suppose that L is
close to the identity in the sense that ||L − Id|| < 1, where ||L − Id|| =
sup||x||≤1 {||L(x) − x||}.

(a) Prove that for each b ∈ V there is a unique x ∈ V such that L(x) = b,
thereby proving that L has an inverse L← : V → V .
(b) Use the equation Id = Id − L + L to prove that there is a constant
k such that for each x ∈ V ||x|| ≤ k||L(x)||. Deduce that L← is
continuous.

Thus you have shown that

If L : V → V is a continuous linear map and ||L − Id|| < 1,


where V is a Banach space; then L has a continuous inverse.

6. Let V be a Banach space and W a normed vector space and let L0 : V → W


be a continuous linear map with continuous linear inverse. Let L be a
   continuous linear map which is close to L_0 in the sense that ||L − L_0|| < 1/||L_0←||. Show that the map L itself then has a continuous linear inverse.

   Hint: apply the results of 5 to the map L_0← ◦ L.

7. Prove that each open ball in a normed vector space is convex.

Let f : R3 → R3 have a derivative which is zero at each point of Br (1, 0, 0)


in R3 , where r > 0. From the above Br (1, 0, 0) is a convex set, use the
Mean Value Theorem to show that f is constant on Br (1, 0, 0).

8. Let φ : U → R2 with φ(r, θ) = (r cos(θ), r sin(θ)) where r > 0 and U is an


open subset of R2 . Find regions of φ(U ) for which a local inverse can be
given.
In other words show that φ satisfies the hypotheses of the inverse map
theorem and then find the open sets X and Y and the map f ← of the
conclusion.

9. Let φ : R2 → R2 with φ(x, y) = (x + x2 f (x, y), y + y 2 g(x, y)), where


f, g : R2 → R are diffferentiable on R2 , with continuous partial deriva-
tives. Find the Jacobian at (0, 0) and decide on the invertibility of φ in a
neighbourhood of (0, 0).
[In fact one views a map such as φ as a perturbation of the identity map,
the functions f and g taking very small values when x, y are both close to
zero. ]

10. Consider the initial value problem

φ0 = (f ◦ φ).g, φ(0) = c, f (c) > 0

where f, φ and g are each real valued differentiable functions(and hence


continuous) on R.

(i) By the following lemma (“no-jump” lemma) for some δ > 0 we have
f (y) > 0 for all y ∈ (c − δ, c + δ).
No-jump Lemma: Let D ⊂ R. If f : D → R is continuous and
l ∈ D is such that f (l) > 0 then there exists δ > 0 such that, for all
x∈D
|x − l| ≤ δ ⇒ f (x) > 0.
The lemma tells us that if f (l) > 0 then f (x) > 0 for all x ∈ D
sufficiently near l. The function cannot suddenly jump to negative
values arbitrarily close to l. (A similiar result holds for f (l) < 0).
Prove the following statement and deduce the no-jump lemma from
it.

       Let f : D → R be continuous and suppose f(l) > 0. With a suitable choice of ε > 0, show ∃δ > 0 such that

           ∀x ∈ D,  |x − l| < δ ⇒ f(x) > 0.

(ii) Furthermore from the continuity of φ we have for some ν > 0 φ(x) ∈
(c − δ, c + δ) for all x ∈ (−ν, ν).
(iii) Thus ∀x ∈ (−ν, ν), (f ◦ φ)(x) > 0 and so we can divide both sides
of the differential equation by f ◦ φ and get

Id−1 ◦ f ◦ φ.φ0 = g on (−ν, ν).

    (iv) Integrating gives, for x ∈ (−ν, ν),

             ∫₀ˣ (Id⁻¹ ◦ f ◦ φ).φ' = ∫₀ˣ g.

         Using the substitution rule on the left hand side and the initial condition φ(0) = c gives

             ∫₀ˣ (Id⁻¹ ◦ f ◦ φ).φ' = ∫_c^{φ(x)} Id⁻¹ ◦ f = (∫_c Id⁻¹ ◦ f)(φ(x))

         and so

             (∫_c Id⁻¹ ◦ f) ◦ φ = ∫₀ g   on (−ν, ν).

         So if the function ∫_c Id⁻¹ ◦ f has an inverse (∫_c Id⁻¹ ◦ f)← on (c − δ, c + δ), we could conclude that

             φ = (∫_c Id⁻¹ ◦ f)← ◦ ∫₀ g

         is the unique solution to the original problem.
         Show that the function ∫_c Id⁻¹ ◦ f satisfies the hypotheses of the Inverse Map Theorem.
Chapter 11

Implicit functions

Our last major result for this course is the Implicit Function Theorem. It is a
more general theorem than the inverse map theorem. To motivate the theorem,
consider the question; given a function F : R2 → R does there exist a set of
points
{(x, y) ∈ R2 : F (x, y) = 0}
which is the graph of some function; i.e. does there exist a function φ say, such
that
{(x, y) ∈ R2 : F (x, y) = 0} = {(x, φ(x)) : F (x, φ(x)) = 0}?
In general the answer to the question is no. The purpose of the Implicit Function
Theorem is to establish sufficient conditions for such to exist, possibly on a
subset of the domain of F .
Example
Let F : R2 → R be given by F (x, y) = x2 + y 2 − 1.

[Figure: the unit circle x² + y² = 1, with a point (a, b) marked on it and the points (−1, 0) and (1, 0) labelled.]
The set of points

    F⁻¹({0}) = {(x, y) ∈ R² : F(x, y) = 0} = {(x, y) : x² + y² = 1}

is the unit circle, which cannot be the graph of any function.


We replace the global question by a local question: given a point (a, b) ∈
F −1 ({0}) is there a neighbourhood of (a, b) (X say) such that the set X ∩
F −1 ({0}) forms the graph of some function?


In this example the answer is yes; provided that (a, b) 6= (±1, 0).
Notice that in this example ∂2 F (a, b) = F /2 (a, b)Id = 2bId = 0 at each of the
points (±1, 0). We have
Theorem 11.0.10 Let F : U ⊆ V × W → W where U is open and let F be
Fréchet differentiable on U . Suppose that there is a point (a, b) in U such that
F (a, b) = 0, DF is continuous at (a, b) and that (∂2 F )(a, b) ∈ L(W, W ) has a
continuous inverse (so that (∂2 F )(a, b) is a homeomorphism.)
Then there is an open set A ⊆ V containing a, an open set X ⊆ V × W
containing (a, b) and a unique function ϕ : A → W with ϕ(a) = b such that for
all x ∈ A
(a) (x, ϕ(x)) ∈ X
(b) F (x, ϕ(x)) = 0
(c) X ∩ F −1 ({0}) = {(x, ϕ(x)) : x ∈ A}.
(d) The function ϕ is differentiable at a and
Dϕ(a) = −(∂2 F (a, b))← ◦ (∂1 F )(a, b).

Remarks:
• In the special case V = W = R (i.e. F : U ⊆ R2 → R) we have that
∂2 F (a, b) = F /2 (a, b)Id ∈ L(R, R).
But remember that L(R, R) = {mId : m ∈ R}, so
(a) mId has an inverse ⇐⇒ m ≠ 0 (its inverse is (1/m)Id);
(b) ∂2 F (a, b) = F /2 (a, b)Id and
(c) every linear map from R to R is continuous.
So by (a) and (c) ∂2 F (a, b) has a continuous inverse ⇐⇒ ∂2 F (a, b) 6= 0
and by (b) this is equivalent to F /2 (a, b)Id 6= 0, which is in turn equivalent
to F /2 (a, b) 6= 0. Thus we can replace the condition “∂2 F (a, b) has a
continuous inverse” with F /2 (a, b) 6= 0.
• Notice also that in this case (V = W = R) we have ∂1 F (a, b) = F /1 (a, b)Id
  so that

      (∂₂F(a, b))← ◦ ∂₁F(a, b) = (F^{/1}(a, b) / F^{/2}(a, b)) Id.
• The next simplest case is V = Rm−n , W = Rn with m > n. Identify
V × W with Rm (= Rm−n × Rn ). In this case the conditions that
F be differentiable on U and
DF be continuous at a = (a, b) are usually met by checking that all partial
/j
derivatives Fi (1 ≤ i ≤ n, 1 ≤ j ≤ m) exist and are continuous at a.
The condition that ∂2 F (a, b) be a homeomorphism reduces to the invert-
ibility of its matrix; i.e. need that the matrix for the linear map ∂2 F (a, b)
has non zero determinant.
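In this finite-dimensional setting the homeomorphism condition is a concrete determinant check. Below is a rough numerical sketch (the map F and the point (a, b) are our own hypothetical example, and the Jacobian is computed by central differences): the block of the Jacobian in the W-variables must have non-zero determinant, and the theorem's formula then gives Dϕ(a) = −(∂₂F(a, b))← ◦ ∂₁F(a, b).

```python
import numpy as np

def F(x, y, z):
    # a hypothetical F : R x R^2 -> R^2  (so V = R, W = R^2, m = 3, n = 2)
    return np.array([x + y * z, y**2 + z - 2.0])

def jacobian(x, y, z, h=1e-6):
    # numerical 2x3 Jacobian of F, columns ordered (x, y, z)
    p = np.array([x, y, z], dtype=float)
    J = np.zeros((2, 3))
    for j in range(3):
        dp = np.zeros(3); dp[j] = h
        J[:, j] = (F(*(p + dp)) - F(*(p - dp))) / (2 * h)
    return J

a, b = -1.0, (1.0, 1.0)                      # F(-1, 1, 1) = (0, 0)
J = jacobian(a, *b)
d1F = J[:, :1]                               # matrix of the partial derivative in the V-variable
d2F = J[:, 1:]                               # matrix of the partial derivative in the W-variables
print("det of the W-block =", np.linalg.det(d2F))        # nonzero, so invertible
print("Dphi(a) =", -np.linalg.solve(d2F, d1F).ravel())   # matrix of -(d2F)^{-1} composed with d1F
assert abs(np.linalg.det(d2F)) > 1e-8
```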

11.1 An Application of the Implicit Function


Theorem
Let F : R2 × R → R be the map defined by
    F(x, y, z) = x²/4 + y² + z² − 1.
We show that F satisfies the hypotheses of the Implicit Function Theorem on an open set containing the point ((0, 1/√2), −1/√2). Then we will interpret the conclusion of the Implicit Function Theorem.
To see that F is differentiable on R³ and that its (Fréchet) partial derivatives are all continuous (i.e. ∂₁F((x, y), z) and ∂₂F((x, y), z) are continuous), note that F = Id₁²/4 + Id₂² + Id₃² − 1, and so the theorems on differentiability give that F is differentiable on R³.
Now F^{/i}(x, y, z) = DF(x, y, z)(e_i), thus we have

    F^{/1}((x, y), z) = x/2,
    F^{/2}((x, y), z) = 2y,
    F^{/3}((x, y), z) = 2z.

So the Jacobian F'((x, y), z) is the 1 × 3 matrix (x/2  2y  2z) and

    DF(x, y, z) = (x/2) Id₁ + 2y Id₂ + 2z Id₃.
Now
    ∂₁F((x, y), z)(u, v) = DF((x, y), z)((u, v), 0) = (x/2)u + 2yv.
At the point ((0, 1/√2), −1/√2) we have

    ∂₁F((0, 1/√2), −1/√2)(u, v) = (2/√2) v;   i.e.  ∂₁F((0, 1/√2), −1/√2) = (2/√2) Id₂.
And so ∂₁F((0, 1/√2), −1/√2) : R² → R is continuous. Also

    ∂₂F((x, y), z)(w) = DF((x, y), z)((0, 0), w) = 2zw,

which gives

    ∂₂F((0, 1/√2), −1/√2)(w) = −(2/√2) w.

And so ∂₂F((0, 1/√2), −1/√2) : R → R is continuous, and furthermore it has a continuous inverse

    (∂₂F((0, 1/√2), −1/√2))← = −(√2/2) Id.

Since F and its partial derivatives are continuous on an open set containing ((0, 1/√2), −1/√2), DF is continuous at ((0, 1/√2), −1/√2).
Finally ((0, 1/√2), −1/√2) is a point in R² × R such that F((0, 1/√2), −1/√2) = 0. Thus F satisfies the hypotheses of the Implicit Function Theorem.
The conclusion to the Implicit Function Theorem asserts the existence of

• open A ⊆ R2

• open X ⊆ R² × R containing ((0, 1/√2), −1/√2) and

• a unique function φ : A → R with φ(0, 1/√2) = −1/√2 satisfying, for all x ∈ A:

(a) (x, φ(x)) ∈ R2 × R,


(b) F (x, φ(x)) = 0 and
(c) X ∩ F⁻¹({0}) = {(x, φ(x)) : x ∈ A}; furthermore

• the function φ is differentiable at (0, 1/√2) and

      Dφ(0, 1/√2) = −(∂₂F((0, 1/√2), −1/√2))← ◦ ∂₁F((0, 1/√2), −1/√2).

It is the proof of the theorem which will guide us to understand how to explicitly
describe the conclusion and so we defer the description until we have examined
a proof of the theorem.
Exercise 11

1. Suppose F : R2 → R with F (x, y) = (x − 2)3 + xey−1 . Show that in a


neighbourhood of (1, 1) the set {(x, y)| F (x, y) = 0} forms the graph of a
function φ with F (x, φ(x)) = 0.

2. The condition F /2 (a, b) 6= 0 in Theorem (11.0.10) is sufficient but not


necessary. Show that this is the case by using the example

F : R2 → R with F (x, y) = x9 − y 3 .

3. A curve C in R3 is given as the intersection of two surfaces S1 and S2


where
S1 = {(x, y, z) : x2 + y 2 + z 2 = 1} and
S2 = {(x, y, z) : z = x2 }
Show (0, 1, 0) ∈ C, and use the Implicit Function Theorem to show that
in a neighbourhood of (0, 1, 0) the coordinates of C may be expressed
entirely in terms of the x-coordinate, i.e there are functions f : R → R
and g : R → R such that in some neighbourhood of (0, 1, 0), (x, y, z) ∈ C
if and only if y = f (x) and z = g(x). Find f 0 (0) and g 0 (0).
Hint: Find a function F : R3 → R2 such that (x, y, z) ∈ C if and only if
F (x, y, z) = (0, 0).

4. Let F : R×R
√ → R with F (x, y) = x2 +y 2 −4. Show that in a neighbourhood
of (1, 3) the set {(x, y)| F (x, y) = 0} forms the graph of a function φ
with F (x, φ(x)) = 0.
5. Let F : R × R → R with F (x, y) = x2 cos(y) + sin(x2 ).

Show that in a neighbourhood of ( π, π2 ) the set {(x, y)| F (x, y) = 0}
forms the graph of a function φ with F (x, φ(x)) = 0.
Chapter 12

Proof of the Theorems

Theorem 10.0.10: (Inverse Map Theorem)


Let f map an open subset A of a Banach space V into a normed vector space
W and let f be Fréchet differentiable on A and Df continuous at a ∈ A.
If the linear map Df (a) : V → W has a continuous inverse then
there is an open set X ⊆ A containing a and an open set Y ⊆ W containing f (a)
such that the restricted map f : X → Y has a continuous inverse f ← : Y → X.
The inverse map is Fréchet differentiable at f (a) and

(f ← )0 (f (a)) = (f 0 (a))−1

i.e. the Jacobian matrix of the inverse is the inverse of the Jacobian matrix.
The theorem reduces to the following special case in which V = W and Df (a) =
IdV .

Lemma 12.0.1 (Special Case of the Inverse Map Theorem)


Let f map an open set A of a Banach space V into V and let f be Fréchet
differentiable on A. Let Df be continuous at a.
If Df (a) = IdV then
there is an open set X ⊆ A containing a and an open set Y ⊆ V containing f (a)

such that the restricted map f : X → Y has a continuous inverse f← : Y → X.
The inverse map f← is differentiable at b = f(a) with Df←(b) = Id.

First we assume that the special case is true and deduce the Inverse Map The-
orem.
Suppose that f maps an open subset A of a Banach space V to W , f is Fréchet
differentiable on A, Df is continuous at a ∈ A and the linear map Df (a) :
V → W has a continuous inverse (Df (a))← (i.e. assume the hypotheses of the
inverse map theorem).
Now set f̄ = (Df(a))← ◦ f.

[Diagram: f̄ = (Df(a))← ◦ f, with f : A ⊆ V → W and (Df(a))← : W → V.]

We want to show that f̄ satisfies the hypotheses of the special case. We have

(i) f̄ maps an open set A ⊆ V into V, by its definition;

(ii) f is Fréchet differentiable at a ∈ A and (Df (a))← is Fréchet differen-


tiable at f (a) ∈ V (being a continuous linear map) and so the composite
(Df (a))← ◦ f is Fréchet differentiable at a ∈ A. Furthermore

    Df̄(a) = D((Df(a))← ◦ f)(a)
           = D((Df(a))←)(f(a)) ◦ Df(a)     by the chain rule
           = (Df(a))← ◦ Df(a)               since (Df(a))← is a continuous linear map
           = Id_V                            since (Df(a))← is the inverse of the linear map Df(a).

(iii) To see that Df̄ is continuous at a ∈ A:
      for each x ∈ A, Df̄(x) = (Df(a))← ◦ Df(x), so

          ||Df̄(x) − Df̄(a)|| = ||(Df(a))← ◦ (Df(x) − Df(a))|| ≤ ||(Df(a))←|| ||Df(x) − Df(a)||,

      which can be made arbitrarily small since Df is continuous at a.

By (i), (ii) and (iii), f̄ = (Df(a))← ◦ f satisfies the hypotheses of the special case.
The conclusion is that there is an open set X ⊆ A containing a and an open set Ȳ ⊆ V containing f̄(a) such that the restricted map f̄ : X → Ȳ has a continuous inverse f̄← : Ȳ → X. The inverse map f̄← is differentiable at b̄ = f̄(a) with Df̄←(b̄) = Id.
It remains to deduce the conclusion of the Inverse Map Theorem.
Let X ⊆ A be this open set and let Y be the subset Df(a)(Ȳ) ⊆ W, so that f : X → Y ⊆ W is given by

    f = Df(a) ◦ f̄,   where Df(a) is restricted to Ȳ.


[Diagram: f : X ⊆ A ⊆ V → Y ⊆ W written as the composite of f̄ : X → Ȳ ⊆ V with Df(a) : Ȳ → Y.]

Since each of Df(a) and f̄ has a continuous inverse, f has a continuous inverse

    f← = (Df(a) ◦ f̄)← = f̄← ◦ (Df(a))←.

And so f← : Y → X; Y is open, since Ȳ is open and (Df(a))← is continuous.
Thus the conclusion to the Inverse Map Theorem holds:
there is an open set X ⊆ A containing a and an open set Y ⊆ W containing f (a)
such that the restricted map f : X → Y has a continuous inverse f ← : Y → X
which is Fréchet differentiable at f (a) and

(f ← )0 (f (a)) = (f 0 (a))−1 .

And so we have to prove the special case of the inverse map theorem. This is
really the main step in the proof of the Inverse Map Theorem because here we
have to prove existence of the sets X and Y .
Proof of Lemma (12.0.1):
We assume that f is a function satisfying the hypotheses of the special case,
i.e. we have f maps an open set A of a Banach space V into V , f is Fréchet
differentiable, Df continuous and Df (a) = IdV .
Now given a y close to b = f (a) ∈ V , we want to solve the equation y = f (x)
to get a unique point x near to a in A.
For this we can use the contraction map theorem.

• Define a function gy which is a contraction map on B r (a) and


gy = Id_V − f + y.

• Apply the contraction map theorem to find that gy has a unique fixed
point (one for each y) and

gy (x) = x ⇐⇒ y = f (x) by the definition of gy

and so we obtain the desired y = f (x).

• This allows us to find the sets X, Y of the conclusion.

We define
gy : B r (a) → V by gy = IdV − f + y.

We’ll show that gy satisfies the contraction condition on B r (a) and that it maps
the closed ball B r (a), of radius r > 0 centred on a into itself. Since B r (a) is a
closed subset of a complete space it is complete and so for each y, gy satisfies
the hypotheses of the contraction map theorem.

First notice that; for each x ∈ B r (a),

Dgy (x) = DIdV (x) − Df (x) + Dy(x)


by the theorem on sums for Fréchet derivatives
= IdV − Df (x)

and that for each pair of points x1 , x2 ∈ B r (a)

||Dgy (x1 ) − Dgy (x2 )|| = ||IdV − Df (x1 ) − IdV + Df (x2 )||
= ||Df (x1 ) − Df (x2 )||.

We use this result together with the fact that Df is continuous at a ∈ A to


prove that ∃r > 0 such that for each y,

    ∀x ∈ B̄_r(a),  ||Dg_y(x)|| < 1/2.

Since Df is continuous at a we have that

    ∀ε > 0, ∃δ > 0 such that ||x − a|| < δ ⇒ Df(x) ∈ B_ε(Df(a)).

In particular take ε = 1/2, and let r > 0 be chosen so that for all x ∈ A with ||x − a|| ≤ r we have

    ||Df(x) − Df(a)|| < 1/2
    ⇒ ||Df(x) − Id_V|| < 1/2      by the assumption Df(a) = Id_V
    ⇒ ||Dg_y(x)|| < 1/2,

and so for all ||x − a|| ≤ r we have ||Dg_y(x)|| < 1/2.


We have for x1 , x2 ∈ B r (a), [x1 , x2 ] ⊂ B r (a) since B r (a) is convex (see exercises
in chapter 10) and ||Dgy (x)|| < 12 i.e. gy has a derivative on [x1 , x2 ] ⊂ B r (a),
which is bounded, so by the Mean Value Theorem

    ||g_y(x_1) − g_y(x_2)|| ≤ sup_{x∈[x_1,x_2]} {||Dg_y(x)||} ||x_1 − x_2||
                            < (1/2) ||x_1 − x_2||.   (†)

Thus for each y, g_y satisfies the contraction condition. Next we show that for y ∈ B_{r/2}(b), g_y(x) ∈ B̄_r(a) for all x ∈ B̄_r(a).
Let y ∈ B_{r/2}(b), i.e. ||y − b|| < r/2.
Let x ∈ B̄_r(a), so ||x − a|| ≤ r.
Now by (†) we have, for each fixed y and each x_1, x_2 ∈ B̄_r(a),

    ||g_y(x_1) − g_y(x_2)|| < (1/2) ||x_1 − x_2||.

So in particular we have

    ||g_y(x) − g_y(a)|| < (1/2) ||x − a|| ≤ r/2,

whenever x ∈ B̄_r(a).
Now g_y = Id_V − f + y, and so

    g_y(a) − a = −f(a) + y = −b + y   and   ||g_y(a) − a|| = ||y − b||.

Furthermore
    ||g_y(x) − a|| = ||g_y(x) − g_y(a) + g_y(a) − a||
                   ≤ ||g_y(x) − g_y(a)|| + ||g_y(a) − a||
                   = ||g_y(x) − g_y(a)|| + ||y − b|| < r/2 + r/2 = r.

So for y ∈ B_{r/2}(b) and x ∈ B̄_r(a) we have

    ||g_y(x) − a|| ≤ r
    ⇒ g_y(x) ∈ B̄_r(a),
    i.e. g_y : B̄_r(a) → B̄_r(a),

which is what we wanted to prove, since along with the contraction condition
(†) it gives that gy satisfies the contraction map theorem for B r (a).
By the Contraction Map Theorem gy has a unique fixed point for each y.

    i.e. ∀y ∈ B_{r/2}(b), ∃ unique x ∈ B̄_r(a) such that y = f(x).

And since ||x − a|| = ||g_y(x) − a|| < r by the estimate above, this fixed point in fact lies in the open ball:

    ∀y ∈ B_{r/2}(b), ∃ unique x ∈ B_r(a) such that y = f(x).   (***)

So put Y = B_{r/2}(b) and X = f⁻¹(Y) ∩ B_r(a). Then Y is open and X is open (being the intersection of two open sets) and, by (***), f(X) = Y.
It remains to prove that f← is continuous and that f← is Fréchet differentiable at f(a) = b.
Given ρ with 0 < ρ ≤ r, choose δ = ρ/2; then f←(B_{ρ/2}(b)) ⊆ B_ρ(a), since for every y ∈ B_{ρ/2}(b) the argument above (applied with ρ in place of r) gives f←(y) = x ∈ B_ρ(a).
And so f← is continuous at b.
Finally we show that f ← is differentiable at f (a) = b.
We have by assumption that f is differentiable at a, so

    f(a + h) − f(a) − Df(a)(h) = O(h)   (i)

where O(h) is a function for which lim_{h→0} O(h)/||h|| = 0.
Assume L_b← is a linear map with L_b← ◦ Df(a) = Id; we will find a sufficient condition for L_b← to be the Fréchet derivative of f← at b. (L_b← exists since by assumption Df(a) = Id_V, which has a linear inverse.)
Applying L_b← to (i) we have

    L_b←(f(a + h) − f(a)) = h + L_b←(O(h))                      (1)
                          = (a + h − a) + L_b←(O(a + h − a)).
For k ≠ 0 with b + k ∈ Y there is a unique non-zero h such that f(a + h) = b + k, i.e. f←(b + k) = a + h, and so

    L_b←(b + k − b) = f←(b + k) − f←(b) + L_b←(O(f←(b + k) − f←(b)))

(rewriting f(a) = b and f(a + h) = b + k), i.e.

    f←(b + k) = f←(b) + L_b←(k) − L_b←(O(f←(b + k) − f←(b))).

To see that L_b← is the Fréchet derivative of f← at f(a) we have to show that

    lim_{k→0} ||L_b←(O(f←(b + k) − f←(b)))|| / ||k|| = 0,

which is the sufficient condition for L_b← = Df←(b).
But L_b← is a continuous linear transformation, so ||L_b←(x)|| ≤ ||L_b←|| ||x||, and so it suffices to show that

    lim_{k→0} ||O(f←(b + k) − f←(b))|| / ||k|| = 0.
Now

    ||O(f←(b + k) − f←(b))|| / ||k||
        = [ ||O(f←(b + k) − f←(b))|| / ||f←(b + k) − f←(b)|| ] · [ ||f←(b + k) − f←(b)|| / ||k|| ]   (2)

and by continuity of f← we have f←(b + k) → f←(b) as b + k → b, i.e. as k → 0.
And by the definition of O,

    ||O(f←(b + k) − f←(b))|| / ||f←(b + k) − f←(b)|| → 0

as f←(b + k) → f←(b).
Also, by the contraction condition (†) (applied to g_y = Id_V − f + y with x_1 = f←(b + k), x_2 = f←(b)),

    (1/2) ||f←(b + k) − f←(b)|| ≤ ||k||,   so   ||f←(b + k) − f←(b)|| / ||k|| ≤ 2.

So the first factor on the right of (2) tends to zero while the second remains bounded, and hence the left-hand side tends to zero, which is the desired result. Thus we have deduced the conclusion of the special case of the Inverse Map Theorem: we have X = f⁻¹(Y) ∩ B_r(a), Y = B_{r/2}(b) and f(X) = Y. In addition f : X → Y is injective and so its inverse f← : Y → X exists. We have also shown that f← is continuous and differentiable.
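The construction in this proof is effectively an algorithm: to solve y = f(x) near a, iterate g_y = Id − f + y starting from a. The sketch below (our own illustration, for a map chosen to be a small perturbation of the identity so that Df(a) is close to Id) shows the iteration converging to f←(y).

```python
import numpy as np

def f(x):
    # a small perturbation of the identity on R^2, so Df is close to Id
    return x + 0.1 * np.array([np.sin(x[0] + x[1]), x[0] * x[1]])

a = np.array([0.0, 0.0])
b = f(a)
y = b + np.array([0.03, -0.02])        # a target value close to b = f(a)

def g_y(x):
    # the contraction from the proof: g_y = Id - f + y; its fixed point solves f(x) = y
    return x - f(x) + y

x = a.copy()
for _ in range(50):
    x = g_y(x)

print("x =", x, "  f(x) - y =", f(x) - y)   # residual is numerically zero
assert np.allclose(f(x), y)
```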
Recall the Implicit Function Theorem (Theorem 11.0.10):
Let F : U ⊆ V × W → W where U is open and let F be Fréchet
differentiable on U . Suppose that there is a point (a, b) in U such
that F (a, b) = 0, DF is continuous at (a, b) and that (∂2 F )(a, b) ∈
L(W, W ) has a continuous inverse (so that (∂2 F )(a, b) is a homeo-
morphism.)
Then there is an open set A ⊆ V containing a, an open set X ⊆
V × W containing (a, b) and a unique function ϕ : A → W with
ϕ(a) = b such that for all x ∈ A

(a) (x, ϕ(x)) ∈ X


(b) F (x, ϕ(x)) = 0
(c) X ∩ F −1 (0) = {(x, ϕ(x)) : x ∈ A}.
(d) The function ϕ is differentiable at a and

Dϕ(a) = −(∂2 F (a, b))← ◦ (∂1 F )(a, b).

We use the Inverse Map Theorem to prove the Implicit Function Theorem.
The Implicit Function Theorem involves a function F : V × W → W so to use
the Inverse Map Theorem, I need a function H : V ×W → V ×W which satisfies
the hypotheses of the Inverse Map Theorem.
Define a function H : V × W → V × W by

H(v, w) = (v, F (v, w))

(1) To show H is differentiable and for each (h, k) ∈ V × W

DH(v, w)(h, k) = (h, DF (v, w)(h, k)).

H = (H1 , H2 ) = (Id1 , F ), so by the theorem on component functions,


since Id1 and F are differentiable on U , H is differentiable on U with

DH(v, w)(h, k) = (DId1 (v, w)(h, k), DF (v, w)(h, k))


= (Id1 (h, k), DF (v, w)(h, k))
= (h, DF (v, w)(h, k))

as required.

(2) To show DH is continuous at (a, b),


    This is true if for each (v, w) ∈ U

        ||DH(v, w) − DH(a, b)|| = ||DF(v, w) − DF(a, b)||,

    since DF is continuous at (a, b) by assumption.


Let (a, b) ∈ U , let (v, w) ∈ U

||DH(v, w) − DH(a, b)|| = ||(Id1 , DF (v, w)) − (Id1 , DF (a, b))||


= ||(0, DF (v, w) − DF (a, b))||
= ||DF (v, w) − DF (a, b)||

Hence DH is continuous at (a, b).

(3) Next we are to prove that DH(a, b) has an inverse and that the inverse is
continuous.

(i) Let (v, w) ∈ V × W. To Prove:

DH(a, b)(x, y) = (v, w) ⇔ (x, y) = (v, ∂2 F (a, b)← (w−∂1 F (a, b)(v))).

First suppose that

DH(a, b)(x, y) = (v, w)

i.e. by (1) (x, DF (a, b)(x, y))) = (v, w) and so x = v and DF (a, b)(x, y) =
w. But

DF (a, b)(x, y) = ∂1 F (a, b)(x) + ∂2 F (a, b)(y)

i.e. ∂2 F (a, b)(y) = w − ∂1 F (a, b)(x). Now by assumption ∂2 F (a, b)


has continuous inverse. We have

(∂2 F (a, b))← (∂2 F (a, b)(y)) = (∂2 F (a, b))← (w − ∂1 F (a, b)(x))

i.e. y = (∂2 F (a, b))← (w − ∂1 F (a, b)(v))


since x = v

i.e. (DH(a, b))← (v, w) = (v, (∂2 F (a, b))← (w − ∂1 F (a, b)(v)))

i.e. (DH(a, b))← = (Id1 , (∂2 F (a, b))← ◦ (Id2 − ∂1 F (a, b) ◦ Id1 )).
Conversely suppose that

(x, y) = (v, (∂2 F (a, b))← (w − ∂1 F (a, b)(v))).

Now
DH(a, b)(x, y) = DH(a, b)(v, (∂2 F (a, b))← (w − ∂1 F (a, b)(v)))
.
= (v, DF (a, b)(v, (∂2 F (a, b))← (w − ∂1 F (a, b)(v))))

And
DF (a, b)(v, ∂2 F (a, b)← (w − ∂1 F (a, b)(v)))
= ∂1 F (a, b)(v) + ∂2 F (a, b)(∂2 F (a, b)← (w − ∂1 F (a, b)(v)))
= ∂1 F (a, b)(v) + w − ∂1 F (a, b)(v)
=w

Hence
DH(a, b)(x, y) = (v, w).
(ii) Next we want to see that (DH(a, b))← is continuous.

||(DH(a, b))← (v, w)|| = ||(v, (∂2 F (a, b))← (w − ∂1 F (a, b)(v))||

so

||(DH(a, b))← (v, w)||2 = ||v||2 +||(∂2 F (a, b))← (w−∂1 F (a, b)(v))||2

since ||(v, w)||2 = ||v||2 + ||w||2 in a product space. This gives

||(DH(a, b))← (v, w)||2


≤ ||v||2 + ||(∂2 F (a, b))← ||2 ||w − ∂1 F (a, b)(v)||2
since (∂2 F (a, b))← is linear
≤ ||v||2 + ||(∂2 F (a, b))← ||2 (||w|| + ||∂1 F (a, b)||||v||)2
by triangle inequality and ∂1 F (a, b) is linear

         Hence if ||(v, w)|| ≤ ε, i.e. ||v||² + ||w||² ≤ ε², then ||v|| ≤ ε and ||w|| ≤ ε, so

             ||(DH(a, b))←(v, w)||² ≤ ε² + ||(∂₂F(a, b))←||² (ε + ||∂₁F(a, b)|| ε)²
                                    = ε² (1 + ||(∂₂F(a, b))←||² (1 + ||∂₁F(a, b)||)²),

         i.e.

             ||(DH(a, b))←(v, w)|| ≤ ε (1 + ||(∂₂F(a, b))←||² (1 + ||∂₁F(a, b)||)²)^{1/2},

         and so ||(DH(a, b))←(v, w)|| can be made arbitrarily small by choice of ε.


So by (1), (2) and (3) H satisfies the hypotheses of the Inverse Map
Theorem. There exists an open set X ⊆ U containing (a, b) and an open
set Y = H(X) ⊆ V × W containing (a, 0) = H(a, b), such that the
restricted map H : X → Y has a continuous inverse H← : Y → X which is
Fréchet differentiable at (a, 0) with DH ← (a, 0) = (DH(a, b))← .
From this we want to demonstrate the conclusion to the Implicit Function
Theorem.
(4) For the open set A ⊆ V we choose A = {x ∈ V : (x, 0) ∈ Y }.
Define a continuous function f : V → V × W by f (x) = (x, 0).
Then A = f −1 (Y ), since if (u, 0) ∈ Y then f −1 ((u, 0)) = u ∈ V and so it
is in A by definition of A, i.e. f −1 (Y ) ⊆ A.
Conversely let x ∈ A, i.e. x ∈ V and (x, 0) ∈ Y . Hence x ∈ f −1 (Y ). And
so A ⊆ f −1 (Y ). Hence A = f −1 (Y ).
The restriction f |A : A → A × {0} = Y of f to A is a continuous bijection
with continuous inverse (i.e. a homeomorphism), hence A is open if and
only if Y is open.
(5) Next we want to find ϕ.
First we show that for each x ∈ A, there exists unique y ∈ W for which
(x, y) ∈ X and H(x, y) = (x, 0).
Let x ∈ A ⇒ (x, 0) ∈ Y ⇒ ∃(x0 , y0 ) ∈ X with H(x0 , y0 ) = (x, 0), since
H : X → Y is onto.
But H(x0 , y0 ) = (x0 , F (x0 , y0 )) and so x = x0 and there exists unique y0
such that F (x0 , y0 ) = 0, since if y2 6= y0 and F (x0 , y2 ) = 0 ⇒ H(x0 , y0 ) =
H(x0 , y2 ) and so H is not 1-1 on X.

Thus we may define ϕ : A → W by for each x ∈ A, (x, ϕ(x)) ∈ X and


H(x, ϕ(x)) = (x, 0).
This gives ϕ(a) = b, since (a, ϕ(a)) ∈ X and
H(a, ϕ(a)) = (a, F (a, ϕ(a))) = (a, 0) by definition of ϕ.
So F (a, ϕ(a)) = 0 = F (a, b) (by assumption), i.e. ϕ(a) = b, since H is
1-1 on X.
(6) In fact, for any x ∈ A we have (x, ϕ(x)) ∈ X and
H(x, ϕ(x)) = (x, 0) = (x, F (x, ϕ(x)))
⇒ 0 = F (x, ϕ(x)).

Next we show that


X ∩ F −1 (0) = {(x, ϕ(x))|x ∈ A}.

Let (u, v) ∈ {(x, ϕ(x))|x ∈ A}, i.e. u = x, v = ϕ(x) for some x ∈ A


⇒ (u, v) ∈ X, by definition of ϕ. Also (u, v) ∈ F −1 (0) since F (x, ϕ(x)) =
0 so (u, v) ∈ X ∩ F −1 (0). i.e. {(x, ϕ(x))|x ∈ A} ⊆ X ∩ F −1 (0).
Next take (u, v) ∈ X∩F −1 (0) ⇒ (u, v) ∈ X ( and consequently H(u, v) =
(u, F (u, v)) ∈ Y )
also (u, v) ∈ F −1 (0) ⇒ F (u, v) = 0.
i.e. H(u, v) = (u, 0).
And so (u, v) = (x, ϕ(x)) for some x ∈ A, i.e. (u, v) ∈ {(x, ϕ(x))|x ∈ A}.
And so
X ∩ F −1 (0) = {(x, ϕ(x))|x ∈ A}
which is what we wanted to show.
(7) It remains to show that ϕ is differentiable at a ∈ A and that
Dϕ(a) = −(∂2 F (a, b))← ◦ ∂1 F (a, b).

Notice that if f : A → V × W is defined by f (x) = (x, 0) then


ϕ = Id2 ◦ H ← ◦ f .
[Let x ∈ A, then
(Id2 ◦ H ← ◦ f )(x) = Id2 ◦ H ← (x, 0) = Id2 (x, ϕ(x)) = ϕ(x).]

But Id2 and H ← are differentiable and f is continuous linear and hence
differentiable. And so ϕ is the composite of differentiable functions.
Furthermore
Dϕ(a) = D(Id2 ◦ H ← ◦ f )(a)
= DId2 (H ← (f (a))) ◦ D(H ← ◦ f )(a)
= Id2 ◦ DH ← (f (a)) ◦ Df (a)
= Id2 ◦ DH ← (a, 0) ◦ f since f is continuous linear

But
DH ← (a, 0) = (DH(a, b))←
= (Id1 , (∂2 F (a, b))← ◦ (Id2 − ∂1 F (a, b) ◦ Id1 )).

And so
Dϕ(a)(h) = ((∂2 F (a, b))← ◦ (Id2 − ∂1 F (a, b) ◦ Id1 ))(f (h))
= ((∂2 F (a, b))← ◦ (Id2 − ∂1 F (a, b) ◦ Id1 ))((h, 0))
= ((∂2 F (a, b))← ◦ −∂1 F (a, b))(h) hence
Dϕ(a) = (∂2 F (a, b))← ◦ −∂1 F (a, b)

as required.
This completes the proof of the Implicit Function Theorem.

To finish we return to the example of Chapter 11 and find explicitly the sets X
and A and function φ : A → W .
Everything depended upon the function H here we have H : R2 × R → R2 × R
which maps ((x, y), z) to ((x, y), F ((x, y), z)).
X was to be a set on which H is bijective so that H −1 : Y → X exists; also X
should contain the point ((0, 1/√2), −1/√2). Consider the set

    F⁻¹({0}) = {((x, y), z) ∈ R³ : x²/4 + y² + z² = 1}.

Now x²/4 + y² + z² = 1 ⇐⇒ z = ± √(1 − (x²/4 + y²)), and z can only be defined in this way if x²/4 + y² ≤ 1.
Also it is clear that we should take

    z = − √(1 − (x²/4 + y²))   (†)

since we want ((0, 1/√2), −1/√2) ∈ X.
Now if H is restricted to

    {((x, y), z) : x²/4 + y² ≤ 1 and z ≤ 0}
then it is injective.
Proof: Suppose that H((x, y), z) = H((u, v), w); this says

    ((x, y), F((x, y), z)) = ((u, v), F((u, v), w)),

which holds
    ⇐⇒ (x, y) = (u, v) and F((x, y), z) = F((u, v), w)
    ⇐⇒ x = u, y = v and F((u, v), z) = F((u, v), w)
    ⇐⇒ u²/4 + v² + z² − 1 = u²/4 + v² + w² − 1,
and since z, w ≤ 0 this is true ⇐⇒ z = w.

But

    {((x, y), z) : x²/4 + y² ≤ 1 and z ≤ 0}

is not open in R³, so we take for X the set

    {((x, y), z) : x²/4 + y² < 1 and z < 0}.
Now to find A ⊆ R²: if we take A = {(x, y) : x²/4 + y² < 1} and for φ : A → R the function defined by

    φ(x, y) = − √(1 − (x²/4 + y²)),

then φ(0, 1/√2) = −1/√2 and so φ satisfies the first condition. In addition:
let (x, y) ∈ A; then ((x, y), −√(1 − (x²/4 + y²))) ∈ R² × R, so (a) is satisfied.

    F((x, y), φ(x, y)) = F((x, y), −√(1 − (x²/4 + y²)))
                       = x²/4 + y² + ( −√(1 − (x²/4 + y²)) )² − 1 = 0,

so (b) is satisfied.
And

    X ∩ F⁻¹({0}) = {((x, y), z) : x²/4 + y² < 1 and z < 0} ∩ {((x, y), z) : x²/4 + y² + z² = 1}
                 = {((x, y), −√(1 − (x²/4 + y²))) : x²/4 + y² < 1}
                 = {((x, y), φ(x, y)) : (x, y) ∈ A},

since a point of F⁻¹({0}) has z < 0 exactly when z = −√(1 − (x²/4 + y²)) with x²/4 + y² < 1.
Hence (c) is satisfied.
Finally φ(x, y) = −√(1 − (x²/4 + y²)) means that

    φ^{/1}(x, y) = −(1/2)(1 − (x²/4 + y²))^{−1/2} · (−2x/4) = x / (4√(1 − (x²/4 + y²)))

and

    φ^{/2}(x, y) = −(1/2)(1 − (x²/4 + y²))^{−1/2} · (−2y) = y / √(1 − (x²/4 + y²)),

so

    Dφ(x, y) = [x / (4√(1 − (x²/4 + y²)))] Id₁ + [y / √(1 − (x²/4 + y²))] Id₂,

    Dφ(0, 1/√2) = Id₂.

Recall that we had

    (∂₂F((0, 1/√2), −1/√2))← = −(√2/2) Id,
    ∂₁F((0, 1/√2), −1/√2) = (2/√2) Id₂,

so you can check that

    (−(∂₂F((0, 1/√2), −1/√2))← ◦ ∂₁F((0, 1/√2), −1/√2))(u, v) = v.

Thus

    −(∂₂F((0, 1/√2), −1/√2))← ◦ ∂₁F((0, 1/√2), −1/√2) = Dφ(0, 1/√2),

as required.
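As a check on this computation (a small numerical sketch of our own, using central differences), one can compare the derivative of the explicit φ at (0, 1/√2) with the matrix of −(∂₂F)← ◦ ∂₁F read off from the Jacobian of F:

```python
import numpy as np

def F(x, y, z):
    return x**2 / 4 + y**2 + z**2 - 1

def phi(x, y):
    return -np.sqrt(1 - (x**2 / 4 + y**2))

a = np.array([0.0, 1 / np.sqrt(2)])
b = phi(*a)                                # b = -1/sqrt(2)

h = 1e-6
# gradient of F at ((0, 1/sqrt(2)), -1/sqrt(2)) by central differences
grad = np.array([(F(a[0] + h, a[1], b) - F(a[0] - h, a[1], b)) / (2 * h),
                 (F(a[0], a[1] + h, b) - F(a[0], a[1] - h, b)) / (2 * h),
                 (F(a[0], a[1], b + h) - F(a[0], a[1], b - h)) / (2 * h)])
d1F, d2F = grad[:2], grad[2]               # partial derivatives w.r.t. (x, y) and w.r.t. z

# D(phi)(a) by central differences on phi itself
Dphi = np.array([(phi(a[0] + h, a[1]) - phi(a[0] - h, a[1])) / (2 * h),
                 (phi(a[0], a[1] + h) - phi(a[0], a[1] - h)) / (2 * h)])

print("Dphi(a)           =", Dphi)             # approximately (0, 1), i.e. Id_2
print("-(d2F)^{-1} * d1F =", -d1F / d2F)        # the Implicit Function Theorem formula
assert np.allclose(Dphi, -d1F / d2F, atol=1e-4)
```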
Exercise 12: Go back to the questions in Exercise 11 and find the relevant sets A, X and the function φ.
Chapter 13

Appendix A- Vector Spaces

A vector space (over R) consists of a set V together with a pair of operations

• vector addition: V × V → V (a, b) 7→ a + b


• scalar multiplication: R × V → V (λ, a) 7→ λa

and an element 0 ∈ V such that the following conditions are satisfied:

(A1) ∀a, b ∈ V a+b=b+a


(A2) ∀a, b, c ∈ V a + (b + c) = (a + b) + c
(A3) ∀a ∈ V a+0=a
(A4) ∀a ∈ V ∃(−a) ∈ V a + (−a) = 0
(M1) ∀a ∈ V 1a = a
(M2) ∀a, b ∈ V ∀λ ∈ R λ(a + b) = λa + λb
(M3) ∀a ∈ V ∀λ, µ ∈ R (λ + µ)a = λa + µa
(M4) ∀a ∈ V ∀λ, µ ∈ R λ(µa) = (λµ)a

Note: it is easy to check that there is at most one element 0 in V which satisfies
(A3)–this element is called the zero element of V .
Question: where did the 1 come from in (M1)?
A vector space is defined as a set V over R (in our case). In fact for R one
can substitute any field. Examples of fields are: R, C, Q with respect to the
operations of addition and multiplication.
Vector Subspace
Given a vector space V equipped with a pair of operations, + and multiplication
by an element of the field of scalars, a subset S of the set V is said to be a vector
subspace of V if and only if it is non-empty and closed under + and the scalar
multiplication.


Field axioms:
(i) ∀x, y ∈ F x+y =y+x
(ii) ∀x, y, z ∈ F x + (y + z) = (x + y) + z
(iii) ∃0 ∈ F ∀x ∈ F x+0=x

(iv) ∀x ∈ F ∃(−x) ∈ F x + (−x) = 0


(v) ∃1 ∈ F ∀x ∈ F 1x = x
(vi) ∀x, y ∈ F xy = yx

(vii) ∀x, y, z ∈ F (xy)z = x(yz)


(viii) ∀x ∈ F, x ≠ 0, ∃x⁻¹ ∈ F   xx⁻¹ = 1
(ix) ∀x, y, z ∈ F   x(y + z) = xy + xz
Examples of vector spaces: Rn for any n ∈ N; a set of matrices, for example
the set M2×2 (R) of 2 × 2 matrices with real entries; a space of functions, for
example F(X, R), the space of real-valued functions on X; a set of polynomials;
a set of solutions to a differential equation; a set of solutions to a system of
linear equations, to mention only a few.
Normed Vector Space
Recall that a map || || : V → R satisfying the following

(a) ||a|| ≥ 0 and ||a|| = 0 ⇔ a = 0,


(b) ||a + b|| ≤ ||a|| + ||b||,
(c) for any λ ∈ R, ||λa|| = |λ|||a||,

for all a, b ∈ V is a norm on V .


A normed vector space is then a pair (V, || ||).
Chapter 14

Appendix B- Complete
Spaces

Definition 14.0.2 We say (xn )n∈N is a Cauchy sequence if and only if


∀ε > 0, ∃ν ∈ N such that ∀n, m ∈ N, with n, m > ν, d(xn , xm ) < ε.
The difference between the definition of convergence and the definition of a
Cauchy sequence, is that in the former the limit is explicitly involved and not
in the latter. The next proposition may enable us to decide whether or not a
given sequence converges, without knowledge of the limit.
Proposition 14.0.3 In any metric space, all convergent sequences are Cauchy
sequences.
Proof: Let (X, d, T ) be a metric space. Let {xn } → x be a convergent sequence
in X. In particular, let (xn )n∈N be a sequence of points in X, let x ∈ X, let
ε > 0, then ∃ν ∈ N such that for all n ∈ N with n > ν we have d(xn , x) < 2ε .
Suppose that m, n > ν, so d(xm , x) < 2ε and d(xn , x) < 2ε . Thus

d(xm , xn ) ≤ d(xm , x) + d(xn , x) < ε.

And so (xn )n∈N is Cauchy.


Thus if a sequence is not Cauchy it is not convergent.
Definition 14.0.4 A subset of a metric space is complete if and only if every
Cauchy sequence in the subset is convergent.
A metric space is complete if and only if the universal set (the set of all elements
of the space) is complete.
The subset (0, 1) of R is not complete. For example the sequence (s(n)) with s(n) = 1/(2n) converges to zero. (For any given ε > 0, choose ν = ⌊1/ε⌋ + 1. Then for any n > ν, n > 1/ε; i.e. ε > 1/n > 1/(2n). Thus d(1/(2n), 0) < ε.) But 0 ∉ (0, 1), and so the sequence (s(n)) is not convergent in (0, 1).


Proposition 14.0.5 Rn , n ≥ 1 with the usual metric, is complete.

Proof: First we show that (R, | |, T ) is complete.


Suppose (sn )n∈N is a Cauchy sequence in R. Then {sn |n ∈ N} is bounded;
since for example if ε = 1, ∃ν, such that ∀m, n > ν, |sn − sm | < 1. Thus for
any n > ν, |sn − sν | < 1 and so 1 − |sν | < |sn | < 1 + |sν |.
Now let m > ν and define the set Sm = {|sn | : n ≥ m} ⊆ R. Sm is bounded
above by ε + |sν | and it’s not empty since |sm | is in it. Thus sup(Sm ) exists.
Let tm = sup(Sm ).
Notice that Sm+1 ⊆ Sm ; i.e. {|sn | : n ≥ m + 1} ⊆ {|sn | : n ≥ m}. So
sup(Sm+1 ) ≤ sup(Sm ); i.e. tm+1 ≤ tm .
Thus the sequence (tn )n∈N is monotonic decreasing.
Also tm is an upper bound of Sm and so tm ≥ |sm | ∈ Sm . Since Sm is bounded
below by 0, we have that {tm : m ∈ N} is also bounded below. So (tm )n∈N is a
bounded sequence which means ∃l ∈ R such that {tm } → l.
Since (sn )n∈N is Cauchy, given any ε > 0, ∃ν1 such that for all m, n > ν1 we
have |sm − sn | < 3ε and since {tm } → l , ∃ν2 such that |l − tm | < 3ε for all
m > ν2 .
Let ν = max{ν1 , ν2 }. By definition of tν , tν − 3ε is not an upper bound of Sν
(tν is the least upper bound). So there exists µ ≥ ν such that |sµ | > tν − 3ε and
sµ ≤ |sµ | ≤ tν (tν is an upper bound of {|sn | : n ≥ ν}).
Now for any n ≥ µ ≥ ν,

    ||s_n| − l| ≤ ||s_n| − |s_µ|| + ||s_µ| − t_ν| + |t_ν − l| < ε/3 + ε/3 + ε/3 = ε.

Hence (|sn |) → l. But (sn ) is Cauchy and so (sn ) → l or (sn ) → −l. Hence
every Cauchy sequence in R is convergent; i.e. R is complete.
Now suppose that x_r = (x_r^1, x_r^2, . . . , x_r^n), r ∈ N, is a Cauchy sequence in R^n, x_r^i being the i-th component of x_r. Then for each i, (x_r^i) is a Cauchy sequence in R, since

    |x_r^i − x_s^i| ≤ √( Σ_i (x_r^i − x_s^i)² ) = d(x_r, x_s),

where d is the usual metric on R^n.
Since R is complete, we have {x_r^i} → x^i ∈ R for each i, 1 ≤ i ≤ n.
Now let ε > 0; for each i, ∃ν_i such that |x_r^i − x^i| < ε/√n for all positive integers r > ν_i. So choosing ν = max{ν_i : i ∈ {1, 2, . . . , n}}, for all r > ν we have

    d(x_r, x) = √( Σ_i (x_r^i − x^i)² ) < ε.

Since n is fixed this means {x_r} → x = (x^1, x^2, . . . , x^n) ∈ R^n.

Proposition 14.0.6 The normed vector space B[a, b] is complete.

Proof: Let {fn }∞ n=1 be a Cauchy sequence in B[a, b]. We are to prove that
there is a function f in B[a, b] which is the limit of this sequence.
The first step is to find some way of constructing a function f which has a
reasonable chance of being the required limit. We now show how to define the
value of such a function f at each point x in [a, b].
To this end, let x ∈ [a, b]. Since for all m, n, ∈ N

|fm (x) − fn (x)| ≤ ||fm − fn ||



it follows that the sequence {fn (x)}∞n=1 is dominated by a Cauchy sequence;


hence it is itself a Cauchy sequence.
But R is complete. Hence this Cauchy sequence {fn (x)}∞ n=1 of real numbers
must have a limit in R. It is therefore meaningful to define f (x) by putting

f (x) = lim fn (x).


n→∞

It remains to show that the function f defined in the previous paragraph is


indeed the limit of the sequence of functions {fn }∞
n=1 in the space B[a, b].
Let  > 0.
Since {fn }∞
n=1 is a Cauchy sequence in B[a, b] it follows that

(∃ν ∈ N)(∀m, n ∈ N) m, n > ν ⇒ ||fm − fn || < .

Let ν be this natural number.


Now let x ∈ [a, b] and let m, n ∈ N with m, n > ν; we have

    |f_m(x) − f_n(x)| < ε.   (∗)

We can now fix m ∈ N (with m > ν) and let n → ∞. Since inequalities are respected by limits we have

    lim_{n→∞} |f_m(x) − f_n(x)| ≤ ε.

But the absolute value function is continuous. So by the theorem "continuity via sequences" we may take the limit inside:

    |f_m(x) − lim_{n→∞} f_n(x)| ≤ ε   (remembering that f_m(x) is fixed),

i.e. |f_m(x) − f(x)| ≤ ε.
i.e. |fm (x) − f (x)| ≤ .

But this is true for all x ∈ [a, b]. Hence

    ||f_m − f|| ≤ ε.

Thus we have shown that

    (∀ε > 0)(∃ν ∈ N)(∀m ∈ N)   m > ν ⇒ ||f_m − f|| ≤ ε.   (∗∗)

From this it would seem that we have reached our goal, except that we have "≤ ε" rather than "< ε". We can fix this by using the ε/2 trick at line (∗).
More seriously we have not yet shown that f ∈ B[a, b]. So we do this next.
Let x ∈ [a, b], and let n ∈ N. By the triangle inequality for R,

|f (x)| = |f (x) − fn (x) + fn (x)|


≤ |f (x) − fn (x)| + |fn (x)|
≤ ||f − fn || + ||fn ||

where by (∗∗), ||f −fn || is certainly defined for some n and where ||fn || is defined
since fn ∈ B[a, b]. The right hand side is thus an upper bound for the set whose
typical member appears on the left. Hence f ∈ B[a, b], as required.
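To illustrate the completeness of B[a, b] (our own numerical sketch, with the sup norm approximated on a grid): the partial sums of the exponential series form a Cauchy sequence in the sup norm on [0, 1], and their sup-norm distance to the limit function exp tends to 0.

```python
import numpy as np
from math import factorial

xs = np.linspace(0.0, 1.0, 1001)

def partial_sum(n):
    # f_n(x) = sum_{k=0}^{n} x^k / k!, an element of B[0, 1]
    return sum(xs**k / factorial(k) for k in range(n + 1))

def sup_norm(g):
    return np.max(np.abs(g))

for n in [2, 4, 8, 16]:
    cauchy_gap = sup_norm(partial_sum(2 * n) - partial_sum(n))   # ||f_{2n} - f_n||
    dist_to_limit = sup_norm(np.exp(xs) - partial_sum(n))        # ||exp - f_n||
    print(f"n = {n:2d}: ||f_2n - f_n|| = {cauchy_gap:.2e}, ||exp - f_n|| = {dist_to_limit:.2e}")
```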
Chapter 15

APPENDIX C-Higher
Order Fréchet Derivatives

In Chapter 4, we saw how to construct from a given pair E, F of normed vector


spaces, a third normed vector space, the space L(E, F) of continuous linear maps
L : E → F with supremum norm (i.e. ||L|| = sup{||L(x)|| : ||x|| ≤ 1}).
We can continue the process, constructing
L(E, L(E, F)), L(E, L(E, L(E, F))), L(E, L(E, L(E, L(E, F)))) . . .
Applying the construction to the pair R2 , R leads to spaces with the following
bases:
L(R2 , R) with basis
{Id1 , Id2 } (since for any L ∈ L(R2 , R), L = aId1 + bId2 , a, b ∈ R; the set
{Id1 , Id2 } is a set of linearly independent vectors which spans L(R2 , R).)
L(R2 , L(R2 , R)) with basis
{Id1 Id1 , Id1 Id2 , Id2 Id1 , Id2 Id2 } (since for any L ∈ L(R2 , L(R2 , R)) and (x, y) ∈
R2 we have
L(x, y) = (mx + ny)Id1 + (px + qy)Id2 ;
i.e. L = mId1 Id1 + pId1 Id2 + nId2 Id1 + qId2 Id2 };
L(R2 , L(R2 , L(R2 , R))) with basis
{Id1 Id1 Id1 , Id1 Id1 Id2 , Id1 Id2 Id1 , Id1 Id2 Id2 , Id2 Id1 Id1 , Id2 Id1 Id2 , Id2 Id2 Id1 , Id2 Id2 Id2 }.
More generally we have
Definition 15.0.7 Let E, F be vector spaces over R, f : E → R and g : E → F.
The scalar product f g is a function f g : E → F defined by
(f g)(v) = f (v)g(v).
Theorem 15.0.8 Let F be a vector space over R with basis {w1 , w2 , . . . , wm }.
Then a basis for L(Rn , F) is
{Idj .wi : i = 1, 2, . . . , m; j = 1, 2, . . . n}.


Proof: Let F be a vector space over R with basis {w1 , w2 , . . . , wm }.


(a) First we show that Idj .wi is linear for each i, j.
Let x = (x1 , . . . , xn ), y = (y1 , . . . , yn ) ∈ Rn and λ ∈ R, then

Idj .wi (x + λy) = Idj (x1 + λy1 , . . . , xn + λyn )wi (x1 + λy1 , . . . , xn + λyn )
= (xj + λyj )wi
= xj wi + λyj wi
= Idj .wi (x) + λIdj .wi (y).

(b) To prove that Idj .wi is continuous we express it as the composite Id1 .Id2 ◦
(Idj , wi ) which we will show is continuous being the composite of contin-
uous functions.
(i) To prove that (Idj , wi ) : Rn → R × F is continuous we will show that
it is differentiable at any a = (a1 , . . . , an ) ∈ Rn .
Let a ∈ Rn , h ∈ Rn then
(Idj , wi )(a + h) − (Idj , wi )(a) = (aj + hj , wi ) − (aj , wi )
= (hj , 0).

Now choosing D(Idj , wi )(a)(h) = (hj , 0); i.e. D(Idj , wi )(a) = (Idj , 0),
it is clear that the map is linear and continuous and satisfies the re-
quired limit condition. Thus (Idj , wi ) is continuous at any point
a ∈ Rn , being differentiable everywhere.
(ii) Here we prove that

Id1 .Id2 : R × F → F
(a, â) 7→ aâ
is continuous.
Let a = (a, â) ∈ R × F.
Let ε > 0.
Choose δ = min{1, ε / (1 + ||â||_F + |a|_R)}; so δ > 0 and δ ∈ R.
Let x = (x, x̂) ∈ R × F and suppose that

    ||x − a||_{R×F} < δ.

Then
    ||x − a|| < 1 and ||x − a|| < ε / (1 + ||â||_F + |a|_R), thus
    ||x − a|| (1 + ||â||_F + |a|_R) < ε, and so
    ||x − a|| (||x − a|| + ||â||_F + |a|_R) < ε, since ||x − a|| < 1; thus
    ||x − a||² + ||x − a|| ||â||_F + ||x − a|| |a|_R < ε; but since
    |x − a| ≤ ||x − a|| and ||x̂ − â|| ≤ ||x − a||, we have
    |x − a| ||x̂ − â|| + |x − a| ||â|| + |a| ||x̂ − â|| < ε, and this gives
    ||(x − a)(x̂ − â) + (x − a)â + a(x̂ − â)|| < ε
by homogeneity and the triangle inequality, i.e.
    ||xx̂ − aâ|| < ε.

And so Id1 Id2 : R×F → F is continuous. Thus the composite function


Idj .wi : Rn → W is continuous at each a ∈ Rn .
(c) By (a) and (b) we have that Idj .wi ∈ L(Rn , F). We now show that {Idj .wi :
j = 1, . . . , n; i = 1, . . . , m} is a basis for L(Rn , F).
Let L ∈ L(Rn , F), let {e1 , . . . , en } be the usual basis for Rn , let {w1 , w2 , . . . , wm }
be a basis for F (over R) and let x = (x_1, . . . , x_n) ∈ R^n; then

    L(x) = L( Σ_{j=1}^n x_j e_j ) = Σ_{j=1}^n x_j L(e_j)

since L is linear.
Also L(e_j) ∈ F and so L(e_j) = Σ_{i=1}^m α_{ij} w_i, since the w_i are elements of a basis for F. Thus

    L(x) = Σ_{j=1}^n x_j L(e_j)
         = Σ_{j=1}^n x_j Σ_{i=1}^m α_{ij} w_i
         = Σ_{j=1}^n Σ_{i=1}^m α_{ij} x_j w_i
         = Σ_{j=1}^n Σ_{i=1}^m α_{ij} Id_j.w_i (x).

Thus ∃λ_{ij} ∈ R such that

    L = Σ_{1≤j≤n, 1≤i≤m} λ_{ij} Id_j.w_i.

This says that {Idj .wi : j = 1, . . . , n; i = 1, . . . , m} spans L(Rn , F). Furthermore,
since {w1 , w2 , . . . , wm } is a basis for F, the expression of L(ej ) in terms of this
basis is unique. In addition the linear independence of the Idj .wi follows from the
linear independence of the wi : if \sum_{i,j} \lambda_{ij}\, Id_j.w_i = 0 then evaluating at ek gives
\sum_{i} \lambda_{ik} w_i = 0, and hence λik = 0 for every i and k. Thus
{Idj .wi : j = 1, . . . , n; i = 1, . . . , m} is a basis for L(Rn , F), completing the proof.
Returning to the previous examples and applying the theorem we have: for
L(R2 , R) the basis is {Id1 .1, Id2 .1} = {Id1 , Id2 } since {1} is a basis for R and
for L(R2 , L(R2 , R)) the basis is {Id1 Id1 , Id1 Id2 , Id2 Id1 , Id2 Id2 }, since {Id1 , Id2 }
is a basis for L(R2 , R).
Definition 15.0.9 A finite dimensional vector space F with basis {w1 , w2 , . . . , wm }
is said to have “Euclidean dominated” norm if
\Big\| \sum_{i=1}^{m} \lambda_i w_i \Big\| \le \sqrt{ \sum_{i=1}^{m} \lambda_i^2 }

for all λ1 , . . . , λm ∈ R.
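For example, Rm with its standard basis {e1 , . . . , em } and the Euclidean norm has a Euclidean dominated norm, with equality:

\Big\| \sum_{i=1}^{m} \lambda_i e_i \Big\|_2 = \sqrt{ \sum_{i=1}^{m} \lambda_i^2 } .

Likewise R with basis {1} and the absolute value. The lemma below shows that the property is then inherited by L(Rn , F) with the operator norm.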

Lemma 15.0.10 If F has a Euclidean dominated norm then so too does L(Rn , F).
That is, if

L = \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_{ij}\, Id_j.w_i \in L(R^n, F)

then

||L|| \le \sqrt{ \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_{ij}^2 } .

Proof: Suppose L = \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_{ij}\, Id_j.w_i . We have

||L|| = \sup_{||x|| \le 1} ||L(x)||
      = \sup_{||x|| \le 1} \Big\| \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_{ij} x_j w_i \Big\|
      = \sup_{||x|| \le 1} \Big\| \sum_{i=1}^{m} y_i w_i \Big\|   where y_i = \sum_{j=1}^{n} \lambda_{ij} x_j
      \le \sup_{||x|| \le 1} \sqrt{ \sum_{i=1}^{m} y_i^2 }   since F has a Euclidean dominated norm
      = \sup_{||x|| \le 1} \sqrt{ \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} \lambda_{ij} x_j \Big)^2 }
      \le \sup_{||x|| \le 1} \sqrt{ \sum_{i=1}^{m} \Big( \sum_{j=1}^{n} \lambda_{ij}^2 \Big) \Big( \sum_{j=1}^{n} x_j^2 \Big) }   by the Cauchy–Schwarz inequality
      = \sqrt{ \sum_{j=1}^{n} \sum_{i=1}^{m} \lambda_{ij}^2 }   since ||x|| \le 1.
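As a quick sanity check of this bound, take F = R with basis {1} (so that Idj .1 = Idj) and the Euclidean norm on R2 , and consider L = 3Id1 − 4Id2 ∈ L(R2 , R). The lemma gives

||L|| \le \sqrt{3^2 + (-4)^2} = 5,

and in fact ||L|| = 5, since |L(x, y)| = |3x − 4y| \le 5\sqrt{x^2 + y^2} by the Cauchy–Schwarz inequality, with equality at (x, y) = (3/5, −4/5).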

Let f : E → F be (Fréchet) differentiable at each x ∈ E.


Then for each x ∈ E, Df (x) : E → F is a continuous linear map from E to F, i.e.
Df (x) ∈ L(E, F).
Now consider the map x 7→ Df (x), denoted by
Df : E → L(E, F).
The map Df need not be linear, continuous or differentiable. But it makes
sense to ask whether it is differentiable at each x ∈ E.
If Df is differentiable at each x ∈ E, then it has a derivative D(Df )(x) at x
which is by definition continuous and linear. And so
D(Df )(x) : E → L(E, F) thus
D(Df )(x) ∈ L(E, L(E, F)).
Again consider the map D(Df ) : E → L(E, L(E, F)).
If D2 f = D(Df ) is differentiable at each x ∈ E, then it has a derivative
D(D2 f )(x) at x and so

D(D2 f )(x) : E → L(E, L(E, F))


thus D3 f (x) ∈ L(E, L(E, L(E, F)))

and so on.
Returning to the map D2 f : E → L(E, L(E, F)), how do we get back to elements
of F?
For x ∈ E we have D2 f (x) ∈ L(E, L(E, F)),
for u ∈ E we have D2 f (x)(u) ∈ L(E, F),
and for w ∈ E we have D2 f (x)(u)(w) ∈ F.
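In the real-valued case this iterated evaluation recovers the familiar bilinear form of second order partial derivatives. A sketch, using the notation f^{/j/k} of the expansions given later in this appendix: for f : Rn → R with D2 f (x) existing,

D^2 f(x)(u)(w) = \sum_{k=1}^{n} \sum_{j=1}^{n} f^{/j/k}(x)\, u_k\, w_j ,

i.e. the bilinear form determined by the matrix of second order partial derivatives of f at x.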
In the case E = Rm , F = Rn (i.e. finite dimensional Euclidean spaces),

Df : Rm → L(Rm , Rn ) is continuous ⇐⇒ each partial derivative f_i^{/j} of f is continuous.
How do we find higher order derivatives of a function f : Rn → Rm ? There are several approaches.

1. Standard methods: apply the Chain Rule together with Lemmas 8.0.1 and 8.0.2 and
Theorem 8.0.3.

2. Note that since {Idj .ei : j = 1, . . . , n; i = 1, . . . , m} is a basis for L(Rn , Rm )
we may write

Df(v) = \sum_{j=1}^{n} \sum_{i=1}^{m} \alpha_{ij}(v)\, Id_j.e_i ,

where αij (v) ∈ R. Since for each v ∈ Rn the value αij (v) is a real number, we may
regard αij as a function from Rn to R; applying theorem 9.3.2 we can then show that
αij = f_i^{/j}. In this sense

Df = \sum_{j=1}^{n} \sum_{i=1}^{m} \alpha_{ij}\, Id_j.e_i = \sum_{j=1}^{n} \sum_{i=1}^{m} f_i^{/j}\, Id_j.e_i .

And so we have

D(Df)(v) = \sum_{j=1}^{n} \sum_{i=1}^{m} D\alpha_{ij}(v)\, Id_j.e_i ,

where Dαij (v) may be computed using the techniques mentioned in 1 above.


3. By noting that {Idk : k = 1, . . . , n} is a basis for L(Rn , R), we may write

D\alpha_{ij}(v) = \sum_{k=1}^{n} \alpha_{ijk}(v)\, Id_k

where αijk (v) ∈ R. By theorem 9.3.2 we have

D(Df)(v) = \sum_{k=1}^{n} \sum_{j=1}^{n} \sum_{i=1}^{m} \alpha_{ijk}(v)\, Id_k\, Id_j.e_i = \sum_{k=1}^{n} \sum_{j=1}^{n} \sum_{i=1}^{m} f_i^{/j/k}(v)\, Id_k\, Id_j.e_i .

For example: let f : R2 → R with f (x, y) = x^3 + 4xy^2 . We have f = Id1^3 + 4Id1 .Id2^2 and so
f is differentiable at each point x = (x, y) ∈ R2 .

Df (x) = f^{/1}(x) Id1 + f^{/2}(x) Id2 = (3x^2 + 4y^2) Id1 + (8xy) Id2 ,  and so
Df = (3Id1^2 + 4Id2^2) Id1 + (8Id1 Id2) Id2 .

Set α1 = f^{/1} = 3Id1^2 + 4Id2^2 and α2 = f^{/2} = 8Id1 Id2 . These are differentiable
on R2 , thus Df is differentiable and

Dα1 (x) = α1^{/1}(x) Id1 + α1^{/2}(x) Id2 = 6x Id1 + 8y Id2
Dα2 (x) = α2^{/1}(x) Id1 + α2^{/2}(x) Id2 = 8y Id1 + 8x Id2 ,  thus

D^2 f (x) = Dα1 (x) Id1 + Dα2 (x) Id2
          = 6x Id1 Id1 + 8y Id2 Id1 + 8y Id1 Id2 + 8x Id2 Id2 .

We have α11 = f^{/1/1} = 6Id1 , α12 = f^{/1/2} = 8Id2 , α21 = f^{/2/1} = 8Id2 and
α22 = f^{/2/2} = 8Id1 , all of which are differentiable. This gives

D^3 f(x) = \sum_{k=1}^{2} \sum_{j=1}^{2} \sum_{i=1}^{2} f^{/i/j/k}(x)\, Id_k\, Id_j\, Id_i
         = 6 Id1 Id1 Id1 + 8 Id2 Id2 Id1 + 8 Id2 Id1 Id2 + 8 Id1 Id2 Id2 .
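As a check, evaluating the second derivative above twice, using Idk Idj (u) = u_k Idj , gives for u = (u1 , u2) and w = (w1 , w2)

D^2 f(x)(u) = (6x\,u_1 + 8y\,u_2)\, Id_1 + (8y\,u_1 + 8x\,u_2)\, Id_2 ,
D^2 f(x)(u)(w) = (6x\,u_1 + 8y\,u_2)\, w_1 + (8y\,u_1 + 8x\,u_2)\, w_2 ,

which is the bilinear form of the second order partial derivatives of f at (x, y), as described earlier.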

In fact it can be shown that the existence of the derivative D(Df )(x)
implies the existence of each of the derivatives Dαij (x).

Theorem 15.0.11 Let f : Rn → Rm and suppose D^k f (v) exists. Then it is
given by

D^k f(v) = \sum_{i_k=1}^{n} \cdots \sum_{i_1=1}^{n} \sum_{i=1}^{m} f_i^{/i_1/i_2/\cdots/i_k}(v)\, Id_{i_k}\, Id_{i_{k-1}} \cdots Id_{i_1}\, e_i .

The following conditions are required for the existence of D^k f (v):

(i) the existence of D^{k-1} f ,

(ii) the existence of D f_i^{/i_1/i_2/\cdots/i_{k-1}}(v) for each i, i1 , . . . , ik−1 ; i.e. every order
k − 1 partial derivative of every component of f must be differentiable at v.

In fact there is a simpler condition which may be used in place of (ii): it suffices that each of
the functions f_i^{/i_1/i_2/\cdots/i_k} is continuous at v.
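As a small instance of the theorem, for k = 2 and m = 1 the formula reads

D^2 f(v) = \sum_{i_2=1}^{n} \sum_{i_1=1}^{n} f^{/i_1/i_2}(v)\, Id_{i_2}\, Id_{i_1} ,

which for f (x, y) = x^3 + 4xy^2 gives again 6x Id1 Id1 + 8y Id2 Id1 + 8y Id1 Id2 + 8x Id2 Id2 , in agreement with the worked example above.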
