
M3302

Differential calculus in Banach spaces

Lecture notes of Prof. Hicham Gebran


hicham.gebran@gmail.com

Lebanese University, Fanar, Fall 2020-2021


www.hichamgebran.com

Introduction and orientation


In Calculus, we study properties related to the derivative of a function of a real variable
f : I ⊂ R → R. For example we have

• The sum rule: $(f + g)' = f' + g'$.

• The product rule: $(fg)' = f'g + fg'$.

• The inverse rule: $\left(\frac{1}{f}\right)' = -\frac{f'}{f^2}$.

• The chain rule: $(f \circ g)' = (f' \circ g)\, g'$.

• The mean value theorem: $f(b) - f(a) = f'(c)(b - a)$ for some $c$ between $a$ and $b$.

• If $f$ has a local maximum or minimum at an interior point $x_0$, then $f'(x_0) = 0$.

• If $f'(x_0) = 0$ and $f''(x_0) > 0$, then $f$ has a local minimum at $x_0$.

• If $f' = 0$ on an interval $I$, then $f$ is constant on $I$.

• Taylor formulas: if $f$ is smooth enough then

$$f(x + h) = f(x) + f'(x)h + \frac{1}{2} f''(x)h^2 + \cdots + \frac{1}{n!} f^{(n)}(x)h^n + \text{remainder}.$$

• The inverse function theorem. If $g : \mathbb{R} \to \mathbb{R}$ is continuously differentiable and $g'(x_0) \neq 0$, then $g$ is bijective near $x_0$ and

$$(g^{-1})'(x) = \frac{1}{g'(g^{-1}(x))}.$$

• The implicit function theorem. Let $x$ and $y$ be two variables related by a relation of the form $f(x, y) = 0$ where $f$ is smooth enough. Can we solve for $y$ in terms of $x$, at least locally? If at some point $(x_0, y_0)$ we have $\frac{\partial f}{\partial y} \neq 0$, then in a neighborhood of $(x_0, y_0)$ we can write $y = \varphi(x)$ where $\varphi$ is as smooth as $f$.

Sometimes we need to consider a function of several variables or a function with several components.
For example the temperature may be a function of three spatial variables x, y, z and time t. This is
an example of what we call a scalar field. The velocity of a fluid may be a function of space and time
(x, y, z, t) and has three components. This is an example of a vector field. This leads to the study
of functions f : R^n → R^m, called vector fields as well. We can push the generalization further
and consider functions between normed spaces. But how should differentiability be defined in
this context? And do the results mentioned above still hold in this more general context? This
is the purpose of the course: to extend the above results to functions between normed spaces
(which will be Banach spaces most of the time).

This course is fundamental in modern mathematical analysis and differential geometry.

Some references

Henri Cartan, Cours de calcul différentiel (Hermann, 2017).


Dominique Azé, Calcul différentiel et équations différentielles (EDP Sciences, 2010).
Contents

1 Preliminaries on normed spaces
1.1 Definitions and basic examples
1.2 Linear bounded operators
1.3 Multilinear bounded operators
1.4 Hilbert spaces

2 Differentiable maps
2.1 Definitions and fundamental examples
2.2 Linearity of the derivative and the chain rule
2.3 Functions between products of normed spaces

3 The mean value theorem
3.1 Some common versions of the mean value theorem
3.2 Connectedness and locally constant maps
3.3 Relation between differentiability and partial differentiability
3.4 Convergence of a sequence of differentiable maps
3.5 Strictly differentiable maps
3.6 More general versions of the mean value theorem

4 Higher order derivatives
4.1 Definitions, examples and basic properties
4.2 Taylor’s formulas
4.3 Local extrema

5 The inverse function theorem and the implicit function theorem
5.1 Introduction
5.2 The inverse function theorem
5.3 The implicit function theorem

Chapter 1

Preliminaries on normed spaces

1.1 Definitions and basic examples


Let E be a real vector space. Recall that a norm on E is a function that associates with each vector
x a nonnegative number ||x|| and which satisfies the following.

(i) ||x|| = 0 if and only if x = 0.

(ii) ||λx|| = |λ| ||x|| for any real number λ and any x ∈ E.

(iii) ||x + y|| ≤ ||x|| + ||y|| for all x, y ∈ E (the triangle inequality).

Remark. Condition (iii) can be replaced by the following condition known as the second form of
the triangle inequality.

(iii)’ | ||x|| − ||y|| | ≤ ||x − y|| for all x, y ∈ E.
A normed space (or normed linear space, or normed vector space) is a vector space equipped
with a norm. A norm defines a distance by setting d(x, y) = ||x − y||. Therefore, a normed space
is a metric space and therefore a topological space. Thus, in a normed space, one can talk about
open balls, open sets, closed sets, compact sets...
The open ball of center a and radius r > 0 is the set

B(a, r) = {x ∈ E; ||x − a|| < r}.

The closed ball of center a and radius r > 0 is the set

B′(a, r) = {x ∈ E; ||x − a|| ≤ r}.

The sphere of center a and radius r > 0 is the set

S(a, r) = {x ∈ E; ||x − a|| = r}.

The closed unit ball in a normed space E will be denoted by BE , i.e.,

BE = {x ∈ E; ||x|| ≤ 1}.

The unit sphere will be denoted by SE , i.e.,

SE = {x ∈ E; ||x|| = 1}.

Recall that the closure of B(a, r) is B′(a, r), the interior of B′(a, r) is B(a, r), and S(a, r) = ∂B(a, r) = ∂B′(a, r).


A Banach space is a normed space which is complete, i.e., in which every Cauchy sequence is
convergent.

Examples
1) Let $N \in \mathbb{N}^*$ and $p \ge 1$. For $x = (x_1, \dots, x_N) \in \mathbb{R}^N$, set

$$\|x\|_p = \left( \sum_{i=1}^{N} |x_i|^p \right)^{1/p}.$$

Then $\|\cdot\|_p$ is a norm on $\mathbb{R}^N$ and $(\mathbb{R}^N, \|\cdot\|_p)$ is a Banach space.

2) For $x = (x_1, \dots, x_N) \in \mathbb{R}^N$, set

$$\|x\|_\infty = \max_{i=1,\dots,N} |x_i|.$$

Then $\|\cdot\|_\infty$ is a norm on $\mathbb{R}^N$ and $(\mathbb{R}^N, \|\cdot\|_\infty)$ is a Banach space.

Remark 1. Any finite dimensional normed space is complete.
Remark 2. $\lim_{p \to +\infty} \|x\|_p = \|x\|_\infty$, hence the notation.
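
Remark 2 can be observed numerically. The following is a minimal Python sketch, not part of the original notes; the function names `p_norm` and `sup_norm` and the sample vector are our own choices for illustration.

```python
def p_norm(x, p):
    """Return (sum |x_i|^p)^(1/p) for a finite sequence x and p >= 1."""
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def sup_norm(x):
    """Return max |x_i|."""
    return max(abs(t) for t in x)


x = [3.0, -4.0, 1.0]  # arbitrary sample vector (assumption)
for p in (1, 2, 4, 16, 64):
    # The printed values decrease towards sup_norm(x) as p grows.
    print(p, p_norm(x, p))
print("sup", sup_norm(x))
```

The values decrease with p and stay above the sup norm, which is consistent with the inequality $\|x\|_\infty \le \|x\|_p$.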
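3) Let $p \ge 1$. We set

$$\ell^p = \Big\{ x = (x_n) \,;\ \sum_{n=1}^{\infty} |x_n|^p < +\infty \Big\}.$$

Then $\ell^p$ is a vector space. Moreover, if for $x \in \ell^p$, we set

$$\|x\|_p = \left( \sum_{n=1}^{\infty} |x_n|^p \right)^{1/p},$$

then $\|\cdot\|_p$ is a norm on $\ell^p$ and $(\ell^p, \|\cdot\|_p)$ is a Banach space.

4) We denote by $\ell^\infty$ the set of all bounded sequences. Then $\ell^\infty$ is a vector space. For $x \in \ell^\infty$, we set

$$\|x\|_\infty = \sup_{n \in \mathbb{N}^*} |x_n|.$$

Then $\|\cdot\|_\infty$ is a norm on $\ell^\infty$ and $(\ell^\infty, \|\cdot\|_\infty)$ is a Banach space.

Remark 1. $\ell^1 \subset \ell^p \subset \ell^\infty$ and so $\ell^1 = \bigcap_{p \ge 1} \ell^p$.

Remark 2. If $x \in \ell^1$, then $\lim_{p \to +\infty} \|x\|_p = \|x\|_\infty$, hence the notation.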


5) Let $a$ and $b$ be two real numbers with $a < b$. We denote by $C[a, b]$ (or $C([a, b])$) the set of continuous functions $f : [a, b] \to \mathbb{R}$. $C[a, b]$ is a vector space. For $f \in C[a, b]$ we set

$$\|f\|_\infty = \sup_{x \in [a,b]} |f(x)| = \max_{x \in [a,b]} |f(x)|.$$

Then $\|\cdot\|_\infty$ is a norm on $C[a, b]$ which makes it a Banach space.

6) Let $C^1[a, b]$ denote the set of all continuously differentiable functions $f : [a, b] \to \mathbb{R}$. Then this is a vector space. If we set

$$\|f\| = \|f\|_\infty + \|f'\|_\infty,$$

then we get a norm which makes the space complete.

7) For $f \in C[a, b]$ and $p \ge 1$, we set

$$\|f\|_p = \left( \int_a^b |f(t)|^p \, dt \right)^{1/p}.$$

Then we get a norm but the space $(C[a, b], \|\cdot\|_p)$ is not complete.
Remark. $\lim_{p \to +\infty} \|f\|_p = \|f\|_\infty$, hence the notation.
8) Let $X$ be a set and $Y$ be a normed space. We denote by $C_b(X, Y)$ the set of all bounded functions $f : X \to Y$. This is a vector space. For $f \in C_b(X, Y)$, we set

$$\|f\|_\infty = \sup_{x \in X} \|f(x)\|.$$

Then we get a norm. If $Y$ is complete then $C_b(X, Y)$ is complete as well.


Remark. The spaces in examples (3)-(7) are all infinite dimensional. This means that we can find
in them arbitrarily large linearly independent sets.
Definition. Let $(x_n)$ be a sequence in a normed space $E$. We say that the series $\sum x_n$ is normally convergent if the series of real terms $\sum \|x_n\|$ is convergent.
Proposition 1.1. Let $E$ be a normed space. Then the following conditions are equivalent.

1. $E$ is complete.

2. Every normally convergent series of $E$ is convergent.

In this case, if $\sum x_n$ is normally convergent, we have (generalized triangle inequality)

$$\Big\| \sum_{n=1}^{\infty} x_n \Big\| \le \sum_{n=1}^{\infty} \|x_n\|.$$

Equivalent norms
Definition. Let E be a vector space and let || · ||1 and || · ||2 be two norms on E. We say that
these two norms are equivalent if there exist two positive constants α and β such that

α||x||1 ≤ ||x||2 ≤ β||x||1 for all x ∈ E.

This is of course an equivalence relation between norms on E.
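
As an illustration of the definition, on R^N the three norms of Examples 1 and 2 satisfy $\|x\|_\infty \le \|x\|_2 \le \|x\|_1 \le N \|x\|_\infty$. The following Python sketch (our own, with hypothetical helper names and random test vectors) checks this chain on samples; it is a numerical sanity check, not a proof.

```python
import random


def norm1(x):
    return sum(abs(t) for t in x)


def norm2(x):
    return sum(t * t for t in x) ** 0.5


def norm_inf(x):
    return max(abs(t) for t in x)


def check_equivalence(n, trials=1000, seed=0):
    """Check norm_inf <= norm2 <= norm1 <= n * norm_inf on random vectors of R^n."""
    rng = random.Random(seed)
    tol = 1e-12  # slack for floating-point rounding
    for _ in range(trials):
        x = [rng.uniform(-1.0, 1.0) for _ in range(n)]
        a, b, c = norm_inf(x), norm2(x), norm1(x)
        if not (a <= b + tol and b <= c + tol and c <= n * a + tol):
            return False
    return True


print(check_equivalence(5))
```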


Here are some results concerning equivalent norms.

Proposition 1.2. Let || · ||1 and || · ||2 be two norms on a vector space E. Then the following
conditions are equivalent.

(i) || · ||1 and || · ||2 are equivalent.

(ii) || · ||1 and || · ||2 generate the same topology.

(iii) The identity operator I : (E, || · ||1 ) → (E, || · ||2 ) is a homeomorphism.

In this case, (E, || · ||1 ) is complete if and only if (E, || · ||2 ) is complete.

Proposition 1.3. On a finite dimensional vector space, all norms are equivalent.
Remark. The converse is also true. Actually, if E is a vector space, then the following conditions
are equivalent.

(i) E is finite dimensional.


(ii) All norms on E are equivalent.
(iii) Every linear map f : E → R is continuous (with respect to any norm on E).
(i)⇒(ii) is just Proposition 1.3 and it was proved in second year. (ii)⇒ (iii) will be proved in the
exercises. (iii)⇒ (i) will be proved in the exercises of Math 400 (functional analysis).

Proposition 1.4. Let $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ be two normed spaces. Set
• $\|(x, y)\|_1 = \|x\|_E + \|y\|_F$.
• $\|(x, y)\|_\infty = \max(\|x\|_E, \|y\|_F)$.
• $\|(x, y)\|_2 = \left( \|x\|_E^2 + \|y\|_F^2 \right)^{1/2}$.
Then the following hold.
(a) All the above norms are equivalent and generate the product topology.
(b) If $(E, \|\cdot\|_E)$ and $(F, \|\cdot\|_F)$ are complete, then $(E \times F, \|\cdot\|_1)$ is complete.
More generally, we have

Proposition 1.5. Let $(E_1, \|\cdot\|_{E_1}), (E_2, \|\cdot\|_{E_2}), \dots, (E_n, \|\cdot\|_{E_n})$ be normed spaces. For $x = (x_1, \dots, x_n) \in \prod_{k=1}^{n} E_k$, set
• $\|x\|_1 = \sum_{k=1}^{n} \|x_k\|_{E_k}$.
• $\|x\|_\infty = \max_k \|x_k\|_{E_k}$.
• $\|x\|_2 = \left( \sum_{k=1}^{n} \|x_k\|_{E_k}^2 \right)^{1/2}$.
Then the following hold.
(a) All the above norms are equivalent and generate the product topology.
(b) If all $(E_k, \|\cdot\|_{E_k})$ are complete, then $\left( \prod_{k=1}^{n} E_k, \|\cdot\|_1 \right)$ is complete.


Continuity of fundamental operations


Proposition 1.6. Let E be a normed space. Equip E × E with the product topology (for example
with the norm ||(x, y)|| = ||x|| + ||y||). Then the following maps are continuous
T : R × E −→ E
(λ, x) 7−→ λx.
S : E × E −→ E
(x, y) 7−→ x + y.

Proof. Let (λ, x) ∈ R × E and let (λn , xn ) be a sequence of R × E converging to (λ, x). Then
it is easy to see that λn → λ in R and xn → x in E. Now,
||T (λn , xn ) − T (λ, x)|| = ||λn xn − λx|| = ||λn xn − λn x + λn x − λx||
≤ ||λn (xn − x)|| + ||(λn − λ)x||
= |λn |||xn − x|| + |λn − λ|||x||.
Since (λn) is convergent, it is bounded. Therefore the first term tends to 0 since ||xn − x|| → 0.
The second term tends to 0 because |λn − λ| → 0. It follows that T(λn, xn) → T(λ, x). Hence the
continuity of T.
The proof of the continuity of S is similar but easier. □

Convexity
Let A be a subset of a vector space. We say that A is convex if

x, y ∈ A ⇒ (1 − t)x + ty ∈ A for all t ∈ [0, 1].

Subspaces are convex and balls in a normed space are convex.

1.2 Linear bounded operators


Recall the following. A linear operator L : E → F between vector spaces is a function that satisfies

L(x + αy) = Lx + αLy

for all x, y ∈ E and all α ∈ R.

Proposition 1.7. Let E and F be normed spaces and let L : E → F be a linear operator. Then
the following conditions are equivalent.

(i) L is Lipschitz continuous i.e., there exists a constant M ≥ 0 such that ||Lx−Ly|| ≤ M ||x−y||
for all x, y ∈ E.

(ii) L is uniformly continuous.

(iii) L is continuous.

(iv) L is continuous at the origin.

(v) L maps bounded sets into bounded sets.

(vi) There exists a constant M ≥ 0 such that ||Lx|| ≤ M for all x ∈ BE .

(vii) There exists a constant M ≥ 0 such that ||Lx|| ≤ M for all x ∈ SE .

(viii) There exists a constant M ≥ 0 such that ||Lx|| ≤ M ||x|| for all x ∈ E.

If L satisfies one of the above properties, then, L is called a bounded linear operator or a
continuous linear operator. The space of continuous linear operators from E to F is a vector space
denoted by L(E, F ) or by B(E, F ). If F = E, we write L(E) or B(E) instead of L(E, E). For
L ∈ L(E, F ), we set
$$\|L\| = \sup_{\|x\| \le 1} \|Lx\| \quad \text{(the operator norm).}$$

Then, it turns out that this defines a norm on $L(E, F)$, which is also denoted by $\|L\|_{L(E,F)}$.

Remark. For $L \in L(E, F)$ (with $E \neq \{0\}$) we also have

$$\|L\| = \sup_{\|x\|=1} \|Lx\| = \sup_{\|x\|<1} \|Lx\| = \sup_{x \neq 0} \frac{\|Lx\|}{\|x\|} = \inf\{C \ge 0 \mid \|Lx\| \le C\|x\| \text{ for all } x \in E\}.$$

It is easily seen that $\|Lx\| \le \|L\| \|x\|$ for all $x \in E$ and so the infimum is a minimum, but the
supremum need not be a maximum.
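
In finite dimension the operator norm can be computed explicitly in some cases. As a sketch under our own assumptions (a 2 × 2 matrix acting on R² equipped with the sup norm, for which the operator norm is known to be the largest absolute row sum), the following Python code compares sampled values of ||Ax|| on the unit ball with that value. The matrix and helper names are illustrative only.

```python
import random


def apply(A, x):
    """Matrix-vector product for a matrix given as a list of rows."""
    return [sum(a * t for a, t in zip(row, x)) for row in A]


def sup_norm(x):
    return max(abs(t) for t in x)


def max_row_sum(A):
    """Operator norm of A on (R^n, sup norm): largest absolute row sum."""
    return max(sum(abs(a) for a in row) for row in A)


A = [[1.0, -2.0], [3.0, 0.5]]  # sample matrix (assumption)
bound = max_row_sum(A)

# Sample the closed unit ball of the sup norm; no sample exceeds the bound.
rng = random.Random(0)
best = 0.0
for _ in range(2000):
    x = [rng.uniform(-1.0, 1.0) for _ in range(2)]
    best = max(best, sup_norm(apply(A, x)))

# The supremum is attained at the sign pattern of the heaviest row.
x_star = [1.0, 1.0]
print(best, bound, sup_norm(apply(A, x_star)))
```

Here the supremum is a maximum because the unit ball of a finite dimensional space is compact; in infinite dimension this can fail.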

Some fundamental examples.


1) Let $E = C[a, b]$ be equipped with the sup-norm. For $u \in E$, let $Lu$ be the anti-derivative of $u$ that vanishes at $a$. So

$$(Lu)(x) = \int_a^x u(t) \, dt.$$

Then $L \in L(E)$ and $\|L\| = b - a$.

1’) More generally let $\varphi \in C[a, b]$. For $u \in E$, let $Lu$ be the anti-derivative of $\varphi u$ that vanishes at $a$. So

$$(Lu)(x) = \int_a^x \varphi(t) u(t) \, dt.$$

Then $L \in L(E)$ and $\|L\| \le \int_a^b |\varphi(t)| \, dt = \|\varphi\|_1$. To prove equality, consider the sequence $(u_n)$ defined by $u_n(t) = \frac{n\varphi(t)}{1 + n|\varphi(t)|}$.
2) Let $E = C^1[0, 1]$ be equipped with the norm $\|u\| = \|u\|_\infty + \|u'\|_\infty$ and $F = C[0, 1]$ be equipped with the sup-norm. Let $L$ be the derivation operator, i.e., $Lu = u'$. Then $L \in L(E, F)$ and $\|L\| \le 1$. Actually $\|L\| = 1$. Hint. To prove that $\|L\| \ge 1$, consider the sequence $(u_n)$ defined by $u_n(t) = \frac{\sin(nt)}{n}$.
2’) More generally let $\varphi \in E$, and set $Lu = (\varphi u)'$. Then $L \in L(E, F)$ and $\|L\| \le \|\varphi\|$.
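
The hint in Example 2 can be explored numerically. The sketch below is our own illustration: it approximates sup norms on [0, 1] by a maximum over a uniform grid (an assumption that only gives approximate values) and evaluates the quotient ||Lu_n|| / ||u_n|| for u_n(t) = sin(nt)/n.

```python
import math


def sup_on_grid(f, a=0.0, b=1.0, m=10001):
    """Approximate the sup norm of f on [a, b] by a maximum over m grid points."""
    return max(abs(f(a + (b - a) * k / (m - 1))) for k in range(m))


def ratio(n):
    """Approximate ||u_n'||_inf / (||u_n||_inf + ||u_n'||_inf) for u_n(t) = sin(nt)/n."""
    u = lambda t: math.sin(n * t) / n
    du = lambda t: math.cos(n * t)  # derivative of u_n
    return sup_on_grid(du) / (sup_on_grid(u) + sup_on_grid(du))


for n in (1, 10, 100):
    print(n, ratio(n))
```

The quotients stay below 1 and approach 1 as n grows, in agreement with ||L|| = 1.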

Proposition 1.8. Let L : E → F be a linear operator between normed spaces. If E is finite


dimensional, then L is bounded.

Proposition 1.9. Let E and F be normed spaces. If F is a Banach space then L(E, F ) is also a
Banach space.

Proposition 1.10. If T ∈ L(E, F ) and S ∈ L(F, G), then S ◦ T ∈ L(E, G) and

||S ◦ T || ≤ ||S||||T ||.

Sometimes it is convenient to write $ST$ instead of $S \circ T$. So if $S \in L(E)$, we can write $S^n$ instead of $S \circ S \circ \cdots \circ S$ ($n$-fold). An induction on $n$ shows then that $\|S^n\| \le \|S\|^n$. We set $S^0 = I_E$ (the identity operator of $E$).

Definition. Let $E$ be a normed space. The space $L(E, \mathbb{R})$ of all linear bounded functionals $f : E \to \mathbb{R}$ is called the dual of $E$ and it is denoted by $E^*$. The norm on $E^*$ is therefore defined by

$$\|f\|_{E^*} = \sup_{\|x\| \le 1} |f(x)| = \sup_{\|x\| = 1} |f(x)|.$$

It is easy to check that

$$\|f\|_{E^*} = \sup_{\|x\| \le 1} f(x) = \sup_{\|x\| = 1} f(x)$$

because if $x \in B_E$ then $-x \in B_E$ and $f(-x) = -f(x)$.

Given $f \in E^*$ and $x \in E$ we may write $\langle f, x \rangle$ instead of $f(x)$; we say that $\langle \cdot, \cdot \rangle$ is the scalar product for the duality $(E^*, E)$. Note that, by Proposition 1.9, $E^*$ is always a Banach space (i.e. complete) even if $E$ is not.

Some fundamental examples.


1) Let $E = C[a, b]$ be equipped with the sup-norm. For $u \in E$, set

$$f(u) = \int_a^b u(t) \, dt.$$

Then $f \in E^*$ and $\|f\| = b - a$.

1’) More generally let $\varphi \in C[a, b]$. For $u \in E$, set

$$f(u) = \int_a^b \varphi(t) u(t) \, dt.$$

Then $f \in E^*$ and $\|f\| = \|\varphi\|_1$.

2) Let $E = C[a, b]$ and let $t_0 \in [a, b]$ be given. For $u \in E$, set $f(u) = u(t_0)$. Then $f \in E^*$ and $\|f\| = 1$. $f$ is called the evaluation operator at $t_0$ or the Dirac functional at $t_0$.

3) Let $p > 1$ and let $p' = \frac{p}{p-1}$. $p'$ is called the conjugate of $p$ and we have $\frac{1}{p} + \frac{1}{p'} = 1$. Now let $y \in \ell^{p'}$ and let $L : \ell^p \to \mathbb{R}$ be defined by $Lx = \sum_{n=1}^{\infty} x_n y_n$. Note that $L$ is well defined, i.e. the series $\sum_{n=1}^{\infty} x_n y_n$ is convergent because by Hölder’s inequality

$$\sum_{n=1}^{\infty} |x_n y_n| \le \|x\|_p \|y\|_{p'}.$$

It follows that $|Lx| \le \|x\|_p \|y\|_{p'}$ and so $\|L\| \le \|y\|_{p'}$. Actually we have equality.
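
Both Hölder’s inequality and the equality ||L|| = ||y||_{p'} can be checked on finitely supported sequences. The following Python sketch is ours: the function names, the truncated sequences and the choice p = 3 are assumptions made for illustration. The extremal sequence used is the classical one, x_n = sign(y_n)|y_n|^{p'−1}.

```python
def lp_norm(x, p):
    return sum(abs(t) ** p for t in x) ** (1.0 / p)


def holder_sides(x, y, p):
    """Return (sum |x_n y_n|, ||x||_p * ||y||_{p'}) with p' = p / (p - 1)."""
    q = p / (p - 1.0)
    return sum(abs(a * b) for a, b in zip(x, y)), lp_norm(x, p) * lp_norm(y, q)


def extremal_x(y, p):
    """Sequence x_n = sign(y_n) |y_n|^(p' - 1), for which Lx = ||x||_p ||y||_{p'}."""
    q = p / (p - 1.0)
    return [(1.0 if b >= 0 else -1.0) * abs(b) ** (q - 1.0) for b in y]


p = 3.0
y = [(-1.0) ** n / (n * n) for n in range(1, 50)]  # finitely supported sample (assumption)
x = [1.0 / n for n in range(1, 50)]

lhs, rhs = holder_sides(x, y, p)
print(lhs, rhs)  # lhs does not exceed rhs

xs = extremal_x(y, p)
Lx = sum(a * b for a, b in zip(xs, y))
print(Lx, lp_norm(xs, p) * lp_norm(y, p / (p - 1.0)))  # equal up to rounding
```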

Definition. An isomorphism of normed spaces or a topological isomorphism is a linear operator


L : E → F such that L is bijective, continuous and whose inverse is also continuous.
Otherwise stated, an isomorphism (of normed spaces) is an algebraic isomorphism and a homeomorphism
as well. Two isomorphic normed spaces have therefore the same algebraic structure and the same
topological structure and so can be identified algebraically and topologically. In every field of
mathematics there is a notion of isomorphism. An isomorphism is a bijection that conserves the
structure under consideration. For an algebraist, the group of symmetries of an equilateral triangle
and the group of permutations of {1, 2, 3} are the same object. For a topologist, a circle, an ellipse,
a square and a triangle are the same object. For a functional analyst, all N-dimensional normed
spaces are the same object, and all infinite dimensional separable Hilbert spaces are the same object.
If E and F are isomorphic we may write E ≃ F. The set of isomorphisms between E and F
will be denoted by Isom(E, F ). Thus, Isom(E, F ) = ∅ if and only if E and F are not isomorphic.
We may write Isom(E) instead of Isom(E, E).
Remark. Isom(E) satisfies the following properties
(i) L ∈ Isom(E) ⇒ −L ∈ Isom(E) (Isom(E) is symmetric with respect to the origin).
(ii) S, T ∈ Isom(E) ⇒ ST ∈ Isom(E) (Isom(E) is stable under composition).
(iii) S ∈ Isom(E) ⇒ S −1 ∈ Isom(E) (Isom(E) is stable under inversion).
Properties (ii) and (iii) can be restated by saying that Isom(E) is a group under the composition
operation. This is why it is often denoted by GL(E) (general linear group of E).

Proposition 1.11. If two normed spaces are isomorphic then either they are both complete or
both incomplete.

There is even a stronger notion of isomorphism.

Definition. An isometry L : E → F between normed spaces is a linear operator satisfying


||Lx|| = ||x|| for all x ∈ E (it conserves the norm). It is automatically continuous. An isometry
conserves distances ||Lx − Ly|| = ||x − y||. An isometric isomorphism is both an isometry and an
isomorphism. If there is an isometric isomorphism between E and F we may write E ≅ F.
An isometry is not necessarily surjective. An isometric isomorphism conserves not only the
algebraic and the topological structures but also the metric structure.

Example-Exercise 1. Let $E$ be a normed space. Then $L(\mathbb{R}, E) \cong E$. Indeed define $T : E \to L(\mathbb{R}, E)$ by

$$T(x) = T_x, \qquad T_x : \mathbb{R} \to E, \quad t \mapsto tx.$$

Prove the following.

1. T is well defined.

2. T is linear.

3. ||T x|| = ||x||. This implies that T is injective.

4. T is surjective.

5. $T^{-1}$ is given by

$$T^{-1} : L(\mathbb{R}, E) \to E, \qquad g \mapsto g(1).$$

Example 2. Let $n \in \mathbb{N}^*$. Equip $\mathbb{R}^n$ with some norm $\|\cdot\|$ and fix a basis $B = \{e_1, \dots, e_n\}$. Let $M_n(\mathbb{R})$ denote the space of $n \times n$ real matrices. For $A \in M_n(\mathbb{R})$ set

$$\|A\| = \sup_{\|x\| \le 1} \|Ax\|.$$

Then this defines a norm on $M_n(\mathbb{R})$. Equip $L(\mathbb{R}^n)$ with the operator norm. Then $L(\mathbb{R}^n) \cong M_n(\mathbb{R})$.
An isometry is given by the operator $T$ that associates with each $L \in L(\mathbb{R}^n)$ its matrix in the basis $B$. It should be clear that $T$ is linear, bijective and $\|T(L)\| = \|L\|$.
Example 3. Let E be a normed space, let λ be a real number with λ ≠ 0 and |λ| ≠ 1, and let Lx = λx (homothecy).
Then L is an isomorphism of E which is not an isometry.
Example 4. Let n ≥ 2. Then (R^n, || · ||1) and (R^n, || · ||2) are isomorphic but not isometric.

Theorem 1.1. (Banach isomorphism theorem). Let E and F be Banach spaces. If T ∈


L(E, F ) is bijective, then T −1 is bounded and so T ∈ Isom(E, F ).
Proof. Next year in Math 400 (functional analysis). 
Remark. If one of the spaces is not complete, then the conclusion of the theorem may fail.

Openness of Isom(E, F )
We know from a previous example that we can identify L(R) with R. The isometry between L(R)
and R was denoted by T −1 and is given by T −1 g = g(1). This is because if g ∈ L(R), then by
linearity, we can write g(x) = g(x1) = xg(1) = ax. This identification amounts to identifying a
straight line through the origin with its slope.
What is then Isom(R)? Let g ∈ L(R). Then g is bijective if and only if g(1) ≠ 0. It follows
that Isom(R) can be identified with R\{0}.
Observe now that R\{0} is an open subset of R. Equivalently, if x0 ≠ 0 then x0 + h ≠ 0 if h
is small enough.
More generally, the isometry that takes L(R^n) onto Mn(R) takes Isom(R^n) onto the set of
invertible matrices GL(n, R) = {A ∈ Mn(R); det A ≠ 0}. So we can identify Isom(R^n) with
GL(n, R). The set of invertible n × n matrices is open in Mn(R). This means that if we perturb
an invertible matrix slightly, the resulting matrix is still invertible.
Is it true that, more generally, Isom(E, F) is an open subset of L(E, F)? The answer is yes. But
to prove that, we first need to establish a preliminary result, which is a particular case of the general
result.
The preliminary result is an extension to linear operators of the geometric series

$$\frac{1}{1 - x} = 1 + x + x^2 + \cdots + x^n + \cdots = \sum_{n=0}^{\infty} x^n \quad \text{provided } |x| < 1.$$

The identity operator on E will be denoted by IE , 1E or 1.

Theorem 1.2. Let $E$ be a Banach space and let $S \in L(E)$ satisfy $\|S\| < 1$. Then $I_E - S \in \mathrm{Isom}(E)$ and

$$(I_E - S)^{-1} = I_E + S + S^2 + \cdots = \sum_{n=0}^{\infty} S^n.$$

Proof. Since $\|S\| < 1$, the geometric series of real numbers $\sum_{n \ge 0} \|S\|^n$ is convergent. Because $\|S^n\| \le \|S\|^n$, the comparison test for series implies that the series $\sum_{n \ge 0} \|S^n\|$ is convergent, that is, the series $\sum_{n \ge 0} S^n$ is normally (or absolutely) convergent. Since $E$ is complete, $L(E)$ is complete (Proposition 1.9), and therefore this series is convergent in $L(E)$. Let $T \in L(E)$ denote its sum.
Then, using the continuity of composition with $S$ (Proposition 1.10), we see that

$$TS = ST = \sum_{n=0}^{\infty} S^{n+1} = \sum_{n=1}^{\infty} S^n.$$

It follows that

$$T(I_E - S) = T - TS = \sum_{n=0}^{\infty} S^n - \sum_{n=1}^{\infty} S^n = I_E,$$

and

$$(I_E - S)T = T - ST = \sum_{n=0}^{\infty} S^n - \sum_{n=1}^{\infty} S^n = I_E.$$

This means that $I_E - S$ is invertible and

$$(I_E - S)^{-1} = T = \sum_{n=0}^{\infty} S^n. \qquad \square$$
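
To see the series of Theorem 1.2 at work in the simplest setting, here is a minimal Python sketch for a 2 × 2 matrix S acting on R² with the sup norm. The matrix, the number of terms and the helper names are our own assumptions; with this norm, ||S|| is the largest absolute row sum, here 0.5.

```python
def matmul(A, B):
    n = len(A)
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(n)] for i in range(n)]


def matadd(A, B):
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]


def identity(n):
    return [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]


def neumann_inverse(S, terms=60):
    """Partial sum I + S + S^2 + ... + S^terms of the series for (I - S)^(-1)."""
    total = identity(len(S))
    power = identity(len(S))
    for _ in range(terms):
        power = matmul(power, S)     # next power of S
        total = matadd(total, power)
    return total


S = [[0.2, 0.3], [0.1, 0.4]]         # sample operator with ||S|| = 0.5 < 1 (assumption)
T = neumann_inverse(S)
I_minus_S = [[0.8, -0.3], [-0.1, 0.6]]
print(matmul(I_minus_S, T))          # close to the identity matrix
print(matmul(T, I_minus_S))          # close to the identity matrix
```

The truncation error is of the order of ||S||^61, which is far below machine precision here.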

Remark. Here’s an equivalent way of stating the above theorem. Let $E$ be a Banach space and let $L \in L(E)$ satisfy $\|I_E - L\| < 1$. Then $L \in \mathrm{Isom}(E)$ and

$$L^{-1} = \sum_{n=0}^{\infty} (I_E - L)^n.$$

The first conclusion can be stated as $B(I_E, 1) \subset \mathrm{Isom}(E)$.


Theorem 1.3. Let E and F be Banach spaces. Then
(a) Isom(E, F ) is open in L(E, F ).
(b) The map Φ : Isom(E, F ) → L(F, E) defined by Φ(S) = S −1 is continuous.

Proof. (a) If $\mathrm{Isom}(E, F) = \emptyset$, there is nothing to prove. Otherwise, let $S_0 \in \mathrm{Isom}(E, F)$.

Claim 1. $B\left(S_0, \frac{1}{\|S_0^{-1}\|}\right) \subset \mathrm{Isom}(E, F)$.

Proof of Claim 1. Let $S \in B\left(S_0, \frac{1}{\|S_0^{-1}\|}\right)$. Then $\|S - S_0\| < \frac{1}{\|S_0^{-1}\|}$ and so $\|S_0^{-1}\| \|S - S_0\| < 1$. It follows from Proposition 1.10 that

$$\|S_0^{-1} S - I_E\| = \|S_0^{-1}(S - S_0)\| \le \|S_0^{-1}\| \|S - S_0\| < 1.$$

By the previous theorem (in the form given in the remark), $S_0^{-1} S$ is invertible in $L(E)$ and so $S = S_0 (S_0^{-1} S)$ is invertible, i.e. $S \in \mathrm{Isom}(E, F)$. This proves the claim and so point (a).

(b) Proof/Exercise. Let $S_0 \in \mathrm{Isom}(E, F)$. To prove continuity at $S_0$, let $S \in B\left(S_0, \frac{1}{\|S_0^{-1}\|}\right)$ and set $T = 1_E - S_0^{-1} S$. Prove the following points.

1. T is well defined and T ∈ L(E).

2. ||T || < 1.

3. T → 0 as S → S0 .

4. $S = S_0(1_E - T)$ and so $S^{-1} = (1_E - T)^{-1} S_0^{-1}$.

5. We have

$$\|\Phi(S) - \Phi(S_0)\| \le \frac{\|T\|}{1 - \|T\|} \|S_0^{-1}\|.$$

Continuity of Φ follows. 
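
The estimate of point 5 can be tested on 2 × 2 matrices. The sketch below is our own illustration: R² carries the sup norm (so the operator norm is the largest absolute row sum), and the matrices S0 and S are arbitrary samples with S inside the ball of Claim 1.

```python
def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)] for i in range(2)]


def sub(A, B):
    return [[A[i][j] - B[i][j] for j in range(2)] for i in range(2)]


def inv2(A):
    """Inverse of a 2 x 2 matrix with nonzero determinant."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]


def op_norm(A):
    """Operator norm on (R^2, sup norm): largest absolute row sum."""
    return max(sum(abs(t) for t in row) for row in A)


I2 = [[1.0, 0.0], [0.0, 1.0]]
S0 = [[2.0, 1.0], [0.0, 1.0]]    # sample invertible matrix (assumption)
S = [[2.1, 0.9], [0.05, 1.0]]    # small perturbation of S0 (assumption)

S0_inv = inv2(S0)
T = sub(I2, matmul(S0_inv, S))   # T = 1 - S0^(-1) S
lhs = op_norm(sub(inv2(S), S0_inv))
rhs = op_norm(T) / (1.0 - op_norm(T)) * op_norm(S0_inv)
print(op_norm(T), lhs, rhs)      # lhs does not exceed rhs
```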

Remark 1. Φ is not linear. Φ is an extension of the hyperbola $t \mapsto \frac{1}{t}$.


Remark 2. In general, $\|S_0^{-1}\| \neq \frac{1}{\|S_0\|}$.

1.3 Multilinear bounded operators


Let us start with the particular case of bilinear operators.
Definition. Let E1 , E2 and F be vector spaces and let L : E1 × E2 → F be a map (L is a function
of two variables). We say that L is bilinear if it is linear in each variable. This means that

1. L(x1 + αy, x2 ) = L(x1 , x2 ) + αL(y, x2 ), and

2. L(x1 , x2 + αz) = L(x1 , x2 ) + αL(x1 , z),

for all x1 , y ∈ E1 , x2 , z ∈ E2 and α ∈ R.

Remark 1. Do not confuse a linear map of two variables with a bilinear map. For example, let
E1 = E2 = F = C[0, 1]. For u, v ∈ C[0, 1], we set L(u, v) = uv. Then L : E1 × E2 → F is bilinear
but not linear because

L((u, v) + (h1 , h2 )) = L(u + h1 , v + h2 ) = (u + h1 )(v + h2 ) = uv + h1 h2 + uh2 + vh1

whereas
L(u, v) + L(h1 , h2 ) = uv + h1 h2 .
Also
L(α(u, v)) = L(αu, αv) = α² uv
whereas
αL(u, v) = αuv.

Remark 2. A bilinear operator satisfies L(0, x2 ) = L(x1 , 0) = 0 for all x1 ∈ E1 and x2 ∈ E2 .

Proposition 1.12. Let E1 , E2 and F be normed spaces and let L : E1 × E2 → F be bilinear.


Equip E1 × E2 with the norm ||(x1 , x2 )|| = max(||x1 ||, ||x2 ||) (or any equivalent norm). Then the
following conditions are equivalent.

(i) L is continuous.

(ii) L is continuous at 0.

(iii) L maps bounded sets into bounded sets.

(iv) L is bounded on the unit ball of E1 × E2 .


(v) $L$ is bounded on the unit sphere of $E_1 \times E_2$.

(vi) There exists a constant $M > 0$ such that $\|L(x_1, x_2)\| \le M \|x_1\| \|x_2\|$ for all $x_1 \in E_1$, $x_2 \in E_2$.

Proof. (i)⇒(ii), (iii)⇒(iv) and (iv)⇒(v) are trivial.

(ii)⇒(iii). In the continuity condition of $L$ at 0 take $\varepsilon = 1$. Then there exists $\delta > 0$ such that

$$\|(u_1, u_2)\| \le \delta \ \Rightarrow\ \|L(u_1, u_2)\| \le 1. \qquad (1.1)$$

Let now $A$ be a bounded subset of $E_1 \times E_2$. Then $A \subset B'(0, R)$ for some $R > 0$. We claim that $L(A) \subset B'\left(0, \frac{R^2}{\delta^2}\right)$, or equivalently $\|L(x_1, x_2)\| \le \frac{R^2}{\delta^2}$ for all $(x_1, x_2) \in A$. So let $(x_1, x_2) \in A$. Then $\|(x_1, x_2)\| \le R$. Let $u_1 = \frac{\delta}{R} x_1$ and $u_2 = \frac{\delta}{R} x_2$. Then $\|u_1\| \le \delta$ and $\|u_2\| \le \delta$, and so $\|(u_1, u_2)\| \le \delta$. It follows from condition (1.1) that $\|L(u_1, u_2)\| \le 1$. Therefore

$$\left\| L\left( \frac{\delta}{R} x_1, \frac{\delta}{R} x_2 \right) \right\| \le 1.$$

Homogeneity of $L$ in each variable and homogeneity of the norm imply that

$$\|L(x_1, x_2)\| \le \frac{R^2}{\delta^2}.$$

(v)⇒(vi). By assumption, there is $M > 0$ such that $\|L(x_1, x_2)\| \le M$ for all $(x_1, x_2)$ in the unit sphere. Let $(x_1, x_2) \in E_1 \times E_2$ be arbitrary. If $x_1 = 0$ or $x_2 = 0$ then $L(x_1, x_2) = 0$ and so the inequality of (vi) is satisfied. If not, set $u_1 = \frac{x_1}{\|x_1\|}$ and $u_2 = \frac{x_2}{\|x_2\|}$. Then $\|(u_1, u_2)\| = 1$ and so $\|L(u_1, u_2)\| \le M$. This means that

$$\left\| L\left( \frac{x_1}{\|x_1\|}, \frac{x_2}{\|x_2\|} \right) \right\| \le M.$$

Homogeneity of $L$ in each variable and homogeneity of the norm imply that

$$\|L(x_1, x_2)\| \le M \|x_1\| \|x_2\|.$$

(vi)⇒(i). Fix an element $x = (x_1, x_2) \in E_1 \times E_2$ and let $\varepsilon > 0$ be given. We will prove that there exists $\alpha > 0$ such that

$$\|h\| \le \alpha \ \Rightarrow\ \|L(x + h) - L(x)\| \le \varepsilon.$$

Let $M$ be as in condition (vi) and let $\alpha = \min\left( 1, \frac{\varepsilon}{M(\|x_1\| + \|x_2\| + 1)} \right)$.
Let $h = (h_1, h_2)$ satisfy $\|h\| \le \alpha$ and let $k = (0, h_2)$. Then, by the triangle inequality,

$$\|L(x + h) - L(x)\| \le \|L(x + h) - L(x + k)\| + \|L(x + k) - L(x)\|. \qquad (1.2)$$

Now, by bilinearity of $L$, we have

$$L(x + h) - L(x + k) = L(x_1 + h_1, x_2 + h_2) - L(x_1, x_2 + h_2) = L(h_1, x_2 + h_2)$$

and

$$L(x + k) - L(x) = L(x_1, x_2 + h_2) - L(x_1, x_2) = L(x_1, h_2).$$

It follows from inequality (1.2) and condition (vi) that

$$\begin{aligned}
\|L(x + h) - L(x)\| &\le M \|h_1\| \|x_2 + h_2\| + M \|x_1\| \|h_2\| \\
&\le M \|h_1\| (\|x_2\| + \|h_2\|) + M \|x_1\| \|h_2\| \\
&\le M \|h_1\| (\|x_2\| + 1) + M \|x_1\| \|h_2\| && \text{because } \|h_2\| \le \|h\| \le 1 \\
&\le M \|h\| (\|x_2\| + 1) + M \|x_1\| \|h\| && \text{because } \|h_1\|, \|h_2\| \le \|h\| \\
&= M (\|x_1\| + \|x_2\| + 1) \|h\| \\
&\le \varepsilon && \text{because } \|h\| \le \alpha. \qquad \square
\end{aligned}$$

Definition. A bilinear continuous map is also called a bounded bilinear map (because of condition (iii)). The set of bilinear continuous maps $L : E_1 \times E_2 \to F$ is denoted by $L(E_1, E_2; F)$ or $B(E_1, E_2; F)$. It is a vector space. If $E_1 = E_2 = E$, we also write $L_2(E; F)$ instead of $L(E, E; F)$.
Proposition. For a bilinear bounded map $L : E_1 \times E_2 \to F$, we set

$$\|L\| = \sup_{\|x\| \le 1} \|Lx\|.$$

Then, this defines a norm on $L(E_1, E_2; F)$. If $F$ is complete, then this space is complete.
Remark. For $L \in L(E_1, E_2; F)$ we also have

$$\|L\| = \sup_{\|x\| = 1} \|Lx\| = \sup_{\|x\| < 1} \|Lx\| = \inf\{C \ge 0 \,;\ \|Lx\| \le C \|x_1\| \|x_2\| \text{ for all } x = (x_1, x_2)\}.$$

It is easily seen that $\|Lx\| \le \|L\| \|x_1\| \|x_2\|$ for all $x = (x_1, x_2) \in E_1 \times E_2$ and so the infimum is a minimum, but the supremum need not be a maximum.

Examples.
a) Let E = F = C[a, b] be equipped with the norm of uniform convergence. For u, v ∈ C[a, b], we
set L(u, v) = uv. Then L ∈ L2 (E; F ) and ||L|| = 1.
b) More generally, let ϕ ∈ C[a, b] be fixed. We set L(u, v) = ϕuv. Then L : E × E → F is bilinear
and bounded and ||L|| = ||ϕ||∞ .
c) Under the same assumptions, for $u, v \in E$, let $L(u, v)$ be the antiderivative of $\varphi u v$ that vanishes at $a$. So

$$(L(u, v))(x) = \int_a^x \varphi(t) u(t) v(t) \, dt.$$

Then $L \in L_2(E; E)$ and $\|L\| = \|\varphi\|_1$.

d) Let $E_1 = E_2 = C^1[a, b]$ and $F = C[a, b]$. Equip $E_1$ and $E_2$ with the norm $\|u\| = \|u\|_\infty + \|u'\|_\infty$ and equip $F$ with the sup-norm. Fix $\varphi \in C^1[a, b]$ and let $L(u, v) = (\varphi u v)'$. Then $L \in L_2(E_1; F)$.

Notation. If f : X × Y → Z is a function of two variables, then for x ∈ X, we denote by


f (x, ·) the function from Y to Z that to each y ∈ Y associates f (x, y) (it is called the partial map
corresponding to the second variable). We define similarly f (·, y) for y ∈ Y .

Let E and F be normed spaces and let L ∈ L(E, L(E, F )). This means in particular that for every
x ∈ E, Lx is an element of L(E, F ) and so for y ∈ E, L(x)(y) is an element of F . Since L is linear
and takes values in a space of linear maps, this suggest that we can view L as a bilinear operator
from E × E into F . This is the idea behind the next result.

Theorem 1.4. Let $E$ and $F$ be normed spaces and equip $E \times E$ with the norm $\|(x, y)\| = \max(\|x\|, \|y\|)$. Then $L(E, L(E, F)) \cong L_2(E; F)$.
Proof. Consider the operator

$$\Phi : L(E, L(E, F)) \to L_2(E; F), \qquad \Phi(L)(x, y) = (Lx)(y).$$

It is easy to see that $\Phi$ is well defined, linear, bijective and its inverse is given by

$$\Psi : L_2(E; F) \to L(E, L(E, F)), \qquad \Psi(T)(x) = T(x, \cdot).$$

Next,

$$\begin{aligned}
\|\Phi(L)\| &= \sup_{\|(x, y)\| \le 1} \|(Lx)(y)\| \\
&= \sup_{\|x\| \le 1,\ \|y\| \le 1} \|(Lx)(y)\| \\
&= \sup_{\|x\| \le 1} \ \sup_{\|y\| \le 1} \|(Lx)(y)\| \\
&= \sup_{\|x\| \le 1} \|Lx\| \\
&= \|L\|. \qquad \square
\end{aligned}$$
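
The maps Φ and Ψ of the proof are what programmers call uncurrying and currying. As a small sketch under our own assumptions (E = R², F = R, and a bilinear form given by a sample matrix A), the following Python code checks that Φ and Ψ undo each other on sample points.

```python
def Phi(L):
    """Turn a map x -> (y -> L(x)(y)) into a function of two variables."""
    return lambda x, y: L(x)(y)


def Psi(T):
    """Turn a function of two variables into the map x -> T(x, .)."""
    return lambda x: (lambda y: T(x, y))


A = [[1.0, 2.0], [0.0, -1.0]]  # sample matrix defining a bilinear form (assumption)


def T(x, y):
    """Bilinear form T(x, y) = sum_ij x_i A_ij y_j on R^2 x R^2."""
    return sum(x[i] * A[i][j] * y[j] for i in range(2) for j in range(2))


x, y = [1.0, 2.0], [3.0, -1.0]
print(T(x, y), Psi(T)(x)(y), Phi(Psi(T))(x, y))  # three equal values
```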
Now we move to the general case.
Definition. Let E1 , E2 , . . . , En and F be vector spaces and let L : E1 × E2 × · · · × En → F be
a map (L is a function of n variables). We say that L is multilinear if it is linear in each variable.

Proposition 1.12. Let $E_1, E_2, \dots, E_n$ and $F$ be normed spaces and let $L : E_1 \times E_2 \times \cdots \times E_n \to F$ be multilinear. Equip $E_1 \times E_2 \times \cdots \times E_n$ with the norm $\|(x_1, x_2, \dots, x_n)\| = \max_{1 \le i \le n} \|x_i\|$ (or any equivalent norm). Then the following conditions are equivalent.
(i) L is continuous.

(ii) L is continuous at 0.

(iii) L maps bounded sets into bounded sets.

(iv) L is bounded on the unit ball of E1 × E2 × · · · × En .

(v) L is bounded on the unit sphere E1 × E2 × · · · × En .

(vi) There exists a constantQn M > 0 such that ||L(x1 , x2 , . . . , xn )|| ≤ M ||x1 ||||x2 || · · · ||xn || for all
(x1 , x2 , . . . xn ) ∈ i=1 Ei .

Definition. A multilinear continuous map is also called a bounded multilinear map (because of
condition (iii)). The set of multilinear continuous maps L : E1 × E2 × · · · × En → F is denoted
by L(E1 , E2 , . . . , En ; F ) or B(E1 , E2 , . . . , En ; F ). If Ei = E for all i = 1, . . . , n, we also write
Ln (E; F ) instead of L(E, E, . . . , E; F ).
Proposition. For a bounded multilinear map L : E1 × E2 × · · · × En → F , we set

\[ \|L\| = \sup_{\|x\|\le 1} \|Lx\|. \]

This defines a norm on L(E1 , E2 , . . . , En ; F ). If F is complete, then this space is complete.
Remark. For L ∈ L(E1 , E2 , . . . , En ; F ) we also have

\[ \|L\| = \sup_{\|x\|=1} \|Lx\| = \sup_{\|x\|<1} \|Lx\| = \inf\{C > 0 ;\ \|Lx\| \le C\,\|x_1\|\,\|x_2\|\cdots\|x_n\| \ \text{for all } x\}. \]

It is easily seen that ||Lx|| ≤ ||L|| ||x1 || ||x2 || · · · ||xn || for all x = (x1 , . . . , xn ) ∈ E1 × · · · × En and so the infimum is a minimum,
but the supremum need not be a maximum.

Theorem 1.5. Let E and F be normed spaces and equip E^n with the norm ||(x1 , . . . , xn )|| =
max_{1≤i≤n} ||xi ||. Then Ln (E; Lm (E; F )) ≅ Ln+m (E; F ).
Proof. Consider the operator Φ : Ln (E; Lm (E; F )) → Ln+m (E; F ) defined by

Φ(L)(x1 , . . . , xn , xn+1 , . . . , xn+m ) = L(x1 , . . . , xn )(xn+1 , . . . , xn+m ).



It is easy to see that Φ is well defined, linear, bijective and its inverse Φ⁻¹ : Ln+m (E; F ) →
Ln (E; Lm (E; F )) is given by

Φ⁻¹(T )(x1 , . . . , xn ) = T (x1 , . . . , xn , ·, . . . , ·) (m dots).

To prove that ||Φ(L)|| = ||L||, we proceed as in the proof of Theorem 1.4, but we replace x by
(x1 , . . . , xn ) and y by (xn+1 , . . . , xn+m ). 

1.4 Hilbert spaces


Definition. Let V be a vector space over R. An inner product on V is a function L : V × V → R
that satisfies the following properties

(i) L is bilinear.

(ii) L is symmetric i.e. L(x, y) = L(y, x) for all x, y ∈ V .

(iii) L is positive i.e. L(x, x) ≥ 0 for every x ∈ V .

(iv) L is non-degenerate i.e. L(x, x) = 0 ⇒ x = 0.

An inner product is usually denoted by ⟨·, ·⟩ or (·|·) or (·, ·). A vector space equipped with an inner
product is called an inner product space or a pre-Hilbert space. An inner product induces a norm by
setting
||x|| = ⟨x, x⟩^{1/2} .
The triangle inequality is a consequence of the fundamental Cauchy-Schwarz inequality

|⟨x, y⟩| ≤ ||x|| ||y||.

Thus, an inner product space is a normed space. An inner product space is called a Hilbert space if
it is complete.

Examples.
1) Let N ∈ N∗ . For x = (x1 , . . . , xN ) and y = (y1 , . . . , yN ) in RN , we set

\[ \langle x, y\rangle = \sum_{i=1}^{N} x_i y_i . \]

This is an inner product on RN that generates the usual Euclidean norm || · ||2 . Equipped with this
inner product RN is a Hilbert space.
2) We set

\[ \ell^2 = \Big\{ x = (x_n)_{n\ge 1} ;\ \sum_{n=1}^{\infty} |x_n|^2 < +\infty \Big\}. \]

For x, y ∈ ℓ2 , we set

\[ \langle x, y\rangle = \sum_{i=1}^{\infty} x_i y_i . \]

This is an inner product for which ℓ2 is a Hilbert space.


3) For f, g ∈ C[a, b], we set

\[ \langle f, g\rangle = \int_a^b f(t)\,g(t)\,dt. \]

Then we get an inner product. However C[a, b] is not a Hilbert space for this inner product.

Theorem 1.6. (Riesz representation theorem). Let H be a Hilbert space and let f ∈ H ∗ =
L(H, R) (the dual of H). Then there exists a unique element a ∈ H such that

f (x) = ⟨a, x⟩ for all x ∈ H,

and
||f || = ||a||.

Corollary 1.1. H ∗ ≅ H.
Chapter 2

Differentiable maps

2.1 Definitions and fundamental examples


Big O and small o notation
Let E, F , G be normed spaces, let U be a subset of E and a ∈ U (or a = +∞ if U is an interval
which is unbounded from above). Let f : U → F and g : U → G be two maps. We write

• f (x) = o(g(x)) as x → a (or near x = a) if

\[ \lim_{x\to a} \frac{\|f(x)\|}{\|g(x)\|} = 0. \]

This is equivalent to the following condition.

∀ε > 0, ∃δ > 0 s.t. ||x − a|| < δ ⇒ ||f (x)|| ≤ ε||g(x)||.

• f (x) = O(g(x)) as x → a if there exists a constant C such that ||f (x)|| ≤ C||g(x)|| for all x
in a neighborhood of a.

Examples.

• ln(1 + x) = O(x) near x = 0.

• sin x = x + O(x³) near x = 0.

• ln x = o(x) near x = +∞.

Remark. The conditions f (h) = o(h) and f (h) = o(||h||) mean the same thing.
Some rules
For a real variable x near 0, we have

• f (x) = O(x²) ⇒ f (x) = o(x). But the converse is not true.

• o(x) ± o(x) = o(x).

• o(x)o(x) = o(x²).

• O(x) ± O(x) = O(x).

• O(x)O(x) = O(x²).
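
The following short Python sketch is only a numerical illustration of these notations, not a proof. It evaluates the ratios that appear in the definitions of O and o at a few sample points near 0; the function names and the sample points are choices made for this example.

```python
import math

def ratio_log(x):
    """|ln(1 + x)| / |x|, which stays bounded near 0 (ln(1 + x) = O(x))."""
    return abs(math.log1p(x)) / abs(x)

def ratio_sin(x):
    """|sin x - x| / |x|^3, which stays bounded near 0 (sin x = x + O(x^3))."""
    return abs(math.sin(x) - x) / abs(x) ** 3

def ratio_small_o(x):
    """|x^2| / |x|, which tends to 0 as x -> 0 (x^2 = o(x))."""
    return abs(x * x) / abs(x)

samples = [10.0 ** (-k) for k in range(1, 4)]   # 0.1, 0.01, 0.001
log_ratios = [ratio_log(x) for x in samples]
sin_ratios = [ratio_sin(x) for x in samples]
small_o_ratios = [ratio_small_o(x) for x in samples]
```

The first two lists stay bounded as the sample points approach 0, while the third one shrinks, which is the difference between O and o.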


Motivation
Let us revisit the notion of the derivative of a map f : I ⊂ R → R. We assume that I is open. Let
a ∈ I. We say that f is differentiable at a if the following limit exists:

\[ \ell := \lim_{h\to 0} \frac{f(a+h) - f(a)}{h}. \]

In this case this limit is denoted by f ′(a) or \(\frac{df}{dx}\big|_{a}\) and it is called the derivative of f at a.
Geometrically, f ′(a), when it exists, is the slope of the tangent to the graph of f at the point
(a, f (a)). We say that f is differentiable on I if it is differentiable at every point of I.
Remark 1. If we want f to be differentiable on I, we must have the property

a∈I ⇒a+h∈I for all h small enough.

This condition means precisely that I is open.


Remark 2. We cannot directly generalize this definition to maps between normed spaces because
division by a vector is not defined. So we will have to transform it a little bit.
Under the above assumptions, the following conditions are equivalent.

(i) f is differentiable at a.

(ii) There exists ℓ ∈ R such that \(\lim_{h\to 0} \left( \frac{f(a+h) - f(a)}{h} - \ell \right) = 0\).

(iii) There exists ℓ ∈ R such that \(\lim_{h\to 0} \frac{f(a+h) - f(a) - \ell h}{h} = 0\).

(iv) There exists ℓ ∈ R such that f (a + h) − f (a) − ℓh = o(h).

(v) There exists ℓ ∈ R such that f (a + h) = f (a) + ℓh + o(h).

(vi) There exists ℓ ∈ R such that

∀ε > 0, ∃δ > 0 s.t. |h| ≤ δ ⇒ |f (a + h) − f (a) − ℓh| ≤ ε|h|.

The last three formulations of differentiability have a common geometric interpretation: the graph
of f near the point (a, f (a)) can be approximated by its tangent, otherwise stated, the graph of f
looks locally like a line.
We can extend this definition to maps between inner product spaces if we replace ℓh by ⟨ℓ, h⟩.
But we can do better. As a function of h, the term ℓh is linear (and of course continuous). This
leads to the following definition.

Definitions
Let E and F be normed spaces, let U ⊂ E be open, a ∈ U , and let f : U → F be a map. We
say that f is differentiable in the sense of Fréchet (or Fréchet differentiable) at the point a if there
exists L ∈ L(E, F ) such that

f (a + h) − f (a) − Lh = o(h) near h = 0.

or equivalently,
||f (a + h) − f (a) − Lh|| = o(||h||) near h = 0.

or equivalently,
f (x) − f (a) − L(x − a) = o(x − a) for x near a.
If you don’t like this small o notation, you can write

f (a + h) − f (a) − Lh = r(h)||h|| where \(\lim_{h\to 0} r(h) = 0\),

or

f (a + h) − f (a) − Lh = ε(h) where \(\lim_{h\to 0} \frac{\varepsilon(h)}{\|h\|} = 0\),

or
∀ε > 0, ∃δ > 0 s.t. ||h|| ≤ δ ⇒ ||f (a + h) − f (a) − Lh|| ≤ ε||h||.
We will see that if this condition holds, then the operator L is unique. It is called the Fréchet
derivative or the Fréchet differential of f at the point a. It is denoted by one of the following
symbols
f ′(a), Df (a), df_a , df (a).
Remark. We shall see that there is a weaker form of differentiability called Gâteau differentiability.
But since we shall not study it in detail, we will refer to the above condition as differentiability.
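
To make the definition concrete, here is a minimal numerical sketch in Python. It assumes E = F = R^2 with the max norm and the sample map f(x, y) = (x², xy), neither of which comes from the text, and it checks that the ratio ||f(a + h) − f(a) − Lh|| / ||h|| shrinks with h when L is the Jacobian matrix of f at a.

```python
def f(x, y):
    """Sample map R^2 -> R^2 chosen for this illustration."""
    return (x * x, x * y)

def L(a, h):
    """Candidate derivative at a = (x, y): the Jacobian matrix applied to h."""
    x, y = a
    h1, h2 = h
    return (2 * x * h1, y * h1 + x * h2)

def max_norm(v):
    return max(abs(c) for c in v)

def remainder_ratio(a, h):
    """||f(a + h) - f(a) - L h|| / ||h|| for the max norm."""
    fa = f(*a)
    fah = f(a[0] + h[0], a[1] + h[1])
    Lh = L(a, h)
    r = tuple(fah[i] - fa[i] - Lh[i] for i in range(2))
    return max_norm(r) / max_norm(h)

a = (1.0, 2.0)
ts = (1e-1, 1e-2, 1e-3)
ratios = [remainder_ratio(a, (t, -t)) for t in ts]
```

For this particular f the remainder is exactly (h1², h1 h2), so along h = (t, −t) the ratio equals t up to rounding.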

Proposition 2.1. In the previous definition, the operator L is unique.


For the proof we need a lemma.
Lemma 2.1. Let S : E → F be a linear map between normed spaces. If Sh = o(h) near h = 0,
then S = 0.
Proof of the lemma. Let ε > 0 be given. Then by definition, there exists δ > 0 such that
||Sh|| ≤ ε||h|| whenever ||h|| ≤ δ. Let now x ∈ E be an arbitrary nonzero element. Let h = (δ/||x||) x.
Then ||h|| = δ and so ||Sh|| ≤ εδ. This means that

\[ \left\| S\!\left( \frac{\delta}{\|x\|}\, x \right) \right\| \le \varepsilon\delta. \]

By homogeneity of S and the norm, we get

||Sx|| ≤ ε||x||.

Since this inequality is also true for x = 0, it is true for all x ∈ E. It follows that S is bounded and
||S|| ≤ ε. Since ε was arbitrary, we get ||S|| = 0 and so S = 0. □

Proof of Proposition 2.1. Suppose that there is another bounded linear operator T such that

f (a + h) − f (a) − T h = o(h).

Subtracting this identity from

f (a + h) − f (a) − Lh = o(h),

we get
(T − L)h = o(h).
It follows from the lemma that T − L = 0 and so T = L. 

Remark. When f : U ⊂ R → R is differentiable at a point a, f ′(a) ∈ L(R, R) ≅ R. Therefore
f ′(a) is identified with a real number and we get the familiar definition of the derivative at a point.

Definition. If f : U ⊂ E → F is differentiable at every point of U , we say that f is differentiable


on U . In this case, we have a function

f 0 : U → L(E, F ).

If this function is continuous (at every point of U ) we say that f is of class C 1 or that f is
continuously differentiable.
Remark. Do not confuse the continuity of f 0 (x) : E → F with the continuity of f 0 : U → L(E, F ).
When it exists, f 0 (x) is continuous by definition. Whereas f 0 need not be continuous. Otherwise
stated, f 0 (x)h is continuous in h but not necessarily continuous in x. Here’s an example of a
differentiable function which is not of class C 1 .
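Example. Let

\[ f(x) = \begin{cases} x^2 \sin\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]

Then f is differentiable at every point x ≠ 0. It is also differentiable at 0 and f ′(0) = 0 because

\[ \lim_{h\to 0} \frac{f(h) - f(0)}{h} = \lim_{h\to 0} h \sin\frac{1}{h} = 0. \]

Therefore

\[ f'(x) = \begin{cases} 2x \sin\frac{1}{x} - \cos\frac{1}{x} & \text{if } x \neq 0 \\ 0 & \text{if } x = 0. \end{cases} \]

Since 2x sin(1/x) → 0 while cos(1/x) has no limit as x → 0, f ′(x) has no limit as x → 0. Hence f ′ is not
continuous at 0 and f is not of class C 1 .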
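
The discontinuity of f ′ at 0 can also be observed numerically. The sketch below (an illustration only, with sample points chosen for the example) evaluates f ′ at the points x_n = 1/(2πn), which tend to 0, and finds values close to −1 rather than close to f ′(0) = 0.

```python
import math

def f_prime(x):
    """Derivative of f(x) = x^2 sin(1/x) for x != 0, with f'(0) = 0."""
    if x == 0.0:
        return 0.0
    return 2 * x * math.sin(1 / x) - math.cos(1 / x)

# x_n = 1/(2 pi n) tends to 0, yet f'(x_n) stays close to -1.
points = [1 / (2 * math.pi * n) for n in (1, 10, 100, 1000)]
values = [f_prime(x) for x in points]
```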
Before we give some fundamental examples and counter-examples, let us formulate two basic
facts about differentiability.

Proposition 2.2. If f : U ⊂ E → F is differentiable at a ∈ U , then f is continuous at a.


Proof. By assumption we have

f (a + h) − f (a) = Df (a)h + o(h) → 0 as h → 0 

Proposition 2.3. Let f : U ⊂ E → F be a map which is differentiable at a point a ∈ U . If we


replace the norms on E and F by equivalent norms, then f will still be differentiable at a with the
same derivative.
Proof/Exercise. Let || · ||1 be a norm on E which is equivalent to the original norm || · ||E and
let || · ||2 be a norm on F which is equivalent to the original norm || · ||F . Then there are positive
constants α and β such that for all x ∈ E and y ∈ F we have

||x||E ≤ α||x||1 and ||y||F ≥ β||y||2 .

Let ε > 0. Then by assumption there exists δ > 0 such that

\[ \|h\|_E \le \delta \ \Rightarrow\ \|f(a+h) - f(a) - f'(a)h\|_F \le \frac{\beta\varepsilon}{\alpha}\, \|h\|_E . \]
Find δ1 > 0 such that

||h||1 ≤ δ1 ⇒ ||f (a + h) − f (a) − f 0 (a)h||2 ≤ ε||h||1 .

Finally observe that continuity of f 0 (a) is conserved by the change of norms. 



Fundamental examples
Now we give some fundamental examples and counter-examples.
Proposition 2.4. If f : U → F is constant, then f is C 1 and its derivative is the 0 operator.
Proof. Since f is constant we have f (a + h) = f (a) and so if L is the zero operator we have

f (a + h) − f (a) − Lh = 0 = o(h). 

Proposition 2.5. Let f ∈ L(E, F ), then f is of class C 1 and at every point x ∈ E we have
f 0 (x) = f and so f 0 is constant.
Proof. By linearity of f , we have f (x + h) = f (x) + f (h) and so

f (x + h) − f (x) − f (h) = 0 = o(h). 

Proposition 2.6. Let f ∈ L(E, F ; G), then f is of class C 1 and for any x = (x1 , x2 ), h = (h1 , h2 )
in E × F we have
f 0 (x1 , x2 )(h1 , h2 ) = f (h1 , x2 ) + f (x1 , h2 ).
Proof. Here we equip E × F with the norm ||(x1 , x2 )|| = max(||x1 ||, ||x2 ||) but we can take
any other equivalent norm like ||(x1 , x2 )|| = ||x1 || + ||x2 ||. By the bilinearity of f , we have

f (x1 + h1 , x2 + h2 ) = f (x1 , x2 ) + f (h1 , x2 ) + f (x1 , h2 ) + f (h1 , h2 ).

Therefore
f (x1 + h1 , x2 + h2 ) − f (x1 , x2 ) − f (h1 , x2 ) − f (x1 , h2 ) = f (h1 , h2 ).
Now (h1 , h2 ) 7→ f (h1 , x2 ) + f (x1 , h2 ) is linear and bounded (check that). Finally,

||f (h1 , h2 )|| ≤ ||f || ||h1 || ||h2 || ≤ ||f || ||(h1 , h2 )||² = O(||h||²) = o(||h||).

Hence the result. 
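
As a quick sanity check of the formula in Proposition 2.6, the following Python sketch takes the bilinear map f(x1, x2) = ⟨x1, x2⟩ on R^2 × R^2 (a choice made only for this example) and verifies that the remainder f(x + h) − f(x) − f ′(x)h equals f(h1, h2), as in the proof.

```python
def dot(u, v):
    """The bilinear map f(u, v) = <u, v> on R^2 x R^2."""
    return sum(ui * vi for ui, vi in zip(u, v))

def add(u, v):
    return tuple(ui + vi for ui, vi in zip(u, v))

def derivative(x1, x2, h1, h2):
    """f'(x1, x2)(h1, h2) = f(h1, x2) + f(x1, h2)."""
    return dot(h1, x2) + dot(x1, h2)

x1, x2 = (1.0, 2.0), (3.0, -1.0)
h1, h2 = (0.01, -0.02), (0.03, 0.01)

increment = dot(add(x1, h1), add(x2, h2)) - dot(x1, x2)
remainder = increment - derivative(x1, x2, h1, h2)   # equals f(h1, h2) up to rounding
```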


More generally we have the following.

Proposition 2.7. Let f ∈ L(E1 , . . . , En ; G). Then f is of class C 1 and for any x = (x1 , x2 , . . . , xn ),
h = (h1 , h2 , . . . , hn ) in E1 × · · · × En we have

f ′(x1 , x2 , . . . , xn )(h1 , h2 , . . . , hn ) = f (h1 , x2 , . . . , xn ) + f (x1 , h2 , x3 , . . . , xn ) + · · ·
+ f (x1 , . . . , xn−1 , hn ).

Proposition 2.8. Let f : U ⊂ Rn → R and let a ∈ U . If f is differentiable at a, then f has


partial derivatives at a. The converse is not true.
Proof. Let {e1 , . . . , en } be the canonical basis of Rn ; that is, ei = (0, . . . , 1, . . . , 0) (the ith
component is 1 and the remaining are 0). By assumption we have

\[ \lim_{h\to 0} \frac{\|f(a+h) - f(a) - Df(a)h\|}{\|h\|} = 0. \]

Take h = te_i , so that ||h|| = |t|. Then

\[ \lim_{t\to 0} \frac{\|f(a+te_i) - f(a) - t\,Df(a)e_i\|}{|t|} = 0. \]

Equivalently,
\[ \lim_{t\to 0} \frac{f(a+te_i) - f(a) - t\,Df(a)e_i}{t} = 0. \]

Equivalently,
\[ \lim_{t\to 0} \left( \frac{f(a+te_i) - f(a)}{t} - Df(a)e_i \right) = 0. \]

Equivalently,
\[ \lim_{t\to 0} \frac{f(a+te_i) - f(a)}{t} = Df(a)e_i . \]

By definition, the existence of this limit means the existence of the ith partial derivative:

\[ \lim_{t\to 0} \frac{f(a+te_i) - f(a)}{t} = \lim_{t\to 0} \frac{f(a_1,\dots,a_i+t,\dots,a_n) - f(a_1,\dots,a_n)}{t} = \frac{\partial f}{\partial x_i}(a). \]

Remark. Let h = (h1 , . . . , hn ) ∈ Rn be arbitrary. Then

\[ Df(a)h = Df(a)\Big( \sum_{i=1}^{n} h_i e_i \Big) = \sum_{i=1}^{n} h_i\, Df(a)e_i = \sum_{i=1}^{n} h_i \frac{\partial f}{\partial x_i}(a) = \langle \nabla f(a), h\rangle. \]

Thus we can identify the Fréchet derivative with the gradient.


To prove that the converse does not hold, take

\[ f(x,y) = \begin{cases} \dfrac{xy}{x^2+y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]

Note that by definition

f (x, 0) = f (0, y) = 0 ∀x, y ∈ R.
This implies that
\[ \lim_{t\to 0} \frac{f(0+t, 0) - f(0,0)}{t} = \lim_{t\to 0} 0 = 0. \]

Therefore ∂f/∂x (0, 0) exists and is equal to 0. Similarly ∂f/∂y (0, 0) exists and is equal to 0. However, f
is not continuous at 0, because f (t, t) = 1/2 for every t ≠ 0, and so f cannot be differentiable there.
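
The two facts used in this counter-example are easy to check numerically. The Python sketch below is an illustration only; the step sizes are arbitrary choices. It computes the difference quotients defining the partial derivatives at the origin and the values of f along the diagonal.

```python
def f(x, y):
    """f(x, y) = xy / (x^2 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x * y / (x * x + y * y)

def partial_x_at_origin(t):
    """Difference quotient (f(t, 0) - f(0, 0)) / t."""
    return (f(t, 0.0) - f(0.0, 0.0)) / t

def partial_y_at_origin(t):
    """Difference quotient (f(0, t) - f(0, 0)) / t."""
    return (f(0.0, t) - f(0.0, 0.0)) / t

steps = [10.0 ** (-k) for k in range(1, 6)]
quotients_x = [partial_x_at_origin(t) for t in steps]   # all equal to 0
quotients_y = [partial_y_at_origin(t) for t in steps]   # all equal to 0
diagonal = [f(t, t) for t in steps]                     # stays at 1/2
```

The quotients vanish identically, while the values on the diagonal do not approach f(0, 0) = 0.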

Proposition 2.9. Let f : U ⊂ Rn → Rm and let a ∈ U . This means that f is a function of n
real variables and has m components f1 , . . . , fm . Then f is differentiable at a if and only if each
component fk is differentiable at a. In this case Df (a) can be identified with the Jacobian matrix

\[ Df(a) = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix}. \]

Proposition 2.10. Let E be a normed space and let f (x) = ||x||. Then f is never differentiable
at 0.
Proof. Suppose that f is differentiable at 0. Then ||h|| − Df (0)h = o(h). Replacing h by −h,
we get ||h|| + Df (0)h = o(h). Adding the two equations we get 2||h|| = o(h) which is impossible.

The last result of this section generalizes the fact that the map t ↦ 1/t is C 1 on R\{0} and its
derivative is t ↦ −1/t².

Theorem 2.1. Let E and F be isomorphic Banach spaces and consider the map Φ : Isom(E, F ) →
L(F, E) defined by Φ(S) = S −1 . Then Φ is of class C 1 and for S ∈ Isom(E, F ) and H ∈ L(E, F )
we have
Φ0 (S)H = −S −1 HS −1 .
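Remark. The order of the factors is important because composition of operators is generally not
commutative. Note that Φ′(S)H is a bounded linear operator from F to E, obtained (up to the minus sign) from the diagram

\[ F \xrightarrow{\ S^{-1}\ } E \xrightarrow{\ H\ } F \xrightarrow{\ S^{-1}\ } E \]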

Proof. Check first that for each S ∈ Isom(E, F ), the operator H ↦ −S⁻¹HS⁻¹ is indeed linear
and continuous. Now, let H ∈ L(E, F ) be small enough so that S + H ∈ Isom(E, F ). We can write

\[
\begin{aligned}
\Phi(S+H) - \Phi(S) &= (S+H)^{-1} - S^{-1} \\
&= (S+H)^{-1} S S^{-1} - (S+H)^{-1}(S+H)S^{-1} \\
&= (S+H)^{-1}\big(S - (S+H)\big)S^{-1} \\
&= -(S+H)^{-1} H S^{-1}.
\end{aligned}
\]

Therefore

\[
\begin{aligned}
\Phi(S+H) - \Phi(S) - (-S^{-1}HS^{-1}) &= -\big((S+H)^{-1}HS^{-1} - S^{-1}HS^{-1}\big) \\
&= -\big((S+H)^{-1} - S^{-1}\big) H S^{-1} \\
&= -\big(\Phi(S+H) - \Phi(S)\big) H S^{-1}.
\end{aligned}
\]

We will prove that the above term is o(H).

Taking norms, and using the fundamental inequality ||T L|| ≤ ||T || ||L||, we get

\[ \|\Phi(S+H) - \Phi(S) - (-S^{-1}HS^{-1})\| \le \|\Phi(S+H) - \Phi(S)\|\,\|H\|\,\|S^{-1}\|. \]

Continuity of Φ implies that the term to the right is o(H). This proves that Φ is differentiable and
Φ′(S)H = −S⁻¹HS⁻¹ (check that this is linear and continuous in H for every S).
Now we prove that Φ0 : Isom(E, F ) → L (L(E, F ), L(F, E)) is continuous. Consider the map

Ψ : L(F, E) × L(F, E) → L (L(E, F ), L(F, E))

defined by
Ψ(T, L)H = −T HL.
You can check that Ψ is bilinear and continuous because ||Ψ(T, L)|| ≤ ||T || ||L||. Now observe
that
Φ0 (S) = Ψ(S −1 , S −1 ) = Ψ(Φ(S), Φ(S)).
This implies that Φ0 is continuous as a composition of continuous functions. 
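
For a finite-dimensional illustration of Theorem 2.1, the following Python sketch works with 2×2 real matrices, so that E = F = R^2. The matrices S and H0, the max-row-sum norm and all helper names are assumptions made for this example only. It checks that the remainder Φ(S + H) − Φ(S) + S⁻¹HS⁻¹, divided by ||H||, decreases as H shrinks.

```python
def matmul(A, B):
    """Product of two 2x2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def inverse(A):
    """Inverse of an invertible 2x2 matrix."""
    (a, b), (c, d) = A
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def add(A, B, sign=1.0):
    """A + sign * B."""
    return [[A[i][j] + sign * B[i][j] for j in range(2)] for i in range(2)]

def norm(A):
    """Max-row-sum norm, i.e. the operator norm induced by the max norm on R^2."""
    return max(sum(abs(v) for v in row) for row in A)

def remainder_ratio(S, H):
    """||Phi(S+H) - Phi(S) - (-S^-1 H S^-1)|| / ||H|| with Phi(S) = S^-1."""
    S_inv = inverse(S)
    candidate = [[-v for v in row] for row in matmul(matmul(S_inv, H), S_inv)]
    increment = add(inverse(add(S, H)), S_inv, sign=-1.0)
    return norm(add(increment, candidate, sign=-1.0)) / norm(H)

S = [[2.0, 1.0], [1.0, 3.0]]
H0 = [[1.0, -2.0], [0.5, 1.0]]
ratios = [remainder_ratio(S, [[t * v for v in row] for row in H0])
          for t in (1e-1, 1e-2, 1e-3)]
```

The ratios shrink roughly in proportion to t, in line with the estimate of the proof, where the remainder is bounded by ||Φ(S + H) − Φ(S)|| ||H|| ||S⁻¹||.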

The Gâteau derivative


The notion of Gâteau derivative is an extension of the notion of directional derivative that you
probably encountered in Calculus. So let us revisit the latter.

Definition. Let f : U ⊂ Rn → R be a function of several real variables. Let a ∈ U and let v ∈ Rn .
The directional derivative of f at a along the vector v, when it exists, is the limit

\[ \lim_{t\to 0} \frac{f(a+tv) - f(a)}{t}. \]

It is usually denoted by one of the following symbols

\[ \nabla_v f(a), \qquad \frac{\partial f}{\partial v}(a), \qquad f'_v(a). \]

Note that we can write
\[ \nabla_v f(a) = \frac{d}{dt} f(a+tv)\Big|_{t=0}. \]

Proposition 2.11. If f is Fréchet differentiable at a, then f has a directional derivative at a along


any vector v and
∇v f (a) = ∇f (a) · v.
Proof. If we replace h by tv in the definition of Fréchet derivative, we get

f (a + tv) − f (a) = tDf (a)v + o(t) near t = 0,

and so
\[ \frac{f(a+tv) - f(a)}{t} = Df(a)v + \frac{o(t)}{t}. \]
It follows that \(\lim_{t\to 0} \frac{f(a+tv) - f(a)}{t}\) exists and

\[ \lim_{t\to 0} \frac{f(a+tv) - f(a)}{t} = Df(a)v = \nabla f(a)\cdot v. \qquad \square \]

Remark 1. The above relation can also be obtained by the chain rule. Indeed

\[ \nabla_v f(a) = \frac{d}{dt} f(a+tv)\Big|_{t=0} = Df(a+tv)v\,\Big|_{t=0} = Df(a)v. \]

Remark 2. If f has a directional derivative at a along any vector, then it has partial derivatives
at a. Indeed, take v to be an element of the canonical basis {e1 , . . . , en }.
Remark 3. The relation ∇v f (a) = ∇f (a) · v has a very important consequence. Suppose that
||v|| = 1 and let θ denote the angle between ∇f (a) and v. Then

∇v f (a) = ||∇f (a)|| cos θ.

∇v f (a) represents the rate of change of f at the point a. It achieves a maximum value when θ = 0,
that is, when v points in the same direction as ∇f (a). Thus, the gradient of f at a represents
the direction in which f increases the most. Therefore −∇f (a) represents the direction in which f
decreases the most. This interpretation of the gradient underlies many minimization algorithms in
numerical analysis and machine learning. Such algorithms are based on iterations of the form

xn+1 = xn − α∇f (xn ).
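
As a minimal sketch of such an iteration, the Python code below runs gradient descent on the sample function f(x, y) = (x − 1)² + 2(y + 3)². The function, the step size α = 0.1, the starting point and the number of steps are all assumptions made for this illustration; in practice the choice of α matters a great deal.

```python
def grad_f(x):
    """Gradient of the sample function f(x, y) = (x - 1)^2 + 2 (y + 3)^2."""
    return (2.0 * (x[0] - 1.0), 4.0 * (x[1] + 3.0))

def gradient_descent(x0, alpha, steps):
    """Iterate x_{n+1} = x_n - alpha * grad f(x_n)."""
    x = x0
    for _ in range(steps):
        g = grad_f(x)
        x = (x[0] - alpha * g[0], x[1] - alpha * g[1])
    return x

minimizer = gradient_descent((0.0, 0.0), alpha=0.1, steps=200)
```

For this quadratic function the iterates approach the unique minimizer (1, −3).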

Definition. Let E and F be normed spaces and let U be an open subset of E. Let f : U ⊂ E → F
be a map, let a ∈ U and let v ∈ E. The Gâteau derivative of f at a along the vector v, when it
exists, is the limit
\[ \lim_{t\to 0} \frac{f(a+tv) - f(a)}{t}. \]
It is usually denoted by one of the following symbols.

df (a, v), Dv f (a).

Note that we can write

\[ df(a, v) = \frac{d}{dt} f(a+tv)\Big|_{t=0}. \]

Proposition 2.12. If f is Fréchet differentiable at a, then f has a Gâteau derivative at a along


any vector v and
df (a, v) = Df (a)v.

Proof. Same proof as Proposition 2.11. 

Remark. However the converse is not true. Indeed, the map v 7→ df (a, v) need not be linear
or continuous. Some mathematicians include in the definition of the Gâteau derivative the linearity
and continuity of the map v 7→ df (a, v). However even if this is true, the function f need not be
differentiable in the sense of Fréchet. We present some examples.

Example 1. Where the Gâteau derivative exists but is not linear. Let f : R2 → R be defined by

\[ f(x,y) = \begin{cases} \dfrac{x^3}{x^2+y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]

As you can check, the Gâteau derivative at 0 is given by

\[ df((0,0),(u,v)) = \begin{cases} \dfrac{u^3}{u^2+v^2} & \text{if } (u,v) \neq (0,0) \\ 0 & \text{if } (u,v) = (0,0). \end{cases} \]

Therefore (u, v) ↦ df ((0, 0), (u, v)) is not linear.

Example 2. Where the Gâteau derivative exists, is linear but not continuous. Let E be an infinite
dimensional normed space. Then there exists a linear unbounded map f : E → R. It is easy to
check that the Gâteau derivative at any point x ∈ E is given by

df (x, v) = f (v).

Therefore v 7→ df (x, v) is linear but not continuous.

Example 3. Where the Gâteau derivative is linear and continuous but the Fréchet derivative does
not exist. Let f : R2 → R be defined by

\[ f(x,y) = \begin{cases} \dfrac{x^3 y}{x^6+y^2} & \text{if } (x,y) \neq (0,0) \\ 0 & \text{if } (x,y) = (0,0). \end{cases} \]

As you can check, the Gâteau derivative at 0 is the 0 map:

df ((0, 0), (u, v)) = 0.

Therefore (u, v) ↦ df ((0, 0), (u, v)) is linear and continuous. However f is not even continuous at
0, since f (x, x³) = 1/2 for every x ≠ 0. Therefore it is not Fréchet differentiable at 0. This example
therefore shows that Gâteau differentiability does not imply continuity.
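
The behaviour described in Example 3 can be observed numerically. In the Python sketch below (an illustration only; the direction (1, 2) and the sample values of t are arbitrary choices), the difference quotients at the origin along a fixed direction tend to 0, while the values of f along the curve y = x³ stay at 1/2.

```python
def f(x, y):
    """f(x, y) = x^3 y / (x^6 + y^2) away from the origin, and f(0, 0) = 0."""
    if x == 0.0 and y == 0.0:
        return 0.0
    return x ** 3 * y / (x ** 6 + y ** 2)

def gateaux_quotient(u, v, t):
    """Difference quotient (f(tu, tv) - f(0, 0)) / t at the origin."""
    return f(t * u, t * v) / t

ts = (1e-1, 1e-2, 1e-3)
# Along the fixed direction (1, 2) the quotient tends to 0 ...
quotients = [gateaux_quotient(1.0, 2.0, t) for t in ts]
# ... but along the curve y = x^3 the function stays at 1/2.
on_curve = [f(x, x ** 3) for x in ts]
```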

Methodology
Thus, to study the Fréchet differentiability of a map f : U ⊂ E → F at a point a, we may proceed
as follows.

1. If f is not continuous at a, then f is not Fréchet differentiable at a.

2. Otherwise, we find
\[ df(a, v) = \lim_{t\to 0} \frac{f(a+tv) - f(a)}{t}. \]
3. If either the limit does not exist, or exists but is either not linear or not continuous in v, we
conclude that f is not Fréchet differentiable at a.

4. If this limit is linear and continuous in v, then df (a, v) is the unique candidate for the position
of the Fréchet derivative. But we must prove or disprove that

f (a + h) − f (a) − df (a, h) = o(h).

If this relation is satisfied, we conclude that the Fréchet derivative exists and is df (a, h). If not,
we conclude that f is not Fréchet differentiable at a.

This methodology will be useful in some exercises. However, it is not necessary to always follow it.
Sometimes, the situation is much simpler. Here’s an example.
Example. Let E be an inner product space and let f (x) = ||x||² = ⟨x, x⟩. Let us show that f is
Fréchet differentiable and find its derivative. We start by writing

\[
\begin{aligned}
f(x+h) - f(x) &= \|x+h\|^2 - \|x\|^2 = \langle x+h, x+h\rangle - \|x\|^2 \\
&= \|x\|^2 + 2\langle x, h\rangle + \|h\|^2 - \|x\|^2 \\
&= 2\langle x, h\rangle + \|h\|^2 .
\end{aligned}
\]

It should be clear that the linear part is 2⟨x, h⟩. This term is continuous in h (by the Cauchy-Schwarz
inequality) and we have
f (x + h) − f (x) − 2⟨x, h⟩ = ||h||² = o(h).
This means that f is Fréchet differentiable at x and f ′(x)h = 2⟨x, h⟩.
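
The computation above can be double-checked numerically. The Python sketch below takes E = R^3 with the usual inner product (an assumption made for this example, as are the sample vectors) and verifies that the remainder f(x + h) − f(x) − 2⟨x, h⟩ coincides with ||h||².

```python
def inner(u, v):
    """Usual inner product on R^3."""
    return sum(ui * vi for ui, vi in zip(u, v))

def f(x):
    """f(x) = ||x||^2 = <x, x>."""
    return inner(x, x)

def df(x, h):
    """Candidate Fréchet derivative f'(x)h = 2 <x, h>."""
    return 2.0 * inner(x, h)

x = (1.0, -2.0, 0.5)
h = (1e-3, 2e-3, -1e-3)
x_plus_h = tuple(xi + hi for xi, hi in zip(x, h))
remainder = f(x_plus_h) - f(x) - df(x, h)   # equals ||h||^2 up to rounding
```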

2.2 Linearity of the derivative and the chain rule


Convention. Unless otherwise stated, in the notation

f : U ⊂ E → F,

E and F are normed spaces and U is an open subset of E.


Proposition 2.13. Let f, g : U ⊂ E → F be two differentiable maps at a point a ∈ U and let
λ ∈ R. Then f + g and λf are differentiable at a and
(f + g)0 (a) = f 0 (a) + g 0 (a)
(λf )0 (a) = λf 0 (a).
Thus, differentiation is a linear operation on maps.
Proof. By assumption, we have near h = 0,
f (a + h) − f (a) − f 0 (a)h = o(h)
g(a + h) − g(a) − g 0 (a)h = o(h).

Adding these two identities and using the fact that o(h) + o(h) = o(h), we get

(f + g)(a + h) − (f + g)(a) − (f ′(a) + g ′(a))h = o(h).




This proves that f + g is differentiable at a and (f + g)0 (a) = f 0 (a) + g 0 (a).


Next, multiplying by λ the identity

f (a + h) − f (a) − f 0 (a)h = o(h)

and using the fact that λo(h) = o(h), we get

(λf )(a + h) − (λf )(a) − λf 0 (a)h = o(h).

This proves that λf is differentiable at a and (λf )0 (a) = λf 0 (a). 

In different notation,
D(f + g)(a) = Df (a) + Dg(a)
D(λf )(a) = λDf (a).
Corollary 2.1. If f and g are of class C 1 , then f + g and λf are of class C 1 .

Proposition 2.14. (The chain rule). Let f : U ⊂ E → F and g : V ⊂ F → G be maps such
that f (U ) ⊂ V . This condition ensures that g ◦ f is well defined and is a map from U to G. If f
is differentiable at a point a ∈ U and g is differentiable at the point b = f (a) ∈ V , then g ◦ f is
differentiable at a and
(g ◦ f )0 (a) = g 0 (f (a)) ◦ f 0 (a)
or in different notation
D(g ◦ f )(a) = Dg(f (a)) ◦ Df (a).
Proof. Differentiability of f at a means that

f (a + h) − f (a) = f 0 (a)h + ϕ(h), where ϕ(h) = o(h) near h = 0. (2.1)

Differentiability of g at b means that

g(y) − g(b) = g 0 (b)(y − b) + ψ(y), where ψ(y) = o(y − b) near y = b. (2.2)

Replacing y by f (a + h) and b by f (a), we get

g(f (a + h)) − g(f (a)) = g 0 (f (a))(f (a + h) − f (a)) + ψ(f (a + h)).

Replacing f (a + h) − f (a) by f 0 (a)h + ϕ(h) and using the linearity of g 0 (f (a)), we get

\[
\begin{aligned}
g(f(a+h)) - g(f(a)) &= g'(f(a))\big(f'(a)h + \varphi(h)\big) + \psi(f(a+h)) \\
&= g'(f(a))(f'(a)h) + g'(f(a))\varphi(h) + \psi(f(a+h)) \\
&= \big(g'(f(a)) \circ f'(a)\big)h + g'(f(a))\varphi(h) + \psi(f(a+h)).
\end{aligned}
\]


We have to prove that

g 0 (f (a))ϕ(h) + ψ(f (a + h)) = o(h) near h = 0.

The first term is indeed o(h) because

\[ \frac{\|g'(f(a))\varphi(h)\|}{\|h\|} \le \frac{\|g'(f(a))\|\,\|\varphi(h)\|}{\|h\|} \to 0 \quad \text{as } h \to 0. \]

To prove that the second term is o(h), let ε > 0 be given. Let M = ||f 0 (a)|| + 1. Since ψ(y) =
o(y − b), there exists δ > 0 such that

||ψ(y)|| ≤ ε||y − b|| whenever ||y − b|| ≤ δ. (2.3)

Since ϕ(h) = o(h), there exists δ1 > 0 such that

||ϕ(h)|| ≤ ||h|| whenever ||h|| ≤ δ1 .

It follows from equation (2.1) that

||f (a + h) − f (a)|| ≤ ||f 0 (a)||||h|| + ||h|| = M ||h|| whenever ||h|| ≤ δ1 . (2.4)


Let δ2 = min(δ1 , δ/M ). Let ||h|| ≤ δ2 . Then ||h|| ≤ δ1 and ||h|| ≤ δ/M . It follows that M ||h|| ≤ δ
and so ||f (a + h) − f (a)|| ≤ δ by the previous inequality. It follows from inequalities (2.3) and (2.4)
that
||ψ(f (a + h))|| ≤ ε||f (a + h) − f (a)|| ≤ εM ||h||.
Since ε was arbitrary, the last inequality means that

ψ(f (a + h)) = o(h) near h = 0. 

Corollary 2.2. Under the above assumptions, if f and g are of class C 1 then g ◦ f is also of class
C 1.

2.3 Functions between products of normed spaces


We will first generalize the following fact. Let f : U ⊂ R → Rm be a function with components
f1 , f2 , . . . , fm . It is easy to prove that f is differentiable at a point a ∈ U if and only if all its
components are differentiable at a and in this case

f ′(a) = (f1′(a), . . . , fm′(a)).

Proposition 2.16. Let E, F1 , . . . , Fm be normed spaces and let f : U ⊂ E → F1 × · · · × Fm be a
map with components f1 , . . . , fm . Then f is differentiable at a point a ∈ U if and only if all the
components fi are differentiable at a. In this case we have

f ′(a)h = (f1′(a)h, . . . , fm′(a)h) ∀ h ∈ E.

If we view (f1′(a), . . . , fm′(a)) as a vector (or row matrix) of bounded linear operators, we can write

f ′(a) = (f1′(a), . . . , fm′(a)).
Proof. Let pk : F1 × · · · × Fm → Fk denote the projection onto Fk , so that

pk (y1 , . . . , ym ) = yk .

Let uk : Fk → F1 × · · · × Fm denote the natural injection of Fk into F1 × · · · × Fm defined by

uk (x) = (0, . . . , x, . . . , 0) (x at the kth component).

Then it is easy to see that each pk and each uk is a bounded linear map. According to Proposition
2.5, uk′(x) = uk for all x ∈ Fk and pk′(y) = pk for all y ∈ F1 × · · · × Fm . It is also easy to see that

\[ f_i = p_i \circ f, \qquad f = \sum_{i=1}^{m} u_i \circ f_i . \]

If f is differentiable at a, then by the chain rule, for every i = 1, . . . , m, fi is differentiable at a and

fi′(a) = pi′(f (a)) ◦ f ′(a) = pi ◦ f ′(a),

or equivalently
fi′(a)h = pi (f ′(a)h).

We can write this as

f ′(a)h = (f1′(a)h, . . . , fm′(a)h).

Conversely, if each fi is differentiable at a, then by the chain rule, ui ◦ fi is differentiable at a. By
linearity, f = u1 ◦ f1 + · · · + um ◦ fm is differentiable at a. □

Corollary 2.3. Under the above assumptions, f is C 1 if and only if all its components are C 1 .

In order to generalize the formula (f g)′ = f ′g + f g′, we replace f g by a bilinear continuous map
L(f, g).

Corollary 2.3. (The product rule). Let f : U ⊂ E → F1 , g : U ⊂ E → F2 and let


L ∈ L(F1 , F2 ; G). If f and g are differentiable at a point a ∈ U then the map F = L(f, g) is also
differentiable at a and
F 0 (a)h = L(f 0 (a)h, g(a)) + L(f (a), g 0 (a)h).

Proof. Let Φ(a) = (f (a), g(a)). By the previous proposition, Φ is differentiable at a and
Φ0 (a)h = (f 0 (a)h, g 0 (a)h). Now we can write F = L ◦ Φ. It follows from the chain rule and
Proposition 2.6. that

F 0 (a)h = L0 (Φ(a))(Φ0 (a)h)


= L0 (f (a), g(a))(f 0 (a)h, g 0 (a)h)
= L(f 0 (a)h, g(a)) + L(f (a), g 0 (a)h). 

Corollary 2.4. Under the above assumptions, if f and g are C 1 , then L(f, g) is also C 1 .
Now we will generalize the derivative of a quotient f/g. But for this we need a lemma.

Lemma 2.2. Let F : R × (R\{0}) → R be defined by

\[ F(s,t) = \frac{s}{t}. \]

Then F is C 1 and its Fréchet derivative is given by

\[ DF(s,t)(h,k) = \frac{1}{t}\,h - \frac{s}{t^2}\,k. \]
t t
Proof. We will prove in the next chapter that if the partial derivatives of a function of several
variables exist and are continuous then the function is C 1 . This is the case with the above function
because ∂F/∂s = 1/t and ∂F/∂t = −s/t². If you do not want to anticipate, you can go back to the definition
and prove that

\[ F(s+h, t+k) - F(s,t) - \Big( \frac{1}{t}\,h - \frac{s}{t^2}\,k \Big) = -\frac{(th - sk)\,k}{t^2 (t+k)} = O(\|(h,k)\|^2). \qquad \square \]
Corollary 2.5. (Derivative of a quotient). Let f, g : U ⊂ E → R be two maps such that
g(x) ≠ 0 for all x ∈ U . If f and g are differentiable at a point a ∈ U , then f/g is differentiable at a
and
\[ \Big( \frac{f}{g} \Big)'(a)h = \frac{g(a) f'(a)h - f(a) g'(a)h}{g(a)^2}. \]

Proof. This is a consequence of the previous lemma and the chain rule because f/g = F (f, g) and
so
\[
\begin{aligned}
\Big( \frac{f}{g} \Big)'(a)h &= DF(f(a), g(a))\big(f'(a)h,\ g'(a)h\big) \\
&= \frac{1}{g(a)}\, f'(a)h - \frac{f(a)}{g(a)^2}\, g'(a)h \\
&= \frac{g(a) f'(a)h - f(a) g'(a)h}{g(a)^2}. \qquad \square
\end{aligned}
\]

Corollary 2.6. Under the above assumptions, if f and g are C 1 , then f/g is also C 1 .

Functions of several variables


Definition. Let E1 , . . . , En , F be normed spaces and let f : U ⊂ E1 × · · · × En → F be a map.
The partial derivative with respect to the ith variable at a point a = (a1 , . . . , an ), if it exists, is
the derivative of the partial map x ↦ f (a1 , . . . , ai−1 , x, ai+1 , . . . , an ) at the point ai . The partial
derivative is denoted by one of the following symbols

\[ \frac{\partial f}{\partial x_i}(a), \quad f'_{x_i}(a), \quad f_{x_i}(a), \quad D_i f(a), \quad \partial_i f(a). \]

Thus, when it exists, ∂f/∂xi (a) ∈ L(Ei , F ).
∂xi
Remark. Let λi : Ei → E1 × · · · × En denote the map defined by

λi (x) = (a1 , . . . , ai−1 , x, ai+1 , . . . , an ).

Then
\[ \frac{\partial f}{\partial x_i}(a) = (f \circ \lambda_i)'(a_i). \]

Proposition 2.18. Let f : U ⊂ E1 × · · · × En → F be a map. If f is Fréchet differentiable at a
point a ∈ U , then f has partial derivatives with respect to all its variables at the point a and for
h = (h1 , . . . , hn ) ∈ E1 × · · · × En we have

\[ f'(a)h = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)\, h_i . \]

Proof. Let ui : Ei → E1 × · · · × En be the canonical injection defined by

ui (x) = (0, . . . , x, . . . , 0) (x at the ith component).

Then we can write


λi (x) = (a1 , . . . , ai−1 , x, ai+1 , . . . , an )
= (a1 , . . . , ai−1 , ai + x − ai , ai+1 , . . . , an )
= (a1 , . . . , an ) + (0, . . . , x − ai , 0 . . . , 0)
= a + ui (x − ai ) = a + ui (x) − ui (ai ).
It follows that λi is differentiable and
λi′(x) = ui .
Since f is differentiable at a = λi (ai ), it follows from the chain rule that f ◦ λi is differentiable at ai and

(f ◦ λi )′(ai ) = f ′(a) ◦ ui .

This means that ∂f/∂xi (a) exists and

\[ \frac{\partial f}{\partial x_i}(a)\, h_i = f'(a)(u_i(h_i)). \]

Therefore
\[ \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)\, h_i = \sum_{i=1}^{n} f'(a)(u_i(h_i)) = f'(a)\Big( \sum_{i=1}^{n} u_i(h_i) \Big) = f'(a)h. \qquad \square \]

Remark. If we write h as a column vector, then we can write

\[ f'(a)h = \begin{pmatrix} \frac{\partial f}{\partial x_1}(a) & \cdots & \frac{\partial f}{\partial x_n}(a) \end{pmatrix} \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}. \]

Several variables and several components


Combining Propositions 2.16 and 2.18, we get
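Proposition 2.19. Let f : U ⊂ E1 × · · · × En → F1 × · · · × Fm be a map with components f1 , . . . , fm . If f
is differentiable at a point a ∈ U , then all partial derivatives ∂fi/∂xj (a) exist and

\[ f'(a)h = \begin{pmatrix} \frac{\partial f_1}{\partial x_1}(a) & \cdots & \frac{\partial f_1}{\partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial f_m}{\partial x_1}(a) & \cdots & \frac{\partial f_m}{\partial x_n}(a) \end{pmatrix} \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix}. \]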
Chapter 3

The mean value theorem

3.1 Some common versions of the mean value theorem


Here’s the precise statement of the classical mean value theorem.
The Mean value theorem, version 1.0. Let f : [a, b] → R be continuous and differentiable on
]a, b[. Then there exists c ∈]a, b[ such that

f (b) − f (a) = f 0 (c)(b − a).

This theorem can be deduced from Rolle’s theorem (see M1106).


Remark. Do not confuse the mean value theorem (théorème des accroissements finis) with
the intermediate value theorem (théorème des valeurs intermédiaires). In its elementary form, the
intermediate value theorem states that if f : [a, b] → R is continuous and r is a number between
f (a) and f (b), then there exists a number c ∈ [a, b] such that f (c) = r.
In this form, the mean value theorem (MVT) does not hold when f takes values in an arbitrary
normed space. Here’s an example.
Counter-example. Let f : [0, 2π] → R2 be defined by f (t) = (cos t, sin t). Suppose that there
exists c ∈]0.2π[ such that
f (2π) − f (0) = f 0 (c)2π.
Then
(0, 0) = (− sin c, cos c)2π.
But this is impossible because sin and cos do not vanish at the same time.
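As a numerical sanity check of this counter-example, here is a minimal Python sketch. The helper names and the grid of 999 sample points are illustrative choices, not part of the notes. It compares the left-hand side $f(2\pi) - f(0)$ with the size of $2\pi f'(c)$ over sampled values of $c$.

```python
import math

def f(t):
    # The circle map f(t) = (cos t, sin t).
    return (math.cos(t), math.sin(t))

def f_prime(t):
    # Its derivative f'(t) = (-sin t, cos t), identified with a vector of R^2.
    return (-math.sin(t), math.cos(t))

# Left-hand side of the would-be identity: f(2*pi) - f(0).
lhs = tuple(p - q for p, q in zip(f(2 * math.pi), f(0.0)))

# Smallest Euclidean norm of 2*pi*f'(c) over sampled c in ]0, 2*pi[.
grid = [2 * math.pi * k / 1000 for k in range(1, 1000)]
min_rhs_norm = min(2 * math.pi * math.hypot(*f_prime(c)) for c in grid)

print(lhs)           # zero up to rounding
print(min_rhs_norm)  # stays at 2*pi up to rounding, so it is never near 0
```

The right-hand side always has norm $2\pi$ because $\|f'(c)\| = 1$ for every $c$, which is the same obstruction as in the argument above.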
Challenging exercise. Show that for every two normed spaces E and F with dim F ≥ 2 (and
of course dim E ≥ 1), there exists a differentiable function f : E → F that does not satisfy the
classical mean value theorem.
Question. What happens if dim E = 0?

However, not everything is lost. We will show first that the mean value theorem holds for real
valued maps. Next, by modifying its form, we will be able to extend the mean value theorem to
higher dimensions. But first, we have to generalize the notion of an interval.
Definition. Let E be a normed space and let a, b ∈ E. We set

• $[a, b] = \{(1 - t)a + tb;\ 0 \le t \le 1\}$. We call it the straight-line segment joining $a$ to $b$ or the straight-line segment with endpoints $a$ and $b$.

• $]a, b[ = \{(1 - t)a + tb;\ 0 < t < 1\}$. We call it the open straight-line segment joining $a$ to $b$.

A set $U \subset E$ is said to be convex if $[a, b] \subset U$ whenever $a, b \in U$. The notion of convexity is one generalization of the notion of interval. Two other generalizations are the notions of connectedness and path connectedness (see the next section).

The mean value theorem, version 2.0. Let $U$ be an open subset of a normed space $E$ and let $f : U \to \mathbb{R}$ be differentiable. If $[a, b] \subset U$, then there exists $\zeta \in ]a, b[$ such that
$$f(b) - f(a) = f'(\zeta)(b - a).$$

[Figure: the segment $[a, b]$ with a point $\zeta$ in its interior.]

Proof. Let $\gamma(t) = (1 - t)a + tb = a + t(b - a)$, for $t \in \mathbb{R}$. Then $\gamma$ is differentiable and $\gamma'(t) = b - a$ (here we identify $\mathcal{L}(\mathbb{R}, E)$ with $E$). Let $g = f \circ \gamma$. Then $g$ is a map from $[0, 1]$ to $\mathbb{R}$. It is continuous on $[0, 1]$ and differentiable on $]0, 1[$. By the chain rule
$$g'(t) = f'(\gamma(t))\gamma'(t) = f'((1 - t)a + tb)(b - a).$$
By version 1.0, there exists $c \in ]0, 1[$ such that
$$g(1) - g(0) = g'(c)(1 - 0) = g'(c).$$
Therefore,
$$f(b) - f(a) = f'(\gamma(c))(b - a).$$
Letting $\zeta = \gamma(c)$, we see that $\zeta \in ]a, b[$ and
$$f(b) - f(a) = f'(\zeta)(b - a). \qquad \square$$
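The proof is constructive enough to be followed numerically. The sketch below is only an illustration under assumed data: the function $f(x, y) = x^3 + y^2$, the endpoints $a = (0, 0)$, $b = (1, 2)$ and the use of bisection are choices made here, not taken from the notes. It locates $c \in ]0, 1[$ with $g'(c) = g(1) - g(0)$ and sets $\zeta = \gamma(c)$.

```python
def f(x, y):
    # Illustrative real-valued function on R^2.
    return x**3 + y**2

def df(x, y, h1, h2):
    # Frechet derivative f'(x, y) applied to the vector h = (h1, h2).
    return 3 * x**2 * h1 + 2 * y * h2

a, b = (0.0, 0.0), (1.0, 2.0)
d = (b[0] - a[0], b[1] - a[1])   # the direction b - a
target = f(*b) - f(*a)           # g(1) - g(0)

def gamma(t):
    # The segment gamma(t) = a + t(b - a).
    return (a[0] + t * d[0], a[1] + t * d[1])

def g_prime(t):
    # Chain rule: g'(t) = f'(gamma(t))(b - a).
    return df(*gamma(t), *d)

# For this f, g' is increasing on [0, 1] with g'(0) < target < g'(1),
# so bisection finds the point c where g'(c) = target.
lo, hi = 0.0, 1.0
for _ in range(60):
    mid = (lo + hi) / 2
    if g_prime(mid) < target:
        lo = mid
    else:
        hi = mid
c = (lo + hi) / 2
zeta = gamma(c)

print(c, zeta, df(*zeta, *d), target)
```

Bisection works here only because $g'$ happens to be monotone for this particular $f$; in general the theorem guarantees that $c$ exists but gives no recipe for finding it.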
With the same arguments used above, we can prove the following version.

The mean value theorem, version 2.1. Let $U$ be an open subset of a normed space $E$ and let $f : \overline{U} \to \mathbb{R}$ be continuous on $\overline{U}$ and differentiable on $U$. If $]a, b[ \subset U$, then there exists $\zeta \in ]a, b[$ such that
$$f(b) - f(a) = f'(\zeta)(b - a).$$

[Figure: an open set $U$ containing the open segment $]a, b[$, with a point $\zeta$ between $a$ and $b$.]

Proof. Note that $\gamma(]0, 1[) = ]a, b[ \subset U$ and $\gamma([0, 1]) = [a, b] \subset \overline{U}$. It follows that $g = f \circ \gamma$ is continuous on $[0, 1]$ and differentiable on $]0, 1[$. Therefore we can apply version 1.0 to $g$. $\square$
As a consequence of version 2.0 we get

The mean value theorem, version 2.2. Let $U$ be an open and convex subset of a normed space $E$ and let $f : U \to \mathbb{R}$ be differentiable. Then for every $a, b \in U$, there exists $\zeta \in ]a, b[$ such that

$$f(b) - f(a) = f'(\zeta)(b - a).$$

We saw that version 1.0 of the MVT is not generalizable for maps f : U ⊂ E → F if dim F ≥ 2.
However there is a very important consequence of version 1.0 known as the MVT inequality.

The mean value theorem, version 1.1. Let $I$ be an open interval of $\mathbb{R}$ and let $f : I \to \mathbb{R}$ be differentiable. Suppose that there is a constant $M$ such that $|f'(x)| \le M$ for all $x \in I$. Then

$$|f(x) - f(y)| \le M|x - y| \quad \forall x, y \in I.$$

In words, a differentiable map with a bounded derivative is Lipschitz continuous.

As a consequence, we get
The mean value theorem, version 1.2. Let $I$ be an open interval and let $f : I \to \mathbb{R}$ be of class $C^1$. Then $f$ is Lipschitz continuous on the compact subsets of $I$.

Now we are going to generalize these two versions.


The mean value theorem, version 3.0. Let $F$ be a normed space and let $f : [a, b] \subset \mathbb{R} \to F$ be continuous on $[a, b]$ and differentiable on $]a, b[$. If there is a constant $M$ such that $\|f'(x)\| \le M$ for all $x \in ]a, b[$, then
$$\|f(b) - f(a)\| \le M(b - a).$$
Proof. We will prove that for every $x \in ]a, b[$ we have

$$\|f(x) - f(a)\| \le M(x - a).$$

The result will follow by continuity of $f$ and of the norm when we let $x \to b$. To prove this, we will prove that for every $\varepsilon > 0$ and every $x \in ]a, b[$,

$$\|f(x) - f(a)\| \le (M + \varepsilon)(x - a) + \varepsilon.$$

Let $\varepsilon > 0$ be given and consider the set

$$U = \{x \in ]a, b[\,;\ \|f(x) - f(a)\| > (M + \varepsilon)(x - a) + \varepsilon\}.$$

We can write
$$U = \{x \in ]a, b[\,;\ \varphi(x) > 0\},$$
where $\varphi(x) = \|f(x) - f(a)\| - (M + \varepsilon)(x - a) - \varepsilon$. Continuity of $\varphi$ implies that $U$ is open. Suppose by contradiction that $U \ne \emptyset$. Since $U$ is bounded from below by $a$, $U$ has an infimum $c \ge a$.
Claim 1. $c > a$. Indeed, if not, then $c = a$. By a property of the infimum, there exists a sequence $(x_n)$ in $U$ that converges to $c = a$. Since $\varphi(x_n) > 0$ and $\varphi$ is continuous, we get $\varphi(a) \ge 0$. However $\varphi(a) = -\varepsilon < 0$.
Claim 2. $c \notin U$. This follows from the general fact that the infimum of a subset $A$ of $\mathbb{R}$ belongs to $\partial A$, i.e., does not belong to the interior of $A$ (check that), together with the fact that $U$ is open.
Claim 3. $c < b$. Indeed, let $x_0 \in U$. Then $x_0 \in ]a, b[$ and $x_0 \ge \inf U = c$. Thus, $c \le x_0 < b$.
Since $c \in ]a, b[$, it follows from the assumption of the theorem that $\|f'(c)\| \le M$. On the other hand, it follows from the definition of the derivative that
$$\lim_{x \to c} \frac{f(x) - f(c)}{x - c} = f'(c),$$
and so
$$\lim_{x \to c} \left\| \frac{f(x) - f(c)}{x - c} \right\| = \|f'(c)\|.$$
It follows that there exists $\delta > 0$ such that

$$\|f'(c)\| \ge \left\| \frac{f(x) - f(c)}{x - c} \right\| - \varepsilon, \quad \text{whenever } 0 < |x - c| < \delta.$$

By a property of the infimum, $[c, c + \delta[ \,\cap\, U \ne \emptyset$. Choose accordingly $x \in [c, c + \delta[ \,\cap\, U$. Since $c \notin U$, we have $x > c$.
Since $M \ge \|f'(c)\|$, we get
$$M \ge \left\| \frac{f(x) - f(c)}{x - c} \right\| - \varepsilon,$$
and so
$$\|f(x) - f(c)\| \le (M + \varepsilon)(x - c).$$
Since $c \notin U$, we have
$$\|f(c) - f(a)\| \le (M + \varepsilon)(c - a) + \varepsilon.$$
By the triangle inequality, we get

$$\|f(x) - f(a)\| \le (M + \varepsilon)(x - a) + \varepsilon.$$

This means that $x \notin U$, a contradiction. $\square$
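To see the inequality at work on the circle map of the earlier counter-example, where the equality form of the MVT fails, here is a small Python sketch. The sampled pairs $(a, b)$ are an arbitrary illustrative choice. Since $\|f'(t)\| = 1$, version 3.0 with $M = 1$ says that the chord $\|f(b) - f(a)\|$ never exceeds $b - a$.

```python
import math

def f(t):
    # The circle map f(t) = (cos t, sin t), whose derivative has norm 1.
    return (math.cos(t), math.sin(t))

M = 1.0  # bound on ||f'(t)||

def chord(a, b):
    # Euclidean norm of f(b) - f(a).
    fa, fb = f(a), f(b)
    return math.hypot(fb[0] - fa[0], fb[1] - fa[1])

# Sampled pairs a < b; the step sizes 0.1 and 0.37 are arbitrary.
pairs = [(0.1 * i, 0.1 * i + 0.37 * j) for i in range(20) for j in range(1, 20)]

# Largest value of ||f(b) - f(a)|| - M (b - a); the theorem predicts <= 0.
worst = max(chord(a, b) - M * (b - a) for a, b in pairs)
print(worst)
```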
As a consequence we obtain the following.
The mean value theorem, version 3.1. Let $E$ and $F$ be normed spaces, let $U$ be an open subset of $E$ and let $f : U \to F$ be differentiable. If $[a, b] \subset U$ and $\|f'(\zeta)\| \le M$ for all $\zeta \in [a, b]$, then

$$\|f(b) - f(a)\| \le M\|b - a\|.$$

Proof/Exercise. We let $\gamma(t) = (1 - t)a + tb = a + t(b - a)$ and $g = f \circ \gamma$. Then $g$ is a map from $[0, 1]$ to $F$ that satisfies the assumptions of the previous theorem with the constant $M\|b - a\|$, because $g'(t) = f'(\gamma(t))(b - a)$. Therefore we continue as in the proof of version 2.0. $\square$
The same arguments also prove the following version.

The mean value theorem, version 3.2. Let $U$ be an open subset of a normed space $E$, let $F$ be a normed space and let $f : \overline{U} \to F$ be continuous on $\overline{U}$ and differentiable on $U$. If $]a, b[ \subset U$ and $\|f'(\zeta)\| \le M$ for all $\zeta \in ]a, b[$, then
$$\|f(b) - f(a)\| \le M\|b - a\|.$$
As a corollary of version 3.1 we get

The mean value theorem, version 3.3. Let $U$ be an open and convex subset of a normed space $E$ and let $f : U \to F$ be differentiable. Suppose that there is a constant $M$ such that $\|f'(x)\| \le M$ for all $x \in U$. Then
$$\|f(x) - f(y)\| \le M\|x - y\| \quad \forall x, y \in U.$$
In words, a differentiable map with a bounded derivative is Lipschitz continuous.

As a corollary of this corollary, we get


The mean value theorem, version 3.4. Let $U$ be an open and convex subset of a normed space $E$ and let $f : U \to F$ be of class $C^1$. Then $f$ is Lipschitz continuous on the compact subsets of $U$.

Challenging exercise. Let $E$ and $F$ be normed spaces, let $U \subset E$ and let $f : U \subset E \to F$ be a map. We say that $f$ is locally Lipschitz continuous if every point $x \in U$ has a neighborhood $V_x$ on which $f$ is Lipschitz continuous. Consider the following two conditions.

(i) f is locally Lipschitz continuous.

(ii) f is Lipschitz continuous on compact subsets of U .

Questions.

(a) Show that (ii)⇒(i) if dim E < ∞.

(b) Give an example of a locally Lipschitz continuous function which is not globally Lipschitz
continuous.

(c) Show that (i) implies that f is continuous.

(d) Show that (ii) implies that f is continuous.

(e) Show that (i) ⇒ (ii).

(f) Does (ii)⇒(i) if dim E = ∞?

3.2 Connectedness and locally constant maps


One consequence of the classical mean value theorem is the following.
Proposition 3.1. Let $I$ be an open interval of $\mathbb{R}$ and let $f : I \to \mathbb{R}$ be differentiable. If $f'(x) = 0$ for all $x \in I$, then $f$ is constant.
Proof. Let $a, b \in I$. Then, by the mean value theorem, there exists $c$ between $a$ and $b$ such that $f(b) - f(a) = f'(c)(b - a) = 0$. Therefore $f(b) = f(a)$ for all $a$ and $b$ in $I$. This means that $f$ is constant on $I$. $\square$
Consider however the following example.
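Example. Let $f(x) = \operatorname{Arctan} x + \operatorname{Arctan} \frac{1}{x}$ for $x \in \mathbb{R} \setminus \{0\}$. Then

$$f'(x) = \frac{1}{1 + x^2} + \frac{-1/x^2}{1 + \frac{1}{x^2}} = 0.$$
But $f(1) = \operatorname{Arctan} 1 + \operatorname{Arctan} 1 = \frac{\pi}{2}$ and $f(-1) = \operatorname{Arctan}(-1) + \operatorname{Arctan}(-1) = -\frac{\pi}{2}$. What's wrong?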
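The two values can be checked numerically. The following Python sketch uses sample points chosen only for illustration. It evaluates $f$ at a few positive and a few negative points, so that one can see on which sets $f$ actually stays constant.

```python
import math

def f(x):
    # f(x) = Arctan x + Arctan(1/x), defined for x != 0.
    return math.atan(x) + math.atan(1.0 / x)

positive_values = [f(x) for x in (0.1, 1.0, 7.5, 1000.0)]
negative_values = [f(x) for x in (-0.1, -1.0, -7.5, -1000.0)]

print(positive_values)  # each value agrees with pi/2 up to rounding
print(negative_values)  # each value agrees with -pi/2 up to rounding
```

So $f$ is constant on $]0, +\infty[$ and on $]-\infty, 0[$ separately, but the two constants differ.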

Recall the following.


Definition. Let X be a topological space. We say that X is connected if the only closed and
open subsets of X are ∅ and X. Equivalently, we cannot write X = A ∪ B where A and B are
both open, nonempty and disjoint.
A path in X is a continuous map γ : [α, β] ⊂ R → X. γ(α) and γ(β) are called the endpoints
of γ. We say that X is path connected if every two points of X can be joined by a path in X,
that is, for every two points a, b ∈ X there exists a path γ : [α, β] → X such that γ(α) = a and
γ(β) = b.

In M2201, we proved that a path connected space is connected, but the converse is not true.
However, we have the following.
Proposition 3.2. Let U be an open subset of a normed space. Then the following conditions are
equivalent.
(i) U is connected.

(ii) U is path connected.

(iii) Any two points of U can be joined by a polygonal path in U .

Remark on the proof. It should be clear that (iii)⇒ (ii)⇒ (i). To prove that (i)⇒(iii), we
use a connectedness argument. We fix a point a ∈ U and we consider the set A of points in U
that can be joined to a by a polygonal line. Then we prove that A is not empty, open and closed.
Connectedness then implies that A = U and so all points of U can be joined to a by a polygonal
path. Since a was arbitrary, this means that any two points of U can be joined by a polygonal path.
Try to carry out this program or see the proof in the book of Cartan.

Exercise. What assumptions should we put on the topological space $X$ so that every open and connected subset of $X$ is path connected?

Remark. We have the following implications

Convex ⇒ Path connected ⇒ Connected.

The reverse implications are not true in general.

Definition. We say that a map $f : U \subset E \to F$ is locally constant if every point of $U$ has a neighborhood on which $f$ is constant.
Example. Let $f : \mathbb{R} \setminus \{0\} \to \mathbb{R}$ be defined by
$$f(x) = \begin{cases} 1 & \text{if } x > 0 \\ -1 & \text{if } x < 0. \end{cases}$$

Then f is locally constant but not constant.

Theorem 3.1. Let f : U ⊂ E → F be a map between normed spaces. If f is locally constant and
U is connected then f is constant.
Proof. It should be clear that a locally constant map is continuous. Let $a \in U$ and consider the set $A = \{x \in U;\ f(x) = f(a)\}$. Then $a \in A$ and so $A \ne \emptyset$. Continuity of $f$ implies that $A$ is closed in $U$ because we can write $A = f^{-1}(\{f(a)\})$. Local constancy implies that $A$ is open. Connectedness of $U$ implies that $A = U$ and so $f(x) = f(a)$ for all $x \in U$. $\square$

Exercise. What condition should we put on the topological space Y so that the above result holds
for any function f : X → Y between topological spaces?
Corollary 3.1. Let E and F be normed spaces, let U ⊂ E be open and let f : U ⊂ E → F . Then
f is locally constant if and only if f is constant on the connected components of U .
Proof. Recall that connected components are maximal (with respect to inclusion) connected
subsets of U and that every connected subset of U is contained in some component. If f is locally
constant, then, by the previous theorem, it is constant on every connected subset of U and in
particular it is constant on the connected components of U . Conversely suppose that f is constant
on every connected component of U and let x ∈ U and let C denote the connected component
containing x. Since U is open there is a ball B(x, r) ⊂ U . Since B(x, r) is convex, it is connected
and so is contained in C since C is the biggest connected subset of U containing x. Since f is
constant on C, it is also constant on B(x, r). This proves that f is locally constant.
Remark. The last argument shows that connected components of U are open.
Exercise. What conditions should we put on X and Y so that the previous corollary holds for a
map f : X → Y between topological spaces?

Theorem 3.2. Let $f : U \subset E \to F$ be differentiable ($U$ open). Then $f' = 0$ if and only if $f$ is locally constant.
Proof. Suppose first that $f$ is locally constant. Then it is not difficult to see that $f$ is differentiable and $f'(a) = 0$ for all $a \in U$. Conversely, suppose that $f' = 0$ on $U$ and let $x \in U$. Since $U$ is open, there exists a ball $B(x, r) \subset U$. Let $a, b \in B(x, r)$. Since $[a, b] \subset B(x, r)$ and $\|f'(\zeta)\| = 0$ for all $\zeta \in [a, b]$, the mean value theorem (version 3.1) implies that $\|f(b) - f(a)\| \le 0 \cdot \|b - a\| = 0$. It follows that $f(a) = f(b)$ and so $f$ is constant on $B(x, r)$. $\square$
Corollary 3.2. If in the above theorem $U$ is connected and $f' = 0$, then $f$ is constant.

3.3 Relation between differentiability and partial differentiability


Let $f : U \subset \prod E_i \to F$ be a map of several variables. We saw in the last chapter that if $f$ is Fréchet differentiable at a point $a \in U$, then all the partial derivatives of $f$ at $a$ exist but the converse is not true. However if we assume that the partial derivatives exist and are continuous on $U$, then $f$ is of class $C^1$ on $U$. This is the content of the next theorem.

Theorem 3.3. Let $f : U \subset \prod_{i=1}^{n} E_i \to F$ be a map of several variables. Then $f$ is of class $C^1$ on $U$ if and only if $f$ has continuous partial derivatives on $U$.
Proof. Suppose first that $f$ is of class $C^1$ on $U$. We know from Proposition 2.18 that $f$ has partial derivatives and we can write for every $a \in U$,
$$f'(a) = \left( \frac{\partial f}{\partial x_1}(a), \dots, \frac{\partial f}{\partial x_n}(a) \right), \tag{3.1}$$

which really means that

$$f'(a)h = \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)h_i.$$

Otherwise stated, the partial derivatives are the components of the Fréchet derivative. Therefore, continuity of $f'$ is equivalent to the continuity of each component.
Suppose conversely that for every $i$, the map $\frac{\partial f}{\partial x_i} : U \to \mathcal{L}(E_i, F)$ is continuous. We will prove that $f$ is differentiable at every point $a \in U$. Continuity of $f'$ will follow from equation (3.1). We will show that
$$f(x) - f(a) - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i) = o(x - a) \quad \text{as } x \to a,$$

where $x = (x_1, \dots, x_n)$ and $a = (a_1, \dots, a_n)$, or equivalently, $\forall \varepsilon > 0$, $\exists \eta > 0$ such that

$$\left\| f(x) - f(a) - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i) \right\| \le \varepsilon\|x - a\| \quad \text{whenever } \|x - a\| \le \eta.$$

Here we equip $\prod_{i=1}^{n} E_i$ with the norm $\|x\| = \max_{1 \le i \le n} \|x_i\|$ but any other equivalent norm would do as well. Let $\varepsilon > 0$ be given. For every $i = 1, \dots, n$, continuity of $\frac{\partial f}{\partial x_i}$ at $a$ dictates that there exists $\eta_i > 0$ such that $B(a, \eta_i) \subset U$ and

$$\left\| \frac{\partial f}{\partial x_i}(x) - \frac{\partial f}{\partial x_i}(a) \right\| \le \frac{\varepsilon}{n} \quad \text{whenever } \|x - a\| \le \eta_i. \tag{3.2}$$

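Our $\eta$ will be $\eta = \min_{i=1,\dots,n} \eta_i$. Note that $B(a, \eta) \subset B(a, \eta_i) \subset U$. Let $x \in B(a, \eta)$. We can write
$$\begin{aligned}
f(x) - f(a) - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i)
&= f(x_1, \dots, x_n) - f(a_1, x_2, \dots, x_n) - \frac{\partial f}{\partial x_1}(a)(x_1 - a_1) \\
&\quad + f(a_1, x_2, \dots, x_n) - f(a_1, a_2, x_3, \dots, x_n) - \frac{\partial f}{\partial x_2}(a)(x_2 - a_2) \\
&\quad + \cdots \\
&\quad + f(a_1, \dots, a_{n-1}, x_n) - f(a_1, \dots, a_n) - \frac{\partial f}{\partial x_n}(a)(x_n - a_n).
\end{aligned} \tag{3.3}$$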
Set for $\zeta_1 \in B(a_1, \eta)$,
$$g_1(\zeta_1) = f(\zeta_1, x_2, \dots, x_n) - f(a_1, x_2, \dots, x_n) - \frac{\partial f}{\partial x_1}(a)(\zeta_1 - a_1).$$
Then, first, $g_1$ is well defined because $(\zeta_1, x_2, \dots, x_n)$ and $(a_1, x_2, \dots, x_n)$ belong to the ball $B(a, \eta)$ which is contained in $U$ where $f$ is defined. Second, $g_1(a_1) = 0$. Third, the first term in equation (3.3) above is $g_1(x_1) = g_1(x_1) - g_1(a_1)$. And fourth,
$$g_1'(\zeta_1) = \frac{\partial f}{\partial x_1}(\zeta_1, x_2, \dots, x_n) - \frac{\partial f}{\partial x_1}(a_1, a_2, \dots, a_n).$$
Since $(\zeta_1, x_2, \dots, x_n) \in B(a, \eta) \subset B(a, \eta_1)$, it follows from the continuity condition (3.2) that

$$\|g_1'(\zeta_1)\| = \left\| \frac{\partial f}{\partial x_1}(\zeta_1, x_2, \dots, x_n) - \frac{\partial f}{\partial x_1}(a_1, a_2, \dots, a_n) \right\| \le \frac{\varepsilon}{n}.$$
The mean value theorem version 3.3, applied to the ball $B(a_1, \eta)$, implies that
$$\|g_1(x_1)\| = \|g_1(x_1) - g_1(a_1)\| \le \frac{\varepsilon}{n}\|x_1 - a_1\|.$$
Similarly, we set for $\zeta_2 \in B(a_2, \eta)$,
$$g_2(\zeta_2) = f(a_1, \zeta_2, x_3, \dots, x_n) - f(a_1, a_2, x_3, \dots, x_n) - \frac{\partial f}{\partial x_2}(a)(\zeta_2 - a_2).$$
Then, first, $g_2$ is well defined. Second, $g_2(a_2) = 0$. Third, the second term in equation (3.3) is $g_2(x_2) = g_2(x_2) - g_2(a_2)$. And fourth,
$$g_2'(\zeta_2) = \frac{\partial f}{\partial x_2}(a_1, \zeta_2, x_3, \dots, x_n) - \frac{\partial f}{\partial x_2}(a_1, a_2, \dots, a_n).$$
Since $(a_1, \zeta_2, x_3, \dots, x_n) \in B(a, \eta) \subset B(a, \eta_2)$, it follows from the continuity condition (3.2) that

$$\|g_2'(\zeta_2)\| = \left\| \frac{\partial f}{\partial x_2}(a_1, \zeta_2, x_3, \dots, x_n) - \frac{\partial f}{\partial x_2}(a_1, a_2, \dots, a_n) \right\| \le \frac{\varepsilon}{n}.$$
The mean value theorem implies that
$$\|g_2(x_2)\| = \|g_2(x_2) - g_2(a_2)\| \le \frac{\varepsilon}{n}\|x_2 - a_2\|.$$
We continue in this way, and we set ultimately for $\zeta_n \in B(a_n, \eta)$,

$$g_n(\zeta_n) = f(a_1, a_2, \dots, a_{n-1}, \zeta_n) - f(a_1, a_2, \dots, a_n) - \frac{\partial f}{\partial x_n}(a)(\zeta_n - a_n).$$
Then, first, $g_n$ is well defined. Second, $g_n(a_n) = 0$. Third, the last term in equation (3.3) is $g_n(x_n) = g_n(x_n) - g_n(a_n)$. And fourth,
$$g_n'(\zeta_n) = \frac{\partial f}{\partial x_n}(a_1, a_2, \dots, a_{n-1}, \zeta_n) - \frac{\partial f}{\partial x_n}(a_1, a_2, \dots, a_n).$$
Since $(a_1, a_2, \dots, a_{n-1}, \zeta_n) \in B(a, \eta) \subset B(a, \eta_n)$, it follows from the continuity condition (3.2) that
$$\|g_n'(\zeta_n)\| = \left\| \frac{\partial f}{\partial x_n}(a_1, a_2, \dots, a_{n-1}, \zeta_n) - \frac{\partial f}{\partial x_n}(a_1, a_2, \dots, a_n) \right\| \le \frac{\varepsilon}{n}.$$
The mean value theorem implies that
$$\|g_n(x_n)\| = \|g_n(x_n) - g_n(a_n)\| \le \frac{\varepsilon}{n}\|x_n - a_n\|.$$
Thus, taking norms in equation (3.3), and using the triangle inequality, we get
$$\left\| f(x) - f(a) - \sum_{i=1}^{n} \frac{\partial f}{\partial x_i}(a)(x_i - a_i) \right\| \le \frac{\varepsilon}{n}\|x_1 - a_1\| + \cdots + \frac{\varepsilon}{n}\|x_n - a_n\| \le \varepsilon\|x - a\|. \qquad \square$$
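The conclusion of the proof can be observed numerically on a concrete function with continuous partial derivatives. The sketch below rests on illustrative assumptions: the function $f(x, y) = x^2 y + \sin y$, the base point $a = (1, 2)$ and the increments $h$ are choices made here. It computes the remainder divided by $\|h\|$ in the max norm used in the proof and shows it shrinking as $h \to 0$.

```python
import math

def f(x, y):
    # Illustrative function on R^2 with continuous partial derivatives.
    return x * x * y + math.sin(y)

def partials(x, y):
    # (df/dx, df/dy) computed by hand.
    return (2 * x * y, x * x + math.cos(y))

a = (1.0, 2.0)
fx, fy = partials(*a)

def ratio(h1, h2):
    # ||f(a + h) - f(a) - sum_i (df/dx_i)(a) h_i|| / ||h||, with the max norm.
    remainder = f(a[0] + h1, a[1] + h2) - f(*a) - (fx * h1 + fy * h2)
    return abs(remainder) / max(abs(h1), abs(h2))

# Increments h = (t, -2t) with t = 10^-1, ..., 10^-5.
ratios = [ratio(10.0 ** -k, -2 * 10.0 ** -k) for k in range(1, 6)]
print(ratios)  # decreasing towards 0, roughly proportional to t
```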

3.4 Convergence of a sequence of differentiable maps


Questions
• Suppose that we have a sequence of differentiable maps $f_n : U \subset E \to F$. If $(f_n)$ converges pointwise (or uniformly) to a limit map $f$, does it follow that $f$ is differentiable?

The answer is no. Take for example the sequence $f_n(x) = \sqrt{x^2 + \frac{1}{n}}$ for $x \in \mathbb{R}$. Then $(f_n)$ converges pointwise to $f(x) = |x|$ which is not differentiable at 0. Actually, convergence is uniform (prove that).

• Suppose now that the limit $f$ is differentiable. Does it follow that the sequence of derivatives $f_n'$ converges to $f'$?

The answer is again no. Take for example the sequence $f_n(x) = \frac{\sin nx}{n}$. Then $(f_n)$ converges uniformly to the zero function. However, $f_n'(x) = \cos nx$ does not converge to the zero function; for instance, $f_n'(\pi) = (-1)^n$ has no limit.

• What about the converse? What can we say if $f_n'$ converges uniformly? The answer is given by the next two theorems.
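Both examples above are easy to probe numerically. The following Python sketch is a rough illustration; the grid on $[-5, 5]$ and the chosen values of $n$ are assumptions made here. It estimates $\sup_x |f_n(x) - |x||$ for $f_n(x) = \sqrt{x^2 + 1/n}$ and lists the derivatives of $\sin(nx)/n$ at $x = \pi$.

```python
import math

def sup_gap(n, xs):
    # Largest value of |sqrt(x^2 + 1/n) - |x|| over the sample points xs.
    return max(abs(math.sqrt(x * x + 1.0 / n) - abs(x)) for x in xs)

xs = [k / 100 for k in range(-500, 501)]  # grid on [-5, 5] that contains 0
gaps = {n: sup_gap(n, xs) for n in (1, 10, 100, 10000)}
print(gaps)  # the gap is largest at x = 0, where it equals 1/sqrt(n)

# Derivatives of sin(n x)/n at x = pi are cos(n pi) = (-1)^n.
derivs_at_pi = [math.cos(n * math.pi) for n in range(1, 7)]
print(derivs_at_pi)  # alternates between -1 and 1, so there is no limit
```

The first computation is consistent with uniform convergence at rate $1/\sqrt{n}$; it is a check on a grid, not a proof.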

Theorem 3.4. Let $f_n : U \subset E \to F$ be a sequence of differentiable maps. Assume that

(i) $U$ is convex.

(ii) $F$ is complete.

(iii) For some point $a \in U$, the sequence $f_n(a)$ converges to a limit (this happens if $f_n$ converges pointwise).

(iv) $f_n'$ converges uniformly on $U$ to a map $g : U \to \mathcal{L}(E, F)$.

Then,

(a) $f_n$ converges pointwise to a map $f : U \to F$,

(b) convergence is uniform on the bounded subsets of $U$,

(c) $f$ is differentiable and $f' = g$.

Remark. If instead of (iii) we assume that (fn ) is pointwise convergent on U , then, assumption
(ii) about completeness of F is not needed.

Theorem 3.5. In the above theorem, let us replace conditions (i) and (iv) by the following weaker assumptions.
(i') $U$ is connected.
(iv') For every $x_0 \in U$, there exists a ball $B(x_0, r)$ on which $f_n'$ converges uniformly (we say that $f_n'$ is locally uniformly convergent). We denote the limit of $(f_n')$ by $g$.
Then,

(a) $f_n$ converges pointwise to a map $f : U \to F$,

(b) convergence is locally uniform,

(c) $f$ is differentiable and $f' = g$.

Proof of Theorem 3.4. It follows from the mean value theorem version 3.3 (where convexity is assumed), applied to $f_n - f_m$, that for every $x \in U$,

$$\|f_n(x) - f_n(a) - (f_m(x) - f_m(a))\| \le \|x - a\| \sup_{\zeta \in U} \|f_n'(\zeta) - f_m'(\zeta)\|. \tag{3.4}$$

This implies that 1) $f_n(x)$ is a Cauchy sequence in $F$ and 2) this happens in a uniform way on bounded subsets of $U$. Indeed, let $\varepsilon > 0$ be given. Assumption (iv) means that there exists $n_0 \in \mathbb{N}$ such that
$$\sup_{\zeta \in U} \|f_n'(\zeta) - g(\zeta)\| \le \frac{1}{2}\varepsilon \quad \forall n \ge n_0.$$
The triangle inequality implies that

$$\sup_{\zeta \in U} \|f_n'(\zeta) - f_m'(\zeta)\| \le \varepsilon \quad \forall n, m \ge n_0$$

($f_n'$ is a uniform Cauchy sequence). It follows from inequality (3.4) that for every $x \in U$ and every $n, m \ge n_0$,
$$\|f_n(x) - f_n(a) - (f_m(x) - f_m(a))\| \le \varepsilon\|x - a\|.$$
Assumption (iii) implies that $\|f_n(a) - f_m(a)\| \le \varepsilon$ for $n, m$ large enough, say for $n, m \ge n_1$. Let $n, m \ge n_2 := \max(n_0, n_1)$. By the triangle inequality, we get

$$\begin{aligned}
\|f_n(x) - f_m(x)\| &\le \|f_n(a) - f_m(a)\| + \|f_n(x) - f_n(a) - (f_m(x) - f_m(a))\| \\
&\le \varepsilon + \varepsilon\|x - a\|.
\end{aligned} \tag{3.5}$$

Since $\varepsilon$ was arbitrary, this means that for every $x \in U$, $f_n(x)$ is a Cauchy sequence in $F$. Completeness of $F$ implies that $f_n(x)$ has a limit that we denote by $f(x)$. This defines a map $f : U \to F$ and proves conclusion (a).
To prove (b), fix $\varepsilon$, $x$ and $n \ge n_2$ in inequality (3.5) and let $m \to \infty$. We get

$$\|f_n(x) - f(x)\| \le \varepsilon(1 + \|x - a\|).$$

If $x$ varies in a bounded subset $B \subset U$, then $\|x - a\| \le M$ for some constant $M$. The previous inequality implies that
$$\|f_n(x) - f(x)\| \le \varepsilon(1 + M),$$
and this holds for every $x \in B$ and every $n \ge n_2$. Since $\varepsilon$ was arbitrary, this means that $(f_n)$ converges uniformly to $f$ in $B$.
Now we prove conclusion (c) by showing that for every $x_0 \in U$,

$$f(x_0 + h) - f(x_0) - g(x_0)h = o(h) \quad \text{near } h = 0.$$

By the triangle inequality

$$\begin{aligned}
\|f(x_0 + h) - f(x_0) - g(x_0)h\| \le{} & \|f(x_0 + h) - f(x_0) - (f_n(x_0 + h) - f_n(x_0))\| \\
& + \|f_n(x_0 + h) - f_n(x_0) - f_n'(x_0)h\| \\
& + \|f_n'(x_0)h - g(x_0)h\|.
\end{aligned}$$

By the mean value theorem version 3.3, we have

$$\|f_m(x_0 + h) - f_m(x_0) - (f_n(x_0 + h) - f_n(x_0))\| \le \|h\| \sup_{\zeta \in U} \|f_m'(\zeta) - f_n'(\zeta)\| \le \varepsilon\|h\|$$

for all $n, m \ge n_0$. Fixing everything and letting $m \to \infty$ we get for $n \ge n_0$

$$\|f(x_0 + h) - f(x_0) - (f_n(x_0 + h) - f_n(x_0))\| \le \varepsilon\|h\|.$$

For $n \ge n_0$, we have

$$\|f_n'(x_0)h - g(x_0)h\| \le \|f_n'(x_0) - g(x_0)\|\,\|h\| \le \varepsilon\|h\|.$$

Fix now an integer $n \ge n_0$. Differentiability of $f_n$ at $x_0$ implies that there exists $\delta > 0$ such that

$$\|f_n(x_0 + h) - f_n(x_0) - f_n'(x_0)h\| \le \varepsilon\|h\| \quad \text{for } \|h\| \le \delta.$$

Putting everything together, we get

$$\|f(x_0 + h) - f(x_0) - g(x_0)h\| \le 3\varepsilon\|h\| \quad \text{for } \|h\| \le \delta. \qquad \square$$

Proof of Theorem 3.5. Let $A$ denote the set of points $x \in U$ such that $f_n(x)$ converges. The plan for proving the first conclusion is to prove the following points.

1. $A \ne \emptyset$.

2. $A$ is open in $U$.

3. $A$ is closed in $U$.

Connectedness of $U$ dictates that $A = U$ and so $(f_n)$ is pointwise convergent. This is what we call a connectedness argument.
1. Assumption (iii) means that $a \in A$ and so $A \ne \emptyset$.
2. Let $x_0 \in A$. Since $U$ is open, there exists a ball $B(x_0, r) \subset U$ and, by assumption (iv'), we can choose $r$ so that $(f_n')$ converges uniformly on $B(x_0, r)$. We apply the previous theorem to the open convex set $B(x_0, r)$ with $a$ replaced by $x_0$. Accordingly, $(f_n)$ converges uniformly on $B(x_0, r)$ and in particular pointwise on $B(x_0, r)$. This means that $B(x_0, r) \subset A$ and so $A$ is open.
3. Let $x \in \overline{A}^{U} = \overline{A} \cap U$ (the closure of $A$ in $U$). Since $x \in U$ and $U$ is open, then again, there exists a ball $B(x, r) \subset U$ on which $(f_n')$ converges uniformly. From the characterization of the closure of a set, $B(x, r) \cap A \ne \emptyset$. Choose $y \in B(x, r) \cap A$. Then $f_n(y)$ is convergent. We apply the previous theorem to the open convex set $B(x, r)$ with $a$ replaced by $y$. Accordingly, $(f_n)$ converges uniformly on $B(x, r)$ and in particular pointwise on $B(x, r)$. In particular $f_n(x)$ is convergent and so $x \in A$. This proves that $A$ is closed.
(b) Convergence of $(f_n)$ on $U$ defines a map $f : U \to F$ to which $(f_n)$ converges pointwise. But the proof of point 2 above shows that convergence is uniform on any ball $B(x_0, r) \subset U$ on which $(f_n')$ converges uniformly.
(c) Let $x_0 \in U$ and let $B(x_0, r) \subset U$ be a ball on which $(f_n')$ converges uniformly. We apply the previous theorem to the open convex set $B(x_0, r)$. Accordingly, $f$ is differentiable on $B(x_0, r)$ and $f' = g$ where $g$ is the limit of $(f_n')$. $\square$
Remark. If we assume that (fn ) is pointwise convergent on U , then the connectedness assumption
and the completeness assumption are not needed.

Exercise 1. Give an example of a sequence of functions that converges locally uniformly but not
globally uniformly.

Exercise 2. Let $U$ be a subset of a finite dimensional normed space $E$ and let $f_n : U \to F$ be a sequence of maps. Show that the following conditions are equivalent.
(i) (fn ) converges locally uniformly on U .
(ii) (fn ) converges uniformly on the compact subsets of U .
(iii) (fn ) converges uniformly on the bounded subsets of U .
What happens if E is infinite dimensional?

Corollary 3.3. Let $u_n : U \subset E \to F$ be a sequence of differentiable maps. Assume that

(i) $U$ is connected.
(ii) $F$ is complete.
(iii) For some point $a \in U$, the series $\sum_{n \ge 1} u_n(a)$ is convergent (this happens if $\sum_{n \ge 1} u_n$ converges pointwise).
(iv) The series of derivatives $\sum_{n \ge 1} u_n'$ is locally uniformly convergent.

Then,
(a) the series $\sum_{n \ge 1} u_n$ is pointwise convergent on $U$,
(b) convergence is locally uniform on $U$,
(c) $\sum_{n \ge 1} u_n$ is differentiable and
$$\left( \sum_{n \ge 1} u_n \right)' = \sum_{n \ge 1} u_n' \quad \text{(differentiation term by term).}$$

Remark 1. Without these assumptions, the above conclusion may fail. Here's an example. In the theory of Fourier series it is proved that

$$\sum_{n=1}^{\infty} \frac{\sin nx}{n} = \frac{\pi - x}{2} \quad \text{for } 0 < x < 2\pi,$$

convergence being pointwise in $]0, 2\pi[$ and uniform on every compact subset of $]0, 2\pi[$. If we differentiate term by term we get

$$\sum_{n=1}^{\infty} \cos nx = -\frac{1}{2} \quad \text{for } 0 < x < 2\pi,$$

which does not make sense unless we redefine the notion of convergence.
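A quick numerical experiment makes the failure visible. The Python sketch below uses an evaluation point $x = 1$ and truncation levels picked only for illustration. It compares partial sums of $\sum \sin(nx)/n$ with $(\pi - x)/2$, and then looks at how much the partial sums of $\sum \cos nx$ keep moving.

```python
import math

def sine_partial(x, N):
    # Partial sum of sin(n x)/n for n = 1..N.
    return sum(math.sin(n * x) / n for n in range(1, N + 1))

def cosine_partial(x, N):
    # Partial sum of cos(n x) for n = 1..N (the term-by-term derivative).
    return sum(math.cos(n * x) for n in range(1, N + 1))

x = 1.0

# The sine series approaches (pi - x)/2, although slowly.
sine_error = abs(sine_partial(x, 20000) - (math.pi - x) / 2)

# The cosine partial sums keep oscillating instead of settling down.
cos_values = [cosine_partial(x, N) for N in range(1000, 1100)]
cos_spread = max(cos_values) - min(cos_values)

print(sine_error)  # small
print(cos_spread)  # of order 1, so no limit is in sight
```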
Remark 2. Convergent power series of a complex variable can be differentiated term by term
inside their disc of convergence.
Example/Exercise. Set for $x > 1$

$$f(x) = \sum_{n=1}^{\infty} \frac{1}{n^x}.$$

We already know that this series is convergent. Show that $f$ is differentiable on $]1, +\infty[$.

Remark. $f$ is the restriction of the celebrated Riemann zeta function to $]1, +\infty[$. The Riemann zeta function is a holomorphic function on $\mathbb{C} \setminus \{1\}$ and it plays an important role in complex variables theory and in number theory. The celebrated Riemann hypothesis states that all the nontrivial zeros of the zeta function lie on the line $\operatorname{Re} z = \frac{1}{2}$.

3.5 Strictly differentiable maps


Questions
Let us revisit one more time the notion of derivative of a function $f : I \subset \mathbb{R} \to \mathbb{R}$. $f$ is differentiable at a point $a \in I$ if the following limit exists

$$\lim_{x \to a} \frac{f(x) - f(a)}{x - a}. \tag{3.6}$$
What about the limit
$$\lim_{(x, y) \to (a, a)} \frac{f(x) - f(y)}{x - y}\ ? \tag{3.7}$$
If the limit in (3.7) exists, then the limit in (3.6) exists as well because we can take $y = a$ in (3.7). What about the converse? The converse is not true as shown by the following example.
Example. Let
$$f(x) = \begin{cases} x^2 \sin \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$
Then $f$ is differentiable at 0 and $f'(0) = 0$ because

$$\frac{f(x) - f(0)}{x - 0} = x \sin \frac{1}{x} \to 0 \quad \text{as } x \to 0.$$
However, the limit in (3.7) does not exist (at the point 0). Indeed, if not then for any two sequences $(x_n)$ and $(y_n)$ converging to 0 with $x_n \ne y_n$ the limit

$$\lim_{n \to \infty} \frac{f(x_n) - f(y_n)}{x_n - y_n} \quad \text{should exist.}$$
However, take $x_n = \dfrac{1}{\frac{\pi}{2} + n\pi}$ and $y_n = \dfrac{1}{\frac{\pi}{2} + (n + 1)\pi}$.
Using the fact that $\sin \frac{1}{x_n} = \sin(\frac{\pi}{2} + n\pi) = (-1)^n$, we get after some algebraic simplifications

$$\frac{f(x_n) - f(y_n)}{x_n - y_n} = \frac{(-1)^n}{\pi} \left[ \frac{n + \frac{3}{2}}{n + \frac{1}{2}} + \frac{n + \frac{1}{2}}{n + \frac{3}{2}} \right],$$
and this quantity has no limit as $n \to \infty$. If you want to check that, it is useful to set $s_n = \frac{\pi}{2} + n\pi$.
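The algebraic simplification can be cross-checked numerically. The following Python sketch, in which the function names and the range of $n$ are illustrative choices, computes the difference quotient directly from $f(x) = x^2 \sin\frac{1}{x}$ and compares it with the closed form $\frac{(-1)^n}{\pi}\left[\frac{n + 3/2}{n + 1/2} + \frac{n + 1/2}{n + 3/2}\right]$.

```python
import math

def f(x):
    # f(x) = x^2 sin(1/x) for x != 0 and f(0) = 0.
    return x * x * math.sin(1.0 / x) if x != 0 else 0.0

def quotient(n):
    # Difference quotient (f(x_n) - f(y_n)) / (x_n - y_n) for the two sequences.
    x_n = 1.0 / (math.pi / 2 + n * math.pi)
    y_n = 1.0 / (math.pi / 2 + (n + 1) * math.pi)
    return (f(x_n) - f(y_n)) / (x_n - y_n)

def closed_form(n):
    # Closed form of the same quotient after simplification.
    bracket = (n + 1.5) / (n + 0.5) + (n + 0.5) / (n + 1.5)
    return (-1) ** n / math.pi * bracket

max_gap = max(abs(quotient(n) - closed_form(n)) for n in range(1, 51))
print(max_gap)                     # tiny: the two expressions agree
print(quotient(10), quotient(11))  # near +2/pi and -2/pi respectively
```

The quotients alternate between values close to $2/\pi$ and $-2/\pi$, which is why no limit exists.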

This example leads us to a stronger condition of differentiability that we call strict differentiability
or strong differentiability.
Definition. Let $f : I \subset \mathbb{R} \to \mathbb{R}$ be a map defined on an open subset $I$ of $\mathbb{R}$. We say that $f$ is strictly (or strongly) differentiable at a point $a \in I$ if the following limit exists

$$\lim_{(x, y) \to (a, a)} \frac{f(x) - f(y)}{x - y}.$$

This condition is equivalent to the following one

$$\exists \ell \in \mathbb{R}, \quad f(x) - f(y) - \ell(x - y) = o(x - y) \quad \text{as } x, y \to a.$$

This means that

$$\exists \ell \in \mathbb{R},\ \forall \varepsilon > 0,\ \exists \delta > 0 \ \text{s.t.}\ |x - a| + |y - a| \le \delta \Rightarrow |f(x) - f(y) - \ell(x - y)| \le \varepsilon|x - y|.$$

And now we can extend this definition to maps between normed spaces.
Definition. Let $E$ and $F$ be normed spaces, let $U \subset E$ be open, $a \in U$, and let $f : U \to F$ be a map. We say that $f$ is strictly differentiable in the sense of Fréchet (or strictly Fréchet differentiable) at the point $a$ if there exists $L \in \mathcal{L}(E, F)$ such that

$$f(x) - f(y) - L(x - y) = o(x - y) \quad \text{for } x, y \text{ near } a,$$

or equivalently if there exists $L \in \mathcal{L}(E, F)$ such that

$$\forall \varepsilon > 0,\ \exists \delta > 0 \ \text{s.t.}\ \|x - a\| + \|y - a\| \le \delta \Rightarrow \|f(x) - f(y) - L(x - y)\| \le \varepsilon\|x - y\|.$$

This condition can also be written as

$$\forall \varepsilon > 0,\ \exists \delta > 0 \ \text{s.t.}\ x, y \in B'(a, \delta) \Rightarrow \|f(x) - f(y) - L(x - y)\| \le \varepsilon\|x - y\|.$$

Remark. It should be clear that a strictly Fréchet differentiable map is Fréchet differentiable
(take y = a in the above condition) and the Fréchet derivative is the operator L in the definition.
However, as the previous example showed, the converse is not true. But as the next theorem shows,
the notion of strict differentiability is not that mysterious.

Theorem 3.6. Let $f : U \subset E \to F$ be Fréchet differentiable and let $a \in U$. Then $f$ is strictly Fréchet differentiable at $a$ if and only if $f'$ is continuous at $a$.
Before we prove this theorem, let us establish a useful result which is the converse of the mean value theorem version 3.3.

Lemma 3.1. Let $f : U \subset E \to F$ be Fréchet differentiable. If $f$ is $M$-Lipschitz continuous, i.e.,

$$\|f(x) - f(y)\| \le M\|x - y\| \quad \forall x, y \in U,$$

then $\|f'(x)\| \le M$ for all $x \in U$.


Proof of the lemma. Let $x \in U$ and $h \in E$. We know from the appendix of chapter 2 that
$$f'(x)h = \lim_{t \to 0} \frac{f(x + th) - f(x)}{t}.$$
In particular (since the norm is continuous),

$$\|f'(x)h\| = \lim_{t \to 0} \left\| \frac{f(x + th) - f(x)}{t} \right\|.$$

Since, as usual, $U$ is open, $x + th \in U$ for all $t$ small enough. It follows from the Lipschitz condition that
$$\|f(x + th) - f(x)\| \le M\|th\| = M|t|\,\|h\|.$$
It follows that (for all $t \ne 0$ small enough)

$$\left\| \frac{f(x + th) - f(x)}{t} \right\| \le M\|h\|.$$

Since taking limits preserves non-strict inequalities, we get

$$\|f'(x)h\| \le M\|h\|.$$

Since $h$ was arbitrary, this proves that $\|f'(x)\| \le M$. $\square$


Proof of Theorem 3.6. Set for $\zeta \in U$,

$$g(\zeta) = f(\zeta) - f'(a)\zeta.$$

Then

1. $g'(\zeta) = f'(\zeta) - f'(a)$.

2. $g(x) - g(y) = f(x) - f(y) - f'(a)(x - y)$.

Suppose first that $f$ is strictly Fréchet differentiable at $a$. Let $\varepsilon > 0$ be given. Strict Fréchet differentiability of $f$ at $a$ implies that

$$\|g(x) - g(y)\| \le \varepsilon\|x - y\|$$

for all $x, y$ in some ball $B(a, \delta)$. It follows from the previous lemma, applied to $g$ on $B(a, \delta)$, that $\|g'(\zeta)\| \le \varepsilon$ for all $\zeta \in B(a, \delta)$. Thus, $\|f'(\zeta) - f'(a)\| \le \varepsilon$ for all $\zeta \in B(a, \delta)$, and this is continuity of $f'$ at $a$.
Suppose now conversely that $f'$ is continuous at $a$ and let $\varepsilon > 0$ be given. Continuity of $f'$ at $a$ implies that $\|g'(\zeta)\| \le \varepsilon$ for all $\zeta$ in some ball $B(a, \delta)$ around $a$. The mean value theorem version 3.3 implies that

$$\|g(x) - g(y)\| \le \varepsilon\|x - y\| \quad \forall x, y \in B(a, \delta).$$

This means that

$$\|f(x) - f(y) - f'(a)(x - y)\| \le \varepsilon\|x - y\| \quad \forall x, y \in B(a, \delta).$$

This is precisely the strict Fréchet differentiability of $f$ at $a$. $\square$

Remark. In our starting example we have

$$f'(x) = \begin{cases} 2x \sin \frac{1}{x} - \cos \frac{1}{x} & \text{if } x \ne 0 \\ 0 & \text{if } x = 0. \end{cases}$$

It follows that $f'$ has no limit as $x \to 0$ and so it is not continuous at 0. This explains why $f$ fails to be strictly differentiable at 0.
Corollary 3.4. Let $f : U \subset E \to F$ be Fréchet differentiable. Then $f$ is of class $C^1$ on $U$ if and only if $f$ is strictly Fréchet differentiable on $U$ (which means that $f$ is strictly Fréchet differentiable at every point of $U$).

3.6 More general versions of the mean value theorem


The key to proving the mean value theorem in versions 3.1, 3.2 and 3.3 was the following version.
The mean value theorem, version 3.0. Let $F$ be a normed space and let $f : [a, b] \subset \mathbb{R} \to F$ be continuous on $[a, b]$ and differentiable on $]a, b[$. If there is a constant $M$ such that $\|f'(x)\| \le M$ for all $x \in ]a, b[$, then
$$\|f(b) - f(a)\| \le M(b - a).$$
It turns out that this form can still be extended. For this, observe the following. If $g(x) = Mx$ then $g'(x) = M$ and $M(b - a) = g(b) - g(a)$. So version 3.0 is a particular case of the following version.

The mean value theorem, version 3.4. Let $F$ be a normed space. Consider two maps $f : [a, b] \subset \mathbb{R} \to F$ and $g : [a, b] \subset \mathbb{R} \to \mathbb{R}$ that are continuous on $[a, b]$ and differentiable on $]a, b[$. If $\|f'(x)\| \le g'(x)$ for all $x \in ]a, b[$, then
$$\|f(b) - f(a)\| \le g(b) - g(a).$$
Again this version is a particular case of a still more general version where f and g are assumed to
be right differentiable instead of differentiable.
Definition. A map f : [a, b[→ F is said to be right (Fréchet) differentiable at a point x0 ∈ [a, b[ is
the following limit exists
f (x0 + h) − f (x0 )
lim .
h→0,h>0 h
If it exists this limit is denoted by fr0 (x0 ) (in French fd0 (x0 )). We have a similar definition for left
differentiability.

The mean value theorem, version 3.5. Let F be a normed space. Consider two maps f :
[a, b] ⊂ R → F and g : [a, b] ⊂ R → R that are continuous and right differentiable on [a, b[. If
||fr0 (x)|| ≤ gr0 (x) for all x ∈]a, b[. Then

||f (b) − f (a)|| ≤ g(b) − g(a).

If you are interested you can prove this version along the same lines of the proof of version 3.0.
(or you can read it in the book of Cartan).
We can push the generalization further and prove the following version. See the book of Cartan.

The mean value theorem, version 3.6. Let F be a normed space. Consider two continuous
maps f : [a, b] ⊂ R → F and g : [a, b] ⊂ R → R. Suppose that f and g are right differentiable on
[a, b[\D where D is countable subset of [a, b[. If ||fr0 (x)|| ≤ gr0 (x) for all x ∈]a, b[\D. Then

||f (b) − f (a)|| ≤ g(b) − g(a).

In this general form (due to Cartan), this theorem is useful in the theory of the Bochner integral.
But this is outside the scope of this course and outside our undergraduate curriculum.
Chapter 4

Higher order derivatives

4.1 Definitions, examples and basic properties


Let f : U ⊂ E → F be Fréchet differentiable. Then we have a map

f 0 : U → L(E, F ).

If this map is Fréchet differentiable at a point a ∈ U , we say that f is twice Fréchet differentiable at a.
In this case, (f 0 )0 (a) is a linear bounded map from E to L(E, F ), that is, (f 0 )0 (a) ∈ L(E, L(E, F )).
This element is called the second order derivative (or differential) of f at the point a and it is
denoted by
f 00 (a) or D2 f (a).
If h, k ∈ E, then f 00 (a)h ∈ L(E, F ) and so (f 00 (a)h)k ∈ F . In chapter 1, we agreed to identify
L(E, L(E, F )) with L2 (E, F ) and this consists of considering f 00 (a) as a bilinear map from E × E
into F . Therefore we will write f 00 (a)(h, k) instead of (f 00 (a)h)k. If f 00 (a) exists at every point
a ∈ U , we say that f is twice Fréchet differentiable on U , and in this case, we have a map

f 00 : U → L2 (E, F ).

We say that f is of class C 2 on U (or twice continuously differentiable) if f 00 is continuous.

Remark. Without assuming that $f$ is differentiable on $U$, we say more generally that $f$ is twice
Fréchet differentiable at $a \in U$ if $f$ is Fréchet differentiable on an open neighborhood $V$ of $a$ and the
map $f' : V \to L(E, F)$ is differentiable at $a$.

Example 1. Let $f : U \subset \mathbb{R} \to \mathbb{R}$ be twice differentiable at $a \in U$. Then

$$f''(a) \in L(\mathbb{R}, L(\mathbb{R}, \mathbb{R})) \cong L(\mathbb{R}, \mathbb{R}) \cong \mathbb{R}.$$

This amounts to identifying the bilinear map $f''(a)$ with the number $f''(a)(1, 1)$ (every bilinear map
$B : \mathbb{R} \times \mathbb{R} \to \mathbb{R}$ is necessarily of the form $B(x, y) = cxy$ where $c = B(1, 1)$).
More generally, if $f : U \subset \mathbb{R} \to F$ is twice differentiable at $a \in U$, then

$$f''(a) \in L(\mathbb{R}, L(\mathbb{R}, F)) \cong L(\mathbb{R}, F) \cong F.$$

This amounts to identifying the bilinear map $f''(a)$ with the vector $f''(a)(1, 1)$ (every bilinear map
$B : \mathbb{R} \times \mathbb{R} \to F$ is necessarily of the form $B(x, y) = xy\,v$ where $v = B(1, 1)$).

Example 2. Let $E$ be an inner product space and let $f(x) = \|x\|^2 = \langle x, x \rangle$. Let us show that $f$ is
$C^2$. We know that $f$ is Fréchet differentiable on $E$ and $f'(x)h = 2\langle x, h \rangle$. Therefore, we can write
$$\big(f'(x + k) - f'(x)\big)h = f'(x + k)h - f'(x)h = 2\langle x + k, h \rangle - 2\langle x, h \rangle = 2\langle k, h \rangle.$$

You should recognize that the bilinear (bounded) term is $2\langle k, h \rangle$. Let $L$ denote this map, i.e.,
$(Lk)h = 2\langle k, h \rangle$. Then we can write
$$\big(f'(x + k) - f'(x) - Lk\big)h = 0.$$

Since this is true for every $h \in E$, we get

$$f'(x + k) - f'(x) - Lk = 0 = o(k).$$
This means that $f'$ is differentiable at $x$ and $(f')'(x) = f''(x) = L$. Since $f''(x)$ does not depend
on $x$, $f''$ is constant and therefore continuous. Thus, $f$ is of class $C^2$. The spaces that $f''(x)$ acts
between are given by the following diagram.
$$\begin{array}{rcl} f''(x) : E & \longrightarrow & L(E, \mathbb{R}) \\ k & \longmapsto & 2\langle k, \cdot \rangle : E \longrightarrow \mathbb{R}, \quad h \longmapsto 2\langle k, h \rangle. \end{array}$$
Finally, observe that $(f''(x)k)h = (f''(x)h)k$, or equivalently $f''(x)(k, h) = f''(x)(h, k)$. As we shall
see, this property is not a coincidence.
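
As a sanity check of this example, here is a minimal sketch, assuming E = R^3 with the Euclidean inner product and arbitrarily chosen vectors a, h, k. It computes the second difference f(a+h+k) - f(a+h) - f(a+k) + f(a) (the same expression used later in the proof of the symmetry of f''(a)) and compares it with 2<h, k>. For this particular f the two agree exactly, not just up to a small error.

```python
def dot(u, v):
    """Euclidean inner product on R^n (vectors as lists of floats)."""
    return sum(ui * vi for ui, vi in zip(u, v))

def add(u, v):
    """Componentwise sum of two vectors."""
    return [ui + vi for ui, vi in zip(u, v)]

def f(x):
    """f(x) = ||x||^2."""
    return dot(x, x)

def second_difference(func, a, h, k):
    """A(h, k) = f(a+h+k) - f(a+h) - f(a+k) + f(a)."""
    return func(add(add(a, h), k)) - func(add(a, h)) - func(add(a, k)) + func(a)

# Illustrative data (assumptions, not from the notes).
a = [1.0, -2.0, 0.5]
h = [0.25, 0.5, -1.0]
k = [2.0, 1.0, 0.5]

lhs = second_difference(f, a, h, k)   # second difference of f at a
rhs = 2.0 * dot(h, k)                 # f''(a)(h, k) = 2<h, k>
```

Swapping h and k leaves both sides unchanged, which is the symmetry observed at the end of the example.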

Let us now establish a useful rule for second order differentiation. Let $f : U \subset E \to F$ be
differentiable. For $x \in U$ and $h \in E$, the term $f'(x)h \in F$ depends on two variables $x$ and $h$. This
defines a map $g : U \times E \to F$ given by
$$g(x, h) = f'(x)h,$$
and the partial map $g(x, \cdot) = f'(x)$ is linear and bounded. Now observe that differentiability of $f'$
at a point $a \in U$ is equivalent to the partial differentiability of $g$ with respect to $x$ at the point $a$
(write the details). In this case, when we differentiate this relation with respect to $x$ and put $x = a$,
we get
$$\frac{\partial g}{\partial x}(a, \cdot) = f''(a),$$
which means that
$$\frac{\partial g}{\partial x}(a, \cdot)k = f''(a)k \quad \forall k \in E,$$
which in turn means that
$$\frac{\partial g}{\partial x}(a, h)k = (f''(a)k)h \quad \forall k \in E,\ \forall h \in E.$$
This equation can be written as
$$f''(a)(k, h) = (f''(a)k)h = \frac{\partial g}{\partial x}(a, h)k \quad \forall h, k \in E. \tag{4.1}$$
Example 1. Let $E$ be an inner product space and let $f(x) = \|x\|^2 = \langle x, x \rangle$. Let $g(x, h) = 2\langle x, h \rangle$.
Then $f'(x)h = g(x, h)$. Since $g$ is linear and bounded in the first variable, we know from chapter 2
that $g$ is differentiable with respect to $x$ and
$$\frac{\partial g}{\partial x}(x, h)k = g(k, h) = 2\langle k, h \rangle.$$
Therefore $f$ is twice differentiable and
$$f''(x)(k, h) = 2\langle k, h \rangle.$$

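Example 2. Let $f : U \subset \prod_{i=1}^n E_i \to F$ be a map of $n$ variables which is twice differentiable at a
point $a \in U$. We would like to find a relation between $f''(a)$ and the second order partial derivatives
of $f$ at $a$. We know from Chapter 2 that for $x \in U$ and $h = (h_1, \dots, h_n) \in \prod_{i=1}^n E_i$, we have
$$f'(x)(h_1, \dots, h_n) = \sum_{i=1}^n \frac{\partial f}{\partial x_i}(x)h_i. \tag{4.2}$$

Using the rule above and interchanging $\frac{\partial}{\partial x_j}$ and $\sum_{i=1}^n$, we can write
$$\left(\frac{\partial f'}{\partial x_j}(a)k_j\right)(h_1, \dots, h_n) = \sum_{i=1}^n \left[\frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)(a)k_j\right]h_i.$$

As you expect, we set

$$\frac{\partial^2 f}{\partial x_j \partial x_i}(a) := \frac{\partial}{\partial x_j}\left(\frac{\partial f}{\partial x_i}\right)(a) \quad\text{and}\quad \frac{\partial^2 f}{\partial x_i^2}(a) = \frac{\partial^2 f}{\partial x_i \partial x_i}(a).$$
Instead of $\frac{\partial^2 f}{\partial x_j \partial x_i}(a)$, we can write $D_j D_i f(a)$ or $\partial_j \partial_i f(a)$. Therefore we can write
$$\left(\frac{\partial f'}{\partial x_j}(a)k_j\right)(h_1, \dots, h_n) = \sum_{i=1}^n \left[\frac{\partial^2 f}{\partial x_j \partial x_i}(a)k_j\right]h_i = \sum_{i=1}^n \frac{\partial^2 f}{\partial x_j \partial x_i}(a)(k_j, h_i).$$

Applying equation (4.2) to $f'$, we get

$$f''(a)(k_1, \dots, k_n) = \sum_{j=1}^n \frac{\partial f'}{\partial x_j}(a)k_j,$$

and therefore
$$\big(f''(a)(k_1, \dots, k_n)\big)(h_1, \dots, h_n) = \left(\sum_{j=1}^n \frac{\partial f'}{\partial x_j}(a)k_j\right)(h_1, \dots, h_n) = \sum_{j=1}^n \sum_{i=1}^n \frac{\partial^2 f}{\partial x_j \partial x_i}(a)(k_j, h_i).$$

This relation can be written in the following matrix form:

$$f''(a)(k, h) = \left\langle \begin{pmatrix} \frac{\partial^2 f}{\partial x_1^2}(a) & \cdots & \frac{\partial^2 f}{\partial x_1 \partial x_n}(a) \\ \vdots & \ddots & \vdots \\ \frac{\partial^2 f}{\partial x_n \partial x_1}(a) & \cdots & \frac{\partial^2 f}{\partial x_n^2}(a) \end{pmatrix} \begin{pmatrix} k_1 \\ \vdots \\ k_n \end{pmatrix}, \begin{pmatrix} h_1 \\ \vdots \\ h_n \end{pmatrix} \right\rangle.$$

The matrix above is called the Hessian matrix of $f$ at $a$.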
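
To make the matrix form concrete, the following sketch approximates the Hessian of a function of two real variables by central second differences and evaluates the bilinear form f''(a)(k, h) from it. The test function f(x, y) = x^2 y + y^3, the point a = (1, 2) and the step size are assumptions chosen for illustration; finite differences are a numerical stand-in for the partial derivatives, not a method used in these notes.

```python
def f(x, y):
    """Hypothetical test function f(x, y) = x^2 y + y^3."""
    return x * x * y + y ** 3

def hessian(func, a, eps=1e-3):
    """Approximate the Hessian of func at the point a by central second differences."""
    n = len(a)
    H = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            def shifted(si, sj):
                # Evaluate func at a + si*eps*e_i + sj*eps*e_j.
                p = list(a)
                p[i] += si * eps
                p[j] += sj * eps
                return func(*p)
            H[i][j] = (shifted(1, 1) - shifted(1, -1)
                       - shifted(-1, 1) + shifted(-1, -1)) / (4.0 * eps * eps)
    return H

def bilinear(H, k, h):
    """f''(a)(k, h) as the sum over i, j of H[i][j] * k[j] * h[i]."""
    return sum(H[i][j] * k[j] * h[i] for i in range(len(h)) for j in range(len(k)))

H = hessian(f, [1.0, 2.0])   # the exact Hessian at (1, 2) is [[4, 2], [2, 12]]
```

The two off-diagonal entries agree up to rounding, in line with the symmetry of f''(a) proved below.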


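Remark. Note that if $i = j$, then the bilinear map $\frac{\partial^2 f}{\partial x_i^2}(a)$ is symmetric (see the next proposition).
However, if $i \neq j$, it makes no sense in general to say that $\frac{\partial^2 f}{\partial x_i \partial x_j}(a)$ is symmetric because $k_i \in E_i$
and $h_j \in E_j$ and these spaces may be different. Remember that

$$\frac{\partial^2 f}{\partial x_i \partial x_j}(a) : E_i \times E_j \to F \quad\text{and}\quad \frac{\partial^2 f}{\partial x_j \partial x_i}(a) : E_j \times E_i \to F.$$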

Proposition 4.1. Let f : U ⊂ E → F be twice Fréchet differentiable at a ∈ U . Then the bilinear


map f 00 (a) is symmetric, that is,

f 00 (a)(h, k) = f 00 (a)(k, h) ∀h, k ∈ E.

For the proof we need a useful lemma similar to Lemma 2.1.



Lemma 4.1. Let $S : E_1 \times E_2 \to F$ be a bilinear map. If $S(h, k) = o((\|h\| + \|k\|)^2)$ near $(0, 0)$,
then $S = 0$.
Proof of the lemma. Let $\varepsilon > 0$ be given. Our assumption implies that there exists $\delta > 0$ such
that
$$\|S(h, k)\| \le \varepsilon(\|h\| + \|k\|)^2 \quad\text{whenever } \|(h, k)\| \le \delta.$$
Here we take $\|(h, k)\| = \max(\|h\|, \|k\|)$ but any other equivalent norm on $E_1 \times E_2$ would do as
well.
Consider two arbitrary elements $h \in E_1$ and $k \in E_2$. Take a positive number $t$ such that
$\|(th, tk)\| \le \delta$. Then, according to the above condition,

$$\|S(th, tk)\| \le \varepsilon(\|th\| + \|tk\|)^2.$$

Using homogeneity of the norm and double homogeneity of $S$ we get

$$t^2\|S(h, k)\| \le \varepsilon t^2(\|h\| + \|k\|)^2.$$

Dividing by $t^2$, we get
$$\|S(h, k)\| \le \varepsilon(\|h\| + \|k\|)^2.$$
Since $(h, k)$ was arbitrary in $E_1 \times E_2$, this implies that $S$ is bounded and $\|S\| \le 4\varepsilon$. Since $\varepsilon$ was
arbitrary, we conclude that $\|S\| = 0$ and so $S = 0$. □
Proof of Proposition 4.1. Set

A(h, k) = f (a + h + k) − f (a + h) − f (a + k) + f (a).

and note that A(h, k) = A(k, h). We claim that for ||(h, k)|| small enough

A(h, k) − f 00 (a)(h, k) = o((||h|| + ||k||)2 ).

We prove the claim. By the triangle inequality we can write

||A(h, k) − f 00 (a)(h, k)|| ≤ ||A(h, k) − f 0 (a + h)k + f 0 (a)k||


+ ||f 0 (a + h)k − f 0 (a)k − f 00 (a)(h, k)|| (4.3)

Let ε > 0 be given. The second term of (4.3) is easy to estimate

||f 0 (a + h)k − f 0 (a)k − f 00 (a)(h, k)|| = ||f 0 (a + h)k − f 0 (a)k − (f 00 (a)h)k||


≤ ||f 0 (a + h) − f 0 (a) − f 00 (a)h|| ||k||.

Differentiability of f 0 at a dictates that there exists δ > 0 such that

||f 0 (a + h) − f 0 (a) − f 00 (a)h|| ≤ ε||h|| whenever ||h|| ≤ δ

and so
||f 0 (a + h)k − f 0 (a)k − f 00 (a)(h, k)|| ≤ ε||h|| ||k|| whenever ||h|| ≤ δ. (4.4)
To estimate the first term of equation (4.3), we set for k small enough

B(k) = f (a + h + k) − f (a + k) − f 0 (a + h)k + f 0 (a)k.

Then B(0) = f (a + h) − f (a) and we can write the first term as

A(h, k) − f 0 (a + h)k + f 0 (a)k = B(k) − B(0).


Applying the mean value theorem version 3.3 to a small enough ball of center 0 containing $k$, we
get
$$\|B(k) - B(0)\| \le \|k\| \sup_{0 \le t \le 1} \|B'(tk)\|.$$

But
$$B'(k) = f'(a + h + k) - f'(a + k) - f'(a + h) + f'(a),$$
and so
$$\begin{aligned} B'(tk) &= f'(a + h + tk) - f'(a + tk) - f'(a + h) + f'(a) \\ &= \big[f'(a + h + tk) - f'(a) - f''(a)(h + tk)\big] \\ &\quad - \big[f'(a + tk) - f'(a) - f''(a)(tk)\big] \\ &\quad - \big[f'(a + h) - f'(a) - f''(a)h\big]. \end{aligned}$$

Let $\|(h, k)\| \le \frac{\delta}{2}$. Then, by the differentiability of $f'$ at $a$, we get

$$\|f'(a + h + tk) - f'(a) - f''(a)(h + tk)\| \le \varepsilon\|h + tk\| \le \varepsilon(\|h\| + \|k\|)$$

and
$$\|f'(a + tk) - f'(a) - f''(a)(tk)\| \le \varepsilon\|tk\| \le \varepsilon\|k\|$$
and
$$\|f'(a + h) - f'(a) - f''(a)h\| \le \varepsilon\|h\|.$$
It follows from the triangle inequality that

||B 0 (tk)|| ≤ 2ε(||h|| + ||k||).

Thus, the first term in (4.3) is bounded by

||A(h, k) − f 0 (a + h)k + f 0 (a)k|| ≤ 2ε||k|| (||h|| + ||k||) ≤ 2ε(||h|| + ||k||)2 .

Recalling (4.4), we have

||f 0 (a + h)k − f 0 (a)k − f 00 (a)(h, k)|| ≤ ε||h|| ||k|| ≤ ε(||h|| + ||k||)2

Thus finally,
||A(h, k) − f 00 (a)(h, k)|| ≤ 3ε(||h|| + ||k||)2 .
This proves the claim. Interchanging the roles of h and k, we get

||A(k, h) − f 00 (a)(k, h)|| ≤ 3ε(||k|| + ||h||)2 .

By the triangle inequality

||f 00 (a)(h, k) − f 00 (a)(k, h)|| ≤ 6ε(||h|| + ||k||)2 .

It follows from the lemma that f 00 (a)(h, k) − f 00 (a)(k, h) = 0 for all h, k ∈ E. 


Exercise. Let $f : U \subset \prod_{i=1}^n E_i \to F$ be twice Fréchet differentiable at a point $a \in U$.
(a) Show that
$$\sum_{j=1}^n \sum_{i=1}^n \frac{\partial^2 f}{\partial x_j \partial x_i}(a)(k_j, h_i) = \sum_{i=1}^n \sum_{j=1}^n \frac{\partial^2 f}{\partial x_i \partial x_j}(a)(h_i, k_j).$$

(b) Show that for every $i, j = 1, \dots, n$, $u \in E_i$, $v \in E_j$, we have

$$\frac{\partial^2 f}{\partial x_j \partial x_i}(a)(v, u) = \frac{\partial^2 f}{\partial x_i \partial x_j}(a)(u, v).$$

(c) Deduce that if $f : U \subset \mathbb{R}^n \to F$ is twice differentiable at $a \in U$, then

$$\frac{\partial^2 f}{\partial x_j \partial x_i}(a) = \frac{\partial^2 f}{\partial x_i \partial x_j}(a) \in F.$$

This is a generalization of Schwarz's theorem on mixed partial derivatives, known also as Clairaut's
theorem. It is usually proved under the assumption that $f$ is of class $C^2$ or that the mixed partial
derivatives are continuous.
(d) Challenging question. If $f : U \subset E \times E \to F$ is twice differentiable at a point $a \in U$, does
it follow that the bilinear map $\frac{\partial^2 f}{\partial x \partial y}(a)$ is symmetric?

Having understood second order derivatives, let us move to derivatives of arbitrary order.

Let $f : U \subset E \to F$ be twice differentiable and let $a \in U$. If $f''$ is differentiable at $a$, its
derivative $(f'')'(a)$ is denoted by $f'''(a)$ or $f^{(3)}(a)$. In this case, we say that $f$ is three times
differentiable at $a$ or simply that $f^{(3)}(a)$ exists. We say that $f$ is three times differentiable on $U$ if
$f^{(3)}(x)$ exists for all $x \in U$. In this case

$$f^{(3)} : U \to L(E, L_2(E; F)) \cong L_3(E; F).$$

If this map is continuous we say that f is of class C 3 on U .


It should then be clear what we mean by $f^{(n)}(a)$ and what we mean by saying that $f$ is
of class $C^n$. But here is the precise recursive definition.

Definition. Let $n \ge 2$. We say that $f$ is $n$ times differentiable at $a$ if $f$ is $n - 1$ times differentiable
at every point of $U$ and if the map $f^{(n-1)} : U \to L_{n-1}(E; F)$ is differentiable at $a$. In this case,
the derivative of $f^{(n-1)}$ at $a$ is called the $n$th order derivative of $f$ at $a$ and it is denoted by
$f^{(n)}(a)$. It is an element of $L_n(E; F)$, i.e., $f^{(n)}(a)$ is a multilinear bounded map from $E^n$ to $F$; if
$h = (h_1, \dots, h_n) \in E^n$, the image of $h$ under $f^{(n)}(a)$ is denoted by

$$f^{(n)}(a)h, \qquad f^{(n)}(a)(h_1, \dots, h_n), \qquad f^{(n)}(a) \cdot (h_1, \dots, h_n) \ \text{(Cartan's notation)}.$$

If $f^{(n)}(x)$ exists at every point $x \in U$, we say that $f$ is $n$ times differentiable on $U$. If the map
$f^{(n)} : U \to L_n(E; F)$ is continuous, we say that $f$ is of class $C^n$. If $f$ is of class $C^n$ for every
$n \ge 1$, we say that $f$ is of class $C^\infty$. If $f$ is continuous we say that $f$ is of class $C^0$ and we set
$f^{(0)}(x) = f(x)$. It is easy to see that if $f$ is $n + m$ times differentiable at $a$, then

$$(f^{(n)})^{(m)}(a) = f^{(n+m)}(a).$$

Example 1. All the elementary functions of calculus are $C^\infty$ in the interior of their domains. This
includes polynomials, rational functions, radicals, the exponential function, all the logarithms, and the
trigonometric and inverse trigonometric functions.

Example 2/Exercise. Let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(t) = \begin{cases} 0 & \text{if } t \le 0, \\ t^2 & \text{if } t \ge 0. \end{cases}$$

Show that $f$ is of class $C^1$ but not of class $C^2$.

Example 3/Exercise. Let E be an inner product space and let f (x) = ||x||2 = hx, xi. Then f is
of class C ∞ .

Example 4/Exercise. Equip $E = C[0, 1]$ with the maximum norm and let $f : E \to \mathbb{R}$ be defined
by
$$f(u) = \int_0^1 u^3(t)\,dt.$$
Compute $f'(u)h$, $f''(u)(h, k)$ and $f'''(u)(h, k, w)$ and deduce that $f$ is of class $C^\infty$.

Example 5/Exercise. Let $f : \mathbb{R} \to \mathbb{R}$ be given by

$$f(t) = \begin{cases} 0 & \text{if } t \le 0, \\ e^{-1/t} & \text{if } t > 0. \end{cases}$$

(a) Show by induction that $f$ is $n$ times differentiable at 0 and that $f^{(n)}(0) = 0$.
(b) Deduce that $f$ is $C^\infty$.
(c) What is the Taylor series of $f$ at 0? What can you conclude?

Proposition 4.2. Let $f : U \subset E \to F$ be $n$ times differentiable at a point $a \in U$ ($n \ge 2$). Then
the multilinear map $f^{(n)}(a)$ is symmetric. This means that for any permutation $\sigma$ of $\{1, 2, \dots, n\}$
and any $(h_1, \dots, h_n) \in E^n$ we have

$$f^{(n)}(a)(h_1, \dots, h_n) = f^{(n)}(a)(h_{\sigma(1)}, \dots, h_{\sigma(n)}).$$

Proof/Exercise. Reason by induction and use Proposition 4.1 and the fact that any permutation
is a product of transpositions interchanging two consecutive elements. □

Proposition 4.3. Let $f : U \subset E \to F$ and $g : V \subset F \to G$ be two maps such that $f(U) \subset V$.
This condition ensures that $g \circ f$ is well defined and is a map from $U$ to $G$. Let $n \in \mathbb{N}^*$.
(a) If f is n times differentiable at a point a ∈ U and g is n times differentiable at the point
b = f (a) ∈ V , then g ◦ f is n times differentiable at a.
(b) If f and g are of class C n , then g ◦ f is also of class C n .

Proof/Exercise. Reason by induction (the basis step is just the chain rule). 

Proposition 4.4. Let E and F be isomorphic Banach spaces and consider the map Φ :
Isom(E, F ) → L(F, E) defined by Φ(S) = S −1 . Then Φ is of class C ∞ .
Proof/Exercise. Reason by induction and use the previous proposition. 

Notation. Let $f$ be a function of $n$ variables having enough partial derivatives. Let
$\alpha = (\alpha_1, \dots, \alpha_n)$ be an $n$-tuple of nonnegative integers; we say that $\alpha$ is a multi-index and we set
$|\alpha| = \alpha_1 + \cdots + \alpha_n$. If we differentiate $\alpha_1$ times with respect to the first variable, then $\alpha_2$ times
with respect to the second variable and so on and $\alpha_n$ times with respect to the last variable (at the
same point $a$), then we denote the resulting partial derivative by one of the following symbols:

$$D^\alpha f(a), \qquad \frac{\partial^{|\alpha|} f}{\partial x_n^{\alpha_n} \cdots \partial x_1^{\alpha_1}}(a), \qquad \big(\partial_n^{\alpha_n} \cdots \partial_1^{\alpha_1} f\big)(a), \qquad \big(D_n^{\alpha_n} \cdots D_1^{\alpha_1} f\big)(a).$$

4.2 Taylor’s formulas


Definition. A ”smooth” function is a function possessing enough derivatives in a given situation.
For example, in differential geometry, a smooth function is usually C ∞ . In complex analysis, a
smooth path is usually of class C 1 .

Taylor's formula is an approximation of a smooth function by polynomials. There are three forms
of Taylor's formula, which depend on the assumptions about the function and the form of the remainder:

• Taylor-Young.

• Taylor-Lagrange.

• Taylor-Cauchy, or Taylor with integral remainder.

Let us recall these formulas in the case of a function f : I → R where I is an open interval of
R.

Taylor-Young. Let $f : I \to \mathbb{R}$ have derivatives up to order $n - 1$ and let $x, a \in I$. If $f^{(n)}(a)$ exists,
then
$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + R_n(x)$$
where
$$\lim_{x \to a} \frac{R_n(x)}{(x - a)^n} = 0.$$

Therefore we can write for $x$ near $a$,

$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + o((x - a)^n).$$
Or equivalently, for $h$ small,

$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + o(h^n).$$

Taylor-Lagrange. Let $f : I \to \mathbb{R}$ have derivatives up to order $n + 1$. Let $x, a \in I$. Then

$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + R_n(x)$$
where
$$R_n(x) = \frac{f^{(n+1)}(c)}{(n + 1)!}(x - a)^{n+1} \quad\text{for some } c \text{ between } a \text{ and } x.$$
Equivalently,

$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + \frac{f^{(n+1)}(c)}{(n + 1)!}h^{n+1}.$$

Consequence. Suppose that $f^{(n+1)}$ is bounded on some neighborhood of $a$ and let $M$ be an
upper bound of $|f^{(n+1)}|$. Then, with $x = a + h$,

$$|R_n(x)| \le \frac{M}{(n + 1)!}|h|^{n+1} = O(h^{n+1}).$$

Therefore, we can write for $h$ small

$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + O(h^{n+1}).$$

Taylor-Cauchy. Let $f : I \to \mathbb{R}$ be of class $C^{n+1}$ and let $x, a \in I$. Then

$$f(x) = f(a) + f'(a)(x - a) + \frac{1}{2!}f''(a)(x - a)^2 + \cdots + \frac{1}{n!}f^{(n)}(a)(x - a)^n + R_n(x)$$
where
$$R_n(x) = \frac{1}{n!}\int_a^x (x - t)^n f^{(n+1)}(t)\,dt.$$
Or equivalently,

$$\begin{aligned} f(a + h) &= f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + \frac{1}{n!}\int_a^{a+h} (a + h - t)^n f^{(n+1)}(t)\,dt \\ &= f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + \frac{h^{n+1}}{n!}\int_0^1 (1 - s)^n f^{(n+1)}(a + sh)\,ds. \end{aligned}$$

Now we will extend these formulas to a map $f : U \subset E \to F$ between normed spaces. To
simplify the writing of Taylor's polynomial we set

$$f''(a)h^2 := f''(a)(h, h)$$

and more generally

$$f^{(n)}(a)h^n := f^{(n)}(a)(h, \dots, h).$$
Taylor's polynomial of order $n$ will keep its one-dimensional form
$$f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n,$$
but now $a$ and $h$ are elements of a normed space and the derivatives of $f$ are multilinear bounded
maps.
More generally, if $L \in L_n(E, F)$ is symmetric, we will write

$$L(h^n) = Lh^n = L(h, \dots, h) \quad (n \text{ variables})$$

and
$$L(h^{n-1})k = L(h, \dots, h, k).$$
Therefore $Lh^{n-1}$ is a bounded linear map from $E$ to $F$.

Let us first establish a useful result that generalizes the fact that the derivative of $x^n$ is $nx^{n-1}$
($n \in \mathbb{N}^*$).

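Lemma 4.2. Let $L \in L_n(E, F)$ be symmetric and set $g(h) = Lh^n$. Then $g$ is differentiable and

$$g'(h)k = nL(h^{n-1})k$$

or equivalently
$$g'(h) = nLh^{n-1}.$$

Proof. We write

$$L(h + k)^n - Lh^n = L(h + k, \dots, h + k) - L(h, \dots, h).$$

Using linearity in each variable, we get

$$L(h + k)^n - Lh^n = L(k, h, \dots, h) + L(h, k, h, \dots, h) + \cdots + L(h, \dots, h, k) + R(h, k),$$

where $R(h, k)$ is the sum of the terms in which $k$ appears at least twice (so $R = 0$ when $n = 1$). Since $L$ is
bounded, $\|R(h, k)\| \le C\|k\|^2$ for $\|k\| \le 1$, where the constant $C$ depends only on $n$, $\|L\|$ and $\|h\|$; hence
$R(h, k) = o(k)$. Since $L$ is symmetric, we get

$$L(h + k)^n - Lh^n = nL(h, \dots, h, k) + R(h, k) = nL(h^{n-1})k + R(h, k).$$

Therefore
$$L(h + k)^n - Lh^n - nL(h^{n-1})k = R(h, k) = o(k). \ □$$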
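
The lemma can be checked numerically on a small example. The sketch below assumes E = R^2, F = R and n = 3, with a hypothetical symmetric trilinear form L built from the Euclidean inner product and a fixed vector C; it compares a central difference approximation of g'(h)k with the value 3 L(h, h, k) predicted by the lemma.

```python
def dot(u, v):
    """Euclidean inner product on R^2."""
    return sum(ui * vi for ui, vi in zip(u, v))

C = [1.0, -2.0]   # fixed vector defining the hypothetical form below

def L(u, v, w):
    """A symmetric trilinear form on R^2 (illustrative choice)."""
    return dot(u, v) * dot(C, w) + dot(u, w) * dot(C, v) + dot(v, w) * dot(C, u)

def g(h):
    """g(h) = L h^3 = L(h, h, h)."""
    return L(h, h, h)

def directional_derivative(func, h, k, t=1e-4):
    """Central difference approximation of func'(h)k."""
    plus = [hi + t * ki for hi, ki in zip(h, k)]
    minus = [hi - t * ki for hi, ki in zip(h, k)]
    return (func(plus) - func(minus)) / (2.0 * t)

h = [0.5, 1.5]
k = [-1.0, 2.0]
numeric = directional_derivative(g, h, k)
formula = 3.0 * L(h, h, k)   # n L(h^{n-1}) k with n = 3
```

The two values differ only by the discretization error of the central difference, which is of order t^2.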

Taylor-Young's theorem. Let $f : U \subset E \to F$ be $n - 1$ times Fréchet differentiable and let
$a \in U$. If $f^{(n)}(a)$ exists, then for $h$ small,
$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + o(\|h\|^n).$$

Proof. We reason by induction. For $n = 1$, the formula becomes

$$f(a + h) = f(a) + f'(a)h + o(h)$$

and this is just the Fréchet differentiability of $f$ at $a$. Suppose that the theorem is true for a certain
integer $n \ge 1$ (and for any map satisfying the assumptions). Let $f$ be $n$ times Fréchet differentiable
and suppose that $f^{(n+1)}(a)$ exists. Set for $h$ small
$$\varphi(h) = f(a + h) - f(a) - f'(a)h - \frac{1}{2!}f''(a)h^2 - \cdots - \frac{1}{n!}f^{(n)}(a)h^n - \frac{1}{(n + 1)!}f^{(n+1)}(a)h^{n+1}.$$
According to the lemma, $\varphi$ is differentiable and
$$\varphi'(h) = f'(a + h) - f'(a) - f''(a)h - \cdots - \frac{1}{(n - 1)!}f^{(n)}(a)h^{n-1} - \frac{1}{n!}f^{(n+1)}(a)h^n.$$
By the induction assumption applied to $f'$, we get $\varphi'(h) = o(\|h\|^n)$. Let $\varepsilon > 0$ be given. Then
there exists $\delta > 0$ such that
$$\|\varphi'(h)\| \le \varepsilon\|h\|^n \quad\text{for } \|h\| < \delta.$$
Let $\|h\| < \delta$. By the mean value theorem applied to the ball $B(0, \delta)$,

$$\|\varphi(h) - \varphi(0)\| \le \varepsilon\|h\|^n\,\|h - 0\|.$$

Since $\varphi(0) = 0$, we get

$$\|\varphi(h)\| \le \varepsilon\|h\|^{n+1},$$
that is,
$$\varphi(h) = o(\|h\|^{n+1}).$$

This means that the theorem is true for $n + 1$. □
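
To see the statement at work, here is a small numerical sketch for n = 2. It assumes E = R^2 with the Euclidean norm, F = R, the base point a = (0, 0) and the hypothetical test function f(x, y) = e^x sin(y), for which f(a) = 0, f'(a)h = h2 and f''(a)(h, h) = 2 h1 h2. The ratio of the remainder to ||h||^2 should tend to 0 as h tends to 0.

```python
import math

def f(x, y):
    """Hypothetical smooth test function on R^2."""
    return math.exp(x) * math.sin(y)

def taylor2(h1, h2):
    """Second order Taylor polynomial of f at a = (0, 0):
    f(a) + f'(a)h + (1/2) f''(a)h^2 with f(a) = 0, f'(a)h = h2, f''(a)h^2 = 2*h1*h2."""
    return h2 + 0.5 * (2.0 * h1 * h2)

ratios = []
for t in (1e-1, 1e-2, 1e-3):
    h1, h2 = t, t                        # h = t * (1, 1)
    remainder = f(h1, h2) - taylor2(h1, h2)
    norm_sq = h1 * h1 + h2 * h2          # ||h||^2 for the Euclidean norm
    ratios.append(abs(remainder) / norm_sq)
```

Along this direction the ratio shrinks roughly in proportion to t, which is consistent with a remainder of order ||h||^3.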

To prove the other versions of Taylor's formula, we need two preliminary results. Recall that for a
map $g : I \subset \mathbb{R} \to F$ which is differentiable at some point $t_0 \in I$, we can consider $g'(t_0)$ as an
element of $F$ and write
$$\frac{d}{dt}g(t_0) = g'(t_0) = \lim_{h \to 0} \frac{g(t_0 + h) - g(t_0)}{h}.$$
Lemma 4.3. (Product rule). Let $I$ be an open interval of $\mathbb{R}$ and let $F$ be a normed space. Let
$f : I \to \mathbb{R}$ and $g : I \to F$ be differentiable at some point $t_0 \in I$. Then the map $fg$ is differentiable
at $t_0$ and
$$(fg)'(t_0) = f(t_0)g'(t_0) + f'(t_0)g(t_0).$$

Proof/Exercise. You can prove this familiar rule by invoking Proposition 2.6 and the chain rule.
Alternatively, you can write

(f g)(t0 + h) − (f g)(t0 ) = f (t0 + h) [g(t0 + h) − g(t0 )] + [f (t0 + h) − f (t0 )] g(t0 ).

Then divide by h and let h → 0. 

Lemma 4.4. Let $v : I \subset \mathbb{R} \to F$ be $n + 1$ times differentiable. Then

$$\frac{d}{dt}\left[v(t) + (1 - t)v'(t) + \frac{1}{2!}(1 - t)^2 v''(t) + \cdots + \frac{1}{n!}(1 - t)^n v^{(n)}(t)\right] = \frac{1}{n!}(1 - t)^n v^{(n+1)}(t).$$

Proof/Exercise. Reason by induction on $n$ and use the previous lemma. □

Corollary 4.1. Let $v : [0, 1] \to F$ be continuous and $n + 1$ times differentiable on $]0, 1[$. Suppose
that $\|v^{(n+1)}(t)\| \le M$ for some constant $M$. Then

$$\left\|v(1) - v(0) - v'(0) - \frac{1}{2!}v''(0) - \cdots - \frac{1}{n!}v^{(n)}(0)\right\| \le \frac{M}{(n + 1)!}.$$

Proof. Set
$$u(t) = v(t) + (1 - t)v'(t) + \frac{1}{2!}(1 - t)^2 v''(t) + \cdots + \frac{1}{n!}(1 - t)^n v^{(n)}(t).$$
Then $u(1) = v(1)$ and

$$u(0) = v(0) + v'(0) + \frac{1}{2!}v''(0) + \cdots + \frac{1}{n!}v^{(n)}(0).$$
The previous lemma states that
$$u'(t) = \frac{1}{n!}(1 - t)^n v^{(n+1)}(t)$$
and so
$$\|u'(t)\| \le \frac{M}{n!}(1 - t)^n.$$
Setting
$$g(t) = -\frac{M}{(n + 1)!}(1 - t)^{n+1},$$
we can write this inequality as
$$\|u'(t)\| \le g'(t).$$
By the mean value theorem version 3.4,

$$\|u(1) - u(0)\| \le g(1) - g(0) = \frac{M}{(n + 1)!}.$$

Hence the result. □

Taylor-Lagrange's theorem. Let $f : U \subset E \to F$ be $n + 1$ times differentiable and let
$[a, a + h] \subset U$. Suppose that there is a constant $M$ such that $\|f^{(n+1)}(x)\| \le M$ for all $x \in [a, a + h]$.
Then

$$\left\|f(a + h) - f(a) - f'(a)h - \frac{1}{2!}f''(a)h^2 - \cdots - \frac{1}{n!}f^{(n)}(a)h^n\right\| \le \frac{M}{(n + 1)!}\|h\|^{n+1}.$$

In particular, we can write for $h$ small

$$f(a + h) = f(a) + f'(a)h + \frac{f''(a)}{2}h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + O(\|h\|^{n+1}).$$

Proof. For $t \in [0, 1]$, set $v(t) = f(a + th)$. The assumption $[a, a + h] \subset U$ implies that $v$ is well
defined and continuous on $[0, 1]$. The chain rule implies that $v$ is $n + 1$ times differentiable on $]0, 1[$
and
$$v'(t) = f'(a + th)h; \quad v''(t) = f''(a + th)h^2; \quad \cdots \quad v^{(n+1)}(t) = f^{(n+1)}(a + th)h^{n+1}.$$
The assumption $\|f^{(n+1)}(x)\| \le M$ implies that $\|v^{(n+1)}(t)\| \le M\|h\|^{n+1}$.
According to the previous corollary (with $M$ replaced by $M\|h\|^{n+1}$), we obtain

$$\left\|v(1) - v(0) - v'(0) - \frac{1}{2!}v''(0) - \cdots - \frac{1}{n!}v^{(n)}(0)\right\| \le \frac{M}{(n + 1)!}\|h\|^{n+1}.$$

But since $v^{(k)}(0) = f^{(k)}(a)h^k$, we get the result. □
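
As an illustration of the estimate in the simplest setting E = F = R, the sketch below takes f = sin, for which every derivative is bounded by M = 1, and compares the actual Taylor error with the bound M |h|^(n+1) / (n+1)!. The base point a, the order n and the increments h are arbitrary choices made for this example.

```python
import math

def sin_derivative(order, x):
    """The derivative of sin of the given order at x (the derivatives cycle with period 4)."""
    table = (math.sin, math.cos, lambda s: -math.sin(s), lambda s: -math.cos(s))
    return table[order % 4](x)

def taylor_polynomial(a, h, n):
    """Taylor polynomial of order n of sin at a, evaluated at a + h."""
    return sum(sin_derivative(j, a) * h ** j / math.factorial(j) for j in range(n + 1))

def lagrange_bound(h, n, M=1.0):
    """Right-hand side M / (n+1)! * |h|^(n+1); M = 1 is valid for sin."""
    return M * abs(h) ** (n + 1) / math.factorial(n + 1)

a, n = 0.3, 4
errors_and_bounds = [
    (abs(math.sin(a + h) - taylor_polynomial(a, h, n)), lagrange_bound(h, n))
    for h in (-1.0, -0.1, 0.5, 2.0)
]
```

In each pair the first entry (the error) does not exceed the second (the bound), also for the large increment h = 2, since the theorem does not require h to be small.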

The Taylor-Cauchy formula can be extended to maps between Banach spaces. But for this
one has to extend the notion of the Riemann integral to maps that take values in a Banach space
(completeness is needed in this theory). See for example the book of Azé. In this theory, one can
prove the fundamental theorem of calculus, which states that if $g : [a, b] \to F$ is $C^1$, then
$$\int_a^b g'(t)\,dt = g(b) - g(a).$$

Here, C 1 on [a, b] means that

(i) g is continuous on [a, b].

(ii) g is C 1 on ]a, b[.

(iii) g 0 has a left limit at b and a right limit at a.

Having done this, it is easy to prove


Taylor's formula with integral remainder. Let $F$ be a Banach space, let $f : U \subset E \to F$ be of
class $C^{n+1}$ and let $[a, a + h] \subset U$. Then
$$f(a + h) = f(a) + f'(a)h + \frac{1}{2!}f''(a)h^2 + \cdots + \frac{1}{n!}f^{(n)}(a)h^n + \frac{1}{n!}\int_0^1 (1 - t)^n f^{(n+1)}(a + th)h^{n+1}\,dt.$$

Proof/Exercise. Integrate from 0 to 1 the identity in Lemma 4.4. and set v(t) = f (a + th). 

4.3 Local extrema


Definitions
Let f : U ⊂ E → R be a map and let a ∈ U .
• We say that f has a local minimum at a if there is a neighborhood V ⊂ U of a such that

f (x) ≥ f (a) ∀x ∈ V.

Or equivalently,
f (a + h) ≥ f (a) for h small enough.

In this case, we also say a is a local minimum point of f . We have a similar definition for local
maximum. A local minimum or maximum is called a local extremum.

• We say that f has a global minimum at a if

f (x) ≥ f (a) ∀x ∈ U.

We have a similar definition for global maximum. A global minimum or maximum is called a global
extremum.

• We say that f has a strict local minimum at a if there is a neighborhood V ⊂ U of a such that

f (x) > f (a) ∀x ∈ V \{a}.

Similar definitions for strict local maximum, strict global maximum and strict global minimum.

• We say that $a$ is a critical point of $f$ if either $f$ is not differentiable at $a$ or if $f$ is differentiable at
$a$ and $f'(a) = 0$. The image under $f$ of a critical point is called a critical value.

Observation. If $f$ has a local (resp. global) minimum at $a$, then $-f$ has a local (resp. global)
maximum at $a$. Moreover, $a$ is a critical point of $f$ if and only if it is a critical point of $-f$. Thus, studying
minima is equivalent to studying maxima.

Proposition 4.5. Let f : U ⊂ E → R be differentiable at a point a ∈ U . If f has a local


extremum at a, then a is a critical point of f .

Proof. This result is known for a map v : I ⊂ R → R. It is enough to prove the result in the
case of local minimum. Let h ∈ E be arbitrary. Set for t small v(t) = f (a + th) (v is well defined
because U is open). By assumption, f (a + th) ≥ f (a) for all t small enough, say t ∈] − δ, δ[. This
can be written as
v(t) ≥ v(0) for t ∈] − δ, δ[.

This means that v has a local minimum at t = 0. By definition of v, v is differentiable at 0 and

$$v'(0) = \lim_{t \to 0} \frac{f(a + th) - f(a)}{t} = f'(a)h.$$

We know from calculus that v 0 (0) = 0. Therefore f 0 (a)h = 0. Since h was arbitrary, this means
that f 0 (a) = 0. 

Remark 1. The converse is not true. For example, let $f(t) = t^3$ for $t \in \mathbb{R}$. Then $f'(0) = 0$ but $f$
has no local extremum at 0.

Remark 2. (Methodology). If f is differentiable, we search the points at which f has a local


extremum among the critical points of f .

Definition. A critical point at which $f$ has no local extremum is called a saddle point. This term
comes from the following analogy. The graph of a function of two real variables near a saddle point
looks like a horse's saddle.

Definition. Let $f : U \subset E \to \mathbb{R}$ be twice differentiable at a point $a \in U$. We say that the
bilinear symmetric form $f''(a)$ is positive and write $f''(a) \ge 0$ if

$$f''(a)x^2 = f''(a)(x, x) \ge 0 \quad \forall x \in E.$$

We say that $f''(a)$ is positive definite (or positive nondegenerate) if there exists a positive constant
$\alpha$ such that
$$f''(a)x^2 \ge \alpha\|x\|^2 \quad \forall x \in E.$$
Remark 1. For a function of a real variable, saying that $f''(a)$ is positive means that the number
$f''(a)$ is $\ge 0$. Saying that $f''(a)$ is positive definite means that $f''(a) > 0$.

Remark 2. For a map of $n$ real variables, $f''(a)$ is identified with the Hessian matrix and saying
that $f''(a)$ is positive means that $f''(a)$ is positive in the linear algebra sense, i.e.,

$$x^T f''(a)x \ge 0 \quad \forall x \in \mathbb{R}^n \text{ (column vector)}.$$

This is equivalent to saying that the eigenvalues of $f''(a)$ are $\ge 0$.

Similarly, saying that $f''(a)$ is positive definite means that $f''(a)$ is positive definite in the linear
algebra sense, i.e., there exists $\alpha > 0$ such that

$$x^T f''(a)x \ge \alpha\|x\|^2 \quad \forall x \in \mathbb{R}^n.$$

This is equivalent to saying that the eigenvalues of $f''(a)$ are $> 0$.

Exercise. Let $A$ be an $n \times n$ real matrix. Show that $A$ is positive definite if and only if $x^T Ax > 0$
whenever $x \neq 0$. Hint. Use the compactness of the unit sphere in $\mathbb{R}^n$.

Proposition 4.6. Let $f : U \subset E \to \mathbb{R}$ be twice differentiable at a point $a \in U$. If $f$ has a local
minimum at $a$, then $f'(a) = 0$ and $f''(a) \ge 0$.
Proof/Exercise. By Taylor-Young's formula and our assumptions, we can write
$$f(a + h) - f(a) = \frac{1}{2}f''(a)h^2 + r(h) \ge 0$$
where $r(h) = o(\|h\|^2)$. Let $x \in E$ be arbitrary and let $h = tx$ for $t$ small. Let $\varepsilon > 0$ be given. By
assumption, $|r(h)| \le \varepsilon\|h\|^2$ for $h$ small enough. Using this, prove that $f''(a)x^2 + 2\varepsilon\|x\|^2 \ge 0$ and
conclude. □

Remark. The converse is not true. Give a counterexample.


Proposition 4.7. Let $f : U \subset E \to \mathbb{R}$ be twice differentiable at a point $a \in U$. If $f'(a) = 0$
and $f''(a)$ is positive definite, then $f$ has a strict local minimum at $a$.

Proof/Exercise. By Taylor-Young's formula and our assumptions, we can write

$$f(a + h) - f(a) = \frac{1}{2}f''(a)h^2 + r(h)$$

where $r(h) = o(\|h\|^2)$. By assumption there exists $\alpha > 0$ such that $f''(a)x^2 \ge \alpha\|x\|^2$. Choose
$0 < \varepsilon < \frac{\alpha}{2}$. Then for $h$ small, $|r(h)| \le \varepsilon\|h\|^2$. Conclude. □

Exercise. Formulate similar results for local maxima.

Exercise (Methodology). Let $f : U \subset E \to \mathbb{R}$ be twice differentiable at a point $a \in U$.
Suppose that $f'(a) = 0$ and

(i) $f''(a)$ and $-f''(a)$ are not positive, i.e., there exist $x, y \in E$ such that $f''(a)x^2 < 0$ and
$f''(a)y^2 > 0$.

Then $a$ is a saddle point of $f$.

If $E = \mathbb{R}^n$, then condition (i) is equivalent to

(ii) The Hessian matrix $f''(a)$ has one strictly negative eigenvalue and one strictly positive eigenvalue.

Hence the following methodology for classifying critical points of a twice differentiable map f : U ⊂
Rn → R. Suppose that f 0 (a) = 0.

1. Compute the eigenvalues of f 00 (a).

2. If all eigenvalues are > 0, then, a is a strict local minimum point.

3. If all eigenvalues are < 0, then, a is a strict local maximum point.

4. If one eigenvalue is > 0 and another eigenvalue is < 0, then a is a saddle point.

5. If none of the above hold, we cannot conclude in general. However, if for example, f is three
times differentiable, we look at the third term in the Taylor-Young’s formula.
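
The steps above can be sketched in code for the case n = 2, where the eigenvalues of a symmetric 2 x 2 matrix have a closed form. The function name, the tolerance and the returned labels are assumptions made for this illustration; the inputs are the entries of the Hessian at a point where f'(a) = 0 is already known.

```python
import math

def classify_critical_point(hxx, hxy, hyy, tol=1e-12):
    """Classify a critical point of f : R^2 -> R from its Hessian
    [[hxx, hxy], [hxy, hyy]], following steps 1 to 5 of the methodology."""
    mean = 0.5 * (hxx + hyy)
    radius = math.hypot(0.5 * (hxx - hyy), hxy)
    lam_min, lam_max = mean - radius, mean + radius   # eigenvalues of the symmetric matrix
    if lam_min > tol:
        return "strict local minimum"
    if lam_max < -tol:
        return "strict local maximum"
    if lam_min < -tol and lam_max > tol:
        return "saddle point"
    return "inconclusive"   # some eigenvalue vanishes and the others share one sign

# Hessians at the origin of x^2 + y^2, x^2 - y^2, -(x^2 + y^2) and x^2.
results = [
    classify_critical_point(2.0, 0.0, 2.0),
    classify_critical_point(2.0, 0.0, -2.0),
    classify_critical_point(-2.0, 0.0, -2.0),
    classify_critical_point(2.0, 0.0, 0.0),
]
```

The last case corresponds to step 5: the Hessian of x^2 at the origin has the eigenvalues 2 and 0, so the second order test alone does not decide.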

Remark. Finding the eigenvalues is not the only way to show that a matrix is positive
definite. For example, prove in three ways that the following $n \times n$ matrix (which is standard in
numerical analysis) is positive definite.

$$A = \begin{pmatrix} 2 & -1 & 0 & \cdots & \cdots & 0 \\ -1 & 2 & -1 & \cdots & \cdots & 0 \\ 0 & -1 & 2 & -1 & \cdots & 0 \\ \vdots & \ddots & \ddots & \ddots & \ddots & \vdots \\ 0 & \cdots & 0 & -1 & 2 & -1 \\ 0 & 0 & \cdots & \cdots & -1 & 2 \end{pmatrix}.$$

Hint. Consider this matrix for small values of $n$ and find a pattern.

Convex functions
Let $U$ be a convex subset of a normed space $E$ and let $f : U \to \mathbb{R}$ be a map. We say that $f$ is convex
if
$$f\big((1 - t)x + ty\big) \le (1 - t)f(x) + tf(y) \quad \forall x, y \in U,\ \forall t \in [0, 1].$$
Geometric interpretation. Given any two points on the graph of $f$, the straight line segment
joining them is above the graph of $f$ (draw a figure).

Proposition 4.8. Let $U$ be an open and convex subset of a normed space $E$ and let $f : U \to \mathbb{R}$ be
a differentiable map. Then the following conditions are equivalent.

(i) f is convex.
(ii) f (y) ≥ f (x) + f 0 (x)(y − x) for all x, y ∈ U .

Geometric interpretation of (ii). The graph of f is above its tangents.


If f is twice differentiable, then these conditions are equivalent to
(iii) f 00 (x) ≥ 0 for all x ∈ U .

Proof. We will prove that (i)⇔(ii). You will prove that (ii)⇔(iii) in the exercises.
(i)⇒(ii). Let x, y ∈ U and t ∈ ]0, 1[. Then, by assumption,
\[ f\big(x + t(y - x)\big) = f\big((1-t)x + ty\big) \le (1-t)f(x) + tf(y). \]
Letting h = y − x, we can write
\[ f(x + th) \le (1-t)f(x) + tf(y). \]
Subtracting f(x) and dividing by t, we get
\[ \frac{f(x + th) - f(x)}{t} \le f(y) - f(x). \]
Letting t → 0, we get
\[ f'(x)h \le f(y) - f(x), \]
i.e.,
\[ f'(x)(y - x) \le f(y) - f(x). \]
(ii)⇒(i). Let x, y ∈ U, t ∈ [0, 1] and set z = (1 − t)x + ty. By assumption,
\[ f(x) \ge f(z) + f'(z)(x - z), \qquad f(y) \ge f(z) + f'(z)(y - z). \]
Multiplying the first inequality by (1 − t) and the second one by t and then adding them, we get
\[ (1-t)f(x) + tf(y) \ge f(z) \]
because (1 − t)(x − z) + t(y − z) = (1 − t)x + ty − z = 0. ∎

Corollary 4.2. Let f : U ⊂ E → R be convex and differentiable. Then, the following conditions
are equivalent.

(i) f 0 (a) = 0.
(ii) f has a global minimum at a.

Exercise. Prove, without the assumption of differentiability, that if a convex function has a local
minimum at a point, then it has a global minimum at that point.
Chapter 5

The inverse function theorem and the implicit function theorem

5.1 Introduction
We shall consider two interesting and related problems.

Problem 1. Given an equation of the form y = f (x), can we write x = g(y)? For this to make
sense, f must be bijective. Suppose that this is the case. If y depends smoothly on x, does x
depend smoothly on y? Otherwise stated, if f is smooth, does it follow that f −1 is smooth?

Without any additional assumption, the answer is no. For example, the map f : R → R given
by $f(x) = x^3$ is smooth but its inverse, which is given by $f^{-1}(y) = y^{1/3}$, is not smooth (it is not
differentiable at 0). However, under certain assumptions, the answer is yes. We shall attack this
problem in full generality in the context of smooth maps between Banach spaces.
Remark. We can ask another question. If y depends continuously on x, does it follow that x
depends continuously on y? Otherwise stated, if f is a continuous bijection, does it follow that f −1
is continuous? Without any additional assumption, the answer is no and we gave a counterexample
in the topology course. However there are some interesting cases where this is true. Recall that a
continuous bijection whose inverse is continuous is called a homeomorphism.

Example 1. If L : E → F is a linear continuous bijection between two Banach spaces, then L−1 is
continuous as well. This is the celebrated Banach isomorphism theorem (known also as the bounded
inverse theorem) that we mentioned in Chapter 1.

Example 2. Let f : X → Y be a continuous bijection between two topological spaces. If X is
compact and Y is Hausdorff, then $f^{-1}$ is continuous.

Example 3/Challenging exercise. A continuous bijection from R to R is a homeomorphism.

Example 4. More generally, a continuous bijection from $\mathbb{R}^n$ to $\mathbb{R}^n$ is a homeomorphism. This is
a deep theorem that requires advanced tools from topology.

Problem 2. If two variables x and y are related by a relation of the form f(x, y) = 0, can we
express y as a function of x, i.e., can we write y = g(x)?
Without any additional assumption, the answer is no. For example, consider the equation
$x^2 + y^2 = 1$. We cannot express y globally in terms of x or vice versa. Geometrically, this means
that a full circle is not the graph of a function. However, small parts of the circle are graphs of
functions. If we restrict x and y to a neighborhood of the north pole (0, 1), then y > 0 and so we can
write in that neighborhood $y = \sqrt{1 - x^2}$. If we restrict x and y to a neighborhood of the south pole
(0, −1), then we can write locally $y = -\sqrt{1 - x^2}$. However, no matter how close to the east pole
(1, 0) we are, we cannot write y as a function of x (but we can write $x = \sqrt{1 - y^2}$).
Suppose that we can solve the equation f (x, y) = 0 locally and write y = g(x) and suppose
that f is smooth, does it follow that g is smooth?

As a warm up, let us tackle Problem 1 in a very particular case. Let f : V → W be a smooth
bijection between two open sets of R. If $f^{-1}$ is differentiable, then $f'(x) \neq 0$ for all x ∈ V.
Conversely, if $f'$ never vanishes and $f^{-1}$ is continuous, then $f^{-1}$ is as smooth as f.
Suppose first that $f^{-1}$ is differentiable. Differentiating both sides of the equation $f^{-1}(f(x)) = x$,
we get by the chain rule
\[ (f^{-1})'(f(x)) \cdot f'(x) = 1. \]
This means that $f'(x) \neq 0$ and
\[ (f^{-1})'(f(x)) = \frac{1}{f'(x)}. \]
Or equivalently
\[ (f^{-1})'(y) = \frac{1}{f'(f^{-1}(y))}. \]
Remark. A physicist would just write
\[ \frac{dx}{dy} = \frac{1}{\frac{dy}{dx}} \]
and the problem is over. This is a good way to remember the relation between $(f^{-1})'$ and $f'$.
However, we need to be rigorous and precise and put the right assumptions.

Suppose conversely that $f^{-1}$ is continuous and that $f'$ never vanishes. We prove first that $f^{-1}$
is differentiable at every point b ∈ W. For y = f(x) near b = f(a), we can write
\[ \frac{f^{-1}(y) - f^{-1}(b)}{y - b} = \frac{x - a}{f(x) - f(a)} = \frac{1}{\frac{f(x) - f(a)}{x - a}}. \tag{5.1} \]
Continuity of $f^{-1}$ implies that x → a as y → b. Accordingly, the left hand side of (5.1) tends to
$\frac{1}{f'(a)}$ as y → b. It follows that $f^{-1}$ is differentiable at b and
\[ (f^{-1})'(b) = \frac{1}{f'(f^{-1}(b))}. \tag{5.2} \]

If f is of class $C^1$, then $f'$ is continuous. On the other hand, $f^{-1}$ is continuous and the map
$t \mapsto \frac{1}{t}$ is continuous on $\mathbb{R} \setminus \{0\}$. Therefore $(f^{-1})'$ is continuous as a composition of
continuous functions.
If now f is of class $C^n$, we can easily prove by induction that $f^{-1}$ is also of class $C^n$. Indeed,
we just proved the result for n = 1. Suppose it is true for n − 1 and let f be of class $C^n$. Then $f'$
is of class $C^{n-1}$. By the induction assumption $f^{-1}$ is of class $C^{n-1}$. Since the inverse map
$t \mapsto \frac{1}{t}$ is $C^\infty$, the map $b \mapsto \frac{1}{f'(f^{-1}(b))}$ is of class $C^{n-1}$ as a composition of maps of class $C^{n-1}$.
It follows from relation (5.2) that $(f^{-1})'$ is of class $C^{n-1}$. This means that $f^{-1}$ is of class $C^n$.
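
As a numerical sanity check of relation (5.2), here is a minimal sketch. The example map f(x) = x^3 + x (a smooth bijection of R whose derivative never vanishes), the bisection-based inverse, and all function names are assumptions chosen for illustration only.

```python
def f(x):
    return x**3 + x

def f_prime(x):
    return 3 * x**2 + 1

def f_inverse(y, lo=-10.0, hi=10.0, iterations=200):
    """Invert the increasing function f on [lo, hi] by bisection."""
    for _ in range(iterations):
        mid = (lo + hi) / 2.0
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2.0

y = 2.0    # f(1) = 2, so the inverse of f at 2 is 1
h = 1e-5   # step of the central finite difference

finite_difference = (f_inverse(y + h) - f_inverse(y - h)) / (2 * h)
formula = 1.0 / f_prime(f_inverse(y))   # right-hand side of (5.2)

print(finite_difference, formula)
```

Both numbers should agree up to the accuracy of the finite difference.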


The assumption that $f'$ does not vanish anywhere is a strong assumption. What if we only know
that $f'(a) \neq 0$ at a single point a? If $f'$ is continuous, then $f'$ does not vanish on a neighborhood
of that point. Therefore, according to the previous discussion, $f^{-1}$ is as smooth as f locally, i.e.,
near the point b = f(a).

We can prove a better result. If f is not necessarily globally bijective but $f'$ is continuous and
does not vanish at a point a, then f is bijective near a and its inverse is as smooth as f. This is
essentially the content of the local inversion theorem that we shall prove in the general context of
Banach spaces. As we know, the generalization of the condition $f'(a) \neq 0$ is the condition that $f'(a)$
is an isomorphism (an invertible Jacobian matrix in n dimensions).

Consider now Problem 2 in dimension 1. Under certain general assumptions, we can solve the
equation f(x, y) = 0 locally near a point (a, b) where f(a, b) = 0. If f is smooth and
$\frac{\partial f}{\partial y}(a, b) \neq 0$, then there is a function g defined on a neighborhood of a such that
f(x, g(x)) = 0. This function is as smooth as f. Differentiating this relation and using the chain rule, we get
\[ \frac{\partial f}{\partial x} + \frac{\partial f}{\partial y}\, g'(x) = 0. \]
It follows that
\[ g'(x) = -\left(\frac{\partial f}{\partial y}\right)^{-1} \frac{\partial f}{\partial x} \tag{5.3} \]
where the partial derivatives are evaluated at the point (x, g(x)). This is essentially the content of
the implicit function theorem. Equation (5.3) is called implicit differentiation.
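
Formula (5.3) can be checked numerically on the circle example. The sketch below is an illustration under stated assumptions: f(x, y) = x^2 + y^2 − 1, the upper branch g(x) = sqrt(1 − x^2) near the north pole, and hypothetical function names.

```python
import math

def g(x):
    """Explicit solution of x^2 + y^2 - 1 = 0 near the north pole (0, 1)."""
    return math.sqrt(1.0 - x * x)

def implicit_derivative(x):
    """Right-hand side of (5.3) for f(x, y) = x^2 + y^2 - 1, evaluated at (x, g(x))."""
    y = g(x)
    df_dx = 2.0 * x
    df_dy = 2.0 * y
    return -df_dx / df_dy

x = 0.3
h = 1e-6
finite_difference = (g(x + h) - g(x - h)) / (2 * h)

print(finite_difference, implicit_derivative(x))
```

The two values should agree up to the finite difference error; both approximate −x/y.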

Remark. Except for the first lemma, in this chapter, we need to work in Banach spaces for three
reasons. First, we will use the Banach fixed point theorem. Second, we need the fact that for two
Banach spaces E and F , Isom(E, F ) is open in L(E, F ) and third, we need the fact that the map
Φ : Isom(E, F ) → Isom(F, E) given by Φ(S) = S −1 is of class C ∞ .

Banach fixed point theorem (or the Banach contraction principle). Let (X, d) be a complete
metric space and let ψ : X → X be a contraction, i.e., there exists a constant k < 1 such that

d(ψ(x), ψ(y)) ≤ k d(x, y) ∀x, y ∈ X.

Then ψ has a unique fixed point, i.e., there exists a unique ℓ ∈ X such that ψ(ℓ) = ℓ.

Remark 1. The element ℓ is given by a constructive procedure. Start with an arbitrary point $x_0$
and define a sequence $(x_n)$ by $x_{n+1} = \psi(x_n)$. It is easy to prove that $(x_n)$ is a Cauchy sequence.
The completeness assumption implies that $(x_n)$ has a limit ℓ. Then ℓ is the required fixed point.
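
The constructive procedure of Remark 1 can be sketched as follows. The example is an assumption chosen for illustration: X = [0, 1] and ψ = cos, which maps [0, 1] into itself and is a contraction there with constant sin(1) < 1. The function name fixed_point and its stopping rule are hypothetical.

```python
import math

def fixed_point(psi, x0, tol=1e-12, max_iter=1000):
    """Iterate x_{n+1} = psi(x_n) until two successive iterates differ by at most tol."""
    x = x0
    for n in range(1, max_iter + 1):
        x_next = psi(x)
        if abs(x_next - x) <= tol:
            return x_next, n
        x = x_next
    raise RuntimeError("no convergence within max_iter iterations")

limit, steps = fixed_point(math.cos, x0=0.5)
print(limit, steps)
```

Any starting point in [0, 1] leads to the same limit, in agreement with uniqueness of the fixed point.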

Remark 2. It is essential that ψ maps X into itself. First, if x is a fixed point of ψ, then x and
ψ(x) should belong to the same space. But even if X ⊂ Y and ψ : X → Y is a contraction, ψ
need not have a fixed point. For example, the map $\psi : [0, 1] \to [0, \sqrt{2}]$ defined by $\psi(x) = \sqrt{1 + x^2}$
is a contraction but has no fixed point (check that). Of course, a contraction has at most one fixed
point.

Remark 3. Completeness is also essential. For example, let X = {x ∈ Q; 1 ≤ x ≤ 2} be equipped
with the usual distance. Let ψ be defined by $\psi(x) = \frac{x}{2} + \frac{1}{x}$. Then ψ is a contraction from X to
itself but it has no fixed point (check the details).
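
The behaviour in Remark 3 can be observed with exact rational arithmetic. This is a small sketch, not a proof: the starting point x_0 = 1 and the number of iterations are arbitrary choices. Every iterate stays in X, the quantity x_n^2 − 2 shrinks rapidly, but it is never zero.

```python
from fractions import Fraction

def psi(x):
    """The map of Remark 3, evaluated exactly on rational numbers."""
    return x / 2 + 1 / x

iterates = [Fraction(1)]
for _ in range(5):
    iterates.append(psi(iterates[-1]))

for x in iterates:
    # x is an exact rational number; x*x - 2 measures how far x is from a square root of 2.
    print(x, float(x * x - 2))
```

The only possible limit of the sequence in R is the square root of 2, which does not belong to X.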

5.2 The inverse function theorem


Lemma 5.1. Let E and F be isomorphic normed spaces. Let V be an open subset of E and W be
an open subset of F. Let f : V → W be a bijection that satisfies the following assumptions.
(a) f is differentiable at a point a ∈ V .

(b) g = f −1 is continuous at the point b = f (a).



Then, the following conditions are equivalent.

(i) g = f −1 is differentiable at the point b = f (a).

(ii) f 0 (a) ∈ Isom (E, F ).

In this case g 0 (b) = (f 0 (a))−1 (the derivative of the inverse is the inverse of the derivative).
Proof. (i) ⇒ (ii). Consider the identity g ◦ f = 1E . By the chain rule

g 0 (b) ◦ f 0 (a) = (1E )0 (a) = 1E . (5.4)

Applying the chain rule to the identity f ◦ g = 1F , we get

f 0 (a) ◦ g 0 (b) = (1F )0 (b) = 1F . (5.5)

Together, equations (5.4) and (5.5) imply that f 0 (a) is bijective and its inverse is g 0 (b). Since by
definition f 0 (a) and g 0 (b) are bounded, we get f 0 (a) ∈ Isom (E, F ).
(ii) ⇒ (i). We will write y = f (x) to denote the general argument of g. We start by writing the
differentiability condition at a

y − b − f 0 (a)(x − a) = f (x) − f (a) − f 0 (a)(x − a) = r(x)

where r(x) = o(x − a) for x near a. Applying $(f'(a))^{-1}$ to both sides of the equation, we get
\[ (f'(a))^{-1}(y - b) - (x - a) = (f'(a))^{-1} r(x), \tag{5.6} \]
or equivalently
\[ g(y) - g(b) = x - a = (f'(a))^{-1}(y - b) - (f'(a))^{-1} r(x). \]
We need to prove that $-(f'(a))^{-1} r(x) = o(y - b)$ as y → b, that is
\[ \forall \varepsilon > 0 \ \exists \beta > 0 : \quad \|(f'(a))^{-1} r(x)\| \le \varepsilon \|y - b\| \ \text{ whenever } \|y - b\| \le \beta. \]

Let ε be a positive number $< \|(f'(a))^{-1}\|$. Since r(x) = o(x − a), there exists α > 0 such that
\[ \|r(x)\| \le \frac{\varepsilon}{2\|(f'(a))^{-1}\|^2}\,\|x - a\| \ \text{ whenever } \|x - a\| \le \alpha. \]
It follows that
\[ \|(f'(a))^{-1} r(x)\| \le \|(f'(a))^{-1}\|\,\|r(x)\| \le \frac{\varepsilon}{2\|(f'(a))^{-1}\|}\,\|x - a\| \le \frac{1}{2}\|x - a\|. \tag{5.7} \]

It follows from equation (5.6) that
\[ (f'(a))^{-1}(y - b) = (x - a) + (f'(a))^{-1} r(x). \]
Using the second form of the triangle inequality, we get
\[ \|(f'(a))^{-1}(y - b)\| \ge \|x - a\| - \|(f'(a))^{-1} r(x)\| \ge \|x - a\| - \frac{1}{2}\|x - a\| = \frac{1}{2}\|x - a\|. \]
Thus we proved that
\[ \|x - a\| \le 2\|(f'(a))^{-1}(y - b)\| \le 2\|(f'(a))^{-1}\|\,\|y - b\| \tag{5.8} \]

whenever ||x − a|| ≤ α. Continuity of g at b implies that there exists β > 0 such that ||x − a|| =
||g(y) − g(b)|| ≤ α if ||y − b|| ≤ β. Thus, let ||y − b|| ≤ β. Combining estimates (5.7) and (5.8),
we get
\[ \|(f'(a))^{-1} r(x)\| \le \frac{\varepsilon}{2\|(f'(a))^{-1}\|}\,\|x - a\| \le \frac{\varepsilon}{2\|(f'(a))^{-1}\|} \cdot 2\|(f'(a))^{-1}\|\,\|y - b\| = \varepsilon\|y - b\|. \]
∎
Theorem 5.1. (the inverse function theorem or the local inversion theorem – version 1).
Let E and F be isomorphic Banach spaces and let f : U ⊂ E → F be a differentiable map such
that
(a) f 0 is continuous at a point a ∈ U .

(b) f 0 (a) ∈ Isom (E, F ).


Then there exists an open neighborhood V ⊂ U of a and an open neighborhood W of b = f (a)
such that
(i) f : V → W is a homeomorphism.

(ii) g = f −1 is differentiable at the point b = f (a) and g 0 (b) = (f 0 (a))−1 .

Remark. In this theorem, we did not assume that f is a bijection or that f −1 is continuous
because this is the conclusion.
Proof. Here’s our plan for the proof.
Step 1. Write the equation y = f (x) as a fixed point problem.
Step 2. Show that this problem has a unique solution for y in some closed neighborhood W of b.
This defines a map g : W → E that to each y ∈ W associates the unique solution of the equation
y = f (x).
Step 3. Show that g is continuous.
Step 4. Find a neighborhood V of a such that f : V → W is bijective. Its inverse will
necessarily be g : W → V.

Step 1. Let
\[ \psi_y(x) = x + (f'(a))^{-1}\big(y - f(x)\big). \]
Since $(f'(a))^{-1}$ is an isomorphism, we see that y = f(x) if and only if $\psi_y(x) = x$. We will show
shortly that $\psi_y$ is a contraction, but without any restriction on y, $\psi_y$ need not have any fixed point.
We need to ensure that $\psi_y$ maps a closed subset of E into itself.
Step 2. Continuity of $f'$ at a implies that there exists a number α > 0 such that
\[ \|f'(a) - f'(\zeta)\| \le \frac{1}{2\|(f'(a))^{-1}\|} \qquad \forall \zeta \in B'(a, \alpha). \]
Let x and x' belong to B'(a, α). Set for t ∈ [0, 1]
\[ u(t) = tx + (1-t)x' - (f'(a))^{-1} f\big(tx + (1-t)x'\big). \]
Observe that
\[ u(1) = x - (f'(a))^{-1} f(x) \quad \text{and} \quad u(0) = x' - (f'(a))^{-1} f(x'). \]

By the chain rule
\[ u'(t) = x - x' - (f'(a))^{-1} f'\big(tx + (1-t)x'\big)(x - x') = (f'(a))^{-1}\Big[f'(a) - f'\big(tx + (1-t)x'\big)\Big](x - x'). \]
It follows that
\[ \|u'(t)\| \le \|(f'(a))^{-1}\|\,\|f'(a) - f'\big(tx + (1-t)x'\big)\|\,\|x - x'\| \le \frac{1}{2}\|x - x'\|. \]
By the mean value theorem version 3.3,
\[ \|u(1) - u(0)\| \le \frac{1}{2}\|x - x'\|, \]
that is
\[ \|x - x' - (f'(a))^{-1} f(x) + (f'(a))^{-1} f(x')\| \le \frac{1}{2}\|x - x'\|. \tag{5.9} \]
Recalling the definition of $\psi_y$, the above inequality implies that
\[ \|\psi_y(x) - \psi_y(x')\| \le \frac{1}{2}\|x - x'\|. \]
This means that $\psi_y$ is a contraction.
Let $\beta = \frac{\alpha}{2\|(f'(a))^{-1}\|}$, so that $\|(f'(a))^{-1}\|\,\beta = \frac{\alpha}{2}$.

Claim 1. For every y ∈ B 0 (b, β), ψy maps B 0 (a, α) into itself.


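Let x ∈ B'(a, α). Then
\[ \begin{aligned}
\|\psi_y(x) - a\| &\le \|\psi_y(x) - \psi_y(a)\| + \|\psi_y(a) - a\| \\
&\le \frac{1}{2}\|x - a\| + \|(f'(a))^{-1}(y - b)\| \\
&\le \frac{\alpha}{2} + \|(f'(a))^{-1}\|\,\|y - b\| \\
&\le \frac{\alpha}{2} + \frac{\alpha}{2} = \alpha.
\end{aligned} \]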
This proves the claim.
Since B'(a, α) is closed in the complete metric space E, it is itself complete. By the Banach
fixed point theorem, for every y ∈ B'(b, β) there exists a unique x ∈ B'(a, α) such that $x = \psi_y(x)$,
i.e., y = f(x). This defines a map g : B'(b, β) ⊂ F → B'(a, α) ⊂ E. Note that g(b) = a because
f(a) = b, b ∈ B'(b, β) and a ∈ B'(a, α). Note also that by construction f(g(y)) = y for every y ∈ B'(b, β).
But we cannot conclude that f maps B'(a, α) onto B'(b, β) or that f is injective on B'(a, α). We
have to restrict f to a smaller subset. What we can say from uniqueness of the fixed point is the
following.
x, x' ∈ B'(a, α) and f(x) = f(x') ∈ B'(b, β) ⟹ x = x'.

Step 3. g is Lipschitz continuous.


Note that by construction $g(y) = \psi_y(g(y))$. Now, let y, y' ∈ B'(b, β). By definition,
\[ \psi_y(g(y')) = g(y') + (f'(a))^{-1}\big(y - f(g(y'))\big) = g(y') + (f'(a))^{-1}(y - y') \]
and
\[ \psi_{y'}(g(y')) = g(y') + (f'(a))^{-1}\big(y' - f(g(y'))\big) = g(y'). \]
Therefore
\[ \psi_y(g(y')) - \psi_{y'}(g(y')) = (f'(a))^{-1}(y - y'). \]
Using these observations, the triangle inequality and the fact that $\psi_y$ is a $\frac{1}{2}$-contraction, we get
\[ \begin{aligned}
\|g(y) - g(y')\| &= \|\psi_y(g(y)) - \psi_{y'}(g(y'))\| \\
&\le \|\psi_y(g(y)) - \psi_y(g(y'))\| + \|\psi_y(g(y')) - \psi_{y'}(g(y'))\| \\
&\le \frac{1}{2}\|g(y) - g(y')\| + \|(f'(a))^{-1}\|\,\|y - y'\|.
\end{aligned} \]
It follows that
\[ \|g(y) - g(y')\| \le 2\|(f'(a))^{-1}\|\,\|y - y'\|. \]
This completes step 3. It follows from this estimate that g maps the open ball B(b, β) into the open
ball B(a, α): taking y' = b gives $\|g(y) - a\| \le 2\|(f'(a))^{-1}\|\,\|y - b\| < 2\|(f'(a))^{-1}\|\,\beta = \alpha$ whenever ||y − b|| < β.

Step 4. Now let W = B(b, β) and V = f −1 (W ) ∩ B(a, α).

Claim 2. V is open, f (V ) = W and f : V → W is injective and its inverse is g (exercise).

It follows from the last claim that f : V → W is a homeomorphism. This proves conclusion (i).
Conclusion (ii) follows from Lemma 5.1. 
Remark. The proof of the above theorem gives an algorithm for solving the equation y = f(x).
If we recall the proof of the Banach fixed point theorem, we see that x is the limit of a sequence
defined by $x_0 \in B'(a, \alpha)$ and
\[ x_{n+1} = x_n + (f'(a))^{-1}\big(y - f(x_n)\big). \]
This iteration is a simplified form of Newton's method, in which the derivative is frozen at a instead
of being recomputed at each $x_n$. Newton's method plays a central role in numerical analysis. If y is
sufficiently close to b (||y − b|| ≤ β) and $x_0$ is sufficiently close to a ($\|x_0 - a\| \le \alpha$), the iteration
converges and the error is at least halved at each step, because $\psi_y$ is a $\frac{1}{2}$-contraction. However,
without these assumptions the algorithm may diverge, and inverting a large matrix can be
computationally challenging.
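
The iteration in the remark above can be sketched in dimension 1, where applying the inverse of f'(a) is just a division. The example f(x) = x^3 + x with a = 1, b = f(a) = 2 and f'(a) = 4 is an assumption chosen for illustration, as are the function name solve_near and its stopping rule.

```python
def f(x):
    return x**3 + x

def solve_near(f, a, derivative_at_a, y, tol=1e-13, max_iter=200):
    """Solve f(x) = y by iterating x <- x + (y - f(x)) / f'(a), starting from x = a."""
    x = a
    for _ in range(max_iter):
        step = (y - f(x)) / derivative_at_a   # the derivative is frozen at a
        x += step
        if abs(step) <= tol:
            return x
    raise RuntimeError("iteration did not converge; y may be too far from f(a)")

x = solve_near(f, a=1.0, derivative_at_a=4.0, y=2.1)
print(x, f(x))
```

With y = 2.1 close to b = 2 the iteration settles after a few steps. For a value of y far from b there is no guarantee of convergence.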

Definition. Let E and F be normed spaces. Let V be an open subset of E and W be an open
subset of F. Let f : V → W be a bijection. We say that f is a diffeomorphism if f and $f^{-1}$ are
differentiable. We say that f is a diffeomorphism of class $C^n$ (or a $C^n$-diffeomorphism) if f and
$f^{-1}$ are of class $C^n$.

Corollary 5.1. (the inverse function theorem or the local inversion theorem – version 2).
Let E and F be isomorphic Banach spaces and let f : U ⊂ E → F be a map of class C n . Suppose
that
f 0 (a) ∈ Isom (E, F ) for some point a ∈ U.
Then there exists an open neighborhood V of a and an open neighborhood W of b = f (a) such
that f : V → W is a diffeomorphism of class C n .
Proof. We reason by induction on n.
Basis step (n = 1). The continuity of f 0 at a and the openness of Isom (E, F ) imply together that
there exists a neighborhood V 0 of a such that

f 0 (x) ∈ Isom (E, F ) ∀x ∈ V 0 ,

(write the details).


We consider the restriction of f to V 0 . By version 1, there exists an open neighborhood V ⊂ V 0
of a and an open neighborhood W of b = f (a) such that

(i) f : V → W is a homeomorphism.

(ii) g = f −1 is differentiable at the point b = f (a) and g 0 (b) = (f 0 (a))−1 .


Consider an arbitrary point y ∈ W . Then y = f (x) for some x ∈ V . Since f 0 (x) is an
isomorphism, we apply Lemma 5.1 to the point x. Accordingly, g is differentiable at y and

g 0 (y) = (f 0 (g(y)))−1 .

This relation can be written as


g 0 = Φ ◦ f 0 ◦ g. (5.10)
Since Φ, $f'$ and g are continuous, it follows that $g'$ is continuous as well. This means that f is a
$C^1$-diffeomorphism between V and W.
Induction step. You should be able to complete this step on your own using equation (5.10). 

Corollary 5.2. (The global inverse function theorem). Let E and F be isomorphic Banach
spaces and let U be an open subset of E. Consider a map f : U → F of class C n . Then the
following conditions are equivalent.
(i) f (U ) is open and f is a (global) C n −diffeomorphism between U and f (U ).

(ii) f is injective and f 0 (x) ∈ Isom (E, F ) for all x ∈ U .

Proof. (i) ⇒ (ii) follows from Lemma 5.1.


(ii) ⇒ (i). Let us prove that f is an open map (i.e., maps open sets to open sets). Let O be an
open subset of U. We prove that f(O) is a neighborhood of all its points. Let b ∈ f(O). Then
b = f(a) for some point a ∈ O. Since $f'(a)$ is an isomorphism, it follows from the inverse function
theorem version 1, applied to the restriction of f to O, that f is a homeomorphism between an open
subset V of O containing a and an open subset W of F containing b = f(a). Since V ⊂ O, it follows
that W = f(V) ⊂ f(O). This means that f(O) is a neighborhood of b. In particular, f(U) is open.
The map f : U → f(U) is bijective, continuous and open. This means that f is a homeomorphism.
It follows from Lemma 5.1 that $g = f^{-1}$ is differentiable at every point y ∈ f(U) and
$g'(y) = (f'(g(y)))^{-1}$. This means that
g 0 = Φ ◦ f 0 ◦ g.
Continuity of g 0 follows. An induction argument shows that g is of class C n . 

5.3 The implicit function theorem

Theorem 5.3. (the implicit function theorem). Let E, F and G be Banach spaces with F and
G isomorphic. Let U be an open subset of E × F and consider a map f : U → G of class C n . Let
(a, b) ∈ U such that
(a) f(a, b) = 0,
(b) $\frac{\partial f}{\partial y}(a, b)$ is an isomorphism between F and G.
Then the following hold.
(i) There exists an open neighborhood V of (a, b) contained in U , there exists an open neighborhood
W of a and there exists a C n function g : W → F such that

(x, y) ∈ V and f (x, y) = 0 ⇐⇒ x ∈ W and y = g(x).



(ii) For every x ∈ W,
\[ g'(x) = -\left(\frac{\partial f}{\partial y}(x, g(x))\right)^{-1} \frac{\partial f}{\partial x}(x, g(x)). \]

Proof. We will deduce this theorem from the inverse function theorem. Define the map f1 : U →
E × G by
f1 (x, y) = (x, f (x, y)) .
Then $f_1$ is of class $C^n$ and
\[ f_1'(x, y) = \begin{pmatrix} I_E & 0 \\ \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{pmatrix}. \]
The assumption that $\frac{\partial f}{\partial y}(a, b)$ is an isomorphism implies that $f_1'(a, b)$ is an isomorphism (you should
be able to check that).
Thus, $f_1$ satisfies the assumptions of version 2 of the inverse function theorem. Accordingly,
there exists a neighborhood V of (a, b) and a neighborhood $W_1$ of $f_1(a, b) = (a, 0)$ such that
$f_1 : V \to W_1$ is a $C^n$-diffeomorphism. Let $g_1 = f_1^{-1}$. Then $g_1$ is of the form $g_1(x, z) = (x, \varphi(x, z))$
(the first component of $g_1$ must be the identity and the second component is ϕ). Note that ϕ is of
class $C^n$. Now,

(x, y) ∈ V and f (x, y) = z ⇐⇒ (x, y) ∈ V and f1 (x, y) = (x, z)


⇐⇒ (x, z) ∈ W1 and (x, y) = g1 (x, z)
⇐⇒ (x, z) ∈ W1 and y = ϕ(x, z).

In particular, letting z = 0 and g(x) = ϕ(x, 0), we get

(x, y) ∈ V and f (x, y) = 0 ⇐⇒ (x, 0) ∈ W1 and y = ϕ(x, 0) = g(x).

Now let π : E → E × G be defined by π(x) = (x, 0) and set W = π −1 (W1 ). Then W is an open
neighborhood of a and (x, 0) ∈ W1 ⇔ x ∈ W . Thus, we have proved

(x, y) ∈ V and f (x, y) = 0 ⇐⇒ x ∈ W and y = g(x).

This is conclusion (i). Conclusion (ii) follows from the chain rule. 

Remark 1. Since (a, b) ∈ V and f (a, b) = 0, we see that b = g(a).
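
To see the implicit function g concretely, one can evaluate it numerically on the circle example of the introduction. The sketch below is an assumption-laden illustration, not the construction used in the proof above: for fixed x it applies the frozen-derivative fixed point iteration y ← y − (∂f/∂y(a, b))^{-1} f(x, y) with f(x, y) = x^2 + y^2 − 1 and (a, b) = (0, 1). All names are hypothetical.

```python
import math

def f(x, y):
    return x * x + y * y - 1.0

A, B = 0.0, 1.0            # base point (a, b) with f(a, b) = 0
DF_DY_AT_BASE = 2.0 * B    # partial derivative of f with respect to y at (a, b)

def implicit_g(x, tol=1e-14, max_iter=200):
    """Approximate g(x), the solution y near b of f(x, y) = 0, for x near a."""
    y = B
    for _ in range(max_iter):
        step = f(x, y) / DF_DY_AT_BASE   # the derivative is frozen at (a, b)
        y -= step
        if abs(step) <= tol:
            return y
    raise RuntimeError("iteration did not converge; x may be too far from a")

for x in (0.0, 0.1, 0.3):
    print(x, implicit_g(x), math.sqrt(1.0 - x * x))
```

For x near a = 0 the computed values agree with the explicit branch sqrt(1 − x^2), and implicit_g(0.0) returns b, as in Remark 1.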


Remark 2/Exercise 1. In some sense g is the unique continuous function that solves the equation
f(x, y) = 0 near (a, b). More precisely, let W' ⊂ W be connected and let h : W' → F be a continuous
map that satisfies the following.

1. h(a) = b.

2. (x, h(x)) ∈ U ∀ x ∈ W 0 .

3. f (x, h(x)) = 0 ∀ x ∈ W 0 .

Then h(x) = g(x) for all x ∈ W 0 . Hint. Use a connectedness argument.

Exercise 2. Deduce the inverse function theorem version 2 from the implicit function theorem.
Hint. Consider the function f1 (x, y) = y − f (x).

Exercise 3. Formulate and prove a global implicit function theorem.



Lagrange multipliers
In chapter 4, we considered some optimization problems for a smooth map f : U ⊂ E → R and we
gave conditions to detect local extrema. Now we consider optimization problems with constraints.
For example, what is the maximum value of x + y + z subject to the condition $x^2 + y^2 + z^2 = 1$?
This problem can be stated as
\[ \max_{x^2 + y^2 + z^2 = 1} (x + y + z) \quad \text{or} \quad \max_{x^2 + y^2 + z^2 - 1 = 0} (x + y + z). \]
We call the condition $x^2 + y^2 + z^2 = 1$ a constraint. Geometrically, this means that we want to find
the maximum (or minimum) value of the function x + y + z on the unit sphere.
More generally, given two smooth functions f and g from an open subset U of $\mathbb{R}^n$ to $\mathbb{R}$, we
consider the maximization problem
\[ \max_{g(x) = 0} f(x) = \max_{x \in g^{-1}(0)} f(x). \]

Theorem 5.4. (Lagrange multiplier theorem with one constraint). Let U be an open subset
of $\mathbb{R}^n$ (n ≥ 2). Let f and g be $C^1$-maps from U to $\mathbb{R}$. If f restricted to $g^{-1}(0)$ has a local
extremum at a point a and $\nabla g(a) \neq 0$, then there exists a number λ such that
\[ \nabla f(a) = \lambda \nabla g(a). \]
The number λ is called the Lagrange multiplier relative to the optimization problem.
Remark. In some references, you may find the conclusion stated as $\nabla f(a) + \lambda \nabla g(a) = 0$.
Proof. For simplicity, we present a proof when n = 3, but the same proof works for any n ≥ 2.
Let $a = (a_1, a_2, a_3)$ and denote the arguments of f and g by x, y, z. Since $\nabla g(a) \neq 0$, at least
one among the partial derivatives $\partial_1 g(a) = \frac{\partial g}{\partial x}(a)$, $\partial_2 g(a) = \frac{\partial g}{\partial y}(a)$ and $\partial_3 g(a) = \frac{\partial g}{\partial z}(a)$ does not
vanish. After relabeling the variables, we may assume that $\partial_3 g(a) \neq 0$. Since $g(a_1, a_2, a_3) = 0$,
by the implicit function theorem, there exists a smooth map ϕ defined on a neighborhood W of
$(a_1, a_2)$ such that $a_3 = \varphi(a_1, a_2)$ and

g(x, y, ϕ(x, y)) = 0 ∀(x, y) ∈ W.

It follows that the partial derivatives with respect to x and y of this map of two variables vanish.
By the chain rule
∂1 g + ∂3 g ∂1 ϕ = 0
∂2 g + ∂3 g ∂2 ϕ = 0
where the partial derivatives are evaluated at (x, y, ϕ(x, y)). In particular

∂1 g(a) + ∂3 g(a) ∂1 ϕ(a1 , a2 ) = 0


∂2 g(a) + ∂3 g(a) ∂2 ϕ(a1 , a2 ) = 0.

It follows that
\[ -\partial_1 \varphi(a_1, a_2) = \frac{\partial_1 g(a)}{\partial_3 g(a)} \quad \text{and} \quad -\partial_2 \varphi(a_1, a_2) = \frac{\partial_2 g(a)}{\partial_3 g(a)}. \tag{5.11} \]
If (x, y) is sufficiently close to $(a_1, a_2)$, then continuity of ϕ implies that (x, y, ϕ(x, y)) is close to
a. It follows that the map (x, y) ↦ f(x, y, ϕ(x, y)) has a local extremum at $(a_1, a_2)$. Therefore
$(a_1, a_2)$ is a critical point for this map. By the chain rule again

∂1 f (a) + ∂3 f (a) ∂1 ϕ(a1 , a2 ) = 0


∂2 f (a) + ∂3 f (a) ∂2 ϕ(a1 , a2 ) = 0.

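If $\partial_3 f(a) = 0$, then the above system implies that $\partial_1 f(a) = \partial_2 f(a) = 0$. Therefore
∇f(a) = 0 and so we can take λ = 0. If not, then
\[ -\partial_1 \varphi(a_1, a_2) = \frac{\partial_1 f(a)}{\partial_3 f(a)} \quad \text{and} \quad -\partial_2 \varphi(a_1, a_2) = \frac{\partial_2 f(a)}{\partial_3 f(a)}. \]
Recalling equation (5.11), we see that
\[ \frac{\partial_1 f(a)}{\partial_3 f(a)} = \frac{\partial_1 g(a)}{\partial_3 g(a)} \quad \text{and} \quad \frac{\partial_2 f(a)}{\partial_3 f(a)} = \frac{\partial_2 g(a)}{\partial_3 g(a)}. \]
Letting $\lambda = \frac{\partial_3 f(a)}{\partial_3 g(a)}$, we get the conclusion. ∎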

Methodology
Consider the problem of finding the minimum and maximum values of a smooth function f of n
real variables with the smooth constraint g(x) = 0.

1. Make sure that the maximum and minimum exist. This is the case when $g^{-1}(0)$ is compact and nonempty.

2. We set up the system of n + 1 equations in n + 1 unknowns (the n coordinates of a and the multiplier λ)
\[ \begin{cases} \nabla f(a) = \lambda \nabla g(a) \\ g(a) = 0 \end{cases} \]
and we try to solve it. This system can be hard or even impossible to solve with a simple
formula because it is not necessarily linear in a. In this case, we may solve it numerically.
However, there are some simple cases where we can solve it explicitly.

3. In the latter case, we evaluate f at the solutions found. Provided that ∇g does not vanish on
$g^{-1}(0)$, the biggest value is the maximum and the smallest value is the minimum.

Example. Find the maximum and minimum values of f(x, y, z) = x + y + z subject to the
constraint $g(x, y, z) = x^2 + y^2 + z^2 - 1 = 0$. First, we observe that there is a minimum and a
maximum value because x + y + z is continuous and the sphere is compact. Second, the gradient
of g vanishes only at the origin, which does not lie on the sphere. Therefore we are sure that there
is a Lagrange multiplier at each extremum point. The system we want to solve is
\[ \begin{cases} 1 = 2\lambda x \\ 1 = 2\lambda y \\ 1 = 2\lambda z \\ x^2 + y^2 + z^2 = 1. \end{cases} \]
This gives $x = y = z = \frac{1}{2\lambda}$ and $x^2 + y^2 + z^2 = 1$. Therefore $\frac{3}{4\lambda^2} = 1$ and so $\lambda = \pm\frac{\sqrt{3}}{2}$. This gives
two candidates for the maximum and minimum: $\left(\frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}, \frac{1}{\sqrt{3}}\right)$ and $\left(-\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}, -\frac{1}{\sqrt{3}}\right)$. The first point is
necessarily a global maximum point because f takes the value $\sqrt{3}$ there, and the second one is necessarily
a global minimum point because f takes the value $-\sqrt{3}$ there.
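
The example can be double-checked numerically. The sketch below is only a sanity check under stated assumptions: it verifies that the two candidates satisfy the Lagrange system, and it samples random points on the unit sphere (obtained by normalising Gaussian vectors, with a fixed seed) to confirm that f never leaves the interval between the two candidate values. All function names are hypothetical.

```python
import math
import random

def f(p):
    return sum(p)

def g(p):
    return sum(c * c for c in p) - 1.0

def grad_f(p):
    return (1.0, 1.0, 1.0)

def grad_g(p):
    return tuple(2.0 * c for c in p)

# Candidates x = y = z = 1 / (2 * lam) for the two admissible multipliers.
candidates = []
for lam in (math.sqrt(3.0) / 2.0, -math.sqrt(3.0) / 2.0):
    point = (1.0 / (2.0 * lam),) * 3
    candidates.append((lam, point))

for lam, point in candidates:
    # Largest component of grad f - lam * grad g; it should be numerically zero.
    residual = max(abs(df - lam * dg) for df, dg in zip(grad_f(point), grad_g(point)))
    print(lam, point, f(point), g(point), residual)

random.seed(0)
samples = []
for _ in range(10000):
    v = [random.gauss(0.0, 1.0) for _ in range(3)]
    norm = math.sqrt(sum(c * c for c in v))
    samples.append(f([c / norm for c in v]))   # value of f at a random point of the sphere

print(min(samples), max(samples))
```

The sampled values stay between the two candidate values, which is consistent with the conclusion of the example.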
