Differential Calculus in Banach Spaces
Sometimes we need to consider a function of several variables or a function with several components.
For example the temperature may be a function of three spatial variables x, y, z and time t. This is
an example of what we call a scalar field. The velocity of a fluid may be a function of space and time
(x, y, z, t) and has three components. This is an example of a vector field. This leads to the study
of functions f : Rn → Rm, also called vector fields. We can push the generalization further
and consider functions between normed spaces. But how do we define the notion of differentiability
in this context? And will the results mentioned above still hold in this more general setting? This
is the purpose of the course: to extend the above results to functions between normed spaces
(which will be Banach spaces most of the time).
Chapter 1
Preliminaries on Normed Spaces
Remark. Condition (iii) can be replaced by the following condition known as the second form of
the triangle inequality.
(iii)' | ||x|| − ||y|| | ≤ ||x − y|| for all x, y ∈ E.
A normed space (or normed linear space, or normed vector space) is a vector space equipped
with a norm. A norm defines a distance by setting d(x, y) = ||x − y||. Therefore, a normed space
is a metric space and therefore a topological space. Thus, in a normed space, one can talk about
open balls, open sets, closed sets, compact sets...
The open ball of center a and radius r > 0 is the set B(a, r) = {x ∈ E; ||x − a|| < r}, the closed
ball is B′(a, r) = {x ∈ E; ||x − a|| ≤ r}, and the sphere is S(a, r) = {x ∈ E; ||x − a|| = r}. The
closed unit ball and the unit sphere of E are
BE = {x ∈ E; ||x|| ≤ 1} and SE = {x ∈ E; ||x|| = 1}.
Recall that the closure of B(a, r) is B′(a, r), the interior of B′(a, r) is B(a, r), and
S(a, r) = ∂B(a, r) = ∂B′(a, r).
A Banach space is a normed space which is complete, i.e., in which every Cauchy sequence is
convergent.
Examples
1) Let N ∈ N∗ and p ≥ 1. For x = (x1 , . . . , xN ) ∈ RN , set
||x||_p = ( ∑_{i=1}^{N} |x_i|^p )^{1/p} .
||x||_p = ( ∑_{n=1}^{∞} |x_n|^p )^{1/p} ,
For f ∈ C[a, b], set ||f||_p = ( ∫_a^b |f(t)|^p dt )^{1/p}. Then we get a norm but the space (C[a, b], || · ||_p ) is not complete.
Remark. lim_{p→+∞} ||f||_p = ||f||_∞ , hence the notation.
8) Let X be a set and Y be a normed space. We denote by Cb (X, Y ) the set of all bounded
functions f : X → Y . This is a vector space. For f ∈ Cb (X, Y ), we set ||f||_∞ = sup_{x∈X} ||f(x)||_Y .
1. E is complete.
Equivalent norms
Definition. Let E be a vector space and let || · ||1 and || · ||2 be two norms on E. We say that
these two norms are equivalent if there exist two positive constants α and β such that
α||x||_1 ≤ ||x||_2 ≤ β||x||_1 for all x ∈ E.
Proposition 1.2. Let || · ||1 and || · ||2 be two norms on a vector space E. Then the following
conditions are equivalent.
In this case, (E, || · ||1 ) is complete if and only if (E, || · ||2 ) is complete.
Proposition 1.3. On a finite dimensional vector space, all norms are equivalent.
Remark. The converse is also true. Actually, if E is a vector space, then the following conditions
are equivalent.
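To make the finite dimensional case concrete, here is a small numerical sketch (not part of the notes; it assumes Python with numpy) checking the standard equivalence constants between || · ||_∞ , || · ||_2 and || · ||_1 on R^N.

import numpy as np

# On R^N the classical norms satisfy  ||x||_inf <= ||x||_2 <= ||x||_1 <= N ||x||_inf.
N = 5
rng = np.random.default_rng(0)
for _ in range(1000):
    x = rng.standard_normal(N)
    n_inf = np.linalg.norm(x, np.inf)
    n_2 = np.linalg.norm(x, 2)
    n_1 = np.linalg.norm(x, 1)
    assert n_inf <= n_2 + 1e-12
    assert n_2 <= n_1 + 1e-12
    assert n_1 <= N * n_inf + 1e-12
print("chain ||x||_inf <= ||x||_2 <= ||x||_1 <= N ||x||_inf verified on random samples")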
Proposition 1.4. Let (E, || · ||E ) and (F, || · ||F ) be two normed spaces. Set
||(x, y)||1 = ||x||E + ||y||F .
||(x, y)||∞ = max(||x||E , ||y||F ).
||(x, y)||_2 = ( ||x||_E^2 + ||y||_F^2 )^{1/2} .
Then the following hold.
(a) All the above norms are equivalent and generate the product topology.
(b) If (E, || · ||E ) and (F, || · ||F ) are complete, then (E × F, || · ||1 ) is complete.
More generally, we have
Proposition 1.5. Let (E1 , || · ||E1 ), (E2 , || · ||E2 ), . . . , (En , || · ||En ) be normed spaces. For
x = (x1 , . . . , xn ) ∈ ∏_{k=1}^{n} Ek , set
||x||_1 = ∑_{k=1}^{n} ||x_k||_{E_k} .
Proof. Let (λ, x) ∈ R × E and let (λn , xn ) be a sequence in R × E converging to (λ, x). Then
it is easy to see that λn → λ in R and xn → x in E. Now,
||T (λn , xn ) − T (λ, x)|| = ||λn xn − λx|| = ||λn xn − λn x + λn x − λx||
≤ ||λn (xn − x)|| + ||(λn − λ)x||
= |λn |||xn − x|| + |λn − λ|||x||.
Since (λn ) is convergent, it is bounded. Therefore the first term tends to 0 since ||xn − x|| → 0.
The second term tends to 0 because |λn − λ| → 0. It follows that T (λn , xn ) → T (λ, x). Hence the
continuity of T .
The proof of the continuity of S is similar but easier.
Convexity
Let A be a subset of a vector space. We say that A is convex if for all x, y ∈ A and all t ∈ [0, 1],
we have (1 − t)x + ty ∈ A.
Proposition 1.7. Let E and F be normed spaces and let L : E → F be a linear operator. Then
the following conditions are equivalent.
(i) L is Lipschitz continuous i.e., there exists a constant M ≥ 0 such that ||Lx−Ly|| ≤ M ||x−y||
for all x, y ∈ E.
(iii) L is continuous.
(viii) There exists a constant M ≥ 0 such that ||Lx|| ≤ M ||x|| for all x ∈ E.
If L satisfies one of the above properties, then L is called a bounded linear operator or a
continuous linear operator. The space of continuous linear operators from E to F is a vector space
denoted by L(E, F ) or by B(E, F ). If F = E, we write L(E) or B(E) instead of L(E, E). For
L ∈ L(E, F ), we set
||L|| = sup_{||x||≤1} ||Lx||   (the operator norm).
Then, it turns out that this defines a norm on L(E, F ), which is also denoted by ||L||_{L(E,F )} .
We also have
||L|| = sup_{||x||=1} ||Lx|| = sup_{||x||<1} ||Lx|| = inf{C ≥ 0; ||Lx|| ≤ C||x|| for all x ∈ E}.
It is easily seen that ||Lx|| ≤ ||L|| ||x|| for all x ∈ E, and so the infimum is a minimum, but the
supremum need not be a maximum.
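As an illustration, here is a small numerical sketch (not from the notes; it assumes Python with numpy). For a matrix A acting on (R^4, || · ||_2 ), the operator norm is the largest singular value, and sampling the unit sphere gives a lower approximation of sup_{||x||≤1} ||Ax||.

import numpy as np

rng = np.random.default_rng(1)
A = rng.standard_normal((4, 4))

# Exact operator norm of A : (R^4, ||.||_2) -> (R^4, ||.||_2): the largest singular value.
exact = np.linalg.norm(A, 2)

# Approximate sup_{||x|| <= 1} ||Ax|| by sampling the unit sphere.
best = 0.0
for _ in range(20000):
    x = rng.standard_normal(4)
    x /= np.linalg.norm(x)               # ||x|| = 1
    best = max(best, np.linalg.norm(A @ x))

# best is a lower bound for exact and gets close as the sampling increases.
print("sampled sup ||Ax||:", best, "   exact ||A||:", exact)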
Proposition 1.9. Let E and F be normed spaces. If F is a Banach space then L(E, F ) is also a
Banach space.
Definition. Let E be a normed space. The space L(E, R) of all linear bounded functionals
f : E → R is called the dual of E and it is denoted by E∗. The norm on E∗ is therefore defined by
||f||_{E∗} = sup_{||x||≤1} |f (x)|.
It follows that |Lx| ≤ ||x||_p ||y||_{p′} and so ||L|| ≤ ||y||_{p′} . Actually we have equality.
Proposition 1.11. If two normed spaces are isomorphic then either they are both complete or
both incomplete.
1. T is well defined.
2. T is linear.
4. T is surjective.
5. T −1 is given by
T −1 : L(R, E) → E,  g ↦ g(1).
Example 2. Let n ∈ N∗ . Equip Rn with some norm || · || and fix a basis B = {e1 , . . . , en }. Let
Mn (R) denote the space of n × n real matrices. For A ∈ Mn (R), set ||A|| = sup_{||x||≤1} ||Ax||.
Then this defines a norm on Mn (R). Equip L(Rn ) with the operator norm. Then L(Rn ) ≅ Mn (R).
An isometry is given by the operator T that associates with each L ∈ L(Rn ) its matrix in the basis
B. It should be clear that T is linear, bijective and ||T (L)|| = ||L||.
Example 3. Let E be a normed space, let λ be a real number with λ ≠ 0 and |λ| ≠ 1, and let
Lx = λx (homothety). Then L is an isomorphism of E which is not an isometry.
Example 4. Let n ≥ 2. Then (Rn , || · ||1 ) and (Rn , || · ||2 ) are isomorphic but not isometric.
Openness of Isom(E, F )
We know from a previous example that we can identify L(R) with R. The isometry between L(R)
and R was denoted by T −1 and is given by T −1 g = g(1). This is because if g ∈ L(R), then by
linearity, we can write g(x) = g(x1) = xg(1) = ax. This identification amounts to identifying a
straight line through the origin with its slope.
What is then Isom(R)? Let g ∈ L(R). Then g is bijective if and only if g(1) ≠ 0. It follows
that Isom(R) can be identified with R\{0}.
Observe now that R\{0} is an open subset of R. Equivalently, if x0 ≠ 0 then x0 + h ≠ 0 if h
is small enough.
More generally, the isometry that takes L(Rn ) into Mn (R) takes Isom(Rn ) into the set of
invertible matrices GL(n, R) = {A ∈ Mn (R); det A ≠ 0}. So we can identify Isom(Rn ) with
GL(n, R). The set of invertible n × n matrices is open in Mn (R). This means that if we perturb
an invertible matrix a little bit, the resulting matrix is still invertible.
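The following small experiment (a sketch, not from the notes; it assumes Python with numpy, and the matrix and perturbation size are illustrative choices) shows that a sufficiently small perturbation of an invertible matrix keeps the determinant away from 0.

import numpy as np

rng = np.random.default_rng(2)
A = np.eye(3) + 0.1 * rng.standard_normal((3, 3))   # an invertible matrix close to I
print("det A =", np.linalg.det(A))

# Perturb A by matrices H of small operator norm: A + H stays invertible.
for _ in range(1000):
    H = rng.standard_normal((3, 3))
    H *= 1e-3 / np.linalg.norm(H, 2)                 # ||H|| = 1e-3
    assert abs(np.linalg.det(A + H)) > 1e-6          # still invertible
print("all small perturbations of A remained invertible")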
Is it true more generally that Isom(E, F ) is an open subset of L(E, F )? The answer is yes. But
to prove it, we will first need to establish a preliminary result which is a particular case of the
general result.
The preliminary result is an extension to linear operators of the geometric series
1/(1 − x) = 1 + x + x^2 + · · · + x^n + · · · = ∑_{n=0}^{∞} x^n , provided |x| < 1.
Theorem 1.2. Let E be a Banach space and let S ∈ L(E) satisfy ||S|| < 1. Then IE − S ∈
Isom(E) and
(IE − S)^{−1} = IE + S + S^2 + · · · = ∑_{n=0}^{∞} S^n .
Proof. Since ||S|| < 1, the geometric series of real numbers ∑_{n≥0} ||S||^n is convergent. Because
||S^n|| ≤ ||S||^n , the comparison test for series implies that the series ∑_{n≥0} ||S^n|| is convergent,
that is, the series ∑_{n≥0} S^n is normally (or absolutely) convergent. Completeness of E implies
that L(E) is complete, and so this series is convergent. Let T denote the sum of this series.
Observe that T ∈ L(E).
Then using the continuity of S and T we see that
T S = ST = ∑_{n=0}^{∞} S^{n+1} = ∑_{n=1}^{∞} S^n .
It follows that
T (IE − S) = T − T S = ∑_{n=0}^{∞} S^n − ∑_{n=1}^{∞} S^n = IE ,
and
(IE − S)T = T − ST = ∑_{n=0}^{∞} S^n − ∑_{n=1}^{∞} S^n = IE .
This means that IE − S is invertible and
(IE − S)^{−1} = T = ∑_{n=0}^{∞} S^n .
Remark. Here's an equivalent way of stating the above theorem. Let E be a Banach space and
let L ∈ L(E) satisfy ||IE − L|| < 1. Then L ∈ Isom(E) and
L^{−1} = ∑_{n=0}^{∞} (IE − L)^n .
By the previous theorem, S0−1 S is invertible and so S is invertible, i.e. S ∈ Isom(E, F ). This proves
the claim and so point (a).
(b) Proof/Exercise. Let S0 ∈ Isom(E, F ). To prove continuity at S0 , let S ∈ B(S0 , 1/||S0^{−1}||)
and set T = IE − S0^{−1} S. Prove the following points.
2. ||T || < 1.
3. T → 0 as S → S0 .
5. We have
||Φ(S) − Φ(S0 )|| ≤ ( ||T|| / (1 − ||T||) ) ||S0^{−1}|| .
Continuity of Φ follows.
Remark 1. Do not confuse a linear map of two variables with a bilinear map. For example, let
E1 = E2 = F = C[0, 1]. For u, v ∈ C[0, 1], we set L(u, v) = uv. Then L : E1 × E2 → F is bilinear
but not linear because
L((u, v) + (h1 , h2 )) = L(u + h1 , v + h2 ) = (u + h1 )(v + h2 ) = uv + uh2 + h1 v + h1 h2 ,
whereas
L(u, v) + L(h1 , h2 ) = uv + h1 h2 .
Also
L(α(u, v)) = L(αu, αv) = α2 uv
whereas
αL(u, v) = αuv.
(i) L is continuous.
(ii) L is continuous at 0.
Definition. A bilinear continuous map is also called a bounded bilinear map (because of condition
(iii)). The set of bilinear continuous maps L : E1 × E2 → F is denoted by L(E1 , E2 ; F ) or
B(E1 , E2 ; F ). It is a vector space. If E1 = E2 = E, we also write L2 (E; F ) instead of L(E, E; F ).
Proposition. For a bilinear bounded map L : E1 × E2 → F , we set
||L|| = sup_{||x1||≤1, ||x2||≤1} ||L(x1 , x2 )|| .
Then, this defines a norm on L(E1 , E2 ; F ). If F is complete, then this space is complete.
Remark. For L ∈ L(E1 , E2 ; F ) we also have
||L|| = sup_{||x||=1} ||Lx|| = sup_{||x||<1} ||Lx|| = inf{C > 0; ||Lx|| ≤ C ||x1 || ||x2 ||}.
It is easily seen that ||Lx|| ≤ ||L|| ||x1 || ||x2 || for all x = (x1 , x2 ) ∈ E1 × E2 , and so the
infimum is a minimum, but the supremum need not be a maximum.
Examples.
a) Let E = F = C[a, b] be equipped with the norm of uniform convergence. For u, v ∈ C[a, b], we
set L(u, v) = uv. Then L ∈ L2 (E; F ) and ||L|| = 1.
b) More generally, let ϕ ∈ C[a, b] be fixed. We set L(u, v) = ϕuv. Then L : E × E → F is bilinear
and bounded and ||L|| = ||ϕ||∞ .
c) Under the same assumptions, for u, v ∈ E, let L(u, v) be the antiderivative of ϕuv that vanishes
at a, i.e.,
(L(u, v))(x) = ∫_a^x ϕ(t)u(t)v(t) dt.
Let E and F be normed spaces and let L ∈ L(E, L(E, F )). This means in particular that for every
x ∈ E, Lx is an element of L(E, F ) and so for y ∈ E, L(x)(y) is an element of F . Since L is linear
and takes values in a space of linear maps, this suggests that we can view L as a bilinear operator
from E × E into F . This is the idea behind the next result.
Theorem 1.4. Let E and F be normed spaces and equip E × E with the norm ||(x, y)|| =
max(||x||, ||y||). Then L(E, L(E, F )) ≅ L2 (E; F ).
Proof. Consider the operator Φ : L(E, L(E, F )) → L2 (E; F ) defined by Φ(L)(x, y) = (Lx)(y).
It is easy to see that Φ is well defined, linear, bijective and its inverse is given by
(Φ^{−1}(B)x)(y) = B(x, y).
Next,
||Φ(L)|| = sup_{||(x,y)||≤1} ||(Lx)(y)||
= sup_{||x||≤1, ||y||≤1} ||(Lx)(y)||
= sup_{||x||≤1} ||Lx||
= ||L||.
Now we move to the general case.
Definition. Let E1 , E2 , . . . , En and F be vector spaces and let L : E1 × E2 × · · · × En → F be
a map (L is a function of n variables). We say that L is multilinear if it is linear in each variable.
Proposition 1.12. Let E1 , E2 , . . . , En and F be normed spaces and let L : E1 ×E2 ×· · ·×En → F
be multilinear. Equip E1 × E2 × · · · × En with the norm ||(x1 , x2 , . . . , xn )|| = max_{i=1,...,n} ||xi || (or any
equivalent norm). Then the following conditions are equivalent.
(i) L is continuous.
(ii) L is continuous at 0.
(vi) There exists a constant M > 0 such that ||L(x1 , x2 , . . . , xn )|| ≤ M ||x1 || ||x2 || · · · ||xn || for all
(x1 , x2 , . . . , xn ) ∈ ∏_{i=1}^{n} Ei .
Definition. A multilinear continuous map is also called a bounded multilinear map (because of
condition (iii)). The set of multilinear continuous maps L : E1 × E2 × · · · × En → F is denoted
by L(E1 , E2 , . . . , En ; F ) or B(E1 , E2 , . . . , En ; F ). If Ei = E for all i = 1, . . . , n, we also write
Ln (E; F ) instead of L(E, E, . . . , E; F ).
Proposition. For a multilinear bounded map L : ∏_{i=1}^{n} Ei → F , we set
||L|| = sup_{||x||=1} ||Lx|| = sup_{||x||<1} ||Lx|| = inf{C > 0; ||Lx|| ≤ C ||x1 || ||x2 || · · · ||xn ||}.
It is easily seen that ||Lx|| ≤ ||L|| ||x1 || ||x2 || · · · ||xn || for all x ∈ ∏_{i=1}^{n} Ei , and so the infimum
is a minimum, but the supremum need not be a maximum.
Theorem 1.5. Let E and F be normed spaces and equip E^n with the norm ||(x1 , . . . , xn )|| =
max_{i=1,...,n} ||xi ||. Then Ln (E; Lm (E; F )) ≅ Ln+m (E; F ).
Proof. Consider the operator Φ : Ln (E; Lm (E; F )) → Ln+m (E; F ) defined by
Φ(L)(x1 , . . . , xn , xn+1 , . . . , xn+m ) = (L(x1 , . . . , xn ))(xn+1 , . . . , xn+m ).
It is easy to see that Φ is well defined, linear, bijective and its inverse Φ^{−1} : Ln+m (E; F ) →
Ln (E; Lm (E; F )) is given by
(Φ^{−1}(B)(x1 , . . . , xn ))(xn+1 , . . . , xn+m ) = B(x1 , . . . , xn , xn+1 , . . . , xn+m ).
To prove that ||Φ(L)|| = ||L||, we proceed as in the proof of Theorem 1.4, but we replace x by
(x1 , . . . , xn ) and y by (xn+1 , . . . , xn+m ).
(i) L is bilinear.
An inner product is usually denoted by ⟨·, ·⟩ or (·|·) or (·, ·). A vector space equipped with an inner
product is called an inner product space or a pre-Hilbert space. An inner product induces a norm by
setting
||x|| = ⟨x, x⟩^{1/2} .
The triangle inequality is a consequence of the fundamental Cauchy-Schwarz inequality
|⟨x, y⟩| ≤ ||x|| ||y||.
Thus, an inner product space is a normed space. An inner product space is called a Hilbert space if
it is complete.
Examples.
1) Let N ∈ N∗ . For x = (x1 , . . . , xN ) and y = (y1 , . . . , yN ) ∈ RN , we set
⟨x, y⟩ = ∑_{i=1}^{N} xi yi .
This is an inner product on RN that generates the usual Euclidean norm || · ||2 . Equipped with this
inner product RN is a Hilbert space.
2) We set
ℓ2 = {x = (xn ); ∑_{n=1}^{∞} |xn |^2 < +∞}.
For x, y ∈ ℓ2 , we set
⟨x, y⟩ = ∑_{i=1}^{∞} xi yi .
Then we get an inner product, and ℓ2 is a Hilbert space. However, C[a, b] equipped with the inner
product ⟨u, v⟩ = ∫_a^b u(t)v(t) dt is not a Hilbert space.
Theorem 1.6. (Riesz representation theorem). Let H be a Hilbert space and let f ∈ H∗ =
L(H, R) (the dual of H). Then there exists a unique element a ∈ H such that
f (x) = ⟨a, x⟩ for all x ∈ H,
and
||f || = ||a||.
Corollary 1.1. H∗ ≅ H.
Chapter 2
Differentiable maps
f (x) = o(g(x)) as x → a (or near x = a) if lim_{x→a} ||f (x)|| / ||g(x)|| = 0. This is equivalent to the
following condition: for every ε > 0, there is a neighborhood V of a such that ||f (x)|| ≤ ε||g(x)||
for all x ∈ V .
f (x) = O(g(x)) as x → a if there exists a constant C such that ||f (x)|| ≤ C||g(x)|| for all x
in a neighborhood of a.
Examples.
Remark. The conditions f (h) = o(h) and f (h) = o(||h||) mean the same thing.
Some rules
For a real variable x near 0, we have
o(x)o(x) = o(x2 ).
O(x)O(x) = O(x2 ).
Motivation
Let us revisit the notion of the derivative of a map f : I ⊂ R → R. We assume that I is open. Let
a ∈ I. We say that f is differentiable at a if the following limit exists
ℓ := lim_{h→0} (f (a + h) − f (a)) / h .
In this case this limit is denoted by f ′(a) or df/dx|_a and it is called the derivative of f at a.
Geometrically, f ′(a), when it exists, is the slope of the tangent to the graph of f at the point
(a, f (a)). We say that f is differentiable on I if it is differentiable at every point of I.
Remark 1. If we want f to be differentiable on I, we must have the property
(i) f is differentiable at a.
(ii) There exists ℓ ∈ R such that lim_{h→0} [ (f (a + h) − f (a))/h − ℓ ] = 0.
(iii) There exists ℓ ∈ R such that lim_{h→0} (f (a + h) − f (a) − ℓh)/h = 0.
(iv) There exists ` ∈ R such that f (a + h) − f (a) − `h = o(h).
The last three formulations of differentiability have a common geometric interpretation: the graph
of f near the point (a, f (a)) can be approximated by its tangent, otherwise stated, the graph of f
looks locally like a line.
We can extend this definition to maps between inner product spaces if we replace ℓh by ⟨ℓ, h⟩.
But we can do better. As a function of h, the term ℓh is linear (and of course continuous). This
leads to the following definition.
Definitions
Let E and F be normed spaces, let U ⊂ E be open, a ∈ U , and let f : U → F be a map. We
say that f is differentiable in the sense of Fréchet (or Fréchet differentiable) at the point a if there
exists L ∈ L(E, F ) such that
lim_{h→0} ||f (a + h) − f (a) − Lh|| / ||h|| = 0,
or equivalently,
||f (a + h) − f (a) − Lh|| = o(||h||) near h = 0,
or equivalently,
f (x) − f (a) − L(x − a) = o(x − a) for x near a.
If you don't like this small o notation, you can write
f (a + h) − f (a) − Lh = ε(h) where lim_{h→0} ε(h)/||h|| = 0,
or
∀ε > 0, ∃δ > 0 s.t ||h|| ≤ δ ⇒ ||f (a + h) − f (a) − Lh|| ≤ ε||h||.
We will see that if this condition holds, then the operator L is unique. It is called the Fréchet
derivative or the Fréchet differential of f at the point a. It is denoted by one of the following
symbols
f 0 (a), Df (a), dfa , df (a).
Remark. We shall see that there is a weaker form of differentiability called Gâteaux differentiability.
But since we shall not study it in detail, we will refer to the above condition as differentiability.
||Sx|| ≤ ε||x||.
Since this inequality also holds for x = 0, it is true for all x ∈ E. It follows that S is bounded and
||S|| ≤ ε. Since ε was arbitrary, we get ||S|| = 0 and so S = 0.
Proof of Proposition 2.1. Suppose that there is another bounded linear operator T such that
f (a + h) − f (a) − T h = o(h).
Subtracting this from
f (a + h) − f (a) − Lh = o(h),
we get
(T − L)h = o(h).
It follows from the lemma that T − L = 0 and so T = L.
If f is differentiable at every point of U , we obtain a map
f ′ : U → L(E, F ).
If this function is continuous (at every point of U ) we say that f is of class C 1 or that f is
continuously differentiable.
Remark. Do not confuse the continuity of f 0 (x) : E → F with the continuity of f 0 : U → L(E, F ).
When it exists, f 0 (x) is continuous by definition. Whereas f 0 need not be continuous. Otherwise
stated, f 0 (x)h is continuous in h but not necessarily continuous in x. Here’s an example of a
differentiable function which is not of class C 1 .
Example. Let
f (x) = x^2 sin(1/x) if x ≠ 0, and f (0) = 0.
Then f is differentiable at every point x ≠ 0. It is also differentiable at 0 and f ′(0) = 0 because
lim_{h→0} (f (h) − f (0))/h = lim_{h→0} h sin(1/h) = 0.
Therefore
f ′(x) = 2x sin(1/x) − cos(1/x) if x ≠ 0, and f ′(0) = 0.
Since cos(1/x) has no limit as x → 0, f ′ has no limit at 0, so f ′ is not continuous at 0 and f is
not of class C 1 on any neighborhood of 0.
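A quick numerical look at this example (a sketch, not from the notes; it assumes Python with numpy): the difference quotients at 0 tend to 0, while f ′(x) keeps oscillating between values close to ±1 as x → 0.

import numpy as np

def f(x):
    return x**2 * np.sin(1.0 / x) if x != 0 else 0.0

def fprime(x):                           # the formula valid for x != 0
    return 2 * x * np.sin(1.0 / x) - np.cos(1.0 / x)

# Difference quotients at 0 go to 0, confirming f'(0) = 0 ...
for h in [1e-1, 1e-3, 1e-5]:
    print("h =", h, "  (f(h)-f(0))/h =", f(h) / h)

# ... but f' has no limit at 0: along x_n = 1/(n*pi) it equals -cos(n*pi) = (-1)^(n+1).
for n in [10**3, 10**3 + 1, 10**6, 10**6 + 1]:
    x = 1.0 / (n * np.pi)
    print("x =", x, "  f'(x) =", fprime(x))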
Before we give some fundamental examples and counter-examples, let us formulate two basic
facts about differentiability.
||h||E ≤ δ ⇒ ||f (a + h) − f (a) − f ′(a)h||F ≤ (βε/α) ||h||E .
Find δ1 > 0 such that
Fundamental examples
Now we give some fundamental examples and counter-examples.
Proposition 2.4. If f : U → F is constant, then f is C 1 and its derivative is the 0 operator.
Proof. Since f is constant we have f (a + h) = f (a) and so if L is the zero operator we have
f (a + h) − f (a) − Lh = 0 = o(h).
Proposition 2.5. Let f ∈ L(E, F ), then f is of class C 1 and at every point x ∈ E we have
f 0 (x) = f and so f 0 is constant.
Proof. By linearity of f , we have f (x + h) = f (x) + f (h) and so
f (x + h) − f (x) − f (h) = 0 = o(h).
Proposition 2.6. Let f ∈ L(E, F ; G), then f is of class C 1 and for any x = (x1 , x2 ), h = (h1 , h2 )
in E × F we have
f 0 (x1 , x2 )(h1 , h2 ) = f (h1 , x2 ) + f (x1 , h2 ).
Proof. Here we equip E × F with the norm ||(x1 , x2 )|| = max(||x1 ||, ||x2 ||), but we can take
any other equivalent norm like ||(x1 , x2 )|| = ||x1 || + ||x2 ||. By the bilinearity of f , we have
f (x1 + h1 , x2 + h2 ) = f (x1 , x2 ) + f (x1 , h2 ) + f (h1 , x2 ) + f (h1 , h2 ).
Therefore
f (x1 + h1 , x2 + h2 ) − f (x1 , x2 ) − f (h1 , x2 ) − f (x1 , h2 ) = f (h1 , h2 ).
Now (h1 , h2 ) ↦ f (h1 , x2 ) + f (x1 , h2 ) is linear and bounded (check that). Finally,
||f (h1 , h2 )|| ≤ ||f || ||h1 || ||h2 || ≤ ||f || ||(h1 , h2 )||^2 = O(||h||^2 ) = o(||h||).
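A small numerical check of Proposition 2.6 (a sketch, not from the notes; it assumes Python with numpy, and the bilinear map f(x1, x2) = x1 · x2 on R^3 × R^3 is an illustrative choice): the remainder f(h1, h2) is quadratic in the increment, hence o(||h||).

import numpy as np

rng = np.random.default_rng(4)

def f(x1, x2):
    return np.dot(x1, x2)            # a bounded bilinear map R^3 x R^3 -> R

x1, x2 = rng.standard_normal(3), rng.standard_normal(3)
h1, h2 = rng.standard_normal(3), rng.standard_normal(3)

for t in [1e-1, 1e-2, 1e-3, 1e-4]:
    # candidate derivative at (x1, x2) applied to the increment t*(h1, h2)
    linear_part = f(t * h1, x2) + f(x1, t * h2)
    remainder = f(x1 + t * h1, x2 + t * h2) - f(x1, x2) - linear_part
    print("t =", t, "  remainder / t =", remainder / t)    # tends to 0 like t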
Equivalently,
lim_{t→0} (f (a + t ei ) − f (a) − t Df (a)ei ) / t = 0.
Equivalently,
lim_{t→0} [ (f (a + t ei ) − f (a)) / t − Df (a)ei ] = 0.
Equivalently,
lim_{t→0} (f (a + t ei ) − f (a)) / t = Df (a)ei .
By definition, the existence of this limit means the existence of the ith partial derivative:
∂f/∂xi (a) = Df (a)ei .
Therefore ∂f/∂x (0, 0) exists and is equal to 0. Similarly ∂f/∂y (0, 0) exists and is equal to 0.
However, f is not continuous at 0 and so cannot be differentiable there.
Proposition 2.10. Let E be a normed space and let f (x) = ||x||. Then f is not differentiable
at 0.
Proof. Suppose that f is differentiable at 0. Then ||h|| − Df (0)h = o(h). Replacing h by −h,
we get ||h|| + Df (0)h = o(h). Adding the two equations we get 2||h|| = o(h) which is impossible.
The last result of this section generalizes the fact that the map t ↦ 1/t is C 1 on R\{0} and its
derivative is t ↦ −1/t^2 .
Theorem 2.1. Let E and F be isomorphic Banach spaces and consider the map Φ : Isom(E, F ) →
L(F, E) defined by Φ(S) = S −1 . Then Φ is of class C 1 and for S ∈ Isom(E, F ) and H ∈ L(E, F )
we have
Φ0 (S)H = −S −1 HS −1 .
Remark. The order of composition matters here since composition of operators is generally not
commutative. Note that Φ′(S)H is a bounded linear operator obtained by the diagram
S −1 H S −1
F −→ E −→ F −→ E
Proof. Check first that for each S ∈ Isom(E, F ), the operator H ↦ −S^{−1}HS^{−1} is indeed linear
and continuous. Now, let H ∈ L(E, F ) be small enough so that S + H ∈ Isom(E, F ). We can write
Φ(S + H) − Φ(S) = (S + H)^{−1} − S^{−1} = −(S + H)^{−1} H S^{−1} .
Therefore
Φ(S + H) − Φ(S) − (−S^{−1}HS^{−1}) = −(S + H)^{−1}HS^{−1} + S^{−1}HS^{−1}
= −( (S + H)^{−1} − S^{−1} ) HS^{−1}
= −( Φ(S + H) − Φ(S) ) HS^{−1} .
Continuity of Φ implies that the term to the right is o(H). This proves that Φ is differentiable and
Φ0 (S)H = −S −1 HS −1 (check that this is linear and continuous in H for every S).
Now we prove that Φ′ : Isom(E, F ) → L(L(E, F ), L(F, E)) is continuous. Consider the map
Ψ : L(F, E) × L(F, E) → L(L(E, F ), L(F, E)) defined by
Ψ(T, L)H = −T HL.
You can check that Ψ is bilinear and continuous because ||Ψ(T, L)|| ≤ ||T || ||L||. Now observe
that
Φ0 (S) = Ψ(S −1 , S −1 ) = Ψ(Φ(S), Φ(S)).
This implies that Φ0 is continuous as a composition of continuous functions.
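Here is a finite-difference check of Theorem 2.1 (a sketch, not in the notes; it assumes Python with numpy, and the matrices are illustrative choices): for matrices, ((S + tH)^{-1} − S^{-1})/t approaches −S^{-1} H S^{-1} as t → 0.

import numpy as np

rng = np.random.default_rng(5)
S = np.eye(4) + 0.2 * rng.standard_normal((4, 4))    # an invertible matrix
H = rng.standard_normal((4, 4))

Sinv = np.linalg.inv(S)
candidate = -Sinv @ H @ Sinv                         # Phi'(S)H according to the theorem

for t in [1e-2, 1e-4, 1e-6]:
    finite_diff = (np.linalg.inv(S + t * H) - Sinv) / t
    print("t =", t, "  error =", np.linalg.norm(finite_diff - candidate, 2))   # decreases like t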
∇v f (a), ∂f/∂v (a), f ′_v (a).
Note that we can write
∇v f (a) = (d/dt) f (a + tv) |_{t=0} .
and so
(f (a + tv) − f (a))/t = Df (a)v + o(t)/t.
It follows that lim_{t→0} (f (a + tv) − f (a))/t exists and
lim_{t→0} (f (a + tv) − f (a))/t = Df (a)v = ∇f (a) · v.
Remark 1. The above relation can also be obtained by the chain rule. Indeed,
∇v f (a) = (d/dt) f (a + tv) |_{t=0} = Df (a + tv)v |_{t=0} = Df (a)v.
Remark 2. If f has a directional derivative at a along any vector, then it has partial derivatives
at a. Indeed, take v to be an element of the canonical basis {e1 , . . . , en }.
Remark 3. The relation ∇v f (a) = ∇f (a) · v has a very important consequence. Suppose that
||v|| = 1 and let θ denote the angle between ∇f (a) and v. Then
∇v f (a) = ∇f (a) · v = ||∇f (a)|| cos θ.
∇v f (a) represents the rate of change of f at the point a in the direction v. It achieves a maximum value when θ = 0,
that is, when v points in the same direction as ∇f (a). Thus, the gradient of f at a represents
the direction in which f increases the most. Therefore −∇f (a) represents the direction in which f
decreases the most. This interpretation of the gradient underlies many minimization algorithms in
numerical analysis and machine learning. Such algorithms are based on iterations of the form
x_{k+1} = x_k − ρ_k ∇f (x_k ),
where ρ_k > 0 is a step size; a sketch is given below.
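The following minimal gradient descent sketch is not from the notes; it assumes Python with numpy, and the quadratic objective f(x) = ||Ax − b||^2 and the fixed step size are illustrative choices.

import numpy as np

rng = np.random.default_rng(6)
A = rng.standard_normal((10, 3))
b = rng.standard_normal(10)

def f(x):
    return np.sum((A @ x - b) ** 2)

def grad_f(x):
    return 2 * A.T @ (A @ x - b)       # gradient of ||Ax - b||^2

x = np.zeros(3)
rho = 0.01                             # a fixed step size (illustrative choice)
for k in range(500):
    x = x - rho * grad_f(x)            # x_{k+1} = x_k - rho * grad f(x_k)

print("final value f(x)       =", f(x))
print("least squares optimum  =", f(np.linalg.lstsq(A, b, rcond=None)[0]))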
Remark. However the converse is not true. Indeed, the map v ↦ df (a, v) need not be linear
or continuous. Some mathematicians include in the definition of the Gâteaux derivative the linearity
and continuity of the map v ↦ df (a, v). However, even if this is true, the function f need not be
differentiable in the sense of Fréchet. We present some examples.
Example 1. Where the Gâteaux derivative exists but is not linear. Let f : R2 → R be defined by
f (x, y) = x^3 / (x^2 + y^2 ) if (x, y) ≠ (0, 0), and f (0, 0) = 0.
Then
df (0, 0, u, v) = u^3 / (u^2 + v^2 ) if (u, v) ≠ (0, 0), and df (0, 0, 0, 0) = 0.
Example 2. Where the Gâteaux derivative exists, is linear but not continuous. Let E be an infinite
dimensional normed space. Then there exists a linear unbounded map f : E → R. It is easy to
check that the Gâteaux derivative at any point x ∈ E is given by
df (x, v) = f (v).
Example 3. Where the Gâteaux derivative is linear and continuous but the Fréchet derivative does
not exist. Let f : R2 → R be defined by
f (x, y) = x^3 y / (x^6 + y^2 ) if (x, y) ≠ (0, 0), and f (0, 0) = 0.
Then
df (0, 0, u, v) = 0.
Therefore (u, v) ↦ df (0, 0, u, v) is linear and continuous. However, f is not even continuous at
(0, 0) (look at f along the curve y = x^3 ). Therefore it is not Fréchet differentiable at (0, 0). This
example therefore shows that Gâteaux differentiability does not imply continuity.
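A numerical look at Example 3 (a sketch, not from the notes; it assumes Python with numpy): every directional difference quotient at (0, 0) tends to 0, yet f equals 1/2 along the curve y = x^3 arbitrarily close to the origin, so f is not continuous there.

import numpy as np

def f(x, y):
    return x**3 * y / (x**6 + y**2) if (x, y) != (0.0, 0.0) else 0.0

rng = np.random.default_rng(7)
u, v = rng.standard_normal(2)            # a random direction

# Directional difference quotients at (0,0) tend to 0.
for t in [1e-1, 1e-2, 1e-3]:
    print("t =", t, "  f(tu, tv)/t =", f(t * u, t * v) / t)

# But along the curve y = x^3 the function is identically 1/2 (for x != 0).
for x in [1e-1, 1e-3, 1e-6]:
    print("x =", x, "  f(x, x^3) =", f(x, x**3))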
Methodology
Thus, to study the Fréchet differentiability of a map f : U ⊂ E → F at a point a, we may proceed
as follows.
2. Otherwise, we find
f (a + tv) − f (a)
df (a, v) = lim .
t→0 t
3. If either the limit does not exist, or exists but is either not linear or not continuous in v, we
conclude that f is not Fréchet differentiable at a.
4. If this limit is linear and continuous in v, then df (a, v) is the unique candidate for the Fréchet
derivative. But we must prove or disprove that
f (a + h) − f (a) − df (a, h) = o(h).
If this relation is satisfied, we conclude that the Fréchet derivative exists and f ′(a)h = df (a, h).
If not, we conclude that f is not Fréchet differentiable at a.
This methodology will be useful in some exercises. However, it is not necessary to always follow it.
Sometimes, the situation is much simpler. Here’s an example.
Example. Let E be an inner product space and let f (x) = ||x||^2 = ⟨x, x⟩. Let us show that f is
Fréchet differentiable and find its derivative. We start by writing
f (x + h) = ⟨x + h, x + h⟩ = ||x||^2 + 2⟨x, h⟩ + ||h||^2 .
It should be clear that the linear part is indeed 2⟨x, h⟩. This term is indeed continuous in h and we
have
f (x + h) − f (x) − 2⟨x, h⟩ = ||h||^2 = o(h).
This means that f is Fréchet differentiable at x and f ′(x)h = 2⟨x, h⟩.
f : U ⊂ E → F,
Adding these two identities and using the fact that o(h) + o(h) = o(h), we get
In different notation,
D(f + g)(a) = Df (a) + Dg(a)
D(λf )(a) = λDf (a).
Corollary 2.1. If f and g are of class C 1 , then f + g and λf are of class C 1 .
Replacing f (a + h) − f (a) by f 0 (a)h + ϕ(h) and using the linearity of g 0 (f (a)), we get
To prove that the second term is o(h), let ε > 0 be given. Let M = ||f 0 (a)|| + 1. Since ψ(y) =
o(y − b), there exists δ > 0 such that
Corollary 2.2. Under the above assumptions, if f and g are of class C 1 then g ◦ f is also of class
C 1.
pk (y1 , . . . , ym ) = yk .
Let uk : Fk → ∏_{i=1}^{m} Fi denote the natural injection of Fk into ∏_{i=1}^{m} Fi defined by
uk (y) = (0, . . . , 0, y, 0, . . . , 0), with y in the kth position.
Then
fi = pi ◦ f and f = ∑_{i=1}^{m} ui ◦ fi .
or equivalently
fi′ (a)h = pi (f ′ (a)h).
Corollary 2.3. Under the above assumptions, f is C 1 if and only if all its components are C 1 .
Proof. Let Φ(a) = (f (a), g(a)). By the previous proposition, Φ is differentiable at a and
Φ0 (a)h = (f 0 (a)h, g 0 (a)h). Now we can write F = L ◦ Φ. It follows from the chain rule and
Proposition 2.6. that
Corollary 2.4. Under the above assumptions, if f and g are C 1 , then L(f, g) is also C 1 .
Now we will generalize the derivative of a quotient f /g. But for this we need a lemma.
Proof. This is a consequence of the previous lemma and the chain rule because f /g = F (f, g) and
so
(f /g)′ (a)h = DF (f (a), g(a))(f ′ (a)h, g ′ (a)h)
= (1/g(a)) f ′ (a)h − (f (a)/g(a)^2 ) g ′ (a)h
= ( g(a) f ′ (a)h − f (a) g ′ (a)h ) / g(a)^2 .
Corollary 2.6. Under the above assumptions, if f and g are C 1 , then f /g is also C 1 .
∂f/∂xi (a), f ′_{xi} (a), fxi (a), Di f (a), ∂i f (a).
Thus, when it exists, ∂f/∂xi (a) ∈ L(Ei , F ).
Remark. Let λi : Ei → ∏_{k=1}^{n} Ek denote the map defined by
λi (t) = (a1 , . . . , ai−1 , t, ai+1 , . . . , an ).
Then
∂f/∂xi (a) = (f ◦ λi )′ (ai ).
(f ◦ λi )′ (ai ) = f ′ (a) ◦ ui .
This means that ∂f/∂xi (a) exists and
∂f/∂xi (a)hi = f ′ (a)(ui (hi )).
Therefore
∑_{i=1}^{n} ∂f/∂xi (a)hi = ∑_{i=1}^{n} f ′ (a)(ui (hi )) = f ′ (a) ( ∑_{i=1}^{n} ui (hi ) ) = f ′ (a)h.
Remark. If we write h as a column vector, then we can write
f ′ (a)h = ( ∂f/∂x1 (a) · · · ∂f/∂xn (a) ) (h1 , . . . , hn )^T .
However, not everything is lost. We will show first that the mean value theorem holds for real
valued maps. Next, by modifying its form, we will be able to extend the mean value theorem to
higher dimensions. But first, we have to generalize the notion of an interval.
Definition. Let E be a normed space and let a, b ∈ E. We set
[a, b] = {(1 − t)a + tb; 0 ≤ t ≤ 1}. We call it the straight-line segment joining a to b or the
straight-line segment with endpoints a and b.
]a, b[= {(1 − t)a + tb; 0 < t < 1}. We call it the open straight-line segment joining a to b.
The mean value theorem, version 2.0. Let U be an open subset of a normed space E and let
f : U → R be differentiable. If [a, b] ⊂ U , then there exists ζ ∈]a, b[ such that
f (b) − f (a) = f 0 (ζ)(b − a).
Proof. Let γ(t) = (1 − t)a + tb = a + t(b − a), for t ∈ R. Then γ is differentiable and γ 0 (t) = b − a
(here we identify L(R, E) with E). Let g = f ◦ γ. Then g is a map from [0, 1] to R. It is continuous
on [0, 1] and differentiable on ]0, 1[. By the chain rule
g 0 (t) = f 0 (γ(t))γ 0 (t) = f 0 ((1 − t)a + tb)(b − a).
By version 1.0, there exists c ∈]0, 1[ such that
g(1) − g(0) = g 0 (c)(1 − 0) = g 0 (c).
Therefore,
f (b) − f (a) = f 0 (γ(c))(b − a).
Letting ζ = γ(c), we see that ζ ∈]a, b[ and
f (b) − f (a) = f 0 (ζ)(b − a).
With the same arguments used above, we can prove the following version.
The mean value theorem, version 2.1. Let U be an open subset of a normed space E and let
f : U → R be continuous and differentiable on U . If ]a, b[⊂ U , then there exists ζ ∈]a, b[ such that
f (b) − f (a) = f 0 (ζ)(b − a).
Proof. Note that γ(]0, 1[) =]a, b[⊂ U and γ([0, 1]) = [a, b] ⊂ U . It follows that g = f ◦ γ is
continuous on [0, 1] and differentiable on ]0, 1[. Therefore we can apply version 1.0 to g.
As a consequence of version 2.0 we get
The mean value theorem, version 2.2. Let U be an open and convex subset of a normed space
E and let f : U → R be differentiable. Then for every a, b ∈ U , there exists ζ ∈]a, b[ such that
f (b) − f (a) = f ′ (ζ)(b − a).
We saw that version 1.0 of the MVT is not generalizable for maps f : U ⊂ E → F if dim F ≥ 2.
However there is a very important consequence of version 1.0 known as the MVT inequality.
The mean value theorem, version 1.1. Let I be an open interval of R and let f : I → R be
differentiable. Suppose that there is a constant M such that |f ′ (x)| ≤ M for all x ∈ I. Then
|f (x) − f (y)| ≤ M |x − y| for all x, y ∈ I.
The result will follow by continuity of f and the norm when we let x → b. To prove this we will
prove that for every ε > 0,
||f (x) − f (a)|| ≤ (M + ε)(x − a) + ε for all x ∈]a, b[.
We can write
U = {x ∈]a, b[; ϕ(x) > 0} .
where ϕ(x) = ||f (x) − f (a)|| − (M + ε)(x − a) − ε. Continuity of ϕ implies that U is open. Suppose
by contradiction that U 6= ∅. Since U is bounded from below by a, U has an infimum c ≥ a.
Claim 1. c > a. Indeed, if not, then c = a. By a property of the infimum, there exists a sequence
(xn ) in U that converges to c = a. Since ϕ(xn ) > 0 and ϕ is continuous, we get ϕ(a) ≥ 0. However
ϕ(a) = −ε < 0.
Claim 2. c ∉ U . This follows from the general fact that the infimum of a subset A of R belongs
to ∂A, i.e., does not belong to the interior of A (check that).
Claim 3. c < b. Indeed, let x0 ∈ U . Then x0 ∈]a, b[ and x0 ≥ inf U = c. Thus, c ≤ x0 < b.
Since c ∈]a, b[, it follows from the assumption of the theorem that ||f 0 (c)|| ≤ M . On the other
hand, it follows from the definition of the derivative that
lim_{x→c} (f (x) − f (c))/(x − c) = f ′ (c),
and so
lim_{x→c} || (f (x) − f (c))/(x − c) || = ||f ′ (c)||.
It follows that there exists δ > 0 such that
||f ′ (c)|| ≥ || (f (x) − f (c))/(x − c) || − ε, whenever 0 < |x − c| < δ.
By a property of the infimum, [c, c + δ[ ∩ U ≠ ∅. Choose accordingly x ∈ [c, c + δ[ ∩ U . Since
c ∉ U , we have x > c.
Since M ≥ ||f ′ (c)||, we get
M ≥ || (f (x) − f (c))/(x − c) || − ε,
and so
||f (x) − f (c)|| ≤ (M + ε)(x − c).
Since c ∉ U , we have
||f (c) − f (a)|| ≤ (M + ε)(c − a) + ε.
By the triangle inequality, we get
The mean value theorem, version 3.2. Let U be an open subset of a normed space E and let
f : U → F be continuous and differentiable on U . If ]a, b[⊂ U and ||f 0 (ζ)|| ≤ M for all ζ ∈]a, b[,
then
||f (b) − f (a)|| ≤ M ||b − a||.
As a corollary of version 3.1 we get
The mean value theorem, version 3.3. Let U be an open and convex subset of a normed space
E and let f : U → F be differentiable. Suppose that there is a constant M such that ||f 0 (x)|| ≤ M
for all x ∈ U . Then
||f (x) − f (y)|| ≤ M ||x − y|| ∀x, y ∈ U.
In words, a differentiable map with a bounded derivative is Lipschitz continuous.
Questions.
(b) Give an example of a locally Lipschitz continuous function which is not globally Lipschitz
continuous.
Consider f (x) = Arctan x + Arctan(1/x) for x ≠ 0. Then
f ′ (x) = 1/(1 + x^2 ) + (−1/x^2 )/(1 + 1/x^2 ) = 0.
But f (1) = Arctan 1 + Arctan 1 = π/2 and f (−1) = Arctan(−1) + Arctan(−1) = −π/2. What's wrong?
In M2201, we proved that a path connected space is connected, but the converse is not true.
However, we have the following.
Proposition 3.2. Let U be an open subset of a normed space. Then the following conditions are
equivalent.
(i) U is connected.
Remark on the proof. It should be clear that (iii)⇒ (ii)⇒ (i). To prove that (i)⇒(iii), we
use a connectedness argument. We fix a point a ∈ U and we consider the set A of points in U
that can be joined to a by a polygonal line. Then we prove that A is not empty, open and closed.
Connectedness then implies that A = U and so all points of U can be joined to a by a polygonal
path. Since a was arbitrary, this means that any two points of U can be joined by a polygonal path.
Try to carry out this program or see the proof in the book of Cartan.
Exercise. What assumptions should we put on the topological space X so that every open and connected
subset of X is path connected?
Theorem 3.1. Let f : U ⊂ E → F be a map between normed spaces. If f is locally constant and
U is connected then f is constant.
Proof. It should be clear that a locally constant map is continuous. Let a ∈ U and consider the
set A = {x ∈ U ; f (x) = f (a)}. Then a ∈ A and so A ≠ ∅. Continuity of f implies that A is closed
because we can write A = f −1 ({f (a)}). Local constancy implies that A is open. Connectedness of
U implies that A = U and so f (x) = f (a) for all x ∈ U .
Exercise. What condition should we put on the topological space Y so that the above result holds
for any function f : X → Y between topological spaces?
Corollary 3.1. Let E and F be normed spaces, let U ⊂ E be open and let f : U ⊂ E → F . Then
f is locally constant if and only if f is constant on the connected components of U .
Proof. Recall that connected components are maximal (with respect to inclusion) connected
subsets of U and that every connected subset of U is contained in some component. If f is locally
constant, then, by the previous theorem, it is constant on every connected subset of U and in
particular it is constant on the connected components of U . Conversely suppose that f is constant
on every connected component of U and let x ∈ U and let C denote the connected component
containing x. Since U is open there is a ball B(x, r) ⊂ U . Since B(x, r) is convex, it is connected
and so is contained in C since C is the biggest connected subset of U containing x. Since f is
constant on C, it is also constant on B(x, r). This proves that f is locally constant.
Remark. The last argument shows that connected components of U are open.
Exercise. What conditions should we put on X and Y so that the previous corollary holds for a
map f : X → Y between topological spaces?
Otherwise stated, the partial derivatives are the components of the Fréchet derivative. Therefore,
continuity of f 0 is equivalent to the continuity of each component.
Suppose conversely that for every i, the map ∂f/∂xi : U → L(Ei , F ) is continuous. We will prove
that f is differentiable at every point a ∈ U . Continuity of f ′ will then follow from equation (3.1).
We will show that
f (x) − f (a) − ∑_{i=1}^{n} ∂f/∂xi (a)(xi − ai ) = o(x − a) as x → a.
Our η will be η = mini=1,...,n ηi . Note that B(a, η) ⊂ B(a, ηi ) ⊂ U . Let x ∈ B(a, η). We can
write
f (x) − f (a) − ∑_{i=1}^{n} ∂f/∂xi (a)(xi − ai )
= f (x1 , . . . , xn ) − f (a1 , x2 , . . . , xn ) − ∂f/∂x1 (a)(x1 − a1 )
+ f (a1 , x2 , . . . , xn ) − f (a1 , a2 , x3 , . . . , xn ) − ∂f/∂x2 (a)(x2 − a2 )
+ · · ·
+ f (a1 , . . . , an−1 , xn ) − f (a1 , . . . , an ) − ∂f/∂xn (a)(xn − an ).     (3.3)
Set for ζ1 ∈ B(a1 , η),
g1 (ζ1 ) = f (ζ1 , x2 , . . . , xn ) − f (a1 , x2 , . . . , xn ) − ∂f/∂x1 (a)(ζ1 − a1 ).
Then, first, g1 is well defined because (ζ1 , x2 , . . . , xn ) and (a1 , x2 , . . . , xn ) belong to the ball B(a, η)
which is contained in U where f is defined. Second, g1 (a1 ) = 0. Third, the first term in equation
(3.3) above is g1 (x1 ) = g1 (x1 ) − g1 (a1 ). And fourth,
g1′ (ζ1 ) = ∂f/∂x1 (ζ1 , x2 , . . . , xn ) − ∂f/∂x1 (a1 , a2 , . . . , an ).
Since (ζ1 , x2 , . . . , xn ) ∈ B(a, η) ⊂ B(a, η1 ), it follows from the continuity condition (3.2) that
||g1′ (ζ1 )|| = || ∂f/∂x1 (ζ1 , x2 , . . . , xn ) − ∂f/∂x1 (a1 , a2 , . . . , an ) || ≤ ε/n.
The mean value theorem version 3.3 - applied to the ball B(a1 , η) - implies that
||g1 (x1 )|| = ||g1 (x1 ) − g1 (a1 )|| ≤ (ε/n) ||x1 − a1 ||.
Similarly, we set for ζ2 ∈ B(a2 , η),
g2 (ζ2 ) = f (a1 , ζ2 , x3 , . . . , xn ) − f (a1 , a2 , x3 , . . . , xn ) − ∂f/∂x2 (a)(ζ2 − a2 ).
Then, first, g2 is well defined. Second, g2 (a2 ) = 0. Third, the second term in equation (3.3) is
g2 (x2 ) = g2 (x2 ) − g2 (a2 ). And fourth,
g2′ (ζ2 ) = ∂f/∂x2 (a1 , ζ2 , x3 , . . . , xn ) − ∂f/∂x2 (a1 , a2 , . . . , an ).
Since (a1 , ζ2 , x3 , . . . , xn ) ∈ B(a, η) ⊂ B(a, η2 ), it follows from the continuity condition (3.2) that
||g2′ (ζ2 )|| = || ∂f/∂x2 (a1 , ζ2 , x3 , . . . , xn ) − ∂f/∂x2 (a1 , a2 , . . . , an ) || ≤ ε/n.
The mean value theorem implies that
||g2 (x2 )|| = ||g2 (x2 ) − g2 (a2 )|| ≤ (ε/n) ||x2 − a2 ||.
Suppose now that the limit f is differentiable. Does it follow that the sequence of derivatives
fn0 converges to f 0 ?
The answer is again no. Take for example the sequence fn (x) = sin(nx)/n. Then (fn ) converges
uniformly to the zero function. However, fn′ (x) = cos(nx) does not converge (at x = π, for instance,
fn′ (π) = (−1)^n has no limit).
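A numerical illustration of this example (a sketch, not in the notes; it assumes Python with numpy): sup_x |fn(x)| = 1/n → 0, while fn′(π) = cos(nπ) = (−1)^n keeps oscillating.

import numpy as np

rng = np.random.default_rng(8)
xs = rng.uniform(0.0, 2 * np.pi, 100000)       # many sample points in [0, 2*pi]

for n in [1, 10, 101, 1000]:
    sup_fn = np.max(np.abs(np.sin(n * xs) / n))    # close to 1/n: uniform convergence to 0
    print("n =", n,
          "  max |f_n| on samples ~", sup_fn,
          "  f_n'(pi) =", np.cos(n * np.pi))       # equals (-1)^n: no convergence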
What about the converse? What can we say if fn0 converges uniformly? The answer is given
by the next two theorems.
(ii) F is complete.
(iii) For some point a ∈ U , the sequence fn (a) converges to a limit (this happens if fn converges
pointwise).
Remark. If instead of (iii) we assume that (fn ) is pointwise convergent on U , then, assumption
(ii) about completeness of F is not needed.
Theorem 3.5. In the above theorem, let us replace conditions (i) and (iv) by the following weaker
assumptions.
(i’) U is connected.
(iv’) For every x0 ∈ U , there exists a ball B(x0 , r) on which fn0 converges uniformly (we say
that fn0 is locally uniformly convergent).
Then,
Proof of Theorem 3.4. It follows from the mean value theorem version 3.3 (where convexity is
assumed), that for every x ∈ U ,
||fn (x) − fn (a) − (fm (x) − fm (a))|| ≤ ||x − a|| sup_{ζ∈U} ||fn′ (ζ) − fm′ (ζ)||.     (3.4)
This implies that 1) fn (x) is a Cauchy sequence in F and 2) this happens in a uniform way on
bounded subsets of U . Indeed, let ε > 0 be given. Assumption (iv) means that there exists n0 ∈ N
such that
sup_{ζ∈U} ||fn′ (ζ) − g(ζ)|| ≤ ε/2 for all n ≥ n0 .
The triangle inequality implies that
sup_{ζ∈U} ||fn′ (ζ) − fm′ (ζ)|| ≤ ε for all n, m ≥ n0
((fn′ ) is a uniform Cauchy sequence). It follows from inequality (3.4) that for every x ∈ U and every
n, m ≥ n0 ,
||fn (x) − fn (a) − (fm (x) − fm (a))|| ≤ ε ||x − a||.
Assumption (iii) implies that ||fn (a) − fm (a)|| ≤ ε for n, m large enough, say for n, m ≥ n1 . Let
n, m ≥ n2 := max(n0 , n1 ). By the triangle inequality, we get
||fn (x) − fm (x)|| ≤ ||fn (a) − fm (a)|| + ||fn (x) − fn (a) − (fm (x) − fm (a))||
≤ ε + ε ||x − a||.     (3.5)
Since ε was arbitrary, this means that for every x ∈ U , fn (x) is a Cauchy sequence in F .
Completeness of F implies that fn (x) has a limit that we denote by f (x). This defines a map
f : U → F and proves conclusion (a).
To prove (b), fix ε, a and n in inequality (3.5) and let m → ∞. We get
If x varies in a bounded subset B ⊂ U , then ||x − a|| ≤ M for some constant M . The previous
inequality implies that
kfn (x) − f (x)k ≤ ε(1 + M ).
and this holds for every x ∈ B. Since ε was arbitrary, this means that (fn ) converges uniformly to
f in B.
Now we prove conclusion (c) by showing that for every x0 ∈ U , f is differentiable at x0 with
f ′ (x0 ) = g(x0 ). We can write
||f (x0 + h) − f (x0 ) − g(x0 )h|| ≤ ||f (x0 + h) − f (x0 ) − (fn (x0 + h) − fn (x0 ))||
+ ||fn (x0 + h) − fn (x0 ) − fn′ (x0 )h||
+ ||fn′ (x0 )h − g(x0 )h||.
For n ≥ n0 , we have
Fix now an integer n ≥ n0 . Differentiability of fn at x0 implies that there exists δ > 0 such that
Proof of Theorem 3.5. Let A denote the set of points x ∈ U such that fn (x) converges. The
plan for proving the first conclusion is to prove the following points.
1. A 6= ∅.
2. A is open in U .
3. A is closed in U .
Connectedness of U dictates that A = U and so (fn ) is pointwise convergent. This is what we call
a connectedness argument.
1. Assumption (iii) means that a ∈ A and so A 6= ∅.
2. Let x0 ∈ A. Since U is open, there exists a ball B(x0 , r) ⊂ U . We apply the previous theorem
to the open convex set B(x0 , r) with a replaced by x0 . Accordingly, (fn ) converges uniformly on
B(x0 , r) and in particular pointwise on B(x0 , r). This means that B(x0 , r) ⊂ A and so A is open.
3. Let x be in the closure of A in U , that is, x ∈ Ā ∩ U . Since x ∈ U and U is open, then again, there
exists a ball B(x, r) ⊂ U . From the characterization of the closure of a set, B(x, r) ∩ A 6= ∅.
Choose y ∈ B(x, r) ∩ A. Then fn (y) is convergent. We apply the previous theorem to the open
convex set B(x, r) with a replaced by y. Accordingly, (fn ) converges uniformly on B(x, r) and in
particular pointwise on B(x, r). In particular fn (x) is convergent and so x ∈ A. This proves that
A is closed.
(b) Convergence of (fn ) on U defines a map f : U → F to which (fn ) converges pointwise. But
the proof of point 2. above shows that convergence is uniform on any ball B(x0 , r) ⊂ U .
(c) Let x0 ∈ U and let B(x0 , r) ⊂ U . We apply the previous theorem to the open convex set
B(x0 , r). Accordingly, f is differentiable and f ′ = g where g is the limit of (fn′ ).
Remark. If we assume that (fn ) is pointwise convergent on U , then the connectedness assumption
and the completeness assumption are not needed.
Exercise 1. Give an example of a sequence of functions that converges locally uniformly but not
globally uniformly.
(i) U is connected.
(ii) F is complete.
(iii) For some point a ∈ U , the series ∑_{n≥1} un (a) is convergent (this happens if ∑_{n≥1} un converges
pointwise).
(iv) The series of derivatives ∑_{n≥1} un′ is locally uniformly convergent.
Then,
(a) the series ∑_{n≥1} un is pointwise convergent on U ,
(b) convergence is locally uniform on U ,
(c) ∑_{n≥1} un is differentiable and
( ∑_{n≥1} un )′ = ∑_{n≥1} un′   (differentiation term by term).
Remark 1. Without any assumption, the above conclusion may fail. Here’s an example. In the
theory of Fourier series it is proved that
∑_{n=1}^{∞} sin(nx)/n = (π − x)/2 for 0 < x < 2π,
convergence being pointwise in ]0, 2π[ and uniform on every compact subset of ]0, 2π[. If we
differentiate term by term we get
∑_{n=1}^{∞} cos(nx) = −1/2 for 0 < x < 2π,
which does not make sense unless we redefine the notion of convergence.
Remark 2. Convergent power series of a complex variable can be differentiated term by term
inside their disc of convergence.
Example/Exercise. Set for x > 1
f (x) = ∑_{n=1}^{∞} 1/n^x .
We already know that this series is convergent. Show that f is differentiable on ]1, +∞[.
Remark. f is the restriction of the celebrated Riemann zeta function to ]1, +∞[. The Riemann
zeta function is a holomorphic function on C\{1} and it plays an important role in the theory of
complex variables and in number theory. The celebrated Riemann hypothesis states that all the
nontrivial zeros of the zeta function lie on the line Re z = 1/2.
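A numerical check for this exercise (a sketch, not from the notes; it assumes Python with numpy, and the truncation level is an illustrative choice): the term-by-term derivative −∑ ln(n)/n^x agrees with a finite-difference approximation of f ′(x).

import numpy as np

N = 200000                                   # truncation level (illustrative)
n = np.arange(1, N + 1, dtype=float)

def f(x):
    return np.sum(1.0 / n**x)                # partial sum of  sum 1/n^x

def term_by_term_derivative(x):
    return np.sum(-np.log(n) / n**x)         # partial sum of  -sum ln(n)/n^x

x, h = 2.5, 1e-5
finite_diff = (f(x + h) - f(x - h)) / (2 * h)
print("term-by-term derivative:", term_by_term_derivative(x))
print("central difference     :", finite_diff)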
lim_{x→a} (f (x) − f (a))/(x − a).     (3.6)
What about the limit
lim_{(x,y)→(a,a)} (f (x) − f (y))/(x − y) ?     (3.7)
If the limit in (3.7) exists, then the limit in (3.6) exists as well because we can take y = a in (3.7).
What about the converse? The converse is not true as shown by the following example.
Example. Let
f (x) = x^2 sin(1/x) if x ≠ 0, and f (0) = 0.
Then f is differentiable at 0 and f 0 (0) = 0 because
(f (x) − f (0))/(x − 0) = x sin(1/x) → 0 as x → 0.
However, the limit in (3.7) does not exist (at the point 0). Indeed, if not then for any two sequences
(xn ) and (yn ) converging to 0 the limit
lim_{n→∞} (f (xn ) − f (yn ))/(xn − yn )
should exist.
However, take xn = 1/(π/2 + nπ) and yn = 1/(π/2 + (n + 1)π).
Using the fact that sin(1/xn ) = sin(π/2 + nπ) = (−1)^n , we get after some algebraic simplifications
(f (xn ) − f (yn ))/(xn − yn ) = ((−1)^n /π) [ (n + 3/2)/(n + 1/2) + (n + 1/2)/(n + 3/2) ],
and this quantity has no limit as n → ∞. If you want to check that, it is useful to set sn = π/2 + nπ.
This example leads us to a stronger condition of differentiability that we call strict differentiability
or strong differentiability.
Definition. Let f : I ⊂ R → R be a map defined on an open subset of I of R. We say that f is
strictly (or strongly) differentiable at a point a ∈ I if the following limit exists
lim_{(x,y)→(a,a)} (f (x) − f (y))/(x − y).
And now we can extend this definition to maps between normed spaces.
Definition. Let E and F be normed spaces, let U ⊂ E be open, a ∈ U , and let f : U → F be a
map. We say that f is strictly differentiable in the sense of Fréchet (or strictly Fréchet differentiable)
at the point a if there exists L ∈ L(E, F ) such that
∀ε > 0, ∃δ > 0 s.t ||x − a|| + ||y − a|| ≤ δ ⇒ ||f (x) − f (y) − L(x − y)|| ≤ ε||x − y||.
Equivalently,
∀ε > 0, ∃δ > 0 s.t. x, y ∈ B′ (a, δ) ⇒ ||f (x) − f (y) − L(x − y)|| ≤ ε||x − y||.
Remark. It should be clear that a strictly Fréchet differentiable map is Fréchet differentiable
(take y = a in the above condition) and the Fréchet derivative is the operator L in the definition.
However, as the previous example showed, the converse is not true. But as the next theorem shows,
the notion of strict differentiability is not that mysterious.
Since, as usual, U is open, x + th ∈ U for all t small enough. It follows from the Lipschitz condition
that
||f (x + th) − f (x)|| ≤ M ||th|| = M |t| ||h||.
It follows that (for all t small enough)
|| (f (x + th) − f (x))/t || ≤ M ||h||.
Then
Suppose first that f is strictly Fréchet differentiable at a. Let ε > 0 be given. Strict Fréchet
differentiability of f at a implies that
||f (x) − f (y) − f ′ (a)(x − y)|| ≤ ε||x − y||
for all x, y in some ball B(a, δ). It follows from the previous lemma that ||g ′ (ζ)|| ≤ ε for all
ζ ∈ B(a, δ). Thus, ||f 0 (ζ) − f 0 (a)|| ≤ ε for all ζ ∈ B(a, δ), and this is continuity of f 0 at a.
Suppose now conversely that f 0 is continuous at a. Continuity of f 0 at a implies that ||g 0 (ζ)|| ≤ ε
for all ζ in some ball B(a, δ) around a. The mean value theorem version 3.3. implies that
||f (x) − f (y) − f 0 (a)(x − y)|| ≤ ε||x − y|| ∀x, y ∈ B(a, δ).
It follows that f 0 has no limit as x → 0 and so it is not continuous at 0. This explains why f fails
to be strictly differentiable at 0.
Corollary 3.4. Let f : U ⊂ E → F be Fréchet differentiable. Then f is of class C 1 on U if and
only if f is strictly Fréchet differentiable on U (which means that f is strictly Fréchet differentiable
at every point of U ).
The mean value theorem, version 3.4. Let F be a normed space. Consider two maps f : [a, b] ⊂
R → F and g : [a, b] ⊂ R → R that are continuous on [a, b] and differentiable on ]a, b[. If
||f ′ (x)|| ≤ g ′ (x) for all x ∈]a, b[, then
||f (b) − f (a)|| ≤ g(b) − g(a).
Again this version is a particular case of a still more general version where f and g are assumed to
be right differentiable instead of differentiable.
Definition. A map f : [a, b[→ F is said to be right (Fréchet) differentiable at a point x0 ∈ [a, b[ if
the following limit exists
lim_{h→0, h>0} (f (x0 + h) − f (x0 ))/h.
If it exists this limit is denoted by fr0 (x0 ) (in French fd0 (x0 )). We have a similar definition for left
differentiability.
The mean value theorem, version 3.5. Let F be a normed space. Consider two maps f :
[a, b] ⊂ R → F and g : [a, b] ⊂ R → R that are continuous on [a, b] and right differentiable on
[a, b[. If ||fr′ (x)|| ≤ gr′ (x) for all x ∈]a, b[, then
||f (b) − f (a)|| ≤ g(b) − g(a).
If you are interested you can prove this version along the same lines of the proof of version 3.0.
(or you can read it in the book of Cartan).
We can push the generalization further and prove the following version. See the book of Cartan.
The mean value theorem, version 3.6. Let F be a normed space. Consider two continuous
maps f : [a, b] ⊂ R → F and g : [a, b] ⊂ R → R. Suppose that f and g are right differentiable on
[a, b[\D where D is a countable subset of [a, b[. If ||fr′ (x)|| ≤ gr′ (x) for all x ∈]a, b[\D, then
||f (b) − f (a)|| ≤ g(b) − g(a).
In this general form (due to Cartan), this theorem is useful in the theory of the Bochner integral.
But this is outside the scope of this course and outside our undergraduate curriculum.
Chapter 4
Higher Order Derivatives
Let f : U ⊂ E → F be differentiable on U . Then we have a map
f ′ : U → L(E, F ).
If this map is Fréchet differentiable at a point a ∈ U , we say that f is twice Fréchet differentiable at a.
In this case, (f 0 )0 (a) is a linear bounded map from E to L(E, F ), that is, (f 0 )0 (a) ∈ L(E, L(E, F )).
This element is called the second order derivative (or differential) of f at the point a and it is
denoted by
f 00 (a) or D2 f (a).
If h, k ∈ E, then f 00 (a)h ∈ L(E, F ) and so (f 00 (a)h)k ∈ F . In chapter 1, we agreed to identify
L(E, L(E, F )) with L2 (E, F ) and this consists of considering f 00 (a) as a bilinear map from E × E
into F . Therefore we will write f 00 (a)(h, k) instead of (f 00 (a)h)k. If f 00 (a) exists at every point
a ∈ U , we say that f is twice Fréchet differentiable on U , and in this case, we have a map
f 00 : U → L2 (E, F ).
Remark. Without assuming that f is differentiable on U , we say more generally, that f is twice
Fréchet differentiable at a ∈ U if f is Fréchet differentiable on an open neighborhood V of a and the
map f ′ : V → L(E, F ) is differentiable at a.
This amounts to identifying the bilinear map f 00 (a) with the number f 00 (a)(1, 1) (every bilinear map
B : R × R → R is necessarily of the form B(x, y) = cxy where c = B(1, 1)).
More generally if f : U ⊂ R → F is twice differentiable at a ∈ U , then
This amounts to identifying the bilinear map f ′′ (a) with the vector f ′′ (a)(1, 1) (every bilinear map
B : R × R → F is necessarily of the form B(x, y) = xy v where v = B(1, 1)).
Example 2. Let E be an inner product space and let f (x) = ||x||^2 = ⟨x, x⟩. Let us show that f is
C 2 . We know that f is Fréchet differentiable on E and f ′ (x)h = 2⟨x, h⟩. Therefore, we can write
(f ′ (x + k) − f ′ (x))h = f ′ (x + k)h − f ′ (x)h = 2⟨x + k, h⟩ − 2⟨x, h⟩ = 2⟨k, h⟩.
You should recognize that the bilinear (bounded) term is 2⟨k, h⟩. Let L denote this map, i.e.,
(Lk)h = 2⟨k, h⟩. Then we can write
(f ′ (x + k) − f ′ (x) − Lk)h = 0.
Let us establish now a useful rule for second order differentiation. Let f : U ⊂ E → F be
differentiable. For x ∈ U and h ∈ E, the term f 0 (x)h ∈ F depends on two variables x and h. This
defines a map g : U × E → F given by
g(x, h) = f 0 (x)h
and the partial map g(x, ·) = f 0 (x) is linear and bounded. Now observe that differentiability of f 0
at a point a ∈ U is equivalent to the partial differentiability of g with respect to x at the point a
(write the details). In this case, when we differentiate this relation with respect to x and put x = a,
we get
∂g/∂x (a, ·) = f ′′ (a),
which means that
(∂g/∂x (a, ·))k = f ′′ (a)k, ∀k ∈ E,
which means that
∂g/∂x (a, h)k = (f ′′ (a)k)h ∀k ∈ E, ∀h ∈ E.
This equation can be written as
f ′′ (a)(k, h) = (f ′′ (a)k)h = ∂g/∂x (a, h)k ∀h, k ∈ E.     (4.1)
Example 1. Let E be an inner product space and let f (x) = ||x||^2 = ⟨x, x⟩. Let g(x, h) = 2⟨x, h⟩.
Then f ′ (x)h = g(x, h). Since g is linear and bounded in the first variable, we know from chapter 2
that g is differentiable with respect to x and
∂g/∂x (x, h)k = g(k, h) = 2⟨k, h⟩.
Therefore f is twice differentiable and
f ′′ (x)(k, h) = 2⟨k, h⟩.
Using the rule above and interchanging ∂/∂xj and ∑_{i=1}^{n} , we can write
( ∂f ′/∂xj (a)kj )(h1 , . . . , hn ) = ∑_{i=1}^{n} [ ∂/∂xj ( ∂f/∂xi ) (a)kj ] hi ,
and therefore
(f ′′ (a)(k1 , . . . , kn ))(h1 , . . . , hn ) = ∑_{j=1}^{n} ( ∂f ′/∂xj (a)kj )(h1 , . . . , hn )
= ∑_{j=1}^{n} ∑_{i=1}^{n} ∂²f/∂xj ∂xi (a)(kj , hi ).
∂²f/∂xi ∂xj (a) : Ei × Ej → F
and
∂²f/∂xj ∂xi (a) : Ej × Ei → F.
Lemma 4.1. Let S : E1 × E2 → F be a bilinear map. If S(h, k) = o((||h|| + ||k||)2 ) near (0, 0),
then, S = 0.
Proof of the lemma. Let ε > 0 be given. Our assumption implies that there exists δ > 0 such
that
||S(h, k)|| ≤ ε (||h|| + ||k||)2 whenever ||(h, k)|| ≤ δ.
Here we take ||(h, k)|| = max(||h||, ||k||) but any other equivalent norm on E1 × E2 would do as
well.
Consider two arbitrary elements h ∈ E1 and k ∈ E2 . Take a positive number t such that
||(th, tk)|| ≤ δ. Then, according to the above condition,
||S(th, tk)|| ≤ ε (||th|| + ||tk||)^2 = ε t^2 (||h|| + ||k||)^2 .
Dividing by t^2 and using the bilinearity of S, we get
||S(h, k)|| ≤ ε (||h|| + ||k||)^2 .
Since (h, k) was arbitrary in E1 × E2 , this implies that S is bounded and ||S|| ≤ 4ε. Since ε was
arbitrary, we conclude that ||S|| = 0 and so S = 0.
Proof of Proposition 4.1. Set
A(h, k) = f (a + h + k) − f (a + h) − f (a + k) + f (a).
and note that A(h, k) = A(k, h). We claim that for ||(h, k)|| small enough,
||A(h, k) − f ′′ (a)(h, k)|| ≤ 3ε (||h|| + ||k||)^2 .
Indeed, let ε > 0 be given. Since f ′ is differentiable at a, there exists δ > 0 such that
||f ′ (a + h) − f ′ (a) − f ′′ (a)h|| ≤ ε||h|| whenever ||h|| ≤ δ,
and so
||f ′ (a + h)k − f ′ (a)k − f ′′ (a)(h, k)|| ≤ ε||h|| ||k|| whenever ||h|| ≤ δ.     (4.4)
To estimate the first term of equation (4.3), we set for k small enough
B(k) = f (a + h + k) − f (a + k) − (f ′ (a + h) − f ′ (a))k.
Applying the mean value theorem version 3.3 to a small enough ball of center 0 containing k, we
get
||B(k) − B(0)|| ≤ ||k|| sup_{0≤t≤1} ||B ′ (tk)||.
But
B 0 (k) = f 0 (a + h + k) − f 0 (a + k) − f 0 (a + h) + f 0 (a),
and so
B ′ (tk) = f ′ (a + h + tk) − f ′ (a + tk) − f ′ (a + h) + f ′ (a)
= ( f ′ (a + h + tk) − f ′ (a) − f ′′ (a)(h + tk) )
− ( f ′ (a + tk) − f ′ (a) − f ′′ (a)(tk) )
− ( f ′ (a + h) − f ′ (a) − f ′′ (a)h ).
For ||h|| + ||k|| ≤ δ and 0 ≤ t ≤ 1, we then have
||f ′ (a + h + tk) − f ′ (a) − f ′′ (a)(h + tk)|| ≤ ε||h + tk|| ≤ ε(||h|| + ||k||),
and
||f ′ (a + tk) − f ′ (a) − f ′′ (a)(tk)|| ≤ ε||tk|| ≤ ε||k||,
and
||f ′ (a + h) − f ′ (a) − f ′′ (a)h|| ≤ ε||h||.
It follows from the triangle inequality that
||B ′ (tk)|| ≤ 2ε(||h|| + ||k||) for all 0 ≤ t ≤ 1, and hence ||B(k) − B(0)|| ≤ 2ε(||h|| + ||k||) ||k||.
Thus finally,
||A(h, k) − f 00 (a)(h, k)|| ≤ 3ε(||h|| + ||k||)2 .
This proves the claim. Interchanging the roles of h and k, we get
∂²f/∂xj ∂xi (a)(v, u) = ∂²f/∂xi ∂xj (a)(u, v).
∂²f/∂xj ∂xi (a) = ∂²f/∂xi ∂xj (a) ∈ F.
This is a generalization of Schwarz’s theorem on mixed partial derivatives, known also as Clairaut’s
theorem. It is usually proved under the assumption that f is of class C 2 or that the mixed partial
derivatives are continuous.
(d) Challenging question. If f : U ⊂ E × E → F is twice differentiable at a point a ∈ U , does
it follow that the bilinear map ∂²f/∂x∂y (a) is symmetric?
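A finite-difference illustration of the symmetry of mixed partial derivatives (a sketch, not from the notes; it assumes Python with numpy, and the function and step sizes are illustrative choices): the two orders of differentiation agree up to discretization error.

import numpy as np

def f(x, y):
    return np.sin(x * y) + x**3 * y**2           # a smooth function of two variables

def dfdx(x, y, h=1e-6):
    return (f(x + h, y) - f(x - h, y)) / (2 * h)

def dfdy(x, y, h=1e-6):
    return (f(x, y + h) - f(x, y - h)) / (2 * h)

a, b, H = 0.7, -0.4, 1e-3
d2_yx = (dfdx(a, b + H) - dfdx(a, b - H)) / (2 * H)   # d/dy (df/dx) at (a, b)
d2_xy = (dfdy(a + H, b) - dfdy(a - H, b)) / (2 * H)   # d/dx (df/dy) at (a, b)
exact = np.cos(a * b) - a * b * np.sin(a * b) + 6 * a**2 * b

print("d/dy df/dx :", d2_yx)
print("d/dx df/dy :", d2_xy)
print("exact      :", exact)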
Having understood second order derivatives, let us move to derivatives of arbitrary order.
If f (n) (x) exists at every point x ∈ U , we say that f is n times differentiable on U . If the map
f (n) : U → Ln (E; F ) is continuous, we say that f is of class C n . If f is of class C n for every
n ≥ 1, we say that f is of class C ∞ . If f is continuous we say that f is of class C 0 and we set
f (0) (x) = f (x). It is easy to see that if f is n + m times differentiable at a, then
Example 1. All the elementary functions of calculus are C ∞ in the interior of their domains. This
includes, polynomials, rational functions, radicals, the exponential function, all the logarithms, the
trigonometric and inverse trigonometric functions.
Example 3/Exercise. Let E be an inner product space and let f (x) = ||x||2 = hx, xi. Then f is
of class C ∞ .
Example 4/Exercise. Equip E = C[0, 1] with the maximum norm and let f : E → R be defined
by
f (u) = ∫_0^1 u^3 (t) dt.
Compute f ′ (u)h, f ′′ (u)(h, k) and f ′′′ (u)(h, k, w) and deduce that f is of class C ∞ .
(a) Show by induction that f is n times differentiable at 0 and that f (n) (0) = 0.
(b) Deduce that f is C ∞ .
(c) What is the Taylor series of f at 0? What can you conclude?
Proof/Exercise. Reason by induction and use Proposition 4.1 and the fact that any permutation
is a product of transpositions interchanging two consecutive elements.
Proof/Exercise. Reason by induction (the basis step is just the chain rule).
Proposition 4.4. Let E and F be isomorphic Banach spaces and consider the map Φ :
Isom(E, F ) → L(F, E) defined by Φ(S) = S −1 . Then Φ is of class C ∞ .
Proof/Exercise. Reason by induction and use the previous proposition.
For a function of n real variables and a multi-index α = (α1, . . . , αn) with |α| = α1 + · · · + αn, the corresponding higher order partial derivative at a is denoted by
D^α f(a),   (∂^{|α|} f / ∂x_n^{α_n} · · · ∂x_1^{α_1})(a),   (∂_n^{α_n} · · · ∂_1^{α_1} f)(a),   or   (D_n^{α_n} · · · D_1^{α_1} f)(a).
Taylor's formula approximates a smooth function by polynomials. There are three forms of Taylor's formula, depending on the assumptions on the function and on the form of the remainder:
Taylor-Young,
Taylor-Lagrange,
Taylor with integral remainder (Taylor-Cauchy).
Let us recall these formulas in the case of a function f : I → R where I is an open interval of
R.
Taylor-Young. Let f : I → R have derivatives up to order n − 1 and let x, a ∈ I. If f (n) (a) exists,
then
f(x) = f(a) + f'(a)(x − a) + (1/2!) f''(a)(x − a)² + · · · + (1/n!) f^(n)(a)(x − a)^n + R_n(x)
where
lim_{x→a} R_n(x)/(x − a)^n = 0.
Equivalently, writing h = x − a,
f(a + h) = f(a) + f'(a)h + (1/2!) f''(a)h² + · · · + (1/n!) f^(n)(a)h^n + o(h^n).
Taylor-Lagrange. Suppose now that f is n + 1 times differentiable on I and let x, a ∈ I. Then
f(x) = f(a) + f'(a)(x − a) + (1/2!) f''(a)(x − a)² + · · · + (1/n!) f^(n)(a)(x − a)^n + R_n(x)
where
R_n(x) = (f^(n+1)(c)/(n + 1)!) (x − a)^{n+1} for some c between a and x.
Equivalently, if |f^(n+1)| ≤ M between a and x and h = x − a, then
|R_n(x)| ≤ (M/(n + 1)!) |h|^{n+1} = O(h^{n+1}).
Taylor with integral remainder (Taylor-Cauchy). Suppose that f is of class C^{n+1} on I. Then
f(a + h) = f(a) + f'(a)h + (1/2!) f''(a)h² + · · · + (1/n!) f^(n)(a)h^n + (1/n!) ∫_a^{a+h} (a + h − t)^n f^(n+1)(t) dt
= f(a) + f'(a)h + (1/2) f''(a)h² + · · · + (1/n!) f^(n)(a)h^n + (h^{n+1}/n!) ∫_0^1 (1 − s)^n f^(n+1)(a + sh) ds.
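As a quick sanity check of these formulas (my own illustration, not part of the original notes, with the assumed example f = exp and a = 0), the following Python snippet computes the order-n remainder and verifies numerically that R_n(h)/h^{n+1} stays close to f^(n+1)(0)/(n + 1)!, as the Lagrange and integral forms predict.

import math

def remainder(h, n, a=0.0):
    """R_n(h) = f(a+h) - sum_{k=0}^{n} f^(k)(a) h^k / k!  for f = exp (every derivative of exp is exp)."""
    taylor_sum = sum(math.exp(a) * h**k / math.factorial(k) for k in range(n + 1))
    return math.exp(a + h) - taylor_sum

n = 3
for h in (1e-1, 1e-2, 1e-3):
    # Taylor-Lagrange predicts R_n(h)/h^(n+1) close to f^(n+1)(0)/(n+1)! = 1/24 ~ 0.0417
    print(f"h = {h:.0e}   R_n(h)/h^(n+1) = {remainder(h, n) / h**(n + 1):.6f}")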
In order to extend these formulas to maps between normed spaces, we use the notation
f''(a)h² := f''(a)(h, h),   and more generally   Lh^n := L(h, . . . , h) for L ∈ L_n(E; F),
and
L(h^{n−1})k := L(h, . . . , h, k).
Therefore Lh^{n−1}, that is the map k ↦ L(h^{n−1})k, is a bounded linear map from E to F.
Let us first establish a useful result that generalizes the fact that the derivative of x^n is nx^{n−1} (n ∈ N*).
Lemma 4.2. Let L ∈ Ln (E, F ) be symmetric and set g(h) = Lhn . Then g is differentiable and
g 0 (h)k = nL(hn−1 )k
or equivalently
g 0 (h) = nLhn−1 .
Proof. We write, expanding by multilinearity and using the symmetry of L,
L(h + k)^n = Lh^n + nL(h^{n−1})k + (terms containing at least two factors of k),
and each of the remaining terms is bounded in norm by a constant (depending on h) times ||k||². Therefore
L(h + k)^n − Lh^n − nL(h^{n−1})k = O(||k||²) = o(k),
which is exactly the differentiability of g at h with g'(h)k = nL(h^{n−1})k.
We can now prove the Taylor-Young formula in Banach spaces: if f : U ⊂ E → F is n − 1 times differentiable on U and f^(n)(a) exists at a point a ∈ U, then
f(a + h) = f(a) + f'(a)h + (1/2!) f''(a)h² + · · · + (1/n!) f^(n)(a)h^n + o(||h||^n).
Proof. We reason by induction on n. For n = 1, the formula reads f(a + h) = f(a) + f'(a)h + o(||h||), and this is just the Fréchet differentiability of f at a. Suppose that the theorem is true for a certain integer n ≥ 1 (and for any map satisfying the assumptions). Let f be n times Fréchet differentiable and suppose that f^(n+1)(a) exists. Set for h small
ϕ(h) = f(a + h) − f(a) − f'(a)h − (1/2!) f''(a)h² − · · · − (1/n!) f^(n)(a)h^n − (1/(n + 1)!) f^(n+1)(a)h^{n+1}.
According to the lemma, ϕ is differentiable and
ϕ'(h) = f'(a + h) − f'(a) − f''(a)h − · · · − (1/(n − 1)!) f^(n)(a)h^{n−1} − (1/n!) f^(n+1)(a)h^n.
By the induction assumption applied to f 0 , we get ϕ0 (h) = o(||h||n ). Let ε > 0 be given. Then
there exists δ > 0 such that
||ϕ0 (h)|| ≤ ε||h||n for ||h|| < δ.
Let ||h|| < δ. By the mean value theorem applied to the ball B(0, δ),
||ϕ(h)|| = ||ϕ(h) − ϕ(0)|| ≤ ||h|| sup_{0≤t≤1} ||ϕ'(th)|| ≤ ε||h||^{n+1}.
Since ε was arbitrary, ϕ(h) = o(||h||^{n+1}), which completes the induction step and the proof.
To prove the other versions of Taylor's formula, we need two preliminary results. Recall that for a map g : I ⊂ R → F which is differentiable at some point t0 ∈ I, we can consider g'(t0) as an
element in F and write
(d/dt) g(t0) = g'(t0) = lim_{h→0} (g(t0 + h) − g(t0))/h.
Lemma 4.3. (Product rule). Let I be an open interval of R and let F be a normed space. Let
f : I → R and g : I → F be differentiable at some point t0 ∈ I. Then the map f g is differentiable
at t0 and
(f g)0 (t0 ) = f (t0 )g 0 (t0 ) + f 0 (t0 )g(t0 ).
Proof/Exercise. You can prove this familiar rule by invoking Proposition 2.6 and the chain rule. Alternatively, you can write
f(t0 + h)g(t0 + h) − f(t0)g(t0) = f(t0 + h)[g(t0 + h) − g(t0)] + [f(t0 + h) − f(t0)]g(t0),
divide by h and let h → 0.
Lemma 4.4. Let v : I → F be n + 1 times differentiable on an open interval I ⊂ R and set
u(t) = v(t) + (1 − t)v'(t) + (1/2!)(1 − t)² v''(t) + · · · + (1/n!)(1 − t)^n v^(n)(t).
Then, for all t ∈ I,
u'(t) = (1/n!)(1 − t)^n v^(n+1)(t).
Proof/Exercise. Differentiate term by term; the sum telescopes.
Corollary 4.1. Let v : [0, 1] → F be continuous and n + 1 times differentiable on ]0, 1[. Suppose that ||v^(n+1)(t)|| ≤ M for some constant M. Then
||v(1) − v(0) − v'(0) − (1/2!) v''(0) − · · · − (1/n!) v^(n)(0)|| ≤ M/(n + 1)!.
Proof. Set
u(t) = v(t) + (1 − t)v'(t) + (1/2!)(1 − t)² v''(t) + · · · + (1/n!)(1 − t)^n v^(n)(t).
Then u(1) = v(1) and
u(0) = v(0) + v'(0) + (1/2!) v''(0) + · · · + (1/n!) v^(n)(0).
The previous lemma states that
u'(t) = (1/n!)(1 − t)^n v^(n+1)(t)
and so
||u'(t)|| ≤ (M/n!)(1 − t)^n.
Setting
g(t) = −(M/(n + 1)!)(1 − t)^{n+1},
we can write this inequality as
||u'(t)|| ≤ g'(t).
By the mean value theorem version 3.4,
||u(1) − u(0)|| ≤ g(1) − g(0) = M/(n + 1)!.
This yields the Taylor-Lagrange formula in Banach spaces: if f : U ⊂ E → F is n + 1 times differentiable, [a, a + h] ⊂ U and ||f^(n+1)(x)|| ≤ M for all x ∈ [a, a + h], then
||f(a + h) − f(a) − f'(a)h − (1/2!) f''(a)h² − · · · − (1/n!) f^(n)(a)h^n|| ≤ (M/(n + 1)!) ||h||^{n+1},
and in particular
f(a + h) = f(a) + f'(a)h + (f''(a)/2)h² + · · · + (1/n!) f^(n)(a)h^n + O(||h||^{n+1}).
Proof. For t ∈ [0, 1], set v(t) = f (a + th). The assumption [a, a + h] ⊂ U implies that v is well
defined and continuous on [0,1]. The chain rule implies that v is n + 1 times differentiable on ]0,1[
and
v 0 (t) = f 0 (a + th)h; v 00 (t) = f 00 (a + th)h2 ; · · · v (n+1) (t) = f (n+1) (a + th)hn+1 .
The assumption ||f (n+1) (x)|| ≤ M implies that ||v (n+1) (t)|| ≤ M ||h||n+1 .
According to the previous corollary (with M replaced by M ||h||n+1 ), we obtain
||v(1) − v(0) − v'(0) − (1/2!) v''(0) − · · · − (1/n!) v^(n)(0)|| ≤ (M/(n + 1)!) ||h||^{n+1}.
Since v(1) = f(a + h), v(0) = f(a) and v^(k)(0) = f^(k)(a)h^k, this is the desired estimate.
The Taylor-Cauchy formula can be extended to maps between Banach spaces. But for this
one has to extend the notion of the Riemann integral to maps that take values in a Banach space
(completeness is needed in this theory). See for example the book of Azé. In this theory, one can
prove the fundamental theorem of calculus which states that if g : [a, b] → F is C 1 , then
∫_a^b g'(t) dt = g(b) − g(a).
Proof/Exercise. Integrate from 0 to 1 the identity in Lemma 4.4 and set v(t) = f(a + th).
4.3 Local extrema
Definition. Let f : U ⊂ E → R be a map and let a ∈ U. We say that f has a local minimum at a if there is a neighborhood V ⊂ U of a such that
f(x) ≥ f(a) ∀x ∈ V.
Or equivalently,
f (a + h) ≥ f (a) for h small enough.
In this case, we also say a is a local minimum point of f . We have a similar definition for local
maximum. A local minimum or maximum is called a local extremum.
We say that f has a global minimum at a if
f(x) ≥ f(a) ∀x ∈ U.
We have a similar definition for global maximum. A global minimum or maximum is called a global
extremum.
• We say that f has a strict local minimum at a if there is a neighborhood V ⊂ U of a such that
f(x) > f(a) ∀x ∈ V \ {a}.
Similar definitions hold for strict local maximum, strict global minimum and strict global maximum.
Observation. If f has a local (resp. global) minimum at a, then −f has a local (resp. global) maximum at a, and a is a critical point of f if and only if it is a critical point of −f. Thus, studying minima is equivalent to studying maxima.
Proposition. Let f : U ⊂ E → R be differentiable at a point a ∈ U. If f has a local extremum at a, then f'(a) = 0; such a point is called a critical point of f.
Proof. This result is known for a map v : I ⊂ R → R. It is enough to prove the result in the
case of local minimum. Let h ∈ E be arbitrary. Set for t small v(t) = f (a + th) (v is well defined
because U is open). By assumption, f (a + th) ≥ f (a) for all t small enough, say t ∈] − δ, δ[. This
can be written as
v(t) ≥ v(0) for t ∈] − δ, δ[.
v'(0) = lim_{t→0} (f(a + th) − f(a))/t = f'(a)h.
We know from calculus that v 0 (0) = 0. Therefore f 0 (a)h = 0. Since h was arbitrary, this means
that f 0 (a) = 0.
Remark 1. The converse is not true. For example let f(t) = t³ for t ∈ R. Then f'(0) = 0 but f has no local extremum at 0.
Definition. A critical point at which f has no local extremum is called a saddle point. This term
comes from the following analogy. The graph of a function of two real variables near a saddle point
looks like a horse's saddle.
Definition. Let f : U ⊂ E → R be twice differentiable at a point a ∈ U. We say that f''(a) is positive if
f''(a)x² = f''(a)(x, x) ≥ 0 ∀x ∈ E.
We say that f 00 (a) is positive definite (or positive nondegenerate) if there exists a positive constant
α such that
f 00 (a)x2 ≥ α||x||2 ∀x ∈ E.
Remark 1. For a function of a real variable, saying that f 00 (a) is positive means that the number
f 00 (a) is ≥ 0. Saying that f 00 (a) is positive definite means that f 00 (a) > 0.
Remark 2. For a map of n real variables, f''(a) is identified with the Hessian matrix, and saying that f''(a) is positive means that the Hessian is positive semidefinite in the linear algebra sense, i.e.,
x^T f''(a)x ≥ 0 ∀x ∈ R^n,
while saying that f''(a) is positive definite means that there exists α > 0 such that
x^T f''(a)x ≥ α||x||² ∀x ∈ R^n.
Exercise. Let A be an n × n real matrix. Show that A is positive definite if and only if xT Ax > 0
whenever x 6= 0. Hint. Use the compactness of the unit sphere in Rn .
Proposition. Let f : U ⊂ E → R be twice differentiable at a ∈ U and suppose that f'(a) = 0 and that f''(a) is positive definite. Then f has a strict local minimum at a.
Proof/Exercise. By the Taylor-Young formula, and since f'(a) = 0,
f(a + h) − f(a) = (1/2) f''(a)h² + r(h)
where r(h) = o(||h||²). By assumption there exists α > 0 such that f''(a)x² ≥ α||x||² for all x. Choose 0 < ε < α/2. Then for h small enough, |r(h)| ≤ ε||h||². Conclude.
When f'(a) = 0, the critical point a is a saddle point as soon as one of the following conditions holds (for E = R^n they are equivalent):
(i) f''(a) and −f''(a) are not positive, i.e., there exist x, y ∈ E such that f''(a)x² < 0 and f''(a)y² > 0.
(ii) The Hessian matrix f''(a) has one strictly negative eigenvalue and one strictly positive eigenvalue.
Hence the following methodology for classifying the critical points of a twice differentiable map f : U ⊂ R^n → R. Suppose that f'(a) = 0.
1. Compute the Hessian matrix f''(a) and its eigenvalues.
2. If all the eigenvalues are > 0, then f''(a) is positive definite and a is a strict local minimum point.
3. If all the eigenvalues are < 0, then a is a strict local maximum point.
4. If one eigenvalue is > 0 and another eigenvalue is < 0, then a is a saddle point.
5. If none of the above holds, we cannot conclude in general. However, if for example f is three times differentiable, we can look at the third term in the Taylor-Young formula.
A small numerical illustration of steps 1–4 is given below.
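The following Python sketch (my own illustration; the example function is assumed, not taken from the notes) applies the methodology to f(x, y) = x³ − 3x + y², whose critical points are (1, 0) and (−1, 0).

import numpy as np

def hessian(x, y):
    # Hessian of f(x, y) = x^3 - 3x + y^2
    return np.array([[6.0 * x, 0.0],
                     [0.0,     2.0]])

for a in [(1.0, 0.0), (-1.0, 0.0)]:        # the critical points, where grad f = 0
    eig = np.linalg.eigvalsh(hessian(*a))   # eigenvalues of the symmetric Hessian
    if np.all(eig > 0):
        kind = "strict local minimum"
    elif np.all(eig < 0):
        kind = "strict local maximum"
    elif eig.min() < 0 < eig.max():
        kind = "saddle point"
    else:
        kind = "inconclusive"
    print(a, eig, kind)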
Remark. Finding the eigenvalues is not the only way to show that a matrix is positive definite. For example, prove in three ways that the following n × n matrix (which is standard in numerical analysis) is positive definite.
A =
(  2  −1   0  · · ·   0 )
( −1   2  −1  · · ·   0 )
(  0  −1   2  · · ·   0 )
(  ·    ·    ·    ·     · )
(  0  · · · −1   2  −1 )
(  0  · · ·  0  −1   2 )
(the tridiagonal matrix with 2 on the main diagonal and −1 on the sub- and superdiagonals).
Hint. Consider this matrix for small values of n and find a pattern.
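The short Python check below (my own sketch; it is of course not one of the three proofs requested) computes the eigenvalues of A for a few values of n and confirms numerically that they are all positive.

import numpy as np

def tridiag(n):
    """The n x n matrix with 2 on the diagonal and -1 on the sub/super-diagonals."""
    return (2.0 * np.eye(n)
            - np.diag(np.ones(n - 1), 1)
            - np.diag(np.ones(n - 1), -1))

for n in (2, 3, 5, 10):
    eig = np.linalg.eigvalsh(tridiag(n))
    print(n, eig.min())   # the smallest eigenvalue is positive, so A is positive definite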
Convex functions
Let U be a convex subset of a normed space E and let f : U → R be a map. We say that f is convex if
f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y) ∀x, y ∈ U, ∀t ∈ [0, 1].
Geometric interpretation. Given any two points on the graph of f , the straight line segment
joining them is above the graph of f (draw a figure).
Proposition 4.8. Let U be an open and convex subset of a normed space E and let f : U → R be a differentiable map. Then the following conditions are equivalent.
(i) f is convex.
(ii) f(y) ≥ f(x) + f'(x)(y − x) for all x, y ∈ U.
(iii) (f'(x) − f'(y))(x − y) ≥ 0 for all x, y ∈ U (monotonicity of f').
Proof. We will prove that (i)⇔(ii). You will prove that (ii)⇔(iii) in the exercises.
(i)⇒(ii). Let x, y ∈ U and t ∈ ]0, 1[. Then, by assumption,
f(x + t(y − x)) = f((1 − t)x + ty) ≤ (1 − t)f(x) + tf(y).
Letting h = y − x, we can write
f (x + th) ≤ (1 − t)f (x) + tf (y).
Subtracting f (x) and dividing by t we get
f (x + th) − f (x)
≤ f (y) − f (x).
t
Letting t → 0, we get
f 0 (x)h ≤ f (y) − f (x),
i.e.,
f 0 (x)(y − x) ≤ f (y) − f (x).
(ii)⇒(i). Let x, y ∈ U, t ∈ [0, 1] and set z = (1 − t)x + ty. By assumption,
f (x) ≥ f (z) + f 0 (z)(x − z)
f (y) ≥ f (z) + f 0 (z)(y − z).
Multiplying the first inequality by (1 − t) and the second one by t and then adding them, we get
(1 − t)f (x) + tf (y) ≥ f (z)
because (1 − t)(x − z) + t(y − z) = (1 − t)x + ty − z = 0.
Corollary 4.2. Let f : U ⊂ E → R be convex and differentiable. Then, the following conditions
are equivalent.
(i) f 0 (a) = 0.
(ii) f has a global minimum at a.
Exercise. Prove, without the assumption of differentiability, that if a convex function has a local
minimum at a point, then it has a global minimum at that point.
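As a small numerical illustration of condition (ii) of Proposition 4.8 (my own sketch, with the assumed convex function f(x) = log(1 + e^x) on R), the snippet below checks the inequality f(y) ≥ f(x) + f'(x)(y − x) on randomly sampled pairs.

import math, random

def f(x):  return math.log(1.0 + math.exp(x))
def df(x): return math.exp(x) / (1.0 + math.exp(x))   # f'(x)

random.seed(0)
for _ in range(10):
    x, y = random.uniform(-5, 5), random.uniform(-5, 5)
    assert f(y) >= f(x) + df(x) * (y - x) - 1e-12     # condition (ii) of Proposition 4.8
print("gradient inequality verified on the sampled pairs")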
Chapter 5
The inverse function theorem and the implicit function theorem
5.1 Introduction
We shall consider two interesting and related problems.
Problem 1. Given an equation of the form y = f (x), can we write x = g(y)? For this to make
sense, f must be bijective. Suppose that this is the case. If y depends smoothly on x, does x
depend smoothly on y? Otherwise stated, if f is smooth, does it follow that f −1 is smooth?
Without any additional assumption, the answer is no. For example the map f : R → R given
by f (x) = x3 is smooth but its inverse f −1 which is given by f −1 (y) = y 1/3 is not smooth (it is not
differentiable at 0). However, under certain assumptions, the answer is yes. We shall attack this
problem in full generality in the context of smooth maps between Banach spaces.
Remark. We can ask another question. If y depends continuously on x, does it follow that x
depends continuously on y? Otherwise stated, if f is a continuous bijection, does it follow that f −1
is continuous? Without any additional assumption, the answer is no and we gave a counterexample
in the topology course. However there are some interesting cases where this is true. Recall that a
continuous bijection whose inverse is continuous is called a homeomorphism.
Example 1. If L : E → F is a linear continuous bijection between two Banach spaces, then L−1 is
continuous as well. This is the celebrated Banach isomorphism theorem (known also as the bounded
inverse theorem) that we mentioned in Chapter 1.
Problem 2. If two variables x and y are related by a relation of the form f (x, y) = 0, can we
express y as a function of x, i.e, can we write y = g(x)?
Without any additional assumption, the answer is no. For example consider the equation x² + y² = 1. We cannot express y in terms of x or vice versa. Geometrically, this means that a full circle
is not the graph of a function. However small parts of the circle are the graph of a function. If we
restrict x and y to a neighborhood of the north pole (0, 1), then y > 0 and so we can write in that
neighborhood y = √(1 − x²). If we restrict x and y to a neighborhood of the south pole (0, −1), then we can write locally y = −√(1 − x²). However, no matter how close to the east pole (1, 0) we are, we cannot write y as a function of x (but we can write x = √(1 − y²)).
Suppose that we can solve the equation f(x, y) = 0 locally and write y = g(x), and suppose that f is smooth. Does it follow that g is smooth?
As a warm up, let us tackle Problem 1 in a very particular case. Let f : V → W be a smooth bijection between two open sets of R. If f⁻¹ is differentiable, then f'(x) ≠ 0 for all x ∈ V. Conversely, if f' never vanishes and f⁻¹ is continuous, then f⁻¹ is as smooth as f.
Suppose first that f −1 is differentiable. Differentiating both sides of the equation f −1 (f (x)) = x,
we get by the chain rule
(f⁻¹)'(f(x)) · f'(x) = 1.
This means that f'(x) ≠ 0 and
(f⁻¹)'(f(x)) = 1/f'(x).
Or equivalently,
(f⁻¹)'(y) = 1/f'(f⁻¹(y)).
Remark. A physicist would just write
dx/dy = 1/(dy/dx),
and the problem is over. This is a good way to remember the relation between (f⁻¹)' and f'. However, we need to be rigorous and precise and put the right assumptions.
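To make the relation concrete (my own numerical sketch, with the assumed example f(x) = x³ + x, whose derivative never vanishes), the Python snippet below compares a finite-difference approximation of (f⁻¹)' with the formula displayed above.

def f(x):  return x**3 + x
def df(x): return 3 * x**2 + 1

def f_inv(y, lo=-10.0, hi=10.0, tol=1e-12):
    """Invert the strictly increasing map f by bisection."""
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < y:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)

y, h = 5.0, 1e-5
finite_diff = (f_inv(y + h) - f_inv(y - h)) / (2 * h)   # approximates (f^-1)'(y)
formula     = 1.0 / df(f_inv(y))                        # 1 / f'(f^-1(y))
print(finite_diff, formula)                             # the two values agree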
Suppose conversely that f −1 is continuous and that f 0 never vanishes. We prove first that f −1
is differentiable at every point b ∈ W. For y = f(x) near b = f(a), we can write
(f⁻¹(y) − f⁻¹(b))/(y − b) = (x − a)/(f(x) − f(a)).    (5.1)
Continuity of f⁻¹ implies that x → a as y → b. Accordingly, the left hand side of (5.1) tends to 1/f'(a) as y → b. It follows that f⁻¹ is differentiable at b and
(f⁻¹)'(b) = 1/f'(f⁻¹(b)).    (5.2)
If f is of class C¹, then f' is continuous. On the other hand, f⁻¹ is continuous and the map t ↦ 1/t is continuous on R\{0}. Therefore, by (5.2), (f⁻¹)' is continuous as a composition of continuous functions, i.e., f⁻¹ is of class C¹.
If now f is of class C^n, we can easily prove by induction that f⁻¹ is also of class C^n. Indeed, we just proved the result for n = 1. Suppose it is true for n − 1 and let f be of class C^n. Then f' is of class C^{n−1}. By the induction assumption f⁻¹ is of class C^{n−1}. Since the inverse map t ↦ 1/t is C^∞, the map b ↦ 1/f'(f⁻¹(b)) is of class C^{n−1} as a composition of maps of class C^{n−1}. It follows from relation (5.2) that (f⁻¹)' is of class C^{n−1}. This means that f⁻¹ is of class C^n.
The assumption that f' never vanishes is a strong one. What if we only know that f'(a) ≠ 0 at a single point a? If f' is continuous, then f' does not vanish on a whole neighborhood of that point.
Therefore according to the previous discussion, f −1 is as smooth as f locally, i.e, near the point
b = f (a).
We can prove a better result. If f is not necessarily globally bijective but f 0 is continuous and
does not vanish at a point a, then f is bijective near a and its inverse is as smooth as f . This is
essentially the content of the local inversion theorem that we shall prove in the general context of
Banach spaces. As we know, the generalization of condition f 0 (a) 6= 0 is the condition that f 0 (a)
is an isomorphism (invertible Jacobian matrix in n−dimensions).
Consider now Problem 2 in dimension 1. Under certain general assumptions, we can solve the
equation f(x, y) = 0 locally near a point (a, b) where f(a, b) = 0. If f is smooth and ∂f/∂y(a, b) ≠ 0, then there is a function g defined on a neighborhood of a such that f(x, g(x)) = 0. This function is as smooth as f. Differentiating this relation and using the chain rule, we get
∂f/∂x + (∂f/∂y) g'(x) = 0.
It follows that
g'(x) = −(∂f/∂y)⁻¹ (∂f/∂x)    (5.3)
where the partial derivatives are evaluated at the point (x, g(x)). This is essentially the content of
the implicit function theorem. Equation (5.3) is called implicit differentiation.
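To illustrate (my own worked example, using the circle already discussed in this introduction): take f(x, y) = x² + y² − 1 near the north pole (0, 1). There ∂f/∂y = 2y ≠ 0, and formula (5.3) gives
g'(x) = −(2y)⁻¹(2x) = −x/g(x),
which is exactly what one obtains by differentiating g(x) = √(1 − x²) directly.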
Remark. Except for the first lemma, in this chapter, we need to work in Banach spaces for three
reasons. First, we will use the Banach fixed point theorem. Second, we need the fact that for two
Banach spaces E and F , Isom(E, F ) is open in L(E, F ) and third, we need the fact that the map
Φ : Isom(E, F ) → Isom(F, E) given by Φ(S) = S −1 is of class C ∞ .
Banach fixed point theorem (or the Banach contraction principle). Let (X, d) be a complete
metric space and let ψ : X → X be a contraction, i.e., there exists a constant k < 1 such that
d(ψ(x), ψ(y)) ≤ k d(x, y) ∀x, y ∈ X.
Then ψ has a unique fixed point, i.e., there exists a unique ℓ ∈ X such that ψ(ℓ) = ℓ.
Remark 1. The element ` is given by a constructive procedure. Start with an arbitrary point x0
and define a sequence (xn ) by xn+1 = ψ(xn ). It is easy to prove that (xn ) is a Cauchy sequence.
The completeness assumption implies that (xn ) has a limit `. Then ` is the required fixed point.
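A minimal sketch of this constructive procedure (my own illustration, with an assumed contraction): the map ψ(x) = cos x sends the complete space X = [−1, 1] into itself and is a contraction there (|ψ'(x)| = |sin x| ≤ sin 1 < 1), so the iteration below converges to its unique fixed point.

import math

x = 1.0                          # arbitrary starting point x_0 in X = [-1, 1]
for _ in range(100):
    x = math.cos(x)              # x_{n+1} = psi(x_n)
print(x, abs(math.cos(x) - x))   # the fixed point (about 0.739085) and a residual ~ 0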
Remark 2. It’s essential that ψ maps X into itself. First, if x is a fixed point of ψ, then x and
ψ(x) should belong to the same space. But even if X ⊂ Y and ψ : X → Y is a contraction, then ψ need not have a fixed point. For example the map ψ : [0, 1] → [0, √2] defined by ψ(x) = √(1 + x²)
is a contraction but has no fixed point (check that). Of course a contraction has at most one fixed
point.
Lemma 5.1. Let E and F be normed spaces and let f : U ⊂ E → F be differentiable at a point a ∈ U. Suppose that f is a bijection from a neighborhood of a onto a neighborhood of b = f(a), with inverse g continuous at b. Then the following conditions are equivalent.
(i) g is differentiable at b.
(ii) f'(a) ∈ Isom(E, F).
In this case g'(b) = (f'(a))⁻¹ (the derivative of the inverse is the inverse of the derivative).
Proof. (i) ⇒ (ii). Consider the identity g ◦ f = 1_E. By the chain rule
g'(b) ◦ f'(a) = 1_E.    (5.4)
Similarly, differentiating f ◦ g = 1_F at b, we get
f'(a) ◦ g'(b) = 1_F.    (5.5)
Together, equations (5.4) and (5.5) imply that f 0 (a) is bijective and its inverse is g 0 (b). Since by
definition f 0 (a) and g 0 (b) are bounded, we get f 0 (a) ∈ Isom (E, F ).
(ii) ⇒ (i). We will write y = f(x) to denote the general argument of g. We start by writing the differentiability condition at a:
y − b = f(x) − f(a) = f'(a)(x − a) + r(x),
where r(x) = o(x − a) for x near a. Applying (f'(a))⁻¹ to both sides of the equation, we get
(f'(a))⁻¹(y − b) − (x − a) = (f'(a))⁻¹ r(x).    (5.6)
Let ε be a positive number < ||(f'(a))⁻¹||. Since r(x) = o(x − a), there exists α > 0 such that
||r(x)|| ≤ (ε / (2||(f'(a))⁻¹||²)) ||x − a|| whenever ||x − a|| ≤ α.
It follows that
||(f'(a))⁻¹ r(x)|| ≤ ||(f'(a))⁻¹|| ||r(x)|| ≤ (ε / (2||(f'(a))⁻¹||)) ||x − a|| < (1/2) ||x − a||.    (5.7)
Therefore, by (5.6) and the triangle inequality,
||(f'(a))⁻¹(y − b)|| ≥ ||x − a|| − ||(f'(a))⁻¹ r(x)|| ≥ ||x − a|| − (1/2)||x − a|| = (1/2)||x − a||.
Thus we proved that
||x − a|| ≤ 2 ||(f'(a))⁻¹(y − b)|| ≤ 2 ||(f'(a))⁻¹|| ||y − b||    (5.8)
whenever ||x − a|| ≤ α. Continuity of g at b implies that there exists β > 0 such that ||x − a|| =
||g(y) − g(b)|| ≤ α if ||y − b|| ≤ β. Thus, let ||y − b|| ≤ β. Combining estimates (5.7) and (5.8),
we get
||(f'(a))⁻¹ r(x)|| ≤ (ε / (2||(f'(a))⁻¹||)) ||x − a|| ≤ (ε / (2||(f'(a))⁻¹||)) · 2||(f'(a))⁻¹|| ||y − b|| = ε||y − b||.
In view of (5.6), and since x = g(y) and a = g(b), this means that
||g(y) − g(b) − (f'(a))⁻¹(y − b)|| ≤ ε||y − b|| whenever ||y − b|| ≤ β.
Since ε can be taken arbitrarily small, g is differentiable at b and g'(b) = (f'(a))⁻¹.
Theorem 5.1. (the inverse function theorem or the local inversion theorem – version 1). Let E and F be isomorphic Banach spaces and let f : U ⊂ E → F be a differentiable map such that
(a) f' is continuous at a point a ∈ U,
(b) f'(a) ∈ Isom(E, F).
Then there exist a neighborhood V of a and a neighborhood W of b = f(a) such that
(i) f : V → W is a homeomorphism;
(ii) its inverse g : W → V is differentiable at b and g'(b) = (f'(a))⁻¹.
Remark. In this theorem, we did not assume that f is a bijection or that f −1 is continuous
because this is the conclusion.
Proof. Here’s our plan for the proof.
Step 1. Write the equation y = f (x) as a fixed point problem.
Step 2. Show that this problem has a unique solution for y in some closed neighborhood W of b.
This defines a map g : W → E that to each y ∈ W associates the unique solution of the equation
y = f (x).
Step 3. Show that g is continuous.
Step 4. Find a neighborhood V of a such that f : V → W is a bijection. Its inverse will necessarily be g : W → V.
Step 1. Let
ψ_y(x) = x + (f'(a))⁻¹ (y − f(x)).
Since (f 0 (a))−1 is an isomorphism, we see that y = f (x) if and only if ψy (x) = x. We will show
shortly that ψy is a contraction, but without any restriction on y, ψy need not have any fixed point.
We need to ensure that ψ maps a closed subset of E into itself.
Step 2. Continuity of f 0 at a implies that there exists a number α > 0 such that
||f'(a) − f'(ζ)|| ≤ 1/(2||(f'(a))⁻¹||) ∀ζ ∈ B′(a, α).
Observe that for x, x′ ∈ B′(a, α), if we set u(t) = tx + (1 − t)x′ − (f'(a))⁻¹ f(tx + (1 − t)x′) for t ∈ [0, 1], then
u'(t) = (x − x′) − (f'(a))⁻¹ f'(tx + (1 − t)x′)(x − x′) = (f'(a))⁻¹ [f'(a) − f'(tx + (1 − t)x′)](x − x′).
It follows that
||u'(t)|| ≤ ||(f'(a))⁻¹|| ||f'(a) − f'(tx + (1 − t)x′)|| ||x − x′|| ≤ (1/2)||x − x′||.
By the mean value theorem version 3.3.
||u(1) − u(0)|| ≤ (1/2)||x − x′||,
that is
||x − x′ − (f'(a))⁻¹ f(x) + (f'(a))⁻¹ f(x′)|| ≤ (1/2)||x − x′||.    (5.9)
Recalling the definition of ψ_y, the above inequality implies that
||ψ_y(x) − ψ_y(x′)|| ≤ (1/2)||x − x′||.
This means that ψy is a contraction.
Let β = α/(2||(f'(a))⁻¹||), so that ||(f'(a))⁻¹||β = α/2. If ||y − b|| ≤ β, then ψ_y maps the closed ball B′(a, α) into itself: indeed, for x ∈ B′(a, α),
||ψ_y(x) − a|| ≤ ||ψ_y(x) − ψ_y(a)|| + ||(f'(a))⁻¹(y − b)|| ≤ α/2 + ||(f'(a))⁻¹||β = α.
By the Banach fixed point theorem, ψ_y has a unique fixed point in B′(a, α), which we denote g(y); it is the unique solution of y = f(x) in B′(a, α). This defines a map g on the closed neighborhood W = B′(b, β) of b.
Step 3. Let y, y′ ∈ W. Then
ψ_y(g(y)) = g(y) + (f'(a))⁻¹(y − f(g(y))) = g(y)
and
ψ_{y′}(g(y′)) = g(y′) + (f'(a))⁻¹(y′ − f(g(y′))) = g(y′).
Therefore
ψ_y(g(y′)) − ψ_{y′}(g(y′)) = (f'(a))⁻¹(y − y′).
Using these observations, the triangle inequality and the fact that ψ_y is a ½-contraction, we get
||g(y) − g(y′)|| ≤ ||ψ_y(g(y)) − ψ_y(g(y′))|| + ||ψ_y(g(y′)) − ψ_{y′}(g(y′))|| ≤ (1/2)||g(y) − g(y′)|| + ||(f'(a))⁻¹|| ||y − y′||,
and therefore
||g(y) − g(y′)|| ≤ 2||(f'(a))⁻¹|| ||y − y′||.
In particular, g is continuous (in fact Lipschitz) on W.
It follows from the last claim that f : V → W is a homeomorphism. This proves conclusion (i).
Conclusion (ii) follows from Lemma 5.1.
Remark. The proof of the above theorem gives an algorithm for solving the equation y = f (x).
If we recall the proof of the Banach fixed point theorem, we see that x is the limit of a sequence
defined by x₀ ∈ B′(a, α) and
x_{n+1} = ψ_y(x_n) = x_n + (f'(a))⁻¹ (y − f(x_n)).
This iteration is a simplified form of Newton's method, in which f'(a) stays fixed; in Newton's method proper, f'(a) is replaced by f'(x_n) at each step. It plays a central role in numerical analysis. If y is
sufficiently close to b (||y − b|| ≤ β) and x0 is sufficiently close to a (||x0 − a|| ≤ α), convergence
in principle is very fast. However, the algorithm may diverge and inverting a large matrix can be
computationally challenging.
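A minimal Python sketch of this iteration (my own, under simplifying assumptions: the map f below is hypothetical and chosen so that f(0) = 0 and f'(0) is the identity, hence trivially invertible):

import numpy as np

def f(x):
    # f(x1, x2) = (x1 + x1*x2, x2 + x1^2), so f(0) = 0 and f'(0) = identity
    return np.array([x[0] + x[0] * x[1], x[1] + x[0]**2])

df_a_inv = np.eye(2)            # (f'(a))^{-1} at a = (0, 0)
y = np.array([0.05, 0.08])      # a target value close to b = f(a) = (0, 0)

x = np.zeros(2)                 # start at x_0 = a
for _ in range(30):
    x = x + df_a_inv @ (y - f(x))    # x_{n+1} = psi_y(x_n)
print(x, f(x) - y)              # f(x) - y should be essentially zero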
Definition. Let E and F be normed spaces. Let V be an open subset of E and W be an open
subset of F . Let f : V → W be a bijection. We say that f is a diffeomorphism if f and f −1 are
differentiable. We say that f is a diffeomorphism of class C n (or a C n −diffeomorphism) if f and
f −1 are of class C n .
Corollary 5.1. (the inverse function theorem or the local inversion theorem – version 2).
Let E and F be isomorphic Banach spaces and let f : U ⊂ E → F be a map of class C n . Suppose
that
f 0 (a) ∈ Isom (E, F ) for some point a ∈ U.
Then there exists an open neighborhood V of a and an open neighborhood W of b = f (a) such
that f : V → W is a diffeomorphism of class C n .
Proof. We reason by induction on n.
Basis step (n = 1). The continuity of f' at a and the openness of Isom(E, F) imply together that there exists a neighborhood V′ of a such that
f'(x) ∈ Isom(E, F) ∀x ∈ V′.
By Theorem 5.1 there exist neighborhoods V ⊂ V′ of a and W of b = f(a) such that
(i) f : V → W is a homeomorphism.
By Lemma 5.1, applied at each point of V, the inverse g = f⁻¹ is differentiable on W and
g'(y) = (f'(g(y)))⁻¹.
Since g' = Φ ◦ f' ◦ g is a composition of continuous maps, g is of class C¹, so f : V → W is a C¹-diffeomorphism.
Induction step. If f is of class C^n, then by the induction hypothesis g is of class C^{n−1}; since g' = Φ ◦ f' ◦ g with f' of class C^{n−1} and Φ of class C^∞ (Proposition 4.4), g' is of class C^{n−1}, i.e., g is of class C^n.
Corollary 5.2. (The global inverse function theorem). Let E and F be isomorphic Banach
spaces and let U be an open subset of E. Consider a map f : U → F of class C n . Then the
following conditions are equivalent.
(i) f(U) is open and f is a (global) C^n-diffeomorphism between U and f(U).
(ii) f is injective and f'(x) ∈ Isom(E, F) for every x ∈ U.
Theorem 5.3. (the implicit function theorem). Let E, F and G be Banach spaces with F and
G isomorphic. Let U be an open subset of E × F and consider a map f : U → G of class C n . Let
(a, b) ∈ U such that
(a) f (a, b) = 0,
(b) ∂f/∂y(a, b) is an isomorphism between F and G.
Then the following hold.
(i) There exists an open neighborhood V of (a, b) contained in U, an open neighborhood W of a and a C^n function g : W → F such that, for (x, y) ∈ V,
f(x, y) = 0 ⟺ x ∈ W and y = g(x).
(ii) For every x ∈ W,
g'(x) = −(∂f/∂y(x, g(x)))⁻¹ ∘ ∂f/∂x(x, g(x)).
Proof. We will deduce this theorem from the inverse function theorem. Define the map f1 : U →
E × G by
f1 (x, y) = (x, f (x, y)) .
Then f₁ is of class C^n and, in block matrix form,
f₁'(x, y) = ( I_E      0
              ∂f/∂x   ∂f/∂y ).
The assumption that ∂f/∂y(a, b) is an isomorphism implies that f₁'(a, b) is an isomorphism (you should be able to check that).
Thus, f1 satisfies the assumptions of version 2 of the inverse function theorem. Accordingly,
there exists a neighborhood V of (a, b) and a neighborhood W1 of f1 (a, b) = (a, 0) such that
f₁ : V → W₁ is a C^n-diffeomorphism. Let g₁ = f₁⁻¹. Then g₁ is of the form g₁(x, z) = (x, ϕ(x, z)) (the first component of g₁ must be the identity and the second component is ϕ). Note that ϕ is of class C^n. Now, for (x, z) ∈ W₁,
(x, z) = f₁(g₁(x, z)) = (x, f(x, ϕ(x, z))), and so f(x, ϕ(x, z)) = z.
Now let π : E → E × G be defined by π(x) = (x, 0) and set W = π⁻¹(W₁). Then W is an open neighborhood of a and (x, 0) ∈ W₁ ⇔ x ∈ W. Thus, setting g(x) = ϕ(x, 0) for x ∈ W, we have proved that for (x, y) ∈ V,
f(x, y) = 0 ⟺ x ∈ W and y = g(x).
This is conclusion (i). Conclusion (ii) follows from the chain rule.
Exercise 1 (local uniqueness). Let W′ be a connected neighborhood of a and let h : W′ → F be a continuous map such that
1. h(a) = b,
2. (x, h(x)) ∈ U ∀x ∈ W′,
3. f(x, h(x)) = 0 ∀x ∈ W′.
Show that h = g on a neighborhood of a.
Exercise 2. Deduce the inverse function theorem version 2 from the implicit function theorem.
Hint. Consider the function f1 (x, y) = y − f (x).
Lagrange multipliers
In chapter 4, we considered some optimization problems for a smooth map f : U ⊂ E → R and we
gave conditions to detect local extrema. Now we consider optimization problems with constraints.
For example what is the maximum value of x + y + z subject to the condition x2 + y 2 + z 2 = 1?
This problem can be stated as
maximize x + y + z subject to x² + y² + z² = 1.
We call the condition x2 + y 2 + z 2 = 1 a constraint. Geometrically, this means that we want to find
the maximum (or minimum) value of the function x + y + z on the unit sphere.
More generally, given two smooth functions f and g from an open subset U of R^n to R, we consider the maximization problem
maximize f(x) subject to g(x) = 0.
Theorem 5.4. (Lagrange multiplier theorem with one constraint). Let U be an open subset
of Rn (n ≥ 2). Let f and g be C 1 −maps from U to R. If f restricted to g −1 (0) has a local
extremum at a point a and ∇g(a) 6= 0, then there exists a number λ such that
∇f (a) = λ∇g(a).
The number λ is called the Lagrange multiplier relative to the optimization problem.
Remark. In some texts, you may find the conclusion stated as ∇f(a) + λ∇g(a) = 0.
Proof. For simplicity, we present a proof when n = 3, but the same proof works for any n ≥ 2.
Let a = (a1 , a2 , a3 ) and denote the arguments of f and g by x, y, z. Since ∇g(a) 6= 0, at least
one among the partial derivatives ∂1 g(a) = ∂g/∂x(a), ∂2 g(a) = ∂g/∂y(a) and ∂3 g(a) = ∂g/∂z(a) does not
vanish. After relabeling the variables, we may assume that ∂3 g(a) 6= 0. Since g(a1 , a2 , a3 ) = 0,
by the implicit function theorem, there exists a smooth map ϕ defined on a neighborhood W of
(a1 , a2 ) such that a3 = ϕ(a1 , a2 ) and
g(x, y, ϕ(x, y)) = 0 ∀(x, y) ∈ W.
It follows that the partial derivatives with respect to x and y of this map of two variables vanish.
By the chain rule
∂1 g + ∂3 g ∂1 ϕ = 0
∂2 g + ∂3 g ∂2 ϕ = 0
where the partial derivatives of g are evaluated at (x, y, ϕ(x, y)). In particular, at the point (a1 , a2 ),
∂1 g(a) + ∂3 g(a) ∂1 ϕ(a1 , a2 ) = 0 and ∂2 g(a) + ∂3 g(a) ∂2 ϕ(a1 , a2 ) = 0.
It follows that
−∂1 ϕ(a1 , a2 ) = ∂1 g(a)/∂3 g(a) and −∂2 ϕ(a1 , a2 ) = ∂2 g(a)/∂3 g(a).    (5.11)
If (x, y) is sufficiently close to (a1 , a2 ), then continuity of ϕ implies that (x, y, ϕ(x, y)) is close to a. It follows that the map (x, y) ↦ f(x, y, ϕ(x, y)) has a local extremum at (a1 , a2 ). Therefore (a1 , a2 ) is a critical point of this map. By the chain rule again,
∂1 f(a) + ∂3 f(a) ∂1 ϕ(a1 , a2 ) = 0 and ∂2 f(a) + ∂3 f(a) ∂2 ϕ(a1 , a2 ) = 0.
If ∂3 f(a) = 0, then the above system implies that ∂1 f(a) = ∂2 f(a) = 0. Therefore ∇f(a) = 0 and so we can take λ = 0. If not, then
−∂1 ϕ(a1 , a2 ) = ∂1 f(a)/∂3 f(a) and −∂2 ϕ(a1 , a2 ) = ∂2 f(a)/∂3 f(a).
Comparing with (5.11), we obtain
∂1 f(a)/∂3 f(a) = ∂1 g(a)/∂3 g(a) and ∂2 f(a)/∂3 f(a) = ∂2 g(a)/∂3 g(a).
Setting λ = ∂3 f(a)/∂3 g(a), we conclude that ∂i f(a) = λ ∂i g(a) for i = 1, 2, 3, that is, ∇f(a) = λ∇g(a).
Methodology
Consider the problem of finding the minimum and maximum values of a smooth function f of n
real variables with the smooth constraint g(x) = 0.
1. Make sure that the maximum and minimum exist. This is the case when g⁻¹(0) is compact.
2. Write the system of n + 1 equations
∇f(a) = λ∇g(a), g(a) = 0,
in the n + 1 unknowns (a, λ) and try to solve it. This system can be hard or even impossible to solve with a simple formula because it is not necessarily linear in a. In this case, we may solve it numerically. However there are some simple cases where we can solve it explicitly.
3. Evaluate f at the solutions found: the biggest value is the maximum and the smallest value is the minimum.
Example. Find the maximum and minimum values of f (x, y, z) = x + y + z subject to the
constraint g(x, y, z) = x2 + y 2 + z 2 − 1 = 0. First, we must observe that there is a minimum and a
maximum value because x + y + z is continuous and the sphere is compact. Second, the gradient
of g vanishes only at zero. Therefore we are sure that there is a Lagrange multiplier. The system
we want to solve is
1 = 2λx
1 = 2λy
1 = 2λz
x2 + y 2 + z 2 = 1.
This gives x = y = z = 1/(2λ) and x² + y² + z² = 1. Therefore 3/(4λ²) = 1 and so λ = ±√3/2. This gives two candidates for the maximum and the minimum: (1/√3, 1/√3, 1/√3) and (−1/√3, −1/√3, −1/√3). The first point is necessarily a global maximum point because f takes the value √3 there, and the second one is necessarily a global minimum point because f takes the value −√3.
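A numerical cross-check of this example (my own sketch using numpy): sample points on the unit sphere, compare the largest sampled value of x + y + z with √3, and verify the multiplier at the maximizer.

import numpy as np

rng = np.random.default_rng(0)
pts = rng.normal(size=(200000, 3))
pts /= np.linalg.norm(pts, axis=1, keepdims=True)    # project the samples onto the unit sphere
print(pts.sum(axis=1).max(), np.sqrt(3))             # sampled maximum vs. the exact value sqrt(3)

a = np.ones(3) / np.sqrt(3)                          # the candidate maximum point
grad_f = np.ones(3)                                  # gradient of x + y + z
grad_g = 2 * a                                       # gradient of x^2 + y^2 + z^2 - 1
print(grad_f / grad_g)                               # constant vector: the multiplier lambda = sqrt(3)/2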