Math 188 Fall 2017 Notes

Notes on the Calculus of Variations and Optimization

Preliminary Lecture Notes

Adolfo J. Rumbos

© Draft date November 14, 2017
Contents

1 Preface

2 Variational Problems
2.1 Minimal Surfaces
2.2 Linearized Minimal Surface Equation
2.3 Vibrating String

3 Indirect Methods
3.1 Geodesics in the plane
3.2 Fundamental Lemmas
3.3 The Euler–Lagrange Equations

4 Convex Minimization
4.1 Gâteaux Differentiability
4.2 A Minimization Problem
4.3 Convex Functionals
4.4 Convex Minimization Theorem

5 Optimization with Constraints
5.1 Queen Dido’s Problem
5.2 Euler–Lagrange Multiplier Theorem
5.3 An Isoperimetric Problem

A Some Inequalities
A.1 The Cauchy–Schwarz Inequality

B Theorems About Integration
B.1 Differentiating Under the Integral Sign
B.2 The Divergence Theorem

C Continuity of Functionals
C.1 Definition of Continuity
Chapter 1

Preface

This course is an introduction to the Calculus of Variations and its applications


to the theory of differential equations, in particular, boundary value problems.
The calculus of variations is a subject as old as the Calculus of Newton and
Leibniz. It arose out of the necessity of looking at physical problems in which
an optimal solution is sought; e.g., which configurations of molecules, or paths of
particles, will minimize a physical quantity like the energy or the action? Prob-
lems like these are known as variational problems. Since its beginnings, the
calculus of variations has been intimately connected with the theory of differen-
tial equations; in particular, the theory of boundary value problems. Sometimes
a variational problem leads to a differential equation that can be solved, and
this gives the desired optimal solution. On the other hand, variational meth-
ods can be successfully used to find solutions of otherwise intractable problems
in nonlinear partial differential equations. This interplay between the theory of
boundary value problems for differential equations and the calculus of variations
will be one of the major themes in the course.
We begin the course with an example involving surfaces that span a wire
loop in space. Out of all such surfaces, we would like to find, if possible, the one
that has the smallest possible surface area. If such a surface exists, we call it a
minimal surface. This example will serve to motivate a large portion of what
we will be doing in this course. The minimal surface problem is an example of
a variational problem.
In a variational problem, out of a class of functions (e.g., functions whose graphs in three–dimensional space yield surfaces spanning a given loop) we seek to find one that optimizes (minimizes or maximizes) a certain quantity (e.g., the surface area of the surface). There are two approaches to solving this kind of problem: the direct approach and the indirect approach. In the direct
approach, we try to find a minimizer or a maximizer of the quantity, in some
cases, by considering sequences of functions for which the quantity under study
approaches a maximum or a minimum, and then extracting a subsequence of the functions that converges in some sense to the sought-after optimal solution.
In the indirect method of the Calculus of Variations, which was developed first


historically, we first find necessary conditions for a given function to be an


optimizer for the quantity. In cases in which we assume that functions in the
class under study are differentiable, these conditions, sometimes, come in the
form of a differential equations, or system of differential equations, that the
functions must satisfy, in conjunction with some boundary conditions. This
process leads to a boundary value problem. If the boundary value problem can
be solved, we can obtain a candidate for an optimizer of the quantity (a critical
“point”). The next step in the process is to show that the given candidate is
an optimizer. This can be done, in some cases, by establishing some sufficient
conditions for a function to be an optimizer. The indirect method in the Calculus of Variations is reminiscent of the optimization procedure that we first learn in a single–variable Calculus course.
Conversely, some classes of boundary value problems have a particular struc-
ture in which solutions are optimizers (minimizers, maximizers, or, in general,
critical “points”) of a certain quantity over a class of functions. Thus, these
differential equations problems can, in theory, be solved by finding optimizers
of a certain quantity. In some cases, the existence of optimizers can be achieved
by a direct method in the Calculus of Variations. This provides an approach,
known as the variational approach, to the theory of differential equations.
Chapter 2

Examples of Variational Problems

2.1 Minimal Surfaces


Imagine you take a twisted wire loop, as that pictured in Figure 2.1.1, and dip it
into a soap solution. When you pull it out of the solution, a soap film spanning
the wire loop develops. We are interested in understanding the mathematical
properties of the film, which can be modeled by a smooth surface in three–dimensional space.

Figure 2.1.1: Wire Loop

Specifically, the shape of the soap film spanning the wire loop can be modeled by the graph of a smooth function, u : Ω̄ → R, defined on the closure of a bounded region, Ω, in the xy–plane with smooth boundary ∂Ω.
The physical explanation for the shape of the soap film relies on the variational principle that states that, at equilibrium, the configuration of the film must be such that the energy associated with the surface tension in the film is the lowest possible. Since the energy associated with surface tension in the film is
proportional to the area of the surface, it follows from the least–energy principle
that a soap film must minimize the area; in other words, the soap film spanning
the wire loop must have the shape of a smooth surface in space containing
the wire loop with the property that it has the smallest possible area among
all smooth surfaces that span the wire loop. In this section we will develop a
mathematical formulation of this variational problem.
The wire loop can be modeled by the curve determined by the set of points:

(x, y, g(x, y)), for (x, y) ∈ ∂Ω,

where ∂Ω is the smooth boundary of a bounded open region Ω in the xy–plane


(see Figure 2.1.1), and g is a given function defined in a neighborhood of ∂Ω,
which is assumed to be continuous. A surface, S, spanning the wire loop can
be modeled by the image of a C¹ map

Φ : Ω̄ → R³

given by

Φ(x, y) = (x, y, u(x, y)), for all (x, y) ∈ Ω̄, (2.1)

where Ω̄ = Ω ∪ ∂Ω is the closure of Ω, and

u : Ω̄ → R

is a function that is assumed to be C² in Ω and continuous on Ω̄; we write

u ∈ C²(Ω) ∩ C(Ω̄).

Let A_g denote the collection of functions u ∈ C²(Ω) ∩ C(Ω̄) satisfying

u(x, y) = g(x, y), for all (x, y) ∈ ∂Ω;

that is,

A_g = {u ∈ C²(Ω) ∩ C(Ω̄) | u = g on ∂Ω}. (2.2)

Next, we see how to compute the area of the surface S_u = Φ(Ω̄), where Φ is the map given in (2.1) for u ∈ A_g, and A_g is the class of functions defined in (2.2).
The grid lines x = c and y = d, for arbitrary constants c and d, are mapped by the parametrization Φ into curves in the surface S_u given by

y ↦ Φ(c, y)

and

x ↦ Φ(x, d),

respectively. The tangent vectors to these paths are given by

Φ_y = (0, 1, ∂u/∂y) (2.3)

and

Φ_x = (1, 0, ∂u/∂x), (2.4)

respectively. The quantity

‖Φ_x × Φ_y‖ ∆x ∆y (2.5)

gives an approximation to the area of the portion of the surface S_u that results from mapping the rectangle [x, x + ∆x] × [y, y + ∆y] in the region Ω to the surface S_u by means of the parametrization Φ given in (2.1). Adding up all the contributions in (2.5), while refining the grid, yields the following formula for the area of S_u:

area(S_u) = ∬_Ω ‖Φ_x × Φ_y‖ dxdy. (2.6)

Using the definitions of the tangent vectors Φ_x and Φ_y in (2.3) and (2.4), respectively, we obtain that

Φ_x × Φ_y = (−∂u/∂x, −∂u/∂y, 1),

so that

‖Φ_x × Φ_y‖ = √(1 + (∂u/∂x)² + (∂u/∂y)²),

or

‖Φ_x × Φ_y‖ = √(1 + |∇u|²),

where |∇u| denotes the Euclidean norm of ∇u. We can therefore write (2.6) as

area(S_u) = ∬_Ω √(1 + |∇u|²) dxdy. (2.7)

The formula in (2.7) allows us to define a map

A : A_g → R

by

A(u) = ∬_Ω √(1 + |∇u|²) dxdy, for all u ∈ A_g, (2.8)

which gives the area of the surface parametrized by the map Φ : Ω̄ → R³ given in (2.1) for u ∈ A_g. We will refer to the map A : A_g → R defined in (2.8) as the area functional. With this new notation we can restate the variational problem of this section as follows:

Problem 2.1.1 (Variational Problem 1). Out of all functions in A_g, find one such that

A(u) ≤ A(v), for all v ∈ A_g. (2.9)

That is, find a function in A_g that minimizes the area functional in the class A_g.
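Before analyzing Problem 2.1.1, it may help to see the area functional evaluated concretely. The following Python sketch (an added illustration, not part of the original notes) approximates A(u) in (2.8) on the unit square by central differences and a Riemann sum; the sample function u below is an arbitrary example, not a minimizer.

import numpy as np

def area_functional(u, dx, dy):
    # Approximate A(u) = double integral over Omega of sqrt(1 + |grad u|^2) dxdy
    # using central differences for u_x, u_y and a Riemann sum for the integral.
    ux = np.gradient(u, dx, axis=0)
    uy = np.gradient(u, dy, axis=1)
    integrand = np.sqrt(1.0 + ux**2 + uy**2)
    return integrand.sum() * dx * dy

# Example (hypothetical) surface over the unit square Omega = (0,1) x (0,1).
n = 101
x = np.linspace(0.0, 1.0, n)
y = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, y, indexing="ij")
u = 0.1 * np.sin(np.pi * X) * np.sin(np.pi * Y)

dx = x[1] - x[0]
dy = y[1] - y[0]
print(area_functional(u, dx, dy))  # slightly larger than area(Omega) = 1 (about 1.02)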
Problem 2.1.1 is an instance of what has come to be known as Plateau’s problem in the Calculus of Variations. The mathematical question surrounding Plateau’s problem was first formulated by Euler and Lagrange around 1760. In the middle of the 19th century, the Belgian physicist Joseph Plateau conducted experiments
with soap films that led him to the conjecture that soap films that form around
wire loops are of minimal surface area. It was not until 1931 that the American
mathematician Jesse Douglas and the Hungarian mathematician Tibor Radó,
independently, came up with the first mathematical proofs for the existence
of minimal surfaces. In this section we will derive a necessary condition for
the existence of a solution to Problem 2.1.1, which is expressed in terms of
a partial differential equation (PDE) that u ∈ Ag must satisfy, the minimal
surface equation.
Suppose we have found a solution, u ∈ A_g, of Problem 2.1.1. Let ϕ : Ω → R be a C^∞ function with compact support in Ω; we write ϕ ∈ C_c^∞(Ω) (we show a construction of such a function in the Appendix). It then follows that

u + tϕ ∈ A_g, for all t ∈ R, (2.10)

since ϕ vanishes in a neighborhood of ∂Ω and therefore u + tϕ = g on ∂Ω. It follows from (2.10) and (2.9) that

A(u) ≤ A(u + tϕ), for all t ∈ R. (2.11)

Consequently, the function f : R → R defined by

f(t) = A(u + tϕ), for all t ∈ R, (2.12)

has a minimum at 0, by virtue of (2.11) and (2.12). It follows from this observation that, if f is differentiable at 0, then

f′(0) = 0. (2.13)

We will see next that, since we are assuming that u ∈ C²(Ω) ∩ C(Ω̄) and ϕ ∈ C_c^∞(Ω), f is indeed differentiable. To see why this is the case, use (2.12) and (2.8) to compute

f(t) = ∬_Ω √(1 + |∇(u + tϕ)|²) dxdy, for all t ∈ R, (2.14)

where

∇(u + tϕ) = ∇u + t∇ϕ, for all t ∈ R,

by the linearity of the differential operator ∇. It then follows that

|∇(u + tϕ)|² = (∇u + t∇ϕ) · (∇u + t∇ϕ)
             = ∇u · ∇u + t∇u · ∇ϕ + t∇ϕ · ∇u + t²∇ϕ · ∇ϕ
             = |∇u|² + 2t ∇u · ∇ϕ + t²|∇ϕ|²,

so that, substituting into (2.14),

f(t) = ∬_Ω √(1 + |∇u|² + 2t ∇u · ∇ϕ + t²|∇ϕ|²) dxdy, for all t ∈ R. (2.15)

Since the integrand in (2.15) is a C¹ function of t, we can differentiate under the integral sign (see Appendix) to get

f′(t) = ∬_Ω (∇u · ∇ϕ + t|∇ϕ|²) / √(1 + |∇u|² + 2t ∇u · ∇ϕ + t²|∇ϕ|²) dxdy, (2.16)

for all t ∈ R. Thus, f is differentiable and, substituting 0 for t in (2.16),

f′(0) = ∬_Ω (∇u · ∇ϕ) / √(1 + |∇u|²) dxdy. (2.17)

Hence, if u is a minimizer of the area functional in A_g, it follows from (2.13) and (2.17) that

∬_Ω (∇u · ∇ϕ) / √(1 + |∇u|²) dxdy = 0, for all ϕ ∈ C_c^∞(Ω). (2.18)

The statement in (2.18) provides a necessary condition for the existence of a minimizer of the area functional in A_g. We will next see how (2.18) gives rise to a PDE that u ∈ C²(Ω) ∩ C(Ω̄) must satisfy in order for it to be a minimizer of the area functional in A_g.

First, we “integrate by parts” (see Appendix) in (2.18) to get

−∬_Ω ∇ · ( ∇u / √(1 + |∇u|²) ) ϕ dxdy + ∫_∂Ω ϕ (∇u · ~n) / √(1 + |∇u|²) ds = 0, (2.19)

for all ϕ ∈ C_c^∞(Ω), where the second integral in (2.19) is a path integral around the boundary of Ω. Since ϕ ∈ C_c^∞(Ω) vanishes in a neighborhood of the boundary of Ω, it follows from (2.19) that

∬_Ω ∇ · ( ∇u / √(1 + |∇u|²) ) ϕ dxdy = 0, for all ϕ ∈ C_c^∞(Ω). (2.20)

Since u is assumed to be a C² function, the divergence term in the integrand of (2.20) is continuous on Ω; it then follows from the statement in (2.20) that

∇ · ( ∇u / √(1 + |∇u|²) ) = 0, in Ω. (2.21)

The equation in (2.21) is a second order nonlinear PDE known as the minimal surface equation. It provides a necessary condition for a function u ∈ C²(Ω) ∩ C(Ω̄) to be a minimizer of the area functional in A_g. Since we are also assuming that u ∈ A_g, we get that u must solve the boundary value problem (BVP):

∇ · ( ∇u / √(1 + |∇u|²) ) = 0 in Ω;
u = g on ∂Ω. (2.22)

The BVP in (2.22) is called the Dirichlet problem for the minimal surface equation.
The PDE in (2.21) can also be written as

(1 + u_y²) u_xx − 2 u_x u_y u_xy + (1 + u_x²) u_yy = 0, in Ω, (2.23)

where the subscripted symbols read as follows:

u_x = ∂u/∂x, u_y = ∂u/∂y,

u_xx = ∂²u/∂x², u_yy = ∂²u/∂y²,

and

u_xy = ∂²u/∂y∂x = ∂²u/∂x∂y = u_yx. (2.24)

The fact that the “mixed” second partial derivatives in (2.24) are equal follows from the assumption that u is a C² function.

The equation in (2.23) is a nonlinear, second order, elliptic PDE.
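To see how (2.23) follows from (2.21), write W = √(1 + |∇u|²) and expand the divergence; this short verification is added here for completeness. We have

∂/∂x (u_x/W) + ∂/∂y (u_y/W) = (W u_xx − u_x W_x)/W² + (W u_yy − u_y W_y)/W²,

where

W_x = (u_x u_xx + u_y u_xy)/W and W_y = (u_x u_xy + u_y u_yy)/W.

Combining the two fractions over the common denominator W³ gives

∇ · (∇u/W) = [ (1 + u_y²) u_xx − 2 u_x u_y u_xy + (1 + u_x²) u_yy ] / W³,

and multiplying (2.21) through by W³ > 0 yields (2.23).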

2.2 The Linearized Minimal Surface Equation


For the case in which the wire loop in the previous section is very close to a horizontal plane (see Figure 2.2.2), it is reasonable to assume that, if u ∈ A_g, then |∇u| is very small throughout Ω.

Figure 2.2.2: Almost Planar Wire Loop

We can therefore use the linear approximation

√(1 + t) ≈ 1 + (1/2)t, for small |t|, (2.25)

to approximate the area functional in (2.8) by

A(u) ≈ ∬_Ω ( 1 + (1/2)|∇u|² ) dxdy, for all u ∈ A_g,

so that

A(u) ≈ area(Ω) + (1/2) ∬_Ω |∇u|² dxdy, for all u ∈ A_g. (2.26)

The integral on the right–hand side of the expression in (2.26) is known as the Dirichlet Integral. We will use it in these notes to define the Dirichlet functional, D : A_g → R,

D(u) = (1/2) ∬_Ω |∇u|² dxdy, for all u ∈ A_g. (2.27)

Thus, in view of (2.26) and (2.27),

A(u) ≈ area(Ω) + D(u), for all u ∈ A_g. (2.28)

Thus, according to (2.28), for wire loops close to a horizontal plane, minimal surfaces spanning the wire loop can be approximated by solutions to the following variational problem.

Problem 2.2.1 (Variational Problem 2). Out of all functions in A_g, find one such that

D(u) ≤ D(v), for all v ∈ A_g. (2.29)

It can be shown that a necessary condition for u ∈ A_g to be a solution to Variational Problem 2.2.1 is that u solves the boundary value problem

∆u = 0 in Ω;
u = g on ∂Ω, (2.30)

where

∆u = u_xx + u_yy

is the two–dimensional Laplacian. The BVP in (2.30) is called the Dirichlet Problem for Laplace’s equation.
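The Dirichlet problem (2.30) also lends itself to a simple numerical illustration (added here; it is not part of the original notes). Discretizing ∆u = 0 with central differences leads to the rule that each interior grid value equals the average of its four neighbors, and Jacobi iteration applies this rule repeatedly. In the following Python sketch the boundary data g(x, y) = xy is an arbitrary example choice; since xy happens to be harmonic, the iteration converges to u = xy itself.

import numpy as np

def solve_laplace_dirichlet(g_grid, iterations=5000):
    # Jacobi iteration for Delta u = 0 on a square grid.
    # g_grid holds the boundary values; interior entries are an initial guess.
    # Central differences for u_xx + u_yy = 0 give
    # u[i, j] = (u[i-1, j] + u[i+1, j] + u[i, j-1] + u[i, j+1]) / 4.
    u = g_grid.copy()
    for _ in range(iterations):
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                + u[1:-1, :-2] + u[1:-1, 2:])
    return u

# Example boundary data: g = x*y on the edge of the unit square.
n = 51
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x, indexing="ij")
g_grid = X * Y
g_grid[1:-1, 1:-1] = 0.0   # interior values are just an initial guess

u = solve_laplace_dirichlet(g_grid)
# For this g the exact solution is u = x*y, which is harmonic.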

2.3 Vibrating String


Consider a string of length L (imagine a guitar string or a violin string) whose
ends are located at x = 0 and x = L along the x–axis (see Figure 2.3.3). We
assume that the string is made of some material of (linear) density ρ(x) (in units
of mass per unit length). Assume that the string is fixed at the end–points and is tightly stretched so that there is a constant tension, τ, acting tangentially along the string at all times.

Figure 2.3.3: String of Length L at Equilibrium

We would like to model what happens to the string after it is plucked to a configuration like that pictured in Figure 2.3.4 and then released. We assume that the shape of the plucked string is described by a continuous function, f, of x, for x ∈ [0, L].

Figure 2.3.4: Plucked String of Length L

At any time t > 0, the shape
of the string is described by a function, u, of x and t; so that u(x, t) gives the
vertical displacement of a point in the string located at x when the string is in
the equilibrium position pictured in Figure 2.3.3, and at time t > 0. We then
have that
u(x, 0) = f (x), for all x ∈ [0, L]. (2.31)
In addition to the initial condition in (2.31), we will also prescribe the initial
speed of the string,

∂u/∂t (x, 0) = g(x), for all x ∈ [0, L], (2.32)

where g is a continuous function of x; for instance, if the plucked string is


released from rest, then g(x) = 0 for all x ∈ [0, L]. We also have the boundary
conditions,
u(0, t) = u(L, t) = 0, for all t, (2.33)
which model the assumption that the ends of the string do not move.
The question we would like to answer is: Given the initial conditions in
(2.31) and (2.32), and the boundary conditions in (2.33), can we determine the
shape of the string, u(x, t), for all x ∈ [0, L] and all times t > 0? We will answer this question in a subsequent chapter of these notes. In this section, though,
we will derive a necessary condition in the form of a PDE that u must satisfy
in order for it to describe the motion of the vibrating string.
In order to find the PDE governing the motion of the string, we will formulate
the problem as a variational problem. We will use Hamilton’s Principle in
Mechanics, or the Principle of Least Action. This principle states that the path that configurations of a mechanical system take from time t = 0 to t = T is such that a quantity called the action is minimized (or optimized)
along the path. The action is defined by
A = ∫_0^T [K(t) − V(t)] dt, (2.34)

where K(t) denotes the kinetic energy of the system at time t, and V (t) its
potential energy at time t. For the case of a string whose motion is described by
small vertical displacements u(x, t), for all x ∈ [0, L] and all times t, the kinetic
energy is given by
K(t) = (1/2) ∫_0^L ρ(x) (∂u/∂t (x, t))² dx. (2.35)
To see how (2.35) comes about, note that the kinetic energy of a particle of
mass m is
K = (1/2) m v²,
where v is the speed of the particle. Thus, for a small element of the string whose
projection on the x–axis is the interval [x, x+∆x], so that its approximate length
is ∆x, the kinetic energy is, approximately,
∆K ≈ (1/2) ρ(x) (∂u/∂t (x, t))² ∆x. (2.36)
Thus, adding up the kinetic energies in (2.36) over all elements of the string, whose lengths add up to L, and letting ∆x → 0, yields the expression in (2.35), which we rewrite as

K(t) = (1/2) ∫_0^L ρ u_t² dx, for all t, (2.37)

where u_t denotes the partial derivative of u with respect to t.

In order to compute the potential energy of the string, we compute the work done by the tension, τ, along the string in stretching the string from its equilibrium length of L to its length at time t, given by

∫_0^L √(1 + u_x²) dx; (2.38)

so that

V(t) = τ [ ∫_0^L √(1 + u_x²) dx − L ], for all t. (2.39)

Since we are considering small vertical displacements of the string, we can linearize the expression in (2.38) by means of the linear approximation in (2.25) to get

∫_0^L √(1 + u_x²) dx ≈ ∫_0^L [1 + (1/2) u_x²] dx = L + (1/2) ∫_0^L u_x² dx,

so that, substituting into (2.39),

V(t) ≈ (τ/2) ∫_0^L u_x² dx, for all t. (2.40)

Thus, in view of (2.34), (2.37), and (2.40), we consider the problem of optimizing the quantity

A(u) = ∫_0^T ∫_0^L [ (1/2) ρ u_t² − (1/2) τ u_x² ] dxdt, (2.41)

where we have substituted the expressions for K(t) and V(t) in (2.37) and (2.40), respectively, into the expression for the action in (2.34).
We will use the expression for the action in (2.41) to define a functional on a class of functions A, defined as follows. Let R = (0, L) × (0, T), the cartesian product of the open intervals (0, L) and (0, T). Then, R is an open rectangle in the xt–plane. We say that u ∈ A if u ∈ C²(R) ∩ C(R̄), and u satisfies the initial conditions in (2.31) and (2.32) and the boundary conditions in (2.33). Then, the action functional,

A : A → R,

is defined by the expression in (2.41), so that

A(u) = (1/2) ∬_R [ρ u_t² − τ u_x²] dxdt, for u ∈ A. (2.42)

Next, for ϕ ∈ C_c^∞(R), note that u + sϕ ∈ A, since ϕ has compact support in R, and therefore ϕ and all its derivatives are 0 on ∂R. We can then define a real–valued function h : R → R by

h(s) = A(u + sϕ), for s ∈ R. (2.43)



Using the definition of the functional A in (2.42), we can rewrite h(s) in (2.43) as

h(s) = (1/2) ∬_R [ ρ((u + sϕ)_t)² − τ((u + sϕ)_x)² ] dxdt
     = (1/2) ∬_R [ ρ(u_t + sϕ_t)² − τ(u_x + sϕ_x)² ] dxdt,

so that

h(s) = A(u) + s ∬_R [ρ u_t ϕ_t − τ u_x ϕ_x] dxdt + s² A(ϕ), (2.44)

for s ∈ R, where we have used the definition of the action functional in (2.42). It follows from (2.44) that h is differentiable and

h′(s) = ∬_R [ρ u_t ϕ_t − τ u_x ϕ_x] dxdt + 2s A(ϕ), for s ∈ R. (2.45)

The principle of least action implies that, if u describes the shape of the string, then s = 0 must be a critical point of h. Hence, h′(0) = 0, and (2.45) implies that

∬_R [ρ u_t ϕ_t − τ u_x ϕ_x] dxdt = 0, for ϕ ∈ C_c^∞(R), (2.46)

is a necessary condition for u(x, t) to describe the shape of a vibrating string


for all times t.
Next, we use the integration by parts formulas

∬_R ψ ∂ϕ/∂x dxdt = ∫_∂R ψ ϕ n_1 ds − ∬_R ϕ ∂ψ/∂x dxdt,

for C¹ functions ψ and ϕ, where n_1 is the first component of the outward unit normal, ~n, on ∂R (wherever this vector is defined), and

∬_R ψ ∂ϕ/∂t dxdt = ∫_∂R ψ ϕ n_2 ds − ∬_R ϕ ∂ψ/∂t dxdt,

where n_2 is the second component of the outward unit normal, ~n (see Problem 1 in Assignment #8), to obtain

∬_R ρ u_t ϕ_t dxdt = ∫_∂R ρ u_t ϕ n_2 ds − ∬_R ∂/∂t[ρ u_t] ϕ dxdt,

so that

∬_R ρ u_t ϕ_t dxdt = −∬_R ∂/∂t[ρ u_t] ϕ dxdt, (2.47)

since ϕ has compact support in R.

Similarly,

∬_R τ u_x ϕ_x dxdt = −∬_R ∂/∂x[τ u_x] ϕ dxdt. (2.48)

Next, substitute the results in (2.47) and (2.48) into (2.46) to get

∬_R [ ∂/∂t[ρ u_t] − ∂/∂x[τ u_x] ] ϕ dxdt = 0, for ϕ ∈ C_c^∞(R). (2.49)

Thus, applying the Fundamental Lemma of the Calculus of Variations (see the next chapter in these notes), we obtain from (2.49) that

ρ ∂²u/∂t² − τ ∂²u/∂x² = 0, in R, (2.50)
since we are assuming that u is C², ρ is a continuous function of x, and τ is constant.
The PDE in (2.50) is called the one–dimensional wave equation. It is sometimes written as

∂²u/∂t² = (τ/ρ) ∂²u/∂x²,

or

∂²u/∂t² = c² ∂²u/∂x², (2.51)

where

c² = τ/ρ,

for the case in which ρ is assumed to be constant.

The wave equation in (2.50) or (2.51) is a second order, linear, hyperbolic PDE.
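To make the discussion of (2.50) and (2.51) concrete, the following Python sketch (an added illustration, not part of the original notes) integrates the wave equation with the boundary conditions (2.33) by centered finite differences; the initial shape f(x) = sin(πx), the wave speed c, and the grid sizes are arbitrary example choices, and the string is released from rest, so g ≡ 0 in (2.32).

import numpy as np

# Physical and grid parameters (example values).
L, T, c = 1.0, 2.0, 1.0
nx, nt = 201, 801
x = np.linspace(0.0, L, nx)
dx = x[1] - x[0]
dt = T / (nt - 1)
r2 = (c * dt / dx) ** 2    # stability requires c*dt/dx <= 1 (here it is 0.5)

f = np.sin(np.pi * x)      # initial shape u(x, 0) = f(x)

# u_prev and u_curr hold the string shape at two consecutive time levels.
u_prev = f.copy()
u_curr = u_prev.copy()
# First step uses u_t(x, 0) = 0 (released from rest):
u_curr[1:-1] = u_prev[1:-1] + 0.5 * r2 * (u_prev[2:] - 2*u_prev[1:-1] + u_prev[:-2])

for _ in range(nt - 2):
    u_next = np.empty_like(u_curr)
    # Discrete form of u_tt = c^2 u_xx:
    u_next[1:-1] = (2*u_curr[1:-1] - u_prev[1:-1]
                    + r2 * (u_curr[2:] - 2*u_curr[1:-1] + u_curr[:-2]))
    u_next[0] = u_next[-1] = 0.0   # boundary conditions (2.33)
    u_prev, u_curr = u_curr, u_next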
Chapter 3

Indirect Methods in the Calculus of Variations

We begin this chapter by discussing a very simple problem in the Calculus of Variations: given two points in the plane, find a smooth path connecting those two points that has the shortest length. A solution of this problem is called a geodesic curve between the two points. This example is simple because we know the answer to this question from Euclidean geometry. Nevertheless, the solutions that we present here serve to illustrate both the direct and indirect methods in the calculus of variations. It will certainly be a good introduction to the indirect methods.

3.1 Geodesics in the plane


Let P and Q denote two points in the xy–plane with coordinates (x_o, y_o) and (x_1, y_1), respectively. We consider the class, A, of smooth paths that connect P to Q. One of these paths is shown in Figure 3.1.1. We assume that paths are given parametrically by the pair of functions

(x(s), y(s)), for s ∈ [0, 1],

where x : [0, 1] → R and y : [0, 1] → R are differentiable functions with continuous derivatives in some open interval that contains [0, 1], such that

(x(0), y(0)) = (x_o, y_o) and (x(1), y(1)) = (x_1, y_1).

We may write this more succinctly as

A = {(x, y) ∈ C¹([0, 1], R²) | (x(0), y(0)) = P and (x(1), y(1)) = Q}. (3.1)

We define a functional, J : A → R, by

J(x, y) = ∫_0^1 √((x′(s))² + (y′(s))²) ds, for all (x, y) ∈ A. (3.2)

Figure 3.1.1: Path connecting P and Q

Thus, J(x, y) is the arc–length along the path from P to Q parametrized by


(x(s), y(s)), for s ∈ [0, 1].
We would like to solve the following variational problem:
Problem 3.1.1 (Geodesic Problem 1). Out of all paths in A, find one, (x, y), such that

J(x, y) ≤ J(u, v), for all (u, v) ∈ A. (3.3)
Observe that the expression for J in (3.2) can be written as

J(x, y) = ∫_0^1 |(x′(s), y′(s))| ds, for all (x, y) ∈ A, (3.4)

where |(·, ·)| in the integrand in (3.4) denotes the Euclidean norm in R².

We will first use this example to illustrate the direct method in the Calculus of Variations. We begin by showing that the functional J defined in (3.4) is bounded below in A by

|Q − P| = |(x_1, y_1) − (x_o, y_o)| = √((x_1 − x_o)² + (y_1 − y_o)²),

the Euclidean distance from P to Q; that is,

|Q − P| ≤ J(u, v), for all (u, v) ∈ A. (3.5)
Indeed, it follows from the Fundamental Theorem of Calculus that

(u(1), v(1)) − (u(0), v(0)) = ∫_0^1 (u′(s), v′(s)) ds,

for any (u, v) ∈ A, or

Q − P = ∫_0^1 (u′(s), v′(s)) ds, for any (u, v) ∈ A. (3.6)

Now, using the fact that |Q − P|² = (Q − P) · (Q − P), the dot product (or Euclidean inner product) of Q − P with itself, we obtain from (3.6) that

|Q − P|² = (Q − P) · ∫_0^1 (u′(s), v′(s)) ds, for any (u, v) ∈ A,

or

|Q − P|² = ∫_0^1 (Q − P) · (u′(s), v′(s)) ds, for any (u, v) ∈ A. (3.7)

Thus, applying the Cauchy–Schwarz inequality to the integrand of the integral on the right–hand side of (3.7), we get that

|Q − P|² ≤ ∫_0^1 |Q − P| |(u′(s), v′(s))| ds, for any (u, v) ∈ A,

or

|Q − P|² ≤ |Q − P| ∫_0^1 |(u′(s), v′(s))| ds, for any (u, v) ∈ A,

or

|Q − P|² ≤ |Q − P| J(u, v), for any (u, v) ∈ A. (3.8)

For the case in which P ≠ Q, we see that the estimate in (3.5) follows from the inequality in (3.8); if P = Q, the estimate in (3.5) holds trivially.
Now, it follows from (3.5) that the functional J is bounded from below in A by |Q − P|. Hence, the infimum of J over A exists and

|Q − P| ≤ inf_{(u,v)∈A} J(u, v). (3.9)

Next, we see that the infimum in (3.9) is attained and equals |Q − P|. Indeed, let

(x(s), y(s)) = P + s(Q − P), for s ∈ [0, 1], (3.10)

the straight line segment connecting P to Q. Note that

(x(0), y(0)) = P and (x(1), y(1)) = Q,

and (x, y) is a differentiable path with

(x′(s), y′(s)) = Q − P, for all s ∈ [0, 1]. (3.11)

Thus, (x, y) belongs to the class A defined in (3.1).

Using the definition of the functional J in (3.4) and the fact in (3.11), compute

J(x, y) = ∫_0^1 |Q − P| ds = |Q − P|.

Consequently, we get from (3.9) that

inf_{(u,v)∈A} J(u, v) = |Q − P|. (3.12)

Furthermore, the infimum in (3.12) is attained on the path given in (3.10).


This is in accord with the notion from elementary Euclidean geometry that the
shortest distance between two points is attained along a straight line segment
connecting the points.
Next, we illustrate the indirect method in the Calculus of Variations, which
is the main topic of this chapter.
We consider the special parametrization

(x(s), y(s)) = (s, y(s)), for x_o ≤ s ≤ x_1,

where y : [x_o, x_1] → R is a continuous function that is differentiable, with continuous derivative, in an open interval that contains [x_o, x_1]. Here we are assuming that

x_o < x_1,

y(x_o) = y_o and y(x_1) = y_1.

Thus, the path connecting P and Q is the graph of a differentiable function over the interval [x_o, x_1]. This is illustrated in Figure 3.1.1.

We’ll have to define the class A and the functional J : A → R in a different way. Put

A = {y ∈ C¹([x_o, x_1], R) | y(x_o) = y_o and y(x_1) = y_1}, (3.13)

and

J(y) = ∫_{x_o}^{x_1} √(1 + (y′(s))²) ds, for all y ∈ A. (3.14)

We would like to solve the variational problem:

Problem 3.1.2 (Geodesic Problem 2). Out of all functions in A, find one, y, such that

J(y) ≤ J(v), for all v ∈ A. (3.15)

In the indirect method, we assume that we have a solution of the optimization


problem, and then deduce conditions that this function must satisfy; in other
words, we find a necessary condition for a function in a competing class to be a
solution. Sometimes, the necessary conditions can be used to find a candidate
for a solution of the optimization problem (a critical “point”). The next step in
the indirect method is to verify that the candidate indeed solves the optimization
problem.
Thus, assume that there is a function y ∈ A that solves Geodesic Problem 2; that is, y satisfies the estimate in (3.15). Next, let η : [x_o, x_1] → R denote a C¹ function such that η(x_o) = 0 and η(x_1) = 0 (we will see later in these notes how this function η can be constructed). It then follows from the definition of A in (3.13) that

y + tη ∈ A, for all t ∈ R.

It then follows from (3.15) that

J(y) ≤ J(y + tη), for all t ∈ R. (3.16)

Next, define f : R → R by

f(t) = J(y + tη), for all t ∈ R. (3.17)

It follows from (3.16) and the definition of f in (3.17) that f has a minimum at 0; that is,

f(t) ≥ f(0), for all t ∈ R.

Thus, if it can be shown that the function f defined in (3.17) is differentiable, we get the necessary condition

f′(0) = 0, (3.18)

which can be written in terms of J as

d/dt [J(y + tη)]|_{t=0} = 0. (3.19)

Next, we proceed to show that the function

f(t) = J(y + tη) = ∫_{x_o}^{x_1} √(1 + (y′(s) + tη′(s))²) ds, for t ∈ R,

is a differentiable function of t. Observe that we can write f as

f(t) = ∫_{x_o}^{x_1} √(1 + (y′(s))² + 2t y′(s)η′(s) + t²(η′(s))²) ds, for t ∈ R. (3.20)

To show that f is differentiable, we need to see that we can differentiate the expression on the right–hand side of (3.20) under the integral sign. This follows from the fact that the partial derivative with respect to t of the integrand on the right–hand side of (3.20),

∂/∂t √(1 + (y′(s))² + 2t y′(s)η′(s) + t²(η′(s))²),

or

(y′(s)η′(s) + t(η′(s))²) / √(1 + (y′(s))² + 2t y′(s)η′(s) + t²(η′(s))²),

is continuous in t and s, for s in some open interval containing [x_o, x_1]. It then follows from the results in Appendix B.1 that f given in (3.20) is differentiable and

f′(t) = ∫_{x_o}^{x_1} (y′(s)η′(s) + t(η′(s))²) / √(1 + (y′(s))² + 2t y′(s)η′(s) + t²(η′(s))²) ds, for t ∈ R, (3.21)

(see Proposition B.1.1). Evaluating the expression for f′(t) in (3.21) at t = 0, we obtain that

f′(0) = ∫_{x_o}^{x_1} y′(s)η′(s) / √(1 + (y′(s))²) ds. (3.22)

Thus, in view of (3.18) and (3.22), we see that a necessary condition for y ∈ A, where A is given in (3.13), to be a minimizer of the functional J : A → R given in (3.14), is that

∫_{x_o}^{x_1} ( y′(s) / √(1 + (y′(s))²) ) η′(s) ds = 0, for all η ∈ C_o¹([x_o, x_1], R), (3.23)

where

C_o¹([x_o, x_1], R) = {η ∈ C¹([x_o, x_1], R) | η(x_o) = 0 and η(x_1) = 0}, (3.24)

the class of C¹, real–valued functions on [x_o, x_1] that vanish at the end–points of the interval [x_o, x_1].

We will see in the next section that if the condition in (3.23) holds true for every η ∈ C_o¹([x_o, x_1], R), then

y′(s) / √(1 + (y′(s))²) = c_1, for all s ∈ [x_o, x_1], (3.25)

where c_1 is a constant (see the second fundamental lemma in the Calculus of Variations, Lemma 3.2.8 in these notes).

Now, squaring both sides of (3.25),

(y′(s))² / (1 + (y′(s))²) = c_1², for all s ∈ [x_o, x_1]. (3.26)
It follows from (3.26) that c_1² ≠ 1 (otherwise we would conclude that 1 = 0, which is impossible). Hence, we can solve (3.26) for (y′(s))² to obtain

(y′(s))² = c_1² / (1 − c_1²), for all s ∈ [x_o, x_1], (3.27)

from which we conclude that

y′(s) = c_2, for all s ∈ [x_o, x_1], (3.28)

where c_2 is a constant.

We can solve the differential equation in (3.28) to obtain the general solution

y(s) = c_2 s + c_3, for all s ∈ [x_o, x_1], (3.29)

where c_2 and c_3 are constants.


Since we are also assuming that y ∈ A, where A is given in (3.13), it follows that y must satisfy the boundary conditions

y(x_o) = y_o and y(x_1) = y_1.

We therefore get the following system of equations that c_2 and c_3 must solve:

x_o c_2 + c_3 = y_o;
x_1 c_2 + c_3 = y_1. (3.30)

Solving the system in (3.30) for c_2 and c_3 yields

c_2 = (y_1 − y_o)/(x_1 − x_o) and c_3 = (x_1 y_o − x_o y_1)/(x_1 − x_o).

Thus, using the expression for y in (3.29),

y(s) = ((y_1 − y_o)/(x_1 − x_o)) s + (x_1 y_o − x_o y_1)/(x_1 − x_o). (3.31)
Note that the expression in (3.31) is the equation of a straight line that goes
through the points (xo , yo ) and (x1 , y1 ). Thus, we have shown that a candidate
for a minimizer of the arc–length functional J defined in (3.14) over the class
given in (3.13) is a straight line segment connecting the point P to the point
Q. It remains to show that the function y in (3.31) is indeed a minimizer of J in A, and that it is the only minimizer of J in A. This will be done in a subsequent section in these notes.
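The candidate (3.31) can be checked numerically before the general theory is developed. The Python sketch below (an added illustration, not part of the original notes) approximates the arc–length functional (3.14) by a finite–difference quadrature and compares the straight line with perturbed competitors y + tη, where η vanishes at the end–points; the end–point values chosen are arbitrary examples.

import numpy as np

# End-points P = (xo, yo) and Q = (x1, y1) (arbitrary example values).
xo, x1 = 0.0, 1.0
yo, y1 = 0.0, 2.0

s = np.linspace(xo, x1, 2001)
ds = s[1] - s[0]

def J(y_vals):
    # Arc-length functional (3.14): integral of sqrt(1 + (y')^2) ds,
    # with y' approximated by finite differences.
    yp = np.gradient(y_vals, ds)
    return np.sum(np.sqrt(1.0 + yp**2)) * ds

line = yo + (y1 - yo) * (s - xo) / (x1 - xo)   # the candidate (3.31)
eta = (s - xo) * (x1 - s)                      # vanishes at both end-points

print(J(line))                # approximately |Q - P| = sqrt(5) = 2.236...
for t in (0.5, 1.0, 2.0):
    print(J(line + t * eta))  # each value exceeds J(line)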

3.2 Fundamental Lemmas in the Calculus of Variations
In the previous section we found a necessary condition for a function y ∈ A, where

A = {y ∈ C¹([x_o, x_1], R) | y(x_o) = y_o and y(x_1) = y_1}, (3.32)

to be a minimizer of the arc–length functional J : A → R,

J(y) = ∫_{x_o}^{x_1} √(1 + (y′(x))²) dx, for y ∈ C¹([x_o, x_1], R), (3.33)

over the class A.

We found that, if y ∈ A is a minimizer of J over the class A, then y must satisfy the condition

∫_{x_o}^{x_1} ( y′(x) / √(1 + (y′(x))²) ) η′(x) dx = 0, for all η ∈ C_o¹([x_o, x_1], R). (3.34)

This is a necessary condition for y ∈ A to be a minimizer of the functional J : A → R defined in (3.33).

We then invoked a fundamental lemma in the Calculus of Variations to deduce that the condition in (3.34) implies that

y′(x) / √(1 + (y′(x))²) = c_1, for all x ∈ [x_o, x_1], (3.35)

for some constant c_1. This is also a necessary condition for y ∈ A to be a minimizer of the functional J : A → R defined in (3.33), where A is given in (3.32).

We will see in this section that the differential equation in (3.35) follows from the condition in (3.34), provided that the function

y′ / √(1 + (y′)²)

is known to be continuous. This is the case, for instance, if y is assumed to come from certain classes of differentiable functions defined on a closed and bounded interval. We’ll start by defining these classes of functions.
Let a, b ∈ R, and assume that a < b.

Definition 3.2.1. The class C([a, b], R) consists of real–valued functions,

f : [a, b] → R,

defined on the closed and bounded interval [a, b], and which are assumed to be continuous on [a, b]. It can be shown that C([a, b], R) is a linear (or vector) space in which the operations are point–wise addition,

(f + g)(x) = f(x) + g(x), for all x ∈ [a, b] and all f, g ∈ C([a, b], R),

and scalar multiplication,

(cf)(x) = c f(x), for all x ∈ [a, b], all c ∈ R, and all f ∈ C([a, b], R).

Definition 3.2.2. The class C_o([a, b], R) consists of all functions in C([a, b], R) that vanish at a and b; in symbols,

C_o([a, b], R) = {f ∈ C([a, b], R) | f(a) = 0 and f(b) = 0}.

We note that C_o([a, b], R) is a linear subspace of C([a, b], R).

Definition 3.2.3. The class C¹([a, b], R) consists of all functions f : U → R that are differentiable in an open interval U that contains [a, b] and such that f′ is continuous on [a, b]. We note that C¹([a, b], R) is a linear subspace of C([a, b], R).

Definition 3.2.4. The class C_o¹([a, b], R) consists of all functions f ∈ C¹([a, b], R) that vanish at the end–points of the interval [a, b]; thus,

C_o¹([a, b], R) = {f ∈ C¹([a, b], R) | f(a) = 0 and f(b) = 0}.

We begin by stating and proving the following basic lemma for the class
C([a, b], R).

Lemma 3.2.5 (Basic Lemma 1). Let f ∈ C([a, b], R) and assume that f(x) ≥ 0 for all x ∈ [a, b]. Suppose that

∫_a^b f(x) dx = 0.

Then, f(x) = 0 for all x ∈ [a, b].

Proof: Suppose that f ∈ C([a, b], R), f(x) ≥ 0 for all x ∈ [a, b], and

∫_a^b f(x) dx = 0. (3.36)

Assume, by way of contradiction, that there exists x_o ∈ (a, b) with f(x_o) > 0. Then, since f is continuous at x_o, there exists δ > 0 such that (x_o − δ, x_o + δ) ⊂ (a, b) and

x ∈ (x_o − δ, x_o + δ) ⇒ |f(x) − f(x_o)| < f(x_o)/2. (3.37)

Now, using the triangle inequality, we obtain the estimate

f(x_o) ≤ |f(x) − f(x_o)| + |f(x)|;

so that, in view of (3.37) and the assumption that f is nonnegative on [a, b],

f(x_o) < f(x_o)/2 + f(x), for x_o − δ < x < x_o + δ,

from which we get that

f(x) > f(x_o)/2, for x_o − δ < x < x_o + δ. (3.38)

It follows from (3.38) that

∫_{x_o−δ}^{x_o+δ} f(x) dx ≥ ∫_{x_o−δ}^{x_o+δ} f(x_o)/2 dx = δ f(x_o).

Thus, since we are assuming that f ≥ 0 on [a, b],

∫_a^b f(x) dx ≥ ∫_{x_o−δ}^{x_o+δ} f(x) dx ≥ δ f(x_o) > 0,

which is in direct contradiction with the assumption in (3.36). Consequently, f(x) = 0 for all x ∈ (a, b). By the continuity of f on [a, b], we also get that f(x) = 0 for all x ∈ [a, b], and the proof of the lemma is now complete. □
Lemma 3.2.6 (Basic Lemma 2). Let f ∈ C([a, b], R) and assume that

∫_c^d f(x) dx = 0, for every (c, d) ⊂ (a, b).

Then, f(x) = 0 for all x ∈ [a, b].



Proof: Suppose that f ∈ C([a, b], R) and

∫_c^d f(x) dx = 0, for every (c, d) ⊂ (a, b). (3.39)

Arguing by contradiction, assume that f(x_o) ≠ 0 for some x_o ∈ (a, b). Without loss of generality, we may assume that f(x_o) > 0. Then, by the continuity of f, there exists δ > 0 such that (x_o − δ, x_o + δ) ⊂ (a, b) and

x ∈ (x_o − δ, x_o + δ) ⇒ |f(x) − f(x_o)| < f(x_o)/2,

or

x ∈ (x_o − δ, x_o + δ) ⇒ f(x_o) − f(x_o)/2 < f(x) < f(x_o) + f(x_o)/2,

from which we get that

f(x) > f(x_o)/2, for x_o − δ < x < x_o + δ. (3.40)

It follows from (3.40) that

∫_{x_o−δ}^{x_o+δ} f(x) dx ≥ ∫_{x_o−δ}^{x_o+δ} f(x_o)/2 dx = δ f(x_o) > 0,

which is in direct contradiction with (3.39). Hence, it must be the case that f(x) = 0 for a < x < b. It then follows from the continuity of f that f(x) = 0 for all x ∈ [a, b]. □
Next, we state and prove the first fundamental lemma of the Calculus of
Variations. A version of this result is presented as a Basic Lemma in Section
3–1 in [Wei74].
Lemma 3.2.7 (Fundamental Lemma 1). Let G ∈ C([a, b], R) and assume that

∫_a^b G(x)η(x) dx = 0, for every η ∈ C_o([a, b], R).

Then, G(x) = 0 for all x ∈ [a, b].


Proof: Let G ∈ C([a, b], R) and suppose that

∫_a^b G(x)η(x) dx = 0, for every η ∈ C_o([a, b], R). (3.41)

Arguing by contradiction, assume that there exists x_o ∈ (a, b) such that G(x_o) ≠ 0. Without loss of generality, we may also assume that G(x_o) > 0. Then, since G is continuous on [a, b], there exists δ > 0 with (x_o − δ, x_o + δ) ⊂ (a, b) such that

G(x) > G(x_o)/2, for x_o − δ < x < x_o + δ. (3.42)

Put x_1 = x_o − δ and x_2 = x_o + δ, and define η : [a, b] → R by

η(x) = 0, if a ≤ x ≤ x_1;
η(x) = (x − x_1)(x_2 − x), if x_1 < x ≤ x_2; (3.43)
η(x) = 0, if x_2 < x ≤ b.

Note that η is continuous on [a, b] and that η(a) = η(b) = 0; so that η ∈ C_o([a, b], R). Observe also, from the definition of η in (3.43), that

η(x) > 0, for x_1 < x < x_2. (3.44)

It also follows from the definition of η in (3.43) that

∫_a^b G(x)η(x) dx = ∫_{x_1}^{x_2} G(x)η(x) dx;

so that, in view of (3.44) and (3.42),

∫_a^b G(x)η(x) dx ≥ (G(x_o)/2) ∫_{x_1}^{x_2} η(x) dx > 0,

which is in direct contradiction with the assumption in (3.41). Consequently, G(x) = 0 for all x ∈ (a, b). The continuity of G on [a, b] then implies that G(x) = 0 for all x ∈ [a, b]. □
The following result is the one that we used in the example presented in the previous section. We shall refer to it as the second fundamental lemma in the Calculus of Variations.
Lemma 3.2.8 (Fundamental Lemma 2). Let G ∈ C([a, b], R) and assume that

∫_a^b G(x)η′(x) dx = 0, for every η ∈ C_o¹([a, b], R).

Then, G(x) = c for all x ∈ [a, b], where c is a constant.


Proof: Let G ∈ C([a, b], R) and assume that

∫_a^b G(x)η′(x) dx = 0, for every η ∈ C_o¹([a, b], R). (3.45)

Put

c = (1/(b − a)) ∫_a^b G(x) dx, (3.46)

the average value of G over [a, b]. Define η : [a, b] → R by

η(x) = ∫_a^x (G(t) − c) dt, for x ∈ [a, b]. (3.47)

Then, η is a differentiable function, by virtue of the Fundamental Theorem of Calculus, with

η′(x) = G(x) − c, for x ∈ [a, b], (3.48)

which defines a continuous function on [a, b]. Consequently, η ∈ C¹([a, b], R). Observe also, from (3.47), that

η(a) = ∫_a^a (G(t) − c) dt = 0,

and

η(b) = ∫_a^b (G(t) − c) dt = ∫_a^b G(t) dt − c ∫_a^b dt = 0,

in view of the definition of c in (3.46). It then follows that η ∈ C_o¹([a, b], R).

Next, compute

∫_a^b (G(x) − c)² dx = ∫_a^b (G(x) − c)(G(x) − c) dx = ∫_a^b (G(x) − c) η′(x) dx,

where we have used (3.48); so that

∫_a^b (G(x) − c)² dx = ∫_a^b G(x) η′(x) dx − c ∫_a^b η′(x) dx.

Thus, using the assumption in (3.45) and the Fundamental Theorem of Calculus,

∫_a^b (G(x) − c)² dx = 0, (3.49)

since η ∈ C_o¹([a, b], R).

It follows from (3.49) and Basic Lemma 1 (Lemma 3.2.5) that G(x) = c for all x ∈ [a, b], since (G − c)² ≥ 0 on [a, b]. □

The following lemma combines the results of the first and the second fundamental lemmas in the Calculus of Variations.

Lemma 3.2.9 (Fundamental Lemma 3). Let f : [a, b] → R and g : [a, b] → R be continuous real–valued functions defined on [a, b]. Assume that

∫_a^b [f(x)η(x) + g(x)η′(x)] dx = 0, for every η ∈ C_o¹([a, b], R).

Then, g is differentiable in (a, b) and

d/dx [g(x)] = f(x), for all x ∈ (a, b).

Proof: Let f ∈ C([a, b], R), g ∈ C([a, b], R), and assume that

∫_a^b [f(x)η(x) + g(x)η′(x)] dx = 0, for every η ∈ C_o¹([a, b], R). (3.50)

Put

F(x) = ∫_a^x f(t) dt, for x ∈ [a, b]. (3.51)

Then, by the Fundamental Theorem of Calculus, F is differentiable in (a, b) and

F′(x) = f(x), for x ∈ (a, b). (3.52)

Next, let η ∈ C_o¹([a, b], R) and use integration by parts to compute

∫_a^b f(x)η(x) dx = [F(x)η(x)]_a^b − ∫_a^b F(x)η′(x) dx = −∫_a^b F(x)η′(x) dx,

since η(a) = η(b) = 0; consequently, we can rewrite the condition in (3.50) as

∫_a^b [g(x) − F(x)] η′(x) dx = 0, for every η ∈ C_o¹([a, b], R). (3.53)

We can then apply the second fundamental lemma (Lemma 3.2.8) to obtain from (3.53) that

g(x) − F(x) = C, for all x ∈ [a, b],

for some constant C, from which we get that

g(x) = F(x) + C, for all x ∈ [a, b]. (3.54)

It follows from (3.54), (3.51), and the Fundamental Theorem of Calculus that g is differentiable with derivative

g′(x) = F′(x), for all x ∈ (a, b);

so that g′(x) = f(x) for all x ∈ (a, b), in view of (3.52). □

3.3 The Euler–Lagrange Equations


In the previous two sections we saw how the second fundamental lemma in the Calculus of Variations (Lemma 3.2.8) can be used to obtain the differential equation

y′ / √(1 + (y′)²) = c_1, (3.55)

where c_1 is a constant, as a necessary condition for a function y ∈ C¹([x_o, x_1], R) to be a minimizer of the arc–length functional J : C¹([x_o, x_1], R) → R,

J(y) = ∫_{x_o}^{x_1} √(1 + (y′(x))²) dx, for y ∈ C¹([x_o, x_1], R), (3.56)

over the class of functions

A = {y ∈ C¹([x_o, x_1], R) | y(x_o) = y_o and y(x_1) = y_1}. (3.57)

The differential equation in (3.55) and the boundary conditions

y(x_o) = y_o and y(x_1) = y_1,

defining the class A in (3.57), constitute a boundary value problem. In Section 3.1 we were able to solve this boundary value problem to obtain the solution in (3.31), a straight line segment from the point (x_o, y_o) to the point (x_1, y_1). This is a candidate for a minimizer of the arc–length functional J given in (3.56) over the class A in (3.57).
In this section we illustrate the procedure employed in Sections 3.1 and 3.2 in the case of a general functional J : C¹([a, b], R) → R of the form

J(y) = ∫_a^b F(x, y(x), y′(x)) dx, for y ∈ C¹([a, b], R), (3.58)

where F : [a, b] × R × R → R is a continuous function of three variables. An example of a function F in the integrand in (3.58), with value F(x, y, z), is

F(x, y, z) = √(1 + z²), for all (x, y, z) ∈ [x_o, x_1] × R × R,

which is used in the definition of the arc–length functional J given in (3.56). In the following example we provide another instance of F(x, y, z), one that comes up in the celebrated brachistochrone problem, or the problem of the curve of shortest descent time. A version of this problem is also discussed on page 19 of [Wei74] (note that in that version of the problem the positive y–axis points downwards, while in the version discussed here it points upwards, as shown in Figure 3.3.2).
Example 3.3.1 (Brachistochrone Problem). Given points P and Q in a vertical plane, with P higher than Q and to the left of Q (see Figure 3.3.2), find the curve connecting P to Q along which a particle of mass m descends from P to Q in the shortest possible time, assuming that only the force of gravity is acting on the particle.

The sketch in Figure 3.3.2 shows a possible curve of descent from P to Q. Observe also from the figure that we have assigned coordinates (0, y_o) to P and (x_1, y_1) to Q, where x_1 > 0 and y_o > y_1.

We assume that the path from P to Q is the graph of a C¹ function y : [0, x_1] → R such that

y(0) = y_o and y(x_1) = y_1.


Figure 3.3.2: Path descending from P to Q

The arc–length along the path from the point P to any point on the path, as a function of x, is then given by

s(x) = ∫_0^x √(1 + (y′(t))²) dt, for 0 ≤ x ≤ x_1;

so that

s′(x) = √(1 + (y′(x))²), for 0 < x < x_1, (3.59)

by the Fundamental Theorem of Calculus.

The speed, v, of the particle along the path at any point on the curve is given by

v = ds/dt. (3.60)

We can use (3.59) and (3.60) to obtain a formula for the descent time, T, of the particle:

T = ∫_0^{x_1} s′(x)/v dx,

or

T = ∫_0^{x_1} √(1 + (y′(x))²) / v dx. (3.61)
It thus remains to compute v in the denominator of the integrand in (3.61).

The speed of the particle will depend on the location of the particle along the path. If we assume that the particle is released from rest, then v = 0 at P. To find the speed at other points on the path, we will use the law of conservation of energy, which says that the total mechanical energy of the system is conserved; that is, the total energy remains constant throughout the motion. The total energy of this particular system is the sum of the kinetic energy and the potential energy of the particle of mass m:

Total Energy = Kinetic Energy + Potential Energy.



At P the particle is at rest, so

Kinetic Energy at P = 0,

while its potential energy is

Potential Energy at P = m g y_o,

where g is the gravitational acceleration. Thus,

Total Energy at P = m g y_o.

At any point (x, y(x)) on the path the total energy is

Total Energy at (x, y) = (1/2) m v² + m g y.

Thus, the law of conservation of energy implies that

(1/2) m v² + m g y = m g y_o,

or

(1/2) v² + g y = g y_o,

after cancelling m; so that

v² = 2g(y_o − y),

from which we get that

v = √(2g(y_o − y)). (3.62)
This gives us an expression for v as a function of y. Note that we need to assume that all the paths connecting P to Q under consideration satisfy y(x) < y_o for all 0 < x < x_1.

Substituting the expression for v in (3.62) into the denominator of the integrand in (3.61), we obtain the expression for the time of descent

T(y) = ∫_0^{x_1} √(1 + (y′(x))²) / √(2g(y_o − y(x))) dx, (3.63)

for y ∈ C¹([0, x_1], R) in the class

A = {y ∈ C¹([0, x_1], R) | y(0) = y_o, y(x_1) = y_1, and y(x) < y_o for 0 < x < x_1}. (3.64)

We would like to minimize the time of descent functional in (3.63) over the class of functions A defined in (3.64). Note that, if y ∈ A is a minimizer of the functional T given in (3.63), then y is also a minimizer of

√(2g) T(y) = ∫_0^{x_1} √(1 + (y′(x))²) / √(y_o − y(x)) dx, for y ∈ A.

Thus, we will seek a minimizer of the functional J : A → R given by

J(y) = ∫_0^{x_1} √(1 + (y′(x))²) / √(y_o − y(x)) dx, for y ∈ A. (3.65)
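As a quick numerical aside (added here; not part of the original notes), one can compare the values of the functional J in (3.65) for different trial paths from P(0, y_o) to Q(x_1, y_1); it turns out that the straight line is not optimal, which is why the problem is interesting. In the Python sketch below the end–point values are arbitrary example choices, and the midpoint rule is used because the integrand has an integrable singularity at x = 0, where y = y_o.

import numpy as np

yo, x1, y1 = 1.0, 1.0, 0.0     # P = (0, 1) and Q = (1, 0) (example values)

n = 100000
x = (np.arange(n) + 0.5) * (x1 / n)   # midpoints; avoids the end-point x = 0

def J(y_vals, yp_vals):
    # Functional (3.65): integral of sqrt(1 + (y')^2)/sqrt(yo - y) dx,
    # approximated by the midpoint rule.
    integrand = np.sqrt(1.0 + yp_vals**2) / np.sqrt(yo - y_vals)
    return np.sum(integrand) * (x1 / n)

# Trial path 1: the straight line from P to Q.
y_line = yo + (y1 - yo) * x / x1
yp_line = np.full_like(x, (y1 - yo) / x1)

# Trial path 2: a path with a steeper initial drop (example competitor).
y_curve = yo - np.sqrt(x)             # also runs from (0, yo) to (x1, y1)
yp_curve = -0.5 / np.sqrt(x)

print(J(y_line, yp_line))    # approximately 2.83
print(J(y_curve, yp_curve))  # approximately 2.59: the line is not optimal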

The functional J derived in Example 3.3.1 (the Brachistochrone Problem) corresponds to the function F : [0, x_1] × (−∞, y_o) × R → R defined by

F(x, y, z) = √(1 + z²) / √(y_o − y), for (x, y, z) ∈ [0, x_1] × (−∞, y_o) × R,

in the general functional J given in (3.58). We will see many more examples of this general class of functionals in these notes and in the homework assignments.
The general variational problem we would like to consider in this section is the following:

Problem 3.3.2 (General Variational Problem 1). Given real numbers a and b such that a < b, let F : [a, b] × R × R → R denote a continuous function of three variables, (x, y, z), with x ∈ [a, b], and y and z in the set of real numbers (in some cases, as in the Brachistochrone problem, we might need to restrict the values of y and z as well). Define the functional J : C¹([a, b], R) → R by

J(y) = ∫_a^b F(x, y(x), y′(x)) dx, for y ∈ C¹([a, b], R). (3.66)

For real numbers y_o and y_1, consider the class of functions

A = {y ∈ C¹([a, b], R) | y(a) = y_o, y(b) = y_1}. (3.67)

If possible, find y ∈ A such that

J(y) ≤ J(v), for all v ∈ A, (3.68)

or

J(y) ≥ J(v), for all v ∈ A. (3.69)

Thus, in Problem 3.3.2 we seek a minimum, in the case of (3.68), of the


functional J in (3.66) over the class A in (3.67) (a minimization problem), or
we seek a maximum, in the case of (3.69), of J over A (a maximization problem).
In general, we call either problem (3.68) or (3.69) an optimization problem.
We will see in these notes that, in order to answer the questions posed in
General Variational Problem 1, we need to impose additional conditions on
the function F . We will see what those conditions are as we attempt to solve
the problem.
Suppose that we know a priori that the functional J defined in (3.66) is
bounded from below in A given in (3.67). Thus, it makes sense to ask whether
there exists y ∈ A at which J is minimized; that is, is there a y ∈ A for which
(3.68) holds true? We begin by assuming that this is the case; that is, there
exists y ∈ A for which

J(y) ≤ J(v), for all v ∈ A. (3.70)



As in one of our solutions of the geodesic problem presented in Section 3.1, we next seek necessary conditions for y ∈ A to be a minimizer of J defined in (3.66) over the class A given in (3.67).

Let η : [a, b] → R denote a C¹ function such that η(a) = 0 and η(b) = 0, so that η ∈ C_o¹([a, b], R). It then follows from the definition of A in (3.67) that

y + tη ∈ A, for all t ∈ R.

Thus, we get from (3.70) that

J(y) ≤ J(y + tη), for all t ∈ R. (3.71)

Define g : R → R by

g(t) = J(y + tη), for all t ∈ R. (3.72)

It follows from (3.71) and the definition of g in (3.72) that g has a minimum at 0; that is,

g(t) ≥ g(0), for all t ∈ R.

Thus, if it can be shown that the function g defined in (3.72) is differentiable, we get the necessary condition

g′(0) = 0,

which can be written in terms of J as

d/dt [J(y + tη)]|_{t=0} = 0. (3.73)

Hence, in order to obtain the necessary condition in (3.73), we need to make sure that the map

t ↦ J(y + tη), for t ∈ R, (3.74)

is a differentiable function of t at t = 0, where, according to (3.66),

J(y + tη) = ∫_a^b F(x, y(x) + tη(x), y′(x) + tη′(x)) dx, for t ∈ R. (3.75)

According to Proposition B.1.1 in Appendix B.1 of these notes, the question of differentiability of the map in (3.74) reduces to the continuity of the partial derivatives

∂/∂y [F(x, y, z)] = F_y(x, y, z) and ∂/∂z [F(x, y, z)] = F_z(x, y, z).

Thus, in addition to being continuous, we will assume that F has continuous partial derivatives with respect to y and with respect to z, F_y and F_z, respectively. Making these additional assumptions, we can apply Proposition B.1.1 to the expression in (3.75) to obtain, using the Chain Rule as well,

d/dt [J(y + tη)] = ∫_a^b [F_y(x, y + tη, y′ + tη′) η + F_z(x, y + tη, y′ + tη′) η′] dx, (3.76)

for all t ∈ R, where we have written y for y(x), y 0 for y 0 (x), η for η(x), and η 0
for η 0 (x) in the integrand of the integral on the right–hand side of (3.76).
Substituting 0 for t in (3.76) we then obtain that
Z b
d
[J(y + tη)] = [Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx.
dt t=0 a

Thus, the necessary condition in (3.73) for y ∈ A to be a minimizer of J in A,


for the case in which F is continuous with continuous partial derivatives Fy and
Fz , is that
Z b
[Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx = 0, for all η ∈ Co1 ([a, b], R). (3.77)
a

Since we are assuming that Fy and Fz are continuous, we can apply the third
fundamental lemma in the Calculus of Variations (Lemma 3.2.9) to obtain from
(3.77) that the map

x 7→ Fz (x, y(x), y 0 (x)), for x ∈ [a, b],

is differentiable for all x ∈ (a, b) and

d
[Fz (x, y(x), y 0 (x))] = Fy (x, y(x), y 0 (x)), for all x ∈ (a, b). (3.78)
dx
The differential equation in (3.78) is called the Euler–Lagrange equation as-
sociated with the functional J defined in (3.66). It gives a necessary condition
for a function y ∈ A to be an optimizer of J over the class A given in (3.67). We
restate this fact, along with the assumptions on F , in the following proposition.

Proposition 3.3.3 (Euler–Lagrange Equation). Let a, b ∈ R be such that a < b, and let F : [a, b] × R × R → R be a continuous function of three variables (x, y, z) ∈ [a, b] × R × R with continuous partial derivatives with respect to y and with respect to z, F_y and F_z, respectively.

Define J : C¹([a, b], R) → R by

J(y) = ∫_a^b F(x, y(x), y′(x)) dx, for y ∈ C¹([a, b], R). (3.79)

For real numbers y_o and y_1, define

A = {y ∈ C¹([a, b], R) | y(a) = y_o, y(b) = y_1}. (3.80)

A necessary condition for y ∈ A to be a minimizer, or a maximizer, of J over the class A is that y solve the Euler–Lagrange equation

d/dx [F_z(x, y(x), y′(x))] = F_y(x, y(x), y′(x)), for all x ∈ (a, b). (3.81)

In the indirect method of the Calculus of Variations, as it applies to General Variational Problem 1 (Problem 3.3.2), we first look for a function y ∈ A, where A is given in (3.80), that solves the Euler–Lagrange equation in (3.81). This leads to the two–point boundary value problem

d/dx [F_z(x, y(x), y′(x))] = F_y(x, y(x), y′(x)), for all x ∈ (a, b);
y(a) = y_o, y(b) = y_1. (3.82)

A solution of the boundary value problem in (3.82) will be a candidate for a minimizer, or a maximizer, of the functional J in (3.79) over the class A given in (3.80).

The second step in the indirect method is to verify that the candidate is a minimizer or a maximizer. In the remainder of this section we give examples of boundary value problems of the form (3.82) involving the Euler–Lagrange equation. In subsequent sections we will see how to verify that a solution of the boundary value problem in (3.82) yields a minimizer for a large class of problems.
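The passage from (3.79) to (3.81) can also be carried out symbolically, which is convenient for checking the examples that follow. The Python sketch below (an added illustration relying on the sympy library; it is not part of the original notes) computes F_y and (d/dx)F_z for the arc–length integrand F(x, y, z) = √(1 + z²) and prints the resulting Euler–Lagrange equation.

import sympy as sp

x, yv, z = sp.symbols('x yv z')   # yv and z stand for y and y' in F(x, y, z)
y = sp.Function('y')

F = sp.sqrt(1 + z**2)             # arc-length integrand; F does not depend on yv

Fy = sp.diff(F, yv)               # partial derivative F_y (here identically 0)
Fz = sp.diff(F, z)                # partial derivative F_z = z/sqrt(1 + z^2)

# Evaluate along a curve y(x) and form the Euler-Lagrange equation (3.81):
subs = {yv: y(x), z: sp.Derivative(y(x), x)}
euler_lagrange = sp.Eq(sp.diff(Fz.subs(subs), x), Fy.subs(subs))

print(sp.simplify(euler_lagrange))
# Since F_y = 0, (3.81) reduces to d/dx [ y'/sqrt(1 + (y')^2) ] = 0,
# whose solutions satisfy y'' = 0: straight lines, as in Example 3.3.4 below.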
Example 3.3.4 (Geodesic Problem 2, Revisited). In Problem 3.1.2 (The Geodesic
Problem 2), we looked at the problem of minimizing the functional,

J : C 1 ([xo , x1 ], R) → R,

given by
Z x1 p
J(y) = 1 + (y 0 (x))2 dx, for all y ∈ C 1 ([xo , x1 ], R), (3.83)
xo

over the class

A = {y ∈ C 1 ([xo , x1 ], R) | y(xo ) = yo , y(x1 ) = y1 }. (3.84)

In this case,
p
F (x, y, z) = 1 + z2, for (x, y, z) ∈ [xo , x1 ] × R × R.

So that,
z
Fy (x, y, z) = 0 and Fz (x, y, z) = √ , for (x, y, z) ∈ [xo , x1 ] × R × R.
1 + z2
The Euler–Lagrange equation associated with the functional in (3.83) is then
" #
d y 0 (x)
p = 0, for x ∈ (xo , x1 ).
dx 1 + (y 0 (x))2

Integrating this equation yields


y 0 (x)
p = c1 , for x ∈ (xo , x1 ),
1 + (y 0 (x))2
3.3. THE EULER–LAGRANGE EQUATIONS 39

for some constant of integration c1 . This is the same equation in (3.25) that we
obtained in the solution of Geodesic Problem 2. The solution of this equation
subject to the boundary conditions

y(xo ) = yo and y(x1 ) = y1

was given in (3.31); namely,

y1 − yo x1 yo − xo y1
y(x) = x+ , for xo 6 x 6 x1 . (3.85)
x1 − xo x1 − xo

The graph of the function y given in (3.85) is a straight line segment form the
point (xo , yo ) to the point (x1 , y1 ). We will see in a subsequent section that the
function in (3.85) is the unique minimizer of the arc–length functional defined
in (3.83) over the class A given in (3.84).
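As a quick numerical illustration of this claim (a sketch that is not part of the
original argument; the endpoint values below are arbitrary sample choices), one
can discretize the arc–length functional in (3.83) and compare the straight line
in (3.85) with a competitor that has the same endpoints:

    import numpy as np

    # Arbitrary sample data for the endpoints (assumed only for illustration).
    xo, x1, yo, y1 = 0.0, 2.0, 1.0, 3.0

    def arc_length(y, x):
        # Approximate J(y) = \int sqrt(1 + (y')^2) dx by the trapezoid rule.
        dy = np.gradient(y, x)
        f = np.sqrt(1.0 + dy**2)
        return np.sum((f[:-1] + f[1:])/2*np.diff(x))

    x = np.linspace(xo, x1, 2001)
    line = (y1 - yo)/(x1 - xo)*x + (x1*yo - xo*y1)/(x1 - xo)   # the segment (3.85)
    bump = line + 0.3*np.sin(np.pi*(x - xo)/(x1 - xo))         # same endpoints

    print(arc_length(line, x))   # ~ 2.8284, the distance between the endpoints
    print(arc_length(bump, x))   # strictly larger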

Example 3.3.5 (The Brachistochrone Problem, Revisited). In Example 3.3.1
we saw that the time of descent from point P (0, yo ) to point Q(x1 , y1 ), where
x1 > 0 and yo > y1 , along a path that is the graph of a function y ∈ C 1 ([0, x1 ], R),
with y(0) = yo and y(x1 ) = y1 , is proportional to

    J(y) = ∫_0^{x1} √(1 + (y′(x))²)/√(yo − y(x)) dx,  for y ∈ A,   (3.86)

where A is the class

A = {y ∈ C 1 ([0, x1 ], R) | y(0) = yo , y(x1 ) = y1 , and y(x) < yo for 0 < x < x1 }.


(3.87)
In this case, the function F corresponding to the functional J in (3.86) is given
by

    F (x, y, z) = √(1 + z²)/√(yo − y),  for 0 ≤ x ≤ x1 , y < yo , and z ∈ R.

We then have that

    Fy (x, y, z) = √(1 + z²)/(2(yo − y)^{3/2}),  for 0 ≤ x ≤ x1 , y < yo , and z ∈ R,

and

    Fz (x, y, z) = z/(√(1 + z²) √(yo − y)),  for 0 ≤ x ≤ x1 , y < yo , and z ∈ R.

Thus, the Euler–Lagrange equation associated with the functional J given in
(3.86) is

    d/dx [ y′(x)/(√(1 + (y′(x))²) √(yo − y(x))) ] = √(1 + (y′(x))²)/(2(yo − y(x))^{3/2}),  for 0 < x < x1 .   (3.88)

To simplify the differential equation in (3.88), we will assume that y is twice


differentiable. We will also introduce a new variable, u, which is a function of
x defined by

u(x) = yo − y(x), where y(x) < yo for 0 < x < x1 ; (3.89)

so that, u(x) > 0 for 0 < x < x1 and

y(x) = yo − u(x), where u(x) > 0 for 0 < x < x1 , (3.90)

and
y 0 = −u0 . (3.91)
We can then rewrite the equation in (3.88) as

    d/dx [ −u′(x)/(√(1 + (u′(x))²) √(u(x))) ] = √(1 + (u′(x))²)/(2(u(x))^{3/2}),  for 0 < x < x1 ,

or

    d/dx [ u′/(√(1 + (u′)²) √u) ] = −√(1 + (u′)²)/(2u^{3/2}),  for 0 < x < x1 ,   (3.92)

where we have written u for u(x) and u0 for u0 (x). Next, we proceed to evaluate
the derivative on the left–hand side of the equation in (3.92) and simplify to
obtain from (3.92) that

(u0 )2 + 2uu00 + 1 = 0 for 0 < x < x1 , (3.93)

where u00 denotes the second derivative of u. Multiply on both sides of (3.93)
by u0 to get
(u0 )3 + 2uu0 u00 + u0 = 0, for 0 < x < x1 ,
which can in turn be written as
d
[u + u(u0 )2 ] = 0. (3.94)
dx
Integrating the differential equation in (3.94) yields

u(1 + (u0 )2 ) = C, for 0 < x < x1 ,

and some constant C, which we can solve for (u0 )2 to get

    (u′)² = (C − u)/u,  for 0 < x < x1 .   (3.95)

Next, we solve (3.95) for u′ to get

    u′ = √((C − u)/u),  for 0 < x < x1 ,   (3.96)

where we have taken the positive square root in (3.96) in view of (3.91), since
y decreases with increasing x.
Our goal now is to find a solution of the differential equation in (3.96) subject
to the conditions
u = 0 when x = 0, (3.97)
according to (3.89), and

u = yo − y1 when x = x1 . (3.98)

Using the Chain Rule, we can rewrite (3.96) as

    dx/du = √(u/(C − u)),  for 0 < u < yo − y1 .   (3.99)

The graph of a solution of (3.99) will be a smooth path connecting the point
(0, 0) to the point (x1 , yo − y1 ) in the xu–plane as pictured in Figure 3.3.3.

Figure 3.3.3: Shortest “descent” time path in the xu–plane

We can also obtain the path as a parametrized curve

x = x(θ), u = u(θ), for θo < θ < θ1 , (3.100)

where θ is the angle the tangent line to the curve makes with a vertical line (see
the sketch in Figure 3.3.3). We then have that

dx
= tan θ; (3.101)
du
so that, using (3.99),

    u/(C − u) = sin²θ/cos²θ,  for θo < θ < θ1 .   (3.102)

Solving the equation in (3.102) for u yields

u = C sin2 θ, (3.103)

where we have used the trigonometric identity cos²θ + sin²θ = 1; thus, using
the trigonometric identity sin²θ = (1 − cos 2θ)/2,

    u(θ) = (C/2)(1 − cos 2θ),  for θo < θ < θ1 .   (3.104)
In view of the condition in (3.97) we see from (3.104) that θo = 0; so that,

    u(θ) = (C/2)(1 − cos 2θ),  for 0 < θ < θ1 .   (3.105)
To find the parametric expression for x in terms of θ, use the Chain Rule to
obtain

    dx/dθ = (dx/du)(du/dθ);

so that, in view of (3.101) and (3.103),

    dx/dθ = (sin θ/cos θ) · 2C sin θ cos θ,

from which we get

    dx/dθ = 2C sin²θ,

or

    dx/dθ = C(1 − cos 2θ),  for 0 < θ < θ1 .   (3.106)

Integrating the differential equation in (3.106) and using the boundary condition
in (3.97), we obtain that

    x(θ) = C(θ − (1/2) sin 2θ),  for 0 < θ < θ1 ,

which we can rewrite as

    x(θ) = (C/2)(2θ − sin 2θ),  for 0 < θ < θ1 .   (3.107)

Putting together the expressions in (3.105) and (3.107), denoting C/2 by a,
and introducing a new parameter t = 2θ, we obtain the parametric equations

    x(t) = at − a sin t;
    u(t) = a − a cos t,                                             (3.108)

Figure 3.3.4: Cycloid

for 0 6 t 6 t1 , which are the parametric equations of a cycloid. This is the curve
traced by a point, P , on a circle of radius a and center (0, a), which starts at
the origin when t = 0, as the circle rolls on the x–axis in the positive direction
(see Figure 3.3.4). The parameter t gives the angle the vector from the center
of the circle to P makes with a vertical vector emanating from the center and
pointing downwards; this is shown in Figure 3.3.4.
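The first integral u(1 + (u′)²) = C obtained in (3.95) can be checked directly
on the parametrization (3.108). The short computation below is a numerical
sketch only, with the radius a = 1 chosen arbitrarily; it confirms that the
cycloid satisfies the first integral with C = 2a:

    import numpy as np

    a = 1.0                                  # radius of the rolling circle (sample value)
    t = np.linspace(1e-3, np.pi, 1000)       # avoid t = 0, where dx/dt vanishes

    x = a*(t - np.sin(t))                    # the parametric equations (3.108)
    u = a*(1.0 - np.cos(t))

    du_dx = (a*np.sin(t))/(a*(1.0 - np.cos(t)))   # u'(x) = (du/dt)/(dx/dt)

    # The first integral from (3.95): u*(1 + (u')^2) should equal C = 2a.
    print(np.allclose(u*(1.0 + du_dx**2), 2*a))   # True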
To find a curve parametrized by (3.108) that goes through the point

(x1 , yo − y1 ),

so that the boundary condition in (3.98) is satisfied, we need to find a and t1


such that

    a t1 − a sin t1 = x1 ;
    a − a cos t1 = yo − y1 .                                        (3.109)
We will show presently that the system in (3.109) can always be solved for
positive values of x1 and yo − y1 by an appropriate choice of a.
The sketch in Figure 3.3.5 shows a cycloid generated by rolling a circle of
radius 1 along the x–axis in the positive direction. Assume, for the sake of
illustration, that the point (x1 , yo − y1 ) lies above the cycloid and draw the line
segment joining the origin in the xu–plane to the point (x1 , yo − y1 ). The line
will meet the cycloid at exactly one point; we label that point P1 in the
sketch in Figure 3.3.5. Observe that, in this case, the distance from the origin
to P1 is shorter than the distance from the origin to (x1 , yo − y1 ). However, by
increasing the value of a > 1 in (3.108) we can get another cycloid that meets
the line segment from (0, 0) to (x1 , yo − y1 ) at a point whose distance from
the origin is bigger than that from P1 to (0, 0); see the sketch in Figure 3.3.5.
According to the parametric equations in (3.108), the distance from the origin
to any point on a cycloid generated by a circle of radius a is given by

    ‖(x(t), u(t))‖ = a√(t² + 2 − 2t sin t − 2 cos t),  for 0 ≤ t ≤ 2π.   (3.110)

Figure 3.3.5: Solving the system in (3.109)

Observe that, for 0 < t < 2π,

k(x(t), u(t))k → ∞ as a → ∞.

Thus, since the distance defined in (3.110) is an increasing and continuous func-
tion of a, it follows from the intermediate value theorem that there exists a
value of a such that the cycloid generated by a circle of radius a goes through
the point (x1 , yo − y1 ); this is also shown in Figure 3.3.5. On the other hand, if
the point (x1 , yo − y1 ) is below the original cycloid, we can decrease the radius
a < 1 of the circle generating the cycloid until we reach the point (x1 , yo − y1 ).
Once the value of a > 0 is determined, we can find the value of t1 by solving
the second equation in (3.109) to obtain

    t1 = cos⁻¹( (a − (yo − y1 ))/a ).

A sketch of the curve obtained in this fashion is shown in Figure 3.3.6.
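The argument above is an existence argument; in practice a and t1 can be
computed numerically. Here is a minimal sketch (the values of x1 and yo − y1
below are arbitrary sample data): dividing the two equations in (3.109)
eliminates a, and the resulting equation for t1 can be solved by bisection,
since the ratio (t1 − sin t1 )/(1 − cos t1 ) increases from 0 to ∞ on (0, 2π).

    import numpy as np

    def solve_cycloid(x1, h):
        # Solve a(t1 - sin t1) = x1 and a(1 - cos t1) = h for a and t1,
        # where h = yo - y1. The ratio below is increasing in t1 on (0, 2*pi).
        f = lambda t: (t - np.sin(t))/(1.0 - np.cos(t)) - x1/h
        lo, hi = 1e-9, 2*np.pi - 1e-9
        for _ in range(200):                 # plain bisection
            mid = 0.5*(lo + hi)
            if f(mid) < 0.0:
                lo = mid
            else:
                hi = mid
        t1 = 0.5*(lo + hi)
        return x1/(t1 - np.sin(t1)), t1

    a, t1 = solve_cycloid(x1=2.0, h=1.0)     # sample data
    print(a*(t1 - np.sin(t1)), a*(1.0 - np.cos(t1)))   # recovers (2.0, 1.0)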


To get the solution of the Euler–Lagrange equation in (3.88) in the xy–plane
subject to the boundary conditions

y(0) = yo and y(x1 ) = y1 ,

we use the transformation equation in (3.89) to get from (3.108) the parametric
equations
    x(t) = at − a sin t;
    y(t) = yo − a + a cos t,                                        (3.111)

for 0 6 t 6 t1 , where a and t1 have already been determined. A sketch of this


curve is shown in Figure 3.3.7. This curve is the graph of a twice–differentiable

Figure 3.3.6: Sketch of solution of (3.93) subject to u(0) = 0 and u(x1 ) = yo − y1

function, y : [0, x1 ] → R, that solves the two–point boundary value problem

    d/dx [ y′/(√(1 + (y′)²) √(yo − y)) ] = √(1 + (y′)²)/(2(yo − y)^{3/2});
    y(0) = yo ;
    y(x1 ) = y1 .                                                   (3.112)

The solution of the two–point boundary value problem in (3.112) described


in Example 3.3.5 is a candidate for a minimizer of the descent time functional
in (3.63). We have not shown that this function is indeed a minimizer. In the
next chapter we shall see how to show that the solution of the boundary value
problem in (3.112) provides the curve of fastest descent from the point P (0, yo )
to the point (x1 , y1 ) in Figure 3.3.2.

Figure 3.3.7: Sketch of solution of (3.88) subject to y(0) = yo and y(x1 ) = y1


Chapter 4

Convex Minimization

The functionals we encountered in the Geodesic Example 2 (see Example 3.1.2)
and in the Brachistochrone Problem (Example 3.3.1),

    J(y) = ∫_{xo}^{x1} √(1 + (y′(x))²) dx,  for all y ∈ C 1 ([xo , x1 ], R),   (4.1)

and

    J(y) = ∫_0^{x1} √(1 + (y′(x))²)/√(yo − y(x)) dx,  for all y ∈ A,   (4.2)

where

    A = {y ∈ C 1 ([0, x1 ], R) | y(0) = yo , y(x1 ) = y1 , y(x) < yo for 0 < x < x1 },

respectively, are strictly convex functionals. We will see in this chapter that
for this class of functionals we can prove the existence of a unique minimizer.
The functionals in (4.1) and (4.2) are also Gâteaux differentiable. We begin this
chapter with a discussion of Gâteaux differentiability.

4.1 Gâteaux Differentiability


We consider the general situation of a functional J : V → R defined on a linear
space V . Let Vo denote a nontrivial subspace of V ; that is, Vo is not the trivial
subspace {0}. For each vector u in V and each v in Vo , define the real–valued
function g : R → R of a single variable as follows:

g(t) = J(u + tv), for all t ∈ R; (4.3)

that is, the function g gives the values of J along a line through u in the
direction of v ≠ 0. We will focus on the special case in which the function g is
differentiable at t = 0. If this is the case, we say that the functional J is Gâteaux
differentiable at u in the direction of v, denote g′(0) by dJ(u; v), and call


it the Gâteaux derivative of J at u in the direction of v; so that, according to
the definition of g in (4.3),

    dJ(u; v) = d/dt [J(u + tv)] |_{t=0} .   (4.4)

The existence of the expression on the right–hand side of (4.4) translates into
the existence of the limit defining g′(0), namely

    lim_{t→0} (J(u + tv) − J(u))/t .
Here is the formal definition of Gâteaux differentiability.
Definition 4.1.1 (Gâteaux Differentiability). Let V be a normed linear space,
Vo be a nontrivial subspace of V , and J : V → R be a functional defined on V .
We say that J is Gâteaux differentiable at u ∈ V in the direction of v ∈ Vo if
the limit
    lim_{t→0} (J(u + tv) − J(u))/t   (4.5)

exists. If the limit in (4.5) exists, we denote it by the symbol dJ(u; v) and call
it the Gâteaux derivative of J at u in the direction of v, or the first variation
of J at u in the direction of v. Thus, if J is Gâteaux differentiable at u in the
direction of v, its Gâteaux derivative at u in the direction of v is given by

    dJ(u; v) = d/dt [J(u + tv)] |_{t=0} ,   (4.6)

or, in view of (4.5),

    dJ(u; v) = lim_{t→0} (J(u + tv) − J(u))/t .   (4.7)

We now present a few examples of Gâteaux differentiable functionals in
various linear spaces, and their Gâteaux derivatives. In practice, we usually
compute (if possible) the derivative of J(u + tv) with respect to t, and then
evaluate it at t = 0 (see the right–hand side of the equation in (4.6)).
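The right–hand side of (4.6) also suggests a numerical check: for a concrete
functional one can approximate g′(0) by a difference quotient in t and compare
it with the derivative computed by hand. The sketch below does this for the
sample functional J(y) = ∫_0^1 y² dx, whose Gâteaux derivative is
dJ(y; v) = 2 ∫_0^1 y v dx; this choice of J, and the choices of y and v, are
assumptions made only for illustration.

    import numpy as np

    x = np.linspace(0.0, 1.0, 4001)
    def integrate(f):                      # trapezoid rule on [0, 1]
        return np.sum((f[:-1] + f[1:])/2*np.diff(x))

    def J(y):                              # sample functional J(y) = \int_0^1 y^2 dx
        return integrate(y**2)

    y = np.sin(np.pi*x)                    # a point of the space
    v = x*(1.0 - x)                        # a direction vanishing at the endpoints

    t = 1e-5                               # difference quotient approximating (4.7)
    numeric = (J(y + t*v) - J(y - t*v))/(2*t)
    exact = 2*integrate(y*v)               # dJ(y; v) = 2 \int_0^1 y v dx
    print(numeric, exact)                  # the two values agree closely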
Example 4.1.2 (The Dirichlet Integral). Let Ω denote an open, bounded subset
of Rn . Let C 1 (Ω, R) denote the space of real–valued functions u : Ω → R whose
partial derivatives exist, and are continuous in an open subset, U , that contains
Ω. Define J : C 1 (Ω, R) → R by

    J(u) = (1/2) ∫_Ω |∇u|² dx,  for all u ∈ C 1 (Ω, R).   (4.8)
The expression |∇u| in the integrand on the right–hand side of (4.8) is the
Euclidean norm of the gradient of u,

    ∇u = (∂u/∂x, ∂u/∂y) = (ux , uy ),  if n = 2,

or

    ∇u = (∂u/∂x, ∂u/∂y, ∂u/∂z) = (ux , uy , uz ),  if n = 3;

so that

    |∇u|² = (ux )² + (uy )²,  if n = 2,

or

    |∇u|² = (ux )² + (uy )² + (uz )²,  if n = 3.
The differential dx in the integral on the right–hand side of (4.8) represents the
element of area, dxdy, in the case in which Ω ⊂ R2 , or the element of volume,
dxdydz, in the case in which Ω ⊂ R3 . Thus, if n = 2 the integral on the right–
hand side of (4.8) is a double integral over the plane region Ω ⊂ R2 ; while, if
n = 3 the integral in (4.8) is a triple integral over the region in three–dimensional
Euclidean space.
We shall denote by Co1 (Ω, R) the space of functions v ∈ C 1 (Ω, R) that are 0
on the boundary, ∂Ω, of the region Ω; thus,
Co1 (Ω, R) = {v ∈ C 1 (Ω, R) | v = 0 on ∂Ω}.
We shall show in this example that the functional J : C 1 (Ω, R) → R defined in
(4.8) is Gâteaux differentiable at every u ∈ C 1 (Ω, R) in the direction of every
v ∈ Co1 (Ω, R) and
Z
dJ(u; v) = ∇u · ∇v dx, for all u ∈ C 1 (Ω, R) and all v ∈ Co1 (Ω, R), (4.9)

where ∇u · ∇v denotes the dot product of the gradients of u and v; thus,
∇u · ∇v = ux vx + uy vy , if n = 2,
or
∇u · ∇v = ux vx + uy vy + uz vz , if n = 3.
For u ∈ C 1 (Ω, R) and v ∈ Co1 (Ω, R), use the definition of J in (4.8) to
compute

    J(u + tv) = (1/2) ∫_Ω |∇(u + tv)|² dx = (1/2) ∫_Ω |∇u + t∇v|² dx,

where we have used the linearity of the differential operator ∇. Thus, using the
fact that the square of the Euclidean norm of a vector is the dot product of the
vector with itself,

    J(u + tv) = (1/2) ∫_Ω (∇u + t∇v) · (∇u + t∇v) dx
              = (1/2) ∫_Ω (|∇u|² + 2t ∇u · ∇v + t² |∇v|²) dx
              = (1/2) ∫_Ω |∇u|² dx + t ∫_Ω ∇u · ∇v dx + (t²/2) ∫_Ω |∇v|² dx.

Thus, using the definition of J in (4.8),


Z
J(u + tv) = J(u) + t ∇u · ∇v dx + t2 J(v), for all t ∈ R. (4.10)

Observe that the right–hand side of the expression in (4.10) is a quadratic


polynomial in t. Hence, it is differentiable in t and
Z
d
[J(u + tv)] = ∇u · ∇v dx + 2tJ(v), for all t ∈ R.
dt Ω

from which we get that


Z
d
[J(u + tv)] = ∇u · ∇v dx. (4.11)
dt t=0 Ω

It follows from (4.11) that J is Gâteaux differentiable at every u ∈ C 1 (Ω, R) in


the direction of every v ∈ Co1 (Ω, R), and its Gâteaux derivative is as claimed in
(4.9).
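The formula in (4.9) can be confirmed numerically on a grid. In the sketch
below (an illustration only; the grid size and the choices of u and v are
arbitrary assumptions), the Dirichlet integral is discretized with forward
differences on the unit square, and the difference quotient of J at u in the
direction of v is compared with the discrete version of ∫_Ω ∇u · ∇v dx.

    import numpy as np

    n = 60
    h = 1.0/n
    s = np.linspace(0.0, 1.0, n + 1)
    xx, yy = np.meshgrid(s, s, indexing='ij')

    def dirichlet(u):
        # Discrete J(u) = (1/2)*\int |grad u|^2 dx via forward differences.
        ux = np.diff(u, axis=0)/h
        uy = np.diff(u, axis=1)/h
        return 0.5*(np.sum(ux**2) + np.sum(uy**2))*h*h

    u = xx**2 - yy**2                          # any smooth function
    v = np.sin(np.pi*xx)*np.sin(np.pi*yy)      # vanishes on the boundary

    t = 1e-6
    numeric = (dirichlet(u + t*v) - dirichlet(u - t*v))/(2*t)

    # Discrete version of dJ(u; v) = \int grad u . grad v dx, as in (4.9).
    exact = (np.sum(np.diff(u, axis=0)*np.diff(v, axis=0)) +
             np.sum(np.diff(u, axis=1)*np.diff(v, axis=1)))
    print(numeric, exact)                      # agree up to rounding error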

Example 4.1.3. Let a, b ∈ R be such that a < b and let

    F : [a, b] × R × R → R

be a continuous function of three variables (x, y, z) ∈ [a, b] × R × R with con-
tinuous partial derivatives with respect to y and with respect to z, Fy and Fz ,
respectively. Put V = C 1 ([a, b], R) and Vo = Co1 ([a, b], R).
Define J : V → R by
Z b
J(y) = F (x, y(x), y 0 (x)) dx, for y ∈ V. (4.12)
a

Then, J is Gâteaux differentiable at every y ∈ V in the direction of every


η ∈ Vo , and its Gâteaux derivative is given by
Z b
dJ(y, η) = [Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx, for y ∈ V and η ∈ Vo . (4.13)
a

For y ∈ V and η ∈ Vo , use (4.12) to compute


Z b
J(y + tη) = F (x, y(x) + tη(x), y 0 (x) + tη 0 (x)) dx, for t ∈ R. (4.14)
a

Thus, according to Proposition B.1.1 in Appendix B.1 in these notes, the ques-
tion of differentiability of the map

t 7→ J(y + tη), for t ∈ R, (4.15)



reduces to whether the partial derivatives

    ∂/∂y [F (x, y, z)] = Fy (x, y, z)  and  ∂/∂z [F (x, y, z)] = Fz (x, y, z)

are continuous in [a, b] × R × R. This is one of the assumptions we have made
in this example.
Now, it follows from the Chain Rule that

    ∂/∂t [F (x, y(x) + tη(x), y′(x) + tη′(x))] = Fy (·, y + tη, y′ + tη′)η + Fz (·, y + tη, y′ + tη′)η′,
for all t ∈ R, where we have written y for y(x), y 0 for y 0 (x), η for η(x), and η 0 for
η 0 (x). Thus, since we are assuming that y and η are C 1 functions, we see that
the assumptions for Proposition B.1.1 in Appendix B.1 hold true. We therefore
conclude that the map in (4.15) is differentiable and
    d/dt [J(y + tη)] = ∫_a^b [Fy (x, y + tη, y′ + tη′)η + Fz (x, y + tη, y′ + tη′)η′] dx,   (4.16)

for all t ∈ R, where we have written y for y(x), y′ for y′(x), η for η(x), and η′
for η′(x) in the integrand of the integral on the right–hand side of (4.16).
Setting t = 0 in (4.16) we then obtain that
Z b
d
[J(y + tη)] = [Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx,
dt t=0 a

from which the expression in (4.13) follows.

Example 4.1.4 (A Sturm–Liouville Problem). Let p ∈ C([xo , x1 ], R) and q ∈


C([xo , x1 ], R) be functions satisfying p(x) > 0 and q(x) > 0 for xo 6 x 6 x1 .
Define J : C 1 ([xo , x1 ], R) → R by
Z x1
J(y) = [p(x)(y 0 (x))2 + q(x)(y(x))2 ] dx, for y ∈ C 1 ([xo , x1 ], R). (4.17)
xo

We show that J is Gâteaux differentiable at any y ∈ V = C 1 ([xo , x1 ], R)


in the direction of any η ∈ Vo = Co1 ([xo , x1 ], R). To do this, observe that the
functional J in (4.17) is of the form
Z x1
J(y) = F (x, y(x), y 0 (x)) dx, for y ∈ C 1 ([xo , x1 ], R),
xo

where

F (x, y, z) = p(x)z 2 + q(x)y 2 , for x ∈ [xo , x1 ], y ∈ R, z ∈ R. (4.18)



Since we are assuming that the functions p and q are continuous on [xo , x1 ], it
follows that the function F : [xo , x1 ] × R × R → R defined in (4.18) is continuous
on [xo , x1 ] × R × R with continuous partial derivatives

Fy (x, y, z) = 2q(x)y and Fz (x, y, z) = 2p(x)z,

for (x, y, z) ∈ [xo , x1 ]×R×R. Consequently, by the result of Example 4.1.3 and (4.13),
we conclude that the functional J defined in (4.17) is Gâteaux differentiable at
every y ∈ C 1 ([xo , x1 ], R) with Gâteaux derivative
Z x1
dJ(y, η) = [2q(x)y(x)η(x) + 2p(x)y 0 (x)η 0 (x)] dx, (4.19)
xo

for every direction η ∈ Co1 ([xo , x1 ], R).
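As with the Dirichlet integral, the formula in (4.19) lends itself to a quick
numerical check; in the sketch below the choices p(x) = 1 + x, q(x) = x²,
y = e^x and η = sin(πx) on [0, 1] are arbitrary sample data, not part of the
original example.

    import numpy as np

    x = np.linspace(0.0, 1.0, 8001)            # [xo, x1] = [0, 1] for this sample
    def trap(f):
        return np.sum((f[:-1] + f[1:])/2*np.diff(x))

    p = 1.0 + x                                # p > 0 on [0, 1]
    q = x**2                                   # q >= 0 on [0, 1]

    def J(y):                                  # the functional in (4.17)
        yp = np.gradient(y, x)
        return trap(p*yp**2 + q*y**2)

    y = np.exp(x)
    eta = np.sin(np.pi*x)                      # vanishes at both endpoints

    t = 1e-6
    numeric = (J(y + t*eta) - J(y - t*eta))/(2*t)
    exact = trap(2*q*y*eta + 2*p*np.gradient(y, x)*np.gradient(eta, x))   # (4.19)
    print(numeric, exact)                      # agree closely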

4.2 A Minimization Problem


Let V denote a normed linear space and J : V → R a functional defined on V .
For a given nonempty subset A of V , we consider the problem of finding an
element u of A for which J(u) is the smallest possible among all values of J(v)
for v in A. We write:

J(u) 6 J(v), for all v ∈ A. (4.20)

We call A the class of admissible vectors for the minimization problem in


(4.20).
In addition, suppose that there exists a nontrivial subspace Vo of V with the
property that: for every u ∈ A and every v ∈ Vo there exists δ > 0 such that

|t| < δ ⇒ u + tv ∈ A. (4.21)

We will refer to Vo as the space of admissible directions.


Suppose we have found a solution u ∈ A of the minimization problem in
(4.20). Assume further that J is Gâteaux differentiable at u along any direction
v ∈ Vo .
Let v be any direction in Vo . It follows from (4.21) and (4.20) that there
exists δ > 0 such that

J(u + tv) > J(u), for |t| < δ,

or
J(u + tv) − J(u) > 0, for |t| < δ. (4.22)
Dividing on both sides of the inequality in (4.22) by t > 0 we obtain that

    (J(u + tv) − J(u))/t ≥ 0,  for 0 < t < δ.   (4.23)

Thus, letting t → 0+ in (4.23) and using the definition of the Gâteaux derivative
of J in (4.7), we get that
dJ(u; v) > 0, (4.24)

since we are assuming that J is Gâteaux differentiable at u.


Similarly, dividing on both sides of the inequality in (4.22) by t < 0 we
obtain that
    (J(u + tv) − J(u))/t ≤ 0,  for −δ < t < 0.   (4.25)
Letting t → 0− in (4.25), using the assumption that J is Gâteaux differentiable
at u, we have that
dJ(u; v) 6 0. (4.26)

Combining (4.24) and (4.26), we obtain the result that, if J is Gâteaux differ-
entiable at u, and u is a minimizer of J over A, then

dJ(u; v) = 0, for all v ∈ Vo . (4.27)

The condition in (4.27) is a necessary condition for u to be a minimizer of J


over A in the case in which J is Gâteaux differentiable at u along any direction
v ∈ Vo .

Remark 4.2.1. In view of the expression in (4.13) derived in Example 4.1.3, we


note that the necessary condition in (4.27), in conjunction with the Fundamental
Lemma 3 (Lemma 3.2.9), was used in Section 3.3 to derive the Euler–Lagrange
equation.

4.3 Convex Functionals


Many of the functionals discussed in the examples in these notes so far are
convex. In this section we present the definitions of convex and strictly convex
functionals and discuss a few of their properties.

Definition 4.3.1 (Convex Functionals). Let V denote a normed linear space, Vo


a nontrivial subspace of V , and A a given nonempty subset of V . Let J : V → R
be a functional defined on V . Suppose that J is Gâteaux differentiable at every
u ∈ A in any direction v ∈ Vo . The functional J is said to be convex on A if

J(u + v) > J(u) + dJ(u; v) (4.28)

for all u ∈ A and v ∈ Vo such that u + v ∈ A.


A Gâteaux differentiable functional J : V → R is said to be strictly convex
in A if it is convex in A, and

J(u+v) = J(u)+dJ(u; v), for u ∈ A, v ∈ Vo with u+v ∈ A, iff v = 0. (4.29)



Example 4.3.2 (The Dirichlet Integral, Revisited). Let V = C 1 (Ω, R) and
Vo = Co1 (Ω, R). Let U denote an open subset in Rn that contains Ω and let
g ∈ C(U, R). Define

    A = {u ∈ C 1 (Ω, R) | u = g on ∂Ω};   (4.30)

that is, A is the class of C 1 functions u : Ω → R that take on the values of g on


the boundary of Ω.
Define J : V → R by
Z
1
J(u) = |∇u|2 dx, for all u ∈ C 1 (Ω, R). (4.31)
2 Ω
We showed in Example 4.1.2 that J is Gâteaux differentiable at every u ∈
C 1 (Ω, R) in the direction of every v ∈ Co1 (Ω, R), with Gâteaux derivative given
by
Z
dJ(u; v) = ∇u · ∇v dx, for u ∈ C 1 (Ω, R) and v ∈ Co1 (Ω, R). (4.32)

We will first show that J is convex in A, where A is defined in (4.30). Thus,


let u ∈ A and v ∈ Co1 (Ω, R). Then, u + v ∈ A, since v vanishes on ∂Ω.
Compute
Z Z
1 1
J(u + v) = |∇(u + v)|2 dx = J(u) + dJ(u; v) + |∇v|2 dx. (4.33)
2 Ω 2 Ω
Thus,
J(u + v) > J(u) + dJ(u; v)
for all u ∈ A and v ∈ Co1 (Ω, R). Consequently, J is convex in A.
Next, we show that J is strictly convex.
From (4.33) we get that
Z
1
J(u + v) = J(u) + dJ(u; v) + |∇v|2 dx,
2 Ω

for all u ∈ A and v ∈ Co1 (Ω, R). Consequently,

J(u + v) = J(u) + dJ(u; v)

if and only if Z
|∇v|2 dx = 0.

Thus, since v ∈ C 1 (Ω, R), ∇v = 0 in Ω, and therefore v is constant on connected
components of Ω. Hence, since v = 0 on ∂Ω, it follows that v(x) = 0 for all
x ∈ Ω. We conclude therefore that the Dirichlet integral functional J defined
in (4.31) is strictly convex in A.

Example 4.3.3 (A Sturm–Liouville Problem, Revisited). Let p : [xo , x1 ] → R


and q : [xo , x1 ] → R be continuous functions satisfying p(x) > 0 and q(x) > 0 for
xo 6 x 6 x1 . Define J : C 1 ([xo , x1 ], R) → R by
Z x1
J(y) = [p(x)(y 0 (x))2 + q(x)(y(x))2 ] dx, for y ∈ C 1 ([xo , x1 ], R). (4.34)
xo

Consider the problem of minimizing J over the class

A = {y ∈ C 1 ([xo , x1 ], R) | y(xo ) = yo and y(x1 ) = y1 }, (4.35)

for given real numbers yo and y1 .


In Example 4.1.4 we showed that the functional J : C 1 ([xo , x1 ], R) → R given
in (4.34) is Gâteaux differentiable at every y ∈ C 1 ([xo , x1 ], R) with Gâteaux
derivative given by (4.19); namely,
Z x1
dJ(y, η) = 2 [q(x)y(x)η(x) + p(x)y 0 (x)η 0 (x)] dx, for y ∈ V, η ∈ Vo , (4.36)
xo

where V = C 1 ([xo , x1 ], R) and Vo = Co1 ([xo , x1 ], R).


In this example we show that J is strictly convex in A given in (4.35).
We first show that J is convex.
Let y ∈ A and η ∈ Vo . Then, y + η ∈ A, given that η(xo ) = η(x1 ) = 0 and
the definition of A in (4.35).
Compute
Z x1
J(y + η) = [p(x)(y 0 (x) + η 0 (x))2 + q(x)(y(x) + η(x))2 ] dx,
xo

or, expanding the integrand,


Z x1
J(y + η) = [p(x)(y 0 (x))2 + 2p(x)y 0 (x)η 0 (x) + p(x)(η 0 (x))2 ] dx
xo Z
x1
+ [q(x)(y(x))2 + 2q(x)y(x)η(x) + q(x)(η(x))2 ] dx,
xo

which we can rewrite as


Z x1
J(y + η) = [p(x)(y 0 (x))2 + q(x)(y(x))2 ] dx
xo Z x1
+2 [p(x)y 0 (x)η 0 (x) + q(x)y(x)η(x)] dx
xo Z
x1
+ [p(x)(η 0 (x))2 + q(x)(η(x))2 ] dx;
xo

so that, in view of (4.34) and (4.36),


Z x1
J(y + η) = J(y) + dJ(y; η) + [p(x)(η 0 (x))2 + q(x)(η(x))2 ] dx. (4.37)
xo

Since we are assuming in this example that p(x) > 0 and q(x) > 0 for all
x ∈ [xo , x1 ], it follows from (4.37) that

J(y + η) > J(y) + dJ(y; η), for y ∈ A and η ∈ Vo ,

which shows that J is convex in A.


Next, observe that, in view of (4.37),

J(y + η) = J(y) + dJ(y; η), for y ∈ A and η ∈ Vo

if and only if Z x1
[p(x)(η 0 (x))2 + q(x)(η(x))2 ] dx = 0. (4.38)
xo

Now, it follows from (4.38) and the assumption that p > 0 and q > 0 on [xo , x1 ]
that Z x1 Z x1
0 2
p(x)(η (x)) dx = 0 and q(x)(η(x))2 dx = 0,
xo xo

from which we obtain that η(x) = 0 for all x ∈ [xo , x1 ], since η is continuous on
[xo , x1 ]. Hence, J is strictly convex in A.

The functional in (4.34) is an example of a general class of functionals of the


form

    J(y) = ∫_a^b F (x, y(x), y′(x)) dx,  for y ∈ C 1 ([a, b], R),

where F : [a, b] × R × R → R is a continuous function with continuous partial
derivatives Fy (x, y, z) and Fz (x, y, z) in [a, b] × R × R. In the next
example, we present conditions that will guarantee that functionals of this type
are convex, or strictly convex.
Example 4.3.4. In Example 4.1.3 we saw that the functional J : V → R given
by
Z b
J(y) = F (x, y(x), y 0 (x)) dx, for y ∈ V, (4.39)
a
where V = C 1 ([a, b], R), and the function

F : [a, b] × R × R → R

is a continuous function of three variables (x, y, z) ∈ [a, b] × R × R with con-


tinuous partial derivatives with respect to y and with respect to z, Fy and Fz ,
respectively, on [a, b] × R × R, is Gâteaux differentiable at every y ∈ V in the
direction of η ∈ Vo , where Vo = Co1 ([a, b], R). We saw in Example 4.1.3 that
Z b
dJ(y, η) = [Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx, for y ∈ V and η ∈ Vo . (4.40)
a

In this example we find conditions on the function F that will guarantee that
the functional J given in (4.39) is convex or strictly convex.

In view of (4.40), according to Definition 4.3.1, the functional J defined in
(4.39) is convex in

    A = {y ∈ C 1 ([a, b], R) | y(a) = yo , y(b) = y1 },   (4.41)

provided that

    ∫_a^b F (x, y + v, y′ + v′) dx ≥ ∫_a^b F (x, y, y′) dx + ∫_a^b [Fy (x, y, y′)v + Fz (x, y, y′)v′] dx

for all y ∈ A and v ∈ Vo . This inequality will follow, for example, if the function
F satisfies

F (x, y + v, z + w) > F (x, y, z) + Fy (x, y, z)v + Fz (x, y, z)w, (4.42)

for all (x, y, z) and (x, y + v, z + w) where F is defined. Furthermore, the equality

    J(u + v) = J(u) + dJ(u; v)

holds true if and only if equality in (4.42) holds true. If the latter happens only
when v = 0 or w = 0, then J is also strictly convex.

Example 4.3.5 (A Sturm–Liouville Problem, Revisited Once Again). Let p ∈


C([xo , x1 ], R) and q ∈ C([xo , x1 ], R) be such that p(x) > 0 and q(x) > 0 for
xo 6 x 6 x1 . Define J : C 1 ([xo , x1 ], R) → R by
Z x1
J(y) = [p(x)(y 0 (x))2 + q(x)(y(x))2 ] dx, for y ∈ C 1 ([xo , x1 ], R). (4.43)
xo

Thus, this functional corresponds to the function F : [xo , x1 ] × R × R → R

F (x, y, z) = p(x)z 2 + q(x)y 2 , for x ∈ [xo , x1 ], y ∈ R and z ∈ R;

so that
Fy (x, y, z) = 2q(x)y and Fz (x, y, z) = 2p(x)z,
for (x, y, z) ∈ [xo , x1 ] × R × R. Thus, the condition in (4.42) for the functional
in (4.43) reads

p(x)(z + w)2 + q(x)(y + v)2 > p(x)z 2 + q(x)y 2 + 2q(x)yv + 2p(x)zw (4.44)

To show that (4.44) holds true, expand the term on the left–hand side to get

p(x)(z + w)2 + q(x)(y + v)2 = p(x)(z 2 + 2zw + w2 )

+q(x)(y 2 + 2yv + v 2 )

= p(x)z 2 + 2p(x)zw + p(x)w2

+q(x)y 2 + 2q(x)yv + q(x)v 2 ,



which we can rewrite as


p(x)(z + w)2 + q(x)(y + v)2 = p(x)z 2 + q(x)y 2

+2q(x)yv + 2p(x)zw (4.45)

+p(x)w2 + q(x)v 2 .

Since we are assuming that p > 0 and q > 0 on [xo , x1 ], we see that (4.44)
follows from (4.45). Thus, the functional J given in (4.43) is convex.
To see that J is strictly convex, assume that equality holds in (4.44); so that,

p(x)(z + w)2 + q(x)(y + v)2 = p(x)z 2 + q(x)y 2 + 2q(x)yv + 2p(x)zw.

It then follows from (4.45) that

p(x)w2 + q(x)v 2 = 0, for all x ∈ [xo , x1 ].

Consequently, since we are assuming that p(x) > 0 and q(x) > 0 for all x ∈
[xo , x1 ], we get that
w = v = 0.
Hence, the functional J given in (4.43) is strictly convex.

Example 4.3.6. Let p : [a, b] → R be continuous on [a, b]. Suppose p(x) > 0
for all x ∈ [a, b] and define
Z b p
J(y) = p(x) 1 + (y 0 )2 dx for y ∈ C 1 ([a, b], R). (4.46)
a

We consider the question of whether J is convex in the class

A = {y ∈ C 1 ([a, b], R) | y(a) = yo and y(b) = y1 } (4.47)

for given real numbers yo and y1 .


Set Vo = {v ∈ C 1 ([a, b], R) | v(a) = 0 and v(b) = 0}. Then, Vo is a nontrivial
subspace of C 1 ([a, b], R).
In this case, F (x, y, z) = p(x)√(1 + z²) for x ∈ [a, b] and all real values of y
and z. Observe that Fy ≡ 0 and Fz (x, y, z) = p(x)z/√(1 + z²) are continuous, and
the inequality (4.42) for this case reads

    p(x)√(1 + (z + w)²) ≥ p(x)√(1 + z²) + p(x)zw/√(1 + z²),

which, by virtue of the assumption that p > 0 on [a, b], is equivalent to

    √(1 + (z + w)²) ≥ √(1 + z²) + zw/√(1 + z²).   (4.48)

The fact that the inequality in (4.48) holds true for all z, w ∈ R, with equality
iff w = 0, is a consequence of the Cauchy–Schwarz inequality in R2 applied to
the vectors A ~ = (1, z) and B
~ = (1, z + w).
It follows from the inequality in (4.48) that the functional in (4.46) is convex
in A.
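A quick numerical spot check of (4.48) (random samples only, not a proof) also
exhibits the inequality and its equality case:

    import numpy as np

    rng = np.random.default_rng(0)
    z = rng.normal(size=10_000)
    w = rng.normal(size=10_000)

    def gap(z, w):
        # Left-hand side minus right-hand side of (4.48).
        return np.sqrt(1.0 + (z + w)**2) - np.sqrt(1.0 + z**2) - z*w/np.sqrt(1.0 + z**2)

    print(np.all(gap(z, w) >= -1e-12))   # (4.48) holds on the samples
    print(np.max(np.abs(gap(z, 0.0))))   # 0.0: equality exactly when w = 0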
To show that the functional J given in (4.46) is strictly convex, first rewrite
the inequality in (4.48) as
    √(1 + (z + w)²) − √(1 + z²) − zw/√(1 + z²) ≥ 0,  for z, w ∈ R,   (4.49)
and note that equality in (4.49) holds if and only if w = 0.
It follows from what we have just shown that

J(y + v) > J(y) + dJ(y; v), for y ∈ A and v ∈ Vo , (4.50)

where, by virtue of the result in Example 4.1.3,


    dJ(y; v) = ∫_a^b p(x) y′(x)v′(x)/√(1 + (y′(x))²) dx,  for y ∈ A and v ∈ Vo .   (4.51)

Thus, equality in (4.50) holds if and only if

    ∫_a^b p(x)√(1 + (y′ + v′)²) dx = ∫_a^b p(x)√(1 + (y′)²) dx + ∫_a^b p(x) y′v′/√(1 + (y′)²) dx,

where we have used (4.51). So, equality in (4.50) holds if and only if

    ∫_a^b p(x) ( √(1 + (y′ + v′)²) − √(1 + (y′)²) − y′v′/√(1 + (y′)²) ) dx = 0.   (4.52)

It follows from (4.52), the inequality in (4.49), the assumption that p(x) > 0 for
all x ∈ [a, b], and the assumptions that p, y′ and v′ are continuous, that

    √(1 + (y′(x) + v′(x))²) − √(1 + (y′(x))²) − y′(x)v′(x)/√(1 + (y′(x))²) = 0,  for all x ∈ [a, b];

so that, since equality in (4.48) holds if and only if w = 0, we obtain that

    v′(x) = 0,  for all x ∈ [a, b].

Thus, v is constant on [a, b]. Therefore, since v(a) = 0, it follows that v(x) = 0
for all x ∈ [a, b]. We have therefore demonstrated that the functional J defined
in (4.46) is strictly convex in A given in (4.47).

We note that in the previous example (Example 4.3.6), the function
F (x, y, z) = p(x)√(1 + z²) corresponding to the functional J does not depend
explicitly on y. In the next example we consider a general class of functionals
of this type.

Example 4.3.7. Let I denote an open interval of real numbers (we note that I
could be the entire real line). We consider a function F : [a, b] × I → R; that is, F
is a function of two variables (x, z), where x ∈ [a, b] and z ∈ I. We assume that
F is continuous on [a, b] × I. Suppose also that the second partial derivative of
F with respect to z, Fzz (x, z), exists and is continuous on [a, b] × I, and that

Fzz (x, z) > 0, for (x, z) ∈ [a, b] × I, (4.53)

except possibly at finitely many values of z in I.


Let V = C 1 ([a, b], R) and define the functional
Z b
J(y) = F (x, y 0 (x)) dx, for y ∈ V. (4.54)
a

In this example we show that, if (4.53) holds true, then the functional J defined
in (4.54) is strictly convex in

A = {y ∈ C 1 ([a, b], R) | y(a) = yo and y(b) = y1 } (4.55)

for given real numbers yo and y1 .


According to (4.42) in Example 4.3.4, we need to show that

F (x, z + w) > F (x, z) + Fz (x, z)w, (4.56)

for all x ∈ [a, b], z ∈ I and w ∈ R such that z + w ∈ I, since Fy = 0 in this case.
Fix x ∈ [a, b], z ∈ I and w ∈ R such that z + w ∈ I, and put

g(t) = F (x, z + tw), for all t ∈ [0, 1].

Then, g is C² in (0, 1) and, integrating by parts,

    ∫_0^1 (1 − t)g″(t) dt = [(1 − t)g′(t)]_0^1 + ∫_0^1 g′(t) dt
                          = −g′(0) + g(1) − g(0)
                          = −Fz (x, z)w + F (x, z + w) − F (x, z),

from which it follows that

    F (x, z + w) = F (x, z) + Fz (x, z)w + ∫_0^1 (1 − t)Fzz (x, z + tw)w² dt,   (4.57)

and so
F (x, z + w) > F (x, z) + Fz (x, z)w, (4.58)
which is the inequality in (4.56).
Next, assume that equality holds in (4.58). It then follows from (4.57) that
Z 1
(1 − t)Fzz (x, z + tw)w2 dt = 0,
0

from which we get that w = 0, in view of the assumption in (4.53). We have


therefore shown that equality in (4.58) holds true if and only if w = 0.
To show that the functional J in (4.54) is strictly convex, use the inequality
in (4.58) to see that, for y ∈ V and η ∈ Vo = Co1 ([a, b], R),
Z b
J(y + η) = F (x, y 0 (x) + η 0 (x)) dx
a

Z b Z b
> F (x, y 0 (x)) dx + Fz (x, y 0 (x))η 0 (x) dx;
a a

so that, using the result in Example 4.1.3 and the definition of J in (4.54),

J(y + η) > J(y) + dJ(y; η) for y ∈ V and η ∈ Vo . (4.59)

Thus, the functional J defined in (4.54) is convex in A, where A is given in


(4.55).
To show that J is strictly convex, assume that equality holds in the inequality
in (4.59); so that

J(y + η) = J(y) + dJ(y; η) for some y ∈ V and η ∈ Vo ,

or, using the definition of J in (4.54) and the result in Example 4.1.3,
Z b Z b Z b
F (x, y 0 (x) + η 0 (x)) dx = F (x, y 0 (x)) dx + Fz (x, y 0 (x))η 0 (x) dx,
a a a

or
Z b
[F (x, y 0 (x) + η 0 (x)) − F (x, y 0 (x)) − Fz (x, y 0 (x))η 0 (x)] dx = 0. (4.60)
a

It follows from the inequality in (4.58) that the integrand in (4.60) is nonnegative
on [a, b]; hence, since y 0 , η 0 , F and Fz are continuous functions, it follows from
(4.60) that

F (x, y 0 (x) + η 0 (x)) − F (x, y 0 (x)) − Fz (x, y 0 (x))η 0 (x) = 0, for x ∈ [a, b]. (4.61)

Thus, since equality in (4.56) holds true if and only if w = 0, it follows from


(4.61) that
η 0 (x) = 0, for all x ∈ (a, b),
from which we get that η(x) = c for all x ∈ [a, b], where c is a constant. Thus,
since η ∈ Vo , it follows that η(a) = 0 and, therefore c = 0; so that, η(x) = 0 for
all x ∈ [a, b]. We have therefore shown that equality in (4.59) holds true if and
only if η(x) = 0 for all x ∈ [a, b]. Hence, the functional J defined in (4.54) is
strictly convex in A, where A is given in (4.55).
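The Taylor identity (4.57) that drives this example can be verified symbolically
for a concrete integrand; the sketch below checks it for the sample choice
F (x, z) = z⁴, for which Fzz = 12z² is nonnegative and vanishes only at z = 0.

    import sympy as sp

    z, w, t = sp.symbols('z w t', real=True)
    F = z**4                                  # sample F with Fzz = 12 z^2

    # Integral remainder from (4.57).
    remainder = sp.integrate((1 - t)*sp.diff(F, z, 2).subs(z, z + t*w)*w**2, (t, 0, 1))

    # F(x, z + w) - F(x, z) - Fz(x, z) w - remainder should vanish identically.
    print(sp.expand((z + w)**4 - F - sp.diff(F, z)*w - remainder))   # prints 0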

4.4 Convex Minimization Theorem


The importance of knowing that a given Gâteaux differentiable functional

J: V → R

is convex is that, once a vector u in A satisfying the condition

dJ(u; v) = 0, for all v ∈ Vo with u + v ∈ A,

is found, then we can conclude that u is a minimizer of J over the class A.


Furthermore, if we know that J is strictly convex, then we can conclude that
there exists a unique minimizer of J over A. This is the essence of the convex
minimization theorem presented in this section.
Theorem 4.4.1 (Convex Minimization Theorem). Let V denote a normed
linear space and Vo a nontrivial subspace of V . Let A be a nonempty subset
of V . Assume that a functional J : V → R is Gâteaux differentiable at every
u ∈ A in any direction v ∈ Vo such that u + v ∈ A. Assume also that J is
convex in A. Suppose there exists uo ∈ A such that u − uo ∈ Vo for all u ∈ A
and
dJ(uo ; v) = 0 for v ∈ Vo such that uo + v ∈ A.
Then,
J(uo ) 6 J(u) for all u ∈ A; (4.62)
that is, uo is a minimizer of J over A. Moreover, if J is strictly convex in A,
then J can have at most one minimizer over A.
Proof: Let uo ∈ A be such that u − uo ∈ Vo for all u ∈ A and

dJ(uo ; v) = 0 for all v ∈ Vo such that uo + v ∈ A. (4.63)

Given u ∈ A, put v = u−uo . Then, v ∈ Vo and uo +v ∈ A, since uo +v = u ∈ A.


Consequently, by virtue of (4.63),

dJ(uo ; v) = 0, where v = u − uo . (4.64)

Now, since we are assuming that J is convex in A, it follows that

J(u) = J(uo + v) > J(uo ) + dJ(uo ; v);

so that, in view of (4.64),

J(u) > J(uo ), for all u ∈ A,

which is the assertion in (4.62).


Assume further that J is strictly convex in A, and let u1 and u2 be two
minimizers of J over A such that u − u1 ∈ Vo and u − u2 ∈ Vo for all u ∈ A.
Then,
J(u1 ) = J(u2 ), (4.65)

since J(u1 ) 6 J(u2 ), given that u2 ∈ A and u1 is a minimizer of J over A; and


J(u2 ) 6 J(u1 ), given that u1 ∈ A and u2 is a minimizer of J over A.
It is also the case that

dJ(u1 ; v) = 0, for any v ∈ Vo with u1 + v ∈ A, (4.66)

by the results in Section 4.2 in these notes, since u1 is a minimizer of J over A.


Next, put v = u2 − u1 ; so that, v ∈ Vo and u1 + v = u2 ∈ A. We then have
that
J(u2 ) = J(u1 + v);
so that, in view of (4.65),
J(u1 + v) = J(u1 ),
and, using (4.66),
J(u1 + v) = J(u1 ) + dJ(u1 ; v). (4.67)
Thus, since J is strictly convex, it follows from (4.67) that v = 0 (see (4.29) in
Definition 4.3.1); that is, u2 − u1 = 0 or u1 = u2 . Hence, if J is strictly convex
in A, then J can have at most one minimizer over A. 
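To see Theorem 4.4.1 at work numerically, consider the Sturm–Liouville
functional of Example 4.3.3 with the sample choices p ≡ q ≡ 1 on [0, 1] and
boundary data y(0) = 0, y(1) = 1. The Euler–Lagrange equation is y″ = y,
whose solution y(x) = sinh(x)/sinh(1) satisfies the necessary condition of the
theorem, and the sketch below checks that perturbed competitors have larger
energy:

    import numpy as np

    # Minimize J(y) = \int_0^1 [(y')^2 + y^2] dx over y(0) = 0, y(1) = 1.
    # Euler-Lagrange equation: y'' = y, with solution y(x) = sinh(x)/sinh(1).
    x = np.linspace(0.0, 1.0, 2001)
    def trap(f):
        return np.sum((f[:-1] + f[1:])/2*np.diff(x))

    def J(y):
        yp = np.gradient(y, x)
        return trap(yp**2 + y**2)

    y_star = np.sinh(x)/np.sinh(1.0)            # solves the boundary value problem
    for eps in (0.5, 0.1, 0.01):
        eta = eps*np.sin(2*np.pi*x)             # admissible direction (0 at both ends)
        print(J(y_star) < J(y_star + eta))      # True: y_star is the unique minimizer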
Chapter 5

Optimization Problems
with Constraints

Let V be a normed linear space and Vo a nontrivial subspace of V . Let J : V → R


be a functional that is Gâteaux differentiable at every u ∈ V in the direction
of every v ∈ Vo . In many applications we would like to optimize (maximize or
minimize) J not over a class of admissible vectors A, but over a subset of V that
is defined as the level set (or sets) of another Gâteaux differentiable functional
(or functionals) K : V → R. In this chapter, we discuss how to obtain necessary
conditions for a given vector u ∈ V to be an optimizer of J over a constraint of
the form
    K⁻¹(c) = {v ∈ V | K(v) = c},   (5.1)

for some real number c. The level set K⁻¹(c) is an example of a constraint. In
some applications there can be several constraints, and sometimes constraints
might come in the form of inequalities.
An example of an optimization problem with constraint might take the form
of: Find u ∈ K −1 (c) such that
    J(u) = max_{v ∈ K⁻¹(c)} J(v).   (5.2)

In the following section, we present a classical example of an optimization prob-


lem of the type in (5.2) and (5.1).

5.1 Queen Dido’s Problem


Let ` denote a given positive number and consider all curves in the xy–plane
that are the graph of a function y ∈ C 1 ([0, b], R) such that y(x) > 0 for all
x ∈ [0, b], y(0) = 0, y(b) = 0, for some b ∈ (0, `), and the arc–length of the curve
from (0, 0) to (b, 0) is equal to `; that is,

    ∫_0^b √(1 + (y′(x))²) dx = `.   (5.3)


Figure 5.1.1: Graph of y(x) for 0 ≤ x ≤ b

Figure 5.1.1 shows one of those curves (the graph of the corresponding function
y ∈ C 1 ([0, b], R) is shown over the interval [0, b]). We will denote the class of
these curves by A. Setting Vo = Co1 ([0, b], R) and
Z b p
K(y) = 1 + (y 0 (x))2 dx, for y ∈ C 1 ([0, b], R), (5.4)
0

we can express the class A as

A = {y ∈ Vo | y(x) > 0 for x ∈ [0, b], and K(y) = `}. (5.5)

Note that K in (5.4) defines a functional K : V → R where V = C 1 ([0, b], R).


This is the arc–length functional that we have encountered previously in these
notes. The expression
K(y) = `, for y ∈ A (5.6)
in the definition of A in (5.5) defines a constraint.
We would like to find, if possible, the curve in A for which the area enclosed
by it and the positive x–axis is the largest possible; that is,
Z b
y(x) dx, for y ∈ A
0

is the largest possible among all functions y ∈ A.


Defining the functional J : V → R by
Z b
J(y) = y(x) dx, for y ∈ V, (5.7)
0

we can restate the problem as



Problem 5.1.1 (Dido’s Problem). Let J : V → R be the area functional defined


in (5.7), K : V → R the arc–length functional defined in (5.4), and A be the
class of admissible functions defined in (5.5) in terms of the constraint K(y) = `
in (5.6). If possible, find y ∈ A such that

J(y) > J(v), for all v ∈ A.

To make the dependence of the constraint in (5.6) more explicit, we may also
write the problem as: Find y ∈ A such that

J(y) > J(v), for v ∈ Vo with v(x) > 0 for x ∈ [0, b] and K(v) = `. (5.8)

In the following section we will see how we can approach this kind of problem
in a general setting.

5.2 Euler–Lagrange Multiplier Theorem


Let V denote a normed linear space with norm k · k, and let Vo be a nontrivial
subspace of V . Let J : V → R and K : V → R be functionals that are Gâteaux
differentiable at every u ∈ V in the direction of every v ∈ Vo . We would like
to obtain necessary conditions for a given vector u ∈ V to be an optimizer (a
maximizer or a minimizer) of J subject to a constraint of the form

K −1 (c) = {v ∈ V | K(v) = c}, (5.9)

where c is a given real number; it is assumed that the level set K −1 (c) in (5.9)
is nonempty. Thus, we would like to find conditions satisfied by uo ∈ V such
that
K(uo ) = c (5.10)
and

J(uo ) 6 J(v) (or J(uo ) > J(v)) for all v ∈ V such that K(v) = c. (5.11)

We also assume that the Gâteaux derivatives of J and K, dJ(u; v) and


dK(u; v), respectively, are weakly continuous in u.

Definition 5.2.1 (Weak Continuity). Let V denote a normed linear space


with norm k · k and Vo a nontrivial subspace of V . Let J : V → R be Gâteaux
differentiable at every u ∈ V in the direction of every v ∈ Vo . We say that the
Gâteaux derivative of J, dJ(u; v), is weakly continuous at uo ∈ V if and only if

lim dJ(u; v) = dJ(uo ; v), for every v ∈ Vo ; (5.12)


u→uo

that is, for each v ∈ Vo , given ε > 0, there exists δ > 0 such that

ku − uo k < δ ⇒ |dJ(u; v) − dJ(uo ; v)| < ε.



Example 5.2.2. Let

    V = { y ∈ C 1 ([0, 1], R) | ∫_0^1 (y(x))² dx < ∞ and ∫_0^1 (y′(x))² dx < ∞ },

and define a norm on V by

    ‖y‖ = √( ∫_0^1 (y(x))² dx + ∫_0^1 (y′(x))² dx ),  for all y ∈ V.   (5.13)

Define J : V → R by

    J(y) = (1/2) ∫_0^1 (y′(x))² dx,  for all y ∈ V.   (5.14)

Then, J is Gâteaux differentiable at every y ∈ V with Gâteaux derivative at


y ∈ V in the direction of v ∈ V given by
Z 1
dJ(y; v) = y 0 (x)v 0 (x) dx, for y, v ∈ V. (5.15)
0

To see that the Gâteaux derivative of J given in (5.15) is weakly continuous at


every y ∈ V , use (5.15) to compute, for u ∈ V ,
Z 1 Z 1
dJ(u; v) − dJ(y; v) = u0 (x)v 0 (x) dx − y 0 (x)v 0 (x) dx
0 0

Z 1
= [u0 (x) − y 0 (x)]v 0 (x) dx;
0

so that, taking absolute values on both sides,


Z 1
|dJ(u; v) − dJ(y; v)| 6 |u0 (x) − y 0 (x)| |v 0 (x)| dx. (5.16)
0

Then, applying the Cauchy–Schwarz inequality on the right–hand side of (5.16),

    |dJ(u; v) − dJ(y; v)| ≤ √(∫_0^1 |u′(x) − y′(x)|² dx) · √(∫_0^1 |v′(x)|² dx);

so that, using the definition of norm in V given in (5.13),


|dJ(u; v) − dJ(y; v)| 6 ku − yk · kvk. (5.17)
Thus, given any ε > 0 and assuming that v ≠ 0, we see from (5.17) that, setting
δ = ε/‖v‖,

    ‖u − y‖ < δ ⇒ |dJ(u; v) − dJ(y; v)| < ε.

We therefore conclude that

    lim_{u→y} dJ(u; v) = dJ(y; v),  for all v ∈ V.

The following theorem will be helpful in obtaining necessary conditions for


solutions of the general constrained optimization problem in (5.10) and (5.11).
It is a generalization of the Lagrange Multiplier Theorem in Euclidean space.
Theorem 5.2.3 (Euler–Lagrange Multiplier Theorem). Let V denote a normed
linear space and Vo a nontrivial subspace of V . Let J : V → R and K : V → R
be functionals that are Gâteaux differentiable at every u ∈ V in the direction
of every v ∈ Vo . Suppose there exists uo ∈ V such that

K(uo ) = c, (5.18)

for some real number c, and

J(uo ) 6 J(v) (or J(uo ) > J(v)) for all v ∈ V such that K(v) = c. (5.19)

Suppose also that the Gâteaux derivatives, dJ(u; v) and dK(u; v), of J and K,
respectively, are weakly continuous in u for all u ∈ V . Then, either

dK(uo ; v) = 0, for all v ∈ Vo , (5.20)

or there exists a real number µ such that

dJ(uo ; v) = µ dK(uo ; v), for all v ∈ Vo . (5.21)

Remark 5.2.4. A proof of a slightly more general version of Theorem 5.2.3


can be found in [Smi74, pp. 72–77].
Remark 5.2.5. The scalar µ in (5.21) is called an Euler–Lagrange multiplier.
Remark 5.2.6. In practice, when solving the constrained optimization prob-
lem in (5.18) and (5.19), we solve (if possible) the equations in (5.18), (5.20) and
(5.21), simultaneously, to find a candidate, uo , for the solution of the optimiza-
tion problem. We will get to see an instance of this approach in the following
example.
Example 5.2.7 (Queen Dido’s Problem, Revisited). For the problem intro-
duced in Section 5.1, we were given a fixed positive number `, and we defined
V = C 1 ([0, b], R), Vo = Co1 ([0, b], R), and

A = {y ∈ Vo | y(x) > 0 for x ∈ [0, b], and K(y) = `} (5.22)

where K : V → R is the arc–length functional


    K(y) = ∫_0^b √(1 + (y′(x))²) dx,  for y ∈ V.   (5.23)

We would like to maximize the area functional J : V → R given by


    J(y) = ∫_0^b y(x) dx,  for y ∈ V,   (5.24)

subject to the constraint K(y) = `.


Aiming to apply the Euler–Lagrange Multiplier Theorem, we compute the
Gâteaux derivatives of the functionals in (5.23) and (5.24) to get

    dK(y; η) = ∫_0^b y′(x)η′(x)/√(1 + (y′(x))²) dx,  for y ∈ V and η ∈ Vo ,   (5.25)

and

    dJ(y; η) = ∫_0^b η(x) dx,  for y ∈ V and η ∈ Vo .   (5.26)
Endowing the space V with the norm

    ‖y‖ = max_{0≤x≤b} |y(x)| + max_{0≤x≤b} |y′(x)|,  for all y ∈ V,

we can show that the Gâteaux derivative of K given in (5.25) is weakly contin-
uous (see Problem 4 in Assignment #6).
To see that the Gâteaux derivative of J in (5.26) is also weakly continuous,
observe that, for all u and y in V ,

dJ(u; η) − dJ(y; η) = 0, for all η ∈ Vo ,

in view of (5.26).
Thus, we can apply the Euler–Lagrange Multiplier Theorem to the optimiza-
tion problem: Find y ∈ A, where A is given in (5.22), such that

    J(y) = max_{u∈A} J(u).   (5.27)

We obtain that necessary conditions for y ∈ A to be a candidate for a solution
of the problem in (5.27) are

    ∫_0^b √(1 + (y′(x))²) dx = `;   (5.28)

    ∫_0^b y′(x)η′(x)/√(1 + (y′(x))²) dx = 0,  for all η ∈ Vo ;   (5.29)

or, there exists a multiplier µ ∈ R such that

    ∫_0^b η(x) dx = µ ∫_0^b y′(x)η′(x)/√(1 + (y′(x))²) dx,  for all η ∈ Vo ,   (5.30)
where we have used (5.18), (5.20) and (5.21) in the conclusion of Theorem 5.2.3.
We first consider the case in which (5.29) holds true. In this case, the Second
Fundamental Lemma of the Calculus of Variations (Lemma 3.2.8 on page 29 in
these notes) implies that
    y′(x)/√(1 + (y′(x))²) = C,  for all x ∈ [0, b],   (5.31)

for some constant C.


Solving the equation in (5.31) for y 0 (x) yields the differential equation

y 0 (x) = c1 , for all x ∈ [0, b], (5.32)

for some constant c1 . Solving the equation in (5.32) in turn yields

y(x) = c1 x + c2 , for all x ∈ [0, b], (5.33)

for some constants c1 and c2 . Then, using the requirement that y ∈ A, so that
y ∈ Vo , we obtain from (5.33) that

c1 = c2 = 0;

Thus,
y(x) = 0, for all x ∈ [0, b], (5.34)
is a candidate for an optimizer of J over A, provided that (5.28) holds true.
We get from (5.28) and (5.34) that
Z b p
1 + 02 dx = `,
0

from which we get that


b = `;
this shows a connection between the length of the curve, `, and the end–point,
b, of the interval [0, b] determined by the constraint in (5.28).
We have shown that the function y ∈ Co1 ([0, b], R), where b = `, given in
(5.34) is a candidate for an optimizer of J defined in (5.24) over the class A
given in (5.22). Since in this case J(y) = 0, for the function y given in (5.34),
y is actually a minimizer of J over A, and not a maximizer. Thus, we turn to
the second alternative in (5.30), which we can rewrite as
    ∫_0^b ( η(x) − µy′(x)η′(x)/√(1 + (y′(x))²) ) dx = 0,  for all η ∈ Vo .   (5.35)

Now, it follows from (5.35) and the Third Fundamental Lemma (Lemma 3.2.9
on page 30 of these notes) that y must be a solution of the differential equation

    −d/dx [ µy′(x)/√(1 + (y′(x))²) ] = 1,  for 0 < x < b.   (5.36)

We see from (5.36) that µ ≠ 0, since 1 ≠ 0. We can therefore rewrite the
equation in (5.36) as

    d/dx [ y′(x)/√(1 + (y′(x))²) ] = −1/µ,  for 0 < x < b.   (5.37)

Integrating the equation in (5.37) yields

    y′(x)/√(1 + (y′(x))²) = −x/µ + c1 ,  for 0 < x < b,   (5.38)

for some constant of integration c1 .


Next, solve the equation in (5.38) for y′(x) to obtain the differential equation

    dy/dx = ± (x − µc1 )/√(µ² − (x − µc1 )²),  for 0 < x < b.   (5.39)

The differential equation in (5.39) can be integrated to yield

    y = ∓ √(µ² − (x − µc1 )²) + c2 ,   (5.40)

for a constant of integration c2 .


Observe that the equation in (5.40) can be written as

(x − µc1 )2 + (y − c2 )2 = µ2 , (5.41)

which is the equation of a circle of radius µ (here we are taking µ > 0) and
center at (µc1 , c2 ). Thus, the graph of a y ∈ A for which the area, J(y), under
it and above the x–axis is the largest possible must be a semicircle of radius µ
and centered at (µc1 , 0); so that, c2 = 0. We then get from (5.40) that
p
y(x) = µ2 − (x − µc1 )2 , for 0 < x < b, (5.42)

where we have taken the positive solution in (5.40) to ensure that y(x) > 0
for all x ∈ [0, b], according to the definition of A in (5.22); so that, the graph
of y is the upper semicircle. We are also assuming that c1 > 0. Furthermore,

Figure 5.2.2: Graph of optimal solution y(x) for 0 ≤ x ≤ b



the condition y(0) = 0 in the definition of A in (5.22) implies from (5.42) that
c1 = 1; so that, (5.42) now reads
    y(x) = √(µ² − (x − µ)²),  for 0 < x < b;   (5.43)

thus, the graph of y is a semicircle of radius µ > 0 centered at (µ, 0); this is
pictured in Figure 5.2.2, where b = 2µ. Hence, according to the definition of A
in (5.23), we also obtain an expression for µ in terms of b:

    µ = b/2.   (5.44)
Finally, since K(y) = `, according to the definition of A in (5.23), it must also
be the case that
πµ = `, (5.45)
given that πµ is the arc–length of the semicircle of radius µ pictured in Figure
5.2.2. Combining (5.44) and (5.45) we see that

    b = 2`/π,
which gives the connection between b and ` imposed by the constraint in (5.28).
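These relations are easy to confirm numerically; the sketch below (with the
arbitrary sample choice ` = π) builds the semicircle in (5.43) and checks the
constraint K(y) = ` and the enclosed area πµ²/2:

    import numpy as np

    ell = np.pi                     # prescribed arc length (sample value)
    mu = ell/np.pi                  # radius, from (5.45)
    b = 2.0*ell/np.pi               # interval length, from (5.44)

    x = np.linspace(0.0, b, 200001)
    y = np.sqrt(np.maximum(mu**2 - (x - mu)**2, 0.0))      # the semicircle (5.43)

    arc = np.sum(np.sqrt(np.diff(x)**2 + np.diff(y)**2))   # polyline arc length
    area = np.sum((y[:-1] + y[1:])/2*np.diff(x))           # trapezoid rule

    print(arc, ell)                 # ~ ell: the constraint (5.28) holds
    print(area, np.pi*mu**2/2)      # ~ pi*mu^2/2: the area of the half disk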

Remark 5.2.8. It is important to keep in mind that the condition for an


optimizer that we obtained in Example 5.2.7 was obtained under the assumption
that an optimizer exists. This is the main assumption in the statement of the
Euler–Lagrange Multiplier Theorem. We have not proved that an optimizer for
J exists in A. What we did prove is that, if a solution y ∈ A of the optimization
problem
    J(y) = max_{v∈A} J(v),  subject to K(y) = `,

where A is as given in (5.22) and J as defined in (5.24) exists, then the graph
of y(x), for 0 6 x 6 b, must be a semicircle.

Example 5.2.9. In this example we consider a general class of problems that


can be formulated as follows: Let a and b be real numbers with a < b and define
V = C 1 ([a, b], R). The space V is a normed linear space with norm

    ‖y‖ = max_{a≤x≤b} |y(x)| + max_{a≤x≤b} |y′(x)|,  for all y ∈ V.   (5.46)

Put Vo = Co1 ([a, b], R) and define

A = {y ∈ V | y(a) = yo and y(b) = y1 }, (5.47)

for given real numbers yo and y1 .



Let F : [a, b] × R × R → R and G : [a, b] × R × R → R be continuous functions
with continuous partial derivatives

Fy (x, y, z), Fz (x, y, z), Gy (x, y, z) and Gz (x, y, z), (5.48)

for (x, y, z) ∈ [a, b] × R × R. Define functionals J : V → R and K : V → R by


Z b
J(y) = F (x, y(x), y 0 (x)) dx, for y ∈ V, (5.49)
a

and Z b
K(y) = G(x, y(x), y 0 (x)) dx, for y ∈ V, (5.50)
a
respectively.
We consider the problem of finding an optimizer y ∈ A, of J over the class
A, where A is given in (5.47) subject to the constraint

K(y) = c, (5.51)

for some given constant c.


In order to apply the Euler–Lagrange Multiplier Theorem (Theorem 5.2.3
on page 69 in these notes), we need to consider the Gâteaux derivatives of the
functionals J and K given in (5.49) and (5.50), respectively:
Z b
dJ(y, η) = [Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx, for y ∈ V and η ∈ Vo , (5.52)
a

and
Z b
dK(y, η) = [Gy (x, y, y 0 )η + Gz (x, y, y 0 )η 0 ] dx, for y ∈ V and η ∈ Vo , (5.53)
a

where we have written y for y(x), y 0 for y 0 (x), η for η(x), and η 0 for η 0 (x) in
the integrands in (5.52) and (5.53). The assumption that the partial derivatives
in (5.48) are continuous for (x, y, z) ∈ [a, b] × R × R will allow us to show that
the Gâteaux derivatives of J and K in (5.52) and (5.53), respectively, are
weakly continuous in V with respect to the norm ‖ · ‖ defined in (5.46). This
fact is proved in Appendix C starting on page 93 of these notes. Thus, we can
apply Theorem 5.2.3 to obtain that, if y ∈ A, where A is given in (5.47), is
an optimizer of J over A subject to the constraint in (5.51), then, either
Z b
[Gy (x, y, y 0 )η + Gz (x, y, y 0 )η 0 ] dx = 0, for all η ∈ Vo , (5.54)
a

or there exists a multiplier µ ∈ R such that


Z b Z b
[Fy (x, y, y 0 )η + Fz (x, y, y 0 )η 0 ] dx = µ [Gy (x, y, y 0 )η + Gz (x, y, y 0 )η 0 ] dx,
a a

for all η ∈ Vo , which we can rewrite as


Z b
[Hy (x, y, y 0 )η + Hz (x, y, y 0 )η 0 ] dx = 0, for all η ∈ Vo , (5.55)
a

where H : [a, b] × R × R → R is given by

H(x, y, z) = F (x, y, z) − µG(x, y, z),  for (x, y, z) ∈ [a, b] × R × R.   (5.56)

Thus, the conditions in (5.54), or in (5.55) and (5.56), are necessary conditions
for y ∈ A to be an optimizer of J over A subject to the constraint in (5.51).
These conditions in turn yield the following Euler–Lagrange equations by virtue
of the third fundamental lemma in the Calculus of Variations (Lemma 3.2.9):
Either
d
[Gz (x, y, y 0 )] = Gy (x, y, y 0 ), (5.57)
dx
or there exists µ ∈ R such that
d
[Hz (x, y, y 0 )] = Hy (x, y, y 0 ), (5.58)
dx
where H is given in (5.56).

In the following example we present an application of the equations in (5.57),
(5.58), and (5.56).
Example 5.2.10. For given b > 0, put V = C 1 ([0, b], R) and Vo = Co1 ([0, b], R).
Define functionals J : V → R and K : V → R by
    J(y) = ∫_0^b √(1 + (y′(x))²) dx,  for all y ∈ V,   (5.59)

and

    K(y) = ∫_0^b y(x) dx,  for all y ∈ V,   (5.60)

respectively. Let

    A = {y ∈ Vo | y(x) > 0 for 0 < x < b}.   (5.61)
We consider the following constrained optimization problem:
Problem 5.2.11. Minimize J(y) for y ∈ A subject to the constraint

K(y) = a, (5.62)

for some a > 0.


The Gâteaux derivatives of the functionals J and K defined in (5.59) and
(5.60), respectively, are

    dJ(y; η) = ∫_0^b y′(x)η′(x)/√(1 + (y′(x))²) dx,  for y ∈ V and η ∈ Vo ,   (5.63)

and

    dK(y; η) = ∫_0^b η(x) dx,  for y ∈ V and η ∈ Vo ,   (5.64)
respectively. In Example 5.2.7 we saw that the Gâteaux derivatives in (5.64) and
(5.63) are weakly continuous. Thus, the Euler–Lagrange Multiplier Theorem
applies. Hence, if y ∈ A is an optimizer for J over A subject to the constraint
in (5.62), then either y must solve the Euler–Lagrange equation in (5.57), or
there exists a multiplier µ ∈ R such that

H(x, y, z) = F (x, y, z) − µG(x, y, z),  for (x, y, z) ∈ [0, b] × R × R,

solves the Euler–Lagrange equation in (5.58), where


    F (x, y, z) = √(1 + z²),  for (x, y, z) ∈ [0, b] × R × R,

and
    G(x, y, z) = y,  for (x, y, z) ∈ [0, b] × R × R.
We then have that
    Fy (x, y, z) = 0  and  Fz (x, y, z) = z/√(1 + z²),  for (x, y, z) ∈ [0, b] × R × R,
and

    Gy (x, y, z) = 1  and  Gz (x, y, z) = 0,  for (x, y, z) ∈ [0, b] × R × R.

The differential equation in (5.57) then reads: 0 = 1, which is impossible;


hence, there must exist an Euler–Lagrange multiplier µ ∈ R such that y solves
the differential equation

    d/dx [ y′/√(1 + (y′)²) ] = µ,

which yields

    y′/√(1 + (y′)²) = µx + c1 ,   (5.65)
for some constant c1 .
We consider two cases in (5.65): either µ = 0, or µ 6= 0.
If µ = 0 in (5.65), we obtain that

y 0 = c2 ,

for some constant c2 , which yields the general solution

y(x) = c2 x + c3 ,

for another constant c3 . The assumption that y ∈ Vo (see the definition of A in


(5.61)) then implies that y(x) = 0 for all x ∈ [0, b]; however, since y also must

satisfy the constraint in (5.62), we get from the definition of K in (5.60) that
a = 0, which is impossible since we are assuming that a > 0. Hence, it must be
the case that µ 6= 0.
Thus, assume that µ 6= 0 and solve the equation in (5.65) for y 0 to obtain
the differential equation
    dy/dx = ± (µx + c1 )/√(1 − (µx + c1 )²),  for 0 < x < b,

which can be integrated to yield

    y = ∓ (1/µ)√(1 − (µx + c1 )²) + c2 ,  for 0 ≤ x ≤ b,   (5.66)
for some constant of integration c2 .
It follows from the expression in (5.66) that

    (µx + c1 )² + (µy − µc2 )² = 1,

which we can rewrite as

    (x + c1 /µ)² + (y − c2 )² = 1/µ².   (5.67)
Observe that the expression in (5.67) is the equation of a circle in the xy–plane
of radius 1/µ (here, we are taking µ > 0) and center at
 
    (−c1 /µ, c2 ).
The boundary conditions in the definition of A in (5.61) imply that
y(0) = 0 and y(b) = 0.
Using these conditions in (5.67) we obtain that
    c1 /µ = −b/2.
Thus, we can rewrite the equation in (5.67) as
    (x − b/2)² + (y − c2 )² = 1/µ²,   (5.68)
which is the equation of a circle in the xy–plane of radius 1/µ and center at
 
b
, c2 .
2
Thus, according to (5.68), a solution y ∈ A of the constrained optimization
problem in Problem 5.2.11 has its graph along an arc of the circle connecting
the point (0, 0) to the point (b, 0). The value of c2 can be determined from the
condition in (5.62). We will return to the solution of Problem 5.2.11 in the next
section, which deals with an isoperimetric problem.
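Although the notes stop short of computing c2 and µ explicitly, the following Python sketch (our own illustration, not part of the original text; it assumes NumPy and SciPy are available, and the helper name arc_parameters is ours) recovers the circle in (5.68) numerically from b and a, using the classical circular–segment area formula.

import numpy as np
from scipy.optimize import brentq

def arc_parameters(b, a):
    # A chord of length b subtending a central angle theta on a circle
    # of radius r satisfies b = 2 r sin(theta/2); the area between the
    # arc and the chord is the segment area (r^2/2)(theta - sin(theta)).
    def segment_area(theta):
        r = b / (2.0 * np.sin(theta / 2.0))
        return 0.5 * r**2 * (theta - np.sin(theta))
    # The segment area increases from 0 to infinity as theta runs over
    # (0, 2*pi), so the constraint K(y) = a pins down a unique theta.
    theta = brentq(lambda th: segment_area(th) - a, 1e-9, 2.0*np.pi - 1e-9)
    r = b / (2.0 * np.sin(theta / 2.0))
    mu = 1.0 / r                      # the multiplier in (5.68)
    c2 = -r * np.cos(theta / 2.0)     # y-coordinate of the center (b/2, c2)
    return theta, r, mu, c2

theta, r, mu, c2 = arc_parameters(b=1.0, a=0.3)
print(theta, r, mu, c2)
# Caveat: for theta > pi the arc is no longer the graph of a function
# y(x) on [0, b], so it leaves the admissible class A used above.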

5.3 An Isoperimetric Problem


In this section we discuss an example in the same class of problems as Queen
Dido’s Problem presented in Example 5.2.7.
Problem 5.3.1 (An Isoperimetric Problem). Out of all smooth, simple, closed
curves in the plane of a fixed perimeter, ℓ, find one that encloses the largest
possible area.
In what follows, we shall define several of the terms that appear in the statement
of Problem 5.3.1.
Let V = C 1 ([0, 1], R2 ); so that, the elements of V are vector–valued functions
(x, y) : [0, 1] → R2 ,
whose values are denoted by (x(t), y(t)), for t ∈ [0, 1], where the functions
x : [0, 1] → R and y : [0, 1] → R are differentiable functions of t ∈ (0, 1), with
continuous derivatives ẋ and ẏ (the dot on top of a variable name indicates
derivative with respect to t). Note that V is a linear space with sum and scalar
multiplication defined by

(x(t), y(t)) + (u(t), v(t)) = (x(t) + u(t), y(t) + v(t)),    for all t ∈ [0, 1],

for (x, y) ∈ V and (u, v) ∈ V, and

c(x(t), y(t)) = (cx(t), cy(t)),    for all t ∈ [0, 1],
for c ∈ R and (x, y) ∈ V .
We can also endow V with a norm ‖(·, ·)‖ defined by

‖(x, y)‖ = \max_{0≤t≤1} |x(t)| + \max_{0≤t≤1} |y(t)| + \max_{0≤t≤1} |ẋ(t)| + \max_{0≤t≤1} |ẏ(t)|,    (5.69)

for (x, y) ∈ V (see Problem 2 in Assignment #7).


Notation 5.3.2. We will denote an element (x, y) in V by a single
symbol σ ∈ V; so that,
σ(t) = (x(t), y(t)), for t ∈ [0, 1].
The derivative of σ : [0, 1] → R² will be denoted by

σ'(t) = (ẋ(t), ẏ(t)),    for t ∈ (0, 1),

and σ'(t) is tangent to the curve traced by σ at the point σ(t), provided that
σ'(t) ≠ (0, 0).

Definition 5.3.3 (Smooth, simple, closed curve). A plane curve parametrized
by a map σ ∈ V is said to be a smooth, simple, closed curve if

(i) σ(0) = σ(1);

(ii) the map σ : [0, 1) → R² is one–to–one; and

(iii) σ'(t) ≠ (0, 0), for all t ∈ [0, 1].    (5.70)

Remark 5.3.4. A continuous, simple, closed curve in the plane is also called a
Jordan curve. The Jordan Curve Theorem states that any continuous, simple,
closed curve in the plane separates the plane into two disjoint, connected, regions:
a bounded region (we shall refer to this region as the region enclosed by the
curve) and an unbounded region (the region outside the curve).
We denote by A the class of smooth, simple, closed curves in the plane.
According to the definition of a smooth, simple, closed curve in Definition 5.3.3,
we can identify A with the class of functions σ ∈ V such that σ : [0, 1] → R2
satisfies the conditions in Definition 5.3.3. We will also assume that the paths
in A induce a counterclockwise (or positive) orientation on the curve that σ
parametrizes. Thus, for each σ = (x, y) ∈ A, we can compute the area of the
region enclosed by σ by using the formula (B.13) derived in Appendix B.2 using
the Divergence Theorem. Denoting the area enclosed by (x, y) ∈ A by J((x, y))
we have that

J((x, y)) = \frac{1}{2} \oint_{∂Ω} (x\, dy − y\, dx),    (5.71)
where Ω is the region enclosed by the path (x, y) ∈ A. Expressing the line
integral in (5.71) in terms of the parametrization (x, y) : [0, 1] → R2 of ∂Ω, we
have that
J((x, y)) = \frac{1}{2} \int_0^1 (x(t)ẏ(t) − y(t)ẋ(t))\, dt,    for (x, y) ∈ A.    (5.72)
We note that the formula in (5.72) defines a functional J : V → R on all of
V that, when restricted to A, gives the area of the region enclosed by σ =
(x, y) ∈ A. The arc–length of any curve (x, y) ∈ V is given by

K((x, y)) = \int_0^1 \sqrt{(ẋ(t))^2 + (ẏ(t))^2}\, dt,    for (x, y) ∈ V.    (5.73)
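As a quick sanity check (our own illustration, assuming NumPy; the helper trap is a hand–rolled trapezoid rule), one can evaluate (5.72) and (5.73) numerically on a counterclockwise circle of radius R, for which the enclosed area is πR² and the arc–length is 2πR.

import numpy as np

def trap(f, t):
    # composite trapezoid rule for samples f over the grid t
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

R = 2.0
t = np.linspace(0.0, 1.0, 200001)
x, y = R*np.cos(2*np.pi*t), R*np.sin(2*np.pi*t)
xdot = -2*np.pi*R*np.sin(2*np.pi*t)
ydot = 2*np.pi*R*np.cos(2*np.pi*t)

J = 0.5 * trap(x*ydot - y*xdot, t)   # area functional (5.72)
K = trap(np.hypot(xdot, ydot), t)    # arc-length functional (5.73)
print(J, np.pi*R**2)                 # both about 12.566
print(K, 2*np.pi*R)                  # both about 12.566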

We can then restate Problem 5.3.1 as


Problem 5.3.5 (An Isoperimetric Problem, Restated). Find (x, y) ∈ A such
that

J((x, y)) = \max_{(u,v)∈A} J((u, v)),    subject to K((x, y)) = ℓ.    (5.74)

Thus, the isoperimetric problem in Problem 5.3.1 is a constrained optimiza-
tion problem, and we may attempt to use the Euler–Lagrange Multiplier
Theorem to obtain necessary conditions for a solution of the problem.
Observe that the functionals in (5.72) and (5.73) are of the form
J((x, y)) = \int_a^b F(t, x(t), y(t), ẋ(t), ẏ(t))\, dt,    for (x, y) ∈ C¹([a, b], R²),    (5.75)

and

K((x, y)) = \int_a^b G(t, x(t), y(t), ẋ(t), ẏ(t))\, dt,    for (x, y) ∈ C¹([a, b], R²),    (5.76)

where F : [a, b] × R4 → R and G : [a, b] × R4 → R are continuous functions of


the variables (t, x, y, p, q) ∈ [a, b] × R4 , that have continuous partial derivatives

Fx (t, x, y, p, q), Fy (t, x, y, p, q), Fp (t, x, y, p, q), Fq (t, x, y, p, q),

and

Gx (t, x, y, p, q), Gy (t, x, y, p, q), Gp (t, x, y, p, q), Gq (t, x, y, p, q),

for (t, x, y, p, q) ∈ [a, b] × R4 (with possible exceptions). Indeed, for the func-
tionals in (5.72) and (5.73), [a, b] = [0, 1],

F(t, x, y, p, q) = \frac{1}{2}xq − \frac{1}{2}yp,    for (t, x, y, p, q) ∈ [0, 1] × R⁴,    (5.77)

and

G(t, x, y, p, q) = \sqrt{p^2 + q^2},    for (t, x, y, p, q) ∈ [0, 1] × R⁴,    (5.78)

with p² + q² ≠ 0. We note that, for the functions F and G defined in (5.77) and
(5.78), respectively,

F_x(t, x, y, p, q) = \frac{1}{2}q,   F_y(t, x, y, p, q) = −\frac{1}{2}p,
F_p(t, x, y, p, q) = −\frac{1}{2}y,   F_q(t, x, y, p, q) = \frac{1}{2}x,    (5.79)

which are continuous functions for (t, x, y, p, q) ∈ [0, 1] × R⁴, and

G_x(t, x, y, p, q) = 0,   G_y(t, x, y, p, q) = 0,
G_p(t, x, y, p, q) = \frac{p}{\sqrt{p^2 + q^2}},   G_q(t, x, y, p, q) = \frac{q}{\sqrt{p^2 + q^2}},    (5.80)

which are continuous as long as p² + q² ≠ 0.


Using V to denote C 1 ([a, b], R2 ), momentarily, Vo to denote Co1 ([a, b], R2 ),
and A to denote

{(x, y) ∈ V | (x(a), y(a)) = (xo , yo ) and (x(b), y(b)) = (x1 , y1 )},

for given (xo , yo ) ∈ R2 and (x1 , y1 ) ∈ R2 , we can use the assumptions that
the partial derivatives Fx , Fy , Fp , Fq , Gx , Gy , Gp and Gq are continuous to
show that the functionals J : V → R and K : V → R defined in (5.75) and
(5.76), respectively, are Gâteaux differentiable at (x, y) ∈ V in the direction of
(η1 , η2 ) ∈ Vo , with Gâteaux derivatives given by
dJ((x, y); (η1, η2)) = \int_a^b [F_x η_1 + F_y η_2 + F_p η̇_1 + F_q η̇_2]\, dt,    (5.81)

for all (x, y) ∈ V and (η1 , η2 ) ∈ Vo , where we have written

Fx for Fx(t, x(t), y(t), ẋ(t), ẏ(t)), for t ∈ [a, b];

Fy for Fy(t, x(t), y(t), ẋ(t), ẏ(t)), for t ∈ [a, b];

Fp for Fp(t, x(t), y(t), ẋ(t), ẏ(t)), for t ∈ [a, b];

Fq for Fq(t, x(t), y(t), ẋ(t), ẏ(t)), for t ∈ [a, b];
and η1 , η̇1 , η2 and η̇2 for η1 (t), η̇1 (t), η2 (t) and η̇2 (t), for t ∈ [a, b], respectively,
(see Problem 3 in Assignment #7); similarly, we have that
dK((x, y); (η1, η2)) = \int_a^b [G_x η_1 + G_y η_2 + G_p η̇_1 + G_q η̇_2]\, dt,    (5.82)

for all (x, y) ∈ V and (η1, η2) ∈ Vo.
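One way to gain confidence in the formula (5.81) is to compare it numerically with a difference quotient. The sketch below (our own illustration, assuming NumPy) does this for the area functional (5.72), whose integrand F is given in (5.77), with partial derivatives as in (5.79).

import numpy as np

def trap(f, t):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

t = np.linspace(0.0, 1.0, 100001)
x, y = np.cos(2*np.pi*t), np.sin(2*np.pi*t)                  # base curve
xd, yd = -2*np.pi*np.sin(2*np.pi*t), 2*np.pi*np.cos(2*np.pi*t)
e1, e2 = np.sin(2*np.pi*t)**2, np.sin(4*np.pi*t)             # (eta1, eta2), vanishing at t = 0, 1
e1d, e2d = 2*np.pi*np.sin(4*np.pi*t), 4*np.pi*np.cos(4*np.pi*t)

def J(x, y, xd, yd):
    # area functional (5.72)
    return 0.5 * trap(x*yd - y*xd, t)

# Gateaux derivative (5.81), using the partial derivatives in (5.79)
dJ = trap(0.5*yd*e1 - 0.5*xd*e2 - 0.5*y*e1d + 0.5*x*e2d, t)
eps = 1e-6
dq = (J(x + eps*e1, y + eps*e2, xd + eps*e1d, yd + eps*e2d) - J(x, y, xd, yd)) / eps
print(dJ, dq)   # the two values agree to roughly 1e-6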

The isoperimetric problem in Problem 5.3.5 is then a special case of the opti-
mization problem:

optimize J((x, y)) over A subject to K((x, y)) = c, (5.83)

for some constant c, where J and K are given in (5.75) and (5.76), respectively.
We can use the Euler–Lagrange Multiplier Theorem (Theorem 5.2.3 on page 69
in these notes) to obtain necessary conditions for the solvability of the variational
problem in (5.83), provided we can show that the Gâteaux derivatives of J and
K in (5.81) and (5.82), respectively, are weakly continuous; this can be shown
using the arguments in Appendix C. We therefore obtain that, if (x, y) ∈ A is
an optimizer of J over A subject to the constraint K((x, y)) = c, then either
\int_a^b [G_x η_1 + G_y η_2 + G_p η̇_1 + G_q η̇_2]\, dt = 0,    for all (η1, η2) ∈ Vo,    (5.84)

or there exists a multiplier µ ∈ R such that


\int_a^b [F_x η_1 + F_y η_2 + F_p η̇_1 + F_q η̇_2]\, dt = µ \int_a^b [G_x η_1 + G_y η_2 + G_p η̇_1 + G_q η̇_2]\, dt,

for all (η1 , η2 ) ∈ Vo , or, setting

H(t, x, y, p, q) = F (t, x, y, p, q) − µG(t, x, y, p, q), (5.85)

for (t, x, y, p, q) ∈ [a, b] × R4 ,


\int_a^b [H_x η_1 + H_y η_2 + H_p η̇_1 + H_q η̇_2]\, dt = 0,    for all (η1, η2) ∈ Vo.    (5.86)

Now, taking η2 (t) = 0 for all t ∈ [a, b] we obtain from (5.84) that
\int_a^b [G_x η_1 + G_p η̇_1]\, dt = 0,    for all η1 ∈ C¹o[a, b].    (5.87)

It follows from (5.87) and the third fundamental lemma in the Calculus of Vari-
ations (Lemma 3.2.9 on page 30 in these notes) that Gp (t, x(t), y(t), ẋ(t), ẏ(t))
is a differentiable function of t with derivative
\frac{d}{dt}[G_p(t, x, y, ẋ, ẏ)] = G_x(t, x, y, ẋ, ẏ),    for t ∈ (a, b).
Similarly, taking η1 (t) = 0 for all t ∈ [a, b] in (5.84) and applying the third
fundamental lemma of the Calculus of Variations, we obtain the differential
equation
\frac{d}{dt}[G_q(t, x, y, ẋ, ẏ)] = G_y(t, x, y, ẋ, ẏ).
We have therefore shown that the condition in (5.84) implies that (x, y) ∈ A
must solve the system of differential equations

\frac{d}{dt}[G_p(t, x, y, ẋ, ẏ)] = G_x(t, x, y, ẋ, ẏ);

\frac{d}{dt}[G_q(t, x, y, ẋ, ẏ)] = G_y(t, x, y, ẋ, ẏ).    (5.88)
Similarly, we obtain from (5.86) the system of differential equations

\frac{d}{dt}[H_p(t, x, y, ẋ, ẏ)] = H_x(t, x, y, ẋ, ẏ);

\frac{d}{dt}[H_q(t, x, y, ẋ, ẏ)] = H_y(t, x, y, ẋ, ẏ),    (5.89)
where H = F − µG is given in (5.85).
Hence, if (x, y) is a solution of the constrained optimization problem in
(5.83), where J : V → R and K : V → R are as given in (5.75) and (5.76),
respectively, then, either (x, y) solves the system of Euler–Lagrange equations
in (5.88), or there exists an Euler–Lagrange Multiplier µ ∈ R such that (x, y)
solves the system of Euler–Lagrange equations in (5.89), where H is as given
in (5.85). We next apply this reasoning to the isoperimetric problem stated in
Problem 5.3.5.
In the case of the constrained optimization problem in Problem 5.3.5, F and
G are given by (5.77) and (5.78), respectively, and their partial derivatives are
given in (5.79) and (5.80), respectively. Thus, if (x, y) is a simple, closed curve
of arc–length ℓ that encloses the largest possible area, then either (x, y) solves
the system

\frac{d}{dt}\left[\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0;

\frac{d}{dt}\left[\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0,    (5.90)

or there exists a multiplier, µ, such that (x, y) solves the system of differential
equations

\frac{d}{dt}\left[−\frac{1}{2}y − µ\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}}\right] = \frac{1}{2}ẏ;

\frac{d}{dt}\left[\frac{1}{2}x − µ\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}\right] = −\frac{1}{2}ẋ.    (5.91)
Let (x, y) ∈ A, where A is the class of smooth, simple, closed curves in the
plane, be a solution of the isoperimetric problem in (5.74). Suppose also
that (x, y) solves the system of Euler–Lagrange equations in (5.90). We then
have that

\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}} = c_1;

\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}} = d_1,    (5.92)
for constants c1 and d1 . Note that, in view of (5.70) in Definition 5.3.3, which
is part of the definition of A, the denominators in the expressions in (5.92) are
not zero. Also, it follows from (5.92) that

c_1^2 + d_1^2 = 1;

consequently, c1 and d1 cannot both be zero.


Assume that c1 = 0. In this case, it follows from the first equation in (5.92)
that ẋ = 0; from which we get that x(t) = c2 for all t ∈ [0, 1], which implies
that the simple, closed curve (x(t), y(t)), for t ∈ [0, 1], lies on the line x = c2;
this is impossible, since a simple, closed curve cannot be contained in a straight line.
Suppose next that c1 ≠ 0 in (5.92) and divide the second expression in (5.92)
by the first to obtain

\frac{ẏ}{ẋ} = \frac{d_1}{c_1};

so that, by virtue of the Chain Rule,

\frac{dy}{dx} = c_3,
for some constant c3 . We therefore get that

y = c3 x + c4 ,

for constants c3 and c4; so that the simple, closed curve (x(t), y(t)), for t ∈ [0, 1], again
lies on a straight line, which is impossible.
Thus, the second alternative in the Euler–Lagrange Multiplier Theorem must
hold true for a solution (x, y) ∈ A of the constrained optimization problem in
(5.74). Therefore, there exists a multiplier µ ∈ R for which the system of
Euler–Lagrange equations in (5.91) holds true.

Rewrite the equations in (5.91) as

−\frac{1}{2}ẏ − µ\frac{d}{dt}\left[\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}}\right] = \frac{1}{2}ẏ;

\frac{1}{2}ẋ − µ\frac{d}{dt}\left[\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}\right] = −\frac{1}{2}ẋ,

which can in turn be written as

ẏ + µ\frac{d}{dt}\left[\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0;

ẋ − µ\frac{d}{dt}\left[\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0,

or

\frac{d}{dt}\left[y + µ\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0;

\frac{d}{dt}\left[x − µ\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}\right] = 0.    (5.93)
The equations in (5.93) can be integrated to yield

y + µ\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}} = c_2;

x − µ\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}} = c_1,

for constants c1 and c2, from which we get that

y − c_2 = −µ\frac{ẋ}{\sqrt{ẋ^2 + ẏ^2}};

x − c_1 = µ\frac{ẏ}{\sqrt{ẋ^2 + ẏ^2}}.    (5.94)
It follows from the equations in (5.94) that

(x − c_1)^2 + (y − c_2)^2 = µ^2,    (5.95)

which is the equation of a circle of radius µ (we are taking µ > 0) and center at
(c1 , c2 ).
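As a quick check (our own illustration, assuming NumPy), a counterclockwise circle of radius µ centered at (c1, c2) satisfies the integrated equations in (5.94) exactly: the two combinations below are constant along the curve, up to rounding.

import numpy as np

mu, c1, c2 = 1.5, 0.3, -0.7
t = np.linspace(0.0, 1.0, 1001)
x, y = c1 + mu*np.cos(2*np.pi*t), c2 + mu*np.sin(2*np.pi*t)
xd = -2*np.pi*mu*np.sin(2*np.pi*t)
yd = 2*np.pi*mu*np.cos(2*np.pi*t)
speed = np.hypot(xd, yd)

print(np.ptp(y + mu*xd/speed))   # ~0: y + mu*xd/|sigma'| equals c2, as in (5.94)
print(np.ptp(x - mu*yd/speed))   # ~0: x - mu*yd/|sigma'| equals c1, as in (5.94)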
We have therefore shown that if (x, y) ∈ A is a solution of the isoperimetric
problem in (5.74), then the simple, closed curve (x(t), y(t)), for t ∈ [0, 1], must

lie on some circle of radius µ (see equation (5.95)). Since K((x, y)) = ℓ, it
follows that

2πµ = ℓ,

from which we get that

µ = \frac{ℓ}{2π}.
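As a plausibility check on this conclusion (our own illustration, assuming NumPy), one can compare the areas enclosed by a few closed curves of the same perimeter ℓ; the circle of radius ℓ/(2π) encloses the most.

import numpy as np

ell = 4.0
circle = ell**2 / (4.0*np.pi)      # area of the circle of radius ell/(2*pi)
square = (ell/4.0)**2              # square with side ell/4
rectangle = (ell/6.0) * (ell/3.0)  # 2:1 rectangle with perimeter ell
print(circle, square, rectangle)   # about 1.273 > 1.0 > 0.889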

Appendix A

Some Inequalities

A.1 The Cauchy–Schwarz Inequality


Theorem A.1.1 (The Cauchy–Schwarz Inequality). Let f and g be continuous
functions on [a, b]. Then
\int_a^b f(x)g(x)\, dx ≤ \sqrt{\int_a^b |f(x)|^2\, dx}\; \sqrt{\int_a^b |g(x)|^2\, dx}.

In terms of the L² norm, ‖·‖₂, this inequality can be written as

\int_a^b f(x)g(x)\, dx ≤ ‖f‖_2 ‖g‖_2.
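The inequality is easy to observe numerically. The following sketch (our own illustration, assuming NumPy; trap is a hand–rolled trapezoid rule) tests Theorem A.1.1 with f(x) = x and g(x) = sin(πx) on [0, 1].

import numpy as np

def trap(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(0.0, 1.0, 100001)
f, g = x, np.sin(np.pi*x)
lhs = trap(f*g, x)                                     # about 1/pi = 0.3183
rhs = np.sqrt(trap(f**2, x)) * np.sqrt(trap(g**2, x))  # about 0.4082
print(lhs, rhs, lhs <= rhs)                            # the inequality holds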

Appendix B

Theorems About
Integration

B.1 Differentiating Under the Integral Sign


Solutions of problems in the Calculus of Variations often require the differen-
tiation of functions defined in terms of integrals of other functions. In many
instances this involves differentiation under the integral sign. In this appendix
we present a few results that specify conditions under which differentiation under
the integral sign is valid.

Proposition B.1.1 (Differentiation Under the Integral Sign). Suppose that


H : [a, b] × R → R is a C¹ function. Define h : R → R by

h(t) = \int_a^b H(x, t)\, dx,    for all t ∈ R.

Assume that the functions H and \frac{∂H}{∂t} are absolutely integrable over [a, b].
Then, h is C¹ and its derivative is given by

h'(t) = \int_a^b \frac{∂}{∂t}[H(x, t)]\, dx.
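Proposition B.1.1 can be checked numerically. The sketch below (our own illustration, assuming NumPy) takes H(x, t) = exp(−tx²) on [a, b] = [0, 1] and compares a centered difference quotient for h'(t) with the integral of ∂H/∂t.

import numpy as np

def trap(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(0.0, 1.0, 100001)
def h(t):
    return trap(np.exp(-t*x**2), x)

t0, dt = 0.7, 1e-5
lhs = (h(t0 + dt) - h(t0 - dt)) / (2.0*dt)    # difference quotient for h'(t0)
rhs = trap(-x**2 * np.exp(-t0*x**2), x)       # differentiate under the integral
print(lhs, rhs)                               # agree to many decimal places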

Proposition B.1.2 (Differentiation Under the Integral Sign and Fundamental


Theorem of Calculus). Suppose that H : [a, b] × R × R → R is a C¹ function.
Define

h(y, t) = \int_a^t H(x, y, t)\, dx,    for all y ∈ R, t ∈ R.

Assume that the functions H, \frac{∂}{∂y}[H(x, y, t)] and \frac{∂}{∂t}[H(x, y, t)] are absolutely

integrable over [a, b]. Then, h is C¹ and its partial derivatives are given by

\frac{∂}{∂y}[h(y, t)] = \int_a^t \frac{∂}{∂y}[H(x, y, t)]\, dx

and

\frac{∂}{∂t}[h(y, t)] = H(t, y, t) + \int_a^t \frac{∂}{∂t}[H(x, y, t)]\, dx.

Proposition B.1.2 can be viewed as a generalization of the Fundamental


Theorem of Calculus and is a special case of Leibniz's Rule.
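Proposition B.1.2 can be checked numerically in the same spirit (our own illustration, assuming NumPy), here with a = 0 and H(x, y, t) = y sin(xt).

import numpy as np

def trap(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

def h(y, t, n=200001):
    # h(y, t) = integral from 0 to t of H(x, y, t) dx
    x = np.linspace(0.0, t, n)
    return trap(y*np.sin(x*t), x)

y0, t0, dt = 2.0, 1.3, 1e-5
lhs = (h(y0, t0 + dt) - h(y0, t0 - dt)) / (2.0*dt)   # difference quotient in t
x = np.linspace(0.0, t0, 200001)
rhs = y0*np.sin(t0*t0) + trap(y0*x*np.cos(x*t0), x)  # H(t,y,t) + integral of dH/dt
print(lhs, rhs)                                      # agree to several decimals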

B.2 The Divergence Theorem


We begin by stating the two–dimensional version of the divergence theorem. We
then present some consequences of the result.
Let U denote an open subset of R2 and Ω a subset of U such that Ω ⊂ U .
We assume that Ω is bounded with boundary, ∂Ω, that can be parametrized by
σ : [0, 1] → R², where σ(t) = (x(t), y(t)), for t ∈ [0, 1], with x, y ∈ C¹([0, 1], R)
satisfying
(ẋ(t))² + (ẏ(t))² ≠ 0,    for all t ∈ [0, 1],    (B.1)
(where the dot on top of the variable indicates derivative with respect to t), and
σ(0) = σ(1). Implicit in the definition of a parametrization is the assumption
that the map σ : [0, 1) → R2 is one–to–one on [0, 1). Thus, ∂Ω is a simple closed
curve in U . Observe that the assumption in (B.1) implies that at every point
σ(t) ∈ ∂Ω there is a well–defined tangent vector

σ'(t) = (ẋ(t), ẏ(t)),    for t ∈ [0, 1].    (B.2)




Let F : U → R2 denote a C 1 vector field in U ; so that,


F (x, y) = (P (x, y), Q(x, y)), for (x, y) ∈ U, (B.3)

where P : U → R and Q : U → R are C 1 , real–valued functions defined on U .




The divergence of the vector field F ∈ C¹(U, R²) given in (B.3) is a scalar
field div F : U → R defined by

div F(x, y) = \frac{∂P}{∂x}(x, y) + \frac{∂Q}{∂y}(x, y),    for (x, y) ∈ U.    (B.4)
Example B.2.1. Imagine a two–dimensional fluid moving through a region U
in the xy–plane. Suppose the velocity of the fluid at a point (x, y) ∈ R2 is given


by a C 1 vector field V : U → R2 in units of distance per time. Suppose that we
also know the density of the fluid, ρ(x, y) at any point (x, y) ∈ U (in units of
mass per area), and that ρ : U → R is a C 1 scalar field. Define

F(x, y) = ρ(x, y)V(x, y),    for (x, y) ∈ U.    (B.5)


Then F has units of mass per unit length, per unit time. The vector field F in
(B.5) is called the flow field and it measures the amount of fluid per unit time
that goes through a cross section of unit length perpendicular to the direction of


V . Thus, to get a measure of the amount of fluid per unit time that crosses the
boundary ∂Ω in direction away from the region Ω, we compute the line integral


\oint_{∂Ω} F · n̂\, ds,    (B.6)

where ds is the element of arc–length along ∂Ω, and n̂ is a unit vector that is
perpendicular to the curve ∂Ω and points away from Ω. The expression in (B.6)


is called the flux of the flow field F across ∂Ω and it measures the amount of
fluid per unit time that crosses the boundary ∂Ω.

On the other hand, the divergence, div F, of the flow field F in (B.5) has
units of mass/(time × length²), and it measures the amount of fluid that diverges
from a point per unit time per unit area. Thus, the integral


\iint_Ω div F\, dxdy    (B.7)

gives the total amount of fluid leaving the region Ω per unit time. In the case
where there are no sinks or sources of fluid inside the region Ω, the integrals in (B.6)
and (B.7) must be equal; so that,

\iint_Ω div F\, dxdy = \oint_{∂Ω} F · n̂\, ds.    (B.8)

The expression in (B.8) is the Divergence Theorem.


Theorem B.2.2 (The Divergence Theorem in R2 ). Let U be an open subset
of R2 and Ω an open subset of U such that Ω ⊂ U . Suppose that Ω is bounded
with boundary ∂Ω. Assume that ∂Ω is a piece–wise C 1 , simple, closed curve.


Let F ∈ C 1 (U, R2 ). Then,

\iint_Ω div F\, dxdy = \oint_{∂Ω} F · n̂\, ds,    (B.9)

where n̂ is the outward unit normal vector to ∂Ω, which exists everywhere on
∂Ω, except possibly at finitely many points.

For the special case in which ∂Ω is parametrized by σ ∈ C¹([0, 1], R²) satisfy-
ing (B.1), σ(0) = σ(1), the map σ : [0, 1) → R² is one–to–one, and σ is oriented
in the counterclockwise sense, the outward unit normal to ∂Ω is given by
n̂(σ(t)) = \frac{1}{|σ'(t)|}(ẏ(t), −ẋ(t)),    for t ∈ [0, 1].    (B.10)
Note that the vector n̂ in (B.10) is a unit vector that is perpendicular to the
vector σ'(t) in (B.2), which is tangent to the curve at σ(t). It follows from (B.10)



that, for the C¹ vector field F given in (B.3), the line integral on the right–hand
side of (B.9) can be written as

\oint_{∂Ω} F · n̂\, ds = \int_0^1 (P(σ(t)), Q(σ(t))) · (ẏ(t), −ẋ(t)) \frac{1}{|σ'(t)|}\, |σ'(t)|\, dt,

or

\oint_{∂Ω} F · n̂\, ds = \int_0^1 [P(σ(t))ẏ(t) − Q(σ(t))ẋ(t)]\, dt,

which we can write, using differentials, as

\oint_{∂Ω} F · n̂\, ds = \oint_{∂Ω} (P\, dy − Q\, dx).    (B.11)



Thus, using the definition of the divergence of F in (B.4) and (B.11), we can
rewrite (B.9) as
\iint_Ω \left(\frac{∂P}{∂x} + \frac{∂Q}{∂y}\right) dxdy = \oint_{∂Ω} (P\, dy − Q\, dx),    (B.12)

which is another form of the Divergence Theorem in (B.9).




Applying the Divergence Theorem (B.9) to the vector field F = (Q, −P ),
where P, Q ∈ C 1 (U, R) yields from (B.12) that
\iint_Ω \left(\frac{∂Q}{∂x} − \frac{∂P}{∂y}\right) dxdy = \oint_{∂Ω} (P\, dx + Q\, dy),

which is Green’s Theorem.


As an application of the Divergence Theorem as stated in (B.12), consider
the case of the vector field (P, Q) = (x, y) for all (x, y) ∈ R2 . In this case (B.12)
yields

\iint_Ω 2\, dxdy = \oint_{∂Ω} (x\, dy − y\, dx),

or

2\, area(Ω) = \oint_{∂Ω} (x\, dy − y\, dx),

from which we get the formula

area(Ω) = \frac{1}{2} \oint_{∂Ω} (x\, dy − y\, dx),    (B.13)

for the area of the region Ω enclosed by a simple closed curve ∂Ω.
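The formula (B.13) is easy to test numerically (our own illustration, assuming NumPy) on a counterclockwise ellipse with semi–axes A and B, whose enclosed area is πAB.

import numpy as np

def trap(f, t):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(t)))

A, B = 3.0, 1.0
t = np.linspace(0.0, 1.0, 200001)
x, y = A*np.cos(2*np.pi*t), B*np.sin(2*np.pi*t)
xd, yd = -2*np.pi*A*np.sin(2*np.pi*t), 2*np.pi*B*np.cos(2*np.pi*t)
area = 0.5 * trap(x*yd - y*xd, t)    # the line integral in (B.13)
print(area, np.pi*A*B)               # both about 9.4248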
Appendix C

Continuity of Functionals

In many of the examples presented in these notes, we consider functionals de-


fined on the vector space V = C 1 ([a, b], R) of the form
J(y) = \int_a^b F(x, y(x), y'(x))\, dx,    for y ∈ V,

where F : [a, b]×R×R → R is a continuous function. In this appendix we discuss


continuity properties of this type of functional with respect to some norm
defined on V.

C.1 Definition of Continuity


In general, let V denote a normed linear space with norm ‖·‖. We say that
a functional J : V → R is continuous at uo ∈ V if and only if, for every ε > 0
there exists δ > 0 such that

‖u − u_o‖ < δ ⇒ |J(u) − J(u_o)| < ε.

If J is continuous at every uo ∈ V , we say that J is continuous in V .


Example C.1.1. Let V = C([a, b], R) be endowed with the norm

‖y‖ = \max_{a ≤ x ≤ b} |y(x)|,    for every y ∈ V.    (C.1)

Let g : [a, b] × R → R be a continuous function and define J : V → R by


J(y) = \int_a^b g(x, y(x))\, dx,    for all y ∈ V.    (C.2)

We will show that J is continuous in V .


First, we consider the special case in which

g(x, 0) = 0, for all x ∈ [a, b], (C.3)


and show that the functional J defined in (C.2) is continuous at yo , where


yo (x) = 0 for all x ∈ [a, b]; that is, in view of (C.3), (C.2) and the definition of
continuity given at the start of this section, we show that for every ε > 0, there
exists δ > 0 such that
v ∈ V and ‖v‖ < δ ⇒ |J(v)| < ε.    (C.4)
We first show that, if g : [a, b] × R → R is continuous and (C.3) holds true,
then for every η > 0 there exists δ > 0 such that
|s| < δ ⇒ |g(x, s)| < η. (C.5)
To establish this claim, we argue by contradiction. Assume to the contrary
that there exists ηo > 0 such that, for every n ∈ N, there exists sn ∈ R and
xn ∈ [a, b] such that
|s_n| < \frac{1}{n}   and   |g(x_n, s_n)| ≥ η_o,    for all n.    (C.6)
Now, since [a, b] is compact, we may assume (after passing to a subsequence, if
necessary) that there exists xo ∈ [a, b] such that
xn → xo as n → ∞. (C.7)
It follows from (C.6), (C.7) and the assumption that g is continuous that
|g(x_o, 0)| ≥ η_o.    (C.8)
However, (C.8) is in direct contradiction with (C.3), since ηo > 0. We have
therefore shown that, if g : [a, b] × R → R is continuous and (C.3) holds true,
then, for every η > 0, there exists δ > 0 such that (C.5) is true.
Let ε > 0 be given and put
η = \frac{ε}{b − a}.    (C.9)
By what we have just proved, there exists δ > 0 for which (C.5) holds true. Let
v ∈ V be such that ‖v‖ < δ; then, by the definition of the norm ‖·‖ in (C.1),
|v(x)| < δ, for all x ∈ [a, b]. (C.10)
We then get from (C.5) and (C.10) that
|g(x, v(x))| < η, for all x ∈ [a, b]. (C.11)
Consequently, integrating on both sides of the estimate in (C.11) from a to b,
\int_a^b |g(x, v(x))|\, dx < η(b − a).    (C.12)

In view of (C.12) and (C.9), we see that we have shown that


v ∈ V and ‖v‖ < δ ⇒ \int_a^b |g(x, v(x))|\, dx < ε,

from which (C.4) follows.
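The estimate just established can also be observed numerically (our own illustration, assuming NumPy), for instance with g(x, s) = sin(xs), which satisfies (C.3): as ‖v‖ shrinks, so does |J(v)|.

import numpy as np

def trap(f, x):
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))

x = np.linspace(0.0, 1.0, 10001)
for delta in [1.0, 0.1, 0.01, 0.001]:
    v = delta * np.sin(2*np.pi*x)            # ||v|| = delta in the norm (C.1)
    print(delta, abs(trap(np.sin(x*v), x)))  # |J(v)| shrinks with delta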


