Math 188 Fall 2017 Notes
Optimization
Preliminary Lecture Notes
Adolfo J. Rumbos
© Draft date November 14, 2017
Contents

1 Preface

2 Variational Problems
2.1 Minimal Surfaces
2.2 Linearized Minimal Surface Equation
2.3 Vibrating String

3 Indirect Methods
3.1 Geodesics in the plane
3.2 Fundamental Lemmas
3.3 The Euler–Lagrange Equations

4 Convex Minimization
4.1 Gâteaux Differentiability
4.2 A Minimization Problem
4.3 Convex Functionals
4.4 Convex Minimization Theorem

A Some Inequalities
A.1 The Cauchy–Schwarz Inequality

C Continuity of Functionals
C.1 Definition of Continuity
Chapter 1
Preface
Chapter 2

Variational Problems

2.1 Minimal Surfaces
dimensional space. Specifically, the shape of the soap film spanning the wire loop can be modeled by the graph of a smooth function, u : Ω → R, defined on the closure of a bounded region, Ω, in the xy–plane with smooth boundary ∂Ω.
The physical explanation for the shape of the soap film relies on the variational principle that states that, at equilibrium, the configuration of the film must be such that the energy associated with the surface tension in the film is the lowest possible. Since the energy associated with surface tension in the film is
proportional to the area of the surface, it follows from the least–energy principle
that a soap film must minimize the area; in other words, the soap film spanning
the wire loop must have the shape of a smooth surface in space containing
the wire loop with the property that it has the smallest possible area among
all smooth surfaces that span the wire loop. In this section we will develop a
mathematical formulation of this variational problem.
The surface spanned by the wire loop can be modeled by the image of the map

Φ : Ω → R³

given by

Φ(x, y) = (x, y, u(x, y)),  for all (x, y) ∈ Ω,  (2.1)

where Ω = Ω ∪ ∂Ω is the closure of Ω, and

u : Ω → R

is a function satisfying

u ∈ C²(Ω) ∩ C(Ω).
We will also require that u = g on ∂Ω, where g is a given continuous function on ∂Ω describing the wire loop; that is,

Ag = {u ∈ C²(Ω) ∩ C(Ω) | u = g on ∂Ω}.  (2.2)
Next, we see how to compute the area of the surface Su = Φ(Ω), where Φ is
the map given in (2.1) for u ∈ Ag , where Ag is the class of functions defined in
(2.2).
The grid lines x = c and y = d, for arbitrary constants c and d, are mapped
by the parametrization Φ into curves in the surface Su given by
y ↦ Φ(c, y)
and
x ↦ Φ(x, d),
respectively. The tangent vectors to these paths are given by
Φy = (0, 1, ∂u/∂y)  (2.3)

and

Φx = (1, 0, ∂u/∂x),  (2.4)
respectively. The quantity
‖Φx × Φy‖ ∆x∆y  (2.5)
gives an approximation to the area of the portion of the surface Su that results from mapping the rectangle [x, x + ∆x] × [y, y + ∆y] in the region Ω to the surface Su by means of the parametrization Φ given in (2.1). Adding up all the contributions in (2.5), while refining the grid, yields the following formula for the area of Su:

area(Su) = ∬_Ω ‖Φx × Φy‖ dx dy.  (2.6)
Using the definitions of the tangent vectors Φx and Φy in (2.3) and (2.4), respectively, we obtain that

Φx × Φy = (−∂u/∂x, −∂u/∂y, 1),

so that

‖Φx × Φy‖ = √(1 + (∂u/∂x)² + (∂u/∂y)²),

or

‖Φx × Φy‖ = √(1 + |∇u|²),

where |∇u| denotes the Euclidean norm of ∇u. We can therefore write (2.6) as

area(Su) = ∬_Ω √(1 + |∇u|²) dx dy.  (2.7)
Next, define

A : Ag → R

by

A(u) = ∬_Ω √(1 + |∇u|²) dx dy,  for all u ∈ Ag,  (2.8)
which gives the area of the surface parametrized by the map Φ : Ω → R3 given
in (2.1) for u ∈ Ag . We will refer to the map A : Ag → R defined in (2.8) as the
area functional. With the new notation we can restate the variational problem
of this section as follows:
Problem 2.1.1 (Variational Problem 1). Out of all functions in Ag , find one
such that
A(u) ≤ A(v),  for all v ∈ Ag.  (2.9)
That is, find a function in Ag that minimizes the area functional in the class
Ag .
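Problem 2.1.1 can be explored numerically by discretizing the area functional. The following Python sketch is not part of the notes; the unit-square region, the grid size, and the two test surfaces are illustrative assumptions:

```python
import numpy as np

# Numerical sketch (not from the notes): approximate the area functional (2.8),
# A(u) = double integral over Ω of sqrt(1 + |∇u|²), on the unit square.
def area_functional(u, dx, dy):
    uy, ux = np.gradient(u, dy, dx)          # axis 0 is y, axis 1 is x
    integrand = np.sqrt(1.0 + ux**2 + uy**2)
    # two-dimensional trapezoid rule (half weights on the boundary)
    wx = np.ones(integrand.shape[1])
    wx[0] = wx[-1] = 0.5
    wy = np.ones(integrand.shape[0])
    wy[0] = wy[-1] = 0.5
    return dx * dy * (wy @ integrand @ wx)

n = 201
x = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(x, x)
h = x[1] - x[0]

flat = np.zeros_like(X)    # u ≡ 0: the film is flat, so A(u) = area(Ω) = 1
tilted = 3.0 * X           # u = 3x: a tilted plane with A(u) = sqrt(1 + 9)

print(area_functional(flat, h, h))     # ≈ 1.0
print(area_functional(tilted, h, h))   # ≈ 3.1622776 = sqrt(10)
```

For these two planes the gradients are constant, so the discretization recovers the exact areas; for curved surfaces the grid spacing controls the accuracy.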
Problem 2.1.1 is an instance of what has been known as Plateau’s problem
in the Calculus of Variations. The mathematical question surrounding Plateau's problem was first formulated by Euler and Lagrange around 1760. In the middle of the 19th century, the Belgian physicist Joseph Plateau conducted experiments
with soap films that led him to the conjecture that soap films that form around
wire loops are of minimal surface area. It was not until 1931 that the American
mathematician Jesse Douglas and the Hungarian mathematician Tibor Radó,
independently, came up with the first mathematical proofs for the existence
of minimal surfaces. In this section we will derive a necessary condition for
the existence of a solution to Problem 2.1.1, which is expressed in terms of
a partial differential equation (PDE) that u ∈ Ag must satisfy, the minimal
surface equation.
Suppose we have found a solution, u ∈ Ag, of Problem 2.1.1. Let ϕ : Ω → R be a C∞ function with compact support in Ω; we write ϕ ∈ Cc∞(Ω) (we show a construction of such a function in the Appendix). Note that u + tϕ ∈ Ag for all t ∈ R, since ϕ vanishes on ∂Ω. We can then define a real valued function f : R → R by

f(t) = A(u + tϕ),  for all t ∈ R,  (2.12)

and it follows that f has a minimum at 0, by virtue of (2.9) and (2.12). It follows from this observation that, if f is differentiable at 0, then

f′(0) = 0.  (2.13)
We will see next that, since we are assuming that u ∈ C²(Ω) ∩ C(Ω) and ϕ ∈ Cc∞(Ω), f is indeed differentiable. To see why this is the case, use (2.12) and (2.8) to compute

f(t) = ∬_Ω √(1 + |∇(u + tϕ)|²) dx dy,  for all t ∈ R,  (2.14)

where

∇(u + tϕ) = ∇u + t∇ϕ,  for all t ∈ R,
so that

|∇(u + tϕ)|² = ∇u · ∇u + t∇u · ∇ϕ + t∇ϕ · ∇u + t² ∇ϕ · ∇ϕ = |∇u|² + 2t ∇u · ∇ϕ + t² |∇ϕ|²,

and hence

f(t) = ∬_Ω √(1 + |∇u|² + 2t ∇u · ∇ϕ + t² |∇ϕ|²) dx dy,  for all t ∈ R.  (2.15)
Since the integrand in (2.15) is C 1 , we can differentiate under the integral sign
(see Appendix) to get
f′(t) = ∬_Ω (∇u · ∇ϕ + t |∇ϕ|²) / √(1 + |∇u|² + 2t ∇u · ∇ϕ + t² |∇ϕ|²) dx dy,  (2.16)
for all ϕ ∈ Cc∞(Ω), where the second integral in (2.19) is a path integral around the boundary of Ω. Since ϕ ∈ Cc∞(Ω) vanishes in a neighborhood of the boundary of Ω, it follows from (2.19) that
∬_Ω ∇·(∇u / √(1 + |∇u|²)) ϕ dx dy = 0,  for all ϕ ∈ Cc∞(Ω).  (2.20)
Applying the fundamental lemma of the Calculus of Variations (see the next chapter of these notes) to (2.20) yields

∇·(∇u / √(1 + |∇u|²)) = 0 in Ω.  (2.21)

The equation in (2.21) is a second order nonlinear PDE known as the minimal surface equation. It provides a necessary condition for a function u ∈ C²(Ω) ∩ C(Ω) to be a minimizer of the area functional in Ag. Since we are also assuming that u ∈ Ag, we get that u must solve the boundary value problem (BVP):
∇·(∇u / √(1 + |∇u|²)) = 0 in Ω;
u = g on ∂Ω.  (2.22)
The BVP in (2.22) is called the Dirichlet problem for the minimal surface
equation.
The PDE in (2.21) can also be written as

(1 + uy²) uxx − 2 ux uy uxy + (1 + ux²) uyy = 0,  (2.23)

where

ux = ∂u/∂x,  uy = ∂u/∂y,

uxx = ∂²u/∂x²,  uyy = ∂²u/∂y²,

and

uxy = ∂²u/∂y∂x = ∂²u/∂x∂y = uyx.  (2.24)

The fact that the “mixed” second partial derivatives in (2.24) are equal follows from the assumption that u is a C² function.
The equation in (2.23) is a nonlinear, second order, elliptic PDE.
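The expansion of the divergence form (2.21) into the second order form can be checked symbolically. A sympy sketch (not from the notes; the helper W = √(1 + |∇u|²) and the factor W³ are bookkeeping introduced for the comparison):

```python
import sympy as sp

# Symbolic sketch (not from the notes): check that div(∇u / W), with
# W = sqrt(1 + |∇u|²), equals the expanded second order minimal surface
# operator (1 + u_y²) u_xx − 2 u_x u_y u_xy + (1 + u_x²) u_yy, up to W³.
x, y = sp.symbols('x y')
u = sp.Function('u')(x, y)

W = sp.sqrt(1 + u.diff(x)**2 + u.diff(y)**2)
divergence_form = (u.diff(x) / W).diff(x) + (u.diff(y) / W).diff(y)

second_order_form = ((1 + u.diff(y)**2) * u.diff(x, 2)
                     - 2 * u.diff(x) * u.diff(y) * u.diff(x, y)
                     + (1 + u.diff(x)**2) * u.diff(y, 2))

# div(∇u / W) equals the second order expression divided by W³
difference = sp.simplify(divergence_form - second_order_form / W**3)
print(difference)  # 0
```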
2.2 Linearized Minimal Surface Equation

For wire loops that lie close to a horizontal plane, the gradient ∇u is small, and we may use the linear approximation

√(1 + s) ≈ 1 + ½ s,  for small values of s,  (2.25)

with s = |∇u|², so that

A(u) ≈ area(Ω) + ½ ∬_Ω |∇u|² dx dy,  for all u ∈ Ag.  (2.26)
Define the functional D : Ag → R by

D(u) = ½ ∬_Ω |∇u|² dx dy,  for all u ∈ Ag,  (2.27)

so that, by (2.26),

A(u) ≈ area(Ω) + D(u),  for all u ∈ Ag.  (2.28)

Thus, according to (2.28), for wire loops close to a horizontal plane, minimal surfaces spanning the wire loop can be approximated by solutions to the following variational problem.

Problem 2.2.1 (Variational Problem 2). Out of all functions in Ag, find one such that

D(u) ≤ D(v),  for all v ∈ Ag.  (2.29)
Proceeding as in the previous section, one finds that a minimizer u of D in Ag must solve the BVP

∆u = 0 in Ω;
u = g on ∂Ω,  (2.30)

where

∆u = uxx + uyy,

the two–dimensional Laplacian. The BVP in (2.30) is called the Dirichlet Problem for Laplace's equation.
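The Dirichlet Problem for Laplace's equation is also easy to approximate on a grid. A minimal Python sketch (not from the notes): Jacobi iteration of the five-point stencil, with an illustrative boundary function g for which the exact solution is known:

```python
import numpy as np

# Numerical sketch (not from the notes): solve Δu = 0 in Ω, u = g on ∂Ω, on
# the unit square by Jacobi iteration of the five-point stencil. The grid
# size, iteration count, and boundary data are illustrative choices.
def solve_laplace(g, iterations=5000):
    u = g.copy()                 # boundary rows/columns of g stay fixed
    u[1:-1, 1:-1] = 0.0          # arbitrary initial guess in the interior
    for _ in range(iterations):
        # replace each interior value by the average of its four neighbors
        u[1:-1, 1:-1] = 0.25 * (u[:-2, 1:-1] + u[2:, 1:-1]
                                + u[1:-1, :-2] + u[1:-1, 2:])
    return u

n = 30
s = np.linspace(0.0, 1.0, n)
X, Y = np.meshgrid(s, s)
g = X * Y                        # boundary data of the harmonic function xy

u = solve_laplace(g)
print(np.max(np.abs(u - X * Y)) < 1e-6)   # True: u converges to xy
```

Since xy is harmonic (and even satisfies the discrete five-point equation exactly), the iteration recovers it; for general g the interior values converge to the discrete harmonic extension of the boundary data.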
2.3 Vibrating String

Consider an elastic string stretched along the x–axis from x = 0 to x = L, vibrating in a vertical plane. Assume that the initial shape of the string is described by a continuous function, f, of x, for x ∈ [0, L]. At any time t > 0, the shape
of the string is described by a function, u, of x and t; so that u(x, t) gives the
vertical displacement of a point in the string located at x when the string is in
the equilibrium position pictured in Figure 2.3.3, and at time t > 0. We then
have that
u(x, 0) = f (x), for all x ∈ [0, L]. (2.31)
In addition to the initial condition in (2.31), we will also prescribe the initial
speed of the string,
∂u/∂t (x, 0) = g(x),  for all x ∈ [0, L],  (2.32)
Since the ends of the string are held fixed, we also impose the boundary conditions

u(0, t) = 0 and u(L, t) = 0, for all t.  (2.33)

According to the principle of least action, the motion of the string is such that the action

A = ∫_0^T [K(t) − V(t)] dt  (2.34)

is stationary over a time interval [0, T], where K(t) denotes the kinetic energy of the system at time t, and V(t) its potential energy at time t. For the case of a string whose motion is described by small vertical displacements u(x, t), for all x ∈ [0, L] and all times t, the kinetic energy is given by

K(t) = ½ ∫_0^L ρ(x) (∂u/∂t (x, t))² dx,  (2.35)

where ρ(x) denotes the linear density of the string.
To see how (2.35) comes about, note that the kinetic energy of a particle of
mass m is
K = ½ mv²,
where v is the speed of the particle. Thus, for a small element of the string whose
projection on the x–axis is the interval [x, x+∆x], so that its approximate length
is ∆x, the kinetic energy is, approximately,
∆K ≈ ½ ρ(x) (∂u/∂t (x, t))² ∆x.  (2.36)
Thus, adding up the kinetic energies in (2.36) over all elements of the string, whose lengths add up to L, and letting ∆x → 0, yields the expression in (2.35), which we rewrite as
K(t) = ½ ∫_0^L ρ ut² dx,  for all t,  (2.37)
where ut denotes the partial derivative of u with respect to t.
In order to compute the potential energy of the string, we compute the work
done by the tension, τ , along the string in stretching the string from its equi-
librium length of L, to the length at time t given by
∫_0^L √(1 + ux²) dx;  (2.38)
so that

V(t) = τ [∫_0^L √(1 + ux²) dx − L],  for all t.  (2.39)
Since we are considering small vertical displacements of the string, we can linearize the expression in (2.38) by means of the linear approximation in (2.25) to get

∫_0^L √(1 + ux²) dx ≈ ∫_0^L [1 + ½ ux²] dx = L + ½ ∫_0^L ux² dx,
so that, substituting into (2.39),
V(t) ≈ ½ τ ∫_0^L ux² dx,  for all t.  (2.40)
The action in (2.34) then becomes

A ≈ ½ ∫_0^T ∫_0^L [ρ ut² − τ ux²] dx dt,  (2.41)

where we have substituted the expressions for K(t) and V(t) in (2.37) and (2.40), respectively, into the expression for the action in (2.34).
We will use the expression for the action in (2.41) to define a functional on the class of functions A defined as follows: Let R = (0, L) × (0, T), the Cartesian product of the open intervals (0, L) and (0, T). Then, R is an open rectangle in the xt–plane. We say that u ∈ A if u ∈ C²(R) ∩ C(R), and u satisfies the initial conditions in (2.31) and (2.32), and the boundary conditions in (2.33). Then,
the action functional,
A : A → R,
is defined by the expression in (2.41), so that

A(u) = ½ ∬_R (ρ ut² − τ ux²) dx dt,  for u ∈ A.  (2.42)
Next, for ϕ ∈ Cc∞ (R), note that u + sϕ ∈ A, since ϕ has compact support
in R, and therefore ϕ and all its derivatives are 0 on ∂R. We can then define a
real valued function h : R → R by

h(s) = A(u + sϕ),  for all s ∈ R.  (2.43)
Using the definition of the functional A in (2.42), we can rewrite h(s) in (2.43)
as

h(s) = ½ ∬_R (ρ [(u + sϕ)t]² − τ [(u + sϕ)x]²) dx dt

= ½ ∬_R (ρ [ut + sϕt]² − τ [ux + sϕx]²) dx dt,
so that
h(s) = A(u) + s ∬_R [ρ ut ϕt − τ ux ϕx] dx dt + s² A(ϕ),  (2.44)
for s ∈ R, where we have used the definition of the action functional in (2.42).
It follows from (2.44) that h is differentiable and
h′(s) = ∬_R [ρ ut ϕt − τ ux ϕx] dx dt + 2s A(ϕ),  for s ∈ R.  (2.45)
The principle of least action implies that, if u describes the shape of the string, then s = 0 must be a critical point of h. Hence, h′(0) = 0 and (2.45) implies that

∬_R [ρ ut ϕt − τ ux ϕx] dx dt = 0,  for ϕ ∈ Cc∞(R),  (2.46)
We now integrate by parts, using the identities

∬_R ψ ∂ϕ/∂x dx dt = ∫_∂R ψϕ n₁ ds − ∬_R ∂ψ/∂x ϕ dx dt,

for C¹ functions ψ and ϕ, where n₁ is the first component of the outward unit normal, ~n, on ∂R (wherever this vector is defined), and

∬_R ψ ∂ϕ/∂t dx dt = ∫_∂R ψϕ n₂ ds − ∬_R ∂ψ/∂t ϕ dx dt,

where n₂ is the second component of the outward unit normal, ~n, (see Problem 1 in Assignment #8), to obtain

∬_R ρ ut ϕt dx dt = ∫_∂R ρ ut ϕ n₂ ds − ∬_R ∂/∂t [ρ ut] ϕ dx dt,
so that

∬_R ρ ut ϕt dx dt = − ∬_R ∂/∂t [ρ ut] ϕ dx dt,  (2.47)
since ϕ has compact support in R.
Similarly,

∬_R τ ux ϕx dx dt = − ∬_R ∂/∂x [τ ux] ϕ dx dt.  (2.48)
Next, substitute the results in (2.47) and (2.48) into (2.46) to get
∬_R (∂/∂t [ρ ut] − ∂/∂x [τ ux]) ϕ dx dt = 0,  for ϕ ∈ Cc∞(R).  (2.49)
Thus, applying the Fundamental Lemma of the Calculus of Variations (see the
next chapter in these notes), we obtain from (2.49) that
ρ ∂²u/∂t² − τ ∂²u/∂x² = 0,  in R,  (2.50)

since we are assuming that u is C², ρ is a continuous function of x, and τ is
constant.
The PDE in (2.50) is called the one–dimensional wave equation. It is
sometimes written as

∂²u/∂t² = (τ/ρ) ∂²u/∂x²,

or

∂²u/∂t² = c² ∂²u/∂x²,  (2.51)

where

c² = τ/ρ,

in the case in which ρ is assumed to be constant.
The wave equation in (2.50) or (2.51) is a second order, linear, hyperbolic
PDE.
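A quick symbolic check of (2.51): the standing-wave mode sin(nπx/L) cos(nπct/L), a standard illustrative choice not taken from the notes, satisfies the wave equation and vanishes at the fixed ends of the string:

```python
import sympy as sp

# Symbolic sketch (not from the notes): verify that the standing wave
# u(x, t) = sin(nπx/L) cos(nπct/L) satisfies the wave equation (2.51)
# and the fixed-end conditions u(0, t) = u(L, t) = 0.
x, t, c, L = sp.symbols('x t c L', positive=True)
n = sp.Integer(3)                        # any mode number works

u = sp.sin(n * sp.pi * x / L) * sp.cos(n * sp.pi * c * t / L)

residual = sp.simplify(u.diff(t, 2) - c**2 * u.diff(x, 2))
print(residual)                          # 0
print(u.subs(x, 0), u.subs(x, L))        # 0 0
```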
Chapter 3

Indirect Methods

3.1 Geodesics in the plane
where |(·, ·)| in the integrand in (3.4) denotes the Euclidean norm in R².
We will first use this example to illustrate the direct method in the Calculus of Variations. We begin by showing that the functional J defined in (3.4) is bounded below in A by

|Q − P| = |(x1, y1) − (xo, yo)| = √((x1 − xo)² + (y1 − yo)²),

the Euclidean distance from P to Q; that is,

|Q − P| ≤ J(u, v),  for all (u, v) ∈ A.  (3.5)
Indeed, it follows from the Fundamental Theorem of Calculus that
(u(1), v(1)) − (u(0), v(0)) = ∫_0^1 (u′(s), v′(s)) ds,
and

J(y) = ∫_{xo}^{x1} √(1 + (y′(s))²) ds,  for all y ∈ A.  (3.14)
Problem 3.1.2 (Geodesic Problem 2). Out of all functions in A, find one, y,
such that
J(y) ≤ J(v),  for all v ∈ A.  (3.15)
Next, let η ∈ Co¹([xo, x1], R) and define f : R → R by

f(t) = J(y + tη),  for all t ∈ R.  (3.17)
It follows from (3.16) and the definition of f in (3.17) that f has a minimum at
0; that is,

f(t) ≥ f(0),  for all t ∈ R.
Thus, if it can be shown that the function f defined in (3.17) is differentiable,
we get the necessary condition
f′(0) = 0.  (3.18)

To show that f is differentiable, we need to see that we can differentiate the
expression on the right–hand side of (3.20) under the integral sign. This follows
from the fact that the partial derivative of the integrand on the right–hand side
of (3.20),

∂/∂t √(1 + (y′(s))² + 2t y′(s) η′(s) + t² (η′(s))²),

or

(y′(s) η′(s) + t (η′(s))²) / √(1 + (y′(s))² + 2t y′(s) η′(s) + t² (η′(s))²),

is continuous in t and s, for s in some open interval containing [xo, x1]. It then follows
from the results in Appendix B.1 that f given in (3.20) is differentiable and
f′(t) = ∫_{xo}^{x1} (y′(s) η′(s) + t (η′(s))²) / √(1 + (y′(s))² + 2t y′(s) η′(s) + t² (η′(s))²) ds,  for t ∈ R,  (3.21)
Thus, in view of (3.18) and (3.22), we see that a necessary condition for y ∈ A,
where A is given in (3.13), to be a minimizer of the functional J : A → R given
in (3.14), is that
∫_{xo}^{x1} (y′(s) / √(1 + (y′(s))²)) η′(s) ds = 0,  for all η ∈ Co¹([xo, x1], R),  (3.23)
where c2 is a constant.
We can solve the differential equation in (3.28) to obtain the general solution

y(s) = c2 s + c3,  for all s,  (3.29)

where c3 is a constant of integration.
We therefore get the following system of equations that c2 and c3 must solve
c2 xo + c3 = yo;
c2 x1 + c3 = y1,
or

xo c2 + c3 = yo;
x1 c2 + c3 = y1.  (3.30)
Solving the system in (3.30) for c2 and c3 yields
c2 = (y1 − yo)/(x1 − xo)  and  c3 = (x1 yo − xo y1)/(x1 − xo).
Thus, using the expression for y in (3.29),
y(s) = ((y1 − yo)/(x1 − xo)) s + (x1 yo − xo y1)/(x1 − xo).  (3.31)
Note that the expression in (3.31) is the equation of a straight line that goes
through the points (xo , yo ) and (x1 , y1 ). Thus, we have shown that a candidate
for a minimizer of the arc–length functional J defined in (3.14) over the class
given in (3.13) is a straight line segment connecting the point P to the point
Q. It remains to show that the function y in (3.31) is indeed a minimizer of J in A, and that it is the only minimizer of J in A. This will be done in a
subsequent section in these notes.
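The claim that the straight line minimizes arc length can also be probed numerically: compare J at the candidate (3.31) with J at perturbed paths y + tη, where η vanishes at the endpoints. A Python sketch (not from the notes; the endpoints and the perturbation are illustrative choices):

```python
import numpy as np

# Numerical sketch (not from the notes): the arc length (3.14) of the
# straight-line candidate (3.31) versus nearby admissible paths y + tη.
def arc_length(y, x):
    dy = np.gradient(y, x)                       # approximate y′
    f = np.sqrt(1.0 + dy**2)
    return float(np.sum(0.5 * (f[1:] + f[:-1]) * np.diff(x)))  # trapezoid rule

xo, x1, yo, y1 = 0.0, 2.0, 1.0, 3.0
x = np.linspace(xo, x1, 2001)
line = (y1 - yo) / (x1 - xo) * (x - xo) + yo     # the straight line (3.31)
eta = np.sin(np.pi * (x - xo) / (x1 - xo))       # η(xo) = η(x1) = 0

J_line = arc_length(line, x)
print(round(J_line, 4))                          # 2.8284, the distance |Q − P|
print(all(arc_length(line + t * eta, x) > J_line
          for t in [-0.5, -0.1, 0.1, 0.5]))      # True: every perturbation is longer
```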
y′(x) / √(1 + (y′(x))²) = c1,  for all x ∈ [xo, x1],  (3.35)
and for some constant c1. This is also a necessary condition for y ∈ A being a minimizer of the functional J : A → R defined in (3.33), where A is given in (3.32).
We will see in this section that the differential equation in (3.35) follows
from the condition in (3.34) provided that the function
y′ / √(1 + (y′)²)

is continuous.

3.2 Fundamental Lemmas

We denote by C([a, b], R) the class of real valued functions

f : [a, b] → R,

defined on the closed and bounded interval [a, b], which are assumed to be continuous on [a, b]. It can be shown that C([a, b], R) is a linear (or vector) space in which the operations are point–wise addition,
(f + g)(x) = f (x) + g(x), for all x ∈ [a, b], and all f, g ∈ C([a, b], R),
and point–wise scalar multiplication,

(cf)(x) = cf(x),  for all x ∈ [a, b], all c ∈ R and all f ∈ C([a, b], R).
Definition 3.2.2. The class Co([a, b], R) consists of all functions in C([a, b], R) that vanish at a and b; in symbols,

Co([a, b], R) = {f ∈ C([a, b], R) | f(a) = f(b) = 0}.
Definition 3.2.4. The class Co¹([a, b], R) consists of all functions f ∈ C¹([a, b], R) that vanish at the end–points of the interval [a, b]; thus,

Co¹([a, b], R) = {f ∈ C¹([a, b], R) | f(a) = f(b) = 0}.
We begin by stating and proving the following basic lemma for the class
C([a, b], R).
Lemma 3.2.5 (Basic Lemma 1). Let f ∈ C([a, b], R) and assume that f(x) ≥ 0 for all x ∈ [a, b]. Suppose that

∫_a^b f(x) dx = 0.

Then, f(x) = 0 for all x ∈ [a, b].

Proof: Assume, by way of contradiction, that there exists xo ∈ (a, b) with f(xo) > 0.
Then, since f is continuous at xo , there exists δ > 0 such that (xo − δ, xo + δ) ⊂
(a, b) and
x ∈ (xo − δ, xo + δ) ⇒ |f(x) − f(xo)| < f(xo)/2.  (3.37)
Now, using the triangle inequality, we obtain the estimate
f(xo) ≤ |f(x) − f(xo)| + |f(x)|;
so that, in view of (3.37) and the assumption that f is nonnegative on [a, b],
f(xo) < f(xo)/2 + f(x),  for xo − δ < x < xo + δ,
from which we get that
f(x) > f(xo)/2,  for xo − δ < x < xo + δ.  (3.38)
It follows from (3.38) that
∫_{xo−δ}^{xo+δ} f(x) dx ≥ ∫_{xo−δ}^{xo+δ} f(xo)/2 dx = δ f(xo).
Thus, since we are assuming that f ≥ 0 on [a, b],

∫_a^b f(x) dx ≥ ∫_{xo−δ}^{xo+δ} f(x) dx ≥ δ f(xo) > 0,

which contradicts the assumption that the integral of f over [a, b] is 0.
Arguing by contradiction, assume that f(xo) ≠ 0 for some xo ∈ (a, b). Without loss of generality, we may assume that f(xo) > 0. Then, by the continuity of f, there exists δ > 0 such that (xo − δ, xo + δ) ⊂ (a, b) and

x ∈ (xo − δ, xo + δ) ⇒ |f(x) − f(xo)| < f(xo)/2,
or

x ∈ (xo − δ, xo + δ) ⇒ f(xo) − f(xo)/2 < f(x) < f(xo) + f(xo)/2,
from which we get that
f(x) > f(xo)/2,  for xo − δ < x < xo + δ.  (3.40)
It follows from (3.40) that
∫_{xo−δ}^{xo+δ} f(x) dx ≥ ∫_{xo−δ}^{xo+δ} f(xo)/2 dx = δ f(xo) > 0,
which is in direct contradiction with (3.39). Hence, it must be the case that
f (x) = 0 for a < x < b. It then follows from the continuity of f that f (x) = 0
for all x ∈ [a, b].
Next, we state and prove the first fundamental lemma of the Calculus of
Variations. A version of this result is presented as a Basic Lemma in Section
3–1 in [Wei74].
Lemma 3.2.7 (Fundamental Lemma 1). Let G ∈ C([a, b], R) and assume that
∫_a^b G(x) η(x) dx = 0,  for every η ∈ Co([a, b], R).

Then, G(x) = 0 for all x ∈ [a, b].

Proof: Arguing by contradiction, assume that there exists xo ∈ (a, b) such that G(xo) ≠ 0. Without loss of generality, we may also assume that G(xo) > 0. Then, since G is continuous on [a, b], there exists δ > 0 such that (xo − δ, xo + δ) ⊂ (a, b) and
G(x) > G(xo)/2,  for xo − δ < x < xo + δ.  (3.42)
Put

c = (1/(b − a)) ∫_a^b G(x) dx,  (3.46)
and define

η(x) = ∫_a^x (G(t) − c) dt,  for x ∈ [a, b].

Then, η(a) = 0 and

η(b) = ∫_a^b (G(t) − c) dt = ∫_a^b G(t) dt − ∫_a^b c dt = 0,
in view of the definition of c in (3.46). It then follows that η ∈ Co¹([a, b], R).
Next, compute
∫_a^b (G(x) − c)² dx = ∫_a^b (G(x) − c)(G(x) − c) dx

= ∫_a^b (G(x) − c) η′(x) dx,
Thus, using the assumption in (3.45) and the Fundamental Theorem of Calculus,
∫_a^b (G(x) − c)² dx = 0,  (3.49)
The following lemma combines the results of the first and the second funda-
mental lemmas in the Calculus of Variations.
Lemma 3.2.9 (Fundamental Lemma 3). Let f : [a, b] → R and g : [a, b] → R be continuous real valued functions defined on [a, b]. Assume that

∫_a^b [f(x) η(x) + g(x) η′(x)] dx = 0,  for every η ∈ Co¹([a, b], R).

Then, g is differentiable on [a, b] and g′(x) = f(x) for all x ∈ [a, b].
Proof: Let f ∈ C([a, b], R), g ∈ C([a, b], R), and assume that
∫_a^b [f(x) η(x) + g(x) η′(x)] dx = 0,  for every η ∈ Co¹([a, b], R).  (3.50)
Put

F(x) = ∫_a^x f(t) dt,  for x ∈ [a, b].  (3.51)
Next, let η ∈ Co¹([a, b], R) and use integration by parts to compute

∫_a^b f(x) η(x) dx = [F(x) η(x)]_a^b − ∫_a^b F(x) η′(x) dx

= − ∫_a^b F(x) η′(x) dx,
We can then apply the second fundamental lemma (Lemma 3.2.8) to obtain
from (3.53) that
g(x) − F (x) = C, for all x ∈ [a, b]
and some constant C, from which we get that

g(x) = F(x) + C,  for all x ∈ [a, b].  (3.54)
It follows from (3.54), (3.51) and the Fundamental Theorem of Calculus that g is differentiable with derivative

g′(x) = f(x),  for all x ∈ [a, b].
3.3 The Euler–Lagrange Equations

[Figure: a path in the xy–plane from the point P = (0, yo) to the point Q = (x1, y1).]
The arc-length along the path from the point P to any point on the path, as a
function of x, is then given by
s(x) = ∫_0^x √(1 + (y′(t))²) dt,  for 0 ≤ x ≤ x1;

so that,

s′(x) = √(1 + (y′(x))²),  for 0 < x < x1,  (3.59)
by the Fundamental Theorem of Calculus.
The speed, v, of the particle along the path at any point on the curve is
given by
v = ds/dt.  (3.60)
We can use (3.59) and (3.60) to obtain a formula for the descent time, T , of the
particle:

T = ∫_0^{x1} (s′(x)/v) dx,

or

T = ∫_0^{x1} √(1 + (y′(x))²) / v dx.  (3.61)
It thus remains to compute v in the denominator of the integrand in (3.61).
The speed of the particle will depend on the location of the particle along
the path. If we assume that the particle is released from rest, then v = 0 at P. To
find the speed at other points on the path, we will use the law of conservation of
energy, which says that the total mechanical energy of the system is conserved;
that is, the total energy remains constant throughout the motion. The total
energy of this particular system is the sum of the kinetic energy and the potential
energy of the particle of mass m.
in the general functional J given in (3.58). We will see many more examples of
this general class of functionals in these notes and in the homework assignments.
The general variational problem we would like to consider in this section is
the following:
Problem 3.3.2 (General Variational Problem 1). Given real numbers a and b
such that a < b, let F : [a, b] × R × R → R denote a continuous function of three
variables, (x, y, z), with x ∈ [a, b], and y and z in the set of real numbers (in
some cases, as in the Brachistochrone problem, we might need to restrict the
values of y and z as well). Define the functional J : C 1 ([a, b], R) → R by
J(y) = ∫_a^b F(x, y(x), y′(x)) dx,  for y ∈ C¹([a, b], R),  (3.66)
or
J(y) ≥ J(v),  for all v ∈ A.  (3.69)
y + tη ∈ A, for all t ∈ R.
Define g : R → R by

g(t) = J(y + tη),  for all t ∈ R.  (3.72)
It follows from (3.70) and the definition of g in (3.72) that g has a minimum at
0; that is,

g(t) ≥ g(0),  for all t ∈ R.
Thus, if it can be shown that the function g defined in (3.72) is differentiable,
we get the necessary condition
g 0 (0) = 0,
for all t ∈ R, where we have written y for y(x), y 0 for y 0 (x), η for η(x), and η 0
for η 0 (x) in the integrand of the integral on the right–hand side of (3.76).
Substituting 0 for t in (3.76) we then obtain that
d/dt [J(y + tη)] |_{t=0} = ∫_a^b [Fy(x, y, y′) η + Fz(x, y, y′) η′] dx.  (3.77)
Since we are assuming that Fy and Fz are continuous, we can apply the third
fundamental lemma in the Calculus of Variations (Lemma 3.2.9) to obtain from
(3.77) that the map x ↦ Fz(x, y(x), y′(x)) is differentiable and

d/dx [Fz(x, y(x), y′(x))] = Fy(x, y(x), y′(x)),  for all x ∈ (a, b).  (3.78)
The differential equation in (3.78) is called the Euler–Lagrange equation as-
sociated with the functional J defined in (3.66). It gives a necessary condition
for a function y ∈ A to be an optimizer of J over the class A given in (3.67). We
restate this fact, along with the assumptions on F , in the following proposition.
d/dx [Fz(x, y(x), y′(x))] = Fy(x, y(x), y′(x)),  for all x ∈ (a, b).  (3.81)
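The passage from an integrand to its Euler–Lagrange equation can be automated symbolically. A sympy sketch (not from the notes; the integrand F(x, y, z) = z²/2 − y²/2 and the test solution y = sin x are illustrative choices):

```python
import sympy as sp

# Symbolic sketch (not from the notes): form the Euler–Lagrange equation
# d/dx[F_z(x, y, y′)] = F_y(x, y, y′) for the illustrative integrand
# F(x, y, z) = z²/2 − y²/2, and check that y(x) = sin x satisfies it.
x = sp.symbols('x')
y = sp.Function('y')
yx, z = sp.symbols('yx z')               # placeholders for y and y′

F = z**2 / 2 - yx**2 / 2

Fy = F.diff(yx).subs({yx: y(x), z: y(x).diff(x)})
Fz = F.diff(z).subs({yx: y(x), z: y(x).diff(x)})

euler_lagrange = sp.Eq(Fz.diff(x), Fy)   # here: y″ = −y
print(euler_lagrange)

residual = euler_lagrange.lhs - euler_lagrange.rhs
print(sp.simplify(residual.subs(y(x), sp.sin(x)).doit()))  # 0
```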
J : C 1 ([xo , x1 ], R) → R,
given by
J(y) = ∫_{xo}^{x1} √(1 + (y′(x))²) dx,  for all y ∈ C¹([xo, x1], R),  (3.83)
In this case,
F(x, y, z) = √(1 + z²),  for (x, y, z) ∈ [xo, x1] × R × R.

So that,

Fy(x, y, z) = 0  and  Fz(x, y, z) = z / √(1 + z²),  for (x, y, z) ∈ [xo, x1] × R × R.
The Euler–Lagrange equation associated with the functional in (3.83) is then

d/dx [y′(x) / √(1 + (y′(x))²)] = 0,  for x ∈ (xo, x1).
Integrating this equation yields

y′(x) / √(1 + (y′(x))²) = c1,  for x ∈ (xo, x1),

for some constant of integration c1. This is the same equation in (3.25) that we
obtained in the solution of Geodesic Problem 2. The solution of this equation
subject to the boundary conditions y(xo) = yo and y(x1) = y1 is

y(x) = ((y1 − yo)/(x1 − xo)) x + (x1 yo − xo y1)/(x1 − xo),  for xo ≤ x ≤ x1.  (3.85)
The graph of the function y given in (3.85) is a straight line segment from the point (xo, yo) to the point (x1, y1). We will see in a subsequent section that the
function in (3.85) is the unique minimizer of the arc–length functional defined
in (3.83) over the class A given in (3.84).
and
Fz(x, y, z) = z / (√(1 + z²) √(yo − y)),  for 0 ≤ x ≤ x1, y < yo, and z ∈ R.
and
y′ = −u′.  (3.91)
We can then rewrite the equation in (3.88) as

d/dx [−u′(x) / (√(1 + (u′(x))²) √(u(x)))] = √(1 + (u′(x))²) / (2(u(x))^{3/2}),  for 0 < x < x1,

or

d/dx [u′ / (√(1 + (u′)²) √u)] = −√(1 + (u′)²) / (2u^{3/2}),  for 0 < x < x1,  (3.92)
where we have written u for u(x) and u′ for u′(x). Next, we proceed to evaluate the derivative on the left–hand side of the equation in (3.92) and simplify to
obtain from (3.92) that

1 + (u′)² + 2u u″ = 0,  for 0 < x < x1,  (3.93)

where u″ denotes the second derivative of u. Multiply on both sides of (3.93) by u′ to get

(u′)³ + 2u u′ u″ + u′ = 0,  for 0 < x < x1,
which can in turn be written as
d/dx [u + u(u′)²] = 0.  (3.94)
Integrating the differential equation in (3.94) yields u + u(u′)² = C, for some constant C; solving for (u′)² then gives

(u′)² = (C − u)/u,  for 0 < x < x1.  (3.95)
Next, we solve (3.95) for u′ to get
u′ = √((C − u)/u),  for 0 < x < x1,  (3.96)
where we have taken the positive square root in (3.96) in view of (3.91), since
y decreases with increasing x.
Our goal now is to find a solution of the differential equation in (3.96) subject
to the conditions
u = 0 when x = 0,  (3.97)

according to (3.89), and

u = yo − y1 when x = x1.  (3.98)
The graph of a solution of (3.99) will be a smooth path connecting the point (0, 0) to the point (x1, yo − y1) in the xu–plane as pictured in Figure 3.3.3. We
where θ is the angle the tangent line to the curve makes with a vertical line (see
the sketch in Figure 3.3.3). We then have that
dx/du = tan θ;  (3.101)

so that, using (3.99),

u/(C − u) = sin²θ / cos²θ,  for θo < θ < θ1.  (3.102)
Solving for u in (3.102) yields

u = C sin²θ,  (3.103)
where we have used the trigonometric identity cos²θ + sin²θ = 1; thus, using the trigonometric identity

sin²θ = ½(1 − cos 2θ),

we obtain

u(θ) = (C/2)(1 − cos 2θ),  for θo < θ < θ1.  (3.104)
In view of the condition in (3.97) we see from (3.104) that θo = 0; so that,
u(θ) = (C/2)(1 − cos 2θ),  for 0 < θ < θ1.  (3.105)
To find the parametric expression for x in terms of θ, use the Chain Rule to
obtain
dx/dθ = (dx/du)(du/dθ);

so that, in view of (3.101) and (3.103),

dx/dθ = (sin θ / cos θ) · 2C sin θ cos θ,
from which we get

dx/dθ = 2C sin²θ,

or

dx/dθ = C(1 − cos 2θ),  for 0 < θ < θ1.  (3.106)
Integrating the differential equation in (3.106) and using the boundary condition
in (3.97), we obtain that
x(θ) = C(θ − ½ sin 2θ),  for 0 < θ < θ1,

which we can rewrite as

x(θ) = (C/2)(2θ − sin 2θ),  for 0 < θ < θ1.  (3.107)
Putting together the expressions in (3.105) and (3.107), denoting C/2 by a, and introducing a new parameter t = 2θ, we obtain the parametric equations

x(t) = at − a sin t;
u(t) = a − a cos t,  (3.108)
for 0 ≤ t ≤ t1, which are the parametric equations of a cycloid. This is the curve
traced by a point, P , on a circle of radius a and center (0, a), which starts at
the origin when t = 0, as the circle rolls on the x–axis in the positive direction
(see Figure 3.3.4). The parameter t gives the angle the vector from the center
of the circle to P makes with a vertical vector emanating from the center and
pointing downwards; this is shown in Figure 3.3.4.
To find a curve parametrized by (3.108) that goes through the point
(x1 , yo − y1 ),
‖(x(t), u(t))‖ → ∞ as a → ∞.
Thus, since the distance defined in (3.110) is an increasing and continuous func-
tion of a, it follows from the intermediate value theorem that there exists a
value of a such that the cycloid generated by a circle of radius a goes through
the point (x1, yo − y1); this is also shown in Figure 3.3.5. On the other hand, if the point (x1, yo − y1) is below the original cycloid, we can decrease the radius a of the circle generating the cycloid until we reach the point (x1, yo − y1).
Once the value of a > 0 is determined, we can find the value of t1 by solving
the second equation in (3.109) to obtain
t1 = cos⁻¹((a − (yo − y1)) / a).
we use the transformation equation in (3.89) to get from (3.108) the parametric
equations
x(t) = at − a sin t;
y(t) = yo − a + a cos t,  (3.111)
Figure 3.3.6: Sketch of solution of (3.93) subject to u(0) = 0 and u(x1) = yo − y1.
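Determining the radius a so that the cycloid passes through a given point can be carried out numerically, in the spirit of the intermediate value theorem argument above. A Python sketch (not from the notes; the bisection bracket and the sample point (x1, yo − y1) = (2, 1) are illustrative choices):

```python
import math

# Numerical sketch (not from the notes): find the cycloid (3.108) through a
# given point (x1, yo − y1). Writing h = yo − y1 and eliminating a from
# x(t) = a(t − sin t), u(t) = a(1 − cos t) gives
# (t1 − sin t1)/(1 − cos t1) = x1/h, solved here by bisection.
def cycloid_through(x1, h, tol=1e-12):
    ratio = x1 / h
    f = lambda t: (t - math.sin(t)) / (1.0 - math.cos(t)) - ratio
    lo, hi = 1e-6, 2.0 * math.pi - 1e-6   # f increases from ≈0 toward ∞ here
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if f(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    t1 = 0.5 * (lo + hi)
    a = h / (1.0 - math.cos(t1))          # radius of the generating circle
    return a, t1

a, t1 = cycloid_through(x1=2.0, h=1.0)    # sample target point, an assumption
print(round(a * (t1 - math.sin(t1)), 6))  # 2.0: x-coordinate recovered
print(round(a * (1.0 - math.cos(t1)), 6)) # 1.0: u-coordinate recovered
```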
Chapter 4

Convex Minimization
and

J(y) = ∫_0^{x1} √(1 + (y′(x))²) / √(yo − y(x)) dx,  for all y ∈ A,  (4.2)

where
respectively, are strictly convex functionals. We will see in this chapter that for this class of functionals we can prove the existence of a unique minimizer.
The functionals in (4.1) and (4.2) are also Gâteaux differentiable. We begin this
chapter with a discussion of Gâteaux differentiability.
4.1 Gâteaux Differentiability

Given a functional J defined on a normed linear space V, a point u ∈ V, and a nonzero direction v, consider the function g : R → R given by g(t) = J(u + tv), for t ∈ R; that is, the function g gives the values of J along a line through u in the direction of v ≠ 0. We will focus on the special case in which the function g is differentiable at t = 0. If this is the case, we say that the functional J is Gâteaux differentiable at u in the direction of v and denote g′(0) by dJ(u; v), and call
The existence of the expression on the right–hand side of (4.4) translates into the existence of the limit defining g′(0), or

lim_{t→0} (J(u + tv) − J(u)) / t.
Here is the formal definition of Gâteaux differentiability.
Definition 4.1.1 (Gâteaux Differentiability). Let V be a normed linear space,
Vo be a nontrivial subspace of V , and J : V → R be a functional defined on V .
We say that J is Gâteaux differentiable at u ∈ V in the direction of v ∈ Vo if
the limit

lim_{t→0} (J(u + tv) − J(u)) / t  (4.5)
exists. If the limit in (4.5) exists, we denote it by the symbol dJ(u; v) and call
it the Gâteaux derivative of J at u in the direction of v, or the first variation
of J at u in the direction of v. Thus, if J is Gâteaux differentiable at u in the
direction of v, its Gâteaux derivative at u in the direction of v is given by
dJ(u; v) = d/dt [J(u + tv)] |_{t=0},  (4.6)
or

dJ(u; v) = lim_{t→0} (J(u + tv) − J(u)) / t.  (4.7)

Example (the Dirichlet functional). Let Ω denote an open, bounded subset of Rⁿ, for n = 2 or n = 3, and define J : C¹(Ω, R) → R by

J(u) = ½ ∫_Ω |∇u|² dx,  for u ∈ C¹(Ω, R),  (4.8)

where ∇u denotes the gradient of u,

∇u = (∂u/∂x, ∂u/∂y) = (ux, uy),  if n = 2,

or

∇u = (∂u/∂x, ∂u/∂y, ∂u/∂z) = (ux, uy, uz),  if n = 3;

so that

|∇u|² = (ux)² + (uy)²,  if n = 2,

or

|∇u|² = (ux)² + (uy)² + (uz)²,  if n = 3.
The differential dx in the integral on the right–hand side of (4.8) represents the
element of area, dxdy, in the case in which Ω ⊂ R2 , or the element of volume,
dxdydz, in the case in which Ω ⊂ R3 . Thus, if n = 2 the integral on the right–
hand side of (4.8) is a double integral over the plane region Ω ⊂ R2 ; while, if
n = 3 the integral in (4.8) is a triple integral over the region in three–dimensional
Euclidean space.
We shall denote by Co¹(Ω, R) the space of functions v ∈ C¹(Ω, R) that are 0 on the boundary, ∂Ω, of the region Ω; thus,

Co¹(Ω, R) = {v ∈ C¹(Ω, R) | v = 0 on ∂Ω}.

We shall show in this example that the functional J : C¹(Ω, R) → R defined in (4.8) is Gâteaux differentiable at every u ∈ C¹(Ω, R) in the direction of every v ∈ Co¹(Ω, R) and
dJ(u; v) = ∫_Ω ∇u · ∇v dx,  for all u ∈ C¹(Ω, R) and all v ∈ Co¹(Ω, R),  (4.9)
where ∇u · ∇v denotes the dot product of the gradients of u and v; thus,
∇u · ∇v = ux vx + uy vy , if n = 2,
or
∇u · ∇v = ux vx + uy vy + uz vz , if n = 3.
For u ∈ C¹(Ω, R) and v ∈ Co¹(Ω, R), use the definition of J in (4.8) to
compute
    J(u + tv) = (1/2) ∫_Ω |∇(u + tv)|² dx = (1/2) ∫_Ω |∇u + t∇v|² dx,
where we have used the linearity of the differential operator ∇. Thus, using the
fact that the square of the Euclidean norm of a vector is the dot product of the
vector with itself,
    J(u + tv) = (1/2) ∫_Ω (∇u + t∇v) · (∇u + t∇v) dx
              = (1/2) ∫_Ω (|∇u|² + 2t ∇u · ∇v + t²|∇v|²) dx
              = (1/2) ∫_Ω |∇u|² dx + t ∫_Ω ∇u · ∇v dx + (t²/2) ∫_Ω |∇v|² dx.
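Since the computation above is purely algebraic, it can be checked symbolically. The following sketch (not part of the notes; the particular u, v and the unit-square domain are illustrative assumptions, with v vanishing on the boundary) verifies that d/dt J(u + tv)|_{t=0} equals ∫_Ω ∇u · ∇v dx, assuming sympy is available.

```python
# Symbolic sanity check of the Gateaux derivative of the Dirichlet functional.
# The choices of u, v and the unit-square domain are illustrative assumptions.
import sympy as sp

x, y, t = sp.symbols('x y t')
u = x**2 + y**2                    # any C^1 function on the closed unit square
v = x * (1 - x) * y * (1 - y)      # vanishes on the boundary of the square

def J(w):
    # Dirichlet integral (1/2) * double integral of |grad w|^2
    return sp.Rational(1, 2) * sp.integrate(
        sp.diff(w, x)**2 + sp.diff(w, y)**2, (x, 0, 1), (y, 0, 1))

lhs = sp.diff(J(u + t * v), t).subs(t, 0)        # dJ(u; v), computed via (4.6)
rhs = sp.integrate(sp.diff(u, x) * sp.diff(v, x)
                   + sp.diff(u, y) * sp.diff(v, y),
                   (x, 0, 1), (y, 0, 1))         # the right-hand side of (4.9)
assert sp.simplify(lhs - rhs) == 0
```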
F : [a, b] × R × R → R
Thus, according to Proposition B.1.1 in Appendix B.1 in these notes, the ques-
tion of differentiability of the map
    ∂/∂y [F(x, y, z)] = Fy(x, y, z) and ∂/∂z [F(x, y, z)] = Fz(x, y, z)
    ∂/∂t [F(x, y(x) + tη(x), y'(x) + tη'(x))] = Fy(·, y + tη, y' + tη')η + Fz(·, y + tη, y' + tη')η',
for all t ∈ R, where we have written y for y(x), y' for y'(x), η for η(x), and η' for
η'(x). Thus, since we are assuming that y and η are C¹ functions, we see that
the assumptions for Proposition B.1.1 in Appendix B.1 hold true. We therefore
conclude that the map in (4.15) is differentiable and
    d/dt [J(y + tη)] = ∫_a^b [Fy(x, y + tη, y' + tη')η + Fz(x, y + tη, y' + tη')η'] dx,    (4.16)
for all t ∈ R, where we have written y for y(x), y' for y'(x), η for η(x), and η'
for η'(x) in the integrand of the integral on the right–hand side of (4.15).
Setting t = 0 in (4.16) we then obtain that
    d/dt [J(y + tη)] |_{t=0} = ∫_a^b [Fy(x, y, y')η + Fz(x, y, y')η'] dx,
where
Since we are assuming that the functions p and q are continuous on [xo , x1 ], it
follows that the function F : [xo , x1 ] × R × R → R defined in (4.18) is continuous
on [xo , x1 ] × R × R with continuous partial derivatives
for (x, y, z) ∈ [xo, x1] × R × R. Consequently, by the result of Example and (4.13),
we conclude that the functional J defined in (4.17) is Gâteaux differentiable at
every y ∈ C¹([xo, x1], R) with Gâteaux derivative
    dJ(y; η) = ∫_{xo}^{x1} [2q(x)y(x)η(x) + 2p(x)y'(x)η'(x)] dx,    (4.19)
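The formula (4.19) can likewise be checked symbolically for particular choices of p, q, y and η (all illustrative assumptions, with η vanishing at the endpoints), assuming sympy is available:

```python
# Symbolic check of (4.19) for J(y) = integral of p*(y')^2 + q*y^2 over [0, 1].
# The functions p, q, y, eta below are illustrative choices only.
import sympy as sp

x, t = sp.symbols('x t')
p = 1 + x**2        # continuous and positive on [0, 1]
q = 2 + x           # continuous and positive on [0, 1]
y = x**2
eta = x * (1 - x)   # eta(0) = eta(1) = 0, so eta is admissible

def J(w):
    return sp.integrate(p * sp.diff(w, x)**2 + q * w**2, (x, 0, 1))

lhs = sp.diff(J(y + t * eta), t).subs(t, 0)   # dJ(y; eta) via (4.6)
rhs = sp.integrate(2 * q * y * eta + 2 * p * sp.diff(y, x) * sp.diff(eta, x),
                   (x, 0, 1))                 # the right-hand side of (4.19)
assert sp.simplify(lhs - rhs) == 0
```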
or
    J(u + tv) − J(u) ≥ 0, for |t| < δ.    (4.22)
Dividing both sides of the inequality in (4.22) by t > 0 we obtain that
Thus, letting t → 0⁺ in (4.23) and using the definition of the Gâteaux derivative
of J in (4.7), we get that
    dJ(u; v) ≥ 0,    (4.24)
Combining (4.24) and (4.26), we obtain the result that, if J is Gâteaux differ-
entiable at u, and u is a minimizer of J over A, then
if and only if
    ∫_Ω |∇v|² dx = 0.
Thus, since v ∈ C¹(Ω, R), ∇v = 0 in Ω, and therefore v is constant on connected
components of Ω. Hence, since v = 0 on ∂Ω, it follows that v(x) = 0 for all
x ∈ Ω. We conclude therefore that the Dirichlet integral functional J defined
in (4.31) is strictly convex in A.
Since we are assuming in this example that p(x) > 0 and q(x) > 0 for all
x ∈ [xo , x1 ], it follows from (4.37) that
if and only if
    ∫_{xo}^{x1} [p(x)(η'(x))² + q(x)(η(x))²] dx = 0.    (4.38)
Now, it follows from (4.38) and the assumption that p > 0 and q > 0 on [xo, x1]
that
    ∫_{xo}^{x1} p(x)(η'(x))² dx = 0 and ∫_{xo}^{x1} q(x)(η(x))² dx = 0,
from which we obtain that η(x) = 0 for all x ∈ [xo , x1 ], since η is continuous on
[xo , x1 ]. Hence, J is strictly convex in A.
In this example we find conditions on the function
    F : [a, b] × R × R → R
that will guarantee that the functional J given in (4.39) is convex or strictly convex.
provided that
    ∫_a^b F(x, y + v, y' + v') dx ≥ ∫_a^b F(x, y, y') dx + ∫_a^b [Fy(x, y, y')v + Fz(x, y, y')v'] dx
for all y ∈ A and v ∈ Vo. This inequality will follow, for example, if the function
F satisfies
for all (x, y, z) and (x, y + v, z + w) where F is defined. Furthermore, the equality
holds true if and only if equality in (4.42) holds true; and this is the case if and
only if v = 0 or w = 0. In this latter case we get that J is also strictly convex.
so that
    Fy(x, y, z) = 2q(x)y and Fz(x, y, z) = 2p(x)z,
for (x, y, z) ∈ [xo, x1] × R × R. Thus, the condition in (4.42) for the functional
in (4.43) reads
    p(x)(z + w)² + q(x)(y + v)² ≥ p(x)z² + q(x)y² + 2q(x)yv + 2p(x)zw.    (4.44)
To show that (4.44) holds true, expand the terms on the left–hand side to get
    p(x)(z² + 2zw + w²) + q(x)(y² + 2yv + v²)
        = p(x)z² + q(x)y² + 2q(x)yv + 2p(x)zw + p(x)w² + q(x)v².    (4.45)
Since we are assuming that p > 0 and q > 0 on [xo, x1], we see that (4.44)
follows from (4.45). Thus, the functional J given in (4.43) is convex.
To see that J is strictly convex, assume that equality holds in (4.44); so that,
Consequently, since we are assuming that p(x) > 0 and q(x) > 0 for all x ∈
[xo, x1], we get that
    w = v = 0.
Hence, the functional J given in (4.43) is strictly convex.
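The algebra behind (4.44) and (4.45) is easy to confirm symbolically (a sketch; here p and q are treated as symbols standing for their values at a fixed x):

```python
# Check that LHS - RHS of (4.44) equals p*w^2 + q*v^2, which is nonnegative
# when p, q >= 0; this is exactly the content of the expansion (4.45).
import sympy as sp

y, z, v, w, p, q = sp.symbols('y z v w p q', real=True)

lhs = p * (z + w)**2 + q * (y + v)**2
rhs = p * z**2 + q * y**2 + 2 * q * y * v + 2 * p * z * w
assert sp.simplify(sp.expand(lhs - rhs) - (p * w**2 + q * v**2)) == 0
```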
Example 4.3.6. Let p : [a, b] → R be continuous on [a, b]. Suppose p(x) > 0
for all x ∈ [a, b] and define
    J(y) = ∫_a^b p(x)√(1 + (y')²) dx for y ∈ C¹([a, b], R).    (4.46)
The fact that the inequality in (4.48) holds true for all z, w ∈ R, with equality
iff w = 0, is a consequence of the Cauchy–Schwarz inequality in R² applied to
the vectors A = (1, z) and B = (1, z + w).
It follows from the inequality in (4.48) that the functional in (4.46) is convex
in A.
To show that the functional J given in (4.46) is strictly convex, first rewrite
the inequality in (4.48) as
    √(1 + (z + w)²) − √(1 + z²) − zw/√(1 + z²) ≥ 0, for z, w ∈ R,    (4.49)
and note that equality in (4.49) holds if and only if w = 0.
It follows from what we have just shown that
where we have used (4.51). So, equality in (4.50) holds if and only if
    ∫_a^b p(x) [√(1 + (y' + v')²) − √(1 + (y')²) − y'v'/√(1 + (y')²)] dx = 0.    (4.52)
It follows from (4.52), the inequality in (4.49), the assumption that p(x) > 0 for
all x ∈ [a, b], and the assumptions that p, y' and v' are continuous, that
    √(1 + (y'(x) + v'(x))²) − √(1 + (y'(x))²) − y'(x)v'(x)/√(1 + (y'(x))²) = 0, for all x ∈ [a, b];
so that, by the equality case in (4.49), v'(x) = 0 for all x ∈ [a, b]. Thus, v is
constant on [a, b]. Therefore, since v(a) = 0, it follows that v(x) = 0 for all
x ∈ [a, b]. We have therefore demonstrated that the functional J defined
in (4.46) is strictly convex in A given in (4.47).
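The inequality (4.49) can also be probed numerically; the sketch below scans a grid of z and w values (an illustration, not a proof):

```python
# Brute-force numerical check of (4.49): phi(z, w) >= 0 on a grid, with
# phi(z, 0) = 0.  This illustrates, but of course does not prove, (4.49).
import math

def phi(z, w):
    return (math.sqrt(1 + (z + w)**2) - math.sqrt(1 + z**2)
            - z * w / math.sqrt(1 + z**2))

grid = [i / 4 - 5 for i in range(41)]          # the points -5, -4.75, ..., 5
assert all(phi(z, w) >= -1e-12 for z in grid for w in grid)
assert all(abs(phi(z, 0.0)) < 1e-15 for z in grid)
```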
Example 4.3.7. Let I denote an open interval of real numbers (we note that I
could be the entire real line). We consider a function F : [a, b] × I → R; that is, F
is a function of two variables (x, z), where x ∈ [a, b] and z ∈ I. We assume that
F is continuous on [a, b] × I. Suppose also that the second partial derivative of
F with respect to z, Fzz (x, z), exists and is continuous on [a, b] × I, and that
In this example we show that, if (4.53) holds true, then the functional J defined
in (4.54) is strictly convex in
for all x ∈ [a, b], z ∈ I and w ∈ R such that z + w ∈ I, since Fy = 0 in this case.
Fix x ∈ [a, b], z ∈ I and w ∈ R such that z + w ∈ I, and put
and so
    F(x, z + w) ≥ F(x, z) + Fz(x, z)w,    (4.58)
which is the inequality in (4.56).
Next, assume that equality holds in (4.58). It then follows from (4.57) that
    ∫_0^1 (1 − t) Fzz(x, z + tw) w² dt = 0,
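The identity used in (4.57)–(4.58) is the second–order Taylor formula with integral remainder; it can be checked symbolically for an illustrative choice of F (here F(x, z) = z⁴, an assumption made only for this check):

```python
# Verify F(x, z+w) - F(x, z) - Fz(x, z)*w = integral_0^1 (1-t)*Fzz(x, z+t*w)*w^2 dt
# for the illustrative choice F = z**4 (no dependence on x, for simplicity).
import sympy as sp

z, w, t = sp.symbols('z w t', real=True)
F = z**4
Fz = sp.diff(F, z)
Fzz = sp.diff(F, z, 2)

remainder = sp.integrate((1 - t) * Fzz.subs(z, z + t * w) * w**2, (t, 0, 1))
taylor_gap = F.subs(z, z + w) - F - Fz * w
assert sp.simplify(remainder - taylor_gap) == 0
```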
    ≥ ∫_a^b F(x, y'(x)) dx + ∫_a^b Fz(x, y'(x))η'(x) dx;
so that, using the result in Example 4.1.3 and the definition of J in (4.54),
or, using the definition of J in (4.54) and the result in Example 4.1.3,
    ∫_a^b F(x, y'(x) + η'(x)) dx = ∫_a^b F(x, y'(x)) dx + ∫_a^b Fz(x, y'(x))η'(x) dx,
or
    ∫_a^b [F(x, y'(x) + η'(x)) − F(x, y'(x)) − Fz(x, y'(x))η'(x)] dx = 0.    (4.60)
It follows from the inequality in (4.58) that the integrand in (4.60) is nonnegative
on [a, b]; hence, since y', η', F and Fz are continuous functions, it follows from
(4.60) that
    F(x, y'(x) + η'(x)) − F(x, y'(x)) − Fz(x, y'(x))η'(x) = 0, for x ∈ [a, b].    (4.61)
J: V → R
Optimization Problems
with Constraints
Figure 5.1.1 shows one of those curves (the graph of the corresponding function
y ∈ C 1 ([0, b], R) is shown over the interval [0, b]). We will denote the class of
these curves by A. Setting Vo = Co1 ([0, b], R) and
    K(y) = ∫_0^b √(1 + (y'(x))²) dx, for y ∈ C¹([0, b], R),    (5.4)
To make the dependence of the constraint in (5.6) more explicit, we may also
write the problem as: Find y ∈ A such that
    J(y) ≥ J(v), for v ∈ Vo with v(x) ≥ 0 for x ∈ [0, b] and K(v) = ℓ.    (5.8)
In the following section we will see how we can approach this kind of problem
in a general setting.
where c is a given real number; it is assumed that the level set K −1 (c) in (5.9)
is nonempty. Thus, we would like to find conditions satisfied by uo ∈ V such
that
    K(uo) = c    (5.10)
and
    J(uo) ≤ J(v) (or J(uo) ≥ J(v)) for all v ∈ V such that K(v) = c.    (5.11)
that is, for each v ∈ Vo , given ε > 0, there exists δ > 0 such that
Define J : V → R by
    J(y) = (1/2) ∫_0^1 (y'(x))² dx for all y ∈ V.    (5.14)
    = ∫_0^1 [u'(x) − y'(x)] v'(x) dx;
    K(uo) = c,    (5.18)
    J(uo) ≤ J(v) (or J(uo) ≥ J(v)) for all v ∈ V such that K(v) = c.    (5.19)
Suppose also that the Gâteaux derivatives, dJ(u; v) and dK(u; v), of J and K,
respectively, are weakly continuous in u for all u ∈ V . Then, either
we can show that the Gâteaux derivative of K given in (5.25) is weakly contin-
uous (see Problem 4 in Assignment #6).
To see that the Gâteaux derivative of J in (5.26) is also weakly continuous,
observe that, for all u and y in V ,
in view of (5.26).
Thus, we can apply the Euler–Lagrange Multiplier Theorem to the optimiza-
tion problem: Find y ∈ A, where A is given in (5.22), such that
    ∫_0^b [y'(x)η'(x)/√(1 + (y'(x))²)] dx = 0, for all η ∈ Vo;    (5.29)
or, there exists a multiplier µ ∈ R such that
    ∫_0^b η(x) dx = µ ∫_0^b [y'(x)η'(x)/√(1 + (y'(x))²)] dx, for all η ∈ Vo,    (5.30)
where we have used (5.18), (5.20) and (5.21) in the conclusion of Theorem 5.2.3.
We first consider the case in which (5.29) holds true. In this case, the Second
Fundamental Lemma of the Calculus of Variations (Lemma 3.2.8 on page 29 in
these notes) implies that
    y'(x)/√(1 + (y'(x))²) = C, for all x ∈ [0, b],    (5.31)
for some constants c1 and c2 . Then, using the requirement that y ∈ A, so that
y ∈ Vo , we obtain from (5.33) that
c1 = c2 = 0;
Thus,
y(x) = 0, for all x ∈ [0, b], (5.34)
is a candidate for an optimizer of J over A, provided that (5.28) holds true.
We get from (5.28) and (5.34) that
    ∫_0^b √(1 + 0²) dx = ℓ,
Now, it follows from (5.35) and the Third Fundamental Lemma (Lemma 3.2.9
on page 30 of these notes) that y must be a solution of the differential equation
    − d/dx [µ y'(x)/√(1 + (y'(x))²)] = 1, for 0 < x < b.    (5.36)
    y'(x)/√(1 + (y'(x))²) = −x/µ + c1, for 0 < x < b,    (5.38)
    dy/dx = ±(x − µc1)/√(µ² − (x − µc1)²), for 0 < x < b,    (5.39)
    (x − µc1)² + (y − c2)² = µ²,    (5.41)
which is the equation of a circle of radius µ (here we are taking µ > 0) and
center at (µc1 , c2 ). Thus, the graph of a y ∈ A for which the area, J(y), under
it and above the x–axis is the largest possible must be a semicircle of radius µ
and centered at (µc1 , 0); so that, c2 = 0. We then get from (5.40) that
    y(x) = √(µ² − (x − µc1)²), for 0 < x < b,    (5.42)
where we have taken the positive solution in (5.40) to ensure that y(x) > 0
for all x ∈ [0, b], according to the definition of A in (5.22); so that, the graph
of y is the upper semicircle. We are also assuming that c1 > 0. Furthermore,
the condition y(0) = 0 in the definition of A in (5.22) implies from (5.42) that
c1 = 1; so that, (5.42) now reads
    y(x) = √(µ² − (x − µ)²), for 0 < x < b;    (5.43)
thus, the graph of y is a semicircle of radius µ > 0 centered at (µ, 0); this is
pictured in Figure 5.2.2, where b = 2µ. Hence, according to the definition of A
in (5.23), we also obtain an expression for µ in terms of b:
    µ = b/2.    (5.44)
Finally, since K(y) = ℓ, according to the definition of A in (5.23), it must also
be the case that
    πµ = ℓ,    (5.45)
given that πµ is the arc–length of the semicircle of radius µ pictured in Figure
5.2.2. Combining (5.44) and (5.45) we see that
    b = 2ℓ/π,
which gives the connection between b and ℓ imposed by the constraint in (5.28).
where A is as given in (5.22) and J as defined in (5.24) exists, then the graph
of y(x), for 0 ≤ x ≤ b, must be a semicircle.
and
    K(y) = ∫_a^b G(x, y(x), y'(x)) dx, for y ∈ V,    (5.50)
respectively.
We consider the problem of finding an optimizer y ∈ A, of J over the class
A, where A is given in (5.47) subject to the constraint
K(y) = c, (5.51)
and
    dK(y; η) = ∫_a^b [Gy(x, y, y')η + Gz(x, y, y')η'] dx, for y ∈ V and η ∈ Vo,    (5.53)
where we have written y for y(x), y 0 for y 0 (x), η for η(x), and η 0 for η 0 (x) in
the integrands in (5.52) and (5.53). The assumption that the partial derivatives
in (5.48) are continuous for (x, y, z) ∈ [a, b] × R × R will allow us to show that
the Gâteaux derivatives of J and K in (5.52) and (5.53), respectively, are
weakly continuous in V with respect to the norm k · k defined in (5.46). This
fact is proved in Appendix C starting on page 93 of these notes. Thus, we can
apply Theorem 5.2.3 to obtain that, if (x, y) ∈ A, where A is given in (5.47), is
an optimizer of J over A subject to the constraint in (5.51), then, either
    ∫_a^b [Gy(x, y, y')η + Gz(x, y, y')η'] dx = 0, for all η ∈ Vo,    (5.54)
Thus, the conditions in (5.54), and (5.55) and (5.56) are necessary conditions
for (x, y) ∈ A being an optimizer of J over A subject to the constraint in (5.51).
These conditions in turn yield the following Euler–Lagrange equations by virtue
of the third fundamental lemma in the Calculus of Variations (Lemma 3.2.9):
Either
    d/dx [Gz(x, y, y')] = Gy(x, y, y'),    (5.57)
or there exists µ ∈ R such that
    d/dx [Hz(x, y, y')] = Hy(x, y, y'),    (5.58)
where H is given in (5.56).
and
    K(y) = ∫_0^b y(x) dx, for all y ∈ V,    (5.60)
respectively. Let
    A = {y ∈ Vo | y(x) ≥ 0}.    (5.61)
We consider the following constrained optimization problem:
Problem 5.2.11. Minimize J(y) for y ∈ A subject to the constraint
K(y) = a, (5.62)
and
    dK(y; η) = ∫_0^b η(x) dx, for y ∈ V and η ∈ Vo,    (5.64)
respectively. In Example 5.2.7 we saw that the Gâteaux derivatives in (5.64) and
(5.63) are weakly continuous. Thus, the Euler–Lagrange Multiplier Theorem
applies. Hence, if y ∈ A is an optimizer for J over A subject to the constraint
in (5.62), then either y must solve the Euler–Lagrange equation in (5.57), or
there exists a multiplier µ ∈ R such that
and
    G(x, y, z) = y, for (x, y, z) ∈ [0, b] × R × R.
We then have that
    Fy(x, y, z) = 0 and Fz(x, y, z) = z/√(1 + z²), for (x, y, z) ∈ [0, b] × R × R,
and
which yields
    y'/√(1 + (y')²) = µx + c1,    (5.65)
for some constant c1 .
We consider two cases in (5.65): either µ = 0, or µ ≠ 0.
If µ = 0 in (5.65), we obtain that
    y' = c2,
so that
    y(x) = c2 x + c3,
satisfy the constraint in (5.62), we get from the definition of K in (5.60) that
a = 0, which is impossible since we are assuming that a > 0. Hence, it must be
the case that µ 6= 0.
Thus, assume that µ 6= 0 and solve the equation in (5.65) for y 0 to obtain
the differential equation
    dy/dx = ±(µx + c1)/√(1 − (µx + c1)²), for 0 < x < b,
which can be integrated to yield
    y = ∓(1/µ)√(1 − (µx + c1)²) + c2, for 0 ≤ x ≤ b,    (5.66)
for some constant of integration c2.
It follows from the expression in (5.66) that
    (µx + c1)² + (µy − µc2)² = 1,
which we can rewrite as
    (x + c1/µ)² + (y − c2)² = 1/µ².    (5.67)
Observe that the expression in (5.67) is the equation of a circle in the xy–plane
of radius 1/µ (here, we are taking µ > 0) and center at
    (−c1/µ, c2).
The boundary conditions in the definition of A in (5.61) imply that
y(0) = 0 and y(b) = 0.
Using these conditions in (5.67) we obtain that
    c1/µ = −b/2.
Thus, we can rewrite the equation in (5.67) as
    (x − b/2)² + (y − c2)² = 1/µ²,    (5.68)
which is the equation of a circle in the xy–plane of radius 1/µ and center at
    (b/2, c2).
Thus, according to (5.68), a solution y ∈ A of the constrained optimization
Problem 5.2.11 has graph along an arc of the circle connecting the point (0, 0)
to the point (b, 0). The value of c2 can be determined by the condition in (5.62).
We’ll get back to the solution of Problem 5.2.11 in the next section, dealing with
an isoperimetric problem.
Remark 5.3.4. A continuous, simple, closed curve in the plane is also called a
Jordan curve. The Jordan Curve Theorem states that any continuous, simple,
closed curve in the plane separates the plane into two disjoint, connected, regions:
a bounded region (we shall refer to this region as the region enclosed by the
curve) and an unbounded region (the region outside the curve).
We denote by A the class of smooth, simple, closed curves in the plane.
According to the definition of a smooth, simple, closed curve in Definition 5.3.3,
we can identify A with the class of functions σ ∈ V such that σ : [0, 1] → R2
satisfies the conditions in Definition 5.3.3. We will also assume that the paths
in A induce a counterclockwise (or positive) orientation on the curve that σ
parametrizes. Thus, for each σ = (x, y) ∈ A, we can compute the area of the
region enclosed by σ by using the formula (B.13) derived in Appendix B.2 using
the Divergence Theorem. Denoting the area enclosed by (x, y) ∈ A by J((x, y))
we have that
    J((x, y)) = (1/2) ∮_{∂Ω} (x dy − y dx),    (5.71)
where Ω is the region enclosed by the path (x, y) ∈ A. Expressing the line
integral in (5.71) in terms of the parametrization (x, y) : [0, 1] → R2 of ∂Ω, we
have that
    J((x, y)) = (1/2) ∫_0^1 (x(t)ẏ(t) − y(t)ẋ(t)) dt, for (x, y) ∈ A.    (5.72)
We note that the functional J : V → R defined in (5.72) defines a functional
on V that, when restricted to A, gives the area of the region enclosed by σ =
(x, y) ∈ A. The arc–length of any curve (x, y) ∈ V is given by
    K((x, y)) = ∫_0^1 √((ẋ(t))² + (ẏ(t))²) dt, for (x, y) ∈ V.    (5.73)
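As a quick numerical sketch (an illustration only), one can evaluate (5.72) and (5.73) on a circle of radius r, parametrized counterclockwise on [0, 1], and recover the area πr² and the perimeter 2πr:

```python
# Riemann-sum evaluation of the area functional (5.72) and the arc-length
# functional (5.73) for the circle (r*cos(2*pi*t), r*sin(2*pi*t)), 0 <= t <= 1.
import math

r = 2.0
n = 1000
h = 1.0 / n

def x(t): return r * math.cos(2 * math.pi * t)
def y(t): return r * math.sin(2 * math.pi * t)
def xdot(t): return -2 * math.pi * r * math.sin(2 * math.pi * t)
def ydot(t): return 2 * math.pi * r * math.cos(2 * math.pi * t)

area = sum(0.5 * (x(i * h) * ydot(i * h) - y(i * h) * xdot(i * h)) * h
           for i in range(n))
length = sum(math.hypot(xdot(i * h), ydot(i * h)) * h for i in range(n))

assert abs(area - math.pi * r**2) < 1e-6
assert abs(length - 2 * math.pi * r) < 1e-6
```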
and
    K((x, y)) = ∫_a^b G(t, x(t), y(t), ẋ(t), ẏ(t)) dt, for (x, y) ∈ C¹([a, b], R²),    (5.76)
and
for (t, x, y, p, q) ∈ [a, b] × R⁴ (with possible exceptions). Indeed, for the func-
tionals in (5.72) and (5.73), [a, b] = [0, 1],
    F(t, x, y, p, q) = (1/2)xq − (1/2)yp, for (t, x, y, p, q) ∈ [0, 1] × R⁴,    (5.77)
and
    G(t, x, y, p, q) = √(p² + q²), for (t, x, y, p, q) ∈ [0, 1] × R⁴,    (5.78)
with p² + q² ≠ 0. We note that for the functions F and G defined in (5.77) and
(5.78), respectively,
    Fx(t, x, y, p, q) = (1/2)q,    Fy(t, x, y, p, q) = −(1/2)p,
    Fp(t, x, y, p, q) = −(1/2)y,    Fq(t, x, y, p, q) = (1/2)x,    (5.79)
which are continuous functions for (t, x, y, p, q) ∈ [0, 1] × R⁴, and
    Gx(t, x, y, p, q) = 0,    Gy(t, x, y, p, q) = 0,
    Gp(t, x, y, p, q) = p/√(p² + q²),    Gq(t, x, y, p, q) = q/√(p² + q²),    (5.80)
for given (xo, yo) ∈ R² and (x1, y1) ∈ R², we can use the assumptions that
the partial derivatives Fx, Fy, Fp, Fq, Gx, Gy, Gp and Gq are continuous to
show that the functionals J : V → R and K : V → R defined in (5.75) and
(5.76), respectively, are Gâteaux differentiable at (x, y) ∈ V in the direction of
(η1, η2) ∈ Vo, with Gâteaux derivatives given by
    dJ((x, y); (η1, η2)) = ∫_a^b [Fx η1 + Fy η2 + Fp η̇1 + Fq η̇2] dt,    (5.81)
The isoperimetric problem in Problem 5.3.5 is then a special case of the opti-
mization problem:
for some constant c, where J and K are given in (5.75) and (5.76), respectively.
We can use the Euler–Lagrange Multiplier Theorem (Theorem 5.2.3 on page 69
in these notes) to obtain necessary conditions for the solvability of the variational
problem in (5.83), provided we can show that the Gâteaux derivatives of J and
K in (5.81) and (5.82), respectively, are weakly continuous; this can be shown
using the arguments in Appendix C. We therefore obtain that, if (x, y) ∈ A is
an optimizer of J over A subject to the constraint K((x, y)) = c, then either
    ∫_a^b [Gx η1 + Gy η2 + Gp η̇1 + Gq η̇2] dt = 0, for all (η1, η2) ∈ Vo,    (5.84)
Now, taking η2(t) = 0 for all t ∈ [a, b] we obtain from (5.84) that
    ∫_a^b [Gx η1 + Gp η̇1] dt = 0, for all η1 ∈ Co¹[a, b].    (5.87)
It follows from (5.87) and the third fundamental lemma in the Calculus of Vari-
ations (Lemma 3.2.9 on page 30 in these notes) that Gp (t, x(t), y(t), ẋ(t), ẏ(t))
is a differentiable function of t with derivative
    d/dt [Gp(t, x, y, ẋ, ẏ)] = Gx(t, x, y, ẋ, ẏ), for t ∈ (a, b).
Similarly, taking η1 (t) = 0 for all t ∈ [a, b] in (5.84) and applying the third
fundamental lemma of the Calculus of Variations, we obtain the differential
equation
    d/dt [Gq(t, x, y, ẋ, ẏ)] = Gy(t, x, y, ẋ, ẏ).
We have therefore shown that the condition in (5.84) implies that (x, y) ∈ A
must solve the system of differential equations
    d/dt [Gp(t, x, y, ẋ, ẏ)] = Gx(t, x, y, ẋ, ẏ);
    d/dt [Gq(t, x, y, ẋ, ẏ)] = Gy(t, x, y, ẋ, ẏ).    (5.88)
Similarly, from the second alternative we obtain the system of differential
equations
    d/dt [Hp(t, x, y, ẋ, ẏ)] = Hx(t, x, y, ẋ, ẏ);
    d/dt [Hq(t, x, y, ẋ, ẏ)] = Hy(t, x, y, ẋ, ẏ),    (5.89)
where H = F − µG is given in (5.85).
Hence, if (x, y) is a solution of the constrained optimization problem in
(5.83), where J : V → R and K : V → R are as given in (5.75) and (5.76),
respectively, then, either (x, y) solves the system of Euler–Lagrange equations
in (5.88), or there exists an Euler–Lagrange Multiplier µ ∈ R such that (x, y)
solves the system of Euler–Lagrange equations in (5.89), where H is as given
in (5.85). We next apply this reasoning to the isoperimetric problem stated in
Problem 5.3.5.
In the case of the constrained optimization problem in Problem 5.3.5, F and
G are given by (5.77) and (5.78), respectively, and their partial derivatives are
given in (5.79) and (5.80), respectively. Thus, if (x, y) is a simple, closed curve
of arc–length ` that encloses the largest possible area, then either (x, y) solves
the system
    d/dt [ẋ/√(ẋ² + ẏ²)] = 0;
    d/dt [ẏ/√(ẋ² + ẏ²)] = 0,    (5.90)
or there exists a multiplier, µ, such that (x, y) solves the system of differential
equations
    d/dt [−(1/2)y − µ ẋ/√(ẋ² + ẏ²)] = (1/2)ẏ;
    d/dt [(1/2)x − µ ẏ/√(ẋ² + ẏ²)] = −(1/2)ẋ.    (5.91)
Let (x, y) ∈ A, where A is the class of smooth, simple, closed curves in the
plane, be a solution of the isoperimetric problem in (5.74). Suppose also
that (x, y) solves the system of Euler–Lagrange equations in (5.90). We then
have that
    ẋ/√(ẋ² + ẏ²) = c1;
    ẏ/√(ẋ² + ẏ²) = d1,    (5.92)
for constants c1 and d1 . Note that, in view of (5.70) in Definition 5.3.3, which
is part of the definition of A, the denominators in the expressions in (5.92) are
not zero. Also, it follows from (5.92) that
    c1² + d1² = 1;
    y = c3 x + c4,
for constants c3 and c4; so that, the simple, closed curve (x(t), y(t)), for t ∈ [0, 1],
again lies on a straight line, which is impossible.
Thus, the second alternative in the Euler–Lagrange Multiplier Theorem must
hold true for a solution (x, y) ∈ A of the constrained optimization problem in
(5.74). Therefore, there exists a multiplier µ ∈ R for which the system of
Euler–Lagrange equations in (5.91) holds true.
    (x − c1)² + (y − c2)² = µ²,    (5.95)
which is the equation of a circle of radius µ (we are taking µ > 0) and center at
(c1 , c2 ).
We have therefore shown that if (x, y) ∈ A is a solution of the isoperimetric
problem in (5.74), then the simple, closed curve (x(t), y(t)), for t ∈ [0, 1], must
lie on some circle of radius µ (see equation (5.95)). Since K((x, y)) = ℓ, it
follows that
    2πµ = ℓ,
from which we get that
    µ = ℓ/(2π).
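As a closing numerical sketch (illustrative, not a proof of the isoperimetric property), a circle of radius µ = ℓ/(2π) has perimeter ℓ, and its enclosed area ℓ²/(4π) exceeds, for instance, that of the square with the same perimeter:

```python
# Compare the area enclosed by the circle of radius ell/(2*pi) with the area
# of the square of the same perimeter ell.
import math

ell = 5.0
mu = ell / (2 * math.pi)
assert abs(2 * math.pi * mu - ell) < 1e-12   # the perimeter is ell

circle_area = math.pi * mu**2                # equals ell^2 / (4*pi)
square_area = (ell / 4)**2                   # square with perimeter ell
assert circle_area > square_area
```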
Appendix A
Some Inequalities
Appendix B
Theorems About
Integration
Assume that the functions H and ∂H/∂t are absolutely integrable over [a, b].
Then, h is C¹ and its derivative is given by
    h'(t) = ∫_a^b ∂/∂t [H(x, t)] dx.
Assume that the functions H, ∂/∂y [H(x, y, t)] and ∂/∂t [H(x, y, t)] are absolutely
integrable over [a, b]. Then, h is C¹ and its partial derivatives are given by
    ∂/∂y [h(y, t)] = ∫_a^t ∂/∂y [H(x, y, t)] dx
and
    ∂/∂t [h(y, t)] = H(t, y, t) + ∫_a^t ∂/∂t [H(x, y, t)] dx.
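The variable–limit formula above can be checked symbolically for an illustrative integrand (H(x, y, t) = x²yt is an assumption made only for this check), assuming sympy is available:

```python
# Check: for h(y, t) = integral_a^t H(x, y, t) dx, the t-derivative is
# H(t, y, t) + integral_a^t dH/dt dx  (Leibniz rule, variable upper limit).
import sympy as sp

x, y, t, a = sp.symbols('x y t a')
H = x**2 * y * t
h = sp.integrate(H, (x, a, t))

lhs = sp.diff(h, t)
rhs = H.subs(x, t) + sp.integrate(sp.diff(H, t), (x, a, t))
assert sp.simplify(lhs - rhs) == 0
```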
Then F has units of mass per unit length, per unit time. The vector field F in
(B.5) is called the flow field, and it measures the amount of fluid per unit time
that goes through a cross section of unit length perpendicular to the direction of
V. Thus, to get a measure of the amount of fluid per unit time that crosses the
boundary ∂Ω in the direction away from the region Ω, we compute the line integral
    ∮_{∂Ω} F · n̂ ds,    (B.6)
where ds is the element of arc–length along ∂Ω, and n̂ is the unit vector that is
perpendicular to the curve ∂Ω and points away from Ω. The expression in (B.6)
is called the flux of the flow field F across ∂Ω, and it measures the amount of
fluid per unit time that crosses the boundary ∂Ω.
On the other hand, the divergence, div F, of the flow field F in (B.5) has
units of mass/(time × length²), and it measures the amount of fluid that diverges
from a point per unit time per unit area. Thus, the integral
    ∬_Ω div F dxdy    (B.7)
gives the total amount of fluid leaving the region Ω per unit time. In the case where
there are no sinks or sources of fluid inside the region Ω, the integrals in (B.6)
and (B.7) must be equal; so that,
    ∬_Ω div F dxdy = ∮_{∂Ω} F · n̂ ds,    (B.8)
where n̂ is the outward, unit, normal vector to ∂Ω that exists everywhere on
∂Ω, except possibly at finitely many points.
that, for the C¹ vector field F given in (B.3), the line integral on the right–hand
side of (B.9) can be written as
    ∮_{∂Ω} F · n̂ ds = ∫_0^1 (P(σ(t)), Q(σ(t))) · (ẏ(t), −ẋ(t)) (1/|σ'(t)|) |σ'(t)| dt,
or
    ∮_{∂Ω} F · n̂ ds = ∫_0^1 [P(σ(t))ẏ(t) − Q(σ(t))ẋ(t)] dt,
which we can write, using differentials, as
    ∮_{∂Ω} F · n̂ ds = ∮_{∂Ω} (P dy − Q dx).    (B.11)
Thus, using the definition of the divergence of F in (B.4) and (B.11), we can
rewrite (B.9) as
    ∬_Ω (∂P/∂x + ∂Q/∂y) dxdy = ∮_{∂Ω} (P dy − Q dx),    (B.12)
for the area of the region Ω enclosed by a simple closed curve ∂Ω.
Appendix C
Continuity of Functionals