Calculus of Variations
Gjerrit Meinsma
Arjan van der Schaft
2018
Preface
This book reflects a long history of teaching Optimal Control at the Department of Applied
Mathematics, University of Twente. Main contributors to the original lecture notes are Hans
Zwart, Jan Willem Polderman (University of Twente) and Henk Nijmeijer (Eindhoven Univer-
sity of Technology). In 2006–2008 Arjan van der Schaft (University of Groningen) made a num-
ber of substantial revisions and modifications. In the period 2010–2018 Gjerrit Meinsma (Uni-
versity of Twente) rewrote most of the material, included new material and added a number
of examples, illustrations and alternative proofs and more.
The book aims at final-year BSc students and MSc students. In optimal control we frequently switch from constants x to functions x(·), and this can be confusing upon first reading. For this reason we emphasise the difference by including brackets (·) whenever a function is meant.
Contents

1 Calculus of Variations
  1.1 Introduction
  1.2 Euler-Lagrange equation
  1.3 Beltrami identity
  1.4 Higher-order Euler-Lagrange equation
  1.5 Relaxed boundary conditions
  1.6 Intermezzo: Lagrangian Mechanics
  1.7 Second order conditions for minimality
  1.8 Integral constraints
  1.9 Exercises

2 Minimum Principle
  2.1 Optimal control
  2.2 Summary of the classic Lagrange multipliers
  2.3 First-order conditions for optimal control
  2.4 Minimum Principle
  2.5 Optimal control with final constraints
  2.6 Free final time
  2.7 Exercises

3 Dynamic Programming
  3.1 Introduction
  3.2 Principle of optimality
  3.3 Discrete-time Dynamic Programming
  3.4 Hamilton-Jacobi-Bellman equation
  3.5 Connection with Hamiltonians
  3.6 Infinite horizon and Lyapunov functions
  3.7 Exercises

A.1 Positive definite functions and matrices
A.2 A notation for partial derivatives
A.3 Separation of variables
A.4 Linear constant-coefficient DE's
A.5 System of linear time-invariant DE's
A.6 Stabilizability and detectability
A.7 Lagrange multipliers

C Bibliography

Index
Chapter 1
Calculus of Variations
1.1 Introduction
Calculus of variations deals with minimization of expressions of the form
$$\int_0^T F(t, x(t), \dot x(t))\,\mathrm{d}t$$
over functions $x : [0,T] \to \mathbb{R}^n$.
$$\tfrac{1}{2} m v^2(x) - m g\, y(x) = c. \qquad (1.1)$$
1 Newton’s minimal resistance problem can also be seen as a calculus of variations problem and it predates it
FIGURE 1.3: $\mathrm{d}s = \sqrt{1+\dot y^2(x)}\,\mathrm{d}x$
Here v(x) is the speed of the mass and c is the kinetic energy at time zero, $c = \tfrac12 m v^2(x_0)$. We release the mass with zero speed so c = 0 and hence the speed follows from the vertical displacement as
$$v(x) = \sqrt{2 g\, y(x)}. \qquad (1.2)$$
This way the time T needed to travel from (x 0 , y 0 ) = (x 0 , 0) to (x 1 , y 1 ) can be seen as an integral
over x,
$$T = \int_0^T \mathrm{d}t = \int_{x_0}^{x_1} \sqrt{\frac{1+\dot y^2(x)}{2 g\, y(x)}}\,\mathrm{d}x. \qquad (1.3)$$
We need to minimize the integral (1.3) over all functions y(x) subject to y(x 0 ) = y 0 = 0 and
$y(x_1) = y_1$. □
Example 1.1.2 (Oil production). An oil company is to deliver an amount of L liters of oil at a
delivery time T . The company wants to find a production schedule for completing the order
with minimal costs. Let ℓ(t) denote the amount of oil at time t. We assume that both storing oil and producing oil are costly. The total cost might be modeled as
$$\int_0^T \alpha\dot\ell^2(t) + \beta\ell(t)\,\mathrm{d}t \qquad (1.4)$$
where βℓ(t) models the storage cost per unit time and $\alpha\dot\ell^2(t)$ models the production cost per unit time. The constants α, β are positive numbers. The objective of the oil company is to determine a production rate $\dot\ell(t)$ that minimizes the above cost, subject to the conditions
$$\ell(0) = 0, \qquad \ell(T) = L, \qquad \dot\ell(t) \ge 0. \qquad (1.5)$$
Example 1.1.3 (Shortest path). What is the shortest path between two points (x 0 , y 0 ) and
(x 1 , y 1 ) in R2 ? Of course we know the answer but let us anyway formulate this problem in
more detail.
Clearly the path is characterized by a function y(x). As in Example 1.1.1, the length ds of an infinitesimal part of the path follows from an infinitesimal part dx as $\mathrm{d}s = \sqrt{1+\dot y^2(x)}\,\mathrm{d}x$, where $\dot y(x) = \frac{\mathrm{d}y(x)}{\mathrm{d}x}$. So the total length of the path is
$$\int_{x_0}^{x_1}\sqrt{1+\dot y^2(x)}\,\mathrm{d}x. \qquad (1.6)$$
With the exception of the final example, the optimal solution – if one exists at all – is
typically hard to find.
FIGURE 1.4: two candidate functions $x_1(t)$ and $x_2(t)$ joining $(0, x_0)$ and $(T, x_T)$
Definition 1.2.1 (Simplest problem in the calculus of variations). Given a final time T > 0
and a function F : [0, T ] × Rn × Rn → R and states x 0 , x T ∈ Rn , the simplest problem in the cal-
culus of variations is to minimize the cost J (·) defined as
$$J(x(\cdot)) = \int_0^T F(t, x(t), \dot x(t))\,\mathrm{d}t \qquad (1.8)$$
The function J (·) is called the cost function and the integrand F (·) of this cost is sometimes
called the running cost. For n = 1 the problem is visualized in Fig. 1.4. Given the two points
(0, x 0 ) and (T, x T ) each smooth function x(·) between the two points determines a cost J (x(·))
as defined in (1.8) and the problem is to find the function x(·) that minimizes this cost.
This problem can be regarded as an infinite dimensional version of the standard prob-
lem of finding the minimizer z ∗ ∈ Rn of a function K : Rn → R. The difference is that the
function K (z) is replaced by an integral expression J (x(·)) and z ∈ Rn is replaced by functions
x : [0, T ] → Rn .
Most of the time we make the following two assumptions.
Assumption 1.2.2. The function F(t, x, y) in (1.8) is twice continuously differentiable in all its components t, x, y. □
Under the above assumptions we next derive a differential equation that every solution of
the simplest problem in the calculus of variations must satisfy. This differential equation is
the infinite-dimensional generalization of the well-known first-order condition that a z ∗ ∈ Rn
minimizes a differentiable function $K : \mathbb{R}^n \to \mathbb{R}$ only if the gradient vector $\frac{\partial K(z)}{\partial z}$ is zero at $z = z_*$.
FIGURE 1.5: the perturbed function $x_*(t) + \alpha\delta_x(t)$ joining $(0, x_0)$ and $(T, x_T)$
Proof. Suppose x ∗ (·) is a C 2 solution to the simplest problem in the calculus of variations and
let δx (·) be an arbitrary C 2 -function on [0, T ] that is zero at the boundaries,
δx (0) = δx (T ) = 0. (1.11)
Consider the perturbed functions $x(t) := x_*(t) + \alpha\delta_x(t)$ with α ∈ R. Notice that x(t) for every α ∈ R satisfies the boundary conditions $x(0) = x_*(0) = x_0$
and x(T ) = x ∗ (T ) = x T , see Fig. 1.5. Since x ∗ (·) is a minimizing solution for our problem we
have that
$$J(x_*(\cdot)) \le J(x_*(\cdot) + \alpha\delta_x(\cdot)) \qquad \text{for all } \alpha \in \mathbb{R}. \qquad (1.13)$$
For fixed $\delta_x(\cdot)$ the cost $J(x_*(\cdot) + \alpha\delta_x(\cdot))$ is a function of the scalar variable α, which we denote by $\bar J(\alpha) := J(x_*(\cdot) + \alpha\delta_x(\cdot))$.
The minimality condition (1.13) thus says that $\bar J(0) \le \bar J(\alpha)$ for all α ∈ R. Given that $x_*(\cdot)$, $\delta_x(\cdot)$ and F(·) are all assumed C², it follows that $\bar J(\alpha)$ is differentiable and so the above implies that $\bar J'(0) = 0$. This derivative is²
$$\bar J'(0) = \frac{\mathrm{d}}{\mathrm{d}\alpha}\left[\int_0^T F\bigl(t, x_*(t)+\alpha\delta_x(t), \dot x_*(t)+\alpha\dot\delta_x(t)\bigr)\,\mathrm{d}t\right]_{\alpha=0} = \int_0^T \frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial x^T}\delta_x(t) + \frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial \dot x^T}\dot\delta_x(t)\,\mathrm{d}t. \qquad (1.14)$$
² Leibniz' integral rule says that $\frac{\mathrm{d}}{\mathrm{d}\alpha}\int G(\alpha,t)\,\mathrm{d}t = \int \frac{\mathrm{d}G(\alpha,t)}{\mathrm{d}\alpha}\,\mathrm{d}t$ if $G(\alpha,t)$ and $\frac{\mathrm{d}G(\alpha,t)}{\mathrm{d}\alpha}$ are continuous in t and α. Here they are continuous because $F(t,x_*(t),\dot x_*(t))$ and $\delta_x(t)$ are C¹.
Integration by parts of the second term in (1.14) yields³
$$\int_0^T \frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\dot\delta_x(t)\,\mathrm{d}t = \underbrace{\left.\frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\delta_x(t)\right|_0^T}_{=0} - \int_0^T\left(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\right)\delta_x(t)\,\mathrm{d}t = -\int_0^T\left(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\right)\delta_x(t)\,\mathrm{d}t. \qquad (1.15)$$
Here the underbraced term is zero because of the boundary conditions (1.11). Plugging (1.15) into (1.14) and using that $\bar J'(0) = 0$ we find that
$$0 = \int_0^T\left[\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)F(t,x_*(t),\dot x_*(t))\right]^T\delta_x(t)\,\mathrm{d}t. \qquad (1.16)$$
So far the perturbation δx (t ) in our derivation was some fixed function. However since the
perturbation can be arbitrarily chosen, the equality (1.16) must hold for every C 2 perturbation
δx (t ) that satisfies (1.11). But this implies, via the next lemma, that the term in between the
square brackets in (1.16) is zero, i.e. that (1.5.1) holds. ■
FIGURE 1.6: a perturbation $\delta_x(t)$ with support $[a,b]$ around a point $\bar t$ where $\phi(\bar t) \neq 0$
Lemma 1.2.5. Suppose $\phi : [0,T] \to \mathbb{R}^n$ is continuous. Then
$$\int_0^T \phi^T(t)\,\delta_x(t)\,\mathrm{d}t = 0 \qquad (1.17)$$
for every C²-function $\delta_x : [0,T] \to \mathbb{R}^n$ satisfying (1.11), if-and-only if φ(t) is zero for all t ∈ [0,T].
Proof. We prove it for n = 1. Figure 1.6 explains it all: suppose that φ(t ) is not the zero func-
tion, i.e. that φ(t̄ ) is nonzero for some t̄ ∈ [0, T ]. Then by continuity, φ(t ) is sign-definite on
some interval [a, b] ⊆ [0, T ] about t̄ (and with 0 ≤ a < b ≤ T ). Consider the function δx (t )
defined as
$$\delta_x(t) = \begin{cases} \bigl((t-a)(b-t)\bigr)^3 & t\in[a,b],\\ 0 & \text{elsewhere},\end{cases} \qquad (1.18)$$
see Figure 1.6. Clearly this $\delta_x(t)$ fulfils the requirements of (1.11) but it violates (1.17) because both φ(t) and $\delta_x(t)$ are sign-definite on [a,b]. The assumption that $\phi(\bar t) \neq 0$ at some $\bar t \in [0,T]$ hence is wrong. ■
This result was derived independently by Euler and Lagrange, and in honor of its inventors
Eqn. (1.5.1) is nowadays called the Euler-Lagrange equation (or the Euler equation). We want
to stress that the Euler-Lagrange equation is only a necessary condition for optimality. All it
guarantees is that a “small” perturbation of x ∗ (·) results in a “very small” change in cost. To
put it more mathematically, the solutions x ∗ (·) of the Euler-Lagrange equation are precisely
those functions for which for every allowable $\delta_x(\cdot)$ we have
$$J(x_*(\cdot)+\alpha\delta_x(\cdot)) = J(x_*(\cdot)) + o(\alpha)$$
with o some little-o function. Such solutions $x_*(\cdot)$ are referred to as stationary solutions. They
might be minimizing J (x(·)), or maximizing J (x(·)), or neither.
Interestingly the Euler-Lagrange equation does not depend on the initial or final values
x 0 , x T . More on this in § 1.5.
with the boundary conditions y(x 0 ) = y 0 and y(x 1 ) = y 1 . One may expand (1.19) but in this
form the problem is still rather complicated. In the following section we use a more sophisti-
cated approach. □
Example 1.2.7 (Shortest path, Example 1.1.3 continued). The Euler-Lagrange equation for
the shortest path problem described by (1.6) and (1.7) is
$$0 = \left(\frac{\partial}{\partial y} - \frac{\mathrm{d}}{\mathrm{d}x}\frac{\partial}{\partial\dot y}\right)\sqrt{1+\dot y^2(x)} \qquad (1.20)$$
ÿ(x) = 0 (1.22)
which is another way of saying that y(x) is a straight line. In light of the boundary conditions
y(x 0 ) = y 0 and y(x 1 ) = y 1 it has the unique solution
$$y(x) = y_0 + \frac{y_1 - y_0}{x_1 - x_0}(x - x_0). \qquad (1.23)$$
This solution is not surprising. It is of course the solution, but formally we may not yet draw
this conclusion because the theory presented so far can handle only C 2 functions and it only
claims that solutions of (1.21) are stationary solutions and they need not be optimal. □
Example 1.2.8 (Oil production, Example 1.1.2 continued). Corresponding to the criterion to
be minimized, (1.4), with the boundary conditions (1.5), we find the Euler-Lagrange equation
$$0 = \left(\frac{\partial}{\partial\ell} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot\ell}\right)\bigl(\alpha\dot\ell^2(t)+\beta\ell(t)\bigr) = \beta - \frac{\mathrm{d}}{\mathrm{d}t}\bigl(2\alpha\dot\ell(t)\bigr) = \beta - 2\alpha\ddot\ell(t). \qquad (1.24)$$
So $\ddot\ell(t) = \frac{\beta}{2\alpha}$, that is,
$$\ell(t) = \frac{\beta}{4\alpha}t^2 + \ell_1 t + \ell_0. \qquad (1.25)$$
The constants ℓ₁ and ℓ₀ follow from the boundary conditions ℓ(0) = 0 and ℓ(T) = L, i.e. ℓ₀ = 0, ℓ₁ = L/T − β/(4α)T. Of course, it still remains to be seen that the above ℓ(t) defined in (1.25) is indeed minimizing (1.4). Notice that the extra constraint $\dot\ell(t) \ge 0$ from (1.5) puts a further restriction on the total amount of L and the final time T. □
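As a quick numerical check of this schedule, the minimal sketch below (Python; the parameter values for α, β, L and T are hypothetical, chosen only for illustration) evaluates (1.25) and verifies the boundary conditions and the rate constraint $\dot\ell(t)\ge 0$ from (1.5).

```python
import numpy as np

# Hypothetical parameter values, for illustration only.
alpha, beta = 2.0, 1.0   # production and storage cost weights
L, T = 10.0, 5.0         # ordered amount and delivery time

# Stationary schedule (1.25): l(t) = beta/(4 alpha) t^2 + l1 t,  with l1 = L/T - beta/(4 alpha) T
l1 = L / T - beta / (4 * alpha) * T
t = np.linspace(0.0, T, 6)
l = beta / (4 * alpha) * t**2 + l1 * t
ldot = beta / (2 * alpha) * t + l1

print("l(0) =", l[0], " l(T) =", l[-1])    # boundary conditions: 0 and L
print("min production rate:", ldot.min())  # (1.5) requires this to be >= 0
# The rate is nonnegative iff l1 >= 0, i.e. iff L/T >= beta*T/(4*alpha).
```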
1.3 Beltrami identity
Suppose now that the running cost does not depend explicitly on time, F(x, ẋ). Obviously the partial derivative ∂F/∂t is zero now. An interesting implication is that then
$$\left(1 - \dot x^T(t)\frac{\partial}{\partial\dot x}\right)F(x(t),\dot x(t))$$
is constant over time for solutions x(t ) of the Euler-Lagrange equation. To see this, we dif-
ferentiate the above with respect to time (and for ease of notation we momentarily write x(t )
simply as x):
$$\frac{\mathrm{d}}{\mathrm{d}t}\left(F(x,\dot x) - \dot x^T\frac{\partial F(x,\dot x)}{\partial\dot x}\right) = \frac{\mathrm{d}}{\mathrm{d}t}F(x,\dot x) - \frac{\mathrm{d}}{\mathrm{d}t}\left(\dot x^T\frac{\partial F(x,\dot x)}{\partial\dot x}\right) = \left(\dot x^T\frac{\partial F(x,\dot x)}{\partial x} + \ddot x^T\frac{\partial F(x,\dot x)}{\partial\dot x}\right) - \left(\ddot x^T\frac{\partial F(x,\dot x)}{\partial\dot x} + \dot x^T\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F(x,\dot x)}{\partial\dot x}\right) = \dot x^T\left(\frac{\partial F(x,\dot x)}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F(x,\dot x)}{\partial\dot x}\right). \qquad (1.26)$$
This is zero for every solution x(t ) of the Euler-Lagrange equation. Hence every stationary
solution x ∗ (t ) to our problem has the property that
$$F(x_*(t),\dot x_*(t)) - \dot x_*^T(t)\frac{\partial F(x_*(t),\dot x_*(t))}{\partial\dot x} = C \qquad \forall t$$
for some integration constant C . This identity is known as the Beltrami identity. We illustrate
the usefulness of this identity by explicitly solving the brachistochrone problem. It is good to
realize, though, that the Beltrami identity is not equivalent to the Euler-Lagrange equation.
Indeed, every constant function x(t ) satisfies the Beltrami identity. From (1.26) it can be seen
that the Beltrami identity and the Euler-Lagrange equation are equivalent for scalar functions
x : [0, T ] → R if ẋ(t ) is nonzero for almost all time. It is good to keep this in mind.
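The identity (1.26) is easy to verify symbolically for a concrete scalar running cost. The sketch below (Python/sympy) does so for the hypothetical choice F(x, ẋ) = ẋ² + x², checking that the time derivative of the Beltrami function equals ẋ times the Euler-Lagrange expression along an arbitrary curve x(t).

```python
import sympy as sp

t = sp.symbols('t')
x = sp.Function('x')(t)
xdot = sp.diff(x, t)

# A concrete time-independent running cost, chosen only for illustration.
F = xdot**2 + x**2

Fx    = sp.diff(F, x)      # dF/dx
Fxdot = sp.diff(F, xdot)   # dF/dxdot

beltrami = F - xdot * Fxdot                 # the Beltrami function
lhs = sp.diff(beltrami, t)                  # its time derivative
rhs = xdot * (Fx - sp.diff(Fxdot, t))       # xdot * (Euler-Lagrange expression), cf. (1.26)

print(sp.simplify(lhs - rhs))   # 0: identity (1.26) holds along any curve x(t)
```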
Example 1.3.1 (Brachistochrone, Example 1.1.1 continued). The running cost F (x, y, ẏ) in
the brachistochrone problem is
$$F(y,\dot y) = \frac{\sqrt{1+\dot y^2}}{\sqrt{2 g y}}.$$
FIGURE 1.7: Top: shown in red is the cycloid $x(\phi) = \frac{c^2}{2}(\phi-\sin\phi)$, $y(\phi) = \frac{c^2}{2}(1-\cos\phi)$ for φ ∈ [0, 2π]. It is the curve that a point on a rolling disc of radius c²/2 traces out. Bottom: a downwards facing cycloid (solution of the Brachistochrone problem). See Example 1.3.1
FIGURE 1.8: Cycloids (1.28) for various c > 0. Given a B to the right and below A = (0,0) there is a unique cycloid that joins A and B, see Example 1.3.1
The Beltrami identity hence states that
$$C = F(y(x),\dot y(x)) - \dot y(x)\frac{\partial F(y(x),\dot y(x))}{\partial\dot y} = \frac{\sqrt{1+\dot y^2(x)}}{\sqrt{2gy(x)}} - \frac{\dot y^2(x)}{\sqrt{2gy(x)}\sqrt{1+\dot y^2(x)}} = \frac{1}{\sqrt{2gy(x)\bigl(1+\dot y^2(x)\bigr)}}$$
for some constant C. Equivalently, with $c^2 := 1/(2gC^2)$,
$$y(x)\bigl(1+\dot y^2(x)\bigr) = c^2. \qquad (1.27)$$
Its solutions can be expressed in parametric form⁴ as
$$x(\phi) = \frac{c^2}{2}(\phi - \sin\phi), \qquad y(\phi) = \frac{c^2}{2}(1-\cos\phi). \qquad (1.28)$$
The curve (x(φ), y(φ)) is known as the cycloid. It is the curve that a fixed point on the boundary of a wheel with radius c²/2 traces out while rolling without slipping on a horizontal line
⁴ Quick derivation: since the cotangent cos(φ/2)/sin(φ/2) for φ ∈ [0, 2π] ranges over all real numbers once (including ±∞) it follows that any dy/dx can uniquely be written as dy/dx = cos(φ/2)/sin(φ/2) with φ ∈ [0, 2π]. Then (1.27) implies that y = c²/(1 + cos²(φ/2)/sin²(φ/2)) = c² sin²(φ/2) = c²(1 − cos(φ))/2, and then dx/dφ = (dy/dφ)/(dy/dx) = [c² sin(φ/2)cos(φ/2)]/[cos(φ/2)/sin(φ/2)] = c² sin²(φ/2) = c²(1 − cos(φ))/2. Integrating this expression shows that x(φ) = c²(φ − sin(φ))/2 + d where d is some integration constant. This d = 0 because (x, y) = (0, 0) is on the curve. (See Exercise 1.6 for more details.)
(think of the valve on your bike’s wheel), see Fig. 1.7. For the cycloid, Beltrami and Euler-
Lagrange are equivalent because ẏ(x) is nonzero almost everywhere. Hence all smooth-
enough stationary solutions of the Brachistochrone problem are precisely these cycloids.
Varying c in (1.28) generates a family of cycloids, see Fig. 1.8. Given a destination point
B to the right and below A = (0, 0) there is a unique cycloid that connects A and B , and the
solution of the brachistochrone problem is that segment of the cycloid. Notice that for certain
destinations B the curve extends below the final destination! □
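The parameters of the cycloid through a given destination can be found with one scalar root-finding step: by (1.28) the ratio y/x depends on φ only. A minimal numerical sketch (Python/scipy; the destination coordinates are made up, and y is measured downwards as in Example 1.1.1):

```python
import numpy as np
from scipy.optimize import brentq

def cycloid_through(xB, yB):
    """Return (c2, phiB) such that the cycloid (1.28) passes through (xB, yB)."""
    # (1.28): x = c^2/2 (phi - sin phi), y = c^2/2 (1 - cos phi),
    # so y/x depends on phi only: solve (1 - cos phi)/(phi - sin phi) = yB/xB.
    g = lambda phi: (1 - np.cos(phi)) / (phi - np.sin(phi)) - yB / xB
    phiB = brentq(g, 1e-6, 2 * np.pi - 1e-9)   # the ratio decreases from +inf to 0 on (0, 2*pi)
    c2 = 2 * xB / (phiB - np.sin(phiB))
    return c2, phiB

# Hypothetical destination: B to the right of and below A = (0, 0).
c2, phi = cycloid_through(3.0, 1.0)
print(c2 / 2 * (phi - np.sin(phi)), c2 / 2 * (1 - np.cos(phi)))   # reproduces (3.0, 1.0)
```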
FIGURE 1.9: a positive function y(x) on [−1, 1] with y(−1) = y(1) = y_max, whose surface of revolution about the x-axis is to have minimal area
Example 1.3.2 (Minimal surface). Warning, this is an elaborate example. We seek a positive
function y(x) whose surface of revolution about the x-axis has minimal area, see Fig. 1.9. The
length of the axis we take equal to 2 and we take it symmetric around zero, and the boundary
conditions we assume to be the same, $y(-1) = y(1) = y_{\max}$.
FIGURE 1.10: (a) $y_{\max} = a\cosh(1/a)$ as a function of a. The minimal $y_{\max}$ is $y_* \approx 1.509$ (attained at $a_* \approx 0.834$). (b) the area of the catenoid as a function of $y_{\max}$. (c) the area of the catenoid (in red) and of the Goldschmidt two-disc solution (in blue) as a function of $y_{\max}$. The areas are the same at $y_G = 1.895$. This $y_G$ corresponds to $a_G = 1.564$ (see part (a) of this figure)
Since the radius y(x) is nonnegative we have that a defined as a = C/(2π) is ≥ 0. Squaring left and right hand-side we end up with
$$y^2(x) = a^2\bigl(1+\dot y^2(x)\bigr). \qquad (1.29)$$
Its positive solutions that are even in x are the hyperbolic cosines
$$y(x) = a\cosh(x/a).$$
This can be derived using separation of variables⁵ (see Appendix A.3). Figure 1.9 is an example of such a hyperbolic cosine. The two-dimensional surface of revolution of a hyperbolic cosine is called a catenoid. From the shape of hyperbolic cosines it will be clear that for every a > 0 the derivative ẏ(x) is nonzero almost everywhere, and so Beltrami and Euler-Lagrange are equivalent.
Are such hyperbolic cosines optimal solutions? Not necessarily, and Figure 1.10(a) con-
firms this. It shows the boundary value $y_{\max} = a\cosh(1/a)$ as a function of a. This function has a minimum $y_* \approx 1.509$, attained at $a_* \approx 0.834$, so for $y_{\max} > y_*$ there are two values of a meeting the boundary conditions, and for $y_{\max} < y_*$ there are none. It is interesting to plot the area of the catenoid against $y_{\max} = a\cosh(1/a)$. This is done in Fig. 1.10(b). The black
curve is for a < a ∗ and the red curve is for a > a ∗ . This shows that for a given y max > y ∗ the
area of the catenoid is the smallest for the largest of the two a’s. Thus we need only consider
a ≥ a∗ .
Now the case y max < y ∗ . Then no hyperbolic cosine meets the boundary condition. What
does this mean? It means that no smooth function y(x) exists that is stationary and satisfies
y max < y ∗ . A deeper analysis (see Exercise 1.8) shows that the only other stationary curve is
the so-called Goldschmidt two-disc solution. This is when y(x) = 0 in the interior, and y(±1) =
y_max at the boundary, see Fig. 1.11. In this case the area of the surface of revolution is the sum of the areas of the two discs, $2\pi y_{\max}^2$.
It can be shown that a global minimal solution exists and since it must be stationary it
is either the Goldschmidt two-disc solution or the catenoid for an appropriate a ≥ a ∗ . If
y max < y ∗ then clearly the Goldschmidt solution is the only stationary solution, hence must
be globally optimal. Now, for y max > y ∗ something odd occurs: Fig. 1.10(c) gives us the area of
the surface of revolution of the Goldschmidt two-disc solution as well as that of the catenoid.
We see that there is a point y G = 1.8950254556 at which the Goldschmidt and catenoid so-
lution have the same area. This point is attained at a G = 1.5643765887. For y max > y G the
catenoid (for the corresponding a > a G ) has the smallest area, hence is then globally optimal,
but for $y_{\max} < y_G$ it is the Goldschmidt two-disc solution that is globally optimal. The conclusion is that the optimal solution is discontinuous in $y_{\max}$! □
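The numbers y_*, a_*, y_G and a_G quoted above can be reproduced with a few lines of numerics. The sketch below (Python/scipy) uses the catenoid area 2πa(1 + (a/2) sinh(2/a)), obtained by integrating 2π y√(1+ẏ²) for y = a cosh(x/a) over [−1, 1], and the Goldschmidt area 2π y_max² with y_max = a cosh(1/a).

```python
import numpy as np
from scipy.optimize import minimize_scalar, brentq

ymax             = lambda a: a * np.cosh(1.0 / a)                  # boundary value of the catenary
area_catenoid    = lambda a: 2*np.pi*a*(1 + a/2*np.sinh(2/a))      # 2*pi * int y*sqrt(1+y'^2) dx
area_goldschmidt = lambda a: 2*np.pi*ymax(a)**2                    # two discs of radius ymax

# smallest reachable boundary value y* (attained at a*)
res = minimize_scalar(ymax, bounds=(0.2, 3.0), method='bounded')
print("a* = %.3f,  y* = %.3f" % (res.x, res.fun))                  # ~0.834 and ~1.509

# crossover a_G where catenoid and Goldschmidt areas coincide (restrict to a >= a*)
aG = brentq(lambda a: area_catenoid(a) - area_goldschmidt(a), res.x, 5.0)
print("a_G = %.3f,  y_G = %.3f,  area = %.2f" % (aG, ymax(aG), area_catenoid(aG)))
# ~1.564, ~1.895 and ~22.56, matching Fig. 1.10(c)
```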
⁵ There is, however, a technicality in this derivation that is often overlooked, see Exercise 1.7, but we need not worry about it here.
FIGURE 1.11: the Goldschmidt two-disc solution: two discs of radius $y_{\max}$ at $x = \pm 1$, joined by the segment y = 0
1.4 Higher-order Euler-Lagrange equation
$$x(0) = x_0,\quad x(T) = x_T,\qquad \dot x(0) = x_0^d,\quad \dot x(T) = x_T^d \qquad (1.32)$$
Proof. Define $\bar J(\alpha) = J(x_*(\cdot)+\alpha\delta_x(\cdot))$. Then, as before, the derivative $\bar J'(0)$ is zero. Analogously to (1.14) we compute $\bar J'(0)$. For ease of exposition we momentarily skip all time arguments in $x_*(t)$ and $\delta_x(t)$ and, sometimes, F:
$$0 = \bar J'(0) = \frac{\mathrm{d}}{\mathrm{d}\alpha}\left[\int_0^T F\bigl(t, x_*+\alpha\delta_x, \dot x_*+\alpha\dot\delta_x, \ddot x_*+\alpha\ddot\delta_x\bigr)\,\mathrm{d}t\right]_{\alpha=0} = \int_0^T \frac{\partial F}{\partial x^T}\delta_x + \frac{\partial F}{\partial\dot x^T}\dot\delta_x + \frac{\partial F}{\partial\ddot x^T}\ddot\delta_x\,\mathrm{d}t. \qquad (1.34)$$
Integration by parts of the second term of the integrand yields
$$\int_0^T \frac{\partial F}{\partial\dot x^T}\dot\delta_x\,\mathrm{d}t = \underbrace{\left[\frac{\partial F}{\partial\dot x^T}\delta_x\right]_0^T}_{=0} - \int_0^T \frac{\mathrm{d}}{\mathrm{d}t}\Bigl(\frac{\partial F}{\partial\dot x^T}\Bigr)\delta_x\,\mathrm{d}t = -\int_0^T \frac{\mathrm{d}}{\mathrm{d}t}\Bigl(\frac{\partial F}{\partial\dot x^T}\Bigr)\delta_x\,\mathrm{d}t.$$
The last equality follows from the boundary condition that δx (0) = δx (T ) = 0. Integration by
parts of the third term in (1.34) similarly gives
$$\int_0^T \frac{\partial F}{\partial\ddot x^T}\ddot\delta_x\,\mathrm{d}t = \underbrace{\left[\frac{\partial F}{\partial\ddot x^T}\dot\delta_x\right]_0^T}_{=0} - \int_0^T \Bigl(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial\ddot x^T}\Bigr)\dot\delta_x\,\mathrm{d}t = -\int_0^T \Bigl(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial\ddot x^T}\Bigr)\dot\delta_x\,\mathrm{d}t$$
FIGURE 1.12: an elastic bar supported at z = 0 and z = ℓ with vertical displacement y(z), y(0) = y(ℓ) = 0
where now the second equality is the result of the boundary conditions that $\dot\delta_x(0) = \dot\delta_x(T) = 0$. Integrating the term just obtained by parts once more yields
$$-\int_0^T \Bigl(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial F}{\partial\ddot x^T}\Bigr)\dot\delta_x\,\mathrm{d}t = \underbrace{-\left[\frac{\mathrm{d}}{\mathrm{d}t}\Bigl(\frac{\partial F}{\partial\ddot x^T}\Bigr)\delta_x\right]_0^T}_{=0} + \int_0^T \Bigl(\frac{\mathrm{d}^2}{\mathrm{d}t^2}\frac{\partial F}{\partial\ddot x^T}\Bigr)\delta_x\,\mathrm{d}t.$$
Combination of both integration by parts procedures (once for the second term, and twice for the third term) and Lemma 1.2.5 thus yields (1.33). ■
Example 1.4.2 (Elastic bar). Consider a horizontal elastic bar, loaded by weights and sup-
ported at its two ends. The equilibrium of the bar is determined by the condition that its po-
tential energy is minimal, see Fig. 1.12. Denote by ` the length of the bar, and by z ∈ [0, `] the
spatial variable. The small vertical displacement caused by the loading of the bar is denoted
by y(z), while ρ(z) denotes the load per unit length. We assume that the bar has a uniform
cross-section (independent of z). If the curvature of the elastic bar is not too large then the
potential energy due to elastic forces is proportional to the square of the second derivative,
$$V_1 = \frac{k}{2}\int_0^\ell \Bigl(\frac{\mathrm{d}^2 y(z)}{\mathrm{d}z^2}\Bigr)^2\,\mathrm{d}z$$
where k is a constant depending on the elasticity of the bar. Furthermore, the potential energy
due to gravity is given by
$$V_2 = g\int_0^\ell \rho(z)\,y(z)\,\mathrm{d}z$$
hence the minimum of the potential energy is obtained by minimizing the integral
$$\int_0^\ell \frac{k}{2}\Bigl(\frac{\mathrm{d}^2 y(z)}{\mathrm{d}z^2}\Bigr)^2 + g\rho(z)\,y(z)\,\mathrm{d}z.$$
The Euler-Lagrange equation (1.33) of this variational problem is the fourth-order differential
equation
$$k\frac{\mathrm{d}^4 y(z)}{\mathrm{d}z^4} = -g\rho(z) \qquad \forall z \in [0,\ell]. \qquad (1.35)$$
If ρ(z) is constant then y(z) is a polynomial of degree 4. Fig. 1.12 depicts a solution for the boundary conditions y(0) = y(ℓ) = 0 and ẏ(0) = ẏ(ℓ) = 0. In this case the solution is $y(z) = -\frac{g\rho}{4!\,k}\bigl(z(z-\ell)\bigr)^2$. □
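For constant load ρ the claimed solution can be checked symbolically. A short sketch (Python/sympy):

```python
import sympy as sp

z, ell, k, g, rho = sp.symbols('z ell k g rho', positive=True)

# Candidate from Example 1.4.2 for constant load rho:
y = -g*rho/(sp.factorial(4)*k) * (z*(z - ell))**2

print(sp.simplify(k*sp.diff(y, z, 4) + g*rho))   # 0: (1.35) is satisfied
print(y.subs(z, 0), sp.simplify(y.subs(z, ell))) # 0, 0: y(0) = y(ell) = 0
dy = sp.diff(y, z)
print(dy.subs(z, 0), sp.simplify(dy.subs(z, ell)))  # 0, 0: clamped ends ydot(0) = ydot(ell) = 0
```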
1.5 Relaxed boundary conditions
In the problems considered so far, the initial state x(0) and final state x(T ) were fixed. A useful
extension is obtained by removing some of these conditions. This means that we allow more
functions x(t ) to optimize over, and, consequently, that the first-order conditions expand. As
an example, suppose the state x(t ) has 3 components and that the first component of the
initial state and the last component of the final state are now free to choose,
$$x(0) = \begin{bmatrix}\text{free}\\ \text{fixed}\\ \text{fixed}\end{bmatrix},\qquad x(T) = \begin{bmatrix}\text{fixed}\\ \text{fixed}\\ \text{free}\end{bmatrix}. \qquad (1.36)$$
Following the exact same arguments as in the proof of Thm. 1.2.4, the necessary first-order
condition is that
$$\left.\frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\delta_x(t)\right|_0^T + \int_0^T\left(\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)F(t,x_*(t),\dot x_*(t))\right)^T\delta_x(t)\,\mathrm{d}t = 0 \qquad (1.37)$$
at the optimal solution for all allowable perturbations. In particular it must be zero for all
perturbations δx (t ) that are zero at t = 0 and t = T . For these special perturbations the first-
order condition reduces to
$$\int_0^T\left(\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)F(t,x_*(t),\dot x_*(t))\right)^T\delta_x(t)\,\mathrm{d}t = 0$$
for all such δx (t ). But this means precisely that the Euler-Lagrange equation must hold. This
proves that also for relaxed boundary conditions the Euler-Lagrange equations hold (this was
to be expected). Knowing this, the first-order conditions (1.37) simplify to
$$\left.\frac{\partial F(t,x_*(t),\dot x_*(t))}{\partial\dot x^T}\delta_x(t)\right|_0^T = 0. \qquad (1.38)$$
When is this equal to zero for all allowable perturbations? Since the perturbed state x(t ) =
x ∗ (t ) + αδx (t ) for our example must obey the boundary condition (1.36) it follows that the
allowable perturbations are exactly those that satisfy
$$\delta_x(0) = \begin{bmatrix}\text{free}\\ 0\\ 0\end{bmatrix},\qquad \delta_x(T) = \begin{bmatrix}0\\ 0\\ \text{free}\end{bmatrix}.$$
In view of this it will be clear that the first-order condition (1.38) holds for all allowable δx (t )
iff
$$\frac{\partial F(0,x(0),\dot x(0))}{\partial\dot x} = \begin{bmatrix}0\\ \text{free}\\ \text{free}\end{bmatrix},\qquad \frac{\partial F(T,x(T),\dot x(T))}{\partial\dot x} = \begin{bmatrix}\text{free}\\ \text{free}\\ 0\end{bmatrix}.$$
This example demonstrates that to every initial or final state component that is free to choose
there corresponds a condition on the derivative of F (·) with respect to that component of ẋ.
Incidentally, by allowing states with free entries at initial and final time, it can now make sense
to include an initial- and/or a final cost to the cost function:
$$J(x(\cdot)) = \int_0^T F(t,x(t),\dot x(t))\,\mathrm{d}t + G(x(0)) + S(x(T)). \qquad (1.39)$$
Here G(x(0)) denotes an initial cost and S(x(T )) a final cost (aka terminal cost). The addi-
tion of these two costs does not complicate matters much: the above example generalizes as
follows.
Proposition 1.5.1 (Relaxed boundary conditions). Let T > 0 and suppose F (t , x, y) is C 2 and
that S(x),G(x) are C 1 . Let I, J be subsets of {1, . . . , n} and consider the functions x : [0, T ] → Rn
whose initial x(0) and final x(T ) are fixed except for the components:
Among these functions, a C 2 function x ∗ (·) is a stationary solution of the cost (1.39) if-and-
only if it satisfies the Euler-Lagrange equation together with
A common special case of this is the free end-point problem, which is when the initial
state is completely fixed and the final state is completely free. This means I = ∅ and J = {1, ..., n} and so (1.41) holds for the entire state vector x ∈ Rⁿ:
Example 1.5.2. Let α ∈ R and consider minimizing the cost
$$J(x(\cdot)) = \int_{-1}^{1} \alpha^2 x^2(t) + \dot x^2(t)\,\mathrm{d}t \qquad (1.43)$$
over all functions x : [−1, 1] → R. First we solve the standard problem, so with both initial and final state fixed, for instance, assume that
$$x(-1) = 1, \qquad x(1) = 1. \qquad (1.44)$$
The running cost α2 x 2 (t ) + ẋ 2 (t ) is a sum of two squares, so with minimization we would like
both terms small, but one depends on the other. The parameter α models a trade-off between
making ẋ 2 (t ) small and x 2 (t ) small. Whatever α is, the optimal solution x(t ) needs to satisfy
the Euler-Lagrange equation,
$$0 = \left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)\bigl(\alpha^2 x^2(t) + \dot x^2(t)\bigr) = 2\alpha^2 x(t) - \frac{\mathrm{d}}{\mathrm{d}t}\bigl(2\dot x(t)\bigr) = 2\alpha^2 x(t) - 2\ddot x(t).$$
Therefore $\ddot x(t) = \alpha^2 x(t)$. This differential equation can be solved using characteristic equations (do this yourself, see Appendix A.4) and the general solution is
$$x(t) = c\,\mathrm{e}^{\alpha t} + d\,\mathrm{e}^{-\alpha t} \qquad (1.45)$$
with c, d two arbitrary constants. The two constants follow from the two boundary conditions (1.44):
$$c\,\mathrm{e}^{-\alpha} + d\,\mathrm{e}^{\alpha} = 1, \qquad c\,\mathrm{e}^{\alpha} + d\,\mathrm{e}^{-\alpha} = 1.$$
The solution is $c = d = 1/(\mathrm{e}^{\alpha} + \mathrm{e}^{-\alpha})$. That c equals d is expected because of the symmetry of the boundary conditions. We see that there is exactly one function x(t) that satisfies the Euler-Lagrange equation and that meets the boundary conditions:
$$x_*(t) = \frac{\mathrm{e}^{\alpha t} + \mathrm{e}^{-\alpha t}}{\mathrm{e}^{\alpha} + \mathrm{e}^{-\alpha}}.$$
For α = 0 the solution is a constant $x_*(t) = 1$ which, in hindsight, is not a surprise because for α = 0 the running cost is just F(t, x(t), ẋ(t)) = ẋ²(t) and clearly then a zero derivative (a constant x(t)) is optimal. For large values of α, on the other hand, x²(t) is penalized strongly in F(t, x(t), ẋ(t)) = ẋ²(t) + α²x²(t) and so then it pays to take x(t) close to zero, even if that is at the expense of some increase of ẋ²(t). Indeed this is what happens.
Consider next the free end-point problem with x(−1) = 1 and x(1) free.
We stick to the same cost function (1.43). In the terminology of (1.39) this means we take the initial and final cost equal to zero, G(x) = S(x) = 0. Hence $\frac{\partial S(x(T))}{\partial x} = 0$ and then the free end-point boundary constraint (1.42) becomes
$$0 = \left.\frac{\partial\bigl(\alpha^2 x^2(t) + \dot x^2(t)\bigr)}{\partial\dot x}\right|_{t=1} = 2\dot x(1).$$
The parameters c, d in (1.45) now follow from the initial condition x(−1) = 1 and the above boundary constraint 0 = ẋ(1):
$$c\,\mathrm{e}^{-\alpha} + d\,\mathrm{e}^{\alpha} = 1, \qquad \alpha\bigl(c\,\mathrm{e}^{\alpha} - d\,\mathrm{e}^{-\alpha}\bigr) = 0.$$
The solution is
$$c = \frac{\mathrm{e}^{-\alpha}}{\mathrm{e}^{2\alpha} + \mathrm{e}^{-2\alpha}}, \qquad d = \frac{\mathrm{e}^{\alpha}}{\mathrm{e}^{2\alpha} + \mathrm{e}^{-2\alpha}}$$
(check it yourself). So again the first-order conditions have a unique solution,
$$x_{*\mathrm{free}}(t) = \frac{\mathrm{e}^{\alpha(t-1)} + \mathrm{e}^{-\alpha(t-1)}}{\mathrm{e}^{2\alpha} + \mathrm{e}^{-2\alpha}}.$$
The free end-point condition is that the derivative of x(t) is zero at the final time. Again we see that x(t) approaches zero faster if α increases. Makes sense. □
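The two solutions of this example are easy to check numerically. A minimal sketch (Python; α is an arbitrary nonzero value) verifies the boundary conditions of both solutions and the Euler-Lagrange equation ẍ = α²x by a finite difference:

```python
import numpy as np

alpha = 1.0   # any nonzero alpha will do

# fixed end points: x(-1) = x(1) = 1
x_fixed = lambda t: (np.exp(alpha*t) + np.exp(-alpha*t)) / (np.exp(alpha) + np.exp(-alpha))

# free end point: x(-1) = 1 and xdot(1) = 0
den       = np.exp(2*alpha) + np.exp(-2*alpha)
x_free    = lambda t: (np.exp(alpha*(t-1)) + np.exp(-alpha*(t-1))) / den
xdot_free = lambda t: alpha*(np.exp(alpha*(t-1)) - np.exp(-alpha*(t-1))) / den

print(x_fixed(-1.0), x_fixed(1.0))    # both 1.0
print(x_free(-1.0), xdot_free(1.0))   # 1.0 and 0.0

# Euler-Lagrange xddot = alpha^2 x, checked by a central difference at a few points
t = np.linspace(-0.9, 0.9, 5); h = 1e-4
xddot = (x_free(t+h) - 2*x_free(t) + x_free(t-h)) / h**2
print(np.max(np.abs(xddot - alpha**2 * x_free(t))))   # ~0 up to finite-difference error
```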
1.6 Intermezzo: Lagrangian Mechanics
Let q ∈ Rⁿ denote the position in Cartesian coordinates of some point mass. Newton's second law states that the derivative of momentum $\frac{\mathrm{d}}{\mathrm{d}t}(m\dot q(t))$ of a point mass with mass m equals the net force F(t) exerted on the mass,
$$\frac{\mathrm{d}}{\mathrm{d}t}\bigl(m\dot q(t)\bigr) = F(t). \qquad (1.46)$$
In a conservative force field (such as the gravitational force field) the force depends on q alone and can be expressed as the negative of a gradient, $F(q) = -\frac{\partial}{\partial q}U(q)$, of some function U : Rⁿ → R called the potential energy. Then the equation of motion (1.46) becomes $\frac{\mathrm{d}}{\mathrm{d}t}(m\dot q(t)) = -\frac{\partial}{\partial q}U(q(t))$. We reorder it as
$$\underbrace{-\frac{\partial}{\partial q}U(q(t))}_{\text{net force}} - \underbrace{\frac{\mathrm{d}}{\mathrm{d}t}\bigl(m\dot q(t)\bigr)}_{\text{derivative of momentum}} = 0. \qquad (1.47)$$
FIGURE 1.13: a pendulum: a mass m on a cable of length ℓ, at angle φ from the vertical hanging position
Example 1.6.1 (Pendulum). Consider the pendulum (Fig. 1.13). Here polar coordinates are
more appropriate than Cartesian coordinates. The mass of the particle is denoted m, the
angle with respect to the vertical hanging position is denoted φ(t ) and the length of cable is
ℓ. The kinetic energy then is $K(\dot\phi) = \tfrac{m}{2}\ell^2\dot\phi^2$ and the potential energy is $U(\phi) = -mg\ell\cos(\phi)$.
The angle φ is our (generalized) coordinate vector q. Now Euler-Lagrange says that the equation of motion is determined by
$$0 = \left(\frac{\partial}{\partial\phi} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot\phi}\right)L(q(t),\dot q(t)) = -mg\ell\sin(\phi(t)) - \frac{\mathrm{d}}{\mathrm{d}t}\bigl(m\ell^2\dot\phi(t)\bigr).$$
That is the familiar $\ddot\phi(t) = -\frac{g}{\ell}\sin(\phi(t))$. □
The integral $\int_0^T L(q(t),\dot q(t))\,\mathrm{d}t$ of the Lagrangian $L(q,\dot q) := K(\dot q) - U(q)$ is referred to as the action integral, and Newton's second law just means that trajectories q(t) are stationary functions of this action integral. This stationarity property is known as Hamilton's principle.
Since L(q, q̇) does not explicitly depend on time, we have according to Beltrami that
$$\frac{\partial L(q(t),\dot q(t))}{\partial\dot q^T}\,\dot q(t) - L(q(t),\dot q(t)) \qquad (1.51)$$
is constant over time for all trajectories. Exploiting the quadratic nature of the kinetic energy (1.50), this constant function is $K(\dot q) + U(q)$. It is nothing but the sum of kinetic and potential energy, i.e. the total energy. This proves that
the total energy K (q̇) +U (q) is preserved over time for all trajectories in a conservative force
field.
It is easy to see that the Beltrami function (1.51) equals the Hamiltonian H̃(q, p) defined as
$$\tilde H(q,p) := p^T\dot q - L(q,\dot q) \qquad (1.52)$$
for p equal to the generalized momentum
$$p = \frac{\partial L(q,\dot q)}{\partial\dot q}. \qquad (1.53)$$
By the way, for p equal to the momentum (1.53), the Hamiltonian H̃(q, p) defined in (1.52) is indeed a function of q, p alone (and not q̇) because ∂H̃/∂q̇ = p − ∂L(q, q̇)/∂q̇ = 0. Since the Beltrami function is constant over time, obviously also the Hamiltonian $\tilde H\bigl(q, \frac{\partial L(q,\dot q)}{\partial\dot q}\bigr)$ is constant over time. With Hamiltonians and generalized momenta we can expand the Euler-Lagrange equation (1.49) into two first-order differential equations, known as the Hamiltonian equations
$$\dot q(t) = \frac{\partial\tilde H(q(t),p(t))}{\partial p}, \qquad \dot p(t) = -\frac{\partial\tilde H(q(t),p(t))}{\partial q}.$$
Example 1.6.2 (Newton’s apple). Consider an apple of mass m subject to a downwards grav-
itational acceleration g . With y the upwards position, the difference of kinetic and potential
energy becomes
$$L(y,\dot y) = \tfrac{1}{2}m\dot y^2 - mgy.$$
Now the momentum by definition is $p = \frac{\partial L(y,\dot y)}{\partial\dot y} = m\dot y$ and the Hamiltonian is
$$\tilde H(y,p) = p\dot y + mgy - \tfrac{1}{2}m\dot y^2 = \frac{p^2}{2m} + mgy.$$
As predicted, the dependence on ẏ cancels in H̃ . The two first-order differential equations
(i.e. the Hamiltonian equations) are
$$\dot y(t) = \frac{\partial\tilde H(y(t),p(t))}{\partial p} = \frac{1}{m}p(t), \qquad \dot p(t) = -\frac{\partial\tilde H(y(t),p(t))}{\partial y} = -mg.$$
Example 1.6.3 (Pendulum). In the pendulum example (Example 1.6.1) we took the angle q :=
φ as (generalized) coordinate, and we found the Lagrangian
$$L(\phi,\dot\phi) = \frac{m}{2}\ell^2\dot\phi^2 + mg\ell\cos(\phi).$$
Its generalized momentum p (with respect to this q = φ) is
$$p := \frac{\partial L(q,\dot q)}{\partial\dot\phi} = m\ell^2\dot\phi$$
which is known as the angular momentum. Its Hamiltonian is
$$\tilde H(\phi,p) = p\dot\phi - \frac{m}{2}\ell^2\dot\phi^2 - mg\ell\cos(\phi) = \frac{p^2}{2m\ell^2} - mg\ell\cos(\phi).$$
As predicted, q̇ := φ̇ cancels in the Hamiltonian. The total energy equals
$$\tilde H(\phi, m\ell^2\dot\phi) = \frac{m}{2}\ell^2\dot\phi^2 - mg\ell\cos(\phi)$$
and the Hamiltonian equations are
$$\dot\phi(t) = \frac{\partial\tilde H(\phi(t),p(t))}{\partial p} = \frac{p(t)}{m\ell^2}, \qquad \dot p(t) = -\frac{\partial\tilde H(\phi(t),p(t))}{\partial\phi} = -mg\ell\sin(\phi(t)).$$
□
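A brief numerical sketch (Python/scipy; the values m = ℓ = 1, g = 9.81 are hypothetical) that integrates these Hamiltonian equations and confirms that H̃(φ, p), i.e. the total energy, stays constant along the trajectory:

```python
import numpy as np
from scipy.integrate import solve_ivp

m, ell, g = 1.0, 1.0, 9.81           # hypothetical values

def hamiltonian(phi, p):
    return p**2 / (2*m*ell**2) - m*g*ell*np.cos(phi)

def rhs(t, state):
    phi, p = state
    return [p / (m*ell**2),          # phidot =  dH/dp
            -m*g*ell*np.sin(phi)]    # pdot   = -dH/dphi

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], rtol=1e-9, atol=1e-9)
H = hamiltonian(sol.y[0], sol.y[1])
print("energy drift:", H.max() - H.min())   # ~0: the Hamiltonian is conserved
```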
1.7 Second order conditions for minimality
The Euler-Lagrange equation was derived from the condition that optimal solutions x ∗ (·) are
necessarily stationary solutions, i.e. solutions for which
$$J(x_*(\cdot)+\alpha\delta_x(\cdot)) = J(x_*(\cdot)) + o(\alpha)$$
for every admissible perturbation $\delta_x(\cdot)$. Now stationary solutions need not be minimizing
solutions. To be minimizing the above term “o(α)” needs to be nonnegative in a neighbor-
hood of α = 0. In this section we analyze this problem. We derive a necessary condition and
a sufficient condition for stationary solutions to be minimizing. These conditions are second-
order conditions and they require a second-order Taylor series expansion of F (t , x, y) for fixed
t around (x, y):
$$F(t, x+\delta_x, y+\delta_y) = F(t,x,y) + \begin{bmatrix}\frac{\partial F(t,x,y)}{\partial x^T} & \frac{\partial F(t,x,y)}{\partial y^T}\end{bmatrix}\begin{bmatrix}\delta_x\\ \delta_y\end{bmatrix} + \frac{1}{2}\begin{bmatrix}\delta_x^T & \delta_y^T\end{bmatrix}\underbrace{\begin{bmatrix}\frac{\partial^2 F(t,x,y)}{\partial x\partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial x\partial y^T}\\[2pt] \frac{\partial^2 F(t,x,y)}{\partial y\partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial y\partial y^T}\end{bmatrix}}_{\text{Hessian}}\begin{bmatrix}\delta_x\\ \delta_y\end{bmatrix} + o\Bigl(\Bigl\|\begin{bmatrix}\delta_x\\ \delta_y\end{bmatrix}\Bigr\|^2\Bigr).$$
Theorem 1.7.1 (Legendre condition – 2nd order necessary condition). Consider the sim-
plest problem in the calculus of variations and suppose Assumptions 1.2.2 and 1.2.3 are sat-
isfied. Let x ∗ (t ) be a solution of the Euler-Lagrange equation (1.5.1), meeting the boundary
conditions (1.9). Necessary for x ∗ (t ) to be minimizing is that
$$\frac{\partial^2 F(t,x_*(t),\dot x_*(t))}{\partial\dot x\partial\dot x^T} \ge 0 \qquad \forall t \in [0,T]. \qquad (1.54)$$
Proof. For ease of notation we prove it for the case that x has one component. Similar to
the proof of Thm. 1.2.4, let δx (t ) be a C 2 -perturbation on [0, T ] that satisfies the boundary
conditions (1.11). Let α ∈ R. Defining $\bar J(\alpha)$ as $\bar J(\alpha) := J(x_*(\cdot)+\alpha\delta_x(\cdot))$, we have by construction that every solution $x_*(\cdot)$ of the Euler-Lagrange equation makes $\bar J'(0) = 0$. Using the second order Taylor series of $\bar J(\alpha)$ at α = 0 we find (and skipping time arguments) that
$$\bar J''(0) = \int_0^T \begin{bmatrix}\delta_x & \dot\delta_x\end{bmatrix}\underbrace{\begin{bmatrix}\frac{\partial^2 F(t,x_*,\dot x_*)}{\partial x^2} & \frac{\partial^2 F(t,x_*,\dot x_*)}{\partial x\partial\dot x}\\[2pt] \frac{\partial^2 F(t,x_*,\dot x_*)}{\partial x\partial\dot x} & \frac{\partial^2 F(t,x_*,\dot x_*)}{\partial\dot x^2}\end{bmatrix}}_{\text{Hessian}}\begin{bmatrix}\delta_x\\ \dot\delta_x\end{bmatrix}\,\mathrm{d}t \qquad (1.55)$$
$$= \int_0^T \frac{\partial^2 F}{\partial x^2}\delta_x^2 + 2\frac{\partial^2 F}{\partial x\partial\dot x}\delta_x\dot\delta_x + \frac{\partial^2 F}{\partial\dot x^2}\dot\delta_x^2\,\mathrm{d}t. \qquad (1.56)$$
For an optimal x ∗ (·) this has to be nonnegative for every allowable δx (·). This does not imply
that the Hessian is positive semi-definite because δx (·) and δ̇x (·) are related. Indeed, using
integration by parts the cross term can be simplified as
$$\int_0^T 2\frac{\partial^2 F}{\partial x\partial\dot x}\delta_x\dot\delta_x\,\mathrm{d}t = \int_0^T \frac{\partial^2 F}{\partial x\partial\dot x}\Bigl(\frac{\mathrm{d}}{\mathrm{d}t}\delta_x^2\Bigr)\,\mathrm{d}t = \underbrace{\left.\frac{\partial^2 F}{\partial x\partial\dot x}\delta_x^2\right|_0^T}_{=0} - \int_0^T\Bigl(\frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial^2 F}{\partial x\partial\dot x}\Bigr)\delta_x^2\,\mathrm{d}t.$$
Therefore
$$\bar J''(0) = \int_0^T \Bigl[\frac{\partial^2 F}{\partial x^2} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial^2 F}{\partial x\partial\dot x}\Bigr]\delta_x^2 + \frac{\partial^2 F}{\partial\dot x^2}\dot\delta_x^2\,\mathrm{d}t. \qquad (1.57)$$
Lemma 1.7.2 below shows that (1.57) is non-negative for every possible $\delta_x(\cdot)$ only if $\frac{\partial^2 F}{\partial\dot x^2}$ is nonnegative for the candidate $x_*(\cdot)$ and all time, i.e. only if (1.54) holds. ■
Lemma 1.7.2 (Technical lemma). Let φ(t) and ψ(t) be continuous functions on [0, T] and suppose that
$$\int_0^T \phi(t)\delta_x^2(t) + \psi(t)\dot\delta_x^2(t)\,\mathrm{d}t \ge 0 \qquad (1.58)$$
for every C²-function $\delta_x : [0,T] \to \mathbb{R}$ with $\delta_x(0) = \delta_x(T) = 0$. Then ψ(t) ≥ 0 for all t ∈ [0,T].
Proof. Suppose, to obtain a contradiction, that $\psi(\bar t) < 0$ for some $\bar t \in [0,T]$. Then for every ε > 0 we can construct a possibly small interval [a,b] about $\bar t$ in [0,T] and a C²-function $\delta_x(\cdot)$ on [0,T] that is zero for t ∉ [a,b] and satisfies
$$\int_a^b \delta_x^2(t)\,\mathrm{d}t < \varepsilon \qquad \text{and} \qquad \int_a^b \dot\delta_x^2(t)\,\mathrm{d}t > 1.$$
This may be clear from Figure 1.14. Such a $\delta_x(t)$ satisfies all the conditions of the lemma but renders the integral in (1.58) negative for small enough ε > 0. That is a contradiction hence the assumption that $\psi(\bar t) < 0$ is wrong. ■
FIGURE 1.14: About the construction of a $\delta_x(t)$ that violates (1.58), see the proof of Lemma 1.7.2
This second order condition (1.54) is known as the Legendre condition. Notice that the inequality (1.54) means that $\frac{\partial^2 F(t,x_*(t),\dot x_*(t))}{\partial\dot x\partial\dot x^T}$ (which is an n × n matrix if x has n components) is a symmetric positive semi-definite matrix at every moment in time.
$$F(t,\ell,\dot\ell) = \alpha\dot\ell^2 + \beta\ell \qquad (1.60)$$
and so
$$\frac{\partial^2 F(t,\ell,\dot\ell)}{\partial\dot\ell^2} = 2\alpha. \qquad (1.61)$$
As remarked earlier, a maximization problem is obtained from a minimization problem
by changing the sign of F (t , x, ẋ).
In the preceding two examples the Legendre condition was easy to verify because the sec-
ond derivative turned out to be trivially nonnegative for all t , x, ẋ and not just for the optimal
t , x ∗ (t ), ẋ ∗ (t ).
The Euler-Lagrange condition together with the Legendre condition are necessary but are
not sufficient for minimality. This is confirmed by the next example.
Example 1.7.4 (Stationary solution, but not a minimizer). The Euler-Lagrange equation for
the minimization of
$$\int_0^1\left(\frac{\dot x(t)}{2\pi}\right)^2 - x^2(t)\,\mathrm{d}t \qquad (1.62)$$
is the differential equation $(2\pi)^2 x(t) + \ddot x(t) = 0$. Given the boundary conditions x(0) = x(1) = 0, its solutions are
$$x(t) = A\sin(2\pi t), \qquad A \in \mathbb{R}.$$
Each of these solutions x(t ) meets the Legendre condition (1.54) since
$$\frac{\partial^2 F(t,x(t),\dot x(t))}{\partial\dot x^2} = \frac{2}{(2\pi)^2} > 0.$$
Also, each such x(t ) renders the integral in (1.62) equal to zero. There are however many other
functions x(t ) that achieve x(0) = x(1) = 0 but for which the integral (1.62) takes a negative
value. For example x(t) = −t² + t. By scaling this last function with a constant we can make the cost as negative as we desire. In this case there is no optimal solution $x_*(t)$. □
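It is instructive to evaluate the cost (1.62) for the competing functions numerically. The sketch below (Python/scipy) shows that x(t) = A sin(2πt) always gives zero cost, while scaling x(t) = A(−t² + t) drives the cost towards −∞:

```python
import numpy as np
from scipy.integrate import quad

def cost(x, xdot):
    """Evaluate the cost (1.62) for a given x(t) and its derivative."""
    return quad(lambda t: (xdot(t)/(2*np.pi))**2 - x(t)**2, 0.0, 1.0)[0]

for A in (1.0, 5.0, 50.0):
    sin_cost  = cost(lambda t: A*np.sin(2*np.pi*t), lambda t: 2*np.pi*A*np.cos(2*np.pi*t))
    poly_cost = cost(lambda t: A*(-t**2 + t),       lambda t: A*(-2*t + 1))
    print(A, round(sin_cost, 6), round(poly_cost, 4))
# The stationary solutions give cost 0; the scaled parabola gives A^2*(1/(12*pi^2) - 1/30) < 0,
# so the cost is unbounded from below and no minimizer exists.
```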
The proof of the Legendre condition actually provides us with an elegant sufficient condi-
tion for optimality, in fact for global optimality. If the Hessian, defined earlier as
$$H(t,x,y) := \begin{bmatrix}\frac{\partial^2 F(t,x,y)}{\partial x\partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial x\partial y^T}\\[2pt] \frac{\partial^2 F(t,x,y)}{\partial y\partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial y\partial y^T}\end{bmatrix}, \qquad (1.63)$$
is positive semi-definite for all x ∈ Rn and all y ∈ Rn and all t ∈ [0, T ] then at each t the running
cost F (t , x(t ), ẋ(t )) is convex in x(t ), ẋ(t ). For convex functions it is known that stationarity
implies global optimality:
Theorem 1.7.5 (Convexity). Consider the simplest problem in the calculus of variations and
suppose that F (t , x, y) is C 2 . If the Hessian (1.63) is positive semi-definite for all x, y ∈ Rn and
all t ∈ [0, T ] then every solution x ∗ (·) of the Euler-Lagrange equation that meets the boundary
conditions is a global optimal solution. If the Hessian is positive definite then this x ∗ (·) is the
unique optimal solution.
Proof. Suppose that the Hessian is positive semi-definite. Let x ∗ (·), x(·) be two functions that
satisfy the boundary conditions and suppose x ∗ (·) satisfies Euler-Lagrange. Let δ(t ) = x(t ) −
x ∗ (t ) and define J¯(α) = J (x ∗ (·) + αδ(·)). This way J¯(0) = J (x ∗ (·)) while J¯(1) = J (x(·)). We need
to prove that J¯(1) ≥ J¯(0).
As before we have that J¯0 (0) is zero by the fact that x ∗ (·) satisfies the Euler-Lagrange equa-
tion.
The second derivative of J¯(α) with respect to α is (skipping time arguments)
Z T · ¸
£ ¤ δ
¯00
J (α) = δ δ̇ H (t , x ∗ + αδ, ẋ ∗ + αδ̇) dt .
0 δ̇
Since H(t, x, y) is positive semi-definite for all x, y ∈ Rⁿ and all time, we see that $\bar J''(\alpha) \ge 0$ for all α ∈ R. Therefore for every β ≥ 0 there holds
$$\bar J'(\beta) = \bar J'(0) + \int_0^\beta \bar J''(\alpha)\,\mathrm{d}\alpha \ge \bar J'(0) = 0.$$
But then $\bar J(1) = \bar J(0) + \int_0^1 \bar J'(\beta)\,\mathrm{d}\beta \ge \bar J(0)$ which is what we had to prove.
Next suppose that H(t, x, y) is positive definite and that x(·) ≠ x_*(·). Then δ(·) is not the zero function and so by positive definiteness of H(t, x, y) we have $\bar J''(\alpha) > 0$ for every α ∈ [0, 1]. Then $J(x(\cdot)) = \bar J(1) > \bar J(0) = J(x_*(\cdot))$ for every x(·) ≠ x_*(·). ■
This result produces a lot, but also requires a lot. Indeed the convexity assumption fails
in many cases of interest. Here are a couple examples where the convexity assumption is
satisfied.
$$\frac{\partial^2 F(x,y,\dot y)}{\partial\dot y^2} = \frac{1}{(1+\dot y^2)^{3/2}} > 0. \qquad (1.64)$$
It is positive for all time and all y, ẏ, in particular for solutions (y(t ), ẏ(t )) of the Euler-Lagrange
equation. This implies that the solution found in Example 1.2.7 – namely the line through the
points $(x_0, y_0)$ and $(x_1, y_1)$ – satisfies Legendre's condition. The Hessian (1.63) is
$$H(x,y,\dot y) = \begin{bmatrix}0 & 0\\ 0 & \frac{1}{(1+\dot y^2)^{3/2}}\end{bmatrix} \ge 0.$$
It is positive semi-definite hence the straight-line solution is globally optimal. No surprise. □
For the cost (1.43) of Example 1.5.2, with running cost $F(t,x,\dot x) = \alpha^2 x^2 + \dot x^2$, the Hessian (1.63) is the constant matrix $\begin{bmatrix}2\alpha^2 & 0\\ 0 & 2\end{bmatrix}$. Clearly this is positive definite for every α ≠ 0 and hence the solution of the Euler-Lagrange equation found in Example 1.5.2 is the unique global optimal solution of the problem. □
One can pose many different types of problems in the calculus of variations by giving
different boundary conditions, for instance involving ẋ(T ), or by imposing further constraints
on the required solution. An example of the latter is presented in Example 1.1.2, see (1.5) and
we deal with another one in the next section. The Legendre condition (1.54) is only one of
the second order conditions for optimality. Additional second order conditions go under the
names of Weierstrass and Jacobi.
1.8 Integral constraints
FIGURE 1.15: Three areas enclosed by ropes of the same length
The standard example of this type is a version of Queen Dido’s isoperimetric problem, which
is the problem to enclose an area as large as possible by a rope of given length. Intuition tells
us that the optimal area is a disc (the right-most option in Fig. 1.15). To put it more math-
ematically, in Dido’s problem we want to find a function x : [0, T ] → R with given boundary
values x(0) = x 0 , x(T ) = x T , that maximizes the area
$$J(x(\cdot)) = \int_0^T x(t)\,\mathrm{d}t$$
subject to the length constraint
$$\int_0^T \sqrt{1+\dot x^2(t)}\,\mathrm{d}t = \ell$$
for a given ℓ.
How to solve such constrained minimization problems? A quick-and-dirty argument goes as follows: from basic calculus it is known that the solution of a minimization problem of some function J(x(·)) subject to a constraint C(x(·)) − c₀ = 0 is a stationary solution of the Lagrangian defined as
$$\bar J(x(\cdot),\mu) := J(x(\cdot)) + \mu\bigl(C(x(\cdot)) - c_0\bigr)$$
for some constant Lagrange parameter⁷ μ. The stationary solutions $(x_*(\cdot), \mu_*)$ of $\bar J(x(\cdot),\mu)$ according to Euler-Lagrange satisfy
$$\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)\bigl(F(t,x_*(t),\dot x_*(t)) + \mu_* M(t,x_*(t),\dot x_*(t))\bigr) = 0.$$
Below we formally prove that this argument is, in essence, correct. This may sound a bit
vague, but it does put us on the right track. The theorem presented next is motivated by
the above but the proof is given from scratch. The proof assumes knowledge of the Inverse
Function Theorem.
⁷ Lagrange parameters are usually denoted as λ. We use µ in order to avoid a confusion in the next chapter.
Theorem 1.8.1 (Euler-Lagrange for integral-constraint minimization). Let c₀ be some constant. Suppose that F(t, x, y) and M(t, x, y) are C² in all of their components, that x_*(·) is a minimizer of
$$\int_0^T F(t,x(t),\dot x(t))\,\mathrm{d}t$$
subject to
$$\int_0^T M(t,x(t),\dot x(t))\,\mathrm{d}t = c_0,$$
and that x_*(·) is C². Then either there is a constant Lagrange parameter μ_* ∈ R such that
$$\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)\bigl(F(t,x_*(t),\dot x_*(t)) + \mu_* M(t,x_*(t),\dot x_*(t))\bigr) = 0 \qquad (1.65)$$
for all t ∈ [0,T], or
$$\left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)M(t,x_*(t),\dot x_*(t)) = 0 \qquad (1.66)$$
for all t ∈ [0,T].
Proof. This is not an easy proof. Suppose x_*(·) solves the constrained minimization problem and fix two C² functions $\delta_x(\cdot)$, $\epsilon_x(\cdot)$ that vanish at the boundaries,
$$\delta_x(0) = 0 = \epsilon_x(0), \qquad \delta_x(T) = 0 = \epsilon_x(T).$$
Define $J(x(\cdot)) = \int_0^T F(t,x(t),\dot x(t))\,\mathrm{d}t$ and $C(x(\cdot)) = \int_0^T M(t,x(t),\dot x(t))\,\mathrm{d}t$ and consider the mapping that sends two real numbers (α, β) to the two real numbers
$$\begin{bmatrix}\bar J(\alpha,\beta)\\ \bar C(\alpha,\beta)\end{bmatrix} := \begin{bmatrix}J(x_*(\cdot)+\alpha\delta_x(\cdot)+\beta\epsilon_x(\cdot))\\ C(x_*(\cdot)+\alpha\delta_x(\cdot)+\beta\epsilon_x(\cdot))\end{bmatrix}.$$
If the Jacobian
$$\mathrm{JAC} := \begin{bmatrix}\frac{\partial\bar J(\alpha,\beta)}{\partial\alpha} & \frac{\partial\bar J(\alpha,\beta)}{\partial\beta}\\[2pt] \frac{\partial\bar C(\alpha,\beta)}{\partial\alpha} & \frac{\partial\bar C(\alpha,\beta)}{\partial\beta}\end{bmatrix}_{(\alpha=0,\,\beta=0)} \qquad (1.67)$$
of this mapping is nonsingular at (α, β) = (0, 0) then by the inverse function theorem there is
a neighborhood of (α, β) = (0, 0) on which the mapping is invertible. In particular we then
can find small enough α, β such that C̄ (α, β) = C̄ (0, 0) = c 0 —hence satisfying the integral
constraint—but rendering a cost J¯(α, β) smaller than J¯(0, 0) = J (x ∗ (·)). This contradicts that
x ∗ (·) is minimizing. Conclusion: at an optimal x ∗ (·) the Jacobian (1.67) is singular for every
allowable $\delta_x(\cdot)$, $\epsilon_x(\cdot)$.
We rewrite the Jacobian (1.67) in terms of F (·) and M (·). To this end let f (t ) and m(t )
denote the functions
$$f(t) = \left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)F(t,x_*(t),\dot x_*(t)), \qquad m(t) = \left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)M(t,x_*(t),\dot x_*(t)).$$
This way the Jacobian (1.67) becomes (verify this yourself)
$$\mathrm{JAC} = \begin{bmatrix}\int f(t)\delta_x(t)\,\mathrm{d}t & \int f(t)\epsilon_x(t)\,\mathrm{d}t\\ \int m(t)\delta_x(t)\,\mathrm{d}t & \int m(t)\epsilon_x(t)\,\mathrm{d}t\end{bmatrix}. \qquad (1.68)$$
If m(t) = 0 for all t then (1.66) holds and the proof is complete. It remains to consider the case that m(t₀) ≠ 0 for at least one t₀. Suppose, to obtain a contradiction, that given such a t₀ there is a t for which
$$\begin{bmatrix}f(t_0) & f(t)\\ m(t_0) & m(t)\end{bmatrix} \qquad (1.69)$$
is nonsingular. Now take $\delta_x(\cdot)$ to have support around t₀ and $\epsilon_x(\cdot)$ to have support around t. Then by nonsingularity of (1.69) also (1.68) is nonsingular if the support is taken small enough. However nonsingularity of the Jacobian is impossible by the fact that x_*(·) solves the minimization problem. Therefore (1.69) is singular at every t. This means that
$$f(t_0)m(t) - f(t)m(t_0) = 0 \qquad \forall t.$$
In other words f(t) + μ_* m(t) = 0 for μ_* = −f(t₀)/m(t₀) for all t. ■
The theorem says that the solution x ∗ (·) satisfies either (1.65) or (1.66). The first of these
two is often called the normal case, and the second the abnormal case. The next example
indeed suggests that we usually have the normal case.
Example 1.8.2 (Normal & abnormal). Consider minimizing $\int_0^1 x(t)\,\mathrm{d}t$ subject to the boundary conditions x(0) = 0, x(1) = 1 and integral constraint
$$\int_0^1 \dot x^2(t)\,\mathrm{d}t = C \qquad (1.70)$$
for some given C . The (normal) Euler-Lagrange equation (1.65) says that
$$0 = \left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)\bigl(x(t) + \mu\dot x^2(t)\bigr) = 1 - \frac{\mathrm{d}}{\mathrm{d}t}\bigl(2\mu\dot x(t)\bigr) = 1 - 2\mu\ddot x(t).$$
The general solution of this equation is $x(t) = \frac{1}{4\mu}t^2 + bt + c$ and these satisfy the boundary conditions x(0) = 0, x(1) = 1 iff
$$x(t) = \frac{1}{4\mu}t^2 + \Bigl(1 - \frac{1}{4\mu}\Bigr)t.$$
For a given C > 1 this yields two such functions, one with µ > 0 and one with µ < 0. Clearly, out of these two, the cost $J(x(\cdot)) := \int_0^1 x(t)\,\mathrm{d}t$ is minimal for the positive µ.
In the abnormal case (1.66) we have that
$$0 = \left(\frac{\partial}{\partial x} - \frac{\mathrm{d}}{\mathrm{d}t}\frac{\partial}{\partial\dot x}\right)\dot x^2(t) = 2\ddot x(t).$$
Hence x(t ) = bt + c. Given the boundary conditions x(0) = 0, x(1) = 1 it is immediate that this
allows for only one solution: x(t ) = t ,
Now ẋ(t) = 1 and the integral constraint is $C = \int_0^1 \dot x^2(t)\,\mathrm{d}t = 1$. This corresponds to µ = ∞. It is the case where the constraint is tight. There are, so to say, no degrees of freedom left to shape the function. □
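In the normal case the multiplier µ can be computed from the constraint level C. Writing a = 1/(4µ), the stationary function is x(t) = at² + (1−a)t, and a direct computation (not carried out in the text, so treat it as an assumption to be checked) gives ∫₀¹ ẋ² dt = 1 + a²/3 and ∫₀¹ x dt = 1/2 − a/6. The sketch below (Python/scipy) recovers µ for a given C > 1 and confirms that the positive-µ branch has the smaller cost:

```python
import numpy as np
from scipy.integrate import quad

C = 2.0                                  # required value of the integral constraint, C > 1
a = np.sqrt(3*(C - 1))                   # from C = 1 + a^2/3, with a = 1/(4 mu)

for sign in (+1, -1):                    # mu > 0 and mu < 0 branch
    ai = sign * a
    mu = 1 / (4*ai)
    x    = lambda t: ai*t**2 + (1 - ai)*t
    xdot = lambda t: 2*ai*t + (1 - ai)
    constraint = quad(lambda t: xdot(t)**2, 0, 1)[0]   # should equal C
    costJ      = quad(x, 0, 1)[0]                      # the cost to be minimized
    print("mu = %+.4f   int xdot^2 = %.4f   cost = %.4f" % (mu, constraint, costJ))
# Both branches meet the constraint; the mu > 0 branch has the smaller cost 1/2 - a/6.
```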
1.9 Exercises
1.1 Find the solutions x : [0, T ] → R of the Euler-Lagrange equation for
$$J = \int_0^T F(t, x(t), \dot x(t))\,\mathrm{d}t$$
and
(a) F(t, x, ẋ) = ẋ² − α²x²
(b) F(t, x, ẋ) = ẋ² + 2x
(c) F(t, x, ẋ) = ẋ² + 4tẋ
(d) F(t, x, ẋ) = ẋ² + xẋ + x²
(e) F(t, x, ẋ) = tẋ² − xẋ + x
(f) F(t, x, ẋ) = g(t)ẋ² − h(t)x²
(g) F(t, x, ẋ) = x² + 2txẋ (this one is curious.)
(a) Determine the Euler-Lagrange equation for this problem
(b) Solve the Euler-Lagrange equation
(c) Is Legendre’s second-order condition satisfied?
(d) Is the convexity condition (Thm. 1.7.5) satisfied?
(e) Show that the solution x ∗ (·) of Euler-Lagrange is globally optimal.
x(0) = 0, x(1) = 1
for every continuously differentiable function δx (·) with δx (0) = δx (1) = 0, and con-
clude that x ∗ (·) is globally optimal.
(e) Is the convexity condition (Thm. 1.7.5) satisfied?
FIGURE 1.16: a solid of revolution with profile y(x), y(0) = 0 and y(x₁) = h, in a horizontal air flow of speed v
1.5 A simplified Newton’s minimal resistance problem. Consider a solid of revolution with
diameter y(x) as shown in Fig. 1.16. If the air flows with a constant speed v (as in the
figure) then the total air resistance (force) can be modeled as
$$4\pi\rho v^2\int_0^{x_1}\frac{y(x)\,\dot y^3(x)}{1+\dot y^2(x)}\,\mathrm{d}x.$$
The question is: given y(0) = 0 and y(x 1 ) = h > 0 (some constant h) for which curve is
the resistance minimal? Now we are going to cheat! To make the problem a lot easier
we simply discard the quadratic term in the denominator of the running cost: consider
the cost function
$$J(y(\cdot)) := 4\pi\rho v^2\int_0^{x_1} y(x)\,\dot y^3(x)\,\mathrm{d}x.$$
Given the boundary conditions y(0) = 0 and y(x₁) = h, show that
$$y(x) = h\left(\frac{x}{x_1}\right)^{3/4}$$
is a solution of the Beltrami identity with the given boundary conditions. (This function
is a solution of the Beltrami-identity with the given boundary conditions. (This function
y(x) is depicted in Fig. 1.16.)
(Notice that y(x) is not differentiable at x = 0 so formally the theorem of Euler-Lagrange
does not apply. But that’s nitpicking.)
1.6 Technical problem: the lack of Lipschitz continuity in the Beltrami identity for the
Brachistochrone problem, and how to circumvent it. The footnote of Example 1.3.1 de-
rives the cycloid equations (1.28) from the Beltrami identity with initial condition
$$y(x)\bigl(1+\dot y^2(x)\bigr) = c^2, \qquad y(0) = 0. \qquad (1.72)$$
The derivation was quick and this exercise shows that it was a bit dirty as well.
(a) Let x(φ), y(φ) be the cycloid solution (1.28). Use the identity $\frac{\mathrm{d}y}{\mathrm{d}x} = \frac{\mathrm{d}y/\mathrm{d}\phi}{\mathrm{d}x/\mathrm{d}\phi}$ to show that they satisfy (1.72).
(b) The curve of this cycloid solution for φ ∈ [0, 2π] runs over the interval [0, c²π] and reaches its lowest point y = c² at x = c²π/2. Consider the new function obtained by inserting a horizontal piece of length ∆ ≥ 0 at this lowest point: the cycloid on [0, c²π/2], the constant value y = c² on [c²π/2, c²π/2 + ∆], and the second half of the cycloid shifted over ∆ on [c²π/2 + ∆, c²π + ∆].
Argue that for every ∆ ≥ 0 also this new function satisfies the Beltrami identity (1.72) for all x ∈ (0, c²π + ∆).
(c) This is not what the footnote of Example 1.3.1 says. What goes wrong in this foot-
note?
(d) This new function y(x) is constant over the interval $[\frac{c^2\pi}{2}, \frac{c^2\pi}{2}+\Delta]$. Show that a constant function y(x) does not satisfy the Euler-Lagrange equation of the Brachistochrone problem.
(e) It can be shown that y(x) solves (1.72) if-and-only-if it is of this new form for
some ∆ ≥ 0 (possibly ∆ = ∞). Argue that the only function that satisfies the Euler-
Lagrange equation with y(0) = 0 is the cycloid solution (1.28).
1.7 Technical problem: the lack of Lipschitz continuity in the minimal-surface problem, and
how to circumvent it. In Example 1.3.2 we claimed that y(x) = a cosh(x/a) is the only
positive even solution of (1.29). That is not correct. In this problem we assume that
a > 0.
(a) Show that the differential equation
$$\frac{\mathrm{d}y(x)}{\mathrm{d}x} = \sqrt{y^2(x)/a^2 - 1}$$
is not Lipschitz-continuous at y = a. (See Appendix B.) Hence we can expect multiple solutions when y(x) = a.
(b) Show that (1.29) can be separated as
$$\frac{\mathrm{d}y}{\sqrt{y^2/a^2 - 1}} = \mathrm{d}x$$
(c) If y(x 0 ) > a, show that y(x) = a cosh((x − c)/a) around x = x 0 for some c.
(d) Argue that y(x) is a solution of (1.29) iff it is pieced together from a hyperbolic
cosine, a constant, and a hyperbolic cosine again, as in
$$y(x) = \begin{cases} a\cosh\bigl(\frac{x-c}{a}\bigr) & \text{if } x < c\\ a & \text{if } x \in [c,d]\\ a\cosh\bigl(\frac{x-d}{a}\bigr) & \text{if } x > d\end{cases}$$
Here c ≤ d . (Notice that for x ∈ [c, d ] the value of y(x) is a at which point the
differential equation is not Lipschitz.)
(e) If c < d then on the strip [c, d ] the function y(x) is a constant (equal to a > 0). Show
that this does not satisfy the Euler-Lagrange equation. (Recall that the Beltrami
identity may have more solutions than the Euler-Lagrange equation.)
(f) Verify that y(x) = a cosh(x/a) is the only function that satisfies the Euler-Lagrange
equation of the minimal-surface problem (Example 1.3.2) and that has the sym-
metry property that y(−1) = y(+1).
1.8 Technical problem: from function to parameterized smooth curve. Consider the minimal
surface problem of Example 1.3.2. The Goldschmidt two-disc solution is not a function
y(x) and therefore it is not a surprise that it does not show up as a stationary trajectory
of our problem. Nevertheless it may be optimal. Instead of writing y as a function of x
we now express (x(t ), y(t )) as a function of some parameter t . Such a parameterization
of a given curve is of course highly non-unique. For any such parameterization the
length of the graph traced out over an infinitesimal strip [t , t + dt ] is
$$\sqrt{\dot x^2(t)+\dot y^2(t)}\,\mathrm{d}t.$$
Here L is the “time” at which x(L) equals the end-point. We do not know L but this is
not crucial.
(a) Use the Beltrami identity to show that any stationary curve (x(t), y(t)) satisfies
$$y(t)\,\dot x^2(t) = a\sqrt{\dot x^2(t)+\dot y^2(t)}$$
(b) Show that for a = 0 Beltrami gives us the Goldschmidt solution.
For completeness we mention that if a ≠ 0 then the right-hand side is nonzero for an appropriate parameterization (one for which $\sqrt{\dot x^2(t)+\dot y^2(t)} > 0$ for all t) and hence that then ẋ(t) ≠ 0 for every t. As a result one can locally express t as a function of x which brings us back to the parameterization proposed in Example 1.3.2 and therefore the hyperbolic cosine solution. In summary: the only piecewise differentiable curves that are stationary are those hyperbolic cosines and the Goldschmidt two-disc solution.
1.9 Show that the minimal surface example (Example 1.3.2) satisfies the second-order ne-
cessity condition of Thm. 1.7.1.
(a) For G(x) = S(x) = 0 the first-order conditions are that (1.37) holds for all possible
perturbations. Adapt this equation for the case that G(x) and/or S(x) or not zero.
(b) Prove that this equality implies that the Euler-Lagrange equation holds.
(c) Finish the proof of Proposition 1.5.1.
1.11 The optimal solar challenge. The solar vehicle receives power from solar radiation. This
power p(x, t ) depends on position x (due to clouds) and on time t (due to change of
clouds and sun’s angle of inclination). Driving at some speed v(t ) = ẋ(t ) also consumes
power. Denote this powerloss by f (ẋ) and assume that it is a function of speed alone.
This is reasonable if we do not change speed aggressively and if friction depends only
on speed. Now driving at higher speed is known to require more energy per meter than
driving at lower speed. This means that f (·) is convex, in fact that both first and second
derivative f 0 (·) and f 00 (·) are strictly positive,
x(0) = 0
and at time T it wants to be at some position x(T ) = x T and of course all that using
minimal net energy
$$\int_0^T f(\dot x(t)) - p(x(t), t)\,\mathrm{d}t.$$
f (ẋ) = ẋ 2
(this is actually quite reasonable, modulo scaling) and that p(x, t ) does not depend
on time,
p(x, t ) = q(x),
i.e. that the sun’s angle does not change much over our time window [0, T ] and
that clouds are not moving. Use the Beltrami Identity to express ẋ(t ) in terms of
q(x(t )) and the initial speed ẋ(0) and initial q(0).
(e) Argue once again (but now using the explicit relation of the previous part) that
when we drive into a cloud then we should speed up.
(f) A computer might be useful for this part. Continue with f (ẋ) = ẋ 2 and p(x, t ) =
q(x). Now finally suppose that up to position x = 20 the sky is clear but that from
x = 20 onwards heavy clouds limit the power input:
$$q(x) = \begin{cases}100 & x < 20\\ 4 & x > 20.\end{cases}$$
Determine the optimal speed profile ẋ(t ) that brings us from x(0) = 0 in T = 7 to
x(7) = 90.
over all functions x(·) subject to x(0) = 1 and free end-point x(1).
(a) Show that no function exists that satisfies Euler-Lagrange with x(0) = 1 and the
free end-point constraint (1.42).
(b) Conclude that there is no C 2 function x(·) that minimizes J (x(·)) subject to x(0) =
x 0 and free end-point.
(c) Determine all functions x(·) that satisfy Euler-Lagrange and with x(0) = 1. Then
compute J (x(·)) explicitly and conclude, once more, that the free end-point prob-
lem has no solution.
x(−1) = 0, x(1) = 1.
The exercise shows that smooth running costs F (·) may result in non-smooth optimal
solutions x ∗ (·).
1.15 The hanging cable. Any flexible free-hanging cable comes to a halt in a position of min-
imal energy, such as these three:
What is the shape of this minimal energy position? When hanging still it has no kinetic
energy, it only has potential energy. If the cable is very flexible then the potential energy
is only due to its height y(x). We assume that the cable is very thin, does not stretch
and that is has a constant mass per unit length. In a constant gravitational field with
gravitational acceleration g the potential energy J (y(·)) is
$$J(y(\cdot)) = \int_{x_0}^{x_1} \rho g\,y(x)\sqrt{1+\dot y^2(x)}\,\mathrm{d}x,$$
for some constant a ≥ 0 and b, c ∈ R. (Hint: we considered a very similar running cost
elsewhere in these notes.)
1.16 Consider Example 1.8.2. Prove that for C < 1 there is no smooth function that satisfies
the boundary conditions and integral constraint.
1.17 Minimize $\int_0^T \dot x^2(t)\,\mathrm{d}t$ subject to x(0) = x(T) = 0 and $\int_0^T x^2(t)\,\mathrm{d}t = 1$.
Chapter 2
Minimum Principle
2.1 Optimal control
The essential new element compared with the previous chapter is that the state x(t) is now generated by a system of differential equations driven by an input u(t),
$$\dot x(t) = f(x(t), u(t)), \qquad x(0) = x_0, \qquad (2.1)$$
and that we can not choose x(t) directly but only can choose the input u(t). In fact, the input
may itself be restricted to take values in some possibly limited set U,
u : [0, T ] → U.
For example U = [−1, 1] or U = Rm or U = [0, ∞). In the solar challenge example, the input
u(t ) might be the throttle opening and this takes values in between u(t ) = 0 (fully closed) and
u(t ) = 1 (fully open).
For a given U and given (2.1), the optimal control problem is to find a control input u :
[0, T ] → U that minimizes a cost function of the form
$$J(x_0, u(\cdot)) := S(x(T)) + \int_0^T L(x(t), u(t))\,\mathrm{d}t. \qquad (2.2)$$
Here S : Rⁿ → R and L : Rⁿ × U → R. The part S(x(T)) is called the terminal or final cost, and
L(x(t ), u(t )) is commonly called the running cost and the optimal u(·) is referred to as the
optimal control and is often denoted with a star, u ∗ (·).
Using the calculus of variations of the previous chapter, combined with the classic idea
of Lagrange parameters, we derive first-order conditions that any optimal control must sat-
isfy. Motivated by these first-order conditions we then formulate the truly fabulous Minimum
Principle of Pontryagin. This result shocked the world when it appeared in the late fifties,
early sixties of the previous century. The Minimum Principle is very general and gives us
necessary conditions for a control to be optimal. In many applications these conditions are
numerically tractable. But be warned: the proof of the Minimum Principle is complicated!
A variation of the optimal control problem is to fix the final state vector x(T ) to a given
x T . Clearly in this case there is no need for a final cost S(x(T )) in that every input results in
the same final cost. In this case the optimal control problem is to find, for a given (2.1) and
x T and U, an input u : [0, T ] → U that minimizes
∫_0^T L(x(t), u(t)) dt (2.3)
subject to x(T ) = x T . In the final section of this chapter we consider the optimal control
problem where also the final time T is variable and where the cost is to be minimized over all
allowable inputs u(·) as well as all T > 0.
The method of Lagrange multipliers can help to find minimizers. The idea is to solve the
unconstrained minimization problem of an associated cost function defined as
This function K : Rn × Rk → R is known as the Lagrangian and the λ’s are known as La-
grange multipliers and they live in Rk. Assuming K(·) is sufficiently smooth, the first-order
conditions of the unconstrained minimization of K(z, λ) over all z and λ are that both gradi-
ents are zero at the solution:
∂K(z∗, λ∗)/∂z = 0,   ∂K(z∗, λ∗)/∂λ = 0. (2.5)
As the Lagrangian is linear in λ it is immediate that the gradient of K(z, λ) with respect to
λ is Gᵀ(z). Hence (z∗, λ∗) is a stationary solution of K(z, λ) only if the constraints hold,
G(z∗) = 0, and then K(z∗, λ∗) = L(z∗). Under mild assumptions on G(·) the first-order
conditions (2.5) are equivalent to the first-order conditions of the constrained minimization
problem (2.4). A more detailed discussion can be found in Appendix A.7.
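As a small, hypothetical illustration of these first-order conditions (this specific problem is not from these notes): minimize L(z) = z_1² + z_2² subject to G(z) = z_1 + 2z_2 − 1 = 0. The stationarity conditions (2.5) of the Lagrangian K(z, λ) = L(z) + λG(z) form a linear system that can be solved directly, for instance in Python:

import numpy as np

# Stationarity of K(z, lam) = z1^2 + z2^2 + lam*(z1 + 2*z2 - 1):
#   dK/dz1  = 2*z1 + lam      = 0
#   dK/dz2  = 2*z2 + 2*lam    = 0
#   dK/dlam = z1 + 2*z2 - 1   = 0
A = np.array([[2.0, 0.0, 1.0],
              [0.0, 2.0, 2.0],
              [1.0, 2.0, 0.0]])
b = np.array([0.0, 0.0, 1.0])
z1, z2, lam = np.linalg.solve(A, b)
print(z1, z2, lam)   # 0.2, 0.4, -0.4: the point on the constraint line closest to the origin

Note that the constraint G(z∗) = 0 indeed holds at the stationary solution, exactly as argued above.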
For the optimal control problem a similar approach will be taken, however with the added
complication that we are not dealing with a minimization over a finite number of parameters,
z ∈ Rn , but over uncountably many functions u(·), x(·), and the constraints are the dynamical
constraints ẋ(t ) = f (x(t ), u(t )) and these need to be satisfied for all time t ∈ [0, T ].
subject to
U = Rm .
The optimal control problem can be regarded as a constrained optimization problem,
with (2.7) being the dynamical constraint. This observation provides a clue to its solution:
introduce Lagrange multiplier functions p : [0, T ] → Rn corresponding to these dynamical
constraints. Analogous to the classic Lagrange multiplier method we define the Lagrangian
K (·) as
Now the first objective is to determine the first-order conditions for K (·), i.e. the condi-
tions that stationary solutions
must satisfy. Before we delve into the resulting Euler-Lagrange equation, it is interesting to
figure out what the Beltrami identity gives us. Indeed our K (·) is of the form K (q, q̇) and so
does not explicitly depend on time. Therefore Beltrami applies and it says that
K(q(t), q̇(t)) − q̇ᵀ(t) ∂K(q(t), q̇(t))/∂q̇
is constant over time for the stationary solutions. For our K (q, q̇) this constant function takes
the form
K(q, q̇) − q̇ᵀ ∂K(q, q̇)/∂q̇
  = K(q, q̇) − ( ẋᵀ ∂K(q, q̇)/∂ẋ + ṗᵀ ∂K(q, q̇)/∂ṗ + u̇ᵀ ∂K(q, q̇)/∂u̇ )
  = pᵀ( f(x, u) − ẋ ) + L(x, u) − ( −ẋᵀp + 0 + 0 )
  = pᵀ f(x, u) + L(x, u).
The final function is known as the Hamiltonian and it plays a central role in optimal control.
Lemma 2.3.1 (Hamiltonian equations). Let U = Rm and x 0 ∈ Rn . The smooth enough sta-
tionary functions (x(·), p(·), u(·)) with x(0) = x 0 of the cost (2.9), where K (·) is defined as
in (2.8), satisfy
Proof. The stationary solutions are those that satisfy the Euler-Lagrange equation together
with the boundary conditions of Proposition 1.5.1. Define K (q, q̇) as in (2.8) with q := (x, p, u)
and notice that K (q, q̇) in terms of the Hamiltonian H (x, p, u) is
For ease of exposition we momentarily drop the arguments of all functions. The Euler-Lagrange equation 0 = (∂/∂q − d/dt ∂/∂q̇)K holds component-wise. For component x it says
0 = (∂/∂x − d/dt ∂/∂ẋ)(H − pᵀẋ) = ∂H/∂x + ṗ.
Hence ṗ = −∂H/∂x. For component p it says
0 = (∂/∂p − d/dt ∂/∂ṗ)(H − pᵀẋ) = ∂H/∂p − ẋ.
Hence ẋ = ∂H/∂p. For component u it says
0 = (∂/∂u − d/dt ∂/∂u̇)(H − pᵀẋ) = ∂H/∂u.
The free final-point (aka free end-point) condition (1.41) becomes 0 = ∂S(x(T))/∂q + ∂K(q(T), q̇(T))/∂q̇
and per component this is
The differential equations (2.10a, 2.10b) are known as the Hamiltonian equations. Note
that
∂H(x, p, u)/∂p = f(x, u),
so the first Hamiltonian equation (2.10a) is nothing else than the system equation ẋ(t ) =
f (x(t ), u(t )), x(0) = x 0 .
The Lagrange multiplier p(t ) is called the costate (because mathematically it lives in a dual
space to the (variations) of the state x(t )). In examples it often has interesting interpretations
– shadow prices in economics and contact forces in mechanical systems – in terms of the
sensitivity of the minimized cost function. This is already illustrated by the condition p∗(T) = ∂S(x∗(T))/∂x,
which means that p∗(T) equals the sensitivity of the final time cost with respect to
variations in the optimal state at the final time. Later we will see that
p(0) = ∂J(x_0, u∗(·))/∂x_0,
where J (x 0 , u ∗ (·)) is the optimal cost for initial state x 0 , see § 3.5. A large p(0) hence means
that the optimal cost might be very sensitive to changes in the initial state.
ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t))/∂p,   x∗(0) = x_0, (2.12a)
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x, (2.12b)
0 = ∂H(x∗(t), p∗(t), u∗(t))/∂u. (2.12c)
□
If only that were true. Well, it is true (under some mild smoothness assumption)! In fact
it holds in a far more general setting. The following celebrated theorem by Pontryagin and
coworkers provides a necessary condition for solutions of the true minimization problem (not
just stationary ones), and it can even deal with restricted sets U! The basic feature is that it
replaces the first-order optimality condition (2.12c) with a true minimization condition. Here
is the famous result; it is the central result of this chapter:
Theorem 2.4.2 (Minimum Principle). Consider a differential equation ẋ(t ) = f (x(t ), u(t ))
and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are all continuous in
x and u.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and assume it is
bounded and piecewise continuous. Let x ∗ : [0, T ] → Rn be the resulting optimal state. Given
such u ∗ (·) there is a unique function p ∗ : [0, T ] → Rn that satisfies
ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t))/∂p,   x∗(0) = x_0, (2.13a)
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x (2.13b)
and along the solution x ∗ (t ), p ∗ (t ), the input u ∗ (t ) at every t ∈ [0, T ] where the input is con-
tinuous minimizes the Hamiltonian:
Proof. (This proof requires a couple of technical results regarding continuity of solutions of
differential equations. Upon first reading these can be skipped but for a full understanding
you should have a look at Appendix B.)
Let u ∗ (·) be an optimal input, and let x ∗ (·) be the corresponding optimal state. First notice
that the costate equations are linear in the costate:
where A(t ) := −∂ f (x ∗ (t ), u ∗ (t ))/∂x T and b(t ) := −∂L(x ∗ (t ), u ∗ (t ))/∂x. By assumption both A(t )
and b(t ) are piecewise continuous and bounded and so the solution p ∗ (t ) exists, is continu-
ous and is unique.
Now assume, to obtain a contradiction, that at some time t̄ ∈ [0, T ) where the input is
continuous, a ū ∈ U exists that achieves a smaller Hamiltonian H (x ∗ (t̄ ), p ∗ (t̄ ), ū) than u ∗ (t̄ )
does. Then, because of continuity, for some small enough ε > 0 the function defined as
ū(t) = ū if t ∈ [t̄, t̄ + ε],   and   ū(t) = u∗(t) elsewhere, (2.15)
for the negative number c = H (x ∗ (t̄ ), p ∗ (t̄ ), ū) − H (x ∗ (t̄ ), p ∗ (t̄ ), u ∗ (t̄ )). Now write ū(t ) as a per-
turbation of the optimal input,
ū(t ) = u ∗ (t ) + δu (t ).
The so defined perturbation δ_u(t) = ū(t) − u∗(t) has a support of ε: its graph is a narrow pulse of width ε located at t̄. In the rest of the proof we fix this perturbation and we only consider very small ε. Such perturbations are called "needle" perturbations.
By perturbing the input, ū(t ) = u ∗ (t ) + δu (·), the solution of ẋ(t ) = f (x(t ), u ∗ (t ) + δu (t ))
perturbs as well. Denote the perturbed state as x(·) = x ∗ (·) + δx (·). The perturbation δx (t ) is
probably not a needle but at each moment in t it is of order ε. The derivative of this δ_x(·)
satisfies
This expression we soon need. To avoid clutter we now drop all time arguments, that is, x(t )
is simply denoted as x, et cetera. Also in the equations that follow the approximate identity ≈
means equal up to an o(ε) term. Let ∆ be the change in cost, ∆ = J(x_0, u∗ + δ_u) − J(x_0, u∗). We
have
∆ = J(x_0, u∗ + δ_u) − J(x_0, u∗)
  = S(x∗(T) + δ_x(T)) − S(x∗(T)) + ∫_0^T L(x∗ + δ_x, u∗ + δ_u) − L(x∗, u∗) dt
  ≈ (∂S(x∗(T))/∂xᵀ) δ_x(T) + ∫_0^T L(x∗ + δ_x, u∗ + δ_u) − L(x∗, u∗) dt.
Next use that L(x, u) = −pᵀ f(x, u) + H(x, p, u) and let p be the optimal costate p∗:
∆ ≈ (∂S(x∗(T))/∂xᵀ) δ_x(T) + ∫_0^T −p∗ᵀ [ f(x∗ + δ_x, u∗ + δ_u) − f(x∗, u∗) ] dt
      + ∫_0^T H(x∗ + δ_x, p∗, u∗ + δ_u) − H(x∗, p∗, u∗) dt.
Here we also subtracted and added a term H(x∗, p∗, u∗ + δ_u). The reason is that now the difference of the first two Hamiltonian terms can be recognized as an approximate partial derivative with respect to x, and the difference of the final two terms is what we considered earlier (it equals cε + o(ε)), so:
∆ ≈ (∂S(x∗(T))/∂xᵀ) δ_x(T) + ∫_0^T −p∗ᵀ δ̇_x + (∂H(x∗, p∗, u∗ + δ_u)/∂xᵀ) δ_x dt + cε.
Notice that the partial derivative ∂H(x∗, p∗, u∗ + δ_u)/∂xᵀ equals −ṗ∗ᵀ = ∂H(x∗, p∗, u∗)/∂xᵀ almost everywhere
(except for ε units of time). Combined with the fact that δ_x at each moment in time is also of
order ε we have that
∆ ≈ (∂S(x∗(T))/∂xᵀ) δ_x(T) + ∫_0^T −p∗ᵀ δ̇_x − ṗ∗ᵀ δ_x dt + cε.
Integration by parts then yields
∆ ≈ (∂S(x∗(T))/∂xᵀ) δ_x(T) + [ −p∗ᵀ(t) δ_x(t) ]_{t=0}^{t=T} + cε
  = [ ∂S(x∗(T))/∂xᵀ − p∗ᵀ(T) ] δ_x(T) + p∗ᵀ(0) δ_x(0) + cε = cε + o(ε).
Here we used that δ_x(0) = 0 (because of the boundary condition x(0) = x_0) and that the term in square brackets vanishes because of the final condition p∗(T) = ∂S(x∗(T))/∂x. Since c < 0
we see that ∆ is negative for small enough ε. But that would mean that ū(·) for small enough
ε achieves a smaller cost than optimal. Not possible. Hence the assumption that u∗(t) does
not minimize the Hamiltonian at every moment in time is wrong. ■
This theory of optimal control was developed in the Soviet Union in the fifties of the 20th
century and to honour its main contributor it is often called the Pontryagin Minimum Princi-
ple (or Pontryagin Maximum Principle if we would have considered maximization instead of
minimization). A drawback of the Minimum Principle is that it assumes the existence of an
optimal control u ∗ (t ), and only then guarantees that u ∗ (t ) minimizes the Hamiltonian at each
moment in time. In practical situations, though, it often is this minimization that determines
the optimal control u ∗ (t ).
with cost
J(x_0, u(·)) = ∫_0^1 x(t) dt. (2.18)
So we have that f (x, u) = u, S(x) = 0, and L(x, u) = x. As input set we choose U = [−1, 1]. The
Hamiltonian follows as
H (x, p, u) = pu + x,
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x = −1,   p∗(1) = ∂S(x∗(T))/∂x = 0. (2.19)
Clearly this means that the costate is
p ∗ (t ) = 1 − t .
The optimal input u∗(t) – assuming it exists – at each t ∈ [0, 1] minimizes the Hamiltonian
p∗(t)u(t) + x(t), that is, we need to solve
min_{u∈[−1,1]} p∗(t)u + x∗(t). (2.20)
Since p∗(t) = 1 − t > 0 for all t ∈ [0, 1) the optimal input is attained at the boundary of U where
u is smallest,
u∗(t) = −1 ∀t.
This makes perfect sense: to minimize ∫_0^1 x(t) dt we want x(t) to go down as fast as possible
which given the system dynamics ẋ(t) = u(t) means taking u(t) as negative as possible: u(t) = −1.
The situation changes qualitatively if we add a final cost S(x(1)) = −½x(1): consider the cost
J(x_0, u(·)) = −½x(1) + ∫_0^1 x(t) dt.
Now it is not obvious what to do with u(t) because the faster x(t) goes down the larger the
final cost −½x(1) is going to be. So possibly u(t) = −1 is no longer optimal. In fact, we will see
that it is not optimal. The costate equations now are
ṗ(t) = −1,   p(1) = ∂S(x∗(1))/∂x = −1/2
and therefore
p∗(t) = 1/2 − t.
It is positive for t < 1/2 but negative for t > 1/2. Hence the optimal control – still assuming
it exists – solves (2.20) and, therefore, switches at t = 1/2,
u∗(t) = −1 if 0 ≤ t ≤ 1/2,   +1 if 1/2 ≤ t ≤ 1.
Apparently it is now optimal to move x(t ) down as fast as possible over the first half of the
time interval and then back up as fast as possible over the second half. □
with cost
J(x_0, u(·)) = ∫_0^1 x²(t) + u²(t) dt.
Now we allow any u(t) ∈ R. Notice that this is the same cost function as in Example 1.5.2 be-
cause u(t) = ẋ(t). The associated Hamiltonian is
H(x, p, u) = pu + x² + u².
Since H (x, p, u) is quadratic in u and U = R the minimizing u is the one at which the gradient
of H (x, p, u) with respect to u is zero. This yields
u∗(t) = −½ p∗(t), (2.21)
and the Hamiltonian equations (2.13) then become
ẋ∗(t) = −½ p∗(t),   x∗(0) = x_0,
ṗ∗(t) = −2x∗(t),   p∗(1) = 0.
Eliminating x∗ gives p̈∗(t) = p∗(t), so
p∗(t) = c_1 e^t + c_2 e^{−t},
x∗(t) = −½ c_1 e^t + ½ c_2 e^{−t}.
The two constants c 1 , c 2 follow uniquely from the two constraints x ∗ (0) = x 0 and p ∗ (1) = 0
(verify this yourself) and it gives
x∗(t) = x_0/(e + e^{−1}) [ e^{1−t} + e^{t−1} ],
p∗(t) = 2x_0/(e + e^{−1}) [ e^{1−t} − e^{t−1} ],
and then u∗(t) follows from (2.21). □
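For readers who want to check these constants numerically, here is a minimal shooting sketch (assuming x_0 = 1, a value not prescribed by the example): it searches for the initial costate p∗(0) such that integrating the Hamiltonian equations forward gives p∗(1) = 0, and compares the result with the closed-form value 2x_0(e − e^{−1})/(e + e^{−1}).

import numpy as np

x0, N = 1.0, 10000             # assumed initial state and number of Euler steps
dt = 1.0 / N

def p_at_1(p0):
    # forward Euler on xdot = -p/2, pdot = -2x with x(0) = x0, p(0) = p0
    x, p = x0, p0
    for _ in range(N):
        x, p = x + dt * (-p / 2), p + dt * (-2 * x)
    return p

# p(1) is increasing in p(0), so bisection finds the value with p(1) = 0
lo, hi = 0.0, 4.0
for _ in range(60):
    mid = (lo + hi) / 2
    if p_at_1(mid) > 0:
        hi = mid
    else:
        lo = mid

print(mid, 2 * x0 * (np.e - np.exp(-1)) / (np.e + np.exp(-1)))  # the two values should agree closely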
Example 2.4.5 (Optimal reinvestment). Let x(t ) be the production rate of, say, gold of some
mining company. At each moment in time a fraction u(t ) ∈ [0, 1] of the gold is reinvested in
the company to increase the production rate. This can be modeled as
where α is some positive parameter that models the success of investment. Since we reinvest
u(t)x(t), the net production rate available for the market is (1 − u(t))x(t). After T units of
time we want the net total production ∫_0^T (1 − u(t))x(t) dt to be as large as possible. In our
setup (with minimization) it means that we minimize the cost with the opposite sign,
J(x_0, u(·)) := ∫_0^T (u(t) − 1) x(t) dt
These differential equations are, in their present form, still hard to solve. However our Hamilto-
nian (2.22) is linear in u(t) so the minimizer u∗(t) ∈ [0, 1] of the Hamiltonian (2.22) depends
solely on the sign of x(t)(1 + αp(t)). The production rate x(t) is inherently positive (because
x(0) = x_0 > 0 and ẋ(t) = αu(t)x(t) ≥ 0) therefore the Hamiltonian is minimized for
u∗(t) = 0 if 1 + αp∗(t) > 0,   u∗(t) = 1 if 1 + αp∗(t) < 0.
The value of the costate p ∗ (t ) where this u ∗ (t ) switches is p ∗ (t ) = −1/α, see Fig. 2.1(left). Now
at t = T we have p ∗ (T ) = 0, so near the final time T we have u ∗ (t ) = 0 (invest nothing, sell all)
and then the Hamiltonian dynamics reduces to ẋ(t ) = 0 and
That is, p(t ) = t − T near t = T , see Fig. 2.1. Solving backwards in time, starting at t = T , we
see that the costate reduces linearly, until at time
t s := T − 1/α
it reaches the level p(t_s) = −1/α < 0 at which point u∗(t) switches. Since ṗ(t) > 0 for
every input, the value of p(t ) is less than −1/α for t < t s which, in turn, implies that u ∗ (t ) = 1
for all t < t s . For this case the Hamiltonian dynamics simplifies to
Both x(t ) and p(t ) now have exponential solutions. The combination of before-and-after-
switch is shown in Fig. 2.1.
Notice that if t s < 0 then on the time window [0, T ] no switch takes place. It is then op-
timal to invest nothing and sell everything throughout [0, T ]. This happens if α < 1/T and
the interpretation is that the success of investment α is then too small to benefit from invest-
ment. If, on the other hand, α > 1/T then t s > 0 and then investment is beneficial and the
above shows that it is optimal to first invest all and in the final 1/α time units to sell all. Of
course this model is a simplification of reality. □
[Figure 2.1: Optimal costate p∗(t), optimal input u∗(t) and optimal state x∗(t) (Example 2.4.5). The costate grows linearly, p∗(t) = t − T, on [t_s, T] and equals −1/α at t_s; the input is u∗(t) = 1 before t_s and u∗(t) = 0 after; the state grows as x∗(t) = e^{αt} x_0 up to t_s and stays at e^{αt_s} x_0 afterwards.]
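For illustration, the following sketch tabulates this optimal triple for assumed parameter values (α, T and x_0 are not prescribed by the example); it merely evaluates the switch-time formulas derived above.

import numpy as np

alpha, T, x0 = 0.5, 5.0, 1.0        # assumed values with alpha > 1/T, so a switch occurs
ts = T - 1.0 / alpha                 # switch time t_s = T - 1/alpha

def u_opt(t):
    return 1.0 if t < ts else 0.0    # invest everything before t_s, sell everything after

def x_opt(t):
    return x0 * np.exp(alpha * min(t, ts))   # exponential growth while u = 1, constant after

def p_opt(t):
    if t >= ts:
        return t - T                              # p* = t - T on [t_s, T]
    return -np.exp(alpha * (ts - t)) / alpha      # solves pdot = -alpha*p with p(t_s) = -1/alpha

for t in np.linspace(0, T, 11):
    print(f"t={t:4.1f}  u*={u_opt(t):.0f}  x*={x_opt(t):7.3f}  p*={p_opt(t):7.3f}")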
An interesting consequence of the Hamiltonian form of the differential equations for x(t )
and p(t ) is that the Hamiltonian function H (x, p, u) = p T f (x, u) + L(x, u) is preserved along
optimal trajectories. For the unconstrained inputs U = Rm this follows from the Beltrami iden-
tity but it may also be verified directly from the first-order equations for optimality expressed in
Proposition 2.4.1. Indeed, let x ∗ (t ), p ∗ (t ), u ∗ (t ) denote an optimal triple satisfying the equa-
tions of Proposition 2.4.1. Then a direct computation yields (and for the sake of exposition
we momentarily skip all arguments of H and other functions)
d/dt H = (∂H/∂xᵀ) ẋ∗ + (∂H/∂pᵀ) ṗ∗ + (∂H/∂uᵀ) u̇∗     (the last term is zero)
       = (∂H/∂xᵀ) ẋ∗ + (∂H/∂pᵀ) ṗ∗ = (∂H/∂xᵀ)(∂H/∂p) + (∂H/∂pᵀ)(−∂H/∂x) = 0 (2.23)
for every solution x ∗ (t ), p ∗ (t ), u ∗ (t ) of (2.12c). In the next chapter we prove that the conser-
vation of the Hamiltonian H (x, p, u) along optimal trajectories also holds for restricted input
sets U (such as U = [0, 1] et cetera). This is quite remarkable because in such cases the input
often is discontinuous. The following example illustrates this property.
Example 2.4.6 (Example 2.4.3 continued). In Example 2.4.3 we considered ẋ(t ) = u(t ) with
initial condition x(0) = x_0 and cost J(x_0, u(·)) = −½x(1) + ∫_0^1 x(t) dt. We found that the optimal
costate trajectory is linear in time,
p∗(t) = 1/2 − t
and that the optimal input switches halfway,
u∗(t) = −1 if 0 ≤ t < 1/2,   +1 if 1/2 ≤ t ≤ 1.
Therefore the description of the optimal state trajectory also switches halfway. From ẋ(t) = u(t) it follows that
x∗(t) = x_0 − t for 0 ≤ t ≤ 1/2,   x_0 − 1 + t for 1/2 ≤ t ≤ 1.
Based on this one would perhaps think that the Hamiltonian then switches as well, but it does
not: the Hamiltonian is H (x, p, u) = pu + x and along optimal trajectories it is constant for all
time:
H(x∗(t), p∗(t), u∗(t)) = p∗(t)u∗(t) + x∗(t)
  = −(1/2 − t) + (x_0 − t) if t < 1/2,   (1/2 − t) + (x_0 − 1 + t) if t ≥ 1/2
  = x_0 − 1/2 ∀t.
Keep in mind that there are no conditions on the remaining final state components
x r +1 (T ), . . . , x n (T ). As before we take a cost of the form
J(x_0, u(·)) = S(x(T)) + ∫_0^T L(x(t), u(t)) dt. (2.25)
Lemma 2.3.1 (the first-order conditions for U = Rm ) can be generalized to this case as follows.
In the proof of this lemma, the conditions on the final costate
p(T) = ∂S(x∗(T))/∂x
were derived from the free end-point condition (1.41), but in Proposition 1.5.1 we saw that
these conditions are absent if the final state is fixed. With that in mind it will be no surprise
that fixing the first r components of the state, x i (T ), i = 1, . . . , r implies that the conditions
on the corresponding first r components of the costate are absent, so only the remaining
components of p(T ) are fixed:
p_i(T) = ∂S(x∗(T))/∂x_i,   i = r + 1, . . . , n.
That is indeed the case, normally. However, there can be a catch: the first-order conditions
were derived using a perturbation of the solution, but if both initial state and final state are
fixed then examples can be constructed where nonzero perturbations do not exist. For exam-
ple, if
then the only control that steers x(0) = 0 to x(1) = 0 is the zero function u(t ) = 0 and any
perturbation of this u(t) is infeasible. Now matters become involved, and it would take way
too long to explain here how to resolve this problem. The interested reader might want to
consult the excellent book Liberzon (2012). Here we just provide the standard solution. The
solution involves the introduction of the modified Hamiltonian
It is the Hamiltonian but with an extra parameter λ and this parameter is either zero or one,
λ ∈ {0, 1}.
Note that H (x, p, u, 1) is the “normal” Hamiltonian, and that H (x, p, u, 0) completely neglects
the running cost L(x, u). The case λ = 0 is commonly referred to as the “abnormal” case,
indicating that it is not likely to happen in practice. With this modified Hamiltonian, the
Minimum Principle (Thm. 2.4.2) generalizes as follows.
Theorem 2.5.1 (Minimum Principle for constraint final state). Consider (2.24) with stan-
dard cost (2.25) and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are all
continuous in x and u.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and assume it is
bounded and piecewise continuous. Let x ∗ : [0, T ] → Rn be the resulting optimal state. Then
there is a function p∗ : [0, T] → Rn and a constant λ ∈ {0, 1} such that (λ, p∗(t)) ≠ (0, 0) for all t ∈
[0, T], and
ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t), λ)/∂p,   x∗(0) = x_0,   x∗_i(T) = x̂_i, i = 1, . . . , r (2.27a)
ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t), λ)/∂x,   p∗_i(T) = ∂S(x∗(T))/∂x_i, i = r + 1, . . . , n (2.27b)
and along the solution x ∗ (t ), p ∗ (t ) the input u ∗ (t ) at every t ∈ [0, T ] where it is continuous
minimizes the modified Hamiltonian:
Example 2.5.2 (Singular optimal control – an abnormal case). Consider the scalar system
It is clear that the only feasible control is the zero function. So the minimal cost is 0, and
x ∗ (t ) = u ∗ (t ) = 0 for all time.
The modified Hamiltonian is H(x, p, u, λ) = pu² + λu. If we try to solve the normal Hamil-
tonian equations (2.27), (2.28) (so for λ = 1), we find that the costate is constant and that u∗(t)
at every t minimizes p∗(t)u² + u. But the true optimal control u∗(t) = 0 does not minimize
p∗(t)u² + u.
For the abnormal case, λ = 0, the Hamiltonian equations again say that p∗(t) is constant
but now that u∗(t) at every t minimizes p∗(t)u². Clearly for every positive constant p∗(t) the
true optimal control u∗(t) = 0 minimizes p∗(t)u². □
One more “abnormal” case is discussed in Exercise 2.12. All other examples in the chapter
are normal.
Example 2.5.3 (Shortest path). In the previous chapter we solved the (trivial) shortest path
problem by formulating it as an example of the simplest problem in the calculus of variations.
We now formulate this as an optimal control problem with final constraint. Let x(·) be a curve
through the points x(0) = a and x(T ) = b and assume T > 0. The length of the curve is
ℓ(x(·)) = ∫_0^T √(1 + ẋ(t)²) dt.
We want to minimize this ℓ(x(·)). This can be seen as an optimal control problem for the
system
with cost
J(x_0, u(·)) = ∫_0^T √(1 + u²(t)) dt.
If we apply Thm. 2.5.1, we find that p ∗ (·) is constant. We denote this constant as p̂. Substitu-
tion of λ = 1 in (2.28) and some rearrangements yield the following candidates for the optimal
input (verify this yourself)
u∗(t) = −∞ if p̂ ≥ 1,   u∗(t) = −p̂/√(1 − p̂²) if −1 < p̂ < 1,   u∗(t) = ∞ if p̂ ≤ −1.
We can strike off the first and the last candidates, because they clearly fail to achieve the final
constraint x(T ) = b. The second candidate says that u ∗ (t ) is some constant. With a constant
input u ∗ (t ) = u 0 the solution of the differential equation for x(t ) is x(t ) = t u 0 + a which is a
straight line. Because of the initial and final conditions it follows that u_0 = (b − a)/T. Hence, as
expected,
x∗(t) = a + t (b − a)/T,   u∗(t) = (b − a)/T.
The costate can be recovered from the fact that u∗(t) = −p̂/√(1 − p̂²). This gives
p∗(t) = (a − b)/√(T² + (b − a)²).
It is interesting to compare this with the optimal cost (the optimal length of the curve)
ℓ∗(a, b) = √(T² + (b − a)²).
Here we see that p∗(0) equals ∂ℓ∗(a, b)/∂a. I.e. p∗(0) expresses how strongly the optimal cost
changes if we change a. Every costate has this property (we return to this in § 3.5). □
x(0) = x(T ) = 0.
This roughly speaking says that we want x(t ) as small (negative) as possible, yet it needs to
start at zero, x(0) = 0, and needs to end up at zero again, x(T ) = 0. The Hamiltonian (with
λ = 1) is
H (x, p, u) = pu + x
ṗ(t ) = −1.
Notice that since the state is fixed at final time, x(T ) = 0, there is no condition on the costate
at the final time. So, for now, all we know about the costate is that its derivative is −1, i.e.
p(t) = c − t
for some as yet unknown constant c. Given this p(t ) = c −t , the minimizer u ∗ (t ) of the Hamil-
tonian is
u∗(t) = −1 if t < c,   +1 if t > c.
This function switches sign (from negative to positive) at t = c, and as a result the state x(t ) is
piecewise linear. First it goes down and then, from t = c onwards, it goes up,
x(t) = −t if t < c,   t − 2c if t > c.
It will be clear that the only value of c for which x(T ) is zero, is
c = T /2.
This then completely settles the optimal control: on the first half [0, T/2] we have ẋ∗(t) = −1
and on the second half [T/2, T] we have ẋ∗(t) = +1. The optimal cost is the integral of this
x∗(t): J(0, u∗(·)) = ∫_0^T x∗(t) dt = −T²/4. □
2.6 Free final time
So far the final time T was fixed. Now we drop this assumption and optimize the cost over all
inputs as well as over all final times T ≥ 0. As always we assume a cost of the form
J(u(·), T) = S(x(T)) + ∫_0^T L(x(t), u(t)) dt.
As we have one more degree of freedom, the Minimum Principle still holds but with one more
condition. This condition is quite elegant:
Theorem 2.6.1 (Minimum Principle with free final time). Consider a differential equation
ẋ(t ) = f (x(t ), u(t )) and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are
all continuous in x and u. Suppose time T∗ and input u∗ : [0, T∗] → U are a solution of the
optimal control problem with free final time, and assume the input is bounded and piecewise
continuous. Let x∗ : [0, T∗] → Rn be the resulting optimal state. Then there exist a function
p ∗ (·) and a constant λ ∈ {0, 1}, such that
Proof (sketch). We prove it for the case without final state constraints (i.e., r = 0 and λ = 1)
and we assume that u ∗ (t ) is continuous at t = T . For technical reasons we define u ∗ (t ) to
equal u∗(T) for all t > T. Then the cost is differentiable with respect to the final time T
and, hence, at the optimal time T∗ we necessarily have
dJ(x_0, u∗(·), T∗)/dT = 0.
This derivative equals
The extension to the case r > 0 is nontrivial. Detailed proofs can be found in Liberzon (2012).
■
The remarks about λ that were made after Thm. 2.5.1 also apply to this situation. Since
by (2.23) the Hamiltonian H (x, p, u) is constant along optimal trajectories we conclude
from (2.30) that actually the Hamiltonian H (x ∗ (t ), p ∗ (t ), u ∗ (t )) is identically zero for all time!
Now we can solve the classic problem of Zermelo.
[Figure 2.2: the Zermelo problem – a boat crosses the river from (x_1, x_2) = (0, 0) to (x_1, x_2) = (a, b); u is the heading angle of the boat and the flow speed depends on x_2.]
Example 2.6.2 (Zermelo). We want a boat to cross a river in minimal time. The water in the
river flows with a speed that depends on the distance to the banks. We denote the speed of
the boat with respect to the water by v and we assume it is a positive constant. The point
of departure of the boat (x 1 (0), x 2 (0)) = (0, 0) and the arrival point (x 1 (T ), x 2 (T )) = (a, b) are
given. The equations of motion of the boat are
where W (x 2 ) is the flow speed of the river at x 2 , and u is the angle between the boat’s principal
axis and the x 1 -axis, see Fig. 2.2. We take the minimal time cost
J(x_0, u(·), T) = ∫_0^T dt = T.
−p_1(t) v sin(u(t)) + p_2(t) v cos(u(t)) = 0
and therefore, if p∗_1(t) ≠ 0,
tan(u∗(t)) = p∗_2(t)/p∗_1(t) = p∗_2(t)/p∗_1(T). (2.32)
ṗ∗_1(t) = 0,
ṗ∗_2(t) = −p∗_1(t) ∂W(x∗_2(t))/∂x_2. (2.33)
Finally, the condition (2.30) of optimality of final time implies as above that
0 = H(x∗(t), p∗(t), u∗(t))
  = p∗_1(t) v cos(u∗(t)) + p∗_1(t) W(x∗_2(t)) + p∗_2(t) v sin(u∗(t)) + 1. (2.34)
addition that u ∗ (·) is constant. For constant inputs u ∗ (·) and W (x 2 ), Equation (2.31) can be
solved directly. We denote this constant input as u_0 and so
x 1 (t ) = (v cos(u 0 ) + w 0 )t
x 2 (t ) = (v sin(u 0 ))t .
Also, the state at the final time must satisfy x 1 (T ) = a and x 2 (T ) = b. This yields the following
system of equations in the unknowns u 0 and T :
a = (v cos(u 0 ) + w 0 )T
b = (v sin(u_0))T.
I.e.,
cos(u_0) = (a − w_0 T)/(T v),
sin(u_0) = b/(T v). (2.35)
Squaring and adding both equations yields
T²(v² − w_0²) + 2w_0 aT − a² − b² = 0.
If we assume that v > w_0, then this equation has exactly one positive solution T. Then u_0
follows from (2.35). □
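As a numerical illustration (with assumed values for v, w_0, a and b, so only a sketch), the crossing time and heading follow directly from the quadratic equation above and (2.35):

import numpy as np

v, w0 = 2.0, 1.0                   # boat speed and constant flow speed, v > w0 (assumed)
a, b = 10.0, 4.0                   # assumed arrival point

# T^2 (v^2 - w0^2) + 2 w0 a T - (a^2 + b^2) = 0, take the positive root
T = max(np.roots([v**2 - w0**2, 2 * w0 * a, -(a**2 + b**2)]).real)
u0 = np.arctan2(b / (T * v), (a - w0 * T) / (T * v))   # heading angle from (2.35)
print(f"crossing time T = {T:.3f}, heading u0 = {np.degrees(u0):.1f} degrees")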
Example 2.6.3 (Bang-bang control). This is an elegant and classic application. We want to
steer a car into a parking spot and we want to do it in minimal time, and of course, at the
precise moment that we reach the spot our speed should be zero. To keep things manageable
we assume that we can steer the car in one dimension only (like a cart on a rail). The position
of the car is denoted x 1 (t ) and its speed as x 2 (t ). The acceleration u(t ) is bounded, say u(t ) ∈
[−1, 1]. The state equations thus are
ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = u(t ), u(t ) ∈ [−1, 1].
Let us assume that the desired (parking) state is the origin x_1(T) = 0, x_2(T) = 0, and that at time
zero we are at some other state x_1(0), x_2(0). The cost to be minimized is time T, so
J(x_1(0), x_2(0), u(·)) = ∫_0^T 1 dt.
H (x, p, u) = p 1 x 2 + p 2 u + 1.
ṗ 1 (t ) = 0
ṗ 2 (t ) = −p 1 (t ).
Since both components of the final state x(T ) are fixed, the final constraints on both com-
ponents of the co-state are absent. Therefore in principle every constant p 1 is allowed and,
therefore, every linear function p 2 (t ):
p 2 (t ) = at + b.
For the optimal input we cannot have a = b = 0 because that contradicts the fact that
H(x, p, u) = p_1x_2 + p_2u + 1 is zero. Consequently, the second co-state entry p_2(t) is not the zero
function. This, in turn, implies that p 2 (t ) can switch sign at most once. Why is this important?
Well, the optimal u ∗ (t ) minimizes the Hamiltonian p 1 x 2 + p 2 u + 1 and since u ∗ (t ) ∈ [−1, 1] we
have
u ∗ (t ) = − sgn(p 2 (t )).
This is well defined because p_2(t) is nontrivial, and as p_2(t) switches sign at most once, also
u∗(t) switches at most once.
Let t s be the moment of switching. Then, by definition, the input for t > t s does not switch
any more and so is either +1 throughout or −1 throughout. Now for u = +1 it is easy to see
that the solutions (x 1 (t ), x 2 (t )) are the shifted parabolas:
[figure: the families of shifted parabolas traced by (x_1(t), x_2(t)) for u = +1 and for u = −1]
Since on [t s , T ] the input does not change and since x(T ) = (0, 0) it must be that on [t s , T ] the
state is either this red or blue parabola:
[figure: the two parabolas through the origin, one for u = −1 and one for u = +1]
These two are the only two parabolas that end up at the desired final state x(T) = (0, 0). Before
the switch time the input u(t) had the opposite sign. For instance if after the switch we have
u = +1 (the red orbit) then before the switch we have u = −1, e.g. any of the gray parabolas.
These have to end up at the red parabola. Inspection shows that the orbits are any of these:
[figure: the gray parabolas for the opposite input, each joining the parabola through the origin]
Before the switch the orbit follows the gray parabola and then, after the moment of switching,
it follows the red or blue parabola. This settles the problem for every initial state (x_1(0), x_2(0)).
□
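One way to express this picture as a state feedback (a standard reformulation, not spelled out in these notes, so treat it as a sketch): the two parabolas through the origin satisfy x_1 = −½ x_2|x_2|, and the sign of x_1 + ½ x_2|x_2| tells on which side of this switching curve the state is. A crude simulation:

import numpy as np

def u_feedback(x1, x2):
    s = x1 + 0.5 * x2 * abs(x2)               # which side of the switching curve we are on
    if s > 0:
        return -1.0                            # above the curve: u = -1
    if s < 0:
        return +1.0                            # below the curve: u = +1
    return -np.sign(x2) if x2 != 0 else 0.0    # on the curve: slide along it to the origin

x1, x2, t, dt = 3.0, 1.0, 0.0, 1e-3            # assumed initial state
for _ in range(200_000):                       # guard against endless chattering
    if abs(x1) + abs(x2) < 1e-2:
        break
    u = u_feedback(x1, x2)
    x1, x2, t = x1 + dt * x2, x2 + dt * u, t + dt
print(f"reached the origin (within tolerance) at t = {t:.2f}")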
2.7 Exercises
2.1 Consider the scalar system (compare with Exercise 3.3)
(a) Determine the Hamiltonian H (x, p, u) and the differential equation for the costate.
(b) Determine the optimal input u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).
(c) Show that H(x∗(T), p∗(T), u∗(T)) is zero.
(d) Determine p ∗ (t ).
(e) Determine the optimal u(t ) as a function of x(t ) and then calculate the optimal
state trajectory for T = 2. [Hint: see Example B.1.5.]
(a) Give the Hamiltonian and the differential equation for the costate.
(b) Prove that from Pontryagin’s Minimum Principle it follows that u ∗ (t ) (generically)
assumes only two values.
(c) Prove that if x 0 > 0, then x ∗ (t ) > 0 for all t ∈ [0, T ].
(d) Prove that p ∗ (t ) under the conditions stated in c. has at most one change of sign.
What does this mean for u ∗ (t )?
(e) Solve the optimization problem for x 0 > 0. Also give the solution for p ∗ (t ).
2.3 A point mass attached to a spring with (positive) spring-constant k, displacement from
the equilibrium x 1 (t ) and velocity x 2 (t ) is subjected to an external force u(t ). The equa-
tions of motion are
ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = −kx 1 (t ) + u(t ).
uses u(t ) as control. Here, u(t ) and y(t ) are scalar quantities. The cost is given by
J_{[0,T]}((y_0, v_0), u(·)) = ½ ∫_0^T u(t)² dt.
Determine the optimal control that drives the system from the initial state y(0) = y 0 ,
ẏ(0) = v 0 to the final state y(T ) = ẏ(T ) = 0.
where φ(t ) is the angle with respect to the stable equilibrium state, u(t ) is a torque
exerted around the suspension point.
The objective is to minimize the cost
J_{[0,T]}(x_0, u(·)) = mℓ²φ̇²(T) − 2mgℓ cos(φ(T)) + ∫_0^T φ̇²(t) + u²(t) dt.
We introduce the notation x = (x_1, x_2) = (φ, φ̇).
(a) Determine the state differential equation ẋ(t ) = f (x(t ), u(t )).
(b) Determine the Hamiltonian H (x, p, u) and the differential equation for the costate.
(c) Calculate ∫_0^T φ̇(t)u(t) dt. What do you see?
[Figure 2.3: Soft landing on the Moon – the lunar ship of mass m(t) at altitude y, with upward thrust −c ṁ(t) and downward gravity g m(t).]
(d) (Difficult) Give an expression in terms of φ∗ (t ) and φ̇∗ (t ) for the optimal control.
2.6 Soft landing on the Moon. By thrusting out gasses with a constant velocity c (but vari-
able quantities), a lunar ship with mass m(t ) is subjected to an upward force −c ṁ(t )
(note: ṁ(t) ≤ 0). See Fig. 2.3. Also, a gravitational force −g m(t) acts on the ship. The altitude
y(t ) of the ship satisfies the differential equation
The objective is to determine the final time T > 0 such that the lunar ship makes a soft
landing, and such that the use of fuel is minimized. Fuel use is subject to an additional
restriction: −1 ≤ ṁ(t ) ≤ 0. With the state variables x 1 (t ) = y(t ), x 2 (t ) = ẏ(t ), x 3 (t ) = m(t )
and the input variable u(t ) = −ṁ(t ) we rewrite the problem as follows:
ẋ_1(t) = x_2(t),   x_1(0) = x_0,   x_1(T) = 0
ẋ_2(t) = c u(t)/x_3(t) − g,   x_2(0) = ẋ_0,   x_2(T) = 0
ẋ_3(t) = −u(t),   x_3(0) = M_0
and
J_{[0,T]}(x_0, u(·)) = ∫_0^T u(t) dt,   U = [0, 1].
ρ(t) = c p_2(t)/x_3(t) − p_3(t) + 1.
Prove that ρ̇(t) = −c p_1(T)/x_3(t), and give the conditions for the optimal control in terms
of ρ(t).
(e) Conclude from (d) that u ∗ (t ) is of the following form:
i. u ∗ (t ) = 0, 0 ≤ t ≤ T∗ , or
ii. u ∗ (t ) = 1, 0 ≤ t ≤ t 1 , u ∗ (t ) = 0, t 1 < t ≤ T∗ , or
iii. u ∗ (t ) = 0, 0 ≤ t ≤ t 1 , u ∗ (t ) = 1, t 1 < t ≤ T∗ , or
iv. u ∗ (t ) = 1, 0 ≤ t ≤ T∗ .
(f) Prove that i. and ii. are not possible.
(g) What is the relation between ρ∗(T∗), p∗_1(0) and p∗_2(0)? Here, ρ∗(t) = c p∗_2(t)/x∗_3(t) −
p∗_3(t) + 1.
2.7 We want to move a mass in 2 seconds, beginning and ending with zero speed, using
bounded acceleration. With x 1 its position and x 2 its speed, a model for this problem is
ẋ 1 (t ) = x 2 (t ), x 1 (0) = 0
ẋ 2 (t ) = u(t ), x 2 (0) = 0, x 2 (2) = 0.
u(t ) ∈ [−1, 1]
with cost
J(x_0, u(·)) = ∫_0^1 −ln(x(t)u(t)) dt.
Since x(0) > 0 we have that x(t ) ≥ 0 for all t . For a well-defined cost we hence need
u(t ) ∈ [0, ∞) but for the moment we allow any u(t ) ∈ R and later verify that the optimal
u ∗ (t ) is in fact > 0.
2.9 Consider an economy consisting of two sectors where Sector 1 produces investment
goods and Sector 2 produces consumption goods. Let x i (t ), i = 1, 2 represent the pro-
duction in the i -th sector at time t and let u(t ) be the fraction of investments allocated
to Sector 1. Suppose the dynamics of the x i (t ) are given by
ẋ 1 (t ) = au(t )x 1 (t )
ẋ 2 (t ) = a(1 − u(t ))x 1 (t )
where a is a positive constant. Hence, the increase in production per unit of time in
each sector is assumed to be proportional to the investment allocated to the sector. By
definition we have
where [0, T ] denotes the planning period. As optimal control problem we may consider
the problem of maximizing the total consumption in the given planning period [0, T ],
thus our problem is to maximize
J(x_0, u(·)) = ∫_0^T x_2(t) dt (2.37)
subject to
x 1 (0) = x 10 , x 1 (T ) = free,
x 2 (0) = x 20 , x 2 (T ) = free,
with x 10 > 0, x 20 ≥ 0.
2.10 Consider the second order system with mixed initial and final conditions
The input u : [0, 1] → R is not restricted, i.e. u(t ) can take on any real value.
2.11 Consider the second order system with mixed initial and final conditions
The input u : [0, 1] → R is not restricted, i.e. u(t ) can take on any real value.
2.12 Integral constraints. Let us go back to the calculus of variations problem of minimizing
∫_0^T F(x(t), ẋ(t)) dt
Thm. 1.8.1 says that at the optimal solution either (1.65) holds for some µ∗ ∈ R or
that (1.66) holds. This problem can also be cast as an optimal control problem with
a final condition, and then Thm. 2.5.1 gives us the same two conditions (depending on
whether the Hamiltonian is normal or abnormal):
(a) Let ẋ(t ) = u(t ) and define ż n+1 (t ) = M (x(t ), u(t )) and z = (x, z n+1 ) ∈ Rn+1 . Formu-
late the above calculus of variations problem as an optimal control problem with
a final state condition in state z and with U = Rn . (I.e. express f (z), L(z, u), S(z) in
terms of F (x, ẋ), M (x, ẋ), c 0 .)
(b) Since z = (x, z n+1 ) has n + 1 components also the corresponding costate p has n +
1 components. Show that p n+1 (t ) is constant for both the normal Hamiltonian
H (x, p, u, 1) and abnormal Hamiltonian H (x, p, u, 0).
(c) For the normal Hamiltonian H (x, p, u, 1), show that the existence of a solution of
the Hamiltonian equations (2.27) and (2.28) imply that (1.65) holds for µ∗ = p n+1 .
(d) For the abnormal Hamiltonian H (x, p, u, 0), show that the existence of a solution
of the Hamiltonian equations (2.27) and (2.28) with p n+1 6= 0 implies that (1.66)
holds.
2.13 Time-varying cost. Suppose ẋ(t) = −x(t) + u(t) and that x(0) = 1, x(1) = 0. Minimize
∫_0^{1/2} u²(t) dt + ∫_{1/2}^1 2u²(t) dt.
Chapter 3
Dynamic Programming
3.1 Introduction
In the late fifties of the previous century, at the time that the Minimum Principle was de-
veloped in the Soviet Union, a team in the USA developed an entirely different approach to
optimal control, called Dynamic Programming. In this chapter we deal with Dynamic Pro-
gramming. As in the previous chapter, we assume that the state satisfies a differential equa-
tion
in which x : [0, T ] → Rn , and that the input u(t ) at each moment in time takes values in a
possibly limited set U:
u : [0, T ] → U. (3.1b)
As before, we associate with system (3.1a) a cost over a finite time horizon [0, T] of the form
J_{[0,T]}(x_0, u(·)) := ∫_0^T L(x(t), u(t)) dt + S(x(T)). (3.1c)
The cost depends on the initial condition x(0) = x 0 and the input u(·). In this chapter it is also
convenient to emphasise the dependence of this cost on the time interval [0, T ]. The final
time T and the functions S : Rn → R and L : Rn × U → R are assumed given.
The crux of Dynamic Programming is to associate with this one cost over time horizon
[0, T ] a whole family of costs over subsets of this time horizon,
J_{[τ,T]}(z, u(·)) := ∫_τ^T L(x(t), u(t)) dt + S(x(T)), (3.2)
for each initial time τ ∈ [0, T ] and for each initial state x(τ) = z, and then to establish a dy-
namic relation between these costs (hence the name dynamic programming). On the one
hand this complicates the problem because we will need to solve many optimal control prob-
lems, but it generates structure and much insight and, as we will see, it produces sufficient
conditions for optimality.
[figure: an optimal control u∗(t) on [0, T] and an alternative input û(t) on [τ, T]]
The figure should be instructive here. It depicts an optimal control u∗(·) on [0, T] and an alternative
input û(·) on a restricted time window [τ, T ] for some τ. The optimal control u ∗ (·) steers the
state from x(0) = x 0 to some value x ∗ (τ) at t = τ. Is it possible that the alternative û(·) achieves
a smaller cost-to-go J [τ,T ] (x ∗ (τ), u(·)) over the remaining window [τ, T ] than u ∗ (·)? That is, is
it possible that
No, because if that were possible then the new input ũ(·) constructed from u∗(·) over the initial [0, τ]
and û(·) over the remaining [τ, T ] would improve on u ∗ (·) over the entire horizon:
J_{[0,T]}(x_0, ũ(·)) = ∫_0^τ L(x(t), ũ(t)) dt + J_{[τ,T]}(x(τ), ũ(·))
  = ∫_0^τ L(x∗(t), u∗(t)) dt + J_{[τ,T]}(x∗(τ), û(·))
  < ∫_0^τ L(x∗(t), u∗(t)) dt + J_{[τ,T]}(x∗(τ), u∗(·)) = J_{[0,T]}(x_0, u∗(·))
and this contradicts the assumed optimality of u ∗ (·). Summary: if u ∗ (·) is optimal for
J [0,T ] (x 0 , u(·)) then for every τ ∈ [0, T ] it is optimal for J [τ,T ] (x ∗ (τ), u(·)) as well. That is the
principle of optimality. It will be of great help in the analysis to come.
x t +1 = f (x t , u t ), (3.3)
t ∈ {0, 1, . . . T − 1},
with x 0 given and T a given positive integer. We want to find a control sequence
(u ∗0 , . . . , u ∗T −1 ), called optimal control (sequence) and resulting state sequence (x ∗0 , x ∗1 , . . . , x ∗T )
that minimizes a cost of the form
J_{[0,T]}(x_0, u_0, . . . , u_{T−1}) = ∑_{t=0}^{T−1} L(x_t, u_t) + S(x_T). (3.4)
Incidentally, in discrete-time systems there is no need to restrict the state space X to some set
on which derivatives are defined, like our default Rn . Indeed, the state space in applications
is often a finite set, for example the standard alphabet, X = {a, b, c, . . . , z}. The same is true for
the input set U. In what follows, the number of elements of a set X is denoted as |X|.
[Figure 3.2: the seven states 0, 1, . . . , 6 arranged in a circle]
Example 3.3.1 (Naive optimization). Suppose the state space X consists of the 7 integer ele-
ments
X = {0, 1, . . . , 6}.
Align the states in a circle (Fig. 3.2) and suppose that at each moment in time the state can
either move one step counter-clockwise, or stay where it is. Thus at each moment in time we
have a choice of two. The input space U then has two elements. If we take
U = {0, 1}
then the transition from one state to the next is modeled by the discrete system
x t +1 = x t + u t , u t ∈ U, t ∈ {0, 1, . . . , T − 1}
(counting modulo 7, so 6 + 1 = 0). Each transition from one state x t to the next x t +1 is as-
sumed to cost a certain amount L(x t , u t ) and the final state x T at time T costs an additional
S(x T ). The total cost hence is (3.4). The naive approach to determine the optimal control
{u 0 , . . . , u T −1 } and resulting optimal state sequence {x 1 , . . . , x T } is to just explore them all and
pick the best. As we can move in two different ways each moment in time, this naive ap-
proach would require 2^T sequences (x_1, . . . , x_T) to explore. Since each sequence has length T
the evaluation of the cost for each sequence is (roughly) linear in T, and therefore the total
number of operations required in this naive approach is of order
T × 2^T.
It is not hard to see that for arbitrary systems (3.3) the total number of operations that the
naive approach requires is of order
T × |U|^T.
It is exponential in T .
In Dynamic Programming we solve the minimization backwards in time. This may at first
sight seem to complicate the analysis, but it allows us to exploit the principle of optimality.
The following example explains it all.
Example 3.3.2 (Backwards in time). Continue with the system of Example 3.3.1,
with x_t ∈ {0, 1, . . . , 6} and to make it more explicit, assume that the final cost is x² and that each
counter-clockwise move costs 1, i.e.
[figure: a trellis with time t = 0, 1, . . . , T on the horizontal axis and the states x = 0, . . . , 6 on the vertical axis]
The horizontal axis represents time t = 0, 1, . . . , T and the vertical axis represents the states x =
0, 1, . . . , 6. Vertices (dots) denote pairs (t , x) and lines (edges) represent possible transitions.
For instance the line connecting (t , x) = (0, 6) with (t , x) = (1, 0) says that we can move from
x = 6 to x = 0 in one time step.
Let us first figure out the cost of the final state, x T . Since we do not know in which final
state we end up, we have to determine this cost for every element of the state space. This cost
we denote as V_T(x) and clearly this is simply the final cost V_T(x) = S(x) = x², so:
[figure: the trellis with the final costs V_T(x) = x², i.e. 0, 1, 4, 9, 16, 25, 36, attached to the vertices at t = T]
Now that the cost Vt (x) at the final t = T is known, consider the optimal cost-to-go from
t = T −1 onwards. This cost is denoted VT −1 (x) and since, again, we do not know which states
can be reached, we have to compute this cost for every x of the state space. This optimal cost-
to-go VT −1 (x) is by definition the smallest possible cost that we can achieve if at time t = T −1
we are at state x. This equals
because L(x, u) is the cost of the transition if we apply input u and VT ( f (x, u)) is the final cost
(because f (x, u) is the state we end up in if we apply u). With VT already established this
minimization requires at each state |U| = 2 inputs to explore and since we have to perform
the minimization for every state in X = {0, 1, . . . , 6}, the total number of operations that this
requires is of order |X| × |U|. The numbers inside the circles are V_T(x) and V_{T−1}(x):
[figure: the trellis with V_{T−1}(x) and V_T(x) at t = T − 1 and t = T; V_{T−1}(6) = 1 while V_{T−1}(x) = x² for the other states]
Along the way we also determined an optimal input u T −1 (denoted in the figure by the thick
edges). Notice that none of the states x at time T − 1 switches to x = 6 at time T . We can
continue in this fashion and determine backwards in time, for each t = T − 2, T − 3, . . . , 0, the
cost-to-go from t onwards, via the rule
As before, this equation says that the cost-to-go from t onwards starting at x t = x, is the cost
of the transition L(x, u) plus the optimal cost-to-go from t + 1 onwards. Eventually we end up
at t = 0 with this solution
x = 6:   1    1    1    1    1    36
x = 5:   2    2    2    2    25   25
x = 4:   3    3    3    16   16   16
x = 3:   4    4    9    9    9    9
x = 2:   4    4    4    4    4    4
x = 1:   1    1    1    1    1    1
x = 0:   0    0    0    0    0    0
         (columns: t = 0, 1, . . . , T − 1, T)
The optimal control sequence in this example is actually not unique. The above indicates all
possible optimal solutions. The optimal cost Vt (x) of course is unique. Now the problem is
solved, in fact, it is solved for every initial condition x 0 . For x 0 = 6 we see that one optimal
input sequence is u ∗ = (1, 0, 0, 0, 0) while for x 0 = 5 one optimal input is u ∗ = (1, 1, 0, 0, 0). ä
In Dynamic Programming the game is to compute the optimal cost-to-go via the recursion
V_t(x) = min_{u∈U} ( L(x, u) + V_{t+1}( f(x, u) ) ) (3.5)
starting at the final time, t = T , where the problem is trivial, and then subsequently going
backwards in time, t = T − 1, t = T − 2, . . . until we reach t = 0. To determine the final cost
V_T(x) = S(x) for all x ∈ X requires order |X| operations. Then determining V_{T−1}(x) for all x ∈ X
requires |X| times the number of inputs |U| to explore, et cetera, and so the total number of
operations for all t ∈ {0, 1, . . . , T } is of order
T × |U| × |X|.
If the number of states is modest or if T is large, then this typically outperforms the naive ap-
proach (which requires order T × |U|^T operations). Equation (3.5) is called Bellman's equation
of Dynamic Programming.
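A minimal sketch of this backward recursion for the circle example of Example 3.3.2 (with the assumed horizon T = 5, each counter-clockwise move costing 1 and final cost x²):

T = 5                                   # horizon, as suggested by the trellis above (assumed)
X = range(7)                            # states 0,...,6 on the circle
U = (0, 1)                              # stay, or move one step counter-clockwise

f = lambda x, u: (x + u) % 7            # dynamics x_{t+1} = x_t + u_t (mod 7)
L = lambda x, u: u                      # each move costs 1, staying is free
S = lambda x: x**2                      # final cost

V = {(T, x): S(x) for x in X}           # V_T(x) = S(x)
policy = {}
for t in range(T - 1, -1, -1):          # Bellman's equation (3.5), backwards in time
    for x in X:
        costs = {u: L(x, u) + V[(t + 1, f(x, u))] for u in U}
        policy[(t, x)] = min(costs, key=costs.get)
        V[(t, x)] = costs[policy[(t, x)]]

print([V[(0, x)] for x in X])           # [0, 1, 4, 4, 3, 2, 1], matching the t = 0 column above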
In continuous time the same basic idea survives, except for the results regarding its com-
putational complexity. In the continuous time case the optimization is over a set of input
functions on the time interval [0, T ], which is an infinite-dimensional space. Furthermore,
it is clear that contrary to the discrete-time case we will not be able to completely split the
problem into a series of finite-dimensional minimization problems.
Definition 3.4.1 (Value function or cost-to-go). The value function V : Rn ×[0, T ] → R at state
z and time τ is defined as the optimal cost-to-go over time horizon [τ, T ] with initial state
x(τ) = z, that is,
In most cases of interest the infimum in (3.6) is attained by some u ∗ (·), in which case the
infimum (3.6) is a minimum. In general, though, a minimizer need not exist but the infimum
always does exist (it might be −∞).
Example 3.4.2 (Integrator with linear cost). Consider once again the integrator from the sys-
tem of Example 2.4.3,
U = [−1, 1]
From (3.7) and the fact that ẋ(t ) = u(t ) ∈ [−1, 1] it is immediate that the optimal control is
u∗(t) = −1 and, hence, x(t) = x_0 − t. Then the value function at τ = 0 is
V(x_0, 0) = J_{[0,T]}(x_0, u∗(·)) = ∫_0^T x_0 − t dt = x_0T − T²/2.
Next we determine the value function at the other time instances. Analogously to the previous
situation, it is easy to see u ∗ (t ) = −1 is optimal for J [τ,T ] (z, u(·)) for every τ > 0 and every
x(τ) = z. Hence x ∗ (t ) = z − (t − τ) and
V(z, τ) = ∫_τ^T z − (t − τ) dt = [ zt − ½(t − τ)² ]_τ^T = z(T − τ) − ½(T − τ)². (3.8)
As expected, the value function is zero at the final time τ = T . It is not necessarily monotonic
in τ, see Fig. 3.3. Indeed for z = T /2 the value function is zero at τ = 0 and τ = T yet positive
in between. □
Figure 3.3: The value function V(z, τ) of the problem of Example 3.4.2 for various z (z = −0.5, 0, 0.5, 1, 1.5) as a function of τ ∈ [0, T]. The plot assumes T = 1.
For any input u(·) – optimal or not – the cost-to-go from τ onwards equals the cost over
[τ, τ + ε] plus the cost over the remaining [τ + ε, T], that is
J_{[τ,T]}(z, u(·)) = ∫_τ^{τ+ε} L(x(t), u(t)) dt + J_{[τ+ε,T]}(x(τ + ε), u(·)) (3.9)
with initial state x(τ) = z. The value function is defined as the infimum of this cost over all
admissible inputs, hence taking the infimum over u(·) of the left and right-hand side of (3.9)
shows that
V(z, τ) = inf_{u:[τ,T]→U} ( ∫_τ^{τ+ε} L(x(t), u(t)) dt + J_{[τ+ε,T]}(x(τ + ε), u(·)) ).
Now by the principle of optimality, any optimal control over [τ, T] is optimal for J_{[τ+ε,T]}(x(τ +
ε), u(·)) as well, so the cost equals the value function. The right-hand side of the above equal-
ity can thus be simplified to
V(z, τ) = min_{u:[τ,τ+ε]→U} ( ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) )
with initial condition x(τ) = z. Notice that in this last equation we need only optimize over
inputs on the time window [τ, τ + ε] because optimization over the remaining time window
[τ + ε, T] is incorporated in the value function V(x(τ + ε), τ + ε). For further analysis it is beneficial
to move the V(z, τ) to the right-hand side and to scale the entire equation by 1/ε,
0 = min_{u:[τ,τ+ε]→U} [ ∫_τ^{τ+ε} L(x(t), u(t)) dt + V(x(τ + ε), τ + ε) − V(z, τ) ] / ε.
In this form we can take the limit ε → 0. It is plausible that functions u : [τ, τ + ε] → U in the
limit can be identified with constants u ∈ U and that the difference of the above two value
functions converges to the total derivative with respect to τ. This gives
0 = min_{u∈U} ( L(x(τ), u) + dV(x(τ), τ)/dτ ) (3.10)
for all τ ∈ [0, T ] and all x(τ) = z ∈ Rn . Incidentally this identity is reminiscent of the cost-to-
go (B.16) explained in Section B.5 of Appendix B. The total derivative of V (x(τ), τ) with respect
to τ is
dV(x(τ), τ)/dτ = (∂V(x(τ), τ)/∂xᵀ) f(x(τ), u(τ)) + ∂V(x(τ), τ)/∂τ.
Inserting this into (3.10) and using u := u(τ), x := x(τ) we arrive at a partial differential equa-
tion in V (x, τ):
0 = min_{u∈U} ( L(x, u) + (∂V(x, τ)/∂xᵀ) f(x, u) + ∂V(x, τ)/∂τ )
for all τ ∈ [0, T ] and all x ∈ Rn . The partial derivative of V (x, τ) with respect to τ does not
depend on u and so does not contribute to the minimization. This, finally, brings us to the
famous equation
∂V(x, τ)/∂τ + min_{u∈U} ( (∂V(x, τ)/∂xᵀ) f(x, u) + L(x, u) ) = 0. (3.11)
Ready. This equation is known as the Hamilton-Jacobi-Bellman equation, or just HJB equa-
tion.
What did we do so far? We made it plausible that the relation between the value functions
at neighboring points in state x and time τ is the partial differential equation (3.11). We need
to stress here the word “plausible”, because we have “derived” (3.11) only under several tech-
nical assumptions including existence of an optimal control, existence of a value function
and existence of some limits1 . However, we can turn the analysis around, and obtain a suffi-
cient condition for optimality. This is the following theorem and it is the central result of this
chapter. In this formulation the time τ is called t again.
1. For any admissible input u(·), the value of V (z, τ) is a lower bound of the cost over [τ, T ]
starting at x(τ) = z:
(Footnote 1: (a) the value function V(x, t) and L(x, u) and S(x) are all C¹, (b) for every possible τ, x an optimal control u : [τ, T] → U exists that is continuous.)
3. Suppose the minimization problem in (3.12) for each x ∈ Rn and each t ∈ [0, T ] has
a (possibly non-unique) solution u. Denote one such solution as u(x, t ). If for every
z ∈ Rn and every τ ∈ [0, T ] the solution x(t ) of ẋ(t ) = f (x(t ), u(x(t ), t )) with x(τ) = z is
well defined for all t ∈ [τ, T ], then V (z, τ) is the value function and u ∗ (t ) := u(x(t ), t ) is
an optimal control for J [τ,T ] (z, u(·)).
Proof.
1. Given z and τ, let u(·) be an admissible input for ẋ(t ) = f (x(t ), u(t )) for t > τ and x(τ) =
z. Then
2. Then by assumption, x ∗ (t ) is well defined. Let z = x 0 and τ = 0. For the input u ∗ (t ) the
inequality in Eqn. (3.15) is an equality. Hence J [0,T ] (x 0 , u ∗ (·)) = V (x 0 , 0) and we already
showed that no control achieves a smaller cost.
3. Similar: then by assumption x(t ) is well defined. For the so defined input u ∗ (t )
the inequality in Eqn. (3.15) is an equality. Hence the optimal cost then equals
J [τ,T ] (x 0 , u ∗ (·)) = V (z, τ) and it is achieved by u ∗ (t ). Since this holds for every z ∈ Rn
and every τ ∈ [0, T], V(z, τ) is the value function.
Parts 2 and 3 are a bit technical because the input found by solving the minimization
problem of (3.12) pointwise (at each x and each t ) does not always give us an input u ∗ (t )
for which x(t ) is well-defined for all t ∈ [0, T ], see Exercise 3.3(c). Luckily, however, in most
applications this problem does not occur and then the above says that the so determined input
is the optimal solution and that V (x, t ) is the value function.
Theorem 3.4.3 provides a sufficient condition for optimality: if we can solve the Hamilton-
Jacobi-Bellman equations (3.12,3.13) and if the conditions of Theorem 3.4.3 are satisfied, then
we are guaranteed that u ∗ (t ) is an optimal control. Recall, on the other hand, from the pre-
vious chapter that the conditions for optimality found from the Minimum Principle are nec-
essary for optimality. So in a sense, the Minimum Principle and Dynamic Programming com-
plement each other.
Another difference between the two methods is that an optimal control u(t) derived from
the Minimum Principle is given as a function of state x(t) and costate p(t), which after solving the
Hamiltonian equations gives us u ∗ (t ) as a function of time, while in Dynamic Programming
the optimal input is given in state feedback form u(x, t ). The state feedback form is all we need
to compute the solutions x ∗ (t ), u ∗ (t ) of the system equation ẋ(t ) = f (x(t ), u(x(t ), t )). Also, in
applications the state feedback form is preferred, because it is way more robust². The next
example demonstrates this feedback property.
ẋ(t ) = u(t ),
with cost
J(x_0, u(·)) = x²(T) + ∫_0^T Ru²(t) dt
for some R > 0. We allow any u(t ) in R. Then the HJB equations (3.12, 3.13) become
∂V(x, t)/∂t + min_{u∈R} ( (∂V(x, t)/∂x) u + Ru² ) = 0,   V(x, T) = x².
Since the term to be minimized is quadratic in u (and R > 0) the optimal u is where the
derivative of (∂V(x, t)/∂x) u + Ru² with respect to u is zero. This is for
u = −(1/2R) ∂V(x, t)/∂x, (3.16)
and then the HJB equations reduce to
∂V(x, t)/∂t − (1/4R) ( ∂V(x, t)/∂x )² = 0,   V(x, T) = x².
Motivated by the boundary condition we now try a V(x, t) that is quadratic in x for all time, so
of the form V(x, t) = x²P(t). (Granted, this is a magic step.) This way the HJB equations (3.12,
3.13) simplify to
x²Ṗ(t) − (1/4R)(2xP(t))² = 0,   x²P(T) = x².
Dividing by x² gives
Ṗ(t) = P²(t)/R,   P(T) = 1,
whose solution is
P(t) = R/(R + T − t).
(This solution can be found via separation of variables.) It is well defined throughout t ∈ [0, T ]
and, therefore,
V(x, t) = x² R/(R + T − t) (3.17)
(Footnote 2: If the state at some time τ is corrupted by noise or whatever, then the feedback implementation of the input still performs well, and in fact x(t), u(t) then continue optimally from that time on.)
is a solution of the HJB equation. Now that V (x, t ) is known we can compute the candidate
optimal input (3.16). It is not (yet) a function of t alone, it also depends on x(t ):
ẋ∗(t) = u∗(t) = −x∗(t)/(R + T − t).
It is a linear differential equation and it has a well defined solution x ∗ (t ) on [0, T ] and then
also the above u ∗ (t ) is well defined on [0, T ]. This, finally, allows us to conclude that (3.17)
is the value function, that the above u ∗ (t ) is the optimal input and that the optimal cost is
J [0,T ] (x 0 , u ∗ (·)) = V (x 0 , 0) = x 02 /(1 + T /R). We solved the optimal control completely. Not bad.
ä
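As a quick sanity check one can integrate the scalar Riccati equation Ṗ = P²/R backwards from P(T) = 1 and compare with the closed form R/(R + T − t). The following sketch is not part of the original text; it is a minimal illustration assuming Python with NumPy/SciPy and the arbitrary example values R = 2, T = 5, x₀ = 1. It also simulates the closed loop ẋ = −x/(R + T − t) and checks the optimal cost.

```python
import numpy as np
from scipy.integrate import solve_ivp

R, T, x0 = 2.0, 5.0, 1.0            # arbitrary example values

# integrate the Riccati ODE  dP/dt = P^2/R  backwards from P(T) = 1
sol = solve_ivp(lambda t, P: P**2 / R, (T, 0.0), [1.0], dense_output=True)
P = lambda t: sol.sol(t)[0]

t = np.linspace(0.0, T, 6)
print(np.max(np.abs(P(t) - R / (R + T - t))))    # should be tiny

# closed-loop state  dx/dt = -x/(R+T-t)  and the resulting cost
xsol = solve_ivp(lambda t, x: -x / (R + T - t), (0.0, T), [x0], dense_output=True)
tt = np.linspace(0.0, T, 2001)
u = -xsol.sol(tt)[0] / (R + T - tt)
cost = xsol.sol(T)[0]**2 + np.trapz(R * u**2, tt)
print(cost, x0**2 / (1 + T / R))                 # the two numbers should agree
```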
The term to be minimized is quadratic in u hence is minimal only if the derivative with respect to u is zero. This gives

u = −(1/(2r²)) ∂V(x,t)/∂x.

So we can re-write the HJB equations as

∂V(x,t)/∂t + x² − (1/(4r²)) ( ∂V(x,t)/∂x )² = 0,   V(x,T) = 0.   (3.19)
This is a nonlinear partial differential equation, and this might be complicated. But it has an interesting physical dimensional property (read the footnote³ if you want to know) and this suggests that

V(x,t) = P(t)x².
We then find

Ṗ(t)x² + x² − (1/(4r²)) (2P(t)x)² = 0,   P(T) = 0.
3 Outside the scope of this book, but still: let [x] denote the dimension of a quantity x. For example [t ] = time.
From ẋ = u it follows that [u] = [x][t ]−1 and then x 2 + r 2 u 2 implies that [r ] = [t ] and then [V ] = [J ] = [x]2 [t ]. This
suggests that V (x, t ) = x 2 P (t ). In fact, application of the Buckingham π-theorem (not part of this course) shows
that V (x, t ) must have the form V (x, t ) = x 2 r Q((t − T )/r ) for some dimensionless function Q : R → R.
Division by x² yields the ordinary differential equation

Ṗ(t) + 1 − (1/r²) P²(t) = 0,   P(T) = 0.

This type of differential equation is discussed at length in the next chapter. The solution is

P(t) = r ( e^{(T−t)/r} − e^{(t−T)/r} ) / ( e^{(T−t)/r} + e^{(t−T)/r} ).   (3.20)

(Its graph: going backwards in time from P(T) = 0, P(t) increases towards the value |r|, with the transition taking place roughly on [T − |r|, T].)
So the HJB equations (3.19) have a solution V(x,t) = x²P(t) for this system, where P(t) is given by (3.20). The candidate optimal control u∗(t) then is

u(x(t), t) = −(1/(2r²)) ∂V(x(t),t)/∂x = −(1/r²) P(t) x(t)

and so the candidate optimal state satisfies the linear time-varying differential equation

ẋ∗(t) = u(x∗(t), t) = −(1/r²) P(t) x∗(t),   x(0) = x₀.
Since P(t) is well defined and bounded, it will be clear that the solution x(t) is well defined; in fact the solution is x∗(t) = e^{−(1/r²)∫₀ᵗ P(τ) dτ} x₀. Having a well defined solution means that x∗(t) is the optimal state, that

u∗(t) = −(1/r²) P(t) x∗(t)

is the optimal control and that V(x₀, 0) = P(0)x₀² is the optimal cost. Once again the optimal
control is given as a state feedback, and once again we managed to solve the optimal control
problem completely. Nice. ä
V(x,t) = x⁴P(t).   (3.21)

(It remains to be seen whether this form works.) Substitution of this form in the HJB equation yields

x⁴Ṗ(t) + min_{u∈R} ( 4x³P(t)u + x⁴ + u⁴ ) = 0,   x⁴P(T) = 0.

The minimizing u is u = −∛(P(t)) x. This can be obtained by setting the gradient of 4x³P(t)u + x⁴ + u⁴ with respect to u equal to zero (verify this yourself). This simplifies the HJB equation to

x⁴Ṗ(t) − 4x⁴P^{4/3}(t) + x⁴ + x⁴P^{4/3}(t) = 0,   x⁴P(T) = 0.

Division by x⁴ yields

Ṗ(t) = 3P^{4/3}(t) − 1,   P(T) = 0.   (3.22)
The equation here is a simple first-order differential equation, except that no closed form solution appears to be known. The graph of the solution (obtained numerically) shows P(t) increasing, backwards in time from P(T) = 0, towards the value 3^{−3/4} ≈ 0.43869, with the transition taking place roughly on [T − 1, T]. It reveals that P(t) is well defined and bounded for all t < T. This proves that the HJB equation has a solution of the quartic form (3.21)! As t → −∞ the solution P(t) converges to the equilibrium solution where 0 = 3P^{4/3} − 1, i.e. where P = 3^{−3/4} ≈ 0.43869.
function V (x, t ) = x 4 P (t ) is just a candidate value function. The resulting candidate optimal
control input
u∗(t) = −∛(P(t)) x∗(t)   (3.23)

and hence the candidate optimal state satisfies

ẋ∗(t) = −∛(P(t)) x∗(t),   x(0) = x₀.   (3.24)

Consider the graph of the feedback gain ∛(P(t)). For t < T − 1 it is very close to the constant 3^{−1/4} ≈ 0.75984, and the total area over all t < T between ∛(P(t)) and this constant can be shown to be finite, approximately equal to 0.12692. This tells us that the solution x∗(t) of the differential equation (3.24) for t < T − 1 is close to exponential, x∗(t) ≈ e^{−3^{−1/4} t} x₀, and that x∗(t) slows down as t approaches T, and that

x∗(T) ≈ e^{0.12692} e^{−3^{−1/4} T} x₀

whenever T > 1. ä
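The numbers quoted above (the plateau 3^{−3/4} ≈ 0.43869, the gain 3^{−1/4} ≈ 0.75984 and the area ≈ 0.12692) are easy to reproduce numerically. The sketch below is an added illustration, assuming Python with NumPy/SciPy and the arbitrary choice T = 6; it integrates Ṗ = 3P^{4/3} − 1 backwards from P(T) = 0.

```python
import numpy as np
from scipy.integrate import solve_ivp

T = 6.0                                            # horizon, arbitrary example value
rhs = lambda t, P: 3.0 * P**(4.0/3.0) - 1.0        # right-hand side of (3.22)

sol = solve_ivp(rhs, (T, 0.0), [0.0], dense_output=True, rtol=1e-10, atol=1e-12)
t = np.linspace(0.0, T, 4001)
P = sol.sol(t)[0]

print(P[0], 3.0**(-0.75))                          # plateau value of P(t) for t << T
gain = P**(1.0/3.0)                                # feedback gain  (P(t))^(1/3)
area = np.trapz(3.0**(-0.25) - gain, t)            # deficit w.r.t. the constant 3^(-1/4)
print(area)                                        # roughly 0.12692 for large enough T
```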
In the above examples the candidate value functions V (x, t ) all turned out to be true value
functions. We need to stress that examples exist where this is not the case, see Exercise 3.3(c).
The final example is one where U is restricted.
Example 3.4.7 (Example 3.4.2 extended). We consider the system of Example 3.4.2, that is, ẋ(t) = u(t), x(0) = x₀ with inputs taking values in U = [−1, 1]. The cost, however, we extend with a final cost,

J[0,T](x₀, u(·)) = −αx(T) + ∫₀ᵀ x(t) dt

in which α > 0.⁴
⁴ It may be helpful to know that α has the same physical dimension as t, so if t has dimension “time” then so does α.
The HJB equations (3.12),(3.13) become

∂V(x,t)/∂t + min_{u∈[−1,1]} ( ∂V(x,t)/∂x · u + x ) = 0,   V(x,T) = −αx.   (3.25)

The function to be minimized, ∂V(x,t)/∂x · u + x, is linear in u. So the minimum is attained at one
of the boundaries of U = [−1, 1]. One way to proceed would be to analyse the HJB equations
for the two cases u = ±1. But the equations are partial differential equations and these are
often very hard to solve. (In this case it can be done though.) We take another route: in
Example 3.4.2 where we analyzed a similar problem we ended up with a value function V (x, t )
of the form
V (x, t ) = xP (t ) +Q(t )
for certain functions P (t ),Q(t ). We will see that this form also works for our problem. The
HJB equations for this form simplify to

xṖ(t) + Q̇(t) + min_{u∈[−1,1]} ( P(t)u + x ) = 0,   xP(T) + Q(T) = −αx.

This has to hold for all x and all t so the HJB equations hold iff

Ṗ(t) = −1,   Q̇(t) = |P(t)|,   P(T) = −α,   Q(T) = 0.   (3.26)

This settles P(t):

P(t) = T − α − t.
This function is positive for t < T − α and negative for t > T − α. The minimizing u ∈ [−1, 1] of P(t)u + x hence is

u∗(t) = −1 if t < T − α,   u∗(t) = +1 if t > T − α.   (3.27)

Consequently Q̇(t) = |P(t)| = |T − α − t|, that is,

Q̇(t) = +(T − α − t) if t < T − α,   Q̇(t) = −(T − α − t) if t > T − α,

and together with Q(T) = 0 this integrates to

Q(t) = −(1/2)(T − α − t)² − α²/2 if t < T − α,   Q(t) = +(1/2)(T − α − t)² − α²/2 if t > T − α.

(The graph of Q̇(t) is a V shape with its tip at t = T − α; Q(t) increases monotonically to Q(T) = 0 and passes through the value −α²/2 at t = T − α.)
This function is continuously differentiable. Now all conditions of (3.26) are met and there-
fore V (x, t ) = xP (t ) + Q(t ) satisfies the HJB equations. Along the way we also determined the
candidate optimal input: (3.27). This input does not depend on x (in most applications it
does depend on x). Clearly for this input, the solution x(t ) of ẋ(t ) = u(t ) is well defined for
all t ∈ [0, T ]. Hence (3.27) is the optimal input, the above V (x, t ) is the value function and
V (x 0 , 0) = x 0 P (0) +Q(0) is the optimal cost.
Does it agree with the Minimum Principle? The Hamiltonian is H (x, p, u) = pu + x so
the Hamiltonian equation for the costate is ṗ ∗ (t ) = −1, p ∗ (T ) = −α. Clearly this means that
p∗(t) = T − α − t. Now the u that minimizes the Hamiltonian at the optimal state and costate,

H(x∗(t), p∗(t), u) = (T − α − t)u + x∗(t),

agrees with what we found earlier: (3.27). But of course, the fundamental difference is that the
Minimum Principle assumes the existence of an optimal control, whereas satisfaction of the
above HJB equations proves that the control is optimal. ä
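For this example everything can be checked by brute force: simulate ẋ = u with the switching input (3.27) and compare the resulting cost with x₀P(0) + Q(0). A minimal sketch follows (added for illustration; it assumes Python with NumPy and the arbitrary values T = 3, α = 1, x₀ = 0.5, with 0 < α < T).

```python
import numpy as np

T, alpha, x0 = 3.0, 1.0, 0.5               # arbitrary example values, 0 < alpha < T

t = np.linspace(0.0, T, 20001)
u = np.where(t < T - alpha, -1.0, 1.0)     # the switching input (3.27)
# integrate x' = u with the trapezoidal rule
x = x0 + np.concatenate(([0.0], np.cumsum(0.5*(u[1:] + u[:-1])*np.diff(t))))

cost = -alpha * x[-1] + np.trapz(x, t)     # -alpha*x(T) + integral of x

P0 = T - alpha                             # P(0) = T - alpha
Q0 = -0.5*(T - alpha)**2 - alpha**2/2      # Q(0), branch t < T - alpha
print(cost, x0*P0 + Q0)                    # the two values should agree
```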
The examples might give the impression that Dynamic Programming is superior to the
Minimum Principle. In practical applications, however, it is the other way round. The equa-
tions needed in the Minimum Principle (i.e. the Hamiltonian equations) are ordinary differen-
tial equations and numerical routines exist that are quite efficient in solving these equations.
The HJB equations, in contrast, normally are partial differential equations and the standard
HJB theory requires a higher degree of smoothness than the Minimum Principle requires.
This last point is exemplified in the next section where we derive co-states from value func-
tions of limited smoothness.
∂V(x,t)/∂t + min_{u∈U} H( x, ∂V(x,t)/∂x, u ) = 0.

This suggests that the costate p∗(t) in the Minimum Principle equals

p∗(t) = ∂V(x∗(t), t)/∂x.
Under mild assumptions that is indeed the case. In order to prove it for restricted sets U
(such as U = [0, 1]) we need an extension of the classic result that says that at a minimum of a
smooth function certain gradients are zero.
Lemma 3.5.1 (Technical lemma). Let U ⊆ Rᵐ, and let G: Rⁿ × U → R be some function. Suppose Ω is a region of Rⁿ. If for every x ∈ Ω the function u ↦ G(x,u) attains its minimum over U at u = u∗(x), and if G is differentiable with respect to u and u∗ is differentiable with respect to x on Ω, then

∂G(x, u∗(x))/∂uᵀ · ∂u∗(x)/∂xᵀ = 0   ∀x ∈ Ω.

Proof. Denote G_{uᵀ} := ∂G(x,u∗(x))/∂uᵀ ∈ R^{1×m} and u_{xᵀ} := ∂u∗(x)/∂xᵀ ∈ R^{m×n}. Let δx ∈ Rⁿ and α ∈ R. Then, to first order in α,

G(x, u∗(x + αδx)) = G(x, u∗(x)) + α G_{uᵀ} u_{xᵀ} δx + o(α).

If G_{uᵀ}u_{xᵀ} is nonzero then the above is less than G(x, u∗(x)) for some choice of δx ∈ Rⁿ and scalar α close enough to zero. This contradicts that u∗(x) minimizes G(x, u). ■
Observe that the lemma does not require u ∗ (x) to be differentiable at every x.
Example 3.5.2 (Demonstration of technical lemma). Let x, u be real numbers and consider G(x,u) = (x − u)² and U = [−1, 1]. Given x, the u ∈ U that minimizes G(x,u) is

u∗(x) = −1 if x < −1,   u∗(x) = x if x ∈ [−1, 1],   u∗(x) = 1 if x > 1.

The product ∂G(x, u∗(x))/∂u · ∂u∗(x)/∂x is defined and equal to zero for almost all x (for all x ≠ ±1). ä
In optimal control the optimal input u(x, t ) is usually differentiable with respect to x al-
most everywhere and this is enough to make the connection between the HJB equations and
the Minimum Principle:
Theorem 3.5.3 (Connection between costate & value function). Assume f (x, u), L(x, u), S(x)
are all C 1 . Let U ⊆ Rm and suppose there is a function V : Rn × [0, T ] → R that satisfies the HJB
equation
∂V(x,t)/∂t + min_{u∈U} H( x, ∂V(x,t)/∂x, u ) = 0,   V(x,T) = S(x)

at all (x,t) where V(x,t) is continuously differentiable. Denote one possible minimizer u as u∗(x,t) and let x∗(t) be the solution of ẋ(t) = f(x(t), u∗(x(t), t)), x(0) = x₀. If, for almost all t ∈ [0,T], the pair (x∗(t), t) lies in a region where V(x,t) is twice continuously differentiable and u∗(x,t) is continuously differentiable, and if ∂V(x∗(t),t)/∂x is continuous in t, then p∗(t) defined as

p∗(t) = ∂V(x∗(t), t)/∂x   (3.28)

is a solution of the Hamiltonian costate equation

ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(x∗(t),t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x.   (3.29)
Proof. Let D be a region of Rⁿ × [0,T] on which V(x,t) is C² with respect to x, t and u∗(x,t) is C¹ with respect to x, t. By definition, the minimizing u∗(x,t) satisfies the HJB equation

∂V(x,t)/∂t + H( x, ∂V(x,t)/∂x, u∗(x,t) ) = 0   (3.30)

for all (x,t) ∈ D. In the rest of this proof we drop the arguments (x,t) and u∗. The partial derivative of the previous expression with respect to (row vector) xᵀ yields

∂²V/(∂xᵀ∂t) + ∂H/∂xᵀ + ∂H/∂pᵀ · ∂²V/(∂x∂xᵀ) + ∂H/∂uᵀ · ∂u∗/∂xᵀ = 0   ∀(x,t) ∈ D,

in which the last term is zero on D because of Lemma 3.5.1. Using this expression and the fact that ∂H/∂p = f we find that

d/dt ∂V(x(t),t)/∂x = ∂²V/(∂x∂t) + ∂²V/(∂x∂xᵀ) f = ∂²V/(∂x∂t) + ∂²V/(∂x∂xᵀ) ∂H/∂p = −∂H/∂x   ∀(x,t) ∈ D.
Because V(x,T) = S(x) for all x, we have ∂V(x(T),T)/∂x = ∂S(x(T))/∂x. Hence, if (x∗(t), t) – as a function of time – is in D for almost all time, then p∗(t) := ∂V(x∗(t),t)/∂x satisfies the costate equation (3.29) for almost all time. By assumed continuity of ∂V(x∗(t),t)/∂x it is therefore a solution of the costate
equation.
Along the optimal solution, the total derivative of the Hamiltonian with respect to time is zero almost all the time because

d/dt H(x∗(t), p∗(t), u∗(x∗(t), t)) = ∂H/∂xᵀ ẋ∗ + ∂H/∂pᵀ ṗ∗ + ∂H/∂u∗ᵀ du∗/dt
   = ∂H/∂xᵀ ( ∂H/∂p ) + ∂H/∂pᵀ ( −∂H/∂x ) = 0   ∀(x∗(t), t) ∈ D.

Here, again, the term ∂H/∂u∗ᵀ · du∗/dt is zero because of Lemma 3.5.1. Hence the Hamiltonian at
x∗(t), p∗(t), u∗(x∗(t), t) is constant for almost all time. By assumption, the HJB equality (3.30) holds at x = x∗(t), u∗(x,t) = u∗(x∗(t), t) for all t ∈ [0,T] and, since, again by assumption, ∂V(x,t)/∂t is continuous at x = x∗(t) for all time, also H(x∗(t), p∗(t), u∗(x∗(t), t)) is continuous for all time. Combined with the fact that it is constant for almost all time, this shows that it is constant for all time. ■
This is exactly the p ∗ (t ) as found in Example 2.4.4. ä
For restricted sets such as U = [0, 1] the value function is typically continuously differen-
tiable everywhere (see Example 3.4.7 and Exercise 3.7) but in some cases it is continuously
differentiable only almost everywhere:
Example 3.5.5 (Non-smooth value functions). Suppose ẋ(t) = x(t)u(t) with U = [0,1] and let J[0,T](x₀, u(·)) = x(T). So we should try to make x(t) as small (negative) as possible. Clearly one optimal input as a function of state x and time t is

u(x,t) = 0 if x ≥ 0,   u(x,t) = 1 if x < 0,

and the corresponding value function is

V(x,t) = x if x ≥ 0,   V(x,t) = e^{T−t} x if x < 0.

This value function is not continuously differentiable with respect to x at x = 0 and therefore the standard theory does not apply. It does satisfy the HJB equations at all x where it is continuously differentiable (at all x ≠ 0):

∂V(x,t)/∂t + min_{u∈[0,1]} ( ∂V(x,t)/∂x · xu ) = 0 + 0 = 0 if x > 0,   = −e^{T−t}x + e^{T−t}x = 0 if x < 0.
The only difference here is the cost function. The integral that defines the cost is now over
all t > 0, and the “final” cost S(x(∞)) has been dropped because in applications we normally
send the state to a unique equilibrium x(∞) := limt →∞ x(t ) and thus all such controls achieve
the same final cost (i.e., the final cost would not affect the optimal control). As before we
define the value function as
Because of the infinite horizon, however, the value function no longer depends on τ (see Ex-
ercise 3.9(a)) and so we can simply write
That way the HJB equation (3.12) simplifies to

min_{u∈U} ( ∂V(x)/∂xᵀ f(x,u) + L(x,u) ) = 0.   (3.33)
If we solve this equation then, quite often, we have also found a stabilizing input (which is an input that steers the state to some given equilibrium state x̄) and a Lyapunov function for that equilibrium. The following example demonstrates this point.
Example 3.6.1 (Quartic control – design of optimal stabilizing inputs and Lyapunov function). Consider the optimal control problem with

ẋ(t) = u(t),   U = R,   J[0,∞)(z, u(·)) = ∫₀^∞ x⁴(t) + u⁴(t) dt.

The HJB equation (3.33) for this problem is min_{u∈R}( ∂V(x)/∂x · u + x⁴ + u⁴ ) = 0, and it has the solution

V(x) = 3^{−3/4} x⁴ + d

for some integration constant d. The choice d = 0 is convenient for then V(0) = 0. We claim that this V(x) is a Lyapunov function for the equilibrium x̄ = 0 of the controlled system ẋ∗(t) = u∗(t) for the control equal to the candidate optimal control,

u∗(t) = −( (1/4) ∂V(x(t))/∂x )^{1/3} = −3^{−1/4} x(t).
Indeed, x̄ = 0 is an equilibrium of this controlled system and V(x) clearly is C¹, is positive definite and by construction the HJB equation (3.33) gives us that

V̇(x) = ∂V(x)/∂xᵀ f(x, u∗) = −L(x, u∗) = −(x⁴ + u∗⁴) < 0

for all x ≠ 0. This V(x) hence is a strong Lyapunov function for the controlled system with
equilibrium x̄ = 0 and, therefore, it is asymptotically stable at x̄ = 0. For this reason the con-
trol input u ∗ (·) is called a stabilizing input. In fact it is the input that minimizes the cost
J[0,∞)(x₀, u(·)) = ∫₀^∞ x⁴(t) + u⁴(t) dt over all inputs that stabilize the system. Indeed, for any input u(·) that steers the state to zero we have the inequality

J[0,∞)(x₀, u(·)) = ∫₀^∞ L(x(t), u(t)) dt
   ≥ ∫₀^∞ −∂V(x(t))/∂xᵀ f(x(t), u(t)) dt   (because of (3.33))
   = ∫₀^∞ −V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀),

since x(∞) = 0 and hence V(x(∞)) = 0. So no stabilizing input achieves a cost less than V(x₀) = 3^{−3/4} x₀⁴. ä
3.7 Exercises
3.1 Consider the system
Find a new cost functional such that the maximization problem becomes a minimiza-
tion problem. How are the associated optimal inputs for the two optimization problems
related?
3.2 Not every optimal control problem is solvable. Consider the system ẋ(t) = u(t), x₀ = 1 with cost

J[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) dt

and U = R.
(a) Determine the value function (from the definition, not the HJB equation).
(b) Show that the value function does not satisfy the HJB equations.
(c) Show that there is no bounded optimal control u ∗ : [0, T ] → R.
(a) Determine the candidate value function V (x, t ) and candidate optimal control
u ∗ (x(t ), t ) (possibly still depending on x(t )).
Hint: assume that V (x, t ) does not depend on t , i.e. that it has the form V (x, t ) =
Q(x) for some function Q(·).
(b) Now let x 0 = 1 and T > 0. Show that the candidate value function V (x, t ) is the
value function and determine the optimal control u ∗ (t ) explicitly as a function of
time. (Hint: have look at Example B.1.5.)
(c) Now let x 0 = −1 and T = 2. Show that the candidate V (x, t ) and u ∗ (x(t ), t ) are not
the value function and not the optimal input! (In other words: what condition of
Thm. 3.4.3 fails here?)
3.4 Even though Dynamic Programming and the HJB equation are powerful concepts, we
should always strive for simpler approaches.
Consider the system
0 ≤ u(t ) ≤ 1.
(a) Use your common sense to solve the minimization problem for x 0 > 0.
(b) What are the minimal costs that you could achieve in part 3.4a?
(c) Use part 3.4b to find a candidate solution for HJB equation.
Verify that this candidate solution satisfies (3.12) for x > 0.
Does it also satisfy (3.12) for all x ∈ R?
(d) Use your common sense to solve the minimization problem for x₀ < 0. Make a distinction between −x₀ ≤ T and −x₀ > T. What are the minimal costs now?
3.5 The capital x(t ) ≥ 0 of an economy at any moment t is divided into two parts: u(t )x(t )
and (1 − u(t ))x(t ) with
The first part, u(t )x(t ), is for investments and contributes to the increase in capital ac-
cording to the formula
The other part, (1 − u(t))x(t), is for consumption and is evaluated by the “satisfaction”

J[0,3](x₀, u(·)) = x(3) + ∫₀³ (1 − u(t))x(t) dt.
(a) Try as value function a function of the form V (x, t ) = Q(t )x and with it determine
the HJB equations.
(b) Express the candidate optimal u ∗ (t ) as a function of Q(t ) (Hint: x(t ) is always
positive.)
(c) Determine Q(t ) for all t ∈ [0, 3].
(d) Determine the optimal u ∗ (t ) explicitly as a function of time and argue that this is
the true optimal control (so not just the “candidate” optimal control).
(e) What is the optimal cost J [0,3] (x 0 , u ∗ (·))?
3.7 Consider the system with bounded input
(a) Argue that the value function V (x, τ) defined in (3.32) does not depend on time τ.
(b) Suppose V (x) is a continuously differentiable function that solves this HJB equa-
tion (3.33). Show that for any admissible input for which V (x(∞)) = 0 we have
that
J [0,∞) (x 0 , u(·)) ≥ V (x 0 ).
(c) Consider the integrator ẋ(t) = u(t) where u(t) is free to choose, u(t) ∈ R, and suppose that the cost is

J[0,∞)(x₀, u(·)) = ∫₀^∞ x²(t) + u²(t) dt.
There are two continuously differentiable solutions V (x) of the HJB equa-
tion (3.33) with the property that V (0) = 0. Determine both.
(d) Continue with the system and cost of (c). Find the input u ∗ : [0, ∞) → R that mini-
mizes J [0,∞) (x 0 , u(·)) over all inputs that steer the state to zero, limt →∞ x(t ) = 0.
3.10 Consider the problem of Example 3.4.6. Argue that

min_{u∈U} ( ∂V(x∗(t), t)/∂x · u∗(t) + x∗⁴(t) + u∗⁴(t) )
3.11 Consider a mass m hanging from a ceiling on a thin massless rod of length ℓ, see Fig. 3.4. We can control the pendulum with a torque u(t). The standard mathematical model in the absence of friction is

mℓ²φ̈(t) + mgℓ sin(φ(t)) = u(t),

where φ is the angle between the pendulum and the vertical hanging position, u is the applied torque, m is the mass of the pendulum, ℓ is the length of the pendulum and g is the gravitational acceleration.
The objective is to choose a torque u(t ) that stabilizes the pendulum to the vertical
hanging equilibrium φ = 2kπ, φ̇ = 0. This, by definition means that u(t ) is such that
3.12 Determine the input u: [0, ∞) → R that stabilizes the system ẋ(t) = x(t) + u(t) (meaning lim_{t→∞} x(t) = 0) and minimizes ∫₀^∞ u⁴(t) dt over all inputs that stabilize the system.
Chapter 4
In this chapter we study quadratic costs for linear systems. This class of systems and costs is
broad, yet is simple enough to allow for explicit solutions. Especially for the infinite horizon
problem (explained below) there are efficient numerical routines that solve the problem com-
pletely, and they can be found in various software packages. These methods lie at the heart of
a number of popular controller design methods, such as LQR, H2 and H∞ controller design,
and they are also connected to Kalman filtering.
over all inputs u : [0, T ] → Rm and states x : [0, T ] → Rn that are governed by a linear time
invariant system with given initial state,
No restrictions are imposed on u(t ), that is, at any time t ∈ [0, T ] the input can take any value
in Rm . The number of entries of the state x is denoted by n. The matrix B thus has n rows
and m columns and A is an n × n matrix. The weights Q and S are assumed to be positive
semidefinite n × n matrices but are otherwise arbitrary, and we assume that R is an m × m
positive definite matrix:
S ≥ 0, Q ≥ 0, R > 0.
Definition 4.1.1 (LQ problem). The finite horizon linear quadratic control problem –
LQ problem for short – is to determine inputs u : [0, T ] → Rm that minimize cost (4.1) subject
to (4.2). ä
We solve the LQ problem in detail, first using Pontryagin’s Minimum Principle and then
using Dynamic Programming. Both methods reveal that the optimal cost is quadratic in the
initial state, that is,
for some matrix P . The quadratic nature of the optimal cost is subsequently exploited to
derive a number of results. Most importantly it allows us to elegantly solve the infinite horizon
LQ problem, which is where the final time is infinity, T = ∞, and the terminal cost is absent:
J[0,∞)(x₀, u(·)) := ∫₀^∞ xᵀ(t)Qx(t) + uᵀ(t)Ru(t) dt.
If for every x 0 this infinite horizon cost is finite for at least one input (and in practice that is
always the case) then the optimal input u(·) exists and we will see that it can be implemented
as a linear static state feedback
u(t ) = −F x(t )
for some matrix F . This is remarkable since the feedback form is not imposed on the LQ
problem. It is a result.
Working out the Hamiltonian equations for state (2.12a) and costate (2.12b) we obtain
Now the math to come will clean up considerably if we replace the costate p with the halved
costate p̃ defined as p̃ := (1/2) p.
Also this halved costate p̃ is called costate. This way the Hamiltonian becomes
According to the Minimum Principle, the optimal input at each t minimizes the Hamiltonian. The Hamiltonian is quadratic in u with positive definite quadratic term, hence it is minimal if-and-only-if its gradient with respect to u is zero. This gradient is

∂H(x, 2p̃, u)/∂u = 2Bᵀp̃ + 2Ru

and it is zero iff

u = −R⁻¹Bᵀp̃.   (4.5)
Substitution of this input into the Hamiltonian equations (4.4) yields the system of coupled differential equations

[ẋ(t); p̃̇(t)] = [A, −BR⁻¹Bᵀ; −Q, −Aᵀ] [x(t); p̃(t)],   x(0) = x₀,   p̃(T) = Sx(T).   (4.6)
The coupled differential equations (4.6) is a linear time-invariant differential equation in x(t )
and p̃(t ). If we would have had only an initial or only a final state condition then we could
have easily solved (4.6). Here, though, we have combined initial and final conditions, so it
is not immediately clear how to solve the above equation. At this point, it may not be clear
that the above differential equation, with its mixed boundary conditions, has a solution at all!
Later on in this section we will see that it does. This result exploits the following remarkable
connection between state/costate and optimal cost. This connection may come as a surprise
but can be understood from the Dynamic Programming solution presented further on in this
chapter.
Lemma 4.2.1 (Optimal cost). For any solution x(t), p̃(t) of (4.6), the cost (4.1) for u∗(t) = −R⁻¹Bᵀp̃(t) equals J[0,T](x₀, u∗(·)) = p̃ᵀ(0)x₀.

Proof. Along solutions of (4.6) we have

d/dt (p̃ᵀx) = p̃ᵀẋ + xᵀp̃̇ = p̃ᵀ(Ax − BR⁻¹Bᵀp̃) + xᵀ(−Qx − Aᵀp̃)
   = −p̃ᵀBR⁻¹Bᵀp̃ − xᵀQx = −(u∗ᵀRu∗ + xᵀQx).

Integrating from 0 to T and using p̃(T) = Sx(T) gives xᵀ(T)Sx(T) + ∫₀ᵀ xᵀ(t)Qx(t) + u∗ᵀ(t)Ru∗(t) dt = p̃ᵀ(0)x₀, which is the claim. ■
Example 4.2.2 (First order system). For the standard integrator system ẋ(t) = u(t) with quadratic cost

J[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) + u²(t) dt

the Hamiltonian matrix of (4.6) is H = [0, −1; −1, 0] (here A = 0, B = 1, Q = R = 1 and S = 0). This is simple enough to allow for an explicit solution of its matrix exponential,

e^{Ht} = (1/2) [ eᵗ + e⁻ᵗ, −eᵗ + e⁻ᵗ; −eᵗ + e⁻ᵗ, eᵗ + e⁻ᵗ ].
The general solution of (4.6) is [x(t); p̃(t)] = e^{Ht}[x₀; p̃(0)] for an, as yet, unknown initial costate p̃(0). It should be chosen such that p̃(T) matches the final condition p̃(T) = Sx(T) = 0·x(T) = 0. It is not hard to see that this requires

p̃(0) = (e^T − e^{−T})/(e^T + e^{−T}) x₀.

This then fully determines the state and costate for all t ∈ [0,T] as

[x(t); p̃(t)] = (1/2) [ eᵗ + e⁻ᵗ, −eᵗ + e⁻ᵗ; −eᵗ + e⁻ᵗ, eᵗ + e⁻ᵗ ] [ 1; (e^T − e^{−T})/(e^T + e^{−T}) ] x₀.

The initial costate p̃(0) is linear in x₀ and therefore the entire state and costate (x(t), p̃(t)) is linear in x₀. The optimal cost is quadratic in x₀,

J[0,T](x₀, u∗(·)) = p̃(0)x(0) = (e^T − e^{−T})/(e^T + e^{−T}) x₀².

The optimal input is linear in the costate,

u∗(t) = −R⁻¹Bᵀp̃(t) = −p̃(t),

and since the costate is linear in x₀ also the optimal input is linear in x₀. ä
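For readers who want to see this example in code: the sketch below is an added illustration (assuming Python with NumPy/SciPy and the arbitrary values T = 2, x₀ = 1). It builds e^{Ht} with scipy.linalg.expm, determines p̃(0) from the final condition p̃(T) = 0 and checks it against (e^T − e^{−T})/(e^T + e^{−T}) x₀.

```python
import numpy as np
from scipy.linalg import expm

T, x0 = 2.0, 1.0                      # arbitrary example values
H = np.array([[0.0, -1.0],
              [-1.0, 0.0]])           # Hamiltonian matrix for A=0, B=1, Q=R=1

Sig = expm(H * T)                     # blocks [[Sig11, Sig12], [Sig21, Sig22]] at t = T
# final condition 0 = p~(T) = Sig21*x0 + Sig22*p~(0)   (S = 0 here)
p0 = -Sig[1, 0] / Sig[1, 1] * x0
print(p0, np.tanh(T) * x0)            # both equal (e^T - e^-T)/(e^T + e^-T) * x0

# state and costate at a few times, and the optimal cost p~(0)*x0
for t in np.linspace(0.0, T, 5):
    x_t, p_t = expm(H * t) @ np.array([x0, p0])
    print(t, x_t, p_t)
print("optimal cost:", p0 * x0)
```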
In the above example we managed to transfer the final condition p̃(T) = Sx(T) into an equivalent initial condition on p̃(0), and this proved that the solution of the Hamiltonian equation exists and is unique. We will shortly see that this always works! The general procedure is as follows. First compute the 2n × 2n matrix exponential of the Hamiltonian matrix and split it into four n × n blocks:

[Σ11(t), Σ12(t); Σ21(t), Σ22(t)] = e^{Ht}.
Then [x(T); p̃(T)] = e^{HT}[x₀; p̃(0)], and this shows that the final condition p̃(T) = Sx(T) can be rewritten as

0 = Sx(T) − p̃(T)
  = [S  −I] [x(T); p̃(T)]
  = [S  −I] [Σ11(T), Σ12(T); Σ21(T), Σ22(T)] [x₀; p̃(0)]
  = [SΣ11(T) − Σ21(T)] x₀ + [SΣ12(T) − Σ22(T)] p̃(0).   (4.8)
Clearly this final condition is satisfied if and only if

p̃(0) = M x₀,   where   M := −[SΣ12(T) − Σ22(T)]⁻¹ [SΣ11(T) − Σ21(T)] = [Σ22(T) − SΣ12(T)]⁻¹ [SΣ11(T) − Σ21(T)].   (4.10)

The question, of course, is: does the above inverse exist? The answer is yes:
and here that is zero because we took x 0 = 0. Since all terms on the left-hand side of the above
equation are nonnegative it must be that all these terms are zero. In particular u∗ᵀ(t)Ru∗(t) ≡ 0. Now R > 0 so necessarily u∗(t) ≡ 0. This, in turn, implies that ẋ(t) = Ax(t) + Bu∗(t) = Ax(t).
Given x(0) = 0 we get x(t ) = 0 for all time and, as a result, p̃˙ (t ) = −Qx(t ) − A T p̃(t ) = −A T p̃(t )
and p̃(T ) = Sx(T ) = 0. This shows that p̃(t ) is zero for all time as well. Conclusion: for x 0 =
0 the solution (x(·), p̃(·)) of (4.6) exists and is unique. This implies that SΣ12 (T ) − Σ22 (T ) is
nonsingular for otherwise there would have existed multiple p̃(T ) that satisfy the boundary
condition (4.8). Invertibility of SΣ12 (T ) − Σ22 (T ) in turn shows that the final condition (4.8)
has a unique solution p̃(0) for every initial state x 0 . ■
Notice that p̃(0) according to (4.10) is linear in the initial state: p̃(0) = M x 0 . Hence the op-
timal cost is quadratic in x 0 (see Lemma 4.2.1). There is also an elegant elementary argument
why the optimal cost is quadratic in the state, see Exercise 4.6.
Example 4.2.4 (Integrator, see also Example 3.4.4). Consider once again the integrator system ẋ(t) = u(t) and take as cost

J[0,T](x₀, u(·)) = x²(T) + ∫₀ᵀ Ru²(t) dt   (4.11)

with R > 0 (so here A = 0, B = 1, Q = 0 and S = 1). The Hamiltonian matrix is

H = [0, −1/R; 0, 0]   and   e^{Ht} = [1, −t/R; 0, 1].
The final condition on p̃(T) can be transferred to a (unique) initial condition on p̃(0). Indeed the final condition is met if and only if

0 = Sx(T) − p̃(T)
  = [S  −1] e^{HT} [x₀; p̃(0)]
  = [1  −1] [1, −T/R; 0, 1] [x₀; p̃(0)]
  = x₀ − (T/R + 1) p̃(0).
It is linear in x₀ (as predicted) and the inverse required exists (as predicted) because T/R ≥ 0 so T/R + 1 ≠ 0. The optimal cost is quadratic (predicted as well), in fact,

J[0,T](x₀, u∗(·)) = p̃(0)x(0) = x₀² / (T/R + 1).

Special about this example is that the costate is constant, p̃(t) = p̃(0). The optimal input is constant as well,

u∗(t) = −(1/R) p̃(t) = −p̃(0)/R = −x₀/(T + R).
For R ≫ T the optimal control u∗(t) is small, which is to be expected because for large R the input is penalized strongly in the cost (4.11). If R ≈ 0 then control is cheap. Now the control is not necessarily large, u∗(t) ≈ −x₀/T, but it is large enough to steer the final state x(T) to something close to zero, x(T) = x₀(1 − T/(R + T)) = x₀R/(T + R) ≈ 0. ä
Example 4.2.5 (Second order system with mixed boundary condition). This is a laborious example. Consider the system with initial condition

[ẋ₁(t); ẋ₂(t)] = [0, 1; 0, 0] [x₁(t); x₂(t)] + [0; 1] u(t),   [x₁(0); x₂(0)] = [1; 0]

with constraints

[x₁(0); x₂(0)] = [1; 0],   [p̃₁(3); p̃₂(3)] = [x₁(3); 0].   (4.13)
Now we try to solve (4.12). The differential equation for p̃₁(t) simply is p̃̇₁(t) = 0, p̃₁(3) = x₁(3), and therefore has solution

p̃₁(t) = x₁(3).   (4.14)

The differential equation for p̃₂(t) then is p̃̇₂(t) = −p̃₁(t) = −x₁(3), p̃₂(3) = 0, so that

p̃₂(t) = (3 − t) x₁(3).   (4.15)

With this solution, we can write the differential equation for x₂(t) explicitly: ẋ₂(t) = u(t) = −p̃₂(t) = (t − 3)x₁(3), x₂(0) = 0, hence

x₂(t) = (1/2) t(t − 6) x₁(3).   (4.16)
The remaining equation is

ẋ₁(t) = x₂(t) = (1/2) t(t − 6) x₁(3),   x₁(0) = 1.

Its solution is

x₁(t) = ( t³/6 − (3/2)t² ) x₁(3) + 1.   (4.17)

The only unknown we are left with is x₁(3). From Equation (4.17) it follows that

x₁(3) = ( 9/2 − 27/2 ) x₁(3) + 1 = −9 x₁(3) + 1,

i.e.,

x₁(3) = 1/10.   (4.18)
Now we have solved the differential equation (4.12), and the solution is given by (4.14)–(4.17), with x₁(3) equal to 1/10, see (4.18). Hence, the optimal control (4.5) is

u(t) = −R⁻¹Bᵀp̃(t) = −Bᵀp̃(t) = −[0  1] p̃(t) = −p̃₂(t) = (t − 3)/10.
4.3 Finite Horizon LQ: Dynamic Programming
Dynamic Programming applies to the LQ problem as well. The crucial difference with the ap-
proach of the previous section is that while the Minimum Principle supplies necessary condi-
tions for optimality, with Dynamic Programming we have sufficient conditions for optimality.
That said, the equation that has to be solved in Dynamic Programming is a partial differential
equation (called Hamilton-Jacobi-Bellman (HJB) equation) and that is no easy task. For LQ it
can be done, however.
So consider again the linear system (4.2) with cost (4.1). As candidate value function we try

V(x,t) = xᵀP(t)x

with P(t) an n × n symmetric matrix. This may seem a restrictive assumption, but the beauty of Dynamic Programming is that satisfaction of the HJB equation for a restricted class proves that it was not a restriction after all. Using this quadratic V(x,t), the HJB equations (4.21) become

xᵀṖ(t)x + min_{u∈Rᵐ} [ 2xᵀP(t)(Ax + Bu) + xᵀQx + uᵀRu ] = 0,   xᵀP(T)x = xᵀSx.   (4.22)
The minimization over u can, like in the previous section, be solved by setting the gradient of 2xᵀP(t)(Ax + Bu) + xᵀQx + uᵀRu with respect to u equal to zero. This gives

u = −R⁻¹BᵀP(t)x,

and substituting this u back into (4.22) yields

xᵀ[ Ṗ(t) + P(t)A + AᵀP(t) − P(t)BR⁻¹BᵀP(t) + Q ]x = 0,   xᵀP(T)x = xᵀSx.

All terms here have a factor xᵀ (on the left) and a factor x (on the right). Now if the equation with the xᵀ and x removed has a solution then clearly the above equation with xᵀ and x in place has a solution as well. The differential equation with the xᵀ and x removed is

Ṗ(t) = −P(t)A − AᵀP(t) + P(t)BR⁻¹BᵀP(t) − Q,   P(T) = S.   (4.23)

This is known as the Riccati differential equation (RDE). If it has a continuously differentiable solution P(t) on [0,T] then V(x,t) = xᵀP(t)x satisfies the HJB equations (4.21), and the state feedback

u∗(t) = −R⁻¹BᵀP(t)x(t)

makes the closed-loop system satisfy

ẋ(t) = ( A − BR⁻¹BᵀP(t) ) x(t),   x(0) = x₀.
This is a linear differential equation and it has a unique solution x(t ) on [0, T ] for every con-
tinuous P (t ). So then Thm. 3.4.3 guarantees that this u ∗ (·) is the optimal input for every x 0
and that V (x, t ) = x T P (t )x is the value function. In particular V (x 0 , 0) = x 0T P (0)x 0 is then the
optimal cost. Therefore we proved:
Proposition 4.3.1 (Solution of the finite horizon LQ problem). Let Q, S, R be symmetric and R > 0. If the RDE (4.23) has a continuously differentiable solution P: [0,T] → R^{n×n}, then the LQ problem (4.1)–(4.2) is solvable for every x₀ ∈ Rⁿ. In particular then

u∗(t) = −R⁻¹BᵀP(t)x(t)   (4.24)

is the optimal input, V(x,t) = xᵀP(t)x is the value function, and J[0,T](x₀, u∗(·)) = x₀ᵀP(0)x₀ is the optimal cost.

Notice that this proposition is also valid for symmetric matrices S, Q that are not positive semidefinite. The optimal control (4.24) is of a special form: first we have to determine the solution P(t) to the matrix RDE, but this can be done irrespective of x₀. Once this is determined the optimal control can be implemented as a static time-varying state feedback (4.24).
Example 4.3.2 (Example 4.2.4 continued). Consider again the integrator system ẋ(t) = u(t) of Example 4.2.4 with

J(x₀, u(·)) = x²(T) + ∫₀ᵀ Ru²(t) dt.

The RDE (4.23) for this problem is

Ṗ(t) = P²(t)/R,   P(T) = 1,

and by Example 3.4.4 its solution is P(t) = R/(R + T − t). Hence the optimal cost is

J(x₀, u∗(·)) = x₀²P(0) = x₀²/(1 + T/R)

and the optimal input is

u∗(t) = −R⁻¹BᵀP(t)x(t) = −P(t)x(t)/R = −x(t)/(R + T − t).   (4.26)
In this example the optimal control u ∗ (·) is given in state feedback form, while in Exam-
ple 4.2.4 (where we handled the same LQ problem) the control input is given as a function
of time. The feedback form is preferred in applications, but for this particular problem the
feedback form (4.26) blurs the fact that the resulting optimal state and control are just linear
functions, see Example 4.2.4. ä
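For problems where no closed-form solution of the RDE (4.23) is available, one can integrate it backwards numerically. A generic sketch (added as an illustration; it assumes Python with NumPy/SciPy, and the helper name solve_rde is ours) is shown below; for the scalar data of Example 4.3.2 it reproduces P(t) = R/(R + T − t).

```python
import numpy as np
from scipy.integrate import solve_ivp

def solve_rde(A, B, Q, R, S, T):
    """Integrate  P' = -PA - A'P + P B R^-1 B' P - Q,  P(T) = S  backwards in time."""
    n = A.shape[0]
    Rinv = np.linalg.inv(R)

    def rhs(t, p):
        P = p.reshape(n, n)
        dP = -P @ A - A.T @ P + P @ B @ Rinv @ B.T @ P - Q
        return dP.ravel()

    sol = solve_ivp(rhs, (T, 0.0), S.ravel(), dense_output=True,
                    rtol=1e-9, atol=1e-12)
    return lambda t: sol.sol(t).reshape(n, n)

# scalar data of Example 4.3.2:  x' = u,  cost x(T)^2 + int R u^2, here R = 2, T = 5
R_, T_ = 2.0, 5.0
P = solve_rde(np.array([[0.0]]), np.array([[1.0]]),
              np.array([[0.0]]), np.array([[R_]]), np.array([[1.0]]), T_)
for t in [0.0, 2.5, 5.0]:
    print(t, P(t)[0, 0], R_ / (R_ + T_ - t))   # numeric vs closed form
```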
4.4 Riccati Differential Equations (RDE’s)
Proposition 4.3.1 assumes the existence of a solution of the RDE. The good news is that for all
LQ problems (where S ≥ 0,Q ≥ 0, R > 0) it can be proved that such a solution exists. Thus for
LQ problems we have a complete solution:
Theorem 4.4.1 (Existence of solution of RDE's). If S ≥ 0, Q ≥ 0, R > 0 then the RDE (4.23) has a unique continuously differentiable solution P(t) for t ∈ [0,T], and this solution is symmetric and positive semi-definite for every t ∈ [0,T]. Consequently, the LQ problem has a unique solution u∗(t) = −R⁻¹BᵀP(t)x(t) with value function V(x,t) := xᵀP(t)x and optimal cost J[0,T](x₀, u∗(·)) = x₀ᵀP(0)x₀.
Proof. Suppose that the solution P(t) of (4.23) has a finite escape time t_esc ∈ [0,T], i.e. that for some entry p_{ij}(t) of P(t)

lim_{t↓t_esc} |p_{ij}(t)| = ∞.
We now show that this leads to a contradiction. Let e i be the i -th basis vector of Rn . Because
P (t ) is symmetric, it is easy to see that
(e i + e j )T P (t )(e i + e j ) − (e i − e j )T P (t )(e i − e j ) = 4p i j (t ).
Our z T P (t )z can therefore not escape to +∞ in finite time. (By the way, it can also not escape
to −∞ because z T P (t )z = minu(·) J [t ,T ] (z, u(·)) ≥ 0.). Hence z T P (t )z does not escape on [0, T ].
Contradiction. The differential equation therefore does not have a finite escape time t esc ∈
[0, T ]. Lemma B.1.4 now guarantees that the differential equation has a unique solution P (t )
on the entire [0, T ].
Now that existence of P (t ) is proved, Proposition 4.3.1 tells us that the given u ∗ (·) is op-
timal and x T P (t )x is the value function and x 0T P (0)x 0 is the optimal cost. We showed at the
beginning of the proof that P (t ) is symmetric. It is also positive semi-definite because the
cost is nonnegative. ■
In this proof non-negativity of Q, S, R is used. These assumptions are standard in LQ, but
it is interesting to see what happens if an assumption fails. Then the solution P (t ) might
escape in finite time:
Example 4.4.2 (Negative Q, finite escape time). Consider the integrator system ẋ(t) = u(t) for some nonzero x(0) = x₀ and cost

J[0,T](x₀, u(·)) = ∫₀ᵀ −x²(t) + u²(t) dt.

The RDE (4.23) now reads

Ṗ(t) = P²(t) + 1,   P(T) = 0,

and its solution is

P(t) = tan(t − T),   t ∈ (T − π/2, T],

which has the finite escape time t_esc = T − π/2. See Fig. 4.1. If T < π/2 then there is no escape time in [0,T] and hence P(t) = tan(t − T) is then well defined on the entire horizon [0,T] and consequently

V(x,t) = x² tan(t − T)

is the value function and

u∗(t) = −tan(t − T) x(t)

is the optimal control. If T ≥ π/2, however, the escape time lies in [0,T] and the problem has no solution: the cost can be made arbitrarily negative. For instance, apply u(t) = 0 on [0, t_esc + ε] and the optimal control of the remaining problem on [t_esc + ε, T],

for some small ε > 0. Since ẋ(t) = u(t) it means that x(t) is constant over [0, t_esc + ε] and then it continues optimally over [t_esc + ε, T]. The total cost for this input is

−(t_esc + ε) x₀² + x₀² tan(ε − π/2).

It converges to −∞ as ε ↓ 0. ä
FIGURE 4.1: Graph of tan(t − T) for t ∈ [0,T]. Left: if 0 < T < π/2. In that case tan(t − T) is defined for all t ∈ [0,T]. Right: if T ≥ π/2. Then tan(t − T) is not defined at T − π/2 ∈ [0,T]. See Example 4.4.2.
As before we assume that R is positive definite and that Q is positive semidefinite. The termi-
nal cost x T (∞)Sx(∞) is absent. (For the problems that we have in mind the state converges
to zero so the terminal cost would not contribute anyway.) We approach the infinite horizon
LQ problem as the limit as T → ∞ of the finite horizon LQ problem over the time-window
[0, T ]. To make the dependence on T explicit we write the solution of the RDE (4.23) with a
subscript T , that is,
Ṗ T (t ) = −P T (t )A − A T P T (t ) + P T (t )B R −1 B T P T (t ) −Q, P T (T ) = 0. (4.28)
ẋ(t) = u(t),

with cost ∫₀ᵀ x²(t) + u²(t) dt, the RDE (4.28) is

Ṗ_T(t) = P_T²(t) − 1,   P_T(T) = 0.

It has solution

P_T(t) = tanh(T − t) = ( e^{T−t} − e^{−(T−t)} ) / ( e^{T−t} + e^{−(T−t)} ).

(Its graph is a tanh-shaped curve: P_T(t) ≈ 1 for t ≪ T and it drops to 0 at t = T.)
The example suggests that P T (t ) converges to a constant P as the horizon T goes to ∞. It
also suggests that limT →∞ Ṗ T (t ) = 0, which in turn suggests that the Riccati differential equa-
tion in the limit reduces to an algebraic equation,
0 = A T P + P A − P B R −1 B T P +Q . (4.29)
This type of equation is known as an Algebraic Riccati Equation (ARE). The following extensive
theorem shows that that is indeed the case. It requires just one condition: for each x 0 there
needs to exist at least one input that renders the cost J [0,∞) (x 0 , u(·)) finite (and, as always,
Q ≥ 0, R > 0).
Theorem 4.5.2 (Infinite horizon LQ). Consider ẋ(t) = Ax(t) + Bu(t) and suppose Q ≥ 0, R > 0 and that for every x₀ an input exists that renders the cost (4.27) finite. Then the solution P_T(t) of (4.28) converges to a matrix independent of t as the final time T goes to infinity. That is, a constant matrix P exists such that

P = lim_{T→∞} P_T(t)   for every t ≥ 0.   (4.30)

Moreover:

1. P ≥ 0.

2. P satisfies the ARE (4.29) (but the ARE has more than one solution).

4. The input that minimizes the infinite horizon cost (4.27) is u(t) = −R⁻¹BᵀPx(t) and the optimal cost is J[0,∞)(x₀, u(·)) = x₀ᵀPx₀.
Proof. For every fixed x₀ the expression x₀ᵀP_T(t)x₀ is nondecreasing with T because the longer the horizon the higher the cost. Indeed for every ε > 0 and initial x(t) = z we have (with x∗, u∗ optimal for the horizon T + ε)

zᵀP_{T+ε}(t)z = ∫_t^{T+ε} x∗ᵀ(τ)Qx∗(τ) + u∗ᵀ(τ)Ru∗(τ) dτ
   ≥ ∫_t^T x∗ᵀ(τ)Qx∗(τ) + u∗ᵀ(τ)Ru∗(τ) dτ ≥ zᵀP_T(t)z.

Besides being nondecreasing, it is, for any given z, also bounded from above because by assumption for at least one input u_z(·) the infinite horizon cost is finite: zᵀP_T(t)z ≤ J[t,∞)(z, u_z(·)) < ∞. Hence for every i the limit

p_{ii} := lim_{T→∞} e_iᵀ P_T(t) e_i
exists. The diagonal entries of P_T(t) hence converge. For the off-diagonal entries we use that

(e_i + e_j)ᵀP_T(t)(e_i + e_j) − (e_i − e_j)ᵀP_T(t)(e_i − e_j) = 4 e_iᵀP_T(t)e_j.

The limit of the left-hand side exists (by the same argument as for the diagonal entries), so the limit p_{ij} := lim_{T→∞} e_iᵀP_T(t)e_j exists as well. Therefore all entries of P_T(t) converge. The limit is independent of t (see Exercise 4.16). Remains to prove the 4 items:
3. First realize that for every input, an infinite horizon cost is never less than a finite hori-
zon cost:
for every T > τ > 0. Since this inequality holds for every T > τ it also holds in the limit
T → ∞:
Now for τ → ∞ the above two J's converge to each other. Hence provided that
J [0,∞) (x 0 , u(·)) is finite we have
V (x) = x T P x.
The final identity is not entirely straightforward. Verify it. It looks cleaner if we define the signal v(t) as

v(t) := u(t) + R⁻¹BᵀPx(t).

Clearly the integral of −V̇(x(t)) from 0 to ∞ equals V(x(0)) − V(x(∞)) and so we find that

J[0,∞)(x₀, u(·)) = V(x₀) − V(x(∞)) + ∫₀^∞ vᵀ(t)Rv(t) dt   (4.34)

for every u(·) (optimal or not). If u(·) achieves a finite cost then V(x(∞)) = 0 because of (4.31). Hence then we get what we wanted to prove:

J[0,∞)(x₀, u(·)) = V(x₀) + ∫₀^∞ vᵀ(t)Rv(t) dt.   (4.35)
4. The input u(t ) = −R −1 B T P x(t ) achieves a finite cost because then v(t ) = 0 and so
from (4.34) we find that J (x 0 , u(·)) = V (x 0 ) − V (x(∞)) ≤ V (x 0 ) which is finite. Hence
the previous part applies which says that the cost in fact then equals (4.35), i.e. equals
V (x 0 ) = x 0T P x 0 . Clearly (4.35) can not be less than V (x 0 ). The input u(t ) = −R −1 B T P x(t )
is therefore the optimal input and x 0T P x 0 is the optimal cost.
ẋ(t) = u(t),   J[0,∞)(x₀, u(·)) = ∫₀^∞ x²(t) + u²(t) dt.

First notice that u(t) := −x(t) ensures that the cost is finite, so we can apply Thm. 4.5.2. The ARE is

−P² + 1 = 0.

Obviously it has two solutions, P = ±1, and since Theorem 4.5.2 guarantees that the solution that we need is ≥ 0 we have P = 1. The optimal input therefore is u(t) = −R⁻¹BᵀPx(t) = −x(t) and the optimal cost is

Px₀² = x₀².

Application of the optimal u(t) = −x(t) results in a stable closed loop,

ẋ(t) = −x(t),   so   x(t) = e⁻ᵗ x₀.   (4.36)

The state converges exponentially fast to zero, and then so does the input u(t) = −x(t). In fact, it is easy to verify from this optimal state and input that the optimal cost is

∫₀^∞ x²(t) + u²(t) dt = ∫₀^∞ x₀²e⁻²ᵗ + x₀²e⁻²ᵗ dt = x₀².
In this example we could easily figure out which solution of the ARE to take because we
know that the solution P is positive semi-definite. Also interesting is the fact that the closed
loop (4.36) is asymptotically stable (with zero equilibrium). This holds for a large class of
systems and opens up alternative ways to find P , as in the above example. The next theorem
assumes familiarity1 with detectability and stabilizability, see Appendix A.6.
Theorem 4.5.4 (Alternative ways to find P ). Suppose that Q ≥ 0, R > 0 and that (A, B ) is sta-
bilizable and (Q, A) detectable. Then the solution P defined in (4.30) that determines the
solution of the infinite horizon LQ problem exists and can alternatively be characterized as
follows:
1. The ARE (4.29) has a unique solution P ∈ Rn×n for which A −B R −1 B T P is asymptotically
stable, and this is the LQ-solution P of (4.30).
Hence the state feedback u ∗ (t ) = −R −1 B T P x(t ) is a stabilizing state feedback, meaning
that the eigenvalues of A − B R −1 B T P of the feedback system ẋ(t ) = Ax(t ) + Bu ∗ (t ) =
(A − B R −1 B T P )x(t ) have strictly negative real part.
2. The ARE (4.29) has a unique positive semi-definite solution, and this is the LQ-solution
P of (4.30).
Proof.
¹ Quick definition: A pair (A, B) is stabilizable iff for every x₀ there is an input u(·) for which the solution of ẋ(t) = Ax(t) + Bu(t) converges to zero as t → ∞. A pair (Q, A), with Q ≥ 0, is detectable if for every solution of ẋ(t) = Ax(t) we have lim_{t→∞} xᵀ(t)Qx(t) = 0 ⟹ lim_{t→∞} x(t) = 0.
The solution P_T^S(t) of the associated RDE (4.23) for this case is constant, P_T^S(t) = P^S, because P^S satisfies the ARE. Hence the optimal cost is x₀ᵀP^Sx₀ (achieved for some u∗^S(·)). Since S ≥ 0 we clearly have J^S ≥ J for any input, in particular for u∗^S(·):

x₀ᵀP^Sx₀ = J^S[0,T](x₀, u∗^S(·)) ≥ J[0,T](x₀, u∗^S(·)) ≥ x₀ᵀP_T(0)x₀.

Letting T → ∞ gives

x₀ᵀP^Sx₀ ≥ x₀ᵀPx₀.   (4.37)

The converse inequality also holds because for u∗(·) (optimal for J[0,T], not J^S[0,T]) we find

x₀ᵀP^Sx₀ ≤ J^S[0,T](x₀, u∗(·)) = J[0,T](x₀, u∗(·)) + S(x(T)),

which for T → ∞ (and using that u∗(·) achieves x(∞) = 0, hence S(x(∞)) = 0, according to the first part of this theorem) becomes

x₀ᵀP^Sx₀ ≤ x₀ᵀPx₀.   (4.38)

Combination of (4.37) and (4.38) shows that x₀ᵀ(P^S − P)x₀ = 0. This holds for all x₀ and because P^S − P is symmetric we therefore have that P^S = P.
FIGURE 4.2: A car at position y(t) with friction force −αẏ(t) and external force u(t). See Example 4.5.5.
Example 4.5.5 (Infinite horizon – control of a single car). This is a laborious example. Consider the mechanical system

m ÿ(t) + α ẏ(t) = u(t).   (4.39)

This models a mass m at position y(t) subject to an external force u(t) and a friction force proportional to the velocity, see Fig. 4.2. We take the mass equal to m = 1 and leave the friction coefficient α arbitrary (but positive). As state we take x(t) = (y(t), ẏ(t)). Then (4.39) becomes

ẋ(t) = Ax(t) + Bu(t)   with   A = [0, 1; 0, −α],   B = [0; 1].
The idea is to bring the mass to rest but without using much control effort. A possible solution is to minimize the cost

J[0,∞)(x₀, u(·)) = ∫₀^∞ y²(t) + ρ²u²(t) dt.

The parameter ρ > 0 defines a trade-off between small y and small u. The bigger the ρ, the larger the penalty on u in the cost, so probably the smaller the optimal control. The matrices Q and R for our cost are

Q = [1, 0; 0, 0],   R = ρ².

(It can be shown that (A, B) is stabilizable and (Q, A) detectable, but we do not want to go into the details.) The ARE becomes

[0, 0; 1, −α] P + P [0, 1; 0, −α] − P [0, 0; 0, ρ⁻²] P + [1, 0; 0, 0] = [0, 0; 0, 0].   (4.40)
This matrix equation is effectively a set of three scalar equations in three unknown numbers. Indeed, the matrix P is symmetric so it is characterized by three numbers, P = [p₁₁, p₁₂; p₁₂, p₂₂], and then the above left-hand side is symmetric so it equals zero iff its (1,1)-element, (1,2)-element and (2,2)-element are zero. This gives:

0 = 1 − ρ⁻²p₁₂²,
0 = p₁₁ − αp₁₂ − ρ⁻²p₁₂p₂₂,
0 = 2p₁₂ − 2αp₂₂ − ρ⁻²p₂₂².
From the first we find that p₁₂ = ±ρ. If p₁₂ = +ρ then the third equation gives two possible p₂₂ = ρ²(−α ± √(α² + 2/ρ)). One is positive, the other is negative. We need the positive solution because P is positive semidefinite only if p₂₂ ≥ 0. Now that p₁₂ and p₂₂ are known, the second equation gives p₁₁. This turns out to give

P = ρ [ √(α² + 2/ρ), 1; 1, ρ(−α + √(α² + 2/ρ)) ].   (4.41)

(Similarly, for p₁₂ = −ρ the resulting P turns out not to be positive semi-definite so it is not the solution P.) Conclusion: the P of (4.41) is the only positive semi-definite solution P. Hence it is the solution we seek. The optimal control is

u∗(t) = −ρ⁻²BᵀPx(t) = [ −1/ρ,  α − √(α² + 2/ρ) ] x(t)   (4.42)
   = −( (1/ρ) y(t) + ( √(α² + 2/ρ) − α ) ẏ(t) ).
This optimal control is a linear combination of the displacement y(t) and speed ẏ(t) of the mass. These two terms can be interpreted as a spring and damper force in parallel, connected to a fictitious wall, see Fig. 4.3. ä
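Formula (4.41) is easy to cross-check numerically, for instance with SciPy's algebraic Riccati solver. The sketch below is an added illustration (assuming Python with NumPy/SciPy and the arbitrary values α = 0.5, ρ = 2); it compares the numerical solution with (4.41) and with the feedback gains of (4.42).

```python
import numpy as np
from scipy.linalg import solve_continuous_are

alpha, rho = 0.5, 2.0                      # arbitrary example values

A = np.array([[0.0, 1.0], [0.0, -alpha]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[rho**2]])

P_num = solve_continuous_are(A, B, Q, R)

s = np.sqrt(alpha**2 + 2.0/rho)
P_formula = rho * np.array([[s, 1.0],
                            [1.0, rho*(-alpha + s)]])       # eq. (4.41)
print(np.max(np.abs(P_num - P_formula)))                    # should be ~0

F = np.linalg.solve(R, B.T @ P_num)                         # u = -F x
print(F, [1.0/rho, s - alpha])                              # gains of (4.42)
```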
FIGURE 4.3: A car at position y(t) with friction force −αẏ(t), optimally controlled with a spring with spring coefficient 1/ρ and a damper with damping coefficient √(α² + 2/ρ) − α. See Example 4.5.5.
FIGURE 4.4: Two connected cars. The purpose is to control the second car with a force u that acts on the first car. See Example 4.6.1.
For simplicity we take all masses and spring constants equal to one, m₁ = m₂ = 1, k₁ = k₂ = 1, and the damping coefficients small and the same: r₁ = r₂ = 0.1. Then the linear model in the state x(t) defined as x(t) = (q₁(t), q₂(t), q̇₁(t), q̇₂(t)) becomes

ẋ(t) = [0, 0, 1, 0; 0, 0, 0, 1; −2, 1, −0.2, 0.1; 1, −1, 0.1, −0.1] x(t) + [0; 0; 1; 0] u(t).
As the damping coefficients are small one may expect sizeable oscillations when no control
is applied. Indeed, the matrix A has two eigenvalues close to the imaginary axis2 and for the
initial state x 0 = (0, 1, 0, 0) and u(t ) = 0 the positions q 1 (t ), q 2 (t ) of the two cars oscillate for a
long time, see Fig. 4.5(top).
To control the second car with the force u(t) we propose the solution of the infinite horizon LQ problem with cost

∫₀^∞ q₂²(t) + Ru²(t) dt.

The value of R was set, somewhat arbitrarily, to R = 0.2. As A is stable there is a control that renders the cost finite, so the conditions of Thm. 4.5.2 are met and we are guaranteed that the solution P of the corresponding Riccati equation exists. The solution, obtained numerically, turns out to be

P = [0.4126, 0.2286, 0.2126, 0.5381;
     0.2286, 0.9375, 0.0773, 0.5624;
     0.2126, 0.0773, 0.2830, 0.4430;
     0.5381, 0.5624, 0.4430, 1.1607].
Under this control the response to the initial state x 0 = (0, 1, 0, 0) is damped much quicker
than without control, see Fig. 4.5(middle). The eigenvalues of the controlled system ẋ = (A −
B R −1 B T P )x are −0.5925 ± 0.6847i and −0.2651 ± 1.7081i and these are considerably further
away from the imaginary axis than the eigenvalues of A, and the imaginary parts are almost
the same as before. This confirms the stronger damping in the controlled system.
All this is achieved with a control force u(t ) that never exceeds 0.5 in magnitude for this
initial state, see Fig. 4.5(bottom). Notice that the optimal control u(t ) starts out negative but
turns positive way before q 2 (t ) becomes zero for the first time. So apparently it is optimal to
initially speed up the first car away from the second car, but only for a very short period of
time, and then for the next couple of seconds to move the first car towards the second car.
The latter part probably limits overshoot.
For the initial state x₀ = (0, 1, 0, 0) the optimal cost x₀ᵀPx₀ is the (2,2)-element of P, so ∫₀^∞ q₂²(t) + Ru²(t) dt = 0.9375. ä
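The numbers in this example (the matrix P, the closed-loop eigenvalues and the optimal cost 0.9375) can be reproduced with a few lines of code. A sketch, added here for illustration and assuming Python with NumPy/SciPy, is given below.

```python
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-2.0, 1.0, -0.2, 0.1],
              [1.0, -1.0, 0.1, -0.1]])
B = np.array([[0.0], [0.0], [1.0], [0.0]])
Q = np.diag([0.0, 1.0, 0.0, 0.0])           # penalises q2 only
R = np.array([[0.2]])

P = solve_continuous_are(A, B, Q, R)
print(np.round(P, 4))                       # compare with the matrix in the text

F = np.linalg.solve(R, B.T @ P)             # optimal state feedback u = -F x
print(np.linalg.eigvals(A - B @ F))         # closed-loop eigenvalues

x0 = np.array([0.0, 1.0, 0.0, 0.0])
print(x0 @ P @ x0)                          # optimal cost, about 0.9375
```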
p̃ ∗ (t ) = P (t )x ∗ (t ). (4.43)
Incidentally, this re-proves Lemma 4.2.1 because p̃ T (0)x(0) = x T (0)P (0)x(0) = V (x(0), 0). This
connection (4.43) expresses the costate p̃(t ) in terms of the solution P (t ) of the RDE, but it
2 At −0.011910 ± i0.61774 and two more at −0.13090 ± i1.61273
FIGURE 4.5: Top: positions of the uncontrolled cars. Middle: positions of the controlled cars. Bottom: control force u(t) for the controlled car. The initial state is q₁(0) = 0, q₂(0) = 1, q̇₁(0) = 0, q̇₂(0) = 0. See Example 4.6.1.
can also be used to determine P(t) using the states and costates. This goes as follows. In Thm. 4.2.3 we saw that

[x(t); p̃(t)] = [Σ11(t), Σ12(t); Σ21(t), Σ22(t)] [I; M] x₀   (4.44)

for M = [Σ22(T) − SΣ12(T)]⁻¹[SΣ11(T) − Σ21(T)]. If the mapping from x₀ to x(t) is nonsingular then x₀ follows uniquely from x(t) as x₀ = (Σ11(t) + Σ12(t)M)⁻¹x(t) and then p̃(t) also follows uniquely from x(t):

p̃(t) = [Σ21(t) + Σ22(t)M][Σ11(t) + Σ12(t)M]⁻¹ x(t).   (4.45)
Comparing this with (4.43) suggests the following explicit formula for P(t):

Theorem 4.7.1 (Solution of RDE's using Hamiltonians). Let S, Q be positive semi-definite n × n matrices and let R be positive definite. Then the solution of the RDE

Ṗ(t) = −P(t)A − AᵀP(t) + P(t)BR⁻¹BᵀP(t) − Q,   P(T) = S

is

P(t) = [Σ21(t) + Σ22(t)M][Σ11(t) + Σ12(t)M]⁻¹.

Here M = [Σ22(T) − SΣ12(T)]⁻¹[SΣ11(T) − Σ21(T)], and the Σ_{ij} are the n × n sub-blocks of the matrix exponential e^{Ht}.
Proof. Recall that the solution P (t ) of the RDE exists. If Σ11 (t ) + Σ12 (t )M would have been
singular at some t = t̄ , then any nonzero x 0 in the null space of Σ11 (t̄ )+Σ12 (t̄ )M renders x(t̄ ) =
0 while p̃(t̄ ) is nonzero (because Σ(t ) := eH t is invertible). This contradicts the fact that p̃(t ) =
P (t )x(t ). Hence Σ11 (t ) + Σ12 (t )M is invertible for all t ∈ [0, T ] and, consequently, the mapping
from x(t ) to p̃(t ) follows uniquely from (4.44) and it equals (4.45). See Exercise 4.21. ■
We have been using this result already a couple of times without mentioning it:

Example 4.7.2. In Example 4.2.2 we tackled the minimization of ∫₀ᵀ x²(t) + u²(t) dt for ẋ(t) = u(t) using Hamiltonians, and we found that

[Σ11(t), Σ12(t); Σ21(t), Σ22(t)] = (1/2)[ eᵗ + e⁻ᵗ, −eᵗ + e⁻ᵗ; −eᵗ + e⁻ᵗ, eᵗ + e⁻ᵗ ],   M = (e^T − e^{−T})/(e^T + e^{−T}).

Theorem 4.7.1 therefore gives an explicit expression for the solution of the RDE of this problem,

Ṗ(t) = P²(t) − 1,   P(T) = 0.
Infinite horizon LQ and stabilizing solutions of ARE's. Also for the infinite horizon case there is a connection between Hamiltonians and solutions of Riccati equations. If an n × n matrix P satisfies an Algebraic Riccati Equation (ARE)

PA + AᵀP − PBR⁻¹BᵀP + Q = 0,

then

[A, −BR⁻¹Bᵀ; −Q, −Aᵀ] [I; P] = [I; P] (A − BR⁻¹BᵀP).

This is interesting because in the case that all matrices here are numbers (and the Hamiltonian matrix hence a 2 × 2 matrix) it says that the vector [1; P] is an eigenvector of the Hamiltonian matrix, with eigenvalue A − BR⁻¹BᵀP. If, moreover, P is a stabilizing solution of the ARE (meaning that A − BR⁻¹BᵀP is asymptotically stable), then the Hamiltonian matrix has no imaginary eigenvalues, and there is a matrix V ∈ R^{2n×n} of rank n with the property that

[A, −BR⁻¹Bᵀ; −Q, −Aᵀ] V = V Λ

for some asymptotically stable Λ ∈ R^{n×n}. Furthermore, given any such V decompose V as V = [V₁; V₂] with V₁, V₂ ∈ R^{n×n}. Then V₁ is invertible and

P = V₂V₁⁻¹.
Proof. This proof assumes knowledge of linear algebra. If P is a solution of the ARE, then

[I, 0; −P, I] [A, −BR⁻¹Bᵀ; −Q, −Aᵀ] [I, 0; P, I] = [A − BR⁻¹BᵀP, −BR⁻¹Bᵀ; 0, −(A − BR⁻¹BᵀP)ᵀ].   (4.46)

Interestingly [I, 0; −P, I][I, 0; P, I] = I_{2n}, so the above equation is a similarity transformation and this tells us that the eigenvalues of the Hamiltonian [A, −BR⁻¹Bᵀ; −Q, −Aᵀ] are those of A − BR⁻¹BᵀP and −(A − BR⁻¹BᵀP)ᵀ. If P is stabilizing then none of these eigenvalues lie on the imaginary axis. Furthermore it shows that there are n stable eigenvalues (those of A − BR⁻¹BᵀP) and n anti-stable eigenvalues (those of −(A − BR⁻¹BᵀP)ᵀ). Now for simplicity assume that all eigenvalues
are distinct, and let λ₁, ..., λₙ denote the n stable eigenvalues. To each eigenvalue λᵢ there corresponds an eigenvector vᵢ. That is, [A, −BR⁻¹Bᵀ; −Q, −Aᵀ] vᵢ = λᵢvᵢ, or in matrix form

[A, −BR⁻¹Bᵀ; −Q, −Aᵀ] V = [ λ₁v₁ ⋯ λₙvₙ ] = V Λ,   V := [ v₁ v₂ ⋯ vₙ ],
where Λ = diag(λ₁, ..., λₙ). The transformed vectors wᵢ := [I, 0; −P, I]vᵢ are eigenvectors, for the same stable eigenvalues λᵢ, of the block upper-triangular matrix on the right-hand side of (4.46); since its lower-right block −(A − BR⁻¹BᵀP)ᵀ has only anti-stable eigenvalues, the lower half of each wᵢ is zero, i.e.

[W₁; 0] = [ w₁ w₂ ⋯ wₙ ].

Written out this becomes V = [W₁; PW₁]. Now W₁ is nonsingular because the matrix of eigenvectors [W₁; 0] has full column rank. Therefore V₁ = W₁ is nonsingular and V₂V₁⁻¹ = PW₁W₁⁻¹ = P, so P follows uniquely from V. ■
Lemma 4.7.4 (Technical lemma). If A ∈ R^{n×n} is asymptotically stable then for every Q ∈ R^{n×n} (not necessarily symmetric) the equation AᵀP + PA = −Q has a unique solution P ∈ R^{n×n} (not necessarily symmetric).

Proof. Based on Lyapunov theory we guess that P := ∫₀^∞ e^{Aᵀt} Q e^{At} dt is one solution. Indeed, then

AᵀP + PA = ∫₀^∞ Aᵀe^{Aᵀt}Qe^{At} + e^{Aᵀt}Qe^{At}A dt = [ e^{Aᵀt}Qe^{At} ]₀^∞ = 0 − Q = −Q.
This shows that every −Q ∈ R^{n×n} is in the range of the mapping P ↦ AᵀP + PA. The linear mapping that sends P ∈ R^{n×n} to AᵀP + PA ∈ R^{n×n} hence is surjective. Then, by the dimension theorem, it is injective as well, i.e. the solution P of AᵀP + PA = −Q is unique. ■
Those familiar with Linear Algebra would say that “V spans the stable eigenspace” of the
Hamiltonian. In any event, the proof of the above theorem shows that finding P is essentially
an eigenvalue problem and these problems are normally numerically tractable.
Example 4.7.5. Consider for the final time the system ẋ(t) = u(t) and infinite horizon cost ∫₀^∞ x²(t) + u²(t) dt. The Hamiltonian matrix is

H = [0, −1; −1, 0].

The eigenvalues of the Hamiltonian are λ_{1,2} = ±1. The eigenvectors for the stable eigenvalue λ₁ = −1 are

v := [v₁; v₂] = [1; 1] c,   c ≠ 0,

so P = V₂V₁⁻¹ = v₂/v₁ = 1. It does not depend on the choice of eigenvector (as predicted). The (eigen)value of A − BR⁻¹BᵀP = −1 by construction equals λ₁ = −1. The optimal control is u = −R⁻¹BᵀPx = −x. Ready. ä
with cost

J[0,∞)(x₀, u(·)) = ∫₀^∞ x₁²(t) + x₂²(t) + u²(t) dt.

The first two, λ_{1,2}, are stable so we need corresponding eigenvectors. We can take these two to be

v_{1,2} = [ −λ_{1,2}; −λ²_{1,2}; 1; λ³_{1,2} ].

(The fact that these are complex is not a problem.) Now with V known it follows that

P = V₂V₁⁻¹ = [1, 1; λ₁³, λ₂³] [−λ₁, −λ₂; −λ₁², −λ₂²]⁻¹ = [√3, 1; 1, √3].

The optimal input is u(t) = −R⁻¹BᵀPx(t) = −x₁(t) − √3 x₂(t). ä
4.8 Exercises
4.1 Consider the system

with cost

J[0,T](x₀, u(·)) = ∫₀ᵀ 4x²(t) + u²(t) dt.
For arbitrary T > 0 determine the optimal x ∗ (t ), u ∗ (t ), p ∗ (t ) and the optimal cost.
4.2 Consider the system and cost of Example 4.4.2. In that example we found a minimizing
control only if 0 ≤ T < π/2. For T = π/2 the method failed. In this exercise we use
Hamiltonians to analyse the case T = π/2:
Use this to confirm the claim that for T = π/2 the Hamiltonian equations (4.6)
have no solution if x 0 6= 0.
(c) Does Pontryagin’s Minimum Principle allow us to conclude that for T = π/2 and
x 0 6= 0 no optimal control u ∗ (·) exists?
(d) A Wirtinger inequality. Show that ∫₀^{π/2} ẋ²(t) dt ≥ ∫₀^{π/2} x²(t) dt for all smooth x(·) with x(0) = 0, and show that equality holds if-and-only-if x(t) = A sin(t).
4.3 Suppose

and that

J(x₀, u(·)) = 2x²(T) + ∫₀ᵀ u²(t) dt.
4.5 Minimize ∫₀ᵀ x²(t) + ẍ²(t) dt over all x(·) with x(0) = 1, ẋ(0) = 0. [Hint: define an appropriate u(t).]
4.6 Why LQ-optimal inputs are linear in the state and costs are quadratic in the state. In this
exercise we prove, using only elementary arguments, that the optimal control in LQ
control is linear in the state and the value function is quadratic in the state. Consider
ẋ(t) = Ax(t) + Bu(t) with the standard LQ cost over the time window [t, T],

J[t,T](x(t), u(·)) = xᵀ(T)Sx(T) + ∫_t^T xᵀ(τ)Qx(τ) + uᵀ(τ)Ru(τ) dτ,
(a) Exploit the quadratic nature of the cost to prove that for every λ ∈ R, every two
x, z ∈ Rn and every two inputs u(·), w(·) we have
V (x + z, t ) + V (x − z, t ) ≤ 2V (x, t ) + 2V (z, t ).
[Hint: minimize the right-hand side of (4.47) over all u(·), w(·).]
(d) Likewise conclude that
V (x + z, t ) + V (x − z, t ) ≥ 2V (x, t ) + 2V (z, t ).
[Hint: minimize the left-hand side of (4.47) over all u(·) + w(·), u(·) − w(·).]
(e) Suppose u x (·) is the optimal input for x and w z (·) is the optimal input for z. Show
that
(f) Prove that if u x (·) is the optimal input for x and w z (·) is the optimal input for z,
then u x (·) + λw z (·) is optimal for x + λz.
(g) Part 4.6f shows that the optimal control u ∗ : [t , T ] → Rm for J [t ,T ] (x(t ), u(·)) is linear
in x. Show that this implies that at each t the optimal control u ∗ (t ) is linear in x(t ).
(h) Argue that V (x, t ) is quadratic in the state, i.e. that V (x, t ) = x T P (t )x for some ma-
trix P (t ) ∈ Rn×n .
(a) Prove that q(t) := 1/(p(t) + α) satisfies
with cost

J[t₀,0](x₀, u(·)) = g x²(0) + ∫_{t₀}^0 3x²(t) + u²(t) dt.

The initial time t₀ is assumed to be negative. The final time T is taken to be zero.

(a) Determine the associated RDE and show that the solution is given by

P(t) = ( −3 + 3e^{4t} − 3g − e^{4t}g ) / ( −1 − 3e^{4t} − g + e^{4t}g )   if g ≠ 3,   P(t) = 3   if g = 3.   (4.49)
(b) Plot P (t ) for −2 ≤ t ≤ 0 for g ∈ {0, 1, 2, 2.5, 2.9, 3, 3.1, 3.5, 4}.
What do you conclude from these plots? Give an intuitive explanation.
(c) We know from the theory that for g = 0, P (t ) is decreasing on (−∞, 0]. From the
plot that you just made it appears that the same is true for all values of g between
0 and 3. Give a formal proof of this observation. Hint: define P̃ (t ) := P (t ) − g and
derive a differential equation for P̃ . Use the general theory to argue that P̃ (t ) is
decreasing on (−∞, 0].
(d) We know from the theory that for g = 0, P (t ) is decreasing on (−∞, 0]. From the
plot that you just made it appears that P (t ) is increasing for all g ≥ 3. Give a for-
mal proof of this observation. Hint: define P̃(t) := 1/P(t) and derive a differential
equation for P̃ . Use the general theory to argue that P̃ (t ) is decreasing on (−∞, 0].
(e) Take t 0 = −2 and assume that the initial state x(−2) = 1. Plot the state trajectory
x(t ) and the optimal input u(t ) for t ∈ [−2, 0] for g ∈ {0, 2, 4}.
Carefully observe the behavior of the state near t = 0. What do you see? Can you
explain this?
4.9 Sometimes a transformation of the state variables can facilitate solving the optimal con-
trol problem.
With z(t ) defined as z(t ) = E −1 x(t ), show that the LQ problem for ẋ(t ) = Ax(t ) + Bu(t )
with cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T x^T(t) Q x(t) + u^T(t) R u(t) dt
yields the problem
with cost
J̃_{[0,T]}(z_0, u(·)) = ∫_0^T z^T(t) Q̃ z(t) + u^T(t) R u(t) dt,
where à = E −1 AE , B̃ = E −1 B and Q̃ = E T QE .
Also, what is the relationship between the value functions for both problems?
4.10 (This exercise assumes you know how to diagonalize a matrix.) Consider the system
ẋ(t) = [−1  1; 1  −1] x(t) + [1  0; 0  1] u(t)
with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ ½ x_1^2(t) + ½ x_2^2(t) + u_1^2(t) + u_2^2(t) dt.
(a) Write this system in a minimal state space model with x 1 (t ) = y(t ) and x 2 (t ) = ẏ(t ).
(b) Determine the optimal control for the control problem with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ x^2(t) + u^2(t) dt.
Also, find the minimal cost associated with the initial conditions y 0 = 0 and y 1 = 1.
4.13 Plot the eigenvalues of the closed loop system from Example 4.5.5 as a function of ρ for
fixed α. Also, plot them as a function of α for fixed ρ. What is the interpretation?
4.14 In this exercise, we want to use the optimal control theory to follow a given signal, the
so-called tracking problem. In order to get to know the system a bit better, we start with
the simple scalar system
We want to find a control such that the state follows the constant signal 1. First, we
show that such a control indeed exists.
lim_{t→∞} x(t) = 1.
We now want to choose the control in an optimal way. We therefore introduce the cost
function
J_{[0,T]}(x_0, u(·)) = ∫_0^T (x(t) − 1)^2 + (1/T) u^2(t) dt.
is not finite.
(d) Prove that there does not exist a control u(·) such that
lim_{t→∞} x(t) = 1,
We want the output of this system to follow the constant signal Yr . Therefore, we con-
sider the cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T ‖y(t) − Y_r‖^2 + (1/T) ‖u(t)‖^2 dt.
(i) Give the equations for the value function, as in part 4.14e.
(j) What is the relationship between the value function derived by you and the matrix
RDE? For which Yr are both value functions equal?
(k) Prove that the optimal control problem has a solution.
with cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T u^2(t) dt.
(a) Determine the optimal control using Theorem 2.4.2.
(b) Determine also the optimal state and costate using Differential Equation (4.6). You
are not allowed to use part 4.15a.
(c) Determine the value function V (x 0 , t ), using the x ∗ (t ) and p ∗ (t ) found in 4.15b.
(d) Solve the minimization problem for J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ u^2(t) dt.
(e) What is the minimal cost in 4.15d? Does it satisfy the algebraic Riccati equation? Is (Q, A) detectable?
4.16 In the proof of Theorem 4.5.2 it is shown that limT →∞ P T (t ) exists for each t . Show that
the limit is independent of t .
4.17 Consider Theorem 4.5.2 and assume in addition that (A,Q) is observable. Show that
P > 0. [Hint: argue that otherwise there would have been initial states x 0 with optimal
cost x 0T P x 0 equal to zero, and that is impossible for observable (A,Q).]
with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ 4x^2(t) + u^2(t) dt
4.20 Let Q ≥ 0, R > 0 and suppose that B = 0. Consider Thm. 4.5.4.
(a) Under what conditions on A are the assumptions of Thm. 4.5.4 satisfied?
(b) Determine the ARE for this case.
(c) Thm. 4.5.4 re-proves which result of § B.5?
(a) Assume first that x(1) is given. Determine the optimal cost-to-go from t = 1 on:
V(x(1), 1) := min_u ∫_1^∞ 4x^2(t) + u^2(t) dt.
(b) Express the optimal cost J_{[0,∞)}(x_0, u(·)) as J_{[0,∞)}(x_0, u(·)) = ∫_0^1 u^2(t) dt + S x^2(1).
(That is: what is S?)
(c) Solve the optimal control problem: determine the optimal cost J [0,∞) (x 0 , u(·)) and
express the optimal input u(t ) as a function of x(t ). [Hint: use separation of vari-
ables, see § A.3.]
Appendix A
Background material
This appendix contains concise summaries of a number of topics that play a role in opti-
mal control. Each section covers one topic and most can be read independently from the
other sections. The topics are standard and are covered in some form or another in calcu-
lus courses, a course on differential equations or a first course on systems theory. Nonlinear
differential equations are discussed in Appendix B.
A.1 Positive definite functions and matrices
A real symmetric matrix P ∈ R^{n×n} is called positive definite if x^T P x > 0 for all x ∈ R^n, x ≠ 0, and positive semi-definite if x^T P x ≥ 0 for all x ∈ R^n.
The notation V > 0 and P > 0 means that the function/matrix is positive definite. Inter-
estingly real symmetric matrices have real eigenvalues only, and there exist simple tests for
positive definiteness:
Lemma A.1.1 (Tests for positive definiteness). Suppose P is an n × n real symmetric matrix.
The following six statements are equivalent.
1. P > 0.
2. All leading principal minors are positive: det(P 1:k,1:k ) > 0 for all k ∈ {1, 2, . . . , n}.
4. There is a nonsingular matrix X such that P = X T X .
6. For a partition of P,
   P = [P_11  P_12; P_12^T  P_22]
   (with P_11 square): P_11 > 0 and P_22 − P_12^T P_11^{−1} P_12 > 0.
   (That is, both P_11 and its so-called Schur complement P_22 − P_12^T P_11^{−1} P_12 are positive definite).
ä
For positive semi-definite matrices similar tests exist, except for the principal minor test
which is now more involved:
Lemma A.1.2 (Tests for positive semi-definiteness). Let P = P T ∈ Rn×n . The following state-
ments are equivalent.
1. P ≥ 0.
2. All principal minors (not just the leading ones) are nonnegative: det(P I ,I ) ≥ 0 for every
subset I of {1, . . . , n}.
the matrix P_11 is square and invertible, then P ≥ 0 iff P_11 > 0 and P_22 − P_12^T P_11^{−1} P_12 ≥ 0. ä
Example A.1.3. P = [0  0; 0  −1] is not positive semidefinite because the principal minor det P_{2,2} = −1 is not nonnegative.
P = [0  0; 0  1] is positive semidefinite because all three principal minors, det(0), det(1), det(P), are nonnegative. ä
A.2 A notation for partial derivatives
Let f : R^n → R^k be differentiable. First the case k = 1, so f : R^n → R. The ∂f(x)/∂x is then a vector of partial derivatives of the same dimension as x. For the standard choice of column vectors x (with n entries) this means
∂f(x)/∂x := [ ∂f(x)/∂x_1;  ∂f(x)/∂x_2;  ··· ;  ∂f(x)/∂x_n ] ∈ R^n.
With the same logic we get a row vector if we differentiate with respect to a row vector,
∂f(x)/∂x^T := [ ∂f(x)/∂x_1   ∂f(x)/∂x_2   ···   ∂f(x)/∂x_n ] ∈ R^{1×n}.
Now the case k ≥ 1. If f(x) ∈ R^k is itself vectorial (column) then similarly we end up with

∂f(x)/∂x^T := [ ∂f_1(x)/∂x_1   ∂f_1(x)/∂x_2   ···   ∂f_1(x)/∂x_n
                     ⋮               ⋮                   ⋮
                ∂f_k(x)/∂x_1   ∂f_k(x)/∂x_2   ···   ∂f_k(x)/∂x_n ]  ∈ R^{k×n},

and

∂f^T(x)/∂x := [ ∂f_1(x)/∂x_1   ···   ∂f_k(x)/∂x_1
                     ⋮                     ⋮
                ∂f_1(x)/∂x_n   ···   ∂f_k(x)/∂x_n ]  ∈ R^{n×k}.
The first is the Jacobian, the second is its transpose. Convenient about this notation is that
the n × n Hessian of a function f : Rn → R can now compactly be denoted as
∂²f(x)/(∂x∂x^T) := ∂/∂x ( ∂f(x)/∂x^T ) = ∂/∂x [ ∂f(x)/∂x_1   ∂f(x)/∂x_2   ···   ∂f(x)/∂x_n ]

                 [ ∂²f(x)/∂x_1²        ∂²f(x)/∂x_1∂x_2   ···   ∂²f(x)/∂x_1∂x_n
               =        ⋮                     ⋮                      ⋮
                   ∂²f(x)/∂x_n∂x_1     ∂²f(x)/∂x_n∂x_2   ···   ∂²f(x)/∂x_n²     ].
Indeed, we first differentiate with respect to a row x T and subsequently differentiate the out-
come (a row) with respect to a column x, resulting in an n × n matrix of second-order partial
derivatives. If f (x) is twice continuously differentiable then the order in which we differenti-
ate does not matter (Clairaut’s theorem) so then
∂²f(x)/(∂x∂x^T) = ∂²f(x)/(∂x^T∂x).
The Hessian is then symmetric.
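This notation is mirrored by computer algebra systems. A small SymPy sketch (the functions f and F below are arbitrary illustrations, not taken from the text):

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    x = sp.Matrix([x1, x2])

    f = x1**2 * x2 + sp.sin(x2)             # a scalar function f: R^2 -> R
    grad = sp.Matrix([f]).jacobian(x).T     # column vector df(x)/dx
    hess = sp.hessian(f, (x1, x2))          # the n x n Hessian d^2 f/(dx dx^T)

    F = sp.Matrix([x1 + x2**2, x1 * x2])    # a vector function f: R^2 -> R^2
    J = F.jacobian(x)                       # the k x n Jacobian dF(x)/dx^T

    print(grad)    # Matrix([[2*x1*x2], [x1**2 + cos(x2)]])
    print(hess)    # symmetric, in line with Clairaut's theorem
    print(J)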
A.3 Separation of variables
Let x : R → R and consider the differential equation
ẋ(t) = g(t)/h(x(t))    (A.1)
with g (·), h(·) some given continuous functions. Let H (·),G(·) denote anti-derivatives of
h(·), g (·). The differential equation is equivalent to
h(x(t ))ẋ(t ) = g (t )
and we see that the left-hand side is the derivative of H (x(t )) with respect to t and the right-
hand side obviously is the derivative of G(t ) with respect to t . So it must be that
H (x(t )) = G(t ) + c 0
This derivation assumes that H (·) is invertible. The c 0 is typically used to match an initial
condition x(t 0 ).
ẋ(t ) = −x 2 (t ), x(0) = x 0
of Example B.1.5 using separation of variables. We split the solution in two columns; the first
column is the example, the second column makes a connection with the general procedure:
In this example the inverse exists as long as t 6= c 0 . Now x 0 = x(0) = −1/c 0 so c 0 can be ex-
pressed in terms of x 0 as c 0 = −1/x 0 and the above solution then becomes
x(t) = 1/(t + 1/x_0) = x_0/(x_0 t + 1).    (A.3)
The solution x(t ) escapes at t = −1/x 0 . (For the escape time problem we refer to Exam-
ple B.1.5.) ä
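A quick symbolic check of (A.3) — substituting the claimed solution back into ẋ = −x² — can be done as follows (SymPy sketch):

    import sympy as sp

    t, x0 = sp.symbols('t x0')
    x = x0 / (x0 * t + 1)                  # the claimed solution (A.3)

    residual = sp.diff(x, t) + x**2        # residual of xdot = -x^2
    print(sp.simplify(residual))           # 0: (A.3) indeed solves the equation
    print(x.subs(t, 0))                    # x0: the initial condition is met
    print(x.subs({x0: -2, t: 0.499}))      # large negative: blow-up near t = -1/x0 = 1/2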
and that x(t ) > 0 for some time. Then we may divide by x(t ),
ẋ(t)/x(t) = a.
Integrating both sides and using that x(t ) > 0, we find that
log(x(t )) = at + c 0 .
and x_0 = e^{c_0}. For x(t) < 0 the same solution x_0 e^{at} results (verify this yourself), and if x(t) = 0 for some time t then x(t) = 0 for all time, which is also of the form x(t) = x_0 e^{at}. In summary, for every x_0 ∈ R the solution is x(t) = x_0 e^{at}. ä
λ^2 + 5λ + 6 = 0.
λ = −2, λ = −3
and then the general solution y(t ) follows as the linear combinations of corresponding expo-
nential terms
one can find a particular solution y_part(t) of the same exponential form, y_part(t) = A e^{s_0 t}. The constant A follows easily by equating left and right-hand side of (A.4). For this example it gives
y_part(t) = (2s_0 + 3)/(s_0^2 + 5s_0 + 6) e^{s_0 t}.
Then the general solution is obtained by adding the general solution of the homogeneous
equation
y(t) = (2s_0 + 3)/(s_0^2 + 5s_0 + 6) e^{s_0 t} + c_1 e^{−2t} + c_2 e^{−3t}.
If s 0 happens to be a characteristic root (s 0 = −2 or s 0 = −3 in our example) then the particular
solution is invalid because of division by zero. Then a particular solution exists of the form
y_part(t) = (A_k t^k + ··· + A_1 t + A_0) e^{s_0 t}
for some large enough k and with the constants A 0 , . . . , A k yet to be determined.
If the function u(t) in (A.4) is polynomial then a polynomial particular solution y_part(t) = A_k t^k + ··· + A_1 t + A_0 of sufficiently high degree exists.
A.5 System of linear time-invariant DE’s
Let A ∈ Rn×n and B ∈ Rn×m . Then for every x 0 ∈ Rn and piecewise continuous u : R → Rm the
solution x : R → Rn of the DE
follows uniquely,
x(t) = e^{At} x_0 + ∫_0^t e^{A(t−τ)} B u(τ) dτ,    t ∈ R.    (A.5)
Piecewise continuity of u(·) is assumed for technical reasons only. Here e^A is the matrix exponential. It is defined for square matrices A and can, for instance, be defined in analogy with the Taylor series expansion of e^a as
e^A = Σ_{k=0}^∞ (1/k!) A^k = I + A + (1/2!) A^2 + (1/3!) A^3 + ···.    (A.6)
This series is convergent for every square matrix A. Some characteristic properties of the
matrix exponential are:
For the zero signal u(t ) = 0 the above says that the general solution of
is
x(t ) = e At x 0 .
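A short numerical illustration of (A.5)–(A.6): the matrix exponential from SciPy against a truncated version of the series (the matrix A below is just an example):

    import numpy as np
    from scipy.linalg import expm

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # example matrix, eigenvalues -1 and -2
    t = 0.7

    # Truncated Taylor series (A.6); 30 terms is plenty for this A and t.
    E = np.zeros_like(A)
    term = np.eye(2)
    for k in range(30):
        E += term
        term = term @ (A * t) / (k + 1)

    print(expm(A * t))      # e^{At} from SciPy
    print(E)                # truncated series: agrees to many digits

    x0 = np.array([1.0, 0.0])
    print(expm(A * t) @ x0) # solution of xdot = A x, x(0) = x0, at time t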
if A is “diagonalizable” – meaning R^n has a basis {v_1, ..., v_n} of eigenvectors of A – then the matrix of eigenvectors, P := [v_1  v_2  ···  v_n], is invertible and A = P Λ P^{−1} with Λ the diagonal matrix of eigenvalues of A. In that case
e^{At} = P e^{Λt} P^{−1}
with eΛt as in (A.7). This shows that for diagonalizable matrices A, every entry of e At is a
linear combination of eλi t , i = 1, . . . , n. However, not every matrix is diagonalizable. Using
Jordan forms it can be shown that:
Lemma A.5.2. Let A ∈ Rn×n and denote its eigenvalues as λ1 , λ2 , . . . , λn . Then every entry
of e At is a finite linear combination of t k eλi t with k ∈ N and i = 1, 2, . . . , n. Moreover, the
following statements are equivalent.
Here, A ∈ Rn×n and B ∈ Rn×m . The function u : [0, ∞) → Rm is often called the input and
the interpretation is that this input u(·) is for us to choose, and that the state x(·) follows. A
natural question is how well the state can be controlled:
Definition A.6.1 (Controllability). A system ẋ(t ) = Ax(t ) + Bu(t ) is controllable if for every
pair of states x 0 , x 1 ∈ Rn , there is a time T > 0 and an input u : [0, T ] → Rm such that the
solution x(·) with x(0) = x 0 satisfies x(T ) = x 1 . ä
Controllability means that any state x(t ) can be driven to any other state by an appropri-
ate choice of input. Controllability can be tested in many ways:
Theorem A.6.2 (Controllability tests). For the system (A.8), the following statements are
equivalent:
4. For every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric with respect
to the real axis, there exists a matrix F ∈ Rm×n such that the eigenvalues of A − B F are
equal to {λ1 , λ2 , . . . , λn }.
ä
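One well-known equivalent test is the Kalman rank condition: the system is controllable if and only if the controllability matrix [B  AB  ···  A^{n−1}B] has rank n. A minimal numerical sketch (the pair (A, B) is just an example chosen here):

    import numpy as np

    def ctrb(A, B):
        """Controllability matrix [B, AB, ..., A^(n-1) B]."""
        n = A.shape[0]
        blocks = [B]
        for _ in range(n - 1):
            blocks.append(A @ blocks[-1])
        return np.hstack(blocks)

    A = np.array([[0.0, 1.0], [0.0, 0.0]])    # example: double integrator
    B = np.array([[0.0], [1.0]])

    C = ctrb(A, B)
    print(C)                                          # [[0. 1.] [1. 0.]]
    print(np.linalg.matrix_rank(C) == A.shape[0])     # True: controllable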
Definition A.6.3 (Stabilizability). A system ẋ = Ax(t ) + Bu(t ) is stabilizable if for every x(0) ∈
Rn there is a u : [0, ∞) → Rm such that limt →∞ x(t ) = 0. ä
It can be shown that stabilizability is equivalent to the existence of a matrix F ∈ Rm×n such
that all eigenvalues of A−B F have negative real part. This is interesting because it implies that
u(t ) := −F x(t ) is a stabilizing input for every x(t ) (verify this yourself).
Now consider a state system with an output
where A is the same as in (A.8) and C is a (constant) k ×n matrix. The function y : [0, ∞) → Rk
is often called the output and the interpretation is that y is the part of the state that can be
measured. It is a natural question to ask how much information the output provides about
the state. For example, if we know the output, can we reconstruct the state? For linear systems
one can define “observability” as follows.
Definition A.6.4 (Observability). A system (A.9) is observable if a T > 0 exists such that the
x 0 follows uniquely from y : [0, T ] → Rk . ä
Of course, if x 0 follows uniquely then the state x(t ) = e At x 0 follows uniquely over the en-
tire interval [0, T ]. There are many ways to test for observability:
Theorem A.6.5 (Observability tests). Consider system (A.9). The following statements are
equivalent:
4. For every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric with respect to
the real axis, there is a matrix L ∈ Rn×k such that the eigenvalues of A − LC are equal to
{λ1 , λ2 , . . . , λn };
Detectability means that a possible instability of ẋ = Ax(t ) can always be detected by look-
ing at y(t). Dual to stabilizability it can be shown that detectability is equivalent to the existence of a matrix L ∈ R^{n×k} such that all eigenvalues of A − LC have negative real part.
[Figure A.1: (a) a point z_0 on the constraint curve G(z) = 0 where the gradients ∂G(z_0)/∂z and ∂L(z_0)/∂z are not aligned; (b) a minimizer z_* where ∂G(z_*)/∂z and ∂L(z_*)/∂z are aligned. Level sets of L(z) and the region G(z) < 0 are indicated.]
A.7 Lagrange multipliers
We recall in this section the Lagrange multipliers for finite dimensional optimization prob-
lems and its connection with first order conditions of constrained minimization.
Let L : Rn → R. The first order condition for unconstrained minimization
min_{z ∈ R^n} L(z)
Example A.7.1 (Geometric interpretation of first order necessary condition for n = 2 and
k = 1). An example is depicted in Fig. A.1. Let L : R2 → R be some smooth function that
we want to minimize over all z ∈ R2 subject to the constraint G(z) = 0 with G : R2 → R some
differentiable function. Intuition tells us that z 0 in Fig. A.1(a) is not a local minimizer because
moving up along the constraint curve brings us to a lower value of L(z). Another way to say
this is that the tangent of the constraint curve at z 0 is not tangent to the level set of L(z)
through z 0 . The first order condition for (local) minimality is that every perturbation δ ∈ R2
“in the tangent of the constraint” at the candidate minimizer z ∗ ,
{δ ∈ R^2 | (∂G(z_*)/∂z^T) δ = 0}
is also tangent to the level set through z ∗ , meaning
(∂L(z_*)/∂z^T) δ = 0.
The geometric interpretation for n = 2 is that the gradients ∂L(z)/∂z and ∂G(z)/∂z are aligned at the minimizer z_*, see Fig. A.1(b). ä
Under mild regularity assumptions^1 one can likewise show that a solution z_* of the constrained minimization problem (A.10) necessarily has the property
G(z_*) = 0   and   ∀δ ∈ R^n: ( (∂G(z_*)/∂z^T) δ = 0  =⇒  (∂L(z_*)/∂z^T) δ = 0 ).    (A.11)
With this constraint minimization problem (A.10) we associate an unconstrained mini-
mization problem by defining the Lagrangian function
and minimizing this function as a function of z ∈ Rn and the vector of Lagrange multipliers
λ ∈ R^k. The standard first-order condition for minimality of K(z_*, λ_*) is that the gradient
with respect to both z and λ is zero at (z ∗ , λ∗ ):
∂L(z_*)/∂z^T + λ_*^T ∂G(z_*)/∂z^T = 0,    G(z_*) = 0.    (A.13)
^1 That G(·) is continuously differentiable and ∂G(z)/∂z^T has full row rank at z = z_*.
The second half of the equations (the first order conditions with respect to λ) are just the
constraint equations themselves. The first half of these equations (the first-order conditions
with respect to z) tells us that at a minimizing z_* the gradient vector ∂L(z_*)/∂z^T is a linear combination of the k rows of ∂G(z_*)/∂z^T, see Fig. A.1(b). The classic result is that the first order conditions
for the unconstrained Lagrangian are equivalent to the first order conditions for the original
constrained problem:
(proof: The only-if part is easy: if µT = λT A and Aδ = 0 then µT δ = λT Aδ = 0. For the if-part we
note that the condition that (Aδ = 0 =⇒ µT δ = 0) implies that ker A ⊆ ker µT . This is equivalent
to im µ ⊆ im A T . Since µ ∈ im µ this implies the existence of a λ ∈ Rn such that A T λ = µ.)
Apply the theorem of alternatives with A = ∂G(z ∗ )/∂z T and µ = −∂L(z ∗ )/∂z. ■
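A tiny worked instance of (A.13), with an objective and constraint made up for illustration: minimize L(z) = z_1² + z_2² subject to G(z) = z_1 + 2z_2 − 5 = 0. Solving the first-order conditions with SymPy:

    import sympy as sp

    z1, z2, lam = sp.symbols('z1 z2 lambda')
    L = z1**2 + z2**2             # objective (illustrative example)
    G = z1 + 2*z2 - 5             # single constraint, so k = 1

    K = L + lam * G               # Lagrangian
    eqs = [sp.diff(K, z1), sp.diff(K, z2), G]           # the conditions (A.13)
    sol = sp.solve(eqs, [z1, z2, lam], dict=True)[0]
    print(sol)                     # {z1: 1, z2: 2, lambda: -2}

    # At the solution the gradient of L is a multiple of the gradient of G:
    gradL = sp.Matrix([sp.diff(L, z1), sp.diff(L, z2)]).subs(sol)
    gradG = sp.Matrix([sp.diff(G, z1), sp.diff(G, z2)])
    print(gradL, gradG)            # (2, 4) versus (1, 2): aligned, with factor -lambda = 2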
Appendix B
Nonlinear differential equations
ẋ_1(t) = f_1(x_1(t), . . . , x_n(t)),    x_1(0) = x_01
    ⋮
ẋ_n(t) = f_n(x_1(t), . . . , x_n(t)),    x_n(0) = x_0n,    t ≥ 0.    (B.1)
Here x 01 , . . . , x 0n ∈ R are given initial conditions and the f i : Rn → R are given functions. The
vector
x(t) := [ x_1(t);  ⋯ ;  x_n(t) ] ∈ R^n
is called the state (vector) and with a similar definition for the vector of initial conditions x 0 ∈
Rn and vector field function f : Rn → Rn we may write (B.1) more succinctly as
The solution we normally write as x(t ) but sometimes we use x(t ; x 0 ) if we want to emphasise
the dependence on the initial state.
Clearly the zero function x(t ) = 0 ∀t is one solution, but it is easy to verify that for every c > 0
the function
x(t) = { 0 for t ∈ [0, c];   ¼ (t − c)^2 for t > c }
is a solution as well! Weird. It is as if the state x(t ) – like Baron Munchhausen – is able to lift
itself by pulling on its own hair. ä
The vector field function in this example is f(x) = √x and it has unbounded derivative around x = 0. We will see next that if the function f(x) does not increase “too quickly” then uniqueness is ensured. A measure for the rate of increase is the Lipschitz constant.
Definition B.1.2 (Lipschitz continuity). Let Ω ⊂ Rn and let k · k be some norm on Rn (e.g.
the standard Euclidean norm). A function f : Ω → Rn is Lipschitz continuous on Ω if a Lip-
schitz constant K ≥ 0 exists such that
F IGURE B.1: Lipschitz continuity for scalar functions f : [a, b] → R defined on an interval Ω =
[a, b] ⊂ R means that at each z ∈ [a, b] the graph (x, f (x)) is completely contained in a steep-
enough “bow tie” through the point (z, f (z)). The slope of the steepest bow tie needed over all
z in the interval is a possible Lipschitz-constant K .
For the linear function f (x) = kx with k ∈ R the Lipschitz constant is obviously K = |k| and
the solution of the corresponding differential equation
clearly is x(t ) = ekt x 0 . Given x 0 , this solution exists and is unique. The idea is now that for
arbitrary Lipschitz continuous f : Rn → Rn the solutions of ẋ(t ) = f (x(t )) exist and are unique
(on some region) and that the solution increases at most exponentially with the exponent K
equal to the Lipschitz constant (on that region):
Proof. The proof can be found in many textbooks, e.g. (Khalil, 1996, Thm. 2.2 & Thm. 2.5). ■
If a single Lipschitz constant K ≥ 0 exists such that (B.3) holds for all x, z ∈ Rn then f (·) is
said to satisfy a global Lipschitz condition.
It follows from the above theorem that solutions x(t ) can be uniquely continued at any
time t 0 if f (·) is locally Lipschitz. This is such a desirable property that one normally im-
plicitly assumes that f (x) is locally Lipschitz. Every continuously differentiable f (x) is locally
Lipschitz, so in such cases we can uniquely continue the solution x(t ) at every t 0 . However,
the solution might escape in finite time:
Theorem B.1.4 (Escape time). Suppose that f : Rn → Rn is locally Lipschitz. Then for every
x(0) = x 0 there is a unique t (x 0 ) ≥ 0 (possibly t (x 0 ) = ∞) such that the solution x(t ) of (B.2)
exists and is unique on the open time interval [0, t (x 0 )) but does not exist for t > t (x 0 ).
Moreover if t (x 0 ) < ∞ then limt ↑t (x0 ) kx(t ; x 0 )k = ∞.
If f (·) is globally Lipschitz then t (x 0 ) = ∞ i.e. the solution x(t ; x 0 ) then exists and is unique
for all t ≥ 0.
ẋ(t ) = −x 2 (t ), x(0) = x 0 .
We conclude this section with a result about continuity of solutions that is useful for op-
timal control. Here we take the standard Euclidean norm:
Lemma B.1.6 (Continuity of solutions). Consider the differential equation in x(t ) and (per-
turbed) z(t ):
Let T > 0. If Ω is an open set such that x(t ), z(t ) ∈ Ω for all t ∈ [0, T ) and if f (x) on Ω has
Lipschitz constant K , then
‖x(t) − z(t)‖ ≤ e^{Kt} ( ‖x(0) − z(0)‖ + ∫_0^t ‖g(τ)‖ dτ )    ∀t ∈ [0, T).
Proof. Let Δ(t) = x(t) − z(t). Then Δ̇(t) = f(x(t)) − f(z(t)) + g(t). Then by Cauchy-Schwarz we have |d/dt ‖Δ(t)‖| ≤ K ‖Δ(t)‖ + ‖g(t)‖. From (A.5) it follows that then ‖Δ(t)‖ ≤ e^{Kt} ( ‖Δ(0)‖ + ∫_0^t ‖g(τ)‖ dτ ). ■
F IGURE B.2: For negative x 0 the solution escapes at t = −1/x 0 (Example B.1.5)
[Figure: in the (x_1, x_2) plane, a trajectory x(t) starting at x_0 with ‖x_0 − x̄‖ < δ stays within distance ǫ of the equilibrium x̄, illustrating the stability definition.]
B.2 Definitions of stability
Asymptotic stability of ẋ(t) = f(x(t)), loosely speaking, means that solutions x(t) “come to rest”, and stability means that x(t) remains “close to rest”. In order to formalize this, we first
have to define the “points of rest”. These are the constant solutions x(t ) = x̄ of the differential
equation, so solutions x̄ of f (x̄) = 0.
Different possibilities for the behavior of the system near an equilibrium point are de-
scribed in the following definition. For ease of exposition we assume that t 0 = 0.
1. stable if ∀² > 0 ∃δ > 0 such that kx 0 − x̄k < δ implies kx(t ; x 0 ) − x̄k < ² ∀t ≥ t 0 .
2. attractive if ∃δ1 > 0 such that kx 0 − x̄k < δ1 implies that limt →∞ x(t ; x 0 ) = x̄.
6. unstable if x̄ is not stable. This means that ∃² > 0 such that ∀δ > 0 an x 0 and a t 1 exists
for which kx 0 − x̄k < δ yet kx(t 1 ; x 0 ) − x̄k ≥ ².
ä
for every solution of ẋ(t ) = f (x(t )). Now if V (x(t )) is differentiable with respect to time t then
it is nonincreasing if and only if its derivative with respect to time is non-positive everywhere
V̇ (x(t )) ≤ 0 ∀t .
This condition can be checked for solutions of (B.2) without explicit knowledge of x(t ). In-
deed, using the chain-rule, we have that
V̇(x(t)) = dV(x(t))/dt
        = ∂V(x(t))/∂x_1 · ẋ_1(t) + ··· + ∂V(x(t))/∂x_n · ẋ_n(t)
        = ∂V(x(t))/∂x_1 · f_1(x(t)) + ··· + ∂V(x(t))/∂x_n · f_n(x(t))
        =: ∂V(x)/∂x^T f(x) |_{x=x(t)}.    (B.5)
We took the opportunity here to introduce the convenient notation ∂V(x)/∂x^T for the gradient of V(x) at x seen as a row vector,
∂V(x)/∂x^T := [ ∂V(x)/∂x_1   ∂V(x)/∂x_2   ···   ∂V(x)/∂x_n ].
(Appendix A.2 explains this notation, in particular the role of the transpose.) The product
in (B.5) is that of a row vector ∂V (x)/∂x T and a column vector f (x), evaluated at x = x(t ).
With slight abuse of notation we use V̇ (x) to mean
V̇(x) = ∂V(x)/∂x^T f(x).
In order to deduce stability from the existence of a non-increasing function V (x(t )) we addi-
tionally require that the function has a minimum at the equilibrium. Furthermore, for tech-
nical reasons we also have to require a certain degree of differentiability of the function. We
formalize these properties in the following definition and theorem.
Definition B.3.1 (Positive and negative (semi) definite). Let Ω ⊆ Rn and assume it is a neigh-
borhood of some x̄ ∈ Rn . A continuously differentiable function V : Ω → R is positive definite
on Ω relative to x̄ if
It is positive semi-definite if V (x̄) = 0 and V (x) ≥ 0 for all other x. And V (·) is negative (semi)
definite if −V (·) is positive (semi) definite. ä
Positive definite implies that V (·) has a unique minimum on Ω and that the minimum is
attained at x̄. The assumption that the minimum is zero, V (x̄) = 0, is a convenient normaliza-
tion. Figure B.4 shows an example of each of the four types of “definite” functions.
The famous result can now be proved:
Theorem B.3.2 (Lyapunov’s second stability theorem). Consider the DE ẋ(t ) = f (x(t )) with
f : Rn → Rn Lipschitz continuous, and let x̄ be an equilibrium of this DE. If there is a neigh-
borhood Ω of x̄ and a function V : Ω → R such that on Ω
[Figure B.4: graphs of a positive definite, a positive semi-definite, a negative definite and a negative semi-definite function, each relative to the point (x̄, 0).]
[Figure B.5: the nested sets used in the proof of Thm. B.3.2: the ball B(x̄, ǫ_1) inside Ω, the sublevel set Ω_1 inside B(x̄, ǫ_1), and the ball ‖x − x̄‖ < δ inside Ω_1.]
By definition a Lyapunov function V (x(t )) never increases over time (on Ω), and a strong
Lyapunov function V (x(t )) always decreases on Ω unless we are at the equilibrium x̄.
Proof. We denote the open sphere with radius r and center x̄ by B (x̄, r ), i.e.,
We first consider the stability property. For every ² > 0 we have to find a δ > 0 such that
x 0 ∈ B (x̄, δ) implies x(t ) ∈ B (x̄, ²) for all t > 0. We construct a series of inclusions, see Fig. B.5.
Because Ω is a neighborhood of x̄, there exists an ²1 > 0 such that B (x̄, ²1 ) ⊂ Ω. Without
loss of generality we can take it so small that ²1 ≤ ². Because V (x) is continuous on Ω and
because the boundary of B (x̄, ²1 ) is a compact set, V (x) has a minimum on the boundary of
B (x̄, ²1 ). We call this minimum α. Now define
This set Ω1 is open because V (x) is continuous. It is contained in B (x̄, ²1 ). Now x̄ is an ele-
ment of Ω1 because V (x̄) = 0. So, by continuity of V (x), there exists a δ such that B (x̄, δ) ⊂ Ω1 .
We prove that this δ satisfies the requirements: if x 0 ∈ B (x̄, δ), we find because V̇ (·) is nega-
tive semi-definite that V (x(t ; x 0 )) ≤ V (x 0 ) < α for all t ≥ 0. This means that it is impossible
that x(t ; x 0 ), with initial condition in B (x̄, δ), reaches the boundary of B (x̄, ²1 ) because on this
boundary we have, by definition, that V (x) ≥ α. So kx(t ; x 0 ) − x̄k < ² for all time and the sys-
tem, thus, is stable.
Next we prove that the stronger inequality V̇ (x) < 0 ∀x ∈ Ω\{x̄} assures asymptotic stabil-
ity. Specifically we prove that for every x 0 ∈ B (x̄, δ) the solution x(t ; x 0 ) → x̄ as t → ∞. First
note that, because of stability, the orbit x(t ; x 0 ) remains within the bounded set B (x̄, ²1 ) for
all time. Now, to obtain a contradiction, assume that x(t ; x 0 ) does not converge to x̄. This
implies that there is a µ > 0 and increasing time instances t k with t k → ∞ such that
where m is chosen such that t k j + t < t k j +m . Now in the limit j → ∞ the above inequality
becomes
V (x ∞ ) ≥ V (x(t ; x ∞ )) ≥ V (x ∞ ).
(Let us be precise here: since the differential equation is locally Lipschitz we have, by
Thm. B.1.3, that x(t ; x 0 ) depends continuously on x 0 . For that reason we are allowed to say
that lim j →∞ x(t j + t ; x 0 ) = lim j →∞ x(t ; x(t j )) = x(t ; x ∞ ).) Hence V (x(t ; x ∞ )) = V (x ∞ ) for all t .
In particular we see that V (x(t ; x ∞ )) is constant. But that would mean that V̇ (x ∞ ) = 0 and this
violates the fact that V̇ (·) is negative definite and x ∞ 6= x̄. Therefore the assumption that x(t )
does not converge to x̄ is wrong. The system is asymptotically stable. ■
F IGURE B.6: Graph of (1 − x²)/(1 + x²)
ẋ(t) = (1 − x^2(t))/(1 + x^2(t))
has two equilibria, x̄ = ±1, see Fig. B.6. For equilibrium x̄ = 1 we propose the candidate Lya-
punov function
V (x) = (x − 1)2 .
V̇(x) = ∂V(x)/∂x · f(x) = 2(x − 1) (1 − x^2)/(1 + x^2) = −2 (1 − x)^2(1 + x)/(x^2 + 1) ≤ 0    ∀x ∈ (−1, ∞).
Actually V̇ (x) < 0 for all x ∈ (−1, ∞) \ {1} so it is in fact a strong Lyapunov function on (−1, ∞)
and hence the equilibrium x̄ = 1 is asymptotically stable.
The other equilibrium, x̄ = −1, is unstable. ä
F IGURE B.7: Left: pendulum. Right: level sets of its mechanical energy V (x). See Example B.3.4
Example B.3.4 (pendulum). The standard equation of motion of a pendulum without damp-
ing is
ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t))    (B.6)
This way V (x) on Ω has a unique minimum at equilibrium x̄ = (0, 0). Hence V (x) is positive
definite relative to this x̄ for this Ω. Clearly V (x) is also continuously differentiable and V̇ (x)
equals
V̇(x) = ∂V(x)/∂x^T f(x)
     = ∂V(x)/∂x_1 · f_1(x) + ∂V(x)/∂x_2 · f_2(x)
     = M g ℓ sin(x_1) x_2 − M ℓ^2 x_2 (g/ℓ) sin(x_1) = 0.
Apparently the mechanical energy is constant over time. Therefore using Theorem B.3.2 we
may draw the conclusion that the system is stable, but not necessarily asymptotically stable.
The fact that V (x(t )) is constant actually implies it is not asymptotically stable. Indeed if we
start at a nonzero state x 0 ∈ Ω – so with V (x 0 ) > 0 – then V (x(t )) = V (x 0 ) for all time and x(t )
thus does not converge to (0, 0). Figure B.7(bottom) indicates level sets {(x 1 , x 2 )|V (x 1 , x 2 ) = c}
of the mechanical energy in the phase plane for several levels c > 0. Solutions x(t ) remain
within its level set. ä
For strong Lyapunov functions, Thm. B.3.2 states that x(t; x_0) → x̄ for initial states x_0 that
are close enough to the equilibrium. At first sight it seems reasonable to expect that the “big-
ger” the Ω the “bigger” the region of attraction. Alas. As demonstrated in Exercise B.3, having
a strong Lyapunov function on the entire state space Ω = Rn does not imply that x(t ; x 0 ) → x̄
for all initial conditions x 0 ∈ Rn . The question that thus arises is: what is the region of attrac-
tion of the equilibrium x̄ in case it is asymptotically stable, and under which conditions is this
region of attraction the entire state space Rn ?
The proof of Theorem B.3.2 gives some insight into the region of attraction. In fact, it
follows that the region of attraction of x̄ includes the largest sphere about x̄ that is contained
in Ω1 := {x ∈ B (x̄, ²) | V (x) < α}, see Fig. B.5. We use this observation to formulate an extra
condition on V (·) that guarantees global asymptotic stability.
Theorem B.3.5 (Global asymptotic stability). Suppose all conditions of Thm. B.3.2 are met
with Ω = R^n. If V : R^n → R is a strong Lyapunov function with the additional property that
V(x) → ∞ as ‖x‖ → ∞,    (B.7)
then the system is globally asymptotically stable. (Property (B.7) is known as radial unboundedness.)
Proof. The proof of Thm. B.3.2 shows that x(t ) → x̄ whenever x 0 ∈ B (x̄, δ) where δ is as in-
dicated in Fig. B.5. Remains to show that any x 0 is in this ball B (x̄, δ), that is, that δ can be
chosen arbitrarily large. We will construct the various regions of Fig. B.5 starting with the
smallest and step-by-step working towards the biggest.
Take an arbitrary x_0 ∈ R^n and let δ := 2‖x̄ − x_0‖ and α = sup_{‖x−x̄‖<δ} V(x). This α is finite.
Next let Ω_1 = {x | V(x) < α}. This set is bounded because V(x) is radially unbounded. (This is the reason we require radial unboundedness.) By construction we have x_0 ∈ B(x̄, δ) ⊂ Ω_1. Therefore ²_1 := sup_{x∈Ω_1} ‖x − x̄‖ is finite. For every ² > ²_1 the conditions of Thm. B.3.2 are met
and since x 0 ∈ B (x̄, δ) the proof of Thm. B.3.2 says that x(t ) → x̄ as t → ∞. This works for
every x 0 so the system is globally attractive. Together with stability this means it is globally
asymptotically stable. ■
F IGURE B.8: Phase portrait of the system of Example B.3.6. The origin is globally asymptotically
stable
ẋ 1 (t ) = −x 1 (t ) + x 22 (t )
(B.8)
ẋ 2 (t ) = −x 2 (t )x 1 (t ) − x 2 (t ).
Clearly the origin (0, 0) is an equilibrium of this system. We choose
V (x) = x 12 + x 22 .
This V (·) is radially unbounded and it is a strong Lyapunov function on R2 because it is posi-
tive definite and continuously differentiable and
Since V (·) is radially unbounded the equilibrium (0, 0) is globally asymptotically stable. This
also implies that (0, 0) is the only equilibrium. Its phase portrait is shown in Fig. B.8. ä
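The claim that this V is a strong Lyapunov function for (B.8) is easily verified symbolically; a short SymPy sketch:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = sp.Matrix([-x1 + x2**2, -x2*x1 - x2])    # the vector field of (B.8)
    V = x1**2 + x2**2

    # Vdot = (dV/dx^T) f(x), as in (B.5)
    Vdot = sp.Matrix([V]).jacobian([x1, x2]) * f
    print(sp.simplify(Vdot[0]))                  # -2*x1**2 - 2*x2**2, negative definite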
Powerful as the theory may be, it does not really tell us how to find a Lyapunov function,
assuming one exists. Systematic design of Lyapunov functions is hard, but it does work for
linear-time invariant systems, as discussed in Section B.5.
Example B.4.1 (Pendulum with friction). The equations of motion of a pendulum subject to
damping are
ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t)) − c x_2(t)    (B.9)
where x 1 (t ) is the angular displacement, x 2 (t ) is the angular velocity and c a positive friction
coefficient. The time-derivative of the mechanical energy V(x) = ½ M ℓ^2 x_2^2 + M g ℓ [1 − cos(x_1)] is
V̇(x) = M g ℓ sin(x_1) x_2 − M ℓ^2 x_2 (g/ℓ) sin(x_1) − c M ℓ^2 x_2^2 = −c M ℓ^2 x_2^2 ≤ 0.
The mechanical energy decreases everywhere except if the angular velocity x 2 is zero. Using
Theorem B.3.2 we may only draw the conclusion that the system is stable but not that it is
asymptotically stable because V̇ (x) is zero at other points than the equilibrium (it is zero at
any x = (x 1 , 0)). However from physical considerations we feel that (0, 0) is an asymptotically
stable equilibrium nonetheless. How to prove it? ä
In the above example we would still like to infer asymptotically stability. If we were to
use the theory from the previous section, we would have to find a new Lyapunov function
(different from the mechanical energy), but this is not an easy task. In this section we dis-
cuss a method that allows to prove asymptotic stability without us having to construct a new
Lyapunov function.
From the above pendulum example one might be tempted to conclude that asymptotic
stability follows as long as V (x) decreases “almost everywhere” in state space. That is not
necessarily the case as the following basic example demonstrates.
F IGURE B.9: Simple system (Example B.4.2)
ẋ 1 (t ) = 0
ẋ 2 (t ) = −x 2 (t ).
Clearly x 1 (t ) is constant and x 2 (t ) converges exponentially fast to zero (see the vector field of
Fig. B.9). Now
V (x) = x 12 + x 22
V̇ (x) = 2x 1 ẋ 1 + 2x 2 ẋ 2 = −2x 22 ≤ 0.
The set of states x(t ) where V̇ (x(t )) = 0 is where x 2 = 0 (i.e. the x 1 -axis) and everywhere else
in the plane we have V̇ (x(t )) < 0. In that sense V̇ (x) is strictly negative “almost everywhere”.
The origin is however not asymptotically stable because every point (x̄ 1 , 0) on the x 1 -axis is an
equilibrium so no matter how small we take δ > 0, there are always initial states x 0 = (δ/2, 0)
whose solution x(t ) is constant and so does not converge to (0, 0). ä
We set up a generalized Lyapunov theory that allows to prove that the hanging position in
the pendulum-with-friction example (Example B.4.1) is indeed asymptotically stable and that
in Example B.4.2 all solutions converge to the x 1 -axis. It requires a bit of terminology.
The orbit of x 0 is just the set of states that x(t ; x 0 ) traces out as t varies over all t ≥ 0.
Definition B.4.4 (Invariant set). A set G ⊆ Rn is called a (forward) invariant set for (B.2) if
every solution x(t ; x 0 ) of (B.2) with initial condition x 0 in G , is contained in G for all t > 0. ä
So once the state is in an invariant set it never leaves it. Every orbit is an invariant set.
Example B.4.5. The x 1 -axis is an invariant set for the system of Example B.4.2. In fact every
element x = (x 1 , 0) of this axis is an invariant set because they all are equilibria. The general
solution is x(t ) = (x 10 , x 20 e−t ). This shows that for instance also the x 2 -axis {(0, x 2 ) : x 2 ∈ R} is
an invariant set. ä
F IGURE B.10: Phase portrait (Example B.4.6)
The union of two invariant sets is itself an invariant set. In fact, the union of an arbitrary
number (finite, infinite, countable, uncountable) of invariant sets is invariant. Also, realize
that every equilibrium is an invariant set.
Example B.4.6 (Rotation invariant phase portrait). The phase portrait of Fig. B.10 is that of
ẋ_1(t) = x_2(t) + x_1(t) [1 − x_1^2(t) − x_2^2(t)]
ẋ_2(t) = −x_1(t) + x_2(t) [1 − x_1^2(t) − x_2^2(t)].    (B.10)
Inspired by the rotation-invariant phase portrait (see Fig. B.10) we analyze first how the
squared radius
r (t ) := x 12 (t ) + x 22 (t )
If r (0) = 1 then r (t ) is always equal to one, so the unit circle is an invariant set. Furthermore,
Eqn. (B.11) shows that if 0 ≤ r (0) < 1, then 0 ≤ r (t ) < 1 for all time. Hence the open unit disc is
also invariant. Using similar arguments, we find that also the complement of the unit disc is
invariant. ä
In this example the state does not always converge to a single element, but to a set (e.g.
the unit circle in the previous example). We use dist(x, G ) to denote the (minimal) distance
between a point x ∈ Rn and a set G ⊂ Rn . We define
dist(x, G ) := inf kx − g k
g ∈G
and we say that a function x(t ) converges to a set G if limt →∞ dist(x(t ), G ) = 0. The extension
of Lyapunov can now be proved.
Theorem B.4.7 (LaSalle’s Invariance Principle). Let x̄ be an equilibrium of the locally
Lipschitz-continuous system ẋ(t ) = f (x(t )) and suppose that V (x) is a Lyapunov function on
some neighborhood Ω of x̄.
This set Ω contains a (nonempty) closed and bounded invariant neighborhood K of x̄,
and for every x 0 ∈ K the solution x(t ) as t → ∞ converges to the subset
G := {x ∗ ∈ K | V̇ (x(t ; x ∗ )) = 0 ∀t ≥ 0}.
Proof. The construction of K is very similar to that of Ω1 in the proof of Thm. B.3.2. Since Ω
is a neighborhood of x̄ there is, by definition, a small enough ball B (x̄, ²) completely contained
in Ω. Let α = minkx−x̄k=² V (x). This α is larger than zero. Then K := {x ∈ B (x̄, ²)|V (x) ≤ α/2}
does the job. Indeed it is bounded, it is closed and since V̇ (x) ≤ 0 it is also invariant. And,
finally, it is a neighborhood of x̄.
The set G is nonempty (it contains x̄). Let x ∗ be an element of G . Then by invariance of
K for every t > 0 the element y := x(t ; x ∗ ) is in K . Also since V̇ (x(s; y)) = V̇ (x(t + s, x ∗ )) = 0
this orbit is in G . Hence G is invariant.
Next let x 0 ∈ K . Since K is invariant, the entire orbit x(t ; x 0 ) is in K for all time.
Now suppose, to obtain a contradiction, that x(t ) does not converge to G . Then, as x(t ) is
bounded, there is a sequence t n of time with limn→∞ t n = ∞ for which x(t n , x 0 ) converges
to some x ∞ 6∈ G . Notice that x ∞ is in K because K is closed. We claim that V (x(t ; x ∞ )) is
constant as a function of time. To see this we need the inequality
(The first inequality holds because V̇ (x) ≤ 0 and the second inequality follows from V̇ (x) ≤
0 combined with the fact that t n + t < t n+k for some large enough k, so that V (x(t n + t )) ≥
V (x(t n+k )) ≥ V (x ∞ ).) Taking the limit n → ∞ turns (B.12) into
V (x ∞ ) ≥ V (x(t ; x ∞ )) ≥ V (x ∞ ).
Hence V (x(t ; x ∞ )) is constant for all time, that is V̇ (x(t ; x ∞ )) = 0. But then x ∞ ∈ G (by defini-
tion of G ) which is a contradiction. Therefore the assumption that x(t ) does not converge to
G is wrong. ■
The proof also provides an explicit description of the set K but if we only want to estab-
lish asymptotic stability then we can normally avoid this description. Its existence is enough.
ẋ 1 (t ) = x 23 (t )
(B.13)
ẋ 2 (t ) = −x 13 (t ) − x 2 (t ).
Clearly the origin (0, 0) is an equilibrium. For this equilibrium, we suggest the Lyapunov func-
tion
V (x) = x 14 + x 24 .
This implies that the origin is stable, but not necessarily asymptotically stable. To prove
asymptotic stability we use Thm. B.4.7. This theorem says that a bounded, closed invariant
neighborhood K of (0, 0) exists, but we need not worry about its precise form. The set of in-
terest is G . It contains those initial states x ∗ ∈ K whose solution x(t ; x ∗ ) satisfies the system
equations (B.13) and at the same time is such that V̇ (x(t )) = 0 for all time, but for our example
the latter means
x 2 (t ) = 0 ∀t .
ẋ 1 (t ) = 0
0 = −x 13 (t ) − 0, ∀t .
G = {(0, 0)}.
LaSalle’s Invariance Principle proves that for every x 0 ∈ K the x(t ) converges to (0, 0) and that
the system, hence, is asymptotically stable. ä
Example B.4.9 (Example B.4.1 continued). Consider the pendulum system from Exam-
ple B.4.1,
ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t)) − c x_2(t).    (B.14)
We found that the mechanical energy V (x) is a Lyapunov function on some small enough
neighborhood Ω of the hanging equilibrium x̄ = (0, 0) and we also found that
V̇(x) = −c M ℓ^2 x_2^2.
The equality V̇ (x(t )) = 0 hence holds for all time iff x 2 (t ) = 0 for all time and the LaSalle set G
therefore is
G = {x ∗ ∈ K | x 1∗ = kπ, k ∈ Z, x 2∗ = 0}.
This set contains at most two physically different solutions: the hanging downwards solution x_* = [0; 0] and the standing upwards solution x_* = [π; 0]. To rule out the upwards solution it suffices to take the neighborhood Ω of x̄ = [0; 0] so small that [π; 0] ∉ Ω. For example
LaSalle’s Invariance Principle now guarantees the existence of an invariant, closed, bounded
neighborhood K of x̄ in Ω. Clearly this K does not contain [π; 0] either, so then
G = {[0; 0]}
Although not strictly needed, it may be interesting to know that we can take K equal
to the set of states close enough to (x 1 , x 2 ) = (0, 0) and whose energy is strictly less than the
energy of the upwards position, for example,
K = {x ∈ R^2 | −π < x_1 < π, V(x) ≤ 0.9 V([π; 0])}.
Since the energy does not increase over time it is immediate that this set is invariant. It is also
closed and bounded, and it is a neighborhood of (0, 0). ä
ẋ 1 (t ) = 0,
ẋ 2 (t ) = −x 2 (t )
with equilibrium x̄ = (0, 0) and Lyapunov function V (x) = x 12 + x 22 . In Example B.4.2 we found
that V (x) is indeed a Lyapunov function and that
V̇ (x) = −2x 22 ≤ 0.
ẋ 1 (t ) = 0.
G = {(x 1 , x 2 ) ∈ K | x 1 = c, x 2 = 0}.
This is the x 1 -axis. Now LaSalle’s Invariance Principle says that all states converge to the x 1 -
axis. For K we can take for instance K = {x ∈ R2 |V (x) ≤ 10000}. ä
Suppose we have to pay L(x) ≥ 0 per unit time when we are at state x. As time progresses we move as dictated by the differen-
tial equation and so the cost L(x(t )) typically changes with time. The cost-to-go V (x 0 ) is now
defined as the total payment over the infinite future if we start at x 0 , that is, it is the integral
of L(x(t )) over positive time,
V(x_0) := ∫_0^∞ L(x(τ)) dτ,    x(0) = x_0.    (B.15)
If L(x(t)) decreases quickly enough as we approach the equilibrium x̄ then the cost-to-go may
be well defined (finite) and possibly it is going to be continuously differentiable in x 0 as well.
These are technical considerations and they might be hard to verify. The interesting property
of the cost-to-go V (x(t )) is that it decays as t increases. In fact
whenever V (x) is convergent. To see this split the cost-to-go into an integral over the first h
units of time and an integral over the time beyond h,
V(x(t)) = ∫_t^{t+h} L(x(τ)) dτ + ∫_{t+h}^∞ L(x(τ)) dτ = ∫_t^{t+h} L(x(τ)) dτ + V(x(t+h)).
Therefore
V̇(x(t)) = lim_{h→0} ( V(x(t+h)) − V(x(t)) ) / h = lim_{h→0} ( −∫_t^{t+h} L(x(τ)) dτ ) / h = −L(x(t))    (B.17)
if L(x) is continuous. An interpretation of the equality is that the current cost-to-go minus
the cost-to-go from tomorrow onwards, is what we pay today. The function L(x) is called the
running cost. In physical applications L(x) is often the dissipated power and then V (x) is the
total dissipated energy.
As mentioned earlier, the only obstacle is that the integral (B.15) has to be well defined
and continuously differentiable in x 0 . If the system dynamics is linear of the form
ẋ(t ) = Ax(t )
then these obstacles can be overcome and we end up with a very useful result. It is a classic
theorem in Systems Theory. In this result we take the running cost to be quadratic in x,
L(x) = x T Qx
Theorem B.5.1 (Lyapunov equation). Let A ∈ Rn×n and consider ẋ(t ) = Ax(t ) with equilib-
rium x̄ = 0 ∈ Rn . Suppose Q ∈ Rn×n is positive definite and let
V(x_0) := ∫_0^∞ x^T(t) Q x(t) dt    (B.18)
3. V (x) defined in (B.18) exists for every x ∈ Rn and it is a strong Lyapunov function for
this system. In fact V (x) is then quadratic, V (x) = x T P x, with P ∈ Rn×n the well defined
positive definite matrix
P := ∫_0^∞ e^{A^T t} Q e^{At} dt.    (B.19)
A^T P + P A = −Q    (B.20)
1. =⇒ 2. Trivial.
2. =⇒ 3. The solution of ẋ(t ) = Ax(t ) is x(t ) = e At x 0 . By asymptotic stability the entire tran-
sition matrix converges to zero limt →∞ e At = 0 ∈ Rn×n . Now
V(x_0) = ∫_0^∞ (e^{At} x_0)^T Q (e^{At} x_0) dt = ∫_0^∞ x_0^T ( e^{A^T t} Q e^{At} ) x_0 dt = x_0^T P x_0
for P := ∫_0^∞ e^{A^T t} Q e^{At} dt. This P is well defined because e^{At} converges to zero expo-
nentially fast. This P is positive definite because it is the integral of a positive definite
matrix.
So V (x 0 ) is well defined and quadratic and, hence, continuously differentiable. It has
a unique minimum at x 0 = 0 and, as we showed earlier, V̇ (x) = −L(x) := −x T Qx ≤ 0.
Hence V (x) is a Lyapunov function, in fact strong Lyapunov function because −x T Qx =
0 iff x = 0.
3. =⇒ 4. Take P defined in (B.19). On the one hand we have V̇ (x) = −L(x) = −x T Qx and on
the other hand we have
V̇(x) = (d/dt) x^T P x = ẋ^T P x + x^T P ẋ = x^T (A^T P + P A) x.
x T (A T P + P A +Q)x = 0 ∀x ∈ Rn .
4. =⇒ 1. Then V(x) := x^T P x satisfies V̇(x) = (d/dt) x^T P x = ẋ^T P x + x^T P ẋ = x^T(A^T P + P A)x = −x^T Q x, so it is a strong Lyapunov function with V̇(x) < 0 for all x ≠ 0. It is radially unbounded hence the equilibrium is globally asymptotically stable (Thm. B.3.5.)
ẋ(t ) = −2x(t )
−2p − 2p = −q = −1
has a unique solution p = 1/4 and it is positive. Note that Thm. B.5.1 says that we may take
any q > 0 that we like. Indeed, whatever positive q > 0 we take, we have that the solution
p = q/4 of the Lyapunov equation is unique and is positive. ä
Notice that (B.20) is a linear equation in the entries of P and is therefore easily solved (it
requires a finite number of operations). Combined with the fact that positive definiteness of
a matrix is a finite test (Appendix A.1) allows to conclude that stability of ẋ = Ax can be tested
in a finite number of steps.
By symmetry the upper-right and lower-left entries are identical so the above equation are
effectively three equations in the three unknowns α, β, γ:
−2α = −1,
2α − 2β = 0,
4β − 2γ = −1.
This matrix is positive definite because P_11 = 1/2 > 0 and det(P) = 1/2 > 0 (see Appendix A.1). So
the differential equation with equilibrium x̄ = (0, 0) is (globally) asymptotically stable. ä
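Since (B.20) is linear in the entries of P, it is also solved directly by standard software. A minimal SciPy sketch; the matrix A below is an assumption on my part (it reproduces the three equations in α, β, γ above with Q = I, but the example's own A is stated earlier):

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov

    # Assumed data: with Q = I this A gives exactly the equations
    # -2*alpha = -1, 2*alpha - 2*beta = 0, 4*beta - 2*gamma = -1.
    A = np.array([[-1.0, 2.0], [0.0, -1.0]])
    Q = np.eye(2)

    # solve_continuous_lyapunov(M, R) solves M X + X M^T = R, so with
    # M = A^T and R = -Q we obtain A^T P + P A = -Q.
    P = solve_continuous_lyapunov(A.T, -Q)
    print(P)                            # [[0.5 0.5] [0.5 1.5]]
    print(np.linalg.eigvalsh(P) > 0)    # both eigenvalues positive: P > 0, so A is Hurwitz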
with A ∈ Rn×n some matrix and o : Rn → Rn some “little-o” function which means having the
property that
lim_{δ_x → 0} ‖o(δ_x)‖ / ‖δ_x‖ = 0.    (B.23)
We think of little-o functions as functions that are “extremely small” around the origin.
To analyze the behavior of the state x(t ) relative to an equilibrium x̄ it makes sense to
define δx (t ) as the difference between state and equilibrium,
δx (t ) := x(t ) − x̄.
The linearized system of ẋ(t ) = f (x(t )) at equilibrium x̄ is now simply defined as the system
in which the little-o term o(δx (t )) is deleted:
δ̇x (t ) = Aδx (t ).
F IGURE B.11: Nonlinear f (x) (left) and its linear approximation Aδx (right)
x̄ = 0.
The idea of linearization is that around x̄ the function f (x) is almost indistinguishable from
its tangent with slope
A = d f(x̄)/dx = −2 cos(0) = −2
(see Fig. B.11) and so the solutions of (B.25) will probably be quite similar to x(t ) = x̄ +δx (t ) =
δx (t ) with δx (t ) the solution of the linear system
provided that δx (t ) is small. The above linear system (B.26) is known as the linearized system
of (B.25) at equilibrium x̄ = 0. ä
Lyapunov’s first method, presented next, roughly speaking says that the nonlinear system
and the linearized system have the same asymptotic stability properties. The only exception
to this rule is if the eigenvalue of largest real part is on the imaginary axis. The proof of this
result relies on the fact that every asymptotically stable linear system has a Lyapunov function
(its cost-to-go) which then turns out to be a Lyapunov function for the nonlinear system as
well:
1. If all eigenvalues of the Jabobian (B.24) have strictly negative real part, then x̄ is an
asymptotically stable equilibrium of the nonlinear system.
2. If there is an eigenvalue of the Jacobian (B.24) with strictly positive real part, then x̄ is
an unstable equilibrium of the nonlinear system.
Proof. (First realize that continuous differentiability of f (·) implies Lipschitz continuity and
so Lyapunov theory might be applicable.) Write f (x) as in (B.22). Without loss of generality
we assume that x̄ = 0, and we define A as in (B.24).
A T P + P A = −I
and that V (x) = x T P x is a strong Lyapunov function for the linear system δ̇x (t ) = Aδx (t ).
We prove that this V (x) is also a strong Lyapunov function for ẋ(t ) = f (x(t )) on some
neighborhood Ω of x̄ = 0. Clearly this V(x) is positive definite and continuously differentiable. We have that
V̇ (x) = ẋ T P x + x T P ẋ
= f (x)T P x + x T P f (x)
= [Ax + o(x)]T P x + x T P [Ax + o(x)]
= x T (A T P + P A)x + o(x)T P x + x T P o(x)
= −x T x + 2 o(x)T P x
= −kxk2 + 2 o(x)T P x.
The term 2 o(x)T P x we recognize as the standard inner product of 2 o(x) and P x, so by
the Cauchy-Schwarz inequality we can bound it from above with
2. See (Khalil, 1996, Thm. 3.7).
These two cases of Theorem B.6.2 cover all possible eigenvalue configurations, except
when some eigenvalues have zero real part and none have positive real part, see Exercise B.5.
ẋ 1 (t ) = x 1 (t ) + x 1 (t )x 22 (t )
ẋ 2 (t ) = −x 2 (t ) + x 12 (t )x 2 (t ).
The system has equilibrium x̄ := [0; 0] and the Jacobian at that equilibrium equals
A = ∂f(x̄)/∂x^T = [ 1 + x_2^2    2x_1x_2 ;  2x_1x_2    −1 + x_1^2 ] |_{x=(0,0)} = [ 1  0 ;  0  −1 ].
Clearly it has eigenvalues ±1. In particular it has a positive eigenvalue. Lyapunov’s first
method hence proves that the system at this equilibrium is unstable. ä
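The Jacobian and its eigenvalues can of course also be computed symbolically; a small SymPy sketch for the system of Example B.6.3:

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2')
    f = sp.Matrix([x1 + x1*x2**2, -x2 + x1**2*x2])

    J = f.jacobian([x1, x2])              # Jacobian df/dx^T
    A = J.subs({x1: 0, x2: 0})            # evaluated at the equilibrium (0, 0)
    print(A)                              # Matrix([[1, 0], [0, -1]])
    print(A.eigenvals())                  # {1: 1, -1: 1}: a positive eigenvalue, so unstable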
B.7 Exercises
B.1 (a) Prove that if V (x) is a Lyapunov function for the system (B.2) with equilibrium
point x̄, then V̇ (x̄) = 0.
(b) Prove that if a system of the form (B.2) has more than one equilibrium point, then
none of these equilibrium points is globally asymptotically stable.
(c) Consider the linear system
ẋ(t ) = Ax(t ),
with A an n × n matrix. Prove that this system either has exactly one equilibrium,
or infinitely many equilibria.
B.2 Investigate the stability of the origin for the following two systems (that is, check all six
stability types mentioned in Definition B.2.2). Use a suitable Lyapunov function.
(a)
ẋ 1 (t ) = −x 13 (t ) − x 22 (t )
ẋ 2 (t ) = x 1 (t )x 2 (t ) − x 23 (t ).
ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = −x 13 (t ).
[Hint: try V(x_1, x_2) = x_1^α + c x_2^β and then determine suitable α, β, c.]
B.3 This exercise is based on an exercise in Khalil (1996) who, in turn, took it from Hahn
(1967) and it appears that Hahn was inspired by an example from a paper by Barbashin
and Krasovskı̆ (1952). Consider the system
ẋ_1(t) = ( −x_1(t) + x_2(t)(1 + x_1^2(t))^2 ) / (1 + x_1^2(t))^2
ẋ_2(t) = ( −x_1(t) − x_2(t) ) / (1 + x_1^2(t))^2
V(x) = x_1^2/(1 + x_1^2) + x_2^2.
x1 x2 > 1
0 5
F IGURE B.12: A phase portrait of the system of Exercise B.3. The red dashed lines are level sets
of V (x). The boundary of the shaded region x 1 x 2 > 1 is where x 2 = 1/x 1
B.4 Adaptive Control. The following problem from adaptive control illustrates an extension
of the theory of Lyapunov functions to functions that are, strictly speaking, no longer
Lyapunov functions. This problem concerns the stabilization of a system of which the
parameters are not (completely) known. Consider the following scalar system.
where a is a constant and u(t ) is an input that we should choose to steer the state to
zero. If we know a then u(t ) = −kx(t ), with k > a, would solve the problem. However,
we assume that a is unknown but that we can measure x(t ). Contemplate the following
dynamic state feedback input
The idea is that k(t ) increases until it has stabilized the system, so until x(t ) is equal to
zero.
(a) Write (B.27)–(B.28) as one system with state (x, k) and determine all equilibrium
points.
(b) Consider the function V (x, k) := x 2 + (k − a)2 . Prove that V̇ (x, k) = 0 for all x, k. For
which equilibrium point is this a Lyapunov function?
(c) Prove, using the above, that k(t ) is bounded.
(d) Prove, using (B.28), that k(t ) converges as t → ∞.
(e) Prove that x(t ; x 0 ) converges, and prove specifically that limt →∞ x(t ; x 0 ) = 0.
(f) Determine limt →∞ k(t ).
ẋ(t ) = ax 3 (t )
with a ∈ R.
(a) Prove that the linearization of this system about its equilibrium point is indepen-
dent of a.
(b) Sketch the graph of ax 3 as a function of x and use it to argue that the equilibrium
is
• For which α is the equilibrium asymptotically stable?
• For which α is the equilibrium stable?
• For which α is the equilibrium unstable?
ẋ 1 (t ) = −x 15 (t ) − x 2 (t ),
ẋ 2 (t ) = x 1 (t ) − 2x 23 (t ).
B.7 Suppose that
ẋ 1 (t ) = x 2 (t ) − x 1 (t )
ẋ 2 (t ) = −x 13 (t )
This equation occurs in the study of vacuum tubes and then ² is positive. However, in this exercise we take ² < 0.
(a) Rewrite this equation in the standard form (B.2) with x 1 (t ) := y(t ) and x 2 (t ) := ẏ(t ).
(b) Use linearization to show that the origin (x 1 , x 2 ) = (0, 0) is an asymptotically stable
equilibrium (recall that ² < 0).
(c) Determine a neighborhood Ω of the origin for which V (x 1 , x 2 ) = x 12 + x 22 is a Lya-
punov function for x̄ = (0, 0).
(d) Let V (x 1 , x 2 ) and Ω be as in the previous part. What stability properties can be
concluded from LaSalle’s invariance principle?
B.9 The well-known Lotka-Volterra model describes the interaction between a population
of predators (with size x 1 ) and preys (with size x 2 ), and is given by the equations
ẋ 1 (t ) = −ax 1 (t ) + bx 1 (t )x 2 (t ), x 1 (0) ≥ 0
(B.29)
ẋ 2 (t ) = cx 2 (t ) − d x 1 (t )x 2 (t ) x 2 (0) ≥ 0.
The first term on the right-hand side of the first equation shows that the predators will
become extinct without food, while the second term shows that the growth of their pop-
ulation is proportional to the size of the population of prey. Likewise, the term on the
right-hand side of the second equation shows that without predators, the population of
prey will increase, and that its decrease is proportional to the size of the population of
predators. For convenience we choose a = b = c = d = 1.
(a) Show that, apart from (0, 0), the system has a second equilibrium point.
(b) Investigate the stability of both equilibrium points using linearization.
(c) Investigate the stability of the nonzero equilibrium point using the function
V (x 1 , x 2 ) = x 1 + x 2 − ln(x 1 x 2 ) − 2.
B.10 The equations of motion of the pendulum with damping in state form is
ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t)) − (k/ℓ) x_2(t).    (B.30)
where x_1 is the angular displacement, and x_2 is the angular velocity, g is the gravitational constant, ℓ is the length of the pendulum and k is the damping constant. All constants are positive.
(a) Prove, using Theorem B.6.2, that the origin is an asymptotically stable equilibrium
point.
(b) In Example B.4.9 we verified asymptotic stability using LaSalle’s invariance prin-
ciple. Here we want to construct a strong Lyapunov function to show asymptotic
stability using Theorem B.3.2: determine a symmetric matrix P > 0 such that the
function
V (x) := x T P x + g [1 − cos(x 1 )]
ẋ 1 (t ) = −2x 1 (t ) [x 1 (t ) − 1] [2x 1 (t ) − 1]
(B.31)
ẋ 2 (t ) = −2x 2 (t ).
ẋ 1 (t ) = x 1 (t )(1 − x 22 (t ))
ẋ 2 (t ) = x 2 (t )(1 − x 12 (t )).
For each of the equilibrium points determine the linearization and the nature of stabil-
ity of the linearization.
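Linearizations such as these can be cross-checked symbolically. Below is a minimal sketch,
not part of the exercise, assuming Python with the SymPy package; it treats the second of the
two systems above.

    import sympy as sp

    x1, x2 = sp.symbols('x1 x2', real=True)
    f = sp.Matrix([x1*(1 - x2**2), x2*(1 - x1**2)])      # right-hand side of the second system

    equilibria = sp.solve(list(f), [x1, x2], dict=True)  # solutions of f(x) = 0
    J = f.jacobian([x1, x2])                             # linearization matrix

    for eq in equilibria:
        # eigenvalues of the linearization decide its nature of stability
        print(eq, J.subs(eq).eigenvals())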
B.13 Have a look at Fig. B.13. The equations of motion of a rigid body spinning around its
center of mass are
I₁ ω̇₁(t) = (I₂ − I₃) ω₂(t) ω₃(t)
I₂ ω̇₂(t) = (I₃ − I₁) ω₃(t) ω₁(t)        (B.32)
I₃ ω̇₃(t) = (I₁ − I₂) ω₁(t) ω₂(t),
where ω(t) := (ω₁(t), ω₂(t), ω₃(t)) is the vector of angular velocities around the three
principal axes of the rigid body and I₁, I₂, I₃ > 0 are the principal moments of inertia.
The kinetic energy (due to rotation) is
½ (I₁ω₁² + I₂ω₂² + I₃ω₃²).
(a) Prove that the origin ω = (0, 0, 0) is a stable equilibrium.
FIGURE B.13: A spinning rigid body with angular velocities ω₁, ω₂, ω₃ around its principal axes.
(b) Assume from now on that the principal moments of inertia are distinct, say I₁ > I₂ > I₃.
(This implies a certain lack of symmetry of the rigid body, e.g. it is not a unit cube,
see Fig. B.13.)
(c) The origin (0, 0, 0) is just one equilibrium. Determine all equilibria and explain
what this implies about the stability properties.
(d) Determine the linearization around each of the equilibria.
(e) Use linearization to prove that steady spinning around the second principal axis
(0, ω̄₂, 0) is unstable if ω̄₂ ≠ 0. (A numerical illustration of this follows the exercise.)
(f) This is a tricky question. Prove that both the kinetic energy
½ (I₁ω₁² + I₂ω₂² + I₃ω₃²)
and the squared length of the angular momentum
I₁²ω₁² + I₂²ω₂² + I₃²ω₃²
are constant over time and use this to prove that steady spinning around the first
and third principal axes is stable, but not asymptotically stable.
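The instability of spinning around the intermediate axis in part (e) can be observed numerically.
This is a minimal sketch, not part of the exercise; it assumes Python with NumPy and SciPy, and
the moments of inertia and initial state are chosen only for illustration.

    import numpy as np
    from scipy.integrate import solve_ivp

    I1, I2, I3 = 3.0, 2.0, 1.0             # illustrative values with I1 > I2 > I3

    def euler_equations(t, w):
        w1, w2, w3 = w
        return [(I2 - I3)/I1 * w2*w3,
                (I3 - I1)/I2 * w3*w1,
                (I1 - I2)/I3 * w1*w2]

    # start close to steady spinning around the second (intermediate) principal axis
    sol = solve_ivp(euler_equations, (0.0, 30.0), [1e-3, 1.0, 1e-3], max_step=0.01)
    print(np.abs(sol.y[0]).max(), np.abs(sol.y[2]).max())   # the small components grow to order 1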
B.14 Consider the system ẋ(t) = f(x(t)) with equilibrium point x̄. Suppose that there exists
a Lyapunov function V(x), defined on a neighborhood Ω of x̄, such that V̇(x) = 0 for all
x ∈ Ω. Prove that x̄ is not an asymptotically stable equilibrium point.
B.15 Let x(t ; x 0 ) be a solution of the differential equation ẋ(t ) = f (x(t )). Prove that the orbit
O (x 0 ) = {x(t ; x 0 ) | t ≥ 0} is an invariant set for ẋ(t ) = f (x(t )).
B.16 A trajectory x(t; x₀) is closed if x(t + s; x₀) = x(t; x₀) for some t and some s > 0. Let x(t; x₀)
be a closed trajectory of (B.2) and let V (x) be a Lyapunov function for this system. Prove
that V̇ (x(t ; x 0 )) = 0.
B.17 In this exercise, we look at variations on the system (B.10) from Example B.4.6. We
investigate the system
ẋ₁(t) = x₂(t) + x₁(t) [γ − x₁²(t) − x₂²(t)]
(B.33)
ẋ₂(t) = −x₁(t) + x₂(t) [γ − x₁²(t) − x₂²(t)]
B.18 (Assumes Appendix A.1.) Determine all α, β ∈ R for which the matrix
P = [ α 0 0 ; 0 1 β ; 0 β 4 ]
is positive definite.
B.19 (Assumes Appendix A.1.) Let the matrices A and Q be given by
A = [ 0 1 ; −2 −3 ],    Q = [ 4 6 ; 6 10 ].
(a) Determine the symmetric matrix P that satisfies the Lyapunov equation
AᵀP + PA = −Q.
(b) Show that P and Q are positive definite and conclude that A is Hurwitz (i.e. that
all its eigenvalues have strictly negative real part).
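Part (a) can be cross-checked numerically. A minimal sketch, not part of the exercise, assuming
Python with NumPy and SciPy:

    import numpy as np
    from scipy.linalg import solve_continuous_lyapunov, eigvalsh

    A = np.array([[0.0, 1.0], [-2.0, -3.0]])
    Q = np.array([[4.0, 6.0], [6.0, 10.0]])

    # solve_continuous_lyapunov(M, C) solves M X + X Mᵀ = C, so take M = Aᵀ and C = −Q
    P = solve_continuous_lyapunov(A.T, -Q)
    print(P)                                 # symmetric solution of AᵀP + PA = −Q
    print(eigvalsh(P), eigvalsh(Q))          # all eigenvalues positive, so P and Q are positive definite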
(a) Use MAPLE or MATLAB to determine the solution of the Lyapunov equation
AᵀP + PA = −I.
ẋ₁(t) = x₁(t) + 2x₂(t)
ẋ₂(t) = −αx₁(t) + (1 − α)x₂(t)
Determine all α for which this differential equation is asymptotically stable.
FIGURE B.14
(a) Determine a diagonal positive definite matrix P of the form P = [ p 0 ; 0 1 ] for which
P A + AᵀP is also diagonal.
(b) Show that xᵀP x is a strong Lyapunov function for this system (with equilibrium
x̄ = 0).
(c) Sketch in Fig. B.14 a couple of level sets {x | xᵀP x = constant} and explain from this
figure why indeed V̇(x(t)) < 0 for all nonzero x(t).
B.23 Notice that the results that we derived in this chapter are valid only for time-invariant
systems ẋ(t) = f(x(t)). For time-varying systems ẋ(t) = f(x(t), t) the story is quite dif-
ferent, even if the system is linear, of the form
ẋ(t) = A(t)x(t).        (B.34)
For linear systems it might sound reasonable to conjecture that the system is asymptoti-
cally stable if for every t all eigenvalues of A(t) have negative real part. In this exercise we
will see that this is wrong. Consider the system (B.34) where
A(t) = A_even if ⌊t⌋ is even,    A(t) = A_odd if ⌊t⌋ is odd,        (B.35)
with
A_even = [ −1 −π/6 ; 3π/2 −1 ],    A_odd = [ −1 −3π/2 ; π/6 −1 ].
Here ⌊t⌋ denotes the floor of t (the largest integer less than or equal to t). The system
hence switches dynamics at every t ∈ Z.
(a) Prove that the eigenvalues of A(t) at each moment in time are −1 ± iπ/2. So in
particular the eigenvalues do not depend on t and their real parts are less than zero
for all time.
(b) Verify that
x(t) = e^{−t} [ cos(πt/2) −(1/3) sin(πt/2) ; 3 sin(πt/2) cos(πt/2) ] x(0),        t ∈ [0, 1],
x(1 + t) = e^{−t} [ cos(πt/2) −3 sin(πt/2) ; (1/3) sin(πt/2) cos(πt/2) ] x(1),    t ∈ [0, 1],
that corresponding formulas hold on every interval [k, k + 1] for all k ∈ Z, and use
this to conclude that the time-varying system (B.34) is not asymptotically stable.
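The conclusion of part (b) can also be observed numerically, by propagating the state over the
unit intervals with matrix exponentials. A minimal sketch, not part of the exercise, assuming
Python with NumPy and SciPy:

    import numpy as np
    from scipy.linalg import expm

    A_even = np.array([[-1.0, -np.pi/6], [3*np.pi/2, -1.0]])
    A_odd  = np.array([[-1.0, -3*np.pi/2], [np.pi/6, -1.0]])

    x = np.array([1.0, 0.0])
    for k in range(10):
        A = A_even if k % 2 == 0 else A_odd   # dynamics switch at every integer time
        x = expm(A) @ x                       # propagate the state over one unit of time
        print(k + 1, np.linalg.norm(x))       # the norm keeps growing, although A_even and A_odd are Hurwitz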
FIGURE B.15: Stable or not? Globally attractive or not? See Exercise B.24.
B.24 This exercise is based on an example from a paper by Ryan and Sontag (2006). Consider
the system ẋ(t) = f(x(t)) with
f(x) = [ −x₁ (1 − 1/‖x‖) − 2x₂ (1 − x₁/‖x‖) ;  −x₂ (1 − 1/‖x‖) + 2x₁ (1 − x₁/‖x‖) ]    if ‖x‖ ≥ 1,
f(x) = [ 2(x₁ − 1) x₂ ;  −(x₁ − 1)² + x₂² ]    if ‖x‖ < 1.
Inside the unit disc f(x) is defined differently than outside the unit disc. Nevertheless,
f(x) is locally Lipschitz, also on the unit circle. Inside the unit circle, the orbits are arcs
(parts of circles) that converge to x = (1, 0), see Fig. B.15. Outside, for ‖x‖ ≥ 1, the system
is easier to comprehend in polar coordinates (x₁, x₂) = (r cos(θ), r sin(θ)) with
r = √(x₁² + x₂²). This gives
ṙ(t) = 1 − r(t)
(B.36)
θ̇(t) = 4 sin²(θ(t)/2) = 2(1 − cos(θ(t))).
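For ‖x‖ ≥ 1 the polar form (B.36) can be verified directly from the definition of f above; this
short check is not part of the original text. With r² = x₁² + x₂² and x₁/r = cos(θ),
r ṙ = x₁ẋ₁ + x₂ẋ₂ = −(x₁² + x₂²)(1 − 1/r) = −r(r − 1),    so ṙ = 1 − r,
r² θ̇ = x₁ẋ₂ − x₂ẋ₁ = 2(x₁² + x₂²)(1 − x₁/r) = 2r²(1 − cos(θ)),    so θ̇ = 2(1 − cos(θ)) = 4 sin²(θ/2).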
B.25 Let A ∈ Rn×n and suppose that A + A T is negative definite. Is the origin a stable equilib-
rium of ẋ(t ) = Ax(t )?
B.27 There is a way to transform time-varying systems ẋ(t) = f(x(t), t) into time-invariant
systems, namely by adding time as an extra state variable.
(b) Use the above and Thm. B.1.3 to prove the following:
Let t 0 ∈ R and x 0 ∈ Rn . If f (x, t ) is Lipschitz continuous at (x 0 , t 0 ) then, for some
δ > 0, the differential equation ẋ(t ) = f (x(t ), t ), x(t 0 ) = x 0 has a unique solution
x(t ; x 0 ) for all t ∈ [t 0 , t 0 + δ).
Appendix C
Bibliography
E.A. Barbashin and N.N. Krasovskiĭ. Ob ustoichivosti dvizheniya v tselom. Dokl. Akad. Nauk
SSSR, 86(3):453–456, 1952. (Russian). English title: "On the stability of motion in the large".
W. Hahn. Stability of Motion, volume 138 of Die Grundlehren der mathematischen Wis-
senschaften. Springer-Verlag, New York, 1967.
H.K. Khalil. Nonlinear Systems. Macmillan Publishing Company, New York, 2nd edition, 1996.
E.P. Ryan and E.D. Sontag. Well-defined steady-state response does not imply CICS. Systems
& Control Letters, 55:707–710, 2006.
E.D. Sontag. Mathematical Control Theory: Deterministic Finite Dimensional Systems (2nd
ed.). Springer-Verlag, Berlin, Heidelberg, 1998. ISBN 0-387-98489-5.
Index
constant, 130
continuity, 130
locally, 130
Lotka-Volterra model, 153
LQ problem, 85
Lyapunov
    first method, 147
    function, 135
    second method, 133
    strong function, 135
Lyapunov function, 135
observability, 124
optimal control, 35
optimal cost-to-go, 66
orbit, 140
output, 124
stationary, 7
tracking problem, 113
unstable, 133
value function, 66, 78
Van der Pol equation, 153
Zermelo, 50