
Calculus of Optimal Control

Gjerrit Meinsma
Arjan van der Schaft

2018
Preface
This book reflects a long history of teaching Optimal Control at the Department of Applied
Mathematics, University of Twente. Main contributors to the original lecture notes are Hans
Zwart, Jan Willem Polderman (University of Twente) and Henk Nijmeijer (Eindhoven Univer-
sity of Technology). In 2006–2008 Arjan van der Schaft (University of Groningen) made a num-
ber of substantial revisions and modifications. In the period 2010–2018 Gjerrit Meinsma (Uni-
versity of Twente) rewrote most of the material, included new material and added a number
of examples, illustrations and alternative proofs and more.
The book aims at final year BSc students and MSc students. In optimal control we frequently switch from constants x to functions x(·), and this can be confusing upon first reading. For this reason we emphasise the difference by including brackets (·) whenever a function is meant.

Contents

1 Calculus of Variations 1
1.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.2 Euler-Lagrange equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.3 Beltrami identity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.4 Higher-order Euler-Lagrange equation . . . . . . . . . . . . . . . . . . . . . . . . 13
1.5 Relaxed boundary conditions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.6 Intermezzo: Lagrangian Mechanics . . . . . . . . . . . . . . . . . . . . . . . . . . 17
1.7 Second order conditions for minimality . . . . . . . . . . . . . . . . . . . . . . . . 21
1.8 Integral constraints . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
1.9 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28

2 Minimum Principle 35
2.1 Optimal control . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 35
2.2 Summary of the classic Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . 36
2.3 First-order conditions for optimal control . . . . . . . . . . . . . . . . . . . . . . 36
2.4 Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
2.5 Optimal control with final constraints . . . . . . . . . . . . . . . . . . . . . . . . . 46
2.6 Free final time . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
2.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54

3 Dynamic Programming 61
3.1 Introduction . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.2 Principle of optimality . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.3 Discrete-time Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . . . . 62
3.4 Hamilton-Jacobi-Bellman equation . . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.5 Connection with Hamiltonians . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 75
3.6 Infinite horizon and Lyapunov functions . . . . . . . . . . . . . . . . . . . . . . . 78
3.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 80

4 Linear Quadratic Control 85


4.1 Linear systems with quadratic cost . . . . . . . . . . . . . . . . . . . . . . . . . . . 85
4.2 Finite horizon LQ: Minimum Principle . . . . . . . . . . . . . . . . . . . . . . . . 86
4.3 Finite Horizon LQ: Dynamic Programming . . . . . . . . . . . . . . . . . . . . . . 92
4.4 Riccati Differential Equations (RDE’s) . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.5 Infinite horizon LQ and Algebraic Riccati Equations (ARE’s) . . . . . . . . . . . . 96
4.6 Application: LQ control design for connected cars . . . . . . . . . . . . . . . . . 103
4.7 Connection between Hamiltonians and Riccati equations . . . . . . . . . . . . . 104
4.8 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 110

A Background material 117

A.1 Positive definite functions and matrices . . . . . . . . . . . . . . . . . . . . . . . . 117
A.2 A notation for partial derivatives . . . . . . . . . . . . . . . . . . . . . . . . . . . . 118
A.3 Separation of variables . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 120
A.4 Linear constant-coefficient DE’s . . . . . . . . . . . . . . . . . . . . . . . . . . . . 121
A.5 System of linear time-invariant DE’s . . . . . . . . . . . . . . . . . . . . . . . . . . 122
A.6 Stabilizability and detectability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
A.7 Lagrange multipliers . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 126

B Differential equations and Lyapunov stability 129


B.1 Existence and uniqueness of solutions . . . . . . . . . . . . . . . . . . . . . . . . 129
B.2 Definitions of stability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
B.3 Lyapunov functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 133
B.4 LaSalle’s Invariance Principle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 139
B.5 Cost-to-go Lyapunov functions . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 144
B.6 Lyapunov’s first method . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 147
B.7 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150

C Bibliography 161

Index 162

Chapter 1

Calculus of Variations

1.1 Introduction
Calculus of variations deals with minimization of expressions of the form
\int_0^T F(t, x(t), \dot x(t))\, dt

over all functions

x : [0, T ] → Rn .

In contrast to basic optimization – where we optimize over a finite number of parameters –


we can vary here in principle over all functions x : [0, T ] → Rn . Many fruitful applications of
the calculus of variations have been developed in physics, in particular in connection with
Hamilton’s principle of least action. Also in other sciences such as economics, biology, and
chemistry the calculus of variations has led to useful applications.
We start with some motivating examples. Of course the first example is the celebrated
brachistochrone problem. This problem was introduced in 1696 by Johann Bernoulli and it
is possibly the first problem of this type1 . When Bernoulli formulated the problem it imme-
diately attracted a lot of attention and several mathematicians submitted solutions to this
problem, including Leibniz, Newton, l’Hopital, and Johann Bernoulli’s brother Jakob.

Example 1.1.1 (Brachistochrone). A present-day formulation of the brachistochrone prob-


lem is as follows. Consider in R2 two points A = (x 0 , y 0 ) and B = (x 1 , y 1 ). The problem is to
find a path (a function) y(x) through (x 0 , y 0 ) and (x 1 , y 1 ) such that a point mass released at A
that glides along this curve friction free, reaches the point B in minimal time. It is assumed
that the gravitational acceleration g is constant. Fig. 1.1 depicts four possible such paths and,
actually, one of them is the optimal solution. Can you guess which one? This is a hard prob-
lem and we will solve this problem step by step in a series of examples. Here we set it up
as a calculus of variations problem. It is convenient to take the vertical displacement y(x) to
increase when going down (i.e. y points downwards, see Fig. 1.2). Also, without loss of gener-
ality we take (x 0 , y 0 ) = (x 0 , 0). That is, we release the point mass at zero altitude. As the mass
moves friction free we have that kinetic plus potential energy is constant,

\tfrac{1}{2} m v^2(x) - m g\, y(x) = c.    (1.1)
1 Newton's minimal resistance problem can also be seen as a calculus of variations problem and it predates it by almost ten years.

FIGURE 1.1: Four paths from A to B. Which is fastest?

FIGURE 1.2: In this case a positive y means a negative altitude

FIGURE 1.3: ds = \sqrt{1 + \dot y^2(x)}\, dx

Here v(x) is the speed of the mass and c is the kinetic energy at time zero, c = \tfrac{1}{2} m v^2(x_0).
We release the mass with zero speed so c = 0 and hence the speed follows from the vertical
displacement as

v(x) = \sqrt{2 g\, y(x)}.    (1.2)

By the Pythagorean theorem, an infinitesimal horizontal displacement dx corresponds to a
displacement ds along the curve of ds = \sqrt{1 + \dot y^2(x)}\, dx, see Fig. 1.3. The amount of time that
this takes is

dt = \frac{ds}{v} = \sqrt{\frac{1 + \dot y^2(x)}{2 g\, y(x)}}\, dx.

This way the time T needed to travel from (x_0, y_0) = (x_0, 0) to (x_1, y_1) can be seen as an integral
over x,

T = \int_0^T dt = \int_{x_0}^{x_1} \sqrt{\frac{1 + \dot y^2(x)}{2 g\, y(x)}}\, dx.    (1.3)

We need to minimize the integral (1.3) over all functions y(x) subject to y(x 0 ) = y 0 = 0 and
y(x 1 ) = y 1 . ä

Example 1.1.2 (Oil production). An oil company is to deliver an amount of L liters of oil at a
delivery time T. The company wants to find a production schedule for completing the order
with minimal costs. Let ℓ(t) denote the amount of oil at time t. We assume that both storing
oil and producing oil are costly. The total cost might be modeled as

\int_0^T \alpha \dot\ell^2(t) + \beta \ell(t)\, dt    (1.4)

where βℓ(t) models the storage cost per unit time and αℓ̇²(t) models the production cost per
unit time. The constants α, β are positive numbers. The objective of the oil company is to
determine a production rate ℓ̇(t) that minimizes the above cost, subject to the conditions

\ell(0) = 0, \qquad \ell(T) = L, \qquad \dot\ell(t) \ge 0.    (1.5)

Example 1.1.3 (Shortest path). What is the shortest path between two points (x 0 , y 0 ) and
(x 1 , y 1 ) in R2 ? Of course we know the answer but let us anyway formulate this problem in
more detail.
Clearly the path is characterized by a function y(x). As in Example 1.1.1, the length ds of
an infinitesimal part of the path follows from an infinitesimal part dx as ds = \sqrt{1 + \dot y^2(x)}\, dx
where ẏ(x) = dy(x)/dx. So the total length of the path is

\int_{x_0}^{x_1} \sqrt{1 + \dot y^2(x)}\, dx.    (1.6)

This has to be minimized subject to

y(x 0 ) = y 0 , y(x 1 ) = y 1 . (1.7)

With the exception of the final example, the optimal solution – if one exists at all – is
typically hard to find.

FIGURE 1.4: Two possible functions that satisfy (1.9)

1.2 Euler-Lagrange equation


The examples given in the preceding section are instances of what is called the simplest prob-
lem in the calculus of variations:

Definition 1.2.1 (Simplest problem in the calculus of variations). Given a final time T > 0
and a function F : [0, T ] × Rn × Rn → R and states x 0 , x T ∈ Rn , the simplest problem in the cal-
culus of variations is to minimize the cost J (·) defined as
J(x(\cdot)) = \int_0^T F(t, x(t), \dot x(t))\, dt    (1.8)

over all functions x : [0, T ] → Rn that satisfy the boundary conditions

x(0) = x 0 , x(T ) = x T . (1.9)

The function J (·) is called the cost function and the integrand F (·) of this cost is sometimes
called the running cost. For n = 1 the problem is visualized in Fig. 1.4. Given the two points
(0, x 0 ) and (T, x T ) each smooth function x(·) between the two points determines a cost J (x(·))
as defined in (1.8) and the problem is to find the function x(·) that minimizes this cost.
This problem can be regarded as an infinite dimensional version of the standard prob-
lem of finding the minimizer z ∗ ∈ Rn of a function K : Rn → R. The difference is that the
function K (z) is replaced by an integral expression J (x(·)) and z ∈ Rn is replaced by functions
x : [0, T ] → Rn .
Most of the time we make the following two assumptions.

Assumption 1.2.2. The function F (t , x, y) in (1.8) is twice continuously differentiable in all its
components t , x, y. ä

Twice continuously differentiable is abbreviated to C². Likewise C¹ means that the derivative is continuous.

Assumption 1.2.3. The function x : [0, T ] → Rn is C 2 . ä

Under the above assumptions we next derive a differential equation that every solution of
the simplest problem in the calculus of variations must satisfy. This differential equation is
the infinite-dimensional generalization of the well-known first-order condition that a z ∗ ∈ Rn
minimizes a differentiable function K : Rⁿ → R only if the gradient vector ∂K(z)/∂z is zero at z = z∗.

FIGURE 1.5: A function x∗(t) and a possible perturbed function x∗(t) + αδx(t). At t = 0 and
t = T the perturbation αδx(t) is zero. See the proof of Thm. 1.2.4

Theorem 1.2.4 (Euler-Lagrange equation — necessary first-order condition for optimality).
Suppose that F(t, x, y) is C². Necessary for a C² function x∗(·) to minimize (1.8) subject
to (1.9) is that it satisfies the differential equation

\left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) F(t, x^*(t), \dot x^*(t)) = 0    for all t ∈ [0, T].    (1.10)

Proof. Suppose x ∗ (·) is a C 2 solution to the simplest problem in the calculus of variations and
let δx (·) be an arbitrary C 2 -function on [0, T ] that is zero at the boundaries,

δx (0) = δx (T ) = 0. (1.11)

We use it to form a variation of the optimal solution

x(t ) = x ∗ (t ) + αδx (t ) (1.12)

with α ∈ R. Notice that x(t ) for every α ∈ R satisfies the boundary conditions x(0) = x ∗ (0) = x 0
and x(T ) = x ∗ (T ) = x T , see Fig. 1.5. Since x ∗ (·) is a minimizing solution for our problem we
have that

J (x ∗ (·)) ≤ J (x ∗ (·) + αδx (·)) ∀α ∈ R. (1.13)

For fixed δx (·) the cost J (x ∗ (·) + αδx (·)) is a function of the scalar variable α,

J¯(α) := J (x ∗ (·) + αδx (·)).

The minimality condition (1.13) thus says that J̄(0) ≤ J̄(α) for all α ∈ R. Given that x∗(·), δx(·)
and F(·) are all assumed C², it follows that J̄(α) is differentiable and so the above implies
that J̄′(0) = 0. This derivative is²
\bar J'(0) = \left[ \frac{d}{d\alpha} \int_0^T F(t, x^*(t) + \alpha\delta_x(t), \dot x^*(t) + \alpha\dot\delta_x(t))\, dt \right]_{\alpha=0}
= \int_0^T \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial x^T} \delta_x(t) + \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \dot\delta_x(t)\, dt.    (1.14)

2 Leibniz' integral rule says that \frac{d}{d\alpha}\int G(\alpha, t)\, dt = \int \frac{dG(\alpha, t)}{d\alpha}\, dt if G(\alpha, t) and \frac{dG(\alpha, t)}{d\alpha} are continuous in t and α.
Here they are continuous because F(t, x∗(t), ẋ∗(t)) and δx(t) are C¹.

Integration by parts of the second term in (1.14) yields³

\int_0^T \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \dot\delta_x(t)\, dt
= \underbrace{\left. \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \delta_x(t) \right|_0^T}_{=0} - \int_0^T \left( \frac{d}{dt} \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \right) \delta_x(t)\, dt
= - \int_0^T \left( \frac{d}{dt} \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \right) \delta_x(t)\, dt.    (1.15)

Here the underbraced term is zero because of the boundary conditions (1.11). Plugging (1.15)
into (1.14) and using that J̄′(0) = 0 we find that
0 = \int_0^T \left[ \left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) F(t, x^*(t), \dot x^*(t)) \right]^T \delta_x(t)\, dt.    (1.16)

So far the perturbation δx(t) in our derivation was some fixed function. However, since the
perturbation can be arbitrarily chosen, the equality (1.16) must hold for every C² perturbation
δx(t) that satisfies (1.11). But this implies, via the next lemma, that the term in between the
square brackets in (1.16) is zero, i.e. that (1.10) holds. ■

FIGURE 1.6: The function δx(t) defined in (1.18)

Lemma 1.2.5 (Fundamental lemma (or Lagrange's lemma)). A continuous function φ : [0, T] → Rⁿ has the property that

\int_0^T \varphi^T(t) \delta_x(t)\, dt = 0    (1.17)

for every C 2 -function δx : [0, T ] → Rn satisfying (1.11), if-and-only if φ(t ) is zero for all t ∈
[0, T ].

Proof. We prove it for n = 1. Figure 1.6 explains it all: suppose that φ(t ) is not the zero func-
tion, i.e. that φ(t̄ ) is nonzero for some t̄ ∈ [0, T ]. Then by continuity, φ(t ) is sign-definite on
some interval [a, b] ⊆ [0, T ] about t̄ (and with 0 ≤ a < b ≤ T ). Consider the function δx (t )
defined as
\delta_x(t) = \begin{cases} ((t - a)(b - t))^3 & t \in [a, b], \\ 0 & \text{elsewhere}, \end{cases}    (1.18)

see Figure 1.6. Clearly this δx (t ) fulfils the requirements of (1.11) but it violates (1.17) because
both φ(t) and δx(t) are sign-definite on [a, b]. The assumption that φ(t̄) ≠ 0 at some t̄ ∈ [0, T]
hence is wrong. ■

3 Integration by parts holds if \partial F(t, x^*(t), \dot x^*(t))/\partial \dot x^T and δx(t) are C¹ with respect to time. By assumption on
F(·), x∗(·), δx(·) they are.

This result was derived independently by Euler and Lagrange, and in honor of its inventors
Eqn. (1.10) is nowadays called the Euler-Lagrange equation (or the Euler equation). We want
to stress that the Euler-Lagrange equation is only a necessary condition for optimality. All it
guarantees is that a “small” perturbation of x ∗ (·) results in a “very small” change in cost. To
put it more mathematically, the solutions x ∗ (·) of the Euler-Lagrange equation are precisely
those functions for which for every allowable δx (·) we have

J (x ∗ (·) + αδx (·)) = J (x ∗ (·)) + o(α)

with o some little-o function. Such solutions x ∗ (·) are referred to as stationary solutions. They
might be minimizing J (x(·)), or maximizing J (x(·)), or neither.
Interestingly the Euler-Lagrange equation does not depend on the initial or final values
x 0 , x T . More on this in § 1.5.

Example 1.2.6 (Brachistochrone, Example 1.1.1 continued). The Euler-Lagrange equation
for the brachistochrone problem, see (1.3), reads

\left( \frac{\partial}{\partial y} - \frac{d}{dx} \frac{\partial}{\partial \dot y} \right) \sqrt{\frac{1 + \dot y^2(x)}{2 g\, y(x)}} = 0    (1.19)

with the boundary conditions y(x 0 ) = y 0 and y(x 1 ) = y 1 . One may expand (1.19) but in this
form the problem is still rather complicated. In the following section we use a more sophisti-
cated approach. ä

Example 1.2.7 (Shortest path, Example 1.1.3 continued). The Euler-Lagrange equation for
the shortest path problem described by (1.6) and (1.7) is
0 = \left( \frac{\partial}{\partial y} - \frac{d}{dx} \frac{\partial}{\partial \dot y} \right) \sqrt{1 + \dot y^2(x)}    (1.20)

with boundary conditions y(x₀) = y₀ and y(x₁) = y₁. From (1.20) we obtain

0 = \frac{d}{dx} \frac{\partial}{\partial \dot y} \sqrt{1 + \dot y^2(x)} = \frac{d}{dx} \left( \frac{\dot y(x)}{\sqrt{1 + \dot y^2(x)}} \right) = \ddot y(x)\, (1 + \dot y^2(x))^{-3/2}.    (1.21)

Clearly the solution of (1.21) is given by the differential equation

ÿ(x) = 0 (1.22)

which is another way of saying that y(x) is a straight line. In light of the boundary conditions
y(x 0 ) = y 0 and y(x 1 ) = y 1 it has the unique solution
y(x) = y_0 + \frac{y_1 - y_0}{x_1 - x_0} (x - x_0).    (1.23)

This solution is not surprising. It is of course the solution, but formally we may not yet draw
this conclusion because the theory presented so far can handle only C 2 functions and it only
claims that solutions of (1.21) are stationary solutions and they need not be optimal. ä

Example 1.2.8 (Oil production, Example 1.1.2 continued). Corresponding to the criterion to
be minimized, (1.4), with the boundary conditions (1.5), we find the Euler-Lagrange equation

0 = \left( \frac{\partial}{\partial \ell} - \frac{d}{dt} \frac{\partial}{\partial \dot\ell} \right) (\alpha \dot\ell^2(t) + \beta \ell(t)) = \beta - \frac{d}{dt}(2\alpha \dot\ell(t)) = \beta - 2\alpha \ddot\ell(t).    (1.24)

So ℓ̈(t) = β/(2α), that is,

\ell(t) = \frac{\beta}{4\alpha} t^2 + \ell_1 t + \ell_0.    (1.25)

The constants ℓ₁ and ℓ₀ follow from the boundary conditions ℓ(0) = 0 and ℓ(T) = L, i.e. ℓ₀ = 0,
ℓ₁ = L/T − β/(4α)T. Of course, it still remains to be seen that the above ℓ(t) defined in
(1.25) is indeed minimizing (1.4). Notice that the extra constraint ℓ̇(t) ≥ 0 from (1.5) puts a
further restriction on the total amount L and the final time T. ä
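As a quick check of this computation (an addition to the text), the Euler-Lagrange equation (1.24) and the solution (1.25) can be reproduced symbolically. The sketch below assumes Python with sympy; the symbol names are of course arbitrary.

import sympy as sp
from sympy.calculus.euler import euler_equations

t, alpha, beta, L, T = sp.symbols('t alpha beta L T', positive=True)
ell = sp.Function('ell')

# Running cost of the oil-production problem (1.4)
F = alpha * ell(t).diff(t)**2 + beta * ell(t)

# Euler-Lagrange equation; prints an equation equivalent to beta - 2*alpha*ell'' = 0, cf. (1.24)
eq = euler_equations(F, ell(t), t)[0]
print(eq)

# General solution and the constants fixed by ell(0) = 0, ell(T) = L, cf. (1.25)
general = sp.dsolve(eq, ell(t)).rhs            # C1 + C2*t + beta*t**2/(4*alpha)
C1, C2 = sp.symbols('C1 C2')
consts = sp.solve([general.subs(t, 0), general.subs(t, T) - L], [C1, C2], dict=True)[0]
print(sp.expand(general.subs(consts)))         # beta*t**2/(4*alpha) + (L/T - beta*T/(4*alpha))*t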

1.3 Beltrami identity


In many applications the running cost F (t , x, ẋ) does not explicitly depend on t and thus has
the form

F (x, ẋ).

Obviously the partial derivative ∂F/∂t is zero now. An interesting implication is that then

\left( 1 - \dot x^T(t) \frac{\partial}{\partial \dot x} \right) F(x(t), \dot x(t))

is constant over time for solutions x(t ) of the Euler-Lagrange equation. To see this, we dif-
ferentiate the above with respect to time (and for ease of notation we momentarily write x(t )
simply as x):
\frac{d}{dt} \left( F(x, \dot x) - \dot x^T \frac{\partial F(x, \dot x)}{\partial \dot x} \right)
= \frac{d}{dt} F(x, \dot x) - \frac{d}{dt} \left( \dot x^T \frac{\partial F(x, \dot x)}{\partial \dot x} \right)
= \left( \dot x^T \frac{\partial F(x, \dot x)}{\partial x} + \ddot x^T \frac{\partial F(x, \dot x)}{\partial \dot x} \right) - \left( \ddot x^T \frac{\partial F(x, \dot x)}{\partial \dot x} + \dot x^T \frac{d}{dt} \frac{\partial F(x, \dot x)}{\partial \dot x} \right)
= \dot x^T \left( \frac{\partial F(x, \dot x)}{\partial x} - \frac{d}{dt} \frac{\partial F(x, \dot x)}{\partial \dot x} \right).    (1.26)

This is zero for every solution x(t ) of the Euler-Lagrange equation. Hence every stationary
solution x ∗ (t ) to our problem has the property that

F(x^*(t), \dot x^*(t)) - \dot x^{*T}(t) \frac{\partial F(x^*(t), \dot x^*(t))}{\partial \dot x^T} = C    ∀t
for some integration constant C . This identity is known as the Beltrami identity. We illustrate
the usefulness of this identity by explicitly solving the brachistochrone problem. It is good to
realize, though, that the Beltrami identity is not equivalent to the Euler-Lagrange equation.
Indeed, every constant function x(t ) satisfies the Beltrami identity. From (1.26) it can be seen
that the Beltrami identity and the Euler-Lagrange equation are equivalent for scalar functions
x : [0, T ] → R if ẋ(t ) is nonzero for almost all time. It is good to keep this in mind.
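A small symbolic sanity check (an addition to the text, assuming Python with sympy): along the stationary solution (1.25) of the oil-production problem, whose running cost does not depend explicitly on t, the Beltrami expression is indeed constant.

import sympy as sp

t, alpha, beta, ell1, x, v = sp.symbols('t alpha beta ell1 x v', positive=True)

F = alpha * v**2 + beta * x        # running cost of Example 1.1.2, written in the variables x and v = xdot
B = F - v * sp.diff(F, v)          # Beltrami expression F - xdot * dF/dxdot

# Evaluate along the stationary solution (1.25) with ell0 = 0
ell = beta / (4 * alpha) * t**2 + ell1 * t
print(sp.simplify(B.subs({x: ell, v: ell.diff(t)})))   # -alpha*ell1**2, constant in t as claimed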

Example 1.3.1 (Brachistochrone, Example 1.1.1 continued). The running cost F (x, y, ẏ) in
the brachistochrone problem is
F(y, \dot y) = \frac{\sqrt{1 + \dot y^2}}{\sqrt{2 g y}}.


FIGURE 1.7: Top: shown in red is the cycloid x(φ) = (c²/2)(φ − sin(φ)), y(φ) = (c²/2)(1 − cos(φ)) for
φ ∈ [0, 2π]. It is the curve that a point on a rolling disc of radius c²/2 traces out. Bottom: a
downwards facing cycloid (solution of the Brachistochrone problem). See Example 1.3.1

FIGURE 1.8: Cycloids (1.28) for various c > 0. Given a B to the right and below A = (0, 0) there is
a unique cycloid that joins A and B, see Example 1.3.1

It does not depend on x, so Beltrami applies and it says that

C = F(y(x), \dot y(x)) - \dot y(x) \frac{\partial F(y(x), \dot y(x))}{\partial \dot y}
= \frac{\sqrt{1 + \dot y^2(x)}}{\sqrt{2 g y(x)}} - \frac{\dot y^2(x)}{\sqrt{2 g y(x)(1 + \dot y^2(x))}}
= \frac{1}{\sqrt{2 g y(x)(1 + \dot y^2(x))}}.

Squaring and inverting both sides gives

c 2 = y(x)(1 + ẏ 2 (x)), (1.27)


where c² := 1/(2gC²). This equation can be solved parametrically by⁴

x(\varphi) = \frac{c^2}{2} (\varphi - \sin(\varphi)), \qquad y(\varphi) = \frac{c^2}{2} (1 - \cos(\varphi)).    (1.28)
The curve (x(φ), y(φ)) is known as the cycloid. It is the curve that a fixed point on the bound-
ary of a wheel with radius c 2 /2 traces out while rolling without slipping on a horizontal line
4 Quick derivation: since the cotangent cos(φ/2)/ sin(φ/2) for φ ∈ [0, 2π] ranges over all real numbers once (in-

cluding ±∞) it follows that any dy/dx can uniquely be written as dy/dx = cos(φ/2)/ sin(φ/2) with φ ∈ [0, 2π].
Then (1.27) implies that y = c 2 /(1 + cos2 (φ/2)/ sin2 (φ/2)) = c 2 sin2 (φ/2) = c 2 (1 − cos(φ))/2 and then dx/dφ =
(dy/dφ)/(dy/dx) = [c 2 sin(φ/2) cos(φ/2)]/[cos(φ/2)/ sin(φ/2)] = c 2 sin2 (φ/2) = c 2 (1 − cos(φ))/2. Integrating this
expression shows that x(φ) = c 2 (φ − sin(φ))/2 + d where d is some integration constant. This d = 0 because
(x, y) = (0, 0) is on the curve. (See Exercise 1.6 for more details.)

(think of the valve on your bike’s wheel), see Fig. 1.7. For the cycloid, Beltrami and Euler-
Lagrange are equivalent because ẏ(x) is nonzero almost everywhere. Hence all smooth-
enough stationary solutions of the Brachistochrone problem are precisely these cycloids.
Varying c in (1.28) generates a family of cycloids, see Fig. 1.8. Given a destination point
B to the right and below A = (0, 0) there is a unique cycloid that connects A and B , and the
solution of the brachistochrone problem is that segment of the cycloid. Notice that for certain
destinations B the curve extends below the final destination! ä
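The claim that the cycloid beats, say, the straight line can also be checked numerically. The following sketch is an addition to the text; it assumes Python with numpy/scipy, g = 9.81, and the cycloid with c = 1, whose lowest point φ = π is B = (π/2, 1).

import numpy as np
from scipy.integrate import quad

g = 9.81
xB, yB = np.pi / 2, 1.0          # destination B, reached by the c = 1 cycloid at phi = pi

# Travel time along a curve y(x) with slope yp(x): T = int sqrt((1 + y'^2)/(2 g y)) dx, cf. (1.3)
def travel_time(y, yp, x0, x1):
    f = lambda x: np.sqrt((1.0 + yp(x)**2) / (2.0 * g * y(x)))
    return quad(f, x0, x1)[0]

# Straight line from A = (0, 0) to B
slope = yB / xB
T_line = travel_time(lambda x: slope * x, lambda x: slope, 0.0, xB)

# Cycloid (1.28) with c = 1, parametrised by phi: dt = ds / v
def cycloid_time(phi_B, c=1.0):
    dxdphi = lambda p: c**2 / 2 * (1 - np.cos(p))
    dydphi = lambda p: c**2 / 2 * np.sin(p)
    y      = lambda p: c**2 / 2 * (1 - np.cos(p))
    f = lambda p: np.sqrt(dxdphi(p)**2 + dydphi(p)**2) / np.sqrt(2 * g * y(p))
    return quad(f, 0.0, phi_B)[0]

print(T_line, cycloid_time(np.pi))   # roughly 0.84 s versus 0.71 s: the cycloid is faster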

FIGURE 1.9: Given a positive function y(x) and its surface of revolution, the infinitesimal dx-strip
of this surface has area 2πy(x)√(1 + ẏ²(x)) dx. See Example 1.3.2

Example 1.3.2 (Minimal surface). Warning, this is an elaborate example. We seek a positive
function y(x) whose surface of revolution about the x-axis has minimal area, see Fig. 1.9. The
length of the axis we take equal to 2 and we take it symmetric around zero, and the boundary
conditions we assume to be the same

y max := y(−1) = y(+1).


The area of an infinitesimal dx-strip at x equals 2πy(x)√(1 + ẏ²(x)) dx (see Fig. 1.9) and therefore
the total area J(y(·)) of the surface of revolution is

J(y(\cdot)) = \int_{-1}^{1} 2\pi y(x) \sqrt{1 + \dot y^2(x)}\, dx.

Beltrami applies and it gives us that

2\pi y(x) \sqrt{1 + \dot y^2(x)} - \dot y(x)\, 2\pi y(x) \frac{\dot y(x)}{\sqrt{1 + \dot y^2(x)}} = C

for some constant C. Multiplying left and right by the nonzero √(1 + ẏ²(x))/(2π) turns this into

y(x)(1 + \dot y^2(x)) - y(x) \dot y^2(x) = \frac{C}{2\pi} \sqrt{1 + \dot y^2(x)},

that is,

y(x) = \frac{C}{2\pi} \sqrt{1 + \dot y^2(x)}.

FIGURE 1.10: (a) y_max = a cosh(1/a) as a function of a. The minimal y_max is y* ≈ 1.509 (attained
at a* ≈ 0.834). (b) the area of the catenoid as a function of y_max. (c) the area of the catenoid (in
red) and of the Goldschmidt two-disc solution (in blue) as a function of y_max. The areas are the
same at y_G = 1.895. This y_G corresponds to a_G = 1.564 (see part (a) of this figure)

Since the radius y(x) is nonnegative we have that a defined as a = C /(2π) is ≥ 0. Squaring left
and right hand-side we end up with

y 2 (x) = a 2 (1 + ẏ 2 (x)). (1.29)

The nonnegative even solutions of the differential equation are

y a (x) = a cosh(x/a), a ≥ 0. (1.30)

This can be derived using separation of variables5 (see Appendix A.3). Figure 1.9 is an example
of such a hyperbolic cosine. The two-dimensional surface of revolution of a hyperbolic cosine
is called catenoid. From the shape of hyperbolic cosines it will be clear that for every a > 0
the derivative ẏ(x) is nonzero almost everywhere, and so Beltrami and Euler-Lagrange are
equivalent.
Are such hyperbolic cosines optimal solutions? Not necessarily, and Figure 1.10(a) con-
firms this. It shows the boundary value

y max = y(±1) = a cosh(1/a)

as a function of a. This function has a minimum of y ∗ = 1.50887956154 and it is attained at


a ∗ = 0.8335565596. So if we choose y max < y ∗ then none of these hyperbolic cosines is the
solution to our problem! Also, if y max > y ∗ then there are two hyperbolic cosines that meet
the boundary condition and it seems likely that at most one of them is the optimal solution.
Now it can be shown that for the hyperbolic cosines (1.30) the area of the catenoid is

J (y a (·)) = 2πa 2 ( a1 + sinh( a1 ) cosh( a1 )).

It is interesting to plot this against y max = a cosh(1/a). This is done in Fig. 1.10(b). The black
curve is for a < a ∗ and the red curve is for a > a ∗ . This shows that for a given y max > y ∗ the
area of the catenoid is the smallest for the largest of the two a’s. Thus we need only consider
a ≥ a∗ .
Now the case y max < y ∗ . Then no hyperbolic cosine meets the boundary condition. What
does this mean? It means that no smooth function y(x) exists that is stationary and satisfies
y max < y ∗ . A deeper analysis (see Exercise 1.8) shows that the only other stationary curve is
the so-called Goldschmidt two-disc solution. This is when y(x) = 0 in the interior, and y(±1) =
y_max at the boundary, see Fig. 1.11. In this case the area of the surface of revolution is the sum
of the areas of the two discs, 2π y_max².
It can be shown that a global minimal solution exists and since it must be stationary it
is either the Goldschmidt two-disc solution or the catenoid for an appropriate a ≥ a ∗ . If
y max < y ∗ then clearly the Goldschmidt solution is the only stationary solution, hence must
be globally optimal. Now, for y max > y ∗ something odd occurs: Fig. 1.10(c) gives us the area of
the surface of revolution of the Goldschmidt two-disc solution as well as that of the catenoid.
We see that there is a point y G = 1.8950254556 at which the Goldschmidt and catenoid so-
lution have the same area. This point is attained at a G = 1.5643765887. For y max > y G the
catenoid (for the corresponding a > a G ) has the smallest area, hence is then globally optimal,
but for y max < y G it is the Goldschmidt two-disc solution that is globally optimal. The conclu-
sion is that the optimal solution is discontinuous in y max ! ä

5 There is, however, a technicality in this derivation that is often overlooked, see Exercise 1.7, but we need not

worry about that now.
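The numbers y* ≈ 1.509, a* ≈ 0.834, y_G ≈ 1.895 and a_G ≈ 1.564 quoted above are easy to reproduce numerically. The following sketch is an addition to the text and assumes Python with numpy/scipy.

import numpy as np
from scipy.optimize import brentq, minimize_scalar

ymax = lambda a: a * np.cosh(1.0 / a)                                      # boundary value of y_a(x) = a*cosh(x/a)
area = lambda a: 2 * np.pi * a**2 * (1/a + np.sinh(1/a) * np.cosh(1/a))    # catenoid area J(y_a)

# Smallest reachable boundary value y* and the a* at which it is attained
res = minimize_scalar(ymax, bounds=(0.1, 3.0), method='bounded')
a_star, y_star = res.x, res.fun
print(a_star, y_star)                     # approximately 0.834 and 1.509

# Boundary value y_G at which catenoid and Goldschmidt solution (area 2*pi*ymax^2) break even
f = lambda a: area(a) - 2 * np.pi * ymax(a)**2
a_G = brentq(f, a_star, 5.0)
print(a_G, ymax(a_G))                     # approximately 1.564 and 1.895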

FIGURE 1.11: Goldschmidt two-disc solution

1.4 Higher-order Euler-Lagrange equation


The Euler-Lagrange equation can directly be extended to the case that the integral J (x(·)) de-
pends on higher-order time-derivatives of x(t ). Let us state explicitly the second-order case.

Proposition 1.4.1 (Higher-order Euler-Lagrange equation). Consider the problem of mini-


mizing the integral
J(x(\cdot)) = \int_0^T F(t, x(t), \dot x(t), \ddot x(t))\, dt    (1.31)

over the function x(·) satisfying the boundary conditions

x(0) = x_0, \quad x(T) = x_T, \qquad \dot x(0) = x_0^d, \quad \dot x(T) = x_T^d    (1.32)

for given values x 0 , x 0d and x T , x Td . A necessary condition that a C 3 function x ∗ (t ) mini-


mizes (1.31) and satisfies (1.32) is that x ∗ (t ) is a solution of the differential equation
\left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} + \frac{d^2}{dt^2} \frac{\partial}{\partial \ddot x} \right) F(t, x(t), \dot x(t), \ddot x(t)) = 0    ∀t ∈ [0, T].    (1.33)

Proof. Define J̄(α) = J(x∗(·) + αδx(·)). Then, as before, the derivative J̄′(0) is zero. Analogously
to (1.14) we compute J̄′(0). For ease of exposition we momentarily skip all time arguments in
x∗(t) and δx(t) and, sometimes, F:

0 = \bar J'(0) = \left[ \frac{d}{d\alpha} \int_0^T F(t, x^* + \alpha\delta_x, \dot x^* + \alpha\dot\delta_x, \ddot x^* + \alpha\ddot\delta_x)\, dt \right]_{\alpha=0}
= \int_0^T \frac{\partial F}{\partial x^T} \delta_x + \frac{\partial F}{\partial \dot x^T} \dot\delta_x + \frac{\partial F}{\partial \ddot x^T} \ddot\delta_x\, dt.    (1.34)
Integration by parts of the second term of the integrand yields
\int_0^T \frac{\partial F}{\partial \dot x^T} \dot\delta_x\, dt = \underbrace{\left[ \frac{\partial F}{\partial \dot x^T} \delta_x \right]_0^T}_{=0} - \int_0^T \frac{d}{dt}\left( \frac{\partial F}{\partial \dot x^T} \right) \delta_x\, dt = - \int_0^T \frac{d}{dt}\left( \frac{\partial F}{\partial \dot x^T} \right) \delta_x\, dt.

The last equality follows from the boundary condition that δx (0) = δx (T ) = 0. Integration by
parts of the third term in (1.34) similarly gives
\int_0^T \frac{\partial F}{\partial \ddot x^T} \ddot\delta_x\, dt = \underbrace{\left[ \frac{\partial F}{\partial \ddot x^T} \dot\delta_x \right]_0^T}_{=0} - \int_0^T \frac{d}{dt}\left( \frac{\partial F}{\partial \ddot x^T} \right) \dot\delta_x\, dt = - \int_0^T \frac{d}{dt}\left( \frac{\partial F}{\partial \ddot x^T} \right) \dot\delta_x\, dt

FIGURE 1.12: Elastic bar (Example 1.4.2)

where now the second equality is the result of the boundary conditions that δ̇x (0) = δ̇x (T ) = 0.
Another time integration by parts of this just obtained term yields
- \int_0^T \frac{d}{dt}\left( \frac{\partial F}{\partial \ddot x^T} \right) \dot\delta_x\, dt = \underbrace{\left[ - \frac{d}{dt}\left( \frac{\partial F}{\partial \ddot x^T} \right) \delta_x \right]_0^T}_{=0} + \int_0^T \left( \frac{d^2}{dt^2} \frac{\partial F}{\partial \ddot x^T} \right) \delta_x\, dt.

Combination of both integration by parts procedures (one time for the second term, and two
times for the third term) and Lemma 1.2.5 thus yields (1.33). ■

Example 1.4.2 (Elastic bar). Consider a horizontal elastic bar, loaded by weights and sup-
ported at its two ends. The equilibrium of the bar is determined by the condition that its po-
tential energy is minimal, see Fig. 1.12. Denote by ℓ the length of the bar, and by z ∈ [0, ℓ] the
spatial variable. The small vertical displacement caused by the loading of the bar is denoted
by y(z), while ρ(z) denotes the load per unit length. We assume that the bar has a uniform
cross-section (independent of z). If the curvature of the elastic bar is not too large then the
potential energy due to elastic forces is proportional to the square of the second derivative,
V_1 = \frac{k}{2} \int_0^{\ell} \left( \frac{d^2 y(z)}{dz^2} \right)^2 dz

where k is a constant depending on the elasticity of the bar. Furthermore, the potential energy
due to gravity is given by
V_2 = g \int_0^{\ell} \rho(z) y(z)\, dz

hence the minimum of the potential energy is obtained by minimizing the integral
\int_0^{\ell} \frac{k}{2} \left( \frac{d^2 y(z)}{dz^2} \right)^2 + g \rho(z) y(z)\, dz.

The Euler-Lagrange equation (1.33) of this variational problem is the fourth-order differential
equation

k \frac{d^4 y(z)}{dz^4} = -g \rho(z)    ∀z ∈ [0, ℓ].    (1.35)

If ρ(z) is constant then y(z) is a polynomial of degree 4. Fig. 1.12 depicts a solution for the
boundary conditions y(0) = y(ℓ) = 0 and ẏ(0) = ẏ(ℓ) = 0. In this case the solution is
y(z) = -\frac{g\rho}{4!\, k} (z(z - \ell))^2. ä
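A short symbolic verification of this claim (an addition to the text, assuming Python with sympy): the stated y(z) solves (1.35) for constant ρ and meets the clamped boundary conditions.

import sympy as sp

z, ell, k, g, rho = sp.symbols('z ell k g rho', positive=True)

# Candidate solution of Example 1.4.2 for a uniform load rho
y = -g * rho / (sp.factorial(4) * k) * (z * (z - ell))**2

# Fourth-order Euler-Lagrange equation (1.35): k y'''' + g rho should vanish identically
print(sp.simplify(k * y.diff(z, 4) + g * rho))          # 0

# Clamped boundary conditions y(0) = y(ell) = 0 and y'(0) = y'(ell) = 0
print([y.subs(z, 0), sp.simplify(y.subs(z, ell)),
       y.diff(z).subs(z, 0), sp.simplify(y.diff(z).subs(z, ell))])   # [0, 0, 0, 0]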

1.5 Relaxed boundary conditions
In the problems considered so far, the initial state x(0) and final state x(T ) were fixed. A useful
extension is obtained by removing some of these conditions. This means that we allow more
functions x(t ) to optimize over, and, consequently, that the first-order conditions expand. As
an example, suppose the state x(t ) has 3 components and that the first component of the
initial state and the last component of the final state are now free to choose,
   
x(0) = \begin{bmatrix} \text{free} \\ \text{fixed} \\ \text{fixed} \end{bmatrix}, \qquad x(T) = \begin{bmatrix} \text{fixed} \\ \text{fixed} \\ \text{free} \end{bmatrix}.    (1.36)

Following the exact same arguments as in the proof of Thm. 1.2.4, the necessary first-order
condition is that
\left. \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \delta_x(t) \right|_0^T + \int_0^T \left( \left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) F(t, x^*(t), \dot x^*(t)) \right)^T \delta_x(t)\, dt = 0    (1.37)
at the optimal solution for all allowable perturbations. In particular it must be zero for all
perturbations δx (t ) that are zero at t = 0 and t = T . For these special perturbations the first-
order condition reduces to
\int_0^T \left( \left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) F(t, x^*(t), \dot x^*(t)) \right)^T \delta_x(t)\, dt = 0
for all such δx (t ). But this means precisely that the Euler-Lagrange equation must hold. This
proves that also for relaxed boundary conditions the Euler-Lagrange equations hold (this was
to be expected). Knowing this, the first-order conditions (1.37) simplify to
\left. \frac{\partial F(t, x^*(t), \dot x^*(t))}{\partial \dot x^T} \delta_x(t) \right|_0^T = 0.    (1.38)

When is this equal to zero for all allowable perturbations? Since the perturbed state x(t ) =
x ∗ (t ) + αδx (t ) for our example must obey the boundary condition (1.36) it follows that the
allowable perturbations are exactly those that satisfy
   
\delta_x(0) = \begin{bmatrix} \text{free} \\ 0 \\ 0 \end{bmatrix}, \qquad \delta_x(T) = \begin{bmatrix} 0 \\ 0 \\ \text{free} \end{bmatrix}.

In view of this it will be clear that the first-order condition (1.38) holds for all allowable δx (t )
iff
   
\frac{\partial F(0, x(0), \dot x(0))}{\partial \dot x} = \begin{bmatrix} 0 \\ \text{free} \\ \text{free} \end{bmatrix}, \qquad \frac{\partial F(T, x(T), \dot x(T))}{\partial \dot x} = \begin{bmatrix} \text{free} \\ \text{free} \\ 0 \end{bmatrix}.

This example demonstrates that to every initial or final state component that is free to choose
there corresponds a condition on the derivative of F (·) with respect to that component of ẋ.
Incidentally, by allowing states with free entries at initial and final time, it can now make sense
to include an initial- and/or a final cost to the cost function:
J(x(\cdot)) = \int_0^T F(t, x(t), \dot x(t))\, dt + G(x(0)) + S(x(T)).    (1.39)

Here G(x(0)) denotes an initial cost and S(x(T )) a final cost (aka terminal cost). The addi-
tion of these two costs does not complicate matters much: the above example generalizes as
follows.

Proposition 1.5.1 (Relaxed boundary conditions). Let T > 0 and suppose F (t , x, y) is C 2 and
that S(x),G(x) are C 1 . Let I, J be subsets of {1, . . . , n} and consider the functions x : [0, T ] → Rn
whose initial x(0) and final x(T ) are fixed except for the components:

x i (0) = free ∀i ∈ I and x j (T ) = free ∀ j ∈ J.

Among these functions, a C 2 function x ∗ (·) is a stationary solution of the cost (1.39) if-and-
only if it satisfies the Euler-Lagrange equation together with

\frac{\partial F(0, x(0), \dot x(0))}{\partial \dot x_i} - \frac{\partial G(x(0))}{\partial x_i} = 0    ∀i ∈ I,    (1.40)

\frac{\partial F(T, x(T), \dot x(T))}{\partial \dot x_j} + \frac{\partial S(x(T))}{\partial x_j} = 0    ∀j ∈ J.    (1.41)

Proof. See Exercise 1.10. ■

A common special case of this is the free end-point problem, which is when the initial
state is completely fixed and the final state is completely free. This means I = ∅ and J =
{1, . . . , n} and so (1.41) holds for the entire state vector x ∈ Rⁿ:

\frac{\partial F(T, x(T), \dot x(T))}{\partial \dot x} + \frac{\partial S(x(T))}{\partial x} = 0 \in \mathbb{R}^n.    (1.42)

Example 1.5.2 (Fixed and free end-point). Consider minimization of


\int_{-1}^{1} \alpha^2 x^2(t) + \dot x^2(t)\, dt    (1.43)

over all functions x : [−1, 1] → R. First we solve the standard problem, so with both initial and
final state fixed, for instance, assume that

x(−1) = 1, x(1) = 1. (1.44)

The running cost α2 x 2 (t ) + ẋ 2 (t ) is a sum of two squares, so with minimization we would like
both terms small, but one depends on the other. The parameter α models a trade-off between
making ẋ 2 (t ) small and x 2 (t ) small. Whatever α is, the optimal solution x(t ) needs to satisfy
the Euler-Lagrange equation,
0 = \left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) (\alpha^2 x^2(t) + \dot x^2(t)) = 2\alpha^2 x(t) - \frac{d}{dt}(2\dot x(t)) = 2\alpha^2 x(t) - 2\ddot x(t).

Therefore ẍ(t ) = α2 x(t ). This differential equation can be solved using characteristic equa-
tions (do this yourself, see Appendix A.4) and the general solution is

x(t ) = c eαt +d e−αt (1.45)

with c, d two arbitrary constants. The two constants follow from the two boundary condi-
tions (1.44):

1 = x(−1) = c e−α +d e+α ,


1 = x(1) = c e+α +d e−α .

The solution is c = d = 1/(eα + e−α ). That c equals d is expected because of the symmetry
of the boundary conditions. We see that there is exactly one function x(t ) that satisfies the
Euler-Lagrange equation and that meets the boundary conditions:

x^*(t) = \frac{e^{\alpha t} + e^{-\alpha t}}{e^{\alpha} + e^{-\alpha}}

[plot of x∗(t) over t ∈ [−1, +1] for α = 1/2, 1, 2]

For α = 0 the solution is a constant x∗(t) = 1 which, in hindsight, is not a surprise because
for α = 0 the running cost is just F(t, x(t), ẋ(t)) = ẋ²(t) and clearly then a zero derivative (a
constant x(t)) is optimal. For large values of α, on the other hand, x²(t) is penalized
strongly in F(t, x(t), ẋ(t)) = ẋ²(t) + α²x²(t) and so then it pays to take x(t) close to zero, even
if that is at the expense of some increase of ẋ²(t). Indeed this is what happens.
Consider next the free end-point problem with

x(−1) = 1 but where x(1) is free.

We stick to the same cost function (1.43). In the terminology of (1.39) this means we take
the initial and final cost equal to zero, G(x) = S(x) = 0. Hence ∂S(x(T))/∂x = 0 and then the free
end-point boundary constraint (1.42) becomes

0 = \left. \frac{\partial (\alpha^2 x^2(t) + \dot x^2(t))}{\partial \dot x} \right|_{t=1} = 2\dot x(1).

The parameters c, d in (1.45) now follow from the initial condition x(−1) = 1 and the above
boundary constraint 0 = ẋ(1):

1 = x(−1) = c e−α +d e+α ,


0 = ẋ(1) = cα e+α −d α e−α .

The solution is

c = \frac{e^{-\alpha}}{e^{2\alpha} + e^{-2\alpha}}, \qquad d = \frac{e^{\alpha}}{e^{2\alpha} + e^{-2\alpha}},
(check it yourself). So again the first-order conditions have a unique solution,

x^*_{\text{free}}(t) = \frac{e^{\alpha(t-1)} + e^{-\alpha(t-1)}}{e^{2\alpha} + e^{-2\alpha}}

[plot of x∗_free(t) over t ∈ [−1, +1] for α = 1/2, 1, 2]

The free end-point condition is that the derivative of x(t ) is zero at the final time. Again we
see that x(t ) approaches zero faster if α increases. Makes sense. ä
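The two constants c, d of the free end-point case can also be obtained symbolically; the sketch below is an addition to the text and assumes Python with sympy.

import sympy as sp

t, alpha = sp.symbols('t alpha', positive=True)
c, d = sp.symbols('c d')
x = c * sp.exp(alpha * t) + d * sp.exp(-alpha * t)       # general solution (1.45)

# Free end-point problem: x(-1) = 1 and the natural boundary condition xdot(1) = 0
sol = sp.solve([x.subs(t, -1) - 1, x.diff(t).subs(t, 1)], [c, d], dict=True)[0]
print(sp.simplify(sol[c]))   # equivalent to exp(-alpha)/(exp(2*alpha) + exp(-2*alpha))
print(sp.simplify(sol[d]))   # equivalent to exp(+alpha)/(exp(2*alpha) + exp(-2*alpha))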

1.6 Intermezzo: Lagrangian Mechanics


In this intermezzo we discuss a connection between classical mechanics and the Euler-
Lagrange equation. Our treatment is not very detailed. We assume familiarity with the basics
of classical mechanics.

Let q ∈ Rⁿ denote the position in Cartesian coordinates of some point mass. Newton's
second law states that the derivative of momentum \frac{d}{dt}(m\dot q(t)) of a point mass with mass m
equals the net force F(t) exerted on the mass,

\frac{d}{dt}(m \dot q(t)) = F(t).    (1.46)
In a conservative force field (such as the gravitational force field) the force depends on q alone
and can be expressed as the negative of a gradient, F(q) = −∂U(q)/∂q, of some function U :
Rⁿ → R called the potential energy. Then the equation of motion (1.46) becomes \frac{d}{dt}(m\dot q(t)) =
−∂U(q(t))/∂q. We reorder it as

\underbrace{- \frac{\partial}{\partial q} U(q(t))}_{\text{net force}} - \underbrace{\frac{d}{dt}(m \dot q(t))}_{\text{derivative of momentum}} = 0.    (1.47)

This equation is precisely the Euler-Lagrange equation,

\left( \frac{\partial}{\partial q} - \frac{d}{dt} \frac{\partial}{\partial \dot q} \right) L(q(t), \dot q(t)) = 0    (1.48)
if we take L(q, q̇) equal to

L(q, q̇) = K (q̇) −U (q). (1.49)

Here K(q̇) is the kinetic energy⁶ defined by the Cartesian coordinates q as

K(\dot q) = \tfrac{1}{2} m (\dot q_1^2 + \dot q_2^2 + \cdots + \dot q_n^2).    (1.50)
Therefore, trajectories of a point mass q(t ) in Cartesian coordinates in a conservative force
field are solutions of the Euler-Lagrange equation for L(q, q̇) defined as the difference of ki-
netic and potential energy (1.49). An advantage of the Lagrangian formulation (1.48–1.50)
over the equivalent Newton’s law (1.46) is that L(q, q̇) is invariant under coordinate change.
This means that if we switch from Cartesian coordinates, q = (x, y) to, say, polar coordinates
q̃ = (r, φ) then (1.48) remains valid in the new coordinates q̃. This often simplifies modeling.
FIGURE 1.13: Pendulum

Example 1.6.1 (Pendulum). Consider the pendulum (Fig. 1.13). Here polar coordinates are
more appropriate than Cartesian coordinates. The mass of the particle is denoted m, the
angle with respect to the vertical hanging position is denoted φ(t) and the length of the cable is
ℓ. The kinetic energy then is

K(\dot\varphi) = \tfrac{1}{2} m \ell^2 \dot\varphi^2


6 In the literature kinetic energy is often denoted as T instead of K .

and the potential energy is

U(\varphi) = -m g \ell \cos(\varphi).

The Lagrangian L(q, q̇) = K(q̇) − U(q) therefore is

L(\varphi, \dot\varphi) = \tfrac{1}{2} m \ell^2 \dot\varphi^2 + m g \ell \cos(\varphi).

The angle φ is our (generalized) coordinate vector q. Now Euler-Lagrange says that the equation
of motion is determined by

0 = \left( \frac{\partial}{\partial \varphi} - \frac{d}{dt} \frac{\partial}{\partial \dot\varphi} \right) L(q(t), \dot q(t)) = -m g \ell \sin(\varphi(t)) - \frac{d}{dt}(m \ell^2 \dot\varphi(t)).

That is the familiar \ddot\varphi(t) = -\frac{g}{\ell} \sin(\varphi(t)). ä
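The same equation of motion can be generated mechanically; the sketch below is an addition to the text and assumes Python with sympy (which ships an euler_equations helper).

import sympy as sp
from sympy.calculus.euler import euler_equations

t, m, ell, g = sp.symbols('t m ell g', positive=True)
phi = sp.Function('phi')

# Lagrangian L = K - U of the pendulum of Example 1.6.1
L = sp.Rational(1, 2) * m * ell**2 * phi(t).diff(t)**2 + m * g * ell * sp.cos(phi(t))

# Euler-Lagrange equation (1.48) in the generalized coordinate phi
eq = euler_equations(L, phi(t), t)[0]
print(sp.simplify(eq.lhs / (m * ell**2)))
# -g*sin(phi(t))/ell - Derivative(phi(t), (t, 2)),  i.e.  phi'' = -(g/ell) sin(phi)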

In this context the cost functional


\int_0^T L(q(t), \dot q(t))\, dt

is referred to as the action integral and Newton’s second law just means that trajectories q(t )
are stationary functions of this action integral. This stationary property is known as Hamil-
ton’s principle.
Since L(q, q̇) does not explicitly depend on time, we have according to Beltrami that

\frac{\partial L(q(t), \dot q(t))}{\partial \dot q^T} \dot q(t) - L(q(t), \dot q(t))    (1.51)

is constant over time for all trajectories. Exploiting the quadratic nature of the kinetic en-
ergy (1.50) this constant function is

\frac{\partial L(q(t), \dot q(t))}{\partial \dot q^T} \dot q(t) - L(q(t), \dot q(t)) = \frac{\partial K(\dot q(t))}{\partial \dot q^T} \dot q(t) - K(\dot q(t)) + U(q(t))
= 2K(\dot q(t)) - K(\dot q(t)) + U(q(t))
= K(\dot q(t)) + U(q(t)).

It is nothing but the sum of kinetic and potential energy, i.e. the total energy. This proves that
the total energy K (q̇) +U (q) is preserved over time for all trajectories in a conservative force
field.
It is easy to see that the Beltrami function (1.51) equals the Hamiltonian H̃ (q, p) defined
as

H̃ (q, p) = p T q̇ − L(q, q̇) (1.52)

for p equal to the (generalized) momentum defined as

p = \frac{\partial L(q, \dot q)}{\partial \dot q}.    (1.53)

By the way, for p equal to the momentum (1.53), the Hamiltonian H̃ (q, p) defined in (1.52)
is indeed a function of q, p alone (and not q̇) because ∂H̃/∂q̇ = p − ∂L(q, q̇)/∂q̇ = 0. Since
the Beltrami function is constant over time, obviously also the Hamiltonian H̃(q, ∂L(q, q̇)/∂q̇) is
constant over time. With Hamiltonians and generalized momenta we can expand the Euler-Lagrange
equation (1.48) into two first-order differential equations, known as the Hamiltonian equations

\dot q(t) = \frac{\partial \tilde H(q(t), p(t))}{\partial p}, \qquad \dot p(t) = -\frac{\partial \tilde H(q(t), p(t))}{\partial q}.
Example 1.6.2 (Newton’s apple). Consider an apple of mass m subject to a downwards grav-
itational acceleration g . With y the upwards position, the difference of kinetic and potential
energy becomes

L(y, \dot y) = \tfrac{1}{2} m \dot y^2 - m g y.

Now the momentum by definition is p = ∂L(y, ẏ)/∂ẏ = mẏ and the Hamiltonian is

\tilde H(y, p) = p \dot y + m g y - \tfrac{1}{2} m \dot y^2 = \frac{p^2}{2m} + m g y.

As predicted, the dependence on ẏ cancels in H̃. The two first-order differential equations
(i.e. the Hamiltonian equations) are

\dot y(t) = \frac{\partial \tilde H(y(t), p(t))}{\partial p} = \frac{1}{m} p(t), \qquad \dot p(t) = -\frac{\partial \tilde H(y(t), p(t))}{\partial y} = -m g.

The total energy is H̃(y, mẏ) = ½mẏ² + mgy. ä

Example 1.6.3 (Pendulum). In the pendulum example (Example 1.6.1) we took the angle q :=
φ as (generalized) coordinate, and we found the Lagrangian
L(\varphi, \dot\varphi) = \frac{m}{2} \ell^2 \dot\varphi^2 + m g \ell \cos(\varphi).

Its generalized momentum p (with respect to this q = φ) is

p := \frac{\partial L(q, \dot q)}{\partial \dot\varphi} = m \ell^2 \dot\varphi

which is known as the angular momentum. Its Hamiltonian is

\tilde H(\varphi, p) = p \dot\varphi - \frac{m}{2} \ell^2 \dot\varphi^2 - m g \ell \cos(\varphi) = \frac{p^2}{2 m \ell^2} - m g \ell \cos(\varphi).

As predicted, q̇ := φ̇ cancels in the Hamiltonian. The total energy equals

\tilde H(\varphi, m \ell^2 \dot\varphi) = \frac{m}{2} \ell^2 \dot\varphi^2 - m g \ell \cos(\varphi)

and the Hamiltonian equations are

\dot\varphi(t) = p(t)/(m \ell^2), \qquad \dot p(t) = -m g \ell \sin(\varphi(t)).
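Numerically integrating these Hamiltonian equations illustrates the conservation of the total energy derived above. The sketch is an addition to the text; it assumes Python with scipy and the (arbitrary) values m = ℓ = 1, g = 9.81 and initial state φ(0) = 1, p(0) = 0.

import numpy as np
from scipy.integrate import solve_ivp

m, ell, g = 1.0, 1.0, 9.81

def hamiltonian(phi, p):
    return p**2 / (2 * m * ell**2) - m * g * ell * np.cos(phi)

def rhs(t, state):                       # Hamiltonian equations of Example 1.6.3
    phi, p = state
    return [p / (m * ell**2),            # phi-dot =  dH/dp
            -m * g * ell * np.sin(phi)]  # p-dot   = -dH/dphi

sol = solve_ivp(rhs, (0.0, 10.0), [1.0, 0.0], rtol=1e-10, atol=1e-12)
H = hamiltonian(sol.y[0], sol.y[1])
print(H.max() - H.min())   # tiny (integration error only): the total energy is conserved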

We stop the digression here.

1.7 Second order conditions for minimality
The Euler-Lagrange equation was derived from the condition that optimal solutions x ∗ (·) are
necessarily stationary solutions, i.e. solutions for which

J (x ∗ (·) + αδx (·)) = J (x ∗ (·)) + o(α)

for every admissible perturbation δx (·). Now stationary solutions need not be minimizing
solutions. To be minimizing the above term “o(α)” needs to be nonnegative in a neighbor-
hood of α = 0. In this section we analyze this problem. We derive a necessary condition and
a sufficient condition for stationary solutions to be minimizing. These conditions are second-
order conditions and they require a second-order Taylor series expansion of F (t , x, y) for fixed
t around (x, y):

F(t, x + \delta_x, y + \delta_y)
= F(t, x, y) + \begin{bmatrix} \frac{\partial F(t,x,y)}{\partial x^T} & \frac{\partial F(t,x,y)}{\partial y^T} \end{bmatrix} \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix}
+ \frac{1}{2} \begin{bmatrix} \delta_x^T & \delta_y^T \end{bmatrix} \underbrace{\begin{bmatrix} \frac{\partial^2 F(t,x,y)}{\partial x \partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial x \partial y^T} \\ \frac{\partial^2 F(t,x,y)}{\partial y \partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial y \partial y^T} \end{bmatrix}}_{\text{Hessian}} \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} + o\!\left( \left\| \begin{bmatrix} \delta_x \\ \delta_y \end{bmatrix} \right\|^2 \right).

We assume that F (t , x, y) is C 2 so then the Hessian exists and is symmetric.

Theorem 1.7.1 (Legendre condition – 2nd order necessary condition). Consider the sim-
plest problem in the calculus of variations and suppose Assumptions 1.2.2 and 1.2.3 are sat-
isfied. Let x∗(t) be a solution of the Euler-Lagrange equation (1.10), meeting the boundary
conditions (1.9). Necessary for x∗(t) to be minimizing is that

\frac{\partial^2 F(t, x^*(t), \dot x^*(t))}{\partial \dot x \partial \dot x^T} \ge 0    ∀t ∈ [0, T].    (1.54)
Proof. For ease of notation we prove it for the case that x has one component. Similar to
the proof of Thm. 1.2.4, let δx (t ) be a C 2 -perturbation on [0, T ] that satisfies the boundary
conditions (1.11). Let α ∈ R. Defining J¯(α) as

J¯(α) := J (x ∗ (·) + αδx (·))

we have by construction that every solution x ∗ (·) of the Euler-Lagrange equation makes
J¯0 (0) = 0. Using the second order Taylor series of J¯(α) at α = 0 we find (and skipping time
arguments) that
 2 
Z T ∂ F (t ,x ∗ ,ẋ ∗ ) ∂2 F (t ,x ∗ ,ẋ ∗ ) · ¸
£ ¤
J¯00 (0) = δx δ̇x  2
∂x 2 ∂x∂ẋ  δx dt (1.55)
2
0 ∂ F (t ,x ∗ , ẋ ∗ ) ∂ F (t ,x ∗ , ẋ ∗ ) δ̇x
∂x∂ẋ ∂ẋ 2
| {z }
Hessian
Z T 2 2 2
∂ F 2 ∂ F ∂ F 2
= δ
∂x 2 x
+ 2 ∂x∂ ẋ δx δ̇x + ∂ẋ 2 δ̇x dt . (1.56)
0

For an optimal x ∗ (·) this has to be nonnegative for every allowable δx (·). This does not imply
that the Hessian is positive semi-definite because δx (·) and δ̇x (·) are related. Indeed, using

integration by parts the cross term can be simplified as
\int_0^T 2 \frac{\partial^2 F}{\partial x \partial \dot x} \delta_x \dot\delta_x\, dt = \int_0^T \frac{\partial^2 F}{\partial x \partial \dot x} \left( \frac{d}{dt} \delta_x^2 \right) dt = \underbrace{\left. \frac{\partial^2 F}{\partial x \partial \dot x} \delta_x^2 \right|_0^T}_{0} - \int_0^T \left( \frac{d}{dt} \frac{\partial^2 F}{\partial x \partial \dot x} \right) \delta_x^2\, dt.

Therefore

\bar J''(0) = \int_0^T \left[ \frac{\partial^2 F}{\partial x^2} - \frac{d}{dt} \frac{\partial^2 F}{\partial x \partial \dot x} \right] \delta_x^2 + \frac{\partial^2 F}{\partial \dot x^2} \dot\delta_x^2\, dt.    (1.57)

Lemma 1.7.2 below shows that (1.57) is non-negative for every possible δx(·) only if ∂²F/∂ẋ² is
nonnegative definite for the candidate x∗(·) and all time, i.e. only if (1.54) holds. ■

Lemma 1.7.2 (Technical lemma). Let φ(t ) and ψ(t ) be continuous functions on [0, T ] and
suppose that
\int_0^T \varphi(t) \delta_x^2(t) + \psi(t) \dot\delta_x^2(t)\, dt \ge 0    (1.58)

for all C 2 -functions δx (t ) with δx (0) = δx (T ) = 0. Then

ψ(t ) ≥ 0 ∀t ∈ [0, T ]. (1.59)

Proof. Suppose, to obtain a contradiction, that ψ(t̄) < 0 for some t̄ ∈ [0, T]. Then for every
ε > 0 we can construct a possibly small interval [a, b] about t̄ in [0, T] and a C²-function δx(·)
on [0, T] that is zero for t ∉ [a, b] and satisfies

\int_a^b \delta_x^2(t)\, dt < \varepsilon    and    \int_a^b \dot\delta_x^2(t)\, dt > 1.

This may be clear from Figure 1.14. Such a δx(t) satisfies all the conditions of the lemma but
renders the integral in (1.58) negative for small enough ε > 0. That is a contradiction hence
the assumption that ψ(t̄) < 0 is wrong. ■

FIGURE 1.14: About the construction of a δx(t) that violates (1.58), see the proof of Lemma 1.7.2

This second order condition (1.54) is known as the Legendre condition. Notice that the
inequality (1.54) means that ∂²F(t, x∗(t), ẋ∗(t))/∂ẋ∂ẋᵀ (which is an n × n matrix if x has n components)
is a symmetric positive semi-definite matrix at every moment in time.

Example 1.7.3 (Example 1.1.2 continued). From (1.4) we find

F(t, \ell, \dot\ell) = \alpha \dot\ell^2 + \beta \ell    (1.60)

and so

\frac{\partial^2 F(t, \ell, \dot\ell)}{\partial \dot\ell^2} = 2\alpha.    (1.61)

It is given that α > 0 so the Legendre condition is satisfied. ä

As remarked earlier, a maximization problem is obtained from a minimization problem
by changing the sign of F (t , x, ẋ).
In the preceding two examples the Legendre condition was easy to verify because the sec-
ond derivative turned out to be trivially nonnegative for all t , x, ẋ and not just for the optimal
t , x ∗ (t ), ẋ ∗ (t ).
The Euler-Lagrange condition together with the Legendre condition are necessary but are
not sufficient for minimality. This is confirmed by the next example.

Example 1.7.4 (Stationary solution, but not a minimizer). The Euler-Lagrange equation for
the minimization of
\int_0^1 \left( \frac{\dot x(t)}{2\pi} \right)^2 - x^2(t)\, dt    (1.62)

is the differential equation (2π)2 x(t ) + ẍ(t ) = 0. Given the boundary conditions

x(0) = x(1) = 0

the solutions of the differential equation are

x(t ) = A sin(2πt ), A ∈ R.

Each of these solutions x(t ) meets the Legendre condition (1.54) since

\frac{\partial^2 F(t, x(t), \dot x(t))}{\partial \dot x^2} = \frac{2}{(2\pi)^2} > 0.

Also, each such x(t ) renders the integral in (1.62) equal to zero. There are however many other
functions x(t ) that achieve x(0) = x(1) = 0 but for which the integral (1.62) takes a negative
value. For example x(t ) = −t 2 + t . By scaling this last function with a constant we can make
the cost as negative as we desire. In this case there is no optimal solution x ∗ (t ). ä
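To make the last claim concrete (an addition to the text, assuming Python with sympy): for the scaled perturbation x(t) = λ(t − t²) the cost (1.62) is a negative multiple of λ², so it can be made as negative as we like.

import sympy as sp

t, lam = sp.symbols('t lam', real=True)
x = lam * (t - t**2)              # satisfies x(0) = x(1) = 0 but is not of the form A*sin(2*pi*t)

J = sp.integrate((x.diff(t) / (2 * sp.pi))**2 - x**2, (t, 0, 1))
print(sp.simplify(J))             # lam**2/(12*pi**2) - lam**2/30  < 0, unbounded below as lam grows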

The proof of the Legendre condition actually provides us with an elegant sufficient condi-
tion for optimality, in fact for global optimality. If the Hessian, defined earlier as
 
H(t, x, y) := \begin{bmatrix} \frac{\partial^2 F(t,x,y)}{\partial x \partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial x \partial y^T} \\ \frac{\partial^2 F(t,x,y)}{\partial y \partial x^T} & \frac{\partial^2 F(t,x,y)}{\partial y \partial y^T} \end{bmatrix},    (1.63)

is positive semi-definite for all x ∈ Rn and all y ∈ Rn and all t ∈ [0, T ] then at each t the running
cost F (t , x(t ), ẋ(t )) is convex in x(t ), ẋ(t ). For convex functions it is known that stationarity
implies global optimality:

Theorem 1.7.5 (Convexity). Consider the simplest problem in the calculus of variations and
suppose that F (t , x, y) is C 2 . If the Hessian (1.63) is positive semi-definite for all x, y ∈ Rn and
all t ∈ [0, T ] then every solution x ∗ (·) of the Euler-Lagrange equation that meets the boundary
conditions is a global optimal solution. If the Hessian is positive definite then this x ∗ (·) is the
unique optimal solution.

Proof. Suppose that the Hessian is positive semi-definite. Let x ∗ (·), x(·) be two functions that
satisfy the boundary conditions and suppose x ∗ (·) satisfies Euler-Lagrange. Let δ(t ) = x(t ) −
x ∗ (t ) and define J¯(α) = J (x ∗ (·) + αδ(·)). This way J¯(0) = J (x ∗ (·)) while J¯(1) = J (x(·)). We need
to prove that J¯(1) ≥ J¯(0).

As before we have that J¯0 (0) is zero by the fact that x ∗ (·) satisfies the Euler-Lagrange equa-
tion.
The second derivative of J¯(α) with respect to α is (skipping time arguments)
\bar J''(\alpha) = \int_0^T \begin{bmatrix} \delta & \dot\delta \end{bmatrix} H(t, x^* + \alpha\delta, \dot x^* + \alpha\dot\delta) \begin{bmatrix} \delta \\ \dot\delta \end{bmatrix} dt.

Since H (t , x, y) is positive semi-definite for all x, y ∈ Rn and all time, we see that J¯00 (α) ≥ 0 for
all α ∈ R. Therefore for every β ≥ 0 there holds
\bar J'(\beta) = \bar J'(0) + \int_0^\beta \bar J''(\alpha)\, d\alpha \ge \bar J'(0) = 0.

But then \bar J(1) = \bar J(0) + \int_0^1 \bar J'(\beta)\, d\beta \ge \bar J(0) which is what we had to prove.

Next suppose that H(t, x, y) is positive definite and that x(·) ≠ x∗(·). Then δ(·) is not the
zero function and so by positive definiteness of H(t, x, y) we have J̄″(α) > 0 for every α ∈ [0, 1].
Then J(x(·)) = J̄(1) > J̄(0) = J(x∗(·)) for every x(·) ≠ x∗(·). ■

This result produces a lot, but also requires a lot. Indeed the convexity assumption fails
in many cases of interest. Here are a couple examples where the convexity assumption is
satisfied.

Example 1.7.6 (Example 1.1.3 continued). In the notation of shortest path Example 1.1.3 we
have F(x, y, ẏ) = √(1 + ẏ²) and so we find that

\frac{\partial^2 F(x, y, \dot y)}{\partial \dot y^2} = \frac{1}{(1 + \dot y^2)^{3/2}} > 0.    (1.64)
It is positive for all time and all y, ẏ, in particular for solutions (y(t ), ẏ(t )) of the Euler-Lagrange
equation. This implies that the solution found in Example 1.2.7 – namely the line through the
points (x₀, y₀) and (x₁, y₁) – satisfies Legendre's condition. The Hessian (1.63) is

H(x, y, \dot y) = \begin{bmatrix} 0 & 0 \\ 0 & (1 + \dot y^2)^{-3/2} \end{bmatrix} \ge 0.

It is positive semi definite hence the straight-line solution is globally optimal. No surprise. ä

Example 1.7.7. For the quadratic cost


\int_{-1}^{1} \alpha^2 x^2(t) + \dot x^2(t)\, dt

as used in Example 1.5.2 the Hessian is constant


H(t, x, \dot x) = \begin{bmatrix} 2\alpha^2 & 0 \\ 0 & 2 \end{bmatrix}.

Clearly this is positive definite for every α ≠ 0 and hence the solution of the Euler-Lagrange
equation found in Example 1.5.2 is the unique global optimal solution of the problem. ä

One can pose many different types of problems in the calculus of variations by giving
different boundary conditions, for instance involving ẋ(T ), or by imposing further constraints
on the required solution. An example of the latter is presented in Example 1.1.2, see (1.5) and
we deal with another one in the next section. The Legendre condition (1.54) is only one of
the second order conditions for optimality. Additional second order conditions go under the
names of Weierstrass and Jacobi.

FIGURE 1.15: Three areas enclosed by ropes of the same length

1.8 Integral constraints


An interesting generalization is when the function x(·) that is to minimize the cost
J(x(\cdot)) := \int_0^T F(t, x(t), \dot x(t))\, dt

is not free to choose but is subject to an integral constraint


C(x(\cdot)) := \int_0^T M(t, x(t), \dot x(t))\, dt = c_0.

The standard example of this type is a version of Queen Dido’s isoperimetric problem, which
is the problem to enclose an area as large as possible by a rope of given length. Intuition tells
us that the optimal area is a disc (the right-most option in Fig. 1.15). To put it more math-
ematically, in Dido’s problem we want to find a function x : [0, T ] → R with given boundary
values x(0) = x 0 , x(T ) = x T , that maximizes the area
J(x(\cdot)) = \int_0^T x(t)\, dt

subject to the constraint that


\int_0^T \sqrt{1 + \dot x^2(t)}\, dt = \ell

for a given ℓ.
How to solve such constraint minimization problems? A quick-and-dirty argument goes
as follows: from basic calculus it is known that the solution of a minimization problem of
some function J (x(·)) subject to a constraint C (x(·)) − c 0 = 0 is a stationary solution of the
Lagrangian defined as

\bar J(x(\cdot), \mu) := J(x(\cdot)) + \mu (C(x(\cdot)) - c_0) = \int_0^T F(t, x(t), \dot x(t)) + \mu M(t, x(t), \dot x(t))\, dt - \mu c_0

for some constant Lagrange parameter⁷ µ. The stationary solutions (x∗(·), µ∗) of J̄(x(·), µ) according
to Euler-Lagrange satisfy

\left( \frac{\partial}{\partial x} - \frac{d}{dt} \frac{\partial}{\partial \dot x} \right) \left( F(t, x^*(t), \dot x^*(t)) + \mu^* M(t, x^*(t), \dot x^*(t)) \right) = 0.

Below we formally prove that this argument is, in essence, correct. This may sound a bit
vague, but it does put us on the right track. The theorem presented next is motivated by
the above but the proof is given from scratch. The proof assumes knowledge of the Inverse
Function Theorem.
7 Lagrange parameters are usually denoted as λ. We use µ in order to avoid confusion in the next chapter.

Theorem 1.8.1 (Euler-Lagrange for integral-constraint minimization). Let c₀ be some constant.
Suppose that F(t, x, y) and M(t, x, y) are C² in all of their components and that x∗(·) is a
minimizer of

\int_0^T F(t, x(t), \dot x(t))\, dt

subject to

\int_0^T M(t, x(t), \dot x(t))\, dt = c_0

and that x ∗ (·) is C 2 . Then either there is a constant Lagrange parameter µ∗ ∈ R such that
    (∂/∂x − d/dt · ∂/∂ẋ) (F(t, x∗(t), ẋ∗(t)) + µ∗M(t, x∗(t), ẋ∗(t))) = 0        (1.65)

or M (·) satisfies the Euler-Lagrange equation itself,


    (∂/∂x − d/dt · ∂/∂ẋ) M(t, x∗(t), ẋ∗(t)) = 0.        (1.66)

Proof. This is not an easy proof. Suppose x ∗ (·) solves the constrained minimization problem
and fix two C² functions δx(·), εx(·) that vanish at the boundaries:

    δx(0) = 0 = εx(0),   δx(T) = 0 = εx(T).
Define J(x(·)) = ∫_0^T F(t, x(t), ẋ(t)) dt and C(x(·)) = ∫_0^T M(t, x(t), ẋ(t)) dt and consider the mapping that sends two real numbers (α, β) to the two real numbers

    J̄(α, β) := J(x∗(·) + αδx(·) + βεx(·)),
    C̄(α, β) := C(x∗(·) + αδx(·) + βεx(·)).

If the Jacobian

    JAC := [ ∂J̄(α, β)/∂α   ∂J̄(α, β)/∂β
             ∂C̄(α, β)/∂α   ∂C̄(α, β)/∂β ]   evaluated at (α, β) = (0, 0)        (1.67)

of this mapping is nonsingular at (α, β) = (0, 0) then by the inverse function theorem there is
a neighborhood of (α, β) = (0, 0) on which the mapping is invertible. In particular we then
can find small enough α, β such that C̄(α, β) = C̄(0, 0) = c₀ (hence satisfying the integral
constraint) but rendering a cost J̄(α, β) smaller than J̄(0, 0) = J(x∗(·)). This contradicts that
x∗(·) is minimizing. Conclusion: at an optimal x∗(·) the Jacobian (1.67) is singular for every
allowable δx(·), εx(·).
We rewrite the Jacobian (1.67) in terms of F (·) and M (·). To this end let f (t ) and m(t )
denote the functions
    f(t) = (∂/∂x − d/dt · ∂/∂ẋ) F(t, x∗(t), ẋ∗(t)),
    m(t) = (∂/∂x − d/dt · ∂/∂ẋ) M(t, x∗(t), ẋ∗(t)).

This way the Jacobian (1.67) becomes (verify this yourself)
    JAC = [ ∫ f(t)δx(t) dt    ∫ f(t)εx(t) dt
            ∫ m(t)δx(t) dt    ∫ m(t)εx(t) dt ].        (1.68)

If m(t) = 0 for all t then (1.66) holds and the proof is complete. It remains to consider the case
that m(t₀) ≠ 0 for at least one t₀. Suppose, to obtain a contradiction, that given such a t₀ there
is a t for which

    [ f(t₀)   f(t)
      m(t₀)   m(t) ]        (1.69)

is nonsingular. Now take δx(·) to have support around t₀ and εx(·) to have support around
t. Then by nonsingularity of (1.69) also (1.68) is nonsingular if the supports are taken small
enough. However, nonsingularity of the Jacobian is impossible by the fact that x∗(·) solves the
minimization problem. Therefore (1.69) is singular at every t. This means that
f (t 0 )m(t ) − f (t )m(t 0 ) = 0 ∀t .
In other words f (t ) + µ∗ m(t ) = 0 for µ∗ = − f (t 0 )/m(t 0 ) for all t . ■

The theorem says that the solution x ∗ (·) satisfies either (1.65) or (1.66). The first of these
two is often called the normal case, and the second the abnormal case. The next example
indeed suggests that we usually have the normal case.
Example 1.8.2 (Normal & abnormal). Consider minimizing ∫_0^1 x(t) dt subject to the boundary conditions x(0) = 0, x(1) = 1 and integral constraint

    ∫_0^1 ẋ²(t) dt = C        (1.70)
for some given C . The (normal) Euler-Lagrange equation (1.65) says that
    0 = (∂/∂x − d/dt · ∂/∂ẋ)(x(t) + µẋ²(t)) = 1 − d/dt (2µẋ(t)) = 1 − 2µẍ(t).

The general solution of this equation is x(t) = t²/(4µ) + bt + c and these satisfy the boundary
conditions x(0) = 0, x(1) = 1 iff

    x(t) = t²/(4µ) + (1 − 1/(4µ)) t.

With this form the integral constraint (1.70) becomes


    C = ∫_0^1 ẋ²(t) dt = ∫_0^1 ( t/(2µ) + 1 − 1/(4µ) )² dt = 1 + 1/(48µ²).        (1.71)
If the constraint value C is less than one then no µ exists, and it is not hard to see that in this
case there is no smooth function from x(0) = 0 to x(1) = 1 that meets the integral constraint
with C < 1 (See Exercise 1.16). For C > 1 there are two µ’s that satisfy (1.71), namely µ = ±1/√(48(C − 1)), and the resulting two functions x(·) (for C = 2) then are one curve above and one below the straight line from (0, 0) to (1, 1): the choice µ < 0 gives the curve above that line, and µ > 0 the curve below it. [figure omitted]

Clearly, out of these two, the cost J(x(·)) := ∫_0^1 x(t) dt is minimal for the positive µ.
In the abnormal case (1.66) we have that
    0 = (∂/∂x − d/dt · ∂/∂ẋ) ẋ²(t) = −2ẍ(t).
Hence x(t ) = bt + c. Given the boundary conditions x(0) = 0, x(1) = 1 it is immediate that this
allows for only one solution: x(t) = t, the straight line from (0, 0) to (1, 1).
Now ẋ(t) = 1 and the integral constraint is C = ∫_0^1 ẋ²(t) dt = 1. This corresponds to µ = ∞. It is
the case where the constraint is tight. There are, so to say, no degrees of freedom left to shape
the function. ä
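For readers who want to see the two candidates of Example 1.8.2 side by side, the following small Python sketch (not part of the derivation; the grid and the value C = 2 are illustrative choices) checks the integral constraint and compares the costs of the two stationary functions.

import numpy as np

t = np.linspace(0.0, 1.0, 100001)

def trapezoid(f, t):
    return float(np.sum((f[1:] + f[:-1]) / 2 * np.diff(t)))

def candidate(mu):
    # x(t) = t^2/(4 mu) + (1 - 1/(4 mu)) t, the normal-case solution
    x = t**2 / (4 * mu) + (1 - 1 / (4 * mu)) * t
    xdot = t / (2 * mu) + (1 - 1 / (4 * mu))
    return trapezoid(xdot**2, t), trapezoid(x, t)   # constraint value, cost

for mu in (+1 / np.sqrt(48), -1 / np.sqrt(48)):     # the two mu's for C = 2
    C, J = candidate(mu)
    print(f"mu = {mu:+.4f}: constraint = {C:.4f}, cost = {J:.4f}")

# prints (up to discretisation error): constraint 2.0000 for both,
# cost 0.2113 for mu > 0 and cost 0.7887 for mu < 0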

1.9 Exercises
1.1 Find the solutions x : [0, T ] → R of the Euler-Lagrange equation for
Z T
J= F (t , x(t ), ẋ(t )) dt
0

and

(a) F (t , x, ẋ) = ẋ 2 − α2 x 2
(b) F (t , x, ẋ) = ẋ 2 + 2x
(c) F (t , x, ẋ) = ẋ 2 + 4t ẋ
(d) F (t , x, ẋ) = ẋ 2 + x ẋ + x 2
(e) F (t , x, ẋ) = t ẋ 2 − x ẋ + x
(f) F (t , x, ẋ) = g (t )ẋ 2 − h(t )x 2
(g) F (t , x, ẋ) = x 2 + 2t x ẋ (this one is curious.)

1.2 Consider minimization of


Z 1
ẋ 2 (t ) + 12 t x(t ) dt , x(0) = 0, x(1) = 1
0

over all functions x : [0, 1] → R.

(a) Determine the Euler-Lagrange equation for this problem


(b) Solve the Euler-Lagrange equation

1.3 Consider minimization of


Z 1
ẋ 2 (t ) − x(t ) dt , x(0) = 0, x(1) = 1
0

over all functions x : [0, 1] → R.

(a) Determine the Euler-Lagrange equation for this problem
(b) Solve the Euler-Lagrange equation
(c) Is Legendre’s second-order condition satisfied?
(d) Is the convexity condition (Thm. 1.7.5) satisfied?
(e) Show that the solution x ∗ (·) of Euler-Lagrange is globally optimal.

1.4 Consider minimization of


Z 1
J (x(·)) = ẋ 2 (t ) + x 2 (t ) + 2t x(t ) dt
0

with initial and final condition

x(0) = 0, x(1) = 1

over all functions x : [0, 1] → R.

(a) Determine the Euler-Lagrange equation for this problem


(b) Solve the Euler-Lagrange equation
(c) Is Legendre’s second-order condition satisfied?
(d) Let x ∗ (t ) be the solution of the Euler-Lagrange equation. Show that
Z 1
J (x ∗ (·) + δx (·)) = J (x ∗ (·)) + δ2x (t ) + δ̇2x (t ) dt
0

for every continuously differentiable function δx (·) with δx (0) = δx (1) = 0, and con-
clude that x ∗ (·) is globally optimal.
(e) Is the convexity condition (Thm. 1.7.5) satisfied?


F IGURE 1.16: Solid of least resistance (Exercise 1.5)

1.5 A simplified Newton’s minimal resistance problem. Consider a solid of revolution with
diameter y(x) as shown in Fig. 1.16. If the air flows with a constant speed v (as in the
figure) then the total air resistance (force) can be modeled as
    4πρv² ∫_0^{x₁} ( y(x) ẏ³(x) / (1 + ẏ²(x)) ) dx.

The question is: given y(0) = 0 and y(x 1 ) = h > 0 (some constant h) for which curve is
the resistance minimal? Now we are going to cheat! To make the problem a lot easier
we simply discard the quadratic term in the denominator of the running cost: consider
the cost function
    J(y(·)) := 4πρv² ∫_0^{x₁} y(x) ẏ³(x) dx.

Given the boundary conditions y(0) = 0 and y(x₁) = h, show that

    y(x) = h (x/x₁)^{3/4}
is a solution of the Beltrami-identity with the given boundary conditions. (This function
y(x) is depicted in Fig. 1.16.)
(Notice that y(x) is not differentiable at x = 0 so formally the theorem of Euler-Lagrange
does not apply. But that’s nitpicking.)

1.6 Technical problem: the lack of Lipschitz continuity in the Beltrami identity for the
Brachistochrone problem, and how to circumvent it. The footnote of Example 1.3.1 de-
rives the cycloid equations (1.28) from the Beltrami identity with initial condition

c 2 = y(x)(1 + ẏ 2 (x)), y(0) = 0. (1.72)

The derivation was quick and this exercise shows that it was a bit dirty as well.
(a) Let x(φ), y(φ) be the cycloid solution (1.28). Use the identity dy/dx = (dy/dφ)/(dx/dφ) to show that they satisfy (1.72).
(b) The curve of this cycloid solution for φ ∈ [0, 2π] is a single arch: it starts at (0, 0), reaches its largest depth y = c² at x = c²π/2, and returns to y = 0 at x = c²π. [figure omitted]

From this solution we construct a new solution by inserting a constant part of some length ∆ ≥ 0 in the middle: the curve is kept at the constant depth c² on the interval [c²π/2, c²π/2 + ∆], after which the second half of the arch continues, shifted to the right, ending at x = c²π + ∆. [figure omitted]

Argue that for every ∆ ≥ 0 also this new function satisfies the Beltrami iden-
tity (1.72) for all x ∈ (0, c 2 π + ∆).
(c) This is not what the footnote of Example 1.3.1 says. What goes wrong in this foot-
note?
(d) This new function y(x) is constant over the interval [c²π/2, c²π/2 + ∆]. Show that a constant function y(x) does not satisfy the Euler-Lagrange equation of the Brachistochrone problem.
(e) It can be shown that y(x) solves (1.72) if-and-only-if it is of this new form for
some ∆ ≥ 0 (possibly ∆ = ∞). Argue that the only function that satisfies the Euler-
Lagrange equation with y(0) = 0 is the cycloid solution (1.28).

1.7 Technical problem: the lack of Lipschitz continuity in the minimal-surface problem, and
how to circumvent it. In Example 1.3.2 we claimed that y(x) = a cosh(x/a) is the only
positive even solution of (1.29). That is not correct. In this problem we assume that
a > 0.

(a) Show that the differential equation

        dy(x)/dx = √(y²(x)/a² − 1)

    is not Lipschitz-continuous at y = a. (See Appendix B.) Hence we can expect multiple solutions when y(x) = a.
(b) Show that (1.29) can be separated as

        dy / √(y²/a² − 1) = dx

(c) If y(x 0 ) > a, show that y(x) = a cosh((x − c)/a) around x = x 0 for some c.
(d) Argue that y(x) is a solution of (1.29) iff it is pieced together from a hyperbolic
cosine, a constant, and a hyperbolic cosine again, as in

        y(x) = a cosh((x − c)/a)   if x < c,
        y(x) = a                   if x ∈ [c, d],
        y(x) = a cosh((x − d)/a)   if x > d.

Here c ≤ d . (Notice that for x ∈ [c, d ] the value of y(x) is a at which point the
differential equation is not Lipschitz.)
(e) If c < d then on the strip [c, d ] the function y(x) is a constant (equal to a > 0). Show
that this does not satisfy the Euler-Lagrange equation. (Recall that the Beltrami
identity may have more solutions than the Euler-Lagrange equation.)
(f) Verify that y(x) = a cosh(x/a) is the only function that satisfies the Euler-Lagrange
equation of the minimal-surface problem (Example 1.3.2) and that has the sym-
metry property that y(−1) = y(+1).

1.8 Technical problem: from function to parameterized smooth curve. Consider the minimal
surface problem of Example 1.3.2. The Goldschmidt two-disc solution is not a function
y(x) and therefore it is not a surprise that it does not show up as a stationary trajectory
of our problem. Nevertheless it may be optimal. Instead of writing y as a function of x
we now express (x(t ), y(t )) as a function of some parameter t . Such a parameterization
of a given curve is of course highly non-unique. For any such parameterization the
length of the graph traced out over an infinitesimal strip [t , t + dt ] is
q
ẋ 2 (t ) + ẏ 2 (t ) dt

and hence the total area of the surface of revolution is


Z L q
J (y(·)) = 2πy(t ) ẋ 2 (t ) + ẏ 2 (t ) dt
0

Here L is the “time” at which x(L) equals the end-point. We do not know L but this is
not crucial.

(a) Use the Beltrami identity to show that any stationary curve (x(t ), y(t )) satisfies
q
y(t )ẋ 2 (t ) = a ẋ 2 (t ) + ẏ 2 (t )

for some constant a and all t ∈ (0, L).

(b) Show that for a = 0 Beltrami gives us the Goldschmidt solution.

For completeness we mention that if a ≠ 0 then the right-hand side is nonzero for an
appropriate parameterization (one for which √(ẋ²(t) + ẏ²(t)) > 0 for all t) and hence that
then ẋ(t) ≠ 0 for every t. As a result one can locally express t as a function of x which
brings us back to the parameterization proposed in Example 1.3.2 and therefore the
hyperbolic cosine solution. In summary: the only piecewise differentiable curves that
are stationary are those hyperbolic cosines and the Goldschmidt two-disc solution.

1.9 Show that the minimal surface example (Example 1.3.2) satisfies the second-order ne-
cessity condition of Thm. 1.7.1.

1.10 In this exercise we prove Proposition 1.5.1.

(a) For G(x) = S(x) = 0 the first-order conditions are that (1.37) holds for all possible
perturbations. Adapt this equation for the case that G(x) and/or S(x) or not zero.
(b) Prove that this equality implies that the Euler-Lagrange equation holds.
(c) Finish the proof of Proposition 1.5.1.

1.11 The optimal solar challenge. The solar vehicle receives power from solar radiation. This
power p(x, t ) depends on position x (due to clouds) and on time t (due to change of
clouds and sun’s angle of inclination). Driving at some speed v(t ) = ẋ(t ) also consumes
power. Denote this powerloss by f (ẋ) and assume that it is a function of speed alone.
This is reasonable if we do not change speed aggressively and if friction depends only
on speed. Now driving at higher speed is known to require more energy per meter than
driving at lower speed. This means that f (·) is convex, in fact that both first and second
derivative f 0 (·) and f 00 (·) are strictly positive,

f (·) ≥ 0, f 0 (·) > 0, f 00 (·) > 0.

Suppose the solar team starts at

x(0) = 0

and at time T it wants to be at some position x(T ) = x T and of course all that using
minimal net energy
Z T
f (ẋ(t )) − p(x(t ), t ) dt .
0

(a) Derive the Euler-Lagrange equation.


(b) Argue from the Euler-Lagrange equation that we should speed up if we drive into
a cloud.
(c) Is Legendre’s second order condition for minimality satisfied?
(d) From now on assume that

f (ẋ) = ẋ 2

(this is actually quite reasonable, modulo scaling) and that p(x, t ) does not depend
on time,

p(x, t ) = q(x),

i.e. that the sun’s angle does not change much over our time window [0, T ] and
that clouds are not moving. Use the Beltrami Identity to express ẋ(t ) in terms of
q(x(t )) and the initial speed ẋ(0) and initial q(0).

(e) Argue once again (but now using the explicit relation of the previous part) that
when we drive into a cloud then we should speed up.
(f) A computer might be useful for this part. Continue with f (ẋ) = ẋ 2 and p(x, t ) =
q(x). Now finally suppose that up to position x = 20 the sky is clear but that from
x = 20 onwards heavy clouds limit the power input:
        q(x) = 100 if x < 20,   q(x) = 4 if x > 20.

Determine the optimal speed profile ẋ(t ) that brings us from x(0) = 0 in T = 7 to
x(7) = 90.

1.12 Free end-point. Minimize


Z 1
2
x (1) + ẋ 2 (t ) dt
0

over all functions x(·) subject to x(0) = 1 and free end-point x(1).

1.13 Free end-point. Let T = 1 and consider minimization of


Z 1
J (x(·)) = ẋ 2 (t ) − 2x(t )ẋ(t ) − ẋ(t ) dt .
0

with initial condition x(0) = 1 and free end-point x(1).

(a) Show that no function exists that satisfies Euler-Lagrange with x(0) = 1 and the
free end-point constraint (1.42).
(b) Conclude that there is no C 2 function x(·) that minimizes J (x(·)) subject to x(0) =
x 0 and free end-point.
(c) Determine all functions x(·) that satisfy Euler-Lagrange and with x(0) = 1. Then
compute J (x(·)) explicitly and conclude, once more, that the free end-point prob-
lem has no solution.

1.14 Smoothness. Consider minimization of


Z 1
J (x(·)) = (1 − ẋ(t ))2 x 2 (t ) dt
−1

with boundary conditions

x(−1) = 0, x(1) = 1.

(a) Show that J (x(·)) ≥ 0 for every function x(·).


(b) Determine a continuous optimal solution x ∗ (·) and argue that it is unique. [Hint:
J (x ∗ (·)) = 0 and do not use Euler-Lagrange or Beltrami.]
(c) Is there a continuously differentiable optimal solution x ∗ (·)?

The exercise shows that smooth running costs F (·) may result in non-smooth optimal
solutions x ∗ (·).

1.15 The hanging cable. Any flexible free-hanging cable comes to a halt in a position of minimal energy.

What is the shape of this minimal energy position? When hanging still it has no kinetic
energy, it only has potential energy. If the cable is very flexible then the potential energy
is only due to its height y(x). We assume that the cable is very thin, does not stretch
and that is has a constant mass per unit length. In a constant gravitational field with
gravitational acceleration g the potential energy J (y(·)) is
Z x1 q
J (y(·)) = ρg y(x) 1 + ẏ 2 (x) dx,
x0

with ρ the mass per unit length of the cable.


Show that the minimal energy solution y(x) has the form

y(x) = a cosh((x − b)/c) + d

for some constant a ≥ 0 and b, c ∈ R. (Hint: we considered a very similar running cost
elsewhere in these notes.)

1.16 Consider Example 1.8.2. Prove that for C < 1 there is no smooth function that satisfies
the boundary conditions and integral constraint.
RT RT
1.17 Minimize 0 ẋ 2 (t ) dt subject to x(0) = x(T ) = 0 and 0 x 2 (t ) dt = 1.

Chapter 2

Minimum Principle

2.1 Optimal control


In the solar-challenge problem (Exercise 1.11) we assumed that we could choose the speed
ẋ(t ) of the car at will, but in reality the speed is limited by the dynamics of the car. For in-
stance, the acceleration of the car is bounded. In this chapter we take these dynamical con-
straints into account. We assume that the state satisfies a differential equation

ẋ(t ) = f (x(t ), u(t )), x(0) = x 0 (2.1)

and that we cannot choose x(t) directly but can only choose the input u(t). In fact, the input
may itself be restricted to take values in some possibly limited set U,

u : [0, T ] → U.

For example U = [−1, 1] or U = Rm or U = [0, ∞). In the solar challenge example, the input
u(t ) might be the throttle opening and this takes values in between u(t ) = 0 (fully closed) and
u(t ) = 1 (fully open).
For a given U and given (2.1), the optimal control problem is to find a control input u :
[0, T ] → U that minimizes a cost function of the form
    J(x₀, u(·)) := S(x(T)) + ∫_0^T L(x(t), u(t)) dt.        (2.2)
Here S : Rⁿ → R and L : Rⁿ × U → R. The part S(x(T)) is called the terminal or final cost,
L(x(t), u(t)) is commonly called the running cost, and the optimal u(·) is referred to as the
optimal control, often denoted with a star, u∗(·).
Using the calculus of variations of the previous chapter, combined with the classic idea
of Lagrange parameters, we derive first-order conditions that any optimal control must sat-
isfy. Motivated by these first-order conditions we then formulate the truly fabulous Minimum
Principle of Pontryagin. This result shocked the world when it appeared in the late fifties,
early sixties of the previous century. The Minimum Principle is very general and gives us
necessary conditions for a control to be optimal. In many applications these conditions are
numerically tractable. But be warned: the proof of the Minimum Principle is complicated!
A variation of the optimal control problem is to fix the final state vector x(T ) to a given
x T . Clearly in this case there is no need for a final cost S(x(T )) in that every input results in
the same final cost. In this case the optimal control problem is to find, for a given (2.1) and
x T and U, an input u : [0, T ] → U that minimizes
    ∫_0^T L(x(t), u(t)) dt        (2.3)

subject to x(T ) = x T . In the final section of this chapter we consider the optimal control
problem where also the final time T is variable and where the cost is to be minimized over all
allowable inputs u(·) as well as all T > 0.

2.2 Summary of the classic Lagrange multipliers


The optimal control problem is a minimization problem subject to dynamical constraints.
The classic way of dealing with constraints is to introduce Lagrange multipliers. This short
section provides a quick summary of this method.
Consider minimizing a cost function L : Rn → R over a constraint set of Rn determined as
the zeros of some function G : Rn → Rk :
    min_{z∈Rⁿ} L(z)   subject to   G(z) = 0.        (2.4)

The method of Lagrange multipliers can help to find minimizers. The idea is to solve the
unconstrained minimization problem of an associated cost function defined as

K (z, λ) = λT G(z) + L(z).

This function K : Rn × Rk → R is known as the Lagrangian and the λ’s are known as La-
grange multipliers and they live in Rᵏ. Assuming K(·) is sufficiently smooth, the first-order
conditions of the unconstrained minimization of K(z, λ) over all z and λ are that both gradients are zero at the solution:

    ∂K(z∗, λ∗)/∂z = 0,   ∂K(z∗, λ∗)/∂λ = 0.        (2.5)
As the Lagrangian is linear in λ it is immediate that the gradient of K(z, λ) with respect to
λ is Gᵀ(z). Hence (z∗, λ∗) is a stationary solution of K(z, λ) only if the constraints hold,
G(z∗) = 0, and then K(z∗, λ∗) = L(z∗). Under mild assumptions on G(·) the first-order
conditions (2.5) are equivalent to the first-order conditions of the constrained minimization
problem (2.4). A more detailed discussion can be found in Appendix A.7.
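As a reminder of how this works in finitely many variables, the following short Python/SymPy sketch solves a toy instance of (2.4)-(2.5); the particular functions L(z) = z₁² + z₂² and G(z) = z₁ + z₂ − 1 are our own illustrative choice.

import sympy as sp

z1, z2, lam = sp.symbols('z1 z2 lam', real=True)
L = z1**2 + z2**2            # cost
G = z1 + z2 - 1              # constraint G(z) = 0
K = lam * G + L              # Lagrangian K(z, lam)

# first-order conditions (2.5): all partial derivatives of K are zero
stationarity = [sp.diff(K, v) for v in (z1, z2, lam)]
print(sp.solve(stationarity, (z1, z2, lam), dict=True))
# [{z1: 1/2, z2: 1/2, lam: -1}]: the constrained minimizer and its multiplier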
For the optimal control problem a similar approach will be taken, however with the added
complication that we are not dealing with a minimization over a finite number of parameters,
z ∈ Rn , but over uncountably many functions u(·), x(·), and the constraints are the dynamical
constraints ẋ(t ) = f (x(t ), u(t )) and these need to be satisfied for all time t ∈ [0, T ].

2.3 First-order conditions for optimal control


We return to the optimal control problem of minimizing a cost
    J(x₀, u(·)) = S(x(T)) + ∫_0^T L(x(t), u(t)) dt.        (2.6)

subject to

ẋ(t ) = f (x(t ), u(t )), x(0) = x 0 , (2.7)

and in this section we do not restrict the inputs,

U = Rm .

The optimal control problem can be regarded as a constrained optimization problem,
with (2.7) being the dynamical constraint. This observation provides a clue to its solution:
introduce Lagrange multiplier functions p : [0, T ] → Rn corresponding to these dynamical
constraints. Analogous to the classic Lagrange multiplier method we define the Lagrangian
K (·) as

K (x, ẋ, u, p) = p T ( f (x, u) − ẋ) + L(x, u). (2.8)

Now the first objective is to determine the first-order conditions for K (·), i.e. the condi-
tions that stationary solutions

q(·) := (x(·), p(·), u(·))

of the unconstrained problem with cost


    S(x(T)) + ∫_0^T K(x(t), ẋ(t), u(t), p(t)) dt        (2.9)

must satisfy. Before we delve into the resulting Euler-Lagrange equation, it is interesting to
figure out what the Beltrami identity gives us. Indeed our K (·) is of the form K (q, q̇) and so
does not explicitly depend on time. Therefore Beltrami applies and it says that

    K(q(t), q̇(t)) − q̇ᵀ(t) ∂K(q(t), q̇(t))/∂q̇

is constant over time for the stationary solutions. For our K (q, q̇) this constant function takes
the form

    K(q, q̇) − q̇ᵀ ∂K(q, q̇)/∂q̇
      = K(q, q̇) − ( ẋᵀ ∂K(q, q̇)/∂ẋ + ṗᵀ ∂K(q, q̇)/∂ṗ + u̇ᵀ ∂K(q, q̇)/∂u̇ )
      = pᵀ( f(x, u) − ẋ) + L(x, u) − (−ẋᵀp + 0 + 0)
      = pᵀ f(x, u) + L(x, u).

The final function is known as the Hamiltonian and it plays a central role in optimal control.

Lemma 2.3.1 (Hamiltonian equations). Let U = Rm and x 0 ∈ Rn . The smooth enough sta-
tionary functions (x(·), p(·), u(·)) with x(0) = x 0 of the cost (2.9), where K (·) is defined as
in (2.8), satisfy

    ẋ(t) = ∂H(x(t), p(t), u(t))/∂p,   x(0) = x₀        (2.10a)
    ṗ(t) = −∂H(x(t), p(t), u(t))/∂x,   p(T) = ∂S(x(T))/∂x        (2.10b)
    0 = ∂H(x(t), p(t), u(t))/∂u.        (2.10c)

Here the Hamiltonian H (x, p, u) is defined as

H (x, p, u) = p T f (x, u) + L(x, u). (2.11)

Proof. The stationary solutions are those that satisfy the Euler-Lagrange equation together
with the boundary conditions of Proposition 1.5.1. Define K (q, q̇) as in (2.8) with q := (x, p, u)
and notice that K (q, q̇) in terms of the Hamiltonian H (x, p, u) is

K (q, q̇) = H (q) − p T ẋ.

For ease of exposition we momentarily drop the arguments of all functions. The Euler-Lagrange equation 0 = (∂/∂q − d/dt · ∂/∂q̇)K holds component-wise. For component x it says

    0 = (∂/∂x − d/dt · ∂/∂ẋ)(H − pᵀẋ) = ∂H/∂x + ṗ.

Hence ṗ = −∂H/∂x. For component p it says

    0 = (∂/∂p − d/dt · ∂/∂ṗ)(H − pᵀẋ) = ∂H/∂p − ẋ.

Hence ẋ = ∂H/∂p. For component u it says

    0 = (∂/∂u − d/dt · ∂/∂u̇)(H − pᵀẋ) = ∂H/∂u.
The free final-point (aka free end-point) condition (1.41) becomes 0 = ∂S(x(T))/∂q + ∂K(q(T), q̇(T))/∂q̇ and per component this is

    0 = ∂S(x(T))/∂x + ∂K(q(T), q̇(T))/∂ẋ = ∂S(x(T))/∂x − p(T);
    0 = ∂S(x(T))/∂p + ∂K(q(T), q̇(T))/∂ṗ = 0 + 0;
    0 = ∂S(x(T))/∂u + ∂K(q(T), q̇(T))/∂u̇ = 0 + 0.

The first says p(T) = ∂S(x(T))/∂x and the other two are void.
Since we have an initial condition on x but not on p and u, the free initial-point conditions (1.40) on q need to hold for the components p and u. The initial-point conditions become 0 = ∂K(q(0), q̇(0))/∂q̇ and for the respective components p and u this gives

    0 = ∂K(q(0), q̇(0))/∂ṗ = 0,   and   0 = ∂K(q(0), q̇(0))/∂u̇ = 0.

These conditions are void. ■

The differential equations (2.10a, 2.10b) are known as the Hamiltonian equations. Note
that
    ∂H(x, p, u)/∂p = f(x, u),

so the first Hamiltonian equation (2.10a) is nothing else than the system equation ẋ(t ) =
f (x(t ), u(t )), x(0) = x 0 .
The Lagrange multiplier p(t ) is called the costate (because mathematically it lives in a dual
space to the (variations) of the state x(t )). In examples it often has interesting interpretations
– shadow prices in economics and contact forces in mechanical systems – in terms of the
sensitivity of the minimized cost function. This is already illustrated by the condition p∗(T) = ∂S(x∗(T))/∂x, which means that p∗(T) equals the sensitivity of the final time cost with respect to
variations in the optimal state at the final time. Later we will see that

    p(0) = ∂J(x₀, u∗(·))/∂x₀,

where J (x 0 , u ∗ (·)) is the optimal cost for initial state x 0 , see § 3.5. A large p(0) hence means
that the optimal cost might be very sensitive to changes in the initial state.

2.4 Minimum Principle


Based on the classic Lagrangian technique one would conjecture that solutions to the opti-
mal control problem, with U = Rm , must satisfy the first-order conditions for the extended
problem (Lemma 2.3.1). This means that, given an optimal control u ∗ (t ) and correspond-
ing optimal state trajectory x ∗ (t ), one would conjecture the existence of a function p ∗ (t ) that
satisfies
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x
and such that (x ∗ (t ), p ∗ (t ), u ∗ (t )) satisfies (2.10c). If we can guarantee all that then we would
have proved:

Proposition 2.4.1 (Unconstrained inputs). Suppose that U = Rm and that u ∗ : [0, T ] → U is a


solution of the optimal control problem, and x ∗ (·) the resulting optimal state trajectory. Then
there exists a function p ∗ : [0, T ] → Rn such that

    ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t))/∂p,   x∗(0) = x₀,        (2.12a)
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x,        (2.12b)
    0 = ∂H(x∗(t), p∗(t), u∗(t))/∂u.        (2.12c)
ä

If only that were true. Well, it is true (under some mild smoothness assumption)! In fact
it holds in a far more general setting. The following celebrated theorem by Pontryagin and
coworkers provides a necessary condition for solutions of the true minimization problem (not
just stationary ones), and it can even deal with restricted sets U! The basic feature is that it
replaces the first-order optimality condition (2.12c) with a true minimization condition. Here
is the famous result, it is the central result of this chapter:

Theorem 2.4.2 (Minimum Principle). Consider a differential equation ẋ(t ) = f (x(t ), u(t ))
and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are all continuous in
x and u.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and assume it is
bounded and piecewise continuous. Let x ∗ : [0, T ] → Rn be the resulting optimal state. Given
such u ∗ (·) there is a unique function p ∗ : [0, T ] → Rn that satisfies

    ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t))/∂p,   x∗(0) = x₀,        (2.13a)
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x,   p∗(T) = ∂S(x∗(T))/∂x        (2.13b)

and along the solution x ∗ (t ), p ∗ (t ), the input u ∗ (t ) at every t ∈ [0, T ] where the input is con-
tinuous minimizes the Hamiltonian:

H (x ∗ (t ), p ∗ (t ), u ∗ (t )) = min H (x ∗ (t ), p ∗ (t ), u). (2.14)


u∈U

Proof. (This proof requires a couple of technical results regarding continuity of solutions of
differential equations. Upon first reading these can be discarded but for a full understanding
you should have a look at Appendix B.)
Let u ∗ (·) be an optimal input, and let x ∗ (·) be the corresponding optimal state. First notice
that the costate equations are linear in the costate:

ṗ(t ) = A(t )p(t ) + b(t ), p(T ) = ∂S(x ∗ (T ))/∂x

where A(t ) := −∂ f (x ∗ (t ), u ∗ (t ))/∂x T and b(t ) := −∂L(x ∗ (t ), u ∗ (t ))/∂x. By assumption both A(t )
and b(t ) are piecewise continuous and bounded and so the solution p ∗ (t ) exists, is continu-
ous and is unique.
Now assume, to obtain a contradiction, that at some time t̄ ∈ [0, T ) where the input is
continuous, a ū ∈ U exists that achieves a smaller Hamiltonian H (x ∗ (t̄ ), p ∗ (t̄ ), ū) than u ∗ (t̄ )
does. Then, because of continuity, for some small enough ² > 0 the function defined as
(
ū if t ∈ [t̄ , t̄ + ²],
ū(t ) = (2.15)
u ∗ (t ) elsewhere

achieves a smaller (or equal) Hamiltonian for all time, and


Z T
H (x ∗ (t ), p ∗ (t ), ū(t )) − H (x ∗ (t ), p ∗ (t ), u ∗ (t )) dt = c² + o(²)
0

for the negative number c = H (x ∗ (t̄ ), p ∗ (t̄ ), ū) − H (x ∗ (t̄ ), p ∗ (t̄ ), u ∗ (t̄ )). Now write ū(t ) as a per-
turbation of the optimal input,

ū(t ) = u ∗ (t ) + δu (t ).

The so defined perturbation δu(t) = ū(t) − u∗(t) has a support of length ²: it is zero everywhere except on the short interval [t̄, t̄ + ²]. [figure omitted]

In the rest of the proof we fix this perturbation and we only consider very small ². Such per-
turbations are called “needle” perturbations.
By perturbing the input, ū(t ) = u ∗ (t ) + δu (·), the solution of ẋ(t ) = f (x(t ), u ∗ (t ) + δu (t ))
perturbs as well. Denote the perturbed state as x(·) = x ∗ (·) + δx (·). The perturbation δx (t ) is
probably not a needle but at each moment in t it is of order1 ². The derivative of this δx (·)
satisfies

δ̇x (t ) = (ẋ ∗ (t ) + δ̇x (t )) − ẋ ∗ (t )


= f (x ∗ (t ) + δx (t ), u ∗ (t ) + δu (t )) − f (x ∗ (t ), u ∗ (t )). (2.16)
1 For t ≤ t̄ we have δ (t ) = 0. For t ∈ [t̄ , t̄ + ²] we have kδ (t )k = kx(t ) − x (t )k = kx(t ) − x(t̄ ) − (x (t ) − x(t̄ ))k ≤
x x ∗ ∗
kx(t ) − x(t̄ )k + kx ∗ (t ) − x(t̄ )k = k(t − t̄ )( f (x(t̄ ), u(t̄ )))k + k(t − t̄ )( f (x(t̄ ), u(t̄ )))k + o(t − t̄ ) ≤ M ² for some M > 0 and all
small enough ² > 0. So at t = t̄ + ² the solutions x(t ) and x ∗ (t ) differ, in norm, at most M ². Now for t > t̄ + ² apply
Lemma B.1.6 with g (t ) = 0.

This expression we soon need. To avoid clutter we now drop all time arguments, that is, x(t )
is simply denoted as x, et cetera. Also in the equations that follow the approximate identity ≈
means equal up to an o(²) term. Let ∆ be the change in cost, ∆ = J (x 0 , u ∗ + δu ) − J (x 0 , u ∗ ). We
have

∆ = J (x 0 , u ∗ + δu ) − J (x 0 , u ∗ )
Z T
= S(x ∗ (T ) + δx (T )) − S(x ∗ (T )) + L(x ∗ + δx , u ∗ + δu ) − L(x ∗ , u ∗ ) dt
0
Z T
∂S(x ∗ (T ))
≈ δ x (T ) + L(x ∗ + δx , u ∗ + δu ) − L(x ∗ , u ∗ ) dt .
∂x T 0

Next use that L(x, u) = −p T f (x, u) + H (x, p, u) and let p be the optimal costate p ∗ :
Z T
∂S(x ∗ (T ))
∆≈ δ x (T ) + −p ∗T [ f (x ∗ + δx , u ∗ + δu ) − f (x ∗ , u ∗ )] dt
∂x T 0
Z T
+ H (x ∗ + δx , p ∗ , u ∗ + δu ) − H (x ∗ , p ∗ , u ∗ ) dt .
0

The term in between square brackets according to (2.16) is δ̇x , so


Z T
∂S(x ∗ (T ))
∆≈ δ x (T ) + −p ∗T δ̇x + H (x ∗ + δx , p ∗ , u ∗ + δu ) − H (x ∗ , p ∗ , u ∗ + δu ) dt
∂x T 0
Z T
+ H (x ∗ , p ∗ , u ∗ + δu ) − H (x ∗ , p ∗ , u ∗ ) dt .
0

Here we also subtracted and added a term H (x ∗ , p ∗ , u ∗ + δu ). The reason is that now the
difference of the first two Hamiltonian terms can be recognized as an approximate partial
derivative with respect to x, and the difference of the final two terms is what we considered
earlier (it equals c² + o(²)), so:
Z T
∂S(x ∗ (T )) ∂H (x ∗ , p ∗ , u ∗ + δu )
∆≈ δx (T ) + −p ∗T δ̇x + δx dt + c².
∂x T 0 ∂x T

∗ ∗ ∗∂H (x ,p ,u +δ )
u ∗ ∗ ∗ ∂H (x ,p ,u )
Notice that the partial derivative ∂x T equals −ṗ ∗ = ∂x T almost everywhere
(except for ² units of time). Combined with the fact that δx at each moment in time is also of
order ² we have that
Z T
∂S(x ∗ (T ))
∆≈ δ x (T ) + −p ∗T δ̇x − ṗ ∗T δx dt + c².
∂x T 0

The integrand −p ∗T δ̇x − ṗ ∗T δx we recognize as the total derivative of −p ∗T δx with respect to


time. Now it is better to add the time dependence again:

    ∆ ≈ ∂S(x∗(T))/∂xᵀ · δx(T) + [ −p∗ᵀ(t) δx(t) ]_{t=0}^{t=T} + c²
      = ( ∂S(x∗(T))/∂xᵀ − p∗ᵀ(T) ) δx(T) + p∗ᵀ(0) δx(0) + c² = c² + o(²).

The first term is zero because of the final condition on the costate, p∗(T) = ∂S(x∗(T))/∂x.

Here we used that δx (0) = 0. This is because of the boundary condition x(0) = x 0 . Since c < 0
we see that ∆ is negative for small enough ². But that would mean that ū(·) for small enough
² achieves a smaller cost than optimal. Not possible. Hence the assumption that u ∗ (t ) does
not minimize the Hamiltonian at every moment in time is wrong. ■

This theory of optimal control was developed in the Soviet Union in the fifties of the 20th
century and, to honour its main contributor, it is often called the Pontryagin Minimum Principle (or Pontryagin Maximum Principle if we would have considered maximization instead of
minimization). A drawback of the Minimum Principle is that it assumes the existence of an
optimal control u ∗ (t ), and only then guarantees that u ∗ (t ) minimizes the Hamiltonian at each
moment in time. In practical situations, though, it often is this minimization that determines
the optimal control u ∗ (t ).

Example 2.4.3. Consider the system

ẋ(t ) = u(t ), x(0) = x 0 , (2.17)

with cost
    J(x₀, u(·)) = ∫_0^1 x(t) dt.        (2.18)

So we have that f (x, u) = u, S(x) = 0, and L(x, u) = x. As input set we choose U = [−1, 1]. The
Hamiltonian follows as

H (x, p, u) = pu + x,

and the equations for the optimal costate then are

    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t))/∂x = −1,   p∗(1) = ∂S(x∗(T))/∂x = 0.        (2.19)
Clearly this means that the costate is

p ∗ (t ) = 1 − t .

The optimal input u∗(t) – assuming it exists – at each t ∈ [0, 1] minimizes the Hamiltonian
p∗(t)u + x∗(t), that is, we need to solve

    min_{u∈[−1,1]} p∗(t)u + x∗(t).        (2.20)

Since p∗(t) = 1 − t > 0 for all t ∈ [0, 1), the minimum in (2.20) is attained at the smallest element of U,

u ∗ (t ) = −1 ∀t .
This makes perfect sense: to minimize ∫_0^1 x(t) dt we want x(t) to go down as fast as possible
which given the system dynamics ẋ(t ) = u(t ) means taking u(t ) as negative as possible: u(t ) =
−1.
The situation changes qualitatively if we add a final cost S(x(1)) = −½x(1): consider the cost

    J(x₀, u(·)) = −½ x(1) + ∫_0^1 x(t) dt.

Now it is not obvious what to do with u(t) because the faster x(t) goes down the larger the
final cost −½x(1) is going to be. So possibly u(t) = −1 is no longer optimal. In fact, we will see
that it is not optimal. The costate equations now are

    ṗ(t) = −1,   p(1) = ∂S(x∗(1))/∂x = −½

and therefore
    p∗(t) = ½ − t.

It is positive for 0 ≤ t < ½ but negative for ½ < t ≤ 1. Hence the optimal control – still assuming
it exists – solves (2.20) and, therefore, switches at t = 1/2,

    u∗(t) = −1 if 0 ≤ t < ½,   u∗(t) = +1 if ½ ≤ t ≤ 1.

Apparently it is now optimal to move x(t ) down as fast as possible over the first half of the
time interval and then back up as fast as possible over the second half. ä
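The switching structure of this example is easy to confirm numerically. The following Python sketch (our own check, with x₀ = 0) compares all controls that equal −1 up to some switch time s and +1 afterwards; the best switch time indeed comes out as s = 1/2.

import numpy as np

x0 = 0.0
t = np.linspace(0.0, 1.0, 10001)

def cost(s):
    u = np.where(t < s, -1.0, 1.0)
    # integrate xdot = u and the running cost with the trapezoidal rule
    x = x0 + np.concatenate(([0.0], np.cumsum((u[1:] + u[:-1]) / 2 * np.diff(t))))
    running = np.sum((x[1:] + x[:-1]) / 2 * np.diff(t))
    return -0.5 * x[-1] + running          # final cost plus running cost

switch_times = np.linspace(0.0, 1.0, 101)
costs = [cost(s) for s in switch_times]
print("best switch time:", switch_times[int(np.argmin(costs))])   # ~ 0.5
print("best cost:", min(costs))                                   # ~ x0/2 - 1/4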

Example 2.4.4. Consider the system

ẋ(t ) = u(t ), x(0) = x 0

with cost
    J(x₀, u(·)) = ∫_0^1 x²(t) + u²(t) dt.

Now we allow any u(t) ∈ R. Notice that this is the same cost function as in Example 1.5.2 be-
cause u(t ) = ẋ(t ). The associated Hamiltonian is

H (x, p, u) = pu + x 2 + u 2 .

Since H (x, p, u) is quadratic in u and U = R the minimizing u is the one at which the gradient
of H (x, p, u) with respect to u is zero. This yields

    u∗(t) = −½ p∗(t),        (2.21)
and the Hamiltonian equations (2.13) then become

    ẋ∗(t) = −½ p∗(t),   x∗(0) = x₀,
    ṗ∗(t) = −2x∗(t),   p∗(1) = 0,

i.e., p̈ ∗ (t ) = p ∗ (t ). The solution is

p ∗ (t ) = c 1 et +c 2 e−t .

Since x∗(t) = −½ ṗ∗(t) we find that

    x∗(t) = −½ c₁ eᵗ + ½ c₂ e⁻ᵗ.
The two constants c 1 , c 2 follow uniquely from the two constraints x ∗ (0) = x 0 and p ∗ (1) = 0
(verify this yourself) and it gives
    x∗(t) = x₀/(e + e⁻¹) · ( e^{1−t} + e^{t−1} ),
    p∗(t) = 2x₀/(e + e⁻¹) · ( e^{1−t} − e^{t−1} )
and then u ∗ (t ) follows from (2.21). ä
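The same answer can be obtained numerically by solving the two-point boundary-value problem ẋ = −p/2, ṗ = −2x, x(0) = x₀, p(1) = 0. A possible SciPy sketch (our own cross-check of the formulas above, with x₀ = 1) is:

import numpy as np
from scipy.integrate import solve_bvp

x0 = 1.0

def odes(t, y):                      # y[0] = x, y[1] = p
    return np.vstack((-0.5 * y[1], -2.0 * y[0]))

def bc(ya, yb):                      # x(0) = x0 and p(1) = 0
    return np.array([ya[0] - x0, yb[1]])

t = np.linspace(0.0, 1.0, 50)
sol = solve_bvp(odes, bc, t, np.zeros((2, t.size)))

x_exact = x0 * (np.exp(1 - t) + np.exp(t - 1)) / (np.e + np.exp(-1))
p_exact = 2 * x0 * (np.exp(1 - t) - np.exp(t - 1)) / (np.e + np.exp(-1))
print(np.max(np.abs(sol.sol(t)[0] - x_exact)))   # both errors are tiny
print(np.max(np.abs(sol.sol(t)[1] - p_exact)))
# the optimal input is then u*(t) = -p*(t)/2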

Example 2.4.5 (Optimal reinvestment). Let x(t ) be the production rate of, say, gold of some
mining company. At each moment in time a fraction u(t ) ∈ [0, 1] of the gold is reinvested in
the company to increase the production rate. This can be modeled as

ẋ(t ) = αu(t )x(t ), x(0) = x 0 , u(t ) ∈ [0, 1],

where α is some positive parameter that models the success of investment. Since we reinvest
u(t )x(t ), the net production rate available for the market is (1 − u(t ))x(t ). After T units of
time we want the net total production ∫_0^T (1 − u(t))x(t) dt to be as large as possible. In our
setup (with minimization) it means that we want

    J(x₀, u(·)) := ∫_0^T (u(t) − 1) x(t) dt

to be as small as possible. The Hamiltonian is

H (x, p, u) = (u − 1)x + pαux = ux(1 + αp) − x (2.22)

and the Hamiltonian equations become

ẋ(t ) = αu(t )x(t ), x(0) = x 0 ,


ṗ(t ) = (1 − u(t )) − p(t )αu(t ), p(T ) = 0.

These differential equations are, in its present form, still hard to solve. However our Hamilto-
nian (2.22) is linear in u(t ) so the minimizer u ∗ (t ) ∈ [0, 1] of the Hamiltonian (2.22) depends
solely on the sign of x(t )(1 + αp(t )). The production rate x(t ) is inherently positive (because
x(0) = x 0 > 0 and ẋ(t ) = αu(t )x(t ) ≥ 0) therefore the Hamiltonian is minimized for
    u∗(t) = 0 if 1 + αp∗(t) > 0,   u∗(t) = 1 if 1 + αp∗(t) < 0.

The value of the costate p ∗ (t ) where this u ∗ (t ) switches is p ∗ (t ) = −1/α, see Fig. 2.1(left). Now
at t = T we have p ∗ (T ) = 0, so near the final time T we have u ∗ (t ) = 0 (invest nothing, sell all)
and then the Hamiltonian dynamics reduces to ẋ(t ) = 0 and

ṗ(t ) = 1 and p(T ) = 0 near t = T .

That is, p(t ) = t − T near t = T , see Fig. 2.1. Solving backwards in time, starting at t = T , we
see that the costate reduces linearly, until at time

t s := T − 1/α

it reaches the level p(t s ) = −1/α < 0 at which point u ∗ (t ) switches sign. Since ṗ(t ) > 0 for
every input, the value of p(t ) is less than −1/α for t < t s which, in turn, implies that u ∗ (t ) = 1
for all t < t s . For this case the Hamiltonian dynamics simplifies to

ẋ(t ) = αx(t ), ṗ(t ) = −αp(t ) ∀t < t s .

Both x(t ) and p(t ) now have exponential solutions. The combination of before-and-after-
switch is shown in Fig. 2.1.
Notice that if t s < 0 then on the time window [0, T ] no switch takes place. It is then op-
timal to invest nothing and sell everything throughout [0, T ]. This happens if α < 1/T and
the interpretation is that the success of investment α is then too small to benefit from invest-
ment. If, on the other hand, α > 1/T then t s > 0 and then investment is beneficial and the
above shows that it is optimal to first invest all and in the final 1/α time units to sell all. Of
course this model is a simplification of reality. ä

F IGURE 2.1: Optimal costate p ∗ (t ), optimal input u ∗ (t ) and optimal state x ∗ (t ) (Example 2.4.5)
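A quick way to convince oneself of the switching time t_s = T − 1/α is to compare, for every candidate switch time s, the policy that reinvests everything on [0, s] and sells everything on [s, T]. Under such a policy the net total production is x₀ e^{αs}(T − s), and the small Python check below (our own, with illustrative numbers α = 0.5, T = 10) shows that this is maximal at s = T − 1/α.

import numpy as np

alpha, T, x0 = 0.5, 10.0, 1.0        # alpha > 1/T, so investing pays off

s = np.linspace(0.0, T, 100001)      # candidate switch times
net_production = x0 * np.exp(alpha * s) * (T - s)

print("best switch time     :", s[np.argmax(net_production)])
print("predicted T - 1/alpha:", T - 1 / alpha)     # both ~ 8.0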

An interesting consequence of the Hamiltonian form of the differential equations for x(t )
and p(t ) is that the Hamiltonian function H (x, p, u) = p T f (x, u) + L(x, u) is preserved along
optimal trajectories. For the unconstrained inputs U = Rᵐ this follows from the Beltrami iden-
tity but it may also verified directly from the first-order equations for optimality expressed in
Proposition 2.4.1. Indeed, let x ∗ (t ), p ∗ (t ), u ∗ (t ) denote an optimal triple satisfying the equa-
tions of Proposition 2.4.1. Then a direct computation yields (and for the sake of exposition
we momentarily skip all arguments of H and other functions)

    d/dt H = (∂H/∂x)ᵀ ẋ∗ + (∂H/∂p)ᵀ ṗ∗ + (∂H/∂u)ᵀ u̇∗
           = (∂H/∂x)ᵀ ẋ∗ + (∂H/∂p)ᵀ ṗ∗                     (since ∂H/∂u = 0)
           = (∂H/∂x)ᵀ (∂H/∂p) + (∂H/∂p)ᵀ (−∂H/∂x) = 0        (2.23)

for every solution x ∗ (t ), p ∗ (t ), u ∗ (t ) of (2.12c). In the next chapter we prove that the conser-
vation of the Hamiltonian H (x, p, u) along optimal trajectories also holds for restricted input
sets U (such as U = [0, 1] et cetera). This is quite remarkable because in such cases the input
often is discontinuous. The following example illustrates this property.

Example 2.4.6 (Example 2.4.3 continued). In Example 2.4.3 we considered ẋ(t) = u(t) with
initial condition x(0) = x₀ and cost J(x₀, u(·)) = −½x(1) + ∫_0^1 x(t) dt. We found that the optimal
costate trajectory is linear in time,

    p∗(t) = ½ − t,

and that the optimal input switches halfway,

    u∗(t) = −1 if 0 ≤ t < ½,   u∗(t) = +1 if ½ ≤ t ≤ 1.

Therefore the description of the optimal state trajectory also switches halfway. From ẋ(t) = u(t) it follows that

    x∗(t) = x₀ − t for 0 ≤ t ≤ ½,   x∗(t) = x₀ − 1 + t for ½ ≤ t ≤ 1.

Based on this one would perhaps think that the Hamiltonian then switches as well, but it does
not: the Hamiltonian is H (x, p, u) = pu + x and along optimal trajectories it is constant for all
time:

    H(x∗(t), p∗(t), u∗(t)) = p∗(t)u∗(t) + x∗(t)
                           = −(½ − t) + (x₀ − t)       if t < 1/2,
                           = (½ − t) + (x₀ − 1 + t)    if t ≥ 1/2,
                           = x₀ − ½   for all t.
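This constancy is easy to verify on a grid; a short Python check (our own, with x₀ = 0) is:

import numpy as np

x0 = 0.0
t = np.linspace(0.0, 1.0, 1001)
p = 0.5 - t
u = np.where(t < 0.5, -1.0, 1.0)
x = np.where(t < 0.5, x0 - t, x0 - 1 + t)
print(np.allclose(p * u + x, x0 - 0.5))      # True: H is constant along the optimum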

2.5 Optimal control with final constraints


There are quite a few applications where the final state x(T) is constrained. In the car parking
problem, for instance, we need the speed of the car to equal zero at the final time. There
are many such applications. Let r denote the number of components of the final state that
are constrained. Without loss of generality we assume these to be the first r components. So
consider the system with initial and final conditions

ẋ(t ) = f (x(t ), u(t )), x(0) = x 0 , x i (T ) = x̂ i , i = 1, . . . , r. (2.24)

Keep in mind that there are no conditions on the remaining final state components
x r +1 (T ), . . . , x n (T ). As before we take a cost of the form
    J(x₀, u(·)) = S(x(T)) + ∫_0^T L(x(t), u(t)) dt.        (2.25)

Lemma 2.3.1 (the first-order conditions for U = Rm ) can be generalized to this case as follows.
In the proof of this lemma, the conditions on the final costate
    p(T) = ∂S(x∗(T))/∂x
were derived from the free end-point condition (1.41), but in Proposition 1.5.1 we saw that
these conditions are absent if the final state is fixed. With that in mind it will be no surprise
that fixing the first r components of the state, x i (T ), i = 1, . . . , r implies that the conditions
on the corresponding first r components of the costate are absent, so only the remaining
components of p(T ) are fixed:
    p_i(T) = ∂S(x∗(T))/∂x_i,   i = r + 1, . . . , n.

That is indeed the case, normally. However, there can be a catch: the first-order conditions
were derived using a perturbation of the solution, but if both initial state and final state are
fixed then examples can be constructed where nonzero perturbations do not exist. For exam-
ple, if

ẋ(t ) = u 2 (t ), x(0) = 0, x(1) = 0

then the only control that steers x(0) = 0 to x(1) = 0 is the zero function u(t ) = 0 and any
perturbation of this u(t) is infeasible. Now matters become involved, and it would take way
too long to explain here how to resolve this problem. The interested reader might want to
consult the excellent book Liberzon (2012). Here we just provide the standard solution. The
solution involves the introduction of the modified Hamiltonian

H (x, p, u, λ) = p T f (x, u) + λL(x, u). (2.26)

It is the Hamiltonian but with an extra parameter λ and this parameter is either zero or one,

λ ∈ {0, 1}.

Note that H (x, p, u, 1) is the “normal” Hamiltonian, and that H (x, p, u, 0) completely neglects
the running cost L(x, u). The case λ = 0 is commonly referred to as the “abnormal” case,
indicating that it is not likely to happen in practice. With this modified Hamiltonian, the
Minimum Principle (Thm. 2.4.2) generalizes as follows.

Theorem 2.5.1 (Minimum Principle for constraint final state). Consider (2.24) with stan-
dard cost (2.25) and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are all
continuous in x and u.
Suppose u ∗ : [0, T ] → U is a solution of the optimal control problem, and assume it is
bounded and piecewise continuous. Let x ∗ : [0, T ] → Rn be the resulting optimal state. Then
there is a function p∗ : [0, T] → Rⁿ and a constant λ ∈ {0, 1} such that (λ, p∗(t)) ≠ (0, 0) for all
t ∈ [0, T], and
    ẋ∗(t) = ∂H(x∗(t), p∗(t), u∗(t), λ)/∂p,   x∗(0) = x₀,   x_i∗(T) = x̂_i, i = 1, . . . , r        (2.27a)
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t), λ)/∂x,   p_i∗(T) = ∂S(x∗(T))/∂x_i, i = r + 1, . . . , n        (2.27b)
and along the solution x ∗ (t ), p ∗ (t ) the input u ∗ (t ) at every t ∈ [0, T ] where it is continuous
minimizes the modified Hamiltonian:

H (x ∗ (t ), p ∗ (t ), u ∗ (t ), λ) = min H (x ∗ (t ), p ∗ (t ), u, λ). (2.28)


u∈U

Example 2.5.2 (Singular optimal control – an abnormal case). Consider the scalar system

ẋ(t ) = u 2 (t ), x(0) = 0, x(1) = 0,

with U = R and cost


    J(x₀, u(·)) = ∫_0^1 u(t) dt.

It is clear that the only feasible control is the zero function. So the minimal cost is 0, and
x ∗ (t ) = u ∗ (t ) = 0 for all time.

The modified Hamiltonian is H (x, p, u, λ) = pu 2 +λu. If we try to solve the normal Hamil-
tonian equations (2.27), (2.28) (so for λ = 1), we find that the costate is constant and that u ∗ (t )
at every t minimizes p ∗ (t )u 2 + u. But the true optimal control u ∗ (t ) = 0 does not minimize
p ∗ (t )u 2 + u.
For the abnormal case, λ = 0, the Hamiltonian equations again says that p ∗ (t ) is constant
but now that u ∗ (t ) at every t minimizes p ∗ (t )u 2 . Clearly for every positive constant p ∗ (t ) the
true optimal control u ∗ (t ) = 0 minimizes p ∗ (t )u 2 . ä

One more “abnormal” case is discussed in Exercise 2.12. All other examples in the chapter
are normal.

Example 2.5.3 (Shortest path). In the previous chapter we solved the (trivial) shortest path
problem by formulating it as an example of the simplest problem in the calculus of variations.
We now formulate this as an optimal control problem with final constraint. Let x(·) be a curve
through the points x(0) = a and x(T ) = b and assume T > 0. The length of the curve is
    ℓ(x(·)) = ∫_0^T √(1 + ẋ²(t)) dt.

We want to minimize this ℓ(x(·)). This can be seen as an optimal control problem for the
system

ẋ(t ) = u(t ), x(0) = a, x(T ) = b

with cost
    J(x₀, u(·)) = ∫_0^T √(1 + u²(t)) dt.

The modified Hamiltonian for this problem is


    H(x, p, u, λ) = pu + λ√(1 + u²).

If we apply Thm. 2.5.1, we find that p ∗ (·) is constant. We denote this constant as p̂. Substitu-
tion of λ = 1 in (2.28) and some rearrangements yield the following candidates for the optimal
input (verify this yourself)


    u∗(t) = −∞ if p̂ ≥ 1,
    u∗(t) = −p̂/√(1 − p̂²) if −1 < p̂ < 1,
    u∗(t) = ∞ if p̂ ≤ −1.

We can strike off the first and the last candidates, because they clearly fail to achieve the final
constraint x(T ) = b. The second candidate says that u ∗ (t ) is some constant. With a constant
input u ∗ (t ) = u 0 the solution of the differential equation for x(t ) is x(t ) = t u 0 + a which is a
straight line. Because of the initial and final conditions it follows that u₀ = (b − a)/T. Hence, as
expected,

    x∗(t) = a + (b − a)t/T,   u∗(t) = (b − a)/T.
The costate can be recovered from the fact that u∗(t) = −p̂/√(1 − p̂²). This gives

    p∗(t) = (a − b)/√(T² + (b − a)²).

It is interesting to compare this with the optimal cost (the optimal length of the curve)

    ℓ∗(a, b) = √(T² + (b − a)²).

Here we see that p∗(0) equals ∂ℓ∗(a, b)/∂a. I.e. p∗(0) expresses how strongly the optimal cost
changes if we change a. Every costate has this property (we return to this in § 3.5). ä
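The sensitivity interpretation can be checked with a finite difference. The following small Python sketch (our own, with arbitrary numbers T = 2, a = 0.5, b = 3) compares p∗(0) with a numerical derivative of ℓ∗(a, b) with respect to a.

import numpy as np

T, a, b = 2.0, 0.5, 3.0

def ell(a):                                    # optimal length as a function of a
    return np.sqrt(T**2 + (b - a)**2)

h = 1e-6
finite_difference = (ell(a + h) - ell(a - h)) / (2 * h)
p0 = (a - b) / np.sqrt(T**2 + (b - a)**2)      # costate at t = 0
print(finite_difference, p0)                   # both ~ -0.781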

Example 2.5.4. Consider the system with bounded derivative,

ẋ(t ) = u(t ), u(t ) ∈ [−1, 1]

and with both an initial and a final condition

x(0) = x(T ) = 0.

We want to determine the input u(·) that minimizes


    J(u(·)) = ∫_0^T x(t) dt.

This roughly speaking says that we want x(t ) as small (negative) as possible, yet it needs to
start at zero, x(0) = 0, and needs to end up at zero again, x(T ) = 0. The Hamiltonian (with
λ = 1) is

H (x, p, u) = pu + x

and therefore the costate equations become

ṗ(t ) = −1.

Notice that since the state is fixed at final time, x(T ) = 0, there is no condition on the costate
at the final time. So, for now, all we know about the costate is that its derivative is −1, i.e.

    p(t) = c − t

for some as yet unknown constant c. Given this p(t ) = c −t , the minimizer u ∗ (t ) of the Hamil-
tonian is
    u∗(t) = −1 if t < c,   u∗(t) = +1 if t > c.

This function switches sign (from negative to positive) at t = c, and as a result the state x(t ) is
piecewise linear. First it goes down and then, from t = c onwards, it goes up,
    x(t) = −t if t < c,   x(t) = t − 2c if t > c.

It will be clear that the only value of c for which x(T ) is zero, is

c = T /2.

This then completely settles the optimal control: on the first half [0, T/2] we have ẋ∗(t) = −1
and on the second half [T/2, T] we have ẋ∗(t) = +1. The optimal cost is the integral of this
x∗(t): J(0, u∗(·)) = ∫_0^T x∗(t) dt = −T²/4. ä

2.6 Free final time
So far the final time T was fixed. Now we drop this assumption and optimize the cost over all
inputs as well as over all final times T ≥ 0. As always we assume a cost of the form
    J(u(·), T) = S(x(T)) + ∫_0^T L(x(t), u(t)) dt.

As we have one more degree of freedom, the Minimum Principle still holds but with one more
condition. This condition is quite elegant:

Theorem 2.6.1 (Minimum Principle with free final time). Consider a differential equation
ẋ(t ) = f (x(t ), u(t )) and assume that f (x, u) and ∂ f (x, u)/∂x and L(x, u) and ∂L(x, u)/∂x are
all continuous in x and u. Suppose time T∗ and input u ∗ : [0, T ] → U are a solution of the
optimal control problem with free final time, and assume the input is bounded and piecewise
continuous. Let x ∗ : [0, T ] → Rn be the resulting optimal state. Then there exist a function
p ∗ (·) and a constant λ ∈ {0, 1}, such that

    ẋ∗(t) = f(x∗(t), u∗(t)),   x∗(0) = x₀,   x_i(T∗) = x̂_i, i = 1, . . . , r,
    ṗ∗(t) = −∂H(x∗(t), p∗(t), u∗(t), λ)/∂x,   p∗_i(T∗) = ∂S(x∗(T∗))/∂x_i, i = r + 1, . . . , n,
and along the solution x ∗ (t ), p ∗ (t ) the input u ∗ (t ) at every t ∈ [0, T ] where the input is con-
tinuous minimizes the modified Hamiltonian:

H (x ∗ (t ), p ∗ (t ), u ∗ (t ), λ) = min H (x ∗ (t ), p ∗ (t ), u, λ). (2.29)


u∈U

Moreover at the final time T∗ we have that

H (x ∗ (T∗ ), p ∗ (T∗ ), u ∗ (T∗ ), λ) = 0. (2.30)

Proof (sketch). We prove it for the case without final state constraints (i.e., r = 0 and λ = 1)
and we assume that u ∗ (t ) is continuous at t = T . For technical reasons we define u ∗ (t ) to
equal u∗(T) for all t > T. Then the cost is differentiable with respect to the final time T
and, hence, at the optimal time T∗ we necessarily have

    dJ(x₀, u∗(·), T∗)/dT = 0.

This derivative equals

    dJ(x₀, u∗(·), T∗)/dT = ∂S(x∗(T∗))/∂xᵀ · ẋ∗(T∗) + L(x∗(T∗), u∗(T∗))
                         = p∗ᵀ(T∗) f(x∗(T∗), u∗(T∗)) + L(x∗(T∗), u∗(T∗))
                         = H(x∗(T∗), p∗(T∗), u∗(T∗), 1).

The extension to the case r > 0 is nontrivial. Detailed proofs can be found in Liberzon (2012).

The remarks about λ that were made after Thm. 2.5.1 also apply to this situation. Since
by (2.23) the Hamiltonian H (x, p, u) is constant along optimal trajectories we conclude
from (2.30) that actually the Hamiltonian H (x ∗ (t ), p ∗ (t ), u ∗ (t )) is identically zero for all time!
Now we can solve the classic problem of Zermelo.


F IGURE 2.2: Problem of Zermelo. See Example 2.6.2

Example 2.6.2 (Zermelo). We want a boat to cross a river in minimal time. The water in the
river flows with a speed that depends on the distance to the banks. We denote the speed of
the boat with respect to the water by v and we assume it is a positive constant. The point
of departure of the boat (x 1 (0), x 2 (0)) = (0, 0) and the arrival point (x 1 (T ), x 2 (T )) = (a, b) are
given. The equations of motion of the boat are

ẋ 1 (t ) = v cos(u(t )) + W (x 2 (t )), x 1 (0) = 0, x 1 (T ) = a


ẋ 2 (t ) = v sin(u(t )), x 2 (0) = 0, x 2 (T ) = b, (2.31)

where W (x 2 ) is the flow speed of the river at x 2 , and u is the angle between the boat’s principal
axis and the x 1 -axis, see Fig. 2.2. We take the minimal time cost
    J(x₀, u(·), T) = ∫_0^T 1 dt = T.

With this cost, the Hamiltonian is given by (assuming we may take λ = 1)

H (x, p, u) = p 1 v cos(u) + p 1W (x 2 ) + p 2 v sin(u) + 1.


This Hamiltonian can be easily minimized by setting ∂H/∂u equal to zero. We then find

−p 1 (t )v sin(u(t )) + p 2 (t )v cos(u(t )) = 0

and therefore, if p₁∗(t) ≠ 0,

    tan(u∗(t)) = p₂∗(t)/p₁∗(t) = p₂∗(t)/p₁∗(T).        (2.32)

The equation for the costate becomes

    ṗ₁∗(t) = 0,
    ṗ₂∗(t) = −p₁∗(t) ∂W(x₂∗(t))/∂x₂.        (2.33)

Finally, the condition (2.30) of optimality of final time implies as above that

    0 = H(x∗(t), p∗(t), u∗(t))
      = p₁∗(t) v cos(u∗(t)) + p₁∗(t) W(x₂∗(t)) + p₂∗(t) v sin(u∗(t)) + 1.        (2.34)

In principle the equations (2.31)–(2.32) and (2.34) can be explicitly solved.


A much simpler situation arises if we assume that the flow speed is constant, so that
W (x 2 ) = w 0 . From (2.33) it then follows that p 2∗ (·) is constant, and using (2.32) we find in

addition that u ∗ (·) is constant. For constant inputs u ∗ (·) and W (x 2 ), Equation (2.31) can be
solved directly. We denote this constant input as u 0 (·) and so

x 1 (t ) = (v cos(u 0 ) + w 0 )t
x 2 (t ) = (v sin(u 0 ))t .

Also, the state at the final time must satisfy x 1 (T ) = a and x 2 (T ) = b. This yields the following
system of equations in the unknowns u 0 and T :

a = (v cos(u 0 ) + w 0 )T
b = (v sin(u₀))T.

I.e.,
    cos(u₀) = (a − w₀T)/(Tv),   sin(u₀) = b/(Tv).        (2.35)
Squaring and adding both equations yields

T 2 (v 2 − w 02 ) + 2w 0 aT − a 2 − b 2 = 0.

If we assume that v > w 0 , then this equation has exactly one positive solution T . Then u 0
follows from (2.35). ä
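For concreteness, here is a small Python computation (our own numbers v = 2, w₀ = 1, a = 4, b = 3) that carries out these last steps: it solves the quadratic equation for T and recovers the constant heading u₀ from (2.35).

import numpy as np

v, w0 = 2.0, 1.0          # boat speed and constant flow speed, v > w0
a, b = 4.0, 3.0           # arrival point

# T^2 (v^2 - w0^2) + 2 w0 a T - (a^2 + b^2) = 0, take the positive root
coeffs = [v**2 - w0**2, 2 * w0 * a, -(a**2 + b**2)]
T = max(np.roots(coeffs).real)
u0 = np.arctan2(b / (T * v), (a - w0 * T) / (T * v))    # heading from (2.35)

print("crossing time T  =", T)                # ~ 1.846
print("heading u0 (rad) =", u0)               # ~ 0.948
print((v * np.cos(u0) + w0) * T, v * np.sin(u0) * T)    # ends at (a, b) = (4, 3)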

Example 2.6.3 (Bang-bang control). This is an elegant and classic application. We want to
steer a car into a parking spot and we want to do it in minimal time, and of course, at the
precise moment that we reach the spot our speed should be zero. To keep things manageable
we assume that we can steer the car in one dimension only (like a cart on a rail). The position
of the car is denoted x 1 (t ) and its speed as x 2 (t ). The acceleration u(t ) is bounded, say u(t ) ∈
[−1, 1]. The state equations thus are

ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = u(t ), u(t ) ∈ [−1, 1].

Let us assume that the desired (parking) state is the origin x₁(T) = 0, x₂(T) = 0, and that at time
zero we are at some other state x₁(0), x₂(0). The cost to be minimized is time T, so
    J(x₁(0), x₂(0), u(·)) = ∫_0^T 1 dt.

That way the normal Hamiltonian becomes

H (x, p, u) = p 1 x 2 + p 2 u + 1.

From the Hamiltonian equations we can derive the co-state equations,

ṗ 1 (t ) = 0
ṗ 2 (t ) = −p 1 (t ).

Since both components of the final state x(T ) are fixed, the final constraints on both com-
ponents of the co-state are absent. Therefore in principle every constant p 1 is allowed and,
therefore, every linear function p 2 (t ):

p 2 (t ) = at + b.

For the optimal input we can not have a = b = 0 because that contradicts the fact that
H (x, p, u) = p 1 x 2 +p 2 u+1 is zero. Consequently, the second co-state entry p 2 (t ) is not the zero
function. This, in turn, implies that p 2 (t ) can switch sign at most once. Why is this important?
Well, the optimal u ∗ (t ) minimizes the Hamiltonian p 1 x 2 + p 2 u + 1 and since u ∗ (t ) ∈ [−1, 1] we
have

u ∗ (t ) = − sgn(p 2 (t )).

This is well defined because p 2 (t ) is nontrivial, and as p 2 (t ) switches sign at most once, also

u ∗ (t ) switches sign at most once.

Let t s be the moment of switching. Then, by definition, the input for t > t s does not switch
any more and so is either +1 throughout or −1 throughout. Now for u = +1 it is easy to see
that the solutions (x₁(t), x₂(t)) are the shifted parabolas x₁ = ½x₂² + constant, traversed in the direction of increasing x₂.
Likewise, if u = −1 then all possible (x₁(t), x₂(t)) are the shifted “reversed” parabolas x₁ = −½x₂² + constant, traversed in the direction of decreasing x₂.

Since on [t_s, T] the input does not change and since x(T) = (0, 0) it must be that on [t_s, T] the state moves along one of the two parabolas through the origin: the branch x₁ = ½x₂² with x₂ ≤ 0 (for u = +1) or the branch x₁ = −½x₂² with x₂ ≥ 0 (for u = −1).
These two are the only parabolas that end up in the desired final state x(T) = (0, 0). Before the switch time the input u(t) had the opposite sign. For instance, if after the switch we have u = +1 (the branch x₁ = ½x₂², x₂ ≤ 0), then before the switch we have u = −1, so the state first follows one of the reversed parabolas x₁ = −½x₂² + constant until it hits that branch.

Before the switch the orbit follows the gray parabola and then, after the moment of switching
it follows the red or blue parabola. This settles the problem for every initial state (x 1 (0), x 2 (0)).
ä
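The switching logic can be tried out in a few lines of code. The sketch below is only an illustration: it uses the standard closed-form description of the two parabolas through the origin via the switching function s = x₁ + ½x₂|x₂| (this formula is not derived in the text, but it encodes exactly the red and blue arcs above) and simulates the resulting feedback with a crude Euler scheme.

```python
import math

def park(x1, x2, dt=1e-3, tol=1e-2):
    """Bang-bang parking: u = -sgn(s) with s = x1 + 0.5*x2*|x2|, and u = -sgn(x2)
    on the switching curve s = 0. Returns the (approximate) time to reach the origin."""
    t = 0.0
    while math.hypot(x1, x2) > tol and t < 50.0:
        s = x1 + 0.5 * x2 * abs(x2)
        u = -math.copysign(1.0, s if abs(s) > 1e-9 else x2)
        x1 += dt * x2        # position
        x2 += dt * u         # velocity, acceleration u in [-1, 1]
        t += dt
    return t

print(park(3.0, 0.0))   # roughly 2*sqrt(3) ~ 3.46 for a start at rest at x1 = 3
```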

2.7 Exercises
2.1 Consider the scalar system (compare with Exercise 3.3)

ẋ(t ) = x(t )u(t ), x(0) = x 0 = 1.


with cost function J_[0,T](x₀, u(·)) = 2x(T) + ∫₀ᵀ x²(t) + u²(t) dt. The input set is U = R.

(a) Determine the Hamiltonian H (x, p, u) and the differential equation for the costate.
(b) Determine the optimal input u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).
(c) Show that the H (x ∗ (T ), p ∗ (T ), u ∗ (T )) is zero.
(d) Determine p ∗ (t ).
(e) Determine the optimal u(t ) as a function of x(t ) and then calculate the optimal
state trajectory for T = 2. [Hint: see Example B.1.5.]

2.2 Consider the following (scalar) system

ẋ(t ) = u(t ), x(0) = x 0


with cost function J_[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) dt. We want to minimize this cost under
the extra condition that 0 ≤ u(t ) ≤ 1.

(a) Give the Hamiltonian and the differential equation for the costate.
(b) Prove that from Pontryagin’s Minimum Principle it follows that u ∗ (t ) (generically)
assumes only two values.
(c) Prove that if x 0 > 0, then x ∗ (t ) > 0 for all t ∈ [0, T ].
(d) Prove that p ∗ (t ) under the conditions stated in c. has at most one change of sign.
What does this mean for u ∗ (t )?
(e) Solve the optimization problem for x 0 > 0. Also give the solution for p ∗ (t ).

2.3 A point mass attached to a spring with (positive) spring-constant k, displacement from
the equilibrium x 1 (t ) and velocity x 2 (t ) is subjected to an external force u(t ). The equa-
tions of motion are

ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = −kx 1 (t ) + u(t ).

The force u is subject to the constraint |u(t )| ≤ 1 for all t .

(a) Show that without external force, the mass follows a harmonic motion with period T = 2π/√k.
(b) The system has to be controlled in such a way that after a time T = 2π/√k the
potential energy k2 x 12 (T ) of the mass is maximal. Formulate the associated optimal
control problem.
Hint: L = 0. What is S(x)?
(c) Derive the equations for the costate. Show that the optimal control only depends
on the second component of the costate. How are the control and this component
connected?
(d) Derive, by elimination of the first component, a differential equation for the sec-
ond component of the costate. Solve this using the associated constraints. Derive
from this the behavior of the optimal control.
(e) Calculate the optimal state as a function of the time for x 1 (0) = x 2 (0) = 0. What is
the potential energy of the optimal system at the final time?

2.4 The second order system

ÿ(t ) + y(t ) = u(t ), y(0) = y 0 , ẏ(0) = v 0

uses u(t ) as control. Here, u(t ) and y(t ) are scalar quantities. The cost is given by
J_[0,T]((y₀, v₀), u(·)) = ½ ∫₀ᵀ u(t)² dt.

Determine the optimal control that drives the system from the initial state y(0) = y 0 ,
ẏ(0) = v 0 to the final state y(T ) = ẏ(T ) = 0.

2.5 Consider the differential equation describing a pendulum without damping:

m`2 φ̈(t ) + g m` sin(φ(t )) = u(t ),

where φ(t ) is the angle with respect to the stable equilibrium state, u(t ) is a torque
exerted around the suspension point.
The objective is to minimize the cost
J_[0,T](x₀, u(·)) = m`²φ̇²(T) − 2mg` cos(φ(T)) + ∫₀ᵀ φ̇²(t) + u²(t) dt.

We introduce the notation x = (x₁, x₂) = (φ, φ̇).

(a) Determine the state differential equation ẋ(t ) = f (x(t ), u(t )).
(b) Determine the Hamiltonian H (x, p, u) and the differential equation for the costate.
(c) Calculate ∫₀ᵀ φ̇(t)u(t) dt. What do you see?

FIGURE 2.3: Soft landing on the Moon (thrust −c ṁ(t) upward, gravity −g m(t), altitude y).

(d) (Difficult) Give an expression in terms of φ∗ (t ) and φ̇∗ (t ) for the optimal control.

2.6 Soft landing on the Moon. By thrusting out gasses with a constant velocity c (but vari-
able quantities), a lunar ship with mass m(t ) is subjected to an upward force −c ṁ(t )
(note: ṁ(t ) ≤ 0). See Fig. 2.3. Also, a gravity −g m(t ) works on the ship. The altitude
y(t ) of the ship satisfies the differential equation

m(t ) ÿ(t ) + g m(t ) + c ṁ(t ) = 0.

The objective is to determine the final time T > 0 such that the lunar ship makes a soft
landing, and such that the use of fuel is minimized. Fuel use is subject to an additional
restriction: −1 ≤ ṁ(t ) ≤ 0. With the state variables x 1 (t ) = y(t ), x 2 (t ) = ẏ(t ), x 3 (t ) = m(t )
and the input variable u(t ) = −ṁ(t ) we rewrite the problem as follows:

ẋ₁(t) = x₂(t),   x₁(0) = x₀,   x₁(T) = 0
ẋ₂(t) = c u(t)/x₃(t) − g,   x₂(0) = ẋ₀,   x₂(T) = 0
ẋ₃(t) = −u(t),   x₃(0) = M₀

and
J_[0,T](x₀, u(·)) = ∫₀ᵀ u(t) dt,   U = [0, 1].

(a) Explain these equations, particularly J (·), x 1 (T ) and x 2 (T ).


(b) Explain, in terms of physics, why x 1 (0) > 0 and x 3 (t ) > 0 on [0, T ].
(c) Determine the Hamilton function and the differential equation for the adjoint
variable p(t ) = (p 1 (t ), p 2 (t ), p 3 (t ))T .
(d) Define ρ(t ) as

ρ(t) = c p₂(t)/x₃(t) − p₃(t) + 1.

Prove that ρ̇(t) = −c p₁(T)/x₃(t), and give the conditions for the optimal control in terms of ρ(t).
(e) Conclude from (d) that u ∗ (t ) is of the following form:

i. u ∗ (t ) = 0, 0 ≤ t ≤ T∗ , or
ii. u ∗ (t ) = 1, 0 ≤ t ≤ t 1 , u ∗ (t ) = 0, t 1 < t ≤ T∗ , or
iii. u ∗ (t ) = 0, 0 ≤ t ≤ t 1 , u ∗ (t ) = 1, t 1 < t ≤ T∗ , or
iv. u ∗ (t ) = 1, 0 ≤ t ≤ T∗ .
(f) Prove that i. and ii. are not possible.
(g) What is the relation between ρ∗(T∗), p∗₁(0) and p∗₂(0)? Here, ρ∗(t) = c p∗₂(t)/x∗₃(t) − p∗₃(t) + 1.

2.7 We want to move a mass in 2 seconds, beginning and ending with zero speed, using
bounded acceleration. With x 1 its position and x 2 its speed, a model for this problem is

ẋ 1 (t ) = x 2 (t ), x 1 (0) = 0
ẋ 2 (t ) = u(t ), x 2 (0) = 0, x 2 (2) = 0.

Here u(t ) is the acceleration which we take to be bounded in magnitude by one:

u(t ) ∈ [−1, 1]

for all t . We aim to maximize the traveled distance x 1 (T ) at final time T = 2.

(a) Determine the Hamiltonian H (x, p, u).


(b) Determine the Hamiltonian equations in x(t ) and p(t ) as used in Pontryagin’s
Minimum Principle, including all initial and final conditions.
(c) Determine the general solution of the costate p(t ) for t ∈ [0, T ].
(d) Determine the optimal input u(t ) for t ∈ [0, T ] and compute the maximal distance
x 1 (T ).

2.8 Initial and final constraints. Consider the system

ẋ(t ) = x(t )(1 − u(t )), x(0) = 1, x(1) = 4 e

with cost
J(x₀, u(·)) = ∫₀¹ −ln(x(t)u(t)) dt.

Since x(0) > 0 we have that x(t ) ≥ 0 for all t . For a well-defined cost we hence need
u(t ) ∈ [0, ∞) but for the moment we allow any u(t ) ∈ R and later verify that the optimal
u ∗ (t ) is in fact > 0.

(a) Determine the Hamiltonian


(b) Determine the Hamiltonian equations (2.12).
(c) Show that u(t ) = −1/(p(t )x(t )) is the candidate optimal control.
(d) Substitute this u into the Hamiltonian equations and solve for p ∗ (t ) and then x ∗ (t )
and subsequently u ∗ (t ).
(e) Is u ∗ (t ) > 0 for all t ∈ [0, 1]?

2.9 Consider an economy consisting of two sectors where Sector 1 produces investment
goods and Sector 2 produces consumption goods. Let x i (t ), i = 1, 2 represent the pro-
duction in the i -th sector at time t and let u(t ) be the fraction of investments allocated
to Sector 1. Suppose the dynamics of the x i (t ) are given by

ẋ 1 (t ) = au(t )x 1 (t )
ẋ 2 (t ) = a(1 − u(t ))x 1 (t )

where a is a positive constant. Hence, the increase in production per unit of time in
each sector is assumed to be proportional to the investment allocated to the sector. By
definition we have

0 ≤ u(t ) ≤ 1, t ∈ [0, T ] (2.36)

where [0, T ] denotes the planning period. As optimal control problem we may consider
the problem of maximizing the total consumption in the given planning period [0, T ],
thus our problem to maximize
J(x₀, u(·)) = ∫₀ᵀ x₂(t) dt   (2.37)

subject to

x 1 (0) = x 10 , x 1 (T ) = free,
x 2 (0) = x 20 , x 2 (T ) = free,

with x 10 > 0, x 20 ≥ 0.

(a) Argue that x 1 (t ) > 0 for all time.


(b) Determine an optimal input using Pontryagin’s Minimum Principle.

2.10 Consider the second order system with mixed initial and final conditions

ẋ 1 (t ) = u(t ), x 1 (0) = 0, x 1 (1) = 1


ẋ 2 (t ) = 1, x 2 (0) = 0.

and with cost


J(x₀, u(·)) := ∫₀¹ u²(t) + 12 x₂(t)x₁(t) dt

The input u : [0, 1] → R is not restricted, i.e. u(t ) can take on any real value.

(a) Determine the Hamiltonian for this problem.


(b) Determine the differential equations for state x(t ) and costate p(t ), including the
boundary conditions.
(c) Express the candidate minimizing u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).
(d) Solve the equations for x ∗ (t ), p ∗ (t ), u ∗ (t ) (that is, determine x ∗ (t ), p ∗ (t ), u ∗ (t ) as
explicit functions of time t ∈ [0, 1]).

2.11 Consider the second order system with mixed initial and final conditions

ẋ 1 (t ) = u(t ), x 1 (0) = 0, x 1 (1) = 2


ẋ 2 (t ) = 1, x 2 (0) = 0.

and with cost


J(u(·)) := ∫₀¹ u²(t) + 4x₂(t)u(t) dt.

The input u : [0, 1] → R is not restricted, i.e. u(t ) can take on any real value.

(a) Determine the Hamiltonian for this problem.


(b) Determine the differential equations for state x(t ) and costate p(t ), including the
boundary conditions.
(c) Express the candidate minimizing u ∗ (t ) as a function of x ∗ (t ), p ∗ (t ).
(d) Solve the equations for x ∗ (t ), p ∗ (t ), u ∗ (t ) (that is, determine x ∗ (t ), p ∗ (t ), u ∗ (t ) as
explicit functions of time t ∈ [0, 1]).

2.12 Integral constraints. Let us go back to the calculus of variations problem of minimizing
∫₀ᵀ F(x(t), ẋ(t)) dt

over all functions x : [0, T ] → Rn that satisfy an integral constraint


∫₀ᵀ M(x(t), ẋ(t)) dt = c₀.

Thm. 1.8.1 says that at the optimal solution either (1.65) holds for some µ∗ ∈ R or
that (1.66) holds. This problem can also be cast as an optimal control problem with
a final condition, and then Thm. 2.5.1 gives us the same two conditions (depending
whether the Hamiltonian is normal or abnormal):

(a) Let ẋ(t ) = u(t ) and define ż n+1 (t ) = M (x(t ), u(t )) and z = (x, z n+1 ) ∈ Rn+1 . Formu-
late the above calculus of variations problem as an optimal control problem with
a final state condition in state z and with U = Rn . (I.e. express f (z), L(z, u), S(z) in
terms of F (x, ẋ), M (x, ẋ), c 0 .)
(b) Since z = (x, z n+1 ) has n + 1 components also the corresponding costate p has n +
1 components. Show that p n+1 (t ) is constant for both the normal Hamiltonian
H (x, p, u, 1) and abnormal Hamiltonian H (x, p, u, 0).
(c) For the normal Hamiltonian H (x, p, u, 1), show that the existence of a solution of
the Hamiltonian equations (2.27) and (2.28) imply that (1.65) holds for µ∗ = p n+1 .
(d) For the abnormal Hamiltonian H (x, p, u, 0), show that the existence of a solution
of the Hamiltonian equations (2.27) and (2.28) with p n+1 6= 0 implies that (1.66)
holds.

2.13 Time-varying cost. Suppose ẋ(t) = −x(t) + u(t) and that x(0) = 1, x(1) = 0. Minimize ∫₀^(1/2) u²(t) dt + ∫_(1/2)^1 2u²(t) dt.

Chapter 3

Dynamic Programming

3.1 Introduction
In the late fifties of the previous century, at the time that the Minimum Principle was de-
veloped in the Sovjet Union, a team in the USA developed an entirely different approach to
optimal control, called Dynamic Programming. In this chapter we deal with Dynamic Pro-
gramming. As in the previous chapter, we assume that the state satisfies a differential equa-
tion

ẋ(t ) = f (x(t ), u(t )) (3.1a)

in which x : [0, T ] → Rn , and that the input u(t ) at each moment in time takes values in a
possibly limited set U:

u : [0, T ] → U. (3.1b)

As before, we associate with system (3.1a) a cost over a finite time horizon [0, T ] of the form

J_[0,T](x₀, u(·)) := ∫₀ᵀ L(x(t), u(t)) dt + S(x(T)).   (3.1c)

The cost depends on the initial condition x(0) = x 0 and the input u(·). In this chapter it is also
convenient to emphasise the dependence of this cost on the time interval [0, T ]. The final
time T and the functions S : Rn → R and L : Rn × U → R are assumed given.
The crux of Dynamic Programming is to associate with this one cost over time horizon
[0, T ] a whole family of costs over subsets of this time horizon,
J_[τ,T](z, u(·)) := ∫_τ^T L(x(t), u(t)) dt + S(x(T)),   (3.2)

for each initial time τ ∈ [0, T ] and for each initial state x(τ) = z, and then to establish a dy-
namic relation between these costs (hence the name dynamic programming). On the one
hand this complicates the problem because we will need to solve many optimal control prob-
lems, but it generates structure and much insight and, as we will see, it produces sufficient
conditions for optimality.

3.2 Principle of optimality


The principle of optimality is a simple yet powerful result in optimal control. Roughly speak-
ing it says that every tail of an optimal control is optimal. We formalize this result. Figure 3.1 should be instructive here.

FIGURE 3.1: Principle of optimality (an optimal input u∗(t) on [0, T] and an alternative û(t) on [τ, T]).

It depicts an optimal control u∗(·) on [0, T ] and an alternative
input û(·) on a restricted time window [τ, T ] for some τ. The optimal control u ∗ (·) steers the
state from x(0) = x 0 to some value x ∗ (τ) at t = τ. Is it possible that the alternative û(·) achieves
a smaller cost-to-go J [τ,T ] (x ∗ (τ), u(·)) over the remaining window [τ, T ] than u ∗ (·)? That is, is
it possible that

J [τ,T ] (x ∗ (τ), û(·)) < J [τ,T ] (x ∗ (τ), u ∗ (·))?

No, because if it would then the new input ũ(·) constructed from u ∗ (·) over the initial [0, τ]
and û(·) over the remaining [τ, T ] would improve on u ∗ (·) over the entire horizon:
J_[0,T](x₀, ũ(·)) = ∫₀^τ L(x(t), ũ(t)) dt + J_[τ,T](x(τ), ũ(·))
               = ∫₀^τ L(x∗(t), u∗(t)) dt + J_[τ,T](x∗(τ), û(·))
               < ∫₀^τ L(x∗(t), u∗(t)) dt + J_[τ,T](x∗(τ), u∗(·)) = J_[0,T](x₀, u∗(·))

and this contradicts the assumed optimality of u ∗ (·). Summary: if u ∗ (·) is optimal for
J [0,T ] (x 0 , u(·)) then for every τ ∈ [0, T ] it is optimal for J [τ,T ] (x ∗ (τ), u(·)) as well. That is the
principle of optimality. It will be of great help in the analysis to come.

3.3 Discrete-time Dynamic Programming


The main idea of Dynamic Programming and the reason of its popularity is explained best for
systems that evolve over discrete time – as opposed to the systems that evolve over continuous
time which we normally consider in this book. Thus, for the time being, consider a discrete-
time system

x t +1 = f (x t , u t ), (3.3)

on some discrete finite time horizon

t ∈ {0, 1, . . . T − 1},

with x 0 given and T a given positive integer. We want to find a control sequence
(u ∗0 , . . . , u ∗T −1 ), called optimal control (sequence) and resulting state sequence (x ∗0 , x ∗1 , . . . , x ∗T )
that minimizes a cost of the form
J_[0,T](x₀, u₀, . . . , u_(T−1)) = Σ_(t=0)^(T−1) L(x_t, u_t) + S(x_T).   (3.4)

Incidentally, in discrete-time systems there is no need to restrict the state space X to some set
on which derivatives are defined, like our default Rn . Indeed, the state space in applications
is often a finite set, for example the standard alphabet, X = {a, b, c, . . . , z}. The same is true for
the input set U. In what follows, the number of elements of a set X is denoted as |X|.

FIGURE 3.2: Discrete-time system with 7 states, 0, 1, . . . , 6, arranged in a circle.

Example 3.3.1 (Naive optimization). Suppose the state space X consists of the 7 integer ele-
ments

X = {0, 1, . . . , 6}.

Align the states in a circle (Fig. 3.2) and suppose that at each moment in time the state can
either move one step counter-clockwise, or stay where it is. Thus at each moment in time we
have a choice of two. The input space U then has two elements. If we take

U = {0, 1}

then the transition from one state to the next is modeled by the discrete system

x t +1 = x t + u t , u t ∈ U, t ∈ {0, 1, . . . , T − 1}

(counting modulo 7, so 6 + 1 = 0). Each transition from one state x t to the next x t +1 is as-
sumed to cost a certain amount L(x t , u t ) and the final state x T at time T costs an additional
S(x T ). The total cost hence is (3.4). The naive approach to determine the optimal control
{u 0 , . . . , u T −1 } and resulting optimal state sequence {x 1 , . . . , x T } is to just explore them all and
pick the best. As we can move in two different ways each moment in time, this naive ap-
proach would require 2T sequences (x 1 , . . . , x T ) to explore. Since each sequence has length T
the evaluation of the cost for each sequence is (roughly) linear in T , and therefore the total
number of operations required in this naive approach is of order

T × 2T .

It is not hard to see that for arbitrary systems (3.3) the total number of operations that the
naive approach requires is of order

T × | U| T .

It is exponential in T .
In Dynamic Programming we solve the minimization backwards in time. This may at first
sight seem to complicate the analysis, but it allows us to exploit the principle of optimality.
The following example explains it all.

Example 3.3.2 (Backwards in time). Continue with the system of Example 3.3.1,

x t +1 = x t + u t , u t ∈ {0, 1}, t ∈ {0, 1, . . . , T − 1}

with x_t ∈ {0, 1, . . . , 6} and to make it more explicit, assume that the final cost is x² and that each counter-clockwise move costs 1, i.e.

S(x) = x²   and   L(x, u) = u ∈ U := {0, 1}.

This system over the given time horizon we now visualize as

[Figure: time–state grid with vertices (t, x), t = 0, 1, . . . , T and x = 0, 1, . . . , 6, and edges for the possible transitions.]

The horizontal axis represents time t = 0, 1, . . . , T and the vertical axis represents the states x =
0, 1, . . . , 6. Vertices (dots) denote pairs (t , x) and lines (edges) represent possible transitions.
For instance the line connecting (t , x) = (0, 6) with (t , x) = (1, 0) says that we can move from
x = 6 to x = 0 in one time step.
Let us first figure out the cost of the final state, x T . Since we do not know in which final
state we end up, we have to determine this cost for every element of the state space. This cost
we denote as VT(x) and clearly this is simply the final cost VT(x) = S(x) = x², so:

[Figure: the grid with the final costs attached at t = T: VT(x) = 0, 1, 4, 9, 16, 25, 36 for x = 0, 1, . . . , 6.]

Now that the cost Vt (x) at the final t = T is known, consider the optimal cost-to-go from
t = T −1 onwards. This cost is denoted VT −1 (x) and since, again, we do not know which states
can be reached, we have to compute this cost for every x of the state space. This optimal cost-
to-go VT −1 (x) is by definition the smallest possible cost that we can achieve if at time t = T −1
we are at state x. This equals

VT−1(x) = min_(u∈{0,1}) ( L(x, u) + VT(f(x, u)) )

because L(x, u) is the cost of the transition if we apply input u and VT ( f (x, u)) is the final cost
(because f (x, u) is the state we end up in if we apply u). With VT already established this
minimization requires at each state |U| = 2 inputs to explore and since we have to perform
the minimization for every state in X = {0, 1, . . . , 6}, the total number of operations that this

requires is of order |X| × |U|. The numbers inside the circles are VT (x) and VT −1 (x):

[Figure: the grid now also showing VT−1(x) next to VT(x): VT−1(x) = 0, 1, 4, 9, 16, 25 for x = 0, . . . , 5 and VT−1(6) = 1. Thick edges mark minimizing transitions.]

Along the way we also determined an optimal input u T −1 (denoted in the figure by the thick
edges). Notice that none of the states x at time T − 1 switches to x = 6 at time T . We can
continue in this fashion and determine backwards in time, for each t = T − 2, T − 3, . . . , 0, the
cost-to-go from t onwards, via the rule

Vt(x) = min_(u∈{0,1}) ( L(x, u) + Vt+1(f(x, u)) ).

As before, this equation says that the cost-to-go from t onwards starting at x t = x, is the cost
of the transition L(x, u) plus the optimal cost-to-go from t + 1 onwards. Eventually we end up
at t = 0 with this solution
x = 6:   1   1   1   1   1   36
x = 5:   2   2   2   2   25  25
x = 4:   3   3   3   16  16  16
x = 3:   4   4   9   9   9   9
x = 2:   4   4   4   4   4   4
x = 1:   1   1   1   1   1   1
x = 0:   0   0   0   0   0   0
        t = 0, 1, . . . , T−1, T   (the figure takes T = 5)

The optimal control sequence in this example is actually not unique. The above indicates all
possible optimal solutions. The optimal cost Vt (x) of course is unique. Now the problem is
solved, in fact, it is solved for every initial condition x 0 . For x 0 = 6 we see that one optimal
input sequence is u ∗ = (1, 0, 0, 0, 0) while for x 0 = 5 one optimal input is u ∗ = (1, 1, 0, 0, 0). ä

In Dynamic Programming the game is to compute the optimal cost-to-go via the recursion

Vt(x) = min_(u∈U) ( L(x, u) + Vt+1(f(x, u)) )   (3.5)

starting at the final time, t = T , where the problem is trivial, and then subsequently going
backwards in time, t = T − 1, t = T − 2, . . . until we reach t = 0. To determine the final cost
VT(x) = S(x) for all x ∈ X requires order |X| operations. Then determining VT−1(x) for all x ∈ X
requires |X| times the number of inputs |U| to explore, et cetera, and so the total number of
operations for all t ∈ {0, 1, . . . , T } is of order

T × |U| × |X|.

If the number of states is modest or if T is large, then this typically outperforms the naive ap-
proach (which requires order T ×|U|T operations). Equation (3.5) is called Bellman’s equation
of Dynamic Programming.
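The recursion (3.5) translates directly into code. The following sketch redoes Example 3.3.2 (seven states on a circle, L(x, u) = u, S(x) = x²) for T = 5; the printed cost-to-go at t = 0 reproduces the left-most column of the table above.

```python
def dp_backward(T=5, n=7):
    """Bellman recursion (3.5) for x_{t+1} = (x_t + u_t) mod n with u_t in {0, 1},
    stage cost L(x, u) = u and final cost S(x) = x^2 (Example 3.3.2)."""
    V = [[0] * n for _ in range(T + 1)]
    policy = [[0] * n for _ in range(T)]
    for x in range(n):
        V[T][x] = x * x                      # V_T(x) = S(x)
    for t in range(T - 1, -1, -1):           # backwards in time
        for x in range(n):
            costs = [u + V[t + 1][(x + u) % n] for u in (0, 1)]
            V[t][x] = min(costs)
            policy[t][x] = costs.index(V[t][x])
    return V, policy

V, policy = dp_backward()
print(V[0])   # [0, 1, 4, 4, 3, 2, 1], the optimal costs from t = 0
```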

In continuous time the same basic idea survives, except for the results regarding its com-
putational complexity. In the continuous time case the optimization is over a set of input
functions on the time interval [0, T ], which is an infinite-dimensional space. Furthermore,
it is clear that contrary to the discrete-time case we will not be able to completely split the
problem into a series of finite-dimensional minimization problems.

3.4 Hamilton-Jacobi-Bellman equation


The idea of Dynamic Programming is to minimize all costs J [τ,T ] (z, u(·)) – for every τ ∈ [0, T ]
and every state z – and not just the one cost J [0,T ] (x 0 , u(·)) that we are asked to find. As in
the discrete-time case the principle of optimality motivates to define the value function, also
known as the optimal cost-to-go:

Definition 3.4.1 (Value function or cost-to-go). The value function V : Rn ×[0, T ] → R at state
z and time τ is defined as the optimal cost-to-go over time horizon [τ, T ] with initial state
x(τ) = z, that is,

V(z, τ) = inf_(u:[τ,T]→U) J_[τ,T](z, u(·))   (3.6)

with J [τ,T ] (·) as defined in (3.2) and x(τ) = z. ä

In most cases of interest the infimum in (3.6) is attained by some u ∗ (·), in which case the
infimum (3.6) is a minimum. In general, though, a minimizer need not exist but the infimum
always does exist (it might be −∞).

Example 3.4.2 (Integrator with linear cost). Consider once again the integrator from the sys-
tem of Example 2.4.3,

ẋ(t ) = u(t ), x(0) = x 0 , (3.7)

with bounded inputs

U = [−1, 1]

and with cost


J_[0,T](x₀, u(·)) = ∫₀ᵀ x(t) dt.

From (3.7) and the fact that ẋ(t ) = u(t ) ∈ [−1, 1] it is immediate that the optimal control is
u ∗ (t ) = −1 and, hence, x(t ) = x 0 − t . Then the value function at τ = 0 is
V(x₀, 0) = J_[0,T](x₀, u∗(·)) = ∫₀ᵀ (x₀ − t) dt = x₀T − T²/2.

Next we determine the value function at the other time instances. Analogously to the previous situation, it is easy to see that u∗(t) = −1 is optimal for J_[τ,T](z, u(·)) for every τ > 0 and every x(τ) = z. Hence x∗(t) = z − (t − τ) and

V(z, τ) = ∫_τ^T (z − (t − τ)) dt = [zt − ½(t − τ)²]_τ^T = z(T − τ) − ½(T − τ)².   (3.8)

As expected, the value function is zero at the final time τ = T . It is not necessarily monotonic
in τ, see Fig. 3.3. Indeed for z = T /2 the value function is zero at τ = 0 and τ = T yet positive
in between. ä

FIGURE 3.3: The value function V(z, τ) of the problem of Example 3.4.2 for various z (here z = −0.5, 0, 0.5, 1, 1.5) as a function of τ ∈ [0, T]. The plot assumes T = 1.

For any input u(·) – optimal or not – the cost-to-go from τ onwards equals the cost over [τ, τ+ε] plus the cost over the remaining [τ+ε, T], that is

J_[τ,T](z, u(·)) = ∫_τ^(τ+ε) L(x(t), u(t)) dt + J_[τ+ε,T](x(τ+ε), u(·))   (3.9)

with initial state x(τ) = z. The value function is defined as the infimum of this cost over all admissible inputs, hence taking the infimum over u(·) of the left and right-hand side of (3.9) shows that

V(z, τ) = inf_(u:[τ,T]→U) ( ∫_τ^(τ+ε) L(x(t), u(t)) dt + J_[τ+ε,T](x(τ+ε), u(·)) ).

Now by the principle of optimality, any optimal control over [τ, T] is optimal for J_[τ+ε,T](x(τ+ε), u(·)) as well, so the cost equals the value function. The right-hand side of the above equality can thus be simplified to

V(z, τ) = min_(u:[τ,τ+ε]→U) ( ∫_τ^(τ+ε) L(x(t), u(t)) dt + V(x(τ+ε), τ+ε) )

with initial condition x(τ) = z. Notice that in this last equation we need only optimize over inputs on the time window [τ, τ+ε] because optimization over the remaining time window [τ+ε, T] is incorporated in the value function V(x(τ+ε), τ+ε). For further analysis it is beneficial to move the V(z, τ) to the right-hand side and to scale the entire equation by 1/ε,

0 = min_(u:[τ,τ+ε]→U)  [ ∫_τ^(τ+ε) L(x(t), u(t)) dt + V(x(τ+ε), τ+ε) − V(z, τ) ] / ε.

In this form we can take the limit ε → 0. It is plausible that functions u : [τ, τ+ε] → U in the limit can be identified with constants u ∈ U and that the difference of the above two value functions converges to the total derivative with respect to τ. This gives

0 = min_(u∈U) ( L(x(τ), u) + dV(x(τ), τ)/dτ )   (3.10)

for all τ ∈ [0, T ] and all x(τ) = z ∈ Rn . Incidentally this identity is reminiscent of the cost-to-
go (B.16) explained in Section B.5 of Appendix B. The total derivative of V (x(τ), τ) with respect

to τ is

dV(x(τ), τ)/dτ = (∂V(x(τ), τ)/∂xᵀ) f(x(τ), u(τ)) + ∂V(x(τ), τ)/∂τ.

Inserting this into (3.10) and using u := u(τ), x := x(τ) we arrive at a partial differential equation in V(x, τ):

0 = min_(u∈U) ( L(x, u) + (∂V(x, τ)/∂xᵀ) f(x, u) + ∂V(x, τ)/∂τ )

for all τ ∈ [0, T] and all x ∈ Rⁿ. The partial derivative of V(x, τ) with respect to τ does not depend on u and so does not contribute to the minimization. This, finally, brings us to the famous equation

∂V(x, τ)/∂τ + min_(u∈U) ( (∂V(x, τ)/∂xᵀ) f(x, u) + L(x, u) ) = 0.   (3.11)
Ready. This equation is known as the Hamilton-Jacobi-Bellman equation, or just HJB equa-
tion.
What did we do so far? We made it plausible that the relation between the value functions
at neighboring points in state x and time τ is the partial differential equation (3.11). We need
to stress here the word “plausible”, because we have “derived” (3.11) only under several technical assumptions including existence of an optimal control, existence of a value function and existence of some limits.¹ However, we can turn the analysis around, and obtain a sufficient condition for optimality. This is the following theorem and it is the central result of this chapter. In this formulation the time τ is called t again.

¹ Technical detail: (Sontag, 1998, Prop. 8.1.8) proves that the value function indeed satisfies (3.11) if (a) the value function V(x, t), L(x, u) and S(x) are all C¹, and (b) for every possible τ, x an optimal control u : [τ, T] → U exists that is continuous.

Theorem 3.4.3 (Hamilton-Jacobi-Bellman). Consider the optimal control problem (3.1).


Suppose V : Rn × [0, T ] → R is a continuously differentiable function that satisfies the partial
differential equation
∂V(x, t)/∂t + min_(u∈U) ( (∂V(x, t)/∂xᵀ) f(x, u) + L(x, u) ) = 0   (3.12)
for all x ∈ Rn and all t ∈ [0, T ], and final time condition

V (x, T ) = S(x) (3.13)

for all x ∈ Rn . Then

1. For any admissible input u(·), the value of V (z, τ) is a lower bound of the cost over [τ, T ]
starting at x(τ) = z:

J [τ,T ] (z, u(·)) ≥ V (z, τ).

2. If there is a function u ∗ : [0, T ] → U for which the solution x ∗ (t ) of ẋ(t ) = f (x(t ), u ∗ (t ))


with x(0) = x 0 is well defined, and u ∗ (t ) at almost each t ∈ [0, T ] minimizes
(∂V(x∗(t), t)/∂xᵀ) f(x∗(t), u) + L(x∗(t), u)
over all u ∈ U, then u ∗ (t ) is a solution to the optimal control problem, and the optimal
cost is

J [0,T ] (x 0 , u ∗ (·)) = V (x 0 , 0). (3.14)


3. Suppose the minimization problem in (3.12) for each x ∈ Rn and each t ∈ [0, T ] has
a (possibly non-unique) solution u. Denote one such solution as u(x, t ). If for every
z ∈ Rn and every τ ∈ [0, T ] the solution x(t ) of ẋ(t ) = f (x(t ), u(x(t ), t )) with x(τ) = z is
well defined for all t ∈ [τ, T ], then V (z, τ) is the value function and u ∗ (t ) := u(x(t ), t ) is
an optimal control for J [τ,T ] (z, u(·)).

Proof.

1. Given z and τ, let u(·) be an admissible input for ẋ(t ) = f (x(t ), u(t )) for t > τ and x(τ) =
z. Then

J_[τ,T](z, u(·))
  = S(x(T)) + ∫_τ^T L(x(t), u(t)) dt
  = S(x(T)) + ∫_τ^T [ (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) + L(x(t), u(t)) ] dt − ∫_τ^T (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) dt
  ≥ S(x(T)) + ∫_τ^T min_(u∈U) [ (∂V(x(t), t)/∂xᵀ) f(x(t), u) + L(x(t), u) ] dt − ∫_τ^T (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) dt   (3.15)
  = S(x(T)) + ∫_τ^T [ −∂V(x(t), t)/∂t − (∂V(x(t), t)/∂xᵀ) f(x(t), u(t)) ] dt
  = V(x(T), T) − ∫_τ^T dV(x(t), t)/dt dt
  = V(x(T), T) − [V(x(T), T) − V(x(τ), τ)] = V(z, τ).

2. Then by assumption, x ∗ (t ) is well defined. Let z = x 0 and τ = 0. For the input u ∗ (t ) the
inequality in Eqn. (3.15) is an equality. Hence J [0,T ] (x 0 , u ∗ (·)) = V (x 0 , 0) and we already
showed that no control achieves a smaller cost.

3. Similar: then by assumption x(t ) is well defined. For the so defined input u ∗ (t )
the inequality in Eqn. (3.15) is an equality. Hence the optimal cost then equals
J [τ,T ] (x 0 , u ∗ (·)) = V (z, τ) and it is achieved by u ∗ (t ). Since this holds for every z ∈ Rn
and every τ ∈ [0, T ] the V (z, τ) is the value function.

Parts 2 and 3 are a bit technical because the input found by solving the minimization
problem of (3.12) pointwise (at each x and each t ) does not always give us an input u ∗ (t )
for which x(t ) is well-defined for all t ∈ [0, T ], see Exercise 3.3(c). Luckily, however, in most
applications this problem does not occur and then the above says that the so determined input
is the optimal solution and that V (x, t ) is the value function.
Theorem 3.4.3 provides a sufficient condition for optimality: if we can solve the Hamilton-
Jacobi-Bellman equations (3.12,3.13) and if the conditions of Theorem 3.4.3 are satisfied, then
we are guaranteed that u ∗ (t ) is an optimal control. Recall, on the other hand, from the pre-
vious chapter that the conditions for optimality found from the Minimum Principle are nec-
essary for optimality. So in a sense, the Minimum Principle and Dynamic Programming com-
plement each other.

Another difference between the two methods is that an optimal control u(t) derived from the Minimum Principle is given as a function of state x(t) and costate p(t), which after solving the Hamiltonian equations gives us u∗(t) as a function of time, while in Dynamic Programming the optimal input is given in state feedback form u(x, t). The state feedback form is all we need to compute the solutions x∗(t), u∗(t) of the system equation ẋ(t) = f(x(t), u(x(t), t)). Also, in applications the state feedback form is preferred, because it is way more robust.² The next example demonstrates this feedback property.

² If the state at some time τ is corrupted by noise or whatever, then the feedback implementation of the input still performs well, and in fact x(t), u(t) then continue optimally from that time on.

Example 3.4.4 (Integrator with quadratic cost). Consider

ẋ(t ) = u(t ),

with cost
J(x₀, u(·)) = x²(T) + ∫₀ᵀ R u²(t) dt

for some R > 0. We allow any u(t) in R. Then the HJB equations (3.12, 3.13) become

∂V(x, t)/∂t + min_(u∈R) ( (∂V(x, t)/∂x) u + Ru² ) = 0,   V(x, T) = x².

Since the term to be minimized is quadratic in u (and R > 0) the optimal u is where the derivative of (∂V(x, t)/∂x) u + Ru² with respect to u is zero. This is for

u = −(1/(2R)) ∂V(x, t)/∂x,   (3.16)

and then the HJB equations reduce to

∂V(x, t)/∂t − (1/(4R)) (∂V(x, t)/∂x)² = 0,   V(x, T) = x².

Motivated by the boundary condition we now try a V (x, t ) that is quadratic in x for all time, so
of the form V (x, t ) = x 2 P (t ). (Granted, this is a magic step.) This way the HJB equations (3.12,
3.13) simplify to

x² Ṗ(t) − (1/(4R)) (2xP(t))² = 0,   x² P(T) = x².

It has a common quadratic term x 2 . Canceling this quadratic term x 2 gives

Ṗ (t ) = P 2 (t )/R, P (T ) = 1.

The solution of this differential equation is

P(t) = R / (R + T − t).

(This solution can be found via separation of variables.) It is well defined throughout t ∈ [0, T] and, therefore,

V(x, t) = x² R / (R + T − t)   (3.17)

is a solution of the HJB equation. Now that V(x, t) is known we can compute the candidate optimal input (3.16). It is not (yet) a function of t alone, it also depends on x(t):

u∗(t) = −(1/(2R)) ∂V(x(t), t)/∂x = −2x(t)P(t)/(2R) = −x(t)/(R + T − t).   (3.18)

The candidate optimal state x∗(t) satisfies

ẋ∗(t) = u∗(t) = −x∗(t)/(R + T − t).
It is a linear differential equation and it has a well defined solution x ∗ (t ) on [0, T ] and then
also the above u ∗ (t ) is well defined on [0, T ]. This, finally, allows us to conclude that (3.17)
is the value function, that the above u ∗ (t ) is the optimal input and that the optimal cost is
J [0,T ] (x 0 , u ∗ (·)) = V (x 0 , 0) = x 02 /(1 + T /R). We solved the optimal control completely. Not bad.
ä
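As a quick sanity check (not part of the original example) one can simulate the feedback (3.18) with a simple Euler scheme and compare the achieved cost with the predicted optimal cost V(x₀, 0) = x₀²R/(R + T); the values of x₀, R and T below are illustrative only.

```python
def cost_of_feedback(x0=1.0, R=0.5, T=1.0, N=20000):
    """Simulate xdot = u with u = -x/(R+T-t) from (3.18) and accumulate
    the running cost R u^2 plus the terminal cost x(T)^2."""
    dt = T / N
    x, J = x0, 0.0
    for k in range(N):
        u = -x / (R + T - k * dt)
        J += R * u * u * dt
        x += u * dt
    J += x * x                       # terminal cost x(T)^2
    return J, x0**2 * R / (R + T)    # achieved cost vs predicted V(x0, 0)

print(cost_of_feedback())   # both ~ 0.333 for these illustrative values
```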

Example 3.4.5 (Quadratic control). Consider the linear system

ẋ(t ) = u(t ), x(0) = x 0

with U = R and cost


J_[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) + r²u²(t) dt.

For this system the HJB equations (3.12, 3.13) are

∂V(x, t)/∂t + min_(u∈R) ( (∂V(x, t)/∂x) u + x² + r²u² ) = 0,   V(x, T) = 0.

The term to be minimized is quadratic in u hence is minimal only if the derivative with respect to u is zero. This gives

u = −(1/(2r²)) ∂V(x, t)/∂x.

So we can re-write the HJB equations as

∂V(x, t)/∂t + x² − (1/(4r²)) (∂V(x, t)/∂x)² = 0,   V(x, T) = 0.   (3.19)

This is a nonlinear partial differential equation, and this might be complicated. But it has an
interesting physical dimensional property (read the footnote3 if you want to know) and this
suggests that

V (x, t ) = P (t )x 2 .

We then find
Ṗ(t)x² + x² − (1/(4r²)) (2P(t)x)² = 0,   P(T) = 0.
3 Outside the scope of this book, but still: let [x] denote the dimension of a quantity x. For example [t ] = time.

From ẋ = u it follows that [u] = [x][t ]−1 and then x 2 + r 2 u 2 implies that [r ] = [t ] and then [V ] = [J ] = [x]2 [t ]. This
suggests that V (x, t ) = x 2 P (t ). In fact, application of the Buckingham π-theorem (not part of this course) shows
that V (x, t ) must have the form V (x, t ) = x 2 r Q((t − T )/r ) for some dimensionless function Q : R → R.

Division by x 2 yields the ordinary differential equation
Ṗ(t) + 1 − (1/r²) P(t)² = 0,   P(T) = 0.
This type of differential equation is discussed at length in the next chapter. The solution is

P(t) = r ( e^((T−t)/r) − e^((t−T)/r) ) / ( e^((T−t)/r) + e^((t−T)/r) ).   (3.20)

[The graph of P(t) rises from 0 at t = T to approximately |r| for t below roughly T − |r|.]

So the HJB equation (3.19) has a solution V(x, t) = x²P(t) for this system, where P(t) is given by (3.20). The candidate optimal control u∗(t) then is u(x(t), t) = −(1/(2r²)) ∂V(x(t), t)/∂x = −(1/r²) P(t)x(t) and so the candidate optimal state satisfies the linear time-varying differential equation

ẋ∗(t) = u(x∗(t), t) = −(1/r²) P(t) x∗(t),   x(0) = x₀.

Since P(t) is well defined and bounded, it will be clear that the solution x(t) is well defined; in fact the solution is x∗(t) = e^(−(1/r²) ∫₀ᵗ P(τ) dτ) x₀. Having a well defined solution means that x∗(t) is the optimal state, that

u∗(t) = −(1/r²) P(t) x∗(t)
is the optimal control and that V (x 0 , 0) = P (0)x 02 is the optimal cost. Once again the optimal
control is given as a state feedback, and once again we managed to solve the optimal control
problem completely. Nice. ä
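The equation for P(t) can of course also be integrated numerically, backwards from the end condition P(T) = 0. A small sketch (Euler steps, illustrative values r = T = 1) that compares the numerical value P(0) with the closed form (3.20):

```python
import math

def riccati_backward(r=1.0, T=1.0, N=100000):
    """Integrate Pdot = P^2/r^2 - 1 backwards from P(T) = 0 and compare P(0)
    with the closed form (3.20), which equals r*tanh((T - t)/r)."""
    dt = T / N
    P = 0.0
    for _ in range(N):
        P -= dt * (P * P / (r * r) - 1.0)   # one Euler step backwards in time
    return P, r * math.tanh(T / r)

print(riccati_backward())   # both numbers are close to 0.7616 for r = T = 1
```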

Example 3.4.6 (Quartic control). This is an uncommon application, but interesting. We


again consider an integrator system ẋ(t) = u(t) but now with the cost equal to a sum of quartics

J_[0,T](x₀, u(·)) = ∫₀ᵀ x⁴(t) + u⁴(t) dt.

Again u(t) is not restricted: U = R. The HJB equations now are

∂V(x, t)/∂t + min_(u∈R) ( (∂V(x, t)/∂x) u + x⁴ + u⁴ ) = 0,   V(x, T) = 0.
Inspired by the previous example we try a value function of the form

V (x, t ) = x 4 P (t ). (3.21)

(It needs to be seen whether this form works.) Substitution of this form in the HJB equation yields

x⁴ Ṗ(t) + min_(u∈R) ( 4x³P(t)u + x⁴ + u⁴ ) = 0,   x⁴ P(T) = 0.

The minimizing u is u = −P(t)^(1/3) x. This can be obtained by setting the gradient of 4x³P(t)u +
x 4 + u 4 with respect to u equal to zero (verify this yourself). This simplifies the HJB equation
to

x 4 Ṗ (t ) − 4x 4 P 4/3 (t ) + x 4 + x 4 P 4/3 (t ) = 0, x 4 P (T ) = 0.

Cancelling the common factor x 4 leaves us with

Ṗ (t ) = 3P 4/3 (t ) − 1, P (T ) = 0. (3.22)

The equation here is a simple first-order differential equation, except that no closed form
solution appears to be known. The graph of the solution (obtained numerically) is

[Figure: the numerical solution P(t); it stays close to the constant 3^(−3/4) ≈ 0.43869 for t below roughly T − 1 and drops to 0 at t = T.]

and it reveals that P (t ) is well defined and bounded for all t < T . This proves that the HJB
equation has a solution of the quartic form (3.21)! As t → −∞ the solution P (t ) converges
to the equilibrium solution where 0 = 3P 4/3 − 1, i.e. where P = 3−3/4 ≈ 0.43869. For now the
function V (x, t ) = x 4 P (t ) is just a candidate value function. The resulting candidate optimal
control input
u∗(t) = −P(t)^(1/3) x∗(t)   (3.23)

is linear in x∗(t) and so the candidate optimal closed loop is linear as well,

ẋ∗(t) = −P(t)^(1/3) x∗(t).   (3.24)

Since P(t) is bounded, also −P(t)^(1/3) is bounded and therefore the closed loop differential
equation has a well defined solution x ∗ (t ) for every initial condition x 0 and all t ∈ [0, T ]. Thus
we may conclude that V (x, t ) = x 4 P (t ) is the value function, that (3.23) is the optimal con-
trol and that x₀⁴P(0) is the optimal cost. From the plot of P(t) we see that the optimal cost is always less than 3^(−3/4) x₀⁴, and for T > 1 it is close to this number. The graph of P(t)^(1/3) is close to the constant 3^(−1/4) ≈ 0.75984 for t < T − 1, and the total area over all t < T between P(t)^(1/3) and this constant can be shown to be finite, approximately equal to 0.12692. This tells us that the solution x∗(t) of the differential equation (3.24) for t < T − 1 is close to exponential, x∗(t) ≈ e^(−3^(−1/4) t) x₀, that x∗(t) slows down as t approaches T, and that

x∗(T) ≈ e^(0.12692) e^(−3^(−1/4) T) x₀

whenever T > 1. ä
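The plot of P(t) above was obtained numerically; a minimal sketch of such a computation (backward Euler integration of (3.22), with an illustrative horizon T = 3) is:

```python
def quartic_P(T=3.0, N=300000):
    """Integrate Pdot = 3*P^(4/3) - 1 backwards from P(T) = 0, see (3.22).
    Far before T the solution settles at the equilibrium 3^(-3/4) ~ 0.43869."""
    dt = T / N
    P = 0.0
    samples = []
    for k in range(1, N + 1):
        P -= dt * (3.0 * P ** (4.0 / 3.0) - 1.0)   # backwards in time
        if k % (N // 6) == 0:
            samples.append((round(T - k * dt, 2), round(P, 5)))
    return samples

print(quartic_P())   # P(t) ~ 0.43869 once t is below roughly T - 1
```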

In the above examples the candidate value functions V (x, t ) all turned out to be true value
functions. We need to stress that examples exist where this is not the case, see Exercise 3.3(c).
The final example is one where U is restricted.
Example 3.4.7 (Example 3.4.2 extended). We consider the system of Example 3.4.2, that is,
ẋ(t ) = u(t ), x(0) = x 0 with inputs taking values in U = [−1, 1]. The cost, however, we extend
with a final cost,
J_[0,T](x₀, u(·)) = −αx(T) + ∫₀ᵀ x(t) dt

in which α > 0.⁴

⁴ It may be helpful to know that α has the same physical dimension as t, so if t has dimension “time” then also α has dimension “time”.

The HJB equations (3.12), (3.13) become

∂V(x, t)/∂t + min_(u∈[−1,1]) ( (∂V(x, t)/∂x) u + x ) = 0,   V(x, T) = −αx.   (3.25)

The function to be minimized, (∂V(x, t)/∂x) u + x, is linear in u. So the minimum is attained at one
of the boundaries of U = [−1, 1]. One way to proceed would be to analyse the HJB equations
for the two cases u = ±1. But the equations are partial differential equations and these are
often very hard to solve. (In this case it can be done though.) We take another route: in
Example 3.4.2 where we analyzed a similar problem we ended up with a value function V (x, t )
of the form

V (x, t ) = xP (t ) +Q(t )

for certain functions P (t ),Q(t ). We will see that this form also works for our problem. The
HJB equations for this form simplify to

x Ṗ(t) + Q̇(t) + min_(u∈[−1,1]) ( P(t)u + x ) = 0,   xP(T) + Q(T) = −αx.

This has to hold for all x and all t so the HJB equations hold iff

Ṗ(t) = −1,   Q̇(t) = − min_(u∈[−1,1]) P(t)u,   P(T) = −α,   Q(T) = 0.   (3.26)

This settles P (t ):

P (t ) = T − α − t .

This function is positive for t < T − α and negative for t > T − α. The minimizing u ∈ [−1, 1] of
P (t )u + x hence is
u∗(t) = −1 if t < T − α,   u∗(t) = +1 if t > T − α.   (3.27)

This, in turn, settles the differential equation for Q(t ):

( Q̇(t )
+(T − α − t ) if t < T − α
Q̇(t ) =
−(T − α − t ) if t > T − α
T −α T

Since Q(T ) = 0 it follows that

( 1 T −α T
2
− 2 (T − α − t )2 − α2 if t < T − α, 2
Q(t ) = − α2
α2
+ 12 (T −α− t) − 2
2 if t > T − α.
Q(t )

This function is continuously differentiable. Now all conditions of (3.26) are met and there-
fore V (x, t ) = xP (t ) + Q(t ) satisfies the HJB equations. Along the way we also determined the
candidate optimal input: (3.27). This input does not depend on x (in most applications it
does depend on x). Clearly for this input, the solution x(t ) of ẋ(t ) = u(t ) is well defined for
all t ∈ [0, T ]. Hence (3.27) is the optimal input, the above V (x, t ) is the value function and
V (x 0 , 0) = x 0 P (0) +Q(0) is the optimal cost.

Does it agree with the Minimum Principle? The Hamiltonian is H (x, p, u) = pu + x so
the Hamiltonian equation for the costate is ṗ ∗ (t ) = −1, p ∗ (T ) = −α. Clearly this means that
p ∗ (t ) = T − α − t . Now the u that minimizes the Hamiltonian at the optimal state and costate,

H (x ∗ (t ), p ∗ (t ), u) = (T − α − t )u + x ∗ (t )

agrees what we found earlier: (3.27). But of course, the fundamental difference is that the
Minimum Principle assumes the existence of an optimal control, whereas satisfaction of the
above HJB equations proves that the control is optimal. ä
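A direct numerical check (again only a sketch, with illustrative numbers for x₀, T and α): apply the input (3.27), accumulate the cost −αx(T) + ∫₀ᵀ x(t) dt, and compare it with x₀P(0) + Q(0).

```python
def check_bang_bang(x0=2.0, T=3.0, alpha=1.0, N=300000):
    """Simulate xdot = u with u = -1 for t < T - alpha and u = +1 afterwards,
    and compare the achieved cost with V(x0, 0) = x0*P(0) + Q(0)."""
    dt = T / N
    x, integral = x0, 0.0
    for k in range(N):
        u = -1.0 if k * dt < T - alpha else 1.0
        integral += x * dt
        x += u * dt
    J = -alpha * x + integral
    V0 = x0 * (T - alpha) - 0.5 * (T - alpha) ** 2 - 0.5 * alpha ** 2
    return J, V0

print(check_bang_bang())   # both values are 1.5 up to discretization error
```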

The examples might give the impression that Dynamic Programming is superior to the
Minimum Principle. In practical applications, however, it is the other way round. The equa-
tions needed in the Minimum Principle (i.e. the Hamiltonian equations) are ordinary differen-
tial equations and numerical routines exist that are quite efficient in solving these equations.
The HJB equations, in contrast, normally are partial differential equations and the standard
HJB theory requires a higher degree of smoothness than the Minimum Principle requires.
This last point is exemplified in the next section where we derive co-states from value func-
tions of limited smoothness.

3.5 Connection with Hamiltonians


The HJB equation is actually tightly connected to the Hamiltonian H (x, p, u) = p T f (x, u) +
L(x, u). Indeed, the HJB equation (3.12) can be expressed in terms of the Hamiltonian as

∂V(x, t)/∂t + min_(u∈U) H( x, ∂V(x, t)/∂x, u ) = 0.
This suggests that the costate p ∗ (t ) in the Minimum Principle equals

p∗(t) = ∂V(x∗(t), t)/∂x.
Under mild assumptions that is indeed the case. In order to prove it for restricted sets U
(such as U = [0, 1]) we need an extension of the classic result that says that at a minimum of a
smooth function certain gradients are zero.

Lemma 3.5.1 (Technical lemma). Let U ⊆ Rm , and let G : Rn × U → R be some function. Sup-
pose Ω is a region of Rn . If

• G(x, u) is C 1 with respect x, u for all x ∈ Ω, u ∈ U,

• a (possibly nonunique) C 1 function u ∗ : Ω → U exists that minimizes G(x, u) over u ∈ U,


that is, G(x, u ∗ (x)) = minu∈U G(x, u) for all x ∈ Ω,

then
( ∂G(x, u∗(x))/∂uᵀ ) ( ∂u∗(x)/∂xᵀ ) = 0   ∀x ∈ Ω.

Proof. Denote G_uᵀ := ∂G(x, u∗(x))/∂uᵀ ∈ R^(1×m) and u_xᵀ := ∂u∗(x)/∂xᵀ ∈ R^(m×n). Let δx ∈ Rⁿ and α ∈ R. Then

G(x, u∗(x + δx α)) = G(x, u∗(x)) + G_uᵀ u_xᵀ δx α + o(α).

If G_uᵀ u_xᵀ is nonzero then the above is less than G(x, u∗(x)) for some choice of δx ∈ Rⁿ and
scalar α close enough to zero. This contradicts that u ∗ (x) minimizes G(x, u). ■

Observe that the lemma does not require u ∗ (x) to be differentiable at every x.

Example 3.5.2 (Demonstration of technical lemma). Let x, u be real numbers and consider
G(x, u) = (x − u)2 and U = [−1, 1]. Given x, the u ∈ U that minimizes G(x, u) is


u∗(x) = −1 if x < −1,   u∗(x) = x if x ∈ [−1, 1],   u∗(x) = 1 if x > 1.

This u∗(x) is differentiable everywhere except at x = ±1 and we have

∂u∗(x)/∂x = 0 if x < −1,   1 if x ∈ (−1, 1),   0 if x > 1.

The partial derivative of G(x, u) = (x − u)² with respect to u is −2(x − u) and so

(∂G(x, u∗(x))/∂u) (∂u∗(x)/∂x) = −2(x − u∗(x)) · 0 = 0 if x < −1,
                              = −2(x − u∗(x)) · 1 = −2(x − x) = 0 if x ∈ (−1, 1),
                              = −2(x − u∗(x)) · 0 = 0 if x > 1.

It is defined and equal to zero for almost all x (for all x 6= ±1.) ä

In optimal control the optimal input u(x, t ) is usually differentiable with respect to x al-
most everywhere and this is enough to make the connection between the HJB equations and
the Minimum Principle:

Theorem 3.5.3 (Connection between costate & value function). Assume f (x, u), L(x, u), S(x)
are all C 1 . Let U ⊆ Rm and suppose there is a function V : Rn × [0, T ] → R that satisfies the HJB
equation

∂V(x, t)/∂t + min_(u∈U) H( x, ∂V(x, t)/∂x, u ) = 0,   V(x, T) = S(x)

at all (x, t) where V(x, t) is continuously differentiable. Denote one possible minimizer u as
u ∗ (x, t ) and let x ∗ (t ) be the solution of ẋ(t ) = f (x(t ), u ∗ (x(t ), t )), x(0) = x 0 . If

• x ∗ (t ) and u ∗ (x ∗ (t ), t ) are defined for all t ∈ [0, T ],

• V (x, t ) is C 1 with respect to x, t at (x ∗ (t ), t ) for all t ∈ [0, T ],

• V (x, t ) is C 2 with respect to x, t at (x ∗ (t ), t ) for almost all t ∈ [0, T ],

• u ∗ (x, t ) is C 1 with respect to x, t at (x ∗ (t ), t ) for almost all t ∈ [0, T ],

then p ∗ (t ) defined as

p∗(t) = ∂V(x∗(t), t)/∂x   (3.28)
is a solution of the Hamiltonian co-state equation

∂H (x(t ), p(t ), u ∗ (t )) ∂S(x(T ))


ṗ ∗ (t ) = − , p ∗ (T ) = . (3.29)
∂x ∂x
Moreover then H (x ∗ (t ), p ∗ (t ), u ∗ (x ∗ (t ), t )) is constant as a function of time.

Proof. Let D be a region of Rn × U on which V (x, t ) is C 2 with respect to x, t and u ∗ (x, t ) is C 1
with respect to x, t . By definition, the minimizing u ∗ (x, t ) satisfies the HJB equation
∂V(x, t)/∂t + H( x, ∂V(x, t)/∂x, u∗(x, t) ) = 0   (3.30)

for all (x, t) ∈ D. In the rest of this proof we drop the arguments (x, t) and u∗. The partial derivative of the previous expression with respect to (row vector) xᵀ yields

∂²V/(∂xᵀ∂t) + ∂H/∂xᵀ + (∂H/∂pᵀ) ∂²V/(∂x∂xᵀ) + (∂H/∂uᵀ)(∂u/∂xᵀ) = 0   ∀(x, t) ∈ D.

The last term is zero on D because of Lemma 3.5.1. Using this expression and the fact that ∂H/∂p = f we find that

d/dt [ ∂V(x(t), t)/∂x ] = ∂²V/(∂x∂t) + ∂²V/(∂x∂xᵀ) f = ∂²V/(∂x∂t) + ∂²V/(∂x∂xᵀ) ∂H/∂p = −∂H/∂x   ∀(x, t) ∈ D.

Because V(x, T) = S(x) for all x, we have ∂V(x(T), T)/∂x = ∂S(x(T))/∂x. Hence, if (x∗(t), t) – as a function of time – is in D for almost all time, then p∗(t) := ∂V(x∗(t), t)/∂x satisfies the costate equation (3.29) for almost all time. By assumed continuity of ∂V(x∗(t), t)/∂x it is therefore a solution of the costate equation.

Along the optimal solution, the total derivative of the Hamiltonian with respect to time is zero almost all the time because

d/dt H(x∗(t), p∗(t), u∗(x∗(t), t)) = (∂H/∂xᵀ) ẋ∗ + (∂H/∂pᵀ) ṗ∗ + (∂H/∂u∗ᵀ)(du∗/dt)
                                 = (∂H/∂xᵀ)(∂H/∂p) + (∂H/∂pᵀ)(−∂H/∂x) = 0   ∀(x∗(t), t) ∈ D.

Here, again, the term (∂H/∂u∗ᵀ)(du∗/dt) is zero because of Lemma 3.5.1. Hence the Hamiltonian at x∗(t), p∗(t), u∗(x∗(t), t) is constant for almost all time. By assumption, the HJB equality (3.30) holds at x = x∗(t), u∗(x, t) = u∗(x∗(t), t) for all t ∈ [0, T] and, since, again by assumption, ∂V(x, t)/∂t is continuous at x = x∗(t) for all time, also H(x∗(t), p∗(t), u∗(x∗(t), t)) is continuous
for all time. Combined with the fact that it is constant for almost all time shows that it is
constant for all time. ■

As already alluded to in the previous chapter, the identity p∗(t) = ∂V(x∗(t), t)/∂x tells us that the
optimal costate function p ∗ (·) measures the sensitivity of the value function V (x ∗ (t ), t ) with
respect to variations of the optimal state at every time t . Specifically, if the costate at time
zero, p ∗ (0), is large then small changes in x 0 may have considerable effect on the optimal
cost. We apply this theorem to Example 3.4.5.
Example 3.5.4. Consider the optimal control problem of Example 3.4.5. For simplicity we
take r = T = 1. Then Example 3.4.5 says that
V(x, t) = x² P(t)   where   P(t) = ( e^(1−t) − e^(t−1) ) / ( e^(1−t) + e^(t−1) ).

Using this and the formula for x∗(t) (determined in Example 2.4.4) we find that

p∗(t) = ∂V(x∗(t), t)/∂x = 2x∗(t)P(t)
      = 2 ( x₀/(e + e^(−1)) ) ( e^(1−t) + e^(t−1) ) × ( e^(1−t) − e^(t−1) ) / ( e^(1−t) + e^(t−1) )
      = 2 ( x₀/(e + e^(−1)) ) ( e^(1−t) − e^(t−1) ).

This is exactly the p ∗ (t ) as found in Example 2.4.4. ä

For restricted sets such as U = [0, 1] the value function is typically continuously differen-
tiable everywhere (see Example 3.4.7 and Exercise 3.7) but in some cases it is continuously
differentiable only almost everywhere:

Example 3.5.5 (Non-smooth value functions). Suppose ẋ(t ) = x(t )u(t ) with U = [0, 1] and let
J [0,T ] (x 0 , u(·)) = x(T ). So we should try to make x(t ) as small (negative) as possible. Clearly
one optimal input as a function of state x and time t is
u(x, t) = 0 if x ≥ 0,   u(x, t) = 1 if x < 0,

and then the value function is

V(x, t) = x if x ≥ 0,   V(x, t) = e^(T−t) x if x < 0.

This value function is not continuously differentiable with respect to x at x = 0 and therefore the standard theory does not apply. It does satisfy the HJB equations at all x where it is continuously differentiable (at all x ≠ 0):

∂V(x, t)/∂t + min_(u∈[0,1]) ( (∂V(x, t)/∂x) xu ) = 0   ∀x ≠ 0

(for x > 0 both terms are zero, and for x < 0 they are −e^(T−t)x and +e^(T−t)x).

The connection with the co-state is explored in Exercise 3.8. ä

3.6 Infinite horizon and Lyapunov functions


For infinite horizon optimal control problems there is an interesting connection with Lya-
punov functions and stabilizing inputs. Infinite horizon refers to the case that T = ∞. As
before we consider systems, input sets and cost functions of the form

ẋ(t ) = f (x(t ), u(t )), (3.31a)


u : [0, T ] → U, (3.31b)
J_[0,∞)(z, u(·)) = ∫₀^∞ L(x(t), u(t)) dt   for x(0) = z.   (3.31c)

The only difference here is the cost function. The integral that defines the cost is now over
all t > 0, and the “final” cost S(x(∞)) has been dropped because in applications we normally
send the state to a unique equilibrium x(∞) := limt →∞ x(t ) and thus all such controls achieve
the same final cost (i.e., the final cost would not affect the optimal control). As before we
define the value function as

V(z, τ) = inf_(u:[τ,∞)→U) J_[τ,∞)(z, u(·)).   (3.32)

Because of the infinite horizon, however, the value function no longer depends on τ (see Ex-
ercise 3.9(a)) and so we can simply write

V(z) = inf_(u:[0,∞)→U) J_[0,∞)(z, u(·)).
That way the HJB equation (3.12) simplifies to
min_(u∈U) ( (∂V(x)/∂xᵀ) f(x, u) + L(x, u) ) = 0.   (3.33)
If we solve this equation then, quite often, we also found a stabilizing input (which is an
input that steers the state to some given equilibrium state x̄) and a Lyapunov function for
that equilibrium. The following example demonstrates this point.
Example 3.6.1 (Quartic control – design of optimal stabilizing inputs and Lyapunov func-
tion). Consider the optimal control problem with
ẋ(t) = u(t),   U = R,   J_[0,∞)(z, u(·)) = ∫₀^∞ x⁴(t) + u⁴(t) dt.

For this problem the infinite-horizon HJB equation (3.33) is

min_(u∈R) ( (∂V(x)/∂x) u + x⁴ + u⁴ ) = 0.

The solution u of the minimization problem is u = −( ¼ ∂V(x)/∂x )^(1/3) and then the HJB equation becomes

( ¼ − 1 ) ( ¼ )^(1/3) ( ∂V(x)/∂x )^(4/3) + x⁴ = 0.

This looks rather ugly but actually it says that

∂V(x)/∂x = 4 · 3^(−3/4) x³,
and therefore

V (x) = 3−3/4 x 4 + d

for some integration constant d . The choice d = 0 is convenient for then V (0) = 0. We claim
that this V (x) is a Lyapunov function for the equilibrium x̄ = 0 of the controlled system ẋ ∗ (t ) =
u ∗ (t ) for the control equal to the candidate optimal control,
u∗(t) = −( ¼ ∂V(x(t))/∂x )^(1/3) = −3^(−1/4) x(t).

Indeed, x̄ = 0 is an equilibrium of this controlled system and V(x) clearly is C¹, is positive definite and by construction the HJB equation (3.33) gives us that

V̇(x) = (∂V(x)/∂xᵀ) f(x, u∗) = −L(x, u∗) = −(x⁴ + u∗⁴) < 0
for all x 6= 0. This V (x) hence is a strong Lyapunov function for the controlled system with
equilibrium x̄ = 0 and, therefore, it is asymptotically stable at x̄ = 0. For this reason the con-
trol input u ∗ (·) is called a stabilizing input. In fact it is the input that minimizes the cost
J_[0,∞)(x₀, u(·)) = ∫₀^∞ x⁴(t) + u⁴(t) dt over all inputs that stabilize the system. Indeed, for any input u(·) that steers the state to zero we have the inequality

J_[0,∞)(x₀, u(·)) = ∫₀^∞ L(x(t), u(t)) dt
   ≥ ∫₀^∞ −(∂V(x(t))/∂xᵀ) f(x(t), u(t)) dt   (because of (3.33))
   = ∫₀^∞ −V̇(x(t)) dt = V(x₀) − V(x(∞)) = V(x₀)   (since V(x(∞)) = 0),

and equality holds if u(t ) = u ∗ (t ). ä
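A final sketch, again purely illustrative: simulate the stabilizing feedback u = −3^(−1/4)x over a long but finite horizon and compare the accumulated cost with V(x₀) = 3^(−3/4)x₀⁴; the horizon T and step count N below are arbitrary choices.

```python
def infinite_horizon_check(x0=1.0, T=40.0, N=400000):
    """Simulate xdot = u with the stabilizing feedback u = -3**(-0.25)*x and
    accumulate x^4 + u^4; the truncated integral approaches 3**(-0.75)*x0**4."""
    dt = T / N
    k = 3 ** (-0.25)
    x, J = x0, 0.0
    for _ in range(N):
        u = -k * x
        J += (x ** 4 + u ** 4) * dt
        x += u * dt
    return J, 3 ** (-0.75) * x0 ** 4

print(infinite_horizon_check())   # both ~ 0.4387 for x0 = 1
```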

3.7 Exercises
3.1 Consider the system

ẋ(t ) = f (x(t ), u(t )), x(0) = x 0 .

We want to maximize the cost


J_[0,T](x₀, u(·)) = S₀(x(T)) + ∫₀ᵀ L₀(x(t), u(t)) dt.

Find a new cost functional such that the maximization problem becomes a minimiza-
tion problem. How are the associated optimal inputs for the two optimization problems
related?

3.2 Not every optimal control problem is solvable. Consider the system ẋ(t ) = u(t ), x 0 = 1
with cost
J_[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) dt

and U = R.

(a) Determine the value function (from the definition, not the HJB equation).
(b) Show that the value function does not satisfy the HJB equations.
(c) Show that there is no bounded optimal control u ∗ : [0, T ] → R.

3.3 Consider the system

ẋ(t ) = x(t )u(t ), x(0) = x 0

with cost function


J_[0,T](x₀, u(·)) = 2x(T) + ∫₀ᵀ x²(t) + u²(t) dt

and with the input free to choose, u(t ) ∈ R.

(a) Determine the candidate value function V (x, t ) and candidate optimal control
u ∗ (x(t ), t ) (possibly still depending on x(t )).
Hint: assume that V (x, t ) does not depend on t , i.e. that it has the form V (x, t ) =
Q(x) for some function Q(·).
(b) Now let x 0 = 1 and T > 0. Show that the candidate value function V (x, t ) is the
value function and determine the optimal control u ∗ (t ) explicitly as a function of
time. (Hint: have look at Example B.1.5.)
(c) Now let x 0 = −1 and T = 2. Show that the candidate V (x, t ) and u ∗ (x(t ), t ) are not
the value function and not the optimal input! (In other words: what condition of
Thm. 3.4.3 fails here?)

3.4 Even though Dynamic Programming and the HJB equation are powerful concepts, we
should always strive for simpler approaches.
Consider the system

ẋ(t ) = u(t ), x(0) = x 0


with cost function J_[0,T](x₀, u(·)) = ∫₀ᵀ x²(t) dt. The problem is to minimize this with
bounded inputs

0 ≤ u(t ) ≤ 1.

(a) Use your common sense to solve the minimization problem for x 0 > 0.
(b) What are the minimal costs that you could achieve in part 3.4a?
(c) Use part 3.4b to find a candidate solution for HJB equation.
Verify that this candidate solution satisfies (3.12) for x > 0.
Does it also satisfy (3.12) for all x ∈ R?
(d) Using your common sense to solve the minimization problem for x 0 < 0. Make a
distinction between −x 0 ≤ T and −x 0 > T . What are the minimal costs now?

3.5 The capital x(t ) ≥ 0 of an economy at any moment t is divided into two parts: u(t )x(t )
and (1 − u(t ))x(t ) with

u(t ) ∈ [0, 1].

The first part, u(t )x(t ), is for investments and contributes to the increase in capital ac-
cording to the formula

ẋ(t ) = u(t )x(t ), x(0) > 0.

The other part, (1 − u(t ))x(t ), is for consumption and is evaluated by the “satisfaction”
J_[0,T](x₀, u(·)) = x(3) + ∫₀³ (1 − u(t))x(t) dt.

We want to maximize the satisfaction.

(a) Try as value function a function of the form V (x, t ) = Q(t )x and with it determine
the HJB equations.
(b) Express the candidate optimal u ∗ (t ) as a function of Q(t ) (Hint: x(t ) is always
positive.)
(c) Determine Q(t ) for all t ∈ [0, 3].
(d) Determine the optimal u ∗ (t ) explicitly as a function of time and argue that this is
the true optimal control (so not just the “candidate” optimal control).
(e) What is the optimal cost J [0,3] (x 0 , u ∗ (·))?

3.6 Weird system. Consider the system

ẋ(t ) = x(t ) + u(t ), x(0) = x 0 , u(t ) ∈ R

on the finite time horizon t ∈ [0, T ] with cost


J_[0,T](x₀, u(·)) = ½ x²(T) + ∫₀ᵀ −x²(t) − x(t)u(t) dt.   (3.34)

(a) Solve the HJB equation. [Hint: try V (x, t ) = Q(x)]


(b) Determine an optimal input u(t )
(c) Determine the optimal cost
(d) Show directly from (3.34) that for whatever input (not necessarily the optimal one)
we have J [0,T ] (x 0 , u(·)) = 21 x 02 . [Hint: use that u(t ) = ẋ(t ) − x(t ).]

3.7 Consider the system with bounded input

ẋ(t ) = u(t ), x(0) = x 0 , u(t ) ∈ [−1, 1]

on the finite time horizon t ∈ [0, T ] with the family of costs

J [τ,T ] (x(τ), u(·)) = x 2 (T ).

(a) Argue that




u(t) = { +1   if x(t) < 0
       {  0   if x(t) = 0
       { −1   if x(t) > 0

is an optimal input for J [τ,T ] (x(τ), u(·)) for every τ ∈ [0, T ].


(b) Use the above optimal input to determine the value function V (x, t ). Use the def-
inition of value function, do not use the HJB equations.
(c) Verify that this V (x, t ) satisfies the HJB equation.

3.8 Consider Example 3.5.5.

(a) Determine the co-state directly from the Minimum Principle.


(b) Argue that for x 0 = 0 there are many optimal inputs and many corresponding co-
states p ∗ (t ).
(c) Determine the co-state via Theorem 3.5.3.

3.9 Infinite horizon. Consider the infinite horizon cost


Z ∞
J [τ,∞) (x(τ), u(·)) = L(x(t ), u(t )) dt .
τ

A terminal cost is absent.

(a) Argue that the value function V (x, τ) defined in (3.32) does not depend on time τ.
(b) Suppose V (x) is a continuously differentiable function that solves this HJB equa-
tion (3.33). Show that for any admissible input for which V (x(∞)) = 0 we have
that

J [0,∞) (x 0 , u(·)) ≥ V (x 0 ).

(c) Consider the integrator ẋ(t ) = u(t ) and that u(t ) is free to choose, u(t ) ∈ R, and
suppose that the cost is
Z ∞
J [0,∞) (x 0 , u(·)) = x 2 (t ) + u 2 (t ) dt .
0

There are two continuously differentiable solutions V (x) of the HJB equa-
tion (3.33) with the property that V (0) = 0. Determine both.
(d) Continue with the system and cost of (c). Find the input u ∗ : [0, ∞) → R that mini-
mizes J [0,∞) (x 0 , u(·)) over all inputs that steer the state to zero, limt →∞ x(t ) = 0.

3.10 Consider the problem of Example 3.4.6. Argue that
µ ¶
∂V (x ∗ (t ), t ) 4 4
min u ∗ (t ) + x ∗ (t ) + u ∗ (t )
u∈U ∂x

equals −x ∗4 (T ) for all t ∈ [0, T ].

3.11 Consider a mass m hanging from a ceiling on a thin massless rod of length `, see
Fig. 3.4. We can control the pendulum with a torque u(t ). The standard mathemati-
cal model in the absence of friction is

m`2 φ̈(t ) + g m` sin(φ(t )) = u(t ),

where φ is the angle between pendulum and the vertical hanging position, u is the ap-
plied torque, m is the mass of the pendulum, ` is the length of the pendulum and g is
the gravitational acceleration.
The objective is to choose a torque u(t ) that stabilizes the pendulum to the vertical
hanging equilibrium φ = 2kπ, φ̇ = 0. This, by definition means that u(t ) is such that

lim φ(t ) = 2kπ, lim φ̇(t ) = 0.


t →∞ t →∞

To achieve an optimal stabilization, we choose the cost function


Z ∞
J [0,∞) (x 0 , u(·)) = φ̇2 (t ) + ρ 2 u 2 (t ) dt .
0

Here ρ > 0 is a tuning parameter.


R∞
(a) Prove that if u(·) stabilizes the system, then 0 u(t )φ̇(t ) dt only depends on the
initial conditions φ(0), φ̇(0).
(Hint: there is an explicit anti-derivative of u(t )φ̇(t )!)
(b) Solve the optimal control problem.
Hint: compute (φ̇ ± ρu)2 and use part 3.11a.
(c) (Tricky question.) Verify that your optimal solution makes the closed loop asymp-
totically stable. (You probably need LaSalle’s Invariance Principle, see § B.4 of the
appendix.)

3.12 Determine the input u : [0, ∞) → R that stabilizes the system ẋ(t ) = x(t ) + u(t ) (meaning
R∞
limt →∞ x(t ) = 0) and minimizes 0 u 4 (t ) dt over all inputs that stabilize the system.


Figure 3.4: A pendulum with a torque u, see Exercise 3.11

Chapter 4

Linear Quadratic Control

In this chapter we study quadratic costs for linear systems. This class of systems and costs is
broad, yet is simple enough to allow for explicit solutions. Especially for the infinite horizon
problem (explained below) there are efficient numerical routines that solve the problem com-
pletely, and they can be found in various software packages. These methods lie at the heart of
a number of popular controller design methods, such as LQR, H2 and H∞ controller design,
and it is also connected to Kalman filtering.

4.1 Linear systems with quadratic cost


The optimal control problem that we consider is the minimization of a quadratic cost
J_[0,T](x_0, u(·)) = x^T(T) S x(T) + ∫_0^T x^T(t) Q x(t) + u^T(t) R u(t) dt      (4.1)

over all inputs u : [0, T ] → Rm and states x : [0, T ] → Rn that are governed by a linear time
invariant system with given initial state,

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0 . (4.2)

No restrictions are imposed on u(t ), that is, at any time t ∈ [0, T ] the input can take any value
in Rm . The number of entries of the state x is denoted by n. The matrix B thus has n rows
and m columns and A is an n × n matrix. The weights Q and S are assumed to be positive
semidefinite n × n matrices but are otherwise arbitrary, and we assume that R is an m × m
positive definite matrix:

S ≥ 0, Q ≥ 0, R > 0.

Definition 4.1.1 (LQ problem). The finite horizon linear quadratic control problem –
LQ problem for short – is to determine inputs u : [0, T ] → Rm that minimize cost (4.1) subject
to (4.2). ä

We solve the LQ problem in detail, first using Pontryagin’s Minimum Principle and then
using Dynamic Programming. Both methods reveal that the optimal cost is quadratic in the
initial state, that is,

min J [0,T ] (x 0 , u(·)) = x 0T P x 0


u(·)

for some matrix P . The quadratic nature of the optimal cost is subsequently exploited to
derive a number of results. Most importantly it allows to elegantly solve the infinite horizon
LQ problem, which is where the final time is infinity, T = ∞, and the terminal cost is absent:
Z ∞
J [0,∞) (x 0 , u(·)) := x T (t )Qx(t ) + u T (t )Ru(t ) dt .
0

If for every x 0 this infinite horizon cost is finite for at least one input (and in practice that is
always the case) then the optimal input u(·) exists and we will see that it can be implemented
as a linear static state feedback

u(t ) = −F x(t )

for some matrix F . This is remarkable since the feedback form is not imposed on the LQ
problem. It is a result.

4.2 Finite horizon LQ: Minimum Principle


The Hamiltonian for system (4.2) with cost (4.1) is

H (x, p, u) = p T (Ax + Bu) + x T Qx + u T Ru. (4.3)

Working out the Hamiltonian equations for state (2.12a) and costate (2.12b) we obtain

ẋ(t) = Ax(t) + Bu(t),          x(0) = x_0,
ṗ(t) = −A^T p(t) − 2Qx(t),     p(T) = 2Sx(T).

Now the math to come will clean up considerably if we replace the costate p with the halved
costate p̃ defined as

1
p̃ := p.
2

Also this halved costate p̃ is called costate. This way the Hamiltonian becomes

H (x, 2p̃, u) = 2p̃ T (Ax + Bu) + x T Qx + u T Ru

and the Hamiltonian equations become

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0 , (4.4)


p̃˙ (t ) = −A T p̃(t ) −Qx(t ), p̃(T ) = Sx(T ).

According to the Minimum Principle, the optimal input at each t minimizes the Hamiltonian.
The Hamiltonian is quadratic in u with positive definite quadratic term, hence it is minimal
if-and-only-if its gradient with respect to u is zero. This gradient is

∂H (x, 2p̃, u)
= 2B T p̃ + 2Ru
∂u

and so the gradient is zero precisely when

u = −R −1 B T p̃. (4.5)

Substitution of this input into the Hamiltonian equations (4.4) yields the system of coupled
differential equations
[ ẋ(t)  ]   [  A    −B R^{-1} B^T ] [ x(t)  ]         x(0) = x_0,
[ p̃˙(t) ] = [ −Q        −A^T      ] [ p̃(t) ] ,       p̃(T) = S x(T).      (4.6)

The 2n × 2n matrix here is called a Hamiltonian matrix and we denote it by H ,


H = [  A    −B R^{-1} B^T ]
    [ −Q        −A^T      ] .      (4.7)

The coupled differential equations (4.6) is a linear time-invariant differential equation in x(t )
and p̃(t ). If we would have had only an initial or only a final state condition then we could
have easily solved (4.6). Here, though, we have combined initial and final conditions, so it
is not immediately clear how to solve the above equation. At this point, it may not be clear
that the above differential equation, with its mixed boundary conditions, has a solution at all!
Later on in this section we will see that it does. This result exploits the following remarkable
connection between state/costate and optimal cost. This connection may come as a surprise
but can be understood from the Dynamic Programming solution presented further on in this
chapter.

Lemma 4.2.1 (Optimal cost). For any solution x(t ), p̃(t ) of (4.6), the cost (4.1) for u ∗ (t ) =
−R −1 B T p̃(t ) equals

J [0,T ] (x 0 , u ∗ (·)) = p̃ T (0)x(0).

Proof. Consider first the identity (and skipping time arguments)

d/dt (p̃^T x) = p̃^T ẋ + x^T p̃˙ = p̃^T(Ax − B R^{-1} B^T p̃) + x^T(−Qx − A^T p̃)
             = −p̃^T B R^{-1} B^T p̃ − x^T Qx
             = −(u*^T R u* + x^T Qx).

This can be used to express the cost (4.1) as


J_[0,T](x_0, u*(·)) = x^T(T) S x(T) − ∫_0^T d/dt ( p̃^T(t) x(t) ) dt
                    = x^T(T) S x(T) − [ p̃^T(t) x(t) ]_0^T
                    = x^T(T) S x(T) − p̃^T(T) x(T) + p̃^T(0) x(0) = p̃^T(0) x(0).

In the final identity we used the final condition p̃(T ) = Sx(T ). ■

Example 4.2.2 (First order system). For the standard integrator system ẋ(t ) = u(t ) with
quadratic cost
Z T
J [0,T ] (x 0 , u(·)) = x 2 (t ) + u 2 (t ) dt
0

the Hamiltonian matrix (4.7) is


H = [  0   −1 ]
    [ −1    0 ] .

This is simple enough to allow for an explicit solution of its matrix exponential,
" #
Ht 1 et + e−t − et + e−t
e = .
2 − et + e−t et + e−t

The state-costate then have the form


· ¸ " #· ¸
x(t ) 1 et + e−t − et + e−t x0
=
p̃(t ) 2 − et + e−t et + e−t p̃(0)

for an, as yet, unknown initial costate p̃(0). It should be chosen such that p̃(T ) matches the
final condition p̃(T ) = Sx(T ) = 0x(T ) = 0. It is not hard to see that this requires

p̃(0) = (e^T − e^{-T}) / (e^T + e^{-T}) · x_0.

This then fully determines the state and costate for all t ∈ [0, T] as

[ x(t)  ]         [  e^t + e^{-t}    −e^t + e^{-t} ] [               1               ]
[ p̃(t) ] = (1/2) [ −e^t + e^{-t}     e^t + e^{-t} ] [ (e^T − e^{-T})/(e^T + e^{-T}) ] x_0.

The initial costate p̃(0) is linear in x 0 and therefore the entire state and costate (x(t ), p̃(t )) is
linear in x 0 . The optimal cost is quadratic in x 0 ,

J_[0,T](x_0, u*(·)) = p̃(0) x(0) = (e^T − e^{-T}) / (e^T + e^{-T}) · x_0².
The optimal input is linear in the costate,

u ∗ (t ) = −R −1 B T p̃(t ) = −p̃(t ),

and since the costate is linear in x 0 also the optimal input is linear in x 0 . ä
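For readers who want to experiment, the computation in this example is easy to reproduce numerically. The following Python sketch (not part of the original text; it assumes NumPy and SciPy are available) evaluates e^{HT}, transfers the final condition to an initial condition on p̃(0) exactly as formalized in (4.9) below, and recovers p̃(0) = tanh(T) x_0 and the optimal cost tanh(T) x_0².

```python
# Numerical check of Example 4.2.2 (an illustrative sketch, not part of the book).
import numpy as np
from scipy.linalg import expm

T, x0 = 1.5, 2.0
H = np.array([[0.0, -1.0],
              [-1.0,  0.0]])            # Hamiltonian matrix of this example
S = 0.0                                  # no terminal weight here

# Split e^{HT} into its four (here scalar) blocks Sigma_ij(T).
Sig = expm(H * T)
S11, S12, S21, S22 = Sig[0, 0], Sig[0, 1], Sig[1, 0], Sig[1, 1]

# Initial costate: p~(0) = M x0 with M = -(S*S12 - S22)^{-1} (S*S11 - S21).
M = -(S * S11 - S21) / (S * S12 - S22)
p0 = M * x0

print(p0, np.tanh(T) * x0)               # both equal (e^T - e^{-T})/(e^T + e^{-T}) x0
print(p0 * x0)                           # optimal cost p~(0) x(0) = tanh(T) x0^2
```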

In the above example we managed to transfer the final condition p̃(T ) = Sx(T ) into an
equivalent initial condition on p̃(0), and this proved that the solution of the Hamiltonian
equation exists and is unique. We will shortly see that this always works! The general pro-
cedure is as follows. First compute the 2n × 2n matrix exponential of the Hamiltonian matrix
and split it into four n × n blocks:
[ Σ_11(t)   Σ_12(t) ]
[ Σ_21(t)   Σ_22(t) ] = e^{Ht}.

Now the state-costate as a function of (known) x 0 and (unknown) p̃(0) is


[ x(t)  ]   [ Σ_11(t)   Σ_12(t) ] [ x_0   ]
[ p̃(t) ] = [ Σ_21(t)   Σ_22(t) ] [ p̃(0) ]

and this shows that the final condition p̃(T ) = Sx(T ) can be rewritten as

0 = S x(T) − p̃(T)
  = [ S   −I ] [ x(T) ; p̃(T) ]
  = [ S   −I ] [ Σ_11(T)   Σ_12(T) ; Σ_21(T)   Σ_22(T) ] [ x_0 ; p̃(0) ]
  = [S Σ_11(T) − Σ_21(T)] x_0 + [S Σ_12(T) − Σ_22(T)] p̃(0).      (4.8)

Clearly this final condition is satisfied if and only if

p̃(0) = M x 0

where

M = −[SΣ12 (T ) − Σ22 (T )]−1 [SΣ11 (T ) − Σ21 (T )]. (4.9)

The question, of course, is: does the above inverse exist? The answer is yes:

Theorem 4.2.3 (Existence and uniqueness of solution). Suppose that Q ≥ 0, S ≥ 0, R >


0. Then the linear system with mixed boundary conditions (4.6) has a unique solution
(x(t ), p̃(t )) on time interval [0, T ]. In particular, the matrix M in (4.9) is well defined and,
hence,
[ x(t)  ]   [ Σ_11(t)   Σ_12(t) ] [ I ]
[ p̃(t) ] = [ Σ_21(t)   Σ_22(t) ] [ M ] x_0        ∀ t ∈ [0, T].      (4.10)

Proof. First take x 0 = 0. Lemma 4.2.1 showed that


Z T
T
x (T )Sx(T ) + x T (t )Qx(t ) + u ∗T (t )Ru ∗ (t ) dt = p̃ T (0)x 0
0

and here that is zero because we took x 0 = 0. Since all terms on the left-hand side of the above
equation are nonnegative it must be that all these terms are zero. In particular u ∗T (t )Ru ∗ (t ) ≡
0. Now R > 0 so necessarily u(t ) ≡ 0. This, in turn, implies that ẋ(t ) = Ax(t ) + Bu ∗ (t ) = Ax(t ).
Given x(0) = 0 we get x(t ) = 0 for all time and, as a result, p̃˙ (t ) = −Qx(t ) − A T p̃(t ) = −A T p̃(t )
and p̃(T ) = Sx(T ) = 0. This shows that p̃(t ) is zero for all time as well. Conclusion: for x 0 =
0 the solution (x(·), p̃(·)) of (4.6) exists and is unique. This implies that SΣ12 (T ) − Σ22 (T ) is
nonsingular for otherwise there would have existed multiple p̃(T ) that satisfy the boundary
condition (4.8). Invertibility of SΣ12 (T ) − Σ22 (T ) in turn shows that the final condition (4.8)
has a unique solution p̃(0) for every initial state x 0 . ■

Notice that p̃(0) according to (4.10) is linear in the initial state: p̃(0) = M x 0 . Hence the op-
timal cost is quadratic in x 0 (see Lemma 4.2.1). There is also an elegant elementary argument
why the optimal cost is quadratic in the state, see Exercise 4.6.

Example 4.2.4 (Integrator, see also Example 3.4.4). Consider once again the integrator sys-
tem ẋ(t ) = u(t ) and take as cost
Z T
2
J [0,T ] (x 0 , u(·)) = x (T ) + Ru 2 (t ) dt (4.11)
0

for some R > 0. Then


H = [ 0   −1/R ]
    [ 0     0  ]

and

e^{Ht} = [ 1   −t/R ]
         [ 0     1  ] .

The final condition on p̃(T ) can be transferred to a (unique) initial condition on p̃(0). Indeed
the final condition is met if and only if

0 = S x(T) − p̃(T)
  = [ S   −1 ] e^{HT} [ x_0 ; p̃(0) ]
  = [ 1   −1 ] [ 1   −T/R ; 0   1 ] [ x_0 ; p̃(0) ]
  = x_0 − (T/R + 1) p̃(0).

This is the case iff


x0
p̃(0) = .
T /R + 1

It is linear in x 0 (as predicted) and the inverse required exists (as predicted) because T /R ≥ 0
so T /R + 1 6= 0. The optimal cost is quadratic (predicted as well), in fact,

x 02
J [0,T ] (x 0 , u ∗ (·)) = p̃(0)x(0) = .
T /R + 1

Special about this example is that the costate is constant p̃(t ) = p̃(0). The optimal input is
constant as well
u*(t) = −(1/R) p̃(t) = −p̃(0)/R = −x_0 / (T + R).

For R À T the optimal control u ∗ (t ) is small, which is to be expected because for large R
the input is penalized strongly in the cost (4.11). If R ≈ 0 then control is cheap. Now the
control is not necessarily large, u ∗ (t ) ≈ −x 0 /T but large enough to steer the final state x(T ) to
something close to zero, x(T ) = x 0 (1 − T /(R + T )) = x 0 R/(T + R) ≈ 0. ä

Example 4.2.5 (Second order system with mixed boundary condition). This is a laborious
example. Consider the system with initial condition
[ ẋ_1(t) ]   [ 0   1 ] [ x_1(t) ]   [ 0 ]               [ x_1(0) ]   [ 1 ]
[ ẋ_2(t) ] = [ 0   0 ] [ x_2(t) ] + [ 1 ] u(t),         [ x_2(0) ] = [ 0 ]

and with cost


Z 3
J [0,3] (x 0 , u(·)) = x 12 (3) + u 2 (t ) dt .
0

The Hamilton equations (4.6) then become (verify this)

[ ẋ_1(t)  ]   [ 0   1    0    0 ] [ x_1(t)  ]
[ ẋ_2(t)  ]   [ 0   0    0   −1 ] [ x_2(t)  ]
[ p̃˙_1(t) ] = [ 0   0    0    0 ] [ p̃_1(t) ]                  (4.12)
[ p̃˙_2(t) ]   [ 0   0   −1    0 ] [ p̃_2(t) ]

with constraints

[ x_1(0) ]   [ 1 ]          [ p̃_1(3) ]   [ x_1(3) ]
[ x_2(0) ] = [ 0 ] ,        [ p̃_2(3) ] = [    0   ] .      (4.13)

Now we try to solve (4.12). The differential equation for p̃ 1 (t ) simply is p̃˙1 (t ) = 0, p̃ 1 (3) = x 1 (3),
and therefore has solution

p̃ 1 (t ) = x 1 (3). (4.14)

The differential equation for p̃ 2 (t ) now is

p̃˙2 (t ) = −p̃ 1 (t ) = −x 1 (3), p̃ 2 (3) = 0,

so that

p̃ 2 (t ) = (3 − t )x 1 (3). (4.15)

With this solution, we can write the differential equation for x 2 (t ) explicitly

ẋ 2 (t ) = −p̃ 2 (t ) = (t − 3)x 1 (3), x 2 (0) = 0.

This equation, too, is not difficult to solve;

1
x 2 (t ) = t (t − 6)x 1 (3). (4.16)
2

Finally, we have to solve the differential equation for x 1 (t ), given by

1
ẋ 1 (t ) = x 2 (t ) = t (t − 6)x 1 (3), x 1 (0) = 1.
2

Its solution is
x_1(t) = ( t³/6 − 3t²/2 ) x_1(3) + 1.      (4.17)

The only unknown we are left with is x 1 (3). From Equation (4.17) it follows that
x_1(3) = ( 9/2 − 27/2 ) x_1(3) + 1.

I.e.,

1
x 1 (3) = . (4.18)
10

Now we have solved the differential equation (4.12), and the solution is given by (4.14)–(4.17),
with x 1 (3) equal to 1/10, see (4.18). Hence, the optimal control (4.5) is

u(t) = −R^{-1} B^T p̃(t) = −B^T p̃(t) = −[ 0   1 ] p̃(t) = −p̃_2(t) = (t − 3)/10.

The optimal cost is

[ p̃_1(0) ; p̃_2(0) ]^T [ x_1(0) ; x_2(0) ] = [ 1/10 ; 3/10 ]^T [ 1 ; 0 ] = 1/10.
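The laborious calculation above can be cross-checked numerically. The sketch below (illustrative only, not from the original text; it uses NumPy and SciPy) builds the Hamiltonian matrix (4.7), transfers the final condition via the matrix M of (4.9), and recovers p̃(0) = (1/10, 3/10) and the optimal cost 1/10.

```python
# Numerical cross-check of Example 4.2.5 (sketch): transfer the mixed boundary
# conditions with the matrix exponential of the Hamiltonian.
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.zeros((2, 2))
R = np.array([[1.0]])
S = np.array([[1.0, 0.0], [0.0, 0.0]])   # terminal cost x1(3)^2
T = 3.0
x0 = np.array([1.0, 0.0])

H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])               # Hamiltonian matrix (4.7)
Sig = expm(H * T)
S11, S12 = Sig[:2, :2], Sig[:2, 2:]
S21, S22 = Sig[2:, :2], Sig[2:, 2:]

M = -np.linalg.solve(S @ S12 - S22, S @ S11 - S21)   # matrix M from (4.9)
p0 = M @ x0
print(p0)          # approx [0.1, 0.3], i.e. p~(0) = (1/10, 3/10)
print(p0 @ x0)     # optimal cost, approx 0.1
```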

4.3 Finite Horizon LQ: Dynamic Programming
Dynamic Programming applies to the LQ problem as well. The crucial difference with the ap-
proach of the previous section is that while the Minimum Principle supplies necessary condi-
tions for optimality, with Dynamic Programming we have sufficient conditions for optimality.
That said, the equation that has to be solved in Dynamic Programming is a partial differential
equation (called Hamilton-Jacobi-Bellman (HJB) equation) and that is no easy task. For LQ it
can be done, however.
So consider again a linear system

ẋ(t ) = Ax(t ) + Bu(t ), x(t 0 ) = x 0 , (4.19)

with, again, the quadratic cost


Z T
J [0,T ] (x 0 , u(·)) = x T (T )Sx(T ) + x T (t )Qx(t ) + u T (t )Ru(t ) dt . (4.20)
0

Here, as before S and Q are symmetric n ×n positive semidefinite matrices and R is an m ×m


positive definite matrix. The HJB equations (3.12,3.13) for this problem are
∂V(x,t)/∂t + min_{u∈R^m} [ ∂V(x,t)/∂x^T (Ax + Bu) + x^T Qx + u^T Ru ] = 0,      V(x, T) = x^T Sx.      (4.21)
We determine a solution V (x, t ) for this equation. Because the optimal cost according to Pon-
tryagin is quadratic, we expect the value function to be quadratic in x as well. Based on this
we restrict our V (x, t ) to functions of the form

V (x, t ) = x T P (t )x

with P (t ) an n × n symmetric matrix. This may seem a restrictive assumption, but the beauty
of Dynamic Programming is that satisfaction of the HJB equation for a restricted class proves
that it was not a restriction after all. Using this quadratic V (x, t ), the HJB equations (4.21)
become
x^T Ṗ(t) x + min_{u∈R^m} [ 2x^T P(t)(Ax + Bu) + x^T Qx + u^T Ru ] = 0,      x^T P(T) x = x^T Sx.      (4.22)

The minimization over u can, like in the previous section, be solved by setting the gradient of
2x T P (t )(Ax + Bu) + x T Qx + u T Ru with respect to u equal to zero. This gives

u = −R −1 B T P (t )x

(verify this yourself), and thereby the HJB equation reduces to

x T [Ṗ (t ) + P (t )A + A T P (t ) +Q − P (t )B R −1 B T P (t )]x = 0, x T P (T )x = x T Sx.

All terms here have a factor x T (on the left) and a factor x (on the right). Now if the equation
with the x T and x removed has a solution then clearly the above equation with x T and x in
place has a solution as well. The differential equation with the x T and x removed is

Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) −Q, P (T ) = S . (4.23)

It is a nonlinear, n × n-matrix-valued differential equation, called Riccati Differential Equa-


tion (RDE). The existence of a solution of this RDE is not straightforward but if it exists on
[0, T ] then the candidate optimal control

u ∗ (t ) = −R −1 B T P (t )x(t )

makes the closed-loop system satisfy

ẋ(t ) = Ax(t ) + Bu ∗ (t ) = (A − B R −1 B T P (t ))x(t ), x(0) = x 0

This is a linear differential equation and it has a unique solution x(t ) on [0, T ] for every con-
tinuous P (t ). So then Thm. 3.4.3 guarantees that this u ∗ (·) is the optimal input for every x 0
and that V (x, t ) = x T P (t )x is the value function. In particular V (x 0 , 0) = x 0T P (0)x 0 is then the
optimal cost. Therefore we proved:

Proposition 4.3.1 (Solution of the finite horizon LQ problem). Let Q, S, R be symmetric and
R > 0. If the RDE (4.23) has a continuously differentiable solution P : [0, T ] → Rn×n , then the
LQ problem (4.1)–(4.2) is solvable for every x 0 ∈ Rn . In particular then

u ∗ (t ) = −R −1 B T P (t )x(t ) (4.24)

is the optimal input, and the optimal cost is

J [0,T ] (x 0 , u ∗ (·)) = x 0T P (0)x 0 (4.25)

and V (x, t ) := x T P (t )x is its value function. ä

Notice that this proposition is also valid for symmetric matrices S,Q that are not positive
semidefinite. The optimal control (4.24) is of a special form: first we have to determine the so-
lution P (t ) to the matrix RDE, but this can be done irrespective of x 0 . Once this is determined
the optimal control can be implemented as a static time-varying state feedback (4.24).

Example 4.3.2 (Example 4.2.4 continued). Consider again the integrator system ẋ(t ) = u(t )
of Example 4.2.4 with
Z T
J (x 0 , u(·)) = x 2 (T ) + Ru 2 (t ) dt
0

for some R > 0. The RDE (4.23) becomes

Ṗ (t ) = P 2 (t )/R, P (T ) = 1.

The solution can be found with separation of variables,


P(t) = R / (R + T − t) = 1 / (1 + (T − t)/R).

Since t ∈ [0, T] and R > 0 we see that R + T − t > 0 throughout and so that P(t) is well defined
on [0, T]. Hence x² P(t) is the value function with optimal cost

J(x_0, u*(·)) = x_0² P(0) = x_0² / (1 + T/R)

and the optimal input is

u*(t) = −R^{-1} B^T P(t) x(t) = −P(t) x(t)/R = −x(t) / (R + T − t).      (4.26)
In this example the optimal control u ∗ (·) is given in state feedback form, while in Exam-
ple 4.2.4 (where we handled the same LQ problem) the control input is given as a function
of time. The feedback form is preferred in applications, but for this particular problem the
feedback form (4.26) blurs the fact that the resulting optimal state and control are just linear
functions, see Example 4.2.4. ä
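The closed-form solution of the RDE found here can also be checked by integrating (4.23) backward in time numerically. The following Python sketch (not part of the original text; the values of R and T are arbitrary choices) does exactly that and compares with P(t) = R/(R + T − t).

```python
# Numerical check of Example 4.3.2 (sketch): integrate the Riccati differential
# equation backward in time and compare with the closed-form solution.
import numpy as np
from scipy.integrate import solve_ivp

R, T = 0.5, 2.0

def rde(t, p):                      # Pdot = P^2 / R   (A = 0, B = 1, Q = 0 here)
    return p**2 / R

# Integrate from t = T down to t = 0 with the terminal condition P(T) = 1.
sol = solve_ivp(rde, [T, 0.0], [1.0], dense_output=True, rtol=1e-9, atol=1e-12)

t = np.linspace(0.0, T, 5)
P_num = sol.sol(t)[0]
P_exact = R / (R + T - t)
print(np.max(np.abs(P_num - P_exact)))   # small, of the order of the tolerances
```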

4.4 Riccati Differential Equations (RDE’s)
Proposition 4.3.1 assumes the existence of a solution of the RDE. The good news is that for all
LQ problems (where S ≥ 0,Q ≥ 0, R > 0) it can be proved that such a solution exists. Thus for
LQ problems we have a complete solution:

Theorem 4.4.1 (Existence of solution of RDE’s). If S ≥ 0,Q ≥ 0, R > 0 then the RDE (4.23) has
a unique continuously differentiable solution P (t ) for t ∈ [0, T ] and

1. P (t ) is symmetric at every t ∈ [0, T ],

2. P (t ) is positive semi-definite at every t ∈ [0, T ].

Consequently, the LQ problem has a unique solution u ∗ (t ) = −R −1 B T P (t )x(t ) with value func-
tion V (x, t ) := x T P (t )x and optimal cost J [0,T ] (x 0 , u ∗ (·)) = x 0T P (0)x 0 .

Proof. Equation (4.23) is equivalent to a system of n 2 differential equations in the entries


p i j (t ), i , j = 1, . . . , n of P (t ). The right-hand side of this equation consists of polynomials in
p i j (t ), and hence it is continuously differentiable and the differential equation therefore lo-
cally Lipschitz. We conclude from Theorem B.1.3 that the solution P (t ) exists and is unique
on an interval (t esc , T ] for some t esc < T . It is easy to see that also P T (t ) is a solution, so, being
unique, we have that P (t ) is symmetric.
Now suppose that it has an escape time t esc ∈ [0, T ]. From Thm. B.1.4 it follows that for
t ↓ t esc , the norm of the vector p i j (t ), i , j = 1, . . . , n goes to infinity. This implies that at least
one entry p i j (t ) grows without bound as t ↓ t esc :

lim |p i j (t )| = ∞.
t ↓t esc

We now show that this leads to a contradiction. Let e i be the i -th basis vector of Rn . Because
P (t ) is symmetric, it is easy to see that

(e i + e j )T P (t )(e i + e j ) − (e i − e j )T P (t )(e i − e j ) = 4p i j (t ).

So either (e i + e j )T P (t )(e i + e j ), or (e i − e j )T P (t )(e i − e j ) is unbounded. Now, choose the initial


state z equal to e i + e j or e i − e j , whichever results in an unbounded z T P (t )z as t ↓ t esc . From
the preceding discussion we know that z T P (t )z is the value function V (z, t ) for t ∈ (t esc , T ] but
the value function for sure is bounded from above by the cost that we make for the zero input,

z T P (t )z = min J [t ,T ] (z, u(·))


u(·)
Z T
T T
≤ J [t ,T ] (z, 0) = z T e A (T −t )
S e A(T −t ) z + z T eA (τ−t )
Q e A(τ−t ) z dτ.
t

Our z T P (t )z can therefore not escape to +∞ in finite time. (By the way, it can also not escape
to −∞ because z T P (t )z = minu(·) J [t ,T ] (z, u(·)) ≥ 0.). Hence z T P (t )z does not escape on [0, T ].
Contradiction. The differential equation therefore does not have a finite escape time t esc ∈
[0, T ]. Lemma B.1.4 now guarantees that the differential equation has a unique solution P (t )
on the entire [0, T ].
Now that existence of P (t ) is proved, Proposition 4.3.1 tells us that the given u ∗ (·) is op-
timal and x T P (t )x is the value function and x 0T P (0)x 0 is the optimal cost. We showed at the
beginning of the proof that P (t ) is symmetric. It is also positive semi-definite because the
cost is nonnegative. ■

In this proof non-negativity of Q, S, R is used. These assumptions are standard in LQ, but
it is interesting to see what happens if an assumption fails. Then the solution P (t ) might
escape in finite time:

Example 4.4.2 (Negative Q, finite escape time). Consider the integrator system ẋ(t ) = u(t )
for some nonzero x(0) = x 0 and cost
Z T
J [0,T ] (x 0 , u(·)) = −x 2 (t ) + u 2 (t ) dt .
0

This is a non-standard LQ problem because Q = −1 < 0. The RDE (4.23) specializes to

Ṗ (t ) = P 2 (t ) + 1, P (T ) = 0.

With separation of variables one can show that its solution is

P(t) = tan(t − T),      t ∈ (T − π/2, T].

This solution P(t) escapes at

t_esc = T − π/2.

See Fig. 4.1. If T < π/2 then there is no escape time in [0, T] and hence P(t) = tan(t − T) is then
well defined on the entire horizon [0, T ] and consequently

V (x, t ) = x 2 tan(t − T )

is the value function and

u ∗ (t ) = − tan(t − T )x(t )

is the optimal state feedback.


However if T ≥ π/2 then the escape time t_esc is in [0, T], see Fig. 4.1 (right). We claim that in
this case the optimal cost is unbounded from below. That is, it can be made as small (negative)
as we desire. The idea is to set the input equal to zero up to just after t_esc and then to
continue optimally:

u_ε(t) = { 0                    for t ≤ t_esc + ε
         { −tan(t − T) x(t)     for t > t_esc + ε

for some small ε > 0. Since ẋ(t) = u(t) it means that x(t) is constant over [0, t_esc + ε] and then
it continues optimally over [t_esc + ε, T]. The total cost for this input is

J_[0,T](x_0, u_ε(·)) = J_[0,t_esc+ε](x_0, u_ε(·)) + V(x_0, t_esc + ε)
                     = −(t_esc + ε) x_0² + tan(−π/2 + ε) x_0².

It converges to −∞ as ε ↓ 0. ä
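The finite escape time can be observed numerically as well. The sketch below (illustrative only, not from the original text) integrates Ṗ(t) = P²(t) + 1 backward from P(T) = 0 for a horizon T ≥ π/2; one expects the integrator to give up close to t_esc = T − π/2, because P(t) blows up there.

```python
# Illustration of the finite escape time in Example 4.4.2 (sketch).
import numpy as np
from scipy.integrate import solve_ivp

T = 2.0                                  # T >= pi/2, so the escape time lies in [0, T]

def rde(t, p):
    return p**2 + 1.0                    # Pdot = P^2 + 1, terminal condition P(T) = 0

sol = solve_ivp(rde, [T, 0.0], [0.0], max_step=1e-3)
print(sol.status)                        # typically -1: the solver stops before t = 0
print(sol.t[-1], T - np.pi / 2)          # it stalls close to the escape time t_esc
```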

Figure 4.1: Graph of tan(t − T) for t ∈ [0, T]. Left: if 0 < T < π/2, then tan(t − T) is defined for all t ∈ [0, T]. Right: if T ≥ π/2, then tan(t − T) is not defined at T − π/2 ∈ [0, T]. See Example 4.4.2

4.5 Infinite horizon LQ and Algebraic Riccati Equations (ARE’s)


Now we turn to the infinite horizon LQ problem. This is the problem of minimizing
Z ∞
J [0,∞) (x 0 , u(·)) = x T (t )Qx(t ) + u T (t )Ru(t ) dt (4.27)
0

over all u : [0, ∞) → Rm under the dynamical constraint

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0 .

As before we assume that R is positive definite and that Q is positive semidefinite. The termi-
nal cost x T (∞)Sx(∞) is absent. (For the problems that we have in mind the state converges
to zero so the terminal cost would not contribute anyway.) We approach the infinite horizon
LQ problem as the limit as T → ∞ of the finite horizon LQ problem over the time-window
[0, T ]. To make the dependence on T explicit we write the solution of the RDE (4.23) with a
subscript T , that is,

Ṗ T (t ) = −P T (t )A − A T P T (t ) + P T (t )B R −1 B T P T (t ) −Q, P T (T ) = 0. (4.28)

Example 4.5.1. Consider again the integrator

ẋ(t ) = u(t ),

still with the finite horizon cost


Z T
J [0,T ] (x 0 , u(·)) = x 2 (t ) + u 2 (t ) dt .
0

The associated RDE is

Ṗ T (t ) = P T2 (t ) − 1, P T (T ) = 0.

It has solution
P_T(t) = tanh(T − t) = ( e^{T−t} − e^{−(T−t)} ) / ( e^{T−t} + e^{−(T−t)} ).

This solution converges to a constant when T goes to infinity. ä

The example suggests that P T (t ) converges to a constant P as the horizon T goes to ∞. It
also suggests that limT →∞ Ṗ T (t ) = 0, which in turn suggests that the Riccati differential equa-
tion in the limit reduces to an algebraic equation,

0 = A T P + P A − P B R −1 B T P +Q . (4.29)

This type of equation is known as an Algebraic Riccati Equation (ARE). The following extensive
theorem shows that that is indeed the case. It requires just one condition: for each x 0 there
needs to exist at least one input that renders the cost J [0,∞) (x 0 , u(·)) finite (and, as always,
Q ≥ 0, R > 0).

Theorem 4.5.2 (Infinite horizon LQ). Consider ẋ(t ) = Ax(t ) + Bu(t ) and suppose Q ≥ 0, R > 0
and that for every x 0 an input exists that renders the cost (4.27) finite. Then the solution P T (t )
of (4.28) converges to a matrix independent of t as the final time T goes to infinity. That is, a
constant matrix P exists such that

lim P T (t ) = P ∀t > 0. (4.30)


T →∞

This matrix P has a number of properties.

1. P ≥ 0.

2. P satisfies the ARE (4.29), (but the ARE has more than one solution).

3. If an input u(·) achieves a finite cost then


Z ∞
J [0,∞) (x 0 , u(·)) = x 0T P x 0 + v T (t )R v(t ) dt
0

where v(t ) is defined as v(t ) = u(t ) + R −1 B T P x(t ).

4. The input that minimizes the infinite horizon cost (4.27) is u(t ) = −R −1 B T P x(t ) and the
optimal cost is

min J [0,∞) (x 0 , u(·)) = x 0T P x 0 .


u(·)

Proof. For every fixed x 0 the expression x 0T P T (t )x 0 is nondecreasing with T because the
longer the horizon the higher the cost. Indeed for every ² > 0 and initial x(t ) = z we have
Z T +²
z T P T +² (t )z = x ∗T (t )Qx ∗ (t ) + u ∗T (t )Ru ∗ (t ) dt
t
Z T
≥ x ∗T (t )Qx ∗ (t ) + u ∗T (t )Ru ∗ (t ) dt ≥ z T P T (t )z.
t

Besides being nondecreasing, it is, for any given z, also bounded from above because by as-
sumption for at least one input u z (·) the infinite horizon cost is finite,

z T P T (t )z ≤ J [t ,T ] (z, u z (·)) ≤ J [t ,∞) (z, u z (·)) < ∞.

Bounded and nondecreasing implies that z T P T (t )z converges as T → ∞. Next we prove that


in fact the entire matrix P T (t ) converges as T → ∞. Let e i be the i -th unit vector in Rn , so
e i = (0, . . . , 0, 1, 0, . . . , 0)T , with a 1 on the i-th position. The preceding discussion shows that for
each z = e i , the limit

p i i := lim e iT P T (t )e i
T →∞

exists. The diagonal entries of P T (t ) hence converge. For the off-diagonal entries we use that

lim [e i + e j ]T P T (t )[e i + e j ] = lim e iT P T (t )e i + e Tj P T (t )e j + 2e iT P T (t )e j


T →∞ T →∞
= p i i + p j j + lim 2e iT P T (t )e j .
T →∞

The limit on the left-hand side exists, so the limit p i j := limT →∞ e iT P T (t )e j exists as well.
Therefore all entries of P T (t ) converge. The limit is independent of t (see Exercise 4.16). Re-
mains to prove the 4 items:

1. P ≥ 0 because it is the limit of P T (t ) ≥ 0.

2. Since P T (t ) converges also Ṗ T (t ) = −P T (t )A − A T P T (t ) + P T (t )B R −1 B T P T (t ) − Q con-


verges. It converges to zero because the limit P T (t ) is independent of t .

3. First realize that for every input, an infinite horizon cost is never less than a finite hori-
zon cost:

J [0,∞) (x 0 , u(·)) ≥ J [0,τ] (x 0 , u(·)) + J [τ,T ] (x(τ), u(·))


≥ J [0,τ] (x 0 , u(·)) + x T (τ)P T (τ)x(τ)

for every T > τ > 0. Since this inequality holds for every T > τ it also holds in the limit
T → ∞:

J [0,∞) (x 0 , u(·)) ≥ J [0,τ] (x 0 , u(·)) + x T (τ)P x(τ).

Now for τ → ∞ the above two J ’s converge to each-other. Hence provided that
J [0,∞) (x 0 , u(·)) is finite we have

lim_{τ→∞} x^T(τ) P x(τ) = 0.      (4.31)

This we need in the rest of the proof.


Finally we redo the proof of the HJB theorem (Thm. 3.4.3) but now we exploit the spe-
cific structure of the LQ problem: define V (x) as

V (x) = x T P x.

Then (and skipping time arguments)


J_[0,∞)(x_0, u) = ∫_0^∞ L(x, u) dt
               = ∫_0^∞ (∂V(x)/∂x^T) f(x, u) + L(x, u) − (∂V(x)/∂x^T) f(x, u) dt
               = ∫_0^∞ 2x^T P(Ax + Bu) + x^T Qx + u^T Ru − V̇(x) dt
               = ∫_0^∞ x^T(PA + A^T P + Q)x + 2x^T P Bu + u^T Ru − V̇(x) dt            (4.32)

now we use that P satisfies the ARE, so A^T P + P A + Q equals P B R^{-1} B^T P:

               = ∫_0^∞ x^T(P B R^{-1} B^T P)x + 2x^T P Bu + u^T Ru − V̇(x) dt
               = ∫_0^∞ (u + R^{-1} B^T P x)^T R (u + R^{-1} B^T P x) − V̇(x) dt.        (4.33)

The final identity is not entirely straightforward. Verify it. It looks cleaner if we define
the signal v(t ) as

v(t ) := u(t ) + R −1 B T P x(t ).

That way (4.33) becomes


Z ∞
J [0,∞) (x 0 , u(·)) = v T (t )R v(t ) − V̇ (x(t )) dt .
0

Clearly the integral of −V̇ (x(t )) from 0 to ∞ equals V (x(0)) − V (x(∞)) and so we find
that
Z ∞
J [0,∞) (x 0 , u(·)) = V (x 0 ) − V (x(∞)) + v T (t )R v(t ) dt (4.34)
0

for every u(·) (optimal or not). If u(·) achieves a finite cost then V (x(∞)) = 0 because
of (4.31). Hence then we get what we wanted to prove:
Z ∞
J [0,∞) (x 0 , u(·)) = V (x 0 ) + v T (t )R v(t ) dt . (4.35)
0

4. The input u(t ) = −R −1 B T P x(t ) achieves a finite cost because then v(t ) = 0 and so
from (4.34) we find that J (x 0 , u(·)) = V (x 0 ) − V (x(∞)) ≤ V (x 0 ) which is finite. Hence
the previous part applies which says that the cost in fact then equals (4.35), i.e. equals
V (x 0 ) = x 0T P x 0 . Clearly (4.35) can not be less than V (x 0 ). The input u(t ) = −R −1 B T P x(t )
is therefore the optimal input and x 0T P x 0 is the optimal cost.

Example 4.5.3. Consider once more the integrator system

ẋ(t ) = u(t ),

with the cost


Z ∞
J [0,∞) (x 0 , u(·)) = x 2 (t ) + u 2 (t ) dt .
0

First notice that u(t) := −x(t) ensures that the cost is finite, so we can apply Thm. 4.5.2. The
ARE is

−P² + 1 = 0.

Obviously it has two solutions, P = ±1, and since Theorem 4.5.2 guarantees that the solution
that we need is ≥ 0 we have

P = 1.

The LQ-optimal input hence is the state feedback,

u(t ) = −R −1 B T P x(t ) = −x(t )

and the optimal cost is

P x 02 = x 02 .

Application of the optimal u(t ) = −x(t ) results in a stable closed loop,

ẋ(t ) = Ax(t ) + Bu(t ) = 0 − x(t ) = −x(t ). (4.36)

The state converges exponentially fast to zero, and then so does the input u(t ) = −x(t ). In
fact, it is easy to verify from this optimal state and input that the optimal cost is
∫_0^∞ x²(t) + u²(t) dt = ∫_0^∞ x_0² e^{−2t} + x_0² e^{−2t} dt = x_0².

This is indeed the same as V (x 0 , 0) = x 0 P x 0 = x 02 . ä
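For larger problems one would not solve the ARE by hand. The following short Python sketch (not part of the original text; it assumes SciPy is available) obtains the same stabilizing solution P = 1 and the same feedback gain numerically.

```python
# The same infinite horizon problem solved numerically (illustrative sketch).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0]])
B = np.array([[1.0]])
Q = np.array([[1.0]])
R = np.array([[1.0]])

P = solve_continuous_are(A, B, Q, R)     # stabilizing solution of the ARE
F = np.linalg.solve(R, B.T @ P)          # optimal state feedback gain: u = -F x
print(P, F)                              # both equal [[1.]]
```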

In this example we could easily figure out which solution of the ARE to take because we
know that the solution P is positive semi-definite. Also interesting is the fact that the closed
loop (4.36) is asymptotically stable (with zero equilibrium). This holds for a large class of
systems and opens up alternative ways to find P , as in the above example. The next theorem
assumes familiarity1 with detectability and stabilizability, see Appendix A.6.

Theorem 4.5.4 (Alternative ways to find P ). Suppose that Q ≥ 0, R > 0 and that (A, B ) is sta-
bilizable and (Q, A) detectable. Then the solution P defined in (4.30) that determines the
solution of the infinite horizon LQ problem exists and can alternatively be characterized as
follows:

1. The ARE (4.29) has a unique solution P ∈ Rn×n for which A −B R −1 B T P is asymptotically
stable, and this is the LQ-solution P of (4.30).
Hence the state feedback u ∗ (t ) = −R −1 B T P x(t ) is a stabilizing state feedback, meaning
that the eigenvalues of A − B R −1 B T P of the feedback system ẋ(t ) = Ax(t ) + Bu ∗ (t ) =
(A − B R −1 B T P )x(t ) have strictly negative real part.

2. The ARE (4.29) has a unique positive semi-definite solution, and this is the LQ-solution
P of (4.30).

Proof.

1. Suppose, to obtain a contradiction, that A − B R −1 B T P has an eigenvalue λ0 with


nonnegative real part. Let x 0 be a corresponding eigenvector. For this initial
state we have x(t ) = eλ0 t x 0 and, therefore, u(t ) = − eλ0 t R −1 B T P x 0 . Since J < ∞ it
R∞
must be that 0 u T (t )Ru(t ) dt is finite. However if u is nonzero then u T (t )Ru(t ) =
x 0T P B R −1 B T P x 0 e(λ0 +λ0 )t is nonzero everywhere and would make the cost infinite.
Hence u(t ) ≡ 0. This in turn shows that x 0 is eigenvector of A as well, with the same
eigenvalue λ0 because Ax 0 = (A − B R −1 B T P )x 0 = λ0 x 0 . Now by detectability of (Q, A) it
must be that Qx 0 is nonzero, but then x T (t )Qx(t ) = x 0T Qx 0 e(λ0 +λ0 )t would be nonzero
and would make the infinite horizon cost infinite. Contradiction. Hence no unstable
eigenvalue λ0 exists. Uniqueness is shown in Thm. 4.7.3 (page 107).

2. Suppose P S is a positive semi-definite solution of (4.29). We need to show that P S = P .


With this P S define the finite horizon LQ problem that includes a terminal cost S(x) :=
x T P S x ≥ 0:
S
J [0,T ] (x 0 , u(·)) := J [0,T ] (x 0 , u(·)) + S(x(T )).

¹ Quick definition: A pair (A, B) is stabilizable iff for every x_0 there is an input u(·) for which the solution of ẋ(t) = Ax(t) + Bu(t) converges to zero as t → ∞. A pair (Q, A), with Q ≥ 0, is detectable if for every solution of ẋ(t) = Ax(t) we have lim_{t→∞} x^T(t) Q x(t) = 0 ⟹ lim_{t→∞} x(t) = 0.

The solution P TS (t ) of the associated RDE (4.23) for this case is constant P TS (t ) = P S be-
cause P S satisfies the ARE. Hence the optimal cost is x 0T P S x 0 (achieved for some u ∗S (·)).
Since S ≥ 0 we clearly have J S ≥ J for any input, in particular for u ∗S (·):

x 0T P S x 0 = J [0,T
S S T
] (x 0 , u ∗ (·)) ≥ x 0 P T (0)x 0 .

In the limit T → ∞ this becomes

x 0T P S x 0 ≥ x 0T P x 0 . (4.37)

S
The converse inequality also holds because for u ∗ (·) (optimal for J [0,T ] , not J [0,T ]
) we
find

x_0^T P^S x_0 ≤ J^S_[0,T](x_0, u*(·)) = J_[0,T](x_0, u*(·)) + S(x(T))

which for T → ∞ (and using that u*(·) achieves x(∞) = 0, hence S(x(∞)) = 0, according to
the first part of this theorem) becomes

x 0T P S x 0 ≤ J [0,∞) (x 0 , u ∗ (·)) = x 0T P x 0 . (4.38)

Combination of (4.37) and (4.38) shows that x 0T (P S − P )x 0 = 0. This holds for all x 0 and
because P S − P is symmetric we therefore have that P S = P .

Figure 4.2: A car at position y(t) with friction force −α ẏ(t) and external force u(t). See Example 4.5.5

Example 4.5.5 (Infinite horizon – control of a single car). This is a laborious example. Con-
sider the mechanical system

m ÿ(t ) + α ẏ(t ) = u(t ), α > 0. (4.39)

This models a mass m at position y(t ) subject to an external force u(t ) and a friction force
proportional to the velocity, see Fig. 4.2. We take the mass equal to

m=1

and leave the friction coefficient α arbitrary (but positive). As state we take x(t ) = (y(t ), ẏ(t )).
Then (4.39) becomes
ẋ(t) = Ax(t) + Bu(t)      with      A = [ 0    1 ]        B = [ 0 ]
                                        [ 0   −α ] ,          [ 1 ] .

The idea is to bring the mass to rest but without using much control effort. A possible solution
is to minimize the cost
Z ∞
J [0,∞) (x 0 , u(·)) = y 2 (t ) + ρ 2 u 2 (t ) dt .
0

The parameter ρ > 0 defines a trade-off between small y and small u. The bigger the ρ, the
larger the penalty on u in the cost, so probably the smaller the optimal control. The matrices
Q and R for our cost are
Q = [ 1   0 ]
    [ 0   0 ] ,       R = ρ².

(It can be shown that (A, B ) is stabilizable and (Q, A) detectable, but we do not want to go into
the details.) The ARE becomes
P [ 0    1 ] + [ 0    0 ] P − P [ 0    0    ] P + [ 1   0 ] = [ 0   0 ]
  [ 0   −α ]   [ 1   −α ]       [ 0   ρ^{-2}]     [ 0   0 ]   [ 0   0 ] .      (4.40)

This matrix equation is effectively a set of three scalar equations in three unknown numbers.
Indeed, the matrix P is symmetric so it is characterized by three numbers, P = [ p_11   p_12 ; p_12   p_22 ], and then
the above left-hand side is symmetric so it equals zero iff its (1, 1)-element, (1, 2)-element and
(2, 2)-element is zero. This gives:

0 = 1 − ρ^{-2} p_12²,
0 = p_11 − α p_12 − ρ^{-2} p_12 p_22,
0 = 2 p_12 − 2α p_22 − ρ^{-2} p_22².

From the first we find that p_12 = ±ρ. If p_12 = +ρ then the third equation gives two possible
p_22 = ρ²(−α ± √(α² + 2/ρ)). One is positive, the other is negative. We need the positive solution
because P is positive semidefinite only if p 22 ≥ 0. Now that p 12 and p 22 are known, the second
equation gives p 11 . This turns out to give
"p #
α2 + 2/ρ 1
P =ρ ¡ p ¢ . (4.41)
1 ρ −α + α2 + 2/ρ

(Similarly, for p 12 = −ρ the resulting P turns out not to be positive semi-definite so it is not the
solution P ). Conclusion: the P of (4.41) is the only positive semi-definite solution P . Hence it
is the solution we seek. The optimal control is

u*(t) = −ρ^{-2} B^T P x(t)
      = −[ 1/ρ    √(α² + 2/ρ) − α ] x(t)                      (4.42)
      = −(1/ρ) y(t) + ( α − √(α² + 2/ρ) ) ẏ(t).

This optimal control is a linear combination of the displacement y(t ) and speed ẏ(t ) of the
mass. These two terms can be interpreted as a spring and damper force in parallel, connected
to a fictitious wall, see Fig. 4.3. ä
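The hand computation of (4.41) can be verified numerically for specific parameter values. The following Python sketch (not part of the original text; the values α = 0.5 and ρ = 1 are arbitrary choices) compares SciPy's ARE solution with the formula and prints the optimal feedback gain [1/ρ, √(α² + 2/ρ) − α].

```python
# Numerical cross-check of (4.41) and (4.42) for one choice of alpha and rho (sketch).
import numpy as np
from scipy.linalg import solve_continuous_are

alpha, rho = 0.5, 1.0
A = np.array([[0.0, 1.0], [0.0, -alpha]])
B = np.array([[0.0], [1.0]])
Q = np.array([[1.0, 0.0], [0.0, 0.0]])
R = np.array([[rho**2]])

P_num = solve_continuous_are(A, B, Q, R)

s = np.sqrt(alpha**2 + 2.0 / rho)
P_formula = rho * np.array([[s, 1.0],
                            [1.0, rho * (-alpha + s)]])
print(np.max(np.abs(P_num - P_formula)))          # essentially zero
print(np.linalg.solve(R, B.T @ P_num))            # gain, approx [1/rho, s - alpha]
```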

Figure 4.3: A car at position y(t) with friction force −α ẏ(t) optimally controlled with a spring with spring coefficient 1/ρ and a damper with damping coefficient √(α² + 2/ρ) − α. See Example 4.5.5

Figure 4.4: Two connected cars. The purpose is to control the second car with a force u that acts on the first car. See Example 4.6.1

4.6 Application: LQ control design for connected cars


Example 4.6.1 (Connected cars — numerical solution). In this example we consider an ap-
plication of two connected cars. The state dimension in this case is 4 which is too high to eas-
ily determine the solution of the Riccati equation by hand. The solution will be determined
numerically.
The two cars are connected to each other with springs and dampers and with the car on
the left connected to a wall, see Fig. 4.4. The two spring constants are denoted k 1 and k 2 and
the two damping coefficients are r 1 and r 2 . The horizontal position of the two cars relative to
the equilibrium positions are denoted q 1 (t ) and q 2 (t ) respectively and the two masses are m 1
and m 2 . We can control the first car with an additional force u(t ), but we want to control the
position q 2 (t ) of the second car. This application represents a common situation where the
control action is physically separated from the part that needs to be controlled.
The standard linear model for this system is
[ m_1    0  ] [ q̈_1(t) ]   [ r_1+r_2   −r_2 ] [ q̇_1(t) ]   [ k_1+k_2   −k_2 ] [ q_1(t) ]   [ u(t) ]
[  0    m_2 ] [ q̈_2(t) ] + [  −r_2      r_2 ] [ q̇_2(t) ] + [  −k_2      k_2 ] [ q_2(t) ] = [   0  ] .

For simplicity we take all masses and spring constants equal to one, m 1 = m 2 = 1, k 1 = k 2 = 1
and that the damping coefficients are small and the same: r 1 = r 2 = 0.1. Then the linear model
in the state x(t) defined as x(t) = (q_1(t), q_2(t), q̇_1(t), q̇_2(t)) becomes

         [  0    0     1      0  ]          [ 0 ]
         [  0    0     0      1  ]          [ 0 ]
ẋ(t)  =  [ −2    1   −0.2   0.1  ] x(t)  +  [ 1 ] u(t).
         [  1   −1    0.1  −0.1  ]          [ 0 ]

As the damping coefficients are small one may expect sizeable oscillations when no control

is applied. Indeed, the matrix A has two eigenvalues close to the imaginary axis2 and for the
initial state x 0 = (0, 1, 0, 0) and u(t ) = 0 the positions q 1 (t ), q 2 (t ) of the two cars oscillate for a
long time, see Fig. 4.5(top).
To control the second car with the force u(t ) we propose the solution of the infinite hori-
zon LQ problem with cost
Z ∞
q 22 (t ) + Ru 2 (t ) dt .
0

The value of R was set, somewhat arbitrarily, to R = 0.2. As A is stable there is a control that
renders the cost finite, so the conditions of Thm. 4.5.2 are met and we are guaranteed that the
solution P of the corresponding Riccati equation exists. The solution, obtained numerically,
turns out to be
 
    [ 0.4126   0.2286   0.2126   0.5381 ]
    [ 0.2286   0.9375   0.0773   0.5624 ]
P = [ 0.2126   0.0773   0.2830   0.4430 ]
    [ 0.5381   0.5624   0.4430   1.1607 ]

and then the optimal state feedback control u(t) = −R^{-1} B^T P x(t) is

u(t) = −[ 1.0628   0.3867   1.4151   2.2150 ] x(t).

Under this control the response to the initial state x 0 = (0, 1, 0, 0) is damped much quicker
than without control, see Fig. 4.5(middle). The eigenvalues of the controlled system ẋ = (A −
B R −1 B T P )x are −0.5925 ± 0.6847i and −0.2651 ± 1.7081i and these are considerably further
away from the imaginary axis than the eigenvalues of A, and the imaginary parts are almost
the same as before. This confirms the stronger damping in the controlled system.
All this is achieved with a control force u(t ) that never exceeds 0.5 in magnitude for this
initial state, see Fig. 4.5(bottom). Notice that the optimal control u(t ) starts out negative but
turns positive way before q 2 (t ) becomes zero for the first time. So apparently it is optimal to
initially speed up the first car away from the second car, but only for a very short period of
time, and then for the next couple of seconds to move the first car towards the second car.
The latter part probably limits overshoot.
For the initial state x 0 = (0, 1, 0, 0) the optimal cost x 0T P x 0 is the (2, 2)-element of P , so
∫_0^∞ q_2²(t) + R u²(t) dt = 0.9375. ä
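The numbers quoted in this example were obtained with a numerical ARE solver. One possible way to reproduce them (a sketch, not the code used for the book) is the following Python fragment.

```python
# Reproducing the numbers of Example 4.6.1 (illustrative sketch).
import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 0.0, 1.0, 0.0],
              [0.0, 0.0, 0.0, 1.0],
              [-2.0, 1.0, -0.2, 0.1],
              [1.0, -1.0, 0.1, -0.1]])
B = np.array([[0.0], [0.0], [1.0], [0.0]])
Q = np.diag([0.0, 1.0, 0.0, 0.0])        # penalizes q2(t)^2
R = np.array([[0.2]])

P = solve_continuous_are(A, B, Q, R)
F = np.linalg.solve(R, B.T @ P)          # u = -F x
print(np.round(P, 4))                    # matches the matrix P printed above
print(np.round(F, 4))                    # approx [1.0628  0.3867  1.4151  2.2150]
print(np.linalg.eigvals(A - B @ F))      # closed-loop eigenvalues
```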

4.7 Connection between Hamiltonians and Riccati equations


In § 3.5 of the previous chapter we established a connection between value functions and
costates: p*(t) = ∂V(x*(t), t)/∂x. For our LQ problem with quadratic value functions V(x, t) =
x^T P(t)x this means p*(t) = 2P(t)x*(t). Now, again, the formulas to come are cleaner in the
halved costate p̃(t) = ½ p(t). We have

p̃ ∗ (t ) = P (t )x ∗ (t ). (4.43)

Incidentally, this re-proves Lemma 4.2.1 because p̃ T (0)x(0) = x T (0)P (0)x(0) = V (x(0), 0). This
connection (4.43) expresses the costate p̃(t ) in terms of the solution P (t ) of the RDE, but it
2 At −0.011910 ± i0.61774 and two more at −0.13090 ± i1.61273

Figure 4.5: Top: positions of the uncontrolled cars. Middle: positions of the controlled cars. Bottom: control force u(t) for the controlled car. The initial state is q_1(0) = 0, q_2(0) = 1, q̇_1(0) = 0, q̇_2(0) = 0. See Example 4.6.1

can also be used to determine P (t ) using the states and costates. This goes as follows. In
Thm. 4.2.3 we saw that
[ x(t)  ]   [ Σ_11(t)   Σ_12(t) ] [ I ]
[ p̃(t) ] = [ Σ_21(t)   Σ_22(t) ] [ M ] x_0                  (4.44)

for M = (SΣ12 (T ) − Σ22 (T ))−1 (SΣ11 (T ) − Σ21 (T )). If the mapping from x 0 to x(t ) is nonsingular
then x 0 follows uniquely from x(t ) as x 0 = (Σ11 (t ) + Σ12 (t )M )−1 x(t ) and then p(t ) also follows
uniquely from x(t ):

p(t ) = (Σ21 (t ) + Σ22 (t )M )x 0 = (Σ21 (t ) + Σ22 (t )M )(Σ11 (t ) + Σ12 (t )M )−1 x(t ).

Comparing this with (4.43) suggests the following explicit formula for P (t ):

Theorem 4.7.1 (Solution of RDE’s using Hamiltonians). Let S,Q, R be positive semi-definite
n × n matrices and R > 0. Then the solution of the RDE

Ṗ (t ) = −P (t )A − A T P (t ) + P (t )B R −1 B T P (t ) −Q, P (T ) = S

is

P (t ) = (Σ21 (t ) + Σ22 (t )M )(Σ11 (t ) + Σ12 (t )M )−1 . (4.45)

Here M = [SΣ12 (T )−Σ22 (T )]−1 [SΣ11 (T )−Σ21 (T )], and Σi j are n ×n sub-blocks from the matrix
exponential eH t .

Proof. Recall that the solution P (t ) of the RDE exists. If Σ11 (t ) + Σ12 (t )M would have been
singular at some t = t̄ , then any nonzero x 0 in the null space of Σ11 (t̄ )+Σ12 (t̄ )M renders x(t̄ ) =
0 while p̃(t̄ ) is nonzero (because Σ(t ) := eH t is invertible). This contradicts the fact that p̃(t ) =
P (t )x(t ). Hence Σ11 (t ) + Σ12 (t )M is invertible for all t ∈ [0, T ] and, consequently, the mapping
from x(t ) to p̃(t ) follows uniquely from (4.44) and it equals (4.45). See Exercise 4.21. ■

We have been using this result already a couple of times without mentioning it:
Example 4.7.2. In Example 4.2.2 we tackled the minimization of ∫_0^T x²(t) + u²(t) dt for ẋ(t) =
u(t) using Hamiltonians, and we found that

[ Σ_11(t)   Σ_12(t) ]         [  e^t + e^{-t}    −e^t + e^{-t} ]
[ Σ_21(t)   Σ_22(t) ] = (1/2) [ −e^t + e^{-t}     e^t + e^{-t} ] ,       M = (e^T − e^{-T}) / (e^T + e^{-T}).

The RDE for this problem was derived in Example 4.5.1:

Ṗ (t ) = P 2 (t ) − 1, P (T ) = 0.

According to (4.45) the solution of this RDE is


P(t) = ( Σ_21(t) + Σ_22(t) M ) / ( Σ_11(t) + Σ_12(t) M )
     = ( −e^t + e^{-t} + (e^t + e^{-t}) M ) / ( e^t + e^{-t} + (−e^t + e^{-t}) M )
     = ( (−e^t + e^{-t})(e^T + e^{-T}) + (e^t + e^{-t})(e^T − e^{-T}) ) / ( (e^t + e^{-t})(e^T + e^{-T}) + (−e^t + e^{-t})(e^T − e^{-T}) )
     = ( e^{T−t} − e^{−(T−t)} ) / ( e^{T−t} + e^{−(T−t)} ) = tanh(T − t).
ä
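Formula (4.45) is also convenient numerically. The sketch below (not part of the original text) evaluates it on a grid of times for this example and compares the outcome with tanh(T − t).

```python
# Formula (4.45) evaluated numerically for Example 4.7.2 (illustrative sketch).
import numpy as np
from scipy.linalg import expm

T, S = 2.0, 0.0
H = np.array([[0.0, -1.0], [-1.0, 0.0]])

SigT = expm(H * T)
M = -(S * SigT[0, 0] - SigT[1, 0]) / (S * SigT[0, 1] - SigT[1, 1])

for t in [0.0, 0.5, 1.0, 1.5, 2.0]:
    Sig = expm(H * t)
    P = (Sig[1, 0] + Sig[1, 1] * M) / (Sig[0, 0] + Sig[0, 1] * M)   # (4.45)
    print(P, np.tanh(T - t))             # the two columns agree
```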

Infinite horizon LQ and stabilizing solutions of ARE’s. Also for the infinite horizon case
there is connection between Hamiltonians and solutions of Riccati equations. If an n × n
matrix P satisfies an Algebraic Riccati Equation (ARE)

P A + A T P − P B R −1 B T P +Q = 0

then it is easy to see that

[  A    −B R^{-1} B^T ] [ I ]   [ I ]
[ −Q        −A^T      ] [ P ] = [ P ] (A − B R^{-1} B^T P).

This is interesting because in the case that all matrices here are numbers (and the Hamilto-
nian matrix hence a 2 × 2 matrix) it says that
· ¸
I
P

is an eigenvector of the Hamiltonian matrix and that A − B R −1 B T P is an eigenvalue of the


Hamiltonian. In Thm. 4.5.4 we saw that under mild assumptions the matrix P that we need
in LQ is “stabilizing” meaning that A − B R −1 B T P is asymptotically stable. But then we should
be able to determine P from the eigenvalues and eigenvectors of the Hamiltonian. That is
indeed the case and most numerical routines in M ATLAB exploit this property:

Theorem 4.7.3 (Stabilizing solution of AREs = eigenvalue problem). Let Q = Q T ≥ 0, R = R T >


0. Then the 2n × 2n Hamiltonian matrix
· ¸
A −B R −1 B T
−Q −A T

has no imaginary eigenvalues, and there is a matrix V ∈ R2n×n of rank n with the property
that
[  A    −B R^{-1} B^T ]
[ −Q        −A^T      ] V = V Λ

for some asymptotically stable Λ ∈ R^{n×n}. Furthermore, given any such V decompose V as
V = [ V_1 ; V_2 ] with V_1, V_2 ∈ R^{n×n}. Then V_1 is invertible and

P = V2V1−1

is the unique matrix P for which A − B R −1 B T P is asymptotically stable. Moreover this P is


symmetric.

Proof. This proof assumes knowledge of linear algebra. If P is a solution of the ARE, then
[ I    0 ] [  A    −B R^{-1} B^T ] [ I   0 ]   [ A − B R^{-1} B^T P        −B R^{-1} B^T        ]
[ −P   I ] [ −Q        −A^T      ] [ P   I ] = [        0            −(A − B R^{-1} B^T P)^T    ] .      (4.46)

Interestingly [ I  0 ; −P  I ][ I  0 ; P  I ] = I_{2n} so the above equation is a similarity transformation and this
tells us that the eigenvalues of the Hamiltonian [ A   −B R^{-1} B^T ; −Q   −A^T ] are those of A − B R^{-1} B^T P and
−(A − B R −1 B T P )T . If P is stabilizing then none of these eigenvalues lie on the imaginary axis.
Furthermore it shows that there are n stable eigenvalues (those of A − B R −1 B T P ) and n anti-
stable eigenvalues (those of −(A−B R −1 B T P )T ). Now for simplicity assume that all eigenvalues

are distinct, and let λ_1, ..., λ_n denote the n stable eigenvalues. To each eigenvalue λ_i there
corresponds an eigenvector v_i. That is, [ A   −B R^{-1} B^T ; −Q   −A^T ] v_i = λ_i v_i, or in matrix form

[  A    −B R^{-1} B^T ]
[ −Q        −A^T      ] [ v_1   v_2   ···   v_n ] = [ λ_1 v_1   ···   λ_n v_n ] = [ v_1   v_2   ···   v_n ] Λ,
                                 (= V)                                                      (= V)

where Λ is the diagonal matrix of eigenvalues λ_1, ..., λ_n. As (4.46) is a similarity transformation,
the eigenvectors v_i may also be written as v_i = [ I  0 ; P  I ] w_i where w_i is an eigenvector of
[ A − B R^{-1} B^T P   −B R^{-1} B^T ; 0   −(A − B R^{-1} B^T P)^T ] with eigenvalue λ_i. Now all λ_i are stable, and they are
the eigenvalues of A − B R^{-1} B^T P and not of the antistable −(A − B R^{-1} B^T P)^T, so w_i must be
of the form w_i = [ ∗ ; 0 ]. Hence

V = [ I   0 ] [ W_1 ]
    [ P   I ] [  0  ] ,

where

[ W_1 ]
[  0  ] = [ w_1   w_2   ···   w_n ].

Written out this becomes V = [ W_1 ; P W_1 ]. Now W_1 is nonsingular because the matrix of
eigenvectors [ W_1 ; 0 ] has full column rank. Therefore V_1 = W_1 is nonsingular and P follows
uniquely as V_2 V_1^{-1} = (P W_1) W_1^{-1} = P.


Conversely if V_1 is nonsingular then it is easy to verify that P := V_2 V_1^{-1} satisfies the ARE
and that it is stabilizing with A − B R^{-1} B^T P = V_1 Λ V_1^{-1}.
To verify symmetry of P , define T = P T − P . Then (A − B R −1 B T P )T T + T (A − B R −1 B T P ) = 0.
This is a Lyapunov equation and its solution is unique because A−B R −1 B T P is asymptotically
stable (see Lemma 4.7.4.) That is, T = 0, i.e, P = P T . ■

Lemma 4.7.4 (Technical lemma). If A ∈ R^{n×n} is asymptotically stable then for every Q ∈ R^{n×n} (not
necessarily symmetric) the equation A^T P + P A = −Q has a unique solution P ∈ R^{n×n} (not necessarily
symmetric).
Proof. Based on Lyapunov theory we guess that P := ∫_0^∞ e^{A^T t} Q e^{At} dt is one solution. Indeed, then

A^T P + P A = ∫_0^∞ A^T e^{A^T t} Q e^{At} + e^{A^T t} Q e^{At} A dt = [ e^{A^T t} Q e^{At} ]_0^∞ = 0 − Q = −Q.
This shows that every −Q ∈ R^{n×n} is in the range of A^T P + P A. The linear mapping that sends
P ∈ R^{n×n} to A^T P + P A ∈ R^{n×n} hence is surjective. Then, by the dimension theorem, it is injective
as well, i.e. the solution P of A^T P + P A = −Q is unique. ■
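In practice such Lyapunov equations are solved numerically. A small illustration (a sketch, not from the book; the stable A and the nonsymmetric Q below are arbitrary choices) using SciPy's Lyapunov solver:

```python
# Solve A^T P + P A = -Q for a stable A and a nonsymmetric Q (illustrative sketch).
import numpy as np
from scipy.linalg import solve_continuous_lyapunov

A = np.array([[-1.0, 2.0], [0.0, -3.0]])     # asymptotically stable
Q = np.array([[1.0, 4.0], [0.0, 2.0]])       # not symmetric

# solve_continuous_lyapunov(a, q) solves a X + X a^T = q, so use a = A^T and q = -Q.
P = solve_continuous_lyapunov(A.T, -Q)
print(A.T @ P + P @ A + Q)                    # zero matrix (up to rounding)
```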

Those familiar with Linear Algebra would say that “V spans the stable eigenspace” of the
Hamiltonian. In any event, the proof of the above theorem shows that finding P is essentially
an eigenvalue problem and these problems are normally numerically tractable.

Example 4.7.5. Consider for the final time the system ẋ(t) = u(t) and infinite horizon cost
∫_0^∞ x²(t) + u²(t) dt. The Hamiltonian is

H = [  0   −1 ]
    [ −1    0 ] .

The eigenvalues of the Hamiltonian are λ1,2 = ±1. The eigenvectors for the stable eigenvalue
λ1 = −1 are
v := [ v_1 ; v_2 ] = [ 1 ; 1 ] c,      c ≠ 0.

The stabilizing solution P of the ARE hence is


v2 c
P= = = 1.
v1 c

It does not depend on the choice of eigenvector (as predicted). The (eigen)value of A −
B R −1 B T P = −1 by construction equals λ1 = −1. The optimal control is u = −B R −1 P x = −x.
Ready. ä

Example 4.7.6. Consider the following system


[ ẋ_1(t) ]   [ 0   1 ] [ x_1(t) ]   [ 0 ]
[ ẋ_2(t) ] = [ 0   0 ] [ x_2(t) ] + [ 1 ] u(t)

with cost
Z ∞
J [0,∞) (x 0 , u(·)) = x 12 (t ) + x 22 (t ) + u 2 (t ) dt .
0

The associated Hamiltonian matrix is (verify this yourself)


 
    [  0    1    0    0 ]
    [  0    0    0   −1 ]
H = [ −1    0    0    0 ]
    [  0   −1   −1    0 ] .

The four eigenvalues turn out to be


λ_{1,2} = −½√3 ± ½ i,       λ_{3,4} = +½√3 ± ½ i.

The first two, λ1,2 , are stable so we need corresponding eigenvectors. We can take these two
 
v_{1,2} = [ −λ_{1,2} ; −λ_{1,2}² ; 1 ; λ_{1,2}³ ].

Combined this defines V ∈ C4×2 as


 
                   [ −λ_1     −λ_2  ]
                   [ −λ_1²    −λ_2² ]
V = [ v_1   v_2 ] = [   1        1   ]
                   [  λ_1³     λ_2³ ] .

(The fact that these are complex is not a problem.) Now with V known it follows that
P = V_2 V_1^{-1} = [  1      1   ] [ −λ_1     −λ_2  ]^{-1}   [ √3    1 ]
                   [ λ_1³   λ_2³ ] [ −λ_1²    −λ_2² ]      = [  1   √3 ] .

The optimal input is u(t) = −R^{-1} B^T P x(t) = −x_1(t) − √3 x_2(t). ä
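The eigenvector construction of Thm. 4.7.3 is exactly what numerical routines do. The sketch below (illustrative only, not part of the original text) carries it out for this example with NumPy's eigenvalue solver and recovers P = [ √3  1 ; 1  √3 ].

```python
# The stable-eigenspace construction of Thm. 4.7.3 applied to Example 4.7.6 (sketch).
import numpy as np

H = np.array([[0.0, 1.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, -1.0],
              [-1.0, 0.0, 0.0, 0.0],
              [0.0, -1.0, -1.0, 0.0]])

eigvals, eigvecs = np.linalg.eig(H)
stable = eigvals.real < 0                  # select the two stable eigenvalues
V = eigvecs[:, stable]                     # columns span the stable eigenspace
V1, V2 = V[:2, :], V[2:, :]

P = np.real(V2 @ np.linalg.inv(V1))        # P = V2 V1^{-1}
print(np.round(P, 4))                      # approx [[1.7321, 1.0], [1.0, 1.7321]]
```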

4.8 Exercises
4.1 Consider the system

ẋ(t ) = 3x(t ) + 2u(t ), x(0) = x 0 ,

with cost
Z T
J [0,T ] (x 0 , u(·)) = 4x 2 (t ) + u 2 (t ) dt .
0

(a) Determine the Hamiltonian matrix H .


(b) Show that the eigenvalues of H are ±5.
(c) It can be shown that
· ¸
Ht 1 4 e5t + e−5t −2 e5t +2 e−5t
e = .
5 −2 e5t +2 e−5t e5t +4 e−5t

For arbitrary T > 0 determine the optimal x ∗ (t ), u ∗ (t ), p ∗ (t ) and the optimal cost.

4.2 Consider the system and cost of Example 4.4.2. In that example we found a minimizing
control only if 0 ≤ T < π/2. For T = π/2 the method failed. In this exercise we use
Hamiltonians to analyse the case T = π/2:

(a) Determine the Hamiltonian matrix H for this problem.


(b) It can be shown that
· ¸
Ht cos(t ) − sin(t )
e = .
sin(t ) cos(t )

Use this to confirm the claim that for T = π/2 the Hamiltonian equations (4.6)
have no solution if x 0 6= 0.
(c) Does Pontryagin’s Minimum Principle allow us to conclude that for T = π/2 and
x 0 6= 0 no optimal control u ∗ (·) exists?
R π/2 R π/2
(d) A Wirtinger inequality. Show that 0 ẋ 2 (t ) dt ≥ 0 x 2 (t ) dt for all smooth x(·) for
which x 0 = 0, and show that equality holds if-and-only-if x(t ) = A sin(t ).

4.3 Suppose

ẋ(t ) = x(t ) + u(t ), x(0) = x 0 := 1

and that
Z T
2
J (x 0 , u(·)) = 2x (T ) + u 2 (t ) dt
0

for arbitrary positive T .

(a) Determine the RDE.


(b) Solve the RDE. [Hint: the solution happens to be constant!]
(c) Determine the optimal state x ∗ (t ) and input u ∗ (t ) explicitly as functions of time.
(d) Verify that J (1, u ∗ (·)) = P (0).
RT
4.4 Minimize 0 2x 2 (t ) + u 2 (t ) dt over all u(·) and x(·) subject to ẋ(t ) = x(t ) + u(t ), x(0) = x 0 .

RT
4.5 Minimize 0 x 2 (t ) + ẍ 2 (t ) dt over all x(·) with x(0) = 1, ẋ(0) = 0. [Hint: define an appro-
priate u(t ).]

4.6 Why LQ-optimal inputs are linear in the state and costs are quadratic in the state. In this
exercise we prove, using only elementary arguments, that the optimal control in LQ
control is linear in the state and the value function is quadratic in the state. Consider
ẋ(t ) = Ax(t ) + Bu(t ) with the standard LQ cost over the time window [t , T ],
Z T
T
J [t ,T ] (x(t ), u(·)) = x (T )Sx(T ) + x T (τ)Qx(τ) + u T (τ)Ru(τ) dτ,
t

and let V (x, t ) be the value function.

(a) Exploit the quadratic nature of the cost to prove that for every λ ∈ R, every two
x, z ∈ Rn and every two inputs u(·), w(·) we have

J [t ,T ] (λx, λu(·)) = λ2 J (x, u(·)),


J [t ,T ] (x + z, u(·) + w(·)) + J [t ,T ] (x − z, u(·) − w(·))
= 2J [t ,T ] (x, u(·)) + 2J [t ,T ] (z, w(·)). (4.47)

(The second identity is known as the parallelogram law.)


(b) Prove that V (λx, t ) = λ2V (x, t ) and that input λu ∗ (·) is optimal for initial state λx
if u ∗ (·) is optimal for initial state x.
(c) Conclude that

V (x + z, t ) + V (x − z, t ) ≤ 2V (x, t ) + 2V (z, t ).

[Hint: minimize the right-hand side of (4.47) over all u(·), w(·).]
(d) Likewise conclude that

V (x + z, t ) + V (x − z, t ) ≥ 2V (x, t ) + 2V (z, t ).

[Hint: minimize the left-hand side of (4.47) over all u(·) + w(·), u(·) − w(·).]
(e) Suppose u x (·) is the optimal input for x and w z (·) is the optimal input for z. Show
that

J [t ,T ] (x + z, u x (·) + w z (·)) − V (x + z, t ) = V (x − z, t ) − J [t ,T ] (x − z, u x (·) − w z (·)).

(f) Prove that if u x (·) is the optimal input for x and w z (·) is the optimal input for z,
then u x (·) + λw z (·) is optimal for x + λz.
(g) Part 4.6f shows that the optimal control u ∗ : [t , T ] → Rm for J [t ,T ] (x(t ), u(·)) is linear
in x. Show that this implies that at each t the optimal control u ∗ (t ) is linear in x(t ).
(h) Argue that V (x, t ) is quadratic in the state, i.e. that V (x, t ) = x T P (t )x for some ma-
trix P (t ) ∈ Rn×n .

4.7 The scalar Riccati equation is of the form


ṗ(t) = γ(p(t) + α)(p(t) + β).

This equation may be solved explicitly, as we show in this exercise.

(a) Prove that
q(t) := 1/(p(t) + α)

satisfies

q̇(t ) = γ(α − β)q(t ) − γ.

(b) Solve the RDE for A = −1, B = 2, Q = 4, S = 0, t 0 = 0 and t 1 = 1.


(c) Determine the solution of the Riccati equation as in (b), but now for S = 1.
(d) Determine the solution of the RDE for the system from Example 3.4.5. So A =
0, B = 1,Q = 1, S = 0, t 0 = 0 and t 1 = 1.

4.8 Consider the scalar system and cost

ẋ(t ) = x(t ) + u(t ), x(t 0 ) = x 0 , (4.48)

with cost
J_{[t_0,0]}(x_0, u(·)) = g x²(0) + ∫_{t_0}^0 3x²(t) + u²(t) dt.

The initial time t 0 is assumed to be negative. The final time T is taken to be zero.

(a) Determine the associated RDE and show that the solution is given by
P(t) = \begin{cases} \dfrac{-3 + 3e^{4t} - 3g - e^{4t}g}{-1 - 3e^{4t} - g + e^{4t}g} & \text{if } g ≠ 3 \\[1ex] 3 & \text{if } g = 3. \end{cases}   (4.49)

(b) Plot P (t ) for −2 ≤ t ≤ 0 for g ∈ {0, 1, 2, 2.5, 2.9, 3, 3.1, 3.5, 4}.
What do you conclude from these plots? Give an intuitive explanation.
(c) We know from the theory that for g = 0, P (t ) is decreasing on (−∞, 0]. From the
plot that you just made it appears that the same is true for all values of g between
0 and 3. Give a formal proof of this observation. Hint: define P̃ (t ) := P (t ) − g and
derive a differential equation for P̃ . Use the general theory to argue that P̃ (t ) is
decreasing on (−∞, 0].
(d) We know from the theory that for g = 0, P (t ) is decreasing on (−∞, 0]. From the
plot that you just made it appears that P (t ) is increasing for all g ≥ 3. Give a for-
mal proof of this observation. Hint: define P̃ (t ) := 1/P (t ) and derive a differential
equation for P̃ . Use the general theory to argue that P̃ (t ) is decreasing on (−∞, 0].
(e) Take t 0 = −2 and assume that the initial state x(−2) = 1. Plot the state trajectory
x(t ) and the optimal input u(t ) for t ∈ [−2, 0] for g ∈ {0, 2, 4}.
Carefully observe the behavior of the state near t = 0. What do you see? Can you
explain this?

4.9 Sometimes a transformation of the state variables can facilitate solving the optimal con-
trol problem.
With z(t ) defined as z(t ) = E −1 x(t ), show that the LQ problem for ẋ(t ) = Ax(t ) + Bu(t )
with cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T x^T(t)Qx(t) + u^T(t)Ru(t) dt

yields the problem

ż(t ) = Ãz(t ) + B̃ u(t ),

with cost
J̃_{[0,T]}(z_0, u(·)) = ∫_0^T z^T(t)Q̃z(t) + u^T(t)Ru(t) dt,

where à = E −1 AE , B̃ = E −1 B and Q̃ = E T QE .
Also, what is the relationship between the value functions for both problems?

4.10 (This exercise assumes you know how to diagonalize a matrix.) Consider the system
ẋ(t) = \begin{bmatrix} -1 & 1 \\ 1 & -1 \end{bmatrix} x(t) + \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix} u(t)

with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ ½x_1²(t) + ½x_2²(t) + u_1²(t) + u_2²(t) dt.

(a) Show that u ∗ (t ) = −P x ∗ (t ), where P is the solution of an Algebraic Riccati Equa-


tion.
(b) To find this solution, we want to perform the transformation z = E −1 x, for a suit-
able matrix E . Choose E such that E −1 AE is diagonal and solve P .
Hint: Use Exercise 4.9.

4.11 Determine P in Exercise 4.10 using Theorem 4.7.3 (page 107.)

4.12 Consider the system

ÿ(t ) + y(t ) = u(t ), y(0) = y 0 , ẏ(0) = y 1 .

(a) Write this system in a minimal state space model with x 1 (t ) = y(t ) and x 2 (t ) = ẏ(t ).
(b) Determine the optimal control for the control problem with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ x²(t) + u²(t) dt.

Also, find the minimal cost associated with the initial conditions y 0 = 0 and y 1 = 1.

4.13 Plot the eigenvalues of the closed loop system from Example 4.5.5 as a function of ρ for
fixed α. Also, plot them as a function of α for fixed ρ. What is the interpretation?

4.14 In this exercise, we want to use the optimal control theory to follow a given signal, the
so-called tracking problem. In order to get to know the system a bit better, we start with
the simple scalar system

ẋ(t ) = −x(t ) + u(t ), x(0) = x 0 .

We want to find a control such that the state follows the constant signal 1. First, we
show that such a control indeed exists.

(a) Find an input u(·) such that with x 0 = 0

lim_{t→∞} x(t) = 1.

We now want to choose the control in an optimal way. We therefore introduce the cost
function
J_{[0,T]}(x_0, u(·)) = ∫_0^T (x(t) − 1)² + (1/T) u²(t) dt.

(b) Prove that for your choice of u(·) in part 4.14a,

lim_{T→∞} J_{[0,T]}(0, u(·)) < ∞.

(c) Prove that for your choice of u(·) in part 4.14a,


J_{[0,∞)}(0, u(·)) = ∫_0^∞ (x(t) − 1)² + u²(t) dt

is not finite.
(d) Prove that there does not exist a control u(·) such that

lim_{t→∞} x(t) = 1,

and J [0,∞) (0, u(·)) is finite.


(e) Try V (x, t ) = p(t )x 2 + q(t )x + r (t ) as a candidate solution for the Bellman equation
with cost J [0,T ] . Give the differential equations for p(t ), q(t ) and r (t ).
(f) Give the expression for u ∗ (t ) in terms of p(t ), q(t ), r (t ) and x ∗ (t ).
(g) Choose x 0 = 0. Plot the solutions for p(t ), q(t ), r (t ), u ∗ (t ) and x ∗ (t ) for T equal to
5, 10 and 100. To do this, you may want to use M APLE or M ATHEMATICA.
(h) Give the (numerical) value for the minimal values of J [0,T ] (0, u(·)), with T as in the
previous part.

Now we study the general case

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0 ,


y(t ) = C x(t )

We want the output of this system to follow the constant signal Yr . Therefore, we con-
sider the cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T ‖y(t) − Y_r‖² + (1/T)‖u(t)‖² dt.

(i) Give the equations for the value function, as in part 4.14e.
(j) What is the relationship between the value function derived by you and the matrix
RDE? For which Yr are both value functions equal?
(k) Prove that the optimal control problem has a solution.

4.15 Consider the linear time-invariant system

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0

with cost
J_{[0,T]}(x_0, u(·)) = ∫_0^T u²(t) dt.

(a) Determine the optimal control using Theorem 2.4.2.
(b) Determine also the optimal state and costate using Differential Equation (4.6). You
are not allowed to use part 4.15a.
(c) Determine the value function V (x 0 , t ), using the x ∗ (t ) and p ∗ (t ) found in 4.15b.
(d) Solve the minimization problem for J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ u²(t) dt.
(e) What is the minimal cost in 4.15d? Does it satisfy the algebraic Riccati equation? Is (Q, A) detectable?

4.16 In the proof of Theorem 4.5.2 it is shown that limT →∞ P T (t ) exists for each t . Show that
the limit is independent of t .

4.17 Consider Theorem 4.5.2 and assume in addition that (A,Q) is observable. Show that
P > 0. [Hint: argue that otherwise there would have been initial states x 0 with optimal
cost x 0T P x 0 equal to zero, and that is impossible for observable (A,Q).]

4.18 ARE, see also Exercise 4.1. Consider

ẋ(t ) = 3x(t ) + 2u(t ), x(0) = x 0 ,

with cost
J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ 4x²(t) + u²(t) dt

(a) Is the system controllable?


(b) Is the system with output y = Qx = 4x observable?
(c) Determine the LQ-solution P of the Algebraic Riccati Equation.
(d) Verify that A − B R −1 B T P is stable.
(e) Determine the optimal input of the form u(t ) = −F x(t ) (that is, find F ).
(f) Determine the optimal cost.
(g) Considering that the Hamiltonian matrix H of Exercise 4.1 has eigenvalues ±5, what
result would you guess?

4.19 Consider the system

ẋ(t ) = x(t ) + u(t ), x(0) = x 0 , u(t ) ∈ R

on the infinite time horizon with cost


J_{[0,∞)}(x_0, u(·)) = ∫_0^∞ γ²x²(t) + u²(t) dt.

Here γ is some nonzero real number.

(a) Determine all solutions P of the Algebraic Riccati Equation.


(b) Determine the value function.
(c) Find the optimal u(t ), x(t ) explicitly as a function of time.
(d) Explicitly determine the optimal solution u(t ), x(t ) and value function for the case
that γ = 0. Also, which general result tells you that the optimal solution might be
qualitatively different for γ = 0 compared to γ 6= 0?

4.20 Let Q ≥ 0, R > 0 and suppose that B = 0. Consider Thm. 4.5.4.

(a) Under what conditions on A are the assumptions of Thm. 4.5.4 satisfied?
(b) Determine the ARE for this case.
(c) Thm. 4.5.4 re-proves which result of § B.5?

4.21 Theorem 4.7.1 claims that P (t ) = Z (t ) where Z (t ) = (Σ21 (t ) + Σ22 (t )M )(Σ11 (t ) +


Σ12 (t )M )−1 . However, all we have shown in the proof is that (P (t ) − Z (t ))x(t ) = 0 for all
t ∈ [0, T ] and all x 0 ∈ Rn . Why can we conclude that P (t ) = Z (t ) for all t ∈ [0, T ]?

4.22 Consider the optimal control problem

ẋ(t ) = u(t ), x(0) = x 0

with u(t ) ∈ R and cost


J_{[0,∞)}(x_0, u(·)) = ∫_0^1 u²(t) dt + ∫_1^∞ 4x²(t) + u²(t) dt.

(a) Assume first that x(1) is given. Determine the optimal cost-to-go from t = 1 on:
V(x(1), 1) := min_u ∫_1^∞ 4x²(t) + u²(t) dt.
(b) Express the optimal cost J_{[0,∞)}(x_0, u(·)) as J_{[0,∞)}(x_0, u(·)) = ∫_0^1 u²(t) dt + Sx²(1).
(That is: what is S?)
(c) Solve the optimal control problem: determine the optimal cost J [0,∞) (x 0 , u(·)) and
express the optimal input u(t ) as a function of x(t ). [Hint: use separation of vari-
ables, see § A.3.]

Appendix A

Background material

This appendix contains concise summaries of a number of topics that play a role in opti-
mal control. Each section covers one topic and most can be read independently from the
other sections. The topics are standard and are covered in some form or another in calcu-
lus courses, a course on differential equations or a first course on systems theory. Nonlinear
differential equations are discussed in Appendix B.

A.1 Positive definite functions and matrices


Suppose Ω is a neighborhood of some x̄ ∈ Rn . A continuously differentiable function V : Ω →
R is said to be positive definite relative to x̄ if

V (x) > 0 ∀x ∈ Ω\{x̄} and V (x̄) = 0.

It is positive semi-definite – also known as nonnegative definite – if

V (x) ≥ 0 ∀x ∈ Ω\{x̄} and V (x̄) = 0.

A real symmetric n × n matrix P is said to be positive definite if V (x) := x T P x is a positive


definite function relative to x̄ = 0 ∈ Rn . In this case the neighborhood Ω is irrelevant and we
may as well take Ω = Rn , so a symmetric P ∈ Rn×n is positive definite if

x T P x > 0 ∀x ∈ Rn , x 6= 0.

It is positive semi-definite (or nonnegative-definite) if

x T P x ≥ 0 ∀x ∈ Rn .

The notation V > 0 and P > 0 means that the function/matrix is positive definite. Inter-
estingly real symmetric matrices have real eigenvalues only, and there exist simple tests for
positive definiteness:

Lemma A.1.1 (Tests for positive definiteness). Suppose P is an n × n real symmetric matrix.
The following six statements are equivalent.

1. P > 0.

2. All leading principal minors are positive: det(P 1:k,1:k ) > 0 for all k ∈ {1, 2, . . . , n}.

3. All eigenvalues of P are > 0.

4. There is a nonsingular matrix X such that P = X T X .

5. Cholesky factorization: there is a (unique) upper-triangular matrix X with diagonal en-


tries > 0 such that P = X T X .

6. For a partition of P ,
P = \begin{bmatrix} P_{11} & P_{12} \\ P_{12}^T & P_{22} \end{bmatrix}

with P 11 square (hence P 22 square) we have that

P_{11} > 0 and P_{22} − P_{12}^T P_{11}^{-1} P_{12} > 0.

(That is, both P_{11} and its so-called Schur complement P_{22} − P_{12}^T P_{11}^{-1} P_{12} are positive definite.)
ä

For positive semi-definite matrices similar tests exist, except for the principal minor test
which is now more involved:

Lemma A.1.2 (Tests for positive semi-definiteness). Let P = P T ∈ Rn×n . The following state-
ments are equivalent.

1. P ≥ 0.

2. All principal minors (not just the leading ones) are nonnegative: det(P I ,I ) ≥ 0 for every
subset I of {1, . . . , n}.

3. All eigenvalues of P are ≥ 0.

4. There is a matrix X such that P = X T X .

5. Cholesky factorization: there is a (unique) upper-triangular matrix X with diagonal en-


tries ≥ 0 such that P = X T X .

Moreover, if for some partition


P = \begin{bmatrix} P_{11} & P_{12} \\ P_{12}^T & P_{22} \end{bmatrix}

the matrix P_{11} is square and invertible, then P ≥ 0 iff P_{11} > 0 and P_{22} − P_{12}^T P_{11}^{-1} P_{12} ≥ 0. ä
Example A.1.3. P = \begin{bmatrix} 0 & 0 \\ 0 & -1 \end{bmatrix} is not positive semidefinite because the principal minor det P_{2,2} = −1 is not nonnegative.
P = \begin{bmatrix} 0 & 0 \\ 0 & 1 \end{bmatrix} is positive semidefinite because all three principal minors, det(0), det(1), det(P), are nonnegative. ä
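The tests of Lemmas A.1.1 and A.1.2 are easy to try out numerically. The following is only a sketch (numerical tolerances are ignored); it uses a small symmetric matrix chosen for illustration.

import numpy as np

P = np.array([[2., 1.],
              [1., 2.]])

# Test via eigenvalues: all > 0 for positive definiteness
print(np.linalg.eigvalsh(P))                        # [1., 3.]

# Test via leading principal minors
print(np.linalg.det(P[:1, :1]), np.linalg.det(P))   # 2.0, 3.0 (both > 0)

# Cholesky-type factorization P = X^T X (here numpy returns P = L L^T, lower triangular L)
L = np.linalg.cholesky(P)
print(np.allclose(L @ L.T, P))                      # True

# The semi-definite matrix of Example A.1.3
P2 = np.array([[0., 0.], [0., 1.]])
print(np.linalg.eigvalsh(P2))                       # [0., 1.]  -> P2 >= 0 but not P2 > 0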

A.2 A notation for partial derivatives


We introduce a notation for partial derivatives of functions f : Rn 7→ Rk .

First the case k = 1, so f : Rn → R. The ∂f(x)/∂x is then a vector of partial derivatives of the
same dimension as x. For the standard choice of column vectors x (with n entries) this means

∂f(x)/∂x := \begin{bmatrix} ∂f(x)/∂x_1 \\ ∂f(x)/∂x_2 \\ \vdots \\ ∂f(x)/∂x_n \end{bmatrix} ∈ R^n.
With the same logic we get a row vector if we differentiate with respect to a row vector,
∂f(x)/∂x^T := \begin{bmatrix} ∂f(x)/∂x_1 & ∂f(x)/∂x_2 & \cdots & ∂f(x)/∂x_n \end{bmatrix} ∈ R^{1×n}.
Now the case k ≥ 1. If f (x) ∈ Rk is itself vectorial (column) then similarly we end up with
 
∂f(x)/∂x^T := \begin{bmatrix} ∂f_1(x)/∂x_1 & ∂f_1(x)/∂x_2 & \cdots & ∂f_1(x)/∂x_n \\ \vdots & \vdots & \ddots & \vdots \\ ∂f_k(x)/∂x_1 & ∂f_k(x)/∂x_2 & \cdots & ∂f_k(x)/∂x_n \end{bmatrix} ∈ R^{k×n},
and
 
∂f^T(x)/∂x := \begin{bmatrix} ∂f_1(x)/∂x_1 & \cdots & ∂f_k(x)/∂x_1 \\ ∂f_1(x)/∂x_2 & \cdots & ∂f_k(x)/∂x_2 \\ \vdots & \ddots & \vdots \\ ∂f_1(x)/∂x_n & \cdots & ∂f_k(x)/∂x_n \end{bmatrix} ∈ R^{n×k}.
The first is the Jacobian, the second is its transpose. Convenient about this notation is that
the n × n Hessian of a function f : Rn → R can now compactly be denoted as
∂²f(x)/(∂x ∂x^T) := ∂/∂x ( ∂f(x)/∂x^T ) = ∂/∂x \begin{bmatrix} ∂f(x)/∂x_1 & ∂f(x)/∂x_2 & \cdots & ∂f(x)/∂x_n \end{bmatrix}

= \begin{bmatrix} ∂²f(x)/∂x_1² & ∂²f(x)/(∂x_1∂x_2) & \cdots & ∂²f(x)/(∂x_1∂x_n) \\ ∂²f(x)/(∂x_2∂x_1) & ∂²f(x)/∂x_2² & \cdots & ∂²f(x)/(∂x_2∂x_n) \\ \vdots & \vdots & \ddots & \vdots \\ ∂²f(x)/(∂x_n∂x_1) & ∂²f(x)/(∂x_n∂x_2) & \cdots & ∂²f(x)/∂x_n² \end{bmatrix}.
Indeed, we first differentiate with respect to a row x T and subsequently differentiate the out-
come (a row) with respect to a column x, resulting in an n × n matrix of second-order partial
derivatives. If f (x) is twice continuously differentiable then the order in which we differenti-
ate does not matter (Clairaut’s theorem) so then
∂²f(x)/(∂x ∂x^T) = ∂²f(x)/(∂x^T ∂x).
The Hessian is then symmetric.
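The notation can be checked with a small symbolic computation; the sketch below (not part of the original text) uses SymPy and a function chosen for illustration.

import sympy as sp

x1, x2 = sp.symbols('x1 x2')
x = sp.Matrix([x1, x2])

# Scalar case k = 1: gradient (column) and its transpose (row)
f = x1**2 * x2 + sp.sin(x2)
grad = sp.Matrix([f]).jacobian(x).T      # d f / d x  (column vector)
print(grad.T)                            # d f / d x^T (row vector)

# Vector case k = 2: the Jacobian d f / d x^T in R^{k x n}
F = sp.Matrix([x1**2 * x2, x1 + sp.exp(x2)])
print(F.jacobian(x))

# Hessian d^2 f / (dx dx^T); symmetric by Clairaut's theorem
print(sp.hessian(f, (x1, x2)))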

A.3 Separation of variables
Let x : R → R and consider the differential equation

ẋ(t) = g(t)/h(x(t))   (A.1)

with g (·), h(·) some given continuous functions. Let H (·),G(·) denote anti-derivatives of
h(·), g (·). The differential equation is equivalent to

h(x(t ))ẋ(t ) = g (t )

and we see that the left-hand side is the derivative of H (x(t )) with respect to t and the right-
hand side obviously is the derivative of G(t ) with respect to t . So it must be that

H (x(t )) = G(t ) + c 0

for some integration constant c 0 . That is

x(t ) = H −1 (G(t ) + c 0 ). (A.2)

This derivation assumes that H (·) is invertible. The c 0 is typically used to match an initial
condition x(t 0 ).

Example A.3.1. We solve the differential equation

ẋ(t ) = −x 2 (t ), x(0) = x 0

of Example B.1.5 using separation of variables. We split the solution in two columns; the first
column is the example, the second column makes a connection with the general procedure:

ẋ(t) = −x²(t)                         h(x(t)) = 1/x²(t),  g(t) = −1

ẋ(t)/x²(t) = −1                       h(x(t)) ẋ(t) = g(t)

−1/x(t) = −t + c_0                    H(x(t)) = G(t) + c_0

x(t) = 1/(t − c_0)                    x(t) = H⁻¹(G(t) + c_0)

In this example the inverse exists as long as t 6= c 0 . Now x 0 = x(0) = −1/c 0 so c 0 can be ex-
pressed in terms of x 0 as c 0 = −1/x 0 and the above solution then becomes

x(t) = 1/(t + 1/x_0) = x_0/(x_0 t + 1).   (A.3)

The solution x(t ) escapes at t = −1/x 0 . (For the escape time problem we refer to Exam-
ple B.1.5.) ä
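The closed-form solution (A.3) is easily verified against a numerical integrator; the snippet below is only a sketch (here with x_0 = 2 chosen for illustration).

import numpy as np
from scipy.integrate import solve_ivp

x0 = 2.0
sol = solve_ivp(lambda t, x: -x**2, (0.0, 5.0), [x0],
                dense_output=True, rtol=1e-8, atol=1e-10)

t = np.linspace(0.0, 5.0, 6)
print(sol.sol(t)[0])          # numerical solution
print(x0 / (x0 * t + 1))      # closed-form solution (A.3): agrees to plotting accuracy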

Example A.3.2. Suppose that

ẋ(t ) = ax(t ), x(0) = x 0

and that x(t ) > 0 for some time. Then we may divide by x(t ),

ẋ(t)/x(t) = a.

Integrating both sides and using that x(t ) > 0, we find that

log(x(t )) = at + c 0 .

The logarithm is invertible, yielding

x(t ) = eat +c0 = x 0 eat

and x 0 = ec0 . For x(t ) < 0 the same solution x 0 eat results (verify this yourself), and if x(t ) = 0
for some time t then x(t ) = 0 for all time, which is also of the form x(t ) = x 0 eat . In summary,
for every x 0 ∈ R the solution is x(t ) = x 0 eat . ä

A.4 Linear constant-coefficient DE’s


On the basis of a few examples we briefly refresh the method of characteristic equations for
solving linear differential equations (DE’s). Several exercises and examples in this book as-
sume familiarity with this method.
To determine the solution y : R → R of the homogeneous DE

ÿ(t ) + 5 ẏ(t ) + 6y(t ) = 0

we first determine its characteristic equation

λ2 + 5λ + 6 = 0.

The zeros λ (over the complex numbers) of this equation are

λ = −2, λ = −3

and then the general solution y(t ) follows as the linear combinations of corresponding expo-
nential terms

y(t ) = c 1 e−2t +c 2 e−3t

with c 1 , c 2 arbitrary constants. For non-homogeneous equations with an exponential term on


the right, say,

ÿ(t ) + 5 ẏ(t ) + 6y(t ) = 2u̇(t ) + 3u(t ), u(t ) = es0 t (A.4)

one can find a particular solution y part (t ) of the same exponential form, y part (t ) = A es0 t . The
constant A follows easily by equating left and right-hand side of (A.4). For this example it
gives
y_part(t) = \frac{2s_0 + 3}{s_0^2 + 5s_0 + 6} e^{s_0 t}.
Then the general solution is obtained by adding the general solution of the homogeneous
equation
y(t) = \frac{2s_0 + 3}{s_0^2 + 5s_0 + 6} e^{s_0 t} + c_1 e^{-2t} + c_2 e^{-3t}.
If s 0 happens to be a characteristic root (s 0 = −2 or s 0 = −3 in our example) then the particular
solution is invalid because of division by zero. Then a particular solution exists of the form

y part (t ) = (A k t k + · · · + A 1 t + A 0 ) es0 t

for some large enough k and with the constants A 0 , . . . , A k yet to be determined.
If the function u(t ) in (A.4) is polynomial then a polynomial particular solution y part (t ) =
A k t k + · · · + A 1 t + A 0 of sufficiently high degree exists.
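The method of characteristic equations can be checked symbolically; the sketch below (not part of the original text) takes s_0 = 1 in (A.4) and recovers the particular solution found above.

import sympy as sp

t = sp.symbols('t')
y = sp.Function('y')
s0 = 1
u = sp.exp(s0 * t)

# The non-homogeneous equation (A.4) with u(t) = e^{s0 t}
ode = sp.Eq(y(t).diff(t, 2) + 5 * y(t).diff(t) + 6 * y(t),
            2 * u.diff(t) + 3 * u)
print(sp.dsolve(ode, y(t)))
# -> y(t) = C1*exp(-3*t) + C2*exp(-2*t) + 5*exp(t)/12,
#    i.e. the particular solution (2*s0 + 3)/(s0**2 + 5*s0 + 6) * exp(s0*t)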

A.5 System of linear time-invariant DE’s
Let A ∈ Rn×n and B ∈ Rn×m . Then for every x 0 ∈ Rn and piecewise continuous u : R → Rm the
solution x : R → Rn of the DE

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0

follows uniquely,
x(t) = e^{At} x_0 + ∫_0^t e^{A(t−τ)} B u(τ) dτ,   t ∈ R.   (A.5)

Piecewise continuity of u(·) is required for technical reasons only. Here e^A is the matrix exponential. It is
defined for square matrices A and can, for instance, be defined in analogy with the Taylor
series expansion of e^a as

e^A = \sum_{k=0}^{∞} \frac{1}{k!} A^k = I + A + \frac{1}{2!} A² + \frac{1}{3!} A³ + ··· .   (A.6)

This series is convergent for every square matrix A. Some characteristic properties of the
matrix exponential are:

Lemma A.5.1 (Matrix exponential properties). Let A, P ∈ Rn×n . Then

1. e0 = I for the zero matrix 0 ∈ Rn×n .

2. e A is invertible and (e A )−1 = e−A .

3. If A = P ΛP −1 for some matrix Λ, then e A = P eΛ P −1 .


4. Let t ∈ R. Then d/dt e^{At} = A e^{At} = e^{At} A.
ä

For the zero signal u(t ) = 0 the above says that the general solution of

ẋ(t ) = Ax(t ), x(0) = x 0 ∈ Rn .

is

x(t ) = e At x 0 .

For diagonal matrices


 
Λ = \begin{bmatrix} λ_1 & 0 & \cdots & 0 \\ 0 & λ_2 & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & λ_n \end{bmatrix},

the matrix exponential is simply the diagonal matrix of scalar exponentials,


 
e^{Λt} = \begin{bmatrix} e^{λ_1 t} & 0 & \cdots & 0 \\ 0 & e^{λ_2 t} & \ddots & \vdots \\ \vdots & \ddots & \ddots & 0 \\ 0 & \cdots & 0 & e^{λ_n t} \end{bmatrix}.   (A.7)

If A is “diagonalizable” – meaning Rn has a basis {v_1, . . . , v_n} of eigenvectors of A – then the
matrix of eigenvectors, P := [v_1 v_2 ··· v_n], is invertible and A = PΛP⁻¹ with Λ the diago-
nal matrix of eigenvalues of A. In that case

e At = P eΛt P −1

with eΛt as in (A.7). This shows that for diagonalizable matrices A, every entry of e At is a
linear combination of eλi t , i = 1, . . . , n. However, not every matrix is diagonalizable. Using
Jordan forms it can be shown that:

Lemma A.5.2. Let A ∈ Rn×n and denote its eigenvalues as λ1 , λ2 , . . . , λn . Then every entry
of e At is a finite linear combination of t k eλi t with k ∈ N and i = 1, 2, . . . , n. Moreover, the
following statements are equivalent.

1. Every entry of e At converges to zero as t → ∞.

2. Every entry of e^{At} converges to zero exponentially fast as t → ∞ (meaning for every
entry w(t) of e^{At} there is an ε > 0 such that lim_{t→∞} w(t) e^{εt} = 0).

3. All eigenvalues of A have negative real part: re(λi ) < 0∀i = 1, . . . , n.


ä
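Both the series definition and the diagonalization formula for e^{At} can be checked numerically; the sketch below uses a small matrix chosen for illustration.

import numpy as np
from scipy.linalg import expm

A = np.array([[0., 1.],
              [-2., -3.]])          # eigenvalues -1 and -2
t = 0.7

# Direct computation of the matrix exponential
E1 = expm(A * t)

# Via diagonalization A = P Lambda P^{-1}
lam, P = np.linalg.eig(A)
E2 = P @ np.diag(np.exp(lam * t)) @ np.linalg.inv(P)

print(np.allclose(E1, np.real(E2)))              # True
print(np.allclose(expm(A * 0.0), np.eye(2)))     # property 1 of Lemma A.5.1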

A.6 Stabilizability and detectability


We consider systems given by the following differential equations

ẋ(t ) = Ax(t ) + Bu(t ), x(0) = x 0 , t > 0. (A.8)

Here, A ∈ Rn×n and B ∈ Rn×m . The function u : [0, ∞) → Rm is often called the input and
the interpretation is that this input u(·) is for us to choose, and that the state x(·) follows. A
natural question is how well the state can be controlled:

Definition A.6.1 (Controllability). A system ẋ(t ) = Ax(t ) + Bu(t ) is controllable if for every
pair of states x 0 , x 1 ∈ Rn , there is a time T > 0 and an input u : [0, T ] → Rm such that the
solution x(·) with x(0) = x 0 satisfies x(T ) = x 1 . ä

Controllability means that any state x(t ) can be driven to any other state by an appropri-
ate choice of input. Controllability can be tested in many ways:

Theorem A.6.2 (Controllability tests). For the system (A.8), the following statements are
equivalent:

1. The system is controllable;


2. [B  AB  ···  A^{n−1}B] ∈ R^{n×(mn)} has rank n;

3. [A − sI  B] has rank n for every s ∈ C;

4. For every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric with respect
to the real axis, there exists a matrix F ∈ Rm×n such that the eigenvalues of A − B F are
equal to {λ1 , λ2 , . . . , λn }.
ä
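The rank test (statement 2) is straightforward to apply numerically; a small sketch for the double integrator (an illustrative choice, not taken from the text) is given below.

import numpy as np

A = np.array([[0., 1.],
              [0., 0.]])
B = np.array([[0.],
              [1.]])

n = A.shape[0]
# Controllability matrix [B, AB, ..., A^{n-1}B]
C = np.hstack([np.linalg.matrix_power(A, k) @ B for k in range(n)])
print(C)
print(np.linalg.matrix_rank(C) == n)    # True: this system is controllable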

A weaker form is “stabilizability”:

Definition A.6.3 (Stabilizability). A system ẋ = Ax(t ) + Bu(t ) is stabilizable if for every x(0) ∈
Rn there is a u : [0, ∞) → Rm such that limt →∞ x(t ) = 0. ä

It can be shown that stabilizability is equivalent to the existence of a matrix F ∈ Rm×n such
that all eigenvalues of A−B F have negative real part. This is interesting because it implies that
u(t ) := −F x(t ) is a stabilizing input for every x(t ) (verify this yourself).
Now consider a state system with an output

ẋ(t ) = Ax(t ), x(0) = x 0 , t > 0,


(A.9)
y(t ) = C x(t ),

where A is the same as in (A.8) and C is a (constant) k ×n matrix. The function y : [0, ∞) → Rk
is often called the output and the interpretation is that y is the part of the state that can be
measured. It is a natural question to ask how much information the output provides about
the state. For example, if we know the output, can we reconstruct the state? For linear systems
one can define “observability” as follows.

Definition A.6.4 (Observability). A system (A.9) is observable if a T > 0 exists such that the
x 0 follows uniquely from y : [0, T ] → Rk . ä

Of course, if x 0 follows uniquely then the state x(t ) = e At x 0 follows uniquely over the en-
tire interval [0, T ]. There are many ways to test for observability:

Theorem A.6.5 (Observability tests). Consider system (A.9). The following statements are
equivalent:

1. The system is observable;


 
2. \begin{bmatrix} C \\ CA \\ \vdots \\ CA^{n−1} \end{bmatrix} ∈ R^{(kn)×n} has rank n;

3. \begin{bmatrix} C \\ A − sI \end{bmatrix} has rank n for every s ∈ C;

4. For every set {λ1 , λ2 , . . . , λn } of n points in the complex plane, symmetric with respect to
the real axis, there is a matrix L ∈ Rn×k such that the eigenvalues of A − LC are equal to
{λ1 , λ2 , . . . , λn };

5. The “transposed” system x̃˙ (t ) = A T x̃(t ) +C T ũ(t ) is controllable.


ä

A weaker form of observability is “detectability”. It can be defined as follows (from this


definition it is not immediately clear that it is a weaker form of observability):

Definition A.6.6 (Detectability). A system (A.9) is detectable if limt →∞ y(t ) = 0 implies


limt →∞ x(t ) = 0. ä

Detectability means that a possible instability of ẋ = Ax(t ) can always be detected by look-
ing at y(t ). Dual to stabilizability it can be shown that detectability is equivalent to the exis-
tence of a matrix L ∈ Rn×k such that all eigenvalues A − LC have negative real part.



F IGURE A.1: Let L(z) = z_1² + z_2². Its level sets {z = (z_1, z_2) | L(z) = c} are circles (shown in red).
Suppose the blue curve is where G(z) = 0 and the light gray region is where G(z) < 0. The z_0 in
(a) is not a local minimizer of L(z) subject to G(z) = 0. The z∗ in (b) is a local minimizer and it
satisfies the first order condition that the gradients ∂L(z)/∂z and ∂G(z)/∂z are aligned at z = z∗

A.7 Lagrange multipliers
We recall in this section the Lagrange multipliers for finite dimensional optimization prob-
lems and its connection with first order conditions of constrained minimization.
Let L : Rn → R. The first order condition for unconstrained minimization

min_{z∈R^n} L(z)

roughly speaking is that no small perturbation z = z ∗ + δ of the candidate minimizer z ∗


decreases L(z). This idea leads to the classic first order condition that the gradient vector
∂L(z)/∂z must be zero at a minimizer z ∗ . A similar idea applies to constrained minimization,

min_{z∈R^n} L(z) subject to G(z) = 0   (A.10)

where G : Rn → Rk . Notice that G(z) has k rows, so G(z) = 0 is a collection of k constraints. To


motivate this case, consider the situation of one constraint, k = 1, and two parameters, n = 2.

Example A.7.1 (Geometric interpretation of first order necessary condition for n = 2 and
k = 1). An example is depicted in Fig. A.1. Let L : R2 → R be some smooth function that
we want to minimize over all z ∈ R2 subject to the constraint G(z) = 0 with G : R2 → R some
differentiable function. Intuition tells us that z 0 in Fig. A.1(a) is not a local minimizer because
moving up along the constraint curve brings us to a lower value of L(z). Another way to say
this is that the tangent of the constraint curve at z 0 is not tangent to the level set of L(z)
through z 0 . The first order condition for (local) minimality is that every perturbation δ ∈ R2
“in the tangent of the constraint” at the candidate minimizer z ∗ ,
{δ ∈ R² | (∂G(z∗)/∂z^T) δ = 0}
is also tangent to the level set through z ∗ , meaning
(∂L(z∗)/∂z^T) δ = 0.
The geometric interpretation for n = 2 is that the gradients ∂L(z)/∂z and ∂G(z)/∂z are aligned at the
minimizer z ∗ , see Fig. A.1(b). ä

Under mild regularity assumptions1 one can likewise show that a solution z ∗ of the con-
straint minimization problem (A.10) necessarily has the property
G(z∗) = 0 and ∀δ ∈ R^n : ( (∂G(z∗)/∂z^T) δ = 0 =⇒ (∂L(z∗)/∂z^T) δ = 0 ).   (A.11)
With this constraint minimization problem (A.10) we associate an unconstrained mini-
mization problem by defining the Lagrangian function

K (z, λ) := L(z) + λT G(z) (A.12)

and minimizing this function as a function of z ∈ Rn and the vector of Lagrange multipliers
λ ∈ Rk . The standard first-order conditions for minimality of K (z ∗ , λ∗ ) is that the gradient
with respect to both z and λ is zero at (z ∗ , λ∗ ):
∂L(z∗)/∂z^T + λ∗^T ∂G(z∗)/∂z^T = 0,   G(z∗) = 0.   (A.13)
1 That G(·) is continuously differentiable and ∂G(z)/∂z^T has full row rank at z = z∗.

126
The second half of the equations (the first order conditions with respect to λ) are just the
constraint equations themselves. The first half of these equations (the first-order conditions
with respect to z) tells us that at a minimizing z∗ the gradient vector ∂L(z)/∂z^T is a linear combination of the k rows of ∂G(z∗)/∂z^T, see Fig. A.1(b). The classic result is that the first order conditions
for the unconstrained Lagrangian are equivalent to the first order conditions for the original
constrained problem:

Lemma A.7.2 (First order condition). Given are L : Rn → R, G : Rn → Rk both continuously


differentiable. A pair (z ∗ , λ∗ ) ∈ (Rn , Rk ) satisfies the Lagrangian first-order condition (A.13)
if-and-only-if (A.11) holds.

Proof. This is an application of the theorem of alternatives:

Given A ∈ Rk×n and µ ∈ Rn there is a λ ∈ Rk such that λT A = µT if and only if for


every δ ∈ Rn such that Aδ = 0 we have µT δ = 0.

(proof: The only-if part is easy: if µT = λT A and Aδ = 0 then µT δ = λT Aδ = 0. For the if-part we
note that the condition that (Aδ = 0 =⇒ µT δ = 0) implies that ker A ⊆ ker µT . This is equivalent
to im µ ⊆ im A T . Since µ ∈ im µ this implies the existence of a λ ∈ Rn such that A T λ = µ.)
Apply the theorem of alternatives with A = ∂G(z ∗ )/∂z T and µ = −∂L(z ∗ )/∂z. ■
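The first-order conditions (A.13) are easy to solve symbolically for small problems. The sketch below uses L(z) = z_1² + z_2² as in Fig. A.1 together with a hypothetical constraint G(z) = z_1 + z_2 − 1 chosen only for illustration.

import sympy as sp

z1, z2, lam = sp.symbols('z1 z2 lambda')
L = z1**2 + z2**2
G = z1 + z2 - 1

# Lagrangian K(z, lambda) = L(z) + lambda * G(z), see (A.12)
K = L + lam * G

# First-order conditions (A.13): dK/dz = 0 together with G(z) = 0
sols = sp.solve([sp.diff(K, z1), sp.diff(K, z2), G], [z1, z2, lam], dict=True)
print(sols)     # [{z1: 1/2, z2: 1/2, lambda: -1}]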

Appendix B

Differential equations and Lyapunov


stability

This appendix reviews existence and uniqueness of solutions x 1 , . . . , x n : R → R of n coupled


differential equations

ẋ 1 (t ) = f 1 (x 1 (t ), . . . , x n (t )), x 1 (0) = x 01
.. ..
. .
ẋ n (t ) = f n (x 1 (t ), . . . , x n (t )), x n (0) = x 0n , t ≥ 0. (B.1)

Here x 01 , . . . , x 0n ∈ R are given initial conditions and the f i : Rn → R are given functions. The
vector
 
x(t) := \begin{bmatrix} x_1(t) \\ \vdots \\ x_n(t) \end{bmatrix} ∈ R^n
is called the state (vector) and with a similar definition for the vector of initial conditions x 0 ∈
Rn and vector field function f : Rn → Rn we may write (B.1) more succinctly as

ẋ(t ) = f (x(t )), x(0) = x 0 , t ≥ 0. (B.2)

The solution we normally write as x(t ) but sometimes we use x(t ; x 0 ) if we want to emphasise
the dependence on the initial state.

B.1 Existence and uniqueness of solutions


There are functions f : Rn → Rn for which the solution x(·) of the differential equation is not
unique:
Example B.1.1 (Non-unique solution). A standard example of a differential equation with a
non-unique solution is
p
ẋ(t ) = x(t ), x(0) = 0, t ≥ 0.

Clearly the zero function x(t ) = 0 ∀t is one solution, but it is easy to verify that for every c > 0
the function
x(t) = \begin{cases} 0 & t ∈ [0, c] \\ \tfrac{1}{4}(t − c)² & t > c \end{cases}

is a solution as well! Weird. It is as if the state x(t ) – like Baron Munchhausen – is able to lift
itself by pulling on its own hair. ä
p
The vector field function in this example is f (x) = x and it has unbounded derivative
around x = 0. We will see next that if the function f (x) does not increase “too quickly” then
uniqueness is ensured. A measure for the rate of increase is the Lipschitz constant.

Definition B.1.2 (Lipschitz continuity). Let Ω ⊂ Rn and let k · k be some norm on Rn (e.g.
the standard Euclidean norm). A function f : Ω → Rn is Lipschitz continuous on Ω if a Lip-
schitz constant K ≥ 0 exists such that

k f (x) − f (z)k ≤ K kx − zk (B.3)

for all x, z ∈ Ω. It is Lipschitz continuous at x 0 if it is Lipschitz continuous on some neighbor-


hood Ω of x 0 , and it is locally Lipschitz if it Lipschitz continuous at every x 0 ∈ Rn . ä


F IGURE B.1: Lipschitz continuity for scalar functions f : [a, b] → R defined on an interval Ω =
[a, b] ⊂ R means that at each z ∈ [a, b] the graph (x, f (x)) is completely contained in a steep-
enough “bow tie” through the point (z, f (z)). The slope of the steepest bow tie needed over all
z in the interval is a possible Lipschitz-constant K .

For the linear function f (x) = kx with k ∈ R the Lipschitz constant is obviously K = |k| and
the solution of the corresponding differential equation

ẋ(t ) = kx(t ), x(0) = x 0

clearly is x(t ) = ekt x 0 . Given x 0 , this solution exists and is unique. The idea is now that for
arbitrary Lipschitz continuous f : Rn → Rn the solutions of ẋ(t ) = f (x(t )) exist and are unique
(on some region) and that the solution increases at most exponentially with the exponent K
equal to the Lipschitz constant (on that region):

Theorem B.1.3 (Existence and uniqueness of solution). Let x 0 ∈ Rn and f : Rn → Rn . If f (·)


is Lipschitz continuous at x 0 then, for some T > 0, the differential equation (B.2) has a unique
solution x(t ; x 0 ) for all t ∈ [0, T ) and with the property that for every fixed t ∈ [0, T ) the solution
x(t ; x 0 ) depends continuously on x 0 .
Moreover, if x(t ; x 0 ) and x(t ; z 0 ) are two solutions that for all t ∈ [0, T ) live in some neigh-
borhood Ω, and if f (·) on this neighborhood has a Lipschitz constant K , then

kx(t ; x 0 ) − x(t ; z 0 )k ≤ kx 0 − z 0 k eK t ∀t ∈ [0, T ).

Proof. The proof can be found in many textbooks, e.g. (Khalil, 1996, Thm. 2.2 & Thm. 2.5). ■

If a single Lipschitz constant K ≥ 0 exists such that (B.3) holds for all x, z ∈ Rn then f (·) is
said to satisfy a global Lipschitz condition.
It follows from the above theorem that solutions x(t ) can be uniquely continued at any
time t 0 if f (·) is locally Lipschitz. This is such a desirable property that one normally im-
plicitly assumes that f (x) is locally Lipschitz. Every continuously differentiable f (x) is locally
Lipschitz, so in such cases we can uniquely continue the solution x(t ) at every t 0 . However,
the solution might escape in finite time:

Theorem B.1.4 (Escape time). Suppose that f : Rn → Rn is locally Lipschitz. Then for every
x(0) = x 0 there is a unique t (x 0 ) ≥ 0 (possibly t (x 0 ) = ∞) such that the solution x(t ) of (B.2)
exists and is unique on the open time interval [0, t (x 0 )) but does not exist for t > t (x 0 ).
Moreover if t (x 0 ) < ∞ then limt ↑t (x0 ) kx(t ; x 0 )k = ∞.
If f (·) is globally Lipschitz then t (x 0 ) = ∞ i.e. the solution x(t ; x 0 ) then exists and is unique
for all t ≥ 0.

Proof. See (Khalil, 1996, p. 74–75). ■

The t (x 0 ) – whenever finite – is known as the escape time.

Example B.1.5 (Escape time). Consider the scalar differential equation

ẋ(t ) = −x 2 (t ), x(0) = x 0 .

The function f (x) := −x 2 is locally Lipschitz because it is continuously differentiable. (It is


not globally Lipschitz however.) Hence for every initial condition there is a unique solution
on some non-empty interval [0, t (x 0 )) but t (x 0 ) might be finite. In fact, for this example we
can determine the solutions because it can be explicitly solved (see also Exercise 4.7 and Ap-
pendix A.3):
x(t) = x_0/(t x_0 + 1).
If x 0 ≥ 0 then x(t ) is well defined for every t > 0 so then t (x 0 ) = ∞. If however x 0 < 0 then the
solution escapes at finite time t (x 0 ) = −1/x 0 , see Fig. B.2. ä
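The finite escape time can also be observed numerically; the snippet below is a sketch with x_0 = −1, so the escape time is t(x_0) = 1. A variable-step integrator is forced to stop just before the blow-up.

from scipy.integrate import solve_ivp

sol = solve_ivp(lambda t, x: -x**2, (0.0, 2.0), [-1.0])
print(sol.status)      # typically -1: the step size underflows near the blow-up
print(sol.t[-1])       # close to the escape time t = 1
print(sol.y[0, -1])    # very large negative value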

We conclude this section with a result about continuity of solutions that is useful for op-
timal control. Here we take the standard Euclidean norm:

Lemma B.1.6 (Continuity of solutions). Consider the differential equation in x(t ) and (per-
turbed) z(t ):

ẋ(t ) = f (x(t )), x(0) = x 0 ,


ż(t ) = f (z(t )) + g (t ), z(0) = z 0 .

Let T > 0. If Ω is an open set such that x(t ), z(t ) ∈ Ω for all t ∈ [0, T ) and if f (x) on Ω has
Lipschitz constant K , then
‖x(t) − z(t)‖ ≤ e^{Kt} ( ‖x(0) − z(0)‖ + ∫_0^t ‖g(τ)‖ dτ )   ∀ t ∈ [0, T ).

Proof. Let ∆(t) = x(t) − z(t). Then ∆̇(t) = f(x(t)) − f(z(t)) + g(t). Then by Cauchy-Schwarz we have |d/dt ‖∆(t)‖| ≤ K‖∆(t)‖ + ‖g(t)‖. From (A.5) it follows that then ‖∆(t)‖ ≤ e^{Kt}(‖∆(0)‖ + ∫_0^t ‖g(τ)‖ dτ). ■

F IGURE B.2: For negative x 0 the solution escapes at t = −1/x 0 (Example B.1.5)
F IGURE B.3: Stability for 2-dimensional systems x = (x 1 , x 2 )

B.2 Definitions of stability
Asymptotic stability of ẋ(t ) = f (x(t )) loosely speaking, means that solutions x(t ) “comes to
rest”, and stability means that x(t ) remains “close to rest”. In order to formalize this, we first
have to define the “points of rest”. These are the constant solutions x(t ) = x̄ of the differential
equation, so solutions x̄ of f (x̄) = 0.

Definition B.2.1 (Equilibrium). x̄ ∈ Rn is an equilibrium (point) of (B.2) if f (x̄) = 0. ä

Different possibilities for the behavior of the system near an equilibrium point are de-
scribed in the following definition. For ease of exposition we assume that t 0 = 0.

Definition B.2.2 (Stable and unstable equilibria). An equilibrium point x̄ of a differential


equation is called

1. stable if ∀ε > 0 ∃δ > 0 such that ‖x_0 − x̄‖ < δ implies ‖x(t; x_0) − x̄‖ < ε ∀t ≥ t_0.

2. attractive if ∃δ1 > 0 such that kx 0 − x̄k < δ1 implies that limt →∞ x(t ; x 0 ) = x̄.

3. asymptotically stable if it is stable and attractive.

4. globally attractive if limt →∞ x(t ; x 0 ) = x̄ for every x 0 ∈ Rn .

5. globally asymptotically stable if it is stable and globally attractive.

6. unstable if x̄ is not stable. This means that ∃ε > 0 such that ∀δ > 0 an x_0 and a t_1 exists
for which ‖x_0 − x̄‖ < δ yet ‖x(t_1; x_0) − x̄‖ ≥ ε.
ä

In particular an equilibrium x̄ is unstable if every neighborhood of it contains an x 0 that


has finite escape time, t (x 0 ) < ∞. Surprisingly, perhaps, we have that attractive equilibria
need not be stable, see Exercise B.24. Instead of (in)stability of equilibria, one may also study
(in)stability of a specific trajectory x(t ), t ∈ R, in particular of a periodic orbit. We do not ex-
plicitly deal with this problem.
There are many ways to analyze stability properties of equilibria. Of particular importance
are those methods that do not rely on explicit forms of the solutions x(t ) as explicit forms are
in general very hard to find. Two methods, both attributed to Lyapunov, that do not require
explicit knowledge of x(t ) are linearization, also known as Lyapunov’s first method, and the
method of Lyapunov functions, also known as Lyapunov’s second method. An advantage of the
second method over the first is that the first can be proved elegantly with the second. This is
why Lyapunov’s second method is covered, well, first.

B.3 Lyapunov functions


Lyapunov’s second method mimics the well known physical property that a system that con-
tinually loses energy eventually comes to a halt. Of course in a mathematical context one may
bypass the precise notion of physical energy, but it is a helpful interpretation.
Suppose we have a functional V : Rn → R that does not increase along any solution x(t ) of
the differential equation, i.e., that

V (x(t + h)) ≤ V (x(t )) ∀h > 0, ∀t (B.4)

for every solution of ẋ(t ) = f (x(t )). Now if V (x(t )) is differentiable with respect to time t then
it is nonincreasing if and only if its derivative with respect to time is non-positive everywhere

V̇ (x(t )) ≤ 0 ∀t .

This condition can be checked for solutions of (B.2) without explicit knowledge of x(t ). In-
deed, using the chain-rule, we have that

V̇(x(t)) = dV(x(t))/dt
         = (∂V(x(t))/∂x_1) ẋ_1(t) + ··· + (∂V(x(t))/∂x_n) ẋ_n(t)
         = (∂V(x(t))/∂x_1) f_1(x(t)) + ··· + (∂V(x(t))/∂x_n) f_n(x(t))
         =: (∂V(x)/∂x^T) f(x) |_{x=x(t)}.   (B.5)

We took the opportunity here to introduce the convenient notation ∂V(x)/∂x^T for the gradient of
V (x) at x seen as a row vector,
∂V(x)/∂x^T := [ ∂V(x)/∂x_1   ∂V(x)/∂x_2   ···   ∂V(x)/∂x_n ].

(Appendix A.2 explains this notation, in particular the role of the transpose.) The product
in (B.5) is that of a row vector ∂V (x)/∂x T and a column vector f (x), evaluated at x = x(t ).
With slight abuse of notation we use V̇ (x) to mean

V̇(x) = (∂V(x)/∂x^T) f(x).
In order to deduce stability from the existence of a non-increasing function V (x(t )) we addi-
tionally require that the function has a minimum at the equilibrium. Furthermore, for tech-
nical reasons we also have to require a certain degree of differentiability of the function. We
formalize these properties in the following definition and theorem.

Definition B.3.1 (Positive and negative (semi) definite). Let Ω ⊆ Rn and assume it is a neigh-
borhood of some x̄ ∈ Rn . A continuously differentiable function V : Ω → R is positive definite
on Ω relative to x̄ if

V (x̄) = 0 while V (x) > 0 for all x ∈ Ω \ x̄.

It is positive semi-definite if V (x̄) = 0 and V (x) ≥ 0 for all other x. And V (·) is negative (semi)
definite if −V (·) is positive (semi) definite. ä

Positive definite implies that V (·) has a unique minimum on Ω and that the minimum is
attained at x̄. The assumption that the minimum is zero, V (x̄) = 0, is a convenient normaliza-
tion. Figure B.4 shows an example of each of the four types of “definite” functions.
The famous result can now be proved:

Theorem B.3.2 (Lyapunov’s second stability theorem). Consider the DE ẋ(t ) = f (x(t )) with
f : Rn → Rn Lipschitz continuous, and let x̄ be an equilibrium of this DE. If there is a neigh-
borhood Ω of x̄ and a function V : Ω → R such that on Ω

1. V (x) is continuously differentiable,


F IGURE B.4: Examples of graphs of positive/negative (semi) definite functions V : R → R

2. V (x) is positive definite relative to x̄,

3. V̇ (x) is negative semi-definite relative to x̄

then x̄ is a stable equilibrium and we call V (x) a Lyapunov function.


If in addition V̇ (·) is negative definite (so not just negative semi-definite) then x̄ is asymp-
totically stable and we call V (·) a strong Lyapunov function. ä


F IGURE B.5: Four inclusions of regions (proof of Theorem B.3.2)

By definition a Lyapunov function V (x(t )) never increases over time (on Ω), and a strong
Lyapunov function V (x(t )) always decreases on Ω unless we are at the equilibrium x̄.

Proof. We denote the open sphere with radius r and center x̄ by B (x̄, r ), i.e.,

B (x̄, r ) := {x ∈ Rn | kx − x̄k < r }.

We first consider the stability property. For every ² > 0 we have to find a δ > 0 such that
x 0 ∈ B (x̄, δ) implies x(t ) ∈ B (x̄, ²) for all t > 0. We construct a series of inclusions, see Fig. B.5.
Because Ω is a neighborhood of x̄, there exists an ²1 > 0 such that B (x̄, ²1 ) ⊂ Ω. Without
loss of generality we can take it so small that ²1 ≤ ². Because V (x) is continuous on Ω and
because the boundary of B (x̄, ²1 ) is a compact set, V (x) has a minimum on the boundary of
B (x̄, ²1 ). We call this minimum α. Now define

Ω1 := {x ∈ B (x̄, ²1 ) | V (x) < α}.

This set Ω1 is open because V (x) is continuous. It is contained in B (x̄, ²1 ). Now x̄ is an ele-
ment of Ω1 because V (x̄) = 0. So, by continuity of V (x), there exists a δ such that B (x̄, δ) ⊂ Ω1 .

We prove that this δ satisfies the requirements: if x 0 ∈ B (x̄, δ), we find because V̇ (·) is nega-
tive semi-definite that V (x(t ; x 0 )) ≤ V (x 0 ) < α for all t ≥ 0. This means that it is impossible
that x(t ; x 0 ), with initial condition in B (x̄, δ), reaches the boundary of B (x̄, ²1 ) because on this
boundary we have, by definition, that V (x) ≥ α. So kx(t ; x 0 ) − x̄k < ² for all time and the sys-
tem, thus, is stable.
Next we prove that the stronger inequality V̇ (x) < 0 ∀x ∈ Ω\{x̄} assures asymptotic stabil-
ity. Specifically we prove that for every x 0 ∈ B (x̄, δ) the solution x(t ; x 0 ) → x̄ as t → ∞. First
note that, because of stability, the orbit x(t ; x 0 ) remains within the bounded set B (x̄, ²1 ) for
all time. Now, to obtain a contradiction, assume that x(t ; x 0 ) does not converge to x̄. This
implies that there is a µ > 0 and increasing time instances t k with t k → ∞ such that

kx(t k ; x 0 ) − x̄k > µ > 0 ∀k.

As x(t k ; x 0 ) is a bounded sequence, the theorem of Bolzano-Weierstrass guarantees that there


is a subsequence x(t k j ; x 0 ) that converges to some element x ∞ . Clearly x ∞ 6= x̄. Since V (·) is
non-increasing we have for every t > 0 that

V (x(t k j ; x 0 )) ≥ V (x(t k j + t ; x 0 )) ≥ V (x(t k j +m ; x 0 ))

where m is chosen such that t k j + t < t k j +m . Now in the limit j → ∞ the above inequality
becomes

V (x ∞ ) ≥ V (x(t ; x ∞ )) ≥ V (x ∞ ).

(Let us be precise here: since the differential equation is locally Lipschitz we have, by
Thm. B.1.3, that x(t ; x 0 ) depends continuously on x 0 . For that reason we are allowed to say
that lim j →∞ x(t j + t ; x 0 ) = lim j →∞ x(t ; x(t j )) = x(t ; x ∞ ).) Hence V (x(t ; x ∞ )) = V (x ∞ ) for all t .
In particular we see that V (x(t ; x ∞ )) is constant. But that would mean that V̇ (x ∞ ) = 0 and this
violates the fact that V̇ (·) is negative definite and x ∞ 6= x̄. Therefore the assumption that x(t )
does not converge to x̄ is wrong. The system is asymptotically stable. ■

F IGURE B.6: Graph of (1 − x²)/(1 + x²)

Example B.3.3 (First order system). The scalar system

ẋ(t) = (1 − x²(t))/(1 + x²(t))
has two equilibria, x̄ = ±1, see Fig. B.6. For equilibrium x̄ = 1 we propose the candidate Lya-
punov function

V (x) = (x − 1)2 .

It is positive definite relative to x̄ = 1 and it is continuously differentiable. On Ω = (−1, ∞) it is


a Lyapunov function because then also the third condition of Thm. B.3.2 holds:

V̇(x) = (∂V(x)/∂x) f(x) = 2(x − 1) · (1 − x²)/(1 + x²) = −2 (1 − x)²(1 + x)/(x² + 1) ≤ 0   ∀x ∈ (−1, ∞).

Actually V̇ (x) < 0 for all x ∈ (−1, ∞) \ {1} so it is in fact a strong Lyapunov function on (−1, ∞)
and hence the equilibrium x̄ = 1 is asymptotically stable.
The other equilibrium, x̄ = −1, is unstable. ä


F IGURE B.7: Left: pendulum. Right: level sets of its mechanical energy V (x). See Example B.3.4

Example B.3.4 (pendulum). The standard equation of motion of a pendulum without damp-
ing is

ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t))   (B.6)

where x 1 (t ) is the angular displacement, x 2 (t ) is the angular velocity, g is the gravitational


constant and ℓ is the length of the pendulum, see Fig. B.7 (left). The mechanical energy of the
pendulum with mass M is
V(x) = ½ M ℓ² x_2² + M g ℓ [1 − cos(x_1)].
This energy is zero at (x 1 , x 2 ) = (2kπ, 0) and is positive elsewhere. To turn this into a Lyapunov
function for the hanging position x̄ = (0, 0) we simply take, say,

Ω = {x ∈ R2 | −2π < x 1 < 2π}.

This way V (x) on Ω has a unique minimum at equilibrium x̄ = (0, 0). Hence V (x) is positive
definite relative to this x̄ for this Ω. Clearly V (x) is also continuously differentiable and V̇ (x)
equals

V̇(x) = (∂V(x)/∂x^T) f(x)
      = (∂V(x)/∂x_1) f_1(x) + (∂V(x)/∂x_2) f_2(x)
      = M g ℓ sin(x_1) x_2 − M ℓ² x_2 (g/ℓ) sin(x_1) = 0.
Apparently the mechanical energy is constant over time. Therefore using Theorem B.3.2 we
may draw the conclusion that the system is stable, but not necessarily asymptotically stable.
The fact that V (x(t )) is constant actually implies it is not asymptotically stable. Indeed if we
start at a nonzero state x 0 ∈ Ω – so with V (x 0 ) > 0 – then V (x(t )) = V (x 0 ) for all time and x(t )
thus does not converge to (0, 0). Figure B.7 (right) indicates level sets {(x_1, x_2) | V(x_1, x_2) = c}
of the mechanical energy in the phase plane for several levels c > 0. Solutions x(t ) remain
within its level set. ä

For strong Lyapunov functions, Thm. B.3.2 states that x(t ; x 0 ) → x̄ for initial sates x 0 that
are close enough to the equilibrium. At first sight it seems reasonable to expect that the “big-
ger” the Ω the “bigger” the region of attraction. Alas. As demonstrated in Exercise B.3, having
a strong Lyapunov function on the entire state space Ω = Rn does not imply that x(t ; x 0 ) → x̄
for all initial conditions x 0 ∈ Rn . The question that thus arises is: what is the region of attrac-
tion of the equilibrium x̄ in case it is asymptotically stable, and under which conditions is this
region of attraction the entire state space Rn ?
The proof of Theorem B.3.2 gives some insight into the region of attraction. In fact, it
follows that the region of attraction of x̄ includes the largest sphere about x̄ that is contained
in Ω1 := {x ∈ B (x̄, ²) | V (x) < α}, see Fig. B.5. We use this observation to formulate an extra
condition on V (·) that guarantees global asymptotic stability.

Theorem B.3.5 (Global asymptotic stability). Suppose all conditions of Thm. B.3.2 are met
with Ω = Rn . If V : Rn → R is a strong Lyapunov function with the additional property that

V (x) → ∞ as kxk → ∞, (B.7)

then the system is globally asymptotically stable. (Property (B.7) is known as radial unbound-
edness.)

Proof. The proof of Thm. B.3.2 shows that x(t ) → x̄ whenever x 0 ∈ B (x̄, δ) where δ is as in-
dicated in Fig. B.5. Remains to show that any x 0 is in this ball B (x̄, δ), that is, that δ can be
chosen arbitrarily large. We will construct the various regions of Fig. B.5 starting with the
smallest and step-by-step working towards the biggest.
Take an arbitrary x 0 ∈ Rn and let δ := 2kx̄ − x 0 k and α = supkx−x̄k<δ V (x). This α is finite.
Next let Ω1 = {x|V (x) < α}. This set is bounded because V (x) is radially unbounded. (This
is the reason we require radial unboundedness.) By construction we have x 0 ∈ B (x, δ) ⊂ Ω1 .
Therefore ²1 := supx∈Ω1 kx − x̄k is finite. For every ² > ²1 the conditions of Thm. B.3.2 are met
and since x 0 ∈ B (x̄, δ) the proof of Thm. B.3.2 says that x(t ) → x̄ as t → ∞. This works for
every x 0 so the system is globally attractive. Together with stability this means it is globally
asymptotically stable. ■

F IGURE B.8: Phase portrait of the system of Example B.3.6. The origin is globally asymptotically
stable

Example B.3.6 (Global asymptotic stability). Consider the system

ẋ 1 (t ) = −x 1 (t ) + x 22 (t )
(B.8)
ẋ 2 (t ) = −x 2 (t )x 1 (t ) − x 2 (t ).

Clearly the origin (0, 0) is an equilibrium of this system. We choose

V (x) = x 12 + x 22 .

This V (·) is radially unbounded and it is a strong Lyapunov function on R2 because it is posi-
tive definite and continuously differentiable and

V̇ (x) = 2x 1 [−x 1 + x 22 ] + 2x 2 [−x 2 x 1 − x 2 ] = −2(x 12 + x 22 ) < 0 ∀x 6= 0.

Since V (·) is radially unbounded the equilibrium (0, 0) is globally asymptotically stable. This
also implies that (0, 0) is the only equilibrium. Its phase portrait is shown in Fig. B.8. ä
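A quick numerical sanity check of this example is sketched below: since V̇(x) = −2V(x) here, V decays exactly as V(x(0)) e^{-2t} along solutions of (B.8). The initial state is chosen for illustration.

import numpy as np
from scipy.integrate import solve_ivp

def f(t, x):
    x1, x2 = x
    return [-x1 + x2**2, -x2 * x1 - x2]

t = np.linspace(0.0, 5.0, 6)
sol = solve_ivp(f, (0.0, 5.0), [3.0, -2.0], t_eval=t, rtol=1e-8)
V = sol.y[0]**2 + sol.y[1]**2

print(V)                        # strictly decreasing along the solution
print(13.0 * np.exp(-2 * t))    # V(x(0)) e^{-2t} with V(x(0)) = 3^2 + (-2)^2 = 13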

Powerful as the theory may be, it does not really tell us how to find a Lyapunov function,
assuming one exists. Systematic design of Lyapunov functions is hard, but it does work for
linear-time invariant systems, as discussed in Section B.5.

B.4 LaSalle’s Invariance Principle


Theorem B.3.2 guarantees asymptotic stability when V̇ (x) < 0 everywhere outside the equilib-
rium. However, in many cases of interest the natural Lyapunov function does not satisfy this
condition, while the equilibrium may be asymptotically stable nonetheless. Examples include
physical systems whose energy decreases almost everywhere but not everywhere. An example
is the pendulum with friction:

Example B.4.1 (Pendulum with friction). The equations of motion of a pendulum subject to
damping are

ẋ_1(t) = x_2(t)
ẋ_2(t) = −(g/ℓ) sin(x_1(t)) − c x_2(t)   (B.9)

where x 1 (t ) is the angular displacement, x 2 (t ) is the angular velocity and c a positive friction
coefficient. The time-derivative of the mechanical energy V(x) = ½ M ℓ² x_2² + M g ℓ [1 − cos(x_1)]
is
V̇(x) = M g ℓ sin(x_1) x_2 − M ℓ² x_2 (g/ℓ) sin(x_1) − c M ℓ² x_2² = −c M ℓ² x_2² ≤ 0.

The mechanical energy decreases everywhere except if the angular velocity x 2 is zero. Using
Theorem B.3.2 we may only draw the conclusion that the system is stable but not that it is
asymptotically stable because V̇ (x) is zero at other points than the equilibrium (it is zero at
any x = (x 1 , 0)). However from physical considerations we feel that (0, 0) is an asymptotically
stable equilibrium nonetheless. How to prove it? ä

In the above example we would still like to infer asymptotically stability. If we were to
use the theory from the previous section, we would have to find a new Lyapunov function
(different from the mechanical energy), but this is not an easy task. In this section we dis-
cuss a method that allows to prove asymptotic stability without us having to construct a new
Lyapunov function.
From the above pendulum example one might be tempted to conclude that asymptotic
stability follows as long as V (x) decreases “almost everywhere” in state space. That is not
necessarily the case as the following basic example demonstrates.

F IGURE B.9: Simple system (Example B.4.2)

Example B.4.2 (Simple system). Consider

ẋ 1 (t ) = 0
ẋ 2 (t ) = −x 2 (t ).

Clearly x 1 (t ) is constant and x 2 (t ) converges exponentially fast to zero (see the vector field of
Fig. B.9). Now

V (x) = x 12 + x 22

is a Lyapunov function for x̄ = (0, 0) because it is positive definite, continuously differentiable


and

V̇ (x) = 2x 1 ẋ 1 + 2x 2 ẋ 2 = −2x 22 ≤ 0.

The set of states x(t ) where V̇ (x(t )) = 0 is where x 2 = 0 (i.e. the x 1 -axis) and everywhere else
in the plane we have V̇ (x(t )) < 0. In that sense V̇ (x) is strictly negative “almost everywhere”.
The origin is however not asymptotically stable because every point (x̄ 1 , 0) on the x 1 -axis is an
equilibrium so no matter how small we take δ > 0, there are always initial states x 0 = (δ/2, 0)
whose solution x(t ) is constant and so does not converge to (0, 0). ä

We set up a generalized Lyapunov theory that allows to prove that the hanging position in
the pendulum-with-friction example (Example B.4.1) is indeed asymptotically stable and that
in Example B.4.2 all solutions converge to the x 1 -axis. It requires a bit of terminology.

Definition B.4.3 (Orbit). The orbit O (x 0 ) with initial condition x 0 is defined as O (x 0 ) = {y ∈


Rn | y = x(t ; x 0 ) for certain t ≥ 0}. ä

The orbit of x 0 is just the set of states that x(t ; x 0 ) traces out as t varies over all t ≥ 0.

Definition B.4.4 (Invariant set). A set G ⊆ Rn is called a (forward) invariant set for (B.2) if
every solution x(t ; x 0 ) of (B.2) with initial condition x 0 in G , is contained in G for all t > 0. ä

So once the state is in an invariant set it never leaves it. Every orbit is an invariant set.

Example B.4.5. The x 1 -axis is an invariant set for the system of Example B.4.2. In fact every
element x = (x 1 , 0) of this axis is an invariant set because they all are equilibria. The general
solution is x(t ) = (x 10 , x 20 e−t ). This shows that for instance also the x 2 -axis {(0, x 2 ) : x 2 ∈ R} is
an invariant set. ä

F IGURE B.10: Phase portrait (Example B.4.6)

The union of two invariant sets is itself an invariant set. In fact, the union of an arbitrary
number (finite, infinite, countable, uncountable) of invariant sets is invariant. Also, realize
that every equilibrium is an invariant set.

Example B.4.6 (Rotation invariant phase portrait). The phase portrait of Fig. B.10 is that of
ẋ_1(t) = x_2(t) + x_1(t)[1 − x_1²(t) − x_2²(t)]
ẋ_2(t) = −x_1(t) + x_2(t)[1 − x_1²(t) − x_2²(t)].   (B.10)

Inspired by the rotation-invariant phase portrait (see Fig. B.10) we analyze first how the
squared radius

r (t ) := x 12 (t ) + x 22 (t )

changes over time,


ṙ(t) = d/dt [x_1²(t) + x_2²(t)]
     = 2x_1(t)ẋ_1(t) + 2x_2(t)ẋ_2(t)
     = 2x_1(t)x_2(t) + 2x_1²(t)[1 − x_1²(t) − x_2²(t)]
       − 2x_2(t)x_1(t) + 2x_2²(t)[1 − x_1²(t) − x_2²(t)]
     = 2[x_1²(t) + x_2²(t)][1 − x_1²(t) − x_2²(t)].

Therefore the squared radius r = x 12 + x 22 satisfies

r˙(t ) = 2r (t )(1 − r (t )). (B.11)

If r (0) = 1 then r (t ) is always equal to one, so the unit circle is an invariant set. Furthermore,
Eqn. (B.11) shows that if 0 ≤ r (0) < 1, then 0 ≤ r (t ) < 1 for all time. Hence the open unit disc is
also invariant. Using similar arguments, we find that also the complement of the unit disc is
invariant. ä

In this example the state does not always converge to a single element, but to a set (e.g.
the unit circle in the previous example). We use dist(x, G ) to denote the (minimal) distance
between a point x ∈ Rn and a set G ⊂ Rn . We define

dist(x, G) := inf_{g∈G} ‖x − g‖

and we say that a function x(t ) converges to a set G if limt →∞ dist(x(t ), G ) = 0. The extension
of Lyapunov can now be proved.

Theorem B.4.7 (LaSalle’s Invariance Principle). Let x̄ be an equilibrium of the locally
Lipschitz-continuous system ẋ(t ) = f (x(t )) and suppose that V (x) is a Lyapunov function on
some neighborhood Ω of x̄.
This set Ω contains a (nonempty) closed and bounded invariant neighborhood K of x̄,
and for every x 0 ∈ K the solution x(t ) as t → ∞ converges to the subset

G := {x ∗ ∈ K | V̇ (x(t ; x ∗ )) = 0 ∀t ≥ 0}.

This subset is invariant and nonempty. In particular, if G = {x̄} then x̄ is an asymptotically


stable equilibrium.

Proof. The construction of K is very similar to that of Ω1 in the proof of Thm. B.3.2. Since Ω
is a neighborhood of x̄ there is, by definition, a small enough ball B(x̄, ε) completely contained
in Ω. Let α = min_{‖x−x̄‖=ε} V(x). This α is larger than zero. Then K := {x ∈ B(x̄, ε) | V(x) ≤ α/2}
does the job. Indeed it is bounded, it is closed and since V̇(x) ≤ 0 it is also invariant. And,
finally, it is a neighborhood of x̄.
The set G is nonempty (it contains x̄). Let x ∗ be an element of G . Then by invariance of
K for every t > 0 the element y := x(t ; x ∗ ) is in K . Also since V̇ (x(s; y)) = V̇ (x(t + s, x ∗ )) = 0
this orbit is in G . Hence G is invariant.
Next let x 0 ∈ K . Since K is invariant, the entire orbit x(t ; x 0 ) is in K for all time.
Now suppose, to obtain a contradiction, that x(t) does not converge to G. Then, as x(t) is
bounded, there is a sequence of times tn with lim_{n→∞} tn = ∞ for which x(tn; x0) converges
to some x∞ ∉ G. Notice that x∞ is in K because K is closed. We claim that V(x(t; x∞)) is
constant as a function of time. To see this we need the inequality

V (x(t n ; x 0 )) ≥ V (x(t n + t ; x 0 )) ≥ V (x ∞ ) ∀t ≥ 0. (B.12)

(The first inequality holds because V̇ (x) ≤ 0 and the second inequality follows from V̇ (x) ≤
0 combined with the fact that t n + t < t n+k for some large enough k, so that V (x(t n + t )) ≥
V (x(t n+k )) ≥ V (x ∞ ).) Taking the limit n → ∞ turns (B.12) into

V (x ∞ ) ≥ V (x(t ; x ∞ )) ≥ V (x ∞ ).

Hence V (x(t ; x ∞ )) is constant for all time, that is V̇ (x(t ; x ∞ )) = 0. But then x ∞ ∈ G (by defini-
tion of G ) which is a contradiction. Therefore the assumption that x(t ) does not converge to
G is wrong. ■

The proof also provides an explicit description of the set K but if we only want to estab-
lish asymptotic stability then we can normally avoid this description. Its existence is enough.

Example B.4.8. Consider the system

ẋ 1 (t ) = x 23 (t )
(B.13)
ẋ 2 (t ) = −x 13 (t ) − x 2 (t ).

Clearly the origin (0, 0) is an equilibrium. For this equilibrium, we suggest the Lyapunov func-
tion

V (x) = x 14 + x 24 .

This function is indeed a Lyapunov function (on Ω = Rn ) because it is continuously differentiable, it is positive definite and

V̇ (x) = 4x 13 (x 23 ) + 4x 23 (−x 13 − x 2 ) = −4x 24 ≤ 0.

This implies that the origin is stable, but not necessarily asymptotically stable. To prove
asymptotic stability we use Thm. B.4.7. This theorem says that a bounded, closed invariant
neighborhood K of (0, 0) exists, but we need not worry about its precise form. The set of in-
terest is G . It contains those initial states x ∗ ∈ K whose solution x(t ; x ∗ ) satisfies the system
equations (B.13) and at the same time is such that V̇ (x(t )) = 0 for all time, but for our example
the latter means

x 2 (t ) = 0 ∀t .

Substituting this into the system equations (B.13) gives

ẋ 1 (t ) = 0
0 = −x 13 (t ) − 0, ∀t .

Clearly then x 1 (t ) = 0 for all time as well and so

G = {(0, 0)}.

LaSalle’s Invariance Principle proves that for every x 0 ∈ K the x(t ) converges to (0, 0) and that
the system, hence, is asymptotically stable. ä
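
A small simulation sketch (not part of the text; it assumes SciPy and an arbitrary initial state) illustrates the conclusion: along solutions of (B.13) the value of V never increases and the state drifts towards the origin. The drift is slow, because near the origin ẋ1 is driven only by the small term x2³(t).

# Sketch: V = x1^4 + x2^4 decreases along solutions of (B.13).
import numpy as np
from scipy.integrate import solve_ivp

f = lambda t, x: [x[1]**3, -x[0]**3 - x[1]]
V = lambda x: x[0]**4 + x[1]**4

sol = solve_ivp(f, (0.0, 100.0), [1.0, -0.5],
                t_eval=np.linspace(0.0, 100.0, 401), rtol=1e-9, atol=1e-12)
print(V(sol.y)[0], V(sol.y)[-1])   # V has decreased along the trajectory
print(sol.y[:, -1])                # the state has moved towards (0, 0)
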

Example B.4.9 (Example B.4.1 continued). Consider the pendulum system from Exam-
ple B.4.1,

ẋ1(t) = x2(t)
ẋ2(t) = −(g/ℓ) sin(x1(t)) − c x2(t).        (B.14)
We found that the mechanical energy V (x) is a Lyapunov function on some small enough
neighborhood Ω of the hanging equilibrium x̄ = (0, 0) and we also found that

V̇(x) = −cMℓ² x2².

The equality V̇ (x(t )) = 0 hence holds for all time iff x 2 (t ) = 0 for all time and the LaSalle set G
therefore is

G = {x ∗ ∈ K | x(t ; x ∗ ) satisfies (B.14) and x 2 (t ; x ∗ ) = 0∀t }.

We comment on K later. Since x2(t) ≡ 0 the system equations (B.14) reduce to ẋ1(t) = 0, 0 = −(g/ℓ) sin(x1(t)). This implies that x1(t) is constant and sin(x1) = 0:

G = {x ∗ ∈ K | x 1∗ = kπ, k ∈ Z, x 2∗ = 0}.

This set contains at most two physically different solutions: the hanging downwards solution
x* = (0, 0) and the standing upwards solution x* = (π, 0). To rule out the upwards solution it
suffices to take the neighborhood Ω of x̄ = (0, 0) so small that (π, 0) ∉ Ω. For example

Ω = {x ∈ R2 | −π < x 1 < π}.

LaSalle’s Invariance Principle now guarantees the existence of an invariant, closed, bounded
£ ¤
neighborhood K of x̄ in Ω. Cleary this K does not contain π0 either, so then
£0¤
G ={ 0 }

and thus we have asymptotic stability of the hanging position.

Although not strictly needed, it may be interesting to know that we can take K equal
to the set of states close enough to (x 1 , x 2 ) = (0, 0) and whose energy is strictly less than the
energy of the upwards position, for example,
K = {x ∈ R2 | −π < x1 < π, V(x) ≤ 0.9 V((π, 0))}.

Since the energy does not increase over time it is immediate that this set is invariant. It is also
closed and bounded, and it is a neighborhood of (0, 0). ä
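
For readers who want to see the convergence, here is a minimal simulation sketch (not part of the text; the numerical values of g, ℓ and c are made up purely for illustration). Starting inside Ω with energy below that of the upwards position, the damped pendulum (B.14) settles in the hanging position:

# Sketch of Example B.4.9 with made-up constants g, ell and c.
import numpy as np
from scipy.integrate import solve_ivp

g, ell, c = 9.81, 1.0, 0.5
f = lambda t, x: [x[1], -(g / ell) * np.sin(x[0]) - c * x[1]]

# initial angle 2.5 rad (< pi) and zero velocity: below the energy of the upwards position
sol = solve_ivp(f, (0.0, 40.0), [2.5, 0.0], rtol=1e-9, atol=1e-12)
print(sol.y[:, -1])   # approximately (0, 0): the pendulum comes to rest hanging down
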

Example B.4.10 (Example B.4.2 continued). Consider again the system

ẋ 1 (t ) = 0,
ẋ 2 (t ) = −x 2 (t )

with equilibrium x̄ = (0, 0) and Lyapunov function V (x) = x 12 + x 22 . In Example B.4.2 we found
that V (x) is indeed a Lyapunov function and that

V̇ (x) = −2x 22 ≤ 0.

Substitution of x 2 (t ) = 0 into the system equations reduces the system equations to

ẋ 1 (t ) = 0.

Hence x 1 (t ) is constant (besides x 2 (t ) = 0) so

G = {(x 1 , x 2 ) ∈ K | x 1 = c, x 2 = 0}.

This is the x 1 -axis. Now LaSalle’s Invariance Principle says that all states converge to the x 1 -
axis. For K we can take for instance K = {x ∈ R2 |V (x) ≤ 10000}. ä

B.5 Cost-to-go Lyapunov functions


It is in general hard to come up with a Lyapunov function for a given ẋ(t ) = f (x(t )) and equi-
librium point x̄. An elegant attempt, with interesting interpretations, goes as follows. Suppose
we have to pay an amount

L(x) ≥ 0

per unit time, when we are at state x. As time progresses we move as dictated by the differen-
tial equation and so the cost L(x(t )) typically changes with time. The cost-to-go V (x 0 ) is now
defined as the total payment over the infinite future if we start at x 0 , that is, it is the integral
of L(x(t )) over positive time,
V(x0) := ∫_0^∞ L(x(τ)) dτ,    x(0) = x0.        (B.15)

If L(x(t)) decreases quickly enough as we approach the equilibrium x̄ then the cost-to-go may
be well defined (finite) and possibly continuously differentiable in x0 as well.
These are technical considerations and they might be hard to verify. The interesting property
of the cost-to-go V (x(t )) is that it decays as t increases. In fact

V̇ (x) = −L(x) (B.16)

whenever V (x) is convergent. To see this split the cost-to-go into an integral over the first h
units of time and an integral over the time beyond h,
V(x(t)) = ∫_t^{t+h} L(x(τ)) dτ + ∫_{t+h}^∞ L(x(τ)) dτ = ∫_t^{t+h} L(x(τ)) dτ + V(x(t + h)).

Therefore
V̇(x(t)) = lim_{h→0} [V(x(t+h)) − V(x(t))] / h = lim_{h→0} [ −∫_t^{t+h} L(x(τ)) dτ ] / h = −L(x(t))        (B.17)
if L(x) is continuous. An interpretation of the equality is that the current cost-to-go minus
the cost-to-go from tomorrow onwards, is what we pay today. The function L(x) is called the
running cost. In physical applications L(x) is often the dissipated power and then V (x) is the
total dissipated energy.
As mentioned earlier, the only obstacle is that the integral (B.15) has to be well defined
and continuously differentiable in x 0 . If the system dynamics is linear of the form

ẋ(t ) = Ax(t )

then these obstacles can be overcome and we end up with a very useful result. It is a classic
theorem in Systems Theory. In this result we take the running cost to be quadratic in x,

L(x) = x T Qx

with Q ∈ Rn×n a symmetric positive definite matrix (see Appendix A.1).

Theorem B.5.1 (Lyapunov equation). Let A ∈ Rn×n and consider ẋ(t ) = Ax(t ) with equilib-
rium x̄ = 0 ∈ Rn . Suppose Q ∈ Rn×n is positive definite and let
V(x0) := ∫_0^∞ xᵀ(t) Q x(t) dt        (B.18)

in which x(0) = x 0 . The following four statements are equivalent.

1. x̄ = 0 is a globally asymptotically stable equilibrium of ẋ(t ) = Ax(t ).

2. x̄ = 0 is an asymptotically stable equilibrium of ẋ(t ) = Ax(t ).

3. V (x) defined in (B.18) exists for every x ∈ Rn and it is a strong Lyapunov function for
this system. In fact V (x) is then quadratic, V (x) = x T P x, with P ∈ Rn×n the well defined
positive definite matrix
P := ∫_0^∞ e^{Aᵀt} Q e^{At} dt.        (B.19)

4. The linear matrix equation

A T P + P A = −Q (B.20)

has a unique symmetric solution P , and this P is positive definite.

In that case the P of (B.19) and (B.20) are the same.

Proof. We prove the cycle of implications 1. =⇒ 2. =⇒ 3. =⇒ 4. =⇒ 1.

1. =⇒ 2. Trivial.

2. =⇒ 3. The solution of ẋ(t ) = Ax(t ) is x(t ) = e At x 0 . By asymptotic stability the entire tran-
sition matrix converges to zero limt →∞ e At = 0 ∈ Rn×n . Now
V(x0) = ∫_0^∞ (e^{At} x0)ᵀ Q (e^{At} x0) dt = ∫_0^∞ x0ᵀ (e^{Aᵀt} Q e^{At}) x0 dt = x0ᵀ P x0

for P := ∫_0^∞ e^{Aᵀt} Q e^{At} dt. This P is well defined because e^{At} converges to zero expo-
nentially fast. This P is positive definite because it is the integral of a positive definite
matrix.
So V (x 0 ) is well defined and quadratic and, hence, continuously differentiable. It has
a unique minimum at x 0 = 0 and, as we showed earlier, V̇ (x) = −L(x) := −x T Qx ≤ 0.
Hence V (x) is a Lyapunov function, in fact strong Lyapunov function because −x T Qx =
0 iff x = 0.

3. =⇒ 4. Take P defined in (B.19). On the one hand we have V̇ (x) = −L(x) = −x T Qx and on
the other hand we have

V̇(x) = d/dt [xᵀPx] = ẋᵀPx + xᵀPẋ = xᵀ(AᵀP + PA)x.

These two must be the same for all x, so

x T (A T P + P A +Q)x = 0 ∀x ∈ Rn .

This necessarily means that AᵀP + PA + Q = 0, because if AᵀP + PA + Q ≠ 0 then, as it is symmetric, a real eigenvector x would exist with nonzero eigenvalue λ ≠ 0, rendering xᵀ(AᵀP + PA + Q)x equal to λxᵀx = λ‖x‖² ≠ 0, which is a contradiction.
The above shows that for every symmetric Q there is a symmetric P for which A T P +
P A = −Q. This means that the linear mapping from symmetric P to symmetric A T P +
P A is surjective. Then by the rank-nullity theorem of linear algebra, the mapping is
injective as well. So the solution P is unique.
(Likewise one can prove that the solution P of (B.20) is unique among the set of all
matrices P ∈ Rn×n , not necessarily symmetric matrices.)

4. =⇒ 1. Then V(x) := xᵀPx satisfies V̇(x) = d/dt [xᵀPx] = ẋᵀPx + xᵀPẋ = xᵀ(AᵀP + PA)x = −xᵀQx,
so it is a strong Lyapunov function with V̇(x) < 0 for all x ≠ 0. It is radially unbounded,
hence the equilibrium is globally asymptotically stable (Thm. B.3.5). ■
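
As a quick numerical cross-check of the equivalence of (B.19) and (B.20), the sketch below (not part of the text; it assumes SciPy is available and truncates the integral at a finite horizon, which is harmless since the integrand decays exponentially) computes P both ways for a small stable matrix.

# Sketch: for a stable A, the integral (B.19) and the Lyapunov equation (B.20) agree.
import numpy as np
from scipy.linalg import expm, solve_continuous_lyapunov

A = np.array([[0.0, 1.0], [-2.0, -3.0]])   # eigenvalues -1 and -2: asymptotically stable
Q = np.eye(2)

# P from (B.20): solve_continuous_lyapunov(M, R) solves M X + X M^T = R (real case),
# so passing M = A^T and R = -Q yields A^T P + P A = -Q.
P_lyap = solve_continuous_lyapunov(A.T, -Q)

# P from (B.19), truncated at T = 40 (the integrand decays exponentially)
ts = np.linspace(0.0, 40.0, 4001)
dt = ts[1] - ts[0]
vals = [expm(A.T * t) @ Q @ expm(A * t) for t in ts]
P_int = (sum(vals) - 0.5 * (vals[0] + vals[-1])) * dt   # trapezoidal rule

print(np.round(P_lyap, 3))   # [[1.25, 0.25], [0.25, 0.25]]
print(np.round(P_int, 3))    # the same, up to the quadrature error
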

Example B.5.2. The system

ẋ(t ) = −2x(t )

is globally asymptotically stable because for q = 1 > 0 the Lyapunov equation

−2p − 2p = −q = −1

has a unique solution p = 1/4 and it is positive. Note that Thm. B.5.1 says that we may take
any q > 0 that we like. Indeed, whatever positive q > 0 we take, we have that the solution
p = q/4 of the Lyapunov equation is unique and is positive. ä
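
As a quick consistency check with the cost-to-go interpretation (B.18): for ẋ(t) = −2x(t) the solution is x(t) = e^{−2t} x0, so with running cost L(x) = q x²,

V(x0) = ∫_0^∞ q x²(t) dt = q x0² ∫_0^∞ e^{−4t} dt = (q/4) x0²,

which is exactly p x0² with p = q/4, the solution of the Lyapunov equation found above.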

Notice that (B.20) is a linear equation in the entries of P and is therefore easily solved (it
requires a finite number of operations). Combined with the fact that positive definiteness of
a matrix is a finite test (Appendix A.1), this allows us to conclude that stability of ẋ = Ax can be tested
in a finite number of steps.

Example B.5.3. Consider


ẋ(t) = [ −1    2 ] x(t).
       [  0   −1 ]

We choose Q the 2 × 2 identity matrix,

Q = [ 1   0 ]
    [ 0   1 ].
The candidate solution P of the Lyapunov equation we write as P = [ α  β ; β  γ ]. The Lyapunov
equation (B.20) then reads

[ −1   0 ] [ α   β ]   [ α   β ] [ −1    2 ]   [ −1    0 ]
[  2  −1 ] [ β   γ ] + [ β   γ ] [  0   −1 ] = [  0   −1 ].

Working out the matrix products on the left-hand side leaves us with

[ −2α        2α − 2β ]   [ −1    0 ]
[ 2α − 2β    4β − 2γ ] = [  0   −1 ].        (B.21)

By symmetry the upper-right and lower-left entries are identical, so the above equation is
effectively three equations in the three unknowns α, β, γ:

−2α = −1,
2α − 2β = 0,
4β − 2γ = −1.

This gives α = β = 1/2 and γ = 3/2, that is


P = (1/2) [ 1   1 ]
          [ 1   3 ].

This matrix is positive definite because P11 = 1/2 > 0 and det(P) = 1/2 > 0 (see Appendix A.1). So
the differential equation with equilibrium x̄ = (0, 0) is (globally) asymptotically stable. ä
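
Such hand computations are easy to verify numerically. The following sketch (plain NumPy; it is not part of the text) checks that the P found above indeed satisfies (B.20) and is positive definite.

# Sketch: verification of Example B.5.3.
import numpy as np

A = np.array([[-1.0, 2.0], [0.0, -1.0]])
Q = np.eye(2)
P = 0.5 * np.array([[1.0, 1.0], [1.0, 3.0]])

print(A.T @ P + P @ A + Q)     # the zero matrix, so P solves (B.20)
print(np.linalg.eigvalsh(P))   # both eigenvalues are positive, so P > 0
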

B.6 Lyapunov’s first method


Through a process called linearization we can approximate a nonlinear system with a lin-
ear system and often the stability properties of the nonlinear and linear system are alike. In
particular, as we will see, they often share the same Lyapunov function.
We assume that the vector field function f : Rn → Rn is differentiable at the given equilib-
rium x̄. This is to say that f (x) is of the form

f (x̄ + δx ) = Aδx + o(δx ) (B.22)

with A ∈ Rn×n some matrix and o : Rn → Rn some "little-o" function, meaning that it has the property that

lim_{δx→0} ‖o(δx)‖ / ‖δx‖ = 0.        (B.23)

We think of little-o functions as functions that are “extremely small” around the origin.
To analyze the behavior of the state x(t ) relative to an equilibrium x̄ it makes sense to
define δx (t ) as the difference between state and equilibrium,

δx (t ) := x(t ) − x̄.

This difference obeys the differential equation

δ̇x (t ) = ẋ(t ) − x̄˙ = ẋ(t ) = f (x(t )) = f (x̄ + δx (t )) = Aδx (t ) + o(δx (t )).

The linearized system of ẋ(t ) = f (x(t )) at equilibrium x̄ is now simply defined as the system
in which the little-o term o(δx (t )) is deleted:

δ̇x (t ) = Aδx (t ).

It constitutes a linear approximation of the original nonlinear system but we expect it to be an
accurate approximation as long as δx(t) is "small". The matrix A equals the Jacobian matrix
at x̄ defined as

A = ∂f(x̄)/∂xᵀ := [ ∂f1(x̄)/∂x1   ···   ∂f1(x̄)/∂xn ]
                 [      ⋮                   ⋮      ]        (B.24)
                 [ ∂fn(x̄)/∂x1   ···   ∂fn(x̄)/∂xn ]
(See Appendix A.2 for an explanation of this notation.)

FIGURE B.11: Nonlinear f(x) (left) and its linear approximation Aδx (right)

Example B.6.1. Consider the nonlinear differential equation

ẋ(t ) = − sin(2x(t )). (B.25)

The function f (x) = − sin(2x) has many zeros, among which is

x̄ = 0.

The idea of linearization is that around x̄ the function f (x) is almost indistinguishable from
its tangent with slope

A = df(x̄)/dx = −2 cos(0) = −2
(see Fig. B.11) and so the solutions of (B.25) will probably be quite similar to x(t ) = x̄ +δx (t ) =
δx (t ) with δx (t ) the solution of the linear system

δ̇x (t ) = −2δx (t ) (B.26)

provided that δx (t ) is small. The above linear system (B.26) is known as the linearized system
of (B.25) at equilibrium x̄ = 0. ä
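
A small numerical comparison (an illustrative sketch, not part of the text; it assumes SciPy and an arbitrarily chosen small initial condition) shows how close the nonlinear solution of (B.25) and the linearized solution of (B.26) stay when x0 is small.

# Sketch: nonlinear (B.25) versus linearized (B.26) for a small initial condition.
import numpy as np
from scipy.integrate import solve_ivp

x0 = 0.1
sol = solve_ivp(lambda t, x: -np.sin(2.0 * x), (0.0, 3.0), [x0],
                t_eval=np.linspace(0.0, 3.0, 31), rtol=1e-10, atol=1e-12)
delta = x0 * np.exp(-2.0 * sol.t)        # explicit solution of (B.26)
print(np.max(np.abs(sol.y[0] - delta)))  # small compared with x0: the two stay close
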

Lyapunov’s first method, presented next, roughly speaking says that the nonlinear system
and the linearized system have the same asymptotic stability properties. The only exception
to this rule is if the eigenvalue of largest real part is on the imaginary axis. The proof of this
result relies on the fact that every asymptotically stable linear system has a Lyapunov function
(its cost-to-go) which then turns out to be a Lyapunov function for the nonlinear system as
well:

Theorem B.6.2 (Lyapunov’s first method). Let f : Rn → Rn be a continuously differentiable


function and let x̄ be an equilibrium of ẋ(t ) = f (x(t )).

1. If all eigenvalues of the Jacobian (B.24) have strictly negative real part, then x̄ is an
asymptotically stable equilibrium of the nonlinear system.

2. If there is an eigenvalue of the Jacobian (B.24) with strictly positive real part, then x̄ is
an unstable equilibrium of the nonlinear system.

Proof. (First realize that continuous differentiability of f (·) implies Lipschitz continuity and
so Lyapunov theory might be applicable.) Write f (x) as in (B.22). Without loss of generality
we assume that x̄ = 0, and we define A as in (B.24).

1. By assumptions on the eigenvalues the linearized system δ̇x(t) = Aδx(t) is asymptotically
stable. So Thm. B.5.1 guarantees the existence of a positive definite matrix P that
satisfies

A T P + P A = −I

and that V (x) = x T P x is a strong Lyapunov function for the linear system δ̇x (t ) = Aδx (t ).
We prove that this V (x) is also a strong Lyapunov function for ẋ(t ) = f (x(t )) on some
neighborhood Ω of x̄ = 0. Clearly this V(x) is continuously differentiable and positive
definite. We have that

V̇(x) = ẋᵀPx + xᵀPẋ
= f(x)ᵀPx + xᵀP f(x)
= [Ax + o(x)]ᵀPx + xᵀP[Ax + o(x)]
= xᵀ(AᵀP + PA)x + o(x)ᵀPx + xᵀP o(x)
= −xᵀx + 2 o(x)ᵀPx
= −‖x‖² + 2 o(x)ᵀPx.

The term 2 o(x)T P x we recognize as the standard inner product of 2 o(x) and P x, so by
the Cauchy-Schwarz inequality we can bound it from above with

V̇(x) ≤ −‖x‖² + 2‖o(x)‖ ‖Px‖.

Based on this we now choose Ω as


Ω := {x ∈ Rn | 2‖o(x)‖ ‖Px‖ ≤ (1/2)‖x‖²}.
From (B.23) it follows that this Ω is a neighborhood of x̄ = 0. Then, finally, we find that
V̇(x) ≤ −(1/2)‖x‖²    ∀x ∈ Ω.
Therefore on Ω \ {0}, we have V̇ (x(t )) < 0, making V (x) a strong Lyapunov function for
the nonlinear system.

2. See (Khalil, 1996, Thm. 3.7).

These two cases of Theorem B.6.2 cover all possible eigenvalue configurations, except
when some eigenvalues have zero real part and none have positive real part, see Exercise B.5.

Example B.6.3. Consider the system

ẋ 1 (t ) = x 1 (t ) + x 1 (t )x 22 (t )
ẋ 2 (t ) = −x 2 (t ) + x 12 (t )x 2 (t ).
The system has equilibrium x̄ := (0, 0) and the Jacobian at that equilibrium equals

A = ∂f(x̄)/∂xᵀ = [ 1 + x2²     2x1x2   ]                   [ 1    0 ]
                [ 2x1x2      −1 + x1² ] at x = (0, 0)  =  [ 0   −1 ].

Clearly it has eigenvalues ±1. In particular it has a positive eigenvalue. Lyapunov’s first
method hence proves that the system at this equilibrium is unstable. ä
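
The same Jacobian can be recomputed symbolically; the sketch below (assuming SymPy is installed; it is not part of the text) reproduces the matrix and its eigenvalues.

# Sketch: Jacobian of Example B.6.3 at the origin, computed symbolically.
import sympy as sp

x1, x2 = sp.symbols('x1 x2')
f = sp.Matrix([x1 + x1 * x2**2, -x2 + x1**2 * x2])
J = f.jacobian([x1, x2]).subs({x1: 0, x2: 0})
print(J)               # Matrix([[1, 0], [0, -1]])
print(J.eigenvals())   # eigenvalues 1 and -1; the positive one implies instability
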

B.7 Exercises
B.1 (a) Prove that if V (x) is a Lyapunov function for the system (B.2) with equilibrium
point x̄, then V̇ (x̄) = 0.
(b) Prove that if a system of the form (B.2) has more than one equilibrium point, then
none of these equilibrium points is globally asymptotically stable.
(c) Consider the linear system

ẋ(t ) = Ax(t ),

with A an n × n matrix. Prove that this system either has exactly one equilibrium,
or infinitely many equilibria.

B.2 Investigate the stability of the origin for the following two systems (that is, check all six
stability types mentioned in Definition B.2.2). Use a suitable Lyapunov function.

(a)

ẋ 1 (t ) = −x 13 (t ) − x 22 (t )
ẋ 2 (t ) = x 1 (t )x 2 (t ) − x 23 (t ).

[Hint: take the simplest V (x) you know.]


(b)

ẋ 1 (t ) = x 2 (t )
ẋ 2 (t ) = −x 13 (t ).

[Hint: try V(x1, x2) = x1^α + c·x2^β and then determine suitable α, β, c.]

B.3 This exercise is based on an exercise in Khalil (1996) who, in turn, took it from Hahn
(1967) and it appears that Hahn was inspired by an example from a paper by Barbashin
and Krasovskı̆ (1952). Consider the system

ẋ1(t) = [ −x1(t) + x2(t)(1 + x1²(t))² ] / (1 + x1²(t))²
ẋ2(t) = [ −x1(t) − x2(t) ] / (1 + x1²(t))²

and define V (x) as

V(x) = x1²/(1 + x1²) + x2².

(a) Show that (0, 0) is the only equilibrium point.


(b) Show that V (x) is a strong Lyapunov function on the entire state space Ω := R2 .
(c) Show that the level sets {x ∈ R2 | V (x) = c} of the Lyapunov function are unbounded
if c ≥ 1. Hence the Lyapunov function is not radially unbounded. Figure B.12 de-
picts several level sets.
(d) Figure B.12 also depicts the curve x2 = 1/x1 and the region to the right of it where
x1x2 > 1. The phase portrait suggests that x1(t)x2(t) increases if x2(t) = 1/x1(t).
Indeed. Show that
d[x1(t)x2(t)]/dt = 1 / ( x1²(t)(1 + x1²(t))² ) > 0

whenever x 2 (t ) = 1/x 1 (t ) > 0.


(e) Use the above to prove that the origin is not globally asymptotically stable.

FIGURE B.12: A phase portrait of the system of Exercise B.3. The red dashed lines are level sets of V(x). The boundary of the shaded region x1x2 > 1 is where x2 = 1/x1

B.4 Adaptive Control. The following problem from adaptive control illustrates an extension
of the theory of Lyapunov functions to functions that are, strictly speaking, no longer

Lyapunov functions. This problem concerns the stabilization of a system of which the
parameters are not (completely) known. Consider the following scalar system.

ẋ(t ) = ax(t ) + u(t ), (B.27)

where a is a constant and u(t ) is an input that we should choose to steer the state to
zero. If we know a then u(t ) = −kx(t ), with k > a, would solve the problem. However,
we assume that a is unknown but that we can measure x(t ). Contemplate the following
dynamic state feedback input

u(t) = −k(t)x(t)
k̇(t) = x²(t),    k(0) = 0.        (B.28)

The idea is that k(t ) increases until it has stabilized the system, so until x(t ) is equal to
zero.

(a) Write (B.27)–(B.28) as one system with state (x, k) and determine all equilibrium
points.
(b) Consider the function V (x, k) := x 2 + (k − a)2 . Prove that V̇ (x, k) = 0 for all x, k. For
which equilibrium point is this a Lyapunov function?
(c) Prove, using the above, that k(t ) is bounded.
(d) Prove, using (B.28), that k(t ) converges as t → ∞.
(e) Prove that x(t ; x 0 ) converges, and prove specifically that limt →∞ x(t ; x 0 ) = 0.
(f) Determine limt →∞ k(t ).

B.5 Linearization. Consider the scalar equation

ẋ(t ) = ax 3 (t )

with a ∈ R.

(a) Prove that the linearization of this system about its equilibrium point is indepen-
dent of a.
(b) Sketch the graph of ax³ as a function of x and use it to answer the following:
• For which a is the equilibrium asymptotically stable?
• For which a is the equilibrium stable?
• For which a is the equilibrium unstable?

B.6 Consider the system

ẋ 1 (t ) = −x 15 (t ) − x 2 (t ),
ẋ 2 (t ) = x 1 (t ) − 2x 23 (t ).

(a) Determine all points of equilibrium


(b) Determine a Lyapunov function for the equilibrium x̄ = (0, 0) and discuss the type
of stability that follows from this Lyapunov function (stable? asymptotically stable?
Globally asymptotically stable?)

B.7 Suppose that

ẋ 1 (t ) = x 2 (t ) − x 1 (t )
ẋ 2 (t ) = −x 13 (t )

and use the candidate Lyapunov function V(x1, x2) = x1⁴ + 2x2². The equilibrium is x̄ = (0, 0).

(a) Is this a Lyapunov function?


(b) Is this a strong Lyapunov function?
(c) Investigate the nature of stability of this equilibrium with LaSalle’s invariance prin-
ciple.

B.8 Consider the Van der Pol equation

ÿ(t) − ε(1 − y²(t)) ẏ(t) + y(t) = 0.

This equation occurs in the study of vacuum tubes and then ε is positive. However, in
this exercise we take ε < 0.

(a) Rewrite this equation in the standard form (B.2) with x 1 (t ) := y(t ) and x 2 (t ) := ẏ(t ).
(b) Use linearization to show that the origin (x1, x2) = (0, 0) is an asymptotically stable
equilibrium (recall that ε < 0).
(c) Determine a neighborhood Ω of the origin for which V (x 1 , x 2 ) = x 12 + x 22 is a Lya-
punov function for x̄ = (0, 0).
(d) Let V (x 1 , x 2 ) and Ω be as in the previous part. What stability properties can be
concluded from LaSalle’s invariance principle?

B.9 The well-known Lotka-Volterra model describes the interaction between a population
of predators (with size x1) and prey (with size x2), and is given by the equations

ẋ 1 (t ) = −ax 1 (t ) + bx 1 (t )x 2 (t ), x 1 (0) ≥ 0
(B.29)
ẋ 2 (t ) = cx 2 (t ) − d x 1 (t )x 2 (t ) x 2 (0) ≥ 0.

The first term on the right-hand side of the first equation shows that the predators will
become extinct without food, while the second term shows that the growth of their pop-
ulation is proportional to the size of the population of prey. Likewise, the term on the
right-hand side of the second equation shows that without predators, the population of
prey will increase, and that its decrease is proportional to the size of the population of
predators. For convenience we choose a = b = c = d = 1.

(a) Show that, apart from (0, 0), the system has a second equilibrium point.
(b) Investigate the stability of both equilibrium points using linearization.
(c) Investigate the stability of the nonzero equilibrium point using the function

V (x 1 , x 2 ) = x 1 + x 2 − ln(x 1 x 2 ) − 2.

Here, ln is the natural logarithm.

B.10 The equations of motion of the pendulum with damping in state form are

ẋ1(t) = x2(t)
ẋ2(t) = −(g/ℓ) sin(x1(t)) − (k/ℓ) x2(t),        (B.30)

where x1 is the angular displacement, and x2 is the angular velocity, g is the gravitational constant, ℓ is the length of the pendulum and k is the damping constant. All
constants are positive.

(a) Prove, using Theorem B.6.2, that the origin is an asymptotically stable equilibrium
point.
(b) In Example B.4.9 we verified asymptotic stability using LaSalle’s invariance prin-
ciple. Here we want to construct a strong Lyapunov function to show asymptotic
stability using Theorem B.3.2: determine a symmetric matrix P > 0 such that the
function

V (x) := x T P x + g [1 − cos(x 1 )]

is a strong Lyapunov function for (B.30) on some neighborhood Ω of the origin.


This exercise assumes knowledge of Appendix A.1.

B.11 Consider the system

ẋ 1 (t ) = −2x 1 (t ) [x 1 (t ) − 1] [2x 1 (t ) − 1]
(B.31)
ẋ 2 (t ) = −2x 2 (t ).

(a) Calculate all equilibrium points of the system (B.31).


(b) Prove that there are two asymptotically stable equilibrium points.
(c) Investigate the stability of the other equilibrium point(s).

B.12 Determine all equilibrium points of

ẋ 1 (t ) = x 1 (t )(1 − x 22 (t ))
ẋ 2 (t ) = x 2 (t )(1 − x 12 (t )).

For each of the equilibrium points determine the linearization and the nature of stabil-
ity of the linearization.

B.13 Have a look at Fig. B.13. The equations of motion of a rigid body spinning around its
center of mass are

I 1 ω̇1 (t ) = [I 2 − I 3 ]ω2 (t )ω3 (t )


I 2 ω̇2 (t ) = [I 3 − I 1 ]ω1 (t )ω3 (t ) (B.32)
I 3 ω̇3 (t ) = [I 1 − I 2 ]ω1 (t )ω2 (t ),

where ω(t ) := (ω1 (t ), ω2 (t ), ω3 (t )) is the vector of angular velocities around the three
principle axes of the rigid body and I 1 , I 2 , I 3 > 0 are the principal moments of inertia.
The kinetic energy (due to rotation) is

(1/2)(I1ω1² + I2ω2² + I3ω3²).
(a) Prove that the origin ω = (0, 0, 0) is a stable equilibrium.

FIGURE B.13: A spinning rigid body

(b) Prove that the origin ω = (0, 0, 0) is not asymptotically stable.

Now assume that the moments of inertias are ordered as

0 < I1 < I2 < I3.

(This implies a certain lack of symmetry of the rigid body, e.g. it is not a unit cube,
see Fig. B.13.)

(c) The origin (0, 0, 0) is just one equilibrium. Determine all equilibria and explain
what this implies about the stability properties.
(d) Determine the linearization around each of the equilibria.
(e) Use linearization to prove that steady spinning around the second principal axis
(0, ω̄2, 0) is unstable if ω̄2 ≠ 0.
(f) This is a tricky question. Prove that both the kinetic energy

(1/2)(I1ω1² + I2ω2² + I3ω3²)

and the squared total angular momentum

I1²ω1² + I2²ω2² + I3²ω3²

are constant over time and use this to prove that steady spinning around the first
and third principal axes is stable, but not asymptotically stable.

B.14 Consider the system ẋ(t ) = f (x(t )) with equilibrium point x̄. Suppose that there ex-
ists a Lyapunov function such that V̇ (x) = 0 for all x ∈ Ω. Prove that this system is not
asymptotically stable.

B.15 Let x(t ; x 0 ) be a solution of the differential equation ẋ(t ) = f (x(t )). Prove that the orbit
O (x 0 ) = {x(t ; x 0 ) | t ≥ 0} is an invariant set for ẋ(t ) = f (x(t )).

B.16 A trajectory x(t ; x 0 ) is closed if x(t +s; x 0 ) = x(t , x 0 ) for some t and some s > 0. Let x(t ; x 0 )
be a closed trajectory of (B.2) and let V (x) be a Lyapunov function for this system. Prove
that V̇ (x(t ; x 0 )) = 0.

B.17 In this exercise, we look at variations on the system (B.10) from Example B.4.6. We
investigate the system
ẋ1(t) = x2(t) + x1(t)[γ − x1²(t) − x2²(t)]
ẋ2(t) = −x1(t) + x2(t)[γ − x1²(t) − x2²(t)]        (B.33)

with γ ∈ R. Prove that the origin is an asymptotically stable equilibrium point if γ ≤ 0
and that it is an unstable equilibrium point if γ > 0.

B.18 (Assumes Appendix A.1.) Determine all α, β ∈ R for which the matrix

P = [ α   0   0 ]
    [ 0   1   β ]
    [ 0   β   4 ]

(a) is positive definite


(b) is positive semi-definite but not positive definite.

B.19 (Assumes Appendix A.1.) Let the matrices A and Q be given by:
A = [  0    1 ]          Q = [ 4    6 ]
    [ −2   −3 ],             [ 6   10 ],

(a) Determine a matrix P such that:

A T P + P A = −Q.

(b) Show that P and Q are positive definite and conclude that A is Hurwitz (i.e. that
all its eigenvalues have strictly negative real part).

B.20 (Assumes Appendix A.1.) Consider the matrix

A = [ −2    1    0 ]
    [  0   −1    0 ]
    [  0    1   −2 ].

(a) Use MAPLE or MATLAB to determine the solution of the Lyapunov equation

A T P + P A = −I .

(b) Check that the solution P is positive definite.

B.21 Consider the differential equation

ẋ 1 (t ) = x 1 (t ) + 2x 2 (t )
ẋ 2 (t ) = −αx 1 (t ) + (1 − α)x 2 (t )

Determine all α’s for which this differential equation is asymptotically stable.

B.22 Fig B.14 shows a phase portrait of ẋ(t ) = Ax(t ) with


A = [ −1          −(1/3)π/2 ]
    [ 3π/2         −1       ].

FIGURE B.14: Phase portrait of the system of Exercise B.22

(a) Determine a diagonal positive definite matrix P of the form P = [ p  0 ; 0  1 ] for which
also PA + AᵀP is diagonal.
(b) Show that x T P x is a strong Lyapunov function for this system (with equilibrium
x̄ = 0).
(c) Sketch in Fig. B.14 a couple of level sets {x | x T P x = constant} and explain from this
figure why indeed V̇ (x(t )) < 0 for all nonzero x(t ).

B.23 Notice that the results that we derived in this chapter are valid only for time-invariant 1
systems ẋ(t ) = f (x(t )). For time-varying systems ẋ(t ) = f (x(t ), t ) the story is quite dif-
ferent, even if the system is linear of the form

ẋ(t ) = A(t )x(t ). (B.34)

For linear systems it might sound reasonable to conjecture that it is asymptotically sta-
ble if for every t all eigenvalues of A(t ) have negative real part. In this exercise we will
see that this is wrong. Consider the system (B.34) where
A(t) = { A_even   if ⌊t⌋ is even
       { A_odd    if ⌊t⌋ is odd        (B.35)

with

A_even = [ −1          −(1/3)π/2 ]          A_odd = [ −1            −3π/2 ]
         [ 3π/2         −1       ],                 [ (1/3)π/2       −1   ].

Here ⌊t⌋ denotes the floor of t (the largest integer less than or equal to t). The system
hence switches dynamics at every t ∈ Z.

(a) Prove that the eigenvalues of A(t) at each moment in time are −1 ± iπ/2. So in
particular the eigenvalues do not depend on t and their real parts are less than zero
for all time.
(b) Verify that
x(t) = e^{−t} [ cos(πt/2)          −(1/3) sin(πt/2) ] x(0)
              [ 3 sin(πt/2)         cos(πt/2)       ]

x(1 + t) = e^{−t} [ cos(πt/2)            −3 sin(πt/2) ] x(1)
                  [ (1/3) sin(πt/2)       cos(πt/2)   ]

for all t ∈ [0, 1].


(c) Show that
x(2k + 2) = [ −(3/e)²        0        ] x(2k)
            [  0            −1/(3e)²  ]

for all k ∈ Z and use it to conclude that the time-varying system (B.34) is not
asymptotically stable.

1 Time-invariant systems are sometimes called “autonomous” systems.

FIGURE B.15: Stable or not? Globally attractive or not? See Exercise B.24

B.24 This exercise is based on an example from a paper by Ryan and Sontag (2006). Consider
the system ẋ(t ) = f (x(t )) with

f(x) = [ −x1(1 − 1/‖x‖) − 2x2(1 − x1/‖x‖) ]          if ‖x‖ ≥ 1,
       [ −x2(1 − 1/‖x‖) + 2x1(1 − x1/‖x‖) ]

f(x) = [ 2(x1 − 1)x2        ]                         if ‖x‖ < 1.
       [ −(x1 − 1)² + x2²   ]

Inside the unit disc f (x) is defined differently than outside the unit disc. Nevertheless,
f (x) is locally Lipschitz, also on the unit circle. Inside the unit circle, the orbits are arcs
(part of circles) that converge to x = (1, 0), see Fig. B.15. Outside, ‖x‖ ≥ 1, the system is
easier to comprehend in polar coordinates (x, y) = (r cos(θ), r sin(θ)) with r = √(x² + y²).
This gives

ṙ(t) = 1 − r(t)
θ̇(t) = 4 sin²(θ(t)/2) = 2(1 − cos(θ(t))).        (B.36)

(a) Derive (B.36).


(b) Show that x̄ := (1, 0) is the unique point of equilibrium.
(c) Argue that for kx(0)k > 1 its phase portrait is as in Fig. B.15.
(d) Is x̄ globally attractive?
(e) Is x̄ a stable equilibrium point?

B.25 Let A ∈ Rn×n and suppose that A + A T is negative definite. Is the origin a stable equilib-
rium of ẋ(t ) = Ax(t )?

B.26 (Assumes Appendix A.1.) Let C ∈ Rk×n .

(a) Show that C T C is positive semi-definite.


(b) Show that C T C is not positive definite if k < n.

B.27 Time-varying systems. There is a way to transform time-varying systems

ẋ(t ) = f (x(t ), t ), x(t 0 ) = x 0 , t ≥ t0 (B.37)

into time-invariant systems


£ x0 ¤
x̃˙ (t ) = f˜(x̃(t )), x̃(t 0 ) = t0 , t ≥ t0 . (B.38)
£x¤
(a) Let x̃ = ˜
t . Given f (·) determine a f (·) such that (B.37) is equivalent to (B.38).

(b) Use the above and Thm. B.1.3 to prove the following:
Let t 0 ∈ R and x 0 ∈ Rn . If f (x, t ) is Lipschitz continuous at (x 0 , t 0 ) then, for some
δ > 0, the differential equation ẋ(t ) = f (x(t ), t ), x(t 0 ) = x 0 has a unique solution
x(t ; x 0 ) for all t ∈ [t 0 , t 0 + δ).

Appendix C

Bibliography

E.A. Barbashin and N.N. Krasovskı̆. Ob ustoichivosti dvizheniya v tselom. Dokl. Akad. Nauk.
USSR, 86(3):453–456, 1952. (Russian). English title: "On the stability of motion in the large".

W. Hahn. Stability of Motion, volume 138 of Die Grundlehren der mathematischen Wis-
senschaften. Springer-Verlag, New York, 1967.

H.K. Khalil. Nonlinear Systems. Macmillan Publisher Company, New York, 2 edition, 1996.

D. Liberzon. Calculus of Variations and Optimal Control Theory: A Concise Introduction.
Princeton University Press, Princeton, 2012.

E.P. Ryan and E.D. Sontag. Well-defined steady-state response does not imply CICS. System
and Control Letters, 55:707–710, 2006.

E.D. Sontag. Mathematical Control Theory: Deterministic Finite Dimensional Systems (2nd
Ed.). Springer-Verlag, Berlin, Heidelberg, 1998. ISBN 0-387-98489-5.

Index

B (x, r ), 135 asymptotically stable, 133


C 1, 4 attractive-, 133
H (x, p, u, λ), 47 stable, 133
J (u(·), T ), 50 unstable, 133
J [0,T ] (x 0 , u), 61 Euler, 7
V (z, τ), 66 Euler-Lagrange, 7
∂ f (x)
∂x , 119
(RDE), 92 free end-point, 16

action integral, 19 global Lipschitz condition, 131


Algebraic Riccati Equation (ARE), 97 globally
angular momentum, 20 asymptotically stable, 133
ARE, 97 attractive, 133
asymptotically stable, 133 globally asymptotically stable, 133
attractive Goldschmidt, 12
equilibrium, 133
Hamilton’s principle, 19
Beltrami identity, 8 Hamilton-Jacobi-Bellman, 68
brachistochrone, 1, 8 Hamiltonian, 37
equations, 38
catenoid, 12 equations (Lagrangian mechanics), 20
characteristic equation, 121 Lagrangian mechanics, 19
Cholesky factorization, 118 modified, 47
closed trajectory, 155 Hamiltonian matrix, 87
controllability, 123 Hessian, 119
cost, 4 HJB, 68
-to-go, 144
-to-go (discrete), 65 infinite horizon LQ, 96
final-, 15 input, 123
initial-, 15 invariant set, 140
running, 4
Jacobian, 119, 148
terminal-, 15
costate, 38, 76 Lagrange
cycloid, 9 lemma, 6
detectability, 124 multiplier, 36, 126
detectable Lagrangian, 36, 37
for Q ≥ 0, 100 Legendre
Dido condition, 22
isoperimetric, 25 linearization, 133, 147
linearized system, 148
equilibrium, 133 Lipschitz

constant, 130 stationary, 7
continuity, 130
locally, 130 tracking problem, 113
Lotka-Volterra model, 153
unstable, 133
LQ problem, 85
Lyapunov value function, 66, 78
first method, 147 Van der Pol equation, 153
function, 135
second method, 133 Zermelo, 50
strong function, 135
Lyapunov function, 135

matrix exponential, 122


Minimum Principle, 42
momentum, 19

negative semi-definite, 134


nonnegative definite, 117

observability, 124
optimal control, 35
optimal cost-to-go, 66
orbit, 140
output, 124

parallelogram law, 111


partial derivative, 118
particular solution, 121
positive
definite, 117, 134
definite matrix, 117
semi-definite, 117, 134
semi-definite matrix, 117
principle of optimality, 61

radial unboundedness, 138


region of attraction, 138
Riccati differential equation, 92
Riccati Differential Equation, 92
running cost, 4, 35, 145

Schur complement, 118


second order condition, 21
simplest problem in the calculus of varia-
tions, 4
stabilizability, 124
stabilizable, 100
stabilizing, 100
stable, 133
asymptotically, 133
globally asymptotically, 133

