Linear Systems and Optimal Control
Condensed Notes
J. A. McMahan Jr.
Chapter 1
Definitions
ordinary differential equation: An equation of the form ẋ = f(x, t), where x = x(t). Here x can be a vector and f can be vector-valued, so this may be a system of ODEs.
domain: An open, nonempty, connected subset of R^{n+1}. This is associated with a problem where x ∈ R^n.
linear system: If for any two solutions φ_1 and φ_2 of an ODE, cφ_1 + φ_2 is also a solution, where c is a constant, the ODE is called linear. These can be written as ẋ = A(t)x, where A(t) is a matrix-valued function.
autonomous system: An ODE of the form ẋ = f(x) (i.e., no explicit time dependence) is called autonomous or time-invariant.
similar matrices: If A and B are square matrices that are related in the form A = P^{-1}BP for some invertible matrix P, then they are called similar.
minimal polynomial: The minimal polynomial of A is the monic polynomial m(x) of least degree such that m(A) = 0.
companion form: The companion form of a matrix A is a matrix Ac which is similar to A and of
the form
A_c = [   0      1      0    ···      0
          0      0      1    ···      0
          ⋮      ⋮      ⋮     ⋱       ⋮
          0      0      0    ···      1
        -a_0   -a_1   -a_2   ···  -a_{n-1} ]

Note that the a_i elements are also the coefficients of the characteristic polynomial of A, s^n + a_{n-1}s^{n-1} + ··· + a_1 s + a_0. The
companion form does not always exist.
fundamental matrix: Let {φ_1, ..., φ_n} be a set of n linearly independent solutions of an nth order linear homogeneous ODE. A fundamental matrix of the system is
Φ = [ φ_1  φ_2  ···  φ_n ]
state transition matrix: For any fundamental matrix Φ of a linear homogeneous ODE, the state transition matrix is defined by
Φ(t, t_0) = Φ(t) Φ^{-1}(t_0).
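For the time-invariant case the state transition matrix can be computed directly from the matrix exponential. A minimal sketch in Python (the matrices and times are made-up examples, not from these notes):

# For xdot = A x (LTI), a fundamental matrix is Phi(t) = expm(A t), and the state
# transition matrix is Phi(t, t0) = Phi(t) Phi(t0)^{-1} = expm(A (t - t0)).
import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [-2.0, -3.0]])      # assumed example system matrix
t, t0 = 1.5, 0.5

Phi_t = expm(A * t)                           # fundamental matrix evaluated at t
Phi_t0 = expm(A * t0)                         # fundamental matrix evaluated at t0
stm = Phi_t @ np.linalg.inv(Phi_t0)           # Phi(t, t0) per the definition above
print(np.allclose(stm, expm(A * (t - t0))))   # should print True for the LTI case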
mode of a system: For a linear time-invariant system ẋ = Ax, the modes are the functions e^{λ_i t} (or t^j e^{λ_i t} for repeated eigenvalues) associated with the eigenvalues λ_i of A; the zero-input response is a linear combination of the modes.
zero-input response: The response of an ODE system with the input set to 0.
zero-state response: The response of an ODE system with the initial condition set to 0.
transfer function: The transfer function of the linear time-invariant system
ẋ = Ax + Bu
y = Cx + Du
is the function
H(s) = C(sI - A)^{-1}B + D.
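A quick way to evaluate this numerically (a sketch with made-up matrices, not part of the notes):

# Evaluate H(s) = C (sI - A)^{-1} B + D at a point s for an assumed example system.
import numpy as np

A = np.array([[0.0, 1.0], [-2.0, -3.0]])
B = np.array([[0.0], [1.0]])
C = np.array([[1.0, 0.0]])
D = np.array([[0.0]])

def H(s):
    n = A.shape[0]
    return C @ np.linalg.inv(s * np.eye(n) - A) @ B + D

print(H(2.0j))   # frequency response at omega = 2 rad/s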
1.2 Optimization
performance index: A scalar function, L(x, u) ∈ R, which gives a cost or performance value for a given x (the state or auxiliary vector) and u (the control or decision vector).
critical point: A critical point (also called a stationary point) of a function L is a point where the first-order increment dL vanishes for all increments of the variables on which L depends.
curvature matrix: The second derivative of a scalar function L(u), L_{uu}. This is also called the Hessian matrix.
positive definite / semidefinite: A scalar function L is positive definite if L(u) > 0 for all u ≠ 0 and positive semidefinite if L(u) ≥ 0 for all u.
negative definite / semidefinite: A scalar function L is negative definite if L(u) < 0 for all u ≠ 0 and negative semidefinite if L(u) ≤ 0 for all u.
indefinite: A scalar function is indefinite if it takes on both positive and negative values.
level curve: The level curve of a scalar function corresponding to the value c is the set of points
{u : L(u) = c}.
Lagrange multiplier: A vector λ arising from the requirement that the derivatives of the scalar performance index and the constraint equation function form a singular linear system.
Hamiltonian: A function of the state, control, and Lagrange multipliers which gives a concise representation of the necessary conditions for an extremum with equality constraints. Given a scalar performance index, L(x, u), and a constraint equation function, f(x, u), the Hamiltonian is defined as
H(x, u, λ) = L(x, u) + λ^T f(x, u).
This is for the general static optimization case. The discrete-time Hamiltonian for the general
optimization problem is given later.
admissible region: For constrained input problems, this is the region in which u(t) is allowed to lie.
sign function: The sign function, denoted sgn(x), is defined as
sgn(x) =  -1,            x < 0,
          undetermined,  x = 0,
          1,             x > 0.
Chapter 2
Assumptions
Conclusions
A necessary condition for an extremum at u* is that u* be a critical point of L, i.e.
L_u = 0.
If L_{uu} is positive definite at such a critical point, the point is a local minimum; if negative definite, a local maximum; if indefinite, a saddle point.
If L_{uu} is only semidefinite, then higher-order terms of the Taylor expansion for an increment in L must be examined to determine what type of critical point we have.
Facts
A symmetric matrix A is positive definite if all of its eigenvalues are positive. It is positive semidefinite if all of its eigenvalues are nonnegative.
A symmetric matrix A is negative definite if all of its eigenvalues are negative. It is negative semidefinite if all of its eigenvalues are nonpositive.
A symmetric matrix A is indefinite if it has both positive and negative eigenvalues.
The gradient L_u is always perpendicular to the level curves of L and points in the direction of increasing L(u).
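A small numerical check of the eigenvalue tests above (the classify helper is a made-up name, not from the notes):

# Classify a symmetric matrix by the signs of its eigenvalues.
import numpy as np

def classify(A, tol=1e-12):
    w = np.linalg.eigvalsh(A)              # real eigenvalues of a symmetric matrix
    if np.all(w > tol):
        return "positive definite"
    if np.all(w >= -tol):
        return "positive semidefinite"
    if np.all(w < -tol):
        return "negative definite"
    if np.all(w <= tol):
        return "negative semidefinite"
    return "indefinite"

print(classify(np.array([[2.0, 0.0], [0.0, 1.0]])))    # positive definite
print(classify(np.array([[1.0, 0.0], [0.0, -1.0]])))   # indefinite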
Proof Notes
Write out the Taylor series for an increment in L,
dL = L_u^T du + ½ du^T L_{uu} du + O(||du||^3).
The above follows fairly easily from analyzing this.
We want to select a control u to optimize a scalar performance index L(x, u) while simultaneously
satisfying a constraint equation f (x, u) = 0. We will generally look for the minimum value, although
finding the maximum value should be similar.
Assumptions
x, u ∈ R^n.
L(x, u) ∈ R is a scalar performance index.
f(x, u) = 0, where f ∈ R^n, is a constraint equation.
Conclusions
L_u - f_u^T (f_x^T)^{-1} L_x = 0 (the stationarity condition after eliminating the Lagrange multiplier λ = -(f_x^T)^{-1} L_x).
For a minimum, the constrained curvature matrix (the second derivative of L with respect to u while holding f(x, u) = 0) must be positive definite. If this is negative definite, it is a sufficient condition for a maximum. If it is indefinite, it is a sufficient condition for a saddle point.
Changes in constraints
Remaining at an optimal solution, we want dx, du, and dL as functions of df. The expressions for these follow from differentiating the necessary conditions.
Facts
Set du = 0. Then L_f = -λ. That is, the negative of the Lagrange multiplier is the partial derivative of L with respect to the constraint while holding u constant.
The Lagrange multiplier indicates the rate of change of the optimal value of the performance index with respect to changes in the constraint.
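As a sanity check, the necessary conditions can be solved numerically for a toy problem (the problem data below are made up, not from the notes): minimize L = ½(x² + u²) subject to f = x + u - 1 = 0, with H = L + λ f.

# Solve H_x = 0, H_u = 0, f = 0 simultaneously with a root finder.
import numpy as np
from scipy.optimize import fsolve

def conditions(z):
    x, u, lam = z
    return [x + lam,         # H_x = L_x + lam * f_x = x + lam
            u + lam,         # H_u = L_u + lam * f_u = u + lam
            x + u - 1.0]     # constraint f(x, u) = 0

x, u, lam = fsolve(conditions, [0.0, 0.0, 0.0])
print(x, u, lam)             # expect x = u = 0.5, lam = -0.5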
Proof Notes
2.2 Optimal Control Basics
Assumptions
Performance index in the form J(x) = ∫_a^b L(x(t), ẋ(t), t) dt
Conclusions
For a fixed end-point, a necessary condition for an extremum x* of the cost J is that x* satisfy the boundary value problem
dL_ẋ/dt - L_x = 0,   x(a) = x_a,   x(b) = x_b
For a free end-point, a necessary condition for an extremum x* of the cost J is that x* satisfy the boundary value problem
dL_ẋ/dt - L_x = 0,   x(a) = x_a,   L_ẋ(x*, ẋ*, t)|_{t=b} = 0
When applying the equation above, remember that taking the derivative with respect to ẋ works the same as if we relabel with v = ẋ and take the derivative with respect to v. Make sure the time derivative is applied to x and ẋ implicitly (it is a total derivative along the trajectory).
This can be proved with some techniques from the calculus of variations. Assume x* is the minimum and compute the Taylor expansion of J at x* evaluated at some perturbation of x*, like x* + δx. J(x* + δx) - J(x*) ≥ 0 for a minimum. Discard any nonlinear parts of the Taylor series. Use integration by parts and simplify to get the above equations.
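A small worked sketch (the integrand is an assumed example, not from the notes): for L(x, ẋ) = ½(ẋ² + x²) on [0, 1] with x(0) = 0, x(1) = 1, the Euler-Lagrange equation reduces to ẍ = x, which can be solved as a boundary value problem.

import numpy as np
from scipy.integrate import solve_bvp

def rhs(t, y):                      # y[0] = x, y[1] = xdot
    return np.vstack([y[1], y[0]])  # xdot = y[1], xddot = x

def bc(ya, yb):
    return np.array([ya[0] - 0.0, yb[0] - 1.0])   # x(0) = 0, x(1) = 1

t = np.linspace(0.0, 1.0, 11)
sol = solve_bvp(rhs, bc, t, np.zeros((2, t.size)))
exact = np.sinh(t) / np.sinh(1.0)   # the analytic extremal for this integrand
print(np.max(np.abs(sol.sol(t)[0] - exact)))   # should be small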
2.2.2 General Continuous-time and Discrete-time Optimization
Discrete (k = i, ..., N - 1):
System Model: x_{k+1} = f(x_k, u_k, k)
Performance Index: J_i = φ(N, x_N) + Σ_{k=i}^{N-1} L(x_k, u_k, k)
Hamiltonian: H^k = L(x_k, u_k, k) + λ_{k+1}^T f(x_k, u_k, k)
Costate Equation: λ_k = H^k_{x_k} = (f^k_{x_k})^T λ_{k+1} + L^k_{x_k}
Stationarity Equation: 0 = H^k_{u_k} = (f^k_{u_k})^T λ_{k+1} + L^k_{u_k}

Continuous:
System Model: ẋ = f(x, u, t)
Performance Index: J(t_0) = φ(x(T), T) + ∫_{t_0}^T L(x, u, t) dt
Hamiltonian: H = L(x, u, t) + λ^T f(x, u, t)
Costate Equation: -λ̇ = H_x = (f_x)^T λ + L_x
Stationarity Equation: 0 = H_u = (f_u)^T λ + L_u
Boundary Conditions
Discrete: (L^i_{x_i} + (f^i_{x_i})^T λ_{i+1})^T dx_i = 0,   (φ_{x_N} - λ_N)^T dx_N = 0
Continuous: (φ_x + ψ_x^T ν - λ)^T|_{t=T} dx(T) + (φ_t + ψ_t^T ν + H)|_{t=T} dT = 0, where ψ(x(T), T) = 0 is the final-state constraint (if one is imposed) and ν is its multiplier.
For the discrete case, construct an augmented performance index by appending the state
constraint with the Lagrange multiplier to the given performance index. Simplify the
notation by inserting the Hamiltonian function defined above. Compute the increment of
this augmented performance index and set it to zero, as the theory from static optimization
says this is a necessary condition for an extremum. This allows you to reason that the various expressions making up the increment must be zero, which gives the above expressions.
For the continuous case, you need to use some ideas from the calculus of variations which I'll fill in later.
The boundary condition involving the initial condition for the continuous case may be off. Usually the initial condition is given, so don't worry too much about it.
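A tiny sketch of using the discrete necessary conditions (the scalar plant and cost are made up, not from the notes): for x_{k+1} = x_k + u_k, L^k = ½u_k², and terminal cost φ = ½x_N², the costate is constant (λ_k = λ_{k+1}) and u_k = -λ, so the conditions can be solved by shooting on the costate.

import numpy as np
from scipy.optimize import fsolve

N, x0 = 3, 1.0

def residual(z):
    lam = z[0]                  # constant costate, since lam_k = lam_{k+1} here
    x = x0
    for _ in range(N):
        u = -lam                # stationarity: 0 = H_u = u_k + lam_{k+1}
        x = x + u               # state update x_{k+1} = x_k + u_k
    return [x - lam]            # boundary condition: lam_N = dphi/dx_N = x_N

lam_star = fsolve(residual, [0.0])[0]
print(lam_star, x0 / (N + 1))   # both should equal x_N = 0.25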
2.3 Linear Quadratic Regulator and Related Problems
The linear quadratic regulator (LQR) defines an important class of control problems which involve
linear dynamics and a quadratic cost. There are analytical results available for these problems which make them fairly well understood.
Performance Index
Discrete: J_i = ½ x_N^T P x_N + ½ Σ_{k=i}^{N-1} (x_k^T Q_k x_k + u_k^T R_k u_k)
Continuous: J(t_0) = ½ x^T(T) P x(T) + ½ ∫_{t_0}^T (x^T Q(t) x + u^T R(t) u) dt
Riccati Equations
Discrete: S_k = A_k^T [S_{k+1} - S_{k+1} B_k (B_k^T S_{k+1} B_k + R_k)^{-1} B_k^T S_{k+1}] A_k + Q_k,   S_N = P
Continuous: -Ṡ = A^T S + SA - SBR^{-1}B^T S + Q,   S(T) = P
Optimal Feedback
Discrete: Kalman gain K_k = (B_k^T S_{k+1} B_k + R_k)^{-1} B_k^T S_{k+1} A_k,   control u_k = -K_k x_k
Continuous: Kalman gain K(t) = R^{-1}(t) B^T(t) S(t),   control u(t) = -K(t) x(t)
Restrictions
A, B, Q, R, P, S linear
R > 0, symmetric
Q ≥ 0, P ≥ 0, S ≥ 0, all symmetric
Proof and Usage Notes
For open loop control of an LQR system, just plug the equations into the general optimization formulas and solve.
For closed loop control, we will have P given. Solve the Riccati equation backwards in time using S(T) = P as the end condition. Plug the result into the formula for the Kalman gain and then into the formula for the optimal feedback control.
A technique that often works for deriving the Riccati equations and related expressions is to assume a form for λ which matches what the boundary conditions give you. Plugging this into the necessary conditions often allows you to eliminate λ and find an expression in terms of given parameters in the problem.
The cost-to-go is the cost starting at time ti or t0 to reach the desired state when using
the optimal feedback control. So this is the optimal cost for a given initial time and initial
state. This can be derived by plugging the optimal control expression into the cost and
manipulating algebraically.
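A minimal sketch of the closed-loop recipe above (the plant, weights, and horizon are assumed examples): integrate the Riccati equation backward from S(T) = P by running it forward in the reversed time τ = T - t, then form K(t) = R^{-1}B^T S(t).

import numpy as np
from scipy.integrate import solve_ivp

A = np.array([[0.0, 1.0], [0.0, 0.0]])       # double integrator (assumed)
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]]); P = np.zeros((2, 2))
T = 5.0

def riccati_rhs(tau, s_flat):
    S = s_flat.reshape(2, 2)
    # dS/dtau = -dS/dt = A^T S + S A - S B R^{-1} B^T S + Q
    return (A.T @ S + S @ A - S @ B @ np.linalg.inv(R) @ B.T @ S + Q).ravel()

sol = solve_ivp(riccati_rhs, [0.0, T], P.ravel(), dense_output=True)
S0 = sol.sol(T).reshape(2, 2)                # S at t = 0, i.e. tau = T
K0 = np.linalg.inv(R) @ B.T @ S0             # Kalman gain at the initial time
print(S0); print(K0)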
Suboptimal Feedback

Algebraic Riccati Equations
Discrete: S∞ = A^T [S∞ - S∞ B (B^T S∞ B + R)^{-1} B^T S∞] A + Q
Continuous: 0 = A^T S∞ + S∞ A - S∞ B R^{-1} B^T S∞ + Q

Kalman Gain
Discrete: K∞ = (B^T S∞ B + R)^{-1} B^T S∞ A
Continuous: K∞ = R^{-1} B^T S∞

Feedback: u = -K∞ x (u_k = -K∞ x_k in the discrete case)
Restrictions
A, B, Q, R time invariant
S∞ must exist (check later theorems for some existence criteria)
Facts and Usage Notes
The feedback works in the same way as the optimal feedback, only you solve the algebraic Riccati equations rather than the difference / differential Riccati equations.
The solutions to the algebraic Riccati equations do not need to be unique, symmetric, positive definite, real, or finite, so this is not always an option. Some theorems are given later on to identify situations where they do exist.
The solution to the algebraic Riccati equation is an equilibrium point of the Riccati difference / differential equation. The particular solution we want to use in the suboptimal feedback is the limit of the Riccati equation solution as time goes to infinity, so the suboptimal feedback can be thought of as the optimal feedback on an infinite interval. As such, it works better when it is used over longer time intervals.
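The limiting gains can be obtained directly from SciPy's algebraic Riccati solvers; a sketch with made-up system data (not from the notes):

import numpy as np
from scipy.linalg import solve_continuous_are, solve_discrete_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])

S_c = solve_continuous_are(A, B, Q, R)
K_c = np.linalg.inv(R) @ B.T @ S_c           # continuous: K_inf = R^{-1} B^T S_inf

Ad = np.array([[1.0, 0.1], [0.0, 1.0]])      # a roughly discretized version (assumed)
Bd = np.array([[0.005], [0.1]])
S_d = solve_discrete_are(Ad, Bd, Q, R)
K_d = np.linalg.inv(Bd.T @ S_d @ Bd + R) @ Bd.T @ S_d @ Ad
print(K_c); print(K_d)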
These theorems give some criteria for the existence of useful solutions to the algebraic Riccati equations. They are identical for the discrete and continuous cases.
Assumptions
(A, B) is stabilizable.
Conclusions
There exists S∞ = lim_{t→-∞} S(t) (lim_{k→-∞} S_k in the discrete case), a bounded solution to the Riccati equation, for every S(T) (S_N).
S∞ is a positive definite solution to the algebraic Riccati equation.
Proof Notes
Stabilizability implies the existence of a feedback which keeps the state bounded and drives the state to zero as time goes to infinity, and implies a finite cost as t → ∞ (k → ∞). The optimal cost is bounded above by the cost of this feedback system. This can be used to show that the solution to the Riccati equation associated with the optimal cost system is bounded for all time. Symmetry at all points in time implies symmetry of the solution, and similar conclusions are drawn for the positive definiteness of the solution.
Assumptions
(A, √Q) is observable, where Q = √Q^T √Q.
Equivalent Statements
(A, B) is stabilizable.
1. There exists a unique positive definite limiting solution S∞ to the Riccati equation, which is also the unique positive definite solution to the algebraic Riccati equation.
2. The closed-loop system
(Continuous) ẋ = (A - BK∞)x,   (Discrete) x_{k+1} = (A - BK∞)x_k,
is asymptotically stable.
To be clear, items 1 and 2 above are equivalent. These are each equivalent to the stabilizability of (A, B).
One direction of the proof is easy, as the asymptotic stability of the closed-loop plant implies (A, B) is stabilizable. The other direction involves using the optimal cost equation to show that the observability of (A, √Q) requires the asymptotic stability of the state.
This result is the same as the previous theorem with the observability condition added in.
2.3.4 Riccati Equation Analytic Solution
Discrete:
Hamiltonian Matrix: H = [ A^{-1}              A^{-1} B R^{-1} B^T
                          Q A^{-1}    A^T + Q A^{-1} B R^{-1} B^T ]
System: [ x_k ; λ_k ] = H [ x_{k+1} ; λ_{k+1} ]
Eigenvalue Property: μ ∈ σ(H) ⇒ μ^{-1} ∈ σ(H)
Eigenvalue Matrix: D = [ M    0
                         0    M^{-1} ]

Continuous:
Hamiltonian Matrix: H = [  A    -B R^{-1} B^T
                          -Q    -A^T ]
System: d/dt [ x ; λ ] = H [ x ; λ ]
Eigenvalue Property: μ ∈ σ(H) ⇒ -μ ∈ σ(H)
Eigenvalue Matrix: D = [ M    0
                         0   -M ]

Take W = [ W_{11}  W_{12}
           W_{21}  W_{22} ] such that W^{-1} H W = D.

Solution: S = W_{21} W_{11}^{-1}
The continuous case constructs the ARE solution from the stable eigenvectors of H, while the discrete case uses the unstable eigenvectors of H (or the stable eigenvectors of H^{-1}).
The discrete matrix defined above seems to be backwards in time, so see if you can define
it forwards in time. This may make it match with the continuous case better.
Proving the eigenvalue property above is relatively straightforward by using the properties
of H as a symplectic matrix.
The rest is shown by writing out the above matrices, multiplying, manipulating algebraically, and using the asymptotic convergence of the Riccati equation solution to the algebraic Riccati equation solution in certain cases to eliminate some terms.
For the discrete case, there is a somewhat more complicated result for A singular, which I'll add later. It involves a generalized eigenvalue problem.
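A sketch of the continuous-case construction (made-up system data, compared against SciPy's ARE solver):

import numpy as np
from scipy.linalg import solve_continuous_are

A = np.array([[0.0, 1.0], [0.0, 0.0]])
B = np.array([[0.0], [1.0]])
Q = np.eye(2); R = np.array([[1.0]])
n = A.shape[0]

H = np.block([[A, -B @ np.linalg.inv(R) @ B.T],
              [-Q, -A.T]])                   # continuous Hamiltonian matrix
w, V = np.linalg.eig(H)
stable = V[:, w.real < 0]                    # eigenvectors for Re(mu) < 0
W11, W21 = stable[:n, :], stable[n:, :]
S = np.real(W21 @ np.linalg.inv(W11))        # S = W21 W11^{-1}
print(np.allclose(S, solve_continuous_are(A, B, Q, R)))   # should print True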
The LQR tracking problem is a variation on the LQR problem which provides a way to track a
reference trajectory for a linear system model.
Performance Index
Discrete: J_i = ½ (Cx_N - r_N)^T P (Cx_N - r_N) + ½ Σ_{k=i}^{N-1} [(Cx_k - r_k)^T Q (Cx_k - r_k) + u_k^T R u_k]
Continuous: J(t_0) = ½ (Cx(T) - r(T))^T P (Cx(T) - r(T)) + ½ ∫_{t_0}^T [(Cx - r)^T Q (Cx - r) + u^T R u] dt
Discrete:
S_k = A^T S_{k+1} (A - BK_k) + C^T Q C,   S_N = C^T P C
K_k = (B^T S_{k+1} B + R)^{-1} B^T S_{k+1} A
v_k = (A - BK_k)^T v_{k+1} + C^T Q r_k,   v_N = C^T P r_N
K_k^v = (B^T S_{k+1} B + R)^{-1} B^T
u_k = -K_k x_k + K_k^v v_{k+1}

Continuous:
-Ṡ = A^T S + SA - SBR^{-1}B^T S + C^T Q C,   S(T) = C^T P C
K(t) = R^{-1} B^T S(t)
-v̇ = (A - BK)^T v + C^T Q r,   v(T) = C^T P r(T)
K^v(t) = R^{-1} B^T
u(t) = -K(t) x(t) + K^v(t) v(t)
Restrictions
R > 0, symmetric
P ≥ 0, Q ≥ 0, symmetric
Proof and Usage Notes
As in other situations involving the Riccati equations, the form of the final solution can be derived by assuming a form for λ which matches the boundary conditions and eliminating λ by plugging this into the necessary conditions and manipulating. So you would set λ(t) = S(t)x(t) - v(t) and plug in.
The optimal cost-to-go can be computed in a similar way as in the LQR problem, with
some extra terms added.
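A sketch of the discrete tracker recursions listed above (the plant, weights, and constant reference are made-up numbers): sweep S_k and v_k backward, then simulate forward with u_k = -K_k x_k + K_k^v v_{k+1}.

import numpy as np

A = np.array([[1.0, 0.1], [0.0, 1.0]]); B = np.array([[0.0], [0.1]])
C = np.array([[1.0, 0.0]])
Q = np.array([[10.0]]); R = np.array([[1.0]]); P = np.array([[10.0]])
N = 50
r = np.ones(N + 1)                       # reference to be tracked (assumed constant)

S = [None] * (N + 1); v = [None] * (N + 1); K = [None] * N; Kv = [None] * N
S[N] = C.T @ P @ C
v[N] = C.T @ P * r[N]
for k in range(N - 1, -1, -1):
    M = np.linalg.inv(B.T @ S[k + 1] @ B + R)
    K[k] = M @ B.T @ S[k + 1] @ A
    Kv[k] = M @ B.T
    S[k] = A.T @ S[k + 1] @ (A - B @ K[k]) + C.T @ Q @ C
    v[k] = (A - B @ K[k]).T @ v[k + 1] + C.T @ Q * r[k]

x = np.zeros((2, 1))
for k in range(N):
    u = -K[k] @ x + Kv[k] @ v[k + 1]     # tracking feedback plus feedforward term
    x = A @ x + B @ u
print((C @ x).item(), r[N])              # final output should be near the reference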
When the final time T for the target state x(T) is not given in a problem, then dT is not necessarily 0. This must be taken into account when setting up the necessary boundary conditions for a minimum.
Problems with x(T ) and T both free and independent of each other. This will require the
corresponding terms in the boundary conditions to each be set to 0.
Problems with x(T) and T both free, but dependent. For example, x(t) may need to intersect with some given function p(t). This allows you to write dx(T) = (dp(T)/dT) dT and combine terms in the boundary conditions.
Problems where T is free and x(T ) is required to be on a given fixed set, which does not change
with time. This is a generalization of the second example.
The conditions on the final costate in the last three examples are called transversality conditions.
The LQR problem can be extended to incorporate the time involved in reaching the final state by adding a constant term, ρ, to the cost functional. This gives
J = ½ x^T(T) P x(T) + ∫_{t_0}^T (ρ + ½ x^T Q x + ½ u^T R u) dt.
Note that no final state is specified in this example. By working through the necessary conditions for an optimal solution, we get the same conditions as in the LQR case, except that we also have H(t) = 0 for all t. Using λ = Sx and u = -R^{-1}B^T Sx and plugging these into the expression for H(t_0), we find
x^T(t_0) Ṡ(t_0) x(t_0) = 2ρ.
The optimal control and Kalman gain will be the same as in the standard LQR case, so to solve this problem, we can integrate the Riccati equation backwards from some final time, T_f, until we find a t such that x^T(t_0) Ṡ(t) x(t_0) = 2ρ holds. The minimum time interval will then be T_f - t.
2.5 Constrained Input Control
Constrained input control problems provide some admissible region in which the control, u(t), must lie. For these problems, we must replace the stationarity condition, H_u = 0, with the more general condition called Pontryagin's Minimum Principle.
Assumptions
The system is given using the same notation as in the general optimization section.
An admissible region for u(t) is given.
Conclusions
Necessary conditions for a minimal-cost control are the same as in the general case, with the stationarity condition replaced by
H(x*, u*, λ*, t) ≤ H(x*, u* + δu, λ*, t)
for all admissible variations δu.
This is sort of common sense, so there should be no need to talk about the proof.
Bang-bang control arises in linear minimum-time problems with constrained input magnitude. The
resulting optimal control for these problems needs only take on two values, which are the extreme
values of the control.
Optimal Control of Linear Minimum-Time Problem
System Model: ẋ = Ax + Bu
Performance Index: J(t_0) = ∫_{t_0}^T 1 dt
Constraints: |u(t)| ≤ 1
Optimal Control: u*(t) = -sgn(B^T λ*(t))
The proof of this follows from applying Pontryagin's minimum principle to get (λ*)^T B u* ≤ (λ*)^T B u. The left side is minimized by making (λ*)^T B u* as small as possible, so each component of u* takes its extreme value with sign opposite that of the corresponding component of (λ*)^T B.
Note that the above may not define a unique control if B^T λ is 0 over a finite interval.
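A small illustration of the switching structure (the plant and the initial costate below are hypothetical, not a solved two-point boundary value problem): for the minimum-time problem the costate satisfies λ̇ = -A^T λ, so λ(t) = e^{-A^T t} λ(0), and the control is u = -sgn(B^T λ).

import numpy as np
from scipy.linalg import expm

A = np.array([[0.0, 1.0], [0.0, 0.0]])     # double integrator (assumed)
B = np.array([[0.0], [1.0]])
lam0 = np.array([1.0, 1.4])                # hypothetical initial costate

for t in np.linspace(0.0, 3.0, 7):
    lam = expm(-A.T * t) @ lam0            # costate solution lam(t)
    s = (B.T @ lam).item()                 # scalar switching function B^T lam
    u = -np.sign(s)                        # bang-bang control
    print(f"t = {t:.1f}   B^T lam = {s: .2f}   u = {u: .0f}")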
Bang-off-bang control arises in linear minimum-fuel problems with constrained input magnitude.
Optimal Control of Linear Minimum-Fuel Problems
System Model: ẋ = Ax + Bu
Performance Index: J(t_0) = ∫_{t_0}^T c^T |u(t)| dt   (|u| taken componentwise)
Constraints: |u(t)| ≤ 1
Optimal Control: u_i(t) = -dez(b_i^T λ(t) / c_i)
The proof of this follows from applying Pontryagin's minimum principle in a similar way as in the previous section. You will get a vector expression that allows you to consider each component separately, giving c_i |u_i*| + (λ*)^T b_i u_i* ≤ c_i |u_i| + (λ*)^T b_i u_i. You can split this into cases to eliminate the absolute value symbol. From these expressions, you arrive at the necessary conditions for the minimum by making the terms on the left side as small as possible, resulting in the given control.
Note that the above may not define a unique control if b_i^T λ(t)/c_i equals 1 or -1 over a finite interval.
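A minimal sketch of a dead-zone function dez under the usual convention (this convention is an assumption here; at |q| = 1 the value is really undetermined):

import numpy as np

def dez(q):
    # dead-zone relay: 0 inside the unit band, sgn(q) outside it
    return np.where(np.abs(q) <= 1.0, 0.0, np.sign(q))

q = np.array([-2.0, -0.5, 0.0, 0.5, 2.0])
print(-dez(q))      # bang-off-bang values, as in u_i = -dez(b_i^T lam / c_i)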
Equivalent Statements
Assumptions
x(T ) = 0
The system has no eigenvalues with positive real parts.
Conclusions
A minimum-time control exists.
Assumptions
x(T ) is fixed.
A minimum-time control exists.
Conclusions
Assumptions
All the eigenvalues of the system matrix A are real.
Conclusions
Each component u_i(t) of the minimum-time control can switch no more than n - 1 times.
These are minimum-energy problems which have constraints on the control. The problem statement
is the same as for the LQR case, with the addition of bounds on the control input. If the R in the
cost functional is diagonal, the optimal control is given by
u_i(t) = -sat(b_i^T λ(t) / r_i),
where sat denotes the unit saturation function, r_i is the i-th diagonal element of R, and b_i is the i-th column of B.
It may be possible to use this when R is merely diagonalizable, but as the diagonalization process
skews the space, this must be done carefully. The proof of this follows from Pontryagin's principle,
as in the previous sections.
It is important to know the properties of the basic numerical analysis techniques when dealing with
control problems, as most practical problems will require numerical techniques to solve. Particularly
important are the properties of differential equation solvers. Some things that might be good to
know about are
Splines.
One-step methods (simpler, easier to get started, may be more work).
Multi-step methods (more complicated, can be faster, can have more problems with stability).
Stiff problems.
Here are a couple of techniques for solving optimal control problems numerically.
Control Parameterization
Given a system ẋ = f(x, u, t), parameterize your control input by a finite number of parameters.
For example, sample points, or some sort of parameters to make up a spline. Write a function that
gives the cost in terms of these parameters. Use nonlinear programming optimization software
with this function to find the minimum in terms of the parameterization you provided.
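A minimal sketch of this recipe (the plant ẋ = u, the cost, and the weights are made up, not from the notes): parameterize u(t) as piecewise-constant samples, simulate with Euler steps, and hand the resulting cost function to a nonlinear programming routine.

import numpy as np
from scipy.optimize import minimize

T, M, x0 = 1.0, 10, 1.0                  # horizon, number of control segments, initial state
dt = T / M

def cost(u):                             # u holds the piecewise-constant control parameters
    x, J = x0, 0.0
    for uk in u:
        J += 0.5 * (x**2 + uk**2) * dt   # running cost, rectangle rule
        x = x + uk * dt                  # Euler step of xdot = u
    return J + 10.0 * x**2               # assumed terminal penalty

res = minimize(cost, np.zeros(M))
print(res.fun, res.x)                    # optimized cost and control samples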
Direct Transcription
Here you discretize everything, not just the control. The constraints are the discretization equations (for example, whatever numerical method you are using). You run this through nonlinear programming optimization software and see what it gives you.
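A sketch of direct transcription for the same toy problem as in the control parameterization sketch above: the states and controls are all decision variables, and the Euler discretization equations become equality constraints for the solver.

import numpy as np
from scipy.optimize import minimize

T, M, x0 = 1.0, 10, 1.0
dt = T / M

def split(z):                            # z = [x_1, ..., x_M, u_0, ..., u_{M-1}]
    return z[:M], z[M:]

def cost(z):
    x, u = split(z)
    xs = np.concatenate(([x0], x))       # state trajectory including the fixed x_0
    return 0.5 * np.sum((xs[:-1]**2 + u**2) * dt) + 10.0 * xs[-1]**2

def defects(z):                          # x_{k+1} - (x_k + u_k dt) = 0 for every step
    x, u = split(z)
    xs = np.concatenate(([x0], x))
    return xs[1:] - (xs[:-1] + u * dt)

res = minimize(cost, np.zeros(2 * M), method="SLSQP",
               constraints={"type": "eq", "fun": defects})
print(res.fun)                           # should be close to the value found above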