Inno2024 EMT4203 CONTROL II NOTES R6
DeKUT
Lecture Notes
By
February 2024
3.1 Optimal Control: LQR (Linear Quadratic Regulator)
to follow an optimal trajectory x(t) that minimizes the performance criterion, or cost function
$$J = \int_{t_0}^{t_1} h(x(t), u(t), t)\,dt \tag{9.2}$$
The problem is one of constrained functional minimization, and there are several approaches to it, namely:
4. The Hamilton-Jacobi equation, solved for the special case of a linear time-invariant plant with a
quadratic performance criterion (called the performance index), which takes the form of the
matrix Riccati (1724) equation.
$$J = \int_{t_0}^{t_1} \left\{ (q_{11} x_1 + q_{22} x_2 + q_{33} x_3) + (r_1 u) \right\} dt \tag{9.4}$$
or
$$J = \int_{t_0}^{t_1} (Qx + Ru)\,dt \tag{9.5}$$
If the state and control variables in equations (9.4) and (9.5) are squared, then the performance
index becomes quadratic. The advantage of a quadratic performance index is that, for a linear system,
it has a mathematical solution that yields a linear control law of the form $u = -Kx$, where K is the
state feedback gain matrix. The quadratic form of equation (9.4) is
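$$J = \int_{t_0}^{t_1} \left( q_{11} x_1^2 + q_{22} x_2^2 + q_{33} x_3^2 + r_1 u^2 \right) dt \tag{9.7}$$
or
$$J = \int_{t_0}^{t_1} \left( \begin{bmatrix} x_1 & x_2 & x_3 \end{bmatrix} \begin{bmatrix} q_{11} & 0 & 0 \\ 0 & q_{22} & 0 \\ 0 & 0 & q_{33} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ x_3 \end{bmatrix} + [u]\,[r_1]\,[u] \right) dt$$
or, in general
$$J = \int_{t_0}^{t_1} \left( x^T Q x + u^T R u \right) dt \tag{9.8}$$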
Q and R are the state and control weighting matrices and are always square and symmetric. J
is always a scalar quantity.
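As a numerical illustration of equation (9.8), the sketch below approximates J from sampled state and control histories using a simple rectangle rule. The helper names `quad_form` and `quadratic_cost`, the sample data, and the step `dt` are illustrative assumptions and do not come from the notes.

```python
def quad_form(v, M):
    """Return the scalar v^T M v for a vector v and a square matrix M (list of rows)."""
    n = len(v)
    return sum(v[i] * M[i][j] * v[j] for i in range(n) for j in range(n))


def quadratic_cost(xs, us, Q, R, dt):
    """Rectangle-rule approximation of J = integral of (x^T Q x + u^T R u) dt.

    xs, us: sequences of state and control vectors sampled every dt seconds.
    """
    return sum((quad_form(x, Q) + quad_form(u, R)) * dt for x, u in zip(xs, us))


# Hypothetical data: two states, one input, three samples spaced 0.1 s apart.
Q = [[2.0, 0.0], [0.0, 1.0]]   # state weighting (square, symmetric)
R = [[1.0]]                    # control weighting (square, symmetric)
xs = [[1.0, 0.0], [0.9, -0.5], [0.8, -0.4]]
us = [[-0.7], [-0.4], [-0.3]]

J = quadratic_cost(xs, us, Q, R, dt=0.1)
print(J)  # J is a single scalar, whatever the dimensions of x and u
```

Increasing an entry of Q penalizes the corresponding state more heavily, while increasing R penalizes control effort.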
i.e., the integral over time of the instantaneous penalty. Finally, the optimal return is the cost of the
optimal trajectory remaining after time t:
$$V(x(t), u(t)) = \int_t^{\infty} l(x(\tau), u(\tau))\,d\tau.$$
The minimization of V (x(t), u(t)) is made by considering all the possible control inputs u
in the time interval (t,t + δt). As suggested by dynamic programming, the return at time t is
constructed from the return at t + δt, and the differential component due to l(x, u).
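Through multivariate Taylor series expansion,
$$V(x(t+\delta t), u(t+\delta t)) = V(x(t), u(t)) + \frac{\partial V}{\partial x}\frac{dx}{dt}\,\delta t + \text{h.o.t.}$$
Dropping the higher-order terms and using $dx/dt = Ax + Bu$, this becomes
$$V(x(t+\delta t), u(t+\delta t)) = V(x(t), u(t)) + \frac{\partial V}{\partial x}\,(Ax(t) + Bu(t))\,\delta t.$$
Substituting V(x(t + δt), u(t + δt)) into the Bellman equation gives
$$V(x(t), u(t)) = \min_u \left\{ l(x(t), u(t))\,\delta t + V(x(t), u(t)) + \frac{\partial V}{\partial x}\,(Ax(t) + Bu(t))\,\delta t \right\}. \tag{3.4}$$
Now the control input u in the interval (t, t + δt) cannot affect V(x(t), u(t)), so this term can be taken
out of the minimization and cancelled from both sides. Dividing through by δt then gives
$$0 = \min_u \left\{ l(x(t), u(t)) + \frac{\partial V}{\partial x}\,(Ax(t) + Bu(t)) \right\}. \tag{3.5}$$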
We next make the assumption that V (x, u) has the following form:
$$V(x, u) = \frac{1}{2} x^T P x + \frac{1}{2} u^T Z u$$
$$\frac{\partial V}{\partial x} = x^T P \quad\longrightarrow\quad 0 = \min_u \left\{ l(x, u) + x^T P (Ax + Bu) \right\}.$$
We finally specify the instantaneous penalty function. The LQR employs the special quadratic
form
$$l(x, u) = \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u,$$
where Q and R are both symmetric and positive definite. The matrices Q and R are to be set by the
user, and represent the main "tuning knobs" for the LQR. Substitution of this form into the above
equation gives
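$$0 = \min_u \left\{ \frac{1}{2} x^T Q x + \frac{1}{2} u^T R u + x^T P (Ax + Bu) \right\}, \tag{3.6}$$
and setting the derivative with respect to u to zero gives
$$\begin{aligned}
0 &= u^T R + x^T P B \\
u^T &= -x^T P B R^{-1} \\
u &= -R^{-1} B^T P x.
\end{aligned}$$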
The gain matrix for the feedback control is thus $K = R^{-1} B^T P$. Inserting this solution back into
equation (3.6), and eliminating u in favor of x, we have
$$0 = \frac{1}{2} x^T Q x - \frac{1}{2} x^T P B R^{-1} B^T P x + x^T P A x.$$
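Every matrix in this expression is symmetric except PA. Since $x^T P A x$ is a scalar, it equals its own
transpose $x^T A^T P x$, so
$$2 x^T P A x = x^T (A^T P + P A)\, x$$
then
$$x^T P A x = \frac{1}{2} x^T P A x + \frac{1}{2} x^T A^T P x,$$
leading to the final matrix-only result
$$0 = Q + PA + A^T P - P B R^{-1} B^T P.$$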
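A candidate P can be checked by substituting it into the matrix equation above and confirming that the residual is zero. The following is a minimal sketch, assuming a single-input plant so that R reduces to a scalar `r`. The helper names and the integrator test plant (A = 0, B = 1, Q = 1, R = 1, for which P = 1 by inspection) are illustrative assumptions, not taken from the notes.

```python
def matmul(X, Y):
    """Multiply two matrices given as lists of rows."""
    return [[sum(X[i][k] * Y[k][j] for k in range(len(Y)))
             for j in range(len(Y[0]))] for i in range(len(X))]


def transpose(X):
    """Return the transpose of a matrix given as a list of rows."""
    return [list(col) for col in zip(*X)]


def lqr_gain(B, r, P):
    """K = R^-1 B^T P, with R the scalar r (single-input assumption)."""
    return [[v / r for v in row] for row in matmul(transpose(B), P)]


def riccati_residual(A, B, Q, r, P):
    """Return Q + P A + A^T P - P B R^-1 B^T P (all zeros when P is a solution)."""
    PA = matmul(P, A)
    AtP = matmul(transpose(A), P)
    PB = matmul(P, B)
    BtP = matmul(transpose(B), P)
    quad = [[v / r for v in row] for row in matmul(PB, BtP)]
    n = len(A)
    return [[Q[i][j] + PA[i][j] + AtP[i][j] - quad[i][j]
             for j in range(n)] for i in range(n)]


# Hypothetical scalar check: integrator plant dx/dt = u with Q = 1, R = 1.
A, B, Q, r = [[0.0]], [[1.0]], [[1.0]], 1.0
P = [[1.0]]                      # solves 1 - p^2 = 0 with p > 0
residual = riccati_residual(A, B, Q, r, P)
K = lqr_gain(B, r, P)
print(residual, K)
```

For plants with several inputs the division by `r` would have to be replaced by a proper matrix inverse of R.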
Example:
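The regulator shown in Figure 9.1 contains a plant that is described by
$$\begin{bmatrix} \dot{x}_1 \\ \dot{x}_2 \end{bmatrix} = \begin{bmatrix} 0 & 1 \\ -1 & -2 \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} u$$
$$y = \begin{bmatrix} 1 & 0 \end{bmatrix} x$$
(a)
$$A = \begin{bmatrix} 0 & 1 \\ -1 & -2 \end{bmatrix} \qquad B = \begin{bmatrix} 0 \\ 1 \end{bmatrix}$$
$$Q = \begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix} \qquad R = \text{scalar} = 1$$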
From equation (9.25) the reduced Riccati equation is
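$$Q + PA + A^T P - P B R^{-1} B^T P = 0$$
$$PA = \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} 0 & 1 \\ -1 & -2 \end{bmatrix} = \begin{bmatrix} -p_{12} & p_{11} - 2p_{12} \\ -p_{22} & p_{21} - 2p_{22} \end{bmatrix} \tag{9.34}$$
$$A^T P = \begin{bmatrix} 0 & -1 \\ 1 & -2 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} = \begin{bmatrix} -p_{21} & -p_{22} \\ p_{11} - 2p_{21} & p_{12} - 2p_{22} \end{bmatrix} \tag{9.35}$$
$$\begin{aligned}
P B R^{-1} B^T P &= \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \begin{bmatrix} 0 \\ 1 \end{bmatrix} 1 \begin{bmatrix} 0 & 1 \end{bmatrix} \begin{bmatrix} p_{11} & p_{12} \\ p_{21} & p_{22} \end{bmatrix} \\
&= \begin{bmatrix} p_{12} \\ p_{22} \end{bmatrix} \begin{bmatrix} p_{21} & p_{22} \end{bmatrix} \\
&= \begin{bmatrix} p_{12} p_{21} & p_{12} p_{22} \\ p_{22} p_{21} & p_{22}^2 \end{bmatrix}
\end{aligned} \tag{9.36}$$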
Combining equations (9.34), (9.35) and (9.36) gives
Since P is symmetric, $p_{21} = p_{12}$. Equation (9.37) can be expressed as four simultaneous
equations
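The combined matrix equation and its four scalar components are not reproduced at this point in the notes. Assembling them from Q and the products PA, A^T P and PBR^{-1}B^T P worked out in this example gives the following. The numbering (9.37) to (9.41) is inferred from the surrounding cross-references and should be treated as an assumption.

```latex
\begin{bmatrix} 2 & 0 \\ 0 & 1 \end{bmatrix}
+ \begin{bmatrix} -p_{12} & p_{11} - 2p_{12} \\ -p_{22} & p_{21} - 2p_{22} \end{bmatrix}
+ \begin{bmatrix} -p_{21} & -p_{22} \\ p_{11} - 2p_{21} & p_{12} - 2p_{22} \end{bmatrix}
- \begin{bmatrix} p_{12} p_{21} & p_{12} p_{22} \\ p_{22} p_{21} & p_{22}^2 \end{bmatrix}
= 0 \tag{9.37}

% Element by element, with p_{21} = p_{12}:
\begin{align}
2 - 2p_{12} - p_{12}^2 &= 0 \tag{9.38} \\
p_{11} - 2p_{12} - p_{22} - p_{12} p_{22} &= 0 \tag{9.39} \\
p_{11} - 2p_{12} - p_{22} - p_{12} p_{22} &= 0 \tag{9.40} \\
1 + 2p_{12} - 4p_{22} - p_{22}^2 &= 0 \tag{9.41}
\end{align}
```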
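Note that equations (9.39) and (9.40) are the same. From equation (9.38)
$$p_{12}^2 + 2p_{12} - 2 = 0$$
solving gives $p_{12} = -1 \pm \sqrt{3}$. The positive root, $p_{12} = 0.732$, is the value that appears in the
feedback gain K = [0.732 0.542] used in the characteristic equation that follows.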
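The scalar equations of this example can be solved in closed form. The sketch below is one way to do so, assuming the positive roots are the ones wanted (so that P is positive definite) and writing the element equations out from Q, PA, A^T P and PBR^{-1}B^T P with p21 = p12. The variable names are illustrative.

```python
import math

# Plant and weights of the example: A = [[0, 1], [-1, -2]], B = [0, 1]^T,
# Q = diag(2, 1), R = 1. With p21 = p12 the Riccati equation reduces to
#   element (1,1): 2 - 2*p12 - p12**2 = 0
#   element (1,2): p11 - 2*p12 - p22 - p12*p22 = 0
#   element (2,2): 1 + 2*p12 - 4*p22 - p22**2 = 0

# Positive root of p12**2 + 2*p12 - 2 = 0.
p12 = -1.0 + math.sqrt(3.0)

# Positive root of p22**2 + 4*p22 - (1 + 2*p12) = 0.
p22 = -2.0 + math.sqrt(4.0 + 1.0 + 2.0 * p12)

# The (1,2) equation is linear in p11.
p11 = 2.0 * p12 + p22 + p12 * p22

P = [[p11, p12], [p12, p22]]

# K = R^-1 B^T P picks out the second row of P because B = [0, 1]^T and R = 1.
K = [p12, p22]

print([[round(v, 3) for v in row] for row in P])
print([round(v, 3) for v in K])
```

The gain computed this way agrees, to three decimal places, with the values 0.732 and 0.542 that appear in the characteristic equation of this example.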
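$$|sI - A + BK| = 0$$
$$\left| \begin{bmatrix} s & 0 \\ 0 & s \end{bmatrix} - \begin{bmatrix} 0 & 1 \\ -1 & -2 \end{bmatrix} + \begin{bmatrix} 0 \\ 1 \end{bmatrix} \begin{bmatrix} 0.732 & 0.542 \end{bmatrix} \right| = 0$$
$$\left| \begin{bmatrix} s & -1 \\ 1 & s + 2 \end{bmatrix} + \begin{bmatrix} 0 & 0 \\ 0.732 & 0.542 \end{bmatrix} \right| = 0$$
$$\begin{vmatrix} s & -1 \\ 1.732 & s + 2.542 \end{vmatrix} = 0$$
$$s^2 + 2.542s + 1.732 = 0$$
$$s_1, s_2 = -1.271 \pm j0.341$$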
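The closed-loop eigenvalues can be cross-checked numerically. The sketch below forms A − BK for this example and applies the quadratic formula to the 2×2 characteristic polynomial; the variable names are illustrative, and the gain values are the rounded ones quoted in the example.

```python
import cmath

# Closed-loop matrix A - B K for A = [[0, 1], [-1, -2]], B = [0, 1]^T
# and the rounded gain K = [0.732, 0.542] from the example.
k1, k2 = 0.732, 0.542
a_cl = [[0.0, 1.0],
        [-1.0 - k1, -2.0 - k2]]

# For a 2x2 matrix M, |sI - M| = s^2 - trace(M) s + det(M).
trace = a_cl[0][0] + a_cl[1][1]
det = a_cl[0][0] * a_cl[1][1] - a_cl[0][1] * a_cl[1][0]

# cmath.sqrt handles the negative discriminant of a complex-conjugate pair.
disc = cmath.sqrt(trace * trace - 4.0 * det)
s1 = (trace + disc) / 2.0
s2 = (trace - disc) / 2.0
print(s1, s2)
```

Both roots have negative real parts, so the closed-loop system is stable.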
1.4.1 LQ-Observer
For a system of the form
The state estimator design problem is to choose the observer gain L in the observer equation
$$\dot{e} = (A - LC)\,e$$
$$B \rightarrow C^T, \qquad A \rightarrow A^T$$
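Applying these substitutions to the regulator results $0 = Q + PA + A^T P - PBR^{-1}B^T P$ and $K = R^{-1}B^T P$ gives the corresponding observer relations. This is only a sketch: it assumes that Q and R now denote observer weighting matrices (which the notes do not define at this point), that P and R are symmetric, and that the observer gain is the transpose of the dual regulator gain, since $A - LC = (A^T - C^T K)^T$ when $L = K^T$.

```latex
\begin{aligned}
0 &= Q + P A^T + A P - P C^T R^{-1} C P, \\
L &= K^T = \left( R^{-1} C P \right)^T = P C^T R^{-1}.
\end{aligned}
```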