AL-iLQR Tutorial
Brian Jackson
Abstract— Trajectory optimization is a powerful framework for controlling complex dynamical systems. While many algorithms for solving these difficult problems have been proposed, methods based on differential dynamic programming (DDP) have recently become very popular due to their straightforward implementation and computational efficiency. While the original DDP algorithm has no ability to deal with additional path constraints, the use of DDP within an augmented Lagrangian framework allows for a powerful and efficient framework for solving constrained trajectory optimization problems. This tutorial presents the algorithm in detail, based on experience derived from developing state-of-the-art implementations of this algorithm, and aims to provide useful insight to help those unfamiliar with DDP methods quickly understand the algorithm and its extensions.

I. INTRODUCTION

Trajectory optimization is a powerful framework for controlling complicated robotic systems. The value of trajectory optimization lies primarily in its generality, allowing it to be applied to a very broad class of dynamical systems. Importantly, trajectory optimization can be applied to any type of dynamical system whose dynamics are Markovian, i.e.

    ẋ = f(x, u)                                                        (1)

where ẋ ∈ Rn is the time derivative of the state x ∈ Rn and u ∈ Rm are the controls.

While very general and powerful, trajectory optimization, like nearly all branches of optimal control, makes the important assumption that the states of the system are known exactly, i.e. that the system has full state feedback. Most formulations of trajectory optimization assume a given initial condition.

Trajectory optimization can be succinctly summarized by the underlying optimization problem being solved:

    minimize_{x(t), u(t)}   ℓ(x(t_f)) + ∫_0^{t_f} ℓ(x(t), u(t)) dt
    subject to              ẋ = f(x, u),
                            x(0) = x_0,
                            g(x(t), u(t)) ≤ 0,
                            h(x(t), u(t)) = 0,                         (2)

where x(t) and u(t) are the state and control trajectories from time 0 to T. The dynamics constraint, which constrains derivatives of the optimization variables, is what sets trajectory optimization apart from other general nonlinear optimization problems. The field of trajectory optimization is dedicated to finding efficient ways to solve these differentially-constrained optimization problems.

The most common approach, and the one treated exclusively in the current tutorial, is to discretize the problem in time, dividing the time period of length T seconds into N − 1 segments, typically of equal length of ∆t seconds. This results in N discretization points, also referred to as "knot" points, including the initial and final times. There exist many methods for approximating the integrals in (2) with discrete sums, as well as for transforming the ordinary differential equation for the dynamics (1) into a discrete difference equation of the form:

    x_{k+1} = f(x_k, u_k, ∆t).                                         (3)

The resulting discrete optimization problem can then be written as:

    minimize_{x_{0:N}, u_{0:N−1}}   ℓ_N(x_N) + Σ_{k=0}^{N−1} ℓ_k(x_k, u_k, ∆t)
    subject to                      x_{k+1} = f(x_k, u_k, ∆t),  k = 1, ..., N−1,
                                    g_k(x_k, u_k) ≤ 0,  ∀k,
                                    h_k(x_k, u_k) = 0,  ∀k.            (4)
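As a concrete illustration of (3), the following minimal Julia sketch forms a discrete difference equation from the continuous dynamics (1) using a fixed-step fourth-order Runge-Kutta integrator with a zero-order hold on the controls. RK4 is only one of many valid choices, and the function name is ours, not part of the tutorial.

    # One step of the discrete dynamics (3), built from the continuous dynamics f(x, u).
    # A zero-order hold on the control u is assumed over the step of length Δt.
    function discrete_dynamics(f, x, u, Δt)
        k1 = f(x, u)
        k2 = f(x + 0.5 * Δt * k1, u)
        k3 = f(x + 0.5 * Δt * k2, u)
        k4 = f(x + Δt * k3, u)
        return x + (Δt / 6) * (k1 + 2k2 + 2k3 + k4)
    end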
There exist many methods for solving problems of the form (4). These methods are typically divided into two categories: "indirect" and "direct" methods. Direct methods treat both the states and controls as decision variables and use general-purpose nonlinear programming (NLP) solvers, such as SNOPT or IPOPT. These methods typically transcribe the optimization problem into something of the form given in (4), often with varying methods for approximating the continuous-time dynamics or unique formulations of problem constraints. The most common method, direct collocation (DIRCOL), uses Hermite-Simpson integration to integrate both the cost and the dynamics, which is essentially a 3rd order implicit Runge-Kutta integrator for the states and first-order hold (i.e. linear interpolation) for the controls. These methods benefit directly from the robustness and generality of the NLP solvers on which they depend. However, direct methods also tend to be fairly slow and require large optimization packages.
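To make the Hermite-Simpson transcription used by DIRCOL concrete, the sketch below evaluates the defect that the NLP drives to zero on each trajectory segment. This is a minimal illustration of the common compressed form, assuming f is the continuous dynamics from (1); the function name is ours and not taken from the tutorial.

    # Hermite-Simpson dynamics defect for one segment; the NLP imposes this residual = 0.
    function hermite_simpson_defect(f, x1, u1, x2, u2, Δt)
        f1, f2 = f(x1, u1), f(x2, u2)
        # collocation midpoint: cubic interpolation of the state, linear for the control
        xm = 0.5 * (x1 + x2) + (Δt / 8) * (f1 - f2)
        um = 0.5 * (u1 + u2)
        # Simpson quadrature of the dynamics over the segment
        return x1 + (Δt / 6) * (f1 + 4 * f(xm, um) + f2) - x2
    end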
Alternatively, indirect methods exploit the Markov structure of (4) and often impose strict Markovianity across the entire problem, including the cost functions and constraints. The dynamics constraints are then implicitly enforced by simulating forward the system's dynamics. Differential Dynamic Programming (DDP) and iterative LQR (iLQR) are closely related indirect methods that solve (4) by breaking it into a sequence of smaller sub-problems. DDP methods improve on more naive "simple shooting" methods by incorporating a feedback policy during the forward simulation of the dynamics at each time step. Because of their strict enforcement of dynamic feasibility, it is often difficult to find a control sequence that produces a reasonable initialization for DDP methods. While they are fast and have a low memory footprint, making them amenable to embedded implementation, DDP methods have historically been considered less numerically robust and less well-suited to handling nonlinear state and input constraints.

This tutorial derives a method for solving constrained trajectory optimization problems using DDP or iLQR within an augmented Lagrangian framework. The result is a fast, efficient algorithm that allows nonlinear equality and inequality constraints on both the states and controls.

This tutorial is based on previous research [1]–[3] and experience implementing this algorithm in several programming languages.

II. BACKGROUND

A. Augmented Lagrangian

Augmented Lagrangian is an optimization method for solving constrained optimization problems of the form

    minimize_x   f(x)
    subject to   c(x) {≤, =} 0.                                        (5)

One of the easiest methods for solving constrained optimization problems is to move the constraints into the cost function and iteratively increase the penalty for either getting close to or violating the constraint. However, penalty methods only converge to the optimal answer as the penalty terms are increased to infinity, which is impractical to implement in a numerical optimization routine with limited-precision arithmetic. Augmented Lagrangian methods improve on penalty methods by maintaining estimates of the Lagrange multipliers associated with the constraints. This is accomplished by forming the augmented Lagrangian

    L_A = f(x) + λ^T c(x) + (1/2) c(x)^T I_μ c(x)                      (6)

where λ are the Lagrange multipliers, μ are the penalty multipliers, and

    I_μ = { 0    if c_i(x) < 0 ∧ λ_i = 0, i ∈ I
          { μ_i  otherwise.                                            (7)

The problem is then solved as follows:
1) solve min_x L_A(x, λ, μ), holding λ and μ constant
2) update the Lagrange multipliers

    λ_i^+ = { λ_i + μ_i c_i(x*)           i ∈ E
            { max(0, λ_i + μ_i c_i(x*))   i ∈ I,                       (8)

3) update the penalty term: μ^+ = φμ, φ > 1
4) check constraint convergence
5) if tolerance not met, go to 1

where E and I are the sets of equality and inequality constraint indices, respectively, and φ is the penalty scaling parameter (typically 2 ≤ φ ≤ 10).
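To illustrate the dual and penalty updates above, here is a minimal sketch of the augmented Lagrangian outer loop in Julia. The callbacks `solve_unconstrained` and `eval_c` are hypothetical placeholders we introduce only for illustration; in AL-iLQR the inner minimization is the iLQR iteration described later.

    # Outer augmented Lagrangian loop following steps 1)-5) and updates (6)-(8).
    # Eidx/Iidx are the index sets E and I; λ, μ are vectors mutated in place.
    function al_outer_loop(x, λ, μ, Eidx, Iidx; φ = 10.0, tol = 1e-6, max_iters = 30)
        for _ in 1:max_iters
            # 1) minimize L_A(x, λ, μ) with λ and μ held fixed
            #    (solve_unconstrained and eval_c are problem-specific placeholders)
            x = solve_unconstrained(x, λ, μ)
            c = eval_c(x)
            # 2) dual update (8): plain for equalities, clamped at zero for inequalities
            λ[Eidx] .= λ[Eidx] .+ μ[Eidx] .* c[Eidx]
            λ[Iidx] .= max.(0.0, λ[Iidx] .+ μ[Iidx] .* c[Iidx])
            # 4)-5) stop once the maximum constraint violation is below tolerance
            viol = max(maximum(abs.(c[Eidx]); init = 0.0),
                       maximum(max.(c[Iidx], 0.0); init = 0.0))
            viol < tol && break
            # 3) increase the penalty: μ⁺ = φ μ (typically 2 ≤ φ ≤ 10)
            μ .*= φ
        end
        return x, λ, μ
    end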
B. Linear Quadratic Regulator

The Linear Quadratic Regulator (LQR) problem is a canonical problem in the theory of optimal control, partially due to the fact that it has analytical solutions that can be derived using a variety of methods, and from the fact that LQR is an extremely useful tool in practice. We present derivations for both continuous-time and discrete-time LQR. The discrete-time derivation is particularly useful in setting up the ideas that will be used in deriving constrained DDP.

The continuous-time LQR problem is formulated as

    min_{u(t)}   (1/2) x^T(t_f) Q(t_f) x(t_f) + (1/2) ∫_0^{t_f} [x^T(t) Q(t) x(t) + u^T(t) R(t) u(t)] dt
    s.t.         ẋ(t) = A(t) x(t) + B(t) u(t)                          (9)

where R is a real, symmetric, positive-definite matrix and Q is a real, symmetric, positive semi-definite matrix. The discrete-time LQR problem is

    min_{u_{0:N−1}}   (1/2) x_N^T Q_N x_N + (1/2) Σ_{k=0}^{N−1} [x_k^T Q_k x_k + u_k^T R_k u_k]
    s.t.              x_{k+1} = A_k x_k + B_k u_k                      (10)

1) Hamilton-Jacobi-Bellman Derivation: The Hamilton-Jacobi-Bellman equation is an important equation in the theory of optimal control that states the necessary conditions for optimality for a continuous-time system. We define the cost function to be

    J(x(t), u(t), t) = ℓ(x(t_f), t_f) + ∫_0^{t_f} ℓ(x(t), u(t), t) dt  (11)

and the minimum cost function

    J*(x(t), t) = min_{u(t)} J(x(t), u(t), t).                         (12)

We now define the Hamiltonian as

    H(x(t), u(t), J*_x, t) = ℓ(x(t), u(t), t) + J*_x^T(x(t), t) [f(x(t), u(t), t)]   (13)

where J*_x = ∂J*/∂x and f(x(t), u(t), t) are the dynamics. The Hamilton-Jacobi-Bellman equation is then

    0 = J*_t(x(t), t) + H(x(t), u*(t), J*_x, t)                        (14)

We now use these to solve the continuous LQR problem. We form the Hamiltonian

    H(x(t), u(t), J*_x, t) = (1/2)[x^T(t) Q(t) x(t) + u^T(t) R(t) u(t)] + J*_x^T(x(t), t)[A(t) x(t) + B(t) u(t)]   (15)
and minimize it with respect to u(t) by setting ∂H/∂u = 0:

    ∂H/∂u = R u + B^T J*_x = 0.                                        (16)

Solving with respect to u yields the optimal control

forward one time instance). For convenience, we define the action-value function Q(x_k, u_k) to be the value being minimized:

    V_k(x) = min_{u_k} Q(x_k, u_k)                                     (25)
TABLE II: AUGMENTED LAGRANGIAN HYPERPARAMETERS
We begin by calculating the square root of the terminal cost-to-go:

    S_N = √P_N = QR_R([ √((ℓ_N)_xx) ; √(I_{μ,N}) c_N ])               (61)

where we define S ∈ Rn×n = √P to be the upper-triangular square root of the Hessian of the cost-to-go. All that is left is to find √P_k as an expression in terms of √P_{k+1}. We start by finding the square roots of Q_xx and Q_uu:

    Z_xx = √Q_xx ← QR_R([ √(ℓ_xx) ; S_{k+1} A ; √(I_μ) c_x ])          (62)

    Z_uu = √Q_uu ← QR_R([ √(ℓ_uu) ; S_{k+1} B ; √(I_μ) c_u ; √ρ I ])   (63)

The optimal gains are then trivially computed as

    K = −Z_uu^{-1} Z_uu^{-T} Q_ux                                      (64)
    d = −Z_uu^{-1} Z_uu^{-T} Q_u,                                      (65)

Here it's important to note that these equations should be computed using a linear solver (e.g. the "\" operator in MATLAB or Julia) sequentially, since these equations are trivially computed using backward or forward substitution, given that the square root factors are triangular.

The gradient (48) and change of the cost-to-go (49) are also computed with simple substitution:

    p = Q_x + (Z_uu K)^T (Z_uu d) + K^T Q_u + Q_xu d                   (66)
    ∆V = d^T Q_u + (1/2) (Z_uu d)^T (Z_uu d).                          (67)

We note that (47) is a quadratic form that can be expressed as

    P = [I ; K]^T [Q_xx Q_ux^T ; Q_ux Q_uu] [I ; K]
      = [I ; K]^T [Z_xx^T 0 ; C^T D^T] [Z_xx C ; 0 D] [I ; K]
      = [Z_xx + CK ; DK]^T [Z_xx + CK ; DK]                            (68)

where,

    C = Z_xx^{-T} Q_xu                                                 (69)
    D = √(Z_uu^T Z_uu − Q_ux Z_xx^{-1} Z_xx^{-T} Q_xu)                 (70)
      = DOWNDATE(Z_uu, Z_xx^{-T} Q_xu).                                (71)

The square root of P_k is then

    √P_k = QR_R([ Z_xx + Z_xx^{-T} Q_xu K ; DOWNDATE(Z_uu, Z_xx^{-T} Q_xu) · K ])   (72)
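The following Julia sketch illustrates the QR-based square-root update (63) together with the gain computation (64)–(65). The argument names and the factored inputs are our own choices; the only point being demonstrated is that the R factor of a tall QR factorization serves as the square root, and that the gains then follow from two triangular solves rather than an explicit inverse.

    using LinearAlgebra

    # Inputs (assumed given): sqrt_luu = √ℓuu, S = √P_{k+1}, sqrt_Iμ_cu = √Iμ · cu,
    # the regularization ρ, and the expansion terms Qux, Qu.
    function sqrt_gains(sqrt_luu, S, B, sqrt_Iμ_cu, ρ, Qux, Qu)
        m = size(B, 2)
        # (63): Zuu = √Quu is the (upper-triangular) R factor of a tall QR factorization
        F = qr([sqrt_luu; S * B; sqrt_Iμ_cu; Matrix(sqrt(ρ) * I, m, m)])
        Zuu = UpperTriangular(F.R)
        # (64)-(65): two triangular solves per gain instead of forming Quu⁻¹
        K = -(Zuu \ (Zuu' \ Qux))
        d = -(Zuu \ (Zuu' \ Qu))
        return Zuu, K, d
    end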
B. Infeasible State Trajectory Initialization

Desired state trajectories can often be identified (e.g., from sampling-based planners or expert knowledge), whereas finding a control trajectory that will produce this result is usually challenging. Dynamically infeasible state trajectory initialization is enabled by introducing additional inputs to the dynamics with slack controls s ∈ Rn,

    x_{k+1} = f(x_k, u_k) + s_k,                                       (73)

to make the system artificially fully actuated.

Given initial state and control trajectories, X̃ and U, the initial infeasible controls s_{0:N−1} are computed as the difference between the dynamics and the desired state trajectory at each time step:

    s_k = x̃_{k+1} − f(x̃_k, u_k)                                       (74)

The optimization problem (4) is modified by replacing the dynamics with (73). An additional cost term,

    Σ_{k=0}^{N−1} (1/2) s_k^T R_s s_k,                                 (75)

and constraints s_k = 0, k = 0, . . . , N−1 are also added to the problem. Since s_k = 0 at convergence, a dynamically feasible solution to (4) is still obtained.
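A minimal Julia sketch of the initialization in (74) is given below, assuming the desired states and initial controls are stored as vectors of vectors; the function name and 1-based indexing convention are ours.

    # Slack controls (74): make the modified dynamics (73) reproduce the desired
    # trajectory exactly. Xref[k] plays the role of x̃_k and U[k] of u_k.
    function infeasible_controls(f, Xref, U)
        N = length(Xref)
        return [Xref[k+1] - f(Xref[k], U[k]) for k in 1:N-1]
    end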
C. Optimizing Quaternions

When dealing with robotic systems with arbitrary orientation, such as satellites, airplanes, quadrotors, etc., the 3D orientation must be optimized. Rotations present a unique challenge for optimization since all vector parameterizations of rotations are either underparameterized (e.g. Euler angles) or subject to constraints (e.g. rotation matrices or quaternions). Quaternions are often considered the best parameterization of rotation, since they only use four parameters and have a number of desirable qualities that make them attractive for numerical computation. We begin with some background on quaternions, along with our notation conventions for quaternion operations, and then present a method for handling quaternions within AL-iLQR.

1) Background and Notation: The unit quaternion is a four-parameter non-singular representation of rotations comprising a vector part z ∈ R3 and a scalar part s ∈ R. To represent quaternion operations as linear-algebraic expressions, the following conventions are used. A quaternion q encodes the rotation from a vehicle's body frame to the world frame. The equivalent rotation matrix is given by,

    R(q) = I + 2 z× (z× + s I),                                        (76)

where I is the 3 × 3 identity matrix and z× denotes the 3 × 3 skew-symmetric cross-product matrix:

    z× = [ 0 −z_3 z_2 ; z_3 0 −z_1 ; −z_2 z_1 0 ].                     (77)

Quaternion multiplication is denoted q_2 ⊗ q_1, and the quaternion conjugate, representing the opposite rotation, is denoted q*. For r ∈ R3, r̂ is a quaternion with s = 0 and z = r. H ∈ R4×3 is the matrix "hat" operator, such that r̂ = H r and r = H^T r̂. The matrix M(q) ∈ R4×4 is defined such that M(q_2) q_1 = q_2 ⊗ q_1.

Quaternion rotations require special consideration in optimization problems to properly account for the unit-norm constraint. We define φ ∈ R3 to be a three-dimensional differential rotation, such that φ̂ = δq, where q = q_0 ⊗ δq, when δq is a small rotation. We seek the Jacobian G(q) ∈ R3×4 that maps from the quaternion to our three-dimensional differential, φ. From above, we have q = q_0 ⊗ δq = q_0 ⊗ φ̂ = M(q_0) H φ, so G(q_0)^T = M(q_0) H.

If h(q) : q ↦ Rp is a function that maps a quaternion to a real-valued vector, the linearization of h(q) gives

    δh ≈ (∂h(q)/∂q) G(q)^T φ.                                          (78)

Similarly, for a real-valued function g(q) : q ↦ R, the second-order Taylor series expansion is

    δg ≈ (∂g(q)/∂q) G(q)^T φ + (1/2) φ^T G(q) (∂²g(q)/∂q²) G(q)^T φ    (79)
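The operators introduced above are small enough to write out directly. The Julia sketch below follows the notation of this section, under the assumption of a scalar-first ordering q = [s; z] (the ordering is not fixed in the text above), and with function names of our own choosing.

    using LinearAlgebra

    # "hat" matrix: r̂ = H*r is a quaternion with s = 0 and z = r; r = H'*r̂
    const H = [zeros(1, 3); Matrix(1.0I, 3, 3)]

    # z× from (77), the skew-symmetric cross-product matrix
    skew(z) = [0 -z[3] z[2]; z[3] 0 -z[1]; -z[2] z[1] 0]

    # Rotation matrix R(q) from (76)
    rot(q) = Matrix(1.0I, 3, 3) + 2 * skew(q[2:4]) * (skew(q[2:4]) + q[1] * Matrix(1.0I, 3, 3))

    # M(q) such that M(q2) * q1 = q2 ⊗ q1
    function M(q)
        s, z = q[1], q[2:4]
        [s -z'; z (s * Matrix(1.0I, 3, 3) + skew(z))]
    end

    conj_quat(q) = [q[1]; -q[2:4]]    # q*, the opposite rotation

    G(q) = (M(q) * H)'                # attitude Jacobian, G(q) ∈ R3×4, from G(q)ᵀ = M(q)H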
2) Quaternion AL-iLQR: We now present modifications to AL-iLQR in order to optimize states that contain quaternions. With standard commercial solvers, like those used in direct trajectory optimization methods, quaternions can only be treated as vectors in R4, often resulting in poor performance. With AL-iLQR we can modify the algorithm to correctly optimize with respect to a unit quaternion. Extensions of this method to states with multiple quaternions are easily obtained through simple concatenation.

Let x ∈ Rn (n ≥ 4) be a state vector containing a unit quaternion of the form,

    x = [ y ; q ]                                                      (80)

where y ∈ Rn−4 is a standard vector and q is the quaternion. Using the results from above, the difference between two states, x and x_0, is

    δx = [ δy ; φ ] = [ y − y_0 ; H^T (q_0* ⊗ q) ] ∈ Rn−1,             (81)

from which we get the Jacobian,

    G(x) = [ I_{n−4} 0 ; 0 G(q) ]                                      (82)

where G(x) ∈ R(n−1)×n. The next sections detail how to modify the AL-iLQR backward and forward passes to correctly take expansions with respect to quaternion states.

Backward pass: The backward pass of iLQR linearizes the dynamics at each time step and approximates the cost-to-go as a quadratic function by taking the second-order Taylor series expansion of the cost-to-go. P_N ∈ S^n_{++}, p_N ∈ Rn are the Hessian and gradient of the terminal cost-to-go and the corrected terms are,

    P̃_N = G(x_N) P_N G(x_N)^T                                         (83)
    p̃_N = G(x_N) p_N                                                  (84)
Q_xx, Q_ux, and Q_x are the terms from the approximate quadratic expansion of the action-value function at time step k that must be modified. The corrected terms from the expansion are:

    Q̃_xx = G(x_k) Q_xx G(x_k)^T                                       (85)
    Q̃_ux = Q_ux G(x_k)^T                                              (86)
    Q̃_x = G(x_k) Q_x.                                                 (87)

The rest of the backward pass proceeds as normal, using the corrected expansion terms.
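For reference, the corrections (85)–(87) amount to a single change of variables applied before the gain computation at each time step, as in the short Julia sketch below; the function name is ours and Gk stands for the state-difference Jacobian G(x_k) from (82).

    # Apply the attitude corrections (85)-(87) to the uncorrected expansion terms.
    function correct_expansion(Gk, Qxx, Qux, Qx)
        Qxx_t = Gk * Qxx * Gk'    # (85)
        Qux_t = Qux * Gk'         # (86)
        Qx_t  = Gk * Qx           # (87)
        return Qxx_t, Qux_t, Qx_t
    end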
Forward pass: The only required modification to the forward pass is the correct calculation of the difference between states x and x_0 using (81). The modified change in control, where K ∈ Rm×(n−1) and d ∈ Rm are the feedback and feedforward gains from the backward pass, is:

    δu_k = K_k δx_k + d_k.                                             (88)

Objective: Directly subtracting states that contain quaternions (or any rotational representation) is incorrect. A good alternative is to again use the state difference Jacobian, as demonstrated in the following objective,

    J̃(X, U) =

to the cost. Algorithm 5 takes successive Newton steps δz, only updating the constraint Jacobian D when the convergence rate r drops below a threshold, allowing re-use of the same matrix factorization S for inexpensive linear system solutions. Further, this algorithm can be implemented in a sequential manner [4] that does not require building large matrices, making it amenable to embedded systems.

Algorithm 5 Projection
 1: function PROJECTION(Y, tol.)
 2:     H^{-1} ← invert Hessian of objective
 3:     while ‖d‖∞ > tol. do
 4:         d, D ← linearize active constraints
 5:         S ← √(D H^{-1} D^T)
 6:         v ← ‖d‖∞
 7:         r ← ∞
 8:         while v > tol. and r > conv. rate tol. do
 9:             Y, v+ ← LINESEARCH(Y, S, H^{-1}, D, d, v)
10:             r ← log v+ / log v
11:         end while
12:     end while
13:     return Y
14: end function