Robotics Script
Marc Toussaint
February 4, 2014
This is a direct concatenation and reformatting of all lecture slides and exercises from
the Robotics course (winter term 2013/14, U Stuttgart), including a topic list to prepare
for exams. Sorry for the sometimes imperfect formatting.
Contents
1 Introduction 3
2 Kinematics 6
Kinematic map, Jacobian, inverse kinematics as optimization problem, motion profiles, trajectory interpolation, multiple
simultaneous tasks, special task variables, configuration/operational/null space, singularities
3 Path Planning 15
Path finding vs. trajectory optimization, local vs. global, Dijkstra, Probabilistic Roadmaps, Rapidly Exploring Random
Trees, non-holonomic systems, car system equation, path-finding for non-holonomic systems, control-based sampling,
Dubins curves
4 Path Optimization 26
very briefly
5 Dynamics 27
1D point mass, damping & oscillation, PID, dynamics of mechanical systems, Euler-Lagrange equation, Newton-Euler
recursion, general robot dynamics, joint space control, reference trajectory following, operational space control
6 Probability Basics 33
7 Mobile Robotics 36
State estimation, Bayes filter, odometry, particle filter, Kalman filter, SLAM, joint Bayes filter, EKF SLAM, particle SLAM,
graph-based SLAM
8 Control Theory 44
Topics in control theory, optimal control, HJB equation, infinite horizon case, Linear-Quadratic optimal control, Riccati
equations (differential, algebraic, discrete-time), controllability, stability, eigenvalue analysis, Lyapunov function
9 A 2-Wheeled Racer 51
10 Reinforcement Learning in Robotics 55
11 Grasping 61
12 Legged Locomotion 64
13 Exercises 69
14 Topic list 79
1 Introduction
Why Robotics?
1:1

Why Robotics?
• Commercial: industrial, health care, entertainment, agriculture, surgery, etc.
• Critical view:
– International Committee for Robot Arms Control
http://www.icrac.co.uk/
1:2

Robotics as intelligence research
1:3

http://people.csail.mit.edu/nikolaus/drg/
1:5

http://www.nec.co.jp/products/robot/en/
http://www.saintpatrick.org/index.aspx/Health_Services/da_Vinci_Surgery
1:6
• Machine Learning and AI are computational disciplines which had great success with statistical modelling, analysis of data sets, symbolic reasoning. But they have not solved autonomous learning, acting & reasoning in real worlds.
• Neurosciences and psychology are descriptive sciences, either on the biological or cognitive level, e.g. with great successes to describe and cure certain diseases. But they are not sufficient to create intelligent systems.
• Robotics is the only synthetic discipline to understand intelligent behavior in natural worlds. Robotics tells us what the actual problems are when trying to organize behavior in natural worlds.
1:9

(http://www.csail.mit.edu/videoarchive/history/aifilms http://www.ai.sri.com/shakey/)
1:10

• Is this a practical or theoretical course?
"There is nothing more practical than a good theory."
(Vapnik, others...)
• Essentially, the whole course is about ...
1:11

Four chapters
– Kinematics & dynamics: kinematic map, Jacobian, inverse kinematics, trajectory interpolation, motion profiles; 1D point mass, damping & oscillation, PID, general dynamic systems, Newton-Euler, joint space control, reference trajectory following, optimal operational space control
– Planning: path finding vs. trajectory optimization, local vs. global, Dijkstra, Probabilistic Roadmaps, Rapidly Exploring Random Trees, differential constraints, metrics; trajectory optimization, general cost function, task variables, transition costs, gradient methods, 2nd order methods, Dynamic Programming
– Mobile robots: state estimation, Bayes filter, odometry, particle filter, Kalman filter, Bayes smoothing, SLAM, joint Bayes filter, EKF SLAM, particle SLAM, graph-based SLAM

Books
There is no reference book for this lecture. But a basic well-known standard text book is:
Books
1:14
Online resources
• VideoLecture by Oussama Khatib: http://academicearth.org/
courses/introduction-to-robotics http://www.virtualprofessors.
com/introduction-to-robotics-stanford-cs223a-khatib
(focus on kinematics, dynamics, control)
• Oliver Brock’s lecture http://courses.robotics.tu-berlin.de/
mediawiki/index.php/Robotics:_Schedule_WT09
• Stefan Schaal’s lecture Introduction to Robotics: http://www-clmc.
usc.edu/Teaching/TeachingIntroductionToRoboticsSyllabus
(focus on control, useful: Basic Linear Control Theory (analytic solution
to simple dynamic model → PID), chapter on dynamics)
• Chris Atkeson’s “Kinematics, Dynamic Systems, and Control” http:
//www.cs.cmu.edu/˜cga/kdc/
(uses Schaal’s slides and LaValle’s book, useful: slides on 3d kinemat-
ics http://www.cs.cmu.edu/˜cga/kdc/ewhitman1.pptx)
• CMU lecture “introduction to robotics” http://www.cs.cmu.edu/afs/
cs.cmu.edu/academic/class/16311/www/current/syllabus.
html
(useful: PID control, simple BUGs algorithms for motion planning, non-
holonomic constraints)
Organization
• Course webpage:
http://ipvs.informatik.uni-stuttgart.de/mlr/marc/teaching/
– Slides, exercises & software (C++)
– Links to books and other resources
01-kinematics: ./x.exe -mode 2/3/4
2:2

Outline
• Kinematics: φ : q ↦ y
• Inverse Kinematics: y* ↦ q* = argmin_q ||φ(q) − y*||²_C + ||q − q₀||²_W
• Basic motion heuristics: motion profiles
• Additional things to know
– Many simultaneous task variables
– Singularities, null space
2:3

Basic 3D geometry & notation
2:4

Rotations
• Let (o, e_{1:3}) be the world frame, (o′, e′_{1:3}) be the body's frame. The new basis vectors are the columns in R, that is, e′₁ = R₁₁e₁ + R₂₁e₂ + R₃₁e₃, etc.
• x = coordinates in world frame (o, e_{1:3})
x′ = coordinates in body frame (o′, e′_{1:3})
p = coordinates of o′ in world frame (o, e_{1:3})

x = p + R x′

• Rotations can alternatively be represented as
– Euler angles – NEVER DO THIS!
– Rotation vector
– Quaternion – default in code
• See the "geometry notes" for formulas to convert, concatenate & apply to vectors
2:7

Homogeneous transformations
• x^A = coordinates of a point in frame A
x^B = coordinates of a point in frame B
• T_{AB} describes the coordinate transformation from x^B to x^A. That is, it describes the backward COORDINATE transformation.
• Confused? Vectors (and frames) transform covariantly, coordinates contra-variantly.

T_{WC} = T_{WA} T_{AB} T_{BC}
x^W = T_{WA} T_{AB} T_{BC} x^C
2:10

Joint types
• Joint transformations: T_{AA′}(q) depends on q ∈ Rⁿ
revolute joint: joint angle q ∈ R determines rotation about the x-axis
2:11

Kinematics
2:14
Kinematic Map
• T_{W eff}(q) gives us the pose of the endeffector in the world frame
2:15

Kinematic Map
• In general, a kinematic map is any (differentiable) mapping φ : q ↦ y
2:18

Jacobian
Jacobian for a rotational joint: the i-th joint rotates about its axis aᵢ through the joint point pᵢ.
1) A point v on the endeffector transformed to world coordinates:
φ^pos_{eff,v}(q) = T_{W eff}(q) v ∈ R³
2) A vector v attached to the endeffector transformed to world coordinates:
φ^vec_{eff,v}(q) = R_{W eff}(q) v ∈ R³
where R_{AB} is the rotation in T_{AB}.

The corresponding Jacobians stack one column per joint:
J^pos_{eff,v}(q) = ( a₁×(p_eff − p₁)  a₂×(p_eff − p₂)  ⋯  aₙ×(p_eff − pₙ) ) ∈ R^{3×n}
J^vec_{eff,v}(q) = ( a₁×a_eff  a₂×a_eff  ⋯  aₙ×a_eff ) ∈ R^{3×n}
2:19
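To make the column structure concrete, the following is a minimal sketch (not the course code) of assembling J^pos and J^vec with Eigen, assuming each joint's world axis aᵢ and position pᵢ have already been computed by forward kinematics:

#include <Eigen/Dense>
#include <vector>

// Sketch: build the positional and vector Jacobians column by column.
// axes[i] = joint axis a_i and points[i] = joint position p_i in world
// coordinates (hypothetical inputs from a forward kinematics pass).
void buildJacobians(const std::vector<Eigen::Vector3d>& axes,
                    const std::vector<Eigen::Vector3d>& points,
                    const Eigen::Vector3d& p_eff,   // endeffector point
                    const Eigen::Vector3d& a_eff,   // endeffector axis
                    Eigen::MatrixXd& Jpos, Eigen::MatrixXd& Jvec) {
  int n = (int)axes.size();
  Jpos.resize(3, n);
  Jvec.resize(3, n);
  for (int i = 0; i < n; i++) {
    Jpos.col(i) = axes[i].cross(p_eff - points[i]); // a_i × (p_eff − p_i)
    Jvec.col(i) = axes[i].cross(a_eff);             // a_i × a_eff
  }
}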
Inverse Kinematics
1) The pre-image φ⁻¹(y*) may be empty: no configuration can generate the desired y*
2) The pre-image φ⁻¹(y*) may be large: many configurations can generate the desired y*

The optimization formulation handles both cases:
q* = argmin_q ||φ(q) − y*||²_C + ||q − q₀||²_W

Using the local linearization at q₀, the optimum is
q* = q₀ + J♯ (y* − y₀)
with J♯ = (JᵀCJ + W)⁻¹JᵀC = W⁻¹Jᵀ(JW⁻¹Jᵀ + C⁻¹)⁻¹ (Woodbury identity)

• The 1st term ensures that we find a configuration even if y* is not exactly reachable
• The 2nd term disambiguates the configurations if there are many in φ⁻¹(y*)
– For C → ∞ and W = I, J♯ = Jᵀ(JJᵀ)⁻¹ is called pseudo-inverse
– W generalizes the metric in q-space
– C regularizes this pseudo-inverse (see later section on singularities)
2:25
2:26

Following a task trajectory: apply these local equations for inverse kinematics using the local Jacobian at each step, where the target y*_{t+1} moves smoothly with t.
– What happens if T = 1 and y* is far?
2:28
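As a concrete illustration (a sketch, not the course solution), one regularized IK step with J♯ = (JᵀJ + σ²W)⁻¹Jᵀ — the form also used in the exercises — can be computed directly:

#include <Eigen/Dense>

// Sketch of one regularized inverse-kinematics step:
//   q' = q + (JᵀJ + sigma2·W)⁻¹ Jᵀ (y* − φ(q))
// J, W and the task error are assumed to come from the kinematics engine.
Eigen::VectorXd ikStep(const Eigen::VectorXd& q,
                       const Eigen::MatrixXd& J,      // task Jacobian at q
                       const Eigen::MatrixXd& W,      // joint-space metric
                       const Eigen::VectorXd& y_err,  // y* − φ(q)
                       double sigma2) {               // regularization σ²
  Eigen::MatrixXd Jsharp =
      (J.transpose() * J + sigma2 * W).ldlt().solve(J.transpose());
  return q + Jsharp * y_err;
}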
Note that h corresponds to the classical concept of null space motion.
2:29

• What if we want to find the exact (local) optimum? E.g., what if we want to compute a big step (where q* will be remote from q) and we cannot rely only on the local linearization approximation?
– Iterate equation (1) (optionally with a step size < 1 to ensure convergence) by setting the point y′ of linearization to the current q*
– This is equivalent to the Gauss-Newton algorithm

• We've derived a basic motion generation principle in robotics from
– an understanding of robot geometry & kinematics
– a basic notion of optimality
2:31

Motion profiles
• Generally, let's define a motion profile as a mapping
MP : [0, 1] ↦ [0, 1]
with MP(0) = 0 and MP(1) = 1, such that the interpolation is given as
x_t = x₀ + MP(t/T) (x_T − x₀)
• For example
MP_ramp(s) = s
MP_sin(s) = ½ [1 − cos(πs)]
• The sine profile x_t = x₀ + ½[1 − cos(πt/T)](x_T − x₀) is a compromise for low max-acceleration and max-velocity
2:34

Joint space interpolation
1) Optimize a desired final configuration q_T:
Given a desired final task value y_T, optimize a final joint state q_T to minimize the function ...

Task space interpolation:
y_t = y₀ + MP(t/T) (y_T − y₀)
(As steps are small, we should be ok with just using the local linearization.)
2:35

(illustration taken from http://www.20sim.com/webhelp/toolboxes/mechatronics_)
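A minimal sketch of the sine motion profile interpolation above (1D case; assumed step count T):

#include <cmath>
#include <vector>

// Sketch: interpolate from x0 to xT over T steps with the sine motion
// profile MP(s) = 0.5*(1 - cos(pi*s)), giving zero start/end velocity.
std::vector<double> sineInterpolation(double x0, double xT, int T) {
  std::vector<double> xs(T + 1);
  for (int t = 0; t <= T; t++) {
    double s = double(t) / T;
    double MP = 0.5 * (1.0 - std::cos(M_PI * s));  // MP(0)=0, MP(1)=1
    xs[t] = x0 + MP * (xT - x0);
  }
  return xs;
}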
Multiple tasks
2:37

Multiple tasks
Example: We want to control the 3D position of the left hand and of the right hand. Both are "packed" into one 6-dimensional task vector which becomes zero if both tasks are fulfilled.
2:38

Multiple tasks
where Φ(q) := ( √̺₁ (φ₁(q) − y₁*) ; √̺₂ (φ₂(q) − y₂*) ; ⋮ ) ∈ R^{Σᵢ dᵢ}
2:40

• Using the local linearization of Φ at q₀, J = ∂Φ(q₀)/∂q, the optimum is
q* = q₀ − (JᵀJ + W)⁻¹ Jᵀ Φ(q₀)
2:41
Multiple tasks
• Assume we have m simultaneous tasks; for each task i we have:
– a kinematic mapping yᵢ = φᵢ(q) ∈ R^{dᵢ}
– a current value y_{i,t} = φᵢ(q_t)

• In the classical view, tasks should be executed exactly, which means taking the limit ̺ᵢ → ∞ in some prespecified hierarchical order.
• We can rewrite the solution in a way that allows for such a hierarchical limit:
• One task plus "nullspace motion":
Notation:
– Aᵢ is a matrix with columns (Aᵢ)_{·k} = [k ≺ i] aₖ containing the joint axes or zeros

Note: φ^align = 1 ↔ align, φ^align = −1 ↔ anti-align, φ^align = 0 ↔ orthogonal.
2:52
Path finding vs. trajectory optimization, local vs. global, Dijkstra, Prob-
abilistic Roadmaps, Rapidly Exploring Random Trees, non-holonomic
systems, car system equation, path-finding for non-holonomic sys-
tems, control-based sampling, Dubins curves
path finding → trajectory optimization → feedback control
(start → goal)

• Many more problems can be solved locally optimally with only trajectory optimization
• Tricky problems need path finding: global search for valid paths
3:3

T. Bretl: Motion Planning of Multi-Limbed Robots Subject to Equilibrium Constraints: The Free-Climbing Robot Problem. International Journal of Robotics Research, 25(4):317-342, Apr 2006.
3:6
Outline
• Heuristics & Discretization (slides from Howie Choset's CMU
lectures)
– Bugs algorithm
– Probabilistic Roadmaps
– Rapidly Exploring Random Trees
3:7

A better bug?
"Bug 2" Algorithm:
1) head toward goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again.
3) Leave the obstacle and continue toward the goal
NO! How do we fix this?
3:10

A better bug?
"Bug 2" Algorithm:
1) head toward goal on the m-line
2) if an obstacle is in the way, follow it until you encounter the m-line again closer to the goal.
3) Leave the obstacle and continue toward the goal
OK?
3:11
3:13
3:14
3:16
3:17
U(q) = U_att(q) + U_rep(q)
F(q) = −∇U(q)
3:15

– This will be repeated until no 0's are adjacent to cells with values ≥ 2
– 0's will only remain when regions are unreachable
3:18
3:19
3:20
3:21
3:22
3:23
3:24
(figure: two possible shortest paths shown)
3:25

Probabilistic Road Map
[Kavraki, Svetska, Latombe, Overmars, 95]
– generates a graph G = (V, E) of configurations
– such that configurations along each edge are ∈ Q_free
3:29
Sample-based Path Finding
3:27

Dijkstra Algorithm
Given the graph, use (e.g.) Dijkstra to find a path from q_start to q_goal.
3:30
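As a sketch (assuming the roadmap is given as an adjacency list, not the course's data structure), Dijkstra on the PRM graph looks like this:

#include <vector>
#include <queue>
#include <limits>

// Sketch: Dijkstra on a roadmap graph. adj[v] = list of (neighbor,
// edge length). Returns shortest distances from the start node;
// the predecessor array allows path reconstruction toward q_goal.
std::vector<double> dijkstra(
    const std::vector<std::vector<std::pair<int,double>>>& adj,
    int start, std::vector<int>& pred) {
  int n = (int)adj.size();
  std::vector<double> dist(n, std::numeric_limits<double>::infinity());
  pred.assign(n, -1);
  using QE = std::pair<double,int>;  // (distance, node)
  std::priority_queue<QE, std::vector<QE>, std::greater<QE>> pq;
  dist[start] = 0.0;
  pq.push({0.0, start});
  while (!pq.empty()) {
    auto [d, v] = pq.top(); pq.pop();
    if (d > dist[v]) continue;              // stale queue entry
    for (auto [w, len] : adj[v])
      if (dist[v] + len < dist[w]) {
        dist[w] = dist[v] + len;
        pred[w] = v;
        pq.push({dist[w], w});
      }
  }
  return dist;
}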
Probabilistic Road Maps
q ∈ Rⁿ describes a configuration
Q_free is the set of configurations without collision
3:28

Probabilistic Road Maps
where path(q, q′) is a local planner (easiest: straight line)
3:31
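A sketch of PRM construction under stated assumptions: sampleFree(), pathFree() (the local planner check) and dist() are hypothetical callbacks provided by the environment, not course API:

#include <algorithm>
#include <functional>
#include <vector>

using Config = std::vector<double>;

// Sketch: build a PRM with N samples, connecting each vertex to its
// k nearest neighbors if the local planner finds a collision-free path.
struct PRM {
  std::vector<Config> V;              // vertices (configurations)
  std::vector<std::pair<int,int>> E;  // edges
};

PRM buildPRM(int N, int k,
             std::function<Config()> sampleFree,
             std::function<bool(const Config&, const Config&)> pathFree,
             std::function<double(const Config&, const Config&)> dist) {
  PRM g;
  for (int i = 0; i < N; i++) g.V.push_back(sampleFree());
  for (int i = 0; i < N; i++) {
    std::vector<std::pair<double,int>> nn;   // naive nearest-neighbor scan
    for (int j = 0; j < N; j++)
      if (j != i) nn.push_back({dist(g.V[i], g.V[j]), j});
    std::sort(nn.begin(), nn.end());
    for (int m = 0; m < k && m < (int)nn.size(); m++)
      if (pathFree(g.V[i], g.V[nn[m].second]))
        g.E.push_back({i, nn[m].second});
  }
  return g;
}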
Local Planner
The smaller the gap (clearance ̺), the more unlikely it is to sample such a path.
3:34

PRM theory
(for uniform sampling in d-dim space)
• Let a, b ∈ Q_free and γ a path in Q_free connecting a and b, with ̺ = clearance of γ (distance to obstacles). The success probability after n samples is bounded in terms of
σ = |B₁| / (2^d |Q_free|)
(roughly: the exponential term consists of "volume ratios")
• This result is called probabilistic completeness (one can achieve any success probability with high enough n)
• For a given success probability, n needs to be exponential in d
3:36

• Pros:
– Algorithmically very simple
– Highly explorative
– Allows probabilistic performance guarantees
• Cons:
– Precomputation of exhaustive roadmap takes a long time (but not necessary for "Lazy PRMs")
3:37

Rapidly Exploring Random Trees
• Single query path finding: focus computational efforts on paths for specific (q_start, q_goal)
• Use actually controllable DoFs to incrementally explore the search space: control-based path finding.
(Ensures that RRTs can be extended to handle differential constraints.)
3:38
Simplest RRT with straight line local planner and step size α
3:39
3:40
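A sketch of that simplest RRT (sampleUniform() and collisionFree() are assumed environment helpers, not course API):

#include <Eigen/Dense>
#include <functional>
#include <vector>

// Sketch: grow an RRT from q_start by stepping a distance alpha toward
// uniform random samples (straight-line local planner).
struct Tree {
  std::vector<Eigen::VectorXd> nodes;
  std::vector<int> parent;   // parent[i] = index of node i's parent
};

void rrtGrow(Tree& tree, int iterations, double alpha,
             std::function<Eigen::VectorXd()> sampleUniform,
             std::function<bool(const Eigen::VectorXd&)> collisionFree) {
  for (int it = 0; it < iterations; it++) {
    Eigen::VectorXd q_rand = sampleUniform();
    int near = 0;                            // nearest node (linear scan)
    for (int i = 1; i < (int)tree.nodes.size(); i++)
      if ((tree.nodes[i] - q_rand).norm() <
          (tree.nodes[near] - q_rand).norm()) near = i;
    Eigen::VectorXd dir = q_rand - tree.nodes[near];
    Eigen::VectorXd q_new = tree.nodes[near] + alpha * dir.normalized();
    if (collisionFree(q_new)) {              // extend tree by step alpha
      tree.nodes.push_back(q_new);
      tree.parent.push_back(near);
    }
  }
}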
Bi-directional search

Summary: RRTs
References:
Steven M. LaValle: Planning Algorithms, http://planning.cs.uiuc.edu/.
Latombe's "motion planning" lecture, http://robotics.stanford.edu/˜latombe/cs326/2007/schedule.htm
3:44

Non-holonomic systems
3:45

• The car is a non-holonomic system: not all DoFs are controlled, dim(u) < dim(q)
"A car cannot move directly lateral." We have a differential constraint on q̇.
• Analogy to dynamic systems: Just like a car cannot instantly move sidewards, a dynamic system cannot instantly change its position q: the current change in position is constrained by the current velocity q̇.
3:48
Path finding with a non-holonomic system
The elements of U_x are elements of T_x subject to additional differential constraints.
3:46

Car example

ẋ = v cos θ
ẏ = v sin θ
θ̇ = (v/L) tan ϕ
|ϕ| < Φ

State q = (x, y, θ), controls u = (v, ϕ)

System equation:
(ẋ; ẏ; θ̇) = (v cos θ; v sin θ; (v/L) tan ϕ)
3:47

Path finding with a non-holonomic system
This is a solution we would like to have:
The path respects differential constraints. Each step in the path corresponds to setting certain controls.
3:50

Control-based sampling to grow a tree
3:51
car parking
3:54
Control-based sampling for the car:
1) Select a q ∈ T
2) Pick v, ϕ, and τ
3) Integrate motion from q
4) Add result if collision-free
3:52
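A sketch of step 3, assuming simple Euler integration of the car system equation above (L is the wheelbase, tau the sampled duration):

#include <cmath>

struct CarState { double x, y, theta; };

// Sketch: integrate the car equations
//   ẋ = v cos θ, ẏ = v sin θ, θ̇ = (v/L) tan ϕ
// with Euler steps for duration tau (control-based sampling, step 3).
CarState integrateCar(CarState q, double v, double phi,
                      double L, double tau, double dt = 0.01) {
  for (double t = 0.0; t < tau; t += dt) {
    q.x     += dt * v * std::cos(q.theta);
    q.y     += dt * v * std::sin(q.theta);
    q.theta += dt * (v / L) * std::tan(phi);
  }
  return q;
}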
3:57

Extension: curve families that also allow driving backward
(includes 46 types of trajectories, good metric for use in RRTs for cars)
3:60

Metrics
Standard/Naive metrics:
• Comparing two 2D rotations/orientations θ₁, θ₂ ∈ SO(2):
a) Euclidean metric between e^{iθ₁} and e^{iθ₂}
b) d(θ₁, θ₂) = min{|θ₁ − θ₂|, 2π − |θ₁ − θ₂|}
• Comparing two configurations (x, y, θ)₁,₂ in R²:
Euclidean metric on (x, y, e^{iθ})
• Comparing two 3D rotations/orientations r₁, r₂ ∈ SO(3):
Represent both orientations as unit-length quaternions r₁, r₂ ∈ R⁴:
d(r₁, r₂) = min{|r₁ − r₂|, |r₁ + r₂|}
where |·| is the Euclidean metric.
(Recall that r₁ and −r₁ represent exactly the same rotation.)
• Ideal metric:
Optimal cost-to-go between two states x1 and x2 :
– Use optimal trajectory cost as metric
– This is as hard to compute as the original problem, of course!!
→ Approximate, e.g., by neglecting obstacles.
3:58
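A one-line sketch of the quaternion metric above, accounting for the double cover (r and −r are the same rotation):

#include <Eigen/Dense>
#include <algorithm>

// Sketch: naive distance between two 3D orientations given as
// unit quaternions stored as 4-vectors.
double quatDist(const Eigen::Vector4d& r1, const Eigen::Vector4d& r2) {
  return std::min((r1 - r2).norm(), (r1 + r2).norm());
}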
Dubins curves
Outline
• Control costs
• Choice of optimizer

f(q_{0:T}) = Σ_{t=0}^T ||Ψ_t(q_{t-k}, .., q_t)||² + Σ_{t=0}^T ||Φ_t(q_t)||²
5 Dynamics

1D point mass, damping & oscillation, PID, dynamics of mechanical systems, Euler-Lagrange equation, Newton-Euler recursion, general robot dynamics, joint space control, reference trajectory following, operational space control
5:1

– The notation x_t refers to the dynamic state of the system: e.g., joint positions and velocities x_t = (q_t, q̇_t).
– f is an arbitrary function, often smooth
5:3

Kinematic vs. dynamic control:
– kinematic: accounts for kinematic coupling of joints but ignores inertia, forces, torques (gears, stiff; all of industrial robots)
– dynamic: accounts for dynamic coupling of joints and full Newtonian physics (future robots, compliant; few research robots)
5:4

Outline
• We discuss computing the dynamics of general robotic systems
– Euler-Lagrange equations
– Euler-Newton method
• We derive the dynamic equivalent of inverse kinematics:
– operational space control
5:5
The dynamics of a 1D point mass
• Start with the simplest possible example: 1D point mass
(no gravity, no friction, just a single mass)
• The state x(t) = (q(t), q̇(t)) is described by:
– position q(t) ∈ R
– velocity q̇(t) ∈ R
• The control u(t) is the force we apply on the mass point
5:7

When velocities cannot be changed/set arbitrarily
• Examples:
– An airplane flying: you cannot command it to hold still in the air, or to move straight up.
– A car: you cannot command it to move sidewards.
– Your arm: you cannot command it to throw a ball with arbitrary speed (force limits).
– A torque-controlled robot: you cannot command it to instantly change velocity (infinite acceleration/torque).
5:10
1D point mass – proportional feedback
u = K_p (q* − q)
• What's the effect of this control law?
m q̈ = u = K_p (q* − q)
With the ansatz q(t) = a + b e^{ωt} this gives
(m b ω² + K_p b) = 0 ∧ (q* − a) = 0
⇒ ω = i √(K_p/m)
q(t) = q* + b e^{i √(K_p/m) t}
This is an oscillation around q* with amplitude b = q(0) − q* and frequency √(K_p/m)!
5:8

(plot: undamped oscillation around q*)
5:9

• Idea 2
"Pull less, when we're heading the right direction already:"
"Damp the system:"
u = K_p (q* − q) + K_d (q̇* − q̇)
(illustration from O. Brock's lecture)

1D point mass – derivative feedback
• What's the effect of this control law?
m q̈ = u = K_p (q* − q) + K_d (0 − q̇)
q(t) = q* + b e^{ωt},  ω = (−K_d ± √(K_d² − 4mK_p)) / (2m)
The term −K_d/(2m) in ω is real ↔ exponential decay (damping)
5:11
5:13
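A minimal simulation sketch of PD control on the 1D point mass (gains and time step are assumed example values) to observe the damped convergence:

#include <cstdio>

// Sketch: simulate a 1D point mass m under PD control
// u = Kp(q* - q) + Kd(0 - qdot) with Euler integration.
int main() {
  double m = 1.0, Kp = 10.0, Kd = 2.0;          // assumed example gains
  double q = 0.0, qdot = 0.0, q_star = 1.0, dt = 0.01;
  for (int t = 0; t < 1000; t++) {
    double u = Kp * (q_star - q) + Kd * (0.0 - qdot);
    double qddot = u / m;                       // m qdd = u
    qdot += dt * qddot;
    q    += dt * qdot;
    if (t % 100 == 0) std::printf("t=%.2f q=%.4f\n", t * dt, q);
  }
}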
Alternative parameterization:
Instead of the gains K_p and K_d it is sometimes more intuitive to set the wavelength λ and the damping ratio ξ:
ξ > 1: over-damped
ξ = 1: critically damped
ξ < 1: oscillatory-damped
q(t) = q* + b e^{−ξt/λ} e^{i √(1−ξ²) t/λ}

• Proportional and derivative feedback (PD control) are like adding a spring and damper to the point mass
• With such linear control laws we can design approach trajectories (by tuning the gains)
– but there is no optimality principle behind such motions
5:17
• Idea 3
"Pull if the position error accumulated large in the past:"
u = K_p (q* − q) + K_d (q̇* − q̇) + K_i ∫_{s=0}^t (q* − q(s)) ds

• PID control
– Proportional Control ("Position Control"): f ∝ K_p (q* − q)
– Derivative Control ("Damping"): f ∝ K_d (q̇* − q̇)  (q̇* = 0 → damping)
– Integral Control ("Steady State Error"): f ∝ K_i ∫_{s=0}^t (q*(s) − q(s)) ds
5:16
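A minimal PID loop sketch (assumed gains, Euler integration) implementing the three terms above:

#include <cstdio>

// Sketch: PID control of a 1D point mass. The integral term
// accumulates position error over time (steady-state correction).
int main() {
  double m = 1.0, Kp = 10.0, Kd = 4.0, Ki = 1.0;  // assumed gains
  double q = 0.0, qdot = 0.0, q_star = 1.0;
  double err_integral = 0.0, dt = 0.01;
  for (int t = 0; t < 2000; t++) {
    double err = q_star - q;
    err_integral += dt * err;
    double u = Kp * err + Kd * (0.0 - qdot) + Ki * err_integral;
    qdot += dt * (u / m);
    q    += dt * qdot;
    if (t % 200 == 0) std::printf("t=%.1f q=%.4f\n", t * dt, q);
  }
}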
Two ways to derive dynamics equations for mechanical systems

The Euler-Lagrange equation
d/dt (∂L/∂q̇) − ∂L/∂q = u
• L(q, q̇) is called the Lagrangian, defined as L = T − U, where T = kinetic energy and U = potential energy.
• q is called a generalized coordinate – any coordinates such that (q, q̇) describes the state of the system. Joint angles in our case.
• u are the external forces
5:20

The Euler-Lagrange equation
• How is this typically done?
• First, describe the kinematics and Jacobians for every link i; then formulate the kinetic energy
T = Σᵢ ½ mᵢvᵢ² + ½ wᵢᵀIᵢwᵢ = ½ q̇ᵀM q̇,  M = Σᵢ (Jᵢ; Jᵢʷ)ᵀ (mᵢI₃ 0; 0 Iᵢ) (Jᵢ; Jᵢʷ)
where Iᵢ = RᵢĪᵢRᵢᵀ and Īᵢ is the inertia tensor in link coordinates.
• Third, formulate the potential energies (typically independent of q̇):
U = g Σᵢ mᵢ height(i)
• Fourth, compute the partial derivatives analytically to get something like
u = d/dt ∂L/∂q̇ − ∂L/∂q = M q̈ + Ṁ q̇ − ∂T/∂q + ∂U/∂q
(control = inertia + Coriolis + gravity terms)
which relates accelerations q̈ to the forces.
5:21

Example: A pendulum

Newton-Euler recursion
– Euler's equation expresses the torque (= control!) acting on a rigid body given an angular velocity and angular acceleration:
uᵢ = Iᵢẇᵢ + wᵢ × Iᵢwᵢ
– Newton's equation expresses the force acting on a rigid body:
fᵢ = m v̇ᵢ
• Backward recursion:
For the leaf links, we now know the desired accelerations q̈* and can compute the necessary joint torques. Recurse backward.
5:23

Numeric algorithms for forward and inverse dynamics
• Newton-Euler recursion: very fast (O(n)) method to compute inverse dynamics
u = NE(q, q̇, q̈*)
Note that we can use this algorithm to also compute
– gravity terms: u = NE(q, 0, 0) = G(q)
– Coriolis terms: u = NE(q, q̇, 0) = C(q, q̇)q̇ + G(q)
– columns of the inertia matrix: u = NE(q, 0, eᵢ) = M(q)eᵢ
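A sketch of that last trick: given any black-box inverse dynamics routine u = NE(q, q̇, q̈) (hypothetical signature), the gravity vector and inertia matrix can be extracted column-wise:

#include <Eigen/Dense>
#include <functional>

// Sketch: extract G(q) and M(q) from an inverse-dynamics routine.
// NE itself is assumed to be given (e.g. a Newton-Euler recursion).
void extractDynamics(
    std::function<Eigen::VectorXd(const Eigen::VectorXd&,
                                  const Eigen::VectorXd&,
                                  const Eigen::VectorXd&)> NE,
    const Eigen::VectorXd& q, Eigen::VectorXd& G, Eigen::MatrixXd& M) {
  int n = (int)q.size();
  Eigen::VectorXd zero = Eigen::VectorXd::Zero(n);
  G = NE(q, zero, zero);                 // u = G(q)
  M.resize(n, n);
  for (int i = 0; i < n; i++) {
    Eigen::VectorXd ei = Eigen::VectorXd::Unit(n, i);
    M.col(i) = NE(q, zero, ei) - G;      // subtract gravity part, in case
  }                                      // NE includes it
}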
General robot dynamics
• Let (q, q̇) be the dynamic state and u ∈ Rⁿ the controls (typically joint torques in each motor) of a robot
• Robot dynamics can generally be written as:
M(q) q̈ + C(q, q̇) q̇ + G(q) = u
5:27

• However, knowing the robot dynamics we can transfer our understanding of PID control of a point mass to general systems
• Recall the inverse kinematics problem:
– We know the desired step δy* (or velocity ẏ*) of the endeffector.
– Which step δq (or velocities q̇) should we make in the joints?
• Equivalent dynamic problem:
– We know the desired acceleration ÿ* of the endeffector.
– What controls u should we apply?
5:31
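A sketch of the resulting computed-torque law in joint space: choose a desired acceleration by PD feedback and map it to torques through the model (M, C, G assumed given, e.g. from the Newton-Euler routine above):

#include <Eigen/Dense>

// Sketch: computed-torque control, u = M qddot* + C qdot + G,
// with the desired acceleration from a PD law on a reference.
Eigen::VectorXd computedTorque(const Eigen::MatrixXd& M,
                               const Eigen::MatrixXd& C,
                               const Eigen::VectorXd& G,
                               const Eigen::VectorXd& q,
                               const Eigen::VectorXd& qdot,
                               const Eigen::VectorXd& q_ref,
                               const Eigen::VectorXd& qdot_ref,
                               double Kp, double Kd) {
  Eigen::VectorXd qddot_des =
      Kp * (q_ref - q) + Kd * (qdot_ref - qdot);  // PD in joint space
  return M * qddot_des + C * qdot + G;
}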
Multiple tasks
5:35
6 Probability Basics
• Axioms and implications:
– Normalization P(Ω) = 1
0 ≤ P(A) ≤ 1
P(∅) = 0
A ⊆ B ⇒ P(A) ≤ P(B)
P(A ∪ B) = P(A) + P(B) − P(A ∩ B)
P(Ω \ A) = 1 − P(A)
6:3

• The conditional is normalized: ∀Y : Σ_X P(X|Y) = 1
• X is independent of Y iff: P(X|Y) = P(X)
(table thinking: all columns of P(X|Y) are equal)
• The same for n random variables X_{1:n} (stored as a rank n tensor)
Joint: P(X_{1:n}),  Marginal: P(X₁) = Σ_{X_{2:n}} P(X_{1:n}),  Conditional: P(X₁|X_{2:n}) = P(X_{1:n}) / P(X_{2:n})
6:6
Product rule: P(X, Y) = P(X|Y) P(Y) = P(Y|X) P(X)
Bayes' Theorem: P(X|Y) = P(Y|X) P(X) / P(Y)

• The same for n variables, e.g., (X, Y, Z):
P(X, Z, Y) = P(X|Y, Z) P(Y|Z) P(Z)
P(X_{1:n}) = Πᵢ₌₁ⁿ P(Xᵢ | X_{i+1:n})
P(X|Y, Z) = P(Y|X, Z) P(X|Z) / P(Y|Z)
P(X, Y|Z) = P(X, Z|Y) P(Y) / P(Z)
P(X₁|X_{2:n}) = P(X₂|X₁, X_{3:n}) P(X₁|X_{3:n}) / P(X₂|X_{3:n})
6:7

Gaussian distribution
• n-dim: N(x | a, A) = (1 / |2πA|^{1/2}) exp{−½ (x−a)ᵀ A⁻¹ (x−a)}

Useful identities:
Symmetry: N(x | a, A) = N(a | x, A) = N(x−a | 0, A)
Product: N(x | a, A) N(x | b, B) = N(x | B(A+B)⁻¹a + A(A+B)⁻¹b, A(A+B)⁻¹B) · N(a | b, A+B)
"Propagation": ∫_y N(x | a + Fy, A) N(y | b, B) dy = N(x | a + Fb, A + FBFᵀ)
Transformation: N(Fx + f | a, A) = (1/|F|) N(x | F⁻¹(a−f), F⁻¹AF⁻ᵀ)
More identities: see "Gaussian identities" http://userpage.fu-berlin.de/˜mtoussai/notes/gaussians.pdf
6:10
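A sketch of the product identity in code (useful later for Kalman filtering); mean and covariance only, the normalization factor N(a|b, A+B) is omitted:

#include <Eigen/Dense>

struct Gaussian { Eigen::VectorXd mean; Eigen::MatrixXd cov; };

// Sketch: product of two Gaussians over the same variable,
//   N(x|a,A) N(x|b,B) ∝ N(x | B(A+B)⁻¹a + A(A+B)⁻¹b, A(A+B)⁻¹B)
Gaussian product(const Gaussian& f, const Gaussian& g) {
  Eigen::MatrixXd Sinv = (f.cov + g.cov).inverse();   // (A+B)⁻¹
  Gaussian r;
  r.mean = g.cov * Sinv * f.mean + f.cov * Sinv * g.mean;
  r.cov  = f.cov * Sinv * g.cov;                      // A(A+B)⁻¹B
  return r;
}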
Bayes’ Theorem
PN
Distributions over continuous domain q(x) := i=1 wi δ(x − xi )
PN R
limN →∞ hf (x)iq = limN →∞ i=1 wi f (xi ) = x
f (x)p(x)dx = hf (x)ip
7 Mobile Robotics
State estimation, Bayes filter, odometry, particle filter, Kalman filter,
SLAM, joint Bayes filter, EKF SLAM, particle SLAM, graph-based
SLAM
http://stair.stanford.edu/multimedia.php
7:5
http://www.darpa.mil/grandchallenge/index.asp
7:6
http://www.slawomir.de/publications/grzonka09icra/
grzonka09icra.pdf
Outline

State Estimation Problem
• Given the local sensor readings y_t, the current state x_t (location, position) is uncertain.
– which hallway?
• What is the probability of being in front of room 154, given we see ...
7:9

• Generally:
Filtering: P(x_t | y_{0:t})
Smoothing: P(x_t | y_{0:T})
Prediction: P(x_t | y_{0:s})
(figure: conditioning on past, all, or partial observations y)

Bayes Filter
x_t = state (location) at time t
y_t = sensor readings at time t
u_{t-1} = control command (action, steering, velocity) at time t-1

p_t(x_t) := P(x_t | y_{0:t}, u_{0:t-1})
= c_t P(y_t | x_t) ∫_{x_{t-1}} P(x_t | x_{t-1}, y_{0:t-1}, u_{0:t-1}) P(x_{t-1} | y_{0:t-1}, u_{0:t-1}) dx_{t-1}
= c_t P(y_t | x_t) ∫_{x_{t-1}} P(x_t | x_{t-1}, u_{t-1}) P(x_{t-1} | y_{0:t-1}, u_{0:t-1}) dx_{t-1}
= c_t P(y_t | x_t) ∫_{x_{t-1}} P(x_t | u_{t-1}, x_{t-1}) p_{t-1}(x_{t-1}) dx_{t-1}

Bayes Filter
p_t(x_t) ∝ P(y_t | x_t) ∫_{x_{t-1}} P(x_t | u_{t-1}, x_{t-1}) p_{t-1}(x_{t-1}) dx_{t-1}
(new information × old estimate; the integral is the predictive estimate p̂_t(x_t))
• The predictive distributions p̂_t(x_t) arise without integrating observations (removing the P(y_t|x_t) part from the Bayesian filter)
7:19
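A sketch of one Bayes filter update over a discrete state space (matrix/vector inputs are assumed to come from the motion and observation models):

#include <vector>

// Sketch: one Bayes filter update. trans[i][j] = P(x_t=i | u, x_{t-1}=j),
// like[i] = P(y_t | x_t=i). Returns the new normalized belief p_t.
std::vector<double> bayesFilterStep(
    const std::vector<double>& p_prev,
    const std::vector<std::vector<double>>& trans,
    const std::vector<double>& like) {
  int n = (int)p_prev.size();
  std::vector<double> p(n, 0.0);
  for (int i = 0; i < n; i++) {
    double pred = 0.0;                       // predictive estimate p̂_t(i)
    for (int j = 0; j < n; j++) pred += trans[i][j] * p_prev[j];
    p[i] = like[i] * pred;                   // weight by new observation
  }
  double Z = 0.0;                            // normalize
  for (double v : p) Z += v;
  for (double& v : p) v /= Z;
  return p;
}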
Bayesian Filters
(more identities: see "Gaussian identities" http://ipvs.informatik.uni-stuttgart.de/mlr/marc/notes/gaussians.pdf)
• Let P = y_{0:t} past observations, F = y_{t+1:T} future observations

Extended Kalman Filter
Bayes Filter: p_t(x_t) ∝ P(y_t | x_t) ∫_{x_{t-1}} P(x_t | u_{t-1}, x_{t-1}) p_{t-1}(x_{t-1}) dx_{t-1}
• If P(y_t | x_t) or P(x_t | u_{t-1}, x_{t-1}) are not linear:
P(y_t | x_t) = N(y_t | g(x_t), W)
P(x_t | u_{t-1}, x_{t-1}) = N(x_t | f(x_{t-1}, u_{t-1}), Q)
– approximate f and g as locally linear (Extended Kalman Filter)
– or sample locally from them and reapproximate as Gaussian (Unscented Transform)
7:28

Types of maps
Laser scan maps, landmark maps (e.g. the Victoria Park data set), grid maps.
K. Murphy (1999): Bayesian map learning in dynamic environments.
Grisetti, Tipaldi, Stachniss, Burgard, Nardi: Fast and Accurate SLAM with Rao-Blackwellized Particle Filters.
M. Montemerlo, S. Thrun, D. Koller & B. Wegbreit (2003): FastSLAM 2.0: An improved particle filtering algorithm for simultaneous localization and mapping that provably converges. IJCAI, 1151–1156.
7:32
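For the linear-Gaussian case the Bayes filter reduces to the Kalman filter. A sketch of one predict/correct step (model matrices assumed given):

#include <Eigen/Dense>

struct Belief { Eigen::VectorXd mean; Eigen::MatrixXd cov; };

// Sketch: one Kalman filter step for linear models
//   x_t = A x_{t-1} + B u + noise(Q),  y_t = C x_t + noise(W).
Belief kalmanStep(const Belief& b, const Eigen::VectorXd& u,
                  const Eigen::VectorXd& y,
                  const Eigen::MatrixXd& A, const Eigen::MatrixXd& B,
                  const Eigen::MatrixXd& Q, const Eigen::MatrixXd& C,
                  const Eigen::MatrixXd& W) {
  // predict
  Eigen::VectorXd mu = A * b.mean + B * u;
  Eigen::MatrixXd S  = A * b.cov * A.transpose() + Q;
  // correct: Kalman gain K = S Cᵀ (C S Cᵀ + W)⁻¹
  Eigen::MatrixXd K =
      S * C.transpose() * (C * S * C.transpose() + W).inverse();
  Belief out;
  out.mean = mu + K * (y - C * mu);
  out.cov  = (Eigen::MatrixXd::Identity(S.rows(), S.cols()) - K * C) * S;
  return out;
}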
• Given the history y_{0:t} and u_{0:t-1}, we want to compute the belief over the pose AND THE MAP m at time t
7:33

• If we knew the map we could use a Bayes filter to compute the belief over the state
P(x_t | m, y_{0:t}, u_{0:t-1})
• SLAM requires to tie state estimation and map building together: joint Bayesian filter over x and m

• A (formally) straight-forward approach is the joint Bayesian filter
– Conceptually very simple
• Drawbacks:
– Scaling (full covariance matrix is O(N²))
– Sometimes non-robust (uni-modal, "data association problem")
– Lacks advantages of Particle Filter

• Use a conditional representation "p_t(x_t, m) = p_t(x_t) p_t(m | x_t)"
(This notation is flaky... the below is more precise)
• The conditional beliefs pᵢ_t(m) may be factorized over grid points or landmarks of the map

K. Murphy (1999): Bayesian map learning in dynamic environments.
http://www.cs.ubc.ca/˜murphyk/Papers/map_nips99.pdf
7:38
Particle-based SLAM summary
• We have a set of N particles {(xⁱ, wⁱ)}ᵢ₌₁ᴺ to represent the pose belief p_t(x_t)
• For each particle we have a separate map belief pᵢ_t(m); in the case of landmarks, this factorizes in N separate 2D-Gaussians
• Iterate
1. Resample particles to get N weight-1-particles: {x̂ⁱ_{t-1}}ᵢ₌₁ᴺ
2. Use motion model to get new "predictive" particles {xⁱ_t}ᵢ₌₁ᴺ
3. Update the map belief pᵢ_t(m) ∝ P(y_t | m, x_t) pᵢ_{t-1}(m) for each particle
4. Compute new importance weights wⁱ_t ∝ ∫ P(y_t | xⁱ_t, m) p_{t-1}(m) dm using the observation model and the map belief
7:41

Map estimation for a given particle history
• Given x_{0:t} (e.g. a trajectory of a particle), what is the posterior over the map m?
→ simplified Bayes Filter:
P(θ_{1:N} | x_{0:t}, y_{0:t}, n_{0:t}) = Πⱼ P(θⱼ | x_{0:t}, y_{0:t}, n_{0:t})
(no transition model: assumption that map is constant)
7:39
• Particle likelihood for the localization problem:
• In SLAM the map is uncertain → each particle is weighted with the expected likelihood:
wⁱ_t = ∫ P(y_t | xⁱ_t, m) p_{t-1}(m) dm
• Data association problem (actually we don't know n_t):
For each particle separately choose nⁱ_t = argmax_{n_t} wⁱ_t(n_t)
7:40

• Map building from a freely moving camera
– PTAM (Parallel Tracking and Mapping) parallelizes computations...
Newcombe, Lovegrove & Davison: DTAM: Dense Tracking and Mapping in Real-Time. ICCV 2011. http://www.doc.ic.ac.uk/˜rnewcomb/
7:42
7:43
7:45
Graph-based SLAM
7:46
SLAM code
8 Control Theory

8:1

Outline
• Stability*
Analyze the stability of a closed-loop system
→ eigenvalue analysis or Lyapunov function method
• Controllability*
Analyze which dimensions (DoFs) of the system can actually in principle be controlled

Bellman's principle of optimality:
"An optimal policy has the property that whatever the initial state and initial decision are, the remaining decisions must constitute an optimal policy with regard to the state resulting from the first decision."
(figure: example graph with edge costs between Start and Goal)

"V(state) = min_edge [c(edge) + V(next-state)]"
8:7

Derivation (Hamilton-Jacobi-Bellman):
dV(x, t)/dt = ∂V/∂t + (∂V/∂x) ẋ
−c(x, u*) = ∂V/∂t + (∂V/∂x) f(x, u*)
⇒ −∂V/∂t = c(x, u*) + (∂V/∂x) f(x, u*)
8:10
Dynamic programming (discrete time):
V_t(x) = min_{u_t} [ c(x, u_t) + min_π ( Σ_{s=t+1}^T c(x_s, u_s) + φ(x_T) ) ]
       = min_{u_t} [ c(x, u_t) + V_{t+1}(f(x, u_t)) ]
The argmin gives the optimal control signal: π*(x) = argmin_u ⋯
8:11

Optimal control objective:
J^π = ∫_0^T c(x(t), u(t)) dt + φ(x(T))
where the start state x(0) and the controller π : (x, t) ↦ u are given, which determine the closed-loop system trajectory x(t), u(t) via ẋ = f(x, π(x, t)) and u(t) = π(x(t), t)
8:9

Infinite horizon case
• Reference following:
– You always want to stay close to a reference trajectory r(t)
Define x̃(t) = x(t) − r(t) with dynamics x̃̇(t) = f(x̃(t) + r(t), u) − ṙ(t)
Define a cost J^π = ∫_0^∞ ||x̃||² + ρ||u||²
• Many many problems in control can be framed this way
8:12
linear dynamics
ẋ = f(x, u) = Ax + Bu
quadratic costs
c(x, u) = xᵀQx + uᵀRu,  φ(x_T) = x_TᵀFx_T
• Why no linear and constant terms? This is because
– constant costs are irrelevant
– linear cost terms can be made away by redefining x or u
– a constant dynamic term only introduces a constant drift
8:14

• The optimal control u* = −R⁻¹BᵀP x is called Linear Quadratic Regulator
Note: If the state is dynamic (e.g., x = (q, q̇)) this control is linear in the positions and linear in the velocities and is an instance of PD control.
The matrix K = R⁻¹BᵀP is therefore also called gain matrix.
For instance, if x(t) = (q(t) − r(t), q̇(t) − ṙ(t)) for a reference r(t) and K = [K_p K_d] then
u* = K_p(r(t) − q(t)) + K_d(ṙ(t) − q̇(t))
8:17
⇒ Riccati differential equation
−Ṗ = AᵀP + PA − PBR⁻¹BᵀP + Q
with optimal control u* = −R⁻¹BᵀP x

Example: 1D point mass
x = (q; q̇),  ẋ = (q̇; u(t)/m) = Ax + Bu,  A = (0 1; 0 0),  B = (0; 1/m)
c(x, u) = ε||x||² + ϱ||u||²,  Q = εI,  R = ϱI

• Algebraic Riccati equation: with the ansatz P = (a c; c b),
u* = −R⁻¹BᵀP x = −(1/(ϱm)) [c q + b q̇]
0 = AᵀP + PA − PBR⁻¹BᵀP + Q
  = (0 a; a 2c) − (1/(ϱm²)) (c² bc; bc b²) + ε (1 0; 0 1)
First solve the (1,1) entry for c, then the (2,2) entry for b. Whether the damping ratio ξ = b/√(4mc) is over- or under-critical depends on the choices of ϱ and ε.
8:19
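For general systems one solves the algebraic Riccati equation numerically. A minimal sketch (not the course's Octave solution): integrate the Riccati differential equation until it reaches a fixed point, assuming a controllable (A, B):

#include <Eigen/Dense>

// Sketch: solve 0 = AᵀP + PA − P B R⁻¹ Bᵀ P + Q by iterating the
// Riccati differential equation −Ṗ = (...) backward in time until
// P converges (simple fixed-point scheme, small step dt).
Eigen::MatrixXd solveARE(const Eigen::MatrixXd& A, const Eigen::MatrixXd& B,
                         const Eigen::MatrixXd& Q, const Eigen::MatrixXd& R,
                         double dt = 1e-3, int maxIter = 200000) {
  Eigen::MatrixXd P = Q;                 // initialization, e.g. P(T) = Q
  Eigen::MatrixXd Rinv = R.inverse();
  for (int i = 0; i < maxIter; i++) {
    Eigen::MatrixXd Pdot = A.transpose() * P + P * A
                         - P * B * Rinv * B.transpose() * P + Q;
    P += dt * Pdot;                      // integrate backward in time
    if (Pdot.norm() < 1e-9) break;       // converged to the ARE solution
  }
  return P;
}
// The LQR gain is then K = R⁻¹ Bᵀ P and the control u* = −K x.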
• Even if we can solve the HJB eq. and have the optimal control, ...
• Inverse Kinematics:
min_q f(q) = ||q − q₀||²_W + ||φ(q) − y*||²_C
• Trajectory optimization:
min_{q_{0:T}} f(q_{0:T}) = Σ_{t=0}^T ||Ψ_t(q_{t-k}, .., q_t)||² + Σ_{t=0}^T ||Φ_t(q_t)||²
• Reinforcement Learning:
– Markov Decision Processes ↔ discrete time stochastic controlled system P(x_{t+1} | u_t, x_t)
8:24

• As a starting point, consider the claim:
"Intelligence means to gain maximal controllability over all degrees of freedom over the environment."
Controllability
• A linear system ẋ = Ax + Bu is controllable iff the controllability matrix
C = (B  AB  A²B  ⋯  A^{n−1}B)
has full rank dim(x) (that is, all rows are linearly independent)
• Meaning of C:
The i-th row describes how the i-th element xᵢ can be influenced by u
"B": ẋᵢ is directly influenced via B
"AB": ẍᵢ is "indirectly" influenced via AB (u directly influences some ẋⱼ via B; the dynamics A then influence ẍᵢ depending on ẋⱼ)
"A²B": x⃛ᵢ is "double-indirectly" influenced
etc...
Note: ẍ = Aẋ + Bu̇ = AAx + ABu + Bu̇
x⃛ = A³x + A²Bu + ABu̇ + Bü
8:26

Controllability
• When all rows of the controllability matrix are linearly independent ⇒ (u, u̇, ü, ...) can influence all elements of x independently
• If a row is zero → this element of x cannot be controlled at all
• If 2 rows are linearly dependent → these two elements of x will remain coupled forever
8:27

Examples:
ẋ = (0 0; 0 0) x + (1; 1) u,  C = (1 0; 1 0): rows linearly dependent
ẋ = (0 0; 0 0) x + (1; 0) u,  C = (1 0; 0 0): 2nd row zero
ẋ = (0 1; 0 0) x + (0; 1) u,  C = (0 1; 1 0): good!
8:28

Controllability
• "Intelligence means to gain maximal controllability over all degrees of freedom over the environment."
– real environments are non-linear
– "to have the ability to gain controllability over the environment's DoFs"
8:29
(when it starts off δ-near to x₀, it will remain ε-near forever)

Example: PD-controlled 1D point mass
• Dynamics:
ẋ = (q̇; q̈) = (0 1; −K_p/m −K_d/m) x,  A = (0 1; −K_p/m −K_d/m)
• Eigenvalues:
The equation λ(q; q̇) = A(q; q̇) leads to λq̇ = λ²q = −(K_p/m)q − (K_d/m)λq, or mλ² + K_dλ + K_p = 0, with solution (compare slide 05:10)
λ = (−K_d ± √(K_d² − 4mK_p)) / (2m)
For K_d² − 4mK_p negative, Re(λ) = −K_d/(2m)
⇒ Positive derivative gain K_d makes the system stable.

• Given a discrete time linear system x_{t+1} = Ax_t, the analogous criterion is that all eigenvalues of A satisfy |λ| < 1.

Lyapunov function method
• If there exists a D and a Lyapunov function ⇒ the system is asymptotically stable
If D is the whole state space, the system is globally stable
8:39
• The problem though is to think of some V(x) given a dynamics! (In that sense, the Lyapunov function method is rather a method of proof than a concrete method for stability analysis.)
• In standard cases, a good guess for the Lyapunov function is either the energy or the value function
8:40

Lyapunov function method – energy example
(We could have derived this easier! xᵀQx are the immediate state costs, and xᵀKᵀRKx = uᵀRu are the immediate control costs—and V̇(x) = −c(x, u*)! See slide 11 bottom.)

• Adaptive Control: When the system dynamics ẋ = f(x, u, β) has unknown parameters β.
– One approach is to estimate β from the data so far and use optimal control.
– Another is to design a controller that has an additional internal update equation for an estimate β̂ and is provably stable. (See Schaal's lecture, for instance.)
8:43
9 A 2-Wheeled Racer

• Educational ideas:
– have a really dynamic system
– have a system which, in the "racing" limit, is hard to control
– learn about hardware, communication, etc
– challenges connecting theory with practice
• Real world issues:
– control interface ("setting velocities") is adventurous
– PARTIAL OBSERVABILITY: we only have a noisy accelerometer & gyroscope
– unknown time delays
– unknown system parameters (masses, geometry, etc)
9:1

Intro
9:2

• Kalman filter tests on the accelerometer:
– Caroline
9:5

• See theoretical modelling notes
9:6

3D Modelling
• Account for centrifugal forces in a curve
• Generalized coordinates q = (x, y, φ, θ), with steering angle φ
• Exercise: Derive general Euler-Lagrange equations
9:7

Clash of theory and real world
9:8

The control interface
• From Marcel's thesis:
– Control loop (around 36 msec)

State estimation
• Use a Bayes filter to estimate the state (q, q̇) from all sensor information we have
• Sensor information:
– Accelerometer readings ã_{x,y,z}
– Gyro readings g̃_{x,y,z}
– Motor positions θ̃_{l,r}. Note that θ̃ ∝ x/r − θ describes the relative angle between the pole and the wheels
• Open issue: time delays – relevant?
Modelling overview I
• Local linearization:
q̈ = Ax + a + B̄u
A = ∂/∂x [M⁻¹(Bu − F)],  B̄ = M⁻¹B
→ gradient check
→ Riccati eqn → nice controller [demo]
9:15

Coping with unknown system parameters
• System identification
• We derived the eqs of motion Bu = M q̈ + F (for 2D) – but we don't know the parameters
– m_A, I_A, m_B, I_B: masses and inertias of bodies A (= wheel) and B (= pendulum)
– r: radius of the wheel
– l: length of the pendulum (height of its COM)

Or completely different: Reinforcement Learning
• Alternatively one fully avoids modelling → Reinforcement Learning
• Roughly: blackbox optimization (e.g., EA) of PD parameters
9:13

Modelling
9:14

Modelling overview II
• Sensor model
• Constrained Euler-Lagrange equations for acceleration control
– Our motors actually don't allow to set torques – but rather set accelerations. Setting accelerations implies the constraint
B′q̈ = u′
– Using q̈ = M⁻¹(Bu − F) we can retrieve the torque
u = (B′M⁻¹B)⁻¹ [u′ + B′M⁻¹F]
that exactly generates this acceleration
– Plugging this back into q̈ = M⁻¹(Bu − F) we get
q̈ = B′# u′ − (I − B′# B′) M⁻¹F,  B′# = M⁻¹B(B′M⁻¹B)⁻¹
9:17
Modelling summary
• We now have all analytic models we need
• We have exact analytical models (and implemented) for the following:
– Euler-Lagrange equations
• In simulation we have no problem to apply
– Riccati to retrieve a (locally) optimal linear regulator
– Kalman to optimally (subject to linearizations) estimate the state
9:19

System Identification
(x, u) ↦ y or P(y | x, u)
9:20
Regression options for system identification
• Linear: (linear in a finite number of parameters)
f(x; θ) = φ(x)ᵀθ
• Blackbox parametric:
– Given some blackbox parametric model f(x; θ) with finite parameters θ; use blackbox optimization
• Non-parametric:
– Kernel methods
– Gaussian processes
– These are closely related to linear models
Least squares: L^ls(θ) = Σᵢ₌₁ⁿ (yᵢ − f(xᵢ; θ))²
• We can use the mean (1/n)L^ls(θ) as an estimate of the output variance σ² to define
P(y | x; θ) = N(y | f(x; θ), σ²)
9:21

Regression basics
9:22

System Id examples: Dynamics
• If the dynamics ẋ = f(x, u) are unknown, learn them from data!
• Use a simple regression method (locally weighted Linear Regression) to estimate ẋ = f(x, u)
Literature:
Moore: Acquisition of Dynamic Control Knowledge for a Robotic Manipulator (ICML 1990)
Atkeson, Moore & Schaal: Locally weighted learning for control. Artificial Intelligence Review, 1997.
Schaal, Atkeson & Vijayakumar: Real-Time Robot Learning with Locally Weighted Statistical Learning. (ICRA 2000)
Vijayakumar et al: Statistical learning for humanoid robots, Autonomous Robots, 2002.
9:24
9:25

• If the kinematics φ are unknown, learn them from data!
Literature:
Todorov: Probabilistic inference of multi-joint movements, skeletal parameters and marker attachments from diverse sensor data. (IEEE Transactions on Biomedical Engineering 2007) [ML slides]
Deisenroth, Rasmussen & Fox: Learning to Control a Low-Cost Manipulator using Data-Efficient Reinforcement Learning (RSS 2011)
9:26
Applying System Id to the racer?
• Core problem: We have no ground truth data!
9:28

Data
• Motor responses
– Free running wheels (no load..)
– Setting extreme target velocities v* and different acceleration levels a* ∈ {−10, .., −1, 1, .., 10} we can generate well-defined accelerations
• Balancing trials
– the gyroscope picks up some oscillations
– the accelerometer is very noisy, perhaps correlated with jerky controls
– only 30Hz!
9:29
10 Reinforcement Learning in Robotics

RL = Learning to Act
(from Satinder Singh's Introduction to RL, videolectures.com)
10:1

(2007, Andrew Ng et al.)
10:2

Markov Decision Process
(graphical model: states s₀, s₁, s₂; actions a₀, a₁; rewards r₀, r₁, r₂)
P(s_{0:T+1}, a_{0:T}, r_{0:T}; π) = P(s₀) Π_{t=0}^T P(a_t|s_t; π) P(r_t|s_t, a_t) P(s_{t+1}|s_t, a_t)

• Stationary MDP:
– We assume P(s′ | s, a) and P(r | s, a) independent of time
– We also define R(s, a) := E{r | s, a} = ∫ r P(r | s, a) dr
10:4

ẋ = f(x, u) + noise(x, u)
π : (x, t) ↦ u
10:5
Five approaches to RL
(flattened table, continuous notation:)
– Model-based: learn model P(x′|u,x), c(x,u) → dynamic prog. V_t(x) → policy π_t(x)
– Model-free: learn value fct. V(x) → policy π(x)
– Policy Search: optimize policy π(x)
– Imitation Learning: learn policy π(x)
– Inverse OC: learn latent costs c(x) → dynamic prog. → policy π(x)
10:7

Five approaches to RL
(flattened table, discrete notation:)
– Model-based: learn model P(s′|s,a), R(s,a) → dynamic prog. V(s) → policy π(s)
– Model-free: learn value fct. V(s) → policy π(s)
– Policy Search: optimize policy π(s)
– Imitation Learning: learn/copy policy π(s)
– Inverse RL: learn latent costs R(s,a) → learn DP V(s) → policy π(s)

Imitation Learning
D = {(s_{0:T}, a_{0:T})_d}_{d=1}^n → learn/copy → π(s)
• Use ML to imitate demonstrated state trajectories x_{0:T}
10:11

Inverse RL
D = {(s_{0:T}, a_{0:T})_d}_{d=1}^n → R(s,a) → V(s) → π(s)
Literature:
Pieter Abbeel & Andrew Ng: Apprenticeship learning via inverse reinforcement learning (ICML 2004)
Andrew Ng & Stuart Russell: Algorithms for Inverse Reinforcement Learning (ICML 2000)
Nikolay Jetchev & Marc Toussaint: Task Space Retrieval Using Inverse Feedback Control (ICML 2011).
10:12
3. Compute a new candidate policy πᵢ that optimizes R(x) = wᵀφ(x), with k features φⱼ, and add it to the candidate list.
(Abbeel & Ng, ICML 2004)
10:13
10:14

Policy Gradients
• One approach is called REINFORCE:
∂V(β)/∂β = ∂/∂β ∫ P(ξ|β) R(ξ) dξ = ∫ P(ξ|β) (∂/∂β log P(ξ|β)) R(ξ) dξ
= E{ξ|β}{ (∂/∂β log P(ξ|β)) R(ξ) }
= E{ξ|β}{ Σ_{t=0}^H γᵗ (∂ log π(a_t|s_t)/∂β) Σ_{t′=t}^H γ^{t′−t} r_{t′} }
(the inner return is Q^π(s_t, a_t, t))
10:16

• Another is PoWER, which requires ∂V(β)/∂β = 0 at the optimum and updates
β ← β + E{ξ|β}{ Σ_{t=0}^H ε_t Q^π(s_t, a_t, t) } / E{ξ|β}{ Σ_{t=0}^H Q^π(s_t, a_t, t) }
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
10:17

Tedrake, Zhang & Seung: Stochastic policy gradient reinforcement learning on a simple 3D biped. IROS, 2849-2854, 2004. http://groups.csail.mit.edu/robotics-center/public_papers/Tedrake04a.pdf
10:19

Policy Gradients – references
Peters & Schaal (2008): Reinforcement learning of motor skills with policy gradients, Neural Networks.
Kober & Peters: Policy Search for Motor Primitives in Robotics, NIPS 2008.
Vlassis, Toussaint (2009): Learning Model-free Robot Control by a Monte Carlo EM Algorithm. Autonomous Robots 27, 123-130.
Rawlik, Toussaint, Vijayakumar (2012): On Stochastic Optimal Control and Reinforcement Learning by Approximate Inference. RSS 2012. (ψ-learning)
I haven't seen much work on this. Would be interesting to consider this more rigorously.
10:23

Stochastic Search
• The general problem: minimize f(x), where we can only evaluate f(x) and g(x) for any x ∈ Rⁿ
• The parameter θ is the only "knowledge/information" that is being propagated between iterations; θ encodes what has been learned from the history
• Example: θ = (x̂), p_t(x) = N(x | x̂, σ²), an n-dimensional isotropic Gaussian with fixed deviation σ
• Update heuristic (see the sketch below):
– Given D = {(xᵢ, f(xᵢ))}ᵢ₌₁^λ, select the µ best: D′ = bestOf_µ(D)
– Compute the new mean x̂ from D′
References:
Hansen, N. (2006): The CMA evolution strategy: a comparing review.
Hansen et al.: Evaluating the CMA Evolution Strategy on Multimodal Test Functions, PPSN 2004.
10:35

where ẏ_t = (y_t − y_{t−1})/τ and ⟨y⟩_α is a low-pass filter
10:36
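A minimal sketch of the isotropic (µ, λ) update heuristic above (fixed σ; CMA additionally adapts the covariance):

#include <Eigen/Dense>
#include <algorithm>
#include <functional>
#include <random>
#include <vector>

// Sketch: sample lambda candidates from N(xhat, sigma² I), keep the
// mu best (lowest f), recompute xhat as their mean. Repeat.
Eigen::VectorXd stochasticSearch(
    std::function<double(const Eigen::VectorXd&)> f,
    Eigen::VectorXd xhat, double sigma,
    int lambda, int mu, int iterations) {
  std::mt19937 rng(42);
  std::normal_distribution<double> gauss(0.0, 1.0);
  int n = (int)xhat.size();
  for (int it = 0; it < iterations; it++) {
    std::vector<std::pair<double, Eigen::VectorXd>> D;
    for (int i = 0; i < lambda; i++) {
      Eigen::VectorXd x = xhat;
      for (int d = 0; d < n; d++) x[d] += sigma * gauss(rng);
      D.push_back({f(x), x});
    }
    std::sort(D.begin(), D.end(),            // mu best = lowest f values
              [](auto& a, auto& b){ return a.first < b.first; });
    xhat.setZero();
    for (int i = 0; i < mu; i++) xhat += D[i].second;
    xhat /= mu;                              // new mean from D'
  }
  return xhat;
}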
11 Grasping

Outline
• Introduce the basic classical concepts for grasping (force closure)
• Force closure, alternative/bio-inspired views
• References:
Craig's Introduction to robotics: mechanics and control – chapter 3 (intro)
11:2

Research
• In industrial settings with high precision sensors and actuators: very fast and precise.
http://www.k2.t.u-tokyo.ac.jp/fusion/index-e.html

Force Closure
• Assume the fingers can exert arbitrary normal forces. Can we generate (or counter-act) arbitrary forces on the object?

f_total = Σᵢ fᵢ,  τ_total = Σᵢ fᵢ × (pᵢ − c)
11:5
• Guaranteed synthesis:
1) put fingers "everywhere"
2) while a redundant finger exists, delete any redundant finger
(A finger is redundant if it can be deleted without reducing the positive linear span.)
This yields a grasp with at most 6 fingers in the plane, at most 12 fingers in three space.
11:8

Traditional research into force closure
• Force closure turned into a continuous optimization criterion:
(DLR hand)

Biomechanics of the human hand
• Finger tendons:
De Bruijne et al. 1999
See Just Herder's lecture on "Biograsping"
http://www.slideshare.net/DelftOpenEr/bio-inspired-design-lectu
11:12

Tendon-based hand mechanisms
11:13
12 Legged Locomotion

Legged Locomotion
• Why legs?
– Human/Animal Locomotion Research
(Asimo)
12:1
12:4

Rolling vs walking

Biped locomotion
• Walking vs Running
Walking := in all instances at least one foot is on ground
Running := otherwise
• 2 phases of Walking
– double-support phase (in Robotics often statically stable)
– single-support phase (statically instable)
12:6

Statically stable walking
• You could rest (hold pose) at any point in time and not fall over
• Try yourself: move as slow as you can but make normal length steps...

One-legged locomotion
12:7
Zero Moment Point
• Vukobratovic's view on walking, leading to ZMP ideas:
"Basic characteristics of all biped locomotion systems are:
(i) the possibility of rotation of the overall system about one of the foot edges caused by strong disturbances, which is equivalent to the appearance of an unpowered (passive) DOF,
(ii) gait repeatability (symmetry), which is related to regular gait only,
(iii) regular interchangeability of single- and double-support phases"
12:9

See also: Popovic, Goswami & Herr: Ground Reference Points in Legged Locomotion: Definitions, Biological Trajectories and Control Implications. International Journal of Robotics Research 24(12), 2005. http://www.cs.cmu.edu/˜cga/legs/Popovic_Goswami_Herr_IJRR_Dec_2005.pdf
12:10

Zero Moment Point
• "ZMP is defined as that point on the ground at which the net moment of the inertial forces and the gravity forces has no component along the horizontal axes." (Vukobratovic & Borovac, 2004)
• fᵢ = force vector acting on body i (gravity plus external)
wᵢ = angular vel vector of body i
Iᵢ = inertia tensor (∈ R^{3×3}) of body i
rᵢ = pᵢ − p_ZMP = relative position of body i to ZMP
• Definition: p_ZMP is the point on the ground for which
Σᵢ [rᵢ × fᵢ + Iᵢẇᵢ + wᵢ × Iᵢwᵢ] = (0, 0, *)ᵀ
12:12

Zero Moment Point
• If the ZMP is outside the support polygon → the foot rolls over the edge
(presumes the foot is a rigid body, and non-compliant control)
• Locomotion is dynamically stable if the ZMP remains within the foot-print polygons.
(↔ the support can always apply some momentum, if necessary)
Kajita et al.: Biped Walking Pattern Generation by using Preview Control of Zero-Moment Point. ICRA 2003. http://eref.uqu.edu.sa/files/eref2/folder1/biped_walking_pattern_generation_by_usin_53925.pdf
12:13

• ZMP is the "rescue" for conventional methods:
– ZMP-stability → the robot can be controlled as if the foot is "glued" (virtually) to the ground!
– The whole body behaves like a "conventional arm"
– Can accelerate any DoF q̈ → conventional dynamic control of a fully actuated system
• Limitations:
– Humans don't use ZMP stability, we allow our feet to roll (toe-off, heel-strike: ZMP at edge of support polygon)
– Can't describe robots with point feet (walking on stilts)
12:15
1. Compass Gait:
2. Pelvic Rotation:
12:16

(figures: passive walker; controlled on a circle)
Models of human bipedal locomotion
• Suggest different principles of human motion:
– passive dynamics (Compass Gait) ↔ underactuated system
– modulation of basic passive dynamics
– energy minimization
12:18

Passive dynamic walking: Compass Gait
• Switch between two consecutive swing phases: depends on slope!
• Typical assumptions made in simulation models:
– The contact of the swing leg with the ground results in no rebound and no slipping of the swing leg.
– At the moment of impact, the stance leg lifts from the ground without interaction.
– The impact is instantaneous.
– The external forces during the impact can be represented by impulses.
– The impulsive forces may result in an instantaneous change in the velocities, but there is no instantaneous change in the configuration.
Goswami, Thuilot & Espiau: A study of the passive gait of a compass-like biped robot: symmetry and chaos. International Journal of Robotics Research 17, 1998. http://www.ambarish.com/paper/COMPASS_IJRR_Goswami.pdf
12:19

Westervelt, Grizzle & Koditschek: Hybrid Zero Dynamics of Planar Biped Walkers. IEEE Trans. on Automatic Control 48(1), 2003. http://repository.upenn.edu/cgi/viewcontent.cgi?article=1124&context=ese_papers
12:22
12:23

Learning To Walk
12:24

Summary
• ZMP type walking was successful (ASIMO, HRP-2, etc), but limited
12:25
12:26
12:27

http://www.bostondynamics.com/robot_rise.html
13 Exercises
a) The example solution generates a motion in one step using inverse kinematics δq = J♯δy with J♯ = (JᵀJ + σ²W)⁻¹Jᵀ. Record the task error, that is, the deviation of the hand position from y* after the step. Why is it larger than expected?
… motion profiles within each. The motion should reach the final position with very high accuracy in finite time and without collisions.
Bonus: How can we apply joint space interpolation? How could one avoid zero velocities at the junction of sections?
… write the pseudo code (see http://en.wikibooks.org/wiki/LaTeX/Algorithms#Typesetting_using_the_algorithmicx_package)
e) (Bonus) Follow the smooth trajectory using a sine motion profile using kinematic control.

13.5.2 A distance measure in phase space for kinodynamic RRTs
Consider the 1D point mass with mass m = 1 and state x = (q, q̇). The 2D space of (q, q̇) combining position and velocity is also called phase space.
Draft an RRT algorithm for rapidly exploring the phase space of the point mass. Provide explicit descriptions of the subroutines needed in lines 4-6 of the algorithm on slide 03:58. (No need to implement it.)

13.6 Exercise 6
As above, try to hold the arm steady at q* = 0 and q̇* = 0. But now use the knowledge of M and F in each time step. For this, decide on a desired wavelength λ and damping behavior ξ and compute the respective K_p and K_d (assuming m = 1), the same for each joint. Use the PD equation to determine desired accelerations q̈* (slide 05:31) and use inverse dynamics to determine the necessary u.
Try this for both pegArm.ors and pegArm2.ors.
In exercise 3 you generated nice collision-free trajectories for peg-in-a-hole using inverse kinematics.
a) Follow these reference trajectories using PD acceleration control (slide 05:31) and thereby solve the peg-in-a-hole problem with a noisy dynamic system.

Gaussian identities:
On slide 06:11 there is the definition of a multivariate (n-dim) Gaussian distribution. Prove the following using only the definition. (You may ignore terms independent of x.)
a) Prove that:
N(x | a, A) N(x | b, B) ∝ N(x | (A⁻¹ + B⁻¹)⁻¹[A⁻¹a + B⁻¹b], (A⁻¹ + B⁻¹)⁻¹)
b) Prove:
∫_y N(x | a + Fy, A) N(y | b, B) dy = N(x | a + Fb, A + FBFᵀ)

13.7 Exercise 7
13.7.1 Particle Filtering the location of a car
Start from the code in RoboticsCourse/05-car.
c) Test the full particle filter including the likelihood weights (step 4) and resampling (step 2). Test using a larger (10σ_observation) and smaller (σ_observation/10) variance in the computation of the likelihood.

What is the matrix C(x_t) (the Jacobian of the landmark positions w.r.t. the car state) in our example?
c) Start with the code in RoboticsCourse/06-kalmanSLAM.
Write a Kalman filter to track the car. You can use the routine getObservationJacobianAtState to access C(x_t) = ∂y/∂x. Note that c(x_t) = ŷ_t − C(x_t)x_t, where ŷ_t is the mean observation in state x_t (there is another routine for this).

13.8.2 Kalman SLAM
Slide 07:38 outlines how to use a high-dimensional Kalman filter to simultaneously estimate the robot position (localization) and the landmarks position (mapping).
a) Concerning P(y_t | x_t, θ_{1:N}) we assume
P(y_t | x_t, θ_{1:N}) = N(y_t | D(x_t)θ + d(x_t), σ_observation)
where D(x_t) = ∂y/∂θ is the observation Jacobian w.r.t. the unknown landmarks, and θ ∈ R^{2N} is the same as θ_{1:N} written as a 2N-dim vector.
Write a pseudo-code for Kalman SLAM. (There is much freedom in how to organize the code, choices of notation and variables, etc. Try to write it as concise as possible.)
b) Try to implement Kalman SLAM, which tracks the car simultaneous to the landmarks. You should now access the routines getMeanObservationAtStateAndLandmarks and getObservationJacobianAtStateAndLandmarks to retrieve the mean observation and the necessary Jacobians given the current mean estimate θ of the landmarks.

13.9.2 Cart pole swing-up
The cart pole (as described, e.g., in the Sutton-Barto book) is a standard benchmark to test stable control strategies. We will assume the model known.
The state of the cart-pole is given by x = (x, ẋ, θ, θ̇), with x ∈ R the position of the cart, θ ∈ R the pendulum's angular deviation from the upright position and ẋ, θ̇ their respective temporal derivatives. The only control signal u ∈ R is the force applied on the cart. The analytic model of the cart pole is

θ̈ = [g sin(θ) + cos(θ)(−c₁u − c₂θ̇² sin(θ))] / ((4/3)l − c₂cos²(θ))   (2)
ẍ = c₁u + c₂[θ̇² sin(θ) − θ̈ cos(θ)]   (3)

with g = 9.8 m/s² the gravitational constant, l = 1 m the pendulum length and constants c₁ = (M_p + M_c)⁻¹ and c₂ = lM_p(M_p + M_c)⁻¹, where M_p = M_c = 1 kg are the pendulum and cart masses respectively.
a) Implement the system dynamics using Euler integration with a time step of Δ = 1/60 s. Test the implementation by initializing the pole almost upright (θ = .1) and watching the dynamics. To display the system, start from the code in course/07-cartPole. The state of the cart pole can be displayed using OpenGL with the state.gl.update() function.
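A minimal sketch of part a), directly transcribing equations (2)-(3) (this is not the course solution; display via state.gl.update() is omitted):

#include <cmath>

// Sketch: Euler integration of the cart-pole dynamics (2)-(3)
// with dt = 1/60 s, g = 9.8, l = 1, Mp = Mc = 1.
struct CartPole { double x, xdot, theta, thetadot; };

void eulerStep(CartPole& s, double u, double dt = 1.0 / 60.0) {
  const double g = 9.8, l = 1.0, Mp = 1.0, Mc = 1.0;
  const double c1 = 1.0 / (Mp + Mc), c2 = l * Mp / (Mp + Mc);
  double sinT = std::sin(s.theta), cosT = std::cos(s.theta);
  double thetadd =
      (g * sinT + cosT * (-c1 * u - c2 * s.thetadot * s.thetadot * sinT))
      / (4.0 / 3.0 * l - c2 * cosT * cosT);                       // eq (2)
  double xdd =
      c1 * u + c2 * (s.thetadot * s.thetadot * sinT - thetadd * cosT); // eq (3)
  s.x        += dt * s.xdot;
  s.xdot     += dt * xdd;
  s.theta    += dt * s.thetadot;
  s.thetadot += dt * thetadd;
}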
(record the system state in each Euler integration step).

13.10 Exercise 10

13.10.1 Read the other exercise
On the webpage there is a 2nd exercise sheet e10-riccati. Please read this carefully. You don't need to do the exercise – the Octave solution was anyway in 07-cartPole/cartPole.m. But you need to understand what's happening – we will do exactly the same for the Racer below.
The e10-riccati sheet repeats the cart-pole model (with cart position p): the only control signal u ∈ R is the force applied on the cart, and the analytic model is

θ̈ = [g sin(θ) + cos(θ)(−c₁u − c₂θ̇² sin(θ))] / ((4/3)l − c₂cos²(θ))   (4)
p̈ = c₁u + c₂[θ̇² sin(θ) − θ̈ cos(θ)]   (5)

with g = 9.8 m/s² the gravitational constant, l = 1 m the pendulum length and constants c₁ = (M_p + M_c)⁻¹ and c₂ = lM_p(M_p + M_c)⁻¹, where M_p = M_c = 1 kg are the pendulum and cart masses respectively.
a) Derive the local linearization of these dynamics around x* = (0, 0, 0, 0). The eventual dynamics should be in the form
ẋ = Ax + Bu
Note that
A = ( 0 1 0 0 ; ∂p̈/∂p ∂p̈/∂ṗ ∂p̈/∂θ ∂p̈/∂θ̇ ; 0 0 0 1 ; ∂θ̈/∂p ∂θ̈/∂ṗ ∂θ̈/∂θ ∂θ̈/∂θ̇ ),  B = ( 0 ; ∂p̈/∂u ; 0 ; ∂θ̈/∂u )
where all partial derivatives are taken at the point p = ṗ = θ = θ̇ = 0.

13.10.2 Balance the Racer
Download the new libRoboticsCourse.13.tgz from the webpage. This now includes a folder 09-racer, which simulates the racer using Runge-Kutta. Note that q = (x, θ). Currently it applies a control signal u = 0.
Design a controller π : (q, q̇) ↦ u that balances the robot.
Implement a Kalman filter to estimate the state trajectory q(t) from this data.
Learn a mapping (x, y_pred) ↦ y_true. For this,
a) What is the mean squared error of (y_pred − y_true)² (not using the learned mapping)?
b) What is the mean squared error of (f(x, y_pred) − y_true)² using the learned linear map f?
c) (Bonus) Can you weave in this learned mapping into the Kalman filter of the first exercise?

13.13.2 Stable control for the cart-pole
Consider a linear controller u = wᵀx with 4 parameters w ∈ R⁴ for the cart-pole.
a) What is the closed-loop linear dynamics ẋ = Âx of the system?
Consider a dynamics without Coriolis forces and constant M (independent of q and q̇). Can you give sufficient conditions on U(q) (potential energy) and u(q) (control policy) such that energy is a Lyapunov function?

13.14 Exercise 13
Consider a policy
π : y ↦ u = φ(y)ᵀw
that maps some features of the (history of the) direct sensor signals y to the control. Use black-box optimization to find parameters w that robustly balance the racer.
• Cost function: Realistically, the running costs ct control). The pole itself cannot be articulated, only bal-
would have to be defined on the sensor signals anced using the arm. Above the arm (in the “ceiling”)
only. (On the real robot we don’t know the real is a hole. Initially the pole is hanging down; the prob-
state – if the robot is to reward itself it needs to lem is to swing up the pole and then balance it to in-
rely on sensor signals only.) I tried sert it in the ceiling’s hole without collision. We assume
costs += .1*MT::sqr(y(3)) + 1.*MT::sqr(y(2)); to have full access to the system dynamics in the form
q̈M (q)+F (q̇, q) = u, where q = (q arm , q pole ) contains the
which combines the wheel encoder y(3) and the
6 arm DoFs q arm ∈ R6 and the 2 pole DoFs q pole ∈ R2 .
gyroscope reading y(2). That worked mediocre.
Since we cannot actuate the pole directly, upole ≡ 0.
For a start, cheat and directly use the state of the
simulator to compute costs: If you would like to see the system, download the newest
costs += 1.*MT::sqr(R.q(0)) + 10.*MT::sqr(R.q(1)); version of the http://userpage.fu-berlin.de/˜mtoussai/source-code/
libRoboticsCourse.11.1.tgz and have a look at course/08-pin_ba
• Episodes and duration costs: To compute the
cost for a given w you need to simulate the racer For all questions: Write down the precise controller equa-
for a couple of time steps. For the optimization tions, or cost function, or constraints, or search/optimization
it is really bad if an episode is so long that it in- strategy, or whatever is necessary to precisely define
cludes a complete failure and wrapping around your solution.
of the inverted pendulum. Therefore, abort an
episode if fabs(R.q(1)) too large and penal- a) Hold steady: Let q arm ∈ R6 be the 6D arm joints
ize an abortion with an extra cost, e.g., propor- and q0arm their initial position. How can you hold the arm
tional to T −t. Try different episode horizons T up steady around q0 (counter-balance gravity and potential
to 500; maybe increase this horizon stage-wise. perturbations while the pole is hanging down)? Which
methods from the lecture do you use?
• Optimizer: You are free to use any optimizer you
like. On the webpage you find a reference imple- b) Swing up (1): How can you find a rough plan of how
mentation of CMA by Niko Hansen (with wrap- to swing up the pole? Which methods from the lec-
per using our arr), which you may use. In that ture do you use? (What does not work: using the local
case, add cmaes.o and search_CMA.o in the linearization of the pole around standing-up, because
Makefile. The typical loop for optimization is this linearization is totally wrong if the pole is hanging
SearchCMA cma; down.)
cma.init(5, -1, -1, -0.1, 0.1);
arr samples, values;
uint T=500;
c) Swing up (2): Given a rough plan of the swing up,
for(uint t=0;t<1000;t++){ is there a way to follow this rough plan directly? Alter-
cma.step(samples, values);
natively, how would you refine
for(uint i=0;i<samples.d0;i++) values(i) = evaluateControlParameters(samples[i], T); the plan to become an
uint i=values.minIndex();
cout <<t <<’ ’ <<values(i) <<’ ’ <<samples[i] <<endl; “optimal” swing up? Which methods from the lecture
}
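The routine evaluateControlParameters used in this loop is not given here. The following is only a sketch of what such an episode evaluation might look like, following the notes above; Racer, R.step() and R.features() are placeholder names for the actual simulator interface in 09-racer, and the abort threshold and penalty weight are arbitrary choices:

  // sketch only: simulate one episode of at most T steps and return its total cost
  double evaluateControlParameters(const arr &w, uint T){
    Racer R;                                     // placeholder for the 09-racer simulator
    double costs = 0.;
    for(uint t=0; t<T; t++){
      double u = scalarProduct(w, R.features()); // linear policy u = w^T phi (placeholder interface)
      R.step(u);                                 // placeholder: advance the simulation one step
      costs += 1.*MT::sqr(R.q(0)) + 10.*MT::sqr(R.q(1));
      if(fabs(R.q(1)) > .5){                     // episode failed: pendulum deviated too far
        costs += 10.*(T-t);                      // penalize the abortion proportionally to T-t
        break;
      }
    }
    return costs;
  }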
… control). The pole itself cannot be articulated, only balanced using the arm. Above the arm (in the "ceiling") is a hole. Initially the pole is hanging down; the problem is to swing up the pole and then balance it to insert it in the ceiling's hole without collision. We assume to have full access to the system dynamics in the form M(q) q̈ + F(q, q̇) = u, where q = (q^arm, q^pole) contains the 6 arm DoFs q^arm ∈ R^6 and the 2 pole DoFs q^pole ∈ R^2. Since we cannot actuate the pole directly, u^pole ≡ 0.

If you would like to see the system, download the newest version of http://userpage.fu-berlin.de/~mtoussai/source-code/libRoboticsCourse.11.1.tgz and have a look at course/08-pin_ba…

For all questions: Write down the precise controller equations, or cost function, or constraints, or search/optimization strategy, or whatever is necessary to precisely define your solution.

a) Hold steady: Let q^arm ∈ R^6 be the 6D arm joints and q0^arm their initial position. How can you hold the arm steady around q0^arm (counter-balance gravity and potential perturbations while the pole is hanging down)? Which methods from the lecture do you use?

b) Swing up (1): How can you find a rough plan of how to swing up the pole? Which methods from the lecture do you use? (What does not work: using the local linearization of the pole around standing-up, because this linearization is totally wrong if the pole is hanging down.)

c) Swing up (2): Given a rough plan of the swing up, is there a way to follow this rough plan directly? Alternatively, how would you refine the plan to become an "optimal" swing up? Which methods from the lecture do you use?
14 Topic list

This list summarizes the lecture's content and is intended as a guide for preparation for the exam. (Going through all exercises is equally important!) References to the lecture slides are given in the format (lecture:slide).

14.1 Kinematics

• 3D geometry
– Definition of an object pose, frame, transformations (2:5,6)
– Homogeneous transformations (4×4 matrix) (2:8,9)
– Composition of transformations, notation xW = TW→A TA→B TB→C xC (2:10)

• Fwd kinematics & Jacobian
– Fwd kinematics as composition of transformations (2:12)
– Transformations of a rotational joint (2:13)
– Kinematic maps φ^pos : q ↦ y and φ^vec (2:15)
– Definition of a Jacobian (2:17)
– Derivation of the position and vector Jacobians J^pos, J^vec (2:18,19)
– Optimality criterion for IK (2:23,24)
– Using the local linearization to find the optimum (2:26) [a code sketch follows after this list]
– Pseudo code of Inverse Kinematics control (2:28)
– Definition & example for a singularity (3:24)
– Motion profiles (esp. sine profile) (2:32)
– Joint space vs. task space interpolation of a motion (2:34,35) [e.g. using a motion profile in one or the other space]

• Multiple Tasks
– How to incorporate multiple tasks (2:40,41)
– What are interesting task variables; know at least about pos, vec, align, and limits (2:46-52)
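A minimal sketch of one regularized IK step, corresponding to the optimality-criterion and local-linearization items above (Eigen-based; the scalar weights c and w stand in for the metrics C and W, and all names are illustrative):

#include <Eigen/Dense>
using Eigen::MatrixXd;  using Eigen::VectorXd;

// one regularized IK step:  q <- q + (c J^T J + w I)^{-1} c J^T (y* - phi(q))
VectorXd ikStep(const VectorXd &q, const VectorXd &phi_q, const VectorXd &yTarget,
                const MatrixXd &J, double c=1e3, double w=1.){
  int n = q.size();
  VectorXd dq = (c*J.transpose()*J + w*MatrixXd::Identity(n,n))
                .ldlt().solve(c*J.transpose()*(yTarget - phi_q));
  return q + dq;
}

Iterating this step, recomputing φ(q) and J in every iteration, essentially gives the IK control loop of (2:28).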
14.2 Path Planning

• Basics
– Path finding vs. trajectory optimization vs. feedback control (3:5,6)
– Roughly: BUG algorithms (3:8-12)
– Potential functions, and that they're nothing but IK with special task variables (3:17)
– Dijkstra Algorithm (3:26,18-25)

• Probabilistic Road Maps (PRMs)
– Definition (3:29,30) & Generation (4:31)
– Importance of the local planner (3:32)
– Roughly: knowing about probabilistic completeness (3:34)
– Roughly: alternative sampling strategies (3:35)

• Rapidly Exploring Random Trees (RRTs)
– Algorithm (3:39)
– Goal-directed (3:40) & bi-directional (3:42) extensions

• Non-holonomic Systems
– Definition of non-holonomicity (3:36); be able to give an example
– Path finding: control-based sampling (3:52)
– Roughly: intricacies with metrics for non-holonomic systems (3:58-60)

14.3 Dynamics

– General form of a dynamic system (5:3)
– Dynamics of a 1D point mass (5:6)
– Position, derivative and integral feedback to control it to a desired state (5:7,10,15)
– Solution to the closed-loop PD system equations (5:11) [see the worked example below]
– Qualitative behaviors: oscillatory-damped, over-damped, critically damped (5:13,14)
– Joint space approach to follow a reference trajectory q^ref_{0:T} (5:30)
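As a worked example for the point-mass items above (a standard result, stated here for self-checking): for a 1D point mass m q̈ = u with PD control u = −Kp q − Kd q̇, the closed-loop equation m q̈ + Kd q̇ + Kp q = 0 has solutions of the form q(t) = a e^{λt} with

    λ = ( −Kd ± √(Kd² − 4 m Kp) ) / (2m)

The behavior is oscillatory-damped for Kd² < 4 m Kp, critically damped for Kd² = 4 m Kp, and over-damped for Kd² > 4 m Kp.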
14.4 Mobile Robotics

• Probability Basics
– Definitions of random variable, probability distribution, joint, marginal, conditional distribution, independence (6:4-7)
– Bayes' Theorem (6:7,8)
– Continuous distributions, Gaussian, particles (6:9-13)

• State Estimation
– Formalization of the state estimation problem (7:13)
– The Bayes Filter as the general analytic solution (7:14,15)
– Gaussians and particles to approximate the Bayes filter and make it computationally feasible:
– Details of a Particle Filter (7:22)
– Kalman filter (esp. assumptions made, not eq. or derivation) (7:26)
– Extended KF (assumptions made) (7:28)
– Odometry (dead reckoning) as "Bayes filter without observations" (7:17,18)

14.5 Control Theory

• Optimal Control
– Definition of the (continuous time) optimal control problem (8:9)
– Concept & definition of the value function (8:10)
– HJB equation (8:10)
– Infinite horizon → stationary solution (8:11)
– Awareness that optimal control is not the only approach; it shifts the problem of designing a controller to designing a cost function.

• Linear-Quadratic Optimal Control
– Definition of problem (esp. assumptions made) (8:14)
– Be able to express system dynamics in (locally linearized) standard form ẋ = Ax + Bu (8:19)
– Fact that the value function is quadratic V(x,t) = x⊤P(t)x (8:16)
– Riccati differential equation (8:16)
– How P gives the optimal Linear-Quadratic Regulator (8:17)
– Algebraic (infinite horizon) Riccati equation (8:18)

• Controllability
– Definition and understanding/interpretation of the controllability matrix C (8:27)
– Definition of controllability (8:26,27)
– Be able to apply on simple examples (8:28)

• Stability
– Definitions of stability (8:34)