
Robotics

Lecture 2.2
Perception
● Perception is the process of taking sensor data and extracting useful information
from it
● Its application to robotics is broad, and includes
○ Object detection for safety and social interactions
○ Localization
○ Pose estimation of objects for manipulation
○ Environment modeling
● We are going to look predominantly at classical computer vision techniques in this
lecture
Classical Computer Vision
● In classical computer vision, we assume that the features of an image we are trying
to identify can be modelled, and we use analytical methods to fit these models
● Classical computer vision techniques are useful and important
○ When our confidence in model assumptions is high, they can be very accurate, and have a much lower
implementation cost than machine learning techniques
○ Machine learning techniques often require some form of image pre-processing that depends on classical
techniques for feature extraction
● We are going to focus on
○ Edge detection: A very common and essential pre-processing technique
○ Model fitting with outliers: A very flexible technique across images, lidar data and more
● We are not going to cover
○ Corner and blob detection
○ Colour thresholding
○ Template matching
○ There is much more; this is just to give you a taste
Machine Learning
● Machine learning allows us to produce models for prediction or classification
based on labelled data
● For example, if we want to detect a child’s voice from a directional microphone,
each datapoint of our dataset would have
○ An audio signal
○ A direction vector
○ A boolean, true if the audio was a child’s voice
● And our trained model would
○ take in as input the audio and the direction
○ produce as output whether it was a child’s voice
● Machine learning models are often black-box solutions. We won’t know from our
model what it is about the audio or direction that makes a child’s voice detectable;
we’ll just know we can detect one.
Other Sensors
● Perception doesn’t just operate on 2D images; it also works on
○ 3D images and point clouds
○ Laser scans
○ Audio signals
○ Environmental sensors such as pressure, temperature, humidity
○ Motion sensors such as IMUs and GPS
● Many algorithms for classical computer vision also apply to these sensors
● Machine learning also applies to these
● We will be focusing predominantly on 2D images
Classical Perception: Edge Detection
● Edges are important because many of
our models for things are based on
edges. For example
○ Identifying objects via their shape
○ Identifying patterns such as QR codes
○ Identifying lines such as the horizon or
planes such as a tabletop
● We know from neurological studies -
and personal experience - that the
brain uses edges to identify things
Canny Edge Detection
The Canny Edge Detection algorithm has the following steps

1. Smooth the image to reduce noise


2. Compute the gradient of the image intensity
a. This means we are finding which pixels have neighbours with very different intensities
3. Threshold the image at some gradient value
a. This means we are choosing to only keep pixels whose gradient magnitude is above the threshold
4. Perform non-maximum suppression to get thinner lines
a. If you think of the high gradient pixels as a ridge, we only want the steepest part of the ridge
5. Perform hysteresis to remove weak line segments
a. This is a technique of choosing only pixels that are connected according to some thresholds
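
As a concrete reference, here is a minimal sketch of these five steps using OpenCV (the file name is a placeholder assumption). cv2.Canny performs steps 2-5 internally, so only the smoothing is written explicitly:

```python
import cv2

# Hypothetical input file; any image works.
img = cv2.imread("scene.png", cv2.IMREAD_GRAYSCALE)

# Step 1: smooth with a Gaussian to reduce noise.
blurred = cv2.GaussianBlur(img, (5, 5), 1.4)

# Steps 2-5: gradient, threshold, non-maximum suppression and hysteresis
# all happen inside cv2.Canny; the two values are the low and high
# hysteresis thresholds described in step 5.
edges = cv2.Canny(blurred, 50, 150)  # binary image: 255 = edge, 0 = no edge
```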
Image basics
● Images are grids of pixels, usually
with each pixel containing RGB
values.
● This makes digital images discrete:
they sample light colour across a
fixed grid.
● We can
○ Transform these images into different
representations
○ Sample them to different qualities
○ Apply filters to these images
Image basics
● We can represent images in terms of
intensity: how bright each pixel is.
● We can show this representation in
the form of a grey scale image, as
above, where the pixel intensity scales
from black to white.
● We can show this representation as a
function, as below, where the
intensity is shown as height.
● Note that this isn’t actually a smooth
function, since images are discrete
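
For instance, converting a colour image into this intensity representation is one line with OpenCV (the file name here is a placeholder):

```python
import cv2

bgr = cv2.imread("scene.png")                 # H x W x 3 grid of pixel values
grey = cv2.cvtColor(bgr, cv2.COLOR_BGR2GRAY)  # H x W intensity values
print(bgr.shape, grey.shape, grey.dtype)      # discrete uint8 samples on a fixed grid
```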
Image basics
● An edge is going to look like a steep
cliff in the intensity function
● Noise in the image intensity means
that the intensity rapidly changes by
small amounts everywhere
● Whereas at the true edges, it changes
consistently
● Noise makes it hard to find an edge:
in the example below, how do you
find the steep change in intensity?
Step 1: Remove Noise
● Filtering images is a process of changing each pixel’s value based on
its value and the values of the pixels around it
● The way in which the candidate pixel and its neighbours are used is
defined by the kernel
● Applying this filter to the image is a process called convolution
● We flip the kernel in both directions, then move it across the image,
summing the product of all overlapping cells

kernel:
0.1 0.2 0.1
0.2 0.3 0.2
0.1 0.2 0.1

input image:
1 4 3 2
1 1 1 2
2 1 5 5
1 3 5 5
Step 1: Remove Noise
● As the kernel is moved over the image, the products of each
intersecting pixel pair are summed up
● Where the kernel hangs over the border, only the cells that overlap
the image contribute. At the top-left corner:
1x0.3 + 4x0.2 + 1x0.2 + 1x0.1 = 1.4
● This kernel is symmetrical, so we don’t notice the flip. Try working
through an example with an asymmetric kernel to see the effect of the flip!
Step 1: Remove Noise
● Therefore the kernel can be thought of as the weights we are applying
● A kernel that has high weights in the centre favours the pixel being filtered
● Convolution is denoted with a centered asterisk, f * g, where f is
the image and g is the kernel
● For the pixel with value 1 at row 2, column 2, the whole kernel
overlaps the image:
1x0.1 + 4x0.2 + 3x0.1 +
1x0.2 + 1x0.3 + 1x0.2 +
2x0.1 + 1x0.2 + 5x0.1 = 2.8

filtered output:
1.4 2.4 2.6 1.7
1.6 2.8 3.7 3.0
1.6 3.3 5.0 4.5
1.4 3.0 4.7 4.0
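
To make the mechanics concrete, here is a minimal sketch that reproduces this exact example with SciPy, assuming zero-padding outside the image border (which matches the corner computation above):

```python
import numpy as np
from scipy.ndimage import convolve

image = np.array([[1, 4, 3, 2],
                  [1, 1, 1, 2],
                  [2, 1, 5, 5],
                  [1, 3, 5, 5]], dtype=float)

kernel = np.array([[0.1, 0.2, 0.1],
                   [0.2, 0.3, 0.2],
                   [0.1, 0.2, 0.1]])

# mode='constant', cval=0 treats pixels beyond the border as zero,
# so only the overlapping cells contribute, as in the corner example.
smoothed = convolve(image, kernel, mode='constant', cval=0.0)
print(smoothed)  # top-left value is 1.4; the value at row 2, col 2 is 2.8
```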


Step 1: Remove Noise - Gaussian Filter
● To filter out image intensity noise we
use a Gaussian filter
● The 2D Gaussian is
G(x, y) = (1 / 2πσ²) exp(−(x² + y²) / 2σ²),
a bell-shaped surface that peaks at the centre
● To create a Gaussian kernel, we can
sample this function at the desired
sampling rate
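
A minimal sketch of that sampling step, assuming a square kernel normalised so its weights sum to one (otherwise filtering would brighten or darken the image):

```python
import numpy as np

def gaussian_kernel(size: int, sigma: float) -> np.ndarray:
    """Sample G(x, y) on a size x size grid centred on zero."""
    ax = np.arange(size) - (size - 1) / 2.0
    xx, yy = np.meshgrid(ax, ax)
    kernel = np.exp(-(xx**2 + yy**2) / (2.0 * sigma**2))
    return kernel / kernel.sum()  # normalise so image brightness is preserved

print(np.round(gaussian_kernel(3, 1.0), 2))
```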
Step 1: Remove Noise - Gaussian Filter
● The way to think of how this filter works is
○ The candidate pixel receives the most weight (see the
peak in the middle), but still less than half.
○ It is then made to look similar to its neighbours, with
the nearest ones given the most weight, and
further neighbours dropping off exponentially.
○ So if you have an outlier, it will be made to look similar
to its neighbours, but preserve some differentiation
● A taller thinner Gaussian curve preserves more
of each pixel
● A shorter flatter Gaussian curve favours
uniformity
● See how the pixels on either side of the edge are
made grey, with the corners darker since they
have more black neighbours
Step 2: Calculate the derivative
● The derivative or gradient of an
image is, at each pixel, a vector that
points in the direction of the steepest
change in intensity, with magnitude
equal to the size of that intensity change
● This is straightforward when dealing
with continuous functions, but we
need to calculate this for a discrete
image
Step 2: Calculate the derivative
● We can take advantage of a special property of convolution:
filtering an image with the derivative of the kernel is the same as
filtering the image with the kernel and then taking its derivative,
i.e. (f * g)′ = f * g′
● This means we can combine step 1 and 2, and can easily take the derivative of our
discrete image.
Step 2: Calculate the derivative
● So let’s look at the derivatives of the
Gaussian
○ Top left is the x-derivative of a Gaussian
curve
○ Top right is the y-derivative of a Gaussian
curve
● Below are the outputs when applied
to our input image from before
○ Going left to right, the x-derivative shows
the jump in intensity on the left edge of
the square and the drop in intensity on
the right edge
○ The y does the same going top to bottom
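
A minimal sketch of this step with OpenCV, assuming a grayscale input file (the name is a placeholder); cv2.Sobel approximates the x- and y-derivatives after Gaussian smoothing:

```python
import cv2
import numpy as np

# Hypothetical file name; any grayscale image works.
img = cv2.imread("square.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)

# Smooth first (step 1), then take discrete derivatives (step 2).
blurred = cv2.GaussianBlur(img, (5, 5), 1.0)
gx = cv2.Sobel(blurred, cv2.CV_32F, 1, 0)  # x-derivative: responds to vertical edges
gy = cv2.Sobel(blurred, cv2.CV_32F, 0, 1)  # y-derivative: responds to horizontal edges

magnitude = np.hypot(gx, gy)    # steepness of the intensity change
direction = np.arctan2(gy, gx)  # direction of the steepest change
```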
Step 2: Calculate the derivative
● For edge detection, we don’t actually
care yet about the direction of the
derivative, just its magnitude, so we
can go a step further
● The Laplacian of a Gaussian
combines the second derivatives of
the Gaussian in x and y into a single filter
● If we filter our image with this, each
pixel of the output will represent how
much the intensity of that pixel
differed from its neighbours
Step 2: Calculate the derivative
● The mathematics can make it seem
complicated, but at the end of the day
we are just
○ Evening out pixels to look like their
neighbours a bit
○ Seeing, after that, how much each one
differs from its neighbours
● The mathematics just allow us to do
this elegantly with few steps
Step 3: Threshold the derivative magnitude
● Now we have a greyscale image
where the brighter the pixel, the
stronger the change in intensity
compared to its neighbours
● We eventually want to convert this to
a binary image, where 1 indicates an
edge and 0 indicates no edge
● So we pick a threshold, some
minimum level of intensity change,
and set all the pixels below it to zero
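
Continuing the sketch from step 2, thresholding is a one-liner in NumPy (the threshold value and the magnitude array here are made-up stand-ins):

```python
import numpy as np

# `magnitude` as produced in step 2; a small synthetic stand-in here.
magnitude = np.array([[0.2, 5.1, 0.3],
                      [0.1, 6.0, 0.4],
                      [0.2, 4.8, 0.2]])

tau = 1.0  # hypothetical threshold: minimum intensity change to keep
thresholded = np.where(magnitude >= tau, magnitude, 0.0)
print(thresholded)  # pixels with weak gradients are zeroed out
```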
Step 3: Threshold the derivative magnitude
● In doing this, we have eliminated
anything that changes intensity too
gradually
● Notice how a lot of the hair
disappeared; if you look at the
original image, you will see that the
changes in intensity within the hair
are not that strong
Step 4: Non-maximum suppression
● One thing about our image so far is
that the “lines” we’ve found are
multiple pixels wide
● This makes it ambiguous where
precisely the edge is
● We want to reduce these to be
one-pixel-wide lines
Step 4: Non-maximum suppression
● Remember in Step 2 we were able to
calculate the x and y gradient
separately?
● These two values give us a vector that
points in the direction of the gradient
● The line along this vector is useful
because it shows us where the edge is
crossed at this part of the image
● Our goal is to pick the pixel along
this line that has the steepest gradient
magnitude
Step 4: Non-maximum suppression
● Because images are grids, we are only
given eight directions to move in,
however our gradient line could have
any angle
● So we can’t really compare a pixel to
its “neighbours” along this line,
because they often don’t exist
● We can however compare it to virtual
neighbours, by interpolating
Step 4: Non-maximum suppression
● Thus our non-maximum suppression
algorithm is (see the sketch after this list):
○ If pixel q has a higher magnitude gradient
than p and r, keep it
○ If not, set it to zero
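
A minimal sketch of this in NumPy. For simplicity it quantises the gradient angle to the nearest of four directions instead of interpolating virtual neighbours, which is a common simplification:

```python
import numpy as np

def non_max_suppression(mag: np.ndarray, direction: np.ndarray) -> np.ndarray:
    """Keep each pixel only if it is the local maximum along its gradient line.

    `mag` and `direction` are the gradient magnitude and angle from step 2.
    """
    h, w = mag.shape
    out = np.zeros_like(mag)
    angle = np.rad2deg(direction) % 180.0  # fold opposite directions together
    for i in range(1, h - 1):
        for j in range(1, w - 1):
            a = angle[i, j]
            if a < 22.5 or a >= 157.5:      # gradient points left/right
                p, r = mag[i, j - 1], mag[i, j + 1]
            elif a < 67.5:                  # gradient points along one diagonal
                p, r = mag[i - 1, j + 1], mag[i + 1, j - 1]
            elif a < 112.5:                 # gradient points up/down
                p, r = mag[i - 1, j], mag[i + 1, j]
            else:                           # gradient points along the other diagonal
                p, r = mag[i - 1, j - 1], mag[i + 1, j + 1]
            # q = mag[i, j]: keep it only if it beats both neighbours p and r.
            if mag[i, j] >= p and mag[i, j] >= r:
                out[i, j] = mag[i, j]
    return out
```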
Step 5: Hysteresis
● With all the steps we’ve done so far, it
is still possible for tiny segments to
show up as “edges”.
● This is because all that is needed so
far is
○ A high enough intensity gradient
○ Not having any neighbours with higher
intensity gradient
● This means we can have weak (just
above the threshold) non-zero values
that can be as small as a single pixel
● We want to only deal with pixels that
are part of some edge line
Step 5: Hysteresis
● To accomplish this we:
○ Define two thresholds, a low and a high
■ pixel > high → pixel is a strong edge
■ low < pixel < high → pixel is a weak edge
■ pixel < low → pixel is not an edge
○ For each pixel
■ If it is a strong edge, keep it
■ If it is a weak edge and it is connected to a strong edge by N other edge pixels (weak or strong), keep it
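
A minimal sketch using SciPy’s connected-component labelling, assuming the simpler common variant where a weak edge is kept if its 8-connected component contains any strong edge pixel:

```python
import numpy as np
from scipy import ndimage

def hysteresis(mag: np.ndarray, low: float, high: float) -> np.ndarray:
    """Keep strong edges, plus weak edges connected to a strong edge."""
    candidates = mag >= low          # weak or strong edge pixels
    strong = mag >= high             # strong edge pixels
    # Label 8-connected components of candidate edge pixels.
    labels, n = ndimage.label(candidates, structure=np.ones((3, 3)))
    # A component survives if it contains at least one strong pixel.
    keep = np.zeros(n + 1, dtype=bool)
    keep[np.unique(labels[strong])] = True
    keep[0] = False                  # label 0 is the background
    return keep[labels]              # boolean edge map
```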
Classical Computer Vision: RANSAC
● Now that we have edges, we need to
actually detect something!
● RANSAC stands for Random Sample
Consensus
● It is an algorithm that allows us to fit
models to data while excluding
outliers
● For example, in the data on the right,
we can see a clear line between the
corners. However if we fit a line using
regression, the outliers distort the
result
RANSAC
● RANSAC will take a random subset
of the data, and then fit a model just
to that data
● It will then count the number of data
points that are close enough to this
model, and call them inliers
● It then repeats this many times, and
returns the fitted model that had the
most inliers
RANSAC
The algorithm is
1. Sample k data points randomly
2. Solve for the best fit model
parameters
3. Score by the fraction of inliers within
some threshold of the model
4. Repeat 1-3 S times and return the
highest scoring model
In this example, the model parameters
would be the radius and the circle centre.
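
A minimal sketch of this loop, fitting a 2D line y = m·x + c instead of a circle; the iteration count, threshold, and seed are hypothetical defaults:

```python
import numpy as np

def ransac_line(points: np.ndarray, s_iters: int = 100,
                threshold: float = 0.5, seed: int = 0):
    """Fit a line y = m*x + c to 2D points while ignoring outliers."""
    rng = np.random.default_rng(seed)
    best_model, best_inliers = None, 0
    for _ in range(s_iters):
        # 1. Sample k = 2 data points randomly.
        i, j = rng.choice(len(points), size=2, replace=False)
        (x1, y1), (x2, y2) = points[i], points[j]
        if np.isclose(x1, x2):
            continue  # this sketch skips degenerate (vertical) samples
        # 2. Solve for the model parameters (slope and intercept).
        m = (y2 - y1) / (x2 - x1)
        c = y1 - m * x1
        # 3. Score: count points within `threshold` of the line.
        inliers = int(np.sum(np.abs(points[:, 1] - (m * points[:, 0] + c)) < threshold))
        # 4. Keep the highest-scoring model across all iterations.
        if inliers > best_inliers:
            best_model, best_inliers = (m, c), inliers
    return best_model, best_inliers
```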
RANSAC: Choosing the right S
● Let’s say that any given data point has
probability p of being an inlier
● We can therefore say that when we take k
samples, the probability that all of them are
inliers is p^k, and the probability that at least one
is not is 1 - p^k
● We can therefore say that the probability that
after S iterations, we did not find one sample
where all points were inliers is (1 - p^k)^S
● We can call the probability that we found one
sample with all inliers P, therefore 1 - P = (1 - p^k)^S
● We can now solve for S: S = log(1 - P) / log(1 - p^k),
and plug in the sample size k and the confidence P
we want to have that we’ve found one sample with all inliers.
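
A quick sketch of that formula, with made-up example numbers:

```python
import math

def ransac_iterations(p: float, k: int, P: float) -> int:
    """Iterations S needed so that, with confidence P, at least one
    sample of k points consists entirely of inliers."""
    return math.ceil(math.log(1 - P) / math.log(1 - p**k))

# Hypothetical values: 70% inlier rate, 2-point samples, 99% confidence.
print(ransac_iterations(p=0.7, k=2, P=0.99))  # -> 7
```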
How would these techniques apply to robotics?
● Tag detection
○ Tags are known patterns that robots can use to predict the pose of something
○ For example, a checkerboard with known square size and number of squares
○ We’d use Canny edge detection to extract the borders
○ We’d use RANSAC to fit polygons to the lines
○ We’d then use geometry to determine the pose based on the size and angles of the shapes
● Plane detection
○ We have 3D data from a 3D camera or lidar
○ We want to detect objects on the ground
○ We use RANSAC to fit a plane to the data
○ We subtract the plane from our data and find clusters that intersect with that plane
● Object detection
○ We have a 2D camera
○ We want to detect bottle caps on the floor
○ We use Canny edge detection to extract edges
○ We use RANSAC to detect circles
