4.2 Stereo Vision - Geometry
Lu Yang
Email: yanglu@uestc.edu.cn
School of Automation Engineering, UESTC
What is Stereo Vision
[Figure: left and right viewing rays Ll and Lr through pl and pr]

Stereo Vision = Correspondences + Reconstruction:


• Correspondence: Given a point pl in one image, find the
corresponding point pr in the other image.
• Reconstruction: Given a correspondence (pl, pr), compute
the 3D coordinates P of the corresponding point in space.
◇ Stereo vision is the process of recovering the three-
dimensional location of points in the scene from their
projections in images.

◇ Types: Window/Local-based algorithms and Global algorithms

◇ Triangulation: Given pl, we know that the point P lies on the line Ll joining pl and the left optical center Cl; similarly, P lies on the line Lr joining pr and the right optical center Cr. Assuming we know the parameters of the cameras exactly, we can explicitly compute the parameters of Ll and Lr. Therefore, we can compute the intersection of the two lines, which is the point P.
Local Stereo Algorithm Steps
◇ Matching cost computation: SSD, SAD, MSE, MAD
◇ Cost aggregation
◇ Disparity computation and optimization
◇ Disparity refinement
Finding Correspondences
Sparse and dense correspondences
How to solve the correspondence problem

Do we need a 2D search domain? No! Thanks to the epipolar constraint, the search for the corresponding point reduces to a 1D search along the epipolar line.
Rectification
Motivation: Images are almost always rectified before
searching for correspondences in order to simplify the
search.
Implementation: Rectifying the input images so that
corresponding horizontal scanlines are epipolar lines;
Matching horizontal scanlines independently while
computing matching scores.

Simple solution: rotate both cameras so that they are looking perpendicular to the line joining the camera centers c0 and c1.
• Given a plane P in space, there exist two homographies Hl and Hr that map each image plane onto P.

• If we map both images to a common plane P such that P is parallel to the line ClCr, then the pair of rectified images is such that the epipolar lines are parallel.

• With a proper choice of the coordinate system, the epipolar lines are parallel to the rows of the image.
The algorithm for rectification is then:
• Select a plane P parallel to CrCl
• Define the left and right image coordinate systems on P
• Construct the rectification matrices Hl and Hr from P
and the virtual image’s coordinate systems.
Example from Seitz/Szeliski
Disparity

d = xl - xr

• Assuming that images are rectified to simplify things, given two corresponding points pl and pr, the difference of their coordinates along the epipolar line, xl - xr, is the disparity d.
• The disparity is the quantity that is directly measured from
the correspondence.
• It turns out that the position of the corresponding 3-D
point P can be computed from pl and d, assuming that the
camera parameters are known.
Disparity

d* = arg min_d Σ_{(x,y) ∈ W} ψ(Il(x, y), Ir(x - d, y))
Stereo Matching
The problem of stereo matching is to find the coordinate xr of the corresponding pixel in the same row in the right image. The difference d = xl - xr is called the disparity at that pixel.

The basic matching approach is to take a window W centered at the left pixel, translate that window by d, and compare the intensity values in W in the left image and W translated in the right image. The comparison metric typically has the form:

S(d) = Σ_{(x,y) ∈ W} ψ(Il(x, y), Ir(x - d, y))
Matching Functions
SSD (Sum of Squared Differences): The SSD tends to magnify pixel errors because of the quadratic function of the difference. This can cause problems in noisy images.

ψ(Il(x, y), Ir(x - d, y)) = (Il(x, y) - Ir(x - d, y))²

SAD (Sum of Absolute Differences): It is better behaved than the SSD because it does not increase quadratically with pixel error. Depending on the architecture, it may be faster.

ψ(Il(x, y), Ir(x - d, y)) = |Il(x, y) - Ir(x - d, y)|
Correlation: ψ is the product of the intensities. The maximum of correlation corresponds to the best match. Correlation is roughly equivalent to SSD. It really should be used with normalization.

ψ(Il(x, y), Ir(x - d, y)) = Il(x, y) · Ir(x - d, y)

Normalized Correlation: The normalized correlation has the advantage that it is always between -1 and +1, with +1 for a perfect match. It is obviously more expensive to compute.

ψ(Il(x, y), Ir(x - d, y)) = (Il(x, y) · Ir(x - d, y) - Īl · Īr) / (σl · σr(d))

where Īl, Īr are the window means and σl, σr the window standard deviations.
[Figure: left image, right image, and the resulting disparity image]

• Disparity is the inverse of the depth
• Larger disparity for points closer to the cameras
Matching Algorithm Implementation:

for x = 1:xsize
  for y = 1:ysize                       % each pixel
    Sbest = MAX;
    for d = dmin:dmax                   % each disparity level
      S(d) = 0;
      for u = x-w:x+w                   % cost aggregation over the window
        for v = y-w:y+w
          S(d) = S(d) + ψ(Il(u,v), Ir(u-d,v))
      if (S(d) < Sbest)                 % keep the best matching value
        Sbest = S(d)
        dbest(x,y) = d

Note that:
• The loops in x, y and d can be inverted.
• This looks like an enormous amount of computation. It turns out that it can be done efficiently by re-using partial results.
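The loop above can be turned into a small runnable sketch. Assumptions (not from the slides): rectified images stored as lists of rows, SAD as the cost ψ, and d = xl - xr searched over a fixed range; no re-use of partial results, so this is the brute-force version.

```python
def block_match(left, right, w, dmin, dmax):
    # Brute-force SAD block matching on rectified images (lists of rows).
    # Returns a disparity map with d = xl - xr, searched over [dmin, dmax].
    h, width = len(left), len(left[0])
    disp = [[0] * width for _ in range(h)]
    for y in range(w, h - w):
        for x in range(w, width - w):
            sbest, dbest = float("inf"), 0
            for d in range(dmin, dmax + 1):
                if x - d - w < 0 or x - d + w >= width:
                    continue  # window would fall outside the right image
                s = sum(abs(left[v][u] - right[v][u - d])
                        for v in range(y - w, y + w + 1)
                        for u in range(x - w, x + w + 1))
                if s < sbest:
                    sbest, dbest = s, d
            disp[y][x] = dbest
    return disp
```

On a synthetic pair where the right image is the left shifted by two pixels, interior pixels come back with disparity 2.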
Recover the 3-D coordinates of the scene points

Assuming that the two cameras have parallel image planes, consider the similar triangles PClCr and Pplpr (with the pixel coordinates xl, xr measured from the centers of the two images):

(Z - f) / Z = (B - (xl - xr)) / B

Solving for Z gives:

Z = B·f / d,  where d = xl - xr

f - the focal length of the cameras
B - the baseline, i.e., the distance between the optical centers
Some important results:
• This relation is the fundamental relation of stereo.
• The depth is inversely proportional to the disparity.
• Once we know Z, the other two coordinates are derived using the standard perspective equations:

X = xl · Z / f,  Y = yl · Z / f
• Camera calibration: Intrinsic and extrinsic parameters.
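The relations Z = B·f/d and X = xl·Z/f, Y = yl·Z/f can be collected into one small helper. A minimal sketch: the function name and the example numbers are illustrative, and coordinates are assumed to already be in the calibrated, centered image frame.

```python
def reconstruct(xl, yl, d, B, f):
    # Depth from the fundamental stereo relation Z = B*f/d, then X, Y
    # from the perspective equations X = xl*Z/f, Y = yl*Z/f.
    Z = B * f / d
    return (xl * Z / f, yl * Z / f, Z)

# e.g. B = 0.2 (baseline), f = 400 (focal length in pixels), d = 4 pixels
print(reconstruct(10, 5, 4, 0.2, 400.0))
```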
Overview of a Stereo Vision System
Error Analysis
Consider a matching pair of disparity d corresponding to a
depth Z. We want to evaluate ∆Z, the error in depth due to
the error in disparity. Taking the derivative of Z as a function
of d, we get:

∂Z/∂d = -B·f / d² = -Z² / (B·f)

The fundamental relation between baseline, focal length and accuracy of stereo reconstruction:

ΔZ = (Z² / (B·f)) · Δd
Depth: The resolution of the stereo reconstruction decreases quadratically with depth. This implies a severe limitation on the applicability of stereo (motivating sub-pixel disparity interpolation).
Baseline: The resolution improves as the baseline increases.
However, the matching becomes increasingly difficult as the
baseline increases.
Focal length: For a given image size, the density of pixels in
the image plane increases as f increases. Therefore, the
disparity resolution is higher.
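A quick numeric illustration of ΔZ = Z²·Δd/(B·f). The values of B, f and Δd here are assumed for the example, not taken from the slides.

```python
# Assumed numbers: baseline B = 0.1 m, focal length f = 500 px,
# disparity error Δd = 0.5 px.
B, f, dd = 0.1, 500.0, 0.5

def depth_error(Z):
    # ΔZ = Z^2 * Δd / (B * f): the error grows quadratically with depth.
    return Z * Z * dd / (B * f)

for Z in (1.0, 2.0, 4.0):
    print(f"Z = {Z:.0f} m -> depth error ~ {depth_error(Z):.2f} m")
```

Doubling the depth quadruples the error, which is exactly the quadratic loss of resolution described above.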
The point P is reconstructed as the point that is closest to both lines.

• Triangulation assumes that the viewing rays from the left and right cameras intersect exactly.
• In practice, the two viewing rays pass close to each other but do not exactly intersect, due to small errors in calibration.
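One standard way to realize "closest to both lines" is the midpoint method: find the closest point on each ray by solving a 2x2 linear system, then average the two. This is a sketch of that construction, not necessarily the particular implementation behind the slides.

```python
def midpoint_triangulate(c0, u0, c1, u1):
    # Closest point to two (possibly skew) viewing rays c0 + t0*u0 and
    # c1 + t1*u1: solve the 2x2 normal equations for t0, t1, then return
    # the midpoint of the two closest points.
    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))
    w = [a - b for a, b in zip(c0, c1)]
    a, b, c = dot(u0, u0), dot(u0, u1), dot(u1, u1)
    d, e = dot(u0, w), dot(u1, w)
    denom = a * c - b * b          # zero only for parallel rays
    t0 = (b * e - c * d) / denom
    t1 = (a * e - b * d) / denom
    p0 = [x + t0 * y for x, y in zip(c0, u0)]
    p1 = [x + t1 * y for x, y in zip(c1, u1)]
    return [(x + y) / 2 for x, y in zip(p0, p1)]
```

When the rays do intersect, the midpoint coincides with the intersection; when they are skew, it splits the residual gap evenly between the two rays.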
Sub-Pixel Disparity
◎ The disparity is computed by moving a window
one pixel at a time. As a result, the disparity is
known only up to one pixel.

◎ This limitation on the resolution of the disparity translates into a severe limitation on the accuracy of the recovered 3-D coordinates.

◎ One effective way to address this problem is to recover the disparity at a finer resolution by interpolating between the pixel disparities using quadratic interpolation.
• We can obtain a second order approximation of the
(unknown) function S(d) by approximating S by a
parabola.
• At the position dopt corresponding to the bottom of the parabola, we have S(dopt) ≤ S(d).

dopt = (S(-1) - S(1)) / (2 (S(1) + S(-1) - 2 S(0)))
How to find this approximating parabola

Let us first translate all the disparity values so that the integer-pixel minimum is at d = 0.


The equation of a general parabola is:

S(d) = a d² + b d + c

To recover the 3 parameters of the parabola we need 3 equations, which we obtain by writing that the parabola passes through the points at disparities 0, -1, and +1:

S(0) = c;  S(1) = a + b + c;  S(-1) = a - b + c.

Solving this, we obtain:


c = S(0), a = ( S(1) + S(-1) - 2 S(0))/2, b = ( S(1) - S(-1))/2.
The bottom of the parabola is obtained at dopt such that S′(d) = 2 a d + b = 0. Therefore, the optimal disparity is obtained as:

dopt = -b / (2a) = (S(-1) - S(1)) / (2 (S(1) + S(-1) - 2 S(0)))

• Estimating the disparity with a fraction of a pixel of resolution is possible using this interpolation approach.
• The denominator is close to 0 if the function S is mostly
flat, in which case there is no valid estimate of the
disparity.
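The closed-form refinement above fits in a few lines. A sketch with illustrative names: the inputs are the cost values at the integer minimum and its two neighbors, and the flat-cost threshold is an assumed parameter.

```python
def subpixel_disparity(s_m1, s_0, s_p1, flat_eps=1e-12):
    # Parabolic refinement around the integer minimum d0:
    #   dopt = (S(-1) - S(+1)) / (2*(S(+1) + S(-1) - 2*S(0))),
    # an offset in (-1, +1) to add to d0. Returns None when the
    # denominator is ~0 (flat cost curve: no reliable estimate).
    denom = 2.0 * (s_p1 + s_m1 - 2.0 * s_0)
    if abs(denom) < flat_eps:
        return None
    return (s_m1 - s_p1) / denom
```

For the cost samples of an exact parabola S(d) = (d - 0.25)², the function recovers the offset 0.25; for a flat cost curve it reports that no valid estimate exists, matching the denominator caveat above.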
Matching Confidence
Stereo matching assumes that there is enough information
in the images to discriminate between different positions
of the matching window.

That is not the case in regions of the image in which the intensity is nearly constant. In those regions, there is not a single minimum of the matching function and the disparity cannot be computed.
Two ways of detecting that the disparity estimate is
unreliable
• To look at the curvature of the matching function near its minimum: using the standard deviation of the matching values for a few values of the disparity around the minimum, or using the curvature of the fitted parabola.
• To first apply a filter to the input images, such that the output of the filter at a pixel is high if there is enough information in its neighborhood. The main criterion is that there is enough variation of intensity in the direction of matching, i.e., along the x direction (for example, Σ Ix² over the window, used as a confidence measure of the reliability of matching).
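The texture criterion in the second bullet can be sketched as a tiny filter. Assumptions: the window is a list of rows, the gradient Ix is approximated by forward differences, and the squared-gradient form is one of several reasonable choices.

```python
def x_gradient_energy(window_rows):
    # Confidence proxy: sum of squared horizontal intensity differences
    # over the window. Near-constant regions score ~0 and should be
    # flagged as unreliable before any disparity is accepted there.
    return sum((row[i + 1] - row[i]) ** 2
               for row in window_rows for i in range(len(row) - 1))
```

A flat window scores zero, so a simple threshold on this value rejects exactly the regions where the matching function has no single minimum.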
Lighting Issues
Problem: The lighting conditions may be substantially different between the left and right image.

Reason: For example, different exposures or different settings of the cameras.

Because ψ measures directly the difference in pixel values, its value will be corrupted.

Two ways to address this problem: normalized correlation (expensive), or filtering out smooth photometric variations, e.g., with the Laplacian of Gaussian discussed below.
A simple model:

Il(x, y) = a Iol(x, y) + b x + c,

where Il is the actual image and Iol is the "ideal" image. This amounts to applying a scaling followed by a linear ramp across the image. Using the SSD, and assuming the ideal images match perfectly (Iol(x, y) = Ir(x - d, y)), we have:

ψ(d) = (Il(x, y) - Ir(x - d, y))² = ((a - 1) Iol(x, y) + b x + c)²

• Even if the ideal images match perfectly, the matching function on the actual images can be arbitrarily far from 0.
• The normalized correlation reduces this problem by
eliminating the terms a and c. However, it is still corrupted
by the ramping factor b.
Lighting Conditions (Photometric Variations)
Laplacian of Gaussian (LOG)
A better way of eliminating smooth variations across the
images is to take the second derivative, because the
second derivative of a linear function of the form bx+c is 0,
and therefore eliminates this component.

More generally, smoothly varying parts of the image do not carry much information for matching. The useful information is contained in higher-frequency variations of intensity.
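An illustrative check of the claim above: a discrete second derivative cancels any b·x + c lighting ramp (here with scaling a = 1), which is why second-derivative (LOG-filtered) images match more robustly. The signal values are made up for the demonstration.

```python
# "Ideal" 1-D signal and the same signal corrupted by a ramp b*x + c
# (a = 1, b = 3, c = 10 are arbitrary illustrative constants).
ideal = [u * u for u in range(8)]
ramped = [v + 3 * u + 10 for u, v in enumerate(ideal)]

def second_diff(s):
    # Discrete second derivative: s[i-1] - 2*s[i] + s[i+1].
    return [s[i - 1] - 2 * s[i] + s[i + 1] for i in range(1, len(s) - 1)]

print(second_diff(ideal) == second_diff(ramped))  # the ramp is eliminated
```

Note that the second derivative removes only the linear component b·x + c; a gain a ≠ 1 still requires normalization, as noted in the normalized-correlation discussion.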
Laplacian Pyramid Implementation
Effect of Window Size
Window size has qualitatively the same effect as smoothing:

Localization is better with smaller windows. In particular, the disparity image is corrupted less near occluding edges because of better-localized matches.

Matching is better with larger windows because more pixels are taken into account, and there is therefore better discrimination between window positions.
Window Shape and Foreshortening
Assumption: The window around Pl is compared to a
window of the same size and shape around Pr.

In fact, given a square window in the left image, the corresponding pixels in the right image do not necessarily lie in a square window (let alone one of the same size).
Thus, matching results may be very poor by using the
same window in the left and right image to do the
matching.

This is because the perspective projection distorts the geometry and does not preserve angles or parallelism. Foreshortening: things farther away appear smaller.
Window Shape: Fronto-parallel Configuration
Ambiguity
□ Problem: In many cases, several positions along the
epipolar line can match the window around the left pixel.

□ If the ambiguous matches are far apart, they will correspond to very different points in space, thus leading to large 3-D reconstruction errors.

□ It is nearly impossible to eliminate all such ambiguities, especially in environments with lots of regular structures.

□ Solution: use more than two cameras in stereo.


Example: SRI Videre
Example: Point Grey Systems
Example: (old) JPL
• Large baseline
• 5-10Hz
• LOG pyramid
• Rectification
• Sub-pixel parabolic interpolation
• More recent: Real-time, high resolution
• http://www.middlebury.edu/stereo/
• www.ptgrey.com
• www.ai.sri.com/~konolige/svs
D. Scharstein and R. Szeliski. A taxonomy and evaluation of dense two-frame stereo correspondence algorithms. International Journal of Computer Vision, 47(1):7-42, May 2002.
Stereo evaluation
Stereo—best algorithms
