Realtime Omnidirectional Stereo For Obstacle Detection and Tracking in Dynamic Environments
This paper describes a realtime omnidirectional stereo system and its application to the detection of static and dynamic obstacles. The system uses a pair of vertically-aligned omnidirectional cameras. The input images are converted to panoramic images, in which epipolar lines become vertical and parallel. The stereo matching is performed by a PC cluster system to realize realtime range calculation for a relatively large image size. For obstacle detection, a map of static obstacles is first generated. Then candidates for moving obstacles are extracted by comparing the current observation with the map. The temporal correspondence between the candidates is examined based on their estimated position and velocity, which are calculated using a Kalman filter-based tracker.
1 Introduction
Detection of obstacles and free spaces is an essential function of the vision system for mobile robots. Even if a robot is given a map of the environment, this function is indispensable for coping with unknown, possibly dynamic, obstacles or with errors in the map. Many works (e.g., [8]) use laser range finders to detect obstacles. Laser range finders that scan a 2D plane, however, have the drawback that only objects at a specific height can be detected. Stereo vision is also used widely (e.g., [4]). Conventional stereo systems suffer from a narrow field of view because they usually use a pair of ordinary cameras.
In the case of mobile robots, it is sometimes necessary to obtain 360-degree panoramic range information because obstacles may approach from various directions. There are two methods to obtain panoramic images. One is to capture a sequence of images while rotating a camera and then to integrate the images into one panoramic image (e.g., [5, 10]). Although this method can produce a high-resolution image, it takes a long time to acquire an image and is not suitable for robots in dynamic environments. The other method is to use a special lens or mirror to obtain an omnidirectional image at once. Although the image resolution is low, its realtime nature is suitable for mobile robot applications.
[Figure: Projection geometry of the hyperboloidal omnidirectional camera. A 3-D point P(X, Y, Z) is reflected by the hyperboloidal mirror, whose focal point is OM, and is projected to the point p(x, y) on the image plane.]

A 3-D point P(X, Y, Z) is projected onto the image point p(x, y) by

x = \frac{X f (b^2 - c^2)}{(b^2 + c^2)(Z - c) - 2bc\sqrt{(Z - c)^2 + X^2 + Y^2}},

y = \frac{Y f (b^2 - c^2)}{(b^2 + c^2)(Z - c) - 2bc\sqrt{(Z - c)^2 + X^2 + Y^2}},

where the hyperboloidal mirror is given by

\frac{X^2 + Y^2}{a^2} - \frac{Z^2}{b^2} = -1,

a and b are the parameters of the mirror shape, c = \sqrt{a^2 + b^2}, and f is the focal length of the camera.
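As an illustration, the projection above can be coded directly. The following is a minimal sketch; the mirror parameters a and b, the focal length f, and the image center (cx, cy) are placeholder values, not those of the actual system.

import math

def hyperboloidal_projection(X, Y, Z, a=25.0, b=35.0, f=600.0, cx=320.0, cy=240.0):
    """Project a 3-D point (X, Y, Z), given in the mirror-centered frame,
    onto the omnidirectional image using the projection equations above.
    a, b: mirror shape parameters; f: focal length in pixels;
    (cx, cy): image center.  All numeric defaults are placeholders."""
    c = math.sqrt(a * a + b * b)        # distance to the mirror focal point
    denom = (b * b + c * c) * (Z - c) \
            - 2.0 * b * c * math.sqrt((Z - c) ** 2 + X * X + Y * Y)
    x = X * f * (b * b - c * c) / denom
    y = Y * f * (b * b - c * c) / denom
    return cx + x, cy + y               # pixel coordinates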
The horizontal configuration has two drawbacks: a limited field of view and large deformation in the peripheral areas of the converted images. In addition, to obtain a 360-degree range image, we would have to use at least three HOVIs placed, for example, on the vertices of a regular triangle.
The other configuration is to align HOVIs vertically (see Fig. 1(b)). Each omnidirectional image is converted to a panoramic image on a cylindrical image plane whose axis is aligned with those of the HOVIs. By this conversion, all epipolar lines become vertical in the panoramic images.
Based on the above consideration, we selected the latter configuration, which is the same as the one proposed by Gluckman et al. [3]. Fig. 2 shows a pair of vertically-aligned omnidirectional cameras. In the current setup, since each omnidirectional camera is supported by a cantilever attached to a vertical bar, there is a blind spot in the omnidirectional images.
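As a rough illustration of the conversion, the following sketch unwarps an omnidirectional image into a 720x100 panoramic image (one column per 0.5 degrees, matching the resolution quoted later). The image center (cx, cy) and the inner and outer sampling radii are assumed to be known; a simple linear row-to-radius mapping is used in place of the exact mapping derived from the mirror geometry (the equation for r given below), and nearest-neighbor sampling is used for brevity.

import numpy as np

def to_panoramic(omni, cx, cy, r_inner, r_outer, out_w=720, out_h=100):
    """Unwarp an omnidirectional image into a cylindrical panoramic image.
    Each column corresponds to an azimuth angle; each row samples the
    omnidirectional image at a radius between r_outer (top row) and
    r_inner (bottom row), since the lower part of the panorama comes from
    the central part of the omnidirectional image."""
    pan = np.zeros((out_h, out_w), omni.dtype)
    for col in range(out_w):
        phi = 2.0 * np.pi * col / out_w            # azimuth of this column
        for row in range(out_h):
            r = r_outer + (r_inner - r_outer) * row / (out_h - 1)
            u = int(round(cx + r * np.cos(phi)))   # sample position in the
            v = int(round(cy + r * np.sin(phi)))   # omnidirectional image
            if 0 <= v < omni.shape[0] and 0 <= u < omni.shape[1]:
                pan[row, col] = omni[v, u]
    return pan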
Fig. 4: Panoramic image obtained from the input image shown in Fig. 2.
Fig. 5: Panoramic disparity image obtained from the images in Fig. 4. Brighter pixels are nearer.
[Figure: Processing flow on the PC cluster: image input and panoramic conversion on PC0 and PC1, distribution of the panoramic images to PC2 -- PC5, parallel stereo matching, result collection, and result output.]
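The excerpt does not detail the matching criterion or the exact division of work, but the structure can be sketched as follows: SAD block matching along the vertical epipolar lines of the panoramic pair, with the images split into column strips that are matched in parallel (worker processes standing in here for PC2 -- PC5). The disparity range of 80 follows the paper; the window size and the search direction (the lower-camera panorama searched downward) are assumptions of this sketch.

import numpy as np
from multiprocessing import Pool

DISPARITIES = 80   # disparity search range in pixels, as in the paper
WINDOW = 7         # SAD window size (an assumed value)

def match_strip(args):
    """SAD block matching for one column strip of the panoramic pair.
    Epipolar lines are vertical, so candidates are vertical shifts."""
    upper, lower = (a.astype(np.int32) for a in args)
    h, w = upper.shape
    half = WINDOW // 2
    disp = np.zeros((h, w), np.uint8)
    for y in range(half, h - half):
        max_d = min(DISPARITIES, h - half - y)   # clip the search at the border
        for x in range(half, w - half):
            ref = upper[y - half:y + half + 1, x - half:x + half + 1]
            best_sad, best_d = None, 0
            for d in range(max_d):
                cand = lower[y + d - half:y + d + half + 1,
                             x - half:x + half + 1]
                sad = np.abs(ref - cand).sum()
                if best_sad is None or sad < best_sad:
                    best_sad, best_d = sad, d
            disp[y, x] = best_d
    return disp

def panoramic_disparity(upper_pan, lower_pan, workers=4):
    """Distribute column strips to worker processes and stitch the results.
    Strips are independent because the disparity search is purely vertical."""
    strips = list(zip(np.array_split(upper_pan, workers, axis=1),
                      np.array_split(lower_pan, workers, axis=1)))
    with Pool(workers) as pool:
        results = pool.map(match_strip, strips)
    return np.hstack(results)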
The radius r in the original omnidirectional image that corresponds to the vertical position y in the panoramic image is given by

r = \frac{f (b^2 - c^2)}{(b^2 + c^2)(U - Ly/h) - 2bc\sqrt{1 + (U - Ly/h)^2}}, \qquad U = \tan\theta_u, \quad L = \tan\theta_u + \tan\theta_l,
[Fig. 7: ∂r/∂y as a function of the vertical position y in the panoramic image.]
[Fig. 8: The obstacle region and the safe region for one range measurement, defined by Rmin(d), R(d), and Rmax(d) as seen from the robot.]
where θu and θl are the viewing angles above and below the focal point of the mirror, and h is the height of the panoramic image. From this equation, we can calculate how many pixels in the original omnidirectional image correspond to one pixel in the panoramic image at a specific vertical position, denoted as ∂r/∂y. Fig. 7 shows the calculated ∂r/∂y at all vertical positions; the lower part (larger y) of the panoramic image is degraded because that part corresponds to the central part of the original omnidirectional image, where the resolution is low compared with the peripheral areas. For the positions where ∂r/∂y is less than one, the uncertainty in disparity should be considered to be larger than one pixel. In that case, the possible range of the distance (see eq. (1)) is increased accordingly.
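For illustration, the row-to-radius mapping and the resulting ∂r/∂y can be evaluated numerically as in the following sketch; the mirror parameters, focal length, panorama height, and viewing angles used here are assumed values, not those of the actual system.

import math

def panoramic_to_radius(y, h, theta_u, theta_l, a=25.0, b=35.0, f=600.0):
    """Radius r in the original omnidirectional image corresponding to
    vertical position y (0 = top row) in a panoramic image of height h,
    following the equation for r above.  theta_u / theta_l: viewing angles
    above / below the focal point of the mirror, in radians."""
    c = math.sqrt(a * a + b * b)
    U = math.tan(theta_u)
    L = math.tan(theta_u) + math.tan(theta_l)
    t = U - L * y / h
    return f * (b * b - c * c) / ((b * b + c * c) * t
                                  - 2.0 * b * c * math.sqrt(1.0 + t * t))

def radial_resolution(y, h, theta_u, theta_l):
    """Approximate dr/dy by a central difference: how many pixels of the
    omnidirectional image map onto one panoramic pixel at row y."""
    return abs(panoramic_to_radius(y + 0.5, h, theta_u, theta_l)
               - panoramic_to_radius(y - 0.5, h, theta_u, theta_l))

# Example (assumed viewing angles): resolution near the top, middle, and
# bottom of a 100-pixel-high panorama.
for y in (10, 50, 90):
    print(y, radial_resolution(y, 100, math.radians(15), math.radians(30)))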
Using the above uncertainties, we interpret each point of disparity d in the range profile as follows. In Fig. 8, the angle indicates the angular resolution of the panoramic image (currently 0.5 degrees); R(d), Rmax(d), and Rmin(d) are the distance of the point from the robot and its maximum and minimum values mentioned above. We use the dark gray trapezoid in the figure to approximate the region where an obstacle may exist; we call this region an obstacle region. The area in front of the obstacle region, approximated by the light gray triangle and called a safe region, is the region where an obstacle never exists as long as the stereo matching is correct. Safe regions are used for making a map of static obstacles, while obstacle regions are used for detecting moving obstacles (see Sec. 4.1).
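A minimal geometric sketch of this construction follows: the obstacle region is built as a trapezoid between Rmin(d) and Rmax(d) spanning one angular step, and the safe region as a triangle from the robot out to Rmin(d). The exact geometry used in the paper may differ slightly from this approximation.

import math

ANG_RES = math.radians(0.5)   # angular resolution of the panoramic image

def obstacle_and_safe_regions(theta, r_min, r_max):
    """Return two polygons (lists of (x, y) points in the robot frame) for a
    range reading at bearing theta:
    - obstacle region: trapezoid between r_min and r_max over one angular step
    - safe region:     triangle from the robot (origin) out to r_min"""
    th0, th1 = theta - ANG_RES / 2.0, theta + ANG_RES / 2.0
    ray = lambda th, r: (r * math.cos(th), r * math.sin(th))
    obstacle = [ray(th0, r_min), ray(th1, r_min), ray(th1, r_max), ray(th0, r_max)]
    safe = [(0.0, 0.0), ray(th0, r_min), ray(th1, r_min)]
    return obstacle, safe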
3.2.2 Generation of Free Space Map
We use a grid representation for the map. To cope with false matches in stereo, we accumulate multiple observations to obtain reliable map information. Each grid of the map holds a counter that indicates how many times the grid has been observed to be safe. At each observation, the counter of each grid inside the safe regions is incremented. If the counter value of a grid is higher than a certain threshold, the grid is considered safe. The set of safe grids constitutes the current free space (called a free space map).
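The counter-based update can be sketched as follows; the map size, cell size, and threshold are assumed values, and the cells inside the safe regions (e.g., obtained by rasterizing the triangles from the previous sketch) are supplied by the caller.

import numpy as np

class FreeSpaceMap:
    """Grid map counting how often each cell has been observed to be safe."""
    def __init__(self, size=200, cell=0.05, threshold=5):
        self.counter = np.zeros((size, size), np.int32)
        self.cell = cell            # cell size in meters (assumed)
        self.origin = size // 2     # map origin at the center of the grid
        self.threshold = threshold  # observations needed to call a cell safe

    def _to_grid(self, x, y):
        return (int(round(x / self.cell)) + self.origin,
                int(round(y / self.cell)) + self.origin)

    def update(self, safe_points):
        """Increment the counter of every cell inside the safe regions of the
        current observation; safe_points is an iterable of (x, y) in meters."""
        for x, y in safe_points:
            i, j = self._to_grid(x, y)
            if 0 <= i < self.counter.shape[0] and 0 <= j < self.counter.shape[1]:
                self.counter[i, j] += 1

    def free_space(self):
        """Boolean grid of cells considered safe (the free space map)."""
        return self.counter >= self.threshold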
Each candidate for a moving obstacle is tracked with a Kalman filter. The state vector is

\boldsymbol{x}_t = (x_t, y_t, \dot{x}_t, \dot{y}_t)^T,

where (x_t, y_t) and (\dot{x}_t, \dot{y}_t) are the position and the velocity in the world coordinates at time t. The robot obtains the following observation \boldsymbol{y}_t for each moving obstacle:

\boldsymbol{y}_t = (x^o_t, y^o_t)^T,

where (x^o_t, y^o_t) is the observed position of the obstacle. Supposing a constant velocity of the moving obstacle, we obtain the following state transition equation and observation equation:

\boldsymbol{x}_{t+1} = F_t \boldsymbol{x}_t + \boldsymbol{w}_t,
\boldsymbol{y}_t = H_t \boldsymbol{x}_t + \boldsymbol{v}_t,

where \boldsymbol{w}_t and \boldsymbol{v}_t are the error terms: \boldsymbol{w}_t represents the acceleration of the obstacle, whose maximum allowed value (the so-called 3σ value) is set to 1 [m/s^2], and \boldsymbol{v}_t represents the observation error (see Sec. 3.2.1). The matrices in the above equations are given by

F_t = \begin{pmatrix} 1 & 0 & \Delta t & 0 \\ 0 & 1 & 0 & \Delta t \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}, \qquad
H_t = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & 0 & 0 \end{pmatrix},

where \Delta t is the time interval between observations.
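With Ft and Ht as above, one tracking cycle is a standard Kalman filter predict/update step. The following sketch assumes the process and observation noise covariances Q and R (which the paper derives from the 1 [m/s^2] acceleration bound and the range uncertainty of Sec. 3.2.1) are supplied by the caller.

import numpy as np

def make_matrices(dt):
    """Constant-velocity model matrices F_t and H_t from the text."""
    F = np.array([[1.0, 0.0, dt, 0.0],
                  [0.0, 1.0, 0.0, dt],
                  [0.0, 0.0, 1.0, 0.0],
                  [0.0, 0.0, 0.0, 1.0]])
    H = np.array([[1.0, 0.0, 0.0, 0.0],
                  [0.0, 1.0, 0.0, 0.0]])
    return F, H

def kalman_step(x, P, z, dt, Q, R):
    """One predict/update cycle for the state x = (x, y, x_dot, y_dot)^T and
    the observation z = (x_o, y_o)^T.  P is the state covariance."""
    F, H = make_matrices(dt)
    # predict
    x_pred = F @ x
    P_pred = F @ P @ F.T + Q
    # update
    S = H @ P_pred @ H.T + R                 # innovation covariance
    K = P_pred @ H.T @ np.linalg.inv(S)      # Kalman gain
    x_new = x_pred + K @ (z - H @ x_pred)
    P_new = (np.eye(4) - K @ H) @ P_pred
    return x_new, P_new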
[Fig. 11: A tracking result. The map shows the free space, static obstacles, moving obstacle candidates, the tracking result, and the movement of the robot.]
5 Experimental Results
The total processing time for omnidirectional stereo, visual localization, update of the free space map, and dynamic object detection and tracking is currently about 0.38 [s] per frame.
Fig. 11 shows a tracking result in the case where one person was approaching the robot. In the earlier frames (left), there are several branches in the tracking tree (i.e., several traces exist). As time elapses (right frames), all branches except the true one are deleted. Fig. 12 shows the projection of the estimated position of the person onto the panoramic images. The figure shows that the tracking was performed correctly.
6 Conclusion
We have developed a realtime omnidirectional stereo system using two omnidirectional cameras and a PC cluster. The system can generate omnidirectional range data of 720x100 pixels with a disparity range of 80 at about 5 frames per second. We have also developed a method of detecting static and moving obstacles from the omnidirectional range data and tracking the moving obstacles with a Kalman filter-based tracker.
Acknowledgments
The authors would like to thank Prof. Yagi of Osaka University for his useful advice on omnidirectional cameras. This research is supported in part by the Kurata Foundation, Tokyo, Japan.
References
[1] I.J. Cox. A Review of Statistical Data Association Techniques for Motion Correspondence. Int. J. of Computer Vis., Vol. 10, No. 1, pp. 53-66, 1993.
[2] O. Faugeras et al. Real-Time Correlation-Based Stereo: Algorithm, Implementation and Application. Tech. Report 2013, INRIA Sophia Antipolis, 1993.
[3] J. Gluckman, S.K. Nayar, and K.J. Thoresz. Real-Time Omnidirectional and Panoramic Stereo. In Proc. of Image Understanding Workshop, 1998.
[4] S. Kagami, K. Okada, M. Inaba, and H. Inoue. Design and Implementation of Onbody Real-time Depthmap Generation System. In Proc. ICRA-2000, pp. 1441-1446, 2000.
[5] S.B. Kang and R. Szeliski. 3-D Scene Data Recovery Using Omnidirectional Multibaseline Stereo. Int. J. of Computer Vis., Vol. 25, No. 2, pp. 167-183, 1997.
[6] T. Katayama. Application of Kalman Filter. Asakura Shoten, 1983. (in Japanese).
[7] K. Kidono, J. Miura, and Y. Shirai. Autonomous Visual Navigation of a Mobile Robot Using a Human-Guided Experience. In Proc. IAS-6, pp. 620-627, 2000.
[8] E. Prassler and J. Scholz. Tracking Multiple Moving Objects for Real-Time Navigation. Autonomous Robots, Vol. 8, No. 2, pp. 105-116, 2000.
[9] K. Yamazawa, Y. Yagi, and M. Yachida. Omnidirectional Imaging with Hyperboloidal Projection. In Proc. IROS-93, pp. 1029-1034, 1993.
[10] J.Y. Zheng and S. Tsuji. Panoramic Representation of Scenes for Route Understanding. In Proc. 10th ICPR, pp. 161-167, 1990.