3D Object Reconstruction
Introduction
3D reconstruction has many application fields, such as:
• Industry, for instance in the clothing industry (e.g. [1, 2]), on-line measurement …;
• … reconstruction (e.g. [21, 22]) or archeological documentation (e.g. [23, 24]);
• Security systems, like visual surveillance (e.g. [25, 26]) and biometric …
Fig. 3. Visual hull obtained from two different viewpoints (C1, C2)
Fig. 4. Color consistency: if the voxel is inside the object surface it will reproject
the same color onto all viewpoints where it is visible (left); otherwise, if the voxel
is outside the object surface it will most likely reproject distinct colors
Fig. 5. Relation between the photo hull and the visual hull: the real object is contained
inside the photo hull, which is in turn inside the visual hull
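The color-consistency test illustrated in Fig. 4 can be sketched as follows. This is a minimal sketch under our own assumptions: the per-channel standard-deviation test, the function name, and the threshold value are illustrative choices, not the criterion from the paper, and the voxel's reprojected colors are assumed to be already gathered from the views where it is visible:

```python
import numpy as np

def is_photo_consistent(colors, threshold=30.0):
    """Decide whether a voxel is photo-consistent.

    colors: (N, 3) array of RGB samples, one per viewpoint where the
    voxel is visible. A voxel on the object surface reprojects nearly
    the same color in every view, so the per-channel spread stays
    small; a voxel off the surface mixes unrelated colors and shows a
    large spread.
    """
    colors = np.asarray(colors, dtype=float)
    spread = colors.std(axis=0).max()  # worst channel's std-dev
    return bool(spread < threshold)

# A surface voxel: almost identical reprojected colors -> kept.
on_surface = [[200, 30, 30], [198, 32, 29], [201, 31, 31]]
# An off-surface voxel: very different colors -> carved away.
off_surface = [[200, 30, 30], [20, 200, 40], [10, 10, 250]]

print(is_photo_consistent(on_surface))   # consistent
print(is_photo_consistent(off_surface))  # inconsistent
```

A voxel-carving loop would evaluate this test for every surface voxel and remove the inconsistent ones until the model stabilizes, which is the mechanism relating the photo hull to the visual hull in Fig. 5.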
Methodologies followed
In this work, SFM and GVC methods were tested on two objects with dif-
ferent shape properties: a simple parallelepiped and a human hand model.
The parallelepiped has a straightforward topology, with flat orthogonal
surfaces whose vertices are easily detected in each image and readily
matched along the acquired image sequence. In contrast, the hand
model has a smooth surface and a more complicated shape.
SFM methodology
To test the SFM method, we follow the methodology proposed in [50],
summarized in Fig. 6:
1. the first step is to acquire two uncalibrated images, of the object to be
reconstructed, using a single off-the-shelf digital camera;
2. then, image feature points of the considered object are extracted. Feature
(or interest) points are those that reflect relevant discrepancies
between their intensity values and those of their neighbors.
Usually, these points represent vertices, and their correct detection
allows subsequent matching along the acquired image sequences. Many
algorithms for interest point detection are available, but feature
detectors based on Harris's principles, [51], are the most
commonly used;
3. after being extracted, feature points must be matched. The matching
process associates 2D points in sequential images that are
projections of the same 3D object point. Automatic detection of
matching points between images can be achieved using several cross-
correlation processes. They all use small image windows from a first
image as templates for matching in the subsequent images, [52]. The
most common matching methods include Normalized Cross-
Correlation, [53, 54], and Sum-of-Squared-Differences, [50, 55];
4. then the epipolar geometry is estimated. Epipolar geometry determines
a pairwise relative orientation and allows for the rejection of previous
false matches (outliers). When the interior orientation parameters
of both images are the same, it is mathematically expressed
by the fundamental matrix, a projective singular correlation between
the two images, [56]. At least 7 matches are required to compute the
fundamental matrix, but to cope with possible outliers, robust
estimation methods are required. In general, the RANSAC (RANdom
SAmple Consensus) algorithm, [57], achieves a robust estimation of
the epipolar geometry;
5. the next step is image rectification: projecting the two stereo
images onto a common plane, such that pairs of conjugate epipolar
lines (derived from the fundamental matrix) become collinear and
parallel to one of the image axes. Performing this step simplifies the
subsequent dense matching, because the search problem is
reduced to 1D;
6. finally, dense matching is performed, producing a disparity map. A
disparity map encodes the distance between the object and
the camera(s): closer points have maximal disparity and farther
points tend toward zero disparity. In short, a disparity map gives some
perception of depth discontinuities (a 2.5D reconstruction).
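The window-based matching of step 3 can be sketched with a minimal Normalized Cross-Correlation search. This is our own illustration on a synthetic image; the function names, window size, and test data are assumptions, not the implementation used in the paper:

```python
import numpy as np

def ncc(window_a, window_b):
    """Normalized cross-correlation between two equally sized grayscale
    windows: 1.0 for identical patches (up to affine intensity changes),
    values near 0 or negative for unrelated patches."""
    a = window_a.astype(float).ravel()
    b = window_b.astype(float).ravel()
    a -= a.mean()
    b -= b.mean()
    denom = np.linalg.norm(a) * np.linalg.norm(b)
    return float(np.dot(a, b) / denom) if denom > 0 else 0.0

def best_match(template, image, size):
    """Slide `template` (size x size) over `image` and return the
    top-left corner of the window with the highest NCC score."""
    h, w = image.shape
    best, best_pos = -2.0, (0, 0)
    for r in range(h - size + 1):
        for c in range(w - size + 1):
            score = ncc(template, image[r:r + size, c:c + size])
            if score > best:
                best, best_pos = score, (r, c)
    return best_pos

# Tiny synthetic example: plant a distinctive patch and find it again.
rng = np.random.default_rng(0)
image = rng.integers(0, 50, size=(20, 20)).astype(float)
patch = rng.integers(200, 255, size=(5, 5)).astype(float)
image[7:12, 3:8] = patch
print(best_match(patch, image, 5))  # recovers the planted position (7, 3)
```

In a real stereo pair the template comes from the first image and the search runs over the second; after rectification (step 5), the search collapses to a single scanline, which is what makes dense matching (step 6) tractable.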
GVC methodology
To test the GVC method we follow the methodology proposed in [58], and
represented in Fig. 7.
First, a sequence of images of a chessboard calibration pattern is acquired;
the object to be reconstructed is then placed on a turntable device, with the
same chessboard pattern beneath it. Keeping the camera untouched, the second
sequence of images is acquired, spinning the turntable device until a full
rotation is performed.
Fig. 7. GVC methodology followed for the 3D reconstruction of objects
No restrictions are imposed on the number of images acquired, nor does the
rotation angle between two consecutive images of the second image sequence
need to be known.
Then, the camera is calibrated, in order to find the transformation
that maps the 3D world into the associated 2D image space. The calibration
procedure is based on Zhang's algorithm, [59]. The intrinsic parameters (focal
length and principal point) and distortion parameters (radial and tangential)
are obtained from the first image sequence; using the second image sequence,
the extrinsic parameters (rotation and translation) associated with
each viewpoint considered in the reconstruction process are determined.
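Once intrinsics and extrinsics are known, every world point maps to a pixel through the pinhole projection x ~ K(RX + t). A minimal sketch of that mapping follows; the numeric values of K, R, and t are made up for illustration, not taken from the paper's calibration:

```python
import numpy as np

def project(X_world, K, R, t):
    """Project a 3D world point into pixel coordinates using the
    pinhole model: x ~ K (R X + t), followed by perspective division."""
    X_cam = R @ X_world + t   # world -> camera frame (extrinsic parameters)
    x = K @ X_cam             # camera frame -> image plane (intrinsic parameters)
    return x[:2] / x[2]       # perspective division yields pixel coordinates

# Illustrative intrinsics: focal length 800 px, principal point (320, 240).
K = np.array([[800.0,   0.0, 320.0],
              [  0.0, 800.0, 240.0],
              [  0.0,   0.0,   1.0]])
R = np.eye(3)                  # camera axes aligned with the world axes
t = np.array([0.0, 0.0, 4.0])  # object 4 units in front of the camera

print(project(np.array([0.1, -0.2, 0.0]), K, R, t))  # -> [340. 200.]
```

In the GVC pipeline this projection is evaluated in the opposite direction of carving: each candidate voxel is projected into every calibrated view to collect the colors used by the photo-consistency test.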
Then, to obtain the object silhouettes from the input images, image
segmentation is performed. This step is required because, even when the
scene background has low color variation, the photo-consistency criterion
may not be sufficient for accurate 3D reconstructions, [60]. Also, since the
calibration pattern rotates along with the object to be reconstructed, it is
not considered background and would consequently be reconstructed as if it
were part of the object of interest. Images are segmented here by first
removing the red and green channels from the original RGB images and then
binarizing the result using a user-defined threshold value.
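The segmentation rule described above (drop the red and green channels, then binarize the remaining channel with a user-chosen threshold) can be sketched as follows; this is our own minimal illustration on a synthetic image, with an arbitrary threshold value:

```python
import numpy as np

def segment(rgb_image, threshold=128):
    """Keep only the blue channel (i.e. remove red and green) and
    binarize it: pixels whose blue value exceeds the threshold are
    labeled foreground (1), the rest background (0)."""
    blue = rgb_image[:, :, 2]
    return (blue > threshold).astype(np.uint8)

# Synthetic 4x4 image: a bright-blue 2x2 "object" on a dark background.
img = np.zeros((4, 4, 3), dtype=np.uint8)
img[1:3, 1:3, 2] = 220  # strong blue inside the object region
mask = segment(img, threshold=128)
print(mask)
print(mask.sum())  # 4 foreground pixels
```

A real silhouette extraction would follow this with morphological cleanup of the binary mask, but the channel-drop-plus-threshold core is the rule stated in the text.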
Combining the original image sequence and associated silhouette im-
ages, and considering the previously obtained camera calibration parame-
ters, the 3D models are built using the GVC volumetric method imple-
mented in [61].
Finally, the volumetric model obtained is polygonized and smoothed using
the Marching Cubes algorithm, [62]. Basically, this algorithm extracts
a polygonal surface from the volumetric data: it proceeds through
the voxelized model and, for each voxel, determines the polygon(s)
needed to represent the patch of the isosurface that passes through that
voxel.
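The per-voxel logic of Marching Cubes can be illustrated without its full case lookup table: for each cell of the volume, sample its 8 corner values against the iso-level and keep only the cells whose corners straddle the surface, since these are the cells that receive polygons. The sketch below shows that classification step only, on a synthetic voxelized sphere; it is not a full Marching Cubes implementation, and all names are our own:

```python
import numpy as np

def surface_cells(volume, iso=0.5):
    """Return the integer indices of grid cells whose 8 corner samples
    straddle the iso-level: such cells intersect the isosurface and
    would be polygonized by Marching Cubes; uniform cells are skipped."""
    # Gather the 8 corner values of every cell via shifted views.
    corners = np.stack([volume[dx:volume.shape[0] - 1 + dx,
                               dy:volume.shape[1] - 1 + dy,
                               dz:volume.shape[2] - 1 + dz]
                        for dx in (0, 1) for dy in (0, 1) for dz in (0, 1)])
    inside = corners > iso
    # Mixed corner states -> the surface passes through the cell.
    crossing = inside.any(axis=0) & ~inside.all(axis=0)
    return np.argwhere(crossing)

# Voxelized solid sphere of radius 4 in a 12^3 occupancy grid.
n = 12
g = np.indices((n, n, n)).astype(float)
dist = np.sqrt(((g - (n - 1) / 2.0) ** 2).sum(axis=0))
volume = (dist <= 4.0).astype(float)

cells = surface_cells(volume)
print(len(cells), "surface cells")
```

The actual algorithm then looks up, for each crossing cell, which triangles to emit from the pattern of inside/outside corners, interpolating vertex positions along the crossing edges.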
Experimental results
In this section, some of the obtained experimental results for both followed
methodologies and both considered objects will be presented and analyzed.
SFM method
Fig. 8 shows the acquired stereo image pairs of both objects used in this
work.
Fig. 8. Stereo image pairs of the objects used to test the SFM reconstruction me-
thod
For both objects, 200 image features were extracted using the Harris
corner detector, [51], imposing a minimum distance between detected
features. Robust matching of features between the stereo images was
performed using the RANSAC algorithm, [57]. The results obtained can be
observed in Fig. 9. Since the hand model presents a smooth surface, many
wrong matches were detected and, consequently, the epipolar geometry was
incorrectly estimated.
Afterwards, both stereo pairs were rectified using the algorithm presented in
[63]. As observed in Fig. 10 and Fig. 11, the results were much less accurate
for the hand model, due to the wrong matches from the previous step, which
caused strong image distortion during rectification for this object.
Then, dense matching was performed using Birchfield's algorithm, [64].
The results obtained for both objects considered in this work can be
observed in Fig. 12 and Fig. 13. Again, owing to the incorrect results of
the previous steps, the dense matching for the hand model was of low
quality. For the parallelepiped object, the generated disparity map matches
reality considerably better.
Fig. 9. Results of the (robust) feature point matching for both objects considered:
green crosses represent the matched feature points of the first image and red
crosses represent the corresponding matched feature points of the second image
Fig. 10. Rectification results for the stereo images of the parallelepiped object
Fig. 11. Rectification results for the stereo images of the hand model object
Fig. 12. Disparity map obtained for the parallelepiped object
Fig. 13. Disparity map obtained for the hand model object
GVC method
Fig. 14 shows some examples of the second image sequence acquired for
the 3D reconstruction of both objects using the GVC method.
Fig. 14. Three images used for the 3D reconstruction of the parallelepiped (top)
and the hand model (bottom)
For both objects considered, the results of the extrinsic calibration
procedure are represented in Fig. 15. The 3D graphics shown represent the
viewpoints considered in the second image acquisition process, considering
the world coordinate system fixed at the lower-left corner of the
chessboard pattern and the camera rotating around the object.
Fig. 15. 3D graphical representation of the extrinsic parameters obtained from the
camera calibration process for the parallelepiped object case, on the left, and for
the hand model case, on the right
Fig. 16. One example of image segmentation for the parallelepiped (top) and the
hand model (bottom): on the left, the original image; on the right, the binary image
obtained
Fig. 17. Two different viewpoints (by row) of the 3D model obtained for the pa-
rallelepiped case: on the left, original image; in the centre, voxelized 3D model;
on the right, polygonized and smoothed 3D model
Fig. 18. Two different viewpoints (by row) of the 3D model obtained for the hand
model case: on the left, original image; in the centre, voxelized 3D model; on the
right, polygonized and smoothed 3D model
Fig. 19. Three different viewpoints (by row) of the 3D model obtained for the tor-
so model case: on the left, original image; in the centre, voxelized 3D model; on
the right, polygonized and smoothed 3D model
Fig. 20. Comparison of the measurements obtained from the reconstructed 3D
models with the real objects' measurements
Conclusions
The main goal of this paper was to compare experimentally two commonly
used image-based methods for 3D object reconstruction: Structure From
Motion (SFM) and Generalized Voxel Coloring (GVC).
To test and compare both methods, two objects with different shape
properties were used: a parallelepiped and a hand model.
Our adopted SFM methodology produced good results when the objects
present strong feature points that are easy to detect and match along
the input images. However, we can conclude that even small errors in the
matching or in the epipolar geometry estimation can seriously compromise
the success of the remaining steps.
The models built using the GVC methodology were quite similar to the
real objects, both in shape and in color. Nevertheless, the reconstruction
accuracy was highly dependent on the quality of the results of the
camera calibration and image segmentation steps. These can be two major
drawbacks in real-world scenes, because they can limit the application of
the GVC method. Moreover, the reflectance of the objects' surfaces is an
aspect that must be considered for more accurate 3D reconstructions. In
summary, we can conclude that, in controlled environments, the GVC
methodology is capable of obtaining adequate static 3D reconstructions of
objects from images. In addition, its major advantage may be the fact that
it is fully automatic and suitable for many real applications.
Thus, when comparing the two methods, we can conclude that, on one
hand, GVC performs better in the 3D reconstruction of objects with complex
shapes and, on the other hand, SFM is better suited to unconstrained
real-world object reconstruction.
Acknowledgements
References