Robust Model-Based 3D Object Recognition by Combining Feature Matching With Tracking
Incheol Kim
We propose a unified framework for object recognition and tracking for a service robot working in indoor environments. To cope with object variations, we propose modified local Zernike moments that are robust to environmental changes such as illumination and pose changes. The recognized object is then tracked using Lie group methods, which simplify the representation and computation of motion. The initial 3D CAD model alignment is performed using the homography computed during object recognition.
Abstract - We propose a vision-based 3D object recognition and tracking system, which provides high-level scene descriptions such as object identification and 3D pose information. The system is composed of an object recognition part and a real-time tracking part. In object recognition, we propose a feature which is robust to scale, rotation, illumination change and background clutter. A probabilistic voting scheme maximizes the conditional probability defined by the features in correspondence to recognize an object of interest. As a result of object recognition, we obtain the homography between the model image and the input scene. In tracking, a Lie group formalism is used to cast the motion computation problem into simple geometric terms so that tracking becomes a simple optimization problem. An initial object pose is estimated using the predefined correspondences between the model image and the 3D CAD model, together with the homography which relates the model image to the input scene. Results from the experiments show the robustness of the proposed system.
I. INTRODUCTION
One of the most important tasks for an intelligent service robot is to identify objects of interest and to track them in an indoor environment. If a mobile robot is commanded to bring a certain object, it first has to recognize which objects are in the scene, then track the recognized object for further actions such as grasping and manipulation. Many methods have been proposed to solve object recognition and tracking separately, but seldom to solve both problems together. Object recognition is defined as the process of extracting information such as name, size, position, pose, and functions related to the object. In this paper, we restrict the definition to extracting object identification and pose information only. Recently, there has been much development in object recognition, but it still remains at the level of recognizing objects in a well-controlled environment. This limitation originates from object variations such as view angle changes and illumination changes. Furthermore, the task becomes difficult when an object is occluded by others or placed in a cluttered environment [1, 2]. To solve these problems, various approaches have been proposed using invariants, 3D CAD models and appearance [3-5]. Recently, reflecting the characteristic of the human visual object recognition methods based on local image
Figure 1 shows the proposed system of 3D object recognition by combining feature matching with tracking. The system consists of three components: model DB generation, object recognition and tracking. In model generation, four points of an image model are pre-selected corresponding to the 3D CAD model points. By performing object recognition using local Zernike moments, we obtain a homography between the image model and the input scene. By transforming the image model points using this homography, we establish correspondences between the input scene feature points and the 3D CAD feature points. Using these correspondences, we estimate the initial pose of the recognized object. Then we perform object tracking using the Lie group SE(3) in real time.
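The role of SE(3) in the tracking stage can be illustrated with a minimal sketch. It assumes that a pose is stored as a 4x4 homogeneous matrix and that a small motion is given as a six-vector (three translational and three rotational components). The function names `twist_to_matrix` and `se3_exp` and the truncated power series are illustrative choices, not the implementation used in the paper.

```python
import numpy as np

def twist_to_matrix(twist):
    """Map a 6-vector (vx, vy, vz, wx, wy, wz) to a 4x4 element of se(3)."""
    vx, vy, vz, wx, wy, wz = twist
    return np.array([
        [0.0, -wz,  wy, vx],
        [ wz, 0.0, -wx, vy],
        [-wy,  wx, 0.0, vz],
        [0.0, 0.0, 0.0, 0.0],
    ])

def se3_exp(twist, terms=30):
    """Exponential map se(3) -> SE(3), evaluated as a truncated power series."""
    A = twist_to_matrix(twist)
    result = np.eye(4)
    term = np.eye(4)
    for k in range(1, terms):
        term = term @ A / k          # term now holds A^k / k!
        result = result + term
    return result

# Update a pose by a motion: here a 90 degree rotation about the z axis.
pose = np.eye(4)
motion = se3_exp([0.0, 0.0, 0.0, 0.0, 0.0, np.pi / 2])
pose = pose @ motion
```

With this representation a pose update is a single matrix product, and the six twist components are the only unknowns an optimizer has to estimate per frame.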
We propose a new object recognition method which is robust to scale, planar rotation, illumination change, occlusion and background clutter.
A. Proposed object recognition system
The object recognition system is composed of a robust feature extraction part and a feature matching part. Figure 2 shows the overall system of the object recognition part.
with the conditions $n - |m|$ even and $|m| \le n$. As the Zernike moments are calculated using the radial polynomials shown in figure 3, they are inherently rotation invariant. In particular, Zernike moments have superior properties in terms of image representation, information redundancy and noise characteristics [14].
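As a minimal sketch of the radial polynomials referred to above, the following uses the standard closed form of the Zernike radial polynomial $R_{nm}(\rho)$ and enforces the two conditions on $n$ and $m$. The function name `radial_polynomial` is an illustrative choice, not taken from the paper.

```python
from math import factorial

def radial_polynomial(n, m, rho):
    """Zernike radial polynomial R_nm(rho) for 0 <= rho <= 1."""
    m = abs(m)
    if m > n or (n - m) % 2 != 0:
        raise ValueError("require |m| <= n and n - |m| even")
    total = 0.0
    for s in range((n - m) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + m) // 2 - s)
                    * factorial((n - m) // 2 - s)))
        total += coeff * rho ** (n - 2 * s)
    return total

# R_20(rho) = 2 rho^2 - 1, evaluated at rho = 0.5
value = radial_polynomial(2, 0, 0.5)
```

Because the radial part depends only on $\rho$, rotating the image patch changes only the phase of a moment, which is why the magnitude is rotation invariant.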
In the off-line process, Zernike moments are calculated around interest points detected in the scale space of the image model and are stored in a database. In the on-line process, we recognize an object by probabilistic voting of these Zernike moments. The locality of the Zernike moments provides some robustness to occlusion and background clutter. We verify the recognition by aligning the model features to the input scene. In this process, the homography between the image model and the input scene is calculated. We determine the success of the recognition by the percentage of outliers. An outlier is determined from the distance between the scene feature position and the model feature position transformed by the homography.
But they are sensitive to scale and illumination changes. We reduced the scale problem by applying scale space theory to the image model [15]. This method makes the corner extraction process more effective than using an image pyramid. The problem of illumination change is solved simply by normalizing the moments by the $Z_{00}$ moment, which is equivalent to the average intensity. Since local illumination change can be modeled as
$$f'(x_i, y_j) = a\, f(x_i, y_j)$$
where $f(x, y)$ represents the intensity at $(x, y)$, $f'(x, y)$ is the new intensity after the illumination change, $a$ is the rate of illumination change, $m$ denotes the average intensity and $Z$ is the Zernike moment operator. Figure 4 shows the robustness of the modified Zernike moments to illumination changes. We can observe that the normalized Zernike moments are almost constant even under illumination changes.
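The effect of the normalization can be checked numerically with a short sketch. It assumes a square grey-level patch mapped onto the unit disk and a purely multiplicative illumination change. The helper names `zernike_moment` and `normalized_magnitude`, the discretization and the test data are all illustrative assumptions, not the authors' code.

```python
import numpy as np
from math import factorial

def zernike_moment(patch, n, m):
    """Zernike moment Z_nm of a square patch mapped onto the unit disk.

    The constant pixel-area factor is omitted; it cancels in the ratio below.
    """
    size = patch.shape[0]
    coords = (2.0 * np.arange(size) + 1.0 - size) / size   # pixel centres in (-1, 1)
    x, y = np.meshgrid(coords, coords)
    rho = np.hypot(x, y)
    theta = np.arctan2(y, x)
    inside = rho <= 1.0
    radial = np.zeros_like(rho)
    for s in range((n - abs(m)) // 2 + 1):
        coeff = ((-1) ** s * factorial(n - s)
                 / (factorial(s)
                    * factorial((n + abs(m)) // 2 - s)
                    * factorial((n - abs(m)) // 2 - s)))
        radial += coeff * rho ** (n - 2 * s)
    basis = radial * np.exp(-1j * m * theta)
    return (n + 1) / np.pi * np.sum(patch[inside] * basis[inside])

def normalized_magnitude(patch, n, m):
    """|Z_nm| divided by |Z_00|, where Z_00 is proportional to the mean intensity."""
    return abs(zernike_moment(patch, n, m)) / abs(zernike_moment(patch, 0, 0))

rng = np.random.default_rng(0)
patch = rng.uniform(10.0, 200.0, size=(21, 21))
brighter = 1.8 * patch            # multiplicative illumination change, a = 1.8
```

Since the moment is linear in the intensity, both $Z_{nm}$ and $Z_{00}$ scale by the same factor $a$, so `normalized_magnitude(patch, 3, 1)` and `normalized_magnitude(brighter, 3, 1)` agree up to rounding.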
If all the objects are equally probable and mutually independent, equation (10) becomes
$$\sum P(H_h \mid M_i)\, P(M_i)$$
(11)
$$P(H_h) = \sum_i P(H_h \mid M_i)\, P(M_i)$$
Fig. 4. Zernike moments and illumination changes: (a) without illumination invariance; (b) with illumination invariance
C. Probabilistic voting
In this paper, we propose a probabilistic voting based recognition method that considers the stability of the Zernike moments of the image model and the similarity of the Zernike moments between the image model and the input scene. Object recognition using probabilistic voting means finding a model $M_i$ that maximizes
$$\arg\max_{M_i} P(M_i \mid S)$$
The term $P(H_h \mid M_i)$ is computed by considering the stability and similarity measures of the matching pairs. Basically, $P(H_h \mid M_i)$ has to be large when both the stability and the similarity measures are high. The stability measure reflects the imperfect repeatability of interest points, and the similarity measure reflects the closeness of feature vectors in Euclidean space.
where $S$ represents the input scene. In this paper, we assume only one model $M_i$ for each object, in frontal view, but the model images can be extended to represent full 3D views. For each input feature, we form a set of matching pairs consisting of the corresponding model features whose Zernike moments are similar to those of the input feature. For the $k$-th input feature, the set of matching pairs is
(2) Similarity ( C, ): As defined in equation (14), the similarity of a feature pair is inversely proportional to the Euclidean distance between the Zernike moments of the input scene and those of the corresponding image model.
$$MF_k = \{ (Z_k, \hat{Z}_{kj}) \mid Z_k \in S,\ \hat{Z}_{kj} \in M_i,\ j = 1, 2, \ldots, N_k \}$$
where $Z_k$ denotes the Zernike moments of the $k$-th input feature, $\hat{Z}_{kj}$ denotes the Zernike moments of the model feature and $N_k$ is the number of corresponding model features. The hypothesis is
$$H_h = \{ (Z_k, \hat{Z}_{kj}) \mid k = 1, 2, \ldots, N_s \}$$
$P((Z_k, \hat{Z}_{kj}) \mid M_i) =$
where $N_s$ is the number of interest points in the input scene. The set of total feature pairs can be written as
$$H = \{ H_1 \cup H_2 \cup \cdots \cup H_{N_H} \}$$
if $\hat{Z}_{kj} \in \hat{Z}(M_i)$
where $N_H$ is the size of the product space as $MF_k \times H_h$. In each hypothesis, a model point can be assigned to multiple input points because corresponding pairs are formed using the Euclidean distance measure. Since $H$ is composed of the model Zernike moments corresponding to the input scene $S$, we can substitute $H$ for $S$. By Bayes' theorem, equation (6) becomes
$\alpha$ is the normalization factor and $\varepsilon$ is assigned if the corresponding model feature does not belong to a certain model. We use the approximate nearest neighbor search algorithm to find matching pairs [16]. Its query time is logarithmic in the size of the search space, compared with linear time for exhaustive search.
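The matching and voting idea can be sketched as follows, under loud assumptions: an exhaustive distance search stands in for the approximate nearest neighbor algorithm of [16], and the weight `1 / (1 + distance)` is a hypothetical stand-in for the stability and similarity terms, whose exact form is not reproduced here. The names `matching_pairs` and `vote` and the toy feature vectors are illustrative only.

```python
import numpy as np

def matching_pairs(scene_feats, model_feats, max_dist):
    """For each scene feature, list (model index, distance) of nearby model features."""
    pairs = []
    for z in scene_feats:
        dists = np.linalg.norm(model_feats - z, axis=1)
        close = np.flatnonzero(dists <= max_dist)
        pairs.append([(int(j), float(dists[j])) for j in close])
    return pairs

def vote(scene_feats, model_feats, model_ids, num_models, max_dist=0.5):
    """Accumulate a similarity-weighted vote per model and return the winner."""
    scores = np.zeros(num_models)
    for candidates in matching_pairs(scene_feats, model_feats, max_dist):
        for j, dist in candidates:
            scores[model_ids[j]] += 1.0 / (1.0 + dist)   # assumed similarity weight
    return int(np.argmax(scores)), scores

# Toy database: two features for model 0 and two for model 1.
model_feats = np.array([[0.0, 0.0], [1.0, 0.0], [5.0, 5.0], [6.0, 5.0]])
model_ids = np.array([0, 0, 1, 1])
scene_feats = np.array([[0.1, 0.0], [0.9, 0.1], [5.2, 5.0]])
best, scores = vote(scene_feats, model_feats, model_ids, num_models=2)
```

In this toy scene two features fall near model 0 and one near model 1, so model 0 collects the larger score. A scene feature may pair with several model features, which mirrors the one-to-many matching sets described above.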
D. Recognition verification
Recognition results are verified using feature pairs. We find optimal feature pairs by rejecting outliers using the area ratio, which is preserved under affine transformation. For the four given points $(P_1, P_2, P_3, P_4)$ shown in figure 5, we calculate the area ratio $S_1 / S_2 = \triangle P_2 P_3 P_4 / \triangle P_1 P_2 P_3$ in the image model and $S_1' / S_2' = \triangle P_2' P_3' P_4' / \triangle P_1' P_2' P_3'$ in the input scene. If the two ratios differ by more than a predetermined threshold, we reject the fourth feature point. We assume the first three points are correctly matched.
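A minimal sketch of the area-ratio test follows. It assumes 2D points given as `(x, y)` tuples, three trusted pairs in general position, and a relative tolerance `tol` standing in for the predetermined threshold. The function names and the sample coordinates are illustrative.

```python
def triangle_area(p, q, r):
    """Unsigned area of the triangle with 2D vertices p, q, r."""
    return abs((q[0] - p[0]) * (r[1] - p[1]) - (r[0] - p[0]) * (q[1] - p[1])) / 2.0

def area_ratio(p1, p2, p3, p4):
    """Ratio of triangle (p2, p3, p4) to triangle (p1, p2, p3)."""
    return triangle_area(p2, p3, p4) / triangle_area(p1, p2, p3)

def fourth_point_is_consistent(model_pts, scene_pts, tol=0.1):
    """Accept the fourth pair if model and scene area ratios agree within tol."""
    r_model = area_ratio(*model_pts)
    r_scene = area_ratio(*scene_pts)
    return abs(r_model - r_scene) <= tol * r_model

# Model points and their images under the affine map x' = 2x + y + 3, y' = x + 3y - 1.
model_pts = [(0, 0), (2, 0), (0, 2), (2, 2)]
scene_pts = [(3, -1), (7, 1), (5, 5), (9, 7)]
mismatched = [(3, -1), (7, 1), (5, 5), (20, 7)]   # fourth point wrongly matched
ok = fourth_point_is_consistent(model_pts, scene_pts)
bad = fourth_point_is_consistent(model_pts, mismatched)
```

An affine map scales every triangle area by the same determinant, so the ratio is unchanged for a correct fourth match and changes when the fourth point is displaced.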
Second, we experimented in complex environments. Figure 7-(a) shows some examples of recognition results under viewpoint changes with a cluttered background. Figure 7-(b) shows experimental results when cluttered background, illumination change and occlusion are all present.
Fig. 5. Rejection of outliers using area ratio: (a) local feature of model; (b) local feature of scene
Then we calculate an initial homography from randomly selected subsets of these optimal feature pairs. An LMedS-based method selects the optimal homography among them. This homography is used for the initial pose estimation of the 3D CAD model. We can also determine the percentage of outliers by applying this homography to the remaining image model points.
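The selection step can be sketched with a generic least-median-of-squares loop. This is an assumption-laden illustration rather than the authors' implementation: each candidate is fitted to four random pairs with a plain direct linear transform, and the sample size, number of trials, function names and synthetic data are all hypothetical.

```python
import numpy as np

def dlt_homography(src, dst):
    """Direct linear transform: homography mapping src (n, 2) to dst (n, 2), n >= 4."""
    rows = []
    for (x, y), (u, v) in zip(src, dst):
        rows.append([-x, -y, -1, 0, 0, 0, u * x, u * y, u])
        rows.append([0, 0, 0, -x, -y, -1, v * x, v * y, v])
    _, _, vt = np.linalg.svd(np.asarray(rows, dtype=float))
    return vt[-1].reshape(3, 3)      # null vector of the system, defined up to scale

def transfer_errors(H, src, dst):
    """Squared distance between H-mapped src points and dst points."""
    pts = np.hstack([src, np.ones((len(src), 1))]) @ H.T
    proj = pts[:, :2] / pts[:, 2:3]
    return np.sum((proj - dst) ** 2, axis=1)

def lmeds_homography(src, dst, trials=200, seed=0):
    """Keep the 4-point homography with the least median of squared errors."""
    rng = np.random.default_rng(seed)
    best_H, best_med = None, np.inf
    for _ in range(trials):
        idx = rng.choice(len(src), size=4, replace=False)
        H = dlt_homography(src[idx], dst[idx])
        med = np.median(transfer_errors(H, src, dst))
        if med < best_med:
            best_H, best_med = H, med
    return best_H / best_H[2, 2], best_med

# Synthetic check: twelve pairs related by a known homography, three corrupted.
rng = np.random.default_rng(1)
src = rng.uniform(0.0, 100.0, size=(12, 2))
H_true = np.array([[1.2, 0.1, 5.0], [-0.2, 0.9, 3.0], [0.001, 0.0005, 1.0]])
homog = np.hstack([src, np.ones((12, 1))]) @ H_true.T
dst = homog[:, :2] / homog[:, 2:3]
dst[:3] += 40.0                      # three gross outliers out of twelve pairs
H_est, med = lmeds_homography(src, dst)
```

Because the score is a median, the estimate tolerates outliers as long as they make up fewer than half of the pairs. Thresholding `transfer_errors(H_est, src, dst)` afterwards gives the outlier percentage mentioned above.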
Fig. 7. Sample object recognition results for images taken under various conditions
Table 1 shows the statistical results of the object recognition to model-10. The system fails to recognize some scenes with severe specular reflection, blurring, background clutter or low illumination.
Fig. 6. Image models used for object recognition
First, we evaluated the performance of the recognition system under individual conditions, namely scale change, illumination change, background clutter and occlusion. The system can recognize objects under scale changes of up to a factor of 3, illumination intensity changes of up to a factor of 2, somewhat complex backgrounds and half occlusion. In these cases, the success rate is above 95%.
the recognized object continuously. We utilize the general paradigm of the Lie group formalism [17]. Conventional approaches often estimate the initial pose of the 3D CAD model manually. In this paper, we initialize the pose of the 3D CAD model using the homography computed in the recognition stage. If we have $n$ correspondences between the recognized image model points $X_m = (u_m, v_m, 1)^T$ and the input scene points $X_s = (u_s, v_s, 1)^T$, the relations are written as
$$X_s = H X_m \qquad (17)$$
where $H$ is the 3 by 3 homography. From equation (17), we obtain
Figure 8-(a) shows the input image, which contains model-10. Interest points computed by the Harris corner detector are shown in figure 8-(b). Using the probabilistic voting scheme proposed in this paper, we correctly recognized model-10, as shown in figure 8-(c). Finally, figure 8-(d) shows the image model features transformed to the input scene using the homography computed from the feature correspondences. Since we have the initial pose matrix, we can project the 3D CAD model onto the input scene as shown in figure 9.
$$H = X_s X_m^T (X_m X_m^T)^{-1}$$
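The closed form follows from multiplying $X_s = H X_m$ on the right by $X_m^T$ and inverting the 3 by 3 matrix $X_m X_m^T$. A small numerical sketch is given below, assuming the points are stacked as columns of 3 x n arrays with the third coordinate fixed at 1. The function name and the sample matrices are illustrative. With the third coordinate fixed, the relation holds exactly only when the last row of $H$ is (0, 0, 1), which is why an affine example is used.

```python
import numpy as np

def linear_homography(X_m, X_s):
    """Solve X_s = H X_m for H in the least-squares sense; inputs are 3 x n."""
    return X_s @ X_m.T @ np.linalg.inv(X_m @ X_m.T)

# Hypothetical model points (u_m, v_m, 1)^T stacked as columns.
X_m = np.array([[0.0, 100.0, 100.0, 0.0, 50.0],
                [0.0, 0.0, 80.0, 80.0, 30.0],
                [1.0, 1.0, 1.0, 1.0, 1.0]])
H_true = np.array([[0.9, -0.2, 12.0],
                   [0.3, 1.1, -5.0],
                   [0.0, 0.0, 1.0]])
X_s = H_true @ X_m                   # corresponding scene points
H_est = linear_homography(X_m, X_s)
```

The inverse exists when at least three of the model points are not collinear.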
The steps to compute the initial pose of the 3D CAD model are:
(1) Specify 4 corresponding points between the 3D CAD model and the corresponding image model.
(2) Compute the homography through object recognition.
(3) Transform the pre-selected model image points to the input scene using the homography obtained in step 2.
(4) Obtain the image feature points corresponding to the 3D CAD model points and transform them to the camera coordinate system using predetermined camera intrinsic parameters [18].
(5) Calculate the rotation and translation matrices from these feature pairs.
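The third and fourth steps can be sketched as two small functions. This assumes a standard pinhole intrinsic matrix `K` and points given as (n, 2) pixel arrays. The numerical values of `K`, `H` and the model points are made up for illustration, and the final rotation and translation step is not shown because the paper's solver is not specified here.

```python
import numpy as np

def transform_points(H, pts):
    """Apply a 3x3 homography to an (n, 2) array of image points."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ H.T
    return homog[:, :2] / homog[:, 2:3]

def to_normalized_camera(K, pts):
    """Convert pixel coordinates to normalized camera coordinates using K^-1."""
    homog = np.hstack([pts, np.ones((len(pts), 1))]) @ np.linalg.inv(K).T
    return homog[:, :2] / homog[:, 2:3]

# Hypothetical intrinsics: focal length 500 px, principal point (320, 240).
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
# Hypothetical homography from recognition: a pure image shift of (10, 20) px.
H = np.array([[1.0, 0.0, 10.0],
              [0.0, 1.0, 20.0],
              [0.0, 0.0, 1.0]])
model_pts = np.array([[310.0, 220.0], [810.0, 220.0],
                      [810.0, 720.0], [310.0, 720.0]])
scene_pts = transform_points(H, model_pts)         # model points in the input scene
camera_pts = to_normalized_camera(K, scene_pts)    # scene points in camera coordinates
```

The resulting `camera_pts`, paired with the four pre-selected 3D CAD points, are the feature pairs from which rotation and translation would be computed in the last step.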
[Figure panels: initial alignment, frame #20, frame #30]
V. EXPERIMENTAL RESULTS
We have tested the proposed system using the various models shown in figure 6. Figure 8 shows the results of object recognition at each stage.
[Figure panels: frame #50, frame #60, frame #70, frame #80, frame #90]
The tracking system shows fairly good performance for various object motions through a long sequence of images. Object recognition takes 1-2 s, and tracking runs at 10-15 frames/s on a 1.4 GHz AMD processor.
Fig. 8. Object recognition process for model-10.
VI. CONCLUSIONS
In this paper we have proposed a practical object recognition and tracking system. The main contributions of the proposed system are as follows. First, we proposed an object recognition system based on modified local Zernike moments, which is robust to viewpoint change, illumination change, occlusion and background clutter. Second, we proposed an automatic pose initialization technique for tracking the recognized object in real time, using the homography computed from the feature correspondences. The experimental results demonstrate that robust object recognition is feasible by combining feature matching with 3D object tracking. In future work, we will develop a recognition system that can recognize and track full 3D objects placed in arbitrary poses.
