2-1/2-D Visual Servoing
IEEE TRANSACTIONS ON ROBOTICS AND AUTOMATION, VOL. 15, NO. 2, APRIL 1999
I. INTRODUCTION

Vision feedback control loops have been introduced in order to increase the flexibility and the accuracy of robot systems [12], [13]. Consider for example the classical positioning task of an eye-in-hand system with respect to a target. After the image corresponding to the desired camera position has been learned, and after the camera and/or the target has been moved, an error control vector can be extracted from the two views of the target. A zero error implies that the robot end-effector has reached its desired position, whatever the calibration errors. However, these errors influence the way the system converges. In many cases, and especially when the initial camera position is far away from its desired one, the target may leave the camera field of view during servoing, which thus leads to failure. For this reason, it is important to study the robustness of visual servoing with respect to calibration errors. Vision-based robot control using an eye-in-hand system is classified into two groups [12], [13], [19]: position-based and image-based control systems.
Manuscript received April 3, 1998; revised February 12, 1999. This paper was supported by INRIA and the French national electricity company EDF. This paper was recommended for publication by Associate Editor H. Zhuang and Editor V. Lumelsky upon evaluation of the reviewers' comments. E. Malis was with IRISA/INRIA Rennes, Rennes cedex 35042, France. He is now with the University of Cambridge, Cambridge, U.K. F. Chaumette is with IRISA/INRIA Rennes, Rennes cedex 35042, France. S. Boudet is with DER-EDF, Chatou cedex 78401, France. Publisher Item Identifier S 1042-296X(99)03918-X.
In a position-based control system, the input is computed in the three-dimensional (3-D) Cartesian space [20] (for this reason, this approach can be called 3-D visual servoing). The pose of the target with respect to the camera is estimated from image features corresponding to the perspective projection of the target in the image. Numerous methods exist to recover the pose of an object (see [6] for example). They are all based on the knowledge of a perfect geometric model of the object and necessitate a calibrated camera to obtain unbiased results. Even if a closed-loop control is used, which makes the convergence of the system possible in the presence of calibration errors, it seems to be impossible to analyze the stability of the system. On the other hand, in an image-based control system, the input is computed in the 2-D image space (for this reason, this approach can be called 2-D visual servoing) [7]. In general, image-based visual servoing is known to be robust not only with respect to camera but also to robot calibration errors [8]. However, its convergence is theoretically ensured only in a region (quite difficult to determine analytically) around the desired position. Except in very simple cases, the analysis of the stability with respect to calibration errors seems to be impossible, since the system is coupled and nonlinear.

Contrary to the previous approaches, we will see that it is possible to obtain analytical results using a new approach which combines the advantages of 2-D and 3-D visual servoing and avoids their respective drawbacks. This new approach is called 2-1/2-D visual servoing since the input it uses is expressed in part in the 3-D Cartesian space and in part in the 2-D image space [14]. More precisely, it is based on the estimation of the camera displacement (the rotation and the scaled translation of the camera) between the current and desired views of an object. It must be emphasized that, contrary to 3-D visual servoing, this partial camera displacement estimation does not need any 3-D model of the target, which increases the versatility and the application area of visual servoing. Since the camera rotation between the two views is computed at each iteration, the rotational control loop is immediately obtained. In order to control the translational camera d.o.f., we introduce the extended image coordinates of a reference point of the target. We thus obtain a triangular interaction matrix with very satisfactory decoupling properties. It is interesting to note that this Jacobian matrix has no singularity in the whole task space. This allows us to obtain the convergence of the positioning task for any initial camera position if the camera intrinsic parameters are known. If the camera intrinsic parameters are not perfectly known, the estimated control vector can be analytically computed as a function of the camera calibration errors. Then, the necessary and sufficient conditions
for the local asymptotic stability in the presence of camera calibration errors are easily obtained. Moreover, owing to the simple structure of the system, sufficient conditions for global asymptotic stability are presented. Using an adaptive control law, we can finally ensure that the target will always remain in the camera field of view. Experimental results confirm that 2-1/2-D visual servoing is more efficient than existing control schemes.

The paper is organized as follows. In Section II and Section III, we briefly recall 3-D and 2-D visual servoing, respectively. In Section IV, we show how to use the information extracted from Euclidean partial reconstruction to design our 2-1/2-D visual servoing scheme. Its robustness with respect to camera calibration errors is analyzed in Section V. The experimental results are given in Section VI. A more robust adaptive control law is presented in Section VII and its robustness with respect to camera and hand-eye calibration errors is experimentally shown.

II. THREE-DIMENSIONAL VISUAL SERVOING
Let F_t be the coordinate frame attached to the target, and F* and F be the coordinate frames attached to the camera in its desired and current position, respectively (see Fig. 1). Knowing the coordinates, expressed in F_t, of at least four points of the target [6] (i.e., the 3-D model of the target is supposed to be perfectly known), it is possible from their projection to compute the desired camera pose and the current camera pose. The camera displacement to reach the desired position is thus easily obtained, and the control of the robot end-effector can be performed either in open loop or, more robustly, in closed loop. The main advantage of this approach is that it directly controls the camera trajectory in Cartesian space. However, since there is no control in the image, the image features used in the pose estimation may leave the image (especially if the robot or the camera are coarsely calibrated), which thus leads to servoing failure. Also note that, if the camera is coarsely calibrated, or if errors exist in the 3-D model of the target, the current and desired camera poses will not be accurately estimated. Finally, since the error made on the pose estimation cannot be computed analytically as a function of the camera calibration errors, it seems to be impossible to analyze the stability of the system [5].

III. TWO-DIMENSIONAL VISUAL SERVOING

The control error function is now expressed directly in the 2-D image space (see Fig. 2). Let s be the current value of the visual features observed by the camera and s* be the desired value of s to be reached in the image. The time variation of s is related to the camera velocity screw T_c by [7]:

    ṡ = L(s, Z) T_c    (1)

where L is the interaction matrix (also called the image Jacobian matrix) related to s. Note that L depends on the depth Z of each selected feature. The interaction matrix for a large range of image features can be found in [7]. The vision-based task e (to be regulated to 0), corresponding to the regulation of s to s*, is defined by:

    e = C (s − s*)    (2)

where C is a matrix which has to be selected such that C L > 0 in order to ensure the global stability of the control law. The optimal choice is to consider C as the pseudo-inverse of the interaction matrix. The matrix C thus depends on the depth Z of each target point used in visual servoing. An estimation of the depth can be obtained using, as in 3-D visual servoing, a pose determination algorithm (if a 3-D target model is available), or using a structure-from-known-motion algorithm (if the camera motion can be measured). However, using this choice for C may lead the system close to, or even onto, a singularity of the interaction matrix. Furthermore, the convergence may also not be attained due to local minima reached because of the computation by the control law of unrealizable motions in the image [5]. Another choice is to consider C as a constant matrix, equal to the pseudo-inverse of the interaction matrix computed for s = s* and Z = Z*, where Z* is an approximate value of Z at the desired camera position. In this simple case, the condition for convergence is satisfied only in the neighborhood of the desired position, which means that the convergence may not be ensured if the initial camera position is too far away from the desired one. Once again, several examples leading to failure are given in [5].
IV. 2-1/2-D VISUAL SERVOING

The main drawback of 3-D visual servoing is that there is no control in the image, which implies that the target may leave the camera field of view. Furthermore, a model of the target is needed to compute the pose of the camera. 2-D visual servoing does not explicitly need this model. However, a depth estimation or approximation is necessary in the design of the control law. Furthermore, the main drawback of this approach is that the convergence is ensured only in a neighborhood of the desired position (whose domain seems to be impossible to determine analytically). The objective of this section is to present a control scheme avoiding these drawbacks. We will see that a satisfactory solution is to combine image features and 3-D data. This 3-D information can be obtained by performing partial pose estimation, as described in the next subsection.

A. Scaled Euclidean Reconstruction

Consider three 3-D target points defining a reference plane π (see Fig. 3). It is well known that the image point p of a point of π in the current camera frame F is related to the corresponding image point p* in the desired camera frame F* by a projective homography G such that p ∝ G p* [9]. On one hand, if the target is known to be planar, G can be estimated by solving a linear system using at least four points of this plane (a generic linear estimation from matched points is sketched below). On the other hand, if the target is not planar, the estimation of G is a nonlinear problem. If at least eight points are available (three to define π and five not belonging to π), it is possible to estimate the homography matrix at video rate using, for example, the linearized algorithm presented in [16]. Other classical, but less robust, linearized methods for computing the camera displacement through the recovery of the epipolar geometry can be found in [11] and [4]. Such classical methods have recently been used in vision-based control [10], [2]. In [10], a stereovision system is used. The work described in [2], closely related to ours, will be discussed afterward.
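As an illustration of this estimation step, the following sketch recovers a planar homography from at least four matched pixel points with the standard direct linear transform. It is only a generic stand-in for the video-rate algorithm of [16], and the variable names are ours.

    import numpy as np

    def estimate_homography(p, p_star):
        # Solve p ~ G p* in the least-squares sense from N >= 4 matches.
        # p, p_star: (N, 2) arrays of pixel coordinates in the two images.
        rows = []
        for (u, v), (us, vs) in zip(p, p_star):
            rows.append([us, vs, 1, 0, 0, 0, -u * us, -u * vs, -u])
            rows.append([0, 0, 0, us, vs, 1, -v * us, -v * vs, -v])
        # The homography is the null vector of the stacked constraints.
        _, _, Vt = np.linalg.svd(np.asarray(rows, dtype=float))
        G = Vt[-1].reshape(3, 3)
        return G / G[2, 2]              # fix the arbitrary projective scale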
Let K be the matrix of the camera intrinsic parameters:

    K = [[f·k_u, −f·k_u·cot θ_uv, u_0], [0, f·k_v / sin θ_uv, v_0], [0, 0, 1]]    (3)

where u_0 and v_0 are the pixel coordinates of the principal point, k_u and k_v are the scaling factors along the u and v axes (in pixels/meter), θ_uv is the angle between these axes, and f is the focal length. K is the transformation matrix between the pixel coordinates p of an image point and its normalized coordinates m:

    p = K m    (4)

Assuming that the camera calibration is known (i.e., K is known), the Euclidean homography H is calculated as follows:

    H = K⁻¹ G K    (5)

After H is computed, it can be decomposed as the sum of a rotation matrix and of a rank-1 matrix [9]:

    H = R + t_d* n*ᵀ    (6)

where R is the rotation matrix between frames F and F* (i.e., the homography of the plane at infinity), n* is the unit vector normal to π expressed in F*, and t_d* is defined as t/d*, t being the translation vector between F and F* and d* being the distance of π to F*. From H and the image features, it is thus possible to determine the motion parameters and the structure of the reference plane. For example, the distances d and d* are unknown (where d is the current distance between π and F), but the ratio ρ = d/d* can easily be estimated. Indeed, noting n = R n* the vector normal to π expressed in F, we have [15]:

    ρ = d / d* = 1 + nᵀ t_d* = det(H)    (7)

Furthermore, the ratio between the unknown depth Z of a point lying on π and d* can be computed as
    Z / d* = ρ / (nᵀ m)    (8)

Finally, we can also obtain:

    Z / Z* = ρ (n*ᵀ m*) / (nᵀ m)    (9)

These parameters are important since they are used in the design of our control scheme. We will see in Section V how it is still possible to obtain an analytical form of the estimated motion parameters in the presence of camera calibration errors.
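Relations (5)-(9) are straightforward to check numerically. In the sketch below, a Euclidean homography is built from an assumed displacement (R, t/d*) and plane normal n*, and the ratio ρ = d/d* is recovered both from (7) and as det(H); all numerical values are synthetic.

    import numpy as np

    def axis_angle_rotation(u, theta):
        # Rodrigues formula: rotation of angle theta about the unit axis u.
        ux = np.array([[0, -u[2], u[1]], [u[2], 0, -u[0]], [-u[1], u[0], 0]])
        return np.eye(3) + np.sin(theta) * ux + (1 - np.cos(theta)) * (ux @ ux)

    R = axis_angle_rotation(np.array([0.0, 1.0, 0.0]), 0.4)  # assumed rotation
    t_dstar = np.array([0.2, -0.1, 0.3])                     # assumed t / d*
    n_star = np.array([0.0, 0.0, 1.0])                       # normal of pi in F*

    H = R + np.outer(t_dstar, n_star)   # Euclidean homography (6)
    n = R @ n_star                      # normal of pi expressed in F
    rho = 1.0 + n @ t_dstar             # ratio d/d* from (7)
    assert np.isclose(rho, np.linalg.det(H))

    # Depth ratio (8) of a point of pi observed at normalized coordinates m.
    m = np.array([0.05, -0.02, 1.0])    # assumed current image point
    Z_over_dstar = rho / (n @ m)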
B. Control Scheme

In order to control the orientation of the camera, we naturally use the 3-D estimated rotation R between F and F* (which has to reach the identity matrix). Let u be the rotation axis and θ the rotation angle obtained from R. Instead of the classical control vectors based on the rotation matrix or on Euler angles, the vector θu was chosen since, as shown below, this representation has no singularity in the whole workspace. The time derivative of θu can be expressed as a function of the camera velocity screw T_c as:

    d(θu)/dt = [ 0₃   L_ω ] T_c    (10)

where L_ω is given by [15]:

    L_ω = I₃ − (θ/2) [u]ₓ + (1 − sinc(θ) / sinc²(θ/2)) [u]ₓ²    (11)

[u]ₓ being the skew-symmetric matrix associated with u and sinc(θ) = sin(θ)/θ. Its determinant is:

    det(L_ω) = 1 / sinc²(θ/2)    (12)

and L_ω is thus singular only for θ = 2π (i.e., out of the possible workspace). We can also note that L_ω ≈ I₃ for small values of θ.
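In practice, θ and u are extracted from the estimated rotation matrix by the standard log map. A minimal sketch (valid away from θ = 0 and θ = π, which is enough for illustration since the workspace excludes θ = 2π) is:

    import numpy as np

    def theta_u(R):
        # Rotation angle and axis extracted from a rotation matrix.
        c = np.clip((np.trace(R) - 1.0) / 2.0, -1.0, 1.0)
        theta = np.arccos(c)
        if np.isclose(theta, 0.0):
            return np.zeros(3)          # no rotation error: theta * u = 0
        u = np.array([R[2, 1] - R[1, 2],
                      R[0, 2] - R[2, 0],
                      R[1, 0] - R[0, 1]]) / (2.0 * np.sin(theta))
        return theta * u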
We now design the position control vector. Consider a point P (called the reference point) lying on the chosen reference plane π of the target. For a motionless target, the time derivative of its coordinates, expressed in the current camera frame, can be written as:

    Ṗ = −v + [P]ₓ ω    (13)

where T_c = (v, ω) gathers the translational and rotational camera velocities. Let us define the extended image coordinates as follows:

    m_e = (x, y, z)ᵀ    (14)

where z = log(Z) is a supplementary coordinate. The time derivative of the extended image coordinates can be written as:

    ṁ_e = (1/d*) L_v v + L_vω ω    (15)

where d* is unknown and L_v, given by:

    L_v = (1/ρ_Z) [[−1, 0, x], [0, −1, y], [0, 0, −1]]    (16)

is an upper triangular matrix, ρ_Z = Z/d* being given by (8). Then, using (13) and (15), we finally obtain:

    ṁ_e = [ (1/d*) L_v   L_vω ] T_c    (17)

where L_vω is the matrix:

    L_vω = [[x y, −(1 + x²), y], [1 + y², −x y, −x], [−y, x, 0]]    (18)

The positioning task can be described as the regulation to zero of the following task function:

    e = [ m_e − m_e* ; θu ]    (19)

where the first two components of m_e − m_e* are directly computed from the current and desired images, and its last component, equal to log(Z/Z*), is estimated using (9). The time derivative of the task function is related to the camera velocity by:

    ė = L T_c    (20)

where L is an upper triangular matrix given by:

    L = [[ (1/d*) L_v , L_vω ], [ 0₃ , L_ω ]]    (21)

This matrix is singular only if the camera optical center lies on the reference plane π (in that case, d = 0). Other degenerate cases occur when Z = 0 or Z* = 0 or, as already stated, when θ = 2π. The workspace free of singularity is thus composed of the area in front of π. In fact, if the target is known to be not planar, it is even possible to increase this workspace by changing the points used to define π when nearing a singularity.

Finally, the exponential convergence of m_e toward m_e* and of θu toward 0 can be obtained by imposing ė = −λ e (where λ tunes the convergence rate). If the target is known to be motionless (see [1], [3], and [17] otherwise), the corresponding control law is given by [7]:

    T_c = −λ L̂⁻¹ e    (22)

where T_c is the camera velocity sent to the robot controller and L̂ is an approximation of L. Since L depends on the unknown distance d* (which is the only value that cannot be measured or estimated in our method), an approximate value d̂* has to be chosen during the off-line learning stage (when the desired image is acquired) and introduced in L̂. However, d̂* does not have to be precisely determined (it was set by hand in the following experiments), since it has a small influence on the stability of the system. More precisely, it influences the time-to-convergence of the translational velocity and the amplitude of the possible tracking error due to a wrong compensation of the rotational motion. As far as the tracking error is concerned, it is proportional to the rotational velocity and thus disappears when the camera is correctly oriented. We will see in the next section that it is possible to determine bounds on d̂* such that the global stability of the system is ensured. Finally, the control law is given by:

    T_c = −λ [[ d̂* L_v⁻¹ , −d̂* L_v⁻¹ L_vω ], [ 0₃ , I₃ ]] e    (23)

Indeed, due to the particular form of the matrix L_ω, we can set:

    L_ω⁻¹ θu = θu    (24)

since L_ω u = u [15].
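Gathering (16)-(24), a minimal sketch of the resulting control law is given below. It assumes the quantities produced by the reconstruction of Section IV-A (θu, Z/Z*, ρ_Z) and the hand-set approximation d̂* of d*; the decoupling of (23)-(24) appears directly in the code as ω = −λ θu.

    import numpy as np

    def control_2half_d(m, m_star, Z_ratio, rho_Z, theta_u, d_star_hat, lam=0.1):
        # m, m_star: normalized coordinates (x, y) of the reference point in
        # the current and desired images; Z_ratio = Z/Z* from (9);
        # rho_Z = Z/d* from (8); theta_u: rotation error from the homography.
        x, y = m
        e_t = np.array([x - m_star[0], y - m_star[1], np.log(Z_ratio)])
        L_v = (1.0 / rho_Z) * np.array([[-1.0, 0.0, x],
                                        [0.0, -1.0, y],
                                        [0.0, 0.0, -1.0]])   # (16)
        L_vw = np.array([[x * y, -(1.0 + x * x), y],
                         [1.0 + y * y, -x * y, -x],
                         [-y, x, 0.0]])                      # (18)
        # Block-triangular inverse (23), using L_w^{-1} (theta u) = theta u (24):
        v = -lam * d_star_hat * np.linalg.solve(L_v, e_t - L_vw @ theta_u)
        omega = -lam * theta_u
        return np.concatenate([v, omega])                    # camera screw T_c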
The corresponding block diagram is given in Fig. 4. Let us emphasize that L̂ is an upper triangular square matrix without any singularity in the whole task space. Such a decoupled system also provides a satisfactory camera trajectory in Cartesian space. Indeed, the rotational control loop is decoupled from the translational one, and the chosen reference point is controlled by the translational camera d.o.f. such that its trajectory is a straight line in the state space, and thus in the image. If a perfect model is available, the reference point will thus always remain in the camera field of view, whatever the initial camera position. Of course, this property does not ensure that all the target points remain visible. However, in practice, it would be possible to change the chosen reference point during servoing, and we could select as reference point the target point nearest to the bounds of the image plane. Indeed, it is possible to consider as reference point any point of the target (and not only points lying on π). In fact, for points lying outside π, the only difference in the previous computations is that the values of Z/d* and Z/Z* are not given by (8) and (9) but have a different form (see [15] for the corresponding equations). In practice, we have not considered the possibility of changing the reference point during the servoing, since it would lead to a discontinuity in the translational components of the camera velocity at each change. Another strategy would be to select as reference point the point nearest to the center of gravity of the target in the image. This would increase the probability that the target remains in the camera field of view, but without any complete assurance. Therefore, we have preferred to use an adaptive control law, described in Section VII, to deal with this problem. Finally, and contrary to 2-D and 3-D visual servoing, it will be shown in the following section that it is possible to obtain necessary and sufficient conditions for local asymptotic stability, and sufficient conditions for global asymptotic stability, in the presence of camera calibration errors.

Remark: it is possible to design a control law directly in the Cartesian space (such that t_d* has to reach 0, which thus implies achievement of the positioning task). A scheme very similar to classical 3-D visual servoing can hence be performed without knowing the 3-D structure of the target. In [2], such a scheme is used, where the direction of translation is obtained from the essential matrix instead of the homography matrix. However, as for 3-D visual servoing, such a control vector does not ensure that the considered object will always remain in the camera field of view, particularly in the presence of important camera or robot calibration errors.
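The straight-line image trajectory of the reference point mentioned above is a direct consequence of the imposed dynamics ė = −λe: with a perfect model, the first two components of m_e − m_e* decay along a fixed direction, as the short check below illustrates.

    import numpy as np

    lam = 0.5
    e0 = np.array([0.2, -0.1])          # initial error (x - x*, y - y*)
    ts = np.linspace(0.0, 8.0, 20)
    traj = np.array([np.exp(-lam * t) * e0 for t in ts])
    # All points are positive multiples of e0: the image path is a straight
    # segment from the initial position to the desired one.
    dirs = traj / np.linalg.norm(traj, axis=1, keepdims=True)
    assert np.allclose(dirs, dirs[0])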
It is also possible to control the camera position directly in the image space (as is done in 2-D visual servoing, the main difference being that the orientation is controlled using the result of the motion estimation). Contrary to 2-D visual servoing, in the present case, the decoupled control of the camera orientation allows the system to avoid local minima. However, the stability analysis is as difficult as for 2-D visual servoing. Furthermore, at least two image points are necessary, and the coupling of the related interaction matrix leads to an unpredictable camera trajectory. Experimental results show that, using this approach when the camera displacement is very important, the robot may unfortunately reach its joint limits, or the target may become so small in the image that the visual servoing has to be stopped.

V. SYSTEM STABILITY IN THE PRESENCE OF CAMERA CALIBRATION ERRORS

If the camera is not perfectly calibrated and an estimate K̂ ≠ K is used instead of K [see (3)], the measured normalized coordinates m̂ of an image point can be written as a function of the real ones m as:

    m̂ = K̂⁻¹ K m = δK m    (25)

where δK = K̂⁻¹ K. Furthermore, the estimated homography matrix is given by:

    Ĥ = K̂⁻¹ G K̂ = δK H δK⁻¹    (26)

It can be decomposed as the sum of a matrix similar to a rotation matrix and of a rank-1 matrix:

    Ĥ = δK R δK⁻¹ + (δK t_d*)(δK⁻ᵀ n*)ᵀ    (27)

where R̂ = δK R δK⁻¹ [15]. The eigenvalues of R̂ depend on the angle of rotation θ, and its eigenvector corresponding to the unit eigenvalue is δK u, where u is the axis of rotation. Matrix R̂ is not a rotation matrix, but it is similar to R, which implies that the two matrices have the same eigenvalues and that the eigenvectors of R̂ are the eigenvectors of R multiplied by matrix δK. The estimated rotation angle θ̂ and the estimated rotation axis û, extracted directly from R̂, can thus be written as a function of the real parameters and of the calibration errors:

    θ̂ = θ   and   û = δK u / ‖δK u‖    (28)

It must be emphasized that, as well as the rotation angle θ, the ratios ρ = d/d* and Z/Z* are computed without error:

    ρ̂ = det(Ĥ) = det(H) = ρ   and   Ẑ/Ẑ* = Z/Z*    (29)

Finally, since m̂ = δK m, the matrix δK has the form:

    δK = [[ A , b ], [ 0ᵀ , 1 ]]    (30)
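The invariance properties (28)-(29) can be verified numerically: since Ĥ is similar to H, its eigenvalues (hence the rotation angle) and its determinant (hence ρ) are unchanged by the calibration error δK. A self-contained sketch with assumed values:

    import numpy as np

    K = np.array([[800.0, 0.0, 320.0], [0.0, 820.0, 240.0], [0.0, 0.0, 1.0]])
    K_hat = np.array([[900.0, 8.0, 300.0], [0.0, 760.0, 260.0], [0.0, 0.0, 1.0]])
    dK = np.linalg.inv(K_hat) @ K       # delta K of (25)

    theta = 0.4                         # assumed rotation about the y axis
    R = np.array([[np.cos(theta), 0.0, np.sin(theta)],
                  [0.0, 1.0, 0.0],
                  [-np.sin(theta), 0.0, np.cos(theta)]])
    H = R + np.outer([0.2, -0.1, 0.3], [0.0, 0.0, 1.0])      # H = R + t_d* n*^T

    H_hat = dK @ H @ np.linalg.inv(dK)  # estimated homography (26)
    assert np.isclose(np.linalg.det(H_hat), np.linalg.det(H))  # rho_hat = rho
    # Same characteristic polynomial, hence same eigenvalues: theta_hat = theta.
    assert np.allclose(np.poly(H_hat), np.poly(H))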
The task function can thus be reconstructed as:

    ê = E e    (31)

with E a matrix function of the calibration errors, where A is the sub-matrix of δK containing the error on the pixel lengths [see (3)] and b is the (2 × 1) sub-vector containing the error on the principal point. It is interesting to note that we have ê = 0 if and only if e = 0, since the estimated task function depends linearly on the real (but unknown) one. Of course, we here assume that the intrinsic camera parameters do not change after the off-line acquisition of the desired image. In other words, since ê is measured from the current and desired images, we have ê = 0 if and only if the target points have reached their desired position in the image, the function computing ê from the image points being bijective.

The closed-loop system taking into account the camera calibration errors can thus be written:

    ė = −λ L L̂⁻¹ ê = f(e)    (32)

Function f is a vector field defined on an open subset of R⁶. It is easy to show the existence and uniqueness of the equilibrium point.

Proposition 1: The only point of equilibrium for f, i.e., a point e₀ such that f(e₀) = 0, is e₀ = 0.

Proof: The existence of the equilibrium point is evident since, if e = 0, then f(e) = 0. This equilibrium point is unique if and only if the matrix relating ė to e is nonsingular. Since this matrix is upper triangular, its determinant can be easily calculated:

    (33)

We thus have det ≠ 0, since its diagonal terms are nonzero in the nondegenerate cases (excluding, for example, the configurations d = 0 or θ = 2π already mentioned). Therefore, there is no singularity in the workspace and, if the task function decreases, it decreases toward 0.

We now give the conditions ensuring the local asymptotic stability of the system.

Theorem 1 (Local Asymptotic Stability): The differential system (32) is locally asymptotically stable around the equilibrium point e = 0 if and only if:

    (34)

Let us note that these conditions are also necessary and sufficient for the global asymptotic stability of the orientation control subsystem. The proof of this theorem is given in [15]. In practice, these conditions are of course easily verified. However, it is well known that the error may considerably increase before converging toward zero. Therefore, it is more interesting to find sufficient conditions ensuring the decrease of ‖e‖ at each iteration, which ensures the global asymptotic stability of the system. It is well known that such a sufficient condition is the positiveness of the matrix L L̂⁻¹. Owing to the particular form of this matrix (upper triangular), we have been able to obtain the following result.

Theorem 2 (Global Asymptotic Stability): The differential system (32) is globally asymptotically stable only if conditions (34) are verified, and if:

    (35)
    (36)
    (37)
    (38)

where tan α is the tangent of the vision angle α and the maximum value of ‖m‖ is:

    (39)

The proof is given in [15]. Let us note that these sufficient conditions ensure the decrease of ‖e‖ at each iteration, but they do not ensure that the target will remain in the camera field of view, since such a constraint is not taken into account in the positiveness of the matrix L L̂⁻¹. In fact, we cannot even be sure that the reference point will always be visible, since the decrease of ‖e‖ does not necessarily imply the decrease of ‖m_e − m_e*‖. As already stated, the visibility constraint will be taken into account in Section VII. The global asymptotic stability is however interesting, since it implies that ‖e‖, and hence ‖T_c‖, will not have an oscillatory behavior. To our knowledge, Theorems 1 and 2 are the first analytical results on the stability of a visual servoing scheme able to control the six camera d.o.f. in the presence of camera calibration errors.

Conditions (35)-(38) can be analyzed in order to get an approximate idea of the convergence domain. However, they depend on d̂*/d* and on quantities which depend on the rotation axis u (and thus on the current camera position). In order to simplify the analysis, we now present the following corollary, whose proof is given in [15].

Corollary 1 (Global Asymptotic Stability): The differential system (32) is globally asymptotically stable only if conditions (34) are verified, and if:

    (40)
    (41)
    (42)
where the terms involved in these conditions are functions of the calibration errors and of the vision angle. These sufficient conditions are more restrictive than the conditions imposed by Theorem 2. However, these new conditions do not depend on the axis of rotation u, and the error on the principal point is separated from the error on the pixel lengths. The analysis is thus simplified, even if the conditions seem more complex.

Let us first analyze condition (40) in more detail. This condition is equivalent to the following one:

    (43)

Since K̂ and K have the form (3), we have:

    (44)

Let us note Δθ_uv the error made on the estimation of the angle θ_uv between the pixel axes. In general, the estimated angle is set to θ̂_uv = π/2. We now show that this choice does not have a great influence on the stability of the system. Indeed, we thus have Δθ_uv = π/2 − θ_uv, and condition (44) can be written as:

    (45)

In Fig. 5, the bounds for Δθ_uv are plotted versus the ratio k̂_u/k_u on the x axis and versus the ratio k̂_v/k_v on the y axis. From this figure, we obtain |Δθ_uv| < 63° if the ratios are well estimated. In the worst considered case, when the estimated ratio is 1/5 and the real ratio is 5 (which corresponds to a rectangular pixel with the v-length five times the u-length), we have |Δθ_uv| < 22°. For a more common camera with k_u/k_v = 3/4, if k̂_u/k̂_v = 4/3 then |Δθ_uv| < 56°, which seems impossible not to satisfy.

Let us now analyze condition (41). This condition can be written as:

    (46)

which implies:

    (47)

In Fig. 6, the corresponding bounds for the principal-point error are plotted versus the ratio û₀/u₀ on the x axis and versus the ratio v̂₀/v₀ on the y axis (in order to obtain these results, we have set tan α = 0.364, which corresponds to a camera with a 20° vision angle, and Δθ_uv = 20°). In the worst considered case, when the ratio is 0.5, the bound is 0.44. Since tan 24° ≈ 0.44, this means that the 3-D vector associated with the estimated principal point must lie in a cone of angle 24°.

The previous conditions are quite easy to satisfy. However, the stability domain is not so large, since condition (42) must also be verified. This last condition can be written as a second-degree inequality in d̂*/d*:

    (48)

The solution of (48) is:

    (49)

Example 1: We first consider a perfect camera calibration and obtain the possible bounds for d̂*/d*. In that simple case, we have δK = I (and thus Δθ_uv = 0), and condition (42) can be written:

    (50)
The corresponding solution of (49) is:

    (51)

The two bounds are plotted in Fig. 7 versus tan α. From this figure, if we consider for example a camera with a 20° vision angle (then tan α = 0.364), the stability condition is verified if 0.24 < d̂*/d* < 4.22. If the real distance d* is 50 cm, the system will asymptotically converge for any initial position in the task space if d̂* is chosen between 12 and 211 cm. This result definitively validates the robustness of our control scheme in the absence of camera calibration errors.

Moreover, similar results can be obtained by considering camera calibration errors. Since condition (42) depends on the five camera intrinsic parameters, we first study the stability with a fixed principal-point error and variable pixel-length errors and, afterward, with a variable principal-point error and fixed pixel-length errors.

Example 2: If we consider pixel-length ratios of 1.5 (which means 50% error on each pixel length) and Δθ_uv = 5°, the two corresponding bounds are plotted in Fig. 8. For example, if the principal-point error bound is 0.0875 (which corresponds to a cone with a 5° angle), then 0.45 < d̂*/d* < 1.7. In order to obtain a simpler interpretation of this condition, we suppose now that the normal to the reference plane is aligned with the optical axis. If the real distance is again 50 cm, the system will asymptotically converge for any initial camera position if d̂* is chosen between 23 and 85 cm.

Example 3: We now fix the principal-point error bound to 0.0875 (which corresponds to a cone with a 10° angle) and again Δθ_uv = 5°. The upper and lower bounds for d̂*/d* are plotted in Fig. 9 versus the ratio k̂_u/k_u on the x axis and versus the ratio k̂_v/k_v on the y axis. For a common camera with k_u/k_v = 3/4, we obtain 0.53 < d̂*/d* < 1.51 if k̂_u/k̂_v = 4/3.
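The admissible ranges for d̂* quoted in Examples 1-3 are simply the ratio bounds scaled by the real distance; a trivial check:

    # Bounds on d*_hat / d* read from Figs. 7-9, scaled by d* = 50 cm.
    d_star = 50.0
    for lo, hi in [(0.24, 4.22), (0.45, 1.7), (0.53, 1.51)]:
        print((lo * d_star, hi * d_star))
    # -> (12.0, 211.0), (22.5, 85.0), (26.5, 75.5) cm: the 12-211, 23-85,
    #    and 26-76 cm ranges quoted above, up to rounding.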
A more complete analysis is given in [15]. Let us emphasize that conditions (40)-(42) are more restrictive than conditions (35)-(38). When they are ensured, the error ‖e‖ decreases at each iteration whatever the initial camera position in the whole task space. If this initial position is always in a known region, the stability analysis can be made from conditions (35)-(38) taking into account the restriction on the task space, and a larger stability domain will thus be obtained. More generally, all these conditions are only sufficient, and the convergence can be obtained even for larger errors. In the next section, we will see that our method is also robust in the presence of hand-eye calibration errors (the sufficient conditions for global asymptotic stability of the system in the presence of such supplementary errors can be found in [15]).

VI. EXPERIMENTAL RESULTS

The control law has been tested on a seven-d.o.f. industrial robot MITSUBISHI PA10 (at EDF DER Chatou) and on a six-d.o.f. Cartesian robot AFMA (at IRISA). The camera is mounted on the robot end-effector. In the presented experiments, d̂* is set to 50 cm while its real value is 60 cm. As far as calibration is concerned, two different sets of parameters have been used:

1) coarse calibration: the pixel and focal lengths given by the camera manufacturer are used, and the image center is used for the principal point. The transformation matrix between the camera and the robot end-effector frames is set with an accuracy to within 1 cm for the translation and 5° for the rotation;

2) bad calibration: a supplementary error is added to the camera intrinsic parameters (20%), as well as to the translation (5 cm on each axis) and to the rotation (5° on each axis) of the transformation matrix between the camera and the robot end-effector.

We first present the results obtained using coarse calibration for 3-D visual servoing and 2-D visual servoing. Then, the 2-1/2-D visual servoing results are presented using coarse and bad calibration. The images corresponding to the desired and initial camera positions are given in Fig. 10(a) and (b), respectively. As can be seen in Table I, the corresponding camera displacement is very important.
Fig. 10. Images of the target for the desired and the initial camera position. (a) Desired image. (b) Initial image.

TABLE I. MOTION PARAMETERS.

Fig. 11. Trajectory of the target points in the image for 3-D and 2-D visual servoing. (a) 3-D visual servoing. (b) 2-D visual servoing.
The target is composed of twelve white marks lying on three different planes (see Fig. 10). The extracted visual features are the image coordinates of the center of gravity of each mark. With such simple images, the control loop can easily be realized at video rate (i.e., 25 Hz). For large camera displacements, such as the one considered here, point matching between the initial and desired images is an important computer vision problem. This problem is not considered here, because of the simplicity of the considered target. Of course, we can note that it also occurs for 2-D visual servoing, and similarly for 3-D visual servoing, since point matching between the image and the 3-D model of the target is needed in that case.

A. 3-D Visual Servoing

In this experiment, the camera position is controlled in the Cartesian space. As can be seen in Fig. 11(a), the target leaves the camera field of view (for security reasons, the control scheme is stopped as soon as one of the target points is no longer visible in the image; of course, the servoing could continue with less than twelve points and, if the system succeeded in converging, the lost points would come back into the image; however, we have considered that the loss of at least one point shows an unsatisfactory behavior). This failure is encountered because, using this scheme, there is absolutely no control in the image. The probability of failure increases considerably when a bad camera calibration is used or in the presence of hand-eye calibration errors.

B. 2-D Visual Servoing

In this experiment, the camera is fully controlled using classical image-based visual servoing. Great robustness with respect to calibration errors can thus be expected. However, the camera trajectory in the Cartesian space is not satisfactory because of the coupling between the different visual features. This, once again, causes failure of the servoing since, as can be seen in Fig. 11(b), one target point leaves the image. This is due to the too large camera displacement from the initial to the desired pose.

C. 2-1/2-D Visual Servoing

We now present the results obtained using 2-1/2-D visual servoing. The three points of the target defining the reference plane are marked with a square in Fig. 10(a), and the chosen reference point is the nearest to the top of the image. Similar results may be obtained using another selection of the reference point. From the estimated homography, we get a partial estimation of the camera displacement. For example, the estimated rotation and direction of translation, computed using the initial and desired images, are given in Table I as a function of the camera calibration. Despite the coarse calibration which has been used, the estimation is quite precise (the maximal rotational error is around 5°, as well as the angular error on the direction of translation). If a bad calibration is used, the rotational and translational errors may approximately reach 7° and 9°, respectively.

In the first two experiments, the gain λ involved in the control law (23) was chosen constant. This explains why the convergence is very slow (approximately 100 s): λ has to be set to a small value in order that the camera velocity not be too large at the beginning of the servoing. As will be shown in the third experiment, λ can be automatically adapted in order to reduce the time to convergence.

1) Coarse Calibration: The error on the extended image coordinates and the estimated rotation are plotted in Fig. 12(a) and (b), respectively. The computed control law is given in Fig. 12(c) and (d). We can observe the convergence of the task function toward 0. The error on the coordinates of each target point is given in Fig. 12(e). We can note the convergence of the coordinates to their desired values, which demonstrates the correct realization of the task. Finally, the corresponding trajectories in the image are given in Fig. 12(f). The reference point trajectory can easily be identified, since it looks like a straight line in the image.

2) Bad Calibration: We now test the robustness of our approach with respect to a bad calibration, as described previously. The obtained results are given in Fig. 13. As can be seen in Fig. 13(a), the convergence of the error is no longer perfectly exponential. This is due to the bad calibration of the camera and to the rough approximation of d̂* (which had a very low influence when using a coarse calibration). However, even in this worse case, we can note the stability and the robustness of the control law. Contrary to the previous experiment, the trajectory of the reference point in the image is no longer a straight line, since the camera is badly calibrated, as is the homogeneous transformation matrix between the camera and the robot end-effector frames. However, the convergence of the image point coordinates to their desired values demonstrates the correct realization of the task.
Fig. 12. Results of 2-1/2-D visual servoing with coarse camera and hand-eye calibration: (a) error in extended image coordinates, (b) rotation θu, (c) translational velocity, (d) rotational velocity, (e) error in image point coordinates, and (f) trajectories in the image of the target points.
Fig. 13. Results of 2-1/2-D visual servoing with bad camera and hand-eye calibration: (a) error in extended image coordinates, (b) rotation θu, (c) translational velocity, (d) rotational velocity, (e) error in image point coordinates, and (f) trajectories in the image of the target points.
3) Bad Calibration and Adaptive Gain λ: In this experiment, the gain λ is automatically increased when the error decreases. As can be seen in Fig. 14, the time to convergence has been divided by a factor of three, while the initial camera position was very far away from its desired position (compare the initial values in Fig. 13(a) and (b) and Fig. 14(a) and (b)). The convergence has been reached in approximately 30 s, which is not so important on account of the large displacement to realize and the limited camera motion necessary for correct image tracking of the visual features. Of course, we can note that the convergence of the system is no longer exponential, and that the gain increase has added some noise to the control law (but not to the image features) near the convergence of the system.

Fig. 14. Same experiment using an adaptive gain λ: (a) error in extended image coordinates, (b) rotation θu, (c) translational velocity, (d) rotational velocity, (e) error in image point coordinates, and (f) trajectories in the image of the target points.

From the numerous experiments that have been realized (see [15]), we can conclude that, when 2-D or 3-D visual servoing succeeds, convergence is also reached with our scheme, but with a more satisfactory behavior. Furthermore, the convergence domain of 2-1/2-D visual servoing is indeed larger than for the two other schemes. We have to note, however, that our scheme sometimes fails. This occurs when some parts of the target occlude one or more image points (a problem not encountered with a planar target) or when some points leave the image plane (once again, we always stop an experiment as soon as one point is no longer visible, even if it is not necessary).

VII. ADAPTIVE GAINS

We now present an adaptive control law which takes into account the constraint that the target has to remain in the camera field of view. Another approach would consist in determining off line a specified trajectory in the Cartesian frame ensuring this constraint in the image plane. This path-planning problem seems to be very complex, and it has not been considered here. As for adaptive control, two kinds of methods can be distinguished [18]:

1) gains adaptation: this kind of control has the purpose of improving the robustness to calibration errors. The adaptation is based on the stability analysis of the closed-loop system and does not give any supplementary knowledge on the geometric parameters involved in the system (these parameters remain constant);

2) system parameters adaptation: this kind of control has the purpose of improving not only the stability, but also the dynamic behavior (and then the performance) of the system, by the estimation of the involved geometric parameters.

We have used the first kind of adaptive control.
The exponential decrease of the task function is again imposed:

    ė = −Λ e    (52)

but now using a positive diagonal matrix Λ whose elements are functions of the position of the image points. More precisely, thanks to the particular form of the interaction matrix, Λ can be chosen as:

    (53)

where the scalar gains are bell-shaped functions of the form:

    (54)

α and β being two parameters used to design the shape of the bell curve (see for example the function plotted in Fig. 15). In our case, the system is constrained such that:

    (55)

where only the first part of the state is constrained, since the rotation is free to evolve. The bounds are defined by the CCD size, while u and v are, respectively, the u and v coordinates of the image points nearest to these bounds. Similarly, bounds on the depth-related coordinate can be determined experimentally such that, for example, the image is not blurred.

The elements of matrix Λ work like bandpass filters. For example, if the error on the rotation grows because of its bad estimation, then the last three elements of Λ become smaller and the rotational control decreases. The same thing happens if the error on v (or on u) grows: the last four (or five) elements of Λ then become smaller, and only the stable part of the control law is considered. Using this control law, we have proved in [15] that the reference point never leaves the image, even in the presence of large camera and hand-eye calibration errors (providing the analysis of the robustness domain). This control law could also be used in 3-D visual servoing, but without any possible theoretical stability analysis. Furthermore, our technique cannot be used in 2-D visual servoing, since the rotational control loop is not decoupled from the translational one in that scheme.

We now present an experiment showing the behavior of the system using the adaptive control law. The target used is now planar (see Fig. 16), and the chosen reference point is marked with a square in the image. Similar results can be obtained with other targets [15].
Fig. 16. Images of the target for the desired and the initial camera position. (a) Desired image. (b) Initial image.
In order to prove that the adaptive control law allows a larger robustness domain, a supplementary error was added to the translation (20 cm on each axis) and to the rotation (20° on each axis) of the transformation matrix between the camera and the robot end-effector. With such a bad calibration and a large initial camera displacement, the target leaves the camera field of view when using classical 2-D and 3-D visual servoing, and also when using 2-1/2-D visual servoing with a constant gain λ. The distance d̂* is again set to 50 cm while its real value is 60 cm. The results obtained using the adaptive control law are shown in Fig. 17.

At the beginning of the servoing, the gains are equal to 1, since the target is in the center of the image. When the visual servoing starts, the target moves rapidly toward the top of the image. The gain decreases during the first 50 iterations. The rotational velocity thus decreases, and the target does not leave the image. Then, the target starts to move to the left (since the control on u is stable) and thus comes back toward the center. The gains increase progressively to 1 (iteration 4000). At this moment, the two points at the bottom of the image, which have gone past their desired positions since the system is badly calibrated, move toward the bottom of the image. The gain starts to decrease again, down to about 0.6. At the same time, the rotation continues to decrease toward zero. Therefore, the error decreases and the gain can increase to 1 until the convergence. In this experiment, the convergence rate was slow, since we only wanted to show the behavior of the system when only the elements of Λ were changed. Once again, the convergence rate can be improved by increasing the gain λ when the error decreases.

Fig. 17. Robust control law: (a) error in extended image coordinates, (b) rotation θu, (c) translational velocity, (d) rotational velocity, (e) gains f_u, f_v, and f_u·f_v, and (f) trajectories in the image of the target points.

VIII. CONCLUSION

In this paper, we have proposed a new approach to vision-based robot control which presents many advantages with respect to classical position-based and image-based visual servoing. This new method does not need any 3-D target model, nor a precise camera calibration, and it presents very interesting decoupling and stability properties. Thanks to its simple structure, analytical results on its robustness with respect to calibration errors have been obtained. Experimental results show the validity of our approach and its robustness not only with respect to camera calibration errors, but also to hand-eye calibration errors. More experimental results can be found in [15]. One drawback of our method is that, for a nonplanar target, at least eight points are necessary to estimate the homography matrix, while at least four points are theoretically needed in the other schemes. Another drawback is that our method is more sensitive to image noise than 2-D visual servoing, since that scheme directly uses visual features as input of the control law, without any supplementary estimation step. Future work will thus be devoted to improving the robustness of our method with respect to image noise, and to the use of 2-1/2-D visual servoing on real objects and complex images.

ACKNOWLEDGMENT

The authors would like to thank the team manager and the researchers of the Teleoperation/Robotics group, DER Chatou, for their participation and help, C. Samson and the anonymous reviewers for their valuable comments, and T. Drummond for careful reading.

REFERENCES

[1] P. K. Allen, A. Timcenko, B. Yoshimi, and P. Michelman, "Automated tracking and grasping of a moving object with a robotic hand-eye system," IEEE Trans. Robot. Automat., vol. 9, pp. 152-165, Apr. 1993.
[2] R. Basri, E. Rivlin, and I. Shimshoni, "Visual homing: Surfing on the epipoles," in IEEE Int. Conf. Comput. Vision, ICCV'98, Bombay, India, Jan. 1998, pp. 863-869.
[3] F. Bensalah and F. Chaumette, "Compensation of abrupt motion changes in target tracking by visual servoing," in IEEE/RSJ Int. Conf. Intell. Robots Syst., IROS'95, Pittsburgh, PA, Aug. 1995, vol. 1, pp. 181-187.
[4] B. Boufama and R. Mohr, "Epipole and fundamental matrix estimation using the virtual parallax property," in IEEE Int. Conf. Comput. Vision, ICCV'95, Cambridge, MA, 1995, pp. 1030-1036.
[5] F. Chaumette, "Potential problems of stability and convergence in image-based and position-based visual servoing," in The Confluence of Vision and Control, LNCIS Series, vol. 237, D. Kriegman, G. Hager, and A. Morse, Eds. New York: Springer-Verlag, 1998, pp. 66-78.
[6] D. Dementhon and L. S. Davis, "Model-based object pose in 25 lines of code," Int. J. Comput. Vision, vol. 15, nos. 1/2, pp. 123-141, June 1995.
[7] B. Espiau, F. Chaumette, and P. Rives, "A new approach to visual servoing in robotics," IEEE Trans. Robot. Automat., vol. 8, pp. 313-326, June 1992.
[8] B. Espiau, "Effect of camera calibration errors on visual servoing in robotics," in Proc. 3rd Int. Symp. Experimental Robot., Kyoto, Japan, Oct. 1993.
[9] O. Faugeras and F. Lustman, "Motion and structure from motion in a piecewise planar environment," Int. J. Pattern Recognit. Artif. Intell., vol. 2, no. 3, pp. 485-508, 1988.
[10] G. D. Hager, "A modular system for robust positioning using feedback from stereo vision," IEEE Trans. Robot. Automat., vol. 13, pp. 582-595, Aug. 1997.
[11] R. I. Hartley, "In defense of the eight-point algorithm," IEEE Trans. Pattern Anal. Machine Intell., vol. 19, pp. 580-593, June 1997.
[12] K. Hashimoto, Ed., Visual Servoing: Real Time Control of Robot Manipulators Based on Visual Sensory Feedback, vol. 7 of World Scientific Series in Robotics and Automated Systems. Singapore: World Scientific, 1993.
[13] S. Hutchinson, G. D. Hager, and P. I. Corke, "A tutorial on visual servo control," IEEE Trans. Robot. Automat., vol. 12, pp. 651-670, Oct. 1996.
[14] E. Malis, F. Chaumette, and S. Boudet, "Positioning a coarse-calibrated camera with respect to an unknown planar object by 2-D 1/2 visual servoing," in Proc. 5th IFAC Symp. Robot Contr. (SYROCO'97), Nantes, France, Sept. 1997, vol. 2, pp. 517-523.
[15] E. Malis, "Contributions à la modélisation et à la commande en asservissement visuel," Ph.D. thesis, Univ. Rennes I, IRISA, France, Nov. 1998.
[16] E. Malis, F. Chaumette, and S. Boudet, "Camera displacement through the recovery of a homography: Application to 2-1/2-D visual servoing," Int. J. Comput. Vision, 1999.
[17] N. P. Papanikolopoulos, P. K. Khosla, and T. Kanade, "Visual tracking of a moving target by a camera mounted on a robot: A combination of control and vision," IEEE Trans. Robot. Automat., vol. 9, pp. 14-35, Feb. 1993.
[18] C. Samson, M. Le Borgne, and B. Espiau, Robot Control: The Task Function Approach, vol. 22 of Oxford Engineering Science Series. Oxford, U.K.: Clarendon, 1991.
[19] L. E. Weiss, A. C. Sanderson, and C. P. Neuman, "Dynamic sensor-based control of robots with visual feedback," IEEE J. Robot. Automat., vol. 3, pp. 404-417, Oct. 1987.
[20] W. J. Wilson, C. C. W. Hulls, and G. S. Bell, "Relative end-effector control using Cartesian position-based visual servoing," IEEE Trans. Robot. Automat., vol. 12, pp. 684-696, Oct. 1996.
Ezio Malis was born in Gorizia, Italy, in 1970. He graduated from the Politecnico di Milano, Italy, and from the École Supérieure d'Électricité (Supélec), Paris, France, in 1995, and received the Ph.D. degree from the University of Rennes, Rennes, France, in 1998. He is a Research Associate with the Department of Engineering, University of Cambridge, Cambridge, U.K. His research interests include robotics, computer vision, and vision-based control.
François Chaumette was born in Nantes, France, in 1963 and graduated from the École Nationale Supérieure de Mécanique, Nantes, in 1987. He received the Ph.D. degree and the Habilitation à Diriger des Recherches in computer science from the University of Rennes in 1990 and 1998, respectively. Since 1990, he has been with IRISA/INRIA, Rennes. His research interests include robotics, computer vision, and especially the coupling of these two research domains (vision-based control, active vision, and purposive vision). Dr. Chaumette received the AFCET/CNRS Prize for the best French thesis in automatic control in 1991.
Sylvie Boudet was born in Grenoble, France, in 1971. She graduated from the École Supérieure d'Électricité (Supélec), Paris, France, in 1994. She works as a Research Engineer at the Research Center of Électricité de France (EDF), Chatou. EDF is the French company that produces, transports, and delivers electricity. Her research interests are in developing robot controllers, including force control and vision-based control, in order to facilitate robotic maintenance in nuclear power plants or in any other hostile environment. She has also led a project on a medical robot, Hippocrate, designed to improve medical diagnosis of heart diseases.