Chapter 2

CAMERA CALIBRATION

Zhengyou Zhang

2.1 Introduction

Camera calibration techniques can be roughly classified into the following categories:

3D reference object based calibration. Camera calibration is performed by observing a calibration object whose geometry in 3-D space is known with very good precision. These approaches require an expensive calibration apparatus and an elaborate setup.
2D plane based calibration. Techniques in this category require observing a planar pattern shown at a few different orientations [42, 31]. Different from Tsai's technique [33], knowledge of the plane motion is not necessary. Because almost anyone can make such a calibration pattern by him/herself, the setup is easier for camera calibration.
1D line based calibration. Calibration objects used in this category are composed of a set of collinear points [44]. As will be shown, a camera can be calibrated by observing a line moving around a fixed point, such as a string of balls hanging from the ceiling.
Self-calibration. Techniques in this category do not use any calibration object, and can be considered a 0D approach because only image point correspondences are required. Just by moving a camera in a static scene, the rigidity of the scene provides in general two constraints [22, 21] on the camera's internal parameters from one camera displacement by using image information alone. Therefore, if images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allow us to reconstruct 3-D structure up to a similarity [20, 17]. Although no calibration objects are necessary, a large number of parameters needs to be estimated, resulting in a much harder mathematical problem.
Other techniques exist: vanishing points for orthogonal directions [4, 19],
and calibration from pure rotation [16, 30].
Before going further, I'd like to point out that no single calibration technique is best for all situations; it really depends on the situation a user needs to deal with. The following are my recommendations:
Calibration with apparatus vs. self-calibration. Whenever possible, if
we can pre-calibrate a camera, we should do it with a calibration apparatus. Self-calibration cannot usually achieve an accuracy comparable
with that of pre-calibration because self-calibration needs to estimate a
large number of parameters, resulting in a much harder mathematical
problem. When pre-calibration is impossible (e.g., scene reconstruction
from an old movie), self-calibration is the only choice.
Partial vs. full self-calibration. Partial self-calibration refers to the case where only a subset of the camera intrinsic parameters are to be calibrated. Along the same line as the previous recommendation, whenever possible, partial self-calibration is preferred because the number of parameters to be estimated is smaller. Take, as an example, 3D reconstruction with a camera with variable focal length: it is preferable to pre-calibrate the pixel aspect ratio and the pixel skewness.
Calibration with 3D vs. 2D apparatus. The highest accuracy can usually be obtained by using a 3D apparatus, so it should be used when accuracy is indispensable and when it is affordable to make and use a 3D apparatus.
From the feedback I received from computer vision researchers and
practitioners around the world in the last couple of years, calibration
with a 2D apparatus seems to be the best choice in most situations
because of its ease of use and good accuracy.
Calibration with 1D apparatus. This technique is relatively new, and it is hard for the moment to predict how popular it will be. It should, however, be especially useful for calibrating a camera network. To calibrate the relative geometry between multiple cameras as well as their intrinsic parameters, it is necessary for all cameras involved to simultaneously observe a number of points. This is hardly possible with a 3D or 2D calibration apparatus¹ if one camera is mounted at the front of a room while another is at the back. This is not a problem for 1D objects: we can, for example, use a string of balls hanging from the ceiling.
This chapter is organized as follows. Section 2.2 describes the camera model and introduces the concept of the absolute conic, which is important for camera calibration. Section 2.3 presents the calibration techniques using a 3D apparatus. Section 2.4 describes a calibration technique based on observing a freely moving planar pattern (2D object); its extension for stereo calibration is also addressed. Section 2.5 describes a relatively new technique which uses a set of collinear points (1D object). Section 2.6 briefly introduces the self-calibration approach and provides references for further reading. Section 2.7 concludes the chapter with a discussion on recent work in this area.
¹An exception is when those apparatus are made transparent; then the cost would be much higher.

2.2 Notation and Problem Statement

2.2.1 Pinhole Camera Model
Figure 2.1. Pinhole camera model.

A 2D image point is denoted by m = [u, v]^T and a 3D point by M = [X, Y, Z]^T. We use x̃ to denote the augmented vector obtained by adding 1 as the last element: m̃ = [u, v, 1]^T and M̃ = [X, Y, Z, 1]^T. A camera is modeled by the usual pinhole (see Figure 2.1): the relationship between a 3D point M and its image projection m is given by

s m̃ = A [R t] M̃ ≡ P M̃   (2.1)

with

A = | α  γ  u0 |
    | 0  β  v0 |
    | 0  0  1  |   (2.2)

and

P = A [R t] ,   (2.3)
where s is an arbitrary scale factor; (R, t), called the extrinsic parameters, is the rotation and translation which relates the world coordinate system to the camera coordinate system; and A is called the camera intrinsic matrix, with (u0, v0) the coordinates of the principal point, α and β the scale factors in the image u and v axes, and γ the parameter describing the skew of the two image axes. The 3×4 matrix P is called the camera projection matrix, which mixes both intrinsic and extrinsic parameters. In Figure 2.1, the angle between the two image axes is denoted by θ, and we have γ = α cot θ. If the pixels are rectangular, then θ = 90° and γ = 0.
The task of camera calibration is to determine the parameters of the
transformation between an object in 3D space and the 2D image observed by
the camera from visual information (images). The transformation includes
Extrinsic parameters (sometimes called external parameters): orientation (rotation) and location (translation) of the camera, i.e., (R, t);
Intrinsic parameters (sometimes called internal parameters): characteristics of the camera, i.e., (α, β, γ, u0, v0).
The rotation matrix, although consisting of 9 elements, only has 3 degrees
of freedom. The translation vector t obviously has 3 parameters. Therefore,
there are 6 extrinsic parameters and 5 intrinsic parameters, leading to in
total 11 parameters.
We use the abbreviation A^{-T} for (A^{-1})^T or (A^T)^{-1}.
2.2.2 Absolute Conic
Now let us introduce the concept of the absolute conic. For more details,
the reader is referred to [7, 15].
A point x in 3D space has projective coordinates x̃ = [x1, x2, x3, x4]^T. The equation of the plane at infinity, Π∞, is x4 = 0. The absolute conic Ω is defined by a set of points satisfying the equations

x1² + x2² + x3² = 0 ,   x4 = 0 .   (2.4)
Figure 2.2. The absolute conic (points x̃ = [x^T, 0]^T with x^T x = 0 on the plane at infinity) and its image (m̃^T A^{-T} A^{-1} m̃ = 0), as seen from the camera center C.
Consider a point x∞ = [x^T, 0]^T on the absolute conic, so x^T x = 0. Its image is m̃∞, which is given by

m̃∞ = s A [R t] [x; 0] = s A R x .

It follows that

m̃∞^T A^{-T} A^{-1} m̃∞ = s² x^T R^T R x = s² x^T x = 0 .
Therefore, the image of the absolute conic is an imaginary conic, defined by A^{-T} A^{-1}. It does not depend on the extrinsic parameters of the camera. If we can determine the image of the absolute conic, then we can solve for the camera's intrinsic parameters, and the calibration is solved. We will show several ways in this chapter to determine ω, the image of the absolute conic.
2.3 Camera Calibration with 3D Objects

Figure 2.3. Calibration with a 3D apparatus: (a) an apparatus with known geometry; (b) a plane undergoing a known displacement.
2.3.1 Feature Extraction
If one uses a generic corner detector, such as the Harris corner detector, to detect the corners in the checker pattern image, the result is usually not good because the detected corners have poor accuracy (about one pixel). A better solution is to leverage the known pattern structure by first estimating a line for each side of the square and then computing the corners by intersecting the fitted lines. There are two common techniques to estimate the lines. The first is to detect edges, and then fit a line to the edges on each side of the square. The second technique is to directly fit a line to each side of a square in the image such that the gradient on the line is maximized. One possibility is to represent the line by an elongated Gaussian, and estimate the parameters of the elongated Gaussian by maximizing the total gradient covered by the Gaussian. We should note that if the lens distortion is not severe, a better solution is to fit just one single line to all the collinear sides; this leads to a much more accurate estimation of the position of the checker corners.
2.3.2 Linear Estimation of the Camera Projection Matrix

Once we extract the corner points in the image, we can easily establish their correspondences with the points in 3D space because of our knowledge of the pattern. Based on the projection equation (2.1), we are now able to estimate the camera parameters. However, the problem is quite nonlinear if we try to estimate A, R and t directly. If, on the other hand, we estimate the camera projection matrix P, a linear solution is possible, as shown now.

Given each 2D-3D correspondence m_i = (u_i, v_i) ↔ M_i = (X_i, Y_i, Z_i), we can write down 2 equations based on (2.1):

G_i p = 0   with   G_i = | X_i  Y_i  Z_i  1  0    0    0    0  −u_i X_i  −u_i Y_i  −u_i Z_i  −u_i |
                         | 0    0    0    0  X_i  Y_i  Z_i  1  −v_i X_i  −v_i Y_i  −v_i Z_i  −v_i |

where p is the 12-vector of the entries of P. Stacking the equations from N correspondences gives Gp = 0, which is solved subject to ‖p‖ = 1; the solution is the right singular vector of G associated with the smallest singular value.
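As an illustration, the following sketch implements this linear (DLT) estimation under the constraint ‖p‖ = 1; the function name and array layout are assumptions, not the chapter's code.

```python
import numpy as np

def estimate_P(Ms, ms):
    """DLT: Ms is Nx3 world points, ms is Nx2 pixel points, N >= 6."""
    G = []
    for (X, Y, Z), (u, v) in zip(Ms, ms):
        G.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        G.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    # Solution of G p = 0 subject to ||p|| = 1: the right singular vector
    # of G associated with the smallest singular value.
    _, _, vt = np.linalg.svd(np.asarray(G))
    return vt[-1].reshape(3, 4)
```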
2.3.3 Recovering Intrinsic and Extrinsic Parameters from P

Once the camera projection matrix P is known, we can uniquely recover the intrinsic and extrinsic parameters of the camera. Let us denote the first 3×3 submatrix of P by B and the last column of P by b, i.e., P ≡ [B b]. Since P = A[R t], we have

B = AR   (2.5)
b = At .  (2.6)

From (2.5),

K ≡ BB^T = AA^T = | α² + γ² + u0²   u0 v0 + γβ   u0 |      | k_u  k_c  u0 |
                  | u0 v0 + γβ      β² + v0²     v0 |  ≡   | k_c  k_v  v0 |
                  | u0              v0           1  |      | u0   v0   1  |

Because P is defined up to a scale factor, the last element of K = BB^T is usually not equal to 1, so we have to normalize it such that K33 (the last element) = 1. After that, we immediately obtain

u0 = K13   (2.7)
v0 = K23   (2.8)
β = √(k_v − v0²)   (2.9)
γ = (k_c − u0 v0)/β   (2.10)
α = √(k_u − u0² − γ²) .  (2.11)
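A minimal sketch of this decomposition, assuming P has been estimated as above and using the shorthand k_u = K11, k_c = K12, k_v = K22:

```python
import numpy as np

def intrinsics_from_P(P):
    """Recover (alpha, beta, gamma, u0, v0) from P = [B b] via K = B B^T."""
    Bm = P[:, :3]
    K = Bm @ Bm.T
    K = K / K[2, 2]                  # normalize so the last element is 1
    u0, v0 = K[0, 2], K[1, 2]
    ku, kc, kv = K[0, 0], K[0, 1], K[1, 1]
    beta = np.sqrt(kv - v0**2)       # (2.9)
    gamma = (kc - u0 * v0) / beta    # (2.10)
    alpha = np.sqrt(ku - u0**2 - gamma**2)   # (2.11)
    return alpha, beta, gamma, u0, v0
```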
2.3.4 Recovering the Extrinsic Parameters

Once A is known, the extrinsic parameters follow from (2.5) and (2.6):

R = A^{-1} B   (2.12)
t = A^{-1} b .  (2.13)
2.3.5 Lens Distortion
The imaging process can be decomposed into four steps:

Step 1: Rigid transformation from the world coordinate system to the camera coordinate system:

[X, Y, Z]^T = R [X_w, Y_w, Z_w]^T + t .

Step 2: Perspective projection to the ideal (distortion-free) image coordinates:

x = f X/Z ,   y = f Y/Z .

Step 3: Lens distortion²:

x̆ = x + δ_x ,   y̆ = y + δ_y ,

where (x̆, y̆) are the distorted or true image coordinates, and (δ_x, δ_y) are distortions applied to (x, y).

Step 4: Affine transformation from real image coordinates (x̆, y̆) to frame buffer (pixel) image coordinates (ŭ, v̆):

ŭ = d_x^{-1} x̆ + u0 ,   v̆ = d_y^{-1} y̆ + v0 ,

where (u0, v0) are the coordinates of the principal point, and d_x and d_y are distances between adjacent pixels in the horizontal and vertical directions, respectively.
There are two types of distortions:
Radial distortion: It is symmetric; ideal image points are distorted along
radial directions from the distortion center. This is caused by imperfect
lens shape.
Decentering distortion: This is usually caused by improper lens assembly; ideal image points are distorted in both radial and tangential directions.
The reader is referred to [29, 3, 6, 37] for more details.
²Note that the lens distortion described here is different from Tsai's treatment. Here, we go from ideal to real image coordinates, similar to [36].
Considering only the first two terms of radial distortion, the ideal pixel coordinates (u, v) and the corresponding real observed image coordinates (ŭ, v̆) are related by

ŭ = u + (u − u0)[k1 (x² + y²) + k2 (x² + y²)²]   (2.15)
v̆ = v + (v − v0)[k1 (x² + y²) + k2 (x² + y²)²] ,  (2.16)

where (x, y) are the ideal normalized image coordinates. Calibration is then performed by minimizing

min_{A,R,t,k1,k2}  Σ_i ‖m_i − m̆(A, R, t, k1, k2, M_i)‖² ,   (2.17)

where m̆(A, R, t, k1, k2, M_i) is the projection of M_i onto the image according to (2.1), followed by distortion according to (2.15) and (2.16).
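For concreteness, a direct transcription of (2.15)-(2.16); (x, y) are assumed to be the ideal normalized image coordinates corresponding to (u, v):

```python
def distort(u, v, x, y, u0, v0, k1, k2):
    """Apply the two-term radial distortion model of (2.15)-(2.16)."""
    r2 = x * x + y * y
    factor = k1 * r2 + k2 * r2 * r2
    return u + (u - u0) * factor, v + (v - v0) * factor
```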
2.3.6 An Example
The estimated angle between the image axes is 90°, and the aspect ratio of the pixels is 0.679. For the extrinsic parameters, the translation vector is t = [211.28, 106.06, 1583.75]^T (in mm), i.e., the calibration object is about 1.5 m away from the camera; the rotation axis is [0.08573, 0.99438, 0.0621]^T (i.e., almost vertical), and the rotation angle is 47.7°.
Other notable work in this category includes [27, 38, 36, 18].
2.4 Camera Calibration with 2D Objects: Plane-Based Technique

2.4.1 Homography Between the Model Plane and Its Image
Without loss of generality, we assume the model plane is on Z = 0 of the world coordinate system. Denote the ith column of the rotation matrix R by r_i. From (2.1), we have

s [u; v; 1] = A [r1 r2 r3 t] [X; Y; 0; 1] = A [r1 r2 t] [X; Y; 1] .

By abuse of notation, we still use M to denote a point on the model plane, but M = [X, Y]^T since Z is always equal to 0. In turn, M̃ = [X, Y, 1]^T. Therefore, a model point M and its image m are related by a homography H:

s m̃ = H M̃   with   H = A [r1 r2 t] .   (2.18)

As is clear, the 3×3 matrix H is defined up to a scale factor.
2.4.2
h2
h3 ] = A [ r1
r2
t] ,
hT2 AT A1 h2
(2.19)
.
(2.20)
These are the two basic constraints on the intrinsic parameters, given one
homography. Because a homography has 8 degrees of freedom and there
are 6 extrinsic parameters (3 for rotation and 3 for translation), we can only
obtain 2 constraints on the intrinsic parameters. Note that AT A1 actually
describes the image of the absolute conic [20]. In the next subsection, we
will give an geometric interpretation.
2.4.3 Geometric Interpretation
We now relate (2.19) and (2.20) to the absolute conic [22, 20]. It is not difficult to verify that the model plane, under our convention, is described in the camera coordinate system by the following equation:

[ r3 ; r3^T t ]^T [x; y; z; w] = 0 ,

where w = 0 for points at infinity and w = 1 otherwise.
2.4.4 Closed-Form Solution
We now provide the details of how to effectively solve the camera calibration problem. We start with an analytical solution. This initial estimation will be followed by a nonlinear optimization technique based on the maximum likelihood criterion, to be described in the next subsection.
Let

B = A^{-T} A^{-1} ≡ | B11  B12  B13 |
                    | B12  B22  B23 |
                    | B13  B23  B33 |   (2.21)

  = | 1/α²                −γ/(α²β)                      (v0γ − u0β)/(α²β)                 |
    | −γ/(α²β)            γ²/(α²β²) + 1/β²              −γ(v0γ − u0β)/(α²β²) − v0/β²      |
    | (v0γ − u0β)/(α²β)   −γ(v0γ − u0β)/(α²β²) − v0/β²  (v0γ − u0β)²/(α²β²) + v0²/β² + 1 |   (2.22)

Note that B is symmetric, and can be defined by a 6D vector

b = [B11, B12, B22, B13, B23, B33]^T .   (2.23)
Let the ith column vector of H be h_i = [h_i1, h_i2, h_i3]^T. Then, we have

h_i^T B h_j = v_ij^T b   (2.24)

with v_ij = [h_i1 h_j1, h_i1 h_j2 + h_i2 h_j1, h_i2 h_j2, h_i3 h_j1 + h_i1 h_j3, h_i3 h_j2 + h_i2 h_j3, h_i3 h_j3]^T. Therefore, the two fundamental constraints (2.19) and (2.20), from a given homography, can be rewritten as 2 homogeneous equations in b:

| v12^T          |
| (v11 − v22)^T  |  b = 0 .   (2.25)

If n images of the model plane are observed, by stacking n such equations as (2.25) we have

V b = 0 ,   (2.26)

where V is a 2n×6 matrix. If n ≥ 3, we will have in general a unique solution b defined up to a scale factor. If n = 2, we can impose the skewless constraint γ = 0, i.e., [0, 1, 0, 0, 0, 0] b = 0, which is added as an additional equation to (2.26). (If n = 1, we can only solve two camera intrinsic parameters, e.g., α and β, assuming u0 and v0 are known (e.g., at the image center) and γ = 0, and that is indeed what we did in [28] for head pose determination based on the fact that eyes and mouth are reasonably coplanar. In fact, Tsai [33] already mentions that focal length from one plane is possible, but incorrectly says that aspect ratio is not.) The solution to (2.26) is well known as the eigenvector of V^T V associated with the smallest eigenvalue (equivalently, the right singular vector of V associated with the smallest singular value).

Once b is estimated, we can compute all camera intrinsic parameters as follows. The matrix B, as described in Sect. 2.4.4, is estimated up to a scale factor, i.e., B = λ A^{-T} A^{-1} with λ an arbitrary scale. Without difficulty, we can uniquely extract the intrinsic parameters from matrix B:
v0 = (B12 B13 − B11 B23)/(B11 B22 − B12²)
λ = B33 − [B13² + v0 (B12 B13 − B11 B23)]/B11
α = √(λ/B11)
β = √(λ B11/(B11 B22 − B12²))
γ = −B12 α² β/λ
u0 = γ v0/β − B13 α²/λ .
Once A is known, the extrinsic parameters for each image are readily computed. From (2.18), we have

r1 = λ A^{-1} h1 ,   r2 = λ A^{-1} h2 ,   r3 = r1 × r2 ,   t = λ A^{-1} h3

with λ = 1/‖A^{-1} h1‖ = 1/‖A^{-1} h2‖. Because of noise in the data, the so-computed matrix R = [r1, r2, r3] does not in general satisfy the properties of a rotation matrix.
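Putting the pieces of this subsection together, here is a sketch of the closed-form solution, assuming the homographies have already been estimated (see Section 2.8); it transcribes (2.24)-(2.26) and the extraction formulas above, and is illustrative rather than the original implementation.

```python
import numpy as np

def v_ij(H, i, j):
    # v_ij of (2.24); h_i is the ith COLUMN of H, so h_i1 = H[0, i], etc.
    return np.array([H[0,i]*H[0,j],
                     H[0,i]*H[1,j] + H[1,i]*H[0,j],
                     H[1,i]*H[1,j],
                     H[2,i]*H[0,j] + H[0,i]*H[2,j],
                     H[2,i]*H[1,j] + H[1,i]*H[2,j],
                     H[2,i]*H[2,j]])

def calibrate_closed_form(Hs):
    V = []
    for H in Hs:                                  # two rows per homography
        V.append(v_ij(H, 0, 1))                   # constraint (2.19)
        V.append(v_ij(H, 0, 0) - v_ij(H, 1, 1))   # constraint (2.20)
    _, _, vt = np.linalg.svd(np.asarray(V))
    b = vt[-1]                                    # solution of V b = 0
    if b[0] < 0:
        b = -b                                    # b is up to sign; B11 > 0
    B11, B12, B22, B13, B23, B33 = b
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha**2 / lam
    A = np.array([[alpha, gamma, u0], [0.0, beta, v0], [0.0, 0.0, 1.0]])
    Ainv = np.linalg.inv(A)
    extrinsics = []
    for H in Hs:
        s = 1.0 / np.linalg.norm(Ainv @ H[:, 0])
        r1, r2 = s * Ainv @ H[:, 0], s * Ainv @ H[:, 1]
        t = s * Ainv @ H[:, 2]
        extrinsics.append((np.column_stack([r1, r2, np.cross(r1, r2)]), t))
    return A, extrinsics
```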
2.4.5 Maximum Likelihood Estimation

The above solution minimizes an algebraic distance which is not physically meaningful. It can be refined through maximum likelihood inference. Given n images of the model plane and m points on the plane, the maximum likelihood estimate is obtained by minimizing

Σ_{i=1}^{n} Σ_{j=1}^{m} ‖m_ij − m̂(A, R_i, t_i, M_j)‖² ,   (2.27)

where m̂(A, R_i, t_i, M_j) is the projection of point M_j in image i, according to equation (2.18). A rotation R is parameterized by a vector of 3 parameters, denoted by r, which is parallel to the rotation axis and whose magnitude is equal to the rotation angle. R and r are related by the Rodrigues formula [8].
Minimizing (2.27) is a nonlinear minimization problem, which is solved with
the Levenberg-Marquardt Algorithm as implemented in Minpack [23]. It
requires an initial guess of A, {Ri , ti |i = 1..n} which can be obtained using
the technique described in the previous subsection.
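For reference, a sketch of the Rodrigues formula converting the 3-vector r to a rotation matrix:

```python
import numpy as np

def rodrigues(r):
    """Rotation matrix from axis-angle 3-vector r (axis times angle)."""
    theta = np.linalg.norm(r)
    if theta < 1e-12:
        return np.eye(3)
    k = r / theta                          # unit rotation axis
    K = np.array([[0, -k[2], k[1]],
                  [k[2], 0, -k[0]],
                  [-k[1], k[0], 0]])       # cross-product matrix of k
    return np.eye(3) + np.sin(theta) * K + (1 - np.cos(theta)) * (K @ K)
```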
Desktop cameras usually have visible lens distortion, especially the radial components. We have included these while minimizing (2.27). See my
technical report [41] for more details.
2.4.6 Dealing with Radial Distortion
Estimating Radial Distortion by Alternation. As the radial distortion is expected to be small, one would expect to estimate the other five intrinsic parameters, using the technique described in Sect. 2.4.5, reasonably well by simply ignoring distortion. One strategy is then to estimate k1 and k2 after having estimated the other parameters, which give us the ideal pixel coordinates (u, v). Then, from (2.15) and (2.16), we have two equations for each point in each image:

| (u − u0)(x² + y²)   (u − u0)(x² + y²)² | | k1 |   | ŭ − u |
| (v − v0)(x² + y²)   (v − v0)(x² + y²)² | | k2 | = | v̆ − v | .   (2.28)

Stacking these equations over all points in all images gives a linear least-squares problem in (k1, k2). Once k1 and k2 are estimated, one can refine the estimate of the other parameters by solving (2.27) with m̂(A, R_i, t_i, M_j) replaced by (2.15) and (2.16). We can alternate these two procedures until convergence.
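A sketch of this linear step, stacking the rows of (2.28) over all points; `ideal` (u, v, x, y per point) and `observed` (ŭ, v̆ per point) are illustrative names:

```python
import numpy as np

def estimate_k(ideal, observed, u0, v0):
    """Solve the stacked system (2.28) for k = [k1, k2] by least squares."""
    D, d = [], []
    for (u, v, x, y), (ud, vd) in zip(ideal, observed):
        r2 = x * x + y * y
        D.append([(u - u0) * r2, (u - u0) * r2**2])
        D.append([(v - v0) * r2, (v - v0) * r2**2])
        d.extend([ud - u, vd - v])
    k, *_ = np.linalg.lstsq(np.asarray(D), np.asarray(d), rcond=None)
    return k
```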
Complete Maximum Likelihood Estimation. Experimentally, we found the convergence of the above alternation technique to be slow. A natural extension to (2.27) is then to estimate the complete set of parameters by minimizing the following functional:

Σ_{i=1}^{n} Σ_{j=1}^{m} ‖m_ij − m̆(A, k1, k2, R_i, t_i, M_j)‖² ,   (2.29)

where m̆(A, k1, k2, R_i, t_i, M_j) is the projection of point M_j in image i according to equation (2.18), followed by distortion according to (2.15) and (2.16). This is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt Algorithm as implemented in Minpack [23]. A rotation is again parameterized by a 3-vector r, as in Sect. 2.4.5. An initial guess of A and {R_i, t_i | i = 1..n} can be obtained using the technique described in Sect. 2.4.4 or in Sect. 2.4.5. An initial guess of k1 and k2 can be obtained with the technique described in the last paragraph, or simply by setting them to 0.
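The following sketch sets up the minimization of (2.29), with SciPy's Levenberg-Marquardt standing in for the Minpack routine used in the chapter; `model` (m×2 plane points) and `images` (a list of m×2 observed pixel arrays) are assumed inputs:

```python
import numpy as np
from scipy.optimize import least_squares
from scipy.spatial.transform import Rotation

def project(cam, r, t, model):
    alpha, beta, gamma, u0, v0, k1, k2 = cam
    A = np.array([[alpha, gamma, u0], [0, beta, v0], [0, 0, 1]])
    R = Rotation.from_rotvec(r).as_matrix()       # Rodrigues formula
    P = np.column_stack([R[:, 0], R[:, 1], t])    # plane points have Z = 0
    Mh = np.column_stack([model, np.ones(len(model))])
    cam3d = Mh @ P.T                              # points in the camera frame
    x, y = cam3d[:, 0] / cam3d[:, 2], cam3d[:, 1] / cam3d[:, 2]
    pix = Mh @ (A @ P).T                          # ideal pixel projection
    u, v = pix[:, 0] / pix[:, 2], pix[:, 1] / pix[:, 2]
    f = k1 * (x**2 + y**2) + k2 * (x**2 + y**2)**2
    return np.column_stack([u + (u - u0) * f, v + (v - v0) * f])

def residuals(p, model, images):
    cam, poses = p[:7], p[7:].reshape(len(images), 6)
    return np.concatenate([(project(cam, rt[:3], rt[3:], model) - obs).ravel()
                           for rt, obs in zip(poses, images)])

# p0 stacks the closed-form intrinsics, k1 = k2 = 0, and per-image (r_i, t_i):
# sol = least_squares(residuals, p0, method='lm', args=(model, images))
```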
2.4.7 Summary

The recommended calibration procedure is as follows:
1. Print a pattern and attach it to a planar surface.
2. Take a few images of the model plane under different orientations by moving either the plane or the camera.
3. Detect the feature points in the images.
4. Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form solution described in Sect. 2.4.4.
5. Estimate the coefficients of the radial distortion by solving the linear least-squares equations (2.28).
6. Refine all parameters by minimizing (2.29).
2.4.8 Experimental Results
The proposed algorithm has been tested on both computer simulated data and real data. The closed-form solution involves finding a singular value decomposition of a small 2n×6 matrix, where n is the number of images. The nonlinear refinement within the Levenberg-Marquardt algorithm takes 3 to 5 iterations to converge. Due to space limitations, we describe in this section one set of experiments with real data where the calibration pattern is at different distances from the camera. The reader is referred to [41] for more experimental results with both computer simulated and real data, and to the following Web page for some experimental data and the software:

http://research.microsoft.com/zhang/Calib/
Figure 2.6. Two sets of images taken at different distances from the calibration pattern. Each set contains five images. On the left, three images from the set taken at a close distance are shown. On the right, three images from the set taken at a larger distance are shown.
Table 2.1. Calibration results with the images shown in Figure 2.6

image set      α         β         θ        u0        v0        k1         k2
A            834.01    839.86    89.95°   305.51    240.09    −0.2235    0.3761
B            836.17    841.08    89.92°   301.76    241.51    −0.2676    1.3121
A+B          834.64    840.32    89.94°   304.77    240.59    −0.2214    0.3643
The example is shown in Fig. 2.6. The camera to be calibrated is an off-the-shelf PULNiX CCD camera with a 6 mm lens. The image resolution is 640×480. As can be seen in Fig. 2.6, the model plane contains a pattern of 9×9 squares with 9 special dots, which are used to automatically identify the correspondence between reference points on the model plane and square corners in the images. It was printed on A4 paper with a 600 DPI laser printer and attached to a cardboard.

In total, 10 images of the plane were taken (6 of them are shown in Fig. 2.6). Five of them (called Set A) were taken at close range, while the other five (called Set B) were taken at a larger distance. We applied our calibration algorithm to Set A, Set B, and also to the whole set (called Set A+B). The results are shown in Table 2.1. For intuitive understanding, we show the estimated angle between the image axes, θ, instead of the skew factor γ. We can see that the angle is very close to 90°, as expected with almost all modern CCD cameras. The camera's parameters were estimated consistently for all three sets of images, except the distortion parameters with Set B. The reason is that the calibration pattern only occupies the central part of the image in Set B, where lens distortion is not significant and therefore cannot be estimated reliably.
2.4.9 Related Work
Almost at the same time, Sturm and Maybank [31], independently from us, developed the same technique. They assumed the pixels are square (i.e., γ = 0) and studied the degenerate configurations for plane-based camera calibration.

Gurdjos et al. [14] have re-derived the plane-based calibration technique from the centre circle constraint.
My original implementation (only the executable) is available at

http://research.microsoft.com/zhang/calib/.
Bouguet has re-implemented my technique in Matlab, which is available at
http://www.vision.caltech.edu/bouguetj/calib_doc/.
In many applications such as stereo, multiple cameras need to be calibrated simultaneously in order to determine the relative geometry between cameras. In 2000, I extended (not published) this plane-based technique to stereo calibration for my stereo-based gaze-correction project [40, 39]. The formulation is similar to (2.29). Consider two cameras, and denote the quantities related to the second camera by a prime ′. Let (R_s, t_s) be the rigid transformation between the two cameras such that (R′, t′) = (R, t) ∘ (R_s, t_s), or more precisely: R′ = R R_s and t′ = R t_s + t. Stereo calibration is then to solve A, A′, k1, k2, k1′, k2′, {(R_i, t_i) | i = 1, . . . , n}, and (R_s, t_s) by minimizing the following functional:

Σ_{i=1}^{n} Σ_{j=1}^{m} ( ‖m_ij − m̆(A, k1, k2, R_i, t_i, M_j)‖² + ‖m′_ij − m̆(A′, k1′, k2′, R′_i, t′_i, M_j)‖² )   (2.30)

subject to

R′_i = R_i R_s   and   t′_i = R_i t_s + t_i .
2.5 Camera Calibration with 1D Objects
2.5.1 Setups With Free-Moving 1D Calibration Objects
Four or more collinear points with known distances. As seen above, when the number of points increases from two to three, the number of independent equations (constraints) increases by one for each observation. If we have a fourth point, will we have in total 6N independent equations? If so, we would be able to solve the problem because the number of unknowns remains the same, i.e., 5 + 5N, and we would have more than enough constraints if N ≥ 5. The reality is that the addition of a fourth point or even more points does not increase the number of independent equations. It will always be 5N for any four or more collinear points. This is because the cross ratio is preserved under perspective projection. With known cross ratios and three collinear points, whether they are in space or in images, the other points are determined exactly.
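A quick numeric illustration of this invariance, using an arbitrary 1D projective map:

```python
import numpy as np

def cross_ratio(a, b, c, d):
    """Cross ratio of four collinear points given as scalars on the line."""
    return ((a - c) * (b - d)) / ((a - d) * (b - c))

pts = np.array([0.0, 1.0, 2.5, 4.0])
proj = (2 * pts + 1) / (0.3 * pts + 1)         # an arbitrary 1D homography
print(cross_ratio(*pts), cross_ratio(*proj))   # identical values
```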
2.5.2 Setups With 1D Calibration Objects Moving Around a Fixed Point

2.5.3 Basic Equations
Refer to Figure 2.7. Point A is the fixed point in space, and the stick AB moves around A. The length of the stick AB is known to be L, i.e.,

‖B − A‖ = L .   (2.31)

The position of point C is also known with respect to A and B, and therefore

C = λ_A A + λ_B B ,   (2.32)

where λ_A and λ_B are known.
Let a, b, c be the images of A, B, C, with ã, b̃, c̃ their homogeneous pixel coordinates. Using the unknown depths z_A, z_B, z_C, we have

A = z_A A^{-1} ã   (2.33)
B = z_B A^{-1} b̃   (2.34)
C = z_C A^{-1} c̃ .  (2.35)

Substituting them into (2.32) yields

z_C c̃ = z_A λ_A ã + z_B λ_B b̃ .   (2.36)

Performing a cross product on both sides with c̃, followed by a dot product with (b̃ × c̃), gives

z_B = −z_A [λ_A (ã × c̃) · (b̃ × c̃)] / [λ_B (b̃ × c̃) · (b̃ × c̃)] .   (2.37)

From (2.31), we then have

z_A² h^T A^{-T} A^{-1} h = L²   (2.38)

with

h = ã + [λ_A (ã × c̃) · (b̃ × c̃)] / [λ_B (b̃ × c̃) · (b̃ × c̃)] b̃ .   (2.39)

Equation (2.38) contains the unknown intrinsic parameters A and the unknown depth, z_A, of the fixed point A. It is the basic constraint for camera calibration with 1D objects. Vector h, given by (2.39), can be computed from the image points and the known λ_A and λ_B. Since the total number of unknowns is 6, we need at least six observations of the 1D object for calibration. Note that A^{-T} A^{-1} actually describes the image of the absolute conic [20].
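A sketch of computing h from one observation, directly transcribing (2.37) and (2.39); a, b, c are homogeneous image points (illustrative inputs):

```python
import numpy as np

def compute_h(a, b, c, lam_A, lam_B):
    """h of (2.39) from homogeneous image points a, b, c of A, B, C."""
    bc = np.cross(b, c)
    ratio = (lam_A * np.cross(a, c).dot(bc)) / (lam_B * bc.dot(bc))
    # z_B = -z_A * ratio per (2.37); h absorbs the ratio:
    return a + ratio * b
```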
2.5.4 Closed-Form Solution
Let

B = A^{-T} A^{-1} ≡ | B11  B12  B13 |
                    | B12  B22  B23 |
                    | B13  B23  B33 |   (2.40)

whose explicit expression in terms of the intrinsic parameters is the same as in (2.22). Note that B is symmetric, and can be defined by a 6D vector

b = [B11, B12, B22, B13, B23, B33]^T .   (2.41)

Let h = [h1, h2, h3]^T and

x = z_A² b .   (2.42)

Then equation (2.38) becomes

v^T x = L²   (2.43)

with

v = [h1², 2 h1 h2, h2², 2 h1 h3, 2 h2 h3, h3²]^T .
When N images of the 1D object are observed, by stacking N such equations as (2.43) we have

V x = L² 1 ,   (2.44)

where V = [v1, . . . , vN]^T and 1 = [1, . . . , 1]^T. The least-squares solution is then given by

x = L² (V^T V)^{-1} V^T 1 .   (2.45)
Once x is estimated, we can compute all the unknowns based on x = z_A² b. Let x = [x1, x2, . . . , x6]^T. Without difficulty, we can uniquely extract the intrinsic parameters and the depth z_A, in the same way as in Sect. 2.4.4.
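A sketch of this closed-form step, solving (2.44) in the least-squares sense of (2.45); `hs` is the list of h vectors from (2.39):

```python
import numpy as np

def solve_1d(hs, L):
    """Least-squares solution of V x = L^2 * 1, per (2.44)-(2.45)."""
    V = np.array([[h[0]**2, 2*h[0]*h[1], h[1]**2,
                   2*h[0]*h[2], 2*h[1]*h[2], h[2]**2] for h in hs])
    x, *_ = np.linalg.lstsq(V, L**2 * np.ones(len(hs)), rcond=None)
    return x          # x = z_A^2 * [B11, B12, B22, B13, B23, B33]
```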
2.5.5 Nonlinear Optimization
The above solution is obtained by minimizing an algebraic distance. It can be refined through maximum likelihood inference, by minimizing

Σ_{i=1}^{N} ( ‖a_i − φ(A, A)‖² + ‖b_i − φ(A, B_i)‖² + ‖c_i − φ(A, C_i)‖² ) ,   (2.46)

where φ(A, P) is the projection of a space point P onto the image. Point B_i is parameterized by two spherical angles (θ_i, φ_i):

B_i = A + L [sin θ_i cos φ_i ; sin θ_i sin φ_i ; cos θ_i] ,

where L is the known distance between A and B. In turn, point C_i is computed according to (2.32). We therefore only need 2 additional parameters for each observation.
Minimizing (2.46) is a nonlinear minimization problem, which is solved
with the Levenberg-Marquardt Algorithm as implemented in Minpack [23].
It requires an initial guess of A, A, {Bi , Ci |i = 1..N } which can be obtained
using the technique described in the last subsection.
2.5.6 Estimating the Fixed Point
Invisible fixed point. When the fixed point A is not visible in the image, its projection a can be estimated as the intersection of the N observed lines of the 1D object. Let the ith observed line be l_i = [n_i^T, q_i]^T, so that a point ã = [a^T, 1]^T on it satisfies l_i^T ã = n_i^T a + q_i = 0. We estimate a by minimizing

F = Σ_{i=1}^{N} w_i ‖l_i^T ã‖² = Σ_{i=1}^{N} w_i ‖n_i^T a + q_i‖² ,   (2.47)

where w_i is a weighting factor (see below). By setting the derivative of F with respect to a to 0, we obtain the solution, which is given by

a = − ( Σ_{i=1}^{N} w_i n_i n_i^T )^{-1} ( Σ_{i=1}^{N} w_i q_i n_i ) .

The optimal weighting factor w_i in (2.47) is the inverse of the variance of l_i^T ã, which is w_i = 1/(ã^T Λ_i ã), where Λ_i is the covariance matrix of the estimated line l_i. Note that the weight w_i involves the unknown a. To overcome this difficulty, we can approximate w_i by 1/trace(Λ_i) for the first iteration, and re-compute w_i with the previously estimated a in the subsequent iterations. Usually two or three iterations are enough.
Visible fixed point. Since the fixed point is visible, we have N observations: {a_i | i = 1, . . . , N}. We can therefore estimate a by minimizing Σ_{i=1}^{N} ‖a − a_i‖², assuming that the image points are detected with the same accuracy. The solution is simply a = (Σ_{i=1}^{N} a_i)/N.
The above estimation does not make use of the fact that the fixed point is also the intersection of the N observed lines of the 1D object. Therefore, a better technique to estimate a is to minimize the following function:

F = Σ_{i=1}^{N} [ (a − a_i)^T V_i^{-1} (a − a_i) + w_i ‖l_i^T ã‖² ]
  = Σ_{i=1}^{N} [ (a − a_i)^T V_i^{-1} (a − a_i) + w_i ‖n_i^T a + q_i‖² ] ,   (2.48)

where V_i is the covariance matrix of the detected point a_i. The derivative of the above function with respect to a is given by

∂F/∂a = 2 Σ_{i=1}^{N} [ V_i^{-1} (a − a_i) + w_i n_i n_i^T a + w_i q_i n_i ] .

Setting it to 0 yields

a = ( Σ_{i=1}^{N} (V_i^{-1} + w_i n_i n_i^T) )^{-1} ( Σ_{i=1}^{N} (V_i^{-1} a_i − w_i q_i n_i) ) .
If more than three points are visible in each image, the known cross ratio provides an additional constraint in determining the fixed point.
For an accessible description of uncertainty manipulation, the reader is
referred to [45, Chapter 2].
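A sketch of the estimator (2.48), assuming the lines (n_i, q_i), weights w_i, detected points a_i, and covariances V_i are given:

```python
import numpy as np

def estimate_fixed_point(ns, qs, ws, a_obs, Vs):
    """Weighted least-squares solution of (2.48) for the fixed point a."""
    lhs = np.zeros((2, 2))
    rhs = np.zeros(2)
    for n, q, w, ai, Vi in zip(ns, qs, ws, a_obs, Vs):
        Vinv = np.linalg.inv(Vi)
        lhs += Vinv + w * np.outer(n, n)
        rhs += Vinv @ ai - w * q * n
    return np.linalg.solve(lhs, rhs)
```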
2.5.7 Experimental Results
The proposed algorithm has been tested on both computer simulated data
and real data.
Computer Simulations
The simulated camera has the following properties: α = 1000, β = 1000, γ = 0, u0 = 320, and v0 = 240. The image resolution is 640×480. A stick of 70 cm is simulated with the fixed point A at [0, 35, 150]^T. The other endpoint of the stick is B, and C is located halfway between A and B. We generated 100 random orientations of the stick by sampling θ in [π/6, 5π/6] and φ in [π, 2π] according to a uniform distribution. Points A, B, and C are then projected onto the image.

Gaussian noise with 0 mean and standard deviation σ is added to the projected image points a, b and c. The estimated camera parameters are compared with the ground truth, and we measure their relative errors with respect to the focal length α; note that we measure the relative errors in (u0, v0) with respect to α as well.
Table 2.2. Calibration results with real data: rows give the closed-form solution, the nonlinear refinement, the plane-based calibration, and the relative difference between the latter two; columns give α, β, γ, u0, and v0.
Real Data
For the experiment with real data, I used three toy beads from my kids and strung them together with a stick. The beads are approximately 14 cm apart (i.e., L = 28). I then moved the stick around while trying to fix one end with the aid of a book. A video of 150 frames was recorded, and four sample images are shown in Fig. 2.9. A bead in the image is modeled as a Gaussian blob in the RGB space, and the centroid of each detected blob is the image point we use for camera calibration. The proposed algorithm was then applied to the 150 observations of the beads, and the estimated camera parameters are provided in Table 2.2. The first row is the estimation from the closed-form solution, while the second row is the refined result after nonlinear minimization. For the image skew parameter γ, we also provide the angle between the image axes in parentheses (it should be very close to 90°).
For comparison, we also used the plane-based calibration technique described in [42] to calibrate the same camera. Five images of a planar pattern were taken, and one of them is shown in Fig. 2.10. The calibration result is shown in the third row of Table 2.2.
Figure 2.8. Calibration errors with respect to the noise level of the image points.
Figure 2.9. Four sample images of the 1D calibration object (frames 10, 60, 90, and 140).
Figure 2.10. A sample image of the planar pattern used for camera calibration.
The fourth row of Table 2.2 displays the relative difference between the plane-based result and the nonlinear solution with respect to the focal length (we use 828.92). As we can observe, the difference is about 2%.
There are several sources contributing to this difference. Besides the obvious image noise and imprecision of the extracted data points, one source is our current rudimentary experimental setup:

– The supposed-to-be fixed point was not fixed; it slipped around on the surface.
– The positioning of the beads was done with a ruler using eye inspection.

Considering all the factors, the proposed algorithm is very encouraging.
2.6 Self-Calibration

Self-calibration techniques estimate the intrinsic parameters from image point correspondences alone, without using any calibration object; the reader is referred to [22, 20, 9, 16, 17] for representative work and further reading.

2.7 Conclusion
Camera calibration is still an active research area because more and more
applications use cameras. In [2], spheres are used to calibrate one or more
cameras, which can be considered as a 2D approach since only the surface
property is used. In [5], a technique is described to calibrate a camera network consisting of an omni-camera and a number of perspective cameras. In
[24], a technique is proposed to calibrate a projector-screen-camera system.
2.8 Appendix: Estimating the Homography Between the Model Plane and Its Image
There are many ways to estimate the homography between the model plane and its image. Here, we present a technique based on the maximum likelihood criterion. Let M_i and m_i be the model and image points, respectively. Ideally, they should satisfy (2.18). In practice, they don't because of noise in the extracted image points. Let's assume that m_i is corrupted by Gaussian noise with mean 0 and covariance matrix Λ_mi. Then, the maximum likelihood estimation of H is obtained by minimizing the following functional

Σ_i (m_i − m̂_i)^T Λ_mi^{-1} (m_i − m̂_i) ,

where

m̂_i = (1 / (h̄_3^T M̃_i)) [ h̄_1^T M̃_i ; h̄_2^T M̃_i ]

with h̄_i the ith row of H. In practice, we simply assume Λ_mi = σ² I for all i. This is reasonable if points are extracted independently with the same procedure. In this case, the above problem becomes a nonlinear least-squares one, i.e., min_H Σ_i ‖m_i − m̂_i‖².
The nonlinear minimization is conducted with the Levenberg-Marquardt Algorithm as implemented in Minpack [23]. This requires an initial guess,
which can be obtained as follows.
Let x = [h̄_1^T, h̄_2^T, h̄_3^T]^T. Then equation (2.18) can be rewritten as

| M̃^T  0^T   −u M̃^T |
| 0^T   M̃^T  −v M̃^T |  x = 0 .
When we are given n points, we have n of the above equations, which can be written in matrix form as Lx = 0, where L is a 2n×9 matrix. As x is defined up to a scale factor, the solution is well known to be the right singular vector of L associated with the smallest singular value (or equivalently, the eigenvector of L^T L associated with the smallest eigenvalue). In L, some elements are constant 1, some are in pixels, some are in world coordinates, and some are multiplications of both. This makes L poorly conditioned numerically. Much better results can be obtained by performing a simple data normalization prior to running the above procedure.
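A sketch of this initial estimate with the data normalization just mentioned (here a Hartley-style translation and scaling, one common choice; the chapter does not prescribe a specific normalization):

```python
import numpy as np

def normalize(pts):
    """Similarity transform mapping points to mean 0, average norm sqrt(2)."""
    mean = pts.mean(axis=0)
    scale = np.sqrt(2) / np.mean(np.linalg.norm(pts - mean, axis=1))
    return np.array([[scale, 0, -scale * mean[0]],
                     [0, scale, -scale * mean[1]],
                     [0, 0, 1]])

def estimate_H(model, image):
    """Normalized DLT: model and image are Nx2 corresponding points."""
    T1, T2 = normalize(model), normalize(image)
    L = []
    for (X, Y), (u, v) in zip(model, image):
        x, y, _ = T1 @ np.array([X, Y, 1.0])
        a, b, _ = T2 @ np.array([u, v, 1.0])
        M = [x, y, 1.0]
        L.append(M + [0, 0, 0] + [-a * m for m in M])
        L.append([0, 0, 0] + M + [-b * m for m in M])
    _, _, vt = np.linalg.svd(np.asarray(L))
    Hn = vt[-1].reshape(3, 3)
    return np.linalg.inv(T2) @ Hn @ T1     # undo the normalization
```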
Bibliography
[1] Y.I. Abdel-Aziz and H.M. Karara. Direct linear transformation into object space coordinates in close-range photogrammetry. In Proceedings of the Symposium on Close-Range Photogrammetry, University of Illinois at Urbana-Champaign, Urbana, Illinois, pages 1–18, January 1971.
[2] M. Agrawal and L. Davis. Camera calibration using spheres: A semi-definite programming approach. In Proceedings of the 9th International Conference on Computer Vision, pages 782–789, Nice, France, October 2003. IEEE Computer Society Press.
[3] D. C. Brown. Close-range camera calibration. Photogrammetric Engineering, 37(8):855–866, 1971.
[4] B. Caprile and V. Torre. Using Vanishing Points for Camera Calibration. The International Journal of Computer Vision, 4(2):127–140, March 1990.
[5] X. Chen, J. Yang, and A. Waibel. Calibration of a hybrid camera network. In Proceedings of the 9th International Conference on Computer Vision, pages 150–155, Nice, France, October 2003. IEEE Computer Society Press.
[6] W. Faig. Calibration of close-range photogrammetry systems: Mathematical formulation. Photogrammetric Engineering and Remote Sensing, 41(12):1479–1486, 1975.
[7] O. Faugeras and Q.-T. Luong. The Geometry of Multiple Images. The MIT
Press, 2001. With contributions from T. Papadopoulo.
[8] O. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint.
MIT Press, 1993.
[9] O. Faugeras, T. Luong, and S. Maybank. Camera self-calibration: theory and experiments. In G. Sandini, editor, Proc. 2nd ECCV, volume 588 of Lecture Notes in Computer Science, pages 321–334, Santa Margherita Ligure, Italy, May 1992. Springer-Verlag.
[10] O. Faugeras and G. Toscani. The calibration problem for stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 15–20, Miami Beach, FL, June 1986. IEEE.
[11] S. Ganapathy. Decomposition of transformation matrices for robot vision. Pattern Recognition Letters, 2:401–412, December 1984.
[12] D. Gennery. Stereo-camera calibration. In Proceedings of the 10th Image Understanding Workshop, pages 101–108, 1979.
[13] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, 3rd edition, 1996.
[14] P. Gurdjos, A. Crouzil, and R. Payrissat. Another way of looking at plane-based calibration: the centre circle constraint. In Proceedings of the 7th European Conference on Computer Vision, volume IV, pages 252–266, Copenhagen, May 2002.
[15] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision.
Cambridge University Press, 2000.
[16] R. Hartley. Self-calibration from multiple views with a rotating camera. In J-O. Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, volume 800-801 of Lecture Notes in Computer Science, pages 471–478, Stockholm, Sweden, May 1994. Springer-Verlag.
[17] R. Hartley. An algorithm for self calibration from several views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 908–912, Seattle, WA, June 1994. IEEE.
[18] J. Heikkilä and O. Silvén. A four-step camera calibration procedure with implicit image correction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1112, San Juan, Puerto Rico, June 1997. IEEE Computer Society.
[19] D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 482–488, Santa Barbara, California, June 1998. IEEE Computer Society.
[20] Q.-T. Luong and O.D. Faugeras. Self-calibration of a moving camera from point correspondences and fundamental matrices. The International Journal of Computer Vision, 22(3):261–289, 1997.
[21] Q.-T. Luong. Matrice Fondamentale et Calibration Visuelle sur l'Environnement: Vers une plus grande autonomie des systèmes robotiques. PhD thesis, Université de Paris-Sud, Centre d'Orsay, December 1992.
[22] S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–152, August 1992.
[23] J.J. Moré. The Levenberg-Marquardt algorithm, implementation and theory. In G. A. Watson, editor, Numerical Analysis, Lecture Notes in Mathematics 630. Springer-Verlag, 1977.
[24] T. Okatani and K. Deguchi. Autocalibration of a projector-screen-camera system: Theory and algorithm for screen-to-camera homography estimation. In Proceedings of the 9th International Conference on Computer Vision, pages 774–781, Nice, France, October 2003. IEEE Computer Society Press.
[25] L. Robert. Camera calibration without feature extraction. Computer Vision, Graphics, and Image Processing, 63(2):314–325, March 1995. Also INRIA Technical Report 2204.
[26] J.G. Semple and G.T. Kneebone. Algebraic Projective Geometry. Oxford:
Clarendon Press, 1952. Reprinted 1979.
[27] S.W. Shih, Y.P. Hung, and W.S. Lin. Accurate linear technique for camera calibration considering lens distortion by solving an eigenvalue problem. Optical Engineering, 32(1):138–149, 1993.
[28] I. Shimizu, Z. Zhang, S. Akamatsu, and K. Deguchi. Head pose determination from one image using a generic model. In Proceedings of the IEEE Third International Conference on Automatic Face and Gesture Recognition, pages 100–105, Nara, Japan, April 1998.
[29] C. C. Slama, editor. Manual of Photogrammetry. American Society of Photogrammetry, fourth edition, 1980.
[30] G. Stein. Accurate internal camera calibration using rotation, with analysis of sources of error. In Proc. Fifth International Conference on Computer Vision, pages 230–236, Cambridge, Massachusetts, June 1995.