
Chapter 2

CAMERA CALIBRATION
Zhengyou Zhang

Camera calibration is a necessary step in 3D computer vision in order to extract metric information from 2D images. It has been studied extensively in computer vision and photogrammetry, and even recently new techniques have been proposed. In this chapter, we review the techniques proposed in the literature, including those using 3D apparatus (two or three planes orthogonal to each other, or a plane undergoing a pure translation, etc.), 2D objects (planar patterns undergoing unknown motions), 1D objects (wand with dots) and unknown scene points in the environment (self-calibration). The focus is on presenting these techniques within a consistent framework.

2.1 Introduction

Camera calibration is a necessary step in 3D computer vision in order to extract metric information from 2D images. Much work has been done, starting in the photogrammetry community (see [3, 6] to cite a few), and more recently in computer vision ([12, 11, 33, 10, 37, 35, 22, 9] to cite a few). According to the dimension of the calibration objects, we can classify those techniques roughly into four categories.
3D reference object based calibration. Camera calibration is performed by observing a calibration object whose geometry in 3-D space is known with very good precision. Calibration can be done very efficiently [8]. The calibration object usually consists of two or three planes orthogonal to each other. Sometimes, a plane undergoing a precisely known translation is also used [33], which equivalently provides 3D reference points. This approach requires an expensive calibration apparatus and an elaborate setup.
2D plane based calibration. Techniques in this category require the observation of a planar pattern shown at a few different orientations [42, 31]. Different from Tsai's technique [33], knowledge of the plane motion is not necessary. Because almost anyone can make such a calibration pattern by him/her-self, the setup is easier for camera calibration.

1D line based calibration. Calibration objects used in this category are composed of a set of collinear points [44]. As will be shown, a camera can be calibrated by observing a line moving around a fixed point, such as a string of balls hanging from the ceiling.

Self-calibration. Techniques in this category do not use any calibration object, and can be considered as a 0D approach because only image point correspondences are required. Just by moving a camera in a static scene, the rigidity of the scene provides in general two constraints [22, 21] on the camera's internal parameters from one camera displacement by using image information alone. Therefore, if images are taken by the same camera with fixed internal parameters, correspondences between three images are sufficient to recover both the internal and external parameters, which allow us to reconstruct 3-D structure up to a similarity [20, 17]. Although no calibration objects are necessary, a large number of parameters need to be estimated, resulting in a much harder mathematical problem.

Other techniques exist: vanishing points for orthogonal directions [4, 19], and calibration from pure rotation [16, 30].
Before going further, I'd like to point out that no single calibration technique is the best for all situations. It really depends on the situation a user needs to deal with. Following are my few recommendations:

Calibration with apparatus vs. self-calibration. Whenever possible, if we can pre-calibrate a camera, we should do it with a calibration apparatus. Self-calibration cannot usually achieve an accuracy comparable with that of pre-calibration because self-calibration needs to estimate a large number of parameters, resulting in a much harder mathematical problem. When pre-calibration is impossible (e.g., scene reconstruction from an old movie), self-calibration is the only choice.
Partial vs. full self-calibration. Partial self-calibration refers to the case where only a subset of the camera intrinsic parameters is to be calibrated. Along the same line as the previous recommendation, whenever possible, partial self-calibration is preferred because the number of parameters to be estimated is smaller. Take the example of 3D reconstruction with a camera with variable focal length: it is preferable to pre-calibrate the pixel aspect ratio and the pixel skewness.
Calibration with 3D vs. 2D apparatus. The highest accuracy can usually be obtained by using a 3D apparatus, so it should be used when accuracy is indispensable and when it is affordable to make and use a 3D apparatus. From the feedback I received from computer vision researchers and practitioners around the world in the last couple of years, calibration with a 2D apparatus seems to be the best choice in most situations because of its ease of use and good accuracy.
Calibration with 1D apparatus. This technique is relatively new, and it is hard for the moment to predict how popular it will be. It should, however, be useful especially for calibration of a camera network. To calibrate the relative geometry between multiple cameras as well as their intrinsic parameters, it is necessary for all involved cameras to simultaneously observe a number of points. It is hardly possible to achieve this with a 3D or 2D calibration apparatus¹ if one camera is mounted in the front of a room while another is in the back. This is not a problem for 1D objects. We can for example use a string of balls hanging from the ceiling.
This chapter is organized as follows. Section 2.2 describes the camera model and introduces the concept of the absolute conic, which is important for camera calibration. Section 2.3 presents the calibration techniques using a 3D apparatus. Section 2.4 describes a calibration technique based on observing a freely moving planar pattern (2D object); its extension for stereo calibration is also addressed. Section 2.5 describes a relatively new technique which uses a set of collinear points (1D object). Section 2.6 briefly introduces the self-calibration approach and provides references for further reading. Section 2.7 concludes the chapter with a discussion of recent work in this area.

2.2 Notation and Problem Statement

We start with the notation used in this chapter.


¹ An exception is when those apparatus are made transparent; then the cost would be much higher.

2.2.1 Pinhole Camera Model

Figure 2.1. Pinhole camera model

A 2D point is denoted by m = [u, v]^T. A 3D point is denoted by M = [X, Y, Z]^T. We use x̃ to denote the augmented vector obtained by adding 1 as the last element: m̃ = [u, v, 1]^T and M̃ = [X, Y, Z, 1]^T. A camera is modeled by the usual pinhole (see Figure 2.1): the image of a 3D point M, denoted by m, is formed by an optical ray from M passing through the optical center C and intersecting the image plane. The three points M, m, and C are collinear. In Figure 2.1, for illustration purposes, the image plane is positioned between the scene point and the optical center, which is mathematically equivalent to the physical setup under which the image plane is on the other side with respect to the optical center. The relationship between the 3D point M and its image projection m is given by

$$ s\,\tilde{m} = \underbrace{A\,[R\ \ t]}_{P}\,\tilde{M} \equiv P\tilde{M}, \qquad (2.1) $$

with

$$ A = \begin{bmatrix} \alpha & \gamma & u_0 \\ 0 & \beta & v_0 \\ 0 & 0 & 1 \end{bmatrix} \qquad (2.2) $$

and

$$ P = A\,[R\ \ t] \qquad (2.3) $$

where s is an arbitrary scale factor; (R, t), called the extrinsic parameters, is the rotation and translation which relate the world coordinate system to the camera coordinate system; and A is called the camera intrinsic matrix, with (u_0, v_0) the coordinates of the principal point, α and β the scale factors in the image u and v axes, and γ the parameter describing the skew of the two image axes. The 3×4 matrix P is called the camera projection matrix, which mixes both intrinsic and extrinsic parameters. In Figure 2.1, the angle between the two image axes is denoted by θ, and we have γ = α cot θ. If the pixels are rectangular, then θ = 90° and γ = 0.

The task of camera calibration is to determine the parameters of the transformation between an object in 3D space and the 2D image observed by the camera from visual information (images). The transformation includes

Extrinsic parameters (sometimes called external parameters): orientation (rotation) and location (translation) of the camera, i.e., (R, t);

Intrinsic parameters (sometimes called internal parameters): characteristics of the camera, i.e., (α, β, γ, u_0, v_0).

The rotation matrix, although consisting of 9 elements, only has 3 degrees of freedom. The translation vector t obviously has 3 parameters. Therefore, there are 6 extrinsic parameters and 5 intrinsic parameters, leading to 11 parameters in total.

We use the abbreviation A^{-T} for (A^{-1})^T or (A^T)^{-1}.

2.2.2 Absolute Conic

Now let us introduce the concept of the absolute conic. For more details, the reader is referred to [7, 15].

A point x in 3D space has projective coordinates x̃ = [x_1, x_2, x_3, x_4]^T. The equation of the plane at infinity, Π_∞, is x_4 = 0. The absolute conic Ω is defined by a set of points satisfying the equation

$$ x_1^2 + x_2^2 + x_3^2 = 0, \qquad x_4 = 0. \qquad (2.4) $$

Let x_∞ = [x_1, x_2, x_3]^T be a point on the absolute conic (see Figure 2.2). By definition, we have x_∞^T x_∞ = 0. We also have x̃_∞ = [x_1, x_2, x_3, 0]^T and x̃_∞^T x̃_∞ = 0. This can be interpreted as a conic of purely imaginary points on Π_∞. Indeed, let x = x_1/x_3 and y = x_2/x_3 be a point on the conic; then x² + y² = −1, which is an imaginary circle of radius √−1.

An important property of the absolute conic is its invariance to any rigid transformation. Let the rigid transformation be

$$ H = \begin{bmatrix} R & t \\ \mathbf{0}^T & 1 \end{bmatrix}. $$

Let x_∞ be a point on Ω. By definition, its projective coordinates are x̃_∞ = [x_∞^T, 0]^T with x_∞^T x_∞ = 0.
Figure 2.2. Absolute conic and its image

The point after the rigid transformation is denoted by x'_∞, and

$$ \tilde{x}'_\infty = H\tilde{x}_\infty = \begin{bmatrix} R x_\infty \\ 0 \end{bmatrix}. $$

Thus, x'_∞ is also on the plane at infinity. Furthermore, x'_∞ is on the same Ω because

$$ x_\infty'^T x'_\infty = (R x_\infty)^T (R x_\infty) = x_\infty^T (R^T R) x_\infty = 0. $$

The image of the absolute conic, denoted by ω, is also an imaginary conic, and is determined only by the intrinsic parameters of the camera. This can be seen as follows. Consider the projection of a point x̃_∞ on Ω, denoted by m̃_∞, which is given by

$$ \tilde{m}_\infty = sA\,[R\ \ t] \begin{bmatrix} x_\infty \\ 0 \end{bmatrix} = sAR\,x_\infty. $$

It follows that

$$ \tilde{m}_\infty^T A^{-T} A^{-1} \tilde{m}_\infty = s^2 x_\infty^T R^T R\, x_\infty = s^2 x_\infty^T x_\infty = 0. $$

Therefore, the image of the absolute conic is an imaginary conic, and is defined by A^{-T} A^{-1}. It does not depend on the extrinsic parameters of the camera.

If we can determine the image of the absolute conic, then we can solve for the camera's intrinsic parameters, and the calibration is solved. We will show several ways in this chapter to determine ω, the image of the absolute conic.

2.3 Camera Calibration with 3D Objects

The traditional way to calibrate a camera is to use a 3D reference object such as those shown in Figure 2.3. In Fig. 2.3a, the calibration apparatus used at INRIA [8] is shown, which consists of two orthogonal planes, on each of which a checker pattern is printed. A 3D coordinate system is attached to this apparatus, and the coordinates of the checker corners are known very accurately in this coordinate system. A similar calibration apparatus is a cube with a checker pattern painted on each face, so in general three faces will be visible to the camera. Figure 2.3b illustrates the device used in Tsai's technique [33], which only uses one plane with a checker pattern, but the plane needs to be displaced at least once with known motion. This is equivalent to knowing the 3D coordinates of the checker corners.

Figure 2.3. 3D apparatus for calibrating cameras

A popular technique in this category consists of four steps [8]:

1. Detect the corners of the checker pattern in each image;
2. Estimate the camera projection matrix P using linear least squares;
3. Recover intrinsic and extrinsic parameters A, R and t from P;
4. Refine A, R and t through a nonlinear optimization.

Note that it is also possible to first refine P through a nonlinear optimization, and then determine A, R and t from the refined P.

It is worth noting that using corners is not the only possibility. We can avoid corner detection by working directly in the image. In [25], calibration is realized by maximizing the gradients around a set of control points that define the calibration object. Figure 2.4 illustrates the control points used in that work.

Figure 2.4. Control points used in a gradient-based calibration technique

2.3.1 Feature Extraction

If one uses a generic corner detector, such as the Harris corner detector, to detect the corners in the checker pattern image, the result is usually not good because the detected corners have poor accuracy (about one pixel). A better solution is to leverage the known pattern structure by first estimating a line for each side of the square and then computing the corners by intersecting the fitted lines. There are two common techniques to estimate the lines. The first is to detect edges, and then fit a line to the edges on each side of the square. The second technique is to directly fit a line to each side of a square in the image such that the gradient on the line is maximized. One possibility is to represent the line by an elongated Gaussian, and estimate the parameters of the elongated Gaussian by maximizing the total gradient covered by the Gaussian. We should note that if the lens distortion is not severe, a better solution is to fit just one single line to all the collinear sides. This leads to a much more accurate estimation of the position of the checker corners.

2.3.2 Linear Estimation of the Camera Projection Matrix

Once we extract the corner points in the image, we can easily establish their correspondences with the points in 3D space because of our knowledge of the pattern. Based on the projection equation (2.1), we are now able to estimate the camera parameters. However, the problem is quite nonlinear if we try to estimate A, R and t directly. If, on the other hand, we estimate the camera projection matrix P, a linear solution is possible, as will be shown now.

Given each 2D-3D correspondence m_i = (u_i, v_i) ↔ M_i = (X_i, Y_i, Z_i), we can write down 2 equations based on (2.1):

$$ \underbrace{\begin{bmatrix} X_i & Y_i & Z_i & 1 & 0 & 0 & 0 & 0 & -u_iX_i & -u_iY_i & -u_iZ_i & -u_i \\ 0 & 0 & 0 & 0 & X_i & Y_i & Z_i & 1 & -v_iX_i & -v_iY_i & -v_iZ_i & -v_i \end{bmatrix}}_{G_i}\, p = 0 $$

where p = [p_{11}, p_{12}, ..., p_{34}]^T and 0 = [0, 0]^T.

For n point matches, we can stack all the equations together:

$$ G p = 0 \qquad \text{with } G = [G_1^T, \ldots, G_n^T]^T. $$

Matrix G is a 2n×12 matrix. The projection matrix can now be solved by

$$ \min_p \|Gp\|^2 \qquad \text{subject to } \|p\| = 1. $$

The solution is the eigenvector of G^T G associated with the smallest eigenvalue.

In the above, in order to avoid the trivial solution p = 0 and considering the fact that p is defined up to a scale factor, we have set ‖p‖ = 1. Other normalizations are possible. In [1], p_{34} = 1, which, however, introduces a singularity when the correct value of p_{34} is close to zero. In [10], the constraint p_{31}² + p_{32}² + p_{33}² = 1 was used, which is singularity free.

In any case, the above linear technique minimizes an algebraic distance, and yields a biased estimate when the data are noisy. We will present an unbiased solution later.
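To make this step concrete, here is a minimal numpy sketch of the linear estimation above (the function name is mine, and an SVD is used in place of an explicit eigendecomposition of G^T G; both yield the same minimizer):

```python
import numpy as np

def estimate_projection_matrix(points_3d, points_2d):
    """Linearly estimate the 3x4 camera projection matrix P from n >= 6
    2D-3D correspondences by minimizing ||G p||^2 subject to ||p|| = 1."""
    assert len(points_3d) == len(points_2d) >= 6
    rows = []
    for (X, Y, Z), (u, v) in zip(points_3d, points_2d):
        # The two rows of G_i for one correspondence, from equation (2.1).
        rows.append([X, Y, Z, 1, 0, 0, 0, 0, -u*X, -u*Y, -u*Z, -u])
        rows.append([0, 0, 0, 0, X, Y, Z, 1, -v*X, -v*Y, -v*Z, -v])
    G = np.asarray(rows)
    # The minimizer is the right singular vector of G associated with the
    # smallest singular value, i.e. the eigenvector of G^T G with the
    # smallest eigenvalue.
    _, _, Vt = np.linalg.svd(G)
    return Vt[-1].reshape(3, 4)
```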

2.3.3 Recover Intrinsic and Extrinsic Parameters from P

Once the camera projection matrix P is known, we can uniquely recover the intrinsic and extrinsic parameters of the camera. Let us denote the first 3×3 submatrix of P by B and the last column of P by b, i.e., P ≡ [B b]. Since P = A[R t], we have

$$ B = AR \qquad (2.5) $$
$$ b = At \qquad (2.6) $$

From (2.5), we have

$$ K \equiv BB^T = AA^T = \begin{bmatrix} k_u & k_c & u_0 \\ k_c & k_v & v_0 \\ u_0 & v_0 & 1 \end{bmatrix}, \quad k_u = \alpha^2 + \gamma^2 + u_0^2,\; k_c = \gamma\beta + u_0 v_0,\; k_v = \beta^2 + v_0^2. $$

Because P is defined up to a scale factor, the last element of K = BB^T is usually not equal to 1, so we have to normalize it such that K_{33} (the last element) = 1. After that, we immediately obtain

$$ u_0 = K_{13} \qquad (2.7) $$
$$ v_0 = K_{23} \qquad (2.8) $$
$$ \beta = \sqrt{k_v - v_0^2} \qquad (2.9) $$
$$ \gamma = (k_c - u_0 v_0)/\beta \qquad (2.10) $$
$$ \alpha = \sqrt{k_u - u_0^2 - \gamma^2} \qquad (2.11) $$

The solution is unambiguous because α > 0 and β > 0.


Once the intrinsic parameters, or equivalently matrix A, are known, the
extrinsic parameters can be determined from (2.5) and (2.6) as:
R = A1 B
t=A

2.3.4

b.

(2.12)
(2.13)
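A minimal numpy sketch of this decomposition follows. The final rescaling by the cube root of det(A^{-1}B), which removes the arbitrary scale of P before reading off R and t, is my own bookkeeping and is not spelled out in the text:

```python
import numpy as np

def decompose_projection_matrix(P):
    """Recover A, R, t from P = A[R t], following (2.5)-(2.13).
    A sketch only: with noisy data the returned R is not guaranteed
    to be exactly orthonormal."""
    B, b = P[:, :3], P[:, 3]
    K = B @ B.T
    K = K / K[2, 2]                      # normalize so that K[2,2] = 1
    u0, v0 = K[0, 2], K[1, 2]
    beta = np.sqrt(K[1, 1] - v0**2)
    gamma = (K[0, 1] - u0 * v0) / beta
    alpha = np.sqrt(K[0, 0] - u0**2 - gamma**2)
    A = np.array([[alpha, gamma, u0],
                  [0.0,   beta,  v0],
                  [0.0,   0.0,  1.0]])
    Ainv = np.linalg.inv(A)
    R = Ainv @ B
    s = np.cbrt(np.linalg.det(R))        # remaining projective scale of P
    return A, R / s, (Ainv @ b) / s
```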

2.3.4 Refine Calibration Parameters Through a Nonlinear Optimization

The above solution is obtained through minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given n 2D-3D correspondences m_i = (u_i, v_i) ↔ M_i = (X_i, Y_i, Z_i). Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the distances between the image points and their predicted positions, i.e.,

$$ \min_P \sum_i \| m_i - \phi(P, M_i) \|^2 \qquad (2.14) $$

where φ(P, M_i) is the projection of M_i onto the image according to (2.1). This is a nonlinear minimization problem, which can be solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of P, which can be obtained using the linear technique described earlier. Note that since P is defined up to a scale factor, we can set the element having the largest initial value to 1 during the minimization.

Alternatively, instead of estimating P as in (2.14), we can directly estimate the intrinsic and extrinsic parameters, A, R, and t, using the same criterion. The rotation matrix can be parameterized with three variables such as Euler angles or a scaled rotation vector.
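As an illustration, the refinement (2.14) can be sketched with SciPy, whose method='lm' wraps the same MINPACK Levenberg-Marquardt routine cited above; for brevity this sketch leaves the scale of P free instead of fixing its largest element to 1:

```python
import numpy as np
from scipy.optimize import least_squares

def reprojection_residuals(p, points_3d, points_2d):
    """Residuals m_i - phi(P, M_i) for the functional (2.14)."""
    P = p.reshape(3, 4)
    M = np.hstack([points_3d, np.ones((len(points_3d), 1))])  # homogeneous
    proj = M @ P.T
    proj = proj[:, :2] / proj[:, 2:3]    # perspective division
    return (points_2d - proj).ravel()

def refine_projection_matrix(P0, points_3d, points_2d):
    # method='lm' uses MINPACK's Levenberg-Marquardt implementation.
    result = least_squares(reprojection_residuals, P0.ravel(), method="lm",
                           args=(points_3d, points_2d))
    return result.x.reshape(3, 4)
```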

2.3.5 Lens Distortion

Up to this point, we have used the pinhole model to describe a camera. It says that the point in 3D space, its corresponding point in the image, and the camera's optical center are collinear. This linear projective equation is sometimes not sufficient, especially for low-end cameras (such as WebCams) or wide-angle cameras; lens distortion has to be considered.

According to [33], there are four steps in camera projection including lens distortion:

Step 1: Rigid transformation from the world coordinate system (X_w, Y_w, Z_w) to the camera one (X, Y, Z):

$$ [X\ Y\ Z]^T = R\,[X_w\ Y_w\ Z_w]^T + t $$

Step 2: Perspective projection from 3D camera coordinates (X, Y, Z) to ideal image coordinates (x, y) under the pinhole camera model:

$$ x = f\frac{X}{Z}, \qquad y = f\frac{Y}{Z} $$

where f is the effective focal length.

Step 3: Lens distortion²:

$$ \breve{x} = x + \delta_x, \qquad \breve{y} = y + \delta_y $$

where (x̆, y̆) are the distorted or true image coordinates, and (δ_x, δ_y) are the distortions applied to (x, y).

Step 4: Affine transformation from real image coordinates (x̆, y̆) to frame buffer (pixel) image coordinates (ŭ, v̆):

$$ \breve{u} = d_x^{-1}\,\breve{x} + u_0, \qquad \breve{v} = d_y^{-1}\,\breve{y} + v_0, $$

where (u_0, v_0) are the coordinates of the principal point, and d_x and d_y are the distances between adjacent pixels in the horizontal and vertical directions, respectively.
There are two types of distortions:

Radial distortion: This is symmetric; ideal image points are distorted along radial directions from the distortion center. It is caused by imperfect lens shape.

Decentering distortion: This is usually caused by improper lens assembly; ideal image points are distorted in both radial and tangential directions.

The reader is referred to [29, 3, 6, 37] for more details.

² Note that the lens distortion described here is different from Tsai's treatment. Here, we go from ideal to real image coordinates, similar to [36].

The distortion can be expressed as a power series in the radial distance r = √(x² + y²):

$$ \delta_x = x(k_1 r^2 + k_2 r^4 + k_3 r^6 + \cdots) + [p_1(r^2 + 2x^2) + 2p_2 xy](1 + p_3 r^2 + \cdots), $$
$$ \delta_y = y(k_1 r^2 + k_2 r^4 + k_3 r^6 + \cdots) + [2p_1 xy + p_2(r^2 + 2y^2)](1 + p_3 r^2 + \cdots), $$

where the k_i's are the coefficients of radial distortion and the p_j's are the coefficients of decentering distortion.
Based on the reports in the literature [3, 33, 36], it is likely that the distortion function is totally dominated by the radial components, and especially dominated by the first term. It has also been found that more elaborate modeling not only would not help (it is negligible when compared with sensor quantization), but would also cause numerical instability [33, 36].

Denote the ideal pixel image coordinates by u = x/d_x and v = y/d_y. By combining Step 3 and Step 4 and using only the first two radial distortion terms, we obtain the following relationship between (ŭ, v̆) and (u, v):

$$ \breve{u} = u + (u - u_0)[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2] \qquad (2.15) $$
$$ \breve{v} = v + (v - v_0)[k_1(x^2 + y^2) + k_2(x^2 + y^2)^2]. \qquad (2.16) $$

Following the same reasoning as in (2.14), camera calibration including lens distortion can be performed by minimizing the distances between the image points and their predicted positions, i.e.,

$$ \min_{A, R, t, k_1, k_2} \sum_i \| m_i - \breve{m}(A, R, t, k_1, k_2, M_i) \|^2 \qquad (2.17) $$

where m̆(A, R, t, k_1, k_2, M_i) is the projection of M_i onto the image according to (2.1), followed by distortion according to (2.15) and (2.16).
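A small sketch of the two-term radial model (2.15)-(2.16); the function name is illustrative:

```python
def distort_pixel(u, v, x, y, u0, v0, k1, k2):
    """Apply the two-term radial distortion model (2.15)-(2.16).
    (u, v) are ideal pixel coordinates, (x, y) the corresponding ideal
    normalized image coordinates, and (u0, v0) the principal point."""
    r2 = x**2 + y**2
    factor = k1 * r2 + k2 * r2**2
    return u + (u - u0) * factor, v + (v - v0) * factor
```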

2.3.6 An Example

Figure 2.5 displays an image of a 3D reference object, taken by a camera to be calibrated at INRIA. Each square has 4 corners, and in total 128 points are used for calibration.

Without considering lens distortion, the estimated camera projection matrix is

$$ P = \begin{bmatrix} 7.025659\mathrm{e}{-01} & -2.861189\mathrm{e}{-02} & -5.377696\mathrm{e}{-01} & 6.241890\mathrm{e}{+01} \\ 2.077632\mathrm{e}{-01} & 1.265804\mathrm{e}{+00} & 1.591456\mathrm{e}{-01} & 1.075646\mathrm{e}{+01} \\ 4.634764\mathrm{e}{-04} & -5.282382\mathrm{e}{-05} & 4.255347\mathrm{e}{-04} & 1 \end{bmatrix} $$

From P, we can calculate the intrinsic parameters: α = 1380.12, β = 2032.57, γ ≈ 0, u_0 = 246.52, and v_0 = 243.68. So the angle between the two image axes is 90°, and the aspect ratio of the pixels is α/β = 0.679. For the extrinsic parameters, the translation vector is t = [−211.28, −106.06, 1583.75]^T (in mm), i.e., the calibration object is about 1.5 m away from the camera; the rotation axis is [−0.08573, −0.99438, 0.0621]^T (i.e., almost vertical), and the rotation angle is 47.7°.

Figure 2.5. An example of camera calibration with a 3D apparatus

Other notable work in this category includes [27, 38, 36, 18].

2.4 Camera Calibration with 2D Objects: Plane-based Technique

In this section, we describe how a camera can be calibrated using a moving plane. We first examine the constraints on the camera's intrinsic parameters provided by observing a single plane.

2.4.1 Homography between the model plane and its image

Without loss of generality, we assume the model plane is on Z = 0 of the world coordinate system. Let's denote the i-th column of the rotation matrix R by r_i. From (2.1), we have

$$ s\begin{bmatrix} u \\ v \\ 1 \end{bmatrix} = A\,[r_1\ r_2\ r_3\ t]\begin{bmatrix} X \\ Y \\ 0 \\ 1 \end{bmatrix} = A\,[r_1\ r_2\ t]\begin{bmatrix} X \\ Y \\ 1 \end{bmatrix}. $$

By abuse of notation, we still use M to denote a point on the model plane, but M = [X, Y]^T since Z is always equal to 0. In turn, M̃ = [X, Y, 1]^T. Therefore, a model point M and its image m are related by a homography H:

$$ s\tilde{m} = H\tilde{M} \qquad \text{with} \qquad H = A\,[r_1\ r_2\ t]. \qquad (2.18) $$

As is clear, the 3×3 matrix H is defined up to a scale factor.

2.4.2 Constraints on the intrinsic parameters

Given an image of the model plane, a homography can be estimated (see the Appendix). Let's denote it by H = [h_1 h_2 h_3]. From (2.18), we have

$$ [h_1\ h_2\ h_3] = \lambda A\,[r_1\ r_2\ t], $$

where λ is an arbitrary scalar. Using the knowledge that r_1 and r_2 are orthonormal, we have

$$ h_1^T A^{-T} A^{-1} h_2 = 0 \qquad (2.19) $$
$$ h_1^T A^{-T} A^{-1} h_1 = h_2^T A^{-T} A^{-1} h_2. \qquad (2.20) $$

These are the two basic constraints on the intrinsic parameters, given one homography. Because a homography has 8 degrees of freedom and there are 6 extrinsic parameters (3 for rotation and 3 for translation), we can only obtain 2 constraints on the intrinsic parameters. Note that A^{-T} A^{-1} actually describes the image of the absolute conic [20]. In the next subsection, we will give a geometric interpretation.

2.4.3 Geometric Interpretation

We now relate (2.19) and (2.20) to the absolute conic [22, 20].

It is not difficult to verify that the model plane, under our convention, is described in the camera coordinate system by the following equation:

$$ \begin{bmatrix} r_3 \\ r_3^T t \end{bmatrix}^T \begin{bmatrix} x \\ y \\ z \\ w \end{bmatrix} = 0, $$

where w = 0 for points at infinity and w = 1 otherwise. This plane intersects the plane at infinity at a line, and we can easily see that [r_1^T, 0]^T and [r_2^T, 0]^T are two particular points on that line. Any point on it is a linear combination of these two points, i.e.,

$$ x_\infty = a\begin{bmatrix} r_1 \\ 0 \end{bmatrix} + b\begin{bmatrix} r_2 \\ 0 \end{bmatrix} = \begin{bmatrix} a r_1 + b r_2 \\ 0 \end{bmatrix}. $$

Now, let's compute the intersection of the above line with the absolute conic. By definition, the point x_∞, known as the circular point [26], satisfies x_∞^T x_∞ = 0, i.e., (a r_1 + b r_2)^T (a r_1 + b r_2) = 0, or a² + b² = 0. The solution is b = ±ai, where i² = −1. That is, the two intersection points are

$$ x_\infty = a\begin{bmatrix} r_1 \pm i\, r_2 \\ 0 \end{bmatrix}. $$

The significance of this pair of complex conjugate points lies in the fact that they are invariant to Euclidean transformations. Their projection in the image plane is given, up to a scale factor, by

$$ \tilde{m}_\infty = A(r_1 \pm i\, r_2) = h_1 \pm i\, h_2. $$

Point m̃_∞ is on the image of the absolute conic, described by A^{-T} A^{-1} [20]. This gives

$$ (h_1 \pm i\, h_2)^T A^{-T} A^{-1} (h_1 \pm i\, h_2) = 0. $$

Requiring that both real and imaginary parts be zero yields (2.19) and (2.20).

2.4.4 Closed-form solution

We now provide the details on how to effectively solve the camera calibration problem. We start with an analytical solution. This initial estimation will be followed by a nonlinear optimization technique based on the maximum likelihood criterion, to be described in the next subsection.

Let

$$ B = A^{-T} A^{-1} \equiv \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{bmatrix} \qquad (2.21) $$

$$ = \begin{bmatrix} \frac{1}{\alpha^2} & -\frac{\gamma}{\alpha^2\beta} & \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} \\ -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma^2}{\alpha^2\beta^2} + \frac{1}{\beta^2} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} \\ \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} & \frac{(v_0\gamma - u_0\beta)^2}{\alpha^2\beta^2} + \frac{v_0^2}{\beta^2} + 1 \end{bmatrix} \qquad (2.22) $$

Note that B is symmetric, defined by a 6D vector

$$ b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T. \qquad (2.23) $$

Let the i-th column vector of H be h_i = [h_{i1}, h_{i2}, h_{i3}]^T. Then, we have

$$ h_i^T B h_j = v_{ij}^T b \qquad (2.24) $$

with v_{ij} = [h_{i1}h_{j1}, h_{i1}h_{j2} + h_{i2}h_{j1}, h_{i2}h_{j2}, h_{i3}h_{j1} + h_{i1}h_{j3}, h_{i3}h_{j2} + h_{i2}h_{j3}, h_{i3}h_{j3}]^T. Therefore, the two fundamental constraints (2.19) and (2.20), from a given homography, can be rewritten as 2 homogeneous equations in b:

$$ \begin{bmatrix} v_{12}^T \\ (v_{11} - v_{22})^T \end{bmatrix} b = 0. \qquad (2.25) $$

If n images of the model plane are observed, by stacking n such equations as (2.25) we have

$$ Vb = 0, \qquad (2.26) $$

where V is a 2n×6 matrix. If n ≥ 3, we will have in general a unique solution b defined up to a scale factor. If n = 2, we can impose the skewless constraint γ = 0, i.e., [0, 1, 0, 0, 0, 0] b = 0, which is added as an additional equation to (2.26). (If n = 1, we can only solve two camera intrinsic parameters, e.g., α and β, assuming u_0 and v_0 are known (e.g., at the image center) and γ = 0, and that is indeed what we did in [28] for head pose determination based on the fact that eyes and mouth are reasonably coplanar. In fact, Tsai [33] already mentions that focal length from one plane is possible, but incorrectly says that aspect ratio is not.) The solution to (2.26) is well known as the eigenvector of V^T V associated with the smallest eigenvalue (equivalently, the right singular vector of V associated with the smallest singular value).

Once b is estimated, we can compute all the camera intrinsic parameters as follows. The matrix B, as described above, is estimated up to a scale factor, i.e., B = λA^{-T}A^{-1} with λ an arbitrary scale. Without difficulty, we can uniquely extract the intrinsic parameters from matrix B:

$$ v_0 = (B_{12}B_{13} - B_{11}B_{23})/(B_{11}B_{22} - B_{12}^2) $$
$$ \lambda = B_{33} - [B_{13}^2 + v_0(B_{12}B_{13} - B_{11}B_{23})]/B_{11} $$
$$ \alpha = \sqrt{\lambda/B_{11}} $$
$$ \beta = \sqrt{\lambda B_{11}/(B_{11}B_{22} - B_{12}^2)} $$
$$ \gamma = -B_{12}\alpha^2\beta/\lambda $$
$$ u_0 = \gamma v_0/\beta - B_{13}\alpha^2/\lambda. $$
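To make the bookkeeping concrete, here is a minimal numpy sketch of this closed-form step, assuming the homographies have already been estimated (e.g., as in the Appendix); the function names are mine:

```python
import numpy as np

def _v(H, i, j):
    """Row v_ij from (2.24), built from columns i and j of homography H."""
    hi, hj = H[:, i], H[:, j]
    return np.array([hi[0]*hj[0], hi[0]*hj[1] + hi[1]*hj[0], hi[1]*hj[1],
                     hi[2]*hj[0] + hi[0]*hj[2], hi[2]*hj[1] + hi[1]*hj[2],
                     hi[2]*hj[2]])

def intrinsics_from_homographies(Hs):
    """Closed-form intrinsics from n >= 3 plane homographies, (2.25)-(2.26)."""
    V = np.vstack([np.stack([_v(H, 0, 1), _v(H, 0, 0) - _v(H, 1, 1)])
                   for H in Hs])
    _, _, Vt = np.linalg.svd(V)
    B11, B12, B22, B13, B23, B33 = Vt[-1]        # b, defined up to scale
    v0 = (B12*B13 - B11*B23) / (B11*B22 - B12**2)
    lam = B33 - (B13**2 + v0*(B12*B13 - B11*B23)) / B11
    alpha = np.sqrt(lam / B11)
    beta = np.sqrt(lam * B11 / (B11*B22 - B12**2))
    gamma = -B12 * alpha**2 * beta / lam
    u0 = gamma * v0 / beta - B13 * alpha**2 / lam
    return np.array([[alpha, gamma, u0], [0, beta, v0], [0, 0, 1]])
```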

Once A is known, the extrinsic parameters for each image are readily computed. From (2.18), we have

$$ r_1 = \lambda A^{-1} h_1, \quad r_2 = \lambda A^{-1} h_2, \quad r_3 = r_1 \times r_2, \quad t = \lambda A^{-1} h_3 $$

with λ = 1/‖A^{-1}h_1‖ = 1/‖A^{-1}h_2‖. Of course, because of noise in the data, the so-computed matrix R = [r_1, r_2, r_3] does not in general satisfy the properties of a rotation matrix. The best rotation matrix can then be obtained through, for example, singular value decomposition [13, 41].

2.4.5 Maximum likelihood estimation

The above solution is obtained through minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given n images of a model plane and there are m points on the model plane. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the following functional:

$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \| m_{ij} - \breve{m}(A, R_i, t_i, M_j) \|^2, \qquad (2.27) $$

where m̆(A, R_i, t_i, M_j) is the projection of point M_j in image i, according to equation (2.18). A rotation R is parameterized by a vector of 3 parameters, denoted by r, which is parallel to the rotation axis and whose magnitude is equal to the rotation angle. R and r are related by the Rodrigues formula [8]. Minimizing (2.27) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of A, {R_i, t_i | i = 1..n}, which can be obtained using the technique described in the previous subsection.

Desktop cameras usually have visible lens distortion, especially the radial components. We have included these while minimizing (2.27). See my technical report [41] for more details.

2.4.6 Dealing with radial distortion

Up to now, we have not considered the lens distortion of a camera. However, a desktop camera usually exhibits significant lens distortion, especially radial distortion. The reader is referred to Section 2.3.5 for distortion modeling. In this section, we only consider the first two terms of radial distortion.

Estimating Radial Distortion by Alternation. As the radial distortion is expected to be small, one would expect to estimate the other five intrinsic parameters, using the technique described in Sect. 2.4.5, reasonably well by simply ignoring distortion. One strategy is then to estimate k_1 and k_2 after having estimated the other parameters, which will give us the ideal pixel coordinates (u, v). Then, from (2.15) and (2.16), we have two equations for each point in each image:

$$ \begin{bmatrix} (u-u_0)(x^2+y^2) & (u-u_0)(x^2+y^2)^2 \\ (v-v_0)(x^2+y^2) & (v-v_0)(x^2+y^2)^2 \end{bmatrix} \begin{bmatrix} k_1 \\ k_2 \end{bmatrix} = \begin{bmatrix} \breve{u}-u \\ \breve{v}-v \end{bmatrix}. $$

Given m points in n images, we can stack all the equations together to obtain in total 2mn equations, or in matrix form Dk = d, where k = [k_1, k_2]^T. The linear least-squares solution is given by

$$ k = (D^T D)^{-1} D^T d. \qquad (2.28) $$

Once k_1 and k_2 are estimated, one can refine the estimate of the other parameters by solving (2.27) with m̆(A, R_i, t_i, M_j) replaced by (2.15) and (2.16). We can alternate these two procedures until convergence.
Complete Maximum Likelihood Estimation. Experimentally, we found the convergence of the above alternation technique to be slow. A natural extension to (2.27) is then to estimate the complete set of parameters by minimizing the following functional:

$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \| m_{ij} - \breve{m}(A, k_1, k_2, R_i, t_i, M_j) \|^2, \qquad (2.29) $$

where m̆(A, k_1, k_2, R_i, t_i, M_j) is the projection of point M_j in image i according to equation (2.18), followed by distortion according to (2.15) and (2.16). This is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. A rotation is again parameterized by a 3-vector r, as in Sect. 2.4.5. An initial guess of A and {R_i, t_i | i = 1..n} can be obtained using the technique described in Sect. 2.4.4 or in Sect. 2.4.5. An initial guess of k_1 and k_2 can be obtained with the technique described in the last paragraph, or simply by setting them to 0.

2.4.7 Summary

The recommended calibration procedure is as follows:

1. Print a pattern and attach it to a planar surface;
2. Take a few images of the model plane under different orientations by moving either the plane or the camera;
3. Detect the feature points in the images;
4. Estimate the five intrinsic parameters and all the extrinsic parameters using the closed-form solution as described in Sect. 2.4.4;
5. Estimate the coefficients of the radial distortion by solving the linear least-squares problem (2.28);
6. Refine all parameters, including the lens distortion parameters, by minimizing (2.29).
There is a degenerate configuration in my technique when planes are parallel to each other. See my technical report [41] for a more detailed description.

In summary, this technique only requires the camera to observe a planar pattern from a few different orientations. Although the minimum number of orientations is two if pixels are square, we recommend 4 or 5 different orientations for better quality. We can move either the camera or the planar pattern. The motion does not need to be known, but should not be a pure translation. When the number of orientations is only 2, one should avoid positioning the planar pattern parallel to the image plane. The pattern could be anything, as long as we know the metric on the plane. For example, we can print a pattern with a laser printer and attach the paper to a reasonably planar surface such as a hard book cover. We can even use a book of known size because the four corners are enough to estimate the plane homographies.
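This plane-based procedure is implemented in widely used libraries; for instance, OpenCV's calibrateCamera follows the same approach. A minimal usage sketch covering steps 3-6, with a hypothetical 9×6 chessboard and placeholder file names, might look like this:

```python
import cv2
import glob
import numpy as np

# Hypothetical 9x6 inner-corner chessboard; model points lie on Z = 0,
# with the square size as the (arbitrary) metric unit.
pattern = (9, 6)
model = np.zeros((pattern[0] * pattern[1], 3), np.float32)
model[:, :2] = np.mgrid[0:pattern[0], 0:pattern[1]].T.reshape(-1, 2)

obj_points, img_points, size = [], [], None
for name in glob.glob("calib_*.png"):          # placeholder filenames
    gray = cv2.imread(name, cv2.IMREAD_GRAYSCALE)
    size = gray.shape[::-1]
    found, corners = cv2.findChessboardCorners(gray, pattern)
    if found:
        obj_points.append(model)
        img_points.append(corners)

# Returns the RMS reprojection error, the intrinsic matrix A, distortion
# coefficients (including k1, k2), and per-view extrinsics (R_i, t_i).
rms, A, dist, rvecs, tvecs = cv2.calibrateCamera(
    obj_points, img_points, size, None, None)
```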

2.4.8 Experimental Results

The proposed algorithm has been tested on both computer simulated data and real data. The closed-form solution involves finding the singular value decomposition of a small 2n×6 matrix, where n is the number of images. The nonlinear refinement within the Levenberg-Marquardt algorithm takes 3 to 5 iterations to converge. Due to space limitations, we describe in this section one set of experiments with real data where the calibration pattern is at different distances from the camera. The reader is referred to [41] for more experimental results with both computer simulated and real data, and to the Web page http://research.microsoft.com/zhang/Calib/ for some experimental data and the software.

Figure 2.6. Two sets of images taken at different distances to the calibration pattern. Each set contains five images. On the left, three images from the set taken at a close distance are shown. On the right, three images from the set taken at a larger distance are shown.


Table 2.1. Calibration results with the images shown in Figure 2.6

image set   α        β        θ        u0       v0       k1        k2
A           834.01   839.86   89.95°   305.51   240.09   -0.2235   0.3761
B           836.17   841.08   89.92°   301.76   241.51   -0.2676   1.3121
A+B         834.64   840.32   89.94°   304.77   240.59   -0.2214   0.3643

The example is shown in Fig. 2.6. The camera to be calibrated is an off-the-shelf PULNiX CCD camera with a 6 mm lens. The image resolution is 640×480. As can be seen in Fig. 2.6, the model plane contains a pattern of 9×9 squares with 9 special dots, which are used to automatically identify the correspondence between reference points on the model plane and square corners in the images. It was printed on A4 paper with a 600 DPI laser printer, and attached to a cardboard.

In total 10 images of the plane were taken (6 of them are shown in Fig. 2.6). Five of them (called Set A) were taken at close range, while the other five (called Set B) were taken at a larger distance. We applied our calibration algorithm to Set A, Set B, and also to the whole set (called Set A+B). The results are shown in Table 2.1. For intuitive understanding, we show the estimated angle between the image axes, θ, instead of the skew factor γ. We can see that the angle θ is very close to 90°, as expected with almost all modern CCD cameras. The camera parameters were estimated consistently for all three sets of images, except the distortion parameters with Set B. The reason is that the calibration pattern only occupies the central part of the image in Set B, where lens distortion is not significant and therefore cannot be estimated reliably.

2.4.9 Related Work

Almost at the same time, Sturm and Maybank [31], independently from us, developed the same technique. They assumed that the pixels are square (i.e., γ = 0) and studied the degenerate configurations for plane-based camera calibration.

Gurdjos et al. [14] have re-derived the plane-based calibration technique from the center line constraint.

My original implementation (only the executable) is available at


http://research.microsoft.com/zhang/calib/.
Bouguet has re-implemented my technique in Matlab, which is available at
http://www.vision.caltech.edu/bouguetj/calib_doc/.
In many applications such as stereo, multiple cameras need to be calibrated simultaneously in order to determine the relative geometry between cameras. In 2000, I extended this plane-based technique (not published) to stereo calibration for my stereo-based gaze-correction project [40, 39]. The formulation is similar to (2.29). Consider two cameras, and denote the quantities related to the second camera by a prime ('). Let (R_s, t_s) be the rigid transformation between the two cameras such that (R', t') = (R, t) ∘ (R_s, t_s), or more precisely: R' = R R_s and t' = R t_s + t. Stereo calibration is then to solve A, A', k_1, k_2, k'_1, k'_2, {(R_i, t_i) | i = 1, ..., n}, and (R_s, t_s) by minimizing the following functional:

$$ \sum_{i=1}^{n} \sum_{j=1}^{m} \left[ \delta_{ij}\, \| m_{ij} - \breve{m}(A, k_1, k_2, R_i, t_i, M_j) \|^2 + \delta'_{ij}\, \| m'_{ij} - \breve{m}(A', k'_1, k'_2, R'_i, t'_i, M_j) \|^2 \right] \qquad (2.30) $$

subject to

$$ R'_i = R_i R_s \qquad \text{and} \qquad t'_i = R_i t_s + t_i. $$

In the above formulation, δ_ij = 1 if point j is visible in the first camera in image i, and δ_ij = 0 otherwise. Similarly, δ'_ij = 1 if point j is visible in the second camera. This formulation thus does not require the same number of feature points to be visible over time or across cameras. Another advantage of this formulation is that the number of extrinsic parameters to be estimated has been reduced from 12n, if the two cameras are calibrated independently, to 6n + 6. This is a reduction of 24 dimensions in the parameter space if 5 planes are used.

Obviously, this is a nonlinear optimization problem. To obtain the initial guess, we first run single-camera calibration independently for each camera, and compute R_s through SVD from R'_i = R_i R_s (i = 1, ..., n) and t_s through least-squares from t'_i = R_i t_s + t_i (i = 1, ..., n). Recently, a closed-form initialization technique through factorization of homography matrices was proposed in [34].

2.5 Solving Camera Calibration With 1D Objects

In this section, we describe in detail how to solve the camera calibration problem from a number of observations of a 1D object consisting of 3 collinear points moving around one of them [43, 44]. We only consider this minimal configuration, but it is straightforward to extend the result if a calibration object has four or more collinear points.

2.5.1 Setups With Free-Moving 1D Calibration Objects

We now examine possible setups with 1D objects for camera calibration. As already mentioned in the introduction, we need to have several observations of the 1D objects. Without loss of generality, we choose the camera coordinate system to define the 1D objects; therefore, R = I and t = 0 in (2.1).

Two points with known distance. This could be the two endpoints of a stick, and we take a number of images while waving the stick freely. Let A and B be the two 3D points, and a and b the observed image points. Because the distance between A and B is known, we only need 5 parameters to define A and B. For example, we need 3 parameters to specify the coordinates of A in the camera coordinate system, and 2 parameters to define the orientation of the line AB. On the other hand, each image point provides two equations according to (2.1), giving in total 4 equations. Given N observations of the stick, we have 5 intrinsic parameters and 5N parameters for the point positions to estimate, i.e., the total number of unknowns is 5 + 5N. However, we only have 4N equations. Camera calibration is thus impossible.

Three collinear points with known distances. By adding an additional point, say C, the number of unknowns for the point positions still remains the same, i.e., 5 + 5N, because of the known distances of C to A and B. For each observation, we have three image points, yielding in total 6N equations. Calibration seems to be plausible, but is in fact not. This is because the three image points for each observation must be collinear. Collinearity is preserved by perspective projection. We therefore only have 5 independent equations for each observation. The total number of independent equations, 5N, is always smaller than the number of unknowns. Camera calibration is still impossible.

Four or more collinear points with known distances. As seen above, when the number of points increases from two to three, the number of independent equations (constraints) increases by one for each observation. If we have a fourth point, will we have in total 6N independent equations? If so, we would be able to solve the problem because the number of unknowns remains the same, i.e., 5 + 5N, and we would have more than enough constraints if N ≥ 5. The reality is that the addition of the fourth point or even more points does not increase the number of independent equations. It will always be 5N for any four or more collinear points. This is because the cross ratio is preserved under perspective projection. With known cross ratios and three collinear points, whether they are in space or in images, the other points are determined exactly.

2.5.2 Setups With 1D Calibration Objects Moving Around a Fixed Point

From the above discussion, calibration is impossible with a free-moving 1D calibration object, no matter how many points are on the object. Now let us examine what happens if one point is fixed. In the sequel, without loss of generality, point A is the fixed point, and a is the corresponding image point. We need 3 parameters, which are unknown, to specify the coordinates of A in the camera coordinate system, while image point a provides two scalar equations according to (2.1).

Two points with known distance. They could be the endpoints of a stick, and we move the stick around the endpoint that is fixed. Let B be the free endpoint and b its corresponding image point. For each observation, we need 2 parameters to define the orientation of the line AB and therefore the position of B, because the distance between A and B is known. Given N observations of the stick, we have 5 intrinsic parameters, 3 parameters for A and 2N parameters for the free endpoint positions to estimate, i.e., the total number of unknowns is 8 + 2N. However, each observation of b provides two equations, so together with a we only have in total 2 + 2N equations. Camera calibration is thus impossible.

Three collinear points with known distances. As already explained in the last subsection, by adding an additional point, say C, the number of unknowns for the point positions still remains the same, i.e., 8 + 2N. For each observation, b provides two equations, but c only provides one additional equation because of the collinearity of a, b and c. Thus, the total number of equations is 2 + 3N for N observations. By counting the numbers, we see that if we have 6 or more observations, we should be able to solve camera calibration, and this is the case, as we shall show in the next section.

Four or more collinear points with known distances. Again, as already explained in the last subsection, the number of unknowns and the number of independent equations remain the same because of the invariance of cross-ratios. This said, the more collinear points we have, the more accurate camera calibration will be in practice because data redundancy can combat the noise in the image data.

2.5.3 Basic Equations

Figure 2.7. Illustration of 1D calibration objects

Refer to Figure 2.7. Point A is the fixed point in space, and the stick AB moves around A. The length of the stick AB is known to be L, i.e.,

$$ \|B - A\| = L. \qquad (2.31) $$

The position of point C is also known with respect to A and B, and therefore

$$ C = \lambda_A A + \lambda_B B, \qquad (2.32) $$

where λ_A and λ_B are known. If C is the midpoint of AB, then λ_A = λ_B = 0.5. Points a, b and c on the image plane are the projections of the space points A, B and C, respectively.

Without loss of generality, we choose the camera coordinate system to define the 1D objects; therefore, R = I and t = 0 in (2.1). Let the unknown depths for A, B and C be z_A, z_B and z_C, respectively. According to (2.1), we have

$$ A = z_A A^{-1}\tilde{a} \qquad (2.33) $$
$$ B = z_B A^{-1}\tilde{b} \qquad (2.34) $$
$$ C = z_C A^{-1}\tilde{c}. \qquad (2.35) $$

Substituting them into (2.32) yields

$$ z_C\,\tilde{c} = z_A \lambda_A\,\tilde{a} + z_B \lambda_B\,\tilde{b} \qquad (2.36) $$

after eliminating A^{-1} from both sides. By performing a cross-product on both sides of the above equation with c̃, we have

$$ z_A \lambda_A (\tilde{a} \times \tilde{c}) + z_B \lambda_B (\tilde{b} \times \tilde{c}) = 0. $$

In turn, we obtain

$$ z_B = -z_A \frac{\lambda_A (\tilde{a} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}{\lambda_B (\tilde{b} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}. \qquad (2.37) $$

From (2.31), we have

$$ \| A^{-1}(z_B\tilde{b} - z_A\tilde{a}) \| = L. $$

Substituting z_B by (2.37) gives

$$ z_A \left\| A^{-1}\left( \tilde{a} + \frac{\lambda_A (\tilde{a} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}{\lambda_B (\tilde{b} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}\, \tilde{b} \right) \right\| = L. $$

This is equivalent to

$$ z_A^2\, h^T A^{-T} A^{-1} h = L^2 \qquad (2.38) $$

with

$$ h = \tilde{a} + \frac{\lambda_A (\tilde{a} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}{\lambda_B (\tilde{b} \times \tilde{c}) \cdot (\tilde{b} \times \tilde{c})}\, \tilde{b}. \qquad (2.39) $$

Equation (2.38) contains the unknown intrinsic parameters A and the unknown depth z_A of the fixed point A. It is the basic constraint for camera calibration with 1D objects. Vector h, given by (2.39), can be computed from the image points and the known λ_A and λ_B. Since the total number of unknowns is 6, we need at least six observations of the 1D object for calibration. Note that A^{-T} A^{-1} actually describes the image of the absolute conic [20].

2.5.4 Closed-Form Solution

Let

$$ B = A^{-T} A^{-1} \equiv \begin{bmatrix} B_{11} & B_{12} & B_{13} \\ B_{12} & B_{22} & B_{23} \\ B_{13} & B_{23} & B_{33} \end{bmatrix} \qquad (2.40) $$

$$ = \begin{bmatrix} \frac{1}{\alpha^2} & -\frac{\gamma}{\alpha^2\beta} & \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} \\ -\frac{\gamma}{\alpha^2\beta} & \frac{\gamma^2}{\alpha^2\beta^2} + \frac{1}{\beta^2} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} \\ \frac{v_0\gamma - u_0\beta}{\alpha^2\beta} & -\frac{\gamma(v_0\gamma - u_0\beta)}{\alpha^2\beta^2} - \frac{v_0}{\beta^2} & \frac{(v_0\gamma - u_0\beta)^2}{\alpha^2\beta^2} + \frac{v_0^2}{\beta^2} + 1 \end{bmatrix} \qquad (2.41) $$

Note that B is symmetric, and can be defined by a 6D vector

$$ b = [B_{11}, B_{12}, B_{22}, B_{13}, B_{23}, B_{33}]^T. \qquad (2.42) $$

Let h = [h_1, h_2, h_3]^T and x = z_A^2 b; then equation (2.38) becomes

$$ v^T x = L^2 \qquad (2.43) $$

with

$$ v = [h_1^2,\ 2h_1h_2,\ h_2^2,\ 2h_1h_3,\ 2h_2h_3,\ h_3^2]^T. $$

When N images of the 1D object are observed, by stacking N such equations as (2.43) we have

$$ Vx = L^2\,\mathbf{1}, \qquad (2.44) $$

where V = [v_1, ..., v_N]^T and 1 = [1, ..., 1]^T. The least-squares solution is then given by

$$ x = L^2 (V^T V)^{-1} V^T \mathbf{1}. \qquad (2.45) $$

Once x is estimated, we can compute all the unknowns based on x = z_A^2 b. Let x = [x_1, x_2, ..., x_6]^T. Without difficulty, we can uniquely extract the intrinsic parameters and the depth z_A as

$$ v_0 = (x_2x_4 - x_1x_5)/(x_1x_3 - x_2^2) $$
$$ z_A = \sqrt{x_6 - [x_4^2 + v_0(x_2x_4 - x_1x_5)]/x_1} $$
$$ \alpha = z_A/\sqrt{x_1} $$
$$ \beta = z_A\sqrt{x_1/(x_1x_3 - x_2^2)} $$
$$ \gamma = -x_2\,\alpha^2\beta/z_A^2 $$
$$ u_0 = \gamma v_0/\beta - x_4\,\alpha^2/z_A^2. $$

At this point, we can compute z_B according to (2.37), so points A and B can be computed from (2.33) and (2.34), while point C can be computed according to (2.32).
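A minimal numpy sketch of this closed-form 1D calibration follows; it assumes the image a_i of the fixed point is observed in every frame and that C is the midpoint of AB by default (the function name and array layout are my choices):

```python
import numpy as np

def calibrate_1d(a, b, c, L, lam_a=0.5, lam_b=0.5):
    """Closed-form 1D calibration, (2.38)-(2.45). a, b, c are Nx2 arrays of
    image points of A (fixed), B and C; L is the known stick length."""
    n = len(b)
    to_h = lambda p: np.hstack([p, np.ones((len(p), 1))])   # homogeneous
    at, bt, ct = to_h(np.asarray(a)), to_h(np.asarray(b)), to_h(np.asarray(c))
    V = np.empty((n, 6))
    for i in range(n):
        axc, bxc = np.cross(at[i], ct[i]), np.cross(bt[i], ct[i])
        # Vector h from (2.39).
        h = at[i] + (lam_a * axc.dot(bxc) / (lam_b * bxc.dot(bxc))) * bt[i]
        h1, h2, h3 = h
        V[i] = [h1*h1, 2*h1*h2, h2*h2, 2*h1*h3, 2*h2*h3, h3*h3]
    # Least-squares solution of V x = L^2 * 1, equation (2.45).
    x, *_ = np.linalg.lstsq(V, L**2 * np.ones(n), rcond=None)
    x1, x2, x3, x4, x5, x6 = x
    v0 = (x2*x4 - x1*x5) / (x1*x3 - x2**2)
    zA = np.sqrt(x6 - (x4**2 + v0*(x2*x4 - x1*x5)) / x1)
    alpha = zA / np.sqrt(x1)
    beta = zA * np.sqrt(x1 / (x1*x3 - x2**2))
    gamma = -x2 * alpha**2 * beta / zA**2
    u0 = gamma * v0 / beta - x4 * alpha**2 / zA**2
    return np.array([[alpha, gamma, u0], [0, beta, v0], [0, 0, 1]]), zA
```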

2.5.5 Nonlinear Optimization

The above solution is obtained through minimizing an algebraic distance which is not physically meaningful. We can refine it through maximum likelihood inference.

We are given N images of the 1D calibration object and there are 3 points on the object. Point A is fixed, and points B and C move around A. Assume that the image points are corrupted by independent and identically distributed noise. The maximum likelihood estimate can be obtained by minimizing the following functional:

$$ \sum_{i=1}^{N} \left( \|a_i - \phi(A, \mathbf{A})\|^2 + \|b_i - \phi(A, B_i)\|^2 + \|c_i - \phi(A, C_i)\|^2 \right), \qquad (2.46) $$

where φ(A, M) (M ∈ {A, B_i, C_i}) is the projection of point M onto the image, according to equations (2.33) to (2.35). More precisely, φ(A, M) = (1/z_M) A M, where z_M is the z-component of M.

The unknowns to be estimated are:

5 camera intrinsic parameters α, β, γ, u_0 and v_0 that define matrix A;

3 parameters for the coordinates of the fixed point A;

2N additional parameters to define points B_i and C_i at each instant (see below for more details).

Therefore, we have in total 8 + 2N unknowns. Regarding the parameterization for B and C, we use the spherical coordinates φ and θ to define the direction of the 1D calibration object, and point B is then given by

$$ B = A + L \begin{bmatrix} \sin\theta\cos\phi \\ \sin\theta\sin\phi \\ \cos\theta \end{bmatrix} $$

where L is the known distance between A and B. In turn, point C is computed according to (2.32). We therefore only need 2 additional parameters for each observation.

Minimizing (2.46) is a nonlinear minimization problem, which is solved with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. It requires an initial guess of A, A, {B_i, C_i | i = 1..N}, which can be obtained using the technique described in the last subsection.

2.5.6 Estimating the fixed point

In the above discussion, we assumed that the image coordinates a of the fixed point A are known. We now describe how to estimate a by considering whether the fixed point A is visible in the image or not.

Invisible fixed point. The fixed point does not need to be visible in the image, and the camera calibration technique becomes more versatile without the visibility requirement. In that case, we can for example hang a string of small balls from the ceiling, and calibrate multiple cameras in the room by swinging the string. The fixed point can be estimated by intersecting lines from different images as described below.

Each observation of the 1D object defines an image line. An image line can be represented by a 3D vector l = [l_1, l_2, l_3]^T, defined up to a scale factor such that a point m = [u, v]^T on the line satisfies l^T m̃ = 0. In the sequel, we also use (n, q) to denote line l, where n = [l_1, l_2]^T and q = l_3. To remove the scale ambiguity, we normalize l such that ‖l‖ = 1. Furthermore, each l is associated with an uncertainty measure represented by a 3×3 covariance matrix Λ.

Given N images of the 1D object, we have N lines: {(l_i, Λ_i) | i = 1, ..., N}. Let the fixed point be a in the image. Obviously, if there is no noise, we have l_i^T ã = 0, or n_i^T a + q_i = 0. Therefore, we can estimate a by minimizing

$$ F = \sum_{i=1}^{N} w_i \|l_i^T \tilde{a}\|^2 = \sum_{i=1}^{N} w_i \|n_i^T a + q_i\|^2 = \sum_{i=1}^{N} w_i (a^T n_i n_i^T a + 2 q_i n_i^T a + q_i^2) \qquad (2.47) $$

where w_i is a weighting factor (see below). By setting the derivative of F with respect to a to 0, we obtain the solution, which is given by

$$ a = -\left( \sum_{i=1}^{N} w_i n_i n_i^T \right)^{-1} \sum_{i=1}^{N} w_i q_i n_i. $$

The optimal weighting factor w_i in (2.47) is the inverse of the variance of l_i^T ã, which is w_i = 1/(ã^T Λ_i ã). Note that the weight w_i involves the unknown a. To overcome this difficulty, we can approximate w_i by 1/trace(Λ_i) for the first iteration, and re-compute w_i with the previously estimated a in the subsequent iterations. Usually two or three iterations are enough.

Visible fixed point. Since the fixed point is visible, we have N observations: {a_i | i = 1, ..., N}. We can therefore estimate a by minimizing Σ_{i=1}^{N} ‖a − a_i‖², assuming that the image points are detected with the same accuracy. The solution is simply a = (Σ_{i=1}^{N} a_i)/N.

The above estimation does not make use of the fact that the fixed point is also the intersection of the N observed lines of the 1D object. Therefore, a better technique to estimate a is to minimize the following function:

$$ F = \sum_{i=1}^{N} \left[ (a - a_i)^T V_i^{-1} (a - a_i) + w_i \|l_i^T \tilde{a}\|^2 \right] = \sum_{i=1}^{N} \left[ (a - a_i)^T V_i^{-1} (a - a_i) + w_i \|n_i^T a + q_i\|^2 \right] \qquad (2.48) $$

where V_i is the covariance matrix of the detected point a_i. The derivative of the above function with respect to a is given by

$$ \frac{\partial F}{\partial a} = 2 \sum_{i=1}^{N} \left[ V_i^{-1}(a - a_i) + w_i n_i n_i^T a + w_i q_i n_i \right]. $$

Setting it to 0 yields

$$ a = \left( \sum_{i=1}^{N} \left( V_i^{-1} + w_i n_i n_i^T \right) \right)^{-1} \sum_{i=1}^{N} \left( V_i^{-1} a_i - w_i q_i n_i \right). $$

If more than three points are visible in each image, the known cross ratio provides an additional constraint in determining the fixed point.

For an accessible description of uncertainty manipulation, the reader is referred to [45, Chapter 2].
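For the invisible case, the iteratively re-weighted scheme above can be sketched as follows (the iteration count and names are my choices):

```python
import numpy as np

def estimate_fixed_point(lines, covariances, iterations=3):
    """Estimate the image a of the invisible fixed point from N observed
    lines l_i (each normalized, ||l|| = 1) with 3x3 covariances Lambda_i,
    by iteratively re-weighted least squares on (2.47)."""
    n = np.array([l[:2] for l in lines])
    q = np.array([l[2] for l in lines])
    # First-iteration approximation of the weights: w_i = 1 / trace(Lambda_i).
    w = np.array([1.0 / np.trace(Lam) for Lam in covariances])
    for _ in range(iterations):
        M = (w[:, None, None] * np.einsum('ij,ik->ijk', n, n)).sum(axis=0)
        a = -np.linalg.solve(M, (w * q) @ n)
        a_h = np.append(a, 1.0)
        # Optimal weights w_i = 1 / (a~^T Lambda_i a~), recomputed with a.
        w = np.array([1.0 / (a_h @ Lam @ a_h) for Lam in covariances])
    return a
```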

2.5.7 Experimental Results

The proposed algorithm has been tested on both computer simulated data and real data.

Computer Simulations

The simulated camera has the following properties: α = 1000, β = 1000, γ = 0, u_0 = 320, and v_0 = 240. The image resolution is 640×480. A stick of 70 cm is simulated with the fixed point A at [0, 35, 150]^T. The other endpoint of the stick is B, and C is located halfway between A and B. We generated 100 random orientations of the stick by sampling θ in [π/6, 5π/6] and φ in [π, 2π] according to a uniform distribution. Points A, B, and C are then projected onto the image.

Gaussian noise with 0 mean and standard deviation σ is added to the projected image points a, b and c. The estimated camera parameters are compared with the ground truth, and we measure their relative errors with respect to the focal length α.

Table 2.2. Calibration results with real data.

Solution              α        β        γ (θ)               u0       v0
Closed-form           889.49   818.59   -0.1651 (90.01°)    297.47   234.33
Nonlinear             838.49   799.36   4.1921 (89.72°)     286.74   219.89
Plane-based           828.92   813.33   -0.0903 (90.01°)    305.23   235.17
Relative difference   1.15%    1.69%    0.52% (0.29°)       2.23%    1.84%
Note that we measure the relative errors in (u_0, v_0) with respect to α, as proposed by Triggs in [32]. He pointed out that the absolute errors in (u_0, v_0) are not geometrically meaningful, while computing the relative error is equivalent to measuring the angle between the true optical axis and the estimated one.

We vary the noise level from 0.1 pixels to 1 pixel. For each noise level, we perform 120 independent trials, and the results shown in Fig. 2.8 are the average. Figure 2.8a displays the relative errors of the closed-form solution, while Figure 2.8b displays those of the nonlinear minimization result. Errors increase almost linearly with the noise level. The nonlinear minimization refines the closed-form solution, and produces significantly better results (with 50% less error). At the 1 pixel noise level, the errors for the closed-form solution are about 12%, while those for the nonlinear minimization are about 6%.

Real Data

For the experiment with real data, I used three toy beads from my kids and strung them together with a stick. The beads are approximately 14 cm apart (i.e., L = 28). I then moved the stick around while trying to fix one end with the aid of a book. A video of 150 frames was recorded, and four sample images are shown in Fig. 2.9. A bead in the image is modeled as a Gaussian blob in the RGB space, and the centroid of each detected blob is the image point we use for camera calibration. The proposed algorithm is therefore applied to the 150 observations of the beads, and the estimated camera parameters are provided in Table 2.2. The first row is the estimation from the closed-form solution, while the second row is the refined result after nonlinear minimization. For the image skew parameter γ, we also provide the angle between the image axes in parentheses (it should be very close to 90°).

For comparison, we also used the plane-based calibration technique described in [42] to calibrate the same camera. Five images of a planar pattern were taken, and one of them is shown in Fig. 2.10. The calibration result is shown in the third row of Table 2.2. The fourth row displays the relative difference between the plane-based result and the nonlinear solution with respect to the focal length (we use 828.92). As we can observe, the difference is about 2%.

Figure 2.8. Calibration errors with respect to the noise level of the image points: (a) closed-form solution; (b) nonlinear optimization.

Figure 2.9. Sample images of a 1D object used for camera calibration (Frames 10, 60, 90, and 140).

Figure 2.10. A sample image of the planar pattern used for camera calibration.

The fourth row displays the relative difference between the plane-based result
and the nonlinear solution with respect to the focal length α (we use 828.92).
As we can observe, the difference is about 2%.
Several sources contribute to this difference. Besides the obvious ones,
image noise and imprecision of the extracted data points, one source is our
current rudimentary experimental setup:
- The supposed-to-be fixed point was not fixed; it slipped around on the surface.
- The positioning of the beads was done with a ruler by eye inspection.
Considering all these factors, the performance of the proposed algorithm is
very encouraging.

2.6 Self-Calibration

Self-calibration is also called auto-calibration. Techniques in this category
do not require any particular calibration object. They can be considered a
0D approach because only image point correspondences are required. Just
by moving a camera in a static scene, the rigidity of the scene provides in
general two constraints [22, 21, 20] on the camera's internal parameters from
one camera displacement, using image information alone. The absolute conic,
described in Section 2.2.2, is an essential concept in understanding these
constraints. Therefore, if images are taken by the same camera with fixed
internal parameters, correspondences between three images are sufficient to
recover both the internal and external parameters, which allow us to
reconstruct 3-D structure up to a similarity [20, 17]. Although no calibration
objects are necessary, a large number of parameters need to be estimated,
resulting in a much harder mathematical problem.
We do not plan to go into further detail on this approach because two
recent books [15, 7] provide an excellent account of those techniques.

2.7 Conclusion

In this chapter, we have reviewed several camera calibration techniques. We
have classified them into four categories, depending on whether they use 3D
apparatus, 2D objects (planes), 1D objects, or just the surrounding scene
(self-calibration). Recommendations on choosing among these techniques were
given in the introduction section.
The techniques described so far are mostly focused on single-camera
calibration. We briefly touched on stereo calibration in Section 2.4.9.

Camera calibration is still an active research area because more and more
applications use cameras. In [2], spheres are used to calibrate one or more
cameras, which can be considered as a 2D approach since only the surface
property is used. In [5], a technique is described to calibrate a camera network consisting of an omni-camera and a number of perspective cameras. In
[24], a technique is proposed to calibrate a projector-screen-camera system.

2.8 Appendix: Estimating Homography Between the Model Plane and its Image

There are many ways to estimate the homography between the model plane
and its image. Here, we present a technique based on the maximum likelihood
criterion. Let Mi and mi be the model and image points, respectively. Ideally,
they should satisfy (2.18). In practice, they don't because of noise in the
extracted image points. Let's assume that mi is corrupted by Gaussian noise
with mean 0 and covariance matrix Λ_{mi}. Then, the maximum likelihood
estimation of H is obtained by minimizing the following functional

$$\sum_i (\mathbf{m}_i - \hat{\mathbf{m}}_i)^T \Lambda_{\mathbf{m}_i}^{-1} (\mathbf{m}_i - \hat{\mathbf{m}}_i) ,$$

where

$$\hat{\mathbf{m}}_i = \frac{1}{\bar{\mathbf{h}}_3^T \mathbf{M}_i} \begin{bmatrix} \bar{\mathbf{h}}_1^T \mathbf{M}_i \\ \bar{\mathbf{h}}_2^T \mathbf{M}_i \end{bmatrix}$$

with $\bar{\mathbf{h}}_i$ the $i$-th row of H.
In practice, we simply assume $\Lambda_{\mathbf{m}_i} = \sigma^2 \mathbf{I}$ for all $i$. This is reasonable if points
are extracted independently with the same procedure. In this case, the above
problem becomes a nonlinear least-squares one, i.e., $\min_{\mathbf{H}} \sum_i \|\mathbf{m}_i - \hat{\mathbf{m}}_i\|^2$.
The nonlinear minimization is conducted with the Levenberg-Marquardt algorithm as implemented in Minpack [23]. This requires an initial guess,
which can be obtained as follows.
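
As an illustration of this refinement step, here is a minimal sketch using SciPy's least_squares with method="lm", which wraps the MINPACK Levenberg-Marquardt implementation referenced above. H0 is an initial 3×3 homography (e.g., from the linear method described next); the function name and data layout are our own.

```python
import numpy as np
from scipy.optimize import least_squares

def refine_homography(H0, M, m):
    """Minimize sum_i ||m_i - m_hat_i||^2 over the 9 entries of H.

    M, m: (n, 2) arrays of model-plane and image points; n >= 5 so the
    'lm' method has at least as many residuals (2n) as parameters (9).
    """
    Mh = np.column_stack([M, np.ones(len(M))])   # homogeneous model points

    def residuals(h):
        H = h.reshape(3, 3)
        p = Mh @ H.T                             # m_hat_i, homogeneous
        m_hat = p[:, :2] / p[:, 2:3]             # perspective division
        return (m - m_hat).ravel()

    sol = least_squares(residuals, H0.ravel(), method="lm")
    H = sol.x.reshape(3, 3)
    return H / H[2, 2]                           # fix the free scale
```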
Let $\mathbf{x} = [\bar{\mathbf{h}}_1^T, \bar{\mathbf{h}}_2^T, \bar{\mathbf{h}}_3^T]^T$. Then equation (2.18) can be rewritten as

$$\begin{bmatrix} \widetilde{\mathbf{M}}^T & \mathbf{0}^T & -u \widetilde{\mathbf{M}}^T \\ \mathbf{0}^T & \widetilde{\mathbf{M}}^T & -v \widetilde{\mathbf{M}}^T \end{bmatrix} \mathbf{x} = \mathbf{0} .$$
When we are given n points, we have n such equations, which can be written
in matrix form as $L\mathbf{x} = \mathbf{0}$, where L is a $2n \times 9$ matrix. As $\mathbf{x}$ is defined up
to a scale factor, the solution is well known to be the right singular vector of
L associated with the smallest singular value (or, equivalently, the eigenvector
of $L^T L$ associated with the smallest eigenvalue). In L, some elements are
constant 1, some are in pixels, some are in world coordinates, and some are
products of both. This makes L poorly conditioned numerically. Much
better results can be obtained by performing a simple data normalization
prior to running the above procedure.
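
Here is a compact sketch of this linear estimation, including a standard form of the data normalization just mentioned (translate each point set to zero centroid and scale it to an average distance of √2 from the origin; the text recommends normalization without prescribing a particular one).

```python
import numpy as np

def normalization_transform(pts):
    # Similarity mapping pts to zero centroid, average distance sqrt(2).
    c = pts.mean(axis=0)
    s = np.sqrt(2) / np.sqrt(((pts - c) ** 2).sum(axis=1)).mean()
    return np.array([[s, 0.0, -s * c[0]],
                     [0.0, s, -s * c[1]],
                     [0.0, 0.0, 1.0]])

def homography_dlt(M, m):
    """M, m: (n, 2) arrays of model-plane and image points, n >= 4."""
    T_M, T_m = normalization_transform(M), normalization_transform(m)
    Mn = np.column_stack([M, np.ones(len(M))]) @ T_M.T
    mn = np.column_stack([m, np.ones(len(m))]) @ T_m.T
    rows = []
    for (X, Y, W), (u, v, _) in zip(Mn, mn):     # third coord of mn is 1
        rows.append([X, Y, W, 0, 0, 0, -u * X, -u * Y, -u * W])
        rows.append([0, 0, 0, X, Y, W, -v * X, -v * Y, -v * W])
    L = np.asarray(rows)                         # the 2n x 9 matrix L
    _, _, Vt = np.linalg.svd(L)
    H = Vt[-1].reshape(3, 3)                     # smallest singular value
    H = np.linalg.inv(T_m) @ H @ T_M             # undo the normalization
    return H / H[2, 2]
```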

Bibliography
[1] Y.I. Abdel-Aziz and H.M. Karara. Direct linear transformation into object space coordinates in close-range photogrammetry. In Proceedings of the Symposium on Close-Range Photogrammetry, University of Illinois at Urbana-Champaign, Urbana, Illinois, pages 1–18, January 1971.
[2] M. Agrawal and L. Davis. Camera calibration using spheres: A semi-definite programming approach. In Proceedings of the 9th International Conference on Computer Vision, pages 782–789, Nice, France, October 2003. IEEE Computer Society Press.
[3] D. C. Brown. Close-range camera calibration. Photogrammetric Engineering, 37(8):855–866, 1971.
[4] B. Caprile and V. Torre. Using vanishing points for camera calibration. The International Journal of Computer Vision, 4(2):127–140, March 1990.
[5] X. Chen, J. Yang, and A. Waibel. Calibration of a hybrid camera network. In Proceedings of the 9th International Conference on Computer Vision, pages 150–155, Nice, France, October 2003. IEEE Computer Society Press.
[6] W. Faig. Calibration of close-range photogrammetry systems: Mathematical formulation. Photogrammetric Engineering and Remote Sensing, 41(12):1479–1486, 1975.
[7] O. Faugeras and Q.-T. Luong. The Geometry of Multiple Images. The MIT Press, 2001. With contributions from T. Papadopoulo.
[8] O. Faugeras. Three-Dimensional Computer Vision: a Geometric Viewpoint. MIT Press, 1993.
[9] O. Faugeras, T. Luong, and S. Maybank. Camera self-calibration: theory and experiments. In G. Sandini, editor, Proc. 2nd ECCV, volume 588 of Lecture Notes in Computer Science, pages 321–334, Santa Margherita Ligure, Italy, May 1992. Springer-Verlag.
[10] O. Faugeras and G. Toscani. The calibration problem for stereo. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 15–20, Miami Beach, FL, June 1986. IEEE.
[11] S. Ganapathy. Decomposition of transformation matrices for robot vision. Pattern Recognition Letters, 2:401–412, December 1984.
[12] D. Gennery. Stereo-camera calibration. In Proceedings of the 10th Image Understanding Workshop, pages 101–108, 1979.
[13] G.H. Golub and C.F. van Loan. Matrix Computations. The Johns Hopkins University Press, Baltimore, Maryland, third edition, 1996.
[14] P. Gurdjos, A. Crouzil, and R. Payrissat. Another way of looking at plane-based calibration: the centre circle constraint. In Proceedings of the 7th European Conference on Computer Vision, volume IV, pages 252–266, Copenhagen, May 2002.
[15] R. Hartley and A. Zisserman. Multiple View Geometry in Computer Vision. Cambridge University Press, 2000.
[16] R. Hartley. Self-calibration from multiple views with a rotating camera. In J.-O. Eklundh, editor, Proceedings of the 3rd European Conference on Computer Vision, volume 800-801 of Lecture Notes in Computer Science, pages 471–478, Stockholm, Sweden, May 1994. Springer-Verlag.

[17] R. Hartley. An algorithm for self calibration from several views. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 908–912, Seattle, WA, June 1994. IEEE.
[18] J. Heikkilä and O. Silven. A four-step camera calibration procedure with implicit image correction. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 1106–1112, San Juan, Puerto Rico, June 1997. IEEE Computer Society.
[19] D. Liebowitz and A. Zisserman. Metric rectification for perspective images of planes. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 482–488, Santa Barbara, California, June 1998. IEEE Computer Society.
[20] Q.-T. Luong and O.D. Faugeras. Self-calibration of a moving camera from point correspondences and fundamental matrices. The International Journal of Computer Vision, 22(3):261–289, 1997.
[21] Q.-T. Luong. Matrice Fondamentale et Calibration Visuelle sur l'Environnement: Vers une plus grande autonomie des systèmes robotiques. PhD thesis, Université de Paris-Sud, Centre d'Orsay, December 1992.
[22] S. J. Maybank and O. D. Faugeras. A theory of self-calibration of a moving camera. The International Journal of Computer Vision, 8(2):123–152, August 1992.
[23] J.J. Moré. The Levenberg-Marquardt algorithm, implementation and theory. In G. A. Watson, editor, Numerical Analysis, Lecture Notes in Mathematics 630. Springer-Verlag, 1977.
[24] T. Okatani and K. Deguchi. Autocalibration of a projector-screen-camera system: Theory and algorithm for screen-to-camera homography estimation. In Proceedings of the 9th International Conference on Computer Vision, pages 774–781, Nice, France, October 2003. IEEE Computer Society Press.
[25] L. Robert. Camera calibration without feature extraction. Computer Vision, Graphics, and Image Processing, 63(2):314–325, March 1995. Also INRIA Technical Report 2204.
[26] J.G. Semple and G.T. Kneebone. Algebraic Projective Geometry. Oxford: Clarendon Press, 1952. Reprinted 1979.
[27] S.W. Shih, Y.P. Hung, and W.S. Lin. Accurate linear technique for camera calibration considering lens distortion by solving an eigenvalue problem. Optical Engineering, 32(1):138–149, 1993.
[28] I. Shimizu, Z. Zhang, S. Akamatsu, and K. Deguchi. Head pose determination from one image using a generic model. In Proceedings of the IEEE Third International Conference on Automatic Face and Gesture Recognition, pages 100–105, Nara, Japan, April 1998.
[29] C. C. Slama, editor. Manual of Photogrammetry. American Society of Photogrammetry, fourth edition, 1980.
[30] G. Stein. Accurate internal camera calibration using rotation, with analysis of sources of error. In Proc. Fifth International Conference on Computer Vision, pages 230–236, Cambridge, Massachusetts, June 1995.

[31] P. Sturm and S. Maybank. On plane-based camera calibration: A general algorithm, singularities, applications. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pages 432–437, Fort Collins, Colorado, June 1999. IEEE Computer Society Press.
[32] B. Triggs. Autocalibration from planar scenes. In Proceedings of the 5th European Conference on Computer Vision, pages 89–105, Freiburg, Germany, June 1998.
[33] R. Y. Tsai. A versatile camera calibration technique for high-accuracy 3D machine vision metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and Automation, 3(4):323–344, August 1987.
[34] T. Ueshiba and F. Tomita. Plane-based calibration algorithm for multi-camera systems via factorization of homography matrices. In Proceedings of the 9th International Conference on Computer Vision, pages 966–973, Nice, France, October 2003. IEEE Computer Society Press.
[35] G.Q. Wei and S.D. Ma. A complete two-plane camera calibration method and experimental comparisons. In Proc. Fourth International Conference on Computer Vision, pages 439–446, Berlin, May 1993.
[36] G.Q. Wei and S.D. Ma. Implicit and explicit camera calibration: Theory and experiments. IEEE Transactions on Pattern Analysis and Machine Intelligence, 16(5):469–480, 1994.
[37] J. Weng, P. Cohen, and M. Herniou. Camera calibration with distortion models and accuracy evaluation. IEEE Transactions on Pattern Analysis and Machine Intelligence, 14(10):965–980, October 1992.
[38] R. Willson. Modeling and Calibration of Automated Zoom Lenses. PhD thesis, Department of Electrical and Computer Engineering, Carnegie Mellon University, 1994.
[39] R. Yang and Z. Zhang. Eye gaze correction with stereovision for video teleconferencing. In Proceedings of the 7th European Conference on Computer Vision, volume II, pages 479–494, Copenhagen, May 2002. Also available as Technical Report MSR-TR-01-119.
[40] R. Yang and Z. Zhang. Model-based head pose tracking with stereovision. In Proc. Fifth IEEE International Conference on Automatic Face and Gesture Recognition (FG2002), pages 255–260, Washington, DC, May 2002. Also available as Technical Report MSR-TR-01-102.
[41] Z. Zhang. A flexible new technique for camera calibration. Technical Report MSR-TR-98-71, Microsoft Research, December 1998. Available together with the software at http://research.microsoft.com/zhang/Calib/.
[42] Z. Zhang. A flexible new technique for camera calibration. IEEE Transactions on Pattern Analysis and Machine Intelligence, 22(11):1330–1334, 2000.
[43] Z. Zhang. Camera calibration with one-dimensional objects. Technical Report MSR-TR-2001-120, Microsoft Research, December 2001.
[44] Z. Zhang. Camera calibration with one-dimensional objects. In Proc. European Conference on Computer Vision (ECCV'02), volume IV, pages 161–174, Copenhagen, Denmark, May 2002.
[45] Z. Zhang and O.D. Faugeras. 3D Dynamic Scene Analysis: A Stereo Based Approach. Springer, Berlin, Heidelberg, 1992.
