Radiological images are increasingly being used in healthcare and medical research.
There is, consequently, widespread interest in accurately relating information
in the different images for diagnosis, treatment and basic science. This
article reviews registration techniques used to solve this problem, and describes
the wide variety of applications to which these techniques are applied. Applications
of image registration include combining images of the same subject from
different modalities, aligning temporal sequences of images to compensate for
motion of the subject between scans, image guidance during interventions and
aligning images from multiple subjects in cohort studies. Current registration
algorithms can, in many cases, automatically register images that are related
by a rigid body transformation (i.e. where tissue deformation can be ignored).
There has also been substantial progress in non-rigid registration algorithms
that can compensate for tissue deformation, or align images from different subjects.
Nevertheless, many registration problems remain unsolved, and this is
likely to continue to be an active field of research in the future.
Summary of notation
1. Introduction
Medical images are increasingly being used within healthcare for diagnosis, planning
treatment, guiding treatment and monitoring disease progression. Within medical research
(especially neuroscience research) they are used to investigate disease processes and understand
normal development and ageing. In many of these studies, multiple images are acquired from
subjects at different times, and often with different imaging modalities. In research studies, it
is sometimes desirable to compare images obtained from patient cohorts rather than just single
subjects imaged multiple times. Furthermore, the amount of data produced by each successive
generation of imaging system is greater than the previous generation. This trend is set to
continue with the introduction of multislice helical CT scanning and MR imaging systems
with higher gradient strengths. There are, therefore, potential benefits in improving the way in
which these images are compared and combined. Current clinical practice normally involves
printing the images onto radiographic film and viewing them on a light box. Computerized
approaches offer potential benefits, particularly by accurately aligning the information in the
different images, and providing tools for visualizing the combined images. A critical stage in
this process is the alignment or registration of the images, which is the topic of this review
article. There have been previous surveys of the medical image registration literature (e.g.
Maurer and Fitzpatrick 1993, van den Elsen et al 1993, Maintz and Viergever 1998). This
article aims to complement them both by describing some more recent literature and by using
notation that makes clear some of the practical difficulties in implementing robust registration
techniques. Furthermore, this article focuses discussion on some of the most widely used
registration algorithms rather than attempting to provide a comprehensive survey of all the
literature in this field. For this reason, some very recently devised algorithms are not described,
as it is not yet clear whether they will become widely used.
Medical image registration R3
In this article we describe the main approaches used for the registration of radiological
images. The most widely used application of medical image registration is aligning
tomographic images, that is, aligning images that sample three-dimensional space with
reasonably isotropic resolution. Furthermore, it is often assumed that between image
acquisitions, the anatomical and pathological structures of interest do not deform or distort.
This ‘rigid body’ assumption simplifies the registration process, but techniques that make
this assumption have quite limited applicability. Many organs do deform substantially, for
example with the cardiac or respiratory cycles or as a result of change in position. The brain
within the skull is reasonably non-deformable provided the skull remains closed between
imaging, and that there is no substantial change in anatomy and pathology, such as growth
in a lesion, between scans. Imaging equipment is imperfect, so regardless of the organ being
imaged, the rigid body assumption can be violated as a result of scanner-induced geometrical
distortions that differ between images. Although the majority of the registration approaches
reviewed here have been applied to the rigid body registration of head images acquired using
tomographic modalities, there is now considerable research activity aimed at tackling the
more challenging problems of aligning images that have different dimensionality (for example
projection images with tomographic images), aligning images of organs that deform, aligning
images from different subjects, or of aligning images in ways that can correct for scanner-
induced geometric distortion. This is quite a rapidly moving research field, so the work
reviewed in these areas is more preliminary.
For all types of image registration, the assessment of registration accuracy is very
important. The required accuracy will vary between applications, but for all applications
it is desirable to know both the expected accuracy of a technique and also the registration
accuracy achieved on each individual set of images. For one type of registration algorithm,
point-landmark registration, the error propagation is well understood. For other approaches,
however, the algorithms themselves provide no useful indication of accuracy. The most
promising approach to ensuring acceptable accuracy is visual assessment of the registered
images before they are used for the desired clinical or research application.
Although this review concentrates on registration of radiological images of the same
subject (intrasubject registration), there is some discussion of the closely related topics of
intersubject registration, including registration of images of an individual to an atlas, and
image-to-physical space registration.
In this article we primarily consider the main radiological imaging modalities. These
include traditional projection radiographs, with or without contrast and subtraction, nuclear
medicine projection images, ultrasound images and the cross-sectional modalities of x-ray
computed tomography (CT), magnetic resonance imaging (MRI), single photon emission
computed tomography (SPECT) and positron emission tomography (PET). We refer to these
last four modalities (CT, MRI, SPECT and PET) as the tomographic modalities. In many ways
these are the easiest modalities from the point of view of image registration. They provide
voxel datasets in which the sampling is normally uniform along each axis, though the voxels
themselves tend to have anisotropic resolution. In a projection x-ray, each pixel represents
the integral of attenuation along one of a set of converging lines through the patient, and they
represent a superposition of structures in the patient with varying magnification. Many nuclear
medicine images acquired with gamma cameras are parallel projections, in which each pixel
represents the integral along one of a set of parallel lines through the patient. It is, therefore,
a superposition of structures in the patient all at the same magnification (though at different
resolutions). The majority of ultrasound images are acquired with a free-hand transducer.
Each image is a two-dimensional slice through part of the patient. The images are acquired
at a high frame rate, but the spatial relationship between the frames is not recorded. If the
transducer is moved in a controlled way or tracked, the relative positions of the frames can be
recorded, and a three-dimensional dataset obtained, provided the structures being tracked are
not moving or deforming during the acquisition.
Video images are often acquired during surgery, for example using endoscopes or
microscopes. For the purpose of image guidance, it can be useful to relate the video images
to preoperatively acquired diagnostic images. Video images are, like radiographs and many
nuclear medicine images, projections. They differ, however, in that they normally only contain
information about the surface of structures in the field of view, rather than a superposition
of overlying structures. There is a huge computer vision literature on estimating three-
dimensional shapes from one or more video camera views. Video images can be aligned
with tomographic images either by first extracting surface structures, or directly.
In this article we use the term ‘registration’ to mean determining the spatial alignment between
images of the same or different subjects, acquired with the same or different modalities, and
also the registration of images with the coordinate system of a treatment device or tracked
localizer. Other authors distinguish between different categories of alignment using the words
registration, co-registration and normalization. The term normalization is usually restricted
to the intersubject registration situation, and registration and co-registration are often used
interchangeably. The algorithms used for all these applications have many features in common,
so we prefer to use the term registration for all cases.
The word registration is used with two slightly different meanings. The first meaning is
determining a transformation that can relate the position of features in one image or coordinate
space with the position of the corresponding feature in another image or coordinate space. We
use the symbol T to represent this type of positional registration transformation. The second
meaning of registration both relates the position of corresponding features and enables us to
compare the intensity at those corresponding positions (e.g. to subtract image intensity values).
We use the symbol $\mathcal{T}$ to describe this second meaning of registration, which incorporates the
concepts of resampling and interpolation.
Using the language of geometry, the registration transformation is referred to as a mapping.
We can consider the mapping $T$ that transforms a position $x$ from one image to another, or
from one image to the coordinate system of a treatment device (image to physical registration)
$$T : x_A \mapsto x_B \iff T(x_A) = x_B. \tag{1}$$
Using this notation, $T$ is a spatial mapping. We also need to consider the more complete
mapping $\mathcal{T}$ that maps both position and associated intensity value from image $A$ to image $B$.
$\mathcal{T}$ therefore maps an image to an image, whereas $T$ maps between coordinates. If we wish to
overlay two images that have been registered, or to subtract one from another, then we need
to know $\mathcal{T}$, not just $T$. $\mathcal{T}$ is only defined in the region of overlap of the image fields of view,
and has to take account of image sampling and resolution. $A(x_A)$ is the intensity value at the
location $x_A$, and similarly for image $B$¹. It is important to remember that the medical images
A and B are derived from a real object, i.e. the patient. The images have a limited field of view
that does not normally cover the entire patient. Furthermore, this field of view is likely to be
different for the two images.
We can usefully think of the two images themselves as being mappings of points in the
patient within their field of view (or domain $\Omega$) to intensity values
$$A : x_A \in \Omega_A \mapsto A(x_A)$$
$$B : x_B \in \Omega_B \mapsto B(x_B).$$
Because the images are likely to have different fields of view, the domains $\Omega_A$ and $\Omega_B$ will be
different. This is a very important factor, which accounts for a good deal of the difficulty in
devising accurate and reliable registration algorithms. We will return to this issue in section 2.1.
To compare images $A$ and $B$, we would like them both to be defined at the location $x_A$,
so that we can write $B(x_A)$. This is wrong, however, as $B$ is not defined at location $x_A$.
We therefore introduce the notation $B^{\mathcal{T}}$ for the image $B$ transformed with a given mapping
$\mathcal{T}$. If $\mathcal{T}$ accurately registers the images, then $A(x_A)$ and $B^{\mathcal{T}}(x_A)$ will represent the same
location in the object to within some error depending on $\mathcal{T}$. If we didn’t have to worry about
interpolation, we could write $B^{T}(x_A)$ instead of $B^{\mathcal{T}}(x_A)$, but due to the discrete nature of
medical images, discussed further in section 2.2, interpolation is necessary for any realistic
transformation.
As the images $A$ and $B$ represent one object $X$, imaged with the same or different
modalities, there is a relation between the spatial locations in $A$ and $B$. Modality $A$ is such
that position $x \in X$ is mapped to $x_A$, and modality $B$ maps $x$ to $x_B$. The registration process
involves recovering the spatial transformation $T$ which maps $x_A$ to $x_B$ over the entire domain of
interest, i.e. which maps from $\Omega_A$ to $\Omega_B$ within the overlapping portion of the domains. We refer
to this overlap domain as $\Omega^T_{A,B}$. This notation makes it clear that the overlap domain depends
on the domains of the original images $A$ and $B$, and also on the spatial transformation $T$. The
overlap domain is the set of positions in the domain of image $A$ that are also in the domain of image
$B$ after transformation, and can be defined as:
$$\Omega^T_{A,B} = \{x_A \in \Omega_A \mid T^{-1}(x_A) \in \Omega_B\}.$$
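As a rough illustration of the overlap domain definition above, the following sketch marks which voxels of image A belong to the overlap for a given transformation. It follows the displayed definition literally, applying the inverse spatial mapping to each position in A and testing whether the result falls inside B's field of view. The function name `overlap_domain`, the box-shaped field of view and the pure translation used in the demonstration are assumptions made for this example, not part of the original text.

```python
import numpy as np

def overlap_domain(shape_a, spacing_a, extent_b, t_inv):
    """Boolean mask over A's voxel grid marking the overlap domain.

    shape_a   : voxels per axis of image A, e.g. (nx, ny, nz)
    spacing_a : sample spacing of A along each axis (mm)
    extent_b  : (lower, upper) corners of B's field of view, assumed box-shaped (mm)
    t_inv     : callable applying the inverse spatial mapping to (N, 3) positions x_A
    """
    # All voxel indices of A as an (N, 3) array, then converted to positions in mm.
    idx = np.indices(shape_a).reshape(len(shape_a), -1).T
    x_a = idx * np.asarray(spacing_a, dtype=float)
    mapped = t_inv(x_a)
    lower, upper = (np.asarray(c, dtype=float) for c in extent_b)
    # A voxel is in the overlap if its mapped position lies inside B's field of view.
    inside = np.all((mapped >= lower) & (mapped <= upper), axis=1)
    return inside.reshape(shape_a)

# Hypothetical example: the mapping is a 4 mm translation along x.
shift = np.array([4.0, 0.0, 0.0])
mask = overlap_domain((8, 8, 4), (1.0, 1.0, 2.0),
                      ([0.0, 0.0, 0.0], [7.0, 7.0, 6.0]),
                      lambda x: x - shift)
print(mask.sum(), "of", mask.size, "voxels of A lie in the overlap domain")
```

Because the mask depends on the transformation, an iterative algorithm would need to recompute it at every estimate of the transformation, which is one reason the overlap domain complicates registration.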
As stated earlier, the transformation $\mathcal{T}$ maps both position, and intensity at that position, from
one image to another, taking account of issues to do with sampling and interpolation. It is
important to emphasize, however, that $\mathcal{T}$ is not an intensity mapping: it does not make image $B$
look like image $A$ by giving a position in image $B^{\mathcal{T}}(x_A)$ the same intensity as $A(x_A)$. We use
the symbol $F$ to represent that mapping of intensities from one image to another. For two
images differing only by noise, $F$ will be the identity. In general $F$ will be a spatially varying
(non-stationary) function that is not monotonic.
1 $A(x_A)$ and $B(x_B)$ are normally scalars, but in some circumstances can be vectors (e.g. flow) or tensors (e.g.
diffusion). Non-scalar voxel values can add further complications to image transformation which are not addressed
in this article.
R6 D L G Hill et al
Registration algorithms that make use of geometrical features in the images such as points,
lines and surfaces, determine the transformation $T$ by identifying features such as sets of image
points $\{x_A\}$ and $\{x_B\}$ that correspond to the same physical entity visible in both images, and
calculating $T$ for these features. When these algorithms are iterative, they iteratively determine
$T$, and then infer $\mathcal{T}$ from $T$ when the algorithm has converged.
Registration algorithms that are based on image intensity values work differently. They
iteratively determine the image transformation $\mathcal{T}$ that optimizes a voxel similarity measure.
Many of these voxel similarity measures involve analysing isointensity sets (or level sets)
within the images. For a single image $A$, an isointensity set with intensity value $a$ is the set of
voxels within a subdomain of $\Omega_A$, such that
$$\Omega_a = \{x_A \in \Omega_A \mid A(x_A) = a\}. \tag{2}$$
This equation, put into words, states ‘all locations $x_A$ in the field of view of image $A$ for which
the intensity value is $a$’.
Some algorithms do not work on isointensity sets corresponding to a single intensity value,
but on isointensity sets corresponding to small groups, or bins, of intensities. For example, a
12-bit image (4096 intensity values) may have its intensities grouped into 256 bins, each 4 bits
(16 intensity values) wide. We use $a$ to mean either
individual intensities or intensity bins, as appropriate.
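The isointensity set of equation (2) and the binning example above can be sketched in a few lines. This is a minimal illustration, assuming the image is held as an unsigned integer NumPy array; the function names and the random test image are hypothetical.

```python
import numpy as np

def isointensity_set(image, a):
    """Voxel indices x_A for which A(x_A) == a, as in equation (2)."""
    return np.argwhere(image == a)

def bin_intensities(image, image_bits=12, n_bins=256):
    """Group intensities into n_bins equal-width bins by dropping low-order bits.

    For a 12-bit image and 256 bins this discards the 4 least significant bits,
    so each bin spans 16 consecutive intensity values.
    """
    shift = image_bits - int(np.log2(n_bins))
    return image >> shift

# Hypothetical 12-bit test image with uniformly random intensities.
rng = np.random.default_rng(0)
image = rng.integers(0, 4096, size=(16, 16, 8), dtype=np.uint16)
binned = bin_intensities(image)

# After binning, the "isointensity set" for bin 10 collects intensities 160 to 175.
voxels_in_bin_10 = isointensity_set(binned, 10)
print(len(voxels_in_bin_10), "voxels fall in bin 10")
```

Binning trades intensity resolution for better-populated sets, which matters when the sets are used to estimate histograms from a limited number of voxels.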
It is important to remember that $\Omega_a$ is the isointensity set within all of image $A$ that is within
the domain $\Omega_A$. As has been stated above, for registration using voxel similarity measures,
we work within the overlap domain $\Omega^T_{A,B}$. The isointensity set within this overlap domain is,
of course, a function of $T$. To emphasize this $T$ dependence, we define the isointensity set of
image $A$ with value $a$, within $\Omega^T_{A,B}$ as
$$\Omega^T_a = \{x_A \in \Omega^T_{A,B} \mid A(x_A) = a\}. \tag{3}$$
Similarly, we can consider an isointensity set in image $B$. Image $B$, of course, is always
the image that we consider transformed, so the definition is slightly different from that for
image $A$. We consider the isointensity set to be the set of voxels in the space $\Omega_A$ that have
intensity $b$ in image $B^{\mathcal{T}}$
$$\Omega^T_b = \{x_A \in \Omega^T_{A,B} \mid B^{\mathcal{T}}(x_A) = b\}. \tag{4}$$
centroids (from the first-order moment) of $O_A$ and $O_B$, and then aligning the principal axes
of $O_A$ and $O_B$ (from the second-order moment). This approach is, however, unsatisfactory
for most medical image registration applications because the first- and second-order moments
are highly sensitive to change in the image field of view. In order for this method to work
accurately, the object used for the calculations must be entirely within $\Omega^T_{A,B}$, and it is frequently
difficult to delineate appropriate structures with this property from clinical images.
To emphasize the importance of this overlap domain in image registration, we introduce
notation to make this clear. $A|_{\Omega^T_{A,B}}$ and $B^{\mathcal{T}}|_{\Omega^T_{A,B}}$ are the portions of image $A$ and $B^{\mathcal{T}}$
respectively in the overlap domain. The use of the $A|_{\Omega^T_{A,B}}$ and $B^{\mathcal{T}}|_{\Omega^T_{A,B}}$ notation is equivalent
to $A(x_A)\ \forall\, x_A \in \Omega^T_{A,B}$ and $B^{\mathcal{T}}(x_A)\ \forall\, x_A \in \Omega^T_{A,B}$.
A further important property of the medical images with which we work is that they are discrete.
That is, they sample the object at a finite number of voxels in three dimensions or pixels in two
dimensions. In general, this sampling is different for images A and B; also, while the sampling
is commonly uniform in a given direction, it is anisotropic, that is it varies with direction in
the image. Discretization has important consequences for image registration, so it is useful to
build this concept into our notational framework.
We can define our domain $\Omega$ as
$$\Omega = \tilde{\Omega} \cap \Gamma_\varsigma$$
where $\tilde{\Omega}$ is a bounded continuous set, and could be called the volume or field of view of the
image, and $\Gamma$ is an infinite discrete grid. $\Gamma_\varsigma$ is our sampling grid, which is characterized by
the anisotropic sample spacing $\varsigma = (\varsigma^x, \varsigma^y, \varsigma^z)$. The sampling is normally different for the
images $A$ and $B$ being registered, and we denote this by introducing sampling grids $\Gamma_{\varsigma_A}$ and
$\Gamma_{\varsigma_B}$ for the domains $\Omega_A$ and $\Omega_B$.
For any given $T$, the intersection of the discrete domains $\Omega_A$ and $T(\Omega_B)$ is likely to be the
empty set, because no sample points will exactly overlap. In order, therefore, to compare the
images $A$ and $B$ for any estimate of $T$, it is necessary to interpolate between sample positions
and to take account of the difference in sample spacing $\varsigma_A$ and $\varsigma_B$. This introduces two
problems. Firstly, fast interpolation algorithms are imperfect, introducing blurring or ringing
into the image (discussed further in section 9). This changes the image histograms, and hence
alters the isointensity sets discussed above. Secondly, we must be careful when the image
$B$ being transformed has higher-resolution sampling than image $A$ (i.e. one or more of the
elements in $\varsigma_B$ is less than the corresponding element in $\varsigma_A$). In this case, we risk aliasing
when we resample $B$ from $\Omega_B$ to $\Omega^T_{A,B}$, so we should first blur $B$ with a filter of resolution $\varsigma_A$
or lower before resampling.
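The two points made above, interpolation between sample positions and blurring before resampling a finely sampled image onto a coarser grid, can be illustrated in one dimension. This is only a sketch under stated assumptions: the transformation is a hypothetical translation, interpolation is linear, and the Gaussian blur width is an illustrative choice rather than a value taken from the text.

```python
import numpy as np

def gaussian_blur_1d(signal, sigma_samples):
    """Blur with a normalized, truncated Gaussian kernel (sigma given in samples)."""
    radius = max(1, int(np.ceil(3 * sigma_samples)))
    k = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (k / sigma_samples) ** 2)
    kernel /= kernel.sum()
    padded = np.pad(signal, radius, mode="edge")   # replicate edge values
    return np.convolve(padded, kernel, mode="valid")

def resample_b_onto_a(b_values, spacing_b, n_a, spacing_a, shift):
    """Evaluate B at A's sample positions after a 1D translation (hypothetical mapping)."""
    b = np.asarray(b_values, dtype=float)
    if spacing_b < spacing_a:
        # B is sampled more finely than A: blur first to limit aliasing.
        # This sigma is an assumed, illustrative choice.
        sigma_mm = 0.5 * np.sqrt(spacing_a**2 - spacing_b**2)
        b = gaussian_blur_1d(b, sigma_mm / spacing_b)
    x_b = np.arange(b.size) * spacing_b               # sample positions of B (mm)
    x_a = np.arange(n_a) * spacing_a                  # sample positions of A (mm)
    query = x_a + shift                               # A's samples in B's coordinates
    inside = (query >= x_b[0]) & (query <= x_b[-1])   # 1D overlap domain
    values = np.interp(query, x_b, b)                 # linear interpolation
    return values, inside

# B: a linear ramp sampled every 0.5 mm; A: six samples every 2 mm, offset by 0.25 mm.
b_ramp = np.arange(21) * 0.5
values, inside = resample_b_onto_a(b_ramp, 0.5, 6, 2.0, 0.25)
print(values[inside])
```

Note that none of the query positions coincides with a sample of B, so every value is interpolated, and that the last sample of A falls outside the overlap and should be excluded from any similarity measure.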
As stated earlier, the transformation $\mathcal{T}$ maps both positions and intensities at these
positions. $\mathcal{T}$, therefore, has to take account of the discrete sampling. The spatial mapping $T$,
in contrast, does not take account of this effect. When we use $\mathcal{T}$ as a superscript of a symbol,
as in $B^{\mathcal{T}}$, we are making it clear that the quantity represented by this symbol is dependent on
both the spatial mapping $T$ and the interpolation or blurring used during resampling. Figure 1
illustrates the relationship between field of view, domain of the image and the registration
transformation.
Figure 1. Two images $A$ and $B$ illustrated in the top panel of this figure have different fields of
view $\tilde{\Omega}_A$ and $\tilde{\Omega}_B$. For a registration transformation $T$ there will be a region of overlap between the
images as illustrated in the third panel. It is important to remember that the images are discrete.
The discretization is determined by the sampling grids $\Gamma_{\varsigma_A}$ and $\Gamma_{\varsigma_B}$ shown in the fourth panel.
Even if images $A$ and $B$ have exactly the same sampling grid, the grid points will not normally
coincide in the volume of overlap as shown in the fifth panel. Interpolation is therefore necessary.
For iterative registration algorithms, this interpolation is necessary at each iteration. The image
transformation $\mathcal{T}$ described in the text maps both position and the associated intensity in the images
within the region of overlap, incorporating the concepts of field of view and discrete sampling.
3. Types of transformation
We have not, so far, discussed the nature of the transformation T . For most current applications
of medical image registration, both A and B are three dimensional, so T transforms from 3D
space to 3D space. In some circumstances, such as the registration of a radiograph to a CT scan,
it is useful to consider transformations from 3D space to 2D space (or vice versa). Where both
images being registered are two dimensional (e.g. two radiographs before and after contrast is
injected), the appropriate transformation might be from 2D space to 2D space². Most medical
image registration algorithms additionally assume that the transformation is ‘rigid body’, i.e.
there are six degrees of freedom (or unknowns) in the transformation: three translations and
three rotations. The key characteristic of a rigid body transformation is that all distances are
preserved.
Some registration algorithms increase the number of degrees of freedom by allowing for
anisotropic scaling (giving nine degrees of freedom) and skews (giving 12 degrees of freedom).
A transformation that includes scaling and skews as well as the rigid body parameters is referred
to as affine, and has the important characteristics that it can be described in matrix form and
that all parallel lines are preserved. A rigid body transformation can usefully be considered as
a special case of affine, in which the scaling values are all unity and the skews all zero.
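The matrix form mentioned above can be made concrete with 4-by-4 homogeneous matrices. The sketch below builds a rigid body matrix from six parameters and an affine matrix from twelve, then checks the properties stated in the text: rigid transformations preserve distances, affine transformations preserve parallelism, and the rigid case is recovered when scalings are unity and skews are zero. The rotation order, the skew parametrization and the order of composition are assumptions of this example; other conventions are equally valid.

```python
import numpy as np

def rigid_matrix(rx, ry, rz, tx, ty, tz):
    """4x4 rigid body matrix: rotations (radians) about x, then y, then z, then translation."""
    cx, sx = np.cos(rx), np.sin(rx)
    cy, sy = np.cos(ry), np.sin(ry)
    cz, sz = np.cos(rz), np.sin(rz)
    Rx = np.array([[1, 0, 0], [0, cx, -sx], [0, sx, cx]])
    Ry = np.array([[cy, 0, sy], [0, 1, 0], [-sy, 0, cy]])
    Rz = np.array([[cz, -sz, 0], [sz, cz, 0], [0, 0, 1]])
    M = np.eye(4)
    M[:3, :3] = Rz @ Ry @ Rx
    M[:3, 3] = [tx, ty, tz]
    return M

def affine_matrix(rigid, scales=(1.0, 1.0, 1.0), skews=(0.0, 0.0, 0.0)):
    """Rigid part composed with anisotropic scaling and skews (12 degrees of freedom)."""
    S = np.diag([*scales, 1.0])
    K = np.eye(4)
    K[0, 1], K[0, 2], K[1, 2] = skews      # upper-triangular skew terms
    return rigid @ K @ S

def transform_points(M, points):
    """Apply a 4x4 homogeneous matrix to an (N, 3) array of points."""
    homogeneous = np.c_[points, np.ones(len(points))]
    return (homogeneous @ M.T)[:, :3]

pts = np.array([[0.0, 0.0, 0.0], [10.0, 0.0, 0.0], [0.0, 5.0, 0.0], [3.0, 4.0, 12.0]])
M_rigid = rigid_matrix(0.1, -0.2, 0.3, 5.0, -3.0, 2.0)
moved = transform_points(M_rigid, pts)

# Distances between points are unchanged by the rigid transformation.
print(np.linalg.norm(pts[1] - pts[3]), np.linalg.norm(moved[1] - moved[3]))
```

The determinant of the upper-left 3-by-3 block is +1 for a rigid body matrix built this way, which gives a quick check that no reflection or scaling has crept in.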
Individual bones are rigid at the resolution of radiological imaging modalities, so rigid
body registration is widely used in medical applications where the structures of interest are
either bone or are enclosed in bone. By far the most important part of the body registered
in this way is the head, and in particular the brain. Rigid body registration is used for other
regions of the body in the vicinity of bone (e.g. the neck, pelvis, leg or spine) but the errors
are likely to be larger.
The use of an affine transformation rather than a rigid body transformation does not greatly
increase the applicability of image registration, as there are not many organs that only stretch
or shear. Tissues usually deform in more complicated ways. There are, however, several
scanner-induced errors that can result in scaling or skew terms, and affine transformations
are sometimes used to overcome these problems (see section 4).
For most organs in the body, many more degrees of freedom are necessary to describe
the tissue deformation with adequate accuracy. Even in the brain, development of children,
lesion growth or resection can make an affine transformation inadequate. Also, it is common
in neuroscience research to align images from different subjects. This is called intersubject
registration and is discussed in section 10.4.3. While an affine transformation is widely used
to provide an approximate alignment of different subjects, additional degrees of freedom are
needed for more accurate registration. These non-affine registration transformations are not
the main focus of this article, but some approaches for both intrasubject and intersubject
registration are described in section 10.4.
a negative value of the determinant of this matrix. When we refer to affine transformations
elsewhere in this article we exclude reflections as they are undesirable in this application.
4. Image distortion
planar imaging, the bandwidth per pixel is lowest in the phase encode (or blip) direction, so the
distortion is highest in that direction (Jezzard and Balaban 1995). For two-dimensional images,
the field inhomogeneity results in the excitation of slices that are curved rather than planar. The
magnet-dependent inhomogeneity can be measured with a phantom experiment, but to correct
for the object-dependent field inhomogeneity it is necessary to make additional measurements
during imaging. For spin-echo images, this can be done by making two measurements with
different readout gradient strengths (or opposite gradient directions) (Chang and Fitzpatrick
1992). For gradient echo images (Sumanaweera et al 1994) and echo planar images (Jezzard
and Balaban 1995), distortion can be inferred from a map of field inhomogeneity.
Patient motion during either CT or multislice MR imaging can result in a variation in slice
orientation relative to the patient across the scan. This is currently an unsolved problem, and
makes registration of such datasets very difficult. In the rigid body case, there is a different
transformation needed for each slice (or in some MR images, groups of slice interleaves), and
this transform is hard to find.
Motion during MR imaging within the acquisition of a single slice or an entire 3D volume
results in a different problem, namely ghost artefacts. This can result in one or more ‘ghosts’
of the object appearing in the image along with the main image of the object. These ghosts
normally have higher spatial frequency content than the main image, but there is a different
registration transformation needed for each ghost. Just as the geometric distortion described
above can be corrected, these ghost artefacts can be removed (e.g. Atkinson et al 1999, McGee
et al 2000) but this is not routinely performed.
With emission tomography (PET and SPECT), an important cause of distortion errors can
be poor alignment of the detector heads in multidetector systems, or uncertainty in the centre of
rotation. These lead to a halo artefact that distorts the true distribution. Also, in PET imaging
it is important to calibrate the voxel dimensions, as some reconstruction algorithms assume
that all photons are detected at the face of the scintillator crystals, but the mean free path of
photons in these materials can be 1 cm or so, giving scaling errors in the data.
In this section we describe registration algorithms that determine T using image features that
have been extracted from the images either interactively or automatically. The most widely
used of these features in this application are points and surfaces, though crest lines and extremal
points identified using differential geometric operators are also used. An alternative approach
to registration using geometrical features is to generate derived images from A and B that
represent the strength of a feature (such as a ridge) at each voxel and then to register these
derived images using a voxel similarity measure. Such approaches are described in section 7.1.
5.1.1. The orthogonal Procrustes problem. The orthogonal Procrustes problem draws its
name from the Procrustes area of statistics. Procrustes was a robber in Greek mythology. He
would offer travellers hospitality in his road-side house, and the opportunity to stay the night
in his bed that would perfectly fit each visitor. As the visitors discovered to their cost, however,
it was the guest who was altered to fit the bed, rather than the bed that was altered to fit the
guest. Short visitors were stretched to fit, and tall visitors had suitable parts of their body cut
off so they would fit. The result, it seems, was invariably fatal. The hero Theseus put a stop to
this unpleasant practice by subjecting Procrustes to his own method.
The term ‘Procrustes’ became a criticism for the practice of unjustifiably forcing data
to look like they fit another set. More recently, Procrustes statistics has lost its negative
associations and is used in shape analysis. The mathematical problem has relevance in many
domains including statistics (Green 1952, Hurley and Cattell 1962, Koschat and Swayne 1991,
Schönemann 1966, Rao 1980), gene recognition (Gelfand et al 1996), satellite positioning and
robotics (Kanatani 1994, Umeyama 1991) in addition to its interest in numerical mathematics
as a least square problem (Golub and van Loan 1996, Edelman et al 1998, Söderkvist 1993,
Stewart 1993). The Procrustes problem is an optimal fitting problem, of least squares type:
given two configurations of $N$ non-coplanar points $P = \{p_i\}$ and $Q = \{q_i\}$, one seeks the
transformation $T$ which minimizes $G(T) = \|T(P) - Q\|^2$. The notation is: $P$, $Q$ are the $N$-by-$D$
matrices whose rows are the coordinates of the points $p_i$, $q_i$, $T(P)$ is the corresponding
matrix of transformed points, and $\|\cdot\|$ is a matrix norm, the simplest being the Frobenius norm
$$\|T(P) - Q\| = \Big(\sum_i \|T(p_i) - q_i\|^2\Big)^{1/2}.$$
The standard case is when $T$ is a rigid body transformation
(Dryden and Mardia 1998, Fitzpatrick et al 1998b). One can additionally consider scaling
(Dryden and Mardia 1998). If $T$ is affine, we are faced with a standard least squares problem (Golub
and van Loan 1996).
5.1.2. Solutions. The classical Procrustes problem, i.e. T ∈ {rigid body transformations}
has known solutions. A matrix representation of the rotational part can be computed using
singular-value decomposition (SVD) (Dryden and Mardia 1998, Golub and van Loan 1996,
Kanatani 1994, Schönemann 1966, Umeyama 1991).
First replace $P$ and $Q$ by their demeaned versions, as the optimal transformation is from
centroid to centroid
$$p_i \rightarrow p_i - \bar{p}$$
$$q_i \rightarrow q_i - \bar{q}.$$
This reduces the problem to the orthogonal Procrustes problem in which we wish to
determine the orthogonal rotation $R$. Central to the problem is the $D$-by-$D$ correlation matrix
$K := P^t Q$, as this matrix quantifies how much the points in $Q$ are ‘predicted’ by points in
$P$. If $P = [p_1^t, \ldots, p_N^t]^t$ is a matrix of row vectors (and the same for $Q$), $K = \sum_i K_i$ where
$K_i := p_i q_i^t$, then
$$K = U D V^t \Rightarrow R = V \Delta U^t, \qquad \Delta := \operatorname{diag}(1, 1, \det(V U^t))$$
where $K = U D V^t$ is the SVD of $K$.
It is essential for most medical registration applications that $R$ does not include any
reflections. This can be detected from the determinant of $V U^t$, which should be $+1$ for a
rotation with no reflection, and will be $-1$ if there is a reflection. In the above equation,
$\Delta$ takes this into account.
Finally the translation t is given by t = q̄ − Rp̄.
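The steps above (demean, form the correlation matrix, take its SVD, correct for reflections, recover the translation) translate almost line for line into NumPy. The following is a minimal sketch for 3D points, assuming exact one-to-one correspondence between the rows of P and Q; the function name `rigid_procrustes` and the synthetic test data are illustrative only.

```python
import numpy as np

def rigid_procrustes(P, Q):
    """Least squares rigid fit: find R, t minimizing sum_i ||R p_i + t - q_i||^2.

    P, Q : (N, 3) arrays whose rows are corresponding points p_i and q_i.
    """
    p_bar, q_bar = P.mean(axis=0), Q.mean(axis=0)
    Pc, Qc = P - p_bar, Q - q_bar              # demeaned configurations
    K = Pc.T @ Qc                              # 3-by-3 correlation matrix
    U, _, Vt = np.linalg.svd(K)                # K = U D V^t (NumPy returns V^t)
    V = Vt.T
    # diag(1, 1, det(V U^t)) flips the last axis if the fit would be a reflection.
    delta = np.diag([1.0, 1.0, np.linalg.det(V @ U.T)])
    R = V @ delta @ U.T
    t = q_bar - R @ p_bar                      # translation from centroid to centroid
    return R, t

# Synthetic check: rotate and translate random points, then recover the motion.
rng = np.random.default_rng(1)
P = rng.normal(size=(10, 3))
a, b = 0.3, 0.5
Rz = np.array([[np.cos(a), -np.sin(a), 0.0], [np.sin(a), np.cos(a), 0.0], [0.0, 0.0, 1.0]])
Rx = np.array([[1.0, 0.0, 0.0], [0.0, np.cos(b), -np.sin(b)], [0.0, np.sin(b), np.cos(b)]])
R_true, t_true = Rz @ Rx, np.array([2.0, -1.0, 0.5])
Q = P @ R_true.T + t_true                      # rows are q_i = R_true p_i + t_true

R, t = rigid_procrustes(P, Q)
print(np.allclose(R, R_true), np.allclose(t, t_true))
```

With noisy landmarks the recovered transformation is the least squares estimate rather than an exact match, and the residual distances between the transformed and target points give a simple measure of fit.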
This approach has been widely used in medical image registration, first for multimodality
registration (e.g. Evans et al 1988, Hill et al 1991) and more recently in image-guided surgery
(e.g. Maurer et al 1997). The points can either be anatomical features that can be identified in
3D, or markers attached to the patient. The theory of errors has been advanced in the medical
application domain through the work of Fitzpatrick et al (1998b).
5.2.1. The head-and-hat algorithm. Pelizzari and colleagues (Pelizzari et al 1989, Levin et al
1988) proposed a surface fitting technique for intermodality registration of images of the head
that became known as the ‘head-and-hat’ algorithm. Two equivalent surfaces are identified in
the images. The first, from the higher-resolution modality, is represented as a stack of discs,
and is referred to as the head. The second surface is represented as a list of unconnected 3D
points, and is referred to as the hat. The registration transformation is determined by iteratively transforming the (rigid)
hat surface with respect to the head surface, until the closest fit of the hat onto the head is
found. The measure of closeness of fit used is the square of the distance between a point on the
hat and the nearest point on the head, in the direction of the centroid of the head. The iterative
optimization technique used is the Powell method (Press et al 1992). The Powell optimization
algorithm performs a succession of one-dimensional optimizations, finding in turn the best
solution along each of the six degrees of freedom, and then returning to the first degree of
freedom. The algorithm stops when it is unable to find a new solution with a significantly
lower cost (as defined by a tolerance factor) than the current best solution. This algorithm has
been used with considerable success for registering images of the head (Levin et al 1988), and
has also been applied to the heart (Faber et al 1991). The surfaces most commonly used are
the skin surface delineated from both MR images and PET transmission images, or the brain
surface delineated from both MR images and PET emission images. The measure of goodness
of fit can be prone to error for convoluted surfaces: the distance metric used by the head-and-
hat algorithm does not always measure the distance between a hat point and the closest point
on the head surface, because the nearest head point will not always lie in the direction of the
head centroid, especially if the surface is convoluted.
5.2.3. Iterative closest point. The iterative closest point algorithm (ICP) was proposed by Besl
and McKay (1992) for the registration of 3D shapes. It was not designed with medical images
in mind, but has subsequently been applied to medical images with considerable success,
and is now probably the most widely used surface matching algorithm in medical imaging
applications (e.g. Cuchet et al 1995, Declerck et al 1997, Maurer et al 1998). The original
article is written in terms of registration of collected data to a model. The collected data, P ,
could come from any sensor that provides three-dimensional surface information including
laser scanners, stereo video and so forth. The model data, X , could come from a computer-
aided design model. In medical imaging applications, both sets of surface data might be
delineated from radiological images, or the model might be derived from a radiological image
and the data from stereo video acquired during an operation. The algorithm is designed to work
with seven different representations of surface data: point sets, line segment sets (polylines),
implicit curves, parametric curves, triangle sets, implicit surfaces and parametric surfaces. For
medical image registration the most relevant representations are likely to be point sets and
triangle sets, as algorithms for delineating these from medical images are widely available.
The algorithm has two stages and iterates. The first stage involves identifying the closest
model point for each data point, and the second stage involves finding the least squares rigid
body transformation relating these point sets. The algorithm then redetermines the closest
point set and continues until it reaches a local minimum of the mismatch between the two
surfaces, as determined by some tolerance threshold.
Whatever the original representation of the data surface P , it is first converted to a set of
points {pi }. The model data remain in their original representation. The first stage involves
identifying, for each point pi on the data surface P , the closest point on the model surface X .
This is the point x in X for which the distance d between pi and x is minimum
\[ d(p_i, X) = \min_{x \in X} \| x - p_i \|. \]
The resulting set of closest points (one for each pi ) is {qi }. For a triangulated surface, which
is the most likely model representation from medical image data, the model X comprises a set
of triangles {ti }. The closest model point to each data point is found by linearly interpolating
across the facets. If triangle ti has vertices r1 , r2 and r3 , then the distance between the point
pi and the triangle ti is
\[ d(p_i, t_i) = \min_{u+v+w=1} \| u r_1 + v r_2 + w r_3 - p_i \| \]
where u ∈ [0, 1], v ∈ [0, 1] and w ∈ [0, 1]. The closest model point to the data point pi is,
therefore, $q_i = u r_1 + v r_2 + w r_3$ for the minimizing values of u, v and w.
A least squares registration between the points {pi } and {qi } can then be carried out using
the method described in section 5.1. The set of data points $\{p_i\}$ is then transformed to $\{p_i'\}$
using the calculated rigid body transformation, and the closest points are once again identified.
The algorithm terminates when the change in mean square error between iterations falls below
a threshold.
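The two-stage loop just described can be sketched as follows. This is an illustrative sketch only, not Besl and McKay's implementation: it assumes both surfaces are represented as point sets (the triangle interpolation above is not implemented), it uses SciPy's `cKDTree` for the closest-point search, and the names `fit_rigid`, `icp`, `max_iter` and `tol` are hypothetical.

```python
import numpy as np
from scipy.spatial import cKDTree


def fit_rigid(p, q):
    """Least-squares rotation r and translation t with q ~ p @ r.T + t."""
    p_bar, q_bar = p.mean(axis=0), q.mean(axis=0)
    u, _, vt = np.linalg.svd((p - p_bar).T @ (q - q_bar))
    v = vt.T
    delta = np.diag([1.0, 1.0, np.sign(np.linalg.det(v @ u.T))])
    r = v @ delta @ u.T
    return r, q_bar - r @ p_bar


def icp(data, model, max_iter=50, tol=1e-8):
    """Iterate closest-point matching and rigid fitting on two point sets."""
    tree = cKDTree(model)
    r_total, t_total = np.eye(3), np.zeros(3)
    moved = np.array(data, dtype=float)
    prev_mse = np.inf
    for _ in range(max_iter):
        # Stage 1: closest model point for every (transformed) data point.
        dist, idx = tree.query(moved)
        mse = float(np.mean(dist ** 2))
        # Stop when the mean square error no longer changes appreciably.
        if abs(prev_mse - mse) < tol:
            break
        prev_mse = mse
        # Stage 2: least squares rigid fit to the current closest points.
        r, t = fit_rigid(moved, model[idx])
        moved = moved @ r.T + t
        r_total, t_total = r @ r_total, r @ t_total + t
    return r_total, t_total, mse


# Usage: a 5 x 5 x 5 grid of model points and a slightly misaligned copy.
model = np.mgrid[-2:3, -2:3, -2:3].reshape(3, -1).T.astype(float)
c, s = np.cos(0.05), np.sin(0.05)
r_off = np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])
data = model @ r_off.T + np.array([0.1, -0.05, 0.08])
r_est, t_est, final_mse = icp(data, model)
```

The starting misalignment in the usage example is deliberately small compared with the point spacing, so the first set of closest points is already correct. Larger misalignments can converge to a wrong local minimum.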
The optimization can be accelerated by keeping track of the solutions at each iteration.
If there is good alignment between the solutions (to within some tolerance), then both a parabola
and a straight line are fitted through the solutions, and the registration estimate is updated using
one or the other of these estimates based on a slightly ad hoc method to ‘be on the safe side’.
As the algorithm iterates to the local minimum closest to the starting position, it may not
find the correct match. The solution proposed by Besl and McKay is to start the algorithm
multiple times, each with a different estimate of the rotation alignment, and choose the
minimum of the minima obtained.
In sections 5.1 and 5.2 above we did not distinguish between registration where images
A and B are of the same modality and registration of A and B when they are of different
modalities. For registration using voxel similarity measures this is an important distinction, as
will be seen from the following example. A common reason for carrying out same modality, or
intramodality, registration is to compare images from a subject taken at slightly different times
in order to ascertain whether there have been any subtle changes in anatomy or pathology.
If there has been no change in the subject, we might expect that, after registration and
subtraction, there will be no structure in the difference image, just noise. Where there is a small
amount of change in the structure, we would expect to see noise in most places in the images,
with a few regions visible in which there has been some change. If there were a registration
error, we would expect to see artefactual structure in the difference image resulting from the
poor alignment. In this application, various voxel similarity measures suggest themselves. We
could, for example, iteratively calculate T while minimizing the structure in the difference
image on the grounds that at correct registration there will be either no structure or a very
small amount of structure in the difference image, whereas with increasing misregistration,
the amount of structure would increase. The structure could be quantified, for example, by the
sum of squares of difference values, or the sum of absolute difference values, or the entropy of
the difference image. An alternative intuitive approach (at least for those familiar with signal
processing techniques) would be to find T by cross correlation of images A and B. In this
section, we describe these intramodality techniques in more detail.
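As a rough illustration of the measures just listed, the sketch below evaluates the sum of squared differences, the sum of absolute differences and a correlation coefficient over a range of integer translations. It is a toy example under stated assumptions: the images are synthetic 2D arrays, translation is applied with a circular shift (`np.roll`) so that no interpolation is needed, the correlation is taken in its normalized form, and all function names are hypothetical.

```python
import numpy as np


def ssd(a, b):
    """Sum of squares of the difference image."""
    return float(np.sum((a - b) ** 2))


def sad(a, b):
    """Sum of absolute values of the difference image."""
    return float(np.sum(np.abs(a - b)))


def correlation_coefficient(a, b):
    """Normalized cross correlation of two equally sized images."""
    a0, b0 = a - a.mean(), b - b.mean()
    return float(np.sum(a0 * b0) / np.sqrt(np.sum(a0 ** 2) * np.sum(b0 ** 2)))


# Synthetic image: a bright square on a noisy background.
rng = np.random.default_rng(0)
image_a = np.zeros((64, 64))
image_a[20:40, 24:44] = 1.0
image_a += 0.05 * rng.normal(size=image_a.shape)
# Image B is image A translated by 3 voxels along the second axis.
image_b = np.roll(image_a, 3, axis=1)

# Scan candidate translations of B and score each against A.
shifts = list(range(-5, 6))
ssd_scores = [ssd(image_a, np.roll(image_b, s, axis=1)) for s in shifts]
sad_scores = [sad(image_a, np.roll(image_b, s, axis=1)) for s in shifts]
cc_scores = [correlation_coefficient(image_a, np.roll(image_b, s, axis=1))
             for s in shifts]
best_ssd = shifts[int(np.argmin(ssd_scores))]
best_sad = shifts[int(np.argmin(sad_scores))]
best_cc = shifts[int(np.argmax(cc_scores))]
```

The difference measures are minimized, and the correlation maximized, at the translation that undoes the applied shift.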
In section 6 we described algorithms that register images of the same modality by optimizing
a voxel similarity measure. Because of the similarity of the intensities in the images being
registered, the subtraction, correlation and ratio techniques described have an intuitive basis.
With intermodality registration, the situation is quite different. There is, in general, no simple
relationship between the intensities in the images A and B. Using the notation of section 2,
the intensity mapping function F is a complicated function and no simple arithmetic operation
on the voxel values is, therefore, going to produce a single derived image from which we can
quantify misregistration.
In section 7.1 we describe some interesting approaches to applying the intramodality
measures of subtraction and correlation to images of different modalities. In the remainder of
this section we then describe similarity measures designed to work directly on intermodality
images.
7.1.1. Intensity re-mapping for MR–CT registration. One approach, which works well in
MR–CT registration, is to transform the CT image intensities, such that high intensities are
re-mapped to low intensities. This creates a virtual image from the CT images that has an
intensity distribution more like an MR image (in which bone is dark) (van den Elsen et al
1994). The MR image and virtual MR image created from CT are then registered by cross
correlation.
7.1.3. The registration algorithm used by the statistical parametric mapping (SPM)
software. Registration involves finding the optimal transformation T. In order to run
optimization algorithms, these transformations are assumed to be parametrized; for example,
rotations are typically parametrized by Euler angles. We call the collection of parameters
$\theta = (\theta_0, \ldots, \theta_{K-1})$, and we indicate the parametrization using the notation $T = T_\theta$. For
non-rigid transformations, the number of parameters K can be very large. In this sense, a general
similarity measure $S_{A,B}(T)$ becomes a function of these parameters θ: $S_{A,B}(\theta)$.
Friston et al (1995) proposed an alternative to running some optimization algorithm on
a similarity measure. In the section on similarity measures, it was mentioned that different
relationships between the image intensities are possible. At registration, the intensity mapping
could be the identity, i.e. $B(T(x_A)) = A(x_A) + \varepsilon(x_A)$. Alternatively the mapping could be
linear, in which case $B^{\mathcal{T}}(x_A) = \alpha A(x_A) + \beta + \varepsilon(x_A)$. More generally, we could have some
global functional relation $B^{\mathcal{T}}(x_A) = F(A(x_A)) + \varepsilon(x_A)$ or a local functional (non-stationary)
relationship $B^{\mathcal{T}}(x_A) = F(A(x_A), x_A) + \varepsilon(x_A)$. In all these equations, $\varepsilon(x_A)$ is some error
term, which is discarded. Friston et al note that this provides one equation for every $x_A$. Just as
the transformation can be parametrized, they assume that the unknown F can be parametrized
too, say by $u = (u_0, \ldots, u_{L-1})$, with L parameters. If we rewrite the N equations in the form
\[ \Xi_{x_A}(\theta, u) = B^{\mathcal{T}_\theta}(x_A) - F_u(A(x_A), x_A) = 0 \tag{11} \]
we get N implicit equations in K + L unknowns. If $N \geq K + L$ such a system can be formally
solved. This is a very general approach, and Friston et al propose various specific versions for
different applications. The general approach is analogous to the sum of squares of difference
(SSD) algorithm described in section 6.1, in which registration is accomplished by minimizing
the sum of squares of differences in voxel intensities between a virtual image derived from
image A and the corresponding locations in image B
\[ \mathrm{SSD} = \| F(A(x_A)) - B^{\mathcal{T}}(x_A) \|^2 . \tag{12} \]
The assumptions in Friston et al’s algorithm, however, mean that their solution is likely to be
different from the solution obtained by iteratively minimizing SSD.
In order to be able to solve equation (11) explicitly, Friston et al make a series of
assumptions, which actually have the effect of ensuring that K and L are not too big. If $\Xi$ is
smooth, and we know that the solution is close to our starting estimate $\theta_0, u_0$, equation (11) can
be linearized by taking the first two terms of the Taylor expansion of $\Xi$, in order to compute
an explicit equation (with θ and u measured relative to the starting estimate):
\[ \Xi_{x_A}(\theta, u) = \Xi_{x_A}(\theta_0, u_0) + [\partial_\theta \Xi_{x_A}(\theta_0, u_0), \partial_u \Xi_{x_A}(\theta_0, u_0)] [\theta, u]^t . \]
If we call $\mathbf{A}$ the matrix of partial derivatives, then equation (11) takes the simple form
$\mathbf{A} [\theta, u]^t = -\Xi_{x_A}(\theta_0, u_0)$, which can be solved by standard least squares techniques. This is not
iterative, but obviously, due to the assumptions which have been made, iterative improvement
might be required. Good starting estimates are essential for this technique.
to vary, so $v_g$ is replaced by $v_g + v(x_{MR})$. The parameters of (we don’t write the argument for
clarity) $F(\mathrm{MR}(\cdot), \cdot) = u_0(\cdot)\, e^{-(\mathrm{MR}(\cdot) - (v_g + v(\cdot)))^2 / 2\sigma^2}$ are then $u_0$ and $v$. Friston et al make the
parameter transformation $u_1 = u_0 v$ and assume smooth variation over the image to ensure
the system is overdetermined.
of the PET voxel values within each bin. Once again, uniformity within each bin is maximized
by minimizing the normalized standard deviation.
It is possible to consider this algorithm also using the concept of ratio images, as in the
intramodality RIU algorithm. In the intermodality case, however, instead of generating a single
ratio image, one ‘ratio image’ is generated for each of the MR intensity bins. This produces
256 sparse images, one for each isointensity set a . No explicit division is needed to generate
these ‘ratio images’, however, because the denominator image corresponds to a single MR bin.
The normalized standard deviation of each of these sparse ‘ratio images’ is then calculated,
and the overall similarity measure calculated from a weighted sum of the normalized standard
deviations.
In the discussion above, we have described the algorithm in terms of MR and PET
registration only. We can now formulate the algorithm more generally in terms of images
A and B. It is important to note that the two images are treated differently, so there are two
different versions of the algorithm, depending on whether image A or image B is partitioned.
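For registration of the images A and B, the partitioned image uniformity measure (PIU)
can be calculated in two ways: either as the weighted sum of the normalized standard deviation of
intensities in B for each intensity a in A ($\mathrm{PIU}_B$), or as the weighted sum of the normalized standard
deviation of intensities in A for each intensity b in B ($\mathrm{PIU}_A$)
\[ \mathrm{PIU}_B = \sum_a \frac{n_a}{N} \frac{\sigma_B(a)}{\mu_B(a)} \quad \text{and} \quad \mathrm{PIU}_A = \sum_b \frac{n_b}{N} \frac{\sigma_A(b)}{\mu_A(b)} \tag{13} \]
where
\[ n_a = \sum_{\Omega^T_a} 1 \qquad n_b = \sum_{\Omega^T_b} 1 \]
\[ \mu_B(a) = \frac{1}{n_a} \sum_{\Omega^T_a} B^{\mathcal{T}}(x_A) \qquad \mu_A(b) = \frac{1}{n_b} \sum_{\Omega^T_b} A(x_A) \]
\[ \sigma_B^2(a) = \frac{1}{n_a} \sum_{\Omega^T_a} \left( B^{\mathcal{T}}(x_A) - \mu_B(a) \right)^2 \qquad \sigma_A^2(b) = \frac{1}{n_b} \sum_{\Omega^T_b} \left( A(x_A) - \mu_A(b) \right)^2 . \]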
In words, we can say that $n_a$ is the number of voxels of the isointensity set a in $A|_{\Omega^T_{A,B}}$, and
$\mu_B(a)$ and $\sigma_B(a)$ are the mean and standard deviation values of the voxels in $B^{\mathcal{T}}|_{\Omega^T_{A,B}}$ that
co-occur with this set. The PIU algorithm is widely used for MR–PET registration, requiring that
the scalp is first removed from the MR image to avoid a breakdown of the idealized assumption
described above. The technique has never been widely used for registration of other modalities,
but its success has inspired considerable research activity aimed at identifying alternative voxel
similarity measures for intermodality registration.
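The measure in equation (13) can be sketched as below. This is an illustrative sketch, not the implementation of Woods and colleagues: it assumes both images have already been resampled onto the overlap domain as equally sized arrays, it partitions image A into equal-width intensity bins, and it skips bins whose mean intensity in B is zero. The function name `piu_b` and the toy tissue classes are hypothetical.

```python
import numpy as np


def piu_b(a, b, bins=256):
    """Sum over intensity bins of A of (n_a / N) * sigma_B(a) / mu_B(a)."""
    a = np.asarray(a, dtype=float).ravel()
    b = np.asarray(b, dtype=float).ravel()
    # Partition image A into equal-width intensity bins.
    edges = np.linspace(a.min(), a.max(), bins + 1)
    labels = np.clip(np.digitize(a, edges) - 1, 0, bins - 1)
    total = 0.0
    for label in np.unique(labels):
        values = b[labels == label]  # voxels of B co-occurring with this bin
        mu = values.mean()
        if mu != 0:  # assumption: bins with zero mean are ignored
            total += (values.size / b.size) * values.std() / mu
    return total


# Toy example: three "tissue classes" in A, each mapped to one value in B.
rng = np.random.default_rng(0)
image_a = rng.integers(0, 3, size=(32, 32))
image_b = np.array([10.0, 50.0, 20.0])[image_a]
aligned_score = piu_b(image_a, image_b, bins=3)
misaligned_score = piu_b(image_a, np.roll(image_b, 5, axis=0), bins=3)
```

In this toy case each bin of A corresponds to a single value in B when the images are aligned, so the measure is zero, and it increases once B is shifted.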
H is the average information supplied by a set of n symbols whose probabilities are given by
p1 , p2 , p3 , . . . pn .
This formula, save for a multiplicative constant, is derived from three conditions that a
measure of choice or uncertainty in a communication channel should satisfy. These are:
(a) The functional should be continuous in $p_i$.
(b) If all $p_i$ equal $1/n$, where n is the number of symbols, then H should be monotonically
increasing in n.
(c) If a choice is broken down into a sequence of choices then the original value of H should
be the weighted sum of the constituent H. That is,
\[ H(p_1, p_2, p_3) = H(p_1, p_2 + p_3) + (p_2 + p_3)\, H\!\left( \frac{p_2}{p_2 + p_3}, \frac{p_3}{p_2 + p_3} \right). \]
Shannon proved that the $-\sum_i p_i \log p_i$ form was the only functional form satisfying all three
conditions.
Entropy will have a maximum value if all symbols have equal probability of occurring
(i.e. $p_i = 1/n \;\forall i$), and have a minimum value of zero if the probability of one symbol occurring
is 1, and the probability of all the others occurring is zero.
An important observation made by Shannon is that any change in the data that tends to
equalize the probabilities of the symbols {p1 , p2 , p3 , . . . pn } increases the entropy. Blurring
the symbols is one such operation. For a single image, the entropy is normally calculated from
the image intensity histogram in which the probabilities p1 . . . pn are the histogram entries4 .
If all voxels in an image have the same intensity a, the histogram contains a single non-zero
element with probability of 1, indicating that A(xA ) = a for all xA . The entropy of this
image is −1 log 1 = 0. If this uniform image were to include some noise, then the histogram
will contain a cluster of non-zero entries around a peak at the average (mode) intensity value,
which will be approximately a. The addition of noise to the image, therefore, tends to equalize
the probabilities by ‘blurring’ the histogram which increases the entropy. The dependence
of entropy on noise is important. One consequence is that interpolation of an image may
smooth the image (see section 9 for more detail) which can reduce the noise, and consequently
‘sharpen’ the histogram. This sharpening of the histograms reduces entropy.
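The behaviour described in the last three paragraphs can be checked numerically with a small sketch. The assumptions here are that the histogram uses equal-width bins over a fixed intensity range and that logarithms are taken to base 2 (entropy in bits); the function name `image_entropy` and the test images are hypothetical.

```python
import numpy as np


def image_entropy(image, bins=64, value_range=(0.0, 64.0)):
    """Shannon entropy (in bits) of the normalized intensity histogram."""
    hist, _ = np.histogram(image, bins=bins, range=value_range)
    p = hist[hist > 0] / hist.sum()  # zero-probability bins contribute nothing
    return float(-np.sum(p * np.log2(p)))


rng = np.random.default_rng(0)
# All voxels share one intensity: a single non-zero histogram entry.
uniform = np.full((32, 32), 30.0)
# The same image with added noise: the histogram is 'blurred'.
noisy = uniform + rng.normal(scale=2.0, size=uniform.shape)
# 64 intensity values that all occur equally often.
equiprobable = np.repeat(np.arange(64, dtype=float), 4)

h_uniform = image_entropy(uniform)
h_noisy = image_entropy(noisy)
h_equal = image_entropy(equiprobable)
```

With 64 equally likely values the entropy reaches its maximum of log2(64) = 6 bits, the uniform image gives zero, and the noisy image lies in between.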
An application of entropy for intramodality image registration is to calculate the entropy
of a difference image. If two identical images, perfectly aligned, are subtracted the result
is an entirely uniform image that has zero entropy (as stated above). For two images that
differ by noise, the histogram will be ‘blurred’, giving higher entropy. Any misregistration,
however, will lead to edge artefacts that further increase the entropy. Very similar images can,
therefore, be registered by iteratively minimizing the entropy of the difference image (Buzug
and Weese 1998).
7.3.1. Joint entropy. In image registration we have two images A and B to align. We therefore
have two values at each voxel location for any estimate of the transformation T . Joint entropy
measures the amount of information we have in the two images combined (Shannon 1948).
If A and B are totally unrelated, then the joint entropy will be the sum of the entropies of the
individual images. The more similar (i.e. less independent) the images are, the lower the joint
entropy compared with the sum of the individual entropies
\[ H(A, B) \leq H(A) + H(B). \tag{15} \]
The concept of joint entropy can be visualized using a joint histogram calculated from
images A and $B^{\mathcal{T}}$ (Hill et al 1994). For all voxels in the overlapping regions of the images
($x_A \in \Omega^T_{A,B}$), we plot the intensity of this voxel in image A, $A(x_A)$, against the intensity of the
corresponding voxel in image $B^{\mathcal{T}}$. The joint histogram can be normalized by dividing by the
total number of voxels N in $\Omega^T_{A,B}$, and regarded as a joint probability density function (PDF)
$p^{\mathcal{T}}_{AB}$ of images A and B. We use the superscript $\mathcal{T}$ to emphasize that $p^{\mathcal{T}}_{AB}$ changes with $\mathcal{T}$.
Due to the quantization of image intensity values, the PDF is discrete, and the values in each
element represent the probability of pairs of image values occurring together. The joint entropy
H(A, B) is therefore given by
\[ H(A, B) = -\sum_a \sum_b p^{\mathcal{T}}_{AB}(a, b) \log p^{\mathcal{T}}_{AB}(a, b). \tag{16} \]
The number of elements in the PDF can either be determined by the range of intensity
values in the two images, or from a partitioning of the intensity space into ‘bins’. For example
MR and CT images being registered could have up to 4096 (12 bits) intensity values, leading
to a very sparse PDF with 4096 by 4096 elements. The use of between 64 and 256 bins is more
common. In the above equation a and b either represent the original image intensities or the
selected intensity bins. Joint entropy was simultaneously proposed for intermodality image
registration by Studholme et al (1995) and Collignon et al (1995) at the 1995 Information
Processing in Medical Imaging Conference.
As can be seen from figure 2, the joint histograms disperse or ‘blur’ with increasing
misregistration such that the brightest regions of the histogram get less bright, and the
number of dark regions is reduced. This arises because misregistration leads to joint histogram
entries that correspond to different tissue types in the two images. This increases the entropy.
Conversely, when registering images we want to find a transformation that will produce a small
number of histogram elements with high probabilities, and give us as many zero-probability
elements in the histogram as possible, which will minimize the joint entropy. Registration can,
therefore, be thought of as trying to find the transformation that maximizes the ‘sharpness’ of
the histogram, thereby minimizing the joint entropy.
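The joint histogram and equation (16) can be sketched as follows. This is a toy illustration under stated assumptions: the two "modalities" are synthetic images in which each tissue class has a different intensity, misregistration is simulated with a circular shift so that the overlap stays constant, and entropy is measured in bits. The function name `joint_entropy` is hypothetical.

```python
import numpy as np


def joint_entropy(a, b, bins=32):
    """Joint entropy (bits) from the normalized joint intensity histogram."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p = hist[hist > 0] / hist.sum()  # joint PDF, empty elements dropped
    return float(-np.sum(p * np.log2(p)))


rng = np.random.default_rng(0)
# Blocky image with four tissue classes, each block 8 x 8 voxels.
classes = np.kron(rng.integers(0, 4, size=(8, 8)), np.ones((8, 8), dtype=int))
image_a = classes.astype(float)
# Second modality: same anatomy, different intensity for each class.
image_b = np.array([3.0, 0.0, 2.0, 1.0])[classes]

h_aligned = joint_entropy(image_a, image_b)
h_shifted = joint_entropy(image_a, np.roll(image_b, 4, axis=1))
```

When the images are aligned only a few histogram elements are occupied. Shifting one image spreads probability into additional elements and raises the joint entropy.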
The simple form of the equation for joint entropy (equation (16)) can hide an important
limitation of this measure. As we have emphasized with the $\mathcal{T}$ superscript on the joint
probabilities, joint entropy is dependent on $\mathcal{T}$. In particular, $p^{\mathcal{T}}_{AB}$ is very dependent on the
overlap $\Omega^T_{A,B}$, which is undesirable. For example, a change in T may alter the amount of air
surrounding the patient overlapping in the images A and B. Since the air region contains noise
that will tend to occupy the lowest value intensity bins (e.g. a = 0, b = 0), changing this
overlap will alter the joint probability $p^{\mathcal{T}}_{AB}(0, 0)$. If the overlap of air increases, $p^{\mathcal{T}}_{AB}(0, 0)$ will
increase, reducing the joint entropy H(A, B). If the overlap of air decreases, $p^{\mathcal{T}}_{AB}(0, 0)$ will
reduce, increasing H(A, B). A registration algorithm that seeks to minimize joint entropy
will tend, therefore, to maximize the amount of air in $\Omega^T_{A,B}$, which may result in an incorrect
solution. More subtly, interpolation, which is needed for both subvoxel translation and any
rotation, will blur the image, altering the PDF values $p^{\mathcal{T}}_{AB}$.
Figure 2. Example 2D histograms from Hill et al (1994) (with permission) for (a) identical MR
images of the head, (b) MR and CT images of the head and (c) MR and PET images of the head.
For all modality combinations, the left panel is generated from the images when aligned, the middle
panel when translated by 2 mm, and the right panel when translated by 5 mm. Note that while the
histograms are quite different for the different modality combinations, misregistration results in a
dispersion or blurring of the signal. Although these histograms are generated by lateral translational
misregistration, misregistration in other translation or rotation directions has a similar effect.
7.3.2. Mutual information. A solution to the overlap problem from which joint entropy
suffers is to consider the information contributed to the overlapping volume by each image
being registered together with the joint information. The information contributed by the
individual images is simply the entropy of the portion of the image that overlaps with the other
image volume:
\[ H(A) = -\sum_a p^T_A(a) \log p^T_A(a) \qquad \forall A(x_A) = a \mid x_A \in \Omega^T_{A,B} \tag{17} \]
\[ H(B) = -\sum_b p^{\mathcal{T}}_B(b) \log p^{\mathcal{T}}_B(b) \qquad \forall B^{\mathcal{T}}(x_A) = b \mid x_A \in \Omega^T_{A,B} \tag{18} \]
where $p^T_A$ and $p^{\mathcal{T}}_B$ are the marginal probability distributions, which can be thought of as the
projection of the joint PDF onto the axes corresponding to intensities in image A and B
respectively. It is important to remember that the marginal entropies are not constant during
the registration process. Although the information content of the images being registered
is constant (subject to slight changes caused by interpolation during transformation), the
information content of the portion of each image that overlaps with the other image will
change with each change in estimated registration transformation. The superscript $\mathcal{T}$ in $p^{\mathcal{T}}_B$
once again emphasizes the dependence of the probabilities on $\mathcal{T}$. The probabilities $p^T_A$ have
a superscript T rather than $\mathcal{T}$ because image A is the target image which is not interpolated
during registration, but the overlap domain nevertheless changes with T.
Communication theory provides a technique for measuring the joint entropy with respect
to the marginal entropies. This measure, introduced by Shannon (1948) as ‘rate of transmission
of information’ in his article that founded information theory, has become known as mutual
information I (A, B). It was independently and simultaneously proposed for intermodality
medical image registration by researchers in Leuven, Belgium (Collignon et al 1995, Maes
et al 1997) and MIT in the USA (Viola 1995, Wells et al 1996). In maximizing mutual
information, we seek solutions that have a low joint entropy together with high marginal
entropies
\[ I(A, B) = H(A) + H(B) - H(A, B) = \sum_a \sum_b p^{\mathcal{T}}_{AB}(a, b) \log \frac{p^{\mathcal{T}}_{AB}(a, b)}{p^T_A(a)\, p^{\mathcal{T}}_B(b)}. \tag{19} \]
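Both forms of equation (19) can be evaluated from the same joint histogram, as in the sketch below. This is a toy illustration, not one of the cited implementations: the marginals are obtained by summing the joint PDF along each axis, the images are synthetic, a circular shift stands in for misregistration, and the names `entropy` and `mutual_information` are hypothetical.

```python
import numpy as np


def entropy(p):
    """Shannon entropy (bits) of a discrete PDF of any shape."""
    p = p[p > 0]
    return float(-np.sum(p * np.log2(p)))


def mutual_information(a, b, bins=32):
    """Return I(A, B) computed two ways: from entropies and as a direct sum."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = hist / hist.sum()
    p_a = p_ab.sum(axis=1)  # projection onto the A intensity axis
    p_b = p_ab.sum(axis=0)  # projection onto the B intensity axis
    from_entropies = entropy(p_a) + entropy(p_b) - entropy(p_ab)
    nz = p_ab > 0
    ratio = p_ab[nz] / np.outer(p_a, p_b)[nz]
    from_sum = float(np.sum(p_ab[nz] * np.log2(ratio)))
    return from_entropies, from_sum


rng = np.random.default_rng(1)
classes = np.kron(rng.integers(0, 4, size=(8, 8)), np.ones((8, 8), dtype=int))
image_a = classes.astype(float)
image_b = np.array([3.0, 0.0, 2.0, 1.0])[classes]  # different intensity map

mi_aligned, mi_aligned_sum = mutual_information(image_a, image_b)
mi_shifted, _ = mutual_information(image_a, np.roll(image_b, 4, axis=1))
```

The two expressions agree to rounding error, and the aligned pair gives the larger value.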
The difference between joint entropy and mutual information is illustrated for serial MR
images in figure 3. These plots were obtained from MR images that were acquired perfectly
aligned5 . The correct registration transformation should correspond to zero rotation and zero
translation, so we want the optimum value of the similarity measure to be at this position.
Figure 3 plots the value of joint entropy, marginal entropies and mutual information for
misalignments of between 0 and 6 mm in 0.2 mm increments. Subvoxel translation was
achieved using trilinear interpolation which can introduce interpolation errors. The plots
therefore show the change in entropies with translation both using the original data (full curve),
and using the data pre-filtered with a Gaussian of variance 0.5 voxels to smooth the images and
reduce interpolation artefacts (broken curve). These plots demonstrate three important points.
Firstly, the marginal entropies change with translation due to change in the overlap domain
$\Omega^T_{A,B}$. Secondly, mutual information (I(A, B)) varies more smoothly with misregistration
than joint entropy (H(A, B)). Thirdly, subvoxel interpolation can blur the images, resulting
in reduced entropy that introduces local extrema into parameter space, and the consequences
of this are greatly reduced by preblurring the data.
Mutual information can qualitatively be thought of as a measure of how well one image
explains the other, and is maximized at the optimal alignment. We can make our description
more rigorous if we think more about probabilities. The conditional probability p(b|a) is the
probability that B will take the value b given that A has the value a. The conditional entropy
is, therefore, the average of the entropy of B for each intensity in A, weighted according to
5 A two average gradient echo volume sequence was acquired with isotropic voxels of dimension 2 mm. The raw data
were exported from the scanner, and echoes contributing to the two averages were separated and reconstructed as two
different images. Because the echoes contributing to the two images are interleaved, the acquisitions are essentially
simultaneous, so are registered. However, as the echoes in the two images are obtained from different excitations,
random noise and artefact should be different in the two images.
Figure 3. A comparison of change in marginal and joint entropies with cranial–caudal translation
for registration of two MR images that differ only by noise. Top left H (A), top right H (B), bottom
left H (A, B) and bottom right I (A, B). Each plot has two traces. The full curve is calculated
with the images at original resolution, with subvoxel translation achieved using linear interpolation.
Note the local extrema of the measures with the period of the voxel separation (2 mm). The broken
curve is obtained from images that have been preblurred with a Gaussian kernel with variance
$\sigma^2 = 0.5$ voxels. This has no effect on H(A), as image A is not interpolated. For H(B), the
preblurring with a Gaussian kernel reduces the interpolation artefacts, and results in a smooth trace
for mutual information.
a generalization of the assumption made by Woods in his PIU measure. The PIU measure
assumes that at registration the uniformity of values in B corresponding to a given value a in
A should be maximal (i.e. their normalized standard deviation should be minimal). The
information theoretic approaches assume that, at alignment, the
value of a voxel in A is a good predictor of the value at the corresponding location in B. As
misregistration increases, one image becomes a less good predictor of the second. In practical
terms, the advantage of mutual information over PIU is that two structures which have the
same intensity in image A may have very different intensities in image B. For example, in
an MR image, cortical bone and air will both have very low intensities, whereas in CT, air
will have a very low intensity but cortical bone a high intensity. If we have a low-intensity
voxel in image A, then at correct alignment we know that this should either be air or bone
in the CT image6 . The histogram of CT intensity values corresponding to low intensities in
MR will, therefore, have sharp peaks at both low-intensity values and high-intensity values,
and the sharp peaks in the histogram give us low entropy. Even though the MR intensity is a
good predictor of the CT intensity, however, the PIU would give a very low uniformity value.
Mutual information has been compared with other voxel similarity measures for MR–CT and
MR–PET registration (e.g. Studholme et al 1996, 1997).
7.3.3. Normalized mutual information. Mutual information does not entirely solve the
overlap problem described above. In particular, changes in overlap of very low-intensity
regions of the image (especially air around the patient) can disproportionately contribute to
the mutual information. As was stated earlier, Shannon (1948) was the first to present the
functional form of mutual information, calling it the ‘rate of transmission of information’
in a noisy communication channel between source and receiver. In his application in
telecommunications, the time over which the different measurements of the source and receiver
are made is constant, by definition. In image registration, however, the quantity analogous
to time is the total number of image data in the overlap domain $\Omega^T_{A,B}$, and this changes with
the transformation estimate T. To remove this dependence on volume of overlap we should
normalize to the combined information in the overlapping volume.
Three normalization schemes have so far been proposed in journal articles to address this
problem. Equations (22) and (23) below were mentioned in passing in the discussion section
of Maes et al (1997), though no results were presented comparing them with standard mutual
information (equation (19))
\[ \tilde{I}_1(A, B) = \frac{2 I(A, B)}{H(A) + H(B)} \tag{22} \]
\[ \tilde{I}_2(A, B) = H(A, B) - I(A, B). \tag{23} \]
Studholme et al (1999) have proposed an alternative normalization devised to overcome
the sensitivity of mutual information to change in image overlap. This measure involves
normalizing mutual information with respect to the joint entropy of the overlap volume
\[ \tilde{I}_3(A, B) = \frac{H(A) + H(B)}{H(A, B)} = \frac{I(A, B)}{H(A, B)} + 1 = \frac{2}{2 - \tilde{I}_1(A, B)}. \tag{24} \]
This third version of normalized mutual information has been shown to be considerably
more robust than standard mutual information for intermodality registration in which the
overlap volume changes substantially (Studholme et al 1999). For serial MR registration,
when images A and B have virtually identical fields of view, however, mutual information
and normalized mutual information (equation (24)) have been shown to perform equivalently
(Holden et al 2000).
6 We are assuming in this example that the MR sequence being used will give us no other structures with very low
intensity.
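The three normalized measures in equations (22) to (24), and the relationships between them, can be checked numerically with a short sketch. The assumptions are the same as for any histogram-based estimate: equal-width bins, entropies in bits, and synthetic test images. The function name `normalized_measures` and the dictionary keys are hypothetical.

```python
import numpy as np


def normalized_measures(a, b, bins=32):
    """Mutual information and the normalized variants of equations (22)-(24)."""
    hist, _, _ = np.histogram2d(np.ravel(a), np.ravel(b), bins=bins)
    p_ab = hist / hist.sum()

    def entropy(p):
        p = p[p > 0]
        return float(-np.sum(p * np.log2(p)))

    h_a, h_b, h_ab = entropy(p_ab.sum(axis=1)), entropy(p_ab.sum(axis=0)), entropy(p_ab)
    i = h_a + h_b - h_ab
    return {
        "H_AB": h_ab,
        "I": i,
        "I1": 2.0 * i / (h_a + h_b),  # equation (22)
        "I2": h_ab - i,               # equation (23)
        "I3": (h_a + h_b) / h_ab,     # equation (24)
    }


rng = np.random.default_rng(2)
image_a = rng.integers(0, 8, size=(64, 64)).astype(float)
# Image B: a noisy, intensity-inverted version of image A.
image_b = 7.0 - image_a + rng.integers(0, 2, size=image_a.shape)
m = normalized_measures(image_a, image_b)
```

For any pair of images with non-zero joint entropy, the value stored under `"I3"` equals both `I / H_AB + 1` and `2 / (2 - I1)`, which follows from substituting I(A, B) = H(A) + H(B) − H(A, B).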
With the exception of registration using the Procrustes technique described in section 5.1, and
in certain circumstances the registration algorithm in SPM described in section 7.1, all the
registration algorithms reviewed in this article require an iterative approach, in which an initial
estimate of the transformation is gradually refined by trial and error. In each iteration, the
current estimate of the transformation is used to calculate a similarity measure. The optimiza-
tion algorithm then makes another (hopefully better) estimate of the transformation, evaluates
the similarity measure again, and continues until the algorithm converges, at which point no
transformation can be found that results in a better value of the similarity measure, to within
a preset tolerance. A review of optimization algorithms can be found in Press et al (1992).
One of the difficulties with optimization algorithms is that they can converge to an incorrect
solution called a ‘local optimum’. It is sometimes useful to consider the parameter space of
values of the similarity measure. For rigid body registration there are six degrees of freedom,
giving a six-dimensional parameter space. Each point in the parameter space corresponds to a
different estimate of the transformation. Non-rigid registration algorithms have more degrees
of freedom (often many thousands), in which case the parameter space has correspondingly
more dimensions. The parameter space can be thought of as a high-dimensionality image
in which the intensity at each location corresponds to the value of the similarity measure
for that transformation estimate. If we consider dark intensities as good values of similarity,
and high intensities as poor ones, an ideal parameter space image would contain a sharp
low intensity optimum with monotonically increasing intensity with distance away from the
optimum position. The job of the optimization algorithm would then be to find the optimum
location given any possible starting estimate.
Unfortunately, parameter spaces for image registration are frequently not this simple.
There are often multiple optima within the parameter space, and registration can fail if the
optimization algorithm converges to the wrong optimum. Some of these optima may be very
small, caused either by interpolation artefacts (discussed further in section 9) or a local good
match between features or intensities. These small optima can often be removed from the
parameter space by blurring the images prior to registration. In fact, a hierarchical approach
to registration is common, in which the images are first registered at low resolution, then the
transformation solution obtained at this resolution is used as the starting estimate for registration
at a higher resolution, and so on.
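A hierarchical scheme of this kind might look like the following sketch. It makes several assumptions for the sake of brevity: 1D profiles, integer translations, block averaging as a combined blur and subsample step, and resolution factors of 4, 2 and 1. All function names are hypothetical.

```python
def ssd(a, b):
    """Sum of squared intensity differences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def shift(signal, t):
    """Translate a 1D profile by an integer t, padding with zeros."""
    n = len(signal)
    return [signal[i - t] if 0 <= i - t < n else 0.0 for i in range(n)]

def downsample(signal, factor):
    """Blur and subsample in one step by averaging blocks of `factor` samples."""
    return [sum(signal[i:i + factor]) / factor for i in range(0, len(signal), factor)]

def local_search(fixed, moving, t):
    """Move to the better neighbouring translation until none improves the SSD."""
    best = ssd(fixed, shift(moving, t))
    while True:
        cost, candidate = min((ssd(fixed, shift(moving, c)), c) for c in (t - 1, t + 1))
        if cost >= best:
            return t
        t, best = candidate, cost

def register_multires(fixed, moving, factors=(4, 2, 1)):
    """Register at low resolution first, then reuse the result as the next starting estimate."""
    t, previous = 0, None
    for factor in factors:
        if previous is not None:
            t = t * previous // factor    # express the estimate in the finer sample spacing
        t = local_search(downsample(fixed, factor), downsample(moving, factor), t)
        previous = factor
    return t

fixed = [max(0, 6 - abs(i - 16)) for i in range(32)]    # triangular feature centred at 16
moving = [max(0, 6 - abs(i - 8)) for i in range(32)]    # same feature centred at 8
print(register_multires(fixed, moving))
```

In this toy case most of the displacement is recovered at the coarsest level, where each step of the search covers four samples of the original profile.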
Multiresolution approaches do not entirely solve the problem of multiple optima in the
parameter space. It might be thought that the optimization problem involves finding the globally
optimal solution within the parameter space, and that a solution to the problem of multiple
optima is to start the optimization algorithm with multiple starting estimates, resulting in
multiple solutions, and choose the solution which has the lowest value of the similarity measure.
This sort of approach, called ‘multistart’ optimization, can be effective for surface matching
algorithms. For voxel similarity measures, however, the problem is more complicated. The
desired optimum when registering images using voxel similarity measures is frequently not the
global optimum, but is one of the local optima. The following example serves to illustrate this
point. When registering images using joint entropy or mutual information, an extremely good
value of the similarity measure can be found by transforming the images such that only air in the
images overlaps. This will give a few pixels in the joint histogram with very high probabilities,
surrounded by pixels with zero probability. This is a very low entropy situation, and will tend
R28 D L G Hill et al
to have lower entropy than the correct alignment. The global optimum in parameter space
will, therefore, tend to correspond to an obviously incorrect transformation. The solution to
this problem is to start the algorithm within the ‘capture range’ of the correct optimum, that
is within the portion of the parameter space in which the algorithm is more likely to converge
to the correct optimum than the incorrect global one. In practical terms, this requires that the
starting estimate of the registration transformation is reasonably close to the correct solution.
The size of the capture range depends on the features in the images, and cannot be known
a priori, so it is difficult to know in advance whether the starting estimate is sufficiently good.
This is not, however, a very serious problem, as visual inspection of the registered images,
described further in section 11, can easily detect convergence outside the capture range. In
this case, the solution is clearly and obviously wrong (e.g. relevant features in the image do
not overlap at all). If this sort of failure of the algorithm is detected, the registration can be
re-started with a better starting estimate obtained, for example, by interactively transforming
one image until it is approximately aligned with the other.
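The air-only overlap example can be checked numerically. The sketch below uses two hypothetical 1D "images" of the same anatomy with different intensity mappings (0 represents air) and compares the joint entropy at the correct alignment with that of a translation for which only air overlaps. The data and function names are assumptions made purely for illustration.

```python
from collections import Counter
from math import log2

def joint_entropy(a, b):
    """Joint entropy (in bits) of the intensity pairs in the overlap of two images."""
    counts = Counter(zip(a, b))
    n = sum(counts.values())
    return sum((c / n) * log2(n / c) for c in counts.values())

def overlap(a, b, t):
    """Corresponding samples of a and of b translated by t, within the region of overlap."""
    pairs = [(a[i], b[i - t]) for i in range(len(a)) if 0 <= i - t < len(b)]
    return [p[0] for p in pairs], [p[1] for p in pairs]

# Same structure seen by two modalities: air = 0, two tissue classes with different intensities
image_a = [0, 0, 0, 0, 1, 1, 2, 2, 1, 1, 0, 0, 0, 0]
image_b = [0, 0, 0, 0, 7, 7, 3, 3, 7, 7, 0, 0, 0, 0]

h_correct = joint_entropy(*overlap(image_a, image_b, 0))    # correct alignment
h_air = joint_entropy(*overlap(image_a, image_b, 10))       # only air overlaps
print(h_correct, h_air)
```

The misregistered case has the lower joint entropy, so an optimizer that searched for the global optimum of this measure would prefer the obviously incorrect transformation.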
9. Image transformation
Image registration using voxel similarity measures involves determining the transformation
T that relates the domain of image A to image B. This transformation can then be used to
transform one image into the coordinates of the second within the region of overlap of the two
domains $\Omega^T_{A,B}$. As discussed in section 2 above, this process involves interpolation, and needs
to take account of the difference in sample spacing in images A and B.
edge enhancement method, so even in the case of identical images differing only by a rigid
body transformation, using linear interpolation followed by subtraction does not result in the
expected null result but instead results in an edge enhanced version of the original.
Hajnal et al (1995a) recently brought this issue to the attention of MR image analysts and
proposed that the solution is to interpolate using a sinc function truncated with a suitable
window function such as a Hamming window. Care must be taken when truncating the
interpolation kernel to ensure that the integral of the weights of the truncated kernel is unity,
or an artefactual intensity modulation can result (Thacker et al 1999).
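A minimal sketch of a truncated sinc kernel with a Hamming window, rescaled so that its weights sum to one, is given below. The kernel radius of four samples and the function name are assumptions chosen for illustration, and the discrete sum of the weights stands in for the integral mentioned above.

```python
import math

def windowed_sinc_weights(frac, radius=4):
    """Interpolation weights for a target at offset `frac` (0 <= frac < 1) from sample 0.

    The weights apply to the 2 * radius samples at positions -radius + 1 ... radius.
    """
    weights = []
    for k in range(-radius + 1, radius + 1):
        d = k - frac                                   # distance from sample k to the target
        sinc = 1.0 if d == 0 else math.sin(math.pi * d) / (math.pi * d)
        hamming = 0.54 + 0.46 * math.cos(math.pi * d / radius)
        weights.append(sinc * hamming)
    total = sum(weights)
    return [w / total for w in weights]                # rescale so the weights sum to one

weights = windowed_sinc_weights(0.3)
print(sum(weights))
```

Without the final rescaling, the sum of the weights depends on `frac`, so a uniform region would be interpolated to slightly different values at different sub-sample offsets, which is the intensity modulation referred to above.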
Various modifications to sinc interpolation have recently been proposed. These fall
into three categories. Firstly, the use of sinc functions with various radii truncated with
various window functions (Lehmann et al 1999). Secondly, approximations to windowed
sinc functions such as cubic or B-spline interpolants (Lehmann et al 1999, Unser 1999).
Thirdly, the shear transform, which involves transforming the image using a combination of
shears (Eddy et al 1996, Cox and Jesmanowicz 1999). This third approach is fast, though it
does result in artefacts in the corners of the image which must be treated with caution.
An assumption implicit in the discussion above is that the original data being interpolated
are uniformly sampled. This is not always the case in medical images. MR physics researchers
are used to the problem of non-uniform sampling in the acquisition, or k-space domain (Robson
et al 1997, Atkinson et al 2000), but this problem is less often considered in the spatial
domain. The most common circumstances when non-uniform sampling arises are in free-
hand 3D ultrasound acquisition and certain types of CT acquisition where the slice spacing
changes during the acquisition. The correct way of interpolating from non-uniformly sampled
data onto a uniform grid is the reverse of sinc interpolation. This methodology, sometimes
used in k-space regridding (Robson et al 1997, Atkinson et al 2000), involves calculating the
sinc coefficients to go from the desired uniform sampling points to the non-uniform locations
acquired, and inverting the matrix of coefficients in order to do the correct interpolation. In
the cases of 3D ultrasound and CT variable slice sample spacing, the data are a long way from
being bandlimited, so the benefits of inverse sinc interpolation may be small in any case.
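The matrix inversion described above can be illustrated in one dimension. The sketch assumes a bandlimited signal with unit sample spacing on the desired uniform grid, acquired positions that are only mildly perturbed from that grid, and as many acquired samples as uniform ones. The function name is hypothetical and NumPy is used for the linear algebra.

```python
import numpy as np

def resample_to_uniform(positions, values):
    """Recover uniform samples u_j (j = 0 ... n-1) from samples acquired at irregular positions."""
    positions = np.asarray(positions, dtype=float)
    n = len(values)
    # S[i, j] = sinc(t_i - j): weight of uniform sample j at acquired position t_i
    S = np.sinc(positions[:, None] - np.arange(n)[None, :])
    # Solving S u = values is the "reverse" of sinc interpolation
    return np.linalg.solve(S, np.asarray(values, dtype=float))

n = 16
grid = np.arange(n)
positions = grid + 0.2 * np.sin(grid)                  # mildly non-uniform sample positions
u_true = np.cos(0.4 * grid) + 0.1 * grid               # uniform samples to be recovered
acquired = np.sinc(positions[:, None] - grid[None, :]) @ u_true
u_recovered = resample_to_uniform(positions, acquired)
print(np.max(np.abs(u_recovered - u_true)))
```

If the acquired positions stray far from the uniform grid the matrix becomes poorly conditioned, so a practical implementation would probably need some form of regularization.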
The most widely used applications of image registration involve determining the rigid body
transformation that aligns 3D tomographic images of the same subject acquired using different
modalities (intermodality registration), or the same subject imaged with a single modality at
different times (intramodality registration). There is increasing interest in non-rigid registration
of the same or different subjects, registration of 2D images with 3D images, and registration of
images with the physical coordinates of a treatment system. In this section we give examples
of some applications of these approaches.
Figure 4. Top row: unregistered MR (left) and CT (right) images. The MR images are shown in
the original sagittal plane, and reformatted coronal plane. The CT images in the original oblique
plane, and reformatted sagittal plane. Note the different field of view of the images. Bottom panel,
MR images in sagittal, coronal and axial planes with the outline of bone, thresholded from the
registered CT scan, overlaid. The registration transformation has 10 degrees of freedom, to correct
for errors in the machine supplied voxel dimensions and gantry tilt angle.
weeks (e.g. monitoring tumour growth). In all these applications it is desirable to have high
sensitivity to small changes in the images. In functional experiments the signal in some voxels
may change by a few per cent between the resting and activated state. In contrast or perfusion
studies, it is desirable to identify regions that enhance, or quantify intensity change in a region
of interest. In longer-term studies, it is desirable to detect small changes in lesion volume or
degenerative change in order to plan treatment or monitor response to therapy. Visual inspection
of images on a light-box has been shown to be less sensitive to these changes than looking at
difference images (Denton et al 2000). Since patients often move during examinations, and
cannot be repositioned perfectly on subsequent visits, registration is an essential prerequisite
for subsequent analysis. Techniques to register intramodality images of the brain using a
rigid-body transformation were an area of active research during the 1990s (e.g. Woods et al
1992, 1998, Hajnal et al 1995a,b, Freeborough et al 1996, Lemieux et al 1998, Holden et al
2000). It might at first seem that intramodality registration is a much simpler problem than
intermodality registration because the images being aligned are very similar to one another. It
turns out, however, that registration accuracy of much less than a voxel is necessary, so great
care must be taken to handle the image transformation issues discussed in section 9 (Hajnal
et al 1995a). While similarity measures such as SSD, RIU and CC described in section 6
are successfully used for intramodality registration of brain images, care must be taken in the
optimization and interpolation to ensure high-quality results. Furthermore, although the brain
can be accurately aligned using a rigid body transformation, deformable regions such as the
scalp and neck, or regions that change substantially between acquisitions (e.g. due to contrast
uptake), can bias the result, leading to significant errors. Presegmentation of the images to
exclude these regions can be necessary (Hajnal et al 1995a). Alternatively, it has been shown
that the information theoretic similarity measures discussed in section 7.3 can be less sensitive
to these changes, and may have an advantage over the more obvious intramodality measures
described in section 6 for this application (Holden et al 2000).
where {u, v} are pixels in the neighbourhood of $x_A$ such that $|x_A - (u, v)| \le r$. There are two
parameters required by this similarity measure. The first is the radius of the neighbourhood
r. Increasing r can improve the reliability of the algorithm, but increases the computational
requirement. Values of between 3 and 5 are proposed. The parameter σ controls the sensitivity
of the measure to image features. Weese suggests it should be larger than the standard deviation
of the noise in the fluoroscopy image but smaller than the contrast of structures of interest.
To understand how pattern intensity works, it is useful to compare it with the sum of squares
of intensity difference measure introduced in section 6.1 above. If pixels $A(x_A)$ and $B^T(x_A)$
are identical, and part of uniform patches of radius $R > r$, then pixel $x_A$ will contribute 0 to the
sum of squares of difference measure and $N \frac{\sigma^2}{\sigma^2 + 0} = N$ to the pattern intensity measure. The
[...] to pattern intensity. For a uniform region of very large difference in intensity $\Delta \gg \sigma$, the
pixel $x_A$ will contribute the large value of $\Delta^2$ to the sum of squares of difference measure,
and approximately $N \frac{\sigma^2}{\sigma^2 + \Delta^2} \approx 0$ to the pattern intensity measure. As $\Delta$ increases, the sum
of squares of intensity measure will increase as $\Delta^2$, but pattern intensity will asymptotically
approach zero, with the consequence that regions of very great difference in intensity contribute
proportionately less to pattern intensity than to the sum of squares of intensity difference.
Furthermore, if $x_A$ is a single pixel with a large difference between modalities, surrounded
by a uniform patch that is the same in both modalities, then this pixel will still give a contribution
$\Delta^2$ to the sum of squares of difference measure, but a contribution of $N - 1 \approx N$ to pattern
intensity. Pattern intensity is, therefore, almost totally insensitive to individual pixels, or small
numbers of pixels $n \ll N$ where there is a very large difference between modalities. Since
high-contrast instruments in the fluoroscopy image have exactly this characteristic, pattern
intensity is much less sensitive to these objects than the sum of squares of intensity differences.
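The contrast between the two measures can be made concrete by tabulating a single term of each as the intensity difference grows. The sketch below assumes that each neighbourhood term of pattern intensity has the form σ²/(σ² + d²), which is the form used in the comparison above, where d is the intensity difference entering that term. The function names and the value σ = 10 are illustrative assumptions.

```python
def ssd_term(d):
    """Contribution of an intensity difference d to the sum of squared differences."""
    return d ** 2

def pattern_intensity_term(d, sigma):
    """One neighbourhood term of pattern intensity: sigma^2 / (sigma^2 + d^2)."""
    return sigma ** 2 / (sigma ** 2 + d ** 2)

sigma = 10.0
for d in (0.0, 5.0, 10.0, 50.0, 1000.0):
    print(d, ssd_term(d), round(pattern_intensity_term(d, sigma), 4))
```

The sum of squares term grows without bound, whereas the pattern intensity term is bounded between 0 and 1 and saturates once the difference is much larger than σ, which is why a few very different pixels have limited influence on the latter.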
10.4.1. Thin plate spline warps from point landmarks. The point registration technique
described in section 5.1 above can be used to determine the rigid body or affine mapping that
aligns the points P and Q in a least squares sense. If the structures being aligned are deformable,
then it may be more appropriate to warp one of the images so that the landmarks are aligned.
In the absence of any other information (for example about the mechanical properties of the
tissue), the most appropriate transformation might be the one that matches the landmarks
exactly, and bends the rest of space as little as possible. This requires an appropriate definition
of bending, or bending energy. Intuitively, the squared norm of the second derivative is a good
choice of energy. This can also be justified more rigorously from plate theory (Marsden and
Hughes 1994). Also, by analogy with differential geometry, the curvature is related to second
derivatives of the metric tensor. In more than one dimension, this is to be interpreted as the sum
of squares of all second order derivatives of all components of the mapping T.
Variational problems of this type are usually solved by finding a corresponding partial
differential equation (PDE) whose solutions are the minimizers. Here the PDE operator is the
Laplacian of the Laplacian ($\Delta\Delta$) (Harder and Desmarais 1972, Goshtasby 1988, Bookstein 1989).
Solutions are built by superposition of fundamental solutions, called thin plate splines, again by
analogy with plate theory.
The reader might be more familiar with fundamental solutions of the normal
Laplacian $\Delta = \sum_i \partial^2 / \partial x_i^2$. In both cases, it is important to notice that these solutions have
a different form in different dimensions: in one dimension, the thin plate splines are cubic
splines (Dryden and Mardia 1998), in higher dimensions they are functions of the distance to
the landmarks. In two dimensions there is a logarithmic term
$x' = Lx + t + \sum_i c_i r_i^2 \ln r_i^2$ (25)
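A two-dimensional thin plate spline of this form can be fitted to landmark pairs by solving a small linear system. The sketch below follows the commonly used formulation in which the coefficients of the radial terms are constrained to be orthogonal to constant and linear functions of the landmark positions. That constraint, the function names and the example landmarks are assumptions of this sketch rather than details given in the text, and r_i is taken to be the distance from the evaluation point to landmark i.

```python
import numpy as np

def tps_kernel(r2):
    """U = r^2 ln r^2 evaluated from squared distances, with U(0) = 0 by continuity."""
    out = np.zeros_like(r2)
    nonzero = r2 > 0
    out[nonzero] = r2[nonzero] * np.log(r2[nonzero])
    return out

def fit_tps(p, q):
    """Fit a 2D thin plate spline that maps landmarks p (n x 2) exactly onto q (n x 2)."""
    n = p.shape[0]
    K = tps_kernel(((p[:, None, :] - p[None, :, :]) ** 2).sum(-1))
    P = np.hstack([np.ones((n, 1)), p])        # constant and linear (affine) terms
    A = np.zeros((n + 3, n + 3))
    A[:n, :n], A[:n, n:], A[n:, :n] = K, P, P.T
    rhs = np.zeros((n + 3, 2))
    rhs[:n] = q
    solution = np.linalg.solve(A, rhs)
    return solution[:n], solution[n:]          # radial coefficients c_i, affine part (t and L)

def apply_tps(x, p, c, affine):
    """Evaluate the fitted mapping at points x (m x 2)."""
    U = tps_kernel(((x[:, None, :] - p[None, :, :]) ** 2).sum(-1))
    return np.hstack([np.ones((x.shape[0], 1)), x]) @ affine + U @ c

p = np.array([[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [0.5, 0.5]])
q = p.copy()
q[4] = [0.6, 0.45]                             # displace the central landmark only
c, affine = fit_tps(p, q)
print(np.abs(apply_tps(p, p, c, affine) - q).max())
```

When the landmark displacements are exactly affine, the radial coefficients come out as zero and the mapping reduces to the affine part, which is consistent with the spline bending space as little as possible.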
Figure 5. Example slice from (a) pre-contrast and (b) post-contrast MR mammogram. Without
registration, the difference image contains distracting artefacts (c). Affine registration results in
some improvement (d). Non-rigid registration (e), using a 10 mm grid of B-spline control points
(Rueckert et al 1999) results in reduced artefact and improved diagnostic value (Denton et al 2000).
Figure 6. Axial (top) and coronal (bottom) slices from average images produced from MR scans
of seven different normal controls, registered using (a) rigid registration, (b) affine registration and
(c) non-rigid registration using a 10 mm grid of B-spline control points (Rueckert et al 1999). In
all cases the registration was achieved by maximizing normalized mutual information, as described
in section 7.3.3. Note that after rigid registration the average image is quite blurred, indicating
that rigid registration does not line up the different brain scans very well. After affine registration,
the images are a lot less blurred, especially around basal structures. The non-rigid registration,
however, produces the sharpest image, indicating that this sort of algorithm is better at lining up
brain features between subjects.
There are two main reasons why it is desirable to quantify registration accuracy. The first
is to calculate the expected accuracy of an algorithm in order to ascertain whether it is good
enough for a particular clinical application, or in order to compare one algorithm with another.
The second reason is to assess the accuracy of registration for a particular subject, for example
prior to using the registered images to make a decision about patient management.
The accuracy of a registration transformation T cannot easily be summarized by a single
number, as it is spatially varying over the image. If T is the calculated transformation, and Tg
is the true ‘gold standard’ transformation, then the registration error will vary with position
xA in the image. If we think of each point in the image as a potential target of some treatment,
then we can define the error at this point as the target registration error (TRE) as follows:
$\mathrm{TRE}(x_A) = |T(x_A) - T_g(x_A)|$. (27)
The TRE will normally vary with position. For example, in a rigid body transformation there
will typically be rotational components. It may be that at some position in the image, by
chance, the rotational component of the transformation cancels out the error in the translation
component, giving TRE = 0. Elsewhere, however, the TRE will be greater. In the extremely
unlikely case that the only error in a transformation is a translation error, then TRE will be the
same everywhere. If Tg is known, then TRE can be calculated everywhere in the image. In this
case, an image of TRE could be produced. In practice, it is more common to summarize the
TRE distribution for example by considering the mean or maximum value. For most practical
situations, however, Tg is not known, so TRE cannot be calculated.
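The spatial variation of TRE is easy to demonstrate when the gold standard is known. The following sketch uses 2D rigid transformations for brevity. The particular angles and translations, and the function names, are arbitrary choices made for illustration.

```python
import math

def rigid(theta, tx, ty):
    """Return a 2D rigid-body transformation (rotation by theta, then translation)."""
    c, s = math.cos(theta), math.sin(theta)
    return lambda p: (c * p[0] - s * p[1] + tx, s * p[0] + c * p[1] + ty)

def tre(T, Tg, x):
    """Target registration error at point x: distance between T(x) and Tg(x)."""
    a, b = T(x), Tg(x)
    return math.hypot(a[0] - b[0], a[1] - b[1])

gold = rigid(0.10, 5.0, -2.0)
translation_error = rigid(0.10, 6.0, -2.0)     # 1 unit error in translation only
rotation_error = rigid(0.12, 5.0, -2.0)        # 0.02 rad error in rotation only

for point in [(0.0, 0.0), (10.0, 0.0), (100.0, 0.0)]:
    print(point, tre(translation_error, gold, point), tre(rotation_error, gold, point))
```

With a pure translation error the TRE is the same at every point, whereas with a rotation error it is zero at the centre of rotation and grows with distance from it.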
where K is the number of spatial dimensions (normally three for medical applications) and $\Lambda_i$
are the singular values of the configuration matrix of the markers.
three transformations in turn completes a circuit, and should give the identity transformation
for a perfect algorithm.
$T_c = T_{A \to B}\, T_{B \to C}\, T_{C \to A}$.
For any real algorithm, of course, $T_c$ will not be the identity. This will give a transformation
that provides an estimate of the errors. If the registration errors in the process are uncorrelated,
then the RMS registration error for one application of the algorithm will be $1/\sqrt{3}$ times the
error for the whole circuit. Because one image is common in each of $T_{A \to B}$, $T_{B \to C}$ and
$T_{C \to A}$, however, the errors are not uncorrelated, so the errors estimated in this way will tend
to underestimate the true error of the algorithm. As an extreme example, an algorithm that
always produces an erroneous transformation close to the identity would incorrectly be found
to perform well according to this measure.
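The circuit test, and the way it can be fooled, can be sketched with 2D rigid transformations in homogeneous matrix form. The transformations and test point below are invented for illustration. The code composes matrices so that the rightmost one is applied to a point first, which is a convention assumed by this sketch.

```python
import math

def rigid(theta, tx, ty):
    """3 x 3 homogeneous matrix of a 2D rigid-body transformation."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, tx], [s, c, ty], [0.0, 0.0, 1.0]]

def compose(*mats):
    """Matrix product m1 m2 ...; the rightmost matrix acts on a point first."""
    out = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in mats:
        out = [[sum(out[i][k] * m[k][j] for k in range(3)) for j in range(3)] for i in range(3)]
    return out

def invert(m):
    """Inverse of a rigid transformation: transpose the rotation, back-rotate the translation."""
    (a, b, tx), (c, d, ty), _ = m
    return [[a, c, -(a * tx + c * ty)], [b, d, -(b * tx + d * ty)], [0.0, 0.0, 1.0]]

def circuit_error(m, point):
    """Distance a point moves under the circuit transformation (zero for the identity)."""
    x, y = point
    return math.hypot(m[0][0] * x + m[0][1] * y + m[0][2] - x,
                      m[1][0] * x + m[1][1] * y + m[1][2] - y)

a_to_b = rigid(0.10, 4.0, 1.0)
b_to_c = rigid(-0.25, -2.0, 3.0)
c_to_a = invert(compose(b_to_c, a_to_b))       # exact inverse closes the circuit
a_to_b_estimate = rigid(0.11, 4.3, 1.0)        # estimate with a small error
identity = rigid(0.0, 0.0, 0.0)

point = (50.0, 20.0)
perfect = circuit_error(compose(c_to_a, b_to_c, a_to_b), point)
imperfect = circuit_error(compose(c_to_a, b_to_c, a_to_b_estimate), point)
fooled = circuit_error(compose(identity, identity, identity), point)
print(perfect, imperfect, fooled)
```

The last case corresponds to the extreme example above: an algorithm that always returns the identity produces a perfectly consistent circuit even though none of its three transformations is correct.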
12. Conclusions
In this review we have introduced the topic of medical image registration and discussed the
main approaches described in the literature. The main emphasis of this article is intrasubject
registration of tomographic modalities, which is predominantly used to find the rigid-body
or affine transformation needed to align images of the head. We have also considered in
less detail the closely related topics of non-affine registration (for intrasubject registration of
deformable regions, and intersubject registration), 2D–3D registration, and image-to-physical
space registration. Non-affine registration, in particular, is a rapidly developing area with many
potential applications in healthcare and medical research.
Because most current algorithms for medical image registration calculate a rigid body
or affine transformation, their applicability is restricted to parts of the body where tissue
deformation is small compared with the desired registration accuracy, and in practice they are
used most commonly for registration of images of the head. The most accurate algorithms
for intermodality registration of the head are based on optimizing a voxel similarity measure.
The most generally applicable of these algorithms are currently the ones based on information
theory. These algorithms can be applied automatically to a variety of modality combinations
for intermodality and intramodality registration, without the need for presegmentation of the
images. They can also be extended to non-affine transformations.
One further appeal of these information theoretic approaches is the mystique that surrounds
the word entropy. An interesting anecdote to emphasize this point comes from a conversation
between Shannon and Von Neumann (quoted in Applebaum (1996)). Apparently, Shannon had
asked Von Neumann which name he should give to his measure of uncertainty. Von Neumann
answered: ‘You should call it “entropy”, and for two reasons: first, the function is already
in use in thermodynamics under that name; second, and more importantly, most people don’t
know what entropy really is, and if you use the word “entropy” in an argument, you will win
every time!’ There is, as yet, no proof that the information theory measures are in any way
optimal for image registration, and better measures are likely to be devised in due course.
The first algorithms for medical image registration were devised in the early 1980s, and
fully automatic algorithms have been available for many intermodality and intramodality
applications since the mid 1990s. Despite this, at the time of writing, image registration is still
seldom carried out on a routine clinical basis. The most widely used registration applications
are probably image-to-physical space registration in neurosurgery and registration of functional
MR images to correct for interscan patient motion. Intermodality registration, which accounts
for the majority of the literature in this area, is still unusual in the clinical setting.
Image registration is, however, being widely used in medical research, especially in
neuroscience where it is used in functional studies, in cohort studies and to quantify changes
in structure during development and ageing.
One barrier to the routine clinical use of image registration may be the logistical difficulties
in getting the images to be registered onto the same computer. Medical research labs doing a
lot of imaging tend to have a more integrated infrastructure than hospitals, so do not suffer from
this problem. The healthcare sector is now moving towards integrating text-based and image
information about the patient to produce multimedia electronic patient records, and when this
infrastructure is in place, the logistics of image registration will be much easier. Another
reason for the lack of clinical use of image registration might be that traditional radiological
practice can provide all the necessary information for patient management, and registration is
unnecessary. Even if this second argument is currently valid, the increasing data generated by
successive generations of scanners (including new multislice helical CT scanners) will steadily
increase the need for registration to assist the radiologist in carrying out his or her task.
It is worthwhile briefly considering how image registration is likely to evolve over the
next few years. Increasing volumes of data and multimedia electronic patient records have
already been referred to, and these practical developments may see registration entering routine
clinical use at many centres. Also, increasing use of dynamic acquisitions such as perfusion
MRI will necessitate use of registration algorithms to correct for patient motion. In addition,
non-affine registration is likely to find increasing application in the study of development,
ageing and monitoring changes due to disease progression and response to treatment. In
these latter applications, the transformation itself may have more clinical benefit than the
transformed images, as this will quantify the changes in structure in a given patient. New
developments in imaging technology may open up new applications of image registration. It
has recently been shown that very high field whole-body MR scanners can produce high signal
to noise ratio images of the brain with 100 µm resolution (Robitaille et al 2000). Intramodality
registration of these images may open up new applications such as monitoring change in small
blood vessels. Also, while ultrasound images have been largely ignored by image registration
researchers up until now, the increasing quality of ultrasound images and their low cost make
this a fertile area for both intramodality and intermodality applications (e.g. Roche et al 2000,
King et al 2000).
In some ways, medical image registration is a mature technology that has been around for
nearly two decades and has attracted considerable research activity in devising and validating
algorithms, and in demonstrating clinical application. We believe, however, that there will be
substantial additional innovation in this area over the next few years, especially in non-affine
registration, and applications outside the brain. In the next decade, registration algorithms are
likely to enter routine clinical use, and research applications will play a key role in improving
our understanding of physiology and disease processes, especially in neuroscience.
We are grateful to colleagues in the Computational Imaging Science group at King’s College
London for useful discussions and assistance, especially Dr Julia Schnabel for figure 6 and
Christine Tanner for figure 5.
