3D Surface Reconstruction
Multi-Scale Hierarchical Approaches
Francesco Bellocchio, Università degli Studi di Milano, Crema, Italy
N. Alberto Borghese, Università degli Studi di Milano, Milano, Italy
Contents

1 Introduction
  1.1 From real objects to digital models
  1.2 The scanner pipeline
    1.2.1 Data acquisition
    1.2.2 Generalization
    1.2.3 Integration of multiple scans
    1.2.4 Optimization
  1.3 Using a multiresolution hierarchical paradigm
2 Scanner systems
  2.1 3D Scanner systems
  2.2 Contact 3D Scanners
  2.3 Non-contact 3D Scanners
    2.3.1 Optical 3D Scanners
    2.3.2 Hybrid Techniques
  2.4 Evaluation of 3D Scanners
3 Reconstruction
  3.1 Surface computation from point clouds
  3.2 Spatial subdivision techniques
  3.3 Surface fitting methods
  3.4 Multiresolution approach
4 Surface fitting as a regression problem
  4.1 The regression problem
  4.2 Parametric versus non-parametric approaches
  4.3 Instance-based regression
  4.4 Locally weighted regression
  4.5 Rule induction regression
  4.6 Projection pursuit
  4.7 Multivariate Adaptive Regression Splines
Glossary
References
Chapter 1
Introduction
What is a 3D model? When and why is it used? How is it created? This
chapter presents a brief overview of the usefulness of 3D models and of
how they can be computed from a physical object. The pipeline that creates a 3D
model is presented and each step is concisely described. In the rest of the book the
main steps of this pipeline are analyzed and described in depth.
Fig. 1.2 Virtual surgery allows combining the 3D model of a patient with 3D models of tools and
implants in order to properly plan surgery. In this figure, the distance between the implant and the
surface of the patient’s bone is represented in three different visualization modes [79] (reproduced
by permission of IEEE)
specific body characteristics of interest. The digital model of the client can be
dressed with any clothing item and used to show how the item fits the client. This
can greatly enhance tailoring by specifically taking into account the body's
peculiarities for customized solutions. Such systems have recently started to be deployed.
• Features can be identified on a model and their quantitative evaluation can be
used to compare objects and classify them. This concept is applied in different
contexts. For example, in the manufacturing industry it is applied to quality
control [200], while in security it is used for person identification through
biometric features [163].
• In medical applications, 3D models of human body parts and organs can offer
a virtual view of the body to physicians for observation, diagnosis support,
and therapy monitoring [69][247][199][107] (Fig. 1.2). Recently, computerized
surgery has also been developed in virtual environments to optimize and tailor the
procedure to specific patients, with the actual surgery performed by a robot guided
by the clinician on the basis of 3D models [79][255][228].
• Virtual environments are very important for training to correctly execute critical
tasks or dangerous procedures, in which human error has to be minimized.
Virtual surgery is one of these cases: doctors can practice on virtual patients
and gain experience and confidence before performing the real surgery, without
having to resort to cadavers or animals [146]. This also has the advantage that the
trainee can exercise ad libitum, with neither a time constraint nor the constraint of
being in a specific place for training. Another instance of virtual
training is flight simulation, used both for training and for evaluating pilots. Often,
in these applications the system is complemented with haptic devices to enrich
the virtual reality experience with proprioceptive feedback [184][233] (Fig. 1.3).
• 3D models are extensively used in architectural design and reverse engineering
[250][257]. They can be used as a powerful tool to show and discuss a
project or an idea. They are increasingly used also to design mechanical parts
Fig. 1.3 Virtual reality environments can be used for training. In this figure, the training
framework is constituted of two haptic devices and a display that provide the user with proprioceptive
and visual feedback of the interaction with the virtual patient [233] (reproduced by permission
of IEEE)
to optimize their interaction with the environment through the real-time visualization of that
interaction. This has been called visual computing. 3D models have
a long history as a tool used as input to manufacturing in different domains.
More recently, devices that produce a physical 3D object directly from a 3D CAD
model have become available and go under the name of 3D printers. Such devices
create a physical object by laying down successive layers of particular materials
such as plaster, resins or liquid polymers [208].
• The entertainment industry is increasingly using the opportunities offered by 3D
modeling [115][128]. In the last decade the number of movies with 3D digital
characters has grown. The use of the digital 3D model of an actor allows avoiding
complex, expensive and time-consuming makeup sessions needed to create particular
physical features, as well as the use of stunt-men in dangerous scenes [153][17]
(Fig. 1.5). Similarly, the accuracy and sophistication of many modern 3D video games
rely on the extensive use of 3D models to create scenes and characters. Besides,
many video games, like sports games, are nowadays inspired by real persons.
3D scanning can boost the realism of the game by significantly increasing the
fidelity of the avatars, avoiding the so-called "uncanny valley" [212].
Different applications may have very different requirements. For example,
reconstruction in virtual archeology needs good accuracy and low invasiveness,
but generally scanning time is not an important constraint. On the contrary, in
video-conference applications real-time processing is mandatory, while modeling
quality plays a secondary role. Besides, some a priori knowledge of the object to be
modeled can be useful to achieve a more robust and fast reconstruction procedure.
For instance, when the facial characteristics and mimics are of interest, the
acquisition and reconstruction techniques can be based on face model warping to
achieve better results [39]. Similarly, in industrial quality control fast, low-cost
reconstruction is important, since the same operations are repeated for many
objects of the same type.
Fig. 1.4 3D printing is a complementary technology to 3D modeling. From a real object (a), the
3D model can be obtained (b) and reproduced by means of a 3D printer (c). The production of
physical replicas (which can also be realized at a different scale) can be a cost-effective solution for
small production runs or when the intrinsic value or fragility of the original object discourages
its physical accessibility [208] (reproduced by permission of IEEE)
Fig. 1.5 A digital 3D actor can exhibit a variegated range of expressions. The mimic posture can in
fact be synthesized from a neutral face (leftmost) after a suitable parametrization of the facial
expression has been carried out. This information is then used to displace the vertices of the mesh
associated with the neutral face, obtaining the desired expression [153] (reproduced by permission of
IEEE)
Fig. 1.6 Constructive Solid Geometry (CSG) is a powerful method for describing the shape of
an object in terms of boolean operations on primitive solids [101]. It is particularly useful for
CAD modeling of mechanical parts (reproduced by permission of IEEE). A large class of complex
real objects does not have a regular shape, and CSG can fail to provide an accurate description of
their surface. For this kind of objects, 3D scanning is a viable method for obtaining a 3D
model. This is the case, for instance, of the generation of statue models, such as the application
described in Fig. 1.4. Among this kind of applications, the Digital Michelangelo project [150][4]
is a prominent example, due to the importance of the digitized statues and to the methodology and
tools developed for achieving the results
More complex 3D digital models can be created in two different ways: they
can be produced by drawing their surface using advanced functionalities of CAD
systems or by measuring directly the surface of the physical object through
scanning.
Nowadays, operators of CAD systems are able to create quite complex models
by using two main kinds of tools: NURBS and subdivision surfaces. Non-Uniform
Rational B-Splines (NURBS, [186]) are a mathematical model that allows generating
curves and surfaces with great flexibility and precision. NURBS are
suitable for handling both analytic and free-form shapes. The local aspect of the
curve is defined intuitively by inserting and displacing a set of points with local
influence, called control points (Fig. 1.7). NURBS therefore constitute an interactive
framework to create models. Subdivision surfaces [60], instead, are a technique to
create smooth surfaces as the result of an iterative process that starts from a
coarse polygonal mesh. The smooth surface is computed by recursively
partitioning the initial polygonal mesh: at each step the polygons of the mesh are
subdivided into new polygons by adding vertices that may lie outside the original
surface. Depending on the subdivision rule, the new mesh can be a smoother
version of the mesh of the previous step (Fig. 1.8).
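To make the refinement step concrete, the sketch below performs one 1-to-4 midpoint split of a triangle mesh in Python. It only illustrates how the mesh becomes denser; a true subdivision scheme such as Loop or Catmull-Clark would also re-weight the vertex positions to smooth the surface, which is omitted here.

```python
import numpy as np

def midpoint_subdivide(vertices, faces):
    """One 1-to-4 split of a triangle mesh: every edge gets a midpoint
    vertex and each triangle is replaced by four smaller ones.
    (A real subdivision rule, e.g. Loop, would also re-weight the
    vertex positions to smooth the mesh; that step is omitted here.)"""
    vertices = list(map(np.asarray, vertices))
    midpoint_of = {}                         # edge (i, j) -> new vertex index

    def midpoint(i, j):
        key = (min(i, j), max(i, j))
        if key not in midpoint_of:
            vertices.append((vertices[i] + vertices[j]) / 2.0)
            midpoint_of[key] = len(vertices) - 1
        return midpoint_of[key]

    new_faces = []
    for a, b, c in faces:
        ab, bc, ca = midpoint(a, b), midpoint(b, c), midpoint(c, a)
        new_faces += [(a, ab, ca), (ab, b, bc), (ca, bc, c), (ab, bc, ca)]
    return np.array(vertices), new_faces

# Example: one subdivision step on a single triangle.
V = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0)]
F = [(0, 1, 2)]
V2, F2 = midpoint_subdivide(V, F)
print(len(V2), "vertices,", len(F2), "triangles")    # 6 vertices, 4 triangles
```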
Scanning is a process that allows obtaining the 3D model in a fully automatic
or semi-automatic way, by measuring the geometric features of the object as well
as its visual appearance (e.g., color and texture) and then by identifying the most
appropriate digital surface model that represents the measured data.
Fig. 1.7 NURBS are a common surface representation paradigm used in CAD tools. The shape of
the surface in panel (a) is controlled through a polygonal mesh, whose vertices are called control
points, shown in panel (b); each control point affects the shape of the surface locally proportionally
to the value of its weight. The reported images have been obtained using the 3D modeling software
suite Blender [2]
Fig. 1.8 Subdivision surfaces are a framework for making a polygonal mesh denser. When a
suitable subdivision rule is used, the resulting mesh becomes smoother and smoother after each
iteration. The results of three successive subdivision iterations applied to the mesh
in panel (a) are reported in panels (b), (c), and (d), respectively. The reported images have been
obtained using the 3D modeling software suite Blender [2]
In CAD systems the human operator shapes the surface model interactively
to represent a given object, while 3D scanners aim at producing the digital
representation of the surface automatically. Scanning is generally faster than using
a CAD system and achieves a higher (or, at least, quantifiable) level of accuracy of
the scanned object. Furthermore, scanning, being essentially a measurement process,
does not require any artistic ability of the operator. On the other hand, the creation of
a digital model through scanning requires that the object and the measuring system
be available on site. This is the scenario considered in this book.
The acquisition process of the surface geometry of a real object and of its visual
appearance is called 3D scanning, and the system used to this aim is called a 3D
scanner.
3D scanners can have very different characteristics because their applications
are very varied. They are characterized by different performances, and the
acquisition techniques they use are based on different physical principles. However,
despite these differences, they all share a common pipeline to construct a 3D digital
model. In fact, 3D model construction is a modular process composed of different
sequential steps (Fig. 1.9).
The sensors used in a 3D scanner sample the object in a set of points, thereby
acquiring a finite set of information, such as the 3D coordinates of points belonging
to the object surface, color fields, and textures. Starting from this information, a
digital model of the whole object surface has to be computed.
[Fig. 1.9 flowcharts: (a) Digitization, Generalization, Refinement, Fusion, Optimization, 3D Model; (b) Digitization, Fusion, Refinement, Generalization, Optimization, 3D Model]
Fig. 1.9 The scheme shows the main steps of the 3D object digitization process. In panel (a) a
model of the object is built first for each acquired dataset; the models are fused afterward. In panel
(b) fusion occurs directly on the raw data points and a single model is obtained from the whole
ensemble of fused data points
In Fig. 1.9 a high-level scheme of two possible 3D model construction pipelines
is reported:
• A set of sensors is used to capture data on the geometry of the surface and on its
color and reflectance.
• From these data, a generalization of the acquired data over the whole object
surface is computed.
• The information from different sensors or from different acquisition sessions is
merged.
• The previous steps can be iterated in order to improve the quality of the model
(e.g., to obtain a denser set of data in some regions of the object).
• The 3D digital representation is transformed and optimized using a paradigm
better suited to a specific application.
Each of the above steps can be improved exploiting a priori knowledge about the
scene, about the acquisition techniques or about the object to be scanned. Exploiting
a priori knowledge allows an improvement of the final 3D model both in terms of
quality and computational complexity. Techniques developed for general cases can
be improved and customized for particular objects.
In the following, the single steps which constitute the reconstruction of a 3D
digital model are described in depth and the approaches for implementing them
are discussed.
1.2.2 Generalization
The problem of surface reconstruction from point clouds has been extensively
studied in recent years. A detailed taxonomy of the reconstruction techniques can
be found in [164], where the methods are classified into two main classes: spatial
subdivision techniques and surface fitting methods.
1.2.3 Integration of multiple scans
When an object is not observable from a single sensor, or a very fine detail is
needed with respect to the object dimension, a data integration step is necessary.
In fact, in this case, scanning has to be performed from different points of view, and
the information captured from the different views has to be integrated in order to
compute a single representation. This operation can be divided into two steps:
• registration, where the different data are represented in the same reference frame;
• fusion, where a single model is generated combining the data from different
views.
Multi-view scanning can be achieved by acquiring the object in the same acquisition
session using multiple scanning devices, each aimed at a part of the object, or by
scheduling multiple acquisition sessions, where the additional sessions are aimed
at acquiring data in the object regions in which the reconstruction quality was
too poor.
1.2.3.1 Registration
Registration is an operation used in many fields. Whenever data coming
from different sources of information have to be related to the same reference
system, a registration stage has to be performed. We remark here that registration is
also required when different models of the same object have to be matched. In this
latter case, the models have to be scaled and aligned parallel to each other to allow
a proper comparison.
The registration of data from different views corresponds to the inversion of the
function that describes the relative motion of the object with respect to the sensor in
the two views. This process can be simplified if the motion is known, as when the
object is placed on a turntable, or if the absolute position of the sensors is available.
Fig. 1.10 Panel (a) shows the data before registration. If the registration procedure does not
accurately take into account the distribution of the registration error, as proposed in [193], cliques
of surfaces can be generated as an effect of the minimization of the local registration error on
partially overlapping scans, as shown in panel (b). The effect of having two surface cliques is
evident
Generally, each acquisition device has an error distribution which depends on the
features of the acquired object. For example, the measurement error on the data in
the regions close to the object’s border is, typically, greater than that in the central
regions parallel to the sensor. This information, acquisition and device dependent,
can be exploited in the registration procedure to weight the reliability of the points.
The ICP (Iterative Closest Point) algorithm is applied just to the vertices that have
a correspondence on the other mesh (i.e., they project onto a face of the other mesh)
with a distance smaller than a given threshold. The registration is performed,
initially, using a low-resolution representation of the surface (obtained by
downsampling the data) in order to increase the speed of convergence of the
alignment. The registration of multiple views is computed by choosing a reference
view and aligning the other views to it.
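The following is a minimal, generic ICP sketch in Python (using NumPy and SciPy), not the specific variant described above: it pairs points by nearest neighbor, discards pairs beyond a distance threshold, and estimates the rigid transform with an SVD; a coarse-to-fine strategy would simply run it first on downsampled copies of the point clouds.

```python
import numpy as np
from scipy.spatial import cKDTree

def icp_step(source, target, max_dist):
    """One ICP iteration: pair each source point with its nearest target
    point (pairs farther than max_dist are discarded, mimicking the
    distance threshold mentioned in the text), then compute the rigid
    transform that best aligns the retained pairs (Kabsch/SVD)."""
    tree = cKDTree(target)
    dist, idx = tree.query(source)
    keep = dist < max_dist
    src, dst = source[keep], target[idx[keep]]

    # Optimal rotation/translation in the least-squares sense.
    src_c, dst_c = src - src.mean(0), dst - dst.mean(0)
    U, _, Vt = np.linalg.svd(src_c.T @ dst_c)
    R = Vt.T @ U.T
    if np.linalg.det(R) < 0:              # avoid reflections
        Vt[-1] *= -1
        R = Vt.T @ U.T
    t = dst.mean(0) - R @ src.mean(0)
    return R, t

def icp(source, target, iters=20, max_dist=0.05):
    """Iterate until the alignment stabilises; a coarse-to-fine variant
    would first run this on downsampled copies of the point sets."""
    src = source.copy()
    for _ in range(iters):
        R, t = icp_step(src, target, max_dist)
        src = src @ R.T + t
    return src
```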
In [252] the registration is performed in two steps, starting from the surfaces built
for the different views. In the first phase, for each view, the points with a high
gradient value are selected; the hypothesis is that the polygonal curves that connect
these points constitute robust features. In the second phase the registration transform
(a roto-translation) is computed by aligning these curves.
1.2.3.2 Fusion
Even after registration, the surfaces belonging to different views may not overlap
perfectly, due to calibration and reconstruction errors that add to the measurement
error. Moreover, the two surfaces have to be fused so that their description does not
become denser in the overlapping region.
The simple unification of the registered data or the partial surfaces would produce
a poor representation of the object surface, with artifacts in the overlapping regions.
The aim of the fusion procedure is to come up with a single model, improving the
representation of the surface in the overlapping region (Fig. 1.11).
The information on the reliability of the sampled data in the two views can also be
used inside the fusion process to privilege the data from one of the two views.
For instance, in [81], where each view is a range image, each pixel of the view is
weighted with its distance from the closest border. The range images are registered
and a point cloud is obtained by merging them. The position of each point is
computed using a weighted average of the pixels of the multiple views. Since the
border pixels are not very reliable, their position in the point cloud is estimated
using pixels from the other (overlapping) range images, resulting in a smooth mapping
of those points in the final dataset.
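A minimal sketch of this kind of weighted blending is given below. It makes the simplifying assumption that the registered range images are already resampled on a common pixel grid (which is not how [81] operates in general) and weights each valid pixel by its distance to the closest invalid or border pixel.

```python
import numpy as np
from scipy.ndimage import distance_transform_edt

def fuse_range_images(depth_maps):
    """Blend several registered range images defined on the same pixel
    grid (a simplifying assumption). Each valid pixel is weighted by its
    distance to the closest invalid/border pixel, so unreliable border
    samples contribute little to the fused depth."""
    acc = np.zeros_like(depth_maps[0], dtype=float)
    wsum = np.zeros_like(acc)
    for d in depth_maps:
        valid = np.isfinite(d)
        w = distance_transform_edt(valid)       # distance to border, in pixels
        acc += np.where(valid, w * d, 0.0)
        wsum += w
    return np.where(wsum > 0, acc / np.maximum(wsum, 1e-12), np.nan)
```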
The procedure in [232] fuses the meshes from two views in two steps. In the first
step, the two meshes are joined together: the parts of the meshes that fall inside
the overlapping region are substituted by re-triangulating the vertices in that region.
Fig. 1.11 Two partial models (a) are registered (b) and merged (c)
Since this procedure can create small triangles in the overlapping regions, these
triangles and all the vertices that do not belong to any other triangle are eliminated
and the resulting holes are filled by re-triangulating the vertices on the hole border.
More recently, an interesting approach to fusion was presented in [176]. The
integration of the data from multiple views is here performed using the wavelet
transform: the original data are decomposed through the wavelet transform into
low- and high-frequency components, the components are integrated separately,
and finally they are composed back.
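As a hedged illustration of this idea, the sketch below uses the PyWavelets package to fuse two registered depth maps: it averages the low-frequency approximations and keeps the larger-magnitude detail coefficients. This is a generic wavelet-fusion recipe in the spirit of the description, not the specific method of [176].

```python
import numpy as np
import pywt

def wavelet_fuse(depth_a, depth_b, wavelet="db2", level=2):
    """Decompose two registered depth maps with a 2D wavelet transform,
    fuse the sub-bands separately (average the approximations, keep the
    larger-magnitude detail coefficients) and reconstruct."""
    ca, *details_a = pywt.wavedec2(depth_a, wavelet, level=level)
    cb, *details_b = pywt.wavedec2(depth_b, wavelet, level=level)

    fused = [(ca + cb) / 2.0]                        # low-frequency part
    for (ha, va, da), (hb, vb, db) in zip(details_a, details_b):
        fused.append(tuple(np.where(np.abs(x) >= np.abs(y), x, y)
                           for x, y in ((ha, hb), (va, vb), (da, db))))
    return pywt.waverec2(fused, wavelet)

# Tiny usage example with two nearly identical synthetic maps.
a = np.random.default_rng(0).random((64, 64))
b = a + 0.01
print(wavelet_fuse(a, b).shape)
```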
1.2.4 Optimization
Fig. 1.12 The poses of a chinchilla 3D model shown in panel (a) are obtained modifying the
mesh through a skeleton, reported in panel (b). These images have been obtained using the 3D
modeling software suite Blender [2], while the model has been realized in the Peach Project from
Blender.Org [1]
Watermarking consists in embedding a hidden marker into a 3D model so that its
owner can recognize it, for example for copyright reasons, without degrading the
geometry and the appearance of the model. Only a particular procedure allows the
extraction of the marker, and hence the recognition of the model.
The most critical feature of a watermarking procedure is its robustness: it has
to resist the common editing operations carried out on digital models, such as
scaling, resampling and segmentation.
Animation of a 3D model is obtained by associating a motion law (as a function of
time) with the surface points (Fig. 1.12b). However, in practice, specifying the motion
law for each point of a model is not feasible and a simplified solution is adopted.
This is based on the creation of a structure, called a skeleton, which is constituted of
a set of links connected by joints. The skeleton is associated with the surface through
a procedure called rigging: each point of the surface is associated with one (or more
than one) joint of the skeleton, possibly with different weights, and animation is
obtained through rotation of the joints.
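A common way to implement this weighted association is linear blend skinning; the sketch below is a minimal illustration of the idea (not tied to any specific package), where each vertex is moved by the weighted combination of the 4x4 rigid transforms of the joints it is bound to.

```python
import numpy as np

def skin_vertices(rest_vertices, weights, joint_transforms):
    """Linear blend skinning: every vertex is moved by a weighted
    combination of the rigid transforms (4x4 matrices) of the joints it
    is bound to. `weights` has shape (n_vertices, n_joints), rows sum to 1."""
    n = len(rest_vertices)
    homo = np.hstack([rest_vertices, np.ones((n, 1))])       # homogeneous coords
    # Blend the joint transforms per vertex, then apply them.
    blended = np.einsum("vj,jab->vab", weights, joint_transforms)
    skinned = np.einsum("vab,vb->va", blended, homo)
    return skinned[:, :3]

# Tiny example: two joints, the second one rotated 90 degrees about z.
rest = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0]])
w = np.array([[1.0, 0.0], [0.5, 0.5]])
identity = np.eye(4)
rot_z = np.eye(4)
rot_z[:2, :2] = [[0.0, -1.0], [1.0, 0.0]]
print(skin_vertices(rest, w, np.stack([identity, rot_z])))
```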
Fig. 1.13 Panels (a) to (f) show some intermediate steps of morphing an eagle model into
a parrot one using the software made available at [11]. Note the simultaneous change in the
geometry and color fields
The skeleton can be substituted by a low-resolution surface that can be directly
animated, as in [145][143]. Such models can be obtained by simplifying a high-resolution
mesh, so that the correspondence of this mesh with the original one is found
automatically.
Morphing is the operation used to transform the geometry and visual
appearance of an object into those of another object (Fig. 1.13). Morphing is realized
by computing a smooth time function that relates some features of the two
models; these can be, for instance, the positions of corresponding points
and their color values [144]. To obtain good results, the characteristic features of the
two models have to be correctly matched before morphing. For instance, in face
morphing, the nose of the original face should become the nose of the second
face. The presence of a hierarchical structure for the representation can simplify
morphing: in fact, as the most representative features of the objects should be
present at the large scales, the morphing transformation can be performed using
mainly the few features present in the layers at low resolution.
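Once the correspondences are available, the per-vertex blending itself is simple; the sketch below interpolates positions and colors with a smooth time law (a smoothstep function, chosen here only for illustration).

```python
import numpy as np

def morph(points_a, colors_a, points_b, colors_b, t):
    """Blend two models whose vertices (and colors) have already been put
    in one-to-one correspondence, for a time parameter t in [0, 1].
    Establishing that correspondence (the hard part, e.g. nose-to-nose)
    is assumed to have been done beforehand."""
    s = 3.0 * t**2 - 2.0 * t**3          # smoothstep: a simple smooth time law
    points = (1.0 - s) * points_a + s * points_b
    colors = (1.0 - s) * colors_a + s * colors_b
    return points, colors

# Usage: halfway between two tiny synthetic models.
a, b = np.zeros((3, 3)), np.ones((3, 3))
print(morph(a, a, b, b, t=0.5)[0])
```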
In the following we will analyze in depth the process of building a model, starting
from its acquisition and set-up procedures.
regions of the two partial models, described by a proper subset of the parameters
of the models. These are the only data that have to be processed to compute the
parameters of the fused model.
Finally, the multiresolution representation can be used to incorporate topology
information, since a low-detail model usually approximates well the topology of
the surface.
Chapter 2
Scanner systems
Generally, the first step in the creation of a 3D model consists in capturing the
geometric and color information of the physical object. Objects can be as small as
coins or as large as buildings; they can be still or moving while scanned, and this has
prompted the development of very different technologies and instruments. The aim
of this chapter is to present such technologies and to explain the techniques on which
3D scanners are based. A comparison in terms of accuracy, speed and applicability
is reported, in order to understand the advantages and disadvantages of the different
approaches. How to use the captured information to compute the 3D model will be
discussed in the next chapters.
Accuracy The accuracy is an index that describes how close the provided
measurements are to their true value. Different techniques can be used to determine
such an index; they can be classified into direct measurements (e.g., using a lattice of
known step) and indirect ones (e.g., the average distance of points sampled on a plane
from the optimal fitting surface).
Resolution The resolution measures the density of details that the system can recover
or, equivalently, the minimum distance at which two features can still be
discerned.
Speed The speed of an acquisition system is evaluated as the time required for
measuring a given feature (for example, the points per second that can be
sampled). Obviously, the kind of features measured is equally important.
Fig. 2.1 A 3D scanner taxonomy based on the physical principle exploited by the system for
measuring the object geometrical features
Generally, these systems enjoy a good accuracy (e.g., the Helmel Checkmaster
112-102 scanner system [3] has an accuracy of 8.6 μm) and they are used mostly in
manufacturing. They have two main disadvantages: the working volume is bounded
by the structure and sampling is carried out only along the vertical direction.
Fig. 2.2 Coordinate Measuring Machines: the Helmel Checkmaster 112-102 scanner system has
a working volume of 300 × 300 × 250 mm3 and an accuracy of 8.6 μm [3] (reproduced by
permission of Helmel Engineering Products, Inc., Niagara Falls, NY, USA)
It is worth noting that the tactile probe of both these systems could be substituted
with another kind of sensor, for instance an ultrasound sensor, to realize non-contact
measurement. In this case the system would become a non-contact 3D scanner.
Transmissive Systems
In transmissive systems the object has to be positioned between the emitter (which
irradiates the object) and the receiver (which collects the radiation attenuated by
the object). The main representative of this category is industrial Computed
Tomography. The radiation is generated by an X-ray tube by means of the collision of
electrons with particular materials, usually tungsten, molybdenum or copper. The
X-ray photons emitted by the collision penetrate the target object and are captured by
a 2D detector as a digital radiographic image. The density and the thickness of
the object affect the energy collected by the X-ray detector, and a 3D model can
be reconstructed from a set of 2D X-ray images of the object taken from different
views. The views can be obtained either by rotating the object and keeping the
source-sensor pair fixed (for example, positioning the object on a turntable) or by
keeping the object fixed and rotating the source-sensor pair (as in medical scanners,
where the system revolves around the patient). From this set of 2D radiographs, a
3D model, generally defined as an ensemble of voxels, can be reconstructed [88][107].
The three-dimensional resolution of the obtained model ranges from a few
micrometers to hundreds of micrometers, and depends on the X-ray detector pixel size
and on the set-up. This kind of system allows the reconstruction of the internal
structure of the object, and the method is largely unaffected by surface characteristics.
It is worth noting that both the density and the thickness of the object affect the
energy collected by the X-ray detector. Furthermore, tomographic reconstruction is
computationally intensive.
Reflective Systems
Reflective systems exploit the radiation reflected by the object surface to
estimate the position of the points on the surface. They can be classified according to
the type of radiation used. In particular, optical systems use optical radiation
(wavelength between 100 nm and 300 μm), while non-optical systems use other
types of waves to make the measurements. Since optical systems form the main
category of 3D scanners, they will be considered in depth in the next section.
The class of non-optical systems is constituted of devices based on radar or
sonar. Although the types of waves used are very different (radars use electromagnetic
microwaves while sonars use sound or ultrasound waves), both are based on
measuring the time-of-flight of the emitted radiation. The distance covered by the
radiation, which is assumed equal to twice the distance of the object from the
scanning device, can be computed from the time required for the wave to reach
the object and return to the system and from the known speed of the wave. As this
principle is also used by a class of optical scanners, it will be explained in more
detail in the next section.
Radar systems have a very large depth of field and can measure the surface of
objects up to 400 km away. Moreover, they can perform ground-penetrating
reconstructions [177][59]. Typical applications are air defense and air traffic control.
These systems are quite expensive and generally have a low accuracy.
Sonic waves are particularly advantageous underwater, where optical radiation
would be distorted or attenuated too much. Nevertheless, these systems are
characterized by a low accuracy due to a low signal-to-noise ratio.
Passive scanners do not emit any kind of radiation by themselves: they measure
the radiation reflected by the object's surface. Generally, they are based on the
Charge-Coupled Device (CCD) sensors employed in commercial digital cameras.
The camera collects images of the scene, possibly from different points of view
or with different lens arrangements. Then, the images are analyzed to compute the
3D coordinates of some points in the scene.
Passive scanners can be very cheap, as they do not need particular hardware.
However, they are generally not able to yield dense and highly accurate data.
Moreover, they require a large computational effort to produce a set of 3D points
sampled over the surface.
Fig. 2.3 The intersection of the projected rays gives the 3D position of the points
Although they share the same sensor technology, different families of passive
optical scanners can be found in the literature [246]; they are characterized by the
principle used to estimate the surface points: stereoscopic, silhouettes, texture (or
contour) and defocusing.
Stereoscopic systems
Stereoscopic systems are based on the analysis of two (or more) images of the
same scene, seen from different points of view. The 3D points of the scene are
captured by each camera through their 2D projections in the acquired images. The first
task of the reconstruction algorithm consists in identifying the 2D points in different
images that correspond to the same 3D point in the scene (the correspondence problem).
From the 2D points, the associated 3D point can be determined as the intersection
of the retro-projected rays (Figs. 2.3 and 2.4). This reconstruction method is known
as triangulation. It is worth noting that this method requires complete knowledge
of the set-up parameters: the (relative) position and orientation of the cameras,
as well as the cameras' internal parameters (focal length, optical center, pixel
size and distortion parameters). These parameters are determined during a phase
called calibration. Generally, this phase is performed before the scanning session
starts, using adequate known objects, like chessboards or simple objects on which
the correspondence problem can be easily solved. An estimate of the calibration
parameters can also be computed directly from the images of the object (exploiting
some additional constraints on the acquired model) [161].
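The triangulation step itself can be sketched as follows: given the calibrated camera centers and the retro-projected rays through a matched pair of pixels, the rays rarely intersect exactly because of noise, so the midpoint of the shortest segment between them is a common estimate (a minimal illustration, not a specific scanner's implementation).

```python
import numpy as np

def triangulate_midpoint(c1, d1, c2, d2):
    """Locate a 3D point from two retro-projected rays, each given by a
    camera centre c and a direction d (obtained from calibration and the
    matched 2D points). Noise makes the rays skew, so the midpoint of the
    shortest segment between them is returned."""
    d1, d2 = d1 / np.linalg.norm(d1), d2 / np.linalg.norm(d2)
    # Solve for the ray parameters s, t minimising |c1 + s*d1 - (c2 + t*d2)|.
    A = np.array([[d1 @ d1, -d1 @ d2],
                  [d1 @ d2, -d2 @ d2]])
    b = np.array([(c2 - c1) @ d1, (c2 - c1) @ d2])
    s, t = np.linalg.solve(A, b)
    p1, p2 = c1 + s * d1, c2 + t * d2
    return (p1 + p2) / 2.0

# Two cameras 1 m apart, both looking at the point (0.5, 0, 2).
p = np.array([0.5, 0.0, 2.0])
c1, c2 = np.array([0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0])
print(triangulate_midpoint(c1, p - c1, c2, p - c2))    # ~ [0.5, 0., 2.]
```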
Fig. 2.4 Example of a real pair of images in which corresponding salient points are highlighted
Shape-from-silhouettes systems
Fig. 2.5 In (a), (c), (e), (g), (i), and (k), the object is seen from different view angles; in (b), (d), (f),
(h), (j), and (l), the corresponding silhouette of the object for each view. From the silhouettes, only the
visual hull of the object can be computed and, as shown in the figure, the cavities of the object cannot
be detected and reconstructed
Such systems have the benefit of being easily realizable with low-cost hardware, but
they require a controlled turntable to move the object with respect to the camera.
Furthermore, they have the strong limitation that only convex objects can be
accurately reconstructed: the cavities of an object are not visible in the projected
silhouettes and cannot be reconstructed. This limits the use of these systems in many
real applications.
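A minimal voxel-carving sketch of the visual-hull idea is shown below: a grid point is kept only if it projects inside the silhouette of every view. The helper project(view_index, points), which maps 3D points to pixel coordinates using the calibration of that view, is a hypothetical placeholder.

```python
import numpy as np

def carve_visual_hull(silhouettes, project, grid_points):
    """Space-carving sketch of shape-from-silhouettes: a voxel (grid point)
    is kept only if it projects inside the silhouette in every view.
    `project(view_index, points)` is an assumed, calibration-dependent
    helper returning integer pixel coordinates (u, v) for that view."""
    inside = np.ones(len(grid_points), dtype=bool)
    for i, sil in enumerate(silhouettes):            # sil: boolean H x W mask
        u, v = project(i, grid_points)
        on_image = (u >= 0) & (u < sil.shape[1]) & (v >= 0) & (v < sil.shape[0])
        inside &= on_image & sil[np.clip(v, 0, sil.shape[0] - 1),
                                 np.clip(u, 0, sil.shape[1] - 1)]
    return grid_points[inside]                       # voxels of the visual hull
```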
Techniques that extract information about the object's shape from its texture or
contour provide useful clues for 3D digitization and are interesting results of
computer vision theory, but they are rarely implemented in real 3D scanners. In fact,
these techniques are not able to compute the 3D position of points on the object
surface, but only the surface curvature (up to a scale parameter) or its orientation.
Shape-from-texture is grounded on the hypothesis that the surface of the object
is covered by a texture characterized by a pattern that is repeated with regularity.
The curvature of the surface can be computed by analyzing the distortion of the texture,
and the field of local surface normals can be estimated from the analysis of local
inhomogeneities [20]. Furthermore, diffuse illumination of the scene is required,
as shading can influence the texture analysis.
A similar technique is called shape-from-contour. In this case the surface
orientation is computed by analyzing the distortion of a contour of known shape. For
example, if the object contour is known to be a circle (e.g., a coin), while the contour
of the acquired object is elliptical, the surface orientation that produces this distortion
under perspective projection can be estimated.
Shape-from-defocus systems
Time-of-flight scanners
Time-of-flight (ToF) scanners measure the distance of a point on the surface from
the scanner through the measurement of the time employed by the radiation to reach
the object and come back to the scanner. Knowing the speed of the radiation and
measuring the round-trip time, the distance can be computed. Moreover, from the
direction of the emitted radiation, one 3D point of the surface can be identified
[110][9]. Hence, by changing the direction of the emission, the system samples
a different point over the surface and can therefore cover the entire field of
view.
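The range computation is straightforward; the sketch below converts one round-trip time and the two emission angles (assumed here to be azimuth and elevation in the scanner frame) into a 3D point.

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0            # m/s

def tof_point(round_trip_time, azimuth, elevation):
    """Convert one time-of-flight measurement into a 3D point in the
    scanner frame: the range is half the distance travelled by the pulse,
    and the emission direction (two angles, in radians) places the point."""
    rng = SPEED_OF_LIGHT * round_trip_time / 2.0
    direction = np.array([np.cos(elevation) * np.cos(azimuth),
                          np.cos(elevation) * np.sin(azimuth),
                          np.sin(elevation)])
    return rng * direction

# A pulse returning after ~66.7 ns corresponds to a point about 10 m away.
print(tof_point(2 * 10.0 / SPEED_OF_LIGHT, azimuth=0.1, elevation=0.05))
```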
Depending on the type of waves used, such devices are classified as optical
radar (optical waves), radar (electromagnetic waves at low frequency) and sonar
(acoustic waves). The systems that use optical waves are the most widely used; they
are sometimes referred to as LIDAR (LIght Detection And Ranging) [241]
or LADAR (LAser Detection And Ranging) [16] (Fig. 2.6). They are characterized
by a relatively high acquisition speed (10,000–100,000 points per second) and
their depth of field can reach some kilometers.
Generally, the accuracy of optical ToF scanners is limited because of the high
acquisition speed. In fact, a depth accuracy of 1 mm requires a time measurement
accuracy in the order of picoseconds. Hence, these systems are generally applied
in surveying very large and distant objects, such as buildings and geographic features.
The optical properties and the orientation of the surface with respect to the ray
direction affect the energy collected by the photodetector and can cause a loss of
accuracy.
As said above, these systems are often used for geographic reconstruction. Aerial
laser scanning is probably the most advanced and efficient technique to survey a
wide natural or urban territory. These systems, mounted on an airplane or on a
helicopter, work by emitting and receiving up to 100,000 laser pulses per second. The
laser sensor is often coupled with a GPS satellite receiver, which allows recovering
the scanner position with good accuracy for each acquired point. Hence, each point
can be referred to the same reference system and the acquired points (which form a
dense point cloud) can be related to a cartographic reference frame, yielding an
extremely detailed description of the covered surface [239].
ToF scanners are also often used in environment digitization. A relatively recent
application is the digital reconstruction of crime scenes: using the constructed
digital model, the police are helped in the scene analysis task.
For this last application, the typical scanner is constituted of a rotating head
which permits a wide field of view. For example, the Leica ScanStation C10
has a field of view of 360° horizontally and 320° vertically [8] (Fig. 2.6).
Another kind of ToF system is the Zcam [129] (produced by 3DV Systems and
later acquired by Microsoft), which provides in real time the depth information of
the observed scene. The scene is illuminated by the Zcam, which emits pulses of
infra-red light; the reflected light from the scene is then sensed pixel-wise. Depending
on the sensed distance, the pixels are arranged in layers. The distance information is
output as a gray-level image, where the gray value correlates with the relative distance.
Phase-shift scanners use a laser beam whose intensity is sinusoidally modulated over
time [5]. From the phase difference between the emitted and the reflected signal the
round-trip distance can be computed, as the difference is proportional to the traveled
distance (Fig. 2.7a). Since the phase can be distinguished only within one
period, the periodicity of the signal creates ambiguity. To resolve this ambiguity,
signals at multiple frequencies are used. This method has performances quite similar
to those of the ToF method, but can reach a higher acquisition speed (500,000 points per
second).
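A sketch of the distance computation and of a simple two-frequency disambiguation is given below (illustrative only; the frequencies and the combination rule are assumptions, not taken from a specific scanner).

```python
import numpy as np

SPEED_OF_LIGHT = 299_792_458.0   # m/s

def distance_from_phase(phase, freq):
    """Distance implied by a phase difference at one modulation frequency:
    d = c * phase / (4 * pi * f). Valid only up to the ambiguity interval
    c / (2 f), because the phase wraps every period."""
    return SPEED_OF_LIGHT * phase / (4.0 * np.pi * freq)

def resolve_two_frequencies(phase_coarse, f_coarse, phase_fine, f_fine):
    """Two-frequency disambiguation: the low (coarse) frequency gives a
    rough but unambiguous distance, which selects how many whole ambiguity
    intervals to add to the precise but wrapped fine estimate."""
    d_coarse = distance_from_phase(phase_coarse, f_coarse)
    interval_fine = SPEED_OF_LIGHT / (2.0 * f_fine)        # ambiguity interval
    d_fine = distance_from_phase(phase_fine, f_fine)
    k = np.round((d_coarse - d_fine) / interval_fine)
    return d_fine + k * interval_fine

# A target at 7.30 m, measured with 1 MHz (coarse) and 30 MHz (fine) modulation.
d_true, f_c, f_f = 7.30, 1e6, 30e6
wrap = lambda x: np.mod(x, 2 * np.pi)
phi_c = wrap(4 * np.pi * f_c * d_true / SPEED_OF_LIGHT)
phi_f = wrap(4 * np.pi * f_f * d_true / SPEED_OF_LIGHT)
print(resolve_two_frequencies(phi_c, f_c, phi_f, f_f))     # ~7.30
```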
In active triangulation systems a light beam, typically a laser, is projected onto the
object; the interaction of this radiation with the object surface produces a spot, which
can be detected by a sensor (typically a CCD camera [13]). The orientation and the
position of the source and of the sensor are typically known. From the spot location
on the sensor, the line between the sensed spot and the camera center can be
computed (line l1 in Fig. 2.8a). As the direction of the laser is known, the line between
the laser emitter and the spot can also be computed (line l2 in Fig. 2.8a). The 3D position
of the spot on the object surface can then be computed by triangulation as the
intersection of l1 and l2 (Fig. 2.8a). If the laser orientation and position
are not known, it is possible to compute the coordinates using two or more cameras
and standard stereoscopic methods [48].
Such systems acquire no more than one point per frame. If more complex
patterns, like lines or grids, are used, more points per frame can be captured. As a
laser line describes a plane in 3D space, the camera captures the contour resulting
from the intersection of this plane and the object surface. Then, for any pixel of
the contour, the corresponding 3D point on the object surface can be found by
intersecting the ray passing through the pixel with the 3D plane described by the
laser. The use of a grid allows sampling the surface at the grid crossings, which can
be robustly identified through standard corner detectors [229]. However, identifying
the grid corners can be more complex than in the single-beam case.
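For the sheet-of-light case, the per-pixel computation reduces to a ray-plane intersection; a minimal sketch (assuming the camera calibration and the laser plane are known) is shown below.

```python
import numpy as np

def ray_plane_intersection(camera_center, ray_dir, plane_point, plane_normal):
    """Sheet-of-light triangulation for one contour pixel: intersect the
    viewing ray through that pixel (camera centre plus direction, known
    from calibration) with the plane swept by the laser line."""
    denom = ray_dir @ plane_normal
    if abs(denom) < 1e-12:
        raise ValueError("ray is parallel to the laser plane")
    t = ((plane_point - camera_center) @ plane_normal) / denom
    return camera_center + t * ray_dir

# Laser plane x = 0.2 (normal along x), camera at the origin looking along +z.
p = ray_plane_intersection(np.zeros(3), np.array([0.1, 0.0, 1.0]),
                           np.array([0.2, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
print(p)   # the ray hits the plane at [0.2, 0., 2.]
```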
More complex structured patterns have been proposed, together with specific
techniques to reconstruct the surface from them using a calibrated camera-projector
pair [27]. The aim of these techniques is to identify each point in a robust
way. For instance, in [249] colored stripes are proposed. However, this method
presents a severe drawback: both the surface color of the object and the ambient light
influence the color of the reflected light. For this reason, to reconstruct a colored
(or textured) object, other kinds of coding are preferred.
In [206], a structured-light range-finder scanner composed of a projector and a
camera that uses temporal stripe coding is proposed. For every frame, a different
pattern of white and black stripes is projected over the object, and the camera
captures the stripe pattern distorted by the surface of the illuminated object.
The profiles of the object are identified by detecting the transitions from black to
white and from white to black over the image and over time. Actually, the entities used
for carrying the code are not the stripes but the stripe boundaries: in this way, a more
efficient coding is possible. In fact, a single stripe can carry one bit (the stripe can be
black or white), while a boundary can carry two bits (it can be white-black or black-white).
This method has been shown to run at 60 Hz. Four successive frames are exploited
to acquire and reconstruct the model from the current scanning view as a matrix of
115×77 points. To obtain the whole model the object is rotated continuously while
partial models are reconstructed; at the end, the partial models are fused together.
Another very efficient scanner system is proposed in [127]. This system sequentially
projects three phase-shifted sinusoidal gray-scale fringe patterns that are captured
by a camera. The analysis of the intensity of three consecutive frames allows
computing, for each pixel, the phase of the three patterns. This information allows
identifying a specific fringe element. The pattern phase maps can hence be converted
to a depth map by a phase-to-depth conversion algorithm based on triangulation.
The scanner has been shown to measure up to 532×500 points every 0.025 s (about
10^7 points/s), achieving a system accuracy of 0.05 mm. For example, the system has
been claimed able to measure human faces, capturing 3D dynamic facial changes.
In order to provide a high-definition real-time reconstruction, the processing power
of a Graphics Processing Unit (GPU) [155][179] is employed, taking advantage of
the highly parallel structure of the reconstruction algorithm.
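For the per-pixel phase recovery, the classical three-step phase-shifting relation can be used; the sketch below applies the textbook formula for patterns shifted by 120 degrees (this is the standard relation, not necessarily the exact variant used in [127]).

```python
import numpy as np

def three_step_phase(i1, i2, i3):
    """Wrapped phase from three fringe images shifted by 120 degrees,
    using the standard three-step phase-shifting formula
    phi = atan2(sqrt(3) * (I1 - I3), 2 * I2 - I1 - I3).
    Unwrapping and phase-to-depth conversion are separate steps."""
    return np.arctan2(np.sqrt(3.0) * (i1 - i3), 2.0 * i2 - i1 - i3)

# Synthetic check: rebuild a known phase ramp from three shifted patterns.
phi_true = np.linspace(-np.pi + 0.01, np.pi - 0.01, 5)
patterns = [0.5 + 0.4 * np.cos(phi_true + k * 2 * np.pi / 3) for k in (-1, 0, 1)]
print(three_step_phase(*patterns))   # ~ phi_true
```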
A very popular device recently introduced for gaming control, the Microsoft Kinect,
is in fact a real-time 3D scanner. This device is equipped with an RGB camera, an
infrared camera, and an infrared projector. The infrared camera captures images
of the scene illuminated by the infrared projector with a static structured pattern
constituted of pseudo-random dots of different diameters (Fig. 2.9a). This simplifies
the identification of the spots in the infrared camera images. The reconstruction is
performed by triangulation, similarly to spot-based scanners. This system allows
the acquisition of a 640 × 480 matrix of points at 30 Hz. The working range of the
Kinect is 0.8–3.5 m, while the accuracy decreases with the object distance.
Fig. 2.9 In panel (a) the IR pattern projected by Kinect is shown. In panel (b) a 3D model obtained
using Kinect [174] (reproduced by permission of IEEE)
Because of its low cost (about 100 US dollars), the Kinect is used today in many research
projects as a 3D scanner for indoor reconstruction [174] (Fig. 2.9b) and for robust tracking.
The Kinect has recently been explored as the input device of a platform whose goal is to
move rehabilitation to the home [12][187] (Fig. 2.10).
Active triangulation systems are in general characterized by a good accuracy
and are relatively fast. Their main limitation is the size of the
scanning field. As the coordinates are computed by means of triangulation, the
accuracy of these systems depends on the angle formed by the emitted light and
the line of sight or, equivalently, on the ratio between the emitter/sensor distance
and the distance of the object (d/l in Fig. 2.8). In the optimal set-up, this
angle should be 45° [44]. Moreover, the resolution of the CCD and the resolution and
power of the light emitter further limit active triangulation systems to a depth
of field of a few meters. Hence, these systems are not usable for the digitization of large
objects. Furthermore, the object's color and the ambient illumination may interfere with
the measurement.
Fig. 2.11 Example of the shape information contained in the shading: different light sources
produce different gray level gradients on the image. In each figure, the light comes from a different
corner
In shape-from-shading techniques, knowledge of the reflection properties of the object
surface is also required. In particular, the surface should be Lambertian, namely the
apparent brightness of the surface has to be the same when the observer changes the
angle of view, and the albedo (i.e., the fraction of light that is reflected) should be known.
As the method implies the use of a known radiation, it can be considered as
belonging to the class of active systems. Under these conditions the angle between the
surface normal and the incident light can be computed. However, in this way the
surface normals are only constrained to lie on cones around the light direction. Hence,
the surface normal at a given point is not unique and it is derived by considering also
the value of the normals in a neighborhood of the considered point. Moreover, the
assumption of a smooth surface is often made.
When a photometric stereo technique is used, the problem is simplified by
illuminating the scene from different positions [122][120]. With this technique,
introduced in [248], the estimate of the local surface orientation can be obtained
using several images of the same surface taken from the same viewpoint, but under
illumination coming from different directions (Fig. 2.11). The light sources
are ideally point sources, whose positions with respect to the reference system are
known, and they are oriented in different directions. The lights are activated one at
a time, one for each captured frame, so that in each image there is a well-defined
Fig. 2.12 A scheme of Moiré Interferometry system is represented. A regularly spaced grid is
projected on the surface of the object and the scene is observed by a camera through a reference
grid
light source direction from which to measure the surface orientation. By analyzing the
sequence of intensity changes of a region, a single value for the surface normal
can be derived. In general, for a Lambertian surface, three different light directions
are enough to resolve the uncertainty and compute the normals. This approach is more
robust than shape-from-shading, but the use of synchronized light sources
implies a more complicated 3D system, which can strongly limit the acquisition
volume. On the other hand, the availability of images taken under different lighting
conditions allows a more robust estimate of the color of the surface.
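Under the Lambertian assumption the per-pixel recovery reduces to a small linear system; the sketch below solves it in the least-squares sense for three or more known light directions (a minimal illustration of classical photometric stereo, not of a specific scanner).

```python
import numpy as np

def photometric_stereo(intensities, light_dirs):
    """Per-pixel Lambertian photometric stereo: with images taken under
    three or more known light directions, solve I = L @ (albedo * n) in the
    least-squares sense, then split the result into albedo and unit normal.
    `intensities` has shape (n_lights, n_pixels), `light_dirs` (n_lights, 3)."""
    L = np.asarray(light_dirs, dtype=float)
    g, *_ = np.linalg.lstsq(L, np.asarray(intensities, dtype=float), rcond=None)
    albedo = np.linalg.norm(g, axis=0)
    normals = g / np.maximum(albedo, 1e-12)
    return normals.T, albedo                       # (n_pixels, 3), (n_pixels,)

# One pixel with normal (0, 0, 1) and albedo 0.8, seen under three lights.
lights = np.array([[0.0, 0.0, 1.0], [0.6, 0.0, 0.8], [0.0, 0.6, 0.8]])
true_n, rho = np.array([0.0, 0.0, 1.0]), 0.8
I = rho * lights @ true_n                          # Lambertian image values
n_est, rho_est = photometric_stereo(I[:, None], lights)
print(n_est, rho_est)                              # ~[[0, 0, 1]], ~[0.8]
```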
Moiré interferometry (Fig. 2.12) allows a high accuracy (on the order of micrometers),
but only over a very small field of view. In fact, the projected grid has to be very dense
(e.g., 1000–2000 lines/mm). This characteristic limits the method to microscopic
reconstruction.
The systems above have been classified in a taxonomy that privileges the physical
principle exploited to extract the 3D information. However, other properties of the
scanning systems could be used as a classification key, like accuracy, resolution, or
speed of acquisition. Nevertheless, these properties are more related to an actual
implementation of the system than to a class of scanners and hence they are not
suited for a robust classification. On the other hand, these properties have to be
considered when a scanning system has to be chosen and are often critical for the
choice itself. In Table 2.1, the data of some commercial scanners are reported to
illustrate the variegated scenario. Depending on the target application, the critical
capabilities can rank the scanners differently and guide the choice.
In [57] a method for evaluating 3D scanners is suggested. It considers some
important features, like field of view, accuracy, physical weight and scanning time,
and associates a weight to each feature. By scoring each feature, a single overall score
can be computed for each scanner as the weighted sum of the single feature scores.
The scores given to each feature depend on the application domain, which makes the
scoring somewhat arbitrary.
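The scoring scheme amounts to a weighted sum; the tiny sketch below illustrates it with made-up feature names, weights and scores (illustrative values only, not taken from [57]).

```python
# Weighted scoring sketch: all names, weights and scores are illustrative.
weights = {"field_of_view": 0.3, "accuracy": 0.4, "weight": 0.1, "scan_time": 0.2}

scanners = {
    "scanner_A": {"field_of_view": 7, "accuracy": 9, "weight": 5, "scan_time": 6},
    "scanner_B": {"field_of_view": 9, "accuracy": 6, "weight": 8, "scan_time": 8},
}

def overall_score(feature_scores):
    """Weighted sum of the per-feature scores."""
    return sum(weights[f] * s for f, s in feature_scores.items())

for name, scores in scanners.items():
    print(name, overall_score(scores))
```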
Anyway, the three principal aspects that should be considered in order to choose
a 3D scanner are the properties of the objects to be acquired (size and material
features), the required accuracy, and the budget, under auxiliary constraints such as
the speed of acquisition and the environmental conditions. Some attention should
also be paid to human factors: some devices require a deep knowledge of the
principles exploited by the scanner and can be used only by trained people.
An important issue associated with all 3D scanners is the calibration procedure.
Generally, 3D scanners have different set-ups, and a reliable acquisition of a point
cloud is possible only if the set-up parameters are known. The aim of the
calibration phase is to estimate these parameters. This phase is critical for many
types of scanners because the precision of the system is typically tightly connected
to the quality of the calibration performed. Moreover, the time employed to calibrate
can be very long: the procedure can take several minutes, even though the scanning
sessions may require just a few seconds. Calibration procedures typically used for
video systems have been employed [256][230][41]. When only a subset of the calibration
parameters is required, simplified calibration procedures can be adopted [45][106].
This is, for instance, the case of the Kinect 3D camera, for which simplified procedures
have been developed [108][218] that take into account that the relative arrangement
of the sensors is known. A particular issue is represented by the distortions introduced
by the optics. These can be effectively corrected in the field through adequate
interpolations [61][116].
However, as the available computational power increases, more attention is paid by
scanner designers to making the systems more user-friendly. In fact, in the last
decade many research reports have addressed the estimate of the calibration
parameters without a proper calibration stage. Along this track, an interesting approach,
mainly oriented to stereoscopic techniques, is passive 3D reconstruction, which
allows estimating the calibration parameters after the acquisition session.
Since devices for the reproduction of 3D content are becoming widely available
to the consumer market, compact and easy-to-use devices for capturing 3D content
are likely to be proposed. Hence, it can be envisioned that the miniaturization of
components such as CCD sensors or pico-projectors will be exploited for implementing
very small, point-and-click optical devices. Light-field photography
is another promising technology that can potentially be used for capturing the 3D
properties of real objects. A light-field acquisition device is a camera that
can capture, for each pixel, not only the intensity and the color of the incoming
light rays irradiated from the scene, but also their direction (that is, a sampling
of what is called the plenoptic function of the scene). This information can be
processed through computational photography techniques in order to render the
desired image. The principal advantage of such technology is the possibility of
choosing a posteriori the settings of the camera (e.g., focus distance, aperture), but
it also enables the extraction of the 3D information of the scene. Although some
commercial models are available, this technology is not yet mature for a large
consumer market, but in this fast-moving field we can easily envision its broad
availability in the near future.
Chapter 3
Reconstruction
Once the geometrical data of a physical object have been sampled on its surface,
the next step is the generalization of the sampled data to obtain a continuous
description of the object surface, to which visual attributes (like color, textures, and
reflectance) can be associated. In this chapter an overview of the techniques used
for generalization is presented. These can be subdivided into two broad families:
volumetric and surface fitting methods.
The aim of the reconstruction phase is to find the best approximating surface for
the collection of data gathered in the acquisition phase. There are two aspects that
characterize this problem: the a priori knowledge about the object and the kind of
information collected in the acquisition phase.
In order to solve this problem, a paradigm for surface representation has to
be chosen such that a priori knowledge about the problem can be encoded.
For example, if the acquisition is restricted to a certain class of objects, a parametric
model can be used to represent the objects of that class. The reconstruction of the
surface becomes a search of the parameters that best fit the acquired data.
If the a priori knowledge about the object is limited or the class of objects is large,
the reconstruction paradigm should have many parameters in order to accommodate
many different shapes and the reconstruction technique should be more complex.
The problem of surface reconstruction can be formulated as an optimization
problem. Although a model-based approach is more robust [78], it can be applied
only in a few cases. The general case, in which a priori knowledge is strongly limited,
is examined here.
3.2 Spatial subdivision techniques
Spatial subdivision techniques are based on searching the volume occupied by the
object. The algorithms of this class are composed of the following steps [164]:
1. decomposition of the point space into cells,
2. selection of the cells crossed by the surface,
3. surface computation from the selected cells.
In [19] the bounding box of the points is regularly partitioned into cubes
(Fig. 3.1a–b). Only the cubes that contain at least one data point are considered
(Fig. 3.1b). A first surface approximation is composed of the exterior faces of the
cubes, that is, the faces not shared by two cubes. These faces are then diagonally divided
in order to obtain a triangular mesh (Fig. 3.1c). A smoother version of this surface
is obtained by moving each vertex of the mesh to the point resulting from a weighted
average of its old position and the positions of its neighbors (Fig. 3.1d). Then, the
reconstruction is improved by adapting the resulting surface to the data points.
A critical factor of the volumetric methods based on a regular lattice is the size
of the cells.
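As an illustration of the first two steps, the following is a minimal sketch, assuming a NumPy point cloud, that partitions the bounding box into cubes of a given edge length and keeps only the occupied ones (the function name and the cell size are illustrative choices, not part of the method in [19]).

import numpy as np

def occupied_cubes(points, cell):
    """Partition the bounding box of `points` into cubes of edge `cell`
    and return the integer indices of the cubes that contain at least
    one data point (steps 1 and 2 of the spatial subdivision scheme)."""
    origin = points.min(axis=0)
    # Map every point to the integer coordinates of the cube containing it.
    idx = np.floor((points - origin) / cell).astype(int)
    # Keep each occupied cube only once.
    return np.unique(idx, axis=0)

# Example: a random cloud of 1000 points in the unit cube.
pts = np.random.rand(1000, 3)
cubes = occupied_cubes(pts, cell=0.1)
print(cubes.shape)  # (number of occupied cubes, 3)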
In [25] tetrahedral cells are used. First of all, a rough approximation of the surface is computed using α-shapes (see Glossary 7.3.2). For each point, a scalar that represents the signed distance between the point and this surface is computed. The algorithm starts with a single tetrahedron that includes all the points, which is then split into four tetrahedrons by connecting the barycenter to the four vertices; for each of the five points, the signed distance from the approximation surface is computed.
The tetrahedrons whose vertices have the same sign of their scalar (all positive or
all negative) are deleted. For each of the remaining tetrahedrons an approximation
of the surface that passes through it (a Bernstein-Bézier polynomial surface) is
computed. If the approximation error of the surface with respect to the points inside
the tetrahedron is over a given threshold, the procedure is iterated.
In [182] the set of points is partitioned by means of the k-means clustering
algorithm. Each cluster is represented by a planar polygonal approximation and the
model obtained is used as a first approximation of the surface. In order to avoid the
problem of putting in the same cluster points from different faces of a thin object,
the points’ normals (estimated by the local properties of the dataset) are also used
in clustering.
In [168] a mesh deforming procedure is used. The initial model is a non-
self-intersecting polyhedron that is either embedded in the object or surrounds
the object in the volume data representation. Then the model is deformed with
respect to a set of constraints. With each vertex, a function that describes the cost of local deformations is associated; the function considers the noise of the data, the reconstruction features, and the simplicity of the mesh.
In [19] a polygonal model is transformed into a physical model and is used in a data-driven adaptation process. The vertices are seen as points with mass and the edges are seen as springs. Moreover, each vertex is connected to the closest data point by a spring. The evolution of the model can be expressed as a system of linear differential equations and the solution can be found iteratively as the
equilibrium state. After the computation of the equilibrium state, the physical model
The parameter η is the learning rate and controls the speed of adaptation, while the value assumed by hj(k) is inversely proportional to the distance between units j and k, evaluated on the reticular structure (topological distance). When applied to three-dimensional data, the reticular structure represents the polygonal surface: it is a 2D manifold embedded in the 3D space that smoothly approximates the input data points.
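In its standard form, the update of the position wj of unit j after the presentation of an input x whose best-matching unit is k reads:
$$\mathbf{w}_j \leftarrow \mathbf{w}_j + \eta\, h_j(k)\, \big(\mathbf{x} - \mathbf{w}_j\big)$$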
The SOM presents two kinds of problems: convergence and dead units. The quality of the result is greatly affected by the initial position of the units: a good starting configuration saves computational time; hence, a preprocessing phase devoted to obtaining an initial model close to the data can be useful. Moreover, some units, called dead units, may form. These are units that do not move significantly during the adaptation phase because they are too far from any of the data. These units may significantly degrade the quality of the obtained surface. To solve the problem of dead units, a two-step algorithm is proposed in [23]. In the first step (direct adaptation), the input is presented to the SOM and the corresponding units are updated, as in the standard algorithm, while in the second step (inverted adaptation) the units are randomly chosen and updated using the closest input data. Hence, potential dead units can be brought near the data and can effectively participate in the fitting. A similar technique is used in [26],
where the attraction of data points outside the SOM mesh is increased. Hence,
especially during the first steps of learning, the mesh is expanded toward the external
points (Fig. 3.2a–c). When the SOM lattice is exploited to represent the surface
Fig. 3.2 3D surface reconstruction through SOM. The data set in panel (a) is reconstructed by
the SOM shown in panel (b) as a lattice and in panel (c) as a shaded surface [26] (reproduced by
permission of IEEE). The SOM training procedure has been modified to improve the reconstruction
accuracy at the border of the dataset, by increasing the attraction of the data points that lie outside
the mesh. Spurious triangles are a common drawback of SOM based surface reconstruction meshes.
In panel (d), this defect is evident in the region of the bunny’s ears. In [162], an iterative procedure
based on SOM is proposed. A low resolution mesh is adapted using the SOM training algorithm.
At the end of the training, unstable vertices are removed and large triangles are subdivided. If the reconstruction accuracy of the resulting refined mesh does not satisfy the specifications, the procedure can be iterated. This allows obtaining a sequence of meshes at increasing resolution, such as those in panel (e), as shown in [162] (reproduced by permission of IEEE)
that reconstructs the data points, some precautions have to be taken to avoid self-
intersection and spurious triangles. For instance, in [162], the surface reconstruction
has been structured in three phases. The mesh obtained by the training of the SOM is
processed to find a better fitting to the data by applying mesh operators such as edge
swap and vertex removal to the unstable vertices of the mesh (i.e., those vertices
Fig. 3.3 Example of α-shape computation in 2D space. In (a) the Delaunay triangulation for a set of six points is shown; in (b), (c), and (d) the results of the edge elimination based on three different values of α are reported
that are relatively far from the data points). Finally, the mesh is refined by adding new vertices: large triangles are subdivided and vertices with high connectivity are split. The process can then be iterated, obtaining a sequence of meshes at increasing resolution (Fig. 3.2d–e).
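A minimal, self-contained sketch of the basic idea is reported below: a regular 2D lattice of units is fitted to a 3D point cloud with the classical SOM update rule. The decay schedules and the Gaussian neighborhood are illustrative assumptions and do not reproduce the specific procedures of [23], [26], or [162].

import numpy as np

def som_fit(points, grid=(20, 20), iters=5000, eta0=0.5, sigma0=5.0):
    """Adapt a regular 2D lattice of units (a 2-manifold embedded in 3D)
    to a 3D point cloud with the classical SOM update rule."""
    rows, cols = grid
    # Lattice coordinates of the units, used for the topological distance.
    ij = np.array([(i, j) for i in range(rows) for j in range(cols)], float)
    # Initialize the unit positions randomly inside the bounding box of the data.
    lo, hi = points.min(0), points.max(0)
    w = lo + np.random.rand(rows * cols, 3) * (hi - lo)
    for t in range(iters):
        eta = eta0 * (1.0 - t / iters)              # decaying learning rate
        sigma = sigma0 * (1.0 - t / iters) + 1e-3   # decaying neighborhood width
        x = points[np.random.randint(len(points))]
        k = np.argmin(((w - x) ** 2).sum(1))        # best-matching unit
        # Gaussian neighborhood h_j(k) evaluated on the lattice (topological distance).
        d2 = ((ij - ij[k]) ** 2).sum(1)
        h = np.exp(-d2 / (2 * sigma ** 2))
        w += eta * h[:, None] * (x - w)
    return w.reshape(rows, cols, 3)

# Usage: lattice vertices fitted to a noisy hemisphere sampled as a point cloud.
theta, phi = np.random.rand(2, 2000) * [[np.pi / 2], [2 * np.pi]]
cloud = np.c_[np.sin(theta) * np.cos(phi), np.sin(theta) * np.sin(phi), np.cos(theta)]
vertices = som_fit(cloud)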
There are some methods based on the deletion of elements. This approach is
called sculpturing or carving. In [85] a reconstruction algorithm based on α-shape
(see Glossary 7.3.2) has been proposed. Delaunay triangulation (see Glossary) is
first computed for the given data set and then every tetrahedron which cannot be
circumscribed by a sphere of radius α is deleted: the remaining triangles compose
the α-shape of the dataset. It is worth noting that for α = 0 the α-shape is the dataset
itself, while for α = ∞ the α-shape is the Delaunay triangulation of the dataset.
In Fig. 3.3a–d the working principle for the 2D case is represented. In Fig. 3.4,
instead, the surfaces reconstructed from a cloud of points through 3D α-shape for
several values of α are reported [226].
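A hedged 2D sketch of the principle follows: the Delaunay triangulation is computed with SciPy and only the triangles whose circumradius is below α are kept (a simplified criterion rather than the exact sphere test of [85]).

import numpy as np
from scipy.spatial import Delaunay

def alpha_shape_2d(points, alpha):
    """Return the triangles of the Delaunay triangulation whose
    circumradius is smaller than alpha (a 2D alpha-complex)."""
    tri = Delaunay(points)
    kept = []
    for simplex in tri.simplices:
        a, b, c = points[simplex]
        # Side lengths and area of the triangle.
        la, lb, lc = (np.linalg.norm(b - c), np.linalg.norm(a - c),
                      np.linalg.norm(a - b))
        area = 0.5 * abs((b[0] - a[0]) * (c[1] - a[1])
                         - (b[1] - a[1]) * (c[0] - a[0]))
        # Circumradius = product of the side lengths / (4 * area).
        if area > 1e-12 and (la * lb * lc) / (4.0 * area) < alpha:
            kept.append(simplex)
    return np.array(kept)

pts = np.random.rand(200, 2)
triangles = alpha_shape_2d(pts, alpha=0.1)  # alpha controls the level of detail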
The following criterion is adopted for the computation of the faces that constitute
the surface: the two spheres of radius α that pass through the vertices of each triangle
are computed. The triangle belongs to the surface if at least one of the two spheres
does not contain any other point of the dataset. This method is very simple and
has only α as global parameter. If the sampling density is not uniform, the use of
a single α value can cause a poor reconstruction. In fact, if the value of α is too
large, the reconstruction can be too smooth, while for a too small value of α the
reconstruction can have undesired holes. In [226], a suitable value for α is locally
Fig. 3.4 Example of α-shape computation in 3D space. In (a) the point cloud is shown; in (b)–(e) the α-shape surface reconstructions using α = {0.19, 0.25, 0.75, ∞} are reported [226] (reproduced by permission of IEEE)
Fig. 3.5 Carving associated with 3D Delaunay tetrahedralisation [181]. The real object in panel (a) is observed from different points of view and the images are used to obtain the point cloud in panel
(b). Their 3D Delaunay tetrahedralisation is reported in panel (c). It is then carved by eliminating
all the triangles that occlude the background elements present in the acquired images, obtaining
the surface mesh in panel (d). Texture can then be applied obtaining the 3D textured model shown
in panels (e) and (f) (reproduced by permission of IEEE)
adapted to the data by considering the normals of the potential triangle vertices; if they are almost collinear, the α value is locally shrunk to eliminate the triangle from the mesh.
In [181], a carving technique based on background occlusion is proposed (Fig.
3.5). It should be noted that it requires auxiliary information that can be gathered
during the scanning session. A method that, from the set of the tetrahedrons obtained
from the Delaunay triangulation, computes the elimination of those tetrahedrons
considered external to the object is proposed in [40]. The external tetrahedrons are
selected using a heuristic (the tetrahedrons that have the following elements toward
the outside of the object: two faces, five edges and four points or a single face,
three edges and three points) (Fig. 3.6). The aim of the procedure is to obtain a
Fig. 3.6 Example of the tetrahedron elimination procedure [40]. The tetrahedrons ABCI and
BCDI are eliminated because they have the following elements toward the outside of the object:
single face, three edges and three points. In this way, the cavity determined by I can be made
visible
polyhedron of genus 0 (see Glossary 7.3.2) whose vertices are all the data points,
in other words, all the data points are on the surface. This is obtained by iteratively
eliminating one tetrahedron at a time. For each tetrahedron which has at least one
face on the surface, a value, called decision value, is computed and is used as a
key to sort the tetrahedrons. After evaluating all the tetrahedrons, the ones with the
largest values will be eliminated first. The decision value is defined as the maximum
distance between the faces of the tetrahedron and the points that lie on the sphere
circumscribing the tetrahedron itself. Hence, tetrahedrons that are large and flat
will be eliminated first. These tetrahedrons, generally, hide surface details. This
algorithm does not allow the reconstruction of surfaces with non-zero genus.
An alternative approach [210] is based on the duality between Delaunay tri-
angulation and Voronoi diagram (see Glossary 7.3.2). The minimum spanning
tree of the Voronoi diagram (where each node corresponds to the center of the
tetrahedron of the Delaunay triangulation of the data points and the length of each
edge corresponds to the distance of the connected centers) is computed and some
heuristics are applied in order to prune the tree (Fig. 3.7). The elimination of a node
Fig. 3.8 Example of the medial axis approach to model surfaces in 2D space. In (a) the point cloud is shown. In (b) the input points are partitioned into a regular lattice. In (c) external squares are eliminated, the squares with maximal distance from the boundary are selected, and the medial axis of the object is identified. In (d) the circles with radius equal to the minimum distance to the points are represented
The methods based on the Delaunay triangulation use the hypothesis that the
points are not affected by error. However, as these points are generally the result of a measurement, they are affected by noise. For this reason, a noise filtering phase may be needed for the vertices of the reconstructed surface.
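A widely used parametric description is the tensor-product B-spline surface, which in its standard form can be written as:
$$S(u, v) = \sum_{i} \sum_{j} B^{h}_{i,j}\, N_{i,k}(u)\, M_{j,l}(v)$$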
where $N_{i,k}(u)$ and $M_{j,l}(v)$ are the B-spline basis functions of order k and l, and $B^{h}_{i,j}$ are the coordinates of the control points. This approach has the limitation that it is in general very difficult to define a surface on suitable control points starting from an unorganized collection of scanned point data. In these cases, in practice, the parametric reconstruction is used just for a rough initial description of the model, which then has to be improved using other techniques. On the other hand, this approach is particularly suited for interactive modeling because, by moving the control points, it is easy to control the local deformation of the model surface.
The objective of the function reconstruction approach can be stated as follows: given a set of pairs $\{(x_1, z_1), \ldots, (x_n, z_n)\}$, where $x_i \in \mathbb{R}^2$ and $z_i \in \mathbb{R}$, the goal is to determine a function $f : \mathbb{R}^2 \rightarrow \mathbb{R}$ such that $f(x_i) \approx z_i$. More formally, $f(x_i) = z_i + \varepsilon_i$, where $\varepsilon_i$ is a random variable with a given distribution.
The implicit reconstruction approach allows a complete 3D reconstruction, but the surface is described in terms of a distance function. The input domain represents the 3D coordinates of the points, the codomain is a measure of the distance between each point and the surface, and the surface itself is implicitly defined by the function. This means that the points belonging to the surface have to be estimated by searching for the elements of its zero set. This is typically more expensive than function reconstruction, in which the function itself directly represents the surface.
Fig. 3.9 The 15 possible intersections of a cube and the surface are shown. The color of each vertex indicates whether the signed distance function is positive (dark) or negative (light). Depending on the signed distance function values at the vertices, different triangular patches are computed
When very large data sets are considered, the data density need not be uniform, as details may concentrate in a few regions. Both the volumetric and the surface fitting methods can be realized considering the measured information at different scales. Many real-world phenomena have an informative content that is appreciable at different spatial scales. Some features of an object can be associated with its global shape, while other features can be associated with details that are localized in space. The large scale features are generally related to the object type, while the small scale features are related to the details and can be used to differentiate objects of the same type. The large scale features typically have a low frequency content (hence a small variability), while the small scale features have a high frequency content. If an organization of the information based on scale were available, few resources would generally be sufficient to describe the behavior at a large scale. Conversely, the configuration of the resources for the small scale features can be performed considering only the data belonging to small regions.
The multiresolution and hierarchical paradigms are based on these concepts.
They are structured to perform a description at different scales. There are mainly
two ways to obtain a multiresolution description:
• coarse to fine (Fig. 3.10a), where the description at low resolution is extracted
first, and then the resolution is increased till the details are reconstructed;
• fine to coarse (Fig. 3.10b), where the description at the maximum resolution is
performed first, and then the description at smaller resolution is obtained.
A coarse to fine multiresolution paradigm is presented in [254], where a cloud
of points is recursively approximated using superquadrics. In the initial phase,
the superquadric that best approximates the data is estimated (Fig. 3.11b). For
Fig. 3.10 The two approaches to obtain a multiresolution representation of the target surface f. In the fine to coarse approach, (a), the scale of approximation increases with the layers. The function $a_i$ ($0 \le i \le 4$) represents the approximation of $a_{i-1}$, where $a_0 = f$, obtained by layer $i$, while $d_i$ represents the approximation error of layer $i$: $d_i = a_{i-1} - a_i$. In the coarse to fine approach, (b), the scale of approximation decreases with the layers. $a_i$ represents the approximation of $r_{i-1}$, where $r_0 = f$, obtained at layer $i$. $r_i$ represents the approximation error of layer $i$: $r_i = r_{i-1} - a_i$
Fig. 3.11 In (a) a 3D point cloud is shown. In (b) the first approximation computed using a
superquadric. In (c) the colored region represents where the residual of the points is large. In
(d) the dividing plane for the region with large residual is shown. In (e) the second approximation on the two subsets is depicted and in (f) it is reported as a shaded surface
each point, the residual of the fitting (i.e., the distance between the point and its approximation) is computed (Fig. 3.11c) and a plane is fitted to the data characterized by a large error (Fig. 3.11d). The plane is then used to split the cloud of points into two disjoint parts. For each subset, the superquadric that best represents the data is computed (Fig. 3.11e). If the procedure is iterated, for example computing the dividing plane for the subsets with a higher error, a set of models with increasing complexity, which can be associated with different resolutions, is determined.
The hierarchical paradigms are an interesting family of multiresolution paradigms. They have a structure organized in levels (or layers), where each level is characterized by a reconstruction scale. The reconstruction at a certain scale is obtained using all the levels at scales larger than or equal to the chosen scale. Therefore, the representation presents a hierarchical structure.
In the next chapters two kinds of coarse to fine hierarchical paradigms will be
presented: in particular the first one is called Hierarchical Radial Basis Functions
(HRBF) networks [46][47], treated in depth in Chap. 5, while the second one, called
Hierarchical Support Vector Regression (HSVR) [90], will be treated in depth in
Chap. 6.
The HRBF model is composed of a pool of subnetworks, called layers, and its
output is computed as the sum of the output of its layers. The output of each layer
is computed as a linear combination of Gaussians. For each layer, the Gaussians are
Fig. 3.12 The outputs of four layers of an HRBF model are depicted on the left. The sum of the outputs of the first layers gives the multi-scale approximation, shown on the right
regularly spaced on the input domain and they have the same standard deviation,
which acts as the scale parameter of the layer.
The configuration of the coefficients of the linear combination is based on the
analysis of the points that lie in the region of the Gaussian. A first layer of Gaussians
at large scale performs an approximation of the global shape of the surface.
The detail is computed as the difference between the dataset and the approximation computed by the first layer. The scale parameter is then decreased and a new layer
is inserted to approximate the detail. The process can be iterated till the maximal
possible resolution is reached (Fig. 3.12).
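A minimal 1D sketch of this coarse-to-fine scheme is given below, with regularly spaced Gaussians in each layer and weights estimated from the local residuals; it is a simplification of the actual HRBF configuration algorithm described in Chap. 5, and the halving of the scale is an illustrative choice.

import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / sigma**2) / (np.sqrt(np.pi) * sigma)

def hierarchical_fit(x, y, n_layers=4, sigma0=1.0):
    """Fit y(x) with a stack of layers of regularly spaced Gaussians.
    Each layer approximates the residual left by the previous ones,
    halving the scale parameter at every level."""
    residual = y.copy()
    layers = []
    sigma = sigma0
    for _ in range(n_layers):
        centers = np.arange(x.min(), x.max() + sigma, sigma)
        # Weight of each unit: local average of the residual, scaled by the spacing.
        w = np.array([
            residual[np.abs(x - c) < sigma].mean()
            if np.any(np.abs(x - c) < sigma) else 0.0
            for c in centers
        ]) * sigma
        layers.append((centers, sigma, w))
        # Output of this layer and new residual (the "detail" for the next layer).
        out = sum(wk * gaussian(x, c, sigma) for c, wk in zip(centers, w))
        residual = residual - out
        sigma /= 2.0
    return layers

def evaluate(layers, x):
    return sum(wk * gaussian(x, c, s)
               for centers, s, w in layers for c, wk in zip(centers, w))

x = np.linspace(0, 10, 400)
y = np.sin(x) + 0.2 * np.sin(8 * x)   # large scale shape plus a small scale detail
model = hierarchical_fit(x, y)
approximation = evaluate(model, x)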
The Hierarchical Support Vector Regression model works in a very similar way.
The idea again is based on a model organized in layers, where each layer performs
the reconstruction at a certain scale. The main differences with respect to the HRBF
model are that the basis functions are placed at the positions of the input points (in general,
they are not regularly spaced), and the weights are found as the solution of an
optimization problem.
In a similar way, the Multilevel B-splines approach, proposed in [147], performs the reconstruction using the superimposition of different levels of B-splines. The coefficients of the first level are computed by means of a least squares optimization, considering only the points that lie in the influence region of each basis function. The computation starts from a reticular structure of control points. The detail is computed as the difference between the original data and the approximation performed by the first layer. A new layer of B-splines, with a denser reticular structure of control points, is then used for the reconstruction of the detail. A similar approach, called Hierarchical Splines [99], works only with regularly spaced data. For this case, a fine to coarse algorithm has been proposed [100], where the multiresolution representation is derived from an initial B-spline representation at the maximal resolution.
The Multiresolution Analysis based on the wavelet transform [157][63] works
with regularly spaced data. An MRA is characterized by a pair of functions, called
scaling function and wavelet, that span complementary spaces. The bases for these spaces are formed respectively by shifted copies of the scaling function and of the
wavelet. The scaling function is usually smooth and features the low frequency
components, while the wavelet varies rapidly and features the high frequency
components.
A function can be decomposed as the sum of an approximation and a detail
function, which belong respectively to the space formed by the scaling function
(approximation space) and by the wavelet (detail spaces). The decomposition can be iterated, which allows obtaining the function representation with a fine to coarse approach: the process starts from the data and is iterated on the coefficients of the approximation function to obtain the coefficients at the coarser scales.
At the end, the multiresolution reconstruction can be performed by adding the
coarsest approximation and the details up to a given scale. Hence a function can
be represented as a linear combination of shifted copies of the scaling function and
the wavelet. The coefficients of the approximation and of the detail are determined
by the convolution between the data samples and a suitable pair of digital filters
(low-pass filter for the approximation and high-pass filter for the detail).
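In symbols, denoting by φ the scaling function, by ψ the wavelet, and by λ and γ the approximation and detail coefficients (a generic notation, not necessarily that of the cited works), the representation takes the form of the coarsest approximation plus the details accumulated scale by scale:
$$f(x) \approx \sum_{k} \lambda_{J,k}\, \phi_{J,k}(x) + \sum_{j} \sum_{k} \gamma_{j,k}\, \psi_{j,k}(x)$$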
In [183], an MRA based approach for non-regularly spaced data is presented.
The data considered are a three-dimensional cloud of points. The approach is based
on the hypothesis that the surface has the topology and the genus of a sphere. The
initial polygonal mesh is computed by adapting a sphere to the set of 3D points.
To this mesh, the wavelet transformation defined on a spherical domain [211] is
applied.
In [98] another fine to coarse technique based on wavelets for non-regularly spaced data is presented (though limited to one-dimensional data). The scaling function
and wavelet coefficients are obtained by means of the minimization of the recon-
struction error, through the resolution of a linear system.
Chapter 4
Surface fitting as a regression problem
A brief overview of the methods for surface reconstruction has been presented in
the previous chapters. In this chapter the attention will be focused on a particular
class of methods that see surface reconstruction as a multivariate approximation
problem. Some of the most popular techniques of this kind will be presented and
pros and cons will be discussed. An evolution of two of these techniques, namely
Radial Basis Function Neural Networks and Support Vector Machines, will be the
topic of the next two chapters.
The prediction of the value of a variable from its observations is known in statistical
literature as regression. This problem has been studied in several disciplines in the
area of computer science and applied mathematics. In particular, the prediction of
real value variables is an important topic for machine learning. When the value to be
predicted is categorical or discrete, the problem is called classification. Although in
machine learning most of the research work has been concentrated on classification,
in the last decade the focus has moved toward regression, as a large number of
real-life problems can be modeled as regression problems.
The regression problem is identified using different names [234], for example:
functional prediction [219], real value prediction [251], function approximation
[213] and continuous class learning [196]. In this chapter, the name regression is
used, because it is the historical one. More formally, a regression problem can be
defined as follows:
Given a set of samples {(xi , yi ) | i = 1, . . . , n}, where xi ∈ X ⊂ RD and
yi ∈ Y ⊂ R, the goal is to estimate an unknown function f , called regression
function, such that y = f (x) + e, where e is a (zero mean) random variable which
models the noise on the dataset.
The elements of the vector x will be indicated as input variables (or attributes) and y will be indicated as the output (or target) variable. The regression function
One of the most used approaches in regression is to limit the search for the solution
to a subset of functions. In particular, the limitation can be realized searching the
solution among a family of parametric functions. Although the choice of a small family of parametric functions can have the side effect of excluding the function that actually generates the dataset (which would be the "true" solution of the regression problem), this approach offers the advantage of a simpler solution (e.g., it is less computationally expensive than other methods). Simple linear
regression in statistical analysis is an example of a parametric approach. In this
approach, the variable y is supposed to change at a constant rate with respect to the
variable x. Hence, the solution of the regression problem will be a hyperplane that
satisfies the following linear system:
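With one equation per observation and D + 1 unknown coefficients, the system takes the standard form:
$$y_i = \beta_0 + \beta_1 x_{i,1} + \dots + \beta_D x_{i,D} + \varepsilon_i, \qquad i = 1, \ldots, n$$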
The first subscript, i, indexes the observations or instances, while the second indexes the input variables. Hence a system of n linear equations can be used for computing the D + 1 parameters, βj, j = 0, ..., D, of the hyperplane. The term εi represents the error of the solution on the sample i, namely εi = yi − ŷi, where ŷi = f̂(xi) is the value of the solution (the hyperplane) for xi. In general, the criterion used is based on the minimization of an error (or loss) function on the training points. An error measure often used in applications is the sum of squared differences between the predicted and the actual values of the examples (least squares criterion).
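A minimal example of the least squares criterion applied to a system of this kind, assuming NumPy and synthetic data, is the following.

import numpy as np

# Synthetic data: n = 100 samples in D = 2 dimensions with additive noise.
rng = np.random.default_rng(0)
X = rng.random((100, 2))
y = 1.0 + 2.0 * X[:, 0] - 3.0 * X[:, 1] + 0.05 * rng.standard_normal(100)

# Design matrix with a column of ones for the intercept beta_0.
A = np.c_[np.ones(len(X)), X]

# Least squares solution: minimizes the sum of squared residuals.
beta, residuals, rank, sv = np.linalg.lstsq(A, y, rcond=None)
print(beta)  # approximately [1, 2, -3]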
In parametric models, the mathematical structure of fˆ is given and it is expected
to be suitable to the data. This allows largely reducing the number of the parameters
βi at the price of a more complex shape of fˆ. This approach is reasonable only when
adequate a priori information on the data is available. When the a priori knowledge is poor or not available, a non-parametric approach is generally preferable. The non-
parametric approach is based on the use of a generic structure of fˆ able to model
a large set of functions, requiring only very weak assumptions on the distribution
underlying the data. An example of non-parametric approach is the locally weighted
regression that will be presented in Sect. (4.4).
The choice of the possible set of functions among which the solution has to be
searched is known as model selection problem. It is a critical task due to the bias-
variance dilemma [111]. It can be shown that the expected value of the error committed by a model, M, for the solution of a regression problem can be expressed as the sum of three components: Var(M) + Bias(M) + σε.
The first term, Var(M), describes the robustness of the model with respect to the training set. If different training sets, sampled from the same distribution, determine very different solutions, the model M is characterized by a large variance. The second term, Bias(M), describes how well the best approximation achievable by the model M estimates the regression function f. The third term, σε, describes the measurement error on the data and represents the limit of the accuracy of the solution.
For example, if the regression function is a quadratic polynomial and the regression problem is solved using a linear model (Fig. 4.1a), the variance can be small (the solution can be robust with respect to different training sets), but the bias is large (the best solution for the model is not a good estimate of the regression function). On the contrary, if the model is non-linear (Fig. 4.1b), the variance can be large (as the solution tends to overfit the data and changes for different training sets), but the bias is small (as the model is able to reproduce a quadratic polynomial, a good estimate of the regression function f is possible). In general a good model realizes a trade-off between bias and variance.
The solution can be computed, in both parametric and non-parametric approaches, mainly using two classes of methods: local methods, which search for the solution by locally averaging the data, and global methods, in which the solution is found by solving a global optimization problem.
In the following, some popular parametric and non-parametric regression tech-
niques are presented.
Fig. 4.1 In (a) and (b) the dashed line represents the target function that has been sampled at the reported points. In (a) the solid line is a solution computed by a linear model. In (b) the solid line is a solution computed by a complex model; the reconstruction is characterized by a small bias and a large variance
$$\hat f(x) = \frac{\sum_{x_j \in T_i} S(x, x_j)\, y_j}{\sum_{x_j \in T_i} S(x, x_j)} \qquad (4.2)$$
where Ti is the subset of the training set composed of the training set elements most
similar to x.
This method performs a good approximation only for a sufficiently large training set and when f can be locally linearized. The method is based on the assumption that all the input variables have the same importance with respect to the output variable. If this assumption is not verified, the method has to be modified by applying a weight to each input variable. Since all the training examples have to be stored, for large datasets the amount of memory required can be huge. In order to reduce the quantity of memory needed, averaging techniques have been proposed [15].
The complexity of this class of methods as well as its performance depends on
the number of nearest neighbors considered. The most effective number of nearest
neighbors for a specific problem can be found using different heuristic techniques
such as cross-validation (see Glossary 7.3.2) [139].
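A minimal instance-based regressor, corresponding to Eq. (4.2) with a uniform similarity over the k nearest neighbors, can be written from scratch as follows; the value of k would typically be selected by cross-validation.

import numpy as np

def knn_predict(X_train, y_train, x, k=5):
    """Predict the output at x as the average of the outputs of the
    k training points closest to x (instance-based regression)."""
    d = np.linalg.norm(X_train - x, axis=1)
    nearest = np.argsort(d)[:k]
    return y_train[nearest].mean()

X = np.random.rand(500, 2)
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
print(knn_predict(X, y, np.array([0.3, 0.7]), k=7))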
Locally weighted methods [22] also belong to the family of lazy learning algorithms. Like the instance-based methods, the prediction is computed using a subset of the instances in the training set. Hence, the training set instances, which are represented as points in a D-dimensional Euclidean space, strongly influence the prediction on a local basis. The main difference between the instance-based and the locally weighted methods is that, while the former compute the prediction by averaging the training set elements closer to the considered position, the locally weighted methods perform the prediction using a locally estimated model. The local models are generally linear or non-linear parametric functions [22]. These methods are also based on weighting the data to give more importance to relevant examples and less importance to less relevant ones. The same effect can also be obtained by replicating the important instances. The relevance is computed, analogously to the similarity of instance-based methods, by measuring the distance between a new instance and each training point. The functions used to weight the contribution of the training points are called kernels; one of the most used is the Gaussian function. For example, the prediction for the instance x can be obtained using the Nadaraya-Watson estimator
[55] as:
$$\hat f(x) = \frac{\sum_i y_i\, K_\sigma(x_i, x)}{\sum_i K_\sigma(x_i, x)} = \frac{\sum_i y_i\, e^{-\frac{\|x_i - x\|^2}{\sigma^2}}}{\sum_i e^{-\frac{\|x_i - x\|^2}{\sigma^2}}} \qquad (4.3)$$
where the sums are limited to an appropriate neighborhood of x, and σ is a parameter
which controls the width of the influence region of the kernel.
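A direct implementation of the estimator (4.3) with a Gaussian kernel is sketched below; for brevity, the sums run over the whole training set instead of being restricted to a neighborhood of x.

import numpy as np

def nadaraya_watson(X_train, y_train, x, sigma=0.1):
    """Locally weighted prediction at x: a kernel-weighted average of the
    training outputs, with weights given by a Gaussian of width sigma."""
    d2 = np.sum((X_train - x) ** 2, axis=1)
    k = np.exp(-d2 / sigma**2)
    return np.sum(k * y_train) / np.sum(k)

X = np.random.rand(500, 2)
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2
print(nadaraya_watson(X, y, np.array([0.3, 0.7]), sigma=0.05))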
The actual form of the solution, fˆ, has to be chosen in order to minimize a given
training error function, L, called loss function:
$$L = \sum_{i=1}^{n} E\big(\hat f(x_i), y_i\big) \qquad (4.4)$$
where E is a general function that measures (or weights) the difference between
fˆ(xi ) and yi , i.e., the error made in using fˆ(xi ) in place of yi . Generally, these
functions are of two kinds: squared error, E(fˆ(xi ), yi ) = (fˆ(xi )−yi )2 , and absolute
error, E(fˆ(xi ), yi ) = |fˆ(xi ) − yi |.
It can be shown that weighting the loss function (such that nearby instances are weighted more than distant ones) is equivalent to directly applying a weighting function to the data. For instance, considering a constant local model, ŷ, as the solution and requiring that it fits the nearby points well, the loss function can be chosen as:
$$L(x) = \sum_{i=1}^{n} \big(\hat y - y_i\big)^2 K(x_i, x) \qquad (4.5)$$
It can be shown that the best estimate ŷ(x) that will minimize the loss function L(x)
is ŷ = fˆ(x), where fˆ(x) is computed as in (4.3).
Different weighting functions can be applied in this approach [22] and this choice can produce very different performances. Locally weighted methods have shown higher flexibility than the instance-based methods, as well as interesting properties such as smoothness. However, the choice of an appropriate similarity function is critical for the performance of the model. This can be a problem when it is difficult to formalize when two points can be considered close. As mentioned before, the computational complexity of the training phase is reduced to the minimum (it amounts to just storing the dataset), but the prediction phase can be computationally expensive. In order to limit the cost of the prediction for large data sets, suitable data structures can be used for storing the training set. For instance, in [22] a k-d tree data structure is used for speeding up this phase. A k-d tree is a binary data structure that recursively splits a D-dimensional space into smaller subregions in order to reduce the time needed to search for the points relevant to an instance.
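A short example of this strategy uses SciPy's cKDTree: the tree is built once over the training set, and each prediction is computed only on the neighbors returned by the query (the kernel width is an illustrative value).

import numpy as np
from scipy.spatial import cKDTree

X = np.random.rand(10000, 2)
y = np.sin(3 * X[:, 0]) + X[:, 1] ** 2

tree = cKDTree(X)                 # built once over the training set
x = np.array([0.3, 0.7])
dist, idx = tree.query(x, k=20)   # the 20 nearest neighbors of x

# Locally weighted (Gaussian kernel) prediction restricted to the neighbors.
w = np.exp(-dist**2 / 0.05**2)
print(np.sum(w * y[idx]) / np.sum(w))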
Rule induction regression methods have been developed to provide interpretable solutions to regression problems. These methods have the aim of extracting rules from a given training set. A common format for interpretable solutions is the Disjunctive Normal Form (DNF) model [244]. Rule induction models were initially introduced for classification problems [244] and then extended to regression [245].
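In its usual form, the prediction of such a model is piecewise constant over a partition of the input space:
$$\hat f(x) = a_k \quad \text{if } x \in R_k, \qquad a_k = \operatorname{median}\big(\{y_i^k\}\big)$$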
where Rk is a region of the input space, ak is a constant, and {yik } is the set of
training points that lie inside Rk . In this example, the output variable value for the
sample x is computed as the median of the output variables of the training points
that lie inside Rk . This is a common choice, as the median is the minimizer of the
mean absolute distance.
The general procedure for obtaining the regression tree is described in the
following. The tree is initialized by associating the whole training set to the root
of the tree. Then a recursive splitting procedure is applied to each node, until
the cardinality of the dataset associated with each node falls below a given threshold. For this purpose, for each node, the single best split (e.g., the one that minimizes the mean absolute distance of the training examples in the partitioned subsets) is
applied, generating two children for each node. As the goal is to find the tree that
best generalizes new cases, a second stage of pruning generally follows to eliminate
the nodes that cause overfitting.
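A hedged sketch of the split-selection step for a single node and a single scalar variable is reported below: the threshold chosen is the one that minimizes the total mean absolute deviation from the medians of the two resulting subsets, an illustrative criterion consistent with the description above.

import numpy as np

def best_split(x, y):
    """Return the threshold on the scalar variable x that minimizes the
    total mean absolute deviation from the median in the two subsets."""
    best_t, best_cost = None, np.inf
    for t in np.unique(x)[:-1]:
        left, right = y[x <= t], y[x > t]
        cost = (np.abs(left - np.median(left)).mean()
                + np.abs(right - np.median(right)).mean())
        if cost < best_cost:
            best_t, best_cost = t, cost
    return best_t, best_cost

x = np.random.rand(200)
y = np.where(x < 0.4, 1.0, 3.0) + 0.1 * np.random.randn(200)
print(best_split(x, y))  # threshold close to 0.4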
In the regression tree domain, a single partition Rk represents a rule, and the set of all disjoint partitions is called a rule-set. In the rule induction approach, instead, the regions associated with the rules need not be disjoint: a single sample can satisfy several rules.
Hence, a strategy to choose a single rule to be applied from those that are satisfied
is needed. In [244], the rules are ordered according to a given criterion (e.g., the
creation order). Such ordered rule-set is called decision list. Then, the first rule in
the list is selected.
The rule-based regression model can be built, as the regression tree, adding a
new element at a time (i.e., the one that minimizes the distance). Each rule is
extended until the number of training examples which are covered by it, drops
below a given threshold. The covered cases are then removed and rule induction
process can continue on the remaining cases. This procedure is very similar to the
classification case [167][65].
In [245] a rule-induction regression algorithm based on the classification algo-
rithm Swap-1 [244] is presented. Swap-1 starts with a randomly chosen set of input
variables to create a candidate rule and swaps all the conjunctive components with
all the possible components. This swap includes the deletion of some components
from the candidate rule. The search finishes when there are no swaps that improve
the candidate rule anymore, where the evaluation of each candidate rule is performed
on their ability to predict the value of the output variables on the examples that
satisfy the rule condition. Each time a new rule is added to the model, all the
examples covered by that rule are removed. This procedure is iterated until the
training set becomes empty. A pruning step is performed after the creation of
the rule-set: if the deletion of a rule does not decrease the accuracy of the model
the rule is deleted.
Since Swap-1 is designed for categorical input variables, its regression version
implies a preprocessing to map the numeric input variables into categorical ones.
This is realized using a variant of the K-means algorithm [134]:
• the output value of the examples yi is sorted;
• an approximately equal number of contiguous values yi is assigned to each class;
• an example is moved to an adjacent class if the distance between the example
and its class mean is reduced.
The rule induction regression algorithm is realized computing Swap-1 on the
mapped data and then pruning and optimizing the rule-set. After the pruning, the
optimization is performed by searching the best replacement for each rule such that
the prediction error is reduced [245].
The predicted value can be computed as the median or mean value of the class
elements, but a parametric or a non-parametric model can also be applied to each
class.
Local averaging produces poor accuracy when the input space has a large number of attributes (curse of dimensionality). In fact, data density decreases exponentially with the number of dimensions [103][114][21]. This means that in higher dimensional spaces the number of training points should be huge to compute meaningful local averages, and this is rarely the case in real applications.
The Projection Pursuit Regression approach [138] is more suited to these cases.
The idea is to compute a global regression by using an iterative procedure that
performs successive refinements. The procedure computes at each step a smooth
approximation of the difference between the data and what has been already
modeled in the previous steps. The model is represented by the sum of the smooth
approximations, Sm , determined at each iteration:
$$\hat f(x) = \sum_{m=1}^{M} S_m\big(\beta_m^{T} x\big) \qquad (4.7)$$
where ri,m represents the error on the i-th training point at the m-th iteration, with ri,0 = yi. The iteration steps are performed as long as R(β) is greater than a given threshold. The smooth function S can be chosen in many different ways [138][201][198]. In [201] the performances of the algorithm are compared for three different smooth functions: smoothing splines, super-smoothers, and polynomials. Smooth functions based on local averaging of the residuals [103] can also be used.
Fig. 4.2 In (a), two examples of hinge functions, max(0, x − 2) and max(0, 2 − x), are shown. In (b), a piecewise linear function defined as a linear combination of the functions in (a): f(x) = 6[x − 2]+ − 3[2 − x]+
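In its usual formulation, the pool of candidate basis functions considered by MARS is the set of reflected hinge pairs built on the observed values of each input variable:
$$\mathcal{C} = \Big\{\, [x_j - t]_+ ,\; [t - x_j]_+ \;\Big|\; t \in \{x_{j1}, \ldots, x_{jn}\},\ j = 1, \ldots, D \,\Big\}$$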
where D is the input space dimension, n is the number of training points, and xji denotes the value of the j-th input variable for the i-th sample. If all the training points are distinct, there are 2Dn candidate basis functions.
Hinge functions are a key part of MARS models. Two examples of hinge functions are reported in Fig. 4.2a. A pair of hinge functions such as that in Fig. 4.2a is called a reflected pair. As the hinge function is zero over part of its support, it can be used to partition the data into disjoint regions, each of which can be processed independently. In particular, the use of hinge functions allows building piecewise linear and non-linear functions (Fig. 4.2b). In 1D spaces, MARS operates by adding, for each training point, a pair of functions that are symmetrical with respect to the knot placed at that point. In higher dimensional spaces, a function is added for each component of the training point.
The MARS algorithm presented in [102] is composed of two procedures:
Forward stepwise The basis functions are identified starting from the constant basis function, which is the only one initially present. At each step, the split that minimizes a given approximation criterion, among all the possible splits of each basis function, is chosen. This procedure stops when the maximum number of basis functions chosen by the user, Mmax, is reached. Since the model found generally overfits the data, a backward pruning procedure is then applied.
Backward stepwise At each step, the basis function whose elimination causes
the smallest increase in the residual error is deleted. This process produces a
sequence of models, {fˆα }, which have an increasing residual error, where α
expresses the complexity of the estimate. In order to choose the best estimated
model, cross-validation can be applied.
In [225] a variant of the MARS algorithm in which the backward procedure is
not used is proposed. This is substituted by a penalized residual sum of squares
introduced in the forward procedure, and the problem is reformulated as a Tikhonov
regularization problem.
In Fig. 4.3, a scheme of the MLP model is depicted. The MLP processes the data
layer-wise: each layer receives the input from the output of the previous layer and
provides the input to the next layer. Usually, only the neurons of consecutive layers
are connected. Each connection is characterized by a scalar parameter called weight,
which is a multiplicative coefficient that is applied to the value transmitted by the
connection itself. The input of each unit is the weighted sum of all the contributions
carried by the incoming connections; hence, it is a linear combination of the output
of the units of the previous layer. The first and the last layers are generally composed
of linear neurons, while the intermediate layers, called hidden layers, are composed
of non-linear (commonly logistic) neurons. The scheme in Fig. 4.3 represents an MLP with d input units, a single hidden layer, and a single output unit.
The MLP models can be used to efficiently solve a large class of regression and classification problems. It has been proved that an MLP with at least one hidden layer is a universal approximator [126]. However, the addition of extra hidden layers may allow a more efficient approximation with fewer units [37]. The output
of the network of Fig. 4.3 is computed as:
$$\hat f(x) = \sum_{m=1}^{M} \beta_m\, B\big(\gamma_m^{T} x\big) \qquad (4.12)$$
where M is the number of hidden units, βm is the weight (a scalar) of the connection
between the output unit and m-th hidden unit, γm is the weights vector of the
connections between the input units and the m-th hidden unit.
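A minimal NumPy forward pass corresponding to Eq. (4.12), with a logistic hidden layer and a linear output unit, is the following; the weights are random placeholders, not trained values.

import numpy as np

def logistic(a):
    return 1.0 / (1.0 + np.exp(-a))

def mlp_forward(x, gamma, beta):
    """Single-hidden-layer MLP: hidden activations B(gamma_m . x) followed
    by a linear combination with the output weights beta (Eq. 4.12)."""
    hidden = logistic(gamma @ x)   # one activation per hidden unit
    return beta @ hidden

d, M = 3, 10                       # input dimension and number of hidden units
gamma = np.random.randn(M, d)      # input-to-hidden weights
beta = np.random.randn(M)          # hidden-to-output weights
print(mlp_forward(np.random.randn(d), gamma, beta))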
In general, neural networks can have more complex architectures than that shown in Fig. 4.3. For example, in recurrent neural networks the output is fed back to the input or to the intermediate layers. In fully connected networks, the input of the units in the lower layers is sent to all the units of the higher layers and not only to those of the next one. The number of hidden units of a network determines the trade-off between bias and variance of the model (see Sect. 4.2). The more hidden units are used in a network, the better the training data are fit. However, if too many units are used, the model can show overfitting and a poor predictive capability.
The parameters of a neural network are generally found by minimizing a loss function (as in Sect. 4.4) that measures the distance between the output measured in the input data set and the output computed by the network for the same input values. The distance function usually adopted is the sum of squares (Euclidean distance).
Since the MLP output is not linear (and the corresponding loss function is not quadratic) with respect to the model parameters, the optimal value of the parameters cannot be found directly (e.g., as the solution of a linear system). Hence an iterative optimization is usually performed, based on the computation of the gradient of the loss function. The gradient can be computed in several ways; the most popular is the backpropagation algorithm [205], because of its computational efficiency. It exploits the properties of the derivatives of a composite function to decompose the gradient into simpler expressions. Starting from the output layer, the gradient of the loss function can be decomposed into its components relative to each unit of the previous layer, and the recursive application of this property allows propagating the computation of the gradient with respect to the parameters of the network toward the input layer.
The gradient of the loss function is then used in a minimization algorithm. There are several iterative methods to perform the minimization of the loss function. Simple gradient descent is one of them. The update of the parameters is computed as a fraction, η, of the gradient of the loss function. The choice of a proper value of η, called the learning rate, is generally difficult. A too small value of η can require many steps to reach a minimum, while a too large value can produce an oscillating behavior around the optimal value, so that the minimum is never reached. To speed up the convergence, the η parameter can be varied during the learning phase, and many heuristics have been proposed to this aim.
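In symbols, denoting by β the vector of network parameters and by L the loss function, each iteration of gradient descent performs the update:
$$\beta \leftarrow \beta - \eta\, \nabla_{\beta} L(\beta)$$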
However, gradient descent is generally an inefficient method, as all first order methods are, and second order approaches have been explored. The second order methods are based on the fact that at a minimum of a function its gradient is zero. As a consequence, the Taylor expansion of the function computed around the minimum has no first order terms. Hence, for points close to the minimum, the function can be effectively approximated using only the second order derivatives and the value of the function at the minimum. In particular, the Taylor expansion of a function, f(β), of a single variable β, close to a minimum β* results in:
$$f(\beta) = f(\beta^*) + \frac{1}{2} (\beta - \beta^*)^2 \left.\frac{d^2 f}{d\beta^2}\right|_{\beta = \beta^*} + O(\beta^3) \qquad (4.13)$$
It follows that the step that separates the considered point β from the minimum, (β − β*), can be computed directly as the ratio between the first derivative of the function with respect to β and its second order derivative at the minimum. This result can be extended to the multivariate case, for which the second derivative becomes the Hessian matrix, H(·), evaluated at the minimum β*. As the Hessian matrix (at the minimum) is unknown, it has to be approximated. To this aim, several iterative techniques based on different approximations have been developed. Two popular methods are the Broyden-Fletcher-Goldfarb-Shanno (BFGS) algorithm [52] and the Levenberg-Marquardt (LM) algorithm [148][158]. The BFGS algorithm requires the starting parameter vector to be close to the minimum to be effective: other techniques, like those based on simple gradient descent, can be used to reach a good starting point. The LM algorithm does not suffer from this problem, but it requires more memory than BFGS, as it has to store several matrices. Hence, for large networks, it becomes inefficient.
The determination of the parameters through iterative optimization procedures
is called learning in neural network terminology. Other equivalent terms often used in this domain are configuration, which is used when the number of units is also considered in the optimization procedure (i.e., it includes the model selection), and training, which emphasizes the iterative use of the data to compute the parameters by progressively changing their values toward the optimum.
In the next chapters two kinds of neural models, namely Radial Basis Function
Neural Networks and Support Vector Machines, and their hierarchical version, will
be treated in depth. Both these models can be seen as particular MLPs with a single hidden layer that make use of alternative training techniques.
Then the function f(x) can also be obtained as a linear combination of the detail functions, which in turn can be obtained as a linear combination of scaled and translated copies of the wavelet. As for the scaling function, it has been proved [157] that there exists a sequence {gk} such that:
that there exists a sequence {gk } such that:
ψ(x) = 2 gk ψ(2x − k) (4.17)
k
In a similar way, the coefficient γj,k can be obtained as the projection of f (x) on
the wavelet:
$$\gamma_{j,k} = \langle f, \psi_{j,k} \rangle \;\Rightarrow\; Q_j[f(x)] = \sum_k \langle f, \psi_{j,k} \rangle\, \psi_{j,k}(x) \qquad (4.19)$$
to use wavelets in the case of non-regularly spaced data. In the Lifting Scheme the computation of the wavelet is divided into a chain of weighted averaging and sub-/over-sampling operations. If the data are not regularly spaced, the basis functions are not scaled and translated copies of the same function, but have to be adapted to the training data. The increased flexibility of the method is paid for with a higher complexity of the analysis-synthesis procedure. When the data are sparse, a connectivity structure which describes the topological relationships between the data is required.
Chapter 5
Hierarchical Radial Basis Functions Networks
In this chapter a particular kind of neural model, namely the Hierarchical Radial
Basis Function Network, is presented as an effective hierarchical network organiza-
tion. In a similar way, in the next chapter another kind of multi-scale model, namely
Support Vector Machines, will be presented.
In this chapter, a particular kind of neural network, which belongs to the perceptron
family [121], is considered. The models of this family are characterized, as the other neural network models, by the activation function of their neural units and by the connection scheme. In particular, units with radial symmetry allow obtaining a
good approximation with a limited number of units. Each unit is characterized by a
width that defines its influence region: only the input inside this region produces
a significant activation of the related unit. This locality property contributes to
improve the efficiency of the learning process, as only a subset of the training
data can be considered for the configuration of each unit. Moreover, locality avoids the interference of examples in distant regions, which could produce an error on the output of a layer that compensates the error generated by closer points, with the result that the learning procedure may get trapped in a local minimum. The networks based on such units are called Radial Basis Functions
(RBF) Networks [190][169][188].
Using Gaussians as basis functions, an RBF network can be described as a function $\hat f(x) : \mathbb{R}^D \rightarrow \mathbb{R}$, as follows:
$$\hat f(x) = \sum_{k=1}^{M} w_k\, G(x; \mu_k, \sigma_k) = \sum_{k=1}^{M} w_k\, \frac{e^{-\frac{\|x - \mu_k\|^2}{\sigma_k^2}}}{\big(\sqrt{\pi}\,\sigma_k\big)^{D}} \qquad (5.1)$$
niques are able to find the global minimum, they are based on the exhaustive exploration of the parameter space and they are therefore unfeasible in real scenarios because they are computationally expensive.
A different strategy, called hybrid learning [169][51], derives from the observation that the parameters in (5.1) can be divided into two classes depending on their role: structural parameters and synaptic weights. M, {μk} and {σk} belong to the first class, while the weights, {wk}, belong to the second class. The different nature of these parameters suggests using different algorithms to determine their values. With hybrid techniques the positions of the units {μk} can be determined by using clustering algorithms (e.g., [159][94]) and the number of units, M, can be chosen a priori. The parameters {σk} define the behavior of the function in the regions between the samples and they can be determined using heuristics [169][188][178][216]. When the structural parameters are set, equation (5.1) becomes a linear system in which the unknowns are the weights, {wk}. Although the weights can be computed by solving this system, for networks with a high number of units this solution is not feasible because of numerical instability or memory allocation problems, and different strategies have been explored. These are based on estimating the weights of the units starting from the data local to each unit.
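A hedged sketch of such a hybrid scheme is reported below: the cluster centers are obtained with k-means, a spacing-based heuristic fixes a common width, and the weights are computed by a linear least squares solve. The specific heuristics of the cited works may differ, and the normalization constant of (5.1) is absorbed into the weights.

import numpy as np
from scipy.cluster.vq import kmeans2

def rbf_hybrid_fit(X, y, M=30):
    """Hybrid RBF learning: structural parameters from clustering and a
    heuristic, synaptic weights from a linear least squares problem."""
    centers, _ = kmeans2(X, M, minit="++")
    # Width heuristic: a multiple of the mean nearest-center distance.
    d = np.linalg.norm(centers[:, None, :] - centers[None, :, :], axis=2)
    np.fill_diagonal(d, np.inf)
    sigma = 2.0 * d.min(axis=1).mean()
    # With mu_k and sigma fixed, (5.1) becomes linear in the weights w_k.
    G = np.exp(-np.sum((X[:, None, :] - centers[None, :, :]) ** 2, axis=2) / sigma**2)
    w, *_ = np.linalg.lstsq(G, y, rcond=None)
    return centers, sigma, w

def rbf_predict(x, centers, sigma, w):
    g = np.exp(-np.sum((x - centers) ** 2, axis=1) / sigma**2)
    return g @ w

X = np.random.rand(1000, 2)
y = np.sin(4 * X[:, 0]) * np.cos(3 * X[:, 1])
centers, sigma, w = rbf_hybrid_fit(X, y)
print(rbf_predict(np.array([0.5, 0.5]), centers, sigma, w))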
The growing structures [188][104][105] are an improvement over hybrid schemes. The number of units is not given a priori; rather, the units are added to the network one after the other until a given criterion is satisfied. These schemes are iterative and a good estimate of the network parameters generally requires a time consuming learning phase. A similar approach is followed by boosting regression [197].
Another approach, pioneered in digital filtering and in computer vision, is based on the use of regularly spaced Gaussians placed on a lattice covering the input space [185][207]. The disadvantage of this approach is the rigidity of the structure used. The distance among the units is the same over all the input space: the resulting function can suffer from overfitting in some regions and may not be able to reproduce the finest details in other regions. Furthermore, the presence of overfitting would imply a waste of resources due to the use of too many units. This problem can be solved in several ways.
In [64] a method based on building a model by adding one orthogonal basis function at a time is proposed. Such a model allows both improving the generalization ability of the previous model and reducing the number of units. The weights of the orthogonal basis are found by an iterative minimization of a cost function. The technique is based on a modified Gram-Schmidt orthogonalization procedure and can be computationally intensive for large networks. Besides, numerical problems may arise.
A different approach is represented by a model called Hierarchical Radial Basis Functions (HRBF) [46][97][89][29]. It is based on the idea of inserting units only where they are needed, without resorting to any iterative procedure to determine the network parameters. HRBF is one of the two hierarchical models considered in the present book for surface reconstruction and the remaining sections of this chapter are dedicated to it. It enables the non-iterative computation of the
in Sect. 5.2.1.1. Then, in Sects. 5.2.1.2 and 5.2.1.3, the regular RBF network model
will be generalized to models that simplify the theoretical derivation of the HRBF
learning algorithm.
The ideal low-pass filter keeps all the components at frequencies lower than νcut-off unaltered and deletes all the components at higher frequencies. This filter is not realizable in practice. In the frequency domain, a real low-pass filter, F̃(ν), is characterized by two frequencies, νcut-off and νmax, that identify three bands: Pass Band, Stop Band and Transition Band (Fig. 5.2). Given two suitable thresholds δ1 and δ2, νcut-off and νmax are identified by the following conditions:
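A formulation consistent with (5.7) and (5.8) below is that the magnitude of the filter response stays above δ1 over the pass band and below δ2 over the stop band:
$$\tilde F(\nu) \ge \delta_1 \ \ \text{for } |\nu| \le \nu_{\text{cut-off}}, \qquad \tilde F(\nu) \le \delta_2 \ \ \text{for } |\nu| \ge \nu_{\max}$$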
In the Pass band, the frequency components are kept almost unchanged; in the second one, the Stop band, they are almost deleted; and in the last one, the Transition band, they are attenuated progressively. The Gaussian filter (Fig. 5.2) is formulated as:
$$G(x, \sigma) = \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{x^2}{\sigma^2}} \qquad (5.5)$$
and its Fourier transform is:
$$\tilde G(\nu, \sigma) = e^{-\pi^2 \sigma^2 \nu^2} \qquad (5.6)$$
The following relationships exist between σ and the cut-off and maximum frequencies in equations (5.3) and (5.4):
$$e^{-\pi^2 \sigma^2 \nu_{\text{cut-off}}^2} = \delta_1 \;\Rightarrow\; \nu_{\text{cut-off}} = \frac{\sqrt{-\log \delta_1}}{\pi\sigma} \qquad (5.7)$$
$$e^{-\pi^2 \sigma^2 \nu_{\max}^2} = \delta_2 \;\Rightarrow\; \nu_{\max} = \frac{\sqrt{-\log \delta_2}}{\pi\sigma} \qquad (5.8)$$
where the function f(x) is obtained from the function w(x). In regression, however, we have the inverse problem: we have to find a rule for obtaining a function w(x) that provides a good estimate of f(x). If w(x) were replaced by the function f(x) itself, the function f̂(x) would be obtained as:
fˆ(x) = f (c) G(x − c|σ) dc = f (x) ∗ G(x; σ) (5.11)
R
From (5.12) it is clear that fˆ(x) will be a smooth version of f (x). In fact all the
f (x)’s frequency components over νcut-off will be attenuated progressively by the
convolution with the Gaussian filter.
In real cases there is a limited number, N, of samples. Let us consider the case in which a function, f, is reconstructed from a regular sampling of the function itself, {(x_i, f_i) | f_i = f(x_i)}, with sampling step Δx, using a linear combination of Gaussians centered at the samples. At the points {x_i} the function can be reconstructed as:
\hat{f}(x_i) = \sum_{k=1}^{N} f_k\, G(x_i; x_k, \sigma)\, \Delta x = \frac{\Delta x}{\sqrt{\pi}\,\sigma} \sum_{k=1}^{N} f_k\, e^{-\frac{(x_i - x_k)^2}{\sigma^2}} \qquad (5.14)
Equation (5.14) can be extended to the whole real line using the following interpolation:
\hat{f}(x) = \sum_{k=1}^{N} f_k\, G(x; x_k, \sigma)\, \Delta x = \frac{\Delta x}{\sqrt{\pi}\,\sigma} \sum_{k=1}^{N} f_k\, e^{-\frac{(x - x_k)^2}{\sigma^2}} \qquad (5.15)
In (5.15) the Gaussian filter is also sampled, with sampling frequency νs = 1/Δx. By the sampling theorem, this introduces another constraint:

\nu_{\max} < \frac{\nu_s}{2} \qquad (5.16)
Relations (5.7, 5.8) can be modified as follows:
e^{-\pi^2 \sigma^2 \nu_{\text{cut-off}}^2} = \delta_1 \qquad (5.17)

e^{-\pi^2 \sigma^2 \nu_{\max}^2} = \frac{\delta_2}{2} \qquad (5.18)
δ2 has been halved because, when νmax = νs/2, the Gaussian receives the same contribution from both the main lobe and the closest replica (aliasing effect). The contribution of the other Gaussian replicas could also be considered, but they are too far away and their contribution is too small to significantly influence the reconstruction.
It can be proved that with a sampling frequency of νs , the reconstructed function
does not have (significant) components over νM , where:
\nu_M = \sqrt{\frac{\log \delta_1}{\log (\delta_2/2)}}\; \frac{\nu_s}{2} \qquad (5.19)
Equation (5.19) can be compared with the Shannon constraint, which asserts that the maximum frequency that can be reconstructed from a sampled signal is equal to half the sampling frequency. Since δ1 > δ2, the maximum frequency reconstructed by the model in (5.15) will be smaller than that indicated by the Shannon theorem. In particular, if δ1 is set to √2/2, in accordance with common practice (an attenuation of 3 dB), and δ2 is set to 0.01, the following is obtained from (5.19):
νs = 7.829 νM (5.20)
Hence, the sampling frequency should be about eight times the maximum frequency that has to be reconstructed. Decreasing the value of δ1 or increasing the value of δ2 (i.e., when δ1 approaches δ2), the ratio in (5.19) tends to two, but the quality of the reconstruction decreases: if δ2 increases, aliasing increases; if δ1 decreases, the higher frequency components are attenuated more.
Using the above relationships and setting the maximum frequency to νM , a
regular discrete RBF network can be configured, setting the number, M , the
position, {μk }, and the width, σ, of the Gaussians (compatible with the sampling
frequency νs ). The weights {wk } will be proportional to the input data themselves.
For example, given a set of regularly sampled points {(x_i, y_i) | x_i ∈ R, y_i ∈ R, i ∈ {1, ..., n}}, with Δx = x_{i+1} − x_i ∀i, a regular discrete RBF network can reconstruct frequency components up to 1/(8Δx) (due to (5.20)). If this is compatible with the maximal frequency component of the function, then the function can be reconstructed by placing a Gaussian unit at every input point: μ_k = x_k, M = n. From the constraints on the frequencies it follows that the value of σ cannot be less than 1.465Δx. The weights correspond to the values assumed by the sampled points times Δx: w_k = y_k Δx.
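As a concrete illustration of this configuration rule, the following Python sketch (ours, not from the book; all function names are hypothetical) builds a regular discrete RBF network from regularly sampled 1D data, using σ = 1.465Δx and w_k = y_k Δx, and evaluates it on a dense grid.

```python
import numpy as np

def configure_regular_rbf(x, y):
    """Configure a regular discrete RBF network from regularly sampled data.

    One Gaussian is centered at every sample (mu_k = x_k), the width is set
    to the minimum value compatible with the sampling step (sigma = 1.465*dx,
    obtained with delta_1 = sqrt(2)/2 and delta_2 = 0.01), and the weights
    are w_k = y_k * dx, as in (5.15).
    """
    dx = x[1] - x[0]                 # regular sampling step
    sigma = 1.465 * dx               # minimum width allowed by the constraints
    weights = y * dx                 # w_k = y_k * dx
    return x.copy(), weights, sigma  # centers, weights, width

def rbf_eval(xq, centers, weights, sigma):
    """Evaluate f_hat(xq) = sum_k w_k G(xq; mu_k, sigma), with a normalized Gaussian."""
    G = np.exp(-((xq[:, None] - centers[None, :]) ** 2) / sigma ** 2)
    G /= np.sqrt(np.pi) * sigma
    return G @ weights

if __name__ == "__main__":
    x = np.linspace(0.0, 1.0, 81)            # regularly sampled abscissas
    y = np.sin(2 * np.pi * 2 * x)            # low-frequency test function
    centers, weights, sigma = configure_regular_rbf(x, y)
    xq = np.linspace(0.0, 1.0, 400)
    err = rbf_eval(xq, centers, weights, sigma) - np.sin(2 * np.pi * 2 * xq)
    print("sigma =", sigma, "max abs error =", float(np.max(np.abs(err))))
```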
It is worth noting that, given a maximum frequency below 1/(8Δx), a suitable subsampling of the data set can be applied to match the associated sampling distance. In this case, the number of Gaussians and their centers are selected according to the reduced data set.
The previous technique is not efficient when the function f(·) presents a different frequency content in different regions of the input domain. The presence of high frequency details in only some regions of the input domain would require using a high number of units also in the regions where the frequency content is low. Instead, these regions would be more efficiently reconstructed with a smaller number of Gaussians with a larger σ.
The reconstruction can be realized, in a more efficient way, using more than one
network, each one characterized by a different value of σ. The first network has
the goal of realizing an approximation of the function at a very large scale, a1 (x).
The value of the cut-off frequency of this network is chosen relatively small, say ν1 .
For the estimate of the Gaussian weights a suitable subsampling of the input points can be performed, and the approximation a1(x_i) of f(x_i) can be computed for all the examples in the data set.
a1(x) can be seen as a first rough approximation of the data. The data points will not generally lie on a1(x) and a residual can be defined for each point as:

r_1(x_i) = y_i - a_1(x_i) \qquad (5.21)

A second network, with a cut-off frequency ν2 > ν1, can then be configured to approximate the residual r1, producing a second approximation, a2(x), at a smaller scale. A third network, with a cut-off frequency ν3 > ν2, can now be inserted, and the procedure can be iterated until the residual is under a chosen threshold.
The described procedure builds a multi-resolution representation of the sampled function [42], where the representation at the l-th resolution is given by the sum of the outputs of the first l networks. In real applications, the data are usually not regularly spaced and are affected by measurement error. In this scenario the configuration procedure described in the previous section cannot be used anymore. In the following section, this problem is characterized and addressed to enable the HRBF configuration algorithm to deal with real data.
The procedure described in Sect. 5.2.1.3 assumes regularly spaced, noise-free data sampled at the same positions as the Gaussian centers. If this hypothesis is not verified, a technique for estimating the value assumed by the function, f(·), at the position of each Gaussian center is needed. We will call this estimate f̌(·). f̌_k = f̌(μ_k) is estimated locally: a subset A_k of the dataset is selected according to the distance of the data points from μ_k. Then, the points in A_k are used to provide the value of f̌_k. The region to which the points in A_k belong is called the receptive field of the k-th Gaussian.
There are mainly two criteria used to determine A_k. The first one is based on the number of elements used in the estimation: for each Gaussian, the n input points closest to its center μ_k are selected. The second one is based on the distance: for each Gaussian, every point closer to μ_k than a given threshold, ρ, is selected. Both methods have advantages and disadvantages.
In [46], the estimate of f̌_k = f̌(μ_k) is based on a weighted average [22] of the points that belong to the set A_k, where the weight of each sample decreases with the distance of the sample from the center of the Gaussian. The implicit assumption is that the points closer to the Gaussian center provide a more reliable estimate of the function at that point. In particular, the weight function used is again a Gaussian centered in μ_k, and f̌_k can be computed as:
\check{f}(\mu_k) = \frac{\sum_{x_r \in A_k} y_r\, e^{-\frac{(x_r - \mu_k)^2}{\sigma_w^2}}}{\sum_{x_r \in A_k} e^{-\frac{(x_r - \mu_k)^2}{\sigma_w^2}}} \qquad (5.23)
The parameter σw in the weighting function is set equal to the value of σ of the RBF layer, although σw < σ would be more conservative, as it avoids possibly neglecting the high frequency components of the function. ρ, which determines the amplitude of A_k, is set equal to Δx, which is the spacing between two consecutive Gaussians (μ_k − μ_{k−1} = Δx, ∀k).
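A minimal sketch of the estimate (5.23), assuming the choices discussed above (σw tied to the layer width and ρ = Δx); the function and variable names are ours:

```python
import numpy as np

def estimate_f_check(mu, x, y, sigma_w, rho):
    """Gaussian-weighted local estimate of f at a center mu, as in (5.23).

    The receptive field A_k collects the samples closer to mu than rho;
    each sample is weighted by a Gaussian of width sigma_w centered in mu.
    Returns None when the receptive field is empty.
    """
    mask = np.abs(x - mu) < rho          # receptive field A_k
    if not np.any(mask):
        return None
    w = np.exp(-((x[mask] - mu) ** 2) / sigma_w ** 2)
    return np.sum(w * y[mask]) / np.sum(w)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    x = np.sort(rng.uniform(0.0, 1.0, 500))           # irregular, noisy samples
    y = np.sin(2 * np.pi * x) + 0.05 * rng.standard_normal(x.size)
    dx = 0.1                                          # grid spacing of the layer
    mu_grid = np.arange(0.0, 1.0 + dx, dx)            # Gaussian centers
    f_check = [estimate_f_check(mu, x, y, sigma_w=1.465 * dx, rho=dx) for mu in mu_grid]
    print([None if v is None else round(float(v), 3) for v in f_check])
```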
However, for a given layer, the computation of f̌_k is not required over the whole input domain, as previous layers could have already produced a function that is accurate enough in some input regions. In this case, the weights are better set to zero in these regions and no new Gaussians have to be inserted there.
To this aim, before computing f̌_k we evaluate whether the local residual value is below a threshold. Such a threshold can be associated, for instance, with the measurement noise, ε. Hence, the k-th Gaussian is inserted in the l-th layer (i.e., w_k is set to Δx f̌_k) if and only if the average value of the residual in the neighborhood of μ_k, R_k, is over ε, namely if the following is satisfied for the k-th Gaussian:
R_k = \frac{\sum_{x_r \in A_k} \|y_r\|}{|A_k|} > \varepsilon \qquad (5.25)
w_{l,j} = \Delta x_l\, \frac{n_{l,j}}{d_{l,j}} = \Delta x_l\, \frac{\sum_{r:\, x_r \in A_{l,j}} y_r\, e^{-\frac{(x_r - \mu_{l,j})^2}{\sigma_w^2}}}{\sum_{r:\, x_r \in A_{l,j}} e^{-\frac{(x_r - \mu_{l,j})^2}{\sigma_w^2}}} \qquad (5.29)
8. The approximation function a_l = \{a_l(x_i) = \sum_j w_{l,j}\, G(x_i; \mu_{l,j}, \sigma_l)\} is computed.
9. The residual is computed as: r_l = \{r_{l-1}(x_i) - a_l(x_i)\}.
The output function is the sum of all the approximation functions computed:

\tilde{f}(x) = \sum_{l=1}^{L} \sum_j w_{l,j}\, G(x; \mu_{l,j}, \sigma_l) \qquad (5.30)
Although there are no theoretical constraints, the set {ν1, ..., νL} is generally chosen such that each frequency is double the previous one. In this way a wide range of frequencies can be covered using a small number of layers.
Since σ and ν are closely related, in many applications providing directly the value of σ is preferable. Furthermore, the number of layers does not have to be set a priori: a termination criterion can be used instead, for example iterating the procedure until the residual goes under a given threshold over the entire domain.
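The following sketch puts the pieces together for the 1D case: a hedged, simplified version of the batch configuration loop (grid halving, local weighted estimate of the residual, threshold test), not the authors' implementation.

```python
import numpy as np

def gaussian(x, mu, sigma):
    return np.exp(-((x - mu) ** 2) / sigma ** 2) / (np.sqrt(np.pi) * sigma)

def hrbf_batch_fit(x, y, n_layers=6, eps=0.02, x_min=0.0, x_max=1.0):
    """Minimal 1D batch HRBF sketch: each layer halves the grid spacing and
    approximates the residual left by the previous layers (cf. (5.29)-(5.30))."""
    layers, residual = [], y.copy()
    dx = (x_max - x_min) / 2.0                      # spacing of the first grid
    for _ in range(n_layers):
        sigma = 1.465 * dx
        centers = np.arange(x_min, x_max + dx / 2, dx)
        weights = np.zeros_like(centers)
        for k, mu in enumerate(centers):
            in_rf = np.abs(x - mu) < dx             # receptive field A_{l,k}
            if not np.any(in_rf):
                continue
            if np.mean(np.abs(residual[in_rf])) <= eps:
                continue                            # residual already below threshold
            g = np.exp(-((x[in_rf] - mu) ** 2) / sigma ** 2)
            weights[k] = dx * np.sum(g * residual[in_rf]) / np.sum(g)   # (5.29)
        layer_out = gaussian(x[:, None], centers[None, :], sigma) @ weights
        residual -= layer_out                       # r_l = r_{l-1} - a_l
        layers.append((centers, weights, sigma))
        dx /= 2.0                                   # scale halved at each layer
    return layers

if __name__ == "__main__":
    rng = np.random.default_rng(1)
    x = np.sort(rng.uniform(0.0, 1.0, 2000))
    y = np.sin(2 * np.pi * x) + 0.3 * np.sin(20 * np.pi * x) * (x > 0.6)
    layers = hrbf_batch_fit(x, y)
    print([int(np.count_nonzero(w)) for _, w, _ in layers])   # units used per layer
```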
From (5.31), we see that the output of all the Gaussians in all the layers should be computed for each point in the input space, which would require a huge computational effort. However, due to its exponential decay, the Gaussian assumes values significantly larger than zero only in the region close to its center, and some windowing to zero can be introduced. In fact, the Gaussian value at a distance of 3σ from its center is 1.24 · 10−4 times its value in the center.
This allows limiting the computation of the contribution of each Gaussian to the output of the layer to a suitable neighborhood of its center. We call this region the influence region of the Gaussian k, I_k. This region is a spherical region with radius τ proportional to Δx:
G_{l,k}(x) = \begin{cases} \dfrac{1}{\sqrt{\pi}\,\sigma_{l,k}} \exp\!\left(-\dfrac{\|x - \mu_{l,k}\|^2}{\sigma_{l,k}^2}\right), & \|x - \mu_{l,k}\| < \tau_{l,k} \\ 0, & \text{elsewhere} \end{cases} \qquad (5.32)
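A small sketch of the windowed Gaussian (5.32), assuming τ = 3σ as suggested by the 3σ argument above (function and variable names are ours):

```python
import numpy as np

def truncated_gaussian(x, mu, sigma, tau=None):
    """Windowed Gaussian of (5.32): the unit contributes only inside its
    influence region ||x - mu|| < tau (tau = 3*sigma by default, where the
    Gaussian has decayed to about 1.24e-4 of its peak value)."""
    if tau is None:
        tau = 3.0 * sigma
    d = np.abs(np.asarray(x) - mu)
    g = np.exp(-(d ** 2) / sigma ** 2) / (np.sqrt(np.pi) * sigma)
    return np.where(d < tau, g, 0.0)

print(truncated_gaussian([0.0, 0.1, 0.5], mu=0.0, sigma=0.1))
```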
Since the cut-off frequency is doubled at each layer, the σ of the Gaussians can be halved at each layer. The positions of the units of each layer can be easily determined starting from the center of the domain and moving laterally by Δx (5.28). The number of layers, L, could be determined a priori considering the computational resources available (e.g., the maximum number of units), or it can be determined at run-time considering the error achieved by the last configured layer. Otherwise, it can be determined by a trial-and-error strategy or by cross-validation.
The hyperparameters σw , ρ, τ and α have been chosen through empirical
considerations and they have been adequate in all the experiments that we have
made. The error threshold ε depends on the root mean squared error on the data and
the degree of the approximation accuracy of the HRBF network.
The hyperparameter ρ controls the width of the receptive field, and, hence, it is
set proportional to the width of the units of the layer, σl , or to their spacing, Δxl .
Hence, reasonable choices can be ρ = σl or ρ = 2Δxl. Too small a value of ρ can make the estimate of f̌_k unreliable due to the lack of samples. On the other hand, large values of ρ can make the estimate computationally expensive. Moreover, the use of samples far from the Gaussian center is questionable.
The radius of the influence region affects both the accuracy in the approximation
of the Gaussian output and the computational cost of the training procedure.
We have considered, somewhat arbitrarily, the contribution of a Gaussian at points more distant than 3σ from its center to be zero. This is a reasonable choice because, as described before, the Gaussian value at a distance of 3σ from its center is negligible. The minimum number of samples required for a reliable estimate depends both on the variability of the function and on the amount of noise in the data. However, since only a rough approximation is required, a few samples are sufficient for this purpose (3 to 5 can be a reasonable value for α).
The error threshold, ε, determines whether the Gaussians of a new layer will be added to the network and therefore determines the accuracy of the approximation of the whole HRBF network. It should be set according to the accuracy of the measurement instrument used to obtain the data set. In the case of 3D scanning, it can be estimated through several techniques (e.g., the average distance of points sampled on a plane with respect to the optimal plane), or from the calibration procedure. If a priori information is not available, a trial-and-error strategy can be used to estimate the error threshold value.
The configuration procedure of an HRBF network requires only operations local to the data to compute the parameters. This produces a very fast configuration algorithm that is also suitable for parallelization.
Due to the properties of the Gaussian function, the extension to the multivariate case is straightforward. In particular, we consider here the two-dimensional case, since it is the one of interest for the applicative problem of surface reconstruction, although the generalization to spaces of higher dimension is trivial. In the R² → R case, the Gaussian filter becomes:

G(x, \mu_k; \sigma) = \frac{1}{\pi \sigma^2}\, e^{-\frac{\|x - \mu_k\|^2}{\sigma^2}} \qquad (5.33)
where x ∈ R². As x = (x^{(1)}, x^{(2)}), (5.33) can be written (taking μ_k = 0 for simplicity) as:

G((x^{(1)}, x^{(2)}); \sigma) = \frac{1}{\pi \sigma^2}\, e^{-\frac{(x^{(1)})^2 + (x^{(2)})^2}{\sigma^2}} = \left( \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(x^{(1)})^2}{\sigma^2}} \right) \left( \frac{1}{\sqrt{\pi}\,\sigma}\, e^{-\frac{(x^{(2)})^2}{\sigma^2}} \right) = G(x^{(1)}; \sigma)\, G(x^{(2)}; \sigma) \qquad (5.34)
The same factorization holds in the frequency domain, as the Fourier transform of a separable function is equal to the product of the one-dimensional transforms of its components. It follows that all the considerations and relationships derived for the one-dimensional case are also valid for the two-dimensional one.
We stress that the relationship between the continuous and discrete cases, as in (5.14), leads to:

\hat{f}(x; \sigma) = \sum_{k=1}^{N} f_k\, G(x^{(1)}; \mu_k^{(1)}, \sigma)\, \Delta x^{(1)}\, G(x^{(2)}; \mu_k^{(2)}, \sigma)\, \Delta x^{(2)} \qquad (5.36)
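Since the 2D Gaussian factorizes as in (5.34), the output of a lattice of Gaussians can be evaluated as a product of one-dimensional factors. The following sketch (ours; it assumes the centers lie on a regular lattice) exploits this:

```python
import numpy as np

def gauss1d(t, centers, sigma):
    """1D normalized Gaussian factors, shape (len(t), len(centers))."""
    return np.exp(-((t[:, None] - centers[None, :]) ** 2) / sigma ** 2) / (np.sqrt(np.pi) * sigma)

def rbf2d_separable(xq, yq, cx, cy, W, sigma, dx, dy):
    """Evaluate f_hat on a grid of query points exploiting (5.34)-(5.36):
    centers lie on the lattice cx x cy, W[k1, k2] = f(cx[k1], cy[k2]), and the
    Delta-x factors are folded into the 1D matrices, so the result is Gx @ W @ Gy^T
    instead of an explicit sum over all 2D Gaussians."""
    Gx = gauss1d(xq, cx, sigma) * dx     # (Nx, M1)
    Gy = gauss1d(yq, cy, sigma) * dy     # (Ny, M2)
    return Gx @ W @ Gy.T                 # (Nx, Ny)

if __name__ == "__main__":
    dx = dy = 0.1
    cx = cy = np.arange(0.0, 1.0 + dx, dx)
    X, Y = np.meshgrid(cx, cy, indexing="ij")
    W = np.sin(2 * np.pi * X) * np.cos(2 * np.pi * Y)    # f_k sampled on the lattice
    out = rbf2d_separable(np.linspace(0, 1, 50), np.linspace(0, 1, 50),
                          cx, cy, W, sigma=1.465 * dx, dx=dx, dy=dy)
    print(out.shape)
```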
Fig. 5.4 Reconstruction of a doll with the HRBF model using (a) 5 layers, (b) 6 layers, (c) 7 layers, and (d) 8 layers. The hierarchical structure of the model allows choosing the number of layers that matches the required visual quality
Fig. 5.5 The grids of Gaussians for the 5th (a), 6th (b), 7th (c), and 8th (d) layers of the HRBF
model used for the reconstruction of the doll in Fig. 5.4. Note the sparseness of the Gaussians
In Fig. 5.5, the grids of Gaussians of these layers are depicted. The centers of the Gaussians effectively used (i.e., those that do not have a zero weight) are reported as cross marks. It can be noticed that in the last layers the Gaussians are inserted only in some regions of the input space, as in the others the approximation is already below the threshold ε after the first layers.
In [97], the approximation properties of the HRBF model have been investigated formally. It is shown that an RBF network with units equally spaced on a lattice can be regarded as a Riesz basis and therefore it can approximate a large class of functions.
5.3 Real-time, incremental surface construction through HRBF networks
For the HRBF networks, the scheme described in the previous section cannot be
efficiently applied for real-time reconstruction. In fact, if an HRBF network has
been already configured using a given data set, when a new sample, (xnew , ynew ),
becomes available, the output of (5.23) becomes out of date for the first layer. Then
fˇ(μk ) has to be estimated again for all the Gaussians by using the new data set
constituted of the old data set plus the new sample. As a result, a1 (·) changes inside
the influence region of all the updated units and the residual r1 changes for all the
points inside this area. This requires updating the weights of the second layer for
those Gaussians whose receptive field intersects this area. This chain reaction may involve a substantial subset of the HRBF network's units. Moreover, the new point can prompt the request for a new layer, at a smaller scale, if the average residual goes over threshold in some regions of the input domain.
A brute force solution is to reconfigure the entire network, re-computing all the parameters every time a new point is added to the input data set. This solution is computationally expensive and infeasible for real-time configuration. To avoid this,
an on-line version of the previous scheme has been developed [91][32] and it is
presented in the following.
Fig. 5.6 The close neighborhood Cl,k of the Gaussian centered in μl,k , belonging to the l-th layer,
is shown in pale gray in (a). The ensemble of the close neighborhoods tessellates the input domain,
partitioning it in squares which have side equal to that of the l-th grid Δxl and are offset by half
grid side. In the next layer, Cl,k is split into four close neighborhoods, Cl+1,j (quads) according
to a quad-tree decomposition scheme, as shown in (b). Each Cl+1,j has the side half the length
of Cl,k , and it is centered in a Gaussian μl+1,j positioned in ”+”. Originally published in [91]
(reproduced by permission of IEEE)
Fig. 5.7 Schematic representation of the on-line HRBF configuration algorithm. Originally
published in [91] (reproduced by permission of IEEE)
The on-line configuration algorithm alternates two phases: the update of the weights of the already inserted Gaussians and the evaluation of whether new Gaussians can be inserted. This latter operation is also called split, as 2^D new Gaussians can be inserted at the next layer, inside the area covered by the Gaussian of the current layer. These two phases, depicted in the scheme in Fig. 5.7, are iterated as long as new points are added.
The algorithm starts with a single Gaussian positioned approximately in the center of the acquisition volume, with a width large enough to cover the volume spanned by the data. An estimate of the size of the volume is therefore the only a priori information needed by the configuration algorithm.
A particular data structure is associated to each Gaussian G_{l,k}. This contains the Gaussian's position μ_{l,k}, its weight w_{l,k}, the numerator n_{l,k}, and the denominator d_{l,k} of (5.29). The structure associated to a Gaussian at the top of the hierarchy (current highest layer h) also contains all the samples that lie inside C_{h,k}. To obtain this, when a Gaussian is split during learning, its associated samples are sorted locally by means of the qsort algorithm and distributed among the 2^D new Gaussians of the higher layer.
As Δx_l = Δx_{l−1}/2, the close neighborhood of each Gaussian of the l-th layer (father) is formed as the union of the close neighborhoods of the 2^D corresponding Gaussians of the (l + 1)-th layer (children). This relationship, depicted in Fig. 5.6b,
is exploited to organize the data in a quad-tree: the points which lie inside Cl,k
are efficiently retrieved as those contained inside the close neighborhood of its four
children Gaussians.
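The data structure can be sketched as follows (a hypothetical Python rendering with D = 2; field and method names are ours, not the book's):

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class GaussianNode:
    """Per-unit record used by the on-line HRBF sketch: position, width,
    running numerator/denominator of (5.29), weight, the samples falling in
    its close neighborhood (kept only at the top of the hierarchy), and the
    2^D children created when the unit is split."""
    mu: tuple                # center mu_{l,k}
    sigma: float             # width sigma_l
    half_side: float         # half side of the close neighborhood C_{l,k}
    n: float = 0.0           # numerator n_{l,k}
    d: float = 0.0           # denominator d_{l,k}
    w: float = 0.0           # weight w_{l,k}
    samples: List[tuple] = field(default_factory=list)   # (x, y) pairs inside C_{l,k}
    children: Optional[List["GaussianNode"]] = None      # quad-tree children

    def in_close_neighborhood(self, x):
        return all(abs(xi - mi) < self.half_side for xi, mi in zip(x, self.mu))

    def split(self):
        """Create the 2^D children at half scale and hand the stored samples
        down to the child whose close neighborhood contains them."""
        h = self.half_side / 2.0
        offsets = [(-h, -h), (-h, h), (h, -h), (h, h)]    # D = 2 (surfaces)
        self.children = [GaussianNode(mu=(self.mu[0] + ox, self.mu[1] + oy),
                                      sigma=self.sigma / 2.0, half_side=h)
                         for ox, oy in offsets]
        for x, y in self.samples:
            for child in self.children:
                if child.in_close_neighborhood(x):
                    child.samples.append((x, y))
                    break
        self.samples = []     # samples now live in the children

# Minimal usage: one root unit covering the domain, split into four children.
root = GaussianNode(mu=(0.0, 0.0), sigma=1.0, half_side=0.5)
root.samples.append(((0.1, 0.1), 2.0))
root.split()
print(len(root.children), len(root.children[3].samples))
```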
In the following, it is assumed that the side of the receptive field Al,k and of
the influence region Il,k of a Gaussian are set to twice the size of the Gaussian’s
close neighborhood Cl,k to allow partial overlapping of adjacent units. However, any
relationship such that Al,k and Il,k cover an integer number of close neighborhoods
produces an efficient computational scheme.
When a new point x_new is given, the quantities n_{l,k}, d_{l,k}, and w_{l,k} of (5.29), associated to the Gaussians such that x_new ∈ A_{l,k}, are updated as:

n_{l,k} := n_{l,k} + r_{l-1}(x_{\text{new}})\, e^{-\|\mu_{l,k} - x_{\text{new}}\|^2 / (\sigma_l/2)^2} \qquad (5.37)

d_{l,k} := d_{l,k} + e^{-\|\mu_{l,k} - x_{\text{new}}\|^2 / (\sigma_l/2)^2} \qquad (5.38)

w_{l,k} = \frac{n_{l,k}}{d_{l,k}} \cdot \Delta x_l^D \qquad (5.39)
where r_{l−1}(x_new) is computed, as in (5.21), as the difference between the input datum and the sum of the outputs of the first l − 1 layers of the present network computed at x_new.
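A sketch of one weight-update step, following (5.37)–(5.39) (function and argument names are ours):

```python
import math

def update_unit(n_lk, d_lk, mu_lk, sigma_l, dx_l, x_new, r_prev, D=2):
    """One on-line update of a Gaussian of the last configured layer,
    following (5.37)-(5.39): the numerator accumulates the residual of the
    previous layers at x_new, the denominator the plain Gaussian weight."""
    dist2 = sum((m - x) ** 2 for m, x in zip(mu_lk, x_new))
    g = math.exp(-dist2 / (sigma_l / 2.0) ** 2)
    n_lk += r_prev * g                      # (5.37)
    d_lk += g                               # (5.38)
    w_lk = (n_lk / d_lk) * dx_l ** D        # (5.39)
    return n_lk, d_lk, w_lk

# Example: a unit at the origin receives a new sample with residual 0.8.
print(update_unit(0.0, 0.0, (0.0, 0.0), sigma_l=0.2, dx_l=0.1,
                  x_new=(0.05, -0.02), r_prev=0.8))
```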
It is explicitly noticed that the modification of the weight of a Gaussian in the
l-th layer Gl,k modifies the residual of that layer rl inside the Gaussian’s influence
region. Hence, the terms in (5.37)–(5.39) should be recomputed for all the layers
starting from the first layer upwards.
However, this would lead to an excessive computational load, and, in the updating phase, the terms in (5.37)–(5.39) are modified only for the last configured layer, l. The rationale is that, as the number of points increases, (5.39) tends to (5.29). After computing the new numerator, denominator and weight for the Gaussian (l, k), the residual is computed at x_new.
After updating the weights, xnew is inserted into the data structure associated to
the Gaussian of the highest layer h, such that xnew ∈ Ch,k .
After Q points have been collected, the need for new Gaussians is evaluated. To this
aim, the reconstructed manifold is examined inside the close neighborhood of
those Gaussians which satisfy the following three conditions: i) they do not have
any children (that is there are no Gaussians in a higher layer that share the same
support), ii) at least a given number K of points has been sampled inside their close
neighborhood, and iii) their close neighborhood includes at least one of the last Q
points acquired. These are the Gaussians that are candidates for splitting. Let us call J their ensemble.
For each Gaussian of J, the local residual R_{l,k} is re-evaluated for all the points inside its close neighborhood using the current network parameters. If R_{l,k} is larger than the given error threshold ε, splitting occurs: 2^D new Gaussians at half scale are inserted inside C_{l,k}.
The points associated to the Gaussian G_{l,k} are distributed among these four new Gaussians depending on which of the children's close neighborhoods C_{l+1,j} they belong to [cf. Fig. 5.6b].
We remark that the estimate of Rl,k requires the computation of the residual, that
is the output of all the previous layers of the network for all the points inside Cl,k .
To this aim, the output of all the Gaussians (of all the layers) whose receptive field
contains Cl,k is computed.
The parameters of a newly inserted Gaussian Gl+1,k , nl+1,k , dl+1,k and wl+1,k
in (5.37, 5.38, 5.39) are computed using all the points contained in its close neigh-
borhood. For this new Gaussian, no distinction is made between the points sampled
in the earlier acquisition stages and the last Q sampled points. The quantities nl+1,k ,
dl+1,k and wl+1,k are set to zero when no data point is present inside Cl+1,k and the
Gaussian Gl+1,k will not contribute to the network output.
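A sketch of the splitting phase, reusing the GaussianNode and update_unit helpers from the previous sketches (again ours and simplified; network_output is assumed to be a callable returning the current multi-layer estimate):

```python
def maybe_split(node, network_output, eps, K, recent_points):
    """Splitting-phase sketch: a leaf unit is split when (i) it has no children,
    (ii) at least K samples lie in its close neighborhood, (iii) one of the last
    Q acquired points falls there, and its local mean residual exceeds eps."""
    if node.children is not None or len(node.samples) < K:
        return False
    if not any(node.in_close_neighborhood(x) for x in recent_points):
        return False
    residuals = [abs(y - network_output(x)) for x, y in node.samples]
    R_lk = sum(residuals) / len(residuals)          # local residual R_{l,k}
    if R_lk <= eps:
        return False
    node.split()                                    # insert 2^D units at half scale
    for child in node.children:                     # robust initialization from the
        for x, y in child.samples:                  # points already collected there
            r = y - network_output(x)
            child.n, child.d, child.w = update_unit(child.n, child.d, child.mu,
                                                    child.sigma, child.half_side * 2,
                                                    x, r)
    return True
```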
It is worth noting that, as a consequence of this growing mechanism, the network
does not grow layer by layer, as in the batch case, but it grows on a local basis.
In [97] it was shown that the sequence of the residuals obtained with the HRBF
scheme converges to zero under mild conditions on f (·). As the on-line configu-
ration procedure is different from the batch one, the convergence of the residuals
obtained with the on-line scheme has to be proved.
The on-line and the batch scheme differ both in the computation of the weights [(5.37)–(5.39) versus (5.29)] and in the rule for inserting new Gaussians (in the batch scheme, this occurs layer-wise, while in the on-line scheme, it occurs locally during the splitting phase).
It is first shown that the output of each layer of the on-line HRBF is asymptoti-
cally equivalent to that of the batch HRBF. Let us first consider the case of the first
layer.
Let fp be the input data set constituted of the first p points sampled from f; the output of the first HRBF layer configured using fp can be written as the discrete convolution in (5.40). It can be shown that, when p tends to infinity, the function computed in (5.40) converges to the value computed in (5.30) for the batch case.
This is evident for this first layer, whose weights are estimated as fˇ(μl,j ), and
r0 (xi ) = yi holds. In this case, the following asymptotic condition can be derived:
\lim_{p \to \infty} w^{b}_{1,k} = \lim_{p \to \infty} \frac{\sum_{m=1}^{p} y_m\, e^{-\|\mu_{1,k} - x_m\|^2 / (\sigma_1/2)^2}}{\sum_{m=1}^{p} e^{-\|\mu_{1,k} - x_m\|^2 / (\sigma_1/2)^2}}\; \Delta x_1^D = \lim_{p \to \infty} w^{o}_{1,k} \qquad (5.41)

where w^b_{1,k} are the weights computed by the batch algorithm through (5.29) and w^o_{1,k} are those computed by the on-line algorithm through (5.37)–(5.39).
It follows that:

\lim_{p \to \infty} \Delta r_1(x_i) = \lim_{p \to \infty} \left[ r^{b}_1(x_i) - r^{o}_1(x_i) \right] = 0 \qquad (5.42)

where r^b_1(x_i) is the residual at the point x_i computed through (5.29), and r^o_1(x_i) is the same residual computed through (5.37)–(5.39).
If a second layer is considered, the estimate of its weights can be reformulated as
n_{2,k} := n_{2,k} + \left( r^{o}_1(x_p) + \Delta r_1(x_p) \right) e^{-\|\mu_{2,k} - x_p\|^2 / (\sigma_2/2)^2} \qquad (5.43)

d_{2,k} := d_{2,k} + e^{-\|\mu_{2,k} - x_p\|^2 / (\sigma_2/2)^2} \qquad (5.44)
Since lim_{p→∞} Δr_1(x_p) = 0 and d_{2,k} always increases with p, the contribution of the initially sampled data points becomes negligible as p increases. As a result, lim_{p→∞} w^b_{2,k} = lim_{p→∞} w^o_{2,k}, and also the approximation of the residual of the second layer tends to be equal for the batch and on-line approaches. The same applies also to the higher layers.
Splitting cannot introduce a poor approximation as the weights of the Gaussians
inserted during the splitting phase are robustly initialized with an estimate obtained
from at least K points.
The on-line HRBF model has been extensively applied to 3D scanning. Points were acquired by the Autoscan system, which is based on acquiring the 3D position of a laser spot through a set of cameras that can be set up according to the acquisition needs: a very flexible set-up is therefore obtained. Autoscan [47][48] allows sampling more points inside those regions which contain more details: a higher data density can therefore be achieved in those regions that contain higher spatial frequencies. To this aim, real-time feedback of the reconstructed surface is provided to the operator.
Fig. 5.8 A typical data set acquired by the Autoscan system [48]: the panda mask in (a) has been
sampled into 33 000 3D points to obtain the cloud of points shown in panel (b). These points
constitute the input to the HRBF network [91] (reproduced by permission of IEEE). Note the
higher point density in the mouth and eyes regions
The error, expressed in millimeters, was measured both in the l1-norm, as the mean absolute error, εmean, and in the l2-norm, as the root mean squared error, RMSE:
\varepsilon_{\text{mean}} = \frac{1}{N} \sum_{i=1}^{N} |r_T(x_i)| \qquad (5.45a)

\text{RMSE} = \sqrt{\frac{1}{N} \sum_{i=1}^{N} r_T(x_i)^2} \qquad (5.45b)
Fig. 5.9 Panels (a)–(f) show the reconstruction with on-line HRBF after 1 000, 5 000, 10 000, 20 000, 25 000, and 32 000 points have been sampled [91] (reproduced by permission of IEEE)
where r_T(x_i) is the reconstruction error on the i-th point of the test set, i = 1, ..., N. The error was measured as the mean value of the test error averaged over ten randomizations on the same dataset.
To avoid border effects, (5.45) have been computed considering only the points
that lie inside an internal region of the input domain; this region has been defined as
the region delimited by the convex hull of the data set, reduced by 10%.
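A sketch of the error computation in (5.45); as a simplification, the internal region is approximated here by shrinking the bounding box instead of the convex hull (names and details are ours):

```python
import numpy as np

def reconstruction_errors(points, true_z, predicted_z, shrink=0.10):
    """Mean absolute error and RMSE of (5.45a)-(5.45b), computed only on the
    test points lying in an internal region (here the bounding box shrunk by
    `shrink`, a simplification of the convex-hull criterion used in the text)."""
    points = np.asarray(points, dtype=float)
    lo, hi = points.min(axis=0), points.max(axis=0)
    margin = shrink * (hi - lo) / 2.0
    inside = np.all((points >= lo + margin) & (points <= hi - margin), axis=1)
    r = np.asarray(true_z)[inside] - np.asarray(predicted_z)[inside]
    eps_mean = np.mean(np.abs(r))           # (5.45a)
    rmse = np.sqrt(np.mean(r ** 2))         # (5.45b)
    return eps_mean, rmse

rng = np.random.default_rng(2)
pts = rng.uniform(0, 100, size=(1000, 2))
z = np.sin(pts[:, 0] / 10.0)
print(reconstruction_errors(pts, z, z + 0.4 * rng.standard_normal(1000)))
```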
Results of the comparison with the batch HRBF model are reported in Table 5.1.
These figures have been obtained with the following parameters: Q = 100, K = 3,
L = 8 layers. The error threshold, ε, was set for all the layers equal to the nominal digitization error, which was 0.4 mm; the final reconstruction error of 0.391 mm is very close to this value. A total of 9 222 Gaussians were allocated over the eight layers, producing a sparse approximation (cf. Figs. 5.10d–f).
Table 5.1 Accuracy and parameters of each layer of the HRBF networks

                       | on-line                          | pure batch                       | batch constrained
#layer   σ             | #Gauss. (total)   RMSE    εmean  | #Gauss. (total)   RMSE    εmean  | RMSE    εmean
1        363.3         | 1 (1)             47.8    46.2   | 1 (1)             47.8    46.2   | 47.8    46.2
2        181.7         | 4 (5)             30.2    28.0   | 4 (5)             30.2    28.0   | 30.2    28.0
3        90.8          | 16 (21)           13.0    10.7   | 16 (21)           13.0    10.7   | 13.0    10.7
4        45.4          | 46 (67)           6.44    5.10   | 62 (83)           6.40    5.05   | 6.39    5.07
5        22.7          | 160 (227)         3.33    2.68   | 204 (287)         3.07    2.50   | 3.03    2.48
6        11.4          | 573 (800)         2.17    1.66   | 678 (965)         1.73    1.41   | 1.72    1.41
7        5.68          | 2 092 (2 892)     1.16    0.838  | 2 349 (3 314)     0.849   0.637  | 0.872   0.657
8        2.84          | 6 330 (9 222)     0.530   0.391  | 7 079 (10 393)    0.510   0.373  | 0.526   0.385

Published in [91] (reproduced by permission of IEEE).
Fig. 5.10 Reconstruction with (a) HRBF batch and (b) HRBF on-line. The difference between
the two surfaces is shown in panel (c). In panels (d)–(f) the center of the Gaussians allocated by
the on-line algorithm in the last three layers is shown. Originally published in [91] (reproduced by
permission of IEEE)
The network complexity and the reconstruction error have been compared with
those obtained when the network was configured using a batch approach [95], with
the same number of layers as the on-line version.
Two batch modalities have been considered. In the first one, pure batch, the
configuration procedure described in Sect. 5.2.3 [97] is adopted. In the second
approach, batch constrained, the Gaussians are placed in the same position, and
have the same width as those of the on-line approach, while the weights are
computed by (5.29), considering all the data points inside the receptive field of each
Gaussian, as described in [46].
As shown in Fig. 5.10, the surface reconstructed by the batch HRBF has a slightly better appearance than the on-line one, especially at the object border, as shown by the difference images (Figs. 5.10c and 5.11). This was obtained at the expense of a larger number of Gaussians: about 12.7% more than those used in the on-line approach, being 10 393 versus 9 222 (Table 5.1). Despite the difference in the number of Gaussians, the global accuracy of the batch approach is only slightly better than that of the on-line one, being 0.373 mm versus 0.391 mm (4.82%).

Fig. 5.11 Difference in the reconstruction error on the points of the test set: on-line vs. pure batch (a), on-line vs. batch constrained (b). Originally published in [91] (reproduced by permission of IEEE)
As the acquisition was stopped when the visual appearance of the model
(reconstructed with the on-line approach) was considered adequate by the operator,
we have investigated if there was room for further accuracy improvement by
acquiring more data points. To this aim, the rate of Gaussians allocation and of
error decrease as a function of the number of data points is plotted in Fig. 5.12.
The figure also shows that, by adding new points, the error of the on-line model can be slightly lowered, down to 0.381 mm, closer to the batch approach. To achieve such an error, only 99 more Gaussians are required. However, as clearly shown, the batch version grows and converges faster than the on-line version: it achieves an εmean of 0.391 mm using only 8 500 data points. This is due to the fact that in the batch approach all the parameters are computed together for each layer and are therefore better optimized.
Results are consistent for different artifacts (cf. Table 5.2 and Fig. 5.13). Real-
time visualization of the actual surface has been of great value in directing the laser
spot for more time in the most critical regions, collecting more points there. Best
results are obtained when the points are sampled mostly uniformly during the configuration of the first few layers. This allows a more robust estimate of the weights of the Gaussians of these layers, each of which covers a large portion of the input space.
In all the experiments, data acquisition was stopped when the visual appearance
of the reconstructed model was considered satisfactory by the operator. Alterna-
tively, data acquisition could be stopped when splitting inside the HRBF model
does not occur anymore.
Fig. 5.12 Number of effective Gaussians (on-line vs. batch) and reconstruction error as a function of the number of acquired points; the data set is grown by 500 data points at a time. Originally published in [91] (reproduced by permission of IEEE)
Fig. 5.13 HRBF on-line reconstruction of the dataset (a) cow (33861 points, 9501 Gaussians),
and (b) doll (15851 points, 6058 Gaussians) [91] (reproduced by permission of IEEE)
The behavior of the test error as a function of Q is shown in Fig. 5.14b. For small values of Q, the behavior does not change significantly with Q: the error
starts increasing after Q = 1 000, although the increase is of small amplitude (about
0.01 mm from Q = 1 000 to Q = 2 500). The number of Gaussian units instead
decreases monotonically with Q, with a marked decrease above Q = 300. This
can be explained by the fact that, when a new Gaussian is inserted, its weight is
initialized using all the already acquired points that belong to its close neighborhood
(Sect. 5.3.2). Afterward, its weight is updated considering only the points inserted
inside its receptive field, as in (5.37, 5.38, 5.39). Therefore, increasing Q, the weight
associated to each new Gaussian can be computed more reliably as more points are
available for its estimate. However, when Q assumes too large values with respect
to the number of available data points, not enough splits are allowed to occur (these
are at most N/Q), and the reconstruction becomes poorer. This situation is depicted
in Fig. 5.14b where for a relatively large value of Q, the test error tends to increase
with Q, as an effect of the decrease in the number of the allocated Gaussians.
From the curves in Fig. 5.14, it can be derived that the optimal value of Q would be about 1 000, as it allows a low reconstruction error with a reduced number of Gaussians. However, the price to be paid for such a saving in computational units is some loss in the degree of interactivity. In fact, when Q increases, splitting occurs less frequently and, as splitting produces the largest decrease in the reconstruction error, a longer time has to elapse before the user can see a large change in the appearance of the reconstructed model, while ideally the user would see the model improve steadily over time. Moreover, the reconstruction error (computed as in 5.45a) decreases less quickly, as shown in Fig. 5.15b. Therefore, although the values of K and Q may be subject to optimization with respect to network accuracy or size, the resulting values may not be adequate for real-time applications. In particular, as Q produces a very similar test error over a wide range of values, it has been set here to a relatively small value (Q = 100 in the experiments reported above).
Fig. 5.14 Test error and number of allocated Gaussians as a function of K (a) and as a function of Q, with K fixed to 3 (b). Originally published in [91] (reproduced by permission of IEEE)

Fig. 5.15 Test error and number of allocated Gaussians as a function of the number of acquired points; in (b) Q was fixed to 1 000 and K to 3 [91] (reproduced by permission of IEEE)
For this reason, a very low value of K can be chosen: a value of K < 5 worked
well in all our experiments and produced a good reconstruction with a reasonably
low number of Gaussians.
The parameter L determines the level of detail in the reconstructed surface, as it sets the smallest value of σ, related to the finest local spatial frequency. It can be set in advance when this information is available, to avoid the introduction of spurious high frequency components. Otherwise, L can be incremented until the error in (5.45) goes under threshold or a maximum number of Gaussians has been inserted. It should be remarked that, in this latter case, if Q were too small, L could increase more than necessary.
The mechanism used in the weight update phase, which does not require the reconfiguration of the whole network, may introduce a slight bias in the weights. This can be appreciated in Table 5.1, where the accuracy obtained when the weights are estimated considering all the data points (batch constrained) is compared with that obtained with the on-line approach described here.
In fact, the value output by the current network at μ_{l,k} (5.29) is computed as the ratio between n_{l,k} and d_{l,k}, obtained as the run-time sum of the values derived from each sampled point for the higher layers. However, this value is equal to that output by the batch model only for the first layer, in which the weight of each Gaussian is computed using all the sampled points inside the Gaussian's receptive field. In the higher layers, where the residual at the already acquired points is not updated, the estimate of the weights may introduce a bias that, in turn, may produce a bias in the reconstruction.
However, as this bias increases the value of the residual, it is taken into account in the splitting phase by the weights of the new Gaussians inserted in the higher layer. In fact, the residual is recomputed there for the last layer, using all the data points inside C_{l,k}. Moreover, due to the small angle between the spaces spanned by two consecutive approximations, the HRBF model is able to compensate the reconstruction error of one layer with the approximation produced by the next layer [97].
The maximum number of layers does not only determine the maximum spatial frequency that can be reconstructed; it also has a subtler effect. In fact, the on-line configuration approach, differently from the batch one, can introduce Gaussians at the k-th level also when the (k − 1)-th level has not been completed, as the parameters of each Gaussian are updated independently of those of the others. Therefore one branch of the network can grow before another branch; a Gaussian can be split before the neighboring Gaussians of the same layer. This is the case when higher frequency details are concentrated inside the receptive field of that Gaussian.
When the maximum number of layers of the network, L, is too low for the frequency content of the given dataset, the error inside the close neighborhood of some Gaussians of the last layers will also contain the highest frequency details. As a consequence, the reconstruction in these regions can be poor. This error also affects the close neighborhoods of the adjacent units, through the influence regions and the receptive fields of the corresponding Gaussians. This, in turn, may induce splitting of these adjacent units and produces networks of different structure when a different maximum number of layers is prescribed (cf. Table 5.3).
On-line HRBF shares with other growing network models the criterion for inserting new units: the insertion is decided on the basis of an error measure, computed locally, which is fundamental to achieve real-time operation. The other element which allows real-time operation is the adoption of a grid support for the units, which guides Gaussian positioning. This is shared also by [18]. However, in [18] all the weights are recomputed after the insertion of new Gaussians, while here only a subset of the weights is recomputed, thanks to the hierarchical close neighborhood structure. This produces a large saving, especially for large networks. Grid support has also been adopted by [207][160]; however, in their approach global optimization is used, which makes the configuration procedure computationally heavy.
Finally, the growing strategy implicitly implements an active learning procedure [117], as the data points that participate in the configuration of the new Gaussians are only those that carry an over-threshold error.
Chapter 6
Hierarchical Support Vector Regression
In the previous chapter, the RBFN model and the advantages of a hierarchical version for surface reconstruction have been presented. In a similar way, this chapter introduces another paradigm, Support Vector Regression (SVR), and its hierarchical version, Hierarchical Support Vector Regression (HSVR), which allows an efficient construction of the approximating surface. Thanks to the hierarchical structure, the model can be better applied to 3D surface reconstruction, giving a new, more robust and faster configuration procedure.
The SVM approach was developed by Vapnik and co-workers from the 1960s to the 1990s who, starting from the non-linear generalization of the Generalized Portrait model [238], extensively analyzed the conditions under which a minimum of the generalization error corresponds to the minimum of the training error. This led to the foundations of statistical learning theory [236][237], also called Vapnik-Chervonenkis (VC) theory, and to the definition of a cost function that has the shape of a regularizer.
Initially, this approach was applied only to classification problems. In particular, the first works were focused on optical character recognition (OCR) [24]. The classification problem is formulated as a convex optimization problem in which the solution is unique and can be found using standard optimization software. Furthermore, the computational complexity of the procedure used to find the solution is independent of the input dimension. Thanks to their good performance with respect to other methods, SVMs rapidly became popular in all areas where statistical classification is required [53][54].
SVMs have more recently been extended to regression [215], a domain in which this approach is called Support Vector Regression (SVR). The problem is formulated, again, as a convex optimization problem whose computational complexity is independent of the input dimensionality, and it has shown good performance in the solution of many applicative problems [84][171][217], among which 3D scanning is one of the most suitable.
The quality of the solution computed by the SVR paradigm depends on a few hyperparameters, generally selected using a trial-and-error strategy. When the data are characterized by a different frequency content over the input domain, a combination of parameters that produces a good solution over the entire input domain may not be found. A hierarchical SVR structure, which realizes a reconstruction locally at different scales, addresses this problem and is described here: the core of this chapter is an innovative paradigm based on SVR allowing multi-scale reconstruction, and its use for 3D surface reconstruction.
In order to better understand the working principles of the Hierarchical Support Vector Regression (HSVR) model, the basic SVM approach is initially considered. Starting from the linear classification problem, the explanation is extended to the non-linear case and then to the regression problem. In Sect. 6.2, the hierarchical approach is treated in its different aspects and the HSVR model is described, along with the results of its application to 3D scanning.
Among the separating hyperplanes, the SVM approach selects the one that maximizes the margin, making the problem well-posed. The margin is defined as the distance between the hyperplane and the closest examples (Fig. 6.2). The maximization of the margin allows finding the best solution from a statistical point of view, i.e., the hyperplane that minimizes the statistical risk of misclassification.
The computation of the hyperplane that maximizes the margin is shown in the following. Let h(x) = ω · x + b = 0 be the equation of the hyperplane. The signed distance between the hyperplane and a generic point x is h(x)/||ω|| (Fig. 6.3). The distance can be obtained by multiplying the signed distance by the label y_i (6.2). Every separating hyperplane satisfies the following:
\frac{y_i\, h(x_i)}{\|\omega\|} \ge M > 0, \quad \forall\, i = 1, \ldots, n \qquad (6.3)
where M represents the distance between the hyperplane and the closest example; the margin is the largest M achievable over all separating hyperplanes. If only canonical hyperplanes are considered, maximizing the margin has the same effect as minimizing ||ω||, since M = 1/||ω||. The hyperplane (ω, b) that solves the optimization problem:
\min_{\omega, b}\; \frac{1}{2}\|\omega\|^2 \quad \text{subject to} \quad y_i(\omega \cdot x_i + b) \ge 1, \; i = 1, \ldots, n \qquad (6.6)
realizes the maximal margin hyperplane with geometric margin M = 1/||ω||.
The optimization problem can be solved transforming it into its corresponding
dual problem, because the latter is, generally, easier to solve than the primal one.
The dual problem is obtained by the Lagrangian form of the primal problem (6.6):
L(\omega, b, \alpha) = \frac{1}{2}\|\omega\|^2 - \sum_{i=1}^{n} \alpha_i \left( y_i(\omega \cdot x_i + b) - 1 \right) \qquad (6.7)
where α_i are the Lagrange multipliers. If the Lagrangian form is maximized with respect to the multipliers α_i ≥ 0 and minimized with respect to ω and b:

\max_{\alpha \ge 0}\; \min_{\omega, b}\; L(\omega, b, \alpha) \qquad (6.8)

it can be shown that the solution of this problem is the same as the solution of the primal problem (6.6).
Let (ω∗, b∗) be a pair of values for the problem (6.6). If (ω∗, b∗) does not satisfy all the constraints of (6.6), then max_α L(ω∗, b∗, α) tends to infinity and hence (ω∗, b∗) is not a solution of (6.8). If (ω∗, b∗) satisfies all the constraints of (6.6), then max_α L(ω∗, b∗, α) = ½||ω∗||², hence the solution of (6.8) is equal to the solution of (6.6).
Necessary conditions for a point (ω, b) to be a minimum of the primal problem
(6.6) are the following:
\frac{\partial L(\omega, b, \alpha)}{\partial b} = 0 \;\Rightarrow\; \sum_{i=1}^{r} \alpha_i y_i = 0

\frac{\partial L(\omega, b, \alpha)}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{i=1}^{r} \alpha_i y_i x_i \qquad (6.9)

L(\omega, b, \alpha) = \frac{1}{2}(\omega \cdot \omega) - \sum_{i=1}^{r} \alpha_i y_i (\omega \cdot x_i) - b \sum_{i=1}^{r} \alpha_i y_i + \sum_{i=1}^{r} \alpha_i \qquad (6.10)
Substituting the conditions (6.9) into the right-hand side of (6.10), the dual problem is obtained:
\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{r} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, (x_i \cdot x_j)

\text{s.t.} \quad \sum_{i=1}^{r} \alpha_i y_i = 0, \qquad \alpha_i \ge 0, \; \forall\, i = 1, \ldots, r \qquad (6.12)
Therefore, the minimum of the primal problem (6.6) coincides with the maximum of the dual problem (6.12). The latter is a quadratic programming problem (convex quadratic functional and linear constraints) with a unique solution that can be found using standard optimization software.
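As an illustration that the dual is an ordinary QP, the following sketch (ours) solves (6.12) numerically on a tiny separable toy set with SciPy's SLSQP solver; in practice, dedicated QP or SMO solvers would be used.

```python
import numpy as np
from scipy.optimize import minimize

# Tiny linearly separable toy set (two points per class).
X = np.array([[2.0, 2.0], [3.0, 3.0], [-2.0, -1.0], [-3.0, -2.0]])
y = np.array([1.0, 1.0, -1.0, -1.0])

Q = (y[:, None] * y[None, :]) * (X @ X.T)          # Q_ij = y_i y_j (x_i . x_j)

def neg_dual(alpha):                                # -W(alpha) of (6.12)
    return 0.5 * alpha @ Q @ alpha - alpha.sum()

res = minimize(neg_dual, x0=np.zeros(len(y)), method="SLSQP",
               bounds=[(0.0, None)] * len(y),
               constraints=[{"type": "eq", "fun": lambda a: a @ y}])

alpha = res.x
w = (alpha * y) @ X                                 # (6.14)
sv = np.argmax(alpha)                               # index of a support vector
b = y[sv] - w @ X[sv]                               # (6.15)
print("alpha:", np.round(alpha, 3), "w:", np.round(w, 3), "b:", round(float(b), 3))
```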
Let α∗ be the solution of the dual problem. The second condition of (6.9) gives ω∗ = Σ_{i=1}^{r} α∗_i y_i x_i; hence ω∗ is a linear combination of training set points. Furthermore, from the Kuhn-Tucker theorem [141] it is known that the solution has to satisfy:
\alpha^*_i \left( y_i(\omega^* \cdot x_i + b^*) - 1 \right) = 0, \quad \forall\, i = 1, \ldots, n \qquad (6.13)
The points for which α∗_i > 0 are called support vectors (SVs). The vector ω∗, which determines the slope of the hyperplane, is computed as a linear combination of the SVs:

\omega^* = \sum_{i \in SV} y_i\, \alpha^*_i\, x_i \qquad (6.14)
The bias b∗ can be computed using the KKT condition corresponding to any one of the support vectors:

y_i(\omega^* \cdot x_i + b^*) = 1 \;\Rightarrow\; b^* = y_i - \sum_{j \in SV} y_j\, \alpha^*_j\, (x_j \cdot x_i) \qquad (6.15)
If the training set is not linearly separable (Fig. 6.5), there is no hyperplane that can correctly classify all the points. However, it is clear that some hyperplanes are preferable to others for this task. A possible strategy consists in the minimization of the misclassification error, that is, the number of points incorrectly classified, and, at the same time, in the maximization of the margin for the points correctly classified.
In order to realize that strategy, called soft margin, the constraints are relaxed by
means of the introduction of the slack variables, ξi ≥ 0. The constraints in (6.6) are
reformulated as:
yi (ω · xi + b) ≥ 1 − ξi , ∀ i = 1, . . . , n (6.17)
The classification error on the training set can be measured as \sum_{i=1}^{n} \xi_i^p, where p ∈ R. The minimization problem can be expressed as:
\min_{\omega, b, \xi}\; \frac{1}{2}(\omega^T \cdot \omega) + C \sum_{i=1}^{n} \xi_i \qquad (6.18)
where p has been chosen equal to one. C is a regularization constant that determines
the trade-off between misclassified samples and the maximization of the margin.
The corresponding Lagrangian is:
L(\omega, b, \xi, \alpha, q) = \frac{1}{2}(\omega \cdot \omega) + C \sum_{i=1}^{n} \xi_i - \sum_{i=1}^{n} \alpha_i \left( y_i(\omega \cdot x_i + b) - 1 + \xi_i \right) - \sum_{i=1}^{n} q_i \xi_i \qquad (6.19)
with αi ≥ 0 and qi ≥ 0. The dual form is found by differentiating with respect to
ω, ξ, and b and imposing stationarity:
\frac{\partial L(\omega, b, \xi, \alpha, q)}{\partial \omega} = \omega - \sum_{i=1}^{r} y_i \alpha_i x_i = 0

\frac{\partial L(\omega, b, \xi, \alpha, q)}{\partial \xi_i} = C - \alpha_i - q_i = 0

\frac{\partial L(\omega, b, \xi, \alpha, q)}{\partial b} = \sum_{i=1}^{r} y_i \alpha_i = 0 \qquad (6.20)
Substituting these conditions into the Lagrangian, the dual problem becomes:

\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{r} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, (x_i \cdot x_j)

\text{s.t.} \quad \sum_{i=1}^{r} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \; \forall\, i = 1, \ldots, r \qquad (6.21)
(6.21) differs from (6.12) only in the constraints on the multipliers. In this case, the KKT conditions are:
Fig. 6.6 The function φ maps the data in another space called feature space
Fig. 6.7 In the feature space the data become linearly separable, and the SVM algorithm can be
applied
\alpha_i \left( y_i(\omega \cdot x_i + b) - 1 + \xi_i \right) = 0

\xi_i (\alpha_i - C) = 0 \qquad (6.22)
As in the separable case, the solution is sparse. The points for which α∗_i = C are called bounded support vectors and they have non-zero associated slack variables (second KKT condition). The points for which 0 < α∗_i < C are called unbounded support vectors and they have null slack variables. The decision function is equal to (6.16).
Since SVMs are linear machines, they are able to compute only hyperplanes. Hence, they perform poorly on classification problems where the data are not linearly separable. The strategy used to realize a non-linear classification with a linear machine is based on the idea that the data can be mapped into another space, called feature space (Fig. 6.6). Since, generally, the feature space has a higher dimension, the data in this space can become linearly separable, which allows the use of the SVM algorithm (Fig. 6.7).
In the feature space, the dual problem becomes:

\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{n} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j \left( \phi(x_i) \cdot \phi(x_j) \right)

\text{s.t.} \quad \sum_{i=1}^{n} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \; \forall\, i = 1, \ldots, n \qquad (6.24)
We note that the elements φ(x_i) appear only in dot products, both in the dual problem and in the decision function. By computing the results of these dot products directly, it is possible to avoid the explicit computation of the φ function. The kernel function, introduced in the next section, is the tool that allows the direct computation of these dot products.
6.1.3.1 Kernel
A kernel is a function k such that k(x, x′) = φ(x) · φ(x′), where φ is a map from the input space X to a space F endowed with the dot product operation. The kernel implicitly defines the map φ and can therefore be used to find the optimal hyperplane in the space F. Hence, the explicit computation of φ(x) can be avoided by using the kernel (this technique is also known as the “kernel trick”).
The kernel can be derived from the mapping function φ, but generally it is set a priori and the function φ can remain unknown. As the dot product is commutative, the kernel has to be symmetric: k(x, x′) = k(x′, x). Using the kernel, the dual problem can be rewritten as:
\max_{\alpha}\; W(\alpha) = \sum_{i=1}^{r} \alpha_i - \frac{1}{2} \sum_{i,j} \alpha_i \alpha_j y_i y_j\, k(x_i, x_j)

\text{s.t.} \quad \sum_{i=1}^{r} \alpha_i y_i = 0, \qquad 0 \le \alpha_i \le C, \; \forall\, i = 1, \ldots, r \qquad (6.29)
Fig. 6.8 The goodness of an approximation, fˆ, is evaluated on the distance between a point and its
approximation on the surface (left panel). This distance is evaluated by means of a loss function,
which, for instance, can be quadratic or linear (right panel)
Fig. 6.9 The points inside the ε-tube do not contribute to the training error (on the left), while the
points outside the ε-tube contribute linearly to the error due to the shape of the loss function (on
the right)
The cost function can be seen as composed of two terms: the first, \frac{1}{2}(\omega^T \cdot \omega), controls the slope of the solution, while the second, \frac{1}{n} \sum_{i=1}^{n} L_\varepsilon(|y_i - (\omega \cdot x_i) - b|), controls the approximation error.
By absorbing the term 1/n into the constant C and introducing the slack variables, the problem becomes:
\min_{\omega, b, \xi, \xi^*}\; \frac{1}{2}(\omega \cdot \omega) + C \sum_{i=1}^{n} (\xi_i + \xi_i^*)

\text{s.t.} \quad y_i - (\omega \cdot x_i) - b \le \varepsilon + \xi_i, \quad (\omega \cdot x_i) + b - y_i \le \varepsilon + \xi_i^*, \quad \xi_i, \xi_i^* \ge 0, \quad \forall\, i = 1, \ldots, n \qquad (6.33)
The constant C is the regularization parameter and controls the trade-off between the slope of the hyperplane and the training error measured according to (6.31).
As for the classification case, the Lagrangian of the primal problem is formulated
introducing the multipliers αi , α∗i , ηi , ηi∗ :
L(\omega, b, \alpha, \alpha^*, \eta, \eta^*) = \frac{1}{2}\|\omega\|^2 + C \sum_{i=1}^{n} (\xi_i + \xi_i^*) - \sum_{i=1}^{n} \alpha_i \left( \varepsilon + \xi_i - y_i + (\omega \cdot x_i) + b \right) - \sum_{i=1}^{n} \alpha_i^* \left( \varepsilon + \xi_i^* + y_i - (\omega \cdot x_i) - b \right) - \sum_{i=1}^{n} (\eta_i \xi_i + \eta_i^* \xi_i^*) \qquad (6.34)
\frac{\partial L}{\partial \omega} = 0 \;\Rightarrow\; \omega = \sum_{i=1}^{n} (\alpha_i - \alpha_i^*)\, x_i

\frac{\partial L}{\partial \xi_i} = 0 \;\Rightarrow\; \eta_i = C - \alpha_i

\frac{\partial L}{\partial \xi_i^*} = 0 \;\Rightarrow\; \eta_i^* = C - \alpha_i^* \qquad (6.35)
From the substitution of the previous conditions in the Lagrangian, the dual
problem is obtained:
\max_{\alpha, \alpha^*}\; -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)(x_i \cdot x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)

\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \; \forall\, i = 1, \ldots, n \qquad (6.36)
The first two KKT conditions ensure that the Lagrange multipliers are zero for each point lying inside the ε-tube; the solution is therefore sparse. The last two conditions state that each point lying outside the ε-tube is a bounded support vector, namely ξ_i > 0 ⇒ α_i = C. Each point lying on the margin (y_i − (ω · x_i) − b = ε, ξ_i = 0) has Lagrange multipliers with value in the range [0, C], 0 ≤ α_i ≤ C (Fig. 6.10).
The solution can be expressed as:
\hat{f}(x) = \sum_{i \in SV} (\alpha_i - \alpha_i^*)(x_i \cdot x) + b \qquad (6.38)
The value of b is computed using the KKT conditions. In Fig. 6.11, an example of linear regression is shown for several values of ε. The training set is obtained by sampling a linear function (Fig. 6.11a) and adding a random uniform quantity to the samples.
Fig. 6.11 In (a), a training set is obtained by sampling a linear function at 10 points and adding a random uniform quantity to the points. In (b), (c), and (d) the SVR approximation obtained with an ε-tube of 0.1, 0.5, and 0.7, respectively, is shown by the darker line. The circled points are the SVs
Since SVRs are linear machines, they can compute only linear regressions. As for the classification problem, in order to obtain a non-linear solution the training set is mapped into another space (characterized by a higher dimension) in which the problem becomes linear. The solution is then computed in that space. Since the feature space can have infinite dimensions, the computation of the mapping function can be infeasible. Fortunately, by using the kernel, also in this case the explicit computation of this function can be avoided.
First of all, it can be noted that both in the dual regression problem (6.36) and
in the decision function (6.38) the training points appear only as the dot product of
pairs of them. The kernel function can be used to compute the result of these dot
products. By using the kernel, the regression optimization problem has the form:
\max_{\alpha, \alpha^*}\; -\frac{1}{2} \sum_{i,j=1}^{n} (\alpha_i - \alpha_i^*)(\alpha_j - \alpha_j^*)\, k(x_i, x_j) - \varepsilon \sum_{i=1}^{n} (\alpha_i + \alpha_i^*) + \sum_{i=1}^{n} y_i (\alpha_i - \alpha_i^*)

\text{s.t.} \quad \sum_{i=1}^{n} (\alpha_i - \alpha_i^*) = 0, \qquad 0 \le \alpha_i, \alpha_i^* \le C, \; \forall\, i = 1, \ldots, n \qquad (6.39)
Introducing the coefficients β_i = α_i − α_i^*, the constraints can be written as:

\sum_{i=1}^{n} \beta_i = 0, \qquad -C \le \beta_i \le C, \; \forall\, i = 1, \ldots, n \qquad (6.41)
The decision function becomes:
h(x) = \sum_{i \in SV} \beta_i\, k(x_i, x) + b \qquad (6.42)
As the solution satisfies the KKT conditions, the following conditions on βi hold:
|\beta_i| = \begin{cases} 0, & |y_i - f(x_i)| < \varepsilon \\ \in [0, C], & |y_i - f(x_i)| = \varepsilon \\ C, & |y_i - f(x_i)| > \varepsilon \end{cases} \qquad (6.43)
|\beta_i| = \begin{cases} 0, & |y_i - f(x_i)| < \varepsilon - \delta \\ \in [0, C], & \varepsilon - \delta \le |y_i - f(x_i)| \le \varepsilon + \delta \\ C, & |y_i - f(x_i)| > \varepsilon + \delta \end{cases} \qquad (6.44)
The Gaussian function is one of the most used kernels. When used, (6.42)
assumes exactly the form of the function computed by the RBF network (5.2).
Hence, both the SVRs and the RBF networks are characterized by a solution that is
a linear combination of kernel functions.
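As an illustration, the following sketch fits an SVR with a Gaussian kernel using scikit-learn (an external library, not the software used in the book) and verifies that its output is the kernel expansion of (6.42)/(6.48); note that scikit-learn parameterizes the kernel as exp(−γ‖x − x′‖²), so γ = 1/σ².

```python
import numpy as np
from sklearn.svm import SVR

rng = np.random.default_rng(3)
x = np.sort(rng.uniform(0.0, 2.0, 200))[:, None]
y = np.sin(2 * np.pi * x[:, 0]) + 0.05 * rng.standard_normal(200)

sigma = 0.1                                      # kernel scale (sigma in (6.48))
model = SVR(kernel="rbf", gamma=1.0 / sigma**2, C=10.0, epsilon=0.05).fit(x, y)

# The fitted function is sum_k beta_k * exp(-||x - x_k||^2 / sigma^2) + b,
# with one term per support vector, mirroring (6.42)/(6.48).
print("support vectors:", model.support_vectors_.shape[0])
beta, b = model.dual_coef_[0], model.intercept_[0]
xq = np.array([[0.3]])
manual = np.sum(beta * np.exp(-np.sum((model.support_vectors_ - xq) ** 2, axis=1) / sigma**2)) + b
print(float(model.predict(xq)[0]), float(manual))   # the two values should coincide
```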
In the following sections the SVM regression using kernel functions is considered
and an approach based on a hierarchical structure is presented. In order to support
the explanation, some experimental results are reported.
A single kernel function is used in the standard SVR approach. This kernel has a predefined shape, characterized by a set of parameters, and therefore a predefined frequency content. The accuracy of the solution is strongly dependent on the choice of such a kernel, and the choice is often not straightforward [142]. Generally, the kernel is chosen through trial and error, considering a priori knowledge on the problem or using heuristic strategies.
There are datasets for which the use of a single kernel does not produce accurate solutions: for example, when the data are characterized by a frequency content that varies over the input domain (Fig. 6.12).
Fig. 6.12 (a) A function with non-stationary frequency content, and (b)–(c) two SVR solutions using a single Gaussian kernel with two different scale parameters, σ. (b) A large-scale kernel provides a smooth solution but is unable to reconstruct the details, while (c) a small-scale kernel suffers from overfitting, providing poor generalization. Originally published in [31] (reproduced by permission of IEEE)
Multiple-kernel approaches have been introduced [195] [242] to cover these cases. In order to improve the accuracy of the solution, the kernel is defined as a linear combination of basic kernels, and the coefficients of this combination are determined during the computation of the solution. With these approaches, the number and the type of the basic kernels still have to be chosen a priori.
The form of the solution becomes:

\hat{f}(x) = \sum_{i=1}^{n} \sum_{j=1}^{m} \beta_i\, \mu_j\, k_j(x, x_i) + b    (6.45)
where m is the number of basic kernels and the μj are the coefficients that have to be determined in the optimization phase. The solution is a linear combination of the same multiple-kernel function evaluated at the training points.
An example of using an SVR with a single kernel is shown in Fig. 6.12. The data points have been sampled from a curve f(·) whose local frequency content increases with x (Fig. 6.12a). The sampling step is decreased with the local frequency, according to 1/(120x). A large-scale kernel determines a solution with a large error in the high-frequency regions (Fig. 6.12b). Vice versa, a small-scale kernel determines an inaccurate solution in the low-frequency regions (Fig. 6.12c).
As for the HRBF model, using a set of layers, each one providing a reconstruction at a certain scale, makes it possible to cope with the problem shown above. In particular, within each layer a single kernel function is used, but different layers use kernels at different scales. This model has been named Hierarchical Support Vector Regression (HSVR).
Like the HRBF model, the HSVR model is composed of a set of layers organized in a hierarchical stack. The output of the model is computed as the sum of the outputs of the layers, {al(·)}. Each layer is an SVR and is characterized by the scale of its kernel; in the case of the Gaussian kernel, this parameter is the Gaussian width. More formally, the solution computed by the HSVR model has the following form:
\hat{f}(x) = \sum_{l=1}^{L} a_l(x; \sigma_l)    (6.47)
where L is the number of layers and σl determines the scale of the kernel of the l-th layer. The scale decreases as the layer index increases, σl ≥ σl+1. One of the most used kernels is the Gaussian function; in this case the output of each layer has the following form:
a_l(x; \sigma_l) = \sum_{k=1}^{M_l} \beta_{l,k}\, G(\|x - x_{l,k}\|; \sigma_l) + b_l = \sum_{k=1}^{M_l} \beta_{l,k}\, e^{-\|x - x_{l,k}\|^2 / \sigma_l^2} + b_l    (6.48)
where Ml is the number of SVs, βl,k is the coefficient of the k-th SV, and bl is the bias of the l-th layer. The σl parameter determines the reconstruction scale of the l-th layer. The configuration of the HSVR model is realized layer by layer, starting from the one with the largest scale, σ1.
The configuration of the first layer is realized using the dataset itself. For all the other layers, the configuration is performed considering the residual of the previously configured layers, namely the difference between the dataset values and the output of the model configured up to the l-th layer:

r_l(x_i) = y_i - \sum_{m=1}^{l} a_m(x_i; \sigma_m), \qquad r_0(x_i) = y_i    (6.49)

The l-th layer is therefore configured with the training set, Sl, defined as:

S_l = \{(x_i, r_{l-1}(x_i)),\; i = 1, \dots, n\}    (6.50)

and its regularization parameter is set from the spread of the residual through a secondary parameter, J:

C_l = J\; \mathrm{std}\big(r_{l-1}(x_i)\big)    (6.51)
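To make the layer-by-layer idea concrete, the following minimal sketch (Python) uses scikit-learn's SVR as the per-layer solver; the book's experiments rely on the LibCVM Toolkit instead, and the values of eps, J, and the fixed number of layers here are purely illustrative. Since the RBF kernel in scikit-learn is exp(-gamma·||x − x'||²), setting gamma = 1/σ² reproduces the Gaussian of (6.48).

```python
import numpy as np
from sklearn.svm import SVR

def fit_hsvr(X, y, eps=0.075, J=1.0, n_layers=8):
    """Fit a stack of Gaussian-kernel SVR layers on the residuals (sketch).

    X: (n_samples, n_features) array, y: (n_samples,) array.
    """
    sigma = float(X.max() - X.min())     # first scale ~ size of the input domain
    residual = y.copy()                  # r_0 = y
    layers = []
    for _ in range(n_layers):
        C = J * np.std(residual)         # C_l = J * std(r_{l-1}), cf. (6.51)
        svr = SVR(kernel="rbf", gamma=1.0 / sigma ** 2, C=C, epsilon=eps)
        svr.fit(X, residual)
        layers.append(svr)
        residual = residual - svr.predict(X)   # r_l = r_{l-1} - a_l(x)
        sigma *= 0.5                     # halve the scale for the next layer
    return layers

def predict_hsvr(layers, X):
    """Sum the layer outputs, cf. (6.47)."""
    return sum(l.predict(X) for l in layers)
```

In the actual model the number of layers is not fixed in advance but is controlled by the validation error, as discussed later in this section.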
Although this layer-by-layer scheme does solve the problem of reconstructing a function with different frequency content in different input regions, it has the drawback of requiring a large number of SVs. This is due to the fact that the number of SVs is strongly related to ε, and the HSVR model uses the same ε for all the layers. The total number of SVs is then generally close to the number of layers multiplied
by the number of SVs of the first layer. As the first layers compute a very smooth approximation of the data, it seems unreasonable that they require a large number of SVs with a large scale parameter. This is due to the form of the solution computed by SVR: all the points outside the ε-tube are selected as SVs (6.43).
This problem is faced by configuring each layer twice. The first time, the regression is computed using all the data in order to select the few points that are meaningful for that layer. The second time, the regression is computed using only this subset of selected points. We note that the points that lie close to the regression function are those that are better explained by the approximation itself, while those that lie far from the function can be regarded as outliers. For this reason, configuring the layer using just a few meaningful points produces an approximation similar to that computed using all the points, with a substantial decrease in the total number of SVs. For each layer, the error introduced by using the selected set of points is generally small (cf. Fig. 6.13), while the number of SVs decreases significantly. The selected set of points used in the second step for configuring a layer is defined as:
\bar{S}_l = \Big\{ (x_i, r_{l-1}(x_i)) \;\; \mathrm{s.t.} \;\; \big|\,|r_l(x_i)| - \varepsilon\,\big| < \delta \;\vee\; |r_l(x_i)| < \frac{\varepsilon}{2} \Big\}    (6.52)
where δ is the numerical tolerance threshold of the optimization (6.44). This set is composed of the points that lie on the border of the ε-tube and of those that lie at a distance of less than ε/2 from the approximation. It is remarked that, for each layer, the approximation is computed in two different steps. In the first step an approximation of the residual is computed using all the training points; the aim of this phase is the computation of S̄l. In the second step, using only the points in S̄l, the final regression for that layer is obtained. As in the second step the density of the points is significantly lower than in the first step, experimental results have suggested increasing, for the second step, the value of Cl proportionally to the change in data density:
\bar{C}_l = C_l\, \frac{|S_l|}{|\bar{S}_l|} = J\, \frac{|S_l|}{|\bar{S}_l|}\; \mathrm{std}\big(r_{l-1}(x_i)\big)    (6.53)
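The fragment below sketches the selection rule (6.52) and the rescaling of C (6.53) for one layer; it assumes the first-pass approximation of the layer is already available, and all function and variable names are illustrative.

```python
import numpy as np

def reduce_training_set(X, targets, approx, eps, delta, C):
    """Select the reduced set of (6.52) and rescale C as in (6.53).

    targets holds r_{l-1}(x_i); approx is the first-pass layer approximation.
    """
    r = targets - approx                          # deviation from the first-pass fit
    on_border = np.abs(np.abs(r) - eps) < delta   # points on the eps-tube border
    well_inside = np.abs(r) < eps / 2.0           # points well inside the tube
    keep = on_border | well_inside
    C_bar = C * keep.size / max(int(keep.sum()), 1)   # grow C with the density drop
    return X[keep], targets[keep], C_bar
```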
It should be remarked that, similarly to the HRBF model [97] [91], the error introduced by the reduction strategy is not critical, because it will be contained in the residual taken care of by the next layer, where it is recovered. Other reduction procedures for SVR [152] [175] have been proposed in the literature; they will be compared and discussed in Sect. 6.2.7.
In order to evaluate the performance of the HSVR model with respect to the standard SVR model [215] for 3D scanning, an experimental comparison has been carried out.
Fig. 6.13 The data points are displayed as dots. Thick dots represent all the points used by the optimization engine to determine the regression, represented as a thick dashed line. Circled dots represent the SVs. Errmean is computed as \frac{1}{n}\sum_{i=1}^{n} |f(x_i) - y_i|. In panel (a) the SVR curve obtained through standard SVR (ε = 0.416, C = 9.67, σ = 1.66, Errmean = 0.427) is shown, and in panel (b) the solution obtained considering only the points in S̄l (6.52) (Errmean = 0.455). Note that in the latter case only 32 points are used in the optimization procedure (the unused points are shown as small dots). The number of SVs drops from 49 to 5. Both solutions are contained inside an ε-tube around the real function. Originally published in [31] (reproduced by permission of IEEE)
The experiments have been performed on simulated and real data. The obtained surface has been evaluated from a qualitative point of view, analyzing the surface aspect, and from a quantitative point of view, using the following indices: the root mean square error (RMSE), the mean absolute error (Errmean), and its standard deviation (Errstd). The datasets have been partitioned into three subsets: training set, validation set, and test set. The first is used to configure the models, the second to select, with respect to the accuracy, the best combination of hyperparameters (ε, σ, and C for the standard SVR model, and ε and C only for the HSVR model), and the last to evaluate the selected model.
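For reference, a straightforward NumPy sketch of the three indices follows; y_true and y_pred stand for the test targets and the model predictions.

```python
import numpy as np

def evaluation_indices(y_true, y_pred):
    """Return (Errmean, Errstd, RMSE) as used in the experimental comparison."""
    abs_err = np.abs(y_true - y_pred)
    err_mean = abs_err.mean()                        # mean absolute error
    err_std = abs_err.std()                          # its standard deviation
    rmse = np.sqrt(np.mean((y_true - y_pred) ** 2))  # root mean square error
    return err_mean, err_std, rmse
```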
For the HSVR model the validation set is used also to decide when the insertion
of new layers has to be stopped. The σ of the first layer is taken equal to the size
of the input domain. The value of C of each layer is set as indicated in (6.51)
and (6.53).
The LibCVM Toolkit Version 2.2 [231] has been used to solve the optimization problem of both the HSVR and the standard SVR models. This software has been chosen because it offers the accuracy of SVMlight [132] (one of the most popular C implementations of SVM) with a considerable saving in computational time. The machine used for the experiments was equipped with an Intel Pentium 4 at 2.40 GHz, with 512 KB of cache and 512 MB of memory.
The first experiment has been carried out using a 2D synthetic dataset, to better appreciate the regression properties. The set of points has been obtained by sampling the univariate function defined in (6.46). This function, having a different frequency content in different regions of the input domain, allows emphasizing the limits due to the use of a single-scale kernel. The training set has 252 points, sampled with a varying step proportional to the local frequency content. The values of a random uniform variable in the range [−0.1, 0.1] have been added to the data to simulate measurement noise. The validation and test sets have been obtained by sampling the target function (6.46) uniformly at 500 points.
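A sketch of this data-generation step is given below; since the target function (6.46) is defined earlier in the chapter, a generic callable target stands in for it here, the input domain [0, 2] is taken from the plots, and a simple uniform random sampling replaces the frequency-dependent sampling step used in the book.

```python
import numpy as np

rng = np.random.default_rng(0)

def make_dataset(target, n_train=252, n_eval=500, x_max=2.0, noise=0.1):
    """Build noisy training samples and clean, uniformly sampled eval sets."""
    # Training abscissae (the book uses a step that shrinks with the local
    # frequency; uniform random sampling is used here for brevity).
    x_train = np.sort(rng.uniform(0.0, x_max, n_train))
    y_train = target(x_train) + rng.uniform(-noise, noise, n_train)
    # Validation and test sets: uniform sampling of the clean target.
    x_eval = np.linspace(0.0, x_max, n_eval)
    return x_train, y_train, x_eval, target(x_eval)
```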
The trials on standard SVR have been performed on all the possible combinations of a predefined grid of values of the hyperparameters ε, σ, and C.
The computational time grows enormously, up to 3,129 s, and also in this case the error of the SVR is larger than that of the HSVR.
When the approximation error is under the value of ε, the points, on average, lie inside the ε-tube. For the SVR this happens only when the value of ε is quite large (ε > 0.1), while for the HSVR model the error is smaller than ε for ε > 0.05, and therefore also for the optimal value of 0.075 (cf. Fig. 6.16). The different frequency contribution of the HSVR layers is shown in Fig. 6.17. Looking at this figure, it is clear that the different layers act at different levels of detail. The approximations with and without reduction are very close to each other and have a similar time course. The effect of the reduction for each layer is depicted in Fig. 6.18, where the error of the HSVR model without and with reduction, after the first and the second configuration step, is plotted with respect to the number of layers. The figure shows that the final error is almost coincident for the three cases. A more detailed description of the approximation error for the two HSVR models is summarized in Table 6.2.
Figure 6.19 shows a typical point cloud obtained by a 3D scanner. This dataset has been collected from a real artifact (a Panda mask) by means of the same 3D scanner used in Sect. 5.3.4 [91]. The dataset contains 22,000 points. The experiments have been carried out choosing randomly 18,000 points for training, 2,000 for validation, and 2,000 for testing. The coordinates of the points have been scaled to the range [−1, 1]. Similarly to Sect. 5.3.4, only points that lie farther than 0.1 from the boundary are considered for the validation and test error, as the border regions are generally characterized by a larger reconstruction error due to the lack of points.
Table 6.1 The reconstruction accuracy on the synthetic dataset of the models in
Figs. 6.14 and 6.15
Errmean Errstd RMSE #SVs Time [s]
HSVR 0.0282 0.0262 0.0385 1545 0.308
HSVR (red.) 0.0313 0.0338 0.0460 243 0.382
SVR 0.0816 0.167 0.186 149 0.451
Published in [31] (reproduced by permission of IEEE).
Different surfaces have been created through SVR, using all the possible combinations of a grid of values of the hyperparameters ε, σ, and J, where C is obtained from J as:
C = J\; \mathrm{std}(y)    (6.60)
while only the different combinations of ε and J were considered to configure the HSVR model.
The surfaces associated with the lowest test error are shown in Fig. 6.20a–c. However, they do not have the best appearance, because of the presence of residual noise. Better surfaces can be obtained for the HSVR models by discarding some of the last layers (Fig. 6.20e–f). This is possible thanks to the hierarchical structure of the HSVR models and cannot be applied to the SVR model.
Fig. 6.17 Reconstruction operated by the 1st, 5th, and 9th layer of the HSVR model when all the data points are considered (dashed line) and when only the points in S̄l are considered (continuous line). The residuals of each layer (i.e., the training points for that layer) are reported as dots. A small difference between the solutions obtained with and without data point reduction can be observed. This difference becomes smaller and smaller, and in the last layer the two curves are almost coincident. Originally published in [31] (reproduced by permission of IEEE)
For the SVR model, we instead analyzed by hand, one by one, all the configured surfaces and chose the one with the best appearance, which is reported in Fig. 6.20d.
The test error and the number of SVs with respect to ε are reported in Fig. 6.21 for the three models, for the best, the average, and the worst case. These results have been obtained as the average over five randomizations of the dataset. A quantitative summary of the best case for the three models is reported in Table 6.3. The best cases are very similar from the accuracy point of view. This is true also for all the best models for each value of ε (Fig. 6.21a). An important difference appears instead for the average and worst cases: the average (worst) test error is 0.0113 (0.012), 0.0116 (0.015), and 0.0140 (0.019) for HSVR, HSVR with reduction, and SVR, respectively.
Another important difference is the configuration time required to obtain a good solution. This is 682 s for HSVR, 1,104 s for HSVR with reduction, and 382 s for SVR in the best case, and 1,024 s, 1,241 s, and 1,093 s on average over all the surfaces created. However, we explicitly note that, besides the configuration time, the time required to explore the hyperparameter space has also to be considered. In particular, as for the HSVR σ does not need to be selected in advance, the hyperparameter space has one dimension less than in the SVR case. In practice, we tried 20 combinations of J and ε to find the best case for HSVR without and with reduction, for a total configuration time of 16,845 s and 22,168 s, respectively.
Fig. 6.18 The test error (Errmean) of the HSVR models as new layers are inserted. The continuous line represents the error of the HSVR model with no data reduction. The dashed line represents the error of the model when data reduction was applied in all the previous layers, but not in the current one, in which all the data points are passed to the optimization procedure (1st optimization step). The dot-dashed line represents the error of the HSVR model when data reduction is applied to all the layers, including the current one (the 2nd optimization step is applied to the configuration of all the layers). Originally published in [31] (reproduced by permission of IEEE)
Instead, for the SVR we tried 80 combinations of J, ε, and σ, for a total configuration time of 87,410 s.
The total number of SVs is another critical aspect of these models. The HSVR uses a significantly larger number of SVs (100,448) than the standard SVR (Fig. 6.21 and Table 6.3). However, after pruning, the number of SVs drops to 11,351, a saving of 9.61% with respect to the standard SVR (Table 6.3).
The HSVR model presented here has been developed for three main reasons: to solve the problems connected with the use of a single-scale kernel function, to allow a multi-scale surface reconstruction, and to speed up the exploration of the hyperparameter space.
The standard version of SVR [215] is characterized by the use of a single kernel function, which has to be chosen carefully to obtain good results on a given dataset. Moreover, a poor approximation is still obtained for data with a different frequency content in different domain regions: Fig. 6.14 and Table 6.1 show that in this case the SVR approach is not able to compute an acceptable solution.
Fig. 6.19 Panel (a) shows the artifact (a Panda mask) that has been digitized obtaining the cloud
of data reported in panel (b). Originally published in [31] (reproduced by permission of IEEE)
In particular, the best kernel function produces rapid oscillations in the smoothest region and is not able to reconstruct the high-frequency oscillations in the high-frequency region. Thanks to the introduction of multiple scales, HSVR is able to compute an accurate solution also in this case. As shown in Fig. 6.17, the smooth region is reconstructed by kernels with a large scale (first layers) and the high-frequency regions are reconstructed by kernels with a small scale parameter (last layers).
Moreover, a more accurate surface is obtained by the HSVR model for a wide range of ε, as shown in Figs. 6.16a and 6.21a–c. The drawback of HSVR is the large number of SVs used. The reduction procedure can effectively solve this problem (cf. Fig. 6.18), allowing a similar accuracy with a large saving in the number of SVs. This means that a large number of SVs are not actually needed for the reconstruction; these are the points that lie far from the actual surface. We also remark that the reduction procedure can be applied because the hierarchical structure is able to recover, in the next layers, the small error introduced in one layer by this procedure, as shown in Fig. 6.18.
The aim of the reduction strategy is to optimize the trade-off between the number of points discarded and the increase of the error. The experimental results have shown that the rule described by (6.52) gives good results. On closer analysis (Fig. 6.13), the points passed to the second configuration step can be grouped in two kinds: those most internal to the ε-tube are used to constrain the function close to the target but are not SVs, while those that lie on the border of the ε-tube are SVs and support the surface. In the second step, a subset of both types of points becomes the SVs of the current layer.
Fig. 6.20 Panels (a), (b), and (c) show the surfaces that give the lowest test error for SVR, HSVR, and HSVR with reduction. The parameters used were, respectively, J = 0.5, ε = 0.005, and σ = 0.0469 for SVR, J = 0.5 and ε = 0.01 for HSVR, and J = 1 and ε = 0.01 for HSVR with reduction. Although these surfaces are optimal in terms of the test error, their visual appearance is not of good quality. A better result is shown in panels (d), (e), and (f), for SVR, HSVR, and HSVR with reduction, respectively. In (d) the surface obtained through SVR with a suboptimal set of parameters (J = 5, ε = 0.005, and σ = 0.0938) is shown. In panels (e) and (f) the surfaces from the same HSVR models as in (b) and (c) are used, but discarding some of the last layers (one of seven for (e) and three of ten for (f)). Originally published in [31] (reproduced by permission of IEEE)
The reduction strategy produces a sparser solution with respect to the first configuration step and to the standard configuration strategy, and it is particularly efficient when the input dataset is dense. For sparse datasets, this strategy may turn out to be less advantageous. Sparse solutions have been investigated in several papers; the methods used can be divided into two families. The first family includes, in the optimization problem, a strategy to control the number of SVs used in the solution [109] [135]. The second family computes the solution in the standard way and then applies a strategy to select which SVs have to be discarded [175].
The strategy proposed here belongs to the second family. In general, pruning methods consider very carefully the error introduced by eliminating or limiting the number of SVs. The strategy proposed here can relax this constraint, because the error introduced flows into the residual that is input to the next layer, where it can be recovered. Therefore the error introduced by the reduction is not so critical, and the time spent analyzing it can be saved. For this reason this strategy is, generally, more efficient than the other reduction strategies proposed.
Fig. 6.21 Test error (a, c, e) and number of SVs (b, d, f) used by the SVR and HSVR models for the Panda dataset as a function of ε. Results for the best, average, and worst cases are plotted. For reference, the value of ε is reported as a dot-dashed line. Originally published in [31] (reproduced by permission of IEEE)
The error introduced by the reduction strategy is shown in Fig. 6.13a, b: the difference between the functions is due to the absence of some SVs. This difference is added to the residual of the layer and is taken care of by the next layer, where it is recovered, as is evident in Fig. 6.17. Furthermore, the error due to the reduction becomes smaller as new layers are added, and in the last layers the difference tends to vanish: comparing the panels of Fig. 6.20, the difference between the surfaces computed with and without reduction is not appreciable. This is confirmed also by the test error in Fig. 6.21. The drawback of the reduction strategy is the increase of the computational time due to the second configuration step (Table 6.3). For the Panda dataset, the time increases by 61.9% and the number of SVs decreases by 88.7%. Furthermore, the HSVR with reduction compares well with the standard SVR from the point of view of computational time and number of SVs.
Computational time and number of SVs, as well as the accuracy, are strongly related to the number of layers of the model. The strategy chosen here to stop the growth of the HSVR model is based on monitoring the validation error: when it does not decrease anymore, the configuration procedure is stopped. Other criteria are possible. For example, if the level of noise on the data is known, the configuration could be stopped when the training error goes under the noise level over the entire input domain [91]. Alternatively, the visual appearance of the surface could be monitored by the user, who can decide when to stop the configuration procedure. This would not be possible with the classical SVR approach.
The choice of the hyperparameters remains a critical aspect of all SVR approaches. In practice, the common strategy is based on trial and error. This means that the exploration of the hyperparameter space, constituted by C, σ, and ε, is time consuming. HSVR allows reducing the search space to two dimensions, as the optimal value of σ does not have to be searched for: starting with a large σ and halving its value for each new layer generally ensures that a scale locally adequate to the data will be found automatically (Fig. 6.17). Moreover, C can be set through a secondary parameter, J (6.51), that depends on the range of the output data. We have experimentally found that low values of J (in the range (0, 5]) are usually enough to obtain accurate results. Furthermore, Fig. 6.21a–c shows that the HSVR approach is robust also with respect to different values of J, as the best, average, and worst cases produce similar results.
Chapter 7
Conclusion
We review here the main characteristics of the HRBF and HSVR models. Pos-
sible future developments based on parallelization and GPU implementation are
described.
The HRBF, like all local approaches, has the advantage that, being based on local fitting, it does not need iterative configuration procedures. In practice, this means that configuring an HRBF network is less computationally demanding than configuring global models. This is confirmed by the data reported in Table 7.1: the configuration time of an HRBF model is only 7.2% (4.44%) of the time required to configure an HSVR without (with) pruning.
Table 7.1 Comparison of the HRBF and HSVR models on the “panda mask” dataset

                   Errmean   Errstd    RMSE      #SVs      Time [s]
HRBF               0.0112    0.0111    0.0158    14,679    49
HSVR [31]          0.0110    0.0115    0.0160    100,448   681
HSVR (red.) [31]   0.0112    0.0119    0.0163    11,351    1,104

Partially published in [31] (reproduced by permission of IEEE).
On the other hand, local approaches are more sensitive to outliers, which can strongly influence the local statistics, while they have little influence on a global cost function, such as the one used as training error in (6.31). Local approaches are also more sensitive to the sparseness of the data: the sparser the data, the fewer the elements used to locally fit the surface, and hence the less robust the estimate. This is the reason why, generally, local approaches do not perform well with data that lie in high-dimensional spaces, as data sparseness increases exponentially with the number of dimensions. The SVR (and HSVR) model has the advantage that the computational complexity of the configuration phase does not depend on the number of dimensions of the input space, as this is hidden by the use of the kernel function. Moreover, in the 3D case, the influence region of each Gaussian of the HRBF is assumed to be a square (close neighborhood, Sect. 5.3), and only the points that lie in this region are used to estimate the unit's weight. The approximation of the neighborhood of a point with a hypercube becomes less and less accurate as the data dimensionality increases, and many points farther from the Gaussian center than points in its influence region end up participating in the computation of the Gaussian's weight.
In the 3D space, with sufficiently dense datasets, both methods reach the same level of accuracy (cf. Table 7.1). Even though the test error is very similar for the two approaches, the visual appearance of the surface computed by the HRBF model is better than that computed by HSVR (Fig. 7.1). This is due to the strategy used for placing the units. In the HSVR approach the position of the units is determined by the position of the training points, while in HRBF the units are placed on a regularly spaced grid for each layer. Hence, in HRBF the input space is covered uniformly by the units, and this generally improves the smoothness of the reconstruction.
The positions of the units of the different layers that contribute to the reconstruction of the function defined in (6.46) and shown in Fig. 6.12a are shown in Fig. 7.2a for HRBF and in Fig. 7.2b for HSVR. As can be seen, the SVs are somewhat more spread over the input domain in all the layers, although their density increases on the right (high-frequency region) as the number of layers grows. The HRBF model appears instead more selective: Gaussians at smaller scales are added only in the regions on the right, those with a higher frequency content.
Fig. 7.1 (a) The surface computed by the HRBF model. (b) The surface computed by the HSVR
model [31] (reproduced by permission of IEEE)
When a new data point is added to a dataset, the HSVR model should be reconfigured from scratch. To avoid this, and to allow a continuous adaptation of the already configured model to new data points, on-line configuration is required. This modality has been explored for SVM models used for classification [73][223]. More recently, on-line configuration of SVR models has also been developed [156][258]: the weight associated with a new point is computed in a finite number of discrete steps until it meets the KKT conditions, while ensuring that the existing data continue to satisfy the KKT conditions at each step. However, once the first HSVR layer has been modified, all the subsequent layers have to be reconfigured, as the update of the first layer changes the residual used to configure them.
Fig. 7.2 Position of the units over the input domain x, layer by layer (#Layer): (a) the HRBF units and (b) the SV's position of the HSVR model
Parallel architectures such as GPUs are a good match for both the HRBF and HSVR models. Moreover, GPUs can also be used to render in real time the surface computed by the models. The HRBF configuration procedure, both on-line and batch, is characterized by local operations, which can be performed in parallel over the different units. The configuration procedure of the SVR consists in minimizing a convex problem defined on the coefficient space of the input points. This process is generally realized using Sequential Minimal Optimization (SMO), which performs the optimization by considering two coefficients at a time and freezing the others [72]. Even though this formulation is not suited to parallel hardware, some variants of the SMO algorithm have been proposed for parallel hardware [56][58]. For instance, the solution presented in [56] has been realized on the CUDA architecture and shows a speed-up of 10 to 70 times over libSVM. This would allow a very short configuration time.
Very recently, GPUs have been introduced in smartphones: the new Nvidia Tegra processor line integrates an ARM CPU, a GPU, a memory controller, and the bus controllers (northbridge and southbridge) on a single chip, delivering high performance with low power consumption and making these architectures particularly suitable for mobile computing [10]. This enables users not only to capture 2D images and videos, but also to acquire 3D models directly on their portable systems. Moreover, since devices for reproducing 3D content are becoming widely available on the consumer market, compact and easy-to-use devices for capturing 3D content are likely to be proposed soon. Hence, besides current 3D digital cameras, which capture only a pair of displaced images, it can be envisioned that the miniaturization of components such as CCD sensors and pico-projectors will be exploited to implement very small, point-and-click active optical devices for 3D scanning.
Glossary
Octree An octree is a tree data structure in which each internal node has eight children. Each node of the octree represents a cube in three-dimensional space, and each child represents an eighth of the volume of its parent node. This data structure is used to search for the points belonging to a region of space, since it allows logarithmic access time to the data of the set.
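For illustration, a minimal sketch of an octree node supporting point insertion follows; it is a toy implementation, and the field names and splitting threshold are arbitrary.

```python
class OctreeNode:
    """A cubic cell that stores points or splits into eight child octants."""
    def __init__(self, center, half_size, capacity=8):
        self.center, self.half_size = center, half_size
        self.capacity = capacity
        self.points = []          # points stored while the node is a leaf
        self.children = None      # eight children once the node is split

    def insert(self, p):
        if self.children is None:
            self.points.append(p)
            if len(self.points) > self.capacity:
                self._split()
            return
        self._child_for(p).insert(p)

    def _split(self):
        h = self.half_size / 2.0
        # Child i occupies the octant whose sign along axis d is bit d of i.
        self.children = [
            OctreeNode([self.center[d] + h * (((i >> d) & 1) * 2 - 1)
                        for d in range(3)], h, self.capacity)
            for i in range(8)
        ]
        for q in self.points:
            self._child_for(q).insert(q)
        self.points = []

    def _child_for(self, p):
        i = sum(((p[d] > self.center[d]) << d) for d in range(3))
        return self.children[i]
```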
Voronoi diagram The Voronoi diagram of a set of points P ⊂ RD is a partition of the space into regions determined by P. For each p ∈ P, the corresponding region is the set of the points in RD that are closer to p than to any other point of P.
References
20. Aloimonos, Y.: Detection of surface orientation from texture I: the case of plane. In:
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp.
584–593 (1986)
21. Alpern, B., Carter, L.: The hyperbox. In: Proceedings of the IEEE Conference on
Visualization, pp. 133–139, 418 (1991)
22. Atkeson, C.G., Moore, A.W., Schaal, S.: Locally weighted learning. Artificial Intelligence
Review 11, 11–73 (1997)
23. Baader, A., Hirzinger, G.: A self-organizing algorithm for multisensory surface reconstruc-
tion. In: Proceedings of the IEEE/RSJ/GI International Conference on Intelligent Robots and
Systems, vol. 1, pp. 81–89 (1994)
24. Bahlmann, C., Haasdonk, B., Burkhardt, H.: On-line handwriting recognition with support
vector machines — a kernel approach. In: Proceedings of the 8th International Workshop on
Frontiers in Handwriting Recognition, pp. 49–54 (2002)
25. Bajaj, C.L., Bernardini, F., Xu, G.: Automatic reconstruction of surfaces and scalar fields
from 3D scans. In: Proceedings of the ACM SIGGRAPH Conference on Computer graphics
and interactive techniques, pp. 109–118 (1995)
26. Barhak, J., Fischer, A.: Parameterization and reconstruction from 3D scattered points based
on neural network and PDE techniques. IEEE Transactions on Visualization and Computer
Graphics 7(1), 1–16 (2001)
27. Batlle, J., Mouaddib, E., Salvi, J.: Recent progress in coded structured light as a technique to
solve the correspondence problem: a survey. Pattern Recognition 31(7), 963–982 (1998)
28. Belhumeur, P.N., Kriegman, D.J., Yuille, A.L.: The bas-relief ambiguity. International Journal
of Computer Vision pp. 33–44 (1999)
29. Bellocchio, F., Borghese, N.A., Ferrari, S., Piuri, V.: Kernel regression in HRBF networks for
surface reconstruction. In: Proceedings of the 2008 IEEE International Workshop on Haptic
Audio and Visual Environments and Games, pp. 160–165 (2008)
30. Bellocchio, F., Ferrari, S.: Depth Map and 3D Imaging Applications: Algorithms and
Technologies, chap. 3D Scanner, State of the Art, pp. 451–470. IGI Global (2011)
31. Bellocchio, F., Ferrari, S., Piuri, V., Borghese, N.A.: A hierarchical approach for multi-scale
support vector regression. IEEE Transactions on Neural Networks and Learning Systems
23(9), 1448–1460 (2012)
32. Bellocchio, F., Ferrari, S., Piuri, V., Borghese, N.: Online training of hierarchical RBF. In:
Proceedings of the 2007 IEEE-INNS International Joint Conference on Neural Networks, pp.
2159–2164 (2007)
33. Berg, A.B.: Locating global minima in optimisation problems by a random-cost approach.
Nature 361, 708–710 (1993)
34. Bernardini, F., Bajaj, C.L., Chen, J., Schikore, D.R.: Automatic reconstruction of 3D
CAD models from digital scans. International Journal of Computational Geometry and
Applications 9(4-5), 327–370 (1999)
35. Besl, P., McKay, N.: A method for registration of 3-D shapes. IEEE Transactions on Pattern
Analysis and Machine Intelligence 17(8) (1992)
36. Billings, S.A., Zheng, G.L.: Radial basis function network configuration using genetic
algorithms. Neural Networks 8(6), 877–890 (1995)
37. Bishop, C.M.: Neural networks for pattern recognition. Oxford University Press (1995)
38. Bittar, E., Tsingos, N., Gascuel, M.P.: Automatic reconstruction of unstructured 3D data:
Combining medial axis and implicit surfaces. Computer Graphics Forum 14(3), 457–468
(1995)
39. Blanz, V., Vetter, T.: Face recognition based on fitting a 3D morphable model. IEEE
Transactions on Pattern Analysis and Machine Intelligence 25(9), 1063– 1074 (2003)
40. Boissonat, J.D.: Geometric structures for three-dimensional shape representation. ACM
Transactions on Graphics 3(4), 266–286 (1984)
41. Borghese, N., Cerveri, P.: Calibrating a video camera pair with a rigid bar. Pattern Recognition
33(1), 81–95 (2000)
42. Borghese, N., Ferrari, S., Piuri, V.: A methodology for surface reconstruction based on
hierarchical models. In: Proceedings of the 2003 IEEE International Workshop on Haptic
Virtual Environments and Their Applications, pp. 119–124 (2003)
43. Borghese, N., Ferrari, S., Piuri, V.: Real-time surface meshing through HRBF networks. In:
Proceedings of 2003 IEEE-INNS-ENNS International Joint Conference of Neural Networks,
vol. 2, pp. 1361–1366 (2003)
44. Borghese, N., Ferrigno, G.: An algorithm for 3-D automatic movement detection by means
of standard TV cameras. IEEE Transactions on Biomedical Engineering 37(12), 1221–1225
(1990)
45. Borghese, N.A., Colombo, F.M., Alzati, A.: Computing camera focal length by zooming a
single point. Pattern Recognition 39(8), 1522 – 1529 (2006)
46. Borghese, N.A., Ferrari, S.: Hierarchical RBF networks and local parameter estimate.
Neurocomputing 19(1–3), 259–283 (1998)
47. Borghese, N.A., Ferrari, S.: A portable modular system for automatic acquisition of 3D
objects. IEEE Transactions on Instrumentation and Measurement 49(5), 1128–1136 (2000)
48. Borghese, N.A., Ferrigno, G., Baroni, G., Ferrari, S., Savaré, R., Pedotti, A.: AUTOSCAN:
a flexible and portable 3D scanner. IEEE Computer Graphics and Applications 18(3), 38–41
(1998)
49. Breiman, L., Friedman, J.H., Olshen, R.A., Stone, C.J.: Classification and regression trees.
Wadsworth (1984)
50. Brooks, M.: Two results concerning ambiguity in shape from shading. In: Proceedings of the
AAAI Third National Conference on Artificial Intelligence, pp. 36–39 (1983)
51. Broomhead, D.S., Lowe, D.: Multivariable functional interpolation and adaptive networks.
Complex Systems 2, 321–355 (1988)
52. Broyden, C.G.: The convergence of a class of double-rank minimization algorithms. Journal
of the Institute of Mathematics and Its Applications 6, 76–90 (1970)
53. Burges, B.S.C., Vapnik, V.: Extracting support data for a given task. In: Proceedings of the
First International Conference on Knowledge Discovery and Data Mining. AAAI Press (1995)
54. Burges, B.S.C., Vapnik, V.: Incorporating invariances in support vector learning machines.
In: Proceedings of the International Conference on Artificial Neural Networks, vol. 1112, pp.
47–52 (1996)
55. Cai, Z.: Weighted Nadaraya-Watson regression estimation. In: Statistics and Probability
Letters, pp. 307–318 (2001)
56. Carpenter, A.: cuSVM: a CUDA implementation of support vector classification and regres-
sion (2009). URL http://patternsonascreen.net/cuSVM.html
57. Catalan, R.B., Perez, E.I., Perez, B.Z.: Evaluation of 3D scanners to develop virtual reality
applications. In: Proceedings of the Fourth Congress of Electronics, Robotics and Automotive
Mechanics, pp. 551–556 (2007)
58. Catanzaro, B., Sundaram, N., Keutzer, K.: Fast support vector machine training and classifica-
tion on graphics processors. In: Proceedings of the 25th International Conference on Machine
learning, pp. 104–111. ACM (2008)
59. Catapano, I., Crocco, L., Krellmann, Y., Triltzsch, G., Soldovieri, F.: A tomographic approach
for helicopter-borne ground penetrating radar imaging. Geoscience and Remote Sensing
Letters, IEEE 9(3), 378–382 (2012)
60. Catmull, E., Clark, J.: Recursively generated B-spline surfaces on arbitrary topological
meshes. Computer-Aided Design 10(6), 350–355 (1978)
61. Cerveri, P., Ferrari, S., Borghese, N.A.: Calibration of TV cameras through RBF networks.
In: Proceedings of SPIE Conference on Applications of Soft Computing, pp. 312–318 (1997)
62. Cetin, B., Barhen, J., Burdick, J.: Terminal repeller unconstrained subenergy tunnelling (trust)
for fast global optimization. Journal of Optimization Theory and Application 77(1), 97–
126 (1993)
63. Chan, Y.T.: Wavelet Basics. Kluwer Academic Publishers (1995)
64. Chen, S., Hong, X., Harris, C.J.: Sparse kernel density construction using orthogonal forward
regression with leave-one-out test score and local regularization. IEEE Transactions on
System, Man, and Cybernetics, Part B: Cybernetics 34(4), 1708–1717 (2004)
65. Clark, P., Niblett, T.: The CN2 induction algorithm. Machine Learning 3, 261–283 (1989)
66. Colombo, C., Bimbo, A.D., Pernici, F.: Metric 3D reconstruction and texture
acquisition of surfaces of revolution from a single uncalibrated view. IEEE Transactions
on Pattern Analysis and Machine Intelligence 27, 99–114 (2005)
67. Colombo, C., Bimbo, A.D., Persini, F.: Metric 3D reconstruction and texture acquisition of
surfaces of revolution from a single uncalibrated view. IEEE Transactions on Pattern Analysis
and Machine Intelligence pp. 99–114 (2005)
68. Cordier, F., Seo, H., Magnenat-Thalmann, N.: Made-to-measure technologies for an online
clothing store. IEEE Computer Graphics and Applications 23(1), 38–48 (2003)
69. Costin, M., Ignat, A., Baltag, O., Bejinariu, S., Stefanescu, C., Rotaru, F., Costandache, D.: 3D
breast shape reconstruction for a non-invasive early cancer diagnosis system. In: Proceedings
of the 2nd International Workshop on Soft Computing Applications, pp. 45–50 (2007)
70. Cover, T., Hart, P.E.: Nearest neighbor pattern classification. IEEE Transactions on
Information Theory pp. 21–27 (1967)
71. Craven, P., Wahba, G.: Smoothing noisy data with spline functions. Estimating the correct
degree of smoothing by the method of generalized cross-validation. Numerische Mathematik
31(4), 377–403 (1978/79)
72. Cristianini, N., Shawe-Taylor, J.: An introduction to support vector machines and other
kernel-based learning methods. Cambridge University Press (2000)
73. Csato, L., Opper, M.: Sparse representation for gaussian process models. Advances in neural
information processing systems 13, 444–450 (2001)
74. Dai, L.: The 3D digital technology of fashion design. In: Proceedings of the International
Symposium on Computer Science and Society, pp. 178–180 (2011)
75. Dasarathy, B.: Nearest Neighbor (NN) norms: NN pattern classification techniques. IEEE
Press (1991)
76. Daubechies, I.: Orthonormal bases of compactly supported wavelets. Communications on
Pure and Applied Mathematics 41, 909–996 (1988)
77. Daubechies, I.: Ten lectures on Wavelets. SIAM (1992)
78. Debevec, P.E., Taylor, C.J., Malik, J.: Modeling and rendering architecture from photographs:
A hybrid geometry- and image-based approach. In: Proceedings of the ACM SIGGRAPH
Conference on Computer graphics and interactive techniques, pp. 11–20 (1996)
79. Dick, C., Burgkart, R., Westermann, R.: Distance visualization for interactive 3D implant
planning. IEEE Transactions on Visualization and Computer Graphics 17(12), 2173–2182
(2011)
80. Donoho, D.L., Johnstone, I.M.: Ideal spatial adaptation by wavelet shrinkage. Biometrika
(1993)
81. Dorai, C., Wang, G., Jain, A.K., Mercer, C.: Registration and integration of multiple object
views for 3D model construction. IEEE Transactions on Pattern Analysis and Machine
Intelligence 20(1), 83–89 (1998)
82. Dorn, W.S.: Duality in quadratic programming. Quarterly of Applied Mathematics 18(2),
155–162 (1960)
83. Dreyfus, G.: Neural networks: methodology and applications. Springer (2005)
84. Drucker, H., Burges, C.J.C., Kaufman, L., Smola, A., Vapnik, V.: Support vector regression machines. In: Advances in Neural Information Processing Systems, pp. 151–161 (1997)
85. Edelsbrunner, H., Mucke, E.P.: Three-dimensional alpha shapes. ACM Transactions on
Graphics 13(1), 43–72 (1994)
86. English, C., Zhu, S., Smith, C., Ruel, S., Christie, I.: Tridar: A hybrid sensor for exploiting
the complimentary nature of triangulation and LIDAR technologies. In: B. Battrick (ed.)
Proceedings of the 8th International Symposium on Artificial Intelligence, Robotics and
Automation in Space (2005)
87. Evans, F., Skiena, S., Varshney, A.: Optimizing triangle strips for fast rendering. In:
Proceedings of the IEEE Visualization Conference, vol. 2, pp. 319–326 (1996)
88. Feldkamp, L.A., Davis, L.C., Kress, J.: Practical conebeam algorithm. Journal of the Optical
Society of America A: Optics, Image Science, and Vision 1, 612–619 (1984)
89. Ferrari, S., Bellocchio, F., Borghese, N., Piuri, V.: Refining hierarchical radial basis function
networks. In: Proceedings of the 2007 IEEE International Workshop on Haptic Audio and
Visual Environments and Games, pp. 166–170 (2007)
90. Ferrari, S., Bellocchio, F., Piuri, V., Borghese, N.: Multi-scale support vector regression. In:
Proceedings of the 2010 IEEE-INNS International Joint Conference on Neural Networks, pp.
2159–2164 (2010)
91. Ferrari, S., Bellocchio, F., Piuri, V., Borghese, N.A.: A hierarchical RBF online learning
algorithm for real-time 3-D scanner. IEEE Transactions on Neural Networks 21(2), 275–285
(2010)
92. Ferrari, S., Borghese, N.A., Piuri, V.: Multi-resolution models for data processing: an
experimental sensitivity analysis. In: Proceedings of the 2000 IEEE Instrumentation and
Measurement Technology Conference, vol. 2, pp. 1056–1060 (2000)
93. Ferrari, S., Borghese, N.A., Piuri, V.: Multiscale models for data processing: an experi-
mental sensitivity analysis. IEEE Transactions on Instrumentation and Measurement 50(4),
995–1002 (2001)
94. Ferrari, S., Ferrigno, G., Piuri, V., Borghese, N.A.: Reducing and filtering point clouds with
enhanced vector quantization. IEEE Transactions Neural Networks 18(1), 161–177 (2007)
95. Ferrari, S., Frosio, I., Piuri, V., Borghese, N.: The accuracy of the HRBF networks. In:
Proceedings of 2004 IEEE Instrumentation and Measurement Technology Conference, pp.
482–486 (2004)
96. Ferrari, S., Frosio, I., Piuri, V., Borghese, N.A.: Automatic multiscale meshing through HRBF
networks. IEEE Transactions on Instrumentation and Measurement 54(4), 1463–1470 (2005)
97. Ferrari, S., Maggioni, M., Borghese, N.A.: Multi-scale approximation with hierarchical radial
basis functions networks. IEEE Transactions on Neural Networks 15(1), 178–188 (2004)
98. Ford, C., Etter, D.: Wavelet basis reconstruction of nonuniformly sampled data. IEEE
Transactions on Circuits and Systems II: Analog and Digital Signal Processing 45(8),
1165–1168 (1998)
99. Forsey, D.R., Bartels, R.H.: Hierarchical B-spline refinement. ACM SIGGRAPH Computer
Graphaphics Newsletter 22(4), 205–212 (1988)
100. Forsey, D.R., Wong, D.: Multiresolution surface reconstruction for hierarchical B-splines.
Graphics Interface pp. 57–64 (1998)
101. Fougerolle, Y., Gribok, A., Foufou, S., Truchetet, F., Abidi, M.: Boolean operations with
implicit and parametric representation of primitives using r-functions. IEEE Transactions on
Visualization and Computer Graphics 11(5), 529–539 (2005)
102. Friedman, J.: Multivariate adaptive regression splines. Annals of Statistics 19(1), 1–141
(1991)
103. Friedman, J.H., Stuetzle, W.: Projection pursuit regression. Journal of the American Statistical
Association 76, 817–823 (1981)
104. Fritzke, B.: Growing cell structures — A self-organizing network for unsupervised and
supervised learning. Neural Networks 7(9), 1441–1460 (1994)
105. Fritzke, B.: Growing grid — A self-organizing network with constant neighborhood range
and adaptation strength. Neural Processing Letters 2(5), 9–13 (1995)
106. Frosio, I., Alzati, A., Bertolini, M., Turrini, C., Borghese, N.A.: Linear pose estimate from
corresponding conics. Pattern Recognition 45(12), 4169–4181 (2012)
107. Frosio, I., Borghese, N.A.: Optimized algebraic local tomography. In: Proceedings of the
GIRPR Conference (2010)
108. Frosio, I., Mainetti, R., Palazzo, S., Borghese, N.A.: Robust kinect for rehabilitation. In:
Proceedings of the GIRPR Conference (2012)
109. Fung, G.M., Mangasarian, O.L., Smola, A.J.: Minimal kernel classifiers. Journal of Machine
Learning Research 3, 303–321 (2002)
110. Gambino, M.C., Fontana, R., Gianfrate, G., Greco, M., Marras, L., Materazzi, M., Pampaloni,
E., Pezzati, L.: A 3D scanning device for architectural relieves based on Time-Of-Flight
technology. Springer (2005)
111. Geman, S., Bienenstock, E., Doursat, R.: Neural networks and the bias/variance dilemma.
Neural Computation 4(1), 1–58 (1992)
112. Girosi, F., Jones, M., Poggio, T.: Regularization theory and neural networks architectures.
Neural Computation 7(2), 219–269 (1995)
113. Gorse, D., Sheperd, A., Taylor, J.: Avoiding local minima by a classical range expansion
algorithm. In: Proceedings of the International Conference on Artificial Neural Networks
(1994)
114. Hall, P.: On projection pursuit regression. Annals of Statistics 17(2), 573–588 (1989)
115. Harman, P.: Home based 3D entertainment-an overview. In: Proceedings of the International
Conference on Image Processing, pp. 1–4 vol.1 (2000)
116. Hartley, R.I., Zisserman, A.: Multiple View Geometry in Computer Vision, second edn.
Cambridge University Press (2004)
117. Hasenjäger, M., Ritter, H.: New learning paradigms in soft computing. In: Active learning in
neural networks, pp. 137–169. Physica-Verlag GmbH (2002)
118. Heckbert, P.S., Garland, M.: Optimal triangulation and quadric-based surface simplification.
Computational Geometry 14, 49–65 (1998)
119. Heikkinen, V., Kassamakov, I., Haggstrom, E., Lehto, S., Kiljunen, J., Reinikainen, T.,
Aaltonen, J.: Scanning white light interferometry: a new 3D forensics tool. In:
Proceedings of the 2011 IEEE International Conference on Technologies for Homeland
Security, pp. 332–337 (2011)
120. Hernandez, C., Vogiatzis, G., Cipolla, R.: Multiview photometric stereo. IEEE Transactions
on Pattern Analysis and Machine Intelligence 30(3), 548–554 (2008)
121. Hertz, J., Krogh, A., Palmer, R.G.: An introduction to the theory of neural computation.
Addison Wesley (1991)
122. Higo, T., Matsushita, Y., Joshi, N., Ikeuchi, K.: A hand-held photometric stereo camera for
3D modeling. In: Proceedings of the IEEE International Conference on Computer Vision, pp.
1234–1241 (2009)
123. Hoppe, H.: Surface reconstruction from unorganized points. PhD Thesis, Dept. of Computer
Science and Engineering, University of Washington (1994)
124. Hoppe, H., DeRose, T., Duchamp, T., Mc-Donald, J., Stuetzle, W.: Surface reconstruction
from unorganized points. In: Proceedings of the ACM SIGGRAPH Conference on Computer
graphics and interactive techniques), vol. 26, pp. 71–78 (1992)
125. Horn, B.: Obtaining shape from shading information. The Psychology of Computer Vision
(1975)
126. Hornik, K.: Approximation capabilities of multilayer feedforward networks. Neural Networks
4(2), 251–257 (1991)
127. Huang, P., Zhang, S.: Fast three-step phase shifting algorithm. Applied Optics 45(21), 5086–
5091 (2006)
128. Hur, N., Lee, H., Lee, G.S., Lee, S.J., Gotchev, A., Park, S.I.: 3DTV broadcasting and
distribution systems. IEEE Transactions on Broadcasting 57(2), 395–407 (2011)
129. Iddan, G.J., Yahav, G.: 3D imaging in the studio (and elsewhere). In: Proceedings of the SPIE
Conference on Three-Dimensional Image Capture and Applications, pp. 48–55. San Jose, CA
(2001)
130. Idesawa, M., Yatagai, T., Soma, T.: Scanning moiré method and automatic measurement of
3-D shapes. Applied Optics 16(8), 2152–2162 (1977)
131. Isler, V., Wilson, B., Bajcsy, R.: Building a 3D virtual museum of native american baskets.
In: Proceedings of the Third International Symposium on 3D Data Processing, Visualization,
and Transmission, pp. 954–961 (2006)
132. Joachims, T.: Making large-scale SVM learning practical. In: B. Schölkopf, C. Burges,
A. Smola (eds.) Advances in Kernel Methods — Support Vector Learning, chap. 11, pp.
169–184. MIT Press, Cambridge, MA (1999)
133. Katznelson, Y.: An introduction to harmonic analysis. Dover (1976)
134. Kaufman, L., Rousseeuw, P.J.: Finding groups in data — an introduction to cluster analysis.
Wiley (1990)
135. Keerthi, S.S., Chapelle, O., Coste, D.D.: Building support vector machines with reduced
classifier complexity. Journal of Machine Learning Research 7, 1493–1515 (2006)
136. Kibler, D., Aha, D.W., Albert, M.K.: Instance-based prediction of real-valued attributes.
Computational Intelligence pp. 51–57 (1989)
137. Kirkpatrick, S., Gelatt, C., Vecchi, M.: Optimisation by simulated annealing. Science 220,
671–680 (1983)
138. Klinke, S., Grassmann, J.: Projection pursuit regression and neural network. Tech. rep.,
Humboldt Universitaet Berlin (1998)
139. Kohavi, R.: A study of cross-validation and bootstrap for accuracy estimation and model
selection. In: Proceedings of the Fourteenth International Joint Conference on Artificial
Intelligence, vol. 2, pp. 1137–1143 (1995)
140. Kohonen, T.: Self-Organizing Maps. Springer (1995)
141. Kuhn, H.W., Tucker, A.W.: Nonlinear programming. In: Proceedings of the 2nd Berkeley
Symposium, pp. 481–492. Berkeley: University of California Press (1951)
142. Lanckriet, G., Cristianini, N., Bartlett, P., Ghaoui, L.E., Jordan, M.I.: Learning the kernel
matrix with semi-definite programming. Journal of Machine Learning Research 5, 27–72
(2004)
143. Lee, A., Moreton, H., Hoppe, H.: Displaced subdivision surfaces. In: Proceedings of the ACM
SIGGRAPH Conference on Computer graphics and interactive techniques, pp. 85–94 (2000)
144. Lee, A.W.F., Dobkin, D., Sweldens, W., Schröder, P.: Multiresolution mesh morphing. In:
Proceedings of the 1999 ACM SIGGRAPH Conference on Computer graphics and interactive
techniques, pp. 343–350 (1999)
145. Lee, A.W.F., Sweldens, W., Schroder, P., Cowsar, L., Dobkin, D.: MAPS: Multiresolution
adaptive parameterization of surfaces. In: Proceedings of the ACM SIGGRAPH Conference
on Computer graphics and interactive techniques, pp. 95–104 (1998)
146. Lee, J.-D., Lan, T.-Y., Liu, L.-C., Lee, S.-T., Wu, C.-T., Yang, B.: A remote virtual-surgery
training and teaching system. In: Proceedings of the IEEE 3DTV Conference, pp. 1–4 (2007)
147. Lee, S., Wolberg, G., Shin, S.Y.: Scattered data interpolation with multilevel B-splines. IEEE
Transactions on Visualization and Computer Graphics 3(3), 228–244 (1997)
148. Levenberg, K.: A method for the solution of certain non-linear problems in least squares. The
Quarterly of Applied Mathematics 2, 164–168 (1944)
149. Levin, A., Fergus, R., Durand, F., Freeman, W.T.: Image and depth from a conventional
camera with a coded aperture. ACM Transactions on Graphics 26(3), 701–709 (2007)
150. Levoy, M., Rusinkiewicz, S., Ginzton, M., Ginsberg, J., Pulli, K., Koller, D., Anderson, S.,
Shade, J., Curless, B., Pereira, L., David, J., Fulk, D.: The Digital Michelangelo project: 3D
scanning of large statues. In: Proceedings of the 2000 ACM SIGGRAPH Conference on
Computer graphics and interactive techniques (2000)
151. Levy, R., Dawson, P.: Reconstructing a Thule whalebone house using 3D imaging. IEEE
Multimedia 13, 78–83 (2006)
152. Liang, X.: An effective method of pruning support vector machine classifiers. IEEE
Transactions on Neural Networks 21(1), 26–38 (2010)
153. Lin, Y., Song, M., Quynh, D.T.P., He, Y., Chen, C.: Sparse coding for flexible, robust 3D
facial-expression synthesis. IEEE Computer Graphics and Applications 32(2), 76–88 (2012)
154. Lorensen, W.E., Cline, H.E.: Marching cubes: a high resolution 3D surface construction
algorithm. ACM SIGGRAPH Computer Graphaphics Newsletter 21(4), 163–169 (1987)
155. Luebke, D., Humphreys, G.: How GPUs work. IEEE Computer (2007)
156. Ma, J., Theiler, J., Perkins, S.: Accurate on-line support vector regression. Neural Computa-
tion 15, 2683–2703 (2003)
157. Mallat, S.: A theory for multiscale signal decomposition: The wavelet representation. IEEE
Transactions on Pattern and Machine Intelligence 11(7), 674–693 (1989)
158. Marquardt, D.: An algorithm for least-squares estimation of nonlinear parameters. SIAM
Journal on Applied Mathematics 11, 431–441 (1963)
159. Martinetz, T., Berkovich, S., Schulten, K.: Neural-gas network for vector quantization and its
application to time-series prediction. IEEE Transactions on Neural Networks 4(4), 558–568
(1993)
160. Martínez, J.I.M.: Best approximation of Gaussian neural networks with nodes uniformly
spaced. IEEE Transactions on Neural Networks 19(2), 284–298 (2008)
161. Mckinley, T.J., McWaters, M., Jain, V.K.: 3D reconstruction from a stereo pair without the
knowledge of intrinsic or extrinsic parameters. In: Proceedings of the Second International
Workshop on Digital and Computational Video, pp. 148–155 (2001)
162. de Medeiros Brito, A., Doria Neto, A., Dantas de Melo, J., Garcia Goncalves, L.: An adaptive
learning approach for 3-D surface reconstruction from point clouds. IEEE Transactions on
Neural Networks 19(6), 1130–1140 (2008)
163. Medioni, G., Choi, J., Kuo, C.H., Choudhury, A., Zhang, L., Fidaleo, D.: Non-cooperative
persons identification at a distance with 3D face modeling. In: Proceedings of the First IEEE
International Conference on Biometrics: Theory, Applications, and Systems, pp. 1–6 (2007)
164. Mencl, R., Muller, H.: Interpolation and approximation of surfaces from three-dimensional
scattered data points. In: Proceedings of the Eurographics 98 Conference, pp. 51–67 (1998)
165. Mencl, R., Müller, H.: Interpolation and approximation of surfaces from three-dimensional
scattered data points. In: Proceedings of the Dagstuhl Conference on Scientific Visualization,
pp. 223–232 (1999)
166. Meng, Q., Lee, M.: Error-driven active learning in growing radial basis function networks for
early robot learning. Neurocomputing 71(7-9), 1449–1461 (2008)
167. Michalski, R., Mozetic, I., Hong, J., Lavrac, N.: The multi-purpose incremental learning
system, AQ15 and its testing application to three medical domains. In: Proceedings of the
AAAI Sixth National Conference on Artificial Intelligence, pp. 1041–1045 (1986)
168. Miller, J.V., Breen, D.E., Lorensen, W.E., O’Bara, R.M., Wozny, M.J.: Geometrically
deformed models: a method for extracting closed geometric models from volume data. ACM
SIGGRAPH Computer Graphics Newsletter 25(4), 217–226 (1991)
169. Moody, J., Darken, C.J.: Fast learning in networks of locally-tuned processing units. Neural
Computation 1(2), 281–294 (1989)
170. Morcin, F., Garcia, N.: Hierarchical coding of 3D models with subdivision surfaces. In:
Proceedings of the IEEE International Conference on Image Processing, vol. 2, pp. 911–914
(2000)
171. Müller, K.R., Smola, A., Rätsch, G., Schölkopf, B., Kohlmorgen, J., Vapnik, V.: Predicting
time series with support vector machines. In: Proceedings of the International Conference on
Artificial Neural Networks, vol. 1327, pp. 999–1004 (1997)
172. Narendra, K.S., Parthasarathy, K.: Gradient methods for the optimization of dynamical
systems containing neural networks. IEEE Transactions on Neural Networks 2(2), 252–262
(1992)
173. Narendra, K.S., Thathachar, M.A.L.: Learning automata — an introduction. Prentice Hall
(1989)
174. Newcombe, R.A., Davison, A.J., Izadi, S., Kohli, P., Hilliges, O., Shotton, J., Molyneaux,
D., Hodges, S., Kim, D., Fitzgibbon, A.: KinectFusion: Real-time dense surface mapping
and tracking. In: Proceedings of the 2011 IEEE International Symposium on Mixed and
Augmented Reality, pp. 127–136 (2011)
175. Nguyen, D., Ho, T.: An efficient method for simplifying support vector machines. In:
Proceedings of the 22nd international Conference on Machine learning, pp. 617–624 (2005)
176. Nikolov, S., Bull, D., Canagarajah, C., Halliwell, M., Wells, P.: Image fusion using a 3-
D wavelet transform. In: Proceedings of the Seventh International Conference on Image
Processing And Its Applications, pp. 235–239 (1999)
177. Olhoeft, G.: Applications and frustrations in using ground penetrating radar. IEEE Aerospace
and Electronic Systems Magazine 17(2), 12–20 (2002)
178. Orr, M.J.L.: Regularization in the selection of radial basis function centers. Neural
Computation 7(3), 606–623 (1993)
179. Owens, J.D., Houston, M., Luebke, D., Green, S., Stone, J.E., Phillips, J.C.: GPU computing.
Proceedings of the IEEE 96, 879–899 (2008)
180. Owens, J.D., Luebke, D., Govindaraju, N., Harris, M., Krüger, J., Lefohn, A.E., Purcell, T.J.: A survey
of general-purpose computation on graphics hardware. In: Proceedings of the Eurographics
Conference, pp. 21–25 (2005)
181. Pan, Q., Reitmayr, G., Rosten, E., Drummond, T.: Rapid 3D modelling from live video. In:
Proceedings of the 33rd IEEE MIPRO International Convention, pp. 252–257 (2010)
182. Park, K., Yun, I.D., Lee, S.U.: Automatic 3-D model synthesis from measured range data.
IEEE Transactions on Circuits and Systems for Video Technology 10(2), 293–301 (2000)
183. Pastor, L., Rodriguez, A.: Surface approximation of 3D objects from irregularly sampled
clouds of 3D points using spherical wavelets. In: Proceedings of the International Conference
on Image Analysis and Processing, pp. 70–75 (1999)
184. Perez-Gutierrez, B., Martinez, D., Rojas, O.: Endoscopic endonasal haptic surgery simulator
prototype: A rigid endoscope model. In: Proceedings of the 2010 IEEE Virtual Reality
Conference, pp. 297–298 (2010)
185. Perona, P., Malik, J.: Scale-space and edge detection using anisotropic diffusion. IEEE
Transactions on Pattern Analysis and Machine Intelligence 12(7), 629–639 (1990)
186. Piegl, L., Tiller, W.: The NURBS book. Springer (1997)
187. Pirovano, M., Mainetti, R., Baud-Bovy, G., Lanzi, P.L., Borghese, N.A.: Self-adaptive games
for rehabilitation at home. In: Proceedings of IEEE Conference on Computational Intelligence
and Games. Granada (Spain) (2012)
188. Platt, J.: A resource-allocating network for function interpolation. Neural Computation 3,
213–225 (1991)
189. Poggio, T.: A theory of how the brain might work. Tech. Rep. AIM-1253, Massachusetts
Institute of Technology (1990)
190. Poggio, T., Girosi, F.: Networks for approximation and learning. Proceedings of the IEEE 78,
1481–1497 (1990)
191. Potmesil, M.: Generating octree models of 3D objects from their silhouettes in a sequence of
images. Computer Vision, Graphics, and Image Processing 40(1), 1–29 (1987)
192. Praun, E., Hoppe, H., Finkelstein, A.: Robust mesh watermarking. In: Proceedings of the
ACM SIGGRAPH Conference on Computer graphics and interactive techniques, pp. 49–56
(1999)
193. Pulli, K.: Multiview registration for large data sets. In: Proceedings of the Second
International Conference on 3-D Digital Imaging and Modeling, pp. 160–168 (1999)
194. Qi, Y., Cai, S., Yang, S.: 3D modeling, codec and protection in digital museum. In:
Proceedings of the Second Workshop on Digital Media and its Application in Museum
Heritages, pp. 231–236 (2007)
195. Qiu, S., Lane, T.: Multiple kernel learning for support vector regression. Tech. rep., Computer
Science Department, The University of New Mexico, Albuquerque, NM, USA (2005)
196. Quinlan, J.R.: Learning with continuous classes. In: Proceedings of the 5th Australian Joint
Conference on Artificial Intelligence, pp. 343–348. World Scientific, Singapore (1992)
197. Reddy, C.K., Park, J.H.: Multi-resolution boosting for classification and regression problems.
In: Proceedings of the 13th Pacific-Asia Conference on Advances in Knowledge Discovery
and Data Mining, pp. 196–207. Springer-Verlag, Berlin, Heidelberg (2009)
198. Rejtö, L., Walter, G.: Remarks on projection pursuit regression and density estimation.
Stochastic Analysis and Applications 10, 213–222 (1992)
199. Rigotti, C., Borghese, N.A., Ferrari, S., Baroni, G., Ferrigno, G.: Portable and accurate 3D
scanner for breast implants design and reconstructive plastic surgery. In: Proceedings of the
SPIE International Symposium on Medical Imaging, pp. 1558–1567 (1998)
200. Rodriguez-Larena, J., Canal, F., Campo, F.: An optical 3D digitizer for industrial quality
control applications. In: Proceedings of the 1999 IEEE International Conference on Emerging
Technologies and Factory Automation, pp. 1155–1160 (1999)
201. Roosen, C., Hastie, T.: Automatic smoothing spline projection pursuit. Journal of Computa-
tional and Graphical Statistics 3, 235–248 (1994)
202. Rossignac, J.: Edgebreaker: Connectivity compression for triangle meshes. GVU Technical
Report GIT-GVU-98-35 (1998)
203. Roth, G., Wibowoo, E.: An efficient volumetric method for building closed triangular meshes
from 3-D image and point data. In: Graphics Interface, pp. 173–180 (1997)
204. Rozza, A., Lombardi, G., Rosa, M., Casiraghi, E., Campadelli, P.: IDEA: Intrinsic dimension
estimation algorithm. In: Proceedings of the International Conference on Image Analysis and
Processing, pp. 433–442 (2011)
205. Rumelhart, D.E., Hinton, G.E., Williams, R.J.: Learning internal representations by error
propagation. Parallel Distributed Processing: Explorations in the Microstructure of Cognition
1, 318–362 (1986)
206. Rusinkiewicz, S., Hall-Holt, O., Levoy, M.: Real-time 3D model acquisition. In: Proceedings
of the 2002 ACM SIGGRAPH Conference on Computer graphics and interactive techniques,
pp. 438–446. ACM Press (2002)
207. Sanner, R.M., Slotine, J.E.: Gaussian networks for direct adaptive control. IEEE Transactions
on Neural Networks 3(6), 837–863 (1992)
208. Sansoni, G., Docchio, F.: 3-D optical measurements in the field of cultural heritage: the case
of the Vittoria Alata of Brescia. IEEE Transactions on Instrumentation and Measurement
54(1), 359–368 (2005)
209. Schnars, U., Jüptner, W.P.O.: Digital recording and numerical reconstruction of holograms.
Measurement Science and Technology 13(9), R85 (2002)
210. Schreiber, T., Brunnett, G.: Approximating 3D objects from measured points. In: Proceedings
of 30th ISATA Conference (1997)
211. Schröder, P., Sweldens, W.: Spherical wavelets: Efficiently representing functions on the
sphere. In: Proceedings of the ACM SIGGRAPH Conference on Computer graphics and
interactive techniques, pp. 161–172 (1995)
212. Seyama, J., Nagayama, R.S.: The uncanny valley: Effect of realism on the impression of
artificial human faces. Presence: Teleoperators and Virtual Environments 16(4), 337–351
(2007)
213. Shapiai, M.I., Ibrahim, Z., Khalid, M., Jau, L.W., Pavlovich, V.: A non-linear function approx-
imation from small samples based on Nadaraya-Watson kernel regression. In: Proceedings of
the Second International Conference on Computational Intelligence, Communication Systems
and Networks, pp. 28–32 (2010)
214. Smola, A., Murata, N., Schölkopf, B., Müller, K.R.: Asymptotically optimal choice of ε-loss
for support vector machines. In: Proceedings of the 8th International Conference on Artificial
Neural Networks, pp. 105–110. Springer (1998)
215. Smola, A., Schölkopf, B.: A tutorial on support vector regression. Statistics and Computing
14, 199–222 (2004)
216. Specht, D.F.: Probabilistic neural networks. Neural Networks 3, 109–118 (1990)
217. Stitson, M., Gammerman, A., Vapnik, V., Vovk, V., Watkins, C., Weston, J.: Support vector
regression with ANOVA decomposition kernels. Advances in Kernel Methods: Support Vector
Learning, pp. 285–292 (1999)
218. Stone, E., Skubic, M.: Evaluation of an inexpensive depth camera for passive in-home fall
risk assessment. In: Proceedings of the 5th International Conference on Pervasive Computing
Technologies for Healthcare, pp. 71–77 (2011)
219. Su, J., Dodd, T.J.: Online functional prediction for spatio-temporal systems using time-
varying radial basis function networks. In: Proceedings of the 2nd International Asia
Conference on Informatics in Control, Automation and Robotics, vol. 2, pp. 147–150 (2010)
220. Sweldens, W.: The lifting scheme: A new philosophy in biorthogonal wavelet constructions.
In: Proceedings of the SPIE Conference on Wavelet applications in signal and image
processing, pp. 68–79 (1995)
221. Sweldens, W.: The lifting scheme: A custom-design construction of biorthogonal wavelets.
Applied and Computational Harmonic Analysis 3(2), 186–200 (1996)
222. Sweldens, W.: The lifting scheme: A construction of second generation wavelets. SIAM
Journal on Mathematical Analysis 29(2), 511–546 (1997)
223. Syed, N.A., Liu, H., Sung, K.K.: Incremental learning with support vector machines.
In: Proceedings of the Workshop on Support Vector Machines at the International Joint
Conference on Artificial Intelligence (1999)
224. Tang, Y., Guo, W., Gao, J.: Efficient model selection for support vector machine with Gaussian
kernel function. In: Proceedings of the IEEE Symposium on Computational Intelligence and
Data Mining, pp. 40–45 (2009)
225. Taylan, P., Weber, G.W.: Multivariate adaptive regression spline and continuous optimization
for modern applications in science, economy and technology. Tech. rep., Humboldt Universitaet
Berlin (2004)
226. Teichmann, M., Capps, M.: Surface reconstruction with anisotropic density-scaled alpha
shapes. In: Proceedings of the IEEE Visualization Conference, pp. 67–72 (1998)
227. Terzopoulos, D., Metaxas, D.: Dynamic 3D models with local and global deformations:
deformable superquadrics. IEEE Transactions on Pattern Analysis and Machine Intelligence
13(7), 703–714 (1991)
228. Tirelli, P., Momi, E.D., Borghese, N.A., Ferrigno, G.: An intelligent atlas-based planning
system for keyhole neurosurgery. International Journal of Computer Assisted Radiology and
Surgery 4(1), 85–91 (2009)
229. Tomasi, C., Kanade, T.: Detection and tracking of point features. Tech. rep., Carnegie Mellon
University (1991)
230. Tsai, R.: A versatile camera calibration technique for high-accuracy 3D machine vision
metrology using off-the-shelf TV cameras and lenses. IEEE Journal of Robotics and
Automation 3(4), 323–344 (1987)
231. Tsang, I., Kwok, J., Cheung, P.M.: Core vector machines: Fast SVM training on very large
data sets. Journal of Machine Learning Research 6, 363–392 (2005)
232. Turk, G., Levoy, M.: Zippered polygon meshes from range images. In: Proceedings of the
ACM SIGGRAPH Conference on Computer graphics and interactive techniques, pp. 311–318
(1994)
233. Ullrich, S., Kuhlen, T.: Haptic palpation for medical simulation in virtual environments. IEEE
Transactions on Visualization and Computer Graphics 18(4), 617–625 (2012)
234. Uysal, I., Güvenir, H.A.: An overview of regression techniques for knowledge
discovery. Knowledge Engineering Review 14, 319–340 (1999)
235. Vaillant, R., Faugeras, O.: Using extremal boundaries for 3D object modelling. IEEE
Transactions on Pattern Analysis and Machine Intelligence 14(2), 157–173 (1992)
236. Vapnik, V.: Statistical learning theory. Wiley-Interscience (1998)
237. Vapnik, V., Chervonenkis, A.: Theory of pattern recognition. Nauka, Moscow (1974)
238. Vapnik, V., Lerner, A.: Pattern recognition using generalized portrait method. Automation
and Remote Control 24, 774–780 (1963)
239. Visintini, D., Spangher, A., Fico, B.: The VRML model of Victoria square in Gorizia (Italy)
from laser scanning and photogrammetric 3D surveys. In: Proceedings of the Web3D 2007
International Symposium, pp. 165–169 (2007)
240. Wang, D., Wu, X.B., Lin, D.M.: Two heuristic strategies for searching optimal hyper
parameters of C-SVM. In: Proceedings of the Eighth International Conference on Machine
Learning and Cybernetics, pp. 3690–3695 (2009)
241. Wang, L., Chu, C.h.: 3D building reconstruction from LiDAR data. In: Proceedings of
the 2009 IEEE International Conference on Systems, Man and Cybernetics, pp. 3054–3059
(2009)
242. Wang, Z., Chen, S., Sun, T.: MultiK-MHKS: A novel multiple kernel learning algorithm.
IEEE Transactions on Pattern Analysis and Machine Intelligence 30(2), 348–353 (2008)
243. Watkins, C.J.C.H., Dayan, P.: Q-learning. Machine Learning 8(3) (1992)
244. Weiss, S., Indurkhya, N.: Optimized rule induction. IEEE Expert 8(6), 61–69 (1993)
245. Weiss, S., Indurkhya, N.: Rule-based machine learning methods for functional prediction.
Journal of Artificial Intelligence Research 3, 383–403 (1995)
246. Wöhler, C.: 3D computer vision: efficient methods and applications. Springer (2009)
247. Wongwaen, N., Sinthanayothin, C.: Computerized algorithm for 3D teeth segmentation. In:
Proceedings of the International Conference On Electronics and Information Engineering, pp.
V1–277–V1–280 (2010)
248. Woodham, R.J.: Photometric method for determining surface orientation from multiple
images. Optical Engineering 19(1), 139–144 (1980)
249. Wust, C., Capson, D.W.: Surface profile measurement using color fringe projection. Machine
Vision and Applications 4, 193–203 (1991)
250. Xu, H., Hu, Y., Chen, Y., Ma, Z., Wu, D.: A novel 3D surface modeling based on spatial
neighbor points coupling in reverse engineering. In: Proceedings of the International
Conference on Computer Design and Applications, pp. V5–59–V5–62 (2010)
251. Xue, B., Dor, O., Faraggi, E., Zhou, Y.: Real-value prediction of backbone torsion angles.
Proteins 72(1), 427–433 (2008)
252. Yang, R., Allen, P.K.: Registering, integrating and building CAD models from range data. In:
Proceedings of the IEEE International Conference on Robotics and Automation, pp. 3115–
3120 (1998)
253. Yuille, A.L., Grzywacz, N.M.: A computational theory for the perception of coherent visual motion. Nature
333, 71–74 (1988)
254. Zha, H., Hoshide, T., Hasegawa, T.: A recursive fitting-and-splitting algorithm for 3-D object
modeling using superquadrics. In: Proceedings of the International Conference on Pattern
Recognition, vol. 1, pp. 658–662 (1998)
255. Zhang, F., Du, Z., Sun, L., Jia, Z.: A new novel virtual simulation system for robot-assisted
orthopedic surgery. In: Proceedings of the IEEE International Conference on Robotics and
Biomimetics, pp. 366–370 (2007)
256. Zhang, Z.: Flexible camera calibration by viewing a plane from unknown orientations. In:
Proceedings of the 1999 IEEE International Conference on Computer Vision, vol. 1, pp. 666–673
(1999)
257. Zhao, Y., Zhao, J., Zhang, L., Qi, L.: Development of a robotic 3D scanning system for reverse
engineering of freeform part. In: Proceedings of the International Conference on Advanced
Computer Theory and Engineering, pp. 246–250 (2008)
258. Zhenhua, Y., Xiao, F., Yinglu, L.: Online support vector regression for system identification.
Advances in Natural Computation 3611, 627–630 (2005)