Nonlinear Scale Space Analysis in Image Processing
by
Ilya Pollak
Doctor of Philosophy
in
Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology
August, 1999
© 1999 Massachusetts Institute of Technology
All Rights Reserved.
Signature of Author:
Dept. of Electrical Engineering and Computer Science
July 20, 2000
Certified by:
Alan S. Willsky
Professor of EECS
Thesis Supervisor
Certified by:
Hamid Krim
Professor of ECE, North Carolina State University,
Thesis Supervisor
Accepted by:
Arthur C. Smith
Professor of EECS
Chair, Committee for Graduate Students
Nonlinear Scale Space Analysis in Image Processing
by Ilya Pollak
ipollak@alum.mit.edu
Submitted to the Department of Electrical Engineering
and Computer Science on July 20, 2000
in Partial Fulfillment of the Requirements for the Degree
of Doctor of Philosophy in Electrical Engineering and Computer Science
Abstract
The objective of this work is to develop and analyze robust and fast image segmentation algorithms. They must be robust to pervasive, large-amplitude noise, which cannot be well characterized in terms of probabilistic distributions. This is because the applications of interest include synthetic aperture radar (SAR) segmentation, in which speckle noise is a well-known problem that has defeated many algorithms. The methods must also be robust to blur, because many imaging techniques result in smoothed images. For example, SAR image formation has a natural blur associated with it, due to the finite aperture used in forming the image. We introduce a family of first-order multi-dimensional ordinary differential equations with discontinuous right-hand sides and demonstrate their applicability to segmenting both scalar-valued and vector-valued images, as well as images taking values on a circle. An equation belonging to this family is an inverse diffusion everywhere except at local extrema, where some stabilization is introduced. For this reason, we call these equations "stabilized inverse diffusion equations" ("SIDEs"). Existence and uniqueness of solutions, as well as stability, are proven for SIDEs. A SIDE in one spatial dimension may be interpreted as a limiting case of a semi-discretized Perona-Malik equation [49, 50], which, in turn, was proposed in order to overcome certain shortcomings of Gaussian scale spaces [72]. These existing techniques are reviewed in a background chapter. SIDEs are then described and experimentally shown to suppress noise while sharpening edges present in the input image. Their application to the detection of abrupt changes in 1-D signals is also demonstrated. It is shown that a version of the SIDEs optimally solves certain change detection problems. Its relations to the Mumford-Shah functional [44] and to linear programming are discussed. Theoretical performance analysis is carried out, and a fast implementation of the algorithm is described.
¹ Despite this fact, I probably hold another dubious record, namely, having the shortest dissertation. I think there have been people in the group whose theses had sentences longer than this whole document.
sometimes frightening; by following his comments and suggestions, one would typically
improve the quality of a paper by orders of magnitude. His help and advice—and, I am
sure, his reputation—were also instrumental to the success of my job-hunting campaign.
My co-supervisor Hamid Krim was equally supportive; I am very thankful to him for
having an “open door” and for the endless hours we spent discussing research, politics,
soccer, religion, and life in general. I am grateful to Hamid for inviting me to spend
several days in North Carolina, and for his immense help during my job search.
The suggestions and advice of the two remaining committee members—Olivier
Faugeras and Sanjoy Mitter—also greatly contributed both to my thesis and to finding
a good academic position. I am indebted to Olivier for inviting me to visit his group
at INRIA. I thoroughly enjoyed the month I spent in Antibes and Sophia-Antipolis, for
which I would like to thank the whole ROBOTVIS group: Didier Bondyfalat, Sylvain
Bougnoux, Rachid (Nour-Eddine) Deriche, Cyrille Gauclin, José Gomez, Pierre Korn-
probst, Marie-Cécile Lafont, Diane Lingrand, Théo Papadopoulo, Nikos Paragios, Luc
Robert, Robert Stahr, Thierry Viéville, and Imad Zoghlami.
My special thanks go to Stéphane Mallat for inviting me to give a seminar at École
Polytechnique, and for his help in my job search. He has greatly influenced my profes-
sional development, both through a class on wavelets which he taught very engagingly
and enthusiastically at MIT in 1994, and through his research, some of which was the
basis for my Master’s thesis.
I would like to thank Stuart Geman for suggesting the dynamic programming solu-
tion of Chapter 4, as well as for a number of other interesting and useful insights.
I am thankful for the stimulating discussions with Michele Basseville, Charlie Bouman,
Yoram Bresler, Patrick Combettes, David Donoho, Al Hero, Mohamed Khodja, Jiten-
dra Malik, Jean-Michel Morel, David Mumford, Igor Nikiforov, Pietro Perona, Jean-
Christophe Pesquet, Guillermo Sapiro, Eero Simoncelli, Gil Strang, Allen Tannenbaum,
Vadim Utkin, and Song-Chun Zhu, all of whom have contributed to improving the qual-
ity of my thesis and to broadening my research horizon.
In addition to the people I acknowledged above, Eric Miller, Peyman Milanfar, and
Andy Singer generously shared with me many intricacies of the academic job search.
I thank the National Science Foundation and Alan Willsky for providing the financial
support, through a Fellowship and a Research Assistantship, respectively. I thank Clem
Karl, Paul Fieguth and Andrew Kim for all their help with computers; Ben Halpern
for teaching me repeatedly and patiently how to use the fax machine, and for helping
me to tame our color printer; Mike Daniel and Austin Frakt for sharing the LaTeX style
file which was used to format this thesis; John Fisher for organizing a very informative
reading group on learning theory; Asuman Koksal for making our office a little cozier;
Andy Tsai for being a great officemate; Dewey Tucker for his help in generating my
PowerPoint job talk; Jun Zhang and Tony Yezzi both for technical interactions and for
in-depth discussions of the opera; Mike Schneider for sharing his encyclopedic knowledge
(and peculiar tastes) of the art in general, and for educating me on a number of other
topics, ranging from estimation and linear programming to botany and ethnography,
Contents

Abstract 3

Acknowledgments 7

List of Figures 15

1 Introduction 19
1.1 Problem Description and Motivation 19
1.2 Summary of Contributions and Thesis Organization 22

2 Preliminaries 25
2.1 Notation 25
2.2 Linear and Non-linear Diffusions 27
2.3 Region Merging Segmentation Algorithms 30
2.4 Shock Filters and Total Variation 31
2.5 Constrained Restoration of Geman and Reynolds 34
2.6 Conclusion 34

4 Probabilistic Analysis 69
4.1 Introduction 69
4.2 Background and Notation 70
4.3 SIDE as an Optimizer of a Statistic 73
4.3.1 Implementation of the SIDE Via a Region Merging Algorithm 75
4.4 Detection Problems Optimally Solved by the SIDE 79
4.4.1 Two Distributions with Known Parameters 79
4.4.2 Two Gaussian Distributions with Unknown Means 81
4.4.3 Random Number of Edges and the Mumford-Shah Functional 83
4.5 Alternative Implementations 87
4.5.1 Dynamic Programming 87
4.5.2 An Equivalent Linear Program 87
4.6 Performance Analysis 88
4.6.1 Probability Bounds 88
4.6.2 White Gaussian Noise 91
4.6.3 H∞-Like Optimality 94
4.7 Analysis in 2-D 96

Bibliography 129

Index 135
List of Figures

2.1 (a) An artificial image; (b) the edges corresponding to the image in (a); (c) the image in (a) blurred with a Gaussian kernel; (d) the edges corresponding to the blurred image. Note that T-junctions are removed, corners are rounded, and two black squares are merged together. The edges here are the maxima of the absolute value of the gradient. 28

2.2 The G function from the right-hand side of the Perona-Malik equation (2.11). 28

2.3 The F function from the right-hand side of the Perona-Malik equation (2.12). 29

2.4 (b) Gaussian blurring of the signal depicted in (a). (c) Signal depicted in (b), with additive white Gaussian noise of variance 0.1. (d) The steady state of the shock filter (2.15), with the signal (b) as the initial condition. The reconstruction is perfect, modulo numerical errors. (e) The steady state of the shock filter (2.15), with the signal (c) as the initial condition. It is virtually the same as (c), since all extrema remain stationary. 32

2.5 Filtering the blurred unit step signal of Figure 2.4, (b) with the shock filter (2.16): (a) 5 iterations, (b) 10 iterations, (c) 18 iterations. Spurious maxima and minima are created; the unit step is never restored. 33

2.6 The SIDE energy function, also encountered in the models of Geman and Reynolds, and Zhu and Mumford. 35

4.1 Functions F from the right-hand side of the SIDE: (a) generic form; (b) the signum function. 70

4.2 Illustrations of Definitions 4.1 and 4.3: a sequence with three α-crossings, where α = 3 (top); the hypothesis generated by the three α-crossings (middle); the hypothesis generated by the two rightmost α-crossings (bottom). 72

4.3 Edge detection for a binary signal in Gaussian noise. 80

4.4 Detection of changes in variance of Gaussian noise. 80

4.5 Edge detection in 2-D. 97

5.3 (a) A test image; (b) its noisy version (normalized); (c) detected boundary, superimposed onto the noise-free image. 101

5.4 (a) Image of two textures: fabric (left) and grass (right); (b) the ideal segmentation of the image in (a). 102

5.5 (a-c) Filters; (d-f) Filtered versions of the image in Figure 5.4, (a). 102

5.6 (a) Two-region segmentation, and (b) its deviation from the ideal one. 103

5.7 (a) A different feature image: the direction of the gradient; (b) the corresponding two-region segmentation, and (c) its deviation from the ideal one. 103

5.8 (a) Image of two wood textures; (b) the ideal segmentation of the image in (a). 104

5.9 (a) Feature image for the wood textures; (b) the corresponding five-region segmentation, and (c) its deviation from the ideal one. 104

5.10 (a) Another image of two wood textures; (b) the ideal segmentation of the image in (a). 105

5.11 (a) Feature image for the wood textures in Figure 5.10, (a); (b) the corresponding five-region segmentation, and (c) its deviation from the ideal one. 105

5.12 A SIDE energy function which is flat at π and −π and therefore results in a force function which vanishes at π and −π. 106

5.13 (a) The orientation image for Figure 5.7, (a); (b) the corresponding two-region segmentation, and (c) its deviation from the ideal one. 108

5.14 (a) The orientation image for Figure 5.9, (a); (b) the corresponding five-region segmentation, and (c) its deviation from the ideal one. 109

5.15 (a) The orientation image for Figure 5.10, (a); (b) the corresponding five-region segmentation, and (c) its deviation from the ideal one. 109
Chapter 1

Introduction

In this chapter, we introduce the problem of segmentation and change detection addressed in this thesis, and describe the organization of the thesis.
Image segmentation is closely related to restoration, that is, the problem of estimat-
ing an image based on its degraded observation. Indeed, the solution to one of these
problems makes the other simpler: estimation is easier if the boundaries of homogeneous
image regions are known, and vice versa, segmentation is easier once a good estimate
of the image has been computed. It is therefore natural that many segmentation al-
gorithms are related to restoration techniques, and in fact some methods combine the
two, producing estimates of both the edge locations and image intensity [36, 44], as we
will see in Chapter 2.
In describing any restoration or segmentation technique, the notion of scale is very
important. Any such technique incorporates a scale parameter—either directly in the
computation procedure, or implicitly as a part of the image model—which controls
the smoothness of the estimate and/or sizes of the segmented regions. The precise
definitions of scale, in several contexts, are given in Chapters 2 and 3; intuitively,
changing the parameter from zero to infinity will produce a so-called scale space, i.e.
a set of increasingly coarse versions of the input image. There are two approaches to
generating a scale space: one starts with a probabilistic model, the other starts with a
set of “common-sense” heuristics. The difference between the two is conceptual: they
both may lead to the same algorithm [39, 44], producing the same scale space. In the
former case, one would build a probabilistic model of images of interest [6, 32, 39] and
proceed to derive an algorithm for computing the solution which is, in some probabilistic
sense, optimal. For example, one could model images as piecewise constant functions
with additive white Gaussian noise, and the edges (i.e. the boundaries separating the
constant pieces of the function) as continuous curves whose total length is a random
variable with a known distribution. Assigning larger probabilities to the occurrence of
edges would correspond to larger scales in such a model, which will be illustrated in
Chapter 4. Given a realization of this random field, the objective could be to compute
the maximum likelihood estimates [66] of the edge locations. The main shortcoming of
this approach is that a good model is unavailable in many applications, and that usually
any realistic model yields a complicated objective functional to be optimized. Obtaining
the optimal solution is therefore not computationally feasible, and one typically settles
for a local maximum [6, 63]. An alternative to such probabilistic methods of generating
scale spaces is to devise an algorithm using a heuristic description of images of interest.
Stabilized Inverse Diffusion Equations (SIDEs), which are the main topic of this thesis,
belong to this latter category.
SIDEs are motivated by the great recent interest in using evolutions specified by
partial differential equations (PDE’s) as image processing procedures for tasks such as
restoration and segmentation, among others [1, 12, 35, 46, 49, 50, 55–57, 72]. The basic
paradigm behind SIDEs, borrowed from [1, 35, 49, 72], is to treat the input image as the
initial data for a diffusion-like differential equation. The unknown in this equation is
usually a function of three variables: two spatial variables (one for each image dimen-
sion) and the scale—which is also called time because of the similarity of such equations
to evolution equations encountered in physics. In fact, one of the starting points of this
line of investigation was the observation [72] that smoothing an image with Gaussians
of varying width is equivalent to solving the linear heat diffusion equation with the
image as the initial condition. Specifically, the solution to the heat equation at time
t is the convolution of its initial condition with a Gaussian of variance 2t. Gaussian
filtering has been used both to remove noise and as a pre-processor for edge detection
procedures [9]. It has serious drawbacks, however: it displaces and removes important
image features, such as edges, corners, and T-junctions. (An example of this behavior
will be given in Chapter 2 (Figure 2.1).) The interpretation of Gaussian filtering as a
linear diffusion led to the design of other, nonlinear, evolution equations, which better
preserve these features [1, 46, 49, 55–57]. For example, one motivation for the work of
Perona and Malik in [49, 50] is achieving both noise removal and edge enhancement
through the use of an equation which in essence acts as an unstable inverse diffusion
near edges and as a stable linear-heat-equation-like diffusion in homogeneous regions
without edges.
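To make the equivalence between Gaussian smoothing and the heat equation concrete, here is a minimal numerical illustration (not part of the thesis; it assumes NumPy and SciPy are available): an explicit-Euler solution of the 1-D linear heat equation is compared with Gaussian smoothing of variance 2t.

```python
import numpy as np
from scipy.ndimage import gaussian_filter1d

def heat_evolution(u0, t, dt=0.1):
    """Explicit-Euler solution of u_t = u_xx with replicated boundaries."""
    u = u0.astype(float).copy()
    for _ in range(int(round(t / dt))):
        up = np.pad(u, 1, mode="edge")           # u_0 = u_1, u_{N+1} = u_N
        u = u + dt * (up[2:] - 2 * u + up[:-2])  # discrete Laplacian
    return u

rng = np.random.default_rng(0)
u0 = np.concatenate([np.zeros(100), np.ones(100)]) + 0.1 * rng.standard_normal(200)
t = 4.0
u_heat = heat_evolution(u0, t)
u_gauss = gaussian_filter1d(u0, sigma=np.sqrt(2 * t), mode="nearest")
# The two scale spaces agree up to discretization error:
print(np.max(np.abs(u_heat - u_gauss)))
```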
The point of departure for the development of our SIDEs is the family of anisotropic diffusions introduced by Perona and Malik and described in the next chapter. In a sense
that we will make both precise and conceptually clear, the evolutions that we intro-
duce may be viewed as a conceptually limiting case of the Perona-Malik diffusions.
These evolutions have discontinuous right-hand sides and act as inverse diffusions “al-
most everywhere” with stabilization resulting from the presence of the discontinuities
in the right-hand side. As we will see, the scale space of such an equation is a fam-
ily of segmentations of the original image, with larger values of the scale parameter t
corresponding to segmentations at coarser resolutions.
Since “segmentation” may have different meanings in different contexts, we close
this section by further clarifying which segmentation problems are addressed in this
thesis and which are not. This also gives us an opportunity to mention four important
application areas.
Segmentation of medical images, for example, partitioning a brain image into the gray matter and white matter. The main challenges of the segmentation problem depend on the object and on the imaging modality. For example, ultrasound imaging introduces both significant blurring and speckle noise [8, 22, 70], and so the corresponding segmentation algorithms must be robust to such degradations. Robustness of the algorithm introduced in this thesis is experimentally demonstrated in Chapters 3 and 4; Chapter 4 also contains its theoretical analysis.
Detection of abrupt changes in 1-D signals [3]. Application areas include analy-
sis of electrocardiograms and seismic signals, vibration monitoring in mechanical
structures, and quality control. Several synthetic 1-D examples are considered in
Chapters 3 and 4.
Computer vision. In the computer vision literature, the term “segmentation” often
refers to finding the contours of objects in natural images—i.e., photographic
pictures of scenes which we are likely to see in our everyday life [16]. This is an
important problem in low-level vision, because it has been universally accepted
since [71] that segmenting the perceived scene plays an important role in human
vision. However, noise and other types of degradation are usually not as significant
here as in the medical and radar images; the main challenge is the variety of
objects, shapes, and textures in a typical picture. This problem is therefore not
directly addressed in the present thesis; applying the algorithms developed here
to this problem is a topic for future research.
A theoretical analysis of the SIDE is carried out in Chapter 4. It is shown that a specific SIDE finds, in O(N log N) time, the
maximum likelihood solutions to certain binary classification problems. The likelihood
function for one of these problems is essentially a 1-D version of the Mumford-Shah
functional [44]. Thus, an interesting link is established between diffusion equations,
Mumford and Shah’s variational formulation, and probabilistic models. The robustness
of the SIDE is explained by showing that, in a certain special case, it is optimal with re-
spect to an H∞ -like criterion—which, roughly speaking, means that the SIDE achieves
the minimum worst-case error. The performance is also analyzed by computing bounds
on the probabilities of errors in edge location estimates. To summarize, the main con-
tribution of Chapter 4 is establishing a connection between diffusion-based methods
and maximum likelihood edge detection, as well as extensive performance analysis.
Chapter 5 extends SIDEs to vector-valued images and images taking values on a
circle. We argue that most of the properties derived in Chapter 3 carry over. These
results are applicable to color segmentation, where the image value at every pixel is
a three-vector of red, green, and blue values. We also apply our algorithm to texture
segmentation, in which the vector image to be processed is formed by extracting features
from the raw texture image, as well as to segmenting orientation images.
Possible directions of future research are proposed in Chapter 6.
Chapter 2
Preliminaries
■ 2.1 Notation.
In this section, we describe the notation which is used in the current chapter. Most
of this notation will carry over to the rest of the thesis; however, the large quantity of
symbols needed will force us to adopt a slightly different notation in Chapter 4—which
we will describe explicitly in Section 4.2.
We begin with the one-dimensional (1-D) case. The 1-D signal to be processed is denoted by u^0(x). The superscript 0 is a reminder of the fact that the signal is to be processed via a partial differential equation (PDE) of the following form:

u_t = A_1(u),   u(0, x) = u^0(x).   (2.1)
The variable t is called scale or time, and the solution u(t, x) to (2.1), for 0 ≤ t < ∞, is called a scale space. The partial derivatives with respect to t and x are denoted by subscripts, and A_1 is an operator. The scale space is called linear (nonlinear) if A_1 is a linear (nonlinear) operator.
Similarly, an image u^0(x, y) depending on two spatial variables, x and y, will be processed using a PDE of the form

u_t = A_2(u),   u(0, x, y) = u^0(x, y),   (2.2)

which generates the scale space u(t, x, y), for 0 ≤ t < ∞. In the PDEs we consider,
the right-hand side will sometimes involve the gradient and divergence operators. The
gradient of u(t, x, y) is the two-vector consisting of the partial derivatives of u with
respect to the spatial variables x and y:
∇u ≝ (u_x, u_y)^T,   (2.3)

where the superscript T denotes the transpose of a vector. The norm of the gradient is

|∇u| ≝ √(u_x² + u_y²),   (2.4)

and the divergence of a vector field v = (v_1, v_2)^T is

∇⃗ · v ≝ (v_1)_x + (v_2)_y.   (2.5)
We also consider semi-discrete versions of (2.1) and (2.2), obtained by discretizing the spatial variables and leaving t continuous. Specifically, an N-point 1-D discrete signal to be processed is denoted by u^0; it is an element of the N-dimensional vector space ℝ^N. We exclusively reserve boldface letters for vectors, i.e., discrete signals and images. The vector u^0 is the initial condition to the following N-dimensional ordinary differential equation (ODE):

u̇(t) = B_1(u(t)),   u(0) = u^0,   (2.6)
where u(t) is the corresponding scale space, and u̇(t) is its derivative with respect to t. We denote the entries of an N-point signal by the same symbol as the signal itself, with additional subscripts 1 through N:

u = (u_1, u_2, . . . , u_N)^T.
Since most operators B_1 of interest will involve first differences of the form u_{n+1} − u_n, it will simplify our notation to also define non-existent samples u_0 and u_{N+1}. Thus, all vectors will implicitly be (N + 2)-dimensional. Typically, we will take u_0 = u_1 and u_{N+1} = u_N. We emphasize that subscripts 0 through N + 1 will always denote the samples of a signal, whereas the superscript 0 will be reserved exclusively to denote the signal which is the initial condition of a differential equation.
We similarly denote an N-by-N image to be processed by u^0 ∈ ℝ^{N²}; it will always be clear from the context whether u^0 refers to a 1-D or a 2-D discrete signal. The corresponding system of ODEs is

u̇(t) = B_2(u(t)),   u(0) = u^0,   (2.7)

where u^0 and u(t) are matrices whose entries in the i-th row and j-th column are u^0_{i,j} and u_{i,j}(t), respectively.
Sec. 2.2. Linear and Non-linear Diffusions. 27
The operators B_1 and B_2 will typically be the negative gradient of some energy functional, which we will denote by E(u). This energy will depend on the first differences of u in the following way:

E(u) = Σ_{(s,r)∈𝒩} E(u_s − u_r),   (2.8)

where

• E is an even function;
• s and r are single indices if u is a 1-D signal and pairs of indices if u is a 2-D image;
• 𝒩 is the list of all neighboring pairs of pixels: s and r are neighbors if and only if (s, r) ∈ 𝒩.

We will use the following neighborhood structure in 1-D:

𝒩 = {(n, n + 1)}_{n=1}^{N−1}.   (2.9)
■ 2.2 Linear and Non-linear Diffusions.

As discussed in Chapter 1, smoothing an image with Gaussians of increasing variance is equivalent to solving the linear heat equation, with the image as the initial condition:

u_t = u_xx + u_yy,   u(0, x, y) = u^0(x, y),   (2.10)

where the subscripts denote partial derivatives. This insight led to the pursuit and development of a new paradigm for processing images via the evolution of nonlinear
PDEs [1, 46, 49, 50, 56] which effectively lift the limitations of the linear heat equation.
For example, in [49, 50], Perona and Malik propose to achieve both noise removal and
edge enhancement through the use of a non-uniform diffusion which in essence acts as an
unstable inverse diffusion near edges and as a stable linear-heat-equation-like diffusion
in homogeneous regions without edges:
u_t = ∇⃗ · {G(|∇u|) ∇u},   u(0, x, y) = u^0(x, y),   (2.11)
where ∇⃗ and ∇ are the divergence (2.5) and gradient (2.3), respectively. The nonlinear diffusion coefficient G is a monotonically decreasing function with G(0) = 1. (Note that if G were identically equal to 1, then (2.11) would turn into the linear heat equation (2.10), since ∇⃗ · (∇u) = u_xx + u_yy.)

Figure 2.2. The G function from the right-hand side of the Perona-Malik equation (2.11).
To simplify the analysis of the behavior of this equation near edges, we re-write it below in one spatial dimension; however, the statements we make also apply to 2-D:

u_t = ∂/∂x {F(u_x)},   u(0, x) = u^0(x),   (2.12)
where F(u_x) = G(|u_x|) u_x, i.e., F is odd and tends to zero at infinity. Perona and Malik also impose that F have a unique maximum at some location K (Figure 2.3). This constant K is the threshold between diffusion and enhancement, in the following sense. If, for a particular time t = t_0, we define an "edge" of u(t_0, x) as an inflection point with the property u_x u_xxx < 0, then a simple calculation shows that all such edges where |u_x| < K will be diminished by (2.12), i.e., |u_x| will be reduced, while the larger edges, with |u_x| > K, will be enhanced. (Indeed, differentiating (2.12) in x gives u_xt = F″(u_x) u_xx² + F′(u_x) u_xxx, which at an inflection point reduces to F′(u_x) u_xxx; F′(u_x) is positive for |u_x| < K and negative for |u_x| > K.)

Figure 2.3. The F function from the right-hand side of the Perona-Malik equation (2.12).

It has been observed [34] that the
numerical implementations of (2.12) do not exactly exhibit this behavior, although they
do produce temporary enhancement of edges, resulting in both noise removal and scale
spaces in which the edges are much more stable across scale than in linear scale spaces.
As Weickert pointed out in [69], “a scale-space representation cannot perform better
than its discrete realization”. These observations naturally led to a closer analysis
(described in the next chapter) of a semi-discrete counterpart of (2.12), i.e., of the
following system of ordinary differential equations:

u̇_n = F(u_{n+1} − u_n) − F(u_n − u_{n−1}),   n = 1, . . . , N,   u(0) = u^0,   (2.13)

where u^0 = (u^0_1, . . . , u^0_N)^T ∈ ℝ^N is the signal to be processed, and where the conventions u_{N+1} = u_N and u_0 = u_1 are used.
The 2-D semi-discrete version of the Perona-Malik equation is similar:

u̇_{ij} = F(u_{i+1,j} − u_{ij}) − F(u_{ij} − u_{i−1,j}) + F(u_{i,j+1} − u_{ij}) − F(u_{ij} − u_{i,j−1}),   u(0) = u^0,

with i = 1, 2, . . . , N, j = 1, 2, . . . , N, and with the conventions u_{0,j} = u_{1,j}, u_{N+1,j} = u_{N,j}, u_{i,0} = u_{i,1}, and u_{i,N+1} = u_{i,N}.
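As a concrete illustration, the following minimal sketch (not from the thesis) integrates the 1-D semi-discrete equation (2.13) by forward Euler. The diffusivity G(v) = exp(−(v/K)²) is one standard Perona-Malik choice and is an assumption here, since the text only requires G to be monotonically decreasing with G(0) = 1.

```python
import numpy as np

def perona_malik_1d(u0, K=0.2, t_end=5.0, dt=0.05):
    """Semi-discrete Perona-Malik (2.13) with conventions u_0=u_1, u_{N+1}=u_N."""
    u = u0.astype(float).copy()
    F = lambda v: v * np.exp(-(v / K) ** 2)      # odd; peaks at v = K/sqrt(2)
    for _ in range(int(t_end / dt)):
        f = F(np.diff(u))                        # F(u_{n+1} - u_n)
        # du_n/dt = F(u_{n+1} - u_n) - F(u_n - u_{n-1}); boundary terms vanish
        rhs = np.concatenate([[f[0]], f[1:] - f[:-1], [-f[-1]]])
        u += dt * rhs
    return u
```

In line with the discussion above, differences much smaller than K diffuse away, while larger ones are temporarily enhanced.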
(u − u^0)^T (u − u^0) + t·l.
Here, |O_p| is the number of pixels in the region O_p, and l is the total length of all the edges.
This method admits fast numerical implementations and has been experimentally
shown to be robust to white Gaussian noise. However, as we will illustrate, the quadratic penalty on the disagreement between the estimate u and the initial data u^0 renders it ineffective against more severe noise, such as speckle encountered in SAR images.
Note that region merging methods do not allow edges to be created. Thus, decisions
made in the beginning of an algorithm cannot be undone later. A slight modification of
such methods results in split-and-merge methods, which combine region growing with
region splitting [43].
[Figure 2.4: (a) a unit step signal; (b) a blurred step; (c) the blurred step with additive noise; (d), (e) the corresponding steady states of the shock filter (2.15).]
context and in a somewhat different form. We start with Bouman and Sauer’s work,
since it was chronologically first, and since—as we will see in the next chapter—it is
conceptually closer to the results presented in this thesis.
The objective of [6, 58] is reconstructing an image u from its tomographic projections u^0. The authors consider transmission tomography, where the projection data are in the form of the number of photons detected after passing through an absorptive material. In other words, u^0 is the number of photon counts for each angle and displacement. Bouman and Sauer use a probabilistic setting, where the photon counts are Poisson random variables, independent among angles and displacements. They derive an expression for the log likelihood function L(u|u^0), and seek the maximum a posteriori [66] estimate û of u:
û = arg min_u {−L(u|u^0) + E(u)},   (2.17)

[Figure 2.5: filtering the blurred unit step signal of Figure 2.4(b) with the shock filter (2.16) after 5, 10, and 18 iterations.]
where E(u) is the negative logarithm of the prior density function of u (modulo an additive constant). They propose the following prior model:

E(u) = γ Σ_{(s,r)∈𝒩} |u_s − u_r|,   (2.18)
along the hyperplane {u : u_{s_1} = u_{s_2} = · · · = u_{s_i}} until a minimum of the objective
function (2.17) is achieved. After each pixel of the image is visited in such a manner, a
“split” iteration follows, where each pixel is freed to seek its own conditionally optimal
value. This approach is theoretically justified and extended in Chapter 3, where it is
shown that the steepest descent for a non-differentiable energy function such as (2.18)
is a differential equation which automatically merges pixels, thereby segmenting the
underlying image.
We also point out that the continuous version of the energy (2.18) is

∫ |u_x| dx   in 1-D,   and   ∫∫ |∇u| dx dy   in 2-D,

and is called the total variation of u. Its constrained minimization was used in [56] for image restoration. The restored version u(x, y) of an image u^0(x, y) was computed by solving the following optimization problem:

minimize ∫∫ |∇u| dx dy   (2.19)
subject to ∫∫ (u − u^0) dx dy = 0
and ∫∫ (u − u^0)² dx dy = σ².
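For intuition, here is a rough 1-D sketch of total-variation restoration. It descends an unconstrained surrogate of (2.19), with the fidelity constraints replaced by a quadratic penalty and |u_x| smoothed as sqrt(u_x² + ε²); the parameters lam, eps, and the step size are illustrative choices, not values from the thesis or from [56].

```python
import numpy as np

def tv_restore_1d(u0, lam=1.0, eps=0.1, dt=0.02, n_iter=2000):
    """Gradient descent on  sum sqrt((u_x)^2 + eps^2) + (lam/2) sum (u - u0)^2."""
    u = u0.astype(float).copy()
    for _ in range(n_iter):
        up = np.pad(u, 1, mode="edge")
        dx = up[1:] - up[:-1]                     # forward differences
        flux = dx / np.sqrt(dx ** 2 + eps ** 2)   # smoothed gradient of |u_x|
        # discrete divergence of the flux minus the fidelity gradient
        u += dt * ((flux[1:] - flux[:-1]) - lam * (u - u0))
    return u
```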
■ 2.6 Conclusion.
An exhaustive survey of variational models in image processing is beyond the scope
of this thesis. A much more complete bibliography can be found in [43]. In particu-
lar, Chapter 3 of [43] contains a very nice discussion of region merging segmentation
algorithms, starting with Brice and Fennema’s [7] and Pavlidis’ [47], which may be
considered ancestors of [36], snakes [31], and SIDEs. Examples of more recent algorithms, not covered in [43], are [16] and [24]. Another important survey text, which also contains a wealth of references both on variational methods and nonlinear diffusions, is [55].

Figure 2.6. The SIDE energy function, also encountered in the models of Geman and Reynolds, and Zhu and Mumford.
Chapter 3

Image Segmentation with Stabilized Inverse Diffusion Equations

■ 3.1 Introduction.

In this chapter, we introduce the Stabilized Inverse Diffusion Equations (SIDEs), as well as illustrate their speed and robustness, in comparison with some of the methods
reviewed in Chapter 2. As we mentioned in the previous chapter, the starting point for
the development of SIDEs was image restoration and segmentation procedures based on
PDEs of evolution [1, 12, 46, 49, 50, 55–57, 72]. We observed that the numerical schemes
for solving such equations do not necessarily exhibit the behavior of the equations
themselves. We therefore concentrate in this thesis on semi-discrete scale spaces (i.e.,
continuous in scale and discrete in space). More specifically, SIDEs, which are the
main focus and contribution of this thesis, are a new family of semi-discrete evolution
equations which stably sharpen edges and suppress noise. We will see that SIDEs
may be viewed as a conceptually limiting case of Perona-Malik diffusions which were
reviewed in the previous chapter. SIDEs have discontinuous right-hand sides and act as
inverse diffusions “almost everywhere”, with stabilization resulting from the presence
of discontinuities in the vector field defined by the evolution. The scale space of such an
equation is a family of segmentations of the original image, with larger values of the scale
parameter t corresponding to segmentations at coarser scales. Moreover, in contrast to
continuous evolutions, the ones introduced here naturally define a sequence of logical
“stopping times”, i.e. points along the evolution endowed with useful information, and
corresponding to times at which the evolution hits a discontinuity surface of its defining
vector field.
In the next section we begin by describing a convenient mechanical analog for the
visualization of many spatially-discrete evolution equations, including discretized linear
or nonlinear diffusions such as that of Perona and Malik, as well as the discontinuous
equations that we introduce in Section 3.3. The implementation of such a discontinuous
equation naturally results in a recursive region merging algorithm. Because of the
discontinuous right-hand side of SIDEs, some care must be taken in defining solutions,
but as we show in Section 3.4, once this is done, the resulting evolutions have a number
of important properties. Moreover, as we have indicated, they lead to very effective
[Figure 3.1: the spring-mass model. Particles with vertical positions u_1, . . . , u_N and masses M_1, . . . , M_N, subject to forces F_1, . . . , F_N.]

The particles move along N vertical lines. Each particle is connected by springs to its two neighbors
(except the first and last particles, which are only connected to one neighbor.) Every
spring whose vertical extent is v has energy E(v), i.e., the energy of the spring between
the n-th and (n + 1)-st particles is E(un+1 − un ). We impose the usual requirements
on this energy function:
E(v) ≥ 0,   E(0) = 0,   E′(v) ≥ 0 for v > 0,   E(v) = E(−v).   (3.2)
Then the derivative of E(v), which we refer to as “the force function” and denote by
F (v), satisfies
F(0) = 0,   F(v) ≥ 0 for v > 0,   F(v) = −F(−v).   (3.3)
We also call F (v) a “force function” and E(v) an “energy” if −E(v) satisfies (3.2)
and −F (v) satisfies (3.3). We make the movement of the particles non-conservative by
stopping it after a small period of time ∆t and re-starting with zero velocity. (Note
that this will make our equation non-hyperbolic.) It is assumed that during one such
step, the total force F_n = −F(u_n − u_{n+1}) − F(u_n − u_{n−1}), acting on the n-th particle, stays approximately constant. The displacement during one iteration is proportional to the product of acceleration and the square of the time interval:

u_n(t + ∆t) − u_n(t) = (∆t)²/2 · F_n/M_n.

Absorbing the constant factor (∆t)²/2 into the masses (so that m_n is proportional to M_n) and passing to the limit of small ∆t, we obtain

u̇_n = (1/m_n) (F(u_{n+1} − u_n) − F(u_n − u_{n−1})),   n = 1, 2, . . . , N,   (3.4)
with the conventions u_0 = u_1 and u_{N+1} = u_N imposed by the absence of springs to the
left of the first particle and to the right of the last particle. We will refer to mn as “the
mass of the n-th particle” in the remainder of the thesis. Note that Equation (3.4) is a
(weighted) gradient descent equation for the following global energy:
E(u) = Σ_{i=1}^{N−1} E(u_{i+1} − u_i).   (3.5)
Example 3.1. A linear force function F(v) = v leads to the semi-discrete linear heat equation

u̇_n = (u_{n+1} − u_n) − (u_n − u_{n−1}) = u_{n+1} − 2u_n + u_{n−1}.

This corresponds to a simple discretization of the 1-D linear heat equation and results in evolutions which produce increasingly low-pass filtered and smoothed versions of the original signal u^0.
[Figure 3.2: force functions: (a) a diffusion force; (b) an inverse diffusion force; (c) a Perona-Malik force of thickness K.]

A force function F(v) is called a "diffusion force" if, in addition to (3.3), it is monotonically increasing:

F(v_1) ≥ F(v_2) whenever v_1 > v_2,   (3.6)

which is illustrated in Figure 3.2(a). We shall call the corresponding energy a "diffusion energy" and the corresponding evolution (3.4) a "diffusion". The evolution in
Example 3.1 is clearly a diffusion. We call F (v) an “inverse diffusion force” if −F (v)
satisfies Equations (3.3) and (3.6), as illustrated in Figure 3.2(b). The corresponding
evolution (3.4) is called an “inverse diffusion”. Inverse diffusions have the characteris-
tic of enhancing abrupt differences in u corresponding to “edges” in the 1-D sequence.
Such pure inverse diffusions, however, lead to unstable evolutions (in the sense that they
greatly amplify arbitrarily small noise). The following example, which is prototypical of
the examples considered by Perona and Malik, defines a stable evolution that captures
at least some of the edge enhancing characteristics of inverse diffusions.
We shall call the corresponding energy a “Perona-Malik energy” and the corresponding
evolution equation a “Perona-Malik equation of thickness K”. As Perona and Malik
demonstrate (and as can also be inferred from the results in the present thesis), evolu-
tions with such a force function act like inverse diffusions in the regions of high gradient
and like usual diffusions elsewhere. They are stable and capable of achieving some level
of edge enhancement depending on the exact form of F (v).
Finally, to extend the mechanical model of Figure 3.1 to images, we simply replace
the sequence of vertical lines along which the particles move with an N -by-N square
grid of such lines, as shown in Figure 3.3. The particle at location (i, j) is connected
by springs to its four neighbors: (i − 1, j), (i, j + 1), (i + 1, j), (i, j − 1), except for the
particles in the four corners of the square (which only have two neighbors each), and the
rest of the particles on the boundary of the square (which have three neighbors). This
arrangement is reminiscent of (and, in fact, was suggested by) the resistive network of
Figure 8 in [49]. The analog of Equation (3.4) for images is then:
u̇_{ij} = (1/m_{ij}) (F(u_{i+1,j} − u_{ij}) − F(u_{ij} − u_{i−1,j}) + F(u_{i,j+1} − u_{ij}) − F(u_{ij} − u_{i,j−1})).   (3.8)
■ 3.3 Stabilized Inverse Diffusion Equations (SIDEs): The Definition.

A SIDE force function F(v) (Figure 3.4) satisfies, in addition to (3.3):

F′(v) ≤ 0 for v ≠ 0,   F(0⁺) > 0,   F(v_1) = F(v_2) ⇔ v_1 = v_2.   (3.9)
Contrasting this form of a force function to the Perona-Malik function in Figure 3.2,
we see that in a sense one can view the discontinuous force function as a limiting form
of the continuous force function in Figure 3.2(c), as K → 0. However, because of
the discontinuity at the origin of the force function in Figure 3.4, there is a question
of how one defines solutions of Equation (3.4) for such a force function. Indeed, if Equation (3.4) evolves toward a point of discontinuity of its RHS, the value of the RHS of (3.4) apparently depends on the direction from which this point is approached (because F(0⁺) ≠ F(0⁻)), making further evolution non-unique. We therefore need a special definition of how the trajectory of the evolution proceeds at these discontinuity points.¹ For this definition to be useful, the resulting evolution must satisfy well-
posedness properties: the existence and uniqueness of solutions, as well as stability of
solutions with respect to the initial data. In the rest of this section we describe how to
define solutions to (3.4) for force functions (3.9). Assuming the resulting evolutions to
be well-posed, we demonstrate that they have the desired qualitative properties, namely
that they both are stable and also act as inverse diffusions and hence enhance edges.
We address the issue of well-posedness and other properties in Section 3.4.
Consider the evolution (3.4) with F (v) as in Figure 3.4 and Equation (3.9) and with
all of the masses m_n equal to 1. Notice that the RHS of (3.4) has a discontinuity at a point u if and only if u_i = u_{i+1} for some i between 1 and N − 1. It is when a trajectory
reaches such a point u that we need the following definition. In terms of the spring-mass
model of Figure 3.1, once the vertical positions ui and ui+1 of two neighboring particles
become equal, the spring connecting them is replaced by a rigid link. In other words,
¹ Having such a definition is crucial because, as we will show in Section 3.4, equation (3.4) will reach a discontinuity point of its RHS in finite time, starting with any initial condition.
the two particles are simply merged into a single particle which is twice as heavy (see
Figure 3.5), yielding the following modification of (3.4) for n = i and n = i + 1:
u̇_i = u̇_{i+1} = (1/2) (F(u_{i+2} − u_{i+1}) − F(u_i − u_{i−1})).
(The differential equations for n ≠ i, i + 1 do not change.)

[Figure 3.5: two neighboring particles with u_i = u_{i+1} are merged into a single particle of twice the mass.]

Similarly, if m consecutive
particles reach equal vertical position, they are merged into one particle of mass m
(1 ≤ m ≤ N):

u̇_n = · · · = u̇_{n+m−1} = (1/m) (F(u_{n+m} − u_{n+m−1}) − F(u_n − u_{n−1}))   (3.10)

if u_{n−1} ≠ u_n = u_{n+1} = · · · = u_{n+m−2} = u_{n+m−1} ≠ u_{n+m}.
Notice that this system is the same as (3.4), but with possibly unequal masses. It is
convenient to re-write this equation so as to explicitly indicate the reduction in the
number of state variables:
u̇_{n_i} = (1/m_{n_i}) (F(u_{n_{i+1}} − u_{n_{i+1}−1}) − F(u_{n_i} − u_{n_i−1})),   (3.11)
u_{n_i} = u_{n_i+1} = · · · = u_{n_i+m_{n_i}−1},

where i = 1, . . . , p,   1 = n_1 < n_2 < · · · < n_{p−1} < n_p ≤ N,   n_{i+1} = n_i + m_{n_i}.
The compound particle described by the vertical position u_{n_i} and mass m_{n_i} consists of the m_{n_i} unit-mass particles u_{n_i}, u_{n_i+1}, . . . , u_{n_i+m_{n_i}−1} that have been merged, as shown
in Figure 3.5. The evolution can then naturally be thought of as a sequence of stages:
during each stage, the right-hand side of (3.11) is continuous. Once the solution hits
a discontinuity surface of the right-hand side, the state reduction and re-assignment
of the m_{n_i}'s, described above, takes place. The solution then proceeds according to the
modified equation until it hits the next discontinuity surface, etc.
Notice that such an evolution automatically produces a multiscale segmentation of
the original signal if one views each compound particle as a region of the signal. Viewed
as a segmentation algorithm, this evolution can be summarized as follows:
1. Start with the trivial initial segmentation: each sample is a distinct region.
2. Evolve (3.11) until the values in two or more neighboring regions become equal.
3. Merge the neighboring regions whose values are equal.
4. Go to step 2.
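Below is a minimal sketch of this 1-D region-merging evolution. Two details are assumptions rather than part of the text: the force is taken to be the pure signum function F(v) = sgn(v) (the simplified SIDE force analyzed in Chapter 4), and a merge is triggered whenever the values of two neighboring regions meet or cross within one Euler step, which approximates the exact merging of the continuous-time evolution.

```python
import numpy as np

def side_1d(u0, t_end=1.0, dt=1e-3):
    """1-D SIDE with F(v) = sgn(v): evolve regions, merge when values meet."""
    vals = np.asarray(u0, dtype=float)   # one region per sample, ...
    mass = np.ones(len(vals))            # ... each initially of unit mass
    t = 0.0
    while t < t_end and len(vals) > 1:
        f = np.sign(np.diff(vals))       # forces of the springs between regions
        rhs = (np.concatenate([f, [0.0]]) - np.concatenate([[0.0], f])) / mass
        new = vals + dt * rhs
        t += dt
        met = np.diff(vals) * np.diff(new) <= 0   # neighbors that met or crossed
        v_out, m_out = [new[0]], [mass[0]]
        for i in range(1, len(new)):
            if met[i - 1]:               # merge: mass-weighted average position
                tot = m_out[-1] + mass[i]
                v_out[-1] = (v_out[-1] * m_out[-1] + new[i] * mass[i]) / tot
                m_out[-1] = tot
            else:
                v_out.append(new[i]); m_out.append(mass[i])
        vals, mass = np.array(v_out), np.array(m_out)
    return np.repeat(vals, mass.astype(int))      # piecewise-constant signal
```

Stopping the evolution at intermediate times yields the multiscale family of segmentations described above.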
The same algorithm can be used for 2-D images, which is immediate upon re-writing
Equation (3.11):
u̇_{n_i} = (1/m_{n_i}) Σ_{n_j ∈ A_{n_i}} F(u_{n_j} − u_{n_i}) p_{ij},   (3.12)
where
m_{n_i} is again the mass of the compound particle n_i (= the number of pixels in the region n_i);

A_{n_i} is the set of the indices of all the neighbors of n_i, i.e., of all the compound particles that are connected to n_i by springs;

p_{ij} is the number of springs between regions n_i and n_j (always 1 in 1-D, but can be larger in 2-D).
Just as in 1-D, two neighboring regions n_1 and n_2 are merged by replacing them with one region n of mass m_n = m_{n_1} + m_{n_2} and the set of neighbors A_n = A_{n_1} ∪ A_{n_2} \ {n_1, n_2}.
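In code, this merging rule is simple bookkeeping on a region-adjacency structure. The sketch below is a hypothetical helper (the dictionary layout is my own, not from the thesis): masses add, the neighbor sets are united minus the merged pair, and the spring counts p_ij of shared neighbors add up.

```python
def merge_regions(regions, n1, n2, new_id):
    """regions: id -> {'mass': int, 'nbrs': {neighbor_id: spring_count}}."""
    r1, r2 = regions.pop(n1), regions.pop(n2)
    nbrs = {}
    for r in (r1, r2):
        for k, p in r['nbrs'].items():
            if k not in (n1, n2):
                nbrs[k] = nbrs.get(k, 0) + p        # p_ij springs accumulate
    regions[new_id] = {'mass': r1['mass'] + r2['mass'], 'nbrs': nbrs}
    for k in nbrs:                                   # repoint the neighbors
        rn = regions[k]['nbrs']
        rn[new_id] = rn.pop(n1, 0) + rn.pop(n2, 0)
    return regions
```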
We close this section by describing one of the basic and most important properties
of these evolutions, namely that the evolution is stable but nevertheless behaves like an
inverse diffusion. Notice that a force function F (v) satisfying (3.9) can be represented
as the sum of an inverse diffusion force Fid (v) and a positive multiple of sgn(v): F (v) =
Fid (v) + C sgn(v), where C = F (0+ ) and −Fid (v) satisfies (3.3) and (3.6). Therefore,
if uni+1 − uni and uni − uni−1 are of the same sign (which means that uni is not a local
extremum of the sequence (un1 , . . . , unp )), then (3.11) can be written as
u̇_{n_i} = (1/m_{n_i}) (F_id(u_{n_{i+1}} − u_{n_i}) − F_id(u_{n_i} − u_{n_{i−1}})).   (3.13)
If u_{n_i} > u_{n_{i+1}} and u_{n_i} > u_{n_{i−1}} (i.e., u_{n_i} is a local maximum), then (3.11) is

u̇_{n_i} = (1/m_{n_i}) (F_id(u_{n_{i+1}} − u_{n_i}) − F_id(u_{n_i} − u_{n_{i−1}}) − 2C).   (3.14)
If u_{n_i} < u_{n_{i+1}} and u_{n_i} < u_{n_{i−1}} (i.e., u_{n_i} is a local minimum), then (3.11) is

u̇_{n_i} = (1/m_{n_i}) (F_id(u_{n_{i+1}} − u_{n_i}) − F_id(u_{n_i} − u_{n_{i−1}}) + 2C).   (3.15)
Equation (3.13) says that the evolution is a pure inverse diffusion at the points which
are not local extrema. It is not, however, a global inverse diffusion, since pure inverse
diffusions drive local maxima to +∞ and local minima to −∞ and thus are unstable.
In contrast, equations (3.14) and (3.15) show that at local extrema, the evolution in-
troduced in this chapter is an inverse diffusion plus a stabilizing term which guarantees
that the local maxima do not increase and the local minima do not decrease. Indeed,
|F_id(v)| ≤ F(0⁺) = C for any v and for any SIDE force function F, and therefore the
RHS of (3.14) is negative, and the RHS of (3.15) is positive. For this reason, we call
the new evolution (3.11), (3.12) a “stabilized inverse diffusion equation” (“SIDE”), a
force function satisfying (3.9) a “SIDE force”, and the corresponding energy a “SIDE
energy”. In Chapter 4, we will analyze a simpler version of this equation, which results
from dropping the inverse diffusion term. In this particular case, the local extrema
move with constant speed and all the other samples are stationary, which makes the
analysis of the equation more tractable.
able to show that a trajectory whose initial point is very close to S_{n_i} will, in fact, hit S_{n_i}
(see Figure 3.6). In the literature on differential equations and control theory [17, 67],
the behavior that SIDEs exhibit and which is illustrated in Figure 3.6 is referred to as
“sliding modes”. Specifically, as proven in Appendix A, the behavior of the evolution
near discontinuity hyperplanes satisfies the following:
[Figure 3.6: near a discontinuity surface, the solution field of a SIDE points toward the surface.]
Intuitively, and as illustrated in Figure 3.6, this lemma states that the solution field
of a SIDE near any discontinuity surface points toward that surface. As a consequence,
a trajectory which hits such a surface may be continuously extended to “slide” along the
surface, as shown in [17, 67]. For this reason the discontinuity surfaces are commonly
referred to as “sliding surfaces”. For SIDEs, a simple calculation verifies that the
dynamics along such a surface, obtained through any of the three classical definitions
in [17, 67], correspond exactly to the definition given in the preceding section.
The Lemma on Sliding, together with the well-posedness of SIDEs inside their con-
tinuity regions, directly implies the overall well-posedness of 1-D SIDEs: for finite T ,
the trajectory from t = 0 to t = T depends continuously on its initial point. As shown
in Property 3.2 to follow, a SIDE reaches a steady state in finite time, which establishes
its well-posedness for infinite time intervals.
² In ℝ^{p−1}, a quadrant containing a vector a = (a_1, . . . , a_{p−1})^T such that a_i ≠ 0 for i = 1, . . . , p − 1 is the set Q = {b ∈ ℝ^{p−1} : b_i a_i > 0 for i = 1, . . . , p − 1}.
Property 3.1 (maximum principle). Every local maximum is decreased and every local minimum is increased by a SIDE. Therefore,

min_k u_k^0 ≤ u_n(t) ≤ max_k u_k^0   for all n and all t ≥ 0.
Property 3.2 (finite evolution time). A SIDE, started at u^0 = (u_1^0, . . . , u_N^0)^T, reaches its equilibrium (i.e., the point u = (u_1, . . . , u_N)^T where u_1 = · · · = u_N = (1/N) Σ_{i=1}^N u_i^0) in finite time.
Proof. The sum of the vertical positions of all unit-mass particles is equal to the sum of the vertical positions of the compound particles, weighted by their masses: Σ_{n=1}^N u_n = Σ_{i=1}^p u_{n_i} m_{n_i}. The time derivative of this quantity is zero, as verified by summing up the right-hand sides of (3.11). Therefore, the mean vertical position (1/N) Σ_{n=1}^N u_n is constant throughout the evolution. Writing (3.11) for i = 1, u̇_{n_1} = (1/m_{n_1}) F(u_{n_2} − u_{n_1}), we see that the leftmost compound particle is stationary only if p = 1, i.e., if all unit-mass particles have the same vertical position: u_{n_1} = u_1 = u_2 = · · · = u_N. Since the mean is conserved, the unique steady state is u_1 = · · · = u_N = (1/N) Σ_{i=1}^N u_i^0.

To prove that it is reached in finite time, we again refer to the spring-mass model of Figure 3.1 and use the fact that a SIDE force function assigns larger force to shorter springs. If we put L = 2 max_n |u_n(0)|, then the maximum principle implies that in the system there cannot exist a spring with vertical extent larger than L at any time during the evolution. Therefore, the rate of decrease of the absolute maximum, according to Equation (3.11), is at least F(L)/N (because F(L) is the smallest force possible in the system, and N is the largest mass). Similarly, the absolute minimum always increases at least as quickly. They will meet no later than at t = LN/(2F(L)), at which point the sequence u(t) must be a constant sequence. ∎
The above property allows us immediately to state the well-posedness results as
follows:
Property 3.3 (well-posedness). For any initial condition u^0_*, a SIDE has a unique solution u_*(t) satisfying u_*(0) = u^0_*. Moreover, for any such u^0_* and any ε > 0, there exists a δ > 0 such that |u^0 − u^0_*| ≤ δ implies |u(t) − u_*(t)| ≤ ε for t ≥ 0, where u(t) is the solution of the SIDE with the initial condition u^0.
features in an image. For this to be true, however, we would need some type of continuity
of this hitting time sequence. Specifically, let t_n(u^0) denote the "n-th hit time", i.e., the time when the solution starting at u^0 reaches the sliding hyperplane S_n. By Property 3.2, this is a finite number. Let u(t) be "a typical solution" if it never reaches two different sliding hyperplanes at the same time: t_i(u(0)) ≠ t_j(u(0)) if i ≠ j. One
of the consequences of the Lemma on Sliding is that a trajectory that hits a single
hyperplane Sn does so transversally (that is, cannot be tangent to it). Since trajectories
vary continuously, this means that nearby solutions also hit Sn . Therefore, for typical
solutions the following holds:
Property 3.4 (stability of hit times). If u(t) is a typical solution, all solutions with
initial data sufficiently close to u(0) get onto surfaces Sn in the same order as u(t).
The sequence in which a trajectory hits surfaces Sn is an important characteristic
of the solution. Property 3.4 says that, for a typical solution u(t), the (strict) ordering
of hit times tn (u(0)) is stable with respect to small disturbances in u(0):
t_{n_1}(u(0)) < t_{n_2}(u(0)) < · · · < t_{n_{N−1}}(u(0)).   (3.17)
We note that if the smoothing kernel p_K(v) is appropriately chosen, then the resulting F_K(v) will be a Perona-Malik force function of thickness K. (For example, one easy choice for p_K(v) is a multiple of the indicator function of the interval [−K, K].) Thus,
semi-discrete Perona-Malik evolutions with small K are regularizations of SIDEs, and
consequently a SIDE in 1-D can be viewed as a limiting case of a Perona-Malik-type
evolution. However, as we will see in the experimental section, the SIDE evolutions
appear to have some advantages over such regularized evolutions even in 1-D.
Consider the global energy

E(u) = Σ_{n=1}^{N−1} E(u_{n+1} − u_n),   (3.18)
where E is the SIDE energy function (Figure 2.6), i.e., an antiderivative of the SIDE
force function. Note that the standard definition of the gradient cannot be used here.
Indeed, non-differentiability of E at the origin makes the directional derivatives of E(u)
in the directions orthogonal to a sliding surface S undefined for u ∈ S. But once u(t)
hits a sliding surface, it stays there for all future times, and therefore we do not have to
be concerned with the partial derivatives of E(u) in the directions which do not lie in
the sliding surface. This leads to the definition of the gradient as the vector of partial
derivatives taken with respect to the directions which belong to the sliding surface.
Definition 3.1. Suppose that S is the intersection of all the sliding hyperplanes of the SIDE (3.11) to which the vector u belongs. Suppose further that {f_i}_{i=1}^p is an orthonormal basis for S. Then the gradient of E with respect to S, ∇_S E, is defined as the weighted sum of the basis vectors, with the weights equal to the corresponding directional derivatives:

∇_S E(u) ≝ Σ_{i=1}^p (∂E(u)/∂f_i) f_i.   (3.19)
We will show in this section that at any moment t, the RHS of the SIDE (3.11)
is the negative gradient of E(u(t)), taken with respect to the intersection S of all the
sliding surfaces to which u(t) belongs. An auxiliary result is needed in order to show
this.
Lemma 3.2. Suppose that, as in Equation (3.11), u is a signal with p distinct regions of masses m_1, . . . , m_p:

u_{n_i} = u_{n_i+1} = · · · = u_{n_i+m_{n_i}−1},   i = 1, . . . , p.   (3.20)

Let {e_j}_{j=1}^N be the standard basis of ℝ^N (i.e., the j-th entry of e_j is 1 and all other entries are zeros), and define

f_i = (1/√m_{n_i}) Σ_{j=n_i}^{n_{i+1}−1} e_j,   for i = 1, . . . , p.   (3.21)

Then {f_i}_{i=1}^p is an orthonormal basis for the sliding surface S defined by (3.20).

Proof. The vector f_i satisfies Equation (3.20), and therefore it belongs to the sliding surface S. Since the e_j's are mutually orthogonal, so are the f_i's. Since there are p distinct f_i's, they form a basis for the p-dimensional surface S. The norm of f_i is

Σ_{j=n_i}^{n_{i+1}−1} (1/√m_{n_i})² = m_{n_i} · (1/m_{n_i}) = 1.   ∎
Property 3.6 (gradient descent). The SIDE (3.11) is the gradient descent equation for the global energy (3.18), i.e.,

u̇(t) = −∇_{S(t)} E(u(t)),   (3.22)

where S(t) is the intersection of all sliding hyperplanes to which u(t) belongs, and ∇_{S(t)} is the gradient with respect to S(t).
Proof. In order to prove this property, we write out Equation (3.22) in terms of the coefficients of u̇ and −∇_S E(u) with respect to the basis {f_i}_{i=1}^p (3.21). It is immediate from the definition (3.21) of the f_i's that

u = Σ_{i=1}^p u_{n_i} √m_{n_i} f_i,

and therefore the i-th coefficient of u̇ in this basis is

√m_{n_i} u̇_{n_i}.   (3.23)
Since the basis {f_i}_{i=1}^p is orthonormal, the i-th coefficient of −∇_S E(u) in this basis is the directional derivative of −E in the direction f_i:

−∂E/∂f_i = −lim_{∆→0} (1/∆) {E(u + f_i ∆) − E(u)}
("n −2
1 Xi
∆
= − lim E(un+1 − un ) + E(uni + √ − uni −1 )
∆→0 ∆ mni
n=1
ni+1 −2
X ∆ ∆
+ E(un+1 + √ − un − √ )
n=ni
mni mni
∆ X
N −1 X
N −1
+E(uni+1 − uni+1 −1 − √ )+ E(un+1 − un ) − E(un+1 − un )
mni n=n
i+1 n=1
½ · ¸
1 ∆
= − lim E(uni − uni −1 + √ ) − E(uni − uni −1 )
∆→0 ∆ mni
· ¸¾
1 ∆
+ E(uni+1 − uni+1 −1 − √ ) − E(uni+1 − uni+1 −1 )
∆ mni
½ ¾
1 1
= − √ E 0 (uni − uni −1 ) − √ E 0 (uni+1 − uni+1 −1 )
mni mni
1
= √ (F (uni+1 − uni+1 −1 ) − F (uni − uni −1 )). (3.24)
mni
Equating the coefficients (3.23) and (3.24), we get that the gradient descent equation
(3.22), written in the basis {fi }pi=1 , is:
1
u̇ni = (F (uni+1 − uni+1 −1 ) − F (uni − uni −1 ),
mni
which is the SIDE (3.11).
It is possible to characterize further the process of energy dissipation during the
evolution of a SIDE. Namely, between any two consecutive mergings (i.e., hits of a
sliding surface), the energy is a concave function of time.
Property 3.7 (energy dissipation). Consider the SIDE (3.11) and let E be the corresponding SIDE energy function: E′ = F. Then between any two consecutive mergings during the evolution of the SIDE, the global energy (3.18) is decreasing and concave as a function of time:

Ė < 0,   Ë ≤ 0.
Proof. To simplify notation, we define

y_i = u_{n_i} for i = 1, . . . , p,

and will simply write m_i instead of m_{n_i}. Then the global energy (3.18) is

E = Σ_{i=1}^{p−1} E(y_{i+1} − y_i),

and the SIDE is the gradient descent

ẏ_i = −(1/m_i) ∂E/∂y_i,   i = 1, . . . , p.
By the chain rule of differentiation, we have:

Ė = Σ_{i=1}^p (∂E/∂y_i) ẏ_i = −Σ_{i=1}^p (∂E/∂y_i)² (1/m_i) < 0.
Differentiating with respect to t one more time and applying the chain rule again yields:

Ë = −2 Σ_{i=1}^p (∂E/∂y_i) [d/dt (∂E/∂y_i)] (1/m_i)
  = −2 Σ_{i=1}^p (∂E/∂y_i) ( Σ_{k=1}^p (∂²E/(∂y_i ∂y_k)) ẏ_k ) (1/m_i)
  = 2 Σ_{i=1}^p (∂E/∂y_i) ( Σ_{k=1}^p (∂²E/(∂y_i ∂y_k)) (1/m_k) (∂E/∂y_k) ) (1/m_i)
  = 2 D^T H D,   (3.25)
where

D = ( (1/m_1) ∂E/∂y_1, . . . , (1/m_p) ∂E/∂y_p )^T,

and H is the Hessian matrix of E, i.e., the matrix of all the mixed second derivatives of E. The entry in the i-th row and k-th column of H is ∂²E/(∂y_i ∂y_k). In other words,
x1 −x1 0 0 0 ... 0
−x1 x1 + x2 −x2 0 0 ... 0
0 −x x + x −x 0 ... 0
2 2 3 3
.. ..
H = −
. . ,
.. .. .. ..
. . . . 0
0 ... 0 −xp−2 xp−2 + xp−1 −xp−1
0 ... 0 −xp−1 xp−1
where $x_i = -F'(y_{i+1}-y_i)$. Note that, by our definition of the $y_i$'s, $y_{i+1}-y_i \ne 0$, and that $F$ is monotonically decreasing for nonzero arguments; therefore $F'(y_{i+1}-y_i) < 0$, i.e., $x_i > 0$.
All that remains to show is that $H$ is negative semidefinite, which, combined with (3.25), means that $\ddot{\mathcal{E}} \le 0$. It is easily verified that $-H$ can be factorized into a product of a lower-triangular and an upper-triangular matrix. The diagonal entries $x_1, \ldots, x_{p-1}, 0$ of the upper-triangular factor are the pivots ([62], page 32) of $-H$. Since all the pivots are nonnegative, it follows ([62], page 339) that $-H \ge 0$, i.e., $H \le 0$, which implies $\ddot{\mathcal{E}} \le 0$.
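As an illustration (ours, not part of the thesis), the following sketch assembles the tridiagonal matrix $-H$ from randomly chosen positive $x_i$ and confirms numerically that it is positive semidefinite; an eigenvalue test stands in for the pivot argument above.

    import numpy as np

    # For random positive x_i, assemble -H from the proof and confirm it is
    # positive semidefinite (its LU pivots are x_1, ..., x_{p-1}, 0).
    rng = np.random.default_rng(0)
    p = 6
    x = rng.uniform(0.5, 2.0, size=p - 1)    # x_i = -F'(y_{i+1} - y_i) > 0
    minusH = np.zeros((p, p))
    for i, xi in enumerate(x):
        minusH[i, i] += xi
        minusH[i + 1, i + 1] += xi
        minusH[i, i + 1] -= xi
        minusH[i + 1, i] -= xi
    print(np.min(np.linalg.eigvalsh(minusH)) >= -1e-12)  # True: -H >= 0, hence H <= 0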
A typical picture of the energy dissipation is shown in Figure 3.7; the only points
where E might not be concave as a function of time are the merge points.
Figure 3.7. Typical picture of the energy dissipation during the evolution of a SIDE.
In addition to being the gradient descent equation for the global energy (3.18) with $E' = F$, the SIDE (3.11), as we now show, also reduces $\sum_{n=1}^{N-1} E_1(u_{n+1}-u_n)$ for any SIDE energy function $E_1 \ne E$.
Property 3.8 (Lyapunov functionals). Consider the SIDE (3.11), and let E be the corresponding SIDE energy function: $E' = F$. Let $E_1$ be an arbitrary SIDE energy function (i.e., a function such that $E_1'$ satisfies (3.3), (3.9)), and define
$$ \mathcal{E}_1(u) = \sum_{n=1}^{N-1} E_1(u_{n+1} - u_n). $$
Then $\mathcal{E}_1$ decreases between any two consecutive mergings during the evolution of the SIDE: $\dot{\mathcal{E}}_1 < 0$.
Proof. We again use the notation from the proof of the previous property:
$$ y_i = u_{n_i} \quad\text{for } i = 1, \ldots, p; \qquad \mathcal{E}_1 = \sum_{i=1}^{p-1} E_1(y_{i+1} - y_i). $$
By the chain rule,
$$ \dot{\mathcal{E}}_1 = \sum_{i=1}^{p}\frac{\partial\mathcal{E}_1}{\partial y_i}\,\dot y_i $$
$$ = -E_1'(y_2-y_1)\dot y_1 + \sum_{i=2}^{p-1}\bigl(E_1'(y_i-y_{i-1}) - E_1'(y_{i+1}-y_i)\bigr)\dot y_i + E_1'(y_p-y_{p-1})\dot y_p $$
$$ = -\left[\frac{1}{m_1}E_1'(y_2-y_1)F(y_2-y_1) + \sum_{i=2}^{p-1}\frac{1}{m_i}\bigl(E_1'(y_i-y_{i-1}) - E_1'(y_{i+1}-y_i)\bigr)\bigl(F(y_i-y_{i-1}) - F(y_{i+1}-y_i)\bigr) + \frac{1}{m_p}E_1'(y_p-y_{p-1})F(y_p-y_{p-1})\right]. $$
The first term inside the brackets is positive, since $E_1'(y_2-y_1)$, $F(y_2-y_1)$, and $y_2-y_1$ all have the same sign. Similarly, the last term is positive. Each term in the summation is also positive, because of the monotonicity of $E_1'$ and $F$. Therefore, $\dot{\mathcal{E}}_1 < 0$.
We now analyze another class of Lyapunov functionals, which includes the $\ell^2$ norm and the negative entropy.
Property 3.9 (Lyapunov functionals). Let $R$ be a strictly convex differentiable function, and define
$$ \mathcal{R}(u) = \sum_{n=1}^{N} R(u_n). $$
Then, as long as $u$ is not constant, $\mathcal{R}$ decreases during the evolution of the SIDE (3.11):
$$ \dot{\mathcal{R}} < 0. $$
Proof. Using the notation of the previous proofs,
$$ \dot{\mathcal{R}} = \sum_{i=1}^{p} m_i R'(y_i)\,\dot y_i = R'(y_1)F(y_2-y_1) + \sum_{i=2}^{p-1}R'(y_i)\bigl(F(y_{i+1}-y_i) - F(y_i-y_{i-1})\bigr) - R'(y_p)F(y_p-y_{p-1}) = \sum_{i=1}^{p-1}\bigl(R'(y_i) - R'(y_{i+1})\bigr)F(y_{i+1}-y_i). \tag{3.26} $$
Each summand in (3.26) is negative: since $R$ is strictly convex, $R'(y_i) - R'(y_{i+1})$ has the sign opposite to that of $y_{i+1}-y_i$, while $F(y_{i+1}-y_i)$ has the same sign as $y_{i+1}-y_i$. Therefore, $\dot{\mathcal{R}} < 0$. In particular, taking $R(u_n) = |u_n|^q$ with $q > 1$ shows that $\mathcal{R}(u) = \sum_{n=1}^{N}|u_n|^q$, the $q$-th power of the $\ell^q$ norm, is a Lyapunov functional of the SIDE.
Figure 3.8. A modified force function, for which sliding happens in 2-D, as well as in 1-D.
conjunction with Equation (3.12). Since sliding modes do not necessarily occur on the
discontinuity hyperplanes, there is no global continuous dependence on the initial data.
In particular, the sequence of hitting times and associated discontinuity planes does not
depend continuously on initial conditions, and our SIDE evolution does not correspond
to a limiting form of a Perona-Malik evolution in 2-D but in fact represents a decidedly
different type of evolutionary behavior. Several factors, however, indicate the value of
this new evolution and also suggest that a weaker stability result can be proven. First of
all, as shown in the experimental results in the next section, SIDEs can produce excellent
segmentations in 2-D images even in the presence of considerable noise. Moreover,
thanks to the maximum principle, excessively wild behavior of solutions is impossible,
something that is again confirmed by the experiments of the next section. Consequently,
the sequence of hit times (3.17) does not seem to be very sensitive to the initial condition
in that the presence of noise, while perhaps perturbing the ordering of hitting times
and the sliding planes that are hit, seems to introduce perturbations that are, in some
sense, “small”.
Finally, we note without giving details that the energy dissipation properties (Properties 3.6 and 3.7) and Property 3.9 on Lyapunov functionals carry over to 2-D, as do their proofs, with slight changes to accommodate the fact that a region may have more than two neighbors in 2-D.
■ 3.5 Experiments.
In this section we present examples in both 1-D and 2-D. The purpose of 1-D experi-
ments is to provide the basic intuition for how SIDEs work, as well as to contrast SIDEs
with the methods reviewed in the previous chapter. We do not claim that SIDEs are
the best for any of these 1-D examples, for which good results can be efficiently ob-
tained using simple algorithms. In 2-D, however, this is no longer true, and SIDEs have
considerable advantages over the existing methods.
Choosing a SIDE force function best suited for a particular application is an open research question. (It is partly addressed in Chapter 4, by describing the problems for which $F(v) = \operatorname{sgn}(v)$ is the best choice.) For the examples below, we use a very simple piecewise-linear force function, $F(v) = \operatorname{sgn}(v) - \frac{v}{L}$, depicted in Figure 3.9. Note that,
Figure 3.9. The SIDE force function used in the experimental section.
formally, this function does not satisfy our definition (3.3) of a force function, since it is
negative for v > L. Therefore, in our experiments we always make sure that L is larger
than the dynamic range of the signal or image to be processed. In that case, thanks to
the maximum principle, we will have |ui (t) − uj (t)| < L for any pair of pixels at any
time t during evolution, and therefore F (|ui (t) − uj (t)|) > 0.
As we mentioned before, choosing the appropriate stopping rule is also an open
problem. In the examples to follow, we assume that we know the number of regions we
are looking for, and stop the evolution when that number of regions is achieved.
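The following sketch (ours, with illustrative parameter choices; not the MATLAB code used for the experiments) shows how such an evolution can be simulated: forward Euler time-stepping with the force of Figure 3.9, explicit merging of regions whose values meet, and the region-count stopping rule.

    import numpy as np

    # A rough 1-D SIDE sketch: forward Euler with explicit region merging,
    # the piecewise-linear force F(v) = sgn(v) - v/L of Figure 3.9, and the
    # stopping rule "evolve until n_regions remain". The merging tolerance
    # is a crude stand-in for detecting when two region values meet.
    def F(v, L):
        return np.sign(v) - v / L

    def side_1d(u0, L, n_regions, dt=1e-3):
        vals = list(np.asarray(u0, dtype=float))   # one intensity value per region
        mass = [1] * len(vals)                     # number of pixels in each region
        while len(vals) > n_regions:
            v = np.array(vals)
            f = np.concatenate(([0.0], F(np.diff(v), L), [0.0]))  # no flux at the ends
            v = v + dt * (f[1:] - f[:-1]) / np.array(mass)
            vals = list(v)
            i = 0
            while i < len(vals) - 1:               # merge regions whose values have met
                if abs(vals[i + 1] - vals[i]) < 5 * dt:
                    m1, m2 = mass[i], mass[i + 1]
                    vals[i] = (m1 * vals[i] + m2 * vals[i + 1]) / (m1 + m2)
                    mass[i] = m1 + m2
                    del vals[i + 1]
                    del mass[i + 1]
                else:
                    i += 1
        return vals, mass

    u0 = np.concatenate([np.zeros(100), np.ones(100)]) + 0.4 * np.random.randn(200)
    vals, mass = side_1d(u0, L=10.0, n_regions=2)
    print(mass)                                    # region sizes, ideally close to [100, 100]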
Figure 3.11. Scale space of a Perona-Malik equation with a large K for the noisy step of Figure 3.10.
Note that the last remaining edge, i.e., the edge in Figure 3.10(d) for the hitting time at
which there are only two regions left, is located between samples 101 and 102, which is
quite close to the position of the original edge (between the 100-th and 101-st samples).
In this example, the step in Figure 3.10(d) also has amplitude that is close to that
of the original unit step. In general, thanks to the stability of SIDEs, the sizes of
discontinuities will be diminished through such an evolution, much as they are in other
evolution equations. However, from the perspective of segmentation this is irrelevant–
i.e., the focus of attention is on detecting and locating the edge, not on estimating its
amplitude.
This example also provides us with the opportunity to contrast the behavior of a
SIDE evolution with a Perona-Malik evolution and in fact to describe the behavior that
originally motivated our work. Specifically, as we noted in the discussion of Property
3.5 of the previous section, a SIDE in 1-D can be approximated with a Perona-Malik
equation of a small thickness K. Observe that a Perona-Malik equation of a large
thickness K will diffuse the edge before removing all the noise. Consequently, if the
objective is segmentation, the desire is to use as small a value of K as possible. Following
the procedure prescribed by Perona, Shiota, and Malik in [50], we computed the
histogram of the absolute values of the gradient throughout the initial signal, and fixed
K at 90% of its integral. The resulting evolution is shown in Figure 3.11. In addition to
its good denoising performance, it also blurs the edge, which is clearly undesirable if the
objective is a sharp segmentation. The comparison of Figures 3.10 and 3.11 strongly
suggests that the smaller K the better. It was precisely this observation that originally
Figure 3.13. Scale space of a SIDE for a noisy blurred 3-edge staircase: (a) noise-free original signal;
(b) its blurred version with additive noise; (c),(d) representatives of the resulting SIDE scale space.
motivated the development of SIDEs. However, while in 1-D a SIDE evolution can be
viewed precisely as a limit of a Perona-Malik evolution as K goes to 0, there is still
an advantage to using the form of the evolution that we have described rather than
a Perona-Malik evolution with a very small value of K. Specifically, the presence of
explicit reductions in dimensionality during the evolution makes a SIDE implementation
more efficient than that described in [50]. Even for this simple example the Perona-
Malik evolution that produced the result comparable to that in Figure 3.10 evolved
approximately 5 times more slowly than our SIDE evolution. (Both were implemented
via forward Euler discretization schemes [14] in MATLAB.) Although a SIDE in 2-D
cannot be viewed as a limit of Perona-Malik evolutions, the same comparison in speed
of evolution is still true, although in this case the difference in computation time can
be orders of magnitude.
In this example, the region merging method of Koepfler, Lopez, and Morel [36] works quite well (see Figure 3.12). We will soon see, however, that it is not as robust as SIDEs: its performance worsens dramatically when signals are corrupted with heavy-tailed noise.
Figure 3.14. A unit step signal corrupted by heavy-tailed noise (a typical sample path for ε = 0.1, σ2 = 2; plots not reproduced).
Figure 3.15. Scale spaces for the signal of Figure 3.14: SIDE (left) and Koepfler-Lopez-Morel (right).
Top: 33 regions; middle: 11 regions; bottom: 2 regions.
We now compare the robustness of our algorithm to Koepfler, Lopez, and Morel’s
[36] region merging minimization of the Mumford-Shah functional [44]. For that pur-
pose, we use Monte-Carlo simulations on a unit step signal corrupted by “heavy-tailed”
Figure 3.16. Mean absolute errors for Monte-Carlo runs. (Koepfler-Lopez-Morel: solid line; SIDE:
broken line.) The error bars are ±two standard deviations. (a) Different contamination probabilities
(0, 0.05, 0.1, and 0.15); contaminating standard deviation is fixed at 2. (b) Contamination probability
is fixed at 0.15; different contaminating standard deviations (1, 2, and 3).
noise which is, with high probability 1 − ε, normally distributed with σ1 = 0.1, and,
with low probability ε, normally distributed with a larger standard deviation σ2 . A
typical sample path, for ε = 0.1 and σ2 = 2, is shown in Figure 3.14. The SIDE and
Koepfler-Lopez-Morel scale spaces for this signal are illustrated in Figure 3.15. During
every Monte-Carlo trial, each algorithm was stopped when only two regions remained,
and the resulting jump location was taken as the output. When σ2 = 2, the mean ab-
solute errors in locating the jump for ε = 0, ε = 0.05, ε = 0.1, and ε = 0.15 are shown
in Figure 3.16(a) (the solid line is Koepfler-Lopez-Morel, the broken line is SIDE). The
error bars are ±two standard deviations. Figure 3.16(b) shows the mean absolute errors
for different standard deviations σ2 of the contaminating Gaussian, when ε is fixed at
0.15.
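A sketch of this contamination model (our paraphrase of the Monte-Carlo setup; names and defaults are illustrative):

    import numpy as np

    # Heavy-tailed noise: with probability 1 - eps a sample is N(0, sigma1^2),
    # with probability eps it is N(0, sigma2^2).
    def contaminated_noise(n, eps=0.1, sigma1=0.1, sigma2=2.0, rng=None):
        rng = rng or np.random.default_rng()
        outlier = rng.random(n) < eps
        scale = np.where(outlier, sigma2, sigma1)
        return scale * rng.standard_normal(n)

    signal = np.concatenate([np.zeros(100), np.ones(100)])  # unit step, edge at sample 100
    y = signal + contaminated_noise(200, eps=0.1, sigma2=2.0)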
As we anticipated in Chapter 2 and will further discuss in the next section, the
quadratic term of the Mumford-Shah energy makes it non-robust to heavy-tailed noise,
and the performance degrades considerably as the contamination probability and the
variance of the contaminating Gaussian increase. Note that when σ2 = 3 and ε = 0.15,
using the Koepfler-Lopez-Morel algorithm is not significantly better than guessing the
edge location as a random number between 1 and 200. At the same time, our method
is very robust, even if the outlier probability is as high as 0.15.
Figure 3.17 shows the scale space generated by a Perona-Malik equation for the
step signal with heavy-tailed noise depicted in Figure 3.14. As in Experiment 1, K was fixed at 90% of the integral of the gradient histogram, in accordance with Perona, Shiota, and Malik [50]. As before, its de-noising performance is good; however, it also introduces
blurring and therefore its output does not immediately provide a segmentation. In
order to get a good segmentation from this procedure, one needs to devise a stopping
rule, so as to stop the evolution at a scale when noise spikes are diffused but the step
Figure 3.17. Scale space of a Perona-Malik equation with large K for the signal of Figure 3.14.
is not completely diffused (such as in the second plot of Figure 3.14). In addition, one
needs to use an edge detector in order to extract the edge from the signal at that scale.
We again emphasize that neither SIDEs, nor the Koepfler-Lopez-Morel algorithm,
nor any combination of the Perona-Malik equation with a stopping rule and an edge
detector, are optimal for this simple 1-D problem, for which near-perfect results can be
achieved in a computationally efficient manner by very simple procedures. The purpose
of including this example is to provide statistical evidence for our claim of robustness of
SIDEs. This becomes very important for complicated 2-D problems, such as the ones
considered in the next example, where simple techniques no longer work.
(Panel titles: “2 regions left”; “SAR image: final segmentation”.)
Figure 3.18. Scale space of a SIDE for the SAR image of trees and grass, and the final boundary
superimposed on the initial image.
Figure 3.19. Segmentations of the SAR image via the region merging method of Koepfler, Lopez,
and Morel.
In 1-D, writing the SIDE force function as $F(v) = C\operatorname{sgn}(v) + F_{id}(v)$, where $F_{id}$ is a continuous function, the corresponding evolution is, formally,
$$ u_t = C\,\frac{\partial}{\partial x}\bigl[\operatorname{sgn}(u_x)\bigr] + F_{id}'(u_x)\,u_{xx}. \tag{3.27} $$
The first of the RHS terms is the 1-D version of the gradient descent on total variation. It has very good noise removal properties but, if used alone, will ultimately blur the signal. If $F_{id}(v) = -\frac{1}{2}v|v|$, then the second term is equal to the RHS of one of the shock filters introduced by Osher and Rudin in [46]—namely, Equation (2.16), which we considered in Chapter 2. Discretizations of certain shock filters are excellent for edge enhancement but, as we saw in Chapter 2, they cannot remove noise (and, in fact, some of them are unstable and noise-enhancing). Thus, SIDEs combine the noise-suppressive properties of the total variation approach with the edge-sharpening features of shock
filters. It should be noted, however, that (3.27) requires careful interpretation, because
its RHS contains the signum function of ux which itself may have both singularities
and segments over which it is identically zero. In addition, this strange object is dif-
ferentiated with respect to x. Thus, the interesting research issue arises of defining
what one means by a solution to (3.27), in such a manner that the definition results
in solutions relevant to the desired image processing applications. This complicated
problem is avoided entirely with the introduction of SIDEs, since there one starts with
a semi-discrete formulation, in which the issues of the existence and uniqueness of so-
lutions are well understood. The SIDEs are thus a logical extension of Bouman and
Sauer’s approach of [6, 58] in which images are discrete matrices of numbers, rather
than functions of two continuous variables. As described in Chapter 2, Bouman and
Sauer proposed minimizing an energy functional consisting of two terms, one of which
is the discrete counterpart of the total variation. Their method of quickly computing
a local minimum of this non-differentiable functional involved merging pixels and thus
anticipated SIDEs.
■ 3.7 Conclusion.
In this chapter we have presented a new approach to edge enhancement and segmenta-
tion, and demonstrated its successful application to signals and images with very high
levels of noise, as well as to blurry signals. Our approach is based on a new class of
evolution equations for the processing of imagery and signals which we have termed
stabilized inverse diffusion equations or SIDEs. These evolutions, which have discontin-
uous right-hand sides, have conceptual and mathematical links to other evolution-based
methods in signal and image processing, but they also have their own unique qualitative
characteristics and properties. The next chapter is devoted to extensive analysis of a
particular version of SIDEs.
Chapter 4
Probabilistic Analysis
■ 4.1 Introduction.
The recent years have seen a great number of exciting developments in the field of nonlinear diffusion filtering of images. As summarized in Chapter 2 and Section 3.6,
many theories have been proposed that result in edge-preserving scale spaces possessing
various interesting properties. One striking feature unifying many of these frameworks–
including the one introduced in the previous chapter–is that they are deterministic.
Usually, one starts with a set of “common-sense” principles which an image smoothing
operation should satisfy. Examples of these are the axioms in [1] and the observation
in [49] that, in order to achieve edge preservation, very little smoothing should be done
at points with high gradient. From these principles, a nonlinear scale space is derived,
and then it is analyzed–again, deterministically. Note, however, that since the objective
of these techniques is usually restoration or segmentation of images in the presence of
noise, a natural question to ask would be:
(*) Are these techniques optimal, in some precise probabilistic sense, for the restoration or segmentation problems of interest?
An affirmative answer would help us understand which technique is suited best for a particular application, and aid in designing new algorithms. It would also put the tools of the classical detection and estimation theory at our disposal for the analysis of these techniques, making it easier to tackle an even more crucial question:
(**) Given a probabilistic model of the images and the degradations, how does one design the best algorithm for a particular task?
Attempts to address these issues in the literature have remained scarce–most likely,
because the complex nature of the nonlinear partial differential equations (PDEs) con-
sidered and of the images of interest make this analysis prohibitively complicated. Most
notable exceptions are [63,75] which establish qualitative relations between the Perona-
Malik equation [49] and gradient descent procedures for estimating random fields mod-
eled by Gibbs distributions. Bayesian ideas are combined in [76] with snakes and region
growing for image segmentation. In [5], concepts from robust statistics are used to
Figure 4.1. Functions F from the right-hand side of the SIDE: (a) generic form; (b) the signum
function.
modify the Perona-Malik equation. In [38], a connection between random walks and
diffusions is used to obtain a new evolution equation.
The goal of this chapter is to move forward the discussion of questions (*) and (**).
We consider a very simple nonlinear diffusion (a variant of those introduced in the
previous chapter) which provides a multiscale sequence of segmentations of its initial
condition. In Sections 4.3 and 4.4, we apply our algorithm to 1-D signals, and describe
edge detection problems which are solved optimally by this diffusion. These are binary
classification problems: each sample has to be classified as coming from one of two
classes, subject to the constraint on the number of “edges”—i.e., changes between the
two classes. One of these problems turns out to be the minimization of a special case
of the 1-D Mumford-Shah functional [44]. We describe an efficient implementation of
the 1-D diffusion, requiring O(N log N ) computations in the worst case, where N is
the size of the input signal. In Section 4.5, we point out that the same 1-D problem
can also be solved via dynamic programming and via linear programming, but that
our method has certain advantages over both. To analyze the performance (Section
4.6), we simplify even further, by considering signals with only one change in mean.
Our performance measure is the accuracy in locating the change. More precisely, the
probability of events of the form “the detected change location is more than p samples
away from the actual one” is analyzed. We show that the asymptotic probabilities
of these events can be obtained directly from the classical change detection paper by
Hinkley [28]. We also derive non-asymptotic lower bounds on these probabilities. The
robustness of the algorithm—which is experimentally confirmed both in this chapter
and in Chapter 3—is analyzed theoretically by showing the optimality with respect to
a certain H∞ -like criterion. In Section 4.7, we treat segmentation of 2-D images.
following equation:
$$ \dot u_1 = \frac{\operatorname{sgn}(u_2 - u_1)}{m_1}, \qquad \dot u_N = \frac{\operatorname{sgn}(u_{N-1} - u_N)}{m_N}, $$
$$ \dot u_n = \frac{1}{m_n}\bigl(\operatorname{sgn}(u_{n+1} - u_n) - \operatorname{sgn}(u_n - u_{n-1})\bigr), \quad n = 2, \ldots, N-1, \tag{4.1} $$
$$ u(0) = u^0. \tag{4.2} $$
The best hypothesis among those whose number of edges does not exceed some constant ν is
$$ h^*_{\le\nu}(u^0) = \arg\max_{h \text{ with at most } \nu \text{ edges}} \phi(u^0, h). $$
Figure 4.2. Illustrations of Definitions 4.1 and 4.3: a sequence with three α-crossings, where α=3
(top); the hypothesis generated by the three α-crossings (middle); the hypothesis generated by the two
rightmost α-crossings (bottom).
Note that an hypothesis is uniquely defined by the set of its edges and the sign
of one of the edges. Therefore, binary classification problems can also be viewed as
edge detection problems. For the problems considered in this chapter, the optimal edge
locations will typically be level crossings of some signal.
Definition 4.3. A signal u is said to have an α-crossing at the location i if (ui − α)(uj − α) < 0,
where j = min{n: n > i, un 6= α}. (In other words, uj is the first sample to the right of
i which is not equal to α.) We call sgn(α − ui ) the sign of the α-crossing, and say
that the α-crossing is directed upward (downward) if ui < α (ui > α). We define the
hypothesis generated by a set of α-crossings {g1 , . . . , gν } of u as the hypothesis whose
edges are at g1 , . . . , gν and for which the sign of each edge is the same as the sign of
the corresponding α-crossing.
To illustrate Definitions 4.1, 4.2, and 4.3, let us consider an example.
Example 4.1. Illustration of the definitions of edges, α-crossings, and statistics.
Suppose
u = (1, 2, 2, 3, 3, 4, −1, 2, 5)T
(see top of Figure 4.2), and α = 3. Then u has three α-crossings, at locations g1 = 3,
g2 = 6, and g3 = 8. The second one is directed downward, and the other two are directed
upward. The hypothesis h1 generated by these three α-crossings must therefore have
upward edges at 3 and 8 and a downward edge at 6:
h1 = (0, 0, 0, 1, 1, 1, 0, 0, 1)T ,
as depicted in the middle plot of Figure 4.2. The hypothesis h2 generated by the
α-crossings g2 and g3 will only have a downward edge at 6 and an upward edge at 8:
h2 = (1, 1, 1, 1, 1, 1, 0, 0, 1)T ,
If we define the statistic
$$ \phi(u, h) = h^T(u - a), \quad\text{where } a = (3, \ldots, 3)^T \in \mathbb{R}^9, $$
then we have:
$$ \phi(u, h_1) = 3, \qquad \phi(u, h_2) = -1. $$
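These values are easy to verify numerically; a one-off check (ours):

    import numpy as np

    # Numerical check of Example 4.1: phi(u, h) = h^T (u - a) with alpha = 3.
    u = np.array([1, 2, 2, 3, 3, 4, -1, 2, 5], dtype=float)
    a = np.full(9, 3.0)
    h1 = np.array([0, 0, 0, 1, 1, 1, 0, 0, 1], dtype=float)
    h2 = np.array([1, 1, 1, 1, 1, 1, 0, 0, 1], dtype=float)
    print(h1 @ (u - a), h2 @ (u - a))   # 3.0 -1.0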
Proposition 4.1. Let the statistic be of the form $\phi(u^0, h) = h^T\bigl(u^0 - a\bigl(\frac{1}{N}\sum_{i=1}^N u^0_i\bigr)\bigr)$, where $a(x) = (\alpha(x), \ldots, \alpha(x))^T \in \mathbb{R}^N$. Then every edge of $h^*_{\le\nu}(u^0)$ occurs at an α-crossing of $u^0$, for any $u^0 \in \mathbb{R}^N$.
Proof is in Appendix B.
Proposition 4.2. Suppose that φ is the statistic described in Proposition 4.1. Fix the initial condition $u^0$ of the SIDE (4.1), and let $u(t)$ be the corresponding solution. Then $\alpha\bigl(\frac{1}{N}\sum_{i=1}^N u_i(t)\bigr)$ is constant during the evolution of the SIDE, as verified by summing up the equations (4.1). Let $\nu_\alpha(t)$ be the number of α-crossings of $u(t)$. Then, for any time instant $t_f > 0$,
$$ h^*_{\le\nu_\alpha(t_f)}(u^0) = h^*(u(t_f)). $$
The proof is in Appendix B. We note that Proposition 1 of [51] is a different
formulation of the same result: in [51], we explicitly listed the properties of φ which are
used in the proof. The equivalence of the two formulations is also shown in Appendix
B.
This proposition says that, if the SIDE is evolved until να (t) α-crossings remain,
then these α-crossings are the optimal edges, where “optimality” means maximizing
the statistic φ(u0 , h) subject to the constraint that the hypothesis have να (t) or fewer
edges. It is verified in the next subsection that να (t) is a non-increasing function of time,
with να (∞) = 0. Unfortunately, να (t) is not guaranteed to assume every integer value
between να (0) and 0: during the evolution of the SIDE, α-crossings can disappear in
pairs. We will show in the next subsection, however, that no more than two α-crossings
can disappear at the same time. We will also show that, even if for some integer
ν < να (0) there is no t such that να (t) = ν (i.e. να (t) goes from ν + 1 directly to ν − 1),
we can still easily find h∗≤ν (u0 ) using the set of α-crossings of the solution u(t) to the
SIDE at the time t when να (t) = ν + 1. If the desired number of edges is greater than
or equal to the initial number of α-crossings, ν ≥ να (0), then, from the definitions of
h∗≤ν (u0 ) and h∗ (u0 ), we immediately have:
Proposition 4.3. Suppose that φ is the statistic described in Proposition 4.1. If $\nu \ge \nu_\alpha(0)$, then
$$ h^*_{\le\nu}(u^0) = h^*(u^0). $$
In the remainder of the chapter, we assume that ν < να (0).
In Section 4.4, we will give examples of detection problems whose solution is equiv-
alent to maximizing the statistic φ. We will therefore be able to utilize the SIDE for
optimally solving these problems. Before we do that, however, we describe how to
efficiently implement the SIDE.
Recall that the statistic under consideration is
$$ \phi(u^0, h) = h^T\!\left(u^0 - a\!\left(\frac{1}{N}\sum_{i=1}^N u^0_i\right)\right), $$
where $a(x) = (\alpha(x), \ldots, \alpha(x))^T \in \mathbb{R}^N$, and α is a real-valued function of a real argument. Given an integer ν and a signal $u^0$, we are interested in finding the best hypothesis $h^*_{\le\nu}(u^0)$ among all the hypotheses with ν or fewer edges, where “the best” means the one maximizing $\phi(u^0, \cdot)$.
Proposition 4.2 relates $h^*_{\le\nu}(u^0)$ to the solution $u(t)$ of the SIDE whose initial data is $u^0$. Namely, it says that if ν is the number of α-crossings of $u(t)$,¹ then these α-crossings generate the hypothesis $h^*_{\le\nu}(u^0)$. It is, however, not guaranteed that for every integer ν there is a time instant $t$ when the solution $u(t)$ has exactly ν α-crossings. Therefore, in order to compute the solution to Problem 4.1, we need to deal with two issues:
(A) how to compute the α-crossings of the solution $u(t)$;
(B) how to find $h^*_{\le\nu}(u^0)$ for every integer ν, even when $u(t)$ never has exactly ν α-crossings.
We first consider issue A. In order to find the α-crossings of the solution to the SIDE, one
can certainly use a finite difference scheme to numerically integrate the equation. There
is, however, a much faster way, which exploits the special structure of the equation. It
turns out that, during the evolution of the SIDE, α-crossings cannot be created or
shifted: they can only be erased. We therefore only need to compute the order in which
they disappear. We now make these statements precise.
Lemma 4.1. Suppose that at time t0 , the solution u(t0 ) to the SIDE has no α-crossing
at the location i. Then u(t) has no α-crossing at i, either, for all t ≥ t0 .
We illustrate this Lemma by evolving the SIDE (4.1) for the initial condition $u^0 = (1, 2, 2, 3, 3, 4, -1, 2, 5)^T$ of Example 4.1 (top of Figure 4.2) and α = 3. The values of the solution at several time instants are recorded in Table 1.
¹ Just as in the previous subsection, we abuse notation by dropping the argument $\frac{1}{N}\sum_{i=1}^{N} u_i(t)$ of α.
Table 1. The solution of the SIDE at several time instants.
t = 1 2/3:  (2 2/9, 2 2/9, 2 2/9, 2 2/9, 2 2/9, 2 2/9, 2 1/6, 2 1/6, 3 1/3)
t = 2:      (2 1/4, 2 1/4, 2 1/4, 2 1/4, 2 1/4, 2 1/4, 2 1/4, 2 1/4, 3)
t = 2 2/3:  (2 1/3, 2 1/3, 2 1/3, 2 1/3, 2 1/3, 2 1/3, 2 1/3, 2 1/3, 2 1/3)
Lemma 4.2. Let $(i, j)$ be a region² of $u(t_0)$, where $t_0 \ge 0$, and let the intensity values inside this region be above α: $u_i(t_0) > \alpha, \ldots, u_j(t_0) > \alpha$. Let $t_1$ be the first time instant after $t_0$ at which one of the values inside the region, say $u_k(t_1)$, becomes equal to α. Then
$$ u_i(t_1) = u_{i+1}(t_1) = \cdots = u_j(t_1) = \alpha. $$
Proof. Notice that, according to Equations (4.1), (4.3), $u_k(t)$ can be decreasing only if it is a local maximum (i.e., if $u_k(t) \ge u_{k\pm1}(t)$). Thus, at time $t_1$ we must have $u_{k\pm1}(t_1) \le u_k(t_1) = \alpha$; on the other hand, since $t_1$ is the first time any value inside the region reaches α, no value inside the region can be below α at $t_1$. Consequently, it must be that the values at all the samples inside the region $(i, j)$ are equal to α.
Proof of Lemma 4.1. A proof similar to the one above applies to the variant of Lemma 4.2 in which $u_i(t_0), \ldots, u_j(t_0)$ are less than α. Thus, the value $u_k(t)$ at any location $k$ can cross the level α only when its whole region does so. This means that the evolution of the SIDE cannot create or shift α-crossings; it can only remove them.
² Note that this definition is somewhat different from the one in Chapter 3, where all pixels in a region had the same value.
We now show how to calculate the order in which the regions disappear. It turns
out that this ordering depends on how the removal of a region influences the statistic
φ(u(t), h). Define the energy Eij of the region (i, j) by
¯ j ¯
1 ¯¯X ¯
¯
Eij (t) = ¯ (un (t) − α)¯ , (4.5)
ρij ¯ ¯
n=i
½
1 if i = 1 or j = N
ρij =
2 otherwise.
Note that the energy measures the contribution of the region $(i, j)$ to $\phi(u(t), h)$. Summing up the equations (4.1) from $n = i$ to $n = j$, we see that, for every region $(i, j)$, $\dot E_{ij}(t) = -1$. A region $(i, j)$ is erased when the values of all its samples reach α—i.e., when the energy $E_{ij}(t)$ becomes equal to zero. Since all the energies are diminished at the same speed, it follows that the first region to disappear will be the one for which $E_{ij}(0)$ is the smallest. Applying this reasoning recursively, we then remove the region with the next smallest energy, etc., obtaining the following algorithm to compute the α-crossings of $u(t)$.
1. Initialize. Let A be the set of all α-crossings of u0 , ordered from left to right, and
let ν̄ = να (0) be the total number of α-crossings.
2. Compute the energies. Denote the elements of the set A by g1 , . . . , gν̄ , and form
ν̄ + 1 regions: (1, g1 ), (g1 + 1, g2 ), . . . , (gν̄ + 1, N ). For each region (i, j), compute its
energy Eij , as defined by (4.5).
3. Remove the region with minimal energy. Let $(i_m, j_m)$ be the region for which $E_{ij}$ is the smallest (if there are several regions with the smallest energy, choose any one). Merge the region $(i_m, j_m)$ with its neighbors: remove from A the α-crossings which bound $(i_m, j_m)$, decrement ν̄ accordingly (by two if $(i_m, j_m)$ is an interior region, by one otherwise), compute the energy of the merged region, and repeat this step until the desired number of α-crossings remains.
Iteration 1. There are three α-crossings, and four regions: (1, 3), (4, 6), (7, 8), and
(9, 9), with the energies E13 = 4, E46 = 0.5, E78 = 2.5, and E99 = 2, respectively. The
region (4, 6) has the smallest energy, and therefore it is removed first, by merging it
with its two neighbors to form the new region (1, 8).
Iteration 2. There are now two regions, (1, 8) and (9, 9), with the energies E1,8 = 8
and E99 = 2, respectively. They are merged, to form one region (1, 9). Note that the
order in which the regions disappear is in agreement with Table 1.
We now show that the algorithm is fast. The initialization steps 1 and 2 take O(N) time. Step 3 merges either two regions (if $i_m = 1$ or $j_m = N$) or three regions (otherwise). The energy of the new region is essentially the sum of the energies of its constituent regions, and therefore the recomputation of energies after a merging takes O(1) time. If a binary heap [11] is used to store the energies, the size of the heap at every iteration of the algorithm will be equal to the current number of distinct regions, which is ν̄ + 1, where ν̄ is the current number of α-crossings. Therefore, finding the minimal element of the heap (step 3) at each iteration will take $O(\log(\bar\nu + 1))$ time, which means that the algorithm will run in $O\bigl(\sum_{\bar\nu=\nu+1}^{\nu_\alpha(0)}\log\bar\nu + N\bigr)$ time. The worst case is when ν = 1 and $\nu_\alpha(0) = N - 1$; then the computational complexity is $O(N\log N)$. However, if the desired number of edges ν is comparable with the initial number $\nu_\alpha(0)$ (which can happen in low-noise scenarios), the complexity is O(N).
We still have to address Question (B) which we posed at the beginning of this
subsection, namely, how to find h∗≤ν (u0 ) for every integer ν. If there is a time instant
t at which u(t) has exactly ν α-crossings, then, according to Proposition 4.2, these
α-crossings generate the hypothesis h∗≤ν (u0 ), which means that Problem 4.1 is solved.
The scenario which we need to consider now is when there is no such time t at which
u(t) has exactly ν α-crossings. As we showed above, computing the locations of α-
crossings of u(t) involves removing regions one at a time (Step 3 of the algorithm).
Thus, at most two α-crossings can disappear at the same time. Therefore, if u(t) never
has ν α-crossings, it must go from ν + 1 α-crossings directly to ν − 1. It turns out that,
in this case, one can still compute h∗≤ν (u0 ) by running the algorithm above until ν − 1
α-crossings remain, and then doing post-processing whose computational complexity is
O(N ). Specifically, the following proposition holds.
Proposition 4.4. Suppose that there is a time instant t during the evolution of the
SIDE such that u(t− ) has ν + 1 α-crossings, at locations g1 , . . . , gν+1 . The hypoth-
esis generated by these α-crossings is h∗≤ν+1 (u0 ). Suppose further that the region
(gk + 1, gk+1 ) disappears at time t, so that u(t+ ) has ν − 1 α-crossings, at locations
g1 , . . . , gk−1 , gk+2 , . . . , gν+1 . The hypothesis generated by these α-crossings is h∗≤ν−1 (u0 ).
Then one of the following four possibilities must happen.
(i) $h^*_{\le\nu}(u^0) = h^*_{\le\nu-1}(u^0)$.
(ii) $h^*_{\le\nu}(u^0)$ is generated by the α-crossings $g_1, \ldots, g_\nu$.
(iii) $h^*_{\le\nu}(u^0)$ is generated by the α-crossings $g_2, \ldots, g_{\nu+1}$.
(iv) $h^*_{\le\nu}(u^0)$ has edges at the locations $g_1, \ldots, g_{k-1}, g_{k+2}, \ldots, g_{\nu+1}$, as well as one edge at a location which is an element of the set $\{1, 2, \ldots, g_1 - 1, g_{\nu+1} + 1, \ldots, N-1\}$.
Thus, finding h∗≤ν (u0 ) is achieved by running the SIDE and doing post-processing of
complexity O(N ).
Proposition 4.4 is the recipe for obtaining the optimal hypothesis h∗≤ν (u0 ) from
the ν + 1 α-crossings of u(t− ). It says that either ν or ν − 1 of these α-crossings
coincide with the edges of h∗≤ν (u0 ). Cases (ii) and (iii) describe the only two subsets
consisting of ν α-crossings which can generate h∗≤ν (u0 ). If only ν − 1 α-crossings of
u(t− ) coincide with the edges of h∗≤ν (u0 ), then either there are no other edges (Case
(i)), or the remaining edge is easily found in linear time (Case (iv)).
We note that a slight correction to what was reported in [51] is in order: although
the statement that the complexity of the post-processing is O(N ) is correct, the specific
post-processing procedure given there is somewhat different from the one outlined in
Proposition 4.4 above, and therefore, it may be incorrect for some data sequences.
As one can infer from the statement of this proposition, its proof is rather technical
and amounts to analyzing various scenarios of the disappearance of the α-crossings.
This proposition is a direct corollary of Proposition 4.1 and the following lemma.
Lemma 4.3. As in Proposition 4.4, let t be the time instant when the solution of the
SIDE goes from ν + 1 α-crossings to ν − 1. Let h = h∗≤ν (u0 ), and suppose that it is
generated by ν α-crossings of u0 : g1 , . . . , gν . (Note that this notation is different from
the notation of Proposition 4.4.) Then at least ν − 1 elements of the set {g1 , . . . , gν }
are also α-crossings of u(t− ), with possible exception of either g1 or gν . Furthermore,
if exactly ν − 1 elements of the set {g1 , . . . , gν } are α-crossings of u(t− ), they are also
α-crossings of $u(t^+)$.
Proof is in Appendix B.
Figure 4.3. From top down: the true segmentation with ten edges; a corresponding observation y; the edges detected by the SIDE (Example 4.4); the edges detected in Example 4.5. (Plots not reproduced.)
where the hypothesis h is such that the sample $y_i$ is hypothesized to be from the pdf $f(y, \theta_{h_i})$. Note that, by defining a signal consisting of the pointwise log-likelihood ratios,
$$ u^0_i = \log\frac{f(y_i, \theta_1)}{f(y_i, \theta_0)}, \quad i = 1, \ldots, N, \tag{4.6} $$
we can write the log-likelihood of the data under the hypothesis h as
$$ h^T u^0 + \sum_{i=1}^{N}\log f(y_i, \theta_0). $$
The second term is independent of h, and therefore maximizing this function is equivalent to maximizing
$$ \phi(u^0, h) \stackrel{\text{def}}{=} h^T u^0, \tag{4.7} $$
which is the statistic of Proposition 4.2 with α = 0. Thus, the SIDE can be employed
for finding the maximum likelihood hypothesis h∗≤ν (u0 ), where u0 is related to the
observation y through (4.6).
Example 4.4. In this example, $f(y, \theta_j)$ is the Gaussian density with mean $\theta_j$ and variance 1. We
took θ0 = 0 and θ1 = 1. We assumed that the right number of edges, 10, is known,
and so the stopping rule for the SIDE was να (t) ≤ 10. (In Subsection 4.4.3, we will
treat the situation when the number of edges is a random variable, rather than a known
parameter.)
The pointwise log-likelihoods (4.6) in this case are
$$ u^0_i = y_i - \frac{1}{2}(\theta_1 + \theta_0). \tag{4.8} $$
(Note that, if $u(t)$ is the solution to the SIDE with the initial condition $u^0$ of (4.8), and $u'(t)$ is the solution to the SIDE with the initial condition $u'(0) = y$, then $u'(t) = u(t) + \alpha_0$, where
$$ \alpha_0 = \frac{1}{2}(\theta_1 + \theta_0), \tag{4.9} $$
and therefore the zero-crossings of $u(t)$ coincide with the $\alpha_0$-crossings of $u'(t)$. Consequently, we can simply evolve the SIDE with the data y as the initial condition, and look at its $\alpha_0$-crossings.)
Figure 4.3, from top down, depicts the true segmentation with ten edges, a corre-
sponding observation y, and the edges detected by the SIDE (the bottom plot will be
explained in the next subsection). Note that the result is extremely accurate, despite
the fact that the data is very noisy. The computations took 0.25 seconds on a Sparc
Ultra 1, thanks to the fast implementation described in Subsection 4.3.1.
Example 4.5. Suppose again that the observations are independent given h, with the i-th random variable $Y_i$ having conditional pdf $f(y, \theta_{h_i})$. Let ν be an upper bound on the number of edges in h. Let K be the number of zeros in h, and define $\sigma_1 = \frac{\sigma}{\theta_1 - \theta_0}\sqrt{N}$. Let the prior knowledge be as follows:
$\theta_0$ and h are unknown;
σ, $\sigma_1$, and ν are known;
K is a random variable with the following discrete Gaussian probability mass function:
$$ \Pr(K = k) = C\exp\left(-\frac{1}{2}\left(\frac{k - \frac{N}{2}}{\sigma_1}\right)^2\right), \quad k = 1, \ldots, N-1, \tag{4.10} $$
where $f_1$ is the conditional pdf of Y. After simplifying this formula, we obtain that ĥ must maximize
$$ \phi(y, h) \stackrel{\text{def}}{=} h^T y - \frac{N-k}{N}\sum_{i=1}^{N} y_i = h^T\!\left(y - a\!\left(\frac{1}{N}\sum_{i=1}^{N} y_i\right)\right), $$
where $\alpha(x) = x$, and, as in Proposition 4.2, $a(x) = (\alpha(x), \ldots, \alpha(x))^T \in \mathbb{R}^N$. Thus,
according to Proposition 4.2, in order to find ĥ, one has to evolve the SIDE whose
initial condition is the observed signal: u0 = y. The α-crossings of the solution u(t)
will then coincide with the optimal edges, where
$$ \alpha = \frac{1}{N}\sum_{i=1}^{N} u_i(t). \tag{4.11} $$
Thus, the only difference from Example 4.4 is that the threshold $\alpha_0$ (4.9) is unknown, since $\frac{1}{2}(\theta_1 + \theta_0)$ is unknown. The threshold α (4.11) can be considered as an estimate of $\alpha_0$. If the underlying signal has roughly as many samples with mean $\theta_0$ as ones with mean $\theta_1$, then α is a good estimate of $\alpha_0$, and we expect the estimates of the edge locations to be comparable to those in Example 4.4—i.e., despite less knowledge, the optimal estimates of the edge locations in this example would be similar to the optimal estimates of Example 4.4. This is confirmed by the experimental result for the data of Example 4.4, shown in the bottom plot of Figure 4.3, which is still very good and differs from the result of Example 4.4 in only two pixels out of the thousand. If the number of samples with mean $\theta_0$ greatly differs from $\frac{N}{2}$, we would expect α to be a poor estimate of $\alpha_0$, which will lead to larger errors in the optimal estimates of edge locations. This situation, however, has low probability according to our model (4.10).
We saw in Subsection 4.4.1 that the h-dependent part of the likelihood term is $h^T u^0$, where $u^0$ is the sequence of log-likelihood ratios. Therefore, maximizing (4.12) is equivalent to minimizing the following statistic:
$$ \psi(u^0, h) \stackrel{\text{def}}{=} -h^T u^0 - \log p_\nu(\bar\nu), \tag{4.13} $$
where ν̄ is the number of edges in h. For $\bar\nu = 0, 1, \ldots, N-1$, let
$$ h^*_{\le\bar\nu}(u^0) \tag{4.14} $$
be the hypothesis which achieves the maximal $h^T u^0$ among all the hypotheses with ν̄ or fewer edges. Suppose we could show that $h_\psi$ is actually one of the hypotheses (4.14).
Then we could compute hψ as follows: run the SIDE to compute the N hypotheses
(4.14), compute ψ(u0 , ·) for each of them, and pick the hypothesis which results in the
smallest ψ(u0 , ·). To complete the proof, we now show that hψ is indeed one of the
hypotheses (4.14).
Let us fix an arbitrary hypothesis h̄ with ν̄ edges, and let $\nu^* \le \bar\nu$ be the number of edges in the hypothesis $h^*_{\le\bar\nu}(u^0)$. Then, by the definition of $h^*_{\le\bar\nu}(u^0)$, we have:
$$ \bigl\{h^*_{\le\bar\nu}(u^0)\bigr\}^T u^0 \ge \bar h^T u^0. \tag{4.15} $$
Moreover, since $\nu^* \le \bar\nu$ and $p_\nu$ is non-increasing, $-\log p_\nu(\nu^*) \le -\log p_\nu(\bar\nu)$; combined with (4.15), this gives $\psi\bigl(u^0, h^*_{\le\bar\nu}(u^0)\bigr) \le \psi(u^0, \bar h)$. In other words, for an arbitrary hypothesis h̄, we found an hypothesis from among (4.14) which results in a smaller (or equal) $\psi(u^0, \cdot)$. Therefore, the optimal hypothesis $h_\psi$ is among (4.14).
The second main result of this subsection is that having the exponential distribution
pν is equivalent to specifying a stopping rule for the SIDE.
Proposition 4.6. Let
$$ p_\nu(\bar\nu) = \frac{e^{-\lambda} - 1}{e^{-\lambda N} - 1}\,e^{-\lambda\bar\nu}, \quad \text{for } \bar\nu = 0, 1, \ldots, N-1, \tag{4.17} $$
and let $h_\psi$ be the hypothesis which minimizes $\psi(u^0, \cdot)$ (4.13). Then the algorithm of Subsection 4.3.1, with a modified stopping rule, will produce $h_\psi$. The new stopping criterion is: stop as soon as the energy of the next region to be removed exceeds λ.
Suppose that the solution to the SIDE has ν̄ + 1 zero-crossings at some time instant t, and call the hypothesis generated by these zero-crossings $h_1$. Let the next region to be removed be $(i^*, j^*)$, and call the hypothesis resulting from its removal $h_2$. Let $E^*(t)$ denote the energy of the region $(i^*, j^*)$.
In order to determine which hypothesis is better with respect to η(h), we will look at $\eta(h_2) - \eta(h_1)$. First note that
$$ (h_2 - h_1)^T u^0 = -\left|\sum_{n=i^*}^{j^*} u^0_n\right| = -\rho_{i^*j^*}E^*(t), $$
and therefore
$$ \eta(h_2) - \eta(h_1) = \rho_{i^*j^*}\bigl(E^*(t) - \lambda\bigr). $$
So, if $E^*(t) < \lambda$, removing the region decreases η, and otherwise it does not.
Recall the Mumford-Shah functional [44],
$$ \frac{1}{2}\int (u - y)^2\,dx + \gamma\int_{\setminus\Gamma}\|\nabla u\|^2\,dx + \lambda\bar\nu, \tag{4.20} $$
where Γ are the edges, i.e., the set on which u is discontinuous (the second integral is taken outside of Γ); ν̄ is the total length of the edges; and γ and λ are constants which control the smoothness of u within regions and the total length of the edges, respectively. If an approximation u is sought which is constant within each region [36, 43], the second term disappears. In 1-D, the integration is over $\mathbb{R}^1$, and ν̄ is simply the number of the discontinuities in u. Assuming that we seek a piecewise-constant approximation, we discretize the 1-D version of (4.20):
$$ \frac{1}{2}(u - y)^T(u - y) + \lambda\bar\nu. \tag{4.21} $$
If one is looking for a binary approximation $u = h \in \{0,1\}^N$, then $h^T h = \sum_{i=1}^N h_i$, and so if we define
$$ u^0_i = y_i - \frac{1}{2}, \tag{4.22} $$
then minimizing (4.21) is equivalent to minimizing η(h) (4.18). We note that (4.22) defines the log-likelihood ratios for the situation when $p(y_i|h)$ is the Gaussian density with unit variance and mean $h_i$. Indeed, in this case
$$ \log p(y_i|h_i = 1) - \log p(y_i|h_i = 0) = -\frac{1}{2}(y_i - 1)^2 + \frac{1}{2}y_i^2 = y_i - \frac{1}{2}. $$
We have just shown the following.
Proposition 4.7. If
$$ p_\nu(\bar\nu) = \frac{e^{-\lambda} - 1}{e^{-\lambda N} - 1}\,e^{-\lambda\bar\nu}, \quad \text{for } \bar\nu = 0, 1, \ldots, N-1, \quad\text{and}\quad p(y_i|h) = \frac{1}{\sqrt{2\pi}}\,e^{-\frac{1}{2}(y_i - h_i)^2}, \quad \text{for } i = 1, \ldots, N, $$
then the generalized likelihood function is (4.18), which is
a) a special case of the Mumford-Shah functional for 1-D signals, and
b) according to Proposition 4.6, optimized by the SIDE.
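As a sanity check (ours, not from the thesis), one can verify this equivalence by brute force on a tiny signal, using the form $\eta(h) = -h^T u^0 + \lambda\bar\nu$ of (4.18) discussed above; all parameter values below are illustrative.

    import itertools
    import numpy as np

    # Brute-force check that minimizing the discretized functional (4.21) over
    # binary h picks the same hypothesis as minimizing -h^T u0 + lambda * (#edges),
    # with u0_i = y_i - 1/2, on a small random signal.
    rng = np.random.default_rng(1)
    N, lam = 8, 0.3
    y = rng.standard_normal(N) + (np.arange(N) >= 5)
    u0 = y - 0.5
    def edges(h):
        return int(np.sum(np.abs(np.diff(h))))
    best_ms = min((0.5 * np.sum((np.array(h) - y) ** 2) + lam * edges(h), h)
                  for h in itertools.product([0, 1], repeat=N))
    best_eta = min((-np.array(h) @ u0 + lam * edges(h), h)
                   for h in itertools.product([0, 1], repeat=N))
    print(best_ms[1] == best_eta[1])    # True: the two criteria agree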
where we replaced the requirement that h be binary with a seemingly less restrictive
condition that h belong to the unit hypercube [0, 1]N . Constraints (4.25) and (4.26)
mean that ri ≥ |hi+1 − hi |. On the other hand, (4.24) means that ri must be as small
as possible, and therefore ri = |hi+1 − hi |. The fact that the constraints (4.27) are
equivalent to h ∈ {0, 1}N is a little less obvious; it is verified in Appendix B (Section
B.5). We point out that any generic linear programming algorithm will lose out in speed
to the SIDE, because the SIDE exploits the special structure of the problem (4.23).
■ 4.6 Performance Analysis.
To maximize the statistic
$$ \phi(u^0, h) = h^T u^0 - \frac{N-k}{N}\sum_{i=1}^{N} u^0_i, $$
we evolve the SIDE (4.1) until exactly one α-crossing remains, where $\alpha = \frac{1}{N}\sum_{i=1}^{N} u^0_i$.
We denote the correct hypothesis $h^c$ and the correct location of the edge $g^c$. Without loss of generality, we assume that the first $g^c$ samples of $u^0$ have mean $\theta_0$, the last $N - g^c$ samples have mean $\theta_1$, and that $d \stackrel{\text{def}}{=} \theta_1 - \theta_0 > 0$. We denote the detected edge location $g^*$, its sign $z^*$, the corresponding hypothesis $h^*$, and the number of zeros in it $k^*$: if $z^* = 1$, then $k^* = g^*$; if $z^* = -1$, then $k^* = N - g^*$.
Pick two integers, $p_0$ and $q_0$, satisfying $1 \le p_0 \le g^c \le q_0 \le N-1$, so that the location $g^c$ of the true edge is between $p_0$ and $q_0$. The goal of this section is to compute a lower bound for the probability of the event
$$ \{p_0 \le g^* \le q_0,\ z^* = 1\}, \tag{4.28} $$
which says that the detected edge location $g^*$ is between $p_0$ and $q_0$, and that the detected sign of the edge is correct. The strategy will be to find a lower bound for the probability of a simpler event which implies (4.28). Specifically, suppose that $g^* > g^c$ and $z^* = 1$. Then
$$ \phi(u^0, h^*) - \phi(u^0, h^c) = \sum_{i=g^c+1}^{g^*}(-u^0_i + \alpha). $$
Since $h^*$ is the optimal hypothesis, the above expression has to be positive. Thus, if
$$ \sum_{i=g^c+1}^{q}(-u^0_i + \alpha) < 0 \quad \text{for } q = q_0+1, \ldots, N, \tag{4.29} $$
then $g^* \le q_0$ whenever $z^* = 1$; similarly, if
$$ \sum_{i=p+1}^{g^c}(u^0_i - \alpha) < 0 \quad \text{for } p = 1, \ldots, p_0-1, \tag{4.30} $$
then $g^* \ge p_0$ whenever $z^* = 1$. If, in addition,
$$ \sum_{i=g^c+1}^{N}(-u^0_i + \alpha) + \sum_{i=q}^{N}(-u^0_i + \alpha) < 0 \quad \text{for } q = g^c+1, \ldots, N, \quad\text{and} \tag{4.31} $$
$$ \sum_{i=1}^{g^c}(u^0_i - \alpha) + \sum_{i=1}^{p}(u^0_i - \alpha) < 0 \quad \text{for } p = 1, \ldots, g^c, \tag{4.32} $$
then the detected sign is correct, i.e. z ∗ = 1. Thus, the simultaneous occurrence
of the events (4.29)-(4.32) implies (4.28). If α were not random, the expressions in
(4.29)-(4.32) would be sums of independent identically distributed random variables,
and therefore we would be able to employ results from the theory of random walks.
We will remove the randomness of α from (4.29)-(4.32) by introducing a non-random bound on how far α can be from its mean
$$ m \stackrel{\text{def}}{=} \frac{1}{N}\bigl(g^c\theta_0 + (N - g^c)\theta_1\bigr). $$
In other words, suppose that there are two positive real numbers, $\delta_1$ and $\delta_2$, such that
$$ m - \delta_1 \le \alpha \le m + \delta_2. \tag{4.33} $$
Then
$$ \sum_{i=g^c+1}^{q}(-u^0_i + m + \delta_2) < 0 \quad \text{for } q = q_0+1, \ldots, N \tag{4.34} $$
implies the corresponding inequality in (4.29). Let us call $A_q$ the event that the q-th inequality in (4.34) holds, for $q = q_0+1, \ldots, N$. We shall similarly bound the events (4.30), by defining events $A_p$ whose intersection implies (4.30):
$$ A_p = \left\{\sum_{i=p+1}^{g^c}\bigl(u^0_i - (m - \delta_1)\bigr) < 0\right\}, \quad \text{for } p = 1, \ldots, p_0-1. \tag{4.35} $$
We shall call $A'_q$ and $A'_p$ the events which imply (4.31) and (4.32), respectively:
$$ A'_q = \left\{\left(\sum_{i=g^c+1}^{N} + \sum_{i=q}^{N}\right)(-u^0_i + m + \delta_2) < 0\right\}, \quad \text{for } q = g^c+1, \ldots, N; \tag{4.36} $$
$$ A'_p = \left\{\left(\sum_{i=1}^{g^c} + \sum_{i=1}^{p}\right)\bigl(u^0_i - (m - \delta_1)\bigr) < 0\right\}, \quad \text{for } p = 1, \ldots, g^c. \tag{4.37} $$
Let $\varepsilon_1$ be the union (upper) bound for the probability of $\bigcup_{p=1}^{g^c}\overline{A'_p}$, where the overbar denotes the complement:
$$ \varepsilon_1 = \sum_{p=1}^{g^c}\Pr\bigl(\overline{A'_p}\bigr). \tag{4.38} $$
Suppose further that $p_1$ is a lower bound for the probability of the intersection of the events $A_p$:
$$ p_1 \le \Pr\left(\bigcap_{p=1}^{p_0-1} A_p\right). \tag{4.39} $$
Then
$$ \Pr\left(\bigcap_{p=1}^{p_0-1}A_p \cap \bigcap_{p=1}^{g^c}A'_p\right) = \Pr\left(\bigcap_{p=1}^{p_0-1}A_p \cap \overline{\bigcup_{p=1}^{g^c}\overline{A'_p}}\right) \tag{4.40} $$
$$ = \Pr\left(\bigcap_{p=1}^{p_0-1}A_p\right) - \Pr\left(\bigcap_{p=1}^{p_0-1}A_p \cap \bigcup_{p=1}^{g^c}\overline{A'_p}\right) \tag{4.41} $$
$$ \ge p_1 - \Pr\left(\bigcup_{p=1}^{g^c}\overline{A'_p}\right) \tag{4.42} $$
$$ \ge p_1 - \varepsilon_1, \tag{4.43} $$
where we used the identity $\bigcap A'_p = \overline{\bigcup\overline{A'_p}}$ in (4.40), the identity $\Pr(A \cap \overline B) = \Pr(A) - \Pr(A \cap B)$ in (4.41), and the inequality $-\Pr(A \cap B) \ge -\Pr(B)$ in (4.42). Similarly, the
probability of the intersection of the events (4.34) and (4.36) is bounded from below by $p_2 - \varepsilon_2$, where
$$ \varepsilon_2 = \sum_{q=g^c+1}^{N}\Pr\bigl(\overline{A'_q}\bigr), \qquad p_2 \le \Pr\left(\bigcap_{q=q_0+1}^{N}A_q\right). $$
Let ε be an upper bound on the probability that (4.33) fails. Since the two groups of events involve disjoint sets of noise samples, the probability that all of them occur simultaneously is bounded from below by
$$ (p_1 - \varepsilon_1)(p_2 - \varepsilon_2) - \varepsilon. \tag{4.44} $$
We showed earlier in this section that the intersection of these events implies the inter-
section of the events (4.29)-(4.32), which, in turn, implies the event (4.28). Thus, the
above expression (4.44) is a lower bound for the probability of the event (4.28).
In [28], asymptotic probabilities of the events (4.28) are computed, for N → ∞ and
g c → ∞. When α is non-random (as in, e.g., our Examples 4.4 and 4.5 of Subsection
4.4.1), these asymptotic probabilities are also (non-asymptotic) lower bounds. In the
process of computing these, lower bounds p1 (4.39) and p2 are also computed in [28];
these are asymptotically tight. In the next subsection, we describe a different method
for computing p1 and p2 for the Gaussian case (i.e. when the model of Subsection
4.4.2 applies). Our method produces looser bounds than [28]; however, the derivation
is conceptually much simpler and leads to easier computations.
Proposition 4.8. Suppose that
$$ u^0 = m + w, $$
where m is the vector of means (its first $g^c$ entries equal $\theta_0$ and the remaining $N - g^c$ entries equal $\theta_1$) and w is zero-mean white Gaussian noise with standard deviation σ. Then
$$ \Pr(p_0 \le g^* \le q_0 \text{ and } z^* = 1) \ \ge $$
$$ \left[\Phi\!\left(\frac{d_1\sqrt{g^c - p_0 + 1}}{\sigma}\right) - \Phi\!\left(-\frac{d_1\sqrt{g^c - p_0 + 1}}{\sigma}\right) - \sum_{p=1}^{g^c}\Phi\!\left(-\frac{d_1(p + g^c)}{\sigma\sqrt{3p + g^c}}\right)\right] \times $$
$$ \times\left[\Phi\!\left(\frac{d_2\sqrt{q_0 - g^c + 1}}{\sigma}\right) - \Phi\!\left(-\frac{d_2\sqrt{q_0 - g^c + 1}}{\sigma}\right) - \sum_{q=g^c+1}^{N}\Phi\!\left(-\frac{d_2(2N - q - g^c)}{\sigma\sqrt{4N - 3q - g^c}}\right)\right] - $$
$$ -\ \Phi\!\left(-\frac{\delta_2\sqrt N}{\sigma}\right) - \Phi\!\left(-\frac{\delta_1\sqrt N}{\sigma}\right), \tag{4.45} $$
where Φ is the standard Gaussian cumulative distribution function, $d_1 = \frac{(N-g^c)d}{N} - \delta_1$, $d_2 = \frac{g^c d}{N} - \delta_2$, and
• $\delta_1$ and $\delta_2$ are any positive real numbers such that $d_1 > 0$ and $d_2 > 0$.
Proof. The terms of this bound come from calculating the parameters $p_1$, $p_2$, $\varepsilon_1$, $\varepsilon_2$, and ε of the expression (4.44). We now present this calculation.
It follows from our noise model that $\alpha = \frac{1}{N}\sum_{i=1}^{N}u^0_i$ is Gaussian with mean $\frac{1}{N}\bigl(g^c\theta_0 + (N - g^c)\theta_1\bigr)$ and variance $\frac{\sigma^2}{N}$. Thus, the probability ε that (4.33) does not hold is
$$ \varepsilon = 1 - \int_{-\delta_1}^{\delta_2}\mathcal{N}\!\left(0, \frac{\sigma^2}{N}\right) = \Phi\!\left(-\frac{\delta_1\sqrt N}{\sigma}\right) + \Phi\!\left(-\frac{\delta_2\sqrt N}{\sigma}\right), \tag{4.46} $$
The complement $\overline{A'_p}$ of the event (4.37) occurs when
$$ 2\sum_{i=1}^{p}w_i + \sum_{i=p+1}^{g^c}w_i - (p + g^c)\left(\frac{(N - g^c)d}{N} - \delta_1\right) > 0. $$
The sum of the noise samples is a zero-mean Gaussian random variable with variance $\sigma^2\bigl(4p + (g^c - p)\bigr) = \sigma^2(3p + g^c)$. Therefore, if we define
$$ d_1 = \frac{(N - g^c)d}{N} - \delta_1, $$
then the probability of $\overline{A'_p}$ is
$$ \Phi\!\left(-\frac{d_1(p + g^c)}{\sigma\sqrt{3p + g^c}}\right), $$
and hence
$$ \varepsilon_1 = \sum_{p=1}^{g^c}\Phi\!\left(-\frac{d_1(p + g^c)}{\sigma\sqrt{3p + g^c}}\right). $$
To compute $p_1$, define the normalized partial sums
$$ S_j = \frac{1}{\sigma}\sum_{i=g^c-j+1}^{g^c} w_i, $$
and note that the intersection of the events $A_p$ of Equation (4.35) is equivalent to
$$ S_j < \frac{d_1}{\sigma}\,j \quad \text{for } j = g^c - p_0 + 1, \ldots, g^c - 1. $$
Also note that the $S_j$'s form the standard (discrete) Brownian motion [52], which can be viewed as a sampling of the standard continuous Brownian motion $S(t)$ at integer time instants. A lower bound $p_1$ is therefore obtained from the probability that $S(t)$ stays below the line $\frac{d_1}{\sigma}t$ on the interval $[0, t_0]$, where $t_0 = g^c - p_0 + 1$; we compute it by conditioning on the terminal value $S(t_0) = s_0$ and integrating over $s_0$.
Given $s_0$, $P(t) - \frac{d_1}{\sigma}t$ is a Brownian motion with drift $-\frac{d_1}{\sigma}$. If the drift is non-negative, then the probability inside the integral is zero [52]. We therefore assume that $d_1 > 0$, i.e., that
$$ \delta_1 < \frac{(N - g^c)d}{N}. $$
Then the drift is negative, in which case the supremum is finite almost surely, and its probability distribution is [52]
$$ 1 - \exp\left(-2\,\frac{d_1}{\sigma}\,x\right) \quad \text{for } x \ge 0, \text{ and zero otherwise.} $$
Substituting this into the integral above, we get
$$ \int_{-\infty}^{\frac{d_1 t_0}{\sigma}}\left\{1 - \exp\left[-2\,\frac{d_1}{\sigma}\left(\frac{d_1}{\sigma}t_0 - s_0\right)\right]\right\}\frac{1}{\sqrt{2\pi t_0}}\exp\left(-\frac{s_0^2}{2t_0}\right)ds_0 $$
$$ = \Phi\!\left(\frac{d_1\sqrt{t_0}}{\sigma}\right) - \int_{-\infty}^{\frac{d_1 t_0}{\sigma}}\frac{1}{\sqrt{2\pi t_0}}\exp\left(-\frac{\bigl(s_0 - \frac{2d_1 t_0}{\sigma}\bigr)^2}{2t_0}\right)ds_0 $$
$$ = \Phi\!\left(\frac{d_1\sqrt{g^c - p_0 + 1}}{\sigma}\right) - \Phi\!\left(-\frac{d_1\sqrt{g^c - p_0 + 1}}{\sigma}\right). $$
Combining the values for $p_1$, $\varepsilon_1$, and ε, obtained above, with similarly obtained values for $p_2$ and $\varepsilon_2$ (where $d_2 = \frac{g^c d}{N} - \delta_2$), we arrive at the expression (4.45). As we mentioned above, this bound is looser than those of [28]. For example, if N is very large, $\frac{g^c}{N} = 0.5$, $d = 3\sigma$, and $p_0 = q_0 = g^c$, then the asymptotic probability (from Table 3.3 of [28]) is 0.857, whereas our bound is 0.751.
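The bound (4.45) is straightforward to evaluate numerically; the following transcription (ours) may be useful for experimenting with the choice of $\delta_1$ and $\delta_2$; all parameter names are illustrative.

    from math import erf, sqrt

    Phi = lambda x: 0.5 * (1 + erf(x / sqrt(2)))  # standard Gaussian CDF

    # Direct transcription of the lower bound (4.45).
    def bound_445(N, gc, p0, q0, d, sigma, delta1, delta2):
        d1 = (N - gc) * d / N - delta1
        d2 = gc * d / N - delta2
        assert d1 > 0 and d2 > 0
        p1 = Phi(d1 * sqrt(gc - p0 + 1) / sigma) - Phi(-d1 * sqrt(gc - p0 + 1) / sigma)
        eps1 = sum(Phi(-d1 * (p + gc) / (sigma * sqrt(3 * p + gc)))
                   for p in range(1, gc + 1))
        p2 = Phi(d2 * sqrt(q0 - gc + 1) / sigma) - Phi(-d2 * sqrt(q0 - gc + 1) / sigma)
        eps2 = sum(Phi(-d2 * (2 * N - q - gc) / (sigma * sqrt(4 * N - 3 * q - gc)))
                   for q in range(gc + 1, N + 1))
        eps = Phi(-delta1 * sqrt(N) / sigma) + Phi(-delta2 * sqrt(N) / sigma)
        return (p1 - eps1) * (p2 - eps2) - eps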
Problem 4.2. Let $d > 0$ be a known real number. Consider a step sequence $m_{g^c} = (0, \ldots, 0, d, \ldots, d)^T$ of length N, whose first $g^c$ entries are zeros. Let the observed signal be
$$ y = m_{g^c} + v, $$
where v is an unknown disturbance; the problem is to estimate the change location $g^c$.
As stated in Section 4.3, if v is a zero-mean white Gaussian noise, then the SIDE will find the ML estimate, i.e.
$$ \hat g^c_{ML} = \arg\max_g \sum_{i=g+1}^{N}\left(y_i - \frac{d}{2}\right). \tag{4.47} $$
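The maximization in (4.47) reduces to scanning cumulative sums; a sketch (ours, with illustrative parameters):

    import numpy as np

    # ML estimate (4.47) via cumulative sums, in O(N) time:
    # the maximizing g of sum_{i=g+1}^N (y_i - d/2).
    def ml_change_location(y, d):
        tail = np.cumsum((y - d / 2)[::-1])[::-1]  # tail[g] = sum over samples g+1..N
        tails = np.concatenate((tail, [0.0]))      # allow g = N (empty sum)
        return int(np.argmax(tails))               # estimated edge location

    rng = np.random.default_rng(2)
    d, gc = 1.0, 120
    y = np.concatenate([np.zeros(gc), d * np.ones(80)]) + 0.5 * rng.standard_normal(200)
    print(ml_change_location(y, d))                # close to 120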
To analyze the robustness of this estimator, we define the following performance measure:
$$ B(f) = \sup_{g^c,\, v \ne 0}\frac{|g^c - f(y)|}{\|v\|_1}, $$
where $f(y)$ is any estimator of $g^c$, and $\|\cdot\|_1$ stands for the $\ell^1$ norm. Choosing the estimator which minimizes B is similar in spirit to $H_\infty$ estimation: we would like to minimize the worst possible error, over all possible disturbances. We presently show
that the SIDE estimator (4.47) does minimize B. This means that our estimator is
robust: it has the best worst-case error performance among all estimators, and for all
noise sequences.
We will prove the optimality of the SIDE estimator by showing two things:
• that B(f ) is always larger than a certain constant (see Proposition 4.9 below),
and
• that the SIDE estimator achieves this lower bound (see Proposition 4.10 below).
Proposition 4.9. For any estimator $f$, $B(f) \ge \frac{2}{d}$.
Proof. Fix the noise level at $\|v\|_1 = \frac{d}{2}$, and suppose that the observation is $y = (0, \frac{d}{2}, d, \ldots, d)^T$. The signal $m_{g^c}$ which resulted in this y after adding noise of norm $\frac{d}{2}$ could be either $m_1 = (0, d, d, \ldots, d)^T$ or $m_2 = (0, 0, d, \ldots, d)^T$, in which cases $g^c = 1$ and $g^c = 2$, respectively. Thus, since the estimate $f(y)$ of the edge location is an integer,
$$ B(f) \ge \sup_{g^c\in\{1,2\}}\frac{|g^c - f(y)|}{\frac{d}{2}} = \frac{2}{d}\sup_{g^c\in\{1,2\}}|g^c - f(y)| \ge \begin{cases}\dfrac{2}{d}\,|2 - f(y)| & \text{if } f(y) = 1,\\[6pt] \dfrac{2}{d}\,|1 - f(y)| & \text{if } f(y) \ge 2,\end{cases} $$
$$ \ge \frac{2}{d}. $$
We will now show that the ML estimator achieves this bound.
Proposition 4.10. For $\hat g^c_{ML} = f_{ML}(y)$, $B(f_{ML}) = \frac{2}{d}$. Thus, the estimator $\hat g^c_{ML}$ is optimal with respect to the criterion B.
Proof. Suppose that $\hat g^c_{ML} > g^c$. Then (4.47) implies
$$ \sum_{i=\hat g^c_{ML}+1}^{N}\left(y_i - \frac{d}{2}\right) > \sum_{i=g^c+1}^{N}\left(y_i - \frac{d}{2}\right) \;\Rightarrow\; \sum_{i=g^c+1}^{\hat g^c_{ML}}\left(y_i - \frac{d}{2}\right) < 0 \;\Rightarrow\; \sum_{i=g^c+1}^{\hat g^c_{ML}}\left((d + v_i) - \frac{d}{2}\right) < 0 \;\Rightarrow\; \frac{d}{2}\bigl(\hat g^c_{ML} - g^c\bigr) < -\sum_{i=g^c+1}^{\hat g^c_{ML}} v_i \le \|v\|_1. $$
Therefore, the smallest $\ell^1$ norm of the disturbance required to create the error $\hat g^c_{ML} - g^c$ exceeds $\frac{d}{2}|\hat g^c_{ML} - g^c|$; the case $\hat g^c_{ML} < g^c$ is handled similarly. Hence
$$ B(f_{ML}) = \sup_{g^c,\, v \ne 0}\frac{|g^c - f_{ML}(y)|}{\|v\|_1} \le \sup_{g^c}\frac{|g^c - f_{ML}(y)|}{\frac{d}{2}\,|g^c - f_{ML}(y)|} = \frac{2}{d}. $$
We note that, while (4.48) is the equation about which we have conjectures, it is not the equation we use in practice for images. The reason is that thresholding the initial
condition u0 with the threshold α typically leads to initial segmentations which are
too coarse—that is, which have too few regions. Even if the evolution then provides
the best coarsening of this initial segmentation, it may not be a good result. Better
results are achieved when one first evolves the SIDE using (3.12), and then applies the
threshold α to the image u(t). This is what was done in examples of Figure 4.5, which
are experimental evidence of the fact that the algorithm works well and is very robust
to degradations which do not conform well to the models of Section 4.3. The data on
the left is a very blurry and noisy synthetic aperture radar image of two textures: forest
Figure 4.5. SIDE segmentation of a SAR image of two textures (left) and of an ultrasound image of a thyroid (right), with the detected boundaries superimposed. (Images not reproduced.)
and grass. The pervasive speckle noise is inherent to this method of data collection. The
algorithm was run on the raw data itself (which corresponds to assuming a Gaussian
model with changes in mean—see Section 4.3), and stopped when two regions remained.
The resulting boundary (shown superimposed onto the logarithm of the original image)
is extremely accurate. The logarithm of a similarly blurry and noisy ultrasound image
of a thyroid is shown on the right, with the boundary detected by the SIDE.
Chapter 5
Segmentation of Color, Texture, and Orientation Images
The preceding chapters were all devoted to the analysis of images and signals which take values in $\mathbb{R}$. It is often necessary, however, to process vector-valued images where each pixel value is a vector belonging to $\mathbb{R}^M$, with $M \ge 1$. The entries of this
vector could correspond to red, green, and blue intensity values in color images [74],
to data gathered from several sensors [33] or imaging modalities [68], or to the entries
of a feature vector obtained from analyzing a texture image [10]. In the next section,
we generalize our SIDEs to vector-valued images and argue that most properties of
scalar-valued SIDEs still apply. We then give several examples of segmentation of color
and texture images.
Section 5.2 treats images whose every pixel belongs to a circle S1 . Such images arise
in the analysis of orientation [48] and optical flow [30].
■ 5.1 Vector-Valued Images.
Recall that the 2-D SIDE of Chapter 3 is the gradient descent equation for the global energy
$$ \mathcal{E} = \sum_{i,j \text{ are neighbors}} E(\|u_j - u_i\|), $$
where E is a SIDE energy function (Figure 2.6). The norm $\|\cdot\|$ here stands simply for the absolute value of its scalar argument. Now notice that we still can use the above equation if the image under consideration is vector-valued, by interpreting $\|\cdot\|$ as the $\ell^2$ norm. To remind ourselves that the pixel values of vector-valued images are vectors, we will use arrows to denote them:
$$ \mathcal{E} = \sum_{i,j \text{ are neighbors}} E(\|\vec u_j - \vec u_i\|), \tag{5.1} $$
Figure 5.1. Spring-mass model for vector-valued diffusions. This figure shows a 2-by-2 image whose
pixels are two-vectors: (2,2), (0,0), (0,1), and (1,2). The pixel values are depicted, with each pixel
connected by springs to its neighboring pixels.
where $\vec{u}_i = (u_{i,1}, \ldots, u_{i,M})^T$ is the value of the $i$-th pixel.
We will call the collection of the $k$-th entries of these vectors the $k$-th channel of the
image. In this case, the gradient descent equation is:
$$\dot{\vec{u}}_i = \frac{1}{m_i} \sum_{j \in A_i} F\left(\|\vec{u}_j - \vec{u}_i\|\right) \frac{\vec{u}_j - \vec{u}_i}{\|\vec{u}_j - \vec{u}_i\|}\, p_{ij}. \qquad (5.2)$$
(This notation combines M equations—one for each channel—in a single vector equa-
tion). Just as for the scalar images, we merge two neighboring pixels at locations i and
j when their values become equal: ~uj (t) = ~ui (t). Just as in Chapter 3, mi and Ai are
the area of the i-th region and the set of its neighbors, respectively. The length of the
boundary between regions i and j is pij .
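To make the update concrete, here is a minimal sketch of one explicit Euler step of (5.2) on a 1-D chain of regions (an illustration: the force F below is an arbitrary SIDE-like choice, merging of regions whose values meet is left to the caller, and the example pixel values are those of Figure 5.1):

```python
import numpy as np

def vector_side_step(regions, masses, lengths, F, dt):
    """One explicit Euler step of the vector-valued SIDE (5.2).

    regions: list of vectors in R^M (one per region, on a 1-D chain);
    masses:  region areas m_i;
    lengths: boundary lengths, lengths[k] = p_{k,k+1} (all 1 in 1-D);
    F:       scalar SIDE force function.
    """
    new = []
    for i, u in enumerate(regions):
        rhs = np.zeros_like(u)
        for j in (i - 1, i + 1):                 # the neighbor set A_i on a chain
            if 0 <= j < len(regions):
                diff = regions[j] - u
                dist = np.linalg.norm(diff)      # the l2 norm of u_j - u_i
                if dist > 0:
                    rhs += F(dist) * (diff / dist) * lengths[min(i, j)]
        new.append(u + dt * rhs / masses[i])
    return new

# Illustrative force (positive, decreasing, vanishing beyond v = 3) applied to
# the 2-by-2 image of Figure 5.1, flattened into a chain for this sketch:
F = lambda v: max(1.0 - v / 3.0, 0.0)
regions = [np.array([2., 2.]), np.array([0., 0.]),
           np.array([0., 1.]), np.array([1., 2.])]
regions = vector_side_step(regions, masses=[1] * 4, lengths=[1] * 3, F=F, dt=0.1)
```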
A slight modification of the spring-mass models of Chapter 3, Figures 3.1 and 3.3,
can be used to visualize this evolution. We recall that those models consisted of particles
forced to move along straight lines, and connected by springs to their neighbors. If
we replace each straight line with an M -dimensional space, we will get the model
corresponding to the vector-valued equation, with pixel values in IRM . For example,
when M = 2, the particles are forced to move in 2-D planes. Another way to visualize
the system is by depicting all the particles as points in a single M -dimensional space,
as shown in Figure 5.1. Each pixel value ~ui = (ui,1 , ui,2 )T is depicted in Figure 5.1 as
a particle whose coordinates are ui,1 and ui,2 . Each particle is connected by springs to
its neighbors. The spring whose length is v exerts a force whose absolute value is F (v),
and which is directed parallel to the spring.
By using techniques similar to those in Chapter 3, it can be verified that Equation
(5.2) inherits many useful properties of the scalar equation. Namely, the conservation
of mean and the local maximum principle hold for each channel; the equation reaches
Sec. 5.1. Vector-Valued Images. 101
the steady state in finite time and has the energy dissipation properties described in
Subsection 3.4.2. For the usual SIDE force functions, however, the sliding property does
not carry over to vector-valued images, in contrast with the scalar case. For force
functions which are infinite at zero, such as that of Figure 3.8, the sliding property
does hold, and therefore so does well-posedness. Vector-valued SIDEs are also robust
to severe noise, as we show
in the experiments.
Figure 5.4. (a) Image of two textures: fabric (left) and grass (right); (b) the ideal segmentation of
the image in (a).
superimposed onto the initial image, is depicted in Figure 5.2, (c). Just as in the scalar
case, the algorithm is very accurate in locating the boundary: less than 0.2% of the
pixels are misclassified in this 100-by-100 image.
A similar experiment, with the same level of noise, is conducted for a more com-
plicated shape, whose image is in Figure 5.3, (a). The result of processing the noisy
image of Figure 5.3, (b) is shown in Figure 5.3, (c). In this 200-by-200 image, 0.8% of
the pixels are misclassified.
Figure 5.6. (a) Two-region segmentation, and (b) its deviation from the ideal one.
where ui,j is the (i, j)-th pixel of the original image. This leads to a significant im-
provement in performance, shown in the rest of Figure 5.7: the shape of the boundary
is much closer to that of the ideal boundary, and the number of misclassified pixels is
Figure 5.8. (a) Image of two wood textures; (b) the ideal segmentation of the image in (a).
Figure 5.12. A SIDE energy function which is flat at π and −π and therefore results in a force
function which vanishes at π and −π.
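For concreteness, one admissible energy of this type (an illustrative choice, not necessarily the one used in the experiments of this section) is

$$E(v) = 2\sin\frac{|v|}{2}, \quad v \in [-\pi, \pi], \qquad \text{so that} \qquad F(v) = E'(v) = \mathrm{sgn}(v)\cos\frac{v}{2},$$

which is positive and decreasing on $(0, \pi)$ and vanishes at $v = \pm\pi$, as the figure requires.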
We define SIDEs for circle-valued images and signals as the following gradient descent
equation:
u̇ = −∇E,
with the i-th pixel evolving according to:
u̇i = −∇i E,
where ∇i is the gradient taken on the unit circle S1 . To visualize this evolution, the
straight vertical lines of the spring-mass models of Figures 3.1 and 3.3 are replaced with
circles: each particle is moving around a circle. After taking the gradients, simplifying,
and taking into account merging of pixels, we obtain that the differential equation
governing the evolution of the phase angles of ui ’s is very similar to the scalar SIDE:
$$\dot{\theta}_i = \frac{1}{m_i} \sum_{j \in A_i} F\left(\zeta(u_j, u_i)\right) p_{ij}, \qquad (5.5)$$
where θi is the phase angle of ui (we use the convention 0 ≤ θi < 2π, and identify
θi = 2π with θi = 0). The rest of the notation is the same as in Chapter 3. Two
neighboring pixels are merged when they have the same phase angle.
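A minimal sketch of one explicit Euler step of (5.5) on a 1-D chain follows (an illustration under two assumptions: $\zeta(u_j, u_i)$ denotes the signed angle from $u_i$ to $u_j$ along the shorter arc, and $F$ is the Figure 5.12-style force $\mathrm{sgn}(\zeta)\cos(\zeta/2)$, which vanishes at $\pm\pi$):

```python
import numpy as np

def zeta(theta_j, theta_i):
    """Signed angle from theta_i to theta_j along the shorter arc, in (-pi, pi].
    Assumed here to play the role of zeta(u_j, u_i) in (5.5)."""
    d = (theta_j - theta_i + np.pi) % (2 * np.pi) - np.pi
    return np.pi if d == -np.pi else d

def circle_side_step(theta, masses, lengths, dt):
    """One explicit Euler step of the circle-valued SIDE (5.5) for phase angles
    theta (one per region on a 1-D chain), with angles kept in [0, 2*pi)."""
    F = lambda z: np.sign(z) * np.cos(z / 2)     # illustrative force, F(+-pi) = 0
    new = []
    for i, th in enumerate(theta):
        rhs = 0.0
        for j in (i - 1, i + 1):                 # the neighbor set A_i on a chain
            if 0 <= j < len(theta):
                rhs += F(zeta(theta[j], th)) * lengths[min(i, j)]
        new.append((th + dt * rhs / masses[i]) % (2 * np.pi))
    return new
```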
While this evolution has many similarities to its scalar and vector-valued counter-
parts, it also has important differences, stemming from the fact that it operates on the
phase angles. Thus, it is not natural to talk about the mean of the (complex) values
of the input image; instead, this evolution preserves the sum of the phases, modulo 2π.
This is easily verified by summing up the equations (5.5).
Property 5.1 (Total phase conservation). The phase angle of the product of all
pixel values stays invariant throughout the evolution of (5.5).
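A short verification (under the assumptions, natural for SIDE forces, that $F$ is odd and that $\zeta$ is antisymmetric, $\zeta(u_i, u_j) = -\zeta(u_j, u_i)$): weighting each equation in (5.5) by $m_i$ and summing,

$$\frac{d}{dt}\sum_i m_i\,\theta_i = \sum_i \sum_{j \in A_i} F\left(\zeta(u_j, u_i)\right) p_{ij} = 0,$$

since every boundary appears twice in the double sum, with $p_{ij} = p_{ji}$ and opposite forces. As $\sum_i m_i \theta_i$ is, modulo $2\pi$, the phase of the product of all pixel values, the property follows.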
Another important distinction from the scalar and vector SIDEs is the existence of
unstable equilibria. Since $F(\pi) = E'(\pi) = 0$, it follows that if two neighboring pixels
have values which are two antipodal points on S1, these points will neither attract nor
repel each other. Thus, unlike the scalar and vector-valued equations, whose only
equilibria were constant images, many other equilibrium states are possible here. The
next property, however, guarantees that the only stable equilibria are constant images.
Property 5.2 (Equilibria). Suppose that all the pixel values of an image u have the
same phase. Then u is a stable equilibrium of (5.5). Conversely, such images u are the
only stable equilibria of (5.5).
Proof. If all the phases are the same, then the whole image u is a single region,
which therefore has no neighbors and is not changing. Moreover, if a pixel value is
perturbed by a small amount, forces exerted by its neighbors will immediately pull it
back. Thus, it is a stable equilibrium.
Suppose now that an equilibrium image u has more than one region. Let us pick an
arbitrary region, call its value $u^*$, and partition the set of its neighboring regions into
two subsets: $U = \{u_1, \ldots, u_p\}$, whose every element pulls $u^*$ in the counter-clockwise
direction (i.e., U is comprised of those regions for which the phase of $u_i/u^*$ is positive
and strictly less than π), and the set $V = \{v_1, \ldots, v_q\}$, whose elements pull $u^*$ in the
clockwise direction (i.e., those regions for which the phase of $v_i/u^*$ is negative and
greater than or equal to −π). One of the sets U, V (but not both) can be empty.
Since our system (5.5) is in equilibrium, it means that the resultant force acting on
u∗ is zero—i.e., the right-hand side of the corresponding differential equation is zero.
Suppose now that u∗ is slightly perturbed in the clockwise direction. Since the force
function F is monotonically decreasing, this means that the resultant force exerted on
u∗ by the regions comprising the set V will increase, and the resultant force exerted by
U will decrease. The net result will be to further push u∗ in the clockwise direction.
A similar argument applies if u∗ is perturbed in the counter-clockwise direction. Thus,
the equilibrium is unstable, which concludes the proof.
For any reasonable probabilistic model of the initial data, the probability of at-
taining an unstable equilibrium during the evolution is zero. In any case, a numerical
implementation can be designed to avoid such equilibria. Therefore, in the generic case,
the steady state of the evolution is a stable equilibrium, which, by the above property,
is a constant image. This corresponds to the coarsest segmentation (everything is one
region), just like in the scalar-valued and vector-valued cases.
Since there is no notion of a maximum on the unit circle, there is no notion of a
“maximum principle”, either. This means, moreover, that we cannot mimic the proof
of Property 3.2 (finite evolution time) of the scalar evolutions. This property does hold
for the evolutions on a circle, but the proof is different. Specifically, Property 3.7 holds
here, with a similar proof, which means that between two consecutive mergings, the
global energy is a concave decreasing function of time (Figure 3.7). It will therefore
reach zero in finite time, at which point the evolution will be at one of its equilibria.
Property 5.3 (Finite evolution time). The SIDE (5.5) reaches its equilibrium in
finite time, starting with any initial condition.
■ 5.2.1 Experiments.
To illustrate segmentation of orientation images, we use the same texture images which
we used in the previous section. To extract the orientations, we use the direction of the
gradient. The (i, j)-th pixel value of the orientation image is
$$\mathrm{angle}\!\left[\left(u_{i,j+1} - u_{i,j} + \sqrt{-1}\,(u_{i+1,j} - u_{i,j})\right)^{2}\right],$$
where ui,j is the (i, j)-th pixel value of the raw texture image. (Note that the absolute
value of this orientation image was used as a feature image in the previous section.)
The expression in the above formula is squared so as to equate the phases which differ
by π.
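A sketch of this feature computation (an illustration; border pixels are simply cropped so that the two finite differences align):

```python
import numpy as np

def orientation_image(u):
    """Angle of the squared complex gradient of a 2-D array `u`, so that
    directions differing by pi are mapped to the same phase; the output
    angles lie in [0, 2*pi), matching the convention of Section 5.2."""
    gx = u[:, 1:] - u[:, :-1]            # u_{i,j+1} - u_{i,j}
    gy = u[1:, :] - u[:-1, :]            # u_{i+1,j} - u_{i,j}
    g = gx[:-1, :] + 1j * gy[:, :-1]     # crop so both differences align
    return np.angle(g ** 2) % (2 * np.pi)
```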
The orientations for the fabric-and-grass image are shown in Figure 5.13, (a). We
present this image as the initial condition to the circle-valued SIDE (5.5) and evolve it
until two regions remain. The resulting segmentation is depicted in Figure 5.13, (b),
and its difference from the ideal one is in 5.13, (c). About 4.3% of the total number of
pixels are classified incorrectly. It is not surprising that this method performs slightly
worse on this example than the evolutions of the previous section. Indeed, if a human
were asked to segment the original texture image based purely on orientation, he might
make similar errors. Note in particular that the protrusion in the upper portion of the
boundary found by the SIDE corresponds to a horizontally oriented piece of grass in
the original image, which can be mistaken for a portion of the fabric.
In the second example, however, the orientation information is very appropriate
for characterizing and discriminating the two differently oriented wood textures. The
orientation image is in Figure 5.14, (a). The resulting five-region segmentation of Figure
5.14, (b) incorrectly classifies only 252 pixels (1.5%) in the 128-by-128 image, which is
14 pixels better than the method of the previous section. A more dramatic improvement
is achieved in the example of Figure 5.15, which shows the five-region segmentation of
the image in Figure 5.10, (a), using the circle-valued SIDE (5.5). Only 1.5% of the
pixels are misclassified, as compared to 2.3% using the method of the previous section.
■ 5.3 Conclusion.
In this chapter, we generalized SIDEs to situations in which the image to be pro-
cessed is not scalar-valued. We described the properties of the resulting evolutions and
demonstrated their application to segmenting color, texture, and orientation images.
Chapter 6

Conclusions and Future Research

We also established a connection between our non-linear diffusion equation and the
Mumford-Shah variational method of image segmentation, and showed that a certain
particular case of these is a linear programming problem.
Thus, the contribution of Chapter 4 was two-fold. First, we presented a fast and
robust 1-D edge detection and 2-D image segmentation method. Second, we established
a link between deterministic methods for image restoration and segmentation (based on
non-linear diffusions and variational formulations) and a probabilistic framework. This
leads to a deeper understanding of these methods: both of their performance, and of
how to use them in a variety of situations (e.g., in Section 4.4 this meant pre-processing
the data by forming the log-likelihood ratios). As we will argue in the next section, we
have no doubt that these lines of investigation can and should be pursued further.
Finally, in Chapter 5 we demonstrated that our framework can be easily adapted
to non-scalar-valued images. Specifically, we used the result from Chapter 3 which
showed that a scalar-valued SIDE is the steepest descent equation for a certain energy
functional. We then generalized SIDEs by deriving the steepest descent equations for
similar energy functionals in vector-valued and circle-valued cases. We showed that
many properties of the scalar-valued SIDEs applied, and pointed out several important
differences. We successfully applied the resulting evolution to the segmentation of
color, texture, and orientation images.
[Figure 6.1: a force function F(v) with a unique minimum at v = K.]
models of the input signal. It is unclear how to choose force functions for other signal
models; an interesting question is for what models this can be done so as to guarantee
that the solution produced by the SIDE is the maximum likelihood solution.
Another question is that of robustness properties of the SIDEs corresponding to
different shapes of the force function. Intuitively, if the goal is segmentation in the
presence of outliers, then it is appropriate to diffuse quickly both in the areas of very
large gradient (corresponding to the outliers), and in the areas of very small gradient
(corresponding to small-amplitude noise). Ideally, the minimum diffusion speed would
be at the locations with intermediate values of the gradient, corresponding to edges.
We are then led to the form of a force function depicted in Figure 6.1, which is the
inverse of a Perona-Malik force function, and has a unique minimum at some location
K. If the parameters of the probabilistic model of the input signal (such as the standard
deviation of the noise, the magnitude and frequency of the outliers) are fixed, then it
is natural to expect that some value of K is, in some sense, optimal. Uncovering this
relationship between the model and the corresponding value of the parameter K is an
interesting research topic.
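For instance (an illustrative computation, assuming the familiar Perona-Malik force $F_{PM}(v) = v/(1 + (v/K)^2)$), the reciprocal

$$\frac{1}{F_{PM}(v)} = \frac{1}{v} + \frac{v}{K^2}$$

is large both for very small and for very large $v$ and attains its unique minimum at $v = K$, since its derivative $-1/v^2 + 1/K^2$ vanishes there; this is exactly the qualitative shape of Figure 6.1.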
Appendix A

Proof of Lemma on Sliding (Chapter 3)

To simplify notation, we replace $n_i$ with $i$ in (3.11) and re-write the system in terms of
$v_i = u_{i+1} - u_i$:
$$\dot{v}_i = \frac{1}{m_{i+1}}\left(F(v_{i+1}) - F(v_i)\right) - \frac{1}{m_i}\left(F(v_i) - F(v_{i-1})\right), \qquad i = 1, \ldots, p-1. \qquad (A.1)$$
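This change of variables is immediate if, as in Chapter 3, (3.11) has the region form $\dot{u}_i = \frac{1}{m_i}\left(F(u_{i+1} - u_i) - F(u_i - u_{i-1})\right)$ (restated here as an assumption for the reader's convenience): subtracting consecutive equations gives

$$\dot{v}_i = \dot{u}_{i+1} - \dot{u}_i = \frac{1}{m_{i+1}}\left(F(v_{i+1}) - F(v_i)\right) - \frac{1}{m_i}\left(F(v_i) - F(v_{i-1})\right).$$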
We need to prove that if $(i_1, \ldots, i_{p-1})$ is any permutation of $(1, \ldots, p-1)$, then, as $v$
approaches $S = \bigcap_{k=1}^{m} S_{i_k} \setminus \left(\bigcup_{k=m+1}^{p-1} S_{i_k}\right)$, we have $\lim\left(\dot{v}_{i_q}\,\mathrm{sign}(v_{i_q})\right) \le 0$ for all integers $q$ between
1 and $m$, and for at least one such $q$ the inequality is strict (i.e., the trajectories enter
$S$ transversally). Note that for every point $s \in S$ and every quadrant $Q$, we only need
to find one sequence of $v$'s approaching $s$ from $Q$ and satisfying these inequalities. This
is because the solutions vary continuously inside each quadrant.
Fix $v_{i_{m+1}}, \ldots, v_{i_{p-1}}$ at non-zero values, let

$$\varepsilon = \frac{1}{2} \min_{m+1 \le j \le p-1} |v_{i_j}|,$$

let initially $\delta = \varepsilon$, set $|v_{i_1}| = \ldots = |v_{i_m}| = \delta$, and drive $v$ towards $S$ by letting $\delta$ go to
zero. Take an arbitrary index $q$ between 1 and $m$. By our construction, $v_{i_q}$ is approaching
zero, and either $v_{i_q} = \delta > 0$ or $v_{i_q} = -\delta < 0$. If $v_{i_q} = \delta$, then, by construction,
$v_{i_q} \le |v_{i_q \pm 1}|$, implying $F(v_{i_q}) \ge F(v_{i_q \pm 1})$, which makes the right-hand side of (A.1) for $i = i_q$
non-positive: $\lim_{\delta \to 0} \dot{v}_{i_q} \le 0$. If $m < p-1$, then there is a $j$ between 1 and $m$ such
that at least one of the two neighbors of $v_{i_j}$ is in the set $\{v_{i_{m+1}}, \ldots, v_{i_{p-1}}\}$, so that its
absolute value stays above $\varepsilon$: $|v_{i_j + 1}| > \varepsilon$ or $|v_{i_j - 1}| > \varepsilon$. Without loss of
generality, suppose it is the left neighbor: $|v_{i_j - 1}| > \varepsilon$. If $m = p-1$, define $j = 1$. If our
arbitrary $q$ happens to be equal to this $j$, then

$$F(v_{i_q}) - F(v_{i_q - 1}) = F(v_{i_j}) - F(v_{i_j - 1}) > F(v_{i_j}) - F(\varepsilon),$$

and hence (A.1) for $i = i_j$ has a strictly negative limit: $\lim_{\delta \to 0} \dot{v}_{i_j} < 0$. Similar reasoning
for the case $v_{i_q} = -\delta$ leads to $\lim_{\delta \to 0} \dot{v}_{i_q} \ge 0$, and, if it happens that $q = j$, then
$\lim_{\delta \to 0} \dot{v}_{i_q} > 0$.
Appendix B

Proofs for Chapter 4
Figure B.1. Samples of a signal (top plots) and impossible edge configurations of optimal hypotheses
(bottom plots).
smaller than α.
Note that, for any integer $s$ such that $1 \le s \le N-1$ and $u_{s+1}(t) - u_s(t) \ne 0$, we have,
by summing up Equations (4.1) from $n = s+1$ to $n = N$,
$$\sum_{n=s+1}^{N} \dot{u}_n(t) = -\mathrm{sgn}\left(u_{s+1}(t) - u_s(t)\right) = \pm 1. \qquad (B.4)$$
If us+1 (t) = us (t), let p be the smallest index to the left of s such that up+1 (t) = us (t),
and let q be the largest index to the right of s such that uq (t) = us (t). If q ≤ N − 1
$$\begin{aligned}
\sum_{n=s+1}^{N} \dot{u}_n(t) &= \sum_{n=s+1}^{q} \dot{u}_n(t) + \sum_{n=q+1}^{N} \dot{u}_n(t) \\
&= (q-s)\,\dot{u}_s - \mathrm{sgn}\left(u_{q+1}(t) - u_q(t)\right) \\
&= \frac{q-s}{q-p}\left(\mathrm{sgn}(u_{q+1}(t) - u_q(t)) - \mathrm{sgn}(u_{p+1}(t) - u_p(t))\right) - \mathrm{sgn}\left(u_{q+1}(t) - u_q(t)\right) \\
&= -\frac{s-p}{q-p}\,\mathrm{sgn}\left(u_{q+1}(t) - u_q(t)\right) - \frac{q-s}{q-p}\,\mathrm{sgn}\left(u_{p+1}(t) - u_p(t)\right) \;\ge\; -1, \qquad (B.5)
\end{aligned}$$
since $p < s < q$ (so that $q - p \ge 2$ and the two nonnegative coefficients sum to one). The same
inequality is obtained if $q = N$ or $p = 0$. Combining (B.5) and (B.4), we see that
$\sum_{n=s+1}^{N} \dot{u}_n(t) \ge -1$ for any $s$. Therefore, the minimal possible value for (B.3) is $(-1)$ times
the number of sums of the form $\sum_{n=s+1}^{N} \dot{u}_n(t)$ in that expression. This is $-j$, which is, by
assumption, not smaller than $-\nu$:
φ̇(u(t), h) ≥ −j ≥ −ν.
Now note that in this double inequality, both equalities are achieved for all time t,
0 ≤ t ≤ tf , when h = h∗ (u(tf )). Indeed, since in this case j = ν, and—as easily seen
from the definition of φ—g1 , . . . , gν are α-crossings of u(tf ) (and, therefore—by Lemma
4.1—also of u(t) for 0 ≤ t ≤ tf ), we have that for t ∈ [0, tf ],
$$-\mathrm{sgn}\left(u_{g_i+1}(t) - u_{g_i}(t)\right) = \begin{cases} -1 & \text{if } i \text{ is odd} \\ \phantom{-}1 & \text{if } i \text{ is even.} \end{cases}$$
Inserting this into (B.4), and then back into (B.3), we see that in this case, (B.3) is
equal to −ν. By definition of h∗ (u(tf )), φ(u(tf ), h) is the largest for h = h∗ (u(tf )). On
the other hand, we just showed that the amount of the reduction of φ(u(t), h) during
the evolution was the greatest for h = h∗ (u(tf )), over all possible hypotheses with ν or
fewer edges. Therefore, φ(u(0), h) must also have been the largest for h = h∗ (u(tf )),
over the same set of hypotheses, which is the statement of the Proposition.
The statement of the same proposition in [51] is as follows.
Proposition B.1. Fix the initial condition u0 of the SIDE (4.1), and let u(t) be the
corresponding solution. Suppose that a statistic φ satisfies two conditions:
1) $\frac{d}{dt}\left\{\phi(u(t), h) - h^T u(t)\right\} = 0$;
Proposition B.2. Propositions 4.2 and B.1 are equivalent, in the following sense.
i) If φ(u, h) is as in Proposition 4.2, it satisfies the two conditions of Proposition B.1.
ii) Suppose that φ′(u, h) satisfies the two conditions of Proposition B.1 for any initial
data u0 ∈ IR^N (where the constant α may depend on u0), and suppose that φ(u, h) is
as in Proposition 4.2. Then, for all u ∈ IR^N and for all h ∈ {0, 1}^N \ {0, 1} (where
1 = (1, . . . , 1)^T ∈ IR^N), φ(u, h) and φ′(u, h) can only differ by a function of $\sum_{i=1}^{N} u_i$:

$$\phi'(u, h) - \phi(u, h) = f\left(\sum_{i=1}^{N} u_i\right),$$

for some function f : IR → IR, and thus the optimal hypotheses with respect to φ and
φ′ are the same.
Lemma B.1. Let u(t) be the solution of the SIDE (4.1). Suppose that a function
ψ : IRN → IR satisfies
$$\frac{d}{dt}\,\psi(u(t)) = 0, \qquad (B.6)$$
for any initial data u0 . Then ψ only depends on the sum of the entries of its argument—
i.e., there is a function f : IR → IR such that
$$\psi(u) = f\left(\sum_{i=1}^{N} u_i\right).$$
Proof. We first show that the partial derivatives of ψ with respect to all the
variables are equal to each other, using the identity
$$\frac{d}{dt}\,\psi(u(t)) = \sum_{i=1}^{N} \frac{\partial \psi}{\partial u_i}\,\dot{u}_i. \qquad (B.7)$$
Take an initial condition for which $u_1^0 < u_2^0 < \ldots < u_{N-1}^0 < u_N^0$. It then follows from
(4.1) that $\dot{u}_1 = 1$, $\dot{u}_N = -1$, and $\dot{u}_i = 0$ for $2 \le i \le N-1$. Substituting these into
(B.7) and using (B.6), we get:

$$\frac{\partial \psi}{\partial u_1} = \frac{\partial \psi}{\partial u_N}. \qquad (B.8)$$
Now take an initial condition for which $u_1^0 < u_2^0 < \ldots < u_{N-1}^0$ and $u_N^0 < u_{N-1}^0$. Then
$\dot{u}_1 = \dot{u}_N = 1$, $\dot{u}_{N-1} = -2$, and $\dot{u}_i = 0$ for $2 \le i \le N-2$, and therefore

$$\frac{\partial \psi}{\partial u_1} - 2\,\frac{\partial \psi}{\partial u_{N-1}} + \frac{\partial \psi}{\partial u_N} = 0,$$

which, combined with (B.8), gives

$$\frac{\partial \psi}{\partial u_1} = \frac{\partial \psi}{\partial u_{N-1}}.$$

Proceeding in the same fashion with other orderings of the initial condition, we obtain
that all the partial derivatives of ψ are equal:

$$\frac{\partial \psi}{\partial u_1} = \frac{\partial \psi}{\partial u_2} = \ldots = \frac{\partial \psi}{\partial u_N}. \qquad (B.9)$$

Now change the variables:

$$w_i = u_{i+1} - u_i \quad \text{for} \quad i = 1, \ldots, N-1, \qquad w_N = \sum_{i=1}^{N} u_i,$$
and therefore

$$\frac{\partial u_k}{\partial w_i} = \begin{cases} \dfrac{i}{N}, & i < k, \\[4pt] \dfrac{i}{N} - 1, & k \le i \le N-1, \\[4pt] \dfrac{1}{N}, & i = N. \end{cases}$$
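A quick numerical check of this Jacobian (a sketch; indices are 1-based in the text and 0-based in the code):

```python
import numpy as np

# Verify du_k/dw_i for the change of variables w_i = u_{i+1} - u_i
# (i = 1, ..., N-1) and w_N = sum_i u_i, by inverting the linear map u -> w.
N = 6
T = np.zeros((N, N))
for i in range(N - 1):
    T[i, i], T[i, i + 1] = -1.0, 1.0      # w_i = u_{i+1} - u_i
T[N - 1, :] = 1.0                          # w_N = sum_i u_i
J = np.linalg.inv(T)                       # J[k-1, i-1] = du_k / dw_i

expected = np.empty((N, N))
for k in range(1, N + 1):
    for i in range(1, N + 1):
        if i == N:
            expected[k - 1, i - 1] = 1.0 / N
        elif i < k:
            expected[k - 1, i - 1] = i / N
        else:                              # k <= i <= N - 1
            expected[k - 1, i - 1] = i / N - 1.0
assert np.allclose(J, expected)
```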
If $i < N$, then

$$\frac{\partial}{\partial w_i}\left\{\psi(u(w))\right\} = \sum_{k=1}^{N} \frac{\partial \psi}{\partial u_k}\,\frac{\partial u_k}{\partial w_i}
= \frac{\partial \psi}{\partial u_k}\left[\sum_{k=1}^{i}\left(\frac{i}{N} - 1\right) + \sum_{k=i+1}^{N} \frac{i}{N}\right]
= \frac{\partial \psi}{\partial u_k}\left[\frac{i^2}{N} - i + (N-i)\,\frac{i}{N}\right] = 0,$$
where we used the fact (B.9) that $\partial \psi / \partial u_k$ is the same for all $k$. If $i = N$, then
$$\frac{\partial}{\partial w_N}\left\{\psi(u(w))\right\} = \sum_{k=1}^{N} \frac{\partial \psi}{\partial u_k}\,\frac{\partial u_k}{\partial w_N}
= \frac{\partial \psi}{\partial u_k}\left[\sum_{k=1}^{N} \frac{1}{N}\right]
= \frac{\partial \psi}{\partial u_k}.$$
Thus, $\psi$ does not depend on $w_1, \ldots, w_{N-1}$, only on $w_N = \sum_{i=1}^{N} u_i$.
Proof of Proposition B.2. (i) is straightforward.
(ii) According to Lemma B.1 above, if φ′(u, h) satisfies Condition 1 of Proposition B.1,
then $\phi'(u, h) - h^T u$ is a function of $h$ and $\sum_{i=1}^{N} u_i$ only. It also follows from the same Lemma that α of
Proposition B.1 may only depend on $\sum_{i=1}^{N} u_i$. Therefore, there is a function ψ such
that

$$\phi'(u, h) = h^T (u - \alpha \mathbf{1}) + \psi\left(h, \sum_{i=1}^{N} u_i\right).$$
Suppose that ψ depends on h for h ≠ 0, 1. We presently show that this would lead to
violating Condition 2 of Proposition B.1. Take $h_1, h_2 \in \{0,1\}^N \setminus \{\mathbf{0}, \mathbf{1}\}$ and $S \in \mathbb{R}$ such that

$$\psi(h_1, S) > \psi(h_2, S). \qquad (B.10)$$

Define

$$p = \|h_2 - h_1\|_1 \ge 1, \qquad k = N - \|h_2\|_1,$$

and let

$$\varepsilon = \frac{\psi(h_1, S) - \psi(h_2, S)}{2p} > 0.$$
Case 1. There are i and j such that h1,i = h2,i = 1 and h1,j = h2,j = 0.
Let um = α − ε if h2,m = 0 and m 6= j, and let um = α + ε if h2,m = 1 and m 6= i. If
On the other hand, the edges of $h_2$ coincide with the α-crossings of $u$: $u_m > \alpha$ whenever
$h_{2,m} = 1$ and $u_m < \alpha$ whenever $h_{2,m} = 0$. Therefore, if φ′ is to satisfy Condition 2 of
Proposition B.1, $h_2$ has to be the optimal hypothesis for $u$, which contradicts (B.11).
We also note that $\sum_{m=1}^{N} u_m = S$ by construction.
Case 2. There is no i for which h1,i = h2,i = 1, and there is no j for which h1,j =
h2,j = 0, i.e., h1,i = 1 − h2,i for i = 1, . . . , N .
If $2 \le k \le N-2$, let $i$ and $j$ be such that $h_{2,i} = 0$ and $h_{2,j} = 1$. Let $h_3$ be obtained from $h_2$
by changing the $i$-th entry from zero to one and the $j$-th entry from one to zero. Then
either $\psi(h_1, S) \ne \psi(h_3, S)$, or $\psi(h_2, S) \ne \psi(h_3, S)$ (or both), and both pairs $(h_1, h_3)$
and $(h_2, h_3)$ fall under Case 1 considered above.
If k = 1, let i be the index for which h2,i = 0. Without loss of generality, assume
that $i \ne 1$ and $i \ne N$. Form $h_1'$ and $h_2'$ as follows:

$$h_1 = (0, 0, \ldots, 0, 1, 0, \ldots, 0, 0)^T,$$
$$h_1' = (1, 0, \ldots, 0, 1, 0, \ldots, 0, 0)^T,$$
$$h_2 = (1, 1, \ldots, 1, 0, 1, \ldots, 1, 1)^T,$$
$$h_2' = (1, 1, \ldots, 1, 0, 1, \ldots, 1, 0)^T,$$

and all three pairs $(h_1, h_1')$, $(h_2, h_2')$, and $(h_1', h_2')$ fall under Case 1 considered above.
The remaining cases are handled similarly to Cases 1 and 2. The conclusion is that, in
each case, (B.10) leads to a violation of Condition 2 of Proposition B.1. This means
that the inequality (B.10) cannot be true, and so ψ(h, S) is independent of h, for
h ∈ {0, 1}N \{0, 1}.
124 APPENDIX B. PROOFS FOR CHAPTER 4.
$(i, j)$. Suppose that, at time $t_2^-$, there are at least $\nu + 1$ α-crossings remaining.
Case 4. The indices i − 1 and j are consecutive elements of the set {g2 , . . . , gν }.
Case 5. $i - 1 \notin \{g_2, \ldots, g_\nu\}$, and $j \in \{g_2, \ldots, g_\nu\}$.
Case 6. $i - 1 \in \{g_2, \ldots, g_\nu\}$, and $j \notin \{g_2, \ldots, g_\nu\}$.
Cases 4, 5, and 6 are handled similarly to Cases 1, 2, and 3, respectively, with the
result that only $(i, g_2)$ or $(g_\nu, j)$ can be removed, where $i - 1$ is either $0$ or the leftmost
α-crossing of $u(t_2^-)$, and $j$ is either $N$ or the rightmost α-crossing of $u(t_2^-)$.
We now show how to handle the case when $(1, g_2)$ is removed at $t_2$; all other cases are
handled using similar techniques.
Without loss of generality, we suppose that g1 is an upward edge of h. Then
$$\sum_{n=1}^{g_1} (u_n - \alpha) < 0, \qquad (B.12)$$

$$\sum_{n=g_1+1}^{g_2} (u_n - \alpha) > 0, \qquad (B.13)$$
as otherwise removing the edges g1 and g2 would improve h. Since the region (1, g1 )
disappeared at time t1 while the α-crossing at g2 stayed, we have
Thus, $h$ can be improved by replacing the edges $g_1$ and $g_2$ with $i'$ and $j'$, which is a
contradiction.
and so, for $h^*_{\le\bar\nu}(u^0)$ to be better (with respect to η) than both $h_1$ and $h_2$, we need to
have:

The latter inequality contradicts the definition of $E^*$ as the smallest energy of any
region of $u(t)$.

Case (iii). $h^*_{\le\bar\nu}(u^0)$ has edges at the locations $g_1, \ldots, g_{\bar\nu}$.
This case is handled similarly to Case 2.

Case (iv). $h^*_{\le\bar\nu}(u^0)$ has edges at the locations $\{g_1, \ldots, g_{\bar\nu+1}\} \setminus \{i^*, j^*\}$, as well as one
edge at some other location $g_1'$. This situation requires considering several sub-cases;
see Case 6 of Section B.3. As they are similar, we only treat the one where the region
$(1, g_1')$ was removed before time $t$.
Then

But the region $(1, g_1')$ disappeared before $(i^*, j^*)$, and therefore

$$E^* \ge E_{1, g_1'},$$
We now show that, unless $h_q = 0$ or $h_q = 1$, we can change $h$ to make (4.24) smaller, and
that therefore $h$ cannot be a solution if $0 < h_q < 1$. We will be changing $h_{p+1}, \ldots, h_q$,
and so let us write out the portion of (4.24) which depends on them:

If $2\lambda > s$, make $h_q = \max(h_p, h_{q+1})$, which will reduce $h_q$ and therefore also reduce
(B.20). It will also make either $h_p = h_q$ or $h_q = h_{q+1}$, violating our assumption (B.17).
If $2\lambda < s$, make $h_q = 1$, which will also reduce (B.20). In the degenerate case $2\lambda = s$,
we can go either way without changing the solution. Thus, if (B.17) and (B.19) hold,
then $h_q = 1$.

Case 2. $h_p > h_q$ and $h_{q+1} > h_q$. This is handled similarly, with the result that $h_q = 0$.

Case 3. $h_p < h_q$ and $h_{q+1} > h_q$. Then the $h_q$-dependent portion of Equation (B.18) is:

$$-h_q s. \qquad (B.21)$$
[1] L. Alvarez, P.L. Lions, and J.-M. Morel. Image selective smoothing and edge detec-
tion by nonlinear diffusion, II. SIAM J. Numer. Anal., 29(3), 1992.
[2] M.S. Atkins and B.T. Mackiewich. Fully automatic segmentation of the brain in
MRI. IEEE Trans. on Medical Imaging, 17(1), February 1998.
[3] M. Basseville and I.V. Nikiforov. Detection of Abrupt Changes: Theory and Appli-
cation. Prentice Hall, 1993.
[4] D.P. Bertsekas and S.K. Mitter. A descent method for optimization problems with
nondifferentiable cost functionals. SIAM J. Control, 11(4), November 1973.
[5] M.J. Black, G. Sapiro, D.H. Marimont, and D. Heeger. Robust anisotropic diffusion.
IEEE Trans. on Image Processing, 7(3), 1998.
[7] C. Brice and C. Fennema. Scene analysis using regions. Artificial Intelligence, 1,
1970.
[8] C.B. Burckhardt. Speckle in ultrasound B-mode scans. IEEE Trans. on Sonics and
Ultrasonics, SU-25, January 1978.
[10] T.-H. Chang, Y.-C. Lin, and C.-C. J. Kuo. Techniques in texture analysis. IEEE
Trans. on Image Processing, October 1993.
[11] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT
Press, 1990.
[12] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. In Proc. ICCV,
pages 694–699, Cambridge, MA, 1995.
[13] R.N. Czerwinski, D.L. Jones, and W.D. O’Brien, Jr. Line and boundary detection
in speckle images. IEEE Trans. on Image Processing, 7(12), 1998.
[15] S.Z. Der and R. Chellappa. Probe-based automatic target recognition in infrared
imagery. IEEE Trans. on Image Processing, 6(1), 1997.
[17] A.F. Filippov. Differential Equations with Discontinuous Righthand Sides. Kluwer
Academic Publishers, 1988.
[18] C.H. Fosgate, H. Krim, W.W. Irving, W.C. Karl, and A.S. Willsky. Multiscale
segmentation and anomaly enhancement of SAR imagery. IEEE Trans. on Image
Processing, 6(1), 1997.
[19] M.G. Fleming, C. Steger, J. Zhang, J. Gao, A.B. Cognetta, I. Pollak, and C.R.
Dyer. Techniques for a structural analysis of dermatoscopic imagery. Computerized
Medical Imaging and Graphics, 22, 1998.
[21] D. Geman and G. Reynolds. Constrained restoration and the recovery of discon-
tinuities. IEEE Trans. on PAMI, 14(3), 1992.
[22] J.W. Goodman. Statistical properties of laser speckle patterns. In Topics in Applied
Physics, vol. 9: Laser Speckle and Related Phenomena, 2nd edition, J.C. Dainty,
Editor. Springer-Verlag, 1984.
[23] D.R. Greer, I. Fung, and J.H. Shapiro. Maximum-likelihood multiresolution laser
radar range imaging. IEEE Trans. on Image Processing, 6(1), 1997.
[24] K. Haris, S.N. Efstratiadis, N. Maglaveras, and A.K. Katsaggelos. Hybrid image
segmentation using watersheds and fast region merging. IEEE Trans. on Image
Processing, 7(12), 1998.
[25] B. Hassibi, A.H. Sayed, and T. Kailath. Linear estimation in Krein spaces, Part I:
Theory. IEEE Trans. Automatic Control, 41(1), 1996.
[26] B. Hassibi, A.H. Sayed, and T. Kailath. Linear estimation in Krein spaces, Part II:
Applications. IEEE Trans. Automatic Control, 41(1), 1996.
[27] G.T. Herman. Image Reconstruction from Projections: The Fundamentals of Com-
puterized Tomography. Academic Press, 1980.
[28] D.V. Hinkley. Inference about the change-point in a sequence of random variables.
Biometrica, 57(1), 1970.
[30] B.K.P. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17,
pages 185-203, 1981.
[31] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int. J.
of Comp. Vis., 1:321–331, 1988.
[32] I.B. Kerfoot and Y. Bresler. Theoretical analysis of multispectral image segmen-
tation criteria. IEEE Trans. on Image Processing, 8(6), 1999.
[33] R.L. Kettig and D.A. Landgrebe. Classification of multispectral image data by
extraction and classification of homogeneous objects. IEEE Trans. on Geoscience
Electronics, GE-14(1), 1976.
[34] S. Kichenassamy. The Perona-Malik paradox. SIAM J. Applied Math., 57, 1997.
[36] G. Koepfler, C. Lopez, and J.-M. Morel. A multiscale algorithm for image segmen-
tation by variational method. SIAM J. Numer. Anal., 31(1), 1994.
[37] B. Kosko. Neural Networks for Signal Processing, pages 37-61. Prentice Hall, 1992.
[38] H. Krim and Y. Bao. A stochastic diffusion approach to signal denoising. In Proc.
ICASSP, Phoenix, AZ, 1999.
[39] Y. Leclerc. Constructing simple stable descriptions for image partitioning. Inter-
national Journal of Computer Vision, 3, 1989.
[40] S.G. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.
[42] Vision and Modeling Group, MIT Media Lab. Vision Texture Database.
http://www-white.media.mit.edu/vismod/imagery/VisionTexture/vistex.html
[45] K.M. Nagpal and P.P. Khargonekar. Filtering and smoothing in an H∞ setting.
IEEE Trans. Automatic Control, 36(2), 1991.
[46] S. Osher and L.I. Rudin. Feature-oriented image enhancement using shock filters.
SIAM J. Numer. Anal., 27(4), 1990.
[48] P. Perona. Orientation diffusions. IEEE Trans. on Image Processing, 7(3), 1998.
[49] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion.
IEEE Trans. on PAMI, 12(7), 1990.
[51] I. Pollak, A. S. Willsky, and H. Krim. A nonlinear diffusion equation as a fast and
optimal solver of edge detection problems. In Proc. ICASSP, Phoenix, AZ, 1999.
[55] B.M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, 1994.
[56] L.I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal
algorithms. Physica D, 1992.
[57] G. Sapiro. From active contours to anisotropic diffusion: connections between basic
PDE’s in image processing. In Proc. ICIP, Lausanne, 1996.
[59] K. Sauer and C. Bouman. A local update strategy for iterative reconstruction from
projections. IEEE Trans. on Signal Processing, 41(2), 1993.
[62] G. Strang. Linear Algebra and Its Applications, 3rd Edition. Harcourt Brace
Jovanovich, 1988.
[63] P.C. Teo, G. Sapiro, and B. Wandell. Anisotropic smoothing of posterior proba-
bilities. In Proc. ICIP, Santa Barbara, CA, 1997.
[64] P.C. Teo, G. Sapiro, and B. Wandell. Creating connected representations of cortical
gray matter for functional MRI visualization. IEEE Trans. on Medical Imaging,
16(6), December 1997.
[66] H. van Trees. Detection, Estimation, and Modulation Theory, volume 1. Wiley,
1968.
[67] V.I. Utkin. Sliding Modes in Control and Optimization. Springer-Verlag, 1992.
[68] P. Viola and W.M. Wells III. Alignment by maximization of mutual information.
International Journal of Computer Vision, 24(2), 1997.
[69] J. Weickert. Nonlinear diffusion scale-spaces: from the continuous to the discrete
setting. In Proc. ICAOS: Images, Wavelets, and PDEs, pages 111–118, Paris, 1996.
[70] R.F. Wagner, S.W. Smith, J.M. Sandrik, and H. Lopez. Statistics of speckle in
ultrasound B-scans. IEEE Trans. on Sonics and Ultrasonics, SU-30, May 1983.
[72] A. Witkin. Scale-space filtering. In Int. Joint Conf. on AI, pages 1019–1022,
Karlsruhe, 1983.
[74] G. Wyszecki and W.S. Stiles. Color Science: Concepts and Methods, Quantitative
Data and Formulae. Wiley, 1982.
[75] S.C. Zhu and D. Mumford. Prior learning and Gibbs reaction-diffusion. IEEE
Trans. on PAMI, 19(11), 1997.
[76] S.C. Zhu and A. Yuille. Region competition: unifying snakes, region growing, and
Bayes/MDL for multiband image segmentation. IEEE Trans. on PAMI, 18(9), 1996.
Index

automatic target recognition, 21
binary classification, 23, 70–72, 111
Bouman, C., 31–33, 67
Brice, C., 34
circle-valued images, 3, 23, 99, 106, 108, 109, 112
color images, 23, 99, 101, 109, 112
computer vision, 22
dermatoscopy, 21, 65
detection of abrupt changes, 3, 19, 22, 70, 80
diffusion, 3, 20–23, 27–29, 35, 37, 40–42, 44, 45, 67, 69, 70, 100, 104, 106, 111–113
divergence, 25, 26, 28
dynamic programming, 70, 87, 111
energy, 27, 30, 34, 35, 39–41, 45, 49–51, 53, 57, 62, 66, 67, 77, 78, 84, 99, 101, 106, 108, 111, 112, 124, 126
enhancement, 21, 28, 29, 31, 38, 40–42, 60, 66, 67, 111
Faugeras, O., 8
Fennema, C., 34
force function, 39–42, 44, 45, 47–49, 55–57, 66, 70, 100, 101, 106, 107, 112
Geman, D., 22, 34, 35, 38, 66, 111
Geman, S., 8
gradient, 25–28, 33, 34, 39, 41, 49–51, 53, 59, 62, 66, 69, 99, 100, 103, 104, 106, 108, 111, 113
H∞, 23, 70, 88, 93, 94, 111
Hinkley, D., 70, 91
Koepfler, G., 30, 38, 59–64, 66
Krim, H., 8
level crossing, 72–75, 77–79, 81, 84, 85, 88, 96, 118, 119, 123–125
likelihood, 20, 23, 32, 71, 79–83, 86, 94, 111, 112
linear programming, 3, 70, 87, 88, 112, 127, 128
Lopez, C., 30, 38, 59–64, 66
Malik, J., 3, 21, 22, 28, 29, 31, 37, 40–42, 49, 56, 58–60, 62, 63, 65, 69, 70, 111
Mallat, S., 8
Mitter, S., 8
Morel, J.-M., 30, 38, 59–64, 66
Mumford, D., 35, 38, 66
Mumford-Shah functional, 3, 23, 30, 38, 61, 62, 66, 70, 83, 85, 86, 111, 112
orientation, 23, 99, 104, 106, 108, 109, 112
Osher, S., 31, 38, 66
Pavlidis, T., 30, 34
Perona, P., 3, 21, 22, 28, 29, 31, 37, 40–42, 49, 56, 58–60, 62, 63, 65, 69, 70, 106, 111
region merging, 22, 29–31, 34, 37, 38, 48, 51, 56, 59–61, 64–67, 74, 77, 85, 87, 100, 106–108, 111
restoration, 19, 20, 22, 25, 31, 34, 37, 60, 66, 69, 112
Reynolds, G., 22, 34, 35, 38, 66, 111
robustness, 3, 19, 21–23, 31, 37, 38, 60–64, 66, 69, 70, 88, 94, 96, 101, 111–113
Rudin, L., 31, 38, 66
total variation, 22, 31, 34, 38, 66, 67, 111
ultrasound, 21, 22, 38, 65, 97, 111
vector-valued images, 3, 23, 99–103, 107, 108, 112
Weickert, J., 29
well-posedness, 42, 45–47, 101, 111
Willsky, A.S., 7, 8, 9
Witkin, A., 27
Zhu, S.-C., 35, 38, 66