0% found this document useful (0 votes)

38 views136 pages

Nonlinear Scale Space Analysis in Image Proce

This thesis is submitted by Ilya Pollak to the Department of Electrical Engineering and Computer Science at MIT in partial fulfillment of the requirements for a Doctor of Philosophy degree. The thesis focuses on developing robust and fast image segmentation algorithms that can handle large-amplitude noise like speckle noise in synthetic aperture radar images. It introduces a family of partial differential equations called stabilized inverse diffusion equations that can suppress noise while sharpening edges. The thesis is supervised by Professor Alan Willsky and Professor Hamid Krim of North Carolina State University.

Uploaded by

saujanya rao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

0% found this document useful (0 votes)

38 views136 pages

Nonlinear Scale Space Analysis in Image Proce

Uploaded by

saujanya rao

We take content rights seriously. If you suspect this is your content, claim it here.

Available Formats

Download as PDF, TXT or read online on Scribd

You are on page 1/ 136

Nonlinear Scale Space Analysis in Image Processing

Ilya Pollak

B.S. and M.Eng, Electrical Engineering and Computer Science

Massachusetts Institute of Technology, 1995

Submitted to the Department of Electrical Engineering and Computer Science in

partial fulfillment of the requirements for the degree of

Doctor of Philosophy
in
Electrical Engineering and Computer Science
at the Massachusetts Institute of Technology

August, 1999

°
c 1999 Massachusetts Institute of Technology
All Rights Reserved.

Signature of Author:
Dept. of Electrical Engineering and Computer Science
July 20, 2000
Certified by:
Alan S. Willsky
Professor of EECS
Thesis Supervisor
Certified by:
Hamid Krim
Professor of ECE, North Carolina State University,
Thesis Supervisor
Accepted by:
Arthur C. Smith
Professor of EECS
Chair, Committee for Graduate Students
2
Nonlinear Scale Space Analysis in Image Processing
by Ilya Pollak
ipollak@alum.mit.edu
Submitted to the Department of Electrical Engineering
and Computer Science on July 20, 2000
in Partial Fulfillment of the Requirements for the Degree
of Doctor of Philosophy in Electrical Engineering and Computer Science

Abstract
The objective of this work is to develop and analyze robust and fast image segmen-
tation algorithms. They must be robust to pervasive, large-amplitude noise, which
cannot be well characterized in terms of probabilistic distributions. This is because the
applications of interest include synthetic aperture radar (SAR) segmentation in which
speckle noise is a well-known problem that has defeated many algorithms. The meth-
ods must also be robust to blur, because many imaging techniques result in smoothed
images. For example, SAR image formation has a natural blur associated with it, due
to the finite aperture used in forming the image. We introduce a family of first-order
multi-dimensional ordinary differential equations with discontinuous right-hand sides
and demonstrate their applicability to segmenting both scalar-valued and vector-valued
images, as well as images taking values on a circle. An equation belonging to this family
is an inverse diffusion everywhere except at local extrema, where some stabilization is
introduced. For this reason, we call these equations “stabilized inverse diffusion equa-
tions” (“SIDEs”). Existence and uniqueness of solutions, as well as stability, are proven
for SIDEs. A SIDE in one spatial dimension may be interpreted as a limiting case of a
semi-discretized Perona-Malik equation [49,50], which, in turn, was proposed in order to
overcome certain shortcomings of Gaussian scale spaces [72]. These existing techniques
are reviewed in a background chapter. SIDEs are then described and experimentally
shown to suppress noise while sharpening edges present in the input image. Their ap-
plication to the detection of abrupt changes in 1-D signals is also demonstrated. It is
shown that a version of the SIDEs optimally solves certain change detection problems.
Its relations to the Mumford-Shah functional [44] and to linear programming are dis-
cussed. Theoretical performance analysis is carried out, and a fast implementation of
the algorithm is described.

Thesis Supervisor: Alan S. Willsky

Title: Professor of Electrical Engineering and Computer Science

Thesis Supervisor: Hamid Krim

Title: Assistant Professor of Electrical and Computer Engineering, North Carolina
State University
4
This thesis is dedicated to the memory of my

mother, Tamara B. Skomorovskaya (1935-1997).

6
Acknowledgments
After spending eight years at MIT, all in the same group and in the same office,
I have the dubious distinction of having been in the Stochastic Systems Group (SSG)
longer than any current (and, I suspect, past) member except our leader Alan Will-
sky1 . I have seen many people go through the group: dozens of graduate students,
six secretaries, numerous post-docs, research scientists, and visitors—sufficient to staff
an Electrical Engineering department of a medium-size college, and still have enough
administrative assistants left over to sustain “Murphy Brown” for at least a couple of
seasons.
I have seen two people engaged in a rubber-band fight; they are now a quantitative
analyst at a large financial company and an assistant professor at a top-ranked univer-
sity. I remember an attempted coup d’etat in the group when two other people (also
now assistant professors), using the fact that Alan was traveling, decided to take a shot
at the sacred tradition of naming the computer equipment after fine wines, by calling
our new printer “Gallo”. Luckily, Alan’s speedy and decisive actions upon his return
restored the order: the printer was re-named “Chenas”. Yet another member of the
group (also an assistant professor now) once surprised me with his masterful use of a
few Russian choice words; it turned out that he learned those especially for me, using
his five-language dictionary of obscenities. Even more entertaining is the ongoing feud
between two current group members: actions taken have included sealing the opponent
inside a room, putting something interesting onto his desk (e.g., another desk), invent-
ing a nickname for him and cleverly working it—in a disguised form—into a seminar
on one’s research, etc. In short, I have seen a lot. Therefore, despite the fact that my
acknowledgment section is unusually large, I cannot possibly name all the people who
have contributed to making my experience at MIT an enjoyable and entertaining one.
First and foremost, I would like to thank my thesis supervisor Alan Willsky, who
was the ideal advisor: not only a great scientist, but also a very enthusiastic and
supportive mentor, providing broad guidance and posing interesting and important
problems, yet allowing me virtually unrestricted freedom. His encouragement when
research progressed well and his patience when it did not, were extremely helpful.
Equally important was the fact that Alan was a fantastic collaborator: he was always
up to date on every minor technical detail (in spite of his large number of students),
which is why I never felt isolated in my research. His many suggestions on how to
deal with difficult technical issues, as well as his vast knowledge of the literature, were
invaluable. The speed with which he read drafts of papers and thesis chapters was

1
Despite this fact, I probably hold another dubious record—namely, having the shortest dissertation.
I think there have been people in the group whose theses had sentences longer than this whole document.

7
8 ACKNOWLEDGMENTS

sometimes frightening; by following his comments and suggestions, one would typically
improve the quality of a paper by orders of magnitude. His help and advice—and, I am
sure, his reputation—were also instrumental to the success of my job-hunting campaign.
My co-supervisor Hamid Krim was equally supportive; I am very thankful to him for
having an “open door” and for the endless hours we spent discussing research, politics,
soccer, religion, and life in general. I am grateful to Hamid for inviting me to spend
several days in North Carolina, and for his immense help during my job search.
The suggestions and advice of the two remaining committee members—Olivier
Faugeras and Sanjoy Mitter—also greatly contributed both to my thesis and to finding
a good academic position. I am indebted to Olivier for inviting me to visit his group
at INRIA. I thoroughly enjoyed the month I spent in Antibes and Sophia-Antipolis, for
which I would like to thank the whole ROBOTVIS group: Didier Bondyfalat, Sylvain
Bougnoux, Rachid (Nour-Eddine) Deriche, Cyrille Gauclin, José Gomez, Pierre Korn-
probst, Marie-Cécile Lafont, Diane Lingrand, Théo Papadopoulo, Nikos Paragios, Luc
Robert, Robert Stahr, Thierry Viéville, and Imad Zoghlami.
My special thanks go to Stéphane Mallat for inviting me to give a seminar at École
Polytechnique, and for his help in my job search. He has greatly influenced my profes-
sional development, both through a class on wavelets which he taught very engagingly
and enthusiastically at MIT in 1994, and through his research, some of which was the
basis for my Master’s thesis.
I would like to thank Stuart Geman for suggesting the dynamic programming solu-
tion of Chapter 4, as well as for a number of other interesting and useful insights.
I am thankful for the stimulating discussions with Michele Basseville, Charlie Bouman,
Yoram Bresler, Patrick Combettes, David Donoho, Al Hero, Mohamed Khodja, Jiten-
dra Malik, Jean-Michel Morel, David Mumford, Igor Nikiforov, Pietro Perona, Jean-
Christophe Pesquet, Guillermo Sapiro, Eero Simoncelli, Gil Strang, Allen Tannenbaum,
Vadim Utkin, and Song-Chun Zhu, all of whom have contributed to improving the qual-
ity of my thesis and to broadening my research horizon.
In addition to the people I acknowledged above, Eric Miller, Peyman Milanfar, and
Andy Singer generously shared with me many intricacies of the academic job search.
I thank the National Science Foundation and Alan Willsky for providing the financial
support, through a Fellowship and a Research Assistantship, respectively. I thank Clem
Karl, Paul Fieguth and Andrew Kim for all their help with computers; Ben Halpern
for teaching me repeatedly and patiently how to use the fax machine, and for helping
me to tame our color printer; Mike Daniel and Austin Frakt for sharing the LATEXstyle
file which was used to format this thesis; John Fisher for organizing a very informative
reading group on learning theory; Asuman Koksal for making our office a little cozier;
Andy Tsai for being a great officemate; Dewey Tucker for his help in generating my
PowerPoint job talk; Jun Zhang and Tony Yezzi both for technical interactions and for
in-depth discussions of the opera; Mike Schneider for sharing his encyclopedic knowledge
(and peculiar tastes) of the art in general, and for educating me on a number of other
topics, ranging from estimation and linear programming to botany and ethnography,
ACKNOWLEDGMENTS 9

as well as for volunteering to be my personal guide and chauffeur in Phoenix, AZ;

Ron Dror for our numerous discussions of biological and computer vision; and Martin
Wainwright for conveying some interesting facts about probability and natural image
statistics, and for having even more peculiar tastes in arts than Mike. I also enjoyed
the sometimes heated arguments with Khalid Daoudi over soccer, but I still think he
was wrong about the penalty in that Brazil vs. Norway game! The unfortunate lack of
a resident ballet expert in SSG was rectified by my summer employment at Alphatech,
Inc. (in Burlington, MA), where I had a chance to exchange opinions and ballet tapes
with Bob Tenney.
I enjoyed interacting with all other SSG group members and visitors, past and
present. The ones whose names are not mentioned above are Lori Belcastro, François
Bergeaud, Mickey Bhatia, Terrence Ho, Taylore Kelly, Junmo Kim, Alex Ihler, Bill
Irving, Seema Jaggi, Rachel Learned, Cedric Logan, Mark Luettgen, Mike Perrott,
Paula Place, John Richards, and Gerald Spica.
I thank my brother Nick for his leadership by example in showing me that an
excellent PhD thesis can be finished even under very unfavorable circumstances, such
as having two small children and a full-time job.
I am most grateful to my parents who gladly endured many sacrifices that accom-
panied our emigration, so that my brother and I could have a better life in the United
States. Without their strength and courage I would not have had a chance to study in
the best university in the world; without their tremendous support, I would not have
been able to complete my studies. As I know how important my getting this degree
was to my mother, I dedicate this thesis to her memory.
10
Contents

Abstract 3

Acknowledgments 7

List of Figures 15

1 Introduction 19
1.1 Problem Description and Motivation . . . . . . . . . . . . . . . . . . . . 19
1.2 Summary of Contributions and Thesis Organization . . . . . . . . . . . 22

2 Preliminaries 25
2.1 Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 25
2.2 Linear and Non-linear Diffusions. . . . . . . . . . . . . . . . . . . . . . . 27
2.3 Region Merging Segmentation Algorithms. . . . . . . . . . . . . . . . . . 30
2.4 Shock Filters and Total Variation. . . . . . . . . . . . . . . . . . . . . . 31
2.5 Constrained Restoration of Geman and Reynolds. . . . . . . . . . . . . . 34
2.6 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34

3 Image Segmentation with Stabilized Inverse Diffusion Equations 37

3.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 37
3.2 A Spring-Mass Model for Certain Evolution Equations. . . . . . . . . . 38
3.3 Stabilized Inverse Diffusion Equations (SIDEs): The Definition. . . . . . 41
3.4 Properties of SIDEs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 45
3.4.1 Basic Properties in 1-D. . . . . . . . . . . . . . . . . . . . . . . . 45
3.4.2 Energy Dissipation in 1-D. . . . . . . . . . . . . . . . . . . . . . 49
3.4.3 Properties in 2-D. . . . . . . . . . . . . . . . . . . . . . . . . . . 56
3.5 Experiments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 57
3.5.1 Experiment 1: 1-D Unit Step in High Noise Environment. . . . . 57
3.5.2 Experiment 2: Edge Enhancement in 1-D. . . . . . . . . . . . . . 60
3.5.3 Experiment 3: Robustness in 1-D. . . . . . . . . . . . . . . . . . 61
3.5.4 Experiment 4: SIDE Evolutions in 2-D. . . . . . . . . . . . . . . 64
3.6 Related Approaches. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 66

11
12 CONTENTS

3.6.1 Mumford-Shah, Geman-Reynolds, and Zhu-Mumford. . . . . . . 66

3.6.2 Shock Filters and Total Variation. . . . . . . . . . . . . . . . . . 66
3.7 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 67

4 Probabilistic Analysis 69
4.1 Introduction. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 69
4.2 Background and Notation. . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.3 SIDE as an Optimizer of a Statistic. . . . . . . . . . . . . . . . . . . . . 73
4.3.1 Implementation of the SIDE Via a Region Merging Algorithm. . 75
4.4 Detection Problems Optimally Solved by the SIDE. . . . . . . . . . . . . 79
4.4.1 Two Distributions with Known Parameters. . . . . . . . . . . . . 79
4.4.2 Two Gaussian Distributions with Unknown Means. . . . . . . . . 81
4.4.3 Random Number of Edges and the Mumford-Shah Functional. . 83
4.5 Alternative Implementations. . . . . . . . . . . . . . . . . . . . . . . . . 87
4.5.1 Dynamic Programming. . . . . . . . . . . . . . . . . . . . . . . . 87
4.5.2 An Equivalent Linear Program. . . . . . . . . . . . . . . . . . . . 87
4.6 Performance Analysis. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.6.1 Probability Bounds. . . . . . . . . . . . . . . . . . . . . . . . . . 88
4.6.2 White Gaussian Noise. . . . . . . . . . . . . . . . . . . . . . . . . 91
4.6.3 H∞ -Like Optimality. . . . . . . . . . . . . . . . . . . . . . . . . . 94
4.7 Analysis in 2-D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 96

5 Segmentation of Color, Texture, and Orientation Images 99

5.1 Vector-Valued Images. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 99
5.1.1 Experiment 1: Color Images. . . . . . . . . . . . . . . . . . . . . 101
5.1.2 Experiment 2: Texture Images. . . . . . . . . . . . . . . . . . . . 102
5.2 Orientation Diffusions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 106
5.2.1 Experiments. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 108
5.3 Conclusion. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 109

6 Conclusions and Future Research 111

6.1 Contributions of the Thesis. . . . . . . . . . . . . . . . . . . . . . . . . . 111
6.2 Future Research. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2.1 Feature Extraction. . . . . . . . . . . . . . . . . . . . . . . . . . 112
6.2.2 Choosing the Force Function. . . . . . . . . . . . . . . . . . . . . 112
6.2.3 Choosing the Stopping Rule. . . . . . . . . . . . . . . . . . . . . 113
6.2.4 Image Restoration. . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.5 PDE Formulation. . . . . . . . . . . . . . . . . . . . . . . . . . . 114
6.2.6 Further Probabilistic Analysis. . . . . . . . . . . . . . . . . . . . 114
6.2.7 Prior Knowledge. . . . . . . . . . . . . . . . . . . . . . . . . . . . 114

A Proof of Lemma on Sliding (Chapter 3) 115

CONTENTS 13

B Proofs for Chapter 4. 117

B.1 Proof of Proposition 4.1. . . . . . . . . . . . . . . . . . . . . . . . . . . . 117
B.2 Proof of Proposition 4.2: SIDE as a Maximizer of a Statistic. . . . . . . 118
B.3 Proof of Lemma 4.3. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 124
B.4 Completion of the Proof of Proposition 4.6. . . . . . . . . . . . . . . . . 126
B.5 Equivalence of the SIDE (4.1) to a Linear Programming Problem. . . . 127

Bibliography 129

Index 135
14 CONTENTS
List of Figures

1.1 SAR image of trees and grass. . . . . . . . . . . . . . . . . . . . . . . . . 19

2.1 (a) An artificial image; (b) the edges corresponding to the image in (a);
(c) the image in (a) blurred with a Gaussian kernel; (d) the edges cor-
responding to the blurred image. Note that T-junctions are removed,
corners are rounded, and two black squares are merged together. The
edges here are the maxima of the absolute value of the gradient. . . . . 28
2.2 The G function from the right-hand side of the Perona-Malik equation
(2.11). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
2.3 The F function from the right-hand side of the Perona-Malik equation
(2.12). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.4 (b) Gaussian blurring of the signal depicted in (a). (c) Signal depicted in
(b), with additive white Gaussian noise of variance 0.1. (d) The steady
state of the shock filter (2.15), with the signal (b) as the initial condition.
The reconstruction is perfect, modulo numerical errors. (e) The steady
state of the shock filter (2.15), with the signal (c) as the initial condition.
It is virtually the same as (c), since all extrema remain stationary. . . . 32
2.5 Filtering the blurred unit step signal of Figure 2.4, (b) with the shock
filter (2.16): (a) 5 iterations, (b) 10 iterations, (c) 18 iterations. Spurious
maxima and minima are created; the unit step is never restored. . . . . 33
2.6 The SIDE energy function, also encountered in the models of Geman and
Reynolds, and Zhu and Mumford. . . . . . . . . . . . . . . . . . . . . . . 35

3.1 A spring-mass model. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 38

3.2 Force functions: (a) diffusion; (b) inverse diffusion; (c) Perona-Malik. . . 40
3.3 Spring-mass model in 2-D (view from above). . . . . . . . . . . . . . . . 41
3.4 Force function for a stabilized inverse diffusion equation. . . . . . . . . . 42
3.5 A horizontal spring is replaced by a rigid link. . . . . . . . . . . . . . . . 43
3.6 Solution field near discontinuity surfaces. . . . . . . . . . . . . . . . . . 46
3.7 Typical picture of the energy dissipation during the evolution of a SIDE. 53
3.8 A modified force function, for which sliding happens in 2-D, as well as
in 1-D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56

15
16 LIST OF FIGURES

3.9 The SIDE force function used in the experimental section. . . . . . . . . 57

3.10 Scale space of a SIDE for a noisy unit step at location 100: (a) the
original signal; (b)–(d) representatives of the resulting SIDE scale space. 58
3.11 Scale space of a Perona-Malik equation with a large K for the noisy step
of Figure 3.10. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 58
3.12 Scale space of the region merging algorithm of Koepfler, Lopez, and
Morel for the noisy unit step signal of Figure 3.10(a). . . . . . . . . . . . 59
3.13 Scale space of a SIDE for a noisy blurred 3-edge staircase: (a) noise-
free original signal; (b) its blurred version with additive noise; (c),(d)
representatives of the resulting SIDE scale space. . . . . . . . . . . . . . 60
3.14 A unit step with heavy-tailed noise. . . . . . . . . . . . . . . . . . . . . 61
3.15 Scale spaces for the signal of Figure 3.14: SIDE (left) and Koepfler-
Lopez-Morel (right). Top: 33 regions; middle: 11 regions; bottom: 2
regions. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 61
3.16 Mean absolute errors for Monte-Carlo runs. (Koepfler-Lopez-Morel: solid
line; SIDE: broken line.) The error bars are ±two standard deviations.
(a) Different contamination probabilities (0, 0.05, 0.1, and 0.15); contam-
inating standard deviation is fixed at 2. (b) Contamination probability
is fixed at 0.15; different contaminating standard deviations (1, 2, and 3). 62
3.17 Scale space of a Perona-Malik equation with large K for the signal of
Figure 3.14. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 63
3.18 Scale space of a SIDE for the SAR image of trees and grass, and the final
boundary superimposed on the initial image. . . . . . . . . . . . . . . . 64
3.19 Segmentations of the SAR image via the region merging method of
Koepfler, Lopez, and Morel. . . . . . . . . . . . . . . . . . . . . . . . . . 64
3.20 Scale space of a SIDE for the ultrasound image of a thyroid. . . . . . . . 65

4.1 Functions F from the right-hand side of the SIDE: (a) generic form; (b)
the signum function. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 70
4.2 Illustrations of Definitions 4.1 and 4.3: a sequence with three α-crossings,
where α=3 (top); the hypothesis generated by the three α-crossings (mid-
dle); the hypothesis generated by the two rightmost α-crossings (bottom). 72
4.3 Edge detection for a binary signal in Gaussian noise. . . . . . . . . . . . 80
4.4 Detection of changes in variance of Gaussian noise. . . . . . . . . . . . . 80
4.5 Edge detection in 2-D. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97

5.1 Spring-mass model for vector-valued diffusions. This figure shows a 2-

by-2 image whose pixels are two-vectors: (2,2), (0,0), (0,1), and (1,2).
The pixel values are depicted, with each pixel connected by springs to
its neighboring pixels. . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
5.2 (a) A test image; (b) its noisy version (normalized); (c) detected bound-
ary, superimposed onto the noise-free image . . . . . . . . . . . . . . . . 101
LIST OF FIGURES 17

5.3 (a) A test image; (b) its noisy version (normalized); (c) detected bound-
ary, superimposed onto the noise-free image . . . . . . . . . . . . . . . . 101
5.4 (a) Image of two textures: fabric (left) and grass (right); (b) the ideal
segmentation of the image in (a). . . . . . . . . . . . . . . . . . . . . . . 102
5.5 (a-c) Filters; (d-f) Filtered versions of the image in Figure 5.4, (a). . . . 102
5.6 (a) Two-region segmentation, and (b) its deviation from the ideal one. . 103
5.7 (a) A different feature image: the direction of the gradient; (b) the cor-
responding two-region segmentation, and (c) its deviation from the ideal
one. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 103
5.8 (a) Image of two wood textures; (b) the ideal segmentation of the image
in (a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 104
5.9 (a) Feature image for the wood textures; (b) the corresponding five-region
segmentation, and (c) its deviation from the ideal one. . . . . . . . . . . 104
5.10 (a) Another image of two wood textures; (b) the ideal segmentation of
the image in (a). . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.11 (a) Feature image for the wood textures in Figure 5.10, (a); (b) the
corresponding five-region segmentation, and (c) its deviation from the
ideal one. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
5.12 A SIDE energy function which is flat at π and −π and therefore results
in a force function which vanishes at π and −π. . . . . . . . . . . . . . . 106
5.13 (a) The orientation image for Figure 5.7, (a); (b) the corresponding two-
region segmentation, and (c) its deviation from the ideal one. . . . . . . 108
5.14 (a) The orientation image for Figure 5.9, (a); (b) the corresponding five-
region segmentation, and (c) its deviation from the ideal one. . . . . . . 109
5.15 (a) The orientation image for Figure 5.10, (a); (b) the corresponding
five-region segmentation, and (c) its deviation from the ideal one. . . . . 109

6.1 A force function which results in faster removal of outliers. . . . . . . . 113

B.1 Samples of a signal (top plots) and impossible edge configurations of

optimal hypotheses (bottom plots). . . . . . . . . . . . . . . . . . . . . . 117
18 LIST OF FIGURES
Chapter 1

Introduction

N this chapter, we introduce the problem of segmentation and change detection ad-
I dressed in this thesis, and describe the organization of the thesis.

¥ 1.1 Problem Description and Motivation

To segment a 1-D or 2-D signal means, roughly speaking, to partition the domain of
its definition into several regions in such a way that the signal is homogeneous within
each region and changes abruptly between regions. The exact meaning of the word
“homogeneous” depends on the application: most often it means smoothly varying or
constant intensity (this case is addressed in Chapters 3 and 4), or uniform texture
(Chapters 4 and 5).
The objective of this thesis is to develop and analyze robust and fast image seg-
mentation algorithms. They must be robust to pervasive, large-amplitude noise, which
cannot be well characterized in terms of probabilistic distributions. This is because the
applications of interest are exemplified by synthetic aperture radar (SAR) segmenta-
tion in which speckle noise is a well-known problem that has defeated many algorithms.
(A prototypical SAR log-magnitude image of two textural regions–forest and grass–is
shown in Figure 1.1.) The methods must also be robust to blur, because many imaging
techniques result in smoothed images. For example, SAR image formation has a natural
blur associated with it, due to the finite aperture used in forming the image.
SAR image of forest and grass

100

200
100 200

Figure 1.1. SAR image of trees and grass.

19
20 CHAPTER 1. INTRODUCTION

Image segmentation is closely related to restoration, that is, the problem of estimat-
ing an image based on its degraded observation. Indeed, the solution to one of these
problems makes the other simpler: estimation is easier if the boundaries of homogeneous
image regions are known, and vice versa, segmentation is easier once a good estimate
of the image has been computed. It is therefore natural that many segmentation al-
gorithms are related to restoration techniques, and in fact some methods combine the
two, producing estimates of both the edge locations and image intensity [36, 44], as we
will see in Chapter 2.
In describing any restoration or segmentation technique, the notion of scale is very
important. Any such technique incorporates a scale parameter—either directly in the
computation procedure, or implicitly as a part of the image model—which controls
the smoothness of the estimate and/or sizes of the segmented regions. The precise
definitions of scale, in several contexts, are given in Chapters 2 and 3; intuitively,
changing the parameter from zero to infinity will produce a so-called scale space, i.e.
a set of increasingly coarse versions of the input image. There are two approaches to
generating a scale space: one starts with a probabilistic model, the other starts with a
set of “common-sense” heuristics. The difference between the two is conceptual: they
both may lead to the same algorithm [39, 44], producing the same scale space. In the
former case, one would build a probabilistic model of images of interest [6, 32, 39] and
proceed to derive an algorithm for computing the solution which is, in some probabilistic
sense, optimal. For example, one could model images as piecewise constant functions
with additive white Gaussian noise, and the edges (i.e. the boundaries separating the
constant pieces of the function) as continuous curves whose total length is a random
variable with a known distribution. Assigning larger probabilities to the occurrence of
edges would correspond to larger scales in such a model, which will be illustrated in
Chapter 4. Given a realization of this random field, the objective could be to compute
the maximum likelihood estimates [66] of the edge locations. The main shortcoming of
this approach is that a good model is unavailable in many applications, and that usually
any realistic model yields a complicated objective functional to be optimized. Obtaining
the optimal solution is therefore not computationally feasible, and one typically settles
for a local maximum [6, 63]. An alternative to such probabilistic methods of generating
scale spaces is to devise an algorithm using a heuristic description of images of interest.
Stabilized Inverse Diffusion Equations (SIDEs), which are the main topic of this thesis,
belong to this latter category.
SIDEs are motivated by the great recent interest in using evolutions specified by
partial differential equations (PDE’s) as image processing procedures for tasks such as
restoration and segmentation, among others [1, 12, 35, 46, 49, 50, 55–57, 72]. The basic
paradigm behind SIDEs, borrowed from [1, 35, 49, 72], is to treat the input image as the
initial data for a diffusion-like differential equation. The unknown in this equation is
usually a function of three variables: two spatial variables (one for each image dimen-
sion) and the scale—which is also called time because of the similarity of such equations
to evolution equations encountered in physics. In fact, one of the starting points of this
Sec. 1.1. Problem Description and Motivation 21

line of investigation was the observation [72] that smoothing an image with Gaussians
of varying width is equivalent to solving the linear heat diffusion equation with the
image as the initial condition. Specifically, the solution to the heat equation at time
t is the convolution of its initial condition with a Gaussian of variance 2t. Gaussian
filtering has been used both to remove noise and as a pre-processor for edge detection
procedures [9]. It has serious drawbacks, however: it displaces and removes important
image features, such as edges, corners, and T-junctions. (An example of this behavior
will be given in Chapter 2 (Figure 2.1).) The interpretation of Gaussian filtering as a
linear diffusion led to the design of other, nonlinear, evolution equations, which better
preserve these features [1, 46, 49, 55–57]. For example, one motivation for the work of
Perona and Malik in [49, 50] is achieving both noise removal and edge enhancement
through the use of an equation which in essence acts as an unstable inverse diffusion
near edges and as a stable linear-heat-equation-like diffusion in homogeneous regions
without edges.
The point of departure for the development of our SIDEs are the anisotropic diffu-
sions introduced by Perona and Malik and described in the next chapter. In a sense
that we will make both precise and conceptually clear, the evolutions that we intro-
duce may be viewed as a conceptually limiting case of the Perona-Malik diffusions.
These evolutions have discontinuous right-hand sides and act as inverse diffusions “al-
most everywhere” with stabilization resulting from the presence of the discontinuities
in the right-hand side. As we will see, the scale space of such an equation is a fam-
ily of segmentations of the original image, with larger values of the scale parameter t
corresponding to segmentations at coarser resolutions.
Since “segmentation” may have different meanings in different contexts, we close
this section by further clarifying which segmentation problems are addressed in this
thesis and which are not. This also gives us an opportunity to mention four important
application areas.

Automatic Target Recognition. Segmentation problems arise in many aspects of

image processing for automatic target recognition. Some examples are the iden-
tification of tree-grass boundaries in synthetic aperture radar images [18] (an
example given in the beginning of this section will be further treated in Chapter
3), the localization of hot targets in infrared radar [15], and the separation of
foreground and background in laser radar (see [23] and references therein). Due
to high rates of data acquisition, it is desirable for some of these problems that the
algorithm be fast, in addition to being robust. As shown in Chapters 3 and 4, the
algorithms presented in this thesis outperform existing segmentation algorithms
in speed and/or robustness.

Segmentation of medical images. Given an image or a set of images resulting from

a medical imaging procedure—such as ultrasound [13], magnetic resonance imag-
ing [2, 64], tomography [6, 27], dermatoscopy [19, 60]—it is necessary to extract
certain objects of interest, e. g. an internal organ, a tumor, or a boundary between
22 CHAPTER 1. INTRODUCTION

the gray matter and white matter. The main challenges of the segmentation prob-
lem depend on the object and on the imaging modality. For example, ultrasound
imaging introduces both significant blurring and speckle noise [8, 22, 70], and so
the corresponding segmentation algorithms must be robust to such degradations.
Robustness of the algorithm introduced in this thesis is experimentally demon-
strated in Chapters 3 and 4; Chapter 4 also contains its theoretical analysis.

Detection of abrupt changes in 1-D signals [3]. Application areas include analy-
sis of electrocardiograms and seismic signals, vibration monitoring in mechanical
structures, and quality control. Several synthetic 1-D examples are considered in
Chapters 3 and 4.

Computer vision. In the computer vision literature, the term “segmentation” often
refers to finding the contours of objects in natural images—i.e., photographic
pictures of scenes which we are likely to see in our everyday life [16]. This is an
important problem in low-level vision, because it has been universally accepted
since [71] that segmenting the perceived scene plays an important role in human
vision. However, noise and other types of degradation are usually not as significant
here as in the medical and radar images; the main challenge is the variety of
objects, shapes, and textures in a typical picture. This problem is therefore not
directly addressed in the present thesis; applying the algorithms developed here
to this problem is a topic for future research.

¥ 1.2 Summary of Contributions and Thesis Organization

Chapter 2 is a review of several methods of image restoration and segmentation, each
of which has important connections to the main contribution of this thesis, namely
Stabilized Inverse Diffusion Equations (SIDEs). We mentioned in the previous section
that SIDE evolutions produce nonlinear scale spaces, similarly to the equations in [1,49].
What sets SIDEs apart from other frameworks is the form of their right-hand sides which
results in improved performance. This is shown in Chapter 3, both through experiments
confirming the SIDEs’ speed and robustness, and through proving a number of their
useful properties, such as stability with respect to small changes in the input data.
It is also explained how the SIDEs are related to several existing methods, such as
Perona-Malik diffusions [49], shock filters [46], total variation minimization [6, 46, 58],
the Geman-Reynolds functional [21], and region merging [43].
The main drawback of the methods which are based on heuristic descriptions of
images of interest (rather than on probabilistic models) is that it is usually unclear how
they perform on noisy images; experimentation is the most popular way of demonstrat-
ing robustness [5,49,56,63]. This has been the case with diffusion-based methods: they
are complicated nonlinear procedures, and therefore theoretical analysis of what hap-
pens when a random field is subjected to such a procedure is usually impossible. Unlike
their predecessors, SIDEs turn out to be amenable to some probabilistic analysis, which
Sec. 1.2. Summary of Contributions and Thesis Organization 23

is carried out in Chapter 4. It is shown that a specific SIDE finds in N log N time the
maximum likelihood solutions to certain binary classification problems. The likelihood
function for one of these problems is essentially a 1-D version of the Mumford-Shah
functional [44]. Thus, an interesting link is established between diffusion equations,
Mumford and Shah’s variational formulation, and probabilistic models. The robustness
of the SIDE is explained by showing that, in a certain special case, it is optimal with re-
spect to an H∞ -like criterion—which, roughly speaking, means that the SIDE achieves
the minimum worst-case error. The performance is also analyzed by computing bounds
on the probabilities of errors in edge location estimates. To summarize, the main con-
tribution of Chapter 4 is establishing a connection between diffusion-based methods
and maximum likelihood edge detection, as well as extensive performance analysis.
Chapter 5 extends SIDEs to vector-valued images and images taking values on a
circle. We argue that most of the properties derived in Chapter 3 carry over. These
results are applicable to color segmentation, where the image value at every pixel is
a three-vector of red, green, and blue values. We also apply our algorithm to texture
segmentation, in which the vector image to be processed is formed by extracting features
from the raw texture image, as well as to segmenting orientation images.
Possible directions of future research are proposed in Chapter 6.
24 CHAPTER 1. INTRODUCTION
Chapter 2

Preliminaries

NUMBER of existing algorithms for image restoration and segmentation, which

A are related to SIDEs, are summarized in this chapter.

¥ 2.1 Notation.
In this section, we describe the notation which is used in the current chapter. Most
of this notation will carry over to the rest of the thesis; however, the large quantity of
symbols needed will force us to adopt a slightly different notation in Chapter 4—which
we will describe explicitly in Section 4.2.
We begin with the one-dimensional (1-D) case. The 1-D signal to be processed is
denoted by u0 (x). The superscript 0 is a reminder of the fact that the signal is to be
processed via a partial differential equation (PDE) of the following form:

ut = A1 (u, ux , uxx ) (2.1)

0
u(0, x) = u (x).

The variable t is called scale or time, and the solution u(t, x) to (2.1), for 0 ≤ t < ∞,
is called a scale space. The partial derivatives with respect to t and x are denoted by
subscripts, and A1 is an operator. The scale space is called linear (nonlinear) if A1 is
a linear (nonlinear) operator.
Similarly, an image u0 (x, y) depending on two spatial variables, x and y, will be
processed using a PDE of the form

ut = A2 (u, ux , uy , uxx , uyy , uxy ) (2.2)

0
u(0, x, y) = u (x, y),

which generates the scale space u(t, x, y), for 0 ≤ t < ∞. In the PDEs we consider,
the right-hand side will sometimes involve the gradient and divergence operators. The
gradient of u(t, x, y) is the two-vector consisting of the partial derivatives of u with
respect to the spatial variables x and y:
def
∇u = (ux , uy )T , (2.3)

25
26 CHAPTER 2. PRELIMINARIES

where the superscript T denotes the transpose of a vector. The norm of the gradient
is:
q
def
|∇u| = u2x + u2y (2.4)

The divergence of a vector function (u(x, y), v(x, y))T is:

µ ¶
~ u def
∇· = u x + vy . (2.5)
v

We also consider semi-discrete versions of (2.1) and (2.2), obtained by discretizing the
spatial variables and leaving t continuous. Specifically, an N -point 1-D discrete signal
to be processed is denoted by u0 ; it is an element of the N -dimensional vector space
IRN . We exclusively reserve boldface letters for vectors—i.e., discrete signals and
images. The vector u0 is the initial condition to the following N -dimensional ordinary
differential equation (ODE):

u̇(t) = B 1 (u(t)) (2.6)

0
u(0) = u ,

where u(t) is the corresponding scale space, and u̇(t) is its derivative with respect to
t. We denote the entries of an N -point signal by the same symbol as the signal itself,
with additional subscripts 1 through N :

u0 = (u01 , u02 , . . . , u0N −1 , u0N )T ;

u(t) = (u1 (t), u2 (t), . . . , uN −1 (t), uN (t))T .

Since most operators B1 of interest will involve first differences of the form un+1 − un ,
it will simplify our notation to also define non-existent samples u0 and uN +1 . Thus,
all vectors will implicitly be (N + 2)-dimensional. Typically, we will take u0 = u1 and
uN +1 = uN . We emphasize that subscripts 0 through N + 1 will always denote the
samples of a signal, whereas the superscript 0 will be reserved exclusively to denote the
signal which is the initial condition of a differential equation.
2
We similarly denote an N -by-N image to be processed by u0 ∈ IRN ; it will always
be clear from the context whether u0 refers to a 1-D or a 2-D discrete signal. The
corresponding system of ODEs is

u̇(t) = B 2 (u(t)) (2.7)

0
u(0) = u ,

where u0 and u(t) are matrices whose entries in the i-th row and j-th column are u0i,j
and ui,j (t), respectively.
Sec. 2.2. Linear and Non-linear Diffusions. 27

The operators B 1 and B 2 will typically be the negative gradient of some energy
functional, which we will denote by E(u). This energy will depend on the first differences
of u in the following way:
X
E(u) = E(us − ur ), (2.8)
(s,r)∈N

where
• E is an even function;

• s and r are single indices if u is a 1-D signal and pairs of indices if u is a 2-D
image;

• N is the list of all neighboring pairs of pixels: s and r are neighbors if and only
if (s, r) ∈ N .
We will use the following neighborhood structure in 1-D:
N −1
N = {(n, n + 1)}n=1 . (2.9)

In other words, the sample at n has two neighbors: at n − 1 and at n + 1. We use

a similar neighborhood structure in 2-D, where each pixel (i, j) has four neighbors:
(i − 1, j), (i + 1, j), (i, j − 1), and (i, j + 1).

¥ 2.2 Linear and Non-linear Diffusions.

To understand the conceptual basis of SIDEs, it is useful to briefly review one of the lines
of thought that has spurred work in evolution-based methods for image analysis. In [72]
Witkin proposed filtering an original image u0 (x, y) with Gaussian kernels of variance
2t, to result in a one-parameter family of images u(t, x, y) he referred to as “a scale
space”. This filtering technique has both a very important interpretation and a number
of significant limitations that inspired the search for alternative scale spaces, better
adapted to edge detection and image segmentation. In particular, a major limitation is
that linear Gaussian smoothing blurs and displaces edges, merges boundaries of objects
that are close to each other, and removes edge junctions [55], as illustrated in Figure
2.1.
However, the important insight found, for example, in [35], is that the family of
images u(t, x, y) is the solution of the linear heat equation with u0 (x, y) as the initial
data:

ut = uxx + uyy (2.10)

0
u(0, x, y) = u (x, y),

where the subscripts denote partial derivatives. This insight led to the pursuit and
development of a new paradigm for processing images via the evolution of nonlinear
28 CHAPTER 2. PRELIMINARIES

(a) An image. (b) Its edges.

(c) Blurred image. (d) Corresponding edges.

Figure 2.1. (a) An artificial image; (b) the edges corresponding to the image in (a); (c) the image
in (a) blurred with a Gaussian kernel; (d) the edges corresponding to the blurred image. Note that
T-junctions are removed, corners are rounded, and two black squares are merged together. The edges
here are the maxima of the absolute value of the gradient.

PDEs [1, 46, 49, 50, 56] which effectively lift the limitations of the linear heat equation.
For example, in [49, 50], Perona and Malik propose to achieve both noise removal and
edge enhancement through the use of a non-uniform diffusion which in essence acts as an
unstable inverse diffusion near edges and as a stable linear-heat-equation-like diffusion
in homogeneous regions without edges:
~ · {G(|∇u|)∇u} ,
ut = ∇ (2.11)
0
u(0, x, y) = u (x, y),
~ and ∇ are the divergence (2.5) and gradient (2.3), respectively. The nonlinear
where ∇

G(v)

v
Figure 2.2. The G function from the right-hand side of the Perona-Malik equation (2.11).

diffusion coefficient G(|∇u|) is chosen so as to suppress diffusion in the regions of high

gradient which are identified with edges, and to encourage diffusion in low-gradient
regions identified with noise (Figure 2.2). More formally, G is a nonnegative monoton-
Sec. 2.2. Linear and Non-linear Diffusions. 29

ically decreasing function with G(0) = 1. (Note that if G were identically equal to 1,
~ · (∇u) = uxx + uyy .)
then (2.11) would turn into the linear heat equation (2.10), since ∇
To simplify the analysis of the behavior of this equation near edges, we re-write it
below in one spatial dimension; however, the statements we make also apply to 2-D.
∂
ut = {F (ux )} (2.12)
∂x
u(0, x) = u0 (x),
where F (ux ) = G(|ux |)ux , i.e., F is odd, and tends to zero at infinity. Perona and
Malik also impose that F have a unique maximum at some location K (Figure 2.3).
This constant K is the threshold between diffusion and enhancement, in the following
sense. If, for a particular time t = t0 , we define an “edge” of u(t0 , x) as an inflection
F(v)

-K K v

Figure 2.3. The F function from the right-hand side of the Perona-Malik equation (2.12).

point with the property ux uxxx < 0, then a simple calculation shows that all such
edges where |ux | < K will be diminished by (2.12)—i.e. |ux | will be reduced, while
the larger edges, with |ux | > K, will be enhanced. It has been observed [34] that the
numerical implementations of (2.12) do not exactly exhibit this behavior, although they
do produce temporary enhancement of edges, resulting in both noise removal and scale
spaces in which the edges are much more stable across scale than in linear scale spaces.
As Weickert pointed out in [69], “a scale-space representation cannot perform better
than its discrete realization”. These observations naturally led to a closer analysis
(described in the next chapter) of a semi-discrete counterpart of (2.12), i.e., of the
following system of ordinary differential equations:
u̇n = F (un+1 − un ) − F (un − un−1 ), n = 1, . . . , N, (2.13)
0
u(0) = u ,
where u0 = (u01 , . . . , u0N )T ∈ IRN is the signal to be processed, and where the conven-
tions uN +1 = uN and u0 = u1 are used.
The 2-D semi-discrete version of the Perona-Malik equation is similar:
u̇ij = F (ui+1,j − uij ) − F (uij − ui−1,j )
+ F (ui,j+1 − uij ) − F (uij − ui,j−1 ),
u(0) = u0 ,
with i = 1, 2, . . . , N, j = 1, 2, . . . , N, and with the conventions u0,j = u1,j , uN +1,j =
uN,j , ui,0 = ui,1 and ui,N +1 = ui,N .
30 CHAPTER 2. PRELIMINARIES

¥ 2.3 Region Merging Segmentation Algorithms.

It is shown in the next chapter that SIDEs can be classified under a very broad category
of multiscale region merging segmentation methods.
The goal of image segmentation is to partition the domain Ω of definition of a given
image u0 into several disjoint regions O1 , ... , Ok (∪ki=1 Oi = Ω), such that u0 is, in some
sense, homogeneous within each region. Many segmentation algorithms also provide a
filtered version u of u0 , which is in essence a smoothing of u0 within each region
and has abrupt changes between regions. A boundary between two such neighboring
regions is called an edge. If a segmentation E2 can be obtained from a segmentation
E1 by erasing edges, it is said that E2 is coarser than E1 . A multiscale segmentation
algorithm provides a hierarchy of segmentations which get progressively coarser. Morel
and Solimini’s [43] definition of a generic multiscale region merging algorithm for image
segmentation is paraphrased and augmented below.
1. Initialize the algorithm with the finest possible segmentation (i.e., each pixel is a
separate region). Fix the scale t at a very small value t0 .
2. Merge all pairs of regions whose merging “improves” the segmentation. If the
coarsest segmentation is obtained (i.e., the whole domain Ω is the one and only
region), stop.
3. If necessary, update the estimate u.
4. Increment the scale parameter t.
5. Go to Step 2.
Numerous algorithms have been constructed that fall under this, very general, paradigm.
Their principal points of difference are Step 3 and the merging criterion used in Step
2. Two instances of such algorithms are Pavlidis’ algorithm of 1972 [47] and Koepfler,
Lopez, and Morel’s algorithm of 1994 [36].
Example 2.1. Pavlidis’ algorithm.
Step 2 of Pavlidis’ algorithm consists of merging two neighboring regions, Oi and
Oj , if the variance of u0 over Oi ∪ Oj is less than t. Step 3 is omitted.
Example 2.2. The algorithm of Koepfler, Lopez, and Morel.
Koepfler, Lopez, and Morel use the piecewise-constant Mumford-Shah model [44]
as their merging criterion. They form the filtered image u by assigning to every pixel
in a region Oi the average value of u0 over that region. Two regions, Oi and Oj , with
respective average values ui and uj , are merged by removing the boundary between them
and replacing both ui and uj with their weighted average, (|Oi |ui +|Oj |uj )/(|Oi |+|Oj |),
provided that the global energy is reduced:

(u − u0 )T (u − u0 ) + tl.
Sec. 2.4. Shock Filters and Total Variation. 31

Here, |Op | is the number of pixels in the region Op and l is the total length of all the
edges.
This method admits fast numerical implementations and has been experimentally
shown to be robust to white Gaussian noise. However, as we will illustrate, the quadratic
penalty on the disagreement between the estimate u and the initial data u0 renders it
ineffective against more severe noise, such as speckle encountered in SAR images.
Note that region merging methods do not allow edges to be created. Thus, decisions
made in the beginning of an algorithm cannot be undone later. A slight modification of
such methods results in split-and-merge methods, which combine region growing with
region splitting [43].

¥ 2.4 Shock Filters and Total Variation.

We will see in the next chapter that SIDEs blend together elements of the total vari-
ation minimization [6, 56, 58] and shock filters [46]. A shock filter is a PDE which
was introduced by Osher and Rudin in [46] to achieve signal and image enhancement
(de-blurring). Since its 2-D version will not be needed in this thesis, we restrict our
discussion to one spatial dimension:
ut = −|ux |f (uxx ), (2.14)
0
u(0, x) = u (x),
where f (0) = 0, and the function f has the same sign as its argument: sgn(f (uxx )) =
sgn(uxx ) for uxx 6= 0. This PDE has the property of developing shocks (jumps) near
the points of inflection of the initial condition. The most serious limitation of the
shock filters, acknowledged in [46], is non-robustness to noise. In fact, they enhance
noise, together with the “useful” edges. There are, however, stable numerical schemes,
which—similarly to the semi-discrete Perona-Malik equation—do not exhibit certain
properties of the underlying PDE. In particular, for certain choices of f , the scheme pre-
sented in [46] sharpens noise-free signals, but leaves noisy signals virtually unchanged.
This is because every local extremum remains stationary under this scheme. This be-
havior is illustrated in Figure 2.4, which presents the results of the simulations for the
equation
ut = −|ux | sgn(uxx ). (2.15)
For other choices of f , the numerical scheme of [46], in addition to keeping the local
extrema stationary, creates other local extrema and is unstable, which is illustrated in
Figure 2.5 for the equation
ut = −|ux |uxx . (2.16)

Total variation minimization is another restoration technique, developed by the

same authors in [56] and independently by Bouman and Sauer in [6, 58] in a different
32 CHAPTER 2. PRELIMINARIES

0.8

0.6

0.4

0.2

−0.2
0 50 100 150 200

(a) A unit step signal.

1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2
0 50 100 150 200 0 50 100 150 200

(b) A blurred step. (c) The blurred step with additive noise.
1 1

0.8 0.8

0.6 0.6

0.4 0.4

0.2 0.2

0 0

−0.2 −0.2
0 50 100 150 200 0 50 100 150 200

(d) Shock filtering of (b). (e) Shock filtering of (c).

Figure 2.4. (b) Gaussian blurring of the signal depicted in (a). (c) Signal depicted in (b), with additive
white Gaussian noise of variance 0.1. (d) The steady state of the shock filter (2.15), with the signal (b)
as the initial condition. The reconstruction is perfect, modulo numerical errors. (e) The steady state of
the shock filter (2.15), with the signal (c) as the initial condition. It is virtually the same as (c), since
all extrema remain stationary.

context and in a somewhat different form. We start with Bouman and Sauer’s work,
since it was chronologically first, and since—as we will see in the next chapter—it is
conceptually closer to the results presented in this thesis.
The objective of [6, 58] is reconstructing an image u from its tomographic projec-
tions u0 . The authors consider transmission tomography, where the projection data
are in the form of the number of photons detected after passing through an absorptive
material. In other words, u0 is the number of photon counts for each angle and dis-
placement. Bouman and Sauer use a probabilistic setting, where the photon counts
are Poisson random variables, independent among angles and displacements. They de-
rive an expression for the log likelihood function L(u|u0 ), and seek the maximum a
posteriori [66] estimate û of u:

û = arg max{L(u|u0 ) + E(u)}, (2.17)

u
Sec. 2.4. Shock Filters and Total Variation. 33

0.8

0.6

0.4

0.2

−0.2
0 50 100 150 200

(a) Five iterations of the shock filter (2.16).

0.8

0.6

0.4

0.2

−0.2
0 50 100 150 200

(a) Ten iterations of the shock filter (2.16).

0.8

0.6

0.4

0.2

−0.2
0 50 100 150 200

(a) Eighteen iterations of the shock filter (2.16).

Figure 2.5. Filtering the blurred unit step signal of Figure 2.4, (b) with the shock filter (2.16): (a) 5
iterations, (b) 10 iterations, (c) 18 iterations. Spurious maxima and minima are created; the unit step
is never restored.

where E(u) is the logarithm of the prior density function of u. They propose the
following prior model:
X
E(u) = γ |us − ur |, (2.18)
(s,r)∈N

where N is the list of all neighboring pairs of pixels, and γ is a constant.

The optimization problem (2.17) cannot be solved by gradient descent, since E(u) is
not differentiable at the points where us = ur for (s, r) ∈ N . Bouman and Sauer improve
upon the existing iterative methods for solving non-differentiable optimization problems
[4, 54, 73] by introducing a technique which they call segmentation based optimization.
The basic idea is to combine a Gauss-Seidel type approach [59] with a split-and-merge
segmentation strategy. Specifically, if the image values at two neighboring locations
are both equal to some number α, then α is changed until a minimum of the objective
function (2.17) is achieved. Similarly, if several neighboring pixels us1 = us2 = . . . =
usi have the same value, they are grouped together and changed together by moving
34 CHAPTER 2. PRELIMINARIES

along the hyperplane {u : us1 = us2 = . . . = usi } until a minimum of the objective
function (2.17) is achieved. After each pixel of the image is visited in such a manner, a
“split” iteration follows, where each pixel is freed to seek its own conditionally optimal
value. This approach is theoretically justified and extended in Chapter 3, where it is
shown that the steepest descent for a non-differentiable energy function such as (2.18)
is a differential equation which automatically merges pixels, thereby segmenting the
underlying image.
We also point out that the continuous version of the energy (2.18) is
Z
|ux | dx in 1-D,
Z Z
and |∇u| dx dy in 2-D,

and is called the total variation of u. Its constrained minimization was used in [56] for
image restoration. The restored version u(x, y) of an image u0 (x, y) was computed by
solving the following optimization problem:
Z Z
minimize |∇u| dx dy (2.19)
Z Z
subject to (u − u0 ) dx dy = 0
Z Z
and (u − u0 )2 dx dy = σ 2 .

¥ 2.5 Constrained Restoration of Geman and Reynolds.

We will see in the next chapter
P that a 1-D SIDE is the gradient descent equation for
the global energy E(u) = E(ui+1 − ui ), where E(v) is concave everywhere except at
zero and non-differentiable at zero, and looks like g (for example, E(v) = arctan(v)
or E(v) = 1 − (1 + |v|)−1 —see Figure 2.6). This energy is similar to the first term of
the image restoration model of D. Geman and Reynolds [21]. It is also interesting to
note that the potential function of the Gibbs distribution learned from natural images
in [75] has the same basic g-shape. This indicates that the functionals involving such
a term may be the right ones for modeling natural images.

¥ 2.6 Conclusion.
An exhaustive survey of variational models in image processing is beyond the scope
of this thesis. A much more complete bibliography can be found in [43]. In particu-
lar, Chapter 3 of [43] contains a very nice discussion of region merging segmentation
algorithms, starting with Brice and Fennema’s [7] and Pavlidis’ [47], which may be
considered as ancestors to both [36], snakes [31], and SIDEs. Examples of more re-
cent algorithms, not covered in [43], are [16] and [24]. Another important survey text,
Sec. 2.6. Conclusion. 35

Figure 2.6. The SIDE energy function, also encountered in the models of Geman and Reynolds, and
Zhu and Mumford.

which also contains a wealth of references both on variational methods and nonlinear
diffusions, is [55].
36 CHAPTER 2. PRELIMINARIES
Chapter 3

Image Segmentation with Stabilized

Inverse Diffusion Equations

¥ 3.1 Introduction.
N this chapter, we introduce the Stabilized Inverse Diffusion Equations (SIDEs), as
I well as illustrate their speed and robustness, in comparison with some of the methods
reviewed in Chapter 2. As we mentioned in the previous chapter, the starting point for
the development of SIDEs were image restoration and segmentation procedures based on
PDEs of evolution [1, 12, 46, 49, 50, 55–57, 72]. We observed that the numerical schemes
for solving such equations do not necessarily exhibit the behavior of the equations
themselves. We therefore concentrate in this thesis on semi-discrete scale spaces (i.e.,
continuous in scale and discrete in space). More specifically, SIDEs, which are the
main focus and contribution of this thesis, are a new family of semi-discrete evolution
equations which stably sharpen edges and suppress noise. We will see that SIDEs
may be viewed as a conceptually limiting case of Perona-Malik diffusions which were
reviewed in the previous chapter. SIDEs have discontinuous right-hand sides and act as
inverse diffusions “almost everywhere”, with stabilization resulting from the presence
of discontinuities in the vector field defined by the evolution. The scale space of such an
equation is a family of segmentations of the original image, with larger values of the scale
parameter t corresponding to segmentations at coarser scales. Moreover, in contrast to
continuous evolutions, the ones introduced here naturally define a sequence of logical
“stopping times”, i.e. points along the evolution endowed with useful information, and
corresponding to times at which the evolution hits a discontinuity surface of its defining
vector field.
In the next section we begin by describing a convenient mechanical analog for the
visualization of many spatially-discrete evolution equations, including discretized linear
or nonlinear diffusions such as that of Perona and Malik, as well as the discontinuous
equations that we introduce in Section 3.3. The implementation of such a discontinuous
equation naturally results in a recursive region merging algorithm. Because of the
discontinuous right-hand side of SIDEs, some care must be taken in defining solutions,
but as we show in Section 3.4, once this is done, the resulting evolutions have a number
of important properties. Moreover, as we have indicated, they lead to very effective

37
38 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

algorithms for edge enhancement and segmentation, something that is demonstrated

in Section 3.5. In particular, as we will see, they can produce sharp enhancement of
edges in high noise as well as accurate segmentations of very noisy imagery such as
SAR and ultrasound imagery subject to severe speckle. In Section 3.6, we point out its
principal differences from Koepfler, Lopez, and Morel’s [36] region merging procedure
for minimizing the Mumford-Shah functional [44]. The rest of that section is devoted
to exploring the links with other important work in the field reviewed in Chapter 2:
the total variation approach [6,56,58]; shock filters of Osher and Rudin [46]; the robust
variational formulation of D. Geman and Reynolds [21]; and the stochastic modeling
approach of Zhu and Mumford [75].

¥ 3.2 A Spring-Mass Model for Certain Evolution Equations.

As we indicated in the introduction, the focus of this chapter is on discrete-space,
continuous-time evolutions of the following general form:

u̇(t) = F(u)(t), (3.1)

0
u(0) = u ,

where u is either a discrete sequence consisting of N samples (u = (u1 , . . . , uN )T ∈ IRN ),

2
or an N -by-N image whose j-th entry in the i-th row is uij (u ∈ IRN ). The initial
condition u0 corresponds to the original signal or image to be processed, and u(t) then
represents the evolution of this signal/image at time (scale) t, resulting in a scale-space
family for 0 ≤ t < ∞.

u N , MN
u n , Mn

F1
FN

........ Fn
u 1 , M1 ........

Figure 3.1. A spring-mass model.

The nonlinear operators F of interest in this chapter can be conveniently visualized

through the following simple mechanical model. For the sake of simplicity in visual-
ization, let us first suppose that u ∈ IRN is a one-dimensional (1-D) sequence, and
interpret u(t) = (u1 (t), . . . , uN (t))T in (3.1) as the vector of vertical positions of the
N particles of masses M1 , . . . , MN , depicted in Figure 3.1. The particles are forced to
Sec. 3.2. A Spring-Mass Model for Certain Evolution Equations. 39

move along N vertical lines. Each particle is connected by springs to its two neighbors
(except the first and last particles, which are only connected to one neighbor.) Every
spring whose vertical extent is v has energy E(v), i.e., the energy of the spring between
the n-th and (n + 1)-st particles is E(un+1 − un ). We impose the usual requirements
on this energy function:

E(v) ≥ 0,
E(0) = 0, (3.2)
0
E (v) ≥ 0 for v > 0,
E(v) = E(−v).

Then the derivative of E(v), which we refer to as “the force function” and denote by
F (v), satisfies

F (0) = 0,
F (v) ≥ 0 for v > 0, (3.3)
F (v) = −F (−v).

We also call F (v) a “force function” and E(v) an “energy” if −E(v) satisfies (3.2)
and −F (v) satisfies (3.3). We make the movement of the particles non-conservative by
stopping it after a small period of time ∆t and re-starting with zero velocity. (Note
that this will make our equation non-hyperbolic.) It is assumed that during one such
step, the total force Fn = −F (un − un+1 ) − F (un − un−1 ), acting on the n-th particle,
stays approximately constant. The displacement during one iteration is proportional
to the product of acceleration and the square of the time interval:

(∆t)2 Fn
un (t + ∆t) − un (t) = .
2 Mn

Letting ∆t → 0, while fixing 2Mn

∆t = mn , where mn is a positive constant, leads to

1
u̇n = (F (un+1 − un ) − F (un − un−1 )), n = 1, 2, . . . , N, (3.4)
mn
with the conventions u0 = u1 and uN +1 = uN imposed by the absence of springs to the
left of the first particle and to the right of the last particle. We will refer to mn as “the
mass of the n-th particle” in the remainder of the thesis. Note that Equation (3.4) is a
(weighted) gradient descent equation for the following global energy:

X
N −1
E(u) = E(ui+1 − ui ). (3.5)
i=1

The examples below, where mn = 1, clearly illustrate these notions.

40 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

Example 3.1. Linear heat equation.

A linear force function F (v) = v leads to the semi-discrete linear heat equation

u̇n = un+1 − 2un + un−1 .

This corresponds to a simple discretization of the 1-D linear heat equation and results
in evolutions which produce increasingly low-pass filtered and smoothed versions of the
original signal u0 .

In general, F (v) is called a “diffusion force” if, in addition to (3.3), it is monotoni-

cally increasing:

v1 < v2 ⇒ F (v1 ) < F (v2 ), (3.6)

which is illustrated in Figure 3.2(a). We shall call the corresponding energy a “dif-

-K K

(a) (b) (c)

Figure 3.2. Force functions: (a) diffusion; (b) inverse diffusion; (c) Perona-Malik.

fusion energy” and the corresponding evolution (3.4) a “diffusion”. The evolution in
Example 3.1 is clearly a diffusion. We call F (v) an “inverse diffusion force” if −F (v)
satisfies Equations (3.3) and (3.6), as illustrated in Figure 3.2(b). The corresponding
evolution (3.4) is called an “inverse diffusion”. Inverse diffusions have the characteris-
tic of enhancing abrupt differences in u corresponding to “edges” in the 1-D sequence.
Such pure inverse diffusions, however, lead to unstable evolutions (in the sense that they
greatly amplify arbitrarily small noise). The following example, which is prototypical of
the examples considered by Perona and Malik, defines a stable evolution that captures
at least some of the edge enhancing characteristics of inverse diffusions.

Example 3.2. Perona-Malik equations.

v 2
Taking F (v) = v exp (−( K ) ), as illustrated in Figure 3.2(c), yields a 1-D semi-
discrete (continuous in scale and discrete in space) version of the Perona-Malik equation
(see equations (3.3), (3.4), and (3.12) in [50]). In general, given a positive constant K,
a force F (v) will be called “Perona-Malik force of thickness K” if, in addition to (3.3),
it satisfies the following conditions:

F (v) has a unique maximum at v = K, (3.7)

F (v1 ) = F (v2 ) ⇒ (|v1 | − K)(|v2 | − K) < 0.
Sec. 3.3. Stabilized Inverse Diffusion Equations (SIDEs): The Definition. 41

Figure 3.3. Spring-mass model in 2-D (view from above).

We shall call the corresponding energy a “Perona-Malik energy” and the corresponding
evolution equation a “Perona-Malik equation of thickness K”. As Perona and Malik
demonstrate (and as can also be inferred from the results in the present thesis), evolu-
tions with such a force function act like inverse diffusions in the regions of high gradient
and like usual diffusions elsewhere. They are stable and capable of achieving some level
of edge enhancement depending on the exact form of F (v).
Finally, to extend the mechanical model of Figure 3.1 to images, we simply replace
the sequence of vertical lines along which the particles move with an N -by-N square
grid of such lines, as shown in Figure 3.3. The particle at location (i, j) is connected
by springs to its four neighbors: (i − 1, j), (i, j + 1), (i + 1, j), (i, j − 1), except for the
particles in the four corners of the square (which only have two neighbors each), and the
rest of the particles on the boundary of the square (which have three neighbors). This
arrangement is reminiscent of (and, in fact, was suggested by) the resistive network of
Figure 8 in [49]. The analog of Equation (3.4) for images is then:
1
u̇ij = (F (ui+1,j − uij ) − F (uij − ui−1,j )
mij
+ F (ui,j+1 − uij ) − F (uij − ui,j−1 ), (3.8)

with i = 1, 2, . . . , N, j = 1, 2, . . . , N, and the conventions u0,j = u1,j , uN +1,j = uN,j ,

ui,0 = ui,1 and ui,N +1 = ui,N imposed by the absence of springs outside of 1 ≤ i ≤ N ,
1 ≤ j ≤ N.

¥ 3.3 Stabilized Inverse Diffusion Equations (SIDEs): The Definition.

In this section, we introduce a discontinuous force function, resulting in a system (3.4)
that has discontinuous right-hand side (RHS). Such equations received much attention
in control theory because of the wide usage of relay switches in automatic control
systems [17, 67]. More recently, deliberate introduction of discontinuities has been
used in control applications to drive the state vector onto lower-dimensional surfaces
in the state space [67]. As we will see, this objective of driving a trajectory onto a
lower-dimensional surface also has value in image analysis and in particular in image
segmentation. Segmenting a signal or image, represented as a high-dimensional vector
42 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

u, consists of evolving it so that it is driven onto a comparatively low-dimensional

subspace which corresponds to a segmentation of the signal or image domain into a
small number of regions.
The type of force function of interest to us here is illustrated in Figure 3.4. More
precisely, we wish to consider force functions F (v) which, in addition to (3.3), satisfy
the following conditions:

F 0 (v) ≤ 0 for v 6= 0,
+
F (0 ) > 0 (3.9)
F (v1 ) = F (v2 ) ⇔ v1 = v 2 .

Contrasting this form of a force function to the Perona-Malik function in Figure 3.2,

Figure 3.4. Force function for a stabilized inverse diffusion equation.

we see that in a sense one can view the discontinuous force function as a limiting form
of the continuous force function in Figure 3.2(c), as K → 0. However, because of
the discontinuity at the origin of the force function in Figure 3.4, there is a question
of how one defines solutions of Equation (3.4) for such a force function. Indeed, if
Equation (3.4) evolves toward a point of discontinuity of its RHS, the value of the
RHS of (3.4) apparently depends on the direction from which this point is approached
(because F (0+ ) 6= F (0− )), making further evolution non-unique. We therefore need a
special definition of how the trajectory of the evolution proceeds at these discontinuity
points.1 For this definition to be useful, the resulting evolution must satisfy well-
posedness properties: the existence and uniqueness of solutions, as well as stability of
solutions with respect to the initial data. In the rest of this section we describe how to
define solutions to (3.4) for force functions (3.9). Assuming the resulting evolutions to
be well-posed, we demonstrate that they have the desired qualitative properties, namely
that they both are stable and also act as inverse diffusions and hence enhance edges.
We address the issue of well-posedness and other properties in Section 3.4.
Consider the evolution (3.4) with F (v) as in Figure 3.4 and Equation (3.9) and with
all of the masses mn equal to 1. Notice that the RHS of (3.4) has a discontinuity at a
point u if and only if ui = ui+1 for some i between 1 and N − 1. It is when a trajectory
reaches such a point u that we need the following definition. In terms of the spring-mass
model of Figure 3.1, once the vertical positions ui and ui+1 of two neighboring particles
become equal, the spring connecting them is replaced by a rigid link. In other words,
1
Having such a definition is crucial because, as we will show in Section 3.4, equation (3.4) will reach
a discontinuity point of its RHS in finite time, starting with any initial condition.
Sec. 3.3. Stabilized Inverse Diffusion Equations (SIDEs): The Definition. 43

the two particles are simply merged into a single particle which is twice as heavy (see
Figure 3.5), yielding the following modification of (3.4) for n = i and n = i + 1:
1
u̇i = u̇i+1 = (F (ui+2 − ui+1 ) − F (ui − ui−1 )).
2
(The differential equations for n 6= i, i + 1 do not change.) Similarly, if m consecutive

u i+2 u i+2
ui u i+1 u i = u i+1

u i-1 u i-1

Figure 3.5. A horizontal spring is replaced by a rigid link.

particles reach equal vertical position, they are merged into one particle of mass m
(1 ≤ m ≤ N ):

u̇n = . . . = u̇n+m−1 =
1
= (F (un+m − un+m−1 ) − F (un − un−1 )) (3.10)
m
if
un−1 6= un = un+1 = . . . = un+m−2 = un+m−1 6= un+m .

Notice that this system is the same as (3.4), but with possibly unequal masses. It is
convenient to re-write this equation so as to explicitly indicate the reduction in the
number of state variables:
1
u̇ni = (F (uni+1 − uni+1 −1 ) − F (uni − uni−1 )), (3.11)
mni
uni = uni +1 = . . . = uni +mni −1 ,
where i = 1, . . . p,
1 = n1 < n2 < . . . < np−1 < np ≤ N,
ni+1 = ni + mni .

The compound particle described by the vertical position uni and mass mni consists
of mni unit-mass particles uni , uni +1 , . . . , uni +mni −1 that have been merged, as shown
in Figure 3.5. The evolution can then naturally be thought of as a sequence of stages:
during each stage, the right-hand side of (3.11) is continuous. Once the solution hits
a discontinuity surface of the right-hand side, the state reduction and re-assignment
44 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

of mni ’s, described above, takes place. The solution then proceeds according to the
modified equation until it hits the next discontinuity surface, etc.
Notice that such an evolution automatically produces a multiscale segmentation of
the original signal if one views each compound particle as a region of the signal. Viewed
as a segmentation algorithm, this evolution can be summarized as follows:
1. Start with the trivial initial segmentation: each sample is a distinct region.
2. Evolve (3.11) until the values in two or more neighboring regions become equal.
3. Merge the neighboring regions whose values are equal.
4. Go to step 2.
The same algorithm can be used for 2-D images, which is immediate upon re-writing
Equation (3.11):
1 X
u̇ni = F (unj − uni )pij , (3.12)
mni
nj ∈Ani

where
mni is again the mass of the compound particle ni (= the number of pixels in the
region ni );
Ani is the set of the indices of all the neighbors of ni , i.e., of all the compound particles
that are connected to ni by springs;
pij is the number of springs between regions ni and nj (always 1 in 1-D, but can be
larger in 2-D).
Just as in 1-D, two neighboring regions n1 and n2 are merged by replacing them with one
region n of mass mn = mn1 + mn2 and the set of neighbors An = An1 ∪ An2 \{n1 , n2 }.
We close this section by describing one of the basic and most important properties
of these evolutions, namely that the evolution is stable but nevertheless behaves like an
inverse diffusion. Notice that a force function F (v) satisfying (3.9) can be represented
as the sum of an inverse diffusion force Fid (v) and a positive multiple of sgn(v): F (v) =
Fid (v) + C sgn(v), where C = F (0+ ) and −Fid (v) satisfies (3.3) and (3.6). Therefore,
if uni+1 − uni and uni − uni−1 are of the same sign (which means that uni is not a local
extremum of the sequence (un1 , . . . , unp )), then (3.11) can be written as
1
u̇ni = (Fid (uni+1 − uni ) − Fid (uni − uni−1 )). (3.13)
mni
If uni > uni+1 and uni > uni−1 (i.e., uni is a local maximum), then (3.11) is
1
u̇ni = (Fid (uni+1 − uni ) − Fid (uni − uni−1 ) − 2C). (3.14)
mni
Sec. 3.4. Properties of SIDEs. 45

If uni < uni+1 and uni < uni−1 (i.e., uni is a local minimum), then (3.11) is

1
u̇ni = (Fid (uni+1 − uni ) − Fid (uni − uni−1 ) + 2C). (3.15)
mni

Equation (3.13) says that the evolution is a pure inverse diffusion at the points which
are not local extrema. It is not, however, a global inverse diffusion, since pure inverse
diffusions drive local maxima to +∞ and local minima to −∞ and thus are unstable.
In contrast, equations (3.14) and (3.15) show that at local extrema, the evolution in-
troduced in this chapter is an inverse diffusion plus a stabilizing term which guarantees
that the local maxima do not increase and the local minima do not decrease. Indeed,
|Fid (v)| ≤ F (0+ ) = C for any v and for any SIDE force function F , and therefore the
RHS of (3.14) is negative, and the RHS of (3.15) is positive. For this reason, we call
the new evolution (3.11), (3.12) a “stabilized inverse diffusion equation” (“SIDE”), a
force function satisfying (3.9) a “SIDE force”, and the corresponding energy a “SIDE
energy”. In Chapter 4, we will analyze a simpler version of this equation, which results
from dropping the inverse diffusion term. In this particular case, the local extrema
move with constant speed and all the other samples are stationary, which makes the
analysis of the equation more tractable.

¥ 3.4 Properties of SIDEs.

¥ 3.4.1 Basic Properties in 1-D.
The SIDEs described in the two preceding sections enjoy a number of interesting prop-
erties which validate and explain their adaptability to segmentation problems. We
first examine the SIDEs in one spatial dimension for which we can make the strongest
statements.
We define the ni -th discontinuity hyperplane of a SIDE (3.11) by Sni = {u ∈ IRp :
uni = uni+1 }, i = 1, . . . , p − 1. Sometimes it is more convenient to work with the
vector v = (vn1 , . . . , vnp−1 )T ∈ IRp−1 of the first differences of u: vni = uni+1 − uni , for
i = 1, . . . , p − 1. We abuse notation by also denoting Sni = {v ∈ IRp−1 : vni = 0}.
On such hyperplanes, we defined the solution of a SIDE as the solution to a modified,
lower-dimensional, equation whose RHS is continuous on Sni . In what follows, we will
assume that the SIDE force function F (v) is sufficiently regular away from zero, so
that the ODE (3.11), restricted to the domain of continuity of its RHS, is well-posed.
As a result, existence and uniqueness of solutions of SIDEs immediately follow from
the existence and uniqueness of solutions of ODEs with continuous RHS. Continuous
dependence on the initial data is also guaranteed for a trajectory segment lying inside
a region of continuity of the RHS. In order to show, however, that the solutions that
we have defined are continuous with respect to initial conditions over arbitrary time
intervals, we must take into account the presence of discontinuities on the RHS. In
particular, what must be shown is that trajectories that start very near a discontinuity
surface remain close to one that starts on the surface. More precisely, we need to be
46 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

able to show that a trajectory whose initial point is very close to Sni will, in fact, hit Sni
(see Figure 3.6). In the literature on differential equations and control theory [17, 67],
the behavior that SIDEs exhibit and which is illustrated in Figure 3.6 is referred to as
“sliding modes”. Specifically, as proven in Appendix A, the behavior of the evolution
near discontinuity hyperplanes satisfies the following:

Lemma 3.1 (Lemma on Sliding). Let σ be a permutation of (n1 , . . . , np−1 ), and m

an integer between 1 and p − 1, and let S be the set of all points in the intersection of
m hyperplanes which do not belong to the remaining p − m − 1 hyperplanes:
 
\m [
p−1
S= Sσ(q) \  Sσ(q)  .
q=1 q=m+1

Then, as v approaches S from any quadrant,2 its velocity is directed towards S:

lim(v̇σ(q) sign(vσ(q) )) ≤ 0 for q = 1, . . . , m,

and for at least one q this inequality is strict.

vn+1

Figure 3.6. Solution field near discontinuity surfaces.

Intuitively, and as illustrated in Figure 3.6, this lemma states that the solution field
of a SIDE near any discontinuity surface points toward that surface. As a consequence,
a trajectory which hits such a surface may be continuously extended to “slide” along the
surface, as shown in [17, 67]. For this reason the discontinuity surfaces are commonly
referred to as “sliding surfaces”. For SIDEs, a simple calculation verifies that the
dynamics along such a surface, obtained through any of the three classical definitions
in [17, 67], correspond exactly to the definition given in the preceding section.
The Lemma on Sliding, together with the well-posedness of SIDEs inside their con-
tinuity regions, directly implies the overall well-posedness of 1-D SIDEs: for finite T ,
the trajectory from t = 0 to t = T depends continuously on its initial point. As shown
in Property 3.2 to follow, a SIDE reaches a steady state in finite time, which establishes
its well-posedness for infinite time intervals.
2
In IRp−1 , a quadrant containing a vector a = (a1 , . . . , ap−1 )T such that ai 6= 0 for i = 1, . . . , p − 1
is the set Q = {b ∈ IRp−1 : bi ai > 0 for i = 1, . . . , p − 1}.
Sec. 3.4. Properties of SIDEs. 47

We call uni , with i ∈ {2, . . . , p − 1} a local maximum (minimum) of the sequence

(un1 , . . . , unp ) if uni > uni±1 (uni < uni±1 ). The point un1 is a local maximum (min-
imum) if un1 > un2 (un1 < un2 ); unp is a local maximum (minimum) if unp > unp−1
(unp < unp−1 ). Therefore, we immediately have (as we saw in Equations (3.14), (3.15))
that the maxima (minima) are always pulled down (up):

Property 3.1 (maximum principle). Every local maximum is decreased and every
local minimum is increased by a SIDE. Therefore,

|ui (t)| < max |un (0)| for t > 0. (3.16)

Using this result, we can prove the following:

Property 3.2 (finite evolution time). A SIDE, started at u0 = (u01 , . . . , u0N )T , reaches
P
its equilibrium (i.e., the point u = (u1 , . . . , uN )T where u1 = . . . = uN = N1 N 0
i=1 ui )
in finite time.

Proof. The sum of the vertical positions of all unit-mass particles is equal to the
sum
PN of the vertical
Pp positions of the compound particles, weighted by their masses:
n=1 un = i=1 uni mni . The time derivative of this quantity is zero, as verified
byP summing up the right-hand sides of (3.11). Therefore, the mean vertical position
1 N
N n=1 un is constant throughout the evolution. Writing (3.11) for i = 1, u̇n1 =
mn1 (un2 −un1 ), we see that the leftmost compound particle is stationary only if p = 1,
1
F
i.e., if all unit-mass particles have the same vertical position: un1 = u1 = u2 = .P. . = uN .
Since the mean is conserved, the unique steady state is u1 = . . . = uN = N1 N 0
i=1 ui .
To prove that it is reached in finite time, we again refer to the spring-mass model of
Figure 3.1 and use the fact that a SIDE force function assigns larger force to shorter
springs. If we put L = 2 max n
|un (0)|, then the maximum principle implies that in the
system there cannot exist a spring with vertical extent larger than L at any time during
the evolution. Therefore, the rate of decrease of the absolute maximum, according to
Equation (3.11), is at least F (L)/N (because F (L) is the smallest force possible in the
system, and N is the largest mass). Similarly, the absolute minimum always increases
at least as quickly. They will meet no later than at t = 2FLN (L) , at which point the
sequence u(t) must be a constant sequence.
The above property allows us immediately to state the well-posedness results as
follows:

Property 3.3 (well-posedness). For any initial condition u0∗ , a SIDE has a unique
solution u∗ (t) satisfying u∗ (0) = u0∗ . Moreover, for any such u0∗ and any ε > 0, there
exists a δ > 0 such that |u0 − u0∗ | ≤ δ implies |u(t) − u∗ (t)| ≤ ε for t ≥ 0, where u(t)
is the solution of the SIDE with the initial condition u0 .

As we pointed out in the introductory section of this chapter, a SIDE evolution

defines a natural set of hitting times which intuitively should be of use in characterizing
48 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

features in an image. For this to be true, however, we would need some type of continuity
of this hitting time sequence. Specifically, let tn (u0 ) denote the “n-th hit time”, i.e., the
time when the solution starting at u0 reaches the sliding hyperplane Sn . By Property
3.2, this is a finite number. Let u(t) be “a typical solution” if it never reaches two
different sliding hyperplanes at the same time: ti (u(0)) 6= tj (u(0)) if i 6= j. One
of the consequences of the Lemma on Sliding is that a trajectory that hits a single
hyperplane Sn does so transversally (that is, cannot be tangent to it). Since trajectories
vary continuously, this means that nearby solutions also hit Sn . Therefore, for typical
solutions the following holds:
Property 3.4 (stability of hit times). If u(t) is a typical solution, all solutions with
initial data sufficiently close to u(0) get onto surfaces Sn in the same order as u(t).
The sequence in which a trajectory hits surfaces Sn is an important characteristic
of the solution. Property 3.4 says that, for a typical solution u(t), the (strict) ordering
of hit times tn (u(0)) is stable with respect to small disturbances in u(0):

tn1 (u(0)) < tn2 (u(0)) < . . . < tnN −1 (u(0)), (3.17)

where (n1 , . . . , nN −1 ) is a permutation of (1, . . . , N − 1). For the purposes of segmenta-

tion and edge detection, the only interesting output occurs at these N − 1 time points,
since they are the only instants when the segmentation of the initial signal changes (i.e.,
when regions are merged and edges are erased). While a thorough investigation of how
to use these hitting times and in particular how to stop a SIDE so as to obtain the best
segmentation is an open one for the general form of the SIDE force function F , we will
obtain a partial answer for a certain choice of F in the next chapter. Specifically, when
the number of regions is an exponential random variable, and F (v) = sgn(v), then the
stopping rule in 1-D is given by Proposition 4.6 of the next chapter. For other SIDE
force functions, the fact that the choice of the output time points is limited to a finite
set provides us with both a natural sequence of segmentations of increasing granularity
and with, at the very least, some simple stopping rules. For example, if the number of
“useful” regions, r, is known or bounded a priori, a natural candidate for a stopping
time would be tnN −r , i.e., the time when exactly r regions remain. In the next section
we illustrate the effectiveness of such a rule in the simplest case, namely when r = 2 so
that we are seeking a partition of the field of interest into two regions.
We already mentioned that our definition of solutions on sliding surfaces for SIDEs
in one spatial dimension coincides with all three classical definitions of solutions for
a general equation with discontinuous right-hand side, which are presented on pages
50-56 of Filippov’s book [17]. We use a result on page 95 of [17] to infer the following:
Property 3.5 (continuous dependence on the RHS). Let us consider a SIDE force
function FS (v), and let pK (v) be a smoothing kernel of width K:
Z
pK (v) ≥ 0, supp(pK ) = [−K; K], pK (v) dv = 1.
Sec. 3.4. Properties of SIDEs. 49
R
Let FK (v) = FS (w)pK (v − w) dw be a regularized version of FS (v). Consider system
(3.4) with mn = 1 and F (v) = FK (v). Then for any ε, there is a K such that the
solution of this system stays closer than ε to the solution of the SIDE with the same
initial condition and force FS (v).

We note that if the smoothing kernel pK (v) is appropriately chosen, then the result-
ing FK (v) will be a Perona-Malik force function of thickness K. (For example, one easy
choice for pK (v) is a multiple of the indicator function of the interval [-K;K].) Thus,
semi-discrete Perona-Malik evolutions with small K are regularizations of SIDEs, and
consequently a SIDE in 1-D can be viewed as a limiting case of a Perona-Malik-type
evolution. However, as we will see in the experimental section, the SIDE evolutions
appear to have some advantages over such regularized evolutions even in 1-D.

¥ 3.4.2 Energy Dissipation in 1-D.

It was mentioned in the previous chapter that the SIDE (3.11) is the gradient descent
equation for the global energy

X
N −1
E(u) = E(un+1 − un ), (3.18)
n=1

where E is the SIDE energy function (Figure 2.6), i.e., an antiderivative of the SIDE
force function. Note that the standard definition of the gradient cannot be used here.
Indeed, non-differentiability of E at the origin makes the directional derivatives of E(u)
in the directions orthogonal to a sliding surface S undefined for u ∈ S. But once u(t)
hits a sliding surface, it stays there for all future times, and therefore we do not have to
be concerned with the partial derivatives of E(u) in the directions which do not lie in
the sliding surface. This leads to the definition of the gradient as the vector of partial
derivatives taken with respect to the directions which belong to the sliding surface.

Definition 3.1. Suppose that S is the intersection of all the sliding hyperplanes of
the SIDE (3.11) to which the vector u belongs. Suppose further that {fi }pi=1 is an
orthonormal basis for S. Then the gradient of E with respect to S, ∇S E, is defined
as the weighted sum of the basis vectors, with the weights equal to the corresponding
directional derivatives:

def
X
p
∂E(u)
∇S E(u) = fi . (3.19)
∂fi
i=1

We will show in this section that at any moment t, the RHS of the SIDE (3.11)
is the negative gradient of E(u(t)), taken with respect to the intersection S of all the
sliding surfaces to which u(t) belongs. An auxiliary result is needed in order to show
this.
50 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

Lemma 3.2. Suppose that, as in Equation (3.11), u is a signal with p distinct regions
of masses m1 , . . . , mp :

1 = n1 < n2 , . . . < np−1 < np ≤ N are such that

ni+1 = ni + mni , and (3.20)
uni −1 6= uni = uni +1 = . . . = uni+1 −1 6= uni+1 , for i = 1, . . . , p.

Let {ej }N
j=1 be the standard basis of IR
N
(i.e., the j-th entry of ej is 1 and all other
entries are zeros), and define
ni+1 −1
1 X
fi = √ ej , for i = 1, . . . , p. (3.21)
mni
j=ni

Then {fi }pi=1 is an orthonormal basis for the sliding surface S defined by (3.20).
Proof. The vector fi satisfies Equation (3.20), and therefore it belongs to the sliding
surface S. Since ej ’s are mutually orthogonal, so are fi ’s. Since there are p distinct fi ’s,
they form a basis for the p-dimensional surface S. The norm of fi is
ni+1 −1 µ ¶2
X 1 1
√ = mni = 1.
mni mni
j=ni

Property 3.6 (gradient descent). The SIDE (3.11) is the gradient descent equation
for the global energy (3.18), i.e.,

u̇(t) = −∇S(t) E(u(t)), (3.22)

where S(t) is the intersection of all sliding hyperplanes to which u(t) belongs, and ∇S(t)
is the gradient with respect to S(t).
Proof. In order to prove this property, we write out Equation (3.22) in terms of the
coefficients of u̇ and −∇S E(u) with respect to the basis {fi }pi=1 (3.21). It is immediate
from the definition (3.21) of fi ’s that
X
p
√
u= uni mni fi ,
i=1

and so the i-th coefficient of u̇ in the basis {fi }pi=1 is

√
mni u̇ni (3.23)

Since the basis {fi }pi=1 is orthonormal, the i-th coefficient of −∇S E(u) in this basis is
the directional derivative of −E in the direction fi :
∂E 1
− = − lim {E(u + fi ∆) − E(u)}
∂fi ∆→0 ∆
Sec. 3.4. Properties of SIDEs. 51
("n −2
1 Xi
∆
= − lim E(un+1 − un ) + E(uni + √ − uni −1 )
∆→0 ∆ mni
n=1
ni+1 −2
X ∆ ∆
+ E(un+1 + √ − un − √ )
n=ni
mni mni
 
∆ X
N −1 X
N −1 
+E(uni+1 − uni+1 −1 − √ )+ E(un+1 − un ) − E(un+1 − un )
mni n=n

i+1 n=1
½ · ¸
1 ∆
= − lim E(uni − uni −1 + √ ) − E(uni − uni −1 )
∆→0 ∆ mni
· ¸¾
1 ∆
+ E(uni+1 − uni+1 −1 − √ ) − E(uni+1 − uni+1 −1 )
∆ mni
½ ¾
1 1
= − √ E 0 (uni − uni −1 ) − √ E 0 (uni+1 − uni+1 −1 )
mni mni
1
= √ (F (uni+1 − uni+1 −1 ) − F (uni − uni −1 )). (3.24)
mni

Equating the coefficients (3.23) and (3.24), we get that the gradient descent equation
(3.22), written in the basis {fi }pi=1 , is:
1
u̇ni = (F (uni+1 − uni+1 −1 ) − F (uni − uni −1 ),
mni
which is the SIDE (3.11).
It is possible to characterize further the process of energy dissipation during the
evolution of a SIDE. Namely, between any two consecutive mergings (i.e., hits of a
sliding surface), the energy is a concave function of time.
Property 3.7 (energy dissipation). Consider the SIDE (3.11) and let E be the cor-
responding SIDE energy function: E 0 = F . Then between any two consecutive mergings
during the evolution of the SIDE, the global energy (3.18) is decreasing and concave as
a function of time:

Ė < 0
Ë ≤ 0.

Proof. To simplify notation, we will denote

yi = uni for i = 1, . . . , p,

and will simply write mi instead of mni . Then the global energy (3.18) is
X
p
E= E(yi+1 − yi ),
i=1
52 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

and therefore the SIDE (3.11) can be re-written as follows:

1 ∂E
ẏi = − , i = 1, . . . , p.
mi ∂yi
By the chain rule of differentiation, we have:

X
p p µ
X ¶
∂E ∂E 2 1
Ė = ẏi = − < 0.
∂yi ∂yi mi
i=1 i=1

Differentiating with respect to t one more time and applying the chain rule again yields:

Xp µ ¶
∂E d ∂E 1
Ë = − 2
∂yi dt ∂yi mi
i=1
Ã p !
Xp
∂E X ∂ 2 E 1
= − 2 ẏk
∂yi ∂yi ∂yk mi
i=1 k=1
Ã !
Xp
∂E X ∂ 2 E 1 ∂E
p
1
= 2
∂yi ∂yi ∂yk mk ∂yk mi
i=1 k=1
T
= 2D HD, (3.25)

where
µ ¶T
1 ∂E 1 ∂E
D= ,... , ,
m1 ∂y1 mp ∂yp

and H is the Hessian matrix of E, i.e., the matrix of all the mixed second derivatives
2E
of E. The entry in the i-th row and k-th column of H is ∂y∂i ∂yk
. In other words,
 
x1 −x1 0 0 0 ... 0
−x1 x1 + x2 −x2 0 0 ... 0 
 
 0 −x x + x −x 0 ... 0 
 2 2 3 3 
 .. .. 
H = −
 . . ,

 .. .. .. .. 
 . . . . 0 
 
 0 ... 0 −xp−2 xp−2 + xp−1 −xp−1 
0 ... 0 −xp−1 xp−1

where xi = −F 0 (yi+1 − yi ). Note that, by our definition of yi ’s, yi+1 − yi 6= 0, and that
F (yi+1 − yi ) is monotonically decreasing for yi+1 − yi 6= 0. Therefore, xi > 0.
All that remains to show is that H is negative semidefinite, which, combined with
(3.25), means that Ë ≤ 0. It is easily verified that −H can be factorized into a lower-
Sec. 3.4. Properties of SIDEs. 53

triangular and an upper-triangular matrix as follows:

  
1 0 ... 0 x1 −x1 0 0 ... 0
−1 1 0 . . . 0  0 x2 −x2 0 ... 0 
  
 0 −1 1 . . . 0  0 0 x −x . .. 0 
  3 3 
 .. .   .
..   .. .. 
−H =   .  . .

 .. .. ..  .. .. 
 . . . 0  . . 0 
  
0 ... −1 1 0   0 ... xp−1 −xp−1 
0 ... 0 −1 1 0 ... 0

The diagonal entries x1 , . . . , xp−1 , 0 of the upper-triangular matrix are the pivots ( [62],
page 32) of −H. Since all the pivots are nonnegative, it follows ( [62], page 339) that
−H ≥ 0 ⇒ H ≤ 0, which implies Ë ≤ 0.
A typical picture of the energy dissipation is shown in Figure 3.7; the only points
where E might not be concave as a function of time are the merge points.

E(u(t))

t
Figure 3.7. Typical picture of the energy dissipation during the evolution of a SIDE.

In addition to being the gradient descent equation forPthe global energy (3.18), with
E 0 = F , we now show that the SIDE (3.11) also reduces N n=1 E1 (un+1 − un ) if E1 6= E
is any SIDE energy function.

Property 3.8 (Lyapunov functionals). Consider the SIDE (3.11), and let E be the
corresponding SIDE energy function: E 0 = F . Let E1 be an arbitrary SIDE energy
function (i.e., such function that E10 satisfies (3.3),(3.9)), and define

X
N
E 1 (u) = E1 (un+1 − un ).
n=1

Then E 1 is a Lyapunov functional of the SIDE. In other words, E 1 (u(t)) is a decreasing

function of time, until the steady state is reached.
54 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

Proof. We again use the notation from the proof of the previous property:

yi = uni for i = 1, . . . , p;
Xp
E1 = E1 (yi+1 − yi ).
i=1

Then, by the chain rule,

X
p
∂E
Ė = ẏi
∂yi
i=1
X
p−1
= −E10 (y2 − y1 )ẏ1 + (E10 (yi − yi−1 ) − E10 (yi+1 − yi ))ẏi + E10 (yp − yp−1 )ẏp
i=2
·
1 0
= − E (y2 − y1 )F (y2 − y1 )
m1 1
X
p−1
1
+ (E 0 (yi − yi−1 ) − E10 (yi+1 − yi ))(F (yi − yi−1 ) − F (yi+1 − yi ))
mi 1
i=2
¸
1 0
+ E (yp − yp−1 )F (yp − yp−1 ) .
mp 1

The first term inside the brackets is positive, since E10 (y2 − y1 ), F (y2 − y1 ), and y2 − y1
all have the same sign. Similarly, the last term is positive. Each term in the summation
is also positive, because of monotonicity of E10 and F . Therefore, Ė < 0.
We now analyze another class of Lyapunov functionals, which includes the `2 norm
and negative entropy.

Property 3.9 (Lyapunov functionals, continued). Suppose that R : IR → IR is a

function such that its derivative R0 is monotonically increasing. Define

X
N
R(u) = R(un ).
n=1

Then R is a Lyapunov functional of Equation (3.4), i.e.,

Ṙ < 0,

until the steady state is reached. In particular, R is a Lyapunov functional of SIDEs.

Proof. Using the notation of the previous proof,

Ã p !
d X
Ṙ = mi R(yi )
dt
i=1
Sec. 3.4. Properties of SIDEs. 55

X
p
= mi R0 (yi )y˙i
i=1
X
p−1
0
= R (y1 )F (y2 − y1 ) + R0 (yi )(F (yi+1 − yi ) − F (yi − yi−1 )) − R0 (yp )F (yp − yp−1 )
i=2
X
p−1
= (R0 (yi ) − R0 (yi+1 ))F (yi+1 − yi ). (3.26)
i=1

Since R0 is monotonically increasing,

sgn(R0 (yi ) − R0 (yi+1 )) = sgn(yi − yi+1 ) = − sgn(yi+1 − yi );
since F is a force function,
sgn(F (yi+1 − yi )) = sgn(yi+1 − yi ).
Therefore, the product (R0 (yi ) − R0 (yi+1 ))F (yi+1 − yi ) is negative, the sum (3.26) is
negative, and so Ṙ < 0.
Example 3.3. `q norms.
Suppose R(un ) = |un |q with q > 1. Then R0 (un ) = sgn(un )q|un |q−1 is monotonically
increasing, which means that the `q norm
Ã q !1
1 X q

(R(u)) q = |un | q

i=1

is a Lyapunov functional of Equation (3.4). This is yet another characterization of the

stability of the equations described by the mechanical model of Section 3.2.
Example 3.4. Moments.
It is similarly shown that the even central moments of u are also Lyapunov func-
tionals:
Ã !2k
1 X 1 X
N N
un − ui , k = 1, 2, . . .
N N
n=1 i=1

Example 3.5. Entropy.

Suppose the initial condition u0 is positive:
min u0n > 0,
1≤n≤N

and define R(un ) = un ln un . Then 0

PNR (un ) = ln un + 1 is monotonically increasing, and
so the negative entropy R(u) = n=1 un ln un is a Lyapunov functional. The fact that
the entropy is increased by SIDEs and other evolution equations of the form (3.4) is in
agreement with the intuitive notion that, as scale increases, the signal is simplified: at
coarser scales, the information content is reduced.
56 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

¥ 3.4.3 Properties in 2-D.

The existence and uniqueness of solutions in 2-D again follow easily from our construc-
tion of solutions. Property 3.1 (the maximum principle) is easily inferred from the 2-D
spring-mass model. (A local maximum (minimum) is a region of a 2-D image whose
value is larger (smaller) than the values of its neighbors. Re-phrasing this definition
in terms of our spring-mass model, a maximum (minimum) is a particle with all its
attached springs directed downward (upward).) Property 3.2 (finite evolution time)
also carries over, with the same proof.
There is, however, no analog of the Lemma on Sliding in 2-D for SIDE force functions
such as that depicted in Figure 3.4: it is easy to show that the solutions in the vicinity
of a discontinuity hyperplane of (3.12) do not necessarily slide onto that hyperplane.
Notice, however, that forcing two neighboring regions with equal intensities to merge
is conceptually very similar to using a modified force function which is infinite at zero,
as depicted in Figure 3.8. Indeed, the fact that F (0± ) = ±∞ means that if the vertical
distance between two particles is very small compared to the distances to their other
neighbors, they will be driven towards each other and will be merged. Thus, the
Lemma on Sliding holds for the force function of Figure 3.8, from which the global
continuous dependence on the initial data is again inferred. We do not use this force
function in simulations, since its large values near zero present problems with numerical
integration of the corresponding equation. What we do use is a SIDE force function, in

Figure 3.8. A modified force function, for which sliding happens in 2-D, as well as in 1-D.

conjunction with Equation (3.12). Since sliding modes do not necessarily occur on the
discontinuity hyperplanes, there is no global continuous dependence on the initial data.
In particular, the sequence of hitting times and associated discontinuity planes does not
depend continuously on initial conditions, and our SIDE evolution does not correspond
to a limiting form of a Perona-Malik evolution in 2-D but in fact represents a decidedly
different type of evolutionary behavior. Several factors, however, indicate the value of
this new evolution and also suggest that a weaker stability result can be proven. First of
all, as shown in the experimental results in the next section, SIDEs can produce excellent
segmentations in 2-D images even in the presence of considerable noise. Moreover,
thanks to the maximum principle, excessively wild behavior of solutions is impossible,
something that is again confirmed by the experiments of the next section. Consequently,
the sequence of hit times (3.17) does not seem to be very sensitive to the initial condition
in that the presence of noise, while perhaps perturbing the ordering of hitting times
and the sliding planes that are hit, seems to introduce perturbations that are, in some
sense, “small”.
Sec. 3.5. Experiments. 57

Finally, we note without giving details that the properties on energy dissipation (3.6
and 3.7) and Property 3.9 on Lyapunov functionals, carry over to 2-D, as well as their
proofs—with slight changes to accommodate the fact that a region may have more than
two neighbors in 2-D.

¥ 3.5 Experiments.
In this section we present examples in both 1-D and 2-D. The purpose of 1-D experi-
ments is to provide the basic intuition for how SIDEs work, as well as to contrast SIDEs
with the methods reviewed in the previous chapter. We do not claim that SIDEs are
the best for any of these 1-D examples, for which good results can be efficiently ob-
tained using simple algorithms. In 2-D, however, this is no longer true, and SIDEs have
considerable advantages over the existing methods.
Choosing a SIDE force function best suited for a particular application is an open
research question. (It is partly addressed in Chapter 4, by describing the problems for
which F (v) = sgn(v) is the best choice.) For the examples below, we use a very simple,
piecewise-linear force function F (v) = sgn(v) − Lv , depicted in Figure 3.9. Note that,

F(v)
1

-L L v
-1

Figure 3.9. The SIDE force function used in the experimental section.

formally, this function does not satisfy our definition (3.3) of a force function, since it is
negative for v > L. Therefore, in our experiments we always make sure that L is larger
than the dynamic range of the signal or image to be processed. In that case, thanks to
the maximum principle, we will have |ui (t) − uj (t)| < L for any pair of pixels at any
time t during evolution, and therefore F (|ui (t) − uj (t)|) > 0.
As we mentioned before, choosing the appropriate stopping rule is also an open
problem. In the examples to follow, we assume that we know the number of regions we
are looking for, and stop the evolution when that number of regions is achieved.

¥ 3.5.1 Experiment 1: 1-D Unit Step in High Noise Environment.

We first test this SIDE on a unit step function corrupted by additive white Gaussian
noise whose standard deviation is equal to the amplitude of the step, and which is
depicted in Figure 3.10(a). The remaining parts of this figure display snapshots of the
SIDE evolution starting with the noisy data in Figure 3.10(a), i.e., they correspond to
the evolution at a selected set of hitting times. The particular members of the scale
space which are illustrated are labeled according to the number of remaining regions.
58 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2
0 50 100 150 200 0 50 100 150 200

(a) Initial signal: noisy unit step. (b) 33 regions left.

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2
0 50 100 150 200 0 50 100 150 200

(c) 11 regions left. (d) 2 regions left.

Figure 3.10. Scale space of a SIDE for a noisy unit step at location 100: (a) the original signal;
(b)–(d) representatives of the resulting SIDE scale space.

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2
0 50 100 150 200 0 50 100 150 200

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2
0 50 100 150 200 0 50 100 150 200

Figure 3.11. Scale space of a Perona-Malik equation with a large K for the noisy step of Figure 3.10.

Note that the last remaining edge, i.e., the edge in Figure 3.10(d) for the hitting time at
which there are only two regions left, is located between samples 101 and 102, which is
quite close to the position of the original edge (between the 100-th and 101-st samples).
In this example, the step in Figure 3.10(d) also has amplitude that is close to that
of the original unit step. In general, thanks to the stability of SIDEs, the sizes of
discontinuities will be diminished through such an evolution, much as they are in other
evolution equations. However, from the perspective of segmentation this is irrelevant–
Sec. 3.5. Experiments. 59
7

−1

−2

−3
0 50 100 150 200

(a) 33 regions left.

−1

−2

−3
0 50 100 150 200

(b) 11 regions left.

−1

−2

−3
0 50 100 150 200

(c) 2 regions left.

Figure 3.12. Scale space of the region merging algorithm of Koepfler, Lopez, and Morel for the noisy
unit step signal of Figure 3.10(a).

i.e., the focus of attention is on detecting and locating the edge, not on estimating its
amplitude.
This example also provides us with the opportunity to contrast the behavior of a
SIDE evolution with a Perona-Malik evolution and in fact to describe the behavior that
originally motivated our work. Specifically, as we noted in the discussion of Property
3.5 of the previous section, a SIDE in 1-D can be approximated with a Perona-Malik
equation of a small thickness K. Observe that a Perona-Malik equation of a large
thickness K will diffuse the edge before removing all the noise. Consequently, if the
objective is segmentation, the desire is to use as small a value of K as possible. Following
the procedure prescribed by Perona, Shiota, and Malik in [50], we computed the
histogram of the absolute values of the gradient throughout the initial signal, and fixed
K at 90% of its integral. The resulting evolution is shown in Figure 3.11. In addition to
its good denoising performance, it also blurs the edge, which is clearly undesirable if the
objective is a sharp segmentation. The comparison of Figures 3.10 and 3.11 strongly
suggests that the smaller K the better. It was precisely this observation that originally
60 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

(a) Three edges (b) Blurred three edges + noise

4 4

3 3

2 2

1 1

0 0

0 50 100 150 200 0 50 100 150 200

(c) 10 regions left (d) 4 regions left

4 4

3 3

2 2

1 1

0 0

0 50 100 150 200 0 50 100 150 200

Figure 3.13. Scale space of a SIDE for a noisy blurred 3-edge staircase: (a) noise-free original signal;
(b) its blurred version with additive noise; (c),(d) representatives of the resulting SIDE scale space.

motivated the development of SIDEs. However, while in 1-D a SIDE evolution can be
viewed precisely as a limit of a Perona-Malik evolution as K goes to 0, there is still
an advantage to using the form of the evolution that we have described rather than
a Perona-Malik evolution with a very small value of K. Specifically, the presence of
explicit reductions in dimensionality during the evolution makes a SIDE implementation
more efficient than that described in [50]. Even for this simple example the Perona-
Malik evolution that produced the result comparable to that in Figure 3.10 evolved
approximately 5 times more slowly than our SIDE evolution. (Both were implemented
via forward Euler discretization schemes [14] in MATLAB.) Although a SIDE in 2-D
cannot be viewed as a limit of Perona-Malik evolutions, the same comparison in speed
of evolution is still true, although in this case the difference in computation time can
be orders of magnitude.
In this example, the region merging method of Koepfler, Lopez and Morel [36] works
quite well (see Figure 3.12). We will soon see, however, that it is not as robust as SIDEs:
its performance worsens dramatically when signals are corrupted with a heavy-tailed
noise.

¥ 3.5.2 Experiment 2: Edge Enhancement in 1-D.

Our second one-dimensional example shows that SIDEs can stably enhance edges. The
staircase signal in the upper left-hand corner of Figure 3.13 was convolved with a
Gaussian and corrupted by additive noise. The evolution was stopped when there
were only four regions (three edges) left. The locations of the edges are very close to
those in the original signal. (Note that the amplitudes of the final signal are quite
different from those of the initial condition. This is immaterial, since we are interested
in segmentation, not in restoration.)
Sec. 3.5. Experiments. 61

¥ 3.5.3 Experiment 3: Robustness in 1-D.

−1

−2

−3
0 50 100 150 200

Figure 3.14. A unit step with heavy-tailed noise.

7 7

6 6

5 5

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2

−3 −3
0 50 100 150 200 0 50 100 150 200

7 7

6 6

5 5

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2

−3 −3
0 50 100 150 200 0 50 100 150 200

7 7

6 6

5 5

4 4

3 3

2 2

1 1

0 0

−1 −1

−2 −2

−3 −3
0 50 100 150 200 0 50 100 150 200

Figure 3.15. Scale spaces for the signal of Figure 3.14: SIDE (left) and Koepfler-Lopez-Morel (right).
Top: 33 regions; middle: 11 regions; bottom: 2 regions.

We now compare the robustness of our algorithm to Koepfler, Lopez, and Morel’s
[36] region merging minimization of the Mumford-Shah functional [44]. For that pur-
pose, we use Monte-Carlo simulations on a unit step signal corrupted by “heavy-tailed”
62 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

16 45
Koepfler−Lopez−Morel Koepfler−Lopez−Morel
SIDE SIDE
14 40

35
12

30
10

25
8
20

6
15

4
10

2 5

0 0
0 0.05 0.1 0.15 1 2 3

(a) (b)
Figure 3.16. Mean absolute errors for Monte-Carlo runs. (Koepfler-Lopez-Morel: solid line; SIDE:
broken line.) The error bars are ±two standard deviations. (a) Different contamination probabilities
(0, 0.05, 0.1, and 0.15); contaminating standard deviation is fixed at 2. (b) Contamination probability
is fixed at 0.15; different contaminating standard deviations (1, 2, and 3).

noise which is, with high probability 1 − ε, normally distributed with σ1 = 0.1, and,
with low probability ε, normally distributed with a larger standard deviation σ2 . A
typical sample path, for ε = 0.1 and σ2 = 2, is shown in Figure 3.14. The SIDE and
Koepfler-Lopez-Morel scale spaces for this signal are illustrated in Figure 3.15. During
every Monte-Carlo trial, each algorithm was stopped when only two regions remained,
and the resulting jump location was taken as the output. When σ2 = 2, the mean ab-
solute errors in locating the jump for ε = 0, ε = 0.05, ε = 0.1, and ε = 0.15 are shown
in Figure 3.16(a) (the solid line is Koepfler-Lopez-Morel, the broken line is SIDE). The
error bars are ±two standard deviations. Figure 3.16(b) shows the mean absolute errors
for different standard deviations σ2 of the contaminating Gaussian, when ε is fixed at
0.15.
As we anticipated in Chapter 2 and will further discuss in the next section, the
quadratic term of the Mumford-Shah energy makes it non-robust to heavy-tailed noise,
and the performance degrades considerably as the contamination probability and the
variance of the contaminating Gaussian increase. Note that when σ2 = 3 and ε = 0.15,
using the Koepfler-Lopez-Morel algorithm is not significantly better than guessing the
edge location as a random number between 1 and 200. At the same time, our method
is very robust, even if the outlier probability is as high as 0.15.
Figure 3.17 shows the scale space generated by a Perona-Malik equation for the
step signal with heavy-tailed noise depicted in Figure 3.14. As in Experiment 1, K was
fixed at 90% of the histogram of the gradient, in accordance with Perona, Shiota, and
Malik [50]. As before, its de-noising performance is good; however, it also introduces
blurring and therefore its output does not immediately provide a segmentation. In
order to get a good segmentation from this procedure, one needs to devise a stopping
rule, so as to stop the evolution at a scale when noise spikes are diffused but the step
Sec. 3.5. Experiments. 63
7

−1

−2

−3
0 50 100 150 200

−1

−2

−3
0 50 100 150 200

−1

−2

−3
0 50 100 150 200

−1

−2

−3
0 50 100 150 200

Figure 3.17. Scale space of a Perona-Malik equation with large K for the signal of Figure 3.14.

is not completely diffused (such as in the second plot of Figure 3.14). In addition, one
needs to use an edge detector in order to extract the edge from the signal at that scale.
We again emphasize that neither SIDEs, nor the Koepfler-Lopez-Morel algorithm,
nor any combination of the Perona-Malik equation with a stopping rule and an edge
detector, are optimal for this simple 1-D problem, for which near-perfect results can be
achieved in a computationally efficient manner by very simple procedures. The purpose
of including this example is to provide statistical evidence for our claim of robustness of
SIDEs. This becomes very important for complicated 2-D problems, such as the ones
considered in the next example, where simple techniques no longer work.
64 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

1000 regions left 100 regions left

100 100

200 200
100 200 100 200
2 regions left SAR image: final segmentation

100 100

200 200
100 200 100 200

Figure 3.18. Scale space of a SIDE for the SAR image of trees and grass, and the final boundary
superimposed on the initial image.

1000 regions left 100 regions left 2 regions left

100 100 100

200 200 200

100 200 100 200 100 200

Figure 3.19. Segmentations of the SAR image via the region merging method of Koepfler, Lopez,
and Morel.

¥ 3.5.4 Experiment 4: SIDE Evolutions in 2-D.

Both the sharpness of boundaries and robustness of SIDEs are also evident in the image
experiments we have conducted. These properties are used to advantage in segmenting
the SAR image of Figure 1.1 in which only two textures are present (forest and grass).
The scale space is shown in Figure 3.18 (with the intensity values of each image scaled so
as to take up the whole grayscale range), as well as the resulting boundary superimposed
onto the original log-magnitude image. SAR imagery, such as the example shown here,
are subject to the phenomenon known as speckle, which is present in any coherent
imaging system and which leads to the large amplitude variations and noise evident
in the original image. Consequently, the accurate segmentation of such imagery can
Sec. 3.5. Experiments. 65

be quite challenging and in particular cannot be accomplished using standard edge

detection algorithms. For example, the scale space of the region merging algorithm
of [36], as implemented in [20] and discussed above, is depicted in Figure 3.19. If
evolved until two regions remain, it will find the boundary around a burst of noise. In
contrast, the two-region segmentation displayed in Figure 3.18(d) is very accurate.
We note, that, as mentioned in Experiment 1, the SIDE evolutions require far less
computation time than Perona-Malik-type evolutions. Since in 2-D a SIDE evolution
is not a limiting form of a Perona-Malik evolution, the comparison is not quite as
simple. However, in experiments that we have performed in which we have devised
Perona-Malik evolutions that produce results as qualitatively similar to those in Figure
3.18 as possible, we have found that the resulting computational effort is roughly 130
times slower for this (201 × 201) image than our SIDE evolution. (As in 1-D, we used
forward Euler discretization schemes [14] to implement both equations.)

10 10

20 20

30 30

40 40

50 50

60 60

70 70

80 80

90 90

100 100
10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100

(a) Ultrasound image of a thyroid. (b) 2000 regions.

10 10

20 20

30 30

40 40

50 50

60 60

70 70

80 80

90 90

100 100
10 20 30 40 50 60 70 80 90 100 10 20 30 40 50 60 70 80 90 100

(c) 170 regions. (d) 2 regions.

Figure 3.20. Scale space of a SIDE for the ultrasound image of a thyroid.

We close this section by presenting another segmentation example in 2-D. In the

ultrasound image of a thyroid (Figure 3.20(a)), both speckle and blurring are evident.
This degradation, inherent to ultrasound imaging, makes the problem of segmentation
very difficult. Three images from the SIDE scale space are shown in Figure 3.20(b-d).
The two-region segmentation (Figure 3.20(d)) is again very accurate.
We note that SIDEs have been used successfully in the context of another medical
application in [19], namely segmentation of dermatoscopic images of skin lesions.
66 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS

¥ 3.6 Related Approaches.

¥ 3.6.1 Mumford-Shah, Geman-Reynolds, and Zhu-Mumford.
The global energy (3.18) associated with SIDEs is similar to the first term of the image
restoration model of D. Geman and Reynolds [21], as well as to Zhu and Mumford’s
potential function [75]. As we showed in the experimental section, this not only leads
to sharp segmentations, but also allows our method to be more robust to heavy-tailed
noise than algorithms which use quadratic energy terms. An important distinction
from [21] and [75] is the interpretation of SIDEs as a region merging method, which
leads to a much faster numerical implementation. It is also a more universal method
of edge sharpening, since, unlike the algorithm of Geman and Reynolds, it does not
require the knowledge of the blur model.
Since SIDEs are implemented via recursive region merging, it is instructive to com-
pare them with other recursive region merging algorithms, such as Koepfler, Lopez,
and Morel’s [36] implementation of Mumford-Shah [44] segmentation, in which merging
of neighboring regions occurs if it reduces the energy EM S (u) = (u − u0 )T (u − u0 ) + λl.
(Here l is the total length of the boundaries, and λ is a scale parameter: a larger λ
imposes greater penalty on boundaries, which results in a coarser segmentation u.) The
first term of this functional makes it non-robust to noise outliers. As we have seen, this
term quadratically penalizes the difference between the initial image and its approxima-
tion u, thereby causing very large outliers present in the original image u0 to re-appear
in u, even for large values of λ. This was clearly illustrated by the examples presented
in the experimental section. The absence of such a term from the SIDE energy allows
the evolution to diffuse strong bursts of noise, making it robust.

¥ 3.6.2 Shock Filters and Total Variation.

Replacing the discrete vector u(t) with a function u(t, x) of a continuous spatial variable
x, and replacing first differences with derivatives in Equation (3.4), we see that, for
∂
mn = 1, Equation (3.4) is a discretization of ut = ∂x [F (ux )], where letter subscripts
denote corresponding partial derivatives. Expanding the SIDE force function again as
F (v) = Fid (v) + C sgn(v), we obtain:

∂ 0
ut = C [sgn(ux )] + Fid (ux )uxx . (3.27)
∂x
The first of the RHS terms is the 1-D version of the gradient descent on total variation.
It has very good noise removal properties, but, if used alone, it will ultimately blur
the signal. If Fid (v) = − 12 v|v|, then the second term is equal to the RHS of one of the
shock filters introduced by Osher and Rudin in [46]—namely, Equation (2.16) which we
considered in Chapter 2. Discretizations of certain shock filters are excellent for edge
enhancement, but, as we saw in Chapter 2, they cannot remove noise (and, in fact, some
of them are unstable and noise-enhancing.) Thus, SIDEs combine the noise-suppressive
properties of the total variation approach with the edge-sharpening features of shock
Sec. 3.7. Conclusion. 67

filters. It should be noted, however, that (3.27) requires careful interpretation, because
its RHS contains the signum function of ux which itself may have both singularities
and segments over which it is identically zero. In addition, this strange object is dif-
ferentiated with respect to x. Thus, the interesting research issue arises of defining
what one means by a solution to (3.27), in such a manner that the definition results
in solutions relevant to the desired image processing applications. This complicated
problem is avoided entirely with the introduction of SIDEs, since there one starts with
a semi-discrete formulation, in which the issues of the existence and uniqueness of so-
lutions are well understood. The SIDEs are thus a logical extension of Bouman and
Sauer’s approach of [6, 58] in which images are discrete matrices of numbers, rather
than functions of two continuous variables. As described in Chapter 2, Bouman and
Sauer proposed minimizing an energy functional consisting of two terms, one of which
is the discrete counterpart of the total variation. Their method of quickly computing
a local minimum of this non-differentiable functional involved merging pixels and thus
anticipated SIDEs.

¥ 3.7 Conclusion.
In this chapter we have presented a new approach to edge enhancement and segmenta-
tion, and demonstrated its successful application to signals and images with very high
levels of noise, as well as to blurry signals. Our approach is based on a new class of
evolution equations for the processing of imagery and signals which we have termed
stabilized inverse diffusion equations or SIDEs. These evolutions, which have discontin-
uous right-hand sides, have conceptual and mathematical links to other evolution-based
methods in signal and image processing, but they also have their own unique qualitative
characteristics and properties. The next chapter is devoted to extensive analysis of a
particular version of SIDEs.
68 CHAPTER 3. IMAGE SEGMENTATION WITH STABILIZED INVERSE DIFFUSION EQUATIONS
Chapter 4

Probabilistic Analysis

¥ 4.1 Introduction.
HE recent years have seen a great number of exciting developments in the field of
T nonlinear diffusion filtering of images. As summarized in Chapter 2 and Section 3.6,
many theories have been proposed that result in edge-preserving scale spaces possessing
various interesting properties. One striking feature unifying many of these frameworks–
including the one introduced in the previous chapter–is that they are deterministic.
Usually, one starts with a set of “common-sense” principles which an image smoothing
operation should satisfy. Examples of these are the axioms in [1] and the observation
in [49] that, in order to achieve edge preservation, very little smoothing should be done
at points with high gradient. From these principles, a nonlinear scale space is derived,
and then it is analyzed–again, deterministically. Note, however, that since the objective
of these techniques is usually restoration or segmentation of images in the presence of
noise, a natural question to ask would be:

Do nonlinear diffusion techniques solve standard estimation or detection

problems? (*)

An affirmative answer would help us understand which technique is suited best for a
particular application, and aid in designing new algorithms. It would also put the tools
of the classical detection and estimation theory at our disposal for the analysis of these
techniques, making it easier to tackle an even more crucial question:

Given a probabilistic noise model, can one characterize the performance of

the nonlinear diffusion techniques? (**)

Attempts to address these issues in the literature have remained scarce–most likely,
because the complex nature of the nonlinear partial differential equations (PDEs) con-
sidered and of the images of interest make this analysis prohibitively complicated. Most
notable exceptions are [63,75] which establish qualitative relations between the Perona-
Malik equation [49] and gradient descent procedures for estimating random fields mod-
eled by Gibbs distributions. Bayesian ideas are combined in [76] with snakes and region
growing for image segmentation. In [5], concepts from robust statistics are used to

69
70 CHAPTER 4. PROBABILISTIC ANALYSIS

(a) (b)
Figure 4.1. Functions F from the right-hand side of the SIDE: (a) generic form; (b) the signum
function.

modify the Perona-Malik equation. In [38], a connection between random walks and
diffusions is used to obtain a new evolution equation.
The goal of this chapter is to move forward the discussion of questions (*) and (**).
We consider a very simple nonlinear diffusion (a variant of those introduced in the
previous chapter) which provides a multiscale sequence of segmentations of its initial
condition. In Sections 4.3 and 4.4, we apply our algorithm to 1-D signals, and describe
edge detection problems which are solved optimally by this diffusion. These are binary
classification problems: each sample has to be classified as coming from one of two
classes, subject to the constraint on the number of “edges”—i.e., changes between the
two classes. One of these problems turns out to be the minimization of a special case
of the 1-D Mumford-Shah functional [44]. We describe an efficient implementation of
the 1-D diffusion, requiring O(N log N ) computations in the worst case, where N is
the size of the input signal. In Section 4.5, we point out that the same 1-D problem
can also be solved via dynamic programming and via linear programming, but that
our method has certain advantages over both. To analyze the performance (Section
4.6), we simplify even further, by considering signals with only one change in mean.
Our performance measure is the accuracy in locating the change. More precisely, the
probability of events of the form “the detected change location is more than p samples
away from the actual one” is analyzed. We show that the asymptotic probabilities
of these events can be obtained directly from the classical change detection paper by
Hinkley [28]. We also derive non-asymptotic lower bounds on these probabilities. The
robustness of the algorithm—which is experimentally confirmed both in this chapter
and in Chapter 3—is analyzed theoretically by showing the optimality with respect to
a certain H∞ -like criterion. In Section 4.7, we treat segmentation of 2-D images.

¥ 4.2 Background and Notation.

In this chapter, we consider a special case of SIDEs (3.11) in 1-D, which results if
one drops the “monotonically decreasing” requirement on F , and takes F (v) = sgn(v)
instead (see Figure 4.1, (b)). Specifically, we are interested in the evolution of the
Sec. 4.2. Background and Notation. 71

following equation:
sgn(u2 − u1 ) sgn(uN −1 − uN )
u̇1 = , u̇N = ,
m1 mN
1
u̇n = (sgn(un+1 − un ) − sgn(un − un−1 )), (4.1)
mn
for n = 2, . . . , N − 1,

with the initial condition

u(0) = u0 , (4.2)

where, as we explain in Section 4.4, u0 is either the signal to be processed or a sequence

of logarithms of pointwise likelihood ratios. As in the previous chapter, N stands for
the number of samples in the signals under consideration. Boldface letters denote these
signals, whose entries are always denoted by the same letter with subscripts 1 through
N : u = (u1 , . . . uN )T . Initially the finest segmentation is assumed: each pixel is a
separate region, i.e. mn = 1, for n = 1, . . . , N . As soon as ui becomes equal to ui+1 ,
these values stay equal forever, and their equations are replaced with
(sgn(ui+2 − ui+1 ) − sgn(ui − ui−1 ))
u̇i = u̇i+1 = , (4.3)
mi + mi+1
which is Equation (3.10). The rest of the present chapter deals with the particular
SIDE (4.1,4.2,4.3).
We apply the SIDE (4.1,4.2) to binary classification problems. Given an observation
y, the goal is to label each sample as coming from one of two classes, i.e. to produce a
binary signal h whose entries are zeros and ones. We call any such binary signal h an
hypothesis.
Definition 4.1. We denote the set of all N -dimensional binary signals by {0, 1}N . A
member h of this set is called an hypothesis with the following interpretation. Specif-
ically, if hi 6= hi+1 , we say that an edge (or, equivalently, a change) is hypothesized
at the location i, and we call sgn(hi+1 − hi ) the sign of the edge. We say that an
edge is directed upward (downward) if hi+1 > hi (hi+1 < hi ).
Definition 4.2. A statistic is a function φ : IRN × {0, 1}N → IR. The optimal
hypothesis h∗ (u) for a signal u ∈ IRN with respect to φ is

h∗ (u) = arg max φ(u, h).

def
h∈{0,1}N

The best hypothesis among those whose number of edges does not exceed some constant
ν is

h∗≤ν (u) = arg

def
max φ(u, h).
h ∈ {0, 1}N , h has ν or fewer edges
72 CHAPTER 4. PROBABILISTIC ANALYSIS

−1
0 2 4 6 8 10

0 2 4 6 8 10

Figure 4.2. Illustrations of Definitions 4.1 and 4.3: a sequence with three α-crossings, where α=3
(top); the hypothesis generated by the three α-crossings (middle); the hypothesis generated by the two
rightmost α-crossings (bottom).

Note that an hypothesis is uniquely defined by the set of its edges and the sign
of one of the edges. Therefore, binary classification problems can also be viewed as
edge detection problems. For the problems considered in this chapter, the optimal edge
locations will typically be level crossings of some signal.
Definition 4.3. A signal u is said to have an α-crossing at the location i if (ui − α)(uj − α) < 0,
where j = min{n: n > i, un 6= α}. (In other words, uj is the first sample to the right of
i which is not equal to α.) We call sgn(α − ui ) the sign of the α-crossing, and say
that the α-crossing is directed upward (downward) if ui < α (ui > α). We define the
hypothesis generated by a set of α-crossings {g1 , . . . , gν } of u as the hypothesis whose
edges are at g1 , . . . , gν and for which the sign of each edge is the same as the sign of
the corresponding α-crossing.
To illustrate Definitions 4.1, 4.2, and 4.3, let us consider an example.
Example 4.1. Illustration of the definitions of edges, α-crossings, and statistics.
Suppose
u = (1, 2, 2, 3, 3, 4, −1, 2, 5)T
Sec. 4.3. SIDE as an Optimizer of a Statistic. 73

(see top of Figure 4.2), and α = 3. Then u has three α-crossings, at locations g1 = 3,
g2 = 6, and g3 = 8. The second one is directed downward, and the other two are directed
upward. The hypothesis h1 generated by these three α-crossings must therefore have
upward edges at 3 and 8 and a downward edge at 6:

h1 = (0, 0, 0, 1, 1, 1, 0, 0, 1)T ,

as depicted in the middle plot of Figure 4.2. The hypothesis h2 generated by the
α-crossings g2 and g3 will only have a downward edge at 6 and an upward edge at 8:

h2 = (1, 1, 1, 1, 1, 1, 0, 0, 1)T ,

as shown in the last plot of Figure 4.2. If we define a statistic φ by

φ(u, h) = hT (u − a),
where a = (3, . . . , 3)T ∈ IR9 ,

then we have:

φ(u, h1 ) = 3,
φ(u, h2 ) = −1.

¥ 4.3 SIDE as an Optimizer of a Statistic.

The usefulness of the SIDE (4.1,4.2) in solving edge detection problems comes from its
ability to maximize certain statistics. One of the properties of such a statistic φ(u, h)
is that the optimal hypothesis h∗ (u) is generated by the set of all α-crossings of u, for
some number α. It is shown below in Proposition 4.1 that every edge of h∗≤ν (u) also
occurs at an α-crossing of u. Furthermore, Proposition 4.2 describes how to find these
edges using the SIDE (4.1): it says that the α-crossings of the solution u(t) to the SIDE
(4.1) then generate the constrained optimal hypothesis h∗≤ν (u0 ), where u0 is the initial
data, and ν is the number of α-crossings of u(t). We note that when we talk about P α-
crossings of u in Propositions 4.1 and 4.2, we will allow α to be a function of N1 Ni=1 i .
u
This is necessary to P
cover some important examples considered later in Section 4.4. On
the other hand, N1 N i=1 ui (t) is constant throughout the evolution of the SIDE, as
1 PN
easily verified by summing up the equations (4.1). Thus, α( N i=1 ui ) will also stay
constant during the evolution of the SIDE. We therefore will simply refer to α, dropping
its argument to avoid notational clutter. It will, however, be our implicit assumption
throughout the remainder of this chapter that P whenever we mention α-crossings of a
signal u, α is allowed to be a function of N1 N i=1 ui .
74 CHAPTER 4. PROBABILISTIC ANALYSIS

Proposition 4.1. Let α : IR → IR be a scalar function, and let a : IR → IRN be

the vector function whose every entry is α: a(x) = (α(x), . . . , α(x))T for any x ∈ IR.
Define the following statistic:
Ã Ã !!
1 X
N
φ(u, h) = hT
u−a ui . (4.4)
N
i=1

Then every edge of h∗≤ν (u0 ) occurs at an α-crossing of u0 , for any u0 ∈ IRN .
Proof is in Appendix B.
Proposition 4.2. Suppose that φ is the statistic described in Proposition 4.1. Fix the
initialPcondition u0 of the SIDE (4.1), and let u(t) be the corresponding solution. Then
α( N1 N i=1 ui (t)) is constant during the evolution of the SIDE, as verified by summing
up the equations (4.1). Let να (t) be the number of α-crossings of u(t). Then, for any
time instant tf > 0,
h∗≤να (tf ) (u0 ) = h∗ (u(tf )).
The proof is in Appendix B. We note that Proposition 1 of [51] is a different
formulation of the same result: in [51], we explicitly listed the properties of φ which are
used in the proof. The equivalence of the two formulations is also shown in Appendix
B.
This proposition says that, if the SIDE is evolved until να (t) α-crossings remain,
then these α-crossings are the optimal edges, where “optimality” means maximizing
the statistic φ(u0 , h) subject to the constraint that the hypothesis have να (t) or fewer
edges. It is verified in the next subsection that να (t) is a non-increasing function of time,
with να (∞) = 0. Unfortunately, να (t) is not guaranteed to assume every integer value
between να (0) and 0: during the evolution of the SIDE, α-crossings can disappear in
pairs. We will show in the next subsection, however, that no more than two α-crossings
can disappear at the same time. We will also show that, even if for some integer
ν < να (0) there is no t such that να (t) = ν (i.e. να (t) goes from ν + 1 directly to ν − 1),
we can still easily find h∗≤ν (u0 ) using the set of α-crossings of the solution u(t) to the
SIDE at the time t when να (t) = ν + 1. If the desired number of edges is greater than
or equal to the initial number of α-crossings, ν ≥ να (0), then, from the definitions of
h∗≤ν (u0 ) and h∗ (u0 ), we immediately have:
Proposition 4.3. Suppose that φ is the statistic described in Proposition 4.1. If ν ≥
να (0), then
h∗≤ν (u0 ) = h∗ (u0 ).
In the remainder of the chapter, we assume that ν < να (0).
In Section 4.4, we will give examples of detection problems whose solution is equiv-
alent to maximizing the statistic φ. We will therefore be able to utilize the SIDE for
optimally solving these problems. Before we do that, however, we describe how to
efficiently implement the SIDE.
Sec. 4.3. SIDE as an Optimizer of a Statistic. 75

¥ 4.3.1 Implementation of the SIDE Via a Region Merging Algorithm.

We now show how to solve efficiently the optimization problem treated in Proposition
4.2 of the previous subsection, using the SIDE (4.1).

Problem 4.1. Define a statistic

Ã Ã !!
1 X
N
φ(u, h) = h T
u−a ui ,
N
i=1

where a(x) = (α(x), . . . , α(x))T ∈ IRN , and α is a real-valued function of a real argu-
ment. Given an integer ν and a signal u0 , we are interested in finding the best hypothesis
h∗≤ν (u0 ) among all the hypotheses with ν or fewer edges, where “the best” means the
one maximizing φ(u0 , ·).

Proposition 4.2 relates h∗≤ν (u0 ) to the solution u(t) of the SIDE whose initial data is
u0 . Namely, it says that if ν is the number of α-crossings of u(t)1 , then these α-crossings
generate the hypothesis h∗≤ν (u0 ). It is, however, not guaranteed that for every integer
ν there is a time instant t when the solution u(t) has exactly ν α-crossings. Therefore,
in order to compute the solution to Problem 4.1, we need to deal with two issues:

(A) How to calculate the locations of the α-crossings of u(t)?

(B) How to compute h∗≤ν (u0 ) for every integer ν?

We first consider issue A. In order to find the α-crossings of the solution to the SIDE, one
can certainly use a finite difference scheme to numerically integrate the equation. There
is, however, a much faster way, which exploits the special structure of the equation. It
turns out that, during the evolution of the SIDE, α-crossings cannot be created or
shifted: they can only be erased. We therefore only need to compute the order in which
they disappear. We now make these statements precise.

Lemma 4.1. Suppose that at time t0 , the solution u(t0 ) to the SIDE has no α-crossing
at the location i. Then u(t) has no α-crossing at i, either, for all t ≥ t0 .

Example 4.2 (Example 4.1, continued). Illustration of Lemma 4.1.

We illustrate this Lemma by evolving the SIDE (4.1) for the initial condition

u0 = (1, 2, 2, 3, 3, 4, −1, 2, 5)T

(top of Figure 4.2) and α = 3. The values of the solution for several time instants are
recorded in Table 1.

1 1
PN
Just as in the previous subsection, we abuse notation by dropping the argument N i=1 ui (t) of
α.
76 CHAPTER 4. PROBABILISTIC ANALYSIS

Table 1: Evolution of the SIDE.

t uT (t)
t=0 (1, 2, 2, 3, 3, 4, −1, 2, 5)
t = 12 (1 12 , 2, 2, 3, 3, 3, 0, 2, 4 12 )
t=1 (2, 2, 2, 2 23 , 2 23 , 2 23 , 1, 2, 4)
1
t = 12 (2 6 , 2 16 , 2 16 , 2 13 , 2 13 , 2 13 , 2, 2, 3 12 )
1

t = 1 23 (2 29 , 2 29 , 2 29 , 2 29 , 2 29 , 2 29 , 2 16 , 2 16 , 3 23 )
t = 2 (2 14 , 2 14 , 2 14 , 2 14 , 2 14 , 2 14 , 2 14 , 2 14 , 3 13 )
t = 2 23 (2 13 , 2 13 , 2 13 , 2 13 , 2 13 , 2 13 , 2 13 , 2 13 , 2 23 )

At the beginning, u(t) has three α-crossings, at locations 3, 6, and 8. At time t = 12 ,

the first two α-crossings disappear. Between t = 2 and t = 2 23 , the α-crossing located
at 8 also disappears.
In order to prove Lemma 4.1, we need to re-define a region (i, j) of a signal u as the
set of all samples {ui , . . . , uj } either between two consecutive α-crossings i − 1 and j
of u, or to the left of the leftmost α-crossing j (in which case i = 1), or to the right of
the rightmost α-crossing i − 1 (in which case j = N )2 . For example, referring to Table
1, u(0) has four regions: (1, 3), (4, 6), (7, 8), and (9, 9). Similarly, u(2) has two regions:
(1, 8) and (9, 9). Note that a sample uk (t) belonging to a region (i, j) can only cross
the level α if all other samples in its region do so at the same time:

Lemma 4.2. Let (i, j) be a region of u(t0 ), where t0 ≥ 0, and let the intensity values
inside this region be above α: ui (t0 ) > α, . . . , uj (t0 ) > α. Let t1 be the first time instant
after t0 at which one of the values inside the region, say uk (t1 ), becomes equal to α:

t1 = inf{t > t0 : ∃k, i ≤ k ≤ j, uk (t) = α}.

Then

ui (t1 ) = ui+1 (t1 ) = . . . uj (t1 ).

Proof. Notice that, according to Equations (4.1,4.3), uk (t) can be decreasing only
if it is a local maximum (i.e., if uk (t) ≥ uk±1 (t)). Thus, at time t1 , we have that

• uk (t1 ) = α is a local maximum;

• all other values inside the region (i, j) are ≥ α.

Consequently, it must be that the values at all the samples inside the region (i, j) are
equal to α.
Proof of Lemma 4.1. A proof similar to the one above applies to the variant of Lemma
4.2 when ui (t0 ), . . . , uj (t0 ) are less than α. Thus, the value uk (t) at any location k can
2
Note that this definition is somewhat different from the one in Chapter 3, where all pixels in a
region had the same value.
Sec. 4.3. SIDE as an Optimizer of a Statistic. 77

cross the level α only when its whole region does so. This means that the evolution of
the SIDE cannot create or shift α-crossings; it can only remove them.
We now show how to calculate the order in which the regions disappear. It turns
out that this ordering depends on how the removal of a region influences the statistic
φ(u(t), h). Define the energy Eij of the region (i, j) by
¯ j ¯
1 ¯¯X ¯
¯
Eij (t) = ¯ (un (t) − α)¯ , (4.5)
ρij ¯ ¯
n=i
½
1 if i = 1 or j = N
ρij =
2 otherwise.

Note that the energy measures the contribution of the region (i, j) to φ(u(t), h). Sum-
ming up the equations (4.1) from n = i to n = j, we see that, for every region (i, j),
Ėij (t) = −1. A region (i, j) is erased when the values of all its samples reach α—i.e.,
when the energy Eij (t) becomes equal zero. Since all the energies are diminished with
the same speed, it follows that the first region to disappear will be the one for which
Eij (0) is the smallest. Applying this reasoning recursively, we then remove the region
with the next smallest energy, etc., obtaining the following algorithm to compute the
α-crossings of u(t).
1. Initialize. Let A be the set of all α-crossings of u0 , ordered from left to right, and
let ν̄ = να (0) be the total number of α-crossings.

2. Compute the energies. Denote the elements of the set A by g1 , . . . , gν̄ , and form
ν̄ + 1 regions: (1, g1 ), (g1 + 1, g2 ), . . . , (gν̄ + 1, N ). For each region (i, j), compute its
energy Eij , as defined by (4.5).

3. Remove the region with minimal energy. Let (im , jm ) be the region for which
Eij is the smallest (if there are several regions with the smallest energy, choose any
one). Merge the region (im , jm ) with its neighbors by re-defining A and ν̄ via

A ← A\{im , jm }, ν̄ ← the size of the new A.

4. Iterate. If ν̄ is greater than the desired number of edges, go to step 2.

To illustrate the algorithm, we apply it to the signal of Example 4.1 (top of Figure 4.2).
Example 4.3 (Examples 4.1 and 4.2, continued). Illustration of the region merg-
ing algorithm.
We again consider

u0 = (1, 2, 2, 3, 3, 4, −1, 2, 5)T .

Iteration 1. There are three α-crossings, and four regions: (1, 3), (4, 6), (7, 8), and
(9, 9), with the energies E13 = 4, E46 = 0.5, E78 = 2.5, and E99 = 2, respectively. The
78 CHAPTER 4. PROBABILISTIC ANALYSIS

region (4, 6) has the smallest energy, and therefore it is removed first, by merging it
with its two neighbors to form the new region (1, 8).
Iteration 2. There are now two regions, (1, 8) and (9, 9), with the energies E1,8 = 8
and E99 = 2, respectively. They are merged, to form one region (1, 9). Note that the
order in which the regions disappear is in agreement with Table 1.
We now show that the algorithm is fast. The initialization steps 1 and 2 take
O(N ) time. Step 3 merges either two regions (if im = 1 or jm = N ) or three regions
(otherwise). The energy of the new region is essentially the sum of the energies of its
constituent regions, and therefore the recomputation of energies after a merging takes
O(1) time. If a binary heap [11] is used to store the energies, the size of the heap at
every iteration of the algorithm will be equal to the current number of distinct regions,
which is ν̄ + 1, where ν̄ is the current number of α-crossings. Therefore, finding the
minimal element of the heap (step 3) at each iteration will take O(log ν̄ + 1) time, which
Pνα (0)
means that the algorithm will run in O( ν̄=ν+1 log ν̄ +N ) time. The worst case is when
ν = 1 and να (0) = N − 1. Then the computational complexity is O(N log N ). However,
if the desired number of edges ν is comparable with the initial number να (0) (which
can happen in low-noise scenarios), the complexity is O(N ).
We still have to address Question (B) which we posed at the beginning of this
subsection, namely, how to find h∗≤ν (u0 ) for every integer ν. If there is a time instant
t at which u(t) has exactly ν α-crossings, then, according to Proposition 4.2, these
α-crossings generate the hypothesis h∗≤ν (u0 ), which means that Problem 4.1 is solved.
The scenario which we need to consider now is when there is no such time t at which
u(t) has exactly ν α-crossings. As we showed above, computing the locations of α-
crossings of u(t) involves removing regions one at a time (Step 3 of the algorithm).
Thus, at most two α-crossings can disappear at the same time. Therefore, if u(t) never
has ν α-crossings, it must go from ν + 1 α-crossings directly to ν − 1. It turns out that,
in this case, one can still compute h∗≤ν (u0 ) by running the algorithm above until ν − 1
α-crossings remain, and then doing post-processing whose computational complexity is
O(N ). Specifically, the following proposition holds.
Proposition 4.4. Suppose that there is a time instant t during the evolution of the
SIDE such that u(t− ) has ν + 1 α-crossings, at locations g1 , . . . , gν+1 . The hypoth-
esis generated by these α-crossings is h∗≤ν+1 (u0 ). Suppose further that the region
(gk + 1, gk+1 ) disappears at time t, so that u(t+ ) has ν − 1 α-crossings, at locations
g1 , . . . , gk−1 , gk+2 , . . . , gν+1 . The hypothesis generated by these α-crossings is h∗≤ν−1 (u0 ).
Then one of the following four possibilities must happen.
(i) h∗≤ν (u0 ) = h∗≤ν−1 (u0 ).

(ii) h∗≤ν (u0 ) has edges at the locations g2 , . . . , gν+1 .

(iii) h∗≤ν (u0 ) has edges at the locations g1 , . . . , gν .

(iv) h∗≤ν (u0 ) has edges at the locations g1 , . . . , gk−1 , gk+2 , . . . , gν+1 , as well as one edge
at the location which is an element of the set {1, 2, . . . , g1 −1, gν+1 +1, . . . , N −1}.
Sec. 4.4. Detection Problems Optimally Solved by the SIDE. 79

Thus, finding h∗≤ν (u0 ) is achieved by running the SIDE and doing post-processing of
complexity O(N ).
Proposition 4.4 is the recipe for obtaining the optimal hypothesis h∗≤ν (u0 ) from
the ν + 1 α-crossings of u(t− ). It says that either ν or ν − 1 of these α-crossings
coincide with the edges of h∗≤ν (u0 ). Cases (ii) and (iii) describe the only two subsets
consisting of ν α-crossings which can generate h∗≤ν (u0 ). If only ν − 1 α-crossings of
u(t− ) coincide with the edges of h∗≤ν (u0 ), then either there are no other edges (Case
(i)), or the remaining edge is easily found in linear time (Case (iv)).
We note that a slight correction to what was reported in [51] is in order: although
the statement that the complexity of the post-processing is O(N ) is correct, the specific
post-processing procedure given there is somewhat different from the one outlined in
Proposition 4.4 above, and therefore, it may be incorrect for some data sequences.
As one can infer from the statement of this proposition, its proof is rather technical
and amounts to analyzing various scenarios of the disappearance of the α-crossings.
This proposition is a direct corollary of Proposition 4.1 and the following lemma.
Lemma 4.3. As in Proposition 4.4, let t be the time instant when the solution of the
SIDE goes from ν + 1 α-crossings to ν − 1. Let h = h∗≤ν (u0 ), and suppose that it is
generated by ν α-crossings of u0 : g1 , . . . , gν . (Note that this notation is different from
the notation of Proposition 4.4.) Then at least ν − 1 elements of the set {g1 , . . . , gν }
are also α-crossings of u(t− ), with possible exception of either g1 or gν . Furthermore,
if exactly ν − 1 elements of the set {g1 , . . . , gν } are α-crossings of u(t− ), they are also
α-crossings of u(t+ ),
Proof is in Appendix B.

¥ 4.4 Detection Problems Optimally Solved by the SIDE.

We now give examples of detection problems whose solution is equivalent to maximizing
the statistic φ of Proposition 4.2. These problems can therefore be optimally solved by
the SIDE.

¥ 4.4.1 Two Distributions with Known Parameters.

Let y be an observation of a sequence of N independent random variables. Suppose that
each random variable has probability density function (pdf) either f (y, θ0 ) or f (y, θ1 ),
where θ0 and θ1 are known. It is also known that the number of changes between the
two pdf’s does not exceed ν; however, it is not known where these changes occur.
To obtain the maximum likelihood hypothesis [66], one has to maximize the log
likelihood function
X X
log f (yi , θ1 ) + log f (yi , θ0 ),
i:hi =1 i:hi =0
80 CHAPTER 4. PROBABILISTIC ANALYSIS

−2

−4
0 200 400 600 800 1000

−2

−4
0 200 400 600 800 1000

−2

−4
0 200 400 600 800 1000

−2

−4
0 200 400 600 800 1000

Figure 4.3. Edge detection for a binary signal in Gaussian noise.

−2

−4
0 200 400 600 800 1000

1.5

0 200 400 600 800 1000

Figure 4.4. Detection of changes in variance of Gaussian noise.

where the hypothesis h is such that the sample yi is hypothesized to be from the pdf
f (y, θhi ). Note that by defining a signal consisting of pointwise log-likelihoods,

u0i = log f (yi , θ1 ) − log f (yi , θ0 ), (4.6)

we see that the log-likelihood is equal to

X
N
T 0
h u + log f (yi , θ0 ).
i=1

The second term is independent of h, and therefore maximizing this function is equiv-
alent to maximizing
def
φ(u0 , h) = hT u0 , (4.7)
Sec. 4.4. Detection Problems Optimally Solved by the SIDE. 81

which is the statistic of Proposition 4.2 with α = 0. Thus, the SIDE can be employed
for finding the maximum likelihood hypothesis h∗≤ν (u0 ), where u0 is related to the
observation y through (4.6).

Example 4.4. Changes in mean in a Gaussian random vector.

In this example, f (y, θj ) is the Gaussian density with mean θj and variance 1. We
took θ0 = 0 and θ1 = 1. We assumed that the right number of edges, 10, is known,
and so the stopping rule for the SIDE was να (t) ≤ 10. (In Subsection 4.4.3, we will
treat the situation when the number of edges is a random variable, rather than a known
parameter.)
The pointwise log-likelihoods (4.6) in this case are
1
u0i = yi − (θ1 + θ0 ). (4.8)
2
(Note that, if u(t) is the solution to the SIDE with the initial condition u0 of (4.8),
and u0 (t) is the solution to the SIDE with the initial condition u0 (0) = y, then u0 (t) =
u(t) + α0 , where
1
α0 = (θ1 + θ0 ), (4.9)
2
and therefore the zero-crossings of u(t) coincide with the α0 -crossings of u0 (t). Conse-
quently, we can simply evolve the SIDE with the data y as the initial condition, and
look at its α0 -crossings.)
Figure 4.3, from top down, depicts the true segmentation with ten edges, a corre-
sponding observation y, and the edges detected by the SIDE (the bottom plot will be
explained in the next subsection). Note that the result is extremely accurate, despite
the fact that the data is very noisy. The computations took 0.25 seconds on a Sparc
Ultra 1, thanks to the fast implementation described in Subsection 4.3.1.

Example 4.5. Changes in variance in a Gaussian random vector.

Now f (y, θj ) is a zero-mean Gaussian density with standard deviation θj ; θ0 = 1

and θ1 = 1.5. The changes between the two are at the same locations as the jumps
in the previous example (see the top plot of Figure 4.3). The top plot of Figure 4.4
shows an observation y. Again, we assume that the number of changes is known. The
bottom plot of Figure 4.4 shows the changes detected by the SIDE, depicted as a binary
sequence of θ0 ’s and θ1 ’s. In addition to being very accurate, the computations took
just 0.25 seconds.

¥ 4.4.2 Two Gaussian Distributions with Unknown Means.

Suppose that f (y, θj ) is the Gaussian density with mean θj and variance σ 2 . Let h be
an hypothesis, and let Y be a sequence of N random variables which are conditionally
82 CHAPTER 4. PROBABILISTIC ANALYSIS

independent given h, with the i-th random variable Yi having conditional pdf f (y, θhi ).
Let ν be an upper bound on √ the number of edges in h. Let K be the number of zeros
σ
in h, and define σ1 = θ1 −θ0
N . Let the prior knowledge be as follows:
θ0 and h are unknown;
σ, σ1 , and ν are known;
K is a random variable with the following discrete Gaussian probability mass function:
 Ã ! 
 1 k − N 2
pr(K = k) = C exp − 2
, k = 1, . . . , N − 1, (4.10)
 2 σ1 

where C is a normalization constant.

We stress here that, even though our model assumes the knowledge of σ and σ1 , we
will shortly see that the SIDE never uses them in computing the optimal hypothesis;
hence the title of this subsection is justified. The only parameter required by the SIDE
is ν. We also point out that the distribution (4.10) is a reasonable one: it assigns larger
probabilities to the hypotheses with roughly as many zeros as ones. The reason for
choosing this particular form of the distribution is to make the generalized likelihood
simplify to the statistic of Proposition 4.2. This implies that the SIDE can be used
to find the optimal solution. More precisely, given an observation y of Y, we seek the
best hypothesis in the generalized likelihood ratio sense [66]: the maximum likelihood
estimates of the hypothesis and θ0 are calculated for each value of K, and these estimates
are then used in a multiple hypothesis testing procedure to estimate K. In other words,
we seek
(ĥ, θ̂0 , K̂) = arg max (log f1 (y|h, θ0 , k) + log pr(K = k)),
h,θ0 ,k

where f1 is the conditional pdf of Y. After simplifying this formula, we obtain that ĥ
must maximize
N −k X
N
def
φ(y, h) = h y−
T
yi
N
i=1
Ã Ã !!
1 X
N
= hT
y−a yi ,
N
i=1

where α(x) = x, and, as in Proposition 4.2, a(x) = (α(x), . . . , α(x))T ∈ IRN . Thus,
according to Proposition 4.2, in order to find ĥ, one has to evolve the SIDE whose
initial condition is the observed signal: u0 = y. The α-crossings of the solution u(t)
will then coincide with the optimal edges, where

1 X
N
α= ui (t). (4.11)
N
i=1
Sec. 4.4. Detection Problems Optimally Solved by the SIDE. 83

Thus, the only difference from Example 4.4 is that the threshold α0 (4.9) is unknown,
since 12 (θ1 + θ0 ) is unknown. The threshold α (4.11) can be considered as an estimate
of α0 . If the underlying signal has roughly as many samples with mean θ0 as ones with
mean θ1 , then α is a good estimate of α0 , and we expect the estimates of the edge
locations to be comparable to those in Example 4.4—i.e., despite less knowledge, the
optimal estimates of the edge locations in this example would be similar to the optimal
estimates of Example 4.4. This is confirmed by the experimental result for the data of
Example 4.4, shown in the bottom plot of Figure 4.3, which is still very good and differs
from the result of Example 4.4 in only two pixels out of the thousand. If the number of
samples with mean θ0 greatly differs from N2 , we would expect α to be a poor estimate
of α0 , which will lead to larger errors in the optimal estimates of edge locations. This
situation, however, is of low probability, according to our model (4.10).

¥ 4.4.3 Random Number of Edges and the Mumford-Shah Functional.

As in the two previous subsections, let y be an observation of a sequence of N random
variables, with conditional densities p(yi |h) = f (yi , θhi ), where hi ∈ {0, 1}N . Given h,
the yi ’s are conditionally independent. Let the number ν of edges in h be a random
variable with a probability mass function pν (ν̄), ν̄ = 0, 1, . . . , N − 1. To find the
maximum a posteriori estimate of ν, we need to maximize the conditional probability
of ν, which depends on the unknown h. We therefore, as in the previous subsection,
use the generalized likelihood strategy [66]: the maximum likelihood estimates of h are
calculated for each value of ν, and then these estimates are used in a multiple hypothesis
test to estimate ν. In other words, we pick h to maximize
X
N
log p(yi |h) + log pν (ν̄). (4.12)
i=1

We saw in Subsection 4.4.1 that the h-dependent part of the likelihood term is hT u0 ,
where u0 is the sequence of log-likelihood ratios. Therefore, maximizing (4.12) is equiv-
alent to minimizing the following statistic ψ(u0 , h):

ψ(u0 , h) = −hT u0 − log pν (ν̄). (4.13)

We now relate (4.13) to the SIDE.

Proposition 4.5. Suppose that pν is monotonically non-increasing: pν (0) ≥ pν (1) ≥
. . . ≥ pν (N − 1). Then the minimizer of ψ(u0 , ·) (4.13) can be computed using the
SIDE, with post-processing whose computational complexity is O(N ).
Proof. Let hψ be the hypothesis which minimizes ψ(u0 , ·), and, as previously, let

h∗≤ν̄ (u0 ), for ν̄ = 0, 1, . . . , N − 1 (4.14)

be the hypothesis which achieves the maximal hT u0 among all the hypotheses with ν̄
or fewer edges. Suppose we could show that hψ is actually one of the hypotheses (4.14).
84 CHAPTER 4. PROBABILISTIC ANALYSIS

Then we could compute hψ as follows: run the SIDE to compute the N hypotheses
(4.14), compute ψ(u0 , ·) for each of them, and pick the hypothesis which results in the
smallest ψ(u0 , ·). To complete the proof, we now show that hψ is indeed one of the
hypotheses (4.14).
Let us fix an arbitrary hypothesis h̄ with ν̄ edges, and let ν ∗ ≤ ν̄ be the number of
edges in the hypothesis h∗≤ν̄ (u0 ). Then, by the definition of h∗≤ν̄ (u0 ), we have:
© ªT
h∗≤ν̄ (u0 ) u0 ≥ h̄T u0 . (4.15)

The monotonicity of pν implies

log pν (ν ∗ ) ≥ log pν (ν̄). (4.16)

Summing the two inequalities (4.15) and (4.16), we get:

ψ(u0 , h∗≤ν̄ (u0 )) ≤ ψ(u0 , h̄).

In other words, for an arbitrary hypothesis h̄, we found an hypothesis from among
(4.14) which results in a smaller (or equal) ψ(u0 , ·). Therefore, the optimal hypothesis
hψ is among (4.14).
The second main result of this subsection is that having the exponential distribution
pν is equivalent to specifying a stopping rule for the SIDE.
Proposition 4.6. Let
e−λ − 1 −λν̄
pν (ν̄) = e , for ν̄ = 0, 1, . . . , N − 1, (4.17)
e−λN − 1
and let hψ be the hypothesis which minimizes ψ(u0 , ·) (4.13). Then the algorithm of
Subsection 4.3.1, with a modified stopping rule, will produce hψ . The new stopping
criterion is:

min Eij (t) > λ,

(i, j) is a region of u(t)

where Eij(t) is defined as in (4.5).

Proof. Substituting pν (4.17) into (4.13), we see that minimizing ψ(u0 , h) with
respect to h is equivalent to minimizing the following function η(h):

η(h) = −hT u0 + λν̄. (4.18)

Suppose that the solution to the SIDE has ν̄ + 1 zero crossings at some time instant
t, and call the hypothesis generated by these zero crossings h1 . Let the next region to
be removed be (i∗ , j ∗ ), and call the hypothesis resulting from its removal h2 . Let us
denote by E ∗ (t) the energy of the region (i∗ , j ∗ ):

E ∗ (t) = min Eij (t).

(i, j) is a region of u(t)
Sec. 4.4. Detection Problems Optimally Solved by the SIDE. 85

In order to determine which hypothesis is better with respect to η(h), we will look at
η(h2 ) − η(h1 ). First note that
¯ ∗ ¯
¯Xj ¯
¯ ¯
(h2 − h1 )T u0 = − ¯¯ u0n ¯¯ = −ρi∗ ,j ∗ E ∗ (t),
¯n=i∗ ¯

and therefore

η(h2 ) − η(h1 ) = −(h2 − h1 )T u0 + λ(ν̄ + 1 − ρi∗ ,j ∗ ) − λ(ν̄ + 1) = ρi∗ ,j ∗ (E ∗ (t) − λ).

So, if

E ∗ (t) > λ, (4.19)

we prefer h1 to h2 —i.e., it is better not to remove the region (i∗ , j ∗ ); otherwise, h2

is better than h1 —i.e., we prefer to merge. Note that, by definition, E ∗ (t) is a non-
decreasing function of time, and so if (4.19) holds for some t, it will also hold for all
future times. Also, if (4.19) is violated at some t, it is also violated for all past times.
So, (4.19) is indeed a stopping rule.
The proof is not over, however, due to the fact that, as we remarked in Section 4.3,
not every optimal hypothesis is generated by the SIDE. Therefore, we still need to rule
out the possibility that the SIDE skips over the optimal hypothesis hψ . In other words,
we need to make sure that the following situation does not occur:
• hψ = h∗≤ν̄ (u0 ) 6= h∗≤ν̄−1 (u0 ), and

• ρi∗ ,j ∗ = 2, i.e., the SIDE’s solution goes from ν̄ + 1 zero-crossings directly to ν̄ − 1.

The proof that this never happens is conceptually similar to the argument we made
above when comparing h1 and h2 . Since it is rather technical, we give it in Appendix
B. The final result is that, if the solution to the SIDE goes from ν̄ + 1 zero-crossings
directly to ν̄−1, skipping h∗≤ν̄ (u0 ), then h∗≤ν̄ (u0 ) cannot be hψ . Therefore, the algorithm
of Subsection 4.3.1 with the stopping rule (4.19) will find hψ .
The importance of Proposition 4.6 stems from the fact that it provides the stopping
rule for the SIDE (4.1) when the probabilistic model for the number of changes is
(4.17). A more difficult question, which is not addressed in this thesis, is the following.
Suppose it is known that the probability mass function of ν is of the form (4.17), but
the parameter λ is free to be chosen. How can one adaptively select λ? One could
approach this question by looking at the relative sizes of the energies E ∗ (t) to decide at
what time t the evolution starts to eliminate significant regions, rather than just those
due to noise. This problem is conceptually similar to choosing the order of a model,
and therefore various known techniques can be employed, like the minimum description
length principle [53].
We conclude this section by showing that a specific conditional density p(yi |h) turns
(4.18) into a special 1-D version of the Mumford-Shah functional [44]. It was proposed
86 CHAPTER 4. PROBABILISTIC ANALYSIS

in [44] to estimate a piecewise-smooth function u from its noisy observations y by

minimizing over u and Γ the following cost:
Z Z
1
(u − y) + γ
2
|∇u|2 + λν̄, (4.20)
2 IR2 IR2 \Γ

where Γ are the edges, i.e. the set on which u is discontinuous; ν̄ is the total length of
the edges; and γ and λ are constants which control the smoothness of u within regions
and the total length of the edges, respectively. If an approximation u is sought which is
constant within each region [36,43], the second term disappears. In 1-D, the integration
is over IR1 , and ν̄ is simply the number of the discontinuities in u. Assuming that we
seek a piecewise-constant approximation, we discretize the 1-D version of (4.20):
1
(u − y)T (u − y) + λν̄. (4.21)
2
PN
If one is looking for a binary approximation u = h ∈ {0, 1}N , then hT h = i=1 hi , and
so if we define
1
u0i = yi − , (4.22)
2
then minimizing (4.21) is equivalent to minimizing η(h) (4.18). We note that (4.22)
defines the log-likelihood ratios for the situation when p(yi |h) is the Gaussian density
with unit variance and mean hi . Indeed, in this case
1 1
log p(yi |hi = 1) − log p(yi |hi = 0) = − (yi − 1)2 + yi2
2 2
1
= yi − .
2
We have just shown the following.

Proposition 4.7. If

e−λ − 1 −λν̄
pν (ν̄) = e , for ν̄ = 0, 1, . . . , N − 1, and
e−λN − 1
1 1 2
p(yi |h) = √ e− 2 (yi −hi ) , for i = 1, . . . , N,
2π
then the generalized likelihood function is (4.18), which is
a) a special case of the Mumford-Shah functional for 1-D signals, and
b) according to Proposition 4.6, is optimized by the SIDE.
Sec. 4.5. Alternative Implementations. 87

¥ 4.5 Alternative Implementations.

¥ 4.5.1 Dynamic Programming.
The problem which is solved by the SIDE—namely, that of finding the optimal hypoth-
esis h∗≤ν (u0 ) with respect to the statistic φ of Proposition 4.2—can also be solved using
dynamic programming. The basic idea is to scan the samples of u0 from left to right, and
notice that the best hypothesis for the first i + 1 samples can be easily calculated from
the best hypothesis for the first i samples. More precisely, let u0i = (u01 , u02 , . . . , u0i )T
be the first i samples of u0 . Let φ(u0i , h) = hT (u0i − a), where h ∈ {0, 1}i and
a = (α, . . . , α)T ∈ IRi . Let Aiν̄p , for i = 1, . . . , N , ν̄ = 0, . . . , i − 1, and p = 0, 1, be
the set of all hypotheses h ∈ {0, 1}i which have exactly ν̄ edges, and which end with
p (i.e., hi = p). Let hiν̄p be the best hypothesis in the set Aiν̄p . Since φ is additive,
hi+1,ν̄,p is either (hTiν̄p , p)T or (hTi,ν̄−1,1−p , p)T . Therefore, once we have computed and
stored hiν̄p and the corresponding values of φ for ν̄ = 0, . . . , ν, and p = 0, 1, it will
take us one comparison to compute hi+1,ν̄,p for each ν̄ and p. Thus, to go from hi,ν̄,p to
hi+1,ν̄,p , we will need O(ν) computations. The total computational complexity is there-
fore O(νN ). The required memory is also O(νN ). Thus, if ν is smaller than log N ,
the dynamic programming approach is faster than the SIDE implemented as the region
merging method of Subsection 4.3.1. If ν is comparable to log N , the two methods
are similar in speed, but the SIDE wins in terms of memory use, since it only requires
O(N ) memory. Finally, if ν is larger than log N , the SIDE wins both in complexity and
memory. Another important advantage of the SIDE is that it generalizes to 2-D—as
shown in Section 4.7, while the dynamic programming algorithm does not.

¥ 4.5.2 An Equivalent Linear Program.

We now show that the problem of minimizing η (4.18), considered in Section 4.4, is a
linear programming problem.
Let D be the (N − 1)-by-N first-difference matrix. Then, for any hypothesis h ∈
{0, 1}N , we have that the `1 -norm of its first difference kDhk1 is exactly equal to the
number of edges in h. Therefore, we can re-write the problem of minimizing η (4.18)
as follows:

minimize −hT u0 + λkDhk1 , (4.23)

subject to h ∈ {0, 1} .
N

This problem is equivalent to the following linear program:

PN −1
minimize −hT u0 + λ i=1 ri (4.24)
subject to ri ≥ hi+1 − hi for i = 1, . . . , N − 1 (4.25)
ri ≥ hi − hi+1 for i = 1, . . . , N − 1 (4.26)
0 ≤ hi ≤ 1 for i = 1, . . . , N, (4.27)
88 CHAPTER 4. PROBABILISTIC ANALYSIS

where we replaced the requirement that h be binary with a seemingly less restrictive
condition that h belong to the unit hypercube [0, 1]N . Constraints (4.25) and (4.26)
mean that ri ≥ |hi+1 − hi |. On the other hand, (4.24) means that ri must be as small
as possible, and therefore ri = |hi+1 − hi |. The fact that the constraints (4.27) are
equivalent to h ∈ {0, 1}N is a little less obvious; it is verified in Appendix B (Section
B.5). We point out that any generic linear programming algorithm will lose out in speed
to the SIDE, because the SIDE exploits the special structure of the problem (4.23).

¥ 4.6 Performance Analysis.

This section is devoted to the performance analysis of the SIDE, for signals with one
change in mean. We first consider the event that the detected change location is farther
than a certain number of samples from the actual one. An expression for the lower
bound of the probability of this event is derived in Subsection 4.6.1; it is applied to the
analysis of signals with additive white Gaussian noise in Subsection 4.6.2. We address
the robustness of our algorithm in Subsection 4.6.3, by showing its optimality with
respect to a certain H∞ -like performance criterion.

¥ 4.6.1 Probability Bounds.

This section is devoted to the analysis of the example in Subsection 4.4.2, when the
number of edges is equal to one. That is, in order to maximize

N −k X 0
N
φ(u0 , h) = hT u0 − ui ,
N
i=1
P
we evolve the SIDE (4.1) until exactly one α-crossing remains, where α = N1 N 0
i=1 ui .
c c
We denote the correct hypothesis h and the correct location of the edge g . Without
loss of generality, we assume that the first g c samples of u0 have mean θ0 , the last
def
N − g c samples have mean θ1 , and that d = θ1 − θ0 > 0. We denote the detected edge
location g ∗ , its sign z ∗ , the corresponding hypothesis h∗ , and the number of zeros in it
k ∗ : if z ∗ = 1, then k ∗ = g ∗ ; if z ∗ = −1, then k ∗ = N − g ∗ .
Pick two integers, p0 and q0 , satisfying 1 ≤ p0 ≤ g c ≤ q0 ≤ N − 1, so that the
location g c of the true edge is between p0 and q0 . The goal of this section is to compute
a lower bound for the probability of the event
{p0 ≤ g ∗ ≤ q0 , z ∗ = 1}, (4.28)
which says that the detected edge location g ∗ is between p0 and q0 , and that the detected
sign of the edge is correct. The strategy will be to find a lower bound for the probability
of a simpler event which implies (4.28). Specifically, suppose that g ∗ > g c and z ∗ = 1.
Then
∗
X
g
∗
φ(u , h ) − φ(u , h ) =
0 0 c
(−u0i + α).
i=g c +1
Sec. 4.6. Performance Analysis. 89

Since h∗ is the optimal hypothesis, the above expression has to be positive. Thus, if

X
q
(−u0i + α) < 0 for q = q0 + 1, . . . , N, (4.29)
i=g c +1

then g ∗ cannot be larger than q0 . Similarly, if

c
X
g
(u0i − α) < 0 for p = 1, . . . , p0 − 1, (4.30)
i=p+1

then g ∗ cannot be smaller than p0 . Similar arguments show that, if

X
N X
N
(−u0i + α) + (−u0i + α) < 0 for q = g c + 1, . . . , N, and (4.31)
i=g c +1 i=q
c
X
g X
p
(u0i − α) + (u0i − α) < 0 for p = 1, . . . , g c , (4.32)
i=1 i=1

then the detected sign is correct, i.e. z ∗ = 1. Thus, the simultaneous occurrence
of the events (4.29)-(4.32) implies (4.28). If α were not random, the expressions in
(4.29)-(4.32) would be sums of independent identically distributed random variables,
and therefore we would be able to employ results from the theory of random walks.
We will remove the randomness of α from (4.29)-(4.32) by introducing a non-random
bound on how far α can be from its mean
def 1 c
m = (g θ0 + (N − g c )θ1 ).
N
In other words, suppose that there are two positive real numbers, δ1 and δ2 , such that

m − δ1 ≤ α ≤ m + δ2 . (4.33)

Then each of the inequalities

X
q
(−u0i + m + δ2 ) < 0 for q = q0 + 1, . . . , N (4.34)
i=g c +1

implies the corresponding inequality in (4.29). Let us call Aq the event that the q-th
inequality in (4.34) holds, for q = q0 + 1, . . . , N . We shall similarly bound the events
(4.30), by defining events Ap whose intersection implies (4.30):
 c 
 X g 
Ap = (u0i − (m − δ1 )) < 0 , for p = 1, . . . , p0 − 1. (4.35)
 
i=p+1
90 CHAPTER 4. PROBABILISTIC ANALYSIS

We shall call A0q and A0p the events which imply (4.31) and (4.32), respectively:
  
 X N XN 
A0q =  +  (−u0i + m + δ2 ) < 0, for q = g c + 1, . . . , N ; (4.36)
 
i=g c +1 i=q
(Ã gc ! )
X X p
A0p = + (u0i − (m − δ1 )) < 0 , for p = 1, . . . , g c . (4.37)
i=1 i=1
c
Let ε1 be the union (upper) bound for the probability of ∪gp=1 A0p , where the overbar
denotes the complement:
c
X
g
ε1 = Pr(A0p ). (4.38)
p=1

Suppose further that p1 is a lower bound for the probability of the intersection of the
events Ap :
 
0 −1
p\
p1 ≤ Pr  Ap  . (4.39)
p=1

Then
Ãp −1 Ã gc !!   
\
0 \ 0 −1
p\ [
gc
Pr Ap ∩ A0p = Pr  Ap ∩  A0p  (4.40)
i=1 i=1 i=1 i=1
Ãp −1 ! Ãp −1 Ã gc !!
\
0 \
0 [
= Pr Ap − Pr Ap ∩ A0p (4.41)
i=1 i=1 i=1
Ã gc !
[
≥ p1 − Pr A0p (4.42)
i=1
≥ p1 − ε1 , (4.43)

where we used the identity ∩A0p = ∪A0p in (4.40), the identity Pr(A ∩ B) = Pr(A) −
Pr(A ∩ B) in (4.41), and the inequality −Pr(A ∩ B) ≥ −Pr(B) in (4.42). Similarly, the
probability of the intersection of (4.34) and (4.36) is bounded from below by p2 − ε2 ,
where
X
N
ε2 = Pr(A0q ),
q=g c +1
 
\
N
p2 ≤ Pr  Aq  .
q=q0 +1
Sec. 4.6. Performance Analysis. 91

Finally, if we denote by 1 − ε the probability of (4.33), we have that the intersection of

the events (4.33)-(4.37) is lower-bounded by

(p1 − ε1 )(p2 − ε2 ) − ε. (4.44)

We showed earlier in this section that the intersection of these events implies the inter-
section of the events (4.29)-(4.32), which, in turn, implies the event (4.28). Thus, the
above expression (4.44) is a lower bound for the probability of the event (4.28).
In [28], asymptotic probabilities of the events (4.28) are computed, for N → ∞ and
g c → ∞. When α is non-random (as in, e.g., our Examples 4.4 and 4.5 of Subsection
4.4.1), these asymptotic probabilities are also (non-asymptotic) lower bounds. In the
process of computing these, lower bounds p1 (4.39) and p2 are also computed in [28];
these are asymptotically tight. In the next subsection, we describe a different method
for computing p1 and p2 for the Gaussian case (i.e. when the model of Subsection
4.4.2 applies). Our method produces looser bounds than [28]; however, the derivation
is conceptually much simpler and leads to easier computations.

¥ 4.6.2 White Gaussian Noise.

Proposition 4.8. Suppose that the following model applies:

u0 = m + w,

where m = (θ0 , . . . , θ0 , θ1 . . . , θ1 )T is the mean vector of u0 , and w is zero-mean white

Gaussian noise with variance σ 2 . Then

Pr(p ≤ g ∗ ≤ q0 and z ∗ = 1) ≥
 0 
 µ d √g c − p + 1 ¶ µ √ c
d 1 g − p0 + 1
¶ X gc µ c
d1 (p + g )
¶
1 0
Φ −Φ − − Φ − √ ×
 σ σ σ 3p + g c 
p=1
 
 µ √ ¶ µ √ ¶ µ ¶
X d2 (2N − q − g c ) 
N
d2 q0 − g c + 1 d 2 q0 − g c + 1
× Φ −Φ − − Φ − √ −
 σ σ σ 4N − 3q − g c 
q=g c +1
Ã √ ! Ã √ !
δ2 N δ1 N
− Φ − −Φ − , (4.45)
σ σ

where

• Φ is the Gaussian cumulative distribution function;

(N −g c )d
• d1 = N − δ1 ;
gc d
• d2 = N − δ2 ;

• δ1 and δ2 are any positive real numbers such that d1 > 0 and d2 > 0.
92 CHAPTER 4. PROBABILISTIC ANALYSIS

Proof. The terms of this bound come from calculating the parameters p1 , p2 , ε1 ,
ε2 , and ε of the expression (4.44). We now present
P this0 calculation.
It follows from our noise model that α = N1 N i=1 ui is Gaussian with mean
σ2
N (g θ0 + (N − g )θ1 ) and variance N . Thus, the probability ε that (4.33) does not
1 c c

hold is
Z δ2 µ ¶ Ã √ ! Ã √ !
σ2 δ1 N δ2 N
ε=1− N 0, =Φ − +Φ − , (4.46)
−δ1 N σ σ

where N is the Gaussian density.

In order to find ε1 , re-write the event A0p as

X
p X
g c µ ¶
(N − g c )d
2 wi + wi − (p + g ) c
− δ1 > 0.
N
i=1 i=p+1

The sum of the noise samples is a zero-mean Gaussian random variable with variance
σ 2 (4p + (g c − p)) = σ 2 (3p + g c ). Therefore, if we define

(N − g c )d
d1 = − δ1 ,
N
then the probability of A0p is
µ ¶
d1 (p + g c )
Φ − √ .
σ 3p + g c

Substituting this into (4.38), we get:

X
g c µ ¶
d1 (p + g c )
ε1 = Φ − √ .
p=1
σ 3p + g c

To find a lower bound p1 (see Equation (4.39)), we define S0 = 0 and

c
1 X
g
Sj = wi for j = 1, . . . , g c ,
σ
i=g c −j+1

and note that the intersection of the events Ap of Equation (4.35) is equivalent to

d1
Sj < j for j = g c − p0 + 1, ..., g c − 1.
σ
Also note that the Sj ’s form the standard (discrete) Brownian motion [52], which can
be viewed as a sampling of the standard continuous Brownian motion S(t) at integer
Sec. 4.6. Performance Analysis. 93

times. We define t0 = g c − p0 + 1, s0 = S(t0 ), and f (s0 ) = the pdf of s0 . Then

 
0 −1
p\
d1
Pr  Ap  = Pr(Sj < j for j = t0 , ..., g c − 1)
σ
p=1
d1
≥ Pr(S(t) < t for t0 ≤ t ≤ g c − 1)
σ
Z d1 t0 µ ¶
σ d1
= Pr S(t) < t for t > t0 |s0 f (s0 )ds0 .
−∞ σ
Conditioned on s0 , the process P (t) = S(t + t0 ) − s0 is the standard Brownian motion
for t ≥ 0. Therefore, the integral above is equal to
Z d1 t0 µ ¶
σ d1
Pr P (t) < (t + t0 ) − s0 for t > 0|s0 f (s0 )ds0 =
−∞ σ
Z 1 t0
d µ ¶
σ d1 d1
Pr sup(P (t) − t) < t0 − s0 |s0 f (s0 )ds0 .
−∞ t≥0 σ σ

Given s0 , P (t) − dσ1 t is a Brownian motion with drift − dσ1 . If the drift is non-negative,
then the probability inside the integral is zero [52]. We therefore assume that d1 > 0,
i.e., that
(N − g c )d
δ1 < .
N
Then the drift is negative, in which case the supremum is finite almost surely, and its
probability distribution is [52]
µ ¶
d1
1 − exp −2 x for x ≥ 0, and zero otherwise.
σ
Substituting this into the integral above, we get
Z d1 t0 ½ · µ ¶¸¾ µ ¶
σ d1 d1 1 s20
1 − exp −2 t0 − s0 √ exp − ds0 =
−∞ σ σ 2πt0 2t0
µ √ ¶ Z d1 t0 Ã !
d1 t0 σ 1 (s0 − 2dσ1 t0 )2
Φ − √ exp − ds0 =
σ −∞ 2πt0 2t0
µ √ c ¶ µ √ ¶
d1 g − p0 + 1 d 1 g c − p0 + 1
Φ −Φ − .
σ σ
Combining the values for p1 , ε1 , and ε, obtained above, with similarly obtained values
c
for p2 and ε2 (where d2 = gNd − δ2 ), we arrive at the expression (4.45). As we mentioned
c
above, this bound is looser than those of [28]. For example, if N is very large, gN = 0.5,
d = 3σ, and p0 = q0 = g c , then the asymptotic probability (from Table 3.3 of [28]) is
0.857, whereas our bound is 0.751.
94 CHAPTER 4. PROBABILISTIC ANALYSIS

¥ 4.6.3 H∞ -Like Optimality.

We continue the analysis of detecting one change in mean in a sequence of Gaussian
random variables, but now we assume that both means are known, and therefore α = 0,
as in Example 4.4 of Subsection 4.4.1. We now show that the change location estimate
produced by the SIDE—which in this case, according to Section 4.3, is the maximum
likelihood estimate—is optimal according to an H∞ -like criterion.
H∞ estimation and control arose out of situations when there is no complete a pri-
ori knowledge of the system dynamics and of the statistical properties of the exogenous
inputs. Model uncertainties and the lack of statistical information encountered in many
applications have led to research in minimax estimation, producing so-called H∞ algo-
rithms which are robust to parameter variations [25, 26, 45, 61]. In these approaches,
the quantity to be minimized is the H∞ norm of the operator mapping the inputs to
the desired function (either the output or a weighted error).
We will analyze the following problem.

Problem 4.2. Let d > 0 be a known real number. Consider a step sequence mgc =
(0, . . . , 0, d . . . , d)T of length N , whose first g c entries are zeros. Let the observed signal
be

y = mgc + v,

where v is an unknown disturbance. The objective is to estimate the location of change

g c ∈ {0, . . . , N } (g c = 0 and g c = N refer to the same hypothesis, corresponding to no
change).

As stated in Section 4.3, if v is a zero-mean white Gaussian noise, then the SIDE
will find the ML estimate, i.e.

X
N
d
c
ĝM L = arg max (yi − ). (4.47)
g 2
i=g+1

To analyze the robustness of this estimator, we define the following performance mea-
sure:
|g c − f (y)|
B(f ) = sup ,
g c ,v6=0 kvk1

where f (y) is any estimator of g c , and k · k1 stands for the `1 norm. Choosing the
estimator which minimizes B is similar in spirit to H∞ estimation: we would like to
minimize the worst possible error, over all possible disturbances. We presently show
that the SIDE estimator (4.47) does minimize B. This means that our estimator is
robust: it has the best worst-case error performance among all estimators, and for all
noise sequences.
We will prove the optimality of the SIDE estimator by showing two things:
Sec. 4.6. Performance Analysis. 95

• that B(f ) is always larger than a certain constant (see Proposition 4.9 below),
and
• that the SIDE estimator achieves this lower bound (see Proposition 4.10 below).
Proposition 4.9. For any estimator f , B(f ) ≥ d2 .
Proof. Fix the noise level at kvk1 = d2 , and suppose that the observation is y =
(0, d2 , d, . . . , d)T . The signal mgc which resulted in this y after adding noise of norm d2
could be either m1 = (0, d, d, . . . , d)T or m2 = (0, 0, d, . . . , d)T , in which cases g c = 1
and g c = 2, respectively. Thus,
|g c − f (y)|
B(f ) ≥ sup d
=
g c ∈{1,2} 2
2
= sup |g c − f (y)| =
d gc ∈{1,2}
½ 2
= d |2 − f (y)| if f (y) = 1
≥
2
d |1 − f (y)| if f (y) ≥ 2
2
≥ .
d
We will now show that the ML estimator achieves this bound.

L = fM L (y), B(fM L ) =
c 2 c
Proposition 4.10. For ĝM d. Thus, the estimator ĝM L is
optimal with respect to the criterion B.
c
Proof. Suppose that ĝM c
L > g . Then (4.47) implies

X
N
d X
N
d
(yi − ) > (yi − ) ⇒
c
2 2
i=ĝM L +1
i=g c +1
c
ĝM
X L
d
(yi − ) < 0 ⇒
2
i=g c +1
c
ĝM
X L
d
((d + vi ) − ) < 0 ⇒
2
i=g c +1
c
ĝM
d X L
c
(ĝM L −g )c
< − vi ≤ kvk1 .
2
i=g c +1

Therefore, the smallest `1 norm of the disturbance required to create the error ĝM L −g
c c

is (ĝM L − g ) 2 . The calculations for ĝM L < g are similar, and so

c c d c c

¥ 4.7 Analysis in 2-D.

In 1-D, there are always one fewer edges than regions. In 2-D, there is no such relation-
ship between the total length of the boundaries and the total number of regions. From
this difference stem two ways to generalize the SIDE (4.1) to 2-D. The first one is to
evolve the intensity value inside each region according to
1
u̇i = sgn(α − ui )pi , (4.48)
mi
where mi is the number of pixels inside the i-th region, and pi is the perimeter of the
i-th region. (We still define the boundaries between regions as lines of α-crossings.)
The second possible 2-D generalization of the SIDE has the same form, with pi being
the number of neighboring regions of the region i.
The 2-D SIDEs unfortunately do not find the global solution to the optimization
problem of Proposition 4.2. However, we believe that a similar but weaker statement
can be made—namely, that they find optimal coarsenings of the initial segmentation.
We briefly repeat the definitions from Chapter 2 which are used in this statement.
A segmentation of a given image u0 is a partitioning of the domain Ω of its definition
into several disjoint regions O1 , ... , Ok (∪ki=1 Oi = Ω). If a segmentation E1 = ∪Oj0 can
be obtained from a segmentation E = ∪Oi by erasing edges, it is said that E1 is coarser
than E. More precisely, E1 is coarser than E when for every region Oi of E there is a
region Oj0 of E1 such that Oi ⊂ Oj0 .
Conjecture 1. As in Proposition 4.2, let φ(u, h) = hT (u − a). Let E be the
segmentation of u0 given by the α-crossings of u0 . Consider the SIDE (4.48) with u0
as the initial data and with pi equal to the perimeter of the i-th region. Suppose the
evolution is stopped when the total perimeter of the α-crossings is ν, and let E1 be
the segmentation given by these α-crossings. Then E1 is the optimal coarsening of E,
subject to the constraint that the segmentation have total edge length ν or less.
Conjecture 2. The situation is the same as in Conjecture 1, except that pi is the
number of neighbors of the i-th region. Suppose the evolution is stopped when the total
number of regions is ν, and let E1 be the resulting segmentation. Then E1 is the optimal
coarsening of E, subject to the constraint that the segmentation have ν or fewer regions.

We note that, while (4.48) is the equation we have conjectures about, it is not the
equation we use in practice for images. The reason is that thresholding the initial
condition u0 with the threshold α typically leads to initial segmentations which are
too coarse—that is, which have too few regions. Even if the evolution then provides
the best coarsening of this initial segmentation, it may not be a good result. Better
results are achieved when one first evolves the SIDE using (3.12), and then applies the
threshold α to the image u(t). This is what was done in examples of Figure 4.5, which
are experimental evidence of the fact that the algorithm works well and is very robust
to degradations which do not conform well to the models of Section 4.3. The data on
the left is a very blurry and noisy synthetic aperture radar image of two textures: forest
Sec. 4.7. Analysis in 2-D. 97

100 50

200 100
100 200 50 100

Figure 4.5. Edge detection in 2-D.

and grass. The pervasive speckle noise is inherent to this method of data collection. The
algorithm was run on the raw data itself (which corresponds to assuming a Gaussian
model with changes in mean—see Section 4.3), and stopped when two regions remained.
The resulting boundary (shown superimposed onto the logarithm of the original image)
is extremely accurate. The logarithm of a similarly blurry and noisy ultrasound image
of a thyroid is shown on the right, with the boundary detected by the SIDE.
98 CHAPTER 4. PROBABILISTIC ANALYSIS
Chapter 5

Segmentation of Color, Texture,

and Orientation Images

HE preceding chapters were all devoted to the analysis of images and signals which
T take values in IR. It is often necessary, however, to process vector-valued images
where each pixel value is a vector belonging to IRM , with M ≥ 1. The entries of this
vector could correspond to red, green, and blue intensity values in color images [74],
to data gathered from several sensors [33] or imaging modalities [68], or to the entries
of a feature vector obtained from analyzing a texture image [10]. In the next section,
we generalize our SIDEs to vector-valued images and argue that most properties of
scalar-valued SIDEs still apply. We then give several examples of segmentation of color
and texture images.
Section 5.2 treats images whose every pixel belongs to a circle S1 . Such images arise
in the analysis of orientation [48] and optical flow [30].

¥ 5.1 Vector-Valued Images.

We use one of the properties discussed in Chapter 3 in order to generalize our evolution
equations to vector-valued images. We recall that a SIDE is the gradient descent for
the global energy (2.8), which we re-write as follows:
X
E= E(kuj − ui k),
i,jare neighbors

where E is a SIDE energy function (Figure 2.6). The norm k · k here stands simply for
the absolute value of its scalar argument. Now notice that we still can use the above
equation if the image under consideration is vector-valued, by interpreting k · k as the
`2 norm. To remind ourselves that the pixel values of vector-valued images are vectors,
we will use arrows to denote them:
X
E= E(k~uj − ~ui k), (5.1)
i,jare neighbors

99
100 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES

Figure 5.1. Spring-mass model for vector-valued diffusions. This figure shows a 2-by-2 image whose
pixels are two-vectors: (2,2), (0,0), (0,1), and (1,2). The pixel values are depicted, with each pixel
connected by springs to its neighboring pixels.

where

~ui = (ui,1 , . . . , ui,M )T ∈ IRM ,

~uj = (uj,1 , . . . , uj,M )T ∈ IRM , and
v
uM
uX
k~uj − ~ui k = t (uj,k − ui,k )2 .
k=1

We will call the collection of the k-th entries of these vectors the k-th channel of the
image. In this case, the gradient descent equation is:
1 X ~uj − ~ui
~u˙i = F (k~uj − ~ui k)pij . (5.2)
mi k~uj − ~ui k
j∈Ai

(This notation combines M equations—one for each channel—in a single vector equa-
tion). Just as for the scalar images, we merge two neighboring pixels at locations i and
j when their values become equal: ~uj (t) = ~ui (t). Just as in Chapter 3, mi and Ai are
the area of the i-th region and the set of its neighbors, respectively. The length of the
boundary between regions i and j is pij .
A slight modification of the spring-mass models of Chapter 3, Figures 3.1 and 3.3,
can be used to visualize this evolution. We recall that those models consisted of particles
forced to move along straight lines, and connected by springs to their neighbors. If
we replace each straight line with an M -dimensional space, we will get the model
corresponding to the vector-valued equation, with pixel values in IRM . For example,
when M = 2, the particles are forced to move in 2-D planes. Another way to visualize
the system is by depicting all the particles as points in a single M -dimensional space,
as shown in Figure 5.1. Each pixel value ~ui = (ui,1 , ui,2 )T is depicted in Figure 5.1 as
a particle whose coordinates are ui,1 and ui,2 . Each particle is connected by springs to
its neighbors. The spring whose length is v exerts a force whose absolute value is F (v),
and which is directed parallel to the spring.
By using techniques similar to those in Chapter 3, it can be verified that Equation
(5.2) inherits many useful properties of the scalar equation. Namely, the conservation
of mean and the local maximum principle hold for each channel; the equation reaches
Sec. 5.1. Vector-Valued Images. 101

(a) (b) (c)

Figure 5.2. (a) A test image; (b) its noisy version (normalized); (c) detected boundary, superimposed
onto the noise-free image

(a) (b) (c)

Figure 5.3. (a) A test image; (b) its noisy version (normalized); (c) detected boundary, superimposed
onto the noise-free image

the steady state in finite time and has the energy dissipation properties described in
Subsection 3.4.2. For usual SIDE force functions, there is no sliding for vector images—
just as there was no sliding for scalar images. However, for force functions which are
infinite at zero, such as that of Figure 3.8, the sliding property holds, and therefore so
does well-posedness. Vector-valued SIDEs are also robust to severe noise, as we show
in the experiments.

¥ 5.1.1 Experiment 1: Color Images.

We start by applying a vector-valued SIDE to the color image in Figure 5.2. The
image in Figure 5.2, (a) consists of two regions: two of its three color channels undergo
an abrupt change at the boundary between the regions. More precisely, the {red,
green, blue} channel values are {0.1, 0.6, 0.9} for the background and {0.6, 0.6, 0.5} for
the square. Each channel is corrupted with independent white Gaussian noise whose
standard deviation is 0.4. The resulting image (normalized in order to make every pixel
of every channel be between 0 and 1) is shown in Figure 5.2, (b). We evolve a vector-
valued SIDE on the noisy image, until exactly two regions remain. The final boundary,
102 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES

30 30

60 60
30 60 30 60

(a) (b)
Figure 5.4. (a) Image of two textures: fabric (left) and grass (right); (b) the ideal segmentation of
the image in (a).

30 30 30

60 60 60
30 60 30 60 30 60

(a) (b) (c)

30 30 30

60 60 60
30 60 30 60 30 60

(d) (e) (f)

Figure 5.5. (a-c) Filters; (d-f) Filtered versions of the image in Figure 5.4, (a).

superimposed onto the initial image, is depicted in Figure 5.2, (c). Just as in the scalar
case, the algorithm is very accurate in locating the boundary: less than 0.2% of the
pixels are misclassified in this 100-by-100 image.
A similar experiment, with the same level of noise, is conducted for a more com-
plicated shape, whose image is in Figure 5.3, (a). The result of processing the noisy
image of Figure 5.3, (b) is shown in Figure 5.3, (c). In this 200-by-200 image, 0.8% of
the pixels are misclassified.

¥ 5.1.2 Experiment 2: Texture Images.

Many algorithms for segmenting vector-valued images may be applied to texture seg-
mentation, by extracting features from the texture image and presenting the image con-
sisting of these feature vectors to the segmentation algorithm [36]. This is the paradigm
we adopt here. We note that the problem of feature extraction is very important: it
is clear that the quality of features will influence the quality of the final segmenta-
tion. However, this problem is beyond the scope of the present thesis. The purpose of
this experiment is to show the feasibility of texture segmentation using SIDEs, given a
Sec. 5.1. Vector-Valued Images. 103

30 30

60 60
30 60 30 60

(a) (b)
Figure 5.6. (a) Two-region segmentation, and (b) its deviation from the ideal one.

30 30 30

60 60 60
30 60 30 60 30 60

(a) (b) (c)

Figure 5.7. (a) A different feature image: the direction of the gradient; (b) the corresponding two-
region segmentation, and (c) its deviation from the ideal one.

reasonable set of features.

Our first example is the image depicted in Figure 5.4, (a), which consists of two
textures: fabric (left) and grass (right). The underlying texture images whose grayscale
versions were used to create the image of Figure 5.4, (a), were taken from the VisTex
database of the MIT Media Lab [42]. The correct segmentation is in Figure 5.4, (b). We
extract features by convolving our image with the three Gabor functions [40] depicted
in Figure 5.5, (a-c), which essentially corresponds to computing smoothed directional
derivatives. The results of filtering our image with these functions are shown in Figure
5.5, (d-f). These are the three channels of the vector-valued image which we use as
the initial condition for the SIDE (5.2). Figure 5.6, (a) is the resulting two-region
segmentation, and its difference from the ideal segmentation is in Figure 5.6, (b). The
result is very accurate: about 3.2% of the pixels are classified incorrectly, all of which
are very close to the initial boundary.
The next example illustrates the importance of the pre-processing step. We again
segment the image of Figure 5.4, (a), but this time we use a different feature. Instead
of the three features of Figure 5.5, we use a single feature, which is the absolute value
of the gradient direction, depicted in Figure 5.7, (a). More precisely, the (i, j)-th pixel
value of the feature image is
√
|angle(ui,j+1 − ui,j + −1(ui+1,j − ui,j ))2 |,

where ui,j is the (i, j)-th pixel of the original image. This leads to a significant im-
provement in performance, shown in the rest of Figure 5.7: the shape of the boundary
is much closer to that of the ideal boundary, and the number of misclassified pixels is
104 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES

50 50

100 100

50 100 50 100
(a) (b)
Figure 5.8. (a) Image of two wood textures; (b) the ideal segmentation of the image in (a).

50 50 50

100 100 100

50 100 50 100 50 100

(a) (b) (c)
Figure 5.9. (a) Feature image for the wood textures; (b) the corresponding five-region segmentation,
and (c) its deviation from the ideal one.

only 2% of the total area of the image.

In our next example, we segment the image of Figure 5.8, which involves two wood
textures generated using the Markov random field method of [37]. We again use the
direction of the gradient as the sole feature (Figure 5.9, (a)). The five-region segmenta-
tion of Figure 5.9, (b) is again very accurate, with the area between our boundary and
the ideal one being only 1.6% of the total area of the image (Figure 5.9, (c)).
Our final example of this subsection involves the same wood textures, but arranged
in more complicated shapes (Figure 5.10). The feature image and the corresponding
segmentation (2.3% of the pixels are misclassified) are shown in Figure 5.11. We will see
in the next section that if the two textures only differ in orientation (as in the last two
examples), then performance can be improved by using a different version of SIDEs,
adapted for processing orientation images.
We close this section by noting that in all the examples of this subsection, our
segmentation algorithm performed very well, despite the fact that the feature images
were of poor quality. No attempt was made to optimize the pre-processing step of
extracting the features.
Sec. 5.1. Vector-Valued Images. 105

100 100

200 200

100 200 100 200

(a) (b)
Figure 5.10. (a) Another image of two wood textures; (b) the ideal segmentation of the image in (a).

100 100 100

200 200 200

100 200 100 200 100 200

(a) (b) (c)
Figure 5.11. (a) Feature image for the wood textures in Figure 5.10, (a); (b) the corresponding
five-region segmentation, and (c) its deviation from the ideal one.
106 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES

E(v)

−π π v

Figure 5.12. A SIDE energy function which is flat at π and −π and therefore results in a force
function which vanishes at π and −π.

¥ 5.2 Orientation Diffusions.

In [48], Perona used the linear heat diffusion as a basis for developing orientation dif-
fusions. In this section, we follow a similar procedure to adapt SIDEs for processing
orientation images. In other words, we consider the case when the image to be pro-
cessed takes its values on the unit circle S1 , i.e., each pixel value is a complex number
with unit magnitude. We define the signed distance ζ(a, b) between two such numbers
a and b as their phase difference:
def √ a
ζ(a, b) = − −1 log , (5.3)
b
where log is the principal value, i.e., −π ≤ ζ < π.
We would like the force acting between two antipodal points to change continuously
as the distance between them changes from π − 0 to −π + 0 (i.e., from slightly smaller
than π to slightly larger than −π). We therefore restrict our attention to SIDE energy
functions for which E 0 (π) = E 0 (−π) = 0, as shown in Figure 5.12. We define the
following global energy:
X
E= E(|ζ(uj , ui )|). (5.4)
i,jare neighbors

We define SIDEs for circle-valued images and signals as the following gradient descent
equation:
u̇ = −∇E,
with the i-th pixel evolving according to:
u̇i = −∇i E,
where ∇i is the gradient taken on the unit circle S1 . To visualize this evolution, the
straight vertical lines of the spring-mass models of Figures 3.1 and 3.3 are replaced with
circles: each particle is moving around a circle. After taking the gradients, simplifying,
and taking into account merging of pixels, we obtain that the differential equation
governing the evolution of the phase angles of ui ’s is very similar to the scalar SIDE:
1 X
θ˙i = F (ζ(uj , ui ))pij , (5.5)
mi
j∈Ai
Sec. 5.2. Orientation Diffusions. 107

where θi is the phase angle of ui (we use the convention 0 ≤ θi < 2π, and identify
θi = 2π with θi = 0). The rest of the notation is the same as in Chapter 3. Two
neighboring pixels are merged when they have the same phase angle.
While this evolution has many similarities to its scalar and vector-valued counter-
parts, it also has important differences, stemming from the fact that it operates on the
phase angles. Thus, it is not natural to talk about the mean of the (complex) values
of the input image; instead, this evolution preserves the sum of the phases, modulo 2π.
This is easily verified by summing up the equations (5.5).

Property 5.1 (Total phase conservation). The phase angle of the product of all
pixel values stays invariant throughout the evolution of (5.5).

Another important distinction from the scalar and vector SIDEs is the existence of
unstable equilibria. Since F (π) = E 0 (π) = 0, it follows that if two neighboring pixels
have values which are two antipodal points on S1 , these points will neither attract not
repulse each other. Thus, unlike the scalar and vector-valued equations whose only
equilibria were constant images, many other equilibrium states are possible here. The
next property, however, guarantees that the only stable equilibria are constant images.

Property 5.2 (Equilibria). Suppose that all the pixel values of an image u have the
same phase. Then u is a stable equilibrium of (5.5). Conversely, such images u are the
only stable equilibria of (5.5).

Proof. If all the phases are the same, then the whole image u is a single region,
which therefore has no neighbors and is not changing. Moreover, if a pixel value is
perturbed by a small amount, forces exerted by its neighbors will immediately pull it
back. Thus, it is a stable equilibrium.
Suppose now that an equilibrium image u has more than one region. Let us pick an
arbitrary region, call its value u∗ , and partition the set of its neighboring regions into
two subsets: U = {u1 , . . . , up }, whose every element pulls u∗ in the counter-clockwise
direction (i.e., U is comprised of those regions for which the phase of uu∗i is positive
and strictly less than π), and the set V = {v1 , . . . , vq }, whose elements pull u∗ in the
clockwise direction (i.e., those regions for which the phase of uv∗i is negative and greater
than or equal to −π). One of the sets U, V —but not both—can be empty.
Since our system (5.5) is in equilibrium, it means that the resultant force acting on
u∗ is zero—i.e., the right-hand side of the corresponding differential equation is zero.
Suppose now that u∗ is slightly perturbed in the clockwise direction. Since the force
function F is monotonically decreasing, this means that the resultant force exerted on
u∗ by the regions comprising the set V will increase, and the resultant force exerted by
U will decrease. The net result will be to further push u∗ in the clockwise direction.
A similar argument applies if u∗ is perturbed in the counter-clockwise direction. Thus,
the equilibrium is unstable, which concludes the proof.
For any reasonable probabilistic model of the initial data, the probability of at-
taining an unstable equilibrium during the evolution is zero. In any case, a numerical
108 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES

30 30 30

60 60 60
30 60 30 60 30 60

(a) (b) (c)

Figure 5.13. (a) The orientation image for Figure 5.7, (a); (b) the corresponding two-region segmen-
tation, and (c) its deviation from the ideal one.

implementation can be designed to avoid such equilibria. Therefore, in the generic case,
the steady state of the evolution is a stable equilibrium, which, by the above property,
is a constant image. This corresponds to the coarsest segmentation (everything is one
region), just like in the scalar-valued and vector-valued cases.
Since there is no notion of a maximum on the unit circle, there is no notion of a
“maximum principle”, either. This means, moreover, that we cannot mimic the proof
of Property 3.2 (finite evolution time) of the scalar evolutions. This property does hold
for the evolutions on a circle, but the proof is different. Specifically, Property 3.7 holds
here, with a similar proof, which means that between two consecutive mergings, the
global energy is a concave decreasing function of time (Figure 3.7). It will therefore
reach zero in finite time, at which point the evolution will be at one of its equilibria.
Property 5.3 (Finite evolution time). The SIDE (5.5) reaches its equilibrium in
finite time, starting with any initial condition.

¥ 5.2.1 Experiments.
To illustrate segmentation of orientation images, we use the same texture images which
we used in the previous section. To extract the orientations, we use the direction of the
gradient. The (i, j)-th pixel value of the orientation image is
√
angle[(ui,j+1 − ui,j + −1(ui+1,j − ui,j ))]2 ,

where ui,j is the (i, j)-th pixel value of the raw texture image. (Note that the absolute
value of this orientation image was used as a feature image in the previous section.)
The expression in the above formula is squared so as to equate the phases which differ
by π.
The orientations for the fabric-and-grass image are shown in Figure 5.13, (a). We
present this image as the initial condition to the circle-valued SIDE (5.5) and evolve it
until two regions remain. The resulting segmentation is depicted in Figure 5.13, (b),
and its difference from the ideal one is in 5.13, (c). About 4.3% of the total number of
pixels are classified incorrectly. It is not surprising that this method performs slightly
worse on this example than the evolutions of the previous section. Indeed, if a human
were asked to segment the original texture image based purely on orientation, he might
Sec. 5.3. Conclusion. 109

50 50 50

100 100 100

50 100 50 100 50 100

(a) (b) (c)
Figure 5.14. (a) The orientation image for Figure 5.9, (a); (b) the corresponding five-region segmen-
tation, and (c) its deviation from the ideal one.

100 100 100

200 200 200

100 200 100 200 100 200

(a) (b) (c)
Figure 5.15. (a) The orientation image for Figure 5.10, (a); (b) the corresponding five-region segmen-
tation, and (c) its deviation from the ideal one.

make similar errors. Note in particular that the protrusion in the upper portion of the
boundary found by the SIDE corresponds to a horizontally oriented piece of grass in
the original image, which can be mistaken for a portion of the fabric.
In the second example, however, the orientation information is very appropriate
for characterizing and discriminating the two differently oriented wood textures. The
orientation image is in Figure 5.14, (a). The resulting five-region segmentation of Figure
5.13, (b) incorrectly classifies only 252 pixels (1.5%) in the 128-by-128 image, which is
14 pixels better than the method of the previous section. A more dramatic improvement
is achieved in the example of Figure 5.15, which shows the five-region segmentation of
the image in Figure 5.10, (a), using the circle-valued SIDE (5.5). Only 1.5% of the
pixels are misclassified, as compared to 2.3% using the method of the previous section.

¥ 5.3 Conclusion.
In this chapter, we generalized SIDEs to the situations when the image to be pro-
cessed is not scalar-valued. We described the properties of the resulting evolutions and
demonstrated their application to segmenting color, texture, and orientation images.
110 CHAPTER 5. SEGMENTATION OF COLOR, TEXTURE, AND ORIENTATION IMAGES
Chapter 6

Conclusions and Future Research

¥ 6.1 Contributions of the Thesis.

N this thesis we have presented a new approach to edge enhancement and segmen-
I tation which is robust to very high levels of noise and blurring. Our approach is
based on a new class of evolution equations for the processing of imagery and signals
which we have termed stabilized inverse diffusion equations or SIDEs. These evolutions,
which have discontinuous right-hand sides, have conceptual and mathematical links to
other evolution-based and variational methods in signal and image processing, such as
the Perona-Malik equation, shock filters, total variation minimization, region merging
methods, the Geman-Reynolds functional, and the Mumford-Shah functional. SIDEs,
however, have their own unique qualitative characteristics and properties which were
described in Chapter 3 and led to improved performance. Specifically, a number of sta-
bility properties were derived, such as well-posedness of the 1-D equation, the maximum
principle, and—for almost all input images—the invariance of the resulting segmenta-
tion with respect to small changes in the input image. SIDEs were further characterized
as gradient descent equations for certain energy functionals. Several classes of Lyapunov
functionals were found for these equations.
SIDEs’ superior performance was demonstrated in Chapter 3 through a multitude of
examples, both in 1-D and in 2-D. Their robustness—in contrast with other methods—
was shown in segmenting signals with heavy-tailed noise and images with high levels of
blurring and speckle noise, such as ultrasound and SAR images.
Chapter 4 provided an extensive performance analysis and a probabilistic interpreta-
tion of a variant of SIDEs. We showed that it produced maximum likelihood solutions
to certain binary classification/edge detection problems in 1-D. We conjectured that
similar results hold in 2-D. Extensive performance analysis for the 1-D problem of find-
ing one change in mean was done, producing bounds on the probabilities of errors. For
this problem, it was also proved that the algorithm is optimal with respect to a certain
H∞ -like criterion. The experiments in both 1-D and 2-D showed that the algorithm
performs extremely well, and is robust to large-amplitude noise and blurring.
We also presented a fast implementation and compared it with the dynamic pro-
gramming solution of the 1-D problem. Dynamic programming was faster in certain in-
stances, but took up more memory and did not generalize to 2-D. We explored the links

111
112 CHAPTER 6. CONCLUSIONS AND FUTURE RESEARCH

between our non-linear diffusion equation and the Mumford-Shah variational method
of image segmentation, and showed that a certain particular case of these is a linear
programming problem.
Thus, the contribution of Chapter 4 was two-fold. First, we presented a fast and
robust 1-D edge detection and 2-D image segmentation method. Second, we established
a link between deterministic methods for image restoration and segmentation (based on
non-linear diffusions and variational formulations) and a probabilistic framework. This
leads to a deeper understanding of these methods: both of their performance, and of
how to use them in a variety of situations (e.g., in Section 4.4 this meant pre-processing
the data by forming the log-likelihood ratios). As we will argue in the next section, we
have no doubt that these lines of investigation can and should be pursued further.
Finally, in Section 5 we demonstrated that our framework can be easily adapted
to non-scalar-valued images. Specifically, we used the result from Chapter 3 which
showed that a scalar-valued SIDE is the steepest descent equation for a certain energy
functional. We then generalized SIDEs by deriving the steepest descent equations for
similar energy functionals in vector-valued and circle-valued cases. We showed that
many properties of the scalar-valued SIDEs applied, and pointed out several important
differences. We successfully applied the resulting evolution to the segmentation of
color, texture, and orientation images.

¥ 6.2 Future Research.

In this section, we list several interesting problems motivated by this thesis.

¥ 6.2.1 Feature Extraction.

Once one admits the concept of sliding surfaces for signal or image evolution, the ques-
tion immediately arises as to whether one can design sliding surfaces other than those
used in this work. In particular, the sliding surfaces reflecting the enforced equality of
neighboring points or pixels, directly correspond to piecewise-constant approximations
of signals, and the resulting SIDE evolution in essence produces an adapted sequence
of staircase approximations to a signal or an image. An open question is whether it
is also possible to produce a sequence of other approximations (e.g., linear splines) to
signals and images by appropriately defining the sliding surfaces. The challenge is to
design these surfaces in such a way that the resulting SIDE evolution is both fast and
robust.

¥ 6.2.2 Choosing the Force Function.

We mentioned in Chapter 3 that choosing a SIDE force function best suited for a
particular application is an open research question. We also partially addressed this
issue in Chapter 4, by describing and analyzing the problems for which F (v) = sgn(v)
was the optimal choice. This choice was optimal in the sense that it resulted in the
maximum likelihood solutions to change detection problems for certain probabilistic
Sec. 6.2. Future Research. 113

F(v)

-K K v

Figure 6.1. A force function which results in faster removal of outliers.

models of the input signal. It is unclear how to choose force functions for other signal
models; an interesting question is for what models this can be done so as to guarantee
that the solution produced by the SIDE is the maximum likelihood solution.
Another question is that of robustness properties of the SIDEs corresponding to
different shapes of the force function. Intuitively, if the goal is segmentation in the
presence of outliers, then it is appropriate to diffuse quickly both in the areas of very
large gradient (corresponding to the outliers), and in the areas of very small gradient
(corresponding to small-amplitude noise). Ideally, the minimum diffusion speed would
be at the locations with intermediate values of the gradient, corresponding to edges.
We are then led to the form of a force function depicted in Figure 6.1, which is the
inverse of a Perona-Malik force function, and has a unique minimum at some location
K. If the parameters of the probabilistic model of the input signal (such as the standard
deviation of the noise, the magnitude and frequency of the outliers) are fixed, then it
is natural to expect that some value of K is, in some sense, optimal. Uncovering this
relationship between the model and the corresponding value of the parameter K is an
interesting research topic.

¥ 6.2.3 Choosing the Stopping Rule.

Another interesting question is how to stop a SIDE so as to obtain the best segmen-
tation. When the force function is F (v) = sgn(v) and the number of regions is an
exponential random variable, the answer to this question is given by Proposition 4.6 of
Chapter 4. We have been unable, however, to answer this question for the general form
of the SIDE force function F . In our examples, we assumed that the desired number
of regions was known, and stopped the evolution when that number was attained. In
some applications where the number of objects of interest is not known a priori, this
may not be a realistic assumption.
As we remarked in Chapter 4, one way to address the question of when to stop is by
looking at the size of the energy of the region being removed, or at the time required
to remove it. If a region has large energy, or if a long time is required to remove it, it
is more likely to be significant, and less likely to be due to noise.
114 CHAPTER 6. CONCLUSIONS AND FUTURE RESEARCH

¥ 6.2.4 Image Restoration.

As mentioned in Chapter 3, while SIDEs may do an excellent job of segmentation and
location of edges, they do not directly provide good estimates of the values within
regions or between edges. This limitation can be addressed, for example, by using a
SIDE for segmentation and then using optimal linear estimation or filtering within each
segmented region in order to get both accurate edge estimates and denoising within
each so-identified region.

¥ 6.2.5 PDE Formulation.

The problem of re-casting SIDEs as PDEs similar to (3.27) of Chapter 3 is of great
theoretical interest, because PDEs have been the unifying language of most of the
researchers in the field of non-linear scale spaces and geometry-driven diffusion. A
PDE formulation is also important because it will allow one to obtain certain results
concerning the evolution. For example, the analysis of the invariance of the equation
to various coordinate transformations of the image domain is clearly more convenient
to carry out in the continuous-space setting.

¥ 6.2.6 Further Probabilistic Analysis.

An important topic for future investigations is whether results similar to those of Chap-
ter 4 can be obtained for signals whose samples come from more than two probability
distributions. In addition, we would like to generalize these results to the situation
when the samples of the observed process are not conditionally independent. This
would make possible the design of probabilistic texture segmentation algorithms. We
also would like to explore the case of unknown probability densities for the binary
problem, which means devising ways of learning the log-likelihood ratios from the data.
Proving the conjectures of Section 4.7 also has high priority.
The most ambitious question is perhaps analyzing the performance in 2-D, which
amounts to analyzing random fields with abrupt changes. In order to understand the
sensitivity of a segmentation algorithm to noise, we need to understand how the edges
produced by the algorithm will change if the input image is perturbed by a spatial
random field.

¥ 6.2.7 Prior Knowledge.

Another avenue of future research is understanding how to incorporate prior knowl-
edge, which is a very important question not only for segmentation, but for any other
image processing task. For example, we saw in Chapter 4 that SIDEs can be modified
to make use of a prior distribution on the number of desired regions in the segmenta-
tion. It remains to be seen, however, how to incorporate other knowledge, for instance,
information about sizes and shapes of the regions.
Appendix A

Proof of Lemma on Sliding

(Chapter 3)

To simplify notation, we replace ni with i in (3.11) and re-write the system in terms of
vi = ui+1 − ui :
1 1
v̇i = (F (vi+1 ) − F (vi )) − (F (vi ) − F (vi−1 ))
mi+1 mi
i = 1, . . . , p − 1. (A.1)
We need to prove that if (i1 , . . . , ip−1 ) is any permutation of (1, . . . , p − 1), then, as v
p−1
approaches S = ∩m k=1 Sik \(∪k=m+1 Sik ), lim(v̇iq sign(viq )) ≤ 0 for all integers q between
1 and m, and for at least one such q the inequality is strict (i.e., the trajectories enter
S transversally). Note that for every point s ∈ S and every quadrant Q, we only need
to find one sequence of v’s approaching s from Q and satisfying these inequalities. This
is because the solutions vary continuously inside each quadrant.
Fix vim+1 , . . . , vip−1 at non-zero values, let
1
ε= min |vi |,
2 m+1≤j≤p−1 j
let initially δ = ε, set |vi1 | = . . . = |vim | = δ, and drive v towards S by letting δ go to
zero. Take an arbitrary index q between 1 and m. By our construction, viq is approach-
ing zero, and either viq = δ > 0 or viq = −δ < 0. If viq = δ, then, by construction,
viq ≤ |viq±1 |, implying F (viq ) ≥ F (viq±1 ), which makes the RHS of (A.1) for i = iq
non-positive: limδ→0 v̇iq ≤ 0. If m < p − 1, then there is a j between 1 and m such
that at least one of the two neighbors of vij is in the set {vim+1 , . . . , vip−1 }, and whose
absolute value is therefore staying above ε: |vij+1 | > ε or |vij−1 | > ε. Without loss of
generality, suppose it is the left neighbor: |vij−1 | > ε. If m = p − 1, define j = 1. If our
arbitrary q happens to be equal to this j, then
F (viq ) − F (viq−1 ) = F (vij ) − F (vij−1 ) > F (vij ) − F (ε),
and hence (A.1) for i = ij has a strictly negative limit: limδ→0 v̇ij < 0. Similar reasoning
for the case viq = −δ leads to limδ→0 v̇iq ≥ 0, and, if it happens that q = j, then
limδ→0 v̇iq > 0.

115
116 APPENDIX A. PROOF OF LEMMA ON SLIDING (CHAPTER 3)
Appendix B

Proofs for Chapter 4.

¥ B.1 Proof of Proposition 4.1.

Let h = h∗≤ν (u0 ), and suppose that h has an upward edge at location i: hi = 0, and
hi+1 = 1. Suppose further that there is no upward α-crossing at i. We only treat the
case when the signal values are not equal to the threshold: u0i 6= α, u0i+1 6= α. The
degenerate cases when one or more signal values are equal to the threshold are handled
similarly.
Consider two possibilities.
Case 1. u0i > α (Figure B.1, (b)).
Changing hi from 0 to 1 will either maintain the same number of edges (if
i > 1 and hi−1 = 0), or reduce the number of edges by one (if i = 1), or
reduce the number of edges by two (if i > 1 and hi−1 = 1). It will also
increase φ(u0 , h) (by the amount u0i − α), which contradicts the assumption
that h is the best hypothesis with ν or fewer edges. Thus, u0i has to be

2 2

1 1

0 0
1 2 3 1 2 3

1 1

0 0
1 2 3 1 2 3
(a) (b)
Figure B.1. Samples of a signal (top plots) and impossible edge configurations of optimal hypotheses
(bottom plots).

117
118 APPENDIX B. PROOFS FOR CHAPTER 4.

smaller than α.

Case 2. u0i < α and u0i+1 < α (Figure B.1, (a)).

Changing hi+1 from 1 to 0 will either maintain the same number of edges (if
i < N −1 and hi+2 = 1), or reduce the number of edges by one (if i = N −1),
or reduce the number of edges by two (if i < N −1 and hi+2 = 0). It will also
increase φ(u0 , h) (by the amount α − u0i+1 ), again violating the assumption
that h is the best hypothesis with ν or fewer edges. Combining this with
Case 1, we see that u must have an upward α-crossing at i.
It is similarly shown that every downward edge of h occurs at a downward α-crossing
of u0 .

¥ B.2 Proof of Proposition 4.2: SIDE as a Maximizer of a Statistic.

We note that this proof relies on Lemma 4.1, proved in Subsection 4.3.1.
We first show that at any time t between 0 and tf , the rate of reduction of φ(u(t), h)
is the largest for h = h∗ (u(tf )), among all hypotheses h with να (tf ) or fewer edges. To
simplify notation, define ν = να (tf ), gj+1 = N , and let g1 , . . . , gj be the edge locations
(from left to right) of an arbitrary hypothesis h with ν or fewer edges (i.e., j ≤ ν).
Without loss of generality, assume that the leftmost edge is upward, so that hg1 = 0.
Then

φ̇(u(t), h) = hT u̇(t) (B.1)

X X
gi+1
= u̇n (t) (B.2)
i is odd n=gi +1
 
X  X N XN 
= u̇n (t) − u̇n (t) , (B.3)
 
i is odd n=gi +1 n=gi+1 +1

Note that, for any integer s such that 1 ≤ s ≤ N − 1 and us+1 (t) − us (t) 6= 0, we have,
by summing up Equations (4.1) from n = s + 1 to n = N ,

X
N
u̇n (t) = −sgn(us+1 (t) − us (t)) = ±1. (B.4)
s+1

If us+1 (t) = us (t), let p be the smallest index to the left of s such that up+1 (t) = us (t),
and let q be the largest index to the right of s such that uq (t) = us (t). If q ≤ N − 1
Sec. B.2. Proof of Proposition 4.2: SIDE as a Maximizer of a Statistic. 119

and p ≥ 1, then, according to Equations (4.1) and (4.3), we have:

X
N X
q X
N
u̇n (t) = u̇n (t) + u̇n (t)
n=s+1 n=s+1 n=q+1
= (q − s)u̇s − sgn(uq+1 (t) − uq (t))
1
= (q − s) (sgn(uq+1 (t) − uq (t)) − sgn(up+1 (t) − up (t)))
q−p
−sgn(uq+1 (t) − uq (t))
1
= − sgn(uq+1 (t) − uq (t))
q−p
1
− sgn(up+1 (t) − up (t)) ≥ −1, (B.5)
q−p
since q − p ≥ 2. The same
PN inequality is obtained if q = N or p = 0. Combining (B.5)
and (B.4), we see that n=s+1 u̇n (t) ≥ −1 for any s. Therefore, the minimal possible
P
value for (B.3) is (−1) times the number of sums of the form N n=s+1 u̇n (t) in that
expression. This is −j, which is, by assumption, not smaller than −ν:

φ̇(u(t), h) ≥ −j ≥ −ν.

Now note that in this double inequality, both equalities are achieved for all time t,
0 ≤ t ≤ tf , when h = h∗ (u(tf )). Indeed, since in this case j = ν, and—as easily seen
from the definition of φ—g1 , . . . , gν are α-crossings of u(tf ) (and, therefore—by Lemma
4.1—also of u(t) for 0 ≤ t ≤ tf ), we have that for t ∈ [0, tf ],
½
−1 if i is odd
−sgn(ugi +1 (t) − ugi (t)) =
1 if i is even.

Inserting this into (B.4), and then back into (B.3), we see that in this case, (B.3) is
equal to −ν. By definition of h∗ (u(tf )), φ(u(tf ), h) is the largest for h = h∗ (u(tf )). On
the other hand, we just showed that the amount of the reduction of φ(u(t), h) during
the evolution was the greatest for h = h∗ (u(tf )), over all possible hypotheses with ν or
fewer edges. Therefore, φ(u(0), h) must also have been the largest for h = h∗ (u(tf )),
over the same set of hypotheses, which is the statement of the Proposition.
The statement of the same proposition in [51] is as follows.

Proposition B.1. Fix the initial condition u0 of the SIDE (4.1), and let u(t) be the
corresponding solution. Suppose that a statistic φ satisfies two conditions:
© ª
d
1) dt φ(u(t), h) − hT u(t)) = 0;

2) there exists α ∈ IR such that, ∀t ≥ 0, the optimal hypothesis h∗ (u(t)) is generated

by the set of all α-crossings of u(t).
120 APPENDIX B. PROOFS FOR CHAPTER 4.

Let να (t) be the number of α-crossings of u(t). Then

h∗≤να (t) (u0 ) = h∗ (u(t)).

We now show that the two statements are equivalent.

Proposition B.2. Propositions 4.2 and B.1 are equivalent, in the following sense.
i) If φ(u, h) is as in Proposition 4.2, it satisfies the two conditions of Proposition B.1.
ii) Suppose that φ0 (u, h) satisfies the two conditions of Proposition B.1 for any initial
data u0 ∈ IRN (where the constant α may depend on u0 ), and suppose that φ(u, h) is
as in Proposition 4.2. Then, for all u ∈ IRN and for all h ∈ {0, 1}N \{0, 1} P (where
1 = (1, . . . , 1)T ∈ IRN ), φ(u, h) and φ0 (u, h) can only differ by a function of N
i=1 ui :

ÃN !
X
0
φ (u, h) − φ(u, h) = f ui ,
i=1

for some function f : IR → IR, and thus the optimal hypotheses with respect to φ and
φ0 are the same.

Lemma B.1. Let u(t) be the solution of the SIDE (4.1). Suppose that a function
ψ : IRN → IR satisfies
d
(ψ(u(t))) = 0, (B.6)
dt
for any initial data u0 . Then ψ only depends on the sum of the entries of its argument—
i.e., there is a function f : IR → IR such that
ÃN !
X
ψ(u) = f ui .
i=1

Proof. We first show that the partial derivatives of ψ with respect to all the
variables are equal to each other, using the identity

d X ∂ψ
N
(ψ(u(t))) = u̇i . (B.7)
dt ∂ui
i=1

Take an initial condition for which u01 < u02 < . . . < u0N −1 < u0N . It then follows from
(4.1) that u̇1 = 1, u̇N = −1, and u̇i = 0 for 2 ≤ i ≤ N − 1. Substituting these into
(B.7) and using (B.6), we get:

∂ψ ∂ψ
= . (B.8)
∂u1 ∂uN
Sec. B.2. Proof of Proposition 4.2: SIDE as a Maximizer of a Statistic. 121

Now take an initial condition for which u01 < u02 < . . . < u0N −1 and u0N < u0N −1 . Then
u̇1 = u̇N = 1, u̇N −1 = −2, and u̇i = 0 for 2 ≤ i ≤ N − 2, and therefore

∂ψ ∂ψ ∂ψ
−2 + = 0,
∂u1 ∂uN −1 ∂uN

which, combined with (B.8), gives:

∂ψ ∂ψ
= .
∂u1 ∂uN −1

Proceeding inductively in a similar fashion, we obtain:

∂ψ ∂ψ ∂ψ
= = ... = . (B.9)
∂u1 ∂u2 ∂uN
Let

wi = ui+1 − ui for i = 1, . . . , N − 1,
XN
wN = ui .
i=1

Then it is easily verified by direct substitution that

ÃN −1 ! N −1
1 X X
uk = iwi + wN − wi , for k = 1, . . . , N − 1
N
i=1 i=k
ÃN −1 !
1 X
uN = iwi + wN ,
N
i=1

and therefore
 i
∂uk  N, i<k
N − 1, k ≤ i < N − 1
i
=
∂wi  1
N, i = N.

This means that, if, i 6= N ,

∂ XN
∂ψ ∂uk
{ψ(u(w))} =
∂wi ∂uk ∂wi
k=1
" i #
∂ψ X i XN
i
= ( − 1) +
∂uk N N
k=1 k=i+1
· 2 ¸
∂ψ i 1
= − i + (N − i) = 0,
∂uk N N
122 APPENDIX B. PROOFS FOR CHAPTER 4.

∂ψ
where we used the fact (B.9) that ∂uk is the same for all k. If i = N , then

∂ XN
∂ψ ∂uk
{ψ(u(w))} =
∂wi ∂uk ∂wN
k=1
"N #
∂ψ X 1
=
∂uk N
k=1
∂ψ
= .
∂uk
P
Thus, ψ does not depend on w1 , . . . , wN −1 , only on wN = N i=1 ui .
Proof of Proposition B.2.
(i) Is straightforward.
(ii) According to Lemma B.1 above, if φ0 (u, h) satisfies Condition 1 of Proposition B.1,
then

φ0 (u, h) − hT u
P
is a function of h and Ni=1 ui only. It also follows from the same Lemma that α of
PN
Proposition B.1 may only depend on i=1 ui . Therefore, there is a function ψ such
that
X
N
0
φ (u, h) = h (u − a) + ψ(h,
T
ui ).
i=1

Suppose that ψ depends on h for h 6= 0, 1. We presently show that this would lead to
violating Condition 2 of Proposition B.1. Take h1 , h2 ∈ {0, 1}N \{0, 1}, and S ∈ IR,
such that

ψ(h1 , S) > ψ(h2 , S). (B.10)

Denote the number of samples where h1 and h2 are different by p:

p = kh2 − h1 k1 ≥ 1.

Let k be the number of zeros in h2 :

k = N − kh2 k1 ,

and let
ψ(h1 , S) − ψ(h2 , S)
ε= > 0.
2p
Case 1. There are i and j such that h1,i = h2,i = 1 and h1,j = h2,j = 0.
Let um = α − ε if h2,m = 0 and m 6= j, and let um = α + ε if h2,m = 1 and m 6= i. If
Sec. B.2. Proof of Proposition 4.2: SIDE as a Maximizer of a Statistic. 123

S − N α − (N − 2k)ε ≥ 0, let ui = α + ε + S − N α − (N − 2k)ε, and let uj = α − ε.

Otherwise, let ui = α + ε and uj = α − ε + S − N α − (N − 2k)ε. Then

φ0 (u, h1 ) − φ0 (u, h2 ) = −pε + ψ(h1 , S) − ψ(h2 , S)

= −pε + 2pε = pε > 0. (B.11)

On the other hand, the edges of h2 coincide with the α-crossings of u: um > α whenever
h2,m = 1 and um < α whenever h2,m = 0. Therefore, if φ0 is to satisfy Condition 2 of
Proposition B.1, hP 2 has to be the optimal hypothesis for u, which contradicts (B.11).
We also note that N m=1 um = S by construction.
Case 2. There is no i for which h1,i = h2,i = 1, and there is no j for which h1,j =
h2,j = 0, i.e., h1,i = 1 − h2,i for i = 1, . . . , N .
If 2 ≤ k ≤ N − 2, let i and j be such that h2,i = h2,j = 1. Let h3 be obtained from h2
by changing the i-th entry from zero to one and the j-th entry from one to zero. Then
either ψ(h1 , S) 6= ψ(h3 , S), or ψ(h2 , S) 6= ψ(h3 , S) (or both), and both pairs (h1 , h3 )
and (h2 , h3 ) fall under Case 1 considered above.
If k = 1, let i be the index for which h2,i = 0. Without loss of generality, assume
that i 6= 1 and i 6= N . Form h01 and h02 as follows:

h02,m = h2,m for m = 1, . . . , N − 1

h02,N = 0
h01,m = h1,m for m = 2, . . . , N
h01,1 = 1.

In other words,

h1 = (0, 0, . . . , 0, 1, 0, . . . , 0, 0)T ,
h01 = (1, 0, . . . , 0, 1, 0, . . . , 0, 0)T ,
h2 = (1, 1, . . . , 1, 0, 1, . . . , 1, 1)T ,
h02 = (1, 1, . . . , 1, 0, 1, . . . , 1, 0)T .

Then one of the following holds:

ψ(h1 , S) 6= ψ(h01 , S),

ψ(h2 , S) 6= ψ(h02 , S),
ψ(h01 , S) 6= ψ(h02 , S),

and all three pairs (h1 , h01 ), (h2 , h02 ), and (h01 , h02 ) fall under Case 1 considered above.
The remaining cases are handled similarly to Cases 1 and 2. The conclusion is that, in
each case, (B.10) leads to a violation of Condition 2 of Proposition B.1. This means
that the inequality (B.10) cannot be true, and so ψ(h, S) is independent of h, for
h ∈ {0, 1}N \{0, 1}.
124 APPENDIX B. PROOFS FOR CHAPTER 4.

¥ B.3 Proof of Lemma 4.3.

Consider the earliest time instant t1 during the evolution of the SIDE when one of these
α-crossings {g1 , . . . , gν } disappears. Let us denote the region being removed at t1 by
(i, j). Suppose that, at time t− 1 , there are at least ν + 2 α-crossings remaining. We will
presently show that the only α-crossings among {g1 , . . . , gν } that can be disappearing
at t1 are either g1 or gν , but not both.
Case 1. The signal u(t− 1 ) has at least ν + 3 α-crossings; i − 1 and j are consecutive
elements of the set {g1 , . . . , gν }.
Since u(t− 1 ) has at least ν + 3 α-crossings whereas the set {g1 , . . . , gν } has only ν
elements, there must be at least one pair of consecutive α-crossings of u(t− 0
1 ), say i − 1
0
and j , such that
(i) i0 − 1, j 0 6∈ {g1 , . . . , gν }, and

(ii) either the α-crossing of u(t− 0

1 ) immediately to the left of i − 1 or the α-crossing of
− 0
u(t1 ) immediately to the right of j is an element of {g1 , . . . , gν }.
Since the region (i, j) disappears at time t1 while the region (i0 , j 0 ) stays, the energy
(4.5) of the region (i, j) must be smaller: Eij < Ei0 j 0 , or equivalently,
¯ j ¯ ¯¯ j 0 ¯
¯
¯X ¯ ¯X ¯
¯ ¯ ¯
¯ (un − α)¯ < ¯ (un − α)¯¯ .
¯ ¯ ¯ 0 ¯
n=i n=i

Therefore, we can increase φ(u0 , h) by changing h as follows: remove the edges i − 1

and j and add edges at i0 − 1 and j 0 . This contradicts our assumption that h is the best
hypothesis with ν or fewer edges. The conclusion is that two consecutive α-crossings
from {g1 , . . . , gν } cannot be erased.
Case 2. i − 1 6∈ {g1 , . . . , gν }, and j ∈ {g1 , . . . , gν }.
Suppose that there are α-crossings of u(t− 0
1 ) to the left of i − 1. Let i − 1 be the α-
−
crossing of u(t1 ) immediately to the left of i − 1. Since the region (i, j) is removed, we
have: Eij < Ei0 ,i−1 , which means that changing h by moving the edge from j to i0 − 1
will increase φ(u0 , h), and so h 6= h∗≤ν (u0 ). Thus, it must be that either i = 1 or i − 1
is the leftmost α-crossing of u(t− 1 ), and therefore j = g1 .
Case 3. i − 1 ∈ {g1 , . . . , gν }, and j 6∈ {g1 , . . . , gν }.
This case is handled similarly to Case 2.
Cases 1, 2, and 3 combined imply that, at time t1 , only one of the α-crossings
{g1 , . . . , gν } can disappear: either g1 or gν . Without loss of generality, suppose that
it is g1 . We showed that it can only disappear at the time t− 1 when region (i, g1 ) gets
erased, where i = 1 or i − 1 is the leftmost α-crossing of u(t− 1 ). In what follows, we
consider only the case i = 1; the other case is handled similarly.
We now show that no other α-crossing from the set {g1 , . . . , gν } can disappear
before t+ . We consider the earliest time instant t2 > t1 when one of the remaining
α-crossings {g2 , . . . , gν } disappears. Let us again denote the region being removed by
Sec. B.3. Proof of Lemma 4.3. 125

(i, j). Suppose that, at time t− 2 , there are at least ν + 1 α-crossings remaining.
Case 4. The indices i − 1 and j are consecutive elements of the set {g2 , . . . , gν }.
Case 5. i − 1 6∈ {g2 , . . . , gν }, and j ∈ {g2 , . . . , gν }.
Case 6. i − 1 ∈ {g2 , . . . , gν }, and j 6∈ {g2 , . . . , gν }.
Cases 4, 5, and 6 are handled similarly to Cases 1, 2, and 3, respectively, with the
result that only (i, g2 ) or (gν , j) can be removed, where i − 1 is either 0 or the leftmost
α-crossing of u(t−
2 ), and j is either N or the rightmost α-crossing of u(t2 ).
−

We now show how to handle the case when (1, g2 ) is removed at t2 ; all other cases are
handled using similar techniques.
Without loss of generality, we suppose that g1 is an upward edge of h. Then

X
g1
(un − α) < 0, (B.12)
n=1

as otherwise we would be able to improve h by removing the edge g1 . Also,

X
g2
(un − α) > 0, (B.13)
n=g1 +1

as otherwise removing the edges g1 and g2 would improve h. Since the region (1, g1 )
disappeared at time t1 while the α-crossing at g2 stayed, we have

E1,g1 < Eg1 +1,g2 ,

¯g ¯ ¯ ¯
¯X ¯ ¯ X g2 ¯
¯
1
¯ 1¯ ¯ ¯
i.e., ¯ (un − α)¯ < (u − α)¯. (B.14)
2 ¯¯ ¯
n
¯ ¯ ¯
n=1 n=g1 +1

We combine (B.12), (B.13), and (B.14) to get

¯g ¯
¯X 2 ¯
¯ ¯
E1,g2 = ¯ (un − α)¯
¯ ¯
¯n=1 ¯ ¯ ¯
¯ g2 ¯ ¯X ¯
¯ X ¯ ¯ g1 ¯
= ¯¯ (un − α)¯¯ − ¯ (un − α)¯
¯n=g1 +1 ¯ ¯ ¯
n=1
¯ ¯
¯ g2 ¯
1 ¯¯ X ¯
> ¯ (un − α)¯¯
2¯ ¯
n=g1 +1

= Eg1 +1,g2 . (B.15)

Choose the region (i0 , j 0 ) of u(t−

2 ) analogously to Case 1. Then, since (1, g2 ) gets removed
0 0
at t2 and (i , j ) does not,

Ei0 j 0 > E1,g2 .

126 APPENDIX B. PROOFS FOR CHAPTER 4.

Together with (B.15), this gives

Ei0 j 0 > Eg1 +1,g2 .

Thus, h can be improved by replacing the edges g1 and g2 with i0 and j 0 , which is a
contradiction.

¥ B.4 Completion of the Proof of Proposition 4.6.

We need to show that the following situation is impossible:
• hψ = h∗≤ν̄ (u0 ) 6= h∗≤ν̄−1 (u0 ), and

• ρi∗ ,j ∗ = 2, i.e., the SIDE’s solution goes from ν̄ + 1 zero-crossings directly to ν̄ − 1.

We denote the edges of h1 = h∗≤ν̄+1 (u0 ) by g1 , . . . , gν̄+1 , and show that each of the three
possibilities allowed by Proposition 4.4 in the situation above, leads to a contradiction.
Case (ii). h∗≤ν̄ (u0 ) has edges at the locations g2 , . . . , gν̄+1 .
Then

η(h∗≤ν̄ (u0 )) − η(h1 ) = E1,g1 − λ,

η(h2 ) − η(h∗≤ν̄ (u0 )) = 2E ∗ − E1,g1 − λ,

and so, for h∗≤ν̄ (u0 ) to be better (with respect to η) than both h1 and h2 , we need to
have:

E1,g1 < λ, and

∗
2E − E1,g1 > λ,

from which it follows that

2E ∗ − E1,g1 > E1,g1 ⇒

∗
E > E1,g1 .

The latter inequality contradicts the definition of E ∗ as the smallest energy of any
region of u(t).
Case (iii). h∗≤ν̄ (u0 ) has edges at the locations g1 , . . . , gν̄ .
This case is handled similarly to Case 2.
Case (iv). h∗≤ν̄ (u0 ) has edges at the locations {g1 , . . . , gν̄+1 }\{i∗ , j ∗ }, as well as one
edge at some other location g10 . This situation requires considering several sub-cases—
see Case 6 of Section B.3. As they are similar, we only treat the one where the region
(1, g10 ) was removed before time t.
Then

η(h∗≤ν̄ (u0 )) − η(h1 ) = 2E ∗ − E1,g10 − λ,

η(h2 ) − η(h∗≤ν̄ (u0 )) = E1,g10 − λ.
Sec. B.5. Equivalence of the SIDE (4.1) to a Linear Programming Problem. 127

If hψ = h∗≤ν̄ (u0 ), we then must have

E1,g10 > λ, and

∗
2E − E1,g10 < λ,

from which it follows that

2E ∗ − E1,g10 < E1,g10 ⇒

∗
E < E1,g10 . (B.16)

But the region (1, g10 ) disappeared before (i∗ , j ∗ ), and therefore

E ∗ ≥ E1,g10 ,

which contradicts (B.16). This concludes the proof.

¥ B.5 Equivalence of the SIDE (4.1) to a Linear Programming Problem.

We need to show that a solution to the linear program (4.24-4.27) exists and consists
of ones and zeros. There are 2N − 1 variables involved (h1 , . . . , hN , r1 , . . . , rN −1 ), and
it is easily seen that there are 2N − 1 linearly independent constraints in (4.25-4.27).
Moreover, h1 = . . . = hN = r1 = . . . = rN −1 = 0 satisfies all the constraints. Thus,
(4.25-4.27) define a non-empty polyhedron in IR2N −1 with at least one vertex (see [65],
Theorem 2.8), which means that a solution to (4.24-4.27) exists. We assume non-
degeneracy—i.e., that the solution is unique. This is not necessary for the proof, but it
greatly simplifies notation. Now suppose that, for this solution h, there exist p and q
such that 1 ≤ p < q ≤ N − 1, and

hp 6= hp+1 = hp+2 = . . . = hq 6= hq+1 . (B.17)

We now show that, unless hq = 0 or hq = 1, we can change h to make (4.24) smaller, and
that therefore h cannot be a solution if 0 < hq < 1. We will be changing hp+1 , . . . , hq ,
and so let us write out the portion of (4.24) which depends on them:

−shq + λ(|hq − hp | + |hq − hq+1 |), (B.18)

P
where s = qi=p+1 ui .
Case 1. Suppose that

hp < hq and hq+1 < hq . (B.19)

Then the hq -dependent portion of Equation (B.18) is:

hq (2λ − s). (B.20)

128 APPENDIX B. PROOFS FOR CHAPTER 4.

If 2λ > s, make hq = max(hp , hq+1 ), which will reduce hq and therefore also reduce
(B.20). It will also make either hp = hq or hq = hq+1 , violating our assumption (B.17).
If 2λ < s, make hq = 1, which will also reduce (B.20). In the degenerate case 2λ = s,
we can go either way without changing the solution. Thus, if (B.17) and (B.19) hold,
then hq = 1.
Case 2. hp > hq and hq+1 > hq . This is handled similarly, with the result that hq = 0.
Case 3. hp < hq and hq+1 > hq . Then the hq -dependent portion of Equation (B.18) is:

−hq s. (B.21)

If s > 0, increase hq by setting hq = hq+1 . If s < 0, reduce hq by setting hq = hp . Both

will reduce (B.21) and violate (B.17). In the degenerate case s = 0, we can go either
way.
Case 4. hp > hq and hq+1 < hq . This is handled similarly to Case 3.
The situations when p = 0 and/or q = N are also handled similarly. The conclusion
is that the optimal solution consists of zeros and ones, which means that the linear
program (4.24-4.27) is equivalent to the optimization problem (4.23).
Bibliography

[1] L. Alvarez, P.L. Lions, and J.-M. Morel. Image selective smoothing and edge detec-
tion by nonlinear diffusion, II. SIAM J. Numer. Anal., 29(3), 1992.

[2] M.S. Atkins and B.T. Mackiewich. Fully automatic segmentation of the brain in
MRI. IEEE Trans. on Medical Imaging, 17(1), February 1998.

[3] M. Basseville and I.V. Nikiforov. Detection of Abrupt Changes: Theory and Appli-
cation. Prentice Hall, 1993.

[4] D.P. Bertsekas and S.K. Mitter. A descent method for optimization problems with
nondifferentiable cost functionals. SIAM J. Control, 11(4), November 1973.

[5] M.J. Black, G. Sapiro, D.H. Marimont, and D. Heeger. Robust anisotropic diffusion.
IEEE Trans. on Image Processing, 7(3), 1998.

[6] C. Bouman and K. Sauer. An edge-preserving method for image reconstruction

from integral projections. In Proc. Conf. on Info. Sci. and Syst., pages 383-387,
Baltimore, MD, March 1991.

[7] C. Brice and C. Fennema. Scene analysis using regions. Artificial Intelligence, 1,
1970.

[8] C.B. Burckhardt. Speckle in ultrasound B-mode scans. IEEE Trans. on Sonics and
Ultrasonics, SU-25, January 1978.

[9] J. Canny. A computational approach to edge detection. IEEE Trans. on PAMI,

PAMI-8, 1986.

[10] T.-H. Chang, Y.-C. Lin, and C.-C. J. Kuo. Techniques in texture analysis. IEEE
Trans. on Image Processing, October 1993.

[11] T.H. Cormen, C.E. Leiserson, and R.L. Rivest. Introduction to Algorithms. MIT
Press, 1990.

[12] V. Caselles, R. Kimmel, and G. Sapiro. Geodesic active contours. In Proc. ICCV,
pages 694–699, Cambridge, MA, 1995.

129
130 BIBLIOGRAPHY

[13] R.N. Czerwinski, D.L. Jones, and W.D. O’Brien, Jr. Line and boundary detection
in speckle images. IEEE Trans. on Image Processing, 7(12), 1998.

[14] G. Dahlquist and A. Bjorck. Numerical Methods. Prentice Hall, 1974.

[15] S.Z. Der and R. Chellappa. Probe-based automatic target tecognition in infrared
imagery. IEEE Trans. on Image Processing, 6(1), 1997.

[16] P. Felzenszwalb and D. Huttenlocher. Image segmentation using local variation. In

Proc. of IEEE Conf. on Computer Vision and Pattern Recognition, pages 98-104,
1998.

[17] A.F. Filippov. Differential Equations with Discontinuous Righthand Sides. Kluwer
Academic Publishers, 1988.

[18] C.H. Fosgate, H. Krim, W.W. Irving, W.C. Karl, and A.S. Willsky. Multiscale
segmentation and anomaly enhancement of SAR imagery. IEEE Trans. on Image
Processing, 6(1), 1997.

[19] M.G. Fleming, C. Steger, J. Zhang, J. Gao, A.B. Cognetta, I. Pollak, and C.R.
Dyer. Techniques for a structural analysis of dermatoscopic imagery. Computerized
Medical Imaging and Graphics, 22, 1998.

[20] J. Froment and S. Parrino. MegaWave2 software package. CEREMADE, URA

CNRS 749 de l’Université Paris-IX Dauphine, 1995.

[21] D. Geman and G. Reynolds. Constrained restoration and the recovery of discon-
tinuities. IEEE Trans. on PAMI, 14(3), 1992.

[22] J.W. Goodman. Statistical properties of laser speckle patterns. In Topics in Applied
Physics, vol. 9: Laser Speckle and Related Phenomena, 2nd edition, J.C. Dainty,
Editor. Springer-Verlag, 1984.

[23] D.R. Greer, I. Fung, and J.H. Shapiro Maximum-likelihood multiresolution laser
radar range imaging. IEEE Trans. on Image Processing, 6(1), 1997.

[24] K. Haris, S.N. Efstratiadis, N. Maglaveras, and A.K. Katsaggelos. Hybrid image
segmentation using watersheds and fast region merging. IEEE Trans. on Image
Processing, 7(12), 1998.

[25] B. Hassibi, A.H. Sayed, and T. Kailath. Linear estimation in Krein spaces–part i:
Theory. IEEE Trans. Automatic Control, 41(1), 1996.

[26] B. Hassibi, A.H. Sayed, and T. Kailath. Linear estimation in Krein spaces–part ii:
Applications. IEEE Trans. Automatic Control, 41(1), 1996.

[27] G.T. Herman. Image Reconstruction from Projections: The Fundamentals of Com-
puterized Tomography. Academic Press, 1980.
BIBLIOGRAPHY 131

[28] D.V. Hinkley. Inference about the change-point in a sequence of random variables.
Biometrica, 57(1), 1970.

[29] B.K.P. Horn. Robot Vision. MIT Press, 1986.

[30] B.K.P. Horn and B. Schunck. Determining optical flow. Artificial Intelligence, 17,
pages 185-203, 1981.

[31] M. Kass, A. Witkin, and D. Terzopoulos. Snakes: Active contour models. Int. J.
of Comp. Vis., 1:321–331, 1988.

[32] I.B. Kerfoot and Y. Bresler. Theoretical analysis of multispectral image segmen-
tation criteria. IEEE Trans. on Image Processing, 8(6), 1999.

[33] R.L. Kettig and D.A. Landgrebe. Classification of multispectral image data by
extraction and classification of homogeneous objects. IEEE Trans. on Geoscience
Electronics, GE-14(1), 1976.

[34] S. Kichenassamy. The Perona-Malik paradox. SIAM J. Applied Math., 57, 1997.

[35] J. Koenderink. The structure of images. Biological Cybernetics, 50, 1984.

[36] G. Koepfler, C. Lopez, and J.-M. Morel. A multiscale algorithm for image segmen-
tation by variational method. SIAM J. Numer. Anal., 31(1), 1994.

[37] B. Kosko. Neural Networks for Signal Processing, pages 37-61. Prentice Hall, 1992.

[38] H. Krim and Y. Bao. A stochastic diffusion approach to signal denoising. In Proc.
ICASSP, Phoenix, AZ, 1999.

[39] Y. Leclerc. Constructing simple stable descriptions for image partitioning. Inter-
national Journal of Computer Vision, 3, 1989.

[40] S.G. Mallat. A Wavelet Tour of Signal Processing. Academic Press, 1998.

[41] D. Marr. Vision. W.H. Freeman and Co., 1982.

[42] Vision and Modeling Group, MIT Media Lab. Vision Texture Database.
http://www-white.media.mit.edu/vismod/imagery/VisionTexture/vistex.html

[43] J.-M. Morel and S. Solimini. Variational Methods in Image Segmentation.

Birkhauser, 1995.

[44] D. Mumford and J. Shah. Boundary detection by minimizing functionals, I. In

Proc. CVPR, pages 22–26, San Francisco, June 1985.

[45] K.M. Nagpal and P.P. Khargonekar. Filtering and smoothing in an H ∞ setting.
IEEE Trans. Automatic Control, 36(2), 1991.
132 BIBLIOGRAPHY

[46] S. Osher and L.I. Rudin. Feature-oriented image enhancement using shock filters.
SIAM J. Numer. Anal., 27(4), 1990.

[47] T. Pavlidis. Segmentation of pictures and maps through functional approximation.

Computer Graphics and Image Processing, 1, 1972.

[48] P. Perona. Orientation diffusions. IEEE Trans. on Image Processing, 7(3), 1998.

[49] P. Perona and J. Malik. Scale-space and edge detection using anisotropic diffusion.
IEEE Trans. on PAMI, 12(7), 1990.

[50] P. Perona, T. Shiota, and J. Malik. Anisotropic diffusion. In [55].

[51] I. Pollak, A. S. Willsky, and H. Krim. A nonlinear diffusion equation as a fast and
optimal solver of edge detection problems. In Proc. ICASSP, Phoenix, AZ, 1999.

[52] S. Resnick. Adventures in Stochastic Processes. Birkhauser, 1992.

[53] J. Rissanen. Stochastic complexity in statistical inquiry. World Scientific, 1989.

[54] R.T. Rockafellar. Convex Analysis. Princeton Univerity Press, 1970.

[55] B.M. ter Haar Romeny, editor. Geometry-Driven Diffusion in Computer Vision.
Kluwer Academic Publishers, 1994.

[56] L.I. Rudin, S. Osher, and E. Fatemi. Nonlinear total variation based noise removal
algorithms. Physica D, 1992.

[57] G. Sapiro. From active contours to anisotropic diffusion: connections between basic
PDE’s in image processing. In Proc. ICIP, Lausanne, 1996.

[58] K. Sauer and C. Bouman. Bayesian estimation of transmission tomograms using

segmentation based optimization. IEEE Trans. on Nuclear Science, 39(4), August
1992.

[59] K. Sauer and C. Bouman. A local update strategy for iterative reconstruction from
projections. IEEE Trans. on Signal Processing, 41(2), 1993.

[60] P. Schmid. Segmentation of digitized dermatoscopic images by two-dimensional

color clustering. IEEE Trans. on Medical Imaging, 18(2), February 1999.

[61] U. Shaked and Y. Theodor. H∞ -optimal estimation: A tutorial. In Proc. of the

31st CDC, Tucson, Arizona, 1992.

[62] G. Strang. Linear Algebra and Its Applications, 3rd Edition. Harcourt Brace
Jovanovich, 1988.

[63] P.C. Teo, G. Sapiro, and B. Wandell. Anisotropic smoothing of posterior proba-
bilities. In Proc. ICIP, Santa Barbara, CA, 1997.
BIBLIOGRAPHY 133

[64] P.C. Teo, G. Sapiro, and B. Wandell. Creating connected representations of cortical
gray matter for functional MRI visualization. IEEE Trans. on Medical Imaging,
16(6), December 1997.

[65] J. Tsitsiklis. Introduction to Mathematical Programming. 6.251J Course Notes,

MIT, 1995.

[66] H. van Trees. Detection, Estimation, and Modulation Theory, volume 1. Wiley,
1968.

[67] V.I. Utkin. Sliding Modes in Control and Optimization. Springer-Verlag, 1992.

[68] P. Viola and W.M. Wells III. Alignment by maximization of mutual information.
International Journal of Computer Vision, 24(2), 1997.

[69] J. Weickert. Nonlinear diffusion scale-spaces: from the continuous to the discrete
setting. In Proc. ICAOS: Images, Wavelets, and PDEs, pages 111–118, Paris, 1996.

[70] R.F. Wagner, S.W. Smith, J.M. Sandrik, and H. Lopez. Statistics of speckle in
ultrasound B-scans. IEEE Trans. on Sonics and Ultrasonics, SU-30, May 1983

[71] M. Wertheimer. Laws of organization in perceptual forms. In A Sourcebook of

Gestalt Psychology, W.B. Ellis, Editor. Harcourt, Brace, and Company, 1938.

[72] A. Witkin. Scale-space filtering. In Int. Joint Conf. on AI, pages 1019–1022,
Karlsruhe, 1983.

[73] P. Wolfe. A method of conjugate gradients for minimizing nondifferentiable func-

tions. In Nondifferentiable Optimization, M.L. Balanski and P. Wolfe, Editors.
North-Holland Publishing, 1975.

[74] G. Wyszecki and W.S. Stiles. Color Science: Concepts and Methods, Quantitative
Data and Formulae. Wiley, 1982.

[75] S.C. Zhu and D. Mumford. Prior learning and Gibbs reaction-diffusion. IEEE
Trans. on PAMI, 19(11), 1997.

[76] S.C. Zhu and A. Yuille. Region competition: unifying snakes, region growing, and
Bayes/MDL for multiband image segmentation. IEEE Trans. on PAMI, 18(9), 1996.
134 BIBLIOGRAPHY
Index

H∞ , 23, 70, 88, 93, 94, 111 Geman, D., 22, 34, 35, 38, 66, 111
Geman, S., 8
automatic target recognition, 21 gradient, 25–28, 33, 34, 39, 41, 49–51,
binary classification, 23, 70–72, 111 53, 59, 62, 66, 69, 99, 100, 103,
Bouman, C., 31–33, 67 104, 106, 108, 111, 113
Brice, C., 34
Hinkley, D., 70, 91
circle-valued images, 3, 23, 99, 106, 108,
Koepfler, G., 30, 38, 59–64, 66
109, 112
Krim, H., 8
color image, 101, 109
color images, 23, 99, 101, 112 level crossing, 72–75, 77–79, 81, 84, 85,
computer vision, 22 88, 96, 118, 119, 123–125
dermatoscopy, 21, 65 likelihood, 20, 23, 32, 71, 79–83, 86, 94,
detection of abrupt changes, 3, 19, 22, 111, 112
70, 80 linear programming, 3, 70, 87, 88, 112,
diffusion, 3, 20–23, 27–29, 35, 37, 40–42, 127, 128
44, 45, 67, 69, 70, 100, 104, 106, linear programming , 87
111–113 Lopez, C., 30, 38, 59–64, 66
divergence, 25, 26, 28 Malik, J., 3, 21, 22, 28, 29, 31, 37, 40–
dynamic programming, 70, 87, 111 42, 49, 56, 58–60, 62, 63, 65, 69,
energy, 27, 30, 34, 35, 39–41, 45, 49– 70, 111
51, 53, 57, 62, 66, 67, 77, 78, Mallat, S., 8
84, 99, 101, 106, 108, 111, 112, Mitter, S., 8
124, 126 Morel, J.-M., 30, 38, 59–64, 66
enhancement, 21, 28, 29, 31, 38, 40–42, Mumford, D., 35, 38, 66
60, 66, 67, 111 Mumford-Shah functional, 3, 23, 30, 38,
61, 62, 66, 70, 83, 85, 86, 111,
Faugeras, O., 8 112
Fennema, C., 34
force function, 39–42, 44, 45, 47–49, 55– orientation, 23, 99, 104, 106, 108, 109,
57, 66, 70, 100, 101, 106, 107, 112
112 Osher, S., 31, 38, 66

135
136 INDEX

Pavlidis, T., 30, 34 total variation, 22, 31, 34, 38, 66, 67,
Perona, P., 3, 21, 22, 28, 29, 31, 37, 40– 111
42, 49, 56, 58–60, 62, 63, 65, 69,
70, 106, 111 ultrasound, 21, 22, 38, 65, 97, 111

region merging, 22, 29–31, 34, 37, 38, 48, vector-valued images, 3, 23, 99–103, 107,
51, 56, 59–61, 64–67, 74, 77, 85, 108, 112
87, 100, 106–108, 111
Weickert, J., 29
restoration, 19, 20, 22, 25, 31, 34, 37,
well-posedness, 42, 45–47, 101, 111
60, 66, 69, 112
Willsky, A., 8
Reynolds, G., 22, 34, 35, 38, 66, 111
Willsky, A.S., 7, 9
robustness, 3, 19, 21–23, 31, 37, 38, 60–
Witkin, A., 27
64, 66, 69, 70, 88, 94, 96, 101,
111–113 Zhu, S.-C., 35, 38, 66
Rudin, L., 31, 38, 66

SAR, 3, 19, 31, 38, 64, 96, 111

Sauer, K., 31–33, 67
scale, 3, 20, 21, 25, 29, 30, 37, 40, 44,
55, 62, 63, 66, 70
scale space, 3, 20–22, 25–27, 29, 37, 38,
57–65, 69
segmentation, 3, 19–23, 25, 27, 29, 30,
34, 37, 38, 41, 44, 45, 48, 56,
58–60, 62, 64–67, 69–71, 81, 96,
101–105, 108, 109, 111, 112
segmentation based optimization, 33
semi-discrete, 26, 29, 31, 37, 40, 49, 67
Shiota, T., 59, 62
shock filters, 22, 31–33, 38, 66, 111
sliding, 46, 48–51, 56, 101, 112, 115
Solimini, S., 30
speckle, 3, 19, 22, 31, 38, 64, 65, 97
split-and-merge, 31, 33
spring-mass model, 38, 41, 42, 47, 55,
56, 100, 106
statistic, 71–75, 77, 79, 81–83, 87, 118,
119
stopping time, 37, 81, 84, 85, 96, 97

texture, 19, 22, 23, 64, 96, 99, 102, 104,

105, 108, 109
texture images, 112

2.digital Image Processing (S. Jayaraman) 1
100% (1)
2.digital Image Processing (S. Jayaraman) 1
751 pages
Numerical Initial Value Problems in Ordinary Differential Equations
100% (6)
Numerical Initial Value Problems in Ordinary Differential Equations
271 pages
David Tschumperle, Christophe Tilmant, Vincent Barra - Digital Image Processing With C++ - Implementing Reference Algorithms With The CImg Library-CRC Press (2023) PDF
100% (2)
David Tschumperle, Christophe Tilmant, Vincent Barra - Digital Image Processing With C++ - Implementing Reference Algorithms With The CImg Library-CRC Press (2023) PDF
309 pages
Marvin Lee Minsky - Computation - Finite and Infinite Machines (1967, Prentice Hall) - Libgen - Li
No ratings yet
Marvin Lee Minsky - Computation - Finite and Infinite Machines (1967, Prentice Hall) - Libgen - Li
334 pages
An Introduction To Transfer Entropy - Information Flow in Complex Systems
No ratings yet
An Introduction To Transfer Entropy - Information Flow in Complex Systems
210 pages
Dynamics of Physical Systems
From Everand
Dynamics of Physical Systems
Robert H., Jr. Cannon
No ratings yet
Concepts of Classical Optics
From Everand
Concepts of Classical Optics
John Strong
No ratings yet
978 3 540 76414 4
No ratings yet
978 3 540 76414 4
595 pages
Angjoo
No ratings yet
Angjoo
142 pages
Elegant Chaos - Algebraically Simple Chaotic Flows
100% (1)
Elegant Chaos - Algebraically Simple Chaotic Flows
302 pages
Real Time Pattern Recognition Using Matrox Imaging System
No ratings yet
Real Time Pattern Recognition Using Matrox Imaging System
88 pages
PhDThesis TaegSangCho
No ratings yet
PhDThesis TaegSangCho
143 pages
Abdeldjallil AIDI's Master Thesis
No ratings yet
Abdeldjallil AIDI's Master Thesis
98 pages
Introduction To Dynamical Systems
No ratings yet
Introduction To Dynamical Systems
37 pages
A Trainable System For Object Detection in Images and Video Sequences
No ratings yet
A Trainable System For Object Detection in Images and Video Sequences
148 pages
Nit Kurukshetra PHD Thesis Template
No ratings yet
Nit Kurukshetra PHD Thesis Template
31 pages
An Adaptive Tracking Algorithm For Robotics and Computer Vision Application
No ratings yet
An Adaptive Tracking Algorithm For Robotics and Computer Vision Application
82 pages
Suzuki E. (Ed), Arikawa S. (Ed) - Discovery Science - 7th International Conference, DS 2004, Padova, Italy, October 2-5, 2004, Proceeindgs (2004)
No ratings yet
Suzuki E. (Ed), Arikawa S. (Ed) - Discovery Science - 7th International Conference, DS 2004, Padova, Italy, October 2-5, 2004, Proceeindgs (2004)
448 pages
Revisiting the Bohr Model: Over 55 New Conceptual Theories
From Everand
Revisiting the Bohr Model: Over 55 New Conceptual Theories
Thomas E. Baker
No ratings yet
Stability Dynamic Systems
100% (1)
Stability Dynamic Systems
514 pages
AASAN.2021 - Invertible and Pseudo-Invertible Encoders An Approach To Inverse Problems With Neural Networks
No ratings yet
AASAN.2021 - Invertible and Pseudo-Invertible Encoders An Approach To Inverse Problems With Neural Networks
199 pages
Li Liyunzhu PHD EECS 2022 Thesis
No ratings yet
Li Liyunzhu PHD EECS 2022 Thesis
243 pages
Topics in Signal Processing Volume 10 - Computer Vision Analysis of Image Motion by Variational Methods (2014) (Amar Mitiche, J.K. Aggarwal)
100% (1)
Topics in Signal Processing Volume 10 - Computer Vision Analysis of Image Motion by Variational Methods (2014) (Amar Mitiche, J.K. Aggarwal)
212 pages
Cremona
No ratings yet
Cremona
311 pages
(Lecture Notes in Computer Science 6099 - Theoretical Computer Science and General Issues) Roberto Grossi, Alessio Orlandi, Giuseppe Ottaviano (Auth.), Paolo Boldi, Luisa Gargano (Eds.) - Fun With Alg
No ratings yet
(Lecture Notes in Computer Science 6099 - Theoretical Computer Science and General Issues) Roberto Grossi, Alessio Orlandi, Giuseppe Ottaviano (Auth.), Paolo Boldi, Luisa Gargano (Eds.) - Fun With Alg
390 pages
Goal-Oriented Memory Allocation in Database Management Systems
No ratings yet
Goal-Oriented Memory Allocation in Database Management Systems
102 pages
Statement of Purpose Rudrasis UChicago
No ratings yet
Statement of Purpose Rudrasis UChicago
2 pages
Full Download Quantum Information Processing With Finite Resources Mathematical Foundations 1st Edition Marco Tomamichel (Auth.) PDF
100% (2)
Full Download Quantum Information Processing With Finite Resources Mathematical Foundations 1st Edition Marco Tomamichel (Auth.) PDF
42 pages
Alexander 4
No ratings yet
Alexander 4
9 pages
Computer Vision Research Statement
No ratings yet
Computer Vision Research Statement
5 pages
Lectures 169
No ratings yet
Lectures 169
57 pages
PJB Dissertation
No ratings yet
PJB Dissertation
178 pages
Tutorials and Problems For Discrete-Time Signals and Systems
No ratings yet
Tutorials and Problems For Discrete-Time Signals and Systems
12 pages
Modern Introduction To Dynamical Systems
No ratings yet
Modern Introduction To Dynamical Systems
188 pages
Discovering Context in Wearable Computing
No ratings yet
Discovering Context in Wearable Computing
181 pages
Encyclopedia of Computer Science
No ratings yet
Encyclopedia of Computer Science
747 pages
Ney PHD Thesis PDF
No ratings yet
Ney PHD Thesis PDF
215 pages
Noise and Fluctuations Twentieth International Conference On Noise and Fluctuations
100% (1)
Noise and Fluctuations Twentieth International Conference On Noise and Fluctuations
804 pages
(Applying Mathematics) David Mumford, Agnès Desolneux - Pattern Theory - The Stochastic Analysis of Real-World Signals-A K Peters - CRC Press (2010)
No ratings yet
(Applying Mathematics) David Mumford, Agnès Desolneux - Pattern Theory - The Stochastic Analysis of Real-World Signals-A K Peters - CRC Press (2010)
413 pages
Introduction To Chaos - Theoretical and Numerical Methods - Carlo F Barenghi
No ratings yet
Introduction To Chaos - Theoretical and Numerical Methods - Carlo F Barenghi
149 pages
Signals and Systems: Lecture #2: Introduction To Systems
No ratings yet
Signals and Systems: Lecture #2: Introduction To Systems
8 pages
Index
No ratings yet
Index
444 pages
Visvesvaraya Technological University: An Image Is Worth 16X16 Words Transformers For Image Recognition at Scale
No ratings yet
Visvesvaraya Technological University: An Image Is Worth 16X16 Words Transformers For Image Recognition at Scale
21 pages
Autonomous Robotics and Deep Learning PDF
100% (3)
Autonomous Robotics and Deep Learning PDF
73 pages
Jiajun
No ratings yet
Jiajun
325 pages
Large Scale
No ratings yet
Large Scale
359 pages
Springer Ebooks
0% (1)
Springer Ebooks
541 pages
Image Processing 2nd Edition WWW EBooksWorld Ir PDF
100% (2)
Image Processing 2nd Edition WWW EBooksWorld Ir PDF
816 pages
Amread
No ratings yet
Amread
330 pages
Systems and Techniques For Efficient Real-World Graph Analytics
No ratings yet
Systems and Techniques For Efficient Real-World Graph Analytics
110 pages
Sparse Fourier Thesis
No ratings yet
Sparse Fourier Thesis
250 pages
REPORT
No ratings yet
REPORT
24 pages
Thesis PDF
No ratings yet
Thesis PDF
208 pages
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
No ratings yet
(Computing Supplement 11) Dr. K. Daniilidis (Auth.), Prof. Dr. W. Kropatsch, Prof. Dr. R. Klette, Prof. Dr. F. Solina, Prof. Dr. R. Albrecht (Eds.) - Theoretical Foundations of Computer Vision-Springe
259 pages
Image Processing, 2nd Edition PDF
100% (1)
Image Processing, 2nd Edition PDF
815 pages
Book Reviews: Nonlinear Control Systems: Analysis and Design
No ratings yet
Book Reviews: Nonlinear Control Systems: Analysis and Design
3 pages
VCSELs for Cesium-Based Miniaturized Atomic Clocks
From Everand
VCSELs for Cesium-Based Miniaturized Atomic Clocks
Ahmed Al-Samaneh
No ratings yet
Inorganic Nanostructures: Properties and Characterization
From Everand
Inorganic Nanostructures: Properties and Characterization
Petra Reinke
No ratings yet
The Gateway to Understanding: Electrons to Waves and Beyond
From Everand
The Gateway to Understanding: Electrons to Waves and Beyond
Matthew M. Radmanesh Ph.D.
No ratings yet
Expansion & Innovation: The Story of Western Engineering 1954-1999
From Everand
Expansion & Innovation: The Story of Western Engineering 1954-1999
G. S. Peter Castle
No ratings yet
A Benchmark For Comparison of Cell Tracking Algorithms
No ratings yet
A Benchmark For Comparison of Cell Tracking Algorithms
9 pages
Dic Image Segmentation of Dense Cell Populations
No ratings yet
Dic Image Segmentation of Dense Cell Populations
5 pages
Q1, Q2 and Q3: HW7 Nagasoujanya Annasamudram D10670628
No ratings yet
Q1, Q2 and Q3: HW7 Nagasoujanya Annasamudram D10670628
9 pages
Q4 1. LU Factorization Without Pivoting
No ratings yet
Q4 1. LU Factorization Without Pivoting
5 pages
Youth Parliament Guide
No ratings yet
Youth Parliament Guide
21 pages
PC110R-1 M Weam000402 PC110R-1 PDF
No ratings yet
PC110R-1 M Weam000402 PC110R-1 PDF
212 pages
Trott - Innovation Management Overview
No ratings yet
Trott - Innovation Management Overview
18 pages
Box Culvert Design IRC - 1
No ratings yet
Box Culvert Design IRC - 1
9 pages
Respiratory Protective Equipment
No ratings yet
Respiratory Protective Equipment
19 pages
Reup 11 13
No ratings yet
Reup 11 13
3 pages
Suspension Colloids
No ratings yet
Suspension Colloids
27 pages
Lesson 3-Roots and Optimization
No ratings yet
Lesson 3-Roots and Optimization
30 pages
Assignment On Aerodynamics in F35 Fighter Jet
No ratings yet
Assignment On Aerodynamics in F35 Fighter Jet
4 pages
CHAPTER 9 Philippine Green Building Code - 084722
No ratings yet
CHAPTER 9 Philippine Green Building Code - 084722
33 pages
6 rDNA Technology
No ratings yet
6 rDNA Technology
58 pages
Read BMR, Stay Ahead! - Building Material Reporter Magazine
No ratings yet
Read BMR, Stay Ahead! - Building Material Reporter Magazine
82 pages
Outdoors Song By: Jason Mraz
No ratings yet
Outdoors Song By: Jason Mraz
4 pages
Basics of Miaphysite Christology - Polish Miaphysite
No ratings yet
Basics of Miaphysite Christology - Polish Miaphysite
6 pages
rt14 Owners Manual
No ratings yet
rt14 Owners Manual
36 pages
Lecturer Profile
No ratings yet
Lecturer Profile
3 pages
TeamBuildingExercisesWorksheet PDF
No ratings yet
TeamBuildingExercisesWorksheet PDF
1 page
Why Start Online Selling During This Time of Pandemic
No ratings yet
Why Start Online Selling During This Time of Pandemic
2 pages
Result - COMMON QUIZ (03.08.2024) - CLASS-11th (2024-26)
No ratings yet
Result - COMMON QUIZ (03.08.2024) - CLASS-11th (2024-26)
3 pages
Novelty Creation Homework
No ratings yet
Novelty Creation Homework
2 pages
Comp9020 Cheatsheet Cref
No ratings yet
Comp9020 Cheatsheet Cref
2 pages
Distance
No ratings yet
Distance
2 pages
Subs Use Nclexq
No ratings yet
Subs Use Nclexq
3 pages
Hydraulic Calculation For Box Type Minor Bridge Bridge at CH - 08+967
100% (7)
Hydraulic Calculation For Box Type Minor Bridge Bridge at CH - 08+967
17 pages
Analysis of Damage Occuring in Steel Plate Girder Bridges On National Roads in Japan
No ratings yet
Analysis of Damage Occuring in Steel Plate Girder Bridges On National Roads in Japan
14 pages
Excretory System Worksheet
100% (2)
Excretory System Worksheet
4 pages
Event Management System - FINAL
100% (1)
Event Management System - FINAL
39 pages
Dividend Definition: Irrelevance and Relevance of Dividend
No ratings yet
Dividend Definition: Irrelevance and Relevance of Dividend
9 pages
US9321907 Patent - PROCESS FOR PREPARATION OF STABLE FATTY ALCOHOL EMULSION
No ratings yet
US9321907 Patent - PROCESS FOR PREPARATION OF STABLE FATTY ALCOHOL EMULSION
7 pages
Bible King James Version
No ratings yet
Bible King James Version
2,652 pages

Note: This service is not intended for secure transactions such as banking, social media, email, or purchasing. Use at your own risk. We assume no liability whatsoever for broken pages.