Visual and Spatial Analysis (2004)
Edited by
Boris Kovalerchuk
Central Washington University,
Ellensburg, WA,
U.S.A.
and
James Schwing
Central Washington University,
Ellensburg, WA,
U.S.A.
A C.I.P. Catalogue record for this book is available from the Library of Congress.
Published by Springer,
P.O. Box 17, 3300 AA Dordrecht, The Netherlands.
springeronline.com
Preface xiii
1. Introduction ......................................................................................79
2. Sentential natural deduction .............................................................81
3. Generalizing to heterogeneous deduction.........................................86
4. Generalizing to heterogeneous reasoning........................................95
5. Applications of the architecture......................................................105
6. Conclusions and further work ........................................................106
7. Exercises and problems ..................................................................107
8. References ......................................................................................108
1. Introduction ....................................................................................129
2. Storytelling iconic reasoning architecture ......................................131
3. Hierarchical iconic reasoning .........................................................137
4. Consistent combined iconic reasoning ...........................................139
5. Related work...................................................................................145
6. Conclusion......................................................................................149
7. Exercises and problems ..................................................................150
8. References ......................................................................................151
1. Introduction ....................................................................................153
2. Visualization as illustration: lessons from hieroglyphic numerals .155
3. Visual reasoning: lessons from hieroglyphic arithmetic ................162
4. Visual discovery: lessons from the discovery of π.........................164
5. Conclusion......................................................................................167
6. Exercises and problems ..................................................................169
7. References ......................................................................................170
1. Introduction ....................................................................................175
2. Examples of numeric visual correlations........................................181
3. Classification of visual correlation methods ..................................189
4. Visual correlation efficiency ..........................................................191
5. Visual correlation: formal definitions, analysis, and theory..........193
6. Conclusion......................................................................................202
7. Acknowledgements ........................................................................203
8. Exercises and problems ..................................................................203
9. References ......................................................................................203
1. Introduction ....................................................................................207
2. Iconic queries .................................................................................210
3. Composite icons .............................................................................213
4. Military iconic language.................................................................215
5. Iconic representations as translation invariants ..............................219
6. Graphical coding principles............................................................220
7. Perception and optimal number of graphical elements ..................224
8. Conclusion......................................................................................227
9. Acknowledgments ..........................................................................228
10. Exercises and problems ..................................................................228
11. References ......................................................................................228
1. Introduction ....................................................................................231
2. The main concepts of the Bruegel iconic system ...........................232
3. Dynamic icon generation for visual correlation .............................237
4. The Bruegel iconic language for automatic icon generation ..........243
5. Case studies: correlating terrorism events ......................................247
6. Case studies: correlating files and criminal events........................254
7. Case studies: market and health care.............................................256
8. Conclusions ....................................................................................259
9. Acknowledgments ..........................................................................260
10. Exercises and problems ..................................................................260
11. References ......................................................................................261
1. Introduction ....................................................................................265
2. Related work...................................................................................267
3. Demonstration dataset and preprocessing ......................................268
4. Multidimensional scaling ...............................................................270
1. Introduction ....................................................................................293
2. The system overview......................................................................295
3. The system architecture..................................................................298
4. Analysis of spatial data...................................................................303
5. Conclusion......................................................................................314
6. Acknowledgements ........................................................................315
7. Exercises and problems ..................................................................315
8. References ......................................................................................316
1. Introduction ....................................................................................319
2. The Predictive model markup language .........................................321
3. VizWiz: interactive visualization and evaluation ...........................324
4. Related work...................................................................................330
5. Discussion ......................................................................................331
6. Acknowledgements ........................................................................331
7. Exercises and problems ..................................................................332
8. References ......................................................................................332
1. Introduction ....................................................................................335
2. Neural network based techniques ...................................................338
3. Evolving cascade neural networks .................................................342
4. GMDH-type neural networks .........................................................348
5. Neural-network decision trees........................................................355
6. A rule extraction technique ............................................................366
7. Conclusion......................................................................................368
8. Acknowledgments ..........................................................................368
9. Exercises and problems ..................................................................368
10. References ......................................................................................369
1. Introduction ....................................................................................371
2. Definitions ......................................................................................374
3. Theorem on simultaneous scaling ..................................................375
4. A test example ................................................................................377
5. Discovering simultaneous scaling ..................................................378
6. Additive structures in decision making ..........................................380
7. Physical structures ..........................................................................382
8. Conclusion......................................................................................384
9. Exercises and problems ..................................................................385
10. References ......................................................................................385
1. Introduction ....................................................................................387
2. A method for visualizing data ........................................................390
3. Methods for visual data comparison...............................................392
4. A method for visualizing pattern borders .......................................395
5. Experiment with a Boolean data set ...............................................398
6. Data structures and formal definitions ...........................................403
7. Conclusion......................................................................................404
8. Exercises and problems ..................................................................405
9. References ......................................................................................406
1. Introduction ....................................................................................409
2. Combining and resolving conflicts with geospatial datasets..........411
3. Measures of decision correctness ...................................................422
4. Visualization...................................................................................426
5. Conflict resolution by analytical and visual conflation agents.......428
6. Conclusion......................................................................................431
7. Acknowledgements ........................................................................432
8. Exercises and problems ..................................................................432
9. References ......................................................................................432
1. Introduction ....................................................................................435
2. Image inconsistencies.....................................................................438
3. AVDM framework and complexities space ...................................444
4. Conflation levels.............................................................................446
5. Scenario of conflation.....................................................................449
6. Rules for virtual imagery expert.....................................................454
7. Case study: pixel-level conflation based on mutual information ...459
8. Conclusion......................................................................................470
9. Acknowledgements ........................................................................471
10. Exercises and problems ..................................................................471
11. References ......................................................................................471
1. Introduction ....................................................................................473
2. Algebraic invariants .......................................................................475
3. Feature correlating algorithms........................................................491
4. Conflation measures .......................................................................500
5. Generalization: image structural similarity ....................................507
6. Conclusion......................................................................................507
7. Acknowledgements ........................................................................507
8. Exercises.........................................................................................507
9. References ......................................................................................508
1. Introduction ....................................................................................509
2. Steps of the algorithm development technology ............................512
3. Parameter identification steps.........................................................514
4. Attempt to formalize parameters ....................................................518
5. Analyze parameter invariance ........................................................520
6. Conflation algorithm development.................................................522
7. Determine conflatable images and algorithm limitations...............528
8. Software and computational experiment ........................................529
9. Conclusion......................................................................................534
10. Acknowledgements ........................................................................534
11. Exercises and problems ..................................................................534
12. References ......................................................................................535
1. Introduction ....................................................................................537
2. Shortcomings of previous attempts to deal with the subject ..........539
3. Goals and IVES System Architecture ............................................542
4. Interactive on-the-fly analysis and recording .................................544
5. Multi-image knowledge extractor ..................................................546
6. Iconic Markup in IVES ..................................................................551
7. Iconic ontological conflation..........................................................552
8. Conclusion......................................................................................559
9. Acknowledgements ........................................................................559
10. Exercises and problems ..................................................................559
11. References ......................................................................................560
Visual problem solving has been successful for millennia. The Pythago-
rean Theorem was proved by visual means more than 2000 years ago. The
entire study of geometry existed as a visual problem-solving field more than
one and a half millennia before René Descartes invented symbolic coordi-
nates. Albert Einstein wrote in 1953 that the development of Western Sci-
ence is based on two great achievements: the invention of the formal logic
system (in Euclidean geometry) and reasoning based on systematic experi-
mentation during the Renaissance. In the context of this book, it is important
to notice that the formal logical system in Euclidean geometry was visual.
Consider two other important historical examples of visual problem solv-
ing and decision making. Maritime navigation by using the stars presents an
example of sophisticated visual problem solving and decision-making. Then
in the 19th century, John Snow stopped a cholera epidemic in London by
proposing that a specific water pump be shut down. He discovered that pump
by visually correlating data on the city map. Of course, there continue to be
many current examples of advanced visual problem solving and decision-
making.
This book presents the current trends in visual problem solving and decision making, making a clear distinction between the visualization of an already identified solution and visually finding a solution. Thus, the book focuses on two goals:
(G1) displaying a result or solution visually, and
(G2) deriving a result or solution by visual means.
The first goal has two aspects: G1(a) displaying results to a novice and G1(b) convincing a decision maker. Recently the mass media (US News and World Report, Dec. 2003, p. 30) reported that intelligence analysts knew the danger of the coming September 11 but that convincing decision makers was one of their major challenges: “There were people who got it at the analyst level, at the supervisory level, but all of us were outnumbered.” A novice simply does not know the subject but has no prejudice, priorities, special interests or
efficiency. The chapter finishes with a more formal treatment of visual cor-
relation, providing formal definitions, analysis, and theory.
Chapter 9 presents the state of the art in iconic descriptive approaches to annotating, searching, and correlating that are based on the concepts of compound and composite icons, the iconic annotation process, and iconic queries. Specific iconic languages used for applications such as video annotation, military use, and text annotation are discussed. Graphical coding principles are derived through the consideration of questions such as: How much information can a small icon convey? How many attributes can be displayed on a small icon either explicitly or implicitly? The chapter also summarizes the impact of human perception on icon design.
Chapter 10 addresses the problem of visually correlating objects and
events. The new Bruegel visual correlation system based on an iconographic
language that permits compact information representation is described. The
description includes the Bruegel concept, functionality, the ability to com-
press information via iconic semantic zooming, and dynamic iconic sen-
tences. The formal Bruegel iconic language for automatic icon generation is
outlined. The chapter concludes with case studies that describe how the Bruegel iconic architecture can be used.
Part 4 addresses visual and spatial data mining and consists of Chapters
11-16. Chapter 11 introduces two dynamic visualization techniques using
multi-dimensional scaling to analyze transient data streams such as news-
wires and remote sensing imagery. The chapter presents an adaptive visuali-
zation technique based on data stratification to ingest stream information
adaptively when influx rate exceeds processing rate. It also describes an in-
cremental visualization technique based on data fusion to project new infor-
mation directly onto a visualization subspace spanned by the singular vectors
of the previously processed neighboring data. The ultimate goal is to leverage the value of legacy and new information and to minimize re-processing of the entire dataset in full resolution.
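As a rough sketch of this incremental idea (ours, not from the chapter; the data and dimensions are made up), new items can be projected onto the subspace spanned by the top singular vectors of previously processed data, without recomputing over the whole stream:

```python
import numpy as np

rng = np.random.default_rng(0)
old = rng.standard_normal((200, 50))   # previously processed data
new = rng.standard_normal((20, 50))    # newly arrived stream items

# A 2-D visualization subspace from the old data's right singular
# vectors; new items are projected directly, so "old" is not re-processed.
_, _, vt = np.linalg.svd(old, full_matrices=False)
basis = vt[:2].T                        # shape (50, 2)

coords_new = new @ basis                # direct projection of new data
print(coords_new.shape)                 # (20, 2)
```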
In Chapter 12, the main objective of the described spatial data mining
platform called SPIN! is to provide an open, highly extensible, n-tier system
architecture based on the Java 2 Platform, Enterprise Edition. The data min-
ing functionality is distributed among (i) a Java client application for visualization and workspace management, (ii) an application server with an Enterprise Java Bean container for running data mining algorithms and workspace management, and (iii) a spatial database for storing data and spatial query execution. In the SPIN! system, visual problem solving involves displaying data
mining results, using visual data analysis tools, and finally producing a solu-
tion based on linked interactive displays with different visualizations of
various types of knowledge and data.
This book is the first guide to focus on visual decision making and prob-
lem solving in general and for geospatial applications specifically. It com-
bines theory and real-world practice. The book includes uniformly edited
contributions from a multidisciplinary team of experts. Note that the book is
not a collection of independent contributions, but rather a book of intercon-
nected chapters. The book is unique in its integration of modern symbolic
and visual approaches to decision making and problem solving. As such, it
ties together the monograph and textbook literature in this new emerging
area. Each chapter ends with a summary and exercises.
The intended audience of this book is professionals and students in com-
puter science, applied mathematics, imaging science and Geospatial Infor-
mation Systems (GIS). Thus, the book can be used as a text for advanced courses on subjects such as modeling, computer graphics, visualization, image processing, data mining, GIS, and algorithm analysis.
We would like to begin our acknowledgements by thanking all the con-
tributing authors for their efforts. A significant part of the work presented in
this book has been sponsored by the US Intelligence Community, the De-
partment of Defense, and the Department of Energy. Authors of individual
chapters have made such acknowledgements in their respective chapters.
Several other chapters have been supported by European funding agencies
that are also acknowledged in the individual chapters. All support is grate-
fully acknowledged. Special thanks go to ARDA/NGA GI2Vis Program and
NGA Academic Research Program managers, panel members, and partici-
pants for their interest, stimulating discussions and a variety of support. Sev-
eral students contributed to this book as co-authors and others assisted us in
other forms. Richard Boyce, Mark Curtiss, Steven Heinz, Ping Jang, Bea
Koempel-Thomas, Ashur Odah, Paul Martinez, Logan Riggs, Jamie Powers,
and Chris Watson provided such assistance.
Please find book-related information at www.cwu.edu/~borisk/bookVis.
Boris Kovalerchuk
Central Washington University
Abstract: This chapter provides a conceptual link between the decision making process,
visualization, visual discovery, and visual reasoning. A structural model of the
decision making process is offered along with the relevant visual aspects. Examples of the USS Cole incident in 2000 and the cholera epidemic in London in 1854 illustrate the conceptual approach. A task-driven visualization is described as a part of the decision making process and illustrated with browsing and search tasks.
Key words: Decision making, visualization, visual reasoning, visual discovery, task-driven
approach.
1. CURRENT TRENDS
Visual problem solving has been known for millennia through both great successes and failures in science, mathematics, and technology. The quotes above sum up these facts in a few words. Below we present current trends in this area that indicate that the field is moving (1) from mostly geometric visuals to more abstract algebraic, symbolic visuals; (2) from the visualization of solutions to finding solutions visually; (3) from visual data mining to finding solutions visually; (4) from drawing tools to visual discovery and conceptual analysis; and (5) from abstract decision models to visual decision models.
The Pythagorean Theorem was proven by visual means more than 2000
years ago. The entire study of geometry existed as a visual problem-solving
field more than one and a half millennia before René Descartes invented
symbolic coordinates.
In 1953, Albert Einstein wrote that the development of Western Science
is based on two great achievements: the invention of the formal logic system
(in Euclidean geometry) and reasoning based on systematic experimentation
during the Renaissance. For us it is important to notice that the formal logi-
cal system in Euclidean geometry was visual.
Historically, in mathematics, visuals are associated with geometry, which can be traced to the concept of number in Greek mathematics. This contrasts with “non-visual,” abstract mathematics that began with Descartes and analytic geometry [Schaaf, 1930]. In other words, this is the fundamental con-
ceptual difference between concrete visual forms (in geometry) and abstract
forms (in algebra) of mathematics. Typically, an abstract algebraic form is
not considered to be a visual form although abstract symbols, icons, are
graphical representations.
However, the point is that the algebraic abstract form is also visual and in
some sense, the algebraic visual form is more general than the geometric
visual form. To be abstract does not necessarily mean to be non-visual. The
concept can be visual, abstract and very productive simultaneously. Impor-
tant differences between abstract and concrete hide significant similarities
between geometric and algebraic approaches – both of them are visual
forms, but concrete and abstract respectively.
The geometric form is often an individual invention for a specific task that is not applicable to other tasks. This was one of the main reasons that mathematics moved from the geometric proofs of Greek mathematics to the more abstract Cartesian mathematics, which permitted working on geometric problems in an algebraic form.
Chapters 5 and 7 of this book show the productivity of the algebraic visual approach using historical examples. This productivity derives from the fact that solving an algebraic equation in symbolic form is much more efficient than using words or geometry.
Let us begin our considerations here with an analysis that uses architectural Computer Aided Design (CAD) tools. Typical CAD tools help an architect implement an architectural solution much as a pen helps us record our everyday solutions. This is obviously useful, but it does not guide an architectural solution. Finding an architectural solution is an extremely difficult task, because in essence architecture is art. This means that we need a visual discovery tool that is more complex than the visualization of an already discovered solution. One might argue that the architect can start
sketching without a clear solution and can discover it in the process of using
a CAD tool. Thus, the CAD tool would be a discovery tool too. We would
disagree. Leo Tolstoy wrote and rewrote some of his famous works many
times with a simple pen. Should we call his pen a discovery tool?
It is extremely difficult to distinguish between a genuine visual discov-
ery tool and a less sophisticated tool. There are two extremes:
1. visual tools that provide an algorithmic solution and
2. visual tools that only help in recording a solution.
Most available tools lie somewhere between these two extremes. They sup-
port recording and visualization of the solution and partially support discov-
ery of the solution.
At first glance it may be surprising, but one of the best-known visual techniques that provides an algorithmic solution is basic elementary school arithmetic, known for centuries. For instance, adding 35 and 17 we (a) write them one under another, (b) add 5 and 7, (c) write 2, the low-order digit of the sum 5+7, below the result line, (d) write 1, the high-order digit of the sum 5+7, known as the carry, in a separate location, (e) add the next digits 3 and 1, (f) add the carry to them, and (g) write the result 5 below the result line next to the 2. This explanatory text is less clear than a graphical form where 17 is really written under 35.
What is important in this example? The example shows an algorithmic
process, not an art, where the result varies from person to person. Every per-
son with an elementary school background should produce the same number
52 when the sequence described above is followed.
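As a minimal sketch (ours, not from the book), the same steps translate directly into code, underscoring that the procedure is fully algorithmic:

```python
def column_add(x: int, y: int) -> int:
    """Digit-by-digit addition with a carry, mirroring steps (a)-(g)."""
    xs, ys = str(x)[::-1], str(y)[::-1]   # write the numbers one under another
    carry, digits = 0, []
    for i in range(max(len(xs), len(ys))):
        a = int(xs[i]) if i < len(xs) else 0
        b = int(ys[i]) if i < len(ys) else 0
        s = a + b + carry                 # add the digits plus the carry
        digits.append(s % 10)             # low-order digit goes below the line
        carry = s // 10                   # high-order digit becomes the carry
    if carry:
        digits.append(carry)
    return int("".join(map(str, digits))[::-1])

print(column_add(35, 17))  # 52, the same for everyone who follows the steps
```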
In architectural design, different architects can add two buildings to each other very differently. The CAD tool will support any of these solutions without guiding the solution. Thus, this process is not algorithmic but rather artistic. Somebody can argue that adding a building is a much more complex task than adding numbers. That is true, but thousands of years ago adding numbers was not an easy task either, nor is it an easy task for elementary school students today. Later in this book, you will find Chapter 4 written by D.
This scale is depicted in Figure 1. Typical current CAD systems are obvi-
ously not fully algorithmic.
2. CATEGORIES OF VISUALS
Numerous geometric proofs of the Pythagorean Theorem have been published [Loomis, 1968]. All of them represent the reasoning category on the creativity scale. One of the visual proofs, shown in Figure 4, is in essence the same as that used by Euclid. Now it is an animated Java applet [Morey, 1995]. Figure 5, which is quite different from Figure 4, fits more readily into the illustration category. Figure 5 both visualizes and visually proves the statement of the Theorem for a triangle with sides of length 3, 4, and 5, i.e., 3² + 4² = 5². It does not prove the Theorem's general statement that a² + b² = c² for any right triangle.
Both Figures 4 and 5 lack the ability to show us how the theorem could be discovered. A proof deals with a hypothesis that should be proved (verified) or refuted. To do this we first need to generate that hypothesis; that is, we need a visual process that helps generate an initial set of reasonable hypotheses, which should include the true theorem statement. Next, we need to test the hypotheses visually. Without a discovery process the situation would be similar to finding an exit in a maze using random trials. Thus, we distinguish three categories of visuals applied to theorems:
1) Illustration: visualization of the theorem statement (Figure 5),
2) Reasoning (verification): visualization of the proof process for the
theorem’s statement (see Figure 4, triangles are moved without
changing their areas), and
3) Discovery: visualization of the discovery process that identifies theo-
rem’s statement as a hypothesis.
For the Pythagorean Theorem, when we are not proving the theorem but using its proved result (a² + b² = c²) in a particular situation (e.g., a = 3, b = 4), the reasoning step will be computing the specific numeric result, c = √(3² + 4²) = 5, by applying the theorem. Thus, for mathematical tasks based on the use of theorems we can describe categories of visuals in the following way:
• Illustration: visualization of the solution of an individual task based on the use of the theorem (e.g., Figure 4 illustrates both the general statement of the Pythagorean Theorem and an individual task with specific sides 3, 4, and 5).
• Reasoning: verification (proof) of the theorem, and computation of
the result by applying the theorem, e.g., computing side c of the
right triangle given sides a and b.
• Discovery: visualization of the discovery process that identifies
theorem’s statement as a hypothesis.
In visual decision making the listed categories have their counterparts:
• Illustration: visualization of the decision (solution) statement
• Reasoning: explanation of why this is a correct decision, verification
of the hypothesis, and visualization of the reasoning process that
leads to the decision statement using a verified hypothesis.
• Discovery: visualization of the process of hypothesis discovery.
Figure 7 combines the scales for algorithmic and creativity levels. Here
the sizes of circles indicate the relative number of methods currently avail-
able (part a) and the relative number of methods desired (part b). It is obvi-
ous from this figure that new methods dealing with full-solution algorithms
for discovery tasks are in short supply.
Figure 7. The creativity scale (Illustration, Reasoning, Discovery) shown for (a) the methods currently available and (b) the methods desired.
Below we provide some examples of tasks that occupy the extreme cells
in Figure 7. The first one is an illustration (I) combined with an informal algorithm (IA) for accomplishing a task, <I, IA>. Architectural drawing using CAD tools falls into this category. Another extreme is a discovery task (D) combined with a full-solution algorithm (FA) for discovery, <D, FA>. Discovery of the number π by approximating a circle with polygons belongs to this category. Increasing the number of sides of the polygon increases the accuracy of the solution. The two other extremes are represented by <D, IA> and
<I, FA>. Here <D, IA> means the discovery of the solution with an informal algorithm in hand. A variety of guidelines in architectural and engineering
design reside in this category. Such tools do not go beyond providing insight
for finding a solution. The case described by <I, FA> is an illustration of an
algorithm that provides a full-solution. For instance, it can show the regular
polygon with, say, 100 sides and the approximation to the number π that it
provides. Figure 5 represents another example in this category.
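For a concrete feel of the <D, FA> case, here is a minimal sketch (ours; it assumes one classical realization, Archimedes-style side doubling) of discovering π by approximating a circle with polygons:

```python
import math

# Start with a regular hexagon inscribed in a unit circle (side length 1)
# and repeatedly double the number of sides. If s is the side length of an
# inscribed n-gon, the side of the 2n-gon is s' = sqrt(2 - sqrt(4 - s*s)),
# and the half-perimeter n*s/2 converges to π from below.
def pi_by_polygons(doublings: int = 10) -> float:
    n, s = 6, 1.0                         # hexagon in the unit circle
    for _ in range(doublings):
        s = math.sqrt(2.0 - math.sqrt(4.0 - s * s))
        n *= 2
    return n * s / 2.0                    # half-perimeter approximates π

print(pi_by_polygons())                   # ~3.14159..., tighter per doubling
```

Each doubling of the number of sides tightens the approximation, exactly as the text describes: the algorithm, not artistic insight, drives the discovery.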
An example of the combination of reasoning (R) with a full-solution al-
gorithm <R, FA> is the visual verification of the Pythagorean Theorem (see
Figure 4).
The center point on both scales in Figure 7 is a combination of reasoning
with a heuristic algorithm (HA) or <R, HA>. The visual, interactive schedul-
ing of jobs using heuristic, greedy strategies such as “largest jobs first” is a
representative of this category.
The creativity scale can be further elaborated by distinguishing between
the Discovery of an Individual solution (DI) and the Discovery of a Process
(DP) that can lead to several individual solutions.
Reasoning also has two subcategories. The first one is finding a solution
by Applying a verified solution Process (AP), e.g., finding the hypotenuse
for the triangle with sides 3 and 4 by applying the general statement of the
Pythagorean Theorem. The second more challenging task is Verification of
the solution Process (VP), e.g., proving the Pythagorean Theorem.
In Chapters 5 and 7, we provide more examples of visualization as illustration, visual reasoning, and visual discovery from the history of mathematics. The examples include the process of discovering the number π and visual counting. Such analysis establishes a background for developing visual decision-making processes for modern tasks.
3. A MODELING APPROACH
(Figure: a sequence of images for the Aden (USS Cole) incident: a world map with a marked region of the incident; a region map with the route and the city of the incident; the ship's moves in the port with the route of the boat-bomb; the ship with the identified armament area and damage area; the ship's damage area; and an injured sailor in a hospital bed.)
These two examples help us to make a point about the concept of decision making visualization (DMV); namely, that it is visualization useful for decision making based on:
• a discovered relation/pattern (DRP) and
• a decision making model (DMM).
The first example (Aden) is creatively impressive, but does not include the
components DRP and DMM. The second example (London) includes both
of them:
1. Discovered relation – people who used water from well d (death) on
Broad St. died more often from cholera than people who used any other
well:
∀ i (i ≠ d) D(d) > D(i),
where D(i) is the number of dead after drinking water from well i.
“There were only ten deaths in houses situated decidedly nearer to another street pump” than to the Broad St. pump [Tufte, 1997; Snow, 1855].
2. Decision making model – shut down a well d if the death toll of people
who used this well is higher than that for people who used other wells:
∀ i (i ≠ d) D(d) > D(i) ⇒ Shutdown(d).
The DMM is very simple and people often do not even notice that the model
is there. However, this simple model is a result of very non-trivial discovery
by Dr. Snow of the relation between use of well water and death toll [Tufte,
1997, Snow, 1855].
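As a minimal sketch (ours; the death counts are hypothetical, not Snow's data), the DMM reduces to a few lines:

```python
from typing import Optional

# Hypothetical death counts D(i) per well; the numbers are made up.
deaths_by_well = {"Broad St.": 500, "Well A": 10, "Well B": 7}

def well_to_shut_down(deaths: dict) -> Optional[str]:
    d = max(deaths, key=deaths.get)
    # The rule fires only if d strictly dominates every other well:
    # ∀ i (i ≠ d) D(d) > D(i)  =>  Shutdown(d).
    if all(deaths[d] > v for i, v in deaths.items() if i != d):
        return d
    return None

print(well_to_shut_down(deaths_by_well))  # -> Broad St.
```

The non-trivial part, as the text stresses, is not this rule but the discovery of the relation that feeds it.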
Next we note that two categories of DMM models are necessary:
(a) A model for the decision-maker (e.g., city managers or the board of
guardians of the parish) and
(b) A model for the analyst (e.g., Dr. Snow) who discovers relations for
a decision-maker.
The model (a) for the decision maker can and should be simple, similar to
the decision making model in (2.) above. The model (b) for the analyst must
be complex enough to cover a wide range of possible decision alternatives.
In the London example, the decision making model (2.) produced a single
decision alternative – to shut down the pump/well d. A model of type (b) for
the analyst might include many other alternatives to be explored:
1. Restrict the access of new people to the city,
2. Restrict the contact between people in the city limit,
3. Restrict the consumption of certain foods,
4. Use certain medications,
5. Restrict the contact of the population with certain animals,
6. Restrict the consumption of some drinks.
Actually, the research of Dr. Snow resulted in the last alternative (specifically, to restrict/prohibit consumption of water from the well d). We have no historic evidence that Dr. Snow really considered all the alternatives (1)-(6). It is most likely that he came to the well-water alternative without a formalized decision making model such as model (b). Our goal is to show that if his decision-making and visualization process had been driven by a DMM with alternatives (1)-(6), then the water alternative (6) would have surfaced naturally and would have been investigated. This alternative can guide an investigation (including exploratory visualizations) instead of relying on the insight of such extraordinary people as Dr. Snow. We illustrate the concept of the model-based approach in Figure 9.
This conceptual model has two components. The first component in-
volves an analyst, who builds a DMM model and discovers some relation-
ships. The second component involves a decision maker who works with
relationships discovered by the analyst. This work is based on discovered
and visualized relations and a DMM for actual decision.
Figure 9. The model-based approach: an analyst component and a decision-maker component.
pumps, methods of water treatment, and the type of population. These ob-
jects may be suspected of being related to the high death toll. We call this
structured information for the DM. Thus, the DM model will grow like a
tree (see Figure 10). The rectangles show relations to be investigated.
After providing such structural information, an analyst can investigate re-
lations between death toll and each of the components: pumps, distribution
routes, methods of water treatment and type of population. Currently this
process is done by spatial data mining techniques (see Chapter 12 on the SPIN! system in this book). Visualization is a natural element in this analytical
process.
Figure 10. The decision making model grows like a tree from the goal: decrease the death toll.
Figure 11. Visual correlation in the process of discovery of relations
knowledge with likely states based on the context of the problem [Marsh,
2000]. Sometimes this stage is called understanding.
Now the analyst can report the discovery to the decision-makers (city managers). If they want to be sure that the analyst did not overlook some important alternatives, then graphs (a) and (b) from Figure 11 provide them with this information. If the decision-makers just want to consider a course of action based on the discovered relation, then only the simple path marked in Figure 11(b) is needed. The details of this path are presented in Figure 12. It shows the discovered pattern, its visual correlation with the decision (shut down the pump), and the visual correlation of the decision with the ultimate goal: decrease the death toll.
Figure 12. Final decision making model with discovered relation: visual correlation approach
(Figure 14: a plot of the death toll, up to 500, by pump, with the Broad St. pump marked.)
Next, note that the visualization in Figure 14 is not new to decision mak-
ers; they are familiar with this type of plot. This is a standard plot of the rela-
tion used for checking correlation. Thus, decision makers can concentrate on
making decisions instead of studying a new method for presenting data.
We did not find evidence that, with respect to final decision making, the
simple visual correlation (Figure 14) has any disadvantages in comparison
with maps such as Figure 13 when a relation is already discovered. It seems
that a simple visual correlation can serve very well. Moreover, it is possible
that new developments of visual correlation methods for the final decision-
making are not needed.
Consistent use of known visual correlation methods and their combina-
tions has an obvious advantage – decision makers know and trust them. In
the next section, we review more specifically visualizations suitable for this
stage. We note, however, that the situation can be much different when vis-
ual correlation is needed as a tool for discovery of unknown relations. Dis-
covery of relations and their visual aspects is a major subject in such areas as
computational intelligence, data mining, machine learning, and knowledge
discovery (see for example [Kovalerchuk & Vityaev, 2000]).
5. CONCEPTUAL DEFINITIONS
(Figure 15 shows a progression from the real world through information, knowledge, perception, understanding, and appreciation to a decision, framed by objectives, priorities, doctrine, and constraints.)
Figure 15. A new view of decision making (based on Marsh, 2000)
Each model element can be implemented with some level of visual sup-
port. Ideally, visualization of an element is derived from its role in the
model. For instance, visualization of data, information, and knowledge is
selected using the goal of the decision-making process as discussed above
for the epidemic example.
The model suggested by [Marsh, 2000] operates with concepts that in-
clude data, information, knowledge, understanding, perception, context, ex-
perience, decision, and goal. Other concepts used in the model are apprecia-
tion, priorities, a doctrine, and constraints. Constraints may include tactics,
techniques, and procedures (TTPs).
There are several, somewhat contradictory, interpretations of these concepts in the literature.
things and knowledge as a property of agents predisposing them to act in
particular circumstances. Boisot [Boisot, 1998] defines information as a
subset of the data residing in things that activates an agent – it is filtered
from the data by the agent’s perceptual or conceptual apparatus. Marsh
[Marsh, 2000] defines knowledge, understanding, and appreciation as fol-
lows:
• Knowledge is the matching of available information to known entities
and behaviors in the real world.
• Understanding is the matching of the knowledge with one or more
likely states based on the context of the problem.
• Appreciation is interpreting the currently understood situation in
terms of the desired end states and choosing the response that best
meets the objective.
Thus, appreciation is defined as a higher form of reasoning that incorporates both knowledge and understanding. Figures 15 and 16, adapted from [Marsh, 2000], show the central role of correlation in this view of the decision-making process. It is consistent with our view of the decision-making process as described above (see Figures 9-14).
According to Figure 16 correlation is:
a) a procedure matching information based on observations with multiple
alternatives (hypotheses) regarding the current situation, or
b) a procedure matching perception of the situation (based on knowledge
and assumptions) with multiple alternatives (hypotheses) regarding the
current situation.
This concept of correlation is somewhat more general than the traditional
correlation concept. The traditional concept often assumes that we correlate
entities of the same modality, e.g., we may correlate stock market data for
different days. In the correlation concept given above, entities are correlated
with entities of potentially different nature: information is correlated with
hypotheses about information; perception of situation is correlated with hy-
potheses about the situation. In essence, entities are correlated with their
possible explanations, which may have a very different structure and nature.
There is also a significant difference between (a) and (b) in the level of human involvement. For example, in the case of the Challenger catastrophe, as noted in [Tufte, 1997], a correlation between low temperature and a high failure rate was established. That is, correlation (a) was in place, but not correlation (b): the perception of the situation was not highly correlated with the high failure rate. Thus, knowledge existed, but understanding of the situation did not.
(Figure 16 shows interpretation of information from the real world into knowledge and assumptions, perception turning these into understanding of the situation, and a directed search for information.)
Figure 16. Building knowledge from information and understanding from knowledge
(based on [Marsh, 2000])
This definition fits well with the classical mathematical concept of corre-
lation, but at first glance, it does not include correlation between observa-
tions without any specific hypothesis (alternatives) explicitly formulated. A
close look at the concept of correlation in fact assumes that there are some
alternative hypotheses behind the scenes. If somebody told us that A and B are correlated, we would ask in what sense; i.e., we want to know what it means that A and B are correlated. The answer could be that the pair (A, B) is correlated with the hypothesis of a linear relation between A and B, bᵢ = kaᵢ, where B = {bᵢ} and A = {aᵢ}. There could be many other possible alternative hypotheses, such as b = ka². Thus, the commonly used expression that A correlates with B actually is a simplification of the more exact statement that the pair (A, B) is correlated with hypothesis H, where H states the type of relationship between the data A and B.
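A minimal sketch (ours, with made-up data) of correlating the pair (A, B) with two candidate hypotheses:

```python
import numpy as np

# Hypothetical data roughly following b = 2a.
a = np.array([1.0, 2.0, 3.0, 4.0, 5.0])
b = np.array([2.1, 3.9, 6.2, 8.0, 9.9])

def corr_with_hypothesis(a, b, transform):
    """Pearson correlation of b against a transformed version of a."""
    return np.corrcoef(transform(a), b)[0, 1]

print(corr_with_hypothesis(a, b, lambda x: x))     # hypothesis b = k*a
print(corr_with_hypothesis(a, b, lambda x: x**2))  # hypothesis b = k*a**2
```

The hypothesis with the higher correlation better explains the pair, which is exactly the "correlation with a hypothesis H" reading given above.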
(Figure: the epidemic decision task in Marsh's framework. Objective: stop the epidemic; constraints: only actions A1, A2, …, An can be implemented. Available information is correlated with possible states; context, data, experience, and knowledge support identifying a desired state and choosing a response as the decision.)
7. TASK-DRIVEN APPROACH TO
VISUALIZATION
8. CONCLUSION
9. ACKNOWLEDGEMENTS
5. Find or design a visualization that fits a searching task. Justify your solution.
6. Find or design a visualization that fits a browsing task. Justify your solution.
Advanced
9. Elaborate conceptually the general steps of a task-driven visualization
approach: (1) analysis of a user task, (2) generation of equivalent percep-
tual tasks that can be performed more efficiently, and (3) design of ac-
companying graphics to support the efficiency of the perceptual task. Tip:
Your elaboration should decompose these three steps to smaller substeps.
Be specific and provide examples for the substeps.
11. REFERENCES
Beshers, C., Feiner, S. AutoVisual: Rule-based design of interactive multivariate visualiza-
tions. IEEE Computer Graphics and Applications, 13(4), 1993, 41-49.
http://www.cs.columbia.edu/graphics/projects/AutoVisual/AutoVisual.html
Boisot, M. Knowledge assets, securing competitive advantage in the information economy,
Oxford: Oxford University Press, 1998.
1. Decision process and its visual aspects 29
Calderbank, R., Sloane, N. Claude Shannon 1916-2001, Nature, 410 (6830) April 12, 2001,
768. Accessed Jan. 2004 http://www.research.att.com/~njas/doc/ces5.html
Casner, S. A Task-Analytic Approach to the Automated Design of Graphic Presentations,
ACM Transactions on Graphics, 10 (2) April 1991, 111-151, (Online Version:
http://portal.acm.org/citation.cfm?doid=108360.108361)
Card, S. K., Mackinlay, J. The Structure of the Information Visualization Design Space.
IEEE Symposium on Information Visualization, Phoenix, AZ, 1997, 92-99.
http://www2.parc.com/istl/projects/uir/pubs/pdf/UIR-R-1996-02-Card-InfoVis97-
DesignSpace.pdf.
Feijs, L. and de Jong, R. 3D visualization of software architectures, Communications of the ACM, 41 (12) 1998, 73-78.
Feiner, S. APEX: An experiment in the automatic creation of pictorial explanation, IEEE Comput. Graph. Appl., Nov. 1985, 29-37.
Friedell, M. Context-sensitive, graphic presentation of information. Computer Graphics, 16 (3) 1982, 181-188.
Gnanamgari, S. Information presentation through default displays. Ph.D. dissertation, Univ.
of Pennsylvania, May 1981.
Halpin, T., UML Data Models From An ORM Perspective,
http://www.orm.net/uml_orm.html, 2000.
Farid, H., A Picture Tells a Thousand Lies, New Scientist, 179 (2411), Sept. 6, 2003, 38.
Jarvenpaa, S. L., Dickson, G. W. Graphics and managerial decision making: research-based guidelines. Commun. ACM 31 (6) 1988, 764-774.
Kovalerchuk, B., Vityaev, E., Data Mining in Finance: Advances in Hybrid and Relational
Methods, Boston: Kluwer Acad. Publ., 2000.
Kerpedjiev, S., Roth, S., Mapping Communicative Goals into Conceptual Tasks to Generate
Graphics in Discourse, Proceedings of Intelligent User Interfaces (IUI '00), New Orleans,
LA, January, 2000, 60-67 (Online Version: http://www-
2.cs.cmu.edu/Groups/sage/Papers/IUI00/gmicro9.pdf)
Kerpedjiev, S., Carenini, G., Roth, S. F., and Moore, J. D. Integrating Planning and Task-
based Design for Multimedia Presentation, International Conference on Intelligent User
Interfaces (IUI '97), Orlando, FL, January 1997, ACM, 145-152 (On line Version:
http://www-2.cs.cmu.edu/Groups/sage/Papers/IUI-97/IUI-97.html)
Larkin, J., Simon, H. Why a diagram is (sometimes) worth 10,000 words. Cognitive Sci. 11,
1987, 65-99.
Loomis, E.S. The Pythagorean Proposition: Its Demonstrations Analyzed and Classified and
Bibliography of Sources for Data of the Four Kinds of ``Proofs'', 2nd ed. Reston, VA: Na-
tional Council of Teachers of Mathematics, 1968. 284 p
Mackinlay, J. Automating the design of graphical presentations of relational information.
ACM Trans. Graph. 5 (2), Apr. 1986, 110-141 (Online Version:
http://www2.parc.com/istl/projects/uir/pubs/pdf/UIR-R-1986-02-Mackinlay-TOG-
Automating.pdf)
Mackinlay, J. D. Applying a Theory of Graphical Presentation to the Graphic Design of User
Interface, Proceedings of the ACM SIGGRAPH Symposium on User Interface Software
(UIST '88), 1988, 179-189. http://www2.parc.com/istl/projects/uir/pubs/pdf/UIR-R-1988-
05-Mackinlay-UIST88-Applying.pdf
Mackinlay, J. D. and Genesereth, M. R. Expressiveness and Language Choice, Data and
Knowledge Engineering 1(1, June), 1985, 17-29.
http://www2.parc.com/istl/projects/uir/pubs/pdf/UIR-R-1985-01-Mackinlay-DKE-
Expressiveness.pdf.
Marsh, H.S. Decision Making and Information Technology, Office of Naval
Stephen G. Eick
University of Illinois and SSS Research, Inc., USA
Figure 2. Information visualizations for presentation and branding. Left: NASDAQ display; Right: Visual Insights' eBizLive product for showing website activity. See also color plates.
in particular, for many tasks a picture is more useful than a large table of
numbers.
Figure 2. Executive Dashboard courtesy of Bill Wright. See also color plates.
Visual reports, as with all reports, are a tool for assumptive-based analy-
sis. Reports answer “point questions”: How much of a particular item is in
stock? Where is it? How long will it take to get more? Reports are ideal for
operational tasks, but do not provide full analytics, or enable an analyst to
automatically discover new information that a user has not thought to ask
about.
This is a well-known characteristic of all report-based analytical solu-
tions. The reports pre-assume relationships that are reported upon. The diffi-
2. Information visualization value stack model 35
culty with this approach is that most environments are too complex for a pre-
defined report or query to be exactly right. The important issues will un-
doubtedly be slightly, but significantly different. This is particularly true for
complex, turbulent, environments where the future is uncertain. There are
two common solutions to this problem. The first is to create literally hun-
dreds of reports that are distributed out to an organization, either using a
push distribution mechanism such as email or a pull mechanism involving a
web-based interface. The second involves adding a rich customization ca-
pability to the reporting interface that increases UI complexity. Unfortu-
nately, neither works particularly well. Although a report containing novel
information might exist, finding it is like finding a needle in a haystack.
Adding UI features makes the reporting system difficult to use for non-
specialists.
Figure 5. Bar chart scalability is increased by using levels of rendering detail and a red over-
plotting indicator at the top of the view. Scalability in this case facilitates locating and then
focusing attention on particular bars.
See also color plates.
¹ The original version of this idea is due to Doug Cogswell.
² Microsoft, the largest producer of shrink-wrap software, sells essentially all of its software through distributors.
³ For comparison, a software application that sold 2,000 to 5,000 units would generally be considered successful.
6. CONCLUSION
7. ACKNOWLEDGEMENTS
Advanced
2. Extend the information visualization value stack model to ASP-based applications.
9. REFERENCES
Card, S.K., Mackinlay, J.D. and Shneiderman, B, Readings in Information Visualization:
Using Vision to Think. San Francisco, California: Morgan Kaufman, 1999.
Codd, E.F. Extending the Database Relational Model to Capture More Meaning, ACM Transactions on Database Systems, 4 (4), 1979.
Crow, V., Lantrip, D., Pennock, K., Pottier, M., Schur, A., Thomas, J., et al. Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text Documents. In Information Visualization '95 Proceedings; 1995 October 30. Atlanta, Georgia: IEEE Computer Society Press, 1995.
Eick, S. G. ,Scalable network visualization, Visualization Handbook 2003.
Eick, S.G. Visual discovery and analysis, IEEE Transactions on Visualization and Computer Graphics, January-March 2000; 6:44-59.
Eick, S. G., Visualizing Multi-Dimensional Data, IEEE Computer Graphics and Applications
2000; 34: 44-59.
Eick, S. G., and Fyock, D.E., Visualizing Corporate Data. AT&T Technical Journal 1996;
75:74-86.
Eick, S. G., Karr, A. F., Visual scalability, Journal of Computational Graphics and Statistics
March 2002; 11:22–43.
Eick, S. G., Wills, G.J., High Interaction Graphics. European Journal of Operational Research
1995; 81:445–459.
Gamma, E., Helm, R., Johnson, R., and Vlissides, J., Design Patterns. Addison-Wesley, 1995.
Hanrahan P., Stolte, C., and Tang, D. Polaris: A System for Query, Analysis, and Visualiza-
tion for Multidimensional Relational Databases. IEEE Transactions on Visualization and
Computer Graphics 2002; 8:52-63.
Havre, S., Hetzler, E., Nowell, L., and Whitney, P. ThemeRiver: Visualizing Thematic Changes in Large Document Collections. IEEE Transactions on Visualization and Computer Graphics 2002; 8:9-20.
Hill, W.C., Hollan, J.D., McCandless, T., and Wroblewski, D., Edit Wear and Read Wear: Their Theory and Generalizations, CHI '91 Conference Proceedings, 1991.
Keim, D.A., Information Visualization and Visual Data Mining. IEEE Transactions on Visu-
alization and Computer Graphics 2002; 8:1-8.
PART 2
VISUAL AND HETEROGENEOUS REASONING
Chapter 3
Boris Kovalerchuk
Central Washington University, USA
Abstract: Reasoning plays a critical role in decision making and problem solving. This
chapter provides a comparative analysis of visual and verbal (sentential) rea-
soning approaches and their combination called heterogeneous reasoning. It is
augmented with a description of application domains of visual reasoning.
Specifics of iconic, diagrammatic, heterogeneous, graph-based, and geometric
reasoning approaches are described. Next, explanatory (abductive) and deduc-
tive reasoning are identified and their relations with visual reasoning are ex-
plored. The rest of the chapter presents a summary of human and model-based
reasoning with images and text. Issues considered include: cognitive opera-
tions, difference between human visual and spatial reasoning, and image rep-
resentation. One of the main our statements in this chapter is that the funda-
mental iconic reasoning approach proclaimed by Charles Peirce is the most
comprehensive heterogeneous reasoning approach.
Key words: Visual reasoning, spatial reasoning, heterogeneous reasoning, iconic reason-
ing, explanatory reasoning, geometric reasoning.
Figure 1. Pythagorean Theorem known to the Chinese and Indians [Kulpa, 1994]
Once again, this list shows that the process is very informal (and probably sometimes illogical). It also illustrates the huge role of visual reasoning in all stages of the design process.
Another argument for visual reasoning is that more than one medium, including text and pictures, provides information for reasoning, while formal logic is limited to sentences [Shin & Lemon, 2003]. This argument is used
for supporting both pure visual reasoning and for heterogeneous reasoning
that combines text, pictures, and potentially any other medium [Barwise &
Etchemendy, 1995].
The next argument for visual and heterogeneous reasoning is related to
the speed and complexity of reasoning. Reasoning with diagrams and with-
out re-expressing the diagrams in the form of a sentence can be simpler
and faster. It avoids an unnecessary and non-trivial information conversion
process, by working directly with heterogeneous rules of inference, e.g.,
First Order Logic and Euler/Venn reasoning [Swoboda & Allwein, 2002;
Swoboda & Barwise, 2002].
2. ICONIC REASONING
One of the founders of modern formal logic, Charles Peirce (1839-1914), argued for the use of visual inference structures and processes a long time ago. Recently it has become increasingly clear that one of the fundamental difficulties of automatic computer reasoning is that it is extremely hard to incorporate the human observational and iconic part of the reasoning process into computer programs. Indeed, the extreme opinion is that it is simply impossible [Tiercelin, 1995].
Peirce stated that in order for symbols to convey any information, indices and icons must accompany them [Peirce, 1976; Hartshorne, Weiss & Burks, 1958; Robin, 1967; Tiercelin, 1995]. It is probably not accidental that, in addition to being a logician and a mathematician, Peirce was also a land surveyor at the U.S. Coast and Geodetic Survey. In this capacity, he would have had first-hand experience with real-world visual and spatial geographic reasoning. We should note here that several chapters in this book deal with visual and spatial reasoning and problem solving related to combining geographic maps with aerial and satellite photos. Charles Peirce distinguished the iconic, indexical, and symbolic functions of signs. Table 1, based on [Tiercelin, 1995], summarizes Peirce's view of icons and diagrams.
3. DIAGRAMMATIC REASONING
[Figures 2 and 3: Euler circle diagrams with nested circles A, B, and C, combined with a reasoning diagram; graphics not reproduced]
The power of this representation lies in the following facts [Shin & Lemon, 2003]:
• Object membership is easily conceptualized as the object lying inside a set.
• Relationships between sets are represented by the same relationships among the circles.
• Every object x in the domain is assigned a unique location in the region R.
• The conventions above are sufficient to establish the meanings of these circle diagrams.
Now let us consider another original Euler example, presented in [Shin & Lemon, 2003], that involves an existential statement ("Some C is A").
Example 2. No A is B. Some C is A. Therefore, some C is not B.
Euler's solution is shown in Figure 4.
[Figure 4: three Euler diagrams of circles A, C, and B, one per case; graphics not reproduced]
The idea of this solution comes from the fact that C may have several alternative relationships with A and B. Figure 4 shows three such cases:
1. C overlaps with A without overlapping with B,
2. C contains B and overlaps with A,
3. C overlaps with both A and B.
There are many other possible relationships of C with A and B. To identify them we can construct a 6-dimensional Boolean vector (x1, x2, …, x6), where
x1 = 1 if C overlaps with A, but neither contains A nor is contained in A,
x2 = 1 if C overlaps with B, but neither contains B nor is contained in B,
x3 = 1 if C contains A,
x4 = 1 if C contains B,
x5 = 1 if C is contained in A,
x6 = 1 if C is contained in B.
Each xi is equal to 0 if the respective condition is not true. Potentially we have 2⁶ = 64 combinations, not just the three cases Euler listed. Some of them are not possible or do not satisfy the premises of Example 2, but obviously more than three cases actually satisfy the premises.
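The validity of the syllogism in Example 2 can also be checked mechanically. The following minimal sketch (our own illustration in Python, not part of the original text) enumerates every model of the premises over a small finite domain and confirms that the conclusion holds in all of them:

from itertools import product

# Check "No A is B; some C is A; therefore, some C is not B" over all
# models on a small finite domain.
def check_syllogism(domain_size=4):
    elements = range(domain_size)
    # Each element independently belongs or not to A, B, C: 8 codes.
    for codes in product(range(8), repeat=domain_size):
        A = {e for e in elements if codes[e] & 1}
        B = {e for e in elements if codes[e] & 2}
        C = {e for e in elements if codes[e] & 4}
        premises = A.isdisjoint(B) and bool(C & A)
        conclusion = bool(C - B)
        if premises and not conclusion:
            return False  # a counterexample would refute the syllogism
    return True

print(check_syllogism())  # True: no counterexample exists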
Before analyzing some of these other cases, it is instructive to consider Euler's probable view of the "some" quantifier. It appears that Euler did not interpret "some" as the modern existential quantifier ∃, under which whatever is true for all is also true for some: ∀x P(x) → ∃x P(x). It seems that Euler's "some" is "only for some"; that is, if P(x) is true only for some x, then there exists another y for which P(y) is false: Some x P(x) → ∃y ¬P(y).
Now let us turn back to those cases not drawn by Euler. For instance, the case <000011>, where both A and B contain C, is not possible, as it would contradict the premise "No A is B". Another case not drawn in Euler's proof is obtained when C contains both A and B, <001100>. Under our usual understanding of "some" this case should be drawn, since if A and B do not overlap it satisfies both premises "No A is B" and "Some C is A" because all A is C. Yet under Euler's probable interpretation of "some", this case would not be drawn, as there is no x in A that is not also in C. Now we want to check whether Euler's solution is complete assuming the interpretation of "some" given above. Shin and Lemon [Shin & Lemon, 2003] do not accept Euler's solution:
However it is far from being visually clear how the first two cases lead a user to reading off this proposition, since a user might read off "No C is B" from case 1 and "All B is C" from case 2. The third diagram could be read off as "Some B is A," "Some A is not B," and "Some B is not A" as well as "Some A is B."
We disagree with this argument. Shin and Lemon accepted the first Euler example, shown in Figure 2, as delivering clear statements about relations between sets. Yet we can add more nested circles to Figure 2, and it will be no clearer which nested relation the user may want to read off. Recall that we added a guide for the user: a reasoning diagram displayed with the Euler diagrams (Figure 3). For Example 2, perhaps the guide should at least be a verbal explanation that Euler wants to show only those of the 64 potential situations ("possible worlds" in more modern terminology) where the statement is true.
From our viewpoint, Euler provided unique and constructive clarity in Example 2. He actually generated all possible cases given his use of the "some" quantifier. This constructive algorithmic approach is very sound from a modern computer science viewpoint.
The need to develop another representation comes from the fact that as n, the number of predicates A, B, C, …, grows, the number of diagrams for a single statement may grow exponentially with n, and the compactness of the visual representation is lost. This actually happened with Venn diagrams.
Venn diagrams were invented in the 19th century [Venn, 1880, 1881] and are widely used. Below we discuss examples provided in [Shin & Lemon, 2003].
Figure 6. Reasoning with Venn diagrams: "All A are B" & "No A is B" yields "Nothing is A" [graphics not reproduced]
At first glance, there is no intuitive meaning in representing "All A are B" and "No A is B" with the diagrams shown in Figure 6; at least these diagrams do not match the meaning of Euler diagrams. Venn added shading to the legend to represent the empty part of a diagram. In the left diagram, the shaded part of A indicates that this part is empty. Using the Euler approach, we can read that this part of A does not belong to B. Combining this with the fact that it is empty, we conclude that "All A are B". Similarly, in the middle diagram, we note that the shaded area consists of the common elements of A and B. Because the area is shaded, it is empty; that is, there is no A that is also B: "No A is B". Overlaying one diagram over the other, we obtain the resulting diagram shown in Figure 6. It is important to notice that, knowing the Venn legend, we can read the right-hand diagram directly: noticing that all parts of A are shaded, we conclude that every part of A is empty.
Figure 6 has two important properties: it shows the inference of the result, and it gives an intuitive graphical way to prove the statement.
Venn diagrams use a "primary diagram" legend that shows a general possible disposition of two overlapping sets. This is a departure from the original Euler idea, which permitted combining several Euler diagrams into one. In Euler form, this example is presented in Figure 7.
[Figures 7 and 8: the example in Euler form, where no single Euler diagram suffices, and Peirce's 'o'/'x' notation over circles A and B; graphics not reproduced]
Example 5. Figure 9 represents the proposition "Either all A are B and some A is B, or no A is B and some B is not A."
Most people, including Peirce himself, agree that this diagram is too complex in comparison with the very intuitive Euler diagrams. An alternative visualization of Example 5 is shown in Figure 10 [Shin, 2003; Shin & Lemon, 2003]. This visualization uses the following legend:
• Venn's shadings are used to designate emptiness,
• Peirce's 'x' is used for existential properties, and
• Peirce's connecting line between x's is used for disjunctive information.
[Figure 10: Shin diagram for Example 5 over circles A and B, using shading, x's, o's, and connecting lines; graphics not reproduced]
Shin and Lemon [Shin & Lemon, 2003] state that this Shin diagram demonstrates increased expressive power without suffering the loss of visual clarity that happened in Peirce's diagram. While this is true, it seems that the Shin diagram is also limited in scalability. Let us assume that we have more than two predicates A and B, say, four or five predicates or sets with similar relations between them. The number of Shin diagrams can grow exponentially. Consider what will happen with five sets A, B, C, D, and E: we may need ten pairs of diagrams of the type shown in Figure 10. The use of additional graphical elements such as color and texture can make the scalability problem less severe.
Despite this limitation, Shin diagrams have several important properties that make them equivalent to rigorous systems expressed in formal logic. This formal system is sound and complete in the same sense in which some symbolic logics are complete [Shin & Lemon, 2003].
Table 3 provides a summary of diagrammatic systems for representing relations between sets.
4. HETEROGENEOUS REASONING
During the reasoning process, the system may ask the user to determine the size of some block, say block e (see Figure 11). Barwise and Etchemendy …
Figure 12. Task diagram (with permission from Barwise & Etchemendy, http://www-csli.stanford.edu/hp/Hproof3a.html). See also color plates.
If a diagram is not given, then several diagrams are generated and the same task is solved with each of them. A diagram is tested for whether it is compatible with the sentences and whether block a can be identified using the given sentences and the assumed diagram.
Barwise and Etchemendy [Barwise & Etchemendy, 1995] claim that it is not necessary to create an Interlingua to be able to reason with heterogeneous information. We feel that some clarification is needed here. The diagram in Hyperproof is defined formally using the same predicates that are used in the sentences. There is a one-to-one mapping between the visual legend of the diagram, the predicates and terms in the sentences, and a formal description of the diagram.
Thus, in some sense, the Interlingua is not needed here because of the
way both representations were designed. If the diagram were described in,
say, the traditional terms of computer graphics, e.g., as OpenGL code with
concepts such as lines, rectangles, and textures or as a single raster bitmap
image, an Interlingua would surely be needed. This problem is well known
in scene analysis, robotics and geospatial imagery analysis where informa-
tion is not only heterogeneous but also obtained from disparate sources
that have not been coordinated in advance.
5. GEOMETRIC REASONING
Geometric reasoning has a long and inspiring history traced back to the Greeks. One might hope from the term that modern geometric reasoning continues the intuitively clear visual geometric line of reasoning inherited from the Greeks. In fact, as our short review will show, this is not yet the case.
In the 1950s, one of the first and seminal efforts of Artificial Intelligence (AI) research was to simulate human geometric reasoning in a computer program [Gelernter, 1959]. "This research activity soon stagnated because the classical AI approaches of rule based inference and heuristic search failed to produce impressive geometric reasoning ability" [Kapur & Mundy, 1989].
The next attempts at computer simulation of geometric reasoning were the algebraic approaches developed in the 1980s and 1990s [Kapur & Mundy, 1989; Chou & Gao, 2001]. In fact, both approaches were a modern return to the Cartesian tradition in mathematics, which was very successful for centuries in transforming visual geometric tasks into sets of algebraic, vector and matrix equations or non-visual If-Then rules. As we shall see below, both rule-based and algebraic approaches departed from intuitively clear visual geometric proofs.
Below we describe the frameworks of both approaches using work adapted from [Chou & Gao, 2001]. In the algebraic approach, the hypotheses of a geometric statement are converted to a set (conjunction) of equations
h1(y1, y2, …, ym) = 0
h2(y1, y2, …, ym) = 0
…
hr(y1, y2, …, ym) = 0
and the conclusion to an equation c(y1, y2, …, ym) = 0, which must be shown to follow from the hypotheses.
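As a minimal sketch of this conversion (our own illustration; the example and the sympy-based check are not from Chou and Gao), consider proving that the diagonals of a parallelogram bisect each other. With A = (0, 0), B = (u1, 0), D = (u2, u3), and C = (x1, x2), the hypotheses and the conclusion become polynomials, and the conclusion reduces to zero modulo the hypotheses:

import sympy as sp

u1, u2, u3, x1, x2 = sp.symbols('u1 u2 u3 x1 x2')

# Hypotheses: ABCD is a parallelogram, i.e., C = B + D - A.
h1 = x1 - u1 - u2
h2 = x2 - u3

# Conclusion: the midpoints of the diagonals AC and BD coincide.
c1 = x1/2 - (u1 + u2)/2
c2 = x2/2 - u3/2

# Reduce the conclusion polynomials modulo the hypothesis ideal.
G = sp.groebner([h1, h2], x1, x2, u1, u2, u3, order='lex')
print(G.reduce(c1)[1], G.reduce(c2)[1])  # both remainders are 0: proved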
Chou, Gao, and Zhang [Chou, Gao & Zhang, 1996] also demonstrated that a revived AI rule-based approach was able to provide valuable results: short proofs of tasks where an algebraic solution of polynomial equations was long. A geometric rule or axiom used in their Geometry EXpert (GEX) system has the following form:
[Diagram of the rule over points A, E, B and C, F, D; graphics not reproduced]
7. APPLICATION DOMAINS
[Figure: price vs. quantity diagram; graphics not reproduced]
… as formal sentences with spatial predicates, but they are more naturally given visually.
Two hypotheses have been generated in this example:
… time to replace this vivid image with one in which dirtiness is represented in degrees. In other words, the visual relations, which are hard to envisage spatially, lead to a mental picture.
Another experiment conducted in [Knauff, Fangmeier, Ruff & Johnson-Laird, 2003] supported the explanation of the additional time:
All relations elicit mental models that underlie reasoning, but visual relations in addition elicit visual images.
This experiment used functional magnetic resonance imaging to identify the types of brain activity during work with relations (1)-(4), which were presented acoustically via headphones (without any visual input).
These experiments are important for understanding the limits of efficient visual reasoning and problem solving.
Thus, according to this theory, human image processing starts from the input visual data "as is", without identifying specific properties and parts. Then a process of detailed image generation starts, in which the most distinctive parts and properties are identified first, before other properties. This identification is done by matching the input to stored visual memories.
Visual reasoning and problem solving depend heavily on the visual representation used. Below we describe some visual representation models found in computer science and cognitive science, summarized in Table 8. This table is based on [Croft & Thagard, 2002; Buxton & Neumann, 1996; Glasgow & Papadias, 1992; Tabachneck-Schijff, Leonardo & Simon, 1997]. These models include array-based models, node-link based models, semantic networks with scene graphs, knowledge-based models, probabilistic models, 2D iconic models, and deformable object models.
Iconic image representations are considered biologically plausible [Buxton & Neumann, 1996]. The term icon is used in a variety of senses. One of them was discussed in Section 2 above, following Peirce's approach. Nakayama [1990] and Rao and Ballard [1995] use the term iconic to describe small visual templates that constitute visual memory.
Specifically, in [Rao & Ballard, 1995] a set of icons is just a vector of numeric parameters associated with a pixel or patch, extracted from the image using local filters over neighborhoods of different sizes, and used to identify rotation. The parameters are called icons because they have a visual equivalent, and they are small like icons because they cover small patches.
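A rough sketch of such an iconic feature vector follows (our own illustration; the particular Gaussian-derivative filters and scales are assumptions, not the exact filters of Rao and Ballard):

import numpy as np
from scipy import ndimage

def iconic_vector(image, y, x, scales=(1, 2, 4)):
    # Responses of Gaussian derivative filters at several scales,
    # sampled at one pixel, form the "icon" for that location.
    features = []
    for s in scales:
        gx = ndimage.gaussian_filter(image, s, order=(0, 1))   # d/dx
        gy = ndimage.gaussian_filter(image, s, order=(1, 0))   # d/dy
        gxx = ndimage.gaussian_filter(image, s, order=(0, 2))  # d2/dx2
        features.extend([gx[y, x], gy[y, x], gxx[y, x]])
    return np.array(features)

img = np.random.rand(64, 64)
print(iconic_vector(img, 32, 32).shape)  # (9,): a small per-pixel icon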
These icons can be called low-level icons; they show just line direction or pixel distribution. High-level icons represent real-world concepts such as a house or a bridge. Both icon types can make an image representation shorter if the image can be described only by patches with complex patterns. In addition, iconic representation is biologically plausible; that is, mental images possibly are generalized real images in a form that resembles icons [Buxton & Neumann, 1995; Rao & Ballard, 1995].
Rao and Ballard based their iconic model on biological evidence [Field,
1994; Kanerva, 1988] about the primate visual system. Specifically, this sys-
tem takes advantage of the redundancy in the visual environment by produc-
ing a sparsely distributed coding that aims to minimize the number of simul-
taneously active cells.
According to [Kanerva, 1988], the memory operates on features and cre-
ates internal objects by chunking together things that are similar in terms of
those features and relatively invariant to the changes in the environment.
Rao and Ballard [Rao & Ballard, 1995] hypothesize that visual memories
could consist of iconic representations stored in a distributed manner which
can be activated by an incoming visual signal or other iconic representation.
In the context of this hypothesis, visual perception is an activation of mem-
ory.
Thus, relatively invariant iconic feature vectors may be viewed as an ef-
fective medium for vision-related memory storage. The Bruegel visual corre-
lation system presented in Chapter 10 is icon based and derives benefits
from such properties of human perception.
9. CONCLUSION

10. EXERCISES AND PROBLEMS
Simple
1. Construct an Euler diagram and a reasoning diagram similar to that
shown in Figures 2 and 3 for the statement “No A is B. All C are B.
Therefore, no C is A.” Comment on the visual clarity of your result.
2. Adapted from [Lemon & Pratt, 1997]. Using Euler Circles, try to represent the following premises:
A ∩ B ∩ C ≠ ∅
B ∩ C ∩ D ≠ ∅
C ∩ D ∩ A ≠ ∅
Show that this is impossible using the diagram shown in Figure 15, where A ∩ B ∩ C ∩ D = ∅.
[Figure 15: three overlapping circles A, B, and C; graphics not reproduced]
Advanced
3. Adapted from [H. Simon, 1995]. Assume that somebody wrote: "I notice a balance beam, with a weight hanging from a two-foot arm. The other arm is one foot long." Then somebody asked the question: "How much force must I apply to the short arm to balance the weight?"
11. REFERENCES
Allwein, G., Barwise, J., Eds. Logical Reasoning with Diagrams. Oxford Univ. Press, Oxford,
1996.
Anderson, M., Meyer, B., Olivier, P., Diagrammatic Representation and Reasoning, Springer, 2002.
Anderson, M., Diagrammatic Reasoning Bibliography, 1999, http://zeus.cs.hartford.edu/~anderson/biblio.html
Barker-Plummer, D., Etchemendy, J., Applications of heterogeneous reasoning in design
Machine Graphics & Vision International Journal, Vol.12, Issue 1, January 2003, 39-54.
Barwise, J., Etchemendy, J., A Computational Architecture for Heterogeneous Reasoning, in Theoretical Aspects of Rationality and Knowledge, I. Gilboa, ed., Morgan Kaufmann, 1998, 1-27. http://www-csli.stanford.edu/hp/TARKpaper.pdf
Barwise, J., Etchemendy, J., Language, Proof, and Logic, Stanford: CSLI Press, 1999.
Barwise, J., Etchemendy, J., Computers, Visualization, and the Nature of Reasoning, in The Digital Phoenix: How Computers are Changing Philosophy, T. W. Bynum and J. H. Moor, eds., Blackwell, 1998.
Peirce, C., Collected Papers. Cambridge, MA: Harvard University Press, 1933
Peirce, C., New Elements of Mathematics. C. Eisele, ed., Mouton, The Hague, 1976.
Pineda, L., Diagrammatic Inference and Graphical Proof, In L. Magnani, N. Nersessian, C.
Pizzi (Eds.), Logical and Computational Aspects of Model-Based Reasoning, Kluwer,
2002.
Shepard, R., Cooper, L., Mental Images and their Transformations, Cambridge, MA, MIT
Press, 1982.
Ransdell, J., The Visual Inference Laboratory, 1998
http://members.door.net/arisbe/menu/links/vilab.htm
Robin, R., The Annotated Catalogue of the Papers of C. S. Peirce, Amherst, Mass., 1967 (unpublished manuscripts collected in Harvard University Library).
Shin, S., Lemon, O., Diagrams, Stanford Encyclopedia of Philosophy, 2003,
http://plato.stanford.edu/entries/diagrams/
Shin, S., The Iconic Logic of Peirce's Graphs. Cambridge: MIT Press, 2003.
Simon H., Foreword. In: J. Glasgow, N. Narayanan and B. Chandrasekaran (Eds.), Diagram-
matic Reasoning, Cognitive and Computational Perspectives, MIT Press, Boston, 1995.
Swoboda, N., Allwein, G., Modeling Heterogeneous Systems, In Diagrammatic Representa-
tion and Inference: Second International Conference, Diagrams 2002, Proceedings, Eds:
M. Hegarty, B. Meyer, N. Hari Narayanan (Eds.), LNCS Volume 2317 / 2002, Springer,
2002, pp. 131-145.
Swoboda, N., Barwise, J., Implementing Euler/Venn Reasoning Systems, in Diagrammatic Representation and Reasoning, Anderson, M., Meyer, B., Olivier, P. (Eds.), Springer, 2002.
Swoboda, N., Allwein, G., A Case Study of the Design and Implementation of Heterogeneous
Reasoning Systems, In L. Magnani, N. Nersessian, C. Pizzi (Eds.), Logical and Computa-
tional Aspects of Model-Based Reasoning, Kluwer, 2002.
Shimojima, A., A Logical Analysis of Graphical Consistency Proofs, In L. Magnani, N.
Nersessian, C. Pizzi (Eds.), Logical and Computational Aspects of Model-Based Reason-
ing, Kluwer, 2002.
Stenning, K., Lemon, O., 2001, Aligning Logical and Psychological Perspectives on Dia-
grammatic Reasoning, Artificial Intelligence Review, 15(1-2): 29-62. (Reprinted in Think-
ing with Diagrams, Kluwer, 2001.)
Stenning, K., 2002, Seeing Reason: image and language in learning to think, Oxford: Oxford
University Press.
Tabachneck-Schijff, H. J. M., Leonardo, A. M., & Simon, H. A. (1997). CaMeRa: A computational model of multiple representations. Cognitive Science, 21.
Thagard, P., Computational philosophy of science. MIT Press, Cambridge, Mass., 1988.
Tiercelin, C., The Relevance of Peirce's Semiotic for Contemporary Issues in Cognitive Science, Acta Philosophica Fennica, vol. 58, 1995, pp. 37-74. http://jeannicod.ccsd.cnrs.fr/documents/disk0/00/00/01/99/ijn_00000199_00/ijn_00000199_00.htm
Venn, J., On the diagrammatic and mechanical representation of propositions and reasoning, The London, Edinburgh, and Dublin Philosophical Magazine and Journal of Science, 9, 1880, pp. 1-18.
Venn, J., Symbolic Logic, London: Macmillan, 1881, 2nd ed., 1894.
Chapter 4
REPRESENTING VISUAL DECISION MAKING
A Computational Architecture for Heterogeneous Reasoning
1. INTRODUCTION
2.1 Inference
The sentences occupying the atomic steps of the example proof are written in a formal language known as first-order logic (FOL). This language has an unambiguous syntax and semantics. The sentence ∀x Dodec(x) in step 2, for example, means that every object (in the domain) is a dodecahedron.¹
The use of a justification is illustrated at step 3:1.2. The inference rule
used here is known as ∀-Elim (universal elimination). In this step the sup-
port is the sentence appearing in step 2 of the proof. The inference here is
from the universal claim that every object is a dodecahedron, to the particu-
lar claim that a is a dodecahedron, where a is some object in the domain of
the problem.
The fact that the claim at step 3:1.2 may be deduced from the claim at step 2 using the ∀-Elim inference rule can be checked syntactically. To do this we note that the support is a universally quantified sentence, and that the justified sentence is the matrix (body) of that sentence, with the bound variable x replaced by a name a. This is a correct application of the rule according to the definition of the system.
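The following sketch shows what such a syntactic check can look like (our own illustration in Python over a toy formula representation; it is not the implementation of any actual proof checker):

from dataclasses import dataclass

@dataclass(frozen=True)
class ForAll:
    var: str   # the bound variable, e.g., 'x'
    body: str  # the matrix, e.g., 'Dodec(x)'

def check_forall_elim(support, justified, name):
    # The justified sentence must be the matrix of the universally
    # quantified support with the bound variable replaced by a name.
    return (isinstance(support, ForAll)
            and support.body.replace(support.var, name) == justified)

step2 = ForAll('x', 'Dodec(x)')
print(check_forall_elim(step2, 'Dodec(a)', 'a'))  # True
print(check_forall_elim(step2, 'Cube(a)', 'a'))   # False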
The number and nature of the available inference rules will depend on the
particular logical system. In the case of logical deduction, the key require-
ment is that each inference rule must be stated in completely unambiguous
terms, and it should be clear when the conditions on the correct application
of the inference rule have been met.
¹ Under our conventional interpretation of the predicate Dodec.
On the left of Figure 2, we indicate in black all the nodes accessible from
node 9.2:4.2:2, which is shown in gray. On the right of the same figure, we
indicate in black all nodes accessible from node 9.2:5. Here, notice that
node 9.2:4, which contains a range of cases (subproofs), is accessible from
node 9.2:5, but that the nodes within those cases are not accessible from this
node.
The accessibility relation determines the inheritance of information in a
proof. That is, if A < B, then the information expressed by representations
contained at node A is also available at node B.
It is important to note that although the nodes of a proof form a partial
(not linear) order, when we restrict attention to the nodes accessible from a
specified node, these nodes are linearly ordered. This may be emphasized
by renumbering the nodes in Figure 2 in the manner shown in Figure 3.
Thus, each node in a proof has a unique, linear history, which we call the …
3. GENERALIZING TO HETEROGENEOUS DEDUCTION
With the standard model of natural deduction in hand, we turn our atten-
tion to the task of formalizing the more general domain of heterogeneous
deduction. Figure 4 illustrates a heterogeneous deduction.
… assignment that satisfies the stated constraints.² In this section, our intent is to consider the generalizations to the model of sentential deduction that are necessary to model the heterogeneous case.
The first thing to notice about Figure 4 is that the structure of the reason-
ing is very similar to that of a typical natural deduction proof. The reasoning
proceeds by splitting into cases represented by subproofs each headed with a
new assumption, and then elucidating the consequences of those assump-
tions within the subproof. At the end of a subproof, information is exported
to the main proof according to some inference rule.
The obvious difference between the two cases is that the individual steps of the proof contain diagrams rather than sentences. In fact, in the example of Figure 4 all of the steps in the proof except the initial assumptions contain diagrams, though in general some steps might contain sentences.
² The reasoning of Figure 4 is very similar to that implemented in our Hyperproof program [Barwise & Etchemendy, 1994], with the key difference being the kind of diagram used in this proof (Hyperproof uses diagrams of the placement of objects on a checkerboard).
… made to the representation at later nodes in the proof. The basic intuition is
that since a modification at a particular node introduces incremental
information, that information is inherited at later nodes. And since
inheritance of graphically expressed information is explicit, subsequent
modifications to the representation should preserve that information. Thus,
the information content of a graphical representation should increase
monotonically as we follow any accessibility path through a proof.
This section describes two types of graphical editing that may be permit-
ted in a CAHR-based application enforcing this flow of information: incre-
mental and presentational editing.
The example has an important feature, namely that each edit is permanent
for the duration of its scope. Once an individual has been assigned to an of-
fice, this assignment is not subject to further modification. We call such ed-
its permanent incremental edits. Permanent incremental editing can be ef-
fectively used to enforce the monotonicity requirement discussed above. If a
piece of information has been established at a particular node in a proof,
then that information is available at any node from which the node in ques-
tion is accessible. If that information is recorded as a modification to a
graphic, then the specific modification must be inherited by all descendant
graphics. Permanent incremental editing guarantees that this will be the
case, since the modification cannot be subsequently altered at a later (de-
scendant) node. Permanent edits can, however, be retracted or modified at a
later time, but the modification must be made at or before the node in which
the original edit was made. This imposes a practical restriction that aids us-
ers in the successful completion of the reasoning task: A conclusion or deci-
sion that was previously made cannot be retracted without navigating to the
node associated with that modification and reviewing the original justifica-
tion. This restriction becomes particularly important if the proof is being
constructed by multiple users or if the construction takes place over an ex-
tended period of time.
Permanent incremental editing is a species of a more general notion that
we call constrained incremental editing. A modification to a graphic GN is a
constrained incremental edit if, in addition to modifying GN, it narrows the
range of possible modifications that can be made to descendant graphics.
Permanent editing is the most highly constrained form of incremental edit-
ing.
Figure 6 illustrates constrained incremental editing. We assume a
graphical representation of a single attribute that can be given one of seven
possible values.
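A minimal sketch of the idea (our own illustration, not the CAHR implementation): each node inherits the set of values its ancestors still permit, and a permanent edit narrows that set to a single value.

class Node:
    # One attribute that can take one of seven values, as in Figure 6.
    def __init__(self, parent=None):
        self.parent = parent
        self.allowed = set(parent.allowed) if parent else set('ABCDEFG')
        self.value = None

    def edit(self, value):
        if value not in self.allowed:
            raise ValueError(f'{value!r} is not permitted at this node')
        self.value = value
        self.allowed = {value}  # permanent: descendants inherit only this

root = Node()
child = Node(parent=root)
child.edit('C')
grandchild = Node(parent=child)
print(grandchild.allowed)  # {'C'}: the edit cannot be altered below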
3.4 Post-editing
On the other hand, the modification of an existing value can alter the
editing constraints imposed on descendant graphics, and so render an edit
impermissible that was formerly permitted. An example is depicted on the
right of Figure 8, where the value at node 2 of the proof on the left is
changed from V to W. A consequence of this change is that the range of
permissible values at node 4 no longer includes the value X. Thus the
change at node 2 precludes the assignment of X at node 4, and the latter as-
signment is wiped out. In situations of this sort, when a modification pre-
cludes an edit already made at a descendant node, a CAHR-based applica-
tion will typically warn the user of the effects of the modification.
Figure 8. Post-editing
… we can isolate two facets of the rule. One is the production of a range of cases, which must exhaust all of the alternatives represented by the disjunction; the second is the promotion rule, which permits the extraction of information common to the subproofs into the containing proof. We can generalize these observations to the heterogeneous case.
Notice that in Figure 4 the individual subproofs of step 5 are headed not
by sentences but by diagrams, each of which represents a different assign-
ment to the large office. If we are to extract information from these cases we
need to demonstrate that these cases are exhaustive, as indeed they are given
the sentence in step 2. Since diagrams are typically bad at expressing dis-
junctive information, it is not unusual for a sentence to be the source of the
justification that a set of cases is exhaustive.
At step 6 of the example of Figure 4 we have used an inference rule
called Merge to promote information common to the cases into the main
proof. In the sentential case the analog is to establish a sentence common
to all subproofs, and to promote this sentence to the containing proof (on the
basis that since the cases exhaust all possibilities, and each case entails this
information, then it must be valid to assert the information outside of any
specific case.) In the diagrammatic case, we could insist that each case con-
tains the same diagram, and that this diagram is promoted, but to be more
general we can allow the promotion of the diagram that contains the infor-
mation common to diagrams in all subproofs.
4. GENERALIZING TO HETEROGENEOUS REASONING
… ten years. The reasoning shows that under either scenario, the optimal choice is the former mortgage.
3. Case rules: As observed above, case rules may be used to justify a node
that contains a set of subproofs. The most important class of case rules is
what we call exhaustive cases rules. An exhaustive cases rule allows the user to break the reasoning into a collection of alternatives ("cases") that are jointly exhaustive (that is, one of which must hold). The cases are specified by the
initial (assumption) nodes of the subproofs contained by the node being
justified. Exhaustive cases rules permit cases that are specified by vari-
ous types of representation (sentential or graphical), and which are sup-
ported by nodes containing various types of representation.
Not all case rules will require an exhaustive set of cases. In the finan-
cial planning example of Figure 10 the user is deciding between two dif-
ferent mortgage options. In practice there may be very many available
deals, and in principle they might be considered as a range of exhaustive
cases, but more likely only representative or extremal examples need be
considered.
The complete range of cases may not be available even in principle.
The architectural example of Figure 9 considers two locations for the ex-
tension, to the south or to the east of the main house, but (assuming con-
tinuous space) there is an infinity of possible options, with varying di-
mensions for the new addition. Again, only extremal or representative
cases are likely to be considered in practice.
4. Promotion rules: Promotion rules allow users to extract information from a set of exhaustive cases, that is, to promote information contained in one or more of the embedded subproofs to a subsequent node in the embedding (parent) proof. A promotion rule is used to justify a node in the parent proof and cites as support an accessible node containing a set of subproofs. One important class of promotion rules is what we call merge rules: the application of a merge rule is legitimate when the information extracted is present in each of the cases not containing a "close" declaration at any of its nodes (see below).
The last steps in Figure 9 illustrate the use of a Merge rule. In this case, one of the subproofs has been closed, indicating that the case under consideration is inconsistent. The merge rule then promotes the information from the sole remaining case as representing "the way things must be". If there are multiple open cases, the strongest representable information common to all open cases is promoted into the containing proof.
The final step in Figure 10 shows a different kind of promotion rule,
one in which information is promoted as a consequence of the application
of a metric to the various options, in this example the option with the
minimum interest payment in corresponding scenarios.
We note that different promotion rules might be applicable depending on the nature of the cases that they are acting upon. For example, we discussed above the assumption rules Option, which reflects a choice on the part of the user, and Scenario, which reflects a foreseeable state of the world that might arise outside of the user's control. When promoting an outcome from a range of cases considering different Options, we should use an optimizing promotion rule that preserves the best case. In contrast, it is not obvious whether to use an optimizing, averaging, or pessimizing promotion rule when promoting outcomes from a range of Scenarios. The most conservative approach would be to use a pessimizing rule in a manner analogous to the familiar minimax procedure.
In the case of logical deduction, promotion rules are pessimizing in the sense that only information known in all cases can be promoted from a range of Assumption cases (and then only if the cases are known to be exhaustive).
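A small sketch of this distinction (our own illustration; the mortgage names and outcome values are hypothetical, with higher outcomes assumed better):

# Options are under the user's control: promote the best outcome.
def promote_options(outcomes):
    return max(outcomes)

# Scenarios are outside the user's control: promote the worst outcome.
def promote_scenarios(outcomes):
    return min(outcomes)

# Minimax-style combination: for each option assume the worst scenario,
# then pick the option whose worst case is best.
options = {'mortgage_A': [3, 7], 'mortgage_B': [5, 6]}  # outcome per scenario
best = max(options, key=lambda o: promote_scenarios(options[o]))
print(best)  # mortgage_B: its worst case (5) beats mortgage_A's (3)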
Reasoning by cases is fundamentally hypothetical and disjunctive. It
is hypothetical in that the reasoning within a particular case is based on
assumptions that need not hold. It is disjunctive in that we are in general
left with multiple open alternatives when we conclude our consideration
of the cases. For this reason, promotion rules are essential if our goal is
to find (and justify) a unique or optimal solution to a reasoning task.
5. Declaration rules: Declaration rules are used to justify assertions made
about the state of the proof at the node in question. Examples include
declarations that a case (subproof) is closed or that a case is consistent
with the information assumed in the proof. Declaration rules specify the
conditions under which the declarations can be made.
Several examples of the Close rule are illustrated in Figures 4 and 9.
These proofs also highlight the variety of reasons for which cases may be
closed in the course of solving a reasoning task. In Figure 9, cases are
ruled out for aesthetic and legal reasons; in Figure 4, cases are closed be-
cause they are inconsistent with the constraints given in the problem.
Other declarations are possible. It might be necessary to demonstrate
that the information expressed in a set of representations is consistent,
when the representations are interpreted conjunctively. For example,
imagine being presented with a collection of constraints, and a candidate
solution. The goal of the reasoning is to show that the candidate solution
is in fact a solution to the problem posed by the constraints. We need to
conclude the proof with a declaration that the constraints are all satisfied
in the solution design.
A CAHR-based application may implement rules explicitly or implicitly. By this we mean that the user may or may not be required to understand and …
Harry and his wife Harriet gave a dinner party. They invited Harry’s
brother, Barry, and Barry’s wife Barbara. They also invited Harry’s sis-
ter, Samantha, and her husband Samuel. Finally, they invited Nathan and
Nathan’s wife Natalie. While they were seated around the table, one per-
son shot another.
Figure 12. [Seating diagram with chairs marked K and V; graphics not reproduced]
The chairs were arranged as shown in the diagram. The killer sat in the
chair marked K. The victim sat in the chair marked V. Every man sat
opposite his wife. The host was the only man who sat between two
women. The host did not sit next to his sister. The hostess did not sit
next to the host’s brother. The victim was the killer’s former spouse.
4. When working with a patient, a doctor typically describes only the diagnosis reached and actions taken, and not the alternatives that were considered and rejected. To what extent do you, as a patient, care about these alternatives? Which stakeholders might benefit from a complete record of the doctor's reasoning and which would not? What social forces would act to encourage or resist the widespread adoption of CAHR-based rationale capture if it were available in the medical domain?
8. REFERENCES
Barker-Plummer, D., Etchemendy, J., Applications of Heterogeneous Reasoning in Design. Machine Graphics and Vision 2003; 12(1):39-54.
Barwise, J., Etchemendy, J., Hyperproof. Stanford and Cambridge: Cambridge University Press, 1994. Program by Gerard Allwein, Mark Greaves and Mike Lenz.
Fitch, F.B., Symbolic Logic: An Introduction. New York: Ronald Press, 1952.
Gentzen, G., Investigations into Logical Deduction. In The Collected Papers of Gerhard Gentzen, M.E. Szabo, ed. Amsterdam: North-Holland, 1935/1969.
Chapter 5
ALGEBRAIC VISUAL SYMBOLISM FOR PROBLEM SOLVING
Abstract: This chapter provides a discussion of mathematical visual symbolism for problem solving based on the algebraic approach. It is formulated as lessons that can be learned from history. The visual formalism is contrasted with text through the history of algebra, beginning with Diophantus' contribution to algebraic symbolism nearly 2000 years ago. Along the same lines, it is shown that the history of art provides valuable lessons. The evident historical success provides a positive indication that similar success can be repeated for modern decision-making and analysis tasks. Thus, this chapter presents the lessons from history tuned to new formalizations in the form of iconic equations and iconic linear programming.
Below we present an extract from the English translation of a real historical mathematical text, the first book on algebra (~ A.D. 830), "Al-jabr wa'l-muqabala" by M. Al-Khwarizmi [English translation: Al-Khwarizmi, 1974; Parshall, 1988]. The term "algebra" comes from the title of this book, and the term "algorithm" is derived from the name of its author.
“…a square and 10 roots are equal to 39 units.
The question therefore in this type of equation is about as follows: what is the square
which combined with ten of its roots will give a sum total of 39?
The manner of solving this type of equation is to take one-half of the roots just mentioned. Now the roots in the problem before us are 10. Therefore take 5, which multiplied by itself gives 25, an amount which you add to 39 giving 64. Having taken then the square root of this which is 8, subtract from it half the roots, 5, leaving 3. The number three therefore represents one root of this square, which itself, of course, is 9. Nine therefore gives the square."
The modern symbolic (iconic) representation and solution is much shorter
and clearer [Parshall, 1988]:
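The original display is not reproduced here; a standard reconstruction of the modern solution (completing the square) is:

x² + 10x = 39,  x = √((10/2)² + 39) − 10/2 = √64 − 5 = 3,  so x² = 9.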
[Diophantus' notation for the expression x³ + x·2 − 3; original symbols not reproduced]
Table 4 derives the equivalence. Not surprisingly, the textual form at the bottom is ambiguous and much longer than either the modern or Diophantus' notation. It is not clear from the text whether we have x³ + x·2 − 3 or x³ + x·(2 − 3). To avoid this ambiguity the text would have to be even longer, perhaps: "Cube of unknown number plus the same unknown number multiplied by two and minus three from the whole expression preceding the three". In both modern and Diophantus' notations there are 8 characters vs. 130 characters for the text-based representation. That is, Diophantus' notation is quite competitive even now.
The advantages of symbolism become even more evident when we try to combine two expressions. Let us add x³ + x·2 − 3 and 2x³ + 3x − 2. Diophantus' …
… him. After consoling his grief by this science of numbers for four years he ended his life.
The questions posed based on this text are: When was he married? When was his son born? When did he die?
The answer provided is: Diophantus married at age thirty-three, had a son when he was thirty-eight, and died when he was eighty-four.
How can one answer these questions without iconic symbols? It is a real
challenge using reasoning in a natural language as we have seen in examples
above, but the solution using iconic notation is simple:
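The iconic display itself is not reproduced here; the underlying equation of the standard version of the epitaph puzzle, with x denoting the length of Diophantus' life, is:

x/6 + x/12 + x/7 + 5 + x/2 + 4 = x,  which gives 3x/28 = 9 and x = 84.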
In this way, mathematics was able to reach the highest level of rigorous reasoning and problem solving, a level that probably would not have been reached without such iconographic knowledge representation, or at a minimum would have taken much more effort.
… expression occupies the same space as each of the original components. Table 1 in Chapter 8 illustrates how this is done in art. The text-based metaphor practically doubles in size, but the merged picture metaphor occupies almost the same space as each of its components, showing the same "compression" efficiency as in the mathematical examples above.
One can ask: "How is this example from art related to the use of visual symbolism in solving decision-making problems that we have seen in mathematical examples?" At first glance, there is no such relation, but the efficiency of symbolic mathematics in large part came from its compressed information representation. Thus, any efficient compressed visual presentation can be potentially insightful for decision-making and problem-solving tasks.
Not every relation in the world can be easily presented in text. Often spatial adjacency is a natural way to foster a vision of non-textual relations between entities. Merging two proverbs visually reveals a deeper meaning of each proverb, and of their interrelation, than a text-only representation does.
At first, there is some uncertainty about what this sum means. If we ignore the data types, we will get 10 abstract entities or living creatures. But if we pay attention to the specific data types, we will have to have two numbers in our solution: 1 (lion) and 9 (birds), which cannot be called a sum in standard arithmetic. To be consistent with standard arithmetic, two separate equations can be generated: 2 − 1 = 1 (for lions) and 4 + 5 = 9 (for birds). Data with a hundred different data types would be expressed in a hundred equations in standard arithmetic. Visual, iconic arithmetic allows us to do this in one equation.
Such visual iconic arithmetic is shown in Figure 1. It can be viewed as a special type of data fusion: not just counting but a process of combining data.
2[lion] + 5[bird] − 1[lion] + 4[bird] = 2[lion] − 1[lion] + 5[bird] + 4[bird] =
(2 − 1)[lion] + (5 + 4)[bird] = 1[lion] + 9[bird].
Figure 1. Multi-sort visual/iconic arithmetic (the lion and bird icons are rendered here as [lion] and [bird])
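A minimal sketch of this multi-sort bookkeeping (our own illustration in Python): keeping a separate tally per icon lets one iconic equation fuse what would otherwise be many single-sort equations.

from collections import Counter

# Each term carries a count and a data type (its icon).
terms = [(2, 'lion'), (5, 'bird'), (-1, 'lion'), (4, 'bird')]

total = Counter()
for count, icon in terms:
    total[icon] += count
print(total)  # Counter({'bird': 9, 'lion': 1})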
2[creature] − 1[creature] + 5[creature] + 4[creature] = 10[creature]
Figure 3. Iconic arithmetic for a supercategory (creature); icons rendered here as [creature]
Figures 4 and 5 illustrate the same equations with different creatures and
corresponding icons. Both examples are based on Egyptian hieroglyphs.
2[icon1] + 5[icon2] − 1[icon1] + 4[icon2] = 2[icon1] − 1[icon1] + 5[icon2] + 4[icon2] =
(2 − 1)[icon1] + (5 + 4)[icon2] = 1[icon1] + 9[icon2]
Figure 4. Equations with hieroglyphic icons (hieroglyphs not reproduced)
Figure 5. Equations with hieroglyphic icons (the same equations with a different pair of hieroglyphs; not reproduced)
5x[lion] + 8x[bird] = 12y[lion]
What does it mean? We can interpret this equation as
5x[lion] + 8x[bird] = 12y[lion] + 0y[bird]
and construct two equations from it, one for lions and the other for birds:
5x[lion] = 12y[lion]
8x[bird] = 0y[bird]
where icons could indicate a data type, similar to our use of kg, sec, m, and $. For instance, the lion icon can stand for a lion's weight, price, or size of habitat.
In general, icons can be interpreted not only as data type or modality indicators but also as indicators of measurement units, similar to our use of sec, min, hour, day, week, month, and year.
In addition, we can use icons as indicators of additional variables with a specific modality and measurement units. For instance, the lion icon can be interpreted as the amount of food consumed by a lion per day in kg. Similarly, the bird icon can indicate the same for a bird. Obviously, food for lions and food for birds are quite different even if their amounts are expressed in common units such as kilograms. Ignoring these differences can be incorrect for many situations and tasks. At this stage we deliberately do not specify any of these interpretations for the lion and bird icons in the equations above. We try to use purely syntactic manipulation with icons as long as possible and turn to a specific interpretation only when further syntactic manipulation cannot continue without one.
The first equation with lions has many solutions. Namely, every pair (x, y)
for which x = (12/5) y is a solution. The second equation also has many solu-
tions which take the form (0, y). Here x and y could be the number of lions
and birds, respectively.
This method is based on the reduction of the multi-sort equation to two single-sort equations. An alternative approach would be to simplify the multi-sort equation before interpreting it, by combining all lions and all birds together:
(12y − 5x)[lion] = 8x[bird]
[bird]/[lion] = (12y − 5x)/(8x)
If we now choose the particular case x = y = 1, we can write this case down as
[bird]/[lion] = 7/8
How can the last equation be interpreted? It might represent a relation between prices. Consider, for example, the case of an exotic bird and a wild lion. The interpretation could be that the price of the bird is 7/8 of the price of the lion. Another interpretation could involve habitats: the size of the bird's habitat could be 7/8 of the size of the lion's habitat. In this case, the lion and bird icons are interpreted as additional variables: variable a is the price of a lion and variable b is the price of a bird. Thus the last equation can be rewritten in a standard way:
b = (7/8)a.
The iconic presentation is convenient because we do not need to carry
interpretations into the iconic equations. We can operate with iconic equa-
tions syntactically just like classical algebraic equations. The interpretation
can then be external as the semantic meaning of syntactical operations in
iconic algebra. This follows the standard algebraic practice where abstract
equations such as 3y = 15 and y = 15/3 = 5 are used instead of interpretation
specific equations 3y kg = 15 kg or 3y miles = 15 miles.
Currently, with an increasing amount of heterogeneous, multimodal data coming from a huge variety of different sources, multi-sort iconic arithmetic can be helpful in data fusion and integration. Typically, at the beginning, the task is vague and we often do not know exactly what we want to do. For instance, we might want to operate separately with specific entities (e.g., birds and lions, or the female population) or with more general categories of entities (animals or humans). We might be uncertain about the level of granularity that we need to carry. This is often because our goal is not yet clearly defined and our ability to reach such a goal is also uncertain.
Assume that we want to plant some crops in an area with a known total available size. We are uncertain about the planting plan. It could be a very detailed plan involving specific individual varieties of wheat, grass, and cotton, or a much more generalized plan for the three categories: wheat, grass, or cotton. In the latter case, the model would be much simpler and could use average costs of planting and other prices. The information available may not be sufficient for a more specific planning model, but this may not be clear in advance. For instance, information may exist for individual crop varieties that turns out to be unreliable.
Syntactic manipulation with icons permits us to: (1) postpone the exact task specification, (2) retain great flexibility in task specification, and (3) avoid being stuck with some task specification prematurely.
The example below illustrates the use of iconic equations and syntactic
manipulation without explicit interpretation of lion and bird icons and vari-
ables x and v in advance.
2x[lion] + 3v[bird] = 4[lion]    (1)
5x[lion] + 4v[bird] = 9v[bird]    (2)
We can solve these two equations using an iconic version of the classical Gaussian elimination method. This iconic generalization can be done in matrix form. Multiplying Equation (1) by 5 and Equation (2) by 2 and subtracting, we get:
15v[bird] − 8v[bird] = 20[lion] − 18v[bird]
After simplification this is equivalent to
7v[bird] = 20[lion] − 18v[bird]
7v[bird] + 18v[bird] = 20[lion]
25v[bird] = 20[lion],  v = (20/25)·[lion]/[bird]
Now we need to interpret the components of Equations (1) and (2) and of the equations derived from them, including the "ratio" of icons in the last equation. In both equations let x be the number of lions and v be the number of birds. The lion and bird icons are interpreted as additional variables a and b that stand for the prices of a lion and a bird:
2xa + 3vb = 4a,  5xa + 4vb = 9vb
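Under this interpretation the system can be handed to a computer algebra system directly; a minimal sketch (our own illustration):

import sympy as sp

x, v, a, b = sp.symbols('x v a b')
eq1 = sp.Eq(2*x*a + 3*v*b, 4*a)    # 2x[lion] + 3v[bird] = 4[lion]
eq2 = sp.Eq(5*x*a + 4*v*b, 9*v*b)  # 5x[lion] + 4v[bird] = 9v[bird]

sol = sp.solve([eq1, eq2], [x, v], dict=True)[0]
print(sol[v])  # 4*a/(5*b), i.e., v = (20/25)·[lion]/[bird]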
This means that exactly the same Gaussian elimination can be done in classical terms. The idea of iconic equations is that a human, when formulating the problem to be solved, can use icons to assist in problem formulation and initial reasoning. Then the iconic equations can be converted to a traditional system of equations and solved analytically. If we generalize birds and lions as creatures from the beginning, then R = 1 and a = b:
[lion]/[bird] = a/b = 1
This will produce another solution of Equations (1) and (2) that can be produced analytically in a regular algebraic way.
Max(3x + 5y − 1v + 4w)
2x − 1y + 5v + 4w = 10
2x + 3v < 12
5y + 4w < 14
Figure 6. Multi-sort iconic linear programming task (the icons attached to the terms are not reproduced)
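Ignoring the icon types, the numeric core of the Figure 6 task can be solved with a standard LP solver; a minimal sketch (our own illustration, reading the strict inequalities as non-strict and assuming nonnegative variables):

from scipy.optimize import linprog

# Variables ordered (x, y, v, w); linprog minimizes, so negate the
# objective 3x + 5y - v + 4w in order to maximize it.
c = [-3, -5, 1, -4]
A_eq, b_eq = [[2, -1, 5, 4]], [10]  # 2x - y + 5v + 4w = 10
A_ub = [[2, 0, 3, 0],               # 2x + 3v <= 12
        [0, 5, 0, 4]]               # 5y + 4w <= 14
b_ub = [12, 14]
res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq,
              bounds=[(0, None)] * 4)
print(res.x, -res.fun)  # optimal assignment and objective value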
The exercise section below contains several tasks that are open problems
in iconic equations and iconic optimization, such as non-linear, discrete and
stochastic iconic optimization problems.
3. CONCLUSION
This chapter has discussed the algebraic visual symbolism for problem
solving and lessons learned from the history of algebraic equations from
Diophantus to the present. It was shown that often the textual form of an
equation is ambiguous and much more verbose than either modern or Dio-
phantus’ notations. These lessons from history led us to new concepts for
iconic equations and iconic linear programming tasks.
We discussed several questions. How was it possible that, in spite of the obvious advantages of algebraic symbolism, algebraic notation was not used for 1250 years after Diophantus invented it? Was this accidental? The conclusion was that it was not accidental. It was a result of the dominance of Euclidean geometric algebra and of the fact that Diophantus' algebra operates with abstract entities that are not directly observable. Diophantus' algebra operates with visual but abstract concepts of unknown values, constants, and arithmetic operations. Geometric algebra operates visually with concrete objects that have a direct match in the real world. Thus, geometric algebra was much easier to understand: it relies much less on abstract thinking, and geometry is closer to direct modeling of physical entities.
It was shown that the history of art provides valuable lessons in the same vein as the history of algebraic equations. Artists "compress" the space occupied by texts such as the Bible, myths, and proverbs. The important advantage of "reading paintings" over reading text is that we can see the "whole story" at once using our parallel visual processing. Thus, pictures compress the time required to "read" a concept in comparison with sequentially reading text.
Multi-sort iconic equation representation is convenient because we do not
need to carry interpretations into the iconic equations. We can operate with
iconic equations syntactically similar to what is done in classical algebraic
equations. Interpretation can be external as the semantic meaning of syntac-
tical operations in iconic algebra. We also noted that with an increasing
amount of heterogeneous, multimodal data coming from a huge variety of
different sources multi-sort iconic arithmetic can be helpful in data fusion
and integration. For decision-making tasks, iconic equations are important
because they permit one to work with the high level of uncertainty that is
natural for the initial stages of decision making and problem solving.
… symbolic solution. Compare the amount of space used to produce both solutions.
Advanced
3. Design an iconic version of the Gaussian elimination method for solving
systems of linear equations with n variables in a matrix form.
4. Design an iconic version of the simplex method for solving the linear
programming task with n variables.
5. REFERENCES
Al-Khwarizmi, M. Al-jabr wa'l-muqabala [English translation: Al-Khwarizmi, 1974].
Altshiller-Court, N. The Dawn of Demonstrative Geometry. Mathematics Teacher, 57, 1964,
163-166.
Cohen, M., Drabkin, I. Source Book in Greek Science. Cambridge, MA: Harvard University
Press, 1958.
Geller, L. Start of Symbolism, 1998, http://www.und.nodak.edu/instruct/lgeller/
Miller, J. Earliest Uses of Symbols for Variables,
http://members.aol.com/jeff570/variables.html, 2001
O'Connor, J., Robertson, E.F. An overview of the history of mathematics, 1997, http://www-gap.dcs.st-and.ac.uk/~history/HistTopics/History_overview.html#17
O'Connor, J., Robertson, E.F. Diophantus of Alexandria, 1999, http://www-groups.dcs.st-and.ac.uk/~history/Mathematicians/Diophantus.html
Parshall, K. The art of algebra from Al-Khwarizmi to Viète: a study in the natural selection of ideas, History of Science, Vol. 26, No. 72, 1988, pp. 129-164.
Parshall, K. Biography of Diophantus, 2002, http://www.lib.virginia.edu/science/parshall/diophant.html
Schroeder, M. Number Theory in Science and Communication: with Applications in Cryptography, Physics, Digital Information, Computing, and Self-Similarity (3rd ed.), Springer-Verlag, 1997.
Swift, J. Diophantus of Alexandria. American Mathematical Monthly, 63 (1956), 163-170.
Chapter 6
ICONIC REASONING ARCHITECTURE FOR ANALYSIS AND DECISION MAKING
Boris Kovalerchuk
Central Washington University, USA
Abstract: This chapter describes an iconic reasoning architecture for analysis and decision-making, along with a storytelling iconic reasoning approach. The approach includes providing visuals for task identification, evidence, reasoning rules, links of evidence with pre-hypotheses, and evaluation of hypotheses. The iconic storytelling approach is consistent hierarchical reasoning that includes a variety of rules, such as visual search-reasoning rules that serve as tools for finding confirming links. The chapter also provides a review of related work on iconic systems. The review discusses concepts and terminology, controversy in iconic language design, links between iconic reasoning and iconic languages, and requirements for an efficient iconic system.
Key words: iconic reasoning architecture, analysis and decision-making, storytelling iconic
approach, iconic language.
1. INTRODUCTION
Typically these steps are repeated several times in the process of refining hypotheses and evidence in scientific research, as well as in intelligence analysis, engineering and architectural design, market analysis, health care, and many other areas.
To explain the IER approach we use a modified task of ongoing monitor-
ing of the political situation in some fictitious country. The final reasoning
result condenses all of the most important visuals from analytical steps in a
single compact picture. This picture combines several types of icons and ar-
rows that indicate a conclusion, its status and a chain of evidences that sup-
port the conclusion. For presentation purposes the final part of the reasoning
chain can be enlarged and the icons replaced by actual maps, imagery and
photographs of people involved.
Major components of the IER architecture are:
• Collecting and annotating analytical reports as inputs using a
markup language, e.g., XML, DAML;
• Providing iconic representation for hypotheses, evidences, sce-
narios, implications, assessments, and interpretations involved in
the analytical process;
• Providing iconic representation for confirmations, and beliefs
categorized by levels;
• Providing iconic representation for evidentiary reasoning
mechanisms (propositional, first-order logic, modal logic, prob-
abilistic and fuzzy logics);
• Providing scenario-based visualization and visual discovery of
changing patterns and relationships;
• Providing a condensed version of iconic representation of evidentiary reasoning mechanisms for presentation and reporting.
These components are depicted in Figure 1. The use of iconic visuals permits a user to reach a high level of compression. Experiments reported in Chapter 10 show that iconic sentences can occupy one tenth of the space occupied by the equivalent text; that is, a compression factor of 10 is possible. Also, people can work with multidimensional icons two times faster than with text [Spence, 2001]. A similar time and space compression is expected when communicating analytical results (including the underlying reasoning) to decision-makers and fellow analysts using IER. Moreover, with such advantages and the ongoing proliferation of visualization technology, iconic reasoning may at some point become a major way of reasoning and communication in general. The visual correlation approach described in Chapters 8-10 can be naturally combined with visual reasoning to improve problem solving.
[Figure 1: major components of the IER architecture; one block reads "Iconic representation of evidentiary reasoning (propositional, first-order logic, modal logic, probabilistic and fuzzy logics)".]
2. STORYTELLING ICONIC REASONING ARCHITECTURE
In this section we discuss how the process of defining the problem and generating pre-hypotheses can be done visually. The problem is defined as ongoing monitoring of a political situation in some fictitious country [AQUANT, 2002]. This may include identification of the country and selection of the processes to be monitored, such as demographic, economic, and democratic processes, research & development, and military activity. Assume that the user selected the task of monitoring democratic processes and identified the overall output characteristics to be evaluated as one of the judgments: positive, no change, negative, or mixed.
• Monitoring democratic processes (selected)
• Monitoring economic processes
• Monitoring military activity
• Monitoring research & development processes
• Description of a new process
• Change direction (positive, no change, negative, mixed)
• Rate of change (low, medium, high)
• Description of an emerging leader
After that, the user picks a pre-hypothesis from the menu for the selected task. The menu contains all logically possible alternatives for changes in country Y in the selected scale shown in Table 3.
E21  No indication that new people with alternative views are in power. Icon: an official with a neutral, gray background.
E31  Several new persons that oppose democracy are in power. Icon: an official with a negative, orange background for opposition to democracy and two dots for "several" officials.
E32  New indications of suppression of free speech. Icon: an orange background encodes the negative fact, suppression of free speech.
E41  A new person that opposes democracy is in power. Icon: an official with a negative, orange background for opposition to democracy.
The color language and icon content used for the icons in Table 4 are described in Table 5. This language is easy to learn. It has only two iconic elements (iconels) for content (a speaker and an official), three colors, and the presence or absence of dots for quantity, providing a total of 3*2*2 = 12 icons.
change in country Y (H1) is possible (with some confidence)" has its formal equivalent, E11 & E12 → possible H1.
R1: IF evidences E11 and E12 take place THEN H1 (positive change) is possible; formally, E11 & E12 → possible H1.
R2: IF evidences E12 and E21 take place THEN H2 (no significant change) is highly probable; formally, E12 & E21 → highly probable H2.
R3: IF evidences E31 and E32 take place THEN H3 (negative change) is true; formally, E31 & E32 → true H3.
R4: IF evidences E11 and E32 take place THEN H4 (mixed change) is possible; formally, E11 & E32 → possible H4.
Figure 2. Traditional and iconic visualizations of rule R1. See also color plates.
Figure 3 shows inferences for the other rules R2, R3, and R4 from Table 6. The middle line reveals firm reasons (black arrow) for an orange flag: "several new persons that oppose democracy are in power" and there are "new indications of suppressing free speech". A black arrow indicates a sure conclusion (a true statement). Thus, the orange line of reasoning in the middle row immediately and preemptively indicates a negative line of events. Similarly, a
gray line indicates a neutral line of reasoning and conclusion. A mixed color line indicates a mixed conclusion. Similarly to Figure 2, this figure illustrates that the iconic form conveys more information, is more appealing, and makes a reasoning statement easier to convey.
Figure 3. Traditional and iconic visualizations of rules. See also color plates.
Table 7 presents some arrow icons used in IER. Arrow icons have a hierarchy; that is, if we want only to encode the fact that the result is possible, we can use the first icon in Table 7, but if we want to encode the possibility more specifically, we can use text markers such as HP and LP for highly possible and a low level of possibility. Another option is the use of partially filled arrows to identify the level of conclusion certainty, as shown in Table 7.
Table 7. IER selected arrow icons
Icon  Interpretation
P     Possible conclusion
HP    Highly possible conclusion
A new level 2 reasoning rule R13 (If E111 & E112 then H11) is shown visu-
ally in Figure 4. Now keeping in mind that H11=E11 we can visually combine
reasoning steps from Figures 2 and 4 to produce a reasoning chain (see Fig-
ure 5).
Figure 4. Comparison of two visual reasoning alternatives. See also color plates.
[Figure 5: the reasoning chain combining the steps from Figures 2 and 4; icon graphics not reproduced. See also color plates.]
In this figure, the first line shows the match found between H11 and E11 in the rules, visualized by partially overlapping the blocks for H11 and E11. Similarly, the second line matches the icons for H11 and E11 by overlapping them. The third line shows a completed match with a full overlapping of the matched H11 and E11 icons. A user can do this visually by dragging one icon over another and animating the process and the result.
Dragging is an additional intuitive element of visual reasoning. It is also possible in abstract reasoning (first line in Figure 5), but it requires remembering that H11 and E11 are the same. In contrast, icons reveal the similarity of these concepts instantly. Figure 5 makes the reasoning chain evident and easy to communicate. The first step of reasoning is firm (black arrow), but the second one is only possible (arrow with letter P). With a longer reasoning chain or a tree, an analyst and a decision maker can quickly see the most questionable reasoning steps that may need more attention.
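As an illustration of how such a chain could be scored programmatically, here is a minimal Python sketch (our own; the ordering of certainty levels is an assumption) that propagates the weakest certainty level down a chain:

LEVELS = ["true", "highly probable", "possible"]   # strongest first (assumed)

def chain_certainty(steps):
    # A chain is only as firm as its weakest step: pick the label
    # that appears latest in LEVELS among the steps of the chain.
    return max(steps, key=LEVELS.index)

print(chain_certainty(["true", "possible"]))   # 'possible', as in Figure 5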
This is rule R1 from Table 6. Such a conjunction is generic for combining any rules, and we will call it a conjunction metarule.
We can also generate another compact visual reasoning rule, shown in line 2 in Figure 6. This visual rule shows that if a mixture of "gray" and "green" properties implies a positive change, then a consistent analyst should accept that two "greens" also imply a positive change. This rule is based on the principle of monotonicity. The storytelling visual rule is much shorter and intuitively clearer than the text of this rule:
IF (If a new person that supports democracy is in power (E11) and no new
indications of suppressing free speech (E42) then positive change in coun-
try Y is possible)
Then (If a new person that supports democracy is in power (E11) and
there are new indications of free speech (E12) then positive change in
country Y is possible)
The importance of this monotonicity rule lies in the fact that we do not need to write the rule in advance. We can generate a specific form of it automatically using the principle of monotonicity. This metarule (a rule applied to rules) will be called the monotonicity metarule.
In the same way, another short visual rule is generated in line 3 of Figure 6. It can represent an analyst's opinion: "If a positive change is possible because a pro-democracy person is in power, then a positive change is possible even if there is no progress in free speech." A visual presentation of this statement reveals its structure clearly and is shorter. This rule also has its metarule counterpart, the neutral metarule: adding a neutral statement (with the & operator) does not change the reasoning result.
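The following minimal sketch (ours) shows how the monotonicity metarule could generate specific rules automatically; the evidence-strength ordering (negative < gray < green) and all names are illustrative assumptions, not from the text.

STRENGTH = {"negative": 0, "gray": 1, "green": 2}   # assumed ordering

def dominates(evidences, base):
    # True if every evidence is at least as favorable as the base one.
    return all(STRENGTH[e] >= STRENGTH[b] for e, b in zip(evidences, base))

def monotonicity_metarule(rule, new_evidences):
    base, conclusion = rule            # rule = (evidence colors, conclusion)
    if dominates(new_evidences, base):
        return (tuple(new_evidences), conclusion)   # generated rule
    return None                        # metarule does not apply

r = (("green", "gray"), "positive change is possible")
print(monotonicity_metarule(r, ("green", "green")))  # two "greens" also fire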
The process of searching for other candidate evidences can bring us to lower-level hypotheses, similarly to the discussion above. In this process, candidate evidences are considered as pre-hypotheses, and new evidence candidates for them are generated and visualized on the next, third level. We make search an explicit part of the reasoning process by introducing search rules such as rule R111, shown in line 2 of Figure 7:
If the name of X is known, then search the list of foreign chiefs for this name and, if found, retrieve the post occupied.
Thus, this approach visualizes the integration of declarative and operational knowledge, where search rules represent operational knowledge. Let us assume that the search produced the following result: Mr. X is a national security advisor in country Y. The line of reasoning that produced this result can be expressed visually by rule R111, shown in the second line of Figure 7. The textured arrow indicates that the search result can be incorrect or is not guaranteed. For instance, the name may not be in the search list, or the list contains the name but it belongs to another person with the same name.
Let us assume that (1) we have a candidate evidence E111 = "A new pro-democracy person X stands next to the president in the recent official picture"; (2) the analyst has found a photograph with a new person standing next to the president during a recent visit to a foreign country; and (3) the analyst does not know who Mr. X is.
If the name is not known, we need a reasoning rule of the next, fourth level. For instance, we may have rule R1111:
If the name is not known, then run face recognition software (FRS) on the selected face against all annotated images available from country Y.
Figure 7 provides visuals for this reasoning by depicting rules R1111 and R111 used sequentially.
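A hedged sketch of the two search rules applied in sequence follows; the data, the stand-in face-recognition function, and all names are our illustrative assumptions rather than part of the IER system.

FOREIGN_CHIEFS = {"Mr. X": "national security advisor"}   # illustrative list

def face_recognition(photo):
    # Hypothetical stand-in for the FRS run against annotated images.
    return "Mr. X"

def find_post(name=None, photo=None):
    if name is None:                  # rule R1111: recover the name first
        name = face_recognition(photo)
    # Rule R111: search the list; the result is not guaranteed correct
    # (the textured arrow), e.g., a namesake may occupy the post.
    return name, FOREIGN_CHIEFS.get(name)

print(find_post(photo="recent_official_picture.jpg"))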
[Figure 7: visual search-reasoning rules; the figure labels include "Post occupied" and "Trusted source"; icon graphics not reproduced.]
Now the successful reasoning steps can be combined into a single visual evidentiary reasoning scheme (Figure 11). This single picture shows all eight reasoning steps, the levels of fidelity of the conclusions, and the points where individual reasoning steps link. For instance, it shows that only three of the eight steps are firm in the final conclusion about positive change in country Y.
Figure 11. Integrated visual evidentiary reasoning scheme. See also color plates.
More complex hypotheses require more complex and dynamic icon de-
velopment. Ideally iconic representation should be automated. This subject is
discussed in Chapter 10.
Figure 13. Visual rules with signal ellipses. See also color plates.
5. RELATED WORK
Iconic studies cover both iconic reasoning for problem solving and human conversation for iconic communication. Note that the term "iconic communication" itself is more general than its scope in current research. Thus, we suggest using this term as an umbrella term for both areas. A variety of icons have been developed for both purposes [Dreyfuss, 1972, 1984].
The term iconic language is another term with different meanings. Vaillant et al. [1995, 1997] define an iconic language as a language without a specified syntax. In contrast, iconic programming languages are defined with a specified syntax.
Peirce categorized the patterns of meaning in visual signs as iconic, symbolic, and indexical, as shown in Table 10, based on [Moriarty, 1995; Nadin, Küpper, 2003].
Table 10. Peirce's sign categories
Sign       Description                                          Example
Iconic     The sign resembles the object it represents          A portrait; a map
Symbolic   The sign is linked to its object by convention       A word; a flag
Indexical  The sign is physically or causally connected to its object   Smoke as a sign of fire
More terms based on [King, 1999; Valiant, 1997] are described in Table 11.
Table 11. Sign terminology
Sign Description Example
This is not a complete set of the iconic terms in use. For instance, a semiom is defined as a message composed of icons that do not necessarily match up with linguistic entities. Icons that express predicates are called predicative icons. Single-word icons can be expressed with a single word in a natural language, and multiple-word icons correspond to more than one word in a natural language.
Figure 14. CAILS's conjunction symbols [Champoux, Fujisawa, Inoue, Iwadate, 2000]
A system based upon a set of dynamic visuals with qualitative reasoning about information displayed within a document is known as the Context Transport Markup Language, CTML [Tonfoni, 1996, 1998-2001].
An iconic communication system that assists a user in constructing sentences without typing words, i.e., relying solely on icons, is called the Visual Inter Language, VIL [Becker, 2000]. The goal of VIL is to make the system language independent so that it can be used universally.
6. CONCLUSION
The final reasoning picture condenses all reasoning steps. This iconic picture combines several types of icons and arrows that indicate a conclusion, its status, and a chain of evidences that support the conclusion. For presentation purposes, the final part of the reasoning chain can be enlarged, and icons can be replaced by actual maps, imagery, and photographs of the people involved.
The major components of the IER architecture are: (1) collecting and annotating analytical reports as inputs using a markup language, e.g., XML, DAML; (2) providing iconic representation for hypotheses, evidences, scenarios, implications, assessments, and interpretations involved in the analytical process; (3) providing iconic representation for confirmations and beliefs categorized by levels; (4) providing iconic representation for evidentiary reasoning mechanisms (propositional, first-order logic, modal logic, probabilistic and fuzzy logics); (5) providing scenario-based visualization and visual discovery of changing patterns and relationships; and (6) providing a condensed version of iconic representation of evidentiary reasoning mechanisms for presentation and reporting.
Iconic reasoning is much shorter and more perceptually appealing than text. This is important for communicating analytical results (including the underlying reasoning) to decision-makers and fellow analysts. With these advantages, iconic reasoning may at some point become a major way of reasoning and communication in general. In this chapter, the overview of iconic studies contrasted iconic reasoning and iconic communication. The application area for iconic reasoning is problem solving, and the application area for iconic communication is unrestricted human communication. The chapter provided a short overview of the iconic terminology started by Charles Peirce. It also discussed the comprehensibility of icons along with user-created icons vs. a fixed iconic vocabulary. The overview indicated that visual reasoning models that use visual representations isomorphic to their logical models, and hybrid models that combine visuals with logical representation, often have computational advantages over reasoning models based solely on propositional representations without visual components. The overview also briefly presented a history of iconic languages and the ideas of more recent iconic languages.
8. REFERENCES
AQUANT program, ARDA, Appendix C 1.1, http://www.ic-arda.org/InfoExploit/aquaint/
index.html
Barwise, J. Etchemendy, J. Heterogeneous Logic, In Diagrammatic Reasoning: Cognitive and
Computational Perspectives, J. Glasgow, N. H. Narayanan and B. Chandrasekaran, eds.,
Cambridge, Mass: The MIT Press and AAAI Press, 1995, pp. 211-234.
Becker, L., Leemans, P., VIL: A Visual Inter Lingua. In: Iconic Communication, Yazdani, M.
and Barker, P. Eds. Iconic Communication, Intellect Books, Bristol, UK, 2000.
Bliss, C. K. Semantography, Semantography publications, Australia, 1965
Block, N. ed. Imagery. Cambridge, Mass.: MIT Press, 1981
Camhy, D., Stubenrauch, R., A Cross-Disciplinary Bibliography on Visual Languages for
Information Sharing and Archiving, Journal of universal computer science, v. 9, 4, 2004,
pp.369-389. http://www.jucs.org/jucs_9_4/a_cross_disciplinary_bibliography/paper.html
Champoux, B. Transmitting visual information: Icons to become words, 4th Iconic Commu-
nication Workshop, 2001.
Champoux, B., Fujisawa, K., Inoue, T., Iwadate, Y., Transmitting Visual Information: Icons Become Words. Proceedings of IEEE Symposium on Information Visualization 2000, p. 244, 2000. http://www.mic.atr.co.jp/organization/dept3/papers/Cails/cails_paper.html
Chandrasekaran, B., H. Simon, Eds. Reasoning with Diagrammatic Representations. AAAI
Spring Symposium, Menlo Park, Calif.: AAAI Press, 1992.
Cruickshank, L., and Barfield, L., The augmentation of textual communication with user-created icons, in: Iconic Communication, Intellect, Bristol, UK, Portland, OR, USA, 2000.
Domingue J. Visualizing knowledge based systems, In: Software visualization: programming
as multimedia experience, Eds. Stasko, J., Domingue J., et al. , MIT Press, 1998
Dreyfuss, Henry: Symbol Sourcebook. An Authoritative Guide to International Graphic Sym-
bols. McGraw-Hill, New York 1972; reprint: Van Nostrand Reinhold, New York 1984.
Fauconnier, G. Mental Spaces. Cambridge, Mass.: MIT Press, 1985
Forbus, K. Qualitative Spatial Reasoning: Framework and Frontiers. In Diagrammatic Rea-
soning: Cognitive and Computational Perspectives, J. Glasgow, N. H. Narayanan and B.
Chandrasekaran, eds., Cambridge, Mass: The MIT Press and AAAI Press, 1995
Frixione, M., Gercelli, Zaccaria, R. Diagrammatic Reasoning about Actions Using Artificial
Potential Fields, 1997, http://www.dif.unige.it/epi/hp/frixione/ IEEE_Control_Systems.pdf
Funt, B.V. Problem-Solving with Diagrammatic Representations. Artificial Intelligence 13(3):
201-230, 1980
Gardin, F., B. Meltzer, Analogical Representations of Naive Physics. Artificial Intelligence
38: 139-159, 1989
Gelernter, H., Realization of a Geometry-Theorem Proving Machine. In E.A. Feigenbaum and J. Feldman, eds., Computers and Thought, New York: McGraw-Hill, 1963.
Gips, J., Shape Grammars and Their Uses: Artificial Perception, Shape Generation and Computer Aesthetics, Birkhäuser, Basel, Switzerland, 1975
Granger, E., Rubin, M, Grossberg, S., Lavoie, What-and-Where fusion neural network for
recognition and tracking of multiple radar emitters, Neural Networks 14, 325-344, 2001
Johnson-Laird, P. , Mental Models. Cambridge: Cambridge University Press, 1983
King, A. Review: 3rd National Conference on Iconic Communication, University of the West
of England, Bristol, 9-10th September 1999, http:/NNN/review3.html
Kosslyn, S.M., Images and Mind. Cambridge, Mass.: Harvard University Press. 1980
Kulpa, Z. Diagrammatic Representation and Reasoning. Machine Graphics & Vision 3(1/2):
77-103, 1994
Levesque, H., Logic and the Complexity of Reasoning. Journal of Philosophical Logic 17:
355-389, 1998
Lutz, C., Description logics, 2003, http://dl.kr.org/
Mikulin, L.,Elsaesser, D., Data Fusion and Correlation Technique Testbed (DFACTT):
Analysis Tools for Emitter Fix Clustering and Doctrinal Template Matching) Defense Es-
tablishments, Ottawa, 1995. http://65.105.56.185/files/09/0912/A091293.html
Moriarty, S., Visual semiotics and the production of meaning in advertising, 1995
http://spot.colorado.edu/~moriarts/vissemiotics.html
Myers, K., K. Konolige 1995. Reasoning with Analogical Representations. In: Diagrammatic
Reasoning, Glasgow et al. eds., MIT Press, 1995
Nadin, M., Küpper A., Semiotics for the HCI Community, 2003, http://www.code.uni-
wuppertal. de/uk/hci/Applied_Semiotics/applied.html
Neurath, O., International Picture Language, Department of Typography and Graphic Communication, University of Reading, England, 1978
Tonfoni G. The theory behind the icon: when, where and why should a system for text
annotation actually be used, 4th Iconic Communication Workshop, 2001, Bournemouth.
http://www.intellectbooks.com/authors/tonfoni/
Tonfoni G., Writing as a visual art. Intellect U.K., 2000
Tonfoni G., Communication Patterns and Textual Forms, Intellect, U.K., 1996
Tonfoni G., Intelligent control and monitoring of strategic documentation: a complex system
for knowledge miniaturization and text iconization, In: Proceedings of the ISIC/ CIRA/
ISAS 98 Conference, pp. 869-874, NIST, Gaithersburg, MD, 1998
Tonfoni G., On augmenting documentation reliability through communicative context trans-
port, In: Proceedings of the 1999 Symposium on Document Image Understanding Tech-
nology, pp.283-286, Annapolis, MD, 1999
Sanderson, D.W., Smileys, O'Reilly & Associates, Inc., Sebastopol, CA, USA, 1993
Shape Grammars, http://www.shapegrammar.org/
Sloman, A. Musings on the Roles of Logical and non-Logical Representations in Intelligence.
In Diagrammatic Reasoning, Glasgow et al. eds. MIT Press, 1995
Spence, R., Information Visualization, ACM Press, 2001
Straccia, U., Reasoning within Fuzzy Description Logics, Journal of Artificial Intelligence
Research 14, 2001
Vaillant P., Checler M., Intelligent Voice Prosthesis: converting icons into natural language
sentences, http://xxx.lanl.gov/abs/cmp-lg/9506018., 1995
Vaillant, P., A Semantics-based Communication System for Dysphasic Subjects. Los Alamos
Nat. lab paper archive http://xxx.lanl.gov/arXiv:cmp-lg/9703003 v1 12 Mar 1997
Yazdani, M , Barker, P.G. Iconic Communication, Intellect Books, Bristol, UK, 2000.
Chapter 7
TOWARD VISUAL REASONING AND
DISCOVERY
Lessons from the early history of mathematics
Boris Kovalerchuk
Central Washington University, USA
1. INTRODUCTION
Visual problem solving involves three aspects:
• visualization of a decision;
• explanation of a decision;
• visualization of the process of discovery of a decision.
Computer visualization is moving from pure illustration to reasoning, discovery, and decision making. New terms such as visual data mining, visual decision making, and visual, heterogeneous, iconic, and diagrammatic reasoning clearly indicate this trend. Beyond the new terminology, the trend itself is not new, as the early history of mathematics clearly shows. In this chapter, we demonstrate that we can learn valuable lessons from the history of mathematics. The first one is that all three aspects had been implemented in ancient times without the modern power of computer graphics:
1) the Egyptians and Babylonians had a well-developed illustration system for visualizing numbers;
2) the Egyptians and Babylonians had a well-developed reasoning system for solving arithmetic, geometric, and algebraic tasks using visualized numbers called numerals;
3) the ancient Egyptians were able to discover and test visually a non-trivial mathematical relation, known now as the number π.
How can we learn lessons from this history? How can we accelerate the transition from illustration to decision making and problem solving in the new challenging tasks we face now, using history's lessons? First, that history should be described in terms of visual illustration, reasoning, and discovery. This will give an empirical base for answering the posed questions. Traditionally, texts on the history of mathematics have a different focus. This chapter can be viewed as an attempt to create such an empirical base for a few specific subjects.
The first lesson from this analysis is: inappropriate results of the illustration stage hinder and harm the next stages of visual reasoning and decision making. Moreover, they can completely prevent visual reasoning and decision making, because these stages are based on the visualization of entities provided in the illustration stage.
The most obvious example of such a case is exhibited by Roman numer-
als. These numerals perfectly fulfill the illustration and demonstration role,
but have very limited abilities to support visual reasoning for arithmetic
(summation, subtraction, multiplication and division). Hindu-Arabic numer-
als fit reasoning tasks much better.
The second lesson is that the most natural visualization, one that seems isomorphic to real-world entities, is not necessarily the best for reasoning and decision making. Ptolemy's geocentric system was isomorphic to the observed rotation of the Sun around the Earth, but eventually it became clear that it does not provide advanced reasoning tools to compute the orbits of the other planets.
This system also has unique symbols for 20, 30, 40, 50, 60, 70, 80, 90, and 100. The symbol for 20 is directly based on the symbol for 10, and the symbol for 40 is based on the symbol for 4. The symbols for 200, 300, 400, and 500 are based on the symbol for 100; they are drawn by adding one, two, three, or four dots above it [Friedman, 2003b]. (The glyphs themselves are not reproduced here.)
Modern: 3105 = 5 + 1·10² + 3·10³; in the backward hieroglyphic notation the components appear in the order 5, 0, 100, 3·10³.
The hieroglyphic system is less positional than the Hindu-Arabic system that we use now. Changing the sequence of the components does not destroy the value of the number (see Figure 1). This combining has its own visual syntax. The presentation of 19607 does not follow the pattern that larger digits are also larger in size; it shows the digits for 1000 smaller because there are nine of them. Thus, a more general rule is that a large number of equal digits is the main factor for drawing those digits smaller.
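To make the additive, order-free character of the system concrete, here is a small Python sketch (our illustration, not from the text) that decomposes a number into counts of hieroglyphic digit glyphs:

def hieroglyphic_digits(n: int) -> dict:
    # Count how many 1-, 10-, 100-, ... glyphs are needed for n.
    digits, power = {}, 1
    while n:
        n, count = divmod(n, 10)
        if count:
            digits[power] = count
        power *= 10
    return digits

print(hieroglyphic_digits(3105))   # {1: 5, 100: 1, 1000: 3}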
[Figure 1 shows hieroglyphic numerals written with a free sequence of components; the examples include 120, 1303+1/11, 3350, 300003, 1010005, 1/25, 19607, and 276. The glyphs are not reproduced here.]
Figure 1. Free sequence of numeral components (based on [Friedman, 2003c; Williams, 2002c; Arcavi, 2003; Allen, 2001a])
In cuneiform, 7341 is written with the base-60 digits (2, 2, 21): two wedges, two wedges, then two ten-symbols and one wedge, because 7341 = 2·60² + 2·60 + 21. Base 60 has many advantages. One of them is that other systems can be converted to it easily, since 2, 3, 5, 10, 12, 15, 20, and 30 all divide 60.
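A quick sketch (ours) converting a decimal number to its base-60 digits confirms the decomposition:

def to_base60(n: int) -> list:
    digits = []
    while n:
        n, d = divmod(n, 60)
        digits.append(d)
    return digits[::-1]      # most significant digit first

print(to_base60(7341))       # [2, 2, 21], i.e., 2*60**2 + 2*60 + 21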
[Two cuneiform writings of 19 (wedge glyphs not reproduced): in the first, 19 is presented as 20 − 1, where 20 is written with two ten-symbols and 1 with the symbol V; the negation symbol is placed over the symbol for 1. In the second, 19 is written as 10 + 9, where one symbol stands for 10 and nine small V symbols stand for 1 each.]
This brief description shows that the Babylonians had a well-developed illustration system for visualizing numbers. It was not limited to integers; the Babylonians also used fractions. Their abilities for reasoning with numbers included extracting square roots, solving linear systems, using Pythagorean triples such as (3, 4, 5) with 3² + 4² = 5², and solving cubic equations using tables. Several of these actions can be qualified as visual reasoning too.
[Figure 2 shows the sum 215 + 57 = 272 computed by merging hieroglyphic digit glyphs; the glyphs are not reproduced here.]
Figure 2. Visual adding integers
[Figure 3 shows visual addition of fractions: 1/10 + 1/10 = 2/10 and 1/25 + 1/25 = 2/25; the glyphs are not reproduced here.]
Figure 3. Visual adding fractions
[An accompanying figure pairs the hieroglyphic digits with their verbal names: two hundred fifteen, fifty-seven, and the sum two hundred seventy-two.]
Let us assume that we need to sum the numbers 215 and 57 written in words: two hundred fifteen plus fifty-seven. How can we do this using the words? A procedure for doing this does not exist even after 4000 years of using numbers. The only method known now is converting the verbal numbers to one of the symbolic (visual) forms.
[Figure 5 shows visual summation by doubling: 2801; 5602 = 2801·2; 11204 = 5602·2; and the sum 19607 = 2801 + 5602 + 11204. The glyphs are not reproduced here.]
Figure 5. Visual summation (based on [Arcavi, 2002])
[A worked example (glyphs not reproduced) first adds 7 + 5: the twelve unit strokes are compacted to the result 12. It then adds 35 + 17: the partial sums give 52 as 12 + 40, and after using a "carry" (shifting ten unit strokes "|" to the 10¹ position as one ten-glyph), the compact result reads 2 + 50 = 52.]
Multiplication and division were done by using visuals and a lookup mul-
tiplication table for only 2n, 4n, 8n and so on for every n. Numbers in the
table were generated by repeated visual summation such as n + n = 2n then
2n + 2n = 4n and so on (see Table 7 for the number 25). For example, to
multiply 25 by 11 the property 11 = 1 + 2 + 8 is used along with the two
times multiplication table (Table 8): 25⋅11 = 25⋅(1+2+8) = 25 + 25⋅2 + 25⋅8
(see Table 9).
Table 8. Two times multiplication table for the number 25 (tally glyphs not reproduced)
1⋅25 = 25
2⋅25 = 50 = 25 + 25
4⋅25 = 100 = 50 + 50
8⋅25 = 200 = 100 + 100
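The doubling method is easy to state as an algorithm; a minimal Python sketch (our illustration, not part of the original text) follows:

def egyptian_multiply(a: int, b: int) -> int:
    # Build the table 1*a, 2*a, 4*a, ... by repeated self-addition,
    # then sum the rows whose power of two appears in b.
    table = []
    power, row = 1, a
    while power <= b:
        table.append((power, row))
        power, row = power + power, row + row   # doubling = visual self-addition
    result = 0
    for power, row in reversed(table):
        if power <= b:          # greedy: b is a sum of distinct powers of two
            b -= power
            result += row       # add the corresponding doubled row
    return result

assert egyptian_multiply(25, 11) == 275   # 25*(1+2+8) = 25 + 50 + 200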
To get a feeling for the advantages of this visual process, we just need to try to multiply 25 by 11 solely in textual form. This simple task becomes very difficult to solve, and it is even harder to prove that the result is correct. But the visual arithmetic computation process is not simple either if we try to record it completely. Below we show what happens when we add two numbers, 35 and 17, using standard arithmetic techniques. We use a spreadsheet visualization similar to the one used in MS Excel. In Table 10, the number 35 occupies cells (1,3), (1,4) and the number 17 occupies cells (2,3) and (2,4). The result should be located in cells (3,3) and (3,4). There are also cells (4,2) and (4,3) reserved for writing carries. We use symbols aij for the content of cell (i,j). In this notation, the algorithm for adding 35 and 17 consists of two steps:
Step 1: If a14 + a24 > 9 then a34 := (a14 + a24) − 10 & a43 := 1
        else a34 := a14 + a24 & a43 := 0
Step 2: If a13 + a23 + a43 > 9 then a33 := (a13 + a23 + a43) − 10 & a42 := 1
        else a33 := a13 + a23 + a43 & a42 := 0
Table 10. Visual summation
j=1 j=2 j=3 j=4
i=1 3 5
i=2 1 7
i=3 sum 5 2
i=4 carry 0 1
At first glance, this algorithm does not appear to be visual, but the placement of the numbers is visual, similar to what users do every day when working with the Excel spreadsheet graphical user interface. People accomplish these steps easily visually, but it is hard to explain steps 1 and 2 without visuals, although the description above is almost a complete computer program.
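Indeed, the two steps translate almost verbatim into runnable code; a minimal sketch (ours), keeping the cell naming of Table 10:

a = {(1, 3): 3, (1, 4): 5,    # 35 in cells (1,3), (1,4)
     (2, 3): 1, (2, 4): 7}    # 17 in cells (2,3), (2,4)

# Step 1: units column, with the carry written into cell (4,3)
s = a[1, 4] + a[2, 4]
a[3, 4], a[4, 3] = (s - 10, 1) if s > 9 else (s, 0)

# Step 2: tens column plus the carry, with the carry into cell (4,2)
s = a[1, 3] + a[2, 3] + a[4, 3]
a[3, 3], a[4, 2] = (s - 10, 1) if s > 9 else (s, 0)

print(a[3, 3], a[3, 4])       # 5 2, i.e., 35 + 17 = 52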
One khet is about 50 meters. This papyrus is now in the British Museum. Its detailed description is presented in [Robins, Shute, 1998; Chance et al., 1927, 1929]. The papyrus was made by the scribe Ahmose and is sometimes called the Ahmose papyrus. The solution provided in the Rhind papyrus for problem 50 is [Friedman, 2003a]:
[The papyrus computation is rendered in the original as a table of hieratic numerals, not reproduced here.]
Now we need to analyze how, after discovering the similarity between the number of rocks in some circle and in some square, one could discover the mathematical indicators that match the circle and the square areas. We already assumed that these indicators are the diameter (or radius) of the circle and the side of the square. It is a very realistic assumption. These indicators were already known to the Egyptians as the main characteristics of a circle and a square. But we also need to discover the relation between their squared values, S = πR², with an unknown coefficient π.
In essence, this is a data mining task in modern terms. It could be attempted in ancient times visually, again by experimenting. For instance, rocks can be counted in the square contained in the circle and in the square that contains the circle, giving S1 < S < S2. Here, S1 is the number of rocks in the contained square and S2 is the number of rocks in the square that contains the circle with area S. Obviously, this approach would give a very rough estimate of π. A more accurate result can be obtained by approximating the circle by smaller squares and counting the rocks that are contained in the circle. For instance, we can approximate the circle area by subtracting from the total area of the surrounding square (9⋅9 = 81) the area that is not in the circle. This is about half of the four small 3×3 squares in the corners of the 9×9 square in Figure 7, that is 4(3⋅3)/2, with the final result 9⋅9 − 2⋅3⋅3 = 81 − 18 = 63, which is close to 8⋅8 = 64. In essence, this is an octagon approximation of the circle.
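A quick numeric check (our sketch) of the octagon estimate and of the value of π that it implies:

import math

d = 9
octagon = 9*9 - 2*3*3               # 81 - 18 = 63, inscribed-octagon area
square8 = 8*8                       # 64, the Egyptian rule: square of side 8
true_circle = math.pi * (d / 2)**2  # ~63.62, the exact circle area
implied_pi = 4 * (8/9)**2           # 256/81 ~ 3.1605, error under 1%
print(octagon, square8, round(true_circle, 2), round(implied_pi, 4))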
This visual approach was used in problem 48 of the Rhind papyrus. In general, problems 41-43, 48, and 50 of that papyrus are devoted to the computation of the circle area. Problem 48 of the Rhind papyrus states [Wright, 2000]:
The area of a circle of diameter 9 is the same as the area of a square of side 8. Where does this come from?
To justify statement (1), problem 48 contains a famous drawing of a square with an inscribed octagon (see Figure 7):
AreaCircle(9) = AreaSquare(8),  (1)
where 9 is the length of the diameter of the circle and 8 is the length of the side of the square. In contrast, problem 50 asks for the discovery of the statement presented in (1). Figure 7 presents the inscribed octagon graphically.
[Figure 7: square ABCD with inscribed octagon abcdefgh; drawing not reproduced.]
Figure 7. Illustration for the Rhind problem 48 (based on [Wright, 2000])
5. CONCLUSION
and the rectangles from different viewpoints, but visual guidance on how to experiment using visual tools for pattern discovery is not yet mature.
In Chapter 16, we present a new visual data mining technique based on monotone Boolean functions that is intended to fill this gap for the Boolean data type. The technique permits discovering patterns by analyzing structural interrelations between objects (cases) in the original visualization and by changing the visualization to modifications that permit one to see a continuous border between patterns if one exists.
To build an advanced visual discovery system, we need to start with a clearly stated goal, as was done in the examples analyzed in this chapter, such as the goal of discovering a formula to compute the area of the circle. In modern visual data mining, the goal is discovering patterns. At first glance, it looks as if we also have a goal, but this goal is not as well stated as computing the area. The correctness of the area computation can be tested using well-stated and simple criteria. Like data mining, imagery conflation (see chapters 17-21) has the goal of finding matching features. However, there are no natural, well-stated formal criteria to test whether the goal is reached, even if people are able to match features by visual inspection of two images using informal, tacit rules. There are two important questions:
(1) How can we know if a task can be solved by visual means?
(2) How do we select tasks to be solved by visual means?
The answer to both questions, based on our analysis of the early history of mathematics, is: the task should have a goal and formalized criteria to judge that the goal is reached, as was the case for these mathematical tasks. For less formal tasks, visual reasoning is still possible, as chapters (BK) and DB indicate, but the conclusions may be much less definitive and the methods may be less sophisticated.
In Chapter 1 (Section 2.1) we discussed visualization, visual reasoning, and visual discovery for the Pythagorean Theorem. The first two tasks were successfully solved visually many times (there are more than 300 different proofs of the theorem), but this is not the case for visual discovery of the theorem's statement. It is difficult to formulate formal criteria for visual discovery. The goal can be formulated easily: to discover the theorem statement. However, we cannot assume that the parameters and the types of relations between them (polynomial or other) are known if it is a true discovery. Thus, the task should be formalized. This can be done in many different ways, and the solutions can be quite different. The early history of mathematics clearly shows the trend from illustration to visual reasoning and discovery. This chapter demonstrates that we can learn valuable lessons from this history. The main lessons are:
• inappropriate results at the illustration stage harm the next stages of visual reasoning and decision making;
• the most natural visualization, one that seems isomorphic to real-world entities, is not necessarily the best for reasoning and decision making.
[A worked prototype (glyphs not reproduced): 70 + 70 gives the sum 140 as 14 ten-glyphs; after shifting ten of the 10-glyphs to the 10² position as a single 100-glyph, the result reads 140 = 100 + 40.]
2. Compute visually 145 + 145 in the Egyptian hieroglyphic system, using Table 12 for 140 + 140 as a prototype.
Table 12. Hieroglyphic arithmetic for 140 (glyph forms not reproduced; numbers given in modern notation by decimal position)
140 = 4 tens + 1 hundred
140 + 140 = 8 tens + 2 hundreds, i.e., 280 = 80 + 200 (no carry needed)
[A further prototype computes 385 as the sum 5 + 180 + 200; after shifting ten 10-glyphs to the 10² position as one 100-glyph, it reads 385 = 5 + 80 + 300.]
4. Analyze the efficiency of hieroglyphic visualization for arithmetic operations. Do you see any cases where summation or multiplication using hieroglyphic numerals can be accomplished faster than using Hindu-Arabic numerals? Tip: start from the cases presented in Exercises 1-3.
7. REFERENCES
Allen, D., (2001a) Counting and Arithmetic – basics, TAMU, 2001, http://www.math.tamu.edu/~don.allen/history/egypt/node2.html
Allen, D., (2001b) Babylonian Mathematics, TAMU, 2001,
http://www.math.tamu.edu/~dallen/masters/egypt_babylon/babylon.pdf
Aleff, H., Ancient Creation Stories told by the Numbers, 2003, http://www.recoveredscience.
com/const105hehgods.htm
Arcavi, A., Using historical materials in the mathematics classroom,2002, http://www.
artemisillustration.com/assets/text/Rhind%20Papyrus.htm.
Berlin, H., Foreign Font Archive, Hieroglyphic and Ancient Language Typefaces, 2003,
http://user.dtcc.edu/~berlin/font/downloads/nahkt_tt.zip
Chance, A., Bull, L., Manning, H., Archibald, R., The Rhind Mathematical Papyrus, vol. 1, Oberlin, Ohio, MAA, 1927; vol. 2, Oberlin, Ohio, MAA, 1929.
Friedman, B., (2003a) A Selection of Problems from the Rhind Mathematical Papyrus and the Moscow Mathematical Papyrus, 2003, http://www.chass.utoronto.ca/~ajones/cla203/egyptmath2.pdf
Friedman, B., (2003b) Sample hieratic math answers, 2003, http://www.mathorigins.com/Image%20grid/BRUCE%20OLD_007.htm
Friedman, B., (2003c) CUBIT, 2003, http://www.mathorigins.com/image%20grid/CUBIT_002.htm
VISUAL CORRELATION
Chapter 8
VISUAL CORRELATION METHODS AND
MODELS
Boris Kovalerchuk
Central Washington University, USA
Abstract: This chapter introduces the concept of visual correlation and describes the
essence of a generalized correlation to be used for multilevel and conflicting
data. Several categories of visual correlation are presented accompanied by
both numeric and non-numeric examples with three levels (high, medium and
low) of coordination. We also present examples of multi-type visual correla-
tions. Next, the chapter provides a classification of visual correlation methods
with corresponding metaphors and criteria for visual correlation efficiency.
Finally, the chapter finishes with a more formal treatment of visual correlation
providing formal definitions, analysis, and theory.
Key words: Visual correlation, heterogeneous data, visual data mining, information visu-
alization, glyph, metaphor, classification, guidance, distortion, formalization,
homomorphism, relational structure.
1. INTRODUCTION
(2) How does one visually correlate O/Es with different levels of resolu-
tion?
(3) How does one visually correlate conflicting data for a given O/E?
(4) How does one visualize data for different categories of users?
(5) How can an O/E symbol be made "rich enough" to portray the differ-
ences between O/Es?
To illustrate the problem and various approaches to it, we start with the
non-traditional problem of correlating non-numeric O/E data. One of the
major challenges here is that often such O/Es are represented by non-struc-
tured or semi-structured text. One solution, a visual correlation system and
visual language BRUEGEL described in Chapter 10, deals with this prob-
lem for text that is tagged with XML tags. The system and language are
named after Pieter Bruegel the Elder (1525-1569). The naming is not acci-
dental.
The visual correlation in BRUEGEL was inspired by Bruegel's famous painting "Blue Cloak" (1559), shown in Figure 1. This painting is named after the Netherlandish (Flemish) proverb "She hangs a blue cloak (lies) around her husband". This proverb is "visualized" in the painting and is marked by the center box in the picture below.
Figure 1. Pieter Bruegel's painting "Blue Cloak" (1559), oil on oak panel, 117 x 163 cm. (with permission from Staatliche Museen zu Berlin - Gemaldegalerie, Berlin). See also color plates.
assumed that the relation to be visualized is already known and available to the system. The software system computes and visualizes the relationship in a single VC panel based on the user's data. The user's role is relatively passive and involves evaluating the VC without generating alternative visual correlations.
Our first example is classical linear or curvilinear correlation using Cartesian coordinates, as shown in Figure 2(a). Initially, it may seem that the relation need not be known in advance, that it can be discovered visually by observing the plot. While it is true that the plot can be used for discovery, that is a different role: discovery rather than communicating an already discovered relation to someone who does not yet know it. In general, the same visualization may or may not serve both functions.
[Figure 2 panels: (a) linear and curvilinear Cartesian correlation plots of two variables; (b) a parallel coordinate correlation plot of five variables (x1-x5); (c) Cartesian correlation of two numeric variables vs. time.]
Figure 2. Cartesian and Parallel correlation plots for homogeneous numeric variables
[Figure 3 is a 4×4 grid of pairwise scatter plots over the variables x1-x4.]
Figure 3. A grid panel of the pairwise correlation for four homogeneous numeric variables
[Figure 4 shows columns of individual value distributions on the interval [0, 1] with a frequency axis.]
Figure 4. Table Lens: a table of individual distributions of numeric variables
[Figure 8 panels show glyph-based correlations at times t1, t2, and t3.]
Figure 8. Examples of low-level visual correlations based on glyphs. See also color plates.
[A figure (not reproduced) shows a visual case that maps concepts 1-6 to commands 1-5.]
Table 4 lists more metaphors that can be used in visual correlation tasks. Two of them were implemented in the Bruegel project (see Chapter 10): a spreadsheet of icons and 3-D trees of icons.
Classification. Chi & Riedl [Chi & Riedl, 1998] offer a classification
scheme for data visualization methods. This is an “internal” classification
based on operations for transforming data into the visual form.
It provides a unified description of many well-known data visualization techniques, such as: Dynamic Querying [Ahlberg & Shneiderman, 1994], AlignmentViewer [Chi, Riedl, Shoop, Carlis, Retzel & Barry, 1996], Parallel Coordinates [Inselberg, 1997], SeeNet [Becker, Eick & Wilks, 1995], ThemeScape and Galaxies [Wise, Thomas, Pennock, Lantrip, Pottier, Schur & Crow, 1995], hierarchical techniques such as the Cone Tree [Robertson, Mackinlay & Card, 1991], Hyperbolic Browser [Lamping, Rao & Pirolli, 1995], TreeMap [Johnson & Shneiderman, 1991], Disk Tree [Chi et al., 1998], Perspective Wall [Mackinlay, Robertson & Card, 1991], WebBook and WebForager [Card, Robertson & York, 1996], Table Lens [Rao & Card, 1994, 1995], Time Tube [Chi et al., 1998], Spreadsheet for Images [Levoy, 1994], FINESSE [Varshney & Kaufman, 1996], and Spreadsheet for Information Visualization [Chi, Barry, Riedl & Konstan, 1997].
Visual correlation is in need of an “external” classification scheme that
reflects the goal of supporting the correlation of complex objects and events.
This means that we would classify methods based on how they present
correlated objects to a user and less on how the visualization has been
obtained from original data.
Examples presented in the previous section served as the basis of the
classification system shown in Tables 5 and 6. We distinguish types, sub-
types and individual visual correlation methods. Here some types are the
same as individual methods.
Property (2) is the ultimate goal of visualizing known relations. Formulas (1) and (2) encode two steps:
(S1): Produce a visual representation of the relation R:
R(a,b) →V V(R(a,b));
(S2): Perceive the visualization, producing a relation Q:
V(R(a,b)) →P Q(a,b).
Both steps S1 and S2 can produce corruptions of the relation R(a,b) and finally produce a relation Q(a,b) that differs significantly from R(a,b); that is, Q(a,b) ≠ R(a,b).
Distortion. Below we provide an example of relation distortion. Let us consider the simple relation between two numbers a = 2 and b = 4: R(2,4) = "the number 2 is two times smaller than the number 4." Figure 14(a) visualizes this relation by matching a and b with the radii of two circles so that rb = 2ra. In this case, the relation Q1 to be perceived is as follows:
Q1(a,b) = true ⇔ rb = 2ra.
Figure 14(b) visualizes the same relation by presenting two circles with areas Sa and Sb, where the first area is half the second area, Sb = 2Sa. In this case, the relation Q2 is quite different:
Q2(a,b) = true ⇔ Sb = 2Sa.
Figure 14. Radius and area visualization metaphors
On the other hand, suppose we derive Q2 from Q1. Since rb = 2ra in Q1, we would have area Sa = πra² and area Sb = πrb² = π(2ra)² = 4πra² = 4Sa. Thus, the relation rb = 2ra for radii is equivalent to the relation Sb = 4Sa when converted to areas. This is double the relation originally expressed by Q2. Without guidance, a person does not know which relation to use, Q1 or Q2, for comparing alternatives. While the radius relation Q1 is a correct visual representation of the relation R(a,b), without guidance a person may compare the areas in Figure 14(a) without even consciously noticing it. As a result, the person might conclude that a is four times smaller than b. This is neither relation Q1 nor Q2 but rather a third relation, Q3, with Sb = 4Sa. Translated into a relation, this produces an incorrect statement, namely that the number a is one quarter of the number b.
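A quick numeric check (ours) of why the radius encoding invites a four-fold misreading:

import math

ra, rb = 2.0, 4.0                        # radius encoding with rb = 2*ra
Sa, Sb = math.pi * ra**2, math.pi * rb**2
print(rb / ra, Sb / Sa)                  # 2.0 4.0: a viewer comparing areas
                                         # perceives "four times", not "two"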
Surely then, Figure 14(a) requires guidance so that the radii of the circles are compared. Without this guidance, areas might be compared, in which case the actual relation R(a,b) would not be discovered. Indeed, even with correct guidance, a person may still not be able to extract the relation R(a,b) precisely, because of misperception.
Misperception. Human misperception of a characteristic such as area or radius may cause the corresponding relation, such as Sb = 2Sa between the areas of two circles, not to be recognized. In fact, psychological studies cited in [Tufte, 1983] have shown that the perceived area of a circle probably grows somewhat more slowly than the actual (physical, measured) area, and a Lie Factor outside the limits (LF < 0.95 or LF > 1.05) indicates substantial distortion [Tufte, 1983].
Next, we illustrate misleading visual expectations in terms of a visualized relation R(a,b) between two data sets a = {x} and b = {y}, where for every x, y = 2x. Figure 15(a) visualizes this relation while preserving proportion. Such a visualization permits the relation y = 2x to be discovered. Alternatively, Figure 15(b) shows the same relation but with inconsistent, disproportional axes, since units on the y-axis are a quarter of the size of units on the x-axis.
[Figure 15 panels: (a) y = 2x plotted with proportional axes; (b) the same relation plotted with disproportional axes.]
Figure 15. Visualization size effect
Figure 16. Ptolemy's Geocentric world (a) and Copernicus's Solar system (b)
If objects A and B are given by their numeric attributes, then they can be correlated by comparing the values of the attributes, computing a measure of their closeness, and finally evaluating that measure. If the measure is high enough, then A and B are called correlated. This process can then be visualized in a variety of ways, such as those presented in the previous sections.
However, for many tasks, objects A and B are not represented directly by their attributes, nor are relations between A and B directly and explicitly recorded in a database. In such cases, correlation may need to be discovered from indirect data that can be spread over different records or even different databases. For such tasks, one approach to discovering the correlation between A and B uses an intermediate object B'.
This approach can be successful for tasks where discovering some relation R1 between A and B' and another relation R2 between B' and B is simpler than discovering a single relation R between A and B, R(A,B), directly. It is also expected that these two relations R1 and R2 can be combined into a single relation between A and B without significant difficulties. Thus, this approach requires the discovery of an intermediate object B'. Note that the relation R1 between A and B' and the relation R2 between B and B' can be quite different. This approach has a lot in common with link analysis. The DARPA EELD program [Senator, 2001] is the most intensive recent attempt in this area.
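A minimal sketch (ours; the set-of-pairs data model and all names are illustrative assumptions) of combining R1 and R2 through the intermediate object into a single relation between A and B:

def compose(R1, R2):
    # R1, R2: sets of pairs. The composed relation holds for (a, b)
    # iff some intermediate b' links them: (a, b') in R1, (b', b) in R2.
    return {(a, b) for (a, x) in R1 for (y, b) in R2 if x == y}

R1 = {("event_A", "intercept_7")}        # A related to an intermediate object
R2 = {("intercept_7", "event_B")}        # the same object related to B
print(compose(R1, R2))                   # {('event_A', 'event_B')}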
Example. Objects A and B are two terrorist attacks and the goal is to cor-
relate them, that is to find what is common between the attacks. The straight
comparison of attributes may not reveal any correlations useful for decision
making -- preventing new attacks and punishing those who are responsible.
More specifically, let a set of intermediate objects {B’} be available and say
that these objects constitute all communication intercepts in countries where
5.2.2 Definitions
Definition 1. Two objects A and B from classes A and B are exactly cor-
related objects if there is a homomorphism between them.
Informally, homomorphism means that relations in A have been matched to relations in B with the same properties; that is, the structures of A and B are similar. For more formal definitions see Section 5.3 and [Mal'cev, 1973; Kovalerchuk & Vityaev, 2000].
Definition 2. Two objects A and B are correlated objects if there is object
B’ (from class B) exactly correlated to A, where B’ is produced from B by
some mapping F and B’=F(B).
Definition 3. A function δ(B,B') is called the difference between B and B' if δ: B×B → [0,1] and for every B and B', δ(B,B) = δ(B',B') = 0.
Definition 4. Visual correlation of two correlated objects A and B is a pair
of visualizations VC1 and VC2, where VC1 is a visualization of the similar-
ity (homomorphism) between A and B’ and VC2 is a visualization of the
difference between B’ and B.
Figure 17 illustrates Definition 4.
[Figure 17: A is linked to B' by exact correlation (a homomorphism with a degree of similarity), and B' is linked to B by a degree of difference.]
In a similar way, the upper element is (y,y) = (x,y) ∨ (y,x). In general, the upper and lower elements may not belong to A.
Let B be mapped to B' by a homomorphism F, F(B) = B'; that is, B' may contain fewer elements than B. Also let A be mapped to B' by a mapping M(A) = B' such that every (x,y) in A is mapped to z by its largest component, z = max(x,y). Thus, both A and B can be mapped to B':
A → B' ← B
and the concept of the algebraic system [Mal'cev, 1973]. Another, even more abstract concept of class can be derived from the mathematics of category theory [Marquis, 2004]:
Category theory now occupies a central position not only in contemporary mathematics, but also in theoretical computer science and even in mathematical physics. It can roughly be described as a general mathematical theory of structures and systems of structures.
Below we will define and use the concepts of relational structure (model)
and algebraic system.
Definition 5. A pair A = <a, Ωa> is called a relational structure (model) if a is a set of objects and Ωa is a set of relations (predicates) Pi on Cartesian products of a, such that
Pi: a × a × … × a → {0,1}.
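A minimal sketch (ours; the data and names are illustrative) of checking that a map between two relational structures preserves a binary predicate, in the spirit of Definitions 1 and 5:

def is_homomorphism(f, a, Pa, Pb):
    # f: dict mapping elements of structure A to elements of B;
    # Pa, Pb: binary predicates of the two structures.
    return all(Pb(f[x], f[y]) for x in a for y in a if Pa(x, y))

# Example in the spirit of Section 5.3: subjects ordered by height
# map to the same subjects ordered by weight.
heights = {"s1": 180, "s2": 170, "s3": 160}
weights = {"s1": 80, "s2": 75, "s3": 60}
f = {s: s for s in heights}                    # identity map on subjects
Pa = lambda x, y: heights[x] >= heights[y]
Pb = lambda x, y: weights[x] >= weights[y]
print(is_homomorphism(f, heights, Pa, Pb))     # True for this data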
Correlating algebraic systems A = <a, Ωa> and B = <b, Ωb> has exactly the same goal. We build a function f from a to b with |f(ai) − bi| < εi, where εi is small or equal to zero, and where, knowing properties in A, we use f to judge properties in B. In classical correlation, the set of such properties is not formulated explicitly. Algebraic systems permit writing such properties explicitly. For instance, we may want to be sure that additivity in A is preserved in B; that is, we may postulate an additive operator F(a1,a2) in A, F: a×a → a, with
F(a1,a2) = kF(a1) + mF(a2),   (6)
G(b1,b2) = sG(b1) + tG(b2),   (7)
where > indicates that the subjects are ordered (indexed) and ≥h indicates a greater-than-or-equal relation for human height. Let B be another relational structure, where ≥w indicates a greater-than-or-equal relation for human weight. Here the predicates Pa and Pb are the following:
6. CONCLUSION
This chapter presented a classification and formalizations for visual correlation methods and criteria to assess the quality of visual correlation.
7. ACKNOWLEDGEMENTS
8. EXERCISES AND PROBLEMS
3. Expand Table 4 from Section 3 with more metaphors for visual correlation tasks.
4. Provide your own example of relation distortion similar to the one pre-
sented in Figure 14 in Section 5.1.
Advanced
6. Try to visualize an algebraic form of the example presented in Section
5.3. Tip: start from visualization of classical linear regression and visual-
ize algebraic relations (6) and (7) as a part of this exercise.
9. REFERENCES
Asch, S., Studies of independence and submission to group pressure, Psychological mono-
graphs, 1956
Ahlberg, C., Shneiderman, B., Visual information seeking: Tight coupling of dynamic query filters with starfield displays. In Proceedings of ACM CHI'94 Conference on Human Factors in Computing Systems, 1994.
Mackinlay, J., Robertson, G., and Card, S., The perspective wall: Detail and context smoothly
integrated. In Proceedings of ACMCHI’91 Conference on Human Factors in Computing
Systems, Information Visualization, pages 173–179, 1991.
Marsh, H.S., Decision Making and Information Technology, Office of Naval Research, 2000,
http://www.onr.navy.mil/sci_tech/information/docs/dec_mak_it.pdf
Mikulin, L., Elsaaesser, D., The Data Fusion and Correlation Techniques Testbed (DFACTT)
http://stinet.dtic.mil/ Report, 404576, Defense Research Establishment Ottawa, 1994
North, C., A User Interface for Coordinating Visualizations based on Relational Schemata:
Snap-Together Visualization, University of Maryland Computer Science Dept. Doctoral
Dissertation, May 2000.
Novak, G. Diagrams for Solving Physical Problems, in J. Glasgow, N. Narayanan, and B.
Chandrasekaran, eds., Diagrammatic Reasoning: Cognitive and Computational Perspec-
tives, AAAI Press / MIT Press, 1995, pp. 753-774.
Pirolli, P., Rao: R., Table Lens as a Tool for Making Sense of Data, 1996,
http://citeseer.nj.nec.com/cache/papers/cs/25475/http:zSzzSzwww.parc.xerox.comzSzistlz
SzgroupszSzuirzSzpubszSzpdfzSzUIR-R-1996-06-Pirolli-AVI96-
TableLens.pdf/pirolli96table.pdf
Rao, R., Card, S., The Table Lens: Merging graphical and symbolic representations in an
interactive focus+context visualization for tabular information. In Proceedings of ACM
CHI’94 Conference on Human Factors in Computing Systems, volume 1 of Information
Visualization, pages 318–322, 1994. Color plates on pages 481-482
Rao, R., Card, S., Exploring large tables with the table lens. In Proceedings of ACM CHI’95
Conference on Human Factors in Computing Systems, volume 2 of Videos, 403–404,
1995.
Robertson, G., Mackinlay, J., Card, S., Cone trees: Animated 3d visualizations of hierarchical
information. In Proceedings of ACM CHI’91 Conference on Human Factors in Computing
Systems, Information Visualization, pages 189–194, 1991.
Senator, T., 2001, EELD program, http://www.darpa.mil/ito/research/eeld/EELD_BAA.ppt
Shaffer E., Reed D., Whitmore S., Shaffer B., Virtue: Performance visualization of parallel
and distributed applications, Computer, v. 12, 1999, 44-51.
Tufte, E.R., The Visual Display of Quantitative Information, Graphics Press, 1983, 1992
Varshney, H., Kaufman, A., FINESSE: A financial information spreadsheet. In IEEE Infor-
mation Visualization Symposium, 70–71, 125, 1996.
Wise, J., Thomas,J., Pennock,K., Lantrip, D., Pottier,M., Schur, A., and Crow, V., Visualizing
the non-visual: Spatial analysis and interaction with in-formation from text documents. In
Proc. Information Visualization Symposium (InfoVis ’95), 51–58. IEEE, IEEE CS, 1995.
Yang-Peláez, J., Flowers, W., Information Content Measures of Visual Displays, Proceedings
of the IEEE Symposium on Information Visualization 2000, IEEE, pp.99-104.
http://www.computer.org/proceedings/infovis/0804/08040099abs.htm.
Chapter 9
ICONIC APPROACH FOR ANNOTATING,
SEARCHING, AND CORRELATING
Boris Kovalerchuk
Central Washington University, USA
Abstract: This chapter presents the current state-of-the-art in iconic descriptive ap-
proaches to annotating, searching, and correlating that are based on the con-
cepts of compound and composite icons, the iconic annotation process, and
iconic queries. Specific iconic languages used for applications such as video
annotation, military use and text annotation are discussed. Graphical coding
principles are derived through the consideration of questions such as: How
much information can a small icon convey? How many attributes can be
displayed on a small icon either explicitly or implicitly? The chapter also
summarizes the impact of human perception on icon design.
Key words: iconic representation, compound icon, composite icon, iconic query, iconic
sentence, graphical coding
1. INTRODUCTION
These icons are organized in a semantic hierarchy (ontology) and serve as building blocks for ordinary compound icons, which are composed of as many as three icons placed side-by-side. Table 1 shows icons that are similar but not identical to those used in Media Streams since our goal is only to present Media Streams conceptually. In essence, a user selects the desired icons from a semantic hierarchy of icons to build an iconic sentence. Media Streams does not permit complex combinations of icons such as superimposing one icon on another with possible resizing or color change. Each concept is encoded in an icon, and more complex concepts such as “two cars” are encoded using two icons, “car” and “two”. Similarly, “three blue birds” is encoded as “bird”, “blue” and “three”, and “two adult female dentists” is represented by three icons, “adult female”, “dentist” and “two”.
Figure 1. Ordinary compound icons “two cars” and “three blue birds” with their time lengths.
Table 3 (icons not reproduced) shows a Media Streams video annotation with three icon lines: a location icon line (an Earth ground icon specialized by an outdoor icon, below the video input); a character line (an adult man icon with the attached descriptor “Jon”); and a character's action icon line (Jon using his hand to operate a gun, where a black circle in the hand indicates an object).
A user can select appropriate icons from a hierarchical menu of icons in a GUI. The first line in Table 3 shows the location of the shot with two icons, “Earth ground” and “outdoor”, selected by the user.
The second icon specifies the first one and sits below it in the ontology hierarchy. The Media Streams time line uses a logarithmic time scale to shorten the description. Filling time lines requires some experience with the system. Usability studies have shown that after two weeks people are comfortable enough to make annotations [Davis, 1995].
2. ICONIC QUERIES
(Iconic query not reproduced: the query specifies Location: outdoor; Character: adult man; and Action: man operates a gun by hand, matched against the annotated shots 1 and 2.)
Table 6. Complex queries (icons not reproduced): an OR query and an AND query. The OR query combines the standard query “Find an adult male using his hand to operate a gun” with the exception “Find an adult male using his hand to operate a telephone”.
Figure 2. Bruegel iconized records and queries: an iconized DB record about relations between companies A and B and their business profile, and the iconic query “Find company X that was recently bought by company A and that has attributes depicted in icons and text”.
3. COMPOSITE ICONS
(Icons not reproduced: the base icons “truck”, “damage” and “uncertainty” combine into the composite icons “damaged truck” and “questionable truck damage”.)
Table 9 illustrates two composite icons that are generated by different se-
quences of superimposed icons. The meaning of the icon “the key over the
envelope” can differ from the meaning of the icon “the envelope over the
key.” The first icon can mean a “secure message” and the second icon can
mean a “protected security key.”
(Table 9 icons not reproduced: “secure message” versus “message on security issue”.) Superimposition is order-dependent: B↵A ≠ A↵B.
The major element of the Mil Std 25-25 iconic language is the shape. Shapes are
used to encode affiliations (friend, hostile, neutral, …). In addition, shape
information is duplicated through the use of color (for color blind people,
black and white monitors, and drawing). The shape encodes not only affilia-
tion but two more characteristics: battle dimension (air, space, ground, …)
and type (units, equipment, installations, …). Figure 6 shows the shape for
hostile air track equipment along with the icon hierarchy. The whole set of
shapes includes 9 × 9 = 81 icons.
(Icons not reproduced: a Mil Std 25-25 military symbol compared with a Media Streams compound icon encoding Object: plane, Location: in air, and a direction arrow.)
…using an ellipsis “…”. Figures 8(b) and 8(c) provide other examples of small icons that encode 1-2 attributes each. For instance, Figure 8(b) shows an “iconic sentence” that describes ways to manipulate the compiler and linker. This sentence differs structurally from traditional sentences in natural languages, but it is still the description of a complex object. All three parts of Figure 8 retain meaningful metaphors.
(a) Borland C++ icons “find”, “find and replace”, and “find again”
Figure 9(a) shows Microsoft Visual C++ icons that accommodate two
concepts by using a combination of graphics and text that includes a tool
icon and a numeric index. Here a part of the metaphoric component is lost.
The number identifies a specific tool (e.g., tool 6) and does not convey di-
rectly that the tool is Spy++.
Figure 9. Combination of graphics and text
Figure 9(b) shows how three Borland C++ icons “compile unit,” “make
project,” and “build project” convey more concepts -- three attributes are
encoded in each icon. The last icon combines attributes of icons 1 and 2 tak-
ing symbol “!” from the first icon and yellow folder symbol from the second
icon. The first one is the class of operations for the “steps of producing ex-
ecutable (binary) code” (binary sequence “100101”). In addition, each icon
conveys two specific attributes. The first icon conveys explicitly the attrib-
utes: (1) compile (!) and (2) unit (page and blue color); the second icon con-
veys two attributes: (3) make (“?”) and (4) project (folder and yellow color);
the last icon conveys attributes (4) “project” (folder) and attribute (5) “build”
(“!”). Together these three icons form an “iconic sentence” that depicts the
complex object “Borland compiler and linker”. The use of color permits
immediate recognition that the second and third operations deal with one
entity and the first icon deals with a different entity. As you can see, the pro-
ject concept is encoded by two graphical features (a folder and the color yel-
low). Similarly, the unit concept is encoded by two other graphical features
(a page and the color blue). Thus, there is a redundancy in this encoding –
features are doubled graphically resulting in clearer and more quickly distin-
guishable icons. These icons also permit one to see that the first icon indi-
cates an action that produces executable code in one step. In contrast, the
same result can be produced by using icons 2 and 3 together (the symbol
“!”).
Each icon presents three concepts using four graphical features. We can
conclude that a simple 32 × 32 pixel icon is able to encode 3-4 independ-
ent concepts without causing any perception difficulties.
Generally, an iconic sentence that contains three icons can depict 9-12
concepts explicitly and can also represent many relationships between the
icons implicitly, such as: (i) to represent the same object or different objects
(encoded by color and shape) and (ii) to represent a subset of operations
(symbols “?” and “!”). Table 14 shows another example of icons depicting 3-4 concepts each using iconic metaphors. The analysis above shows that a typical software icon explicitly presents 1-4 concepts per icon.
Table 14 (icons not reproduced): a Borland debugger icon for Win32 encoding “Borland” (background color pattern), “debugger” (bug iconels) and “Win32” (iconel “32”); and an installation-program variant that adds a disks iconel.
At first glance, the observations above permit one to conclude that four
independent concepts should be a practical maximum for the number of ele-
ments depicted in a small icon. This would then justify a compression ratio
(from text to icons) of at most 4:1. That is, four icons (each depicts only one
concept) can be combined into one small icon that would represent four text
concepts.
The number of concepts implicitly encoded in an icon can be much lar-
ger than three or four concepts. This is done by relaxing the requirement of a
one-to-one match between an attribute and an iconic metaphor. Below we
describe an example from a Singapore executive job service company
[http://www.liahona.com.sg/].
This company provided a long description of (1) the benefits that their
client companies may provide to an employee, (2) application and resume
requirements, and (3) expatriate bonuses. Each of these areas is represented
by a single icon described in Table 15. The icons we use in the table differ from the original icons but encapsulate similar information.
Table 15 (icons not reproduced):
Sizable benefits: The company provides at least half of these benefits: medical, dental, accident insurance, low interest loans for car & housing, education assistance, transport allowance, technical & development training, holiday subsidy plan, recreational facilities, annual company function, compassionate, marriage, maternity, paternity, childcare and examination leave, stock options purchase plan, profit sharing, etc.
Resume requirements: Interested candidates are to apply with a detailed resume stating work experience, educational qualifications, full personal particulars of current & expected salary, starting date or resignation notice required, contact numbers (during & after office hours), address, age, nationality, marital status, language ability and driving license, a photograph and supporting documents.
It is hard to measure how many concepts are really depicted in the first
three icons in Table 15, but obviously it is more than four concepts or attrib-
utes. It shows that a one-to-one mapping between text concepts and icon
features is avoidable although a user may need assistance in learning an im-
plicit iconic language without a steep learning curve. Table 16 presents an
example of two iconic sentences that summarize two open positions. These
iconic sentences are obviously shorter than the original text and can be compared and correlated faster.
Table 16. Iconic sentences that summarize open positions (icons rendered as text)
Columns: due date; transport nearby; phone/fax; pay package; benefits; expatriate package; resume requirements; career growth.
Position 1: June 1; 10 min walk; 781-1430; “6FP” (6-figure pay, > $100,000); benefits icon; large; large; large.
Position 2: July 9; 10 min walk; 980-1036; “$$” (depends on experience); benefits icon; modest; modest; modest.
Another way to take a perceptual aspect into account in icon design is to use the ecological approach [Gibson, 1979; Preece, 1994]: to help a user simply detect information rather than construct information from the image. Detection is a single step process, but constructing may take two or
image. Detection is a single step process, but constructing may take two or
more steps.
For instance, suppose a user needs to analyze information on a victim. If the victim's information is in two different spots (see Figure 10(a)) then the user needs to assemble/construct this information before analyzing it. In contrast, Figure 10(b) provides the victim's information already assembled as a single focus entity in the center of the window. The Bruegel iconic system supports a relocation mechanism for icons and icon elements so that related information can be viewed together.
Figure 10. (a) Victim information scattered across the display; (b) victim information assembled as a single focus entity (images not reproduced).
Table 17. Principles of graphical coding in a single application, based on [Preece, 1994]
Type of entity | Graphical element | Examples of entity | Maximum number of effective codes | Perception comparison
Any | Alphanumeric (self-evident meaning) | 12″, AK-47, high temperature, 35° | Practically unlimited | Words are scanned longer than letters; letters are scanned longer than digits.
8. CONCLUSION
9. ACKNOWLEDGMENTS
10. EXERCISES AND PROBLEMS
1. Design ten base icons and construct four compound icons using them.
Build four iconic queries using the ten initial icons and four compound
icons. Each query should contain at least four icons.
2. Use the icons designed in exercise 1 and construct four composite icons.
Build four iconic queries using the ten initial icons and four composite
icons. Each query should contain at least four icons.
3. Build four composite icons that will encode six attributes each explicitly.
4. Build four composite icons that will encode at least ten attributes implic-
itly.
11. REFERENCES
Chang, S. K., Symbolic Projection for Image Information Retrieval and Spatial Reasoning, Academic Press, 1996.
Dictionary of Blissymbolics, 2001, http://www.symbols.net/blissymbolics/dictionary.html
Davis, M., Media Streams, Ph.D. dissertation, MIT, Media Lab.,1995,
http://garage.sims.berkeley.edu/pdfs/1995_Marc_Davis_Dissertation.pdf
Davis, M. Media streams: an iconic visual language for video annotation, 1995,
http://www.w3.org/People/howcome/p/telektronikk-4-93/Davis_M.html
Davis, M. Media Streams: an iconic visual language for video representation, In: Readings in
Human-Computer Interaction toward the year 2000, Morgan Kaufman Publ., San Fran-
cisco, 1995, 854-866.
Gibson J.J., The ecological approach to visual perception, 1979, Boston: Houghton Mifflin.
Military Standard 25-25 [http://symbology.disa.mil/symbol/mil-std.htm]
Narayanan, A., Shaman, T., Iconic SQL: Practical Issues in the Querying of Databases through
Structured Iconic Expressions, Journal of Visual Languages & Computing Volume 13, Is-
sue 6, December 2002, Pages 623-647, 2002.
Preece J. Human-Computer interaction, Addison-Wesley, 1994
Spence, R. Information Visualization, ACM Press, Addison-Wesley, 2001
Chapter 10
BRUEGEL ICONIC CORRELATION SYSTEM
Abstract: This chapter addresses the problem of visually correlating objects and events.
A new Bruegel visual correlation system based on an iconographic language
that permits a compact information representation is described. The descrip-
tion includes the Bruegel concept, functionality, the ability to compress infor-
mation via iconic semantic zooming, and dynamic iconic sentences. The chap-
ter provides a brief description of Bruegel architecture and tools. The formal
Bruegel iconic language for automatic icon generation is outlined. The second
part of the chapter is devoted to case studies that describe how Bruegel iconic
architecture can be used for the visual correlation of terrorist events, for file
system navigation, for the visual correlation of drug traffic and other criminal
records, for the visual correlation of real estate and job markets offerings, and
for the visual correlation of medical research, diagnosis, and treatment.
Key words: Visual correlation, iconographic language, semantic zooming, database visu-
alization, iconic representation
1. INTRODUCTION
The Bruegel visual correlation system permits the compact visual annota-
tion of information for objects and events along with their rapid comparison
and correlation, search and summary presentation. The system was named
after the Flemish painter Pieter Bruegel and was inspired by his famous painting “Blue cloak” shown partially above (see Figure 1 and Table 1 in Chapter 8 for more detail). The main categories of possible visual correlation systems
are described in Chapter 9. The Bruegel iconic system supports three categories. The first is a spreadsheet category where each object and event (O/E) is represented as a spreadsheet of icons organized in rows. The second category is a 3-D tree presentation where each O/E is represented as a 3-D tree of icons located in nodes. Icons located at the terminal nodes convey the most detailed information about the O/E while upper-level nodes convey more generalized information. The third representation is a “planted” 3-D tree
representation that combines a geographic map or image with a 3-D tree
“planted” at the location of O/E on the map/image. Each of these representa-
tions has its own semantic zoomed form which is conceptually described in
section 2.2. This form permits “semantic compression” of icons to get a
more compact representation.
The Bruegel visual correlation system includes several components:
• the Bruegel graphical language (BGL) that specifies the layering of
iconographic elements into complex icons to represent textual con-
tent in a space efficient manner,
• Dynico, a supporting tool for BGL which aids in the generation of
complex icons, iconic sentences, and spreadsheets of icons dynami-
cally
• 3DGravTree, a supporting tool for BGL to generate complex 3-D
trees including “planted” trees of icons dynamically, and
• other graphical tools to support BGL for the visual correlation of
complex objects and events.
Figure 1. Sequence of analyst’s work with the Bruegel system with supporting facilities
…Flemish proverbs (see Figure 1 and Table 1 in Chapter 8 for more detail). This unique visual language provides the highest level of compression, but it is not well structured for analysis and visual correlation. Other iconic languages have a one-to-one mapping with text sentences. Consider for example Bliss's iconic language [Bliss for Windows, 2001], a sample of which can be found in Figure 4(a) below. In such a language, practically every text word is converted to an icon. Thus, an iconic sentence can take even more space than the corresponding text. Still, the Bliss language fulfills its purpose, a global international communication tool (a visual Esperanto); however, it does not serve as a language for visual correlation.
Iconic languages such as Mil Std 25-25 and Media Streams described in
Chapter 9 represent intermediate steps between “Blue cloak” and Bliss. Fig-
ure 2 depicts the relative compression provided by these languages.
Figure 3. Dynamics of compression of an iconic sentence (10 icons). See also color plates.
Objects and events that change over time and space are a special challenge to the iconic approach. Such dynamic objects and events can be represented using several approaches [Chang, Bottoni, Costabile, Levialdi & Mussio, 1998, 2002]. Table 1 contains a brief description of these three approaches (spatial-temporal, semantic explicit and semantic implicit).
Spatial-temporal: the normalization of temporal events by indexing temporal and spatial changes using some temporal and spatial scales and reference points.
Semantic explicit: fixed semantics; semantically relevant atomic units organized into various temporal patterns (repeated cycles, scripts, etc.).
The Media Streams iconic system [Davis, 1995] uses the semantic im-
plicit approach because its goal is to search for a video segment with a simi-
lar physical event. The Bruegel system follows all three approaches because
when correlating events such as terrorist attacks each of these representations is needed.
Icons can also be combined with one another to define a new concept. Consider the example in Figure 4(a) of the “Love in marriage” icon generated from the two words “Love” and “Marriage”.
As an example of the complexity that can occur, consider the visualization of storytelling [Gershon & Page, 2001]; massive texts and data often need more complex combinations of icons. Gershon and Page present several storytelling examples.
eral storytelling examples. For instance, the sentence “G-shape building is
active between 8 and 10 a.m. and 4 and 6 p.m.” might be part of a story that
should be visualized.
For situations like this, we are developing a third approach. The first and
second approaches are not scalable for the iconographic visualization of
massive amounts of data where potentially thousands of icons are needed. It
is not realistic that such a number of individual icons can be crafted in a
static or static-dynamic manner. For instance, what if we want to visualize
time information: “3:45 p.m. and 10 seconds.” We may use a traditional
watch icon with arrows for hours, minutes, seconds, and a colored dot indi-
cating a.m./p.m. status as in Figure 4(b). The static approach for this visuali-
zation will require 432000 = 60 * 60 * 60 * 2 icons, assuming that each of
three arrows can take any of 60 possible minute positions and two indica-
tions (a.m., p.m.). When one takes into account that some positions of the hands do not portray critical data, we can cut the number of alternatives to 86400 = 12 * 60 * 60 * 2 icons since we can assume that for any hour (out of
12), there are 60*60*2 combinations of the minute arrow, the second arrow
and a.m./p.m. For such an application, the dynamic generation of visual fea-
tures (icons) on demand is a natural way to solve the problem.
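The arithmetic above also shows why on-demand generation is cheap: computing three arrow angles replaces storing tens of thousands of prebuilt images. A minimal sketch follows (the drawing layer is omitted; the function name and angle conventions are ours, not Bruegel's):

def clock_arrow_angles(hour, minute, second):
    """Compute arrow angles in degrees, clockwise from 12 o'clock,
    for a watch-style icon generated on demand instead of being
    selected from 86400 = 12 * 60 * 60 * 2 prebuilt icons."""
    hour_angle = (hour % 12) * 30 + minute * 0.5   # 360/12 degrees per hour
    minute_angle = minute * 6 + second * 0.1       # 360/60 degrees per minute
    second_angle = second * 6                      # 360/60 degrees per second
    return hour_angle, minute_angle, second_angle

# "3:45 p.m. and 10 seconds"; the p.m. status would be encoded
# separately, e.g., by a colored dot.
print(clock_arrow_angles(15, 45, 10))              # (112.5, 271.0, 60.0)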
To the best of our knowledge, Bruegel is the only dynamic system for creating aggregate iconographic images in this manner.
The dynamic approach includes not only the dynamic generation of static
icons but also the dynamic generation of dynamic icons such as framed ani-
mation and key-framed interpolated animation. Since the goal of the current
version is speed of visual interpretation of textual information and data, we
have chosen to focus on static display states (static icons). This also avoids
the chronological masking of data that occurs in animated graphics.
<?xml version="1.0"?>
<ICONEL id="human armed.SRT" dimension="600,600" dynels="1">
  <DYNEL id="New Dynel" frames="2">
    <FRAME id="New Frame" primitives="24">
      <PRIMITIVE id="New Primitive" type="LINE" width="4"
                 stroke="44,200,44" fill="155,155,255" points="2">
        <POINT coordinates="585,513"/>
        <POINT coordinates="513,585"/>
      </PRIMITIVE>
      <PRIMITIVE id="New Primitive" type="LINE" width="4"
                 stroke="44,200,44" fill="155,155,255" points="2">
        <POINT coordinates="585,450"/>
        <POINT coordinates="450,585"/>
      </PRIMITIVE>
      <PRIMITIVE id="Gun" type="POLYGON" width="2"
                 stroke="1,1,1" fill="77,77,77" points="7">
        <POINT coordinates="387,315"/>
        <POINT coordinates="405,315"/>
        <POINT coordinates="405,225"/>
        <POINT coordinates="414,225"/>
        <POINT coordinates="396,180"/>
        <POINT coordinates="378,225"/>
        <POINT coordinates="387,225"/>
      </PRIMITIVE>
      <!-- the remaining 21 of the 24 primitives are omitted here -->
    </FRAME>
    <!-- the second frame is omitted here -->
  </DYNEL>
</ICONEL>
Figure 5. XML representation of iconel data
A number of tools are provided to control the various aspects of the file
such as primitive type, color, ordering and shape. Things that would be hard
to edit textually are included in this view. This includes scaling a primitive,
changing the rendering order using a tree control, moving primitives, or
moving entire frames.
The XML View allows access to and editing of the textual details of the
file. This provides an easy way to change aspects of the file such as reorder-
ing large groups of Dynamic Icon Objects, or copying/pasting of data.
Rather than spending time designing interface features for every detail of a
SRT file, some data are more easily accessed from this XML View.
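A minimal reading sketch, assuming only the element and attribute names visible in Figure 5 and using Python's standard ElementTree; this is not the actual Bruegel tooling:

import xml.etree.ElementTree as ET

def list_primitives(srt_xml):
    """Walk an iconel file in the format of Figure 5 and yield, for each
    primitive, its id, type, and list of (x, y) points."""
    iconel = ET.fromstring(srt_xml)
    for dynel in iconel.findall("DYNEL"):
        for frame in dynel.findall("FRAME"):
            for prim in frame.findall("PRIMITIVE"):
                points = [tuple(map(int, p.get("coordinates").split(",")))
                          for p in prim.findall("POINT")]
                yield prim.get("id"), prim.get("type"), points

with open("human_armed.SRT") as f:               # hypothetical file name
    for pid, ptype, pts in list_primitives(f.read()):
        print(pid, ptype, pts)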
Parameter a might have 32 different values in a range [0, 31] pixels while
parameters b, c and d might have 11 values 0%, 10%, 20%, …, 90%, and
100%. In this way we generate parametric icons.
A collision in BGL can be automatically detected using standard clipping algorithms from computer graphics. For rectangular icons this is trivial, and for rounded icons it remains simple. For more complex icons it can be more challenging, but assuming vectorized icons represented as objects, we can use a polygon clipping algorithm to test for collision.
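For the rectangular case that the text calls trivial, a bounding-box overlap test suffices. A minimal sketch, with icon boxes given as (x, y, width, height) tuples (our convention):

def rect_collision(a, b):
    """Return True if two axis-aligned icon bounding boxes overlap."""
    ax, ay, aw, ah = a
    bx, by, bw, bh = b
    return ax < bx + bw and bx < ax + aw and ay < by + bh and by < ay + ah

# Two 32 x 32 icons, the second shifted so that it overlaps the first.
print(rect_collision((0, 0, 32, 32), (24, 24, 32, 32)))   # True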
BGL uses rules to avoid and resolve collisions. Below we present exam-
ples of such rules:
Rule 1. If iconel x is a human target and iconel y is an armament/weapon
then y should be posted on x and made 80% of x size:
x ↵ (80%y).
Further rules use expressions of the same form, for example f(m1,m2) ↵ (30%y) and f(x) ↵ colorAlternate(n%y).
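A minimal sketch of applying a rule of this form follows. The Iconel record and the kind labels are hypothetical; only the superimposition and the 80% resizing come from Rule 1 itself:

from dataclasses import dataclass, replace

@dataclass
class Iconel:
    kind: str      # e.g. "human_target" or "weapon" (hypothetical labels)
    x: float
    y: float
    size: float

def apply_rule_1(x, y):
    """Rule 1: if x is a human target and y is an armament/weapon,
    post y on x and make it 80% of x's size, i.e. x <- (80% y)."""
    if x.kind == "human_target" and y.kind == "weapon":
        return replace(y, x=x.x, y=x.y, size=0.8 * x.size)
    return y

target = Iconel("human_target", 100.0, 100.0, 32.0)
gun = Iconel("weapon", 0.0, 0.0, 32.0)
print(apply_rule_1(target, gun))   # weapon repositioned on target, size 25.6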
In this case study, we use DARPA MUC-3 and MUC-4 data on terrorist activities in Latin America in the 1980s. MUC data are now in the public domain at NIST and downloadable from [MUC Data Sets, NIST, 2001].
Table 2 presents a summary of the raw text corpus and Table 3 provides a
sample of raw text message.
The case study uses structured data called the development corpus that
has been produced by 15 MUC teams from raw texts during MUC-3/MUC-4
using manual categorization and tagging. The MUC-3/MUC-4 task was to
automatically extract information about terrorist incidents from raw test
texts compiled for two years using the structured development corpus as
training data. The goal was to determine when a given text contained rele-
vant or irrelevant information. Each team stored their results in a template
format, one template per event.
The template format contains 24 attributes called slots. The text example above has been converted to two output templates because it describes two terrorist events: the kidnapping of Castellar and the bombing of Torrado's four-wheel drive vehicle.
Template attributes cover the date and location of the incident, the type
of incident (24 types of violence), the perpetrators, victims, and physical
targets in the attack, and the effect on the target. If there is more than one
value for a slot then options are separated by a “/” [Hobbs et al., 1996].
The twenty-four types of violence include eight basic types (such as mur-
der, bombings, kidnappings, arson, and attacks) plus two variations on each
(for threatened incidents and attempted incidents). There are also judgmental
attributes such as perpetrator confidence, concerning the reliability of the
perpetrator's identity.
The manual process of obtaining structured data as a development corpus
was time consuming and non-trivial:
… it takes an experienced researcher at least three days to cover 100 texts
and produce good quality template representations for those texts. This is
an optimistic estimate, which assumes familiarity with a stable set of en-
coding guidelines [Lehnert & Sundheim, 1991].
Although automatic content extraction is not the focus of our research,
MUC results set up a benchmark for the level of effort needed for obtaining
structured data as input for Bruegel iconic visual correlation system.
The algorithm for deciding what part of a textual message will become iconic and what will be placed into the “mouse over” pop-up uses three criteria (a sketch in code follows the list):
• Terms included in an ontology (key-words) go into the icon,
• The most frequent terms go into the icon, and
• Personal names and individual organization names go into the
“mouse over” pop-up.
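A minimal sketch of this three-way routing; the crude proper-name test and the frequency threshold are placeholders for the ontology and name lists the system actually relies on:

def route_terms(terms, ontology, freq, freq_threshold=10):
    """Split terms into those rendered as icons and those placed
    into the "mouse over" pop-up, following the three criteria."""
    icon_terms, popup_terms = [], []
    for t in terms:
        if t.istitle():                 # crude test for personal/organization names
            popup_terms.append(t)
        elif t in ontology or freq.get(t, 0) >= freq_threshold:
            icon_terms.append(t)        # ontology key-words and frequent terms
        else:
            popup_terms.append(t)
    return icon_terms, popup_terms

icons, popup = route_terms(["bombing", "Castellar", "vehicle"],
                           ontology={"bombing"}, freq={"vehicle": 3})
print(icons, popup)                     # ['bombing'] ['Castellar', 'vehicle']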
To build an ontology, we analyzed the frequencies of MUC terms. The most frequent terms have been found in the following steps (sketched in code after the list):
Step 1. Create a parser to dissect the MUC files, particularly the data
fields where three classes of phrases were defined based on their context:
KEYWORDS such as accomplished and civilian.
PLAIN_TEXT_STRINGS such as names and extended text descriptions.
DESCRIPTORS such as further data about locations, e.g., city and town.
Step 2. Run the parser on MUC data
Step 3. Format and track the results with regard to their usage count and
location within categories and sub-sections.
Step 4. Generate a list of the 100 most frequently occurring phrases in the
MUC_TST_1 and MUC_TST_2 files.
Step 5. Design icons for the 100 most common words tracked, which
constitute a large portion of the data contained in the MUC files.
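Steps 2-4 reduce, in miniature, to frequency counting. A minimal sketch over single words (the real parser tracks phrases and their locations within categories and sub-sections):

import re
from collections import Counter

def top_phrases(muc_texts, n=100):
    """Count word usage across MUC files and return the n most
    frequent items, a single-word stand-in for phrase tracking."""
    counts = Counter()
    for text in muc_texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)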
5.3 A Demonstration
(a) The slashes across the lower right indicate that this is a small band (blue scale) of armed rebels (or terrorists). (b) This icon describes a fairly large number of civilian targets (blue scale) that were hurt pretty badly (red scale) by some action.
Figure 9. Possible icons for MUC concepts. See also color plates.
the quantity or quality of the information displayed in the icon. Thus, the
meaning of the vertical scale and “slash” scales is content-dependent.
The meaning depends on the main content of the icon. For example, a
yellow mark in the green section of the vertical scale in Figure 9(a) can indi-
cate a high confidence in the data depicted in this icon. It clearly cannot represent confidence in other data not present in the icon.
The vertical scale also can be interpreted as two additional content-
dependent scales that represent the relevance of the icon content to the
evaluating party. The top green half can represent importance, the bottom
red half can represent threat. Assigning values may require more information
about an item than is available from the source being visualized. In this case
there must be some database available that can supply additional informa-
tion.
Figures 10-14 show the icons developed as a result of the analysis of the
most common words in the MUC files. These icons are used in Bruegel’s
listview to represent records for visual correlation. Another view imple-
mented in Bruegel is called macroview. It is mostly based on simple filled
rectangles possibly combined with simple texture. These simple icons allow
the encoding of more records into a single screen than the rich listview icons
allow, but macroview icons provide less detail.
Figure 10. Bruegel icon examples: base icons. See also color plates.
(Captions, icons not reproduced: an organization (a tree icon) of a medium size (encoded by green lines) and relatively high confidence (encoded by a yellow mark on red); several soldiers as perpetrators, encoded by the soldier icon, a red modifier for perpetrators, 5 blue lines for “several” and a yellow mark for “medium” in the red confidence scale; a terrorist act with dynamite and significant damage (red lines).)
Figure 14. Bruegel icon examples: location icon types. See also color plates.
The convention of BIL is to use a base icon and attached modifiers. For
instance, the first icon in the second row of Figure 12 shows a government
official as a target using the target modifier. This official was also pretty im-
portant as can be seen by the scale on the left. Similarly, the icon in the mid-
dle of the second row shows that a fair sized band of soldiers from a rather
hostile government committed some act, most likely against something or
someone important to the country. If more than one target is described it is
suggested that, in order to not hinder the intuition, only the most important
target be represented, and then the others implied with the Quantity scale
shown as here.
The iconic language presented above provides one example of the languages that can be loaded into the Bruegel system along with the appropriate icons matched to the ontology. Some implemented examples are presented in Figure 15. A time test for reading iconic sentences in Bruegel is available online [Kovalerchuk, Brown & Kovalerchuk, 2001].
As before, the whole ontology of drug trafficking can be encoded in
icons as XML files. Similar to file system navigation, the size of the icons
also conveys information. For instance, the size of the “Legal” icon can rep-
resent the amount of legal information collected or level of potential legal
implications for the offender. Visual correlation and navigation in such vis-
ual sentences can reveal patterns and new trends in drug trafficking.
7.3 Visual correlation for medical research
8. CONCLUSIONS
This chapter described the basic concepts of the Bruegel iconic visualization and visual correlation system. The description includes Bruegel functionality and Bruegel's ability to compress information via iconic semantic zooming and dynamic iconic sentences. The chapter provides a brief description of Bruegel's architecture and tools.
The formal Bruegel iconic language for automatic icon generation is also
outlined. BGL specifies the layering of iconographic elements into complex
icons in order to represent textual content in a space efficient manner.
Dynico is a supporting tool for BGL that aids in the dynamic generation of complex icons, iconic sentences, and spreadsheets of icons.
9. ACKNOWLEDGMENTS
10. EXERCISES AND PROBLEMS
Advanced
2. Design a set of iconels for exercise 1 that will permit you to build your
icons dynamically on demand. Build a formal language for such iconel
combinations including iconel repositioning and resizing.
3. Offer a new application domain for an iconized ontology and work out exercises 1 and 2 for this domain.
11. REFERENCES
Bliss, C K. Semantography-Blissymbolics. 3rd ed. N. S. W., Sydney. Semantography-
Blissymbolics Publications, 1978.
Bliss for Windows, version 6.2, 2002, Handicom, NL.
http://www.handicom.nl/DOWNLOAD/BLISSFW/BFW62_UK.EXE
Chang S.K., Principles of Pictorial Information System Design, Prentice-Hall, 1989
Chang S. K., Polese G., Orefice S., Tucci M. A Methodology and Interactive Environment
for Iconic Language Design, University of Pittsburgh, Pittsburgh,
http://www.cs.pitt.edu/~chang/365/mins.html
Chang, S. K., Bottoni, P., Costabile, M. F., Levialdi, S., Mussio, P., On the Specification of Dynamic Visual Languages, Proceedings of IEEE Symposium on Visual Languages, Sept 1-4, 1998, Halifax, Canada, 14-21.
Chang, S. K., Bottoni, P., Costabile, M. F., Levialdi, S., Mussio, P., Modeling Visual Interaction Systems through Dynamic Visual Languages, IEEE Transactions on Systems, Man and Cybernetics, Vol. 32, No. 6, November 2002, 654-669.
Cardie, C., Domain-specific Knowledge Acquisition for Conceptual Sentence Analysis, PhD
Thesis. Dept. of Computer Science Technical Report, 1994. http://www-
nlp.cs.umass.edu/ciir-pubs/UM-CS-1994-074.pdf
Gershon, N., Page, W., What storytelling can do for information visualization, Communications of the ACM, vol. 44, No. 8, 2001, pp. 31-37.
Healey, C. Formalizing artistic techniques and scientific visualization for painting renditions
of complex information spaces. IJCAI-01, Seattle, 2001, pp. 371-376
Davis, M. Media streams: an iconic visual language for video annotation, 1995
http://www.w3.org/People/howcome/p/telektronikk-4-93/Davis_M.html#KREF7
Dictionary of Blissymbolics, 2001, http://www.symbols.net/blissymbolics/dictionary.html
Hobbs,J.R., Appelt, D., Bear, J., Israel, D., Kameyama,M., Stickel, M., and Tyson, M. FAS-
TUS: Extracting Information from Natural-Language Texts, Artificial Intelligence Center,
SRI International, CA, 1996, http://www.ai.sri.com/~appelt/fastus-schabes.pdf
Kovalerchuk, B, Brown, J., Kovalerchuk, M. Usability study webpage
http://www.cwu.edu/~borisk/timetest, 2001
Lehnert, W, Sundheim, B., A Performance Evaluation of Text-Analysis Technologies, AI
Magazine, 81-94. 1991, http://www-nlp.cs.umass.edu/ciir-pubs/aimag2.pdf
Lehnert, W.G., C. Cardie, D. Fisher, J. McCarthy, E. Riloff and S. Soderland. Univer-
sity of Massachusetts: Description of the CIRCUS System as Used for MUC-4, in The
Proceedings of the Fourth Message Understanding Conference. 1992, pp. 282-288.
Lehnert, W.G., C. Cardie, D. Fisher, J. McCarthy, E. Riloff, and S. Soderland. University of
Massachusetts: MUC-4 Test Results and Analysis, In: The Proceedings of the Fourth
Message Understanding Conference. 1992, pp. 151-158.
Lehnert, W. Natural Language Processing Overview, In: 1993 Research Brochure for the
Department of Computer Science at the University of Massachusetts, Amherst.
http://www-nlp.cs.umass.edu/ciir-pubs/NLP_overview.pdf
Lehnert, W.G. Cognition, Computers and Car Bombs: How Yale Prepared Me for the 90's, in
Beliefs, Reasoning, and Decision Making: Psychologic in Honor of Bob Abelson (eds:
Schank & Langer), Lawrence Erlbaum Associates, Hillsdale, NJ. pp. 143-173., 1994,
http://www-nlp.cs.umass.edu/ciir-pubs/cognition3.pdf
[MUC-3] Proceedings of the Third Message Understanding Conference (MUC-3). San Diego,
CA: Third Message Understanding Conference (Ed. Sundheim, B.M. 1991).
[MUC-4] Proceedings of the Fourth Message Understanding Conference (MUC-4), Morgan Kaufmann Publ., 1992.
MUC Data Sets, NIST, 2001, http://www.itl.nist.gov/iaui/894.02/related_projects/muc/
Spence, R. Information Visualization, ACM Press, Addison-Wesley, 2001
PART 4
Key words: Dynamic Visualization, Text Visualization, Remote Sensing Imagery, Hydro-
climate Dataset, Transient Data Stream
1. INTRODUCTION
© 2003 IEEE. Portions reprinted, with permission, from [Wong, Foote, Adams, Cowley &
Thomas, 2003]
2. RELATED WORK
Rex, Dowson, Walters, May & Moon, 1997], and VxInsight [VxInsight,
2003]—were developed based on these two models.
Although MDS is frequently applied to text analysis, researchers also use
MDS to visualize images and climate modeling information. For example,
[Rodden, Basalaj, Sinclair & Wood, 1999] discuss a novel image scatterplot
without overlapping and [Wong, Foote, Leung, Adams & Thomas, 2000]
present a data signature scheme to study the trend of climate modeling data-
sets. Of particular note is that the prior work presented here has assumed a
static data collection, which is very different from the dynamic data streams
discussed in this chapter.
The demo corpus has 3,298 news articles collected from open sources during April 20-26, 1995. It has strong themes associated with the bombing of the U.S. Federal Building in Oklahoma, the O.J. Simpson trial, and the French elections.
The first step in processing the corpus is to identify a set of content-
bearing words [Bookstein, Klein & Raita, 1998] from the documents. Words
separated by white spaces in a corpus are evaluated within the context of the
corpus to assess whether a word is interesting enough to be a topic. The co-
occurrence or lack of co-occurrence of these words in documents is used to
evaluate the strengths of the words.
The second step is to construct the document vectors for the corpus. A
document vector, which is an array of real numbers, contains the weighted
strengths of the interesting words found in the corresponding document.
These vectors are normalized and the result is a document matrix that repre-
sents the corpus. In our example, a document vector contains 200 numbers.
Because there are 3,298 documents, the dimensions of the document matrix
are 3,298×200. Details of our text engine can be found in [Wise, Thomas,
Pennock, Lantrip, Pottier, Schur & Crow, 1995; Wise, 1999].
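A minimal sketch of this construction (our code, not the chapter's text engine; the random matrix stands in for real weighted term strengths, and unit-length row normalization is an assumption):

import numpy as np

def normalize_rows(doc_matrix):
    """Normalize each document vector (row) to unit length."""
    norms = np.linalg.norm(doc_matrix, axis=1, keepdims=True)
    return doc_matrix / np.where(norms == 0.0, 1.0, norms)

weights = np.random.rand(3298, 200)   # stand-in for weighted term strengths
doc_matrix = normalize_rows(weights)
print(doc_matrix.shape)               # (3298, 200)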
4. MULTIDIMENSIONAL SCALING
(Scatterplot not reproduced; the clusters are annotated “O.J. Simpson trial”, “French elections” and “Oklahoma bombing”.) The scatterplot was generated by a classical MDS process using the corpus described in Section 3.1. In this example, documents with similar themes are clustered together as annotated.
Our first stratification strategy is to cut down the dimensions of the data
vectors. The biggest challenge is to reduce the physical size of the vectors
but maintain most of their important contents. We accomplish this by apply-
ing dyadic wavelets [Mallat 1998; Strang & Nguyen, 1997] to decompose
individual vectors (and thus compress them) progressively. While the theory
of wavelets is extensive, our experiments show that the rectangular (piece-
wise-constant) Haar wavelets perform well in all our visualizations. Not sur-
prisingly, the basic Haar also outperforms all the other wavelet candidates in
processing time, which is considered a top priority for analyzing data
streams [Gilbert, Kotidis, Muthukrishnan & Strauss, 2001].
Figure 4 shows an example of two consecutive wavelet decompositions
on a document vector randomly selected from the demo corpus (described in
Section 3.1.) Figure 4a is the original vector with 200 terms. Figure 4b is the
272 Chapter 11
result of the first wavelet decomposition with 100 terms. Figure 4c is the
result of the second wavelet decomposition with 50 terms. Because Haar
belongs to the dyadic wavelet family, one wavelet application will reduce
the vector dimension by 50%.
Figure 4. a) A document vector with 200 terms. b) Result of the first wavelet decomposition
with 100 terms. c) Result of the second decomposition with 50 terms.
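A minimal sketch of the decomposition in Figure 4, assuming the orthonormal form of the Haar transform (scaled pairwise averages); the detail coefficients are discarded here because only the compressed approximation is kept:

import numpy as np

def haar_decompose(vector):
    """One level of the dyadic Haar wavelet transform: scaled pairwise
    averages halve the vector length (200 -> 100 -> 50 terms)."""
    v = np.asarray(vector, dtype=float)
    return (v[0::2] + v[1::2]) / np.sqrt(2.0)   # low-pass half only

doc = np.random.rand(200)         # stand-in for a 200-term document vector
level1 = haar_decompose(doc)      # 100 terms, as in Figure 4b
level2 = haar_decompose(level1)   # 50 terms, as in Figure 4c
print(len(level1), len(level2))   # 100 50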
Figure 5. Scatterplots generated by MDS using document vectors with sizes equal to a) 200,
b) 100, and c) 50 terms. See also color plates.
Figure 6. Scatterplots generated by MDS using a) 3298, b) 1649, and c) 824 document
vectors. See also color plates.
So far we have focused only on the classical MDS [Cox & Cox, 1994] in
our investigation. To show the flexibility of our adaptive visualization tech-
nique, we demonstrate a second scaling example using a least-square MDS
technique known as Sammon Projection [Sammon, 1969].
Classical MDS treats similarity between two vectors directly as Euclid-
ean distances whereas least-square MDS takes it as the least squares of a
continuous monotonic function. A particular strength of a Sammon Projec-
tion over a classical MDS projection in visualization is that the former usu-
ally has fewer overlapping clusters due largely to its non-linear mapping ap-
proach.
Figure 8 shows a re-execution of Figure 7 using the Sammon Projection
technique. Although the visualization results look very different from those
in Figure 7 because of the preservation of higher dimensional distances, the
impact of the stratification and its results is very much like that in Figure 7. Most of the scatter points are able to maintain their original positions
and orientations. The four point colors (red, green, blue, and orange), which
are assigned after a K-means clustering process, clearly show the integrity of
the visualization after substantial dimension and vector reductions.
Figure 8. A scatterplot matrix demonstrates the impact of reducing document vectors (row)
versus reducing vector dimensions (column) using the Sammon Projection technique. See also
color plates.
Figure 9. A scatterplot matrix demonstrates the effects of reducing pixel vectors (row) versus
reducing vector dimensions (column) using remote sensing imagery. See also color plates.
Figure 10. Colors generated by the scatterplot clusters clearly identify different features of
the images shown in Figure 1c. See also color plates.
Figure 11. Two scatterplots filled with white noise
1. Translate the two scatterplots so that they both have their centroids at the origin, by subtracting from each point the mean coordinates of its scatterplot.
2. Rotate X to match Y by multiplying X with (X^T Y Y^T X)^(1/2) (Y^T X)^(-1).
3. Dilate the scatter points in X by multiplying each of them by tr((X^T Y Y^T X)^(1/2)) / tr(X^T X).
4. Compute the matching index between X and Y: 1 - {tr((X^T Y Y^T X)^(1/2))}^2 / {tr(X^T X) tr(Y^T Y)}.
The goal is to seek the isotropic dilation and the rigid translation, reflec-
tion, and rotation required to match one scatterplot with the other. The
matching index calculated in Step 4 ranges from zero (best) to one (worst).
For example, we can match the scatterplot in Figure 11b to the one in Figure 11a by a rotation of -36 degrees (Step 2) followed by a scaling of 2 (Step 3). The matching index (Step 4) of this Procrustes analysis is 2.21008 × 10^-30, which indicates the two scatterplots are nearly identical.
Bear in mind that we use Procrustes analysis to measure the similarity
between two 2-D scatterplots, not the original high-dimensional datasets. In
other words, we merely use Procrustes analysis as a means to remove much
of the human subjectivity when we peruse the scatterplot patterns.
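The four steps collapse into a few lines of linear algebra because tr((X^T Y Y^T X)^(1/2)) equals the sum of the singular values of X^T Y. A minimal NumPy sketch (our code, not the chapter's), repeating the Figure 11 experiment with synthetic points:

import numpy as np

def procrustes_index(X, Y):
    """Matching index of Step 4: 0 is a perfect match, 1 the worst.
    X and Y are k x 2 matrices of scatterplot coordinates."""
    X = X - X.mean(axis=0)        # Step 1: center both scatterplots
    Y = Y - Y.mean(axis=0)
    s = np.linalg.svd(X.T @ Y, compute_uv=False)
    return 1.0 - s.sum() ** 2 / (np.trace(X.T @ X) * np.trace(Y.T @ Y))

rng = np.random.default_rng(0)
A = rng.normal(size=(500, 2))     # white-noise scatterplot
theta = np.radians(-36.0)         # rotate by -36 degrees and scale by 2
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])
B = 2.0 * A @ R
print(procrustes_index(A, B))     # ~0, i.e., nearly identical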
Table 2 shows the results of Procrustes analyses that were carried out on
the corpus scatterplots in Figure 7. The very low index values (from 0.016 to
0.14) in Table 2 indicate that all eight scatterplots generated by stratified
vectors are extremely similar to the full resolution one using all 3,298 vectors with all 200 dimensions. These highly accurate results and the notable
97.5% time reduction in generating one of them (reported in Table 1) are
strong evidence that the two demonstrated stratification approaches are vi-
able solutions in visualizing transient data streams.
Table 2. Matching indices between the eight document corpus scatterplots and the original full resolution one shown in Figure 8
Vectors \ Dimensions | 200 | 100 | 50
All (3,298) | 0.0 (self) | 0.0224058 | 0.0841326
1/2 (1,649) | 0.0162034 | 0.0513420 | 0.1114290
1/4 (824) | 0.0329420 | 0.0620215 | 0.1417580
To further support our argument, we provide the matching results of the
remote sensing imagery scatterplots shown in Figure 9 in Table 3. The
matching indices listed in Table 3 are even lower than those listed in Table
2. Even the worst case (1/4 dimension, 1/4 vectors) accomplishes an identi-
cal matching index up to four significant figures.
Table 3. Matching indices between the eight remote sensing imagery scatterplots and the full resolution one shown in Figure 10
Vectors \ Dimensions | 169 | 84 | 42
All (4096) | 0.0 (self) | 0.000004106 | 0.0000565361
1/2 (2048) | 0.000000279 | 0.000004136 | 0.0000567618
1/4 (1024) | 0.000004299 | 0.000007314 | 0.0000577721
Figure 12. Scatterplot a) (top right) is generated from the demo imagery (top left). Scatter-
plots b) to d) (bottom left) are generated from the corresponding cropped areas. Scatterplots e)
to g) are generated by extracting the scatter points from a) that are found in the corresponding
cropping windows which generate scatterplots b) to d). See also color plates.
Our next step is to generate three more scatterplots (Figures 12e to 12g)
using the corresponding pixel vectors found in Figures 12b to 12d. This time
we use the Eigenvectors computed from the entire hyperspectral imagery
(instead of the local cropped windows) to construct all three scatterplots.
This can be done by reusing the coordinates of the selected scatterpoints
from Figure 12a, which is constructed using Eigenvectors from the entire
imagery. In other words, Figures 12b and 12e are generated using the same pixel
vectors, as are Figures 12c and 12f and Figures 12d and 12g. However, Fig-
ures 12b to 12d use local Eigenvectors of the cropped regions whereas the
ones in Figures 12e to 12g use global Eigenvectors of the entire imagery.
The resultant scatterplots in Figures 12b to 12g show that the three corre-
sponding pairs (i.e., Figures 12b and 12e, Figures 12c and 12f, and Figures
12d and 12g) closely resemble each other. This visual-based conclusion is
consistent with the near zero Procrustes matching indices shown in Table 4,
which imply a close similarity among the pairs.
Table 4. Procrustes matching indices of the three scatterplot pairs shown in Figure 12
Pair | 12b vs. 12e | 12c vs. 12f | 12d vs. 12g
Matching index | 0.000718745 | 0.0000381942 | 0.000683066
Because the first Eigenvector is the line though the centroid of the scatter
points along which the variance of the projections is greatest (not necessarily
the direction of the greatest ranges or extent of the data) and the second Ei-
genvector is orthogonal to the first one, these Eigenvectors tend to be very
robust to changes unless a substantial amount of disparate information is
added. The property is particularly noteworthy because of the frequently
high similarities among neighboring data streams. This remarkable combina-
tion becomes the foundation of our next visualization technique on data
streams.
Figure 13. An illustration of our multiple sliding window design in visualizing data streams.
See also color plates.
depicted in Figure 15a. Finally, the colors of individual vectors are mapped to the corresponding coordinates of the western US map as shown in Figure 15b.
Figure 14. a) The Eigenvectors of the scatterplot are computed using 100% of the pixel vec-
tors. b) The Eigenvectors of the scatterplot are computed from 75% of the pixel vectors. The
other 25% are projected onto the scatterplot by dot-product. c) The Eigenvectors of the scat-
terplot are computed from 50% of the pixel vectors. The other 50% are projected by dot-
product. See also color plates.
Similar to the previous CIR example in Figure 14, we start with a scatterplot
generated using 100% of the vectors as shown in Figure 16a. We then use
75% of the vectors to determine their two major Eigenvectors. The rest of
the 25% are inserted into the scatterplot using dot-products. The result is
shown in Figure 16b. Finally, 50% of the vectors (on the left side) are used
to determine the Eigenvectors of the scatterplot and the rest of the vectors
(on the right side) are inserted into the scatterplot by dot-products.
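A minimal sketch of this partial-fit idea, assuming a PCA-style scatterplot (array sizes and names are illustrative, not the chapter's code):

import numpy as np

def project_with_partial_fit(vectors, fit_fraction):
    """Compute the two major Eigenvectors from a leading fraction of the
    vectors and place all vectors on the 2-D scatterplot by dot-products."""
    n_fit = int(len(vectors) * fit_fraction)
    mean = vectors[:n_fit].mean(axis=0)
    cov = np.cov(vectors[:n_fit] - mean, rowvar=False)
    eigvals, eigvecs = np.linalg.eigh(cov)
    basis = eigvecs[:, np.argsort(eigvals)[-2:]]   # two major Eigenvectors
    return (vectors - mean) @ basis                # dot-product projection

vectors = np.random.rand(6155, 12)                 # stand-in for hydroclimate vectors
coords_full = project_with_partial_fit(vectors, 1.00)   # as in Figure 16a
coords_75 = project_with_partial_fit(vectors, 0.75)     # as in Figure 16b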
Figure 15. a) A scatterplot of 6,155 hydroclimate vectors divided into 10 clusters. Each clus-
ter is represented by a unique random color. b) Corresponding cluster colors are projected to
the map position. See also color plates.
Based on visual analysis, Figures 16a and 16b are almost identical. The orientation of the scatterplot in Figure 16c rotates slightly in a counterclockwise direction. Nevertheless, the shape and the integrity of the scatterplot remain intact and look very similar to Figure 16a. The near-zero Procrustes analysis indices shown in Table 6 show that the three scatterplots are indeed extremely similar.
Figure 16. a) The Eigenvectors of the scatterplot are computed using 100% of the hydroclimate vectors. b) The Eigenvectors of the scatterplot are computed from 75% of the vectors. The other 25% are projected onto the scatterplot by dot-product. c) The Eigenvectors of the scatterplot are computed from 50% of the vectors. The other 50% are projected by dot-product. See also color plates.
11. CONCLUSIONS
12. ACKNOWLEDGMENTS
13. EXERCISES AND PROBLEMS
3. Design a new visualization technique that combines both visual and non-
visual matching results described in Section 7.
14. REFERENCES
Aureka (2003). Retrieved Dec 2003 from: http://www.aurigin.com/aureka.html.
Babcock, B., Babu, S., Datar, M., Motwani, R., and Widom, J. Models and Issues in Data
Stream Systems. Proceedings of the 2002 ACM Symposium on Principles of Database
Systems (PODS 2002), ACM Press, 1-16, 2002.
Babu, S. and Widom, J. Continuous Queries over Data Streams. SIGMOD Record, ACM
Press, 109-120, 2001.
Bentley, C. L., Ward, M. O. Animating Multidimensional Scaling to Visualize N-Dimensio-
nal Data Sets. Proceedings IEEE Symposium on Information Visualization ’96, IEEE CS
Press, 72-73, 1996.
Bookstein, A., Klein, S. T., and Raita, T. Clumping Properties of Content-Bearing Words.
Journal of the American Society for Information Science and Technology, 49, 2, 102-114,
1998.
Wise, J. A., Thomas, J. J., Pennock, K., Lantrip, D., Pottier, M., Schur, A., and Crow, V.
Visualizing the Non-Visual: Spatial Analysis and Interaction with Information from Text
Documents. Proceedings IEEE Symposium on Information Visualization ’95, 51-58, 1995.
Wong, P. C. and Bergeron, R. D. Multivariate Visualization Using Metric Scaling. Proceed-
ings IEEE Visualization ’97, IEEE CS Press, 111-118, 1997.
Wong, P. C., Foote, H., Adams, D., Cowley, W., and Thomas, J. Dynamic Visualization of
Transient Data Streams. Proceedings IEEE Symposium on Information Visualization 2003,
IEEE CS Press, 97-104, 2003.
Wong, P. C., Foote, H., Leung, L. R., Adams, D., and Thomas, J. Data Signatures and Visu-
alization of Very Large Datasets. IEEE Computer Graphics and Applications, 20, 2, IEEE
CS Press, 12-15, 2000.
Chapter 12
SPIN! — AN ENTERPRISE ARCHITECTURE
FOR DATA MINING AND VISUAL ANALYSIS
OF SPATIAL DATA
Abstract: The rapidly expanding market for Spatial Data Mining systems and
technologies is driven by pressure from the public sector, environmental
agencies and industry to provide innovative solutions to a wide range of
different problems. The main objective of the described spatial data mining
platform is to provide an open, highly extensible, n-tier system architecture
based on the Java 2 Platform, Enterprise Edition (J2EE). The data mining
functionality is distributed among (i) a Java client application for visualization and workspace management, (ii) an application server with an Enterprise Java Beans (EJB) container for running data mining algorithms and workspace management, and (iii) a spatial database for storing data and spatial query
execution. In the SPIN! system visual problem solving involves displaying
data mining results, using visual data analysis tools, and finally producing a
solution based on linked interactive displays with different visualizations of
various types of knowledge and data.
Key words: Spatial data mining, Interactive visual analysis, Enterprise architecture
1. INTRODUCTION
induction and spatial cluster analysis. The system then combines these
methods with the rich interactive functionality of visual data exploration,
thus offering an integrated, distributed environment for spatial data analysis.
The SPIN! spatial data mining system has a component architecture. This
means that it provides the infrastructure and environment while all the
system functionality is provided by separate software modules called
components. Components can be easily added allowing for the expansion of
system capabilities.
Each component is developed as an independent module for solving a
limited number of tasks. For example, there may be components for data
access, analysis or visualization. In order to solve complex problems,
components need to communicate with and utilize the capabilities of each
other. For example, when an algorithm needs data to be loaded, it asks
another component to do this task.
To support interactions among components we developed a Common
Connectivity Framework called CoCon — a generic library written in Java
consisting of a number of interfaces and classes. The idea is that components
can be connected by means of different types of connections. Currently, there are three connection types available: visual, hierarchical and user-defined.
Visual connections are used to link a component with its view (similar to the
Model-View-Controller architecture). Hierarchical connections are used to
compose parent-child relationships among components in the workspace.
An example of this connection is the linking of a workspace folder with its
elements. User-defined connections are the principal type used in the system.
These connections allow for the arbitrary linking of components in the
workspace as required by the task to be solved. Using such a connection, we
could visualize a data mining result on a map by connecting the two
corresponding components.
All components are implemented on the basis of the CoCon common
connectivity framework, which allows them to communicate within one
workspace. It is important that components explicitly declare connectivity
capabilities. This includes how to connect to a given component and with
what other components a given component can work. These capabilities can
be described either statically or dynamically. A static description consists of
listing the necessary descriptors such as the ability of a component to accept
an incoming connection from another component of some class. On the other
hand, a dynamic determination can be made at run time by asking each component.
Figure 1. The SPIN! client provides views for its components: rule base (upper right
window), database connection (lower left window), database query and algorithm (lower right
windows). The workspace is visualized in the form of a tree view and a graph view.
Figure 2. SPIN! platform architecture. The main components are a Java-based client, an
Enterprise Java Beans Container and one or more databases serving spatial and non-spatial
data.
The user loads a workspace from a central store into the client and begins working. When the work is finished, the workspace is stored back in its initial location or perhaps a new one. A persistent workspace can be implemented in two
alternate ways:
1. the whole workspace can be serialized and stored in one object like a
local/remote file or a database record, or
2. the workspace components and connections can be stored separately
in different database records.
The first approach is much simpler but it is difficult to share workspaces.
The second approach allows us to treat workspace components as individual
objects even within persistent storage; that is, the whole workspace graph
structure is represented in the storage.
Figure 3. A workspace is a graph where nodes are components and edges are connections
between them. All workspaces are stored in a database and retrieving a workspace means
finding its component and connection objects. The persistent workspace management
functionality is implemented as a session bean, which manipulates two types of entity beans:
workspace components and workspace connections.
Workspace components and connections are serialized. We used XML for serialization; that is, any object state is represented as XML text.
The functionality of remote workspace management is implemented by a
special session bean. This EJB has functions for loading and storing
workspaces. If the workspace is stored as a set of its constituents then the
session bean uses entity beans that correspond to the workspace components.
The state of such workspaces is stored in two tables: one for nodes and one
for edges. There exist two classes of entity beans, which are used to
manipulate these two tables. The workspace management architecture for
this case is shown in Figure 3.
The rule-induction algorithm follows [Savinov, 2000b]; it works with finite-valued attributes and generates rules in one
pass through the data set by using a method of sectioned vectors.
Let us assume that attributes x1, x2, …, xn take a finite number of values
ni from their domains Ai = {ai1, ai2, …, aini}. All combinations of values
ω = ⟨x1, x2, …, xn⟩ ∈ Ω = A1 × A2 × … × An form the state space or universe
of discourse. Each record from a data set corresponds to one combination of
attribute values, or a point. If a record exists in the data set for a combination
of values, then that point is said to be possible. Otherwise, the point is
impossible. To represent the data semantics as a Boolean distribution over the
universe of discourse we use the method of sectioned vectors and matrices
[Savinov, 1999b; Savinov, 2000b]. The main idea of the method is that one
vector can represent a multidimensional interval of possible or impossible
points (also called positive and negative intervals, respectively). Each vector
consists of 0s and 1s that are grouped into sections separated by dots, one
section per attribute. A section consists of ni components corresponding to
all values of the attribute. For example, 01.010.0101 is a sectioned vector for
three attributes taking 2, 3 and 4 values. Each component corresponds to one
attribute value, so the number of components in the vector is equal to the
total number of attribute values. A sectioned vector associates n components
with each point from Ω (one from each section); the positions of these
components in the vector correspond to the point's coordinates. To represent
negative intervals we use a disjunctive interpretation of the sectioned vector.
This means that a point is assigned 0 if all its components in the vector are
0s, and it is assigned 1 if at least one of the components is 1. For example,
the point ⟨a11, a21, a31⟩ is impossible according to the semantics of the vector
01.010.0101 because all three components corresponding to its coordinates
in a11a12.a21a22a23.a31a32a33a34 are zeros. Yet the point ⟨a11, a22, a33⟩ is
possible because the component corresponding to a22 is 1 in the vector
01.010.0101.
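To make the disjunctive semantics concrete, here is a minimal sketch (our illustration in Python, not code from the system) that checks whether a point is possible under a negative sectioned vector; a point is encoded by one 0-based value index per attribute:

def is_possible(vector, point):
    # A point is possible iff at least one of the components selected
    # by its coordinates is 1 (disjunctive reading of the vector).
    sections = vector.split(".")
    return any(sections[i][j] == "1" for i, j in enumerate(point))

v = "01.010.0101"
print(is_possible(v, (0, 0, 0)))  # <a11, a21, a31> -> False (impossible)
print(is_possible(v, (0, 1, 2)))  # <a11, a22, a33> -> True  (possible)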
The main idea behind the algorithm for finding the largest empty intervals
consists in representing the data semantics by a set of negative sectioned vectors
and updating it for each record. Initially the data are represented only by the
empty interval consisting of all 0s, which makes all points impossible. When
a record is added, the interval is split into several smaller negative intervals so
that the point corresponding to this record becomes possible. For example,
adding the record 01.001.0001 (where 1s correspond to its values) to the
interval 00.010.0100 splits it into three new intervals: 01.010.0100,
00.011.0100, and 00.010.0101 (one component changes in each section). During
this procedure very small intervals with many 1s are removed, since they
would generate very specific rules; only the top set of the largest intervals is kept.
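The splitting step itself can be sketched as follows (again our illustration, assuming the same string encoding of vectors); it reproduces the example above:

def split_interval(interval, record):
    """interval: negative sectioned vector; record: 0-based value indexes."""
    sections = interval.split(".")
    result = []
    for i, j in enumerate(record):
        if sections[i][j] == "1":      # record's point is already possible
            return [interval]
        s = list(sections)
        s[i] = s[i][:j] + "1" + s[i][j + 1:]   # set one component to 1
        result.append(".".join(s))
    return result

print(split_interval("00.010.0100", (1, 2, 3)))
# ['01.010.0100', '00.011.0100', '00.010.0101']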
Once the largest empty intervals have been found, they can easily be
transformed into rules by negating the sections that should be in the antecedent.
For example, the vector {0,1} ∨ {0,1,0} ∨ {0,1,0,1} can be transformed into
the implication {1,0} ∧ {1,0,1} → {0,1,0,1}, interpreted as the rule IF
x1 = {a11} AND x2 = {a21, a23} THEN x3 = {a32, a34}. The rules are then
annotated with statistical information in the form of the target value frequencies
within the rule condition interval (for one additional pass through the data
set). In other words, each value in the conclusion is assigned its frequency
within the condition interval, for example, IF x1 = {a11} AND
x2 = {a21, a23} THEN x3 = {a32 : 145, a34 : 178}, which is obviously more
expressive. Here 145 means that the value a32 occurs 145 times within the
selected interval.
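A sketch of this transformation (illustrative Python; the attribute-value names are passed in explicitly) is:

def vector_to_rule(vector, names):
    """names[i][j] is the jth value of attribute i, e.g. 'a32'."""
    sections = vector.split(".")
    negate = lambda s: "".join("1" if c == "0" else "0" for c in s)
    pick = lambda i, s: [names[i][j] for j, c in enumerate(s) if c == "1"]
    # Negate all sections except the last, which becomes the conclusion
    cond = [f"x{i+1} = {pick(i, negate(s))}" for i, s in enumerate(sections[:-1])]
    concl = pick(len(sections) - 1, sections[-1])
    return "IF " + " AND ".join(cond) + f" THEN x{len(sections)} = {concl}"

names = [["a11", "a12"], ["a21", "a22", "a23"], ["a31", "a32", "a33", "a34"]]
print(vector_to_rule("01.010.0101", names))
# IF x1 = ['a11'] AND x2 = ['a21', 'a23'] THEN x3 = ['a32', 'a34']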
Figure 4. Visualization of spatial rules simultaneously and interactively with the map and
other views in the SPIN! system. As one rule is selected in the upper right view all objects
satisfying its condition are dynamically highlighted on the map in the lower right window.
The discovered rules can be visualized on a map. Moreover, if we select a certain rule, all objects that satisfy its
left-hand side are dynamically highlighted on the map so that we can easily
see how they are spatially distributed (Figure 4). For example, we might find
that enumeration districts satisfying a certain rule and thus having interesting
characteristics in terms of the target attribute form a cluster or have a more
complex spatial configuration with respect to other geographic objects such
as roads and cities.
The from part of the generated select statement lists the tables, including
duplicate uses of tables, and the where part includes the link conditions
(transformed from the link specification) and the attributive selectors. For
aggregation queries, a nested two-level select statement is necessary, first
constructing the multirelational attributive part and then generating the
aggregations. Multiple instances of objects of one table are treated by
including the table in the from part several times and the distinction
predicate in the where part.
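As a hedged illustration of such query assembly (hypothetical table, attribute, and function names; not the actual SPIN! generator), the nested two-level select can be composed as follows:

def subgroup_query(tables, link_conds, selectors, key, target):
    # Inner select: the multirelational attributive part
    inner = (f"SELECT {key}, {target} FROM {', '.join(tables)} "
             f"WHERE {' AND '.join(link_conds + selectors)}")
    # Outer select: the aggregations over the attributive part
    return (f"SELECT COUNT(*), SUM({target}) "
            f"FROM ({inner}) AS attributive_part")

print(subgroup_query(
    tables=["district d", "railway r"],
    link_conds=["Intersects(d.geometry, r.geometry)"],
    selectors=["d.unemployment = 'high'"],
    key="d.id", target="d.migration_high"))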
The space of subgroups to be explored within a search depends on the
specification of a relation graph, which includes tables (object classes) and
links. For spatial links the system can automatically identify geometry
attributes by which spatial objects are linked, since there is at most one such
attribute. A relation graph constrains the multi-relational hypothesis space in
a similar way as attribute selection constrains it for single relations.
The search for interesting subgroups is arranged as an iterated general-to-specific,
generate-and-test procedure. In each iteration, a number of parent
subgroups is expanded in all possible ways, the resulting specialized
subgroups are evaluated, and the subgroups to be used as parents for the next
iteration step are selected, until a pre-specified iteration depth is reached or
no further significant subgroup can be found. There is a natural partial
ordering of subgroup descriptions. According to this ordering, a specialization
of a subgroup either adds a further selector to one of the concepts of the
description or introduces an additional link to a further table.
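The overall procedure can be summarized by the following sketch (illustrative Python; expand and quality are placeholders for the specialization operator just described and the quality function discussed next, and the beam width is an assumed parameter, since the text only speaks of "a number of parent subgroups"):

def subgroup_search(root, expand, quality, beam_width=10, max_depth=3):
    parents, found = [root], []
    for _ in range(max_depth):                 # pre-specified iteration depth
        candidates = [s for p in parents for s in expand(p)]
        if not candidates:                     # no further specialization
            break
        scored = sorted(candidates, key=quality, reverse=True)
        parents = scored[:beam_width]          # parents of the next iteration
        found.extend(parents)
    return sorted(found, key=quality, reverse=True)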
The statistical significance of a subgroup is evaluated by a quality
function. As a standard quality function, SubgroupMiner uses the classical
binomial test to verify if the target share is significantly different in a
subgroup:
q = ((p − p0) / √(p0 (1 − p0))) · √n · √(N / (N − n))
This z-score quality function, based on comparing the target group share
in the subgroup (p) with the share in its complementary subset, balances four
criteria: the size of the subgroup (n), the relative size of the subgroup with
respect to the total population size (N), the difference of the target shares
(p − p0), and the level of the target share in the total population (p0). The
quality function is symmetric with respect to the complementary subgroup.
It is equivalent to the χ2-test of dependence between the subgroup S and the
target group T, and to the correlation coefficient for the (binary) subgroup
and target group variables. For
continuous target variables and the deviating mean pattern, the quality
function is similar, using mean and variance instead of share p and binary
case variance p0(1-p0).
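In code, the quality function as reconstructed above reads (illustrative sketch):

from math import sqrt

def quality(n, p, N, p0):
    # n, p: size and target share of the subgroup;
    # N, p0: size and target share of the total population.
    return (p - p0) / sqrt(p0 * (1 - p0)) * sqrt(n) * sqrt(N / (N - n))

print(quality(n=120, p=0.45, N=1000, p0=0.30))  # positive deviation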
All specializations of a parent subgroup are evaluated together by counting the
different value combinations. Thus, for each parent, only one scan over the
database (or one joined product table) is executed. Further optimisations are
achieved by combining those parents that are in the same joined product
space (to eliminate unnecessary duplicate joins).
As in our prior example, we applied this algorithm within the SPIN!
system to UK 1991 census data for Stockport, one of the ten districts in
Greater Manchester, UK. Census data provide aggregated information on
demographic attributes such as persons per household, cars per household,
unemployment, migration, and long-term illness. Their lowest level of
aggregation is the so-called enumeration district. Also available are detailed
geographical layers, among them streets, rivers, buildings, railway lines, and
shopping areas. Data were provided to the project by the project partners
Manchester University and Manchester Metropolitan University.
Assume we are interested in enumeration districts with a high migration
rate. We want to find out how those enumeration districts are characterized,
and especially what distinguishes them from other enumeration districts not
having a high migration rate. Spatial subgroup discovery helps to answer this
question by searching the hypothesis space for interesting deviation patterns
with respect to the target attribute.
Figure 5. Overview of the subgroups found, showing the subgroup descriptions (left). The bottom right
side shows a detail view for the overlap of the concept C (e.g. located near a railway line) and
the target attribute T (high unemployment rate). The window on the right top plots p(T|C)
against p(C) for the subgroup selected on the left and shows isolines as theoretically discussed
in [Klösgen, 1996].
The way data mining results are presented to the user is essential for their
appropriate interpretation. We use a combination of cartographic and non-
cartographic displays linked together through simultaneous dynamic
highlighting of the corresponding parts. The user navigates in the list of
subgroups (Figure 5), which are dynamically highlighted in the map window
(Figure 6). As a mapping tool, the SPIN! platform integrates the
CommonGIS system [Andrienko & Andrienko, 1999], whose strength lies
in the dynamic manipulation of spatial statistical data. Figure 6 shows an
example for the migrant scenario, where the subgroup discovery method
reports a relation between districts with a high migration rate and high
unemployment.
Such an analysis, where the results of data mining and interactive data
analysis are visualized simultaneously, can support non-trivial decisions in
diverse application areas in both public and private sector organizations. In
particular, this approach offers great potential to improve decisions made
by statistical analysts, urban planners, environmental decision makers, and
people working in geomarketing, the management of natural and industrial
hazards, nuclear safety and radiation protection, and many other domains. Combining
the strengths of GIS and Data Mining in a Spatial Mining tool helps the
decision maker to back up her intuitive insights by sound statistics, and to
automatically explore patterns in the data that are invisible to the eye
because they live in high-dimensional spaces.
5. CONCLUSION
6. ACKNOWLEDGEMENTS
8. REFERENCES
Andrienko G., Andrienko N. Interactive Maps for Visual Data Exploration. International
Journal of Geographical Information Science 13(5), 355-374, 1999.
Andrienko G., Andrienko N., Savinov A. Choropleth Maps: Classification revisited.
Proceedings ICA 2001, Beijing, China, Vol. 2, 1209-1219.
Andrienko N., Andrienko G., Savinov A., Voss H., Wettschereck D. Exploratory Analysis of
Spatial Data Using Interactive Maps and Data Mining. Cartography and Geographic
Information Science 28(3), July 2001, 151-165.
Andrienko N., Andrienko G., Savinov A., Wettschereck D. Descartes and Kepler for Spatial
Data Mining. ERCIM News, No. 40, January 2000, 44–45.
Chazelle B., Drysdale R.L., Lee D.T. Computing the largest empty rectangle. SIAM J.
Comput., 15:300-315, 1986.
Edmonds J., Gryz J., Liang D., Miller R.J. Mining for Empty Rectangles in Large Data Sets.
Proceedings of the 8th International Conference on Database Theory (ICDT), London,
UK, January 2001, 174-188.
Ester M., Frommelt A., Kriegel H.P., Sander J. Spatial Data Mining: Database Primitives,
Algorithms and Efficient DBMS Support. Data Mining and Knowledge Discovery 4(2/3),
2000, 193-216.
European IST SPIN! project web site. http://www.ccg.leeds.ac.uk/spin/
JBoss Application Server web site. http://www.jboss.org.
Klösgen W. Explora: A Multipattern and Multistrategy Discovery Assistant. Advances in
Knowledge Discovery and Data Mining, eds. U. Fayyad, G. Piatetsky-Shapiro, P. Smyth,
and R. Uthurusamy, Cambridge, MA: MIT Press, 249–271, 1996.
Klösgen W. Subgroup Discovery. In Handbook of Data Mining and Knowledge Discovery
(Chapter 16.3), Klösgen, W., Zytkow, J., eds. Oxford University Press, New York, 2002.
Klösgen W. Visualization and Adaptivity in the Statistics Interpreter EXPLORA. Proceedings
of the 1991 Workshop on KDD, ed. Piatetsky-Shapiro, G., 25-34, 1991.
Klösgen W., May M. Spatial Subgroup Mining Integrated in an Object-Relational Spatial
Database. PKDD 2002, Helsinki, Finland, August 2002, 275-286.
Klösgen W., Zytkow J. (eds.) Handbook of Data Mining and Knowledge Discovery. Oxford
University Press, 2002.
Knobbe A.J., de Haas M., Siebes A. Propositionalisation and Aggregates. Proc. PKDD 2001,
eds. De Raedt, L., Siebes, A., Berlin:Springer, 277-288, 2001.
Koperski K., Adhikary J., Han J. Spatial Data Mining, Progress and Challenges. Technical
Report, Vancouver, Canada, 1996.
Koperski K., Han J. GeoMiner: A System Prototype for Spatial Mining. Proceedings ACM-
SIGMOD, Arizona, 1997, 553-556.
Krogel M., Wrobel S. Transformation-Based Learning Using Multirelational Aggregation.
Proc. ILP 2001, eds. Rouveirol, C., Sebag, M., Springer, 142-155, 2001.
Ku L.-P., Liu B., Hsu W. Discovering Large Empty Maximal Hyper-rectangles in Multi-
dimensional Space. Technical Report, Department of Information Systems and Computer
Science (DCOMP), National University of Singapore, 1997.
Kuper G.M., Libkin L., Paredaens J. (eds.) Constraint Databases. Berlin:Springer, 2000.
Libkin L. Expressive Power of SQL. Proc. of the 8th International Conference on Database
Theory (ICDT01), eds. Bussche, J, Vianu, V., Berlin:Springer, 1-21, 2001.
Lisi F.A., Malerba D. SPADA: A Spatial Association Discovery System. In A. Zanasi, C.A.
Brebbia, N.F.F. Ebecken and P. Melli (Eds.), Data Mining III, Series: Management
Information Systems, Vol. 6, 157-166, WIT Press, 2002.
Liu B., Ku L.-P., Hsu W. Discovering Interesting Holes in Data. Proceedings of Fifteenth
International Joint Conference on Artificial Intelligence (IJCAI-97), pp. 930-935, August
23-29, 1997, Nagoya, Japan.
Liu B., Wang K., Mun L.-F., Qi X.-Z. Using Decision Tree Induction for Discovering Holes
in Data. Pacific Rim International Conference on Artificial Intelligence (PRICAI-98), 182-
193, 1998.
May M. Spatial Knowledge Discovery: The SPIN! System. Fullerton, K. (ed.) Proceedings of
the 6th EC-GIS Workshop, Lyon, 28-30th June, European Commission, JRC, Ispra, 2000.
May M., Savinov A. An integrated platform for spatial data mining and interactive visual
analysis. Data Mining 2002, Third International Conference on Data Mining Methods and
Databases for Engineering, Finance and Other Fields, 25-27 September 2002, Bologna,
Italy, 51-60.
Orlowski M. A New Algorithm for the Largest Empty Rectangle Problem. Algorithmica,
5(1):65-73, 1990.
Savinov A. Application of multi-dimensional fuzzy analysis to decision making. In Advances
in Soft Computing — Engineering Design and Manufacturing, R. Roy, T. Furuhashi and
P.K. Chawdhry, eds. Springer-Verlag, London, 1999b.
Savinov A. Mining possibilistic set-valued rules by generating prime disjunctions. Proc. 3rd
European Conference on Principles and Practice of Knowledge Discovery in Databases —
PKDD'99, Prague, Czech Republic, September 15-18, 1999a, 536-541.
Savinov A. Mining Interesting Possibilistic Set-Valued Rules. In Fuzzy If-Then Rules in
Computational Intelligence: Theory and Applications, Da Ruan and Etienne E. Kerre, eds.
Kluwer, 2000a, 107-133.
Savinov A. An algorithm for induction of possibilistic set-valued rules by finding prime
disjunctions. In Soft computing in industrial applications, Suzuki, Y., Ovaska, S.J.,
Furuhashi, T., Roy, R., Dote, Y., eds. Springer-Verlag, London, 2000b.
Savinov A. Mining Spatial Rules by Finding Empty Intervals in Data. Proc. of the 7th
International Conference on Knowledge-Based Intelligent Information & Engineering
Systems (KES’03), 3-5 September 2003, Oxford, UK, 1058-1063.
Wrobel S. An Algorithm for Multi-relational Discovery of Subgroups. Proc. of First PKDD,
eds. Komorowski, J., Zytkow, J., Berlin:Springer, 78-87, 1997.
Chapter 13
XML-BASED VISUALIZATION AND
EVALUATION OF DATA MINING RESULTS
Dietrich Wettschereck
The Robert Gordon University, UK
1. INTRODUCTION
Data mining methods generate results during the modeling phase of the knowledge
discovery process.1 These results (a.k.a. models) are then evaluated and, if
found valuable by the user, incorporated into operational systems via SQL
statements or C-programs. The focus of this chapter is the phase between the
generation of the model and its deployment. During this phase, typically
numerous models are evaluated visually and experimentally, most are
discarded, some are modified (manually or automatically) and then re-
evaluated and, finally, very few models are deployed. This phase can be seen
as an exploratory phase with the difference that in the field of information
visualization data are typically explored while in this case models are
explored.
Shneiderman [Shneiderman, 2002] makes four recommendations
regarding the development of discovery tools, and the thesis of this chapter is
that the tool called VizWiz, described below, follows these
recommendations:
1. Integrate data mining and information visualization: VizWiz is
not a data mining tool, but rather an information visualization tool.
However, in a certain sense it does combine these two techniques
since it visualizes data mining results and as such uses information
visualization techniques as a post-processing mechanism for data
mining.
2. Allow users to specify what they are seeking and what they find
interesting: The highly interactive graphical user interface of VizWiz
enables users to quickly zoom in on those (parts of) models that are
most interesting to them. Overview plots showing multiple models
can be utilized to identify those models best suited for the purpose.
For example, a user may prefer models with high coverage over
extremely accurate models or vice versa.
3. Support collaboration: The input and output2 format of VizWiz is an
XML-based standard that is already supported by a variety of other
tools such as IBM’s Intelligent Miner or Clementine from SPSS.
Users working jointly on a specific analysis problem, for example in a
medical application of high public interest, can easily exchange their
(preliminary) models. Furthermore, analysis experts using these rather
complex systems can utilize VizWiz to present their results to
application experts who may not have access to these complex
knowledge discovery tools. Finally, the Java technology in VizWiz
allows for the presentation of data mining models on the Internet, thus
enabling wide dissemination of interesting findings.
1 The CRISP-DM process model [Chapman et al., 2000] divides the knowledge discovery process into six phases: business understanding, data understanding, data preparation, modeling, evaluation, and deployment.
2 VizWiz can write modified or enhanced PMML files.
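As an illustration of why an XML-based standard eases such exchange, the following hedged sketch (not VizWiz code) reads the tree structure from a PMML file with the Python standard library; the tag and attribute names follow the PMML specification, while namespaces, model versions, and the file name are simplified assumptions:

import xml.etree.ElementTree as ET

def walk(node, depth=0):
    # PMML trees are nested <Node> elements; a node's test is given by
    # predicate children such as <SimplePredicate field operator value>.
    pred = node.find("SimplePredicate")
    test = (f"{pred.get('field')} {pred.get('operator')} {pred.get('value')}"
            if pred is not None else "true")
    print("  " * depth + f"{test} -> score={node.get('score')}")
    for child in node.findall("Node"):
        walk(child, depth + 1)

tree = ET.parse("model.pmml")              # hypothetical file name
for model in tree.getroot().iter("TreeModel"):
    walk(model.find("Node"))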
Figure 2. VizWiz visualization of a decision tree, based on the DMG example shown in
Figure 1. The bars on top of each node indicate the class predicted by this node and the text in
each node lists the conditions that must be satisfied to reach this node.
3 Propositional rules are typically of the form “if variable_a = value_1 and variable_b = value_2 then class = class_1”; more complicated conditions are of course possible and supported. First-order rules are typically of the form “if pred_1(A,B) and pred_2(B,C) then class_1(A)”, where pred_1 and pred_2 are predicates such as “father_of” and A, B, and C are variables.
Figure 3. The ROC curve for six models generated from the Cleveland Heart Disease domain
[Blake, Keogh & Merz, 2002]
4 ILP: inductive logic programming.
5 Software that converts WEKA output to PMML is currently under development.
Figure 4. VizWiz detail view of the decision tree model for the Cleveland domain. The panel
on the left hand side shows selected records of the test data set that was used to generate the
ROC curve shown in Figure 3.
6 These bars are colored in reality, but have been changed to patterns for easier viewing in the printed version.
In this view the formal rule notation has been replaced by natural language (“Actor beams down to planet with kirk”).
The mapping is defined in the PMML file, and variables and constants are
replaced automatically by their proper values.
Pie charts have been chosen despite the danger, inherent in all pie chart
visualizations, that the user incorrectly perceives exact group sizes. Feedback
from a variety of technical and non-technical users has indicated that the
intuitive appeal of pie charts outweighs the danger of misinterpretation, and
that users inspect the actual numbers once they have gained an overview through
the pie charts.
Initially, VizWiz displays only portions of the model: in the case of decision trees, only the root is
shown; in the case of association rules, only the rules with the highest
confidence and support values. The user may then choose to
display additional information, which is typically rendered in real time.
4. RELATED WORK
5. DISCUSSION
6. ACKNOWLEDGEMENTS
The ROC plotting software has been developed by J. Farrand from the
University of Bristol. Comments on earlier drafts of this chapter and
suggestions for the improvement of the software have been made by various
members of the SolEuNet consortium, most notably Steve Moyle. I am also
indebted to B. Noble and other colleagues from RGU for their helpful
comments.
This work has been supported in part by the EU funded project SolEuNet
– Data Mining and Decision Support for Business Competitiveness: A European
Virtual Enterprise (IST-1999-11495) and the Research Development
Initiative (RDI) at The Robert Gordon University.
Advanced
3. Figure 5 assumes an implicit ordering of the rules. Is it possible to
encode this ordering in PMML explicitly and if so, how?
4. What are the most useful (filter) controls a graphical user interface
visualizing association rules should offer? Would these controls also be
sensible for other model types? Does this assume the availability of
additional data that is not part of the minimal PMML format?
8. REFERENCES
Blake C., Keogh E., Merz CJ. UCI repository of Machine Learning databases (machine
readable data repository). Irvine, CA: Department of Information and Computer Science,
University of California at Irvine. http://www.cs.uci.edu/mlearn/MLRepository.html
(accessed 15 November 2002).
Blockeel H., Moyle S., Centralized model evaluation for collaborative data mining. In M.
Grobelnik, D. Mladenic, M. Bohanec, and M. Gams, editors, Proceedings A of the 5th
International Multi-Conference Information Society, 2002: Data Mining and Data
Warehousing/Intelligent Systems, pages 100–103. Jozef Stefan Institute, Ljubljana,
Slovenia.
Chapman P., Clinton J., Kerber R., Khabaza T., Reinartz T., Shearer C., Wirth R., CRISP-DM
1.0: step-by-step data mining guide, 2000.
1. INTRODUCTION
Figure 1. Training and validating errors as functions of the number of training epochs k; the validating error is minimal at epoch k*.
Figure 2. An example of a two-dimensional classification problem
Figure 3. An example of a cascade-correlation architecture with inputs x1, …, x4, hidden neurons z1, z2, and output y
Below we describe a new algorithm that can train cascade neural networks
in the presence of irrelevant features.
By adding new features and neurons as they are required, the cascade
neural network evolves during learning. The main steps of the evolving
algorithm are described below.
X = [x1, ..., xm]; % a pool of m input variables
p = 1; % the number of neuron inputs
% Train single-input neurons and calculate errors
for i = 1:m
  N1 = create-neuron(p, X(i));
  N1 = fit-weight(N1);
  E(i) = calc-error(N1);
end
[E1, F] = sort-ascend(E);
h = 1; % the position of the variable in F
C0 = E1(h);
% Create a cascade network NN
NN = [];
r = 0; % the number of hidden neurons
p = 2;
while h < m
  h := h + 1;
  V = [X(F(1)), X(F(h))];
  % Add links to the hidden neurons
  for j = 1:r
    V = [V, NN(j)];
  end
  % Create a candidate-neuron N1
  N1 = create-neuron(p, V);
  N1 = fit-weight(N1);
  C1 = calc-error(N1);
  if C1 < C0
    C0 := C1; % update the best error
    r := r + 1;
    p := r + 2;
    NN(r) = add-neuron(N1);
  end
end
The algorithm starts by training the candidate-neurons with one input and
saving their validating errors in a pool E. The procedure sort-ascend
arranges the pool E in ascending order and saves the indexes of the input
variables in a pool F. The first component of F is the index of the input
variable providing the minimal classification error C0.
At the following steps, the algorithm adds new features as well as new
neurons to the network while the validation error C1 calculated for the
candidate-neuron N1 decreases. The weights of candidate-neurons are
updated until condition (3) is satisfied.
As a result, the cascade neural network consisting of r neurons is
placed in the pool NN. The size of this network is nearly minimal because
the stopping rule is met for a minimal number of neurons.
Below we describe an application of this algorithm for recognizing
artifacts in clinical EEGs. These EEGs have characteristics such as noise and
features that are redundant or irrelevant to the classification problem.
In our experiment, we used clinical EEGs recorded via the standard
EEG channels C3 and C4 from two newborns during sleeping hours.
Following [Breidbach, Holthausen, Scheidt & Frenzel, 1998] these EEGs
were represented by spectral features calculated in 10-second segments for 6
frequency bands: subdelta (0-1.5 Hz), delta (1.5-3.5 Hz), theta (3.5-7.5 Hz),
alpha (7.5-13.5 Hz), beta 1 (13.5-19.5 Hz), and beta 2 (19.5-25 Hz).
Additionally for each band, the values of relative powers and their variances
were calculated for channels C3 and C4 and their sum, C3+C4. The total
number of the features was 72. Values of these features were normalized to
have a zero mean and a unit variance.
The normal segments and artifacts in the EEGs were manually labeled by
an EEG-viewer who analyzed the muscle and cardiac activities of the patients
recorded from additional channels. As an example of normal segments and
artifacts, Figure 4 depicts a fragment of EEG containing 100 segments
presented by 36 features. In this fragment the EEG-expert recognized
segments 15, 22, 24, 84, and 85 as artifacts and the remaining segments as normal.
Figure 4. Fragment of EEG containing 100 segments presented by 36 features in which the
EEG-viewer recognized five artifacts. See also color plates.
Figure 6. A cascade neural network trained for recognizing artifacts and normal segments in
clinical EEGs. The squares represent synaptic connections.
An EEG-expert observing this model can conclude the following. First, there
are four features that make the most important contribution to the
classification. These features are included in the order of their significance:
the most important feature is AbsPowBeta2 and the least
important is AbsVarDelta. Thus the most important contribution to artifact
recognition in the EEG of sleeping newborns is made by AbsPowBeta2, which is
calculated for a high frequency band. This fact directly corresponds to a rule
used for recognizing muscle artifacts in the sleep EEG of adults [Brunner,
Vasko, Detka, Monahan, Reynolds, & Kupfer, 1996].
Second, the discovered model shows the combinations of the
selected features and hidden variables in the order of their classification
accuracy. The EEG-expert can see that the maximal gain in accuracy is
achieved if the feature AbsPowAlphaC4 is combined with AbsPowBeta2.
Further improvement is achieved by combining the hidden variable z1, which
is a function of the above two features, with the new feature AbsPowDeltaC3.
So the EEG-expert can see the four combinations of the selected features and
hidden variables z1, …, z3 listed in the order of increasing classification
accuracy, p1 < … < p4, as follows:
z1: AbsPowBeta2 & AbsPowAlphaC4, accuracy p1,
z2: z1 & AbsPowBeta2 & AbsPowDeltaC3, accuracy p2,
z3: z2 & z1 & AbsPowBeta2 & AbsPowDeltaC3, accuracy p3,
z4: z3 & z2 & z1 & AbsPowBeta2 & AbsVarDelta, accuracy p4,
where z4 = y is the outcome of the classification model.
Third, the synaptic connections in the discovered model are
characterized by real-valued coefficients, which can be interpreted as the
strength of the relations between features and hidden variables. The larger
the value of the coefficient, the stronger the relation between the feature
and the hidden variable.
In general, such models can assist EEG-experts in presenting the underlying
causal relations between the features and outcomes in a visual form.
visualization of the discovered models can be useful for understanding the
nature of EEG artifacts.
In our experiments, we compared the performance of the above
classification model with that of an FNN trained on the same data. Using a sigmoid
activation function and a standard neural-network technique, we found that an
FNN with four hidden neurons and 11 input nodes provides a minimal
training error. The training and testing errors were 2.97% and 5.54%,
respectively.
Comparing the performances, we conclude that the discovered cascade
network slightly outperforms the FNN on the testing EEG data. The
improved performance is achieved because the cascade network is gradually
built up by adding new hidden neurons and new connections. Each new
neuron in the cascade network makes the most significant contribution to the
artifact recognition among all possible combinations of the allowed
number of features. This allows the algorithm to avoid the contribution of noise
features and to discover the most significant relations, which can then be
visualized.
In this experiment, the FNN has misclassified more testing examples than
the classification model described above. Therefore, we conclude that our
cascade neural-network technique can more successfully recognize artifacts
in clinical EEGs. At the same time the discovered classification model
allows EEG-experts to present the basic relations between features and
outcomes in visual form.
Ivakhnenko, 1994; Müller & Lemke, 2003]. The supporting neurons have at
least two inputs v1 and v2. A transfer function g of these neurons may be
described by low-order polynomials, for example, by a linear or non-linear
polynomial:
y = g(v1, v2) = w0 + w1v1 + w2v2, (4)
(Figure omitted: a GMDH-type network grown layer by layer; y2(1), y2(2), and y2(3) denote the outputs of the second neuron at layers 1, 2 and 3.)
The neuron-candidates that were selected for each of the layers are
depicted as gray boxes. The neuron y2(3) provides the best
classification accuracy and is assigned to be the output neuron.
Thus, for the kth training example, we can calculate the output y of the
neuron as
y = g(w, v(k)), k = 1, …, n,
where w is a weight vector, v is an input vector and n is the number of
training examples.
For selecting the F best neurons, the exterior criterion is calculated on the
unseen examples of a validation set that have not been used for fitting the
weights w of the neurons. These examples are reserved by dividing the dataset D
into two non-intersecting subsets DA = (XA, yAo) and DB = (XB, yBo), the
training and validating data sets, respectively. The sizes nA and nB of these
subsets are usually recommended to satisfy nA ≈ nB and nA + nB = n.
Let us now find a weight vector w* that minimizes the sum squared error e of
the neuron calculated on the subset DA:
e = Σk (g(v(k), w) − yko)², k = 1, …, nA.
To obtain the desired vector w*, the conventional GMDH fits the
neuron weights to the subset DA by using the Least Squares Method (LSM)
[Bishop, 1995; Farlow, 1984; Madala & Ivakhnenko, 1994], which
produces effective estimates of the weights when the noise in the data is
Gaussian distributed. As noise in real-world data is often non-Gaussian [Duda & Hart,
2000; Tempo, Calafiore & Dabbene, 2003], we will use the learning
algorithm described in Section 3, which does not require a hypothesis about
the noise structure.
Having found the desired weight vector w* on the subset DA, we can
calculate the value CRi of the exterior criterion on the validation subset DB:
CRi = Σk (gi(v(k), w*) − yko)², k = 1, …, nB, i = 1, …, Lr. (6)
We can see that the calculated value of CRi depends on the behavior of
the ith neuron on the unseen examples of the subset DB. Therefore, we may
expect that the value of CR calculated on the data D would be high for the
neurons with poor generalization ability.
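Since the transfer functions such as (4) are linear in the weights, the fit on DA and the evaluation of criterion (6) on DB can be sketched as follows (our illustration, with numpy's least squares standing in for the LSM; the robust learning algorithm of Section 3 would replace it):

import numpy as np

def exterior_criterion(VA, yA, VB, yB):
    """VA, VB: (n x 2) input matrices of DA, DB; yA, yB: target outputs."""
    XA = np.column_stack([np.ones(len(VA)), VA])   # w0 + w1*v1 + w2*v2
    w, *_ = np.linalg.lstsq(XA, yA, rcond=None)    # fit weights on DA
    XB = np.column_stack([np.ones(len(VB)), VB])
    return float(np.sum((XB @ w - yB) ** 2))       # CR_i computed on DB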
The values CRi calculated for all the candidate-neurons at the rth layer
are arranged in ascending order:
CRi1 ≤ CRi2 ≤ … ≤ CRiF ≤ … ≤ CRiL,
so that the first F neurons provide the best classification accuracy.
For each layer r, the minimal value CRmr corresponding to the best neuron
is found, i.e., CRmr = CRi1. The F best neurons are then used at
the next layer, r + 1, and the training and selection of the neurons are
repeated.
The value of CRmr decreases step by step as the number of layers
increases and the network is built up. Once the value of CR reaches a
minimum and then starts to increase, we can conclude that the
network has been over-fitted. Because the minimum of CR was reached
at the previous layer, we stop the training algorithm and take the desired
network, which in this example was grown at the third layer.
p = 1; % the number of neuron inputs
k = 0; % the number of neurons added to the network NN
% Train single-input neurons and calculate accuracies
for i = 1:m
  N1 = create-neuron(p, X(i));
  N1 = fit-weight(N1);
  A(i) = calc-accuracy(N1);
end
% Create new two-input neurons for gno attempts
p = 2;
for i = 1:gno
  pair = turn-roulette(p, A);
  N1 = create-neuron(p, X(pair));
  N1 = fit-weight(N1);
  ac = calc-accuracy(N1);
  % Selection and Addition
  if ac > max(A(pair))
    k := k + 1;
    NN(k) = add-new-neuron(N1);
    A(m + k) = ac;
  end
end
Figure 8. A polynomial network for classifying the EEG of an Alzheimer patient and a healthy patient (input features x69, x73, and x76)
The EEGs used in our next experiments were recorded from two sleeping
newborns. These EEGs were represented by 72 spectral and statistical
features as described in [Breidbach et al., 1998] calculated in 10-second
segments. For training, we used the EEG recorded from one newborn and for
testing the EEG recorded from the other newborn. These EEGs consisted of
1347 and 808 examples in which an expert labeled respectively 88 and 71
segments as artifacts.
For comparison, we used the standard neural network and the
conventional GMDH techniques. We found that the best FNN consisted
of 10 hidden neurons and misclassified 3.84% of the testing examples. The
GMDH-type network was grown with the activation function of equation (5)
above, with m = 72 inputs and F = 40. We ran our algorithm with the same data.
(Figure omitted: a fragment of the grown polynomial network with input features x6, x21, x55, x57 and neurons y2(1), y4(1).)
Observing the results listed in Table 1, we can conclude that the PNN
trained by our method recognizes EEG artifacts slightly better than the FNN
and GMDH-type network.
Decision tree (DT) methods have been successfully used for deriving
multi-class concepts from real-world data represented by noisy features
[Brodley & Utgoff, 1995; Duda & Hart, 2000; Quinlan, 1993; Salzberg,
Delcher, Fasman & Henderson, 1998]. Experts find the results of a DT
easy to follow by tracing the route from its entry point to its outcome.
This route consists of a sequence of questions that are useful for the
classification and understandable to medical experts.
Conventional DTs consist of nodes of two types. One is a splitting
node containing a test, and the other is a leaf node assigned to an appropriate
class. Each possible outcome of the test is represented by a branch of the DT. An
example is presented to the root of the DT and follows the branches until a
leaf node is reached. The name of the class at the leaf is the resulting
classification.
A node can test one or more of the input variables. A DT is multivariate,
or oblique, if its nodes test more than one of the features. Multivariate DTs
are in general much shorter than those that test a single variable. These
DTs can use as tests Threshold Logical Units (TLUs) or perceptrons that perform a
weighted sum of the input variables. Medical experts can interpret such tests
as a weighted sum of questions, for example: Is 0.4 * BloodPressure + 0.2 *
HeartRate > 46? The weights here usually represent the significance of the
feature for the test outcome.
To learn concepts presented by numerical features, [Duda & Hart, 2000]
and [Salzberg et al., 1998] have suggested multivariate DTs which
allow classifying linearly separable patterns. By definition, such patterns can be
divided by linear tests. However, using the algorithms of [Brodley &
Utgoff, 1995; Frean, 1992; Parekh et al., 2000; Salzberg et al., 1998], DTs
can also learn to classify non-linearly separable examples.
In general, DT algorithms require computational time that grows
proportionally to the number of training examples, input features, and
classes. Nevertheless, the computational time required to derive
multi-class concepts from large-scale data sets becomes overwhelming,
especially if the number of training examples is in the tens of thousands.
To train a DT from data that are non-linearly separable, [Gallant, 1993]
suggested the Pocket Algorithm. This algorithm seeks the weights of multivariate
tests that minimize the classification error. The Pocket Algorithm uses the
error correction rule (9) above to update the weights wj and wk of the
corresponding discriminant functions gj and gk. The algorithm saves in the
pocket the best weight vector WP seen during training.
In addition, Gallant has suggested the “ratchet” modification of the
Pocket Algorithm. The idea behind this modification is to replace the pocket
weights WP by the current weights W only if the current LM has correctly classified more
training examples than was achieved with WP. The modified algorithm finds
the optimal weights if sufficient training time is allowed.
To implement this idea, the algorithm trains the LM in cycles for a given
number of epochs, ne. For each epoch, the algorithm tracks the length L of the
current run of correctly classified examples and evaluates the
accuracy A of the LM on the training set.
According to inequality (8), the LM assigns a training example
(x, q) to the jth class, where q is the class to which the example x actually
belongs. The LM training algorithm consists of the following steps:
W = init-weight();
[Wp, Lp, Ap] = set-pocket(W);
L = 0; % the current run of correctly classified examples
for i = 1:n % n is the number of training examples
  [x, q] = get-random(X);
  j = classify(x);
  if j ~= q
    % Misclassified: reset the run and correct the weights
    L := 0;
    W(q) := W(q) + c*x;
    W(j) := W(j) - c*x;
  else
    L := L + 1;
    if L > Lp
      A = calc-accuracy();
      if A > Ap
        % Update the pocket
        Wp = W;
        Lp = L;
        Ap = A;
      end
    end
  end
end
The SFS algorithm exploits a bottom-up search and starts learning
with one feature. It then iteratively adds the new feature providing the
largest improvement in the classification accuracy of the linear test. The
algorithm continues to add features until a specified stopping criterion is
met. During this process, the best linear test Tb with the minimum number of
features is stored. In general, the SFS algorithm consists of the following
steps.
p = 1; % the number of features in the test
% Train the univariate tests T
for i = 1:m
  T(i) = test(p, X(i));
end
Tb = find-best-test(T);
while ~stop-rule(Tb, p)
  p := p + 1;
  T1 = find-best-test(p, T);
  % Compare the accuracies of T1 and Tb
  if T1.A > Tb.A
    Tb = T1;
  end
end
The stopping rule is satisfied when all the features have been involved in
the test. In this case m + (m − 1) + … + (m − k) linear tests have been made,
where k is the number of steps. Clearly, if the number of features m
as well as the number of examples n is large, the computational time
needed to terminate may be unacceptable.
To stop the search early and reduce the computational time, the following
heuristic stopping criterion was suggested in [Parekh et al., 2000]: if at any
step the accuracy of the best test decreases by more than 10%, then the chance
of subsequently finding a better test with more features is slight.
However, the classification accuracy of the resulting linear test depends
on the order in which the features have been included in the test. For the SFS
algorithm, the order in which features are added is determined by their
contribution to the classification accuracy. As we know, the accuracy
depends on the initial weights as well as on the sequence of randomly
selected training examples. For this reason the linear test can be non-optimal,
i.e., it can include more or fewer features than needed for the best
classification accuracy. The chance of selecting a non-optimal linear test is
high because the algorithm compares tests that differ by one feature only.
g1 = f1/2 + f1/3,  g2 = f2/3 − f1/2,  g3 = −f1/3 − f2/3
Figure 10. The approximation of the class regions Ω1, Ω2 and Ω3 given by the dividing hyperplanes g1, g2 and g3
In general, for r > 2 classes, the neural network consists of r(r − 1)/2
hidden neurons f1/2, …, fi/j, …, f(r−1)/r, where i < j, and r output neurons
g1, …, gr. The output neuron gi is connected to (r − 1) hidden neurons,
which are partitioned into two groups: the first group consists of the hidden
neurons fi/k for which k > i, and the second group consists of the hidden
neurons fk/i for which k < i. The final step is to set up the weights of the output
neurons: each output neuron gi is connected to the hidden neurons fi/k and fk/i
with weights equal to +1 and −1, respectively.
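A compact sketch of this wiring (illustrative Python; f is assumed to be a mapping from class pairs (i, j), i < j, to trained two-class discriminators that return positive values for class i and negative values for class j) is:

def pairwise_output(f, r, x):
    g = [0.0] * r
    for i in range(r):
        for j in range(i + 1, r):
            s = f[(i, j)](x)       # hidden neuron f_{i/j}
            g[i] += s              # weight +1: f_{i/k} with k > i
            g[j] -= s              # weight -1: f_{k/i} with k < j
    return max(range(r), key=lambda i: g[i])   # the winning class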
As we see, each hidden neuron in the network learns to distinguish one
class from another. The neurons learn independently of each other. However,
the performance of the hidden neurons depends on the contribution of the
input variables to the classification accuracy. For this reason, we next
discuss a DT derivation algorithm which is able to select relevant features.
    i := i + 1;
    if P(i) > rand(1) % wheel of roulette
      T1 = [T X(F(i))]; % add the feature F(i) to the test
      T1 = train-test(T1);
      A1 = calc-accuracy(T1);
      if A1 > A
        T := T1;
        A := A1;
        feature-no := feature-no + 1;
      end
    end
  end
  % Replace the best test Tb
  if A > Ab
    Ab := A;
    Tb := T;
  end
end
Using these features, the linear test classifies the training examples with the
best classification accuracy Ab.
For fitting the DT linear tests, we used 2/3 of the training examples and
evaluated the classification accuracy on all the training data. We varied the
number of attempts Na from 5 to 25.
Figure 12. The training errors (a) and the number of features (b) for 120 binary classifiers
Note that the trained classifiers use different sets of features (input
variables). The number of these features varies from 7 to 58; see Figure
12(b).
Figure 13. The distribution of the classified testing segments for two patients
V = remove-feature(v1);
if V not empty
  % Find the example index sets A0, A10, A1 and A01:
  A0 = find(Y0 == 0);  % class-0 examples classified correctly
  A01 = find(Y0 == 1); % the errors of class 0
  A1 = find(Y1 == 1);  % class-1 examples classified correctly
  A10 = find(Y1 == 0); % the errors of class 1
  if A10 not empty
    find-node(X0(A0, V), X1(A10, V), V);
  end
  if A01 not empty
    find-node(X0(A01, V), X1(A1, V), V);
  end
end
We have used this algorithm to derive a decision tree for recognizing the
artifacts in clinical EEGs. First we trained the polynomial network
described in Section 4.4 on the training data, which were originally
represented by 72 features. Then we removed from these data all 30
misclassified examples and used the 7 discovered features to represent the data
in a new input space.
To derive a DT from the new data, the preceding algorithm was applied.
This algorithm has derived a simple DT which exploits only one variable x6,
the absolute power of subdelta summed over channels C3 and C4, as
depicted in Figure 14.
IF x6 > 1.081 THEN Artifact ELSE Normal
Figure 14. A decision tree rule for classifying the normal EEG segments and artifacts
7. CONCLUSION
8. ACKNOWLEDGMENTS
The work has been supported by the University of Jena (Germany) and
particularly by the University of Exeter (UK) under EPSRC Grant
GR/R24357/01. The authors are personally grateful to Frank Pasemann for
fruitful discussions, to Joachim Frenzel and Burkhart Scheidt for the clinical
EEG recordings used in our experiments, and to Richard Everson and
Jonathan Fieldsend for useful comments.
where y is the target output and x1 ∈ [−1, 1], x2 ∈ [−1, 1] are the input
variables.
If the user uses a fully connected neural network, what structure has to
be preset for this problem?
5. When and why will multivariate decision trees outperform decision trees
which test single variables? Regarding the XOR problem (2.) above,
which of these techniques is better?
6. Assume a 4-class problem. How many neurons are required to train a linear
machine? What should be the preset structure of a neural network based
on pairwise classification for this case?
10. REFERENCES
d'Avila Garcez A., Broda K., Gabbay D., Symbolic knowledge extraction from trained neural
networks. Artificial Intelligence 2001; 125(1): 153-205.
Bishop C. M., Neural Network for Pattern Recognition. Oxford University Press, 1995.
Breidbach O., Holthausen K., Scheidt B., Frenzel J., Analysis of EEG data room in sudden
infant death risk patients. Theory Bioscience 1998; 117: 377-392.
Brodley C., Utgoff P., Multivariate decision trees. Machine Learning 1995; 19(11):45-77.
Brunner D., Vasko R., Detka C., Monahan J., Reynolds C., and Kupfer D., Muscle artifacts
in the sleep EEG: Automated detection and effect on all-night EEG power spectra.
Journal of Sleep Research 1996; 5:155–164.
Duke D., Nayak K. The EEG data, Florida State University. Retrieved June 2002 from
http://www.scri.fsu.edu/~nayak/chaos/data.html
Duda R.O., Hart P. E., Pattern Classification. Wiley Interscience, 2000.
Abstract: Visualization is used in data mining for the visual presentation of already
discovered patterns and for discovering new patterns visually. Success in both
tasks depends on the ability to present abstract patterns as simple visual
patterns. Obtaining simple visualizations of complex abstract patterns is an
especially challenging problem. A new approach called inverse visualization
(IV) is suggested for addressing the problem of visualizing complex patterns.
The approach is based on specially designed data preprocessing, which rests
on a transformation theorem proved in this chapter. A mathematical
formalism is derived from the Representative Measurement Theory. The
possibility of solving inverse visualization tasks is illustrated on functional
non-linear additive dependencies. The approach is called inverse visualization
because it does not use data “as is” and does not follow the traditional sequence:
discover pattern → visualize pattern. The new sequence is: convert data to a
visualizable form → discover patterns with predefined visualization.
Key words: Visual data mining, simultaneous scaling, non-linear dependency, data
preprocessing, reverse visualization.
1. INTRODUCTION
<patterns> → <visualization>.
<visualization> → <patterns>.
Are these simple forms the result of reasons specific to physics, or can they be exploited for domains such as
finance, medicine, remote sensing, and image analysis?
An explanation of the simplicity in physics follows from two theories: the
Representative Measurement Theory [Krantz, Luce, Suppes & Tversky,
1990] and the Physical Structures Theory [Kulakov, 1971; Mikhailichenko,
1985]. Measurement theory [Krantz et al., 1990, v.1] demonstrates that a
system of physical quantities and fundamental laws will have a simple
representation because they are obtained through a procedure that simultaneously
scales the variables involved in the laws.
Traditionally, data mining does not involve simultaneous scaling. Note
that simultaneous scaling is different from the data normalization procedures
used in neural networks to speed up search; see, for example, [Rao & Rao,
1995]. The typical normalization in neural networks transforms the scale of
each variable independently and non-linearly to some interval, such as [-1,1].
On the other hand, simultaneous scaling of variables x, y and z might
transform these variables into new scales x', y' and z' so that the law has the
simple linear form, perhaps y' = x' + z'. In general, the laws of classical physics
show that if all variables included in a law are scaled simultaneously then
the law can assume a relatively simple form.
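A one-line illustration of this effect (our example, in the spirit of the laws discussed here) is a multiplicative law, which becomes additive when all three variables are rescaled logarithmically:

\[
y = x z, \qquad x' = \log x,\; z' = \log z,\; y' = \log y
\;\Longrightarrow\; y' = x' + z'.
\]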
The problem of finding efficient simultaneous scaling transformations
was not posed and solved by Representative Measurement Theory. This
theory explains the simplicity effect but lacks a constructive way to achieve it.
On the other hand, Representative Measurement Theory has a wider area of
application than physics alone. For instance, psychology has benefited
significantly from it [Krantz et al., 1971]. This observation raises a hope that
simultaneous scaling will be beneficial in other areas too. This, of course,
requires designing simultaneous scaling transformations.
Fortunately, the theory of Physical Structures provides an answer to this
problem via the constructive classification of all functional expressions of all
possible fundamental physical laws [Mikhailichenko, 1985]. The classes defined
by this classification have an important property: any other functional
expression of a physical law can be transformed to one of the given classes by
a monotone transformation of all involved variables.
The procedure for deriving such a transformation is the simultaneous
scaling of these variables. This result shows that every physical law can be
described as a class of expressions that can be converted to each other by
monotone transformations of the variables contained in the law. This means that
all laws can be enumerated in the classification of all functional
expressions of all possible fundamental physical laws [Mikhailichenko, 1985].
All laws of this classification have a simple form and, by extension, the
problem of their visualization is simple too. All the complexity of visualization
is thus moved into the search for the monotone transformations that bring the
data to one of these simple forms.
2. DEFINITIONS
(1). ∀ z1, z2 ∃ x ( ƒ(x, z1) ≥ ƒ(x, z2) ⇒ ∀ x' (ƒ(x', z1) ≥ ƒ(x', z2)) )
(3). For any three of x1, x2, z1, z2 the fourth of them exists such that
ƒ(x1, z2) = ƒ(x2, z1)
(5). For any z1, z2 : z1 ≠ z2, if a sequence x1, x2, …, xi, … of elements of Xf
is determined and satisfies the following properties: ∀ i, xi < xmax and
ƒ(x1, z1) = ƒ(x2, z2), ƒ(x2, z1) = ƒ(x3, z2), ƒ(x3, z1) = ƒ(x4, z2), …,
ƒ(xi, z1) = ƒ(x(i+1), z2), …,
then this sequence is finite.
In addition properties (2) and (3) should also take place with x replaced by z
and vice versa.
The theorem below is based on axioms (1)-(5) and is used for design of a
simultaneous scaling procedure.
Theorem [Krantz et al., 1971, p. 257]:
1. For any function ƒ ∈ F there are one-to-one functions ϕx, ϕz and a
monotone function ϕ such that
ϕ(ƒ(x, z)) = ϕx(x) + ϕz(z).
2. If ϕ'x, ϕ'z are two other functions with the same property, then
there exist constants α > 0, β1, and β2 such that
ϕ'x = αϕx + β1, ϕ'z = αϕz + β2.
(In Figure 1, the rescaled values x1 = 1, x2 = 2, x3 = 3 and z1 = 1, z2 = 2, z3 = 3 are assigned starting from the point ⟨x0, z0⟩.)
Figure 1. Simultaneous rescaling process
Let us link the points ⟨x0, z1⟩ and ⟨x1, z0⟩ as shown in Figure 1. Along this
line the function has identical values. These values are the values of the Y scale
(which is not shown in the figure). It is easy to see that these values of x, z,
and y satisfy the function x + z = y. We take a point ⟨x1, z1⟩ and assign the
value y = ƒ(x1, z1) = 2 to this point.
Next we again apply axiom (3). At first we apply it to the values x1, x0, z1
and receive x2 such that ƒ(x1, z1) = ƒ(x2, z0), and then we apply it to the values z1,
z0, x1 and receive z2 such that ƒ(x0, z2) = ƒ(x1, z1). After that we assign the value
y = ƒ(x0, z2) = ƒ(x1, z1) = ƒ(x2, z0) = 2. Now we consider the new points ⟨x2, z1⟩
and ⟨x1, z2⟩.
To make the given construction possible for the new points ⟨x0, z3⟩ and
⟨x3, z0⟩ it is necessary that the values of the function be identical,
ƒ(x2, z1) = ƒ(x1, z2), at the points ⟨x2, z1⟩ and ⟨x1, z2⟩. The equality ƒ(x2, z1) =
ƒ(x1, z2) follows from axiom (2).
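The construction can be followed numerically with the sketch below (our illustration, assuming a function f that is strictly increasing in each argument, so that the equalities of axiom (3) can be solved by bisection; the starting points and the sample f are arbitrary):

def solve(g, target, lo=0.0, hi=1e6, it=200):
    # Bisection for g(x) = target, g strictly increasing on [lo, hi]
    for _ in range(it):
        mid = (lo + hi) / 2
        lo, hi = (mid, hi) if g(mid) < target else (lo, mid)
    return (lo + hi) / 2

def standard_sequence(f, x0, z0, x1, steps=3):
    z1 = solve(lambda z: f(x0, z), f(x1, z0))      # f(x0, z1) = f(x1, z0)
    xs, zs = [x0, x1], [z0, z1]
    for k in range(1, steps):
        xs.append(solve(lambda x: f(x, z0), f(xs[k], z1)))  # f(x_{k+1}, z0) = f(x_k, z1)
        zs.append(solve(lambda z: f(x0, z), f(x1, zs[k])))  # f(x0, z_{k+1}) = f(x1, z_k)
    return xs, zs

xs, zs = standard_sequence(lambda x, z: x * z + x + z, 1.0, 1.0, 2.0)
print(xs, zs)   # assigning x_k -> k and z_k -> k makes f additive on the grid

For f(x, z) = xz + x + z = (1 + x)(1 + z) − 1 the produced sequences satisfy 1 + xk = 2·1.5^k, so the rescaling x_k → k, z_k → k is exactly the logarithmic transformation that turns this law into y = x + z on the grid.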
Figures 2 and 3 present such a transformation. The surface in Figure 2 is
transformed into the surface in Figure 3 by the simultaneous rescaling of the
variables x, z, and y. It follows from the theorem that if properties (1)-(5) hold
for some variables x, y, z, then the function ƒ ∈ F can be converted to the
function y = x + z by rescaling the variables. After this, the visualization of the
rescaled data and the function y = x + z is obvious (see Figure 3).
The rescaling algorithm requires that the values of a function f on specific
pairs of values ⟨x, z⟩ satisfy properties (1)-(5) of the theorem. These properties
are true for the preference relations used in Decision Theory [Keeney &
Raiffa, 1976], but this is not a universally true condition for other tasks.
Figure 2. Data visualization: (a) original data, (b) simultaneously rescaled data.
See also color plates.
4. A TEST EXAMPLE
∀ a, b (a ≤1 b & a ≤2 b ⇒ a ≤10 b)
The Discovery System [Kovalerchuk & Vityaev, 2000] can discover all
monotone regularities, including those shown in (1) above, which are actually
encoded in Table 1 along with random noise. When the regularities (1) are
discovered, a simultaneous monotone rescaling of the data can be arranged, and
the straightforward and simple visualization presented in Figure 3 below will
be generated.
Thus the major challenge is discovering the monotone regularities. The
Discovery System searches sequentially for monotone regularities starting
from the simplest ones, such as
∀ a, b (a ≤4 b ⇒ a ≤10 b),
and tests them. The test reveals the needed regularity with a confidence level
equal to 0.1.
The transformation must be such that the substitution makes sense. This, in essence, leads
us to an additive conjoint structure assumption. Under this widely accepted
assumption, the practical issue is finding the chunks ∆1 and ∆2.
One option for solving this problem is the explicit
way, where a SME (subject matter expert) declares, say, that ∆1 = 3 and ∆2 = 5
are equivalent for substitution purposes; that is, the SME formalizes preferences
as a model. This is typically a very difficult task. Another approach is the
implicit approach, in which we just ask a SME to define preferences
for, say, about 100 pairs of multi-criteria decisions.
A SME can say that a decision with attributes (a1, a2, …, an) = (1, 5, …,
7) is better than a decision with attributes (a1, a2, …, an) = (3, 2, …, 5).
Alternatively, we may ask a SME to assign a priority to each alternative
(a1, a2, …, an) using a 0 to 100 percentage scale. Table 1 can be interpreted in this
way, where a10 can be viewed as a priority.
Both implicit alternatives provide us with a partially defined scalar
priority function, v:
that is, the logarithm of v is an additive function. Thus, we can use the same
technique for multiplicative regularities.
7. PHYSICAL STRUCTURES
that are located on the intersection of any r rows i, k,…, q and any s columns
α, β, …, γ.
2. for r = 4, s = 2
3. for r = s ≥ 3
and also
\[
\begin{vmatrix}
0 & 1 & 1 & \cdots & 1 \\
1 & \Psi[a_{i\alpha}] & \Psi[a_{i\beta}] & \cdots & \Psi[a_{i\tau}] \\
1 & \Psi[a_{k\alpha}] & \Psi[a_{k\beta}] & \cdots & \Psi[a_{k\tau}] \\
\vdots & \vdots & \vdots & \ddots & \vdots \\
1 & \Psi[a_{v\alpha}] & \Psi[a_{v\beta}] & \cdots & \Psi[a_{v\tau}]
\end{vmatrix} = 0;
\]
4. for r = s + 1 ≥ 3
8. CONCLUSION
Advanced
4. Solve the problems in exercises 1-3 for the n-dimensional case.
10. REFERENCES
Fayyad, U., Grinstein, G., and Wierse, A., Eds., Information Visualization in Data Mining and Knowledge Discovery, Morgan Kaufmann, 2001.
Keim, D., Visual Exploration of Large Data Sets, Communications of the ACM, Vol. 44, No. 8, 2001, pp. 39-44.
Keeney, R.L., Raiffa, H., Decisions with Multiple Objectives: Preferences and Value Tradeoffs, John Wiley & Sons, 1976.
Krantz, D.H., Luce, R.D., Suppes, P., Tversky, A., Foundations of Measurement, Vols. 1-3, Academic Press, New York, London, 1971, 1989, 1990.
Kulakov, Yu.I., The One Principle Underlying Classical Physics, Soviet Physics – Doklady, Vol. 15, No. 7, Jan. 1971, pp. 666-668.
Kovalerchuk, B., Vityaev, E., Data Mining in Finance: Advances in Relational and Hybrid Methods, Kluwer Academic Publishers, Boston, 2000.
Mikhailichenko, G.G., Phenomenological and Group Symmetry in the Geometry of Two Sets (Theory of Physical Structures), Soviet Math. Dokl., 32(2), 1985, pp. 371-374.
Mikhailichenko, G.G., Solution of functional equations in the theory of physical structures, Doklady, Soviet Academy of Sciences, Vol. 206, No. 5, 1972, pp. 1056-1058.
Miller, H., Ed., Geographic Data Mining & Knowledge Discovery, Taylor and Francis, 2001.
Rao, H., Rao, V., C++ Neural Networks and Fuzzy Logic, Hungry Minds, 1995.
Soukup, T., Davidson, I., Visual Data Mining: Techniques and Tools for Data Visualization and Mining, Wiley, 2002.
Chapter 16
Visual Data Mining Using Monotone Boolean Functions

Abstract: This chapter describes a new technique for extracting patterns and relations visually from multidimensional binary data using monotone Boolean functions. Visual data mining has shown benefits in many areas when used with numerical data, but the technique is less beneficial for binary data. This problem is especially challenging in medical applications, where cases are tracked with binary symptoms. The proposed method relies on monotone structural relations between Boolean vectors in the n-dimensional binary cube, En, and visualizes them in 2-D as chains of Boolean vectors. Actual Boolean vectors are laid out on this chain structure. Currently the system supports two visual forms: the multiple disk form (MDF) and the "Yin/Yang" form (YYF). In the MDF, every vector has a fixed horizontal and vertical position. In the YYF, only the vertical position is fixed.
Key words: Visual Data Mining, explicit data structure, Boolean data, Monotone Boolean
Function, Hansel Chains, Binary Hypercube.
1. INTRODUCTION
The goal of visual data mining (VDM) is to help a user get a feeling for the data, to detect interesting knowledge, and to gain a deep visual understanding of the data set [Beilken & Spenke, 1999]. One especially important aspect of visual data mining is visualizing the border between patterns. A visual result in which the border is simple and the patterns are far away from each other matches our intuitive concept of a pattern and serves as important support that the data mining result is robust and not accidental.
VDM methods have shown benefits in many areas when used with numerical data, but these methods do not address the specifics of binary data, where there is little or no variability in the visual representation of objects for each individual Boolean attribute. VDM is an especially challenging task when data richness should be preserved without the excessive aggregation that often happens with simple and intuitive presentation graphics such as bar charts [Keim, Hao, Dayal, & Hsu, 2002]. Another challenge is that often such data lack natural 3-D space and time dimensions [Groth, 1998] and instead require the visualization of an abstract feature.
The purpose of this chapter is to develop a technique for visualizing and discovering patterns and relations from multidimensional binary data using monotone Boolean functions, which are also reviewed at the end of the chapter. We begin with an analysis of the currently available methods of data visualization.
Glyphs. A glyph is a 2-D or 3-D object (icon, cube, or more complex "Lego-type" object). Glyph or iconic visualization is an attempt to encode multidimensional data within the parameters of the icons, such as shape, color, transparency, and orientation [Ebert, Shaw, Zwa, Miller & Roberts, 1996; Post, van Walsum, Post & Silver, 1995; Ribarsky, Ayers, Eble & Mukherja, 1994].
Typically, glyphs can visualize up to nine attributes (three positions x, y, and z; three size dimensions; color; opacity; and shape). Texture can add more dimensions. Shapes of the glyphs are studied in [Shaw, Hall, Blahut, Ebert & Roberts, 1999], where it was concluded that with large superellipses, about 22 separate shapes can be distinguished on average.
overview of multivariate glyphs is presented in [Ward, 2002]. This overview
includes a taxonomy of glyph placement strategies and guidelines for devel-
oping such a visualization. Some glyph methods use data dimensions as po-
sitional attributes to place glyphs; other methods place glyphs using implicit
or explicit structure within the data set.
From our viewpoint, the placement based on the use of data structure is a
promising approach. We believe that the placement of glyphs on a data
structure is a way to increase the data dimensions that can be visualized. We
call this the GPDS approach (Glyph Placement on a Data Structure). It is
important to notice that in this approach, some attributes are implicitly en-
coded in the data structure while others are explicitly encoded in the glyph.
Thus, if the structure carries ten attributes and a glyph carries nine attributes,
we can encode a total of nineteen attributes. The number of glyphs that can
be visualized is relatively limited because of possible glyph overlap and oc-
clusion.
Spiral Bar and other techniques. Alternative techniques such as Generalized Spiral and Pixel Bar Chart are developed in [Keim, Hao, Dayal & Hsu, 2002]. These techniques work with large data sets without overlapping, but only with a few attributes (ranging from a single attribute to perhaps four to six attributes). Another set of visualization methods, known as Scatter, Splat, Map, Tree, and Evidence Visualizer, is implemented in MineSet (Silicon Graphics), which permits up to eight dimensions to be shown on the same plot by using color, size, and animation of different objects [Last & Kandel, 1999].
Parallel coordinate techniques. This visualization [Inselberg & Dimsdale, 1990] can work with ten or more attributes, but suffers from record overlap and thus is limited to tasks with well-distinguished cluster records. In parallel coordinates, each vertical axis corresponds to a data attribute (xi) and a line connecting points on each parallel coordinate corresponds to a record. Figure 1 depicts the vectors
01010; 11010; 01110; 01011; 01111; 11011; 11111; 10101; 11101; 10111 (1)
in parallel coordinates. Can we discover a regularity that governs the dataset in Figure 1? Visually it is difficult, but the regularity is a simple monotone Boolean function, (x2 & x4) ∨ (x1 & x3 & x5). This function is true for every vector from (1).
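As a quick mechanical check, one can verify that this function is indeed true on every vector listed in (1); a minimal sketch:

```python
vectors = ["01010", "11010", "01110", "01011", "01111",
           "11011", "11111", "10101", "11101", "10111"]

def f(v):
    """(x2 & x4) v (x1 & x3 & x5), with 1-based attribute indexing."""
    x = [c == "1" for c in v]
    return (x[1] and x[3]) or (x[0] and x[2] and x[4])

assert all(f(v) for v in vectors)  # f is true on every vector from (1)
```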
These visualization methods share several limitations:
• Poor scalability: visual data analysis can fail when representing hundreds of attributes.
• Humans are unable to perceive more than six to eight dimensions on the same graph.
• The slow speed of manual interactive examination of multi-dimensional, multi-color charts is a drawback.
We are interested in developing a technique that can work with ten or more Boolean attributes. Many data mining problems can be encoded using Boolean vectors, where each record is a set of binary values {0; 1} and each record belongs to one of two classes (categories) that are also encoded as 0 and 1. For instance, a patient can be represented as a Boolean vector of symptoms along with an indication of the diagnostic class (e.g., benign or malignant tumor) [Kovalerchuk, Vityaev & Ruiz, 2000, 2001].
For n-dimensional Boolean attributes, traditional glyph-based visualiza-
tions are useful but somewhat limited. Attributes of a Boolean vector can be
encoded in glyph lengths, widths, heights, and other parameters. There are
only two values for the length, width, and other parameters for each Boolean
vector. Thus, there is not much variability in visual representation of objects.
When plotted as nodes in a 3-D binary cube, many objects will not be visu-
ally separated.
The approach and methods described below do not follow the traditional
glyph approaches that would put n-dimensional Boolean vectors (n > 3) into
3-D space, making them barely distinguishable. The methods rely on mono-
tone structural relations between Boolean vectors in the n-dimensional
binary cube, En. Data are visualized in 2-D as chains of Boolean vectors.
Currently, the system supports two visual forms: the Multiple Disk Form
(MDF) and the “Yin Yang” Form (YYF).
Other vectors can occupy fixed positions in between based on their numeric value (binary/decimal), where Boolean vectors are interpreted as numbers with the low-order bits located on the right. In this case, since the vertical position is centered, the vector in row 3 is an average of the binary numbers (not vectors) 0011111111₂ and 1111111100₂, and that can be computed. Here the subscript 2 indicates a binary number. We call this visualization a Table Form Visualization (TFV).
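For example, the centered position of that row follows directly from the two binary numbers; a minimal illustration:

```python
lo = int("0011111111", 2)   # smallest 10-bit number with eight 1s: 255
hi = int("1111111100", 2)   # largest 10-bit number with eight 1s: 1020
print((lo + hi) // 2)       # center of the row as a binary number: 637
```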
Note that for n = 10 the maximum number of vectors in a row is 252, which is the number of combinations of five "1"s out of ten. For a display having a horizontal screen resolution of 1024 = 2¹⁰ pixels, we can use 4 pixels per bar and easily visualize the 10-dimensional space in 2-D as shown in Figure 2.
Using only one pixel per bar, we can accommodate 12 binary dimensions, since there are 924 combinations for choosing six "1"s out of 12, which is still less than 1024. With a higher resolution screen and/or multiple monitors we could increase the dimensionality, but this has obvious limits at about n = 14, where 3432 pixels are needed in a row.
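The screen-capacity arithmetic above can be reproduced with the binomial coefficient C(n, ⌊n/2⌋), the width of the widest row of En; a small sketch assuming a 1024-pixel horizontal resolution:

```python
from math import comb

for n, px_per_bar in [(10, 4), (12, 1), (14, 1)]:
    width = comb(n, n // 2)        # number of vectors in the widest row
    pixels = width * px_per_bar
    print(n, width, pixels, pixels <= 1024)
# n=10: 252 bars, 1008 px (fits); n=12: 924 px (fits); n=14: 3432 px (does not fit)
```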
Several options are available to deal with this exponentially growing di-
mensionality. One of them is grouping, that is showing high dimensional
data by their projections to, say, 12-D, which is much larger than traditional
conversion to 2-D. Then methods such as principal components can be used
with visualization of first 12 principal components instead of first two prin-
cipal components.
Beyond this, we need to notice that for n = 20, the number of elements in the space is 2²⁰ = 1,048,576. If a dataset contains 8192 = 2¹³ vectors, then they would occupy no more than a 2¹³/2²⁰ = 2⁻⁷ = 1/128 fraction of the total space, that is, less than 1%. This means that a visualization like that shown in Figure 2 may not use 99% of the screen. Thus, the visualization columns that are not used can be reduced in order to enable the visualization of vectors with n = 20 or more.
The visualization shown in Figure 3 is a modification of the visualization shown in Figure 2, where all repeating vectors are deleted. In Figure 2, the top and bottom vectors are repeated 252 times. Each level in Figure 3 is called a disk, and the entire visualization is called the multiple disk form (MDF). In the MDF, every vector has a fixed horizontal and vertical position, as shown in Figure 3.
The placement procedure for MDF is based on the decomposition of the binary cube, En, into chains.
For instance, the vectors given in (1) above form several natural chains, as shown in Table 1, where the gray elements appear in several chains. Chain 1 contains four elements, starting from (01010) and ending with (11111). Each following element is greater than the previous one; that is, some 0 positions are exchanged for 1's, as in (01010) and (11010), where the first position is changed to 1. We also note that each vector in the second row has two 1's. Similarly, each vector in the third and fourth rows has three or four 1's. As previously noted, such numbers indicate the level of the Boolean vector and are also referred to as its norm.
While vectors on the same chain are ordered, vectors on different chains may not be ordered. There is only a partial order on Boolean vectors. The partial order is defined as follows: vector a = (a1, a2, …, an) is greater than or equal to vector b = (b1, b2, …, bn) if for every i = 1, 2, …, n: ai ≥ bi. This partial order means that chains may overlap, as shown in Table 1. We will use the notation a ≥ b if Boolean vector a is greater than or equal to Boolean vector b. A set of vectors v1, v2, …, vn is called a chain if v1 ≥ v2 ≥ … ≥ vn.
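In code, the partial order and the chain test can be stated directly; a minimal sketch with tuple-encoded Boolean vectors:

```python
def geq(a, b):
    """a >= b in the component-wise partial order on Boolean vectors."""
    return all(x >= y for x, y in zip(a, b))

def is_chain(vs):
    """True if v1 >= v2 >= ... >= vn."""
    return all(geq(vs[k], vs[k + 1]) for k in range(len(vs) - 1))

print(is_chain([(1,1,1,1,1), (1,1,0,1,1), (1,1,0,1,0), (0,1,0,1,0)]))  # True
print(geq((0,1,0,1,0), (1,0,1,0,1)))  # False: these two vectors are incomparable
```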
We focus on special chains of vectors called Hansel chains, which are described in section 6. The Hansel chains are computed and then aligned vertically. Procedure P2 applied to MDF moves vectors with regard to Hansel chains. First, the Hansel chains for vectors of size (dimension) n are computed, and then every vector belonging to a chain is moved to align the Hansel chain vertically. Hansel chains have different lengths, with possible values from 1 to n + 1 elements. To keep the integrity of the MDF structure, we have to place these chains so that no elements fall out of the disks. Hence, the longest chain is placed at the center of the disk and the other chains are placed alternately to the right and left of the first chain. Moreover, procedure P2 always assigns a fixed position to each vector. This position does not change from one dataset to another. This again allows direct comparison between different Boolean functions. P2 visualizes to a certain extent the structure of the Boolean function, but it does not really visualize the border between classes.
Next, the procedure P2 unveils parts of the structure of the Boolean function; see Figure 5. Recall that P2 still has a fixed place for each Boolean vector. Hence, it still permits the comparison of multiple functions. However, as Figure 5 shows, the border visualized can be very complex.
Procedure P3. Hence, we introduce a third procedure, P3, for MDF. This procedure tries to move all Hansel chains to the center of the disk. It is based on the level of the first "1" value in each chain for a given Boolean function, and on the requirement that the disk architecture should be preserved. In this way, two different functions will produce distinct visualizations. Figure 6 demonstrates that the results of P3 make the border between the two classes easier to see.
Here, the concept of the first "1" on the chain reflects that a chain may contain elements from both classes, say benign (class "0") and malignant (class "1"). Procedure P3 is a derivative of P2. After computing and placing the vectors using P2, every Hansel chain is given a value l equal to the level of the first 1 value present within the chain. Next, every Hansel chain is moved so that the chain with the highest l value is located in the center of the disk, in such a way that the MDF structure is kept. Using this procedure, we are able to group classes within the MDF. Nevertheless, to keep the MDF structure, the chains have to be placed in a position related to their length. Unfortunately, this potentially introduces a complex border between classes because of a possible gap between groups of vectors within the same class.
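A rough sketch of the chain-ordering idea behind P3, ignoring for brevity the length constraint that keeps chains inside the disks; here f is the Boolean function and each chain is ordered bottom-up:

```python
def first_one_level(chain, f):
    """Norm (number of 1s) of the lowest vector on the chain where f = 1;
    a sentinel is returned if f is 0 on the whole chain."""
    for v in chain:
        if f(v):
            return sum(v)
    return float("inf")

def p3_order(chains, f):
    """Place the chain with the highest first-1 level at the center and the
    remaining chains alternately to its right and left (a simplification)."""
    ranked = sorted(chains, key=lambda c: first_one_level(c, f), reverse=True)
    left, right = [], []
    for k, chain in enumerate(ranked[1:]):
        (right if k % 2 == 0 else left).append(chain)
    return left[::-1] + ranked[:1] + right
```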
Procedure P4. Using P3 makes the border obvious, but the border is still divided into several pieces because of the structure of the MDF itself. We can see the gaps between the black parts. However, each white element placed on the top (belonging to class 0) actually expands to an element of class 1. Therefore, the border should be visualized as continuous rather than interrupted. Since the borders produced by P3 can still be complex and thus difficult to interpret visually, we introduce the YYF structure along with procedure P4.
In the YYF, the movement of all chains is based only on the level of the first 1 in each chain. Additionally, this structure allows filling the gaps between the groups. The set of chains is sorted from left to right according to the indicated level, providing a clear, simple border between the two classes of Boolean vectors. Procedure P4 extends every Hansel chain created before, placing them according to the same method used in P3. The first step consists of extending each Hansel chain both up and down with elements related to its edge elements. The YYF structure shows this continuous border built using procedure P4; see Figure 7.
Edge elements are Boolean vectors that form the border between the two classes on a chain. To extend a chain up, we try to find the first vector belonging to class 1 above the edge element. That is, given the edge vector x, we look for a vector y where y ≥ x; if no such y is found on the level just above the x level, we then add a vector z from class 0 such that z ≥ x and such that the path to the first vector y ≥ z of class 1 is minimized. We repeat these steps until we find a vector y from class 1 and add it to the chain. To expand a chain down, we apply the same steps, reversing the relation y ≥ x and swapping the classes 0 and 1. Using this procedure, we duplicate some of the vectors, which are then displayed more than once. This is justified because we keep a consistent relative relationship between the vectors.
Once the chains are expanded, a value l is assigned to each chain, just as in procedure P3. Then, the chains are sorted with regard to this value. This approach visualizes a simple border between classes 0 and 1. In our attempt to visualize the border between the two classes of elements, we moved the data out of the MDF structure. In the YYF, vectors are ordered vertically in the same way as in the MDF, but they are not centered anymore.
All vectors are moved with regard to the data in order to visualize the border between the two classes. In this way, a clear border appears, with class 1 above class 0, giving the YYF the "Yin Yang"-like shape that is responsible for its name.
This experiment was conducted with breast cancer data that included about 100 cases with an almost equal number of benign and malignant results. Each case was described by 10 binary characteristics (referred to as "symptoms" for short) retrieved from mammographic X-ray images [Kovalerchuk, Vityaev & Ruiz, 2001]. The goal of the experiment was to check the monotonicity of this data set, which is important from both radiological and visualization viewpoints. Figure 8 shows the initial visualization, where cases are aligned as they were in Figure 4; that is, by allocating cases as bars at fixed places using MDF and P1. As we mentioned in section 5, this procedure permits the comparison of multiple functions and data sets. Comparison of Figures 4 and 8 immediately reveals their difference. Figure 8 presents the layered distribution of malignant and benign cases visualized as bars, where white bars represent the benign cases and black bars represent the malignant cases.
Figure 8. Breast cancer cases based on characteristics of X-ray images visualized using fixed
location procedure P1
All cases in the same layer have the same number of cancer positive
symptoms, but the symptoms themselves can be different. Light grey areas
indicate monotonic expansion of benign cases to lower layers for each be-
nign case and dark grey areas indicate monotonic expansion of malignant
cases to upper layers for each malignant case. Figures 9, 10 and 11 are modi-
fications of Figure 8 based on procedures P2 and P3.
Benign cases are lined up monotonically. That is, each benign case below
a given benign case contains only a part of its positive cancer characteristics.
Similarly malignant cases (bars) are also lined up monotonically. Thus, a
malignant case above a given malignant case contains more positive cancer
characteristics than the given malignant case. The vertical lines (chains) that
contain both benign and malignant cases are most interesting for further
analysis as we shall see in Figure 11. Figure 11 is a simple modification of
Figure 10 where the cases or areas around the bars are separated by frames.
Figure 12 shows a fragment of the chain from Figure 11 rotated 90°. This chain demonstrates a violation of monotonicity, where after a benign (white) case on layer 7 we have a "malignant" (grey) case on layer 6 that was obtained via monotonic expansion. It also shows how narrow the border is: an actual benign case on layer 7 is immediately followed by two actual malignant (black) cases on layers 8 and 9 without any gap.
Figure 11. Breast cancer cases visualized using procedure P3 with cases shown as bars with
frames. See also color plates.
Figure 12. A fragment of a chain from Figure 11 rotated 90°; the case labels read: 1 (malignant), 1 (malignant), 0 (benign), "1" (malignant).
Figures 8-12 reveal that there are inconsistencies with monotonicity for several of the cases. Note that here cases are shown without frames, and several bars may form continuous areas filled with the same color.
A "white" case is an inconsistent case if there are black and dark grey bars (areas) above and below it. Similarly, a "black" case is inconsistent if there are white and light grey cases above and below it. Both types of inconsistent cases are presented in Figure 9. We will call them white and black inconsistencies, respectively.
This visualization permits us to build different monotone Boolean functions interactively and visually for situations with inconsistencies. The first way to do this is to find all white inconsistencies and convert all elements below them to white bars. This process is called white precedence monotonization. Similarly, we can use black precedence monotonization, which converts all white and light grey elements above inconsistent black cases to black.
If we use the black precedence, then such monotone expansion will cover 100% of the malignant cases. This means that we will have some false positive cases (benign cases diagnosed as malignant), which of course is better than having false negatives (cancer cases diagnosed as benign). The latter occur when we give precedence to the monotonic expansion of benign cases (white precedence monotonization). If the black case monotonization produces too many false positives, we may check the sufficiency of the parameters used. The violation of monotonicity may indicate that more parameters (beyond the 10 parameters used) are needed. The advantage of the described approach is that we build a visual diagnostic function that is very intuitive. Inconsistencies can be analyzed globally, followed by pulling up inconsistent cases for further analysis as shown in Figure 12.
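Under these definitions, black precedence monotonization admits a compact sketch; here `data` maps Boolean vectors to class labels 0/1, and the component-wise order is the one defined earlier:

```python
def geq(a, b):
    """a >= b in the component-wise partial order on Boolean vectors."""
    return all(x >= y for x, y in zip(a, b))

def black_precedence(data):
    """Monotone expansion with black (malignant) precedence: every vector
    that is >= some class-1 vector is relabeled 1, so the expansion covers
    100% of the malignant cases, possibly adding false positives."""
    positives = [v for v, label in data.items() if label == 1]
    return {v: int(any(geq(v, p) for p in positives)) for v in data}
```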
Two 3-D versions of the Monotone Boolean Visual Discovery (MBVD) method (programmed by Logan Riggs) are illustrated in Figures 13 and 14. The first version uses only vertical surfaces and is quite similar to the 2-D versions. The second version (Figure 14) uses both vertical and horizontal surfaces. The 3-D versions have several advantages over the 2-D versions. The first one is the ability to increase the dimensionality n that can be visualized. This is done by using the front, back, and horizontal surfaces of the disks, by grouping similar Hansel chains, and by showing only "representative" chains in the global disk views (see Figure 15 for a grouping illustration). More detail can be provided by changing the camera location, which permits one to see the back side of the disks, combined with a semantic zoom that permits one to see all chains, not only the "representative" ones, when the camera closes in on a disk.
Figure 13. A 3-D version of Monotone Boolean Visual Discovery with only vertical surfaces used
Figure 14. A 3-D version of Monotone Boolean Visual Discovery with vertical and horizontal
surfaces used. See also color plates.
We then cut the maximum element of Emax and add it to Emin. Thus, the Hansel chains for size 2 (dimension n = 2) are the two chains (00, 01, 11) and (10).
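This recursive construction is easy to state in code; a minimal sketch, with the new bit prepended, which yields the chains listed above:

```python
def hansel_chains(n):
    """Hansel chains of the binary cube E^n, built recursively: duplicate
    each chain of E^(n-1) with a leading 0 (Emin) and a leading 1 (Emax),
    then cut the maximum element of Emax and append it to Emin."""
    if n == 1:
        return [[(0,), (1,)]]
    chains = []
    for chain in hansel_chains(n - 1):
        e_min = [(0,) + v for v in chain]
        e_max = [(1,) + v for v in chain]
        e_min.append(e_max.pop())   # move the maximum of Emax to Emin
        chains.append(e_min)
        if e_max:                   # chains that become empty disappear
            chains.append(e_max)
    return chains

print(hansel_chains(2))  # [[(0, 0), (0, 1), (1, 1)], [(1, 0)]]
```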
7. CONCLUSION
Visual data mining has shown benefits in many areas when used with numerical data. However, classical VDM methods do not address the specific needs of binary data. In this chapter we have shown how to visualize real patterns contained in Boolean data. The first attribute to consider while ordering Boolean vectors is the vector's norm. Using the Boolean norm of the vectors, we are able to split the data into n + 1 groups (n being the number of elements of the vectors). Each group is assigned to a vertical position. Then, multiple methods can be used to assign the horizontal position.
We created two different structures to handle the data: MDF and YYF. In each structure, the horizontal position of vectors is then handled by a specific procedure. Three procedures are specific to the MDF structure and one is specific to the YYF structure.
The first procedure, P1, is specific to MDF and orders vectors with regard to the natural order, converting a Boolean value into a numerical decimal value. This procedure does not visualize the real structure of the data, but it permits direct comparison between multiple functions.
The second procedure, P2, is specific to MDF and orders vectors in order to visualize Hansel chains. This procedure visualizes the structure of the data itself. However, it does not really visualize relations among data, but it still permits a direct comparison between several Boolean functions.
The third procedure, P3, is specific to MDF and orders the Hansel chains. This procedure unveils the border between the two classes of elements (0 and 1). In order to keep the MDF structure intact, the border is visualized as being interrupted, and thus differences can be visualized between multiple Boolean functions. However, for monotone Boolean functions, vectors belonging to class 0 all expand to a vector of class 1. Hence, the border should be continuous.
The last procedure, P4, is specific to YYF. This procedure visualizes the real border that exists between the two classes of elements. This is done by expanding Hansel chains up and down, duplicating some elements in the process. This new approach has proved to be appropriate for handling the discovery of patterns in binary data. By further developing these procedures for non-monotone Boolean functions and data and for k-valued data structures, we believe that the new approach can be used in a variety of applications.
8. EXERCISES AND PROBLEMS
1. Define Hansel chains for n = 3 using the chains for n = 2 and the recursive procedure described in Section 6.
2. Define Hansel chains for n = 4 using the chains for n = 3 built in exercise 1 and the recursive procedure described in Section 6.
4. Draw Hansel chains for n = 3 and visualize the Boolean function f(x1, x2, x3) = x1x2 ∨ x3 by marking each element of each chain with its value "1" or "0".
9. REFERENCES
Beilken, C., Spenke, M., Visual interactive data mining with InfoZoom – the Medical Data Set. The 3rd European Conference on Principles and Practice of Knowledge Discovery in Databases, PKDD '99, Sept. 15-18, 1999, Prague, Czech Republic. http://citeseer.nj.nec.com/473660.html
Ebert, D., Shaw, C., Zwa, A., Miller, E., Roberts, D., Two-handed interactive stereoscopic visualization. IEEE Visualization '96 Conference, 1996, pp. 205-210.
Goel, A., Visualization in Problem Solving Environments. M.S. Thesis, Virginia Polytechnic Institute and State University, 1999, 64 p. www.amitgoel.com/vizcraft/docs/masters_thesis.pdf
Groth, D., Robertson, E., Architectural support for database visualization. Workshop on New Paradigms in Information Visualization and Manipulation, 1998, pp. 53-55.
Hansel, G., Sur le nombre des fonctions booléennes monotones de n variables. C.R. Acad. Sci. Paris (in French), 262(20), pp. 1088-1090, 1966.
Inselberg, A., Dimsdale, B., Parallel coordinates: A tool for visualizing multidimensional geometry. Proceedings of IEEE Visualization '90, Los Alamitos, CA, IEEE Computer Society Press, 1990, pp. 360-375.
Keim, D., Hao, M.C., Dayal, U., Hsu, M., Pixel bar charts: a visualization technique for very large multi-attribute data sets. Information Visualization, Vol. 1, No. 1, March 2002, pp. 20-34.
Kovalerchuk, B., Triantaphyllou, E., Despande, A., Vityaev, E., Interactive Learning of Monotone Boolean Functions. Information Sciences, Vol. 94, issues 1-4, pp. 87-118, 1996.
Kovalerchuk, B., Vityaev, E., Ruiz, J., Consistent knowledge discovery in medical diagnosis. IEEE Engineering in Medicine and Biology (Special issue on Data Mining and Knowledge Discovery), Vol. 19, No. 4, pp. 26-37, 2000.
Kovalerchuk, B., Vityaev, E., Ruiz, J., Consistent and complete data and "expert" mining in medicine. In: Medical Data Mining and Knowledge Discovery, Springer, 2001, pp. 238-280.
Last, M., Kandel, A., Automated perceptions in data mining (invited paper). 1999 IEEE International Fuzzy Systems Conference Proceedings, Part I, Seoul, Korea, Aug. 1999, pp. 190-197.
Post, F.H., van Walsum, T., Post, F.J., Silver, D., Iconic techniques for feature visualization. In Proceedings Visualization '95, 1995, pp. 288-295.
Ribarsky, W., Ayers, E., Eble, J., Mukherja, S., Glyphmaker: creating customized visualizations of complex data. IEEE Computer, 27(7), pp. 57-64, 1994.
Shaw, C., Hall, J., Blahut, C., Ebert, D., Roberts, A., Using shape to visualize multivariate data. CIKM '99 Workshop on New Paradigms in Information Visualization and Manipulation, ACM Press, 1999, pp. 17-20.
Ward, M., A taxonomy of glyph placement strategies for multidimensional data visualization. Information Visualization, Vol. 1, 2002, pp. 194-210.
PART 5
Chapter 17
Imagery Integration as Conflict Resolution Decision Process

Abstract: With the growing use of geospatial data arising from multiple sources, combined with a variety of techniques for data generation and a variety of requested data formats, the problem of data integration poses the challenging task of creating a general framework for both carrying out this integration and decision making. This chapter features a general framework for combining geospatial datasets. The framework is task-driven and includes the development of task-specific measures, the use of a task-driven conflation agent, and the identification of task-related default parameters. The chapter also describes measures of decision correctness and the visualization of decisions and conflict resolution by using analytical and visual conflation agents.
1. INTRODUCTION
the result of the conflation is a combined image produced from two or more
images with: (1) matched features from different images and (2) transforma-
tions that are needed to produce a single consistent image. Registration of a
new image can be done by conflating it with a registered image.
Conflation has been viewed as a matching technique that fuses imagery
data and preserves inconsistencies (e.g., inconsistencies between high and
low resolution maps, “best map” concept, [Edwards & Simpson, 2002]). This
approach tries to preserve the pluralism of multi-source data. The traditional
approach [DEMS, 1998] uses an “artistic” match of elevation edges. If the
road has a break on the borderline of two maps then a “corrected” road sec-
tion starts at some distance from the border on both sides and connects two
disparate lines. This new line is artistically perfect, but no real road may ex-
ist on the ground in that location (see Figure 1).
Thus, conflation and registration are typical and important parts of the geospatial decision-making process, which is highly visual by its nature. The demand for visual geospatial decision making comes from ecology, geography, geology, archeology, urban planning, agriculture, the military, intelligence, homeland security, disaster relief operations, rescue missions, and construction tasks. The range of examples is abundant and includes tasks such as:
• assessment of mobility/trafficability of an area for heavy vehicles,
• dynamic assessment of flood damage, and
• assessment of mobility in the flood area.
Traditional cartography provided paper maps for analysis and decision-making in these domains. Recently there has been a massive proliferation of spatial data, and the traditional paper map is no longer the final product of cartography. In fact, the focus of cartography has shifted from map production to the management, combination, and presentation of spatial data. Maps can be (and often are) produced on-demand for any number of specialized purposes. Unfortunately, data are not always consistent. As such, data combination (or conflation [Jensen, Saalfeld, Broome, Price, Ramsey & Lapine, 2000; Cobb, Chung, Foley, Petry, Shaw & Miller, 1998; Rahimi, Cobb, Ali, Paprzycki & Petry, 2002]) has become a significant issue in cartography, where mathematical methods and advanced visual decision making can have a profound impact.
This task has a dual interest in the process of visual decision making (the focus of this book). First, conflation creates a base for making decisions such as the selection of transportation routes and construction sites. Second, many decisions are made visually in the process of conflation itself. Conflict resolution in matching features from different sources often requires visual inspection of maps and imagery. Feature f1 in a geospatial dataset may have two features f2 and f3 in another dataset that appear to match f1. An analyst using contextual visual information may resolve the ambiguity by deciding that f1 should be matched to f2 only. Such a visual problem-solving process may involve computing mathematical similarity measures between features, analysis of their names and other non-spatial attributes, as well as using the analyst's tacit knowledge about the context of the task and data.
The conflation problem is discussed in several chapters of this book. This chapter provides a wide overview and a conceptual framework including visual aspects. A task-driven approach is elaborated in Chapter 18. Chapter 19 describes an algebraic mathematical approach to conflation, and a combined algebraic, rule-based approach is presented in Chapter 21.
The expert's reasoning behind this decision could be as follows. The expert analyzed the whole image as a context for the decision. The expert noticed that both features F1 and F3 are small road segments and are parts of much larger road systems A and B that are structurally similar, but features F1 and F2 have no such link. This conclusion is very specific to a given pair of images and roads. The expert did not have any formal definition of structural similarity in this reasoning. Thus, this expert's reasoning may not be sufficient for implementing an automatic conflation system. Moreover, the informal similarity the expert used for one pair of images can differ from the similarities the same expert might use for two other images.
There are two known approaches for incorporating context: (1) formalize the context for each individual image and task directly, and (2) generalize the context in the form of expert rules. For the first approach, the challenge is that there are too many images and tasks, and there currently is no unified technique for context formalization. The second approach is more general and more feasible, but in some cases it may not match a particular context and task; thus a human expert still needs to "take a look." Visual decision making is necessary.
Consider a simple example. Suppose that a task-specific goal is to locate individual buildings at a spatial accuracy of ±20 meters. Suppose further that there are two spatial datasets available – one set with roads at ±5 meters and the other set containing both roads and buildings at ±50 meters. Obviously, neither dataset can properly answer the question. Now suppose the two datasets are conflated. Is the spatial accuracy of the new image ±20 meters or better? If so, the process is a success. If not, the users will have to either find new data or just accept the inaccuracies in one dataset. As a side note, it is important that the conflation process not be used to simplify datasets (i.e., combine two datasets into one and delete the original data), but rather to answer specific questions.
The conflation task includes:
• combining geospatial data (typically represented by geospatial
features),
• measuring conflict in the combined data,
• deconflicting the combined data, and
• testing the appropriateness of the conflation relative to the stated
problem/task definition.
A single common flexible framework is needed that will integrate diverse types of spatial data with the following capabilities [Jensen et al., 2000]: (1) horizontal data integration (merging adjacent data sets), (2) vertical data integration (operations involving the overlaying of maps), (3) temporal data integration, (4) handling differences in data content, scales, methods, standards, definitions, practices, (5) managing uncertainty and representation differences, (6) detecting and dealing with redundancy and ambiguity of representation, (7) keeping some items unmatched, and (8) keeping some items matched with limited confidence.
Several challenges have been identified in [Mark, 1999].
• The representational challenge – finding a way of merging spatial data from a variety of sources without contradiction. Often this challenge cannot be fully met.
• The uncertainty challenge – finding a way of measuring, modeling, and summarizing inconsistencies in merged data. Often inconsistencies are inevitable in merging spatial data.
• The visualization challenge – finding a way to visualize differences between different digital representations and real phenomena.
Figure 2. Framework for the task-driven conflation process. (The flowchart links a specific task A to: a measure of matching specific to task A; a conflation method specific to task A; and visualization and visual correlation tools. The conflation result is measured; if the conflation is not good, the conflation method is modified and the loop repeats; otherwise the result of conflation for task A is visualized and used.)
This design of the system integrates analytical and visual problem solving processes. We note that different tasks may well require different accuracies and different measures of correctness of the conflation.
The system of task-specific measures of correctness of the conflation serves as a repository of such measures as a part of the conflation knowledge base. Different goals along with measures may also require different conflation methods.
The system of task-specific conflation methods is another component of
the conflation knowledge base. The system of visualization and visual
correlation tools opens the way for visualizing conflicts in the data being
conflated, for portraying relations, and for helping to discover relations.
In Figure 2, a specific task, Task A, is matched to both knowledge bases
(1) and (2) and measures and methods specific for Task A are retrieved. Then
the specific conflation method is applied to Task A and the result is tested
using the measure of correctness specific to Task A. If the result is appropri-
ate for Task A, then it is visualized and delivered to the end user. Otherwise,
the parameters of the conflation method are modified and the procedure is
repeated until an acceptable level is achieved.
The conflation process can use a variety of data along with metadata. The geometry and topology data classes that provide information on points, vectors, and structure are the most critical data classes.
Quality metadata include: (1) statistical information such as random er-
ror, bias characteristics of digital terrain elevation data, and location error for
a feature, and (2) expert-based information such as “topologically clean,”
“well matched,” “highly contradictory,” “bias,” “consistent around poly-
gons,” and “noticeable” (e.g., noticeable can mean edge breaks of approxi-
mately 1 to 3 vertical units of resolution).
Typically, data quality is assessed using measures such as accuracy, pre-
cision, completeness, and consistency of spatial data over space, time, and
theme. It is identified relative to the database specifications (i.e., if the data-
base specification states that objects must be located within ±100 meters, and
all objects are located to that accuracy, the data is 100% accurate). As such,
appropriate use (or combination) of data is always relative to both the de-
sired output accuracy stated in the goal and the quality of the input data.
RGi: F1 → F2.
Such RGi rules are called task-driven rules. Another set of rules in the KB are the task-free rules, which are used when the user is uncertain about the goals of data conflation and cannot formulate a definitive goal for conflation. A subset of these rules, TG, can be tested for randomly selected potential goals and matched by the time user-identified task rules are ready to be fired. The rule below is an example of a task-free conflation rule:
Zitova and Flusser’s [Zitova & Flusser, 2003] survey of conflation meth-
ods and classify them according to their nature (area-based and feature-
based) and according to the four basic steps of image registration: feature
detection, feature matching, mapping function design, and image
transformation and re-sampling. This comprehensive review concludes with
the following statement: “Although a lot of work has been done, automatic
image registration still remains an open problem. Registration of images with
complex nonlinear and local distortions, multimodal registration, and regis-
tration of N-D images (where N > 2) belong to the most challenging tasks at
this moment. . .The future development on this field could pay more atten-
tion to the feature-based methods, where appropriate invariant and modality-
insensitive features can provide good platform for the registration. In the fu-
ture, the idea of an ultimate registration method, able to recognize the type of
given task and to decide by itself about the most appropriate solution, can
motivate the development of expert systems. They will be based on the com-
bination of various approaches, looking for consensus of particular results.”
Zitova & Flusser’s vision of the future reflects the process depicted in Figure
2.
The intent here is not to summarize the more than 1000 papers on regis-
tration and conflation published in the last 10 years nor repeat the excellent
surveys [Zitova & Flusser, 2003; Brown, 1992], but to review some of the
techniques developed by the image processing community for integration of
raster and vector images relevant to this book. It is also important to note the significant activities of the medical imaging community in image integration/registration, such as the Second International Workshop on Biomedical Image Registration (WBIR'03) in Philadelphia. Below we list some representative methods from recent research on registration and spatial correspondence presented at this workshop for rigid and non-rigid image registration, based on: vector field regularization, curvature regularization, spatio-temporal alignment, entropy, mutual information, variational curve matching, normalized mutual information, K-means clustering, shading correction, piecewise affine transformation, elastic transformation, multiple channels, modalities and dimensions, similarity measures, the Kullback-Leibler distance, block-matching features, voxel class probabilities, intensity-based 2D-3D spines, and orthogonal 2D projections. Despite similarities, there are significant differences between medical and geospatial imagery. Medical imagery is produced in a more controlled environment with less dynamics and variability in resolution, but often more metadata.
Hirose et al. [Hirose, Furuhashi, Kitamura & Araki, 2001] do not assume
that fiducial points are known. Rather they automatically extract four corre-
sponding points from images. These points are used to derive an affine trans-
formation matrix, which defines a mutual position relationship between two
consecutive range images. Such images can then be concatenated using the
derived affine transformation.
Wang et al. [Wang, Chun & Park, 2001] use GIS-assisted background
registration that discerns different clutter regions in the initial image frame
by using a feature vector composed of vertical and horizontal autocorrela-
tion. The authors also build filters tuned to each class. In the successive
frames, they classify each region of different clutter from a contour image
obtained by projecting the GIS data and by registering it to the previous im-
age.
Finally, the reader is directed to [Bartl & Schneider, 1995] who demon-
strate that knowledge of the geometrical relationship between images is a
prerequisite for registration. Assuming a conformal affine transformation,
four transformation parameters are determined on the basis of the geometri-
cal arrangement of characteristic objects extracted from images. An algo-
rithm is introduced that establishes a correspondence between (centers of
gravity of) objects by building and matching so-called angle chains, a linear
structure for representing a geometric (2D) arrangement.
Several other related areas that face similar challenges are image retrieval
from multimedia databases that use image content [Shih, 2002], multimedia
data mining [Perner, 2002], and information extraction from heterogeneous
sources [Ursino, 2002]. Progress in each of these areas along with progress
in image integration should prove to be mutually beneficial.
Pope and Theiler [Pope & Theiler, 2003] and Lofy [Lofy, 2000] apply
image photogrammetric georeferencing using metadata about sensors com-
bined with the edge extraction and matching at three levels of resolution.
The last of these systems has been used to automatically register synthetic
aperture radar (SAR), infrared (IR), and electro-optical (EO) images within a
reported 2-pixel accuracy.
Growe and Tonjes [Growe & Tonjes, 1997] present a rule-based ap-
proach to imagery registration for automatic control point matching when
flight parameters are inaccurate. Prior knowledge is used to select an appro-
priate structure for matching, i.e. control points from a GIS database, and to
extract their corresponding features from the sensor data. The knowledge is
represented explicitly using semantic nets and rules. The automatic control
point matching is demonstrated for crossroads in aerial and SAR imagery.
A recent special issue of Computer Vision and Image Understanding
Journal [Terzopoulos, Studholme, Staib & Goshtasby, 2003] is devoted to
non-rigid image registration based on point matching, distance functions,
free boundary constraints, iconic features and others. The next step in geo-
spatial data registration is video registration [Shah & Kumar, 2003], which
faces many of the same challenges as static image registration discussed
above.
The new use of algebraic invariants described in Chapter 19 is a very general image conflation method. For example, it can be used with digitized maps (e.g., in USGS/NIMA vector format), aerial photos, and SRTM data that have no common reference points established in advance. The images may have different (and often unknown) scales, rotations, and accuracy.
The method assumes that the map, aerial photo, or SRTM data images each have at least 5 well-defined linear features that can be presented as polylines (continuous chains of linear intervals). A feature on one image might be only a portion of the same feature on another image. Also, features might overlap or have no match at all. It is further assumed that these well-defined linear features can be relatively easily extracted. The major steps of the method are:
Step 1. Extract linear features as sets of points (pixels), S.
Step 2. Interpolate these sets of points S as a specially designed polyline.
Step 3. Construct a matrix P of the relation between all lengths of intervals on the polyline (see below for more details). These matrices are computed for all available polylines.
Step 4. Construct a matrix Q of the relation between all angles on the polyline. These matrices are computed for all available polylines.
Step 5. Search for common submatrices in the set of matrices P and compute a measure of closeness.
Step 6. Search for common submatrices in the set of matrices Q and compute a measure of closeness.
Step 7. Match features using the closest submatrix.
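A sketch of Steps 3 and 4 under one plausible reading of "relation", namely length ratios (scale-invariant) and heading differences (rotation-invariant); the function name and the exact form of both relations are illustrative assumptions rather than the method's definitive formulas:

```python
import numpy as np

def polyline_matrices(points):
    """Candidate invariant matrices for one polyline, where `points` is an
    (m, 2) array of vertices:
      P[i, j] = length_i / length_j   (ratios of interval lengths)
      Q[i, j] = angle_i - angle_j     (differences of interval headings)"""
    pts = np.asarray(points, dtype=float)
    seg = np.diff(pts, axis=0)              # interval vectors of the polyline
    lengths = np.hypot(seg[:, 0], seg[:, 1])
    angles = np.arctan2(seg[:, 1], seg[:, 0])
    P = lengths[:, None] / lengths[None, :]
    Q = angles[:, None] - angles[None, :]
    return P, Q
```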
At first glance, it is not clear how useful a visual approach can be for al-
gebraic, rule-based and task-driven problem solving. The algebraic approach
described in Chapter 19 benefits from visuals by having visual spatial rela-
tions as a source of intuitive problem understanding. It provides insight for
discovering an algebraic formalization of human conflation process. For in-
stance, the matrix for relations between angles of the line segments captures
a human way of analyzing similarities between two lines by noticing that
relations between angles are similar in both lines. This insight is used in the
algebraic approach described in Chapter 19.
individual canals, roads, and islands, which can change the calculation of their topological invariants. Study is needed to determine the extent to which Betti numbers and Euler characteristics would be useful for matching incompletely defined spatial objects. Similarly, the extent to which Betti numbers and Euler characteristics would be useful for matching spatial objects that are defined with errors, such as roads and canals with incorrect connections, needs further consideration.
Dey et al. [Dey, Edelsbrunner & Guha, 1999] address two important computational topology problems in cartography: (1) rubber sheets and (2) cartograms. These problems involve two purposeful deformations of a geographic map:
(1) bringing two maps into correspondence (e.g., two geographic maps may need to be brought into correspondence so that mineral and agricultural land distributions can be shown together), and
(2) deforming a map to reflect quantities other than geographic distance and area (e.g., population).
Both tasks (1) and (2) belong to the group of problems that consist of
matching similar features from different databases. However, there is an im-
portant difference between them and our spatial object correlation task.
Dey and Guha [Dey & Guha, 1998] assume that n reference matched
points are given between the two maps for rubber sheeting: “To model this
problem let P ⊆ M and Q ⊆ N be two sets of n points each together with a
bijection b: P → Q. The construction of a homeomorphism h: M → N that
agrees with b at all points of P is popularly known as rubber sheeting [Gill-
man, 1985]. Suppose K and L are simplicial complexes whose simplices
cover M and N: M = |K| and N = |L|. Suppose also that the points in P and
Q are vertices of K and L: P ⊆ Vert K and Q ⊆ Vert L, and that there is a
vertex map v: Vert K → Vert L that agrees with b at all points of P . The ex-
tension of v to a simplicial map f : M → N is a simplicial homeomorphism
effectively solving the rubber sheet problem.”
Dey and Guha [Dey & Guha, 1998] also review variations of the construction of such complexes K and L. They note that [Aronov, Seidel & Souvaine, 1993] consider simply connected polygons M and N with n vertices each, where they show there are always isomorphic complexes |K| = M and |L| = N with at most O(n²) vertices each. They also prove that sometimes Ω(n²) vertices are necessary, and they show how to construct the complexes in O(n²) time. In addition, Dey notes that [Gupta & Wenger, 1997] solve the same problem with at most O(n + m log n) vertices, where m is the minimum number of extra points required in any particular problem instance.
Below we discuss the relationship between our framework and the tasks discussed by [Bern et al., 1999].
Shape Reconstruction from Scattered Points is an important part of our algebraic approach. Methods for reconstructing a linear feature shape using a criterion of maximum local linearity can be developed using this approach.
Shape Acquisition. Matching/correlating spatial objects requires shape acquisition from an existing physical object. We believe the solution of this problem depends on developing a formal mathematical definition for features such as roads and drainage systems as complex objects, in terms that coincide with the USGS/NIMA Topological Vector Profile (TVP) concept.
Shape Representation. Bern et al. [Bern et al., 1999] list several repre-
sentation methods: unstructured collections of polygons (“polygon soup”),
polyhedral models, subdivision surfaces, spline surfaces, implicit surfaces,
skin surfaces, and alpha shapes.
In light of this, we believe that an algebraic system representation such as that described below, which includes scale-dependent but robust invariants, axioms, and permissible transformations, should be developed.
Topology Preserving Simplification. It is critical for many applications
to be able to replace a polygonal surface with a simpler one. However, such a
process is “notorious for introducing topological errors, which can be fatal
for later operations” [Bern et al., 1999].
A generic approach developed in [Cohen, Varshney, Manocha, Turk, Weber, Agarwal, et al., 1996] can be used, in which a simplified 2-manifold is fitted into a shell around the original. In addition, we are developing a specific method, based on similar ideas, for the simplification of the linear features critical for the algebraic approach.
Our goal is to enforce algebraic invariants using simplification. The
main idea is that the simplification criteria should not be completely defined
in advance but be adjusted using extensive simulation experiments and
machine learning tools to identify the practical limits of robustness of the
algebraic invariants.
This includes the classification of linear features to identify simplification
options. For example, let a be a linear feature, b its simplification, λ a meas-
ure of the closeness between a and b with λ(a,b) < δ, where δ is a limit of
acceptable simplification. This limit can be developed as an adjustable pa-
rameterized function of a feature a, δ = δ(a).
lines and objects on each map preserves absolute elevations and locations on
the border connecting maps, and avoids discontinuity of the objects on the
border.
In situations where consistent conflation is impossible, some non-linear distorting conflation methods are used as part of the USGS standard. These methods differ in the number of neighboring elevation profiles involved in the interpolation. The number of profiles depends on the number of vertical resolution units where the edge breaks. For such situations, measures of correctness of conflation are compositions of:
• measures of the distortion of relative distances on both maps,
• measures of the distortion of absolute elevations and locations of
objects on the edge and on the interpolated profiles,
• measures of the discontinuity on the border, and
• measures of the distortion of topology of objects due to compos-
ing two maps.
We represent each of these measures in one of the three measure forms described in the previous section: the pessimistic, optimistic, and average measure forms.
Traditionally, all measures use a Euclidean separation distance between parts of the spatial object. For instance, the standard measure of vertical distortion used in digital elevation models (DEM) is a root-mean-square error (RMSE) statistic, E, an average Euclidean closeness between elevation data.
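For reference, the RMSE statistic has the standard form

$$E = \sqrt{\frac{1}{n}\sum_{i=1}^{n}\left(z_i - \hat{z}_i\right)^2},$$

where the $z_i$ are the elevations in one dataset and the $\hat{z}_i$ are the corresponding reference or interpolated elevations; this is the standard definition, restated here for completeness.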
Next, two thresholds, T1 and T2, are set up for these error statistics.
Figure 4. Membership functions for desired, temporarily retained, and rejected RMSE values (membership value, 0-1, versus meters, 0-26).
Figure 5 shows a more flexible version of these functions with wider sets
of uncertain values between desired, retained temporarily, and rejected val-
ues of RMSE, which can be more realistic measures in some tasks.
Figure 5. Adjusted RMSE: membership functions with wider regions of uncertain values (membership value, 0-1, versus meters, 0-26).
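A minimal sketch of such a membership function, with hypothetical task-specific thresholds standing in for T1 (the "desired" boundary) and T2 (the "reject" boundary), both in meters:

```python
def rmse_membership(rmse_m, t1=7.0, t2=15.0):
    """Trapezoidal membership of an RMSE value (meters): 1 means desired,
    0 means rejected, and intermediate values mean 'retain temporarily'.
    The thresholds t1 and t2 are hypothetical, task-specific settings."""
    if rmse_m <= t1:
        return 1.0
    if rmse_m >= t2:
        return 0.0
    return (t2 - rmse_m) / (t2 - t1)
```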
These membership functions are constructed and stored for each specific task in the conflation knowledge base, providing task-specific formalizations of the concepts "desired," "retain temporarily," and "reject." Different tasks may require different levels of distinction and different definitions, as illustrated in Figures 4 and 5. The value of RMSE is a property of the data and exists independently of the task, but its evaluation is task-specific. A fuzzy logic approach also permits introducing a hierarchy of membership functions to capture a variety of task-specific evaluations. Next, we consider a more general approach based on context spaces [Kovalerchuk & Vityaev, 2000] that can capture a richer collection of task specifics and context.
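For illustration, a minimal sketch of such task-specific membership functions follows; the breakpoints below are invented for the example, where in practice they would be read from the conflation knowledge base:

```python
def shoulder_down(x, c, d):
    """1 up to c, falling linearly to 0 at d (e.g., 'desired' RMSE)."""
    return 1.0 if x <= c else 0.0 if x >= d else (d - x) / (d - c)

def shoulder_up(x, a, b):
    """0 up to a, rising linearly to 1 at b (e.g., 'reject' RMSE)."""
    return 0.0 if x <= a else 1.0 if x >= b else (x - a) / (b - a)

def trapezoid(x, a, b, c, d):
    """Rises on [a, b], equals 1 on [b, c], falls on [c, d] ('retain')."""
    return min(shoulder_up(x, a, b), shoulder_down(x, c, d))

# Hypothetical task-specific breakpoints, in meters of RMSE:
desired = lambda r: shoulder_down(r, 4, 8)
retain  = lambda r: trapezoid(r, 4, 8, 12, 16)
reject  = lambda r: shoulder_up(r, 12, 16)

print(desired(6.0), retain(6.0), reject(6.0))  # partial memberships at 6 m
```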
4. VISUALIZATION
There are several stages in the problem-solving process: discovering, implementing (in hardware and software), using, and presenting results. For the whole process to be visual, all major steps should be visual. A completely visual step is performed visually, represented visually (visualized), and animated when the step is dynamic.
Most visualization activities concentrate on two stages: using a well-defined process for solving tasks and representing results [Mille, 2001]. Visualization of these stages speeds up the process and assures quality control. Examples are abundant, e.g., finding North using stars such as Polaris, located via the Big Dipper and Little Dipper. The animation of the Pythagorean Theorem proof shown in Figure 4 of Chapter 1 provides another example of a visual process. The first example has been very useful for solving the navigation problem for centuries, and the second has been used in education for two millennia. AutoCAD provides examples of the visual implementation stage. Much less has been done for visualizing process discovery. Typically, at this stage, visuals serve as informal insight for developing a process algorithm. Historically, such insights have played a critical role in the whole problem-solving process; consider again Albert Einstein's testimony quoted in Chapter 3. Discovery is the most challenging and creative stage of problem solving, and visual and spatial data mining have recently emerged as major tools for it.
Modern geospatial studies form a natural domain for advanced visual decision-making techniques, where typical spatial relations between objects, such as larger, smaller, above, and below, are visual by their nature. Cognitive science research reviewed in Chapter 3 indicates that human reasoning with such spatio-visual relations is more efficient than with relations that are visual but not spatial (e.g., cleaner – dirtier).
As an illustration, consider two vector image data sets consisting of 1497
line segments and 407 line segments respectively. Are these two images of
the same scene? Figures 6 and 7 display both the data sets and the parts of
the sets that are held in common and those that are unique.
Figure 6. Vector image data sets of 1497 and 407 line segments
Figure 7. Common (a) and unique (b) segments for the vector data sets in Figure 6. These
segments were found using methods described in Chapter 19.
The number of common polylines supports the conclusion that these are
two images of the same scene. The number of polylines not in common illus-
trates the need for additional information to know whether these are the re-
sult of incomplete or faulty feature spaces, different image resolutions, or
changes in the scene between the times of acquisition. With additional in-
formation from the metadata associated with this vector data or perhaps from
other sources, it may be possible to quantify the differences that this visuali-
zation makes clear.
A tight link between visual decision-making and conflation is clarified
when we notice that the decision-making task provides a specific goal for
conflation. We call such a conflation approach a task-driven (or mission-specific) conflation. A single conflation agent can carry out only a single task, while a community of agents can carry out a wide array of tasks. Such agents operate with multiple feature representations and
resolve conflation conflicts using rules or other conflict resolution strategies
according to the task at hand. For instance, if the task is a global strategic
overview of country's conditions, then a strategic conflation agent is acti-
vated. If the goal is to support a local reconnaissance unit, then the system
monitor should activate a specialized local reconnaissance conflation
agent.
In [Doytsher et al., 2001; Rahimi et al., 2002] hierarchies of spatial con-
flation agents (CA) are suggested. The top-level agents are defined accord-
ing to the type of matching geometric elements they use: points, lines, or ar-
eas. Agents based on matching points are called point agents. Next, point agents are classified into subcategories; for instance, some agents can match images using corner points on buildings (building agents).
Other agents can match images using distinguishing points on roads such
as intersections and turning points. Agents based on lines (line agents) can
be classified further, similarly, as building, road, or railroad agents. For instance, a line building agent matches images using lines on buildings such as roof lines. Agents based on areas are area agents (e.g., a lake agent). The rea-
soning behind the introduction of the dynamic selection of conflation agents
versus a static approach with a single conflation agent is the flexibility of the
dynamic approach. The dynamic approach is task-specific; a specialized
agent can be selected depending on the task at hand. A dynamic system can
monitor and map discrepancies related to a specific user's task and select an
appropriate conflation agent to resolve the problem.
The prototype described in [Rahimi et al., 2002] provides a set of menus
for a user: (i) to declare the conflation tools and their parameters to be used,
(ii) to select a method for determining matched features, and (iii) to select a method for evaluating links. Below we briefly describe some conflation
agents [Doytsher et al., 2001].
The point agent (PA1)
• selects points (nodes) as counterpart features and
builds local rubber-sheeting transformations based on selected
counterpart points, and
converts one map to the other by using the found transforma-
tions.
This agent is adequate only for cases where rubber-sheeting between known control points does not create significant errors in matching intermediate points that differ from the control points.
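A minimal sketch of such a local rubber-sheeting step follows. Using scipy's griddata to interpolate control-point displacements is our illustrative choice; the prototype's actual transformation is not specified in the text:

```python
import numpy as np
from scipy.interpolate import griddata

def rubber_sheet(points, src_ctrl, dst_ctrl):
    """Shift every point of one map by the displacement field observed
    at counterpart control points (src_ctrl[i] matches dst_ctrl[i])."""
    points = np.asarray(points, float)
    src = np.asarray(src_ctrl, float)
    shift = np.asarray(dst_ctrl, float) - src
    warped = points.copy()
    for axis in (0, 1):
        d = griddata(src, shift[:, axis], points, method="linear")
        # nearest-neighbour fallback outside the control-point hull
        nn = griddata(src, shift[:, axis], points, method="nearest")
        warped[:, axis] += np.where(np.isnan(d), nn, d)
    return warped
```

Errors grow for intermediate points far from any control point, which is exactly the limitation noted above.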
The line agent (LA1) conflates maps by running code that implements
the following sequence of algorithmic steps:
• detecting counterpart linear features,
where aik identifies attribute ak for feature fi and vik identifies the value of attribute ak for feature fi; the terms for feature fj on the second line are defined similarly. For matching numeric attributes ak, membership matching functions are used; for linguistic attributes, similarity tables are used. The overall Matching Score (MS) for attributes combines Sk(fi, fj), the similarity function between features fi and fj for their attribute ak; N, the number of attributes that are common to both features fi and fj; and Wk, the weight computed by the rule-based expert system for the attribute ak. For instance, a rule for computing the weights Wk and Wm for attributes ak and am could be:

If v1k = 1 & v2k = 3 & v1m = v2m = 10, then Wk = 0.8 & Wm = 0.4.
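The MS formula itself did not survive in our copy; a weighted-average form consistent with the description above (an assumption, not necessarily the authors' exact expression) can be sketched as:

```python
def matching_score(f_i, f_j, sim, weights):
    """Hypothetical Matching Score: average of the weighted similarities
    Wk * Sk(fi, fj) over the N attributes common to both features."""
    common = f_i.keys() & f_j.keys()          # shared attribute names
    if not common:
        return 0.0
    return sum(weights.get(k, 1.0) * sim[k](f_i[k], f_j[k])
               for k in common) / len(common)

# Example: one numeric attribute matched by a membership-style function
# and one linguistic attribute matched by a similarity table.
sim = {
    "width":   lambda u, v: max(0.0, 1.0 - abs(u - v) / 10.0),
    "surface": lambda u, v: {("paved", "paved"): 1.0,
                             ("paved", "gravel"): 0.3}.get((u, v), 0.0),
}
road_a = {"width": 7.0, "surface": "paved"}
road_b = {"width": 9.0, "surface": "gravel"}
print(matching_score(road_a, road_b, sim, {"width": 0.8, "surface": 0.4}))
```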
Lakin [Lakin, 1987; Lakin, 1994] described Visual Agents (VAs) as software entities that assist people in performing graphical tasks, such as
making a text-and-graphic record of a group meeting, live and on-the-fly,
and showing it to participants during the meeting to enhance collaboration.
The group members can see the record as it is being made, offering sugges-
tions and corrections. For instance, a visualization agent can act as a white-
board assistant helping to graphically record the conversation and concepts
of a working group on a large display. The visualization agent can help display, on the fly, the global objectives, immediate goals, tools, factual data, and R&D options being discussed by a group making a decision on business strategy.
Similarly, a conflation visualization agent may have complete access to an imagery analyst's actions while the analyst performs conflation in collaboration with other analysts and software conflation agents. Thus, it can support true visual human-computer collaboration in problem solving. The visual agent includes computational engines for processing text-graphic activity, both the static images resulting from the activity and the actual moment-to-moment dynamics of the activity itself.
6. CONCLUSION
A single conflation agent can carry out only a single task, while a community of agents can carry a wide array of tasks. Such
agents operate with multiple feature representations and resolve conflation
conflicts using rules or other conflict resolution strategies according to the
task at hand.
7. ACKNOWLEDGEMENTS
9. REFERENCES
Lynch, M. and Saalfeld, A., Conflation: Automated map compilation -- A video game ap-
proach, Proceedings, AutoCarto 7, 1985, Washington, 343-352.
Mark, D., Geographic information science: critical issues in an emerging cross-disciplinary
research domain, NSF workshop to assess the needs for basic research in Geographic In-
formation Science and Technology, January 14-15, 1999.
Mille, H. (Ed.), Geographic Data Mining & Knowledge Discovery, Taylor and Francis, 2001.
McKeown, Jr., D.R., The role of artificial intelligence in the integration of remotely sensed data with geographic information systems, IEEE Transactions on Geoscience and Remote Sensing, GE-25(3), 330-348, 1987.
Perner, P., Data Mining on Multimedia Data, Springer, 2002.
Pope, P., Theiler, J., Photogrammetric Image Registration (PIR) of MTI Imagery, Space and Remote Sensing Sciences Group, Los Alamos National Laboratory, Los Alamos, NM, 2003, http://nis-www.lanl.gov/~jt/Papers/pir-spie-03.pdf
Rahimi, S., Cobb, M., Ali, D., Paprzycki, M., Petry, F. A Knowledge-Based Multi-Agent
System for Geospatial Data Conflation, Journal of Geographic Information and Decision
Analysis, 2002, Vol. 6, No. 2, pp. 67-81
Shah, M., Kumar, R., (Eds.) Video Registration, Kluwer, 2003
Shih, T., Distributed Multimedia Databases: techniques and applications, Idea Group Publ.,
2002.
Terzopoulos, D., Studholme, C., Staib, L., Goshtasby, A. (Eds.), Nonrigid Image Registration, special issue of Computer Vision and Image Understanding, vol. 89(2-3), pp. 109-319, 2003.
Ursino, D., Extraction and exploitation of intensional knowledge from heterogeneous information sources, Springer, 2002.
Wang, J., Chun, J., and Park, Y.W. GIS-assisted image registration for an onboard IRST of a
land vehicle. Proc. SPIE Vol. 4370, p. 42-49, 2001
Zitová B., Flusser J., Image registration methods: a survey, Image and Vision Computing. 21
(11), 2003, pp. 977-1000
Chapter 18
MULTILEVEL ANALYTICAL AND VISUAL
DECISION FRAMEWORK FOR IMAGERY
CONFLATION AND REGISTRATION
Abstract: This chapter addresses imagery conflation and registration problems by pro-
viding an Analytical and Visual Decision Framework (AVDF). This frame-
work recognizes that pure analytical methods are not sufficient for integrating
images. Conflation refers to a process similar to, but more complex than, what is traditionally called registration, in the sense that at least some conflicting information predates it and a post-conflation evaluation postdates it. The conflation process addresses the case of two or more data sources, each of which has inaccuracies and none of which is perfect. The chapter covers complexity space, conflation levels, error structure analysis, and a rules-based conflation scenario. Without AVDF, the mapping between two input data sources is more opportunistic than definitive. A partial differential
equation approach is used to illustrate the modeling of disparities between data
sources for a given mapping function. A specific case study of AVDF for
pixel-level conflation is presented based on Shannon’s concept of mutual en-
tropy.
Key words: Imagery conflation, registration, analytical and visual decision framework,
complexity space, conflation level, rule base, entropy, mutual information.
1. INTRODUCTION
and security and monitoring, use imagery for diagnostics, measurement, fea-
ture extraction and decision making.
Scientists, engineers and managers base their decisions on increasingly complex and higher dimensional images captured by new instruments and/or generated using models and algorithms across vastly different scales. These images may be composed from different viewing angles with different physical characteristics and environment constraints. Common to these scientific and application efforts is the challenge of performing information assimilation from multiple-modality imagery sources to provide sufficient evidence so that a decision can be made with incomplete information and under operational constraints, e.g., in real time.
The Analytical and Visual Decision Making (AVDM) framework refers to a process of using visual environments by and/or for decision makers to acquire quality information in support of spatial decision making. This framework advocates a mission-specific approach (MSA). Mission specific is defined as a generic scope augmented with a specific task; e.g., map making is a generic job, whereas making a road map of area X is a mission-specific task. The integration of information from different maps is in general a
generic conflation job, whereas the conflation of roads for trafficability
evaluation using different data sources in a well-defined area will be a mis-
sion specific task. In other words, a conflation process will have a set of data
sources with a well-defined time, spatial, and attribute framework within
which conflicts can be modeled and managed. In general, the state space at-
tributes outside the conflict extents serve as the reference framework.
In general, visual decision making is a nonlinear process either purely
visual or combined with analytic means. For example, conflation can be ei-
ther a very complex decision making task for trafficability assessment under
a combat situation (visual) or a simple translation function for a well-mapped local street from high-resolution imagery (analytic). Prospecting in
oil and gas exploration is a typical conflation type decision making process
using combined visual and analytic means.
The purpose of conflation, according to the National Technology Alliance (NTA), is to create a third dataset that is better than either of the original sources by combining information from the two. The report by Swiftsure Spatial Systems Inc. [2002] concluded that no one has yet achieved fully automated conflation and that vector-to-imagery conflation is required for future development of the method.
Conflation is a process of identification, correction, and synthesis of dis-
parate information including individual features from multiple imagery (lit-
eral and non-literal) and/or vector sources. Conflation is of three types: vector-to-vector, imagery-to-imagery, and imagery-to-vector or vice versa.
For vector images with identified features (e.g., the ESRI shapefile format), conflation means finding features or segments both matched and unmatched. For raster imagery, theoretically, two images are conflated if the values of all corresponding pixels are equivalent, that is, the matching ratio R = 1.0 in the object space, given fully registered and calibrated spatial and spectral response. For conflation between vector and raster images, conflation means establishing correspondence for the features of all given objects.
AVDM framework for mission specific (task specific) conflation is a
process of using visuals to reduce the degrees of freedom and/or increase the
efficiency for identifying the relationships between data sets. Here the task is
a triple A=<G, K, D>, where G is a goal, K is domain specific knowledge
that includes the domain ontology and D is available data. The goal G for a
well-posed task also provides a criterion for identifying that the goal of the task A has been reached. In formal terms, the goal criterion can be expressed as some predicate, CG, such that if CG(A) = 1 (true) for the task A, then the goal G has been reached.
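A minimal sketch of the task triple and its goal criterion; the concrete criterion below is invented for illustration:

```python
from dataclasses import dataclass, field
from typing import Callable

@dataclass
class Task:
    """The task triple A = <G, K, D>: goal, domain knowledge, data."""
    goal: str
    knowledge: dict = field(default_factory=dict)
    data: dict = field(default_factory=dict)

# The goal criterion C_G: a predicate with C_G(A) = 1 (True) once the
# goal of task A has been reached.
GoalCriterion = Callable[[Task], bool]

def all_roads_matched(a: Task) -> bool:
    """Hypothetical criterion for a road-conflation task: every road
    feature has been matched or explicitly marked unmatched."""
    return all(f.get("match") is not None for f in a.data.get("roads", []))
```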
To a certain extent when a well-defined mapping function exists between
two data sets, registration can be treated as a simplified case of conflation.
Extensive review of registration methods can be found in [Brown, 1992; Zi-
tová & Flusser, 2003]. An NSF-funded research-planning workshop on ap-
proaches to combat terrorism [Moniz & Baldeschwieler, 2002] also listed
image registration as an open problem:
…an important area of research is the registration of images from differ-
ent times and modalities. Registration of such images onto a single coor-
dinate system is vital for automated analysis but can be extremely chal-
lenging. The lack of robust image registration algorithms remains a limit-
ing factor in many fields.
Registration is closely related to, but differs from conflation. The goal of
registration is to provide geo-reference for a pair of images without matching
individual features. Traditionally, registration is a process of seeking the
mapping function among all the data points using its subset of so-called con-
trol points. Conflation, on the other hand, includes the matching of features.
Thus, registration can provide a less specific match than conflation and, in
essence, conflation is a tougher challenge.
Diagnosis of source conflict and information change is characteristic of the conflation process, while seeking the mapping function between two images is the key to registration. Typical cases for conflation include the
matching of a subset of roads among the road networks or across a mapping
boundary, the mapping of extracted features across different scales or resolu-
tion, and the combining of information from sources with large spectral differences.
2. IMAGE INCONSISTENCIES
cause it is an average among all the potential conjugate points, whereas the solution for a specific set of features requires partitioning the feature sets. The limited extent of the feature mismatch makes the matching problem "local" in nature. In this case, neither the global nor the local solution alone may serve a mission-specific solution, but together they may. Furthermore, the local solution may differ from one road (Marines) to the next (Camp Lejeune); hence the mission-specific nature of conflation.
Figure 2. Local inconsistency between imagery source and
vector product. See also color plates.
Human activities and natural occurrences are the two most common
causes of the inconsistencies that occur in the imagery and its derived prod-
ucts. Among all the conflicts, vector-to-vector is the most studied case. Its
disparities originate frequently from multi-source data over a common ob-
servation area.
The disparities come from two sources, raw data and vector generation, and are due to scale, resolution, compilation standards, operator license, source accuracy, registration, sensor characteristics, currency, temporality, or errors [Edwards & Simpson, 2002].
The disparities produce two types of inconsistencies: spatial and attribute. Spatial disparities tend to be analytic, while attribute disparities are due to operators' decision-making based on intensities. Inconsistencies are also discrete in space and time, variable in magnitude and direction, and oftentimes incomplete. Below is a case study that illustrates the disparity structure given a pair of information sources and a mapping function.
\[
\begin{aligned}
\partial Y_T/\partial X_c &= -S\sin\theta \\
\partial X_T/\partial Y_c &= S\sin\theta \\
\partial Y_T/\partial Y_c &= -S\cos\theta \\
\partial X_T/\partial\theta &= S\sin\theta\,X_c + S\cos\theta\,Y_c \\
\partial Y_T/\partial\theta &= -S\cos\theta\,X_c + S\sin\theta\,Y_c
\end{aligned}
\]
Equation (4) calculates the total differential along the X direction using
the derived partial derivatives. This total differential includes nonlinear com-
ponents for angle measurements.
\[
\begin{aligned}
dX_T &= \frac{\partial X_T}{\partial X_s}\,dX_s + \frac{\partial X_T}{\partial X_c}\,dX_c + \frac{\partial X_T}{\partial Y_c}\,dY_c + \frac{\partial X_T}{\partial\theta}\,d\theta \\
&= dX_s - S\cos\theta\,dX_c + S\sin\theta\,dY_c + \left(S\sin\theta\,X_c + S\cos\theta\,Y_c\right)d\theta
\end{aligned}
\tag{4}
\]
Equation (5), similarly, calculates the total differential along the Y direc-
tion using the derived partial derivatives. This total differential is also a
combination of linear and nonlinear components. The Y components are
generally symmetrical to the total differential along the X direction.
\[
\begin{aligned}
dY_T &= \frac{\partial Y_T}{\partial Y_s}\,dY_s + \frac{\partial Y_T}{\partial X_c}\,dX_c + \frac{\partial Y_T}{\partial Y_c}\,dY_c + \frac{\partial Y_T}{\partial\theta}\,d\theta \\
&= dY_s - S\sin\theta\,dX_c - S\cos\theta\,dY_c + \left(-S\cos\theta\,X_c + S\sin\theta\,Y_c\right)d\theta
\end{aligned}
\tag{5}
\]
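Equations (4) and (5) translate directly into code; a small numeric sketch with illustrative values:

```python
import math

def total_differentials(S, theta, Xc, Yc, dXs, dYs, dXc, dYc, dtheta):
    """Total differentials dXT, dYT of equations (4) and (5) for the
    similarity-transformation mapping function."""
    c, s = math.cos(theta), math.sin(theta)
    dXT = dXs - S * c * dXc + S * s * dYc + (S * s * Xc + S * c * Yc) * dtheta
    dYT = dYs - S * s * dXc - S * c * dYc + (-S * c * Xc + S * s * Yc) * dtheta
    return dXT, dYT

# e.g., small control-point and angle errors at a point (Xc, Yc):
print(total_differentials(S=1.2, theta=0.05, Xc=1000.0, Yc=500.0,
                          dXs=0.3, dYs=0.2, dXc=0.5, dYc=0.4, dtheta=1e-4))
```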
In an attempt to relate the total disparity to the original information sources, equation (6) calculates the square of the total differential along both the X and Y directions as a function of the measurements.
Equations (1) through (5) represent the disparity structure that carries
an orthogonal component assumption with a description of various types of
disparity terms, their relationship and relative importance, while equation (6)
describes the combined total under the orthogonal assumption. Some of the
terms are data source related, and some of them only reveal themselves when
information sources are combined.
\[
\begin{aligned}
dX_T^2 + dY_T^2 ={}& dX_s^2 + S^2\cos^2\theta\,dX_c^2 + S^2\sin^2\theta\,dY_c^2 \\
&+ S^2\left(\sin^2\theta\,X_c^2 + \cos^2\theta\,Y_c^2\right)d\theta^2 - 2S\cos\theta\,dX_c\,dX_s \\
&+ 2S\sin\theta\,dY_c\,dX_s + 2S\sin\theta\,X_c\,d\theta\,dX_s + 2S\cos\theta\,Y_c\,d\theta\,dX_s \\
&- 2S^2\cos\theta\sin\theta\,dY_c\,dX_c - 2S^2\cos\theta\sin\theta\,X_c\,d\theta\,dX_c - \cdots
\end{aligned}
\tag{6}
\]
The simplified version of equation (6) is presented in equation (7) for the square of the total disparity, where one of the two information sources is designated the standard (S) information source and the other the conflicting (C) information source. This assignment is arbitrary because both of
them have their original source errors that lead to the conflicting informa-
tion when they are put together into the same framework viewed under a
selected mapping function. Before this step of disparity analysis, conflicting
information is simply a concept.
Equation (7) groups the total disparity budget, \(dX_T^2 + dY_T^2\), as follows:

Disparity due to the Standard source:
\[
dX_s^2 + dY_s^2 \quad \text{(translation disparity)}
\]

Disparity due to the Conflicting source:
\[
S^2\,dX_c^2 + S^2\,dY_c^2 \quad \text{(translation disparity)}
\]
\[
S^2\left(X_c^2 + Y_c^2\right)d\theta^2 \quad \text{(rotation disparity)}
\]
\[
2S^2\,d\theta\left(X_c\,dY_c - Y_c\,dX_c\right) \quad \text{(dependent terms)}
\]

Disparity due to cross terms of the Standard and Conflicting sources:
\[
\begin{aligned}
2S\,(&\sin\theta\,dY_c\,dX_s - \cos\theta\,dY_c\,dY_s - \sin\theta\,dX_c\,dY_s - \cos\theta\,dX_c\,dX_s \\
&+ \sin\theta\,X_c\,d\theta\,dX_s - \cos\theta\,X_c\,d\theta\,dY_s + \sin\theta\,Y_c\,d\theta\,dY_s + \cos\theta\,Y_c\,d\theta\,dX_s)
\end{aligned}
\tag{7}
\]
The total disparity, the left side of equation (6), describes the conflicting information in detail using sums of squares; its expansion, the right side of equation (6), has 30 different terms. Equation (7) presents a grouping of the 30 terms by disparity type, i.e., terms from the sources designated standard and conflicting, and cross terms between the standard and conflicting sources. After partial derivative analysis, and after placing the disparities into a standard-conflicting framework, the right side of equation (7) includes several types of disparities that form the total disparity structure:
• the translation disparity along the X and Y directions,
• its combination with the scaling factor, labeled as translation disparity due to the conflicting source in equation (7),
• the position-dependent rotation disparity, and
• the position-canceling but rotation-dependent disparity.
In equation (7), the cross terms between S and C are all mixed terms, either position independent or position dependent. The designation "due to standard source" is used for terms related to the standard information source, and "due to conflicting source" for the conflicting information source. The cross
terms are related to both standard and conflicting information sources. The
standard and conflicting terms are coded by their corresponding subscripts.
In essence, the integration of data has some prospect of providing better
information; but it also presents opportunities to introduce new errors after
conflation resolution, and potentially larger ones if the mapping function or
its parameter derivation is deemed incompatible with the data sources. This
makes AVDF an important component of conflation resolution. Among the
conflict and cross categories, the assessment of disparity terms that are loca-
tion dependent is key to visualization and decision-making.
The coupling and magnification of location dependent and angle-
measurement error terms arguably will be the most challenging disparity to
quantify. This necessitates a visual decision making contribution in addition
to analytic analysis. For example, if the translation error terms from the des-
ignated standard information source are ignored (as in the standard depend-
ent-and-independent variable analysis methods), the corresponding dispari-
ties have to be compensated by the other terms in the equation. This creates
additional inconsistencies in the subsequent analysis. The illustrated similar-
ity transformation disparity analysis example shows the fundamental differ-
ence between conflation and standard registration where one information
source is considered error free. For more complicated transformation cases,
direct linear transformations, or more relaxed mathematical models are re-
quired to map the spatial relationship between information sources. They
will result in much more complicated disparity terms than using the similar-
ity transformation equation as the mapping function. The state-of-the-art
methodology for conflation resolution gets around the complicated and nec-
essary disparity model by using vector attributes as a linking mechanism
[Edwards & Simpson, 2002], thus completely bypassing the spatial modeling
aspects in the first step of the process. AVDM recommends a
perspective of reducing the dependency of new source disparities on location
by reducing analytic scopes, using visual cues to aid the conflation process.
underlying model and quality of the data sources. Given specific data sets and
conflation objectives, the complexity space provides a first level framework
to guide the conflation process. For example, when a particular objective
calls for a very complex model, huge amounts of data, and very high accu-
racy requirements, the first-level AVDF analysis in the complexity space may indicate the impossibility of solution attainment. Subsequently, the com-
plexity space may provide visual decision making alternatives for a less de-
manding solution.
The complexity space provides a framework to view, identify, establish,
and partition the conflation objectives. The conflation levels in the next sec-
tion are proposed to provide a process and mechanism to navigate through
the compartmentalized complexity subspace when a conflation objective is
partitioned into actionable conflation levels. Disparity analysis provides a
mathematical framework and quantities for constructing a practical process
and its associated algorithms to estimate and/or solve the conflation problem
at a specific level. The goal of the AVDM framework is to put the complexity space, conflation levels, and disparity structure analysis into the proper geospatial region of interest, scale, and boundary. For example, in an emergency operation, a high-level decision-making official needs to know only the approximate location of a large-scale environmental disaster, such as a large oil tanker spill, to manage resource allocation, whereas a firefighter needs to know the exact location to rescue potential victims successfully. In addition, this AVDF provides a mechanism to categorize and compare different conflation scenarios, e.g., planning an airstrip for air trafficability vs. defining a route for tank columns in a hostile environment. Different conflation scenarios have different needs for visual and analytical information content integrated from multiple information sources for decision making. Synergistic integration of analytic and visual decision-making elucidates conflation scenarios better than either approach used individually.
4. CONFLATION LEVELS
[Figure: conflation levels. Lower-level conflation proceeds at the subpixel, pixel, and pixel-group levels; discovered disparities are diagnosed and propagated to be resolved at the upper levels, which supply the objectives and functions.]
Each of these levels Li has its own conflict between map and imagery that
needs to be resolved. Criteria for resolving conflicts at level Li come from the upper level Li+1. Table 2 below illustrates the types of conflict for each level.
5. SCENARIO OF CONFLATION
situation for which there exists a suitable conflation method. Thus, this is a
description of the situation and the conflation method. It is not necessary that
the method be as precise as that provided by GPS. For instance, for two satel-
lite images with metadata on sensor model and location, the gold standard
could be an orthorectification model. We consider both a conceptual
orthorectification model and a model populated with actual parameter values
using the description for the data source (sensor model and location).
The Knowledge Base (KB) contains descriptions of data sources (DDS)
to be conflated and prior knowledge (PK). Prior knowledge is information that can be used to populate a GS for a given data source type. It differs from data that comes directly from image metadata.
DDS can include metadata if available. Some pairs (DDS, PK) are matched with an individual GS by a matching function M, with M(DDS, PK) = GS (a sketch follows the scenario steps below). Function M is not fully defined: for some DDS and PK there may be no Gold
Standard. One of the reasons could be that DDS and PK are not complete.
Function M should be computed for every specific DDS and PK to produce
GS. The conflation scenario based on the Gold Standard consists of four
steps:
(1) Identification of Conflation Situation (CS) that includes identifica-
tion of DDS and PK,
(2) Identification of Gold Standard (GS) by computing
M(DDS, PK) = GS.
(3) Conflation using Gold Standard (This step is abbreviated as CG),
(4) Assess disparities for the features of interest.
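A minimal sketch of the partial matching function M; the conditions and field names here are illustrative assumptions, not the system's actual schema:

```python
from typing import Optional

def M(dds: dict, pk: dict) -> Optional[dict]:
    """Partial matching function M(DDS, PK) = GS: returns a populated
    Gold Standard, or None when DDS and PK are too incomplete."""
    if dds.get("sensor_model") and dds.get("location"):
        # e.g., two satellite images with sensor metadata: the gold
        # standard is an orthorectification model populated from DDS/PK
        params = dict(pk.get("defaults", {}))
        params.update(sensor=dds["sensor_model"], location=dds["location"])
        return {"type": "orthorectification", "parameters": params}
    return None  # no Gold Standard for this (DDS, PK) pair
```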
Figure 7. Linked conflation rules (fragment): rules CS-GS1 and CS-GS2 lead from the conflation situation to the gold standard; rules GS11-GS13 and GS16-GS18 cover use of modality, AOC, feature selection in (x, y) and Z, and conversion to XYZ for the rectified image; rules GS3-GS5, GS8, and GS9 cover quality evaluation of geometric primitives, the geometric primitive extending function, the conflation function, and calls for an expert; the virtual expert system and data sources are updated along the way.
The major source for identifying the conflation situation is the available data source description. An upper-level identification criterion for the conflation situation is given by a simple predicate, Available(<data source>) = Y/N. At the next level of detail, the same predicate is applied to metadata and prior knowledge: Available(<metadata>) = Y/N and Available(<prior knowledge>) = Y/N.
Here Metadata describe a specific data set and Prior Knowledge may be ap-
plicable to a variety of datasets.
Next, it is assumed that the datasets to be conflated are available, so that we do not need to test the predicate Available(<datasets>), but we still need to identify the types of data available: hardcopy, data that describe geospatial features (feature-based data), physical sensor parameters, Earth parameters, or data about a specific object.
These data types can be represented by predicates; for instance, Earth_parameter(<data source>) = Y/N and Hardcopy(<data source>) = Y/N. Table 4 and Figure 8 show upper-level rules for enriching the data sources and the conflation situation so that the data can be conflated.
The focus of this section is on the links between rules, in addition to what is provided in Figure 7. In essence, the virtual imagery expert can be thought of as a closed operating system complete with a set of linked hierarchical rules. Assuming that the rules are built first, the mission-specific approach (MSA) makes use of the rules to evaluate the disparity between two imagery data sources. The logic of dealing with disparity is presented in Table 5 with two rules: GS1 and GS2.
Rule GS1 can be further specified in its then-part. "Evaluate <disparity>" can be expanded into three steps, also encoded as rules:
• derive <disparity(data source 1, data source 2)>,
• model <disparity>, and
• evaluate <disparity>.
Table 5. Evaluate disparity using Gold Standard

Rule GS1 (Evaluate disparity).
If: <data_source1> is converted to <GS> and <features are in X,Y,Z coordinates>, and <data_source2> is converted to <GS> and <features are in X,Y,Z coordinates>.
Then: evaluate <disparity>; set flag = 1 if disparity is low; { set flag = 0 if disparity is high and use Rule CG2 }.

Rule GS2 (Modify).
If: <disparity> is high.
Then: obtain more information from <expert> or <other sources> for adjusting <GS>.
Figure 9 shows links between rules for building conflation rules based on
a gold standard and the analysis of prior knowledge. The logic of rule CG1 is
depicted in Figure 10.
Figure 9. Links between the conflation situation (CS) rules, prior knowledge, and the gold standard rule CG1.
Figure 10. The logic of rule CG1: depending on the outcome (no/yes), the rule sets Flag = 1 or Flag = 2.
Table 8 presents rules when geo-referencing <function F> does not apply
to the whole image. It deals with areas of interest (AOI) and with areas of
conflation (AOC). Rule GS18 deals with a specific AOC type, <flood_type>. Other types could be <war_type> or <fire_type>.
7.1.1 Entropy
Let pij be the probability of the joint occurrence of xi for the first variable and yj for the second. The joint entropy of the two variables X and Y is
\[
H(X, Y) = -\sum_{i,j} p_{ij}\,\lg p_{ij}
\]
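A minimal sketch of estimating this joint entropy, and from it the mutual information used later in this section, from two equally sized image windows (histogram binning choices are ours):

```python
import numpy as np

def entropy(z, bins=256):
    """H(Z) = -sum_i p_i * lg p_i from a window's intensity histogram."""
    p, _ = np.histogram(z.ravel(), bins=bins)
    p = p / p.sum()
    p = p[p > 0]                     # 0 * lg 0 is taken as 0
    return -np.sum(p * np.log2(p))

def joint_entropy(x, y, bins=256):
    """H(X, Y) = -sum_ij p_ij * lg p_ij from the joint histogram."""
    p, _, _ = np.histogram2d(x.ravel(), y.ravel(), bins=bins)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

def mutual_information(x, y, bins=256):
    """M(X, Y) = H(X) + H(Y) - H(X, Y)."""
    return entropy(x, bins) + entropy(y, bins) - joint_entropy(x, y, bins)
```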
Figure 11. A 1K×1K frame of Landsat imagery from an agricultural area in Idaho. Circular features nested in a regular grid in the imagery indicate a nominal irrigation pattern.
Figure 12. A set of histograms formed by dividing the imagery into the corresponding equal
quadrangles. The subplot in the center of the figure shows the histogram of the whole image.
The comparison of these histograms indicates that they are location and
size dependent (see also Figure 14). The location dependency forms the ba-
sis for spatial information correlation. The size dependency becomes a factor
in the efficacy of applying specific methods.
To reduce the dependency of the histogram of accumulative counts on window size, the histogram values are divided by the total number of points in the selected window, providing a probability density function p and
enabling the calculation of entropy. The transformation from probability
density distribution p to entropy E reduces all the information from the se-
lected window of data to only a single number, i.e., the number of bits over the area of interest.
This data compression results in quantifying the information from the se-
lected window as an average number of bits needed to convey radiance in-
tensity information for all the image pixels. This averaging nature is inher-
ited from the normalization process of converting the frequency counts to probabilities, in addition to the unit of measure provided by the base of the logarithmic operation in the entropy formula. Figures 12 and 13 illustrate the conversion of a histogram to a probability density function.
The magnitude of the vertical axis in Figure 12 is on the order of 10^4, while the vertical axis in Figure 13 is normalized to around 1/10 of the total number of points in the selected window. The horizontal axis represents pixel intensity values ranging from 0 to 100 out of 256 bins. Pi in Figure 13 indicates the probability for a given intensity digital number bin.
Figure 13. Illustration of the conversion of a histogram to probability. The entropy value calculated using Eq. 8 is 4.7576 bits, within the 8-bit range of Landsat data, as expected.
Figure 14. Entropy (H1, H2) and mutual information (M (I1,I2)) calculated based on the quad-
rangles 2 and 4 from Figure 13. Vertical axis on the left marks the measure for the individual
DN bin while the vertical axis on the right marks the mutual information in terms of number
of bits.
Figure 15. Defining imagery overlapping area using the maximum MI criteria. Top left is the
input imagery. Top right is the image chip for the center of the original imagery. Lower left shows the results of MI maximization. Lower right pictures the absolute entropy difference
between a given window from the original imagery and the imagery chip.
the white vector mapping the lake. Since the two vectors share a common origin, a difference vector results, marking the misalignment indicated by a curved arrow.
The curved arrow in Figure 16 points to the vector representation of mis-
alignment with components of offset, scaling and rotation. To show the
benefit of MI, this pair of radiance intensity images is first transformed to
entropy using a window size on the order of 60 pixels. The reduction in high
frequency signal from the entropy transformation demonstrates MI’s
capability of reducing local variability. Figure 16 also reveals that the
transformation of the radiance intensity to entropy is a non-literal one; e.g.,
both the river (dark pixels) and agriculture (bright pixels) can have medium to
high entropy. The comparison will be demonstrated using correlation of
data windows with size on the order of 600 pixels.
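A sketch of the entropy transformation follows. The window size of about 60 pixels is taken from the text; computing over non-overlapping blocks rather than a sliding window is our simplification for brevity:

```python
import numpy as np

def entropy_image(img, win=60, bins=256):
    """Replace each win x win block of a radiance-intensity image by the
    entropy (in bits) of its histogram, suppressing high-frequency
    local variability before correlation."""
    rows, cols = img.shape[0] // win, img.shape[1] // win
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = img[i * win:(i + 1) * win, j * win:(j + 1) * win]
            p, _ = np.histogram(block, bins=bins, range=(0, bins))
            p = p / p.sum()
            p = p[p > 0]
            out[i, j] = -np.sum(p * np.log2(p))
    return out
```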
Figure 17 illustrates the correlation of both radiance intensity and its en-
tropy images with 35-pixel lag in both horizontal and vertical directions. Ra-
diance intensity-based correlation is to the left side of the figure and entropy
to the right. The correlation window is centered near the locus of the image.
The comparison of these two correlations indicates that the entropy-based correlation is about 0.1 higher than the intensity-based one. Furthermore, entropy shows only one peak, while the intensity correlation appears to have a sharper peak but multiple potential solutions. Notice the improvement of the correlation coefficient for entropy over intensity and the existence of a unique peak for entropy.
Figure 17. Comparison of the cross correlation between intensity and entropy, with the rest of the parameters staying constant.
To further demonstrate the benefit of using entropy and MI, this process
is repeated over a 7 by 7 regular grid centered across the imagery pair. The
results are plotted in Figures 18 and 19 with radiance intensity-based correla-
tion displayed in Figure 18 and entropy-based results in Figure 19. There are in total 49 subplots in each of the correlation figures, corresponding to their locations within the image. The peaks, i.e., the maximum correlations, are the putative solutions for registration control points.
The advantages are at least twofold: 1) improved computational complexity, and 2) a consistent and unique solution for mapping control pairs across hundreds of pixels. These advantages could potentially pro-
vide solutions to map overlapping areas and provide the necessary inputs for
higher order registration beyond standard physical photogrammetry models.
One of the potential key contributions of the MI-based approach is its ability to match the required solution for a pre-selected model or method at similar spatial resolution and extent requirements. The ability of MI to transform imagery and its derived products into the quantity-of-information domain may provide the necessary mathematical foundation for conflation.
Figure 18. Intensity correlation: a non-unique and wrong solution. Radiance intensity-based correlation for a total of 49 locations centered on a 7×7 grid over the image pair in Figure 16. The connection of the correlation peaks does not have the same trend as the registration vector displayed in Figure 16. See also color plates.
Figure 19. Entropy correlation: the trend matches the data. Entropy-based correlation with the same layout as in Figure 18. The connection of the correlation peaks does have the same trend as the registration vector from Figure 16. See also color plates.
8. CONCLUSION
identification of the gold standard; (4) conflation rules based on the gold
standard. Future work includes developing and implementing more rules and
detailed quality assurance tools.
9. ACKNOWLEDGEMENTS
This research has been supported by the US National Imagery and Mapping Agency (currently NGA) and the US Department of Energy, which are gratefully acknowledged.
2. Decompose your 3-D errors from exercise 1 in a way similar to what was
done in equation (7) for 2-D.
4. Decompose your 3-D errors from exercise 3 in a way similar to what was
done in equation (7) for 2-D.
11. REFERENCES
Brown, L., A Survey of Image Registration Techniques, ACM Computing Surveys, vol. 24(4), pp. 325-376, 1992, citeseer.nj.nec.com/brown92survey.html
Edwards D., Simpson J. Integration and access of multi-source vector data, In: Geospatial
Theory, Processing and Applications, ISPRS Commission IV, Symposium 2002, Ottawa,
Canada, July 9-12, 2002.
Edwards, D., Simpson, J. Integrating, Maintaining, and Augmenting – Multi-source Data
through Feature Linking , In: OEEPE/ISPRS Workshop: From 2D to 3D; Establishment
and Maintenance of National Core Geospatial Databases, 8-10 October 2001, Hannover;
OEEPE, Frankfurt am Main, Germany, 2002,
Federal Geographic Data Committee, FGDC-STD-999.1-2000, 2000, http://www.bts.gov/gis/fgdc/introduction.pdf
Ghosh, S.K. Analytical Photogrammetry, Pergamon Press, New York, 1998.
Maes, F., Collignon, A., Vandermeulen, D., Marchal, G., Suetens, P., Multimodality image registration by maximization of mutual information, IEEE Trans. Medical Imaging, 1997, vol. 16, pp. 187-198.
Moniz, E., Baldeschwieler J., Approaches to Combat Terrorism (ACT): Opportunities for
Basic Research, Chantilly, VA, November 19-21, NSF, 2002,
http://www.mitre.org/public/act/10_22_final.pdf
Scott, D., Conflation for Feature Level Database (FLDB), NIMA National Technology Alli-
ance, 2003, http://www.nta.org/gi.htm.
Shannon, C. E. A Mathematical Theory of Communication. The Bell System Technical J. 27,
379-423 and 623-656, July and Oct. 1948.
http://cm.bell-labs.com/cm/ms/what/shannonday/shannon1948.pdf.
Shannon, C. E. and Weaver, W. Mathematical Theory of Communication. Urbana, IL: Uni-
versity of Illinois Press, 1963.
Studholme C., Hawkes D.J., and Hill D.L.G., An overlap invariant entropy measure of 3D
medical image alignment, Pattern Recognition, vol. 32, pp. 71-86, 1999.
Swiftsure Spatial Systems Inc., Near-term conflation requirements at NIMA, and existing conflation capabilities, Report, 2936 Phyllis Street, Victoria, BC V8N 1Z1, Nov. 30, 2002.
Wells, W. M., Viola, P., Atsumi, H., Nakajima, S., and Kikinis, R., Multi-modal volume reg-
istration by maximization of mutual information, Medical Image Analysis, 1(1):35--51,
1996
Zitová B., Flusser J., Image registration methods: a survey, Image and Vision Computing. 21
(11), 2003, pp. 977-1000
Chapter 19
CONFLATION OF IMAGES WITH ALGEBRAIC
STRUCTURES
Abstract: Spatial decision making and analysis depend heavily on the quality of image registration and conflation. An approach to conflation/registration of images that does not depend on identifying common points is being developed. It uses the method of algebraic invariants to provide a common set of coordinates to images using chains of line segments formally described as polylines. It is shown that the invariant algebraic properties of the polylines provide sufficient information to automate conflation. When there are discrepancies between the image data sets, robust measures of the possibility and quality of a match (measures of correctness) are necessary. Such measures are offered based on image structural characteristics. These measures may also be used to mitigate the effects of sensor and observational artifacts. This new approach grew from a careful review of conflating processes based on computational topology and geometry. This chapter describes the theory of algebraic invariants and a conflation/registration method with measures of correctness of feature matching.
Key words: data fusion, imagery conflation, algebraic invariants, geospatial feature, poly-
line match, measure of correctness, structural similarity, structural interpola-
tion
1. INTRODUCTION
The angles between segments and individual segment lengths are two al-
gebraic characteristics of polylines that can be used. For smooth features
extracted from images with comparable scales and resolutions, either com-
parison works well. When there are marked differences in image scale and
resolution, the choice of angles or lengths becomes more important. This
chapter examines characteristics of extracted polylines and how they may be
interpolated, compared, and used to conflate images.
2. ALGEBRAIC INVARIANTS
Properties:
• ∀ ai , aj ∈ A: ai ≥D aj or aj ≥D ai .
• ∀ ai , aj , ak , am ∈ A: (ai , aj) ≥L (ak , am) or (ak , am) ≥L (ai , aj).
This property is not easy to test because it requires matching equal ele-
ments of A and B in advance, which is a major goal of conflation.
Proof. To prove this theorem we note that the task is equivalent to find-
ing the largest common submatrix such as shown in Tables 1 and 2 below.
This submatrix should be centered on the diagonal of the two matrixes for a
and b as shown in Table 2. The total number of such matrixes is n+(n-
1)+(n-2)+…+2+1=(n+1)n/2. To find the largest common submatrix we
need to compare submatrixes of the same size i×i in both matrixes.
Table 1. Illustrative matrix for feature a

     L1  L2  L3  L4  L5  L6
L1    1   0   1   1   0   1
L2        1   0   1   0   0
L3            1   0   1   0
L4                1   1   0
L5                    1   1
L6                        1
This relation can be represented as a binary vector, t = (P1, P2, P3, ... , Pk,
... , Pn-1). For example, consider a specific collection of linear intervals that
generate a vector t1 = (1, 0, 1, . . . ). Note that this vector contains informa-
tion about the relative lengths of successive intervals. For example, t1 states
that a1 is no shorter than a2 while a2 is shorter than a3. Denote the ith compo-
nent of t1 as t1i .
Next, for the purpose of simulation, suppose we apply non-linear trans-
formations to the points that comprise each of the ai intervals. Let a vertex
v = (x, y), then the transformed vertex v will be transformed component-wise
as
\[
H(t_1, t_2) = \sum_{k=1}^{n-1} \left(t_1^k - t_2^k\right)^2
\]
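A minimal sketch of building the relation vector t and computing the distance H between two such vectors:

```python
def relation_vector(lengths):
    """Binary vector t = (P1, ..., Pn-1): t[k] = 1 iff interval a_k is
    no shorter than interval a_{k+1}."""
    return [1 if lengths[k] >= lengths[k + 1] else 0
            for k in range(len(lengths) - 1)]

def h_distance(t1, t2):
    """H(t1, t2) = sum_k (t1_k - t2_k)^2: a Hamming-style distance
    between two relation vectors of equal length."""
    return sum((a - b) ** 2 for a, b in zip(t1, t2))

# e.g., t1 = (1, 0): a1 is no shorter than a2, a2 is shorter than a3
print(h_distance(relation_vector([5, 3, 7]), relation_vector([5, 4, 2])))
```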
Common problems with polylines are discontinuities that result from im-
age resolutions, differences in image acquisition, and artifacts of feature ex-
traction algorithms. Extracted features can be modified in two ways to give
a common resolution complexity to facilitate comparisons.
Consider the case of a curvilinear feature that is segmented as a result of
something obscuring it, such as a road shaded by a tree. By connecting seg-
ments that are “close enough” and have “small” deviation angles, a compos-
ite feature can be formed. The maximum separation distance and the maxi-
mum deviation angle parameters permitted are clearly critical to feature
creation and to one’s confidence in the result.
Another type of feature modification is necessary to simplify curvilinear
features with one or more relatively narrow lobes for comparison to one with
no narrow lobes. For example, a state road map will typically depict a coast-
line with very little structure. A high-resolution aerial photograph, on the
other hand, will show the same coastline with lots of structure, showing that
it goes inland for miles along river channels and juts out around spits of land.
The higher resolution feature can be simplified by removing these lobes if
they are “sufficiently narrow.” Critical parameters here are the unit size of
feature sampling and maximum jumping distance permitted.
With images prepared with this preprocessing, it is possible to register
images that appear at first to have few if any features in common and are of
unknown scale and orientation.
Measures of spatial similarity of polylines also need to be developed.
Here, the focus is on spatial similarity characteristics while the similarity of
non-spatial feature attributes can be matched after a spatial match is con-
firmed. If two images are matched using only a few reference points, the
similarity of other points also needs to be assessed.
The issue of variability of points that form a polyline also needs to be ad-
dressed. Different feature extraction algorithms and imagery analysts can
assign points differently on the same physical feature. This can affect finding
More formally, a function G can be defined as follows for its first three values:

G(2^0) = G(1) = [p1, p2];
G(2^1) = G(2) = [p1, middle(p1, p2), p2];
G(2^2) = G(G(2^1)) = [p1, middle(p1, middle(p1, p2)), middle(p1, p2), middle(middle(p1, p2), p2), p2].
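A minimal sketch of computing the BSD nodes G(2^k); recursively inserting arclength middle points is equivalent to sampling the points at fractions j/2^k of the length computed along the feature:

```python
import math

def point_at_fraction(poly, f):
    """Point at fraction f (0..1) of the polyline's arclength."""
    seg = [math.dist(poly[i], poly[i + 1]) for i in range(len(poly) - 1)]
    target, acc = f * sum(seg), 0.0
    for i, d in enumerate(seg):
        if acc + d >= target and d > 0:
            t = (target - acc) / d
            (x0, y0), (x1, y1) = poly[i], poly[i + 1]
            return (x0 + t * (x1 - x0), y0 + t * (y1 - y0))
        acc += d
    return poly[-1]

def bsd(poly, k):
    """Level-k binary subdivision G(2^k): the 2^k + 1 interpolation
    nodes obtained by recursive arclength middle-point insertion."""
    n = 2 ** k
    return [point_at_fraction(poly, j / n) for j in range(n + 1)]
```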
Similarly we can construct matrixes A(n) for each polyline G(n). The ma-
trix shown in Table 3 only reflects the upper level structure. For more de-
tailed structure, we may include in the structure S more matrixes that reflect more specific structural properties, such as relations between differences of angles, and relations between second, third, and higher-order differences of angles, analogous to second and third derivatives.
Two structured polylines Ga(n) and Gb(n) are structurally equivalent if
their structures Sa(n) and Sb(n) are the same, that is, there is an isomorphism
between structures. If we restrict the structure by the matrix A shown in Ta-
ble 3 then the equality of structures means the equality of such matrixes for
two different polylines.
Now we can discuss how to define measures of structural similarity be-
tween two arbitrary polylines a and b and use these definitions for matching
features and conflating images.
Step 1. For raster images extract several linear features as sets of points
(pixels), S. For vector images skip this step.
Step 2. Vectorize extracted linear features. For vector images skip this step.
Step 3. For both raster and vector images analyze the complexity and con-
nectivity of vectorized linear features. If features are too simple (contain
few points and are small relative to the image size) combine several fea-
tures in a superfeature. If features are too complex, simplify features by
applying a gap analysis algorithm. In the ideal situation we also should
be able to separate feature extraction algorithm artifacts from real fea-
tures. In the example here, the algorithm introduced artifacts, by captur-
ing vegetation as a part of the shoreline in several places.
Step 4. Interpolate each superfeature as a specially designed polyline using
the BSD method.
Figure 6 depicts level 1 BSD interpolations with k=1 and n=2 for the vec-
torized features shown in Figure 3. The middle points of each feature are as
shown, computed along each line. Significant fluctuations have been lost in
the lower resolution image. Feature M as interpolated has angles A1, A2, and
A3. Feature L as interpolated has angles B1, B2, and B3.
Figure 7 depicts the next level of BSD interpolations for the same vector-
ized features from Figure 3.
Figure 6. Sections of the extracted shorelines with the first level BSD interpolations with k=1
and n=2
Figure 7. Sections of the extracted shorelines with the second-level BSD interpolations.
Step 1: Compute L, the length of the line from its start to the middle point M
along the line.
Step 2: Compute the lengths of two "shoulders," S1 and S2: S1 is the length of the straight line [T, M] between the start point T and the middle point M; similarly, S2 is the length of the straight line [M, E] between the middle point M and the end point E.
Step 3. Compute the ratios R1 = S1/L and R2 = S2/L, where L is half of the length of the feature computed along the feature. If the ratio R1 is close to 1, then the first part of the feature is close to a straight line; similarly for R2. If R1 and R2 are both significantly smaller than 1 but have similar values for two features F1 and F2, then on the first level of structural similarity F1 and F2 are similar.
Step 4. Repeat steps 1-3 for the subfeatures [T, M] and [M, E], and recurrently for their subfeatures, until all ratios are equal to 1 (straight lines).
For any polyline with n nodes, step 4 needs to be repeated no more than n·lg n times to make all ratios equal to 1.
This can be shown by considering an extreme case (EC): after finding a middle point M, only the single end point is in the right "shoulder." This means that n−1 nodes are in the part between the start node T and M, including node pn-1.
We know that the polyline between node pn-1 and the end node E = pn is a straight line [pn-1, E]. Above we also assumed that M is between them; thus [M, E] is a straight line and a part of the longer straight line [pn-1, E]. That is, the ratio R2 is equal to 1 for [M, E], and Step 4 took only one iteration for the subfeature between M and E.
Now assume that the same extreme case (EC) holds for the left subfeature from T to M and subsequently for all its subfeatures. Then we need to repeat step 4 only n times for this extreme case.
If this assumption is not true and there is more than one node in the subfeature between M and E, we can repeat such a binary search process at most lg n times to arrive at a single straight line. With n nodes in total, this may take n·lg n loops of Step 4 in the worst case.
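A minimal sketch of the recursive ratio computation (Steps 1-4); splitting at the middle node index rather than the exact arclength midpoint is our simplification:

```python
import math

def arclength(poly):
    """Length along the polyline (list of (x, y) nodes)."""
    return sum(math.dist(poly[i], poly[i + 1]) for i in range(len(poly) - 1))

def shoulder_ratios(poly, out=None, label="R", eps=1e-9):
    """Recursively compute R = S / L for the feature and its halves:
    S is the straight 'shoulder' between the end points, L the length
    along the curve. Recursion stops when a part is a straight line
    (R equal to 1) or has no interior nodes."""
    if out is None:
        out = {}
    L = arclength(poly)
    S = math.dist(poly[0], poly[-1])
    r = S / L if L > eps else 1.0
    out[label] = r
    if len(poly) > 2 and r < 1.0 - eps:
        m = len(poly) // 2            # split near the middle node
        shoulder_ratios(poly[:m + 1], out, label + ".1", eps)
        shoulder_ratios(poly[m:], out, label + ".2", eps)
    return out

print(shoulder_ratios([(0, 0), (1, 1), (2, 0), (3, 1), (4, 0)]))
```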
We visualize the structural length type of the feature, related to lengths, in Figure 8.
Figure 8 shows that the ratios R1 and R2 on the first level were basically the same. On the second level the situation is the same, but on the third level the right shoulder is much larger than the left shoulder. On the fourth level the right part is symmetric, but the left part is not. The last level, 5, also provides a mix of symmetric and asymmetric cases.
Matrices are constructed from the angle relationships and from the length
relationships of the polylines by using two algorithms, denoted as the Angle Algorithm (AA) and the Shoulder Algorithm (SA).
Step 5. Compute a matrix Q of the relation between all angles on the polyline by using the AA algorithm.
Step 6. Compute a matrix P of the relation between all lengths of intervals on the polyline by using the SA algorithm.
Values of 1 and 0 are used to indicate ≥ and <, respectively. For this example, the angular relations for the two polylines are presented in Table 4 and the length (or shoulder) relations in Table 5.
Table 6. Matrices for features from images A and B for angles marked 1, 2, and 3. Bold numbers indicate differences between angular relations in the two features.

Image A (angles 1: 2.206752, 2: 2.389911, 3: 2.797306):
     1  2  3
1    1  0  0
2       1  0
3          1

Image B (angles 1: 2.906888, 2: 2.467343, 3: 2.702809):
     1  2  3
1    1  1  1
2       1  0
3          1
The tables above show matrixes only for BSD level 1. In the general case, for BSD level k there are 2^k = n − 1 segments, and each matrix generated has size (n−1)×(n−1). Table 6 shows part of BSD level 2 for the same features.
Another way to compare interpolated polylines is to construct a matrix of shoulder/length ratios S/L, where the length L is computed as the distance between the shoulder end points along the polyline. We use two relations: Si/Li ≥ Sj/Lj and |Si/Li − Sj/Lj| < ε. In the matrix for the relation Si/Li ≥ Sj/Lj, all diagonal cells (i = j) are equal to 1 because Si/Li ≥ Si/Li is always true. Also S1/L1 = 1, since by definition for the base line S1 = L1.
Next, L2 = L3 by the middle-point design; thus S3/L3 ≥ S2/L2 is true if and only if S3 ≥ S2. We also know by definition that Si/Li ≤ 1, because Si, as a straight line, is shorter than or equal to the curve Li between the same points. This analysis shows that the relations for the ratios S/L are the same as those for S. But ratios can provide more relations; for instance, we may discover that 2·S2 < S3, i.e., S2 is less than half of S3.
In the example, Tables 4 and 5 show that matrixes for two features are
identical in both images. This means that we have the highest closeness of
given features on BSD level 1. However, Table 6 reveals that matrixes for
BSD level 2 are different and there is no match on this level. If there are suc-
cessful matches, higher BSD levels are explored until the match fails. A
match level tree that shows the deepest match level reached for each section
on each BSD level can be used to make a final judgment about feature
match.
Table 7. Angle comparisons using thresholds

Level 2 BSD angle comparison matrix, image A (columns) versus image B (rows), with threshold ε = 0.209440:

     1  2  3
1    1  1  1
2       0  1
3          0

A value of 1 indicates that the difference between the column and row values is greater than the threshold, |Li − Lj| > ε. This matrix indicates that the Level 2 BSD match fails; there is only one value within the threshold limits.
The goal of Step 9 is to identify the deepest level of structural match for two features. The brute-force version of this step is to repeat Steps 7 and 8 for all deeper BSD levels k+1, k+2, and so on. The more efficient version of Step 9 described below first forecasts the feature match on the next BSD level and computes the next BSD level only if the forecast is successful. Otherwise, the forecasting algorithm explores a potential match for the halves of the feature (shoulders). If that fails too, then the features are cut down as described in Steps 10 and 11. Step 9 contains the following substeps:
Step 9.1. If Steps 7 and 8 are successful on BSD level k, forecast a potential match on the next BSD level k+1 using the forecasting algorithm FA (described below).
Step 9.2. If a match is forecast by the FA algorithm for BSD level k+1, repeat Step 9 for BSD level k+1 until a mismatch is forecast. If a mismatch is forecast by the FA algorithm for BSD level k+1, use the FA algorithm to forecast matches for the respective halves of the features (right and left shoulders).
Step 9.3. Construct a match level tree that shows the deepest match level reached for each section on each BSD level.
Step 9.4. Evaluate the tree to make a final judgment about the feature match.
The AA algorithm for BSD level k = 0 provides the lengths of three curves L1, L2 and L3 (see Figure 5), where L1 is the base line connecting the two ends of the feature curve, and L2 = L3 by the definition of the line middle point used to build them. Similarly, the SA algorithm for k = 0 produces three straight lines (shoulders) S1, S2 and S3 connecting the two ends and the middle point of the feature curve (see Figure 6). Likewise, for every other k, the AA and SA algorithms provide a set of Li for each polyline segment along the curve and a set of shoulders Si between those segments (see the example in Figure 7).
Next we compute all Si/Li for BSD level k = 0 for image A and the similar ratios for image B, denoted Ti/Mi in Figure 6. Say these ratios are 1, 0.5 and 0.8 for image A and 1, 0.3 and 0.83 for image B. After that we search for a pair of shoulders <Si/Li, Ti/Mi> in the two images such that |Si/Li − Ti/Mi| > ε. If such shoulders are found, this is declared a mismatch forecast. If there are no such shoulders, the FA algorithm forecasts a potential match on the next level and the Conflation Method proceeds to compute BSD level k+1.
In the example above, |S2/L2 − T2/M2| = |0.5 − 0.3| > ε for ε = 0.1. This means that the CM method will not compute BSD level k+1 = 2 for the whole feature, but will compute BSD for the third section of the curve, where |S3/L3 − T3/M3| = |0.8 − 0.83| < 0.1.
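Under our reading of the FA algorithm, the forecast step reduces to a threshold comparison of shoulder/length ratios. The following Python sketch (the names and structure are our assumptions, not the book's code) reproduces the example above:

def fa_forecast(ratios_a, ratios_b, eps=0.1):
    # Indices of shoulders whose ratios differ by more than eps forecast a
    # mismatch for the whole feature; sections not listed may still be refined.
    return [i for i, (r, t) in enumerate(zip(ratios_a, ratios_b))
            if abs(r - t) > eps]

# Ratios from the example: image A vs. image B.
print(fa_forecast([1.0, 0.5, 0.8], [1.0, 0.3, 0.83]))  # -> [1]: mismatch on S2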
The motivation for this algorithm is the following. A significant difference in ratio values between two features indicates that these differences will show up on a deeper level of BSD and the features would not match at higher levels. However, we cannot say at what higher level this will actually happen. The probability of a match on the next BSD level is lower for shoulders with very different ratios than for shoulders with similar ratios. Our simulation experiments confirm this. It is also consistent with the expectation that the probability of a high level of match (k > 4) for images with significant noise and different resolutions is low. Thus, we cannot expect many of these very good cases, and with the majority of matches occurring at low levels the FA algorithm significantly shortens computation time.
The next step is to explore the situation when a feature match is not found. In this case, a search for matching sub-features is initiated by cutting a predefined unit from the first superfeature and repeating the matching, keeping the other superfeatures unchanged. Currently we use a unit size between 1/200 of the image's largest dimension and half the size of a large feature. It is suggested to start with large units to save time. If no match is found, the process is repeated by sequentially cutting units from all superfeatures until a match is found or the superfeatures are exhausted.
This process forms Steps 10 and 11 of the conflation algorithm (a sketch of the loop follows the steps):
Step 10. If Steps 7 and 8 fail, cut a predefined unit from the first superfeature and repeat Steps 4-9 for this modified superfeature and the unchanged other superfeatures.
Step 11. If a match is not found, repeat Step 10 by sequentially cutting units from all superfeatures until a match is found or the superfeatures are completely cut down.
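A minimal sketch of this cutting loop, assuming features are represented as point lists and a callback implements Steps 4-9 (all names are ours, not the book's):

def cut_and_match(superfeatures, unit, match_steps_4_to_9):
    # Sequentially cut one unit from each superfeature and retry the match.
    for idx, feature in enumerate(superfeatures):
        trimmed = list(feature)
        while len(trimmed) > unit:
            trimmed = trimmed[unit:]  # cut a predefined unit from the feature
            candidate = superfeatures[:idx] + [trimmed] + superfeatures[idx + 1:]
            if match_steps_4_to_9(candidate):
                return candidate      # match found with this cut
    return None                       # superfeatures are exhausted, no match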
Figure 9 shows the final conflation result for the example discussed.
Optimizing unit size. The unit size u used in Steps 10 and 11 is the most critical parameter of the whole CM method. If the unit size is equal to the length L of the largest superfeature, then Steps 10 and 11 are empty. If the unit is half of L, u = L/2, then Step 11 will be repeated at most two times for each superfeature, that is, n×m times in total, where n is the number of superfeatures in image A and m is the number of superfeatures in image B. Our experiments show that three superfeatures per image were sufficient to find the match; thus n×m is bounded by 9 runs of Step 11. If each superfeature contains r units, then Step 11 will be run (rn)×(rm) = r²nm times without time optimization. This is a polynomial time complexity function. The combination of the AA and SA algorithms provides such time optimization by cutting the running time of Step 11 in half.
$$\varphi = \begin{pmatrix} 1 & 2 & 3 & 4 \\ 3 & 2 & 4 & 1 \end{pmatrix}.$$
A similar index substitution function is produced for angles (i.e., for pairs of linear segments with indices [i, j] and [k, m]).
$$\frac{1}{2}\left(\sum_{k=1}^{n} k^4 - (2n+1)\sum_{k=1}^{n} k^3 + (n^2+n)\sum_{k=1}^{n} k^2\right)$$
$$= \frac{1}{2}\left[\left(\frac{1}{5}n^5 + \frac{1}{2}n^4 + \frac{1}{3}n^3 - \frac{1}{30}n\right) - (2n+1)\left(\frac{1}{4}n^4 + \frac{1}{2}n^3 + \frac{1}{4}n^2\right) + (n^2+n)\left(\frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n\right)\right]$$
$$= \frac{1}{120}\left(2n^5 + 5n^4 - 5n^2 - 2n\right).$$
Thus, we have shown this is indeed O(n^5). For comparison purposes as we proceed, note that for n = 10 this is 2079. It is interesting to note that for small n (n ≤ 60) this expression behaves more like O(n^4) due to the small leading coefficient. However, it is not generally unusual to have n > 60 segments in a feature.
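The closed form is easy to check numerically against the raw sum, e.g. in Python:

n = 10
closed = (2 * n**5 + 5 * n**4 - 5 * n**2 - 2 * n) // 120
raw = sum(k**4 - (2 * n + 1) * k**3 + (n**2 + n) * k**2 for k in range(1, n + 1)) // 2
print(closed, raw)  # both are 2079 for n = 10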
$$\sum_{k=1}^{n} k(n-k)(n-k+1)/2 = \frac{1}{2}\left[(n^2+n)\sum_{k=1}^{n} k - (2n+1)\sum_{k=1}^{n} k^2 + \sum_{k=1}^{n} k^3\right]$$
$$= \frac{1}{2}\left[(n^2+n)\left(\frac{1}{2}n^2 + \frac{1}{2}n\right) - (2n+1)\left(\frac{1}{3}n^3 + \frac{1}{2}n^2 + \frac{1}{6}n\right) + \left(\frac{1}{4}n^4 + \frac{1}{2}n^3 + \frac{1}{4}n^2\right)\right]$$
$$= (n^4 + 2n^3 - n^2 - 2n)/24.$$
Among all possible combinations of values that satisfy these same relationships, we found the best method was to first sort the values, keeping track of the indices from which they came, set up a new group of values from lowest to highest, and then distribute them back into the proper order as per the original indices. That is, the order of the original indices after sorting fully describes the comparison matrix. Given this order, it is a simple matter to reconstruct the matrix. In fact, this order itself can be used to compare two features in a way that does not require the comparison matrices at all.
This is what we call the linear structure's sequence (LS-sequence). The following example illustrates the concept. Let us use the length measurements for feature S1: {5, 2, 8, 9, 4}, which have the comparison matrix shown in Table 8. The values 0 and 1 are used to represent x < y and x ≥ y respectively, where x is a row value and y is a column value.
Table 8. Comparison of lengths for feature S1

Length  5  2  8  9  4
  5     -  1  0  0  1
  2        -  0  0  0
  8           -  0  1
  9              -  1
  4                 -
Now, we also have S2: {8, 9, 4, 6, 7} (see Table 9). The matching seg-
ment is highlighted in both Table 8 and Table 9.
Table 9. Comparison of lengths for feature S2

Length  8  9  4  6  7
  8     -  0  1  1  1
  9        -  1  1  1
  4           -  0  0
  6              -  0
  7                 -
Let us build the index sequence for Table 8. This table depicts the set of lengths of intervals {5, 2, 8, 9, 4} and their relationships. The structural sequence for this table begins with the value 2, which is the index of the smallest element (whose value is 2). The next value, 5, indicates that the second smallest element is from index 5, whose value is 4. This process continues, and the final sequence of the indices in S1 is {2, 5, 1, 3, 4}. The sequence of the values by index in S2 is {3, 4, 5, 1, 2}, derived from Table 9 in a manner similar to S1. At first it seems that there is no similarity, but when the first two elements of S1 have been stripped off and the values ranked again, we find the S1 LS-sequence to be {3, 1, 2}. Further, when the last two elements of S2 have been stripped off, we find the S2 LS-sequence to be {3, 1, 2} also. We have found the matching segments. All that is necessary is to delineate all such matching subsequences.
The total cost of generating the LS-sequences can be bounded as follows:

$$\sum_{k=1}^{n}(n+1-k)\,k\log(k) \le n\log(n)\left(\frac{1}{2}n^2+\frac{1}{2}n\right) + \log(n)\left(\frac{1}{2}n^2+\frac{1}{2}n\right) - \log(n)\left(\frac{1}{3}n^3+\frac{1}{2}n^2+\frac{1}{6}n\right)$$
$$= \frac{1}{6}n^3\log n + \frac{1}{2}n^2\log n + \frac{1}{3}n\log n.$$
We have shown the following result.
Lemma 1. The complexity of generating LS-sequences, which is $O(n^3 \log n)$, is given by $\frac{1}{6}n^3\log n + \frac{1}{2}n^2\log n + \frac{1}{3}n\log n$.
This process of generating sequences can be performed for each feature before any comparisons between features are made, because it requires no information from any other feature. It should be noted that mergesort or another stable sorting method is preferred here to a sort such as quicksort, because of the importance of preserving the correct order in the event that two equal values are found.
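A minimal Python sketch of LS-sequence generation along these lines (Python's sorted() is stable, like mergesort; the function name and the 1-based indexing follow the text's example):

def ls_sequence(values):
    # Stable sort of indices by value; ties keep their original order.
    order = sorted(range(len(values)), key=lambda i: values[i])
    return [i + 1 for i in order]   # 1-based indices, as in the text

print(ls_sequence([5, 2, 8, 9, 4]))  # -> [2, 5, 1, 3, 4], feature S1
print(ls_sequence([8, 9, 4, 6, 7]))  # -> [3, 4, 5, 1, 2], feature S2
print(ls_sequence([8, 9, 4]))        # -> [3, 1, 2], the re-ranked common segment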
We can now analyze the complexity of the actual comparisons between
features.
• For each sequence of length n-k+1 a comparison must be exact for the
entire matching section. This requires n-k+1 comparisons. Thus, this is
a simple O(n) operation.
• The number of subsequences of length n-k+1 produced from a sequence
of length n is k.
$$\sum_{k=1}^{n} k^2(n-k+1) = (n+1)\sum_{k=1}^{n} k^2 - \sum_{k=1}^{n} k^3 = (n+1)\left(\frac{1}{3}n^3+\frac{1}{2}n^2+\frac{1}{6}n\right) - \left(\frac{1}{4}n^4+\frac{1}{2}n^3+\frac{1}{4}n^2\right)$$
$$= (n^4 + 4n^3 + 5n^2 + 2n)/12.$$
One comparison is needed for each position, that is, n−k+1 of them. Thus, for a single sequence, say (1, 2, 5, 4, …), we need (n−k+1) log k individual comparisons of values (indices). Now we notice that there are at most k different sequences of length n−k+1, so the total number of comparisons is (n−k+1)·k·log k. Now we derive the formula for all k:
$$\sum_{k=1}^{n}(n-k+1)\,k\log k \le (n+1)\log(n)\left(\frac{1}{2}n^2+\frac{1}{2}n\right) - \log(n)\left(\frac{1}{3}n^3+\frac{1}{2}n^2+\frac{1}{6}n\right) = \log(n)\,(n^3+3n^2+2n)/6.$$
All angle differences are less than the 15° limit for both images; thus, they are matched by the AA algorithm on this BSD level. But the AA algorithm gives no clue that this match may deteriorate at the next levels, in contrast with the SA algorithm, which provides such clues.
Figure 10. This example indicates that the SA algorithm based on lengths is more robust for similar cases than the AA algorithm based on angles
4. CONFLATION MEASURES
tions in each and (2) there are no point limitations on the determination of
distances (see the Appendix of this report.)
Another measure of feature shape similarity was defined by Cobb et al. [Cobb, Chung, Foley, Petry, Shaw, & Miller 1998]. The idea of the method is illustrated in Figures 12(a) and 12(b).
Figure 12. Structural differences
The lines shown in both Figure 12(a) and Figure 12(b) may give the same similarity with the black line.
4. Each feature is in the bounding box defined by the start and end points (see Figure 12(a)); note that this method is not applicable to the features shown in Figure 12(b).
Figure 13. Examples of scale ratios
Figure 14. Bounding boxes for standardized features
Our ASC method described earlier is applicable to both conflation cases presented in Figure 14. It captures the structural differences between the two lines shown in Figure 12(b) using angles. While the method of Cobb is satisfactory for similar features with similar lengths, it lacks the ability to determine partial matches of polylines. Because it uses the Fréchet L2 distance measure between features, it cannot accurately compare structures that are very different, and it will not work at all with others, such as those illustrated in Figure 14(b).
Carswell et al. [Carswell, Wilson & Bertolotto, 2002] have developed a
composite similarity metric to aid in the location of “image-objects” in im-
ages. They combine weighted topology, orientation, and relative-distance
similarity measures for the image components Q and compare it with an im-
age scene I. For our earlier example of finding a house with an outdoor
swimming pool across a road from a barn and silo, the “image-object” would
consist of these five elements with appropriate topology, orientations, and
distances defined. Their similarity measure provides a percentage match
between the sum of these features Q and similar ones in the image I.
The individual similarity measure weights can be varied to emphasize the relative importance of different image properties.
Step 1. Compute L, the length of the line from its start to the middle point D along the line.
Step 2. Compute the lengths of the two "shoulders" S1 and S2, that is, S1, the length of the straight line [T, D] between the start point T and the middle point D, and S2, the length of the straight line [D, E] between D and the end point E.
Step 3. Compute the ratios R1 = S1/L1 and R2 = S2/L2, where L1 = L2 is the half feature length computed along the feature. If a ratio R1 is close to 1, then the first part of the feature is close to a straight line; similarly for R2. Find ratios that are within the threshold limit.
Step 4. Repeat Steps 1-3 for the subfeatures [T, D] and [D, E] and recursively for their subfeatures until all ratios are equal to 1 (straight line). A sketch of this recursion follows below.
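A rough Python sketch of this recursion, under our own representation of a feature as a list of (x, y) vertices (splitting at the vertex nearest half the arc length, and all names, are our assumptions, not the book's code):

import math

def arc_length(pts):
    return sum(math.dist(pts[i], pts[i + 1]) for i in range(len(pts) - 1))

def shoulder_ratios(pts, eps=1e-6, out=None):
    # Collect R = S/L for each half of the (sub)feature, recursing while R < 1.
    out = [] if out is None else out
    if len(pts) < 3:
        return out
    half, acc, mid = arc_length(pts) / 2, 0.0, 1
    for i in range(len(pts) - 1):          # vertex closest to half arc length = D
        acc += math.dist(pts[i], pts[i + 1])
        if acc >= half:
            mid = max(1, min(i + 1, len(pts) - 2))
            break
    for part in (pts[:mid + 1], pts[mid:]):
        L = arc_length(part)
        R = math.dist(part[0], part[-1]) / L if L else 1.0
        out.append(R)
        if R < 1.0 - eps:                  # not yet a straight line: recurse
            shoulder_ratios(part, eps, out)
    return out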
6. CONCLUSION
This chapter describes a technique of image correlation that does not rely on geometrical or topological invariants or on identifying points in common with known coordinates. This technique determines relative scales, orientations, and corresponding points by analyzing linear features identified in each image and fitted with polylines. While the example presented is from the research area at hand, there is nothing intrinsic to this method that ties it to spatial imaging of the earth. It could as easily be applied to any set of overlapping images from any discipline, and to images of a dynamic scene produced at different times.
7. ACKNOWLEDGEMENTS
8. EXERCISES AND PROBLEMS
1. Draw a road network with four nodes (road intersections) and six roads. Define an algebraic system that would represent this road network.
9. REFERENCES
Brown, L., A Survey of Image Registration Techniques, ACM Computing Surveys, vol. 24 (4), pp. 325-376, 1992
Carswell J., Wilson, D., Bertolotto, M., Digital Image Similarity for Geo-spatial Knowledge
Management, in Advances in Case-based Reasoning, Springer-Verlag, Berlin, 2002
Cobb, M., Chung, M., Foley, H., Petry, F., Shaw, K., and Miller, H., A rule-based approach for the conflation of attributed vector data, GeoInformatica, 2/1, 1998, 7-36
Cohen S., Guibas, L., Partial Matching of Planar Polylines under Similarity Transformations.
Proceedings of the Eighth Annual ACM-SIAM Symposium on Discrete Algorithms, 777-
786, January 1997
Kovalerchuk B., Sumner W., Algebraic relational approach to conflating images, In: Algo-
rithms and technologies for multispectral, hyperspectral and ultraspectral imagery IX. Vol.
5093, International SPIE Military and Aerospace symposium, AEROSENSE, Orlando,
FL., Apr.21-25, 2003, pp. 621-630.
Kovalerchuk B., Sumner W., Curtis M., Kovalerchuk M., Chase R., Matching image feature structures using shoulder analysis method, In: Algorithms and technologies for multispectral, hyperspectral and ultraspectral imagery X. SPIE Defense and Security symposium, Orlando, FL., April 11-16, 2004
Mal’cev A.I. Algebraic Systems, Springer-Verlag, New York, 1973
Neapolitan R., Naimipour K., Foundations of Algorithms Using C++ Pseudocode, Jones and Bartlett Publ., 1998
Zitová B., Flusser J., Image registration methods: a survey, Image and Vision Computing, 21 (11), 2003, pp. 977-1000
Chapter 20
ALGORITHM DEVELOPMENT TECHNOLOGY
FOR CONFLATION AND AREA-BASED
CONFLATION ALGORITHM
1. INTRODUCTION
In essence this is a top-to-bottom approach where finally the basic pixel count is taken as a measure of lake size to be compared.
The goal of this step is the further polishing of the set of parameters. First we want to evaluate, preliminarily, whether the set of 14 parameters could be sufficient to solve the conflation problem with a linear (affine) transform. At this informal stage, the answer to this question is simply an expert opinion (yes/no). If the answer is "yes," then we are interested in narrowing this set of 14 parameters to a smaller subset that can be sufficient too. This is also done by asking an expert about subsets {Pis} of the potentially sufficient parameters selected from all parameters {Pi}. The simple exhaustive option here is to ask an expert to answer yes/no about all 2^14 = 16,384 subsets. Such an approach is obviously unrealistic and, moreover, redundant. The monotone Boolean function approach [Kovalerchuk et al., 1996, 2001] is more appropriate here.
The approach has two significant components. The first stage is to formulate each parameter in such a way that if the expert answers "yes" about this parameter, then it increases the chances that the conflation can be done using this parameter. For instance, the parameter "Simple symmetric feature exists" may or may not indicate increased chances that conflation will be successful: with many similar symmetric features it can be very difficult to make a unique match. This parameter can be negated and reformulated as "Asymmetric unique feature exists." If the expert answers "yes" to this reformulated question, then the chances of finding unique conflation matching points and features increase, because we may find unique matching points
and other feature parts that can be uniquely matched with similar elements
in another image. In Figure 2 parameters are already reformulated in such a
way from their original recording. We will call such reformulation a positive
monotone reformulation. Thus, the first test for 14 parameters is to ask the
expert if he/she agrees that if all 14 parameters are evaluated with “yes” then
there are very good chances that two images can be conflated by an affine or
more complex transform.
If the answer is "yes," then an attempt to minimize the set of parameters is made. This is the second stage of the approach. Figure 3 shows how the IVES system supports this stage. The expert is asked whether the checked parameters that received "yes" answers could be sufficient for successful conflation. If the answer is positive, then this subset of parameters becomes a new starting point for further decreasing the set of parameters.
There is no reason to ask the expert about subsets that include all parameters of this subset plus some additional parameters: such a larger subset must be sufficient too, by the parameter design performed at the first stage described above. This is the property of monotonicity of Boolean functions that we exploit. Thus we are only interested in cutting down the number of parameters; a sketch of this pruning follows below.
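A tiny Python sketch of this monotonicity pruning (our own illustration, not the IVES implementation; subsets of parameter indices are encoded as frozensets):

def needs_question(candidate, known_sufficient):
    # If some known-sufficient subset is contained in the candidate, the
    # candidate is a superset and is sufficient by monotonicity: skip it.
    return not any(s <= candidate for s in known_sufficient)

known = {frozenset({3, 7, 11})}  # the expert already said "yes" to this subset
print(needs_question(frozenset({1, 3, 7, 11}), known))  # False: skip, implied
print(needs_question(frozenset({3, 7}), known))         # True: must ask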
A screenshot in Figure 4 illustrates how the IVES system supports this stage. The expert determines whether the currently checked parameters could be sufficient to yield a single linear mapping from one image to another. The expert
answers using yes/no buttons for the current subset of parameters shown in the first column. Previous answers are shown in the other columns. The light color (green) indicates positive answers (yes) and the dark color (red) indicates negative answers (no). The sequence of questions is stored in the IVES system in advance. This sequence can be altered by loading another sequence file. The expert's answers are recorded in another file. The question sequence can be optimized by selecting an appropriate sequence file. If nothing is known in advance about the expert's potential answers, the sequence that asks the minimal number of questions in the most difficult situation can be used. This best sequence for the worst-case scenario is formalized by the Shannon criterion and is based on Hansel chains [Kovalerchuk et al., 1996]. See also Chapter 16 for more detail.
Figure 4. Expert questioning using monotonicity principle. See also color plates.
The most desirable output from this step would be the conclusion that a
single parameter out of 14 listed could be sufficient for some sets of images.
The analysis of the expert's answers showed that the parameter "Asymmetric unique features exist" emerged as a single parameter that could be sufficient for building the algorithm. The motivation for this selection is as follows. If asymmetric features exist and are preferably unique, then ambiguity in conflation is less likely; symmetric features are more likely to cause ambiguity. In the next section we analyze how this parameter can be formalized. We interpret the concept of an asymmetric unique feature very generally. Such a feature could be a cluster of three square buildings of different sizes located asymmetrically relative to each other. The buildings themselves are not unique, but their mutual location can be unique. Thus we do not require that the feature be a single continuous entity.
where S is the area occupied by the shape. The concept of an asymmetric feature is formalized later as a predicate Asymmetric(F).
Step 5 of ADTC is called Check Invariance for short. All suggested impact measures are analyzed for invariance to affine transformations. For instance, the pixel-count-based ratio of sizes of two shapes mentioned above in Step 4 is invariant; see the formal analysis of this impact measure's invariance in the invariance theorem below.
Invariance to disproportional scaling (DPS) is one of the most difficult requirements to meet. In Chapter 19, relations between angles and linear segment lengths were exploited to build conflation algorithms. These relations are relatively robust, that is, they do not change under limited DPS. Figure 5 shows a robust DPS situation with the angle relation ">" preserved.
Figure 5. Disproportional scaling that preserves the angle relation: (a) B = 150° > A = 50°; (b) after scaling, B′ = 100° > A′ = 80°
In Figure 5(a) angle B is greater than angle A, B > A. This property is preserved under the disproportional scaling shown in Figure 5(b), where still B′ > A′.
Now we can explore the robustness of the relations "=" and ">" between angles and areas under other disproportional scalings. In Figure 6, area S1 is equal to area S2, S1 = S2. In addition, the angles are pairwise equal: A = B and C = D.
Figure 6. Original image: S1 = S2, A = B, C = D
Figure 7. Image after disproportional scaling: S′1 = S′2, but A′ < B′ and C′ < D′
affine transform. This basic idea is adjusted for the cases where more than one matching triple is found; an additional uniqueness criterion based on the analysis of additional ratios is introduced in the algorithm.
Suppose there is an image that contains a large lake of some size and a small lake whose size is 1/3 of the size of the large lake. This size ratio (1/3) is invariant to affine transformations. The ratio precision needs to be adjusted to the scale of the least precise image: ratios 1/3, 1/2 and 1/4 could match 0.336, 0.52 and 0.27 if the images are of different scales. The algorithm uses a matching threshold for these cases.
This logic of the algorithm requires (1) an algorithm for computing area ratios and for matching ratios, and (2) an algorithm for region extraction from the image. The first algorithm, called the Ratio Algorithm, and the second, called the Vectorizer, are described below. The development of the second algorithm is the goal of Step 7 of the ADTC technology.
The Ratio Algorithm starts from a set of regions {G1i} for image 1 and a set of regions {G2i} for image 2 extracted by the Vectorizer algorithm. The Ratio Algorithm computes the area of each region in both images, S1i = S(G1i) and S2i = S(G2i), as the number of pixels inside the region. Next the algorithm computes two matrices V1 and V2. The elements of matrix V1 = {cij} are cij = S1i/S1j; the elements of matrix V2 = {qij} are defined similarly, qij = S2i/S2j. We assume that all areas S1i and S2i are positive.
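A minimal Python sketch of this matrix construction, using the example areas given below (the function name is ours):

def ratio_matrix(areas):
    # v[i][j] = S_i / S_j; the diagonal is all 1s by construction.
    return [[si / sj for sj in areas] for si in areas]

V1 = ratio_matrix([6, 4, 2, 1])  # example areas for image 1
V2 = ratio_matrix([4, 1, 6, 7])  # example areas for image 2
print(V1[0][1], V2[2][0])        # 1.5 and 1.5: ratio 6/4 matches G11<->G23, G12<->G21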
The matrix representation is important because it permits us to convert the situation into a generic algebraic system framework, with algebraic systems Ak = <Ak, Rk, Ωk>, where the signature Ωk contains the operator Vk(ai, aj) represented as a matrix Vk, and to handle the conflation problem uniformly. From this point, uniformity permits us to use a single, already implemented algorithm to search for matching features in the images. It does not matter for the algorithms in algebraic form whether the elements of Ak are straight-line segments, polylines, areas, complex combinations, or some other features. Elements of Ak can also be numeric characteristics of image components, such as the size Ski of region i in image k.
Example: Let matrix V1 be computed for regions with areas S11 = 6, S12 = 4, S13 = 2, S14 = 1 (see Table 4) in image 1, and matrix V2 for areas S21 = 4, S22 = 1, S23 = 6, S24 = 7 in image 2 (see Table 5). The brute force method would search for equal ratios in the two matrices, excluding the diagonal. There are six equal numbers in these matrices, which may indicate match uncertainty. In fact, only three numbers should be considered (the numbers above the diagonal); the numbers below the diagonal are the reciprocals 1/cij of the numbers cij above it. This is an unambiguous case, where the ratio 6/4 for S11 = 6, S12 = 4 is matched to S23 = 6, S21 = 4; that is, region G11 in image 1 is matched to region G23 in image 2, and region G12 in image 1 is matched to region G21 in image 2. The center of each region is computed as the average of the coordinates of all points (pixels) of the region.
The computational efficiency of the algorithm depends on how quickly the matrices V1 and V2 can be computed for images with a large number of regions. Notice that matrix V1 has only n − 1 independent ratios; all other ratios among the n² entries are computed from them, excluding the diagonal, which contains all 1's by definition. The theorem about this is proved below. It is reasonable to start from these n − 1 independent ratios. If all these ratios in A differ from the n − 1 independent ratios in B, then the next n − 2 ratios are computed in both matrices. If they also have no equal values, then the process continues with the next n − 3 ratios until all elements of the upper part of A are exhausted.
If some ratios in this process are equal, then there is no reason to compute ratios that are derived from them: they will be equal too. Specifically, if cij = qst and cjk = qtq, then we do not need to compute cik and qsq; they do not add new information and are equal (see the proof below). From cij = qst we can derive that region G1i is matched to region G2s and region G1j is matched to region G2t. Similarly, from cjk = qtq we can derive that region G1k is matched to region G2q. The equality of cik and qsq adds no new information, because it matches region G1i with region G2s and region G1k with region G2q, and this match was already established.
The previous consideration was based on a sequential fill of the lines that are parallel to the matrix diagonal. Tables 6 and 7 illustrate how the third and fourth lines above the diagonal (termed the 3rd and 4th diagonals) are computed for V2. The light color shows inputs and the dark color shows outputs. These computations use the multiplication formula (1).
and finding a pair with the smallest distance that is below the threshold L, where the mapping ϕ matches features in the two images.
6.2 Proofs
This follows from the definitions of cij, cjk and cik: cij = Si/Sj, cjk = Sj/Sk and cik = Si/Sk, whence cij · cjk = (Si/Sj)(Sj/Sk) = Si/Sk = cik, which is formula (1). Now, using cij = qst and cjk = qtq in (1), we get cik = qst · qtq = qsq.
Theorem: If all elements cj,j+1 (j = 1, 2, …, n−1) of an n×n matrix A are given, then all other elements of A can be restored.
Proof. We can omit computing the elements of V1 below the diagonal: all these elements are cij = 1/cji for elements above the diagonal, which follows directly from the definition of the elements of V1. The rest of the proof is provided by designing an iterative process that computes all other elements of V1.
Step 1. Compute all elements cj,j+2 by using the elements cj,j+1 in formula (1): cj,j+1 · cj+1,j+2 = cj,j+2.
Step 2. Compute all elements cj,j+3 using formula (1) and the elements cj,j+2 computed in Step 1: cj,j+2 · cj+2,j+3 = cj,j+3.
General Step k. Compute all elements cj,j+k using formula (1) and the elements cj,j+k−1 computed in Step k−1: cj,j+k−1 · cj+k−1,j+k = cj,j+k.
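The iterative process of the proof is easy to express in code. A Python sketch (our construction) that restores the full matrix from its superdiagonal:

def restore(superdiag):
    # superdiag[j] = c[j][j+1]; restore c via c[j][j+k] = c[j][j+k-1] * c[j+k-1][j+k].
    n = len(superdiag) + 1
    c = [[1.0] * n for _ in range(n)]       # diagonal of 1s by definition
    for j in range(n - 1):
        c[j][j + 1] = superdiag[j]
    for k in range(2, n):                   # fill the k-th diagonal from the (k-1)-th
        for j in range(n - k):
            c[j][j + k] = c[j][j + k - 1] * c[j + k - 1][j + k]
    for j in range(n):                      # lower part: c[i][j] = 1 / c[j][i]
        for i in range(j + 1, n):
            c[i][j] = 1.0 / c[j][i]
    return c

print(restore([6/4, 4/2, 2/1])[0][3])       # -> 6.0, i.e., S1/S4 in the example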
for matched ones. This work is done by the Vectorizer algorithm, which sharpens images and finds regions using a flood-fill method from computer graphics [Angel, 2000]: starting from a seed point, it looks recursively at the colors of adjacent pixels, including diagonal neighbors, until all neighboring pixels of the same color are found. The set of these pixels is considered a single region, and the number of pixels in the region is considered its area/size. The set of all extracted regions is the input of the ARC algorithm.
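A compact Python sketch of such 8-connected flood fill (iterative rather than recursive, to avoid recursion-depth limits; 'img' as a 2-D list of color values and all names are our assumptions):

def extract_region(img, seed, visited):
    # Grow a region of uniform color from the seed, including diagonal neighbors.
    color, stack, region = img[seed[0]][seed[1]], [seed], []
    while stack:
        r, c = stack.pop()
        if (r, c) in visited or img[r][c] != color:
            continue
        visited.add((r, c))
        region.append((r, c))
        for dr in (-1, 0, 1):
            for dc in (-1, 0, 1):
                rr, cc = r + dr, c + dc
                if 0 <= rr < len(img) and 0 <= cc < len(img[0]):
                    stack.append((rr, cc))
    return region  # len(region) is the region's area in pixels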
After conflation is done by the ARC algorithm, the conflation quality can be evaluated by visual inspection of the conflated images and by a computational procedure based on the absolute and relative differences between matched regions. The difference of two regions is the XOR (exclusive OR) of the pixels of regions G and G′. The absolute difference of regions G and G′, Δ(G, G′), is computed as the number of pixels in the difference of the regions:
Δ(G, G′) = S(XOR(F(G), G′)),
and the relative difference is
ρ(G, G′) = Δ(G, G′)/(S(G) + S(G′)).
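With regions represented as sets of pixel coordinates and F(G) already transformed into the coordinate frame of G′, these measures can be sketched as follows (using Python's symmetric difference for XOR, and the transformed pixel count as a stand-in for S(G), are our assumptions):

def abs_diff(fg, g2):
    return len(fg ^ g2)                        # Δ(G, G') = S(XOR(F(G), G'))

def rel_diff(fg, g2):
    return len(fg ^ g2) / (len(fg) + len(g2))  # ρ(G, G') = Δ(G, G') / (S(G) + S(G'))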
The total difference between the three matched regions {G} and {G′} of images Im1 and Im2 found by the ARC algorithm is defined accordingly.
Figure 8. Two images before conflation is applied. A part of the aerial photo is rotated 90
degrees and stretched 2 times in the y direction, thus being disproportionally scaled.
The shape extraction stage is shown in Figure 9 for both images from Figure 8. One image has 6 shapes; the other has 23. Shapes 3, 4, and 6 in image 1 are the 3 lakes, and shapes 14, 13, and 15 are the 3 lakes in image 2. Shape 3 matches shape 14, shape 4 matches shape 13, and shape 6 matches shape 15. In the visualization, the top left corner of a shape label corresponds to the center point of the shape with that number. In this case the size ratios of the 3 lakes are used to automatically match up the images, and the center points of the lake shapes are used to calculate the transform needed for conflation. The program makes this determination automatically.
Figure 10 shows the two images with matched features shown with rota-
tion, translation and scaling applied.
Figure 9. Two images at shape extraction stage. One of the images is scaled
disproportionally. See also color plates.
Figure 10. Two images with matched features are shown after rotation,
translation and scaling are applied
Figure 11 shows the two full images of the sharpened aerial photo and
the topographic map before conflation. The aerial photo is rotated 90 de-
grees.
Figure 11. Two full images: sharpened aerial photo and topographic map before conflation. The aerial photo is rotated 90 degrees.
Figure 12. Shape extraction and shape match visualization. See also color plates.
The next screenshot (Figure 13) shows the two complete images after the
conflation algorithm has been applied to the full-size original images.
Figure 14(a) shows the two smaller images, which are parts of the aerial photo and topographic map, before conflation. Figure 14(b) shows the results of the conflation of these smaller images.
Figure 13. Two complete images after the match is applied to the whole original images.
Figure 14. Conflation of two smaller images, which are parts of the aerial photo and
topographic map. See also color plates.
9. CONCLUSION
10. ACKNOWLEDGEMENTS
11. EXERCISES AND PROBLEMS
Advanced
1. Design a conflation algorithm using ADTC technology.
12. REFERENCES
Angel E., Interactive computer graphics: a top-down approach with OpenGL, Addison-
Wesley, 2000
ArcGIS, ESRI, 2004, http://www.esri.com/software/arcgis/about/desktop.html
Brown, L., A Survey of Image Registration Techniques, ACM Computing Surveys, vol. 24 (4), pp. 325-376, 1992, citeseer.nj.nec.com/brown92survey.html
Cobb, M., Chung, M., Foley, H., Petry, F., Shaw, K., and Miller, H., A rule-based approach for the conflation of attributed vector data, GeoInformatica, 2/1, 1998, 7-36.
Doytsher, Y., Filin, S., Ezra, E., Transformation of Datasets in a Linear-based Map Conflation Framework, Surveying and Land Information Systems, Vol. 61, No. 3, 2001, pp. 159-169.
Edwards D., Simpson J. Integration and access of multi-source vector data, In: Geospatial
Theory, Processing and Applications, ISPRS Commission IV, Symposium 2002, Ottawa,
Canada, July 9-12, 2002, http://www.isprs.org/commission4/proceedings/pdfpapers/
Jensen J., Saalfeld, A., Broome, F., Price, K., Ramsey, D., and Lapine, L. Spatial Data Acqui-
sition and Integration, 2000, NSF Workshop GIS and Geospatial Activities. Accessed June
2001 http://www.ucgis.org/research_white/data.html
Kovalerchuk, B., Triantaphyllou, E., Deshpande, A.S., and Vityaev, E., Interactive Learning of Monotone Boolean Functions, Information Sciences, Vol. 94, issue 1-4, 1996, pp. 87-118.
Kovalerchuk, B., Vityaev E., Ruiz J.F., Consistent and Complete Data and "Expert" Mining in
Medicine, In: Medical Data Mining and Knowledge Discovery, Springer, 2001, pp. 238-
280.
Shah, M., Kumar, R., (Eds.) Video Registration, Kluwer, 2003
Terzopoulos D., Studholme, C., Staib, L., Goshtasby A., (Eds) Nonrigid Image Registration,
Special issue of Computer Vision and Image Understanding Journal, vol. 89, Issues 2-3,
pp. 109-319, 2003
Wang, J., Chun, J., and Park, Y.W. GIS-assisted image registration for an onboard IRST of a
land vehicle. Proc. SPIE Vol. 4370, p. 42-49, 2001
Zitová B., Flusser J., Image registration methods: a survey, Image and Vision Computing. 21
(11), 2003, pp. 977-1000
Chapter 21
VIRTUAL EXPERTS FOR IMAGERY
REGISTRATION AND CONFLATION
Abstract: The unique human expertise in imagery analysis should be preserved and
shared with other imagery analysts to improve image analysis and decision-
making. Such knowledge can serve as a corporate memory and be a base for an
imagery virtual expert. The core problem in reaching this goal is constructing a
methodology and tools that can assist in building the knowledge base of im-
agery analysis. This chapter provides a framework for an imagery virtual ex-
pert system that supports imagery registration and conflation tasks. The ap-
proach involves tree strategies: (1) recording expertise on-the-fly and (2) ex-
tracting information from the expert in an optimized way using the theory of
monotone Boolean functions and (3) use of iconized ontologies to built a con-
flation method.
Key words: Imagery virtual expert, ontology, knowledge base, rule generation optimiza-
tion, monotone Boolean function, registration, conflation.
1. INTRODUCTION
Why design virtual experts for conflation? Can the conflation problem
be solved by designing a sophisticated mathematical procedure without rely-
ing on an expert’s knowledge? In essence, the conflation problem is a con-
flict resolution problem between disparate data. Inconsistencies in multi-
source data can be due to scale, resolution, compilation standards, operator
license, source accuracy, registration, sensor characteristics, currency, tem-
porality, or errors [Edwards, Simpson, 2002]. The conflict resolution strate-
gies are highly context and task dependent. Dependency of conflation from a
specific task is discussed in Chapters 17, 18 and 19.
In solving a conflation problem, experts are unique in extracting and us-
ing non-formalized context and in linking it with the task at hand (e.g., find-
ing the best route). Unfortunately, few if any contexts are explicitly formal-
ized and generalized for use in conflating other images. It is common that
the context of each image is unique and not recorded. For example, an expert
conflating two specific images may match feature F1 with feature F3, al-
though the distance between features F1 and F2 is smaller than the distance
between features F1 and F3. The reasoning (that is typically not recorded)
behind this decision could be as follows. The expert analyzed the whole im-
age as a context for the decision. The expert noticed that both features F1
and F3 are small road segments and are parts of much larger road systems A
and B that are structurally similar, but features F1 and F2 have no such link.
This conclusion is very specific for a given pair of images and roads on these
images. The expert did not have any formal definition of structural similarity
in this reasoning. Thus, this expert's reasoning may not be sufficient for implementation in an automatic conflation system for conflating other images. Moreover, the informal similarity the expert used for one pair of images can differ from the similarity the same expert will use for two other images.
There are two known approaches to incorporating context: (1) formalize the context for each individual image and task directly, or (2) generalize context in the form of expert rules. In the first approach, the challenge is that there are too many images and tasks and there is no unified technique for context formalization. The second approach is more general and more feasible, but in some cases it may not match a particular context and task, and a human expert then needs to review the result.
and rotation, affine control points, and shape-based conflation tools. These tools provide a foundation for interactive, on-the-fly recording of an expert's actions, implemented as a web portal. Thus the system allows an expert to conflate images while the expert knowledge exhibited in conflating these images is recorded on-the-fly.
The user can view these images overlapped and change the opacity of the
images for conflating. The system provides facilities for marking up the sec-
tions of the images using various shape tools. Each of these markups can be
named by the user using a basic name of his/her choice or by choosing from
a list of predefined terms from one of the ontologies, e.g., DAML-OWL
Geofile ontology. There is also a detailed list of actions the user has per-
formed that can be undone and redone to any point. A basic magnifier is
available for taking a detailed look at the image. Any of these markups and
conflations can be applied to multiple images to allow two (or more) images
to be conflated. All of these actions are recorded, can be presented in a hu-
man readable form and are available for playback. Figure 3 shows a confla-
tion sample with user action recording using ART.
3. The Auto Conflate tool allows the user to choose 3 points on two images
and using transformations, match those points together (and hopefully
the images as well).
4. The Show recording button allows the user to see how the conflation
was broken up via the current mode that was used in various segments of
the conflating.
5. The Opacity slider is used to set the transparency of the images, allow-
ing one image to be seen through another.
6. The checkboxes are used to select the image(s) that will receive the op-
erations such as move or draw polyline. This allows a user to resize or
rotate both images together, or once two images have been conflated to-
gether, a third image can be brought in and conflated against the other
two together.
Interactive optimized rule generation (based on the theory of monotone Boolean functions), testing rules against test cases, and recording rules to the knowledge base.
(1) a simple dominant geometric feature exists, (2) a simple unique geometric feature exists, (3) asymmetric unique features exist, and so on.
Now the imagery expert can be asked: if all these 14 arguments (parameters) are true for a pair of images, would he/she conclude that the two images can be conflated with a single affine linear transformation that most likely is unique? If the answer is "yes," then it is encoded as 1. Taking into account that all 14 arguments are Boolean too, we have matched the Boolean vector (11111111111111) to (1). We can consider a subset of the 14 parameters and ask the expert the same question about this subset. For instance, we can ask about the situation represented by the vector (11011101110111), where the "0" in the third position indicates that we do not require parameter #3 to be true. The total number of subsets (and questions) could be 2^14.
Assume that we built a system asking an expert only some of these 2^14 questions. Then, if in a real conflation case we encounter a situation (11011101110111) that was not asked about, the incomplete knowledge base does not provide the answer and the conflation task cannot be solved, even if the situation is solvable. The theory of monotone Boolean functions allows us to avoid asking all 2^14 questions and still generate a complete set of rules, as the sketch below illustrates. A specific example of rules built using the MIKE system is described in Chapter 20.
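A small Python sketch of this inference over 14-bit vectors (our own illustration): once a vector is answered "yes," every bitwise-greater vector is a derived "yes" and need not be asked.

def implied_yes(w, recorded_yes):
    # w >= v bitwise iff all bits of v are present in w.
    return any(w & v == v for v in recorded_yes)

yes_cases = {0b11011101110111}                   # answered "yes" by the expert
print(implied_yes(0b11111101110111, yes_cases))  # True: derived by monotonicity
print(implied_yes(0b11011101110011, yes_cases))  # False: the expert must be asked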
Defining rules. In Figure 7(a), each column is a question asked of the expert that is used by the system to build conflation rules of the form: if <conditions for Image 1 and Image 2> then it is highly possible that <one and only one (unique) affine transform that conflates images 1 and 2 exists>. Colored columns are questions already answered by the expert (green columns indicate "yes" answers and red columns indicate "no" answers). The current question is shown in the first (left-most) column. The expert presses the "yes" or "no" button for the current question, which determines the green or red answer. After coloring the answer, the system shifts all columns to the right and shows a new question in the first column to the expert.
(a) Defining Expert Rules (b) Testing rules against image set
Figure 7. Defining expert rules and case parameter recording
Detailed markup
A rectangular markup in the top left image in Figure 8 shows the flood area with the flood icon in the corner. The smaller rectangular markups indicate a road and crops under flood. After the two images have been conflated, the marked-up areas are automatically transferred to the other image, taken before the area was flooded. Such transfer can help to estimate damage and plan rescue operations. A detailed markup shown on the left identifies the flood area in more detail. The iconic summary at the bottom of Figure 8 can be read as "a flood area with crop damage and a road under flood." An analyst can review such annotations before looking at the actual images in detail, especially after conflation. This can be done faster than working with the images themselves, which contain much more information, the majority of which may not be relevant.
The user is now ready to drag icons onto the image in order to mark it up. This is the starting situation for creating the bridge between the image and the domain information in the form of a DAML-OWL ontology, once an image and an ontology are loaded. Figure 11 shows the next step, where several numeric icons have been dragged by the user annotator to mark up the appropriate locations in the image.
Figure 11. Image with various iconic annotations from the DAML-OWL Geofile ontology
(in the middle).
The next step is to record the annotated image 1. The user can store annotated images, after which the ART-MIKE software no longer has just a raster image to work with but knows the locations of various key features via the iconic annotation. Finally the annotation is committed to a database, from which it can later be pulled as reference data. Person A's goal for this image is accomplished. Now we move on to person B. Person B has just received a photo of a flooded area (shown in Figures 12 and 13 next to the first image) and needs to know what was beneath that water area.
These steps show the completion of one task (annotating images) and the start of another task (conflating images). The expert requests a suggested match from the system. The software tries to determine a match based on the similarities of the annotations in each image and shows the results for the expert to evaluate.
An automated approach to matching up iconic annotations from one image to another is based on similarities of ontological concepts in the ontology.
(a) Ontological match for figure 13 (b) Ontological match for figure 15
Figure 14. Matching ontological icon categories
Thus, we have three matched points (identified by the matched icon locations) in the two images, and an affine transform can be run to conflate them.
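As an illustration, the affine transform determined by three matched (non-collinear) point pairs can be solved directly. A sketch using NumPy with hypothetical coordinates (the specific point values are our own, not from the example):

import numpy as np

src = np.array([[10, 20], [200, 40], [120, 300]], dtype=float)  # icon locations, image 1
dst = np.array([[35, 25], [210, 90], [160, 330]], dtype=float)  # matched locations, image 2
A = np.hstack([src, np.ones((3, 1))])  # rows [x, y, 1]
coef = np.linalg.solve(A, dst)         # 3x2 matrix: columns map to x' and y'
print(A @ coef - dst)                  # ~0: all three pairs are matched exactly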
Now assume that two other persons, C and D, marked the same images as shown in Figure 15 below. Figure 14(b) shows the ontological match of these markups, and Figure 16 shows the result of their conflation using this match.
Figure 16. Result of affine conflation based on ontological match and its accuracy
The same iconic ontological approach has been used to conflate historic maps using the same DAML-OWL Geofile ontology with the terms City (7), Operating Area (9), Bay (32), Port (35), Dock (39) and Sea area (64). As can be seen below, simple manipulation of a bitmap might not be enough to get everything matched up: in these cases the maps are hand drawn and highly inaccurate. Attempts at using scaling, translation, and rotation will fail, but ontological matching can still be correct. Figure 17(a) shows two images to be conflated: a modern Macau map, 2003 (on the left), and a historic Portuguese map, 1889 (on the right), from the collection of the Library of Congress [Macau, 2003]. Figure 17(b) shows the two images conflated, with a third map not yet conflated with the two already conflated. This figure indicates the need for a non-linear transformation.
(a) Two images to be conflated (b) Two images linearly conflated with the third
image on the left to be conflated
Figure 17. Two images to be conflated [Macau, Library of the Congress, 2003].
See also color plates.
The deeper the match level, the higher the chances that the match is not accidental. The same number of intermediate nodes used in (2) can occur in both a shallow and a deep match.
8. CONCLUSION
9. ACKNOWLEDGEMENTS
10. EXERCISES AND PROBLEMS
1. Select two aerial photos of the same area from the web but with different spatial resolution. Design an iconic annotation for these images and provide justification for your choice of icons, spatial features and ontology terms.
2. Build an ontology that will fit images you used in exercise 1. It should be
a tree with three or more levels.
Chapter 2
Figure 1. Information visualizations for presentation and branding. Left: NASDAQ display; Right: Visual Insights' eBizLive product for showing website activity
Chapter 3
Figure 11. Hyperproof [http://www-csli.stanford.edu/hp/Hproof2.html]
Chapter 6
Chapter 8
Figure 1. Pieter Bruegel's painting "Blue cloak" (1559), oil on oak panel, 117 x 163 cm (with permission from Staatliche Museen zu Berlin - Gemäldegalerie, Berlin)
Fragment of Table 1. Encoding text in art (compressed content of the text: metaphor, proverb, and icon)
Metaphor: Big fish eats little fish
Proverb: He catches fish with his hands
Icon: a merged picture in which the big fish eats the little fish while he catches fish with his hands
Chapter 10
Figure 3. Dynamics of compression of an iconic sentence: from 10 icons to a single icon for the event (pure Bruegel), a 10-times compression, the maximum possible for a given n = 10
Figure 6. Icons with user defined weighting
Figure 8. A scatterplot matrix demonstrates the impact of reducing document vectors (row) versus reducing vector dimensions (column) using the Sammon Projection technique.
Figure 9. A scatterplot matrix demonstrates the effects of reducing pixel vectors (row) versus reducing vector dimensions (column) using remote sensing imagery.
Figure 13. An illustration of our multiple sliding window design in visualizing data streams: new vectors are projected using the Eigenvectors determined by the long window.
Figure 15. a) A scatterplot of 6,155 hydroclimate vectors divided into 10 clusters; each cluster is represented by a unique random color. b) Corresponding cluster colors are projected to the map position.
Figure 16. The Eigenvectors of the scatterplots
Chapter 14
Chapter 15
Figure 2. Data visualization: original data on the left and simultaneously rescaled data on the right
Chapter 16
Figure 11. Breast cancer cases visualized using procedure P3 with cases shown as bars with frames.
Figure 14. A 3-D version of Monotone Boolean Visual Discovery with vertical and horizontal surfaces used.
Figure 15. A 3-D version of Monotone Boolean Visual Discovery with grouping of Hansel chains.
Chapter 18
Figure 18. Intensity correlation: non-unique and wrong solution. Radiance intensity-based correlation from a total of 49 locations centered on a 7×7 grid over the image pairs in Figure 16 (342×451 pixels, 28.5 m GSD; the pairs differ in shape, offset, rotation, and scaling). The connection of the correlation peaks does not have the same trend as the registration vector displayed in Figure 16.
Figure 19. Entropy correlation: the trend matches the data. Entropy-based correlation with the same layout as in Figure 18. The connection of the correlation peaks does have the same trend as the registration vector from Figure 16.
Chapter 20
Fragment of Figure 4. Expert questioning using the monotonicity principle.
Figure 9. Two images at shape extraction stage. One of the images is scaled disproportionally.
Chapter 21
Figure 17. Two images to be conflated [Macau, Library of Congress, 2003]: (a) two images to be conflated; (b) two images linearly conflated, with the third image on the left to be conflated.