Towards an Information Theory of Complex Networks
Statistical Methods and Applications 2011th Edition
Author(s): Matthias Dehmer, Frank Emmert-Streib, Alexander Mehler
ISBN(s): 9780817649036, 0817649034
Edition: 2011
Year: 2011
Language: english
Matthias Dehmer
Frank Emmert-Streib
Alexander Mehler
Editors
Towards an Information
Theory of Complex Networks
Statistical Methods and Applications
Editors
Matthias Dehmer
UMIT
Institute of Bioinformatics and Translational Research
Eduard-Wallnöfer-Zentrum I
A-6060 Hall in Tirol
Austria
mathias.dehmer@umit.at
Frank Emmert-Streib
School of Medicine, Dentistry and Biomedical Sciences
Center for Cancer Research and Cell Biology
Queen’s University Belfast
97 Lisburn Road
Belfast BT9 7BL
United Kingdom
v@bio-complexity.com
Alexander Mehler
Faculty of Computer Science and Mathematics
Goethe-University Frankfurt am Main
Robert-Mayer-Straße 10
P.O. Box: 154
D-60325 Frankfurt am Main
Germany
mehler@em.uni-frankfurt.de
www.birkhauser-science.com
Preface
For more than a decade, complex network analysis has evolved as a methodological
paradigm for a multitude of disciplines, including physics, chemistry, biology,
geography, sociology, computer science, statistics, media science, and linguistics.
Researchers in these fields share an interest in information processing subject to the
networking of their corresponding research object, for instance, genes, molecules,
individuals, semes, memes, etc. They start with the insight that any of these
research objects is extrinsically characterized, if not constituted, by its networking
with objects of the same provenance. In this way, networks, for example, gene
networks, food networks, city networks, networks of words, sentences, texts, or web
documents become important research objects in more and more disciplines.
This book, in line with these research developments, presents theoretical and
practical results of statistical models of complex networks in the formal sciences,
the natural sciences, and the humanities. One of its goals is to advocate and promote
combinations of graph-theoretic, information-theoretic, and statistical methods as a
way to better understand and characterize real-world networks.
On the one hand, networks appear as paradigmatic objects of approaches
throughout the natural and social sciences and the humanities. On the other hand,
networks are—irrespective of their disciplinary provenance—known for character-
istic distributions of graph-theoretic invariants which affect their robustness and
efficiency in information processing. The main goal of this book is to further develop
information-theoretic notions and to elaborate statistical models of information
processing in such complex networks. In this way, the book includes first steps
toward establishing a statistical information theory as a unified basis for complex
network analysis across a multitude of scientific disciplines.
The book presents work on the statistics of complex networks together with
applications of information theory in a range of disciplines such as quantitative
biology, quantitative chemistry, quantitative sociology, and quantitative linguistics.
It aims to integrate models of invariants of network topologies and dynamic
aspects of information processing in these networks or by means of these networks.
Thus, the book is in support of sharing and elaborating models and methods that may
help researchers get insights into complex problems emerging from interdisciplinary
reasoning.
The book is divided into two parts: Chaps. 1–4 deal with formal-theoretical issues
of network modeling, while Chaps. 5–13 further develop and apply these methods
to empirical networks from a wide range of areas. The book starts with a theoretical
contribution by Abbe Mowshowitz on the entropy of digraphs and infinite graphs.
The aim is to provide insights into more complex graph models that go beyond
the majority of network models based on finite undirected graphs. The chapter by
Nicolas Bonichon, Cyril Gavoille, and Nicolas Hanusse presents an information-
theoretic upper bound of planar graphs by means of the newly introduced notion of
well-orderly maps. Such a technique might be useful when studying properties of
the very important notion of planar graphs. Terence Chan and Raymond W. Yeung
study a statistical inference problem using network models. Richard Berkovits,
Lukas Jahnke, and Jan W. Kantelhardt examine phase transitions within complex
networks that help to examine their structural properties.
The remainder of the book combines the theoretical stance of the first section
with an empirical analysis of real networks. Elena Konstantinova provides a survey
on information-theoretic measures used in chemical graph theory. Prabhat K. Sahu
and Shyi-Long Lee develop a model of chemical graphs by example of molecular
networks. Exploring the spectral characteristics of these graphs, they provide a
successful classification of chemical graphs.
Biological or, more specifically, ecological networks are dealt with by Robert E.
Ulanowicz who describes a framework of quantifying patterns of the interaction
of networked trophic processes from the point of view of information theory.
Ecological networks are also the focus of the chapter of Linda J. Moniz, James D.
Nichols, Jonathan M. Nichols, Evan G. Cooch, and Louis M. Pecora, who provide
an approach to modeling the interaction dynamics of ecosystems and their change.
A comprehensive view of ontologically disparate networks is given by Cristian
R. Munteanu, J. Dorado, A. Pazos Sierra, F. Prado-Prado, L.G. Pérez-Montoto,
S. Vilar, F.M. Ubeira, A. Sanchez-Gonzaléz, M. Cruz-Monteagudo, S. Arrasate,
N. Sotomayor, E. Lete, A. Duardo-Sánchez, A. Dı́az-López, G. Patlewicz, and
H. González-Dı́az who use the notion of entropy centrality to compare various
systems such as chemical, biological, crime, and legislative networks, thereby
showing the interdisciplinary expressiveness of complex network theory.
The book continues with two contributions to linguistic networks: Alexander
Mehler develops a framework for analyzing the topology of social ontologies as
they evolve within Wikipedia and contrasts them with nonsocial, formal ontologies.
Olga Abramov and Tatjana Lokot present a comparative, classificatory study of
morphological networks by means of several measures of graph entropy.
Edward B. Allen discusses the measurement of the complexity and error prob-
ability of software systems represented as hypergraphs. Finally, in the chapter by
Philippe Blanchard and Dimitri Volchenkov, random walks are studied as a kind
of Markov process on graphs that allow insights into the dynamics of networks as
diverse as city and trade and exchange networks.
With such a broad field, it is clear that the present book addresses an interdisci-
plinary readership. It does not simply promote transdisciplinary research. Rather, it
is about interdisciplinary research that may be the starting point of developing an
overarching network science.
Matthias Dehmer
Frank Emmert-Streib
Alexander Mehler
Acknowledgments
Many colleagues have provided us with input, help, and support (consciously or
unconsciously) before and during the preparation of this book. In particular, we
would like to thank Andreas Albrecht, Gökmen Altay, Gabriel Altmann, Alain
Barrat, Igor Bass, David Bialy, Philippe Blanchard, Danail Bonchev, Stefan Borgert,
Mieczysław Borowiecki, Andrey A. Dobrynin, Michael Drmota, Ramon Ferrer
i Cancho, Maria and Gheorghe Duca, Maria Fonoberova, Armin Graber, Martin
Grabner, Peter Gritzmann, Ivan Gutman, Peter Hamilton, Wilfried Imrich, Patrick
Johnston, Elena Konstantinova, D. D. Lozovanu, Dennis McCance, Abbe Mow-
showitz, Arcady Mushegian, Andrei Perjan, Armindo Salvador, Maximilian Schich,
Heinz Georg Schuster, Helmut Schwegler, Andre Ribeiro, Burghard Rieger, Brigitte
Senn-Kircher, Fred Sobik, Doru Stefanescu, John Storey, Shailesh Tripathi, Kurt
Varmuza, Bohdan Zelinka, and Shu-Dong Zhang. Additionally, Matthias Dehmer
thanks Armin Graber for strong support and providing a fruitful atmosphere at
UMIT. Finally, we would like to thank our editor Tom Grasso, who has always been
available and helpful.
The work on the chapters of Philippe Blanchard and Dimitri Volchenkov, Olga
Abramov, and Alexander Mehler has been supported by the German Federal
Ministry of Education and Research (BMBF) through the project Linguistic
Networks.¹ We gratefully acknowledge this financial support.
¹ www.linguistic-networks.net.
Contributors
J.D. Nichols U.S. Geological Survey, Patuxent Wildlife Research Center, Laurel,
MD 20708, USA, jnichols@usgs.gov
J.M. Nichols Naval Research Laboratory, Optical Sciences Division, Code 5673,
Washington, DC 20375, USA, jonathan.nichols@nrl.navy.mil
G. Patlewicz Institute for Health and Consumer Protection (IHPC), Joint Research
Centre (JRC), European Commission, via E. Fermi 2749–21027 Ispra (Varese), Italy
DuPont Haskell Global Centers for Health and Environmental Sciences, Newark,
DE 19711, USA, Grace.Y.Tier@usa.dupont.com
Alejandro Pazos-Sierra Department of Information and Communication Tech-
nologies, Computer Science Faculty, University of A Coruña,
15071 A Coruña, Spain, apazos@udc.es
L.M. Pecora Naval Research Laboratory, Code 6362, Washington, DC 20375,
USA, pecora@anvil.nrl.navy.mil
L.G. Pérez-Montoto Faculty of Pharmacy, University of Santiago de Compostela,
15782 Santiago de Compostela, Spain, lgmp2002@yahoo.es
F. Prado-Prado Faculty of Pharmacy, University of Santiago de Compostela,
15782 Santiago de Compostela, Spain, fenol1@hotmail.com
Prabhat K. Sahu Institut für Physikalische und Theoretische Chemie, Universität
Würzburg, Am Hubland, 97074 Würzburg, Germany
Department of Chemistry and Biochemistry, National Chung Cheng University,
Chia-Yi, 621 Taiwan, sahu@chemie.uni-wuerzburg.de
A. Sanchez-Gonzaléz Department of Inorganic Chemistry, Faculty of Pharmacy,
University of Santiago de Compostela, 15782 Santiago de Compostela, Spain,
angeles.sanchez@usc.es
N. Sotomayor Department of Organic Chemistry II, Faculty of Science and
Technology, University of the Basque Country/Euskal Herriko Unibertsitatea, Apto.
644, 48080 Bilbao, Spain, nuria.sotomayor@ehu.es
F.M. Ubeira Department of Microbiology and Parasitology, Faculty of Pharmacy,
University of Santiago de Compostela, 15782 Santiago de Compostela, Spain,
fm.ubeira@usc.es
Robert E. Ulanowicz Department of Biology, University of Florida, Gainesville,
FL 32611-8525, USA
University of Maryland Center for Environmental Science, Solomons, MD 20688-
0038, USA, ulan@umces.edu
S. Vilar Faculty of Pharmacy, University of Santiago de Compostela, 15782
Santiago de Compostela, Spain, qosanti@yahoo.es
Chapter 1
Entropy of Digraphs and Infinite Networks
A. Mowshowitz
1 Introduction
1.1 Overview
This chapter investigates the information content of directed and infinite graphs.
The information content of a finite graph (directed or undirected) is a quantitative
measure based on the symmetry structure of the graph. As explained in detail
below, the group of symmetries of a finite graph partitions the vertex set and thus
induces a unique finite probability scheme. The entropy of this scheme is taken to
be the information content of the graph. This “classical” notion differs from the
“graph entropy” introduced in [16].
A. Mowshowitz
Department of Computer Science, The City College of New York (CUNY),
138th Street at Convent Avenue, New York, NY 10031, USA
e-mail: abbe@cs.ccny.cuny.edu
Development of the concept of entropy applied to finite graphs is discussed
in [17] and [20]. The application of entropy to graphs was introduced in the
1950s soon after the appearance of Shannon’s famous paper on information theory.
Entropy measurement has been used as a tool for characterizing molecules and
chemical structures. For example, measures characterizing the structural complexity
of chemical graphs have been developed and applied in [1, 3, 6]. Most of these
measures are based on graph invariants that generate an equivalence relation on
the vertices or edges of a graph. The resulting equivalence classes form a partition
to which a finite probability scheme [14] can be associated in a natural way. The
entropy of such a scheme provides a quantitative measure of structural complexity.
Various structural features of a graph have provided the basis for entropy
measures. The earliest centered on the symmetries of a graph [21]. Other features,
such as branching structure in molecular graphs, have been used to define entropy
measures [8]. Measures associated with graphs representing atoms and molecules
have been defined and applied to problems of discriminating chemical isomers and
to classifying atomic and chemical structures [7, 9, 15]. Such measures have also
been used for the analysis of biological networks [13]. Degree characteristics of a
graph have been used as basis for an entropy-based measure of disorder in complex
networks [23]. Interest in measuring the information content of graphs has also been
kindled in recent years by the growing importance of computer and social networks
in modern society [10, 24]. Relationships between graph entropy-based measures,
expressed as inequalities, have been demonstrated in [11].
The notion of information content can be extended to infinite graphs. The
approach adopted here is to consider an infinite graph as a sequence of finite graphs.
Each of the finite graphs in the sequence has a well-defined information content, and
if the corresponding sequence of information content values has an unambiguous
limit, that limit is defined to be the information content of the given infinite graph.
In Sect. 2, we will look into the existence of directed graphs with prescribed
information content and determine the information content of certain products of
directed graphs. Section 3 will focus on infinite graphs, investigating the information
content of some special classes of infinite graphs, and applying results from Sect. 2
to determine the information content of infinite graphs in general. Section 4 will
examine some applications of the information measure to problems in network
theory.
2 Entropy of Digraphs
The automorphism group of a digraph and the measure of information content based
on the group are defined below.
Definition 5. Let $G = (V, E)$ be a (directed or undirected) graph with vertex set
$V$ (with $|V| = n$) and edge set $E$. The automorphism group of $G$, denoted by
$\mathrm{Aut}(G)$, is the set of all adjacency-preserving bijections of $V$.
Definition 6. Let $\{V_i \mid 1 \le i \le k\}$ be the collection of orbits of
$\mathrm{Aut}(G)$ and suppose $|V_i| = n_i$ for $1 \le i \le k$. The entropy or
information content of $G$ is given by the following formula [17]:
$$I_a(G) = -\sum_{i=1}^{k} \frac{n_i}{n} \log \frac{n_i}{n}.$$
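Definitions 5 and 6 can be checked by hand on very small digraphs. The following is a minimal sketch, not the author's method: it enumerates all vertex permutations (feasible only for a handful of vertices), keeps those that map the edge set onto itself, and reads off the orbits. The function names are hypothetical, and the base-2 logarithm is an assumption inferred from the numerical values that appear in the chapter's figures.

```python
from itertools import permutations
from math import log2

def automorphism_orbits(vertices, edges):
    """Brute-force the orbits of Aut(G) for a small digraph given as (u, v) pairs."""
    vertices = list(vertices)
    edge_set = set(edges)
    # images[v] collects every vertex that some automorphism maps v to.
    images = {v: {v} for v in vertices}
    for perm in permutations(vertices):
        mapping = dict(zip(vertices, perm))
        # A bijection is an automorphism iff it maps the edge set onto itself.
        if {(mapping[u], mapping[v]) for u, v in edge_set} == edge_set:
            for v in vertices:
                images[v].add(mapping[v])
    # Because the automorphisms form a group, these image sets are the orbits.
    return {frozenset(s) for s in images.values()}

def information_content(orbits):
    """I_a(G) = -sum (n_i / n) log2(n_i / n) over the orbit sizes n_i."""
    n = sum(len(o) for o in orbits)
    return sum(-(len(o) / n) * log2(len(o) / n) for o in orbits)

# A directed 6-cycle is vertex-transitive: a single orbit, so I_a is zero.
cycle6 = [(i, (i + 1) % 6) for i in range(6)]
print(information_content(automorphism_orbits(range(6), cycle6)))

# A directed path on 3 vertices has only the identity automorphism: I_a = log2(3).
path3 = [(0, 1), (1, 2)]
print(information_content(automorphism_orbits(range(3), path3)))
```

An undirected graph can be passed to the same functions by listing each edge in both directions.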
Many different binary operations on graphs and digraphs appear in the literature
[19]. We will examine four such operations in some detail, namely, the sum, join,
Cartesian product, and the composition. Our aim is to determine the information
content of a digraph operation in relation to the information contents of the
respective digraphs in the operation. Such products are useful in defining classes
of digraphs with properties of interest in different applications, especially those
pertaining to the analysis of networks.
Definition 7. The sum of $G_1$ and $G_2$ is the digraph $G_1 \cup G_2$ defined by
$V(G_1 \cup G_2) = V(G_1) \cup V(G_2)$ and $E(G_1 \cup G_2) = E(G_1) \cup E(G_2)$.
[Figure: three digraphs X, Y, and Z with their orbits and information content.
X: orbit {1,2,3,4,5,6}; $I_a(X) = 0$.
Y: orbits {1}, {2,5}, {3,6}, {4,7}; $I_a(Y) = -(1/7)\log(1/7) - 3(2/7)\log(2/7)$.
Z: orbits {1}, {2}, {3}; $I_a(Z) = \log 3$.]
$$I_a(G \cup H) = I_a(G + H) = \log(n + m) + \frac{1}{n + m}\left[ n I_a(G) + m I_a(H) - n \log(n) - m \log(m) \right].$$
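The formula for the sum can be compared numerically with the direct orbit computation. The sketch below assumes that the orbits of the sum are exactly the orbits of $G$ together with the orbits of $H$, which holds when no component of $G$ is isomorphic to a component of $H$; the orbit sizes are hypothetical inputs, and base-2 logarithms are assumed.

```python
from math import log2

def entropy_from_orbit_sizes(sizes):
    """I_a for a graph whose automorphism orbits have the given sizes."""
    n = sum(sizes)
    return sum(-(s / n) * log2(s / n) for s in sizes)

def sum_information_content(n, ia_g, m, ia_h):
    """Right-hand side of the formula for I_a(G u H), with |V(G)| = n and |V(H)| = m."""
    return log2(n + m) + (n * ia_g + m * ia_h - n * log2(n) - m * log2(m)) / (n + m)

# Hypothetical orbit sizes: G has 7 vertices, H has 3 vertices.
g_orbits = [1, 2, 2, 2]
h_orbits = [1, 1, 1]

direct = entropy_from_orbit_sizes(g_orbits + h_orbits)
via_formula = sum_information_content(
    sum(g_orbits), entropy_from_orbit_sizes(g_orbits),
    sum(h_orbits), entropy_from_orbit_sizes(h_orbits),
)
print(direct, via_formula)  # the two values agree up to rounding
```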
$$I_a(G) = \frac{1}{n}\left[ (n - 1) \log \frac{n}{n - 1} + \log(n) \right].$$
The join and Cartesian product can be used to construct digraphs with given
information content. More precisely, for any finite probability scheme there exists a
digraph with information content equal to the entropy of the scheme. This result is
stated in the following theorem originally presented in [18].
Theorem 5. Let $n$ be any positive integer, and suppose $P = \{n_{ij}\}$ is a partition of
$n$ where $n_{ij} = n_i$ ($1 \le j \le r_i$), $n_{i_1} \ne n_{i_2}$ ($i_1 \ne i_2$), and
$i = 1, 2, \ldots, k$. Then there exists a weakly connected digraph $G$ with $n$ vertices
such that $\mathrm{Aut}(G)$ has exactly $r = \sum_{i=1}^{k} r_i$ orbits, and for each
$n_{ij}$ there is an orbit $A$ with $|A| = n_{ij}$; and, hence,
$$I_a(G) = H(P) = -\sum_{i=1}^{k} r_i \frac{n_i}{n} \log \frac{n_i}{n}.$$
Proof. The proof is based on a simple construction. Let $G_i = L_{r_i - 1} \times C_{n_i}$,
where $L_{r_i - 1}$ is a directed path of length $r_i - 1$ and $C_{n_i}$ is a directed
cycle of length $n_i$. Since the path and cycle are relatively prime with respect to the
Cartesian product, the orbits of $\mathrm{Aut}(G_i)$ are the respective products of the
orbits of $\mathrm{Aut}(L_{r_i - 1})$ and $\mathrm{Aut}(C_{n_i})$. Hence,
$\mathrm{Aut}(G_i)$ has exactly $r_i$ orbits, each consisting of $n_i$ elements.
The digraph $G$ formed by taking the join of the $k$ non-isomorphic $G_i$ has an
automorphism group with orbits corresponding to the partition specified in the
hypothesis of the theorem, and thus has the required information content. □
Figure 1.5 illustrates the theorem for $n = 25$, $P = \{1^3, 2^4, 3^2, 4^2\}$.
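The bookkeeping behind the construction can be sketched in a few lines. The code below takes the partition from the example, groups equal parts to obtain the pairs $(n_i, r_i)$, lists the factor $L_{r_i - 1} \times C_{n_i}$ for each distinct part size, and evaluates $H(P)$. It does not build the digraph itself; the function names are hypothetical and base-2 logarithms are assumed.

```python
from collections import Counter
from math import log2

def partition_entropy(parts):
    """H(P) = -sum_i r_i (n_i / n) log2(n_i / n), where part size n_i occurs r_i times."""
    n = sum(parts)
    multiplicity = Counter(parts)  # maps n_i -> r_i
    return sum(-r * (size / n) * log2(size / n) for size, r in multiplicity.items())

def construction_factors(parts):
    """For each distinct part size n_i, return (path length r_i - 1, cycle length n_i)."""
    multiplicity = Counter(parts)
    return [(r - 1, size) for size, r in sorted(multiplicity.items())]

# The partition P = {1^3, 2^4, 3^2, 4^2} of n = 25 used in the example.
P = [1] * 3 + [2] * 4 + [3] * 2 + [4] * 2

for path_length, cycle_length in construction_factors(P):
    print(f"L_{path_length} x C_{cycle_length}")
print("H(P) =", partition_entropy(P))
```

Each factor contributes $r_i$ orbits of size $n_i$, so the entropy of the orbit partition of the join coincides with $H(P)$ as stated in the theorem.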
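[Figure: examples of graph products and their information content.
G, H with $I_a(G) = 0$, $I_a(H) = \log 3$: $I_a(G \times H) = I_a(G) + I_a(H) = \log 3$.
G', H' with $I_a(G') = I_a(H') = \log 3 - 2/3$:
$I_a(G' \times H') = 2 \log 3 - 16/9 < I_a(G') + I_a(H') = 2 \log 3 - 12/9$.
H, G with $I_a(H) = \log 3$, $I_a(G) = 0$; panel $H \circ G$:
$I_a(G \circ H) = I_a(H) + I_a(G) = \log 3$.]
[Fig. 1.5: the join $(L_2 \times C_1) + (L_3 \times C_2) + (L_1 \times C_3) + (L_1 \times C_4)$.]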
3.1 Preliminaries
Fig. 1.6 A countable graph with more than one defining sequence
(panels $G_1$, $G_2$, $G_3$, $G_4$, ...)
$$I_a(G_n) = \begin{cases} 0 & \text{if } n \text{ is odd} \\ \log(5) - \frac{3}{5}\log(3) - \frac{2}{5} & \text{if } n \text{ is even} \end{cases}$$
Thus, for the subsequence $S_n$ consisting of the odd terms, $\hat{I}(G; S_n) = 0$; and
for the subsequence $T_n$ consisting of the even terms,
$\hat{I}(G; T_n) = \log(5) - \frac{3}{5}\log(3) - \frac{2}{5}$. The difference in this case is finite,
but it could be infinite as shown in [18]. Using a measure that depends on the
graph’s defining sequence is not necessarily a disadvantage. An infinite graph can
be viewed as an idealization of a growth process. Including the defining sequence
in the definition allows for capturing different principles of growth in practice.
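The dependence on the defining sequence can be made concrete with a short numerical sketch. It evaluates the two values of $I_a(G_n)$ given above and shows that the odd and even subsequences settle on different limits. The observation that the even value equals the entropy of an orbit partition with sizes 3 and 2 on 5 vertices is an inference from the formula (with base-2 logarithms), not a statement taken from the text.

```python
from math import log2

def entropy(orbit_sizes):
    """Entropy (base 2) of the probability scheme induced by the orbit sizes."""
    n = sum(orbit_sizes)
    return sum(-(s / n) * log2(s / n) for s in orbit_sizes)

def ia_defining_sequence(n):
    """I_a(G_n) as given in the text: 0 for odd n, a fixed positive value for even n."""
    return 0.0 if n % 2 == 1 else log2(5) - (3 / 5) * log2(3) - 2 / 5

odd_terms = {ia_defining_sequence(n) for n in range(1, 40, 2)}
even_terms = {ia_defining_sequence(n) for n in range(2, 40, 2)}
print("odd subsequence values: ", odd_terms)
print("even subsequence values:", even_terms)

# Inferred reading of the even value: orbits of sizes 3 and 2 on 5 vertices.
print(entropy([3, 2]))
```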
Infinite graphs can be built up recursively with the aid of graph products. The
following result makes use of the Cartesian product.
Lemma 1. Let $G$ be a graph with $n$ vertices. Then $I_a(G \times K_2) = I_a(G)$.
Proof. Corresponding vertices of the two copies of $G$ are in the same orbit of
$G \times K_2$, so $G$ and $G \times K_2$ have the same number of orbits, and each orbit of
$G \times K_2$ has exactly double the number of vertices as the corresponding orbit of $G$.
Thus, if $\mathrm{Aut}(G)$ has orbits $A_1, A_2, \ldots, A_r$ with $|A_i| = k_i$,
$1 \le i \le r$, then
$$I_a(G \times K_2) = -\sum_{i=1}^{r} \frac{2k_i}{2n} \log \frac{2k_i}{2n} = I_a(G). \qquad \square$$
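The doubling argument in the proof can be observed on one small case. The sketch below is an illustration under stated assumptions rather than a verification of the lemma: it takes $G$ to be the undirected path on 3 vertices (a hypothetical choice), forms $G \times K_2$, and finds the orbit sizes by brute-force enumeration of permutations. Undirected edges are stored in both directions, and all helper names are invented for this example.

```python
from itertools import permutations, product
from math import log2

def orbit_sizes(vertices, edges):
    """Sorted orbit sizes of Aut(G), found by brute force (small graphs only)."""
    vertices = list(vertices)
    edge_set = set(edges)
    images = {v: {v} for v in vertices}
    for perm in permutations(vertices):
        f = dict(zip(vertices, perm))
        if {(f[u], f[v]) for u, v in edge_set} == edge_set:
            for v in vertices:
                images[v].add(f[v])
    return sorted(len(o) for o in {frozenset(s) for s in images.values()})

def symmetric(edges):
    """Represent undirected edges by listing both directions."""
    return set(edges) | {(v, u) for u, v in edges}

def cartesian_product(v1, e1, v2, e2):
    """Cartesian product: move along an edge in one coordinate, keep the other fixed."""
    vertices = list(product(v1, v2))
    edges = {((a, x), (b, x)) for a, b in e1 for x in v2}
    edges |= {((a, x), (a, y)) for a in v1 for x, y in e2}
    return vertices, edges

def entropy(sizes):
    n = sum(sizes)
    return sum(-(s / n) * log2(s / n) for s in sizes)

path_vertices, path_edges = range(3), symmetric([(0, 1), (1, 2)])
k2_vertices, k2_edges = range(2), symmetric([(0, 1)])

base = orbit_sizes(path_vertices, path_edges)
doubled = orbit_sizes(*cartesian_product(path_vertices, path_edges, k2_vertices, k2_edges))
print(base, doubled)
print(entropy(base), entropy(doubled))
```

In this instance every orbit size doubles while the number of orbits is unchanged, so the two entropies coincide.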
$$H_1 = G; \qquad H_{n+1} = H_n \times K_2, \quad \text{for } n \ge 1.$$
Since the limit of the (defining) sequence $\{H_n\}_{n=1}^{\infty}$ exists, we can set
$H_\infty = \lim_{n \to \infty} H_n$. Now, $I_a(H_n) = 0$ for all $n \ge 1$, which implies
by the lemma that $\hat{I}(H; H_n) = 0$, i.e., the sequence of finite hypercubes yields a
limit whose information content is zero. The hypercube serves as a useful model in
parallel computation. A key feature in this context is the favorable maximum distance
between any two vertices in the graph. This allows for placing computational
units so as to minimize communication costs. The zero information content of the
hypercube reflects the high degree of symmetry of this graph, which allows for
simultaneous placement of elements at optimal distance from each other.
[Figure: $S^3 \times K_2$ and $W^3 \times K_2$.]
Other graphs of interest, with information content between the two extremes, can
be substituted for $G$ in $G \times K_2$.
Let $S^k$ denote the star of order $k$, a connected graph with one vertex of degree
$k - 1$ and $k - 1$ vertices of degree 1; and let $W^k$ denote the wheel of order $k$, a
connected graph obtained from the star by joining the degree 1 vertices in a cycle of
length $k - 1$. Once again using the Cartesian product, we can build infinite sequences
based on these simple graphs.
$$S_1^k = S^k; \qquad S_{n+1}^k = S_n^k \times K_2, \quad \text{for } n \ge 1.$$
The information content of the line graph increases without bound, so the informa-
tion content of the limit graph is infinite.
The cycle graph of order $k$ has information content $I_a(C^k) = 0$, so the limit
graph in this case has information content zero.
4 Applications
Preferential attachment has been studied extensively as a protocol for the growth of
large-scale networks like the Internet [5]. According to this protocol, a vertex added
to a network will be more likely to become attached to existing vertices of higher
rather than of lower degree. The “preference” of a vertex v as a target of attachment
might be expressed as the probability given by the degree of v divided by the sum of
the degrees in the graph. This introduces a random element in the growth process.
Perhaps the simplest way to realize a (relatively deterministic) version of growth by
preferential attachment is to add a single new vertex at each iteration, connecting the
new vertex to an existing one whose degree is maximal in the current graph. Call
this a type-0 preferential attachment protocol. If the starting graph is $K_1$, the result
is clearly a star. After the $n$th new vertex has been added, a star of order $n + 1$ has
been formed. This graph $S^{n+1}$ has information content
$\log(n + 1) - \frac{n}{n + 1} \log(n)$, and
as noted above, this value tends to zero as $n$ increases without bound.
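The type-0 protocol is simple enough to simulate directly. The sketch below grows a graph from $K_1$ by attaching each new vertex to a vertex of maximal degree and then evaluates the information content of the resulting star. The tie-breaking rule (lowest label wins) is an assumption, since the text does not specify one, and the closed form is compared with the entropy of the orbit sizes $1$ and $n$, which applies for $n \ge 2$. Base-2 logarithms are assumed.

```python
from math import log2

def grow_type0(steps):
    """Start from K_1; each step attaches one new vertex to a vertex of maximal degree."""
    degree = {0: 0}
    edges = []
    for new in range(1, steps + 1):
        # Ties are broken in favour of the lowest label (an assumption).
        target = max(degree, key=lambda v: (degree[v], -v))
        edges.append((target, new))
        degree[target] += 1
        degree[new] = 1
    return degree, edges

def star_information_content(n):
    """log2(n + 1) - n / (n + 1) * log2(n) for the star of order n + 1."""
    return log2(n + 1) - n / (n + 1) * log2(n)

def entropy(orbit_sizes):
    total = sum(orbit_sizes)
    return sum(-(s / total) * log2(s / total) for s in orbit_sizes)

degree, edges = grow_type0(5)
print(degree)  # vertex 0 is the centre, all other vertices are leaves

for n in (2, 10, 100, 1000, 10000):
    print(n, star_information_content(n), entropy([1, n]))
```

The printed values shrink as $n$ grows, in line with the statement that the information content of the star tends to zero.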
A variation on this simple protocol is to add k new vertices at each iteration and
attach each one of them to a different existing vertex, choosing the existing vertices
in nonincreasing order of degree, beginning with one of maximal degree.
Figure 1.8 illustrates the construction process according to this protocol, and the
following theorem gives the information content in the case where k equals the
number of vertices in the initial graph of the sequence.
Theorem 6. Let $\{G_n^k\}_{1}^{\infty}$ be a sequence of graphs defined as follows:
$G_1^k = S^k$;
$G_{n+1}^k$ is obtained from $G_n^k$ by adding $k$ new vertices and joining each one to a
different vertex of maximal degree in $G_n^k$.
Proof. Since $k$ vertices are added for each iteration, $G_n^k$, the $n$th graph in the
sequence, has $nk + 1$ vertices. Let $v$ be the vertex of highest degree in $G_n^k$. The
orbits of $\mathrm{Aut}(G_n^k)$ consist of the vertex $v$ alone, the vertices of degree 1
adjacent to $v$, the vertices of degree $> 1$ adjacent to $v$, and the vertices of degree 1
at distance 2 from $v$. Thus, the orbits of $\mathrm{Aut}(G_n^k)$ have 1, $n - 1$, $k$, and
$(k - 1)(n - 1)$ vertices, from which the result follows. □
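The step from orbit sizes to information content can be sketched numerically. The code below takes the four orbit sizes listed in the proof as given, checks that they add up to $nk + 1$, and evaluates the entropy both from the definition and in the equivalent form $\log N - \frac{1}{N} \sum_i n_i \log n_i$ with $N = nk + 1$. The function names are hypothetical, base-2 logarithms are assumed, and the sketch is restricted to $n \ge 2$ and $k \ge 2$ so that no orbit is empty.

```python
from math import log2

def orbit_sizes(n, k):
    """Orbit sizes of Aut(G_n^k) as listed in the proof (taken as given)."""
    return [1, n - 1, k, (k - 1) * (n - 1)]

def entropy(sizes):
    """Information content from the definition."""
    total = sum(sizes)
    return sum(-(s / total) * log2(s / total) for s in sizes)

def simplified_form(n, k):
    """log2(N) - (1 / N) * sum of n_i * log2(n_i), with N = n * k + 1."""
    total = n * k + 1
    return log2(total) - sum(s * log2(s) for s in orbit_sizes(n, k)) / total

n, k = 4, 3
sizes = orbit_sizes(n, k)
print(sizes, sum(sizes) == n * k + 1)
print(entropy(sizes), simplified_form(n, k))
```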
Corollary 3. Let $G_n^k$ be defined as in the theorem. Then $\hat{I}(G; G_n^k) = \log k$.
Proof. Simplifying the expression in the theorem gives
$$I_a(G_n^k) = \log(nk + 1) - \frac{1}{nk + 1}\left[ (k - 1)(n - 1) \log\big((k - 1)(n - 1)\big) + (n - 1) \log(n - 1) + k \log k \right].$$
Taking the limit as $n \to \infty$ yields $\hat{I}(G; G_n^k) = \log k$ as required. □
Greater connectivity in a network that grows by preferential attachment can
be achieved by allowing the newly added vertices to be joined to more than one
existing vertex [2]. Call this a type-1 preferential attachment protocol. This protocol
is illustrated in Fig. 1.9. The following theorem gives the information content of an
infinite graph that grows according to a type-1 protocol with $k = 2$.
Theorem 7. Let $\{H_n^2\}_{1}^{\infty}$ be a sequence of graphs defined as follows:
$H_1^2 = K_2$;
$H_{n+1}^2$ is obtained from $H_n^2$ by adding 2 new vertices and joining each one to exactly
two different vertices of maximal degree in $H_n^2$.
[Fig. 1.10: two graphs G and H, each on the vertices 1 to 6.]
measure of information content, but both have diameter 3 and for both the minimum
number of vertices whose removal disconnects the respective graphs is 2.
This example suggests that there is no simple correlation between information
content, on one hand, and diameter and vulnerability, on the other hand. Two graphs
with the same diameter can register as far apart as possible on the information
content scale; similarly, two graphs with the same (vertex) vulnerability rating can
be far apart on the information content measure. However, it is possible that graphs
that grow in a certain way (like the hypercube) may exhibit a high correlation
between diameter and information content.
Information content could possibly be correlated with diameter if only graphs
with similar distance properties were grouped together. In the example, the
maximum distances from each vertex (from 1 to 6) to any other in G are given by
(2, 2, 3, 2, 2, 3), whereas for H these maximum distances are (3, 3, 3, 3, 3, 3).
Although the two graphs have the same diameter, the sequences of maximum distances
do not coincide. Another potentially useful discriminant is the metrical property
distance degree sequence [4]. This sequence gives the number of vertices at each
succeeding distance from a given vertex. Graph H in Fig. 1.10 is distance degree
regular but G is not. The hypercube is distance degree regular and has 0 information
content; however, a distance degree regular graph does not necessarily have a
transitive automorphism group. The more general metrical properties investigated in
[22] might prove useful in determining the relationship between information content
and diameter.
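Both discriminants mentioned here, the sequence of maximum distances and the distance degree sequence, are easy to compute by breadth-first search. The sketch below assumes connected undirected graphs stored as adjacency lists and counts the vertex itself at distance 0 in the distance degree sequence (a convention chosen for this example). The two test graphs, a 6-cycle and a path on 4 vertices, are hypothetical and are not claimed to be the graphs of Fig. 1.10.

```python
from collections import Counter, deque

def bfs_distances(adj, source):
    """Shortest-path distances from source in an unweighted graph."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        u = queue.popleft()
        for w in adj[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return dist

def eccentricities(adj):
    """Maximum distance from each vertex to any other, in vertex order."""
    return [max(bfs_distances(adj, v).values()) for v in sorted(adj)]

def distance_degree_sequence(adj, v):
    """Number of vertices at distance 0, 1, 2, ... from v."""
    counts = Counter(bfs_distances(adj, v).values())
    return tuple(counts[d] for d in range(max(counts) + 1))

def is_distance_degree_regular(adj):
    """True if every vertex has the same distance degree sequence."""
    return len({distance_degree_sequence(adj, v) for v in adj}) == 1

cycle6 = {i: [(i - 1) % 6, (i + 1) % 6] for i in range(6)}
path4 = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}

for name, graph in (("cycle6", cycle6), ("path4", path4)):
    print(name, eccentricities(graph), is_distance_degree_regular(graph))
```

The diameter is the largest entry of the eccentricity list, so two graphs can share a diameter while differing in the full list, as in the example discussed above.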
Another open question is how to define the information content of an infinite
graph independently of a defining sequence of finite graphs. Investigation of this
question could shed light on the properties of very large-scale networks.
Acknowledgments Research was sponsored by the US Army Research Laboratory and the UK
Ministry of Defence and was accomplished under Agreement Number W911NF-06-3-0001. The
views and conclusions contained in this document are those of the authors and should not be
interpreted as representing the official policies, either expressed or implied, of the US Army
Research Laboratory, the US Government, the UK Ministry of Defence, or the UK Government.
The US and UK Governments are authorized to reproduce and distribute reprints for Government
purposes notwithstanding any copyright notation hereon.
References
1. Balaban, A.T., Balaban, T.S.: New vertex invariants and topological indices of chemical graphs
based on information on distances. J. Math. Chem. 8, 383–397 (1991)
2. Bent, G., Dantressangle, P., Vyvyan, D., Mowshowitz, A., Mitsou, V.: A dynamic distributed
federated database. Proceedings of the Second Annual Conference of the International
Technology Alliance, Imperial College, London, September 2008
3. Bertz, S.H.: The first general index of molecular complexity. J. Am. Chem. Soc. 103,
3241–3243 (1981)
4. Bloom, G., Kennedy, J.W., Quintas, L.V.: Distance degree regular graphs. In: Chartrand, G.
(ed.) The Theory and Applications of Graphs, pp. 95–108. Wiley, New York (1981)
5. Bollobás, B., Riordan, O.: The diameter of a scale-free random graph. Combinatorica 24, 5–34
(2004)
6. Bonchev, D.: Information Theoretic Indices for Characterization of Chemical Structures.
Research Studies Press, Chichester, UK (1983)
7. Bonchev, D.: Complexity in Chemistry, Biology, and Ecology. Mathematical and Computa-
tional Chemistry series. Springer, New York (2005)
8. Bonchev, D., Trinajstic, N.: Information theory, distance matrix and molecular branching. J.
Chem. Phys. 67, 4517–4533 (1977)
9. Dehmer, M., Emmert-Streib, F.: Structural information content of chemical networks.
Zeitschrift für Naturforschung A 63a, 155–159 (2008)
10. Dehmer, M., Emmert-Streib, F., Mehler, A. (eds.): Towards an Information Theory of Complex
Networks: Statistical Methods and Applications. Springer/Birkhäuser, Berlin (2011)
11. Dehmer, M., Mowshowitz, A.: Inequalities for entropy-based measures of network information
content. Appl. Math. Comput. 215, 4263–4271 (2010)
12. Harary, F.: Graph Theory. Addison Wesley, Reading, MA (1969)
13. Hirata, H., Ulanowicz, R.E.: Information theoretical analysis of ecological networks. Int. J.
Syst. Sci. 15, 261–270 (1984)
14. Khinchin, A.I.: Mathematical Foundations of Information Theory. Dover Publications, New
York (1957)
15. Konstantinova, E.V., Skorobogatov, V.A., Vidyuk, M.V.: Applications of information theory in
chemical graph theory. Indian J. Chem. 42, 1227–1240 (2002)
16. Körner, J.: Coding of an information source having ambiguous alphabet and the entropy of
graphs. In: Transactions of the 6th Prague Conference on Information Theory, 411–425 (1973)
17. Mowshowitz, A.: Entropy and the complexity of graphs: I. An index of the relative complexity
of a graph. Bull. Math. Biophys. 30, 175–204 (1968)
18. Mowshowitz, A.: Entropy and the complexity of graphs: II. The information content of
digraphs and infinite graphs. Bull. Math. Biophys. 30, 225–240 (1968)
19. Mowshowitz, A., Mitsou, V., Bent, G.: Models of network growth by combination. Proceedings
of the Second Annual Conference of the International Technology Alliance, Imperial College,
London, September 2008
20. Mowshowitz, A., Mitsou, V.: Entropy, orbits and spectra of graphs. In: Dehmer, M. (ed.)
Analysis of Complex Networks: From Biology to Linguistics. Wiley-VCH, Weinheim (2009,
in press)
21. Rashevsky, N.: Life, information theory, and topology. Bull. Math. Biophys. 17, 229–235
(1955)
22. Skorobogatov, V.A., Dobrynin, A.A.: Metric analysis of graphs. MATCH 23, 105–151 (1988)
23. Sole, R.V., Valverde, S.: Information theory of complex networks: On evolution and architec-
tural constraints. Lect. Notes Phys. 650, 189–207 (2004)
24. Wasserman, S., Faust, K.: Social Network Analysis: Methods and Applications, Structural
Analysis in the Social Sciences. Cambridge University Press, Cambridge (1994)
Chapter 2
An Information-Theoretic Upper Bound
on Planar Graphs Using Well-Orderly Maps
Abstract This chapter deals with compressed coding of graphs. We focus on planar
graphs, a widely studied class of graphs. A planar graph is a graph that admits an
embedding in the plane without edge crossings. Planar maps (classes of embeddings
of a planar graph) are easier to study than planar graphs, but as a planar graph may
admit an exponential number of maps, they give little information on graphs. In
order to give an information-theoretic upper bound on planar graphs, we introduce
a definition of a quasi-canonical embedding for planar graphs: well-orderly maps.
This appears to be a useful tool to study and encode planar graphs. We present
upper bounds on the number of unlabeled¹ planar graphs and on the number of
edges in a random planar graph. We also present an algorithm to compute
well-orderly maps, which implies an efficient coding of planar graphs.
1 Introduction
In graph theory, a planar graph is a graph which can be embedded in the plane,
i.e., it can be drawn on the plane in such a way that its edges intersect only at their
endpoints. A planar graph drawn in the plane without edge intersections is called a
planar map or a planar embedding of the graph. The class of planar graphs is one of
the most studied classes of graphs.
¹ Nodes and edges are not assumed to be labeled.
N. Bonichon
LaBRI, University of Bordeaux, 351 Cours de la libération, 33405 Bordeaux, France
e-mail: bonichon@labri.fr
How much information can a simple planar graph with n nodes contain? The
question is closely related to the number of planar graphs. Counting the number
of (non-isomorphic) planar graphs with n nodes is a well-known and long-standing
unsolved graph-enumeration problem (cf. [24]). There is no known closed formula,
no known asymptotic estimate, and not even an asymptotic estimate of the logarithm
of this number. Any asymptotic estimate of the logarithm would give a bound on the
number of independent random bits needed to generate a planar graph uniformly at
random (but not necessarily in polynomial time).
Random generation of combinatorial objects is important for the average-case
complexity analysis of algorithms and for testing algorithms on typical instances.
Unlike random graphs (the Erdős–Rényi graph model), still little is known
about random planar graphs. Indeed, whether an edge can be added to a planar graph
depends heavily on the location of all previous edges. Random planar maps, i.e.,
plane embeddings of planar graphs, have been investigated more successfully.
Schaeffer [35] and then Banderier et al. [2] have shown how to generate in polynomial
time several planar map families, e.g., 3-connected planar maps. Unfortunately, such
generation does not give much information about random planar graphs because there
are many ways to embed a planar graph into the plane. On the positive side, some
families of planar graphs support efficient random generation: trees [1], maximal
outerplanar graphs [3, 14], and more recently labeled and unlabeled outerplanar graphs [4].
Besides the combinatorial aspect and random generation, much attention
is given in Computer Science to representing discrete objects efficiently. Efficiently
means that the representation is succinct, i.e., the storage of these objects uses few
bits, and that the time to compute such a representation is polynomial in their size.
Fast manipulation of the so-encoded objects and easy access to a part of the code
are also desirable properties. At least two application areas of high interest are
concerned with planar graph representation: Computer Graphics and Networking.
Surface discretization of a 3D object outputs a list of 3D coordinates and a set
of adjacency relations. In the case of convex objects, the set of adjacency relations
is an unlabeled planar graph. In general, small degree faces are used for surface
discretization, with triangle or quad meshes. Then, a compressor is applied to the
planar graph. Performance is expressed as the average number of bits per edge or
per node. It is evaluated on a benchmark of standard examples [21], due to
the lack of a “good” random planar graph generator or typical instance generator.
For example, King and Rossignac [22, 34] gave a triangulation compressor that
guarantees 3.67 bits per node, the best possible rate being $\log_2(256/27) \approx 3.24$
bits per node from Tutte’s enumerative formula [39].
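The 3.24 figure can be checked numerically. The sketch below assumes the usual closed form of Tutte's count, $2(4n+1)!/((n+1)!(3n+2)!)$ rooted triangulations with $n + 3$ nodes; that formula is not restated in this chapter, and the function names are illustrative. Rooting changes the count by at most a polynomial factor, so it should not affect the bits-per-node rate.

```python
from math import factorial, log2

def rooted_triangulations(n):
    """Rooted triangulations with n interior nodes (n + 3 nodes in total).

    Assumed closed form (Tutte): 2 (4n+1)! / ((n+1)! (3n+2)!).
    """
    return 2 * factorial(4 * n + 1) // (factorial(n + 1) * factorial(3 * n + 2))

def bits_per_node(n):
    """log2 of the count divided by the total number of nodes."""
    return log2(rooted_triangulations(n)) / (n + 3)

OPTIMAL_RATE = log2(256 / 27)  # limiting rate quoted in the text

print([rooted_triangulations(n) for n in range(5)])  # [1, 1, 3, 13, 68]
for n in (10, 100, 1000):
    print(n, round(bits_per_node(n), 3))
print("limit:", round(OPTIMAL_RATE, 4))
```

The per-node rate approaches the limit slowly from below, because the count also carries a polynomial factor in n.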
Routing table design for a network has been investigated in the case of planar
networks [15, 16, 26, 37]. The underlying graph of the network is preprocessed to
optimize routing tables, a data structure dedicated to each node in charge of finding
the next output port given the destination address of an incoming message. The
main objective is to minimize the size of the routing tables while maintaining routes
as short as possible. The strategy used by Gavoille and Hanusse [16] based on a
k-page embedding, and then improved by Lu [26] with orderly spanning trees,
demonstrates that a compact planar graph representation helps for the design of
compact routing tables, especially when shortest paths are required.
Osthus et al. [31] investigated triangulations containing any planar graph, and
they showed that there are no more than $n!\,2^{5.22n+o(n)}$ labeled planar graphs. Osthus
et al. [31] also showed that almost all labeled planar graphs have at most $2.56n$
edges, and that almost all unlabeled planar graphs have at most $2.69n$ edges.
A lower bound of $13n/7 \approx 1.85n$ has been obtained by Gerke and McDiarmid [17],
improving the $1.5n$ lower bound on the expected number of edges of [12]. Properties
of random planar graphs have also been investigated in [28].
Gimenez and Noy [18] show that the number of edges of a random labeled planar
graph is asymptotically normal, with mean $2.213n$ and variance $0.4303n$.
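To give a feel for these constants, the following sketch evaluates the limiting normal law for a given n and checks that the mean lies between the lower and upper edge bounds quoted above. Only the numerical coefficients come from the text; the constant and function names are illustrative.

```python
from math import sqrt

# Coefficients of n quoted in the text.
LOWER_EDGE_RATE = 13 / 7   # lower bound of Gerke and McDiarmid [17]
MEAN_EDGE_RATE = 2.213     # asymptotic mean [18]
VARIANCE_RATE = 0.4303     # asymptotic variance [18]
UPPER_EDGE_RATE = 2.56     # bound for almost all labeled planar graphs [31]

def edge_count_summary(n):
    """Return (mean, standard deviation) of the limiting normal law for n nodes."""
    return MEAN_EDGE_RATE * n, sqrt(VARIANCE_RATE * n)

mean, std = edge_count_summary(1000)
print(f"n = 1000: about {mean:.0f} edges, standard deviation about {std:.1f}")
```

The standard deviation grows like the square root of n, so the number of edges concentrates sharply around its mean.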
Unlike general graphs, labeled and unlabeled planar graphs do not have the same
growth rate (up to the $n!$ term), as proved in [28]. So upper bounds on labeled
planar graphs do not transfer to upper bounds on unlabeled planar graphs, but the
reverse is true.
Using generating function techniques, Gimenez and Noy [18] proved that the
number of labeled connected planar graphs grows as $n!\,2^{4.767n+O(\log n)}$. The number of
simple planar maps is asymptotic to $2^{5.098n+O(\log n)}$ (cf. the algebraic generating function
presented in [25]), providing an upper bound for unlabeled planar graphs.
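Read as bit counts, these two growth rates bracket the information content of an unlabeled connected planar graph. The following worked bound is a sketch: the symbols $u_n$, $\ell_n$, and $m_n$ are introduced here for illustration only, the lower bound uses the fact that a graph on n nodes has at most $n!$ distinct labelings, and the upper bound is the one stated in the text.

```latex
% Notation assumed for this sketch, all on n nodes:
%   u_n    = number of unlabeled connected planar graphs
%   \ell_n = number of labeled connected planar graphs
%   m_n    = number of simple planar maps
\[
  \frac{\ell_n}{n!} \;\le\; u_n \;\le\; m_n
  \quad\Longrightarrow\quad
  4.767\,n + O(\log n) \;\le\; \log_2 u_n \;\le\; 5.098\,n + O(\log n).
\]
```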
Let us sketch our technique. Since the useful combinatorial objects are
numerous, we first briefly describe in Fig. 2.1 the different steps toward the compact
coding of planar graphs. Our real starting point is a planar map, sometimes called a
planar embedding. To get a planar map from a planar graph, a well-known linear-time
algorithm can be used (see for instance [9]). Roughly speaking, we first present a
very particular embedding of a planar graph called a well-orderly map and show how
to encode it using combinatorial tools like bijective combinatorics and a specific
compression technique for binary strings.
[Fig. 2.1: diagram of the steps toward the compact coding of planar graphs. Labels: Planar
graph; Planar map; Super-triangulation = Minimal realizer = 3 well-orderly trees; Balanced
trees with n−2 inner vertices; 6 compressed binary strings. Column headings: "Graph theory
and graph algorithmic" and "Combinatorics and binary text compression".]
of the edges into three trees $(T_0, T_1, T_2)$ (see Schnyder’s trees [36]). In our case, the
partition has specific properties and corresponds to a minimal realizer. We also show
how to uniquely recover the three trees from such a super-triangulation. Different
properties of minimal realizers are useful, since the knowledge of two well-orderly
trees implies a canonical description of the third one and can be exploited to save bits.
At this point, the following properties are only given as an illustration:
• Every edge $(u, v)$ of S such that (1) $u$ is the parent of $v$ in $T_1$ and (2) $u$ is an inner
node in $T_2$ must belong to G. This significantly saves bits in the coding of MS
since many edges of G can be guessed from S.
• An extra property is that two nodes belonging to the same branch of $T_2$ have
the same parent in $T_1$ (a branch is a maximal set of related nodes obtained
in a clockwise depth-first search of the tree, and such that a node belongs to
only one branch at a time, see Sect. 3). This latter property greatly simplifies the
representation of S. Knowing $T_2$, the tree $T_1$ does not need to be fully represented:
one relevant edge per branch of $T_2$ is enough. As any tree of a realizer can be
deduced from the two others, the representation of S can be compacted in a very
efficient way, storing for instance $T_2$ and the relevant edges of $T_1$.
Combining such properties and the optimal coding of realizers using the bijection of
Poulalhon and Schaeffer [33] (see also Theorem 2), we get the encoding of
super-triangulations presented in [7].
Footnote 2: The original compressor runs in expected linear time. We give in this chapter a
simpler guaranteed linear-time construction with asymptotically the same performance.
In this chapter, we deal with simple (no loops and no multi-edges) and undirected
graphs. If we cut the plane along the edges, the remainder falls into connected
regions of the plane, called faces. Each plane graph has a unique unbounded face,
called the outerface. The boundary of a face is the set of incident edges. The interior
edges are the edges not incident to the boundary of the outerface, and similarly for
interior nodes. Precise definitions can be found for instance in [13, 30].
A triangulation is a plane embedding of a maximal planar graph, that is, a planar
graph with n nodes and $3n - 6$ edges. There is only one way (up to a continuous
transformation) to embed a maximal planar graph in the plane once the three nodes
lying on the outerface are chosen.
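The count $3n - 6$ follows from Euler's formula $n - m + f = 2$: in a triangulation every face has three edges and every edge borders two faces, so $3f = 2m$, hence $m = 3n - 6$ and $f = 2n - 4$. A minimal sketch checking these numbers on two standard maximal planar graphs (the helper names are illustrative, not from the chapter):

```python
from itertools import combinations

def triangulation_edges(n):
    """Number of edges of a maximal planar graph on n >= 3 nodes."""
    return 3 * n - 6

def triangulation_faces(n):
    """Number of faces (outerface included) of a triangulation on n >= 3 nodes."""
    return 2 * n - 4

# K4: every pair of the 4 nodes is adjacent.
k4_edges = list(combinations(range(4), 2))

# Octahedron: nodes 0..5, every pair adjacent except the antipodal pairs (i, i + 3).
octahedron_edges = [(u, v) for u, v in combinations(range(6), 2) if v - u != 3]

for n, edges in ((4, k4_edges), (6, octahedron_edges)):
    m, f = len(edges), triangulation_faces(n)
    assert m == triangulation_edges(n)
    assert n - m + f == 2  # Euler's formula
```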
Let T be a rooted spanning tree of a plane graph H . Two nodes are unrelated if
neither of them is an ancestor of the other in T . An edge of H is unrelated if its
endpoints are unrelated.
We introduce well-orderly trees, a special case of the orderly spanning trees of
Chiang, Lin, and Lu [8], referred to as simply orderly trees later. Let $v_1, \ldots, v_n$ be
the clockwise preordering of the nodes in T (nodes ordered by their first visit in a
clockwise traversal of the tree T). Recall that a node $v_i$ is orderly in H with respect
to T if the incident edges of $v_i$ in H form the following four blocks (each possibly
empty) in clockwise order around $v_i$ (see Fig. 2.2b):
• $B_P(v_i)$: the edge incident to the parent of $v_i$
• $B_<(v_i)$: unrelated edges incident to nodes $v_j$ with $j < i$
• $B_C(v_i)$: edges incident to the children of $v_i$
• $B_>(v_i)$: unrelated edges incident to nodes $v_j$ with $j > i$
A node $v_i$ is well orderly in H with respect to T if it is orderly, and if:
• The clockwise first edge $(v_i, v_j) \in B_>(v_i)$, if it exists, is such that the parent of
$v_j$ is an ancestor of $v_i$ (in T).
In other words, if $(v_i, v_j)$ is the first edge of $B_>(v_i)$, then the parent of $v_j$ is an
ancestor of $v_i$ in T.
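These definitions can be tested directly on small examples. The sketch below is a naive check of the definitions, not the linear-time algorithm of the chapter. It assumes a hypothetical input format: `rot[v]` lists the neighbors of v in clockwise order (for the root, starting from the outerface), and `parent[v]` gives the parent of v in T (`None` for the root). All function names are illustrative.

```python
def from_parent(rot, parent, v):
    """Neighbors of v in clockwise order, starting at the parent edge (root: as stored)."""
    nbrs = list(rot[v])
    if parent[v] is None:
        return nbrs
    k = nbrs.index(parent[v])
    return nbrs[k:] + nbrs[:k]

def clockwise_preorder(rot, parent, root):
    """Nodes of T ordered by first visit in a clockwise traversal from the root."""
    order = []
    def visit(v):
        order.append(v)
        for w in from_parent(rot, parent, v):
            if parent[w] == v:  # w is a child of v in T
                visit(w)
    visit(root)
    return order

def ancestors(parent, v):
    """Proper ancestors of v in T."""
    found = set()
    while parent[v] is not None:
        v = parent[v]
        found.add(v)
    return found

def blocks(rot, parent, index, v):
    """Label each neighbor of v with P, <, C or >; None if a non-tree edge joins related nodes."""
    labels = []
    for w in from_parent(rot, parent, v):
        if w == parent[v]:
            labels.append("P")
        elif parent[w] == v:
            labels.append("C")
        elif w in ancestors(parent, v) or v in ancestors(parent, w):
            return None
        else:
            labels.append("<" if index[w] < index[v] else ">")
    return labels

def is_orderly(rot, parent, index, v):
    """True if the edges around v form the blocks B_P, B_<, B_C, B_> in this order."""
    labels = blocks(rot, parent, index, v)
    if labels is None:
        return False
    rank = {"P": 0, "<": 1, "C": 2, ">": 3}
    ranks = [rank[x] for x in labels]
    return ranks == sorted(ranks)

def is_well_orderly(rot, parent, index, v):
    """Orderly, and the first edge of B_>(v) leads to a node whose parent is an ancestor of v."""
    if not is_orderly(rot, parent, index, v):
        return False
    for w, label in zip(from_parent(rot, parent, v), blocks(rot, parent, index, v)):
        if label == ">":  # clockwise first edge of B_>(v)
            return parent[w] in ancestors(parent, v)
    return True

# A 4-cycle r-a-d-c with tree edges r-a, r-c, c-d and one non-tree edge a-d.
rot = {"r": ["a", "c"], "a": ["r", "d"], "c": ["r", "d"], "d": ["c", "a"]}
parent = {"r": None, "a": "r", "c": "r", "d": "c"}
order = clockwise_preorder(rot, parent, "r")
index = {v: i for i, v in enumerate(order)}
print(order)
print({v: (is_orderly(rot, parent, index, v), is_well_orderly(rot, parent, index, v))
       for v in order})
```

In this toy example every node is orderly, but node a is not well orderly: the first edge of its last block leads to d, whose parent c is not an ancestor of a.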
24 N. Bonichon et al.
Fig. 2.3 Two realizers for a triangulation, panels (a) and (b). The tree $\overline{T}_0$ rooted in $r_0$
(the tree with bold edges augmented with the edges $(r_0, r_1)$ and $(r_0, r_2)$) is well orderly in (b),
and simply orderly in (a) (the node $v$ is not well orderly: $(v, w)$ is the clockwise first edge of
$B_>(v)$ and the parent of $t$ is not an ancestor of $v$). The clockwise preordering of $\overline{T}_0$ in (a)
is $r_0, r_2, v, u, t, w, r_1$
Fig. 2.4 A planar graph G (a), a well-orderly map of G rooted at $v_1$ with its well-orderly tree
(bold edges) (b), and a super-triangulation of G (c) (dotted edges are non-edges of G)
In order to prove Theorem 1, we need the next three lemmas. The proofs of these
lemmas are given after the proof of Theorem 1.
Lemma 1. Every well-orderly map rooted in some node v has a unique well-orderly
tree of root v.
Lemma 2. Let G be a connected planar graph, and let v be any node of G. Then G
has a well-orderly map of root v. Moreover, well-orderly trees and the well-orderly
map can be computed in linear time.
In [8], a result similar to Lemma 2 about simply orderly trees and embeddings
is proved. However, the extra condition restricts the choice of the embedding for
the input planar graph much further and leads to the uniqueness of the tree
(Lemma 1). In the case of simply orderly embeddings, several orderly trees may
exist (cf. Fig. 2.3, where both orderly trees $\overline{T}_0$ span the same triangulation). Actually,
the uniqueness also concerns the way to triangulate the faces of well-orderly maps,
thanks to the next lemma.
Lemma 3. Let T be the well-orderly tree of H rooted in some node $r_0$, and assume
that T has at least two leaves. Let $r_2$ and $r_1$ be the clockwise first and last leaves
of T, respectively. Then, there is a unique super-triangulation $(T_0, T_1, T_2)$ of the
underlying graph of H, preserving the embedding H, and such that each $T_i$ has
root $r_i$. Moreover, $T_0 = T \setminus \{r_1, r_2\}$, and both $T_0$ and the super-triangulation are
computable in linear time.
First of all, let us show that Lemmas 1, 2, and 3 imply Theorem 1.
Proof of Theorem 1. Consider a connected planar graph G with at least three nodes,
and let v be any node of G with the only constraint that if G is a path, then v is
chosen to be of degree two (this is feasible since G has at least three nodes). Thanks
to Lemma 2, one can compute in linear time a well-orderly map H of G and a
well-orderly tree T rooted in v. Let us show that T has at least two leaves $r_1, r_2$ lying on
the outerface of H, with $r_2$ traversed before $r_1$ in a clockwise preordering of T.
We show that T cannot be a chain, and thus has a node with at least two children
(and thus has two leaves). If G is a path, then T rooted in a node of degree two is
not a chain. Assume that G is not a path, but T is a chain. Then there exists an edge
of G that is not in T. However, all pairs of nodes of a chain are related, and in an
orderly tree every edge of G between related nodes is a tree edge, so this edge must
belong to T, a contradiction. Therefore, T is not a chain.
Lemma 3 can therefore be applied, and one can compute a super-triangulation
for G in linear time. □
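The role of the root in the path case can be illustrated with a short sketch. It assumes the tree is given as a hypothetical `children` map listing each node's children in clockwise order (the function names are illustrative). It returns the clockwise first and last leaves, which play the roles of $r_2$ and $r_1$ in Lemma 3, and reports when the tree is a chain.

```python
def leaves_in_preorder(children, root):
    """Leaves of the tree in clockwise preorder; children[v] lists v's children clockwise."""
    leaves, stack = [], [root]
    while stack:
        v = stack.pop()
        kids = children.get(v, [])
        if not kids:
            leaves.append(v)
        stack.extend(reversed(kids))  # the clockwise first child is popped first
    return leaves

def first_and_last_leaves(children, root):
    """Return (r2, r1), or None when the tree is a chain and has a single leaf."""
    leaves = leaves_in_preorder(children, root)
    if len(leaves) < 2:
        return None
    return leaves[0], leaves[-1]

# The path a - b - c, rooted at its degree-two node and then at an endpoint.
print(first_and_last_leaves({"b": ["a", "c"]}, "b"))
print(first_and_last_leaves({"a": ["b"], "b": ["c"]}, "a"))
```

Rooting the path at its middle node yields two leaves, whereas rooting it at an endpoint yields a chain, which is why the proof constrains the choice of v when G is a path.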
Proof of Lemma 1. Assume that H has two well-orderly trees $T, T'$ rooted in $v$. Let
$v_1, \ldots, v_n$ (resp. $v'_1, \ldots, v'_n$) be the clockwise preordering of the nodes of $T$ (resp.
$T'$). Let $v_i$ be the node such that the neighbors of $v_i$ in $T$ and in $T'$ differ, and such
that $i$ is minimum. We have $v_t = v'_t$ for all $t \le i$, and $B_C(v_i) \ne B'_C(v_i)$, where
$B'_C(v_i)$ denotes the children edge block around $v_i$ in $T'$.
W.l.o.g. assume $|B_C(v_i)| \le |B'_C(v_i)|$ (the symmetric case is proved by exchanging
the roles of $T$ and $T'$). Note that $B_<(v_i) = B_>(v_i) = \emptyset$ is impossible; otherwise
$B_C(v_i)$ would consist of all the neighbors of $v_i$ (except possibly the parent of $v_i$),
and $|B_C(v_i)| \le |B'_C(v_i)|$ and $B_C(v_i) \ne B'_C(v_i)$ would be incompatible. Let $e_1$
(resp. $e_2$) be the clockwise first (resp. last) edge of $B_C(v_i)$. Let $e$ be an arbitrary
edge of $B'_C(v_i)$. In the following, $e_1 \le e$ means either $e_1 = e$, or $e_1$ is clockwise
before $e$ around $v_i$.
Let us show that $e_1 \le e$. This is clearly true if $B_<(v_i) = \emptyset$. If $B_<(v_i) \ne \emptyset$, then
consider any edge $(v_i, v_h) \in B_<(v_i)$. Then, $(v_i, v_h) \notin B'_C(v_i)$. Indeed, as $h < i$, the
path from $v_h$ to $v_i$ in $T$ exists also in $T'$, and the edge $(v_i, v_h)$ of $T'$ would create a
cycle in $T'$. Thus, $e_1 \le e$.