Article info

Article history:
Received 1 September 2010
Received in revised form 30 June 2011
Accepted 22 July 2011
Available online 7 August 2011

Keywords:
SIS model
Markov chain
Poisson process

Abstract

Over the last decade considerable research effort has been invested in an attempt to understand the dynamics of viruses as they spread through complex networks, be they the networks in human population, computers or otherwise. The efforts have contributed to an understanding of epidemic behavior in random networks, but were generally unable to accommodate specific nonrandom features of the network's actual topology. Recently, though still in the context of the mean field theory, Chakrabarti et al. (2008) proposed a model that intended to take into account the graph's specific topology and solve a longstanding problem regarding epidemic thresholds in both random and nonrandom networks. Here we review previous theoretical work dealing with this problem (usually based on mean field approximations) and show with several relevant and concrete counter examples that results to date break down for nonrandom topologies.

© 2011 Elsevier Ltd. All rights reserved.
O. Givan et al. / Journal of Theoretical Biology 288 (2011) 21–28
Corresponding author: L. Stone.

1. Introduction

Under what conditions will a virus or other infectious agent spread in a complex population network? This question has vexed epidemiologists, mathematicians and computer scientists alike for many decades (Anderson and May, 1991; May and Lloyd, 2001; Pastor-Satorras and Vespignani, 2001; Madar et al., 2004; Aparacio and Pascual, 2007; Berchenko et al., 2009). An early result arising from epidemic modeling is based on the so-called reproductive number $R_0$, the number of secondary infections a typical infected individual is able to generate (Anderson and May, 1991). If a typical infected individual is able to infect on average more than one other member of the population then $R_0 > 1$. In that case the virus is able to reproduce itself and trigger an epidemic in the population, allowing it to persist in time for an extensive period. In contrast, if the reproductive number is below unity then $R_0 < 1$, and the disease will rapidly die out in the population and an infection free equilibrium will be reached. This threshold result assumes that the population is homogeneous and randomly mixing, whereby an infected individual is equally likely to come into contact and infect any susceptible present, an assumption that has many limitations.

This result has been generalized to heterogeneous populations in which some individuals have more contacts than others. Historically, most notable are the studies of Dietz (1980) and May and Anderson (1988, 1984). For heterogeneous networks, the relative fraction of nodes having different degrees is referred to as the degree distribution. It is just the probability $P_k$ that a randomly chosen node in the network has degree $k$, i.e., $P_k = \mathrm{prob}(\deg(\text{node } i) = k)$. The mean degree of the network is $\langle k \rangle = \sum_k k P_k$ and the variance of the degree distribution is $\mathrm{var}(k) = \sigma^2 = \langle k^2 \rangle - \langle k \rangle^2$. Consider a large population and let $d_k$ be the number of individuals in the population that have $k$ contacts, with $\sum_k d_k = L$. Then $(d_k / L)$ estimates the degree distribution $P_k$.

The degree of heterogeneity in the population's contact structure may be gauged by the Coefficient of Variation $CV = \sigma / \langle k \rangle$. Equivalently one can write $CV^2 = (\langle k^2 \rangle / \langle k \rangle^2) - 1$. The population is assumed to be randomly mixed subject to the constraint that the degree distribution is always preserved. For such a heterogeneous population, it has been shown that (Anderson and May, 1991; Dietz, 1980; May and Anderson, 1988; May and Anderson, 1984):

$$R_0 = R\,(1 + CV^2) \tag{1}$$

where $R$ is the reproductive number for the equivalent homogeneous population where all individuals have $\langle k \rangle$ contacts and thus $CV = 0$.

Eq. (1) will be referred to as the Dietz–May formula since both authors (May and Anderson, 1988; May and Anderson, 1984) were responsible for developing the formulation and applying it in practice. As before $R_0 > 1$ implies that an epidemic will ensue, while $R_0 < 1$ implies that the virus rapidly dies out and an infection free equilibrium is attained. These concepts have proved useful for studying contact networks with power-law distributions that
might typify computer networks and in some cases be appropriate for studying the transmission of sexually transmitted diseases (Pastor-Satorras and Vespignani, 2001; Lloyd and May, 2001). Measuring $R_0$ is often quite complicated, especially in complex networks, as Aparacio and Pascual (2007) have discussed. Newman (2002) and Cohen et al. (2002) have shown how percolation ideas and generating function methods can be used to provide exact solutions of epidemic models on simple networks and on bi-partite graphs. Their key epidemic threshold result is nevertheless the same as Eq. (1), although obtained by different methods.

Considerable work has been invested in exploring these issues in the biology, physics and mathematics literature. Concepts taken from percolation theory continue to play a major role in current epidemic network research (Madar et al., 2004; Cohen et al., 2002; Kenah and Robins, 2007; Parshani et al., 2010). Moreover, there are many challenging open problems (see the interesting inaugural article of Durrett, 2010).

In recent years, there has been considerable interest in understanding the way in which the detailed network structure of the population, or its "topology," might affect the persistence threshold. That is, does the exact network structure, not just its degree distribution, give extra information from which it is possible to learn more about the spread of an epidemic? Since many real networks are non-random and sometimes highly clustered, the motivation to explore beyond random models is quite justified.

Chakrabarti, Wang and Faloutsos (Chakrabarti et al., 2008) introduced a new model, referred to here as the CWF model, which intended to identify exactly how a population's network structure controls the epidemic threshold. A very general epidemic threshold condition for any arbitrary network was derived. This condition, which is based on the network's topology within a mean field approximation, will be elaborated shortly.

In this paper we show that many of the previous studies contribute to our understanding of epidemic thresholds for random networks; for nonrandom network topologies (even regular graphs), however, accurate predictions of the epidemic threshold are hard to come by. We explore the mean field approximation formulated by CWF and show that its predictions often break down for nonrandom networks. This is because mean-field approximations fail to take into account the correlations in the state of indirect neighbors. Moreover, by mapping one model to another, we are able to retrieve known theoretical literature results (based on percolation theory) that contradict the CWF general threshold.

2. The CWF model

CWF (Chakrabarti et al., 2008) assume that the population is divided into two classes: individuals that are Susceptible (S) and individuals that are Infected (I). The model has the classical SIS structure whereby susceptible individuals may become infected upon contact with an infected individual. After contracting the disease an individual recovers after some fixed time period and becomes susceptible once again, thereby closing the SIS loop.

As each individual can be in one of the two states, for a complex network of $N$ individuals there are $2^N$ possible different states the population may be found in. It is appropriate to formulate the model in terms of a Markov chain, but this requires information specifying the transition probabilities between each of the possible states. In this formulation states correspond to particular configurations of the population network, with the configuration at each time step dependent on the former time step only. However, it is not a simple matter to determine the probabilities of the $2^N \times 2^N$ transition matrix, which in any case is impractical to work with even when $N$ is modestly large, let alone of the order of millions of individuals as is appropriate for large cities. As such, CWF developed a method to approximate the Markov chain model.

In more detail they consider a population network, and define an individual's neighbors as all members of the population he or she can directly contact and transmit the disease to. Set $\beta$ as the probability that an infected individual/node will infect a susceptible neighboring node in the network, and let the probability that node-$i$ is infected at time $t$ be given by $p_{i,t}$. Over one time-step, the probability that node-$i$ will not receive any infections from its neighbors is, according to CWF, given by

$$\zeta_{i,t} \cong \prod_{j \in M_i} \left[(1-\beta)\,p_{j,t-1} + (1 - p_{j,t-1})\right] = \prod_{j \in M_i} \left[1 - \beta\,p_{j,t-1}\right] \tag{2}$$

where $M_i$ is the set of all neighbors of node-$i$. Note that Eq. (2) is exact only when the infection probabilities $p_{j,t-1}$ of the neighboring nodes are assumed to be independent of each other. This "independence assumption" is of great importance and will be dealt with in detail in what follows. Thus, according to Chakrabarti et al. (2008) the probability that node-$i$ is healthy at time $t$ is given by

$$1 - p_{i,t} = (1 - p_{i,t-1})\,\zeta_{i,t} + \delta\,p_{i,t-1}\,\zeta_{i,t} \tag{3}$$

where $\delta$ is the probability that an infected node will recover at time-step $t$. Note that since recovery is geometrically distributed, the mean infection time is $1/\delta$. This last equation states that node-$i$ is healthy at time $t$ if it did not receive infections from its neighbors at $t$ and either node-$i$ was uninfected at time step $t-1$, or was infected at $t-1$ but was cured at $t$ (Chakrabarti et al., 2008). (This last term in Eq. (3), which appears in the CWF model (Chakrabarti et al., 2008), is problematical as we explain in the discussion. It can however be dropped without affecting the results of the following stability analysis.) Combining Eqs. (2) and (3) yields the CWF model

$$p_{i,t} = 1 - \left[1 + (\delta - 1)\,p_{i,t-1}\right] \prod_{j \in M_i} (1 - \beta\,p_{j,t-1}) \tag{4}$$

It is clear that the model has an infection free equilibrium in which $p_i^* = 0 \;\forall i$. (Here, the star notation indicates a state of equilibrium.) We now proceed to examine this equilibrium's local stability. Using vector notation, close to the equilibrium, Eq. (4) may be approximated as

$$\vec{p}_t = \left((1-\delta)\,I + \beta A\right) \cdot \vec{p}_{t-1} \tag{5}$$

where $I$ is the identity matrix and $A$ is the adjacency matrix of binary entries 1, 0 representing the connectivity between the nodes. Thus, the infection free equilibrium ($p_i^* = 0 \;\forall i$) is locally stable only if

$$(1-\delta) + \beta\rho < 1 \tag{6}$$

where $\rho$ is the spectral radius of the matrix $A$. This is because the Perron–Frobenius theorem ensures that if $A$ is a nonnegative, irreducible matrix then one of its eigenvalues is real, positive and greater than or equal to (in absolute value) all other eigenvalues (Horn and Johnson, 1985). This eigenvalue is the spectral radius $\rho$.

In terms of the reproductive number, the infection free equilibrium is locally stable if

$$R_0 = \frac{\beta}{\delta}\,\rho < 1 \tag{9}$$

The reproductive number $R_0$ has a simple interpretation. Returning to Eq. (6) we see that if $\vec{p}_0$ is an eigenvector corresponding to eigenvalue $\rho$ of $A$, the expected number of newly infected individuals in the next generation $\vec{p}_1$ is given by $\beta\rho$, while the expected number of recovered individuals is $\delta$. Since the mean infectivity time is $1/\delta$, then $(\beta/\delta)\rho$ should be interpreted as the total number of new infections generated in a single time step multiplied by the actual infectivity period of the disease.
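The mean-field map of Eq. (4) and the threshold quantity of Eq. (9) can be illustrated numerically. The following is a minimal sketch, not the authors' code: the ring graph, the parameter values and all function names (`ring_adjacency`, `cwf_step`, `cwf_iterate`, `cwf_r0`) are assumptions chosen for illustration. It iterates only the deterministic map, so it says nothing about the behavior of the underlying stochastic process.

```python
import numpy as np


def ring_adjacency(n):
    """Adjacency matrix of a ring: node i is linked to nodes i-1 and i+1 (illustrative graph)."""
    adj = np.zeros((n, n))
    for i in range(n):
        adj[i, (i + 1) % n] = 1.0
        adj[(i + 1) % n, i] = 1.0
    return adj


def cwf_step(p, adj, beta, delta):
    """One application of Eq. (4) to the vector of infection probabilities p."""
    # zeta_i = product over neighbours j of (1 - beta * p_j); non-neighbours contribute a factor 1.
    zeta = np.prod(np.where(adj > 0, 1.0 - beta * p[None, :], 1.0), axis=1)
    return 1.0 - (1.0 + (delta - 1.0) * p) * zeta


def cwf_iterate(p0, adj, beta, delta, steps):
    """Iterate the mean-field map for a fixed number of time steps."""
    p = np.array(p0, dtype=float)
    for _ in range(steps):
        p = cwf_step(p, adj, beta, delta)
    return p


def cwf_r0(adj, beta, delta):
    """R0 = (beta / delta) * rho as in Eq. (9), with rho the spectral radius of A."""
    rho = np.max(np.abs(np.linalg.eigvalsh(adj)))
    return beta / delta * rho


# Hypothetical parameters: one value of beta below and one above the mean-field threshold.
adj = ring_adjacency(10)
p0 = np.full(10, 0.1)
for beta in (0.1, 0.4):
    p_final = cwf_iterate(p0, adj, beta, delta=0.5, steps=200)
    print(f"beta={beta}: R0={cwf_r0(adj, beta, 0.5):.2f}, mean p={p_final.mean():.4f}")
```

In this sketch the iterates decay to the infection free equilibrium when $(\beta/\delta)\rho < 1$ and settle at a positive value otherwise, which is the local stability statement of Eq. (6) for the deterministic map.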
Hence $R_0$ is simply the mean number of secondary infections over the infectious period of the disease.

This conforms closely with the conventional view of $R_0$ as the number of secondary cases that one infected case can produce when placed in a wholly susceptible population. If it can infect more than one individual on average ($R_0 > 1$) an epidemic will ensue; otherwise the infection will rapidly die out as the infection free equilibrium is reached. In what follows, Eq. (9) will be referred to as the CWF threshold criterion, since stability of the infection free equilibrium depends on whether $R_0$ is greater or less than unity. In this way $R_0$, as given by Eq. (9), may be used as a reference frame for testing the CWF threshold.

It should be pointed out that the above analysis concerns the underlying deterministic mean-field model presented by CWF, and this raises two issues. First, for the full stochastic model, which the mean-field model is supposed to mimic, one has to take into account the stochastic effects. In particular, if $R_0 > 1$ then demographic stochasticity at the initiation of an epidemic, when infectives are in small numbers, can prevent the epidemic from triggering. This is the "stochastic epidemic theorem" (Renshaw, 1991): even though $R_0 > 1$ there is a finite probability that the epidemic will not trigger. However, if $R_0 < 1$ a major epidemic cannot occur.

3. Known epidemic thresholds for random networks

We first consider random networks making use of the results from Furedi and Komlos (1981). The latter authors studied random, symmetric, $N \times N$ matrices in which the elements $a_{ij}$ are identically distributed having the same mean $\mu$ and variance $\sigma^2$. For such matrices the largest eigenvalue may be approximated by

$$\rho = \frac{\sum_{i,j} a_{i,j}}{N} + \frac{\sigma^2}{\mu} + O\!\left(\frac{1}{\sqrt{N}}\right) \tag{10}$$

Consider then Erdős–Rényi networks, which comprise $N$ nodes with a probability $p$ of having an edge between any pair of nodes. Thus, $\langle a_{ij} \rangle = \mu = p$ and $\mathrm{var}(a_{ij}) = \sigma^2 = p(1-p)$. Therefore Eq. (10) may be rewritten as

$$\rho = \frac{\sum_{i,j} a_{i,j}}{N} + \frac{\sigma^2}{\mu} + O\!\left(\frac{1}{\sqrt{N}}\right) \cong Np + \frac{p(1-p)}{p} = (N-1)p + 1 = \langle k \rangle + 1 - p$$

for large $N$.

Hence, for an Erdős–Rényi network, the CWF threshold is based on $R_0 = (\beta/\delta)(\langle k \rangle + 1 - p)$. This coincides with the work of Dietz and May who, as we saw, argue that

$$R_0 = R\,(1 + CV^2) = \frac{\beta N p}{\delta}\left(1 + \frac{Np(1-p)}{N^2 p^2}\right) = \frac{\beta}{\delta}\left((N-1)p + 1\right) = \frac{\beta}{\delta}\left(\langle k \rangle + 1 - p\right)$$

It is of interest to examine regular random graphs in which each node has the same fixed number of edges $k$, but the edges are connected randomly between nodes. A simple calculation shows that the spectral radius of the adjacency matrix associated with any regular graph, random or otherwise, must be $\rho = k$ (Restrepo et al., 2007). Thus the CWF threshold for a regular random network occurs at the point where $R_0 = (\beta/\delta)k$ is unity. This threshold is in agreement with the Dietz–May formula (taking $CV = 0$) and was deduced also by Kephart and White (1991).

Results are also available for the more general case of random networks having arbitrary degree distribution $d_k$. Chung et al. (2003) have shown that the spectral radius of the adjacency matrix associated with these networks is given by

$$\rho = \frac{\langle k^2 \rangle}{\langle k \rangle} = \langle k \rangle\,(1 + CV^2)$$

Thus the threshold condition for local stability as given by Eq. (9) becomes

$$R\,(1 + CV^2) < 1$$

which is the Dietz–May formula given in Eq. (1).

4. Simulations of random networks

We tested the above theoretical results by numerically simulating the spread of epidemics on Erdős–Rényi networks ($N = 50{,}000$) and regular random graphs ($N = 100{,}000$). For each network studied, 1% of the nodes were randomly chosen and initially infected. Simulation then proceeded in steps of unit time increments. During each time step, an infected node was able to infect each of its neighbors with probability $\beta$. In addition, every infected node recovered with probability $\delta$. In the case of $\delta = 1$, infected nodes recovered in exactly one time-step. An infection attempt on an already infected node had no effect; however if a node recovers, it can be infected by its neighbors within the same time step (as simulated by CWF in Chakrabarti et al., 2008; this will be further discussed in the discussion).

Simulations were run for 50,000 time steps and were repeated 100 times with different initial conditions, for different values of $R_0 = (\beta/\delta)\rho$.

Fig. 1 plots the proportion of infected nodes (i.e., the number of nodes infected divided by the total population $N$) at equilibrium as a function of $R_0$. One sees the presence of an epidemic threshold at $R_0 = 0.99$, while the CWF prediction is $R_0 = 1$ (see Eq. (9) above). The figure makes clear that the CWF threshold formula holds for both random networks and regular random networks, although the result has been known for decades in this context as given by the Dietz–May formula. We thus understand that the true importance of the CWF threshold formula concerns nonrandom graphs as treated in detail below.

5. Nonrandom graphs

5.1. One-dimensional chain

Consider regular graphs in which each node has exactly two neighbors. This forms a topology often referred to as a "one-dimensional chain" whereby each node-$i$ is connected to node $i-1$ on the left and node $i+1$ on the right, for $i = 1 \ldots N$ (Fig. 2a).

For the particular case of a one-dimensional chain we show that it is possible to theoretically determine the threshold via percolation theory. To achieve this we first have to show that the propagation of a virus in an infinite one-dimensional chain, where

Fig. 1. (a) Random ER graph with average connectivity degree $\langle d \rangle = 4$. (b) Regular random graph with fixed connectivity degree $\langle d \rangle = 6$. The proportion of infected nodes at equilibrium is plotted for various values of $R_0 = (\beta/\delta)\rho$ where $\delta = 1$ is fixed and $R_0$ is determined by $\beta$. The CWF threshold prediction for the topology of the graph is simply $R_0 = 1$ in each figure. The simulated thresholds are in good agreement with the predictions.
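The simulation procedure of Section 4 and the Erdős–Rényi spectral radius estimate $\rho \cong \langle k \rangle + 1 - p$ can be sketched at a much smaller scale. This is an illustrative sketch under stated assumptions, not the code used for Fig. 1: the network size, number of steps, initial fraction, seed and function names are all hypothetical, and the per-neighbor infection attempts are sampled through the equivalent probability $1 - (1-\beta)^m$ of at least one success among $m$ infected neighbors.

```python
import numpy as np


def er_adjacency(n, p, rng):
    """Symmetric 0/1 adjacency matrix of an Erdős–Rényi graph without self-loops."""
    upper = np.triu(rng.random((n, n)) < p, k=1)
    return (upper | upper.T).astype(float)


def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric matrix."""
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))


def simulate_sis(adj, beta, delta, steps, init_frac, rng):
    """Discrete-time SIS simulation; returns the final fraction of infected nodes."""
    n = adj.shape[0]
    infected = np.zeros(n, dtype=bool)
    n_init = max(1, int(round(init_frac * n)))
    infected[rng.choice(n, size=n_init, replace=False)] = True
    for _ in range(steps):
        # Number of infected neighbours of every node at the start of the step.
        n_inf_nb = adj @ infected.astype(float)
        # A node is hit if at least one infected neighbour transmits (probability beta each).
        hit = rng.random(n) < 1.0 - (1.0 - beta) ** n_inf_nb
        # Every infected node recovers with probability delta.
        recovered = infected & (rng.random(n) < delta)
        # A node that recovers can be re-infected within the same step.
        infected = (infected & ~recovered) | hit
    return float(infected.mean())


# Hypothetical small-scale run (the paper uses N = 50,000 and 50,000 time steps).
rng = np.random.default_rng(0)
n, p = 400, 0.05
adj = er_adjacency(n, p, rng)
rho = spectral_radius(adj)
print(f"rho = {rho:.2f}, (N-1)p + 1 - p = {(n - 1) * p + 1 - p:.2f}")
for r0 in (0.5, 3.0):
    frac = simulate_sis(adj, beta=r0 / rho, delta=1.0, steps=200, init_frac=0.05, rng=rng)
    print(f"R0 = {r0}: final infected fraction = {frac:.3f}")
```

Values of $R_0$ far from unity are used here because locating the threshold itself would require the larger networks, longer runs and repeated realizations described in the text.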
Fig. 3. (a) Regular graph $N = 8$, $d = 4$. The nodes are marked by black circles. (b) The proportion of infected nodes is plotted for various values of $R_0 = (\beta/\delta)\rho$ where $\delta = 1$ is fixed and $R_0$ is determined by $\beta$. The CWF threshold prediction for the topology of the graph is $R_0 = 1$. The simulated threshold is $R_0 = 1.23$.

$$\zeta_{i,t} = \sum_{k=1}^{2^N} \left( \prod_{j \in M_i} (1 - \beta\,n_{j,k}) \right) P_{k,t-1} \tag{13}$$

The so-called independence assumption is a critical assumption used in deriving the CWF model. It assumes that the probabilities of the ith node.

Fig. 5. Joint infectivity products $\langle n_1 n_2 \rangle$ and $\langle n_1 \rangle \langle n_2 \rangle$ calculated for nodes 1 and 2 are plotted versus time for (a) regular random graph and (b) regular nonrandom graph. The differences were measured after the system equilibrates.
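and the probability the disease will survive to the $(2j+1)$th time step but will not survive to the $2(j+1)$th time step is

$$S_{2j+1} = L_{2j}\left( \binom{N}{N}\beta^N (1-\beta)^N + \binom{N}{N-1}\beta^{N-1}(1-\beta)(1-\beta)^{N-1} + \cdots + \binom{N}{1}\beta^{1}(1-\beta)^{N-1}(1-\beta)^{1} \right) = L_{2j}\,(1-\beta)^N \sum_{k=1}^{N} \binom{N}{k} \beta^k\, 1^{N-k} = L_{2j}\,(1-\beta)^N \left[(1+\beta)^N - 1\right] \tag{A.3}$$

and hence the expected number of time steps the disease will survive is

$$\langle k \rangle = \sum_{k=0}^{\infty} 2k\,S_{2k} + \sum_{k=0}^{\infty} (2k+1)\,S_{2k+1} \tag{A.4}$$

By substituting the expressions for $S_{2k+1}$ and $S_{2k}$, (A.4) turns into

$$\langle k \rangle = (1-\beta)^N \left[ \sum_{k=0}^{\infty} (2k\,L_{2k}) + \left[(1+\beta)^N - 1\right] \sum_{k=0}^{\infty} \left((2k+1)\,L_{2k}\right) \right] = (1-\beta)^N \left[ 2(1+\beta)^N \sum_{k=0}^{\infty} k\,L_{2k} + \left((1+\beta)^N - 1\right) \sum_{k=0}^{\infty} L_{2k} \right] \tag{A.5}$$

Since $\sum_{k=0}^{\infty} L_{2k}$ is a geometric series

$$\sum_{k=0}^{\infty} L_{2k} = \sum_{k=0}^{\infty} \left(1 - (1-\beta^2)^N\right)^k = \frac{1}{1 - 1 + (1-\beta^2)^N} = \frac{1}{(1-\beta^2)^N}$$

$$\sum_{k=0}^{\infty} k\,L_{2k} = \sum_{k=0}^{\infty} k \left(1 - (1-\beta^2)^N\right)^k = \frac{1}{\ln\left[1 - (1-\beta^2)^N\right]} \left. \frac{\partial}{\partial a} \sum_{k=0}^{\infty} \left(1 - (1-\beta^2)^N\right)^{ak} \right|_{a=1} = \frac{1}{\ln\left[1 - (1-\beta^2)^N\right]} \left. \frac{\partial}{\partial a} \left[ \frac{1}{1 - \left[1 - (1-\beta^2)^N\right]^a} \right] \right|_{a=1} = \left. \frac{\left[1 - (1-\beta^2)^N\right]^a}{\left(1 - \left[1 - (1-\beta^2)^N\right]^a\right)^2} \right|_{a=1} = \frac{1 - (1-\beta^2)^N}{(1-\beta^2)^{2N}}$$

Now, (A.5) can be written as

$$\langle k \rangle = (1-\beta)^N \left[ 2(1+\beta)^N\, \frac{1 - (1-\beta^2)^N}{(1-\beta^2)^{2N}} + \left((1+\beta)^N - 1\right) \frac{1}{(1-\beta^2)^N} \right]$$

$$\langle k \rangle = \frac{2}{(1-\beta^2)^N} - \frac{1}{(1+\beta)^N} - 1 \tag{A.6}$$

Appendix B: the independence assumption

The exact derivation of the probability that a node-$i$ will not receive infections from its neighbors in the next time step is

$$\zeta_{i,t} = \sum_{k=1}^{2^N} \left( \prod_{j \in M_i} (1 - \beta\,n_{j,k}) \right) P_{k,t-1} \tag{B.1}$$

Eq. (B.1) can be rewritten as

$$\zeta_{i,t} = \sum_{k} \left( 1 - \beta \sum_{j \in M_i} n_{j,k} + \beta^2 \sum_{\substack{j,q \in M_i \\ j \neq q}} n_{j,k}\,n_{q,k} + \cdots + (-1)^M \beta^M \prod_{j \in M_i} n_{j,k} \right) P_{k,t-1} = 1 - \beta \sum_{j \in M_i} \langle n_j \rangle_{t-1} + \beta^2 \sum_{\substack{j,q \in M_i \\ j \neq q}} \langle n_j n_q \rangle_{t-1} + \cdots + (-1)^M \beta^M \left\langle \prod_{j \in M_i} n_j \right\rangle_{t-1} \tag{B.2}$$

where $M = |M_i|$ is the size of the neighbors set.

Using the approximation

$$\left\langle \prod_{j \in M_i} n_j \right\rangle_t \cong \prod_{j \in M_i} \langle n_j \rangle_t \tag{B.3}$$

where the product can be over a subset of the neighbors or all of them, Eq. (B.2) turns into

$$\zeta_{i,t} \cong 1 - \beta \sum_{j \in M_i} \langle n_j \rangle_{t-1} + \beta^2 \sum_{\substack{j,q \in M_i \\ j \neq q}} \langle n_j \rangle_{t-1} \langle n_q \rangle_{t-1} + \cdots + (-1)^M \beta^M \prod_{j \in M_i} \langle n_j \rangle_{t-1}$$

The average value $\langle n_j \rangle_{t-1}$ is the probability of node-$j$ being infected at time $t-1$, which is equivalent to the CWF $p_{j,t-1}$, and therefore $\zeta_{i,t}$ is approximated by

$$\zeta_{i,t} \cong \prod_{j \in M_i} (1 - \beta\,p_{j,t-1}) \tag{2}$$

Thus we understand that the mean field approximation suggested by Chakrabarti et al. was to approximate each averaged product by a product of averages (as seen in Eq. (B.3)).

Appendix C: vaccination strategy

One of the conclusions Chakrabarti et al. (2008) draw from the CWF model is that the most efficient way to immunize a network is to vaccinate the nodes (i.e. subtract the nodes and their links from the graph) that will cause the most significant decrease in the spectral radius $\rho$ of the adjacency matrix $A$. It is interesting to examine this proposition more closely using their example as shown in Fig. 6.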
Fig. 6. The "bar-bell" graph discussed by Chakrabarti et al. (2008). Vaccinating any one of the nodes A, A′, B and C results in the change of the spectral radius $\rho$ by the $\Delta\rho$ demarcated. Since vaccination of node C is associated with the largest $\Delta\rho = 0.0315$, the CWF method would suggest this to be the most effective strategy when only one node can be vaccinated.
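The selection rule attributed to the CWF model, namely vaccinating the node whose removal gives the largest $\Delta\rho$, can be sketched as follows. This is a minimal illustration under stated assumptions: the graph built by `hypothetical_barbell` is not the graph of Fig. 6 (whose exact structure is not reproduced here), and the function names are invented for the example.

```python
import numpy as np


def spectral_radius(adj):
    """Largest absolute eigenvalue of a symmetric adjacency matrix (0 for an empty graph)."""
    if adj.shape[0] == 0:
        return 0.0
    return float(np.max(np.abs(np.linalg.eigvalsh(adj))))


def spectral_drops(adj):
    """Decrease in spectral radius obtained by removing each node (and its links) in turn."""
    rho = spectral_radius(adj)
    drops = []
    for node in range(adj.shape[0]):
        reduced = np.delete(np.delete(adj, node, axis=0), node, axis=1)
        drops.append(rho - spectral_radius(reduced))
    return np.array(drops)


def hypothetical_barbell(clique_size):
    """Two cliques joined through a single bridge node (illustrative, NOT the graph of Fig. 6)."""
    n = 2 * clique_size + 1
    bridge = clique_size
    adj = np.zeros((n, n))
    for block in (range(0, clique_size), range(clique_size + 1, n)):
        for i in block:
            for j in block:
                if i != j:
                    adj[i, j] = 1.0
    for end in (clique_size - 1, clique_size + 1):
        adj[bridge, end] = adj[end, bridge] = 1.0
    return adj


adj = hypothetical_barbell(4)
drops = spectral_drops(adj)
for node, drop in enumerate(drops):
    print(f"node {node}: delta rho = {drop:.4f}")
print("node selected by the largest-drop rule:", int(np.argmax(drops)))
```

The sketch only ranks nodes by $\Delta\rho$; whether that ranking identifies the most effective vaccination target is the question examined in this appendix.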
Fig. 7. Modification of the "bar-bell" graph. Node D is added to the right cluster. The effect of vaccination on $\rho$ is noted next to nodes C and D.
