A Primer On The Use of Probability Generating Functions in Infectious Disease Modeling
Joel C. Miller
October 3, 2024
arXiv:1803.05136v4 [q-bio.PE] 2 Oct 2024
Abstract
We explore the application of probability generating functions (PGFs) to invasive processes, focusing
on infectious disease introduced into large populations. Our goal is to acquaint the reader with appli-
cations of PGFs, more so than to derive new results. PGFs help predict a number of properties of
early outbreak behavior while the population is still effectively infinite, including the probability of an
epidemic, the size distribution after some number of generations, and the cumulative size distribution
of non-epidemic outbreaks. We show how PGFs can be used in both discrete-time and continuous-time
settings, and discuss how to use these results to infer disease parameters from observed outbreaks. In the
large population limit for susceptible-infected-recovered (SIR) epidemics, PGFs lead to survival-function-
based models that are equivalent to the usual mass-action SIR models but with fewer ODEs. We use
these to explore properties such as the final size of epidemics or even the dynamics once stochastic ef-
fects are negligible. We target this primer at biologists and public health researchers with mathematical
modeling experience who want to learn how to apply PGFs to invasive diseases, but it could also be used
in an applications-based mathematics course on PGFs. We include many exercises to help demonstrate
concepts and to give practice applying the results. We summarize our main results in a few tables.
Additionally, we provide a small Python package which performs many of the relevant calculations.
Contents
1 Introduction 2
1.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9
3.5 Full dynamics in finite populations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.1 SIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.2 SIR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
4 Large-time dynamics 39
4.1 SIR disease and directed graphs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Final size relations for SIR epidemics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Discrete-time SIR dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Continuous-time SIR epidemic dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43
5 Multitype populations 44
5.1 Discrete-time epidemic probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Continuous-time SIR dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
6 Discussion 47
C Software 64
1 Introduction
The spread of infectious diseases remains a public health challenge. Increased interaction between humans
and wild animals leads to increased zoonotic introductions, and modern travel networks allow these diseases
to spread quickly. Many mathematical approaches have been developed to give us insight into the early
behavior of disease outbreaks. An important tool for understanding the stochastic behavior of an outbreak
soon after introduction is the probability generating function (PGF) [52, 2, 53].
Specifically, PGFs frequently give insight into the statistical behavior of outbreaks before they are large
enough to be affected by the finite size of the population. In these cases, both susceptible-infected-recovered
(SIR) disease (for which nodes recover with immunity) and susceptible-infected-susceptible (SIS) disease (for
which nodes recover and can be reinfected immediately) are equivalent. In the case of SIR disease they can
be used to study the dynamics of disease once an epidemic is established in a large population.
We can investigate properties such as the early growth rate of the disease, the probability the disease
becomes established, or the distribution of final sizes of outbreaks that fail to become established. Similar
questions also arise in other settings where some introduced agent can reproduce or die, such as invasive
species in ecological settings [29], early within-host pathogen dynamics [11], and the accumulation of mu-
tations in precancerous and cancerous cells [14, 4] or in pathogen evolution [48]. These are all examples of
branching processes, and PGFs are a central tool for the analysis of branching processes [7, 25, 26]. Except
for Section 4 where we develop deterministic equations for later-time SIR epidemics, based on [47, 34, 37],
the approaches we describe here have direct application in these other branching processes as well.
Distribution | PGF f(x) = Σ_i r_i x^i
Poisson, mean λ: r_i = e^{−λ} λ^i / i! | e^{λ(x−1)}
Uniform: r_λ = 1 | x^λ
Binomial: n trials, with success probability p: r_i = C(n, i) p^i q^{n−i} for q = 1 − p | [q + px]^n

Table 1: The PGFs of some common distributions.
Before proceeding, we define what a PGF is. Let ri denote the probability of drawing the value i from a
given distribution of non-negative integers. Then
f(x) = Σ_i r_i x^i
is the PGF of this distribution. We should address a potential confusion caused by the name. A “generating
function” is a function which is defined from (or “generated by”) a sequence of numbers a_i and takes the
form Σ_i a_i x^i. So a “probability generating function” is a generating function defined from a probability
distribution on integers. It is not a function that generates probabilities when values are plugged in for x.
There are other generating functions, including the “moment generating function”, defined to be Σ_m ⟨i^m⟩ x^m
where ⟨i^m⟩ = Σ_i r_i i^m (the moment and probability generating functions turn out to be closely related).
PGFs have a number of useful properties which we derive in Appendix A. We have structured this paper
so that a reader can skip ahead now and read Appendix A in its entirety to get a self-contained introduction
to PGFs, or wait until a particular property is referenced in the main text and then read that part of the
appendix.
As we demonstrate in Table 1, for many important distributions the PGF takes a simple form. We derive
this for the Poisson distribution.
Example 1.1 Consider the Poisson distribution with mean λ:

r_i = e^{−λ} λ^i / i! .

For this we find

f(x) = Σ_i (e^{−λ} λ^i / i!) x^i = e^{−λ} Σ_i (λx)^i / i! = e^{−λ} e^{λx} = e^{λ(x−1)} .
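The closed form derived in Example 1.1 can be checked numerically by truncating the sum in the definition. Below is a minimal sketch (independent of the paper's Invasion PGF package; the truncation point of 60 terms is an arbitrary choice that makes the tail negligible for λ = 1.5):

```python
import math

def pgf_from_probs(probs, x):
    """Evaluate f(x) = sum_i r_i x^i from explicit probabilities r_i."""
    return sum(r * x**i for i, r in enumerate(probs))

lam = 1.5
# Truncate the Poisson distribution far enough out that the tail is negligible.
probs = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(60)]

x = 0.7
numeric = pgf_from_probs(probs, x)
closed_form = math.exp(lam * (x - 1))  # e^{lambda(x-1)} from Table 1
print(numeric, closed_form)            # the two agree to machine precision
```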
In this primer, we explore the application of PGFs to the study of disease spread. We will use PGFs to
answer questions about the early-time behavior of an outbreak (neglecting depletion of susceptibles):
• What is the probability an outbreak goes extinct within g generations (or by time t) in an arbitrarily
large population?
• What is the final size distribution of small outbreaks?
• What is the size distribution of outbreaks at generation g (or time t)?
• How fast is the initial growth for those outbreaks that do not go extinct?
Although we present these early-time results in the context of SIR outbreaks they also apply to SIS outbreaks
and many other invasive processes.
We can also use PGFs for some questions about the full behavior accounting for depletion of susceptibles.
Specifically:
• In a continuous-time Markovian SIR or SIS outbreak spreading in a finite population, what is the
distribution of possible system states at time t?
• In the large-population limit of an SIR epidemic, what fraction of the population is eventually infected?
• In the large-population limit of an SIR epidemic, what fraction of the population is infected or recovered
at time t?
We will consider both discrete-time and Markovian continuous-time models of disease. In the discrete-
time case each infected individual transmits to some number of “offspring” before recovering. In the
continuous-time case each infected individual trasmits with a rate β and recovers with a rate γ.
In Section 2 we begin our study investigating properties of epidemic emergence in a discrete-time,
generation-based framework, focusing on the probability of extinction and the sizes of outbreaks assum-
ing that the disease is invading a sufficiently large population with enough mixing that we can treat the
infections caused by any one infected individual as independent of the others. We also briefly discuss how we
might use our observations to infer disease parameters from observed small outbreaks. In Section 3, we repeat
this analysis for a continuous-time case treating transmission and recovery as Poisson processes, and then
adapt the analysis to a population with finite size N . Next in Section 4 we use PGFs to derive simple models
of the large-time dynamics of SIR disease spread, once the infection has reached enough individuals that we
can treat the dynamics as deterministic. Finally, in Section 5 we explore multitype populations in which there
are different types of infected individuals, which may produce different distributions of infections. We provide
three appendices. In Appendix A, we derive the relevant properties of PGFs, in Appendix B we provide ele-
mentary (i.e., not requiring Calculus) derivations of two important theorems, and in Appendix C we provide
details of a Python package Invasion PGF available at https://github.com/joelmiller/Invasion_PGF
that implements most of the results described in this primer. Python code that uses this package to imple-
ment the figures of Section 2 is provided in the supplement.
Our primary goal here is to provide modelers with a useful PGF-based toolkit, with derivations that
focus on developing intuition and insight into the application rather than on providing fully rigorous proofs.
Throughout, there are exercises designed to increase understanding and help prepare the reader for appli-
cations. This primer (and Appendix A in particular) could serve as a resource for a mathematics course on
PGFs. For readers wanting to take a deep dive into the underlying theory, there are resources that provide
a more technical look into PGFs in general [52] or specifically using PGFs for infectious disease [53].
1.1 Summary
Before presenting the analysis, we provide a collection of tables that summarize our main results. Table 2
summarizes our notation. Tables 3 and 4 summarize our main results for the discrete-time and continuous-
time models. Table 5 shows applications of PGFs to the continuous-time dynamics of SIR epidemics once
the disease has infected a non-negligible proportion of a large population, effectively showing how PGFs can
be used to replace most common mass-action models. Finally, Table 6 provides the probability of each finite
final outbreak size assuming a sufficiently large population that susceptible depletion never plays a role.
Function/variable name | Interpretation

f(x) = Σ_i p_i x^i
g(x) = Σ_i q_i x^i
  Arbitrary PGFs.

Ω_∞(z) = Σ_{r<∞} ω_r z^r + ω_∞ z^∞
Ω_g(z) = Σ_r ω_r(g) z^r
Ω(z, t) = Σ_r ω_r(t) z^r
  The PGF for the distribution of completed infections at the end of a
  small outbreak, in generation g, or at time t in an infinite population.
  If R_0 > 1, then one of the terms in the expansion of Ω_∞(z) is ω_∞ z^∞,
  where ω_∞ is the probability of an epidemic.

Π_g(y, z) = Σ_{i,r} π_{i,r}(g) y^i z^r
Π(y, z, t) = Σ_{i,r} π_{i,r}(t) y^i z^r
  The PGF for the joint distribution of current infections and completed
  infections either at generation g or time t in an infinite population.

Ξ(x, y, t) = Σ_{s,i} ξ_{s,i}(t) x^s y^i
  The PGF for the joint distribution of susceptibles and current infec-
  tions at time t in a finite population of size N (used for continuous
  time only). In the SIR case we can infer the number recovered from
  this and the total population size.

χ(x) = Σ_i p_i x^i
  PGF for the “ancestor distribution”, analogous to the offspring distri-
  bution.

ψ(x) = Σ_κ P(κ) x^κ
  PGF for the distribution of susceptibility for the continuous-time model
  where the rate of receiving transmission is proportional to κ.

β, γ
  The individual transmission and recovery rates for the Markovian con-
  tinuous-time model.

Table 2: Common function and variable names. When we use a PGF for the number of susceptible individuals,
active infections, and/or completed infections, x and s correspond to susceptible individuals, y and i to
active infections, and z and r to completed infections.
Question | Section | Solution

Probability of extinction, α, given a single introduced infection. | 2.1 | α = lim_{g→∞} μ^[g](0) or, equivalently, the smallest x in [0, 1] for which x = μ(x).

Probability of extinction within g generations. | 2.1.2 | α_g = μ^[g](0).

PGF of the distribution of the number of infected individuals in the g-th generation. | 2.2 | Φ_g(y) = μ^[g](y).

Average number of active infections in generation g, and the average number if the outbreak has not yet gone extinct. | 2.2 | R_0^g and R_0^g/(1 − α_g).
Table 3: A summary of our results for application of PGFs to discrete-time SIS and SIR disease processes
in the infinite population limit. The function µ(x) is the PGF for the offspring distribution. The notation
[g] in the exponent denotes function composition g times. For example, µ[2] (y) = µ(µ(y)).
Question | Section | Solution

Probability of eventual extinction, α, given a single introduced infection. | 3.1 | α = min(1, γ/β).

Probability of extinction by time t, α(t). | 3.1.1 | α(t), where α̇ = (β + γ)[μ̂(α) − α] and α(0) = 0.

PGF of the distribution of the number of infected individuals at time t (assuming one infection at time 0). | 3.2 | Φ(y, t), where Φ(y, 0) = y and Φ solves either ∂Φ/∂t = (β + γ)[μ̂(y) − y] ∂Φ/∂y or ∂Φ/∂t = (β + γ)[μ̂(Φ) − Φ].

Table 4: A summary of our results for application of PGFs to the continuous-time disease process. We
assume individuals transmit with rate β and recover with rate γ. The functions μ̂(y) = (βy² + γ)/(β + γ)
and μ̂(y, z) = (βy² + γz)/(β + γ) are given in System (14).
Question | Section | Solution

Final size relation for an SIR epidemic assuming a vanishingly small fraction ρ randomly infected initially, with ρN ≫ 1. | 4.2 | r(∞) = 1 − χ(1 − r(∞)). [For standard assumptions, including the usual continuous-time assumptions, χ(x) = e^{−R_0(1−x)}.]
Table 5: A summary of our results for application of PGFs to the final size and large-time dynamics of
SIR disease. The PGFs χ and ψ encode the heterogeneity in susceptibility. The PGF χ is the PGF of
the ancestor distribution (an ancestor of u is any individual who, if infected, would infect u). The PGF
ψ(x) = Σ_κ p(κ) x^κ encodes the distribution of the contact rates.
Distribution | PGF | Probability of j infections | Log-likelihood of parameters given j

Poisson | e^{λ(y−1)} | (jλ)^{j−1} e^{−jλ} / j! | −jλ + (j − 1) log(jλ) − log(j!)

Uniform | y^λ | 1 if j = 1 and λ = 0; 0 otherwise | 0 if j = 1 and λ = 0; −∞ otherwise

Geometric | p/(1 − qy) | (1/j) C(2j − 2, j − 1) p^j q^{j−1} | log((2j − 2)!) − log((j − 1)!) − log(j!) + j log p + (j − 1) log q

Negative binomial | [q/(1 − py)]^r̂ | (1/j) C(r̂j + j − 2, j − 1) q^{r̂j} p^{j−1} | log((r̂j + j − 2)!) − log((r̂j − 1)!) − log(j!) + r̂j log q + (j − 1) log p
Table 6: The probability of j total infections in an infinite population for different offspring distributions,
derived using Theorem 2.7 and the corresponding log-likelihoods. For any one of these, if we sum the
probability of j over (finite) j, we get the probability that the outbreak remains finite in an infinite population.
This is particularly useful when inferring disease parameters from observed outbreak sizes (Section 2.4.1).
The parameters’ interpretations are given in Table 1.
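The Poisson entry of Table 6 (the Borel distribution) can be checked numerically: summing the finite-size probabilities over j recovers 1 when R_0 ≤ 1 and the extinction probability α when R_0 > 1. A minimal sketch, not part of the Invasion PGF package (the truncation at j < 400 and the 200 fixed-point iterations are arbitrary choices that are more than sufficient here):

```python
import math

def poisson_final_size(j, lam):
    """P(outbreak infects exactly j in total) for Poisson(lam) offspring:
    (j*lam)**(j-1) * exp(-j*lam) / j!  (Table 6), computed in log space
    to avoid overflow for large j."""
    log_p = (j - 1) * math.log(j * lam) - j * lam - math.lgamma(j + 1)
    return math.exp(log_p)

lam = 2.0
total = sum(poisson_final_size(j, lam) for j in range(1, 400))

# For R0 = lam > 1 this sum equals the extinction probability, the
# smaller root of x = exp(lam*(x - 1)), found here by iteration.
alpha = 0.0
for _ in range(200):
    alpha = math.exp(lam * (alpha - 1))

print(total, alpha)  # both approximately 0.203
```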
1.2 Exercises
We end each section with a collection of exercises. We have designed these exercises to give the reader more
experience applying PGFs and to help clarify some of the more subtle points.
Exercise 1.1 Except for the Poisson distribution handled in Example 1.1, derive the PGFs shown in
Table 1 directly from the definition f(x) = Σ_i r_i x^i.

For the negative binomial, it may be useful to use the binomial series:

(1 + δ)^η = 1 + ηδ + (η(η − 1)/2!) δ² + ⋯ + (η(η − 1)⋯(η − i + 1)/i!) δ^i + ⋯

using η = −r̂ and δ = −px.
Exercise 1.2 Consider the binomial distribution with n trials, each having success probability p = λ/n.
Using Table 1, show that the PGF for the binomial distribution converges to the PGF for the Poisson
distribution in the limit n → ∞, if λ is fixed.
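The limit in Exercise 1.2 can be previewed numerically before deriving it analytically. A short sketch (the values of λ, x, and the sequence of n are arbitrary choices for illustration):

```python
import math

lam, x = 2.0, 0.6
poisson_pgf = math.exp(lam * (x - 1))   # limiting PGF from Table 1

errors = []
for n in [10, 100, 1000, 10000]:
    p = lam / n                          # success probability p = lambda/n
    binom_pgf = (1 - p + p * x)**n       # [q + px]^n from Table 1
    errors.append(abs(binom_pgf - poisson_pgf))

print(errors)  # the gap shrinks roughly like 1/n
```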
For results related to early extinction or early-time dynamics, we will assume that the population is large
enough and sufficiently well-mixed that the transmissions in successive generations are all independent events
Figure 1: A sample of 10 outbreaks starting with a bimodal distribution having R0 = 0.9 in which 3/10 of
the population causes 3 infections and the rest cause none. The top row denotes the initial states, showing
each of the 10 initial infections. An edge from one row to the next denotes an infection from the higher node
to the lower node. Most outbreaks die out immediately.
and unaffected by depletion of susceptible individuals. Before deriving our results for the early-time behavior
of our discrete-time model, we offer a summary in Table 3.
Often in disease spread we are interested in the expected number of infections caused by an infected
individual early in an outbreak, which we define to be R0 .
R_0 = Σ_i i p_i = μ′(1)    (2)

where μ′(x) = (d/dx) μ(x). The value of R_0 is related to disease dynamics, but it is not the only important
property of µ.
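Equation (2) can be evaluated directly from a stored offspring distribution. A minimal sketch using the bimodal distribution of Example 2.1 (70% cause no infections, 30% cause three; the function name is ours, not from the Invasion PGF package):

```python
def R0_from_offspring(probs):
    """R0 = sum_i i*p_i = mu'(1) for offspring distribution p_i."""
    return sum(i * p for i, p in enumerate(probs))

# Bimodal example: 0.7 cause no infections, 0.3 cause three.
probs = [0.7, 0.0, 0.0, 0.3]
print(R0_from_offspring(probs))  # 0.9, matching Example 2.1
```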
Example 2.1 We demonstrate a few sample outbreaks in Fig. 1. Here we take a bimodal case with R0 = 0.9
such that a proportion 0.3 of the population causes 3 infections and the remaining 0.7 causes none. Most of
the outbreaks die out immediately, but some persist, surviving multiple generations before extinction.
Example 2.2 Throughout Section 2 we compare simulated SIR outbreaks with the theoretical predictions
which we calculate using the Python package Invasion PGF described in Appendix C. We assume that all
individuals are equally likely to be infected by any transmission, and we focus on R0 = 0.75 and R0 = 2.
For each R0 , we consider two distributions for the number of new infections an infected individual causes:
[Figure 2: four panels of simulated outbreak-size distributions for the Poisson and bimodal offspring distributions with R0 = 0.75 and R0 = 2. Main axes show probability density versus proportion infected; insets show probability versus number infected.]
Figure 2: Simulated outcomes of SIR outbreaks in populations as described in Example 2.2. Outbreaks tend
to be either small or large. The typical number infected in small outbreaks (insets) is affected by the details
of the offspring distribution, but not the population size. The typical proportion infected in large outbreaks
(epidemics) appears to depend on the average number of transmissions an individual causes, but not the
population size or the offspring distribution. These observations will be explained later. These simulations
are reused throughout this section to show how PGFs capture different properties of the distributions.
2.1.1 Derivation as a fixed point equation
We present two derivations of the extinction probability. Our first is quicker, but gives less insight. We start
with the a priori observation that the extinction probability takes some value between 0 and 1 inclusive.
Our goal is to filter out the vast majority of these options by finding a property of the extinction probability
that most values between 0 and 1 do not have.
Let α be the probability of extinction if the spread starts from a single infected individual. Then from
Property A.1 of Appendix A we have α = Σ_i p_i α̂^i = μ(α̂), where α̂ is the probability that, in isolation,
an offspring of the initial infected individual would not cause an epidemic. Because we assume that the
offspring distribution of later cases is the same as for the index case, we must have α̂ = α, and so the
extinction probability solves α = μ(α).
We have established:
Theorem 2.1 Assuming that each infected individual produces an independent number of offspring i
chosen from a distribution having PGF µ(y), then α, the probability an outbreak starting from a single
infected individual goes extinct, satisfies
α = µ(α) . (3)
Not every solution of x = μ(x) gives the extinction probability.
There can be more than one x solving x = µ(x). In fact 1 = µ(1) is always a solution, and from
Property A.9 it follows that there is another solution if and only if R0 = µ′ (1) > 1. In this case, our
derivation of Theorem 2.1 does not tell us which of the solutions is correct. However, Section 2.1.2 shows
that the correct solution is the smaller solution when it exists. More specifically, the extinction probability
is α = lim_{g→∞} α_g, where α_g = μ(α_{g−1}) starting with α_0 = 0. This gives a condition for a nonzero epidemic
probability, namely R_0 = μ′(1) = Σ_i i p_i > 1.
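The iteration α_g = μ(α_{g−1}) from α_0 = 0 is straightforward to implement. A minimal sketch (not from the Invasion PGF package; the tolerance and iteration cap are arbitrary choices), using the Poisson offspring PGF μ(x) = e^{R_0(x−1)}:

```python
import math

def extinction_probability(mu, tol=1e-12, max_iter=10_000):
    """Iterate alpha_g = mu(alpha_{g-1}) from alpha_0 = 0; converges to the
    smallest solution of x = mu(x) in [0, 1]."""
    alpha = 0.0
    for _ in range(max_iter):
        new = mu(alpha)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha

R0 = 2.0
mu = lambda x: math.exp(R0 * (x - 1))   # Poisson offspring PGF
alpha = extinction_probability(mu)
print(alpha)  # approximately 0.203
```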
Example 2.3 We now consider the Poisson and bimodal offspring distributions described in Example 2.2.
We saw that typically an outbreak either affects a small proportion of the population (a vanishing fraction
in the infinite population limit) or a large number (a nonzero fraction in the infinite population limit).
By plotting the cumulative distribution function (cdf) of the proportion infected in Fig. 3, we extend our earlier
observations. The cdf is steep near zero (becoming vertical in the infinite population limit). Then it is
effectively flat for a while. Finally if R0 > 1 it again grows steeply at some proportion infected well above
0 (the size of epidemic outbreaks).
The plateau’s height is the probability that an outbreak dies out while small. Fig. 3 shows that this is
well-predicted by choosing the smaller of the solutions to x = µ(x).
For a fixed R0 > 1, the plateau’s height (i.e., the early extinction probability) depends on the details
of the offspring distribution and not simply R0 . However, the critical value at which the cdf increases for
the second time depends only on R0 . This suggests that even though the probability of an epidemic depends
on the details of the offspring distribution, the proportion infected in an SIR epidemic depends only on R0 ,
the reproductive number. We explore this in more detail in Section 4.2.
[Figure 3: cumulative probability versus proportion infected for the Poisson and bimodal distributions, comparing simulations with N = 100 and N = 1000 against the predicted early extinction probability, for R0 = 0.75 and R0 = 2.]
Figure 3: Illustration of Theorem 2.1. The cumulative distribution function (cdf) for the total proportion
ever infected (effectively the integral of Fig. 2). For small R0 , all outbreaks die out without affecting a sizable
portion of the population. For larger R0 , there are many small outbreaks and many large outbreaks, but
very few outbreaks in between, so the cdf is flat in this range. The height of this plateau is the probability
the outbreak dies out while small. This is approximately the predicted extinction probability for an infinite
population (dashed). The probability of a small outbreak is different for the different distributions, but the
proportion infected corresponding to epidemics is the same (for given R0 ).
that the index case causes i infections, p_i, times the probability none of those i individuals causes further
infections, α_1^i, summed over all i. We introduce the notation μ^[g](x) to be the result of iteratively applying
μ to x g times, so μ^[1](x) = μ(x) and, for g > 1, μ^[g](x) = μ(μ^[g−1](x)). Then following Property A.1 we
have

α_2 = p_0 + p_1 α_1 + p_2 α_1² + ⋯ = μ(α_1) = μ^[2](0) .

We generalize this by stating that the probability an initial infection fails to initiate any length-g chains is
equal to the probability that all of its i offspring fail to initiate a chain of length g − 1:

α_g = Σ_i p_i α_{g−1}^i = μ(α_{g−1}) = μ^[g](0) .
So the probability of not starting a chain of length at least g is found by iteratively applying the function µ
g times to x = 0. Taking g → ∞ gives the extinction probability [19]:
The fact that there is a biological interpretation of αg starting with α0 = 0 is important. It effectively
guarantees that the iterative process converges and that the speed of convergence reflects the typical speed
of extinction. Iteration appears to be an efficient way to solve x = µ(x) numerically and because of the
biological interpretation, we can avoid questions that might arise about whether there are multiple solutions
of x = µ(x) and, if so, which of them corresponds to the biological problem. Instead we simply iterate
starting from 0 and the result must converge to the probability that in an infinite population the outbreak
would go extinct in finite time, regardless of what other solutions x = µ(x) might have.
Exercise 2.1 shows that if µ(0) ̸= 0 then the limit of the sequence αg is 1 if R0 ≤ 1 and some α < 1
satisfying α = µ(α) if R0 > 1. This proves:
Theorem 2.2 Assume that each infected individual produces an independent number of offspring i chosen
from a distribution having PGF µ(y). Then
• The probability an outbreak goes extinct within g generations is α_g = μ^[g](0), and the eventual
extinction probability is α = lim_{g→∞} α_g.
• If R_0 = μ′(1) ≤ 1 and μ(0) ≠ 0, then α = 1. If R_0 > 1, extinction occurs with probability α < 1.
Example 2.4 We now consider the Poisson and bimodal offspring distributions described in Example 2.2.
Figure 4 shows that, starting with α_0 = 0 and defining α_g = μ(α_{g−1}), the values of α_g emerging from
the iterative process correspond to the observed probability outbreaks have gone extinct by generation g for
early values of g.
In the infinite population limit, this provides a match for all g. So this gives the probability the outbreak
goes extinct by generation g, assuming it has not grown large enough to see the finite size of the population
(i.e., assuming it has not become an epidemic). For SIR epidemics in the finite populations we use for
simulations, the plateaus eventually give way to extinction because eventually there are not enough remaining
susceptibles.
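The sequence α_g = μ^[g](0) described above is easy to compute for any offspring PGF. A minimal sketch (our own helper, not from the Invasion PGF package), using the bimodal distribution from Example 2.2 with R0 = 0.9, for which α_g → 1:

```python
def alpha_sequence(mu, generations):
    """alpha_g = mu^{[g]}(0): probability of extinction within g generations."""
    alphas, alpha = [], 0.0
    for _ in range(generations):
        alpha = mu(alpha)
        alphas.append(alpha)
    return alphas

# Bimodal offspring PGF: 70% cause 0 infections, 30% cause 3 (R0 = 0.9).
mu = lambda x: 0.7 + 0.3 * x**3
seq = alpha_sequence(mu, 30)
print(seq[0], seq[-1])  # starts at mu(0) = 0.7, approaches 1 since R0 < 1
```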
Theorem 2.3 Assuming that each infected individual produces an independent number of offspring
chosen from a distribution with PGF μ(y), the number infected in the g-th generation has PGF

Φ_g(y) = Σ_ℓ ϕ_ℓ(g) y^ℓ = μ^[g](y)    (6)

where ϕ_ℓ(g) is the probability there are ℓ active infections in generation g. This does not provide informa-
tion about the cumulative number infected.
[Figure 4: left column, cobweb diagrams of α_{g+1} = μ(α_g) against y = μ(x) for the Poisson and bimodal distributions with R0 = 0.75 and R0 = 2; right column, the observed extinction probability by generation for simulations with N = 100 and N = 1000.]
Figure 4: Illustration of Theorem 2.2. Left: Cobweb diagrams showing convergence of iterations to
the predicted outbreak extinction probability (see Fig. 10). Right: Observed probabilities of no infections
remaining after each generation for simulations of Fig. 2 showing the probability of extinction by generation
g. Thin lines show the relation between the cobweb diagram and the extinction probabilities. The simulated
probability tends to rise quickly representing outbreaks that die out early on, then it remains steady at a
level representing the probability of outbreaks dying out while small. For R0 > 1 it increases again because
the epidemics burn through the finite population (and so the infinite population theory breaks down). The
values match the corresponding iteration of the cobweb diagrams.
[Figure 5: proportion of outbreaks with each number of active infections in the third generation, comparing simulations with N = 100 and N = 1000 against predictions, for the Poisson and bimodal distributions with R0 = 0.75 and R0 = 2.]
Figure 5: Illustration of Theorem 2.3. Comparison of predictions and the simulations from Fig. 2 for the
number of active infections in the third generation. The bimodal case with N = 100 shows a clear impact
of population size as a sizable number of transmissions fail because the population is finite. The predictions
were made numerically using the summation in Property A.3.
for large M and any R ≤ 1. For each y_m = R e^{2πim/M} we can calculate Φ_g(y_m) = μ^[g](y_m) by numerically
iterating μ g times. Then for large enough M, this gives a remarkably accurate and efficient approximation
to the individual coefficients.
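The evaluation on a circle described above can be sketched directly: evaluate μ^[g] at M roots of unity and invert the discrete Fourier transform. This is our own illustration, not the paper's package; M = 64 and R = 1 are arbitrary choices (M need only exceed the largest coefficient index of interest):

```python
import cmath, math

def phi_g_coefficients(mu, g, M=64, R=1.0):
    """Approximate the first M coefficients of Phi_g(y) = mu^{[g]}(y) by
    evaluating it at M points on a circle of radius R and inverting the DFT."""
    values = []
    for m in range(M):
        y = R * cmath.exp(2j * math.pi * m / M)
        for _ in range(g):          # iterate mu g times: mu^{[g]}(y_m)
            y = mu(y)
        values.append(y)
    coeffs = []
    for ell in range(M):
        s = sum(values[m] * cmath.exp(-2j * math.pi * m * ell / M)
                for m in range(M))
        coeffs.append((s / M).real / R**ell)
    return coeffs

mu = lambda y: 0.7 + 0.3 * y**3          # bimodal offspring PGF
coeffs = phi_g_coefficients(mu, g=2)
print(coeffs[0])   # probability of 0 active infections in generation 2
```

Since Φ_2 here is a polynomial of degree 9 < M, the recovered coefficients are exact up to rounding.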
Example 2.5 We demonstrate Theorem 2.3 in Fig. 5, using the simulations from Example 2.2. Simula-
tions and predictions are generally in excellent agreement.

There is a noticeable mismatch for the bimodal distribution with R0 = 2, particularly with N = 100,
which is a consequence of the fact that the population is finite. In stochastic simulations, occasionally an
individual receives multiple transmissions even early in the outbreak, but in the PGF theory this does not
happen.
We are often interested in the expected number of active infections in generation g, Σ_ℓ ℓ ϕ_ℓ(g) (however,
as seen below, this is not the most relevant measure to use if R_0 > 1). Property A.5 shows that this is given
by (∂/∂y) Φ_g(y)|_{y=1}. To calculate this we use Φ_g(1) = 1 for all g (Property A.4) and μ′(1) = R_0. Then through
induction and the chain rule we show that (∂/∂y) Φ_g(y)|_{y=1} = R_0^g:

(∂/∂y) Φ_g(y)|_{y=1} = (∂/∂y) μ(Φ_{g−1}(y))|_{y=1}
                     = μ′(Φ_{g−1}(y)) × (∂/∂y) Φ_{g−1}(y)|_{y=1}
                     = μ′(1) × R_0^{g−1}
                     = R_0^g .

We initialized the induction with the case g = 1, which is the definition of R_0. If R_0 < 1, this shows that we
expect decay.
If R0 > 1, there is a more relevant measure. On average we see growth, but a sizable fraction of
outbreaks may go extinct, and these zeros are included in the average, which alters our prediction. This is
closely related to the “push of the past” effect observed in phylodynamics [39]. For policy purposes, we are
more interested in the expected size if the outbreak is not yet extinct because a response that is scaled to
deal with the average size including those that are extinct is either too big (if the disease has gone extinct)
or too small (if the disease has become established) [36]. It is very unlikely to be just right. The expected
number infected in generation g conditional on the outbreaks not dying out by generation g is Rg0 /(1 − αg ).
This has an important consequence. We can have different extinction probabilities for different offspring
distributions with the same R0 . The disease with a higher extinction probability tends to have considerably
more infections in those outbreaks that do not go extinct.
We have

Corollary 2.1 In the infinite population limit, the expected number infected in generation g starting from a single infection is

[I]_g = R_0^g    (7)

and the expected number starting from a single infection conditional on the disease persisting to generation g is

⟨I⟩_g = R_0^g / (1 − α_g)    (8)
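As a quick sketch (independent of the paper's Python package), we can compute both quantities of Corollary 2.1 for an assumed Poisson offspring distribution, obtaining α_g by iterating α_g = µ(α_{g−1}) from α_0 = 0:

```python
from math import exp

def mu_poisson(y, R0):
    """PGF of a Poisson offspring distribution with mean R0."""
    return exp(R0 * (y - 1.0))

def generation_means(R0, g):
    """Return ([I]_g, <I>_g) from Corollary 2.1 for Poisson offspring.

    alpha_g, the probability of extinction by generation g, satisfies
    alpha_g = mu(alpha_{g-1}) with alpha_0 = 0."""
    alpha = 0.0
    for _ in range(g):
        alpha = mu_poisson(alpha, R0)
    return R0 ** g, R0 ** g / (1.0 - alpha)

unconditional, conditional = generation_means(2.0, 10)
# conditioning on survival inflates the expectation whenever alpha_g > 0
```

For R_0 = 2 and g = 10 the conditional mean exceeds R_0^g = 1024 by the factor 1/(1 − α_10), roughly 1.25.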
We can explore higher moments through taking more derivatives of Φg (y) and evaluating at y = 1.
We define

Ω_g(z) = Σ_j ω_j(g) z^j

to be the PGF for the number of completed infections j at generation g. Although we use j to represent recoveries, this model is still appropriate for SIS disease because we are interested in small outbreak sizes in a well-mixed infinite population for which we can assume no previously infected individuals have been reexposed.
Ω0 (z) = 1 and Ω1 (z) = z
showing that the first individual (infectious during generation 0) completes his infection at the start of
generation 1. For generation 2 we have the initial individual and his direct offspring, so Ω2 (z) = zµ(z).
More generally, to calculate for g > 1, the completed infections consist of
• the initial infection
• the active infections in generation 1.
• any descendants of those active infections in generation 1 that will have recovered by generation g.
The distribution of the number of descendants of a generation 1 individual (including that individual) who have recovered by generation g is given by Ω_{g−1}(z). That is, each generation 1 individual and its descendants over the following g − 1 generations have the same distribution as an initial infection and its descendants after g − 1 generations.
From Property A.8 the number of descendants by generation g (not counting the initial infection) that
have recovered is distributed like µ(Ωg−1 (z)). Accounting for the initial individual requires that we increment
the count by 1 which requires increasing the exponent of z by 1. So we multiply by z. This yields
Ωg (z) = zµ(Ωg−1 (z))
To sustain an outbreak up to generation g there must be at least one infection in each generation from
0 to g − 1. So any outbreak with fewer than g completed infections at generation g must be extinct. So the
coefficient of z j does not change once g > j. Thus we have shown
Theorem 2.4 Assuming a single initial infection in an infinite population, the PGF Ω_g(z) = Σ_j ω_j(g) z^j for the distribution of the number of completed infections at generation g > 1 is given by

Ω_g(z) = zµ(Ω_{g−1}(z))

with Ω_1(z) = z.
Example 2.6 We test Theorem 2.4 in Fig. 6, using the simulations from Example 2.2. Simulations and
predictions are in excellent agreement.
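The recursion in Theorem 2.4 is easy to implement with truncated power series, storing each PGF as its list of coefficients. The sketch below (an illustration, not the paper's package) assumes a Poisson offspring distribution:

```python
import numpy as np
from math import exp, factorial

NMAX = 40  # track coefficients of z**0 ... z**(NMAX - 1)

def poisson_coeffs(R0, kmax=30):
    """Coefficients p_k of the Poisson offspring PGF mu(y)."""
    return np.array([exp(-R0) * R0**k / factorial(k) for k in range(kmax)])

def compose(p, f):
    """Coefficients of mu(f(z)) = sum_k p_k f(z)**k, truncated at NMAX terms."""
    out = np.zeros(NMAX)
    out[0] = p[0]
    fk = np.zeros(NMAX)
    fk[0] = 1.0  # f(z)**0
    for k in range(1, len(p)):
        fk = np.convolve(fk, f)[:NMAX]
        out += p[k] * fk
    return out

def omega(g, R0):
    """Coefficients of Omega_g(z) = z * mu(Omega_{g-1}(z)), Omega_1(z) = z."""
    p = poisson_coeffs(R0)
    f = np.zeros(NMAX)
    f[1] = 1.0
    for _ in range(g - 1):
        f = np.concatenate(([0.0], compose(p, f)[:NMAX - 1]))  # multiply by z
    return f
```

As shown above, the coefficient of z^j stabilizes once g > j; for example, the coefficient of z^1 equals p_0 = e^{−R_0} for every g ≥ 2.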
Example 2.7 (Expected cumulative size) It is instructive to calculate the expected number of completed infections at generation g. Note that Ω_g(1) = 1, µ(1) = 1, and µ′(1) = R_0. We use induction to show that for g ≥ 1 the expected number of completed infections is Σ_{j=0}^{g−1} R_0^j:

∂/∂z Ω_g(z)|_{z=1} = ∂/∂z [zµ(Ω_{g−1}(z))]|_{z=1}
= [µ(Ω_{g−1}(z)) + zµ′(Ω_{g−1}(z)) ∂/∂z Ω_{g−1}(z)]|_{z=1}
= µ(1) + µ′(1) Σ_{j=0}^{g−2} R_0^j
= 1 + R_0 Σ_{j=0}^{g−2} R_0^j
= Σ_{j=0}^{g−1} R_0^j
This is in agreement with our earlier result that the expected number that are infected in generation j is R_0^j.

Figure 6: Illustration of Theorem 2.4. Comparison of predictions with the simulations from Fig. 2 for the number of completed infections at the start of the third generation (columns: Poisson and Bimodal offspring distributions; rows: R_0 = 0.75 and R_0 = 2). The predictions were calculated using Property A.3.

In closed form,

∂/∂z Ω_g(z)|_{z=1} = (1 − R_0^g)/(1 − R_0) if R_0 ≠ 1, and g if R_0 = 1.
As with our previous results, the sum shows a threshold behavior at R0 = 1. If R0 < 1, then in the limit
g → ∞, the expected cumulative outbreak size converges to the finite value 1/(1−R0 ). If R0 ≥ 1, it diverges.
This example shows
Corollary 2.2 In the infinite population limit the expected number of completed infections at the start of generation g assuming a single randomly chosen initial infection is

∂/∂z Ω_g(z)|_{z=1} = (1 − R_0^g)/(1 − R_0) if R_0 ≠ 1, and g if R_0 = 1.    (10a)
Assume we know the values i_{g−1} and r_{g−1} for generation g − 1. Then r_g is simply i_{g−1} + r_{g−1}, and i_g has PGF [µ(y)]^{i_{g−1}}. So given those known i_{g−1} and r_{g−1}, the PGF for the next generation would be [zµ(y)]^{i_{g−1}} z^{r_{g−1}}. Summing over all possible i_{g−1} and r_{g−1} yields

Π_g(y, z) = Σ_{i,r} π_{i,r}(g − 1)[zµ(y)]^i z^r
= Π_{g−1}(zµ(y), z)

For example,

Π_1(y, z) = zµ(y)
Π_2(y, z) = zµ(zµ(y))
Theorem 2.5 Given a single initial infection in an infinite population, the PGF Π_g(y, z) = Σ_{i,r} π_{i,r}(g) y^i z^r for the joint distribution of the number of active infections i and completed infections r in generation g is given by

Π_g(y, z) = Π_{g−1}(zµ(y), z)

with Π_0(y, z) = y.
Example 2.8 We demonstrate Theorem 2.5 in Fig. 7, using the same simulations as in Example 2.2.
Simulations and predictions are in excellent agreement.
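The joint distribution can also be tabulated directly from the transition rule used to derive Theorem 2.5: each active case in generation g − 1 moves to the completed count while contributing its offspring to the active count. A small sketch follows; the bimodal offspring distribution here (0 or 2 offspring with equal probability) is an assumption for illustration and need not match the paper's bimodal example exactly:

```python
import numpy as np

def step(dist, offspring, imax=60):
    """One generation of Theorem 2.5's dynamics: (i, r) -> (i', r + i),
    where i' is a sum of i independent draws from the offspring PMF."""
    new = {}
    for (i, r), prob in dist.items():
        conv = np.array([1.0])  # PMF of the sum of zero draws
        for _ in range(i):
            conv = np.convolve(conv, offspring)[:imax]
        for i2, p2 in enumerate(conv):
            new[(i2, r + i)] = new.get((i2, r + i), 0.0) + prob * p2
    return new

offspring = np.array([0.5, 0.0, 0.5])  # assumed bimodal: 0 or 2 offspring
dist = {(1, 0): 1.0}  # generation 0: one active infection, none completed
for _ in range(3):
    dist = step(dist, offspring)
# dist[(i, r)] gives pi_{i,r}(3); extinct outbreaks sit at i = 0
```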
We now consider the limit Ω_∞(z) = lim_{g→∞} Ω_g(z). We expect this to be the PGF for the final size of the outbreaks.
We can express the pointwise limit³ as

Ω_∞(z) = Σ_{r<∞} ω_r z^r + ω_∞ z^∞

where for r < ∞ the coefficient ω_r is the probability an outbreak causes exactly r infections in an infinite population. We use ω_∞ to denote the probability that the outbreak is infinite in an infinite population (i.e., that it is an epidemic), and we interpret z^∞ as 1 when z = 1 and 0 for 0 ≤ z < 1. So if epidemics are
³Although this converges for any given z in [0, 1], it does not do so "uniformly" if R_0 > 1. That is, for R_0 > 1, no matter how large g is, there are always some values of z < 1, but sufficiently close to 1, which are far from converged.
Figure 7: Illustration of Theorem 2.5. Comparison of predictions and simulations for the joint distribu-
tion of the number of current and completed infections at generation g = 3. The predictions were calculated
using Property A.3. Left: simulations from Fig. 2 for N = 1000 and Right: predictions (note vertical
scales on left and right are the same). Top to Bottom: Poisson R0 = 0.75, Bimodal R0 = 0.75, Poisson
R0 = 2, and Bimodal R0 = 2. The predictions match our observations, with some difference for two reasons:
1) because 5 × 105 simulations cannot resolve events with probabilities as small as 10−12 , but the PGF
approach can, and 2) due to finite-size effects as occasionally an individual receives multiple transmissions
even early on. The plots also show the marginal distributions, matching Figs. 5 and 6.
possible, Ω_∞(z) has a discontinuity at z = 1, and the limit as z → 1 from below gives Σ_{r<∞} ω_r = 1 − ω_∞, which is the extinction probability α.
We now look for a recurrence relation for Ω_∞(z) in the infinite population limit. Each offspring of the initial infection independently causes a set of infections. The distribution of these new infections (including the original offspring) also has PGF Ω_∞(z). So the distribution of the number of descendants of the initial infection (but not including the initial infection) has PGF µ(Ω_∞(z)). To include the initial infection, we must increase the exponent of z by one, which we do by multiplying by z. We conclude that Ω_∞(z) = zµ(Ω_∞(z)). Although we have shown that Ω_∞(z) solves f(z) = zµ(f(z)), we have not shown that there is only one function that solves this.
We may be interested in the outbreak size distribution conditional on the outbreak going extinct. For this we are looking at Ω_∞(z)/α for any z < 1, and at z = 1, this is simply 1. Note that if R_0 ≤ 1 then α = 1.
Summarizing this we have
Theorem 2.6 Given a single initial infection in an infinite population, consider Ω_∞(z), the PGF for the final size distribution: Ω_∞(z) = Σ_{r<∞} ω_r z^r + ω_∞ z^∞ where z^∞ = 0 if |z| < 1 and 1 if |z| = 1.

• Then

Ω_∞(z) = zµ(Ω_∞(z)) for z ≠ 1, and Ω_∞(1) = 1.    (12)

• The PGF for the outbreak size distribution conditional on the outbreak being finite is

Ω_∞(z)/α for 0 ≤ z < 1, and 1 for z = 1.
Perhaps surprisingly, we can often find the coefficients of Ω_∞(z) analytically if µ(y) is known. We use a remarkable result showing that the probability of infecting exactly n individuals is equal to 1/n times the coefficient of z^{n−1} in [µ(z)]^n [8, 15, 22, 51]. The theorem is

Theorem 2.7 Given an offspring distribution with PGF µ(y), for j < ∞ the coefficient of z^j in Ω_∞(z) is (1/j) p^{(j)}_{j−1} where [µ(y)]^j = Σ_i p^{(j)}_i y^i.

That is, the probability of having exactly j < ∞ infections in an outbreak starting from a single infection is 1/j times the coefficient of y^{j−1} in [µ(y)]^j.
We prove this theorem in Appendix B. The proof is based on observing that if we draw a sequence of j numbers from the offspring distribution, the probability they sum to j − 1 (corresponding to j − 1 transmissions and hence j infected individuals including the index case) is the coefficient of z^{j−1} in [µ(z)]^j. A fraction 1/j of these satisfy additional constraints needed to correspond to a valid transmission tree⁴ and thus the probability of a valid transmission tree with exactly j − 1 transmissions is 1/j times p^{(j)}_{j−1}.
Because the coefficient of y^{j−1} in [µ(y)]^j is (1/(j−1)!) (d/dy)^{j−1} [µ(y)]^j |_{y=0} (by Property A.2), we have that the probability of an outbreak of size j is

(1/j!) (d/dy)^{j−1} [µ(y)]^j |_{y=0}
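Theorem 2.7 is straightforward to apply numerically, since [µ(y)]^j is a repeated polynomial product of the offspring PMF with itself. A sketch, using the offspring distribution of Exercise 2.6 as an example:

```python
import numpy as np

def final_size_probs(offspring, jmax):
    """omega_j = (1/j) * [coefficient of y**(j-1) in mu(y)**j] (Theorem 2.7)."""
    omega = np.zeros(jmax + 1)
    power = np.array([1.0])  # coefficients of mu(y)**0
    for j in range(1, jmax + 1):
        power = np.convolve(power, offspring)  # coefficients of mu(y)**j
        if j - 1 < len(power):
            omega[j] = power[j - 1] / j
    return omega

# offspring PMF from Exercise 2.6: p0 = 0.1, p1 = 0.2, p2 = 0.65, p3 = 0.05
probs = final_size_probs(np.array([0.1, 0.2, 0.65, 0.05]), 200)
# this distribution has R0 > 1, so the finite sizes sum to alpha < 1
```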
It is enticing to think there may be a similar theorem for coefficients of Π(y, z), but we are not aware of
one. The theorem has been generalized to models having multiple types of individuals [28].
4 If the index case causes 0 infections and its first offspring causes 1 infection, we have a sequence of two numbers that sum
to 1, but it is biologically meaningless because it does not make sense to talk about the first offspring of an individual who
causes no infections.
Figure 8: Illustration of Theorems 2.6 and 2.7. The final size of small outbreaks predicted by Theorem 2.6 and by Theorem 2.7 as calculated using Property A.3 matches observations from the simulations in Fig. 2 (see also insets of Fig. 2). Panels: Poisson and Bimodal offspring distributions with R_0 = 0.75 and R_0 = 2; the predicted curves show (1/j) × (the (j − 1)th coefficient of [µ(y)]^j).
Example 2.9 We demonstrate Theorems 2.6 and 2.7 in Fig. 8, using the same simulations as in Exam-
ple 2.2.
Example 2.10 The PGF for the negative binomial distribution with parameters p and r̂ (with q = 1 − p) is

µ(y) = (q/(1 − py))^r̂

We can rewrite this as

µ(y) = q^r̂ (1 − py)^{−r̂}

We will use this to find the final size distribution. We expand [µ(y)]^j = q^{r̂j}(1 − py)^{−r̂j} using the binomial series

(1 + δ)^η = 1 + ηδ + (η(η − 1)/2!) δ² + ··· + (η(η − 1)···(η − i + 1)/i!) δ^i + ···

which holds for integer or non-integer η. Then with −py, −r̂j, and j − 1 playing the role of δ, η, and i:
we find that the probability of an outbreak infecting exactly j individuals is

(1/j) (r̂j + j − 2 choose j − 1) q^{r̂j} p^{j−1}
A variation of this result for non-integer r̂ is commonly used in work estimating disease parameters [8, 40]. Exercise 2.12 generalizes this formula.
Applying Theorem 2.7 to several different families of distributions yields Table 6 for the probability of a
final size j.
Given observed data X, Bayes' theorem gives the probability of parameter values Θ as

P(Θ|X) = P(X|Θ)P(Θ)/P(X)    (13)

Here we think of Θ as the specific parameter values and X as the observed data (typically the observed size of an outbreak or sizes of multiple independent outbreaks, in which case P(X|Θ) comes from Theorem 2.7 or Table 6). In our calculations we can simply use the fact that P(Θ|X) ∝ P(X|Θ)P(Θ), with a normalization constant which can be dealt with at the end.
The prior for Θ is the probability distribution we assume for the parameter values before observing the
data, given by P (Θ). We often simply assume that all parameter values are equally probable initially.
The likelihood of the parameters Θ is defined to be P (X|Θ), the probability that we would observe X
for the given parameter values. If we are choosing between two sets of parameter values Θ1 and Θ2 and
the observations have consistently higher likelihood for Θ2 , then we intuitively expect that Θ2 is the more
probable parameter value.
In practice the likelihood may be very small, which can lead to numerical error. It is often useful to instead look at the log-likelihood⁵, log P(X|Θ). For example, if we have many observed outbreak sizes, the
likelihood P (X|Θ) under independence is the product of the probabilities of each individual outbreak size.
The likelihood is thus quite small (perhaps less than machine precision), while the log-likelihood is simply
the sum of the log-likelihoods of each individual observation.
We know that
log P (Θ|X) − C = log P (X|Θ) + log P (Θ)
where C is the logarithm of the proportionality constant 1/P (X) in Equation (13). If we have a prior and
the likelihood, the right hand side can be calculated. It is often possible (and advisable) to calculate the log
likelihood log P (X|Θ) directly rather than calculating P (X|Θ) and then taking the logarithm.
Exponentiating the right hand side and then finding the appropriate normalization constant will yield P(Θ|X). Numerically the numbers may be very small when we exponentiate, so prior to exponentiating it is advisable to add a constant value to all of the expressions. This constant is corrected for in the final normalization step.
We now provide the steps for a numerical calculation of P (Θ|X) given the prior P (Θ), the observations
X, and the log likelihood log P (X|Θ).
1. For each Θ, calculate f(Θ) = log P(X|Θ) + log P(Θ).

2. Find the maximum Xmax of f(Θ) over all Θ and subtract it to yield f̂(Θ) = f(Θ) − Xmax. Note that f̂(Θ) ≤ 0, and this brings all of our numbers closer to zero.
3. Calculate g(Θ) = e^{f̂(Θ)}. This will be proportional to P(Θ|X). Note that by using e^{f̂(Θ)} rather than e^{f(Θ)} we have reduced the impact of roundoff error.

4. Find the normalization constant Σ_{Θ′} g(Θ′). Then

P(Θ|X) = g(Θ) / Σ_{Θ′} g(Θ′)
Note that if Θ comes from a continuous distribution rather than a discrete distribution, then the same
approach works, except that P is a probability density and the summation in the final step becomes an
integral.
Example 2.11 A frequent assumption is that the offspring distribution is negative binomial. Let us make
this assumption with unknown p and r̂.
To artificially simplify the problem, we assume that we know that there are only two possible pairs of
Θ = (p, r̂), namely Θ1 = (p1 , r̂1 ) = (0.02, 40) or Θ2 = (p2 , r̂2 ) = (0.03, 20), and that our a priori belief is
that they are equally probable.
After observing 2 independent outbreaks, with total sizes j1 = 8 and j2 = 7, we want to use our
observations to update P (Θ).
From Table 6, the function f(Θ) combining the log-likelihood of the two independent observations with the log prior is

f(Θ) = log ∏_{j=7,8} (1/j)(r̂j + j − 2 choose j − 1) q^{r̂j} p^{j−1} + log 0.5
= Σ_{j=7,8} log[(1/j)(r̂j + j − 2 choose j − 1) q^{r̂j} p^{j−1}] + log 0.5
= Σ_{j=7,8} [log((r̂j + j − 2)!) − log(j!) − log((r̂j − 1)!) + r̂j log q + (j − 1) log p] + log 0.5
In problems like this, we will often encounter logarithms of factorials. Many programming languages provide this, typically via Stirling's approximation. For example, Python, R, and C++ all have a special function lgamma which calculates the natural log of the absolute value of the gamma function.ᵃ We find
f(Θ1) ≈ −8.495
f(Θ2) ≈ −9.135
g(Θ1) = 1
g(Θ2) ≈ 0.5277

So now

P(Θ1|X) ≈ 1/1.5277 ≈ 0.6546
P(Θ2|X) ≈ 0.5277/1.5277 ≈ 0.3454
So rather than the two parameter sets being equally probable, Θ2 is now about half as likely as Θ1 given the
observed data.
ᵃThe Gamma function is an analytic function that satisfies Γ(n + 1) = n! for positive integer values, so to calculate log(n!) we evaluate lgamma(n + 1).
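The whole calculation of Example 2.11 takes only a few lines of Python using lgamma (a sketch reproducing the numbers above, with the final size formula from Table 6 for negative binomial offspring):

```python
from math import lgamma, log, exp

def log_likelihood(p, r, sizes):
    """Sum over independent outbreaks of log P(final size = j) for a
    negative binomial offspring distribution (q = 1 - p)."""
    q = 1.0 - p
    return sum(lgamma(r*j + j - 1) - lgamma(j + 1) - lgamma(r*j)
               + r*j*log(q) + (j - 1)*log(p) for j in sizes)

sizes = [7, 8]
thetas = [(0.02, 40), (0.03, 20)]  # Theta_1 and Theta_2, equal priors

# steps 1-4: unnormalized log-posterior, shift by its maximum,
# exponentiate, then normalize
f = [log_likelihood(p, r, sizes) + log(0.5) for p, r in thetas]
g = [exp(fi - max(f)) for fi in f]
posterior = [gi / sum(g) for gi in g]
# posterior is approximately [0.6546, 0.3454], matching Example 2.11
```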
2.6 Exercises
Exercise 2.1 Monotonicity of αg
a. By considering the biological interpretation of αg , explain why the sequence of inequalities 0 = α0 ≤
α1 ≤ · · · ≤ 1 should hold. That is, explain why α0 = 0, why the αi form a monotonically increasing
sequence, and why all of them are at most 1.
b. Show that αg therefore converges to some non-negative limit α that is at most 1 and that α = µ(α).
c. Use Property A.9 to show that if µ(0) ̸= 0 there exists a unique α < 1 solving α = µ(α) if and only if
R0 = µ′ (1) > 1.
d. Assuming µ(0) ̸= 0, use Property A.9 to show that if R0 > 1 then αg converges to the unique α < 1
solving α = µ(α), and otherwise αg converges to 1.
Exercise 2.2 Use Theorem 2.2 to prove Theorem 2.1.
Exercise 2.3 Show that if µ(0) = 0, then limg→∞ αg = 0. By referring to the biological interpretation of
µ(0) = 0, explain this result.
Exercise 2.4 Find all PGFs µ(y) with R0 ≤ 1 and µ(0) = 0. Why were these excluded from Theorem 2.2?
Exercise 2.5 Larger initial conditions
Assume that disease is introduced with m infections rather than just 1, or that it is not observed by
surveillance until m infections are present. Assume that the offspring distribution PGF is µ(y).
a. If m is known, find the extinction probability.
b. If m is unknown but its distribution has PGF h(y), find the extinction probability.
Exercise 2.6 Extinction probability
Consider a disease in which p0 = 0.1, p1 = 0.2, p2 = 0.65, and p3 = 0.05 with a single introduced
infection.
a. Numerically approximate the probability of extinction within 0, 1, 2, 3, 4, or 5 generations up to five
significant digits (assuming an infinite population).
b. Numerically approximate the probability of eventual extinction up to five significant digits (assuming
an infinite population).
c. A surveillance program is being introduced, and detection will lead to a response. But it will not be
soon enough to affect the transmissions from generations 0 and 1. From then on p0 = 0.3, p1 = 0.4,
p2 = 0.3, and p3 = 0. Numerically approximate the new probability of eventual extinction after an
introduction in an unbounded population [be careful that you do the function composition in the right
order – review Properties A.1 and A.8].
Exercise 2.7 We look at two inductive derivations of Φg (y) = µ[g] (y). They are similar, but when adapted
to the continuous-time dynamics we study later, they lead to two different models. We take as given that
Φg−1 (y) gives the distribution of the number of infections caused after g−1 generations starting from a single
case. One argument is based on discussing the results of outcomes attributable to the infectious individuals
of generation g − 1 in the next generation. The other is based on the outcomes indirectly attributable to the
infectious individuals of generation 1 through their descendants after another g − 1 generations.
a. Explain why Property A.8 shows that Φg (y) = Φg−1 (µ(y)).
b. (without reference to a) Explain why Property A.8 shows that Φg (y) = µ(Φg−1 (y)).
Exercise 2.8 Use Theorem 2.3 to prove the first part of Theorem 2.2.
Exercise 2.9 How does Corollary 2.1 change if we start with k infections?
Exercise 2.10 Assume the PGF of the offspring size distribution is µ(y) = (1 + y + y 2 )/3.
d. For the Geometric distribution, follow example 2.10 (noting that p and q interchange roles).
Exercise 2.14 To help model continuous-time epidemics, Section 3 will use a modified version of µ, which
in some contexts will be written as µ̂(y, z). To help motivate the use of two variables, we reconsider the
discrete case. We think of a recovery as an infected individual disappearing and giving birth to a recovered
individual and a collection of infected individuals. Look back at the discrete-time calculation of Ω_g and Π_g. Define a two-variable version of µ as µ(y, z) = z Σ_i p_i y^i = zµ(y).
a. What is the biological interpretation of µ(y, z) = zµ(y)?
b. Rewrite the recursive relations for Ωg using µ(y, z) rather than µ(y).
c. Rewrite the recursive relations for Πg using µ(y, z) rather than µ(y).
The choice to use µ(y, z) versus µ(y) is purely a matter of convenience.
Exercise 2.15 Consider Example 2.11. Assume that a third outbreak is observed with 4 infections. Cal-
culate the probability of Θ1 and Θ2 given the data starting
a. with the assumption that P (Θ1 ) = P (Θ2 ) = 0.5 and X consists of the three observations j = 7, j = 8,
and j = 4.
b. with the assumption that P (Θ1 ) = 0.6546 and P (Θ2 ) = 0.3454 and X consists only of the single
observation j = 4.
c. Compare the results and explain why they should have the relation they do.
Exercise 2.16 Assume that we know a priori that the offspring distribution for a disease has a negative
binomial distribution with p = 0.02. Assume that our a priori knowledge of r̂ is that it is an integer
uniformly distributed between 1 and 80 inclusive. Given observed outbreaks of sizes 1, 4, 5, 6, and 10:
a. For each r̂, calculate P (r̂|X) where X is the observed outbreak sizes. Plot the result.
b. Find the probability that R0 = µ′ (1) is greater than 1.
Theorem 3.1 For the continuous-time Markovian model of disease spread in an infinite population, the probability of extinction given a single initial infection is

α = min(1, γ/β)
3.1.1 Extinction probability as a function of time
In the discrete-time case, we were interested in the probability of extinction after some number of generations.
When we are using a continuous-time model, we are generally interested in “what is the probability of
extinction by time t?”
To answer this, we set α(t) to be the probability of extinction within time t. We will calculate the
derivative of α at time t by using some mathematical sleight of hand to find α(t + ∆t) − α(t). Then dividing
this by ∆t and taking ∆t → 0 will give the result. Our approach is closely related to backward Kolmogorov
equations (described later below).
We choose the time step ∆t to be small enough that we can assume that at most one event happens
between time 0 and ∆t. The probabilities of having 0, 1, or 2 infections are P (I(∆t) = 0) = γ∆t + O(∆t),
P (I(∆t) = 1) = 1 − (β + γ)∆t + O(∆t) and P (I(∆t) = 2) = β∆t + O(∆t) where the O notation means
that the error goes to zero fast enough that O(∆t)/∆t → 0 as ∆t → 0. The probability of having 3 or more
infections in the interval (that is, multiple transmission events) is O(∆t) as well.
If there are two infected individuals at time ∆t, then the probability of extinction by time t + ∆t is α(t)². Similarly, if there is one infected at time ∆t, the probability of extinction by time t + ∆t is α(t); and if there are no infections at time ∆t, then the probability of extinction by time t + ∆t is 1 = α(t)⁰. So up to O(∆t) we have

α(t + ∆t) = Σ_{i=0}^∞ P(I(∆t) = i) α(t)^i
= [γ∆t]α(t)⁰ + [1 − (β + γ)∆t]α(t) + [β∆t]α(t)² + O(∆t)
= α(t) + ∆t(β + γ)[µ̂(α(t)) − α(t)] + O(∆t)    (16)
Thus

α̇ = lim_{∆t→0} [α(t + ∆t) − α(t)]/∆t = (β + γ)[µ̂(α) − α]
and so
Theorem 3.2 Given an infinite population with constant transmission rate β and recovery rate γ, then
α(t), the probability of extinction by time t assuming a single initial infection at time 0 solves
α̇ = (β + γ) [µ̂(α) − α] (17)
We could solve this analytically (Exercise 3.4), but most results are easier to derive directly from the
ODE formulation.
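Equation (17) is a single autonomous ODE and can also be integrated numerically; a minimal Euler sketch with µ̂(y) = (γ + βy²)/(β + γ):

```python
def extinction_prob(beta, gamma, T, dt=1e-3):
    """Euler integration of alpha' = (beta + gamma)*(muhat(alpha) - alpha)
    with muhat(y) = (gamma + beta*y**2)/(beta + gamma) and alpha(0) = 0."""
    alpha = 0.0
    for _ in range(int(T / dt)):
        muhat = (gamma + beta * alpha**2) / (beta + gamma)
        alpha += dt * (beta + gamma) * (muhat - alpha)
    return alpha

# alpha(t) increases toward the eventual extinction probability gamma/beta
a = extinction_prob(beta=2.0, gamma=1.0, T=20.0)
```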
We define Φ(y, t) = Σ_i ϕ_i(t) y^i where ϕ_i(t) is the probability of i actively infected individuals at time t. We will derive equations for the evolution of Φ(y, t). We assume that Φ(y, 0) = y so that a single infected individual exists at time 0.
Our goal is to derive equations telling us how Φ changes in time. We will use two approaches which were
hinted at in exercise 2.7, yielding two different partial differential equations. Although their appearance is
different, for the appropriate initial condition, their solutions are the same. These equations are called the
forward and backward Kolmogorov equations.
We briefly describe the analogy between the forward and backward Kolmogorov equations and exercise 2.7:
• Our first approach finds the forward Kolmogorov equations. This is akin to exercise 2.7 where we
found Φg (y) by knowing the PGF Φg−1 (y) for the number infected in generation g − 1 and recognizing
that since the PGF for the number of infections each of them causes is µ(y), we must have Φg (y) =
Φg−1 (µ(y)).
• Our second approach finds the backward Kolmogorov equations which are more subtle and can be
derived similarly to how we derived the ODE for extinction probability in Theorem 3.2. This is akin to
exercise 2.7 where we found Φg (y) by knowing that the PGF for the number infected in generation 1 is
µ(y), and recognizing that after another g − 1 generations each of those creates a number of infections
whose PGF is Φg−1 (y) and so Φg (y) = µ(Φg−1 (y)).
For both approaches, we make use of the observation that for ∆t ≪ 1, we can write the PGF for the number of infections at time ∆t resulting from a single infected individual at time t = 0 to be

Φ(y, ∆t) = [β∆t]y² + [1 − (β + γ)∆t]y + [γ∆t] + O(∆t)

This says that with probability approximately β∆t a transmission happens and we replace y by y², and with probability approximately γ∆t a recovery happens and we replace y by 1. With probability O(∆t) multiple events happen. We can rewrite this as

Φ(y, ∆t) = y + (β + γ)∆t[µ̂(y) − y] + O(∆t)
Forward equations For this we use Φ(y, t1 + t2) = Φ(Φ(y, t2), t1) with t2 playing the role of ∆t and t1 playing the role of t.

So Φ(y, t + ∆t) = Φ(Φ(y, ∆t), t). For small ∆t (and taking Φ_y to be the partial derivative of Φ with respect to its first argument), we have

Φ(y, t + ∆t) = Φ(y + (β + γ)∆t[µ̂(y) − y] + O(∆t), t)
= Φ(y, t) + (β + γ)∆t[µ̂(y) − y]Φ_y(y, t) + O(∆t)

Taking ∆t → 0 gives

∂/∂t Φ(y, t) = (β + γ)[µ̂(y) − y] ∂/∂y Φ(y, t)
Backward equations In the backward direction we have Φ(y, t1 + t2) = Φ(Φ(y, t2), t1) with t2 playing the role of t and t1 playing the role of ∆t.

So Φ(y, t + ∆t) = Φ(y, ∆t + t) = Φ(Φ(y, t), ∆t). Note that because Φ(y, 0) = y, we have Φ(Φ(y, t), 0) = Φ(y, t). Thus for small ∆t, we expand Φ as a Taylor series in its second argument:

Φ(Φ(y, t), ∆t) = Φ(y, t) + (β + γ)∆t[µ̂(Φ(y, t)) − Φ(y, t)] + O(∆t)
To avoid ambiguity, we use Φt to denote the partial derivative of Φ with respect to its second argument t.
So

Φ̇(y, t) = lim_{∆t→0} [Φ(y, t + ∆t) − Φ(y, t)]/∆t
= lim_{∆t→0} [Φ(y, t) + ∆t(β + γ)[µ̂(Φ(y, t)) − Φ(y, t)] + O(∆t) − Φ(y, t)]/∆t
= (β + γ)[µ̂(Φ(y, t)) − Φ(y, t)].
It is perhaps remarkable that such seemingly different equations yield the same solution for the given
initial condition.
Example 3.1 The expected number of infections in the infinite population limit is given by [I] = Σ_i i ϕ_i(t) = ∂/∂y Φ(y, t)|_{y=1}. From this we have

d/dt [I] = ∂/∂t ∂/∂y Φ(y, t)|_{y=1}
= ∂/∂y [(β + γ)[µ̂(y) − y] ∂/∂y Φ(y, t)]|_{y=1}
= [(β + γ)[µ̂′(y) − 1] ∂/∂y Φ(y, t) + (β + γ)[µ̂(y) − y] ∂²/∂y² Φ(y, t)]|_{y=1}
= (β + γ)[µ̂′(1) − 1][I] + (β + γ)[µ̂(1) − 1] ∂²/∂y² Φ(y, t)|_{y=1}
= (β + γ)[(2β)/(β + γ) − 1][I]
= (β − γ)[I]

We used µ̂(1) = 1 to eliminate the ∂²/∂y² Φ(y, t) term and replaced µ̂′(1) with 2β/(β + γ). Using this and [I](0) = 1, we have

[I] = e^{(β−γ)t}.
Corollary 3.1 In the infinite population limit, if a disease starts with a single infection, then the expected number of active infections at time t solves

d/dt [I] = (β − γ)[I]

with [I](0) = 1, and so [I] = e^{(β−γ)t}.
Forward Kolmogorov formulation To derive the forward Kolmogorov equations for the PGF Π(y, z, t), we use Property A.11, noting that all transition rates are proportional to i. The rate of transmission is βi and the rate of recovery is γi. There are no interactions to consider. So

∂/∂t Π(y, z, t) = (β + γ)[(βy²)/(β + γ) + (γz)/(β + γ) − y] ∂/∂y Π(y, z, t)
= (β + γ)[µ̂(y, z) − y] ∂/∂y Π(y, z, t)
Backward Kolmogorov formulation To derive the backward Kolmogorov equations for the PGF Π, we
use a modified version of Property A.12 to account for two types of individuals (Exercise A.14, with events
proportional only to the infected individuals). We find
Theorem 3.4 Assuming a single initial infection in an infinite population, the PGF Π(y, z, t) for the joint distribution of the number of current and completed infections at time t solves

∂/∂t Π(y, z, t) = (β + γ)[µ̂(y, z) − y] ∂/∂y Π(y, z, t)    (21)

as well as

∂/∂t Π(y, z, t) = (β + γ)[µ̂(Π(y, z, t), z) − Π(y, z, t)]    (22)

both with the initial condition Π(y, z, 0) = y.
It is again remarkable that these seemingly very different equations have the same solution.
Example 3.2 The expected number of completed infections at time t is

[R] = Σ_{j,k} k π_{j,k}(t) = ∂/∂z Π(y, z, t)|_{y=z=1}

(although we use R, this approach is equally relevant for counting completed infections in the SIS model because of the infinite population assumption). Its evolution is given by

d/dt [R] = ∂/∂t ∂/∂z Π(y, z, t)|_{y=z=1}
= ∂/∂z [(β + γ)[µ̂(y, z) − y] ∂/∂y Π(y, z, t)]|_{y=z=1}
= (β + γ)[∂/∂z µ̂(y, z) ∂/∂y Π(y, z, t) + [µ̂(y, z) − y] ∂/∂z ∂/∂y Π(y, z, t)]|_{y=z=1}
= (β + γ)[(γ/(β + γ)) ∂/∂y Π(y, z, t) + 0 × ∂/∂z ∂/∂y Π(y, z, t)]|_{y=z=1}
= γ[I]

where we use the fact that µ̂(1, 1) = 1, ∂/∂z µ̂(y, z)|_{y=z=1} = γ/(β + γ), and [I] = ∂/∂y Π(y, z, t)|_{y=z=1}. Our result says that the rate of change of the expected number of completed infections is γ times the expected number of current infections.
This example proves
Corollary 3.2 In the infinite population limit the expected number of recovered individuals as a function of time solves

d/dt [R] = γ[I]    (23)
We define

Ω_∞(z) = Σ_{j<∞} ω_j z^j + ω_∞ z^∞

to be the PGF of the distribution of outbreak final sizes in an infinite population, with ω_∞ z^∞ representing epidemics and, for j < ∞, ω_j representing the probability of an outbreak that infects exactly j individuals. We use the convention that z^∞ = 0 for z < 1 and 1 for z = 1. To calculate Ω_∞, we observe that the outbreak size coming from a single infected individual is 1 if the first thing that individual does is recover, or it is the sum of the outbreak sizes of two infected individuals if the first thing the individual does is transmit (yielding herself and her offspring).
Thus we have

Ω_∞(z) = (β/(β + γ))[Ω_∞(z)]² + (γ/(β + γ))z
= µ̂(Ω_∞(z), z)
As for the discrete-time case we may solve this iteratively, starting with the guess Ω∞ (z) = z. Once n
iterations have occurred, the first n coefficients of Ω∞ (z) remain constant. Note that unlike the discrete
case, here Ω∞ (z) ̸= z µ̂(Ω∞ (z)). This yields
Theorem 3.5 The PGF Ω_∞(z) = Σ_j ω_j z^j + ω_∞ z^∞ for the final size distribution assuming a single initial infection in an infinite population solves

Ω_∞(z) = µ̂(Ω_∞(z), z)

with Ω_∞(1) = 1. This function is discontinuous at z = 1. For the final size distribution conditional on the outbreak being finite, the PGF is continuous and equals

Ω_∞(z)/α for 0 ≤ z < 1, and 1 for z = 1.
Theorem 3.6 Consider continuous-time outbreaks with transmission rate β and recovery rate γ in an infinite population with a single initial infection. The probability the outbreak causes exactly j infections for j < ∞ [that is, the coefficient of z^j in Ω_∞(z)] is

ω_j = (1/j) (β^{j−1} γ^j / (β + γ)^{2j−1}) (2j − 2 choose j − 1)
We prove this theorem in appendix B. The proof is based on observing that if there are j total infected
individuals, this requires j − 1 transmissions and j recoveries. Of the sequences of 2j − 1 events that have
the right number of recoveries and transmissions, a fraction 1/(2j − 1) of these satisfy additional constraints
required to be a valid sequence leading to j infections (the sequence cannot lead to 0 infections prior to the
last step). Alternately, we can note that the offspring distribution is geometric and use Table 6.
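Theorem 3.6 can be evaluated with lgamma for numerical stability; for β > γ the finite sizes should sum to the extinction probability γ/β. A sketch:

```python
from math import lgamma, log, exp

def omega_j(j, beta, gamma):
    """Theorem 3.6: probability a continuous-time outbreak infects exactly j."""
    log_binom = lgamma(2*j - 1) - 2*lgamma(j)  # log C(2j-2, j-1)
    logw = (-log(j) + (j - 1)*log(beta) + j*log(gamma)
            - (2*j - 1)*log(beta + gamma) + log_binom)
    return exp(logw)

# beta = 2, gamma = 1: omega_1 = 1/3 (the index case recovers before
# transmitting), and the finite sizes sum to gamma/beta = 1/2
total = sum(omega_j(j, 2.0, 1.0) for j in range(1, 500))
```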
3.5.1 SIS
We start with the SIS model. We set ξ_{s,i}(t) to be the probability of s susceptible and i actively infected individuals at time t. We define the PGF for the joint distribution of susceptible and infected individuals

Ξ(x, y, t) = Σ_{s,i} ξ_{s,i}(t) x^s y^i
At rate (β/N)si, successful transmissions occur, moving the system from the state (s, i) to (s − 1, i + 1), which is equivalent to removing one susceptible individual and one infected individual, and replacing them with two infected individuals. Following Property A.11, this is represented by

(β/N)(y² − xy) ∂²/∂x∂y Ξ.

At rate γi, recoveries occur, moving the system from the state (s, i) to (s + 1, i − 1), which is equivalent to removing one infected individual and replacing it with a susceptible individual. This is represented by

γ(x − y) ∂/∂y Ξ.
So the PGF solves
β 2 ∂ ∂ ∂
Ξ̇ = (y − xy) Ξ + γ(x − y) Ξ
N ∂x ∂y ∂y
It is sometimes useful to rewrite this as
β ∂ ∂
Ξ̇ = (y − x) y −γ Ξ
N ∂x ∂y
We have
Theorem 3.7 For SIS dynamics in a finite population we have

    ∂Ξ/∂t = (β/N)(y² − xy) ∂²Ξ/∂x∂y + γ(x − y) ∂Ξ/∂y          (25)
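Equation (25) is the PGF counterpart of a master equation for the state probabilities ξ_{s,i}: transmission moves probability from (s, i) to (s − 1, i + 1) at rate (β/N)si, and recovery moves it to (s + 1, i − 1) at rate γi. Since s + i = N is conserved for SIS, the state space is one-dimensional and the master equation can be integrated directly for small N. The sketch below (our own illustration; the function name is an assumption) confirms that the two transition terms conserve total probability:

```python
import numpy as np

def sis_master_step(xi, beta, gamma, N, dt):
    """One Euler step of the SIS master equation. Since s + i = N,
    the state is the number infected i; xi[i] = probability of i infected."""
    d = np.zeros_like(xi)
    for i in range(N + 1):
        s = N - i
        trans = beta * s * i / N   # rate of (s, i) -> (s - 1, i + 1)
        recov = gamma * i          # rate of (s, i) -> (s + 1, i - 1)
        d[i] -= (trans + recov) * xi[i]
        if i + 1 <= N:
            d[i + 1] += trans * xi[i]
        if i >= 1:
            d[i - 1] += recov * xi[i]
    return xi + dt * d

N, beta, gamma = 20, 2.0, 1.0
xi = np.zeros(N + 1)
xi[1] = 1.0                       # one initial infection
for _ in range(20000):
    xi = sis_master_step(xi, beta, gamma, N, 0.001)
print(xi.sum())                   # total probability is conserved: ≈ 1.0
```

Every unit of probability leaving one state enters another, so the column sums of the transition operator vanish and Σ ξ_{s,i} stays 1 up to floating-point error.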
We can use this to derive equations for the expected number of susceptible and infected individuals.
Example 3.3 We use [S] and [I] to denote the expected number of susceptible and infected individuals at
time t. We have

    [S] = Σ_{s,i} s ξ_{si}(t) = Σ_{s,i} s ξ_{si} 1^(s−1) 1^i = ∂Ξ/∂x (1, 1, t)

    [I] = Σ_{s,i} i ξ_{si}(t) = Σ_{s,i} i ξ_{si} 1^s 1^(i−1) = ∂Ξ/∂y (1, 1, t)

Then we have

    [Ṡ] = ∂/∂t [∂Ξ/∂x (1, 1, t)]
        = ∂/∂x [∂Ξ/∂t] |_{x=y=1}
        = ∂/∂x { (y − x)[(β/N) y ∂/∂x − γ] ∂Ξ/∂y } |_{x=y=1}
        = (y − x) ∂/∂x { [(β/N) y ∂/∂x − γ] ∂Ξ/∂y } |_{x=y=1} − [(β/N) y ∂/∂x − γ] ∂Ξ/∂y |_{x=y=1}
        = −(β/N)[SI] + γ[I]

In the final line, we eliminated the first term because y − x is zero at x = y = 1. Similar steps show that

    [İ] = (β/N)[SI] − γ[I]

but the derivation is faster if we simply note [S] + [I] = N is constant. This proves
but the derivation is faster if we simply note [S] + [I] = N is constant. This proves
Corollary 3.3 For SIS disease, the expected numbers infected and susceptible solve

    d[S]/dt = −(β/N)[SI] + γ[I]          (26)
    d[I]/dt = (β/N)[SI] − γ[I]           (27)

where [SI] is the expected value of the product si.
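Equations (26)–(27) are not closed: they involve the second moment [SI]. As an illustration (the closure is a standard approximation we introduce here, not a result derived in the text), replacing [SI] by [S][I] and integrating gives the familiar endemic equilibrium [I] → N(1 − γ/β) when β > γ:

```python
def sis_mean_field(beta, gamma, N, I0, dt=0.001, steps=50000):
    """Euler integration of Eqs. (26)-(27) with the mean-field closure
    [SI] ~ [S][I] (an approximation, exact only as correlations vanish)."""
    S, I = N - I0, I0
    for _ in range(steps):
        SI = S * I                       # closure assumption
        dS = -beta * SI / N + gamma * I  # Eq. (26)
        S, I = S + dt * dS, I - dt * dS  # Eq. (27): d[I]/dt = -d[S]/dt
    return I

I_end = sis_mean_field(beta=2.0, gamma=1.0, N=1000, I0=10)
print(I_end / 1000)  # ≈ 1 - gamma/beta = 0.5
```

In the true stochastic system [SI] ≠ [S][I] in general, which is exactly why the PGF formulation above tracks the full distribution rather than just the means.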
3.5.2 SIR
Now we consider the SIR model. A review of various techniques (including PGF-based methods) to find
the final size distribution of outbreaks in finite-size populations can be found in [23]. Here we focus on the
application of PGFs to find the full dynamics. For a given s and i, infection occurs at rate βsi/N. It appears
as a departure from the state (s, i) and entry into (s − 1, i + 1). Following property A.11, this is captured by

    (β/N)(y² − xy) ∂²Ξ/∂x∂y .

Recovery is captured by

    γ(1 − y) ∂Ξ/∂y

[note the difference from the SIS case in the recovery term]. So we have

    ∂Ξ/∂t = (β/N)(y² − xy) ∂²Ξ/∂x∂y + γ(1 − y) ∂Ξ/∂y          (28)
We follow similar steps to example 3.3 to derive equations for [S] and [I] in Exercise 3.16. The result of
this exercise should show
Corollary 3.4 For SIR disease, the expected numbers of susceptible, infected, and recovered individuals
solve

    d[S]/dt = −(β/N)[SI]                 (29)
    d[I]/dt = (β/N)[SI] − γ[I]           (30)
    d[R]/dt = γ[I]                       (31)

where [SI] is the expected value of the product si.
3.6 Exercises
Exercise 3.1 Extinction Probability
Let β and γ be given with µ̂(y) = (βy 2 + γ)/(β + γ).
a. Analytically find solutions to y = µ̂(y).
b. Assume β < γ. Find all solutions in [0, 1].
c. Assume β > γ. Find all solutions in [0, 1].
Exercise 3.2 Consistency with discrete-time formulation.
Although we have argued that a transmission in the continuous-time disease transmission case can be
treated as if a single infected individual has two infected offspring and then disappears, this is not what
actually happens. In this exercise we look at the true offspring distribution of an infected individual before
recovery, and we show that the ultimate predictions of the two versions are equivalent.
Consider a disease in which individuals transmit at rate β and recover at rate γ. Let pi be the probability
an infected individual will cause exactly i new infections before recovering.
b. Explain why pᵢ = βⁱγ/(β + γ)^(i+1), so the pᵢ form a geometric distribution.

c. Show that µ(y) = Σᵢ pᵢ yⁱ can be expressed as µ(y) = γ/(β + γ − βy). [This definition of µ without
   the hat corresponds to the discrete-time definition.]

d. Show that the solutions to y = µ(y) are the same as the solutions to y = µ̂(y) = (βy² + γ)/(β + γ),
   so the extinction probability can be calculated either way. (You do not have to find the solutions to
   do this; you can simply show that the two equations are equivalent.)
Exercise 3.3 Relation with R0
Take µ(y) = γ/(β + γ − βy) as given in exercise 3.2 and µ̂(y) = (βy² + γ)/(β + γ).

a. Show that µ′(1) ≠ µ̂′(1) in general.

b. Show that when R₀ = µ′(1) = 1, then µ′(1) = µ̂′(1) = 1, so both are still threshold parameters.
Exercise 3.4 Revisiting eventual extinction probability.
We revisit the results of exercise 3.1 using Eq. (17) (without solving it).
a. Following the derivation of Eq. (16), approximate ϕ0 (∆t), ϕ1 (∆t), and ϕ2 (∆t) for small ∆t.
b. From biological grounds explain why if there are 0 infections at time ∆t then there are also 0 infections
at time t0 + ∆t.
c. If there is 1 infection at time ∆t, what is the probability of 1 infection at time t0 + ∆t?
d. If there are 2 infections at time ∆t, what is the probability of 1 infection at time t0 + ∆t?
e. Write ϕ1 (t0 + ∆t) in terms of ϕ0 (t0 ), ϕ1 (t0 ), ϕ1 (∆t), and ϕ2 (∆t).
f. Using the definition of the derivative, find an expression for ϕ̇1 in terms of ϕ1 (t) and ϕ2 (t).
Exercise 3.6 In this exercise we derive the PGF version of the forward Kolmogorov equations by directly
calculating the rate of change of the probabilities of the states. Define ϕj (t) to be the probability that there
are j active infections at time t.
We have the forward Kolmogorov equations:

    ϕ̇_j = β(j − 1)ϕ_{j−1} − (β + γ)jϕ_j + γ(j + 1)ϕ_{j+1}

a. Explain each term on the right hand side of the equation for ϕ̇_j.

b. By expanding Φ̇(y, t) = ∂/∂t Σ_j ϕ_j y^j, arrive at Equation (18).
Exercise 3.7 In this exercise we follow [3, 6] and derive the PGF version of the backward Kolmogorov
equations by directly calculating the rate of change of the probabilities of the states. Define ϕki (t) to be the
probability of i infections at time t given that there were k infections at time 0. Although we assume that
at time 0 there is a single infection, we will need to derive the equations for arbitrary k.
a. Explain why

    ϕ_{ki}(t + ∆t) = ϕ_{ki}(t) − k(β + γ)ϕ_{ki}(t)∆t + k(βϕ_{(k+1)i}(t) + γϕ_{(k−1)i}(t))∆t + O(∆t²)

b. Show that

    Φ̇(y, t|1) = −(β + γ)Φ(y, t|1) + βΦ(y, t|2) + γΦ(y, t|0)

c. Show that if we substitute Φ(y, t|k) = [Φ(y, t)]^k in place of Φ(y, t) in Eq. (18) the equation remains
   true with the initial condition y^k.

d. Show that if we substitute Φ(y, t|k) = [Φ(y, t)]^k in place of Φ(y, t) in equation (19) we do not get a
   true equation.
So Eq. (18) applies regardless of the initial condition, but Eq. (19) is only true for the specific initial
condition of one infection.
Exercise 3.9 Let Φ(y, t|k) be the PGF for the number of infections assuming there are initially k infections.
Derive the backward Kolmogorov equation for Φ(y, t|k). Note that some of the Φs in the derivation above
would correspond to Φ(y, t|1) and some of them to Φ(y, t|k).
Exercise 3.10 Comparison of the formulations
a. Using Eq. (18) derive an equation for α̇ where α(t) = Φ(0, t). What, if any, additional information
would you need to solve this numerically?
b. Using Eq. (19), derive Equation (17) for α̇ where α(t) = Φ(0, t). What, if any, additional information
would you need to solve this numerically?
Exercise 3.11 Full solution
a. Show that Eq. (19) can be rewritten as

    ∂Φ(y, t)/∂t = (γ − βΦ(y, t))(1 − Φ(y, t))

b. Using partial fractions, set up an integral which you could use to solve for Φ(y, t) analytically (you do
   not need to do all the algebra to solve it).
Exercise 3.12 Argue from their definitions that Φ(y, t) = Π(y, z, t)|z=1 .
Exercise 3.13 Derive Theorem 3.3 from Theorem 3.4.
Exercise 3.14 Derive Theorem 3.5 from Theorem 3.4.
Figure 9: (Left) A twelve-individual population, after the a priori assignment of who would transmit to
whom if ever infected by the SIR disease (the delay until transmission is not shown). Half of the nodes have
zero potential infectors and half have 3. Half of the nodes have 1 potential offspring and half have 2. So
the offspring distribution has PGF (x + x²)/2 while the ancestor distribution has PGF χ(x) = (1 + x³)/2.
(Middle) If node 6 is initially infected, the infection will reach node 4, which will transmit to 5 and 7,
and eventually infection will also reach 8 and 2 before further transmissions fail because nodes are already
infected. If however the infection were to start at 9, then it would reach 10, from which it would spread only to 2.
(Right) By tracing backwards from an individual, we can determine which initial infections would lead to
infection of that individual. For example individual 4 will become infected if and only if it is initially infected
or 0, 3, 5, 6, 7, 8, or 11 is an initial infection.
4 Large-time dynamics
We now look at how PGFs can be used to develop simple models of SIR disease spread in the large population
limit when the disease infects a nonzero fraction of the population. In this limit, the early-time approaches
derived before break down because depletion of the susceptible population is important. The later-time
models of Section 3.5 are impractical because of the N → ∞ limit and are more restricted due to the
continuous-time assumption.
v will become infected if and only if there is at least one directed path from an initially infected node u to
v. The time of v’s infection is given by the least sum of all paths from initially infected nodes to v. We note
that the transmission process could be quite complex: the duration of a node’s infection and the delays from
time of infection to time of onwards transmissions can have effectively arbitrary distributions, and we could
still build a similar directed graph.
This directed graph is a useful structure to study because it encodes the outbreak in a single static object,
as opposed to a dynamic process. There is significant study of the structure of such directed graphs [10, 13].
Much of it focuses on the size of the out-components of a node (that is, for a given node, what fraction of the
population can be reached following the edges forwards) or the in-components (that is, from what fraction
of the population is it possible to reach a given node by following edges forwards).
The ancestor distribution has PGF

    χ(x) = Σᵢ pᵢ xⁱ

where pᵢ is the probability that a random node in the directed graph has in-degree i. That is, there are
exactly i nodes that would directly transmit to the randomly chosen node if they were ever infected. So the
probability an individual is not infected, S(∞)/N, solves x = χ(x), choosing the smaller solution when two
solutions exist. Since the proportion infected is r(∞) = R(∞)/N = 1 − S(∞)/N, we can conclude
Theorem 4.1 Assume that an outbreak begins with a single infected individual and an epidemic results.
In the large N limit, the expected cumulative proportion infected r(∞) = R(∞)/N solves

    r(∞) = 1 − χ(1 − r(∞))

where χ(x) is the PGF of the ancestor distribution. If there are multiple solutions we choose the larger
solution for r(∞) in [0, 1].
Under common assumptions, the population is large, the average number of transmissions an individual
causes is R₀, and each recipient is selected uniformly at random. Under these assumptions the ancestor
distribution is Poisson with mean R₀, so χ(x) = e^(−R₀(1−x)). Then r(∞) = 1 − e^(−R₀ r(∞)).
Deriving this result does not depend on the duration of infections, or even on the distribution of factors
affecting infectiousness. The assumptions required are that an epidemic starts from a single infected individ-
ual, that each transmission reaches a randomly chosen member of the population, that all individuals have
equal susceptibility, and the average individual will transmit to R0 others. This result is general across a
wide range of assumptions about the infectious process.
Restating this we have:
Corollary 4.1 Assume that an SIR disease is spreading in a well-mixed population with homogeneous
susceptibility. Assuming that the initial fraction infected is infinitesimal and an epidemic occurs, the final
size satisfies
r(∞) = 1 − e−R0 r(∞) (33)
where R0 is the reproductive number of the disease.
This explains many of the results of [32, 35], and our observation in example 2.3 that the epidemic size
depends on R0 and not on any other property of the offspring distribution. A closely-related derivation is
provided by [12, Section 1.3].
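Equation (33) is transcendental, but it is easily solved by the same kind of fixed-point iteration used for extinction probabilities. A small sketch (the helper name `final_size` is our own):

```python
import math

def final_size(R0, tol=1e-12):
    """Solve r = 1 - exp(-R0 * r) by fixed-point iteration, starting from
    r = 1 so that for R0 > 1 we converge to the larger (epidemic) root."""
    r = 1.0
    for _ in range(10000):
        r_new = 1.0 - math.exp(-R0 * r)
        if abs(r_new - r) < tol:
            return r_new
        r = r_new
    return r

print(final_size(2.0))  # ≈ 0.7968: about 80% infected when R0 = 2
```

Starting from r = 1 matters: r = 0 is always a (non-epidemic) solution of Eq. (33), and iterating from above avoids it.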
We can interpret this in the context of survival functions. The function (1 − ρ)χ(S(g − 1)/N ) gives the
probability that a node has lasted g generations without being infected.
4.4 Continuous-time SIR epidemic dynamics

We now move to continuous-time SIR epidemics. We allow for heterogeneity, assuming that each susceptible
individual u receives transmissions at rate κᵤβI(t)/N⟨K⟩, and that the PGF of κ is ψ(x) = Σ_κ P(κ)x^κ.
We assume κ takes only non-negative integer values.

For an initially susceptible individual u with a given κᵤ, the probability of not yet receiving a transmission
by time t solves ṡᵤ = −κᵤβI(t)sᵤ/N⟨K⟩, which has solution

    sᵤ = e^(−κᵤβ ∫₀ᵗ I(τ)dτ / N⟨K⟩) .

So we can write

    sᵤ = θ^κᵤ

where θ = e^(−β ∫₀ᵗ I(τ)dτ / N⟨K⟩) and

    θ̇ = −βθI/N⟨K⟩ .

Considering a random individual of unknown κ, the probability she was initially susceptible is 1 − ρ and the
probability she has not received any transmissions is ψ(θ). So S = N(1 − ρ)ψ(θ). We also have

    Ṙ = γI = −(γN⟨K⟩/β) θ̇/θ ,

so

    R = −(γN⟨K⟩/β) ln θ .

Taking I = N − S − R we get

    I = N[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]

and so θ̇ becomes

    θ̇ = −(βθ/⟨K⟩)[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]
β
Theorem 4.3 Assuming that at time t = 0 a fraction ρ of the population is randomly infected and that
the susceptible individuals each have a κ such that they become infected as a Poisson process with rate
κβI/N⟨K⟩, in the large population limit we have

    S = N(1 − ρ)ψ(θ)                                              (35a)
    I = N[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]                        (35b)
    R = −(γN⟨K⟩/β) ln θ                                           (35c)
    θ̇ = −(βθ/⟨K⟩)[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]                (35d)

and initial condition

    θ(0) = 1 .                                                    (35e)
As in the discrete-time case, this can be interpreted as a survival function formulation of the SIR model.
Most, if not all, mass-action formulations of the SIR model can be re-expressed in a survival function
formulation. Some examples are shown in the Exercises.
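With κ ≡ 1 (so ψ(x) = x and ⟨K⟩ = 1), System (35) is governed by the single ODE (35d), yet it reproduces the full mass-action SIR trajectory. A sketch comparing the two formulations numerically (our own illustration; parameter values are arbitrary):

```python
import math

beta, gamma, rho, dt = 2.0, 1.0, 0.01, 1e-4
theta = 1.0                 # survival-function formulation, N = 1, psi(x) = x
S, I = 1.0 - rho, rho       # standard mass-action SIR, N = 1

max_gap = 0.0
for _ in range(int(30.0 / dt)):
    # One Euler step of Eq. (35d) with <K> = 1:
    I_theta = 1.0 - (1.0 - rho) * theta + (gamma / beta) * math.log(theta)
    theta += dt * (-beta * theta * I_theta)
    # One Euler step of the usual three SIR equations:
    dS = -beta * S * I
    dI = beta * S * I - gamma * I
    S, I = S + dt * dS, I + dt * dI
    # Eq. (35a) says S should equal (1 - rho) * theta at all times:
    max_gap = max(max_gap, abs((1.0 - rho) * theta - S))

print(max_gap)  # small: the one-ODE and three-ODE formulations agree
```

The residual difference comes only from the discretization; in continuous time the two systems are exactly equivalent.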
Some very similar systems of equations are developed in [27, chapter 6] and [37, 46, 47, 34] where the
focus is on networks for which the value of κ not only affects the probability of becoming infected, but also
of transmitting further. These references focus on the assumption that an individual’s infector remains a
contact after transmission, but they contain techniques for studying partnerships with varying duration.
4.5 Exercises
Exercise 4.1 Ancestor distribution for homogeneous well-mixed population.
Consider an SIR disease in a well-mixed population having N individuals and a given R0 . Let v be a
randomly chosen individual from the directed graph created by placing edges from each node to all those
nodes they would transmit to if infected.
a. Show that if the average number of offspring is R0 , then so is the average number of infectors.
b. If there are exactly R₀N edges in the directed graph and each recipient is chosen uniformly at random
   from the population (independent of any previous choice), argue that the number of transmissions v
   receives has a binomial distribution with R₀N trials and success probability 1/N in each trial.
   (Technically we must allow edges from v to v.)
c. Argue that if R0 remains fixed as N → ∞, then the number of transmissions v receives is Poisson
distributed with mean R0 .
Exercise 4.2 Explain why for large N the probability v is still susceptible at generation g if she was initially
susceptible is χ(S(g − 1)/N ).
Exercise 4.3 Use Theorem 4.2 to derive a result like Theorem 4.1, but with nonzero ρ.
Exercise 4.4 Final size relations
Consider the continuous time SIR dynamics as given in System (35)
a. Assume κ = 1 for all individuals, and write down the corresponding equations for S, I, R, and θ.
b. At large time I → 0, so S(∞) = N − R(∞). But also S(∞) = S(0)ψ(θ(∞)). By writing θ(∞) in
   terms of R(∞), derive an implicit relation for r(∞) = R(∞)/N in terms of r(∞) and R₀ = β/γ.
c. Comment on the relation between your result and Theorem 4.1
Exercise 4.5 Other relations
a. Using the equations from Exercise 4.4, derive the peak prevalence relation, an expression for the
maximum value of I. [at the maximum I˙ = 0, so we start by finding θ so that Ṡ + Ṙ = 0.]
b. Similarly, find the peak incidence relation, an expression for the maximum rate at which infections
occur, −Ṡ.
Exercise 4.6 Alternate derivation of sᵤ.
If the rate of transmissions to u is βIκᵤ/N⟨K⟩, then the expected number of transmissions u has received
by time t is βκᵤ ∫₀ᵗ I(τ) dτ / N⟨K⟩, and the number received is Poisson distributed with this mean.

a. Let fᵤ(x) be the PGF for the number of transmissions u has received. Find an expression for fᵤ(x)
   in terms of the integral ∫₀ᵗ I(τ)dτ.

b. Explain why fᵤ(0) is the probability u is still susceptible.

c. Find fᵤ(0).
Exercise 4.7 Alternate derivation of Theorem 4.3 in the homogeneous case.
The usual homogeneous SIR equations are

    Ṡ = −βIS/N
    İ = βIS/N − γI
    Ṙ = γI

We will derive system (35) for fixed κ = 1 from this system through the use of an integrating factor. Set
θ = e^(−β ∫₀ᵗ I(τ)dτ / N).

a. Show that θ̇ = −βIθ/N and so θ̇/θ = −βṘ/Nγ.

b. Using the equation for Ṡ, add βIS/N to both sides and then divide by θ (the factor 1/θ is an integrating
   factor). Show that the expression on the left hand side is d/dt (S/θ) and so

    d/dt (S/θ) = 0 .
5 Multitype populations
We now briefly discuss how PGFs can be applied to multitype populations. This section is intended primarily
as a pointer to the reader to show that it is possible to apply these methods to such populations. We do not
perform a detailed analysis.
Many populations can be divided into subgroups. These may be patches in a metapopulation model,
genders in a heterosexual sexually transmitted infection model, age groups in an age-structured population,
or any of a number of other groupings. Applications of PGFs to such models have been studied in multiple
contexts [28, 44].
Let p_{i₁,i₂,...,i_M|k} be the probability that an infected individual of group k causes iℓ new
infections in group ℓ. Define α_{g|k} to be the probability that a chain of infections starting from an individual
of group k becomes extinct within g generations.
It is straightforward to show that if we define

    ψₖ(x₁, x₂, . . . , x_M) = Σ_{i₁,...,i_M} p_{i₁,...,i_M|k} x₁^{i₁} x₂^{i₂} · · · x_M^{i_M}

then

    α_{g|k} = Σ_{i₁,...,i_M} p_{i₁,...,i_M|k} α_{g−1|1}^{i₁} α_{g−1|2}^{i₂} · · · α_{g−1|M}^{i_M}

After converting this into vectors we get α⃗₁ = ψ⃗(0⃗). Iterating g times we have

    α⃗_g = ψ⃗^[g](0⃗)                                          (36)
Setting α⃗ to be the limit as g goes to infinity, we find the extinction probabilities. Specifically, the k-th
component of α⃗ is the probability of extinction given that the first individual is of type k. Thus we have:

Theorem 5.1 Let

• α⃗_g = (α_{g|1}, α_{g|2}, . . . , α_{g|M}) where α_{g|k} is the probability a chain of infections starting with a type k
  individual will end within g generations.

Then α⃗_g = ψ⃗^[g](0⃗), and the extinction probabilities are given by α⃗ = lim_{g→∞} α⃗_g.
We could have derived this directly by showing that the extinction probabilities solve α⃗ = ψ⃗(α⃗). In this
case it might not be obvious how to solve this multidimensional system of nonlinear equations or how to be
certain that the solution found is the appropriate one. However, by interpreting the iteration in Eqn. (36)
in terms of the extinction probability after g generations, it is clear that simply iterating starting from
α⃗₀ = 0⃗ will converge to the appropriate values. Additionally the values calculated in each iteration have a
meaningful interpretation.
Example 5.1 Consider a population made up of many large communities. We assume an unfamiliar dis-
ease is spreading through the population. When the disease begins to spread in a community, the community
learns to recognize the disease symptoms and infectiousness declines. We assume that we can divide the
population into 3 types: primary cases T0 , secondary cases T1 , and tertiary cases T2 . The infectiousness of
primary cases is higher than that of secondary cases which is higher than that of tertiary cases. Within a
community a primary case can cause secondary cases, while secondary and tertiary cases can cause tertiary
cases. All cases can cause new primary cases in other communities. We ignore multiple introductions to
the same community.
We define n_{ij} to be the number of infections of type Tᵢ caused by a type Tⱼ individual, and we assume
that we know the joint distributions p_{n₀₀,n₁₀}, p_{n₀₁,n₂₁}, and p_{n₀₂,n₂₂}. We define

    ψ₁(x, y, z) = Σ_{n₀₀,n₁₀} p_{n₀₀,n₁₀} x^{n₀₀} y^{n₁₀}
    ψ₂(x, y, z) = Σ_{n₀₁,n₂₁} p_{n₀₁,n₂₁} x^{n₀₁} z^{n₂₁}
    ψ₃(x, y, z) = Σ_{n₀₂,n₂₂} p_{n₀₂,n₂₂} x^{n₀₂} z^{n₂₂}

We define α⃗₀ = (0, 0, 0) and set α⃗_g = (ψ₁(α⃗_{g−1}), ψ₂(α⃗_{g−1}), ψ₃(α⃗_{g−1})). Then taking α⃗ to be the limit as
g → ∞, the first entry of α⃗ is the probability that the disease goes extinct starting from a single primary
case.
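To make Example 5.1 concrete, suppose (purely for illustration; these distributions and means are our own assumption, not part of the example) that each case causes independent Poisson numbers of new cases, so each ψₖ is a product of Poisson PGFs. The iteration α⃗_g = ψ⃗(α⃗_{g−1}) then converges to the extinction probabilities:

```python
import math

def psi(a):
    """PGFs for Example 5.1, with each case count assumed (for illustration)
    to be an independent Poisson variable; the means are arbitrary choices."""
    x, y, z = a
    return (math.exp(0.8 * (x - 1) + 2.0 * (y - 1)),   # psi_1: primary case
            math.exp(0.4 * (x - 1) + 1.2 * (z - 1)),   # psi_2: secondary case
            math.exp(0.2 * (x - 1) + 0.6 * (z - 1)))   # psi_3: tertiary case

alpha = (0.0, 0.0, 0.0)        # alpha_0 = (0, 0, 0)
for _ in range(500):           # alpha_g = psi(alpha_{g-1})
    alpha = psi(alpha)

print(alpha)  # extinction probabilities by type of the initial case
```

Because the process is supercritical for these means, each component converges to a value strictly between 0 and 1, and the limit is a fixed point of ψ⃗.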
Theorem 5.2 If the rate of transmission from an infected individual in group j to group i is βij , then
5.3 Exercises
Exercise 5.1 Consider a vector-borne disease for which each infected individual infects a Poisson-distributed
number of vectors, with mean λ. Each infected vector causes i infections with probability pᵢ = πⁱ(1 − π) for
some π ∈ [0, 1]. This scenario corresponds to human infection lasting for a fixed time with some constant
transmission rate to vectors, and each vector having probability π of living to bite again after each bite and
transmitting with probability 1 if biting.

a. Let α_{g|1} and α_{g|2} be the probabilities that an outbreak would go extinct within g generations starting with an
   infected human or vector respectively. Find the vector-valued function ψ⃗(x⃗) = (ψ₁(x⃗), ψ₂(x⃗)). That
   is, what are the PGFs ψ₁(x₁, x₂) and ψ₂(x₁, x₂)?

b. Set λ = 3 and π = 0.5. Find the probability of an epidemic if one infected human is introduced.

c. For the same values, find the probability of an epidemic if one infected vector is introduced.

d. Find ψ₂(ψ₁(0, x), 0). How should we interpret the terms of its Taylor Series expansion?
Exercise 5.2 Starting from the equations

    Ṡᵢ = −(Sᵢ/Nᵢ) Σⱼ βᵢⱼ Iⱼ
    İᵢ = −γᵢIᵢ + (Sᵢ/Nᵢ) Σⱼ βᵢⱼ Iⱼ
    Ṙᵢ = γᵢIᵢ

take βᵢⱼ to be the transmission rate from type i individuals to a single type j individual, and assume all
infected individuals recover with the same rate γ.
Define θ = e^(−β(Σⱼ κⱼ ∫₀ᵗ Iⱼ(τ) dτ)/Σⱼ κⱼNⱼ) and define the PGF ψ(x) = Σᵢ (Nᵢ/N) x^κᵢ. Let S = Σᵢ Sᵢ,
I = Σᵢ Iᵢ, and R = Σᵢ Rᵢ.

a. Explain what assumptions this model makes about interactions between individuals in groups i and j.

b. Show that

    S = Nψ(θ)
    I = N − S − R
    Ṙ = γI
    θ̇ = −βθ (Σⱼ κⱼIⱼ)/(Σⱼ κⱼNⱼ)

   with θ(0) = 1.

c. Explain why (Σⱼ κⱼIⱼ)/(Σⱼ κⱼNⱼ) = 1 − (Σⱼ κⱼSⱼ)/(Σⱼ κⱼNⱼ) − (Σⱼ κⱼRⱼ)/(Σⱼ κⱼNⱼ).

d. Show that (Σⱼ κⱼSⱼ)/(Σⱼ κⱼNⱼ) = θψ′(θ)/ψ′(1).

e. Show that d/dt [(Σⱼ κⱼRⱼ)/(Σⱼ κⱼNⱼ)] = −(γ/β) θ̇/θ, and solve for (Σⱼ κⱼRⱼ)/(Σⱼ κⱼNⱼ) in terms of θ
   assuming Rⱼ(0) = 0 for all j.
6 Discussion
There are many contexts where we are interested in how a newly introduced infectious disease would spread.
We encounter situations like this in the spread of zoonotic infections such as monkeypox or Ebola, as well as
the importation of novel diseases such as Zika in the Americas or the reintroduction of locally eliminated
diseases such as malaria.
PGFs are an important tool for the analysis of epidemics, particularly at early stages. They allow us to
relate the individual-level transmission process to the distribution of outcomes. This allows us to take data
about the transmission process and make predictions about the possible outcomes, but it also allows us to
take observed outbreaks and use them to infer the individual-level transmission properties.
For SIR disease PGFs also provide a useful alternative formulation to the usual mass-action equations.
This formulation leads to a simple derivation of final-size relations and helps explain why previous studies
have shown that a wide range of disease assumptions give the same final size relation.
Our goal with this primer has been to introduce researchers to the many applications of PGFs to disease
spread. We have used the appendices to derive some of the more technical properties of PGFs. Additionally
we have developed a Python package Invasion PGF which allows for quick calculation of the results in the
first three sections of this primer. A detailed description of the package is in Appendix C. The software can
be downloaded at https://github.com/joelmiller/Invasion_PGF. Documentation is available within the
repository, starting with the file docs/ build/html/index.html. The supplementary information includes
code that uses Invasion PGF to generate the figures of Section 2.
Property A.2 Given a PGF f(x) = Σₙ rₙ xⁿ, the coefficient of xⁿ in its expansion for a particular n
can be calculated by taking n derivatives, evaluating the result at x = 0, and dividing by n!. That is,

    rₙ = (1/n!) (dⁿ/dxⁿ) f(x) |_{x=0}
This result holds for any function with a Taylor Series (it does not use any special properties of PGFs).
Exercise A.1 Prove Property A.2 [write out the sum and show that the derivatives eliminate any rₘ for
m < n, that the term with m = n becomes n!rₙ, and that the terms with m > n vanish at x = 0].
There are many contexts in which we can only calculate a function numerically. In this case the calculation
of these derivatives is likely to be difficult and inaccurate. An improved way to calculate the coefficients is
given by a Cauchy integral [38]. This is a standard result of Complex Analysis, and initially we simply take
it as given:

    rₙ = (1/2πi) ∮ f(z)/z^(n+1) dz

This integral can be done on a closed circle around the origin, z = Re^(iθ), in which case dz = iz dθ. Then rₙ
can be rewritten as

    rₙ = (1/2π) ∫₀^{2π} f(Re^(iθ))/(Re^(iθ))ⁿ dθ

Using another substitution, θ = 2πu, we find dθ = 2π du with u varying from 0 to 1. This integral becomes

    rₙ = ∫₀¹ f(Re^(2πiu))/(Rⁿ e^(2nπiu)) du

The integral on the right hand side can be approximated by a simple summation, and we find

    rₙ ≈ (1/M) Σ_{m=1}^{M} f(Re^(2πim/M))/(Rⁿ e^(2nπim/M))

for large M.

A few technical steps show that the PGF f(z) converges for any z with |z| ≤ 1 (any PGF is analytic
within the unit circle, and the PGF converges everywhere on the unit circle: the coefficients are all positive
or zero and the sum converges for z = 1, so it converges absolutely on the unit circle). Thus this integral
can be performed for any positive R ≤ 1. We have found that the unit circle (R = 1) yields remarkably
good accuracy, so we recommend using it unless there is a good reason not to. Some discussion of
identifying the optimal radius appears in [9].

Thus we have

Property A.3 Given a PGF f(x), the coefficient rₙ of xⁿ in its expansion can be calculated by the integral

    rₙ = ∫₀¹ f(Re^(2πiu))/(Rⁿ e^(2nπiu)) du                       (38)

which can be approximated by the sum

    rₙ ≈ (1/M) Σ_{m=1}^{M} f(Re^(2πim/M))/(Rⁿ e^(2nπim/M))         (39)

with R ≤ 1 (in practice R = 1) and M ≫ 1.

It turns out that this approach is closely related to the approach used to find a particular coefficient of a
Fourier Series. Once the variable is changed from z to θ, our function is effectively a Fourier Series in θ, and
the integral is the standard approach to finding the nth coefficient of a Fourier Series.
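A minimal sketch of Property A.3 in practice (the helper name `pgf_coeff` is our own): we approximate the integral by the M-point sum on the unit circle and recover the coefficients of a geometric distribution's PGF:

```python
import cmath

def pgf_coeff(f, n, M=256, R=1.0):
    """Estimate the coefficient of x^n in the PGF f via the M-point
    approximation (Eq. (39)) to the Cauchy integral of Property A.3."""
    total = 0.0
    for m in range(1, M + 1):
        u = m / M
        total += f(R * cmath.exp(2j * cmath.pi * u)) / (R ** n * cmath.exp(2j * cmath.pi * n * u))
    return (total / M).real

# Geometric distribution p_k = (1/2)^(k+1) has PGF f(z) = 1/(2 - z).
f = lambda z: 1 / (2 - z)
print([round(pgf_coeff(f, n), 6) for n in range(4)])  # [0.5, 0.25, 0.125, 0.0625]
```

The error of the M-point sum is the aliased tail Σ_{j≥1} r_{n+jM}, so for distributions with geometrically decaying coefficients it reaches machine precision quickly.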
Exercise A.2 Verification of Equation (38):
In this exercise we show that the formula in Equation (38) yields rn . Assume that the integral is
performed on a circle of radius R ≤ 1 about the origin.
a. Write f(z) = Σₘ rₘ zᵐ and rewrite ∫₀¹ f(Re^(2πiu))/(Rⁿ e^(2nπiu)) du as a sum:

    ∫₀¹ f(Re^(2πiu))/(Rⁿ e^(2nπiu)) du = Σₘ rₘ R^(m−n) ∫₀¹ e^(2(m−n)πiu) du
b. Show that for m = n the integral in the summation on the right hand side is 1.
c. Show that for m ̸= n, the integral in the summation on the right hand side is 0.
d. Thus conclude that the integral on the left hand side must yield rn .
Exercise A.3 Let f (z) = ez = 1 + z + z 2 /2 + z 3 /6 + z 4 /24 + z 5 /120 + · · · . Write a program that estimates
r0 , r1 , . . . , r5 using Equation (39) with R = 1. Report the values to four significant figures for
a. M = 2
b. M = 4
c. M = 5
d. M = 10
e. M = 20.
Property A.5 The expected value of a random variable i whose distribution has PGF f (x) is given by
E(i) = f ′ (1).
It is straightforward to derive relationships for E(i2 ) and higher order moments by repeated differentiation
of f and evaluating the result at 1.
Example A.1 Consider a weighted coin which comes up ‘Success’ with probability p and ‘Failure’
with probability 1 − p. We play a game in which we stop at the first failure, and otherwise flip again.
Define f(x) = px + 1 − p.
Let αg be the probability of failure within the first g flips. Then α₀ = 0 and α₁ = 1 − p = f(0) are easily
calculated.
More generally, the probability of starting the game and failing immediately is α₁ = 1 − p = f(0), while
the probability of having a success and flipping again is p, at which point the probability of failure within
the next g − 1 flips is α_{g−1}. So we have αg = (1 − p) + pα_{g−1} = f(α_{g−1}), and using induction we can show
that the probability of failure within g flips is f^[g](0).
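For this particular game the iterate has a closed form: surviving g flips requires g consecutive successes, so αg = 1 − p^g. A small sketch (the helper name is our own) confirming that f^[g](0) reproduces this:

```python
def iterate_pgf(f, g):
    """Compute the g-fold composition f^[g](0)."""
    x = 0.0
    for _ in range(g):
        x = f(x)
    return x

p = 0.3
f = lambda x: (1 - p) + p * x   # PGF from Example A.1
print(iterate_pgf(f, 5), 1 - p ** 5)  # both give 0.99757
```

The same `iterate_pgf` helper works for any offspring PGF, which is how extinction probabilities are computed throughout the primer.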
Exercise A.4 The derivation in example A.1 was based on looking at what happened after a single flip
and then looking g − 1 flips into the future in the inductive step. Derive αg = f (αg−1 ) by instead looking
g − 1 flips into the future and then considering one additional step. [the distinction between this argument
and the previous one becomes useful in the continuous-time case where we use the ‘backward’ or ‘forward’
Kolmogorov equations.]
Exercise A.5 Consider a fair six-sided die with faces numbered 0, 1, . . . , 5, rather than the usual 1, . . . , 6. We
roll the die once. Then we look at the result, and roll that many copies (if zero, we stop), then we look at
the sum of the result and repeat. Define

    f(x) = (1 + x + · · · + x⁵)/6 = { (x⁶ − 1)/(6(x − 1))   x ≠ 1
                                    { 1                     x = 1

Define αg to be the probability the process has stopped by the g-th iteration (with α₀ = 0 and α₁ = 1/6).

a. Find an expression for αg, the probability that by the g-th iteration the process has stopped, in terms
   of f(x).

b. Rephrase this question in terms of the extinction probability for an infectious disease.
Processes like that in Exercise A.5 can be thought of as “birth-death” processes where each event generates
a discrete number of new events. Our examples above show that function composition arises naturally in
calculating the probability of extinction in a birth-death process. We show below that it also arises naturally
when we want to know the distribution of population sizes after some number of generations rather than
just the probability of 0.
Specifically, we often assume an initially infected individual causes some random number of new infections
i from some distribution. Then we assume that each of those new infections independently causes an
additional random number of infections from the same distribution. We will be interested in how to get from
the one-generation PGF to the PGF for the distribution after g generations.
We derive this in a few stages.
• We first show that if we take two numbers from different distributions with PGFs f(x) and h(x), then
  their sum has a distribution whose PGF is f(x)h(x) [Property A.6]. Then inductively applying this we
  conclude that the distribution of the sum of n numbers from a distribution with PGF f(x) has PGF [f(x)]ⁿ.
• We also show that if the probability we take a number from the distribution with PGF f (x) is π1 and
the probability we take it from the distribution with PGF h(x) is π2 , then the PGF of the resulting
distribution is π1 f (x) + π2 h(x) [Property A.7].
• Putting these two properties together, we can show that if we choose i from a distribution with PGF
f (x) and then choose i different values from a distribution with PGF h(x), then the sum of the i values
has PGF f (h(x)) [Property A.8].
Our main use of Properties A.6 and A.7 is as stepping stones towards Property A.8.
Consider two probability distributions; let rᵢ be the probability of i for the first distribution and qⱼ be
the probability of j for the second distribution. Assume they have PGFs f(x) = Σᵢ rᵢxⁱ and h(x) = Σⱼ qⱼxʲ
respectively.
We are first interested in the process of choosing i from the first distribution, j from the second, and
adding them. In the disease context this arises where the two distributions give the probability that one
individual infects i and the other infects j and we want to know the probability of a particular sum.
The probability of obtaining a particular sum k is

    Σᵢ rᵢ q_{k−i} .

So the PGF of the sum is Σₖ (Σ_{i=0}^{k} rᵢ q_{k−i}) xᵏ. By inspection, this is equal to the product f(x)h(x). This
means that the PGF of the process where we choose i from the first and j from the second and look at the
sum is the product f(x)h(x).
We have shown

Property A.6 Consider two probability distributions, r_0, r_1, . . . and q_0, q_1, . . . with PGFs f(x) = Σ_i r_i x^i
and h(x) = Σ_j q_j x^j. Then if we choose i from the distribution r_i and j from the distribution q_j, the PGF
of their sum is f(x)h(x).
Usually we want the special case where we choose two numbers from the same distribution having PGF
f(x). The PGF for the sum is [f(x)]^2. The PGF for the sum of three numbers from the same distribution
can be thought of as the result of combining [f(x)]^2 and f(x), yielding [f(x)]^3. By induction, it follows that the PGF
for the sum of i numbers is [f(x)]^i.
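Numerically, Property A.6 amounts to a discrete convolution of the coefficient arrays. The short sketch below (plain Python; the offspring distribution is illustrative) checks both the product rule and the [f(x)]^i power rule:

```python
# Represent a distribution by its coefficient list: p[i] = P(value = i).
# Multiplying PGFs (Property A.6) corresponds to convolving these lists.

def pgf_product(p, q):
    """Coefficients of f(x)h(x): the distribution of the sum of one
    draw from p and one independent draw from q."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, pi in enumerate(p):
        for j, qj in enumerate(q):
            out[i + j] += pi * qj
    return out

# Illustrative offspring distribution with PGF f(x) = (1 + x + x^2 + x^3)/4.
r = [0.25, 0.25, 0.25, 0.25]

two = pgf_product(r, r)               # coefficients of [f(x)]^2
print(two[2])                         # P(sum = 2) = 3/16 = 0.1875

five = r                              # coefficients of [f(x)]^5 by induction
for _ in range(4):
    five = pgf_product(five, r)
print(sum(five))                      # total probability is still 1.0
```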
Now we want to know what happens if we are not sure what the current system state is. For example,
we might not know if we have 1 or 2 infected individuals, and the outcome at the next generation is different
based on which it is.
We use the distributions r_i and q_j. We assume that with probability π_1 we choose a random number k
from the r_i distribution, while with probability π_2 = 1 − π_1 it is chosen from the q_j distribution. Then the
probability of a particular value k occurring is π_1 r_k + π_2 q_k, and the resulting PGF is Σ_k (π_1 r_k + π_2 q_k) x^k =
π_1 f(x) + π_2 h(x). This becomes:

Property A.7 Consider two probability distributions with PGFs f(x) and h(x). If with probability π_1 we choose
a number from the first distribution and with probability π_2 = 1 − π_1 from the second, then the PGF of the
resulting distribution is π_1 f(x) + π_2 h(x).
We finally consider a process in which we have two distributions with PGFs f(x) = Σ_i r_i x^i and h(x) =
Σ_j q_j x^j. We choose the number i from the distribution r_i and then take the sum of i values chosen from
the q_j distribution, Σ_{ℓ=1}^{i} j_ℓ. Both the number of terms in the sum and their values are random variables.
Using the results above, the PGF of the resulting sum is Σ_i r_i [h(x)]^i = f(h(x)). Thus we have

Property A.8 Consider two probability distributions with PGFs f(x) and h(x). If we choose i from the first
distribution and then sum i values chosen independently from the second, the PGF of the sum is f(h(x)).
This property is closely related to the spread of infectious disease. An individual may infect i others, and
then each of them causes additional infections. The number of these second generation cases is the sum of i
random numbers Σ_{ℓ=1}^{i} j_ℓ where j_ℓ is the number of additional infections caused by the ℓ-th infection caused
by the initial individual. So if f(x) is the PGF for the distribution of the number of infections caused by the
first infection and h(x) is the PGF for the distribution of the number of infections caused by the offspring,
then f(h(x)) is the PGF for the number infected in the second generation [and if the two distributions are
the same this is f^[2](x)]. Repeated iteration gives us the distribution after g generations.
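The composition in Property A.8 can likewise be carried out on coefficient arrays. The sketch below (illustrative offspring distribution; the helper names are our own) computes the distribution after two generations and checks that its mean is R_0^2:

```python
def poly_mul(p, q):
    """Coefficients of the product of two polynomials."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def compose(f, h):
    """Coefficients of f(h(x)) (Property A.8): the distribution of the
    sum of i draws from h, where i is itself drawn from f."""
    result, h_power = [0.0], [1.0]            # h(x)^0 = 1
    for fi in f:
        padded = result + [0.0] * (len(h_power) - len(result))
        result = [a + fi * b for a, b in zip(padded, h_power)]
        h_power = poly_mul(h_power, h)
    return result

# Offspring PGF f(x) = (1 + x + x^2 + x^3)/4, so R_0 = 1.5.
r = [0.25, 0.25, 0.25, 0.25]

gen2 = compose(r, r)     # distribution of the number infected in generation 2
mean2 = sum(k * p for k, p in enumerate(gen2))
print(mean2)             # 2.25, i.e. R_0^2
print(gen2[0])           # 0.33203125 = f(f(0)): no cases in generation 2
```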
Exercise A.6 Note that if we interchange p and q in the PGF of the negative binomial distribution in
Table 1, it is simply the PGF of the geometric distribution raised to the power r̂. A number chosen from the
negative binomial can be defined as the number of successful trials (each with success probability p) before
the r̂th failure.
Using this and Property A.8, derive the PGF of the negative binomial.
Exercise A.7 Sicherman dice [18, 17].
To motivate this exercise consider two tetrahedral dice, numbered 1, 2, 3, 4. When we roll them we get
sums from 2 to 8, each with its own probability, which we can infer from this table:
2 3 4 5
3 4 5 6
4 5 6 7
5 6 7 8
However another pair of tetrahedral dice, labelled 1, 2, 2, 3 and 1, 3, 3, 5 yields the same sums with the same
probabilities:
2 3 3 4
4 5 5 6
4 5 5 6
6 7 7 8
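This claim is easy to confirm by brute force. The sketch below (plain Python) tabulates the sums for both pairs of tetrahedral dice and checks that the counts agree:

```python
from collections import Counter
from itertools import product

def sum_distribution(die_a, die_b):
    """Counts of each possible sum of one roll of each die."""
    return Counter(a + b for a, b in product(die_a, die_b))

standard  = sum_distribution([1, 2, 3, 4], [1, 2, 3, 4])
alternate = sum_distribution([1, 2, 2, 3], [1, 3, 3, 5])
print(standard == alternate)   # True: the sum distributions agree
```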
We now try to find a similar pair for 6-sided dice. First consider a pair of standard 6-sided dice.
a. Show that the PGF of each die is f(x) = (x + x^2 + x^3 + x^4 + x^5 + x^6)/6.
b. Fill in the tables showing the possible sums from rolling two dice (fill in each square with the sum of
the two entries) and multiplication for two polynomials (fill in each square with the product of the two
entries):
     x^1  x^2  x^3  x^4  x^5  x^6
x^1
x^2
x^3
x^4
x^5
x^6
The PGF of a standard die factors as
f(x) = x(1 + x + x^2 + x^3 + x^4 + x^5)/6
     = x(1 + x + x^2)(1 + x^3)/6
     = x(1 + x + x^2)(1 + x)(1 − x + x^2)/6.
This cannot be factored further, and indeed it can be shown that a property similar to prime numbers holds.
Namely, any factorization of f (x)f (x) as h1 (x)h2 (x) has the property that each of h1 and h2 can be factored
into some powers of these “prime” polynomials times a constant.
We seek two new six-sided dice (each different) such that the sum of a roll of the two dice has the same
probabilities as the normal dice. The two dice have positive integer values on them (so no fair adding a
constant c to everything on one die and subtracting c on the other). Let h1 (x) and h2 (x) be their PGFs.
e. Explain why we must have h1 (x)h2 (x) = [f (x)]2 .
f. P
If the dice have numbers
ai
P a1bi, . . . , a6 and b1 , . . . , b6 , show that their PGFs are of the form h1 (x) =
i x /6 and h2 (x) = i x /6 where all ai and bi are positive integers.
g. Given the properties we want for the dice, find h1 (0) and h2 (0).
h. Given the properties we want for the dice, find h1 (1) and h2 (1).
i. Using the values at x = 0 and x = 1, explain why h_1(x) = x(1 + x + x^2)(1 + x)(1 − x + x^2)^b / 6 and
h_2(x) = x(1 + x + x^2)(1 + x)(1 − x + x^2)^{2−b} / 6 where b is 0, 1, or 2.
j. The case b = 1 gives the normal dice. Consider b = 0 (b = 2 gives the same final result). Find h_1(x) and show that
h_2(x) = (x + x^3 + x^4 + x^5 + x^6 + x^8)/6.
k. Create the table for the two dice corresponding to h_1(x) and h_2(x) and verify that the sums occur with
the same probabilities as for a pair of standard dice.
b. If an infected individual causes anywhere from 1 to 6 infections, all with equal probability, find the
PGF for the number of infections in generation 2 if there is one infection in generation 0. [you can
express the result in terms of f]
c. And in generation g (assuming depletion of susceptibles is unimportant)?
Figure 10: Cobweb diagrams: We take the function f(x) = (1 + x^3)/2. A cobweb diagram is built
by alternately drawing vertical lines from the diagonal to f(x) and then horizontal lines from f(x) to the
diagonal. The dashed lines show α_g = f(α_{g−1}) starting with α_0 = 0 and highlight the relation to the
iterative process.
Then at x0 we draw a vertical line to the curve y = f (x). We draw a horizontal line to the line y = x [which
will be at the point (x1 , x1 )]. We then repeat these steps, drawing a vertical line to y = f (x) and a horizontal
line to y = x. Cobweb diagrams are particularly useful in studying behavior near fixed or periodic points.
Exercise A.9 Understanding cobweb diagrams
From Figure 10 the origin of the term “cobweb” may be unclear. Because of properties of PGFs, the
more interesting behavior does not occur for our applications. Here we investigate cobweb diagrams in more
detail for non-PGF functions. Since we use f (x) to denote a PGF, in this exercise we use z(x) for an
arbitrary function.
a. Consider the line z(x) = 2(1 − x)/3. Starting with x0 = 0, show how the first few iterations of
xi = z(xi−1 ) can be found using a cobweb diagram (do not explicitly calculate the values).
b. Now consider the line z(x) = 2(1 − x). The solution to z(x) = x is x = 2/3. Starting from an initial
x0 close to (but not quite equal to) 2/3, do several iterations of the cobweb diagram graphically.
c. Repeat this with the lines z(x) = 1/4 + x/2 starting at x0 = 0 and z(x) = −1 + 3x starting close to
where x = z(x).
d. What is different when the slope is positive or negative?
e. Can you predict what condition on the slope’s magnitude leads to convergence to or divergence from
the solution to x = z(x) when z is a line?
So far we have considered lines z(x). Now assume z(x) is nonlinear and consider the behavior of cobweb
diagrams close to a point where x = z(x).
f. Use Taylor Series to argue that (except for degenerate cases where z ′ is 1 at the intercept) it is only
the slope at the intercept that determines the behavior sufficiently close to the intercept.
Exercise A.10 Structure of fixed points of f(x).
Consider a PGF f(x) = Σ_i r_i x^i, and assume r_0 > 0.
b. Show that f (x) is convex (that is f ′′ (x) ≥ 0) for x > 0. [hint ri ≥ 0 for all i]
c. Thus argue that if f′(1) ≤ 1, then x = f(x) has only one solution in [0, 1], namely
x = 1. It may help to draw pictures of f(x) and the function y = x for x in [0, 1].
d. Explain why if there is a point x0 ̸= 1 where f (x0 ) = x0 and f (x) > x for x in some region (x0 , x1 )
then 0 < f ′ (x0 ) < 1.
e. Thus show that if f ′ (1) > 1 then there are exactly two solutions to x = f (x) in [0, 1], one of which is
x = 1.
These results suggest:

Property A.9 Assume f(x) = Σ_i r_i x^i is a PGF, and f(0) > 0.
• If f ′ (1) ≤ 1 then the only intercept of x = f (x) in [0, 1] is at x = 1.
• Otherwise, there is another intercept x∗ with 0 < x∗ < 1, and if x < x∗ then x < f(x) < x∗ while if
x > x∗ then x > f(x) > x∗; for 0 ≤ x_0 < 1, f^[g](x_0) converges monotonically to x∗.
The assumption r0 > 0 was used to rule out f (x) = x. Excluding this degenerate case, these results hold
even if r0 = 0, in which case we can show f ′ (1) > 1 and x∗ = 0.
To sketch the proof of this property, we note that clearly f (1) = 1, so if f (0) > 0 then either f (x) crosses
y = x at some intermediate 0 < x∗ < 1 or it does not cross until x = 1. Then using the fact that for x > 0
the slope of f is positive and increasing, we can inspect the cobweb diagram to see these results.
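Property A.9's convergence can be checked with the function from Figure 10, f(x) = (1 + x^3)/2, for which the intermediate intercept is the root of x^2 + x − 1 = 0 in (0, 1), x∗ = (√5 − 1)/2 ≈ 0.618. A minimal sketch:

```python
def f(x):
    return (1 + x**3) / 2      # the PGF from Figure 10; f'(1) = 3/2 > 1

alpha = 0.0                    # alpha_0 = 0
for g in range(100):           # iterate alpha_g = f(alpha_{g-1})
    alpha = f(alpha)

x_star = (5**0.5 - 1) / 2      # the root of x^2 + x - 1 in (0, 1)
print(abs(alpha - x_star) < 1e-12)   # True: monotone convergence to x*
```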
• The first involves assuming we know f (x, t) and then looking through all of the possible transitions to
find how the system changes going from t to t + ∆t. This will yield the forward Kolmogorov Equations.
• The second involves starting from the initial condition f (x, 0) and finding f (x, ∆t) by investigating all
of the possible transitions. Then taking f (x, ∆t) and f (x, t) we are able to find f (x, t + ∆t). This will
yield the backward Kolmogorov Equations.
When an event occurs, one individual is removed and replaced by m new individuals, so the count changes as
i ↦ i + m − 1.
For example, early in an epidemic we may assume that an infected individual causes new infections at rate
β. The outcome of an infection event is equivalent to the removal of the infected individual and replacement
by two infected individuals. Similarly, a recovery event occurs with rate γ and is equivalent to removal with
no replacement. So λ_2 = β, λ_0 = γ, and all other λ_m are 0.
Our events happen at a per-individual rate λm , so the total rate an event occurs across the population of
i individuals is λm i. Events that can be modeled like this include decay of a radioactive particle, recovery of
an infected individual, or division of a cell. We assume that different events may be possible, each having a
different m. If multiple events have the same effect on m (for example emigration or death), we can combine
their rates into a single λm .
It will be useful to define
Λ = Σ_m λ_m    and    h(x) = Σ_m λ_m x^m / Λ.
We can think of h(x) as the PGF for the number of new individuals given that a random event happens
(since λ_m/Λ is the probability that the random event introduces m individuals).
We start with one derivation of the equation for ḟ(x, t) based on directly calculating f(x, t + ∆t) and using
the definition of the derivative. An alternate way is shown in Exercise A.11. For small ∆t the probability that
multiple events occur in the same time interval is O(∆t^2), and we will see that this is negligible. Let us assume
the system has i individuals at time t, which occurs with probability r_i(t). For a given m, the probability
that the event occurs in the time interval given i is λ_m i ∆t + O(∆t^2), and 1 − Σ_m λ_m i ∆t + O(∆t^2) is
the probability that none of the events occur in the time interval and the system remains in state i.
If an event occurs, the system leaves the state corresponding to x^i and enters the state corresponding to
x^{i+m−1}. Summing over m and i, we have
f(x, t + ∆t) = Σ_i r_i(t) [ Σ_m (λ_m i ∆t) x^{i+m−1} + (1 − Σ_m λ_m i ∆t) x^i ] + O(∆t^2)
The O(∆t^2) corrects for the possibility of multiple events happening in the time interval.
A bit of algebra and separating the i and m summations shows that
f(x, t + ∆t) = Σ_i r_i(t) x^i + Σ_m λ_m ∆t (x^m − x) Σ_i r_i(t) i x^{i−1} + O(∆t^2)
             = f(x, t) + Σ_m λ_m (x^m − x) ∆t Σ_i r_i(t) (∂/∂x) x^i + O(∆t^2)
             = f(x, t) + ∆t [Σ_m λ_m x^m − x Σ_m λ_m] (∂/∂x) Σ_i r_i(t) x^i + O(∆t^2)
             = f(x, t) + Λ ∆t [h(x) − x] (∂/∂x) f(x, t) + O(∆t^2)
So we now have
∂f(x, t)/∂t = lim_{∆t→0} [f(x, t + ∆t) − f(x, t)] / ∆t
            = lim_{∆t→0} [Λ ∆t [h(x) − x] (∂/∂x) f(x, t) + O(∆t^2)] / ∆t
            = Λ [h(x) − x] (∂/∂x) f(x, t)
We finally have
Property A.10 Let f(x, t) = Σ_i r_i(t) x^i be the PGF for the probability of having i individuals at time
t. Assume several events indexed by m can occur, each with rate λ_m i, that remove one individual and
replace it with m. Let Λ = Σ_m λ_m be the total per-capita rate and h(x) = Σ_m λ_m x^m / Λ be the PGF of
the outcome of a random event. Then
∂f(x, t)/∂t = Λ [h(x) − x] ∂f(x, t)/∂x    (40)
The derivative serves the purpose of introducing the factor i into the coefficient of each term, which addresses the
fact that the rate at which events happen is proportional to the total count. The derivative has the additional effect
of reducing the exponent by 1, corresponding to the removal of one individual. The λ_m in the remaining
factor gives the per-capita rate of changing state. The x^m − x captures the fact that when an event occurs,
m individuals are added (a factor of x^m) while the system leaves the current state (which has an exponent of
x^i) at the same rate.
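As a concrete check of Equation (40), we can truncate the state space and integrate the coefficient equations ṙ_i = −Λ i r_i + Σ_m λ_m (i − m + 1) r_{i−m+1} directly (the form derived in Exercise A.11). The sketch below does this for the birth-death example (λ_2 = β, λ_0 = γ) with forward Euler steps; the truncation level N and step size dt are illustrative choices:

```python
# Numerical check of Equation (40) via the coefficient equations
#   dr_i/dt = -(beta + gamma) i r_i + beta (i-1) r_{i-1} + gamma (i+1) r_{i+1}
# for the birth-death example lambda_2 = beta, lambda_0 = gamma.
# N (truncation) and dt (Euler step) are illustrative choices.

beta, gamma = 2.0, 1.0
N, dt, T = 60, 1e-4, 1.0

r = [0.0] * (N + 1)
r[1] = 1.0                        # one infected individual at t = 0

for _ in range(int(round(T / dt))):
    dr = [0.0] * (N + 1)
    for i in range(N + 1):
        dr[i] = -(beta + gamma) * i * r[i]
        if i >= 1:
            dr[i] += beta * (i - 1) * r[i - 1]
        if i < N:
            dr[i] += gamma * (i + 1) * r[i + 1]
    r = [ri + dt * dri for ri, dri in zip(r, dr)]

print(round(r[0], 4))   # close to the exact extinction probability (e-1)/(2e-1) ~ 0.3873
```

For β = 2, γ = 1 the extinction probability by t = 1 for this birth-death process is (e − 1)/(2e − 1) ≈ 0.3873, matching the value 0.38730017 in the sample session of Appendix C.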
Exercise A.11 Alternate derivation of Equation (40)
An alternate way to derive Equation (40) is through directly calculating ṙi .
P P
a. Explain why ṙi = − m λm iri + m λm (i − m + 1)ri−m+1 .
We can generalize this to the case where there are multiple types of individuals. For the Forward
Kolmogorov equations, it is relatively straightforward to allow for interactions between individuals. We
may be interested in this generalization when considering predator-prey interactions or interactions between
infected and susceptible individuals if we are interested in depletion of susceptibles. We assume that there
are two types of individuals A and B with counts i and j respectively, and we let rij (t) denote the probability
of a given pair i and j. We define the PGF
f(x, y, t) = Σ_{i,j} r_{i,j}(t) x^i y^j
We assume that interactions between an A and a B individual occur with some rate proportional to the
product ij. We assume that the interaction removes both individuals and replaces them by m of type A and
n of type B. We denote the rate as µ_{m,n} ij, and the sum
M = Σ_{m,n} µ_{m,n}.
We also assume that individuals of type A spontaneously undergo changes as they did above, but they
can be replaced by type A and/or type B individuals. So one individual of type A is removed and replaced
by m individuals of type A and n of type B with rate λm,n , and the combined rate for one specific transition
over the entire set of individuals is λm,n i. We define
Λ = Σ_{m,n} λ_{m,n}.
We will ignore spontaneous changes by nodes of type B, but the generalization to include these can be found
by following the same method.
Finally, let
h(x, y) = Σ_{m,n} λ_{m,n} x^m y^n / Λ
and
g(x, y) = Σ_{m,n} µ_{m,n} x^m y^n / M
Property A.11 Let f(x, y, t) = Σ_{i,j} r_{i,j}(t) x^i y^j be the PGF for the probability of having i type A and j
type B individuals. Assume that events occur with rate λ_{m,n} i or µ_{m,n} ij to replace a single type A individual
or one of each type with m type A and n type B individuals. Let Λ = Σ_{m,n} λ_{m,n} and M = Σ_{m,n} µ_{m,n}.
Then
∂f(x, y, t)/∂t = Λ [h(x, y) − x] ∂f(x, y, t)/∂x + M [g(x, y) − xy] ∂²f(x, y, t)/∂x∂y    (41)
where h(x, y) = Σ_{m,n} λ_{m,n} x^m y^n / Λ is the PGF for the outcome of a random event whose rate is propor-
tional to i and g(x, y) = Σ_{m,n} µ_{m,n} x^m y^n / M is the PGF for the outcome of a random event whose rate is
proportional to ij.
This can be generalized if there are events whose rates are proportional only to j or if there are more
than two types. The exercise below shows how to generalize this if the rate of events depends on i in a more
complicated manner.
Exercise A.12 In many cases interactions between two individuals of the same type are important. These
may occur with rate i(i − 1) or i² depending on the specific details. Assume we have only a single type of
individual with PGF f(x, t) = Σ_i r_i(t) x^i.
a. If a collection of events to replace two individuals with m individuals occurs with rate β_m i(i − 1),
find how to write a PDE for f. Your final result should contain ∂²f(x, t)/∂x². Use B = Σ_m β_m and
g(x) = Σ_m β_m x^m / B. Follow the derivation of Equation (40).
b. If instead the events replace two individuals with m individuals and occur with rate β_m i², find how to
incorporate them into a PDE for f. Your final result should contain (∂/∂x)(x ∂f(x, t)/∂x) or equivalently
∂f(x, t)/∂x + x ∂²f(x, t)/∂x².
Exercise A.13 Consider a chemical system that begins with some initial amount of chemical A. Let i
denote the number of molecules of species A. A molecule of A spontaneously degrades into a molecule of
B, with rate ξ per molecule. Let j denote the number of molecules of species B. Species B reacts with A at
rate ηij to produce new molecules of species B. The reactions are denoted
A ↦ B
A + B ↦ 2B
Let r_{i,j}(t) denote the probability of i molecules of A and j molecules of B at time t. Let f(x, y, t) = Σ_{i,j} r_{i,j}(t) x^i y^j
be the PGF. Find the Forward Kolmogorov Equation for f(x, y, t).
where, as in the forward Kolmogorov case, Λ = Σ_m λ_m and h(x) = Σ_m λ_m x^m / Λ is the PGF of the number
of new individuals created given that an event occurs. In the first step we used the fact that for f_1(x, t),
r_i(0) = 1 if i = 1 and otherwise it is 0. Thus Equation (42) gives f_1(x, t + ∆t) in terms of f_1 evaluated at
t and ∆t. Now taking the definition of the derivative, we have

Property A.12 Consider a process in which the number of individuals changes in time such that when
an event occurs one individual is destroyed and replaced with m new individuals. The rate
associated with an event that changes the population size by m is λ_m i where i is the number of individuals.
Let f_1(x, t) be the PGF for this process beginning from a single individual and Λ = Σ_m λ_m. Then
∂f_1(x, t)/∂t = Λ [h(f_1(x, t)) − f_1(x, t)]
where h(x) is the PGF for the number of new individuals created in a random event. If the initial number
of individuals is not 1, let f(x, 0) denote the PGF for the initial condition. Then
f(x, t) = f(f_1(x, t), 0).
This is fairly straightforward to generalize to multiple types as long as none of the events involve inter-
actions.
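For the birth-death example (λ_2 = β, λ_0 = γ) Property A.12's equation closes at any fixed x: with Λ = β + γ and h(x) = (γ + β x^2)/(β + γ), evaluating at x = 0 turns the backward equation into a scalar ODE for the extinction probability. A minimal Euler sketch (the step size is an illustrative choice):

```python
# Property A.12 for the birth-death example: Lambda = beta + gamma and
# h(x) = (gamma + beta x^2)/(beta + gamma), so
#   d/dt f_1(x, t) = Lambda [h(f_1) - f_1] = gamma + beta f_1^2 - (beta + gamma) f_1.
# At x = 0, f_1(0, t) is the probability of extinction by time t.
# The Euler step size dt is an illustrative choice.

beta, gamma, dt = 2.0, 1.0, 1e-5

f1 = 0.0                       # f_1(0, 0) = x = 0
for _ in range(int(round(1.0 / dt))):
    f1 += dt * (gamma + beta * f1**2 - (beta + gamma) * f1)

print(round(f1, 4))   # close to the exact value (e-1)/(2e-1) ~ 0.3873
```

The result matches the continuous-time extinction probability at t = 1 shown in the sample session of Appendix C (0.38730017).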
Exercise A.14 In this exercise we generalize Property A.12 for the case where there are two types of
individuals A and B with counts i and j.
Assume events occur spontaneously with rate λm,n i to remove an individual of type A and replace it with
m of type A and n of type B, or they occur spontaneously with rate ζm,n j to remove an individual of type
B and replace it with m of type A and n of type B.
Set Λ = Σ_{m,n} λ_{m,n} and Z = Σ_{m,n} ζ_{m,n}. Let f_{1,0}(x, y, t) denote the outcome beginning with one indi-
vidual of type A and f_{0,1}(x, y, t) denote the outcome beginning with one individual of type B.
a. Write f_{1,0}(x, y, ∆t) and f_{0,1}(x, y, ∆t) in terms of h(x, y) = Σ_{m,n} λ_{m,n} x^m y^n / Λ and g(x, y) = Σ_{m,n} ζ_{m,n} x^m y^n / Z.
b. Using Property A.8, write f_{1,0}(x, y, t + ∆t) and f_{0,1}(x, y, t + ∆t) in terms of f_{1,0} and f_{0,1} evaluated at t and
∆t. The answer should resemble Equation (42).
c. Derive expressions for (∂/∂t) f_{1,0}(x, y, t) and (∂/∂t) f_{0,1}(x, y, t).
[Figure 11 consists of two panels showing a tree on nodes A-I: an intermediate stage with S = (3, 0, 2, 2, 0, 0, . . .) and the completed traversal with S = (3, 0, 2, 2, 0, 0, 0, 1, 0).]
Figure 11: Demonstration of the steps mapping the tree T to the sequence S. The nodes are traced in a depth-
first traversal and their number of offspring is recorded. For the labeling given, a depth-first traversal traces
the nodes in alphabetical order. At an intermediate stage (left) the traversal has not finished the sequence.
The final sequence (right) is uniquely determined once the order of a node’s offspring is (randomly) chosen.
offspring distribution. Our goal is to find out the probability of arriving at a finite tree with exactly j
infections given the offspring distribution.
This tree has certain constraints on it. The first constraint is that it must have exactly j −1 transmissions
from the j infected individuals. So we look at the probability of having a sum of j − 1 when we choose j
numbers from the offspring distribution. This is given by the coefficient of y j−1 in [µ(y)]j .
Next we have to make sure that the sequence is consistent with an outbreak that did not die out sooner.
For example, if an outbreak has exactly two infections, we cannot assume that the first individual infected
no one and then the second individual infected one person, because the outbreak would have died out without the
second individual having the chance to transmit. So it is not enough for the sequence to add to j − 1; the
order must be consistent with an outbreak of size j.
It turns out that we can find a one-to-one mapping between trees on j individuals and “valid” sequences
summing to j − 1. When doing this, we discover that if a sequence sums to j − 1, there is exactly one cyclic
permutation of that sequence which is valid.6 Thus of all sequences of j values chosen from the offspring
distribution that sum to j − 1, a fraction 1/j are “valid”, that is they yield a complete transmission tree. So
the probability is (1/j) times the coefficient of y j−1 in [µ(y)]j .
We now go through the proof in detail.
[Figure 12 consists of a sequence of frames showing nodes A-I arranged on a ring, with edges added step by step until the tree is complete.]
Figure 12: The steps of the construction of a tree with Ŝ = (2, 0, 0, 0, 1, 0, 3, 0, 2) [note that this is a cyclic
permutation of the previous S]. Each frame shows the next step in building a tree on a ring. The resulting tree
is not rooted at the top. The names of the nodes in the tree are a cyclic permutation of the original.
Now we look for the probability that a random length-j sequence Ŝ created by choosing numbers from
the offspring distribution is a Lukasiewicz word.7
To be a Lukasiewicz word, Ŝ must satisfy Σ_{s_i ∈ Ŝ} s_i = j − 1 because the sum is the total number
of transmissions occurring, which is one less than the total number of infections. By repeated application of
Property A.6, the probability that a sequence of j numbers chosen from the offspring distribution sums to j − 1
is the coefficient of y^{j−1} in [µ(y)]^j. So the probability that a random sequence Ŝ satisfies this constraint is
the coefficient of y^{j−1} in [µ(y)]^j.
Momentarily we will show that given a length-j sequence Ŝ which sums to j − 1, exactly one of its j
cyclic permutations is a Lukasiewicz word, but let us for now assume this result is true.
Consider the j distinct sequences that are cyclic permutations of a sequence Ŝ which sums to j − 1. Since
each of these is a sequence of exactly the same values they have the same probability. Our assumption that
exactly one of them is a Lukasiewicz word means that if Ŝ satisfies the constraint that it sums to j − 1 then
with probability 1/j it is a Lukasiewicz word. So the probability that a random sequence is a Lukasiewicz
word would be 1/j times the probability it sums to j − 1. That is it would be 1/j times the coefficient of
y j−1 in [µ(y)]j . This is the claim of Theorem 2.7.
However, our earlier assumption must still be proven: if Ŝ sums to j − 1 then exactly one of its j
permutations is a Lukasiewicz word.
Given a length-j sequence Ŝ of non-negative integers that sum to j − 1, we place j nodes on a ring
starting at the top and ordered clockwise, following the example in Figure 12. We label the ith node with
s_i. If a node v is labeled with 0 and the adjacent position in the counter-clockwise direction has node u with
a positive label, we place an edge from u to v (with v to the right of any previous edge from u to another
7 If the sequence is not a Lukasiewicz word, then either it is the start of a sequence corresponding to a larger (possibly infinite
node) and remove v. We decrease u’s label by one. Note that at a given step there may be multiple pairs
eligible to have edges placed between them, in which case we do all of them. If we did one at a time, the
final outcome would be the same.
Each edge added in this process reduces both the number of nodes and the sum of the labels by one, leaving all
labels as non-negative integers. So the sum remains one less than the remaining number of nodes. This
guarantees at least one zero and at least one nonzero value until only one node remains. Thus we can always
find an appropriate pair u and v until only a single node remains. The process constructs a directed tree
(there are j nodes with j − 1 edges and the fact that a node is removed from the algorithm once an edge is
added pointing to it guarantees no cycles). Fig. 12 demonstrates the steps.
If the tree is rooted at the node that began at the top of the ring, then Ŝ corresponds to a depth-first
traversal of that tree. It is a Lukasiewicz word. Each cyclic permutation of Ŝ rotates the location of the root
to be one of the j nodes. Only the case when the root is at the top will result in a Lukasiewicz word. Thus
Ŝ has exactly j distinct cyclic permutations, and exactly one of them is a Lukasiewicz word. This completes
the final detail of the proof.
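The cycle-lemma step, that exactly one cyclic permutation of a sequence summing to j − 1 is a Lukasiewicz word, can also be verified exhaustively for small j. The sketch below uses the standard running-total characterization: the number of "active" nodes, 1 + Σ_{i≤k}(s_i − 1), must stay positive until the final node:

```python
from itertools import product

def is_lukasiewicz(seq):
    """The number of active nodes, 1 + sum of (s_i - 1), must stay
    positive through the traversal and reach 0 only at the last node."""
    active = 1
    for s in seq[:-1]:
        active += s - 1
        if active <= 0:
            return False
    return active + seq[-1] - 1 == 0

j = 5
count = 0
for seq in product(range(j), repeat=j):       # all candidate sequences
    if sum(seq) != j - 1:
        continue
    rotations = [seq[k:] + seq[:k] for k in range(j)]
    # exactly one cyclic permutation is a Lukasiewicz word
    assert sum(is_lukasiewicz(rot) for rot in rotations) == 1
    count += 1
print(count)   # 70 length-5 sequences sum to 4; each has one valid rotation
```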
So we finally conclude that the probability of a tree of j infected nodes is equal to 1/j times the probability
that j randomly-chosen values from the offspring distribution sum to j − 1. This is 1/j times the coefficient
of y j−1 in [µ(y)]j as Theorem 2.7 claims.
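Theorem 2.7 thus reduces the final-size distribution to coefficient extraction. A short sketch (plain Python; the helper names are our own) for the offspring PGF µ(y) = (1 + y + y^2 + y^3)/4 used in Appendix C:

```python
def poly_mul(p, q):
    """Coefficients of the product of two polynomials."""
    out = [0.0] * (len(p) + len(q) - 1)
    for i, a in enumerate(p):
        for j, b in enumerate(q):
            out[i + j] += a * b
    return out

def final_size_prob(mu, j):
    """(1/j) times the coefficient of y^(j-1) in mu(y)^j (Theorem 2.7)."""
    power = [1.0]
    for _ in range(j):
        power = poly_mul(power, mu)
    return power[j - 1] / j

mu = [0.25, 0.25, 0.25, 0.25]      # mu(y) = (1 + y + y^2 + y^3)/4
print(final_size_prob(mu, 1))      # 0.25: the index case infects no one
print(final_size_prob(mu, 2))      # 0.0625: index infects one, who infects no one
```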
C Software
We have produced a python package, Invasion PGF, which can be used to solve the equations of Section 2
or Section 3 once the PGF of the offspring distribution or β and γ are determined. Because the numerical
method involves solving differential equations in the complex plane, it requires an integration routine that
can handle complex values. For this we use odeintw [50].
Table 7 briefly summarizes the commands available in Invasion PGF.
We now demonstrate a sample session with these commands.
>>> import Invasion_PGF as pgf
>>> def mu(x):
... return (1 + x + x**2 + x**3)/4.
...
>>> pgf.R0(mu)
1.5000001241105565
>>> #extinction probabilities up to generation 5
Command | Output
R0(µ) | Approximation of R0.
extinction_prob(µ, gen) | Probability α_gen of extinction by generation gen given offspring PGF µ.
cts_time_extinction_prob(β, γ, T) | Probability α(T) of extinction by time T given transmission and recovery rates β and γ.
active_infections(µ, gen, M) | Array containing probabilities ϕ_0, . . . , ϕ_j, . . . , ϕ_{M−1} of having j active infections in generation gen given offspring PGF µ.
cts_time_active_infections(β, γ, T) | Array containing probabilities ϕ_0, . . . , ϕ_j, . . . , ϕ_{M−1} of having j active infections at time T given transmission and recovery rates β and γ.
completed_infections(µ, gen, M) | Array containing probabilities ω_0, . . . , ω_j, . . . , ω_{M−1} of having j completed infections in generation gen given offspring PGF µ.
cts_time_completed_infections(β, γ, T) | Array containing probabilities ω_0, . . . , ω_j, . . . , ω_{M−1} of having j completed infections at time T given transmission and recovery rates β and γ.
active_and_completed(µ, gen, M1, M2) | M1 × M2 array containing probabilities π_{i,r} of i active infections and r completed infections in generation gen given offspring PGF µ.
cts_time_active_and_completed(β, γ, T) | M1 × M2 array containing probabilities π_{i,r} of i active infections and r completed infections at time T given transmission and recovery rates β and γ.
final_sizes(µ, M) | Array containing probabilities ω_0, . . . , ω_j, . . . , ω_{M−1} of having j total infections in an outbreak given offspring PGF µ.
cts_time_final_sizes(β, γ, T) | Array containing probabilities ω_0, . . . , ω_j, . . . , ω_{M−1} of having j total infections in an outbreak given transmission and recovery rates β and γ.
Table 7: Commands of Invasion PGF. Many of these have an optional boolean argument
intermediate values which, if True, will result in returning values from generation 0 to generation gen
in the discrete-time case or at some intermediate times in the continuous-time case. For the discrete-time
results, the input µ is the offspring distribution PGF. For the continuous-time version, β and γ are the
transmission and recovery rates respectively.
8.59375000e-02, 8.59375000e-02, 7.81250000e-02,
6.25000000e-02])
>>> #joint probabilities of 0..4 active infections and 0..4 completed
>>> #infections in generation 3
>>> pgf.active_and_completed(mu, 3, 5, 5)
array([[ 0. , 0.25 , 0.0625 , 0.03125 , 0.015625 ],
[ 0. , 0. , 0. , 0.015625 , 0.015625 ],
[ 0. , 0. , 0. , 0.015625 , 0.01953125],
[ 0. , 0. , 0. , 0.015625 , 0.0234375 ],
[ 0. , 0. , 0. , 0. , 0.01171875]])
>>> #check that marginals match, increase sizes considered to improve match
>>> act_and_complete = pgf.active_and_completed(mu, 3, 20, 20)
>>> act_and_complete.sum(axis=1) #Active infections
array([ 3.69720176e-01, 5.25971800e-02, 7.17844516e-02,
9.60913450e-02, 7.39330947e-02, 7.33402669e-02,
6.61700666e-02, 5.00725210e-02, 4.11909670e-02,
3.18221301e-02, 2.31783241e-02, 1.72899812e-02,
1.21286511e-02, 8.08435678e-03, 5.25146723e-03,
3.23349237e-03, 1.90655887e-03, 1.08598173e-03,
5.83335757e-04, 2.96160579e-04])
>>> act_and_complete.sum(axis=0) #Completed infections
array([ 0. , 0.25 , 0.0625 , 0.078125 , 0.09765625,
0.12109375, 0.0859375 , 0.0859375 , 0.078125 , 0.0625 ,
0.0390625 , 0.02342606, 0.01163167, 0.00376529, 0. ,
0. , 0. , 0. , 0. , 0. ])
>>> #yes, these match previous calculations, with a small mismatch because
>>> #e.g., there may be 21 cumulative cases and 8 active cases. To accurately
>>> #calculate the probability of 8 active cases we would need to
>>> #increase the sizes to include this.
>>> #
>>> #Now look at the final sizes
>>> pgf.final_sizes(mu, 20)
array([ 0.00000000e+00, 2.50000000e-01, 5.93750000e-02,
2.82031250e-02, 1.67456055e-02, 1.03404114e-02,
6.80080902e-03, 4.66611063e-03, 3.29263648e-03,
2.37637247e-03, 1.74605802e-03, 1.30159459e-03,
9.81970183e-04, 7.48352208e-04, 5.75249662e-04,
4.45491477e-04, 3.47250478e-04, 2.72225362e-04,
2.14494366e-04, 1.69773210e-04])
>>> #
>>> #Now consider the continuous-time model
>>> beta = 2
>>> gamma = 1
>>> #In next command, first returned array is the times and second
>>> #is the extinction probabilities at those times
>>> pgf.cts_time_extinction_prob(beta, gamma, 5, intermediate_values =
... True, numvals = 6)
(array([ 0., 1., 2., 3., 4., 5.]),
array([[ 0. , 0.38730017, 0.46371057, 0.48723549, 0.49537878,
0.49830983]]))
>>> #following commands look at possible states at time 3
>>> pgf.cts_time_active_infections(beta, gamma, 3, 10)
array([ 0.48723548, 0.01309038, 0.0127562 , 0.01243055, 0.01211321,
0.01180397, 0.01150263, 0.01120897, 0.01092282, 0.01064397])
>>> pgf.cts_time_completed_infections(beta, gamma, 3, 10)
array([ 0.00037014, 0.33477236, 0.07721546, 0.03805734, 0.02535527,
0.02008499, 0.01755497, 0.0161637 , 0.01527602, 0.0146211 ])
>>> #check that the joint distribution has the same marginals
>>> cts_time_act_and_complete = pgf.cts_time_active_and_completed(beta, gamma, 3, 20, 20)
>>> cts_time_act_and_complete.sum(axis=1) #Active infections
array([ 0.48717492, 0.01298644, 0.01257732, 0.0121518 , 0.01170803,
0.0112452 , 0.01076355, 0.01026434, 0.00974976, 0.00922277,
0.00868697, 0.00814639, 0.00760529, 0.00706802, 0.0065388 ,
0.00602164, 0.00552019, 0.00503763, 0.00457669, 0.00413954])
>>> cts_time_act_and_complete.sum(axis=0) #Completed infections
array([ 0.00036997, 0.334771 , 0.07720946, 0.03803859, 0.02530855,
0.01998599, 0.0173696 , 0.01584881, 0.0147816 , 0.01389348,
0.01306482, 0.01224544, 0.01141776, 0.01057991, 0.0097376 ,
0.00889996, 0.00807718, 0.0072792 , 0.00651487, 0.00579152])
>>> #yes, these match previous calculations
>>> #Now look at the final sizes at time infinity
>>> pgf.cts_time_final_sizes(beta, gamma, 20)
array([ 0.00000000e+00, 3.33333333e-01, 7.40740741e-02,
3.29218107e-02, 1.82898948e-02, 1.13803790e-02,
7.58691934e-03, 5.29880081e-03, 3.82691169e-03,
2.83474940e-03, 2.14181066e-03, 1.64421828e-03,
1.27883644e-03, 1.00558079e-03, 7.98079995e-04,
6.38463996e-04, 5.14318219e-04, 4.16833066e-04,
3.39641758e-04, 2.78069275e-04])
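For this linear birth-death process the quantities computed above also have closed forms, which provide a useful check on the package output. With transmission rate beta and recovery rate gamma (beta not equal to gamma), the probability that the process is extinct by time t is gamma(e^((beta-gamma)t) - 1)/(beta e^((beta-gamma)t) - gamma), and because each infected individual's offspring count is geometric with parameter gamma/(beta+gamma), the hitting-time theorem [15, 22] gives the final-size distribution directly. The following is a minimal sketch; the function names are illustrative and not part of the pgf package.

```python
from math import comb, exp

def extinction_prob(beta, gamma, t):
    """Probability that a linear birth-death process (birth rate beta,
    death rate gamma, one initial infection) has died out by time t."""
    E = exp((beta - gamma) * t)
    return gamma * (E - 1) / (beta * E - gamma)

def final_size_probs(beta, gamma, M):
    """P(total ever infected = j) for j = 0, ..., M-1.  The offspring
    distribution is geometric with q = gamma/(beta+gamma), and the
    hitting-time theorem gives p_j = C(2j-2, j-1) p**(j-1) q**j / j."""
    p = beta / (beta + gamma)
    q = gamma / (beta + gamma)
    return [0.0] + [comb(2 * j - 2, j - 1) * p ** (j - 1) * q ** j / j
                    for j in range(1, M)]

# agrees with pgf.cts_time_extinction_prob(2, 1, 5, ...) above
print([extinction_prob(2, 1, t) for t in range(6)])
# agrees with pgf.cts_time_final_sizes(2, 1, 20) above:
# [0.0, 0.3333..., 0.07407..., 0.03292..., 0.01828...]
print(final_size_probs(2, 1, 5))
```

The closed forms reproduce the numerically integrated values to the displayed precision, confirming the package's Kolmogorov-equation approach.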
Acknowledgments
This work was funded by Global Good.
I thank Linda Allen for useful discussion about the Kolmogorov equations. Hao Hu played an important
role in inspiring this work and testing the methods. Hil Lyons and Monique Ambrose provided valuable
feedback on the discussion of inference. Amelia Bertozzi-Villa and Monique Ambrose read over drafts and
recommended a number of changes that have significantly improved the presentation.
The Python code and output in Appendix C were incorporated using PythonTeX [43]. I relied heavily
on https://tex.stackexchange.com/a/355343/70067 by “touhami” in setting up the solutions to the
exercises.
References
[1] Linda JS Allen. An introduction to stochastic epidemic models. In Mathematical Epidemiology, pages
81–130. Springer, 2008.
[2] Linda JS Allen. An introduction to stochastic processes with applications to biology. CRC Press, 2010.
[3] Linda JS Allen. A primer on stochastic epidemic models: Formulation, numerical simulation, and
analysis. Infectious Disease Modelling, 2017.
[4] Tibor Antal and PL Krapivsky. Exact solution of a two-type branching process: models of tumor
progression. Journal of Statistical Mechanics: Theory and Experiment, 2011(08):P08018, 2011.
[5] Norman TJ Bailey. The total size of a general stochastic epidemic. Biometrika, pages 177–185, 1953.
[6] Norman TJ Bailey. The elements of stochastic processes with applications to the natural sciences. John
Wiley & Sons, 1964.
[7] MS Bartlett. Some evolutionary stochastic processes. Journal of the Royal Statistical Society. Series B
(Methodological), 11(2):211–229, 1949.
[8] Seth Blumberg and James O Lloyd-Smith. Inference of R0 and transmission heterogeneity from the
size distribution of stuttering chains. PLoS Computational Biology, 9(5):e1002993, 2013.
[9] Folkmar Bornemann. Accuracy and stability of computing high-order derivatives of analytic functions
by Cauchy integrals. Foundations of Computational Mathematics, 11(1):1–63, 2011.
[10] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie
Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer Networks, 33:309–
320, 2000.
[11] Jessica M Conway and Daniel Coombs. A stochastic model of latently infected cell reactivation and
viral blip generation in treated HIV patients. PLoS Computational Biology, 7(4):e1002033, 2011.
[12] O. Diekmann and J. A. P. Heesterbeek. Mathematical epidemiology of infectious diseases. Wiley, Chichester, 2000.
[13] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin. Giant strongly connected component of
directed networks. Physical Review E, 64(2):025101, Jul 2001.
[14] Richard Durrett. Branching process models of cancer. In Branching Process Models of Cancer, pages
1–63. Springer, 2015.
[15] Meyer Dwass. The total progeny in a branching process and a related random walk. Journal of Applied
Probability, 6(3):682–686, 1969.
[16] David Easley and Jon Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected
world. Cambridge University Press, 2010.
[17] Joseph A Gallian and David J Rusin. Cyclotomic polynomials and nonstandard dice. Discrete Mathematics, 27(3):245–259, 1979.
[18] Martin Gardner. Mathematical games. Scientific American, 238:19–32, 1978.
[19] Wayne M Getz and James O Lloyd-Smith. Basic methods for modeling the invasion and spread of
contagious diseases. In Disease Evolution: Models, Concepts, and Data Analyses, pages 87–112, 2006.
[20] Tiberiu Harko, Francisco SN Lobo, and MK Mak. Exact analytical solutions of the Susceptible-Infected-
Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Applied
Mathematics and Computation, 236:184–194, 2014.
[21] Peter D Hoff. A first course in Bayesian statistical methods. Springer Science & Business Media, 2009.
[22] Remco van der Hofstad and Michael Keane. An elementary proof of the hitting time theorem. The
American Mathematical Monthly, 115(8):753–756, 2008.
[23] Thomas House, Joshua V Ross, and David Sirl. How big is an outbreak likely to be? Methods for epidemic final-size calculation. Proc. R. Soc. A, 469(2150):20120436, 2013.
[24] Eben Kenah and Joel C. Miller. Epidemic percolation networks, epidemic outcomes, and interventions.
Interdisciplinary Perspectives on Infectious Diseases, 2011, 2011.
[25] David G Kendall. Stochastic processes and population growth. Journal of the Royal Statistical Society.
Series B (Methodological), 11(2):230–282, 1949.
[26] Marek Kimmel and David E Axelrod. Branching Processes in Biology. Interdisciplinary Applied Mathematics 19. Springer, 2002.
[27] Istvan Z Kiss, Joel C Miller, and Péter L Simon. Mathematics of epidemics on networks: from exact to approximate models. Springer, 2017.
[28] Adam J Kucharski and W John Edmunds. Characterizing the transmission potential of zoonotic infections from minor outbreaks. PLoS Computational Biology, 11(4):e1004154, 2015.
[29] Mark A Lewis, Sergei V Petrovskii, and Jonathan R Potts. The mathematics behind biological invasions,
volume 44. Springer, 2016.
[30] James O Lloyd-Smith, Sebastian J Schreiber, P Ekkehard Kopp, and Wayne M Getz. Superspreading
and the effect of individual variation on disease emergence. Nature, 438(7066):355, 2005.
[31] Donald Ludwig. Final size distributions for epidemics. Mathematical Biosciences, 23:33–46, 1975.
[32] Junling J. Ma and David J. D. Earn. Generality of the final size formula for an epidemic of a newly
invading infectious disease. Bulletin of Mathematical Biology, 68(3):679–702, 2006.
[33] Robert M May. Simple mathematical models with very complicated dynamics. Nature, 261(5560):459–
467, 1976.
[34] Joel C. Miller. A note on a paper by Erik Volz: SIR dynamics in random networks. Journal of
Mathematical Biology, 62(3):349–358, 2011.
[35] Joel C. Miller. A note on the derivation of epidemic final sizes. Bulletin of Mathematical Biology,
74(9):2125–2141, 2012.
[36] Joel C Miller, Bahman Davoudi, Rafael Meza, Anja C Slim, and Babak Pourbohloul. Epidemics with
general generation interval distributions. Journal of Theoretical Biology, 262(1):107–115, 2010.
[37] Joel C. Miller, Anja C. Slim, and Erik M. Volz. Edge-based compartmental modelling for infectious
disease spread. Journal of the Royal Society Interface, 9(70):890–906, 2012.
[38] Cristopher Moore and Mark EJ Newman. Exact solution of site and bond percolation on small-world
networks. Physical Review E, 62(5):7059, 2000.
[39] Sean Nee, Edward C Holmes, Robert M May, and Paul H Harvey. Extinction rates can be estimated
from molecular phylogenies. Phil. Trans. R. Soc. Lond. B, 344(1307):77–82, 1994.
[40] Hiroshi Nishiura, Ping Yan, Candace K Sleeman, and Charles J Mode. Estimating the transmission
potential of supercritical processes based on the final size distribution of minor outbreaks. Journal of
Theoretical Biology, 294:48–55, 2012.
[41] Heinz-Otto Peitgen, Hartmut Jürgens, and Dietmar Saupe. Chaos and fractals: new frontiers of science.
Springer Science & Business Media, 2006.
[42] George Pólya. Mathematics and plausible reasoning: Induction and analogy in mathematics, volume 1.
Princeton University Press, 1990.
[43] Geoffrey M Poore. PythonTeX: reproducible documents with LaTeX, Python, and more. Computational Science & Discovery, 8(1):014010, 2015.
[44] Timothy Reluga, Rafael Meza, D. Brian Walton, and Alison P. Galvani. Reservoir interactions and disease emergence. Theoretical Population Biology, 72(3):400–408, 2007.
[45] Richard P. Stanley. Enumerative Combinatorics, volume II. Cambridge University Press, 2001.
[46] L. D. Valdez, P. A. Macri, and L. A. Braunstein. Temporal percolation of the susceptible network in
an epidemic spreading. PLoS One, 7(9):e44188, 2012.
[47] Erik M. Volz. SIR dynamics in random networks with heterogeneous connectivity. Journal of Mathematical Biology, 56(3):293–310, 2008.
[48] Erik M Volz, Ethan Romero-Severson, and Thomas Leitner. Phylodynamic inference across epidemic scales. Molecular Biology and Evolution, 34(5):1276–1288, 2017.
[49] Henry William Watson and Francis Galton. On the probability of the extinction of families. The Journal
of the Anthropological Institute of Great Britain and Ireland, 4:138–144, 1875.
[50] Warren Weckesser. odeintw. https://github.com/WarrenWeckesser/odeintw.
[51] JG Wendel. Left-continuous random walk and the Lagrange expansion. American Mathematical
Monthly, pages 494–499, 1975.
[52] Herbert S. Wilf. generatingfunctionology. A K Peters, Ltd, 3rd edition, 2005.
[53] Ping Yan. Distribution theory, stochastic processes and infectious disease modelling. In Mathematical Epidemiology, pages 229–293. Springer, 2008.