
A primer on the use of probability generating functions in infectious

disease modeling
Joel C. Miller
October 3, 2024
arXiv:1803.05136v4 [q-bio.PE] 2 Oct 2024

Abstract
We explore the application of probability generating functions (PGFs) to invasive processes, focusing
on infectious disease introduced into large populations. Our goal is to acquaint the reader with applications of PGFs, more so than to derive new results. PGFs help predict a number of properties about
early outbreak behavior while the population is still effectively infinite, including the probability of an
epidemic, the size distribution after some number of generations, and the cumulative size distribution
of non-epidemic outbreaks. We show how PGFs can be used in both discrete-time and continuous-time
settings, and discuss how to use these results to infer disease parameters from observed outbreaks. In the
large population limit for susceptible-infected-recovered (SIR) epidemics, PGFs lead to survival-function-based models that are equivalent to the usual mass-action SIR models but with fewer ODEs. We use
these to explore properties such as the final size of epidemics or even the dynamics once stochastic ef-
fects are negligible. We target this primer at biologists and public health researchers with mathematical
modeling experience who want to learn how to apply PGFs to invasive diseases, but it could also be used
in an applications-based mathematics course on PGFs. We include many exercises to help demonstrate
concepts and to give practice applying the results. We summarize our main results in a few tables.
Additionally, we provide a small Python package which performs many of the relevant calculations.

Contents
1 Introduction 2
1.1 Summary . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 4
1.2 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 9

2 Discrete-time spread of a simple disease: early time 9


2.1 Early extinction probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
2.1.1 Derivation as a fixed point equation . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.1.2 Derivation from an iterative process . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
2.2 Early-time outbreak dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 14
2.3 Cumulative size distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.3.1 Focused approach to find the cumulative size distribution . . . . . . . . . . . . . . . . 17
2.3.2 Broader approach . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
2.4 Small outbreak final size distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 20
2.4.1 Inference based on outbreak sizes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
2.5 Generality of discrete-time results . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26

3 Continuous-time spread of a simple disease 28


3.1 Extinction probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 28
3.1.1 Extinction probability as a function of time . . . . . . . . . . . . . . . . . . . . . . . . 29
3.2 Early-time outbreak dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 29
3.3 Cumulative and current outbreak size distribution . . . . . . . . . . . . . . . . . . . . . . . . 32
3.4 Small outbreak final size distribution . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 33

3.5 Full dynamics in finite populations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.1 SIS . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
3.5.2 SIR . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
3.6 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36

4 Large-time dynamics 39
4.1 SIR disease and directed graphs. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 39
4.2 Final size relations for SIR epidemics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 40
4.3 Discrete-time SIR dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 41
4.4 Continuous-time SIR epidemic dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 42
4.5 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 43

5 Multitype populations 44
5.1 Discrete-time epidemic probability . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 44
5.2 Continuous-time SIR dynamics . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46
5.3 Exercises . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 46

6 Discussion 47

A Important properties of PGFs 48


A.1 Properties related to individual coefficients . . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
A.2 Properties related to distribution moments . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
A.3 Properties related to function composition . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 50
A.4 Properties related to iteration of PGFs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 54
A.5 Finding the Kolmogorov Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
A.5.1 Forward Kolmogorov Equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 56
A.5.2 Backward Kolmogorov equations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60

B Proof of Theorems 2.7 and 3.6 61


B.1 Proof of Theorem 2.7 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
B.2 Theorem 3.6 . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 64

C Software 64

1 Introduction
The spread of infectious diseases remains a public health challenge. Increased interaction between humans
and wild animals leads to increased zoonotic introductions, and modern travel networks allow these diseases
to spread quickly. Many mathematical approaches have been developed to give us insight into the early
behavior of disease outbreaks. An important tool for understanding the stochastic behavior of an outbreak
soon after introduction is the probability generating function (PGF) [52, 2, 53].
Specifically, PGFs frequently give insight into the statistical behavior of outbreaks before they are large
enough to be affected by the finite size of the population. In these cases, both susceptible-infected-recovered
(SIR) disease (for which nodes recover with immunity) and susceptible-infected-susceptible (SIS) disease (for
which nodes recover and can be reinfected immediately) are equivalent. In the case of SIR disease they can
be used to study the dynamics of disease once an epidemic is established in a large population.
We can investigate properties such as the early growth rate of the disease, the probability the disease
becomes established, or the distribution of final sizes of outbreaks that fail to become established. Similar
questions also arise in other settings where some introduced agent can reproduce or die, such as invasive
species in ecological settings [29], early within-host pathogen dynamics [11], and the accumulation of mu-
tations in precancerous and cancerous cells [14, 4] or in pathogen evolution [48]. These are all examples of
branching processes, and PGFs are a central tool for the analysis of branching processes [7, 25, 26]. Except
for Section 4 where we develop deterministic equations for later-time SIR epidemics, based on [47, 34, 37],
the approaches we describe here have direct application in these other branching processes as well.

Distribution                                                          PGF f(x) = Σ_i r_i x^i

Poisson, mean λ: r_i = e^{−λ} λ^i / i!                                e^{λ(x−1)}

Uniform: r_λ = 1                                                      x^λ

Binomial: n trials, with success probability p:                       [q + px]^n
  r_i = (n choose i) p^i q^{n−i} for q = 1 − p

Geometric^1: r_i = q^i p for q = 1 − p and i = 0, 1, . . .            p/(1 − qx)

Negative binomial^2: r_i = (i + r̂ − 1 choose i) q^{r̂} p^i            [q/(1 − px)]^{r̂}
  for q = 1 − p

Table 1: A few common probability distributions and their PGFs.

Before proceeding, we define what a PGF is. Let ri denote the probability of drawing the value i from a
given distribution of non-negative integers. Then
    f(x) = Σ_i r_i x^i

is the PGF of this distribution. We should address a potential confusion caused by the name. A “generating
function” is a function which is defined from (or “generated by”) a sequence of numbers a_i and takes the
form Σ_i a_i x^i. So a “probability generating function” is a generating function defined from a probability
distribution on integers. It is not a function that generates probabilities when values are plugged in for x.
There are other generating functions, including the “moment generating function”, defined to be Σ_m ⟨i^m⟩ x^m
where ⟨i^m⟩ = Σ_i r_i i^m (the moment and probability generating functions turn out to be closely related).
PGFs have a number of useful properties which we derive in Appendix A. We have structured this paper
so that a reader can skip ahead now and read Appendix A in its entirety to get a self-contained introduction
to PGFs, or wait until a particular property is referenced in the main text and then read that part of the
appendix.
As we demonstrate in Table 1, for many important distributions the PGF takes a simple form. We derive
this for the Poisson distribution.
Example 1.1 Consider the Poisson distribution with mean λ:

    r_i = e^{−λ} λ^i / i! .

For this we find

    f(x) = Σ_i (e^{−λ} λ^i / i!) x^i = e^{−λ} Σ_i (λx)^i / i! = e^{−λ} e^{λx} = e^{λ(x−1)} .
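The derivation above can be checked numerically. The following sketch is independent of the paper's software; the value of λ and the truncation length are illustration choices. It evaluates the PGF from a truncated coefficient list by Horner's rule and compares with the closed form e^{λ(x−1)}.

```python
import math

# Hedged numeric check of Example 1.1: evaluate the Poisson PGF from a
# truncated coefficient list and compare with the closed form e^{lam*(x-1)}.
# lam and the truncation length (60 terms) are illustration choices.
lam = 2.0
r = [math.exp(-lam) * lam**i / math.factorial(i) for i in range(60)]

def pgf(coeffs, x):
    """f(x) = sum_i r_i x^i, evaluated by Horner's rule."""
    total = 0.0
    for c in reversed(coeffs):
        total = total * x + c
    return total

for x in (0.0, 0.3, 0.7, 1.0):
    assert abs(pgf(r, x) - math.exp(lam * (x - 1))) < 1e-12
```

The truncation is harmless here because the neglected Poisson tail beyond 60 terms is far below floating-point precision for this λ.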

In this primer, we explore the application of PGFs to the study of disease spread. We will use PGFs to
answer questions about the early-time behavior of an outbreak (neglecting depletion of susceptibles):
• What is the probability an outbreak goes extinct within g generations (or by time t) in an arbitrarily
large population?

• What is the probability an index case causes an epidemic?


1 Another definition of the geometric distribution with different indexing, r_i = q^{i−1} p for i = 1, 2, . . ., gives a different PGF.
2 Typically the negative binomial is expressed in terms of a parameter r, the number of failures at which the experiment
stops, with each trial having success probability p. For us r_i plays an important role, so to help distinguish these, we
use r̂ rather than r. Then r_i is the probability of i successes.

• What is the final size distribution of small outbreaks?
• What is the size distribution of outbreaks at generation g (or time t)?
• How fast is the initial growth for those outbreaks that do not go extinct?
Although we present these early-time results in the context of SIR outbreaks they also apply to SIS outbreaks
and many other invasive processes.
We can also use PGFs for some questions about the full behavior accounting for depletion of susceptibles.
Specifically:
• In a continuous-time Markovian SIR or SIS outbreak spreading in a finite population, what is the
distribution of possible system states at time t?

• In the large-population limit of an SIR epidemic, what fraction of the population is eventually infected?
• In the large-population limit of an SIR epidemic, what fraction of the population is infected or recovered
at time t?
We will consider both discrete-time and Markovian continuous-time models of disease. In the discrete-
time case each infected individual transmits to some number of “offspring” before recovering. In the
continuous-time case each infected individual transmits with a rate β and recovers with a rate γ.
In Section 2 we begin our study investigating properties of epidemic emergence in a discrete-time,
generation-based framework, focusing on the probability of extinction and the sizes of outbreaks assum-
ing that the disease is invading a sufficiently large population with enough mixing that we can treat the
infections caused by any one infected individual as independent of the others. We also briefly discuss how we
might use our observations to infer disease parameters from observed small outbreaks. In Section 3, we repeat
this analysis for a continuous-time case treating transmission and recovery as Poisson processes, and then
adapt the analysis to a population with finite size N . Next in Section 4 we use PGFs to derive simple models
of the large-time dynamics of SIR disease spread, once the infection has reached enough individuals that we
can treat the dynamics as deterministic. Finally, in Section 5 we explore multitype populations in which there
are different types of infected individuals, which may produce different distributions of infections. We provide
three appendices. In Appendix A, we derive the relevant properties of PGFs, in Appendix B we provide ele-
mentary (i.e., not requiring Calculus) derivations of two important theorems, and in Appendix C we provide
details of a Python package Invasion PGF available at https://github.com/joelmiller/Invasion_PGF
that implements most of the results described in this primer. Python code that uses this package to imple-
ment the figures of Section 2 is provided in the supplement.
Our primary goal here is to provide modelers with a useful PGF-based toolkit, with derivations that
focus on developing intuition and insight into the application rather than on providing fully rigorous proofs.
Throughout, there are exercises designed to increase understanding and help prepare the reader for appli-
cations. This primer (and Appendix A in particular) could serve as a resource for a mathematics course on
PGFs. For readers wanting to take a deep dive into the underlying theory, there are resources that provide
a more technical look into PGFs in general [52] or specifically using PGFs for infectious disease [53].

1.1 Summary
Before presenting the analysis, we provide a collection of tables that summarize our main results. Table 2
summarizes our notation. Tables 3 and 4 summarize our main results for the discrete-time and continuous-
time models. Table 5 shows applications of PGFs to the continuous-time dynamics of SIR epidemics once
the disease has infected a non-negligible proportion of a large population, effectively showing how PGFs can
be used to replace most common mass-action models. Finally, Table 6 provides the probability of each finite
final outbreak size assuming a sufficiently large population that susceptible depletion never plays a role.

• Arbitrary PGFs: f(x) = Σ_i p_i x^i and g(x) = Σ_i q_i x^i.
• Without hats: the PGF for the offspring distribution in discrete time, µ(y) = Σ_i p_i y^i.
• With hats: the PGF for the outcome of an unknown event in a continuous-time Markovian outbreak,
  µ̂(y) = (βy^2 + γ)/(β + γ) and µ̂(y, z) = (βy^2 + γz)/(β + γ); y accounts for active infections and z
  accounts for completed infections.
• α, α_g, α(t): probability of either eventual extinction, extinction by generation g, or extinction by
  time t in an infinite population.
• Φ_g(y) = Σ_i φ_i(g) y^i and Φ(y, t) = Σ_i φ_i(t) y^i: PGF for the number of active infections in
  generation g or at time t in an infinite population.
• Ω_∞(z) = Σ_{r<∞} ω_r z^r + ω_∞ z^∞, Ω_g(z) = Σ_r ω_r(g) z^r, and Ω(z, t) = Σ_r ω_r(t) z^r: the PGF
  for the distribution of completed infections at the end of a small outbreak, in generation g, or at
  time t in an infinite population. If R_0 > 1, then one of the terms in the expansion of Ω_∞(z) is
  ω_∞ z^∞, where ω_∞ is the probability of an epidemic.
• Π_g(y, z) = Σ_{i,r} π_{i,r}(g) y^i z^r and Π(y, z, t) = Σ_{i,r} π_{i,r}(t) y^i z^r: the PGF for the joint
  distribution of current infections and completed infections either at generation g or at time t in an
  infinite population.
• Ξ(x, y, t) = Σ_{s,i} ξ_{s,i}(t) x^s y^i: the PGF for the joint distribution of susceptibles and current
  infections at time t in a finite population of size N (used for continuous time only). In the SIR case
  we can infer the number recovered from this and the total population size.
• χ(x) = Σ_i p_i x^i: PGF for the “ancestor distribution”, analogous to the offspring distribution.
• ψ(x) = Σ_κ P(κ) x^κ: PGF for the distribution of susceptibility for the continuous-time model where
  the rate of receiving transmission is proportional to κ.
• β, γ: the individual transmission and recovery rates for the Markovian continuous-time model.

Table 2: Common function and variable names. When we use a PGF for the number of susceptible individ-
uals, active infections, and/or completed infections, x and s correspond to susceptible individuals, y and i to
active infections, and z and r to completed infections.

• Basic reproductive number R_0, the average number of transmissions an infected individual causes
  early in an outbreak (intro to Section 2): R_0 = µ′(1).
• Probability of extinction, α, given a single introduced infection (Section 2.1): α = lim_{g→∞} µ^{[g]}(0)
  or, equivalently, the smallest x in [0, 1] for which x = µ(x).
• Probability of extinction within g generations (Section 2.1.2): α_g = µ^{[g]}(0).
• PGF of the distribution of the number of infected individuals in the g-th generation (Section 2.2):
  Φ_g(y), where Φ_g solves Φ_g(y) = µ^{[g]}(y).
• Average number of active infections in generation g, and the average number if the outbreak has not
  yet gone extinct (Section 2.2): R_0^g, and R_0^g/(1 − α_g).
• PGF of the number of completed cases at generation g in an infinite population (Section 2.3.1):
  Ω_g(z), where Ω_g solves Ω_g(z) = zµ(Ω_{g−1}(z)) with Ω_0(z) = 1.
• PGF of the joint distribution of the number of current and completed cases at generation g in an
  infinite population (Section 2.3.2): Π_g(y, z), where Π_g solves Π_g(y, z) = zµ(Π_{g−1}(y, z)) with
  Π_0(y, z) = y.
• PGF of the final size distribution (Section 2.4): Ω_∞(z) = lim_{g→∞} Ω_g(z). It also solves Ω_∞(z) =
  zµ(Ω_∞(z)). This has a discontinuity at |z| = 1 if epidemics are possible.
• Probability an outbreak infects exactly j individuals (Section 2.4): p^{(j)}_{j−1}/j, where p^{(j)}_i is the
  coefficient of y^i in the expansion of [µ(y)]^j.
• Probability a disease has a particular set of parameters Θ given a set of observed independent outbreak
  sizes X = (j_1, . . . , j_ℓ) and a prior belief P(Θ) (Section 2.4.1):
  P(Θ|X) = P(j_1|Θ) · · · P(j_ℓ|Θ)P(Θ) / Σ_{Θ′} P(j_1|Θ′) · · · P(j_ℓ|Θ′)P(Θ′), which can be solved
  numerically using our prior knowledge P(Θ) and our knowledge of the probability of each j_i given Θ.

Table 3: A summary of our results for application of PGFs to discrete-time SIS and SIR disease processes
in the infinite population limit. The function µ(x) is the PGF for the offspring distribution. The notation
[g] in the exponent denotes function composition g times. For example, µ^{[2]}(y) = µ(µ(y)).
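The recurrence for Ω_g in Table 3 can be iterated directly on polynomial coefficient arrays. The sketch below is independent of the paper's Invasion PGF package; the offspring distribution (p_0 = 0.75, p_3 = 0.25, so R_0 = 0.75) and the number of generations are illustration choices.

```python
import numpy as np

# Hedged numeric companion to the Table 3 recurrence
# Omega_g(z) = z * mu(Omega_{g-1}(z)) with Omega_0(z) = 1,
# using coefficient arrays (lowest degree first).
mu = np.array([0.75, 0.0, 0.0, 0.25])   # mu(y) = 0.75 + 0.25 y^3

def compose(p, q):
    """Coefficients of p(q(z)), via Horner's rule with polynomial products."""
    result = np.array([p[-1]])
    for c in p[-2::-1]:
        result = np.convolve(result, q)
        result[0] += c
    return result

omega = np.array([1.0])                  # Omega_0(z) = 1: no completed cases yet
for g in range(6):
    # Multiply by z (shift coefficients up by one) after composing with mu.
    omega = np.concatenate(([0.0], compose(mu, omega)))

# omega[r] is now the probability of exactly r completed cases by generation 6.
# The coefficient of z^1 is p0: the index case recovers having infected nobody.
assert abs(omega[1] - 0.75) < 1e-9
assert abs(omega.sum() - 1.0) < 1e-9     # Omega_g(1) = 1
```

Truncation is not needed here because subcritical coefficients stay small, but for many generations one would typically discard coefficients beyond the sizes of interest to keep the arrays short.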

• Probability of eventual extinction α given a single introduced infection (Section 3.1): α = min(1, γ/β).
• Probability of extinction by time t (Section 3.1.1): α(t), where α̇ = (β + γ)[µ̂(α) − α] and α(0) = 0.
• PGF of the distribution of the number of infected individuals at time t, assuming one infection at
  time 0 (Section 3.2): Φ(y, t), where Φ(y, 0) = y and Φ solves either

    ∂Φ/∂t = (β + γ)[µ̂(y) − y] ∂Φ/∂y    or    ∂Φ/∂t = (β + γ)[µ̂(Φ) − Φ] .

• PGF of the number of completed cases at time t (Section 3.3): Ω(z, t), where Ω(z, 0) = 1 and Ω solves

    ∂Ω/∂t = (β + γ)[µ̂(Ω, z) − Ω] .

• PGF of the joint distribution of the number of current and completed cases at time t, assuming one
  infection at time 0 (Section 3.3): Π(y, z, t), where Π(y, z, 0) = y and Π solves either

    ∂Π/∂t = (β + γ)[µ̂(y, z) − y] ∂Π/∂y    or    ∂Π/∂t = (β + γ)[µ̂(Π, z) − Π] .

• PGF of the final size distribution (Section 3.4): Ω_∞(z) = lim_{t→∞} Ω(z, t). This also solves Ω_∞(z) =
  µ̂(Ω_∞(z), z). If epidemics are possible this has a discontinuity at |z| = 1.
• Probability an outbreak infects exactly j individuals (Section 3.4):
  (1/j) (β^{j−1} γ^j / (β + γ)^{2j−1}) (2j−2 choose j−1).
• PGF for the joint distribution of the number susceptible and infected at time t for SIS dynamics in a
  population of size N (Section 3.5.1): Ξ(x, y, t), where Ξ solves

    ∂Ξ/∂t = (β/N)(y^2 − xy) ∂²Ξ/∂x∂y + γ(x − y) ∂Ξ/∂y .

• PGF for the joint distribution of the number susceptible and infected at time t for SIR dynamics in a
  population of size N (Section 3.5.2): Ξ(x, y, t), where Ξ solves

    ∂Ξ/∂t = (β/N)(y^2 − xy) ∂²Ξ/∂x∂y + γ(1 − y) ∂Ξ/∂y .

Table 4: A summary of our results for application of PGFs to the continuous-time disease process. We
assume individuals transmit with rate β and recover with rate γ. The functions µ̂(y) = (βy^2 + γ)/(β + γ)
and µ̂(y, z) = (βy^2 + γz)/(β + γ) are given in System (14).
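The ODE for α(t) in Table 4 is straightforward to integrate numerically. The sketch below uses simple Euler steps with illustrative rates β = 2 and γ = 1 (these values, the step size, and the horizon are choices for demonstration, not values from the text), and checks convergence to the eventual extinction probability min(1, γ/β).

```python
# Hedged sketch: integrate alpha' = (beta + gamma) * (mu_hat(alpha) - alpha),
# alpha(0) = 0, from Table 4 by Euler steps.  beta, gamma, the step size, and
# the horizon are illustration choices.
beta, gamma = 2.0, 1.0

def mu_hat(y):
    # PGF for the outcome of the next event: transmission (y^2) or recovery (constant).
    return (beta * y**2 + gamma) / (beta + gamma)

alpha, dt = 0.0, 0.001
for _ in range(int(30 / dt)):
    alpha += dt * (beta + gamma) * (mu_hat(alpha) - alpha)

# alpha(t) should approach min(1, gamma/beta) = 0.5 for these rates.
assert abs(alpha - 0.5) < 1e-4
```

Starting from α(0) = 0 the iteration increases monotonically toward the smaller root of µ̂(α) = α, mirroring the discrete-time argument that the smaller fixed point is the one with physical meaning.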

• Final size relation for an SIR epidemic assuming a vanishingly small fraction ρ randomly infected
  initially with ρN ≫ 1 (Section 4.2): r(∞) = 1 − χ(1 − r(∞)). [For standard assumptions, including
  the usual continuous-time assumptions, χ(x) = e^{−R_0(1−x)}.]
• Discrete-time number susceptible, infected, or recovered in a population with homogeneous suscepti-
  bility and given R_0, assuming an initial fraction ρ is randomly infected with ρN ≫ 1 (Section 4.3):
  for g > 0,

    S(g) = N(1 − ρ) e^{−R_0(1 − S(g−1)/N)} ,
    I(g) = N − S(g) − R(g) ,
    R(g) = R(g − 1) + I(g − 1) ,

  with the initial condition S(0) = (1 − ρ)N, I(0) = ρN, and R(0) = 0.
• Discrete-time number susceptible, infected, or recovered in a population with heterogeneous suscep-
  tibility for SIR disease after g generations with an initial fraction ρ randomly infected where ρN ≫ 1
  (Section 4.3): for g > 0,

    S(g) = N(1 − ρ) χ(S(g − 1)/N) ,
    I(g) = N − S(g) − R(g) ,
    R(g) = R(g − 1) + I(g − 1) ,

  with the initial condition S(0) = (1 − ρ)N, I(0) = ρN, and R(0) = 0.
• Continuous-time number susceptible, infected, or recovered for SIR disease as a function of time with
  an initial fraction ρ randomly infected where ρN ≫ 1, assuming individual u receives infection at rate
  βIκ_u/(N⟨K⟩) (Section 4.4): for t > 0,

    S(t) = (1 − ρ)N ψ(θ(t)) ,
    I(t) = N − S(t) − R(t) ,
    R(t) = −(γN⟨K⟩/β) ln θ(t) ,
    θ̇(t) = −(β/(N⟨K⟩)) I θ(t) ,

  with the initial condition θ(0) = 1.

Table 5: A summary of our results for application of PGFs to the final size and large-time dynamics of
SIR disease. The PGFs χ and ψ encode the heterogeneity in susceptibility. The PGF χ is the PGF of
the ancestor distribution (an ancestor of u is any individual who, if infected, would infect u). The PGF
ψ(x) = Σ_κ p(κ) x^κ encodes the distribution of the contact rates.
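The continuous-time system in Table 5 collapses nicely in the homogeneous case, where every individual has κ = 1, so ψ(x) = x and ⟨K⟩ = 1. The sketch below integrates that special case with Euler steps; β, γ, ρ, the step size, and the horizon are illustration choices, not values from the text. As a self-consistency check, the final attack rate should satisfy r = 1 − (1 − ρ) e^{−(β/γ) r}, the final size relation adjusted for a nonzero initial fraction ρ.

```python
import math

# Hedged sketch of the Table 5 continuous-time SIR system in the homogeneous
# case (psi(x) = x, <K> = 1).  All parameter values are illustration choices.
beta, gamma, rho, N = 2.0, 1.0, 0.01, 1.0e6
theta, dt = 1.0, 0.01
for _ in range(int(100 / dt)):
    S = (1 - rho) * N * theta                  # S(t) = (1 - rho) N psi(theta)
    R = -(gamma * N / beta) * math.log(theta)  # R(t) = -(gamma N <K> / beta) ln theta
    I = N - S - R
    theta += dt * (-(beta / N) * I * theta)    # theta' = -(beta/(N<K>)) I theta

# Self-consistency: the final attack rate r = R(infinity)/N should satisfy
# r = 1 - (1 - rho) * exp(-(beta/gamma) * r).
r = -(gamma / beta) * math.log(theta)
r_fp = 0.5
for _ in range(200):
    r_fp = 1 - (1 - rho) * math.exp(-(beta / gamma) * r_fp)
assert abs(r - r_fp) < 0.02
```

Note that dR/dt = γI follows directly from the θ equation, so the ODE system reaches rest exactly where the final size relation holds; the numeric gap above reflects only discretization error and the finite horizon.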

For each offspring distribution we list the PGF, the probability of j total infections, and the log-likelihood
of the parameters given j.

• Poisson: PGF e^{λ(y−1)}; probability (jλ)^{j−1} e^{−jλ}/j!; log-likelihood
  −jλ + (j − 1) log(jλ) − log(j!).
• Uniform: PGF y^λ; probability 1 if j = 1 and λ = 0, otherwise 0; log-likelihood 0 if j = 1 and λ = 0,
  otherwise −∞.
• Binomial: PGF (q + py)^n; probability (1/j) (nj choose j−1) p^{j−1} q^{nj−j+1}; log-likelihood
  log((nj)!) − log((nj − j + 1)!) − log(j!) + (j − 1) log p + (nj − j + 1) log q.
• Geometric: PGF p/(1 − qy); probability (1/j) (2j−2 choose j−1) p^j q^{j−1}; log-likelihood
  log((2j − 2)!) − log((j − 1)!) − log(j!) + j log p + (j − 1) log q.
• Negative binomial: PGF [q/(1 − py)]^{r̂}; probability (1/j) (r̂j+j−2 choose j−1) q^{r̂j} p^{j−1};
  log-likelihood log((r̂j + j − 2)!) − log((r̂j − 1)!) − log(j!) + r̂j log q + (j − 1) log p.

Table 6: The probability of j total infections in an infinite population for different offspring distributions,
derived using Theorem 2.7, and the corresponding log-likelihoods. For any one of these, if we sum the
probability of j over (finite) j, we get the probability that the outbreak remains finite in an infinite population.
This is particularly useful when inferring disease parameters from observed outbreak sizes (Section 2.4.1).
The parameters’ interpretations are given in Table 1.
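The Poisson row of Table 6 (the distribution sometimes called the Borel distribution) can be verified against the coefficient-extraction formula from Table 3: the probability of j total infections is the coefficient of y^{j−1} in [µ(y)]^j divided by j. The sketch below is an independent check; λ and the truncation length are illustration choices.

```python
import math
import numpy as np

# Hedged check of the Table 6 Poisson row: the closed form
# (j*lam)^(j-1) e^{-j*lam} / j!  should equal the coefficient of y^(j-1)
# in [mu(y)]^j divided by j, with mu the Poisson-offspring PGF.
lam, terms = 0.5, 40
p = np.array([math.exp(-lam) * lam**i / math.factorial(i) for i in range(terms)])

for j in range(1, 7):
    power = np.array([1.0])
    for _ in range(j):
        power = np.convolve(power, p)        # coefficients of [mu(y)]^j
    closed = (j * lam)**(j - 1) * math.exp(-j * lam) / math.factorial(j)
    assert abs(power[j - 1] / j - closed) < 1e-12

# With R0 = lam < 1 the outbreak is almost surely finite, so the
# probabilities should sum to (nearly) 1; logs avoid overflow for large j.
total = sum(math.exp((j - 1) * math.log(j * lam) - j * lam - math.lgamma(j + 1))
            for j in range(1, 201))
assert abs(total - 1.0) < 1e-9
```

The low-order coefficients of the truncated convolution are exact, since they only involve products of low-order terms, which is why the tight tolerance is safe.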

1.2 Exercises
We end each section with a collection of exercises. We have designed these exercises to give the reader more
experience applying PGFs and to help clarify some of the more subtle points.
Exercise 1.1 Except for the Poisson distribution handled in Example 1.1, derive the PGFs shown in
Table 1 directly from the definition f(x) = Σ_i r_i x^i.
For the negative binomial, it may be useful to use the binomial series

    (1 + δ)^η = 1 + ηδ + (η(η − 1)/2!) δ^2 + · · · + (η(η − 1) · · · (η − i + 1)/i!) δ^i + · · ·

using η = −r̂ and δ = −px.
Exercise 1.2 Consider the binomial distribution with n trials, each having success probability p = λ/n.
Using Table 1, show that the PGF for the binomial distribution converges to the PGF for the Poisson
distribution in the limit n → ∞, if λ is fixed.
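A numerical companion to Exercise 1.2 (it does not replace the analytic derivation the exercise asks for): with p = λ/n, the gap between the binomial PGF and the Poisson PGF shrinks as n grows. The value of λ and the grid of x values are illustration choices.

```python
import math

# Hedged numerical check for Exercise 1.2: with p = lam/n, the binomial PGF
# (q + p x)^n approaches the Poisson PGF e^{lam(x-1)} as n grows.
lam = 1.5
xs = [i / 20 for i in range(21)]          # grid on [0, 1]

def worst_gap(n):
    p = lam / n
    return max(abs((1 - p + p * x)**n - math.exp(lam * (x - 1))) for x in xs)

gaps = {n: worst_gap(n) for n in (10, 100, 1000)}
assert gaps[10] > gaps[100] > gaps[1000]  # gap shrinks as n increases
assert gaps[1000] < 1e-3
```

The observed decrease is consistent with the O(1/n) error one expects from expanding (1 + λ(x − 1)/n)^n.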

2 Discrete-time spread of a simple disease: early time


We begin with a simple model of disease transmission using a discrete-time setting. In the time step after
becoming infected, an infected individual causes some number of additional cases and then recovers. We let
pi denote the probability of causing exactly i infections (referred to as “offspring”) before recovering. It will
be useful to define the PGF for the offspring distribution

    µ(y) = Σ_{i=0}^∞ p_i y^i .                                              (1)

For results related to early extinction or early-time dynamics, we will assume that the population is large
enough and sufficiently well-mixed that the transmissions in successive generations are all independent events

Figure 1: A sample of 10 outbreaks starting with a bimodal distribution having R0 = 0.9 in which 3/10 of
the population causes 3 infections and the rest cause none. The top row denotes the initial states, showing
each of the 10 initial infections. An edge from one row to the next denotes an infection from the higher node
to the lower node. Most outbreaks die out immediately.

and unaffected by depletion of susceptible individuals. Before deriving our results for the early-time behavior
of our discrete-time model, we offer a summary in Table 3.
Often in disease spread we are interested in the expected number of infections caused by an infected
individual early in an outbreak, which we define to be R0 .
    R_0 = Σ_i i p_i = µ′(1)                                                 (2)

where µ′(x) = (d/dx) µ(x). The value of R_0 is related to disease dynamics, but it is not the only important
property of µ.
Example 2.1 We demonstrate a few sample outbreaks in Fig. 1. Here we take a bimodal case with R0 = 0.9
such that a proportion 0.3 of the population cause 3 infections and the remaining 0.7 cause none. Most of
the outbreaks die out immediately, but some persist, surviving multiple generations before extinction.
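A simulation in the spirit of Fig. 1 is easy to sketch: a discrete-time branching process with the bimodal offspring distribution (probability 0.3 of causing 3 infections, 0.7 of none, so R_0 = 0.9). The sample count, the cap on outbreak size, and the random seed below are illustration choices, not the settings behind the published figure.

```python
import random

# Hedged re-creation of the kind of simulation behind Fig. 1: a subcritical
# (R0 = 0.9) discrete-time branching process with bimodal offspring.
random.seed(1)

def outbreak_size(cap=10_000):
    """Total number ever infected, capped so a run cannot grow without bound."""
    active, total = 1, 1
    while active and total < cap:
        offspring = sum(3 if random.random() < 0.3 else 0 for _ in range(active))
        active = offspring
        total += offspring
    return total

sizes = [outbreak_size() for _ in range(20_000)]

# With R0 = 0.9 < 1 every outbreak dies out; a fraction p0 = 0.7 should die
# immediately, and the mean total size should be near 1/(1 - R0) = 10.
frac_immediate = sum(s == 1 for s in sizes) / len(sizes)
mean_size = sum(sizes) / len(sizes)
```

The mean-size prediction 1/(1 − R_0) follows from summing the expected generation sizes R_0^g over g, which foreshadows the PGF results derived below.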

Example 2.2 Throughout Section 2 we compare simulated SIR outbreaks with the theoretical predictions
which we calculate using the Python package Invasion PGF described in Appendix C. We assume that all
individuals are equally likely to be infected by any transmission, and we focus on R0 = 0.75 and R0 = 2.
For each R0 , we consider two distributions for the number of new infections an infected individual causes:

• a Poisson-distributed number of infections with mean R0 , or


• a bimodal distribution with either 0 or 3 infections, with the proportion chosen to give a mean of R0 .
The probabilities are p0 = 1 − R0 /3 and p3 = R0 /3 (R0 > 3 is impossible).
The bimodal distribution is similar to that of Fig. 1, but with different probabilities of 0 or 3. After an
individual chooses the number of infections to cause, the recipients are selected uniformly at random (with
replacement) from the population. If they are susceptible, an infection occurs at the next time step, otherwise
nothing happens. We use 5 × 10^5 simulations for N = 100 and N = 1000.
Figure 2 looks at the final size distribution. The distribution of the number infected in small outbreaks
(insets) is not significantly affected by the total population size. This is because they do not grow large
enough to “see” the system size. They would die out even in an infinite population. Large outbreaks, or
epidemics, on the other hand would grow without bound in an infinite population, and their growth is limited
by the finiteness of the population. We will see that (assuming homogeneous susceptibility and the large

[Figure 2 appears here: four panels comparing the Poisson (left column) and bimodal (right column) offspring
distributions for R_0 = 0.75 (top row) and R_0 = 2 (bottom row). Each panel plots probability density
against proportion infected for simulations with N = 100 and N = 1000, with insets showing the probability
of each number infected in small outbreaks.]

Figure 2: Simulated outcomes of SIR outbreaks in populations as described in Example 2.2. Outbreaks tend
to be either small or large. The typical number infected in small outbreaks (insets) is affected by the details
of the offspring distribution, but not the population size. The typical proportion infected in large outbreaks
(epidemics) appears to depend on the average number of transmissions an individual causes, but not the
population size or the offspring distribution. These observations will be explained later. These simulations
are reused throughout this section to show how PGFs capture different properties of the distributions.

population limit), the proportion infected in an SIR epidemic depends only on R0 .

2.1 Early extinction probability


A common misconception is that if R0 > 1 an epidemic is inevitable. In fact, if we are lucky an outbreak
can die out stochastically before the number infected is large. Conversely, if we are not lucky it may initially
grow faster than our deterministic models predict.
In any finite population a disease will eventually go extinct because the disease interferes with its own
spread. Our observations show that the typical final outcomes of an outbreak are either an “epidemic” which
grows until the number infected is limited by the finiteness of the population or a small outbreak which dies
out before it can see the system size. One of our first questions about a possible disease emergence is “what is
the probability that an outbreak will grow into an epidemic?” We focus on the equivalent question, “what is
the probability the outbreak goes extinct before causing an epidemic?”. We aim to calculate the probability
that the disease would go extinct if it never interferes with its own spread, or in other words, if it were
spreading through an unlimited population. Throughout we assume that disease is introduced with a single
randomly chosen index case.
The theory for the extinction probability in an unbounded population has been developed extensively in
the context of Galton–Watson processes [49]. It has been applied to infectious disease many times, e.g., [16,
section 21.8] and [19, 30].

2.1.1 Derivation as a fixed point equation
We present two derivations of the extinction probability. Our first is quicker, but gives less insight. We start
with the a priori observation that the extinction probability takes some value between 0 and 1 inclusive.
Our goal is to filter out the vast majority of these options by finding a property of the extinction probability
that most values between 0 and 1 do not have.
Let α be the probability of extinction if the spread starts from a single infected individual. Then from
Property A.1 of Appendix A we have α = Σ_i p_i α̂^i = µ(α̂), where α̂ is the probability that, in isolation,
an offspring of the initial infected individual would not cause an epidemic. Because we assume that the
offspring distribution of later cases is the same as for the index case, we must have α̂ = α, and so the
extinction probability solves α = µ(α).
We have established:
Theorem 2.1 Assuming that each infected individual produces an independent number of offspring i
chosen from a distribution having PGF µ(y), then α, the probability an outbreak starting from a single
infected individual goes extinct, satisfies
α = µ(α) . (3)
Not all solutions to x = µ(x) must give the extinction probability.

There can be more than one x solving x = µ(x). In fact 1 = µ(1) is always a solution, and from Property A.9 it follows that there is another solution if and only if R0 = µ′(1) > 1. In this case, our derivation of Theorem 2.1 does not tell us which of the solutions is correct. However, Section 2.1.2 shows that the correct solution is the smaller solution when it exists. More specifically, the extinction probability is α = lim_{g→∞} αg where αg = µ(αg−1) starting with α0 = 0. This gives a condition for a nonzero epidemic probability, namely R0 = µ′(1) = Σ_i i p_i > 1.
Example 2.3 We now consider the Poisson and bimodal offspring distributions described in Example 2.2.
We saw that typically an outbreak either affects a small proportion of the population (a vanishing fraction
in the infinite population limit) or a large number (a nonzero fraction in the infinite population limit).
By plotting the cumulative density function (cdf ) of proportion infected in Fig. 3, we extend our earlier
observations. The cdf is steep near zero (becoming vertical in the infinite population limit). Then it is
effectively flat for a while. Finally if R0 > 1 it again grows steeply at some proportion infected well above
0 (the size of epidemic outbreaks).
The plateau’s height is the probability that an outbreak dies out while small. Fig. 3 shows that this is
well-predicted by choosing the smaller of the solutions to x = µ(x).
For a fixed R0 > 1, the plateau’s height (i.e., the early extinction probability) depends on the details
of the offspring distribution and not simply R0 . However, the critical value at which the cdf increases for
the second time depends only on R0 . This suggests that even though the probability of an epidemic depends
on the details of the offspring distribution, the proportion infected in an SIR epidemic depends only on R0 ,
the reproductive number. We explore this in more detail in Section 4.2.
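To make this concrete, here is a minimal Python sketch (assuming, for illustration, a Poisson offspring distribution with PGF µ(y) = e^{R0(y−1)}) that finds the smaller root of x = µ(x) by the iteration αg = µ(αg−1) starting from α0 = 0:

```python
import math

def extinction_probability(mu, tol=1e-12, max_iter=10_000):
    """Iterate alpha_g = mu(alpha_{g-1}) from alpha_0 = 0.

    This converges to the smaller solution of x = mu(x), which is the
    probability the outbreak dies out in an infinite population.
    """
    alpha = 0.0
    for _ in range(max_iter):
        new = mu(alpha)
        if abs(new - alpha) < tol:
            return new
        alpha = new
    return alpha

# Poisson offspring distribution with R0 = 2: mu(y) = exp(R0*(y - 1))
R0 = 2.0
alpha = extinction_probability(lambda y: math.exp(R0 * (y - 1.0)))
# alpha is about 0.2032 for R0 = 2; for R0 <= 1 the iteration converges to 1.
```

The same function works for any offspring PGF, e.g. the bimodal distribution, by swapping in the appropriate `mu`.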

2.1.2 Derivation from an iterative process


In our second derivation, we calculate the probability that the outbreak dies out within g “generations”.
Then the probability the outbreak would die out after a finite number of steps in an infinite population
is simply the limit of this as g → ∞. In our counting of “generations”, we consider the index case to be
generation 0. An individual’s generation is equal to the number of transmissions occurring in the chain from
the index case to that individual.
We define αg to be the probability that the longest chain an index case will initiate has fewer than g
transmissions. So because there are always at least 0 transmissions, α0 = 0. The probability that there is
no transmission is by definition α1 . Recalling that the probability the index case causes zero infections is
p0 , we have
α1 = p0 = µ(0) = µ(α0 )
is the probability that the index case does not cause a chain of 1 or more transmissions. The probability that
all chains die out after at most 1 transmission (that is, there are no second generation cases) is the probability

Figure 3: Illustration of Theorem 2.1. The cumulative density function (cdf) for the total proportion
ever infected (effectively the integral of Fig. 2). For small R0 , all outbreaks die out without affecting a sizable
portion of the population. For larger R0 , there are many small outbreaks and many large outbreaks, but
very few outbreaks in between, so the cdf is flat in this range. The height of this plateau is the probability
the outbreak dies out while small. This is approximately the predicted extinction probability for an infinite
population (dashed). The probability of a small outbreak is different for the different distributions, but the
proportion infected corresponding to epidemics is the same (for given R0 ).

that the index case causes i infections, p_i, times the probability none of those i individuals causes further infections, α1^i, summed over all i. We introduce the notation µ[g](x) to be the result of iteratively applying µ to x g times, so µ[1](x) = µ(x) and for g > 1, µ[g](x) = µ(µ[g−1](x)). Then following Property A.1 we have

α2 = p0 + p1 α1 + p2 α1² + · · · = µ(α1) = µ[2](0) .

We generalize this by stating that the probability an initial infection fails to initiate any length g chains is equal to the probability that all of its i offspring fail to initiate a chain of length g − 1:

αg = Σ_i p_i (αg−1)^i = µ(αg−1) = µ[g](0) .

So the probability of not starting a chain of length at least g is found by iteratively applying the function µ g times to x = 0. Taking g → ∞ gives the extinction probability [19]:

α = lim_{g→∞} µ[g](0) . (4)

The fact that there is a biological interpretation of αg starting with α0 = 0 is important. It effectively
guarantees that the iterative process converges and that the speed of convergence reflects the typical speed
of extinction. Iteration appears to be an efficient way to solve x = µ(x) numerically and because of the

biological interpretation, we can avoid questions that might arise about whether there are multiple solutions
of x = µ(x) and, if so, which of them corresponds to the biological problem. Instead we simply iterate
starting from 0 and the result must converge to the probability that in an infinite population the outbreak
would go extinct in finite time, regardless of what other solutions x = µ(x) might have.
Exercise 2.1 shows that if µ(0) ̸= 0 then the limit of the sequence αg is 1 if R0 ≤ 1 and some α < 1
satisfying α = µ(α) if R0 > 1. This proves:

Theorem 2.2 Assume that each infected individual produces an independent number of offspring i chosen
from a distribution having PGF µ(y). Then
• The probability an outbreak goes extinct within g generations is

αg = µ[g] (0) . (5)

• The probability of extinction in an infinite population is

α = lim_{g→∞} αg .

• If R0 = µ′ (1) ≤ 1 and µ(0) ̸= 0 then α = 1. If R0 > 1 extinction occurs with probability α < 1.

Example 2.4 We now consider the Poisson and bimodal offspring distributions described in Example 2.2.
Figure 4 shows that starting with α0 = 0 and defining αg = µ(αg−1 ), the values of αg emerging from
the iterative process correspond to the observed probability outbreaks have gone extinct by generation g for
early values of g.
In the infinite population limit, this provides a match for all g. So this gives the probability the outbreak
goes extinct by generation g assuming it has not grown large enough to see the finite-size of the population
(i.e., assuming it has not become an epidemic). For SIR epidemics in the finite populations we use for
simulations, the plateaus eventually give way to extinction because eventually there are not enough remaining
susceptibles.

2.2 Early-time outbreak dynamics


We now explore the number of active infections present in generation g. Setting ϕi(g) to be the probability that i active infections exist at generation g, we define the PGF Φg(y) = Σ_i ϕi(g) y^i. Assuming at generation 0 there is a single infection (ϕ1(0) = 1), the initial condition is Φ0(y) = y. From inductive application of Property A.8 for composition of PGFs (Exercise 2.7) it is straightforward to conclude that for g > 0, Φg(y) = µ[g](y) where µ(y) is the PGF for the offspring distribution.

Theorem 2.3 Assuming that each infected individual produces an independent number of offspring chosen from a distribution with PGF µ(y), the number infected in the g-th generation has PGF

Φg(y) = Σ_ℓ ϕℓ(g) y^ℓ = µ[g](y) (6)

where ϕℓ(g) is the probability there are ℓ active infections in generation g. This does not provide information about the cumulative number infected.

It is worth highlighting that for general distributions, the calculation of the coefficients of Φg(y) may seem quite challenging. Luckily, it is not so difficult. Property A.3 states (taking i = √−1)

ϕℓ(g) ≈ (1/M) Σ_{m=1}^{M} Φg(R e^{2πim/M}) / (R^ℓ e^{2πiℓm/M})


Figure 4: Illustration of Theorem 2.2. Left: Cobweb diagrams showing convergence of iterations to
the predicted outbreak extinction probability (see Fig. 10). Right: Observed probabilities of no infections
remaining after each generation for simulations of Fig. 2 showing the probability of extinction by generation
g. Thin lines show the relation between the cobweb diagram and the extinction probabilities. The simulated
probability tends to rise quickly representing outbreaks that die out early on, then it remains steady at a
level representing the probability of outbreaks dying out while small. For R0 > 1 it increases again because
the epidemics burn through the finite population (and so the infinite population theory breaks down). The
values match the corresponding iteration of the cobweb diagrams.


Figure 5: Illustration of Theorem 2.3. Comparison of predictions and the simulations from Fig. 2 for the
number of active infections in the third generation. The bimodal case with N = 100 shows a clear impact
of population size as a sizable number of transmissions fail because the population is finite. The predictions
were made numerically using the summation in Property A.3.

for large M and any R ≤ 1. For each ym = R e^{2πim/M} we can calculate Φg(ym) = µ[g](ym) by numerically
iterating µ g times. Then for large enough M , this gives a remarkably accurate and efficient approximation
to the individual coefficients.
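As an illustration (a Python/NumPy sketch, again assuming a Poisson offspring distribution), the sum above with R = 1 is exactly an inverse discrete Fourier transform, so the coefficients of Φg(y) = µ[g](y) can be recovered with an FFT:

```python
import numpy as np

R0, g, M = 2.0, 3, 256
mu = lambda y: np.exp(R0 * (y - 1.0))  # Poisson offspring PGF

# Evaluate Phi_g = mu^[g] at M points y_m = e^{2*pi*i*m/M} on the unit circle.
y = np.exp(2j * np.pi * np.arange(M) / M)
Phi = y.copy()
for _ in range(g):
    Phi = mu(Phi)

# phi_l(g) = (1/M) sum_m Phi_g(y_m) e^{-2*pi*i*l*m/M}: a forward DFT divided by M.
phi = np.real(np.fft.fft(Phi)) / M

# phi[l] now approximates the probability of l active infections in generation g.
```

A built-in consistency check: phi[0] must equal αg = µ[g](0), the probability of extinction by generation g, and the coefficients must sum to Φg(1) = 1.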
Example 2.5 We demonstrate Theorem 2.3 in Fig. 5, using the simulations from Example 2.2. Simula-
tions and predictions are in excellent agreement.
There is a mismatch noticeable for the bimodal distribution with R0 = 2 particularly with N = 100,
which is a consequence of the fact that the population is finite. In stochastic simulations, occasionally an
individual receives multiple transmissions even early in the outbreak, but in the PGF theory this does not
happen.
We are often interested in the expected number of active infections in generation g, Σ_ℓ ℓ ϕℓ(g) (however, as seen below this is not the most relevant measure to use if R0 > 1). Property A.5 shows that this is given by ∂Φg(y)/∂y evaluated at y = 1. To calculate this we use Φg(1) = 1 for all g (Property A.4) and µ′(1) = R0. Then through induction and the chain rule we show that ∂Φg(y)/∂y |_{y=1} = R0^g:

∂/∂y Φg(y)|_{y=1} = ∂/∂y µ(Φg−1(y))|_{y=1}
= µ′(Φg−1(y)) × ∂/∂y Φg−1(y) |_{y=1}
= µ′(1) × R0^{g−1}
= R0^g .

We initialized the induction with the case g = 1, which is the definition of R0. If R0 < 1, this shows that we expect decay.
If R0 > 1, there is a more relevant measure. On average we see growth, but a sizable fraction of
outbreaks may go extinct, and these zeros are included in the average, which alters our prediction. This is
closely related to the “push of the past” effect observed in phylodynamics [39]. For policy purposes, we are
more interested in the expected size if the outbreak is not yet extinct because a response that is scaled to
deal with the average size including those that are extinct is either too big (if the disease has gone extinct)
or too small (if the disease has become established) [36]. It is very unlikely to be just right. The expected
number infected in generation g conditional on the outbreaks not dying out by generation g is R0^g/(1 − αg).
This has an important consequence. We can have different extinction probabilities for different offspring
distributions with the same R0 . The disease with a higher extinction probability tends to have considerably
more infections in those outbreaks that do not go extinct.
We have

Corollary 2.1 In the infinite population limit, the expected number infected in generation g starting from a single infection is

E[I]g = R0^g (7)

and the expected number starting from a single infection conditional on the disease persisting to generation g is

⟨I⟩g = R0^g / (1 − αg) . (8)

We can explore higher moments through taking more derivatives of Φg (y) and evaluating at y = 1.
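The two means in Corollary 2.1 are easy to compute side by side; a minimal Python sketch (assuming a Poisson offspring distribution with R0 = 2):

```python
import math

R0 = 2.0
mu = lambda x: math.exp(R0 * (x - 1.0))  # Poisson offspring PGF

alpha_g = 0.0
for g in range(1, 11):
    alpha_g = mu(alpha_g)                 # extinction probability by generation g
    unconditional = R0 ** g               # E[I]_g
    conditional = unconditional / (1.0 - alpha_g)  # <I>_g, conditioned on survival
```

Because αg > 0 for every g ≥ 1 here, the conditional mean always exceeds R0^g, illustrating why a response scaled to the unconditional average is too small when the disease has become established.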

2.3 Cumulative size distribution


We now look at the total number infected while the outbreak is small. There are multiple ways to calculate
how the cumulative size of small outbreaks is distributed. We look at two of these. The first focuses just
on the number of completed infections by generation g. The second calculates the joint distribution of the
number of completed infections and the number of active infections at generation g. Later we address the
distribution of final sizes.

2.3.1 Focused approach to find the cumulative size distribution


We begin by calculating just the number of completed infections at generation g. We define ωj (g) to be the
probability that there are j completed infections at generation g (by “completed” we only include individuals
who are no longer infectious in generation g). We will use PGFs of the variable z when focusing on completed
infections.
We define

Ωg(z) = Σ_j ωj(g) z^j

to be the PGF for the number of completed infections j at generation g. Although we use j to represent
recoveries, this model is still appropriate for SIS disease because we are interested in small outbreak sizes

in a well-mixed infinite population for which we can assume no previously infected individuals have been
reexposed. If the outbreak begins with a single infection, then
Ω0 (z) = 1 and Ω1 (z) = z
showing that the first individual (infectious during generation 0) completes his infection at the start of
generation 1. For generation 2 we have the initial individual and his direct offspring, so Ω2 (z) = zµ(z).
More generally, to calculate for g > 1, the completed infections consist of
• the initial infection
• the active infections in generation 1.
• any descendants of those active infections in generation 1 that will have recovered by generation g.
The distribution of the number of descendants of a generation 1 individual (including that individual) who
have recovered by generation g is given by Ωg−1(z). That is, each generation 1 individual and its descendants for the following g − 1 generations have the same distribution as an initial infection and its descendants after
g − 1 generations.
From Property A.8 the number of descendants by generation g (not counting the initial infection) that
have recovered is distributed like µ(Ωg−1 (z)). Accounting for the initial individual requires that we increment
the count by 1 which requires increasing the exponent of z by 1. So we multiply by z. This yields
Ωg (z) = zµ(Ωg−1 (z))
To sustain an outbreak up to generation g there must be at least one infection in each generation from
0 to g − 1. So any outbreak with fewer than g completed infections at generation g must be extinct. So the
coefficient of z j does not change once g > j. Thus we have shown

Theorem 2.4 Assuming a single initial infection in an infinite population, the PGF Ωg(z) = Σ_j ωj(g) z^j for the distribution of the number of completed infections at generation g > 1 is given by

Ωg(z) = zµ(Ωg−1(z)) (9)

with Ω1(z) = z. Once g > j, the coefficient ωj(g) is constant.

Example 2.6 We test Theorem 2.4 in Fig. 6, using the simulations from Example 2.2. Simulations and
predictions are in excellent agreement.
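The recursion in Theorem 2.4 is straightforward to evaluate numerically. A Python/NumPy sketch (assuming a Poisson offspring distribution) applies Ωg(z) = zµ(Ωg−1(z)) at points on the unit circle and then recovers the coefficients ωj(g) with the transform of Property A.3:

```python
import numpy as np

R0, g, M = 0.75, 5, 256
mu = lambda y: np.exp(R0 * (y - 1.0))      # Poisson offspring PGF

z = np.exp(2j * np.pi * np.arange(M) / M)  # sample points on the unit circle
Omega = z.copy()                           # Omega_1(z) = z
for _ in range(2, g + 1):
    Omega = z * mu(Omega)                  # Omega_g(z) = z mu(Omega_{g-1}(z))

omega = np.real(np.fft.fft(Omega)) / M     # omega_j(g): P(j completed infections)
```

Sanity checks follow from the theorem: ω0(g) = 0 (there is always at least one completed infection once g ≥ 1), and ω1(g) is fixed at p0 = e^{−R0} for all g > 1, since an outbreak with exactly one completed infection is one whose index case infected no one.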
Example 2.7 Expected cumulative size It is instructive to calculate the expected number of completed
infections at generation g. Note that Ωg(z)|_{z=1} = 1, µ(1) = 1, and µ′(1) = R0. We use induction to show that for g ≥ 1 the expected number of completed infections is Σ_{j=0}^{g−1} R0^j:

∂/∂z Ωg(z)|_{z=1} = ∂/∂z [z µ(Ωg−1(z))]|_{z=1}
= µ(Ωg−1(z)) + z µ′(Ωg−1(z)) ∂/∂z Ωg−1(z) |_{z=1}
= µ(1) + µ′(1) Σ_{j=0}^{g−2} R0^j
= 1 + R0 Σ_{j=0}^{g−2} R0^j
= Σ_{j=0}^{g−1} R0^j

This is in agreement with our earlier result that the expected number infected in generation j is R0^j.

Figure 6: Illustration of Theorem 2.4. Comparison of predictions with the simulations from Fig. 2 for the number of completed infections at the start of the third generation. The predictions were calculated using Property A.3.

This is

∂/∂z Ωg(z)|_{z=1} = (1 − R0^g)/(1 − R0) if R0 ≠ 1, and g if R0 = 1 .
As with our previous results, the sum shows a threshold behavior at R0 = 1. If R0 < 1, then in the limit
g → ∞, the expected cumulative outbreak size converges to the finite value 1/(1−R0 ). If R0 ≥ 1, it diverges.
This example shows

Corollary 2.2 In the infinite population limit the expected number of completed infections at the start of
generation g assuming a single randomly chosen initial infection is
∂/∂z Ωg(z)|_{z=1} = (1 − R0^g)/(1 − R0) if R0 ≠ 1, and g if R0 = 1 . (10a)

For R0 ≥ 1 this diverges as g → ∞. Otherwise it converges to 1/(1 − R0 ).

2.3.2 Broader approach


An alternate approach calculates both the current and cumulative size at generation g. We let πi,r(g) be the probability that there are i currently infected individuals and r completed infections in generation g. We define Πg(y, z) = Σ_{i,r} πi,r(g) y^i z^r, so y represents the active infections and z the completed infections.

Assume we know the values ig−1 and rg−1 for generation g − 1. Then rg is simply ig−1 + rg−1 and ig is distributed according to [µ(y)]^{ig−1}. So given those known ig−1 and rg−1, the distribution for the next generation would be [zµ(y)]^{ig−1} z^{rg−1}. Summing over all possible ig−1 and rg−1 yields

Πg(y, z) = Σ_{i,r} πi,r(g − 1) [zµ(y)]^i z^r = Πg−1(zµ(y), z)

with the initial condition Π0(y, z) = y.
The first few iterations are

Π1 (y, z) = zµ(y)
Π2 (y, z) = zµ(zµ(y))

and we can use induction on this to show that in general

Πg (y, z) = zµ(Πg−1 (y, z))

Theorem 2.5 Given a single initial infection in an infinite population, the PGF Πg(y, z) = Σ_{i,r} πi,r(g) y^i z^r for the joint distribution of the number of active infections i and completed infections r in generation g is given by

Πg(y, z) = zµ(Πg−1(y, z)) (11)

with Π0(y, z) = y.

Example 2.8 We demonstrate Theorem 2.5 in Fig. 7, using the same simulations as in Example 2.2.
Simulations and predictions are in excellent agreement.
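The bivariate recursion can be evaluated the same way as the univariate one. A Python/NumPy sketch (assuming a Poisson offspring distribution) applies Πg = zµ(Πg−1) on a grid of points on the two unit circles and recovers πi,r(g) with a 2D FFT:

```python
import numpy as np

R0, g, M = 0.75, 3, 64
mu = lambda x: np.exp(R0 * (x - 1.0))     # Poisson offspring PGF

m = np.arange(M)
y = np.exp(2j * np.pi * m / M)[:, None]   # active-infection variable
z = np.exp(2j * np.pi * m / M)[None, :]   # completed-infection variable

Pi = np.broadcast_to(y, (M, M)).astype(complex)  # Pi_0(y, z) = y
for _ in range(g):
    Pi = z * mu(Pi)                        # Pi_g = z mu(Pi_{g-1})

pi = np.real(np.fft.fft2(Pi)) / M**2       # pi[i, r]: P(i active, r completed)
```

As a check, the marginal over the completed infections, `pi.sum(axis=1)`, is the active-infection distribution of Theorem 2.3, since Πg(y, 1) = µ[g](y); in particular its first entry equals the extinction probability αg.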

2.4 Small outbreak final size distribution


There are many diseases for which there have been multiple small outbreaks in recent years but no large-scale
epidemics (such as Nipah, H5N1 avian influenza, pneumonic plague, monkeypox, and — prior to 2013 —
Ebola). A natural question emerges: what can we infer about the epidemic potential of these diseases? The
size distribution may help us to infer properties of the disease and in particular to estimate the probability
that R0 > 1 [8, 28, 40].
We have found that Ωg (z) gives the PGF for the number of completed infections by generation g. We
noted earlier that for a given r, once g > r, the coefficient of z r in Ωg (z) is fixed and equal to the probability
that the outbreak goes extinct after exactly r infections. Motivated by this, we look for the limit as g → ∞.
We define
Ω∞ (z) = lim Ωg (z)
g→∞

We expect this to be the PGF for the final size of the outbreaks.
We can express the pointwise limit3 as

Ω∞(z) = Σ_{r<∞} ωr z^r + ω∞ z^∞

where for r < ∞ the coefficient ωr is the probability an outbreak causes exactly r infections in an infinite
population. We use ω∞ to denote the probability that the outbreak is infinite in an infinite population (i.e.,
that it is an epidemic), and we interpret z ∞ as 1 when z = 1 and 0 for 0 ≤ z < 1. So if epidemics are
3 Although this converges for any given z in [0, 1], it does not do so “uniformly” if R0 > 1. That is, for R0 > 1 no matter how large g is, there are always some values of z < 1, but sufficiently close to 1, which are far from converged.

Figure 7: Illustration of Theorem 2.5. Comparison of predictions and simulations for the joint distribu-
tion of the number of current and completed infections at generation g = 3. The predictions were calculated
using Property A.3. Left: simulations from Fig. 2 for N = 1000 and Right: predictions (note vertical
scales on left and right are the same). Top to Bottom: Poisson R0 = 0.75, Bimodal R0 = 0.75, Poisson
R0 = 2, and Bimodal R0 = 2. The predictions match our observations, with some difference for two reasons:
1) because 5 × 105 simulations cannot resolve events with probabilities as small as 10−12 , but the PGF
approach can, and 2) due to finite-size effects as occasionally an individual receives multiple transmissions
even early on. The plots also show the marginal distributions, matching Figs. 5 and 6.

possible, Ω∞(z) has a discontinuity at z = 1, and the limit as z → 1 from below gives Σ_{r<∞} ωr = 1 − ω∞, which is the extinction probability α.
We now look for a recurrence relation for Ω∞(z) in the infinite population limit. Each offspring of the initial infection independently causes a set of infections. The distribution of these new infections (including the original offspring) also has PGF Ω∞(z). So the distribution of the number of descendants of the initial infection (but not including the initial infection) has PGF µ(Ω∞(z)). To include the initial infection, we must increase the exponent of z by one, which we do by multiplying by z. We conclude that Ω∞(z) = zµ(Ω∞(z)). Although we have shown that Ω∞(z) solves f(z) = zµ(f(z)), we have not shown that there is only one function that solves this.
We may be interested in the outbreak size distribution conditional on the outbreak going extinct. For
this we are looking at Ω∞ (z)/α for any z < 1, and at z = 1, this is simply 1. Note that if R0 < 1 then
α = 1.
Summarizing this we have
Theorem 2.6 Given a single initial infection in an infinite population, consider Ω∞(z), the PGF for the final size distribution: Ω∞(z) = Σ_{r<∞} ωr z^r + ω∞ z^∞ where z^∞ = 0 if |z| < 1 and 1 if |z| = 1.

• Then

Ω∞(z) = zµ(Ω∞(z)) if z ≠ 1, and Ω∞(1) = 1 . (12)

• We have lim_{z→1−} Ω∞(z) = α = 1 − ω∞. If R0 > 1 then Ω∞(z) is discontinuous at z = 1, with a jump discontinuity of ω∞, the probability of an epidemic.

• The PGF for the outbreak size distribution conditional on the outbreak being finite is Ω∞(z)/α for 0 < z < 1, and 1 at z = 1.

Perhaps surprisingly, we can often find the coefficients of Ω∞(z) analytically if µ(y) is known. We use a remarkable result showing that the probability of infecting exactly n individuals is equal to the coefficient of z^{n−1} in [µ(z)]^n [8, 15, 22, 51]. The theorem is

Theorem 2.7 Given an offspring distribution with PGF µ(y), for j < ∞ the coefficient of z^j in Ω∞(z) is (1/j) p^(j)_{j−1}, where [µ(y)]^j = Σ_i p^(j)_i y^i.
That is, the probability of having exactly j < ∞ infections in an outbreak starting from a single infection is 1/j times the coefficient of y^{j−1} in [µ(y)]^j.

We prove this theorem in Appendix B. The proof is based on observing that if we draw a sequence of j numbers from the offspring distribution, the probability they sum to j − 1 (corresponding to j − 1 transmissions and hence j infected individuals including the index case) is the coefficient of z^{j−1} in [µ(z)]^j. A fraction 1/j of these satisfy additional constraints needed to correspond to a valid transmission tree4 and thus the probability of a valid transmission tree with exactly j − 1 transmissions is 1/j times p^(j)_{j−1}.
Because the coefficient of y^{j−1} in [µ(y)]^j is (1/(j−1)!) (d/dy)^{j−1} [µ(y)]^j |_{y=0} (by Property A.2), we have that the probability of an outbreak of size j is

(1/j!) (d/dy)^{j−1} [µ(y)]^j |_{y=0} .

It is enticing to think there may be a similar theorem for coefficients of Π(y, z), but we are not aware of
one. The theorem has been generalized to models having multiple types of individuals [28].
4 If the index case causes 0 infections and its first offspring causes 1 infection, we have a sequence of two numbers that sum to 1, but it is biologically meaningless because it does not make sense to talk about the first offspring of an individual who causes no infections.

Figure 8: Illustration of Theorems 2.6 and 2.7. The final size of small outbreaks predicted by Theo-
rem 2.6 and by Theorem 2.7 as calculated using Property A.3 matches observations from the simulations in
Fig. 2 (see also insets of Fig. 2).

Example 2.9 We demonstrate Theorems 2.6 and 2.7 in Fig. 8, using the same simulations as in Exam-
ple 2.2.
Example 2.10 The PGF for the negative binomial distribution with parameters p and r̂ (with q = 1 − p) is

µ(y) = (q/(1 − py))^r̂ .

We can rewrite this as

µ(y) = q^r̂ (1 − py)^{−r̂} .

We will use this to find the final size distribution. We expand [µ(y)]^j = q^{r̂j}(1 − py)^{−r̂j} using the binomial series

(1 + δ)^η = 1 + ηδ + (η(η − 1)/2!) δ² + · · · + (η(η − 1) · · · (η − i + 1)/i!) δ^i + · · ·

which holds for integer or non-integer η.
Then with −py, −r̂j, and j − 1 playing the role of δ, η, and i:

[µ(y)]^j = q^{r̂j} (1 − py)^{−r̂j}
= q^{r̂j} [1 + r̂jpy + (r̂j(r̂j + 1)/2!) p²y² + · · · + (r̂j(r̂j + 1) · · · (r̂j + j − 2)/(j − 1)!) p^{j−1} y^{j−1} + · · ·]

[the negatives all cancel]. So the coefficient of y^{j−1} is q^{r̂j} p^{j−1} (r̂j + j − 2)!/((r̂j − 1)!(j − 1)!) = (r̂j + j − 2 choose j − 1) q^{r̂j} p^{j−1} (assuming r̂ is an integer). Looking at 1/j times this, we conclude that the probability an outbreak infects exactly j individuals is

(1/j) (r̂j + j − 2 choose j − 1) q^{r̂j} p^{j−1} .
A variation of this result for non-integer r̂ is commonly used in work estimating disease parameters [8, 40].
Exercise 2.12 generalizes the formula for this.
Applying Theorem 2.7 to several different families of distributions yields Table 6 for the probability of a
final size j.
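As a numerical check, we can compare this closed form against Theorem 2.7 applied directly, extracting the coefficient of y^{j−1} in [µ(y)]^j with the transform of Property A.3. A Python/NumPy sketch with assumed illustrative parameters r̂ = 2, p = 0.25:

```python
import numpy as np
from math import comb

p, r_hat, M = 0.25, 2, 512
q = 1.0 - p
mu = lambda y: (q / (1.0 - p * y)) ** r_hat   # negative binomial offspring PGF

ym = np.exp(2j * np.pi * np.arange(M) / M)    # unit-circle sample points

numeric, exact = [], []
for j in range(1, 8):
    # Theorem 2.7: P(final size = j) is 1/j times the coefficient of
    # y^{j-1} in [mu(y)]^j, extracted numerically here.
    coeffs = np.real(np.fft.fft(mu(ym) ** j)) / M
    numeric.append(coeffs[j - 1] / j)
    # Closed form derived above (integer r_hat).
    exact.append(comb(r_hat * j + j - 2, j - 1) * q ** (r_hat * j) * p ** (j - 1) / j)
```

For j = 1 both reduce to q^r̂ = p0, the probability the index case infects no one.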

2.4.1 Inference based on outbreak sizes


A major challenge in infectious disease modeling is inferring parameters of an infectious disease. In Section 2.4
we alluded to the use of PGFs to infer disease properties from observations of the size distribution of small
outbreaks. In this section we describe how to do this using a Bayesian approach, using the probabilities
given in Table 6. A number of researchers have used this approach to estimate disease parameters [8, 40, 28]. We assume that we know what type of distribution the offspring distribution follows, but that there are some
unknown parameters. We also assume that we have some prior belief about the probability of various
parameters. For practical purposes, we will assume that we have some finite number of possible parameter
values, each with a probability.
We use Bayes’ Theorem [21]:

P(Θ|X) = P(Θ, X)/P(X) = P(X|Θ)P(Θ)/P(X) (13)

Here we think of Θ as the specific parameter values and X as the observed data (typically the observed size
of an outbreak or sizes of multiple independent outbreaks, in which case P (X|Θ) comes from Theorem 2.7 or
Table 6). In our calculations we can simply use the fact that P (Θ|X) ∝ P (X|Θ)P (Θ) with a normalization
constant which can be dealt with at the end.
The prior for Θ is the probability distribution we assume for the parameter values before observing the
data, given by P (Θ). We often simply assume that all parameter values are equally probable initially.
The likelihood of the parameters Θ is defined to be P (X|Θ), the probability that we would observe X
for the given parameter values. If we are choosing between two sets of parameter values Θ1 and Θ2 and
the observations have consistently higher likelihood for Θ2 , then we intuitively expect that Θ2 is the more
probable parameter value.
In practice the likelihood may be very small, which can lead to numerical error. It is often useful to instead look at the log-likelihood5, log P(X|Θ). For example, if we have many observed outbreak sizes, the
likelihood P (X|Θ) under independence is the product of the probabilities of each individual outbreak size.
The likelihood is thus quite small (perhaps less than machine precision), while the log-likelihood is simply
the sum of the log-likelihoods of each individual observation.
We know that
log P (Θ|X) − C = log P (X|Θ) + log P (Θ)
where C is the logarithm of the proportionality constant 1/P (X) in Equation (13). If we have a prior and
the likelihood, the right hand side can be calculated. It is often possible (and advisable) to calculate the log
likelihood log P (X|Θ) directly rather than calculating P (X|Θ) and then taking the logarithm.
Exponentiating the right hand side and then finding the appropriate normalization constant will yield
P(Θ|X). Numerically the numbers may be very small when we exponentiate, so prior to exponentiating
it is advisable to add a constant value to all of the expressions. This constant is corrected for in the final
normalization step.
We now provide the steps for a numerical calculation of P (Θ|X) given the prior P (Θ), the observations
X, and the log likelihood log P (X|Θ).

1. For each Θ, calculate f(Θ) = log P(X|Θ) + log P(Θ).

5 Throughout this section, we assume that log is taken with base e.

2. Find the maximum fmax of f(Θ) over all Θ and subtract it to yield f̂(Θ) = log P(X|Θ) + log P(Θ) − fmax.
Note that f̂(Θ) ≤ 0, and this brings all of our numbers closer to zero.

3. Calculate g(Θ) = e^f̂(Θ). This will be proportional to P(Θ|X). Note that by using e^f̂(Θ) rather than
e^f(Θ) we have reduced the impact of roundoff error.

4. Find the normalization constant Σ_Θ′ g(Θ′). Then

P(Θ|X) = g(Θ) / Σ_Θ′ g(Θ′)

Note that if Θ comes from a continuous distribution rather than a discrete distribution, then the same
approach works, except that P is a probability density and the summation in the final step becomes an
integral.
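The four steps above translate directly into code. The sketch below is a minimal illustration (the names `posterior`, `log_prior`, and `log_likelihood` are our own choices, and the toy likelihood in the usage example is purely hypothetical; in practice `log_likelihood` would evaluate, e.g., the outbreak-size probabilities of Table 6):

```python
from math import exp, log

def posterior(thetas, log_prior, log_likelihood):
    """Numerically compute P(Theta|X) over a finite set of parameter values."""
    # Step 1: f(theta) = log P(X|theta) + log P(theta)
    f = {th: log_likelihood(th) + log_prior[th] for th in thetas}
    # Step 2: subtract the maximum so every shifted value is <= 0
    fmax = max(f.values())
    # Step 3: exponentiate the shifted values (reduces roundoff trouble)
    g = {th: exp(f[th] - fmax) for th in thetas}
    # Step 4: normalize so the probabilities sum to 1
    total = sum(g.values())
    return {th: g[th] / total for th in thetas}

# Toy usage: two candidate values of a per-event probability p, a flat prior,
# and a made-up likelihood p**3 (three "successes" observed).
post = posterior([0.3, 0.7], {0.3: log(0.5), 0.7: log(0.5)},
                 lambda p: 3 * log(p))
```

Here post[0.7] equals 0.7³/(0.3³ + 0.7³), exactly what direct normalization of the likelihoods would give; the subtraction of fmax only matters when the likelihoods are too small to exponentiate safely.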
Example 2.11 A frequent assumption is that the offspring distribution is negative binomial. Let us make
this assumption with unknown p and r̂.
To artificially simplify the problem, we assume that we know that there are only two possible pairs of
Θ = (p, r̂), namely Θ1 = (p1 , r̂1 ) = (0.02, 40) or Θ2 = (p2 , r̂2 ) = (0.03, 20), and that our a priori belief is
that they are equally probable.
After observing 2 independent outbreaks, with total sizes j1 = 8 and j2 = 7, we want to use our
observations to update P (Θ).
Using Table 6, writing C(n, k) for the binomial coefficient and q = 1 − p, f(Θ) for the two independent observations is

f(Θ) = log[ ∏_{j=7,8} (1/j) C(r̂j + j − 2, j − 1) q^{r̂j} p^{j−1} ] + log 0.5

     = Σ_{j=7,8} log[ (1/j) C(r̂j + j − 2, j − 1) q^{r̂j} p^{j−1} ] + log 0.5

     = Σ_{j=7,8} [ log((r̂j + j − 2)!) − log(j!) − log((r̂j − 1)!) + r̂j log q + (j − 1) log p ] + log 0.5

In problems like this, we will often encounter logarithms of factorials. Many programming languages provide
this, typically using Stirling’s approximation. For example, Python, R, and C++ all have a special function
lgamma which calculates the natural log of the absolute value of the gamma function.a We find

f (Θ1 ) ≈ −8.495
f (Θ2 ) ≈ −9.135

So f̂(Θ1) = 0 and f̂(Θ2) ≈ −0.640. Exponentiating, we have

g(Θ1 ) = 1
g(Θ2 ) ≈ 0.5277

So now

P(Θ1|X) ≈ 1/1.5277 ≈ 0.6546
P(Θ2|X) ≈ 0.5277/1.5277 ≈ 0.3454

So rather than the two parameter sets being equally probable, Θ2 is now about half as likely as Θ1 given the
observed data.
a The Gamma function is an analytic function that satisfies Γ(n + 1) = n! for positive integer n, so to calculate log(n!)

we use lgamma(n + 1).
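The numbers in Example 2.11 can be reproduced with a few lines of code. The sketch below is our own illustration: it assumes the negative binomial outbreak-size probabilities of Table 6 and evaluates the log factorials with lgamma:

```python
from math import lgamma, log, exp

def log_f(p, rhat, sizes, log_prior):
    # log P(X|Theta) + log P(Theta) for independent outbreaks of the given
    # sizes, using (1/j) C(rhat*j + j - 2, j - 1) q^(rhat*j) p^(j - 1)
    q = 1 - p
    total = log_prior
    for j in sizes:
        total += (lgamma(rhat * j + j - 1)       # log((rhat*j + j - 2)!)
                  - lgamma(j + 1)                # log(j!)
                  - lgamma(rhat * j)             # log((rhat*j - 1)!)
                  + rhat * j * log(q) + (j - 1) * log(p))
    return total

sizes = [8, 7]
f1 = log_f(0.02, 40, sizes, log(0.5))    # f(Theta_1), approximately -8.495
f2 = log_f(0.03, 20, sizes, log(0.5))    # f(Theta_2), approximately -9.135
fmax = max(f1, f2)
g1, g2 = exp(f1 - fmax), exp(f2 - fmax)
P1, P2 = g1 / (g1 + g2), g2 / (g1 + g2)  # approximately 0.6546 and 0.3454
```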

2.5 Generality of discrete-time results


Thus far we have measured time in generations. However, many models measure time differently and different
generations may overlap. For both SIS and SIR disease, our results above about final size distribution or
extinction probability still apply. To see this, we note first that our results have been derived assuming that
the population is infinite and well-mixed so no individuals receive multiple transmissions. Regardless of the
clock time associated with transmission and recovery, there is still a clear definition of the length of the
transmission chain to an infected individual. Once we group individuals by length of the transmission chain,
we get the generation-based model used above. This equivalence is studied more in [53, 31].

2.6 Exercises
Exercise 2.1 Monotonicity of αg
a. By considering the biological interpretation of αg , explain why the sequence of inequalities 0 = α0 ≤
α1 ≤ · · · ≤ 1 should hold. That is, explain why α0 = 0, why the αi form a monotonically increasing
sequence, and why all of them are at most 1.
b. Show that αg therefore converges to some non-negative limit α that is at most 1 and that α = µ(α).
c. Use Property A.9 to show that if µ(0) ̸= 0 there exists a unique α < 1 solving α = µ(α) if and only if
R0 = µ′ (1) > 1.
d. Assuming µ(0) ̸= 0, use Property A.9 to show that if R0 > 1 then αg converges to the unique α < 1
solving α = µ(α), and otherwise αg converges to 1.
Exercise 2.2 Use Theorem 2.2 to prove Theorem 2.1.
Exercise 2.3 Show that if µ(0) = 0, then limg→∞ αg = 0. By referring to the biological interpretation of
µ(0) = 0, explain this result.
Exercise 2.4 Find all PGFs µ(y) with R0 ≤ 1 and µ(0) = 0. Why were these excluded from Theorem 2.2?
Exercise 2.5 Larger initial conditions
Assume that disease is introduced with m infections rather than just 1, or that it is not observed by
surveillance until m infections are present. Assume that the offspring distribution PGF is µ(y).
a. If m is known, find the extinction probability.
b. If m is unknown but its distribution has PGF h(y), find the extinction probability.
Exercise 2.6 Extinction probability
Consider a disease in which p0 = 0.1, p1 = 0.2, p2 = 0.65, and p3 = 0.05 with a single introduced
infection.
a. Numerically approximate the probability of extinction within 0, 1, 2, 3, 4, or 5 generations up to five
significant digits (assuming an infinite population).

b. Numerically approximate the probability of eventual extinction up to five significant digits (assuming
an infinite population).
c. A surveillance program is being introduced, and detection will lead to a response. But it will not be
soon enough to affect the transmissions from generations 0 and 1. From then on p0 = 0.3, p1 = 0.4,
p2 = 0.3, and p3 = 0. Numerically approximate the new probability of eventual extinction after an
introduction in an unbounded population [be careful that you do the function composition in the right

order – review Properties A.1 and A.8].
Exercise 2.7 We look at two inductive derivations of Φg (y) = µ[g] (y). They are similar, but when adapted
to the continuous-time dynamics we study later, they lead to two different models. We take as given that
Φg−1 (y) gives the distribution of the number of infections caused after g−1 generations starting from a single
case. One argument is based on discussing the results of outcomes attributable to the infectious individuals
of generation g − 1 in the next generation. The other is based on the outcomes indirectly attributable to the
infectious individuals of generation 1 through their descendants after another g − 1 generations.
a. Explain why Property A.8 shows that Φg (y) = Φg−1 (µ(y)).

b. (without reference to a) Explain why Property A.8 shows that Φg (y) = µ(Φg−1 (y)).
Exercise 2.8 Use Theorem 2.3 to prove the first part of Theorem 2.2.
Exercise 2.9 How does Corollary 2.1 change if we start with k infections?
Exercise 2.10 Assume the PGF of the offspring size distribution is µ(y) = (1 + y + y²)/3.

a. What offspring size distribution yields this PGF?


b. Find the PGF Ωg (z) for the number of completed infections at 0, 1, 2, 3, and 4 generations [it may
be helpful to use a symbolic math program once g > 2.].
c. Check that for these cases, once g > r, the coefficient of z r does not change.
Exercise 2.11 By setting y = 1, use Theorem 2.5 to prove Theorem 2.4.
Exercise 2.12 Redo example 2.10 if r̂ is a real number, rather than an integer. It may be useful to use
the Γ–function, which satisfies Γ(x + 1) = xΓ(x) for any x and Γ(n + 1) = n! for integer n.
Exercise 2.13 Except for the negative binomial case done in example 2.10, derive the probabilities in
Table 6.
a. For the Poisson distribution, use Property A.2.
b. For the Uniform distribution, use Property A.2.
c. For the Binomial distribution, use the binomial theorem: (a + b)^c = Σ_{i=0}^{c} C(c, i) a^i b^{c−i}.

d. For the Geometric distribution, follow example 2.10 (noting that p and q interchange roles).
Exercise 2.14 To help model continuous-time epidemics, Section 3 will use a modified version of µ, which
in some contexts will be written as µ̂(y, z). To help motivate the use of two variables, we reconsider the
discrete case. We think of a recovery as an infected individual disappearing and giving birth to a recovered
individual and a collection of infected individuals. Look back at the discrete-time calculation of Ωg and Πg.
Define a two-variable version of µ as µ(y, z) = z Σ_i r_i y^i = zµ(y).
a. What is the biological interpretation of µ(y, z) = zµ(y)?
b. Rewrite the recursive relations for Ωg using µ(y, z) rather than µ(y).

c. Rewrite the recursive relations for Πg using µ(y, z) rather than µ(y).
The choice to use µ(y, z) versus µ(y) is purely a matter of convenience.
Exercise 2.15 Consider Example 2.11. Assume that a third outbreak is observed with 4 infections. Cal-
culate the probability of Θ1 and Θ2 given the data starting

a. with the assumption that P (Θ1 ) = P (Θ2 ) = 0.5 and X consists of the three observations j = 7, j = 8,
and j = 4.
b. with the assumption that P (Θ1 ) = 0.6546 and P (Θ2 ) = 0.3454 and X consists only of the single
observation j = 4.

c. Compare the results and explain why they should have the relation they do.
Exercise 2.16 Assume that we know a priori that the offspring distribution for a disease has a negative
binomial distribution with p = 0.02. Assume that our a priori knowledge of r̂ is that it is an integer
uniformly distributed between 1 and 80 inclusive. Given observed outbreaks of sizes 1, 4, 5, 6, and 10:

a. For each r̂, calculate P (r̂|X) where X is the observed outbreak sizes. Plot the result.
b. Find the probability that R0 = µ′ (1) is greater than 1.

3 Continuous-time spread of a simple disease


We now develop PGF-based approaches adapting the results above to continuous-time processes. In the
continuous-time framework, generations will overlap, so we need a new approach if we want to answer
questions about the probability of being in a particular state at time t rather than at generation g. Questions
about the final state of the population can be answered using the same techniques as for the discrete case,
but the techniques introduced here also apply and yield the same predictions. Unlike Section 2, we do not
do a detailed comparison with simulation.
In the continuous-time model, infected individuals have a constant rate of recovery γ and a constant rate
of transmission β. Then γ/(β + γ) is the probability that the first event is a recovery, while β/(β + γ) is the
probability it is a transmission. If the event is a recovery, then the individual is removed from the infectious
population. If the event is a transmission, then the individual is still available to transmit again, with the
same rate. If the recipient of a transmission is susceptible, it becomes infectious.
Unlike the discrete-time case, we do not focus on the offspring distribution. Rather, we focus on the
resulting number of infected individuals after an event. Early on we treat the process as if each infected
individual were removed and replaced by either 2 or 0 new infections. Although this is not the true process
(she either recovers or she creates one additional infection and remains present), it is equivalent as far as the
number of infections at any early time is concerned. We focus on a PGF for the outcome of the next event.
We define µ̂(y) = Σ_i p̂_i y^i and so

µ̂(y) = (β/(β + γ))y² + γ/(β + γ)    (14a)
When we are calculating the number of completed cases, it will be useful to have a two-variable version of µ̂:
µ̂(y, z) = (β/(β + γ))y² + (γ/(β + γ))z.    (14b)
β+γ β+γ
Most of the results in this section are the continuous-time analog of the discrete-time results above for
the infinite population limit. In the discrete-time approach we did not attempt to address outbreaks in finite
populations. However, we end the continuous-time section by deriving the equations for Ξ(x, y, t), the PGF
for the joint distribution of the number of susceptibles and active infections in a population of finite size N .

3.1 Extinction probability


For the extinction probability, we can apply the same methods derived in the discrete case to µ̂(y). Thus we
can find the extinction probability iteratively starting from the initial guess α0 = 0 and setting αg = µ̂(αg−1 ).
Exercises 3.1 and 3.2 each show that

Theorem 3.1 For the continuous-time Markovian model of disease spread in an infinite population, the
probability of extinction given a single initial infection is

α = min(1, γ/β) (15)
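Theorem 3.1 can be checked by the iteration described above. A minimal sketch, with illustrative rates β = 2 and γ = 1 (so the predicted extinction probability is γ/β = 1/2):

```python
beta, gamma = 2.0, 1.0          # illustrative rates with beta > gamma

def mu_hat(y):
    # PGF for the outcome of the next event, Eq. (14a)
    return (beta * y**2 + gamma) / (beta + gamma)

alpha = 0.0                     # alpha_0 = 0
for _ in range(200):            # alpha_g = mu_hat(alpha_{g-1})
    alpha = mu_hat(alpha)
# alpha has converged to min(1, gamma/beta) = 0.5
```

The iteration approaches the fixed point monotonically from below, so it never overshoots the extinction probability.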

3.1.1 Extinction probability as a function of time
In the discrete-time case, we were interested in the probability of extinction after some number of generations.
When we are using a continuous-time model, we are generally interested in “what is the probability of
extinction by time t?”
To answer this, we set α(t) to be the probability of extinction within time t. We will calculate the
derivative of α at time t by using some mathematical sleight of hand to find α(t + ∆t) − α(t). Then dividing
this by ∆t and taking ∆t → 0 will give the result. Our approach is closely related to backward Kolmogorov
equations (described later below).
We choose the time step ∆t to be small enough that we can assume that at most one event happens
between time 0 and ∆t. The probabilities of having 0, 1, or 2 infections are P (I(∆t) = 0) = γ∆t + O(∆t),
P (I(∆t) = 1) = 1 − (β + γ)∆t + O(∆t) and P (I(∆t) = 2) = β∆t + O(∆t) where the O notation means
that the error goes to zero fast enough that O(∆t)/∆t → 0 as ∆t → 0. The probability of having 3 or more
infections in the interval (that is, multiple transmission events) is O(∆t) as well.
If there are two infected individuals at time ∆t, then the probability of extinction by time t + ∆t is α(t)².
Similarly, if there is one infected at time ∆t, the probability of extinction by time t + ∆t is α(t); and if there
are no infections at time ∆t, then the probability of extinction by time t + ∆t is 1 = α(t)⁰. So up to O(∆t)
we have

α(t + ∆t) = Σ_{i≥0} P(I(∆t) = i) α(t)^i
          = [γ∆t]α(t)⁰ + [1 − (β + γ)∆t]α(t) + [β∆t]α(t)² + O(∆t)
          = α(t) + ∆t(β + γ)[µ̂(α(t)) − α(t)] + O(∆t)    (16)

Thus

α̇ = lim_{∆t→0} [α(t + ∆t) − α(t)]/∆t = (β + γ)[µ̂(α) − α]

and so

Theorem 3.2 Given an infinite population with constant transmission rate β and recovery rate γ, the
probability α(t) of extinction by time t, assuming a single initial infection at time 0, solves

α̇ = (β + γ)[µ̂(α) − α]    (17)

with µ̂(y) = (βy² + γ)/(β + γ) and the initial condition α(0) = 0.

We could solve this analytically (Exercise 3.4), but most results are easier to derive directly from the
ODE formulation.
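For readers who prefer a numerical check, Eq. (17) integrates easily with forward Euler. A minimal sketch with illustrative rates β = 2, γ = 1 (the long-time limit should approach min(1, γ/β) = 1/2):

```python
beta, gamma = 2.0, 1.0          # illustrative rates
dt, T = 1e-3, 20.0              # step size and final time

alpha = 0.0                     # alpha(0) = 0
for _ in range(round(T / dt)):
    mu = (beta * alpha**2 + gamma) / (beta + gamma)   # mu_hat(alpha)
    alpha += dt * (beta + gamma) * (mu - alpha)       # Eq. (17)
# by t = 20, alpha is essentially the eventual extinction probability 1/2
```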

3.2 Early-time outbreak dynamics


We now explore the number of infections at time t. We define the PGF

Φ(y, t) = Σ_i ϕi(t) y^i

where ϕi (t) is the probability of i actively infected individuals at time t. We will derive equations for the
evolution of Φ(y, t). We assume that Φ(y, 0) = y so a single infected individual exists at time 0.
Our goal is to derive equations telling us how Φ changes in time. We will use two approaches which were
hinted at in exercise 2.7, yielding two different partial differential equations. Although their appearance is
different, for the appropriate initial condition, their solutions are the same. These equations are called the
forward and backward Kolmogorov equations.
We briefly describe the analogy between the forward and backward Kolmogorov equations and exer-
cise 2.7:

• Our first approach finds the forward Kolmogorov equations. This is akin to exercise 2.7 where we
found Φg (y) by knowing the PGF Φg−1 (y) for the number infected in generation g − 1 and recognizing
that since the PGF for the number of infections each of them causes is µ(y), we must have Φg (y) =
Φg−1 (µ(y)).
• Our second approach finds the backward Kolmogorov equations which are more subtle and can be
derived similarly to how we derived the ODE for extinction probability in Theorem 3.2. This is akin to
exercise 2.7 where we found Φg (y) by knowing that the PGF for the number infected in generation 1 is
µ(y), and recognizing that after another g − 1 generations each of those creates a number of infections
whose PGF is Φg−1 (y) and so Φg (y) = µ(Φg−1 (y)).
For both approaches, we make use of the observation that for ∆t ≪ 1, we can write the PGF for the
number of infections resulting from a single infected individual at time t = 0 to be

Φ(y, ∆t) = y + (y² − y)β∆t + (1 − y)γ∆t + O(∆t).

This says that with probability approximately β∆t a transmission happens and we replace y by y 2 and with
probability approximately γ∆t a recovery happens and we replace y by 1. With probability O(∆t) multiple
events happen. We can rewrite this as

Φ(y, ∆t) = y + (β + γ)[µ̂(y) − y]∆t + O(∆t) .



Note that Φ(y, 0) = y and (∂/∂t)Φ(y, 0) = (β + γ)[µ̂(y) − y].
Both of our approaches rely on the observation that Φ(y, t1 + t2 ) = Φ(Φ(y, t2 ), t1 ) by Property A.8.
This states that if we take the PGF at time t1 , and then substitute for each y the PGF for the number of
descendants of a single individual after t2 units of time, the result is the PGF for the total number at time
t1 + t2 .

Forward equations For this we use Φ(y, t1 + t2 ) = Φ(Φ(y, t2 ), t1 ) with t2 playing the role of ∆t and t1
playing the role of t.
So Φ(y, t + ∆t) = Φ(Φ(y, ∆t), t). For small ∆t (and taking Φy to be the partial derivative of Φ with
respect to its first argument), we have

Φ(y, t + ∆t) = Φ(Φ(y, ∆t), t)
             = Φ(Φ(y, 0), t) + (∆t)Φy(Φ(y, 0), t)(∂/∂t)Φ(y, 0) + O(∆t)
             = Φ(y, t) + (∆t)(β + γ)[µ̂(y) − y](∂/∂y)Φ(y, t) + O(∆t).

Then

Φ̇(y, t) = lim_{∆t→0} [Φ(y, t + ∆t) − Φ(y, t)]/∆t
        = lim_{∆t→0} [Φ(y, t) + (∆t)(β + γ)[µ̂(y) − y](∂/∂y)Φ(y, t) + O(∆t) − Φ(y, t)]/∆t
        = (β + γ)[µ̂(y) − y](∂/∂y)Φ(y, t).
More generally, we can directly apply Property A.10 to get this result. Exercise 3.6 provides an alternate
direct derivation of these equations.

Backward equations In the backward direction we have Φ(y, t1 + t2 ) = Φ(Φ(y, t2 ), t1 ) with t2 playing
the role of t and t1 playing the role of ∆t.

So Φ(y, t + ∆t) = Φ(y, ∆t + t) = Φ(Φ(y, t), ∆t). Note that because Φ(y, 0) = y, we have Φ(Φ(y, t), 0) =
Φ(y, t). Thus for small ∆t, we expand Φ as a Taylor series in its second argument t:

Φ(y, t + ∆t) = Φ(Φ(y, t), ∆t)
             = Φ(Φ(y, t), 0) + (∆t)Φt(Φ(y, t), 0) + O(∆t)
             = Φ(y, t) + (∆t)Φt(Φ(y, t), 0) + O(∆t)
             = Φ(y, t) + (∆t)(β + γ)[µ̂(Φ(y, t)) − Φ(y, t)] + O(∆t).

To avoid ambiguity, we use Φt to denote the partial derivative of Φ with respect to its second argument t.
So

Φ̇(y, t) = lim_{∆t→0} [Φ(y, t + ∆t) − Φ(y, t)]/∆t
        = lim_{∆t→0} [Φ(y, t) + (∆t)(β + γ)[µ̂(Φ(y, t)) − Φ(y, t)] + O(∆t) − Φ(y, t)]/∆t
        = (β + γ)[µ̂(Φ(y, t)) − Φ(y, t)].

This result also follows directly from Property A.12.


So we have
Theorem 3.3 The PGF Φ(y, t) for the distribution of the number of current infections at time t assuming
a single introduced infection at time 0 solves

(∂/∂t)Φ(y, t) = (β + γ)[µ̂(y) − y](∂/∂y)Φ(y, t)    (18)

as well as

(∂/∂t)Φ(y, t) = (β + γ)[µ̂(Φ(y, t)) − Φ(y, t)]    (19)

both with the initial condition Φ(y, 0) = y.

It is perhaps remarkable that such seemingly different equations yield the same solution for the given
initial condition.
Example 3.1 The expected number of infections in the infinite population limit is given by [I] =
Σ_i i ϕi(t) = (∂/∂y)Φ(1, t). From this we have

(d/dt)[I] = (∂/∂t)(∂/∂y)Φ(y, t)|_{y=1}
          = (∂/∂y)[(β + γ)[µ̂(y) − y](∂/∂y)Φ(y, t)]|_{y=1}
          = [(β + γ)[µ̂′(y) − 1](∂/∂y)Φ(y, t) + (β + γ)[µ̂(y) − y](∂²/∂y²)Φ(y, t)]|_{y=1}
          = (β + γ)[µ̂′(1) − 1][I] + (β + γ)[µ̂(1) − 1](∂²/∂y²)Φ(y, t)|_{y=1}
          = (β + γ)[(2β)/(β + γ) − 1][I]
          = (β − γ)[I]

We used µ̂(1) = 1 to eliminate the (∂²/∂y²)Φ(y, t) term and replaced µ̂′(1) with 2β/(β + γ). Using this and
[I](0) = 1, we have

[I] = e^{(β−γ)t}.

This example proves

Corollary 3.1 In the infinite population limit, if a disease starts with a single infection, then the expected
number of active infections at time t is

[I] = e^{(β−γ)t}    (20)
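Corollary 3.1 can be checked against the forward Kolmogorov equations for the probabilities ϕj themselves (the system that Exercise 3.6 derives). A sketch with illustrative subcritical rates β = 1, γ = 2, truncating the number infected at a level where essentially no probability ever arrives:

```python
from math import exp

beta, gamma = 1.0, 2.0     # illustrative subcritical rates
jmax = 60                  # truncation: probability of ever reaching 60
                           # simultaneous infections is negligible here
dt, T = 1e-4, 1.0

phi = [0.0] * (jmax + 1)
phi[1] = 1.0               # one initial infection

for _ in range(round(T / dt)):
    dphi = [0.0] * (jmax + 1)
    for j in range(jmax + 1):
        up = beta * j if j < jmax else 0.0   # transmission: j -> j+1
        down = gamma * j                     # recovery: j -> j-1
        dphi[j] -= (up + down) * phi[j]
        if j < jmax:
            dphi[j + 1] += up * phi[j]
        if j > 0:
            dphi[j - 1] += down * phi[j]
    phi = [p + dt * d for p, d in zip(phi, dphi)]

mean_I = sum(j * p for j, p in enumerate(phi))
# Corollary 3.1 predicts mean_I close to exp((beta - gamma) * T) = exp(-1)
```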

3.3 Cumulative and current outbreak size distribution


Let πi,r(t) be the probability of having i currently infected individuals and r completed infections at time t.
We define Π(y, z, t) = Σ_{i,r} πi,r(t) y^i z^r to be the PGF at time t. We have Π(y, z, 0) = y. As before we assume
the population is large enough that the spread of the disease is not limited by the size of the population.
We give an abbreviated derivation of the Kolmogorov equations for Π. A full derivation is requested as
an exercise.

Forward Kolmogorov formulation To derive the forward Kolmogorov equations for the PGF Π(y, z, t),
we use Property A.11, noting that all transition rates are proportional to i. The rate of transmission is βi
and the rate of recovery is γi. There are no interactions to consider. So

(∂/∂t)Π(y, z, t) = (β + γ)[βy²/(β + γ) + γz/(β + γ) − y](∂/∂y)Π(y, z, t)
                 = (β + γ)[µ̂(y, z) − y](∂/∂y)Π(y, z, t)

Backward Kolmogorov formulation To derive the backward Kolmogorov equations for the PGF Π, we
use a modified version of Property A.12 to account for two types of individuals (Exercise A.14, with events
proportional only to the infected individuals). We find

Π̇(y, z, t) = (β + γ)[µ̂(Π(y, z, t), z) − Π(y, z, t)] .

Combining our backward and forward Kolmogorov equation results, we get

Theorem 3.4 Assuming a single initial infection in an infinite population, the PGF Π(y, z, t) for the
joint distribution of the number of current and completed infections at time t solves

(∂/∂t)Π(y, z, t) = (β + γ)[µ̂(y, z) − y](∂/∂y)Π(y, z, t)    (21)

as well as

(∂/∂t)Π(y, z, t) = (β + γ)[µ̂(Π(y, z, t), z) − Π(y, z, t)]    (22)

both with the initial condition Π(y, z, 0) = y.

It is again remarkable that these seemingly very different equations have the same solution.
Example 3.2 The expected number of completed infections at time t is

[R] = Σ_{j,k} k πj,k = (∂/∂z)Π(y, z, t)|_{y=z=1}

(although we use R, this approach is equally relevant for counting completed infections in the SIS model
because of the infinite population assumption). Its evolution is given by

(d/dt)[R] = (∂/∂t)(∂/∂z)Π(y, z, t)|_{y=z=1}
          = (∂/∂z)[(β + γ)[µ̂(y, z) − y](∂/∂y)Π(y, z, t)]|_{y=z=1}
          = (β + γ)[(∂/∂z)µ̂(y, z)(∂/∂y)Π(y, z, t) + [µ̂(y, z) − y](∂/∂z)(∂/∂y)Π(y, z, t)]|_{y=z=1}
          = (β + γ)[(γ/(β + γ))(∂/∂y)Π(y, z, t) + 0]|_{y=z=1}
          = γ[I]

where we use the fact that µ̂(1, 1) = 1, (∂/∂z)µ̂(y, z) = γ/(β + γ), and [I] = (∂/∂y)Π(y, z, t)|_{y=z=1}. Our result
says that the rate of change of the expected number of completed infections is γ times the expected number
of current infections.
This example proves

Corollary 3.2 In the infinite population limit the expected number of recovered individuals as a function
of time solves

(d/dt)[R] = γ[I]    (23)

We will see that this holds even in finite populations.
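A numerical illustration of Corollaries 3.1 and 3.2 together: the forward equations for the joint probabilities πi,r move probability from (i, r) to (i + 1, r) at rate βi (transmission) and to (i − 1, r + 1) at rate γi (recovery). With illustrative subcritical rates β = 1, γ = 2, integrating d[R]/dt = γ[I] = γe^{(β−γ)t} predicts [R](1) = γ(e^{β−γ} − 1)/(β − γ) = 2(1 − e^{−1}), which a truncated version of the system should reproduce:

```python
from math import exp

beta, gamma = 1.0, 2.0          # illustrative subcritical rates
imax, rmax = 15, 40             # truncation levels (tails are negligible)
dt, steps = 5e-4, 2000          # integrate to T = 1

# pi[i][r]: probability of i current and r completed infections
pi = [[0.0] * (rmax + 1) for _ in range(imax + 1)]
pi[1][0] = 1.0                  # one initial infection, none completed

for _ in range(steps):
    d = [[0.0] * (rmax + 1) for _ in range(imax + 1)]
    for i in range(1, imax + 1):            # i = 0 states have no events
        for r in range(rmax + 1):
            p = pi[i][r]
            if p == 0.0:
                continue
            if i < imax:                    # transmission: (i,r) -> (i+1,r)
                d[i][r] -= beta * i * p
                d[i + 1][r] += beta * i * p
            if r < rmax:                    # recovery: (i,r) -> (i-1,r+1)
                d[i][r] -= gamma * i * p
                d[i - 1][r + 1] += gamma * i * p
    for i in range(imax + 1):
        for r in range(rmax + 1):
            pi[i][r] += dt * d[i][r]

mean_I = sum(i * sum(row) for i, row in enumerate(pi))
mean_R = sum(r * pi[i][r] for i in range(imax + 1) for r in range(rmax + 1))
```

mean_I should be close to e^{−1} and mean_R close to 2(1 − e^{−1}); the small discrepancies come from the Euler step and truncation, both of which are far below the assertion tolerances here.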

3.4 Small outbreak final size distribution


We define

Ω∞(z) = (Σ_{j<∞} ωj z^j) + ω∞ z^∞

to be the PGF of the distribution of outbreak final sizes in an infinite population, with ω∞ z^∞ representing
epidemics and, for j < ∞, ωj representing the probability of an outbreak that infects exactly j individuals.
We use the convention that z ∞ = 0 for z < 1 and 1 for z = 1. To calculate Ω∞ , we make observations
that the outbreak size coming from a single infected individual is 1 if the first thing that individual does is
a recovery or it is the sum of the outbreak sizes of two infected individuals if the first thing the individual
does is to transmit (yielding herself and her offspring).
Thus we have

Ω∞(z) = (β/(β + γ))[Ω∞(z)]² + (γ/(β + γ))z
      = µ̂(Ω∞(z), z)

As for the discrete-time case we may solve this iteratively, starting with the guess Ω∞ (z) = z. Once n
iterations have occurred, the first n coefficients of Ω∞ (z) remain constant. Note that unlike the discrete
case, here Ω∞ (z) ̸= z µ̂(Ω∞ (z)). This yields

Theorem 3.5 The PGF Ω∞(z) = Σ_j ωj z^j + ω∞ z^∞ for the final size distribution assuming a single initial
infection in an infinite population solves

Ω∞(z) = µ̂(Ω∞(z), z)    (24)

with Ω∞(1) = 1. This function is discontinuous at z = 1. For the final size distribution conditional on
the outbreak being finite, the PGF is continuous and equals Ω∞(z)/α for 0 ≤ z < 1 and 1 for z = 1.

As in the discrete-time case, we can find the coefficients of Ω∞ (z) analytically.

Theorem 3.6 Consider continuous-time outbreaks with transmission rate β and recovery rate γ in an
infinite population with a single initial infection. The probability the outbreak causes exactly j infections
for j < ∞ [that is, the coefficient of z^j in Ω∞(z)] is

ωj = (1/j) (β^{j−1} γ^j / (β + γ)^{2j−1}) C(2j − 2, j − 1)

where C(2j − 2, j − 1) is a binomial coefficient.

We prove this theorem in appendix B. The proof is based on observing that if there are j total infected
individuals, this requires j − 1 transmissions and j recoveries. Of the sequences of 2j − 1 events that have
the right number of recoveries and transmissions, a fraction 1/(2j − 1) of these satisfy additional constraints
required to be a valid sequence leading to j infections (the sequence cannot lead to 0 infections prior to the
last step). Alternately, we can note that the offspring distribution is geometric and use Table 6.
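The iterative scheme for Ω∞ and the closed form of Theorem 3.6 can be checked against each other. A sketch (illustrative rates β = 2, γ = 1) that represents Ω∞(z) by its first few coefficients:

```python
from math import comb

beta, gamma = 2.0, 1.0     # illustrative rates
nmax = 30                  # track coefficients of z^0 ... z^nmax
a, b = beta / (beta + gamma), gamma / (beta + gamma)

# iterate Omega <- mu_hat(Omega, z) = a*Omega^2 + b*z, starting from Omega = z
omega = [0.0] * (nmax + 1)
omega[1] = 1.0
for _ in range(nmax):
    sq = [0.0] * (nmax + 1)             # truncated square of the polynomial
    for m in range(nmax + 1):
        if omega[m] == 0.0:
            continue
        for n in range(nmax + 1 - m):
            sq[m + n] += omega[m] * omega[n]
    omega = [a * c for c in sq]
    omega[1] += b

# Theorem 3.6 closed form for j = 1, ..., 10
exact = [comb(2 * j - 2, j - 1) * beta**(j - 1) * gamma**j
         / (j * (beta + gamma)**(2 * j - 1)) for j in range(1, 11)]
```

After g iterations the first g coefficients have stopped changing (the coefficient of z^j only depends on lower-order coefficients of the previous iterate), so omega[1:11] should agree with exact to rounding error.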

3.5 Full dynamics in finite populations


We now derive the PGFs for continuous time SIS and SIR outbreaks in a finite population.
PGF-based techniques are easiest when we can treat events as independent. In the continuous-time
model, when we look at the system in a given state, each event is independent of the others. Once the next
event happens the possible events change, but conditional on the new state, they are still independent. Thus
we can use the forward Kolmogorov approach (the backward Kolmogorov approach will not work because
descendants of any individual are not independent).
We do not look at the discrete-time version because in a single time step, multiple events can occur, some
of which affect one another. So we would lose independence as we go from one time step to another.
For these reasons we focus on the forward Kolmogorov formulations for the continuous-time models.
Much of our approach here was derived previously in [5, 7]. See also [1].
For a given population size N , we let s, i, and r be the number of susceptible, infected and immune
(removed) individuals. For the SIS model r = 0 and we have s + i = N while for the SIR model we have
s + i + r = N . We let ps,i be the probability of s susceptible and i infected individuals.

3.5.1 SIS
We start with the SIS model. We set ξs,i(t) to be the probability of s susceptible and i actively infected
individuals at time t. We define the PGF for the joint distribution of susceptible and infected individuals

Ξ(x, y, t) = Σ_{s,i} ξs,i(t) x^s y^i

At rate (β/N)si, successful transmissions occur, moving the system from the state (s, i) to (s − 1, i + 1), which is
equivalent to removing one susceptible individual and one infected individual, and replacing them with two
infected individuals. Following property A.11, this is represented by

(β/N)(y² − xy)(∂/∂x)(∂/∂y)Ξ.
At rate γi, recoveries occur, moving the system from the state (s, i) to (s + 1, i − 1), which is equivalent to
removing one infected individual and replacing it with a susceptible individual. This is represented by

γ(x − y)(∂/∂y)Ξ.

So the PGF solves

Ξ̇ = (β/N)(y² − xy)(∂/∂x)(∂/∂y)Ξ + γ(x − y)(∂/∂y)Ξ

It is sometimes useful to rewrite this as

Ξ̇ = (y − x)[(β/N)y(∂/∂x) − γ](∂/∂y)Ξ
We have

Theorem 3.7 For SIS dynamics in a finite population we have

(∂/∂t)Ξ = (β/N)(y² − xy)(∂/∂x)(∂/∂y)Ξ + γ(x − y)(∂/∂y)Ξ    (25)

We can use this to derive equations for the expected number of susceptible and infected individuals.
Example 3.3 We use [S] and [I] to denote the expected number of susceptible and infected individuals at
time t. We have

[S] = Σ_{s,i} s ξsi(t) = Σ_{s,i} s ξsi 1^{s−1} 1^i = (∂/∂x)Ξ(1, 1, t)

[I] = Σ_{s,i} i ξsi(t) = Σ_{s,i} i ξsi 1^s 1^{i−1} = (∂/∂y)Ξ(1, 1, t)

We also define the expected value of the product si,

[SI] = Σ_{s,i} si ξsi(t) = (∂/∂x)(∂/∂y)Ξ(1, 1, t).

Then we have

[Ṡ] = (∂/∂t)(∂/∂x)Ξ(1, 1, t)
    = (∂/∂x)(∂/∂t)Ξ(x, y, t)|_{x=y=1}
    = (∂/∂x)[(y − x)[(β/N)y(∂/∂x) − γ](∂/∂y)Ξ]|_{x=y=1}
    = [(y − x)(∂/∂x)[(β/N)y(∂/∂x) − γ](∂/∂y)Ξ − [(β/N)y(∂/∂x) − γ](∂/∂y)Ξ]|_{x=y=1}
    = −(β/N)[SI] + γ[I]

In the final line, we eliminated the first term because y − x is zero at x = y = 1. Similar steps show that

[İ] = (β/N)[SI] − γ[I]
but the derivation is faster if we simply note [S] + [I] = N is constant. This proves

Corollary 3.3 For SIS disease, the expected number infected and susceptible solves

(d/dt)[S] = −(β/N)[SI] + γ[I]    (26)
(d/dt)[I] = (β/N)[SI] − γ[I]    (27)

where [SI] is the expected value of the product si.
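Corollary 3.3 can be verified directly from the master equation behind Theorem 3.7, since the moment identity holds state by state: one forward-Euler step changes [I] by exactly dt times (β/N)[SI] − γ[I]. A sketch with illustrative values N = 20, β = 2, γ = 1:

```python
beta, gamma, N = 2.0, 1.0, 20   # illustrative parameter values
dt = 1e-3

# p[i]: probability of i infected (so s = N - i susceptible) individuals
p = [0.0] * (N + 1)
p[1] = 1.0                      # one initial infection

def step(p):
    d = [0.0] * (N + 1)
    for i in range(N + 1):
        inf = beta * (N - i) * i / N    # (s,i) -> (s-1,i+1)
        rec = gamma * i                 # (s,i) -> (s+1,i-1)
        d[i] -= (inf + rec) * p[i]
        if i + 1 <= N:
            d[i + 1] += inf * p[i]
        if i - 1 >= 0:
            d[i - 1] += rec * p[i]
    return [pi + dt * di for pi, di in zip(p, d)]

for _ in range(1000):           # integrate to t = 1
    p = step(p)

mean_I = sum(i * pi for i, pi in enumerate(p))
mean_SI = sum((N - i) * i * pi for i, pi in enumerate(p))
dI_dt = (sum(i * pi for i, pi in enumerate(step(p))) - mean_I) / dt
# dI_dt matches (beta/N)*mean_SI - gamma*mean_I to rounding error
```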

3.5.2 SIR
Now we consider the SIR model. A review of various techniques (including PGF-based methods) to find
the final size distribution of outbreaks in finite-size populations can be found in [23]. Here we focus on the
application of PGFs to find the full dynamics. For a given s and i, infection occurs at rate βsi/N . It appears
as a departure from the state (s, i) and entry into (s − 1, i + 1). Following property A.11, this is captured by

(β/N)(y² − xy)(∂/∂x)(∂/∂y)Ξ.

Recovery is captured by

γ(1 − y)(∂/∂y)Ξ

[note the difference from the SIS case in the recovery term]. So we have

Theorem 3.8 For SIR dynamics in a finite population we have

(∂/∂t)Ξ = (β(y² − xy)/N)(∂/∂x)(∂/∂y)Ξ + γ(1 − y)(∂/∂y)Ξ    (28)

We follow similar steps to example 3.3 to derive equations for [S] and [I] in Exercise 3.16. The result of
this exercise should show
Corollary 3.4 For SIR disease, the expected number of susceptible, infected, and recovered individuals
solves

(d/dt)[S] = −(β/N)[SI]    (29)
(d/dt)[I] = (β/N)[SI] − γ[I]    (30)
(d/dt)[R] = γ[I]    (31)

where [SI] is the expected value of the product si.

3.6 Exercises
Exercise 3.1 Extinction Probability
Let β and γ be given with µ̂(y) = (βy² + γ)/(β + γ).
a. Analytically find solutions to y = µ̂(y).
b. Assume β < γ. Find all solutions in [0, 1].
c. Assume β > γ. Find all solutions in [0, 1].
Exercise 3.2 Consistency with discrete-time formulation.
Although we have argued that a transmission in the continuous-time disease transmission case can be
treated as if a single infected individual has two infected offspring and then disappears, this is not what
actually happens. In this exercise we look at the true offspring distribution of an infected individual before
recovery, and we show that the ultimate predictions of the two versions are equivalent.
Consider a disease in which individuals transmit at rate β and recover at rate γ. Let pi be the probability
an infected individual will cause exactly i new infections before recovering.

a. Explain why p0 = γ/(β + γ).

b. Explain why pi = β^i γ/(β + γ)^{i+1}. So the pi form a geometric distribution.
c. Show that µ(y) = Σ_i pi y^i can be expressed as µ(y) = γ/(β + γ − βy). [This definition of µ without
the hat corresponds to the discrete-time definition]
d. Show that the solutions to y = µ(y) are the same as the solutions to y = µ̂(y) = (βy² + γ)/(β + γ).
So the extinction probability can be calculated either way. (You do not have to find the solutions to
do this, you can simply show that the two equations are equivalent).
Exercise 3.3 Relation with R0
Take µ(y) = γ/(β + γ − βy) as given in exercise 3.2 and µ̂(y) = (βy² + γ)/(β + γ).
a. Show that µ′ (1) ̸= µ̂′ (1) in general.
b. Show that when R0 = µ′ (1) = 1, then µ′ (1) = µ̂′ (1) = 1. So both are still threshold parameters.
Exercise 3.4 Revisiting eventual extinction probability.
We revisit the results of exercise 3.1 using Eq. (17) (without solving it).

a. By substituting for µ̂(α), show that α̇ = (1 − α)(γ − βα).


We have α(0) = 0. Taking this initial condition and expression for α̇, show that
b. α → 1 as t → ∞ if β < γ (i.e., R0 < 1) and

c. α → γ/β as t → ∞ if β > γ (i.e., R0 > 1).


d. Set up (but do not solve) a partial fraction integration that would give α(t) analytically.
Exercise 3.5 This exercise is intended to help with understanding the backward Kolmogorov equations.
Let ϕi (t) denote the probability of having i active infections at time t given that at time 0 there was a
single infection [ϕ1 (0) = 1]. We have ϕ0 (t) = α(t). We extend the derivation of Eq. (16) to ϕ1 . Assume
ϕ0 (t0 ) and ϕ1 (t0 ) are known.

a. Following the derivation of Eq. (16), approximate ϕ0 (∆t), ϕ1 (∆t), and ϕ2 (∆t) for small ∆t.
b. From biological grounds explain why if there are 0 infections at time ∆t then there are also 0 infections
at time t0 + ∆t.
c. If there is 1 infection at time ∆t, what is the probability of 1 infection at time t0 + ∆t?

d. If there are 2 infections at time ∆t, what is the probability of 1 infection at time t0 + ∆t?
e. Write ϕ1 (t0 + ∆t) in terms of ϕ0 (t0 ), ϕ1 (t0 ), ϕ1 (∆t), and ϕ2 (∆t).
f. Using the definition of the derivative, find an expression for ϕ̇1 in terms of ϕ1 (t) and ϕ2 (t).
Exercise 3.6 In this exercise we derive the PGF version of the forward Kolmogorov equations by directly
calculating the rate of change of the probabilities of the states. Define ϕj (t) to be the probability that there
are j active infections at time t.
We have the forward Kolmogorov equations:
ϕ̇_j = β(j − 1)ϕ_{j−1} + γ(j + 1)ϕ_{j+1} − (β + γ)jϕ_j .
a. Explain each term on the right hand side of the equation for ϕ̇_j.
b. By expanding Φ̇(y, t) = (∂/∂t) Σ_j ϕ_j y^j, arrive at Equation (18).

Exercise 3.7 In this exercise we follow [3, 6] and derive the PGF version of the backward Kolmogorov
equations by directly calculating the rate of change of the probabilities of the states. Define ϕki (t) to be the
probability of i infections at time t given that there were k infections at time 0. Although we assume that
at time 0 there is a single infection, we will need to derive the equations for arbitrary k.

a. Explain why
ϕ_{ki}(t + ∆t) = ϕ_{ki}(t) − k(β + γ)ϕ_{ki}(t)∆t + k(βϕ_{(k+1)i}(t) + γϕ_{(k−1)i}(t))∆t + O((∆t)^2)
for small ∆t.
b. By using the definition of the derivative ϕ̇_{ki} = lim_{∆t→0} [ϕ_{ki}(t + ∆t) − ϕ_{ki}(t)]/∆t, find ϕ̇_{ki}.
Define Φ(y, t|k) = Σ_i ϕ_{ki} y^i to be the PGF for the number of active infections assuming that there are k
initial infections.

c. Show that
Φ̇(y, t|1) = −(β + γ)Φ(y, t|1) + βΦ(y, t|2) + γΦ(y, t|0)

d. Explain why Φ(y, t|k) = Φ(y, t|1)k .


e. Complete the derivation of Equation (19).
Exercise 3.8 Define Φ(y, t|k) to be the PGF for the probability of having i infections at time t given k
infections at time 0.

a. Explain why Φ(y, t|k) = [Φ(y, t)]k .

b. Show that if we substitute Φ(y, t|k) = [Φ(y, t)]k in place of Φ(y, t) in Eq. (18) the equation remains
true with the initial condition y k .
c. Show that if we substitute Φ(y, t|k) = [Φ(y, t)]k in place of Φ(y, t) in equation (19) we do not get a
true equation.

So Eq. (18) applies regardless of the initial condition, but Eq. (19) is only true for the specific initial
condition of one infection.
Exercise 3.9 Let Φ(y, t|k) be the PGF for the number of infections assuming there are initially k infections.
Derive the backward Kolmogorov equation for Φ(y, t|k). Note that some of the Φs in the derivation above
would correspond to Φ(y, t|1) and some of them to Φ(y, t|k).
Exercise 3.10 Comparison of the formulations
a. Using Eq. (18) derive an equation for α̇ where α(t) = Φ(0, t). What, if any, additional information
would you need to solve this numerically?
b. Using Eq. (19), derive Equation (17) for α̇ where α(t) = Φ(0, t). What, if any, additional information
would you need to solve this numerically?
Exercise 3.11 Full solution
a. Show that Eq. (19) can be written
∂Φ(y, t)/∂t = (γ − βΦ(y, t))(1 − Φ(y, t))
b. Using partial fractions, set up an integral which you could use to solve for Φ(y, t) analytically (you do
not need to do all the algebra to solve it).

Exercise 3.12 Argue from their definitions that Φ(y, t) = Π(y, z, t)|z=1 .
Exercise 3.13 Derive Theorem 3.3 from Theorem 3.4.
Exercise 3.14 Derive Theorem 3.5 from Theorem 3.4.


Figure 9: (Left) A twelve-individual population, after the a priori assignment of who would transmit to
whom if ever infected by the SIR disease (the delay until transmission is not shown). Half of the nodes have
zero potential infectors and half have 3. Half of the nodes have 1 potential offspring and half have 2. So
the offspring distribution has PGF (x + x^2)/2 while the ancestor distribution has PGF χ(x) = (1 + x^3)/2.
(Middle) If node 6 is initially infected, the infection will reach node 4 who will transmit to 5 and 7,
and eventually infection will also reach 8 and 2 before further transmissions fail because nodes are already
infected. If however, it were to start at 9, then it would reach 10, from which it would spread only to 2.
(Right) By tracing backwards from an individual, we can determine which initial infections would lead to
infection of that individual. For example individual 4 will become infected if and only if it is initially infected
or 0, 3, 5, 6, 7, 8, or 11 is an initial infection.

Exercise 3.15 Equivalence of continuous and discrete final size distributions.


Show by direct substitution that if Ω∞ (z) = µ̂(Ω∞ (z), z) then Ω∞ (z) = zµ(Ω∞ (z)) where µ(y) =
γ/(β + γ − βy) is the PGF for the offspring distribution found in Exercise 3.2.
Exercise 3.16 We revisit the derivations of the usual mass action SIR ODEs. Following Example 3.3,
a. Derive [Ṡ] in terms of [SI].
b. Derive [İ] in terms of [SI] and [I].
c. Using [S] + [I] + [R] = N, derive [Ṙ].

4 Large-time dynamics
We now look at how PGFs can be used to develop simple models of SIR disease spread in the large population
limit when the disease infects a nonzero fraction of the population. In this limit, the early-time approaches
derived before break down because depletion of the susceptible population is important. The later-time
models of Section 3.5 are impractical because of the N → ∞ limit and are more restricted due to the
continuous-time assumption.

4.1 SIR disease and directed graphs.


In Section 2.5 we argued that for early times the continuous-time predictions are equivalent to discrete-time
predictions because we can classify infections by the length of the transmission chain to them from the index
case. For SIR disease this argument extends beyond early times.
To see this, we assume that prior to the disease introduction, we know for each individual what would
happen if he ever becomes infected as in Figure 9. In particular, we know how long his infection would last,
to whom he would transmit, and how long the delays from his infection to onwards transmission would be.
The process of choosing these in advance, selecting the initial infection(s), and tracing infection from there is
equivalent to choosing the initial infection(s) and then choosing the transmissions while the infection process
is traced out.
By assigning who transmits to whom (and how long the delays are), we have defined a weighted directed
graph whose edges represent the potential transmissions and weights represent the delays [24, 27]. A node

v will become infected if and only if there is at least one directed path from an initially infected node u to
v. The time of v’s infection is given by the least sum of all paths from initially infected nodes to v. We note
that the transmission process could be quite complex: the duration of a node’s infection and the delays from
time of infection to time of onwards transmissions can have effectively arbitrary distributions, and we could
still build a similar directed graph.
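As a sketch of this construction, the time of each node's infection is the length of the shortest weighted path from the seed, which Dijkstra's algorithm computes directly. The small weighted directed graph below is a hypothetical example (it is not the graph of Figure 9, and the delays are made up for illustration):

```python
import heapq

# Hypothetical assignment: edges[u] = [(v, delay), ...] means that u,
# if ever infected, would transmit to v after the given delay.
edges = {
    0: [(1, 1.5), (2, 0.7)],
    1: [(3, 2.0)],
    2: [(1, 0.2), (4, 1.1)],
    3: [],
    4: [(5, 0.4)],
    5: [(0, 3.0)],
}

def infection_times(initial):
    """Earliest infection time of each node: Dijkstra from the seed."""
    times = {initial: 0.0}
    heap = [(0.0, initial)]
    while heap:
        t, u = heapq.heappop(heap)
        if t > times.get(u, float("inf")):
            continue  # stale heap entry
        for v, delay in edges[u]:
            if t + delay < times.get(v, float("inf")):
                times[v] = t + delay
                heapq.heappush(heap, (t + delay, v))
    return times  # nodes absent from the result never become infected

print(infection_times(0))  # e.g. node 1 is reached via node 2 at time 0.9
```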
This directed graph is a useful structure to study because it encodes the outbreak in a single static object,
as opposed to a dynamic process. There is significant study of the structure of such directed graphs [10, 13].
Much of it focuses on the size of out-components of a node (that is, for a given node, what fraction of the
population can be reached following the edges forwards) or the in-components (that is, from what fraction
of the population is it possible to reach a given node by following edges forwards).

4.2 Final size relations for SIR epidemics


We now derive final size relations for SIR epidemics in the large population limit. We begin with the
assumption that a single node is initially infected and that an epidemic happens.
We use the mapping of the SIR epidemic to a directed graph G. Assume that a single node u is chosen to
be infected. Consider a node v. The probability v is infected is the probability that u is in her in-component,
and so it equals the proportion of G that is in the in-component of v. In the limit as G becomes infinite,
there are a few possibilities. We are interested in what happens when an epidemic occurs, so we can assume
that u has a large out-component (in the sense that the out-component takes up a non-zero fraction of G in
the N → ∞ limit) [10]:
• If v has a small in-component, then almost surely u is not in the in-component and so almost-surely v
is not infected.
• If v has a large in-component, then almost surely it contains a node w that lies in the out-component of
u. The existence of w then implies the existence of a path from u to w to v, so v is in u’s out-component
and v becomes infected.
Thus, if u causes an epidemic in the large N limit, then the probability that v becomes infected equals
the probability that v has a large in-component. So the size of an epidemic (if it happens) is simply the
probability a random individual has a large in-component.
We approach the question of whether v has a large in-component in the same way we approached the
question of whether u causes a large chain of infections (i.e., whether u has a large out-component). We
define the PGF of the ancestor distribution to be the function χ(x) defined by
χ(x) = Σ_i p_i x^i

where pi is the probability that a random node in the directed graph has in-degree i. That is, there are
exactly i nodes that would directly transmit to the randomly chosen node if they were ever infected. So the
probability an individual is not infected, S(∞)/N, solves x = χ(x), choosing the smaller solution when two
solutions exist. Since the proportion infected is r(∞) = R(∞)/N = 1 − S(∞)/N , we can conclude

Theorem 4.1 Assume that an outbreak begins with a single infected individual and an epidemic results.
In the large N limit, the expected cumulative proportion infected r(∞) = R(∞)/N solves

r(∞) = 1 − χ(1 − r(∞))

where χ(x) is the PGF of the ancestor distribution. If there are multiple solutions we choose the larger
solution for r(∞) in [0, 1].

Under common assumptions, the population is large, the average number of transmissions an individual
causes is R0, and the recipient is selected uniformly at random. Under these assumptions the ancestor
distribution is Poisson with mean R0. So χ(x) = e^{−R0(1−x)}. Then
r(∞) = 1 − e^{−R0 r(∞)} . (32)

Deriving this result does not depend on the duration of infections, or even on the distribution of factors
affecting infectiousness. The assumptions required are that an epidemic starts from a single infected individ-
ual, that each transmission reaches a randomly chosen member of the population, that all individuals have
equal susceptibility, and the average individual will transmit to R0 others. This result is general across a
wide range of assumptions about the infectious process.
Restating this we have:
Corollary 4.1 Assume that an SIR disease is spreading in a well-mixed population with homogeneous
susceptibility. Assuming that the initial fraction infected is infinitesimal and an epidemic occurs, the final
size satisfies
r(∞) = 1 − e^{−R0 r(∞)} (33)
where R0 is the reproductive number of the disease.

This explains many of the results of [32, 35], and our observation in example 2.3 that the epidemic size
depends on R0 and not on any other property of the offspring distribution. A closely-related derivation is
provided by [12, Section 1.3].
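Equation (33) has no closed-form solution for r(∞), but when R0 > 1 fixed-point iteration converges quickly. A minimal sketch (the value R0 = 2 is illustrative):

```python
import math

def final_size(R0, iterations=200):
    """Solve r = 1 - exp(-R0*r) by fixed-point iteration.

    Starting from r = 1 converges to the epidemic (nonzero) root
    whenever R0 > 1."""
    r = 1.0
    for _ in range(iterations):
        r = 1 - math.exp(-R0 * r)
    return r

r = final_size(2.0)
print(r)  # roughly 0.797 for R0 = 2
```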

4.3 Discrete-time SIR dynamics


We now take a discrete-time approach, similar to [46, 37] and [27, chapter 6]. We will assume that at
generation g = 0 the disease is introduced by infecting a proportion ρ uniformly at random leaving the
remainder susceptible. We assume that the population is very large and that the number of infections is
large enough that the dynamics can be treated as deterministic. Our results can be adapted to other initial
conditions (for example, to account for nonzero R in the initial condition).
We assume that χ(x) is known and that there is no correlation between how susceptible an individual is
and how infectious that individual is. Thus at generation g, the expected number of transmissions occurring
is R0 I(g), and how the recipients are chosen depends on χ.
Let v be a randomly chosen member of the population. The probability that v’s randomly chosen ancestor
has not yet been infected by generation g −1 is S(g −1)/N . The probability v is susceptible at generation g is
the probability v was initially susceptible, 1 − ρ, times the probability v has not received any transmissions,
χ(S(g − 1)/N ) (see Exercise 4.2).
So for g > 0 we arrive at
S(g) = (1 − ρ)N χ(S(g − 1)/N )
I(g) = N − R(g) − S(g)
R(g) = R(g − 1) + I(g − 1)
with
S(0) = (1 − ρ)N, I(0) = ρN, R(0) = 0 .
So we have
Theorem 4.2 Assume that χ(x) is the PGF of the ancestor distribution and assume there is no correlation
between infectiousness and susceptibility of a given individual. Further assume that at generation 0 a
fraction ρ is randomly infected in the generation-based discrete-time model. Then in the large population
limit

S(g) = (1 − ρ)N χ(S(g − 1)/N ) (34a)


I(g) = N − R(g) − S(g) (34b)
R(g) = R(g − 1) + I(g − 1) . (34c)

With initial conditions


S(0) = (1 − ρ)N, I(0) = ρN, R(0) = 0 . (34d)

We can interpret this in the context of survival functions. The function (1 − ρ)χ(S(g − 1)/N ) gives the
probability that a node has lasted g generations without being infected.
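The iteration in Theorem 4.2 takes only a few lines to carry out. The sketch below uses the Poisson ancestor distribution χ(x) = e^{−R0(1−x)}; the values of N, R0, and ρ are illustrative:

```python
import math

# Illustrative parameters
N, R0, rho = 1_000_000, 2.0, 0.001

def chi(x):
    # Poisson ancestor distribution with mean R0
    return math.exp(-R0 * (1 - x))

S = [(1 - rho) * N]
I = [rho * N]
R = [0.0]
for g in range(1, 30):
    S.append((1 - rho) * N * chi(S[g - 1] / N))   # Eq. (34a)
    R.append(R[g - 1] + I[g - 1])                 # Eq. (34c)
    I.append(N - R[g] - S[g])                     # Eq. (34b)

print(S[-1] / N)  # settles near the final-size prediction of Theorem 4.1
```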

4.4 Continuous-time SIR epidemic dynamics
We now move to continuous-time SIR epidemics. We allow for heterogeneity, assuming that each susceptible
individual u receives transmissions at some rate κ_u βI(t)/N⟨K⟩, and that the PGF of κ is ψ(x) = Σ_κ P(κ)x^κ.
We assume κ takes only non-negative integer values.
For an initially susceptible individual u with a given κ_u, the probability of not yet receiving a transmission
by time t solves ṡ_u = −κ_u βI(t)s_u/N⟨K⟩, which has solution
s_u = e^{−κ_u β ∫_0^t I(τ)dτ / N⟨K⟩} .
So we can write
s_u = θ^{κ_u}
where θ = e^{−β ∫_0^t I(τ)dτ / N⟨K⟩} and
θ̇ = −βθI/N⟨K⟩ .
Considering a random individual of unknown κ, the probability she was initially susceptible is 1 − ρ and the
probability she has not received any transmissions is ψ(θ). So
S(t) = (1 − ρ)N ψ(θ) .
Taking Ṙ = γI, we have
Ṙ = γI = −(γN⟨K⟩/β) θ̇/θ .
Integrating both sides, taking θ(0) = 1 and R(0) = 0, we have
R = −(γN⟨K⟩/β) ln θ .
Taking I = N − S − R we get
I = N[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]
and so θ̇ becomes
θ̇ = −βθ[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]/⟨K⟩

Theorem 4.3 Assuming that at time t = 0 a fraction ρ of the population is randomly infected and that
the susceptible individuals each have a κ such that they become infected as a Poisson process with rate
κβI/N⟨K⟩, in the large population limit we have

S = N(1 − ρ)ψ(θ) (35a)
I = N[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ] (35b)
R = −(γN⟨K⟩/β) ln θ (35c)

where ψ(x) = Σ_k P(k)x^k and the system is governed by a single ODE

θ̇ = −βθ[1 − (1 − ρ)ψ(θ) + (γ⟨K⟩/β) ln θ]/⟨K⟩ (35d)
and initial condition
θ(0) = 1 . (35e)

As in the discrete-time case, this can be interpreted as a survival function formulation of the SIR model.
Most, if not all, mass-action formulations of the SIR model can be re-expressed in a survival function
formulation. Some examples are shown in the Exercises.
Some very similar systems of equations are developed in [27, chapter 6] and [37, 46, 47, 34] where the
focus is on networks for which the value of κ not only affects the probability of becoming infected, but also
of transmitting further. These references focus on the assumption that an individual’s infector remains a
contact after transmission, but they contain techniques for studying partnerships with varying duration.
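System (35) is easy to integrate numerically. The sketch below uses a simple Euler scheme for the homogeneous case ψ(x) = x (every κ equal, so ⟨K⟩ = 1) and works in proportions (N = 1); the values of β, γ, and ρ are illustrative:

```python
import math

beta, gamma, rho = 2.0, 1.0, 0.001   # illustrative values; R0 = beta/gamma = 2

def theta_dot(theta):
    # Eq. (35d) with psi(x) = x and <K> = 1
    return -beta * theta * (1 - (1 - rho) * theta
                            + (gamma / beta) * math.log(theta))

dt, theta = 0.001, 1.0
for _ in range(int(30 / dt)):        # Euler steps out to t = 30
    theta += dt * theta_dot(theta)

S = (1 - rho) * theta                  # Eq. (35a)
R = -(gamma / beta) * math.log(theta)  # Eq. (35c)
I = 1 - S - R                          # Eq. (35b)
print(S, I, R)  # by t = 30 the epidemic is essentially over
```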

4.5 Exercises
Exercise 4.1 Ancestor distribution for homogeneous well-mixed population.
Consider an SIR disease in a well-mixed population having N individuals and a given R0 . Let v be a
randomly chosen individual from the directed graph created by placing edges from each node to all those
nodes they would transmit to if infected.

a. Show that if the average number of offspring is R0 , then so is the average number of infectors.
b. If there are exactly R0 N edges in the directed graph and each recipient is chosen uniformly at random
from the population (independent of any previous choice), argue that the number of transmissions v
receives has a binomial distribution with R0 N trials and probability 1/N . (technically we must allow
edges from v to v)
c. Argue that if R0 remains fixed as N → ∞, then the number of transmissions v receives is Poisson
distributed with mean R0 .
Exercise 4.2 Explain why for large N the probability v is still susceptible at generation g if she was initially
susceptible is χ(S(g − 1)/N ).
Exercise 4.3 Use Theorem 4.2 to derive a result like Theorem 4.1, but with nonzero ρ.
Exercise 4.4 Final size relations
Consider the continuous time SIR dynamics as given in System (35)
a. Assume κ = 1 for all individuals, and write down the corresponding equations for S, I, R, and θ.

b. At large time I → 0, so S(∞) = N − R(∞). But also S(∞) = S(0)ψ(θ(∞)). By writing θ(∞) in
terms of R(∞), derive a recurrence relation for r(∞) = R(∞)/N in terms of r(∞) and R0 = β/γ.
c. Comment on the relation between your result and Theorem 4.1
Exercise 4.5 Other relations

a. Using the equations from Exercise 4.4, derive the peak prevalence relation, an expression for the
maximum value of I. [At the maximum İ = 0, so we start by finding θ so that Ṡ + Ṙ = 0.]
b. Similarly, find the peak incidence relation, an expression for the maximum rate at which infections
occur, −Ṡ.
Exercise 4.6 Alternate derivation of s_u.
If the rate of transmissions to u is βIκ_u/N⟨K⟩, then the expected number of transmissions u has received
is βκ_u ∫_0^t I(τ) dτ / N⟨K⟩ and this is Poisson distributed.
a. Let f_u(x) be the PGF for the number of transmissions u has received. Find an expression for f_u(x)
in terms of the integral ∫_0^t I(τ)dτ.
b. Explain why fu (0) is the probability u is still susceptible.

c. Find fu (0).
Exercise 4.7 Alternate derivation of Theorem 4.3 in the homogeneous case.
The usual homogeneous SIR equations are

Ṡ = −βIS/N
I˙ = βIS/N − γI
Ṙ = γI

We will derive system (35) for fixed κ = 1 from this system through the use of an integrating factor. Set
θ = e^{−β ∫_0^t I(τ)dτ / N} .
a. Show that θ̇ = −βIθ/N and so θ̇/θ = −βṘ/(Nγ).
b. Using the equation for Ṡ, add βIS/N to both sides and then divide by θ (the factor 1/θ is an integrating
factor). Show that the expression on the left hand side is (d/dt)(S/θ) and so
(d/dt)(S/θ) = 0 .

c. Solve for R in terms of θ.


d. Solve for S in terms of θ.
e. Solve for I in terms of θ using S + I + R = N .

This equivalence was found in [35] and [20].


Exercise 4.8 Alternate derivation of Theorem 4.3.
Consider now a population having many subgroups of susceptibles denoted by κ with the group κ receiving
transmissions at rate βκI/N⟨K⟩ per individual. Once infected, each individual transmits with rate β⟨K⟩ and
recovers with rate γ. These assumptions lead to
Ṡ_κ = −βκ (I/N⟨K⟩) S_κ
İ = −γI + β (I/N⟨K⟩) Σ_κ κS_κ
Ṙ = γI
Following Exercise 4.7, set θ = e^{−β ∫_0^t I(τ) dτ / N⟨K⟩} and derive system (35) from these equations by use of an
integrating factor.

5 Multitype populations
We now briefly discuss how PGFs can be applied to multitype populations. This section is intended primarily
as a pointer to the reader to show that it is possible to apply these methods to such populations. We do not
perform a detailed analysis.
Many populations can be divided into subgroups. These may be patches in a metapopulation model,
genders in a heterosexual sexually transmitted infection model, age groups in an age-structured population,
or any of a number of other groupings. Applications of PGFs to such models have been studied in multiple
contexts [28, 44].

5.1 Discrete-time epidemic probability


We begin by considering the probability of an epidemic in a discrete-time model. To set the stage, assume
there are M groups and let pi1 ,i2 ,··· ,iM |k be the probability that an individual of group k will cause iℓ

infections in group ℓ. Define αg|k to be the probability that a chain of infections starting from an individual
of group k becomes extinct within g generations.
It is straightforward to show that if we define
ψ_k(x_1, x_2, . . . , x_M) = Σ_{i_1,i_2,...,i_M} p_{i_1,i_2,...,i_M|k} x_1^{i_1} x_2^{i_2} · · · x_M^{i_M}
then
α_{g|k} = Σ_{i_1,i_2,...,i_M} p_{i_1,i_2,...,i_M|k} α_{g−1|1}^{i_1} α_{g−1|2}^{i_2} · · · α_{g−1|M}^{i_M}
        = ψ_k(α_{g−1|1}, α_{g−1|2}, . . . , α_{g−1|M})
After converting this into vectors we get α⃗_1 = ψ⃗(⃗0). Iterating g times we have
α⃗_g = ψ⃗^[g](⃗0) (36)
Setting α⃗ to be the limit as g goes to infinity, we find the extinction probabilities. Specifically, the k-th
component of α⃗ is the probability of extinction given that the first individual is of type k. Thus we have:
Theorem 5.1 Let
• α⃗_g = (α_{g|1}, α_{g|2}, . . . , α_{g|M}) where α_{g|k} is the probability a chain of infections starting with a type k
individual will end within g generations
• and ψ⃗ = (ψ_1, ψ_2, . . . , ψ_M) where ψ_k(⃗x) = Σ_{i_1,i_2,...,i_M} p_{i_1,i_2,...,i_M|k} x_1^{i_1} x_2^{i_2} · · · x_M^{i_M}.
Then α⃗_g = ψ⃗^[g](⃗0).
The vector of eventual extinction probabilities in the infinite population limit is given by α⃗_∞ = lim_{g→∞} α⃗_g
and is a solution to α⃗_∞ = ψ⃗(α⃗_∞).

We could have derived this directly by showing that the extinction probabilities solve α⃗ = ψ⃗(α⃗). In this
case it might not be obvious how to solve this multidimensional system of nonlinear equations or how to be
certain that the solution found is the appropriate one. However, by interpreting the iteration in Eqn. (36)
in terms of the extinction probability after g generations, it is clear that simply iterating starting from
α⃗_0 = ⃗0 will converge to the appropriate values. Additionally the values calculated in each iteration have a
meaningful interpretation.
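As a sketch of this iteration, consider a hypothetical two-type process in which a case of type k causes independent Poisson numbers of offspring of each type l, with means Lam[l][k]; the matrix of means below is purely illustrative:

```python
import math

# Lam[l][k]: mean number of type-l offspring of a type-k case (illustrative)
Lam = [[1.0, 0.5],
       [1.5, 0.5]]

def psi(alpha, k):
    # PGF of independent Poisson offspring counts, evaluated at alpha
    return math.exp(sum(Lam[l][k] * (alpha[l] - 1) for l in range(2)))

alpha = [0.0, 0.0]                       # alpha_0 = vector of zeros
for _ in range(500):                     # alpha_g = psi(alpha_{g-1})
    alpha = [psi(alpha, k) for k in range(2)]

print(alpha)  # eventual extinction probability by type of the first case
```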
Example 5.1 Consider a population made up of many large communities. We assume an unfamiliar dis-
ease is spreading through the population. When the disease begins to spread in a community, the community
learns to recognize the disease symptoms and infectiousness declines. We assume that we can divide the
population into 3 types: primary cases T0 , secondary cases T1 , and tertiary cases T2 . The infectiousness of
primary cases is higher than that of secondary cases which is higher than that of tertiary cases. Within a
community a primary case can cause secondary cases, while secondary and tertiary cases can cause tertiary
cases. All cases can cause new primary cases in other communities. We ignore multiple introductions to
the same community.
We define nij to be the number of infections of type Ti caused by a type Tj individual, and we assume
that we know the joint distribution pn00 n10 , pn01 n21 , and pn02 n22 . We define
ψ_1(x, y, z) = Σ_{n00,n10} p_{n00 n10} x^{n00} y^{n10}
ψ_2(x, y, z) = Σ_{n01,n21} p_{n01 n21} x^{n01} z^{n21}
ψ_3(x, y, z) = Σ_{n02,n22} p_{n02 n22} x^{n02} z^{n22}

Note that ψ1 does not depend on z while ψ2 and ψ3 do not depend on y.

We define α⃗_0 = (0, 0, 0) and set α⃗_g = (ψ_1(α⃗_{g−1}), ψ_2(α⃗_{g−1}), ψ_3(α⃗_{g−1})). Then taking α⃗ to be the limit as
g → ∞, the first entry of α⃗ is the probability that the disease goes extinct starting from a single primary
case.

5.2 Continuous-time SIR dynamics


Now we consider a continuous-time version of SIR dynamics in a heterogeneous population.
Assume again that there are M groups and let β_{ij} be the rate at which an individual in group j causes
transmissions that go to group i. Let ξ_i be the expected number of transmissions that an individual in group
i has received since time 0. Finally assume that individuals in group i recover at rate γ_i. Then the
number of transmissions an individual in group i has received by time t is Poisson distributed with mean ξ_i.
The PGF for the number of transmissions received is thus e^{−ξ_i(1−x)}. Setting x = 0, the probability of having
received zero transmissions is e^{−ξ_i(t)}. Thus S_i = S_i(0)e^{−ξ_i(t)}. We have I_i = N_i − S_i − R_i and Ṙ_i = γ_i I_i. To
find ξ_i, we simply note that the total rate that group i is receiving infection is Σ_j I_j β_{ij}, and so
ξ̇_i = Σ_j I_j β_{ij} / N_i .
Thus:

Theorem 5.2 If the rate of transmission from an infected individual in group j to group i is β_{ij}, then

S_i = S_i(0)e^{−ξ_i(t)} (37a)
I_i = N_i − S_i − R_i (37b)
Ṙ_i = γ_i I_i (37c)
ξ̇_i = Σ_j I_j β_{ij} / N_i (37d)

with ξ_i(0) = 0 for all i.
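A minimal Euler integration of System (37) for two groups; the β matrix, recovery rates, group sizes, and initial condition below are all illustrative:

```python
import math

Ni = [800.0, 200.0]            # group sizes (illustrative)
beta = [[0.3, 0.1],            # beta[i][j]: rate from an infected in group j
        [0.2, 0.4]]            # into group i
gam = [0.2, 0.25]              # recovery rates

S0 = [Ni[0] - 1.0, Ni[1]]      # one initial infection in group 0
I = [1.0, 0.0]
R = [0.0, 0.0]
xi = [0.0, 0.0]

dt = 0.01
for _ in range(int(200 / dt)):                           # integrate to t = 200
    for i in range(2):
        xi[i] += dt * sum(beta[i][j] * I[j] for j in range(2)) / Ni[i]  # (37d)
        R[i] += dt * gam[i] * I[i]                                      # (37c)
    for i in range(2):
        I[i] = Ni[i] - S0[i] * math.exp(-xi[i]) - R[i]                  # (37a,b)

print([R[i] / Ni[i] for i in range(2)])  # final attack rate in each group
```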

5.3 Exercises
Exercise 5.1 Consider a vector-borne disease for which each infected individual infects a Poisson-distributed
number of vectors, with mean λ. Each infected vector causes i infections with probability pi = π i (1 − π) for
some π ∈ [0, 1]. This scenario corresponds to human infection lasting for a fixed time with some constant
transmission rate to vectors, and each vector having probability π of living to bite again after each bite and
transmitting with probability 1 if biting.
a. Let α_{g|1} and α_{g|2} be the probability that an outbreak would go extinct in g generations starting with an
infected human or vector respectively. Find the vector-valued function ψ⃗(⃗x) = (ψ_1(⃗x), ψ_2(⃗x)). That
is, what are the PGFs ψ_1(x_1, x_2) and ψ_2(x_1, x_2)?

b. Set λ = 3 and π = 0.5. Find the probability of an epidemic if one infected human is introduced.
c. For the same values, find the probability of an epidemic if one infected vector is introduced.
d. Find ψ2 (ψ1 (0, x), 0). How should we interpret the terms of its Taylor Series expansion?

Exercise 5.2 Starting from the equations
Ṡ_i = −(S_i/N_i) Σ_j β_{ij} I_j
İ_i = −γ_i I_i + (S_i/N_i) Σ_j β_{ij} I_j
Ṙ_i = γ_i I_i

use integrating factors to derive System (37).


Exercise 5.3 Assume the population is grouped into subgroups of size N_i with N = Σ_i N_i and the i-th
subgroup has a parameter κ_i representing their rate of contact with others. Take
β_{ji} = β κ_j κ_i N_i / Σ_ℓ N_ℓ κ_ℓ
to be the transmission rate from type i individuals to a single type j individual, and assume all infected
individuals recover with the same rate γ.
Define θ = e^{−β (Σ_j κ_j ∫_0^t I_j(τ) dτ) / Σ_j κ_j N_j} and define the PGF ψ(x) = Σ_i (N_i/N) x^{κ_i}. Let S = Σ_i S_i, I =
Σ_i I_i, and R = Σ_i R_i.

a. Explain what assumptions this model makes about interactions between individuals in group i and j.
b. Show that
S = N ψ(θ)
I = N − S − R
Ṙ = γI
θ̇ = −βθ (Σ_j κ_j I_j) / (Σ_j κ_j N_j)
with θ(0) = 1.
c. Explain why (Σ_j κ_j I_j)/(Σ_j κ_j N_j) = 1 − (Σ_j κ_j S_j)/(Σ_j κ_j N_j) − (Σ_j κ_j R_j)/(Σ_j κ_j N_j).
d. Show that (Σ_j κ_j S_j)/(Σ_j κ_j N_j) = θψ′(θ)/ψ′(1).
e. Show that (d/dt)[(Σ_j κ_j R_j)/(Σ_j κ_j N_j)] = −(γ/β) θ̇/θ, and solve for (Σ_j κ_j R_j)/(Σ_j κ_j N_j) in terms of θ assuming R_j(0) = 0 for all j.
f. Thus conclude that
θ̇ = −βθ + β θ^2 ψ′(θ)/ψ′(1) − γθ ln θ

6 Discussion
There are many contexts where we are interested in how a newly introduced infectious disease would spread.
We encounter situations like this in the spread of zoonotic infections such as monkeypox or Ebola as well as
the importation of novel diseases such as Zika in the Americas or the reintroduction of locally eliminated
diseases such as malaria.
PGFs are an important tool for the analysis of epidemics, particularly at early stages. They allow us to
relate the individual-level transmission process to the distribution of outcomes. This allows us to take data
about the transmission process and make predictions about the possible outcomes, but it also allows us to
take observed outbreaks and use them to infer the individual-level transmission properties.

For SIR disease PGFs also provide a useful alternative formulation to the usual mass-action equations.
This formulation leads to a simple derivation of final-size relations and helps explain why previous studies
have shown that a wide range of disease assumptions give the same final size relation.
Our goal with this primer has been to introduce researchers to the many applications of PGFs to disease
spread. We have used the appendices to derive some of the more technical properties of PGFs. Additionally
we have developed a Python package Invasion PGF which allows for quick calculation of the results in the
first three sections of this primer. A detailed description of the package is in Appendix C. The software can
be downloaded at https://github.com/joelmiller/Invasion_PGF. Documentation is available within the
repository, starting with the file docs/ build/html/index.html. The supplementary information includes
code that uses Invasion PGF to generate the figures of Section 2.

A Important properties of PGFs


In this appendix, we give some theoretical background behind the important properties of PGFs which we
use in the main part of the primer. We attempt to make each subsection self-contained so that the reader has
a choice of reading through the appendix in its entirety, or waiting until a property is used before reading that
section. Because we expect the appendix is more likely to be read piecemeal, the exercises are interspersed
through the text where the relevant material appears.
A PGF has been described as “a clothesline on which we hang up a sequence of numbers for display” [52].
Similarly [42] says “A generating function is a device somewhat similar to a bag. Instead of carrying many
little objects detachedly, which could be embarrassing, we put them all in a bag, and then we have only one
object to carry, the bag.” Indeed for many purposes mathematicians use PGFs primarily because once we
have the distribution put into this “bag”, many more mathematical tools are available, allowing us to derive
interesting and sometimes surprising identities [52].
However, for our purposes there is a meaningful direct interpretation of a PGF. Assume that we are
interested in the probability that an event does not happen given some unknown number i of independent
identical Bernoulli trials with probability α the event does not happen in any one trial. Let ri represent the
probability that there are i trials. Then the probability that the event does not occur in any trial is
Σ_i r_i α^i = f(α) ,

and so PGFs emerge naturally in this context.


In infectious disease, this context occurs frequently and many results in this primer can be expressed in
this framework. For reference, we make this property more formal:
Property A.1 Assume we have a process consisting of a random number i of independent identical Bernoulli
trials. Let r_i be the distribution of i and f(x) = Σ_i r_i x^i be its PGF. If α is the probability that each trial
fails, then f(α) is the probability all trials fail.
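Property A.1 can be checked by direct simulation. In the sketch below the number of trials is drawn from a geometric distribution (an illustrative choice, with r_i = p(1 − p)^i), and the Monte Carlo estimate is compared with f(α):

```python
import random

random.seed(1)
p, alpha = 0.4, 0.7

def f(x):
    # PGF of the geometric distribution r_i = p*(1-p)**i
    return p / (1 - (1 - p) * x)

def all_trials_fail():
    i = 0
    while random.random() > p:   # draw i: failures before the first success
        i += 1
    # each of the i trials independently fails with probability alpha
    return all(random.random() < alpha for _ in range(i))

runs = 200_000
est = sum(all_trials_fail() for _ in range(runs)) / runs
print(est, f(alpha))  # the two values agree to Monte Carlo accuracy
```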

A.1 Properties related to individual coefficients


We start by investigating how to find the coefficients of a PGF if we can calculate the numeric value of the
PGF at any point.
This section makes use of the imaginary number i = √−1, and so in this section we avoid using i as an
index in the sum of f(x).

Property A.2 Given a PGF f(x) = Σ_n r_n x^n, the coefficient of x^n in its expansion for a particular n
can be calculated by taking n derivatives, evaluating the result at x = 0, and dividing by n!. That is
r_n = (1/n!) (d/dx)^n f(x) |_{x=0}

This result holds for any function with a Taylor Series (it does not use any special properties of PGFs).

48
Exercise A.1 Prove Property A.2 [write out the sum and show that the derivatives eliminate any rm for
m < n, the leading coefficient of the result is n!rn , and the later terms are all zero].

There are many contexts in which we can only calculate a function numerically. In this case the calculation
of these derivatives is likely to be difficult and inaccurate. An improved way to calculate it is given by a
Cauchy integral [38]. This is a standard result of Complex Analysis, and initially we simply take it as given.
    r_n = \frac{1}{2πi} \oint \frac{f(z)}{z^{n+1}} \, dz
This integral can be done on a closed circle around the origin, z = Re^{iθ}, in which case dz = iz\,dθ. Then r_n
can be rewritten as

    r_n = \frac{1}{2π} \int_0^{2π} \frac{f(Re^{iθ})}{(Re^{iθ})^n} \, dθ
Using another substitution, θ = 2πu, we find dθ = 2π\,du with u varying from 0 to 1. This integral becomes

    r_n = \int_0^1 \frac{f(Re^{2πiu})}{R^n e^{2nπiu}} \, du

The integral on the right hand side can be approximated by a simple summation and we find

    r_n \approx \frac{1}{M} \sum_{m=1}^{M} \frac{f(Re^{2πim/M})}{R^n e^{2nπim/M}}

for large M.
A few technical steps show that the PGF f(z) converges for any z with |z| ≤ 1: any PGF is analytic
within the unit circle R = 1, and the PGF converges everywhere on the unit circle as well [the coefficients are
all positive or zero and the sum converges for z = 1, so it converges absolutely on the unit circle]. Thus
this integral can be performed for any positive R ≤ 1. We have found that the unit circle (R = 1) yields
remarkably good accuracy, so we recommend using it unless there is a good reason not to. Some discussion
of identifying the optimal radius appears in [9].
Thus we have

Property A.3 Given a PGF f(x), the coefficient of x^n in its expansion can be calculated by the integral

    r_n = \int_0^1 \frac{f(Re^{2πiu})}{R^n e^{2nπiu}} \, du        (38)

This is well-approximated by the summation

    r_n \approx \frac{1}{M} \sum_{m=1}^{M} \frac{f(Re^{2πim/M})}{R^n e^{2nπim/M}}        (39)

with R = 1 and M ≫ 1.

It turns out that this approach is closely related to the approach to get a particular coefficient of a Fourier
Series. Once the variable is changed from z to θ, our function is effectively a Fourier Series in θ, and the
integral is the standard approach to finding the nth coefficient of a Fourier Series.
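Property A.3 is straightforward to implement. The sketch below (Python; the Poisson PGF and the value M = 64 are illustrative choices) recovers the coefficients of f(x) = e^{λ(x−1)}, which are known to be r_n = e^{−λ} λ^n/n!, using Equation (39).

```python
import cmath
import math

def pgf_coefficient(f, n, M=64, R=1.0):
    """Approximate r_n, the coefficient of x^n in the PGF f, using Equation (39)."""
    total = 0j
    for m in range(1, M + 1):
        z = R * cmath.exp(2j * math.pi * m / M)
        total += f(z) / (R**n * cmath.exp(2j * math.pi * n * m / M))
    return (total / M).real   # the imaginary part vanishes up to roundoff

# Check against the Poisson distribution, whose PGF is exp(lam*(x - 1))
lam = 3.0
f = lambda x: cmath.exp(lam * (x - 1))
approx = [pgf_coefficient(f, n) for n in range(6)]
exact = [math.exp(-lam) * lam**n / math.factorial(n) for n in range(6)]
print(approx)
```

With R = 1, the error of the summation is a sum of the (tiny) coefficients r_{n+M}, r_{n+2M}, . . ., which is why modest values of M already give high accuracy here.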
Exercise A.2 Verification of Equation (38):
In this exercise we show that the formula in Equation (38) yields rn . Assume that the integral is
performed on a circle of radius R ≤ 1 about the origin.
a. Write f(z) = \sum_m r_m z^m and rewrite \int_0^1 \frac{f(Re^{2πiu})}{R^n e^{2nπiu}} \, du as the sum

    \int_0^1 \frac{f(Re^{2πiu})}{R^n e^{2nπiu}} \, du = \sum_m r_m R^{m-n} \int_0^1 e^{2(m-n)πiu} \, du

b. Show that for m = n the integral in the summation on the right hand side is 1.
c. Show that for m ̸= n, the integral in the summation on the right hand side is 0.
d. Thus conclude that the integral on the left hand side must yield rn .

Exercise A.3 Let f(z) = e^z = 1 + z + z^2/2 + z^3/6 + z^4/24 + z^5/120 + · · · . Write a program that estimates
r_0, r_1, . . . , r_5 using Equation (39) with R = 1. Report the values to four significant figures for
a. M = 2
b. M = 4

c. M = 5
d. M = 10
e. M = 20.

f. How fast is convergence for different rn ?

A.2 Properties related to distribution moments


We next look at two straightforward properties about the moments of the distribution r_i having PGF f(x).
We return to using i = 0, 1, . . . as an indexing variable, so i is no longer \sqrt{-1}. We have

    f(1) = \sum_i r_i 1^i = \sum_i r_i = 1

where the final equality is because the r_i determine a probability distribution.


With mildly more effort, we have

    f'(1) = \sum_i i r_i 1^{i-1} = \sum_i i r_i = E(i)
where E(i) denotes the expected value of i. These arguments show

Property A.4 Any PGF f (x) must satisfy f (1) = 1.

Property A.5 The expected value of a random variable i whose distribution has PGF f (x) is given by
E(i) = f ′ (1).

It is straightforward to derive relationships for E(i2 ) and higher order moments by repeated differentiation
of f and evaluating the result at 1.
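These properties are easy to check numerically. Here is a small sketch (Python; the distribution r_i = (1 − p)p^i with PGF f(x) = (1 − p)/(1 − px) and mean p/(1 − p) is an illustrative choice), using a central difference to approximate f'(1).

```python
p = 0.4
f = lambda x: (1 - p) / (1 - p * x)   # PGF of r_i = (1 - p) p^i, i = 0, 1, 2, ...

# Property A.4: f(1) = 1
f_at_1 = f(1.0)

# Property A.5: E(i) = f'(1), approximated here by a central difference
h = 1e-6
fprime_at_1 = (f(1 + h) - f(1 - h)) / (2 * h)

expected_mean = p / (1 - p)   # known mean of this distribution
print(f_at_1, fprime_at_1, expected_mean)
```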

A.3 Properties related to function composition


To motivate function composition, we start with an example.

Example A.1 Consider a weighted coin which comes up ‘Success’ with probability p and ‘Failure’
with probability 1 − p. We play a game in which we stop at the first failure, and otherwise flip again.
Define f(x) = px + 1 − p.
Let α_g be the probability of failure within the first g flips. Then α_0 = 0 and α_1 = 1 − p = f(0) are easily
calculated.
More generally, the probability of starting the game and failing immediately is α_1 = 1 − p = f(0), while
the probability of having a success and flipping again is p, at which point the probability of failure within the
remaining g − 1 flips is α_{g−1}. So we have α_g = (1 − p) + p α_{g−1} = f(α_{g−1}), and by induction the
probability of failure within g flips is f^{[g]}(0).
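The recursion in Example A.1 can be checked directly: iterating f(x) = px + 1 − p from α_0 = 0 should reproduce the closed form α_g = 1 − p^g, the probability that at least one of the first g flips is a failure. A small Python sketch, with p = 0.7 as an arbitrary choice:

```python
p = 0.7                        # probability of 'Success' on each flip
f = lambda x: p * x + 1 - p    # the PGF from Example A.1

alpha = 0.0                    # alpha_0 = 0
alphas = []
for g in range(1, 6):
    alpha = f(alpha)           # alpha_g = f(alpha_{g-1})
    alphas.append(alpha)

closed_form = [1 - p**g for g in range(1, 6)]   # alpha_g = 1 - p^g
print(alphas)
```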
Exercise A.4 The derivation in example A.1 was based on looking at what happened after a single flip
and then looking g − 1 flips into the future in the inductive step. Derive αg = f (αg−1 ) by instead looking
g − 1 flips into the future and then considering one additional step. [the distinction between this argument
and the previous one becomes useful in the continuous-time case where we use the ‘backward’ or ‘forward’
Kolmogorov equations.]
Exercise A.5 Consider a fair six-sided die with numbers 0, 1, . . . , 5, rather than the usual 1, . . . , 6. We
roll the die once. Then we look at the result, and roll that many copies (if zero, we stop), then we look at
the sum of the result and repeat. Define
    f(x) = \frac{1 + x + \cdots + x^5}{6} = \begin{cases} \frac{x^6 - 1}{6(x-1)} & x \neq 1 \\ 1 & x = 1 \end{cases}

Define αg to be the probability the process stops after g iterations (with α0 = 0 and α1 = 1/6).
a. Find an expression for αg , the probability that by the g’th iteration the process has stopped, in terms
of f (x).
b. Rephrase this question in terms of the extinction probability for an infectious disease.

Processes like that in Exercise A.5 can be thought of as “birth-death” processes where each event generates
a discrete number of new events. Our examples above show that function composition arises naturally in
calculating the probability of extinction in a birth-death process. We show below that it also arises naturally
when we want to know the distribution of population sizes after some number of generations rather than
just the probability of 0.
Specifically, we often assume an initially infected individual causes some random number of new infections
i from some distribution. Then we assume that each of those new infections independently causes an
additional random number of infections from the same distribution. We will be interested in how to get from
the one-generation PGF to the PGF for the distribution after g generations.
We derive this in a few stages.
• We first show that if we take two numbers from different distributions with PGFs f(x) and h(x), then
  their sum has PGF f(x)h(x) [Property A.6]. Then inductively applying this we conclude that
  the sum of n numbers from a distribution with PGF f(x) has PGF [f(x)]^n.
• We also show that if the probability we take a number from the distribution with PGF f (x) is π1 and
the probability we take it from the distribution with PGF h(x) is π2 , then the PGF of the resulting
distribution is π1 f (x) + π2 h(x) [Property A.7].
• Putting these two properties together, we can show that if we choose i from a distribution with PGF
f (x) and then choose i different values from a distribution with PGF h(x), then the sum of the i values
has PGF f (h(x)) [Property A.8].
Our main use of Properties A.6 and A.7 is as stepping stones towards Property A.8.
Consider two probability distributions; let r_i be the probability of i for the first distribution and q_j be
the probability of j for the second distribution. Assume they have PGFs f(x) = \sum_i r_i x^i and h(x) = \sum_j q_j x^j
respectively.

51
We are first interested in the process of choosing i from the first distribution, j from the second, and
adding them. In the disease context this arises where the two distributions give the probability that one
individual infects i and the other infects j and we want to know the probability of a particular sum.
The probability of obtaining a particular sum k is

    \sum_{i=0}^{k} r_i q_{k-i} .

So the PGF of the sum is \sum_k \sum_{i=0}^{k} r_i q_{k-i} x^k. By inspection, this is equal to the product f(x)h(x). This
means that the PGF of the process where we choose i from the first and j from the second and look at the
sum is the product f(x)h(x).
We have shown

Property A.6 Consider two probability distributions, r_0, r_1, . . . and q_0, q_1, . . ., with PGFs f(x) = \sum_i r_i x^i
and h(x) = \sum_j q_j x^j. Then if we choose i from the distribution r_i and j from the distribution q_j, the PGF
of their sum is f(x)h(x).

Usually we want the special case where we choose two numbers from the same distribution having PGF
f(x). The PGF for the sum is [f(x)]^2. The PGF for the sum of three numbers from the same distribution
can be thought of as the result of combining [f(x)]^2 and f(x), yielding [f(x)]^3. By induction, it follows that the PGF
for the sum of i numbers is [f(x)]^i.
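In terms of coefficient lists, multiplying PGFs is discrete convolution, so Property A.6 can be illustrated in a few lines (Python; the two distributions are arbitrary illustrative choices):

```python
def pgf_product(r, q):
    """Coefficients of f(x)h(x): the distribution of the sum (Property A.6)."""
    out = [0.0] * (len(r) + len(q) - 1)
    for i, ri in enumerate(r):
        for j, qj in enumerate(q):
            out[i + j] += ri * qj   # the x^{i+j} term picks up r_i * q_j
    return out

r = [0.5, 0.3, 0.2]     # distribution of the first number
q = [0.1, 0.6, 0.3]     # distribution of the second number
s = pgf_product(r, q)   # distribution of their sum

# For instance P(sum = 2) = r_0 q_2 + r_1 q_1 + r_2 q_0 = 0.15 + 0.18 + 0.02 = 0.35
print(s)
```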
Now we want to know what happens if we are not sure what the current system state is. For example,
we might not know if we have 1 or 2 infected individuals, and the outcome at the next generation is different
based on which it is.
We use the distributions r_i and q_j. We assume that with probability π_1 we choose a random number k
from the r_i distribution, while with probability π_2 = 1 − π_1 it is chosen from the q_j distribution. Then the
probability of a particular value k occurring is π_1 r_k + π_2 q_k, and the resulting PGF is \sum_k (π_1 r_k + π_2 q_k) x^k =
π_1 f(x) + π_2 h(x). This becomes:

Property A.7 Consider two probability distributions, r_0, r_1, . . . and q_0, q_1, . . ., with PGFs f(x) = \sum_i r_i x^i
and h(x) = \sum_j q_j x^j. We consider a new process where with probability π_1 we choose k from the r_i
distribution and with probability π_2 = 1 − π_1 we choose k from the q_j distribution. Then the PGF of the
resulting distribution is π_1 f(x) + π_2 h(x).

We finally consider a process in which we have two distributions with PGFs f(x) = \sum_i r_i x^i and h(x) =
\sum_j q_j x^j. We choose the number i from the distribution r_i and then take the sum of i values chosen from
the q_j distribution, \sum_{\ell=1}^{i} j_\ell. Both the number of terms in the sum and their values are random variables.
Using the results above, the PGF of the resulting sum is \sum_i r_i h(x)^i = f(h(x)). Thus we have

Property A.8 Consider two probability distributions, r_0, r_1, . . . and q_0, q_1, . . ., with PGFs f(x) = \sum_i r_i x^i
and h(x) = \sum_j q_j x^j. Then if we choose i from the distribution r_i and then take the sum of i values chosen
from the distribution q_j, the PGF of the sum of those i values is f(h(x)).

This property is closely related to the spread of infectious disease. An individual may infect i others, and
then each of them causes additional infections. The number of these second generation cases is the sum of i
random numbers \sum_{\ell=1}^{i} j_\ell where j_\ell is the number of additional infections caused by the \ell-th infection caused
by the initial individual. So if f(x) is the PGF for the distribution of the number of infections caused by the
first infection and h(x) is the PGF for the distribution of the number of infections caused by the offspring,
then f(h(x)) is the PGF for the number infected in the second generation [and if the two distributions are
the same this is f^{[2]}(x)]. Repeated iteration gives us the distribution after g generations.
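Property A.8 can be verified on small examples by composing coefficient lists polynomially: the coefficients of f(h(x)) must match the distribution of the random sum. A Python sketch (the distributions r and q are arbitrary illustrative choices):

```python
def convolve(a, b):
    """Coefficients of the product of two polynomials."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for j, bj in enumerate(b):
            out[i + j] += ai * bj
    return out

def compose(r, q):
    """Coefficients of f(h(x)), where f and h have coefficient lists r and q."""
    result = [r[0]]
    h_power = [1.0]                      # h(x)^0
    for ri in r[1:]:
        h_power = convolve(h_power, q)   # successively h(x)^1, h(x)^2, ...
        result += [0.0] * (len(h_power) - len(result))
        for k, c in enumerate(h_power):
            result[k] += ri * c
    return result

r = [0.2, 0.5, 0.3]    # offspring distribution of the initial individual
q = [0.4, 0.6]         # offspring distribution of each of its offspring
dist = compose(r, q)   # distribution of second-generation cases
print(dist)
```

The mean of `dist` equals f'(1)h'(1) = 1.1 × 0.6 = 0.66, consistent with applying the chain rule to f(h(x)).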
Exercise A.6 Note that if we interchange p and q in the PGF of the negative binomial distribution in
Table 1, it is simply the PGF of the geometric distribution raised to the power r̂. A number chosen from the
negative binomial can be defined as the number of successful trials (each with success probability p) before
the r̂-th failure.

Using this and Property A.8, derive the PGF of the negative binomial.
Exercise A.7 Sicherman dice [18, 17].
To motivate this exercise consider two tetrahedral dice, numbered 1, 2, 3, 4. When we roll them we get
sums from 2 to 8, each with its own probability, which we can infer from this table:

        1   2   3   4
    1   2   3   4   5
    2   3   4   5   6
    3   4   5   6   7
    4   5   6   7   8

However another pair of tetrahedral dice, labelled 1, 2, 2, 3 and 1, 3, 3, 5, yields the same sums with the same
probabilities:

        1   2   2   3
    1   2   3   3   4
    3   4   5   5   6
    3   4   5   5   6
    5   6   7   7   8

We now try to find a similar pair for 6-sided dice. First consider a pair of standard 6-sided dice.
a. Show that the PGF of each die is f(x) = (x + x^2 + x^3 + x^4 + x^5 + x^6)/6.

b. Fill in the tables showing the possible sums from rolling two dice (fill in each square with the sum of
   the two entries) and multiplication for two polynomials (fill in each square with the product of the two
   entries):

         1   2   3   4   5   6               x^1  x^2  x^3  x^4  x^5  x^6
     1                                  x^1
     2                                  x^2
     3                                  x^3
     4                                  x^4
     5                                  x^5
     6                                  x^6

c. Explain the similarity.


d. Show that each step of the following factorization is correct:

    f(x) = \frac{x(1 + x + x^2 + x^3 + x^4 + x^5)}{6}
         = \frac{x(1 + x + x^2)(1 + x^3)}{6}
         = \frac{x(1 + x + x^2)(1 + x)(1 - x + x^2)}{6} .

This cannot be factored further, and indeed it can be shown that a property similar to prime numbers holds.
Namely, any factorization of f (x)f (x) as h1 (x)h2 (x) has the property that each of h1 and h2 can be factored
into some powers of these “prime” polynomials times a constant.
We seek two new six-sided dice (each different) such that the sum of a roll of the two dice has the same
probabilities as the normal dice. The two dice have positive integer values on them (so no fair adding a
constant c to everything on one die and subtracting c on the other). Let h1 (x) and h2 (x) be their PGFs.
e. Explain why we must have h1 (x)h2 (x) = [f (x)]2 .
f. If the dice have numbers a_1, . . . , a_6 and b_1, . . . , b_6, show that their PGFs are of the form h_1(x) =
   \sum_i x^{a_i}/6 and h_2(x) = \sum_i x^{b_i}/6 where all a_i and b_i are positive integers.

g. Given the properties we want for the dice, find h1 (0) and h2 (0).

h. Given the properties we want for the dice, find h1 (1) and h2 (1).
i. Using the values at x = 0 and x = 1, explain why h_1(x) = x(1 + x + x^2)(1 + x)(1 - x + x^2)^b/6 and
   h_2(x) = x(1 + x + x^2)(1 + x)(1 - x + x^2)^{2-b}/6 where b is 0, 1, or 2.
j. The case b = 1 gives the normal dice. Consider b = 0 (b = 2 gives the same final result). Find h_1(x).
   You should find that h_2(x) = (x + x^3 + x^4 + x^5 + x^6 + x^8)/6.
k. Create the table for the two dice corresponding to h_1(x) and h_2(x) and verify that the sums occur with
   the same frequency as for a normal pair.
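Part k can also be checked by brute force: the pair 1, 2, 2, 3, 3, 4 and 1, 3, 4, 5, 6, 8 (the second die matches h_2 above; the first is the answer to part j, so treat this sketch as a spoiler) must give the same sum distribution as two ordinary dice. A Python sketch:

```python
from collections import Counter
from itertools import product

def sum_distribution(die_a, die_b):
    """Distribution of the sum of one roll of each die (the product of their PGFs)."""
    counts = Counter(a + b for a, b in product(die_a, die_b))
    total = len(die_a) * len(die_b)
    return {s: c / total for s, c in counts.items()}

standard = sum_distribution([1, 2, 3, 4, 5, 6], [1, 2, 3, 4, 5, 6])
sicherman = sum_distribution([1, 2, 2, 3, 3, 4], [1, 3, 4, 5, 6, 8])
print(standard == sicherman)   # True: identical sum distributions
```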

Exercise A.8 Early-time outbreak dynamics


a. Consider normal dice. The PGF is f(x) = (x + x^2 + x^3 + x^4 + x^5 + x^6)/6. Consider the process where
   we roll a die, take the result i, and then roll i other dice and look at their sum. What is the PGF of
   the resulting sum in terms of f ?

b. If an infected individual causes anywhere from 1 to 6 infections, all with equal probability, find the
PGF for the number of infections in generation 2 if there is one infection in generation 0. [you can
express the result in terms of f ]
c. And in generation g (assuming depletion of susceptibles is unimportant)?

A.4 Properties related to iteration of PGFs


There are various contexts in which we might iterate to calculate f [n] (x) (the result of applying f n times
to x).
In the disease context, this occurs most frequently in calculating the probability of outbreak extinction.
If we think of α as the probability that the outbreak goes extinct from a single individual, then from
Property A.1 we would expect that α = f (α̂) where α̂ is the probability that an offspring of the individual
fails to produce an epidemic. However, under common assumptions, the number of infections from the
offspring should be from the same distribution as from the parent. In this case we would conclude α = α̂
and so α = f (α).
It turns out that a good way to solve for α is iteration, starting with the guess α0 = 0. We will show that
this converges to the correct value [x = f (x) can have multiple solutions, only one of which is the correct α].
Figure 10 demonstrates how the iterative process can be represented by a “cobweb diagram” [41, 33]. To
use a cobweb diagram to study the behavior of f^{[g]}(x_0), we draw the line y = x and the curve y = f(x).


Figure 10: Cobweb diagrams: We take the function f (x) = (1 + x3 )/2. A cobweb diagram is built
by alternately drawing vertical lines from the diagonal to f (x) and then horizontal lines from f (x) to the
diagonal. The dashed lines show αg = f (αg−1 ) starting with α0 = 0 and highlight the relation to the
iterative process.

Then at x0 we draw a vertical line to the curve y = f (x). We draw a horizontal line to the line y = x [which
will be at the point (x1 , x1 )]. We then repeat these steps, drawing a vertical line to y = f (x) and a horizontal
line to y = x. Cobweb diagrams are particularly useful in studying behavior near fixed or periodic points.
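The iteration in Figure 10 is easy to reproduce numerically. For f(x) = (1 + x³)/2, solving x = f(x) amounts to x³ − 2x + 1 = (x − 1)(x² + x − 1) = 0, so the iterates from α_0 = 0 should converge to x* = (√5 − 1)/2 ≈ 0.618. A Python sketch:

```python
import math

f = lambda x: (1 + x**3) / 2     # the PGF used in Figure 10

alpha = 0.0                      # alpha_0 = 0
for _ in range(200):
    alpha = f(alpha)             # alpha_g = f(alpha_{g-1})

x_star = (math.sqrt(5) - 1) / 2  # the smaller solution of x = (1 + x^3)/2
print(alpha, x_star)
```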
Exercise A.9 Understanding cobweb diagrams
From figure 10 the origin of the term “cobweb” may be unclear. Because of properties of PGFs, the
more interesting behavior does not occur for our applications. Here we investigate cobweb diagrams in more
detail for non-PGF functions. Since we use f (x) to denote a PGF, in this exercise we use z(x) for an
arbitrary function.

a. Consider the line z(x) = 2(1 − x)/3. Starting with x0 = 0, show how the first few iterations of
xi = z(xi−1 ) can be found using a cobweb diagram (do not explicitly calculate the values).
b. Now consider the line z(x) = 2(1 − x). The solution to z(x) = x is x = 2/3. Starting from an initial
x0 close to (but not quite equal to) 2/3, do several iterations of the cobweb diagram graphically.

c. Repeat this with the lines z(x) = 1/4 + x/2 starting at x0 = 0 and z(x) = −1 + 3x starting close to
where x = z(x).
d. What is different when the slope is positive or negative?
e. Can you predict what condition on the slope’s magnitude leads to convergence to or divergence from
the solution to x = z(x) when z is a line?
So far we have considered lines z(x). Now assume z(x) is nonlinear and consider the behavior of cobweb
diagrams close to a point where x = z(x).
f. Use Taylor Series to argue that (except for degenerate cases where z ′ is 1 at the intercept) it is only
the slope at the intercept that determines the behavior sufficiently close to the intercept.
Exercise A.10 Structure of fixed points of f(x).
Consider a PGF f(x) = \sum_i r_i x^i, and assume r_0 > 0.

a. Show that f (1) = 1 and f (0) > 0.

b. Show that f(x) is convex (that is, f''(x) ≥ 0) for x > 0. [hint: r_i ≥ 0 for all i]
c. Thus argue that if f'(1) ≤ 1, then x = f(x) has only one solution in [0, 1], namely
   x = 1. It may help to draw pictures of f(x) and the function y = x for x in [0, 1].
d. Explain why if there is a point x0 ̸= 1 where f (x0 ) = x0 and f (x) > x for x in some region (x0 , x1 )
then 0 < f ′ (x0 ) < 1.
e. Thus show that if f ′ (1) > 1 then there are exactly two solutions to x = f (x) in [0, 1], one of which is
x = 1.
These results suggest:

Property A.9 Assume f(x) = \sum_i r_i x^i is a PGF, and f(0) > 0.
• If f'(1) ≤ 1 then the only intercept of x = f(x) in [0, 1] is at x = 1.
• Otherwise, there is another intercept x*, 0 < x* < 1, and if x < x* then x < f(x) < x* while if
  x > x* then x > f(x) > x*, and for 0 ≤ x_0 < 1, f^{[g]}(x_0) converges monotonically to x*.

The assumption r0 > 0 was used to rule out f (x) = x. Excluding this degenerate case, these results hold
even if r0 = 0, in which case we can show f ′ (1) > 1 and x∗ = 0.
To sketch the proof of this property, we note that clearly f (1) = 1, so if f (0) > 0 then either f (x) crosses
y = x at some intermediate 0 < x∗ < 1 or it does not cross until x = 1. Then using the fact that for x > 0
the slope of f is positive and increasing, we can inspect the cobweb diagram to see these results.

A.5 Finding the Kolmogorov Equations


To study continuous-time dynamics, we will want to have partial differential equations (PDEs) where we
write the time derivative of a PGF f(x, t) or f(x, y, t) in terms of f and its spatial derivatives.
We will use two approaches to find the derivative. Both start with the assumption that we know f(x, t),
and calculate the derivative by finding f(x, t + Δt) and using the definition of the derivative:

    \frac{\partial}{\partial t} f(x, t) = \lim_{Δt \to 0} \frac{f(x, t + Δt) - f(x, t)}{Δt}
The methods differ in how they find f (x, t + ∆t). The distinction is closely related to the observation in
Exercise 2.7 that µ[g] (x) can be written as either µ[g−1] (µ(x)) or µ(µ[g−1] (x)).

• The first involves assuming we know f (x, t) and then looking through all of the possible transitions to
find how the system changes going from t to t + ∆t. This will yield the forward Kolmogorov Equations.
• The second involves starting from the initial condition f (x, 0) and finding f (x, ∆t) by investigating all
of the possible transitions. Then taking f (x, ∆t) and f (x, t) we are able to find f (x, t + ∆t). This will
yield the backward Kolmogorov Equations.

A.5.1 Forward Kolmogorov Equations


We start with the Forward Kolmogorov Equations. We let r_i(t) denote the probability that at time t there
are i individuals, and define the PGF

    f(x, t) = \sum_i r_i(t) x^i

We begin by looking at events that can be treated as if they remove one individual and replace it with m
individuals. Thus i is replaced by i + m − 1:

    i ↦ i + m − 1 .

For example early in an epidemic, we may assume that an infected individual causes new infections at rate
β. The outcome of an infection event is equivalent to the removal of the infected individual and replacement

by two infected individuals. Similarly, a recovery event occurs with rate γ and is equivalent to removal with
no replacement. So λ2 = β, λ0 = γ, and all other λm are 0.
Our events happen at a per-individual rate λm , so the total rate an event occurs across the population of
i individuals is λm i. Events that can be modeled like this include decay of a radioactive particle, recovery of
an infected individual, or division of a cell. We assume that different events may be possible, each having a
different m. If multiple events have the same effect on m (for example emigration or death), we can combine
their rates into a single λm .
It will be useful to define

    Λ = \sum_m λ_m

to be the combined per-capita rate of all possible events and

    h(x) = \sum_m λ_m x^m / Λ .

We can think of h(x) as the PGF for the number of new individuals given that a random event happens
(since λ_m/Λ is the probability that the random event introduces m individuals).
We start with one derivation of the equation for \dot{f}(x, t) based on directly calculating f(x, t + Δt) and using
the definition of the derivative. An alternate way is shown in Exercise A.11. For small Δt the probability that
multiple events occur in the same time interval is O((Δt)^2), which we will see is negligible. Let us assume
the system has i individuals at time t, which occurs with probability r_i(t). For a given m, the probability
that the corresponding event occurs in the time interval given i is λ_m i Δt + O((Δt)^2), and 1 − \sum_m λ_m i Δt + O((Δt)^2)
is the probability that none of the events occur in the time interval and the system remains in state i.
xi+m−1 . Summing over m and i, we have
" ! #!
X X X
f (x, t + ∆t) = ri (t) (λm i∆t)xi+m−1 + 1 − λm i∆t xi + O(∆t)
i m m

The O(∆t) corrects for the possibility of multiple events happening in the time interval.
A bit of algebra and separating the i and m summations shows that

    f(x, t + Δt) = \sum_i r_i(t) x^i + \sum_m λ_m Δt (x^m - x) \sum_i r_i(t) i x^{i-1} + O((Δt)^2)
                 = f(x, t) + \sum_m λ_m (x^m - x) Δt \sum_i r_i(t) \frac{\partial}{\partial x} x^i + O((Δt)^2)
                 = f(x, t) + Δt \left( \sum_m λ_m x^m - x \sum_m λ_m \right) \frac{\partial}{\partial x} \sum_i r_i(t) x^i + O((Δt)^2)
                 = f(x, t) + Λ Δt [h(x) - x] \frac{\partial}{\partial x} f(x, t) + O((Δt)^2)
So we now have

    \frac{\partial}{\partial t} f(x, t) = \lim_{Δt \to 0} \frac{f(x, t + Δt) - f(x, t)}{Δt}
        = \lim_{Δt \to 0} \frac{Λ Δt [h(x) - x] \frac{\partial}{\partial x} f(x, t) + O((Δt)^2)}{Δt}
        = Λ [h(x) - x] \frac{\partial}{\partial x} f(x, t)
We finally have

Property A.10 Let f(x, t) = \sum_i r_i(t) x^i be the PGF for the probability of having i individuals at time
t. Assume several events indexed by m can occur, each with rate λ_m i, that remove one individual and
replace it with m. Let Λ = \sum_m λ_m be the total per-capita rate and h(x) = \sum_m λ_m x^m / Λ be the PGF of
the outcome of a random event. Then

    \frac{\partial}{\partial t} f(x, t) = Λ [h(x) - x] \frac{\partial}{\partial x} f(x, t)        (40)

We look at a heuristic way to interpret this. We can rewrite Equation (40) as

    \dot{f}(x, t) = \left[ \sum_m (λ_m x^m - λ_m x) \right] \frac{\partial}{\partial x} f(x, t)

Then if we expand f on the right hand side, we have

    \sum_m \sum_i λ_m (x^m - x) i r_i x^{i-1}

The derivative serves the purpose of getting the factor i into the coefficient of each term, which addresses the
fact that the rate at which events happen is proportional to the total count. The derivative has the additional effect
of reducing the exponent by 1, corresponding to the removal of one individual. The λ_m in the remaining
factor gives the per-capita rate of changing state. The x^m − x captures the fact that when moving to the
new state m individuals are added (the x^m term), while the system leaves the current state (which has an
exponent of x^i) at the same rate (the −x term).
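To connect Equation (40) with stochastic simulation: for the early-epidemic example (λ_2 = β, λ_0 = γ), differentiating Equation (40) with respect to x and setting x = 1 gives dE(i)/dt = (β − γ) E(i), so the mean of many Gillespie-type realizations should track e^{(β−γ)t}. The sketch below is a Python illustration under these assumptions (the parameter values are arbitrary):

```python
import math
import random

random.seed(2)

beta, gamma = 1.5, 1.0   # lambda_2 = beta (infection), lambda_0 = gamma (recovery)
t_end = 1.0

def simulate():
    """One Gillespie realization of the linear birth-death process; returns i at t_end."""
    i, t = 1, 0.0
    while i > 0:
        t += random.expovariate((beta + gamma) * i)   # total event rate is Lambda * i
        if t > t_end:
            break
        if random.random() < beta / (beta + gamma):
            i += 1   # infection: one individual is replaced by two
        else:
            i -= 1   # recovery: one individual is removed
    return i

runs = 20000
mean_i = sum(simulate() for _ in range(runs)) / runs
predicted = math.exp((beta - gamma) * t_end)   # E(i) = e^{(beta - gamma) t}
print(mean_i, predicted)
```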
Exercise A.11 Alternate derivation of Equation (40)
An alternate way to derive Equation (40) is through directly calculating \dot{r}_i.

a. Explain why \dot{r}_i = -\sum_m λ_m i r_i + \sum_m λ_m (i - m + 1) r_{i-m+1}.

b. Taking \dot{f}(x, t) = \sum_i \dot{r}_i x^i, derive Equation (40).

We can generalize this to the case where there are multiple types of individuals. For the Forward
Kolmogorov equations, it is relatively straightforward to allow for interactions between individuals. We
may be interested in this generalization when considering predator-prey interactions or interactions between
infected and susceptible individuals if we are interested in depletion of susceptibles. We assume that there
are two types of individuals A and B with counts i and j respectively, and we let r_{i,j}(t) denote the probability
of a given pair i and j. We define the PGF

    f(x, y, t) = \sum_{i,j} r_{i,j}(t) x^i y^j

We assume that interactions between an A and a B individual occur with some rate proportional to the
product ij. We assume that the interaction removes both individuals and replaces them by m of type A and
n of type B. We denote the rate as μ_{m,n} ij, and the sum

    M = \sum_{m,n} μ_{m,n} .

We also assume that individuals of type A spontaneously undergo changes as they did above, but they
can be replaced by type A and/or type B individuals. So one individual of type A is removed and replaced
by m individuals of type A and n of type B with rate λ_{m,n}, and the combined rate for one specific transition
over the entire set of individuals is λ_{m,n} i. We define

    Λ = \sum_{m,n} λ_{m,n} .

We will ignore spontaneous changes by nodes of type B, but the generalization to include these can be found
by following the same method.
Finally, let

    h(x, y) = \sum_{m,n} λ_{m,n} x^m y^n / Λ

and

    g(x, y) = \sum_{m,n} μ_{m,n} x^m y^n / M

be the PGFs for the outcomes of the two types of events.


Then

    f(x, y, t + Δt) = \sum_{i,j} r_{i,j}(t) \Big[ \sum_{m,n} \big[ (λ_{m,n} i Δt) x^{i+m-1} y^j + (μ_{m,n} ij Δt) x^{i+m-1} y^{j+n-1} \big]
                        + \Big( 1 - \sum_{m,n} [λ_{m,n} i Δt + μ_{m,n} ij Δt] \Big) x^i y^j \Big] + O((Δt)^2)
                    = \sum_{i,j} r_{i,j}(t) x^i y^j + \sum_{m,n} λ_{m,n} (x^m y^n - x) Δt \sum_{i,j} r_{i,j}(t) i x^{i-1} y^j
                        + \sum_{m,n} μ_{m,n} (x^m y^n - xy) Δt \sum_{i,j} r_{i,j}(t) ij x^{i-1} y^{j-1} + O((Δt)^2)
                    = f(x, y, t) + \sum_{m,n} λ_{m,n} (x^m y^n - x) Δt \frac{\partial}{\partial x} f(x, y, t)
                        + \sum_{m,n} μ_{m,n} (x^m y^n - xy) Δt \frac{\partial}{\partial x} \frac{\partial}{\partial y} f(x, y, t) + O((Δt)^2)
                    = f(x, y, t) + Δt \Big( Λ [h(x, y) - x] \frac{\partial}{\partial x} f(x, y, t) + M [g(x, y) - xy] \frac{\partial}{\partial x} \frac{\partial}{\partial y} f(x, y, t) \Big) + O((Δt)^2)

So

    \frac{\partial}{\partial t} f(x, y, t) = \lim_{Δt \to 0} \frac{f(x, y, t + Δt) - f(x, y, t)}{Δt}
        = \lim_{Δt \to 0} \frac{Δt Λ [h(x, y) - x] \frac{\partial}{\partial x} f(x, y, t) + Δt M [g(x, y) - xy] \frac{\partial}{\partial x} \frac{\partial}{\partial y} f(x, y, t) + O((Δt)^2)}{Δt}
        = Λ [h(x, y) - x] \frac{\partial}{\partial x} f(x, y, t) + M [g(x, y) - xy] \frac{\partial}{\partial x} \frac{\partial}{\partial y} f(x, y, t)
We have shown:

Property A.11 Let f(x, y, t) = \sum_{i,j} r_{i,j}(t) x^i y^j be the PGF for the probability of having i type A and j
type B individuals. Assume that events occur with rate λ_{m,n} i or μ_{m,n} ij to replace a single type A individual,
or one of each type, with m type A and n type B individuals. Let Λ = \sum_{m,n} λ_{m,n} and M = \sum_{m,n} μ_{m,n}.
Then

    \frac{\partial}{\partial t} f(x, y, t) = Λ [h(x, y) - x] \frac{\partial}{\partial x} f(x, y, t) + M [g(x, y) - xy] \frac{\partial}{\partial x} \frac{\partial}{\partial y} f(x, y, t)        (41)

where h(x, y) = \sum_{m,n} λ_{m,n} x^m y^n / Λ is the PGF for the outcome of a random event whose rate is propor-
tional to i and g(x, y) = \sum_{m,n} μ_{m,n} x^m y^n / M is the PGF for the outcome of a random event whose rate is
proportional to ij.

This can be generalized if there are events whose rates are proportional only to j or if there are more
than two types. The exercise below shows how to generalize this if the rate of events depends on i in a more
complicated manner.

Exercise A.12 In many cases interactions between two individuals of the same type are important. These
may occur with rate i(i − 1) or i^2 depending on the specific details. Assume we have only a single type of
individual with PGF f(x, t) = \sum_i r_i(t) x^i.

a. If a collection of events that replace two individuals with m individuals occur with rate β_m i(i − 1),
   find how to write a PDE for f. Your final result should contain \frac{\partial^2}{\partial x^2} f(x, t). Use B = \sum_m β_m and
   g(x) = \sum_m β_m x^m / B. Follow the derivation of Equation (40).

b. If instead the events replace two individuals with m individuals and occur with rate β_m i^2, find how to
   incorporate them into a PDE for f. Your final result should contain \frac{\partial}{\partial x} x \frac{\partial}{\partial x} f(x, t) or equivalently
   \frac{\partial}{\partial x} f(x, t) + x \frac{\partial^2}{\partial x^2} f(x, t).

Exercise A.13 Consider a chemical system that begins with some initial amount of chemical A. Let i
denote the number of molecules of species A. A molecule of A spontaneously degrades into a molecule of
B with rate ξ per molecule. Let j denote the number of molecules of species B. Species B reacts with A at
rate ηij to produce new molecules of species B. The reactions are denoted

    A ↦ B
    A + B ↦ 2B

Let r_{i,j}(t) denote the probability of i molecules of A and j molecules of B at time t. Let f(x, y, t) = \sum_{i,j} r_{i,j}(t) x^i y^j
be the PGF. Find the Forward Kolmogorov Equation for f(x, y, t).

A.5.2 Backward Kolmogorov equations



We now look for another derivation of ∂t f (x, t), and as before we find it by first finding f (x, t + ∆t) for small
∆t and then using the definition of the derivative. We will assume that each individual acts independently,
and at rate λm an individual may be removed and replaced by m new individuals. So if there are i total
individuals, at rate λm i the count i is replaced by i − 1 + m.
Property A.8 plays an important role in our derivation. We define f1 (x, t) = i ri (t)xi where we assume
P
that r1 (0) = 1, that is we start with exactly one individual at time 0. Then Property A.8 shows that
f1 (x, t1 + t2 ) = f1 (f1 (x, t2 ), t1 ). Then from our initial condition f1 (x, 0) = x, and

f1 (x, ∆t + t) = f1 (f1 (x, t), ∆t) (42)

We need to find f_1(x, Δt). We have

    f_1(x, Δt) = \sum_i r_i(0) x^i \left( 1 - \sum_m i λ_m Δt + \sum_m i λ_m Δt \, x^{m-1} \right) + O((Δt)^2)
               = x \left( 1 - \sum_m λ_m Δt + \sum_m λ_m Δt \, x^{m-1} \right) + O((Δt)^2)
               = x - x Δt \sum_m λ_m + Δt \sum_m λ_m x^m + O((Δt)^2)
               = x + Δt \, Λ [h(x) - x] + O((Δt)^2)

where, as in the forward Kolmogorov case, Λ = \sum_m λ_m and h(x) = \sum_m λ_m x^m / Λ is the PGF of the number
of new individuals created given that an event occurs. In the first step we used the fact that for f_1(x, t),
r_i(0) = 1 if i = 1 and otherwise it is 0. Thus Equation (42) implies

    f_1(x, t + Δt) = f_1(x, t) + Δt \, Λ [h(f_1(x, t)) - f_1(x, t)] + O((Δt)^2) .

Now taking the definition of the derivative, we have

    \frac{\partial}{\partial t} f_1(x, t) = \lim_{Δt \to 0} \frac{f_1(x, t + Δt) - f_1(x, t)}{Δt}
        = \lim_{Δt \to 0} \frac{f_1(x, t) + Δt \, Λ [h(f_1(x, t)) - f_1(x, t)] + O((Δt)^2) - f_1(x, t)}{Δt}
        = Λ [h(f_1(x, t)) - f_1(x, t)]

Thus we have an ODE for f_1(x, t).


In general, our initial condition may not be a single individual, but some other number (or perhaps a value
chosen from a distribution). Let the initial condition have PGF f(x, 0). Then it follows from Property A.8
that

    f(x, t) = f(f_1(x, t), 0)

So we have

Property A.12 Consider a process in which the number of individuals changes in time such that when an event occurs one individual is destroyed and replaced with m new individuals. The rate associated with events that replace an individual with m new individuals is \lambda_m i, where i is the number of individuals. Let f_1(x, t) be the PGF for this process beginning from a single individual and \Lambda = \sum_m \lambda_m. Then

    \dot{f}_1(x, t) = \Lambda [h(f_1(x, t)) - f_1(x, t)]        (43)

where h(x) is the PGF for the number of new individuals created in a random event. If the initial number of individuals is not 1, let f(x, 0) denote the PGF for the initial condition. Then

    f(x, t) = f(f_1(x, t), 0)        (44)

is the PGF at arbitrary positive time.
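To make Property A.12 concrete, here is a minimal numerical sketch. The rates (\lambda_0 = 1 for death, \lambda_2 = 1.5 for replacement by two individuals) and the helper names are illustrative assumptions, not taken from the text; the sketch integrates Equation (43) with forward Euler and evaluates f_1(0, t), the probability that the process has died out by time t.

```python
# A hypothetical birth-death process: rate lambda_0 = 1 events destroy an
# individual (m = 0 offspring); rate lambda_2 = 1.5 events replace it with
# two (m = 2).
lam = {0: 1.0, 2: 1.5}
Lam = sum(lam.values())

def h(x):
    # PGF of the number of new individuals created in a random event
    return sum(rate * x**m for m, rate in lam.items()) / Lam

def f1(x, t, dt=1e-3):
    # Forward-Euler integration of Equation (43): df1/dt = Lambda*(h(f1) - f1),
    # with initial condition f1(x, 0) = x.
    f = x
    for _ in range(round(t / dt)):
        f += dt * Lam * (h(f) - f)
    return f

# f1(0, t) is the probability the process has died out by time t; as t grows
# it approaches the smaller root of h(q) = q, which is 2/3 for these rates.
print(f1(0.0, 30.0))
```

For an initial condition of k independent individuals, Equation (44) gives f(x, t) = f_1(x, t)^k, so the corresponding extinction probability is f_1(0, t)^k.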

This is fairly straightforward to generalize to multiple types as long as none of the events involve inter-
actions.
Exercise A.14 In this exercise we generalize Property A.12 to the case where there are two types of individuals, A and B, with counts i and j.
Assume events occur spontaneously with rate \lambda_{m,n} i to remove an individual of type A and replace it with m of type A and n of type B, or with rate \zeta_{m,n} j to remove an individual of type B and replace it with m of type A and n of type B.
Set \Lambda = \sum_{m,n} \lambda_{m,n} and Z = \sum_{m,n} \zeta_{m,n}. Let f_{1,0}(x, y, t) denote the outcome beginning with one individual of type A and f_{0,1}(x, y, t) denote the outcome beginning with one individual of type B.

a. Write f_{1,0}(x, y, \Delta t) and f_{0,1}(x, y, \Delta t) in terms of h(x, y) = \sum_{m,n} \lambda_{m,n} x^m y^n / \Lambda and g(x, y) = \sum_{m,n} \zeta_{m,n} x^m y^n / Z.

b. Using Property A.8, write f_{1,0}(x, y, \Delta t + t) and f_{0,1}(x, y, \Delta t + t) in terms of f_{1,0} and f_{0,1} evaluated at t and \Delta t. The answer should resemble Equation (42).

c. Derive expressions for \frac{\partial}{\partial t} f_{1,0}(x, y, t) and \frac{\partial}{\partial t} f_{0,1}(x, y, t).

d. Use this to derive Equation (22).

B Proof of Theorems 2.7 and 3.6


We now prove Theorems 2.7 and 3.6.
We first sketch out the idea behind the method of proof of Theorem 2.7. The idea is that if an outbreak
dies out with exactly j infections, then there must be a transmission tree that corresponds to exactly j
infections. In the construction of the tree, each successive number of downward links was chosen from the

[Figure 11 depicts a nine-node tree (nodes A–I) at two stages of the traversal, with intermediate sequence S = (3, 0, 2, 2, 0, 0, . . .) and final sequence S = (3, 0, 2, 2, 0, 0, 0, 1, 0).]

Figure 11: Demonstration of the steps mapping the tree T to the sequence S. The nodes are traced in a depth-
first traversal and their number of offspring is recorded. For the labeling given, a depth-first traversal traces
the nodes in alphabetical order. At an intermediate stage (left) the traversal has not finished the sequence.
The final sequence (right) is uniquely determined once the order of a node’s offspring is (randomly) chosen.

offspring distribution. Our goal is to find out the probability of arriving at a finite tree with exactly j
infections given the offspring distribution.
This tree has certain constraints on it. The first constraint is that it must have exactly j − 1 transmissions from the j infected individuals. So we look at the probability of having a sum of j − 1 when we choose j numbers from the offspring distribution. This is given by the coefficient of y^{j-1} in [µ(y)]^j.
Next we have to make sure that the sequence is consistent with an outbreak that did not die out sooner.
For example, if an outbreak has exactly two infections, the sequence cannot have the first individual infecting no one and the second infecting one person, because then the outbreak would have died out before the second individual had the chance to transmit. So it is not enough for the sequence to add to j − 1; the order must be consistent with an outbreak of size j.
It turns out that we can find a one-to-one mapping between trees on j individuals and “valid” sequences
summing to j − 1. When doing this, we discover that if a sequence sums to j − 1, there is exactly one cyclic
permutation of that sequence which is valid.6 Thus of all sequences of j values chosen from the offspring distribution that sum to j − 1, a fraction 1/j are “valid”, that is, they yield a complete transmission tree. So the probability is (1/j) times the coefficient of y^{j-1} in [µ(y)]^j.
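As a numerical illustration of this result, the coefficient of y^{j-1} in [µ(y)]^j can be read off by repeated polynomial multiplication. The sketch below is illustrative only (outbreak_size_probs and poly_mult are not part of the Invasion PGF package); it uses the offspring PGF µ(y) = (1 + y + y^2 + y^3)/4 that also appears in Appendix C.

```python
def poly_mult(a, b):
    # Multiply two polynomials given as coefficient lists
    out = [0.0] * (len(a) + len(b) - 1)
    for i, ai in enumerate(a):
        for k, bk in enumerate(b):
            out[i + k] += ai * bk
    return out

def outbreak_size_probs(p, jmax):
    # p[k] = probability of k offspring.  Returns probs[j] = probability the
    # outbreak infects exactly j individuals:
    #   (1/j) * [coefficient of y**(j-1) in mu(y)**j]     (Theorem 2.7)
    probs = [0.0]                        # size 0 is impossible with one index case
    power = [1.0]                        # coefficients of mu(y)**0
    for j in range(1, jmax + 1):
        power = poly_mult(power, p)      # coefficients of mu(y)**j
        probs.append(power[j - 1] / j)
    return probs

probs = outbreak_size_probs([0.25, 0.25, 0.25, 0.25], 20)
print(probs[1], probs[2])  # 0.25 0.0625
```

The partial sums of probs increase toward the extinction probability, which for this µ is the root α = √2 − 1 ≈ 0.414 of µ(α) = α, consistent with the extinction probabilities shown in the Appendix C session.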
We now go through the proof in detail.

B.1 Proof of Theorem 2.7


We take as given a probability distribution so that pi is the probability of i offspring.
We will first show a way to represent a (finite) transmission tree as a sequence of integers representing
the number of offspring of each node. Additionally we show that the possible sequences coming from a tree
can be characterized by a few specific properties. Then the probability of such a sequence corresponds to
the probability of the corresponding tree.
Given a finite transmission tree T , we first order the offspring of any individual (randomly) from “left”
to “right”. We then construct a sequence S by performing a depth-first traversal of the tree and recording
the number of offspring as we visit the nodes of the tree, as shown in Fig. 11. A sequence constructed in
this way is called a Lukasiewicz word [45].
It is straightforward to see that if we are given a Lukasiewicz word ST , we can uniquely reconstruct the
(ordered) tree T from which it came.
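The mapping from tree to sequence is easy to express in code. This sketch hard-codes the tree of Figure 11 (node names and left-to-right child order as drawn there; the helper name is illustrative) and performs the depth-first traversal to recover S.

```python
# The tree from Figure 11, with children listed left to right
tree = {"A": ["B", "C", "H"], "B": [], "C": ["D", "G"], "D": ["E", "F"],
        "E": [], "F": [], "G": [], "H": ["I"], "I": []}

def lukasiewicz_word(tree, root):
    # Depth-first traversal, recording each node's number of offspring
    word = []
    stack = [root]
    while stack:
        node = stack.pop()
        word.append(len(tree[node]))
        stack.extend(reversed(tree[node]))  # so children are visited left to right
    return word

print(lukasiewicz_word(tree, "A"))  # [3, 0, 2, 2, 0, 0, 0, 1, 0]
```

Reversing the process, consuming the word entry by entry while tracking how many offspring each open node still needs, reconstructs the ordered tree uniquely.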
We now note that the probability of observing a given length-j sequence Ŝ by choosing j numbers from the offspring distribution is simply \pi_{\hat{S}} = \prod_{s_i \in \hat{S}} p_{s_i}.
Similarly, as infection spreads, each infected individual infects some number s_i with probability p_{s_i}. If we record each s_i in the order of a depth-first search, it is clear that the probability of observing a given tree T with Lukasiewicz word S_T is exactly \pi_{S_T}.
6 A cyclic permutation of a sequence is formed by thinking of the sequence as a loop, and then choosing a different starting

point.

[Figure 12 depicts the step-by-step ring construction for Ŝ = (2, 0, 0, 0, 1, 0, 3, 0, 2): frames show nodes A–I placed on a ring with their labels, and edges being added until a single tree remains.]
Figure 12: The steps of the construction of a tree with Ŝ = (2, 0, 0, 0, 1, 0, 3, 0, 2) [note that this is a cyclic permutation of the previous S]. Each frame shows the next step in building a tree on a ring. The resulting tree is not rooted at the top. The names of the nodes in the tree are a cyclic permutation of the original.

Now we look for the probability that a random length-j sequence Ŝ created by choosing numbers from the offspring distribution is a Lukasiewicz word.7
To be a Lukasiewicz word, Ŝ must satisfy \sum_{s_i \in \hat{S}} s_i = j - 1 because the sum is the total number of transmissions occurring, which is one less than the total number of infections. By repeated application of Property A.6, the probability that a sequence of j numbers chosen from the offspring distribution sums to j − 1 is the coefficient of y^{j-1} in [µ(y)]^j. So the probability that a random sequence Ŝ satisfies this constraint is the coefficient of y^{j-1} in [µ(y)]^j.
Momentarily we will show that given a length-j sequence Ŝ which sums to j − 1, exactly one of its j
cyclic permutations is a Lukasiewicz word, but let us for now assume this result is true.
Consider the j distinct sequences that are cyclic permutations of a sequence Ŝ which sums to j − 1. Since
each of these is a sequence of exactly the same values they have the same probability. Our assumption that
exactly one of them is a Lukasiewicz word means that if Ŝ satisfies the constraint that it sums to j − 1 then
with probability 1/j it is a Lukasiewicz word. So the probability that a random sequence is a Lukasiewicz word would be 1/j times the probability it sums to j − 1. That is, it would be 1/j times the coefficient of y^{j-1} in [µ(y)]^j. This is the claim of Theorem 2.7.
However, our earlier assumption must still be proven: if Ŝ sums to j − 1 then exactly one of its j
permutations is a Lukasiewicz word.
Given a length-j sequence Ŝ of non-negative integers that sums to j − 1, we place j nodes on a ring starting at the top and ordered clockwise, following the example in Figure 12. We label the ith node with s_i. If a node v is labeled with 0 and the adjacent position in the counter-clockwise direction has node u with
a positive label, we place an edge from u to v (with v to the right of any previous edge from u to another
7 If the sequence is not a Lukasiewicz word, then either it is the start of a sequence corresponding to a larger (possibly infinite) tree, or some initial subsequence corresponds to a completed tree.

node) and remove v. We decrease u’s label by one. Note that at a given step there may be multiple pairs
eligible to have edges placed between them, in which case we do all of them. If we did one at a time, the
final outcome would be the same.
Each edge added in this process reduces both the number of nodes and their sum by one, leaving all numbers as non-negative integers. So the sum remains one less than the remaining number of nodes. This
guarantees at least one zero and at least one nonzero value until only one node remains. Thus we can always
find an appropriate pair u and v until only a single node remains. The process constructs a directed tree
(there are j nodes with j − 1 edges and the fact that a node is removed from the algorithm once an edge is
added pointing to it guarantees no cycles). Fig. 12 demonstrates the steps.
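The ring construction can be sketched in code as follows (an illustrative helper, not part of any package; it attaches one eligible pair at a time, which the text notes yields the same outcome as doing all eligible pairs at once). Nodes are indexed 0, ..., j − 1 clockwise from the top.

```python
def tree_from_ring(seq):
    # Build the tree for a sequence of non-negative integers summing to
    # len(seq) - 1, following the ring construction described in the text.
    ring = list(range(len(seq)))           # remaining nodes, clockwise order
    labels = list(seq)
    children = {v: [] for v in ring}
    while len(ring) > 1:
        for idx, v in enumerate(ring):
            u = ring[idx - 1]              # counter-clockwise neighbor
            if labels[v] == 0 and labels[u] > 0:
                children[u].append(v)      # v sits right of u's earlier edges
                labels[u] -= 1
                ring.remove(v)
                break
    return ring[0], children               # the last remaining node is the root

root, children = tree_from_ring((3, 0, 2, 2, 0, 0, 0, 1, 0))
print(root)  # 0: this sequence roots the tree at the node that began at the top
print(tree_from_ring((2, 0, 0, 0, 1, 0, 3, 0, 2))[0])  # 6: root not at the top
```

A depth-first traversal of the first tree, rooted at node 0, reproduces the sequence (3, 0, 2, 2, 0, 0, 0, 1, 0); for the rotated sequence from Figure 12 the root lands elsewhere, consistent with the caption's remark that the resulting tree is not rooted at the top.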
If the tree is rooted at the node that began at the top of the ring, then Ŝ corresponds to a depth-first
traversal of that tree. It is a Lukasiewicz word. Each cyclic permutation of Ŝ rotates the location of the root
to be one of the j nodes. Only the case when the root is at the top will result in a Lukasiewicz word. Thus Ŝ has exactly j distinct cyclic permutations, and exactly one of them is a Lukasiewicz word. This completes the final detail of the proof.
So we finally conclude that the probability of a tree of j infected nodes is equal to 1/j times the probability that j randomly-chosen values from the offspring distribution sum to j − 1. This is 1/j times the coefficient of y^{j-1} in [µ(y)]^j, as Theorem 2.7 claims.

B.2 Theorem 3.6


We can prove Theorem 3.6 as a special case of Theorem 2.7 by calculating the offspring distribution (Exer-
cise B.1). However, a more illuminating proof is by noting that if we treat a transmission event as a node
disappearing and being replaced by two infected nodes and a recovery event as a node disappearing with no
offspring, then we have a tree where each node has 2 or 0 offspring. The total number of actual individuals
infected in the outbreak is equal to the number of nodes with 0 offspring in the tree.
Following the arguments above, we are looking for sequences of length 2j − 1 in which 2 appears j − 1 times and 0 appears j times. There are \binom{2j-1}{j-1} such sequences. The probability of each is \beta^{j-1} \gamma^j / (\beta + \gamma)^{2j-1}, and a fraction 1/(2j − 1) of these correspond to trees. Thus, the probability a length-(2j − 1) sequence is a Lukasiewicz word is

    \frac{1}{2j-1} \frac{\beta^{j-1} \gamma^j}{(\beta + \gamma)^{2j-1}} \binom{2j-1}{j-1} = \frac{1}{j} \frac{\beta^{j-1} \gamma^j}{(\beta + \gamma)^{2j-1}} \binom{2j-2}{j-1}
Using the same approach as before, we conclude that this is the probability of exactly j infections.
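The resulting closed form is easy to evaluate directly. The sketch below (sir_outbreak_size_prob is an illustrative name) computes it for β = 2 and γ = 1, reproducing the leading values returned by cts_time_final_sizes in the Appendix C session.

```python
from math import comb

def sir_outbreak_size_prob(j, beta, gamma):
    # Probability an outbreak infects exactly j individuals when each infected
    # individual transmits at rate beta and recovers at rate gamma:
    #   (1/j) * C(2j-2, j-1) * beta**(j-1) * gamma**j / (beta+gamma)**(2j-1)
    return (comb(2*j - 2, j - 1) * beta**(j - 1) * gamma**j
            / (j * (beta + gamma)**(2*j - 1)))

beta, gamma = 2, 1
print([round(sir_outbreak_size_prob(j, beta, gamma), 6) for j in (1, 2, 3)])
# [0.333333, 0.074074, 0.032922]
```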
Exercise B.1 If we do not think of an infected individual as disappearing and being replaced by two infected
individuals when a transmission happens, but rather, we count up all of the transmissions the individual
causes, we get a geometric distribution with q = β/(β + γ). The details are in Exercise 3.2. Use this along
with Theorem 2.7 and Table 6 (which was derived in Exercise 2.13) to give a different proof of Theorem 3.6.

C Software
We have produced a Python package, Invasion PGF, which can be used to solve the equations of Section 2
or Section 3 once the PGF of the offspring distribution or β and γ are determined. Because the numerical
method involves solving differential equations in the complex plane, it requires an integration routine that
can handle complex values. For this we use odeintw [50].
Table 7 briefly summarizes the commands available in Invasion PGF.
We now demonstrate a sample session with these commands.
>>> import Invasion_PGF as pgf
>>> def mu(x):
... return (1 + x + x**2 + x**3)/4.
...
>>> pgf.R0(mu)
1.5000001241105565
>>> #extinction probabilities up to generation 5

R0(µ): Approximation of R0.
extinction_prob(µ, gen): Probability αgen of extinction by generation gen given offspring PGF µ.
cts_time_extinction_prob(β, γ, T): Probability α(T) of extinction by time T given transmission and recovery rates β and γ.
active_infections(µ, gen, M): Array containing probabilities ϕ0, . . . , ϕj, . . . , ϕM−1 of having j active infections in generation gen given offspring PGF µ.
cts_time_active_infections(β, γ, T, M): Array containing probabilities ϕ0, . . . , ϕj, . . . , ϕM−1 of having j active infections at time T given transmission and recovery rates β and γ.
completed_infections(µ, gen, M): Array containing probabilities ω0, . . . , ωj, . . . , ωM−1 of having j completed infections in generation gen given offspring PGF µ.
cts_time_completed_infections(β, γ, T, M): Array containing probabilities ω0, . . . , ωj, . . . , ωM−1 of having j completed infections at time T given transmission and recovery rates β and γ.
active_and_completed(µ, gen, M1, M2): M1 × M2 array containing probabilities πi,r of i active infections and r completed infections in generation gen given offspring PGF µ.
cts_time_active_and_completed(β, γ, T, M1, M2): M1 × M2 array containing probabilities πi,r of i active infections and r completed infections at time T given transmission and recovery rates β and γ.
final_sizes(µ, M): Array containing probabilities ω0, . . . , ωj, . . . , ωM−1 of having j total infections in an outbreak given offspring PGF µ.
cts_time_final_sizes(β, γ, M): Array containing probabilities ω0, . . . , ωj, . . . , ωM−1 of having j total infections in an outbreak given transmission and recovery rates β and γ.

Table 7: Commands of Invasion PGF. Many of these have an optional boolean argument
intermediate values which, if True, will result in returning values from generation 0 to generation gen
in the discrete-time case or at some intermediate times in the continuous-time case. For the discrete-time
results, the input µ is the offspring distribution PGF. For the continuous-time version, β and γ are the
transmission and recovery rates respectively.

>>> pgf.extinction_prob(mu, 5, intermediate_values = True)


array([ 0. , 0.25 , 0.33203125, 0.36972018, 0.38923784,
0.39992896])
>>> #following commands look at possible states in generation 3
>>> #probability of 0..9 active infections in generation 3
>>> pgf.active_infections(mu, 3, 10)
array([ 0.36972018, 0.05259718, 0.07178445, 0.09609134, 0.07393309,
0.07334027, 0.06617007, 0.05007252, 0.04119097, 0.03182213])
>>> #probability of 0..9 completed infections in generation 3
>>> pgf.completed_infections(mu, 3, 10)
array([ -2.04281037e-17, 2.50000000e-01, 6.25000000e-02,
7.81250000e-02, 9.76562500e-02, 1.21093750e-01,

8.59375000e-02, 8.59375000e-02, 7.81250000e-02,
6.25000000e-02])
>>> #joint probabilities of 0..4 active infections and 0..4 completed
>>> #infections in generation 3
>>> pgf.active_and_completed(mu, 3, 5, 5)
array([[ 0. , 0.25 , 0.0625 , 0.03125 , 0.015625 ],
[ 0. , 0. , 0. , 0.015625 , 0.015625 ],
[ 0. , 0. , 0. , 0.015625 , 0.01953125],
[ 0. , 0. , 0. , 0.015625 , 0.0234375 ],
[ 0. , 0. , 0. , 0. , 0.01171875]])
>>> #check that marginals match, increase sizes considered to improve match
>>> act_and_complete = pgf.active_and_completed(mu, 3, 20, 20)
>>> act_and_complete.sum(axis=1) #Active infections
array([ 3.69720176e-01, 5.25971800e-02, 7.17844516e-02,
9.60913450e-02, 7.39330947e-02, 7.33402669e-02,
6.61700666e-02, 5.00725210e-02, 4.11909670e-02,
3.18221301e-02, 2.31783241e-02, 1.72899812e-02,
1.21286511e-02, 8.08435678e-03, 5.25146723e-03,
3.23349237e-03, 1.90655887e-03, 1.08598173e-03,
5.83335757e-04, 2.96160579e-04])
>>> act_and_complete.sum(axis=0) #Completed infections
array([ 0. , 0.25 , 0.0625 , 0.078125 , 0.09765625,
0.12109375, 0.0859375 , 0.0859375 , 0.078125 , 0.0625 ,
0.0390625 , 0.02342606, 0.01163167, 0.00376529, 0. ,
0. , 0. , 0. , 0. , 0. ])
>>> #yes, these match previous calculations, with a small mismatch because
>>> #e.g., there may be 21 cumulative cases and 8 active cases. To accurately
>>> #calculate the probability of 8 active cases we would need to
>>> #increase the sizes to include this.

>>> #
>>> #Now look at the final sizes
>>> pgf.final_sizes(mu, 20)
array([ 0.00000000e+00, 2.50000000e-01, 5.93750000e-02,
2.82031250e-02, 1.67456055e-02, 1.03404114e-02,
6.80080902e-03, 4.66611063e-03, 3.29263648e-03,
2.37637247e-03, 1.74605802e-03, 1.30159459e-03,
9.81970183e-04, 7.48352208e-04, 5.75249662e-04,
4.45491477e-04, 3.47250478e-04, 2.72225362e-04,
2.14494366e-04, 1.69773210e-04])
>>> #
>>> #Now consider the continuous-time model
>>> beta = 2
>>> gamma = 1
>>> #In next command, first returned array is the times and second
>>> #is the extinction probabilities at those times
>>> pgf.cts_time_extinction_prob(beta, gamma, 5, intermediate_values =
... True, numvals = 6)
(array([ 0., 1., 2., 3., 4., 5.]),
array([[ 0. , 0.38730017, 0.46371057, 0.48723549, 0.49537878,
0.49830983]]))
>>> #following commands look at possible states at time 3
>>> pgf.cts_time_active_infections(beta, gamma, 3, 10)
array([ 0.48723548, 0.01309038, 0.0127562 , 0.01243055, 0.01211321,

0.01180397, 0.01150263, 0.01120897, 0.01092282, 0.01064397])
>>> pgf.cts_time_completed_infections(beta, gamma, 3, 10)
array([ 0.00037014, 0.33477236, 0.07721546, 0.03805734, 0.02535527,
0.02008499, 0.01755497, 0.0161637 , 0.01527602, 0.0146211 ])
>>> #check that the joint distribution has the same marginals
>>> cts_time_act_and_complete = pgf.cts_time_active_and_completed(beta, gamma, 3, 20, 20)
>>> cts_time_act_and_complete.sum(axis=1) #Active infections
array([ 0.48717492, 0.01298644, 0.01257732, 0.0121518 , 0.01170803,
0.0112452 , 0.01076355, 0.01026434, 0.00974976, 0.00922277,
0.00868697, 0.00814639, 0.00760529, 0.00706802, 0.0065388 ,
0.00602164, 0.00552019, 0.00503763, 0.00457669, 0.00413954])
>>> cts_time_act_and_complete.sum(axis=0) #Completed infections
array([ 0.00036997, 0.334771 , 0.07720946, 0.03803859, 0.02530855,
0.01998599, 0.0173696 , 0.01584881, 0.0147816 , 0.01389348,
0.01306482, 0.01224544, 0.01141776, 0.01057991, 0.0097376 ,
0.00889996, 0.00807718, 0.0072792 , 0.00651487, 0.00579152])
>>> #yes, these match previous calculations
>>> #Now look at the final sizes at time infinity
>>> pgf.cts_time_final_sizes(beta, gamma, 20)
array([ 0.00000000e+00, 3.33333333e-01, 7.40740741e-02,
3.29218107e-02, 1.82898948e-02, 1.13803790e-02,
7.58691934e-03, 5.29880081e-03, 3.82691169e-03,
2.83474940e-03, 2.14181066e-03, 1.64421828e-03,
1.27883644e-03, 1.00558079e-03, 7.98079995e-04,
6.38463996e-04, 5.14318219e-04, 4.16833066e-04,
3.39641758e-04, 2.78069275e-04])

Acknowledgments
This work was funded by Global Good.
I thank Linda Allen for useful discussion about the Kolmogorov equations. Hao Hu played an important
role in inspiring this work and testing the methods. Hil Lyons and Monique Ambrose provided valuable
feedback on the discussion of inference. Amelia Bertozzi-Villa and Monique Ambrose read over drafts and
recommended a number of changes that have significantly improved the presentation.
The python code and output in Appendix C was incorporated using Pythontex [43]. I relied heavily
on https://tex.stackexchange.com/a/355343/70067 by “touhami” in setting up the solutions to the
exercises.

References
[1] Linda JS Allen. An introduction to stochastic epidemic models. In Mathematical Epidemiology, pages
81–130. Springer, 2008.
[2] Linda JS Allen. An introduction to stochastic processes with applications to biology. CRC Press, 2010.
[3] Linda JS Allen. A primer on stochastic epidemic models: Formulation, numerical simulation, and
analysis. Infectious Disease Modelling, 2017.
[4] Tibor Antal and PL Krapivsky. Exact solution of a two-type branching process: models of tumor
progression. Journal of Statistical Mechanics: Theory and Experiment, 2011(08):P08018, 2011.
[5] Norman TJ Bailey. The total size of a general stochastic epidemic. Biometrika, pages 177–185, 1953.
[6] Norman TJ Bailey. The elements of stochastic processes with applications to the natural sciences. John
Wiley & Sons, 1964.

[7] MS Bartlett. Some evolutionary stochastic processes. Journal of the Royal Statistical Society. Series B
(Methodological), 11(2):211–229, 1949.
[8] Seth Blumberg and James O Lloyd-Smith. Inference of R0 and transmission heterogeneity from the
size distribution of stuttering chains. PLoS Computational Biology, 9(5):e1002993, 2013.
[9] Folkmar Bornemann. Accuracy and stability of computing high-order derivatives of analytic functions
by Cauchy integrals. Foundations of Computational Mathematics, 11(1):1–63, 2011.
[10] Andrei Broder, Ravi Kumar, Farzin Maghoul, Prabhakar Raghavan, Sridhar Rajagopalan, Raymie
Stata, Andrew Tomkins, and Janet Wiener. Graph structure in the web. Computer Networks, 33:309–
320, 2000.
[11] Jessica M Conway and Daniel Coombs. A stochastic model of latently infected cell reactivation and
viral blip generation in treated HIV patients. PLoS Computational Biology, 7(4):e1002033, 2011.
[12] O. Diekmann and J. A. P. Heesterbeek. Mathematical epidemiology of infectious diseases. Wiley Chich-
ester, 2000.
[13] S. N. Dorogovtsev, J. F. F. Mendes, and A. N. Samukhin. Giant strongly connected component of
directed networks. Physical Review E, 64(2):025101, Jul 2001.
[14] Richard Durrett. Branching process models of cancer. In Branching Process Models of Cancer, pages
1–63. Springer, 2015.
[15] Meyer Dwass. The total progeny in a branching process and a related random walk. Journal of Applied
Probability, 6(3):682–686, 1969.
[16] David Easley and Jon Kleinberg. Networks, crowds, and markets: Reasoning about a highly connected
world. Cambridge University Press, 2010.
[17] Joseph A Gallian and David J Rusin. Cyclotomic polynomials and nonstandard dice. Discrete Mathe-
matics, 27(3):245–259, 1979.
[18] Martin Gardner. Mathematical games. Scientific American, 238:19–32, 1978.
[19] Wayne M Getz and James O Lloyd-Smith. Basic methods for modeling the invasion and spread of
contagious diseases. In Disease Evolution: Models, Concepts, and Data Analyses, pages 87–112, 2006.
[20] Tiberiu Harko, Francisco SN Lobo, and MK Mak. Exact analytical solutions of the Susceptible-Infected-
Recovered (SIR) epidemic model and of the SIR model with equal death and birth rates. Applied
Mathematics and Computation, 236:184–194, 2014.
[21] Peter D Hoff. A first course in Bayesian statistical methods. Springer Science & Business Media, 2009.
[22] Remco van der Hofstad and Michael Keane. An elementary proof of the hitting time theorem. The
American Mathematical Monthly, 115(8):753–756, 2008.
[23] Thomas House, Joshua V Ross, and David Sirl. How big is an outbreak likely to be? methods for
epidemic final-size calculation. Proc. R. Soc. A, 469(2150):20120436, 2013.
[24] Eben Kenah and Joel C. Miller. Epidemic percolation networks, epidemic outcomes, and interventions.
Interdisciplinary Perspectives on Infectious Diseases, 2011, 2011.
[25] David G Kendall. Stochastic processes and population growth. Journal of the Royal Statistical Society.
Series B (Methodological), 11(2):230–282, 1949.
[26] Marek Kimmel and David E Axelrod. Branching Processes in Biology. Interdisciplinary Applied Mathematics 19. Springer, 2002.
[27] Istvan Z Kiss, Joel C Miller, and Péter L Simon. Mathematics of epidemics on networks: from exact to
approximate models. Springer, Forthcoming.

[28] Adam J Kucharski and W John Edmunds. Characterizing the transmission potential of zoonotic infec-
tions from minor outbreaks. PLoS Computational Biology, 11(4):e1004154, 2015.
[29] Mark A Lewis, Sergei V Petrovskii, and Jonathan R Potts. The mathematics behind biological invasions,
volume 44. Springer, 2016.

[30] James O Lloyd-Smith, Sebastian J Schreiber, P Ekkehard Kopp, and Wayne M Getz. Superspreading
and the effect of individual variation on disease emergence. Nature, 438(7066):355, 2005.
[31] Donald Ludwig. Final size distributions for epidemics. Mathematical Biosciences, 23:33–46, 1975.
[32] Junling J. Ma and David J. D. Earn. Generality of the final size formula for an epidemic of a newly
invading infectious disease. Bulletin of Mathematical Biology, 68(3):679–702, 2006.

[33] Robert M May. Simple mathematical models with very complicated dynamics. Nature, 261(5560):459–
467, 1976.
[34] Joel C. Miller. A note on a paper by Erik Volz: SIR dynamics in random networks. Journal of
Mathematical Biology, 62(3):349–358, 2011.

[35] Joel C. Miller. A note on the derivation of epidemic final sizes. Bulletin of Mathematical Biology,
74(9):2125–2141, 2012.
[36] Joel C Miller, Bahman Davoudi, Rafael Meza, Anja C Slim, and Babak Pourbohloul. Epidemics with
general generation interval distributions. Journal of Theoretical Biology, 262(1):107–115, 2010.

[37] Joel C. Miller, Anja C. Slim, and Erik M. Volz. Edge-based compartmental modelling for infectious
disease spread. Journal of the Royal Society Interface, 9(70):890–906, 2012.
[38] Cristopher Moore and Mark EJ Newman. Exact solution of site and bond percolation on small-world
networks. Physical Review E, 62(5):7059, 2000.
[39] Sean Nee, Edward C Holmes, Robert M May, and Paul H Harvey. Extinction rates can be estimated
from molecular phylogenies. Phil. Trans. R. Soc. Lond. B, 344(1307):77–82, 1994.
[40] Hiroshi Nishiura, Ping Yan, Candace K Sleeman, and Charles J Mode. Estimating the transmission
potential of supercritical processes based on the final size distribution of minor outbreaks. Journal of
Theoretical Biology, 294:48–55, 2012.

[41] Heinz-Otto Peitgen, Hartmut Jürgens, and Dietmar Saupe. Chaos and fractals: new frontiers of science.
Springer Science & Business Media, 2006.
[42] George Pólya. Mathematics and plausible reasoning: Induction and analogy in mathematics, volume 1.
Princeton University Press, 1990.
[43] Geoffrey M Poore. Pythontex: reproducible documents with LATEX, Python, and more. Computational
Science & Discovery, 8(1):014010, 2015.
[44] Timothy Reluga, Rafael Meza, D. Brian Walton, and Alison P. Galvani. Reservoir interactions and
disease emergence. Theoretical population biology, 72(3):400–408, 2007.
[45] Richard P. Stanley. Enumerative Combinatorics, volume II. Cambridge University Press, 2001.

[46] L. D. Valdez, P. A. Macri, and L. A. Braunstein. Temporal percolation of the susceptible network in
an epidemic spreading. PLoS One, 7(9):e44188, 2012.
[47] Erik M. Volz. SIR dynamics in random networks with heterogeneous connectivity. Journal of Mathe-
matical Biology, 56(3):293–310, 2008.

[48] Erik M Volz, Ethan Romero-Severson, and Thomas Leitner. Phylodynamic inference across epidemic
scales. Molecular biology and evolution, 34(5):1276–1288, 2017.

[49] Henry William Watson and Francis Galton. On the probability of the extinction of families. The Journal
of the Anthropological Institute of Great Britain and Ireland, 4:138–144, 1875.
[50] Warren Weckesser. odeintw. https://github.com/WarrenWeckesser/odeintw.
[51] JG Wendel. Left-continuous random walk and the Lagrange expansion. American Mathematical
Monthly, pages 494–499, 1975.
[52] Herbert S. Wilf. generatingfunctionology. A K Peters, Ltd, 3rd edition, 2005.
[53] Ping Yan. Distribution theory, stochastic processes and infectious disease modelling. Mathematical
Epidemiology, pages 229–293, 2008.

