Release Popgen Notes

GRAHAM COOP
POPULATION AND
QUANTITATIVE
GENETICS
Author: Graham Coop
Author address: Department of Evolution and Ecology & Center for Population Biology,
University of California, Davis.
To whom correspondence should be addressed: gmcoop@ucdavis.edu
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/
i.e. you are free to reuse and remix this work, but please include an attribution to the original.
Typeset using LaTeX and the tufte-latex book style.
The LaTeX code and R code for this book are kept at https://github.com/cooplab/popgen-notes/ and are also
under a Creative Commons Attribution 3.0 Unported License.
Updated on January 2020

Dedicated to my parents, partner, and children. Thank you for
teaching me the true joy of descent with modification.
This book was developed from my set of notes for the Population Biology graduate group core class (PBG200A) and
Undergraduate Population and Quantitative Genetics class (EVE102) at UC Davis. Thanks to the many students
who’ve read these notes and suggested improvements. Thanks to Simon Aeschbacher, Vince Buffalo, and Erin Calfee
who read and extensively edited earlier drafts of these notes. To illustrate these notes I’ve used old scientific and
natural history illustrations, in part because they are out of copyright but mainly because they bring me joy. Many
of the old images come from the Biodiversity Heritage Library, a consortium of natural history institutions that are
digitizing their collections and making them freely available online. If you enjoy the images, consider donating to the BHL.
Many of the data and simulation graphics in the book were prepared in R (2018); the code for each is linked from
the caption of its figure. In many cases data were extracted from old figures using the WebPlotDigitizer tool, so
I advise re-extracting the data if you wish to use it for research purposes.
Contents
1 Introduction
2 Allele and Genotype Frequencies
3 Population Structure and Correlations Among Loci
4 Genetic Drift and Neutral Diversity
5 The Population Genetics of Divergence and Molecular Substitution
6 Neutral Diversity and Population Structure
7 Phenotypic Variation and the Resemblance Between Relatives
8 The Response to Phenotypic Selection
9 The Response of Multiple Traits to Selection
10 One-Locus Models of Selection
11 The Interaction of Selection, Mutation, and Migration
12 The Impact of Genetic Drift on Selected Alleles
13 The Effects of Linked Selection
14 Interaction of Multiple Selected Loci
A. An Introduction to Mathematical Concepts
Bibliography
1
Introduction
Biological evolution is the change over time in the genetic
composition of a population.1 Our population is made up of a set of
interbreeding individuals, the genetic composition of which is made up
of the genomes that each individual carries. The genetic composition of
the population alters due to the death of individuals or the migration
of individuals in or out of the population. If our individuals vary in
the number of children they have, this also alters the genetic
composition of the population in the next generation.
1 Dobzhansky, T., 1951 Genetics and the Origin of Species (3rd ed.), p. 16.
Every new individual born into the population subtly changes the
genetic composition of the population. Their genome is a unique com-
bination of their parents’ genomes, having been shuffled by segregation
and recombination during meioses, and possibly changed by mutation.
These individual events seem minor at the level of the population, but
it is the accumulation of small changes in aggregate across individuals
and generations that is the stuff of evolution. It is the compounding
of these small changes over tens, hundreds, and millions of genera-
tions that drives the amazing diversity of life that has emerged on this
earth.
Population genetics is the study of the genetic composition of natu-
ral populations and its evolutionary causes and consequences. Quanti-
tative genetics is the study of the genetic basis of phenotypic variation
and how phenotypic changes evolve over time. Both fields are closely
conceptually aligned as we’ll see throughout these notes. They seek to
describe how the genetic and phenotypic composition of populations
can be changed over time by the forces of mutation, recombination,
selection, migration, and genetic drift. To understand how these forces
interact, it is helpful to develop simple theoretical models to help our
intuition. In these notes we will work through these models and
summarize the major areas of population- and quantitative-genetic theory.
While the models we will develop will seem naïve, and indeed they are,
they are nonetheless incredibly useful and powerful. Throughout the
course we will see that these simple models often yield accurate
predictions, such that much of our understanding of the process of
evolution is built on these models. We will also see how these models
are incredibly useful for understanding real patterns we see in the
evolution of phenotypes and genomes, such that much of our analysis of
evolution, in a range of areas from human medical genetics to
conservation, is based on these models. Therefore, population and
quantitative genetics are key to understanding various applied
questions, from how medical genetics identifies the genes involved in
disease to how we preserve species from extinction.
“All models are wrong but some are useful” - Box (1979).
Population genetics emerged from early efforts to reconcile Mendelian
genetics with Darwinian thought. Part of the power of population
genetics comes from the fact that the basic rules of transmission
genetics are simple and nearly universal. One of the truly remarkable
things about population genetics is that many of the important ideas
and mathematical models emerged before the 1940s, long before the
mechanistic basis of inheritance (DNA) was discovered, and yet the
usefulness of these models has not diminished. This is a testament to
the fact that the models are established on a very solid foundation,
building from the basic rules of genetic transmission combined with
simple mathematical and statistical models.
See Provine (2001) for a history of early population genetics.
Provine, W. B., 2001 The origins of theoretical population genetics: with a new afterword. University of Chicago Press
Much of this early work traces to the ideas of R.A. Fisher, Sewall
Wright, and J.B.S. Haldane, who, along with many others, described
the early principles and mathematical models underlying our
understanding of the evolution of populations. Building on this conceptual
fusion of genetics and evolution, there followed a flourishing of evolu-
tionary thought, the modern evolutionary synthesis, combining these
ideas with those from the study of speciation, biodiversity, and pale-
ontology. In total this work showed that both short-term evolutionary
change and the long-term evolution of biodiversity could be well un-
derstood through the gradual accumulation of evolutionary change
within and among populations. This evolutionary synthesis contin-
ues to this day, combining new insights from genomics, phylogenetics,
ecology, and developmental biology.
Population and quantitative genetics are a necessary but not
sufficient description of evolution; it is only by combining the insights
of many fields that a rich and comprehensive picture of evolution
emerges. We certainly do not need to know the genes underlying the
displays of the birds of paradise to study how the divergence of these
displays, due to sexual selection, may drive speciation. Indeed, as we’ll
see in our discussion of quantitative genetics, we can predict how
populations respond to selection, including sexual selection and
assortative mating, without any knowledge of the loci involved. Nor do we
need to know the precise selection pressures and the ordering of genetic
changes to study the emergence of the tetrapod body plan. We do
not necessarily need to know all the genetic details to appreciate the
beauty of these, and many other, evolutionary case studies. However,
every student of biology gains from understanding the basics of
population and quantitative genetics, allowing them to base their studies
on a solid bedrock of understanding of the processes that underpin all
evolutionary change.
“Dobzhansky (1951) once defined evolution as ’a change in the genetic composition of the populations’ an epigram that should not be mistaken for the claim that everything worth saying about evolution is contained in statements about genes” – Lewontin (2001)
2
Allele and Genotype Frequencies
In this chapter we will work through how the basics of Mendelian
genetics play out at the population level in sexually reproducing
organisms.
Loci and alleles are the basic currency of population genetics, and
indeed of genetics. A locus may be an entire gene, or a single
nucleotide base pair such as A-T. At each locus, there may be multiple
genetic variants segregating in the population; these different genetic
variants are known as alleles. If all individuals in the population carry
the same allele, we say that the locus is monomorphic; at this locus
there is no genetic variability in the population. If there are multiple
alleles in the population at a locus, we say that this locus is
polymorphic (this is sometimes referred to as a segregating site).
A locus (plural: loci) is a specific spot in the genome. The term allele was coined by Edith Rebecca Saunders and William Bateson in 1902 in their paper “The facts of heredity in the light of Mendel’s discovery”.
Table 2.1 shows a small stretch of orthologous sequence for the
ADH locus from samples from Drosophila melanogaster, D. simulans,
and D. yakuba. D. melanogaster and D. simulans are sister species and
D. yakuba is a close outgroup to the two. Each column represents a
single haplotype from an individual (the individuals are diploid but
were inbred so they’re homozygous for their haplotype). Only sites
that differ among individuals of the three species are shown. Site 834
is an example of a polymorphism; some D. simulans individuals carry
a C allele while others have a T. Fixed differences are sites that differ
between the species but are monomorphic within the species. Site 781
is an example of a fixed difference between D. melanogaster and the
other two species.
We can also annotate the alleles and loci in various ways. For
example, position 781 is a non-synonymous fixed difference. We call the
less common allele at a polymorphism the minor allele and the
common allele the major allele, e.g. at site 1068 the T allele is the minor
allele in D. melanogaster. We call the more evolutionarily recent of
the two alleles the derived allele and the older of the two the ancestral
allele. We infer that the T allele at site 1068 is the derived allele
because the C is found in both other species, suggesting that the T allele
arose via a C → T mutation.
Figure 2.1: Drosophila melanogaster holds a special place in the history of genetics and population genetics. From Morgan’s fly room discovering the principles of genetics to Dobzhansky’s early work on natural genetic variation.
Contributions to the genetics of Drosophila melanogaster (1919). Morgan T.H., Bridges C.B., Sturtevant A. H. Image from the Biodiversity Heritage Library. Contributed by MBLWHOI Library. Not in copyright.
pos. con. a b c d e f g h i j k l a b c d e f a b c d e f g h i j k l NS/S
781 G T T T T T T T T T T T T - - - - - - - - - - - - - - - - - - NS
789 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
808 A - - - - - - - - - - - - - - - - - - G G G G G G G G G G G G NS
816 G T T T T - - - - - - - T T T T T T T - - - - - - - - - - - - S
834 T - - - - - - - - - - - - C C - - - C - - - - - - - - - - - - S
859 C - - - - - - - - - - - - - - - - - - G G G G G G G G G G G G NS
867 C - - - - - - - - - - - - - - - - - - G G G G G A G G G G G G S
870 C T T T T T T T T T T T T - - - - - - - - - - - - - - - - - - S
950 G - - - - - - - - - - - - - A - - - - - - - - - - - - - - - - S
974 G - - - - - - - - - - - - T - T T T T - - - - - - - - - - - - S
983 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
1019 C - - - - - - - - - - - - - - - - - - - - - - A - - - - - - - S
1031 C - - - - - - - - - - - - - - - - - - - - - - - - - - A - - - S
1034 T - - - - - - - - - - - - - - - - - - C C C C C - - C - C C S
1043 C - - - - - - - - - - - - - - - - - - - - - - A - - - - - - - S
1068 C T T - - - - - - - - - - - - - - - - - - - - - - - - - - - - S
1089 C - - - - - - - - - - - - A A A A A A - - - - - - - - - - - - NS
1101 G - - - - - - - - - - - - - - - - - - A A A A A A A A A A A A NS
1127 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
1131 C - - - - - - - - - - - - - - - - - - - - - - T - - - - - - - S
1160 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
Table 2.1: Variable sites in exons 2 and 3 of the ADH gene in Drosophila
(McDonald and Kreitman, 1991). The first column (pos.) gives the
position in the gene; exon 2 begins at position 778 and we’ve truncated
the dataset at site 1175. The second column gives the consensus nucleotide
(con.), i.e. the most common base at that position; individuals with
nucleotides that match the consensus are marked with a dash. The first
columns of sequence (a-l) are from D. melanogaster; the next columns
(a-f) give sequences from D. simulans, and the final set of columns (a-l)
from D. yakuba. The last column shows whether the difference is a
non-synonymous (NS) or synonymous (S) change.
Question 1. A) How many segregating sites does the sample
from D. simulans have in the ADH gene?
B) How many fixed differences are there between D. melanogaster
and D. yakuba?
2.1 Allele frequencies
Allele frequencies are a central unit of population genetics analysis,
but from diploid individuals we only get to observe genotype counts.
Our first task then is to calculate allele frequencies from genotype
counts. Consider a diploid autosomal locus segregating for two alleles
(A1 and A2). We’ll use these arbitrary labels for our alleles, merely
to keep this general. Let N11 and N12 be the number of A1 A1 ho-
mozygotes and A1 A2 heterozygotes, respectively. Moreover, let N
be the total number of diploid individuals in the population. We can
then define the relative frequencies of A1 A1 and A1 A2 genotypes as
f11 = N11 /N and f12 = N12 /N , respectively. The frequency of allele
A1 in the population is then given by
\[
p = \frac{2N_{11} + N_{12}}{2N} = f_{11} + \frac{1}{2} f_{12}. \tag{2.1}
\]
Note that this follows directly from how we count alleles given in-
dividuals’ genotypes, and holds independently of Hardy–Weinberg
proportions and equilibrium (discussed below). The frequency of the
alternate allele (A2 ) is then just q = 1 − p.
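As a minimal sketch of eqn (2.1), the allele count can be written out in a few lines of Python. The function name and the genotype counts below are made up for illustration and are not data from the text.

```python
def allele_frequencies(n11, n12, n22):
    """Return (p, q), the frequencies of alleles A1 and A2, from genotype counts."""
    n = n11 + n12 + n22                 # N, the number of diploid individuals
    if n == 0:
        raise ValueError("need at least one genotyped individual")
    # Each A1A1 homozygote carries two A1 copies, each heterozygote one,
    # and there are 2N allele copies in total (eqn 2.1).
    p = (2 * n11 + n12) / (2 * n)
    return p, 1.0 - p


# Hypothetical counts: 30 A1A1, 50 A1A2 and 20 A2A2 individuals.
p, q = allele_frequencies(30, 50, 20)
print(f"p = {p:.3f}, q = {q:.3f}")
```

Note that nothing here assumes Hardy–Weinberg proportions; we are only counting allele copies.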
2.1.1 Measures of genetic variability

Nucleotide diversity (π) One common measure of genetic diversity is
the average number of single nucleotide differences between haplotypes
chosen at random from a sample. This is called nucleotide diversity
and is often denoted by π. For example, we can calculate π for our
ADH locus from Table 2.1 above: we have 6 sequences from D. simulans
(a-f), there’s a total of 15 ways of pairing these sequences, and
\[
\pi = \frac{1}{15}\Big( (2+1+1+1+0) + (3+3+3+2) + (0+0+1) + (0+1) + (1) \Big) = \frac{19}{15} \approx 1.27 \tag{2.2}
\]
where the first bracketed term gives the pairwise differences between
a and b-f, the second bracketed term the differences between b and c-f
and so on.
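The pairwise counting in eqn (2.2) can be checked with a short Python sketch. The three-letter haplotypes below are my reading of the D. simulans columns of Table 2.1 at the three sites that vary within that sample (834, 950, and 974), with dashes replaced by the consensus base; treat that transcription, and the function names, as assumptions of the example.

```python
from itertools import combinations

# D. simulans haplotypes a-f at sites 834, 950 and 974 (transcribed from
# Table 2.1; dashes replaced by the consensus base at each site).
simulans = {
    "a": "CGT",
    "b": "CAG",
    "c": "TGT",
    "d": "TGT",
    "e": "TGT",
    "f": "CGT",
}


def pairwise_differences(x, y):
    """Number of sites at which two aligned haplotypes differ."""
    return sum(a != b for a, b in zip(x, y))


def nucleotide_diversity(haplotypes):
    """Average number of differences over all unordered pairs of haplotypes."""
    pairs = list(combinations(haplotypes, 2))
    total = sum(pairwise_differences(x, y) for x, y in pairs)
    return total / len(pairs)


pi = nucleotide_diversity(list(simulans.values()))
print(f"pi = {pi:.4f} differences per pair")
```

Sites that are monomorphic within the sample contribute zero to every pairwise comparison, which is why only the three variable sites need to be included.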
Our π measure will depend on the length of sequence it is calculated
for. Therefore, π is usually normalized by the length of sequence,
to be a per site (or per base) measure. For example, our ADH
sequence covers 397bp of DNA and so π = (19/15)/397 ≈ 0.0032 per site
in D. simulans for this region. Note that we could also calculate π
per synonymous site (or non-synonymous). For synonymous site π, we
would count up the number of synonymous differences between our pairs
of sequences, and then divide by the total number of sites where a
synonymous change could have occurred.1
1 Technically we would need to divide by the total number of possible point mutations that would result in a synonymous change; this is because some mutational changes at a particular nucleotide will result in a non-synonymous or synonymous change depending on the base-pair change.
Number of segregating sites. Another measure of genetic variability
is the total number of sites that are polymorphic (segregating) in our
sample. One issue is that the number of segregating sites will grow
as we sequence more individuals (unlike π). Later in the course, we’ll
talk about how to standardize the number of segregating sites for the
number of individuals sequenced (see eqn (4.39)).
The frequency spectrum. We also often want to compile information
about the frequency of alleles across sites. We call alleles that are
found once in a sample singletons, alleles that are found twice in a
sample doubletons, and so on. We count up the number of loci where
an allele is found i times out of n, e.g. how many singletons are there
in the sample, and this is called the frequency spectrum. We’ll want to
do this in some consistent manner, such as calculating the frequency
spectrum of the minor allele or the derived allele.
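To make the bookkeeping concrete, here is a small Python sketch that tabulates a minor-allele frequency spectrum. The five toy haplotypes are invented for illustration (they are not the ADH data), and the function assumes aligned haplotypes with at most two alleles per site.

```python
from collections import Counter


def minor_allele_spectrum(haplotypes):
    """Count sites whose minor allele is seen i times, for i = 1 .. n // 2.

    Assumes equal-length haplotype strings; sites with more than two
    alleles are skipped, and monomorphic sites contribute nothing.
    """
    n = len(haplotypes)
    spectrum = Counter()
    for column in zip(*haplotypes):            # one column per site
        counts = Counter(column)
        if len(counts) == 2:                   # biallelic segregating site
            spectrum[min(counts.values())] += 1
    return {i: spectrum.get(i, 0) for i in range(1, n // 2 + 1)}


# Hypothetical sample of 5 haplotypes at 4 sites.
toy = ["ACGT", "ACGA", "TCGA", "ACCT", "ACGT"]
spectrum = minor_allele_spectrum(toy)
print(spectrum)                                  # singletons, doubletons
print("segregating sites:", sum(spectrum.values()))
```

Summing the spectrum over all classes gives the number of (biallelic) segregating sites in the sample.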
Question 2. How many minor-allele singletons are there in D.
simulans in the ADH region? [Defining minor allele just within D.
simulans.]
Levels of genetic variability across species. Two observations have
puzzled population geneticists since the inception of molecular
population genetics. The first is the relatively high level of genetic variation
observed in most obligately sexual species. This first observation,
in part, drove the development of the Neutral theory of molecular
evolution, the idea that much of this molecular polymorphism may
simply reflect a balance between genetic drift and mutation. The sec-
ond observation is the relatively narrow range of polymorphism across
species with vastly different census sizes. This observation represented
a puzzle as the Neutral theory predicts that levels of genetic diver-
sity should scale with population size. Much effort in theoretical and
empirical population genetics has been devoted to trying to reconcile
models with these various observations. We’ll return to discuss these
ideas throughout our course.
The first observations of molecular genetic diversity within natural
populations were made from surveys of allozyme data, but we can
revisit these general patterns with modern data.
For example, Leffler et al. (2012) compiled data on levels of
within-population, autosomal nucleotide diversity (π) for 167 species
across 14 phyla from non-coding and synonymous sites (Figure 2.3).
The species with the lowest level of π in their survey was Lynx, with
π = 0.01%, i.e. only 1/10000 bases differed between two sequences. In
contrast, some of the highest levels of diversity were found in Ciona
savignyi, Sea Squirts, where a remarkable 1/12 bases differ between
pairs of sequences. This 800-fold range of diversity seems impressive,
but census population sizes have a much larger range.
Figure 2.2: Sea Squirt (Ciona intestinalis).
Einleitung in die vergleichende gehirnphysiologie und Vergleichende psychologie. Loeb, J. 1899. Image from the Biodiversity Heritage Library. Contributed by MBLWHOI Library. No known copyright restrictions.
Figure 2.3: Levels of autosomal nucleotide diversity for 167 species across 14 phyla. Figure 1 from Leffler et al. (2012), licensed under CC BY 4.0. Points are ranked by their π, and coloured by their phylum. Note the log-scale.
2.1.2 Hardy–Weinberg proportions

Imagine a population mating at random with respect to genotypes,
i.e. no inbreeding, no assortative mating, no population structure, and
no sex differences in allele frequencies. The frequency of allele A1 in
the population at the time of reproduction is p. An A1 A1 genotype is
made by reaching out into our population and independently drawing
two A1 allele gametes to form a zygote. Therefore, the probability
that an individual is an A1 A1 homozygote is p^2. This probability is
also the expected frequency of the A1 A1 homozygote in the population.
The expected frequencies of the three possible genotypes are
f11    f12    f22
p^2    2pq    q^2
Throughout this chapter we’ll be making use of the basic rules of probability to find the probabilities of combinations of events, e.g. the alleles found in an individual, see Appendix A.2.2 for a refresher.
Figure 2.4: Eurasian Lynx (Lynx lynx).
An introduction to the study of mammals living and extinct. Flower, W.H. and Lydekker, R. 1891. Image from the Biodiversity Heritage Library. Contributed by Cornell University Library. No known copyright restrictions.
Note that we only need to assume random mating with respect to
our focal allele in order for these expected frequencies to hold in the
zygotes forming the next generation. Evolutionary forces, such as
selection, change allele frequencies within generations, but do not
change this expectation for new zygotes, as long as p is the frequency
of the A1 allele in the population at the time when gametes fuse.
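The Hardy–Weinberg expectations are easy to tabulate once p is known. The sketch below estimates p from a sample of genotype counts and returns the expected genotype frequencies; the counts are hypothetical, and using the sample allele frequency as a stand-in for the frequency at the time gametes fuse is an assumption of the example.

```python
def hardy_weinberg_expected(n11, n12, n22):
    """Return (p, expected genotype frequencies) under random mating."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)      # frequency of A1, as in eqn (2.1)
    q = 1.0 - p
    expected = {
        "A1A1": p * p,                 # two independent draws of A1
        "A1A2": 2 * p * q,             # A1 then A2, or A2 then A1
        "A2A2": q * q,
    }
    return p, expected


# Hypothetical sample: 30 A1A1, 50 A1A2, 20 A2A2.
observed = {"A1A1": 30, "A1A2": 50, "A2A2": 20}
p, expected = hardy_weinberg_expected(*observed.values())
n = sum(observed.values())
for genotype, freq in expected.items():
    print(f"{genotype}: observed {observed[genotype] / n:.3f}, expected {freq:.3f}")
```

Comparing the observed and expected columns is the informal check behind plots like Figure 2.6; a formal test would also need to account for sampling noise.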
Question 3. On the coastal islands of British Columbia there is

a subspecies of black bear (Ursus americanus kermodei, Kermode’s
bear). Many members of this black bear subspecies are white; they’re
sometimes called spirit bears. These bears aren’t hybrids with polar
bears, nor are they albinos. They are homozygotes for a recessive
change at the MC1R gene. Individuals who are GG at this SNP are
white while AA and AG individuals are black.
Below are the genotype counts for the MC1R polymorphism in
a sample of bears from British Columbia’s island populations from
Ritland et al. (2001).
AA AG GG
42 24 21
What are the expected frequencies of the three genotypes under HWE?
Figure 2.5: Kermode’s bear (Ursus americanus kermodei). It’s possible that this morph is favoured as the salmon these bears eat have a harder time seeing the light morph (Klinka and Reimchen, 2009). The adaptive value of tasting like cinnamon is unknown.
Field book of North American mammals; descriptions of every mammal known north of the Rio Grande. Anthony, H. E. (1928). Image from the Biodiversity Heritage Library. Contributed by MBLWHOI Library. No known copyright restrictions.
See Figure 2.6 for a nice empirical demonstration of Hardy–Weinberg
proportions. The mean frequency of each genotype closely matches its
HW expectations, and much of the scatter of the dots around the
expected line is due to our small sample size (∼ 60 individuals). While
HW often seems like a silly model, it often holds remarkably well
within populations. This is because individuals don’t mate at random,
but they do mate at random with respect to their genotype at most of
the loci in the genome.
Question 4. You are investigating a locus with three alleles, A,
B, and C, with allele frequencies pA, pB, and pC. What fraction of the
population is expected to be homozygotes under Hardy–Weinberg?
Figure 2.6: Demonstrating Hardy–Weinberg proportions using 10,000 SNPs from the HapMap European (CEU) and African (YRI) populations. Within each of these populations we plot the allele frequency against the frequency of the 3 genotypes; each SNP is represented by 3 different coloured points. The solid lines show the mean genotype frequency. The dashed lines show the predicted genotype frequency from Hardy–Weinberg equilibrium. Code here. Blog post on figure here.
Microsatellites are regions of the genome where individuals vary
for the number of copies of some short DNA repeat that they carry.
These regions are often highly variable across individuals, making
them a suitable way to identify individuals from a DNA sample. This
so-called DNA fingerprinting has a range of applications from
establishing paternity and identifying human remains to matching
individuals to DNA samples from a crime scene. The FBI make use of the
CODIS database.2 The CODIS database contains the genotypes of
over 13 million people, most of whom have been convicted of a crime.
Most of the profiles record genotypes at 13 microsatellite loci that are
tetranucleotide repeats (since 2017, 20 sites have been genotyped).
2 CODIS: Combined DNA Index System
The allele counts for two loci (D16S539 and TH01) are shown in
Tables 2.2 and 2.3 for a sample of 155 people of European ancestry. You
can assume these two loci are on different chromosomes.
allele name    80   90   100   110   120   121   130   140   150
allele count    3   34    13   102    97     1    44    13     3
Table 2.2: Data for 155 Europeans at the D16S539 microsatellite from CODIS from Algee-Hewitt et al. (2016). The top row gives the number of tetranucleotide repeats for each allele, the bottom row gives the sample counts.
allele name    60   70   80   90   93   100   110
allele count   84   42   37   67   77     1     2
Table 2.3: Same as 2.2 but for the TH01 microsatellite.
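As a rough sketch of how genotype probabilities combine across loci, the Python below multiplies Hardy–Weinberg genotype probabilities over loci that are assumed to be independent (e.g. on different chromosomes). The locus names, allele labels, and frequencies are all hypothetical, chosen only to make the arithmetic easy to follow.

```python
def genotype_probability(allele_1, allele_2, freqs):
    """Hardy-Weinberg probability of an unordered diploid genotype."""
    if allele_1 == allele_2:
        return freqs[allele_1] ** 2
    # A heterozygote can arise in two orders, hence the factor of 2.
    return 2 * freqs[allele_1] * freqs[allele_2]


def profile_probability(profile, freqs_by_locus):
    """Multiply genotype probabilities across loci assumed to be independent."""
    prob = 1.0
    for locus, (allele_1, allele_2) in profile.items():
        prob *= genotype_probability(allele_1, allele_2, freqs_by_locus[locus])
    return prob


# Hypothetical allele frequencies at two unlinked loci.
freqs_by_locus = {
    "locus_1": {"x": 0.2, "y": 0.5, "z": 0.3},
    "locus_2": {"u": 0.1, "v": 0.9},
}
profile = {"locus_1": ("x", "y"), "locus_2": ("u", "u")}
match_prob = profile_probability(profile, freqs_by_locus)
print(f"probability of this two-locus profile: {match_prob:.4f}")
```

In practice the allele frequencies would be estimated from counts like those in Tables 2.2 and 2.3 by dividing each count by the total number of allele copies sampled.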
Question 5. You extract a DNA sample from a crime scene. The
genotype is 100/80 at the D16S539 locus and 70/93 at TH01.
A) You have a suspect in custody. Assuming this suspect is in-
nocent and of European ancestry, what is the probability that their
genotype would match this profile by chance (a false-match probabil-
ity)?
B) The FBI uses ≥ 13 markers. Why is this higher number neces-
sary to make the match statement convincing evidence in court?
C) An early case that triggered debate among forensic geneticists
was a crime among the Abenaki, a Native American community in
Vermont (see Lewontin, 1994, for discussion). There was a DNA
sample from the crime scene, and the perpetrator was thought likely
to be a member of the Abenaki community. Given that allele frequen-
cies vary among populations, why would people be concerned about
using data from a non-Abenaki population to compute a false match
probability?
2.2 Allele sharing among related individuals and Identity by Descent
All of the individuals in a population are related to each other by a
giant pedigree (family tree). For most pairs of individuals in a
ulation these relationships are very distant (e.g. distant cousins),
while some individuals will be more closely related (e.g. sibling/first
cousins). All individuals are related to one another by varying levels
of relatedness, or kinship. Related individuals can share alleles that
have both descended from the shared common ancestor. To be shared,
these alleles must be inherited through all meioses connecting the two
individuals (e.g. surviving the 1/2 probability of segregation each meio-
sis). As closer relatives are separated by fewer meioses, closer relatives
share more alleles. In Figure 2.7 we show the sharing of chromosomal
regions between two cousins. As we’ll see, many population and quan-
titative genetic concepts rely on how closely related individuals are,
and thus we need some way to quantify the degree of kinship among
individuals.
Figure 2.7: First cousins sharing a stretch of chromosome identical by descent. The different grandparental diploid chromosomes are coloured so we can track them and recombinations between them across the generations. Notice that the identity by descent between the cousins persists for a long stretch of chromosome due to the limited number of generations for recombination. The squares represent males and circles females.
We will define two alleles to be identical by descent (IBD) if they
are identical due to transmission from a common ancestor in the past
few generations.3 For the moment, we ignore mutation, and we will
be more precise about what we mean by ‘past few generations’ later
on. For example, parent and child share exactly one allele identical
by descent at a locus, assuming that the two parents of the child are
randomly mated individuals from the population. In Figure 2.13, I
show a pedigree demonstrating some configurations of IBD.
3 Cotterman, C. W., 1940 A calculus for statistico-genetics. Ph. D. thesis, The Ohio State University; and Malécot, G., 1948 Les mathématiques de l’hérédité
One summary of how related two individuals are is the probability
that our pair of individuals share 0, 1, or 2 alleles identical by descent
(see Figure 2.8). We denote these identity-by-descent probabilities
by r0, r1, and r2 respectively. See Table 2.4 for some examples. We
can also interpret these probabilities as genome-wide averages. For
example, on average, at a quarter of all their autosomal loci full-sibs
share zero alleles identical by descent.
Here we’ll focus on IBD of outbred individuals. Dealing with sharing between inbred individuals requires 6 more identity-by-descent r coefficients, which honestly makes my head spin.
One summary of relatedness that will be important is the probability
that two alleles (I & J) picked at random, one from each
of the two different individuals i and j, are identical by descent
(P(I&J IBD)). We call this quantity the coefficient of kinship of
individuals i and j, and denote it by Fij. It is calculated as
\begin{align}
F_{ij} &= P(I \& J \text{ IBD}) \tag{2.3} \\
&= P(I \& J \text{ IBD} \mid i \& j \text{ 0 IBD}) \, P(i \& j \text{ 0 IBD}) \nonumber \\
&\quad + P(I \& J \text{ IBD} \mid i \& j \text{ 1 IBD}) \, P(i \& j \text{ 1 IBD}) \nonumber \\
&\quad + P(I \& J \text{ IBD} \mid i \& j \text{ 2 IBD}) \, P(i \& j \text{ 2 IBD}) \tag{2.4} \\
&= 0 \times r_0 + \frac{1}{4} r_1 + \frac{1}{2} r_2. \tag{2.5}
\end{align}
In the above step, eqn (2.4), we’re summing the conditional
probability of alleles I & J being IBD over whether our individuals i &
j share 0, 1, or 2 alleles IBD, an example of using the Law of Total
Probability (see Appendix eqn (A.12)). We’ve then, in eqn (2.5), used
the fact that we can calculate our conditional probabilities of I & J
being IBD using the rules of Mendelian transmission. Consider the
probability P(I&J IBD | i&j 1 IBD), i.e. that our pair of alleles (I &
J) drawn from individuals i and j are IBD given that i and j share
one allele IBD; this is 1/4 as we need to draw the allele that is IBD
from both i and j, i.e. drawing both black alleles in the middle panel
of Figure 2.8, which happens with probability 1/2 × 1/2. The
coefficient of kinship will appear multiple times, in both our discussion
of inbreeding and in the context of phenotypic resemblance between
relatives.
Figure 2.8: A pair of diploid individuals (i and j) sharing 0, 1, or 2 alleles IBD where lines show the sharing of alleles by descent (e.g. from a shared ancestor).
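Eqn (2.5) is simple enough to check numerically. The sketch below evaluates it with exact fractions for the relationships listed in Table 2.4; the function name and dictionary layout are just illustrative choices.

```python
from fractions import Fraction as F


def kinship(r0, r1, r2):
    """Coefficient of kinship F_ij from the IBD sharing probabilities (eqn 2.5)."""
    if r0 + r1 + r2 != 1:
        raise ValueError("r0, r1 and r2 must sum to one")
    # Given 1 allele shared IBD we must pick it in both individuals (1/2 x 1/2);
    # given 0 alleles shared IBD the two sampled alleles are never IBD.
    return 0 * r0 + F(1, 4) * r1 + F(1, 2) * r2


# (r0, r1, r2) for the outbred relationships in Table 2.4.
relationships = {
    "parent-child": (F(0), F(1), F(0)),
    "full siblings": (F(1, 4), F(1, 2), F(1, 4)),
    "monozygotic twins": (F(0), F(0), F(1)),
    "1st cousins": (F(3, 4), F(1, 4), F(0)),
}
for name, r in relationships.items():
    print(f"{name}: F_ij = {kinship(*r)}")
```

Using exact fractions rather than floats keeps the output in the same form as the table entries.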
Question 6. What are r0 , r1 , and r2 for 1/2 sibs? (1/2 sibs share
one parent but not the other).
Question 7. Explain in words why P (I&J IBD| i&j 2 IBD) = 1/2.

Relationship (i,j)*    P(i&j 0 IBD)   P(i&j 1 IBD)   P(i&j 2 IBD)   P(I&J IBD)
                       r0             r1             r2             Fij
parent–child           0              1              0              1/4
full siblings          1/4            1/2            1/4            1/4
Monozygotic twins      0              0              1              1/2
1st cousins            3/4            1/4            0              1/16
Table 2.4: Probability that two individuals of a given relationship share 0, 1, or 2 alleles identical by descent on the autosomes. *Assuming that our individuals are outbred and that this is the only close relationship the pair shares.
Genotypic sharing between pairs of individuals. Our r coefficients are
going to have various uses. For example, they allow us to calculate the
probability of the genotypes of a pair of relatives. Consider a biallelic
locus where allele A1 is at frequency p, and two individuals who have
IBD allele sharing probabilities r0 , r1 , r2 . What is the overall prob-
ability that these two individuals are both homozygous for allele 1?
Well that’s
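$$
\begin{aligned}
P(A_1A_1) ={}& P(A_1A_1 \mid \text{0 alleles IBD})\,P(\text{0 alleles IBD}) \\
&+ P(A_1A_1 \mid \text{1 allele IBD})\,P(\text{1 allele IBD}) \\
&+ P(A_1A_1 \mid \text{2 alleles IBD})\,P(\text{2 alleles IBD})
\end{aligned}
\tag{2.6}
$$
Or, in our r0 , r1 , r2 notation:
$$
\begin{aligned}
P(A_1A_1) ={}& P(A_1A_1 \mid \text{0 alleles IBD})\,r_0 \\
&+ P(A_1A_1 \mid \text{1 allele IBD})\,r_1 \\
&+ P(A_1A_1 \mid \text{2 alleles IBD})\,r_2
\end{aligned}
\tag{2.7}
$$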
If our pair of relatives share 0 alleles IBD, then the probability that
they are both homozygous is $P(A_1A_1 \mid \text{0 alleles IBD}) = p^2 \times p^2$,
as all four alleles represent independent draws from the population. If
they share 1 allele IBD, then the shared allele is of type A1 with
probability p, and then the other non-IBD allele, in both relatives, also
needs to be A1, which happens with probability $p^2$, so
$P(A_1A_1 \mid \text{1 allele IBD}) = p \times p^2$. Finally, our pair of
relatives can share two alleles IBD, in which case
$P(A_1A_1 \mid \text{2 alleles IBD}) = p^2$, because if one of our individuals is
homozygous for the A1 allele, both individuals will be. Putting this all
together our equation (2.7) becomes
$$P(A_1A_1) = p^4 r_0 + p^3 r_1 + p^2 r_2 \tag{2.8}$$
Note that for specific cases we could also calculate this by summing
over all the possible genotypes their shared ancestor(s) had; however,
that would be much more involved and not as general as the form we
have derived here.
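To make eq. (2.8) concrete, here is a minimal Python sketch that evaluates it for two of the relationships in Table 2.4. The function name and the allele frequency p = 1/10 are illustrative assumptions, not values from the text.

```python
from fractions import Fraction

def prob_both_homozygous_a1(p, r0, r1, r2):
    """Eq. (2.8): probability that both relatives are A1A1,
    given allele frequency p and IBD sharing probabilities r0, r1, r2."""
    return p**4 * r0 + p**3 * r1 + p**2 * r2

# (r0, r1, r2) for full siblings, taken from Table 2.4.
FULL_SIBS = (Fraction(1, 4), Fraction(1, 2), Fraction(1, 4))
# Unrelated individuals never share alleles IBD.
UNRELATED = (Fraction(1), Fraction(0), Fraction(0))

p = Fraction(1, 10)  # hypothetical frequency of allele A1
sibs = prob_both_homozygous_a1(p, *FULL_SIBS)
unrelated = prob_both_homozygous_a1(p, *UNRELATED)

print(float(sibs), float(unrelated))
```

For a rare allele the full-sib probability is dominated by the r2 term, so it is much larger than the unrelated value of p^4.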
We can write out terms like eq (2.8) for all of the possible configu-
rations of genotype sharing/non-sharing between a pair of individuals.
Based on this we can write down the expected number of polymorphic
sites where our individuals are observed to share 0, 1, or 2 alleles.
Question 8. Trickier question. The genotype of our suspect in

Question 5 turns out to be 100/80 for D16S539 and 70/80 at TH01.
The suspect is not a match to the DNA from the crime scene; how-
ever, they could be a sibling.
Calculate the joint probability of observing the genotype from the
crime and our suspect:
A) Assuming that they share no close relationship.
B) Assuming that they are full sibs.
C) Briefly explain your findings.
There’s a variety of ways to estimate the relationships among
individuals using genetic data. An example of using allele sharing to
identify relatives is offered by the work of Nancy Chen (in collaboration
with Stepfanie Aguillon, see Chen et al., 2016; Aguillon et al., 2017).
Chen et al. have collected genotyping data from thousands of Florida
Scrub Jays at over ten thousand loci. These Jays live at the Archbold
field site, and have been carefully monitored for many decades, allowing
the pedigree of many of the birds to be known. Using these data she
estimates allele frequencies at each locus. Then by equating the
observed number of times that a pair of individuals share 0, 1, or 2
alleles to the theoretical expectation, she estimates r0 , r1 , and r2
for each pair of birds. A plot of these estimates is shown in
Figure 2.10, showing how well the estimates match those known from the
pedigree.

Figure 2.9: Florida Scrub-Jays (Aphelocoma coerulescens). The birds of
America : from drawings made in the United States and their territories.
1880. Audubon J.J. Image from the Biodiversity Heritage Library.
Contributed by Smithsonian Libraries. Licensed under CC BY-2.0.
Sharing of genomic blocks among relatives. We can more directly see

the sharing of the genome among close relatives using high-density
SNP genotyping arrays. In Figure 2.11 we show a simulation of you
and your first cousin’s genomic material that you both inherited from
your shared grandmother. Colored purple are regions where you and
your cousin will have matching genomic material, due to having inher-
ited it IBD from your shared grandmother.
You and your first cousin will share at least one allele of your geno-
type at all of the polymorphic loci in these purple regions. There’s a
range of methods to detect such sharing. One way is to look for un-
usually long stretches of the genome where two individuals are never
homozygous for different alleles. By identifying pairs of individuals
who share an unusually large number of such putative IBD blocks, we
can hope to identify unknown relatives in genotyping datasets. In fact,
companies like 23andMe and Ancestry.com use signals of IBD to help
identify family ties.

Figure 2.10: Estimated coefficient of kinship from Florida Scrub Jays.
Each point is a pair of individuals, plotted by their estimated IBD (r1
and r2 ) from their genetic data. The points are coloured by their known
pedigree relationships. Note that most pairs have low kinship, and no
recent genealogical relationship, and so appear as black points in the
lower left corner. Thanks to Nancy Chen for supplying the data. Code
here. [Plot: x-axis, Estimated IBD r1; y-axis, Estimated IBD r2; legend:
Parent−Offspring, Full Sib, Grandparent, 1/2 siblings, Aunt/Uncle,
GreatAunt/Uncle.]

Figure 2.11: A simulation of sharing between first cousins. The regions
of your grandmother’s 22 autosomes that you inherited are coloured red,
those that your cousins inherited are coloured blue. In the third panel
we show the overlapping genomic regions in purple, these regions will be
IBD in you and your cousin. If you are full first cousins, you will also
have shared genomic regions from your shared grandfather, not shown
here. Details about how we made these simulations here. [Three panels,
each showing chromosomes 1 to 22: "Your genome in your Grandmother",
"Your 1st cousin's genome in your Grandmother", "Both your genomes in
your Grandmother".]

As another example, consider the case of third cousins. You share
one of eight sets of great-great grandparents with each of your (likely
many) third cousins. On average, you and each of your third cousins
each inherit one-sixteenth of your genome from each of those two
great-great grandparents. This turns out to imply that on average, a
little less than one percent of your and your third cousin’s genomes
(2 × (1/16)^2 ≈ 0.78%) will be identical by virtue of descent from
those shared ancestors. A simulated example where third cousins share
blocks of their genome (on chromosomes 16 and 2) due to their
great-great grandmother is shown in Figure 2.12.
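The third-cousin arithmetic can be checked with a few lines of Python. Extending it to k-th cousins is an assumption made here for illustration: it supposes full cousins who share exactly one pair of ancestors k + 1 generations back and no other relationship. The function name is hypothetical.

```python
from fractions import Fraction

def expected_ibd_fraction(k):
    """Expected fraction of the genome shared IBD by full k-th cousins.

    Each cousin inherits (1/2)**(k + 1) of their genome from each of the
    two shared ancestors, which gives 2 * ((1/2)**(k + 1))**2.
    """
    share_per_ancestor = Fraction(1, 2) ** (k + 1)
    return 2 * share_per_ancestor**2

third_cousins = expected_ibd_fraction(3)
print(third_cousins, f"{float(third_cousins):.2%}")
```

Under the same assumptions, each additional degree of cousinship divides the expected fraction by four.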
Figure 2.12: A simulation of sharing

between third cousins, the details are
the same as in Figure 2.11.
Note how if you compare Figure 2.12 and Figure 2.11, individuals
inherit less IBD from a shared great-great grandmother than from a
shared grandmother, as they inherit from more total ancestors further
back. Also notice how the sharing occurs in shorter genomic blocks,
as it has passed through more generations of recombination during
meiosis. These blocks are still detectable, and so third cousins can be
detected using high-density genotyping chips, allowing more distant
relatives to be identified than single marker methods alone.⁴ More
distant relations than third cousins, e.g. fourth cousins, start to have
a significant probability of sharing none of their genome IBD. But you
have many fourth cousins, so you will share some of your genome IBD
with some of them; however, it gets increasingly hard to identify the
degree of relatedness from genetic data the deeper in the family tree
this sharing goes.

⁴ Indeed the suspect in the case of the Golden State Killer was
identified through identifying third cousins that genetically matched a
DNA sample from an old crime scene (see here for more details).
2.2.1 Inbreeding
We can define an inbred individual as an individual whose parents are
more closely related to each other than two random individuals drawn
from some reference population.
When two related individuals produce an offspring, that individual
can receive two alleles that are identical by descent, i.e. they can
be homozygous by descent (sometimes termed autozygous), due to
the fact that they have two copies of an allele through different paths
through the pedigree. This increased likelihood of being homozygous
relative to an outbred individual is the most obvious effect of
inbreeding. It is also the one that will be of most interest to us, as
it underlies a lot of our ideas about inbreeding depression and
population structure. For example, in Figure 2.13 our offspring of first
cousins is homozygous by descent having received the same IBD allele
via two different routes around an inbreeding loop.

Figure 2.13: Alleles being transmitted through an inbred pedigree. The
two sisters (mum and aunt) share two alleles identical by descent (IBD).
The cousins share one allele IBD. The offspring of first cousins is
homozygous by descent at this locus. [Pedigree labels: M. Granddad,
M. Grandmother; Dad, Uncle, Aunt, Mum; Cousin, Cousin; Child of 1st
cousins; Inbreeding loop.]

As the offspring receives a random allele from each parent (i and
j), the probability that those two alleles are identical by descent is
equal to the kinship coefficient Fij of the two parents (Eqn. 2.5). This
follows from the fact that the genotype of the offspring is made by
sampling an allele at random from each of our parents.
f11                  f12             f22
(1 − F)p^2 + Fp      (1 − F)2pq      (1 − F)q^2 + Fq

Table 2.5: Generalized Hardy–Weinberg
The only way the offspring can be heterozygous (A1 A2 ) is if their
two alleles at a locus are not IBD (otherwise they would necessarily be
homozygous). Therefore, the probability that they are heterozygous is
$$(1 - F)\,2pq, \tag{2.9}$$
where we have dropped the indices i and j for simplicity. The offspring
can be homozygous for the A1 allele in two different ways.
They can have two alleles that are not IBD but happen to be
of the allelic type A1 , or their two alleles can be IBD, such that they
inherited allele A1 by two different routes from the same ancestor.
Thus, the probability that an offspring is homozygous for A1 is
$$(1 - F)\,p^2 + Fp. \tag{2.10}$$
Therefore, the frequencies of the three possible genotypes can be

written as given in Table 2.5, which provides a generalization of the
Hardy–Weinberg proportions.
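The generalized Hardy–Weinberg proportions of Table 2.5 can be written as a short function. This is a minimal sketch; the function name and the example values (p = 1/4, F = 1/16, the inbreeding coefficient of a child of first cousins) are chosen here only for illustration.

```python
from fractions import Fraction

def generalized_hw(p, F):
    """Genotype frequencies (f11, f12, f22) at a biallelic locus for
    allele frequency p and inbreeding coefficient F (Table 2.5)."""
    q = 1 - p
    f11 = (1 - F) * p**2 + F * p
    f12 = (1 - F) * 2 * p * q
    f22 = (1 - F) * q**2 + F * q
    return f11, f12, f22

inbred = generalized_hw(Fraction(1, 4), Fraction(1, 16))
outbred = generalized_hw(Fraction(1, 4), 0)

print("inbred: ", inbred)
print("outbred:", outbred)
```

Setting F = 0 recovers the ordinary Hardy–Weinberg proportions, and for any F the three frequencies sum to one.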
Question 9. The frequency of the A1 allele is p at a biallelic

locus. Assume that our population is randomly mating and that the
genotype frequencies in the population follow from HW. We select two

individuals at random to mate from this population. We then mate
the children from this cross. What is the probability that the child
from this full sib-mating is homozygous?
Multiple inbreeding loops in a pedigree. Up to this point we have as-

sumed that there is at most one inbreeding loop in the recent family
history of our individuals, i.e. the parents of our inbred individual
have at most one recent genealogical connection. However, an indi-
vidual who has multiple inbreeding loops in their pedigree can be
homozygous by descent thanks to receiving IBD alleles via multiple
different loops. To calculate inbreeding in pedigrees of arbitrary
complexity, we can extend beyond our original relatedness
coefficients r0 , r1 , and r2 to account for higher order sharing of alleles
IBD among relatives. For example, we can ask, what is the probability
that both of the alleles in the first individual are shared IBD with one
allele in the second individual? There are nine possible relatedness
coefficients in total to completely describe kinship between two diploid
individuals, and we won’t go into them here as it’s a lot to keep track
of. However, we will show how we can calculate the inbreeding
coefficient of an individual with multiple inbreeding loops more directly.
Let’s say the parents of our inbred individual (B and C) have K
shared ancestors, i.e. individuals who appear in both B and C’s recent
family trees. We denote these shared ancestors by $A_1, \ldots, A_K$, and
we denote by $n_i$ the total number of individuals in the chain from B
to C via ancestor $A_i$, including B, C, and $A_i$. For example, if B is C’s
aunt, then B and C share two ancestors, which are B’s parents and,
equivalently, C’s grandparents. In this case, there are $n_i = 4$ individuals
from B to C through each of these two shared ancestors. In the general
case, the kinship coefficient of B and C, i.e. the inbreeding coefficient
of their child, is
$$F = \sum_{i=1}^{K} \frac{1}{2^{n_i}} \left(1 + f_{A_i}\right) \tag{2.11}$$
where $f_{A_i}$ is the inbreeding coefficient of the ancestor $A_i$. What’s
happening here is that we sum over all the mutually-exclusive paths in
the pedigree through which B and C can share an allele IBD. With
probability $1/2^{n_i}$, a pair of alleles picked at random from B and C is
descended from the same ancestral allele in individual $A_i$, in which
case the alleles are IBD.⁵ However, even if B inherits the maternal
allele and C inherits the paternal allele of shared ancestor $A_i$, if $A_i$
was themselves inbred, with probability $f_{A_i}$ those two alleles are
themselves IBD. Thus a shared inbred ancestor further increases the
kinship of B and C.

⁵ For example, in our aunt-nephew case, assuming that the aunt’s two
parents are their only recent shared ancestors, $F = 1/2^4 + 1/2^4 = 1/8$,
in agreement with the answer we would obtain from eqn (2.5).
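Eq. (2.11) translates directly into code. In the sketch below, each shared ancestor is represented by a hypothetical pair (n_i, f_Ai); that input format and the function name are assumptions made for this example. The second call, with one of the two shared ancestors given an inbreeding coefficient of 1/8, is an invented case that shows the effect of the (1 + f_Ai) term.

```python
from fractions import Fraction

def kinship_from_paths(paths):
    """Eq. (2.11): sum over shared ancestors of (1/2)**n_i * (1 + f_Ai).

    paths: iterable of (n_i, f_Ai) pairs, where n_i counts the individuals
    in the chain from B to C through ancestor A_i (including B, C and A_i)
    and f_Ai is that ancestor's own inbreeding coefficient.
    """
    return sum(Fraction(1, 2) ** n * (1 + Fraction(f)) for n, f in paths)

# Aunt and nephew: two shared ancestors, each with n_i = 4 and outbred.
aunt_nephew = kinship_from_paths([(4, 0), (4, 0)])

# Invented variant: one of the two shared ancestors is inbred (f = 1/8).
with_inbred_ancestor = kinship_from_paths([(4, Fraction(1, 8)), (4, 0)])

print(aunt_nephew, with_inbred_ancestor)
```

The first value matches the 1/8 quoted for the aunt-nephew case.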
Figure 2.14: The pedigree of King

Charles II of Spain. Pedigree from
wikimedia drawn by Lec CRP1, public
domain.
Multiple inbreeding loops increase the probability that a child is

homozygous by descent at a locus, which can be calculated simply by
plugging in F , the child’s inbreeding coefficient, into our generalized
HW equation.
As one extreme example of the impact of multiple inbreeding loops
in an individual’s pedigree, let’s consider King Charles II of Spain, the
last of the Spanish Habsburgs. Charles was the son of Philip IV of
Spain and Mariana of Austria, who were uncle and niece. If this were
the only inbreeding loop, then Charles would have had an inbreeding
coefficient of 1/8. Unfortunately for Charles, the Spanish Habsburgs
had long kept wealth and power within their family by arranging
marriages between close kin. The pedigree of Charles II is shown in
Figure 2.14, and multiple inbreeding loops are apparent. For example,
Philip III, Charles II’s grandfather and great-grandfather, was himself
a child of an uncle-niece marriage.

Figure 2.15: Charles II of Spain (by Juan Carreño de Miranda, 1685).
Public Domain.

Alvarez et al. (2009) calculated that Charles II had an inbreeding
coefficient of 0.254, equivalent to a full-sib mating, thanks to all of
the inbreeding loops in his pedigree. Therefore, he is expected to have
been homozygous by descent for a full quarter of his genome. As we’ll
talk about later in these notes, this means that Charles may have been
homozygous for a number of recessive disease alleles, and indeed he
was a very sickly man who left no descendants due to his infertility.⁶
Thus plausibly the end of one of the great European dynasties came
about through inbreeding.

⁶ Pedro Gargantilla, who performed Charles’s autopsy, stated that his
body “did not contain a single drop of blood; his heart was the size of a
peppercorn; his lungs corroded; his intestines rotten and gangrenous; he
had a single testicle, black as coal, and his head was full of water.”
While some of this description may refer to actual medical conditions,
some of these details seem a little unlikely. See here.
2.2.2 Calculating inbreeding coefficients from genetic data

If the observed heterozygosity in a population is HO , and we assume
that the generalized Hardy–Weinberg proportions hold, we can set HO
equal to f12 , and solve Eq. (2.9) for F to obtain an estimate of the
inbreeding coefficient as
$$\hat{F} = 1 - \frac{f_{12}}{2pq} = \frac{2pq - f_{12}}{2pq}. \tag{2.12}$$
As before, p is the frequency of allele A1 in the population. This
can be rewritten in terms of the observed heterozygosity (HO ) and the
heterozygosity expected in the absence of inbreeding, HE = 2pq, as
$$\hat{F} = \frac{H_E - H_O}{H_E} = 1 - \frac{H_O}{H_E}. \tag{2.13}$$
Hence, F̂ quantifies the deviation due to inbreeding of the observed

heterozygosity from the one expected under random mating, relative
to the latter.
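As a minimal sketch of eq. (2.13) at a single locus, the following function takes observed genotype frequencies and returns the estimate of F. The function name and the genotype frequencies (0.5, 0.3, 0.2) are invented for illustration.

```python
from fractions import Fraction

def inbreeding_estimate(f11, f12, f22):
    """Estimate F at one biallelic locus from genotype frequencies,
    following eq. (2.13): F_hat = 1 - H_O / H_E."""
    p = f11 + f12 / 2      # frequency of allele A1
    q = 1 - p
    h_exp = 2 * p * q      # H_E, heterozygosity expected under random mating
    h_obs = f12            # H_O, observed heterozygosity
    # Note: a monomorphic locus (h_exp == 0) would raise ZeroDivisionError.
    return 1 - h_obs / h_exp

f_hat = inbreeding_estimate(Fraction(1, 2), Fraction(3, 10), Fraction(1, 5))
print(f_hat, round(float(f_hat), 3))
```

A deficit of heterozygotes relative to 2pq gives a positive estimate; an excess gives a negative one.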
Question 10. Suppose the following genotype frequencies were

observed for an esterase locus in a population of Drosophila (A de-
notes the “fast” allele and B denotes the “slow” allele):
AA AB BB
0.6 0.2 0.2
What is the estimate of the inbreeding coefficient at the esterase lo-

cus?
If we have multiple loci, we can replace HO and HE by their means

over loci, H̄O and H̄E , respectively. Note that, in principle, we could
also calculate F for each individual locus first, and then take the aver-
age across loci. However, this procedure is more prone to introducing
a bias if sample sizes vary across loci, which is not unlikely when we
are dealing with real data.
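The two multi-locus procedures described above can be compared directly. The sketch below uses three hypothetical loci; the per-locus heterozygosities and the function names are invented for illustration.

```python
from fractions import Fraction

def f_hat_ratio_of_means(h_obs, h_exp):
    """Average H_O and H_E over loci first, then apply eq. (2.13)."""
    mean_obs = sum(h_obs) / len(h_obs)
    mean_exp = sum(h_exp) / len(h_exp)
    return 1 - mean_obs / mean_exp

def f_hat_mean_of_ratios(h_obs, h_exp):
    """Apply eq. (2.13) at each locus, then average the per-locus estimates."""
    per_locus = [1 - ho / he for ho, he in zip(h_obs, h_exp)]
    return sum(per_locus) / len(per_locus)

# Hypothetical observed and expected heterozygosities at three loci.
h_obs = [Fraction(1, 10), Fraction(3, 10), Fraction(1, 5)]
h_exp = [Fraction(1, 5), Fraction(2, 5), Fraction(1, 2)]

ratio_of_means = f_hat_ratio_of_means(h_obs, h_exp)
mean_of_ratios = f_hat_mean_of_ratios(h_obs, h_exp)
print(ratio_of_means, mean_of_ratios)
```

The two estimates generally differ, because the per-locus version gives every locus equal weight regardless of how small its expected heterozygosity is.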
Genetic markers are commonly used to estimate inbreeding for wild
and/or captive populations of conservation concern. As an example
of this, consider the case of the Mexican wolf (Canis lupus baileyi), a
sub-species of gray wolf.
They were extirpated in the wild during the mid-1900s due to hunting,
and the remaining five Mexican wolves in the wild were captured
to start a breeding program. vonHoldt et al. (2011) estimated the
current-day, average expected heterozygosity to be 0.18, based on
allele frequencies at over forty thousand SNPs. However, the average
Mexican wolf individual was only observed to be heterozygous at 12%
of these SNPs. Therefore, the average inbreeding coefficient for the
Mexican wolf is F = 1 − 0.12/0.18, i.e. ∼ 33% of a lobo’s genome is
homozygous due to recent inbreeding in their pedigree.

Figure 2.16: Grey wolf (Canis lupus). Dogs, jackals, wolves, and foxes:
a monograph of the Canidae. 1890. By J.G. Keulemans. Image from the
Biodiversity Heritage Library. Contributed by University of Toronto -
Gerstein Science Information Centre. Not in copyright.

Genomic blocks of homozygosity due to inbreeding. As we saw above,
close relatives are expected to share alleles IBD in large genomic
blocks. Thus, when related individuals mate and transmit alleles to
an inbred offspring, they transmit these alleles in big blocks through
meiosis. As an example, let’s return to the case of our hypothetical
first cousins from Figure 2.7. If this pair of individuals had a child, one
possible pattern of genetic transmission is shown in Figure 2.17. The
child has inherited the red stretch of chromosome via two different
routes through their pedigree from the grandparents. This is an
example of an autozygous segment, where the child is homozygous by
descent at all of the loci in this red region. The inbreeding coefficient
of the child sets the proportion of their genome that will be in these
autozygous segments. For example, a child of full first cousins is
expected to have 1/16 of their genome in these segments. The more
distant the loop in the pedigree, the more meioses that chromosomes
have been through and the shorter individual blocks will be. A child of
first cousins will have longer blocks than a child of second cousins, for
example.

Figure 2.17: A pedigree showing the offspring of first cousins. The
chromosomes of their great-grandparents are coloured different colours
so their transmission can be tracked. The child is homozygous by descent
(HBD) for a section of the red chromosome.

Individuals with multiple inbreeding loops in their family tree can
have a high inbreeding coefficient due to the combined effect of many
small blocks of autozygosity. For example, Charles II had an inbreeding
coefficient that is equivalent to that of the child of full-sibs, with
a quarter of his genome expected to be homozygous by descent, but this
would be made up of many shorter blocks.
We can hope to detect these blocks by looking for unusually long
genomic runs of homozygous sites, termed runs of homozygosity (ROH),
in an individual’s genome. One way to estimate an individual’s
inbreeding coefficient is then to total up the proportion of an
individual’s genome that falls in such ROH regions. This estimate is
called $F_{ROH}$.

An example of using $F_{ROH}$ to study inbreeding comes from the
work of Sams and Boyko (2018b), who identified runs of homozygosity
in 2,500 dogs, ranging from 500 kb up to many megabases.
Figure 2.18 shows the distribution of $F_{ROH}$ of individuals in each dog
breed for the X chromosome and the autosomes. In Figure 2.20 this is
broken down by the length of ROH segments.

Figure 2.18: The distribution of $F_{ROH}$ of individuals from various
dog breeds from Sams and Boyko (2018a), licensed under CC BY 4.0.

Dog breeds have been subject to intense breeding that has resulted
in high levels of inbreeding. Of the population samples examined,
Doberman Pinschers have the highest levels of their genome in runs
of homozygosity ($F_{ROH}$), somewhat higher than English bulldogs.
In Figure 2.20 we can see that English bulldogs have more short ROH than
Doberman Pinschers, but that Doberman Pinschers have more of their
genome in very large ROH (> 16 Mb). This suggests that English bulldogs
have had a long history of inbreeding but that Doberman Pinschers
have a lot of recent inbreeding in their history.

Figure 2.19: English bulldog. The dogs of Boytown. 1918. Dyer, W. A.
Figure 2.20: Cumulative density of

ROH length, measured in megabases
(Mb) from Sams and Boyko
(2018a) for various dog breeds (li-
censed under CC BY 4.0). Note that
longer lengths of ROH are on the left
of the plot.
3 Population Structure and Correlations Among Loci
Individuals rarely mate completely at random; your

parents weren’t two Bilateria plucked at random from the tree of life.
Even within species, there’s often geographically-restricted mating
among individuals. Individuals tend to mate with individuals from the
same, or closely related sets of populations. This form of non-random
mating is called population structure and can have profound effects
on the distribution of genetic variation within and among natural
populations.
Populations can often differ in their allele frequencies, either due to
genetic drift or selection driving differentiation among populations. In
this chapter we’ll talk through some ways to summarize and visualize
population genetic structure. Population differentiation is also a major
driver of correlations in allelic state among loci, and we’ll start our
discussion of these correlations at the end of this chapter. One rea-
son for talking about population structure so early in the book is that
summarizing population structure is often a key initial stage in
population genomic analyses. Thus you’ll often encounter summaries and
visualizations of population structure when you read research papers,
so it’s good to have some understanding of what they represent.
3.0.1 Inbreeding as a summary of population structure.

Our statements about inbreeding, and inbreeding coefficients, repre-
sent one natural way to summarize population structure. In the pre-
vious chapter, we defined inbreeding as having parents that are more
closely related to each other than two individuals drawn at random
from some reference population. The question that naturally arises is:
Which reference population should we use? While I might not look in-
bred in comparison to allele frequencies in the United Kingdom (UK),
where I am from, my parents certainly are not two individuals drawn
at random from the world-wide population. If we estimated my in-
breeding coefficient F using allele frequencies within the UK, it would

be close to zero, but would likely be larger if we used world-wide fre-
quencies. This is because there is a somewhat lower level of expected
heterozygosity within the UK than in the human population across the
world as a whole.
Wright¹ developed a set of ‘F-statistics’ (also called ‘fixation
indices’) that formalize the idea of inbreeding with respect to different
levels of population structure. See Figure 3.1 for a schematic diagram.

¹ Wright, S., 1943 Isolation by Distance. Genetics 28(2): 114–138; and
Wright, S., 1949 The Genetical Structure of Populations. Annals of
Eugenics 15(1): 323–354.

Figure 3.1: The hierarchical nature of F-statistics. The two dots within
an individual represent the two alleles at a locus for an individual I.
We can compare the heterozygosity in individuals ($H_I$), to that found by
randomly drawing alleles from the sub-population (S), to that found in
the total population (T). [Diagram labels: Individual, $H_I$; Sub-pop,
$H_S$; Total pop, $H_T$.]

Wright defined $F_{XY}$ as the correlation between random gametes,
drawn from the same level X, relative to level Y . We will return to
why F -statistics are statements about correlations between alleles in
just a moment. One commonly used F -statistic is $F_{IS}$, which is the
inbreeding coefficient between an individual (I) and the subpopulation
(S). Consider a single locus, where in a subpopulation (S) a fraction
$H_I = f_{12}$ of individuals are heterozygous. In this subpopulation, let
the frequency of allele A1 be $p_S$, such that the expected heterozygosity
under random mating is $H_S = 2p_S(1 - p_S)$. We will write $F_{IS}$ as
$$F_{IS} = 1 - \frac{H_I}{H_S} = 1 - \frac{f_{12}}{2p_Sq_S}, \tag{3.1}$$
a direct analog of eqn. 2.12. Hence, $F_{IS}$ is the relative difference
between observed and expected heterozygosity due to a deviation from
random mating within the subpopulation. We could also compare the
observed heterozygosity in individuals ($H_I$) to that expected in the
total population, $H_T$. If the frequency of allele A1 in the total
population is $p_T$, then we can write $F_{IT}$ as
$$F_{IT} = 1 - \frac{H_I}{H_T} = 1 - \frac{f_{12}}{2p_Tq_T}, \tag{3.2}$$
which compares heterozygosity in individuals to that expected in the
total population. As a simple extension of this, we could imagine
comparing the expected heterozygosity in the subpopulation ($H_S$) to
that expected in the total population $H_T$, via $F_{ST}$:
$$F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{2p_Sq_S}{2p_Tq_T}. \tag{3.3}$$
We can relate the three F -statistics to each other as
$$(1 - F_{IT}) = \frac{H_I}{H_S}\,\frac{H_S}{H_T} = (1 - F_{IS})(1 - F_{ST}). \tag{3.4}$$
Hence, the reduction in heterozygosity within individuals compared to

that expected in the total population can be decomposed to the reduc-
tion in heterozygosity of individuals compared to the subpopulation,
and the reduction in heterozygosity from the total population to that
in the subpopulation.
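The decomposition in eq. (3.4) can be checked numerically. The heterozygosity values below are hypothetical, chosen only so that the arithmetic is easy to follow, and the function name is an assumption of this sketch.

```python
from fractions import Fraction

def f_statistics(h_i, h_s, h_t):
    """Return (F_IS, F_ST, F_IT) from the three heterozygosities,
    following eqs. (3.1)-(3.3)."""
    f_is = 1 - h_i / h_s
    f_st = 1 - h_s / h_t
    f_it = 1 - h_i / h_t
    return f_is, f_st, f_it

# Hypothetical values: H_I = 0.3, H_S = 0.4, H_T = 0.5.
f_is, f_st, f_it = f_statistics(Fraction(3, 10), Fraction(2, 5), Fraction(1, 2))

print(f_is, f_st, f_it)
# Eq. (3.4): both sides should agree.
print(1 - f_it, (1 - f_is) * (1 - f_st))
```

With these numbers a 25% reduction within the subpopulation and a 20% reduction between subpopulation and total combine to a 40% total reduction, not 45%, because the reductions multiply rather than add.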
If we want a summary of population structure across multiple
subpopulations, we can average $H_I$ and/or $H_S$ across populations, and
use a $p_T$ calculated by averaging $p_S$ across subpopulations (or our
samples from sub-populations). For example, the average $F_{ST}$ across
K subpopulations (sampled with equal effort) is
$$F_{ST} = 1 - \frac{\bar{H}_S}{H_T}, \tag{3.5}$$
where $\bar{H}_S = \frac{1}{K}\sum_{i=1}^{K} H_S^{(i)}$, and $H_S^{(i)} = 2p_iq_i$ is the
expected heterozygosity in subpopulation i. It follows that the average
heterozygosity of the sub-populations $\bar{H}_S \le H_T$, and so $F_{ST} \ge 0$ and
$F_{IS} \le F_{IT}$. This observation that the average heterozygosity of the
sub-populations must be less than or equal to that of the total
population is called the Wahlund effect. Furthermore, if we have
multiple sites, we can replace $H_I$, $H_S$, and $H_T$ with their averages
across loci (as above).²

² Averaging heterozygosity across loci first, then calculating $F_{ST}$,
rather than calculating $F_{ST}$ for each locus individually and then taking
the average, has better statistical properties as statistical noise in
the denominator is averaged out.

As an example of comparing a genome-wide estimate of $F_{ST}$ to that
at individual loci we can look at some data from blue- and golden-winged
warblers (Vermivora cyanoptera and V. chrysoptera, 1-2 & 5-6
in Figure 3.2).
These two species are spread across eastern North America, with
the golden-winged warbler having a smaller, more northerly range.
They’re quite different in terms of plumage, but have long been known
to have similar songs and ecologies. The two species hybridize readily
in the wild; in fact two other previously-recognized species, Brewster’s
and Lawrence’s warbler (4 & 3 in Figure 3.2), are actually found to just
be hybrids between these two species. The golden-winged warbler is
listed as ‘threatened’ under the Canadian endangered species act as its
habitat is under pressure from human activity and due to increasing
hybridization with the blue-winged warbler, which is moving north
into its range. Toews et al. (2016) investigated the population
genomics of these warblers, sequencing ten golden- and ten blue-winged
warblers. They found very low divergence among these species, with
a genome-wide $F_{ST} = 0.0045$. In Figure 3.3, per-SNP $F_{ST}$ is averaged
in 2000 bp windows moving along the genome. The average is
very low, but some regions of very high $F_{ST}$ stand out. Nearly all of
these regions correspond to large allele frequency differences at loci
in, or close to, genes known to be involved in plumage colouration
differences in other birds. To illustrate these frequency differences
Toews et al. (2016) genotyped a SNP in each of these high-$F_{ST}$
regions. Here are their genotype counts from the SNP, segregating for
an allele 1 and 2, in the Wnt region, a key regulatory gene involved in
feather development:

Species          11   12   22
Blue-winged       2   21   31
Golden-winged    48   12    1

Figure 3.2: Blue-, golden-winged, and Lawrence’s warblers (Vermivora).
The warblers of North America. Chapman, F.M. 1907. Image from the
Biodiversity Heritage Library. Contributed by American Museum of
Natural History Library. Not in copyright.

Figure 3.3: $F_{ST}$ between blue- and golden-winged warbler population
samples at SNPs across the genome. Each dot is a SNP, and SNPs are
coloured alternating by scaffold. Thanks to David Toews for the figure.

Question 1. With reference to the table of Wnt-allele counts:

A) Calculate FIS in blue-winged warblers.
B) Calculate FST for the sub-population of blue-winged warblers
compared to the combined sample.
C) Calculate mean FST across both sub-populations.
Interpretations of F-statistics. Let us now return to Wright’s
definition of the F -statistics as correlations between random gametes,
drawn from the same level X, relative to level Y . Without loss of
generality, we may think about X as individuals and Y as the
subpopulation. Rewriting $F_{IS}$ in terms of the observed homozygote
frequencies ($f_{11}$, $f_{22}$) and expected homozygosities ($p_S^2$, $q_S^2$) we find
$$F_{IS} = \frac{2p_Sq_S - f_{12}}{2p_Sq_S} = \frac{f_{11} + f_{22} - p_S^2 - q_S^2}{2p_Sq_S}, \tag{3.6}$$
using the fact that $p^2 + 2pq + q^2 = 1$, and $f_{12} = 1 - f_{11} - f_{22}$.
The form of eqn. (3.6) reveals that $F_{IS}$ is the covariance between pairs
of alleles found in an individual, divided by the expected variance
under binomial sampling. Thus, F -statistics can be understood as the
correlation between alleles drawn from a population (or an individual)
above that expected by chance (i.e. drawing alleles sampled at random
from some broader population).³

³ To see why the numerator of eqn (3.6) is the covariance of a discrete
random variable see Appendix eqn. (A.40), where we imagine that the
random variable is 1 if the alleles drawn from the population are the
same and 0 if not. The denominator is the binomial variance of a sample
of two, and so our eqn is a covariance divided by a variance and so
interpretable as a correlation (see eqn (A.42)).

We can also interpret F -statistics as proportions of variance
explained by different levels of population structure. To see this, let
us think about $F_{ST}$ averaged over K subpopulations, whose
frequencies are $p_1, \ldots, p_K$. The frequency in the total population is
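$p_T = \bar{p} = \frac{1}{K}\sum_{i=1}^{K} p_i$. Then, we can write
\begin{align}
F_{ST} &= \frac{2\bar{p}\bar{q} - \frac{1}{K}\sum_{i=1}^{K} 2p_iq_i}{2\bar{p}\bar{q}}
= \frac{\left(\frac{1}{K}\sum_{i=1}^{K} p_i^2 + \frac{1}{K}\sum_{i=1}^{K} q_i^2\right) - \bar{p}^2 - \bar{q}^2}{2\bar{p}\bar{q}} \tag{3.7} \\
&= \frac{\mathrm{Var}(p_1, \ldots, p_K)}{\mathrm{Var}(\bar{p})}, \tag{3.8}
\end{align}
which shows that $F_{ST}$ is the proportion of the variance explained by
the subpopulation labels. Here $\mathrm{Var}(\bar{p}) = \bar{p}\bar{q}$ is the variance of
the allelic state of a single allele drawn at random from the total
population.⁴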
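As a numerical check, the sketch below computes the average F_ST for a set of subpopulation allele frequencies in two ways: from heterozygosities as in eq. (3.5), and as a ratio of variances as in eq. (3.8). It reads the denominator of eq. (3.8) as p̄(1 − p̄), which is an interpretation made for this example. The two frequencies (0.2 and 0.6) and the function names are invented.

```python
from fractions import Fraction

def fst_from_heterozygosity(freqs):
    """Eq. (3.5): F_ST = 1 - mean(H_S) / H_T for equally weighted subpopulations."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    h_s_bar = sum(2 * p * (1 - p) for p in freqs) / k
    h_t = 2 * p_bar * (1 - p_bar)
    return 1 - h_s_bar / h_t

def fst_from_variance(freqs):
    """Eq. (3.8), with the denominator taken to be p_bar * (1 - p_bar)."""
    k = len(freqs)
    p_bar = sum(freqs) / k
    var_p = sum(p**2 for p in freqs) / k - p_bar**2  # variance among subpopulations
    return var_p / (p_bar * (1 - p_bar))

freqs = [Fraction(1, 5), Fraction(3, 5)]  # hypothetical subpopulation frequencies
fst_h = fst_from_heterozygosity(freqs)
fst_v = fst_from_variance(freqs)
print(fst_h, fst_v)
```

Because the variance among subpopulation frequencies cannot be negative, the same calculation also illustrates the Wahlund effect: the mean subpopulation heterozygosity never exceeds that of the total population.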
⁴ This follows because the numerator, in step eqn (3.7), is the averaged
squared frequency minus the squared frequency, i.e. the variance (see
Appendix eqn A.23).

3.0.2 Other approaches to population structure
There is a broad spectrum of methods to describe patterns of
population structure in population genetic datasets. We’ll briefly
discuss two broad classes of methods that appear often in the
literature: assignment methods and principal components analysis.
3.0.3 Assignment Methods

Here we’ll describe a simple probabilistic assignment to find the prob-
ability that an individual of unknown population comes from one of
K predefined populations. For example, there are three broad popu-
lations of common chimpanzee (Pan troglodytes) in Africa: western,
central, and eastern. Imagine that we have a chimpanzee whose popu-
lation of origin is unknown (e.g. it’s from an illegal private collection).
If we have genotyped a set of unlinked markers from a panel of in-
dividuals representative of these populations, we can calculate the
probability that our chimp comes from each of these populations.
We’ll then briefly explain how to extend this idea to cluster a set
of individuals into K initially unknown populations. This method is
a simplified version of what population genetics clustering algorithms
such as STRUCTURE and ADMIXTURE do.[5]

[5] Pritchard, J. K., M. Stephens, and P. Donnelly, 2000 Inference of
population structure using multilocus genotype data. Genetics 155(2):
945–959; and Alexander, D. H., J. Novembre, and K. Lange, 2009 Fast
model-based estimation of ancestry in unrelated individuals. Genome
Research 19(9): 1655–1664

A simple assignment method We have genotype data from S unlinked
biallelic loci for K populations. The allele frequency of allele A1 at
locus l in population k is denoted by $p_{k,l}$, so that the allele frequencies
in population 1 are $p_{1,1}, \dots, p_{1,S}$ and population 2 are
$p_{2,1}, \dots, p_{2,S}$ and so on.
You genotype a new individual from an unknown population at
these S loci. This individual’s genotype at locus l is $g_l$, where $g_l$
denotes the number of copies of allele A1 this individual carries at this
locus ($g_l$ = 0, 1, 2).
The probability of this individual’s genotype at locus l conditional
on coming from population k, i.e. their alleles being a random HW
draw from population k, is
$$P(g_l \mid \text{pop } k) = \begin{cases} (1 - p_{k,l})^2 & g_l = 0 \\ 2 p_{k,l} (1 - p_{k,l}) & g_l = 1 \\ p_{k,l}^2 & g_l = 2 \end{cases} \tag{3.9}$$
Assuming that the loci are independent, the probability of the

individual’s genotype across all S loci, conditional on the individual
coming from population k, is
$$P(\text{ind.} \mid \text{pop } k) = \prod_{l=1}^{S} P(g_l \mid \text{pop } k) \tag{3.10}$$
We wish to know the probability that this new individual comes

from population k, i.e. P (pop k|ind.). We can obtain this through
Bayes’ rule
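$$P(\text{pop } k \mid \text{ind.}) = \frac{P(\text{ind.} \mid \text{pop } k) P(\text{pop } k)}{P(\text{ind.})} \tag{3.11}$$
where
$$P(\text{ind.}) = \sum_{k=1}^{K} P(\text{ind.} \mid \text{pop } k) P(\text{pop } k) \tag{3.12}$$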
is the normalizing constant.[6] We can interpret P(pop k) as the prior
probability of the individual coming from population k, and unless
we have some other prior knowledge we will assume that the new
individual has an equal probability of coming from each population,
P(pop k) = 1/K.

[6] See the Appendix (A.16) for more on Bayes’ Rule.
We interpret
P (pop k|ind.) (3.13)
as the posterior probability that our new individual comes from each
of our 1, · · · , K populations.
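The calculation in eqns (3.9) to (3.12) can be sketched in a few lines. The function names and the allele frequencies below are hypothetical (they are deliberately not the values used in the question that follows), and the prior defaults to the equal-probability choice P(pop k) = 1/K.

```python
from math import prod

def genotype_prob(g, p):
    """P(g | pop k) for genotype g in {0, 1, 2} under Hardy-Weinberg (eqn 3.9)."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]

def posterior(genotypes, freqs_by_pop, prior=None):
    """Posterior probability of each population given an individual's genotypes."""
    K = len(freqs_by_pop)
    prior = prior or [1.0 / K] * K
    # Likelihood of the multilocus genotype under each population (eqn 3.10)
    lik = [prod(genotype_prob(g, p) for g, p in zip(genotypes, freqs))
           for freqs in freqs_by_pop]
    norm = sum(l * pr for l, pr in zip(lik, prior))       # eqn 3.12
    return [l * pr / norm for l, pr in zip(lik, prior)]   # eqn 3.11

# Hypothetical frequencies of allele A1 at three loci in two populations
freqs_by_pop = [[0.2, 0.7, 0.5], [0.8, 0.3, 0.5]]
genotypes = [2, 0, 1]          # copies of A1 carried at each locus
post = posterior(genotypes, freqs_by_pop)
print(post)
```

With many loci the product in eqn (3.10) becomes very small, so in practice it is safer to sum log-probabilities and normalize on the log scale.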
More sophisticated versions of this are now used to allow for hybrids,
e.g. we can have a proportion $q_k$ of our individual’s genome
come from population k and estimate the set of $q_k$’s.
Question 2.
Returning to our chimp example, imagine that we have genotyped
a set of individuals from the Western and Eastern populations at two
SNPs (we’ll ignore the central population to keep things simpler). The
frequency of the capital allele at two SNPs (A/a and B/b) is given by
Population   Locus A   Locus B
Western      0.1       0.85
Eastern      0.95      0.2
A) Our individual, whose origin is unknown, has the genotype AA at
the first locus and bb at the second. What is the posterior probability
that our individual comes from the Western population versus Eastern
chimp population?
B) (Trickier) Let’s assume that our individual from part A is a hybrid
(not necessarily an F1). At each locus, with probability qW our
individual draws an allele from the Western population and with prob-
ability qE = 1 − qW they draw an allele from the Eastern population.
What is the probability of our individual’s genotype given qW ?
Optional: You could plot this probability as a function of qW . How
does your plot change if our individual is heterozygous at both loci?
Clustering based on assignment methods While it is great to be able

to assign our individuals to a particular population, these ideas can
be pushed to learn about how best to describe our genotype data in
terms of discrete populations without assigning any of our individuals
to populations a priori. We wish to cluster our individuals into K un-
known populations. We begin by assigning our individuals at random
to these K populations.
1. Given these assignments we estimate the allele frequencies at all of

our loci in each population.
2. Given these allele frequencies we choose to reassign each individual
to a population k with probability proportional to eqn. (3.10), i.e. the
posterior probability of eqn. (3.11) under equal prior probabilities.
We iterate steps 1 and 2 for many iterations (technically, this ap-

proach is known as Gibbs Sampling). If the data is sufficiently infor-
mative, the assignments and allele frequencies will quickly converge on
a set of likely population assignments and allele frequencies for these
populations.
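The two steps above can be sketched as follows. This is a simplified illustration, not the algorithm used by STRUCTURE: allele frequencies are point estimates with a pseudocount (an assumption made here to keep them away from 0 and 1) rather than draws from a posterior, so it is a simplification of a full Gibbs sampler. All function names and the simulated data are hypothetical.

```python
import math
import random

def log_lik(genotypes, freqs):
    """log P(ind. | pop k): sum of log Hardy-Weinberg probabilities (eqns 3.9-3.10)."""
    return sum(math.log([(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g])
               for g, p in zip(genotypes, freqs))

def estimate_freqs(data, z, K):
    """Step 1: allele frequencies in each cluster, adding one pseudo-copy of each allele."""
    S = len(data[0])
    freqs = []
    for k in range(K):
        members = [ind for ind, zi in zip(data, z) if zi == k]
        freqs.append([(sum(ind[l] for ind in members) + 1) / (2 * len(members) + 2)
                      for l in range(S)])
    return freqs

def reassign(data, freqs, rng):
    """Step 2: sample a cluster for each individual, proportional to its likelihood."""
    z = []
    for ind in data:
        ll = [log_lik(ind, f) for f in freqs]
        top = max(ll)
        weights = [math.exp(x - top) for x in ll]   # rescaled to avoid underflow
        z.append(rng.choices(range(len(freqs)), weights=weights)[0])
    return z

def cluster(data, K, iters=50, seed=1):
    rng = random.Random(seed)
    z = [rng.randrange(K) for _ in data]            # random initial assignment
    for _ in range(iters):
        z = reassign(data, estimate_freqs(data, z, K), rng)
    return z

# Simulated data: 20 individuals from each of two strongly differentiated populations
sim_rng = random.Random(42)

def simulate(p, S):
    """Genotypes (0, 1, 2) at S loci for an individual from a population with frequency p."""
    return [sum(sim_rng.random() < p for _ in range(2)) for _ in range(S)]

data = [simulate(0.1, 30) for _ in range(20)] + [simulate(0.9, 30) for _ in range(20)]
z = cluster(data, K=2)
print(z)
```

The cluster labels themselves are arbitrary (the two simulated populations may come out as 0 and 1 or as 1 and 0), and the sketch returns only the final sampled assignment rather than summarizing over iterations.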
Figure 3.4: Becquet et al. (2007) genotyped 78 common chimpanzee
and 6 bonobo at over 300 polymorphic markers (in this case
microsatellites). They ran STRUCTURE to cluster the individuals using
these data into K = 4 populations. In their figure, Becquet et al. (2007)
show each individual as a vertical bar divided into four colours
depicting the estimate of the fraction of ancestry that each individual
draws from each of the four estimated populations (licensed under
CC BY 4.0). We can see that these four colours/populations correspond
to: Red, central; blue, eastern; green, western; yellow, bonobo.

To do this in a full Bayesian scheme we need to place priors on
the allele frequencies (for example, one could use a beta distribution
prior). Technically we are using the joint posterior of our allele
frequencies and assignments. Programs like STRUCTURE use this type
of algorithm to cluster the individuals in an “unsupervised” manner
(i.e. they work out how to assign individuals to an unknown set of
populations). See Figure 3.4 for an example of Becquet et al. using
STRUCTURE to determine the population structure of chimpanzees.
STRUCTURE-like methods have proven incredibly popular and
useful in examining population structure within species. However, the
results of these methods are open to misinterpretation; see Lawson
et al. (2018) for a recent discussion. Two common mistakes
are 1) taking the results of STRUCTURE-like approaches for some
particular value of K to represent the best way to describe
population-genetic variation, and 2) thinking that these clusters
represent ‘pure’ ancestral populations.
There is no right choice of K, the number of clusters to partition
into. There are methods of judging the ‘best’ K by some statistical
measure given some particular dataset, but that is not the same as
saying this is the most meaningful level on which to summarize pop-
ulation structure in data. For example, running STRUCTURE on
world-wide human populations for low values of K will result in
population clusters that roughly align with continental populations
(Rosenberg et al., 2002). However, that does not tell us that assigning
ancestry at the level of continents is a particularly meaningful way of
partitioning individuals. Running the same data for higher values of K,
or within continental regions, will result in much finer-scale
partitioning of continental groups (Rosenberg et al., 2002; Li et al., 2008).
No one of these layers of population structure identified is privileged
as being more meaningful than another.
It is tempting to think of these clusters as representing ancestral
populations, which themselves are not the result of admixture. However,
that is not the case. For example, running STRUCTURE on
world-wide human data identifies a cluster that contains many European
individuals, yet on the basis of ancient DNA we know that
modern Europeans are a mixture of distinct ancestral groups.
3.0.4 Principal components analysis
Principal component analysis (PCA) is a common statistical approach

to visualize high dimensional data, and is used in many fields. The idea
of PCA is to give a location to each individual data-point on each of
a small number of principal component axes. These PC axes are chosen
to reflect major axes of variation in the data, with the first PC being
that which explains the largest variance, the second the second most,
and so on. The use of PCA in population genetics was pioneered by
Cavalli-Sforza and colleagues and now with large genotyping datasets,
PCA has made a comeback.[7]

[7] Menozzi, P., A. Piazza, and L. Cavalli-Sforza, 1978 Synthetic maps
of human gene frequencies in Europeans. Science 201(4358): 786–792;
and Patterson, N., A. L. Price, and D. Reich, 2006 Population structure
and eigenanalysis. PLoS Genetics 2(12): e190

Consider a dataset consisting of N individuals at S biallelic SNPs.
The ith individual’s genotype data at locus ℓ takes a value $g_{i,\ell}$ =
0, 1, or 2 (corresponding to the number of copies of allele A1 an
individual carries at this SNP). We can think of this as an N × S matrix
(where usually N ≪ S).
Denoting the sample mean allele frequency at SNP ℓ by pℓ , it’s
common to standardize the genotype in the following way
$$\frac{g_{i,\ell} - 2 p_\ell}{\sqrt{2 p_\ell (1 - p_\ell)}} \tag{3.14}$$
i.e. at each SNP we center the genotypes by subtracting the mean
genotype ($2p_\ell$) and divide through by the square root of the expected
variance assuming that alleles are sampled binomially from the mean
frequency ($\sqrt{2 p_\ell (1 - p_\ell)}$). Doing this to all of our genotypes, we form
a data matrix (of dimension N × S). We can then perform principal
component analysis of this data matrix to uncover the major axes of
genotype variance in our sample. Figure 3.5 shows a PCA from Bec-
quet et al. (2007) using the same chimpanzee data as in Figure 3.4.
Figure 3.5: Principal Component Analysis by Becquet et al. (2007)
using the same chimpanzee data as in Figure 3.4. Here Becquet
et al. (2007) plot the location of each individual on the first two
principal components (called eigenvectors) in the left panel, and on the
second and third principal components (eigenvectors) in the right panel
(licensed under CC BY 4.0). In the PCA, individuals identified as all
of one ancestry by STRUCTURE cluster together by population (solid
circles), while the nine individuals identified by STRUCTURE as hybrids
(open circles) for the most part fall at intermediate locations in the
PCA. There are two individuals (red open circles) reported as being of a
particular population but that appear to be hybrids.

It is worth taking a moment to delve further into what we are doing
here. There’s a number of equivalent ways of thinking about what
PCA is doing. One of these ways is to think that when we do PCA we
are building the individual by individual covariance matrix and
performing an eigenvalue decomposition of this matrix (with the
eigenvectors being the PCs). This individual by individual covariance
matrix has [i, j]th entry given by
$$\frac{1}{S-1} \sum_{\ell=1}^{S} \frac{(g_{i,\ell} - 2p_\ell)(g_{j,\ell} - 2p_\ell)}{2 p_\ell (1 - p_\ell)} \tag{3.15}$$
Note that this is the sample covariance, and is very similar to those
we encountered in discussing F -statistics as correlations (equation
(3.6)), except now we are asking about the covariance between two
individuals above that expected if they were both drawn from the
total sample at random (rather than the covariance of alleles within a
single individual). So by performing PCA on the data we are learning

about the major (orthogonal) axes of the kinship matrix.
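To make the link between eqns (3.14) and (3.15) and the principal components concrete, here is a minimal sketch using only the Python standard library. It standardizes a simulated genotype matrix, builds the individual-by-individual covariance matrix, and finds its leading eigenvector by power iteration, which stands in here for a full eigendecomposition. The function names, the simulated data, and the step that drops monomorphic SNPs are all assumptions added for illustration.

```python
import math
import random

def standardize(G):
    """Apply eqn (3.14) SNP by SNP; monomorphic SNPs are skipped because their
    expected variance is zero (a practical step added for this sketch)."""
    N, S = len(G), len(G[0])
    cols = []
    for l in range(S):
        p = sum(G[i][l] for i in range(N)) / (2 * N)     # sample allele frequency
        if 0 < p < 1:
            sd = math.sqrt(2 * p * (1 - p))
            cols.append([(G[i][l] - 2 * p) / sd for i in range(N)])
    return [[col[i] for col in cols] for i in range(N)]  # one row per individual

def covariance(X):
    """Individual-by-individual matrix of eqn (3.15), using the SNPs that were kept."""
    S = len(X[0])
    return [[sum(a * b for a, b in zip(xi, xj)) / (S - 1) for xj in X] for xi in X]

def first_pc(C, iters=500, seed=0):
    """Leading eigenvector of C, found by power iteration."""
    rng = random.Random(seed)
    v = [rng.gauss(0, 1) for _ in C]
    for _ in range(iters):
        w = [sum(c * x for c, x in zip(row, v)) for row in C]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    return v

# Simulated genotypes: 10 individuals from each of two differentiated populations
sim_rng = random.Random(7)

def simulate(p, S):
    return [sum(sim_rng.random() < p for _ in range(2)) for _ in range(S)]

G = [simulate(0.2, 200) for _ in range(10)] + [simulate(0.8, 200) for _ in range(10)]
pc1 = first_pc(covariance(standardize(G)))
print([round(x, 2) for x in pc1])
```

In this simulation the first PC takes one sign for individuals from the first population and the opposite sign for the second; which population gets the positive sign is arbitrary.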
As an example of the application of PCA, let’s consider the case
of the putative ring species in the Greenish warbler (Phylloscopus
trochiloides) species complex. This set of subspecies exists in a ring
around the edge of the Himalayan plateau. Alcaide et al. (2014)
collected 95 Greenish warbler samples from 22 sites around the ring,
and the sampling locations are shown in Figure 3.6.
Figure 3.6: The sampling locations

of 22 populations of Greenish war-
blers from Alcaide et al. (2014).
The samples are coloured by the
subspecies. Code here.
It is thought that these warblers spread from the south, north-

ward in two different directions around the inhospitable Himalayan
plateau, establishing populations along the western edge (green and
blue populations) and the eastern edge (yellow and red populations).
When they came into secondary contact in Siberia, they were repro-
ductively isolated from one another, having evolved different songs
and accumulated other reproductive barriers from each other as they
spread independently north around the plateau, such that P. t. viri-
danus (blue) and P. t. plumbeitarsus (red) populations presently form
a stable hybrid zone.
Alcaide et al. (2014) obtained sequence data for their samples at
2,334 SNPs. In Figure 3.8 you can see the matrix of kinship coefficients,
using (3.15), between all pairs of samples. You can already see a lot
about population structure in this matrix. Note how the red and
yellow samples, thought to be derived from the Eastern route around
the Himalayas, have higher kinship with each other, and blue and
the majority of the green samples, from the Western route, form a
similarly close group in terms of their higher kinship.
Figure 3.7: Greenish warbler, subspp. viridanus (Phylloscopus trochiloides
viridanus). Coloured figures of the birds of the British Islands. 1885.
Lilford T. L. P. Image from the Biodiversity Heritage Library.
Contributed by American Museum of Natural History Library. Not in
copyright. (Greenish warblers are rare visitors to the UK.)

Figure 3.8: The matrix of kinship coefficients calculated for the 95
samples of Greenish warblers. Each cell in the matrix gives the pairwise
kinship coefficient calculated for a particular pair. Hotter colours
indicate higher kinship. The x and y labels of individuals are the
population labels from Figure 3.6, and coloured by subspecies label as
in that figure. The rows and columns have been organized to cluster
individuals with high kinship. Code here.

Figure 3.9: The 95 greenish warbler samples plotted on their locations
on the first two principal components. The labels of individuals are the
population labels from Figure 3.6, and coloured by subspecies label as
in that figure. Code here.

We can then perform PCA on this kinship matrix to identify the
major axes of variation in the dataset. Figure 3.9 shows the samples
plotted on the first two PCs. The two major routes of expansion
clearly occupy different parts of PC space. The first principal
component distinguishes populations running North to South along the

western route of expansion, while the second principal component
distinguishes among populations running North to South along the
Eastern route of expansion. Thus genetic data supports the hypoth-
esis that the Greenish warblers speciated as they moved around the
Himalayan plateau. However, as noted by Alcaide et al. (2014), it
also suggests additional complications to the traditional view of these
warblers as an unbroken ring species, a case of speciation by continuous
geographic isolation. The Ludlowi subspecies shows a significant
genetic break, with the southernmost MN samples clustering with
the Trochiloides subspecies, in both the PCA and kinship matrix
(Figures 3.9 and 3.8), despite being geographically much closer to the
other Ludlowi samples. This suggests that genetic isolation is not just
a result of geographic distance, and other biogeographic barriers must
be considered in the case of this broken ring species.
Finally, while PCA is a wonderful tool for visualizing genetic data,
care must be taken in its interpretation. The U-like shape in the case
of the Greenish warbler PCA might be consistent with some low level
of gene flow between the red and the blue populations, pulling them
genetically closer together and helping to form a genetic ring as well
as a geographic ring. However, U-like shapes are expected to appear in
PCAs even if our populations are just arrayed along a line, and more
complex geometric arrangements of populations in PC space can result
under simple geographic models (Novembre and Stephens,
2008). Inferring the geographical and population-genetic history of
species requires the application of a range of tools; see Alcaide
et al. (2014) and Bradburd et al. (2016) for more discussion of the
Greenish warblers.
3.0.5 Correlations between loci, linkage disequilibrium, and recombination
Up to now we have been interested in correlations between alleles
at the same locus, e.g. correlations within individuals (inbreeding)
or between individuals (relatedness). We have seen how relatedness
between parents affects the extent to which their offspring is inbred.
We now turn to correlations between alleles at different loci.
Recombination To understand correlations between loci we need

to understand recombination a bit more carefully. Let us consider
a heterozygous individual, containing AB and ab haplotypes. If no
recombination occurs between our two loci in this individual, then
these two haplotypes will be transmitted intact to the next genera-
tion. While if a recombination (i.e. an odd number of crossing over
events) occurs between the two parental haplotypes, then 1/2 the time
the child receives an Ab haplotype and 1/2 the time the child receives
an aB haplotype. Effectively, recombination breaks up the association
between loci. We’ll define the recombination fraction (r) to be the
probability of an odd number of crossing over events between our loci
in a single meiosis. In practice we’ll often be interested in relatively
short regions such that recombination is relatively rare, and so we
might think that $r = r_{BP} L \ll 1/2$, where $r_{BP}$ is the average
recombination rate (in Morgans) per base pair (typically $\sim 10^{-8}$) and L is the
number of base pairs separating our two loci.
Linkage disequilibrium The (horrible) phrase linkage disequilibrium

(LD) refers to the statistical non-independence (i.e. a correlation)
of alleles in a population at different loci. It’s an awful name for a
fantastically useful concept; LD is key to our understanding of diverse
topics, from sexual selection and speciation to the limits of genome-
wide association studies.
Our two biallelic loci, which segregate alleles A/a and B/b, have
allele frequencies of $p_A$ and $p_B$ respectively. The frequency of the two
locus haplotype AB is $p_{AB}$, and likewise for our other three
combinations. If our loci were statistically independent then $p_{AB} = p_A p_B$,
otherwise $p_{AB} \neq p_A p_B$. We can define a covariance between the A and
B alleles at our two loci[8] as
$$D_{AB} = p_{AB} - p_A p_B \tag{3.16}$$
and likewise for our other combinations at our two loci ($D_{Ab}$, $D_{aB}$, $D_{ab}$).
Gametes with two similar case alleles (e.g. A and B, or a and b)
are known as coupling gametes, and those with different case alleles
are known as repulsion gametes (e.g. a and B, or A and b). Then,
we can think of D as measuring the excess of coupling to repulsion
gametes. These D statistics are all closely related to each other as
$D_{AB} = -D_{Ab}$ and so on. Thus we only need to specify one $D_{AB}$ to
know them all, so we’ll drop the subscript and just refer to D. Also a
handy result is that we can rewrite our haplotype frequency $p_{AB}$ as
$$p_{AB} = p_A p_B + D. \tag{3.17}$$

[8] Here again we are making use of a covariance of discrete random
variables, see Appendix eqn (A.40), where the first variable is drawing
haplotype with an A at the first locus, and the second is drawing a B
allele at the other locus.
If D = 0 we’ll say the two loci are in linkage equilibrium, while if

D > 0 or D < 0 we’ll say that the loci are in linkage disequilibrium
(we’ll perhaps want to test whether D is statistically different from 0
before making this choice). You should be careful to keep the concepts
of linkage and linkage disequilibrium separate in your mind. Genetic
linkage refers to the linkage of multiple loci due to the fact that they
are transmitted through meiosis together (most often because the loci
are on the same chromosome). Linkage disequilibrium merely refers to
the covariance between the alleles at different loci; this may in part be
due to the genetic linkage of these loci but does not necessarily imply
this (e.g. genetically unlinked loci can be in LD due to population
structure).
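The definition in eqn (3.16), and the way the four D statistics mirror each other, can be checked directly from a set of haplotype frequencies. The function name `ld_table` and the frequencies below are hypothetical and chosen only for illustration.

```python
def ld_table(p_AB, p_Ab, p_aB, p_ab):
    """Covariance D for each of the four haplotypes (eqn 3.16 and its analogues)."""
    p_A = p_AB + p_Ab              # frequency of allele A
    p_B = p_AB + p_aB              # frequency of allele B
    return {
        "AB": p_AB - p_A * p_B,
        "Ab": p_Ab - p_A * (1 - p_B),
        "aB": p_aB - (1 - p_A) * p_B,
        "ab": p_ab - (1 - p_A) * (1 - p_B),
    }

# Hypothetical haplotype frequencies (they sum to one)
D = ld_table(p_AB=0.4, p_Ab=0.1, p_aB=0.2, p_ab=0.3)
print(D)
```

Here the coupling haplotypes (AB and ab) are in excess by the same amount that the repulsion haplotypes (Ab and aB) are in deficit, which is why a single value of D is enough.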
Question 3. You genotype 2 bi-allelic loci (A & B) segregating
in two mouse subspecies (1 & 2) which mate randomly among themselves,
but have not historically interbred since they speciated. On
the basis of previous work you estimate that the two loci are separated
by a recombination fraction of 0.1. The frequencies of haplotypes in
each population are:

Pop   pAB   pAb   paB   pab
1     .02   .18   .08   .72
2     .72   .18   .08   .02
A) How much LD is there within species? (i.e. estimate D)

B) If we mixed individuals from the two species together in equal
proportions, we could form a new population with pAB equal to the
average frequency of pAB across species 1 and 2. What value would D
take in this new population before any mating has had the chance to
occur?
Our linkage disequilibrium statistic D depends strongly on the allele
frequencies of the two loci involved. One common way to partially
remove this dependence, and make it more comparable across loci,
is to divide D through by its maximum possible value given the
frequency of the loci. This normalized statistic is called D′ and varies
between +1 and −1. In Figure 3.10 there’s an example of LD across the
TAP2 region in human and chimp. Notice how physically close SNPs,
i.e. those close to the diagonal, have higher absolute values of D′ as
closely linked alleles are separated by recombination less often allowing
high levels of LD to accumulate. Over large physical distances, away
from the diagonal, there is lower D′. This is especially notable in
humans as there is an intense, human-specific recombination hotspot in
this region, which is breaking down LD between opposite sides of this
region.

Figure 3.10: LD across the TAP2 gene region in a sample of Humans and
Chimps, from Ptak et al. (2004), licensed under CC BY 4.0. The rows
and columns are consecutive SNPs, with each cell giving the absolute
D′ value between a pair of SNPs. Note that these are different sets of
SNPs in the two species, as shared polymorphisms are very rare.
Another common statistic for summarizing LD is $r^2$ which we write
as
$$r^2 = \frac{D^2}{p_A (1 - p_A) p_B (1 - p_B)} \tag{3.18}$$
As D is a covariance, and $p_A (1 - p_A)$ is the variance of an allele drawn
at random from locus A, $r^2$ is the squared correlation coefficient.[9]
Note that this r in $r^2$ is NOT the recombination fraction.

[9] See Appendix eqn (A.42) for the definition of a correlation coefficient.
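Both normalized statistics are simple functions of D and the two allele frequencies. The text does not spell out the maximum value used for D′; the sketch below uses the standard bounds that follow from requiring every haplotype frequency to be non-negative, which should be read as an assumption here. The function names and example values are hypothetical.

```python
def d_prime(D, p_A, p_B):
    """D divided by its largest possible magnitude given the allele frequencies.
    Assumes both loci are polymorphic (frequencies strictly between 0 and 1)."""
    if D >= 0:
        d_max = min(p_A * (1 - p_B), (1 - p_A) * p_B)
    else:
        d_max = min(p_A * p_B, (1 - p_A) * (1 - p_B))
    return D / d_max

def r_squared(D, p_A, p_B):
    """Squared correlation between alleles at the two loci (eqn 3.18)."""
    return D ** 2 / (p_A * (1 - p_A) * p_B * (1 - p_B))

# Hypothetical values: D = 0.1 with allele frequencies 0.5 and 0.6
dp = d_prime(0.1, 0.5, 0.6)
r2 = r_squared(0.1, 0.5, 0.6)
print(dp, r2)
```

The two statistics can disagree substantially: D′ reaches 1 whenever one of the four haplotypes is absent, whereas $r^2$ reaches 1 only when two of them are absent, which requires the allele frequencies at the two loci to match.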
Figure 3.12 shows r2 for pairs of SNPs at various physical distances

in two population samples of Mus musculus domesticus. Again LD
is highest between physically close markers as LD is being generated
faster than it can decay via recombination; more distant markers have
much lower LD as here recombination is winning out. Note the decay
of LD is much slower in the advanced-generation cross population than
in the natural wild-caught population. This persistence of LD across
megabases is due to the limited number of generations for recombination
since the cross was created.

Figure 3.11: Mus musculus. A history of British quadrupeds, including the
Cetacea. 1874. Bell T., Tomes, R. F., Alston E. R. Image from the
Biodiversity Heritage Library. Contributed by Cornell University
Library. No known copyright restrictions.

Figure 3.12: The decay of LD for autosomal SNPs in Mus musculus
domesticus, as measured by $r^2$, in a wild-caught mouse population
from Arizona and a set of advanced-generation crosses between inbred
lines of lab mice. Each dot gives the $r^2$ for a pair of SNPs a given
physical distance apart, for a total of ∼ 3000 SNPs. The solid black
line gives the mean, the jagged red line the 95th percentile, and the
flat red line a cutoff for significant LD. From Laurie et al. (2007),
licensed under CC BY 4.0.
The generation of LD. Various population genetic forces can generate

LD. Selection can generate LD by favouring particular combinations
of alleles. Genetic drift will also generate LD, not because particular
combinations of alleles are favoured, but simply because at random
particular haplotypes can by chance drift up in frequency. Mixing
between divergent populations can also generate LD, as we saw in the
mouse question above.
The decay of LD due to recombination We will now examine what

happens to LD over the generations if, in a very large population (i.e.
no genetic drift and frequencies of our loci thus follow their expecta-
tions), we only allow recombination to occur. To do so, consider the
frequency of our AB haplotype in the next generation, p′AB . We lose
a fraction r of our AB haplotypes to recombination ripping our alleles
apart but gain a fraction rpA pB per generation from other haplotypes
recombining together to form AB haplotypes. Thus in the next gener-
ation
p′AB = (1 − r)pAB + rpA pB (3.19)
The last term above, in eqn 3.19, is $r(p_{AB} + p_{Ab})(p_{AB} + p_{aB})$
simplified, which is the probability of recombination in the different
diploid genotypes that could generate an AB haplotype.
We can then write the change in the frequency of the AB haplotype as
$$\Delta p_{AB} = p'_{AB} - p_{AB} = -r p_{AB} + r p_A p_B = -rD \tag{3.20}$$
So recombination will cause a decrease in the frequency of $p_{AB}$
if there is an excess of AB haplotypes within the population (D > 0),
and an increase if there is a deficit of AB haplotypes within the
population (D < 0). Our LD in the next generation is
$$\begin{aligned} D' &= p'_{AB} - p'_A p'_B \\ &= (p_{AB} + \Delta p_{AB}) - (p_A + \Delta p_A)(p_B + \Delta p_B) \\ &= p_{AB} + \Delta p_{AB} - p_A p_B \\ &= (1 - r) D \end{aligned} \tag{3.21}$$
where we can cancel out $\Delta p_A$ and $\Delta p_B$ above because recombination
only changes haplotype, not allele, frequencies. So if the level of LD in
generation 0 is $D_0$, the level t generations later ($D_t$) is
$$D_t = (1 - r)^t D_0 \tag{3.22}$$
Recombination is acting to decrease LD, and it does so geometrically
at a rate given by (1 − r). If r ≪ 1 then we can approximate this by
an exponential and say that
$$D_t \approx D_0 e^{-rt} \tag{3.23}$$

Figure 3.13: The decay of LD from an initial value of $D_0 = 0.25$ over
time (Generations) for a pair of loci a recombination fraction r apart
(curves shown for r = 0.01, r = 0.1, and r = 0.5). Code here.

Figure 3.14: The decay of LD from an initial value of $D_0 = 0.25$ due
to recombination over t generations (curves shown for t = 5, t = 10,
and t = 100), plotted across possible recombination fractions (r)
between our pair of loci. Code here.
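The recursion in eqn (3.19) can be iterated directly and compared with the closed forms in eqns (3.22) and (3.23). The sketch below starts from $D_0 = 0.25$, the starting value used in Figures 3.13 and 3.14; the function name and the choice of r = 0.1 and 20 generations are arbitrary choices made for illustration.

```python
import math

def iterate_ld(p_AB, p_A, p_B, r, generations):
    """Iterate eqn (3.19) and record D = p_AB - p_A * p_B in each generation."""
    D = []
    for _ in range(generations + 1):
        D.append(p_AB - p_A * p_B)
        p_AB = (1 - r) * p_AB + r * p_A * p_B   # allele frequencies stay fixed
    return D

r, T = 0.1, 20
D = iterate_ld(p_AB=0.5, p_A=0.5, p_B=0.5, r=r, generations=T)   # D_0 = 0.25
geometric = [0.25 * (1 - r) ** t for t in range(T + 1)]           # eqn 3.22
exponential = [0.25 * math.exp(-r * t) for t in range(T + 1)]     # eqn 3.23

for t in (0, 5, 10, 20):
    print(t, round(D[t], 4), round(geometric[t], 4), round(exponential[t], 4))
```

The iterated values match eqn (3.22) exactly, while the exponential approximation sits slightly above them because $1 - r < e^{-r}$; the gap shrinks as r gets smaller.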
Question 4. You find a hybrid population between the two mouse
subspecies described in the question above, which appears to be com-
prised of equal proportions of ancestry from the two subspecies. You
estimate LD between the two markers to be 0.0723. Assuming that
this hybrid population is large and was formed by a single mixture
event, can you estimate how long ago this population formed?
A particularly striking example of the decay of LD generated by the
mixing of populations is offered by the LD created by the interbreeding
between humans and Neanderthals. Neanderthals and modern
humans diverged from each other likely over half a million years ago,
allowing time for allele frequency differences to accumulate between
the Neanderthal and modern human populations. The two populations
spread back into secondary contact when humans moved out of Africa
over the past hundred thousand years or so. One of the most exciting
findings from the sequencing of the Neanderthal genome was that
modern-day people with Eurasian ancestry carry a few percent of their
genome derived from the Neanderthal genome, via interbreeding during
this secondary contact. To date the timing of this interbreeding,
Sankararaman et al. (2012) looked at the LD in modern humans
between pairs of alleles found to be derived from the Neanderthal
genome (and nearly absent from African populations). In Figure 3.16
we show the average LD between these loci as a function of the genetic
distance (r) between them, from the work of Sankararaman et al.

Figure 3.15: The earliest discovered fossil of a Neanderthal, fragments
of a skull found in a cave in the Neander Valley in Germany. Man’s place
in nature. 1890. Huxley, T. H. Image from the Internet Archive.
Contributed by The Library of Congress. No known copyright restrictions.
Figure 3.16: The LD between putative-Neanderthal alleles in a
modern European population (the CEU sample from the 1000 Genomes
Project). Each point represents the average D statistic between a pair
of alleles at loci at a given genetic distance apart (as given on the
x-axis and measured in centiMorgans (cM); the y-axis shows Neanderthal
LD). The putative Neanderthal alleles are alleles where the Neanderthal
genome has a derived allele that is at very low frequency in a
modern-human West African population sample (thought to have little
admixture from Neanderthals). The red line is the fit of an exponential
decay of LD, using non-linear least squares (nls in R).

Assuming a recombination rate r, we can fit the exponential decay
of LD predicted by eqn. (3.23) to the data points in this figure; the fit
is shown as a red line. Doing this we estimate t = 1200 generations, or
about 35 thousand years (using a human generation time of 29 years).
Thus the LD in modern Eurasians, between alleles derived from the
interbreeding with Neanderthals, represents over thirty thousand years
of recombination slowly breaking down these old associations.[10]

[10] The calculation done by Sankararaman et al. (2012) is actually a
bit more involved as they account for inhomogeneity in recombination
rates and arrive at a date of 47,334 – 63,146 years.
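The logic of dating admixture from the decay of LD can be sketched with synthetic data. The figure's fit used non-linear least squares (nls in R); the sketch below swaps in a simpler technique, an ordinary least-squares line through log D against r, which recovers the same parameters when the data follow eqn (3.23) without noise. The data points, the starting value $D_0 = 0.05$, and the function name are all made up for illustration and are not the values from Sankararaman et al. (2012).

```python
import math

def fit_decay(r_values, d_values):
    """Fit log D = log D0 - t * r by least squares; returns (t, D0)."""
    n = len(r_values)
    y = [math.log(d) for d in d_values]
    r_mean = sum(r_values) / n
    y_mean = sum(y) / n
    slope = (sum((r - r_mean) * (yi - y_mean) for r, yi in zip(r_values, y))
             / sum((r - r_mean) ** 2 for r in r_values))
    return -slope, math.exp(y_mean - slope * r_mean)

# Synthetic, noise-free LD values at genetic distances of 0.1 to 1.0 cM
cM = [0.1 * i for i in range(1, 11)]
r_values = [x / 100 for x in cM]     # 1 cM is treated as r = 0.01 at these short distances
d_values = [0.05 * math.exp(-r * 1200) for r in r_values]

t_hat, d0_hat = fit_decay(r_values, d_values)
years = t_hat * 29                   # generation time of 29 years, as in the text
print(round(t_hat), round(d0_hat, 3), round(years))
```

With real data the points are noisy and some averages may be zero or negative, in which case the log transform fails and a direct non-linear fit of the exponential is the more robust choice.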
4 Genetic Drift and Neutral Diversity
Randomness is inherent to evolution, from the lucky
birds blown off course to colonize some new oceanic island, to which
mutations arise first in the HIV strain infecting an individual taking
anti-retroviral drugs. One major source of stochasticity in
evolutionary biology is genetic drift. Genetic drift occurs because more
or fewer copies of an allele by chance can be transmitted to the next
generation. This can occur because, by chance, the individuals carrying a
particular allele can leave more or fewer offspring in the next generation.
In a sexual population, genetic drift also occurs because Mendelian
transmission means that only one of the two alleles in an individual,
chosen at random at a locus, is transmitted to the offspring.
Genetic drift can play a role in the dynamics of all alleles in all
populations, but it will play the biggest role for neutral alleles. A
neutral polymorphism occurs when the segregating alleles at a poly-
morphic site have no discernible differences in their effect on fitness.
We’ll make clear what we mean by “discernible” later, but for the
moment think of this as “no effect” on fitness.
The neutral theory of molecular evolution. The role of genetic drift

in molecular evolution has been hotly debated since the 60s when
the Neutral theory of molecular evolution was proposed (see Ohta
and Gillespie, 1996, for a history).[1] The central premise of Neutral
theory is that patterns of molecular polymorphism within
species and substitution between species can be well understood by
supposing that the vast majority of these molecular polymorphisms
and substitutions were neutral alleles, whose dynamics were just
subject to the vagaries of genetic drift and mutation. Early proponents of
this view suggested that the vast majority of new mutations are either
neutral or highly deleterious (e.g. mutations that disrupt important
protein functions). This latter class of mutations are too deleterious
to contribute much to common polymorphisms or substitutions
between species, because they are quickly weeded out of the population
by selection.

[1] Kimura, M., 1968 Evolutionary rate at the molecular level. Nature
217(5129): 624–626; King, J. L. and T. H. Jukes, 1969 Non-darwinian
evolution. Science 164(3881): 788–798; and Kimura, M., 1983 The neutral
theory of molecular evolution. Cambridge University Press
Neutral theory can sound strange given that our first brush with
evolution often focuses on adaptation and phenotypic evolution.
However, proponents of this world-view didn’t deny the existence of
advantageous mutations, they simply thought that beneficial mutations
are rare enough that their contribution to the bulk of polymorphism or
divergence can be largely ignored. They also often thought that much of
phenotypic evolution may well be adaptive, but again the loci
responsible for these phenotypes are a small fraction of all the
molecular changes that occur. The neutral theory of molecular
evolution was originally proposed to explain protein polymorphism.
However, we can apply it more broadly to think about neutral evo-
lution genome-wide. With that in mind, what types of molecular
changes could be neutral? Perhaps:
1. Changes in non-coding DNA that don’t disrupt regulatory se-
quences. For example, in the human genome only about 2% of the
genome codes for proteins. The rest is mostly made up of old trans-
posable element and retrovirus insertions, repeats, pseudo-genes,
and general genomic clutter. Current estimates suggest that, even
counting conserved, functional, non-coding regions, less than 10%
of our genome is subject to evolutionary constraint (Rands et al.,
2014).
2. Synonymous changes in coding regions, i.e. those that don’t change

the amino-acid encoded by a codon.
3. Non-synonymous changes that don’t have a strong effect on the

functional properties of the amino acid encoded, e.g. changes that
don’t change the size, charge, or hydrophobic properties of the
amino acid too much.
4. An amino-acid change with phenotypic consequences, but little

relevance to fitness, e.g. a mutation that causes your ears to be a
slightly different shape, or that prevents an organism from living
past 50 in a species where most individuals reproduce and die by
their 20s.
There are counter examples to all of these ideas, e.g. synonymous
changes can affect the translation speed and accuracy of proteins and
so are subject to selection. However, the list above hopefully convinces
you that the general thinking that some portion of molecular change
may not be subject to selection isn’t as daft as it may have initially
sounded.
Various features of molecular polymorphism and divergence have
been viewed as consistent with the neutral theory of molecular
evolution. The two we’ll focus on in this chapter are the high level of
molecular polymorphism in many species (see for example Figure 2.3)
and the molecular clock. We’ll see that various aspects of the origi-
nal neutral theory have merit in describing some features and types
of molecular change, but we’ll also see that it is demonstrably wrong
in some cases. We’ll also see the primary utility of the neutral theory
isn’t whether it is right or wrong, but that it serves as a simple null
model that can be tested and in some cases rejected, and subsequently
built on. The broader debate currently in the field of molecular evolu-
tion is the balance of neutral, adaptive, and deleterious changes that
drive different types of evolutionary change.
4.1 Loss of heterozygosity due to drift.
Genetic drift will, in the absence of new mutations, slowly purge our
population of neutral genetic diversity, as alleles slowly drift to high or
low frequencies and are lost or fixed over time.
Imagine a randomly mating population of a constant size N diploid
individuals, and that we are examining a locus segregating for two
alleles that are neutral with respect to each other. This population is
randomly mating with respect to the alleles at this locus. See Figures
4.1 and 4.2 to see how genetic drift proceeds, by tracking alleles within
a small population.
In generation t our current level of heterozygosity is Ht , i.e. the
probability that two randomly sampled alleles in generation t are
non-identical is Ht . Assuming that the mutation rate is zero (or van-
ishingly small), what is our level of heterozygosity in generation t + 1?
Figure 4.1: Loss of heterozygosity

over time, in the absence of new
mutations. A diploid population of 5
individuals over the generations, with
lines showing transmission. In the
first generation every individual is a
heterozygote. Code here.
In the next generation (t + 1) we are looking at the alleles in the
offspring of generation t. If we randomly sample two alleles in
generation t + 1 which had different parental alleles in generation t,
that is just like drawing two random alleles from generation t. So the
probability that these two alleles in generation t + 1, that have
different parental alleles in generation t, are non-identical is Ht .

Figure 4.2: Loss of heterozygosity over time, in the absence of new
mutations. A diploid population of 5 individuals. In the first
generation I colour every allele a different colour so we can track
their descendants. Code here.

Conversely, if the two alleles in our pair had the same parental
allele in the preceding generation (i.e. the alleles are identical by
descent one generation back) then these two alleles must be identical
(as we are not allowing for any mutation).
In a diploid population of size N individuals there are 2N alleles.
The probability that our two alleles have the same parental allele in
the preceding generation is 1/(2N) and the probability that they have
different parental alleles is 1 − 1/(2N). So by the above argument, the
expected heterozygosity in generation t + 1 is
$$H_{t+1} = \frac{1}{2N} \times 0 + \left(1 - \frac{1}{2N}\right) H_t \tag{4.1}$$
Thus, if the heterozygosity in generation 0 is H0 , our expected
heterozygosity in generation t is
$$H_t = \left(1 - \frac{1}{2N}\right)^{t} H_0 \tag{4.2}$$
i.e. the expected heterozygosity within our population is decaying
geometrically with each passing generation. If we assume that
1/(2N) ≪ 1 then we can approximate this geometric decay by an
exponential decay (see Question 2 below), such that
$$H_t = H_0 e^{-t/(2N)} \tag{4.3}$$
i.e. heterozygosity decays exponentially at a rate 1/(2N).
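To get a feel for eqns (4.2) and (4.3), here is a minimal Python sketch (not the R code linked from the figure captions) that evaluates both expressions side by side. The function names are our own, and the starting values (N = 50, and H0 = 2 × 0.3 × 0.7 for a biallelic locus with an allele at 30%) are assumptions chosen to match the set-up of Figure 4.3.

```python
import math


def het_geometric(h0, n, t):
    """Expected heterozygosity after t generations, eqn (4.2)."""
    return h0 * (1.0 - 1.0 / (2 * n)) ** t


def het_exponential(h0, n, t):
    """Exponential approximation to the same decay, eqn (4.3)."""
    return h0 * math.exp(-t / (2 * n))


# Assumed illustrative values: a biallelic locus with an allele at 30%
# in a diploid population of N = 50 individuals.
h0 = 2 * 0.3 * 0.7
n = 50

for t in (0, 10, 50, 150):
    exact = het_geometric(h0, n, t)
    approx = het_exponential(h0, n, t)
    print(f"t = {t:3d}  geometric = {exact:.4f}  exponential = {approx:.4f}")
```

The two columns should stay close whenever 1/(2N) is small, with the exponential form sitting very slightly above the geometric one.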

In Figure 4.3 we show trajectories through time for 40 independently
simulated loci drifting in a population of 50 individuals. Each
locus was started with the tracked allele at a frequency of 30%. Some
alleles drift up and some drift down, eventually being lost from or
fixed in the population, but, on average across simulations, the allele
frequency doesn’t change. We also track heterozygosity: you can see
that heterozygosity sometimes goes up, and sometimes goes down, but on
average we are losing heterozygosity, and this rate of loss is well
predicted by eqn. (4.2).
Figure 4.3: Change in allele frequency and loss of heterozygosity over
time for 40 replicates. Simulations of genetic drift in a diploid
population of 50 individuals, in the absence of new mutations. We start
40 independent, biallelic loci each with an initial allele at 30%
frequency. The left panel shows the allele frequency over time and the
right panel shows the heterozygosity over time, with the mean decay
matching eqn. (4.2). Code here.
[Plot: left panel, Frequency, p against Time, generations (0 to 150);
right panel, Heterozygosity against Time, generations (0 to 150);
legend: 1 sim., Mean sim., Expectation.]
Question 1. You are in charge of maintaining a population of
delta smelt in the Sacramento river delta. Using a large set of
microsatellites you estimate that the mean level of heterozygosity in
this population is 0.005. You set yourself a goal of maintaining a
level of heterozygosity of at least 0.0049 for the next two hundred
years. Assuming that the smelt have a generation time of 3 years, and
that only genetic drift affects these loci, what is the smallest fully
outbreeding population that you would need to maintain to meet this
goal?

Figure 4.4: Pond smelt (Hypomesus olidus), a close relative of delta
smelt. Bulletin of the United States Fish Commission. 1906. Image from
the Biodiversity Heritage Library. Contributed by Smithsonian
Libraries. Not in copyright.
Note how this picture of decreasing heterozygosity stands in con-

trast to the consistency of Hardy-Weinberg equilibrium from the pre-
vious chapter. However, our Hardy-Weinberg proportions still hold
in forming each new generation. As the offspring genotypes in the
next generation (t + 1) represent a random draw from the previous
generation (t), if the parental frequency is pt , we expect a proportion
2pt (1 − pt ) of our offspring to be heterozygotes (and HW proportions
for our homozygotes). However, because population size is finite, the
observed genotype frequencies in the offspring will (likely) not match
exactly with our expectations. Because our genotype frequencies likely
change slightly due to sampling (biologically, this reflects random
variation in family size and Mendelian segregation), the allele
frequency will change. Therefore, while each generation represents a
sample from Hardy-Weinberg proportions based on the generation before,
our genotype proportions are not at an equilibrium (an unchanging
state) as the underlying allele frequency changes over the generations.
We’ll develop some mathematical models for these allele frequency
changes later on. For now, we’ll simply note that under our simple
model of drift (formally the Wright-Fisher model), our allele count in
the (t + 1)th generation represents a binomial sample (of size 2N) from
the population frequency pt in the previous generation. If you’ve read
to here,

please email Prof Coop a picture of JBS Haldane in a striped suit with
the title ”I’m reading the chapter 3 notes”. (It’s well worth googling
JBS Haldane and to read more about his life; he’s a true character and
one of the last great polymaths. )
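The binomial sampling view of drift can be turned into a short simulation. What follows is a minimal Python sketch under the Wright-Fisher assumptions stated above, not the R code behind Figure 4.3. The function name `wright_fisher`, the seed, and the number of replicates are our own illustrative choices. Each generation's allele count is drawn as 2N independent Bernoulli trials with success probability pt, which is one simple way to draw a binomial sample.

```python
import random


def wright_fisher(p0, n, generations, rng):
    """Allele-frequency trajectory of one biallelic locus under drift alone."""
    two_n = 2 * n
    freqs = [p0]
    p = p0
    for _generation in range(generations):
        # Allele count in the next generation ~ Binomial(2N, p),
        # drawn here as 2N Bernoulli trials.
        count = sum(1 for _allele in range(two_n) if rng.random() < p)
        p = count / two_n
        freqs.append(p)
    return freqs


rng = random.Random(1)  # fixed seed so the sketch is reproducible
n, p0, generations, reps = 50, 0.3, 100, 400

trajectories = [wright_fisher(p0, n, generations, rng) for _ in range(reps)]

# Average heterozygosity 2p(1-p) across replicate loci in the final generation,
# compared with the expectation from eqn (4.2).
mean_het = sum(2 * tr[-1] * (1 - tr[-1]) for tr in trajectories) / reps
expected_het = 2 * p0 * (1 - p0) * (1 - 1 / (2 * n)) ** generations

print(f"mean simulated heterozygosity: {mean_het:.3f}")
print(f"expectation from eqn (4.2):    {expected_het:.3f}")
```

Individual trajectories wander a long way from 30%, but the replicate average of 2p(1 − p) should track the geometric decay of eqn (4.2) up to sampling noise.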

Figure 4.6: Loss of heterozygosity in the Black-footed Ferrets in their
declining population. Numbers in brackets give estimated number of
individuals alive at that time. Data from Wisely et al. (2002). Code
here.
[Plot: Heterozygosity (HE) against Year, 1880 to 1980; points labelled
(N>10k), (N=62), (N=40), and (N=7).]

Figure 4.5: The black-footed ferret (M. nigripes). Wild animals of
North America, The National geographical society, 1918. Image from the
Biodiversity Heritage Library. Not in copyright.
To see how a decline in population size can affect levels of het-

erozygosity, let’s consider the case of black-footed ferrets (Mustela
nigripes). The black-footed ferret population has declined dramatically
through the twentieth century due to destruction of their habitat. In
1979, when the last known black-footed ferret died in captivity, they
were thought to be extinct. In 1981, a very small wild population was
rediscovered (40 individuals), but in 1985 this population suffered a
number of disease outbreaks. All of the 18 remaining wild individuals
were brought into captivity, 7 of which reproduced. Thanks to intense
captive breeding efforts and conservation work, a wild population of
over 300 individuals has been established since. However, because
all of these individuals are descended from those 7 individuals who
survived the bottleneck, diversity levels remain low. Wisely et al.
measured heterozygosity at a number of microsatellites in individu-
als from museum collections, showing the sharp drop in diversity as
population sizes crashed (see Figure 4.6).
Question 2. In mathematical population genetics, a commonly
used approximation is (1 − x) ≈ e^{−x} for x ≪ 1 (formally, this
follows from the Taylor series expansion of exp(−x), ignoring second
order and higher terms of x). This approximation is especially useful
for approximating a geometric decay process by an exponential decay
process, e.g. (1 − x)^t ≈ e^{−xt}. Using your calculator, or R, check
how good of an approximation this is compared to the exact expression
for two values of x, x = 0.1, and 0.01, across two different values of
t, t = 5 and t = 50. Briefly comment on your results.
4.1.1 Levels of diversity maintained by a balance between mutation and drift
Next we’re going to consider the amount of neutral polymorphism that
can be maintained in a population as a balance between genetic drift
removing variation and mutation introducing new neutral variation,
see Figure 4.7 for an example. Note in our example, how no single
allele is maintained at a stable equilibrium, rather an equilibrium level
of polymorphism is maintained by a constantly shifting set of alleles.
Figure 4.7: Mutation-drift balance. A

diploid population of 5 individuals. In
the first generation everyone has the
same allele (black). Each generation
the transmitted allele can mutate
and we generate a new colour. In the
bottom plot, I trace the frequency of
alleles in our population over time.
The mutation rate we use is very
high, simply to maintain diversity in
this small population. Code here.
The neutral mutation rate. We’ll first want to consider the rate at
which neutral mutations arise in the population. Thinking back to our
discussion of the neutral theory of molecular evolution, let’s suppose
that there are only two classes of mutation that can arise in our ge-
nomic region of interest: neutral mutations and highly deleterious mu-
tations. The total mutation rate at our locus is µ per generation, i.e.
per transmission from parent to child. A fraction C of our mutations
are new alleles that are highly deleterious and so quickly removed
from the population. We’ll call this C parameter the constraint, and
it will differ according to the genomic region we consider. The remain-
ing fraction (1 − C) are our neutral mutations, such that our neutral
mutation rate is (1 − C)µ. This is the per generation rate.
Question 3. It’s worth taking a minute to get familiar with both
how rare, and how common, mutation is. The per base pair mutation
rate in humans is around 1.5 × 10^{−8} per generation. That means, on
average, we have to monitor a site for ∼ 66.6 million transmissions
from parent to child to see a mutation. Yet populations and genomes
are big places, so mutations are common at these levels.
A) Your autosomal genome is ∼ 3 billion base pairs long (3 × 10^9).
You have two copies, the one you received from your mum and one
from your dad. What is the average (i.e. the expected) number of
mutations that occurred in the transmission from your mum and your
dad to you?
B) The current human population size is ∼ 7 billion individuals.
How many times, at the level of the entire human population, is a
single base-pair mutated in the transmission from one generation to
the next?
Levels of heterozygosity maintained as a balance between mutation and
drift. Looking backwards in time from one generation to the previous
generation, we are going to say that two alleles which have the
same parental allele (i.e. find their common ancestor) in the preceding
generation have coalesced, and refer to this event as a coalescent event.
The probability that our pair of randomly sampled alleles have
coalesced in the preceding generation is 1/(2N ), and the probability
that our pair of alleles fail to coalesce is 1 − 1/(2N ).
The probability that a mutation changes the identity of the trans-
mitted allele is µ per generation. So the probability of no mutation
occurring is (1 − µ). We’ll assume that when a mutation occurs it cre-
ates some new allelic type which is not present in the population. This
assumption (commonly called the infinitely-many-alleles model) makes
the math slightly cleaner, and also is not too bad an assumption bi-
ologically. See Figure 4.7 for a depiction of mutation-drift balance in
this model over the generations.
This model lets us calculate when our two alleles last shared a
common ancestor and whether these alleles are identical as a result of
failing to mutate since this shared ancestor. For example, we can work
out the probability that our two randomly sampled alleles coalesce 2
generations in the past (i.e. they fail to coalesce in generation 1 and
then coalesce in generation 2), and that they are identical as
$$\frac{1}{2N}\left(1 - \frac{1}{2N}\right)(1 - \mu)^{4} \tag{4.4}$$
Note the power of 4 is because our two alleles have to have failed to
mutate through 2 meioses each.
More generally, the probability that our alleles coalesce in generation
t + 1 (counting backwards in time) and are identical due to no
mutation to either allele in the subsequent generations is
$$P(\text{coal. in } t+1 \text{ \& no mutations}) = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{t}(1 - \mu)^{2(t+1)} \tag{4.5}$$
To make this slightly easier on ourselves let’s further assume that
t ≈ t + 1 and so rewrite this as:
$$P(\text{coal. in } t+1 \text{ \& no mutations}) \approx \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{t}(1 - \mu)^{2t} \tag{4.6}$$
This gives us the approximate probability that two alleles will
coalesce in the (t + 1)th generation without either having mutated. In
general, we may not know when two alleles may coalesce: they could
coalesce in generation t = 1, t = 2, . . ., and so on. Thus, to
calculate the probability that two alleles coalesce in any generation
before mutating, we can write:
$$\begin{aligned}
P(\text{coal. in any generation \& no mutations}) &\approx P(\text{coal. in } t=1 \text{ \& no mutations}) + P(\text{coal. in } t=2 \text{ \& no mutations}) + \ldots \\
&= \sum_{t=1}^{\infty} P(\text{coal. in } t \text{ generations \& no mutation})
\end{aligned} \tag{4.7}$$
an example of using the Law of Total Probability, see Appendix eqn
(A.12), combined with the fact that coalescing in a particular
generation is mutually exclusive with coalescing in a different
generation.
While we could calculate a value for this sum given N and µ, it’s
difficult to get a sense of what’s going on with such a complicated
expression. Here, we turn to a common approximation in population
genetics (and all applied mathematics), where we assume that
1/(2N) ≪ 1 and µ ≪ 1. This allows us to approximate the geometric
decay as an exponential decay (see Appendix eqn (A.2)). Then, the
probability two alleles coalesce in generation t + 1 and don’t mutate
can be written as:
\begin{align}
P(\text{coal. in } t+1 \text{ \& no mutations}) &\approx \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{t}(1 - \mu)^{2t} \tag{4.8} \\
&\approx \frac{1}{2N} e^{-t/(2N)} e^{-2\mu t} \tag{4.9} \\
&= \frac{1}{2N} e^{-t(2\mu + 1/(2N))} \tag{4.10}
\end{align}
Then we can approximate the summation by an integral, giving us:
$$\frac{1}{2N}\int_{0}^{\infty} e^{-t(2\mu + 1/(2N))}\, dt = \frac{1/(2N)}{1/(2N) + 2\mu} = \frac{1}{1 + 4N\mu} \tag{4.11}$$
The equation above gives us the probability that our two alleles
coalesce at some point in time, and do not mutate before reaching
their common ancestor. Equivalently, this can be thought of as the
probability our two alleles coalesce before mutating, i.e. that they
are homozygous.
Then, the complementary probability that our pair of alleles are
non-identical (or heterozygous) is simply one minus this. The following
equation gives the equilibrium heterozygosity in a population at
equilibrium between mutation and drift:
$$H = \frac{4N\mu}{1 + 4N\mu} \tag{4.12}$$

This result was derived by Kimura and Crow (1964) and Malécot (1948)
(see Malécot, 1969, for an English translation, the lack of earlier
translation meant this result was missed). Technically we’re assuming
that every new mutation creates a new allele, the so-called “infinitely
many alleles” model, otherwise our pair of sequences could be identical
due to repeat or back mutation. See this GENETICS blog post and Ewens
(2016) for a nice discussion of the history.

The compound parameter 4N µ, the population-scaled mutation rate,
will come up a number of times so we’ll give it its own name:
$$\theta = 4N\mu \tag{4.13}$$
So all else being equal, species with larger population sizes should
have proportionally higher levels of neutral polymorphism.
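To see how little is lost in the approximations leading to eqn (4.12), we can sum eqn (4.5) over generations numerically and compare one minus that sum with θ/(1 + θ). This is a minimal Python sketch; the function names, the truncation point `max_t`, and the example values of N and µ are our own assumptions, chosen only for illustration.

```python
def identity_exact(n, mu, max_t=200_000):
    """Sum eqn (4.5) over t = 0, 1, ..., max_t - 1.

    This is the probability that a pair of alleles coalesces before either
    lineage mutates. max_t is an assumed truncation point; the terms decay
    geometrically, so the neglected tail is tiny for the values used below.
    """
    p_coal = 1.0 / (2 * n)
    no_coal = 1.0            # (1 - 1/(2N))^t, starting at t = 0
    no_mut = (1 - mu) ** 2   # (1 - mu)^(2(t+1)), starting at t = 0
    total = 0.0
    for _t in range(max_t):
        total += p_coal * no_coal * no_mut
        no_coal *= 1 - p_coal
        no_mut *= (1 - mu) ** 2
    return total


def het_equilibrium(n, mu):
    """Equilibrium heterozygosity theta / (1 + theta), eqn (4.12)."""
    theta = 4 * n * mu
    return theta / (1 + theta)


# Assumed illustrative values, not taken from any particular species.
n, mu = 1000, 1e-4
print(f"1 - exact sum of eqn (4.5): {1 - identity_exact(n, mu):.5f}")
print(f"eqn (4.12):                 {het_equilibrium(n, mu):.5f}")
```

Keeping N fixed and raising µ toward values that are no longer small should make the two numbers drift apart, which is a useful reminder of when the approximation is safe.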
Question 4. The sequence-level heterozygosity in Capsella grandiflora
(grand shepherd’s purse) is ∼ 2% per base. Assuming a mutation rate of
10^{−9} bp^{−1} per generation, what is your estimate of the
population size of C. grandiflora?
4.1.2 The effective population size

The effective population size (Ne) is the population size that would
result in the same rate of drift in an idealized population of constant
size (following our modeling assumptions) as that observed in our true
population.

In practice, populations rarely conform to our assumptions of being
constant in size with low variance in reproductive success. Real
populations experience dramatic fluctuations in size, and there is
often high variance in reproductive success. Thus rates of drift in
natural populations are often a lot higher than the census population
size would imply. See Figure 4.8 for a depiction of a repeatedly
bottlenecked population losing diversity at a fast rate.

Figure 4.8: Loss of heterozygosity over time in a bottlenecking
population. A diploid population of 10 individuals, that bottlenecks
down to three individuals repeatedly. In the first generation, I colour
every allele a different colour so we can track their descendants.
There are no new mutations. Code here.
To cope with this discrepancy, population geneticists often invoke
the concept of an effective population size (Ne). In many situations
(but not all), departures from model assumptions can be captured by
substituting Ne for N.
If population sizes vary rapidly, we can (if certain conditions are
met) replace our population size by the harmonic mean population size.
Consider a diploid population of variable size, whose size is N_t
t generations into the past. The probability our pairs of alleles have
not coalesced by generation t is given by
$$\prod_{i=1}^{t}\left(1 - \frac{1}{2N_i}\right) \tag{4.14}$$
Note that this simply collapses to our original expression
$\left(1 - \frac{1}{2N}\right)^{t}$ if N_i is constant. Under this
model, the rate of loss of heterozygosity in this population is
equivalent to a population of effective size
$$N_e = \frac{1}{\frac{1}{t}\sum_{i=1}^{t}\frac{1}{N_i}}. \tag{4.15}$$
This is the harmonic mean of the varying population size.2
Thus our effective population size, the size of an idealized constant
population which matches the rate of genetic drift, is the harmonic
mean of the true population size over time. The harmonic mean is very
strongly affected by small values, such that if our population size is
one million 99% of the time but drops to 1000 every hundred or so
generations, Ne will be much closer to 1000 than a million.

2 To see this, note that if 1/(N_i) is small, then we can approximate
(4.14) using the exponential approximation:
$$\prod_{i=1}^{t}\exp\left(-\frac{1}{2N_i}\right) = \exp\left(-\sum_{i=1}^{t}\frac{1}{2N_i}\right). \tag{4.16}$$
When we put the product inside the exponent, it becomes a sum. We can
also write the probability of not coalescing by generation t in a
population of constant size (Ne) as an exponential, so that it takes
the same form as the expression above on the right. Comparing the
exponent in the two cases, we see
$$\frac{t}{2N_e} = \sum_{i=1}^{t}\frac{1}{2N_i} \tag{4.17}$$
So that if we want a constant effective population size (Ne) that has
the same rate of loss of heterozygosity as our variable population, we
need to rearrange and solve this equation to give (4.15).
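The harmonic mean in eqn (4.15) is easy to compute directly. The minimal Python sketch below does so for the example in the text, under the added assumption that the crash to 1000 happens in exactly one generation out of every hundred. It then checks that a constant population of size Ne gives nearly the same probability of no coalescence as the product in eqn (4.14). The function names are our own.

```python
def harmonic_mean_ne(sizes):
    """Effective size from a list of per-generation sizes, eqn (4.15)."""
    return len(sizes) / sum(1.0 / n_i for n_i in sizes)


def prob_no_coalescence(sizes):
    """Probability a pair of alleles has not coalesced, eqn (4.14)."""
    prob = 1.0
    for n_i in sizes:
        prob *= 1 - 1 / (2 * n_i)
    return prob


# Assumed schedule: 99 generations at one million, then one generation at 1000.
sizes = [1_000_000] * 99 + [1_000]
ne = harmonic_mean_ne(sizes)

exact = prob_no_coalescence(sizes)
constant = (1 - 1 / (2 * ne)) ** len(sizes)

print(f"harmonic mean Ne: {ne:.0f}")
print(f"P(no coalescence), varying sizes: {exact:.8f}")
print(f"P(no coalescence), constant Ne:   {constant:.8f}")
```

Under this schedule the single crash generation contributes about ten times as much to the sum of 1/N_i as the other 99 generations combined, which is why Ne lands more than ten-fold below the usual population size.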
Figure 4.9: High variance in reproductive success increases the rate of
genetic drift. A diploid population
of 10 individuals, where the circled
individuals have much higher repro-
ductive success. In the first generation
I colour every allele a different colour
so we can track their descendants,
there are no new mutations. Code
here.
Variance in reproductive success will also affect our effective
population size. Even if our population has a large constant size of N
individuals, if only a small proportion of them get to reproduce, then
the rate of drift will reflect this much smaller number of reproducing
individuals. See Figure 4.9 for a depiction of the higher rate of drift
in a population where there is high variance in reproductive success.
To see one example of this, consider the case where NF females
get to reproduce and NM males get to reproduce. While every individual
has a mother and a father, not every individual gets to be a parent. In
practice, in many animal species far more females get to reproduce
than males, i.e. NM < NF , as a few males get many mating
opportunities and many males get no/few mating opportunities (see
Janicke et al., 2016, for a broad analysis, and note that there are
certainly many exceptions to this general pattern). When our two
alleles pick an ancestor, 25% of the time our alleles were both in a
female ancestor, in which case they coalesce (are IBD) with probability
1/(2NF ), and 25% of the time they are both in a male ancestor, in
which case they coalesce with probability 1/(2NM ). The remaining 50%
of the time, our alleles trace back to two individuals of different
sexes in the prior generation and so cannot coalesce. Therefore, our
probability of coalescence in the preceding generation is
$$\frac{1}{4}\left(\frac{1}{2N_M}\right) + \frac{1}{4}\left(\frac{1}{2N_F}\right) \tag{4.18}$$
i.e. the rate of coalescence depends on the harmonic mean of the two
sexes’ population sizes. Equating this to 1/(2Ne) we find
$$N_e = \frac{4 N_F N_M}{N_F + N_M} \tag{4.19}$$
Thus if reproductive success is very skewed in one sex (e.g. NM ≪
N/2), our effective population size will be much reduced as a result.
For more on how different evolutionary forces affect the rate of
genetic drift, and their impact on the effective population size, see
Charlesworth (2009).

Figure 4.10: Male Hamadryas baboons. Up to ten females live in a
harem with a single male. Brehm’s Tierleben (Brehm’s animal life).
Brehm, A.E. 1893. Image from the Biodiversity Heritage Library.
Contributed by University of Illinois Urbana-Champaign. Not in
copyright.
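Eqns (4.18) and (4.19) can be checked against each other numerically. In this minimal Python sketch the function names and the breeding numbers (100 females with either 100 or 10 males) are our own illustrative assumptions.

```python
def coalescence_prob_two_sexes(n_f, n_m):
    """Per-generation coalescence probability for a pair of alleles, eqn (4.18)."""
    return 0.25 * (1 / (2 * n_m)) + 0.25 * (1 / (2 * n_f))


def ne_two_sexes(n_f, n_m):
    """Effective population size with N_F breeding females and N_M breeding males, eqn (4.19)."""
    return 4 * n_f * n_m / (n_f + n_m)


# Assumed illustrative breeding numbers.
for n_f, n_m in [(100, 100), (100, 10)]:
    ne = ne_two_sexes(n_f, n_m)
    # By construction, 1/(2 Ne) should equal the coalescence probability in eqn (4.18).
    print(n_f, n_m, round(ne, 2), 1 / (2 * ne), coalescence_prob_two_sexes(n_f, n_m))
```

With equal numbers of breeding males and females, Ne is simply the total number of breeders. Once one sex is much rarer among the breeders, Ne is pulled down toward roughly four times the number of breeders of that rarer sex.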
Question 5. You are studying a population of 500 male and 500
female Hamadryas baboons. Assume that all of the females but only
1/10 of the males get to mate:
A) What is the effective population size for the autosome?
B) Do you expect the ratio of X-chromosome to autosomal diversity
to be higher or lower in this species compared to a species where the
sexes have more similar variance in reproductive success? Explain the
intuition behind your answer.
4.2 The Coalescent and patterns of neutral diversity

“Life can only be understood backwards; but it must be lived for-
wards” – Kierkegaard
Pairwise Coalescent time distribution and the number of pairwise
differences. Thinking back to our calculations we made about the
loss of neutral heterozygosity and equilibrium levels of diversity (in
Sections 4.1 and 4.1.1), you’ll note that we could first specify which
generation a pair of sequences coalesce in, and then calculate some
properties of heterozygosity based on that. That’s because neutral
mutations do not affect the probability that an individual transmits
an allele, and so don’t affect the way in which we can trace ancestral
lineages back through the generations.
As such, it will often be helpful to consider the time to the common
ancestor of a pair of sequences, and then think of the impact of that
time to coalescence on patterns of diversity. See Figure 4.11 for an
example of this.

In discussing the coalescent we’ll be making use of random variables,
e.g. number of generations back to the common ancestor of a pair of
sequences is a random variable. We’ll also use the expectation of
random variables, e.g. the average number of generations back to the
common ancestor of a pair of sequences. Have a look at sections A.2.1
and A.2.3.
Figure 4.11: A simple demonstration of the coalescent process. The
simulation consists of a diploid population of 10 individuals (20
alleles). In each generation, each individual is equally likely to be
the parent of an offspring (and the allele transmitted is indicated by
a light grey line). We track a pair of alleles, chosen in the present
day, back 14 generations until they find a common ancestor. Code here.

The probability that a pair of alleles have failed to coalesce in t
generations and then coalesce in the (t + 1)th generation back is
$$P(T_2 = t + 1) = \frac{1}{2N}\left(1 - \frac{1}{2N}\right)^{t} \tag{4.20}$$
For example, the probability that a pair of sequences coalesce three
generations back is the probability that they fail to coalesce in
generation 1 and 2, which is (1 − 1/(2N)) × (1 − 1/(2N)), multiplied by
the probability that they find a common ancestor, i.e. coalesce, in the
third generation, which happens with probability 1/(2N).
From the form of eqn (4.20) we can see that the coalescent time
of our pair of alleles is a Geometrically distributed random variable,3
where the probability of success is p = 1/(2N). The waiting time for
a pair of lineages to coalesce is like the number of coin tosses needed
to throw a head on a coin where the probability of a head is 1/(2N),
i.e. if the population is large we might be waiting for a long time
for our pair to coalesce. We’ll denote this geometric distribution by
T2 ∼ Geo(1/(2N)). The expected (i.e. the mean over many replicates)
coalescent time of a pair of alleles is then
$$E(T_2) = 2N \tag{4.21}$$
generations. This form of the expectation follows from the fact that
the mean of a geometric random variable is 1/p.

3 See Appendix eqn (A.29) and surrounding text for more on the
Geometric distribution.
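The geometric waiting time can be simulated directly by stepping back one generation at a time and asking whether the pair coalesces. This is a minimal Python sketch of that idea; the function name, seed, population size, and number of replicates are our own assumptions.

```python
import random


def sample_coalescence_time(n, rng):
    """Generations back until a pair of alleles shares a parental allele.

    In each generation back the pair coalesces with probability 1/(2N),
    so the returned time is geometrically distributed, as in eqn (4.20).
    """
    p_coal = 1.0 / (2 * n)
    t = 1
    while rng.random() >= p_coal:  # no coalescence this generation; step back again
        t += 1
    return t


rng = random.Random(42)  # fixed seed so the sketch is reproducible
n = 50
times = [sample_coalescence_time(n, rng) for _ in range(20_000)]
mean_time = sum(times) / len(times)

print(f"simulated mean T2: {mean_time:.1f} generations")
print(f"expectation 2N:    {2 * n} generations")
```

Plotting a histogram of `times` would also show how skewed the distribution is: most pairs coalesce sooner than 2N generations back, while a minority take several times longer.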
Conditional on a pair of alleles coalescing t generations ago, there
are 2t generations in which a mutation could occur. See Figure 4.12
for an example. If the per generation mutation rate is µ, then the
expected number of mutations between a pair of alleles coalescing
t generations ago is 2tµ (the alleles have gone through a total of 2t
meioses since they last shared a common ancestor).

Figure 4.12: The ancestral lineages of a pair of sequences coalesce t
generations in the past. There are 2t generations that mutations could
occur in that would be differences between our sequences. Three
mutations have occurred in this time changing the ancestral sequence
(AGTTT) to the sequences at the bottom of the picture.
[Diagram of the two lineages: ancestral sequence AGTTT; sequences AGTGT
and AGGTT along the lineages; sampled sequences ACTGT and AGGTT at the
bottom; coalescence time t.]

So we can write the expected number of mutations (S2 ) separating
two alleles drawn at random from the population as
\begin{align}
E(S_2) &= \sum_{t=0}^{\infty} E(S_2 \mid T_2 = t) P(T_2 = t) \nonumber \\
&= \sum_{t=0}^{\infty} 2\mu t P(T_2 = t) \nonumber \\
&= 2\mu E(T_2) \nonumber \\
&= 4\mu N \tag{4.22}
\end{align}
this makes use of the law of total expectation (see Appendix eqn
(A.27)) to average over which generation our pair of sequences coalesce
in. We’ll assume that mutation is rare enough that it never happens
at the same basepair twice, i.e. no multiple hits, such that we get to
see all of the mutation events that separate our pair of sequences.
This assumption that repeat mutation is vanishingly rare at a basepair
is called the infinitely-many-sites assumption, which should hold if
N µBP ≪ 1, where µBP is the mutation rate per base pair. Thus the
number of mutations separating a pair of sequences is the observed
number of differences between them. In the previous chapter we
denote the observed number of pairwise differences at putatively
neutral sites separating a pair of sequences as π (we usually average
this over a number of pairs of sequences for a region). Therefore,
under our simple, neutral, constant population-size model we expect
E(π) = 4N µ = θ (4.23)
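As a numerical check on eqn (4.22), we can sum 2µt P(T2 = t) over generations using the geometric probabilities of eqn (4.20) and compare the total with 4Nµ. This is a minimal Python sketch; the function name, the truncation rule (stopping after 200N generations), and the example values of N and µ are our own assumptions.

```python
def expected_pairwise_differences(n, mu, max_t=None):
    """Truncated version of the sum in eqn (4.22).

    Uses P(T2 = t) = (1/(2N)) (1 - 1/(2N))^(t-1) for t = 1, 2, ...
    The t = 0 term of the sum contributes nothing, so we start at t = 1.
    """
    p_coal = 1.0 / (2 * n)
    if max_t is None:
        max_t = 200 * n  # assumed cut-off; the neglected tail is of order exp(-100)
    total = 0.0
    prob_t = p_coal  # P(T2 = 1)
    for t in range(1, max_t + 1):
        total += 2 * mu * t * prob_t
        prob_t *= 1 - p_coal
    return total


# Assumed illustrative values.
n, mu = 1000, 1e-8
series = expected_pairwise_differences(n, mu)
theta = 4 * n * mu

print(f"sum over generations: {series:.6e}")
print(f"theta = 4 N mu:       {theta:.6e}")
```

Because µ factors out of the sum, the check really only exercises E(T2) = 2N; the mutation rate just rescales coalescent time into expected differences.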
So we can get an empirical estimate of θ from π, let’s call this
$\hat{\theta}_\pi$, by setting $\hat{\theta}_\pi = \pi$, i.e. our
observed level of pairwise genetic diversity. If we have an independent
estimate of µ, then from setting $\pi = \hat{\theta}_\pi = 4N\mu$
we can furthermore obtain an estimate of the population size
N that is consistent with our levels of neutral polymorphism. If we
estimate the population size this way, we should call it the effective
coalescent population size (Ne ). It’s best to think about Ne estimated
from neutral diversity as a long-term effective population size for
the species, but there are many caveats that come along with that
assumption. For example, past bottlenecks and population expansions
are all subsumed into a single number and so this estimated Ne may
not be very representative of the population size at any time. That
said, it’s not a bad place to start when thinking about the rate of genetic drift for neutral diversity in our population over long time-periods.4

4 Up to this point we’ve been describing only neutral processes, however, selection can also alter levels of polymorphism. For example, if some synonymous sites directly experience selection, then even if we use π calculated for synonymous changes we may underestimate the coalescent effective population size. As we’ll see later in the notes, selection at linked sites can also impact neutral diversity. As such, if we can, we may want to use genomic sites subject to the weakest selective constraints, and also far from gene-dense or otherwise very constrained regions of the genome, to estimate Ne from π. But even then caution is warranted.

Let’s take a moment to distinguish our expected heterozygosity (eqn. 4.12) from our expected number of pairwise differences (π). Our expected heterozygosity is the probability that two alleles at a locus, sampled from a population at random, are different from each other. If one or more mutations have occurred since a pair of alleles last shared a common ancestor, then our sequences will be different from each other. On the other hand, our π measure keeps track of the average total number of differences between our sequences. As such, π is often a more useful measure, as it records the number of differences between the sequences, not just whether they are different from each other (however, for certain types of loci, e.g. microsatellites, heterozygosity is often used as we cannot usually count up the minimum number of mutations in a sensible way). In the case where our locus is a single basepair, the two measures will usually be close to one another, as H ≈ θ for small values of θ. For example, comparing two sequences at random in humans, π ≈ 1/1000 per basepair, and the probability that a specific base pair differs between two sequences is ≈ 1/1000. However, these two quantities start to differ from each other when we consider regions with higher mutation rates. For example, if we consider a 10kb region, our mutation rate will be 10,000 times larger than that of a single base pair. For this length of sequence the probability that two randomly chosen haplotypes differ is quite different from the number of mutational differences between them. (Try a mutation rate of $10^{-8}$ per base and a population size of 10,000 in our calculations of E[π] and H to see this.)
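To make this comparison concrete, here is a minimal Python sketch of the suggested calculation. It assumes that the expected heterozygosity of eqn (4.12) takes the standard mutation-drift form H = θ/(1 + θ), which is consistent with H ≈ θ for small θ; the function names are ours and purely illustrative.

```python
def theta(n_pop, mu):
    """Population-scaled mutation rate, theta = 4 * N * mu (eqn 4.23)."""
    return 4 * n_pop * mu


def expected_heterozygosity(th):
    """Assumed form of the expected heterozygosity: H = theta / (1 + theta)."""
    return th / (1 + th)


N = 10_000      # population size suggested in the text
MU_BP = 1e-8    # per-base mutation rate suggested in the text

for length_bp in (1, 10_000):
    # The mutation rate of a region is the per-base rate times its length.
    th = theta(N, MU_BP * length_bp)
    print(f"{length_bp:>6} bp: E[pi] = {th:.4g}, H = {expected_heterozygosity(th):.4g}")
```

For a single base pair the two numbers are nearly identical, while for the 10kb region E[π] is a count of differences that can exceed one, whereas H is a probability and so can never do so.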
Question 6. Robinson et al. (2016) found that the endangered Californian Channel Island fox on San Nicolas had very low levels of diversity ($\pi = 0.000014\,\mathrm{bp}^{-1}$) compared to its close relative the California mainland gray fox ($0.0012\,\mathrm{bp}^{-1}$).
A) Assuming a mutation rate of $2 \times 10^{-8}$ per bp, what effective population sizes do you estimate for these two populations?
B) Why is the effective population size of the Channel Island fox so low? [Hint: quickly google Channel island foxes to read up on their history, also to see how ridiculously cute they are.]

Figure 4.13: Gray Fox, Urocyon cinereoargenteus. Diseases and enemies of poultry. Pearson and Warren. (1897) Image from the Biodiversity Heritage Library. Contributed by University of California Libraries. Not in copyright.
Question 7. In your own words describe why the coalescent time

of a pair of lineages scales linearly with the (effective) population size.
More details on the pairwise coalescent and the randomness of mutation. We found that our pairwise coalescent times followed a Geometric distribution, eqn (4.20). However, that assumes discrete generations and we’ll often want to think about populations that lack discrete generations (i.e. individuals reproducing at random times with some mean generation time). Using our exponential approximation, we can see that eqn (4.20) is

\begin{equation}
\approx \frac{1}{2N} e^{-t/(2N)} \tag{4.24}
\end{equation}

and so we can think of the coalescent time as a continuous random variable, i.e. we could say that the coalescent time of a pair of sequences ($T_2$) is approximately exponentially distributed with a rate 1/(2N), i.e. $T_2 \sim \mathrm{Exp}(1/(2N))$. Formally we can do this by taking the limit of the discrete process more carefully. See Appendix eqn (A.33) for more on exponential random variables.
We’ve derived the expected number of differences between a pair of
sequences and talked about how variable the coalescent time is for a
pair of sequences. The mutation process is also very variable; even if
two sequences coalesce in the very distant past by chance, they may
still be identical in the present if there was no mutation during that
time.
Conditional on the coalescent time t, the probability that our pair of alleles are separated by $S_2 = j$ mutations since they last shared a common ancestor is binomially distributed

\begin{equation}
P(S_2 = j \mid T_2 = t) = \binom{2t}{j} \mu^{j} (1 - \mu)^{2t - j} \tag{4.25}
\end{equation}

i.e. mutations happen in j generations and do not happen in 2t − j generations (with $\binom{2t}{j}$ ways this combination of events can possibly happen). See Appendix eqn (A.28) for discussion of the binomial distribution. Assuming that µ ≪ 1 and that 2t − j ≈ 2t, then we can approximate the probability that we have $S_2 = j$ mutations as a Poisson distribution:

\begin{equation}
P(S_2 = j \mid T_2 = t) \approx \frac{(2\mu t)^{j} e^{-2\mu t}}{j!} \tag{4.26}
\end{equation}

i.e. a Poisson with mean 2µt. This is an example of taking the binomial distribution to its Poisson distribution limit, see Appendix eqn (A.31) for more details. We’ll not make much use of this result, but it is very useful in thinking about how to simulate the process of mutation.
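As a rough illustration of how these two layers of randomness combine, the sketch below draws a geometric coalescent time for a pair of alleles and then a Poisson number of mutations with mean 2µt, and checks that the average number of differences is close to 4Nµ. The parameter values are arbitrary, the samplers are simple textbook methods chosen to keep the example free of dependencies, and all function names are ours.

```python
import math
import random


def sample_geometric(p, rng):
    """Number of generations until the first success (support 1, 2, ...)."""
    u = 1.0 - rng.random()  # uniform on (0, 1]
    return max(1, math.ceil(math.log(u) / math.log1p(-p)))


def sample_poisson(mean, rng):
    """Knuth's multiplication method; adequate for the modest means used here."""
    limit = math.exp(-mean)
    k, prod = 0, rng.random()
    while prod > limit:
        k += 1
        prod *= rng.random()
    return k


def mean_pairwise_differences(n_pop, mu, reps, seed=1):
    """Average S_2 over many independent pairs of alleles."""
    rng = random.Random(seed)
    total = 0
    for _ in range(reps):
        t = sample_geometric(1 / (2 * n_pop), rng)  # coalescent time T_2
        total += sample_poisson(2 * mu * t, rng)    # mutations on 2t generations
    return total / reps


if __name__ == "__main__":
    n_pop, mu = 1000, 1e-3  # hypothetical values giving theta = 4
    print("simulated mean:", mean_pairwise_differences(n_pop, mu, reps=20_000))
    print("expected 4*N*mu:", 4 * n_pop * mu)
```

Individual replicates vary a great deal (many pairs show no differences at all, a few show many), which is the variability described above; only the average settles near θ.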
4.3 The coalescent process of a sample of alleles.
Usually we are not just interested in pairs of alleles, or the average

pairwise diversity. Generally we are interested in the properties of di-
versity in samples of a number of alleles drawn from the population.
Instead of just following a pair of lineages back until they coalesce, we
can follow the history of a sample of alleles back through the popula-
tion.
Consider first sampling three alleles at random from the population.
The probability that all three alleles choose exactly the same ancestral
allele one generation back is $1/(2N)^2$. If N is reasonably large, then this
is a very small probability. As such, it is very unlikely that our three
alleles coalesce all at once, and in a moment we’ll see that it is safe to
ignore such unlikely events.
Figure 4.14: A simple simulation of the coalescent process for three lineages. We track the ancestry of three modern-day alleles, the first pair (blue and purple) coalesce four generations back, after which there are only two independent lineages we are tracking. This pair then coalesces twelve generations in the past. Note that different random realizations of this process will differ from each other a lot. The $T_{MRCA}$ is $T_3 + T_2$. The total time in the tree is $T_{tot} = 3T_3 + 2T_2 = 25$ generations. Code here.
[Figure labels: $T_{MRCA}$ (= 11 gens), $T_2$ (= 8 gens), $T_3$ (= 3 gens); horizontal axis: Generations.]
The probability that a specific pair of alleles find a common ancestor in the preceding generation is still 1/(2N). There are three possible pairs of alleles, so the probability that no pair finds a common ancestor in the preceding generation is

\begin{equation}
\left(1 - \frac{1}{2N}\right)^{3} \approx \left(1 - \frac{3}{2N}\right) \tag{4.27}
\end{equation}

In making this approximation we are multiplying out the left-hand side and ignoring terms of $1/N^2$ and higher (a Taylor approximation, see Appendix eqn (A.2)). See Figure 4.14 for a random realization of this process.
More generally, when we sample i alleles there are $\binom{i}{2}$ pairs,5 i.e. i(i − 1)/2 pairs. Thus, the probability that no pair of alleles in a sample of size i coalesces in the preceding generation is

\begin{equation}
\left(1 - \frac{1}{2N}\right)^{\binom{i}{2}} \approx \left(1 - \frac{\binom{i}{2}}{2N}\right) \tag{4.28}
\end{equation}

while the probability any pair coalesces is $\approx \binom{i}{2}/2N$, again using eqn (A.2).

5 said as “i choose 2”
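A quick numerical check of how good the approximation in eqn (4.28) is can be sketched as follows. The values of i and N are arbitrary choices of ours, and the function names are purely illustrative.

```python
from math import comb


def prob_no_coalescence_lhs(i, n_pop):
    """Left-hand side of eqn (4.28): (1 - 1/(2N)) raised to the number of pairs."""
    return (1 - 1 / (2 * n_pop)) ** comb(i, 2)


def prob_no_coalescence_approx(i, n_pop):
    """Right-hand side of eqn (4.28): 1 - (i choose 2) / (2N)."""
    return 1 - comb(i, 2) / (2 * n_pop)


for i, n_pop in [(3, 10_000), (10, 10_000), (100, 10_000)]:
    lhs = prob_no_coalescence_lhs(i, n_pop)
    approx = prob_no_coalescence_approx(i, n_pop)
    print(f"i = {i:>3}, N = {n_pop}: lhs = {lhs:.6f}, approx = {approx:.6f}, "
          f"difference = {lhs - approx:.2e}")
```

The difference is tiny while the sample is small relative to N, and grows as the number of pairs becomes comparable to 2N, which is the regime where the approximation should not be trusted.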
We can ignore the possibility that more than a single pair of alleles (e.g. a triplet) coalesces in the same generation, as such events involve terms of $1/N^2$ and higher and so are vanishingly rare. Obviously in reasonable sample sizes there are many more triples ($\binom{i}{3}$) and higher order combinations than there are pairs ($\binom{i}{2}$), but if i ≪ N then we are safe to ignore these terms.
When there are i alleles, the probability that we wait until the t + 1 generation before any pair of alleles coalesces is

\begin{equation}
P(T_i = t + 1) = \frac{\binom{i}{2}}{2N} \left(1 - \frac{\binom{i}{2}}{2N}\right)^{t} \tag{4.29}
\end{equation}
Thus the waiting time to the first coalescent event while there are i lineages is a geometrically distributed random variable6 with probability of success $p = \binom{i}{2}/2N$, which we denote by

\begin{equation}
T_i \sim \mathrm{Geo}\left(\binom{i}{2}/2N\right). \tag{4.30}
\end{equation}

6 see Appendix eqn (A.29).

The mean waiting time till any pair within our sample coalesces is

\begin{equation}
E(T_i) = \frac{2N}{\binom{i}{2}} \tag{4.31}
\end{equation}

which again follows from the mean of a geometric random variable being 1/p. After a pair of alleles first finds a common ancestral allele some number of generations back in the past, we only have to keep track of that common ancestral allele for the pair when looking further into the past. Thus when a pair of alleles in our sample of i alleles coalesces, we then switch to having to follow i − 1 alleles back in time. Then when a pair of these i − 1 alleles coalesce, we then only have to follow i − 2 alleles back. This process continues until we coalesce back to a sample of two, and from there to a single most recent common ancestor (MRCA).

To see the continuous time version of this, note that (4.29) is

\begin{equation}
\approx \frac{\binom{i}{2}}{2N} \exp\left(-\frac{\binom{i}{2}}{2N} t\right) \tag{4.32}
\end{equation}

The waiting time $T_i$ to the first coalescent event in a sample of i alleles is thus exponentially distributed with rate $\binom{i}{2}/2N$, i.e. $T_i \sim \mathrm{Exp}\left(\binom{i}{2}/2N\right)$.
Simulating a coalescent genealogy. To simulate a coalescent genealogy at a locus for a sample of n alleles we therefore simply follow the following algorithm:
1. Set i = n.
2. Simulate a random variable to be the time $T_i$ to the next coalescent event from $T_i \sim \mathrm{Exp}\left(\binom{i}{2}/2N\right)$.
3. Choose a pair of alleles to coalesce at random from all possible pairs.
4. Set i = i − 1.
5. Continue looping steps 2-4 until i = 1, i.e. the most recent common ancestor of the sample is found.
By following this algorithm we are generating realizations of the genealogy of our sample.
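The five steps above can be sketched in a few lines of Python. This is only one possible way to write the algorithm, not the code used for the figures in these notes; the function name, the way lineages are labelled, and the returned data structures are all our own illustrative choices.

```python
import random
from math import comb


def simulate_coalescent(n, n_pop, seed=None):
    """Simulate one coalescent genealogy for a sample of n alleles.

    Returns (events, waiting_times), where events is a list of
    (time, child_a, child_b, parent) tuples ordered back in time and
    waiting_times maps the number of lineages i to the simulated T_i.
    """
    rng = random.Random(seed)
    lineages = list(range(n))  # sampled alleles are labelled 0 .. n-1
    next_label = n             # ancestral lineages get fresh labels
    time = 0.0
    events, waiting_times = [], {}

    i = n                                        # step 1
    while i > 1:                                 # step 5: stop at the MRCA
        rate = comb(i, 2) / (2 * n_pop)
        t_i = rng.expovariate(rate)              # step 2: T_i ~ Exp(rate)
        waiting_times[i] = t_i
        time += t_i
        a, b = rng.sample(lineages, 2)           # step 3: a random pair
        lineages.remove(a)
        lineages.remove(b)
        lineages.append(next_label)              # the pair's common ancestor
        events.append((time, a, b, next_label))
        next_label += 1
        i -= 1                                   # step 4
    return events, waiting_times


if __name__ == "__main__":
    events, times = simulate_coalescent(n=4, n_pop=1000, seed=42)
    for when, a, b, parent in events:
        print(f"{when:10.1f} generations ago: {a} and {b} coalesce into {parent}")
    print("T_MRCA =", sum(times.values()))
    print("T_tot  =", sum(i * t for i, t in times.items()))
```

Running this with different seeds gives a feel for how much genealogies vary from one realization to the next, even with n and N held fixed.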
4.3.1 Expected properties of coalescent genealogies and mutations.
Figure 4.15: A simple coalescent tree from a single coalescent simulation, tracing the genealogy of 4 alleles with mutational changes marked with dashes showing transitions away from the MRCA sequence (AGTTT). The $T_{MRCA}$ is $T_4 + T_3 + T_2$. The total time in the tree is $T_{tot} = 4T_4 + 3T_3 + 2T_2 = 54$ generations. Code here.
[Figure labels: $T_{MRCA}$ (= 18 gens), $T_2$ (= 2 gens), $T_3$ (= 14 gens), $T_4$ (= 2 gens); horizontal axis: Generations. Sequences shown on the tree: AGTTT (the MRCA), AGTTC, AGGTC, AGTGT, ACTGT and CGTTT.]
The expected time to the most recent common ancestor. We will first consider the time to the most recent common ancestor of the entire sample ($T_{MRCA}$). This is

\begin{equation}
T_{MRCA} = \sum_{i=n}^{2} T_i \tag{4.33}
\end{equation}

generations back, where we are summing from i = n alleles counting backwards to i = 2 alleles (see Figure 4.15 for example). As our coalescent times for different i are independent, the expected time to the most recent common ancestor is

\begin{equation}
E(T_{MRCA}) = \sum_{i=n}^{2} E(T_i) = \sum_{i=n}^{2} 2N \Big/ \binom{i}{2} \tag{4.34}
\end{equation}

Using the fact that $\frac{1}{i(i-1)} = \frac{1}{i-1} - \frac{1}{i}$ and a bit of rearrangement, we can rewrite this as

\begin{equation}
E(T_{MRCA}) = 4N \left(1 - \frac{1}{n}\right) \tag{4.35}
\end{equation}
So the average $T_{MRCA}$ scales linearly with population size N. Interestingly, as we move to larger and larger samples (i.e. n ≫ 1), the
average time to the most recent common ancestor converges on 4N .
What’s happening here is that in large samples our lineages typically
coalesce rapidly at the start and very soon coalesce down to a much
smaller number of lineages.
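As a small check on the rearrangement leading to eqn (4.35), the sketch below compares the sum in eqn (4.34) with the closed form for a few sample sizes. The population size and sample sizes are arbitrary values of ours, and the function names are illustrative only.

```python
from math import comb


def expected_tmrca_sum(n, n_pop):
    """Eqn (4.34): the sum of E(T_i) = 2N / (i choose 2) for i = 2, ..., n."""
    return sum(2 * n_pop / comb(i, 2) for i in range(2, n + 1))


def expected_tmrca_closed(n, n_pop):
    """Eqn (4.35): 4N (1 - 1/n)."""
    return 4 * n_pop * (1 - 1 / n)


N_POP = 1000  # hypothetical population size
for n in (2, 5, 50, 5000):
    print(f"n = {n:>4}: sum = {expected_tmrca_sum(n, N_POP):8.1f}, "
          f"closed form = {expected_tmrca_closed(n, N_POP):8.1f}, "
          f"share of the total spent with two lineages = "
          f"{2 * N_POP / expected_tmrca_closed(n, N_POP):.2f}")
```

The last column shows that, however large the sample, at least half of the expected time to the MRCA is spent waiting for the final two lineages to coalesce, which is why adding samples barely moves the average $T_{MRCA}$.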
Question 8. Assume an autosomal effective population of 10,000
individuals (roughly the long-term human estimate) and a generation
time of 30 years. What is the expected time to the most recent com-
mon ancestor of a sample of 20 people? What is this time for a sample
of 500 people?
The expected total time in a genealogy and the number of segregating

sites. Mutations fall on specific lineages of the coalescent genealogy
and are transmitted to all descendants of their lineage. Furthermore,
under the infinitely-many-sites assumption, each mutation creates a
new segregating site. The mutation process is a Poisson process, and
the longer a particular lineage, i.e. the more generations of meioses it
represents, the more mutations that can accumulate on it. The total
number of segregating sites in a sample is thus a function of the total
amount of time in the genealogy of the sample, or the sum of all the
branch lengths on the genealogical tree, Ttot . Our total amount of
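time in the genealogy is

\begin{equation}
T_{tot} = \sum_{i=n}^{2} i\, T_i \tag{4.36}
\end{equation}

as when there are i lineages, each contributes a time $T_i$ to the total time (see Figure 4.15 for an example). Taking the expectation of the total time in the genealogy,

\begin{equation}
E(T_{tot}) = \sum_{i=n}^{2} i\, \frac{2N}{\binom{i}{2}} = \sum_{i=n}^{2} \frac{4N}{i-1} = \sum_{i=n-1}^{1} \frac{4N}{i} \tag{4.37}
\end{equation}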
we see that our expected total amount of time in the genealogy scales linearly with our population size N. Our expected total amount of time is also increasing with sample size n, but is doing so very slowly. This again follows from the fact that in large samples, the initial coalescence usually happens very rapidly, so that extra samples add little to the total amount of time in the genealogical tree.

To get a better sense of how $T_{tot}$ grows with the sample size, we can approximate the sum in eqn (4.37) by an integral, which will work for large n. The result is $\int_{1}^{n-1} \frac{4N}{i}\, di = 4N \log(n-1)$.

We saw above that the number of mutational differences between a pair of alleles that coalesced $T_2$ generations ago was Poisson with a mean of $2\mu T_2$, where $2T_2$ is the total branch length in this simple 2-sample genealogical tree. A mutation that occurs on any branch of our genealogy will cause a segregating polymorphism in the sample (meeting our infinitely-many-sites assumption). Thus, if the total time in the genealogy is $T_{tot}$, there are $T_{tot}$ generations for mutations. So the total number of mutations segregating in our sample (S) is Poisson with mean $\mu T_{tot}$. Thus the expected number of segregating sites in a sample of size n is

\begin{equation}
E(S) = \mu E(T_{tot}) = \sum_{i=n-1}^{1} \frac{4N\mu}{i} = \theta \sum_{i=n-1}^{1} \frac{1}{i} \tag{4.38}
\end{equation}
Note that this is growing with the sample size n, albeit very slowly
(roughly at the rate of the log of the sample size). We can use this
formula to derive another estimate of the population scaled mutation
rate θ, by setting our observed number of segregating sites in a sample
(S) equal to this expectation. We’ll call this estimator $\hat{\theta}_W$:

\begin{equation}
\hat{\theta}_W = \frac{S}{\sum_{i=n-1}^{1} 1/i} \tag{4.39}
\end{equation}

This estimator of θ was devised by Watterson (1975), hence the W.
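The quantities in eqns (4.37) to (4.39) are easy to compute directly. The sketch below is a minimal illustration with made-up inputs; the function names, and the values of θ, S and n used in the example, are our own and not taken from any dataset.

```python
import math


def harmonic_sum(n):
    """The sum of 1/i for i = 1, ..., n - 1 that appears in eqns (4.37)-(4.39)."""
    return sum(1 / i for i in range(1, n))


def expected_segregating_sites(theta, n):
    """Eqn (4.38): E(S) = theta * sum_{i=1}^{n-1} 1/i."""
    return theta * harmonic_sum(n)


def watterson_theta(num_segregating_sites, n):
    """Eqn (4.39): Watterson's estimator of theta from S and the sample size n."""
    return num_segregating_sites / harmonic_sum(n)


if __name__ == "__main__":
    theta = 5.0  # hypothetical population-scaled mutation rate for a region
    for n in (2, 10, 100, 1000):
        print(f"n = {n:>4}: E(S) = {expected_segregating_sites(theta, n):6.2f}, "
              f"sum = {harmonic_sum(n):.3f}, log(n - 1) = {math.log(n - 1):.3f}")
    # A hypothetical observation of 14 segregating sites among 10 sequences.
    print("theta_W =", watterson_theta(14, 10))
```

Going from 10 to 1000 sequences, a hundredfold increase in sample size, less than triples the expected number of segregating sites, which is the slow logarithmic growth described above.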
The neutral site-frequency spectrum. We can use our coalescent process to find the expected number of derived alleles present i times out of a sample size n, e.g. how many singletons (i = 1) do we expect to find in our sample? For example, in Figure 4.15 in our sample of four sequences, there are 3 singletons and 2 doubletons. The number of sites with these different allele frequencies depends on the lengths of specific genealogical branches. A mutation that falls on a branch with i descendants will create a derived allele present in i copies in our sample. For example, in our example tree in Figure 4.15, the total number of generations where a mutation could arise and be a doubleton is $T_3 + 2T_2$, the total length of the branch ancestral to just the orange and red allele ($T_3 + T_2$) plus the branch ancestral to just the blue and purple allele ($T_2$).
To see how we could go about working this out, let’s start by considering the simple coalescent tree, shown in Figure 4.16, for a sample of 3 alleles drawn from a population. Mutations that fall on the branches coloured in black will be derived singletons, while mutations that fall along the orange branch will be doubletons in the sample. The total number of generations where a singleton mutation could arise is $3T_3 + T_2$. Note that we only count the time where there are two lineages ($T_2$) once. So our expected number of singletons, using eqn (4.31), is

\begin{equation}
E(S_1) = \mu \left(3E(T_3) + E(T_2)\right) = \mu \left(3 \cdot \frac{2N}{3} + 2N\right) = \theta \tag{4.40}
\end{equation}

By similar logic, the time where doubletons could arise is $T_2$ and our expected number of doubletons is $E(S_2) = \mu E(T_2) = \theta/2$, where $S_i$ denotes the number of sites at which the derived allele is present i times in the sample. Thus, there are on average half as many doubletons as singletons.

Figure 4.16: A tree for three samples; note that this is the only possible tree shape (treating the tips as unlabeled, i.e. I don’t care which pair of sequences carry a doubleton, just that any two sequences carry a derived allele).
Extending this logic to larger samples might be doable, but is tedious (I mean really tedious: for 10 alleles there are thousands of possible tree shapes and the task quickly gets impossible even computationally). A nice, relatively simple proof of the neutral site frequency spectrum is given by Hudson (2015), but we won’t give this here. The general form is:

\begin{equation}
E(S_i) = \frac{\theta}{i} \tag{4.41}
\end{equation}
i.e. there are twice as many singletons as doubletons, three times
as many singletons as tripletons, and so on. The other thing that
will be helpful for us to know is that neutral alleles at intermediate
frequency tend to be old, and those that are rare in the sample are
young. We expect to see a lot more rare alleles in our sample than
common alleles.
Question 9. There are two possible tree shapes that could relate
four samples. Draw both of them and separately colour (or otherwise
mark) the branches by where singletons, doubletons, and tripleton
derived alleles could arise.
We can also ask the probability of observing a derived allele seg-
regating at frequency i/n given that the site is polymorphic in our
sample of size n (i.e. given that 0 < i < n ). We can obtain this
probability by dividing the expected number of sites segregating for an
allele at frequency i by the expected number segregating at all of the
possible allele frequencies for polymorphisms in our sample

\begin{equation}
P(i \mid 0 < i < n) = \frac{E(S_i)}{\sum_{j=1}^{n-1} E(S_j)} = \frac{1/i}{\sum_{j=1}^{n-1} 1/j}. \tag{4.42}
\end{equation}

We can interpret this probability as the fraction of polymorphic sites we expect to find at a frequency i/n.
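Eqns (4.41) and (4.42) translate directly into code. The following sketch computes the expected neutral site frequency spectrum and the corresponding fractions of polymorphic sites; the function names and the example values of θ and n are our own illustrative choices.

```python
def expected_sfs(theta, n):
    """Eqn (4.41): expected number of sites with i derived copies, i = 1..n-1."""
    return [theta / i for i in range(1, n)]


def sfs_probabilities(n):
    """Eqn (4.42): fraction of polymorphic sites expected at each count i."""
    weights = [1 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


if __name__ == "__main__":
    n = 10  # hypothetical sample size
    probs = sfs_probabilities(n)
    counts = expected_sfs(theta=5.0, n=n)
    for i, (count, prob) in enumerate(zip(counts, probs), start=1):
        print(f"i = {i}: E(S_i) = {count:5.2f}, P(i | polymorphic) = {prob:.3f}")
```

Note that θ cancels out of eqn (4.42), so the expected shape of the neutral spectrum depends only on the sample size, while θ sets the overall number of segregating sites.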
Tests based on the site frequency spectrum. Population geneticists have proposed a variety of ways to test whether an observed site frequency spectrum conforms to its neutral, constant-size expectations. These tests are useful for detecting population size changes using data across many loci, or for detecting the signal of selection at individual loci. One of the first tests was proposed by Tajima (1989), and is called Tajima’s D. Tajima’s D is

\begin{equation}
D = \frac{\hat{\theta}_{\pi} - \hat{\theta}_W}{C} \tag{4.43}
\end{equation}

where the numerator is the difference between the estimate of θ based on pairwise differences and that based on segregating sites. As these two estimators both have expectation θ under the neutral, constant-size model, the expectation of D is zero. The denominator C is a positive constant; it’s the square-root of an estimator of the variance of this difference under the constant population size, neutral model. This constant was chosen for D to have mean zero and variance 1 under the null model, so we can test for departures from this simple null model.
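To see how the sign of D responds to the shape of the site frequency spectrum, the sketch below computes the two estimators in the numerator of eqn (4.43) from a table of derived allele counts. It deliberately leaves out the constant C, which affects the magnitude of D but not its sign. The three spectra are made-up examples for a sample of six sequences, and the function names are ours. The calculation of π uses the fact that a site with i derived copies out of n differs in i(n − i) of the n(n − 1)/2 possible pairs of sequences.

```python
def pi_from_sfs(sfs):
    """Average pairwise differences; sfs[i - 1] = number of sites with i derived copies."""
    n = len(sfs) + 1
    num_pairs = n * (n - 1) / 2
    return sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / num_pairs


def watterson_from_sfs(sfs):
    """Watterson's estimator, eqn (4.39), with S taken as the total number of sites."""
    n = len(sfs) + 1
    return sum(sfs) / sum(1 / i for i in range(1, n))


def tajima_numerator(sfs):
    """Numerator of eqn (4.43): theta_pi minus theta_W (the constant C is omitted)."""
    return pi_from_sfs(sfs) - watterson_from_sfs(sfs)


if __name__ == "__main__":
    spectra = {
        "neutral shape (theta / i)": [60, 30, 20, 15, 12],
        "excess of rare alleles": [100, 10, 5, 3, 2],
        "excess of intermediate alleles": [10, 20, 40, 20, 10],
    }
    for label, sfs in spectra.items():
        print(f"{label:32s} pi = {pi_from_sfs(sfs):6.2f}, "
              f"theta_W = {watterson_from_sfs(sfs):6.2f}, "
              f"numerator = {tajima_numerator(sfs):+.2f}")
```

A spectrum that follows θ/i exactly gives a numerator of zero, one dominated by singletons gives a negative value, and one dominated by intermediate-frequency alleles gives a positive value.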
An excess of rare alleles compared to the constant-size, neutral model will result in a negative Tajima’s D, because each additional rare allele increases the number of segregating sites by 1, but only has a small effect on the number of pairwise differences between samples. In contrast, a positive Tajima’s D reflects an excess of intermediate frequency alleles relative to the constant-size, neutral expectation. Alleles at intermediate-frequency increase pairwise diversity more per segregating site than typical, thus increasing $\hat{\theta}_{\pi}$ more than $\hat{\theta}_W$.

4.3.2 Demography and the coalescent

We’ve already seen how changes in population size can change the rate at which heterozygosity is lost from the population (see the discussion around eqn. (4.14)). If the population size in generation i is $N_i$, the probability that a pair of lineages coalesce in that generation is $1/(2N_i)$; this conforms to our intuition that if the population size is small, the rate at which pairs of lineages find their common ancestor is faster. We can potentially accommodate rapid random fluctuations in population size by simply using the effective population size $N_e$ in place of N. However, longer term more systematic changes in population size will distort the coalescent genealogies, and hence patterns of diversity, in more systematic ways.

Figure 4.17: Data from 202 genes from 14002 people of European ancestry (28004 alleles). Note the double log-scale. The red line gives the neutral, constant population size estimate of the site frequency spectrum, our equation (4.41), using a θ estimated from π. Note how the non-synonymous changes are even more skewed towards rare alleles, likely due to selection against non-synonymous alleles preventing them from reaching high frequency. Data from Nelson et al. (2012). Code here.
[Figure axes: minor allele count (log scale) against SNP count per kb (log scale); points for synonymous and non-synonymous sites; line labelled $\theta_{\pi}$.]
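The idea that a pair of lineages coalesces with probability $1/(2N_i)$ in generation i is simple to simulate. The sketch below follows one pair of lineages backwards in time, generation by generation, through an arbitrary population size history. The two histories compared, the population sizes, and the function names are all hypothetical choices of ours, picked only to keep the example quick to run.

```python
import random


def pairwise_coalescent_time(pop_size_at, rng, max_gens=10_000_000):
    """Generations back until a pair of lineages coalesces.

    pop_size_at(t) should return the population size N_t in the
    generation t = 1, 2, ... before the present.
    """
    for t in range(1, max_gens + 1):
        if rng.random() < 1 / (2 * pop_size_at(t)):
            return t
    return max_gens  # safety cap, essentially never reached here


def mean_coalescent_time(pop_size_at, reps=1000, seed=1):
    rng = random.Random(seed)
    return sum(pairwise_coalescent_time(pop_size_at, rng) for _ in range(reps)) / reps


def constant_size(t):
    """A population that has always had 500 individuals."""
    return 500


def recently_grown(t):
    """A population of 5000 that had only 500 individuals until 50 generations ago."""
    return 5000 if t <= 50 else 500


if __name__ == "__main__":
    print("constant N = 500:      ", mean_coalescent_time(constant_size))
    print("recent growth to 5000: ", mean_coalescent_time(recently_grown))
    print("2N for a constant population of 5000:", 2 * 5000)
```

In the recently grown population the average coalescent time stays close to that of the small ancestral population, far below the 2N generations we would predict from its present-day size.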
We can see how demography potentially distorts the observed frequency spectrum away from the neutral expectation in a very large sample of humans shown in Figure 4.17. For comparison, the neutral frequency spectrum, eqn (4.41), is shown as a red line. There are vastly more rare alleles than expected under our neutral, constant-size model, but the neutral prediction and reality agree somewhat more for alleles that are more common.
Why is this? Well, these patterns are likely the result of the very recent explosive growth in human populations. If the population has grown rapidly, then the pairwise-coalescent rate in the past may be much higher than the coalescent rate closer to the present (see Figure 4.18).
One consequence of a recent population expansion is that there is much less genetic diversity in the population than you’d predict using the census population size. Humans are one example of this effect; there are 7 billion of us alive today, but this is due to very rapid population growth over the past thousand to tens of thousands of years. Our level of genetic diversity is very much lower than you’d predict given our census size, reflecting our much smaller ancestral population. A second consequence of recent population expansion is that the deeper coalescent branches are much more squished together in time compared to those in a constant-sized population. Mutations on deeper branches are the source of alleles at more intermediate frequencies, and so there are even fewer intermediate-frequency alleles in growing populations. That’s why there are so many rare alleles, especially singletons, in this large sample of Europeans.

Figure 4.18: A realization of the coalescent process in a growing population. The population underwent a period of doubling every generation. The initial population size of just two individuals, maintained for a number of generations, is obviously highly unrealistic but serves our purpose. Code here.
Another common demographic scenario is a population bottleneck.
In a bottleneck, the population size crashes dramatically, and sub-
sequently recovers. For example, our population may have had size
NBig and crashed down to NSmall . One example of a bottleneck is
shown in Figure 4.19. Looking at a sample of lineages drawn from the
population today, if the bottleneck was somewhat recent (≪ NBig
generations in the past) many of our lineages will not have coalesced
before reaching the bottleneck, moving backward in time. But during
the bottleneck our lineages coalesce at a much higher rate, such that
many of our lineages will coalesce if the bottleneck lasts long enough
(∼ NSmall generations). If the bottleneck is very strong, then all of
our lineages will coalesce during the bottleneck, and the resulting site
frequency spectrum may look very much like our population growth
model (i.e. an excess of rare alleles). However, if some pairs of lineages
escape coalescing during the bottleneck, they will coalesce much more
deeply in time (e.g. the blue and orange ancestral lineages in Figure 4.19).

Figure 4.19: A realization of the coalescent process in a bottlenecked population. Our population underwent a bottleneck eight generations in the past. Code here.
Figure 4.20: Diversity along a region of the Mimulus genome. Black dots give π in 1kb windows between chromosomes sampled from two individuals, the red line is a moving average (data from Brandvain et al.). Pairwise coalescent times (t) estimated assuming t = π/2µ using $\mu_{BP} = 10^{-9}$. Code here.
[Figure panels: “Comparing two M. guttatus chrs.” and “Comparing two M. nasutus chrs.”; horizontal axis: position (kbp); vertical axes: $\pi_{bp}$ and coalescent time (kyrs).]
An example of this is shown in Figure 4.20, data from Brandvain et al. Mimulus nasutus is a selfing species that arose recently from an
out-crossing progenitor M. guttatus, and experienced a strong bottle-
neck. M. guttatus has very high levels of genetic diversity (π = 4%
at synonymous sites), but M. nasutus has lost much of this diversity
(π = 1%). Looking along the genome, between a pair of M. guttatus
chromosomes, levels of diversity are fairly uniformly high.
But in comparing two M. nasutus chromosomes, diversity is low
because the pair of lineages generally coalesce recently. Yet in a few
places we see levels of diversity comparable to M. guttatus; these re-
gions correspond to genomic sites where our pair of lineages fail to
coalesce during the bottleneck and subsequently coalesce much more
deeply in the ancestral M. guttatus population.
Figure 4.21: Yellow Monkeyflower M.

guttatus.
Choix des plus belles fleurs et des plus
beaux fruits. Pierre-Joseph Redouté. (1833).
Contributed to Flickr by Swallowtail Garden
Seeds. Public Domain.
Figure 4.22: Data for polymorphism from Maize and Teosinte: 774 loci from Wright et al. (2005). Left) Genetic diversity levels in maize and teosinte samples at each of these loci. Note how diversity levels are lower in maize than teosinte, i.e. most points are below the red x = y line. Right) The distribution of Tajima’s D in maize and teosinte, see how the maize distribution is shifted towards positive values. Code here.
[Figure axes: Left) Teosinte $\pi_{bp}$ against Maize $\pi_{bp}$. Right) Tajima’s D bin against counts, for maize and teosinte.]
Mutations that arise on deeper lineages will be at intermediate frequency in our sample, and so mild bottlenecks can lead to an excess of
intermediate frequency alleles compared to the standard constant-size
model. This can skew Tajima’s D (see eqn 4.43) towards positive val-
ues and away from its expectation of zero. One example of this skew
is shown in Figure 4.22. Maize (Zea mays subsp. mays) was domesti-
cated from its wild progenitor teosinte (Zea mays subsp. parviglumis)
roughly ten thousand years ago. We can see how the bottleneck as-
sociated with domestication has resulted in a loss of genetic diversity
in maize compared to teosinte, and the polymorphism that remains is
somewhat skewed towards intermediate frequencies resulting in more
positive values of Tajima’s D.
Question 10. Voight et al. (2005) sequenced 40 autosomal
regions from 15 diploid samples of Hausa people from Yaounde,
Cameroon. The average length of locus they sequenced for each re-
gion was 2365bp. They found that the average number of segregating
sites per locus was S = 11.1 and the average π = 0.0011 per base over
the loci. Is Tajima’s D positive or negative? Is a demographic model
with a bottleneck or growth more consistent with this result?
Figure 4.23: Teosinte (Zea mays ssp. mexicana)
American grasses (1897). Scribner, FL Image
from the Biodiversity Heritage Library.
Contributed by Smithsonian Libraries. Not in
copyright.
5 The Population Genetics of Divergence and Molecular Substitution.
“history is just one damn thing after another” -Arnold Toynbee
There are over 30 million base pair substitutions between humans and chimpanzees, sites where humans carry one allele and chimps another at orthologous locations. These changes have occurred in the seven million years or so since humans and chimps last shared a common ancestor. Other substitutions are shared between the sister species human and chimp to the exclusion of gorilla, yet others are shared between humans, chimps, and gorillas but not orangutans. These substitutions represent changes at just a small percentage of sites genome-wide as we share the majority of our genome, our evolutionary history, and our biology with the other great apes. Each of the substitutions must have arisen as a single mutation in the population, and spread through the population as a polymorphism before eventually reaching fixation. What forces drove the spread of these alleles?
Human accacagcatttgttagttactgccaagaagcctgtatctgtagggtaaaatcctcgctgaagtgggttg
Chimp ......................................g...........c...................
Gorilla ..................................................cc..................
Orangutan .........c.........c..............................c...................
Gibbon ...................c..............................---.................
Crab-eating macaque g.............gg...c..............................c..t.t..............
Table 5.1: Variable positions in a primate alignment of orthologous
sequences of a 136bp region. This region starts at position 5242147 of
chromosome 11, chosen pretty much at random from the UCSC browser.
Dots indicate positions where the other sequences carry the same base
as the human reference sequence.
Many substitutions were driven by selection, as there has undoubtedly
been plenty of adaptive phenotypic evolution in great apes. However,
these adaptive changes may be a small minority of all the
substitutions; for a start, many of these substitutions have occurred
in non-coding DNA with no known functional effect. Thus it is a
reasonable initial position that the majority of substitutions
genome-wide may well be neutral. How can we hope to identify regions
undergoing adaptive divergence? How could we hope to address the claim
that many amino-acid changing substitutions are also neutral, as
posited by the Neutral theory of molecular evolution? One way forward
is to understand what neutral theory predicts for the rate of
molecular substitution, and then develop ways to test these ideas.
Many of the topics covered in this chapter also fall within the field
of ‘molecular evolution’, which shares many of its questions and tools
with population genetics but often focuses on longer time-scales of
evolution using phylogenetic approaches.
Figure 5.1: Illustration by Benjamin Waterhouse Hawkins from Huxley’s
Evidence as to Man’s Place in Nature (1863).
Image from Wikimedia, public domain.
5.1 The Neutral Substitution process.
So how then do neutral substitutions occur? It is very unlikely that

a rare neutral allele accidentally drifts up to fixation; more likely,
such an allele will be eventually lost from the population. However,
populations experience a large and constant influx of rare alleles due
to mutation, so even if it is very unlikely that an individual allele fixes
within the population, some neutral alleles will fix by chance. So we’ll
need to understand the probability that a neutral mutation fixes, and
then how to think about the rate at which substitutions accumulate over
time.
5.1.1 Probability of the eventual fixation of a neutral allele

An allele which reaches fixation within a population is an ancestor to
the entire population. In a particular generation there can only be a
single allele that all other alleles at the locus in a later generation can
claim as an ancestor (see Figure 5.2). At a neutral locus, the actual
allele does not affect the number of descendants that the allele has
(this follows from the definition of neutrality: neutral alleles don’t
leave more or fewer descendants on average than other neutral alleles).
An equivalent way to state this is that the allele labels don’t affect
anything; thus the alleles are exchangeable. As a consequence of being
exchangeable, any allele is equally likely to be the ancestor of the
entire population. In a diploid population of size N, there are 2N
alleles, all of which are equally likely to be the ancestor of the entire
population at some later time point. So if our allele is present in a
single copy, the chance that it is the ancestor to the entire population
in some future generation is 1/(2N), i.e. the chance our neutral allele
is eventually fixed is 1/(2N). In Figure 5.2, our orange allele in the
first generation is one of 10 differently coloured alleles, and so has a
1/10 chance of being the ancestor of the entire population at some
later time point (and in this simulation it does become the common
ancestor, by the 9th generation).
Figure 5.2: Each allele initially present in a small diploid
population is given a different colour so we can track their
descendants over time. By the 9th generation, all of the alleles
present in the population can trace their ancestry back to the orange
allele. Code here.
More generally, if our neutral allele is present in i copies in the
population, of 2N alleles, the probability that this allele becomes fixed
is i/(2N ), i.e. the probability that a neutral allele is eventually fixed
is simply given by its frequency (p) in the population. (We can also
derive this result by letting N s → 0 in eqn. (12.11), a result we’ll
encounter later.)
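We can check the i/(2N) argument with a quick simulation. The sketch
below is a minimal neutral Wright-Fisher simulation written in Python
(it is not the R code used for the book's figures). The function
names, the population size (N = 5, so 2N = 10 alleles, as in Figure
5.2), the seed, and the number of replicates are all choices made here
for illustration. If the argument above is right, the fraction of
replicates in which the focal allele fixes should be close to i/(2N).

```python
import random


def fixes(n_diploid, copies, rng):
    """Follow one neutral allele until loss or fixation; True if it fixes."""
    total = 2 * n_diploid
    count = copies
    while 0 < count < total:
        p = count / total
        # Each of the 2N alleles in the next generation picks its parent
        # allele at random, so the new count is Binomial(2N, p).
        count = sum(1 for _ in range(total) if rng.random() < p)
    return count == total


def fixation_fraction(n_diploid, copies, replicates, seed=1):
    """Fraction of independent replicates in which the allele fixes."""
    rng = random.Random(seed)
    hits = sum(fixes(n_diploid, copies, rng) for _ in range(replicates))
    return hits / replicates


# Illustrative settings (assumptions, not values from the text's figures).
est_single = fixation_fraction(n_diploid=5, copies=1, replicates=20000)
est_three = fixation_fraction(n_diploid=5, copies=3, replicates=5000)
print("1 copy:  ", est_single, "expected", 1 / 10)
print("3 copies:", est_three, "expected", 3 / 10)
```

The estimates are subject to sampling noise, so they will only
approximately match 1/10 and 3/10; increasing the number of replicates
tightens the agreement.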
How long does it take on average for such an allele to fix within
our population? Well, in developing equation (4.35) we’ve seen that
it takes 4N generations for a large sample of alleles to all trace their
ancestry back to a single most recent common ancestral allele. Any
single-base pair change which arose as a single mutation at a locus,
and fixed in the population, must have been present in the sequence
transmitted by the most recent common ancestor of the population
at that locus. Thus it must take roughly 4N generations for a neu-
tral allele present in a single copy within the population to fix. This
argument can be made more precise, but in general we would still
find that it takes ≈ 4N generations for a neutral allele to go from its
introduction to fixation within the population.
5.1.2 Rate of substitution of neutral alleles

A substitution between populations that do not exchange gene flow is
simply a fixation event within one population. The rate of substitution
is therefore the rate at which new alleles fix in the population, so that
the long-term substitution rate is the rate at which mutations arise
that will eventually become fixed within our population.
Let’s assume, based on our discussion of the neutral theory of
molecular evolution, that there are only two classes of mutational changes
that can occur within a region, highly deleterious mutations and neutral
mutations. A fraction C of all mutational changes are highly
deleterious, and cannot possibly contribute to substitution nor polymorphism.
The other 1 − C fraction of mutations are neutral. If our total
mutation rate is µ per transmitted allele per generation, then a total of
2Nµ(1 − C) neutral mutations enter our population each generation.
Each of these neutral mutations has a 1/(2N) probability of
eventually becoming fixed in the population. Therefore, the rate at
which neutral mutations arise that eventually become fixed within our
population is
2Nµ(1 − C) × 1/(2N) = µ(1 − C) (5.1)
Thus the rate of substitution, under a model where newly arising
alleles are either highly deleterious or neutral, is simply given by the
mutation rate of neutral alleles, i.e. µ(1 − C).
Consider a pair of species that have diverged for T generations,
i.e. orthologous sequences shared between the species last shared a
common ancestor T generations ago. If these species have maintained
a constant µ over that time, they will have accumulated an average of
2µ(1 − C)T (5.2)
neutral substitutions. This assumes that T is a lot longer than the

time it takes to fix a neutral allele, such that the total number of
alleles introduced into the population that will eventually fix is the
total number of substitutions.
This is a really pretty result as the population size has completely
canceled out of the neutral substitution rate. However, there is an-
other way to see this in a more straight forward way. If I look at a
sequence in me compared to, say, a particular chimp, I’m looking at
the mutations that have occurred in both of our germlines since they
parted ways T generations ago. Since neutral alleles do not alter the
probability of their transmission to the next generation, we are simply
looking at the mutations that have occurred in 2T generations worth
of transmissions. Thus the average number of neutral mutational dif-
ferences separating our pair of species is simply 2µ(1 − C)T .
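To make the cancellation of population size concrete, here is a small
Python sketch of equations (5.1) and (5.2). The function names and the
numerical values of µ, C, N, and T are hypothetical, chosen only for
illustration. The first function multiplies the per-generation input of
neutral mutations by their fixation probability, and the loop shows
that the result does not depend on N.

```python
def substitution_rate(n_diploid, mu, constraint):
    """Eqn (5.1): rate at which eventually-fixing neutral mutations arise."""
    new_neutral = 2 * n_diploid * mu * (1 - constraint)  # neutral mutations per generation
    p_fix = 1 / (2 * n_diploid)                          # fixation probability of each one
    return new_neutral * p_fix


def expected_substitutions(mu, constraint, t_generations):
    """Eqn (5.2): expected neutral substitutions between two species."""
    return 2 * mu * (1 - constraint) * t_generations


# Hypothetical values: mutation rate per generation and fraction constrained.
mu, C = 1e-8, 0.5
rates = [substitution_rate(n, mu, C) for n in (100, 10_000, 1_000_000)]
for n, rate in zip((100, 10_000, 1_000_000), rates):
    print(n, rate)  # the same rate, mu * (1 - C), whatever the value of N

divergence = expected_substitutions(mu, C, t_generations=250_000)
print(divergence)
```

With these made-up numbers the substitution rate is µ(1 − C) = 5 × 10⁻⁹
per generation for every N, and the expected divergence after 250,000
generations is 2µ(1 − C)T = 0.0025 substitutions per site.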
5.1.3 Implications for the Molecular Clock.
“functionally less important molecules or parts of a molecule evolve
faster than more important ones.” – Kimura and Ohta (1974)
A number of observations follow under this model, from equation
(5.2). The first is that a primary determinant of patterns of
molecular evolution in a genomic region is the level of constraint (C). This
pattern generally seems to hold empirically: non-coding regions often
evolve more rapidly than coding regions, synonymous substitutions
accumulate faster than nonsynonymous, and nonsynonymous
substitutions accumulate faster in less vital proteins than ones that are
absolutely necessary for early development. For example,
Fibrinopeptides evolve in a less constrained manner than the Cytochrome c gene,
see Figure 5.3. Note that this constraint prediction is not a unique
prediction of the neutral model, e.g. less constrained regions may
also be better able to evolve adaptively. However, it is a fantastically
useful general insight, e.g. it allows us to spot putatively functional
non-coding regions by looking for genomic regions that have very low
levels of divergence among distantly related species.
Figure 5.3: The numbers of substitutions in three proteins, corrected
for multiple hits, between various pairs of groups plotted against the
time these groups shared a common ancestor in the fossil record. Data
from Dickerson (1971). The lines give the linear regression through
the origin for each protein. The slope of the regression is given next
to the protein name. Code here. See (Robinson et al., 2016) who
revisited this classic study and confirmed the conclusions.
[Figure 5.3 plot. x-axis: Millions of years since divergence (0 to
600). y-axis: Corrected # of amino-acid changes (per 100 residues).
Regression slopes: Fibrinopeptides, 0.96 subs per Myrs; Hemoglobin,
0.18 subs per Myrs; Cytochrome c, 0.05 subs per Myrs.]
Figure 5.4: Eastern diamondback rattlesnake (Crotalus adamanteus).
North American herpetology. Holbrook, J. E. Image from the
Biodiversity Heritage Library. Contributed by Smithsonian Libraries.
Licensed under CC BY-2.0.
Figure 5.5: Spiny dogfish (Squalus acanthias).
Rare Book Division, The New York Public Library. “Squalus Acanthias,
The Picked-Dog” The New York Public Library Digital Collections. 1785.
Public domain.
The second important insight, and critical for the development of
the neutral theory, is that equation (5.2) is seemingly consistent with
Zuckerkandl and Pauling (1965)’s hypothesis of a surprisingly
constant protein molecular clock. The protein molecular clock is the
observation that for some proteins there’s a linear relationship
between the number of non-synonymous (NS) substitutions and the time
since species last shared a common ancestor in the fossil record.
Dickerson (1971) provided an early example of this observation (Figure
5.3), by comparing various organisms whose molecular sequences were
available to him. For example, he found that humans and rattlesnakes,
who last shared a common ancestor in the fossil record around 300
million years ago, are separated by roughly 15 NS substitutions per 100
sites in the Cytochrome c protein, while humans and dogfish, which
diverged around 400 million years ago, are separated by 19 NS substitutions
per 100 sites in this gene.
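As a rough check on the clock-like behaviour of Cytochrome c, we can
fit a regression through the origin, the type of fit described in the
Figure 5.3 caption, to just the two comparisons quoted above. This
Python sketch is only illustrative: it uses two data points rather
than Dickerson's full data set, and the function name is made up here.
For a line forced through the origin, the least-squares slope is
sum(t × d) / sum(t²).

```python
def slope_through_origin(times, subs):
    """Least-squares slope of a line through the origin: sum(t*d) / sum(t^2)."""
    numerator = sum(t * d for t, d in zip(times, subs))
    denominator = sum(t * t for t in times)
    return numerator / denominator


# The two Cytochrome c comparisons quoted in the text:
# human-rattlesnake (~300 Myr, ~15 NS subs per 100 sites) and
# human-dogfish (~400 Myr, 19 NS subs per 100 sites).
times_myr = [300, 400]
ns_per_100_sites = [15, 19]

slope = slope_through_origin(times_myr, ns_per_100_sites)
print(round(slope, 4), "NS substitutions per 100 sites per Myr")
```

The slope from these two points alone is a little under 0.05
substitutions per 100 sites per million years, in line with the
Cytochrome c slope given in Figure 5.3.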
In equation (5.2), if we double the amount of time separating a
pair of species T , we double the number of substitutions predicted.
Note that for this to be true T must be measured in generations. To
explain a protein molecular clock between species that clearly differed
dramatically in generation time it was hypothesized that the mutation
rate actually scaled with generation time, i.e. short-lived organisms
introduced fewer mutations per generation, e.g. as they had fewer
rounds of mitosis. This generation-time assumption meant that the
mutation rate per year could be constant, such that µT would be a
constant for pairs of species that had diverged for similar geological
times, which are measured in years, even if the organisms differed in
generation time. This assumption would allow neutral theory to be
consistent with a protein molecular clock measured in years. We now
know that this critical generation time assumption is false: organisms
with shorter generation times have somewhat higher mutation rates
per year so a strict neutral model is inconsistent with the protein
molecular clock. We’ll return to these ideas when we discuss the fate
of very weakly selected mutations in Chapter 12 and Ohta (1973)’s
Nearly Neutral theory. If you are still reading this send Graham a
picture of Tomoko Ohta receiving the Crafoord Prize, an analog of the
Nobel prize for biology, for her contributions to molecular evolution.
The contribution of ancestral polymorphism to divergence. If we are

considering T to represent the divergence between long-separated
species, then we can think of T as the time that the species split.
However, for more recently diverged populations and species, we need
to include the fact that the sorting of ancestral polymorphism con-
tributes to divergence among species. In Figure 5.6, we see our two
populations split Ts generations ago. However, the coalescence of our
A and B lineage is necessarily deeper in time than Ts . The top muta-
tion was polymorphic in the ancestral population but now contributes
to the divergence between A and B. Assuming that our ancestral
population had effective size NA individuals, and that our populations
split cleanly with no subsequent gene flow, the pair of lineages takes
on average a further 2NA generations to coalesce once they are both in
the ancestral population, so that
T = Ts + 2NA. (5.3)
If our species split time is very large compared to 2NA then we can
think of T as the split time.
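The following Python sketch works through equation (5.3) with made-up
numbers, to show how much of the divergence can be due to ancestral
polymorphism when the split is recent. The function names and all of
the parameter values (split time, ancestral size, mutation rate) are
hypothetical, and the divergence calculation assumes all sites are
neutral (C = 0 in equation 5.2).

```python
def expected_coalescence_time(t_split, n_ancestral):
    """Eqn (5.3): expected generations back to the common ancestor of
    one allele from population A and one from population B."""
    return t_split + 2 * n_ancestral


def expected_divergence(mu, t_split, n_ancestral):
    """Expected neutral differences per site, 2 * mu * T, with C = 0."""
    return 2 * mu * expected_coalescence_time(t_split, n_ancestral)


# Hypothetical values chosen for illustration only.
t_split = 100_000   # generations since the populations split
n_anc = 20_000      # ancestral effective population size (diploids)
mu = 1e-8           # mutation rate per site per generation

T = expected_coalescence_time(t_split, n_anc)
div = expected_divergence(mu, t_split, n_anc)
ancestral_share = 2 * n_anc / T  # fraction of T spent in the ancestral population
print(T, div, round(ancestral_share, 3))
```

With these values T = 140,000 generations, so roughly 29% of the
expected divergence reflects time the lineages spent in the ancestral
population rather than time since the split.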
Figure 5.6: The genealogy of two alleles, one sampled from population
A and one from population B. Mutations on the lineages are shown as
dashes. The pair of alleles coalesce in the ancestral population of A
and B. The two populations split Ts generations ago, with no
subsequent gene flow, but the two lineages must coalesce deeper in
time.
Question 1. For this, and the next question, assume that humans
and chimps split around 5.5 × 10^6 years ago, have a generation time
of 20 years, that the speciation occurred instantaneously in allopatry
with no subsequent gene flow, and the ancestral effective population
size of the human and chimp common ancestor population was 10,000
individuals.
Nachman and Crowell sequenced 12 pseudogenes in humans and
chimps and found substitutions at 1.3% of sites.
A) What is the mutation rate per site per generation at these
genes?
B) All of the pseudogenes they sequenced are on the autosomes.
What would your prediction be for pseudogenes on the X and Y
chromosomes, given that there are fewer rounds of replication in the female
germline than in the male germline?
5.2 Tests of molecular evolution.
One of the great appeals of neutral models is they offer a simple null
for us to test real data against.
5.2.1 Comparing the rates of non-synonymous to synonymous
substitutions dN/dS
One common tool in molecular evolution is to compare the estimated
number (or rates) of substitutions in different classes of genomic sites,
for example the ratio of the number of non-synonymous to
synonymous substitutions in a given gene. The simplest way to calculate dN
is to count up the non-synonymous changes and divide by the total
number of positions in the gene where a non-synonymous point
mutation could occur. We can do likewise for synonymous changes dS, and
then take the ratio dN/dS. 1
1 This ignores the fact that some changes are more likely to occur
by mutation than others and also does not account for multiple hits
(multiple mutations at the same bp position). Therefore, in practice the
ratio dN/dS is more typically calculated by model-based likelihood and
Bayesian methods that can account for these features.
For the vast majority of protein-coding genes in the genome we
see that dN/dS < 1. This observation is consistent with the view that
non-synonymous sites are much more constrained than synonymous
sites, i.e. that most non-synonymous mutations are deleterious and
quickly removed from the population. If we are willing to make the
assumption that all synonymous changes are neutral, dS = 2Tµ, then
we can estimate the degree of constraint on non-synonymous sites.
(Note that synonymous changes can sometimes be subject to both
positive and negative selection, but this neutral assumption is a useful
starting place.)
Assume that a fraction C of non-synonymous changes are too dele-
terious to contribute to polymorphism. Then, after T generations of
divergence have elapsed between two populations, we’d expect dN
neutral non-synonymous substitutions, where
dN = 2T (1 − C)µ (5.4)
Dividing by dS , we find
dN/dS = (1 − C) (5.5)
Therefore, if we assume that non-synonymous mutations can only be
strongly deleterious or neutral, we estimate the fraction of mutational
changes that are constrained by negative selection as C = 1 − dN/dS.
C can be interpreted as the fraction of non-synonymous
mutations that are quickly weeded out of the population by selection,
and so do not contribute to divergence among species.
We can test whether our gene is evolving in a constrained way at
the protein level by estimating dN/dS and testing whether this is
significantly less than 1. A dN/dS test can provide evolutionary evidence
that a stretch of DNA proposed to be protein-coding is subject to
selective constraint, and so likely does encode for a functional protein.
We can also perform a dN/dS test on specific branches of a phylogeny
for a gene, to test on which branches the gene is subject to constraint,
or to test for changes in the level of constraint across the phylogeny.
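The simple counting approach to dN/dS, and the estimate of constraint
that follows from equation (5.5), can be sketched in a few lines of
Python. This is the naive calculation described above, so it ignores
multiple hits and unequal mutation rates; the function names and the
counts in the example are hypothetical and are not taken from any real
gene.

```python
def dn_ds(nonsyn_changes, nonsyn_sites, syn_changes, syn_sites):
    """Naive dN/dS: changes per available site in each class, then the ratio."""
    d_n = nonsyn_changes / nonsyn_sites
    d_s = syn_changes / syn_sites
    return d_n / d_s


def constraint_from_ratio(ratio):
    """Eqn (5.5) rearranged: C = 1 - dN/dS, assuming non-synonymous
    mutations are either strongly deleterious or neutral."""
    return 1 - ratio


# Hypothetical counts for a gene with 900 non-synonymous and 300
# synonymous sites.
ratio = dn_ds(nonsyn_changes=12, nonsyn_sites=900,
              syn_changes=18, syn_sites=300)
C_hat = constraint_from_ratio(ratio)
print(round(ratio, 3), round(C_hat, 3))
```

In this made-up example dN/dS is about 0.22, so under the
deleterious-or-neutral model roughly 78% of non-synonymous mutations
would be removed by selection.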
“Rudimentary organs may be compared with the letters in a word,
still retained in the spelling, but become useless in the pronunciation,
but which serve as a clue .. for its derivation.” – Darwin (1859) pg. 455
Loss of constraint at pseudogenes. While most protein genes evolve
under constraint, we can find examples of genes that are evolving
in a less constrained manner. The simplest example of this is where
the gene has lost function. Genes can lose function because of
inactivating mutations that stop them being transcribed or translated
into functional proteins. Such genes are called ‘pseudogenes’. When
a gene completely loses function there is no longer selection against
non-synonymous changes and so such mutations are just as free to
accumulate as synonymous changes, and so dN/dS = 1. Pseudogenes
are a wonderful example of the extension of Darwin’s ideas about
vestigial traits (‘Rudimentary organs’) to the DNA level; we can still
recognize a once useful word (gene) whose spelling is slowly degrading.
Our genomes are filled with old pseudogenes whose original meanings
(functional protein coding sequences) are slowly being eroded through
the accumulation of neutral substitutions. One nice example of a
gene that has repeatedly lost function, i.e. become repeatedly
pseudogenized, is the Enamelin gene from the study of Meredith et al.
(2009).
Figure 5.7: Examples of frameshift mutations (insertions blue,
deletions red) and premature stop codons in Enamelin in Cetacea and
Xenarthra. Figure from Meredith et al. (2009), licensed under
CC BY 4.0.
The protein Enamelin is a key structural protein involved in the
outer cap of enamel on teeth. Various mammals have secondarily
evolved diets that do not require hard teeth, and so greatly reduced
the selection pressure for hard enamel, or even teeth at all. For
example, two-toed sloths (Choloepus), Pygmy sperm whales (Kogia),
and aardvark (Orycteropus) all lack enamel on teeth. Other mammals
have lost their teeth entirely, e.g. giant anteaters (Myrmecophaga) and
Baleen whales. Due to this relaxation of constraint on the phenotype,
the Enamelin gene has accumulated pseudogenizing substitutions such
as premature stop codons and frameshift mutations (see Figure 5.7
for examples). Meredith et al. (2009) sequenced Enamelin across
a range of species and found that none of the species with enamel
have frameshift mutations in Enamelin, while 17/20 of species that lack
enamel or teeth have frameshifts in Enamelin, and all of them carry
premature stop codons.
Figure 5.8: Two-toed sloth (Choloepus hoffmanni).
An introduction to the study of mammals, living and extinct. 1891.
Flower W. H. and Lydekker R. Image from the Biodiversity Heritage
Library. Contributed by University of Toronto. Not in copyright.
Meredith et al. (2009) found that the branches of the Enamelin
phylogeny with a functional Enamelin gene had an estimated
dN/dS = 0.51, consistent with the protein evolving in a constrained
manner. In contrast, the branches with a pseudogenized Enamelin had
dN/dS = 1.02, consistent with the gene evolving in a completely
unconstrained way. The branches where the gene was likely transitioning
from a functional to non-functional state, i.e. pre-mutation and mixed,
had intermediate values of dN/dS = 0.83 − 0.98, consistent with a
transition from a constrained to unconstrained mode of protein evolution
somewhere along these branches of the phylogeny.
Figure 5.9: A synthetic interpretation of the history of enamel
degeneration in Tubulidentata (the order of aardvarks) based on
fossils, phylogenetics, molecular clocks, frameshift mutations, and
dN/dS ratios. The oldest fossil aardvarks are O. minutus (19 mya) from
the early Miocene of Kenya and also lack enamel. Figure & caption
modified from Meredith et al. (2009), licensed under CC BY 4.0.
Question 2. The Enamelin gene was pseudogenized somewhere
along the branch leading to Aardvarks (Orycteropus afer), see Figure
5.9. Meredith et al. (2009) estimated that this branch has a
dN/dS = 0.75.
A) Calculate the average constraint against amino-acid changes on
this branch.
B) Aardvarks last shared a common ancestor with Afrosoricida
(golden moles, tenrecs) and Macroscelidea (elephant shrews) around
75.1 million years ago in the Cretaceous. Assume that for the portion
of the branch while Enamelin was functional dN/dS = 0.51 and after it
was pseudogenized there was no constraint (i.e. dN/dS = 1). Based on
the branch’s average dN/dS = 0.75, can you estimate the time at which
Enamelin was pseudogenized? (I.e. when is the star in Figure 5.9?)
Figure 5.10: Aardvarks (Cape anteater, Orycteropus afer).
Cassell’s natural history (1896). Duncan, P. M. Image from the
Biodiversity Heritage Library. Contributed by NCSU Libraries. Not in
copyright.
Adaptive evolution and dN/dS. Clearly genes are not only subject
to neutral and deleterious mutations; beneficial mutations must also
arise and fix from time to time. Let’s assume that a fraction B of
non-synonymous mutations that arise are beneficial such that 2NµB
beneficial mutations arise per generation. Newly arisen beneficial
alleles are not destined to fix in the population, as they may be lost to
genetic drift when they are rare in the population (we’ll discuss how
to calculate the fixation probability for beneficial alleles in Chapter
12). A newly arisen beneficial allele reaches fixation in the population
with probability fB from its initial frequency of 1/(2N). This fixation
probability may be much higher than that of neutral mutations, but
still much less than 1. If T generations of divergence have elapsed
between the two populations (2T generations in total along the two
lineages) then a total of
dN = 2T(1 − C − B)µ + 2T × (2NµB) × fB (5.6)
non-synonymous substitutions will have accumulated. Then
dN/dS = (1 − C − B) + 2N B fB (5.7)
assuming again that all synonymous mutations are neutral. Note that
this means that our estimates of C using 1 − dN/dS will be a lower
bound on the true constraint if even a small fraction of mutations
are beneficial. Those cases where the gene is evolving more rapidly
at the protein level than at synonymous sites, i.e. dN/dS > 1, are
potentially strong candidates for positive selection rapidly driving
change at the protein level. We can identify genes that have dN/dS
significantly greater than one, either on the complete gene phylogeny,
or on particular branches. Note that this is a very conservative test that
few genes in the genome meet, as many genes that are fixing adaptive
non-synonymous substitutions will have dN/dS < 1; even if adaptive
mutations are common, genes may still evolve in a constrained way
(i.e. dN/dS < 1) if the rapid fixation of beneficial mutations due to
positive selection is outweighed by the loss of non-synonymous mutations
to negative selection.
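To get a feel for equation (5.7), the Python sketch below evaluates
dN/dS for a few values of B while holding everything else fixed. All
of the numbers (C, N, and especially the fixation probability fB,
which is only derived in Chapter 12) are hypothetical inputs picked
for illustration, and the function name is made up here. It shows both
points made above: the naive estimate 1 − dN/dS falls below the true C
once some mutations are beneficial, and dN/dS only exceeds 1 when
beneficial mutations are common enough to outweigh the constraint.

```python
def dn_ds_with_adaptation(constraint, beneficial, n_diploid, f_b):
    """Eqn (5.7): dN/dS = (1 - C - B) + 2 * N * B * f_B."""
    return (1 - constraint - beneficial) + 2 * n_diploid * beneficial * f_b


# Hypothetical parameter values.
C_true = 0.8     # fraction of non-synonymous mutations that are strongly deleterious
N = 10_000       # diploid population size
f_B = 0.01       # assumed fixation probability of a new beneficial mutation

results = {}
for B in (0.0, 0.001, 0.01):
    ratio = dn_ds_with_adaptation(C_true, B, N, f_B)
    naive_C = 1 - ratio   # what we would infer if we ignored adaptation
    results[B] = (ratio, naive_C)
    print(f"B={B}: dN/dS={ratio:.3f}, naive C estimate={naive_C:.3f}")
```

With these inputs, B = 0.001 gives dN/dS ≈ 0.40 (still below 1, with a
naive constraint estimate of about 0.60 rather than the true 0.8),
while B = 0.01 pushes dN/dS above 1.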
Figure 5.11: A phylogram for the primate lysozyme gene, data from
Yang (1998). For each branch, the numbers give the estimated
average number of non-synonymous to synonymous changes in the lysozyme
protein.
[Figure 5.11 phylogram. Tips: Douc langur and Angolan colobus
(Colobines), Rhesus macaque, Lar gibbon, Human, Squirrel monkey,
Marmoset. Branches are labelled with non-synonymous/synonymous
changes.]
A classic example for looking at adaptive evolution using dN/dS is
the evolution of the lysozyme protein in primates (Messier and
Stewart, 1997; Yang, 1998). The lysozyme protein is a key
component for the breakdown of bacterial cell walls. It shows very fast protein
evolution (see the phylogeny in Figure 5.11), notably on the lineages
leading to apes (e.g. gibbons and humans) and Colobines (e.g. colobus
and langur monkeys). Colobines have leaf-based diets. They digest
these leaves by bacterial fermentation in their foregut, and then use
lysozymes to break down the bacteria to extract energy from the
leaves. In Colobines, the lysozyme protein has evolved to work well in
the low-pH (acidic) environment of the stomach. Remarkably,
lysozymes in cows and Hoatzins (a leaf-eating bird, Kornegay et al.,
1994) have convergently evolved this activity via very similar
amino-acid changes at 5 key residues.
Figure 5.12: Abyssinian black-and-white colobus (Colobus guereza). A
member of the leaf-eating Colobines.
Brehm’s Tierleben, Brehm, A.E. 1893. Image Contributed by University
of Illinois Urbana-Champaign. Not in copyright.
Figure 5.13: Hoatzin (Opisthocomus hoazin). A leaf-eating bird.
A history of birds (1910) Pycraft, W.P.
Contributed by American Museum of Natural
History Library. Not in copyright.
The McDonald-Kreitman test. As noted above, a big issue with using
dN/dS to detect adaptation is that it is very conservative. For a more
powerful test of rapid divergence, what we need to do is adjust for
the level of constraint a gene experiences at non-synonymous sites.
One way to do this is to use polymorphism data as an internal
control. If we see little non-synonymous polymorphism at a gene, but a
lot of synonymous polymorphism, we now know that there is likely
strong constraint on the gene (i.e. high C), thus we expect dN/dS to
be low. McDonald and Kreitman (1991) devised a simple test
of the neutral theory of molecular evolution at a gene based on this
intuition (building on the conceptually similar HKA test, Hudson
et al., 1987). McDonald and Kreitman took the case where we
have polymorphism data at a gene for one species and divergence to
a closely related species. They partitioned polymorphism and fixed
differences in their sample into non-synonymous and synonymous
changes:
          Poly.    Fixed
Non-Syn.  PN       DN
Syn.      PS       DS
Ratio     PN/PS    DN/DS
Under neutral theory, we expect a smaller number of non-synonymous
to synonymous fixed differences (DN/DS < 1) and exactly the same
expectation holds for polymorphism (PN/PS). Let’s consider a gene
with LS and LN sites where synonymous and non-synonymous
mutations could arise respectively. We can think of the underlying gene
genealogy at our gene, see Figure 5.14, with the total time on the
coalescent genealogy within the species as Ttot and the total time for
fixed differences between our species as T′div. Then under neutrality
we expect µLN(1 − C)Ttot non-synonymous polymorphisms (i.e. our
number of segregating sites), and µLN(1 − C)T′div non-synonymous
fixed differences. We can then fill out the rest of our table as follows:
          Poly.             Fixed
Non-Syn.  µLN(1 − C)Ttot    µLN(1 − C)T′div
Syn.      µLS Ttot          µLS T′div
Ratio     LN(1 − C)/LS      LN(1 − C)/LS
Figure 5.14: An example gene genealogy for a set of alleles sampled
within a population and a single allele sampled from a distantly-related
species.
Therefore, we expect the ratio of non-synonymous to synonymous

changes to be the same for polymorphism and divergence under a
strict neutral model. We can test this expectation of equal ratios via
the standard tests of a 2 × 2 table. If the ratio of N /S is significantly
higher for divergence than polymorphism we have evidence that non-
synonymous substitutions are accumulating more rapidly than we
would predict given levels of constraint alone.
As an example of a McDonald-Kreitman (MK) table consider the work
of Frentiu et al. (2007) on the molecular evolution of L
Photopigment opsin in Admiral (Limenitis) butterflies, responsible for colour
vision in the long-wavelength part of the visual spectrum. Frentiu
et al. found that the sensitivity of this opsin had shifted towards blue
in L. archippus archippus (viceroy) compared to L.
arthemis astyanax. To test whether this molecular evolution reflected
positive selection they sequenced 24 L. arthemis astyanax individuals
and one L. archippus archippus sequence. They identified 11
polymorphic sites in L. arthemis astyanax and 16 fixed differences, which
break down as follows:
          Poly.   Fixed
Non-Syn.  2       12
Syn.      9       4
Ratio     2/9     3/1
Figure 5.15: White admiral (Limenitis arthemis) and Viceroy (Limenitis
archippus). Basilarchia is the old genus that these two species were
originally placed in. Viceroy and Monarch butterflies are Müllerian
mimics.
Field book of insects (1918). Lutz, F.E. Illustrations by Edna L.
Beutenmüller. Image from the Biodiversity Heritage Library.
Contributed by MBLWHOI Library. Not in copyright.
Note the strong excess of non-synonymous to synonymous
divergence compared to polymorphism (p-value of 0.006, Fisher’s exact
test), which is consistent with the gene evolving in an adaptive
manner among the two species. We would expect roughly only 3
non-synonymous substitutions out of 16 substitutions if the gene was
evolving neutrally (16 × 2/11).
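The Fisher's exact test on this table can be reproduced with a short
Python sketch that uses only the standard library. This is a minimal
implementation written for illustration (it is not the code used in
the original study or for this book); the function name is made up
here. It computes the two-sided p-value in the usual way, by summing
the hypergeometric probabilities of all tables, with the same row and
column totals, that are no more probable than the observed table.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]]."""
    row1, row2, col1 = a + b, c + d, a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # given the fixed row and column totals.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Small relative tolerance guards against floating-point ties.
    return sum(prob(k) for k in range(lo, hi + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: non-synonymous, synonymous. Columns: polymorphic, fixed.
p_value = fisher_exact_two_sided(2, 12, 9, 4)

# Expected non-synonymous fixed differences if the polymorphism ratio
# (2 of 11 polymorphisms non-synonymous) also held for divergence.
expected_ns_fixed = 16 * 2 / 11

print(round(p_value, 4), round(expected_ns_fixed, 2))
```

The p-value comes out at roughly 0.006, matching the value quoted
above, and the expected number of non-synonymous fixed differences is
about 2.9 compared to the 12 observed.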
6
Neutral Diversity and Population Structure.
How does genetic differentiation build up between closely related
populations? How does migration act to reduce differentiation? These
questions are key to understanding the conditions under which
populations can start to genetically diverge from each other. To answer these
questions we’ll first consider this in the context of neutral alleles, and
then return to think about selection and differentiation in later
chapters. We’ve considered neutral alleles drawn from a randomly-mating
population, and divergence among alleles drawn from two distantly-
related populations. We’ll now turn to consider divergence among
more closely related populations. In thinking about the coalescent
within populations we made the assumption that any pair of lineages
is equally likely to coalesce with each other. However, when there is
population structure this assumption is violated.
To develop models of population structure we'll use the statistic FST,
which we introduced in Section 3.0.1 when discussing how to summarize
population structure in allele frequencies. We have previously written
the measure of population structure FST as
\[
F_{ST} = \frac{H_T - H_S}{H_T} \tag{6.1}
\]
where HS is the probability that two alleles sampled at random from
a subpopulation differ, and HT is the probability that two alleles
sampled at random from the total population differ.
6.1 A simple population split model

Imagine a population of constant size of Ne diploid individuals that
T generations in the past split into two daughter populations
(subpopulations), each of size Ne individuals, which do not
subsequently exchange migrants. In the current day we sample an equal
number of alleles from both subpopulations.
Consider a pair of alleles sampled within one of our subpopulations
and think about their per site heterozygosity. These alleles have
experienced a population of size Ne and so the probability that they
differ is HS ≈ 4Ne µ (assuming that Ne µ ≪ 1, using our equation 4.12
for heterozygosity within a population).

Figure 6.1: Change in allele frequencies following a population split.
Code here.
The heterozygosity in our total population is a little more tricky
to calculate. Assuming that we equally sample both sub-populations,
when we draw two alleles from our total sample, 50% of the time
they are drawn from the same subpopulation and 50% of the time
they are drawn from different subpopulations. Therefore, our total
heterozygosity is given by
\[
H_T = \frac{1}{2} H_S + \frac{1}{2} H_B \tag{6.2}
\]
where HB is the probability that a pair of alleles drawn from our two
different subpopulations differ from each other. A pair of alleles from
different subpopulations cannot find a common ancestor with each other
for at least T generations into the past, as they are in distinct
populations (not connected by migration). Once our alleles find
themselves back in the combined ancestral population it takes them on
average 2Ne generations to coalesce. So the total opportunity for
mutation between our pair of alleles sampled from different populations
is 2(T + 2Ne) generations of meioses, such that the probability that
our pair of alleles differs is
\[
H_B \approx 2\mu (T + 2N_e) \tag{6.3}
\]
We can plug this into our expression for HT , and then that in turn
into FST . Doing so we find that
\[
F_{ST} \approx \frac{\mu T}{\mu T + 4 N_e \mu} = \frac{T}{T + 4 N_e} \tag{6.4}
\]
Note that µ cancels out of this equation. In this simple toy model,
FST increases because the amount of between-population diversity
increases with the divergence time of the two populations. While
T ≪ 4Ne, FST grows approximately linearly with time, FST ≈ T/(4Ne), so
that differentiation will be higher between populations separated by
long divergence times or with small effective population sizes.
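To see how these pieces fit together numerically, here is a minimal Python sketch of the split model. The function names and the parameter values are hypothetical, chosen only for illustration; the first function simply chains equations (6.1) to (6.3), and the second inverts equation (6.4) to recover a split time from an observed FST.

```python
def fst_split(T, Ne, mu):
    """F_ST under the simple split model, built up from H_S, H_B and H_T."""
    H_S = 4 * Ne * mu              # heterozygosity within a subpopulation
    H_B = 2 * mu * (T + 2 * Ne)    # heterozygosity between subpopulations, eqn (6.3)
    H_T = 0.5 * H_S + 0.5 * H_B    # total heterozygosity, eqn (6.2)
    return (H_T - H_S) / H_T       # eqn (6.1)

def split_time_from_fst(fst, Ne):
    """Invert F_ST = T / (T + 4 Ne) to give the split time T in generations."""
    return 4 * Ne * fst / (1 - fst)

# Hypothetical values, not taken from any data set.
Ne, mu = 10_000, 1e-8
for T in (1_000, 10_000, 100_000):
    # The two columns agree because mu cancels, as in eqn (6.4).
    print(T, round(fst_split(T, Ne, mu), 4), round(T / (T + 4 * Ne), 4))
```

Because the mutation rate cancels, only the ratio of the split time to the effective population size matters for FST in this model; converting a split time in generations to years needs a generation time as a further assumption.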
Question 1. The genome-wide FST between Bornean and Sumatran
orangutan species samples (Pongo pygmaeus and Pongo abelii) is ≈ 0.37
(Locke et al., 2011), representing a deep population split between the
species (potentially with little subsequent gene flow). Within the
populations the genome-wide average Watterson's θ is θW = 1.4 kb⁻¹,
estimated from the number of segregating sites. Assume a generation
time of 20 years, and a mutation rate of 2 × 10⁻⁸ per base per
generation. How far in the past did the two populations diverge?

Figure 6.2: Orangutan (Pongo). Brehms thierleben, allgemeine kunde des
thierreichs. Brehm, A. E. Image from the Biodiversity Heritage Library.
Contributed by MBLWHOI Library. Not in copyright.
6.2 A simple model of migration between an island and the mainland.
We can also use the coalescent to think about patterns of differen-
tiation under a simple model of migration-drift equilibrium. Let’s
consider a small island population that is relatively isolated from a
large mainland population, where both of these populations are con-
stant in size. We’ll assume that the expected heterozygosity for a pair
of alleles sampled on the mainland is HM .
Our island has a population size NI that is very small compared
to our mainland population. Each generation some low fraction m of
our individuals on the island have migrant parents from the mainland
the generation before. Our island may also send migrants back to the
mainland, but these are a drop in the ocean compared to the large
population size on the mainland and their effect can be ignored.
If we sample an allele on the island and trace its ancestral lin-
eage backward in time, each generation our ancestral allele has a low
probability m of being descended from the mainland in the preceding
generation (if we go back far enough the allele eventually has to be de-
scended from an allele on the mainland). The probability that a pair
of alleles sampled on the island are descended from a shared recent
common ancestral allele on the island is the probability that our pair
of alleles coalesces before either lineage migrates. For example, the
probability that our pair of alleles coalesces t + 1 generations back on
the island is
\[
\frac{1}{2N_I} (1-m)^{2(t+1)} \left(1 - \frac{1}{2N_I}\right)^{t} \approx \frac{1}{2N_I} \exp\left(-t\left(\frac{1}{2N_I} + 2m\right)\right), \tag{6.5}
\]
with the approximation following from assuming that m ≪ 1 and
1/(2NI) ≪ 1 (note that this is very similar to our derivation of
heterozygosity above). The probability that our alleles coalesce before
either one of them migrates off the island, irrespective of the time, is
\[
\int_0^{\infty} \frac{1}{2N_I} \exp\left(-t\left(\frac{1}{2N_I} + 2m\right)\right) dt = \frac{1/(2N_I)}{1/(2N_I) + 2m}. \tag{6.6}
\]
Let’s assume that the mutation rate is very low such that it is very
unlikely that the pair of alleles mutate before they coalesce on the
island. Therefore, the only way that the alleles can be different from
each other is if one or other of them migrates to the mainland, which
happens with probability
\[
1 - \frac{1/(2N_I)}{1/(2N_I) + 2m} \tag{6.7}
\]
Conditional on one or other of our alleles migrating to the mainland,
both of our alleles represent independent draws from the mainland and
so differ from each other with probability HM . Therefore, the level of
heterozygosity on the island is given by
\[
H_I = \left(1 - \frac{1/(2N_I)}{1/(2N_I) + 2m}\right) H_M \tag{6.8}
\]
So the reduction of heterozygosity on the island compared to the
mainland is
\[
F_{IM} = 1 - \frac{H_I}{H_M} = \frac{1/(2N_I)}{1/(2N_I) + 2m} = \frac{1}{1 + 4 N_I m}. \tag{6.9}
\]
The level of inbreeding on the island compared to the mainland will
be high if the migration rate is low and the effective population size
of the island is low, as allele frequencies on the island are drifting and
diversity on the island is not being replenished by migration. The key
parameter here is the number of individuals on the island replaced by
immigrants from the mainland each generation (NI m).
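As a rough check on the approximations used in this derivation, the Python sketch below compares equation (6.9) with the exact sum of the per-generation coalescence probabilities on the left-hand side of equation (6.5). The function names and the parameter values are hypothetical, made up for illustration.

```python
def f_island_approx(N_I, m):
    """Eqn (6.9): F_IM = 1 / (1 + 4 N_I m)."""
    return 1.0 / (1.0 + 4.0 * N_I * m)

def f_island_discrete(N_I, m):
    """Exact sum over t = 0, 1, 2, ... of the left-hand side of eqn (6.5).

    The terms (1/(2 N_I)) (1 - m)^(2(t+1)) (1 - 1/(2 N_I))^t form a
    geometric series, which is summed here in closed form.
    """
    c = 1.0 / (2.0 * N_I)          # per-generation coalescence probability
    stay = (1.0 - m) ** 2          # neither lineage migrates in a generation
    return c * stay / (1.0 - stay * (1.0 - c))

def migration_rate_from_f(F, N_I):
    """Invert eqn (6.9) to give m for a given F_IM and island size N_I."""
    return (1.0 / F - 1.0) / (4.0 * N_I)

N_I, m = 1_000, 0.001   # hypothetical island size and migration rate
print(f_island_approx(N_I, m), f_island_discrete(N_I, m))
print(migration_rate_from_f(f_island_approx(N_I, m), N_I))
```

The two versions should agree closely whenever m and 1/(2NI) are both small, which is the regime assumed in the text.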
We have framed this problem as being about the reduction in ge-
netic diversity on the island compared to the mainland. However, if we
consider collecting individuals on the island and mainland in propor-
tion to their population sizes, the total level of heterozygosity would
be HT = HM , as samples from our mainland would greatly outnum-
ber those from our island. Therefore, considering the island as our
sub-population, we have derived another simple model of FST .
Question 2. You are investigating a small river population of
sticklebacks, which receives infrequent migrants from a very large
marine population. At a set of putatively neutral biallelic markers the
freshwater population has frequencies:
0.2, 0.7, 0.8
at the same markers the marine population has frequencies:
0.4, 0.5 and 0.7.
From studying patterns of heterozygosity at a large collection of
markers, you have estimated the long term effective size of your
freshwater population is 2000 individuals.
What is your estimate of the migration rate from the marine
populations into the river?

Figure 6.3: Three-spined stickleback (Gasterosteus aculeatus). Jordan,
David Starr (1907) Fishes, New York City, NY: Henry Holt and Company.
Image from Wikimedia Commons. Public domain.
6.3 Incomplete Lineage Sorting

Finally, we turn to incomplete lineage sorting. Because it can take a
long time for a polymorphism to drift up or down in frequency, multiple
population splits may occur during the time an allele is still
segregating. This can lead to incongruence between the overall
population tree and the information about relationships present at
individual loci. In Figures 6.4 and 6.5 we show simulations of three
populations where the bottom population splits off from the other two
first, followed by the subsequent splitting of the top and the middle
populations. We start both simulations with a newly introduced red
allele being polymorphic in the combined ancestral population. The most
likely fate of this allele is that it is quickly lost from the
population, but sometimes the allele can drift up in frequency and be
polymorphic when the populations split, as the allele in our two
figures has done. If the allele is lost/fixed in the descendant
populations before the next population split, our allele configuration
will agree with the population tree, as it does in Figure 6.4, and so
too the gene tree will agree with the population tree (as shown in the
left side of Figure 6.6). However, if the allele persists as a
polymorphism in the ancestral population until the top and the middle
populations split, then the allele could fix in one of these
populations and not the other. Such an event leads to a substitution
pattern that disagrees with the population tree, as in Figure 6.5. If
we were to construct a phylogeny using the variation at this site we
would see a disagreement between the gene tree and population tree. In
Figure 6.5 alleles drawn from the top and the bottom populations are
necessarily more closely related to each other than either is to an
allele drawn from the middle population; tracing our allelic lineages
from the top and bottom populations back through time, they must
coalesce with each other before we reach the point where the red
mutation arose; in contrast, a lineage from the middle population
cannot have coalesced with either other lineage until past the time the
red mutation arose. An example of this ‘incomplete lineage sorting’ in
terms of the underlying tree is shown on the right side of Figure 6.6.

Figure 6.4: An example of alleles assorting among three populations
such that there is no incomplete lineage sorting. Code here.
(Horizontal axis: generations, from past to present.)

Figure 6.5: An example of alleles assorting among three populations
leading to incomplete lineage sorting. Code here.
(Horizontal axis: generations, from past to present.)

Figure 6.6: The population tree of three populations ((A, B), C) is
shown blocked out with black shapes. Two different coalescent trees
relating a single allele drawn from A, B, and C are shown with thinner
lines. (Labels: split times t1 and t2; tips A, B, C on each tree.)

A natural pedigree analogy to incomplete lineage sorting is the fact
that while two biological siblings are more closely related to each
other genealogically than either is to their cousin, at any given locus
one of the siblings can share an allele IBD with their cousin that they
do not share with their own sibling, due to the randomness of Mendelian
segregation down their pedigree. In these cases, the average
relatedness of the individuals/populations disagrees with the patterns
of relatedness at a particular locus.
As an empirical example of incomplete lineage sorting, let's consider
the work of Jennings and Edwards (2005), who sequenced a single allele
from three different species of Australian grass finches (Poephila):
two sister species of long-tailed finches (Poephila acuticauda and P.
hecki) and the black-throated finch (Poephila cincta, see Figure 6.7).
They collected sequence data for 30 genes, and constructed phylogenetic
gene trees at each of these loci, resulting in 28 well-resolved gene
trees. Sixteen of the gene trees showed P. acuticauda and P. hecki as
sisters, with P. cincta as the outgroup (the tree ((A,H),C)), while for
twelve genes the gene tree was discordant with the population tree: for
seven of their genes P. hecki fell as an outgroup to the other two and
at five P. acuticauda fell as an outgroup (the trees ((A,C),H) and
((H,C),A) respectively).

Figure 6.7: Banded Grass Finch (P. cincta). Illustration by Elizabeth
Gould. Birds of Australia, Gould J. 1840. CC BY 4.0, uploaded to Flickr
by rawpixel.com.

Let's use the coalescent to understand this discordance between
gene trees and species trees. Let’s assume that two sister populations
(A & B) split t1 generations in the past, with a deeper split from a
third outgroup population (C) t2 generations in the past. We’ll as-
sume that there’s no gene flow among our populations after each split.
We can trace back the ancestral lineages of our three alleles. The first
opportunity for the A & B lineages to coalesce is t1 generations ago.
If they coalesce with each other in their shared ancestral population
before t2 in the past (left side of Figure 6.6) their gene tree will def-
initely agree with the population tree. So the only way for the gene
tree to disagree with the population tree is for the A & B lineages to
fail to coalesce in their shared ancestral population between t1 and t2 ;
this happens with probability $(1 - 1/(2N))^{t_2 - t_1}$. We'll get a
discordant gene tree if A & B make it back to the shared ancestral
population with C without coalescing, and then one or the other of them
coalesces with the C lineage before they coalesce with each other. This
happens with probability 2/3, as at the first pairwise-coalescent event
there are three possible pairs of lineages that could coalesce, two of
which (A & C and B & C) result in a discordant tree. So the probability
that we get a coalescent tree that is discordant with the population
tree is
\[
\frac{2}{3} \left(1 - \frac{1}{2N}\right)^{t_2 - t_1}. \tag{6.10}
\]
This equation allows us to relate the fraction of loci showing incom-
plete lineage sorting to the population genetics parameters of the
ancestral population.
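The logic behind equation (6.10) can be sketched in a few lines of Python. Below, `prob_discordant` evaluates the formula directly, and `simulate_discordance` is a small Monte Carlo check that follows the A and B lineages back generation by generation. Both function names, and the deliberately small values of N and t2 − t1, are hypothetical choices made so that the simulation runs quickly.

```python
import random

def prob_discordant(N, dt):
    """Eqn (6.10): probability the gene tree disagrees with the population tree.

    N is the diploid size of the population ancestral to A and B, and
    dt = t2 - t1 is the number of generations between the two splits.
    """
    return (2.0 / 3.0) * (1.0 - 1.0 / (2.0 * N)) ** dt

def simulate_discordance(N, dt, reps, seed=1):
    """Monte Carlo estimate of the same probability."""
    rng = random.Random(seed)
    discordant = 0
    for _ in range(reps):
        # Do the A and B lineages coalesce in any of the dt generations
        # before the deeper split? Each generation they do so w.p. 1/(2N).
        coalesced = any(rng.random() < 1.0 / (2.0 * N) for _ in range(dt))
        if not coalesced:
            # Three lineages enter the oldest population. Each pair is equally
            # likely to coalesce first; (A,C) and (B,C) give discordant trees.
            if rng.choice(["AB", "AC", "BC"]) != "AB":
                discordant += 1
    return discordant / reps

N, dt = 50, 100   # hypothetical values
print(prob_discordant(N, dt), simulate_discordance(N, dt, reps=20_000))
```

Setting dt to zero recovers the upper limit of 2/3, the case where the two splits are effectively simultaneous.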
Question 3. Let's return to Jennings and Edwards's Australian grass
finches example. They estimated that the ancestral population size of
our two long-tailed finches was four hundred thousand. What is your
best estimate of the inter-speciation time, i.e. t2 − t1?
The fraction of loci showing ILS, eqn (6.10), depends on the time
between population splits (t2 − t1) relative to the population size
(N). Thus we should expect gene-tree population-tree discordance when
populations split in rapid succession and/or population sizes are
large.
Testing for gene flow. We often want to test whether gene flow has
occurred between populations. For example, we might want to es-
tablish a case that interbreeding between humans and Neanderthals
occurred or demonstrate that gene flow occurred after two populations
began to speciate. A broad range of methods have been designed to
test for gene flow and to estimate gene flow rates based on neutral
expectations. Here we'll briefly discuss just one method, based on some
simple coalescent ideas. Above we assumed that gene-tree
population-tree discordance was caused by incomplete lineage sorting
due to populations rapidly splitting. However, gene flow among populations can
also lead to gene-tree discordance. While both ILS and gene flow can
lead to discordance, under simplifying assumptions, ILS implies more
symmetry in how these discordances manifest themselves.
Figure 6.8: In both the left and right trees ILS has occurred between
our single lineages sampled from populations A, B, and C. Imagine that
population D is a somewhat distant outgroup such that the lineages from
A through C (nearly) always coalesce with each other before any
coalesce with D. The small dash on the branch indicates the mutation
A→B occurring, giving rise to the ABBA or BABA mutational pattern shown
at the bottom. (Figure labels: split times t1, t2, and t3; species 1 to
4 on each tree; mutational pattern A B B A on the left tree and
B A B A on the right tree.)
Take a look at Figure 6.8. In both cases the lineages from A and B
fail to coalesce in their initial shared ancestral population, and one
or the other of them coalesces with the lineage from C before they
coalesce with each other. Each option is equally likely; therefore the
mutational patterns ABBA and BABA are equally likely to occur under
ILS.¹
¹ Here we have to assume no structure in the ancestral population.
However, if gene flow occurs from population C into population B, in
addition to ILS the lineage from B can more recently coalesce with the
lineage from C, and so we should see more ABBAs than BABAs.
To test for this effect of gene flow, we can sample a sequence from
each of our 4 populations, count up the number of sites that show the
two mutational patterns consistent with the gene-tree discordance,
nABBA and nBABA, and calculate
\[
\frac{n_{ABBA} - n_{BABA}}{n_{ABBA} + n_{BABA}} \tag{6.11}
\]
This statistic will have expectation zero if the gene-tree discordance
is due to ILS, and will be skewed positive if gene flow occurred from C
into B, as this generates an excess of ABBAs (and skewed negative if
gene flow occurred from C into A).
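To make the counting concrete, here is a minimal Python sketch that tallies ABBA and BABA sites from four aligned sequences and evaluates equation (6.11). The function name `abba_baba` and the toy alignment are invented for illustration, and the outgroup (D) allele is assumed to be ancestral at every site. With this ordering of the numerator, an excess of ABBA sites gives a positive value.

```python
def abba_baba(seq_a, seq_b, seq_c, seq_d):
    """Count ABBA and BABA sites and return (n_abba, n_baba, statistic).

    seq_a, seq_b and seq_c are aligned sequences from populations A, B and C;
    seq_d is from the outgroup D, whose allele is treated as ancestral.
    """
    n_abba = n_baba = 0
    for a, b, c, d in zip(seq_a, seq_b, seq_c, seq_d):
        if len({a, b, c, d}) != 2:
            continue        # keep only biallelic sites
        if c == d:
            continue        # C must carry the derived allele
        if b == c and a == d:
            n_abba += 1     # pattern A B B A: B and C share the derived allele
        elif a == c and b == d:
            n_baba += 1     # pattern B A B A: A and C share the derived allele
    total = n_abba + n_baba
    stat = (n_abba - n_baba) / total if total else float("nan")
    return n_abba, n_baba, stat

# Toy alignment of eight sites, invented purely for illustration.
pop_a = "AAAGAGAC"
pop_b = "AGGAGGAT"
pop_c = "AGGGGAGT"
pop_d = "AAAAAAAC"
n_abba, n_baba, d_stat = abba_baba(pop_a, pop_b, pop_c, pop_d)
print(n_abba, n_baba, d_stat)
```

In practice one would also need a measure of uncertainty on the statistic, since neighbouring sites share genealogies and so are not independent; that is beyond this sketch.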
7
Phenotypic Variation and the Resemblance Between Relatives
The distinction between genotype and phenotype is one of the most
useful ideas in Biology.¹ The genotype of an individual (the genome),
for most purposes, is decided when the gametes fuse to form a zygote
(individual). The phenotype of an individual represents any measurable
aspect of an organism.
¹ Johannsen, W., 1911 The Genotype Conception of Heredity. The American
Naturalist 45(531): 129–159
From your height, to the amount of RNA transcribed from a given gene,
to what you ate last Tuesday: all of these are phenotypes. Nearly any
phenotype we can choose to measure about an organism represents the
outcome of the information encoded by their genome played out through
incredibly complicated developmental, physiological, and/or behavioural
processes that in turn interact with a myriad of environmental and
stochastic factors. Honestly it boggles the mind how organisms work as
well as they do, let alone that I managed to eat lunch last Tuesday.

Figure 7.1: European aspen P. tremula. Der baum. H. Schacht. 1860.
Image from the Biodiversity Heritage Library. Contributed by The
Library of Congress. Not in copyright.
There are many different ways to think about studying the path from
genotype through to phenotype. The one we will take here is to think
about how phenotypic variation among individuals in a population arises
as a result of genetic variation in the population. One simple way to
measure this genotype-phenotype relationship is to calculate the
phenotypic mean for each genotype at a locus. For example, Wang et al.
(2018) explored the genetic basis of budset time in European aspen
(Populus tremula); the effect of one specific SNP on that phenotype is
shown in Figure 7.2. Budset timing is a key trait underlying local
adaptation to varying growing season length. The associated SNP falls
in a gene (PtFT2) that is known to play a strong role in flowering time
regulation in other plants.
Figure 7.2: The effect of a flowering time gene (PtFT2) SNP on budset
time in European aspen. Each dot gives the genotype-phenotype
combination for an individual. The horizontal lines give the budset
mean for each genotype and the vertical lines show the inter-quartile
range. The dotted line gives the linear regression of phenotype on
genotype. Thanks to Pär Ingvarsson for sharing these data from Wang et
al. (2018). (Axes: PtFT2 genotype, TT, GT, or GG, against budset in
days.)

One way for us to assess the relationship between genotype and
phenotype is to find the best-fitting straight line through the data,
i.e. fit a linear regression of phenotypes for our individuals on their
genotypes at a particular SNP (l):
\[
X \sim \mu + a_l G_l \tag{7.1}
\]
In the equation above, X is a vector of the phenotypes of a set of
individuals and Gl is our vector of genotypes at locus l, with Gi,l
taking the value 0, 1, or 2 depending on whether our individual i is
homozygote, heterozygote, or the alternate homozygote at our locus of
interest. Here µ is our phenotypic mean. The slope of this regression
line (al) has the interpretation of being the average effect of
substituting a copy of allele 2 for a copy of allele 1. In our Aspen
example the slope is −13.6, i.e. swapping a single T for a G allele
moves the budset forward by 13.6 days, such that the GG homozygote is
predicted to set buds 27.2 days earlier than the TT homozygote.
(We'll encounter linear regressions at various points during the next
few chapters, see the math appendix around eqn A.43 for more background
details.)
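As a small illustration of how such a slope is obtained, the Python sketch below fits the least-squares line of phenotype on genotype, using the fact that the slope is the covariance of genotype and phenotype divided by the variance of genotype. The function name and the data are made up for illustration; they are not the aspen measurements. Note that with genotypes coded 0, 1, 2, the fitted intercept is the predicted phenotype of the genotype coded 0; it equals the phenotypic mean only if the genotypes are first centred on their mean.

```python
def regress_phenotype_on_genotype(genotypes, phenotypes):
    """Least-squares fit of phenotype on genotype (coded 0, 1 or 2).

    Returns (intercept, slope); the slope plays the role of a_l in eqn (7.1).
    """
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_x = sum(phenotypes) / n
    cov_gx = sum((g - mean_g) * (x - mean_x)
                 for g, x in zip(genotypes, phenotypes)) / n
    var_g = sum((g - mean_g) ** 2 for g in genotypes) / n
    slope = cov_gx / var_g
    intercept = mean_x - slope * mean_g
    return intercept, slope

# Invented budset-like data (days) for ten hypothetical individuals.
genotypes = [0, 0, 1, 1, 1, 2, 2, 0, 1, 2]
phenotypes = [231.0, 228.5, 217.0, 215.5, 219.0, 203.0, 205.5, 229.0, 216.0, 204.0]
intercept, slope = regress_phenotype_on_genotype(genotypes, phenotypes)
print(round(intercept, 2), round(slope, 2))
```

A negative slope here would be read in the same way as in the aspen example: each extra copy of the counted allele moves the predicted phenotype earlier by the size of the slope.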
As a measure of the significance of this genotype-phenotype rela-
tionship, we can calculate the p-value of our regression. To try and
identify loci that are associated with our trait genome-wide, we can
conduct this regression at each SNP we genotype in the genome.
One common way to display the results of such an analysis (called a
genome-wide association study, or GWAS for short) is to plot the
negative logarithm of the p-value for each SNP along the genome (a
so-called Manhattan plot), so that more significant SNPs sit higher on
the plot. Here's one from Wang et al. (2018) for their Aspen budset
phenotype:
Figure 7.3: Manhattan plot of the p-value of the linear association
between genotype and budset in Aspen. Each dot represents the test at a
single SNP, plotted at its physical coordinate in the genome. Different
chromosomes are plotted in alternating colours. The SNPs surrounding
the PtFT2 gene are shown in red. From Wang et al. (2018), licensed
under CC BY 4.0.

The SNP with the most significant p-value is the PtFT2 SNP. Note that
other SNPs in the surrounding region also light up as showing a
significant association with budset timing. This is because loci that
are in LD with a functional locus may in turn show an association, not
because they directly affect the phenotype, but simply because the
genotypes at the two loci are themselves non-randomly associated. Below
is a zoomed in version (Figure 2 in Wang et al. (2018)) with SNPs
coloured by the strength of their LD with the putatively functional
SNP. Note how SNPs in strong LD with the functional allele (redder
points) have more significant p-values.
Figure 7.4: The Manhattan plot zoomed in on the top-hit (red SNPs from
Figure 7.3). SNPs are now coloured by their D′ value with the most
significant SNP. D′ is the LD covariance between a pair of loci (D, eqn
(3.16)) normalized by the largest value D can take given the allele
frequencies. Figure from Wang et al.

Variation in some traits seems to have a relatively simple genetic
basis. In our Aspen example there is one clear large-effect locus,
which explains 62% of the variation in budset. Note that even in this
case, where we have an allele with a very strong effect on a phenotype,
this is not an allele for budset, nor is PtFT2 a gene for budset. It is
an allele that is associated with budset in the sampled environments
and populations. In a different set of environments, this allele's
effects may be far smaller, and a different set of alleles may
contribute to phenotypic variation. PtFT2, the gene our focal SNP falls
close to, is just one of many genes and molecular pathways involved in
budset. A mutant screen for budset may uncover many genes with larger
effects; this gene is just a locus that happens to be polymorphic in
this particular set of genotyped individuals.

“All that we mean when we speak of a gene [allele] for pink eyes is, a
gene which differentiates a pink eyed fly from a normal one —not a gene
[allele] which produces pink eyes per se, for the character pink eyes
is dependent on the action of many other genes.” - Sturtevant (1915)
While phenotypic variation for some phenotypes has a relatively
simple genetic basis, many phenotypes are likely much more genetically
complex, involving the functional effect of many alleles at hundreds
or thousands of polymorphic loci. For example, hundreds of
small effect loci affecting human height have been mapped in Euro-
pean populations to date. Such genetically complex traits are called
polygenic traits.
In this chapter, we will use our understanding of the sharing of
alleles between relatives to understand the phenotypic resemblance
between relatives in quantitative phenotypes. This will allow us to
understand the contribution of genetic variation to phenotypic varia-
tion. In the next chapter, we will then use these results to understand
the evolutionary change in quantitative phenotypes in response to
selection.
7.0.1 A simple additive model of a trait

Let’s imagine that the genetic component of the variation in our trait
is controlled by L autosomal loci that act in an additive manner. The
frequency of allele 1 at locus l is pl , with each copy of allele 1 at this
locus increasing your trait value by al above the population mean.
The phenotype of an individual, let’s call her i, is Xi . Her genotype
at SNP l is Gi,l . Here Gi,l = 0, 1, or 2, representing the number of
copies of allele 1 she has at this SNP. Her expected phenotype, given
her genotype at all L SNPs, is then
\[
E(X_i \mid G_{i,1}, \cdots, G_{i,L}) = \mu + X_{A,i} = \mu + \sum_{l=1}^{L} G_{i,l} a_l \tag{7.2}
\]
where µ is the mean phenotype in our population, and XA,i is the
deviation away from the mean phenotype due to her genotype. Now
in reality the phenotype is a function of the expression of those alleles
in a particular environment. Therefore, we can think of this expected
phenotype as being an average across a set of environments that occur
in the population.
When we measure our individual’s observed phenotype we see
\[
X_i = \mu + X_{A,i} + X_{E,i} \tag{7.3}
\]
where XE is the deviation from the mean phenotype due to the
environment. This XE includes the systematic effects of the environment
our individual finds herself in and all of the noise during
development, growth, and the various random insults that life throws at
our individual. If a reasonable number of loci contribute to variation
in our trait then we can approximate the distribution of XA,i by a
normal distribution due to the central limit theorem (see Figure
7.5).² Thus if we can approximate the distribution of the effect of
environmental variation on our trait (XE,i) also by a normal
distribution, which is reasonable as there are many small environmental
effects, then the distribution of phenotypes within the population (Xi)
will be normally distributed (see Figure 7.5).
² The central limit theorem is discussed briefly in the math appendix
section A.2.5.
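The convergence towards a normal distribution can be explored with a short simulation. The Python sketch below is one possible way to simulate phenotypes under equations (7.2) and (7.3); it is not the code used for the book's figures. It assumes that every locus has the same allele frequency and the same effect size, with the effect size scaled so that the additive genetic contributions across loci have a total variance of `V_A` (using the binomial variance 2p(1 − p) of the genotype at each locus). The function name and default values are hypothetical.

```python
import random
from statistics import pvariance

def simulate_phenotypes(n_individuals, n_loci, V_A=1.0, V_E=0.05, p=0.5, seed=1):
    """Simulate phenotypes, as deviations from the mean, under an additive model.

    All loci share allele frequency p and effect size a, with a chosen so
    that the additive genetic variance summed over loci equals V_A.
    """
    rng = random.Random(seed)
    a = (V_A / (n_loci * 2 * p * (1 - p))) ** 0.5
    phenotypes = []
    for _ in range(n_individuals):
        # Genotype at each locus: copies of allele 1 among two random draws,
        # i.e. Hardy-Weinberg proportions. Summed here over all loci.
        copies = sum((rng.random() < p) + (rng.random() < p)
                     for _ in range(n_loci))
        x_a = a * (copies - 2 * p * n_loci)   # additive genetic deviation
        x_e = rng.gauss(0.0, V_E ** 0.5)      # environmental deviation
        phenotypes.append(x_a + x_e)
    return phenotypes

for n_loci in (1, 4, 10):
    x = simulate_phenotypes(20_000, n_loci)
    # The variance should sit near V_A + V_E whatever the number of loci.
    print(n_loci, round(pvariance(x), 3))
```

Plotting a histogram of each simulated sample (with any plotting library) should show distinct genotype peaks when there are few loci and little environmental noise, smoothing towards a bell shape as the number of loci grows.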
Figure 7.5: The convergence of the phenotypic distribution to a normal
distribution. Each of the three histograms shows the distribution of
the phenotype in a large sample, for increasingly large numbers of loci
(L = 1, 4, and 10, with the additive genetic variance held at VA = 1).
I have simulated each individual's phenotype following equations 7.2
and 7.3. Specifically, we've simulated each individual's biallelic
genotype at L loci, assuming Hardy-Weinberg proportions and that the
allele is at 50% frequency. We assume that all of the alleles have
equal effects and combine them additively together. We then add an
environmental contribution, which is normally distributed with variance
0.05. Note that in the left two pictures you can see peaks
corresponding to different genotypes due to our low environmental noise
(in practice we can rarely see such peaks for real quantitative
phenotypes). Code here.

Note that, as this is an additive model, we can decompose eqn. 7.3 into
the effects of the two alleles at each locus and rewrite it as
\[
X_i = \mu + X_{iM} + X_{iP} + X_{iE} \tag{7.4}
\]
where XiM and XiP are the contributions to the phenotype of the alleles
that our individual received from her mother (maternal alleles) and
father (paternal alleles) respectively. This will come in handy in just
a moment when we start thinking about the phenotypic covariance of
relatives.
Now obviously this model seems silly at first sight, as alleles don't
only act in an additive manner: they interact with alleles at the same
locus (dominance) and at different loci (epistasis). Later we'll relax
this assumption; however, we'll find that if we are interested in
evolutionary change over short time-scales it is actually only the
“additive component” of genetic variation that will (usually) concern
us. We will define this more formally later on, but for the moment we
can offer the intuition that parents only get to pass a single allele
at each locus on to the next generation. As such, it is the effect of
these transmitted alleles, averaged over possible matings, that is an
individual's average contribution to the next generation (i.e. the
additive effect of the alleles that their genotype consists of).
7.0.2 Additive genetic variance and heritability

As we are talking about an additive genetic model, we’ll talk about
the additive genetic variance (VA ), the phenotypic variance due to the
additive effects of segregating genetic variation. This is a subset of the
total genetic variance if we allow for non-additive effects.
The variance of our phenotype across individuals (V ) we can write
as
\[
V = \mathrm{Var}(X_A) + \mathrm{Var}(X_E) = V_A + V_E \tag{7.5}
\]
In writing the phenotypic variance as a sum of the additive and
environmental contributions, we are assuming that there is no
covariance between XA,i and XE,i, i.e. there is no covariance between
genotype and environment.³
³ In this section we're making use of the properties of the variance of
a random variable, see math appendix eqn (A.25).
Our additive genetic variance can be written as
\[
V_A = \sum_{l=1}^{L} \mathrm{Var}(G_{i,l} a_l) \tag{7.6}
\]
where $\mathrm{Var}(G_{i,l} a_l)$ is the contribution of locus $l$ to the
additive variance among individuals. Assuming random mating, and that
our loci are in linkage equilibrium, we can write our additive genetic
variance as
\[ V_A = \sum_{l=1}^{L} a_l^2 \, 2 p_l (1 - p_l) \tag{7.7} \]
where the $2 p_l (1 - p_l)$ term follows from the binomial sampling of two
alleles per individual at each locus. [4]
[4] These results follow from the properties of variance in math
appendix eqn (A.25).
Question 1. You have two biallelic SNPs contributing to variance
in human height. At the first SNP you have an allele with an additive
effect of 5cm which is found at a frequency of 1/10,000. At the second
SNP you have an allele with an additive effect of −0.5cm segregating
at 50% frequency. Which SNP contributes more to the additive
genetic variance? Explain the intuition of your answer.
An example of the additive basis of variation using polygenic scores.

Now we don’t usually get to see the individual loci contributing to
highly polygenic traits. Instead, we only get to see the distribution
of the trait in the population. However, with the advent of GWAS in
human genetics we can see some of the underlying genetics using the
many trait-associated loci identified to date. Using the estimated ef-
fect sizes at each locus, each one of which is tiny, we can calculate the
weighted sum over an individual’s genotype as in equation 7.2. This
weighted sum is called the individual’s polygenic score. To illustrate
how polygenic scores work, we can take a set of 1700 SNPs [5]. The
effects of these SNPs are tiny; the median absolute additive effect size
is 0.07cm. Figure 7.6 shows the distribution of a thousand individuals’
polygenic scores calculated using these 1700 SNPs (simulated genotypes
using the UKBB frequencies). The standard deviation of these
polygenic scores is ∼ 2cm. The individuals with higher polygenic scores
for height are predicted to be taller than the individuals with lower
polygenic scores.
[5] Each chosen as the SNP with the strongest signal of association with
height in 1700 roughly independent bins spaced across the genome.
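As a rough illustration of the polygenic score as a weighted sum, and of equation (7.7), the sketch below simulates genotypes under random mating and linkage equilibrium and compares the empirical variance of the scores with the sum of $a_l^2 \, 2p_l(1-p_l)$. The effect sizes and allele frequencies are made up for illustration (they are not the UK Biobank values behind Figure 7.6), and the function names are hypothetical.

```python
import random

def additive_variance(effects, freqs):
    """Equation (7.7): sum over loci of a_l^2 * 2 p_l (1 - p_l)."""
    return sum(a * a * 2.0 * p * (1.0 - p) for a, p in zip(effects, freqs))

def simulate_polygenic_scores(effects, freqs, n_individuals, rng):
    """Draw two alleles per locus (random mating, linkage equilibrium)
    and return each individual's weighted sum of allele counts."""
    scores = []
    for _ in range(n_individuals):
        score = 0.0
        for a, p in zip(effects, freqs):
            genotype = (rng.random() < p) + (rng.random() < p)  # 0, 1 or 2 copies
            score += a * genotype
        scores.append(score)
    return scores

def sample_variance(values):
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values) / (len(values) - 1)

# Made-up effect sizes (cm) and allele frequencies for 200 loci.
rng = random.Random(42)
n_loci = 200
effects = [rng.gauss(0.0, 0.1) for _ in range(n_loci)]
freqs = [rng.uniform(0.05, 0.95) for _ in range(n_loci)]

scores = simulate_polygenic_scores(effects, freqs, 2000, rng)
expected_va = additive_variance(effects, freqs)
empirical_va = sample_variance(scores)
print(f"V_A from eq. (7.7): {expected_va:.3f}; empirical variance: {empirical_va:.3f}")
```

With a couple of thousand simulated individuals the two numbers should agree to within sampling noise; the agreement relies on the loci being simulated independently.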
Figure 7.6: Left) The distribution of the number of height-increasing
alleles that individuals carry at 1700 SNPs associated with height in
the UK Biobank, for a sample of 1000 individuals. Right) The
distribution of the polygenic scores for these 1000 individuals.
Plotted on top is a normal distribution with the same mean and
variance. The empirical variance of these polygenic scores is 0.13,
the additive genetic variance calculated by equation (7.7) is 0.135,
so the two are in good agreement. Code here.
[Plot axes: left panel, x: Number of alleles assoc. with increased
height, y: frequency; right panel, x: Height Polygenic Score (cm),
y: frequency.]
The narrow sense heritability. We would like a way to think about
what proportion of the variation in our phenotype across individuals
is due to genetic differences as opposed to environmental differences.
Such a quantity will be key in helping us think about the evolution of
phenotypes. For example, if variation in our phenotype had no genetic
basis, then no matter how much selection changes the mean phenotype
within a generation the trait will not change over generations.
We’ll call the proportion of the variance that is genetic the
heritability, and denote it by $h^2$. We can then write heritability as
\[ h^2 = \frac{\mathrm{Var}(X_A)}{V} = \frac{V_A}{V} \tag{7.8} \]
Remember that we are thinking about a trait where all of the alleles
act in a perfectly additive manner. In this case our heritability h2
is referred to as the narrow sense heritability, the proportion of the
variance explained by the additive effect of our loci. When we allow
dominance and epistasis into our model, we’ll also have to define the
broad sense heritability (the total proportion of the phenotypic vari-
ance attributable to genetic variation).
The narrow sense heritability of a trait is a useful quantity; indeed
we’ll see shortly that it is exactly what we need to understand the
evolutionary response to selection on a quantitative phenotype. We
can calculate the narrow sense heritability by using the resemblance
between relatives. For example, if the phenotypic differences between
individuals in our population were solely determined by environmental
differences experienced by these different individuals, we should not
expect relatives to resemble each other any more than random individ-
uals drawn from the population. Now the obvious caveat here is that
relatives also share an environment, so may resemble each other due to
shared environmental effects.
Note that the heritability is a property of a sample from the
population in a particular set of environments at a particular time.
Changes in the environment may change the phenotypic variance.
Changes in the environment may also change how our alleles are
expressed through development and so change $V_A$. Thus estimates
of heritability are not, in general, transferable across environments
or populations.
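To see how strongly the heritability depends on the environment, here is a minimal sketch of equations (7.5) and (7.8), assuming a purely additive trait with no genotype-environment covariance. The function name is hypothetical, and the three values of $V_E$ are simply those used in the panels of Figure 7.12.

```python
def narrow_sense_heritability(v_a, v_e):
    """Equations (7.5) and (7.8): h^2 = V_A / (V_A + V_E)."""
    return v_a / (v_a + v_e)

# Hold the additive genetic variance fixed and vary the environmental variance.
for v_e in (100.0, 1.0, 0.001):
    h2 = narrow_sense_heritability(1.0, v_e)
    print(f"V_A = 1, V_E = {v_e}: h^2 = {h2:.4f}")
```

The same $V_A$ gives a heritability near zero in a very noisy environment and near one in a very uniform one, which is why an estimate from one setting need not carry over to another.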
7.0.3 The covariance between relatives

People have long been fascinated by the resemblance between rela-
tives, particularly twins (see Figure 7.7). Families hold a special place
in quantitative genetics, as remarkably we can use the resemblance
between relatives to directly estimate the heritability and covariance
of traits. To see this we can calculate the covariance in phenotype
between two individuals (1 and 2) who have phenotypes X1 and X2
respectively. [6] To think about this, imagine plotting the phenotypes
of, say, sisters against each other. The x and y coordinates of each
point will be the, say, heights of the pair of siblings. Do tall women
tend to have tall sisters, do short women tend to have short sisters?
How much do their phenotypes covary? If some of the variation in our
phenotype is genetic we expect identical twins to resemble each other
more than full siblings, who in turn will resemble each other more than
half-sibs and so on out (see Figure 7.8). Under our simple additive
model of phenotypes we can write the covariance as
\[ \mathrm{Cov}(X_1, X_2) = \mathrm{Cov}\big( (X_{1M} + X_{1P} + X_{1E}), (X_{2M} + X_{2P} + X_{2E}) \big) \tag{7.9} \]
We can expand this out in terms of the covariance between the various
components in these sums.
[6] We’ll be dealing with covariance a lot this chapter, see math
appendix section A.2.5 for more background.
Figure 7.7: The Cholmondeley Ladies.

Unknown British Painter, circa 1600.
Inscription on bottom left of the
painting “Two Ladies of the Chol-
mondeley Family, Who were born the
same day, Married the same day, And
brought to Bed the same day.” The
ladies are thought to be twin sisters,
but there’s a clue that they’re not
identical twins. Can you spot it?
Image from Wikimedia, considered public do-
main in the United States, UK Tate ©Creative
Commons CC-BY-NC-ND (3.0 Unported)
To make our task easier, we will make two commonly made assumptions:
1. We can ignore the covariance of the environments between
individuals (i.e. $\mathrm{Cov}(X_{1E}, X_{2E}) = 0$).
2. We can ignore the covariance between the environment of one
individual and the genetic variation in another individual (i.e.
$\mathrm{Cov}(X_{1E}, (X_{2M} + X_{2P})) = 0$). (We can actually incorporate
these effects later if we choose to.)
The failure of these assumptions to hold can undermine our estimates
of heritability, but we’ll return to that later. Moving forward
with these assumptions, we can simplify our original expression above
and write our phenotypic covariance between our pair of individuals as
\[ \mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(X_{1M}, X_{2M}) + \mathrm{Cov}(X_{1M}, X_{2P}) + \mathrm{Cov}(X_{1P}, X_{2M}) + \mathrm{Cov}(X_{1P}, X_{2P}) \tag{7.10} \]
This equation is saying that, under our simple additive model, we can
see the covariance in phenotypes between individuals as the covariance
between the maternal and paternal allelic effects in our individuals.
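For readers who want the intermediate step, the expansion of equation (7.9) by the bilinearity of covariance can be sketched as follows. The grouping of terms is our own; the environmental terms are the ones set to zero by the two assumptions above (with assumption 2 applied in both directions).

```latex
\begin{align*}
\mathrm{Cov}(X_1, X_2) ={}& \mathrm{Cov}(X_{1M}, X_{2M}) + \mathrm{Cov}(X_{1M}, X_{2P})
   + \mathrm{Cov}(X_{1P}, X_{2M}) + \mathrm{Cov}(X_{1P}, X_{2P}) \\
 &+ \underbrace{\mathrm{Cov}(X_{1M} + X_{1P},\, X_{2E})
   + \mathrm{Cov}(X_{1E},\, X_{2M} + X_{2P})}_{=\,0 \text{ by assumption 2}}
   + \underbrace{\mathrm{Cov}(X_{1E}, X_{2E})}_{=\,0 \text{ by assumption 1}}
\end{align*}
```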
We can use our results about the sharing of alleles between relatives to
obtain these covariance terms. But before we write down the general
case, let’s quickly work through some examples.
Figure 7.8: Covariance of phenotypes between pairs of individuals of a
given relatedness. Each point gives the phenotypes of a different pair
of individuals. The additive genetic variance is held constant at
$V_A = 1$, such that the expected covariances ($2F_{1,2}V_A$) should be
1, 0.5, 0.25, and 0.125 respectively, in good agreement with the
empirical covariances reported in the title of each graph. The data
were simulated as described in the caption of Figure 7.5. The blue line
shows x = y and the red line shows the best fitting linear regression
line. Code here.
[Panel titles: Id. Twins Cov= 0.979; Full Sibs Cov= 0.443; 1/2 Sibs
Cov= 0.246; 1st Cousins Cov= 0.101. Axes: x, Ind 1’s phenotype; y,
Ind 2’s phenotype.]
The covariance between identical twins. Let’s first consider the case
of a pair of identical twins, monozygotic twins, from two unrelated
parents. Our pair of twins share their maternal and paternal allele
identical by descent ($X_{1M} = X_{2M}$ and $X_{1P} = X_{2P}$). As their
maternal and paternal alleles are not correlated draws from the
population, i.e. have no probability of being IBD as we’ve said the
parents are unrelated, the covariance between their effects on the
phenotype is zero (i.e. $\mathrm{Cov}(X_{1P}, X_{2M}) = \mathrm{Cov}(X_{1M}, X_{2P}) = 0$).
In that case, eqn. 7.10 is
\[ \mathrm{Cov}(X_1, X_2) = \mathrm{Cov}(X_{1M}, X_{2M}) + \mathrm{Cov}(X_{1P}, X_{2P}) = 2\,\mathrm{Var}(X_{1M}) = V_A \tag{7.11} \]
To calculate the narrow sense heritability we could then in principle
divide the covariance of our pairs of MZ twins ($MZ_1$ and $MZ_2$) by the
trait variance to give
\[ h^2 = \frac{\mathrm{Cov}(MZ_1, MZ_2)}{V} = \rho_{MZ} \tag{7.12} \]
where $\rho_{MZ}$ is the correlation of pairs of MZ twins (see Appendix eqn
(A.42) for more on correlations). For example, we could estimate the
heritability of a measure of body fat from the MZ correlation in Figure
7.9. In general, this simple estimator isn’t great as the correlation of
identical twins includes the effects of the shared family environment of
the twins (i.e. $\mathrm{Cov}(X_{1E}, X_{2E})$).
Moreover, it can be inflated by non-additive effects as identical
twins don’t just share alleles, they share their entire genotypes, and
thus resemble each other in phenotype also because of shared
dominance effects (we’ll discuss non-additive effects in Section 7.1.1).
Figure 7.9: A measure of body fat in pairs of Monozygotic (MZ) and
Dizygotic (DZ) twins. Our sample correlations are $\hat{\rho}_{MZ} = 0.72$
and $\hat{\rho}_{DZ} = 0.10$. Data from Faith et al. (1999), Code here.
[Plot axes: x, Twin 1 (PBF); y, Twin 2 (PBF). Legend: MZ, DZ.]
Better twin-based estimates of heritability, based on the comparison
of MZ and DZ twins, are commonly used and bypass some of these issues.
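To make the concern about shared environment concrete, here is a sketch of what the MZ correlation estimates under the additive model if we drop the assumption that the twins’ environments are uncorrelated (while still assuming no genotype-environment covariance). It follows from expanding equation (7.9) for MZ twins and is meant as an illustration, not as an estimator used in the text.

```latex
\[
\rho_{MZ} = \frac{\mathrm{Cov}(X_1, X_2)}{V}
          = \frac{V_A + \mathrm{Cov}(X_{1E}, X_{2E})}{V}
          = h^2 + \frac{\mathrm{Cov}(X_{1E}, X_{2E})}{V}
\]
```

Any positive environmental covariance between twins therefore pushes $\rho_{MZ}$ above $h^2$.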
The covariance in phenotype between parent and child. Children
resemble their biological parents because children inherit their genome
from their parents (putting aside shared environments for the
moment). If a mother and father are unrelated individuals, i.e. are two
random draws from the population, then this mother and her child
share one allele IBD at each locus (i.e. $r_1 = 1$ and $r_0 = r_2 = 0$).
Let’s assume that our mother (ind 1) transmits her paternal allele to
the child (ind 2), in which case $X_{1P} = X_{2M}$, and so
$\mathrm{Cov}(X_{1P}, X_{2M}) = \mathrm{Var}(X_{1P}) = \frac{1}{2} V_A$, and all the other
covariances in eqn. 7.10 are zero. We’d also arrive at this result if
instead we had thought of the mother transmitting her own maternal
allele. Thus $\mathrm{Cov}(X_1, X_2) = \frac{1}{2} V_A$, and we can leverage this
form of the covariance to directly estimate $h^2$ by regression.
We can estimate the narrow sense heritability through the regression
of child’s phenotype on the parental mid-point phenotype. The
parental mid-point phenotype is simply the average of the mum and
dad’s phenotype. See Figure 7.11 for an example from Song sparrows.
Figure 7.10: Song sparrow (Melospiza melodia). “He is the most
incurable optimist of my acquaintance”. Bird biographies (1923). Ball,
A.E. illustrations by Horsfall R.B. Image from the Biodiversity
Heritage Library. Contributed by
Not in copyright.
Figure 7.11: Parent-midpoint offspring regression for beak depth and
tarsus length in song sparrows. The phenotypes have been standardized
to have mean 0 and variance 1. The red line shows the best fitting
line, whose slope is reported on the graph. Note that Smith and Zach
(1979) regressed the average offspring phenotype for each family on
parental mid-point ($X_{avg. kid} \sim X_{mid}$), as they had multiple
offspring per family. However, this doesn’t change the slope of the
regression from the form given by eqn (7.13). The grey line is the
x = y line. Data from Smith and Zach (1979), Code here.
[Panel titles: Slope= 0.43 (beak depth); Slope= 0.3 (tarsus length).
Axes: x, Mid-parent Beak Depth or Mid-parent Tarsus Length; y,
Offspring Mean Beak Depth or Offspring Mean Tarsus Length.]
We denote the child’s phenotype by $X_{kid}$ and mid-point phenotype
by $X_{mid}$, so that if we take the regression $X_{kid} \sim X_{mid}$ this
regression has slope $\beta = \mathrm{Cov}(X_{kid}, X_{mid}) / \mathrm{Var}(X_{mid})$. The
covariance is $\mathrm{Cov}(X_{kid}, X_{mid}) = \frac{1}{2} V_A$, and
$\mathrm{Var}(X_{mid}) = \frac{1}{2} V$, as by taking the average of the two
parents (whose phenotypes are uncorrelated under random mating) we
have halved the variance, such that the slope of the regression is
\[ \beta_{mid,kid} = \frac{\mathrm{Cov}(X_{kid}, X_{mid})}{\mathrm{Var}(X_{mid})} = \frac{V_A}{V} = h^2 \tag{7.13} \]
i.e. the regression of the child’s phenotype on the parental midpoint
phenotype is an estimate of the narrow sense heritability. [7] If much of
the phenotypic variation is due to the (additive) differences in
genotypes among individuals ($h^2 \approx 1$), then children will closely
resemble their parents. Conversely if much of the variation is
environmental ($h^2 \approx 0$), and there is no shared environment between
parent and child, children will not resemble their parents.
[7] See math appendix eq (A.45) for more on regression slopes.
Figure 7.12: Regression of child’s phenotype on the parental mid-point
phenotype. The three panels show decreasing levels of environmental
variance ($V_E$) holding the additive genetic variance constant
($V_A = 1$). In these figures, we simulate 100 loci, as described in the
caption of Figure 7.5. We simulate the genotypes and phenotypes of the
two parents, and then simulate the child’s genotype following Mendelian
transmission. The blue line shows x = y and the red line shows the best
fitting linear regression line. Code here.
[Panel titles: L = 100, VE = 100, VA = 1; L = 100, VE = 1, VA = 1;
L = 100, VE = 0.001, VA = 1. Axes: x, Parental midpoint; y, Child’s
phenotype.]
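The logic of the mid-parent regression can be checked with a small simulation in the spirit of Figure 7.12. This is a sketch under simplifying assumptions of our own, not the code behind the figure: all loci have the same effect size and allele frequency, effects are scaled so that $V_A = 1$, parents are unrelated, and the function names are hypothetical.

```python
import math
import random

def regression_slope(x, y):
    """Least-squares slope of y on x: Cov(x, y) / Var(x)."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov_xy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    return cov_xy / var_x

def midparent_offspring_slope(n_families, n_loci, v_e, p=0.5, seed=1):
    """Simulate mother-father-child trios under an additive model with
    V_A = 1 and return the slope of child phenotype on parental mid-point."""
    rng = random.Random(seed)
    # Equal effect sizes chosen so that L * a^2 * 2p(1-p) = 1 (eq. 7.7).
    a = math.sqrt(1.0 / (n_loci * 2.0 * p * (1.0 - p)))
    sd_e = math.sqrt(v_e)

    def random_genotype():
        # One (maternal, paternal) pair of alleles per locus; True = allele 1.
        return [(rng.random() < p, rng.random() < p) for _ in range(n_loci)]

    def phenotype(genotype):
        additive = a * sum(m + f for m, f in genotype)
        return additive + rng.gauss(0.0, sd_e)

    midpoints, children = [], []
    for _ in range(n_families):
        mum, dad = random_genotype(), random_genotype()
        # Mendelian transmission: one randomly chosen allele from each parent.
        kid = [(rng.choice(m), rng.choice(d)) for m, d in zip(mum, dad)]
        midpoints.append(0.5 * (phenotype(mum) + phenotype(dad)))
        children.append(phenotype(kid))
    return regression_slope(midpoints, children)

for v_e in (100.0, 1.0, 0.001):
    slope = midparent_offspring_slope(n_families=2000, n_loci=20, v_e=v_e)
    print(f"V_E = {v_e}: slope = {slope:.2f}, h^2 = {1.0 / (1.0 + v_e):.2f}")
```

The estimated slopes should track $h^2 = V_A/(V_A + V_E)$ up to sampling noise, which is largest when $V_E$ is large.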
Applying this heritability estimate to the Song sparrow sample
we find $h^2 = 0.43$ and $h^2 = 0.3$ for beak depth and tarsus length
respectively from Figure 7.11. So in Smith and Zach’s (1979) analysis,
for example, 30% of the variance in tarsus length is attributable
to the additive effect of genetic differences among individuals. Smith
and Zach (1979) also regressed the average offspring phenotype
against the phenotype of their fathers or of their mothers, giving
slopes of $\beta_{dad, avg. kid}$ and $\beta_{mum, avg. kid}$, and for tarsus
length, for example, they found $\beta_{dad, avg. kid} = 0.19$ and
$\beta_{mum, avg. kid} = 0.17$. Following a similar argument to that in eqn
(7.13) we find that these slopes are
$\beta_{dad,kid} = \frac{V_A/2}{V} = h^2/2$, and the same for mums. Thus the
regression of offspring’s phenotype on a particular parent is an
estimate of half the narrow-sense heritability, in line with the reduced
slopes found by Smith and Zach (1979). This halving of the slope is
due to the fact that a single parent’s phenotype is a noisier estimate
of the parental mid-point and so less informative about the child’s
phenotype. These parent-specific estimates of heritability are
particularly useful as they allow us to investigate sex-specific
inheritance and sexual dimorphism (we’ll explore this in a later
section).
Estimating heritability by these various parent-offspring regressions
has the issue of not controlling for environmental correlations
between parent and offspring, which can inflate our estimates of
heritability (as we will mistake environmentally mediated resemblance
for genetics). Raising the organisms in the lab could remove much of
the potential for shared environment between parent and offspring, but
it also removes much of the environmental variation and we (as
evolutionary geneticists) are usually not primarily interested in
knowing the heritability in the lab but rather in the field. In some
organisms, notably plants, we can begin to sidestep these issues by
raising offspring in a common set of randomized field conditions (a so
called “common garden”). Another option is to cross-foster animals; for
example Smith and Dhondt (1980) returned to the song sparrow population
and swapped eggs between parents’ nests. They found that the covariance
between biological parents and children was still high despite these
children being raised in a different nest, but that there was no
significant covariance between foster parents and their non-biological
children (see Figure 7.13 for beak depth). This suggests that family
environment is not a confounder in estimating the heritability in this
song sparrow sample. However, such manipulations are impossible in many
systems, and issues of shared environmental covariance due to maternal
resources from egg (or seed) are still present.
Figure 7.13: Foster Parent-midpoint offspring regression for beak depth
and tarsus length in song sparrows. The red line shows the best fitting
line, whose slope is reported on the graph. The slope is not
significant. The grey line is the x = y line. Data from Smith and
Dhondt (1980), Code here.
[Panel title: Slope= −0.17. Axes: x, Mid-Foster Parent Beak Depth; y,
Offspring Mean Beak Depth.]
Despite its issues, this measure of heritability provides useful in-
tuition and is directly relevant to our discussion of the response to
selection in the next chapter. That’s because our regression allows us
to attempt to predict the phenotype of the child given the phenotypes
of the parents; how well we can do this depends on the slope. See Fig-
ure 7.12 for examples. If the slope is close to zero then the parental
phenotypes hold no information about the phenotype of the child,
while if the slope is close to one then the parental mid-point is a good
guess at the child’s phenotype. As we will see, natural selection will
only efficiently drive evolution if children resemble their parents.
Thinking about our prediction of the child’s phenotype more formally,
the expected phenotype of the child given the parental phenotypes is
\[ \mathbb{E}(X_{kid} \mid X_{mum}, X_{dad}) = \mu + \beta_{mid,kid} (X_{mid} - \mu) = \mu + h^2 (X_{mid} - \mu) \tag{7.14} \]
which follows from the definition of linear regression. So to find the
child’s predicted phenotype, we simply take the mean phenotype and
add on the difference between our parental mid-point and the popula-
tion mean, multiplied by our narrow sense heritability.
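A minimal sketch of equation (7.14) as a function may help fix ideas. The function name and the numbers (a population mean of 170 with parents at 180 and 184) are hypothetical and chosen only for illustration.

```python
def predict_child_phenotype(mu, h2, x_mum, x_dad):
    """Equation (7.14): mu + h^2 * (parental mid-point - mu)."""
    x_mid = 0.5 * (x_mum + x_dad)
    return mu + h2 * (x_mid - mu)

# Hypothetical numbers: population mean 170, parents at 180 and 184.
for h2 in (0.0, 0.5, 1.0):
    print(h2, predict_child_phenotype(170.0, h2, 180.0, 184.0))
```

With $h^2 = 0$ the prediction is the population mean, with $h^2 = 1$ it is the parental mid-point, and intermediate heritabilities give predictions in between.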
Question 2. Briefly explain what Galton meant by ’regression
towards mediocrity’, and why he observed this pattern in light of
Mendelian inheritance.
The covariance between general pairs of relatives under an additive
model. The examples above make clear that to understand
the covariance between phenotypes of relatives, we simply need to
think about the alleles they share IBD. Consider a pair of relatives (1
and 2) with a probability $r_0$, $r_1$, and $r_2$ of sharing zero, one, or
two alleles IBD respectively. When they share zero alleles
$\mathrm{Cov}((X_{1M} + X_{1P}), (X_{2M} + X_{2P})) = 0$, when they share one allele
$\mathrm{Cov}((X_{1M} + X_{1P}), (X_{2M} + X_{2P})) = \mathrm{Var}(X_{1M}) = \frac{1}{2} V_A$,
and when they share two alleles
$\mathrm{Cov}((X_{1M} + X_{1P}), (X_{2M} + X_{2P})) = V_A$. Therefore, the general
covariance between two relatives is
\[ \mathrm{Cov}(X_1, X_2) = r_0 \times 0 + r_1 \frac{1}{2} V_A + r_2 V_A = 2 F_{1,2} V_A \tag{7.15} \]
(the last step uses $2F_{1,2} = \frac{1}{2} r_1 + r_2$).
So under a simple additive model of the genetic basis of a pheno-
type, to measure the narrow sense heritability we need to measure the
covariance between pairs of relatives (assuming that we can remove
the effect of shared environmental noise). From the covariance be-
tween relatives we can calculate VA , and we can then divide this by
the total phenotypic variance to get h2 .
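Equation (7.15) is easy to tabulate for common relationships. The sketch below assumes the standard IBD sharing probabilities for outbred relatives descended from unrelated founders; the function and dictionary names are hypothetical.

```python
def expected_covariance(r0, r1, r2, v_a):
    """Equation (7.15): r0 * 0 + r1 * V_A / 2 + r2 * V_A."""
    if abs(r0 + r1 + r2 - 1.0) > 1e-9:
        raise ValueError("IBD sharing probabilities must sum to 1")
    return r1 * 0.5 * v_a + r2 * v_a

# (r0, r1, r2) for outbred relatives with unrelated founders.
IBD_SHARING = {
    "identical twins": (0.0, 0.0, 1.0),
    "full sibs": (0.25, 0.5, 0.25),
    "parent-child": (0.0, 1.0, 0.0),
    "half sibs": (0.5, 0.5, 0.0),
    "first cousins": (0.75, 0.25, 0.0),
}

for relationship, (r0, r1, r2) in IBD_SHARING.items():
    print(relationship, expected_covariance(r0, r1, r2, v_a=1.0))
```

With $V_A = 1$ these are the expected covariances quoted in the caption of Figure 7.8 for twins, full sibs, half sibs and first cousins.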
Question 3. A) In polygynous red-winged blackbird populations

(i.e. males mate with several females), paternal half-sibs can be iden-
tified. Suppose that the covariance of tarsus lengths among half-sibs
is 0.25 cm2 and that the total phenotypic variance is 4 cm2 . Use these
data to estimate h2 for tarsus length in this population.
B) Why might paternal half-sibs be preferable for measuring heri-
tability than maternal half-sibs?
Estimating additive genetic variance across a variety of different
relationships (The animal model). In many natural populations we may
have access to individuals with a range of different relationships to
each other (e.g. through monitoring of the paternity of individuals),
but relatively few pairs of individuals for a specific relationship (e.g.
sibs). We can try and use this information on various relatives as fully
as possible in a mixed model framework. Building from equation 7.3,
we can write an individual’s phenotype $X_i$ as
\[ X_i = \mu + X_{A,i} + X_{E,i} \tag{7.16} \]
where $X_{E,i} \sim N(0, V_E)$ and $X_{A,i}$ is normally distributed across
individuals with covariance matrix $V_A \mathbf{A}$, where the entries for a
pair of individuals $i$ and $j$ are $A_{ij} = 2F_{i,j}$ and $A_{ii} = 1$. Given
the matrix $\mathbf{A}$ we can estimate $V_A$. We can also add fixed effects
into this model to account for generation effects; additional random
effects could also be included to account for shared environments
between particular individuals (e.g. a shared nest). This approach is
sometimes called the “animal model”, and is widely used in modern
quantitative genetics to estimate genetic variances and heritabilities.
Figure 7.14: Red-winged blackbird and Tricoloured Red-winged
blackbirds (Agelaius phoeniceus and Agelaius tricolor). Bird-lore
(1899). National Association of Audubon Societies for the Protection of
Wild Birds and Animals. Image from the Biodiversity Heritage Library.
Contributed by American Museum of Natural History Library. Not in
copyright.
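As an illustration of what the matrix $\mathbf{A}$ looks like, the sketch below builds it from a small pedigree using the standard recursive ("tabular") method. This method is not described in the text and is brought in here only as an assumption for illustration; the pedigree and function name are hypothetical. For a pedigree without inbreeding the diagonal entries are 1, matching $A_{ii} = 1$ above.

```python
def relationship_matrix(pedigree):
    """Additive relationship matrix A (A_ij = 2 F_ij) by the tabular method.

    `pedigree` is a list of (mother, father) index pairs, one per individual,
    ordered so that parents appear before their offspring. Founders with
    unknown, unrelated parents are given as (None, None).
    """
    n = len(pedigree)
    A = [[0.0] * n for _ in range(n)]
    for i, (mum, dad) in enumerate(pedigree):
        # Diagonal: 1 plus half the relationship between i's parents.
        if mum is not None and dad is not None:
            A[i][i] = 1.0 + 0.5 * A[mum][dad]
        else:
            A[i][i] = 1.0
        # Off-diagonal: average relationship of j with i's two parents.
        for j in range(i):
            rel = 0.0
            if mum is not None:
                rel += 0.5 * A[j][mum]
            if dad is not None:
                rel += 0.5 * A[j][dad]
            A[i][j] = A[j][i] = rel
    return A

# Hypothetical pedigree: 0 and 1 are unrelated founders, 2 and 3 are their
# offspring (full sibs), 4 is another founder, 5 is the offspring of 3 and 4.
pedigree = [(None, None), (None, None), (0, 1), (0, 1), (None, None), (3, 4)]
A = relationship_matrix(pedigree)
for row in A:
    print(["%.3f" % value for value in row])
```

Multiplying this matrix by $V_A$ gives the expected additive genetic covariance between every pair of individuals, which is the structure the mixed model uses to estimate $V_A$.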
7.1 Multiple traits

Traits often covary with each other, both due to environmentally in-
duced effects (e.g. due to the effects of diet on multiple traits) and
due to the expression of underlying genetic covariance between traits.
Genetic covariance, in turn, can reflect pleiotropy, a mechanistic effect
of an allele on multiple traits (e.g. variants that affect skin pigmenta-
tion often affect hair color), the genetic linkage of loci independently
affecting multiple traits, or the effects of assortative mating.
Consider two traits X1,i and X2,i in an individual i. These traits
could be, say, the individual’s leg length and nose length. As before,
we can write these as
X1,i = µ1 + X1,A,i + X1,E,i

X2,i = µ2 + X2,A,i + X2,E,i
(7.17)
As before we can talk about the total phenotypic variance (V1 , V2 ),

environmental variance (V1,E and V2,E ), and the additive genetic
variance for trait one and two (V1,A , V2,A ). But now we also have
to consider the total covariance between trait one and trait two,
V1,2 = Cov(X1 , X2 ), as well as the environmentally induced covari-
ance (VE,1,2 = Cov(X1,E , X2,E )) and the additive genetic covariance
(VA,1,2 = Cov(X1,A , X2,A )). To better understand the covariance aris-
ing due to pleiotropy, let’s think about a set of L SNPs contributing
to our two traits. If the additive effect of an allele at the ith SNP is
αi,1 and αi,2 on traits 1 and 2, then the additive covariance between
our traits is
\[ V_{A,1,2} = \sum_{i=1}^{L} 2 \alpha_{i,1} \alpha_{i,2} \, p_i (1 - p_i) \tag{7.18} \]
assuming our loci are in linkage equilibrium. Thus a genetic
correlation arises due to pleiotropy, because loci that tend to affect
trait 1 also systematically affect trait 2. For example, alleles
associated with later Age at Menarche (AAM) in European women also tend
to be positively associated with height (see Figure 7.15), thereby
creating a genetic correlation between AAM and height.
Figure 7.15: The additive effect sizes of loci associated with female
Age at Menarche (AAM) and their effect size on Height in a European
population. Data from Pickrell et al. (2016). Code here.
[Plot axes: x, AAM effect size; y, Height effect size.]
We can store our variance and covariance values in matrices, a way
of gathering these terms that will be useful when we discuss selection:
\[ \mathbf{V} = \begin{pmatrix} V_1 & V_{1,2} \\ V_{1,2} & V_2 \end{pmatrix} \tag{7.19} \]
and
\[ \mathbf{G} = \begin{pmatrix} V_{1,A} & V_{A,1,2} \\ V_{A,1,2} & V_{2,A} \end{pmatrix} \tag{7.20} \]
Here we’ve shown the matrices for two traits, but we can generalize
this to an arbitrary number of traits.
We can estimate these quantities, in a similar way as before, by
studying the covariance in different traits between relatives:
Cov(X1,i , X2,j ) = 2Fi,j VA,1,2 (7.21)
An example of phenotypic and genetic covariance is shown on
the left and right of Figure 7.17 respectively. Gray treefrogs (Hyla
versicolor) chorus to attract mates. Their call is made up of a trill,
a note rapidly pulsed a number of times, that is then repeated after
some period. Female frogs prefer males who make a lot of calls and
where each of those calls has a large number of pulses. However,
doing both is very energetically demanding, and so there is potentially
a tradeoff between these two aspects of a male frog’s call. Indeed
Welch et al. (2014) found in lab-reared male frogs that the pulse
number and the time period between calls were positively correlated
(left side of Figure 7.17), i.e. individuals were investing their energy
in making either few highly pulsed calls or many calls with few pulses.
This phenotypic covariance reflects an underlying genetic covariance
between these two frog call characteristics (right side of Figure
7.17). Fathers whose sons have highly pulsed calls also have sons whose
calls are more spaced apart.
Figure 7.16: Grey treefrog (Hyla versicolor). Historia Natural, tomo V
“Reptiles y peces” (1874) Juan Vilanova y Piera, p. 156. Image from
wikimedia contributed by Dorieo. Cropped. Public Domain.
Figure 7.17: Phenotypic and Genetic correlations in male grey treefrog
(Hyla versicolor) calls. On the left each male is shown as a dot,
recording their inter-call period and the number of pulses in each
call. On the right each dot corresponds to a father with the mean of
sons for both phenotypes. Data from Welch et al. (2014) downloaded
from dryad, Code here.
[Plot axes: left panel, x: Call period (sec), y: Pulse Number; right
panel, x: Son Call period (sec), y: Son Pulse Number.]
One useful summary of a genetic covariance is the genetic correlation
between two phenotypes
\[ r_g = \frac{V_{A,1,2}}{\sqrt{V_{A,1} V_{A,2}}} \tag{7.22} \]
where $V_{A,1}$ and $V_{A,2}$ are the additive genetic variance for trait 1
and 2 respectively. Here, $r_g$ tells us to what extent the additive
genetic variance in two traits is correlated.
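Equations (7.18) and (7.22) can be sketched together in a few lines, again assuming loci in linkage equilibrium. The effect sizes and allele frequencies below are made up for illustration (they are not the data of Figure 7.15), and the function names are hypothetical.

```python
import math

def additive_covariance(effects_1, effects_2, freqs):
    """Equation (7.18): sum over loci of 2 * alpha_1 * alpha_2 * p * (1 - p)."""
    return sum(2.0 * a1 * a2 * p * (1.0 - p)
               for a1, a2, p in zip(effects_1, effects_2, freqs))

def genetic_correlation(effects_1, effects_2, freqs):
    """Equation (7.22): V_A,1,2 / sqrt(V_A,1 * V_A,2)."""
    # The covariance of a trait with itself is its additive variance (eq. 7.7).
    v_a1 = additive_covariance(effects_1, effects_1, freqs)
    v_a2 = additive_covariance(effects_2, effects_2, freqs)
    return additive_covariance(effects_1, effects_2, freqs) / math.sqrt(v_a1 * v_a2)

# Made-up effects of four SNPs on two traits, and their allele frequencies.
alpha_1 = [0.10, -0.05, 0.20, 0.00]
alpha_2 = [0.08, -0.02, 0.00, 0.15]
freqs = [0.5, 0.2, 0.1, 0.3]
print(genetic_correlation(alpha_1, alpha_2, freqs))
```

Loci that affect only one of the two traits add to that trait’s additive variance but not to the covariance, and so pull $r_g$ towards zero.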
Another important application of genetic covariances is in the study
of sexually antagonistic selection and the evolution of sexual
dimorphism. Here we calculate the genetic covariance between male and
female phenotypes. For example, Figure 7.18 shows the relationship
between the forehead patch size of Pied fly-catcher fathers and that of
their sons and daughters. The phenotype has been standardized to have
mean 0 and variance 1 in each group. The phenotypic covariance of the
sample of fathers and sons is 0.35, while the phenotypic covariance of
fathers and daughters is 0.23.
Figure 7.18: Relationship of standardized forehead patch size between
fathers and sons and daughters in Pied fly-catchers. Data from Potti
and Canal. Code here.
[Plot axes: left panel, x: Size of Father’s forehead patch, y: Size of
son’s forehead patch; right panel, x: Size of Father’s forehead patch,
y: Size of Daughter’s forehead patch.]
Question 4. Assume we can ignore the effect of the shared
environment in our Pied fly-catcher example.
A) What is the additive genetic covariance between male and female
patch size?
B) What is the additive genetic correlation of male and female
patch size? You can assume that the additive genetic variance is the
same in males and females.
Figure 7.19: Ficedula hypoleuca, Pied fly-catcher. Coloured
illustrations of British birds, and their eggs (1842-1850). London:
G.W. Nickisson. Image from the Biodiversity Heritage Library.
Contributed by Smithsonian Libraries. Not in copyright.
7.1.1 Non-additive variation.
Up to now we’ve assumed that our alleles contribute to our phenotype
in an additive fashion. However, that does not have to be the
case, as there may be non-additivity among the alleles present at a
locus (dominance) or among alleles at different loci (epistasis). We
can accommodate these complications in our models by partitioning our
total genetic variance into independent variance components.
Dominance. To understand the effect of dominance, let’s consider
how the allele that a parent transmits influences their offspring’s phe-
notype. A parent transmits one of their two alleles at a locus to their
offspring. Assuming that individuals mate at random, this allele is
paired with another allele drawn at random from the population. For
example, assume your mother transmitted an allele 1 to you: with
probability p it would be paired with another allele 1, and you would
be a homozygote; and with probability q it’s paired with a 2 allele and
you’re a heterozygote.
Now consider an autosomal biallelic locus $\ell$, with frequency $p$ for
allele 1 (and $q = 1 - p$ for allele 2), and genotypes 0, 1, and 2
corresponding to how many copies of allele 1 individuals carry. We’ll
denote the mean phenotype of an individual with genotype 0, 1, and 2 as
$\overline{X}_{\ell,0}$, $\overline{X}_{\ell,1}$, $\overline{X}_{\ell,2}$ respectively.
This mean is an average of the phenotype over all the environments
and genetic backgrounds the alleles are present on. We’ll mean center
(MC) these phenotypic values, setting
$\overline{X}'_{\ell,0} = \overline{X}_{\ell,0} - \mu$, where $\mu$ is the population
mean phenotype, and likewise for the other genotypes.
We can think about the average (marginal) MC phenotype of an
individual who received an allele 1 from their parent as the average
of the MC phenotype for heterozygotes and 11 homozygotes, weighted
by the probability that the individual has these genotypes, i.e. the
probability they receive an additional allele 1 or an allele 2 from their
other parent:
$$a_{\ell,1} = p\overline{X}'_{\ell,2} + q\overline{X}'_{\ell,1}. \tag{7.23}$$
Similarly, if your parent transmitted a 2 allele to you, your average
MC phenotype would be
$$a_{\ell,2} = p\overline{X}'_{\ell,1} + q\overline{X}'_{\ell,0}. \tag{7.24}$$
Let’s now consider the average phenotype of an offspring by how
many copies of the allele 1 they carry:

genotype:                0                              1                              2
additive genetic value:  $a_{\ell,2} + a_{\ell,2}$      $a_{\ell,1} + a_{\ell,2}$      $a_{\ell,1} + a_{\ell,1}$

i.e. the mean phenotype of each genotype’s offspring averaged over
all possible matings to other individuals in the population (assuming
individuals mate at random). These are the additive MC genetic
values (breeding values) of our genotypes. Here we are simply adding
up the additive contributions of the alleles present in each genotype
and ignoring any non-additive effects of genotype.
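As a concrete check on eqns 7.23 and 7.24, here is a minimal Python sketch that computes the average allelic effects and the additive genetic values for one locus. The function names and the genotypic means (0, 1, 1 for a fully dominant allele 1) are hypothetical choices for illustration, and we assume Hardy-Weinberg proportions as in the text.

```python
def allele_effects(p, x0, x1, x2):
    """Average mean-centered (MC) effects of alleles 1 and 2 at one locus.

    p is the frequency of allele 1; x0, x1, x2 are the mean phenotypes of
    genotypes carrying 0, 1, and 2 copies of allele 1 (illustrative inputs).
    """
    q = 1.0 - p
    # Population mean under Hardy-Weinberg proportions (q^2, 2pq, p^2).
    mu = q**2 * x0 + 2 * p * q * x1 + p**2 * x2
    # Mean-centered genotypic means.
    c0, c1, c2 = x0 - mu, x1 - mu, x2 - mu
    a1 = p * c2 + q * c1  # eqn 7.23: allele 1 paired with a random allele
    a2 = p * c1 + q * c0  # eqn 7.24: allele 2 paired with a random allele
    return a1, a2


def breeding_values(a1, a2):
    """Additive genetic values for genotypes 0, 1, 2 (copies of allele 1)."""
    return (2 * a2, a1 + a2, 2 * a1)


# Hypothetical example: allele 1 is fully dominant and rare (p = 0.1).
p = 0.1
a1, a2 = allele_effects(p, x0=0.0, x1=1.0, x2=1.0)
values = breeding_values(a1, a2)
print(a1, a2, values)
```

Note that the frequency-weighted allelic effects sum to zero ($p a_{\ell,1} + q a_{\ell,2} = 0$), because the genotypic means were mean centered first.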
To illustrate this, in Figure 7.20 we plot two different cases of
dominance relationships: in the top row an additive polymorphism and in
the bottom row a fully dominant allele. The additive genetic values of
the genotypes are shown as red dots. Note that the additive values of
the genotypes line up with the observed MC phenotypic means in the
top row, when our alleles interact in a completely additive manner.
Our additive genetic values always fall along a straight line (the red
line in our figure). This is the best-fitting linear regression line
for our population, when phenotype is regressed against the additive
genotype (0, 1, 2 copies of allele 1) across all individuals in our
population. Note that in the dominant case the additive genetic values
differ from the observed phenotypic means, and are closer to the
observed values for the genotypes that are most common in the
population.

Figure 7.20: The average mean-centered (MC) phenotypes plotted
against the number of allele 1 carried (from 0 for 22 to 2 for 11).
Top Row: Additive relationship between genotype and phenotype.
Bottom Row: Allele 1 is dominant over allele 2, such that the
heterozygote has the same phenotype as the 11 genotype. The area of
each circle is proportional to the fraction of the population in each
genotypic class ($q^2$, $2pq$, and $p^2$ for genotypes 0, 1, and 2). In
the left column $p = 0.1$ and in the right column $p = 0.9$. The
additive genetic values of the genotypes are shown as red dots. The
regression between phenotype and additive genotype is shown as a red
line. The black vertical arrows show the difference between the average
MC phenotype and additive genetic value for each genotype. Code here.
The difference in the additive effect of the two alleles,
$a_{\ell,1} - a_{\ell,2}$, can be interpreted as the average effect of
replacing an allele 2 with an allele 1; we’ll call this difference
$\alpha_\ell = a_{\ell,1} - a_{\ell,2}$. Our $\alpha_\ell$ is also the
slope of the regression of phenotype against genotype, counted as the
number of copies of allele 1 (the red line in Figure 7.20). Note that
the slope of our regression of phenotype on genotype ($\alpha_\ell$)
does not depend on the population allele frequency for our completely
additive locus (top row of Figure 7.20). In contrast, when there is
dominance, the slope between genotype and phenotype ($\alpha_\ell$)
is a function of allele frequency (bottom row of Figure 7.20). When a
dominant allele (1) is rare there is a strong slope of phenotype on
genotype (bottom left of Figure 7.20). This strong slope is because
replacing a single copy of the 2 allele with a 1 allele in an
individual has a big effect on average phenotype, as it will most
likely move an individual from being a 22 homozygote to being a 12
heterozygote. In contrast, when the dominant allele (1) is common in
the population, replacing a 2 allele by a 1 allele in an individual on
average has little phenotypic effect, leading to a weak slope (bottom
right of Figure 7.20). This small effect is because we are mainly
turning heterozygotes into 11 homozygotes, and these two genotypes
have the same mean phenotype.
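To see how the average effect behaves numerically, the sketch below computes it at allele frequencies 0.1 and 0.9 for an additive locus and for a fully dominant allele 1. This is only an illustration: the function name and genotypic means are made up, Hardy-Weinberg proportions are assumed, and the average effect is taken here as the effect of allele 1 minus that of allele 2, so that it matches the regression slope on the number of copies of allele 1 (the sign convention does not matter for the additive variance, which depends on the square).

```python
def average_effect(p, x0, x1, x2):
    """Slope of phenotype on number of copies of allele 1 (alpha).

    p is the frequency of allele 1; x0, x1, x2 are hypothetical mean
    phenotypes of genotypes with 0, 1, and 2 copies of allele 1.
    """
    q = 1.0 - p
    mu = q**2 * x0 + 2 * p * q * x1 + p**2 * x2   # HW population mean
    c0, c1, c2 = x0 - mu, x1 - mu, x2 - mu        # mean-centered values
    a1 = p * c2 + q * c1                          # average effect of allele 1
    a2 = p * c1 + q * c0                          # average effect of allele 2
    return a1 - a2


for p in (0.1, 0.9):
    additive = average_effect(p, 0.0, 0.5, 1.0)   # heterozygote is intermediate
    dominant = average_effect(p, 0.0, 1.0, 1.0)   # heterozygote matches 11
    print(f"p={p}: additive alpha={additive:.3f}, dominant alpha={dominant:.3f}")
```

For the additive locus the slope stays at 0.5 whatever the frequency, while for the dominant locus it works out to $q$, so it is steep when allele 1 is rare and shallow when allele 1 is common.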
As an example of how dominance and population allele frequencies
can change the additive effect of an allele, let’s consider the
genetics of the age of sexual maturity in Atlantic Salmon. A single
allele of large effect segregates in Atlantic Salmon that influences
the sexual maturation rate (Ayllon et al., 2015; Barson et al., 2015),
and hence the timing of their return from the sea to spawn (sea age).
The allele falls close to the autosomal gene VGLL3 (Cousminer et al.,
2013; variation at this gene in humans also influences the timing of
puberty). The left side of Figure 7.22 shows the age at sexual
maturity in males. The L allele associated with slower sexual
maturity is recessive in males. While the LL homozygotes mature on
average a whole year later, the additive effect of the allele is weak
while the L allele is rare in the population. The right panel shows
the effect of the L allele in females. Note how the allele is much more
dominant in females, and has a much more pronounced additive effect.
The dominance of an allele is not a fixed property of the allele
but rather a statement of the relationship of genotype to phenotype,
such that the dominance relationship between alleles may vary across
phenotypes and contexts (e.g. sexes).

Figure 7.21: Atlantic Salmon (Salmo salar).
Histoire naturelle des poissons. 1796. Bloch, M. E. Image from the
Biodiversity Heritage Library. Contributed by Ernst Mayr Library,
Museum of Comparative Zoology. Not in copyright.
Figure 7.22: The average age at sexual maturity for each genotype,
broken down by sex. The area of each circle is proportional to the
fraction of the population in each genotypic class. The regression
between phenotype and additive genotype is shown as a red line. Data
from Barson
(Two panels: Males, Females. x-axis: genotype (EE, LE, LL). y-axis:
age at maturity.)
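The variance in the population phenotype due to these additive
breeding values at locus $\ell$, assuming HW proportions, is
$$\begin{aligned}
V_{A,\ell} &= p^2 (2a_{\ell,1})^2 + 2pq(a_{\ell,1} + a_{\ell,2})^2 + q^2 (2a_{\ell,2})^2 \\
&= 2(p a_{\ell,1}^2 + q a_{\ell,2}^2) \\
&= 2pq\alpha_\ell^2
\end{aligned} \tag{7.25}$$
where the simplification uses the fact that the mean-centered allelic
effects average to zero, $p a_{\ell,1} + q a_{\ell,2} = 0$.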
The total additive variance for the whole genotype can be found by
summing the individual additive genetic variances over loci
$$V_A = \sum_{\ell=1}^{L} V_{A,\ell} = \sum_{\ell=1}^{L} 2p_\ell q_\ell \alpha_\ell^2. \tag{7.26}$$
Having assigned the additive genetic variance to be the variance
explained by the additive contribution of the alleles at a locus, we
define the dominance variance as the population variance among
genotypes at a locus due to their deviation from additivity. We can
calculate how much each genotypic mean deviates away from its additive
prediction at locus $\ell$ (the length of the arrows in Figure 7.20).
For example, the heterozygote deviates
$$d_{\ell,1} = \overline{X}'_{\ell,1} - (a_{\ell,1} + a_{\ell,2}) \tag{7.27}$$
away from its additive genetic value, with similar expressions for each
of the homozygotes ($d_{\ell,0}$ and $d_{\ell,2}$). We can then write the
dominance variance at our locus as the genotype-frequency weighted sum
of our squared dominance deviations
$$V_{D,\ell} = q^2 d_{\ell,0}^2 + 2pq\, d_{\ell,1}^2 + p^2 d_{\ell,2}^2, \tag{7.28}$$
where genotypes 0, 1, and 2 (copies of allele 1) have frequencies
$q^2$, $2pq$, and $p^2$ respectively.
We can write our total dominance variance as the sum across loci
$$V_D = \sum_{\ell=1}^{L} V_{D,\ell}. \tag{7.29}$$
Having now partitioned all of the genetic variance into additive and
dominance terms, we can write our total genetic variance as
$$V_G = V_A + V_D. \tag{7.30}$$
We can do this because by construction the covariance between our
additive genetic values and dominance deviations for the genotypes is
zero. We can define the narrow sense heritability as before,
$h^2 = V_A/V_P = V_A/(V_G + V_E)$, which is the proportion of phenotypic
variance due to additive genetic variance. We can also define the total
proportion of the phenotypic variance due to genetic differences among
individuals as the broad-sense heritability $H^2 = V_G/(V_G + V_E)$.
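The partition of the genetic variance at a single locus can be checked numerically. The following sketch is an illustration under stated assumptions, not the code behind the book's figures: it assumes Hardy-Weinberg proportions, uses hypothetical genotypic means for a fully dominant allele 1, and the function name is invented. It computes the additive variance from the breeding values, the dominance variance from the dominance deviations, and the total variance among genotypic means.

```python
def variance_components(p, x0, x1, x2):
    """Return (VA, VD, VG) at one biallelic locus under HW proportions.

    p is the frequency of allele 1; x0, x1, x2 are hypothetical mean
    phenotypes for genotypes with 0, 1, and 2 copies of allele 1.
    """
    q = 1.0 - p
    freqs = (q**2, 2 * p * q, p**2)              # genotypes 0, 1, 2
    means = (x0, x1, x2)
    mu = sum(f * x for f, x in zip(freqs, means))
    centered = [x - mu for x in means]           # mean-centered genotypic means
    a1 = p * centered[2] + q * centered[1]       # average effect of allele 1
    a2 = p * centered[1] + q * centered[0]       # average effect of allele 2
    additive = (2 * a2, a1 + a2, 2 * a1)         # breeding values
    dominance = [c - a for c, a in zip(centered, additive)]
    VA = sum(f * a**2 for f, a in zip(freqs, additive))
    VD = sum(f * d**2 for f, d in zip(freqs, dominance))
    VG = sum(f * c**2 for f, c in zip(freqs, centered))
    return VA, VD, VG


# Fully dominant allele 1, rare (p = 0.1) versus common (p = 0.9).
for p in (0.1, 0.9):
    VA, VD, VG = variance_components(p, 0.0, 1.0, 1.0)
    print(f"p={p}: VA={VA:.4f}, VD={VD:.4f}, VG={VG:.4f}")
```

In both cases the additive and dominance variances sum to the total genetic variance, and the additive variance is much smaller when the dominant allele is common, even though the genotypic means have not changed.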
Table 7.1: Phenotypic covariance between some pairs of relatives,
including the dominance variation. * Assuming this is the only
relationship the pair of individuals share (above that expected from
randomly sampling individuals from the population).

Relationship (i,j)*              Cov(X_i, X_j)
parent–child                     1/2 V_A
full siblings                    1/2 V_A + 1/4 V_D
identical (monozygotic) twins    V_A + V_D
1st cousins                      1/8 V_A
When dominance is present in the loci influencing our trait
($V_D > 0$), we need to modify our phenotypic covariance among
relatives to account for this non-additivity. Specifically, our
equation for the covariance among a general pair of relatives (eqn.
7.15 for additive variation) becomes
$$\mathrm{Cov}(X_1, X_2) = 2F_{1,2} V_A + r_2 V_D \tag{7.31}$$
where $r_2$ is the probability that the pair of individuals share 2
alleles identical by descent, making the same assumptions (other than
additivity) that we made in deriving eqn. 7.15. In Table 7.1 we show
the phenotypic covariance for some common pairs of relatives. The
regression of offspring phenotype on parental midpoint still has a
slope of $V_A/V_P$.
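The entries of Table 7.1 follow from eqn 7.31 once we plug in the kinship coefficient and the probability of sharing two alleles identical by descent for each relationship. The sketch below does this with exact fractions; the dictionary of coefficients uses the standard values for outbred pairs, and the function and variable names are our own illustrative choices.

```python
from fractions import Fraction as Fr

# (kinship coefficient F_12, probability r2 of sharing 2 alleles IBD),
# assuming the pair share no other relationship.
RELATIVES = {
    "parent-child": (Fr(1, 4), Fr(0)),
    "full siblings": (Fr(1, 4), Fr(1, 4)),
    "identical twins": (Fr(1, 2), Fr(1)),
    "1st cousins": (Fr(1, 16), Fr(0)),
}


def relative_covariance(F12, r2, VA, VD):
    """Phenotypic covariance between relatives, eqn 7.31."""
    return 2 * F12 * VA + r2 * VD


# Print the coefficients on VA and VD for each relationship.
for name, (F12, r2) in RELATIVES.items():
    print(f"{name}: {2 * F12} VA + {r2} VD")

# Hypothetical variance components, purely for illustration.
VA, VD = Fr(2, 5), Fr(1, 5)
cov_full_sibs = relative_covariance(*RELATIVES["full siblings"], VA, VD)
cov_parent_child = relative_covariance(*RELATIVES["parent-child"], VA, VD)
```

With these made-up values the full-sib covariance exceeds the parent-child covariance by exactly one quarter of the dominance variance.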
Full sibs and parent-offspring pairs have the same covariance if there
is no dominance variance (as they have the same kinship coefficient
$F_{1,2}$). However, when dominance is present ($V_D > 0$), full-sibs
resemble each other more than parent-offspring pairs. That’s because
parents and offspring share precisely one allele, while full-sibs can
share both alleles (i.e. the full genotype at a locus) identical by
descent. We can attempt to estimate $V_D$ by comparing different sets
of relationships. For example, non-identical twins (full sibs born at
the same time) should have 1/2 the phenotypic covariance of identical
twins if $V_D = 0$. Therefore, we can attempt to estimate $V_D$ by
looking at whether identical twins have more than twice the phenotypic
covariance of non-identical twins.
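The twin comparison can be turned into simple estimators by solving the two covariance expressions (identical twins: $V_A + V_D$; non-identical twins: $V_A/2 + V_D/4$) for the two unknowns. The sketch below does this; it ignores shared environment and epistasis, which is a strong assumption in practice, and the function name and input covariances are hypothetical.

```python
def twin_variance_components(cov_mz, cov_dz):
    """Solve for (VA, VD) from twin covariances.

    Assumes cov_mz = VA + VD and cov_dz = VA/2 + VD/4, i.e. no shared
    environment and no epistasis (a simplification for illustration).
    """
    VD = 2 * (cov_mz - 2 * cov_dz)   # excess over twice the DZ covariance
    VA = 4 * cov_dz - cov_mz
    return VA, VD


# Hypothetical covariances generated from VA = 0.4 and VD = 0.2.
cov_mz = 0.4 + 0.2
cov_dz = 0.4 / 2 + 0.2 / 4
VA_hat, VD_hat = twin_variance_components(cov_mz, cov_dz)
print(VA_hat, VD_hat)
```

If identical twins have exactly twice the covariance of non-identical twins, the estimate of the dominance variance is zero.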
The most important aspect of this discussion for thinking about
evolutionary genetics is that the parent-offspring covariance is still
only a function of VA . This is because our parent (e.g. the mother)
transmits only a single allele, at each locus, to its offspring. The other
allele the offspring receives is random (assuming random mating), as
it comes from the other unrelated parent (the father). Therefore, the
average effect on the child’s phenotype of an allele the child receives
from their mother is averaged over all possible random alleles the child
could receive from their father (weighted by their frequency in the
population). Thus we only care about the additive effect of the allele,
as parents transmit only alleles (not genotypes) to their offspring.
This means that the short-term response to selection, as described by
the breeder’s equation, depends only on VA and the additive effect
of alleles. Therefore, if we can estimate the narrow-sense heritability
we can predict the short-term response. However, if alleles display
dominance, our value of VA will change as alleles at our loci change in
frequency, e.g. as dominant alleles become common in the population
their contribution to VA decreases. Therefore, if there is dominance
our value of VA will not be constant across generations.
Up to this point we have only considered dominance and not
epistasis. However, we can include epistasis in a similar manner (for
example among pairs of loci). This gets a little tricky to think about,
so we will only briefly explain it. We can first estimate the additive
effect of the alleles by considering the effect of the alleles averaging
over their possible genetic backgrounds (including the other
interacting alleles they are possibly paired with), just as before. We
can then calculate the additive genetic variance from this. We can
estimate the dominance variance by calculating the residual variance
among genotypes at a locus unexplained by the additive effects of the
alleles. We can then estimate the epistatic variance by estimating the
residual variance left unexplained among the two-locus genotypes after
accounting for the additive and dominance deviations calculated from
each locus separately. In practice these higher-order variance
components are hard to estimate, and are usually small as much of our
variance is assigned to the additive effect. Again we would find that
we mostly care about $V_A$ for predicting short-term evolution, but
that the contribution of loci to the additive genetic variance will
depend on the epistatic relationships among loci.
Question 5. How could you use half-sibs vs. full-sibs to estimate
$V_D$? Why might this be difficult in practice? Why are identical vs.
non-identical twins better suited for this?
Question 6. Can you construct a case where $V_A = 0$ and $V_D > 0$?
You need just describe it qualitatively; you don’t need to work out the
math (trickier question).
8
The Response to Phenotypic Selection
Evolution by natural selection requires:

1. Variation in a phenotype.
2. That survival is non-random with respect to this phenotypic variation.
3. That this variation is heritable.

See Lewontin (1970). Note that these requirements are not specific to
DNA, i.e. the concept of evolution by natural selection is substrate
independent.

Points 1 and 2 encapsulate our idea of Natural Selection, but evolution
by natural selection will only occur if the 3rd condition is also met.1
It is the heritable nature of variation that couples change within
a generation due to natural selection to change across generations
(evolutionary change).

1 Some people consider natural selection to only operate on heritable
phenotypic variation and so require all three conditions to say that
natural selection occurs. This is mostly a semantic point; however, it
is useful to be able to distinguish the action of selection from a
possible response.

Let’s start by thinking about the change within a generation due
to directional selection, where selection acts to change the mean
phenotype within a generation. For example, a decrease in mean height
within a generation, due to taller organisms having a lower chance of
surviving to reproduction than shorter organisms. Specifically, we’ll
denote our mean phenotype at reproduction by $\mu_S$, i.e. after
selection has acted, and our mean phenotype before selection acts by
$\mu_{BS}$. This second quantity may be hard to measure, as obviously
selection acts throughout the life-cycle, so it might be easier to
think of this as the mean phenotype if selection hadn’t acted. So the
change in mean phenotype within a generation is $\mu_S - \mu_{BS} = S$.
We are interested in predicting the distribution of phenotypes in
the next generation. In particular, we are interested in the mean
phenotype in the next generation to understand how directional selection
has contributed to evolutionary change. We’ll denote the mean
phenotype in offspring, i.e. the mean phenotype in the next generation
before selection acts, as $\mu_{NG}$. The change across generations
we’ll call the response to selection $R$ and put this equal to
$\mu_{NG} - \mu_{BS}$.
The mean phenotype in the next generation is
$$\mu_{NG} = \mathbb{E}\left(\mathbb{E}(X_{kid} \mid X_{mum}, X_{dad})\right) \tag{8.1}$$

Figure 8.1: Top. Distribution of a phenotype in the parental population
prior to selection, $V_A = V_E = 1$. Middle. Only individuals in the
top 10% of the phenotypic distribution are selected to reproduce; the
resulting shift in the phenotypic mean is S. Bottom. Phenotypic
distribution of children of the selected parents; the shift in the mean
phenotype is R. Code here.
(Three histogram panels, x-axis: phenotype, y-axis: frequency. Panel
titles: "Phenotype distribution before selection"; "Phenotype
distribution after selection, parental mean = 2.48"; "Phenotype
distribution in the children, mean in children = 1.2".)
where the outer expectation is over possible pairs of randomly mating
individuals who survive to reproduce. We can use eqn. 7.14 to obtain
an expression for this expectation:
$$\mu_{NG} = \mu_{BS} + \beta_{mid,kid}\left(\mathbb{E}(X_{mid}) - \mu_{BS}\right) \tag{8.2}$$
So to obtain $\mu_{NG}$ we need to compute $\mathbb{E}(X_{mid})$, the
expected midpoint phenotype of pairs of individuals who survive to
reproduce. Well, this is just the expected phenotype in the individuals
who survived to reproduce ($\mu_S$), and the slope of the regression of
child on parental midpoint is $\beta_{mid,kid} = h^2$, so
$$\mu_{NG} = \mu_{BS} + h^2(\mu_S - \mu_{BS}) \tag{8.3}$$
So we can write our response to selection as
$$R = \mu_{NG} - \mu_{BS} = h^2(\mu_S - \mu_{BS}) = h^2 S \tag{8.4}$$
So our response to selection is proportional to our selection
differential, and the constant of proportionality is the narrow sense
heritability. This equation is sometimes termed the Breeder’s equation.
It is a statement that the evolutionary change across generations (R) is
proportional to the change caused by directional selection within a
generation (S), and that the strength of this relationship is determined
by the narrow sense heritability ($h^2$).

Figure 8.2: A visual representation of the Breeder’s equation.
Regression of child’s phenotype on parental midpoint phenotype
($V_A = V_E = 1$, $L = 100$). The parents and children of all families
are shown as grey or red points. However, under truncation selection,
only individuals with phenotypes > 1 (red) are bred. Using only the red
families results in a phenotypic shift S in the parental generation,
which drives a shift R in the offspring generation. Code here.
(Scatterplot. x-axis: parental midpoint; y-axis: child’s phenotype.)
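As a rough numerical illustration of the Breeder's equation, the sketch below works out S and R for truncation selection on a normally distributed phenotype, using the setup described in the caption of Figure 8.1 (equal additive and environmental variances of 1, top 10% selected). It is not the simulation code behind the figure: it assumes an exactly normal phenotype with mean zero before selection, and it uses the standard result that the mean of the upper tail of a normal distribution is the density at the threshold divided by the tail probability, in units of the phenotypic standard deviation. The function name is our own.

```python
from statistics import NormalDist


def truncation_selection(VA, VE, top_fraction):
    """Return (h2, S, R) when only the top `top_fraction` reproduce.

    Assumes a normal phenotype with mean 0 and variance VA + VE.
    """
    VP = VA + VE
    h2 = VA / VP                                  # narrow-sense heritability
    std_normal = NormalDist()
    z = std_normal.inv_cdf(1.0 - top_fraction)    # truncation point (in SDs)
    intensity = std_normal.pdf(z) / top_fraction  # mean of selected, in SDs
    S = intensity * VP ** 0.5                     # selection differential
    R = h2 * S                                    # Breeder's equation (8.4)
    return h2, S, R


h2, S, R = truncation_selection(VA=1.0, VE=1.0, top_fraction=0.10)
print(f"h2={h2:.2f}, S={S:.2f}, R={R:.2f}")
```

Under these assumptions S comes out at about 2.48 and R at about 1.24, in line with the parental and offspring means reported in the panel titles of Figure 8.1.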
Figure 8.3: The relationship between maternal and offspring corolla
flare (flower width) in P. viscosum. From Galen’s data the covariance
of mother and child is 1.3, while the variance of the mother is 2.8.
Data from Galen (1996). Code here.
(Scatterplot. x-axis: maternal corolla flare; y-axis: offspring corolla
flare.)
Question 1. Galen (1996) explored selection on flower shape
in Polemonium viscosum. She found that plants with larger corolla
flare had more bumblebee visits, which resulted in higher seed set and
a 17% increase in corolla flare in the plants contributing to the next
generation. Based on the data in the caption of Figure 8.3 what is the
expected response in the next generation?

Figure 8.4: Sticky Jacob’s ladder (Polemonium viscosum).
Flowers of Mountain and Plain (1920). Clements, E. Image from the
Biodiversity Heritage Library. Contributed by New York Botanical
Garden, Mertz Library. Not in copyright. Cropped from original.
If we know R and S we can estimate $h^2$ as $R/S$. Heritabilities
estimated like this are called ‘realized heritability’. Estimates of
the realized heritability can readily be produced in artificial
selection experiments.

Question 2. From the experiment shown in Figure 8.5, the mean
corn oil content in 1897 was 4.78; among the 24 individuals chosen to
breed for the next generation the mean was 5.2. The offspring of
these individuals had a mean kernel oil content of 5.1. What is the
narrow sense realized heritability?
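The realized heritability is just the Breeder's equation rearranged, the response divided by the selection differential. A minimal sketch, with an invented function name and deliberately made-up numbers rather than the values from the question above:

```python
def realized_heritability(mean_before, mean_selected, mean_offspring):
    """Estimate h2 as R / S from three generation means."""
    S = mean_selected - mean_before    # within-generation shift
    R = mean_offspring - mean_before   # between-generation shift
    if S == 0:
        raise ValueError("no selection differential, h2 is undefined")
    return R / S


# Hypothetical trait means: before selection, among selected parents,
# and in their offspring.
h2_realized = realized_heritability(10.0, 12.0, 11.0)
print(h2_realized)
```

Here the offspring recover half of the shift imposed on their parents, so the realized heritability is one half.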
5
0
To understand the genetic basis of the response to selection take a 3.5 4.0 4.5 5.0
Oil content (%)
5.5 6.0 6.5
look at Figure 8.6. The setup is the same as in our previous simulation
Figure 8.5: Top. Phenotypic distri-
figures. The individuals who are selected to form our next generation
bution of oil content in corn in 1897,
and the individuals who were selected
to breed for the next generation are
Parental generation marked in blue. Bottom. The distri-
bution in the next generation. Data
150
Frequency
from the Illinois selection experiment

All individuals available here, Code here.
Selected parents
50
0
80 90 100 110 120 130
Num. up alleles
Next generation Figure 8.6: Top. Distribution of the

number of up alleles in the parental
150
population prior to selection (red),

Frequency
Children for the selected individuals in the

top 10% phenotypic tail of the pop-
50
ulation (blue) Bottom. The same

0
distribution for the offspring of the se-

80 90 100 110 120 130 lected parents in the next generation
Num. up alleles (green). Code here.
carry more alleles that increase the phenotype in the current range of
environments currently experienced by the population. The average
individual before selection carried 100 of these ‘up’ alleles, while the
average individual surviving selection carries 108 ‘up’ alleles.
As individuals faithfully transmit their alleles to the next gener-
ation the average child of the selected parents carries 108 up alleles.
Note that the variance has changed little, the children have plenty of
variation in their genotype, such that selection can readily drive evo-
lution in future generations. The average frequency of an ‘up’ allele
has changed from 50% to 54%. Gains due to selection will be stably
inherited to future generations and can be compounded on generation
after generation if selection pressures were to remain constant.
Figure 8.7: Maize (Zea mays).
Prof. Dr. Thomé’s Flora von Deutschland. 1886. Thomé, O. W. Image from
the Biodiversity Heritage Library. Contributed by New York Botanical
Garden. Not in copyright.
8.0.1 The Long-Term Response to Selection

If our selection pressure is sustained over many generations, we can
use our breeder’s equation to predict the response. If we are willing to
assume that our heritability does not change and we maintain a
constant selection differential (S), then after n generations our
phenotype mean will have shifted
$$n h^2 S \tag{8.5}$$
i.e. our population will keep up a linear response to selection.
Therefore, long-term, consistent selection can drive impressive
evolutionary change. One example of this comes from a field experiment
in Illinois, where plant breeders have systematically selected for
higher and lower oil content in corn (see our previous Figure 8.5 for
one generation of up selection). For over a century, they have taken
seeds from the plants in the extremes of the distribution and used them
to form the next generation. They have achieved impressive long-term
responses, pushing the population distributions well beyond their
initial range (Figure 8.9). They’ve also established two secondary
populations where the selection differential was reversed. In the
up-selection population they have maintained an impressively linear
increase in oil content, shown by the red line in Figure 8.8. In the
down-selection line the response is linear at first, but the population
quickly reaches very low oil content.

Figure 8.8: The mean oil content of corn in the Illinois long term
selection experiment. Two populations were established in 1896 from the
same initial population. Two secondary populations were established in
1948 where the direction of selection was reversed. Linear fit to the
up experiment shown as a red line. Data available here, Code here.
(x-axis: year, 1900 to 2000; y-axis: % oil content.)

Figure 8.9: Density plots showing the phenotypic distributions of the
up- and down-selection populations of the Illinois long term selection
experiment over time. Data available here, Code here.
(x-axis: oil content (%); y-axis: year; legend: IHO, ILO.)
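The linear prediction of eqn 8.5 is easy to tabulate. The sketch below uses an invented function name and hypothetical values of the heritability and selection differential; it takes both to be constant across generations, which, as the down-selected corn lines show, need not hold in practice.

```python
def cumulative_response(h2, S, n_generations):
    """Predicted shift in the mean after 0..n generations (eqn 8.5).

    Assumes h2 and S stay constant across generations.
    """
    return [g * h2 * S for g in range(n_generations + 1)]


# Hypothetical values: h2 = 0.3, S = 0.5 trait units per generation.
trajectory = cumulative_response(h2=0.3, S=0.5, n_generations=10)
for generation, shift in enumerate(trajectory):
    print(generation, round(shift, 3))
```

Each generation adds the same increment of $h^2 S$, so the predicted mean moves along a straight line.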
Question 3. A population of red deer was trapped on Jersey (an
island in the English Channel) during the last inter-glacial period.
From the fossil record 2 we can see that the population rapidly adapted
to their new conditions, perhaps due to selection for shorter
reproductive times in the absence of predation. Within 6,000 years they
evolved from an estimated mean weight of the population of 200kg to an
estimated mean weight of 36kg (a roughly 6 fold reduction)! You
estimate that the generation time of red deer is 5 years and, from a
current day population, that the narrow sense heritability of the
phenotype is 0.5.
A) Estimate the mean change per generation in the mean body
weight.
B) Estimate the change in mean body weight caused by selection
within a generation. State your assumptions.
C) Assuming we only have fossils from the founding population and
the population after 6000 years, should we assume that the calculations
accurately reflect what actually occurred within our population?

2 Lister, A., 1989 Rapid dwarfing of red deer on Jersey in the last
interglacial. Nature 342(6249): 539

Figure 8.10: It’s not just deer that evolve to be small on islands;
pygmy mammoths and elephants have evolved from large mainland species
on numerous islands. For example, the California Channel Islands were
home to a dwarf mammoth until about 13,000 years ago.
Santa Rosa Mammuthus exilis. wikimedia, CC BY 3.0.
In wild populations, selection pressures are likely rarely sustained
for large numbers of generations. For example, the Grants have
measured phenotypic selection in Darwin’s Finches over multiple decades
on the island of Daphne Major. They have seen that selection
pressures in the Medium ground-finch (Geospiza fortis) have reversed a
number of times over the years (Figure 8.11).
Figure 8.11: Top) Mean body size of the Medium ground-finch population
measured each year. The 1973 95% confidence intervals are shown as
horizontal bars. Bottom) Standardized selection differentials on body
size. The statistical significance of the selection differentials is
shown, black points are p < 0.001 and grey p < 0.05. Data from Grant
and Grant (2002) Code here.
(Two panels, x-axis: year, 1975 to 2000. y-axes: mean body size;
selection gradient.)
Patterns of long-term phenotypic change in the wild. Looking across
the diversity of plants and animals we see huge changes in size and
form. Can the strengths of selection we can observe over short time
periods possibly explain these changes?
To compare phenotypic changes over various time periods we need
some measure of the rate of phenotypic change. Haldane (1949)
proposed that the rate of change from $X_1$ to $X_2$ in time interval
$\Delta t$, measured in millions of years, be quantified as
$$\frac{\log(X_2/X_1)}{\Delta t} = \frac{\log(X_2) - \log(X_1)}{\Delta t}. \tag{8.6}$$
By expressing this as the log of the ratio3, we are looking at the
proportional (fold) change, which makes sense as an evolutionary change
of 1cm in length is more impressive if you’re a mouse than an elephant.
Haldane called the units of this measure ‘the Darwin’, with a one
Darwin change corresponding to an $e \approx 2.72$ fold change in a
million years, a two Darwin change corresponding to an
$e^2 \approx 7.39$ fold change in a million years, and so on.

3 Note that here, as elsewhere, log refers to the natural logarithm,
i.e. log base e. We’ll make it clear if we are using log in a different
base, e.g. we’ll use log10 for log in base 10.

Figure 8.12: Medium ground-finch (Geospiza fortis).
The zoology of the voyage of H.M.S. Beagle. Birds Part 3. (1841) Gould
G. Edited by Darwin, C. Illustration by Elizabeth Gould. Contributed by
Natural History Museum Library, London. Not in copyright.
Question 4. Calculate the rate of change in body size in the
Jersey red deer from Question 3 in Darwins. Do the same for the total
change in corn oil content in the up lines in Figure 8.8.
Gingerich (1983) examined the absolute rate of phenotypic
change in field study data and the fossil record, a dataset considerably
expanded by Uyeda et al. (2011). In Figure 8.14 each point is an
observation of phenotypic evolution. The x-axis shows the time period
in years over which the evolutionary change was observed, plotted on a
log10 scale. The y-axis shows the absolute rate of phenotypic
change, measured in Darwins, again on a log10 scale.
Figure 8.13: Variation in Atlantic dog whelks (Nucella lapillus,
synonym Purpura lapillus) along the coast of Great Britain.
The Cambridge natural history, Molluscs and Brachiopods (1895). Cooke
AH, Shipley AE, Reed FRC. Image from the Biodiversity Heritage Library.
Contributed by University of Toronto - Earth Sciences Library. Not in
copyright.

Figure 8.14: The absolute rate of phenotypic evolution, measured in
Darwins, plotted against the time interval over which the evolution was
observed. The orange points show direct observations of phenotypic
change in historical and contemporary populations. The blue dots give
changes observed in the fossil record. The three black dots left to
right give examples from Dog whelks, our Red deer example, and
Triceratops. Based on an original plot by Gingerich (1983) using an
expanded dataset from Uyeda et al. (2011). Code here.
(x-axis: years, log10 scale; y-axis: absolute phenotypic change in
Darwins, log10 scale.)
Over short timescales we see incredibly rapid evolution; note the
high rates on the left of Figure 8.14. For example, the first black dot
from the left is a case of evolution over decades in dog whelks. The
invasion of the green crab (Carcinus maenas) drove the evolution of more
robust shells in Atlantic dog whelk (Nucella lapillus) in response to
predation along the North American coast (Vermeij, 1982). The
shell lip thickness of dog whelks in the St. Andrews, New Brunswick
population had changed from 0.94mm to 1.44mm in just 25 years.
That’s a roughly 50% increase, and a rate of 17060 Darwins.
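The dog whelk rate can be checked with a few lines of code. This is a minimal sketch, assuming the usual definition of the Darwin as the change in the natural log of the trait per million years; the function name `darwins` is ours and purely illustrative.

```python
import math

def darwins(x_start, x_end, years):
    """Rate of change in Darwins: change in ln(trait) per million years."""
    return (math.log(x_end) - math.log(x_start)) / (years / 1e6)

# Dog whelk shell lip thickness (mm) over 25 years, values from the text.
rate = darwins(0.94, 1.44, 25)
print(rate)
```

The result agrees with the value quoted above up to rounding.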
However, when we observe phenotypic evolution over longer time
periods it is usually much slower. For example, the rightmost black
dot in Figure 8.14 shows the phenotypic evolution along the lineage
leading to Triceratops. Triceratops measured an impressive 25.9–29.5 ft
in length. They evolved from a close relative of Protoceratops,
which was a bit bigger than a sheep at ∼5.9 ft, in about 7.5 million
years (Colbert, 1948). However, that’s only a phenotypic change of
0.143 Darwins; it’s only a roughly four fold change in millions of years.
These rates of change in Dinosaurs have nothing on our dog whelks,
or many other examples of evolution on short time scales. Thus
evolutionary changes we can observe over short timescales readily explain
long term changes in quantitative phenotypes.
Figure 8.15: The evolution of Triceratops from Protoceratops, see here
for a fun updated view of the Coronosauria phylogeny. See these figures
from Holtz for an updated & fuller phylogeny.
The dinosaur book : the ruling reptiles and their relatives. (1951)
Colbert, E.H. Image from the Internet Archive. Contributed by American
Museum of Natural History Library. No known copyright restrictions.
8.1 Fitness and the Breeder’s Equation.
So directional evolution occurs as selection drives a change in the

mean phenotype within a generation. But precisely how does this
relate to the natural-selection requirement that organisms vary in their
fitness? Some different ways of formulating the Breeder’s equation
give us insight into the conditions for directional selection and the
relationship to fitness landscapes.
8.1.1 Directional selection as the covariance between fitness and phenotype.
To think more carefully about this change within a generation, let’s
think about a simple fitness model where our phenotype affects the
viability of our organisms (i.e. the probability they survive to
reproduce). The probability density of an individual having phenotype x
before selection is p(x), so that the mean phenotype before selection is
$$\mu_{BS} = E[X] = \int_{-\infty}^{\infty} x\, p(x)\, dx \tag{8.7}$$
The probability that an organism with a phenotype X survives to
reproduce is w(X), and we’ll think about this as the fitness of our
organism. The probability distribution of phenotypes in those who do
survive to reproduce is
$$P(X \mid \text{survive}) = \frac{p(x)\, w(x)}{\int_{-\infty}^{\infty} p(x)\, w(x)\, dx} \tag{8.8}$$
where the denominator is a normalization constant which ensures that
our phenotypic distribution integrates to one. The denominator also
has the interpretation of being the mean fitness of the population,
which we’ll call w̄, i.e.
$$\bar{w} = \int_{-\infty}^{\infty} p(x)\, w(x)\, dx \tag{8.9}$$
Therefore, we can write the mean phenotype in those who survive to
reproduce as
$$\mu_{S} = \frac{1}{\bar{w}} \int_{-\infty}^{\infty} x\, p(x)\, w(x)\, dx \tag{8.10}$$
If we mean center the distribution of phenotypes in our population,
i.e. set the mean phenotype before selection to zero, then
$$S = \mu_{S} = \frac{1}{\bar{w}} \int_{-\infty}^{\infty} x\, p(x)\, w(x)\, dx = \frac{1}{\bar{w}} E\left(X w(X)\right) \tag{8.11}$$
where the final part follows from the fact that the integral is taking
the mean of Xw(X) over the population.
As our phenotype is mean centered (E(X) = 0), we can see that
S has the form of a covariance⁴ between our phenotype X and our
relative fitness w(X)/w̄.
$$S = E\left(X\, w(X)/\bar{w}\right) = \mathrm{Cov}\left(X, w(X)/\bar{w}\right) \tag{8.12}$$
⁴ See our math appendix Equation A.39 for the definition of covariance.
Figure 8.16: Red deer (Cervus elaphus).
British mammals. Thorburn, A. (1920) Image Contributed by Field Museum of
Natural History Library. Licensed under CC BY-2.0.
Thus our change in mean phenotype within a generation is directly a
measure of the covariance of our phenotype and our fitness. Rewriting
our breeder’s equation using this observation we see
$$R = \frac{V_A}{V} \mathrm{Cov}\left(X, w(X)/\bar{w}\right) \tag{8.13}$$
that the response to selection is due to the fact that the fitness
(viability) of our parents covaries with their phenotype, and that a
child’s phenotype covaries with its parents’ phenotype.
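To make the covariance form of the selection differential concrete, here is a small numerical sketch. The logistic viability function, the standard normal phenotype distribution, and the value of V_A are all hypothetical choices made for illustration; nothing here comes from real data.

```python
import math
import random

random.seed(1)

def fitness(x):
    # Hypothetical viability function: survival probability rises with x.
    return 1.0 / (1.0 + math.exp(-x))

# Phenotypes before selection (standard normal, an assumption),
# mean-centred in the sample so that E(X) = 0.
n = 100_000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
mean_x = sum(x) / n
x = [xi - mean_x for xi in x]

w = [fitness(xi) for xi in x]
w_bar = sum(w) / n                    # mean fitness, eqn (8.9)
rel_w = [wi / w_bar for wi in w]      # relative fitness w(X)/w_bar

# Selection differential as the fitness-weighted mean phenotype, eqn (8.11).
S_weighted = sum(xi * wi for xi, wi in zip(x, w)) / (n * w_bar)

# Selection differential as a covariance, eqn (8.12).
mean_rel = sum(rel_w) / n             # equals 1 up to rounding
S_cov = sum(xi * (ri - mean_rel) for xi, ri in zip(x, rel_w)) / n

# Breeder's equation, eqn (8.13), with a hypothetical additive variance.
V = sum(xi * xi for xi in x) / n      # phenotypic variance
V_A = 0.4
R = (V_A / V) * S_cov

print(S_weighted, S_cov, R)
```

The two ways of computing S agree up to floating-point error, and the response R is a fraction V_A/V of the selection differential.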
Fitness Gradients and linear regressions To understand this in more
detail let’s imagine that we calculate the linear regression of an
individual i’s fitness (Wi) on their mean-centered phenotype (Xi), i.e.
$$W_i \sim \beta X_i + \bar{w} \tag{8.14}$$
The best fitting slope of this regression (β), see math appendix around
eqn A.43 for more on linear regression, let’s call it the ‘fitness
gradient’, is given by
$$\beta = \mathrm{Cov}\left(X, w(X)/\bar{w}\right) / V \tag{8.15}$$
i.e. the fitness gradient is the phenotype-fitness covariance divided
by the phenotypic variance. Using this result we can rewrite
the breeder’s equation as
$$R = V_A \beta \tag{8.16}$$
i.e. we’ll see a directional response to selection if fitness has a
linear relationship with the phenotype, and if there is additive genetic
variance for the phenotype. As one example of a fitness gradient, in
Figure 8.17 the lifetime reproductive success (LRS) of male Red Deer
is plotted against the weight of their antlers. The red line gives the
linear regression of fitness (LRS) on antler mass and the slope of this
line is the fitness gradient (β).
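As a rough illustration of the fitness gradient, the sketch below simulates a hypothetical population in which fitness depends linearly on phenotype plus noise, then estimates β from eqn (8.15) and checks that the two forms of the breeder’s equation, eqns (8.13) and (8.16), give the same response. All numbers (sample size, slope, noise, V_A) are invented for the example.

```python
import random

random.seed(2)

# Hypothetical data: mean-centred phenotypes and a noisy fitness measure.
n = 5000
x = [random.gauss(0.0, 1.0) for _ in range(n)]
mean_x = sum(x) / n
x = [xi - mean_x for xi in x]

# Absolute fitness (e.g. offspring number) rising linearly with phenotype.
W = [10.0 + 1.0 * xi + random.gauss(0.0, 1.0) for xi in x]

w_bar = sum(W) / n
rel = [Wi / w_bar for Wi in W]                 # relative fitness

V = sum(xi * xi for xi in x) / n               # phenotypic variance
cov = sum(xi * (ri - 1.0) for xi, ri in zip(x, rel)) / n

beta = cov / V                                 # fitness gradient, eqn (8.15)
S = cov                                        # selection differential, eqn (8.12)

V_A = 0.5                                      # hypothetical additive variance
R_gradient = V_A * beta                        # eqn (8.16)
R_breeders = (V_A / V) * S                     # eqn (8.13)

print(beta, R_gradient, R_breeders)
```

Here β comes out close to 0.1, the simulated slope divided by mean fitness, because the gradient is defined on relative fitness.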
[Figure 8.17 plot. x-axis: Antler mass; y-axis: Lifetime Reproductive
Success.]
Figure 8.17: Lifetime reproductive success (LRS) of male Red Deer as
a function of their antler mass. Data from Kruuk et al. (2002); see the
paper for discussion of the complexities of equating this selection
gradient with the evolutionary response. Code here.
Fisher’s fundamental theorem of natural selection Finally how does
the mean fitness of our population evolve? If we choose relative fitness
to be our phenotype (X = w(X)/w̄), then the response in fitness is
$$R = \frac{V_A}{V} \mathrm{Cov}\left(w(X)/\bar{w}, w(X)/\bar{w}\right) = \frac{V_A}{V} V = V_A \tag{8.17}$$
i.e. the response to selection is equal to the additive genetic variance
for relative fitness. Or as Fisher put it
“The rate of increase in fitness of any organism at any time is equal to
its genetic variance in fitness at that time.” -Fisher (1930) (pg 37)
Fisher called this ‘the fundamental theorem of natural selection’. Our

proof here is just a sketch, and more formal approaches are needed
to show it in generality. There has been much gnashing of teeth over
exactly how broadly this result holds, and exactly what Fisher meant
(see Ewens, 2010, for a recent overview).
8.1.2 Directional Selection on Fitness Landscapes.

One common metaphor when we talk about evolution is that of a
population exploring an adaptive landscape with natural selection
pushing a population towards higher fitness states corresponding to
peaks in this landscape (see e.g. Figure 8.18). Lande (1976) found
an evocative formulation of the Breeder’s equation which aids our
intuition of phenotypic fitness landscapes. Lande showed that, if the
phenotype is normally distributed, the response to selection (R) could
be written in terms of the gradient (derivative) of the mean fitness (w̄)
of the population⁵ as a function of the mean phenotype:
$$R = \frac{V_A}{\bar{w}} \frac{\partial \bar{w}}{\partial \bar{x}} \tag{8.19}$$
What does this mean? Well VA/w̄ is always positive, so the direction
our population responds to selection is predicted by the sign of the
derivative (see Appendix Section A.1 for more on derivatives). If
increasing the mean phenotype of the population slightly would increase
mean fitness (∂w̄/∂x̄ > 0) our population will respond that generation
by evolving toward higher values of the trait (R > 0), left panel of
Figure 8.19. Conversely, if decreasing the population mean phenotype
slightly would increase the mean fitness (∂w̄/∂x̄ < 0) the population
will that generation evolve towards lower values of the phenotype
(middle panel of Figure 8.19). Thus, if selection pressures remain
constant, we can think of the population as evolving on an adaptive
landscape where the elevation is given by the population mean fitness.
Natural selection operates on the basis of individual-level fitness,
but as a result of this our population is increasing in its average
fitness, i.e. our population is becoming better adapted. We’ll discuss the
caveats of this hill-climbing interpretation below.
⁵ This follows from the fact that we can then move the derivative inside
the integral of w̄, eqn (8.9), to write the new term in eqn (8.19) as
$$\begin{aligned}
\frac{1}{\bar{w}} \frac{\partial \bar{w}}{\partial \bar{x}} &= \frac{1}{\bar{w}} \int_{-\infty}^{\infty} w(x) \frac{\partial p(x)}{\partial \bar{x}}\, dx \\
&= \int_{-\infty}^{\infty} \frac{w(x)}{\bar{w}} \frac{(x - \bar{x})}{V}\, p(x)\, dx \\
&= \frac{\mathrm{Cov}\left(w(x)/\bar{w}, x\right)}{\mathrm{Var}(x)}
\end{aligned} \tag{8.18}$$
which is β, so that eqns (8.16) and (8.19) are equivalent. For this
equivalence to hold, in the first line we assume that w(x) is not a
function of x̄, while the middle line is true when p(x) is the normal
distribution.
[Figure 8.18 plot. x-axis: mean phenotype; y-axis: mean fitness; the
coloured bar shows the scaled slope of mean fitness with respect to the
mean phenotype.]
Figure 8.18: An example of a fitness landscape, showing the mean fitness
of the population (w̄) as a function of the mean phenotype of the
population (x̄). The arrows show the expected direction of movement of
our population on the fitness landscape, with natural selection moving
our population toward local fitness optima. The coloured bar shows the
derivative (slope) of the mean fitness with respect to mean phenotype
(eqn. (8.19)). Red values are positive slopes corresponding to the
population evolving towards the right of the page, blue is a negative
slope with the population moving to the left.
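The hill-climbing picture can be sketched numerically. The code below assumes a Gaussian individual fitness function and a normally distributed phenotype with constant variances; the optimum, width, V and V_A are hypothetical values picked for illustration. It computes the response from Lande’s gradient, eqn (8.19), by numerical integration and a finite difference, compares it with V_A β from eqn (8.16), and then iterates the response over generations.

```python
import math

THETA = 3.0    # hypothetical phenotypic optimum
OMEGA2 = 4.0   # hypothetical squared width of the fitness function
V = 1.0        # phenotypic variance (assumed constant)
V_A = 0.5      # additive genetic variance (assumed constant)

def w(x):
    """Individual fitness: Gaussian around the optimum."""
    return math.exp(-(x - THETA) ** 2 / (2 * OMEGA2))

def normal_pdf(x, mean, var):
    return math.exp(-(x - mean) ** 2 / (2 * var)) / math.sqrt(2 * math.pi * var)

def integrate(f, lo, hi, steps=4000):
    """Midpoint-rule numerical integration."""
    h = (hi - lo) / steps
    return h * sum(f(lo + (i + 0.5) * h) for i in range(steps))

def mean_fitness(x_bar):
    """Mean fitness as a function of the mean phenotype, eqn (8.9)."""
    return integrate(lambda x: w(x) * normal_pdf(x, x_bar, V),
                     x_bar - 10, x_bar + 10)

def response_lande(x_bar, eps=1e-4):
    """R = (V_A / w_bar) * d(w_bar)/d(x_bar), eqn (8.19)."""
    slope = (mean_fitness(x_bar + eps) - mean_fitness(x_bar - eps)) / (2 * eps)
    return V_A * slope / mean_fitness(x_bar)

def response_gradient(x_bar):
    """R = V_A * beta with beta = Cov(X, w/w_bar) / V, eqn (8.16)."""
    wb = mean_fitness(x_bar)
    cov = integrate(
        lambda x: (x - x_bar) * (w(x) / wb) * normal_pdf(x, x_bar, V),
        x_bar - 10, x_bar + 10)
    return V_A * cov / V

# Iterate the response: the mean phenotype climbs towards the peak.
x_bar = 0.0
trajectory = [x_bar]
for _ in range(30):
    x_bar += response_lande(x_bar)
    trajectory.append(x_bar)

print(response_lande(0.0), response_gradient(0.0), trajectory[-1])
```

Under these assumptions the two expressions for R agree, the steps shrink as the population nears the optimum, and the response is essentially zero at the peak itself.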
[Figure 8.19 plot, three panels. Top of each panel: counts of phenotypes
for all individuals and for survivors. Bottom of each panel: fitness (w)
against phenotype (x), showing individual fitness, mean fitness, and the
selection differential.]
Figure 8.19: A population evolving on a (gaussian) fitness surface. The
bottom panel shows the expected individual fitness (w(x)) and mean
fitness as a function of phenotype. The red line shows the best fitting
linear approximation to the relationship between phenotype and individual
fitness, eqn (8.14), whose slope is β. The top panel shows the
distribution of the phenotype before and after selection. Code here.
What happens when it reaches the top of a peak? Well at the top
of a peak ∂w̄/∂x̄ = 0, as it is a local maximum, and so R = 0.
Assuming that the relationship between fitness and phenotype stays
constant, our population will stay at the top of the fitness peak. This
view of natural selection does not imply that the population is
evolving to the best possible state. Our population is just marching up
the hill of mean fitness (right panel of Figure 8.19). However, this peak
isn’t necessarily the highest fitness peak but simply whichever peak was
closest. So our population can become trapped on a local, but not
global, peak of fitness (see, for example Figure 8.18).
One dramatic example documenting adaptive evolution to a new
fitness optimum is offered by a remarkable time-series of stickleback
evolution from a fossil lake-bed in Nevada (Bell et al., 2006). In
this lake the layers of sediment are laid down each year allowing a
very detailed time series with over five thousand fossils measured. The
time-series documents the evolution towards a new set of optimum
phenotypes in the fifteen thousand years after the initial invasion of
the lake by a heavily armoured stickleback species. Figure 8.20 shows
the population mean number of touching pterygiophores, the bones
supporting the dorsal spines, through the fossil record (Figure 8.21).
Note how quickly the species evolves toward its new value, presumably
a fitness optimum in their new environment, and the long subsequent
time interval over which the population mean phenotype fluctuates
about its new value.
Hunt et al. (2008) fitted a model of a population adapting to a
fitness landscape, with a single peak, to these time-series data. Their
fitted fitness surface is shown in the lower panel of Figure 8.20. The
arrows show the moves that the population mean phenotype is making
on this inferred fitness surface. The population initially takes large
steps up toward the peak of this surface and subsequently fluctuates
around the peak. Under the interpretation that there is a single
stationary peak these fluctuations represent genetic drift randomly
knocking the population off its optimum, with selection acting to restore
the population towards this local optimum.
[Figure 8.20 plot. Top panel: x-axis Touching Pterygiophores (0.6 to 1.4);
y-axis Years (0 to 15000). Bottom panel: x-axis Touching Pterygiophores;
y-axis Fitness.]
Figure 8.20: Top) A time series of stickleback phenotypic evolution
from the fossil record. After a heavily armoured stickleback invades the
lake it quickly evolves towards fewer touching pterygiophores (the bones
supporting the dorsal spines). Fossil measurement means are calculated
in 250 year bins. Bottom) How our population moves on the inferred
fitness landscape. The arrows show each move made by the population
in the 250-year intervals. Data from Bell et al. (2006) and Hunt et al.
(2008). Code here.
Issues with the interpretation of fitness landscapes. In practice,
fitness landscapes may not be constant. The environment may be
constantly changing so our population is constantly forced to change to
keep up with the fitness peak. Indeed our environment may change
so quickly that our population cannot keep up with the peak. Our
population is still trying to increase its mean fitness, to ‘adapt’, but
the landscape itself is evolving. In the case of very rapid environmental
change our population may slide further and further away from the peak,
and as a consequence its mean fitness decreases, which may drive the
population to extinction if mean fitness stays below one (w̄ < 1) for long
enough. The conditions for extinction are an active area of research
in the field of ‘Evolutionary rescue’. More generally, for our fitness
landscape result (eqn (8.19)) to hold, and for us to be able to talk of
our population attempting to evolve to higher mean fitness states, we
need the fitness of our phenotypes to be independent of the frequency
of other phenotypes in the population. (This independence allows us
to assume that the fitness of individuals is not a function of the mean
phenotype, as needed in eqn (8.18)). The assumption of frequency
independence may not hold when there is competition between
individuals, e.g. for resources or mates, as then the fitness of an
individual depends on the strategies pursued by other individuals in the
population.
Figure 8.21: Fossil stickleback. Photo by Peter J. Park from Losos et al.
8.1.3 Stabilizing and Disruptive selection

Up to now we have just looked at directional selection, where selection
acts to change the mean phenotype. However, we can also use
quantitative genetic models to describe other modes of selection.
Extending from effects on the population mean, the next natural step is
to think about selection which acts on the population variance. Selection
might
act more strongly against individuals in the tails of the distribution,
with those closer to the mean phenotype having higher fitness, which
lowers the variance. Selection could also disfavour individuals close
to the population mean, with individuals with extreme phenotypes
having higher fitness, which acts to increase the variance of the popu-
lation.
Directional selection occurs because of the covariance between our
phenotype and fitness, eqn (8.12). Just as expressing directional
selection as a covariance allowed us to characterize directional selection
as the linear relationship between fitness and phenotype, β, we can
summarize selection acting on the variance by including a quadratic term
in the regression of fitness on phenotype
$$w_i \sim \beta x_i + \tfrac{1}{2} \gamma x_i^2 + \bar{w} \tag{8.20}$$
This γ, the coefficient of the quadratic term in our model, is the
quadratic selection gradient: the covariance of fitness and the squared
deviation from the phenotypic mean (µBS), divided by the squared
phenotypic variance, i.e.
$$\gamma = \frac{\mathrm{Cov}\left(w(X), (X - \mu_{BS})^2\right)}{V^2} \tag{8.21}$$
Our γ describes the curvature of the fitness surface around the mean.
Values of γ < 0 are consistent with stabilizing selection, reducing
the variance, while values of γ > 0 are consistent with disruptive
selection, increasing the variance.
Just like how β could be interpreted as the mean gradient of the fitness
surface, our γ is the mean curvature of the fitness surface
$$\gamma = E\left[\partial^2 w(x)/\partial x^2\right] = \int \frac{\partial^2 w(x)}{\partial x^2}\, p(x)\, dx \tag{8.22}$$
see Appendix Section A.1 for more on 2nd derivatives.
Under stabilizing selection the individuals with extreme phenotypes
in either tail have lower fitness, the result of which is to reduce the
phenotypic variance within a generation. A classic case of stabilizing
selection is birth weight in humans (Karn and Penrose, 1951).
Mary Karn collected data for nearly fourteen thousand pregnancies
from 1935-46 for birth weight and mortality. These data are replotted
in Figure 8.22. The variance of all births is 1.575 lb², while in live
births this was reduced to 1.26 lb², a 20% reduction in variance due to
stabilizing selection. It is worth noting that this selection pressure has
been greatly reduced over the decades in societies with access to good
prenatal care (Ulizzi and Terrenato, 1992).
[Figure 8.22 plot. x-axis: Birth Weight (lb); left y-axis: Number of
births; right y-axis: Mortality (log scale).]
Figure 8.22: Bars show the total number of births with different birth
weights (left axis). Dots show the mortality probability for different
birth-weight bins (right axis); the red line shows a fitted quadratic
model to mortality. Data from Karn and Penrose (1951) Table 2, collapsing
male and female births. Code here.
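To see stabilizing selection shrink the variance, and γ come out negative, we can simulate a single bout of viability selection. This is a sketch on invented data, not the birth weight data: the Gaussian survival function, its width, and the sample size are hypothetical, and γ is computed from eqn (8.21) using relative fitness (as in eqn (8.15)), which is an assumption of this sketch.

```python
import math
import random

random.seed(3)

OPTIMUM = 0.0   # hypothetical optimum phenotype
WIDTH2 = 4.0    # hypothetical squared width of the survival function

def survival_prob(x):
    """Gaussian stabilizing selection: survival falls off away from the optimum."""
    return math.exp(-(x - OPTIMUM) ** 2 / (2 * WIDTH2))

def variance(values):
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values) / len(values)

# Phenotypes before selection and a random survival outcome for each individual.
n = 50_000
before = [random.gauss(0.0, 1.0) for _ in range(n)]
survived = [random.random() < survival_prob(x) for x in before]
after = [x for x, s in zip(before, survived) if s]

V_before = variance(before)
V_after = variance(after)

# Quadratic selection gradient, eqn (8.21), with fitness scaled to relative fitness.
mu = sum(before) / n
fit = [1.0 if s else 0.0 for s in survived]
w_bar = sum(fit) / n
sq_dev = [(x - mu) ** 2 for x in before]
mean_sq = sum(sq_dev) / n
cov = sum((f / w_bar - 1.0) * (d - mean_sq) for f, d in zip(fit, sq_dev)) / n
gamma = cov / V_before ** 2

print(V_before, V_after, gamma)
```

With these particular parameter choices the survivors have roughly 80% of the original variance, and γ is negative, as expected under stabilizing selection.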
In Central Africa, Black-bellied seedcrackers (Pyrenestes ostrinus)
show disruptive selection on a remarkable beak-size polymorphism
(Figure 8.24). The small-beaked individuals feed on soft seeds from
one species of marsh sedge while the big-beaked individuals feed on
hard seeds from another sedge, which require ten times the force
to crack. Smith (1993) recorded the fates of hundreds of juveniles,
and found that individuals with intermediate beak sizes survived at
much lower rates (Figure 8.24) because they were not well adapted to
either seed resource. Beak length is subject to disruptive selection,
as can also be seen by the significant positive quadratic term in the
regression of survival probability on beak length. The variance of
mandible length in the total sample of individuals was 0.5 mm²; in the
survivors this variance increased by a factor of roughly 2.5 to 1.3 mm².
Figure 8.23: Lesser seedcracker Pyrenestes minor, a close relative of the
Black-bellied seedcracker, whose beak is about the same size as the
smallest Black-bellied individuals.
The birds of Africa, comprising all the species which occur in the
Ethiopian region. (1986) Sclater, W. L Plate by H. Grönvold Image from
the Biodiversity Heritage Library. Contributed by Smithsonian Libraries.
Not in copyright.
To illustrate how directional selection and quadratic terms play
off during adaptation, let’s consider the goldenrod gall fly (Eurosta
solidaginis), aka the goldenrod ball gallmaker. See Figure 8.26. As its
wonderful name implies, this insect lays its eggs in Goldenrod plants,
and the larvae release chemicals forcing the plant to form a gall that
forms a home for the larvae as they develop. While this seems like a
pretty sweet deal for the larvae, it is not without its perils.
[Figure 8.24 plot. x-axis: Lower mandible length (mm); left y-axis: Count;
right y-axis: Prob. Survival.]
Figure 8.24: Left) An illustration of the remarkable variation in beak
size within Black-bellied seedcrackers (P. ostrinus). Right) A histogram
of a beak size measurement in Black-bellied seedcrackers. All juveniles
are shown in grey, while the black bars show the survivors. The red
curve shows the best fitting linear and quadratic model to the probability
of survival, fitted using a binomial generalized linear model with a logit
link function.
Left illustration from: Size variation in Pyrenestes by Chapin J.P. in the
Bulletin of the American Museum of Natural History (Vol. XLIX 1923) Image
from the Biodiversity Heritage Library. Contributed by Toronto Library.
Not in copyright.
When they are small, ball galls are at risk of parasitism from parasitoid
wasps. When all the ball galls in the population are small, there is
strong positive directional selection on gall size, with little
stabilizing selection. Notice in the left panel of Figure 8.26 the good
agreement between the linear selection gradient and the fit including a
linear and quadratic term. However, bigger galls fall under the pall of
predation from downy woodpeckers and black-capped chickadees, who
seek out the tasty larvae. Thus intermediate size galls are favoured, a
fitness peak that the population quickly reaches. Once on this peak,
as shown in the right panel of Figure 8.26 there is no directional selec-
tion, i.e. no linear slope, but there is strong stabilizing selection, i.e. a
quadratic term. Thus the population will be maintained at this fitness
peak indefinitely if the environment remains unchanged.
Figure 8.25: The gall formed by the goldenrod ball gallmaker (Eurosta
solidaginis) in a goldenrod plant. The one on the right is cut to show a
partial cross-section.
Annual report of the New York State Museum (1917) Image from the
Biodiversity Heritage Library. Contributed by The LuEsther T Mertz
Library, the New York Botanical Garden. Not in copyright.
[Figure 8.26 plot, two panels. Top of each panel: counts of gall sizes for
all galls and for survivors. Bottom of each panel: x-axis Gall size (mm);
y-axis Fitness (w); legend: Fitness, Mean Fitness, Selection differential,
Quadratic.]
Figure 8.26: Fitness surface for gall diameter in goldenrod ball
gallmakers. The dots are the measured survival probabilities of bins of
different sized galls. The solid line is a fitted individual fitness
surface (w(x)). Dotted line is w̄ plotted as a function of the population
mean assuming a normal distribution with a standard deviation of 2mm.
Data from Weis and Gorman (1990). Code here.
9 The Response of Multiple Traits to Selection.
The fitness of an organism depends on the outcome of many different
organismal processes and phenotypes. Thus natural selection is often
acting on many phenotypes in concert. In some cases it may not be
possible to satisfy at once all of the directions in which selection
pulls the population’s phenotypes. Such fitness tradeoffs occur
when selection acts on genetically correlated phenotypes in contradictory
ways.
To understand the short-term consequence of selection on multiple
phenotypes we can generalize the Breeder’s equation to multiple
traits¹. Considering two traits we can write our responses in both
traits as
$$\begin{aligned}
R_1 &= V_{A,1} \beta_1 + V_{A,1,2} \beta_2 \\
R_2 &= V_{A,2} \beta_2 + V_{A,1,2} \beta_1
\end{aligned} \tag{9.1}$$
where the 1 and 2 index our two different traits. Here VA,1 and VA,2
are the additive genetic variance for trait 1 and 2 respectively, while
VA,1,2 is our additive genetic covariance between our traits. Our
selection gradient for trait 1, β1, represents the change in fitness as
you change trait 1 alone holding other traits constant. These β can be
estimated by multivariate regression, see below. The multivariate
breeder’s equation is a statement that our response in any one
phenotype is modified by selection on other traits that genetically covary
with that trait.
¹ Lande, R., 1979 Quantitative genetic analysis of multivariate
evolution, applied to brain: body size allometry. Evolution 33(1Part2):
402–416
We can also write this equivalently in matrix form, for an arbitrary
number of traits. We write our change in the mean of our multiple
phenotypes within a generation as the vector S and our response
across generations as the vector R. These two quantities are
related by
$$\mathbf{R} = \mathbf{G} \mathbf{V}^{-1} \mathbf{S} = \mathbf{G} \boldsymbol{\beta} \tag{9.2}$$
where V and G are our matrices of the variance-covariance of
phenotypes and additive genetic values (eqns. (7.20) and (7.19)) and β is
a vector of selection gradients (i.e. the change within a generation as a
fraction of the total phenotypic variance). Note that β = V−1 S, such that
each β represents the selection gradient on a trait accounting for its
phenotypic covariances with other traits.
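A two-trait numerical sketch of eqn (9.2) may help here. The matrices V and G and the vector S below are hypothetical values chosen only to illustrate the arithmetic; the helper functions `mat_vec` and `inverse_2x2` are ours.

```python
def mat_vec(M, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in M]

def inverse_2x2(M):
    """Inverse of a 2x2 matrix."""
    (a, b), (c, d) = M
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

# Hypothetical inputs for two positively correlated traits.
V = [[1.0, 0.6], [0.6, 1.0]]   # phenotypic variance-covariance matrix
G = [[0.5, 0.4], [0.4, 0.5]]   # additive genetic variance-covariance matrix
S = [0.5, 0.2]                 # selection differentials within a generation

beta = mat_vec(inverse_2x2(V), S)   # selection gradients, beta = V^{-1} S
R = mat_vec(G, beta)                # response, R = G beta

print(beta, R)
```

In this made-up case trait 2 has a positive selection differential and a positive response even though its selection gradient is negative: direct selection on trait 1 drags trait 2 along through the phenotypic and genetic covariances.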
As an example of the outcome of selection on multiple phenotypes,
consider the bout of selection measured by Grant and Grant
(1995) in the medium ground finch (Geospiza fortis), one of Darwin’s
finches. They measured 634 birds in 1976, of which only 15% survived to
1977. The birds who survived were heavier and had longer, deeper bills
than average.

Trait         Mean before selection (1976)   S      β        Mean next gen. (1978)
Weight        16.06                          0.74   0.477    17.13
Bill Length   10.63                          0.54   -0.144   10.95
Bill Depth    9.21                           0.36   0.528    9.70

Table 9.1: Trait means and selection differentials and gradients from an
episode of selection in Geospiza fortis. Numbers from table 2 & 3 of Grant
and Grant (1995).

Accounting for the phenotypic covariances among the traits (V−1),
they found that both weight and bill depth showed direct directional
selection towards larger values (positive βs). However, bill length
showed weak direct selection towards shorter bills (negative β). Its
positive selection differential (S) reflects the fact that bill length
shows positive phenotypic correlation with bill depth and weight, so
that direct selection on weight and bill depth dragged bill length
along. Looking at the next generation, all three traits have
significantly increased. Thus despite selection possibly favouring
shorter bill lengths, and certainly not favouring long bills, bill length
increased in the next generation due to its positive genetic covariance
with two traits that selection was acting to increase.
Question 1. You collect observations of red deer within a genera-
tion, recording an individual’s number of offspring and phenotypes for
a number of traits which are known to have additive genetic variation.
Using your data, you construct the plots shown in Figure 9.1 (stan-
dardizing the phenotypes). Answer the following questions by choosing
one of the bold options. Briefly justify each of your answers with refer-
ence to the breeder’s equation and multi-trait breeder’s equation.
A) Looking just at figure 9.1 A, in what direction do you expect
male antler size to evolve?
Insufficient information, increase, decrease.
B) Looking just at figures 9.1 B and C, in what direction do you
expect male antler size to evolve?
C) Looking at figures 9.1 A, B, and C, in what direction do you
expect male antler size to evolve?
[Figure 9.1 plot, three panels A-C. x-axes, left to right: Male antler
size; Female leg length; 1/2-sister’s leg length. y-axes: Number of
offspring; 1/2-brother’s antler size.]
Figure 9.1: Observations of red deer within a generation; recording an
individual’s number of offspring and phenotypes, which are known to have
additive genetic variation. The figures left to right are A-C. (Data are
simulated. Code here.)
As an example of correlated responses to selection, consider the
Wilkinson (1993) selection experiment on Stalk-eyed flies
(Cyrtodiopsis dalmanni). Stalk-eyed flies have evolved amazingly long
eye-stalks. In the lab, Wilkinson established six populations of
wild-caught flies and selected up and down on male eye-stalk to body
size ratio for 10 generations (left plot in Figure 9.2). Despite the fact
that he did not select on females, he saw a correlated response in the
females from each of the lines (right plot), because of the genetic
correlation between male and female body proportions.
[Figure 9.2 plot, two panels: Males (left) and Females (right). x-axis:
generations (0 to 10); y-axis: Eye-span/body length.]
Figure 9.2: Wilkinson selected two populations of flies for increased
eye-stalk to body length ratio in males (mean shown as up triangles),
and two for a decreased ratio (down triangles), by taking the 10 males
with the highest (lowest) ratio out of 50 measured. He also established
two control populations (circles). He constructed each generation of
females by sampling 10 at random from each population. Data from
Wilkinson (1993). Code here.
Question 2.
At the end of ten generations in Wilkinson’s experiment (Figure
9.2), the males from the up- and down-selected lines had mean eye-
stalk to body ratios of 1.29 and 1.14 respectively, while the females
from the up- and down-selected lines had means of 0.9 and 0.82.
A) Wilkinson estimated that by selecting the top/bottom 10
males, he had on average shifted the mean body ratio by 0.024 within
each generation. What is the male heritability of eye-stalk to
body-length ratio?
B) Assume that the additive genetic variance of male and female
phenotypes are equal and that there is no direct selection on female
body-proportion in this experiment, i.e. that all of the response in
females is due to correlated selection. Can you estimate the male-
female genetic correlation of the eye-stalk ratio?
Figure 9.3: Stalk-eyed Flies (Diopsidae).
Diptera. van der Wulp. 1898. Image from the Smithsonian Libraries. Not in
copyright.
[Figure 9.4 plot, three panels. x-axis: Reversals; y-axis: Stripes.]
Figure 9.4: Left) The garter snake fitness surface estimated by
Brodie III (1992); lighter colours indicate higher relative fitness.
Middle) The phenotypes of all of the snakes released by Brodie; each dot
is an individual. Right) The phenotypes of surviving snakes. Note how
snakes in the top left and bottom right corner are over represented in
the survivors. Data from Brodie III (1992). Code here.
Estimating multivariate selection gradients We can estimate
multivariate directional (β) and quadratic selection gradients (γ) just as
we did for a single trait, using linear and quadratic models
(eqns (8.14) and (8.20)). For example, for two traits (x1 and x2) we can
write
$$w_i \sim \beta_1 x_{1,i} + \tfrac{1}{2} \gamma_1 x_{1,i}^2 + \beta_2 x_{2,i} + \tfrac{1}{2} \gamma_2 x_{2,i}^2 + \gamma_{1,2}\, x_{1,i} x_{2,i} + \bar{w} \tag{9.3}$$
where β1 and γ1 are the directional and quadratic selection gradients
for trait one, and similarly for trait two (Lande and Arnold,
1983). The covariance selection gradient between traits is given by
γ1,2. This technique for measuring multivariate selection is sometimes
called ‘Lande-Arnold regression’.
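The regression in eqn (9.3) can be sketched on simulated data. Everything below is hypothetical: two uncorrelated standardized traits, relative fitness generated with purely correlational selection, and a hand-rolled least-squares fit via the normal equations so that the example needs only the standard library. In practice the directional gradients are often estimated from a separate linear-only fit, a refinement omitted here.

```python
import random

random.seed(4)

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [bi] for row, bi in zip(A, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))) / M[r][r]
    return x

def least_squares(rows, y):
    """Ordinary least squares via the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    XtX = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    Xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(XtX, Xty)

# Simulated standardized traits and relative fitness with a negative
# covariance selection gradient (values are invented for illustration).
n = 4000
x1 = [random.gauss(0, 1) for _ in range(n)]
x2 = [random.gauss(0, 1) for _ in range(n)]
TRUE_GAMMA12 = -0.3
w = [1.0 + TRUE_GAMMA12 * a * b + random.gauss(0, 0.5) for a, b in zip(x1, x2)]

# Design matrix columns follow eqn (9.3):
# intercept, x1, x1^2 / 2, x2, x2^2 / 2, x1 * x2
rows = [[1.0, a, 0.5 * a * a, b, 0.5 * b * b, a * b] for a, b in zip(x1, x2)]
intercept, beta1, gamma1, beta2, gamma2, gamma12 = least_squares(rows, w)

print(beta1, gamma1, beta2, gamma2, gamma12)
```

The fit recovers a clearly negative γ1,2 with directional and single-trait quadratic gradients near zero, the same qualitative pattern as selection favouring particular trait combinations rather than either trait on its own.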
Brodie III (1992)’s work provides a nice example of selection
on multiple predation-avoidance traits in northwestern garter snakes
(Thamnophis ordinoides). Brodie III released hundreds of snakes
born in the lab into the wild, and then performed mark-recapture
observations to monitor their fate.
Before releasing them he measured how stripy they were, and their
behavioural tendency to reverse direction during simulated flight
from a predator. His quadratic fitness surface is shown in Figure 9.4,
based on fitting the regression given by eqn (9.3) to juvenile survival.
He found that neither the single-trait directional nor quadratic
gradients were significant, i.e. there was no apparent selection on one
trait ignoring the other. However, there was a significant negative
covariance selection gradient (γ1,2 < 0). The individuals with the
highest chance of survival are either highly striped and perform few
reversals (top left corner), or have little striping but reverse course
frequently (bottom right corner).
Figure 9.5: Northwestern garter snake (Eutaenia cooperi, now Thamnophis
ordinoides)
The natural history of Washington territory, with much relating to
Minnesota, Nebraska, Kansas, Oregon, and California (1859). Cooper J.G.
and Suckley, G. Image from the
9.1 Some applications of the multivariate trait breeder’s equation
The multivariate breeder’s equation has a lot of different uses in
understanding the response of multiple traits to selection. It also offers
strong insights into the mechanistic underpinnings of kin selection and
sexual selection. We’ll discuss these next.
9.1.1 Hamilton’s Rule and the evolution of altruistic and selfish behaviours
“ ‘The only reason for making a buzzing-noise that I know of is be-
cause you’re a bee.’ Then [Pooh] thought another long time, and
said: ‘The only reason for being a bee that I know of is to make
honey...And the only reason for making honey is so as I can eat it.’ ”
–Winnie-the-Pooh, Milne and Shepard (1926).
One of the seismic shifts caused by Darwin’s work was the realisa-
tion that organisms don’t exist for the benefit of other individuals or
other species. Bees didn’t evolve to pollinate flowers, any more than
they evolved to make honey for bears. If we can say that there is a
‘reason’ why an organism exists, it is only to leave offspring to the next
generation. Pooh can be forgiven for straying from Darwinian thought,
as he exists for the benefit of Christopher Robin and other children’s
bedtime stories.
However, there’s a wrinkle to this Darwinian view. Worker bees
don’t make honey to benefit their offspring, they are sterile and are
working for the benefit of the Queen bee and her offspring.
Individuals frequently behave in ways that sacrifice their own fitness for the
benefit of others. That selection favours such apparent acts of
altruism is puzzling at first sight. Hamilton (1964a,b) supplied the first
general evolutionary explanation of such altruism. His intuition was
that while an individual is losing out on some reproductive output,
the alleles underlying an altruistic behaviour can still spread in the
population if this cost is outweighed by benefits gained through the
transmission of these alleles through a related individual. Note that
this means that the allele is not acting in a self-sacrificing manner,
even though individuals may as a result.

Margin note: Maynard Smith (1964) coined the name kin selection to
describe Hamilton’s approach to this problem. It’s also sometimes
called the inclusive fitness approach, as we need to include not just
one individual’s fitness but the weighted sum of the fitness of all
their relatives.

Altruism reflects social interactions. So as a simple model let’s
imagine that individuals interact in pairs, with our focal individual i
being paired with an individual j. Imagine that individuals have two
possible phenotypes X = 1 or 0, corresponding to providing or
withholding some small act of ‘altruism’ (we could just as easily flip these
labels and call them an unselfish act and a selfish act respectively).
Our pairs of individuals interacting could, for example, be siblings
sharing a nest. The altruistic trait could be as simple as growing at
a slightly slower rate so as to reduce sibling competition for food
from parents, or more complicated acts of altruism such as children
forgoing their own reproduction so as to help their parents raise their
siblings.
Providing the altruistic act has a cost C to the fitness of our in-
dividual and failing to provide this act has no cost. Receiving this
altruistic act confers a fitness benefit B over individuals who did not
receive this act. Hamilton’s rule states that such a trait will spread
through the population if

$$2FB > C \tag{9.4}$$

where F is the average kinship coefficient between the interacting
individuals (i and j). In the usual formulation of Hamilton’s Rule
our 2F is replaced by the ‘coefficient of relationship’, which is the
proportion of alleles shared between the individuals. Here we use two
times the kinship coefficient to keep things in line with our notation for
these chapters. Note that if our individuals are themselves inbred we
need to be a little more careful to reconcile these two measures. So the
altruistic behaviour will spread, even if it is costly to the individual, if
its cost is paid off by the benefit to sufficiently related individuals.
As one example of kin-selection consider Krakauer (2005)’s
work on co-operative courtship in wild turkeys (Meleagris gallopavo).
Male turkeys often form display partnerships, with a subordinate male
helping a dominant male with displaying to females and defending the
females from other groups of males.
These pairs are often full brothers (F = 0.25), with the subordinate
male often being the younger of the two. The subordinate male
often loses out on mating opportunities over his entire lifetime by
acting as a wingman for his older brother. Krakauer (2005)
estimated that dominant males gained an extra 6.1 offspring when
they displayed with a partner, compared to males who displayed alone,
while subordinate males lost out on fathering 0.9 offspring compared
to solitary males. Thus the cost of helping by subordinate males
is more than compensated by the fitness gains of their brothers
((2 × 0.25) × 6.1 > 0.9), and so the evolution of this altruistic helping
in co-operative courtship is potentially well explained by kin-selection
(see Akçay and Van Cleve, 2016, for more analysis).

Figure 9.6: Turkey (Meleagris gallopavo). Bilder-atlas zur
Wissenschaftlich-populären Naturgeschichte der Vögel in ihren
sämmtlichen Hauptformen (1864). Wien, K.K. Hof. Image contributed by
Smithsonian Libraries. Not in copyright.
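The arithmetic in the turkey example can be written as a small check of eqn (9.4). This is only a sketch: the function name `hamilton_rule_holds` is our own, and the numbers are the ones quoted above from Krakauer (2005).

```python
def hamilton_rule_holds(kinship_F, benefit_B, cost_C):
    """Return True if 2*F*B > C, i.e. if eqn (9.4) is satisfied."""
    return 2 * kinship_F * benefit_B > cost_C

# Full brothers (F = 0.25), with B = 6.1 and C = 0.9 offspring.
indirect_benefit = 2 * 0.25 * 6.1
print(indirect_benefit, hamilton_rule_holds(0.25, 6.1, 0.9))

# Unrelated partners (F = 0) can never satisfy the rule for a positive cost.
print(hamilton_rule_holds(0.0, 6.1, 0.9))
```

Changing `kinship_F` shows how quickly the condition tightens as the partners become less closely related.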
Question 3. How would this answer be changed if the male
turkey partnerships were only half sibs, or first cousins?
Where does this result come from? Well, we can use our quantitative
genetics framework to gain some intuition by deriving a simple
version of Hamilton’s Rule by thinking about the phenotypes of an
individual’s kin as genetically correlated phenotypes. To sketch a proof
of this result, let’s assume that our focal individual i’s fitness can be
written as

$$W(i,j) = W_0 + W_i + W_j \tag{9.5}$$

where Wi is the contribution to the fitness of individual i due to
their own phenotype, and Wj is the contribution to our individual i’s
fitness due to the interacting individual j’s behaviour (i.e. j’s
phenotype). With the benefit B and cost C, our W(i, j) are depicted in
Figure 9.7.

Figure 9.7: Top) The fitness of individual i as a function of their
behavioural phenotype, where altruistic/non-altruistic behavioural
phenotypes are encoded as 1 and 0 respectively. The direct fitness cost
of behaving altruistically is C. Bottom) The fitness of our focal
individual i as a function of the behavioural phenotype of their
interacting partner (j). Our focal individual gets an increase B in
fitness if their partner behaves altruistically. Code here.

Following our multivariate breeder’s equation, we can write the
expected change of our behavioural phenotype as

$$R = \beta_i V_A + \beta_j V_{A,i,j}. \tag{9.6}$$

Our altruistic phenotype is increasing in the population if R > 0, i.e. if

$$\beta_i V_A + \beta_j V_{A,i,j} > 0. \tag{9.7}$$

The slope βi of the regression of our focal individual’s fitness on
their own behavioural phenotype is proportional to −C. The slope βj of
the regression of our focal individual’s fitness on their interacting
partner’s phenotype is proportional to B (with the same constant of
proportionality). Therefore, our altruistic phenotype is increasing in
the population if

$$\begin{aligned} \beta_i V_A + \beta_j V_{A,i,j} &> 0 \\ B \frac{V_{A,i,j}}{V_A} &> C \end{aligned} \tag{9.8}$$

Margin note: Here we’re following a simplified version of Queller
(1992)’s treatment, to re-derive Hamilton’s rule in a quantitative
genetics framework (Hamilton’s original papers did this in a
population genetics framework).

So what’s the average genetic covariance between individual i and
j’s altruistic phenotype? Well it’s the same behavioural phenotype
in both individuals, so the phenotypes are genetically correlated if
our individuals are related to each other. The covariance of the same
phenotype between two individuals is just 2Fi,j VA (see (7.15)). So our
altruistic phenotype is increasing in the population if

$$\begin{aligned} B \frac{2F_{i,j} V_A}{V_A} &> C \\ 2F_{i,j} B &> C \end{aligned} \tag{9.9}$$

Seen from this perspective, Hamilton’s rule is simply a statement
that altruistic behaviours can spread via kin-selection, if the average
cost to an individual of displaying an altruistic phenotype, i.e.
carrying altruistic alleles, is paid back through the average benefit of
interacting with altruistic relatives (kin).
Figure 9.8: A selection of the huge

diversity of Hymenoptera.
Naturgeschichte, Klassification und Nomen-
clatur der Insekten vom Bienen, Wespen und
Ameisen. Christ, JL Image from the Bio-
diversity Heritage Library. Contributed by
University of Illinois Urbana Champaign. Not
in copyright.
Under kin selection, relatedness and the breeding structure of
populations are hypothesized to be key factors in determining
the evolution of altruistic behaviours. One of the most impressive examples
of the evolution of altruism is the repeated evolution of eusociality,
where sterile castes have evolved to help to rear their siblings rather
than their own offspring. Eusociality has evolved at least eight
independent times in Hymenoptera (bees, wasps, and ants). There’s
huge variation in mating systems in Hymenoptera, from high levels of
multiple mating to monandry. Hughes et al. (2008) conducted a
comparative phylogenetic analysis of mating system across hundreds
of Hymenoptera species. They found that each of the eight eusocial
clades had monandry, females mating with a single male, as an
ancestral state. Thus, eusociality initially evolved in populations where
relatedness was maximized among siblings.

Figure 9.9: Australian Honey-pot Ant (Camponotus inflatus). Honey ants
are gorged with honeydew collected by their nest mates, till they swell
to the size of grapes, and are used as a food storage device. Ants,
bees, and wasps; a record of observations on the habits of the social
Hymenoptera (1897) Lubbock, J. Image from the Biodiversity Heritage
Library. Contributed by Smithsonian
9.1.2 Sexual selection and the evolution of mate preference by indirect benefits.
Organisms often put an enormous effort into finding and attracting
mates, sometimes at a considerable cost to their chances of survival.
Why are individuals so choosy about who they mate with, particu-
larly when their choice seems to be based on elaborate characters and
arbitrary displays that surely lower the viability of their mates?
One major reason why individuals evolve to be choosy about who
they mate with is that it can directly impact their fitness. By
choosing a mate with particular characteristics, individuals can gain more
parental care for their offspring, avoid parasites, or obtain a mate
with higher fertility. For example, female glow-worms flash at night
to attract males flying by. Females with larger, brighter lanterns have
higher fecundity, so males with a preference for brighter flashes will
gain a direct benefit to their own fitness. (Note that males will
benefit even if these differences in female fecundity are entirely driven by
differences in environment, and thus non-heritable.) Indeed male
glow-worms have evolved to be attracted to brighter flashing lures.

Figure 9.10: Male (left) and female (right) common glow worm (Lampyris
noctiluca). The animal kingdom : arranged after its organization;
forming a natural history of animals, and an introduction to
comparative anatomy. (1863) Cuvier, G. Image from the Biodiversity
Heritage Library. Contributed by University of Toronto - Gerstein
Science Information Centre. Not in copyright.

Figure 9.11: Female glow worms who have the largest, and therefore
brightest, lanterns have the highest fecundity. (Horizontal axis:
Lantern Size (mm²); vertical axis: Number of Eggs.) Data from Hopkins
et al. (2015). Code here.

However, even in the absence of direct benefits of choice, selection
can still indirectly favour the evolution of choosiness. These indirect
benefits occur because individuals can have higher fitness offspring
by choosing a mate whose phenotype indicates high viability (the
so-called ‘good genes’ hypothesis), or by choosing a mate whose
phenotype is simply attractive, and likely to produce similarly attractive
offspring (the ‘runaway’ or ‘sexy sons’ hypothesis).
Figure 9.12: Left) Assortative mating between males and females.
Males vary in a display trait (e.g. tail length), females vary in their
preference for this trait. We see evidence of assortative mating as
females with a preference for a particular value of the male trait tend
to mate with those males. Right) As both male trait and female
preference are genetic this establishes a genetic correlation in the
next generation. This is simulated data. Code here. (Left panel:
Father's display trait against Mother's pref. trait; right panel: Mean
sons' display trait against Mean daughters' pref. trait.)

We’ll denote a display trait, e.g. tail length, in males by ♂ and
a preference trait in females by ♀. Our display trait is under direct
selection in males, such that its response to selection can be written as

$$R_{♂} = \beta_{♂} V_{A,♂} \tag{9.10}$$

Let’s assume that the female preference trait, the degree to which
females are attracted to long tails, is not under direct selection (β♀ = 0).
Then the response to selection of the preference trait can be written as

$$R_{♀} = \beta_{♀} V_{A,♀} + \beta_{♂} V_{A,♀♂} = \beta_{♂} V_{A,♀♂} \tag{9.11}$$

So the female preference will respond to selection if it is genetically
correlated with the male trait, i.e. if VA,♀♂ is not zero. There are a
number of different ways this genetic correlation could arise; the
simplest is that the loci underlying the male trait may have a pleiotropic
effect on female preference. However, female preference may often
have quite a distinct genetic basis from male display traits.
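To make eqns (9.10) and (9.11) concrete, here is a minimal sketch that computes the direct response of the display trait and the correlated response of the preference trait. The function name `responses` and all of the numerical values are hypothetical, chosen only to illustrate that the preference moves whenever the genetic covariance is non-zero.

```python
def responses(beta_male, VA_male, VA_female_male, beta_female=0.0, VA_female=1.0):
    """Responses of the display trait (eqn 9.10) and preference trait (eqn 9.11)."""
    R_male = beta_male * VA_male
    R_female = beta_female * VA_female + beta_male * VA_female_male
    return R_male, R_female

# Hypothetical values: direct selection on males only (beta_female = 0).
for cov in (0.0, 0.1, 0.3):
    R_m, R_f = responses(beta_male=0.2, VA_male=1.0, VA_female_male=cov)
    print(f"V_A(female,male) = {cov:.1f}: R_male = {R_m:.3f}, R_female = {R_f:.3f}")
```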
A more general way in which trait-preference genetic correlations
may arise is through assortative mating. As females vary in their tail-
length preference, the ones with a preference for longer tails will mate
with long-tailed males, and the opposite for females with a preference
for shorter tails. Therefore, a genetic correlation between display and
preference traits will become established (see Figure 9.12).
The males with the longer tails will also carry the alleles associated
with the preference for longer tails, as their long-tailed dads tended to
mate with females with a genetic preference for long tails. Similarly,
the males with shorter tails will carry alleles associated with the pref-
erence for shorter tails. Thus if there is direct selection for males with
longer tails, then the female preference for longer tails will increase
too, as it is genetically correlated via assortative mating.
Figure 9.13: Mean phenotypes for the two up- and two down-selected
populations of Guppies. Left panel: A response to selection was seen
due to the direct selection on male colouration. Right panel: An
indirect, correlated response was also seen in female preference. Data
from Houde (1994). Code here. (Left panel: Mean Orange Area against
Generation; right panel: Mean Female Orange Preference against
Generation; lines are labelled Up Selection and Down Selection.)

As an example of how direct selection on display traits can drive

the evolution of preference traits, let’s consider some data from gup-
pies. Guppies (Poecilia reticulata) are a classic system for studying
the interplay of natural and sexual selection. In some populations
of guppies, females show a preference for males with more orange
colouration.
Figure 9.14: Guppy (Poecilia reticulata). From a set of 1962 stamps of
Hungary. Contributed to wikimedia by Darjac, not covered by copyright.

Houde (1994) established four replicate population pairs of
guppies and selected one of each pair for an increased or decreased orange
coloration in males, selecting the top/bottom 20 out of 50 males. She
randomly chose females from each population to form the next
generation, and so did not exert direct selection on females. She measured
the response to selection on male colouration and on female
preference for orange (left and right panels of Figure 9.13 respectively). In
the lines that were selected for more orange males, females showed
an increased preference for orange. In those lines selected for
less orange in male displays, by contrast, females showed a decreased
preference for orange. This is consistent with indirect selection on female orange
preference as a response to selection on male colouration, due to a
genetic correlation between female preference and male trait. It is a
priori unlikely that pleiotropy is the source of the genetic correlation
between these traits, rather it is likely caused by females assortatively
mating with males that match their colour preference.
Returning to our bird tail example, what could drive the direct
selection on male tail length? The selection for longer tails in males
could come about because longer tails are genetically correlated with
higher male viability, for example perhaps only males who gather an
excess of food have the resources to invest in growing a long tail, i.e.
a long tail is an honest signal of fitness. This would correspond to a
‘good genes’ explanation of female mate choice evolution.
There’s another subtler way that selection could favour our male
trait. Imagine that the variation in female preference trait is because
some females have no strong preference for male tail length, but some
females have a strong preference for males with longer tails.
Males with longer tails would then have higher fecundity than the
short-tailed males as there’s a subset of females who are strongly
attracted to long tails, and these males also get to mate with the other
females. Thus selection favours long-tailed males, and so indirectly
favours female preference for longer tails; females with a preference for
longer tails have sons who in turn are more attractive. This model is
sometimes called the sexy-son model. It is also called the Fisherian
runaway model (Fisher, 1915), as female preference and male trait
can coevolve in an escalating fashion driving more and more extreme
preferences for arbitrary traits. Thus many extravagant display traits
in males and females may exist purely because individuals find them
beautiful and are attracted to them.

“The case of the male Argus Pheasant is eminently interesting, because
it affords good evidence that the most refined beauty may serve as a
sexual charm, and for no other purpose.” – Darwin (1888)

Figure 9.15: Argus Pheasant.

A monograph of the pheasants. (1918). Beebe,
W Image from the Biodiversity Heritage Li-
brary. Contributed by Smithsonian Institution
Libraries. Licensed under CC BY-2.0.
10 One-Locus Models of Selection
“Socrates consisted of the genes his parents gave him, the experiences
they and his environment later provided, and a growth and develop-
ment mediated by numerous meals. For all I know, he may have been
very successful in the evolutionary sense of leaving numerous offspring.
His phenotype, nevertheless, was utterly destroyed by the hemlock
and has never since been duplicated. The same argument holds also
for genotypes. With Socrates’ death, not only did his phenotype dis-
appear, but also his genotype.[...] The loss of Socrates’ genotype is
not assuaged by any consideration of how prolifically he may have
reproduced. Socrates’ genes may be with us yet, but not his genotype,
because meiosis and recombination destroy genotypes as surely as
death.” –Williams (1966)
Individuals are temporary, their phenotypes are temporary, and

their genotypes are temporary. However, the alleles that individuals
transmit across generations have permanence. Sustained phenotypic
evolutionary change due to natural selection occurs because of changes
in the allelic composition of the population. To understand these
changes, we need to understand how the frequency of alleles (genes)
changes over time due to natural selection. We’ll also see that,
because an individual’s genotype is just an ephemeral collection of
alleles, genetic conflicts can arise that actually lower the fitness of
individuals.
As we have seen, natural selection occurs when there are differences
between individuals in fitness. We may define fitness in various ways.
Most commonly, it is defined with respect to the contribution of a
phenotype or genotype to the next generation. Differences in fitness
can arise at any point during the life cycle. For instance, different
genotypes or phenotypes may have different survival probabilities from
one stage in their life to the stage of reproduction (viability), or they
may differ in the number of offspring produced (fertility), or both.
Here, we define the absolute fitness of a genotype as the expected
number of offspring of an individual of that genotype. Differences in
fitness among genotypes drive allele frequency change. In this chapter
we’ll study the dynamics of alleles at a single locus. Throughout,
we’ll ignore the effects of genetic drift, and just study the
deterministic dynamics of selection. We’ll return to discuss the interaction of
selection and drift in a couple of chapters.
10.0.1 Haploid selection model

We start out by modeling selection in a haploid model, as this is
mathematically relatively simple. Let the number of individuals carry-
ing alleles A1 and A2 in generation t be Pt and Qt . Then, the relative
frequencies at time t of alleles A1 and A2 are pt = Pt /(Pt + Qt ) and
qt = Qt /(Pt + Qt ) = 1 − pt . Further, assume that individuals of
type A1 and A2 on average produce W1 and W2 offspring individuals,
respectively. We call Wi the absolute fitness.
Therefore, in the next generation, the absolute number of carriers
of A1 and A2 are Pt+1 = W1 Pt and Qt+1 = W2 Qt , respectively. The
mean absolute fitness of the population at time t is

$$\bar{W}_t = W_1 \frac{P_t}{P_t + Q_t} + W_2 \frac{Q_t}{P_t + Q_t} = W_1 p_t + W_2 q_t, \tag{10.1}$$

i.e. the sum of the fitness of the two types weighted by their relative
frequencies. Note that the mean fitness depends on time, as it is a
function of the allele frequencies, which are themselves time depen-
dent.
As an example of a rapid response to selection on an allele in a
haploid population, we can consider some data on the evolution of
drug resistant viruses. Feder et al. (2017) studied viral dynamics
in a macaque infected with a strain of simian immunodeficiency virus
(SHIV) that carries the HIV-1 reverse transcriptase coding region.
The viral load of the macaque’s blood plasma is shown as a black line
in Figure 10.1. Twelve weeks after infection, the macaque was treated
with an anti-retroviral drug that targeted the virus’ reverse
transcriptase protein. Note how the viral load initially starts to drop once
the drug is administered, suggesting that the absolute fitness of the
original strain is less than one (W2 < 1) in the presence of the drug (as
their numbers are decreasing). However, the viral population rebounds
as a mutation that confers drug resistance to the anti-retroviral drug
arises in the SHIV and starts to spread. Viruses carrying this
mutation (let’s call them allele 1) likely have absolute fitness W1 > 1.
The frequency of the drug-resistant allele is shown in red; it quickly
spreads from being undetectable in week 13, to being fixed in the
SHIV population in week 20.
The rapid spread of this drug-resistant allele through the
population is driven by the much greater relative fitness of the drug-resistant
allele over the original strain in the presence of the anti-retroviral
drug.

Margin note: The main focus of Feder et al.’s work was modeling the
complicated spatial dynamics of drug-resistant SHIV adaptation in
different organ systems.

Figure 10.1: The rapid evolution of drug-resistant SHIV. The viral load
of SHIV in the blood of a macaque (black line), the frequency of a drug
resistance mutation (red line). Data from Feder et al. (2017). Code
here. (Horizontal axis: Weeks since infection; left vertical axis:
Plasma numbers of SHIV (copies per ml); right vertical axis: Frequency
of drug resistant allele; the start of drug treatment is marked.)

The frequency of allele A1 in the next generation is given by

$$p_{t+1} = \frac{P_{t+1}}{P_{t+1} + Q_{t+1}} = \frac{W_1 P_t}{W_1 P_t + W_2 Q_t} = \frac{W_1 p_t}{W_1 p_t + W_2 q_t} = \frac{W_1}{\bar{W}_t} p_t. \tag{10.2}$$

Importantly, eqn. (10.2) tells us that the change in p only depends
on a ratio of fitnesses. Therefore, we need to specify fitness only up
to an arbitrary constant. As long as we multiply all fitnesses by the
same value, that constant will cancel out and eqn. (10.2) will hold.
Based on this argument, it is very common to scale absolute fitnesses
by the absolute fitness of one of the genotypes, e.g. the most or the
least fit genotype, to obtain relative fitnesses. Here, we will use wi for
the relative fitness of genotype i. If we choose to scale by the absolute
fitness of genotype A1 , we obtain the relative fitnesses w1 = W1 /W1 =
1 and w2 = W2 /W1 .
Without loss of generality, we can therefore rewrite eqn. (10.2) as

$$p_{t+1} = \frac{w_1}{\bar{w}} p_t, \tag{10.3}$$

dropping the subscript t for the dependence of the mean fitness on
time in our notation, but remembering it. The change in frequency
from one generation to the next is then given by
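
$$\Delta p_t = p_{t+1} - p_t = \frac{w_1 p_t}{\bar{w}} - p_t = \frac{w_1 p_t - \bar{w} p_t}{\bar{w}} = \frac{w_1 p_t - (w_1 p_t + w_2 q_t) p_t}{\bar{w}} = \frac{w_1 - w_2}{\bar{w}} p_t q_t, \tag{10.4}$$
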
recalling that qt = 1 − pt .
Assuming that the fitnesses of the two alleles are constant over
time, the numbers of the two allelic types τ generations after time 0 are
Pτ = (W1)^τ P0 and Qτ = (W2)^τ Q0, respectively. Therefore, the relative
frequency of allele A1 after τ generations is

$$p_\tau = \frac{(W_1)^\tau P_0}{(W_1)^\tau P_0 + (W_2)^\tau Q_0} = \frac{(w_1)^\tau P_0}{(w_1)^\tau P_0 + (w_2)^\tau Q_0} = \frac{p_0}{p_0 + (w_2/w_1)^\tau q_0}, \tag{10.5}$$

where the last step includes dividing the whole term by (w1)^τ and
switching from absolute numbers to relative allele frequencies. Rearrange this
to obtain

$$\frac{p_\tau}{q_\tau} = \frac{p_0}{q_0} \left(\frac{w_1}{w_2}\right)^\tau. \tag{10.6}$$

Solving this for τ yields

$$\tau = \log\left(\frac{p_\tau q_0}{q_\tau p_0}\right) / \log\left(\frac{w_1}{w_2}\right). \tag{10.7}$$

In practice, it is often helpful to parametrize the relative fitnesses

wi in a specific way. For example, we may set w1 = 1 and w2 = 1 − s,
where s is called the selection coefficient. Using this parametrization,
s is simply the difference in relative fitnesses between the two alleles.
Equation (10.5) becomes

$$p_{t+\tau} = \frac{p_t}{p_t + q_t (1 - s)^\tau}, \tag{10.8}$$

as w2/w1 = 1 − s. Then, if s ≪ 1, we can approximate (1 − s)^τ in the
denominator by exp(−sτ) to obtain

$$p_{t+\tau} \approx \frac{p_t}{p_t + q_t e^{-s\tau}}. \tag{10.9}$$

This equation takes the form of a logistic function. That is because we

are looking at the relative frequencies of two ‘populations’ (of alleles
A1 and A2 ) that are growing (or declining) exponentially, under the
constraint that p and q always sum to 1.
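To get a feel for how good the logistic approximation is, we can compare eqn (10.8) with eqn (10.9) numerically. This is a sketch with hypothetical values (p0 = 0.01, s = 0.01) and our own function names.

```python
import math

def freq_exact(p0, s, tau):
    """Eqn (10.8): frequency after tau generations with w1 = 1, w2 = 1 - s."""
    return p0 / (p0 + (1 - p0) * (1 - s) ** tau)

def freq_logistic(p0, s, tau):
    """Eqn (10.9): logistic approximation, appropriate when s is small."""
    return p0 / (p0 + (1 - p0) * math.exp(-s * tau))

p0, s = 0.01, 0.01  # hypothetical starting frequency and selection coefficient
for tau in (0, 200, 500, 1000):
    exact = freq_exact(p0, s, tau)
    approx = freq_logistic(p0, s, tau)
    print(f"tau = {tau:4d}: exact = {exact:.4f}, logistic = {approx:.4f}")
```

For larger s the two curves separate, because exp(−sτ) is no longer a close stand-in for (1 − s)^τ.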
Moreover, eqn. (10.7) for the number of generations τ it takes for a
certain change in frequency to occur becomes

$$\tau = -\log\left(\frac{p_\tau q_0}{q_\tau p_0}\right) / \log(1 - s). \tag{10.10}$$

Assuming again that s ≪ 1, this simplifies to

$$\tau \approx \frac{1}{s} \log\left(\frac{p_\tau q_0}{q_\tau p_0}\right). \tag{10.11}$$

One particular case of interest is the time it takes to go from a
single copy of the allele to near fixation in a population of size N. In
this case, we have p0 = 1/N, and we may set pτ = 1 − 1/N, which
is very close to fixation. Then, plugging these values into eqn. (10.11),
we obtain

$$\begin{aligned}
\tau &= \frac{1}{s} \log\left(\frac{1 - 2/N + 1/N^2}{1/N^2}\right) \\
&\approx \frac{1}{s} \left(\log(N) + \log(N - 2)\right) \\
&\approx \frac{2}{s} \log(N)
\end{aligned} \tag{10.12}$$

where we make the approximations N² − 2N + 1 ≈ N² − 2N and later
N − 2 ≈ N.
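A short numerical check of eqn (10.12) against the exact expression in eqn (10.10) is sketched below. The population size and selection coefficient are hypothetical and the function names are our own.

```python
import math

def generations_exact(N, s):
    """Eqn (10.10) with p0 = 1/N and p_tau = 1 - 1/N."""
    odds_ratio = (N - 1) ** 2  # (p_tau * q_0) / (q_tau * p_0)
    return -math.log(odds_ratio) / math.log(1 - s)

def generations_approx(N, s):
    """Eqn (10.12): tau is roughly (2/s) * log(N)."""
    return 2 * math.log(N) / s

N, s = 10_000, 0.01  # hypothetical values
print(generations_exact(N, s), generations_approx(N, s))
```

Because the time depends only on log(N), even very large populations are swept in a modest multiple of 1/s generations.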
Question 1. In our example of the evolution of drug resistance,
the drug-resistant SHIV virus spread from undetectable frequencies to
∼ 65% frequency by 16 weeks post infection. An estimated effective
population size of SHIV is 1.5 × 10^5, and its generation time is ∼ 1
day. Assuming that the mutation arose as a single copy allele very
shortly after the start of drug treatment at 12 weeks, what is the selection
coefficient favouring the drug resistance allele?
Haploid model with fluctuating selection Selection pressures may
change while a polymorphism persists in the population due to
environmental changes. We can use our haploid model to consider this
case where the fitnesses depend on time (Dempster, 1955), and say
that w1,t and w2,t are the fitnesses of the two types in generation t.
The frequency of allele A1 in generation t + 1 is

$$p_{t+1} = \frac{w_{1,t}}{\bar{w}_t} p_t, \tag{10.13}$$

which simply follows from eqn. (10.3). The ratio of the frequency of
allele A1 to that of allele A2 in generation t + 1 is

$$\frac{p_{t+1}}{q_{t+1}} = \frac{w_{1,t}}{w_{2,t}} \frac{p_t}{q_t}. \tag{10.14}$$

Therefore, if we think of the two alleles starting in generation 1 at
frequencies p1 and q1, then τ generations later,

$$\frac{p_{\tau+1}}{q_{\tau+1}} = \left(\prod_{i=1}^{\tau} \frac{w_{1,i}}{w_{2,i}}\right) \frac{p_1}{q_1}. \tag{10.15}$$

The question of which allele is increasing or decreasing in frequency
comes down to whether the product of the fitness ratios w1,i/w2,i over
the τ generations is > 1 or < 1. As it is a little
hard to think about this ratio, we can instead take the τth root of it
and consider

$$\sqrt[\tau]{\prod_{i=1}^{\tau} \frac{w_{1,i}}{w_{2,i}}} = \frac{\sqrt[\tau]{\prod_{i=1}^{\tau} w_{1,i}}}{\sqrt[\tau]{\prod_{i=1}^{\tau} w_{2,i}}}. \tag{10.16}$$

The term

$$\sqrt[\tau]{\prod_{i=1}^{\tau} w_{1,i}} \tag{10.17}$$

is the geometric mean fitness of allele A1 over these τ generations.
Therefore, allele A1 will only increase in frequency
if it has a higher geometric mean fitness than allele A2 (at least in our
simple deterministic model). This implies that an allele with higher
geometric mean fitness can invade and spread to fixation even if its
(arithmetic) mean fitness is lower than that of the dominant type. To see this
consider two alleles that experience the fitnesses given in Table 10.1.
The allele A1 does much better in dry years, but suffers in wet years;
while the A2 allele is a generalist and is not affected by the variable
environment. If there is an equal chance of a year being wet or dry, the A1
allele has higher (arithmetic) mean fitness, but it will be replaced by
the A2 allele as the A2 allele has higher geometric mean fitness (see
Figure 10.2).

                   A1     A2
Dry                2      1.57
Wet                1.16   1.57
Arithmetic Mean    1.58   1.57
Geometric Mean     1.52   1.57

Table 10.1: Fitnesses of two alleles in wet and dry years. Means
calculated assuming equal chances of wet and dry years. The geometric
mean is calculated as √(w_wet w_dry). Example numbers taken from Seger
and Brockmann (1987).
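The means in Table 10.1, and a trajectory in the spirit of Figure 10.2, can be reproduced with a few lines of Python. This is a sketch and not the code linked from the figure; the starting frequency, the seed, and the function names are arbitrary choices of ours.

```python
import math
import random

# Fitnesses from Table 10.1.
fitness = {"A1": {"dry": 2.0, "wet": 1.16}, "A2": {"dry": 1.57, "wet": 1.57}}

def arithmetic_mean(w):
    """Mean fitness when wet and dry years are equally likely."""
    return (w["dry"] + w["wet"]) / 2

def geometric_mean(w):
    """Geometric mean fitness when wet and dry years are equally likely."""
    return math.sqrt(w["dry"] * w["wet"])

def simulate(p0, generations, seed=0):
    """Iterate eqn (10.13), drawing a wet or dry year with equal probability."""
    rng = random.Random(seed)
    p, trajectory = p0, [p0]
    for _ in range(generations):
        env = rng.choice(["dry", "wet"])
        w1, w2 = fitness["A1"][env], fitness["A2"][env]
        p = w1 * p / (w1 * p + w2 * (1 - p))
        trajectory.append(p)
    return trajectory

for allele, w in fitness.items():
    print(allele, round(arithmetic_mean(w), 2), round(geometric_mean(w), 2))

trajectory = simulate(p0=0.25, generations=100)
print("frequency of A1 after 100 generations:", trajectory[-1])
```

Any single run is noisy, so it is worth trying several seeds; the long-run decline of A1 reflects its lower geometric mean fitness.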
Figure 10.2: An example frequency trajectory of the A1 allele under
variable environments (using the fitnesses from Table 10.1). Wet years
(generations) are shown in red, dry years in white. The environment
flips at random each year. Note how the A1 allele increases in
frequency in the dry years as it has higher fitness, and yet the A2
allele still wins out. Code here. (Horizontal axis: Generations;
vertical axis: Frequency.)

Evolution of bet hedging Don’t put all your eggs in one basket; it makes
a lot of sense to spread your bets. Financial advisors often advise you
to diversify your portfolio, rather than placing all your investments
in one stock. Even if that stock looks very strong, you can come a
cropper the 1 time in 20 that some particular part of the market crashes.
Likewise, evolution can result in risk-averse strategies. Some species of
bird lay multiple nests of eggs; some plants don’t put all of their
energy into seeds that will germinate next year. It can make sense
to hedge your bets even if that comes at an average cost (Seger and
Brockmann, 1987).
To see this, let’s think more about geometric mean fitness. We can write the relative fitness of an allele in a given generation $i$ as $w_i = 1 + s_i$, such that we can write the geometric mean fitness as
$$\bar{g} = \sqrt[\tau]{\prod_{i=1}^{\tau} (1 + s_i)}. \qquad (10.18)$$
When we think about products it’s often natural to take the log to turn them into a sum:
$$\log(\bar{g}) = \frac{1}{\tau} \sum_{i=1}^{\tau} \log(1 + s_i) = E\left[\log(1 + s_i)\right], \qquad (10.19)$$
equating the mean and the expectation. Assuming that $s_i$ is small, $\log(1 + s_i) \approx s_i - s_i^2/2$, ignoring terms of order $s_i^3$ and higher,¹ and so
$$\log(\bar{g}) \approx E\left[s_i - s_i^2/2\right] \approx E[s_i] - var(s_i)/2, \qquad (10.20)$$
where the last step uses $E[s_i^2] = var(s_i) + E[s_i]^2$ and drops the $E[s_i]^2/2$ term, which is negligible when the mean selection coefficient is small.

¹ Here we’re using a 2nd order Taylor approximation, see math appendix eqn (A.7).

So genotypes with high arithmetic mean fitness can be selected against, i.e. have low geometric mean fitness, if their fitness has too high a variance across generations (Gillespie, 1973, 1977); see our example above (Table 10.1 and Figure 10.2).
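The approximation in eqn. (10.20) is easy to check numerically. The following is a minimal sketch with hypothetical fitnesses (two equally likely environments with $w = 1.20$ and $w = 0.82$, which are not the values of Table 10.1); the function names are invented for this illustration.

```python
import math


def arithmetic_mean(ws):
    """Arithmetic mean of a list of per-generation fitnesses."""
    return sum(ws) / len(ws)


def geometric_mean(ws):
    """Geometric mean, computed as exp of the mean log fitness (eqn. 10.19)."""
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


def log_geometric_mean_approx(ws):
    """Approximation E[s] - var(s)/2 to the log geometric mean (eqn. 10.20)."""
    s = [w - 1.0 for w in ws]
    mean_s = sum(s) / len(s)
    var_s = sum((x - mean_s) ** 2 for x in s) / len(s)
    return mean_s - var_s / 2.0


# Hypothetical fitnesses in two equally likely environments.
fitnesses = [1.20, 0.82]
exact = math.log(geometric_mean(fitnesses))
approx = log_geometric_mean_approx(fitnesses)
print(arithmetic_mean(fitnesses), geometric_mean(fitnesses), exact, approx)
```

Here the arithmetic mean fitness is above 1 while the geometric mean is below 1, and the exact and approximate log geometric means agree closely because the $s_i$ are fairly small.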
A classic example of bet-hedging is in delayed seed germination in plants (Cohen, 1966). In variable environments, such as deserts, it may make sense to spread your bets over years by having only a proportion of your seeds germinate in the first year. However, delaying germination can come at a cost due to seed mortality. Gremer and Venable (2014), using data from a long-term study of various species of Sonoran Desert winter annual plants, showed that the plants were indeed pursuing adaptive bet-hedging strategies. The plant species with the highest among-year variation in yield had the lowest germination fraction per year. Further, Gremer and Venable showed, through modeling life histories, that by having per-year germination proportions < 1 all of the species were achieving higher geometric fitness at the expense of arithmetic fitness in the variable desert environment. See Figure 10.4 for an example of bet hedging in Woolly plantain.

Figure 10.3: Woolly plantain (Plantago patagonica). One of the desert annuals shown to have a bet-hedging germination strategy by Gremer and Venable (2014).
An illustrated flora of the northern United States, Canada and the British possessions, from Newfoundland to the parallel of the southern boundary of Virginia, and from the Atlantic Ocean westward to the 102d meridian (1913) Britton, N.L. Image from the Cornell University Library. Not in copyright.
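The logic of germination bet-hedging can be sketched with a toy model in the spirit of Cohen (1966). This is an illustration only and not the model of Gremer and Venable (2014): it assumes that a fraction `g` of seeds germinates each year, that a germinating seed produces `seed_yield` seeds (which depends on whether the year is good or bad), and that dormant seeds survive with probability `SEED_SURVIVAL`. All parameter values and names are hypothetical.

```python
# Hypothetical parameters for a toy germination model.
P_GOOD = 0.5         # probability that a year is good
YIELD_GOOD = 4.0     # seeds produced per germinating seed in a good year
YIELD_BAD = 0.0      # seeds produced per germinating seed in a bad year
SEED_SURVIVAL = 0.8  # yearly survival probability of a dormant seed


def yearly_fitness(g, seed_yield):
    """Seeds next year per seed this year, for germination fraction g."""
    return g * seed_yield + (1.0 - g) * SEED_SURVIVAL


def arithmetic_fitness(g):
    """Arithmetic mean fitness over good and bad years."""
    return (P_GOOD * yearly_fitness(g, YIELD_GOOD)
            + (1.0 - P_GOOD) * yearly_fitness(g, YIELD_BAD))


def geometric_fitness(g):
    """Geometric mean fitness over good and bad years."""
    return (yearly_fitness(g, YIELD_GOOD) ** P_GOOD
            * yearly_fitness(g, YIELD_BAD) ** (1.0 - P_GOOD))


# Search a grid of germination fractions for the highest geometric mean fitness.
grid = [i / 200 for i in range(201)]
best_g = max(grid, key=geometric_fitness)
print("germination fraction maximizing geometric mean fitness:", best_g)
```

In this toy model arithmetic fitness keeps increasing with the germination fraction, while geometric mean fitness peaks at an intermediate fraction, because germinating every seed risks losing the whole lineage in a bad year.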
Figure 10.4: Plantago patagonica’s arithmetic fitness is an increasing function of the proportion of seeds germinating, due to seeds not surviving a germination delay. However, the standard deviation of fitness also increases with this proportion as they are more likely to have all of their seeds germinate in a bad year. Thus Plantago patagonica can achieve higher geometric fitness by only having a proportion of their seeds germinate. Thanks to Jenny Gremer for sharing these data from Gremer and Venable (2014). Code here.
[Panels: arithmetic fitness (mean and std. dev.) and geometric mean fitness, each plotted against the proportion germinating (per year).]
Delayed reproduction is also a common example of bet-hedging

in micro-organisms. For example, the Chicken Pox virus, varicella
zoster virus, has a very long latent phase. After it causes chicken
pox it enters a latent phase, residing inactive in neurons in the spinal
cord, only to emerge 5-40 years later to cause the disease shingles. It
is hypothesized that the virus actively suppresses itself as a strategy
to allow it to emerge at a later time point as insurance against there
being no further susceptible hosts at the time of its first infection
(Stumpf et al., 2002).
Figure 10.5: Frequency of the Lactase persistence allele in ancient and modern samples from Central Europe. Data compiled by Marciniak and Perry (2017) from various sources. Thanks to Stephanie Marciniak for sharing these data. Code here.
[Plot: allele frequency (y-axis) against years in the past (x-axis, −10000 to 0).]
10.0.2 Diploid model

We will now move on to a diploid model of a single locus with two segregating alleles. As an example of the change in the frequency of an allele driven by selection, let’s consider the evolution of Lactase persistence. A number of different human populations that historically have raised cattle have convergently evolved to maintain the expression of the protein Lactase into adulthood (in most mammals the protein is switched off after childhood), with different lactase-persistence mutations having arisen and spread in different pastoral human populations. This continued expression of Lactase allows adults to break down Lactose, the main carbohydrate in milk, and so benefit nutritionally from milk-drinking. This seems to have offered a strong fitness benefit to individuals in pastoral populations.

Figure 10.6: Aurochs (Bos primigenius). Aurochs are an extinct species of large wild cattle from which cows were domesticated.
Dictionnaire des sciences naturelles. 1816 Cuvier, F.G. Image from the Internet Archive. Contributed by NCSU Libraries. No known copyright restrictions.
With the advent of techniques to sequence ancient human DNA, researchers can now potentially track the frequency of selected mutations over thousands of years. The frequency of a Lactase persistence allele in ancient Central European populations is shown in Figure 10.5. The allele is absent more than 5,000 years ago, but is now found at a frequency of upward of 70% in many European populations.
We will assume that the difference in fitness between the three genotypes comes from differences in viability, i.e. differential survival of individuals from the formation of zygotes to reproduction. We denote the absolute fitnesses of genotypes $A_1 A_1$, $A_1 A_2$, and $A_2 A_2$ by $W_{11}$, $W_{12}$, and $W_{22}$. Specifically, $W_{ij}$ is the probability that a zygote of genotype $A_i A_j$ survives to reproduction. Assuming that individuals mate at random, the numbers of zygotes of the three genotypes that form generation $t$ are
$$N p_t^2, \quad N 2 p_t q_t, \quad N q_t^2. \qquad (10.21)$$
The mean fitness of the population of zygotes is then
$$\overline{W}_t = W_{11} p_t^2 + W_{12} 2 p_t q_t + W_{22} q_t^2. \qquad (10.22)$$
Again, this is simply the weighted mean of the genotypic fitnesses.

How many zygotes of each of the three genotypes survive to reproduce? An individual of genotype $A_1 A_1$ has a probability of $W_{11}$ of surviving to reproduce, and similarly for other genotypes. Therefore, the expected number of $A_1 A_1$, $A_1 A_2$, and $A_2 A_2$ individuals who survive to reproduce is
$$N W_{11} p_t^2, \quad N W_{12} 2 p_t q_t, \quad N W_{22} q_t^2. \qquad (10.23)$$
It then follows that the total number of individuals who survive to reproduce is
$$N \left( W_{11} p_t^2 + W_{12} 2 p_t q_t + W_{22} q_t^2 \right). \qquad (10.24)$$
This is simply the mean fitness of the population multiplied by the population size (i.e. $N \overline{W}$).
The relative frequency of $A_1 A_1$ individuals at reproduction is simply the number of $A_1 A_1$ genotype individuals at reproduction ($N W_{11} p_t^2$) divided by the total number of individuals who survive to reproduce ($N \overline{W}$), and likewise for the other two genotypes. Therefore, the relative frequency of individuals with the three different genotypes at reproduction is
$$\frac{N W_{11} p_t^2}{N \overline{W}}, \quad \frac{N W_{12} 2 p_t q_t}{N \overline{W}}, \quad \frac{N W_{22} q_t^2}{N \overline{W}} \qquad (10.25)$$
(see Table 10.2).

                                 $A_1 A_1$                              $A_1 A_2$                                  $A_2 A_2$
Absolute no. at birth            $N p_t^2$                              $N 2 p_t q_t$                              $N q_t^2$
Fitnesses                        $W_{11}$                               $W_{12}$                                   $W_{22}$
Absolute no. at reproduction     $N W_{11} p_t^2$                       $N W_{12} 2 p_t q_t$                       $N W_{22} q_t^2$
Relative freq. at reproduction   $\frac{W_{11}}{\overline{W}} p_t^2$    $\frac{W_{12}}{\overline{W}} 2 p_t q_t$    $\frac{W_{22}}{\overline{W}} q_t^2$

Table 10.2: Relative genotype frequencies after one episode of viability selection.

As there is no difference in the fecundity of the three genotypes, the allele frequencies in the zygotes forming the next generation are simply the allele frequency among the reproducing individuals of the previous generation. Hence, the frequency of $A_1$ in generation $t + 1$ is
$$p_{t+1} = \frac{W_{11} p_t^2 + W_{12} p_t q_t}{\overline{W}}. \qquad (10.26)$$
Note that, again, the absolute value of the fitnesses is irrelevant to the
frequency of the allele. Therefore, we can just as easily replace the
absolute fitnesses with the relative fitnesses. That is, we may replace
Wij by wij = Wij /W11 , for instance.
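The bookkeeping of eqns. (10.21) to (10.26) can be condensed into a short function. The sketch below uses hypothetical viabilities and a made-up function name; it also checks the point just made, that dividing all fitnesses by $W_{11}$ leaves the allele frequency in the next generation unchanged.

```python
def next_generation(p, W11, W12, W22):
    """One generation of viability selection at a biallelic diploid locus.

    Returns the frequency of A1 in the next generation (eqn. 10.26)
    and the mean fitness of the zygotes (eqn. 10.22).
    """
    q = 1.0 - p
    mean_W = W11 * p**2 + W12 * 2 * p * q + W22 * q**2
    p_next = (W11 * p**2 + W12 * p * q) / mean_W
    return p_next, mean_W


# Hypothetical absolute fitnesses (survival probabilities) and starting frequency.
W11, W12, W22 = 0.9, 0.6, 0.3
p = 0.2

p_abs, mean_W = next_generation(p, W11, W12, W22)
# The same calculation using relative fitnesses w_ij = W_ij / W11.
p_rel, mean_w = next_generation(p, W11 / W11, W12 / W11, W22 / W11)
print(p_abs, p_rel, mean_W, mean_w)
```

Both calls give the same next-generation frequency; only the mean fitness is rescaled.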
Each of our genotype frequencies is responding to selection in a manner that depends just on its fitness compared to the mean fitness of the population. For example, the frequency of the $A_1 A_1$ homozygotes increases from birth to adulthood in proportion to $W_{11}/\overline{W}$. In fact, we can estimate this fitness ratio for each genotype by comparing its frequency at birth to its frequency among adults. As an example of this calculation, we’ll look at some data from sticklebacks.
Marine threespine stickleback (Gasterosteus aculeatus) independently colonized and adapted to many freshwater lakes as glaciers receded following the last ice age, making sticklebacks a wonderful system for studying the genetics of adaptation. In marine habitats, most of the stickleback have armour plates to protect them from predation, but freshwater populations repeatedly evolve the loss of armour plates due to selection on an allele at the Ectodysplasin gene (EDA). This allele is found as a standing variant at very low frequency in marine populations; Barrett et al. (2008) took advantage of this fact and collected and bred a population of marine individuals carrying both the low- (L) and completely-plated (C) alleles. They introduced the offspring of this cross into four freshwater ponds and monitored genotype frequencies² over their life courses:

                                            CC      LC      LL
Juveniles                                   0.55    0.23    0.22
Adults                                      0.21    0.53    0.26
Adults/Juv. ($W_\bullet/\overline{W}$)      0.4     2.3     1.2
rel. fitness ($W_\bullet/W_{12}$)           0.17    1.0     0.54

² The actual dynamics observed by Barrett et al. are more complicated as in the very young fish selection reverses direction.

Figure 10.7: Freshwater threespine Stickleback (G. aculeatus).
British fresh-water fishes. Houghton W 1879. Image from the Biodiversity Heritage Library. Contributed by Ernst Mayr Library, Harvard. Not in copyright.
The heterozygotes have increased in frequency dramatically in the population as their fitness is more than double the mean fitness of the population. We can also calculate the relative fitness of each genotype by dividing through by the fitness of the fittest genotype, the heterozygote in this case (doing this cancels out $\overline{W}$). The relative fitness of the CC homozygote is ∼ 1/6 that of the heterozygote. Note that this calculation does not rely on the genotype frequencies being at their Hardy–Weinberg expectations in the juveniles.
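The fitness ratios in the table can be recomputed from the juvenile and adult genotype frequencies. The sketch below uses the rounded frequencies printed in the table; the variable names are made up for this illustration. Because the inputs are rounded, the last digit can differ slightly from the table (for LL this gives roughly 0.51 rather than 0.54, presumably because the table was computed from unrounded frequencies).

```python
# Genotype frequencies as printed (rounded) in the table above.
juveniles = {"CC": 0.55, "LC": 0.23, "LL": 0.22}
adults = {"CC": 0.21, "LC": 0.53, "LL": 0.26}

# Adult/juvenile frequency ratio estimates W_genotype / mean W.
ratio = {g: adults[g] / juveniles[g] for g in juveniles}

# Dividing by the ratio of the fittest genotype cancels the mean fitness.
fittest = max(ratio, key=ratio.get)
relative = {g: ratio[g] / ratio[fittest] for g in ratio}

for g in ("CC", "LC", "LL"):
    print(g, round(ratio[g], 2), round(relative[g], 2))
```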
Question 2. A) What is the frequency of the low-plated EDA
allele (L) at the start of the stickleback experiment?
B) What is the frequency in the adults? C) Also calculate the
frequency in adults using the relative fitnesses.
The change in frequency from generation $t$ to $t + 1$ is
$$\Delta p_t = p_{t+1} - p_t = \frac{w_{11} p_t^2 + w_{12} p_t q_t}{\bar{w}} - p_t. \qquad (10.27)$$
To simplify this equation, we will first define two variables $\bar{w}_1$ and $\bar{w}_2$ as
$$\bar{w}_1 = w_{11} p_t + w_{12} q_t, \qquad (10.28)$$
$$\bar{w}_2 = w_{12} p_t + w_{22} q_t. \qquad (10.29)$$
These are called the marginal fitnesses of allele $A_1$ and $A_2$, respectively. They are so called as $\bar{w}_1$ is the average fitness of an allele $A_1$, i.e. the fitness of $A_1$ in a homozygote weighted by the probability it is in a homozygote ($p_t$) plus the fitness of $A_1$ in a heterozygote weighted by the probability it is in a heterozygote ($q_t$). We further note that the mean relative fitness can be expressed in terms of the marginal fitnesses as
$$\bar{w} = \bar{w}_1 p_t + \bar{w}_2 q_t, \qquad (10.30)$$
where, for notational simplicity, we have omitted subscript $t$ for the dependence of mean and marginal fitnesses on time.
We can then rewrite eqn. (10.27) using $\bar{w}_1$ and $\bar{w}_2$ as
$$\Delta p_t = \frac{(\bar{w}_1 - \bar{w}_2)}{\bar{w}} p_t q_t. \qquad (10.31)$$
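The algebra connecting eqn. (10.27) to eqn. (10.31) is short; one way to fill in the steps, using only the definitions in eqns. (10.28) to (10.30) and $q_t = 1 - p_t$, is:

```latex
\begin{align*}
\Delta p_t
  &= \frac{w_{11} p_t^2 + w_{12} p_t q_t}{\bar{w}} - p_t
   = \frac{p_t \left( w_{11} p_t + w_{12} q_t \right)}{\bar{w}} - p_t
   = \frac{p_t \left( \bar{w}_1 - \bar{w} \right)}{\bar{w}} \\
  &= \frac{p_t \left( \bar{w}_1 - \bar{w}_1 p_t - \bar{w}_2 q_t \right)}{\bar{w}}
   = \frac{p_t \left( \bar{w}_1 q_t - \bar{w}_2 q_t \right)}{\bar{w}}
   = \frac{\left( \bar{w}_1 - \bar{w}_2 \right)}{\bar{w}} \, p_t q_t .
\end{align*}
```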
The sign of $\Delta p_t$, i.e. whether allele $A_1$ increases or decreases in frequency, depends only on the sign of $(\bar{w}_1 - \bar{w}_2)$. The frequency of $A_1$ will keep increasing over the generations so long as its marginal fitness is higher than that of $A_2$, i.e. $\bar{w}_1 > \bar{w}_2$, while if $\bar{w}_1 < \bar{w}_2$, the frequency of $A_1$ will decrease. Note the similarity between eqn. (10.31) and the respective expression for the haploid model in eqn. (10.4). (We will return to the special case where $\bar{w}_1 = \bar{w}_2$ shortly.)
We can also rewrite (10.27) as
$$\Delta p_t = \frac{1}{2} \frac{p_t q_t}{\bar{w}} \frac{d\bar{w}}{dp}. \qquad (10.32)$$
To see this we can write
$$\frac{d\bar{w}}{dp} = \frac{d}{dp} \left( w_{11} p^2 + 2 w_{12} p - 2 w_{12} p^2 + w_{22} - 2 w_{22} p + w_{22} p^2 \right) = 2 \left( w_{11} p + w_{12} - 2 p w_{12} - w_{22} + w_{22} p \right).$$
On expansion of $\bar{w}_1 - \bar{w}_2$, we see that it matches the terms in the parentheses in the expression above. Thus, we see that we can replace $\bar{w}_1 - \bar{w}_2$ with $\frac{1}{2} \frac{d\bar{w}}{dp}$.

This form shows that the frequency of $A_1$ will increase ($\Delta p_t > 0$) if the mean fitness is an increasing function of the frequency of $A_1$ (i.e. if $\frac{d\bar{w}}{dp} > 0$). On the other hand, the frequency of $A_1$ will decrease ($\Delta p_t < 0$) if the mean fitness is a decreasing function of the frequency of $A_1$ (i.e. if $\frac{d\bar{w}}{dp} < 0$). Thus, although selection acts on individuals, under this simple model, selection is acting to increase the mean fitness of the population. The rate of this increase is proportional to the variance in allele frequencies within the population ($p_t q_t$). This formulation suggested to Wright (1932) the view of natural selection as moving populations up local fitness peaks, as we encountered in Section 8.1.2 in discussing phenotypic fitness peaks. Again this view of selection as maximizing mean fitness only holds true if the genotypic fitnesses are frequency independent; later in this chapter we’ll discuss some important cases where that doesn’t hold.
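As a sanity check, the three ways of writing $\Delta p_t$ in eqns. (10.27), (10.31) and (10.32) can be compared numerically. This is a minimal sketch with hypothetical relative fitnesses and made-up function names; the derivative of mean fitness is approximated by a central difference, which is essentially exact here because mean fitness is quadratic in $p$.

```python
def mean_fitness(p, w11, w12, w22):
    """Mean relative fitness of the population at allele frequency p."""
    q = 1.0 - p
    return w11 * p * p + w12 * 2 * p * q + w22 * q * q


def delta_p_direct(p, w11, w12, w22):
    """Change in allele frequency from eqn. (10.27)."""
    q = 1.0 - p
    return (w11 * p * p + w12 * p * q) / mean_fitness(p, w11, w12, w22) - p


def delta_p_marginal(p, w11, w12, w22):
    """Change in allele frequency from the marginal fitnesses, eqn. (10.31)."""
    q = 1.0 - p
    w1 = w11 * p + w12 * q
    w2 = w12 * p + w22 * q
    return (w1 - w2) / mean_fitness(p, w11, w12, w22) * p * q


def delta_p_gradient(p, w11, w12, w22, eps=1e-6):
    """Change in allele frequency from the slope of mean fitness, eqn. (10.32)."""
    q = 1.0 - p
    slope = (mean_fitness(p + eps, w11, w12, w22)
             - mean_fitness(p - eps, w11, w12, w22)) / (2 * eps)
    return 0.5 * p * q / mean_fitness(p, w11, w12, w22) * slope


# Hypothetical relative fitnesses and allele frequency.
args = (0.3, 1.0, 0.9, 0.7)
print(delta_p_direct(*args), delta_p_marginal(*args), delta_p_gradient(*args))
```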
Question 3. For many generations you have been studying an
annual wildflower that has two color morphs, orange and white. You
have discovered that a single bi-allelic locus controls flower color, with
the white allele being recessive. The pollinator of these plants is an
almost blind bat, so individuals are pollinated at random with respect
to flower color. Your population census of 200 individuals showed that
the population consisted of 168 orange-flowered individuals, and 32
white-flowered individuals.
Heavy February rainfall creates optimal growing conditions for
an exotic herbivorous beetle with a preference for orange-flowered
individuals. This year it arrives at your study site with a ravenous
appetite. Only 50% of orange-flowered individuals survive its wrath,
while 90% of white-flowered individuals survive until the end of the
growing season.
A) What is the initial frequency of the white allele, and what do
you have to assume to obtain this?
B) What is the frequency of the white allele in the seeds forming
the next generation?
10.0.3 Diploid directional selection
So far, our treatment of the diploid model of selection has been in
terms of generic fitnesses wij . In the following, we will use particular
parameterizations to gain insight about two specific modes of selec-
tion: directional selection and heterozygote advantage.
Directional selection means that one of the two alleles always has
higher marginal fitness than the other one. Let us assume that A1 is
the fitter allele, so that w11 ≥ w12 ≥ w22 , and hence w1 > w2 . As
we are interested in changes in allele frequencies, we may use relative
fitnesses. We parameterize the reduction in relative fitness in terms
of a selection coefficient, similar to the one we met in the haploid
selection section, as follows:
genotype                       $A_1 A_1$                     $A_1 A_2$                     $A_2 A_2$
absolute fitness               $W_{11}$             ≥        $W_{12}$             ≥        $W_{22}$
relative fitness (generic)     $w_{11} = W_{11}/W_{11}$      $w_{12} = W_{12}/W_{11}$      $w_{22} = W_{22}/W_{11}$
relative fitness (specific)    $1$                           $1 - sh$                      $1 - s$

Here, the selection coefficient $s$ is the difference in relative fitness between the two homozygotes, and $h$ is the dominance coefficient. For selection to be directional, we require that $0 \le h \le 1$ holds. The dominance coefficient allows us to move between two extremes. One is when $h = 0$, such that allele $A_1$ is fully dominant and $A_2$ fully recessive. In this case, the heterozygote $A_1 A_2$ is as fit as the $A_1 A_1$ homozygote genotype. The inverse holds when $h = 1$, such that allele $A_1$ is fully recessive and $A_2$ fully dominant.
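To get a feel for how the dominance coefficient shapes the response to selection, the parameterization above can be plugged into the general recursion of eqn. (10.26). The sketch below iterates that recursion for three values of $h$, using the starting frequency and selection coefficient quoted in the caption of Figure 10.8; the function names are invented for this illustration and this is not the code linked from the figure.

```python
def next_freq(p, s, h):
    """One generation of selection with fitnesses 1, 1 - s*h, 1 - s (eqn. 10.26)."""
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0 - s * h, 1.0 - s
    mean_w = w11 * p * p + w12 * 2 * p * q + w22 * q * q
    return (w11 * p * p + w12 * p * q) / mean_w


def trajectory(p0, s, h, generations):
    """Frequency of A1 in each generation, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], s, h))
    return freqs


p0, s = 0.01, 0.01
dominant = trajectory(p0, s, h=0.0, generations=500)   # A1 fully dominant
additive = trajectory(p0, s, h=0.5, generations=500)   # no dominance
recessive = trajectory(p0, s, h=1.0, generations=500)  # A1 fully recessive
print(dominant[-1], additive[-1], recessive[-1])
```

Over these first 500 generations the dominant beneficial allele has risen furthest and the recessive one has barely moved.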
We can then rewrite eqn. (10.31) as
$$\Delta p_t = \frac{p_t h s + q_t s (1 - h)}{\bar{w}} p_t q_t, \qquad (10.33)$$
where
$$\bar{w} = 1 - 2 p_t q_t s h - q_t^2 s. \qquad (10.34)$$

Figure 10.8: The trajectory of the frequency of allele $A_1$, starting from $p_0 = 0.01$, for a selection coefficient $s = 0.01$ and three different dominance coefficients. The recessive beneficial allele ($h = 1$) will eventually fix in the population, but it takes a long time. Code here.

Question 4. Throughout the Californian foothills are old copper and gold-mines, which have dumped out soils that are polluted with heavy metals. While these toxic mine tailings are often depauperate of plants, Mimulus guttatus and a number of other plant species have managed to adapt to these harsh soils. Wright et al. (2015) have mapped one of the major loci contributing to the adaptation to soils at two mines near Copperopolis, CA. Wright et al. planted homozygote seedlings out in the mine tailings and found that only 10% of the homozygotes for the non-copper-tolerant allele survived to flower, while 40% of the copper-tolerant seedlings survived to flower.
A) What is the selection coefficient acting against the non-copper-tolerant allele on the mine tailings?
Figure 10.9: Keystone Copper Mine

1866, Copperopolis, Calaveras County.
Image from picryl. Source Library of
Congress, Public Domain.
B) The copper-tolerant allele is fairly dominant in its action on

fitness. If we assume that h = 0.1, what percentage of heterozygotes
should survive to flower?
Question 5. Comparing the red (h = 0) and black (h = 0.5) tra-

jectories in Figure 10.8, provide an explanation for why A1 increases
faster initially if h = 0, but then approaches fixation more slowly
compared to the case of h = 0.5.
Figure 10.10: The frequency of red, cross, and silver fox morphs over the decades in Eastern Canada. These data are well described by recessive selection acting against the silver fox morph. Data from Elton (1942), compiled by Allendorf and Hard (2009). Code here.
[Plot: frequency (%) of the red (rr), cross (Rr), and silver (RR) morphs (y-axis) against year (x-axis, 1840 to 1920).]
To see how dominance affects the trajectory of a real polymorphism, we’ll consider an example from a colour polymorphism in red foxes (Vulpes vulpes).
There are three colour morphs of red foxes: silver, cross, and red (see Figure 10.11), with this difference primarily controlled by a single polymorphism with genotypes RR, Rr, and rr respectively. The fur pelts of the silver morph fetched three times the price for hunters compared to cross (a smoky red) and red pelts, the latter two being seen as roughly equivalent in worth. Thus the desirability of the pelts acts as a recessive trait, with much stronger selection against the silver homozygotes. As a result of this price difference, silver foxes were hunted more intensely and declined as a proportion of the population in Eastern Canada from 16% to 5% between 1834 and 1937, as documented by Elton (see Figure 10.10). Haldane reanalyzed these data and showed that they were consistent with recessive selection acting against the silver morph alone. Note how the heterozygotes (cross) decline somewhat as a result of selection on the silver homozygotes, but overall the R allele is slow to respond to selection as it is ‘hidden’ from selection in the heterozygote state.

Figure 10.11: Three colour morphs in red fox V. vulpes: cross, red, and silver foxes from left to right.
The larger North American mammals. Nelson, E.W., Fuertes, L.A. 1916. Image from the Cornell University Library. No known copyright restrictions.
Directional selection on an additive allele. A special case is when $h = 0.5$. This is the case of no dominance, as the interaction among alleles with respect to fitness is strictly additive. Then, eqn. (10.33) simplifies to
$$\Delta p_t = \frac{1}{2} \frac{s}{\bar{w}} p_t q_t. \qquad (10.35)$$
If selection is very weak, i.e. $s \ll 1$, the denominator ($\bar{w}$) is close to 1 and we have
$$\Delta p_t \approx \frac{1}{2} s p_t q_t. \qquad (10.36)$$
It is instructive to compare eqn. (10.36) to the respective expression
under the haploid model. To this purpose, start from the generic term
for ∆pt under the haploid model in eqn. (10.4) and set w1 = 1 and
w2 = 1 − s. Again, assume that s is small, so that eqn. (10.4) becomes
∆pt = spt qt . Hence, if s is small, the diploid model of directional
selection without dominance is identical to the haploid model, up to a
factor of 1/2. That factor is due to the choice of the parametrisation;
we could have set w11 = 1, w12 = 1 − s, and w22 = 1 − 2s in our diploid
model instead, in which case the agreement with the haploid model
would be perfect.
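Both the simplification for $h = 1/2$ and the remark about the alternative parametrisation can be checked by direct substitution into eqns. (10.31), (10.33) and (10.34). A short worked sketch (the intermediate steps are filled in here and are not part of the original derivation):

```latex
\begin{align*}
% Setting h = 1/2 in the numerator of eqn (10.33) and in eqn (10.34):
p_t h s + q_t s (1 - h) \,\Big|_{h = 1/2} &= \tfrac{s}{2} (p_t + q_t) = \tfrac{s}{2}, \\
\bar{w} \,\Big|_{h = 1/2} &= 1 - p_t q_t s - q_t^2 s = 1 - s q_t (p_t + q_t) = 1 - s q_t, \\
\Delta p_t &= \frac{1}{2} \frac{s}{1 - s q_t} \, p_t q_t \approx \frac{1}{2} s \, p_t q_t
  \quad (s \ll 1). \\[1ex]
% Alternative parametrisation w_{11} = 1, w_{12} = 1 - s, w_{22} = 1 - 2s:
\bar{w}_1 - \bar{w}_2 &= \left( p_t + q_t (1 - s) \right) - \left( p_t (1 - s) + q_t (1 - 2s) \right)
  = s (p_t + q_t) = s, \\
\Delta p_t &= \frac{s}{\bar{w}} \, p_t q_t \approx s \, p_t q_t \quad (s \ll 1).
\end{align*}
```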
From this analogy, we can borrow some insight we gained from the haploid model. Specifically, the trajectory of the frequency of allele $A_1$ in the diploid model without dominance follows a logistic growth curve similar to eqn. (10.9). From this similarity, we can extrapolate from eqn. (10.11) to find the time it takes for our diploid, beneficial, additive allele ($A_1$) to move from frequency $p_0$ to $p_\tau$:
$$\tau \approx \frac{2}{s} \log\left( \frac{p_\tau q_0}{q_\tau p_0} \right) \qquad (10.37)$$
generations; this just differs by a factor of 2 from our haploid model. Using this result we can find the time it takes for our favourable, additive allele ($A_1$) to transit from its entry into the population ($p_0 = 1/(2N)$) to close to fixation ($p_\tau = 1 - 1/(2N)$):
$$\tau \approx \frac{4}{s} \log(2N) \qquad (10.38)$$
generations. Note the similarity to eqn. (10.12) for the haploid model, with a difference by a factor of 2 due to the choice of parametrization (and that the number of alleles is $2N$ in the diploid model, rather than $N$). Doubling our selection coefficient halves the time it takes for our allele to move through the population.
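The logistic approximation of eqn. (10.37) can be compared against simply iterating the deterministic recursion for an additive allele ($h = 1/2$). The values of $s$ and $N$ below are hypothetical, chosen only for illustration, and the function names are made up for this sketch.

```python
import math


def time_additive(p0, p_tau, s):
    """Approximate transit time of an additive allele from eqn. (10.37)."""
    q0, q_tau = 1.0 - p0, 1.0 - p_tau
    return (2.0 / s) * math.log((p_tau * q0) / (q_tau * p0))


def generations_by_iteration(p0, p_tau, s):
    """Count generations for the exact deterministic recursion to go from p0 to p_tau."""
    p, t = p0, 0
    while p < p_tau:
        q = 1.0 - p
        # With fitnesses 1, 1 - s/2, 1 - s the mean fitness simplifies to 1 - s*q.
        mean_w = 1.0 - s * q
        p = (p * p + (1.0 - s / 2.0) * p * q) / mean_w
        t += 1
    return t


# Hypothetical parameter values.
s, N = 0.01, 10000
p0, p_tau = 1.0 / (2 * N), 1.0 - 1.0 / (2 * N)

analytic = time_additive(p0, p_tau, s)
shortcut = (4.0 / s) * math.log(2 * N)   # eqn. (10.38)
iterated = generations_by_iteration(p0, p_tau, s)
print(analytic, shortcut, iterated)
```

For weak selection the three numbers should agree to within a percent or so; the small discrepancy comes from treating mean fitness as 1 and time as continuous in the approximation.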
Figure 10.12: Gulf killifish (Fundulus grandis).
Distribution and abundance of fishes and invertebrates in Gulf of Mexico estuaries. Nelson D M and Pattillo M E. Image from the MBLWHOI Library. No known copyright restrictions.

Question 6. Gulf killifish (Fundulus grandis) have rapidly adapted to the very high pollution levels in the Houston shipping canal since the 1950s. One of the ways that they’ve adapted is through the deletion of their aryl hydrocarbon receptor (AHR) gene. Oziolor et al. (2019) estimated that individuals who were homozygous for the intact AHR gene had a relative fitness of 20% of that of homozygotes for the deletion. Assuming an effective population size of 200 thousand individuals, how long would it take for the deletion to reach fixation, starting as a single copy in this population?
10.1 Balancing selection and the selective maintenance of polymorphism.
Directional selection on genotypes is expected to remove variation
from populations, yet we see plentiful phenotypic and genetic variation
in every natural population. Why is this? Three broad explanations
for the maintenance of polymorphisms are
1. Variation is maintained by a balance of genetic drift and mutation

(we discussed this explanation in Chapter 4).
2. Selection can sometimes act to maintain variation in populations

(balancing selection).
3. Deleterious variation can be maintained in the population as a bal-

ance between selection removing variation and mutation constantly
introducing new variation into the population.
We’ll turn to these latter two explanations through this chapter and the next. Note that these explanations are not mutually exclusive. Each explanation will explain some proportion of the variation, and these proportions will differ over species and classes of polymorphism. A central challenge in population genomics is how we can apportion variation among these explanations in a systematic way.
10.1.1 Heterozygote advantage

One form of balancing selection occurs when the heterozygotes are
fitter than either of the homozygotes. In this case, it is useful to pa-
rameterize the relative fitnesses as follows:
absolute fitness               $w_{11}$      <      $w_{12}$      >      $w_{22}$
relative fitness (specific)    $1 - s_1$            $1$                  $1 - s_2$
Here, s1 and s2 are the differences between the relative fitnesses

of the two homozygotes and the heterozygote. Note that to obtain
relative fitnesses we have divided absolute fitness by the heterozygote fitness. We could use the same parameterization as in the model of directional selection, but the reparameterization we have chosen here makes the math easier.
In this case, when allele $A_1$ is rare, it is often found in a heterozygous state, while the $A_2$ allele is usually in the homozygous state, and so $A_1$ is more fit and increases in frequency. However, when the allele $A_1$ is common, it is often found in a less fit homozygous state, while the allele $A_2$ is often found in a heterozygous state; thus it is now allele $A_2$ that increases in frequency at the expense of allele $A_1$. Thus, at least in the deterministic model, neither allele can reach fixation and both alleles will be maintained at an equilibrium frequency as a balanced polymorphism in the population.
We can solve for this equilibrium frequency by setting $\Delta p_t = 0$ in eqn. (10.31), i.e. $p_t q_t (\bar{w}_1 - \bar{w}_2) = 0$. Doing so, we find that there are three equilibria. Two of them are not very interesting ($p = 0$ or $q = 0$), but the third one is a stable polymorphic equilibrium, where $\bar{w}_1 - \bar{w}_2 = 0$ holds. Using our $s_1$ and $s_2$ parametrization above, we see that the marginal fitnesses of the two alleles are equal when
$$p_e = \frac{s_2}{s_1 + s_2} \qquad (10.39)$$
for the equilibrium frequency of interest. This is also the frequency of $A_1$ at which the mean fitness of the population is maximized. The highest possible fitness of the population would be achieved if every individual was a heterozygote. However, Mendelian segregation of alleles in the gametes of heterozygotes means that a sexual population can never achieve a completely heterozygote population. This equilibrium frequency represents an evolutionary compromise between the advantages of the heterozygote and the comparative costs of the two homozygotes.

Figure 10.13: Two allele frequency trajectories of the $A_1$ allele subject to heterozygote advantage ($w_{11} = 0.9$, $w_{12} = 1$, and $w_{22} = 0.85$). In one simulation the allele is started from being rare in the population ($p = 1/1000$, solid line) and increases in frequency. In the other simulation the allele is almost fixed ($p = 999/1000$, dashed line). In both cases the frequency moves toward the equilibrium frequency. The red line shows the equilibrium frequency ($p_e$). Code here.
[Plot: $p$ (y-axis, 0.0 to 1.0) against generations (x-axis, 0 to 150).]

Figure 10.14: Top) The change in frequency of an allele with heterozygote advantage within a generation ($\Delta p$) as a function of the allele frequency. Fitnesses as in Figure 10.13. Note how the frequency change is positive below the equilibrium frequency ($p_e$) and negative above. Bottom) Mean fitness ($\bar{w}$) as a function of the allele frequency. The red line shows the equilibrium frequency ($p_e$). Code here.
[Panels: $\Delta p$ (top) and $\bar{w}$ (bottom), each plotted against $p$ (x-axis, 0.0 to 1.0).]

Figure 10.15: For the three Soay sheep genotypes: the offspring per year (left), the probability of surviving a year (middle), and the product of the two (right). Thanks to Susan Johnston for supplying these simplified numbers from Johnston et al. (2013). Code here.
[Panels: offspring per year, probability of surviving to the next year, and yearly viability × fecundity, each shown for the genotypes Ho+ Ho+, Hop Ho+, and Hop Hop.]
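The equilibrium of eqn. (10.39) follows from setting the two marginal fitnesses equal: with fitnesses $1 - s_1$, $1$, $1 - s_2$ we have $\bar{w}_1 - \bar{w}_2 = -p s_1 + q s_2$, which is zero at $p = s_2/(s_1 + s_2)$. The sketch below checks this numerically using the fitnesses quoted in the caption of Figure 10.13 ($s_1 = 0.1$, $s_2 = 0.15$); the function names are invented for this illustration and this is not the code linked from the figures.

```python
def mean_fitness(p, s1, s2):
    """Mean fitness with genotype fitnesses 1 - s1, 1, 1 - s2."""
    q = 1.0 - p
    return (1.0 - s1) * p * p + 2 * p * q + (1.0 - s2) * q * q


def next_freq(p, s1, s2):
    """One generation of selection under heterozygote advantage (eqn. 10.26)."""
    q = 1.0 - p
    return ((1.0 - s1) * p * p + p * q) / mean_fitness(p, s1, s2)


def iterate(p0, s1, s2, generations):
    """Allele frequency after the given number of generations."""
    p = p0
    for _ in range(generations):
        p = next_freq(p, s1, s2)
    return p


s1, s2 = 0.10, 0.15           # w11 = 0.9, w12 = 1, w22 = 0.85
p_e = s2 / (s1 + s2)          # eqn. (10.39)

from_rare = iterate(0.001, s1, s2, generations=1000)
from_common = iterate(0.999, s1, s2, generations=1000)
print(p_e, from_rare, from_common)
```

Starting from either a very low or a very high frequency, the recursion settles at the same intermediate frequency, which is also where mean fitness is highest.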
One example of a polymorphism maintained by heterozygote advan-

tage is a horn-size polymorphism found in Soay sheep, a population of
feral sheep on the island of Soay (about 40 miles off the coast of Scotland). The horns of the Soay sheep resemble those of the wild Mouflon sheep, and the male Soay sheep use their horns to defend females during the rut. Johnston et al. (2013) found a large-effect locus, at the
RXFP2 gene, that controls much of the genetic variation for horn size.
Two alleles Hop and Ho+ segregate at this locus. The Ho+ allele is
associated with growing larger horns, while the Hop allele is associated
with smaller horns, with a reasonable proportion of Hop homozygotes
developing no horns at all. Johnston et al. (2013) found that the
Ho locus had substantial effects on male, but not female, fitness (see
Figure 10.15).
The Hop allele has a mostly recessive effect on male fecundity, with the Hop homozygotes having lower yearly reproductive success, presumably because they perform poorly in male-male competition (left plot, Figure 10.15). Conversely, the Ho+ allele has a mostly recessive effect on viability, with Ho+ homozygotes having lower yearly survival (middle plot, Figure 10.15), likely because they spend little time feeding during the rut and so lose substantial body weight. Thus both of the homozygotes suffer from trade-offs between viability and fecundity. As a result, the Hop Ho+ heterozygotes have the highest fitness (right plot, Figure 10.15). The allele is thus balanced at intermediate frequency (∼50%) in the population due to this trade-off between fitness at different life history stages.

Figure 10.16: Mouflon (Ovis orientalis orientalis).
Animate creation. (1898). Wood, J. G. Image Contributed by Smithsonian Libraries. Not in copyright.

(Margin note: The fitnesses here are chosen to roughly match those of the real Soay sheep example, as a full model would require us to more carefully model the life-histories of the sheep.)

Question 7. Assume that the frequency of the Hop allele is 10%, that there are 1000 males at birth, and that individual adults mate at random.
A) What is the expected number of males with each of the three genotypes in the population at birth?
B) Assume that a typical male individual of each genotype has the following probability of surviving to adulthood:

Ho+ Ho+    Ho+ Hop    Hop Hop
0.5        0.8        0.8

Making the assumptions from above, how many males of each genotype survive to reproduce?
C) Of the males who survive to reproduce, let’s say that males with the Ho+ Ho+ and Ho+ Hop genotypes have on average 2.5 offspring, while Hop Hop males have on average 1 offspring. Taking into account both survival and reproduction, how many offspring do you expect each of the three genotypes to contribute to the total population in the next generation?
D) What is the frequency of the Ho+ allele in the sperm that will form this next generation?
E) How would your answers to B-D change if the Hop allele was at 90% frequency?
Figure 10.17: The deviation of the fitness of each genotype away from the mean population fitness (0) is shown as black dots. The area of each circle is proportional to the fraction of the population in each genotypic class ($p^2$, $2pq$, and $q^2$). The additive genetic fitness of each genotype is shown as a red dot. The linear regression between fitness and additive genotype is shown as a red line. The black vertical arrows show the difference between the average mean-centered phenotype and additive genetic value for each genotype. The left panel shows $p = 0.1$ and the right panel shows $p = 0.9$; in the middle panel the frequency is set to the equilibrium frequency. Code here.
[Panels: $p = 0.1$, $p = p_{eq}$, and $p = 0.9$; fitness (y-axis) against genotype (x-axis: 0, 1, 2).]

To push our understanding of heterozygote advantage a little further, note that the marginal fitnesses of our alleles are equivalent to
the additive effects of our alleles on fitness. Recall from our
discussion of non-additive variation (Section 7.1.1) that the
difference in the additive effects of the two alleles gives the slope
of the regression of fitness on additive genotype, and that there is
additive variance in fitness when this slope is non-zero. So what’s
happening here in our
heterozygote advantage model is that the marginal fitness of the A1
allele, the additive effect of allele A1 on fitness, is greater than the
marginal fitness of the A2 allele (w̄1 > w̄2 ) when A1 is at low fre-
quency in the population. In this case, the regression of fitness on the
number of A1 alleles in a genotype has a positive slope. This is true
when the frequency of the A1 allele is below the equilibrium frequency.
If the frequency of A1 is above the equilibrium frequency, then the
marginal fitness of allele A2 is higher than the marginal fitness of allele
A1 (w̄1 < w̄2 ) and the regression of fitness on the number of copies
of allele A1 that individuals carry is negative. In both cases there is
additive genetic variance for fitness (VA > 0) and the population has
a directional response. Only when the population is at its equilibrium
frequency, i.e. when w̄1 = w̄2 , is there no additive genetic variance
(VA = 0), as the slope of the linear regression of fitness on genotype
is zero.
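The link between marginal fitnesses, the regression slope, and VA can be checked numerically. The following is a minimal sketch, assuming random mating; the fitness values are hypothetical (they are not taken from the text), and the function names are illustrative only. It uses the fact that the slope of the regression of fitness on the number of A1 alleles equals w̄1 − w̄2, so that VA = 2pq(w̄1 − w̄2)².

```python
def marginal_fitnesses(p, w11, w12, w22):
    """Marginal fitnesses of alleles A1 and A2 under random mating."""
    q = 1.0 - p
    w1 = p * w11 + q * w12  # an A1 allele is paired with A1 (prob. p) or A2 (prob. q)
    w2 = p * w12 + q * w22
    return w1, w2


def slope_and_additive_variance(p, w11, w12, w22):
    """Slope of fitness on the number of A1 alleles, and V_A = 2pq * slope^2."""
    w1, w2 = marginal_fitnesses(p, w11, w12, w22)
    slope = w1 - w2
    return slope, 2.0 * p * (1.0 - p) * slope ** 2


# Hypothetical fitnesses with heterozygote advantage (assumed for illustration).
W11, W12, W22 = 0.90, 1.00, 0.85
# Polymorphic equilibrium: the frequency at which the marginal fitnesses are equal.
P_EQ = (W12 - W22) / (2.0 * W12 - W11 - W22)

for p in (0.1, P_EQ, 0.9):
    slope, v_a = slope_and_additive_variance(p, W11, W12, W22)
    print(f"p = {p:.2f}  slope = {slope:+.4f}  V_A = {v_a:.6f}")
```

With these assumed values the slope is positive below the equilibrium, negative above it, and zero (so VA = 0) at the equilibrium itself.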
Underdominance. Another case that is of potential interest is the

case of fitness underdominance, where the heterozygote is less fit than
either of the two homozygotes. Underdominance can be parametrized
as follows:
genotype                      A1A1      A1A2     A2A2
absolute fitness              w11    >  w12   <  w22
relative fitness (specific)   1 + s1    1        1 + s2
Figure 10.18: In Pseudacraea eurytus there are two homozygote morphs
that mimic a different blue and orange butterfly; the heterozygote
fails to mimic either successfully and so suffers a high rate of
predation (Owen and Chanter, 1972).
Illustrations of new species of exotic butterflies (1868) Hewitson.
Image from the Biodiversity Heritage Library. Contributed by Smithsonian

Underdominance also permits three equilibria: p = 0, p = 1, and a
polymorphic equilibrium p = pU. However, now only the first two
equilibria are stable, while the polymorphic equilibrium is unstable.
If p < pU, then ∆pt is negative and allele A1 will be lost, while if
p > pU, allele A1 will become fixed.
While strongly-selected, underdominant alleles might not spread
within populations (if pU ≫ 0), they are of special interest in the
study of speciation and hybrid zones. That is because alleles A1 and
A2 may have arisen in a stepwise fashion, i.e. not by a single
mutation, but in separate subpopulations. In this case, heterozygote
disadvantage will play a potential role in species maintenance.
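The instability of the polymorphic equilibrium can be illustrated by iterating the standard one-locus recursion under random mating. This is a minimal sketch: the fitnesses (w11 = 1.1, w12 = 1, w22 = 1.2) are those quoted for Figure 10.19, while the function names, the starting offsets of ±0.01, and the number of generations are choices made here for illustration.

```python
def next_p(p, w11, w12, w22):
    """Frequency of A1 after one generation of viability selection."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / w_bar


def trajectory(p0, w11, w12, w22, generations):
    """List of allele frequencies, starting from p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_p(freqs[-1], w11, w12, w22))
    return freqs


W11, W12, W22 = 1.1, 1.0, 1.2
# Unstable equilibrium: the frequency at which the marginal fitnesses are equal.
P_U = (W12 - W22) / (2.0 * W12 - W11 - W22)

above = trajectory(P_U + 0.01, W11, W12, W22, 200)
below = trajectory(P_U - 0.01, W11, W12, W22, 200)
print(f"p_U = {P_U:.3f}")
print(f"start just below p_U: final p = {below[-1]:.4f}")
print(f"start just above p_U: final p = {above[-1]:.4f}")
```

Starting just below pU the allele heads towards loss, and starting just above it heads towards fixation, matching the qualitative behaviour described above.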
Figure 10.19: Left) Two allele frequency trajectories of an A1 allele
subject to heterozygote disadvantage (w11 = 1.1, w12 = 1, and
w22 = 1.2). The allele is started from just above and below the
equilibrium frequency; in both cases the frequency moves away from the
equilibrium frequency. The red line shows the unstable equilibrium
frequency (pe). Middle) The change in frequency of an allele with
heterozygote disadvantage within a generation (∆p) as a function of
the allele frequency. Fitnesses as in Figure 10.13. Note how the
frequency change is negative below the equilibrium frequency (pe) and
positive above. Right) Mean fitness (w̄) as a function of the allele
frequency. Code here.

Question 8. You are studying a polymorphism that affects flight speed
in butterflies. The polymorphism does not appear to affect fecundity.
Homozygotes for the B allele are slow in flight and so only 40% of
them survive to have offspring. Heterozygotes for the polymorphism
(Bb) fly quickly and have a 70% probability of surviving to reproduce.
The homozygotes for the alternative allele (bb) fly very quickly
indeed, but often die of exhaustion, with only 10% of them making it
to reproduction.
A) What is the equilibrium frequency of the B allele?
B) Calculate the marginal absolute fitnesses of the B and the b
allele at the equilibrium frequency.
Diploid fluctuating fitness. Selection pressures fluctuate over time
and can potentially maintain polymorphisms in the population. Two
examples of polymorphisms fluctuating in frequency in response to
temporally-varying selection are shown in Figure 10.20; thanks to the
short lifespan of Drosophila we can see seasonally-varying selection.
The first example is an inversion allele in Drosophila pseudoobscura
populations. Throughout western North America, two orientations
of the chromosome, two ’inversion alleles’, exist: the Chiricahua
and Standard alleles. Dobzhansky (1943) and Wright and
Dobzhansky (1946) investigated the frequency of these inversion
alleles over four years at a number of locations and found that their
frequency fluctuated systematically over the seasons in response to
selection (left side of Figure 10.20). If you’re still reading these notes send
Prof. Coop a picture of Dobzhansky; Dobzhansky was one of the most
important evolutionary geneticists of the past century and spent a
bunch of time at UC Davis in his later years. Our second example is
an insertion-deletion polymorphism in the Insulin-like Receptor gene
in Drosophila melanogaster. Paaby et al. (2014) tracked the fre-
quency of this allele over time and found it oscillated with the seasons
(right side of Figure 10.20). She and her coauthors also determined that these
alleles had large effects on traits such as developmental time and fe-
cundity, which could mediate the maintenance of this polymorphism
through life-history trade-offs.
Figure 10.20: Left) Seasonal variation in the frequency of the
‘Standard’ inversion allele in Drosophila pseudoobscura for three
populations from Mount San Jacinto, CA (Andreas Canyon, Pinon Flats,
and Keen Camp). These frequencies are an average over four years. Data
from Wright and Dobzhansky (1946). Right) The frequency of an allele
at the Insulin-like Receptor gene over three years (Spring 2009 to
Fall 2011) in Drosophila melanogaster samples from an orchard in
Pennsylvania. Data from Paaby et al. (2014). Code here.
To explore temporal fluctuations in fitness, we’ll need to think

about the diploid absolute fitnesses being time-dependent, where the
three genotypes have fitnesses w11,t , w12,t , and w22,t in generation t.
Modeling the diploid case with time-dependent fitness is much less
tractable than the haploid case, as segregation makes it tricky to
keep track of the genotype frequencies. However, we can make some
progress and gain some intuition by thinking about how the frequency
of allele A1 changes when it is rare (following the work of Haldane
and Jayakar, 1963).
When A1 is rare, i.e. pt ≪ 1, the frequency of A1 in the next
generation (eqn. 10.26) can be approximated as

$$p_{t+1} \approx \frac{w_{12}}{\bar{w}} p_t. \tag{10.40}$$
To obtain this equation, we have ignored the pt² term (because it is
very small when pt is small) and we have assumed that qt ≈ 1 in the
numerator. Following a similar argument to approximate qt+1, we can
write

$$\frac{p_{t+1}}{q_{t+1}} = \frac{w_{12,t}}{w_{22,t}} \frac{p_t}{q_t}. \tag{10.41}$$
Starting out from p0 and q0 in generation 0, then t + 1 generations
later we have

$$\frac{p_{t+1}}{q_{t+1}} = \left( \prod_{i=0}^{t} \frac{w_{12,i}}{w_{22,i}} \right) \frac{p_0}{q_0}. \tag{10.42}$$
From this we can see, following our haploid argument from above, that
the frequency of allele A1 will increase when rare only if

$$\frac{\sqrt[t]{\prod_{i=0}^{t} w_{12,i}}}{\sqrt[t]{\prod_{i=0}^{t} w_{22,i}}} > 1, \tag{10.43}$$
i.e. if the heterozygote has higher geometric mean fitness than the
A2 A2 homozygote.
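To get a feel for how good the rare-allele approximation is, one can compare the product formula (10.42) with the exact diploid recursion. This is a minimal sketch: the time series of fitnesses and the starting frequency are hypothetical values chosen here, not values from the text, and the function names are illustrative.

```python
import math


def exact_next_p(p, w11, w12, w22):
    """One generation of the exact diploid recursion under random mating."""
    q = 1.0 - p
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return p * (p * w11 + q * w12) / w_bar


def exact_odds(p0, fitness_series):
    """p/q after iterating the exact recursion through the series."""
    p = p0
    for w11, w12, w22 in fitness_series:
        p = exact_next_p(p, w11, w12, w22)
    return p / (1.0 - p)


def approx_odds(p0, fitness_series):
    """p/q from the rare-allele product of w12/w22 ratios (eqn. 10.42)."""
    odds = p0 / (1.0 - p0)
    for _w11, w12, w22 in fitness_series:
        odds *= w12 / w22
    return odds


def geometric_mean(values):
    """Geometric mean, computed on the log scale."""
    return math.exp(sum(math.log(v) for v in values) / len(values))


# Hypothetical (w11, w12, w22) fitnesses alternating between two kinds of year.
SERIES = [(1.2, 1.0, 0.8), (0.7, 1.0, 1.2)] * 10
P0 = 1e-4  # A1 starts out rare

print("exact p/q: ", exact_odds(P0, SERIES))
print("approx p/q:", approx_odds(P0, SERIES))
gm_het = geometric_mean([w[1] for w in SERIES])
gm_hom = geometric_mean([w[2] for w in SERIES])
print("A1 invades when rare:", gm_het > gm_hom)
```

While A1 stays rare the two calculations agree closely; the approximation is expected to degrade as A1 becomes common, which is why the invasion argument is only made for rare alleles.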
The question now is whether allele A1 will approach fixation in
the population, or whether there are cases in which we can obtain a
balanced polymorphism. To investigate that, we can simply repeat our
analysis for q ≪ 1, and see that in that case

$$\frac{p_{t+1}}{q_{t+1}} = \left( \prod_{i=0}^{t} \frac{w_{11,i}}{w_{12,i}} \right) \frac{p_0}{q_0}. \tag{10.44}$$
Now, for allele A1 to carry on increasing in frequency and to approach

fixation, the A1 A1 genotype has to be out-competing the heterozy-
gotes. For allele A1 to approach fixation, we need the geometric mean
of w11,i to be greater than the geometric mean fitness of heterozy-
gotes (w12,i ). If instead heterozygotes have higher geometric mean
fitness than the A1 A1 homozygotes, then the A2 allele will increase in
frequency when it is rare.
Intriguingly, we can thus have a balanced polymorphism even if the
heterozygote is never the fittest genotype in any generation, as long
as the heterozygote has a higher geometric mean fitness than either of
the homozygotes. In this case, the heterozygote comes out ahead when
we think about long-term fitness across heterogeneous environmental
conditions, despite never being the fittest genotype in any particular
environment.
As a toy example of this type of balanced polymorphism, consider a
plant population found in one of two different environments each
generation. These occur randomly; with probability 1/2 the population
experiences the dry environment and with probability 1/2 it
experiences the wet environment. The absolute fitnesses of the
genotypes in the different environments are as follows:
Environment AA Aa aa
Wet 6.25 5.0 3.75
Dry 3.85 5.0 6.15
arithmetic mean 5.05 5.0 4.95
This example is loosely based on the work of Schemske and
Bierzychudek (2001) on Linanthus parryae, a desert annual, endemic to
California. There are blue- and white-flowered colour morphs that are
polymorphic in many populations, with this polymorphism being
controlled by a single dominant allele. The blue-flowered plants
produce more seeds in dry years, i.e. they have higher fitness in
these years, while the white-flowered plants have higher seed
production in wet years. Thus both morphs can potentially be
maintained in the population. See Turelli et al. (2001) for a more
detailed analysis.

Let’s write wAA,dry and wAA,wet for the fitnesses of the AA
homozygote in the two environments. Then, if the two environments are
equally common, $\prod_{i=0}^{t} w_{AA,i} \approx w_{AA,dry}^{t/2} w_{AA,wet}^{t/2}$
for large values of t. To obtain an estimate of this product
normalized over the t generations, we can take the tth root to obtain
the geometric mean fitness. Taking the tth root, we find the geometric
mean fitness of the AA genotype is $w_{AA,dry}^{1/2} w_{AA,wet}^{1/2}$.
Doing this for each of our genotypes, we find the geometric mean
fitnesses of our genotypes to be:

                 AA     Aa     aa
Geometric mean   4.91   5.0    4.80

i.e. the heterozygote has higher geometric mean fitness than either
of the homozygotes, despite not being the fittest genotype in either
environment (nor having the highest arithmetic mean fitness). So the
A allele can invade the population when it is rare as it spreads
thanks to the higher fitness of the heterozygotes. Similarly the a
allele can invade the population when it is rare. Thus both alleles
will persist in the population due to the environmental fluctuations,
and the higher geometric mean fitness of the heterozygotes.
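The numbers in this toy example can be reproduced, and the protected polymorphism illustrated, with a short calculation. This is a minimal sketch using the fitness table above. One simplification is an assumption made here rather than part of the example: the environments strictly alternate wet, dry, wet, dry (so each still occurs half the time) instead of being drawn at random, which keeps the run reproducible. The function names are illustrative.

```python
import math

# Absolute fitnesses from the toy example's table.
FITNESS = {
    "wet": {"AA": 6.25, "Aa": 5.0, "aa": 3.75},
    "dry": {"AA": 3.85, "Aa": 5.0, "aa": 6.15},
}


def geometric_mean_fitness(genotype):
    """Geometric mean over two equally common environments."""
    return math.sqrt(FITNESS["wet"][genotype] * FITNESS["dry"][genotype])


def next_p(p, w):
    """Frequency of allele A after one generation in an environment with fitnesses w."""
    q = 1.0 - p
    w_bar = p * p * w["AA"] + 2.0 * p * q * w["Aa"] + q * q * w["aa"]
    return p * (p * w["AA"] + q * w["Aa"]) / w_bar


def simulate(p0, generations):
    """Iterate under strictly alternating wet/dry years (an assumption, see lead-in)."""
    p = p0
    for t in range(generations):
        env = "wet" if t % 2 == 0 else "dry"
        p = next_p(p, FITNESS[env])
    return p


for g in ("AA", "Aa", "aa"):
    print(g, round(geometric_mean_fitness(g), 2))
print("A starts rare:  ", round(simulate(0.01, 4000), 3))
print("A starts common:", round(simulate(0.99, 4000), 3))
```

Whether A starts rare or common, its frequency ends up at an intermediate value, as expected when the heterozygote has the highest geometric mean fitness.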
Negative frequency-dependent selection. In the models and examples

above, heterozygote advantage maintains multiple alleles in the pop-
ulation because the common allele has a disadvantage compared to
the other rarer allele. In the case of heterozygote advantage, the rel-
ative fitnesses of our three genotypes are not a function of the other
genotypes present in the population. However, there’s a broader set of
models where the relative fitness of a genotype depends on the geno-
typic composition of the population; this broad family of models is
called frequency-dependent selection. Negative frequency-dependent
selection, where the fitness of an allele (or phenotype) decreases as it
becomes more common in the population, can act to maintain genetic
and phenotypic diversity within populations. While cases of long-term
heterozygote advantage may be somewhat rare in nature, negative
frequency-dependent selection is likely a common form of balancing
selection.
One common mechanism that may create negative frequency-
dependent selection is the interaction between individuals within or
among species. For example, negative frequency-dependent dynamics

can arise in predator-prey or pathogen-host dynamics, where alleles
conferring common phenotypes are at a disadvantage because preda-
tors or pathogens learn or evolve to counter the phenotypic effects of
common alleles.
Figure 10.21: Elderflower orchid (Dactylorhiza sambucina).
Abbildungen der in Deutschland und den angrenzenden gebieten
vorkommenden grundformen der orchideenarten (1904). Müller, W.
Contributed by New York Botanical Garden. Not in copyright.

As one example of negative frequency-dependent selection, consider
the two flower colour morphs in the deceptive Elderflower orchid
(Dactylorhiza sambucina). Throughout Europe, there are populations of
these orchids polymorphic for yellow- and purple-flowered individuals,
with the yellow flower corresponding to a recessive allele. Neither of
these morphs provides any nectar or pollen reward to their bumblebee
pollinators. Thus these plants are typically pollinated by newly
emerged bumblebees who are learning about which plants offer food
rewards, with the bees alternating to try a different coloured flower
if they find no food associated with a particular flower-colour morph
(Smithson and Macnair, 1997). Gigord et al. (2001) explored whether
this behaviour by bees could result in negative frequency-dependent
selection; out in the field, the researchers set up experimental
orchid plots in which they varied the frequency of the two colour
morphs. Figure 10.22 shows their measurements of the relative male and
female reproductive success of the yellow morph across these
experimental plots. When the yellow morph is rare, it has higher
reproductive success than the purple morph, as it receives a
disproportionate number of visits from bumblebees that are
dissatisfied with the purple flowers. This situation is reversed when
the yellow morph becomes common in the population; now the purple
morph outperforms the yellow morph. Therefore, both colour morphs are
maintained in this population, and presumably Europe-wide, due to this
negative frequency-dependent selection.
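The logic of this example can be sketched with a simple recursion in which the relative fitness of the yellow morph declines linearly with its frequency. Everything quantitative here is a hypothetical choice made for illustration: the linear form, the crossover frequency `f_star`, and the `slope` are not fitted to the Gigord et al. (2001) data. The only feature taken from the text is that yellow is recessive, so the yellow morph has frequency q² when the yellow allele has frequency q under random mating.

```python
def yellow_relative_fitness(freq_yellow_morph, f_star=0.5, slope=1.0):
    """Hypothetical fitness of yellow relative to purple (purple = 1).

    Above 1 when the yellow morph is rarer than f_star, below 1 when commoner.
    """
    return 1.0 + slope * (f_star - freq_yellow_morph)


def next_q(q, f_star=0.5, slope=1.0):
    """Frequency of the recessive yellow allele after one generation."""
    p = 1.0 - q
    f_yellow = q * q  # only yy homozygotes are yellow
    w_yellow = yellow_relative_fitness(f_yellow, f_star, slope)
    w_bar = f_yellow * w_yellow + (1.0 - f_yellow) * 1.0
    # Yellow alleles come from yy plants and from half the alleles of heterozygotes.
    return (f_yellow * w_yellow + p * q * 1.0) / w_bar


def simulate(q0, generations, f_star=0.5, slope=1.0):
    """Iterate the recursion from a starting allele frequency q0."""
    q = q0
    for _ in range(generations):
        q = next_q(q, f_star, slope)
    return q


for q0 in (0.01, 0.99):
    q_end = simulate(q0, 2000)
    print(f"start q = {q0}: final q = {q_end:.4f}, yellow morph = {q_end ** 2:.4f}")
```

Under these assumptions both starting points converge on the allele frequency at which the yellow morph sits at the crossover frequency, where the two morphs have equal fitness.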
Figure 10.22: Left) Measures of the relative male and female
reproductive success of the yellow Elderflower orchid morph as a
function of the frequency of the yellow morph in experimental plots.
Right) Two allele frequency trajectories of the Yellow allele subject
to the negative frequency-dependent scheme given in the left plot (for
an initial frequency of 0.01 and 0.99, solid and dotted line
respectively). Note that the yellow …
Male reproductive success is measured in terms of the % of pollinia
removed from a plant and female reproductive success is measured in
terms of the % of stigmas receiving pollinia on a plant. These
measures are made relative by dividing the reproductive success of the
yellow morph by the mean of the yellow and purple morphs. Pollinia are
the pollen masses of orchids, and other plants, where individual
pollinia are transferred as a single unit by pollinators. Data from
Gigord et al. (2001). Code here.
[Left panel: Relative reproductive success of Yellow (Female, %
pollinia deposited; Male, % pollinia removed) against Frequency of
Yellow morph. Right panel: Yellow allele freq. against Generations.]
Negative frequency-dependent selection can also maintain differ-
ent breeding strategies due to interactions amongst individuals within
a population. One dramatic example of this occurs in ruffs (Philo-
machus pugnax), a marsh-wading sandpiper that summers in Northern
Eurasia. The males of this species lek, with the males gathering on
open ground to display and attract females. There are three different
male morphs differing in their breeding strategy. The large majority of
males are ‘Independent’, with black or chestnut ruff plumage, and try
to defend and display on small territories. ‘Satellite’ males, with white
ruff plumage, make up ∼ 16% of males and do not defend territories,
but rather join in displays with Independent males and opportunis-
tically mate with females visiting the lek. Finally, the rare ‘Faeder’
morph was only discovered in 2006 (Jukema and Piersma, 2006)
and makes up less than 1% of males. These Faeder males are female
mimics who hang around the territories of Independents and try to
‘sneak’ in matings with females. Faeder males have plumage closely
resembling that of females and a smaller body size than other males,
but with larger testicles (presumably to take advantage of rare mating
opportunities). All three of these morphs, with their complex
behavioural and morphological differences, are controlled by three alleles
at a single autosomal locus, with the Satellite and Faeder alleles being
genetically dominant over the high frequency Independent allele.

Figure 10.23: Lekking Ruffs (Philomachus pugnax). Three Independent
males, one Satellite male, and one female (or Faeder male?).
Painting by Johann Friedrich Naumann (1780–1857). Public Domain,
wikimedia.

The genetic variation for these three morphs is potentially maintained by

negative frequency-dependent selection, as all three male strategies are
likely at an advantage when they are rare in the population. For ex-
ample, while the Satellites mostly lose out on mating opportunities to
Independents, they may have longer life-spans and so may have equal
life-time reproductive success (Widemo, 1998). However, Satellite
and Faeder males are totally reliant on the lekking Independent males,
and so both of these alternative strategies cannot become overly com-
mon in the population. The locus controlling these differences has

been mapped, and the underlying alleles have persisted for roughly
four million years (Küpper et al., 2016; Lamichhaney et al.,
2016). While this mating system is bizarre, the frequency dependent
dynamics mean that it has been around longer than we’ve been using
stone tools.
While these examples may seem somewhat involved, they must be
simple compared to the complex dynamics that maintain the hundreds
of alleles present at the genes in the Major histocompatibility complex
(MHC). MHC genes are key to the coordination of the vertebrate
immune system in response to pathogens, and are likely caught in an
endless arms race with pathogens adapting to common MHC alleles,
allowing rare MHC alleles to be favoured. Balancing selection at the
MHC locus has maintained some polymorphisms for tens of millions
of years, such that some of your MHC alleles may be genetically more
closely related to MHC alleles in other primates than they are to
alleles in your close human friends.
10.2 Sex ratios, sex ratio distorters, and other selfish elements.
We have seen that when selection acts on phenotypes and genotypes

in a frequency-independent manner it can act to increase the mean
fitness of the population, consistent with our notion of selection driving
our population to become better adapted to the environment (eqn.
(8.19) and (10.32)). However, when the absolute fitnesses of individ-
uals are frequency dependent, e.g. depend on the strategies deployed
by others in the population, natural selection is not guaranteed to in-
crease mean fitness. Nothing about the strategies pursued by the Ruffs
discussed above seems well suited to maximizing the future growth
rate of the population. One place where it is particularly apparent
that frequency dependence drives non-optimal solutions from the per-
spective of the population is in the evolution of a 50/50 sex ratio. In
fact as we’ll see, selection can drive the evolution of traits that are ac-
tively harmful to the fitness of an individual when selection acts below
the level of an individual.
Figure 10.24: Basolo (1994) explored sex ratio dynamics in platyfish
(Xiphophorus maculatus), which has a manipulable sex ratio due to its
three factor sex determination. She started two replicates with a
strong female bias (black) and two replicates with strong male bias
(white). In all four cases the sex ratio quickly oscillated to a 50/50
sex ratio. Data from Basolo (1994), Code here.
[Axes: Sex ratio (% males) against Generation.]
In many species, regardless of the mechanism of sex determination,

the sex ratio is close to 50/50. Yet this is far from the optimum sex
ratio from the perspective of the population viability. In many species
females are the limiting sex, investing more in gametes and (some-
times) more in parental care. Thus a population having many females
and few males would offer the fastest rate of population growth (i.e.
the highest mean fitness). Why then is the sex ratio so often close
to 50/50? Imagine if the population sex ratio was strongly skewed
towards females. A rare autosomal allele that caused a mother to
produce sons would have high fitness, as the mother’s sons would have
high reproductive success in this population of mostly females. Thus
our initially rare allele would increase in frequency. Conversely if
the sex ratio was strongly skewed towards males, a rare autosomal
allele that causes a mother to produce daughters would spread. So
selection on autosomal alleles favours the production of the rare sex,
a form of negative frequency dependence, and this pushes the sex ratio
away from being too skewed. Only the 50/50 sex ratio is evolutionarily
stable as there is no rarer sex, and so no (autosomal)
sex-ratio-altering mutation can invade a population with a 50/50 sex
ratio. The 50/50 sex ratio is an example of an Evolutionary stable
strategy (ESS), described in more detail in Section 10.2.2.

“An ESS is a strategy such that, if all the members of a population
adopt it, then no mutant strategy could invade the population under
the influence of natural selection” Smith (1982), pg 10.

A version of this sex ratio argument was first put forward by Düsing
in 1884 and popularized by Fisher (1930), see Edwards (1998).

Figure 10.25: Poecilid Hybrid, Xiphophorus helleri × Platypoecilus
maculatus.
Aquatic life, chapter by Curtis F.S. (1915) Contributed by Harvard
University, Museum of Comparative Zoology, Ernst Mayr Library. Not in
copyright.

Adaptive adjustments to sex ratio in response to local mate
competition. There are, however, situations where we see strong
deviations away from a 50/50 sex ratio. This can represent an adaptive
strategy to situations where individuals compete against relatives for
access to resources or mating opportunities. To see this consider fig
wasps.

There are many species of fig wasp, which form a tight pollination
symbiosis with many species of fig. Wasp females enter the inverted
fig flower structure (top right Figure 10.26) pollinating the flowers.
They lay their eggs in some of the flowers, which form galls in re-
sponse. The young, wingless, male wasps emerge from their galls first
(Figure 10.27f) but they never leave the fig. Their only role in this is
to fertilize the female wasps (Figure 10.27d) in the fig and then die.
The female offspring (Figure 10.27a & e) emerge in the fig just as the
male fig flowers are emerging. The female wasps burrow out and take
the fig pollen with them as they fly off.

Figure 10.26: Common Fig (Ficus carica). Despite urban legends the
crunch in figs isn’t dead wasps, edible figs are dioecious and female
wasps can’t lay in the female flowers that form the fruit we eat.
Plantae selectae quarum imagines ad exemplaria naturalia Londini, in
hortis curiosorum nutrita (1750) Trew, C.J. Image from the Missouri
Botanical Garden. Not in copyright.

Female wasps have control over the sex of their offspring but what
is their optimal strategy? Females have this degree of control as sex
determination in wasps is haplo-diploid, with fertilized eggs
developing as diploid females and unfertilized as males; by choosing
to lay fertilized eggs they can control their number of daughters. If
a female wasp
lays her eggs into a fig with no other eggs, her sons will mate with her
daughters and then die. Thus a lone female can maximize her con-
tribution to the next generation by having many daughters, and just
enough sons to fertilize them. And that’s exactly what female wasps
do; in many species of fig wasp 95% of individuals born are female.
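Both sex-ratio arguments, the rare-sex advantage under population-wide mating and the lone foundress in a fig, can be sketched as simple counting exercises. This is a minimal sketch under stated assumptions. For the population-wide case it uses the Shaw-Mohler form, in which a mother producing a fraction m of sons in a population with a fraction M of males gains grandoffspring in proportion to m/M + (1 − m)/(1 − M); this formula is not given in the text. For the lone foundress it assumes that a single son can fertilize all of his sisters and that only daughters leave the fig. The function names and the brood size are hypothetical.

```python
def grandoffspring_population_wide(m, M):
    """Relative grandoffspring count when sons compete population-wide.

    m: fraction of sons produced by the focal mother.
    M: fraction of males in the population (0 < M < 1).
    """
    return m / M + (1.0 - m) / (1.0 - M)


def best_sons_lone_foundress(brood_size, offspring_per_daughter=1.0):
    """Number of sons maximizing grandoffspring for a single foundress.

    Assumes one son suffices to fertilize all daughters (a simplification),
    and that sons only ever mate with their own sisters.
    """
    best_sons, best_value = None, -1.0
    for sons in range(1, brood_size):  # need at least one son and one daughter
        daughters = brood_size - sons
        grandoffspring = daughters * offspring_per_daughter
        if grandoffspring > best_value:
            best_sons, best_value = sons, grandoffspring
    return best_sons


# Female-biased population: producing sons pays.
print(grandoffspring_population_wide(1.0, 0.2), grandoffspring_population_wide(0.0, 0.2))
# At a 50/50 population sex ratio every strategy does equally well.
print(grandoffspring_population_wide(0.3, 0.5), grandoffspring_population_wide(0.9, 0.5))
# Lone foundress with a hypothetical brood of 20.
print(best_sons_lone_foundress(20))
```

The first function shows why the rare sex is favoured and why no sex-ratio strategy has an edge at 50/50; the second shows why a lone foundress does best with the minimum number of sons.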
Figure 10.27: Life stages of Fig wasp (Blastophaga psenes, synonym
Blastophaga grossorum); the primary pollinator of the common fig Ficus
carica.
A descriptive catalogue of fruit and forest trees, vines and shrubs,
choice palms and roses (1903) by Fancher Creek Nurseries. Image from
the Biodiversity Heritage Library. Contributed by National
Agricultural Library, USDA. Not in copyright.

Figure 10.28: The increase in frequency of a sex-ratio distorting X
allele in the population of X chromosomes (solid line) and the
frequency of males in the population. Males carrying the selfish X
allele have 99% daughters, and the selfish X allele reduces the
viability of its carriers by 20% in a dominant manner. The model is
set up as in Edwards (1961), Code here.
[Axes: Frequency (Selfish X chr., Males) against Generations.]
10.2.1 Selfish genetic elements and selection below the level of the
individual.
These ideas about individuals pursuing selfish strategies, which can
lower the population’s fitness, extend below the level of the
individual. The alleles within an individual can sometimes pursue
selfish strategies that actively harm the individuals that carry them.
Here we’ll take a tour of the rogues’ gallery of some of the various
genetic conflicts
that occur and selfish genetic elements that exploit them. They’re
included in this chapter in part because much of their biology can be
understood from the perspective of the ideas developed here. But the
main reason for talking about them is that they’re an amazing slice of
biology.
Selfish sex chromosomes and sex ratio distortion. From the perspective
of the autosomes a 50/50 sex ratio normally represents a stable
strategy, but all is not always harmonious in the genome. In systems
with XY sex determination, male fertilization by Y-bearing sperm
leads to sons, while male fertilization by X-bearing sperm leads to
daughters. From the viewpoint of the X chromosome the Y-bearing
sperm, and a male’s sons, are an evolutionary dead end. We can imagine
a mutation arising on the X chromosome that causes a poison to
be released during gametogenesis that kills Y-bearing sperm. This
would cause much of the ejaculate of the males carrying this mutation
to be X-bearing sperm, and so these males would have mostly daughters.
Such an allele would potentially spread in the population as it
is over-transmitted through males, even if it somewhat reduces the
fitness of the individuals who carry it. The spread of this allele would
strongly bias the population sex ratio towards females. Such ‘selfish’ X
alleles turn out to be relatively common, and they can often
substantially lower the fitness of the bearer. They do not spread
because they are good for the individual but rather because they are
favoured due to selection below the level of the individual.
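The spread of such a selfish X can be sketched with a deterministic recursion. This is a minimal sketch, and not necessarily the Edwards (1961) setup behind Figure 10.28. The drive strength (99% daughters) and the 20% dominant viability cost follow that figure's caption. The remaining assumptions are made here: random mating, discrete generations, every male contributing equally to the sperm pool whatever his genotype, and all females being mated however rare males become. The function and variable names are illustrative.

```python
def next_generation(e, m, k=0.99, s=0.2):
    """One generation of a selfish X (written XD) under drive and a viability cost.

    e: frequency of XD among the X chromosomes carried by adult females (eggs).
    m: frequency of XD-carrying males among adult males.
    k: fraction of an XD male's sperm that carry XD (the rest carry Y).
    s: dominant viability cost paid by every carrier of XD.
    Returns (e_next, m_next, fraction of adults that are male).
    """
    # Sperm pool: normal males make half X and half Y; XD males make k XD and (1 - k) Y.
    sperm_x = (1.0 - m) * 0.5
    sperm_xd = m * k
    sperm_y = (1.0 - m) * 0.5 + m * (1.0 - k)

    # Zygote frequencies from random union of eggs and sperm.
    f_xx = (1.0 - e) * sperm_x
    f_xxd = e * sperm_x + (1.0 - e) * sperm_xd
    f_xdxd = e * sperm_xd
    m_xy = (1.0 - e) * sperm_y
    m_xdy = e * sperm_y

    # Viability selection against all carriers of XD (dominant cost).
    f_xxd *= 1.0 - s
    f_xdxd *= 1.0 - s
    m_xdy *= 1.0 - s

    females = f_xx + f_xxd + f_xdxd
    males = m_xy + m_xdy
    e_next = (0.5 * f_xxd + f_xdxd) / females
    m_next = m_xdy / males
    return e_next, m_next, males / (females + males)


def simulate_selfish_x(x0, generations, k=0.99, s=0.2):
    """Track XD frequency in eggs and the adult fraction of males over time."""
    e, m = x0, x0
    e_hist, male_hist = [e], []
    for _ in range(generations):
        e, m, male_fraction = next_generation(e, m, k, s)
        e_hist.append(e)
        male_hist.append(male_fraction)
    return e_hist, male_hist


e_hist, male_hist = simulate_selfish_x(0.01, 250)
print(f"XD frequency in eggs: {e_hist[0]:.3f} -> {e_hist[-1]:.3f}")
print(f"fraction of adults that are male after 250 generations: {male_hist[-1]:.3f}")
```

Setting `k=0.5` removes the drive while keeping the viability cost, in which case the same allele declines; this contrast is the sense in which the allele spreads through over-transmission rather than any benefit to its carriers.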
One example of a selfish X chromosome allele is the Winters sex-ratio
system found in Drosophila simulans, so named as it was found in flies
collected around Winters, California (just a few miles down the road
from Davis). In crosses, males carrying the selfish X chromosome have
> 80% daughters. The gene responsible, Dox (Distorter on the X), is a
gene duplicated by transposition and produces a transcript which
targets a region on the Y chromosome preventing the Y-bearing sperm
from developing (Tao et al., 2007, see Figure 10.29).
The spread of such selfish sex chromosomes, distorting the sex ratio
strongly away from 50/50, can have profound effects for population
growth rates.³

³ Indeed people have long discussed using selfish Y chromosomes,
driving an over production of sons, for population control of
malaria-spreading mosquitos. Natural selfish systems on the Y appear
rare, likely because of its low gene content.

Figure 10.29: Top) Normally developing spermatids in D. simulans.
Bottom) Abnormally developing spermatids in a male expressing dox. The
spermatids that look like rice crispies carry the Y chromosome, the
normal, slender spermatids are X-bearing spermatids. Figure from Tao
et al. (2007), cropped, licensed under CC BY 4.0.

Figure 10.30: Mechanistic and Evolutionary Model for sex-ratio
Distortion. Left) The X-linked Dox gene evolved to target the Y
chromosome, blocking Y-bearing sperm from developing and so favouring
its own transmission. Right) Subsequently Dox was retrotransposed to
an autosome forming the Nmy gene. Nmy was subsequently rearranged by a
small duplication, and now blocks the action of dox by the formation
of a hairpin small interfering RNA. Figure from Ferree and Barbash
(2007), licensed under CC BY 4.0. See Lin et al. (2018) for an update
on the fascinating biology and further loci uncovered in this system.

However, the other sex chromosome and autosomes are not helpless
against the spread of selfish sex chromosome elements.

In the case of a selfish X chromosome that has achieved appreciable
frequency in the population, there will be a strong excess of females
in the population such that suppressors of drive can arise on the
autosomes and spread because they cause their male bearers to produce
some sons, and so gain a Fisherian sex-ratio advantage. This has
happened in the case of the Winters sex chromosome system. An
autosomal allele has spread through the population that suppresses the
selfish X chromosome, restoring the 50/50 sex ratio. Now the sex ratio
distorter can only be found by crosses to naive populations, where the
suppressor has not spread yet. The autosomal suppressor gene turns out
to be a duplicate of the selfish dox gene, NMY (Not Much Yang), that
moved to the autosome through retrotransposition and now blocks the
action of dox through RNA-interference degradation of the dox
transcript (Tao et al., 2007, see Figure 10.30).
Conflict due to maternally transmitted elements. Chromosomes

transmitted maternally, i.e. only through mothers, also have diver-
gent interests from the individual. Many plants are hermaphrodites
producing both pollen and seeds. But from the perspective of the
mitochondria in an individual, pollen is a waste of energy as the
mitochondria won’t be transmitted through it. Thus a mutation that
arises in the mitochondrial genome abolishing male sexual function
(pollen) and shunting energy into other processes can spread. The
selfish spread of a Cytoplasmic Male Sterility (CMS) allele creates a
population of females and hermaphrodite plants (a gynodioecious
population). This
strong excess of female plants in turn can select for the spread of au-
tosomal suppressors of CMS that are favoured by producing the rarer
gamete (pollen), and so restore the population to hermaphroditism.
The spread of such CMS alleles, and subsequent autosomal suppres-
sion, is thought to be common in hermaphrodite species and often un-
covered in crosses between diverged hermaphrodite populations. The
discovery or deliberate creation of CMS alleles in agricultural plants
is prized because it gives breeders more control over hybridization as
they can more carefully control the pollen donor to the plants.
Figure 10.31: Bladder Campion (Silene vulgaris), on left, has both
hermaphrodite and female plants due to CMS and nuclear restorer
polymorphisms (Charlesworth and Laporte, 1998). (S. nutans on right)
Billeder af nordens flora (1917). Mentz, A. Contributed by The
LuEsther T Mertz Library, the New York Botanical Garden. Not in
copyright.

Figure 10.32: Arrival of the fille du roi, the ‘king’s daughters’ to
Quebec city in 1667. Painting by Eleanor Fortescue-Brickdale. The
fille du roi were some 800 women whose emigration to New France
(Quebec) was paid for by a program established by King Louis XIV of
France to address the strong gender imbalance of the new colony. You
can read more in this Atlantic article by Sarah Zhang.
Painting from the Library and Archives Canada collection, Wikimedia,
Public Domain.

The maternal transmission of mtDNA also causes genetic conflicts
in organisms with separate sexes. Males are an evolutionary dead end
as far as mitochondria are concerned, and so mitochondrial mutations
that lower a male’s fitness are not removed from the population of
mitochondria. Thus the mitochondrial genome may be a hotspot of
alleles that are deleterious in males (an effect termed the “Mother’s
curse”). One example is the male-deleterious mitochondrial mutations
underlying Leber’s ‘hereditary optic neuropathy’ (LHON) in humans.
LHON causes degeneration of the optic nerve and loss of vision in
teenage males (with much lower penetrance in women). One such
LHON mutation is present at low frequency in the Quebec population.
The Québécois population grew rapidly from a relatively small num-

ber of founders, leading to the prevalence of some disease mutations
due to the founder effect. Thanks to the detailed genealogical records
kept by French Canadians since the founding of Quebec, we know
that nearly all the Québécois LHON alleles are descended from the
mitochondria of a single woman, one of the fille du roi, who arrived in
Quebec City in 1669 (Laberge et al., 2005). Using the genealogy,
Milot et al. (2017) tracked all of her mitochondrial descendants,
individuals whose mothers were in her matrilineal line, and so identified
all the individuals in the Québécois population who carried this allele.
There was no significant difference in the fitness of females who
carried or didn’t carry the mutation. In contrast, the fitness of male
carriers of the mutation was only 65.3% that of male non-carriers. This
mitochondrial mutation has increased in frequency slightly over the
past 290 years, despite its strong effects in males, because its effects
have no consequence for female fitness.
Question 9. The frequency of the LHON allele was roughly 1/2000

in 1669. If females suffered the same ill consequences as males what
would be the frequency today? [assume there are ∼29 years a genera-
tion]
Question 10. Kin selection has been proposed as a way that the
male deleterious mitochondrial mutations could be removed from the
population. Can you explain this idea?
It’s not just chromosomes that get in on the act of the battle of
the sexes. Numerous arthropods, including a high proportion of insects,
are infected with the intracellular bacteria Wolbachia, which
are passed to offspring through the maternal cytoplasm. As they are
only transmitted by females, Wolbachia increase their transmission in
a variety of selfish ways including feminization of males and killing
male embryos. In one dramatic case, a male-killing Wolbachia strain
forced a sex ratio of 100 females to every 1 male in Hypolimnas bolina
(eggspot butterflies) throughout Southeast Asia. This extreme sex
ratio persisted for many decades, according to the analysis of museum
collections from the late 19th century, before the sex ratio was
rapidly restored to 50/50 by the spread of an autosomal suppressing
allele. The autosomal suppressor allele spread very rapidly within
populations, taking just 5 years (2001 to 2006) to spread through the
population.

Figure 10.33: male Eggspot butterfly (Hypolimnas bolina). P. Cramer’s
Uitlandsche kapellen (1780). Contributed by Smithsonian Libraries. Not
in copyright.
Selfish Autosomal Systems. Selfish genetic systems can also arise and
cause genetic conflicts on the autosomes. The interests of autosomal
alleles are usually relatively well aligned with promoting the fitness of
the individual who carries them. However, these interests can diverge
during meiosis and gametogenesis. After all, there are two alleles at
each autosomal locus but only one of them will get passed to a child,
so there can be competition to be in the gamete transmitted to the
next generation.
The four products of meiosis in the fungus Podospora anserina are
arrayed in the ascus⁴ as the spores for the next generation. There is
a polymorphism S/T at the Spok gene in this species. In spores from
SxS and TxT individuals all four products are present. However, only
two out of four spores are present in ∼90% of asci from SxT
individuals (Grognet et al., 2014). The T allele is releasing a toxin
that poisons off the S carrying spores. The jury is still out on whether
the T allele spread due to the advantage created by sabotaging its
rival product of meiosis (Sweigart et al., 2019). However, in other
systems it is clear that alleles have spread due to their selfish actions.

⁴ From the Greek word askos, meaning wineskin.

Figure 10.34: Pictures of P. anserina asci from various crosses. The
arrow in the SxT picture shows a rare ascus carrying all four products
of meiosis. Figure from Grognet et al. (2014).
Figure 10.35: The two copies of a chromosome are shown in red and
blue through the process of female and male meiosis and gametogenesis.
Crossovers are omitted to keep things simpler. Modified from original
to include chromosomes transmitted. Biology; the story of living things
(1937). Hunter, G.W., Walter H.E. Image from the Biodiversity Heritage
Library. Contributed by MBLWHOI Library. No known copyright
restrictions.
A number of well-established genetic systems in animals and plants
illustrate how male and female gametogenesis offer different
opportunities for selfish alleles (Figure 10.35). Just as selfish X
chromosome systems can spread by targeting sperm that carry the Y
chromosome, selfish autosomal alleles can spread by targeting sperm
carrying the other chromosome in heterozygotes. Both the Drosophila
Segregation Distortion allele and the mouse T-allele are selfish
autosomal systems that game transmission in heterozygotes by killing
off sperm that don’t carry the allele.

In female meiosis there is a unique opportunity for cheating. In
male meiosis all four products of meiosis become gametes. However,
only 1 of the four products of female meiosis becomes the egg; the
other 3 products are fated to become the polar bodies. Thus alleles
can cheat in female meiosis by preferentially getting transmitted
into the egg rather than the polar body. If an allele on a red
chromosome (in top panel of Figure 10.35) can manipulate any asymmetry
of meiosis so that it is present in the egg > 50% of the time, it will
have a transmission advantage in female heterozygotes.

Figure 10.36: The fate of an unfit transmission distorter allele. If
transmission is fair (α = 1/2, blue curve) the allele is lost, but the
stronger its drive in heterozygotes the faster its spread and the
higher its final frequency in the population (black and red curves,
α = 0.7 & 0.9 respectively). With fitnesses $w_{dd} = 1$,
$w_{Dd} = 0.95$, and $w_{DD} = 0.1$. The dotted lines show the predicted
equilibrium. Code here. [Plot: x axis, Generations; y axis, p.]

To see how such drivers can spread through the population let’s
consider the case of a population where an allele drives in both male
and female gametogenesis. (Most selfish alleles will be sex-specific, but
that makes the math a little more tricky.) Imagine a randomly-mating
population of hermaphrodites. In this population, a derived allele (D)
segregates that distorts transmission in its favour over the ancestral
allele (d) in the production of all the gametes of heterozygotes. The
drive leads a fraction α of the gametes of heterozygotes (D/d) to
carry the D allele (α ≥ 0.5). The D allele causes viability problems
such that the relative fitnesses are $w_{dd} = 1$,
$1 > w_{Dd} \geq w_{DD}$. If the D allele is currently at frequency p
in the population at birth (with q = 1 − p), its frequency at birth in
the next generation will be
$$p' = \frac{w_{DD}\,p^2 + w_{Dd}\,\alpha\,2pq}{\bar{w}}. \qquad (10.45)$$
When α = 1/2, i.e. fair Mendelian transmission, this is exactly the same
as our directional selection model, which results in our D allele being
selected out of the population (blue line, Figure 10.36). However, if
α > 1/2, i.e. our deleterious allele cheats, it can potentially increase
in the population when it is rare (red and black lines, Figure 10.36).
However, the allele can become trapped in the population at a
polymorphic equilibrium if its cost is sufficient in homozygotes. This
is akin to the case of heterozygote advantage, but now our allele offers
no fitness advantage to heterozygotes; instead it has a selfish
transmission advantage in heterozygotes.
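
The recursion in eqn. 10.45 is easy to iterate numerically. The sketch
below is not the code behind Figure 10.36; it assumes the usual mean
fitness $\bar{w} = w_{DD}p^2 + w_{Dd}2pq + w_{dd}q^2$, uses the fitnesses
quoted in the figure caption, and picks a starting frequency (p0 = 0.01)
and a run length (500 generations) purely for illustration.

```python
def next_freq(p, alpha, w_dd=1.0, w_Dd=0.95, w_DD=0.1):
    """One generation of eqn. 10.45: frequency of the driver D at birth."""
    q = 1.0 - p
    # Mean fitness: drive alters transmission, not the viability of genotypes.
    w_bar = w_DD * p**2 + w_Dd * 2 * p * q + w_dd * q**2
    # Surviving DD always transmit D; surviving Dd transmit D a fraction alpha
    # of the time.
    return (w_DD * p**2 + w_Dd * alpha * 2 * p * q) / w_bar


def trajectory(alpha, p0=0.01, generations=500):
    """Iterate the recursion from an assumed starting frequency p0."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_freq(freqs[-1], alpha))
    return freqs


if __name__ == "__main__":
    for alpha in (0.5, 0.7, 0.9):
        final = trajectory(alpha)[-1]
        print(f"alpha = {alpha}: frequency after 500 generations = {final:.4f}")
```

With fair transmission the allele should dwindle towards zero, while the
two driving cases should settle at intermediate frequencies, in line with
the qualitative behaviour described above.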
Question 11. (Trickier question) Thinking of our autosomal driver
from equation 10.45.
A) Imagine the allele is completely recessive, i.e. $w_{dd} = w_{Dd} = 1$.
What conditions do you need for a polymorphic equilibrium to
be maintained? What is the equilibrium frequency of this balanced
polymorphism?
B) Imagine the cost of the driver were additive, i.e. $w_{dd} = 1$,
$w_{Dd} = 1 - e$, $w_{DD} = 1 - 2e$. Under what conditions can
the driver invade the population? Can a polymorphic equilibrium be
maintained?
Many of the known autosomal drive systems are polymorphic in
populations, unable to reach fixation in the population due to their
costs in homozygotes. It seems likely that this represents an ascer-
tainment bias, and that many other selfish systems that had lower
selective costs have swept to fixation.
10.2.2 Appendix: ESS for the sex ratio

Let R be the resources available to an individual, and let $C_{♂}$ and
$C_{♀}$ be the costs of producing a son and a daughter respectively. If
our focal mother directs a fraction s of her effort towards sons and
(1 − s) of her effort towards daughters, she’ll produce $Rs/C_{♂}$ sons
and $R(1-s)/C_{♀}$ daughters. Let’s assume that the mean reproductive
value of daughters is 1. Given this, the average reproductive value of
sons is the average number of matings that a male will have, i.e. the
ratio # females/# males. So if the population has a sex ratio $s_p$,
the fitness of our focal female is
$$W(s, s_p) = \left(\frac{R(1-s)}{C_{♀}}\right) \times 1 + \left(\frac{Rs}{C_{♂}}\right) \times \frac{R(1-s_p)/C_{♀}}{R s_p/C_{♂}} \qquad (10.46)$$
expressing fitness in terms of the number of grandkids our focal female
is expected to have.
To find the ESS we want a sex ratio $s^*$ for the population such that
no mutant has higher fitness. We can write this as the population
having strategy $s_p = s^*$, and then seeing what choice of $s^*$ leads
to $W(s^*, s^*) > W(s, s^*)$ for $s \neq s^*$, i.e. that no new strategy
(s) has higher fitness than the ESS strategy $s^*$. We can find this ESS
$s^*$ by setting
$$\left.\frac{\partial W(s, s_p)}{\partial s}\right|_{s^* = s = s_p} = 0. \qquad (10.47)$$
Taking the derivative of Eqn 10.46 we obtain
$$\frac{\partial W(s, s_p)}{\partial s} = -\frac{R}{C_{♀}} + \frac{R}{C_{♂}} \left(\frac{R(1-s_p)/C_{♀}}{R s_p/C_{♂}}\right). \qquad (10.48)$$
Setting $s^* = s = s_p$ and rearranging gives
$$\frac{R}{C_{♀}} = \frac{R}{C_{♂}} \left(\frac{R(1-s^*)/C_{♀}}{R s^*/C_{♂}}\right), \qquad (10.49)$$
which is satisfied when $s^* = 1/2$, i.e. devoting equal resources to male
and female offspring is the ESS, which corresponds to a 50/50 sex
ratio if male and female offspring are equally costly.
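
As a quick numerical check of this argument, the sketch below codes
eqn. 10.46 directly and asks which way a mutant mother should shift her
investment for a given population strategy. The function names and the
default values (R = 1 and equal costs of 1) are illustrative assumptions,
not part of the original derivation.

```python
def fitness(s, s_p, R=1.0, cost_son=1.0, cost_daughter=1.0):
    """Expected grandkids (eqn. 10.46) for a mother investing a fraction s
    in sons when the population as a whole invests a fraction s_p."""
    daughters = R * (1.0 - s) / cost_daughter
    sons = R * s / cost_son
    # Reproductive value of a son = number of females per male in the population.
    value_of_son = (R * (1.0 - s_p) / cost_daughter) / (R * s_p / cost_son)
    return daughters * 1.0 + sons * value_of_son


def best_response(s_p, **costs):
    """W is linear in s, so only the sign of its slope in s matters."""
    slope = fitness(1.0, s_p, **costs) - fitness(0.0, s_p, **costs)
    if abs(slope) < 1e-12:
        return "indifferent"
    return "more sons" if slope > 0 else "more daughters"


if __name__ == "__main__":
    for s_p in (0.2, 0.5, 0.8):
        print(f"population s_p = {s_p}: mutant should invest in {best_response(s_p)}")
```

Whenever the population invests less than half its resources in sons,
the slope is positive and a mutant investing more in sons does better,
and vice versa; only at $s_p = 1/2$ does no mutant gain, whatever the
relative costs of sons and daughters.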
11 The Interaction of Selection, Mutation, and Migration
Genetic variation is the raw fuel of evolution. Without variation,

natural selection would have nothing to act on to shape adaptive
traits. However, variation can be deleterious.
Mutation, broadly defined, is the ultimate source of all genetic vari-
ation and is constantly introducing new variation into all populations.
However, mutation is random and so mutations that affect function
are often damaging. Thus mutation will, in the absence of sufficiently
strong selection, degrade pre-existing adaptations and undo the work
of selection that has built up functional regions of DNA over time.
Migration, the movement of individuals into a population, can
also introduce variation into the population as the individuals bring
new alleles in from surrounding populations. Thus migration can be an
important source of adaptive alleles, aiding their spread amongst
populations within a species. Adaptive alleles can introgress between
species if low levels of interbreeding occur. They can sometimes spread
between very diverged clades of species, indeed sometimes different
domains of life, thanks to horizontal gene transfer. However, again,
just like mutation, migration can disrupt adaptations. When populations
are locally adapted, migration amongst populations can introduce
maladaptive alleles into well adapted populations. If this migration
pressure is sufficiently strong, it can lead to the collapse of local
adaptations, or even the collapse of species.
In this chapter we’ll study some of the interplay between selection,
migration, and mutation.
11.0.1 Mutation–Selection Balance

Mutation is constantly introducing new alleles into the population.
Therefore, variation can be maintained within a population not only
if selection is balancing (e.g. through heterozygote advantage or
fluctuating selection over time, as we have seen in the previous section),
but also due to a balance between mutation introducing deleterious
alleles and selection acting to purge these alleles from the population
(Haldane, 1937). To study mutation-selection balance, we return to
the model of directional selection, where allele A1 is advantageous, i.e.
absolute fitness: $W_{11} \geq W_{12} \geq W_{22}$
relative fitness: $w_{11} = 1$, $w_{12} = 1 - sh$, $w_{22} = 1 - s$.
We’ll begin by considering the case where allele A2 is not completely
recessive (h > 0), so that the heterozygotes suffer at least some dis-
advantage. We denote by µ = µ1→2 the mutation rate per generation
from A1 to the deleterious allele A2 , and assume that there is no re-
verse mutation (µ2→1 = 0). Let us assume that selection against A2 is
relatively strong compared to the mutation rate, so that it is justified
to assume that A2 is always rare, i.e. qt = 1 − pt ≪ 1. Compared to
previous sections, for mathematical clarity, we also switch from fol-
lowing the frequency pt of A1 to following the frequency qt of A2 . Of
course, this is without loss of generality. The change in frequency of
A2 due to selection can be written as
$$\Delta_S q_t = \frac{\bar{w}_2 - \bar{w}_1}{\bar{w}}\, p_t q_t \approx -hs\,q_t. \qquad (11.1)$$
This approximation can be found by assuming that $q^2 \approx 0$,
$p \approx 1$, and that $\bar{w} \approx \bar{w}_1$. All of these
assumptions make sense if q ≪ 1.
From eqn. (11.1) we see that selection acts to reduce the frequency of
A2 (as both h and s are positive), and it does so geometrically across
the generations. That is, if the initial frequency of A2 is q0 , then its
frequency at time t is approximately
$$q_t = q_0 (1 - hs)^t. \qquad (11.2)$$
We will now consider the change in frequency induced by mutation.
Recalling that µ is the mutation rate from A1 to A2 per generation,
the frequency of A2 after mutation is
$$q' = \mu p_t + q_t = \mu(1 - q_t) + q_t. \qquad (11.3)$$
Assuming that µ ≪ 1 and that q ≪ 1, the change in the frequency of
allele A2 due to mutation ($\Delta_M q_t$) can be approximated by
$$\Delta_M q_t = q' - q_t = \mu(1 - q_t) \approx \mu. \qquad (11.4)$$
Hence, when A2 is rare and the mutation rate is low, mutation acts to
linearly increase the frequency of the deleterious allele A2 .
If selection is to balance deleterious mutation, their combined effect
over one generation has to be zero. Therefore, to find the mutation–
selection equilibrium, we set
$$\Delta_M q_t + \Delta_S q_t = 0, \qquad (11.5)$$
insert eqns. (11.1) and (11.4), and solve for q to obtain
$$q_e = q_t = \frac{\mu}{hs}. \qquad (11.6)$$
We see that the frequency of the deleterious allele A2 is balanced at a
frequency equal to the mutation rate (µ) divided by the reduction in
relative fitness in the heterozygote (hs).
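
One way to see how good this approximation is, sketched below, is to
iterate the full diploid selection model followed by one-way mutation
and compare the resulting frequency to µ/(hs). The ordering of events
(selection, then mutation), the function names, and the parameter values
are illustrative assumptions.

```python
def one_generation(q, mu, h, s):
    """Viability selection (full diploid model) followed by A1 -> A2 mutation."""
    p = 1.0 - q
    w11, w12, w22 = 1.0, 1.0 - h * s, 1.0 - s
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    # Frequency of A2 among the survivors of selection.
    q_sel = (p * q * w12 + q * q * w22) / w_bar
    # A fraction mu of the remaining A1 alleles mutate to A2.
    return q_sel + mu * (1.0 - q_sel)


def mutation_selection_equilibrium(mu, h, s, q0=0.0, generations=20000):
    """Iterate until the frequency has (for practical purposes) stopped changing."""
    q = q0
    for _ in range(generations):
        q = one_generation(q, mu, h, s)
    return q


if __name__ == "__main__":
    mu, h, s = 1e-5, 0.1, 0.1  # illustrative values, not taken from the text
    print("iterated equilibrium:", mutation_selection_equilibrium(mu, h, s))
    print("mu / (h s)          :", mu / (h * s))
```

For parameters like these, where hs is much larger than µ, the two
numbers should agree closely; the small discrepancy reflects the terms
dropped in the approximations above.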
It is worth pointing out that the fitness of the A2 A2 homozygote
has not entered this calculation, as A2 is so rare that it is hardly ever
found in the homozygous state. Therefore, if A2 has any deleteri-
ous effect in a heterozygous state (i.e. if h > 0), it is this effect that
determines the frequency at which A2 is maintained in the popula-
tion. Also, note that by writing the total change in allele frequency as
∆M qt + ∆S qt we have implicitly assumed that we can ignore terms
of order µ × s. That is, we have assumed that mutation and selection
are both relatively weak. This assumption is valid under our prior
assumption that both µ and s are small.
If an allele is truly recessive (although few likely are), we have
h = 0, and so eqn. (11.6) is not valid. However, we can make an
argument similar to the one above to show that, for truly recessive
alleles,
$$q_e = \sqrt{\frac{\mu}{s}}. \qquad (11.7)$$
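
A sketch of that argument, under the same assumptions as before (q ≪ 1,
so that p ≈ 1 and $\bar{w} \approx 1$), and taking h = 0 so that
$w_{12} = 1$: the marginal fitnesses $\bar{w}_1$ and $\bar{w}_2$ follow
the notation of eqn. (11.1).

```latex
\begin{align*}
\bar{w}_1 &= p\,w_{11} + q\,w_{12} = 1, \\
\bar{w}_2 &= p\,w_{12} + q\,w_{22} = 1 - sq, \\
\Delta_S q &= \frac{\bar{w}_2 - \bar{w}_1}{\bar{w}}\,pq \approx -s q^2, \\
\Delta_M q + \Delta_S q &\approx \mu - s q_e^2 = 0
\quad\Longrightarrow\quad q_e = \sqrt{\mu/s}.
\end{align*}
```

Selection now only sees the allele in homozygotes, which occur at
frequency $q^2$, so the allele can drift up to a much higher frequency
before selection balances the mutational input.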
Question 1. Oblong-winged katydids (Amblycorypha oblongifolia) are
usually green. However, some are bright pink, thanks to an
erythrism mutation (a nice example of early Mendelian reasoning in a
wonderfully titled paper¹). This pink condition is thought to be due
to a dominant mutation (Crew, 2013). Assume that roughly one in
ten thousand katydids is bright pink and that the mutation rate at the
gene underlying this condition is $10^{-5}$. What is the relative
fitness of heterozygotes for the pink mutation?

¹ Wheeler, W. M., 1907 Pink Insect Mutants. The American Naturalist
41(492): 773–780

Figure 11.1: Oblong-winged katydid. Field book of insects (1918). Lutz,
F.E. Illustrations by Edna L. Beutenmüller. Image from the Biodiversity
Heritage Library. Contributed by MBLWHOI Library. Not in copyright.
The genetic load of deleterious alleles. What effect do such deleterious
mutations at mutation–selection balance have on the population? It
is common to quantify the effect of deleterious alleles in terms of a
reduction of the mean relative fitness of the population. For a single
site at which a deleterious mutation is segregating at frequency
$q_e = \mu/(hs)$, the population mean relative fitness is reduced to
$$\bar{w} = 1 - 2 p_e q_e hs - q_e^2 s \approx 1 - 2\mu. \qquad (11.8)$$
Somewhat remarkably, the drop in mean fitness due to a site segre-

gating at mutation–selection balance is independent of the selection
coefficient against the heterozygote; it depends only on the mutation
rate. Intuitively this is because, given a fixed mutation rate, less dele-
terious alleles can rise to a higher equilibrium frequency, and thus
contribute the same total load as more deleterious (rarer) alleles, but
this load is spread across more individuals in the population. Note
that this result applies only if the mutation is not totally recessive, i.e.
if h > 0.
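
The independence of the load from the strength of selection can be
checked numerically. The sketch below plugs $q_e = \mu/(hs)$ into
eqn. (11.8) for a few combinations of h and s; the mutation rate and the
(h, s) pairs are illustrative assumptions.

```python
def mean_fitness_at_balance(mu, h, s):
    """Mean relative fitness (eqn. 11.8) with q at its equilibrium mu/(hs)."""
    q = mu / (h * s)
    p = 1.0 - q
    return 1.0 - 2 * p * q * h * s - q * q * s


if __name__ == "__main__":
    mu = 1e-5  # illustrative per-gene mutation rate
    for h, s in [(0.5, 0.2), (0.1, 0.1), (0.02, 0.5)]:
        load = 1.0 - mean_fitness_at_balance(mu, h, s)
        print(f"h = {h}, s = {s}: load = {load:.3e}  (2*mu = {2 * mu:.1e})")
```

The loads should all sit close to 2µ even though hs, and hence the
equilibrium frequency, differs tenfold across the examples; the small
deviations come from the homozygote term $q_e^2 s$.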
A fitness reduction of 2µ is very small, given that the mutation
rate of a gene is likely $< 10^{-5}$. However, if there are many loci
segregating at mutation–selection balance, small fitness reductions can
accumulate to a substantial so-called genetic load, a major cause of
variation in fitness-related traits among individuals. For example,
the human genome contains over twenty thousand genes, and many
other functional regions, the vast majority of which will be subject
to purifying selection against mutations that disrupt their function.
In humans, most loss of function (LOF) variants, which severely dis-
rupt a protein-coding gene, are found at low frequencies. However,
each human genome typically carries over a hundred LOF variants
(MacArthur et al., 2012; Lek et al., 2016). Not every LOF allele
will be deleterious; some could even be advantageous. However, the
combined load of these LOF alleles must on average lower our fitness,
otherwise selection wouldn’t be removing them from the population.
Each one of us carries a unique set of these LOF alleles, usually in a
heterozygous state. We differ slightly in how many of these alleles we
carry. For example, the left side of Figure 11.2 shows the distribution
of the number of LOF alleles carried by 769 individuals of Dutch an-
cestry. The individuals who carry fewer of these LOF alleles will on
average have higher fitness than those individuals with more.
Figure 11.2: Left) The distribution of LOF alleles in 769 individuals
from the Genome of the Netherlands project. Data from Francioli et al.
The average individual (red line) carries 144 LOF alleles. Right) The
relative fitness of individuals carrying these varying numbers of LOF
alleles, assuming multiplicative selection and a selection coefficient
of $sh = 10^{-2}$ acting against these alleles (Cassa et al., 2017).
Code here. [Plots: left panel, # LOF alleles against Number of
Individuals; right panel, # LOF alleles against Relative Fitness.]
How do these differences across individuals in total LOF mutations
mount up? Well, if we are willing to assume that the fitness
costs of deleterious alleles interact multiplicatively, we can make some
progress. If an individual who carries one LOF mutation has a fitness
$1 - hs$, then an individual who is heterozygous for two LOF mutations
would have fitness $(1 - hs)^2$, and an individual who is heterozygous
for L LOF alleles would have fitness $(1 - hs)^L$. The right-hand side of
Figure 11.2 shows the predicted fitness of individuals carrying varying
numbers of LOF alleles, relative to the mean fitness of the sample,
using this multiplicative model. We don’t yet know how much lower the
fitness of these individuals really is, nor do we know how most of these
LOF alleles manifest their fitness consequences through disease and
other mechanisms. However, it’s a reasonable guess that this variation
in LOF alleles, presumably maintained by mutation-selection balance,
is a major source of variation in fitness.
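
To get a feel for the numbers under this multiplicative model, the
sketch below computes the fitness of a carrier of n LOF alleles relative
to a carrier of the average number (144). Note the simplification: the
figure scales by the mean fitness of the sample, whereas this sketch
scales by the fitness of an individual carrying the mean count, and the
function name and defaults are illustrative.

```python
def relative_fitness(n_lof, reference=144, hs=1e-2):
    """Fitness of a carrier of n_lof heterozygous LOF alleles, relative to a
    carrier of `reference` alleles, assuming multiplicative effects."""
    return (1.0 - hs) ** (n_lof - reference)


if __name__ == "__main__":
    for n in (130, 144, 160):
        print(f"{n} LOF alleles: relative fitness = {relative_fitness(n):.3f}")
```

Each additional LOF allele multiplies fitness by (1 − hs), so with
hs = 0.01 a difference of a dozen or so alleles between individuals
translates into fitness differences of more than ten percent.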
11.0.2 Inbreeding depression

All else being equal, eqn. (11.6) suggests that mutations that have a
smaller effect in the heterozygote can segregate at higher frequency
under mutation–selection balance. As a consequence, alleles that have
strongly deleterious effects in the homozygous state can still segregate
at low frequencies in the population, as long as they do not have too
strong a deleterious effect in heterozygotes. Thus, outbred populations
may have many alleles with recessive deleterious effects segregating
within them.
Question 2. Assume that a deleterious allele has a relative fitness
0.99 in heterozygotes and a relative fitness 0.2 when present in
the homozygous state. Assume that the deleterious allele is at a
frequency $10^{-3}$ at birth and the genotype frequencies follow from HWE.
Only considering the fitness effects of this locus, and measuring fitness
relative to the most fit genotype, answer the following questions:
A) What is the average fitness of an individual in the population?
B) What is the average fitness of the child of a full-sib mating?
One consequence of segregating for low-frequency recessive deleterious
alleles is that inbreeding can reduce fitness. In typically outbred
populations, the mean fitness of individuals decreases with the
inbreeding coefficient, i.e. so-called ‘inbreeding depression’ is a
common observation. This widespread observation dates back to systematic
surveys of inbreeding depression by Darwin (1876). Inbreeding
depression is likely primarily a consequence of being homozygous at
many loci for alleles with recessive deleterious effects.
Figure 11.3: Data showing inbreeding depression over different degrees
of inbreeding in S. latifolia. Each point is the mean seed germination
rate for a different family cross. Data from Richards. Code here.
[Plot: x axis, Inbreeding coeff; y axis, Germination Rate; groups,
Outbred, Half-sib, Full-sib, and Two gens. sib.]
One example of inbreeding depression is shown in Figure 11.3.

White campion (Silene latifolia) is a dioecious flowering plant; dioe-
cious means that the males and females are separate individuals.
Richards performed crosses to create offspring who were outbred,
the offspring of half-sibs, full-sibs, and of two generations of full-sib
mating. He measured their germination success, which is plotted in
Figure 11.3. Note how the fitness of individuals declines with increased
inbreeding.
Purging the inbreeding load. Populations that regularly inbreed over

sustained periods of time are expected to partially purge this load
of deleterious alleles. This is because such populations have exposed
many of these alleles in a homozygous state, and so selection can more
readily remove these alleles from the population.
Figure 11.4: White campion (S. latifolia). Deutschlands Flora in
Abbildungen (1796). Johann Georg Sturm (Painter: Jacob Sturm). Public
Domain, wikimedia.

If the population has sustained inbreeding, such that individuals
in the population have an inbreeding coefficient F, deleterious alleles
at each locus will find a new equilibrium frequency. Assuming
the mutation-selection model, now with inbreeding, the equilibrium
frequency is
$$q_e = \frac{\mu}{\left(h(1 - F) + F\right)s}. \qquad (11.9)$$
The frequency of the deleterious allele is decreased because, with
inbreeding, the allele is more often expressed in homozygotes and
therefore more often exposed to selection. Thus, all else being equal,
populations with a high degree of inbreeding will purge their load.
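
A small sketch of eqn. (11.9) shows how quickly the equilibrium
frequency falls as the inbreeding coefficient rises. The parameter
values (a largely recessive allele with h = 0.05 and s = 0.5) and the
function name are illustrative assumptions.

```python
def q_equilibrium(mu, h, s, F=0.0):
    """Equilibrium frequency from eqn. 11.9; F = 0 recovers mu/(hs)."""
    return mu / ((h * (1.0 - F) + F) * s)


if __name__ == "__main__":
    mu, h, s = 1e-5, 0.05, 0.5  # illustrative values
    for F in (0.0, 0.0625, 0.25, 1.0):
        print(f"F = {F}: q_e = {q_equilibrium(mu, h, s, F):.2e}")
```

With F = 0 the expression reduces to µ/(hs), and with F = 1 (complete
inbreeding) it reduces to µ/s, since every copy of the allele is then
exposed in a homozygote.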
11.0.3 Migration–selection balance
Another reason for the persistence of deleterious alleles in a population
is that there is a constant influx of maladaptive alleles from other pop-
ulations where these alleles are locally adaptive. Migration–selection
balance seems unlikely to be as broad an explanation for the persis-
tence of deleterious alleles genome-wide as mutation-selection balance.
However, a brief discussion of such alleles is worthwhile, as it helps to
inform our ideas about local adaptation.
Local adaptation can occur over a range of geographic scales. Local
adaptation is relatively unimpeded by migration at broad geographic
scales, where selection pressures change over distances much greater
than those over which individuals typically migrate in a number of
generations. Adaptation can, however, potentially occur on much finer
geographic scales, from kilometers down to meters in some species. On
such small scales, dispersal is surely rapidly moving alleles between
environments, but local adaptation is maintained by the continued
action of selection. An example of adaptation at fine scales is shown in
Figure 11.6. Jain and Bradshaw (1966) studied the patterns of
heavy-metal resistance in plants on mine tailings and in nearby
meadows, a set of classic studies of population differences maintained
by local adaptation to different soils. Even at these very short
geographic scales, over which seed and pollen will definitely move, we
see strong local adaptation. Zinc-intolerant alleles are nearly absent
from the mine tailings because they prevent plants from growing on these
zinc-heavy soils; conversely, zinc-tolerant alleles do not spread into
the meadow populations, likely due to some trade-off or fitness cost of
zinc-tolerance.

Figure 11.5: Sweet vernal grass (Anthoxanthum odoratum). Billeder af
nordens flora (1917). Mentz, A & Ostenfeld, C H. Image from the
Biodiversity Heritage Library. Contributed by New York Botanical
Garden. Not in copyright.

Figure 11.6: Data showing the Zinc tolerance of Anthoxanthum odoratum
on and off of the Trelogan Mine, Flintshire, North Wales. The numbers
along the top give the soil contamination of Zinc in parts per million.
Data from Jain and Bradshaw (1966). Code here. [Plot: x axis, Distance
to Mine Boundary (meters); y axis, Zinc Tolerance; regions, Mine and
Off Mine.]

As a first pass at developing a model of local adaptation, let’s
consider a haploid two-allele model with two different populations, see
Figure 11.7, where the relative fitnesses of our alleles are as follows

allele          1      2
population 1    1      1−s
population 2    1−s    1

As a simple model of migration, let’s suppose within a population a
fraction m of individuals are migrants from the other population, and
1 − m individuals are from the same population.
To quickly sketch an equilibrium solution to this scenario, we’ll take
an approach analogous to our mutation-selection balance model. To do
this, let’s assume that selection is strong compared to migration (s ≫
m), such that allele 1 will be almost fixed in population 1 and allele
2 will be almost fixed in population 2. If that is the case, migration
changes the frequency of allele 2 in population 1 ($q_1$) by
$$\Delta_{Mig.}\, q_1 \approx m \qquad (11.10)$$
while as noted above $\Delta_S q_1 = -s q_1$, so that migration and
selection are at an equilibrium when $0 = \Delta_S q_1 + \Delta_{Mig.}\, q_1$,
i.e. an equilibrium frequency of allele 2 in population 1 of
$$q_{e,1} = \frac{m}{s}. \qquad (11.11)$$

Figure 11.7: Setup of a two-population haploid model of local
adaptation. [Diagram: a simple haploid model of migration–selection
balance. In Population 1 alleles A1 and A2 have fitnesses 1 and 1–s; in
Population 2 they have fitnesses 1–s and 1; the populations exchange
migrants at rate m in each direction.]
Here, migration is playing the role of mutation and so migration–
selection balance (at least under strong selection) is analogous to
mutation–selection balance.
We can use this same model by analogy for the case of migration–
selection balance in a diploid model. For the diploid case, we replace
our haploid s by the cost to heterozygotes hs from our directional
selection model, resulting in a diploid migration–selection balance
equilibrium frequency of
$$q_{e,1} = \frac{m}{hs}. \qquad (11.12)$$
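
The haploid result can be checked by iterating the two-population model
directly, as sketched below. The life cycle assumed here (selection
within each population, then symmetric migration), the function names,
and the parameter values are illustrative assumptions rather than
anything specified in the text.

```python
def migrate_select_generation(q1, q2, m, s):
    """q1, q2: frequency of allele 2 in populations 1 and 2."""
    # Selection: allele 2 has fitness 1-s in population 1,
    # allele 1 has fitness 1-s in population 2.
    q1_sel = q1 * (1.0 - s) / (q1 * (1.0 - s) + (1.0 - q1))
    q2_sel = q2 / (q2 + (1.0 - q2) * (1.0 - s))
    # Migration: a fraction m of each population comes from the other one.
    return ((1.0 - m) * q1_sel + m * q2_sel,
            (1.0 - m) * q2_sel + m * q1_sel)


def migration_selection_equilibrium(m, s, generations=5000):
    """Start with each population fixed for its locally favoured allele."""
    q1, q2 = 0.0, 1.0
    for _ in range(generations):
        q1, q2 = migrate_select_generation(q1, q2, m, s)
    return q1, q2


if __name__ == "__main__":
    m, s = 1e-3, 0.1  # illustrative values with s >> m
    q1, q2 = migration_selection_equilibrium(m, s)
    print("allele 2 in population 1:", q1)
    print("m / s                   :", m / s)
```

When s ≫ m the iterated frequency of the maladaptive allele should sit
close to m/s; as m approaches s the approximation degrades, because the
source population is no longer nearly fixed for its own allele.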
Figure 11.8: Frequency of melanic mice on the lava flow, and at nearby
locations (diamonds). Frequency of MC1R melanic allele at same
locations. Data from Hoekstra et al. (2004). Code here. [Plot: x axis,
Location (Xmas, Tule, West, Mid, East, Oneill); y axis, Frequency;
series, Melanic Phenotype and MC1R D allele.]

As an example of fine-scale local adaptation due to a single locus,
consider the case of the rock pocket mice adapting to lava flows.
Throughout the deserts of the American Southwest there are old lava
flows, where the rocks and soils are much darker than the surrounding
desert.
Many populations of small animals that live on these flows have
evolved darker pigmentation to be cryptic against this dark substrate
and better avoid visual predators. One example of such a locally
adapted population is the rock pocket mice (Chaetodipus intermedius)
who live on the Pinacate lava flow on the Arizona-Mexico
border, studied by Hoekstra et al. (2004). These mice have much
darker, more melanic pelts than the mice who live on nearby rocky
outcrops (see Figure 11.8). Nachman et al. (2003) determined that
a dominant allele (D) at MC1R is the primary determinant of this
melanic phenotype. The frequency of this allele across study sites is
shown in Figure 11.8. Hoekstra et al. (2004) found that other,
unlinked markers showed little differentiation over these populations,
suggesting that the migration rate is high.

Figure 11.9: Two species from the genus Chaetodipus, pocket mice,
formerly known as Perognathus. Wild animals of North America, intimate
studies of big and little creatures of the mammal kingdom (1918),
Nelson, E. W. Image from the Biodiversity Heritage Library. Contributed
by American Museum of Natural History Library. Not in copyright.
Question 3. Hoekstra et al. (2004) found that the dark D
allele was at 3% frequency at the Tule Mountains study site. Using
$F_{ST}$-based approaches, for unlinked markers, they estimated that the
per individual migration rate was $m = 7.0 \times 10^{-4}$ per generation
between this site and the Pinacate lava flow. What is the selection
coefficient acting against the dark D allele at the Tule Mountains site?
The width of a genetic cline. We can also extend these ideas beyond
our discrete model to a model of a population spread out on a landscape
where individuals migrate in a more continuous fashion. For
simplicity, let’s assume a one dimensional habitat, where the habitat
makes a sharp transition in the middle of our region. You could imagine
this to be a set of populations sampled along a transect through
some environmental transition. Our individuals disperse to live on
average σ miles away from where they were born (we can think of this
as our individuals migrating a random distance drawn from a normal
distribution, with mean zero, and σ being the standard deviation of
this distribution). We’ll think of a bi-allelic model where the
homozygotes for allele 2 have an additive selective advantage s over
allele 1 homozygotes to the east of our habitat transition (left of
zero in Figure 11.10). This flips to allele 1 having the same advantage
s west of the transition (right of zero).

“Upon an island hard to reach, the East Beast sits upon his beach. Upon
the west beach sits the West Beast. Each beach beast thinks he’s the
best beast.” – Theodor Seuss Geisel. If you’ve read this send Prof Coop
a picture of the East and West Beast.
Figure 11.10: An equilibrium cline in allele frequency (the frequency of allele 2, q(x), is shown). Our individuals disperse an average distance of σ = 1 mile per generation, and our allele 2 has a relative fitness of 1 + s and 1 − s on either side of the environmental change at x = 0. Code here.
With this setup, we get an equilibrium distribution of our two alleles, where to the left of zero our allele 2 is at higher frequency, while to the right of zero allele 1 predominates. As we cross from the left to the right side of our range, the frequency of our allele 2 decreases in a smooth cline. The frequency of allele 2, q(x), is shown as a function of location along the cline for a variety of selection coefficients (s) in Figure 11.10. The width of this cline, i.e. the geographic distance over which the allele frequency changes, depends on the relative strengths of dispersal and selection. If selection is strong compared to dispersal, then selection acts to remove maladaptive alleles much faster than migration acts to move alleles across the environmental transition. Thus the allele frequency transition would be very rapid, and the cline narrow, as we move across the environmental transition. In contrast, if individuals disperse long distances and selection is weak, many alleles are being moved back and forth over the environmental transition much faster than selection can act against these alleles and so the cline would be very wide.

Figure 11.11: An equilibrium cline in allele frequency from Figure 11.10, s = 0.01. Vertical lines show the cline width. The diagonal line shows the tangent to the cline at its midpoint. Axes: position x (km) against frequency of allele 2, q(x). Code here.

The width of our cline, i.e. the distance over which we make this shift from allele 2 to allele 1 predominating, can be defined in a number of different ways. One way to define the cline width, which is simple to define but perhaps hard to measure accurately, is via the slope (i.e. the tangent) of q(x) at x = 0. See Figure 11.11. Under this definition, the cline width is approximately
$$0.6\,\sigma/\sqrt{s} \ \text{miles}, \tag{11.13}$$
note that the units are miles here just because we defined the average dispersal distance (σ) in miles above. Thus the cline will be wider if individuals disperse further, higher σ, and if selection is weaker, smaller s. The appendix below talks through the math underlying these ideas in more detail.

Figure 11.12: Allele frequency clines of two pesticide resistance alleles, at the Ace 1 and Ester genes, in the mosquito Culex pipiens. The dotted line shows where we move from pesticide-treated to untreated areas as we move away from the French coast. The dots show observed allele frequencies, the solid lines clines fit under a migration-selection balance model of a cline. These allele frequencies represent collections over two summers; the frequencies of the alleles are substantially reduced in the winter due to the reduced use of pesticides. Data from Lenormand et al. (1999). Panels: Ace 1 and Ester; axes: distance from coast (km) against allele frequency, with treated and untreated areas marked.
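As a small numerical illustration of how the cline width in eqn. (11.13) scales with dispersal and selection, the following sketch evaluates that expression for a few parameter values. The function name `cline_width`, the `prefactor` argument, and the parameter values are all hypothetical choices made for this example; the prefactor of 0.6 is simply taken from eqn. (11.13) as stated.

```python
import math

def cline_width(sigma, s, prefactor=0.6):
    """Approximate cline width, prefactor * sigma / sqrt(s), following eqn. (11.13).

    sigma: dispersal distance per generation (any length unit)
    s: selection coefficient (must be positive)
    prefactor: the 0.6 quoted in eqn. (11.13); kept as an argument because the
               sigma / sqrt(s) scaling is the point of this example
    """
    if s <= 0:
        raise ValueError("s must be positive")
    return prefactor * sigma / math.sqrt(s)

# Illustrative values only (not estimates from any dataset).
for sigma, s in [(1.0, 0.01), (1.0, 0.04), (2.0, 0.01)]:
    print(f"sigma={sigma}, s={s}: width ~ {cline_width(sigma, s):.2f}")
```

Whatever the prefactor, quadrupling s halves the width, while doubling σ doubles it.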
Lenormand et al. (1999) collected mosquitoes (Culex pipiens) in a north–south transect moving away from the Southern French coast. Areas near the coast were treated with pesticides, and the mosquitoes have evolved resistance, but areas just a few tens of kilometers from the coast were untreated. Lenormand et al. estimated the frequency of two unlinked, pesticide-resistance alleles, and found them at high frequency near the coast but found that their frequencies declined rapidly moving inland. Lenormand et al. fit migration-selection cline models to their data, similar to those in Figure 11.10, with the pesticide-resistance alleles having a selective advantage (s) in treated areas and a cost (c) in untreated areas (they didn’t enforce the selective advantage and cost being symmetric). They estimated a higher selective advantage for the Ace 1 allele than the Ester allele (s = 0.33 and s = 0.19 respectively) and a higher cost to the Ace 1 allele than the Ester allele in untreated areas (c = 0.11 and c = 0.07 respectively), potentially explaining the less extreme cline for the Ester allele than the Ace 1 allele. Despite these strong selection pressures we still see a cline over tens of kilometers because dispersal is relatively high (σ = 6.6 km per generation).

Figure 11.13: mosquito (Culex pipiens). Domestic mosquitoes (1939). Bishopp, F. C. Image from the Biodiversity Heritage Library. Contributed by U.S. Department of Agriculture, National Agricultural Library. Not in copyright.
Hybrid zones. Local adaptation isn’t the only way that selection can generate strong spatial patterns. We can also see strong selection-driven clines when partially reproductively isolated species spread back into secondary contact: they can hybridize, bringing together alleles that may not work well with each other. One simple model of this is to think about an under-dominant polymorphism, i.e. where the heterozygote has lower fitness than either homozygote. The two ancestral populations are alternatively fixed for the two fitter homozygote states, e.g. ancestral population 1 is fixed for A1 A1 and ancestral population 2 for A2 A2. The hybrid population forming at the mating edge between the two ancestral populations has a high frequency of the less fit heterozygotes. Thus hybrids are at a disadvantage, potentially acting to keep the two populations from collapsing into each other.
Figure 11.14: The frequency of the southern neo-X chromosome moving along a valley transect (more southern locations to the right of the graph). This represents data from four different valleys in the French Alps over less than a kilometer; each point represents a sample of 20 males. The red curve is the fitted cline under a model of heterozygote disadvantage (Bazykin, 1969). Data from Barton and Hewitt (1981). Axes: distance (m), from −500 to 500, against neo-X frequency. Code here.
Two previously isolated populations of the short-horned grasshopper Podisma pedestris have spread into secondary contact in the French Alps, probably after the last ice age. The population that has spread into the Alps from the south has a large section of novel X chromosome, due to a chromosomal fusion. This ‘neo-X’ is absent in the populations that spread from the North into the Alps. The two populations meet in many valleys running through the Alps, and repeatedly form a narrow hybrid zone, with the frequency of the neo-X chromosome forming a very steep cline transitioning in frequency over a few hundred meters (Barton and Hewitt, 1981). One potential reason for this steep cline is that females who are heterozygous for the neo-X (neo-X/old-X) may have reduced fitness, consistent with an underdominant polymorphism. The neo-X allele cannot spread into the northern population as it cannot increase in frequency when rare. Conversely the northern population cannot displace the neo-X, as the old-X is likewise at a disadvantage when rare. The spatial distribution at this locus is a tension zone between the two populations, where neither population can make ground on the other due to the low fitness of the hybrid.

Figure 11.15: 7. Podisma pedestris, a species of short-horned grasshoppers; from a page illustrating Orthoptera. Illustration from Brockhaus and Efron Encyclopedic Dictionary (1890). Image wikimedia, public domain.
We can use our same continuous model of migration and selection to study this setup. Assuming that the homozygotes are equally fit, and that the heterozygotes’ relative fitness is reduced by a selection coefficient $s_h$, the width of the cline is
$$\frac{\sigma}{\sqrt{s_h}} \tag{11.14}$$
The stronger the selection the more abrupt the transition between the populations. These wingless grasshoppers move σ ∼ 20 meters a generation. Thus a reduction in the relative fitness of the hybrid would be needed to explain this hybrid zone with a width of ∼ 800m.
More generally we can see tension zones arise when hybrids have reduced fitness compared to either species. For example, this can occur due to bad epistatic interactions between alleles from each species. If selection is strong enough on hybrids, often because many loci are involved in incompatibilities between the species, the entire genome can be tied up in a tension zone between the two species.
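To see what size of fitness reduction the grasshopper numbers imply, we can invert eqn. (11.14). The sketch below takes that expression at face value (width = σ/√s_h, with no additional constant of proportionality, which is an assumption of this example); the function name `hybrid_selection_from_width` is hypothetical.

```python
def hybrid_selection_from_width(sigma, width):
    """Invert eqn. (11.14), width = sigma / sqrt(s_h), to give s_h = (sigma / width)**2.

    sigma and width must be in the same length units.
    """
    if sigma <= 0 or width <= 0:
        raise ValueError("sigma and width must be positive")
    return (sigma / width) ** 2

# Values quoted in the text: sigma ~ 20 m per generation, hybrid zone width ~ 800 m.
s_h = hybrid_selection_from_width(sigma=20.0, width=800.0)
print(f"implied reduction in hybrid fitness, s_h ~ {s_h:.2e}")
```

Because s_h depends on the square of σ/width, modest uncertainty in the dispersal distance translates into a larger uncertainty in the inferred selection coefficient.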
Appendix: Some theory of the spatial distribution of allele frequencies under deterministic models of selection
Imagine a continuous haploid population spread out along a line. Each individual disperses a random distance ∆x from its birthplace to the location where it reproduces, where ∆x is drawn from the probability density g(·). To make life simple, we will assume that g(∆x) is normally distributed with mean zero and standard deviation σ, i.e. migration is unbiased and individuals migrate an average distance of σ.
The frequency of allele 2 at time t in the population at spatial location x is q(x, t). Assuming that only dispersal occurs, how does our allele frequency change in the next generation? Our allele frequency in the next generation at location x reflects the migration from different locations in the preceding generation. Our population at location x receives a contribution g(∆x)q(x + ∆x, t) of allele 2 from the population at location x + ∆x, such that the frequency of our allele at x in the next generation is
$$q(x, t+1) = \int_{-\infty}^{\infty} g(\Delta x)\, q(x + \Delta x, t)\, d\Delta x. \tag{11.15}$$
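To obtain q(x + ∆x, t), let’s take a Taylor series expansion of q(x, t):
$$q(x + \Delta x, t) = q(x, t) + \Delta x \frac{dq(x, t)}{dx} + \frac{1}{2} (\Delta x)^2 \frac{d^2 q(x, t)}{dx^2} + \cdots \tag{11.16}$$
then
$$q(x, t+1) = q(x, t) + \left( \int_{-\infty}^{\infty} \Delta x\, g(\Delta x)\, d\Delta x \right) \frac{dq(x, t)}{dx} + \frac{1}{2} \left( \int_{-\infty}^{\infty} (\Delta x)^2 g(\Delta x)\, d\Delta x \right) \frac{d^2 q(x, t)}{dx^2} + \cdots \tag{11.17}$$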
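Because g(·) has a mean of zero, $\int_{-\infty}^{\infty} \Delta x\, g(\Delta x)\, d\Delta x = 0$, and because g(·) has variance σ², $\int_{-\infty}^{\infty} (\Delta x)^2 g(\Delta x)\, d\Delta x = \sigma^2$. We neglect the higher order terms in our Taylor series expansion: the odd moments of the normal distribution are zero, and the higher even moments scale as σ⁴ and higher powers of σ, which are negligible provided the allele frequency changes slowly over the scale of dispersal. Looking at the change in allele frequency, ∆q(x, t) = q(x, t + 1) − q(x, t), we find
$$\Delta q(x, t) = \frac{\sigma^2}{2} \frac{d^2 q(x, t)}{dx^2} \tag{11.18}$$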
This is a diffusion equation, so that migration is acting to smooth out allele frequency differences with a diffusion constant of σ²/2. This is exactly analogous to the equation describing how a gas diffuses out to equal density, as both particles in a gas and our individuals of type 2 are performing Brownian motion (blurring our eyes and seeing time as continuous).
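The diffusion approximation can be checked numerically. The sketch below compares the exact one-generation change under normal dispersal (the integral in eqn. (11.15), evaluated by a simple midpoint rule) with the approximation (σ²/2) d²q/dx² from eqn. (11.18). The smooth test cline `q_example`, its width parameter `w`, and the evaluation point are hypothetical choices made only for this check.

```python
import math

def gaussian_pdf(dx, sigma):
    """Normal dispersal density g(dx) with mean zero and standard deviation sigma."""
    return math.exp(-dx**2 / (2 * sigma**2)) / (sigma * math.sqrt(2 * math.pi))

def q_example(x, w=10.0):
    """Hypothetical smooth cline, used only as a test function."""
    return 1.0 / (1.0 + math.exp(x / w))

def q_second_derivative(x, w=10.0):
    """Analytic second derivative of q_example with respect to x."""
    q = q_example(x, w)
    return q * (1 - q) * (1 - 2 * q) / w**2

def dispersal_change(x, sigma, step=0.01, span=8.0):
    """One-generation change from eqn. (11.15), integrating over +/- span * sigma."""
    n = int(2 * span * sigma / step)
    total = 0.0
    for i in range(n):
        dx = -span * sigma + (i + 0.5) * step  # midpoint of the i-th slice
        total += gaussian_pdf(dx, sigma) * q_example(x + dx) * step
    return total - q_example(x)

def diffusion_change(x, sigma):
    """One-generation change predicted by eqn. (11.18)."""
    return 0.5 * sigma**2 * q_second_derivative(x)

exact = dispersal_change(5.0, sigma=1.0)
approx = diffusion_change(5.0, sigma=1.0)
print(f"exact change: {exact:.3e}, diffusion approximation: {approx:.3e}")
```

The two agree closely here because the test cline changes slowly relative to σ; making `w` comparable to σ should make the neglected higher-order terms visible.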
We will now introduce fitness differences into our model and set the relative fitnesses of allele 1 and 2 at location x to be 1 and 1 + sγ(x). To make progress in this model, we’ll have to assume that selection isn’t too strong, i.e. sγ(x) ≪ 1 for all x. The change in frequency of allele 2 obtained within a generation due to selection is
$$q'(x, t) - q(x, t) \approx s\gamma(x)\, q(x, t) \big( 1 - q(x, t) \big) \tag{11.19}$$
i.e. logistic growth of our favoured allele at location x. Putting our selection and migration terms together, we find the total change in allele frequency at location x in one generation is
$$q(x, t + 1) - q(x, t) = s\gamma(x)\, q(x, t) \big( 1 - q(x, t) \big) + \frac{\sigma^2}{2} \frac{d^2 q(x, t)}{dx^2} \tag{11.20}$$
In deriving this result, we have essentially assumed that migration
acted upon our original allele frequencies before selection, and in doing
so have ignored terms of the order of σs.
Figure 11.16: An equilibrium cline in allele frequency. Our individuals disperse an average distance of σ = 1 km per generation, and our allele 2 has a relative fitness of 1 + s and 1 − s on either side of the environmental change at x = 0.
The cline in allele frequency associated with a sharp environmental transition. To make progress, let’s consider a simple model of local
adaptation where the environment abruptly changes. Specifically, we
assume that γ(x) = 1 for x < 0 and γ(x) = −1 for x ≥ 0, i.e. our allele
2 has a selective advantage at locations to the left of zero, while this
allele is at a disadvantage to the right of zero. In this case we can get
an equilibrium distribution of our two alleles, where to the left of zero
our allele 2 is at higher frequency, while to the right of zero allele 1
predominates. As we cross from the left to the right side of our range,
the frequency of our allele 2 decreases in a smooth cline.
Our equilibrium spatial distribution of allele frequencies can be found by setting the left-hand side of eqn. (11.20) to zero to arrive at
$$s\gamma(x)\, q(x) \big(1 - q(x)\big) = -\frac{\sigma^2}{2} \frac{d^2 q(x)}{dx^2} \tag{11.21}$$
We then could solve this differential equation with appropriate boundary conditions (q(−∞) = 1 and q(∞) = 0) to arrive at the appropriate functional form for our cline. While we won’t go into the solution of this equation here, we can note that by dividing our distance x by $\ell = \sigma/\sqrt{s}$, we can remove the effect of our parameters from the above equation. This compound parameter ℓ is the characteristic length of our cline, and it is this parameter which determines over what geographic scale we change from allele 2 predominating to allele 1 predominating as we move across our environmental shift.
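To make the rescaling step explicit, here is a sketch of the change of variables. The rescaled position y = x/ℓ is a symbol introduced only for this illustration; γ depends only on the sign of the position, so it is unchanged by the rescaling.

```latex
% Substitute x = \ell y into eqn. (11.21); by the chain rule,
% d^2 q / dx^2 = (1 / \ell^2) d^2 q / dy^2.
\begin{align*}
s\,\gamma\, q (1 - q) &= -\frac{\sigma^2}{2 \ell^2} \frac{d^2 q}{dy^2}
  && \text{with } x = \ell y \\
\gamma\, q (1 - q) &= -\frac{1}{2} \frac{d^2 q}{dy^2}
  && \text{choosing } \ell = \sigma / \sqrt{s}, \text{ so that } \sigma^2 / \ell^2 = s
\end{align*}
```

The rescaled equation contains neither σ nor s, so every equilibrium cline in this model has the same shape once distance is measured in units of ℓ.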
12
The Impact of Genetic Drift on Selected Alleles
“Natural selection is a mechanism for generating an exceedingly high degree of improbability.” –R.A. Fisher
In the previous chapter we assumed that the selection acting on our alleles was strong enough that we could ignore the action of genetic
drift in shaping allele frequencies. However, genetic drift affects all al-
leles, and so in this chapter we explore the interaction of selection and
drift. Strongly selected alleles can be lost from the population via drift
when they are rare in the population, while both weakly beneficial and
weakly deleterious alleles are subject to the random whims of genetic
drift throughout their entire time in the population. Understanding
the interaction of selection and genetic drift is key to understand-
ing the extent to which small populations may be mutation-limited
in their rates of adaptation, and how rates of molecular and genome
evolution may differ across taxa.
12.1 Stochastic loss of strongly selected alleles

Even strongly beneficial alleles can be lost from the population when
they are sufficiently rare. This is because the number of offspring left
by individuals to the next generation is fundamentally stochastic. A
selection coefficient of s=1% is a strong selection coefficient, which can
drive an allele through the population in a few hundred generations
once the allele is established. However, if individuals have on average a
small number of offspring per generation, the first individual to carry
our beneficial allele, who has on average 1% more children than their
peers, could easily have zero offspring, leading to the loss of our allele
before it ever gets a chance to spread.
To take a first stab at this problem, let’s think of a very large haploid population in which a single individual starts with the selected allele, and ask about the probability of eventual loss of our selected allele starting from this single copy. To derive this probability of loss ($p_L$), we’ll make use of a simple argument (derived from branching processes; Fisher, 1923; Haldane, 1927). Our selected allele will be eventually lost from the population if every individual with the allele fails to leave descendants. We can think about different cases:
Figure 12.1: Four different outcomes of a selected allele present as a single copy in the population, leaving zero, one, two, or three offspring in the next generation. Panels A to D show these outcomes, which occur with probabilities $P_0$, $P_1$, $P_2$, $P_3$ and lead to eventual loss with probabilities $1$, $p_L$, $p_L^2$, $p_L^3$ respectively.

1. In our first generation, with probability $P_0$ our individual allele leaves no copies of itself to the next generation, in which case our allele is lost (Figure 12.1A).
2. Alternatively, our allele could leave one copy of itself to the next generation (with probability $P_1$), in which case with probability $p_L$ this copy eventually goes extinct (Figure 12.1B).
3. Our allele could leave two copies of itself to the next generation (with probability $P_2$), in which case with probability $p_L^2$ both of these copies eventually go extinct (Figure 12.1C).
4. More generally, our allele could leave k copies (k > 0) of itself to the next generation (with probability $P_k$), in which case with probability $p_L^k$ all of these copies eventually go extinct (e.g. Figure 12.1D).
Summing over these probabilities, we see that
$$p_L = \sum_{k=0}^{\infty} P_k\, p_L^k \tag{12.1}$$
We’ll now need to specify $P_k$, the probability that an individual carrying our selected allele has k offspring. In order for this population to stay constant in size, we’ll assume that individuals without the selected mutation have on average one offspring per generation, while individuals with our selected allele have on average 1 + s offspring per generation. We’ll assume that the number of offspring an individual has is Poisson distributed with mean given by 1 or 1 + s, i.e. the probability that an individual with the selected allele has i children is
$$P_i = \frac{(1+s)^i e^{-(1+s)}}{i!} \tag{12.2}$$
Substituting $P_k$ into the equation above, we see
$$p_L = \sum_{k=0}^{\infty} \frac{(1+s)^k e^{-(1+s)}}{k!}\, p_L^k = e^{-(1+s)} \left( \sum_{k=0}^{\infty} \frac{\big(p_L (1+s)\big)^k}{k!} \right) \tag{12.3}$$
The term in the brackets is itself an exponential expansion, so we can rewrite this equation as
$$p_L = e^{(1+s)(p_L - 1)} \tag{12.4}$$
Solving for $p_L$ would give us our probability of loss for any selection coefficient. Let’s rewrite our result in terms of the probability of escaping loss, $p_F = 1 - p_L$. We can rewrite eqn. (12.4) as
$$1 - p_F = e^{-p_F (1+s)} \tag{12.5}$$
To gain an approximate solution for this result, let’s consider a small selection coefficient s ≪ 1 such that $p_F \ll 1$ and then use a Taylor series to expand out the exponential on the right hand side (ignoring terms of higher order than $s^2$ and $p_F^2$):
$$1 - p_F \approx 1 - p_F (1+s) + p_F^2 (1+s)^2 / 2 \tag{12.6}$$
Cancelling terms and dividing through by $p_F$ gives $s \approx p_F (1+s)^2 / 2$, and since $(1+s)^2 \approx 1$ we find that
$$p_F \approx 2s. \tag{12.7}$$
Thus even an allele with a 1% selection coefficient has a 98% probability of being lost when it is first introduced into the population by mutation.
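We can check how good the 2s approximation is by solving eqn. (12.4) numerically. The sketch below uses fixed-point iteration starting from zero; the function name `prob_loss`, the tolerance, and the example selection coefficients are hypothetical choices for illustration.

```python
import math

def prob_loss(s, tol=1e-14, max_iter=1_000_000):
    """Solve p_L = exp((1 + s) * (p_L - 1)), eqn. (12.4), by fixed-point iteration.

    Starting from p_L = 0, the n-th iterate is the probability that the lineage
    has died out by generation n, so the sequence increases towards the
    probability of eventual loss.
    """
    p_loss = 0.0
    for _ in range(max_iter):
        updated = math.exp((1.0 + s) * (p_loss - 1.0))
        if abs(updated - p_loss) < tol:
            return updated
        p_loss = updated
    return p_loss

for s in (0.01, 0.05, 0.1):
    p_escape = 1.0 - prob_loss(s)
    print(f"s={s}: numerical p_F={p_escape:.4f}, approximation 2s={2 * s:.4f}")
```

The numerical solution sits slightly below 2s, with the gap growing as s increases, as expected from the terms dropped in the Taylor expansion.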
If the mutation rate towards our advantageous allele is µ, and there are N individuals in our haploid population, then Nµ advantageous mutations arise per generation. Each of these new beneficial mutations has a probability $p_F$ of fixing. Thus the number of advantageous mutations arising per generation that will eventually fix in the population is $N\mu p_F$, and the expected waiting time for a mutation that will fix to arise is the reciprocal of this: $1/(N\mu p_F)$. Thus, in adapting to a novel selection pressure via new mutations, the population size, the mutational target size, and the selective advantage of new mutations all matter. One reason why combinations of drugs are used against pathogens like HIV and malaria is that, even if the pathogen adapts to one of the drugs, the pathogen load (N) of the patient is greatly reduced, making it very unlikely that the population will manage to fix a second drug-resistant allele.
Diploid model of stochastic loss of strongly selected alleles. We can also adapt this result to a diploid setting. Assuming that heterozygotes for the 1 allele have on average 1 + hs children, the probability allele 1 is not lost, starting from a single copy in the population, is
$$p_F = 2hs \tag{12.8}$$
for h > 0. Note this is a slightly different parameterization from our diploid model in the previous chapter; here h is the dominance of our
positively selected allele, with h = 1 corresponding to the full se-
lective advantage expressed in an individual with only a single copy.
Thus the probability that a beneficial allele is not lost depends just
on the relative fitness advantage of the heterozygote; this is because
when the allele is rare it is usually present in heterozygotes and so its
probability of escaping loss just depends on the fitness of these indi-
viduals compared to homozygotes for the ancestral allele (assuming an
outbred population).
Figure 12.2: Map of G6PD-deficiency allele frequencies
across Asia. The pie chart shows the
frequency of G6PD-deficiency alleles.
The size of the pie chart indicates the
number of G6PD-deficient individuals
sampled. Countries with endemic
malaria are colored yellow. Figure
from Howes et al. (2013), licensed
under CC BY 4.0.
Over roughly the past ten thousand years, adaptive alleles conferring resistance to malaria have arisen in a number of genes and spread through human populations in areas where malaria is endemic (Kwiatkowski, 2005). One particularly impressive case of convergent evolution in response to selection pressures imposed by malaria is the numerous changes throughout the G6PD gene, which include at least 15 common variants in Central and Eastern Asia alone that lower the activity of the enzyme (Howes et al., 2013). These alleles are now found at a combined frequency of around 8% in malaria endemic areas, rarely exceeding 20% (Howes et al., 2012). Whether these variants all confer resistance to malaria is unknown, but a number of these alleles have demonstrated effects against malaria and are thought to have a selective advantage to heterozygotes of sh > 5% where malaria is endemic (Ruwende et al., 1995; Tishkoff et al., 2001; Louicharoen et al., 2009).
With a 5% advantage in heterozygotes, a G6PD allele present
as a single copy would only have a 10% probability of fixing in the
population. If that’s so, how come malaria adaptation has repeat-
edly occurred via changes at G6PD? Well, maybe adaptation didn’t
start from a single copy of the selected allele. How many copies of the
G6PD-deficiency alleles do we expect were segregating in the popula-
tion before selection pressures changed?
In the absence of malaria, these G6PD alleles are deleterious with carriers suffering from G6PD deficiency, leading to hemolytic anemia when individuals are exposed to a variety of different compounds, notably those present in fava beans. There’s upward of one hundred bases where G6PD-deficiency alleles can arise, so assuming a mutation rate of $\approx 10^{-8}$ per base pair per generation, we can roughly estimate the rate of mutations arising that affect the G6PD gene as $\mu \approx 10^{-6}$ per generation. In the absence of malaria, the selective cost of being a heterozygous carrier of a G6PD-deficient allele must have been on the order of 5% or more, and thus the frequency of the allele under mutation-selection balance would have been $\approx 10^{-6}/0.05 = 2 \times 10^{-5}$. Assuming an effective population size of 2 to 20 million individuals roughly five to ten thousand years ago, that means that there would have been forty to four hundred copies of the G6PD-deficiency allele present in the population when selection pressures shifted at the introduction of malaria. The chance that one of these newly adaptive alleles is lost is 90%, but the chance that they’re all lost is $\leq (0.9)^{40} \approx 0.015$, i.e. there would have been a greater than 98% chance that adaptation would occur via one or more alleles at G6PD. How many alleles would escape drift? Well with 40 to 400 copies of the allele pre-malaria, and each of them having a 10% probability of escaping drift, we expect between 4 and 40 G6PD alleles to escape drift and contribute to adaptation. We see 15 common G6PD alleles in Eurasia, so our simple model of adaptation from mutation-selection balance seems reasonable.

A full analysis of this case requires modeling of G6PD’s X chromosome inheritance, and the randomness in the number of copies of the allele present at mutation-selection balance (Ralph and Coop, 2015).

Figure 12.3: Pythagoras’s “just say no to fava beans” campaign. Pythagoras prohibited the consumption of fava beans by his followers; perhaps because favism, the anemia induced in G6PD-deficient individuals by fava beans, is relatively common in the Mediterranean due to adaptation to endemic malaria. French early 16th Century. Woodner Collection, National Gallery of Art. Public Domain, wikimedia.
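The back-of-the-envelope G6PD calculation can be reproduced in a few lines. This sketch follows the arithmetic in the text (copies = N × µ / cost, with each copy escaping loss independently with probability 2hs); the variable and function names are hypothetical, and the simplifications flagged above (autosome-like inheritance, a fixed number of standing copies) are assumptions carried over from that rough calculation.

```python
mu = 1e-6           # mutation rate to G6PD-deficiency alleles per generation (from the text)
cost_het = 0.05     # heterozygote cost in the absence of malaria (from the text)
benefit_het = 0.05  # heterozygote advantage hs where malaria is endemic (from the text)

def standing_copies(n_individuals, mu, cost_het):
    """Expected number of copies at mutation-selection balance, N * mu / cost."""
    return n_individuals * mu / cost_het

def prob_all_lost(copies, p_escape):
    """Probability that every one of `copies` independent copies is lost."""
    return (1.0 - p_escape) ** copies

p_escape = 2 * benefit_het  # eqn. (12.8) with hs = 0.05

for n in (2e6, 20e6):
    copies = standing_copies(n, mu, cost_het)
    print(
        f"N={n:.0e}: copies ~ {copies:.0f}, "
        f"P(all lost) ~ {prob_all_lost(copies, p_escape):.2e}, "
        f"expected number escaping ~ {copies * p_escape:.0f}"
    )
```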
Question 1. ‘Haldane’s sieve’ is the name for the idea that the mutations that contribute to adaptation are likely to be dominant or at least co-dominant.
A) Briefly explain this argument with a verbal model relating to the results we’ve developed in the last two chapters.
B) Haldane’s sieve is thought to be less important for adaptation from previously deleterious standing variation, than adaptation from new mutation. Can you explain the intuition behind this idea?
C) Haldane’s sieve is likely to be less important in inbred, e.g. selfing, populations. Why is this?

Figure 12.4: Haldane’s sieve. To our knowledge Haldane never wore a sieve, but we assume he owned one. Sieve, Flickr licensed under CC BY 2.0. Haldane, Public Domain wikimedia.
Question 2. Melanic squirrels suffer a higher rate of predation (due to hawks) than normally pigmented squirrels. Melanism is due to a dominant, autosomal mutation. The frequency of melanic squirrels at birth is $4 \times 10^{-5}$.
A) If the mutation rate to new melanic alleles is $10^{-6}$, assuming the melanic allele is at mutation-selection equilibrium, what is the reduction in fitness of the heterozygote?
Suddenly levels of pollution increase dramatically in our population, and predation by hawks now offers an equal (and opposite) advantage to the dark individuals as it once offered to the normally pigmented individuals.
B) What is the probability that a single copy of this allele (present just once in the population) is lost?
C) If the population size of our squirrels is a million individuals, and is at mutation-selection balance, what is the probability that the population adapts from one or more allele(s) from the standing pool of melanic alleles?

Figure 12.5: cress bug (Asellus aquaticus) in the isopod family Asellidae. Brehms Tierleben. Allgemeine kunde des Tierreichs (1911). Brehm A.E. Image from the Biodiversity Heritage Library.
12.2 The interaction between genetic drift and weak selection.

For strongly selected alleles, once the allele has escaped initial loss at low frequencies, its path will be determined deterministically by its selection coefficients. However, if selection is weak compared to genetic drift, the stochasticity of reproduction can play a role in the trajectory an allele takes even when it is common in the population. If selection is sufficiently weak compared to genetic drift, then genetic drift will dominate the dynamics of alleles and they will behave like they’re effectively neutral. Thus, the extent to which selection can shape patterns of molecular evolution will depend on the relative strengths of selection and genetic drift. But how weak must selection on an allele be for drift to overpower selection? And do these interactions between selection and drift have long-term consequences for genome-wide patterns of evolution?

Figure 12.6: Asellid isopods have repeatedly invaded subterranean, ground-water habitats from surface-water habitats, leading to a genome-wide increase in dN/dS and larger genomes (Data from Lefébure et al., 2017, comparing independent isopod species pairs). One possible explanation of this is that the long-term effective population sizes of the subterranean species are lower and so these species are less able to prevent mildly deleterious alleles fixing, and also less able to prevent genome expansion from the accumulation of weakly deleterious, extraneous genomic DNA. The plot shows dN/dS for subterranean and surface species. Code here.

To model selection and drift each generation, we can first calculate the deterministic change in our allele frequency due to selection using our deterministic formula. Then, using our newly calculated expected allele frequency, we can binomially sample two alleles for each of our offspring to construct the next generation. This approach to jointly modeling genetic drift and selection is called the Wright-Fisher model. Under the Wright-Fisher model, we will calculate the expected
change in allele frequency due to selection and the variance around this expectation due to drift. To make our calculations simpler, let’s assume an additive model, i.e. h = 1/2, and that s ≪ 1 so that $\bar{w} \approx 1$. Using our directional selection deterministic model, from Chapter 10, and these approximations gives us our deterministic change due to selection
$$\Delta_S p = E(\Delta p) = \frac{s}{2}\, p (1 - p) \tag{12.9}$$
To obtain our new frequency in the next generation, $p_1$, we binomially sample from our new deterministic frequency $p' = p + \Delta_S p$, so the variance in our allele frequency change from one generation to the next is given by
$$Var(\Delta p) = Var(p_1 - p) = Var(p_1) = \frac{p'(1 - p')}{2N} \approx \frac{p(1 - p)}{2N}, \tag{12.10}$$
where the previous allele frequency p drops out because it is a constant and the variance in our new allele frequency follows from the fact that we are binomially sampling 2N new alleles from a frequency p′ to form the next generation.

To see this denote our new count of allele 1 by i, then $Var(p_1 - p) = Var\left(\frac{i}{2N} - p\right) = Var\left(\frac{i}{2N}\right) = \frac{Var(i)}{(2N)^2}$, and from binomial sampling $Var(i) = 2N p'(1 - p')$ and so we arrive at our answer. Assuming that s ≪ 1, p′ ≈ p, then in practice we can use $Var(\Delta p) = Var(p_1 - p) \approx p(1 - p)/(2N)$.

To get our first look at the relative effects of selection vs. drift we can simply look at when our change in allele frequency caused by selection within a generation is reasonably faithfully passed down through the generations. In particular, if our expected change in allele frequency is much greater than the variance around this change, genetic drift will play little role in the fate of our selected allele (once the allele is not at low copy number within the population). When does selection dominate genetic drift? This will happen if E(∆p) ≫ Var(∆p), i.e. when |Ns| ≫ 1. Conversely, any hope of our selected allele following its deterministic path will be quickly undone if our change in allele frequencies due to selection is much less than the variance induced by drift. So if the absolute value of our population-size-scaled selection coefficient |Ns| ≪ 1, then drift will dominate the fate of our allele.
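The step from comparing E(∆p) with Var(∆p) to the condition on |Ns| can be written out using eqns. (12.9) and (12.10), for any polymorphic frequency 0 < p < 1:

```latex
\begin{align*}
|E(\Delta p)| \gg Var(\Delta p)
  &\iff \frac{|s|}{2}\, p(1-p) \gg \frac{p(1-p)}{2N} \\
  &\iff |s| \gg \frac{1}{N} \\
  &\iff |Ns| \gg 1 .
\end{align*}
```

The factor p(1 − p) cancels, which is why the condition depends only on the product of population size and selection coefficient and not on the current allele frequency.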
To make further progress on understanding the fate of alleles with
selection coefficients of the order 1/N requires more careful modeling.
However, under our diploid model, with an additive selection coef-
ficient s, we can obtain the probability that allele 1 fixes within the
population, starting from a frequency p :
1 − e−2N sp
pF (p) = (12.11)
1 − e−2N s
The proof of this result is sketched out below (see Section 12.2.1). A
new allele that arrives in the population at frequency p = 1/(2N ) has
a probability of reaching fixation of
$$p_F\left(\frac{1}{2N}\right) = \frac{1 - e^{-s}}{1 - e^{-2Ns}}. \tag{12.12}$$
Figure 12.7: The probability of the fixation of a new mutation with selection coefficient s (h = 1/2) in a diploid population of effective size Ne. The dashed line gives the infinite population solution. The dots give the solution for s → 0, i.e. the neutral case, where the probability of fixation is 1/(2Ne). Code here. [Plot: x-axis, selection coeff., s; y-axis, Prob. of fixation, π(1/2Ne); lines for Ne = 2000, Ne = 5000, Ne = 10000 and Ne = ∞ (pF = s).]
If s ≪ 1 but Ns ≫ 1 then pF(1/(2N)) ≈ s, which nicely gives us back the result that we obtained above for an allele under strong selection (eqn. (12.8)). Our probability of fixation (eqn. (12.12)) is plotted as a function of s and N in Figure 12.7. To recover our neutral result, we can take the limit s → 0 to obtain our neutral fixation probability, 1/(2N).
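The two limits just described can be checked numerically. The following is a minimal sketch of eqn. (12.11), assuming nothing beyond the formula itself; the function name and the example values of N and s are hypothetical.

```python
import math


def fixation_probability(p, N, s):
    """Fixation probability of an additive allele at frequency p (eqn 12.11).

    Uses expm1 so that 1 - exp(-x) stays accurate when x is tiny.
    The s == 0 branch returns the neutral limit of the formula, which is p.
    """
    if s == 0.0:
        return p
    return math.expm1(-2.0 * N * s * p) / math.expm1(-2.0 * N * s)


N = 10_000  # illustrative population size
new_copy = 1.0 / (2 * N)

# Strong selection (Ns >> 1, s << 1): the result is close to s.
print(fixation_probability(new_copy, N, 0.01))

# Vanishing selection: the result approaches the neutral value 1/(2N).
print(fixation_probability(new_copy, N, 1e-12), 1.0 / (2 * N))
```

Note that for strongly deleterious alleles (large negative Ns) the exponentials in this naive form can overflow, so the sketch is only intended for moderate values of |Ns|.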
In the case where Ns is close to 1, then
$$p_F\left(\frac{1}{2N}\right) \approx \frac{s}{1 - e^{-2Ns}}. \tag{12.13}$$
This is greater than our earlier result pF = s from the branching process argument (using our additive model of h = 1/2), increasingly so for smaller N. Why is this? The reason is that pF = s is really the probability of “never being lost” in an infinitely large population. So to persist indefinitely, the allele has to escape loss permanently, by never being absorbed by the zero state. When the population size is finite, to fix the allele only needs to reach 2N copies. Weakly beneficial mutations (Ns ∼ 1) are therefore slightly more likely to fix than the probability s, as they only have to reach 2N copies to never be lost.
If, for selection to operate on an allele, we need the selection coefficient to satisfy |Ns| ≫ 1, then that holds if |s| ≫ 1/N. Well, effective population sizes are often reasonably large, on the order of hundreds of thousands or millions of individuals, thus selection coefficients on the order of 10^{-5} to 10^{-6} can be effectively selected upon, i.e. selection corresponding to individuals having incredibly slight advantages in terms of the number of offspring they leave to the next generation (see Figure 12.8). While we are incapable of measuring all but the largest fitness effects, except in some elegant experiments (e.g. in microbes), such small effects are visible to selection in large populations. Thus, if consistent selection pressures are exerted over long time periods, natural selection can potentially finely tune various aspects of an organism.
Figure 12.8: The probability of the fixation of a new mutation with selection coefficient s relative to the neutral fixation probability (1/2Ne) as a function of the effective size Ne. The selection coefficient is shown next to the line. Note how quickly the probabilities move away from the neutral expectation as Ne s moves past 1. Code here. [Plot: x-axis, log10(Ne); y-axis, log10 ratio of the prob. of fixation π(1/2Ne) to the neutral prob.; lines labelled by s from 10^{-2} to 10^{-5} and from −10^{-2} to −10^{-6}.]
As one example of this fine-tuning, consider how carefully crafted and optimized the sequence of codons is for translation. Due to the degeneracy of the genetic code, multiple codons code for the same amino-acid. For example, there are six different codons that can code for leucine. While these synonymous codons are equivalent at the protein level, cells do differ in the number of tRNA molecules that bind these codons and so in the efficacy and accuracy with which proteins can be formed through translation and folding. These slight differences in translation rates likely often correspond to tiny differences in fitness, but do they matter?
Figure 12.9: Data from Drosophila melanogaster on the frequency of different codons for Leucine. Data from Genscript. Code here. [Plot: x-axis, codons CTT, CTC, CTA, CTG, TTA, TTG; y-axis, Frequency.]
In many organisms there is a strong bias in the codons used to encode particular amino-acids, see Figure 12.9, with the most abundant codon matching the most abundant tRNA in cells. This ’codon bias’ likely reflects the combined action of weak selection and mutational pressure, pushing the codon composition of the genome and tRNA abundances towards an adaptive compromise. These selection pressures have acted over long time periods, as codon usage patterns are often very similar for species that diverged many tens of millions of years ago. Compared to other genes, highly expressed genes show a strong bias towards using codons matching abundant tRNAs, consistent with the idea that the synonymous codon content of highly expressed genes is evolving to optimize their translation (see Figure 12.10 for an early example). These patterns likely represent the action of selection pressures that are incredibly weak on average, but that have played out over vast time-periods.
Figure 12.10: A measure of unequal codon frequencies (F) plotted in bins of gene expression (E) for genes across the Drosophila melanogaster genome. Data from Hey and Kliman (2002). Code here. [Plot: x-axis, Expression Level; y-axis, Codon Bias.]
The fixation of slightly deleterious alleles. From Figure 12.7 we can see that weakly deleterious alleles can also fix, especially in small populations. To understand how likely it is that deleterious alleles by chance reach fixation by genetic drift, let’s assume a diploid model with additive selection (with a selection coefficient of −s against our allele 2).
If Ns ≫ 1 then our deleterious allele (allele 2) cannot possibly reach fixation. However, if Ns is not large, then the probability of fixation is
$$p_F\left(\frac{1}{2N}\right) \approx \frac{s}{e^{2Ns} - 1} \tag{12.14}$$
for our single-copy deleterious allele. So deleterious alleles can fix within populations (albeit at a low rate) if Ns is not too large. As above, this is because while deleterious mutations will never escape loss in infinite populations, they can become fixed in finite populations by reaching 2N copies.
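To get a feel for how sharply eqn. (12.14) falls off with population size, the sketch below expresses it relative to the neutral fixation probability 1/(2N). The function names and the values of N and s are illustrative assumptions.

```python
import math


def deleterious_fixation_probability(N, s):
    """Approximate fixation probability of a new deleterious allele, s / (exp(2Ns) - 1)."""
    return s / math.expm1(2.0 * N * s)


def relative_to_neutral(N, s):
    """Eqn (12.14) divided by the neutral probability 1/(2N), i.e. 2Ns / (exp(2Ns) - 1)."""
    return 2.0 * N * deleterious_fixation_probability(N, s)


s = 1e-4  # hypothetical selection coefficient against the allele
for N in (1_000, 10_000, 100_000):
    print(f"N = {N:>7,d}  2Ns = {2 * N * s:>5.1f}  "
          f"relative to neutral = {relative_to_neutral(N, s):.3g}")
```

With these values the allele behaves almost neutrally when 2Ns is well below 1, and its chance of fixation collapses once 2Ns is much larger than 1.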
Question 3. An additive mutation arises that lowers the relative fitness of heterozygotes by 10^{-5}. What is the probability that this mutation fixes in a diploid population with effective size of 10^4? What is the probability it fixes in a population of effective size 10^6? By comparing both to their neutral probability, describe the intuition behind this result.
Ohta proposed the ‘nearly-neutral’ theory of molecular evolution in a series of papers [1]. She suggested that a reasonable fraction of newly arising functional mutations may have very weak selection coefficients, such that species with smaller effective population sizes may have higher rates of fixation of these very weakly deleterious alleles. In effect, her suggestion is that the constraint parameter C of a functional region is not a fixed property, but rather depends on the ability of the population to resist the influx of very weakly deleterious mutations.
[1] Ohta, T., 1972 Population size and rate of evolution. Journal of Molecular Evolution 1(4): 305–314; Ohta, T., 1973 Slightly deleterious mutant substitutions in evolution. Nature 246(5428): 96; and Ohta, T., 1987 Very slightly deleterious mutations and the molecular clock. Journal of Molecular Evolution 26(1-2): 1–6.
Figure 12.11: Data from 44 metazoan species from Cuttlefish to Sifakas. Each dot represents the average over many genes, plotting dN/dS against synonymous diversity (πS). Data from Galtier (2016). Code here. [Plot: x-axis, log10(πS); y-axis, dN/dS; points distinguish invertebrates and vertebrates.]
Across species, genome-wide averages of dN/dS do seem to be correlated with measures of the effective population size (such as synonymous diversity), see Figure 12.11. This evidence supports the idea that in species with smaller effective population sizes (lower πS), proteins may be subject to lower degrees of constraint, as very weakly deleterious mutations are able to fix. Thus, some reasonable proportion of functional substitutions in populations with small effective population sizes, such as humans, may be mildly deleterious.
Figure 12.12: Common Cuttlefish (Sepia officinalis). Cefalopodi viventi nel Golfo di Napoli (1896). Jatta G. Image from the Biodiversity Heritage Library. Licensed under CC BY-2.0.
12.2.1 Appendix: The fixation probability of weakly selected alleles
What is the probability a weakly beneficial or deleterious additive
allele fixes in our population? We’ll let P (∆p) be the probability that
our allele frequency shifts by ∆p in the next generation. Using this, we
can write our probability pF (p) in terms of the probability of achieving
fixation averaged over the frequency in the next generation
$$p_F(p) = \int p_F(p + \Delta p)\, P(\Delta p)\, d(\Delta p) \tag{12.15}$$
This is very similar to the technique that we used when deriving our
probability of escaping loss in a very large population above.
Figure 12.13: Coquerel’s Sifaka (Propithecus coquereli). A hand-book to the primates (1894). Forbes, H. O. Image from the Biodiversity Heritage Library. Licensed under CC BY-2.0.
So we need an expression for pF(p + ∆p). To obtain this, we’ll do a Taylor series expansion of pF(p), assuming that ∆p is small:
$$p_F(p + \Delta p) \approx p_F(p) + \Delta p\, \frac{dp_F(p)}{dp} + \frac{(\Delta p)^2}{2}\, \frac{d^2 p_F(p)}{dp^2} \tag{12.16}$$
ignoring higher order terms.
Taking the expectation over ∆p on both sides, as in eqn. 12.15, we obtain
$$p_F(p) = p_F(p) + E(\Delta p)\, \frac{dp_F(p)}{dp} + \frac{E((\Delta p)^2)}{2}\, \frac{d^2 p_F(p)}{dp^2} \tag{12.17}$$
Well, E(∆p) = (s/2) p(1 − p) and Var(∆p) = E((∆p)^2) − E^2(∆p), so if s ≪ 1 then E^2(∆p) ≈ 0, and E((∆p)^2) ≈ Var(∆p) = p(1 − p)/(2N). Substituting in these values and subtracting pF(p) from both sides of our equation, this leaves us with
$$0 = \frac{s}{2}\, p(1-p)\, \frac{dp_F(p)}{dp} + \frac{p(1-p)}{4N}\, \frac{d^2 p_F(p)}{dp^2} \tag{12.18}$$
and we can specify the boundary conditions to be pF(1) = 1 and pF(0) = 0. Solving this differential equation is a somewhat involved process, but in doing so we find that
$$p_F(p) = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}} \tag{12.19}$$
This proof can be extended to alleles with arbitrary dominance; however, this does not lead to an analytically tractable expression so we do not pursue this here.
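One informal way to check eqn. (12.19) is to compare it with a direct Wright-Fisher simulation. The sketch below is an illustration under stated assumptions rather than the book's own code: genotype fitnesses are taken to be 1 + s, 1 + s/2 and 1, each generation applies deterministic selection followed by binomial sampling of 2N alleles, and the population size, selection coefficient, replicate count and seed are arbitrary choices kept small so that it runs quickly.

```python
import math
import random


def fixation_probability(p, N, s):
    """Diffusion result, eqn (12.19); assumes s is not zero."""
    return math.expm1(-2.0 * N * s * p) / math.expm1(-2.0 * N * s)


def new_mutation_fixes(N, s, rng):
    """Follow a single new copy until it is lost (False) or fixed (True)."""
    two_n = 2 * N
    count = 1
    while 0 < count < two_n:
        p = count / two_n
        # Selection with fitnesses 1+s, 1+s/2, 1: the marginal fitness of
        # allele 1 is 1 + s*p + (s/2)*(1-p) and the mean fitness is 1 + s*p.
        p_sel = p * (1.0 + s * p + 0.5 * s * (1.0 - p)) / (1.0 + s * p)
        # Binomial sampling of 2N alleles for the next generation.
        count = sum(1 for _ in range(two_n) if rng.random() < p_sel)
    return count == two_n


def estimate_fixation_probability(N, s, replicates, seed):
    """Fraction of independent new mutations that reach fixation."""
    rng = random.Random(seed)
    fixed = sum(new_mutation_fixes(N, s, rng) for _ in range(replicates))
    return fixed / replicates


N, s = 50, 0.05  # illustrative values, giving 2Ns = 5
theory = fixation_probability(1.0 / (2 * N), N, s)
estimate = estimate_fixation_probability(N, s, replicates=3000, seed=1)
print(f"diffusion approximation: {theory:.4f}")
print(f"simulated estimate:      {estimate:.4f}")
```

Because the estimate is based on a finite number of replicates, and the diffusion result is itself an approximation to the discrete model, only rough agreement should be expected.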
13
The Effects of Linked Selection.
Genetic drift is not the only source of randomness in the dynamics of alleles. Alleles also experience random fluctuations in frequency due to the fact that they are present on a set of random genetic backgrounds with different fitnesses. For example, when a beneficial allele arises via a single mutation, it arises on a particular genetic
background, i.e. a particular haplotype (Figure 13.1A). Imagine this
mutation arising in a region with no recombination, or in an organism
where genetic exchange is rare. If our beneficial allele becomes estab-
lished in the population, i.e. escapes loss by genetic drift in those first
few generations, it will start to increase in frequency rapidly. As it
rises in frequency, so will the alleles that happened to be present on
the haplotype that the mutation arose on (if those other alleles are
neutral or at least not too deleterious). These other alleles get to ’hitchhike’ along. The alleles that are not on that particular
background are swept out of the population, so the net effect of this
selective sweep is to remove genetic diversity from the population. Di-
versity will eventually recover, as new mutations arise and some slowly
drift up in frequency. But in the short-term, selective sweeps remove
genetic variation from populations.
Williams and Pennings (2019) have visualized selective sweeps in HIV. In Figure 13.1B we see a set of HIV haplotypes sampled from a patient before and after a selective sweep of a drug-resistant mutation. The patient is taking a reverse transcriptase inhibitor (Efavirenz), but sadly within 161 days a drug-resistant mutation that changes the HIV reverse transcriptase protein has arisen and spread. Note how a particular haplotype is now fixed in the sample, and little genetic diversity remains, due to the hitchhiking effect of the strong selective sweep of this allele.
Figure 13.1: A) In the top panel, a selected mutation (red dot) arises on a particular haplotype in the population. It sweeps to fixation, carrying with it the haplotype on which it arose, middle panel, erasing the standing genetic diversity in the region. The bottom panel is some time after the selective sweep when some new neutral alleles (green dots) have started to drift up in frequency. B) Top panel: HIV sequences from a patient at the start of drug treatment in the protease and reverse transcriptase coding regions. Bottom panel: A sample 161 days later, after a drug resistant mutation has spread, the A → T in the 103rd codon of reverse transcriptase. Each row is a haplotype, with the alleles present shown as coloured blocks. Figure B from Williams and Pennings (2019), licensed under CC BY 4.0.
To better understand hitchhiking, first let’s imagine examining variation at a locus fully linked to our selected locus, just after our sweep reached fixation. Neutral alleles sampled at this locus must trace their ancestral lineages back to the neutral allele on whose background the selected allele initially arose (Figure 13.2). This is because that background neutral allele, which existed τ generations ago, is the ancestor of the entire population at this fully linked locus. Our individuals who carry the beneficial allele are, from the perspective of these alleles, experiencing a rapidly expanding population. Therefore, a pair of neutral alleles sampled at our linked neutral locus will be forced to coalesce ≈ τ generations ago. A newly derived allele with an additive selection coefficient s will take a time τ = 4 log(2N)/s generations to reach fixation within our population (see eqn. (10.38)). This is a very short time scale compared to the average neutral coalescent time of 2N generations for a pair of alleles. Thus we expect little variation, as few mutations will have arisen on these very short branches, and those that have done will likely be singletons in our sample.
Figure 13.2: The coalescent of 4 lineages, marked in blue, at a locus completely linked to our selected allele. The frequency trajectory of the selected allele X(t) is shown in red.
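To see how short the sweep is relative to the neutral coalescent time, the sketch below evaluates τ = 4 log(2N)/s and compares it with 2N generations. The function name and the parameter values are hypothetical.

```python
import math


def sweep_time(N, s):
    """Approximate time for a new additive allele to fix: tau = 4 log(2N) / s."""
    return 4.0 * math.log(2.0 * N) / s


N = 1_000_000  # illustrative diploid population size
for s in (0.1, 0.01, 0.001):
    tau = sweep_time(N, s)
    print(f"s = {s:<6}  tau = {tau:>9.0f} generations  tau / 2N = {tau / (2 * N):.2e}")
```

Even for the weakest selection coefficient shown, τ is a small fraction of 2N, which is why so few mutations are expected on the post-sweep genealogy.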
Now let’s think about a sweep in a recombining region. Again the selected mutation arises on a particular haplotype, and it and its haplotype start to increase in frequency in the population. However, now recombination events can occur between haplotypes carrying and not carrying the selected allele, in individuals who are heterozygous for the selected allele. These recombination events allow alleles that were not present on the original selected haplotype to avoid being swept out of the population, and also decouple the selected allele somewhat from hitchhiking alleles, preventing many of them from hitchhiking all the way to fixation. Far out from the selected site, the recombination rate is high enough that alleles that were present on the original background barely get to hitchhike along at all, as recombination breaks up their association with the selected allele very rapidly.
Figure 13.3: A cartoon depiction of a sweep of a red beneficial allele over three time points. The haplotype on which the beneficial allele arose by mutation is shown in black. The three vertical orange lines mark the loci (1, 2 and 3) shown in Figure 13.4. Neutral alleles segregating prior to the sweep appear as white circles, new mutations after the sweep as green circles.
Figure 13.4: Coalescent genealogies at three loci (1, 2 and 3) at different distances along the genome from a selective sweep. The locations of these three loci along the genome are marked in Figure 13.3. The selected mutation is shown in red. Lineages descended from recombination events during the sweep are marked with stars. Neutral mutations close to each of the loci are shown on the genealogy.
What do the coalescent genealogies look like at loci various distances away from the selected site? Well, close to the selected site all
our alleles in the present day trace back to a most recent common an-
cestral allele present on that selected haplotype, and so are all forced
to coalesce around τ generations ago (locus 1). Slightly further out
from the selected site (locus 2), we have lineages that don’t trace their
ancestry back to the original selected haplotype, but instead are de-
scended from recombinant haplotypes that recombined onto the sweep
(the haplotype 2 from the bottom). These lineages can coalesce neu-
trally with the other ancestral lineages over far deeper time scales and
mutations on these deeper lineages correspond to the standing diver-
sity present in our population prior to the sweep. As we move even
further out from the selected site (locus 3), we encounter more and
more lineages descended from recombinant haplotypes that coalesce
neutrally much deeper in time than τ , allowing diversity to recover to
background levels as we move away from the selected site.
Figure 13.5: The expected reduction in diversity compared to its neutral expectation as a function of the distance away from a site where a selected allele has just gone to fixation. The sweeps associated with two different strengths of selection are shown, corresponding to a short timescale (τ) for the sweep and a long one. The recombination rate is rBP = 1 × 10^{-8}. Code here.
To model the expected pattern of diversity surrounding a selected site, we can think about a pair of alleles sampled at a neutral locus a recombination distance r away from our selected site. Our pair of alleles will be forced to coalesce ≈ τ generations ago if neither of them is descended from a recombinant haplotype (Left side of Figure 13.6). We know that in the present day our neutral lineage is linked to the selected allele. The probability that our lineage, in some generation t back in time, is in a heterozygote is 1 − X(t), and the probability that a recombination occurs in that individual is r. So the probability that our neutral lineage is descended from a recombinant haplotype t generations back is
$$r\,(1 - X(t)). \tag{13.1}$$
So the probability (p_NR) that our lineage is not descended from a recombinant haplotype arising in any of the τ generations it takes our selected allele to move through the population is
$$p_{NR} = \prod_{t=1}^{\tau} \Big(1 - r\big(1 - X(t)\big)\Big). \tag{13.2}$$
Figure 13.6: Left) Two lineages coalesce roughly τ generations ago as they are both descended from the selected haplotype (panel title: No recombination). Right) One of our two lineages is descended from the selected haplotype but the other is descended from a recombinant on to the sweep (panel title: Recombination). The pair on the right coalesce much deeper back in time.
Assuming that r is small, then (1 − r(1 − X(t))) ≈ e^{−r(1−X(t))}, such that
$$p_{NR} = \prod_{t=1}^{\tau} \Big(1 - r\big(1 - X(t)\big)\Big) \approx \exp\left(-r \sum_{t=1}^{\tau} \big(1 - X(t)\big)\right) = \exp\left(-r\tau\,(1 - \hat{X})\right) \tag{13.3}$$
where X̂ is the average frequency of the derived beneficial allele across its trajectory as it sweeps up in frequency, X̂ = (1/τ) Σ_{t=1}^{τ} X(t). As our allele is additive, its trajectory for frequencies < 0.5 is the mirror image of its trajectory for frequencies > 0.5, therefore its average frequency X̂ = 0.5. This simplifies our expression to
$$p_{NR} = e^{-r\tau/2}. \tag{13.4}$$
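The step from the product in eqn. (13.2) to the simple exponential in eqn. (13.4) can be checked numerically. The sketch below assumes a deterministic logistic trajectory for the additive allele (growth rate s/2, starting from a single copy), which is an illustrative stand-in for X(t) rather than something specified in the text; the values of N, s and r are also hypothetical.

```python
import math


def logistic_trajectory(N, s):
    """Deterministic frequencies X(1), ..., X(tau) of an additive allele.

    Assumes logistic growth at rate s/2 from frequency 1/(2N), followed for
    tau = 4 log(2N) / s generations (rounded to an integer).
    """
    x0 = 1.0 / (2.0 * N)
    tau = round(4.0 * math.log(2.0 * N) / s)
    trajectory = []
    for t in range(1, tau + 1):
        growth = math.exp(0.5 * s * t)
        trajectory.append(x0 * growth / (1.0 - x0 + x0 * growth))
    return trajectory


def prob_no_recombination(r, trajectory):
    """Product over generations of 1 - r (1 - X(t)), as in eqn (13.2)."""
    log_p = sum(math.log1p(-r * (1.0 - x)) for x in trajectory)
    return math.exp(log_p)


N, s, r = 10_000, 0.01, 1e-4  # illustrative values
trajectory = logistic_trajectory(N, s)
tau = len(trajectory)
mean_x = sum(trajectory) / tau

exact = prob_no_recombination(r, trajectory)
approx = math.exp(-r * tau / 2.0)  # eqn (13.4)
print(f"tau = {tau}, mean frequency = {mean_x:.4f}")
print(f"product (13.2) = {exact:.5f}, exp(-r tau / 2) = {approx:.5f}")
```

The mean frequency along the trajectory comes out very close to 0.5, as the symmetry argument above suggests.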
The probability that neither of our lineages is descended from a recombinant haplotype, and hence that they are forced to coalesce, is p_NR^2 (assuming that they coalesce at a time close to τ so that they recombine independently of each other for times < τ).
If one or other of our lineages is descended from a recombinant haplotype, it will take them on average ≈ 2N generations to find a common ancestor, as we are back to our neutral coalescent probabilities (Right side of Figure 13.6). Thus, the expected time till our pair of lineages find a common ancestor is
$$E(T_2) = \tau \times p_{NR}^2 + \left(1 - p_{NR}^2\right)(\tau + 2N) \approx \left(1 - p_{NR}^2\right) 2N \tag{13.5}$$
where this last approximation assumes that τ ≪ 2N. So the expected pairwise diversity for neutral alleles at a recombination distance r away from the selected sweep (πr) is
$$E(\pi_r) = 2\mu E(T_2) \approx \pi_0 \left(1 - e^{-r\tau}\right) \tag{13.6}$$
So diversity increases as we move away from the selected site, slowly
and exponentially plateauing to its neutral expectation π0 .
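The recovery of diversity with distance can be tabulated directly from eqns. (13.5) and (13.6). This is a minimal sketch; the function names, and the values of N, τ and the per-base-pair recombination rate, are assumptions made for illustration.

```python
import math


def expected_coalescence_time(r, tau, N):
    """E(T2) from eqn (13.5), before dropping tau relative to 2N."""
    both_swept = math.exp(-r * tau / 2.0) ** 2  # p_NR squared
    return tau * both_swept + (1.0 - both_swept) * (tau + 2.0 * N)


def relative_diversity(r, tau):
    """E(pi_r) / pi_0 = 1 - exp(-r tau), from eqn (13.6)."""
    return 1.0 - math.exp(-r * tau)


N, tau, r_bp = 1_000_000, 1_000, 1e-8  # illustrative values
for distance_bp in (0, 10_000, 100_000, 1_000_000):
    r = r_bp * distance_bp
    full = expected_coalescence_time(r, tau, N) / (2.0 * N)
    print(f"{distance_bp:>9,d} bp  E(T2)/2N = {full:.4f}  "
          f"1 - exp(-r tau) = {relative_diversity(r, tau):.4f}")
```

The two columns differ by τ/(2N), which is the term neglected in the approximation.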
The malaria pathogen (Plasmodium falciparum) has evolved drug
resistance to anti-malaria drugs, often by changes at the dhfr gene.
Figure 13.8 shows levels of genetic diversity (heterozygosity) at a set
of markers moving out from the dhfr gene in a set of drug resistant
malaria sequences collected in Thailand (Nash et al., 2005). We see
the characteristic dip in diversity around the gene, with zero diversity
at a number of the loci very close to the gene, suggesting a strong
selective sweep. Fitting our simple model of a sweep to this data, we estimate that τ ≈ 40 generations, corresponding to the drug-resistance allele fixing in a very short time period.
Figure 13.7: Three species of malaria parasites (Plasmodium) in red blood cells. Animal parasites and human disease (1918). Chandler, A.C. Image from the Biodiversity Heritage Library. Contributed by Cornell University Library. Not in copyright.
Figure 13.8: Levels of heterozygosity at a set of microsatellite markers surrounding the dhfr gene in samples of drug-resistant malaria (Plasmodium falciparum) from Thailand. The dotted horizontal line gives the average level of heterozygosity found at these markers in a set of drug-resistant malaria; we take this background as our π0. The dashed line shows our fitted hitchhiking model from equation 13.6 with τ ≈ 40, fitted by non-linear least squares. The recombination rate in P. falciparum is rBP ≈ 10^{-6} bp^{-1}. Data from Nash et al. (2005). Code here. [Plot: x-axis, Distance from dhfr (kb); y-axis, Heterozygosity (He); legend: Observed, Fitted sweep, background levels.]
To get a sense of the physical scale over which diversity is reduced,
consider a region where recombination occurs at a rate rBP per base
pair per generation, and a locus ℓ base pairs away from the selected
site, such that r = rBP ℓ (where rBP ℓ ≪ 1 so we don’t need to worry
about more than one recombination event occurring per generation).
Typical recombination rates are on the order of rBP = 10^{-8}. In Figure 13.5 we show the reduction in diversity, given by eqn. (13.6), for two different selection coefficients.
For our expected diversity level to recover to 50% of its neutral expectation, E(πr)/π0 = 0.5, requires a physical distance ℓ* such that log(0.5) = −rBP ℓ* τ, and by re-arrangement,
$$\ell^* = \frac{-\log(0.5)}{r_{BP}\,\tau}. \tag{13.7}$$
As τ depends inversely on the selection coefficient s (eqn. (10.38)), the width of our trough of reduced diversity depends on s/rBP. All else being equal, we expect stronger sweeps or sweeps in regions of low recombination to have a larger hitchhiking effect. For example, in a genomic region with a recombination rate rBP = 10^{-8} bp^{-1} a selection coefficient of s = 0.1% would reduce diversity over 10’s of kb, while a sweep of s = 1% would affect ∼100kb.
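As a small worked example of eqn. (13.7), the sketch below computes the half-recovery distance using the values quoted for the malaria dhfr sweep above (τ ≈ 40 generations, rBP ≈ 10^{-6} per bp). The function name and the optional `fraction` argument, which generalises the 50% threshold, are assumptions added for illustration.

```python
import math


def recovery_distance(r_bp, tau, fraction=0.5):
    """Distance (bp) at which E(pi_r)/pi_0 reaches `fraction`.

    Solves 1 - exp(-r_bp * distance * tau) = fraction; for fraction = 0.5
    this is eqn (13.7), -log(0.5) / (r_bp * tau).
    """
    return -math.log(1.0 - fraction) / (r_bp * tau)


# Values quoted in the text for the P. falciparum dhfr sweep.
half_distance_bp = recovery_distance(r_bp=1e-6, tau=40)
print(f"half-recovery distance: {half_distance_bp / 1000:.1f} kb")
```

Since the distance scales as 1/(rBP τ), halving the duration of the sweep doubles the width of the trough.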
Question 1. van’t Hof et al. (2011) identified the genetic basis of melanism in the peppered moth (Biston betularia). This allele swept to fixation in northern parts of the UK; a classic case of
adaptation to industrial pollution (made famous by the work of Ket-
tlewell, see Majerus (2009) and Cook et al. (2012)). The
genetic basis of melanism is a transposable element (TE) inserted
into a pigmentation gene. van’t Hof et al. found that diversity
is suppressed in a broad region around the TE. Specifically, on the
background of the TE, it takes roughly 200 kb in either direction for
diversity levels to recover to 50% of genome-wide levels.
Figure 13.9: Peppered moth (Biston betularia), non-melanic morph. Les papillons dans la nature (1934). Robert, P.-A. Image from the Biodiversity Heritage Library. Contributed by University of Illinois Urbana-Champaign. Not in copyright.
Random facts: In all moths and butterflies only males recombine; chromosomes are transmitted without recombination in females. The recombination rate in males is 2.9 cM/Mb. Peppered moths have an effective population size of roughly a hundred thousand individuals. Kettlewell used to eat moths when out collecting them in the field (personal communication, Art Shapiro).
A) Briefly explain how this pattern offers further evidence that the
melanic allele was favoured by selection.
B) Using this information, and assuming the allele’s effects on
fitness are additive, what is your estimate of the age of the allele?
C) What is your estimate of the selection coefficient favouring this
melanic allele?
Other signals of selective sweeps. The primary signal of a recently completed selective sweep is the characteristic reduction in diversity
surrounding the selected site. However, sweeps do leave other signals
and these have also often been used to identify loci undergoing selec-
tion. For example, neutral alleles further away from the selected site
may hitchhike only part of the way to fixation if recombination occurs during the sweep, which can lead to an excess of high-frequency
derived alleles at intermediate distances away from the selected site,
a pattern lasting for a short time after a sweep (Fay and Wu, 2000;
Przeworski, 2002; Kim, 2006). Also, as neutral diversity levels
slowly recover through an influx of new mutations after a sweep, there
is a strong skew towards low frequency derived alleles, a pattern that
persists for many generations (Braverman et al., 1995; Prze-
worski, 2002; Kim, 2006). The excess of rare alleles, compared to
a neutral model, can be captured by statistics such as Tajima’s D
(which we encountered back in our discussion of the neutral site frequency spectrum, eqn 4.43). Thus one way to look for loci that have undergone
selective sweeps is to calculate Tajima’s D from data in windows along
the genome and look for strong departures from the null distribution.
Figure 13.10: Two populations descended from a common ancestral population. A beneficial mutation has occurred in one population and swept to fixation.
We can also use comparisons among multiple populations to look for evidence of sweeps occurring in one of the populations, for example to identify alleles involved in local adaptation (see Figure 13.10). A selective
sweep will decrease the within-population diversity (HS ) surrounding
the selected site, without affecting the diversity between different
populations. Thus local sweeps create peaks of FST between weakly
differentiated populations.
Hohenlohe et al. (2010) studied genome-wide patterns of FST
between marine and freshwater populations of threespine stickleback
(Gasterosteus aculeatus), plotted in Figure 13.11. Between different
marine populations, they found no strong peaks of FST ; however, be-
tween the marine and freshwater comparisons they found a number
of high FST peaks that were replicated over a number of freshwater-
marine comparisons. They identified a number of novel regions re-
sponsible for the adaptation of sticklebacks to freshwater environments
and also a number of loci previously identified in crosses between
marine and freshwater populations. For example, the first peak of
Linkage Group IV includes Ectodysplasin A (Eda), a gene involved in
the adaptive loss of armour plating in freshwater environments.
Figure 13.11: FST across the stickleback genome, with colored bars indicating significantly elevated (p ≤ 10^{-5}, blue; p ≤ 10^{-7}, red) and reduced (p ≤ 10^{-5}, green) values. The alternating white and grey panels indicate different linkage groups. A) FST between two oceanic populations. B) Average FST between a freshwater population and the two marine populations. Figure and caption text from Hohenlohe et al. (2010), licensed under CC BY 4.0.
Soft Sweeps from multiple mutations and standing variation. In our
sweep model above, we assumed that selection favoured a beneficial
allele from the moment it entered the population as a single copy
mutation (left panel, Figure 13.12). However, when a novel selection
pressure switches on, multiple mutations at the same gene may start
to sweep, such that no one of these alleles sweeps to fixation (middle
panel, Figure 13.12). These sweeps involving multiple mutations sig-
nificantly soften the impact of selection on genomic diversity, and so
are called ’soft sweeps’.
Figure 13.12: Three types of sweeps. Panels, left to right: hard sweep; multiple mutation soft sweep; single mutation soft sweep.
Another way that the impact of a sweep can be softened is if our
allele was segregating in the population for some time before it became
beneficial. That additional time means that our allele can have recom-
bined onto various haplotype backgrounds, such that when selection
pressures switch, the selected allele sweeps up in frequency on multiple
different haplotypes (right panel, Figure 13.12). Detecting and differ-
entiating these different types of sweeps is an active area of empirical
research and theory in population genomics (see Hermisson and
Pennings (2017) for an overview of developments in this area).
13.1 The genome-wide effects of linked selection.
To what extent are patterns of variation along the genome and among
species shaped by linked selection, such as selective sweeps? We can
hope to identify individual cases of strong selective sweeps along the
genome, but how do they contribute to broader patterns of variation?
Two observations have puzzled population geneticists since the in-
ception of molecular population genetics. The first is the relatively
high level of genetic variation observed in most obligately sexual
species. The neutral theory of molecular evolution was developed in
part to explain these high levels of diversity. As we saw in Chapter
4, under a simple neutral model, with constant population size, we
should expect the amount of neutral genetic diversity to scale with the
product of the population size and mutation rate. The second obser-
vation, however, is the relatively narrow range of polymorphism across
species with vastly different census sizes (see Figure 2.3 and Leffler
et al. (2012) for a recent review). As highlighted by Lewontin
(1974) in his discussion of the paradox of variation, this observation
seemingly contradicts the prediction of the neutral theory that genetic
diversity should scale with the census population size. There are a
number of explanations for the discrepancy between genetic diversity
levels and census population sizes. The first is that the effective size
of the population (Ne ) is often much lower than the census size, due
to high variance in reproductive success and frequent bottlenecks (as
discussed in Chapter 4). The second major explanation, put forward
by Maynard Smith and Haigh (1974), is that neutral levels
of diversity are also systematically reduced by the effects of linked
selection. In large populations, selective sweeps and other forms of
linked selection may come to dominate over genetic drift as a source
of stochasticity in allele frequencies, potentially establishing an upper
limit to levels of diversity (Kaplan et al., 1989; Gillespie, 2000).
Figure 13.13: The relationship between (sex-averaged) recombination rate and synonymous site pairwise diversity (π) in Drosophila melanogaster. The curve is the predicted relationship between π and recombination rate, obtained by fitting the recurrent hitchhiking equation (13.13) to this data using non-linear least squares via the nls() function in R. Data from Shapiro et al. (2007), kindly provided by Peter Andolfatto, see Sella et al. (2009) for details. Code here.
One strong line of evidence for the action of linked selection in

reducing levels of polymorphism is the positive correlation between
putatively neutral diversity and recombination seen in a number of
species, as, all else being equal, linked selection should remove diver-
sity more quickly in regions of low recombination (Aguadé et al.,
1989; Begun and Aquadro, 1992; Wiehe and Stephan,
1993b; Cutter and Choi, 2010; Cai et al., 2009). For example,
Drosophila melanogaster diversity levels are much lower in genomic
regions of low recombination (see Figure 13.13). This pattern can not
be explained by differences in mutation rate between low and high re-

combination regions as this pattern is not seen strongly in divergence
data among species.
These patterns could reflect the action of selective sweeps happen-
ing recurrently along the genome. In the next section we’ll present a
model for how levels of genetic diversity should depend on recombi-
nation and the density of functional sites under a model of recurrent
selective sweeps. However, other forms of linked selection can impact
genetic diversity in similar ways. For example, linked genetic diversity
is continuously lost from natural populations due to the removal of
haplotypes that carry deleterious alleles (Charlesworth et al.,
1995; Hudson and Kaplan, 1995b); this is called the ’background
selection’ model. Below we’ll discuss the background selection model
and its basic predictions.
More generally, a wide range of models of selection predict the
removal of neutral diversity linked to selected sites. This is because
the diversity-reducing effects of high variance in reproductive success
are compounded over the generations when there is heritable variance
in fitness (Robertson, 1961; Santiago and Caballero, 1995,
1998; Barton, 2000). Many different modes of linked selection likely
contribute to these genome-wide patterns of diversity; the present
challenge is how to differentiate among these different modes.
13.1.1 A simple recurrent model of selective sweeps

To explain how a constant influx of sweeps could impact levels of
diversity, here we will develop a model of recurrent selective sweeps.
Imagine we sample a pair of neutral alleles at a locus a genetic
distance r away from a locus where sweeps are initiated within the
population at some very low rate ν per generation. The waiting time
between sweeps at our locus is exponentially distributed ∼ Exp(ν).
Each sweep rapidly transits through the population in τ generations,
such that each sweep is finished long before the next sweep (τ ≪ 1/ν ).
As before, the chance that our neutral lineage fails to recombine off
the sweep is pNR, such that the probability that our pair of lineages
are forced to coalesce by a sweep is e^{−rτ}. Our lineages therefore have
a very low probability
$$\nu e^{-r\tau} \tag{13.8}$$
of being forced to coalesce by a sweep per generation. If our lineages

do not coalesce due to a sweep, they coalesce at a neutral rate of 1/2N
per generation. Thus the average waiting time till a coalescent event
between our neutral pair of lineages due to either a sweep or a neutral
coalescent event is
$$E(T_2) = \frac{1}{\nu e^{-r\tau} + 1/(2N)} \tag{13.9}$$
Now imagine that the sweeps don’t occur at a fixed location with
respect to our locus of interest, but now occur uniformly at random
across our genome. The sweeps are initiated at a very low rate of νBP
per basepair per generation. The rate of coalescence due to sweeps
at a locus ℓ basepairs away from our neutral loci is νBP e−rBP ℓτ . If
our neutral locus is in the middle of a chromosome that stretches L
basepairs in either direction, the total rate of sweeps per generation
that could force our pair of lineages to coalesce is
$$2 \int_0^L \nu_{BP} e^{-r_{BP} \ell \tau} \, d\ell = \frac{2\nu_{BP}}{r_{BP}\tau} \left(1 - e^{-r_{BP} \tau L}\right) \tag{13.10}$$
so that if L is very large (rBP τ L ≫ 1), the rate of coalescence per
generation due to sweeps is 2νBP/(rBP τ). The total rate of coalescence
for a pair of lineages per generation is then
$$\frac{2\nu_{BP}}{r_{BP}\tau} + \frac{1}{2N} \tag{13.11}$$
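As a quick sanity check on equation (13.10), the sketch below integrates the per-basepair sweep rate numerically and compares it to the closed form and to the large-L limit 2νBP/(rBP τ). The function names and all parameter values are illustrative assumptions (chosen so that rBP τ L = 10), not estimates from the text.

```python
import math

def sweep_rate_numeric(nu_bp, r_bp, tau, L, n_steps=20_000):
    """Midpoint-rule estimate of 2 * integral_0^L nu_bp * exp(-r_bp * l * tau) dl."""
    dl = L / n_steps
    total = 0.0
    for k in range(n_steps):
        l_mid = (k + 0.5) * dl  # midpoint of the k-th slice, in basepairs
        total += nu_bp * math.exp(-r_bp * l_mid * tau) * dl
    return 2.0 * total  # factor of 2: sweeps on either side of the neutral locus

def sweep_rate_closed_form(nu_bp, r_bp, tau, L):
    """Right-hand side of equation (13.10)."""
    return (2.0 * nu_bp / (r_bp * tau)) * (1.0 - math.exp(-r_bp * tau * L))

# Illustrative (assumed) parameter values, chosen so that r_bp * tau * L = 10.
nu_bp, r_bp, tau, L = 1e-12, 1e-8, 1000.0, 1e6

numeric = sweep_rate_numeric(nu_bp, r_bp, tau, L)
closed = sweep_rate_closed_form(nu_bp, r_bp, tau, L)
large_L_limit = 2.0 * nu_bp / (r_bp * tau)

print(numeric, closed, large_L_limit)
```

With these assumed values the three numbers should agree closely, which is the sense in which distant sweeps contribute almost nothing once rBP τ L ≫ 1.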
So our average time till a pair of lineages coalesce is
$$E(T_2) = \frac{1}{2\nu_{BP}/(r_{BP}\tau) + 1/(2N)} = \frac{2N r_{BP}}{4N\nu_{BP}/\tau + r_{BP}} \tag{13.12}$$
such that our expected pairwise diversity (π = 2µE(T2 )) in a region

with recombination rate rBP that experiences sweeps at rate νBP is
$$E(\pi) = \pi_0 \frac{r_{BP}}{4N\nu_{BP}/\tau + r_{BP}} \tag{13.13}$$
where π0 is our expected diversity without any selective sweeps,

(π0 = θ = 4N µ). The expected diversity increases with rBP , as
higher recombination rates decrease the likelihood a neutral allele
hitchhikes along with a sweep and is thus forced to coalesce by the
sweep. Expected diversity decreases with νBP , as a greater density
of functional sites experiencing sweeps increases the chance of being
linked to a nearby sweep. As we move to high rBP , assuming that νBP
doesn’t increase with rBP , our level of diversity should plateau to θ,
the level of genetic diversity of a neutral site completely unlinked to
any selected loci. If we assume that our genome experiences a constant
rate of sweeps of a given strength, i.e. that 4N νBP/τ is a constant, we
can fit the variation in π across regions that vary in their recombi-
nation rate (rBP ) to estimate a population’s rate of recurrent sweeps
per basepair. An example of fitting this curve to data from Drosophila
melanogaster is shown in Figure 13.13; see Wiehe and Stephan
(1993a) for an early example of fitting a similar recurrent hitchhiking
model to such data. The parameter giving us this best-fitting curve is
4N νBP/τ ≈ 7 × 10^{−9}. With an effective population size of a million and
assuming that the sweeps take a thousand generations to reach fixation,
we find this implies νBP ≈ 10^{−12}. Thus, a really low rate of moderately
strong sweeps, roughly one every megabase every million generations,
is all we need to explain the profound dip in diversity seen in
regions of the genome with low recombination. However, sweeps from
positively selected alleles are not the only cause of genome-wide signals
of linked selection. Selection against deleterious alleles can also drive
these patterns.
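The back-of-the-envelope calculation above can be sketched in a few lines. The function names are hypothetical, the fitted value 7 × 10^{−9} and the choices N = 10^6 and τ = 1000 are taken from the text, and the conversion of 1 cM/Mb to 10^{−8} recombination events per basepair is an assumption about units.

```python
def hitchhiking_diversity(r_bp, pi0, sweep_param):
    """Equation (13.13): E(pi) = pi0 * r_bp / (sweep_param + r_bp),
    where sweep_param stands for the compound parameter 4 * N * nu_BP / tau."""
    return pi0 * r_bp / (sweep_param + r_bp)

def sweep_rate_per_bp(sweep_param, N, tau):
    """Solve sweep_param = 4 * N * nu_BP / tau for nu_BP."""
    return sweep_param * tau / (4.0 * N)

sweep_param = 7e-9  # fitted value of 4 N nu_BP / tau quoted in the text
N = 1e6             # effective population size used in the text
tau = 1e3           # sweep duration in generations used in the text

nu_bp = sweep_rate_per_bp(sweep_param, N, tau)
print("nu_BP per basepair per generation:", nu_bp)
print("generations between sweeps per megabase:", 1.0 / (nu_bp * 1e6))

# Relative diversity pi/pi0 across recombination rates (assumed unit conversion).
CM_PER_MB_TO_PER_BP = 1e-8
rec_rates_cm_per_mb = [0.1, 0.5, 1.0, 2.5]
relative_diversity = [
    hitchhiking_diversity(rec * CM_PER_MB_TO_PER_BP, 1.0, sweep_param)
    for rec in rec_rates_cm_per_mb
]
for rec, rel in zip(rec_rates_cm_per_mb, relative_diversity):
    print(rec, "cM/Mb ->", round(rel, 3))
```

The exact value of νBP under these inputs is a little under 2 × 10^{−12}, which is the order of magnitude quoted in the text.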
13.1.2 Background selection

Populations experience a constant influx of deleterious mutations at
functional loci while selection acts to purge them from the population,
thus preventing deleterious substitutions and maintaining function at
these loci. As we discussed in Chapter 10, this balance between muta-
tion and selection results in a constant level of deleterious variation in
the population. The constant selection against this deleterious varia-
tion has effects on diversity at linked sites. Each deleterious mutation
arises at random on a haplotype in the population, and as selection
purges this mutation, it removes with it any neutral alleles that were
also on this haplotype. This constant removal of linked alleles from the
population acts to reduce diversity in regions surrounding functional
loci (Hudson and Kaplan, 1995a; Nordborg et al., 1996), an
effect known as background selection (BGS).
What proportion of our haplotypes are free of deleterious mutations
in any given generation, and so free to contribute to future genera-
tions? Well, under mutation-selection balance, a constrained locus
with a mutation rate µ towards deleterious alleles that experience a
selection coefficient sh against them in heterozygotes, will result in
a fraction µ/sh of chromosomes carrying the deleterious allele. Some of
these haplotypes may be passed on to the next generation, but if they are fully
linked to the deleterious locus they will all eventually be lost because
they carry a deleterious mutation at a site under constraint. Thus, for
a neutral polymorphism completely linked to a constrained locus, only
2N (1 − µ/sh) alleles get to contribute to future generations. Therefore,
the level of pairwise diversity in a constant population due to BGS at
such a locus will be
E[π] = 2µ × 2N (1 − µ/sh) = π0 (1 − µ/sh) (13.14)
where π0 = 4N µ, the level of neutral pairwise diversity in the absence
of linked selection.
The effects of background selection are more pronounced in regions
of low recombination, where neutral alleles are less able to recombine
off the background of deleterious alleles. Thus, under background
selection, we also expect to see reduced diversity in regions of lower
recombination.
BGS: The balance between a steady flux of deleterious mutations and
purifying selection generates a stable partition of chromosomes in a
population, depending on how many deleterious mutations they carry.
Chromosomes with deleterious mutations will be eliminated relatively
quickly from the population by purifying selection, but this class is
constantly replenished by new deleterious mutations. In the absence of
recombination, a new neutral mutation can remain in the population for
a long period of time and rise to high population frequencies only if
it appears on a gamete that is free of deleterious mutations, and hence
is not destined to be rapidly eliminated. The effect of this
“background selection” against deleterious mutations is a reduction in
the level of neutral polymorphism [61], as well as a downward shift in
their population frequencies, because of the relative excess of
short-lived (and hence low frequency) neutral mutations [63].
Sella G, Petrov DA, Przeworski M, Andolfatto P (2009) Pervasive Natural Selection in the Drosophila Genome?. PLOS Genetics 5(6): e1000495.
https://doi.org/10.1371/journal.pgen.1000495
https://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1000495
Figure 13.14: A cartoon depiction of a region for 10 haplotypes
experiencing background selection. Neutral mutations are shown as gray
circles, and deleterious mutations in red. Over time, chromosomes
carrying deleterious mutations are removed from the population, such
that most individuals are descended from a subset of chromosomes free
of deleterious alleles (highlighted here by orange boxes). Mutation is
constantly generating new deleterious alleles on the background of
chromosomes previously free of deleterious alleles. Figure modified
from Sella et al. (2009), licensed under CC BY 4.0.
For a neutral locus that is a recombination fraction r away from a
locus subject to constraint, the level of diversity is
$$E[\pi] = \pi_0 \left(1 - \frac{\mu sh}{2(r + sh)^2}\right) \tag{13.15}$$
As we move away from a locus experiencing purifying selection, we
increase r, and diversity should recover. For example, moving away
from genic regions in the maize genome we see the average level of
diversity recover. This occurs in both maize and teosinte, the wild
progenitor of maize. The dip in diversity around non-synonymous sites
is stronger in teosinte, perhaps because the accelerated drift due to
the bottleneck in maize may have somewhat released constraint on
sites where very weakly deleterious alleles segregated previously at
mutation-selection balance.
Figure 13.15: Relative diversity compared to the mean diversity in
windows ≥ 0.01 cM as a function of the distance to the nearest gene
(x-axis: Distance to Gene (cM); y-axis: Relative pairwise diversity).
See (Beissinger et al., 2016) for details. Figure licensed under
CC BY 4.0 by Jeff Ross-Ibarra.
More generally, if a neutral locus is surrounded by L loci experiencing
purifying selection at recombination distances r1 , · · · , rL , then
compounding equation (13.15) across these loci, the expected reduced
diversity is approximately
$$E[\pi] = \pi_0 \prod_{i=1}^{L} \left(1 - \frac{\mu sh}{2(r_i + sh)^2}\right) \approx \pi_0 \exp\left(-\sum_{i=1}^{L} \frac{\mu sh}{2(r_i + sh)^2}\right) \tag{13.16}$$
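To get a feel for how small per-locus reductions compound, here is a minimal sketch that evaluates the product over loci from equation (13.15) and compares it with the exponential approximation. The function names, the per-locus deleterious mutation rate, the selection coefficient, and the spacing of the loci are all made-up illustrative assumptions.

```python
import math

def bgs_single_locus_factor(mu, sh, r):
    """Relative diversity pi/pi0 next to one constrained locus, equation (13.15)."""
    return 1.0 - mu * sh / (2.0 * (r + sh) ** 2)

def bgs_product(mu, sh, distances):
    """Compound the single-locus factor across loci at the given distances."""
    result = 1.0
    for r in distances:
        result *= bgs_single_locus_factor(mu, sh, r)
    return result

def bgs_exponential(mu, sh, distances):
    """Exponential approximation to the product: exp(-sum of per-locus terms)."""
    return math.exp(-sum(mu * sh / (2.0 * (r + sh) ** 2) for r in distances))

# Assumed values: 1000 constrained loci spaced 1e-5 apart in recombination distance.
mu, sh = 1e-6, 0.01
distances = [i * 1e-5 for i in range(1, 1001)]

product_value = bgs_product(mu, sh, distances)
exp_value = bgs_exponential(mu, sh, distances)
print("nearest locus alone:", bgs_single_locus_factor(mu, sh, distances[0]))
print("product over loci:  ", product_value)
print("exp approximation:  ", exp_value)
```

Each locus on its own barely changes diversity under these assumptions; it is the sum over many linked loci that produces a noticeable reduction, and the product and exponential forms are then nearly indistinguishable.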
To model an average neutral locus in a genomic region with a given
recombination rate, we can imagine that our neutral locus is situated
in the center of a large region with total recombination rate R and
total deleterious mutation rate U , where U = µL. Then our expression
for diversity, equation (13.16), simplifies to
E[π] ≈ π0 exp (−U/(sh+R)) ≈ π0 exp (−U/R) . (13.17)
In this last approximation, we assume that we’re looking at a large

region, with R ≫ sh . Note that much like genetic load, equation
(11.8), this expression depends only on the total deleterious mutation
rate. Any dependence on the selection coefficient drops out, as weakly
selected mutations segregate in the population at higher frequencies,
but are also removed from the population more slowly, allowing more
of the genome to recombine off the deleterious background.
Figure 13.16: The relationship between recombination rate and
synonymous site pairwise diversity (π) in D. melanogaster, as in
Figure 13.13 (x-axis: rec rate (cM/Mb); y-axis: Syn diversity (%);
legend: BGS, Hitchhiking). The red curve is the predicted relationship
between π and recombination rate, obtained by fitting the BGS equation
(13.17) to this data using non-linear least squares via the nls()
function in R. The blue line is the recurrent hitchhiking equation
line from Figure 13.13. Code here.
For a first go at fitting this to genome-wide data, we could look
at diversity in windows of length W bp (as in Figure 13.16). If we
assume that there is a constant rate of deleterious mutation per base
pair, µBP , then U = µBP W . Furthermore, if our genomic window
has a recombination rate rBP per base-pair, our total genetic length
is R = rBP W . Making these substitutions in equation (13.17), our
window size cancels out to give
$$E[\pi] \approx \pi_0 \exp\left(-\mu_{BP}/r_{BP}\right) \tag{13.18}$$
Looking across windows that vary in their recombination rate, i.e.
rBP , we can fit equation (13.18) to data to estimate µBP . An example
of doing this to data from D. melanogaster is shown in Figure 13.16,
yielding an estimate of the deleterious mutation rate of µBP ≈ 3.2 ×
10^{−9}. This is roughly on the same order as the mutation rate per
base pair in D. melanogaster, and so this deleterious mutation rate
estimate is somewhat high as it would require most of the genome to
be constrained, but as a first approximation it’s not terrible. Note
how similar the fit is to a model of hitchhiking, suggesting that both
BGS and hitchhiking are capable of explaining the broad relationship
between diversity and recombination seen in D. melanogaster and
other species.
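As a rough illustration of how equation (13.18) can be fit, the sketch below uses a different and simpler technique than the nls() fit described in the text: taking logs gives log π = log π0 − µBP/rBP, which is linear in 1/rBP and can be fit by ordinary least squares. The data here are synthetic and noise-free, generated from the model itself with assumed values of π0 and µBP; they are not the D. melanogaster data, and the function name and unit conversion are assumptions.

```python
import math

def fit_bgs_loglinear(r_bp_values, pi_values):
    """Least-squares fit of log(pi) = log(pi0) - mu_bp * (1 / r_bp).
    Returns (pi0_estimate, mu_bp_estimate)."""
    xs = [1.0 / r for r in r_bp_values]
    ys = [math.log(p) for p in pi_values]
    n = len(xs)
    x_mean = sum(xs) / n
    y_mean = sum(ys) / n
    sxx = sum((x - x_mean) ** 2 for x in xs)
    sxy = sum((x - x_mean) * (y - y_mean) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_mean - slope * x_mean
    return math.exp(intercept), -slope

# Synthetic, noise-free data generated from equation (13.18) with assumed parameters.
CM_PER_MB_TO_PER_BP = 1e-8  # assumed unit conversion
true_pi0, true_mu_bp = 0.04, 3.2e-9
rec_rates_cm_per_mb = [0.25, 0.5, 1.0, 1.5, 2.0, 2.5]
r_bp_values = [rec * CM_PER_MB_TO_PER_BP for rec in rec_rates_cm_per_mb]
synthetic_pi = [true_pi0 * math.exp(-true_mu_bp / r) for r in r_bp_values]

est_pi0, est_mu_bp = fit_bgs_loglinear(r_bp_values, synthetic_pi)
print("estimated pi0:", est_pi0, "estimated mu_BP:", est_mu_bp)
```

On real data the log transform changes how noise is weighted relative to a non-linear least squares fit on π itself, so the two approaches need not give identical estimates.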
Figure 13.17: Observed (black line) and predicted pairwise diversity
across chromosome 1, from a background selection model that assumes a
uniform mutation rate (red line) or a mutation rate that varies with
local human/dog divergence (blue line) (x-axis: Position (Mb);
y-axis: Diversity). Figure from (McVicker et al., 2009), licensed
under CC BY 4.0.
As our annotations of functional regions of the genome have im-

proved, so have our methods to infer background selection. A more
rigorous version of this analysis today would incorporate variation in
coding density among windows into the parameter µBP . With de-
tailed genomic annotations showing coding regions and constrained
non-coding regions, we can also move beyond just analyzing broad-
scale patterns. For example, McVicker et al. (2009) fit a model
of background selection to putatively neutral pairwise diversity along
the human genome, using equation 13.16 to estimate the effect of BGS
at each locus, weighing the genetic distance to all of the surround-
ing coding regions and constrained non-coding sites. This allowed
McVicker et al. (2009) to estimate mutation rates and average
selection coefficients acting against deleterious alleles in these regions
of the genome. This best fitting model also allowed them to predict
diversity levels along the genome, a section of which is shown in figure
13.17. Thus, broad-scale features of polymorphism along the genome
are well described by background selection (or by linked selection more
generally).
The deleterious mutation rates estimated by McVicker et al.
(2009) from fitting a model of BGS were again too high, as in the
Drosophila example above, suggesting that BGS alone is not sufficient
to explain all of the effect of linked selection. But how then do we go
about distinguishing the impact of BGS from hitchhiking?
Distinguishing the impact of hitchhiking from background selection
in genome-wide data. A variety of approaches have been taken to
start to separate the effects of hitchhiking from background selection.
Much of the strongest evidence showing the effects of both comes from
Drosophila melanogaster and we review some of that evidence here.
Hitchhiking is expected to have systematic effects on the neutral site
frequency spectrum, distorting it towards rare minor alleles (reflecting
the slow recovery of diversity following a sweep). Therefore, we should
expect a distortion of summary statistics such as Tajima’s D in regions
of low recombination if hitchhiking is contributing to the reduction in
diversity in these regions (Braverman et al., 1995; Przeworski,
2002; Kim, 2006). In D. melanogaster, there is a greater skew towards
rare alleles at putatively neutral sites in regions of low recombination
(Andolfatto and Przeworski, 2001; Shapiro et al., 2007),
see left panel of Figure 13.18. However, while this skew isn’t expected
under simple models of strong background selection, other models of
background selection can lead to such patterns.
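To make the link between a skew towards rare alleles and Tajima’s D concrete, here is a minimal sketch that computes D from an unfolded site frequency spectrum using the standard constants from Tajima (1989). The function name and the three example spectra are made-up illustrations, not data from the studies cited above.

```python
import math

def tajimas_d(sfs):
    """Tajima's D from an unfolded site frequency spectrum.
    sfs[i - 1] is the number of segregating sites at which the derived allele
    is carried by i of the n sampled chromosomes, for i = 1, ..., n - 1."""
    n = len(sfs) + 1
    S = sum(sfs)  # number of segregating sites
    # Pairwise diversity: a site at count i differs in 2 i (n - i) / (n (n - 1)) of pairs.
    pi = sum(c * 2.0 * i * (n - i) / (n * (n - 1)) for i, c in enumerate(sfs, start=1))
    a1 = sum(1.0 / i for i in range(1, n))
    a2 = sum(1.0 / i ** 2 for i in range(1, n))
    b1 = (n + 1) / (3.0 * (n - 1))
    b2 = 2.0 * (n * n + n + 3) / (9.0 * n * (n - 1))
    c1 = b1 - 1.0 / a1
    c2 = b2 - (n + 2) / (a1 * n) + a2 / a1 ** 2
    e1 = c1 / a1
    e2 = c2 / (a1 ** 2 + a2)
    return (pi - S / a1) / math.sqrt(e1 * S + e2 * S * (S - 1))

# Hypothetical spectra for a sample of n = 10 chromosomes.
neutral_expectation = [10.0 / i for i in range(1, 10)]  # proportional to 1/i
excess_rare = [20, 2, 1, 0, 0, 0, 0, 0, 0]              # skewed towards singletons
excess_intermediate = [0, 0, 0, 5, 10, 5, 0, 0, 0]      # skewed towards common alleles

d_neutral = tajimas_d(neutral_expectation)
d_rare = tajimas_d(excess_rare)
d_intermediate = tajimas_d(excess_intermediate)
print(d_neutral, d_rare, d_intermediate)
```

A spectrum matching the neutral expectation gives D of zero, an excess of rare alleles (as expected after a sweep) gives negative D, and an excess of intermediate-frequency alleles gives positive D.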
Figure 13.18: Left) Average Tajima’s D in genomic windows plotted
against their recombination rate in D. melanogaster (x-axis: rec rate
(cM/Mb); y-axis: Tajima's D). Data from Shapiro et al. (2007). Right)
Synonymous pairwise diversity in genomic windows as a function of the
density of non-synonymous substitutions in the window (x-axis: dN;
y-axis: πS). Data from Andolfatto (2007). Code here.
Another prediction of the hitchhiking model, where an allele sweeps

to fixation, is that there should be a functional substitution associ-
ated with each sweep. Or, to flip that around, we might expect to
see a greater impact of hitchhiking where there are more functional
substitutions. For example, regions surrounding non-synonymous sub-

stitutions should have lower levels of diversity, if a high fraction of
non-synonymous substitutions are adaptive. Again, this pattern is
seen in D. melanogaster (Andolfatto, 2007; Macpherson et al.,
2007; Sattath et al., 2011b), right side of Figure 13.18.
Figure 13.19: Left) Scaled synonymous pairwise diversity levels
around non-synonymous (NS) and synonymous (SYN) substitutions in
D. melanogaster. Right) Predicted scaled diversity levels around
non-synonymous substitutions based on models including background
selection (BS), classic sweeps (CS) and both (BS & CS). Figure from
Elyashiv et al. (2016), licensed under CC BY 4.0.
Pushing this idea further, we can look at the dip in diversity sur-
rounding a non-synonymous substitution averaged across all the sub-
stitutions in the genome. Elyashiv et al. (2016) found a stronger
dip in diversity around non-synonymous substitutions than synony-
mous substitutions (see also Sattath et al., 2011a). Extending the
model of McVicker et al. (2009) to fit a model of background se-
lection and hitchhiking to putative neutral diversity along the genome,
they found that the dip in diversity around synonymous substitu-
tions comes mostly from BGS. But to fully explain the dip in diversity
around non-synonymous substitutions, a reasonable proportion of
these non-synonymous substitutions have to have been accompanied
by a classic (hard) sweep. The majority of these sweeps are estimated
to be due to very weak selection, with selection coefficients < 10^{−4}.
Furthermore, Elyashiv et al. (2016) estimated a 77 - 89% reduc-
tion in neutral diversity due to selection on linked sites, and concluded
that no genomic window was entirely free of the effects of selection.
Thus linked selection has a profound effect in some species such as
Drosophila melanogaster.
14
Interaction of Multiple Selected Loci
Selection doesn’t act on loci in isolation, and the fates of selected

alleles in the genome are correlated. In the prior chapter we looked
into how selected loci affected neutral loci. Here we’ll explore the
interaction of multiple selected loci. Throughout this chapter we’ll
see how multi-locus dynamics are key to understanding hypotheses
about the evolutionary significance of sexual reproduction; after all,
the primary evolutionary costs and benefits of sex arise from the
independent assortment of chromosomes and recombination. Multi-locus dynamics
are also often key to understanding how new species arise and are
maintained. From a population-genetic perspective, species are sets of
traits and alleles held together by assortative mating and selection.
14.1 Why sex?
The vast majority of eukaryotic organisms reproduce sexually. Sex-

ual reproduction, the fusion of two cells to form a zygote (syngamy)
followed by meiosis, represents an ancient feature of eukaryotes. How-
ever, the ubiquity of sex is not just due to sex being a fixed ancestral
state of eukaryotes. Many eukaryotic species are not obligately sexual
and can reproduce clonally (i.e. asexually), e.g. vegetative growth in
plants. However, they will reproduce clonally only for a short while
before having sex again. There are even asexual vertebrate lineages.
For example, there are a number of obligately parthenogenic species
of whiptail lizard (Aspidoscelis), where every individual in the species
is female and reproduces clonally. However, only a small fraction of
eukaryote species are obligate asexuals, and these species appear to be
short-lived twigs on the eukaryotic tree of life.
Sexual reproduction is confined to eukaryotes but most non-eukaryotic
species have some form of genetic exchange where genetic material is
acquired and incorporated into their genomes via a range of
mechanisms. These non-eukaryotic mechanisms often seem to have evolved
in part because they facilitate genetic exchange.
Thus, sex and genetic exchange are incredibly widespread. Yet sex
has substantial short-term costs.
The costs of sex. Three broad costs of sex have often been hypothe-
sized:
1. The cost of mating. Finding and attracting a mate are costly and
may be impossible, and mating can be dangerous.
2. The cost of recombination. Why risk breaking up a winning
genotype? If you’ve managed to survive to reproduce your genotype
likely can’t be a terrible fit to the environment. But if you
engage in sexual reproduction, i.e. meiosis, you’re shuffling up your
genome with that of your partner. There’s no guarantee that this
new genotype will work well in the current environment.
3. The two-fold cost of sex (Smith, 1971). The offspring of sexual or-
ganisms have two parents. Therefore, sexual parents only contribute
half of their genome to their offspring, while asexual organisms
contribute their entire genome to the next generation. Thus a sex-
ual organism has to have twice as many children to leave the same
number of copies of their genome to the next generation. That
might be doable if both sexual parents were equally committed to
contributing to those offspring. However, that is rarely the case.
This cost is sometimes called the two-fold cost of males, as males
often provide little in terms of resources to their children. Thus
any allele that makes its host asexual should initially spread all else
being equal.
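The two-fold cost can be illustrated with a deliberately simple sketch. It assumes that every female, sexual or asexual, leaves the same number of offspring, that asexual females produce only asexual daughters, and that sexual females produce half sons and half daughters. Under those assumptions (which are ours, for illustration) the frequency p of asexuals among females becomes 2p/(1 + p) after one generation. The function name and starting frequency are hypothetical.

```python
def asexual_frequency_next(p):
    """Frequency of asexual females among all females after one generation.
    Asexual daughters are proportional to p, sexual daughters to (1 - p) / 2."""
    return p / (p + (1.0 - p) / 2.0)

p = 0.001  # assumed starting frequency of a newly arisen asexual lineage
trajectory = [p]
for _ in range(20):
    p = asexual_frequency_next(p)
    trajectory.append(p)

for generation in (0, 1, 5, 10, 20):
    print(generation, trajectory[generation])
```

Under these assumptions the odds p/(1 − p) double every generation, so a rare asexual lineage initially doubles in frequency each generation, all else being equal.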
Yet sex and other forms of genetic exchange persist, despite these
short-term advantages to asexual reproduction. Indeed asexual lin-
eages often arise and spread within some sexual populations due to
these advantages.
The benefits of sex. Numerous benefits to sexual reproduction have

been suggested. Throughout this chapter we’ll encounter a range of
models that touch on the advantages of sex. We’ll see that selection
allows beneficial alleles to shed their background of deleterious alle-
les as they sweep through the population. In the absence of sex and
recombination, beneficial alleles can block each other’s progression to
fixation, so called ‘clonal interference’. Another major advantage of sex
is that beneficial alleles can be brought together on the same genetic
background via recombination, allowing faster rates of adaptation.
14.2 A two locus model of selection and recombination.
Models involving many selected loci can be very challenging to an-
alyze. Luckily for us many of the key insights of the interaction of
selection and recombination can be understood in relatively intuitive
terms, and demonstrated using two locus models.
Consider two biallelic loci segregating for A/a and B/b. There are
four haplotypes, AB, Ab, aB, ab, which for simplicity we label 1-4.
The frequency of our four haplotypes are x1 , x2 , x3 , and x4 . Each in-
dividual has a genotype consisting of two haplotypes; we label wij the
fitness of an individual with the genotype made up of haplotype i and
j (we assume that wij = wji , i.e. there are no parent of origin effects).
Assuming that these fitnesses reflect differences due to viability selec-
tion, and that individuals mate at random, we can write the following
table of our genotype proportions after selection:
        AB             Ab              aB              ab
AB      w11 x1^2       w12 2 x1 x2     w13 2 x1 x3     w14 2 x1 x4
Ab      •              w22 x2^2        w23 2 x2 x3     w24 2 x2 x4
aB      •              •               w33 x3^2        w34 2 x3 x4
ab      •              •               •               w44 x4^2
This follows from assuming that our haplotypes are brought together
at random (HWE), then discounted by their fitnesses. Our mean
fitness w̄ is the sum of all the entries in the table, so dividing by w̄
normalizes the complete table to sum to one. The frequency of the AB
haplotype (1) in the next generation of gametes is
$$x_1' = \frac{w_{11} x_1^2 + \tfrac{1}{2} w_{12}\, 2 x_1 x_2 + \tfrac{1}{2} w_{13}\, 2 x_1 x_3 + \tfrac{1}{2}(1 - r) w_{14}\, 2 x_1 x_4 + \tfrac{1}{2} r w_{23}\, 2 x_2 x_3}{\bar{w}} \tag{14.1}$$
This is a bit of a mouthful, but each of the terms is easy to under-
stand. Each of the HWE genotype frequencies (e.g. 2x1 x2 ) is weighted
by its fitness relative to the mean fitness (wij /w̄), and by its proba-
bility of transmitting the AB haplotype to the next generation. For
example, AB/Ab individuals (1/2) transmit the AB haplotype only
half the time. The final two terms include the recombination fraction
(r). The first term involving recombination refers to the AB/ab geno-
type (1/4), who with probability (1−r)/2 transmits a non-recombinant
AB haplotype to the gamete. Similarly, the second term refers to the
Ab/aB genotype; a proportion r/2 of its gametes carry the recombi-
nant AB haplotype.
In the single locus case, we defined the marginal fitness of an allele.
Here it will help us to define the marginal fitness of the ith haplotype:
$$\bar{w}_i = \sum_{j=1}^{4} w_{ij} x_j \tag{14.2}$$
This is the fitness of the ith haplotype averaged over all of the diploid
genotypes it could occur in, weighted by their probability under ran-
dom mating. Using this notation, and with some rearrangement of
equation (14.1), we obtain
$$x_1' = \frac{x_1 \bar{w}_1 - w_{14} r D}{\bar{w}} \tag{14.3}$$
Here we have assumed that w23 = w14 , i.e. that the fitness of AB/ab
individuals is the same as Ab/aB individuals (i.e. that fitness de-
pends only on the alleles carried by an individual, and not on which
chromosome they are carried; this assumption is sometimes called no
cis-epistasis).
We can then write the change in the frequency of our 1 haplotype
as
$$\Delta x_1 = \frac{x_1 (\bar{w}_1 - \bar{w}) - r w_{14} D}{\bar{w}} \tag{14.4}$$
Generalizing this result, we write the change in any haplotype i from
our set of four haplotypes as
$$\Delta x_i = \frac{x_i (\bar{w}_i - \bar{w}) \mp r w_{14} D}{\bar{w}} \tag{14.5}$$
where D = x1 x4 − x2 x3 , the coupling haplotypes 1 and 4 take the upper
sign (−rw14 D, as in equation 14.4), and the repulsion haplotypes 2 and
3 take the lower sign (+rw14 D). Note that the sum of these four ∆xi is
zero, as our haplotype frequencies sum to one.
So the change in the frequency of a haplotype (e.g. AB, haplotype
1) is determined by the interplay of two factors: First, the extent
to which the marginal fitness of our haplotype is higher (or lower)
than the mean fitness of the population (the magnitude and sign of
(w̄1 − w̄)/w̄). Second, whether there is a deficit or any excess of our
haplotype compared to linkage equilibrium (the magnitude and sign of
D), modified by the strength of recombination. This tension between
selection promoting particular haplotypic combinations, and recom-
bination breaking up overly common haplotypes is the key to a lot of
interesting dynamics and evolutionary processes.
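The bookkeeping in this section is easy to check numerically. The sketch below computes the change in haplotype frequencies in two ways: from the compact form of equation (14.5), taking D = x1 x4 − x2 x3 so that the recombination term enters with a minus sign for haplotypes 1 and 4 (as in equation 14.4) and a plus sign for haplotypes 2 and 3, and by explicitly enumerating every genotype in the table and the gametes it produces. The fitness scheme (additive effects of A and B), the starting frequencies, and all function names are assumptions made for illustration.

```python
HAPLOTYPES = [(1, 1), (1, 0), (0, 1), (0, 0)]  # AB, Ab, aB, ab as (A copies, B copies)

def additive_fitness(s_a, s_b):
    """Hypothetical fitnesses: w_ij = 1 + s_a * (copies of A) + s_b * (copies of B).
    This scheme automatically satisfies w14 = w23."""
    return [[1.0 + s_a * (hi[0] + hj[0]) + s_b * (hi[1] + hj[1])
             for hj in HAPLOTYPES] for hi in HAPLOTYPES]

def delta_x(x, w, r):
    """Change in haplotype frequencies from equation (14.5); assumes w[0][3] == w[1][2]."""
    w_marg = [sum(w[i][j] * x[j] for j in range(4)) for i in range(4)]  # equation (14.2)
    w_bar = sum(x[i] * w_marg[i] for i in range(4))
    D = x[0] * x[3] - x[1] * x[2]
    sign = (-1.0, 1.0, 1.0, -1.0)  # minus for AB and ab, plus for Ab and aB
    return [(x[i] * (w_marg[i] - w_bar) + sign[i] * r * w[0][3] * D) / w_bar
            for i in range(4)]

def next_generation_explicit(x, w, r):
    """Next-generation haplotype frequencies by enumerating ordered genotypes and gametes."""
    index = {h: k for k, h in enumerate(HAPLOTYPES)}
    new = [0.0] * 4
    for i, hi in enumerate(HAPLOTYPES):
        for j, hj in enumerate(HAPLOTYPES):
            weight = x[i] * x[j] * w[i][j]  # genotype frequency after selection (unnormalized)
            new[i] += weight * (1 - r) / 2  # non-recombinant gametes
            new[j] += weight * (1 - r) / 2
            new[index[(hi[0], hj[1])]] += weight * r / 2  # recombinant gametes
            new[index[(hj[0], hi[1])]] += weight * r / 2
    total = sum(new)  # this is the mean fitness
    return [v / total for v in new]

x = [0.1, 0.2, 0.3, 0.4]  # assumed frequencies of AB, Ab, aB, ab
w = additive_fitness(s_a=0.02, s_b=0.05)
r = 0.1

dx = delta_x(x, w, r)
via_equation = [x[i] + dx[i] for i in range(4)]
explicit = next_generation_explicit(x, w, r)
print(via_equation)
print(explicit)
```

The two routes should agree to floating-point precision, and the four changes should sum to zero.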
14.3 Types of interaction between selection and recombination

Throughout the rest of the chapter we’ll discuss some general forms
of the interactions between selected loci and how recombination plays
into either facilitating or hindering selection. To illustrate these ideas
we make use of Muller diagrams (Muller, 1932), where we visualize
the allele dynamics in terms of a plot of the stacked frequencies over
time. All of our simulations use the same basic two locus dynamics
given by eqn (14.5). To keep things simpler we just discuss
the qualitative dynamics of these models, but many of these models
have been investigated in much more depth.
14.3.1 The hitchhiking of neutral alleles
Figure 14.1: A beneficial mutation B arises on the background of a
neutral allele whose initial frequency is pA = 10%. The beneficial
allele has a strong, additive selection coefficient of hs = 0.05.
(Panels: r=0.0005, r=0.005, r=0.05; x-axis: Generations; y-axis:
Frequencies of the haplotypes AB, Ab, aB, and ab.)
Let’s start by revisiting neutral hitchhiking in this two locus
setting. In the previous chapter we saw that neutral alleles can
hitchhike along with our selected allele if they are tightly linked enough.
Figure 14.1 shows the frequency trajectories of the various haplotypes
for a neutral allele (A) that is present at 10% frequency in the
population when our beneficial allele (B) arises on its background. When the
recombination rate (r) is low between the loci, A gets to hitchhike to
high frequency, but for higher recombination rates it only gets dragged
to intermediate frequencies. For the highest recombination rate shown
(r ≈ s) the neutral allele’s dynamics (pAb + pAB ) are barely changed
at all, as it recombines on and off the sweeping allele frequently and so
barely perceives the sweep.
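The qualitative pattern described here can be reproduced with a minimal deterministic sketch of the two locus recursion. This is not the code behind Figure 14.1: the starting frequency of the new beneficial mutation (10^{−4}), the number of generations, and the function names are assumptions, while hs = 0.05, pA = 10%, and the three recombination rates are taken from the figure.

```python
def step(x, w, r):
    """One generation of the two locus recursion (haplotype order AB, Ab, aB, ab).
    Uses D = x_AB * x_ab - x_Ab * x_aB and assumes w[0][3] == w[1][2]."""
    w_marg = [sum(w[i][j] * x[j] for j in range(4)) for i in range(4)]
    w_bar = sum(x[i] * w_marg[i] for i in range(4))
    D = x[0] * x[3] - x[1] * x[2]
    sign = (-1.0, 1.0, 1.0, -1.0)  # recombination removes AB and ab when D > 0
    return [x[i] + (x[i] * (w_marg[i] - w_bar) + sign[i] * r * w[0][3] * D) / w_bar
            for i in range(4)]

def neutral_hitchhiking(r, hs=0.05, p_a=0.10, new_mutation_freq=1e-4, generations=1000):
    """Final frequency of the neutral allele A after B, which arose on an A
    background, has swept. Fitness is additive in the number of B alleles."""
    n_b = [1, 0, 1, 0]  # copies of B carried by AB, Ab, aB, ab
    w = [[1.0 + hs * (n_b[i] + n_b[j]) for j in range(4)] for i in range(4)]
    x = [new_mutation_freq, p_a - new_mutation_freq, 0.0, 1.0 - p_a]
    for _ in range(generations):
        x = step(x, w, r)
    return x[0] + x[1]  # p_A = x_AB + x_Ab

final_p_a = {r: neutral_hitchhiking(r) for r in (0.0005, 0.005, 0.05)}
for r, p in final_p_a.items():
    print("r =", r, "final frequency of A:", round(p, 3))
```

Under these assumptions the neutral allele ends up at high frequency for the lowest recombination rate, at an intermediate frequency for the middle rate, and close to its starting frequency when r ≈ s.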
14.3.2 The hitchhiking of deleterious alleles

Deleterious alleles can also hitchhike along with beneficial mutations
if they are not too deleterious compared to the benefits offered by the
selected allele. Again our allele A is at 10% frequency in the popu-
lation in Figure 14.2, but this time it is deleterious and so initially
decreasing in frequency across the generations when the beneficial
mutation (B) arises on its background. If the loci are tightly linked,
and A were too deleterious, B would never get to take off in the population.
However, if the benefits of B outweigh the cost of A, even in
the case of no recombination between our loci, allele A gets to hitchhike
to fixation; it merely slows down B’s rate of increase and reduces their
combined fitness. With moderate amounts of recombination
between the loci, our deleterious allele starts to hitchhike, but before it can
get to fixation the beneficial allele manages to recombine off its background.
This recombinant aB haplotype, which has higher fitness as
it lacks the deleterious allele, can now sweep through the population,
displacing the AB haplotype. For higher recombination rates we have
to wait less long for a recombination event to break up the hitchhiking
deleterious allele, so the adaptive allele easily escapes its background. For
the purposes of illustration here we’ve used a relatively common
deleterious allele, but in reality these alleles will likely often be rare in
the population and at mutation-selection balance. If they are rare it is
unlikely that a beneficial mutation arises on a specific deleterious allele’s
background, but as we have seen there are likely going to be many
rare deleterious alleles in the population, so it is likely that beneficial
mutations may often have to contend with deleterious hitchhikers.
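To make the role of the recombination rate concrete, here is a minimal sketch of a deterministic two-locus haploid model (selection followed by recombination). It is not the code behind Figure 14.2: the multiplicative fitnesses, the values of s_A and s_B, the starting frequencies, and the function name two_locus_haploid are all assumptions chosen for illustration.

```python
def two_locus_haploid(x0, w, r, generations):
    """Deterministic haploid two-locus recursion: selection, then recombination.

    x0 and w are dicts keyed by haplotype ("ab", "Ab", "aB", "AB").
    Returns the list of haplotype-frequency dicts, one per generation.
    """
    x = dict(x0)
    history = [dict(x)]
    for _ in range(generations):
        w_bar = sum(x[h] * w[h] for h in x)            # mean fitness
        x = {h: x[h] * w[h] / w_bar for h in x}        # selection
        D = x["AB"] * x["ab"] - x["Ab"] * x["aB"]      # linkage disequilibrium
        x["AB"] -= r * D                               # recombination erodes D
        x["ab"] -= r * D
        x["Ab"] += r * D
        x["aB"] += r * D
        history.append(dict(x))
    return history


# Assumed parameters: A costs 1%, B gives a 10% advantage, fitnesses multiply.
s_A, s_B = 0.01, 0.10
w = {"ab": 1.0, "Ab": 1 - s_A, "aB": 1 + s_B, "AB": (1 - s_A) * (1 + s_B)}
# A starts at 10% frequency; B arises at low frequency on the A background.
x0 = {"ab": 0.9, "Ab": 0.099, "aB": 0.0, "AB": 0.001}

results = {}
for r in (0.0, 0.005, 0.05):
    traj = [x["Ab"] + x["AB"] for x in two_locus_haploid(x0, w, r, 2000)]
    results[r] = (max(traj), traj[-1])
    print(f"r={r}: peak freq of A = {max(traj):.3f}, final freq of A = {traj[-1]:.3g}")
```

With no recombination the aB haplotype is never created, so in this sketch A is carried along with B; with recombination A can rise for a while but is eventually left behind once aB haplotypes appear.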
Figure 14.2: The hitchhiking of a deleterious allele. The beneficial
allele B arises on the background of a deleterious allele A, and the
extent to which the A allele gets to hitchhike along depends on the
recombination rate. Panels: r = 0.0005, r = 0.005, and r = 0.05;
y-axis: Frequencies. Code here.

14.3.3 Clonal interference between favourable alleles.

When rates of sex and recombination are zero, or very low, positively
selected alleles can prevent each other from reaching fixation, and so the
rate of adaptation can be slowed. In the absence of sex and recombination,
when two positively selected alleles arise on different genetic backgrounds
in the population they cannot both fix (left side of Figure
14.3). They can initially increase in frequency, but necessarily compete
with each other when they become common. This is called selective
interference, or sometimes clonal interference. If one of the alleles
has a much larger selection coefficient it will fix, forcing the other allele
from the population, but when they are relatively equally matched
it may take some time for this situation to resolve itself, resulting in a
traffic jam in the population. Thus in an asexual population adaptive
alleles necessarily have to fix sequentially. However, with even a small amount
of recombination beneficial alleles can recombine on to each other’s
background, allowing them to fix in parallel (right side of Figure 14.3).

Figure 14.3: Interference between two positively selected alleles.
Left) The red and blue (A and B) beneficial alleles arise on different
haplotypes. They rise in frequency, but in the absence of recombination
only one can fix. This is shown in a Muller diagram, where pAB is
initially set to zero. Right) In the presence of recombination the
population can generate the recombinant (AB) haplotype, which can
subsequently fix. Panels: r = 0 and r = 0.001, showing the ab, Ab, aB, and
AB haplotypes; x-axis: Generations; y-axis: Frequencies; plot labels
sA = 0.08, sB = 0.06, sAB = 0.14. Code here.
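The contrast between the two panels of Figure 14.3 can be sketched with a deterministic haploid two-locus recursion. This is an illustrative sketch rather than the figure’s own code: reading the plot labels (0.08, 0.06, 0.14) as haplotype selection coefficients, the starting frequencies of 1% for each single mutant, and the function name iterate_haplotypes are all assumptions.

```python
def iterate_haplotypes(x, w, r, generations):
    """Deterministic haploid two-locus recursion: selection, then recombination."""
    x = dict(x)
    for _ in range(generations):
        w_bar = sum(x[h] * w[h] for h in x)            # mean fitness
        x = {h: x[h] * w[h] / w_bar for h in x}        # selection
        D = x["AB"] * x["ab"] - x["Ab"] * x["aB"]      # linkage disequilibrium
        x["AB"] -= r * D                               # negative D creates AB
        x["ab"] -= r * D
        x["Ab"] += r * D
        x["aB"] += r * D
    return x


# Assumed fitnesses 1 + s, with s read from the labels of Figure 14.3.
w = {"ab": 1.0, "Ab": 1.08, "aB": 1.06, "AB": 1.14}
# The two beneficial alleles start on different backgrounds; AB is absent.
start = {"ab": 0.98, "Ab": 0.01, "aB": 0.01, "AB": 0.0}

final = {r: iterate_haplotypes(start, w, r, 1000) for r in (0.0, 0.001)}
for r, x in final.items():
    print(f"r={r}:", {h: round(f, 4) for h, f in x.items()})
```

Without recombination the AB haplotype can never be formed in this sketch, so the two alleles compete and only the fitter single-mutant haplotype remains; with a small r the double mutant is generated and takes over.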
Given the rapid evolution of HIV we can see interference taking
place over very short time periods indeed. HIV uses its reverse
transcriptase (RT) gene to write itself from an RNA virus into its host’s
DNA, allowing HIV to hijack the host’s regulatory machinery, a critical
part of its life cycle. One of the early HIV drugs was Efavirenz, which
inhibits HIV’s RT protein. Sadly, mutations are common in the HIV
RT gene, and these mutations, in the presence of the drug, confer a
profound fitness advantage, allowing them to spread through the HIV
population in patients undergoing anti-HIV treatment. In Figure 14.4
we see that by day 224 after the start of drug treatment two different
drug-resistance amino-acid changes are beginning to spread within a
patient (also shown as a Muller diagram in Figure 14.5). Because these
alleles occur on different genetic backgrounds, with little chance for
genetic exchange between them, they interfere with each other’s progress
as they compete to fix within the population. Eventually the amino
acid change at site 188 wins out.

Figure 14.4: HIV sequences from a patient over the course of drug
treatment in the reverse transcriptase coding region. Figure cropped from
Williams and Pennings (2019).

Figure 14.5: Muller plot of the drug resistance interference dynamics from
Figure 14.4. Figure from Williams and Pennings (2019), licensed under
CC BY 4.0.
14.3.4 Epistatic combinations of alleles and the cost of recombination.

“Love, love will tear us apart again” –Joy Division.

Recombination comes at a cost. While recombination can bring beneficial
combinations of alleles together, it will also tear them apart. To
see this imagine a pair of alleles A and B at two loci that work very
well together, and offer a fitness advantage over the ancestral combination
of alleles a and b. You could for example imagine that A and B
are changes in a protein and its receptor, and that they offer a much
more efficient signalling response. However, imagine that A doesn’t
work with b, nor does the allele a work well with B. Perhaps the
protein made by allele A gums up the receptor b, and similarly for the
other combination.

Figure 14.6: Interaction between selection and recombination with
epistasis among loci. Haplotypes: ab (1), Ab (2), aB (3), AB (4), with
wAB > wab > wAb > waB (sAB = 0.05, sAb = −0.9, sab = 0). Panels: r = 0,
r = 0.05, and r = 0.1; y-axis: Frequencies. Code here.
The haplotype AB can spread from low frequency if recombina-

tion doesn’t break it apart at too high a rate. When recombination
rates are higher, recombination prevents either the A or the B allele
from spreading because recombination swops the A allele from the B
background onto the b background, where it suffers low fitness (and
similarly for the B allele). The ab haplotype doesn’t suffer the same
consequence because it is in the majority, so when recombination oc-
curs the a allele is usually recombined back on to the b background
with no consequence. Thus recombination can prevent the spread of
beneficial epistatic combinations of alleles. We’ll look into this more
when we discuss the evolution of recombination suppressors in Section
14.3.7.
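As a rough check on this argument, the sketch below iterates a deterministic haploid two-locus model with strong epistasis. It is only a sketch under stated assumptions: fitnesses are taken as 1 + s using the labels of Figure 14.6, the aB haplotype is assumed to share the −0.9 value given for Ab, AB is assumed to start at 5%, and the function name iterate_haplotypes is hypothetical.

```python
def iterate_haplotypes(x, w, r, generations):
    """Deterministic haploid two-locus recursion: selection, then recombination."""
    x = dict(x)
    for _ in range(generations):
        w_bar = sum(x[h] * w[h] for h in x)            # mean fitness
        x = {h: x[h] * w[h] / w_bar for h in x}        # selection
        D = x["AB"] * x["ab"] - x["Ab"] * x["aB"]      # linkage disequilibrium
        x["AB"] -= r * D                               # recombination breaks up AB and ab
        x["ab"] -= r * D
        x["Ab"] += r * D                               # ...creating the unfit Ab and aB
        x["aB"] += r * D
    return x


# Assumed fitnesses: AB slightly better than ab, the mixed haplotypes very poor.
w = {"ab": 1.0, "Ab": 0.1, "aB": 0.1, "AB": 1.05}
start = {"ab": 0.95, "Ab": 0.0, "aB": 0.0, "AB": 0.05}   # assumed starting point

final_AB = {r: iterate_haplotypes(start, w, r, 200)["AB"] for r in (0.0, 0.05, 0.1)}
for r, p in final_AB.items():
    print(f"r={r}: frequency of AB after 200 generations = {p:.4f}")
```

In this sketch the fate of AB at a given r depends on where it starts, which is why the starting frequency is flagged as an assumption.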
14.3.5 Muller’s ratchet

There is a constant influx of deleterious mutations along any chromosome
(red alleles in Figure 14.8). In asexual populations, or regions
of the genome lacking recombination, this leads to a nearly inevitable
decrease in fitness due to the loss of high-fitness haplotypes, a process
known as ‘Muller’s ratchet’ (Muller, 1964).

Figure 14.7: A Ratchet. A cog (b) with asymmetric teeth that can only
turn one way as the pawl (a) prevents it turning the other way.
Original sketch from Brockhaus Konversations-Lexikon, Vol. 10, 1894,
page 420. Georg Wiora (reworked by Dr. Schorsch). From wikimedia.
Licensed under CC BY-2.0

Different haplotypes vary in the number of deleterious alleles they
carry. The haplotypes carrying the most deleterious alleles can be lost
by drift, and by selection acting against them, but haplotypes carrying
high numbers of deleterious alleles are easily recreated by new
mutations. The converse can also happen: if the selection against
each of these deleterious alleles is relatively weak, the population can
accidentally lose the haplotype carrying the fewest deleterious alleles
(middle panel of Figure 14.8).
Once we have lost this haplotype it is hard to recreate, as that
would require unlikely back mutations to remove the deleterious
mutations from the population. After the loss of the least deleterious
haplotype, we have ratcheted up the mean number of deleterious mutations in
the population and ratcheted down the mean fitness of the population.
This will keep happening: by chance we can keep losing the haplotype
with the fewest deleterious alleles (bottom left panel of Figure 14.8). Thus
the number of deleterious alleles carried in our asexual population will
gradually increase. This may eventually doom asexual populations to
extinction, as their mean fitness declines over time.

Figure 14.8: A cartoon of haplotypes at three time points showing the
action of Muller’s ratchet in an asexual population. Columns: Asexual,
Sexual.
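The ratchet in an asexual population can be sketched with a small Wright-Fisher style simulation that tracks how many deleterious mutations each haplotype carries. Everything specific here is an assumption made for illustration: the population size N, the per-generation mutation probability U (at most one new mutation per offspring), the multiplicative cost s per mutation, the random seed, and the function name mullers_ratchet.

```python
import random


def mullers_ratchet(N=200, U=0.1, s=0.02, generations=500, seed=1):
    """Asexual Wright-Fisher sketch with no back mutation and no recombination.

    Returns the number of deleterious mutations carried by the
    least-loaded haplotype in each generation.
    """
    rng = random.Random(seed)
    pop = [0] * N                       # mutation count of each haplotype
    least_loaded = [min(pop)]
    for _ in range(generations):
        weights = [(1 - s) ** k for k in pop]              # multiplicative fitness
        parents = rng.choices(pop, weights=weights, k=N)   # selection and drift
        # Each offspring copies its parent and may gain one new mutation.
        pop = [k + (1 if rng.random() < U else 0) for k in parents]
        least_loaded.append(min(pop))
    return least_loaded


track = mullers_ratchet()
print("mutations on the best haplotype at the start and end:", track[0], track[-1])
```

Because offspring can only match or exceed their parent’s mutation count, the minimum in `track` can never decrease in this sketch; each increase corresponds to one click of the ratchet.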
In a sexual population, the same process can start. We can lose by
chance the haplotype with the fewest deleterious mutations (middle
right panel of Figure 14.8). However, recombination among deleterious
haplotypes can recreate this haplotype carrying few deleterious
alleles. Such a crossover is shown as a red X in the middle right panel
of Figure 14.8, and the resulting recombinant haplotype carrying few
deleterious alleles is shown in the lower right panel. Therefore, Muller’s ratchet
doesn’t tick forward in sexual populations, as even a small amount of
recombination is enough to stop its progression.

14.3.6 An example of the costs of asexuality.

In the Evening primrose genus (Oenothera), there are a number of
young, independently-derived, asexual species. In each species this
asexuality is due to a complicated series of reciprocal translocations
(Figure 14.9), which prevent recombination and segregation and ensure
that every plant is permanently heterozygous for these rearrangements
due to the lethality of homozygotes. This system is quite complicated, and super cool. We
don’t need to worry about the details but importantly each species is
functionally asexual. Hollister et al. (2014) sampled transcriptome
data from across the Evening primrose clade, and took advantage
of 7 independent, asexual-sexual sister pairs of species to examine
the impact of the evolution of asexuality on molecular evolution.
The dN/dS for the sexual and asexual species for each of the seven
pairs (C1-C7) is shown in Figure 14.10. In every pair dN/dS is higher in
the asexual species. The genomes of the asexual species are evolving in
a less constrained fashion, likely due to weakly deleterious mutations
accumulating due to hitchhiking with beneficial alleles and the slow
crank of Muller’s ratchet.

Figure 14.9: A schematic diagram of the karyotype of an evening primrose,
showing the reciprocal translocations and the meiotic pairing.
The two columns show a heterozygote individual’s diploid chromosomal
complement. Each chromosome is heterozygous for two different
translocations. For example both the top-most chromosomes have one
arm from chromosome 1, but the other arm is heterozygous for a large
translocation from the ancestral chromosomes 2 and 10. Due to these
translocations the meiotic pairing forms a complete ring of chromosomes,
which prevents crossing over and independent segregation. Thanks to
Jesse Hollister for this image.

Figure 14.10: dN/dS calculated on sexual (circles) and asexual (diamonds)
lineages of each of seven sister pairs of species (C1 to C7). Data from
Hollister et al. (2014). Code here.
14.3.7 The maintenance of combinations of alleles in the face of
recombination.

In some cases balancing selection may be attempting to maintain multiple
combinations of alleles in the population that work well together.
However, recombination may be constantly ripping those alleles away
from each other, making it difficult to maintain these combinations. This can
select for the suppression of recombination. Some of the most dramatic
demonstrations of this tension involve the evolution of so-called
super genes. We’ll first consider the evolution of a mimicry supergene
in Heliconius numata as an example of these dynamics.
Some of the most spectacular examples of Müllerian mimicry in
the world are found in Heliconius butterflies. These butterflies are
unpalatable to predators, and different species mimic each other, so
benefiting from not being eaten by predators (which rapidly learn to
avoid all these species). In many of these species multiple mimicry
morphs are found as we move across geographic space. In Heliconius
numata a number of different morphs mimic morphs from a distantly
related Melinaea species, see Figure 14.12.

Figure 14.11: Showy evening primrose (Oenothera speciosa), the sexual
species in the clade C2 from Figure 14.10.
Favourite flowers of garden and greenhouse (1896). Step, E. Image from
the Biodiversity Heritage Library. Contributed by Missouri Botanical
Garden. Licensed under CC BY-2.0.

Figure 14.12: Five sympatric forms of H. numata from northern Peru,
and their distantly related comimetic Melinaea species. First row: M.
menophilus ssp. nov., M. ludovica ludovica, M. marsaeus rileyi, M.
marsaeus mothone, and M. marsaeus phasiana. Second row: H. n. f.
tarapotensis, H. n. f. silvana, H. n. f. aurora, H. n. f. bicoloratus,
and H. n. f. arcuella. Figure and caption from Joron et al. (2006)
cropped, licensed under CC BY 4.0.

To keep things relatively simple let’s focus on two differences between
silvana and bicoloratus, the yellow stripe on the top wing of
silvana and the black bottom wing of bicoloratus. Let’s imagine that
these two differences are due to a simple two locus system (see left
column of Figure 14.13). The first locus segregates for Y/y, where the
Y allele encodes for a top-wing yellow band, and y encodes for the absence
of the yellow band. The second locus segregates for B/b where
B encodes for the bottom-wing being black, and b for the absence of
black on the bottom wing. If Y is recessive and B is dominant, then
the silvana phenotype corresponds to a YY bb genotype. Due to the
dominance of the y and B alleles the bicoloratus phenotype can be
achieved by various genotypes (Yy Bb, yy BB, Yy BB, yy Bb). Let’s
assume that both of these phenotypes offer an advantage as they
each mimic a Melinaea model. But there are also genotypes that
don’t do as well; YY BB individuals have a yellow band and a black
bottom and so don’t do a great job mimicking anything and so will
be eaten. Thinking about the four possible haplotypes, y-B has high
marginal fitness as due to its combo of dominant alleles it’ll always
produce a bicoloratus phenotype. Likewise the Y-b haplotype has high
marginal fitness, as it does well in the homozygous state (silvana
phenotype), and when it is paired with the y-B haplotype. However, the Y-B
and y-b haplotypes fare less well as they carry two alleles that don’t
work well with each other and so are often found in individuals who suffer high
rates of predation.
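The dominance rules of this toy model can be written out as a small lookup over the nine two-locus genotypes. This is only a sketch of the rules as stated in the text (yellow band requires YY, a black bottom wing requires at least one B); the label "mixed" for every genotype that is neither silvana nor bicoloratus, and the function name phenotype, are assumptions made for illustration.

```python
from itertools import product

Y_GENOTYPES = ("YY", "Yy", "yy")
B_GENOTYPES = ("BB", "Bb", "bb")


def phenotype(y_geno, b_geno):
    """Map a two-locus diploid genotype to a wing pattern under the toy model."""
    yellow_band = y_geno == "YY"        # Y is recessive: the band needs two copies
    black_bottom = "B" in b_geno        # B is dominant: one copy is enough
    if yellow_band and not black_bottom:
        return "silvana"
    if black_bottom and not yellow_band:
        return "bicoloratus"
    return "mixed"                      # assumed label for poor mimics


table = {(y, b): phenotype(y, b) for y, b in product(Y_GENOTYPES, B_GENOTYPES)}
for (y, b), morph in table.items():
    print(y, b, "->", morph)
```

Only YY bb gives silvana, four genotypes give bicoloratus, and the remaining four fall into the poorly mimicking class, which is where the Y-B and y-b haplotypes tend to end up.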
If no recombination occurs between these loci (r = 0, Figure 14.13),
then the Y-B and y-b haplotypes are selected out of the population, and the y-B
and Y-b haplotypes can be stably maintained. However, when there’s too
much recombination between our loci (e.g. r = 0.4, Figure 14.13) the
high-fitness haplotypes keep getting ripped apart by recombination,
and the Y-b haplotype is lost from the population as its recessive advantage is
lost because it is too often broken up by recombination in heterozygotes.

Figure 14.13: Left column) A hypothetical two locus model to describe
the H. numata silvana and bicoloratus morphs. Right column) The frequency
dynamics of the four haplotypes (Y-B, Y-b, y-B, y-b) under
two different recombination regimes (r = 0 and r = 0.4); x-axis:
Generations; y-axis: Frequencies. The model has negative frequency
dependent selection acting to increase the frequency of the mimicry morph
that is rarer in the population, while all individuals with genotypes
corresponding to a mixed phenotype, e.g. YY BB, have very low fitness as
they mimic no Melinaea and so are quickly eaten. Butterflies cropped
from Joron et al. (2006), licensed under CC BY 4.0. Code here.

14.3.8 Supergenes to the rescue!

“coadapted combinations of several or many genes locked in inverted
sections of chromosomes and therefore inherited as single units.”
Dobzhansky (1970) on supergenes.

So our polymorphisms can only be maintained if they are tightly
linked, i.e. if these alleles arose at loci that are genetically close to
each other. But how is it possible that these alleles arose close to
each other? Well the trick is that they don’t necessarily have to arise
very close to each other. If such a system is polymorphic but being
regularly broken up by recombination, a chromosomal inversion (the
flipping around of a whole section of chromosome) can arise and will
suppress recombination. Imagine that our two loci are far apart
genetically, and a chromosomal inversion arises on the Y-b background
forming the b-Y haplotype. This inverted haplotype will not recombine
with the y-B haplotype when it is present in a heterozygote, thus
it is not broken down by recombination. This inverted haplotype,
which enjoys the fitness benefits of the Y-b, can therefore replace the
Y-b haplotype in the population. The two other low fitness haplotypes
will disappear as they are no longer being generated by recombination,
leaving just the y-B and b-Y. The polymorphism system now behaves
like alleles at a single locus, a super gene (e.g. like r = 0 in Figure
14.13).
Now the H. numata system is vastly more complicated than our
toy two locus system, presumably involving many changes and
refinements, but the same principle holds (Joron et al., 2011). The
differences between the different H. numata mimicry morphs are found
on a single chromosome, and the inheritance behaves as if controlled
by a single locus (albeit with many alleles). The H. n. f. silvana
individuals carry a recessive haplotype of alleles which is known to
be locked together by a ∼ 400kb inversion, that is, a different
chromosomal orientation from the bicoloratus allele (haplotype), which acts as
a dominant allele. Other alleles at this same chromosomal region provide
the genetic basis of the other morphs, and sometimes correspond
to further inversions with a range of dominance relationships.

Figure 14.14: Left) A coastal perennial and an inland annual Mimulus
guttatus, image from Lowry and Willis (2010) licensed under CC BY 4.0.
Right) A reciprocal transplant experiment showing that the coastal perennial
and inland annual forms are locally adapted to their respective habitats
(Fitness of the Inland annual and the Coastal perennial at the Coastal and
Inland sites). Data from Lowry and Willis (2010). Code here.

Local Adaptation, Speciation, and Inversions. Inversions have long
been thought to play an important role in local adaptation and speciation.
One example of an inversion underlying local adaptation occurs
in Mimulus guttatus, in Western North America, where there are annual
and perennial ecomorphs with very different life history strategies
(see Figure 14.14). The perennial form grows in many places along
the Pacific coast, and in other places with year-round moisture; it
invests a lot of resources in achieving large size and laying down
resources for the next year, and as a result flowers late. The annual form
grows inland, e.g. the California central valley, where it has to invest
all its effort in flowering rapidly before the long, hot, dry summer.
Neither ecomorph does well in the other’s environment. The perennials
get crisped before they have a chance to flower, while the annuals
suffer from high rates of herbivory and cannot tolerate the salt spray.
Lowry and Willis (2010) found that a large inversion controlled
a lot of the phenotypic variation in flowering time and a range of
other morphological differences between these two morphs. They also
showed that the inversion controlled a reasonable proportion of the
differences in fitness in the field, consistent with it underlying the fitness
tradeoffs involved in local adaptation.

Figure 14.15: A two locus, two population migration-selection balance
system. Two loci A and B segregate for Inland (In) and Coastal (Co)
adapted alleles, and the two populations are connected by migration (m).

              Coastal Pop.            Inland Pop.
              A         B             A         B
  Co          1         1             (1 – s)   (1 – s)
  In          (1 – s)   (1 – s)       1         1

Why would an inversion be involved in locking together locally
adapted alleles? The basic idea, like above, is that an inversion can be
selected for when we have two (or more) loci segregating for locally adapted
alleles (Figure 14.15). Locally advantageous haplotypes are in danger
of being broken up by recombination with maladapted haplotypes,
which are constantly being introduced into each population by migration
from the other. If an inversion arises that locks these alleles
together in one population, it can be selected for as it does not suffer the
ill effects of recombination with migrating maladaptive haplotypes.
14.3.9 Sex Chromosomes and the dynamics of selection and recombination.

The evolution of sex chromosomes and new systems of genetic sex
determination provide a beautiful demonstration of the interplay of
selection and recombination. But first it’s worth taking a step back
and thinking about the difference between a species being sexual, having
male and female gametes, and having separate sexes (i.e. males and
females), and the mechanisms for determining the sexes. Many species
are sexual but with no separate sexes or even male or female gametes.
The production of different sized gametes (anisogamy) has arisen a
number of times in multi-cellular life, with male and female gametes
defined by their relative sizes. The smaller, and often more mobile,
gametes are defined as male gametes (e.g. sperm), while the larger, well
provisioned, and often less mobile are defined as female gametes (e.g.
egg cell). The evolution of anisogamy is thought to be due to disruptive
selection arising from a tradeoff pulling in opposite directions: towards
mobile gametes able to move further, and towards
better provisioned gametes better able to build larger zygotes.
In many organisms individuals can produce both male and female gametes,
while some species have evolved separate sexes, likely in part
as an inbreeding avoidance mechanism. There is huge diversity in sex
determination mechanisms across the eukaryotic tree (Figure 14.16).
This is all to say that biology is wonderfully diverse and complicated.

Figure 14.16: Diversity of sex determination systems for representative
plant and animal clades. Figure and caption from Bachtrog et al.

Figure 14.17: Volvox aureus. Volvox are spherical, multicellular green
algae. The surface is made up of a single layer of somatic cells (up to
50k cells) beating their flagella. Some species of Volvox have individuals
with both male and female gametes, being made here in the germ cells (a
and g respectively) in the middle of the sphere. Some Volvox have
separate sexes, where different individuals produce male and female gametes.
In mammals, and many other systems with genetic sex determination,
the genes responsible for sex determination lie on a pair of
heteromorphic sex chromosomes, i.e. a pair of chromosomes that are
quite different in size. In mammals most males are XY and most
females XX, and the male-determining Y chromosome has a
very small gene content compared to the X chromosome. But in other
groups such as birds, and some snakes, sex determination is a ZW
system with females being ZW and males being ZZ. In those systems
females carry a gene poor W with males being the homogametic sex,
carrying two Zs. If you are still reading send Graham a picture of
Nettie Stevens; she discovered sex chromosomes in 1905 (Stevens,
1905). These examples of heteromorphic sex chromosomes, and many
others like them, are thought to have arisen from an ancestral pair of
autosomes. What then explains their evolution?
One broad explanation for the evolution of sex chromosomes is
illustrated in Figure 14.18 and goes as follows:
1. There are a pair of ancestral autosomes with sexually-antagonistic
male-beneficial, female-detrimental alleles segregating on them
(the converse can occur but isn’t central to the evolution of Y
chromosomes). These alleles can persist in the population for some
time but are eventually lost due to their cost in females.

2. A dominant, male-determining allele arises on one of the chromosomes.
Let’s call this chromosome our proto-Y and the other our
proto-X. All individuals who are heterozygous for the proto-Y will
be male, and individuals who are homozygous for the proto-X will be
female. No individuals will be homozygous for the proto-Y, as individuals can
receive at most one proto-Y, that of their father.

3. Our sexually-antagonistic alleles benefit from being on the same
chromosome as our male-determining allele as then they are guaranteed
to be in males. However, if they recombine off the proto-Y on
to the proto-X they are at a disadvantage.

4. If an inversion arises on the background of the proto-Y chromosome
it can lock together the male-determining allele and some of our
sexually-antagonistic alleles. This inversion can initially spread as
it gains the benefit of the sexually-antagonistic alleles without the
cost of recombination. This inversion can’t spread to fixation as
Fisherian selection on the sex ratio keeps it in check.

5. Further inversions can potentially cement additional sexually-antagonistic
alleles into tight linkage with the male-determining
allele.

Figure 14.18: A cartoon of the formation of a neo-Y chromosome and
subsequent suppression of recombination. A pair of homologous autosomes
is shown in the top most panel. Sexually-antagonistic male-beneficial,
female-detrimental alleles are shown as vertical lines. A newly arising
dominant, male-determining allele is shown as a blue circle. The inversions
are shown as brackets. The non-recombining region linked to the
sex determining allele is coloured red.
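Step 2 of the list above can be checked by enumerating the offspring of a cross between a male carrying one proto-Y and a female carrying two proto-X chromosomes. This is a minimal sketch; the genotype labels and the helper name sex are hypothetical and only encode the rule that a single copy of the dominant male-determining allele makes an individual male.

```python
from collections import Counter
from itertools import product

father = ("proto-X", "proto-Y")   # heterozygous for the male-determining chromosome
mother = ("proto-X", "proto-X")   # homozygous for the proto-X


def sex(genotype):
    """The male-determining allele is dominant: one proto-Y is enough."""
    return "male" if "proto-Y" in genotype else "female"


# Every combination of one paternal and one maternal chromosome.
offspring = Counter(tuple(sorted(pair)) for pair in product(father, mother))
for genotype, count in offspring.items():
    print(genotype, sex(genotype), count)
```

No offspring receives two proto-Y chromosomes, and the two genotypes appear in equal numbers, which is the even sex ratio that Fisherian selection acts to maintain.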
Sex chromosomes, under this hypothesis, are super genes locking together
sex determination and sexually-antagonistic alleles. Our male-beneficial,
female-detrimental alleles work well on the background of
the male-determining allele and poorly off it; that’s exactly the
supergene setup we encountered in Section 14.3.8. This sketch can be
flipped to describe the evolution of ZW systems.

Figure 14.19: The sex-specific effects of the OB allele. Columns: Female,
Male; rows: ob, OB.
Image credits: Blue mbuna Male L. fuelleborni by Chmee2; OB Male L.
fuelleborni by Doronenko; Brown ob Tropheops female by Alexandra
Tyers; Female L. fuelleborni orange morph, by Mikko Stenberg.
https://commons.wikimedia.org/wiki/File:Labeotropheus_fuelleborni_in_Botanic_garden_in_Teplice_(2).JPG
https://www.flickr.com/photos/52993488@N03/4890217915

A colourful example of the initial conditions for the evolution of
a novel sex determination system is offered by Lake Malawi, where there are
many very closely related cichlid species (Roberts et al., 2009).
In many of these species the males are brightly coloured to attract
females, while the females are often brown to help them avoid predators.
In some of these species there is an alternative orange morph,
called the marmalade cat morph, which is cryptic against the rocky
bottom of the lake. This morph is due to a dominant mutation called
OB at the pax7 gene, and the allele appears to be shared across many of these
species. This OB allele works well in females; however, in the males
the OB allele disrupts their bright colouration. Thus the OB polymorphism
is sexually antagonistic, i.e. it works well in females and poorly
in males.
Males carrying the male-deleterious OB allele are rarely found, despite
the allele being common in females. Why is that? Well because
the OB allele is tightly linked to a newly emerged female-determining
allele (W), with males carrying two copies of the Z allele. Males usually
are homozygous for the ob-Z haplotype, while females can be
either orange (OB-W/ob-Z) or brown (ob-W/ob-Z). Recombination
between these two loci seems to be very rare, and so the sexually
antagonistic allele OB appears to be mainly female specific. Thus the
spread of this sex determining allele has potentially helped resolve
sexual antagonism while it aided its own spread. An inversion on the
W background would lock together these two alleles, and spread.
The degradation of heterogametic sex chromosomes. Our inversions
on the neo-Y chromosome (or conversely the neo-W in ZW systems)
have created an issue. The inverted block, containing the male-determining
allele, is now inherited as a non-recombining haplotype.
Why’s that? Well the inversion doesn’t recombine in heterozygotes,
and the neo-Y inversion region is only ever found in heterozygote
males.1 Thus the region of chromosome tied up within inversions is
effectively asexual and subject to many of the issues that come along
with that. The hitchhiking of deleterious alleles will be common and
Muller’s Ratchet will begin to tick. Many mildly deleterious alleles will
be allowed to fix through these mechanisms, leading to the accumulation
of premature stop codons and silencing mutations in non-essential
genes within the neo-Y inversion. The X chromosome will maintain
copies of these genes, and sometimes the expression of these genes will
have to be up-regulated in males to accommodate for the degradation
of the Y based copy leading to lower dosage of these genes.2
Transposable elements can also accumulate on the non-recombining section
of the Y chromosome, sometimes in huge numbers, as the purging
of these transposable elements will be inefficient in this region. But
there’s little to stop the non-recombining section of the neo-Y chromosome
from expanding more due to the short-sighted selection for inversions
that further tie up sexually-antagonistic alleles. Our non-recombining
section of the Y chromosome may be expanding to occupy more of the
chromosome, as it is losing functional genes and bloating up with
repetitive DNA. Eventually much of what remains may be genes that are
essential to male function, as is the case with old Y chromosomes such
as that of humans.

1 This differs from the situation that most other non-sex chromosome
inversions find themselves in as they are homozygous some of the time and so
experience recombination.
2 Indeed in some heterogametic sex chromosome systems there are evolved
dosage compensation systems that deal specifically with these issues.
A.
An Introduction to Mathematical Concepts
“Now, in the first place I deny that the mathematical theory of
population genetics is at all impressive, [... We] made simplifying
assumptions which allowed us to pose problems soluble by the elementary
mathematics at our disposal, and even then did not always fully solve
the simple problems we set ourselves. Our mathematics may impress
zoologists but do not greatly impress mathematicians.”–Haldane

From Haldane’s entertaining response to Mayr’s criticism of population
genetics. Haldane, J. B. S., 1964 A defense of beanbag genetics.
Perspectives in Biology and Medicine 7 (3): 343–360
Throughout these notes we make use of mathematical concepts,

many of which are based in probability theory and statistics. Here
we briefly review some of these concepts. The wikipedia pages on
statistics and math topics are often excellent introductions and worth
consulting if you want to know more. Parts of this primer were origi-
nally written by Sebastian Schreiber and myself for our Fall quarter of
the PBG core (although I take full credit for any errors subsequently
introduced). Some of these concepts may go beyond what you have
covered in previous courses. The notes do not rely on you knowing
all of these results, but I’ll refer to this appendix when these concepts
first come up in the main body of the notes. To answer the questions
in the first chapter you will need to know some basic rules of proba-
bility, so reviewing Sections A.2.2 and A.2.1 below would be a good
place to start.
A.1 Calculus
In evolutionary genetics we’re often interested in how quantities

change over time, and so we’re interested in the rate of change over
time. This particular obsession is shared with much of science and so
the concepts we make use of pop up in many other fields. The derivative f′(a) of a function f(x) at x = a represents the instantaneous rate of change of the function, df(x)/dx, at x = a or, equivalently, the slope of the graph of the function at x = a. A derivative of zero indicates a local maximum, minimum, or saddle point of the function; see Figure A.1.
Figure A.1: Top) An example function, $f(x) = x - (5/6)x^3 - (1/3)x^4$, and Bottom) its derivative $f'(x) = 1 - 3(5/6)x^2 - 4(1/3)x^3$. The x axis is divided into regions labelled A, B, C, and D. Code here.

To give a physical example, imagine that the derivative of position with respect to time gives the (instantaneous) speed of a car. Think

of the top panel of Figure A.1 as showing a car driving up and down
an alley, the page, with f (x) giving the car’s position at time x. The
bottom panel shows the car’s speed, with the sign (i.e. + or −) of the
derivative giving the direction of movement. Moving from left to right along the x (time) axis, in time period A our car is moving up the alley (page), so its speed is positive (i.e. f′(a) > 0). In time period B, the car is reversing down the alley, so its speed is negative (f′(a) < 0). As we move from A to B the car begins to slow down, i.e. the derivative gets small in magnitude, as it is about to reverse direction at the time indicated by the first dotted line. At the dotted line between A and B, the moment when the car is changing direction, the car is stationary and its speed is zero (i.e. f′(a) = 0).
We’ll sometimes want to know about the second derivative of f, denoted by f′′(a) or $d^2 f(a)/dx^2$. The second derivative measures the rate at which the first derivative is changing, i.e. the concavity/convexity of the function. See Figure A.2. In our physical example, the second derivative with respect to time is the (instantaneous) acceleration of the car, as it is the rate of change in the speed of the car (signed by whether it’s accelerating in a positive or negative direction). One useful property of the second derivative is that it is negative at local maxima of the function, and positive at local minima of the function.
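As a rough numerical check of these ideas, here is a minimal Python sketch using the example function from Figure A.1. The helper names (`central_difference`, `bisect_root`) and the brackets used to search for stationary points are my own choices for illustration; the second derivative is obtained by differentiating the first derivative given in the figure caption.

```python
def f(x):
    """The example function from Figure A.1."""
    return x - (5 / 6) * x**3 - (1 / 3) * x**4


def f_prime(x):
    """First derivative of f, as given in the caption of Figure A.1."""
    return 1 - 3 * (5 / 6) * x**2 - 4 * (1 / 3) * x**3


def f_double_prime(x):
    """Second derivative of f, obtained by differentiating f_prime."""
    return -6 * (5 / 6) * x - 12 * (1 / 3) * x**2


def central_difference(func, a, h=1e-5):
    """Approximate the slope of func at a from two nearby points."""
    return (func(a + h) - func(a - h)) / (2 * h)


def bisect_root(func, lo, hi, tol=1e-12):
    """Locate a zero of func in [lo, hi]; assumes func changes sign there."""
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if func(lo) * func(mid) <= 0:
            hi = mid
        else:
            lo = mid
    return (lo + hi) / 2


# The numerical slope agrees closely with the analytical derivative.
slope_error = abs(central_difference(f, 0.3) - f_prime(0.3))

# Stationary points (f'(x) = 0), with brackets chosen by eye (an assumption).
x_peak = bisect_root(f_prime, 0.0, 1.0)
x_trough = bisect_root(f_prime, -1.0, -0.5)

# The sign of the second derivative classifies each stationary point.
peak_is_maximum = f_double_prime(x_peak) < 0
trough_is_minimum = f_double_prime(x_trough) > 0
```

In the car analogy, `x_peak` and `x_trough` are moments when the car is momentarily stationary, and the sign of the acceleration tells us which way it is about to move.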
A.1.1 Approximating functions by Taylor Series.

A wonderful thing about derivatives is that they allow us to approximate complicated, nonlinear functions by linear functions (this is called a first-order Taylor approximation). Namely, a first-order approximation of f(x) at x = a is given by

$f(x) \approx f(a) + f'(a)(x - a)$ for x near a (A.1)

Returning to our car example, this corresponds to trying to guess the position of the car by extrapolating from its current location and speed. We’ll do well when the car is traveling at a relatively constant speed, i.e. isn’t accelerating or decelerating too fast.

Figure A.2: The second derivative f″(x) of the function in Figure A.1. Code here.

Figure A.3: Our function from the top panel of Figure A.1 approximated by first-order Taylor approximations (red lines) at a variety of points a (solid dots). Note how the approximation breaks down away from the dot; I stop plotting the approximation a little away from the dot for ease of presentation. Code here.

Two common first-order Taylor approximations that we’ll encounter throughout the notes are

$\exp(x) \approx 1 + x$ for x near 0 (A.2)

$(1 - x)^k \approx 1 - kx$ for x near 0 (A.3)

We’ll also use the Taylor approximation given by eqn (A.2) as a trick to write

$(1 + x)^L \approx \exp(Lx)$ for x near 0, (A.4)
which allows us to move from a geometric decay to an exponential
decay. As a generalization of this, we’ll approximate the product
$\prod_{i=1}^{L} (1 + x_i) \approx \exp\left(\sum_{i=1}^{L} x_i\right)$ if all $x_i$ are near 0, (A.5)

which is useful as it allows us to move from a product to thinking about a sum (where averages are easier to think about).

Figure A.4: Our function from the top panel of Figure A.1 approximated by second-order Taylor approximations (red lines) at a variety of points a (solid dots). Code here.

We’ll sometimes want more accuracy and so use a second-order approximation, i.e. we will approximate the graph of a function with a parabola instead of a line (see Figure A.4). This is often useful when examining the effects of stochasticity on some process. These second-order Taylor approximations take the form:

$f(x) \approx f(a) + f'(a)(x - a) + f''(a)(x - a)^2/2$ (A.6)
where f ′′ (a) denotes the second derivative of f at x = a. In our car

example, this is equivalent to predicting the location of the car from
its speed and acceleration.
One place this second order approximation is useful is for the log
function and yields
$\log(1 + x) \approx x - x^2/2$ for x near 0. (A.7)
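To get a feel for how good these approximations are, and how they break down away from zero, here is a small Python sketch comparing eqns (A.2), (A.3), (A.4), (A.5), and (A.7) with the exact values. The particular values of x, k, and L are arbitrary choices for illustration.

```python
import math


def exp_first_order(x):
    return 1 + x                      # eqn (A.2)


def power_first_order(x, k):
    return 1 - k * x                  # eqn (A.3)


def log_second_order(x):
    return x - x**2 / 2               # eqn (A.7)


def approximation_errors(x, k=5, L=100):
    """Absolute error of each approximation at a given x."""
    return {
        "A.2": abs(math.exp(x) - exp_first_order(x)),
        "A.3": abs((1 - x) ** k - power_first_order(x, k)),
        "A.4": abs((1 + x) ** L - math.exp(L * x)),
        "A.7": abs(math.log(1 + x) - log_second_order(x)),
    }


small = approximation_errors(0.001)   # x close to zero: tiny errors
large = approximation_errors(0.5)     # x far from zero: approximations fail

# eqn (A.5): a product of terms close to one versus exp of the sum.
xs = [0.01, -0.02, 0.005]
product_exact = math.prod(1 + x for x in xs)
product_approx = math.exp(sum(xs))
```

Comparing `small` with `large` shows the same thing as Figure A.3: the approximations are only trustworthy close to the point we expanded around.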
A.1.2 Integrals
Regarding integrals $\int_a^b f(x)\,dx$, just remember that they represent the signed area “under” the graph of y = f(x) over the interval [a, b].
A.2 Probability
Evolution is fundamentally a random process. Which individuals live,

die, and reproduce in any given moment is far from predictable. The
randomness of Mendelian transmission, what genetic material is trans-
mitted to the next generation, reflects randomness at the molecular
and cellular level. While this makes it impossible to predict the out-
come for a given individual we can speak of average outcomes and the
statistical properties of evolutionary processes. Indeed evolution is a
statistical process, evolution occurs because some types of individuals,
and alleles, on average leave more offspring to subsequent generations.
Thus to understand the details of models of evolutionary change we
will have to understand something about probability and statistics.
A.2.1 Random Variables

A random variable X, roughly, is a variable that takes on values
drawn randomly from some probability distribution. There are two
major types of random variables, discrete and continuous. For a dis-
crete random variable, think of it as a person calling out numbers by
drawing them randomly out of a hat with some distribution of num-
bered slips of paper. We use uppercase X to think about the number
that might be drawn (before it is drawn) and lowercase x to denote
the number that is actually drawn. Discrete random variables take on
a countable number of values, say x1 , x2 , . . . , with some probabilities
p1 , p2 , . . . . We can denote this assumption as
P[X = xi ] = pi “the probability that X equals xi is pi ”
Continuous random variables, which can take on values in a continuum, are characterized by their probability density function p(x), i.e. a function that satisfies p(x) ≥ 0 for all x and $\int_{-\infty}^{\infty} p(x)\,dx = 1$. For example, think about the precise time of day a baby is born in a hospital (not just the hour or the minute, where discrete random variables would suffice, but the precise moment). For these variables,

$P[a \le X \le b] = \int_a^b p(x)\,dx$ “the probability that X is in the interval [a, b] equals the area under the curve p(x) from a to b”

for example, we could ask the probability that a baby was born somewhere between midnight and 12.18am.
A.2.2 Basic Rules of Probability

Imagine a fairground game where you reach into a box and pull out an egg. There are 100 eggs in the box, 57 of them are empty. Forty three have a toy in them. There are eggs with a stuffed dog toy, eggs with a cat toy, eggs with a lizard toy, eggs with both a dog and cat toy in them. The counts of each type of egg are shown in Figure A.5.

Figure A.5: Venn diagram of fairground game toys, there are a hundred eggs in total, including 57 eggs with no prize that are not shown. Counts shown in the diagram: dog only 12, dog and cat 10, cat only 8, lizard 13. Code here.

Question 1. You reach into the box and pull out one egg:
i) For each egg type (dog alone, cat alone, lizard, dog+cat, and no prize), what is the probability that you get an egg of that type? What do these probabilities sum to?
ii) What’s the probability of getting an egg with a dog? What is the probability of getting an egg with a dog in it or an egg without a dog in it?
iii) What’s the probability of getting an egg with a dog in it or an egg with a lizard?
These questions illustrate the principle that if events A & B are mutually exclusive then P(A or B) = P(A) + P(B), from which it follows that P(A or not A) = P(A) + P(not A) = 1. What is the probability
of getting an egg with a dog or a cat? Well, for events that are not
mutually exclusive we need to discount the sum of the probabilities by
their overlap, giving
P(A or B) = P(A) + P(B) − P(A & B). (A.8)
We call P(A & B) the joint probability of A & B.
Question 2. What is the probability P(dog or cat)?
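So as not to give away the answers to the egg questions, here is a minimal Python sketch of the same addition rules using a different, hypothetical example: a single roll of a fair six-sided die. The events and helper names are my own and are only meant as an illustration.

```python
from fractions import Fraction

FACES = frozenset(range(1, 7))  # outcomes of one roll of a fair six-sided die


def prob(event):
    """Probability of an event (a set of faces), each face having chance 1/6."""
    return Fraction(len(event & FACES), len(FACES))


even = {2, 4, 6}
odd = {1, 3, 5}
at_least_4 = {4, 5, 6}

# Mutually exclusive events: the probabilities simply add.
p_even_or_odd = prob(even | odd)

# Overlapping events: subtract the joint probability, as in eqn (A.8).
p_even_or_big = prob(even) + prob(at_least_4) - prob(even & at_least_4)
```

Without the correction for the overlap we would count the faces 4 and 6 twice.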
Conditional probability We often want to know the conditional proba-

bility, the probability of an event conditional on some other particular
event. For example, the conditional probability of getting a cat toy
given that I’ve pulled out an egg containing a dog (recall that ten of
the hundred eggs contain both a dog and a cat toy.). We write this as
P(cat|dog), where we read |dog as ‘given dog’ or ‘conditional on dog’.
The rule of conditional probabilities is that

$P(A|B) = \frac{P(A \,\&\, B)}{P(B)}$ (A.9)

we can now answer

Question 3. What is P(cat|dog)? Explain the underlying intuition of your answer.
By rearranging eqn (A.9), we obtain the rule that
P(A & B) = P(A|B)P(B). (A.10)
Thus we can always obtain the joint probability of A & B by multi-

plying the conditional probability by the probability of the event we
are conditioning on. Equivalently, we could have computed the joint
probability as
P(A & B) = P(B|A)P(A). (A.11)
these two ways of writing the same thing will come in useful in just a
moment.
The total probability of an event can be obtained by summing over all of the L mutually exclusive ways that A can happen

$P(A) = \sum_{i=1}^{L} P(A \,\&\, B_i) = \sum_{i=1}^{L} P(A|B_i)P(B_i)$ (A.12)

where B1, · · · , BL are the mutually exclusive (and exhaustive) events that can occur alongside our event A. This is the law of total probability. For example, we can write the probability of obtaining a cat as
P(cat) = P(cat & dog) + P(cat & not dog). (A.13)
Independence Two events are independent of each other if
P(A & B) = P(A)P(B) (A.14)
this requirement implies that the conditional and unconditional probabilities are equal, P(A) = P(A|B), i.e. I learn nothing about the event A from the event B having occurred. For example, if I draw two eggs with replacement from the box the probability of getting a lizard then a dog is P(lizard then dog) = P(lizard)P(dog).
Bayes Rule We often want to reverse conditional probability

statements, i.e. turn the statement of P (B|A) into the statement of
P (A|B). We have two different ways of expressing the joint probability
in terms of conditional probabilities. Because they each equal the joint
probability, they are equal to each other, meaning
P(B|A)P(A) = P(A|B)P(B). (A.15)

Rearranging eqn (A.15) we obtain

$P(B|A) = \frac{P(A|B)P(B)}{P(A)}$ (A.16)

Equation (A.16) is also called “Bayes’ Rule” or “Bayes’ Theorem,” and it allows us to reverse the variable we condition on.
Question 4. Use Bayes’ rule to calculate P(dog|cat) from the
conditional probability you calculated in Question 3.
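The rules for conditional probability, total probability, independence, and Bayes’ rule can be checked with exact fractions. The sketch below again uses a hypothetical fair six-sided die rather than the eggs, so the questions above are left for you; the events A (“at most 3”) and B (“at least 3”) are arbitrary choices of mine.

```python
from fractions import Fraction

FACES = frozenset(range(1, 7))  # outcomes of one roll of a fair six-sided die


def prob(event):
    """Probability of an event (a set of faces) for a fair die."""
    return Fraction(len(event & FACES), len(FACES))


def cond_prob(a, b):
    """P(A|B) = P(A & B) / P(B), eqn (A.9)."""
    return prob(a & b) / prob(b)


A = {1, 2, 3}        # "at most 3"
B = {3, 4, 5, 6}     # "at least 3"
not_B = FACES - B

# Law of total probability, eqn (A.12), splitting on B versus not B.
p_A_total = cond_prob(A, B) * prob(B) + cond_prob(A, not_B) * prob(not_B)

# Bayes' rule, eqn (A.16): reverse the conditioning.
p_B_given_A = cond_prob(A, B) * prob(B) / prob(A)

# Independence, eqn (A.14): "even" and "at most 2" happen to be independent.
even, at_most_2 = {2, 4, 6}, {1, 2}
independent = prob(even & at_most_2) == prob(even) * prob(at_most_2)
```

Here P(A|B) = 1/4 but P(B|A) = 1/3, a reminder that the two directions of conditioning are generally different numbers.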
A.2.3 Expectation of a Random Variable

The expectation of a random variable is the point at which the distribution is “balanced”. For discrete random variables it is given by

$\mu = E[X] = p_1 x_1 + p_2 x_2 + \cdots$ (A.17)

According to Pascal, the expectation is the excitement a gambler feels when placing a bet, i.e. each term in the sum equals the probability of winning times the amount won. Apparently Pascal knew some unusually rational gamblers.

The average outcome¹ over a set of independent events is an estimate of the mean, µ̂, where the hat denotes that it is an estimate. A more precise interpretation of the relationship between the average and the expectation is given by the law of large numbers described below.

¹ Recall that we compute the average, the sample mean, of a set of numbers X1, · · · , XL as

$\bar{X} = \frac{1}{L}\sum_{i=1}^{L} X_i$ (A.18)

where the bar over the X denotes that it is the average value of X.

For a continuous random variable,

$E[X] = \int x\,p(x)\,dx$. (A.19)

For any “reasonable” function, one can define E[f(X)] by

$E[f(X)] = p_1 f(x_1) + p_2 f(x_2) + \cdots$ (A.20)

for discrete random variables and

$E[f(X)] = \int f(x)\,p(x)\,dx$ (A.21)

for continuous random variables.

A particularly important choice of f is f(x) = (x − µ)^2. In this case,

$\sigma^2 = E[(X - \mu)^2] = E[X^2] - \mu^2$ (A.22)

is the variance of X, which measures the mean squared deviation around the mean, i.e. “the spread around the mean”. σ (i.e. the square root of the variance) is the standard deviation of X. We can compute the sample variance as

$\widehat{\sigma^2} = \frac{1}{L-1}\sum_{i=1}^{L}(X_i - \bar{X})^2$ (A.23)
Note that the units of our variance will be the units of X^2, e.g. if X is height measurements in cm the variance will have units cm^2. One reason that the standard deviation is more intuitive than the variance is that its units are the same as X, e.g. cm.
Another important choice of f is f (x) = log x. Provided that X
is positive, exp(E[log X]) corresponds to the geometric mean of X.
Alternatively 1/E[1/X] corresponds to the harmonic mean of X.
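Here is a short Python sketch of these definitions for a made-up discrete random variable (the values, probabilities, and sample below are hypothetical). It computes the mean, the variance in both forms of eqn (A.22), and the geometric and harmonic means, and uses the standard library `statistics` module for the sample mean and the sample variance (which uses the L − 1 denominator).

```python
import math
import statistics

# Hypothetical discrete random variable: its values and their probabilities.
values = [1, 2, 4]
probs = [0.5, 0.25, 0.25]


def expectation(g=lambda x: x):
    """E[g(X)] = sum_i p_i g(x_i), eqn (A.20)."""
    return sum(p * g(x) for x, p in zip(values, probs))


mean = expectation()
variance = expectation(lambda x: (x - mean) ** 2)       # E[(X - mu)^2]
variance_alt = expectation(lambda x: x**2) - mean**2     # E[X^2] - mu^2
geometric_mean = math.exp(expectation(math.log))
harmonic_mean = 1 / expectation(lambda x: 1 / x)

# Sample mean and sample variance of some hypothetical observations.
sample = [2, 1, 4, 1, 2, 1, 1, 4]
sample_mean = statistics.mean(sample)
sample_variance = statistics.variance(sample)
```

For a positive random variable that is not constant the harmonic mean is smaller than the geometric mean, which in turn is smaller than the (arithmetic) mean, and the sketch follows that ordering.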
Question 5. Your friend offers you a wager on the outcome of

one round of playing the fairground egg game. She’ll give you: $1 for a
dog only, $2 for a cat only, $5 for an egg with a cat and a dog, and $4
for a lizard. However, she’ll take $1 from you if you get an empty egg.
What is your expected payout?
Some Useful Properties of Expectations. One of the most useful mathematical properties of the expectation is its linearity, in that the expectation of a linear function of random variables is the linear function applied to the expectations, i.e.

E[aX + bY + c] = aE[X] + bE[Y] + c (A.24)

where X and Y are random variables, and a, b, and c are constants. This holds regardless of whether X and Y are independent. Note that our multipliers (a & b) must be constants, as linearity does not extend to the expectation of products of random variables. One sensible consequence of linearity is that the units of the mean are the same as those of our observations; for example, if we change our measure of adult height from inches to cm, the units of our mean also change from inches to cm (as this change just involves multiplying by a number).
Using our linearity of expectations, we can obtain an analogous
result for the variance
$Var[aX + bY + c] = a^2 Var[X] + b^2 Var[Y]$ (A.25)
this holds if X and Y are independent variables, if they are not we

need to account for their covariance. Note that the constant c has
disappeared as the variance is a statement about the spread of the
points around the mean, and so it doesn’t matter how we shift the
mean.
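Both properties can be checked exactly by enumerating a small joint distribution. The sketch below uses two hypothetical fair dice and arbitrary constants a, b, and c; it also includes a fully dependent pair (the second die copies the first) to show that the variance rule, unlike linearity of the expectation, needs independence.

```python
from fractions import Fraction
from itertools import product

FACES = range(1, 7)

# Joint distribution of two independent fair dice: each (x, y) has chance 1/36.
independent = {(x, y): Fraction(1, 36) for x, y in product(FACES, FACES)}
# A fully dependent alternative: the second "die" always copies the first.
copied = {(x, x): Fraction(1, 6) for x in FACES}


def expectation(g, joint):
    """E[g(X, Y)] under a joint distribution given as {(x, y): probability}."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def variance(g, joint):
    """Var[g(X, Y)] = E[(g - E[g])^2]."""
    mean = expectation(g, joint)
    return expectation(lambda x, y: (g(x, y) - mean) ** 2, joint)


a, b, c = 2, -3, 5  # arbitrary constants
mean_x = expectation(lambda x, y: x, independent)
var_x = variance(lambda x, y: x, independent)

lin_mean = expectation(lambda x, y: a * x + b * y + c, independent)
lin_var = variance(lambda x, y: a * x + b * y + c, independent)

# With dependence the means still add, but the variances do not.
sum_mean_dependent = expectation(lambda x, y: x + y, copied)
sum_var_dependent = variance(lambda x, y: x + y, copied)
```

In the dependent case the variance of the sum is four times the variance of a single die rather than twice, because of the covariance between the two copies.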
We are often interested in the expectation of a random variable X
conditional on some event Y = y, this conditional expectation is
$E[X|Y = y] = \sum_{i=1}^{L} x_i P(X = x_i | Y = y)$ (A.26)

summing over the L possible values X could take. For example, we could ask the expected payoff of your friend’s wager conditional on knowing that you have an egg with a dog in it. The analogous expression for continuous random variables replaces the sum with an integral.

We can recover our total expectation from the conditional expectations by taking the sum of our conditional expectation over the M values that Y could take, weighting each by its probability

$E[X] = \sum_{j=1}^{M} E[X|Y = y_j]P(Y = y_j)$ (A.27)

this is the law of total expectation, the analog to the law of total probability (eqn (A.12)). We can write this law more generally as E[X] = E[E[X|Y]], i.e. we are taking the expectation of our conditional expectation over Y.
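As a small illustration of conditional expectation and the law of total expectation, the sketch below takes X to be the face shown by a hypothetical fair die and Y to be whether that face is even or odd. The function names are mine.

```python
from fractions import Fraction

FACES = range(1, 7)
P_FACE = Fraction(1, 6)  # probability of each face of a fair die


def parity(x):
    return "even" if x % 2 == 0 else "odd"


def prob_y(y):
    """P(Y = y): total probability of the faces with parity y."""
    return sum(P_FACE for x in FACES if parity(x) == y)


def cond_expectation(y):
    """E[X | Y = y], eqn (A.26): weight each x by P(X = x | Y = y)."""
    return sum(x * P_FACE / prob_y(y) for x in FACES if parity(x) == y)


# Law of total expectation, eqn (A.27).
total = sum(cond_expectation(y) * prob_y(y) for y in ("even", "odd"))
```

Averaging the two conditional expectations (4 for even faces, 3 for odd faces), each weighted by a probability of one half, recovers the overall mean of 7/2.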
A.2.4 Discrete Random Variable Distributions.

Important discrete random variables include

Binomial random variables count the number X of heads when flipping a coin n times whose probability of being heads is p. In which case,

$p_i = \frac{n!}{i!(n-i)!}\, p^i (1 - p)^{n-i}$, 0 ≤ i ≤ n. (A.28)

For a binomial random variable, E[X] = np and σ^2 = np(1 − p). Examples are shown in Figure A.6. Note how the mass of the distribution becomes more centered on the mean for larger sample sizes, as the standard deviation increases only as √n while the mean increases as n. Another way that we can write that our observation i is drawn from the binomial distribution is i ∼ Binomial(p, n), where i ∼ is read as “i is distributed as” (we will use the notation as short hand for random variable in the notes).

Figure A.6: Binomial distribution for a sample of n = 10 and n = 100, for p = 0.1, p = 0.5, and p = 0.7; the vertical lines show the means np. Code here.
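The binomial probabilities of eqn (A.28) are easy to compute directly. This minimal Python sketch (with p = 0.1 chosen arbitrarily) checks numerically that the probabilities sum to one and that the mean and variance match np and np(1 − p).

```python
import math


def binomial_pmf(i, n, p):
    """Probability of i heads in n flips, eqn (A.28)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def moments(n, p):
    """Total probability, mean, and variance computed from the pmf."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    mean = sum(i * q for i, q in enumerate(pmf))
    var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))
    return sum(pmf), mean, var


total_10, mean_10, var_10 = moments(10, 0.1)
total_100, mean_100, var_100 = moments(100, 0.1)

# Spread relative to the mean shrinks as n grows (sd ~ sqrt(n), mean ~ n).
cv_10 = math.sqrt(var_10) / mean_10
cv_100 = math.sqrt(var_100) / mean_100
```

The ratio of the standard deviation to the mean (`cv_10` versus `cv_100`) is one way to quantify the tightening of the distribution around its mean seen in Figure A.6.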
Geometric random variables count the total number of flips X up to and including the first heads, for a coin with probability p of being heads. In which case,

$p_i = p(1 - p)^{i-1}$, i = 1, 2, . . . (A.29)

For a geometric random variable E[X] = 1/p; if our coin is fair (p = 1/2) we wait two flips for a head on average, while if the coin-flip is very biased against heads (p ≪ 1) we can be waiting a very long time. The variance of a geometric random variable is σ^2 = (1 − p)/p^2, which means that the mass of the distribution is much more spread out if we consider the waiting time for rare events. See Figure A.7 for examples of the distribution.

Figure A.7: Geometric distribution for different probabilities of success (p = 0.9, 0.5, and 0.1). The vertical lines show the means 1/p. Code here.
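A quick numerical check of the geometric mean and variance, counting flips up to and including the first heads as in eqn (A.29). The infinite sums are truncated at a large number of flips (`max_flips`, my own choice), beyond which the remaining probability is negligible for p = 0.1.

```python
import math


def geometric_pmf(i, p):
    """P(first heads occurs on flip i), eqn (A.29)."""
    return p * (1 - p) ** (i - 1)


def truncated_moments(p, max_flips=5000):
    """Total probability, mean, and variance, summing over flips 1..max_flips."""
    flips = range(1, max_flips + 1)
    total = sum(geometric_pmf(i, p) for i in flips)
    mean = sum(i * geometric_pmf(i, p) for i in flips)
    var = sum((i - mean) ** 2 * geometric_pmf(i, p) for i in flips)
    return total, mean, var


total, mean, var = truncated_moments(p=0.1)
```

With p = 0.1 the mean waiting time is 10 flips, but the standard deviation (the square root of 90) is nearly as large, so individual waiting times vary a lot.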
Poisson random variables count the number of events i that occur in a fixed interval of time or space (t), when these events occur independently of each other and of time. If λ events are expected to occur in this interval, then

$p_i = \lambda^i e^{-\lambda} / i!$ (A.30)

For a Poisson random variable E[X] = λ.

The form of this is less intuitive than that of the binomial. However, the Poisson is actually a limiting case of the binomial. Think of setting up a game of chance, where there’s a very large number of coin flips (n → ∞), but you’ve set the chance of heads on a single coin flip to be very low (p = λ/n → 0, where λ is a constant). Under these conditions you’d still expect some heads (np = λ), and the distribution of the number of heads is Poisson.² See Figure A.8 to see how well they match. Therefore, the Poisson represents a limit of the binomial for rare events.

Figure A.8: Poisson distribution with different means (λ = 1, 5, and 10). The vertical lines show the means. The lighter coloured lines show a binomial with n = 100 and p = λ/n to illustrate how well the Poisson approximates the binomial for rare events (it’s hard to see them as they are close together!). Code here.
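The convergence of the binomial to the Poisson can also be seen numerically. This sketch holds np = λ fixed (λ = 1 is an arbitrary choice) and measures the largest gap between the two sets of probabilities as n grows; the cutoff `max_i` is my own choice for where to stop comparing.

```python
import math


def binomial_pmf(i, n, p):
    """Binomial probability of i heads in n flips, eqn (A.28)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def poisson_pmf(i, lam):
    """Poisson probability of i events, eqn (A.30)."""
    return lam**i * math.exp(-lam) / math.factorial(i)


def max_gap(n, lam, max_i=20):
    """Largest absolute difference between Binomial(n, lam/n) and Poisson(lam)."""
    p = lam / n
    return max(
        abs(binomial_pmf(i, n, p) - poisson_pmf(i, lam)) for i in range(max_i + 1)
    )


gaps = {n: max_gap(n, lam=1.0) for n in (10, 100, 10_000)}
```

The gap shrinks steadily as n increases, in line with the near overlap of the two sets of lines in Figure A.8.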
² To see this we substitute p = λ/n into our binomial probability and take the limit as n → ∞

$p_i = \frac{n!}{i!(n-i)!}\, p^i (1 - p)^{n-i}$
$= \lim_{n\to\infty} \frac{n(n-1)\cdots(n-i+1)}{i!} \left(\frac{\lambda}{n}\right)^i \left(1 - \frac{\lambda}{n}\right)^{n-i}$
$= \lim_{n\to\infty} \frac{n^i}{i!} \frac{\lambda^i}{n^i} \left(1 - \frac{\lambda}{n}\right)^{n}$
$= \lim_{n\to\infty} \frac{\lambda^i}{i!}\, e^{-\lambda}$ (A.31)

The third line assumes that n − i ≈ n, which holds for n ≫ i, and the final line uses our exponential approximation given by eqn (A.4).

A.2.5 Continuous Random Variable Distributions.

Important continuous random variables include

Uniform random variables correspond to “randomly” choosing a number in an interval, say [a, b]. The pdf for a uniform is

$p(x) = \frac{1}{b - a}$ for x ∈ [a, b] and 0 otherwise. (A.32)

For a uniform random variable E[X] = (a + b)/2.

Exponential random variables with rate parameter λ > 0 correspond to the waiting time for an event which occurs with probability λ∆t over a short time interval of length ∆t. For these random variables

$p(x) = \lambda \exp(-\lambda x)$ for x ≥ 0 and 0 otherwise. (A.33)

For an exponential random variable E[X] = 1/λ.

The Exponential distribution is the continuous-time version of the Geometric distribution. Informally this can be seen by considering the trials in the geometric distribution as corresponding to narrow time-intervals, where the probability of success is small. Then we can use our exponential approximation to the geometric probability (eqn (A.4)).

Normal random variables have the “bell-shaped” or “Gaussian” shaped distribution. They are characterized by two parameters, the mean µ and the standard deviation σ, and

$p(x) = \frac{1}{\sigma\sqrt{2\pi}} \exp\left(-(x - \mu)^2/(2\sigma^2)\right)$. (A.34)

For a normal random variable E[X] = µ.
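The link between the geometric and exponential distributions can be sketched numerically. Below, time is chopped into steps of length dt, each with a chance rate × dt of the event happening, and the probability of still waiting at time t is compared with the exponential’s exp(−λt). The values of the rate, t, and dt are arbitrary choices for illustration.

```python
import math


def geometric_survival(t, rate, dt):
    """P(no event by time t) when each step of length dt has chance rate*dt."""
    n_steps = round(t / dt)
    return (1 - rate * dt) ** n_steps


def exponential_survival(t, rate):
    """P(X > t) for an exponential: the integral of eqn (A.33) from t onwards."""
    return math.exp(-rate * t)


rate, t = 2.0, 1.0
gaps = {
    dt: abs(geometric_survival(t, rate, dt) - exponential_survival(t, rate))
    for dt in (0.1, 0.01, 0.001)
}
```

As the time steps get narrower the discrete waiting time matches the exponential more and more closely, which is eqn (A.4) at work.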
Multiple random variables

Covariance and Independence To fully specify multiple random variables, say X and Y, one needs to know their joint distribution. For example, if X and Y are discrete random variables taking on the values x1, x2, x3, . . . , then the joint distribution is given by

$p_{i,j} = P[X = x_i, Y = x_j]$ “the probability that X equals $x_i$ and Y equals $x_j$” (A.35)

for all i and j, see also our discussion around eqn. (A.14). Alternatively, if X and Y are continuous random variables, then the joint distribution is a function of the form p(x, y) which satisfies

$P[a \le X \le b, c \le Y \le d] = \int_a^b \int_c^d p(x, y)\,dy\,dx$. (A.36)

X and Y are said to be independent if we can write the joint density as a product of the probability density functions

p(x, y) = p(x)p(y). (A.37)

Given any function f(x, y) of x and y, one can define the expectation E[f(X, Y)] by integrating with respect to the distribution. Namely,

$E[f(X, Y)] = \int\int f(x, y)\,p(x, y)\,dx\,dy$ in the continuous case and $\sum_i \sum_j f(x_i, x_j)\,p_{i,j}$ in the discrete case (A.38)

The covariance of X and Y is given by

Cov(X, Y) = E[(X − µX)(Y − µY)] = E[XY] − µX µY. (A.39)

X and Y are said to be uncorrelated if their covariance equals zero. If X and Y are independent, then they are guaranteed to be uncorrelated, but it is possible to construct X and Y to be uncorrelated but not independent.
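A standard construction of a pair of variables that are uncorrelated but not independent is to take X symmetric around zero and set Y = X^2. The sketch below works through this hypothetical example exactly, with X equal to −1, 0, or 1 with equal probability.

```python
from fractions import Fraction

# X is -1, 0, or 1 with equal probability; Y = X^2 is fully determined by X.
joint = {(x, x * x): Fraction(1, 3) for x in (-1, 0, 1)}


def expectation(g):
    """E[g(X, Y)] under the joint distribution, eqn (A.38)."""
    return sum(p * g(x, y) for (x, y), p in joint.items())


def marginal(index, value):
    """Marginal probability that X (index 0) or Y (index 1) equals value."""
    return sum(p for xy, p in joint.items() if xy[index] == value)


# Covariance via eqn (A.39): E[XY] - E[X]E[Y].
covariance = expectation(lambda x, y: x * y) - expectation(
    lambda x, y: x
) * expectation(lambda x, y: y)

# Independence would require the joint to equal the product of the marginals.
p_joint = joint.get((0, 0), Fraction(0))
p_product = marginal(0, 0) * marginal(1, 0)
```

The covariance is exactly zero, yet knowing X tells us Y with certainty, so the joint probability of (0, 0) does not equal the product of the marginal probabilities.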
Binary variable correlations One application of our covariance formula is to two binary variables, for example taking values A/a and B/b. Let’s set X = 1 if A, and X = 0 otherwise, and Y = 1 if B, and Y = 0 otherwise. For example, you could imagine drawing once from a deck of cards and A being the event of drawing a queen or a jack, with a being any other type of card, and B being that the card is a heart and b it being any other suit. So XY = 1 if our card is a Queen or Jack of Hearts, and zero otherwise. Then

$E[XY] - E[X]E[Y] = P(X = 1, Y = 1) - P(X = 1)P(Y = 1) = p_{AB} - p_A p_B$ (A.40)

where pAB is the frequency of AB, e.g. the proportion of cards that are the Queen or Jack of Hearts in our deck, and pB is the (marginal) frequency of B, e.g. the proportion of Heart suit cards (and similarly for pA).
Question 6. What is the covariance for A and B in our deck of
cards example?
Sample Covariance and Correlation We can calculate the sample covariance for X and Y of a set of observations X1, · · · , XL and Y1, · · · , YL, where these observations are paired (Xi, Yi), as

$\widehat{\sigma^2_{XY}} = \frac{1}{L-1}\sum_{i=1}^{L}(X_i - \bar{X})(Y_i - \bar{Y})$ (A.41)

this captures the extent to which two sets of numbers covary. For example, the running speeds of kids in a race at age 8 and 9 positively covary. Example datasets are shown in Figure A.9.
Figure A.9: Examples of datasets where pairs of variables (X and Y) show varying degrees of covariance; the sample correlation ($\hat{\rho}_{XY}$) is shown in the top corner of each panel (0.15, 0.82, and −0.56). Code here.
To move covariances to a more understandable scale we can divide through by the product of the standard deviations

$Cor(X, Y) = \rho_{XY} = \frac{\sigma_{XY}}{\sigma_X \sigma_Y}$ (A.42)

this is the correlation of our variables X and Y; if we calculate it for our sample it is our sample correlation. A correlation can range from 1, perfectly positively correlated, to −1, perfectly negatively correlated. If ρXY = 0 the variables are said to be uncorrelated.
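A minimal Python sketch of the sample covariance (eqn (A.41)) and sample correlation (eqn (A.42)). The paired observations are made up for illustration, and the function names are my own.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with an L - 1 denominator, eqn (A.41)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_cor(xs, ys):
    """Sample correlation: covariance over the product of standard deviations."""
    sd_x = math.sqrt(sample_cov(xs, xs))  # variance = covariance of X with itself
    sd_y = math.sqrt(sample_cov(ys, ys))
    return sample_cov(xs, ys) / (sd_x * sd_y)


# Hypothetical paired observations.
xs = [3.1, 4.0, 4.8, 5.5, 6.9]
ys = [1.0, 1.9, 1.6, 2.8, 3.2]

r = sample_cor(xs, ys)
r_perfect = sample_cor(xs, [2 * x + 1 for x in xs])   # exact increasing line
r_reversed = sample_cor(xs, [-x for x in xs])         # exact decreasing line
```

The two artificial datasets that lie exactly on a line give the extreme correlations of 1 and −1, while the noisier made-up data fall in between.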
Fitting a linear regression using least squares. We often want to approximate the relationship between our two variables X and Y by the best fitting linear relationship predicting the Y value from the observed X value. For example, think of a linear prediction of a child’s weight from their height. See Figure A.10 for an example plot. To do this we can think of approximating the Yi that accompanies the Xi value for the ith pair of data points by

$Y_i \approx a + bX_i$ (A.43)

where a and b are the intercept and slope of a line.

Figure A.10: An example of a linear regression with best fitting least-squares line. The sample variance and covariance are given (Var(X) = 1.3, Cov(X,Y) = 0.47, Slope = 0.36), so that you can see for yourself that the best fitting slope is just the ratio of these two. Code here.

What is the best fitting line? Well one common definition of the optimal fit is the choice of a and b that minimize the squared error between the observed (Y) and their predicted values, i.e. that minimize

$\sum_{i=1}^{L}(Y_i - a - bX_i)^2$ (A.44)

here $(Y_i - a - bX_i)^2$ is the squared residual error, the square of the length of the dotted lines in Figure A.10. The best fitting slope, i.e. that with least squared error, is

$b = \widehat{\sigma^2_{XY}} / \widehat{\sigma^2_X}$ (A.45)
i.e. the sample covariance of X and Y divided by the sample variance

of X. Thus the slope will be of the same sign as the covariance, and
will be larger in magnitude when the covariance of X and Y is a large
proportion of the variance of X.
This least squares fit is the solution to the linear regression
Yi ∼ a + bXi + ϵi (A.46)
where the errors (ϵi ) are uncorrelated across data points with an ex-
pectation of zero and constant but unknown variance. These assump-
tions would hold for example if ϵi ∼ Normal(0, σ).
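To see that the ratio of the sample covariance to the sample variance really does give the least-squares slope, here is a small Python sketch on made-up data. The intercept formula used (the fitted line passes through the point of sample means) is the standard least-squares result, which I am assuming here as it is not given above; the check simply confirms that nudging the intercept or slope never lowers the squared error.

```python
def mean(values):
    return sum(values) / len(values)


def least_squares_line(xs, ys):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    x_bar, y_bar = mean(xs), mean(ys)
    cov_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)
    var_x = sum((x - x_bar) ** 2 for x in xs) / (len(xs) - 1)
    slope = cov_xy / var_x              # eqn (A.45)
    intercept = y_bar - slope * x_bar   # assumed standard result, not from the text
    return intercept, slope


def squared_error(a, b, xs, ys):
    """Sum of squared residuals for the line a + b*x, eqn (A.44)."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))


# Hypothetical paired observations.
xs = [3.1, 4.0, 4.8, 5.5, 6.9]
ys = [1.0, 1.9, 1.6, 2.8, 3.2]

a_hat, b_hat = least_squares_line(xs, ys)
best = squared_error(a_hat, b_hat, xs, ys)

# Squared error for lines with slightly perturbed intercepts and slopes.
nearby = [
    squared_error(a_hat + da, b_hat + db, xs, ys)
    for da in (-0.05, 0.0, 0.05)
    for db in (-0.05, 0.0, 0.05)
]
residual_sum = sum(y - a_hat - b_hat * x for x, y in zip(xs, ys))
```

With this choice of intercept the residuals sum to zero, so the fitted line balances the points above and below it.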
We often want to include additional terms in our regression, or have
more complicated error structures, but these extensions can usually be
understood as simple extensions of this machinery. For example, least-
squares can also be used to fit a non-linear function of X, f (X, Ω),
where we minimize
$\sum_{i=1}^{L} (Y_i - f(X_i; \Omega))^2$ (A.47)
over our choices of parameters Ω. Often there is no analytical solu-

tion, i.e. no equivalent of eqn. A.45, and the answer must be found
computationally exploring over choices of Ω (using tools available in
R and other programming languages). Throughout the book we use
non-linear least squares to fit various models to data.
Useful Properties of Covariances. Following from the linearity of

expectation, eqn (A.24), if we rescale X to mX + n and Y to oY + p
then
Cov(mX + n, oY + p) = (mo)Cov(X, Y ) (A.48)
Such linear transforms leave the magnitude of our correlation unaffected, as the factor mo cancels out of the top and bottom of eqn (A.42) (the sign of the correlation flips only if m and o have opposite signs). In contrast, they multiply the linear slope by o/m. Thus for our correlation it does not matter what units we measure X and Y in, while our linear slope is always in units of units(Y)/units(X), for example m/s if we were predicting the distance traveled by runners (Y in meters) from the length of time intervals (X in seconds).
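These scaling properties are easy to confirm numerically. In the sketch below the data and the rescaling constants m, n, o, and p are hypothetical; the covariance picks up a factor of mo, the correlation is unchanged (m and o are both positive here), and the regression slope of Y on X picks up a factor of o/m.

```python
import math


def mean(values):
    return sum(values) / len(values)


def sample_cov(xs, ys):
    """Sample covariance with an L - 1 denominator, eqn (A.41)."""
    x_bar, y_bar = mean(xs), mean(ys)
    return sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / (len(xs) - 1)


def sample_cor(xs, ys):
    """Sample correlation, eqn (A.42)."""
    return sample_cov(xs, ys) / math.sqrt(sample_cov(xs, xs) * sample_cov(ys, ys))


def slope(xs, ys):
    """Least-squares slope of Y on X, eqn (A.45)."""
    return sample_cov(xs, ys) / sample_cov(xs, xs)


# Hypothetical paired observations and rescaling X -> mX + n, Y -> oY + p.
xs = [3.1, 4.0, 4.8, 5.5, 6.9]
ys = [1.0, 1.9, 1.6, 2.8, 3.2]
m, n, o, p = 100.0, 0.0, 2.0, 5.0

xs_new = [m * x + n for x in xs]
ys_new = [o * y + p for y in ys]

cov_ratio = sample_cov(xs_new, ys_new) / sample_cov(xs, ys)
slope_ratio = slope(xs_new, ys_new) / slope(xs, ys)
```

Here `cov_ratio` comes out as mo = 200 and `slope_ratio` as o/m = 0.02, up to floating point error.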
Useful Limits.
Law of Large Numbers If X1, X2, . . . are a sequence of independent random variables (i.e. “the outcomes of a sequence of independent experiments”) with common expectation µ = E[Xi], then

$\frac{X_1 + \cdots + X_n}{n} \to \mu$ as n → ∞ with probability one. (A.49)

Hence, the LLN implies that if you repeat a bunch of experiments and take the average outcome (X̄) from the experiments, the value you get is likely to be close to the expected outcome of the experiment.

Of course, in the real world, we can only perform a finite number of experiments, in which case it is useful to have a sense of how much variation there will be in the average outcome. The central limit theorem is the key tool for understanding this variation.

Central Limit Theorem If X1, X2, . . . are a sequence of independent random variables (i.e. “the outcomes of a sequence of independent experiments”) with common expectation µ = E[Xi] and variance σ^2, then

$\frac{X_1 + \cdots + X_n - \mu n}{\sqrt{n}\,\sigma} \to$ normal distribution with mean 0 and variance 1 as n → ∞ (A.50)

Hence, for n large enough X1 + · · · + Xn is approximately normally distributed with mean µn and variance σ^2 n. This is one of the reasons the normal distribution is so useful: many outcomes (e.g. phenotypes) have an approximately normal distribution as they are the combined outcome of many (somewhat) independent quantities.
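Both limits can be explored by simulation. The sketch below uses rolls of a hypothetical fair die, with a fixed random seed so that it is reproducible; the sample sizes and number of replicates are arbitrary choices of mine. It is only an illustration: simulated values will be close to, not exactly equal to, the theoretical ones.

```python
import math
import random
import statistics

random.seed(1)  # fixed seed so that the sketch is reproducible

MU = 3.5                      # expectation of one roll of a fair die
SIGMA = math.sqrt(35 / 12)    # standard deviation of one roll


def roll():
    return random.randint(1, 6)


# Law of large numbers: the average of n rolls approaches MU as n grows.
averages = {
    n: statistics.fmean([roll() for _ in range(n)]) for n in (10, 1_000, 100_000)
}


# Central limit theorem: standardized sums of n rolls look standard normal.
def standardized_sum(n):
    total = sum(roll() for _ in range(n))
    return (total - MU * n) / (math.sqrt(n) * SIGMA)   # as in eqn (A.50)


z = [standardized_sum(50) for _ in range(5_000)]
z_mean = statistics.fmean(z)
z_var = statistics.variance(z)
# For a standard normal, about 95% of the mass lies within 1.96 of zero.
frac_within = sum(abs(v) < 1.96 for v in z) / len(z)
```

Even though a single die roll is far from normally distributed, the standardized sum of just 50 rolls already has mean near 0, variance near 1, and roughly 95% of its mass within 1.96 of zero.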
Bibliography
Aguadé, M., N. Miyashita, and C. H. Langley, 1989 Re-

duced variation in the yellow-achaete-scute region in natural popula-
tions of Drosophila melanogaster. Genetics 122: 607–615.
Aguillon, S. M., J. W. Fitzpatrick, R. Bowman, S. J.

Schoech, A. G. Clark, G. Coop, and N. Chen, 2017
Deconstructing isolation-by-distance: The genomic consequences
of limited dispersal. PLOS Genetics 13(8): 1–27.
Akçay, E. and J. Van Cleve, 2016 There is no fitness but fit-

ness, and the lineage is its bearer. Phil. Trans. R. Soc. B 371(1687):
20150085.
Alcaide, M., E. S. Scordato, T. D. Price, and D. E.

Irwin, 2014 Genomic divergence in a ring species complex. Na-
ture 511(7507): 83.
Alexander, D. H., J. Novembre, and K. Lange, 2009

Fast model-based estimation of ancestry in unrelated individuals.
Genome research 19(9): 1655–1664.
Algee-Hewitt, B. F., M. D. Edge, J. Kim, J. Z. Li,

and N. A. Rosenberg, 2016 Individual identifiability predicts
population identifiability in forensic microsatellite markers. Current
Biology 26(7): 935–942.
Allendorf, F. W. and J. J. Hard, 2009 Human-induced

evolution caused by unnatural selection through harvest of
wild animals. Proceedings of the National Academy of Sci-
ences 106(Supplement 1): 9987–9994.
Alvarez, G., F. C. Ceballos, and C. Quinteiro, 2009 The

role of inbreeding in the extinction of a European royal dynasty.
PLoS One 4(4): e5174.
Andolfatto, P., 2007 Hitchhiking effects of recurrent beneficial

amino acid substitutions in the Drosophila melanogaster genome.
Genome Res. 17: 1755–1762.
Andolfatto, P. and M. Przeworski, 2001 Regions of lower

crossing over harbor more rare variants in African populations of
Drosophila melanogaster. Genetics 158: 657–665.
Ayllon, F., E. Kjærner-Semb, T. Furmanek, V. Wen-

nevik, M. F. Solberg, G. Dahle, G. L. Taranger,
K. A. Glover, M. S. Almén, C. J. Rubin, and others,
2015 The vgll3 locus controls age at maturity in wild and domesti-
cated Atlantic salmon (Salmo salar L.) males. PLoS genetics 11(11):
e1005628.
Bachtrog, D., J. E. Mank, C. L. Peichel, M. Kirk-

patrick, S. P. Otto, T.-L. Ashman, M. W. Hahn,
J. Kitano, I. Mayrose, R. Ming, and others, 2014 Sex
determination: why so many ways of doing it? PLoS biology 12(7):
e1001899.
Barrett, R. D. H., S. M. Rogers, and D. Schluter,

2008 Natural Selection on a Major Armor Gene in Threespine
Stickleback. Science 322(5899): 255–257.
Barson, N. J., T. Aykanat, K. Hindar, M. Baranski,

G. H. Bolstad, P. Fiske, C. Jacq, A. J. Jensen, S. E.
Johnston, S. Karlsson, and others, 2015 Sex-dependent
dominance at a single locus maintains variation in age at maturity
in salmon. Nature 528(7582): 405.
Barton, N. and G. Hewitt, 1981 A chromosomal cline in the

grasshopper Podisma pedestris. Evolution: 1008–1018.
Barton, N. H., 2000 Genetic hitchhiking. Philos. Trans. R. Soc.

Lond., B, Biol. Sci. 355: 1553–1562.
Basolo, A. L., 1994 The dynamics of Fisherian sex-ratio evolu-

tion: theoretical and experimental investigations. The American
Naturalist 144(3): 473–490.
Bazykin, A., 1969 Hypothetical mechanism of speciation. Evolu-

tion 23(4): 685–687.
Becquet, C., N. Patterson, A. C. Stone, M. Prze-

worski, and D. Reich, 2007 Genetic structure of chimpanzee
populations. PLoS genetics 3(4): e66.
Begun, D. J. and C. F. Aquadro, 1992 Levels of naturally
occurring DNA polymorphism correlate with recombination rates in
D. melanogaster. Nature 356: 519–520.
Beissinger, T. M., L. Wang, K. Crosby, A. Durva-

sula, M. B. Hufford, and J. Ross-Ibarra, 2016 Recent
demography drives changes in linked selection across the maize
genome. Nature plants 2(7): 16084.
Bell, M. A., M. P. Travis, and D. M. Blouw, 2006 Infer-

ring natural selection in a fossil threespine stickleback. Paleobiol-
ogy 32(4): 562–577.
Box, G. E., 1979 Robustness in the strategy of scientific model

building. In Robustness in statistics, pp. 201–236. Elsevier.
Bradburd, G. S., P. L. Ralph, and G. M. Coop, 2016

A spatial framework for understanding population structure and
admixture. PLoS genetics 12(1): e1005703.
Brandvain, Y., A. M. Kenney, L. Flagel, G. Coop, and

A. L. Sweigart, 2014 Speciation and introgression between
Mimulus nasutus and Mimulus guttatus. PLoS Genetics 10(6):
e1004410.
Braverman, J. M., R. R. Hudson, N. L. Kaplan, C. H.

Langley, and W. Stephan, 1995 The hitchhiking effect on
the site frequency spectrum of DNA polymorphisms. Genetics 140:
783–796.
Brodie III, E. D., 1992 Correlational selection for color pattern

and antipredator behavior in the garter snake Thamnophis ordi-
noides. Evolution 46(5): 1284–1298.
Cai, J. J., J. M. Macpherson, G. Sella, and D. A.

Petrov, 2009 Pervasive hitchhiking at coding and regulatory
sites in humans. PLoS Genet. 5: e1000336.
Cassa, C. A., D. Weghorn, D. J. Balick, D. M. Jor-

dan, D. Nusinow, K. E. Samocha, A. O’Donnell-
Luria, D. G. MacArthur, M. J. Daly, D. R. Beier,
and others, 2017 Estimating the selective effects of heterozy-
gous protein-truncating variants from human exome data. Nature
genetics 49(5): 806.
Charlesworth, B., 2009 Effective population size and pat-

terns of molecular evolution and variation. Nature Reviews Ge-
netics 10(3): 195.
Charlesworth, D., B. Charlesworth, and M. T. Mor-

gan, 1995 The pattern of neutral molecular variation under the
background selection model. Genetics 141: 1619–1632.
Charlesworth, D. and V. Laporte, 1998 The male-sterility

polymorphism of Silene vulgaris: analysis of genetic data from two
populations and comparison with Thymus vulgaris. Genetics 150(3):
1267–1282.
Chen, N., E. J. Cosgrove, R. Bowman, J. W. Fitz-

patrick, and A. G. Clark, 2016 Genomic Consequences of
Population Decline in the Endangered Florida Scrub-Jay. Current
Biology 26(21): 2974–2979.
Cohen, D., 1966 Optimizing reproduction in a randomly varying

environment. Journal of theoretical biology 12(1): 119–129.
Colbert, E. H., 1948 Evolution of the horned dinosaurs. Evolu-

tion 2(2): 145–163.
Cook, L. M., B. S. Grant, I. J. Saccheri, and J. Mal-

let, 2012 Selective bird predation on the peppered moth: the last
experiment of Michael Majerus. Biology Letters 8(4): 609–612.
Cotterman, C. W., 1940 A calculus for statistico-genetics.
Ph.D. thesis, The Ohio State University.
Cousminer, D. L., D. J. Berry, N. J. Timpson, W. Ang,

E. Thiering, E. M. Byrne, H. R. Taal, V. Huikari,
J. P. Bradfield, M. Kerkhof, and others, 2013
Genome-wide association and longitudinal analyses reveal genetic
loci linking pubertal height growth, pubertal timing and childhood
adiposity. Human molecular genetics 22(13): 2735–2747.
Cutter, A. D. and J. Y. Choi, 2010 Natural selection shapes

nucleotide polymorphism across the genome of the nematode
Caenorhabditis briggsae. Genome Res. 20: 1103–1111.
Darwin, C., 1859 On the Origin of Species by Means of Natural
Selection, or the Preservation of Favoured Races in the Struggle for
Life. London: Murray.
Darwin, C., 1876 The effects of cross and self fertilisation in the
vegetable kingdom. London, UK: Murray.
Darwin, C., 1888 The descent of man and selection in relation to

sex, Volume 1. Murray.
Dempster, E., 1955 Maintenance of genetic heterogeneity. Cold

Spring Harb Symp Quant Biol 20: 25–32.
Dickerson, R. E., 1971 The structure of cytochrome c and the
rates of molecular evolution. Journal of Molecular Evolution 1(1):
26–45.
Dobzhansky, T., 1943 Genetics of natural populations IX. Tem-

poral changes in the composition of populations of Drosophila pseu-
doobscura. Genetics 28(2): 162.
Dobzhansky, T., 1951 Genetics and the Origin of Species (3rd ed.), pp. 16.
Dobzhansky, T., 1970 Genetics of the evolutionary process, Vol-

ume 139. Columbia University Press.
Edwards, A., 1961 The population genetics of “sex-ratio” in

Drosophila pseudoobscura. Heredity 16(3): 291.
Edwards, A. W., 1998 Natural selection and the sex ratio:

Fisher’s sources. The American Naturalist 151(6): 564–569.
Elton, C., 1942 Voles, mice and lemmings. Problems in population

dynamics. Oxford: Clarendon Press.
Elyashiv, E., S. Sattath, T. T. Hu, A. Strutsovsky,

G. McVicker, P. Andolfatto, G. Coop, and G. Sella,
2016 A genomic map of the effects of linked selection in Drosophila.
PLoS genetics 12(8): e1006130.
Ewens, W. J., 2010 What is the gene trying to do? British Jour-
nal for the Philosophy of Science 62(1): 155–176.
Ewens, W. J., 2016 Motoo Kimura and James Crow on the In-
finitely Many Alleles Model. Genetics 202(4): 1243–1245.
Faith, M. S., A. Pietrobelli, C. Nunez, M. Heo, S. B.

Heymsfield, and D. B. Allison, 1999 Evidence for inde-
pendent genetic influences on fat mass and body mass index in a
pediatric twin sample. Pediatrics 104(1): 61–67.
Fay, J. C. and C. I. Wu, 2000 Hitchhiking under positive Dar-

winian selection. Genetics 155: 1405–1413.
Feder, A. F., C. Kline, P. Polacino, M. Cottrell,

A. D. Kashuba, B. F. Keele, S.-L. Hu, D. A. Petrov,
P. S. Pennings, and Z. Ambrose, 2017 A spatio-temporal
assessment of simian/human immunodeficiency virus (SHIV) evo-
lution reveals a highly dynamic process within the host. PLoS
pathogens 13(5): e1006358.
Ferree, P. M. and D. A. Barbash, 2007 Distorted sex ratios:

a window into RNAi-mediated silencing. PLoS biology 5(11): e303.
Fisher, R. A., 1915 The evolution of sexual preference. The

Eugenics Review 7(3): 184.
Fisher, R. A., 1923 XXI.—on the dominance ratio. Proceedings of

the Royal Society of Edinburgh 42: 321–341.
Fisher, R. A., 1930 The genetical theory of natural selection: a

complete variorum edition. Oxford University Press.
Francioli, L. C., A. Menelaou, S. L. Pulit,

F. Van Dijk, P. F. Palamara, C. C. Elbers, P. B.
Neerincx, K. Ye, V. Guryev, W. P. Kloosterman,
and others, 2014 Whole-genome sequence variation, population
structure and demographic history of the Dutch population. Nature
Frentiu, F. D., G. D. Bernard, C. I. Cuevas, M. P.

Sison-Mangus, K. L. Prudic, and A. D. Briscoe, 2007
Adaptive evolution of color vision as seen through the eyes of but-
terflies. Proceedings of the National Academy of Sciences 104(suppl
1): 8634–8640.
Galen, C., 1996 Rates of floral evolution: adaptation to bumblebee

pollination in an alpine wildflower, Polemonium viscosum. Evolu-
tion 50(1): 120–125.
Galtier, N., 2016 Adaptive protein evolution in animals and the

effective population size hypothesis. PLoS genetics 12(1): e1005774.
Gigord, L. D., M. R. Macnair, and A. Smithson, 2001

Negative frequency-dependent selection maintains a dramatic
flower color polymorphism in the rewardless orchid Dactylorhiza
sambucina (L.) Soo. Proceedings of the National Academy of Sci-
ences 98(11): 6253–6255.
Gillespie, J. H., 1973 Natural selection with varying selection

coefficients–a haploid model. Genetics Research 21(2): 115–120.
Gillespie, J. H., 1977 Natural selection for variances in offspring

numbers: a new evolutionary principle. The American Natural-
ist 111(981): 1010–1014.
Gillespie, J. H., 2000 Genetic drift in an infinite population. The

pseudohitchhiking model. Genetics 155: 909–919.
Gingerich, P., 1983 Rates of evolution: effects of time and tempo-

ral scaling. Science 222: 159–162.
Grant, P. R. and B. R. Grant, 1995 Predicting microevo-
lutionary responses to directional selection on heritable variation.
Evolution 49(2): 241–251.
Grant, P. R. and B. R. Grant, 2002 Unpredictable evolution

in a 30-year study of Darwin’s finches. Science 296(5568): 707–711.
Gremer, J. R. and D. L. Venable, 2014 Bet hedging in

desert winter annual plants: optimal germination strategies in a
variable environment. Ecology Letters 17(3): 380–387.
Grognet, P., H. Lalucque, F. Malagnac, and P. Silar,

2014 Genes that bias Mendelian segregation. PLoS genetics 10(5):
e1004387.
Haldane, J., 1942 The selective elimination of silver foxes in east-

ern Canada. Journal of Genetics 44(2-3): 296–304.
Haldane, J. and S. Jayakar, 1963 Polymorphism due to selec-

tion of varying direction. Journal of Genetics 58(2): 237–242.
Haldane, J. B. S., 1927 A mathematical theory of natural and

artificial selection, part V: selection and mutation. In Mathematical
Proceedings of the Cambridge Philosophical Society, Volume 23, pp.
838–844. Cambridge University Press.
Haldane, J. B. S., 1937 The Effect of Variation of Fitness. The

American Naturalist 71(735): 337–349.
Haldane, J. B. S., 1949 Suggestions as to quantitative measure-

ment of rates of evolution. Evolution 3(1): 51–56.
Haldane, J. B. S., 1964 A defense of beanbag genetics. Perspectives
in Biology and Medicine 7(3): 343–360.
Hamilton, W. D., 1964a The genetical evolution of social behaviour. I.
Journal of theoretical biology 7(1): 1–16.
Hamilton, W. D., 1964b The genetical evolution of social behaviour. II.
Journal of theoretical biology 7(1): 17–52.
Hermisson, J. and P. S. Pennings, 2017 Soft sweeps and

beyond: understanding the patterns and probabilities of selection
footprints under rapid adaptation. Methods in Ecology and Evolu-
tion 8(6): 700–716.
Hey, J. and R. M. Kliman, 2002 Interactions between nat-

ural selection, recombination and gene density in the genes of
Drosophila. Genetics 160(2): 595–608.
Hoekstra, H. E., K. E. Drumm, and M. W. Nachman,

2004 Ecological genetics of adaptive color polymorphism in pocket
mice: geographic variation in selected and neutral genes. Evolu-
tion 58(6): 1329–1341.
Hohenlohe, P. A., S. Bassham, P. D. Etter,

N. Stiffler, E. A. Johnson, and W. A. Cresko, 2010
Population genomics of parallel adaptation in threespine stickleback
using sequenced RAD tags. PLoS genetics 6(2): e1000862.
Hollister, J. D., S. Greiner, W. Wang, J. Wang,

Y. Zhang, G. K.-S. Wong, S. I. Wright, and M. T.
Johnson, 2014 Recurrent loss of sex is associated with accumula-
tion of deleterious mutations in Oenothera. Molecular biology and
evolution 32(4): 896–905.
Hopkins, J., G. Baudry, U. Candolin, and A. Kaitala,

2015 I’m sexy and I glow it: female ornamentation in a nocturnal
capital breeder. Biology letters 11(10): 20150599.
Houde, A. E., 1994 Effect of artificial selection on male colour

patterns on mating preference of female guppies. Proc. R. Soc.
Lond. B 256(1346): 125–130.
Howes, R. E., M. Dewi, F. B. Piel, W. M. Monteiro,

K. E. Battle, J. P. Messina, A. Sakuntabhai, A. W.
Satyagraha, T. N. Williams, J. K. Baird, and S. I.
Hay, 2013 Spatial distribution of G6PD deficiency variants across
malaria-endemic regions. Malar. J. 12: 418.
Howes, R. E., F. B. Piel, A. P. Patil, O. A. Nyangiri,

P. W. Gething, M. Dewi, M. M. Hogg, K. E. Bat-
tle, C. D. Padilla, J. K. Baird, and S. I. Hay, 2012
G6PD deficiency prevalence and estimates of affected populations in
malaria endemic countries: a geostatistical model-based map. PLoS
Medicine 9(11): e1001339.
Hudson, R. R., 2015 A New Proof of the Expected Frequency
Spectrum under the Standard Neutral Model. PLOS ONE 10(7):
1–5.
Hudson, R. R. and N. L. Kaplan, 1995a Deleterious back-

ground selection with recombination. Genetics 141: 1605–1617.
Hudson, R. R. and N. L. Kaplan, 1995b The coalescent pro-

cess and background selection. Philos. Trans. R. Soc. Lond., B, Biol.
Sci. 349: 19–23.
Hudson, R. R., M. Kreitman, and M. Aguadé, 1987 A test
of neutral molecular evolution based on nucleotide data. Genet-
ics 116(1): 153–159.
Hughes, W. O., B. P. Oldroyd, M. Beekman, and F. L.

Ratnieks, 2008 Ancestral monogamy shows kin selection is key to
the evolution of eusociality. Science 320(5880): 1213–1216.
Hunt, G., M. A. Bell, and M. P. Travis, 2008 Evolution

toward a new adaptive optimum: phenotypic evolution in a fossil
stickleback lineage. Evolution 62(3): 700–710.
Jain, S. and A. D. Bradshaw, 1966 Evolutionary divergence

among adjacent plant populations I. The evidence and its theoreti-
cal analysis. Heredity 21(3): 407.
Janicke, T., I. K. Häderer, M. J. Lajeunesse, and

N. Anthes, 2016 Darwinian sex roles confirmed across the an-
imal kingdom. Science advances 2(2): e1500983.
Jennings, W. B. and S. V. Edwards, 2005 Speciational his-

tory of Australian grass finches (Poephila) inferred from thirty gene
trees. Evolution 59(9): 2033–2047.
Johannsen, W., 1911 The Genotype Conception of Heredity. The

American Naturalist 45(531): 129–159.
Johnston, S. E., J. Gratten, C. Berenos, J. G. Pilk-

ington, T. H. Clutton-Brock, J. M. Pemberton, and
J. Slate, 2013 Life history trade-offs at a single locus maintain
sexually selected genetic variation. Nature 502(7469): 93.
Joron, M., L. Frezal, R. T. Jones, N. L. Chamber-

lain, S. F. Lee, C. R. Haag, A. Whibley, M. Becuwe,
S. W. Baxter, L. Ferguson, and others, 2011 Chromo-
somal rearrangements maintain a polymorphic supergene controlling
butterfly mimicry. Nature 477(7363): 203.
Joron, M., R. Papa, M. Beltrán, N. Chamberlain,

J. Mavárez, S. Baxter, M. Abanto, E. Berming-
ham, S. J. Humphray, J. Rogers, and others, 2006 A
conserved supergene locus controls colour pattern diversity in Heli-
conius butterflies. PLoS biology 4(10): e303.
Jukema, J. and T. Piersma, 2006 Permanent female mimics in

a lekking shorebird. Biology letters 2(2): 161–164.
Kaplan, N. L., R. R. Hudson, and C. H. Langley, 1989

The hitchhiking effect revisited. Genetics 123: 887–899.
Karn, M. N. and L. Penrose, 1951 Birth weight and gestation

time in relation to maternal age, parity and infant survival. Annals
of eugenics 16(1): 147–164.
Kettlewell, H. B. D., 1955 Selection experiments on industrial

melanism in the Lepidoptera. Heredity 9(3): 323.
Kim, Y., 2006 Allele frequency distribution under recurrent selective

sweeps. Genetics 172: 1967–1978.
Kimura, M., 1968 Evolutionary rate at the molecular level.
Nature 217(5129): 624–626.
Kimura, M., 1983 The neutral theory of molecular evolution. Cam-

bridge University Press.
Kimura, M. and J. F. Crow, 1964 The number of alleles that

can be maintained in a finite population. Genetics 49(4): 725.
Kimura, M. and T. Ohta, 1974 On some principles govern-

ing molecular evolution. Proceedings of the National Academy of
Sciences 71(7): 2848–2852.
King, J. L. and T. H. Jukes, 1969 Non-darwinian evolution.

Science 164(3881): 788–798.
Klinka, D. R. and T. E. Reimchen, 2009 Adaptive coat

colour polymorphism in the Kermode bear of coastal British
Columbia. Biological Journal of the Linnean Society 98(3): 479–
488.
Kornegay, J. R., J. W. Schilling, and A. C. Wilson,

1994 Molecular adaptation of a leaf-eating bird: stomach lysozyme
of the hoatzin. Molecular Biology and Evolution 11(6): 921–928.
Krakauer, A. H., 2005 Kin selection and cooperative courtship in

wild turkeys. Nature 434(7029): 69.
Kruuk, L. E., J. Slate, J. M. Pemberton, S. Broth-

erstone, F. Guinness, and T. Clutton-Brock, 2002
Antler size in red deer: heritability and selection but no evolution.
Evolution 56(8): 1683–1695.
Küpper, C., M. Stocks, J. E. Risse, N. dos Remedios,

L. L. Farrell, S. B. McRae, T. C. Morgan, N. Kar-
lionova, P. Pinchuk, Y. I. Verkuil, and others, 2016
A supergene determines highly divergent male reproductive morphs
in the ruff. Nature Genetics 48(1): 79.
Kwiatkowski, D. P., 2005 How malaria has affected
the human genome and what human genetics can teach us about
malaria. Am. J. Hum. Genet. 77(2): 171–192.
Laberge, A.-M., M. Jomphe, L. Houde, H. Vézina,

M. Tremblay, B. Desjardins, D. Labuda, M. St-Hi-
laire, C. Macmillan, E. A. Shoubridge, and others,
2005 A “Fille du Roy” introduced the T14484C Leber hereditary
optic neuropathy mutation in French Canadians. The American
Journal of Human Genetics 77(2): 313–317.
Lamichhaney, S., G. Fan, F. Widemo, U. Gunnars-

son, D. S. Thalmann, M. P. Hoeppner, S. Kerje,
U. Gustafson, C. Shi, H. Zhang, and others, 2016
Structural genomic changes underlie alternative reproductive strate-
gies in the ruff (Philomachus pugnax). Nature Genetics 48(1): 84.
Lande, R., 1976 Natural selection and random genetic drift in

phenotypic evolution. Evolution 30(2): 314–334.
Lande, R., 1979 Quantitative genetic analysis of multivariate evo-

lution, applied to brain: body size allometry. Evolution 33(1Part2):
402–416.
Lande, R. and S. J. Arnold, 1983 The measurement of selection
on correlated characters. Evolution 37(6): 1210–1226.
Laurie, C. C., D. A. Nickerson, A. D. Anderson, B. S.

Weir, R. J. Livingston, M. D. Dean, K. L. Smith,
E. E. Schadt, and M. W. Nachman, 2007 Linkage Disequilibrium
in Wild Mice. PLOS Genetics 3(8): 1–9.
Lawson, D. J., L. Van Dorp, and D. Falush, 2018 A tuto-

rial on how not to over-interpret STRUCTURE and ADMIXTURE
bar plots. Nature communications 9(1): 3258.
Lefébure, T., C. Morvan, F. Malard, C. François,

L. Konecny-Dupré, L. Guéguen, M. Weiss-Gayet,
A. Seguin-Orlando, L. Ermini, C. Der Sarkissian,
and others, 2017 Less effective selection leads to larger genomes.
Genome research: gr–212589.
Leffler, E. M., K. Bullaughey, D. R. Matute, W. K.

Meyer, L. Segurel, A. Venkat, P. Andolfatto, and
M. Przeworski, 2012 Revisiting an old riddle: what deter-
mines genetic diversity levels within species? PLoS biology 10(9):
e1001388.
Lek, M., K. J. Karczewski, E. V. Minikel, K. E.

Samocha, E. Banks, T. Fennell, A. H. O’Donnell-
Luria, J. S. Ware, A. J. Hill, B. B. Cummings, and
others, 2016 Analysis of protein-coding genetic variation in
60,706 humans. Nature 536(7616): 285.
Lenormand, T., D. Bourguet, T. Guillemaud, and

M. Raymond, 1999 Tracking the evolution of insecticide resis-
tance in the mosquito Culex pipiens. Nature 400(6747): 861.
Lewontin, R. C., 1970 The units of selection. Annual review of

ecology and systematics 1(1): 1–18.
Lewontin, R. C., 1974 The Genetic Basis of Evolutionary

Change. Columbia University Press, New York.
Lewontin, R. C., 1994 [DNA Fingerprinting: A Review of
the Controversy]: Comment: The Use of DNA Profiles in Forensic
Contexts. Statist. Sci. 9(2): 259–262.
Lewontin, R. C., 2001 Thinking about evolution: historical, philo-

sophical, and political perspectives, Chapter Natural History and
Formalism in Evolutionary Genetics, pp. 7–20. Cambridge Univer-
sity Press.
Li, J. Z., D. M. Absher, H. Tang, A. M. Southwick,

A. M. Casto, S. Ramachandran, H. M. Cann, G. S.
Barsh, M. Feldman, L. L. Cavalli-Sforza, and others,
2008 Worldwide human relationships inferred from genome-wide
patterns of variation. Science 319(5866): 1100–1104.
Lin, C.-J., F. Hu, R. Dubruille, J. Vedanayagam,

J. Wen, P. Smibert, B. Loppin, and E. C. Lai, 2018 The
hpRNA/RNAi pathway is essential to resolve intragenomic conflict
in the Drosophila male germline. Developmental cell 46(3): 316–326.
Lister, A., 1989 Rapid dwarfing of red deer on Jersey in the last
interglacial. Nature 342(6249): 539.
Locke, D. P., L. W. Hillier, W. C. Warren, K. C.

Worley, L. V. Nazareth, D. M. Muzny, S.-P. Yang,
Z. Wang, A. T. Chinwalla, P. Minx, and others, 2011
Comparative and demographic analysis of orang-utan genomes.
Nature 469(7331): 529.
Losos, J. B., S. J. Arnold, G. Bejerano, E. Brodie III,

D. Hibbett, H. E. Hoekstra, D. P. Mindell, A. Mon-
teiro, C. Moritz, H. A. Orr, and others, 2013 Evolu-
tionary biology for the 21st century. PLoS Biology 11(1): e1001466.
Louicharoen, C., E. Patin, R. Paul, I. Nuchpray-
oon, B. Witoonpanich, C. Peerapittayamongkol,
I. Casademont, T. Sura, N. M. Laird, P. Singhasivanon,
L. Quintana-Murci, and A. Sakuntabhai, 2009
Positively selected G6PD-Mahidol mutation reduces Plasmodium
vivax density in Southeast Asians. Science 326(5959):
1546–1549.
Lowry, D. B. and J. H. Willis, 2010 A widespread chromo-

somal inversion polymorphism contributes to a major life-history
transition, local adaptation, and reproductive isolation. PLoS biol-
ogy 8(9): e1000500.
MacArthur, D. G., S. Balasubramanian, A. Frank-

ish, N. Huang, J. Morris, K. Walter, L. Jostins,
L. Habegger, J. K. Pickrell, S. B. Montgomery, and
others, 2012 A systematic survey of loss-of-function variants in
human protein-coding genes. Science 335(6070): 823–828.
Macpherson, J. M., G. Sella, J. C. Davis, and D. A.

Petrov, 2007 Genomewide spatial correspondence between non-
synonymous divergence and neutral polymorphism reveals extensive
adaptation in Drosophila. Genetics 177: 2083–2099.
Majerus, M. E., 2009 Industrial melanism in the peppered moth,

Biston betularia: an excellent teaching example of Darwinian evolu-
tion in action. Evolution: Education and Outreach 2(1): 63.
Malécot, G., 1948 Les mathématiques de l’hérédité.
Malécot, G., 1969 The Mathematics of Heredity (Revised, edited

and translated by Yermanos, DM).
Marciniak, S. and G. H. Perry, 2017 Harnessing ancient

genomes to study the history of human adaptation. Nature Reviews
Genetics 18(11): 659.
Maynard Smith, J., 1964 Group selection and kin selection.

Nature 201(4924): 1145.
Maynard Smith, J. and J. Haigh, 1974 The hitch-hiking

effect of a favourable gene. Genet. Res. 23: 23–35.
McDonald, J. H. and M. Kreitman, 1991 Adaptive protein

evolution at the Adh locus in Drosophila. Nature 351(6328): 652.
McVicker, G., D. Gordon, C. Davis, and P. Green, 2009

Widespread genomic signatures of natural selection in hominid
evolution. PLoS Genet. 5: e1000471.
Menozzi, P., A. Piazza, and L. Cavalli-Sforza, 1978

Synthetic maps of human gene frequencies in Europeans. Sci-
ence 201(4358): 786–792.
Meredith, R. W., J. Gatesy, W. J. Murphy, O. A. Ryder,
and M. S. Springer, 2009 Molecular Decay of the
Tooth Gene Enamelin (ENAM) Mirrors the Loss of Enamel in the
Fossil Record of Placental Mammals. PLOS Genetics 5(9): 1–12.
Messier, W. and C.-B. Stewart, 1997 Episodic adaptive

evolution of primate lysozymes. Nature 385(6612): 151.
Milne, A. A. and E. H. Shepard, 1926 Winnie-the-Pooh.
Milot, E., C. Moreau, A. Gagnon, A. A. Cohen,

B. Brais, and D. Labuda, 2017 Mother’s curse neutralizes
natural selection against a human genetic disease over three cen-
turies. Nature ecology & evolution 1(9): 1400.
Muller, H. J., 1932 Some genetic aspects of sex. The American

Naturalist 66(703): 118–138.
Muller, H. J., 1964 The relation of recombination to mutational

advance. Mutation Research/Fundamental and Molecular Mecha-
nisms of Mutagenesis 1(1): 2–9.
Nachman, M. W., H. E. Hoekstra, and S. L.

D’Agostino, 2003 The genetic basis of adaptive melanism
in pocket mice. Proceedings of the National Academy of Sci-
ences 100(9): 5268–5273.
Nash, D., S. Nair, M. Mayxay, P. N. Newton, J.-P.

Guthmann, F. Nosten, and T. J. Anderson, 2005 Selec-
tion strength and hitchhiking around two anti-malarial resistance
genes. Proceedings of the Royal Society of London B: Biological
Sciences 272(1568): 1153–1161.
Nelson, M. R., D. Wegmann, M. G. Ehm, D. Kessner,

P. S. Jean, C. Verzilli, J. Shen, Z. Tang, S.-A. Ba-
canu, D. Fraser, and others, 2012 An abundance of rare
functional variants in 202 drug target genes sequenced in 14,002
people. Science: 1217876.
Nordborg, M., B. Charlesworth, and

D. Charlesworth, 1996 The effect of recombination on back-
ground selection. Genet. Res. 67: 159–174.
Novembre, J. and M. Stephens, 2008 Interpreting principal

component analyses of spatial population genetic variation. Nature
Ohta, T., 1972 Population size and rate of evolution. Journal of
Molecular Evolution 1(4): 305–314.
Ohta, T., 1973 Slightly deleterious mutant substitutions in evolu-

tion. Nature 246(5428): 96.
Ohta, T., 1987 Very slightly deleterious mutations and the molecu-
lar clock. Journal of Molecular Evolution 26(1-2): 1–6.
Ohta, T. and J. H. Gillespie, 1996 Development of neutral

and nearly neutral theories. Theoretical population biology 49(2):
128–142.
Owen, D. and D. Chanter, 1972 Polymorphic mimicry in a

population of the African butterfly, Pseudacraea eurytus (L.) (Lep.
Nymphalidae). Insect Systematics & Evolution 3(4): 258–266.
Oziolor, E. M., N. M. Reid, S. Yair, K. M. Lee, S. G.

VerPloeg, P. C. Bruns, J. R. Shaw, A. Whitehead,
and C. W. Matson, 2019 Adaptive introgression enables
evolutionary rescue from extreme environmental pollution. Sci-
ence 364(6439): 455–457.
Paaby, A. B., A. O. Bergland, E. L. Behrman, and

P. S. Schmidt, 2014 A highly pleiotropic amino acid polymor-
phism in the Drosophila insulin receptor contributes to life-history
adaptation. Evolution 68(12): 3395–3409.
Patterson, N., A. L. Price, and D. Reich, 2006 Population

structure and eigenanalysis. PLoS genetics 2(12): e190.
Pickrell, J. K., T. Berisa, J. Z. Liu, L. Ségurel, J. Y.

Tung, and D. A. Hinds, 2016 Detection and interpretation of
shared genetic influences on 42 human traits. Nature genetics 48(7):
709.
Potti, J. and D. Canal, 2011 Heritability and genetic corre-

lation between the sexes in a songbird sexual ornament. Hered-
ity 106(6): 945.
Pritchard, J. K., M. Stephens, and P. Donnelly, 2000

Inference of population structure using multilocus genotype data.
Genetics 155(2): 945–959.
Provine, W. B., 2001 The origins of theoretical population genet-

ics: with a new afterword. University of Chicago Press.
Przeworski, M., 2002 The signature of positive selection at ran-

domly chosen loci. Genetics 160: 1179–1189.
Ptak, S. E., A. D. Roeder, M. Stephens, Y. Gilad,

S. Pääbo, and M. Przeworski, 2004 Absence of the TAP2
human recombination hotspot in chimpanzees. PLoS biology 2(6):
e155.
Queller, D. C., 1992 Quantitative genetics, inclusive fitness, and

group selection. The American Naturalist 139(3): 540–558.
R, 2018 R: A Language and Environment for Statistical Computing.
Ralph, P. L. and G. Coop, 2015 The role of standing varia-

tion in geographic convergent adaptation. The American Natural-
ist 186(S1): S5–S23.
Rands, C. M., S. Meader, C. P. Ponting, and

G. Lunter, 2014 8.2% of the human genome is constrained:
variation in rates of turnover across functional element classes in the
human lineage. PLoS genetics 10(7): e1004525.
Richards, C. M., 2000 Inbreeding depression and genetic rescue

in a plant metapopulation. The American Naturalist 155(3): 383–
394.
Ritland, K., C. Newton, and H. Marshall, 2001 Inher-

itance and population structure of the white-phased “Kermode”
black bear. Current Biology 11(18): 1468–1472.
Roberts, R. B., J. R. Ser, and T. D. Kocher, 2009 Sex-

ual conflict resolved by invasion of a novel sex determiner in Lake
Malawi cichlid fishes. Science 326(5955): 998–1001.
Robertson, A., 1961 Inbreeding in artificial selection programmes.

Genet. Res. 2: 189–194.
Robinson, J. A., D. Ortega-Del Vecchyo, Z. Fan,

B. Y. Kim, C. D. Marsden, K. E. Lohmueller, R. K.
Wayne, and others, 2016 Genomic flatlining in the endangered
island fox. Current Biology 26(9): 1183–1189.
Robinson, L. M., J. R. Boland, and J. M. Braverman,

2016 Revisiting a Classic Study of the Molecular Clock. Journal of
molecular evolution 82(2-3): 110–116.
Rosenberg, N. A., J. K. Pritchard, J. L. Weber,

H. M. Cann, K. K. Kidd, L. A. Zhivotovsky, and
M. W. Feldman, 2002 Genetic structure of human populations.
Science 298(5602): 2381–2385.
Ruwende, C., S. C. Khoo, R. W. Snow, S. N. Yates,
D. Kwiatkowski, S. Gupta, P. Warn, C. E. Allsopp,
S. C. Gilbert, and N. Peschu, 1995 Natural selection of
hemi- and heterozygotes for G6PD deficiency in Africa by resistance
to severe malaria. Nature 376(6537): 246–249.
Sams, A. J. and A. R. Boyko, 2018a Fine-scale resolution and

analysis of runs of homozygosity in domestic dogs. bioRxiv.
Sams, A. J. and A. R. Boyko, 2018b Fine-Scale Resolution of

Runs of Homozygosity Reveal Patterns of Inbreeding and Substan-
tial Overlap with Recessive Disease Genotypes in Domestic Dogs.
G3: Genes, Genomes, Genetics: g3–200836.
Sankararaman, S., N. Patterson, H. Li, S. Pääbo, and
D. Reich, 2012 The Date of Interbreeding between Neandertals
and Modern Humans. PLOS Genetics 8(10): 1–9.
Santiago, E. and A. Caballero, 1995 Effective size of popu-

lations under selection. Genetics 139: 1013–1030.
Santiago, E. and A. Caballero, 1998 Effective size and

polymorphism of linked neutral loci in populations under directional
selection. Genetics 149: 2105–2117.
Sattath, S., E. Elyashiv, O. Kolodny, Y. Rinott, and

G. Sella, 2011a Pervasive adaptive protein evolution apparent
in diversity patterns around amino acid substitutions in Drosophila
simulans. PLoS genetics 7(2): e1001302.
Sattath, S., E. Elyashiv, O. Kolodny, Y. Rinott, and

G. Sella, 2011b Pervasive adaptive protein evolution apparent
in diversity patterns around amino acid substitutions in Drosophila
simulans. PLoS Genet. 7: e1001302.
Schemske, D. W. and P. Bierzychudek, 2001 Perspective:

evolution of flower color in the desert annual Linanthus parryae:
Wright revisited. Evolution 55(7): 1269–1282.
Seger, J. and H. Brockmann, 1987 Oxford Surveys in Evo-

lutionary Biology, Volume 4, Chapter What is bet-hedging, pp.
182–211. Oxford University Press.
Sella, G., D. A. Petrov, M. Przeworski, and P. An-

dolfatto, 2009 Pervasive natural selection in the Drosophila
genome? PLoS genetics 5(6): e1000495.
Shapiro, J. A., W. Huang, C. Zhang, M. J. Hubisz,
J. Lu, D. A. Turissini, S. Fang, H. Y. Wang, R. R.
Hudson, R. Nielsen, Z. Chen, and C. I. Wu, 2007 Adaptive
genic evolution in the Drosophila genomes. Proc. Natl. Acad.
Sci. U.S.A. 104: 2271–2276.
Smith, J. M., 1971 The origin and maintenance of sex. In Group

selection, pp. 163–175. Routledge.
Smith, J. M., 1982 Evolution and the Theory of Games.
Cambridge University Press.
Smith, J. N. and A. A. Dhondt, 1980 Experimental confirma-

tion of heritable morphological variation in a natural population of
song sparrows. Evolution: 1155–1158.
Smith, J. N. and R. Zach, 1979 Heritability of some morpholog-

ical characters in a song sparrow population. Evolution 33(1Part2):
460–467.
Smith, T. B., 1993 Disruptive selection and the genetic basis

of bill size polymorphism in the African finch Pyrenestes. Na-
ture 363(6430): 618.
Smithson, A. and M. R. Macnair, 1997 Negative frequency-

dependent selection by pollinators on artificial flowers without re-
wards. Evolution 51(3): 715–723.
Stevens, N. M., 1905 Studies in Spermatogenesis: With especial
reference to the “accessory chromosome”, Volume 36. Carnegie
Institution of Washington.
Stumpf, M. P., Z. Laidlaw, and V. A. Jansen, 2002 Her-

pes viruses hedge their bets. Proceedings of the National Academy
of Sciences 99(23): 15234–15237.
Sturtevant, A. H., 1915 The behavior of the chromosomes as

studied through linkage. Zeitschrift für induktive Abstammungs-und
Vererbungslehre 13(1): 234–287.
Sweigart, A. L., Y. Brandvain, and L. Fishman, 2019

Making a Murderer: The Evolutionary Framing of Hybrid Gamete-
Killers. Trends in Genetics.
Tajima, F., 1989 Statistical method for testing the neutral muta-
tion hypothesis by DNA polymorphism. Genetics 123(3): 585–595.
Tao, Y., L. Araripe, S. B. Kingan, Y. Ke, H. Xiao, and D. L. Hartl, 2007 A sex-ratio meiotic drive system in Drosophila simulans. II: an X-linked distorter. PLoS biology 5(11): e293.
Tao, Y., J. P. Masly, L. Araripe, Y. Ke, and D. L. Hartl, 2007 A sex-ratio meiotic drive system in Drosophila simulans. I: an autosomal suppressor. PLoS biology 5(11): e292.
Tishkoff, S. A., R. Varkonyi, N. Cahinhinan, S. Abbes, G. Argyropoulos, G. Destro-Bisol, A. Drousiotou, B. Dangerfield, G. Lefranc, J. Loiselet, A. Piro, M. Stoneking, A. Tagarelli, G. Tagarelli, E. H. Touma, S. M. Williams, and A. G. Clark, 2001 Haplotype Diversity and Linkage Disequilibrium at Human G6PD: Recent Origin of Alleles That Confer Malarial Resistance. Science 293(5529): 455–462.
Toews, D. P., S. A. Taylor, R. Vallender, A. Brelsford, B. G. Butcher, P. W. Messer, and I. J. Lovette, 2016 Plumage Genes and Little Else Distinguish the Genomes of Hybridizing Warblers. Current Biology 26(17): 2313–2318.
Turelli, M., D. W. Schemske, and P. Bierzychudek, 2001 Stable two-allele polymorphisms maintained by fluctuating fitnesses and seed banks: protecting the blues in Linanthus parryae. Evolution 55(7): 1283–1298.
Ulizzi, L. and L. Terrenato, 1992 Natural selection associated with birth weight. VI. Towards the end of the stabilizing component. Annals of human genetics 56(2): 113–118.
Uyeda, J. C., T. F. Hansen, S. J. Arnold, and J. Pienaar, 2011 The million-year wait for macroevolutionary bursts. Proceedings of the National Academy of Sciences 108(38): 15908–15913.
van’t Hof, A. E., N. Edmonds, M. Dalíková, F. Marec, and I. J. Saccheri, 2011 Industrial melanism in British peppered moths has a singular and recent mutational origin. Science 332(6032): 958–960.
Vermeij, G. J., 1982 Phenotypic evolution in a poorly dispersing snail after arrival of a predator. Nature 299(5881): 349.
Voight, B. F., A. M. Adams, L. A. Frisse, Y. Qian, R. R. Hudson, and A. Di Rienzo, 2005 Interrogating multiple aspects of variation in a full resequencing data set to infer human population size changes. Proceedings of the National Academy of Sciences 102(51): 18508–18513.
vonHoldt, B. M., J. P. Pollinger, D. A. Earl, J. C. Knowles, A. R. Boyko, H. Parker, E. Geffen, M. Pilot, W. Jedrzejewski, B. Jedrzejewska, V. Sidorovich, C. Greco, E. Randi, M. Musiani, R. Kays, C. D. Bustamante, E. A. Ostrander, J. Novembre, and R. K. Wayne, 2011 A genome-wide perspective on the evolutionary history of enigmatic wolf-like canids. Genome Research.
Wang, J., J. Ding, B. Tan, K. M. Robinson, I. H. Michelson, A. Johansson, B. Nystedt, D. G. Scofield, O. Nilsson, S. Jansson, and others, 2018 A major locus controls local adaptation and adaptive life history variation in a perennial plant. Genome biology 19(1): 72.
Watterson, G., 1975 On the number of segregating sites in genetical models without recombination. Theoretical population biology 7(2): 256–276.
Weis, A. E. and W. L. Gorman, 1990 Measuring selection on reaction norms: an exploration of the Eurosta-Solidago system. Evolution 44(4): 820–831.
Welch, A. M., M. J. Smith, and H. C. Gerhardt, 2014 A multivariate analysis of genetic variation in the advertisement call of the gray treefrog, Hyla versicolor. Evolution 68(6): 1629–1639.
Wheeler, W. M., 1907 Pink Insect Mutants. The American Naturalist 41(492): 773–780.
Widemo, F., 1998 Alternative reproductive strategies in the ruff, Philomachus pugnax: a mixed ESS? Animal Behaviour 56(2): 329–336.
Wiehe, T. and W. Stephan, 1993a Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Molecular Biology and Evolution 10(4): 842–854.
Wiehe, T. H. and W. Stephan, 1993b Analysis of a genetic hitchhiking model, and its application to DNA polymorphism data from Drosophila melanogaster. Mol. Biol. Evol. 10: 842–854.
Wilkinson, G. S., 1993 Artificial sexual selection alters allometry in the stalk-eyed fly Cyrtodiopsis dalmanni (Diptera: Diopsidae). Genetics Research 62(3): 213–222.
Williams, G. C., 1966 Adaptation and Natural Selection. Princeton.
Williams, K.-A. and P. S. Pennings, 2019 Drug resistance evolution in HIV in the late 1990s: hard sweeps, soft sweeps, clonal interference and the accumulation of drug resistance mutations. bioRxiv.
Wisely, S. M., S. W. Buskirk, M. A. Fleming, D. B. McDonald, and E. A. Ostrander, 2002 Genetic Diversity and Fitness in Black-Footed Ferrets Before and During a Bottleneck. Journal of Heredity 93(4): 231–237.
Wright, K. M., U. Hellsten, C. Xu, A. L. Jeong, A. Sreedasyam, J. A. Chapman, J. Schmutz, G. Coop, D. S. Rokhsar, and J. H. Willis, 2015 Adaptation to heavy-metal contaminated environments proceeds via selection on pre-existing genetic variation. bioRxiv: 029900.
Wright, S., 1932 The roles of mutation, inbreeding, crossbreeding, and selection in evolution, Volume 1. na.
Wright, S., 1943 Isolation by Distance. Genetics 28(2): 114–138.
Wright, S., 1949 The Genetical Structure of Populations. Annals of Eugenics 15(1): 323–354.
Wright, S. and T. Dobzhansky, 1946 Genetics of natural populations. XII. Experimental reproduction of some of the changes caused by natural selection in certain populations of Drosophila pseudoobscura. Genetics 31(2): 125.
Wright, S. I., I. V. Bi, S. G. Schroeder, M. Yamasaki, J. F. Doebley, M. D. McMullen, and B. S. Gaut, 2005 The Effects of Artificial Selection on the Maize Genome. Science 308(5726): 1310–1314.
Yang, Z., 1998 Likelihood ratio tests for detecting positive selection and application to primate lysozyme evolution. Molecular Biology and Evolution 15(5): 568–573.
Zuckerkandl, E. and L. Pauling, 1965 Evolutionary divergence and convergence in proteins. In Evolving genes and proteins, pp. 97–166. Elsevier.