Release Popgen Notes
POPULATION AND
QUANTITATIVE
GENETICS
Author: Graham Coop
Author address: Department of Evolution and Ecology & Center for Population Biology,
University of California, Davis.
To whom correspondence should be addressed: gmcoop@ucdavis.edu
This work is licensed under a Creative Commons Attribution 3.0 Unported License.
http://creativecommons.org/licenses/by/3.0/
i.e. you are free to reuse and remix this work, but please include an attribution to the original.
The LaTeX code and R code for this book are kept at https://github.com/cooplab/popgen-notes/ and are also
under a Creative Commons Attribution 3.0 Unported License.
This book was developed from my set of notes for the Population Biology graduate group core class (PBG200A) and
Undergraduate Population and Quantitative Genetics class (EVE102) at UC Davis. Thanks to the many students
who’ve read these notes and suggested improvements. Thanks to Simon Aeschbacher, Vince Buffalo, and Erin Calfee
who read and extensively edited earlier drafts of these notes. To illustrate these notes I’ve used old scientific and
natural history illustrations, in part because they are out of copyright but mainly because they bring me joy. Many
of the old images come from the Biodiversity Heritage Library, a consortium of natural history institutions that are
digitizing their collections and making them freely available online. If you enjoy the images, consider donating to the BHL.
Many of the data and simulation graphics in the book were prepared in R (2018); the code for each is linked from
the caption of each figure. In many cases data were extracted from old figures using the WebPlotDigitizer tool; as
such, I advise re-extracting the data if you wish to use it for research purposes.
Contents
1 Introduction
Bibliography
1
Introduction
the course we will see that these simple models often yield accurate
predictions, such that much of our understanding of the process of
evolution is built on these models. We will also see how these models
are incredibly useful for understanding real patterns we see in the evo-
lution of phenotypes and genomes, such that much of our analysis of
evolution, in a range of areas from human medical genetics to conser-
vation, is based on these models. Therefore, population and quantita-
tive genetics are key to understanding various applied questions, from
how medical genetics identifies the genes involved in disease to how we
preserve species from extinction.
Population genetics emerged from early efforts to reconcile Mendelian genetics with Darwinian thought. Part of the power of population genetics comes from the fact that the basic rules of transmission genetics are simple and nearly universal. One of the truly remarkable things about population genetics is that many of the important ideas and mathematical models emerged before the 1940s, long before the mechanistic basis of inheritance (DNA) was discovered, and yet the usefulness of these models has not diminished. This is a testament to the fact that the models are established on a very solid foundation, building from the basic rules of genetic transmission combined with simple mathematical and statistical models.
Margin note: See Provine (2001) for a history of early population genetics. Provine, W. B., 2001 The origins of theoretical population genetics: with a new afterword. University of Chicago Press.
Much of this early work traces to the ideas of R.A. Fisher, Sewall Wright, and J.B.S. Haldane, who, along with many others, described the early principles and mathematical models underlying our understanding of the evolution of populations. Building on this conceptual fusion of genetics and evolution, there followed a flourishing of evolutionary thought, the modern evolutionary synthesis, combining these ideas with those from the study of speciation, biodiversity, and paleontology. In total this work showed that both short-term evolutionary change and the long-term evolution of biodiversity could be well understood through the gradual accumulation of evolutionary change within and among populations. This evolutionary synthesis continues to this day, combining new insights from genomics, phylogenetics, ecology, and developmental biology.
Margin note: “Dobzhansky (1951) once defined evolution as ‘a change in the genetic composition of the populations’ an epigram that should not be mistaken for the claim that everything worth saying about evolution is contained in statements about genes” – Lewontin (2001)
Population and quantitative genetics are a necessary but not sufficient description of evolution; it is only by combining the insights of many fields that a rich and comprehensive picture of evolution emerges. We certainly do not need to know the genes underlying the displays of the birds of paradise to study how the divergence of these displays, due to sexual selection, may drive speciation. Indeed, as we’ll see in our discussion of quantitative genetics, we can predict how populations respond to selection, including sexual selection and assortative mating, without any knowledge of the loci involved. Nor do we need to know the precise selection pressures and the ordering of genetic
changes to study the emergence of the tetrapod body plan. We do
not necessarily need to know all the genetic details to appreciate the
beauty of these, and many other, evolutionary case studies. However,
every student of biology gains from understanding the basics of pop-
ulation and quantitative genetics, allowing them to base their studies
on a solid bedrock of understanding of the processes that underpin all
evolutionary change.
2
Allele and Genotype Frequencies
781 G T T T T T T T T T T T T - - - - - - - - - - - - - - - - - - NS
789 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
808 A - - - - - - - - - - - - - - - - - - G G G G G G G G G G G G NS
816 G T T T T - - - - - - - T T T T T T T - - - - - - - - - - - - S
834 T - - - - - - - - - - - - C C - - - C - - - - - - - - - - - - S
859 C - - - - - - - - - - - - - - - - - - G G G G G G G G G G G G NS
867 C - - - - - - - - - - - - - - - - - - G G G G G A G G G G G G S
870 C T T T T T T T T T T T T - - - - - - - - - - - - - - - - - - S
950 G - - - - - - - - - - - - - A - - - - - - - - - - - - - - - - S
974 G - - - - - - - - - - - - T - T T T T - - - - - - - - - - - - S
983 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
1019 C - - - - - - - - - - - - - - - - - - - - - - A - - - - - - - S
1031 C - - - - - - - - - - - - - - - - - - - - - - - - - - A - - - S
1034 T - - - - - - - - - - - - - - - - - - C C C C C - - C - C C S
1043 C - - - - - - - - - - - - - - - - - - - - - - A - - - - - - - S
1068 C T T - - - - - - - - - - - - - - - - - - - - - - - - - - - - S
1089 C - - - - - - - - - - - - A A A A A A - - - - - - - - - - - - NS
1101 G - - - - - - - - - - - - - - - - - - A A A A A A A A A A A A NS
1127 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
1131 C - - - - - - - - - - - - - - - - - - - - - - T - - - - - - - S
1160 T - - - - - - - - - - - - - - - - - - C C C C C C C C C C C C S
\[ p = \frac{2N_{11} + N_{12}}{2N} = f_{11} + \frac{1}{2} f_{12}. \tag{2.1} \]
Note that this follows directly from how we count alleles given in-
dividuals’ genotypes, and holds independently of Hardy–Weinberg
proportions and equilibrium (discussed below). The frequency of the
alternate allele (A2 ) is then just q = 1 − p.
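As a small illustration of eqn (2.1), the sketch below counts alleles from genotype counts. The function name and the genotype counts are hypothetical, chosen only for this example.

```python
def allele_frequency(n11, n12, n22):
    """Frequency p of allele A1 from counts of A1A1, A1A2 and A2A2 individuals."""
    n = n11 + n12 + n22                  # number of diploid individuals, N
    return (2 * n11 + n12) / (2 * n)     # eqn (2.1): each A1A1 carries two A1 alleles

# Hypothetical genotype counts (not data from the text).
p = allele_frequency(n11=30, n12=50, n22=20)
q = 1 - p                                # frequency of the alternate allele A2
print(p, q)
```

The same value is obtained from genotype frequencies as f11 + f12/2, without assuming Hardy–Weinberg proportions.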
\[ \pi = \frac{1}{15}\Big( (2+1+1+1+0) + (3+3+3+2) + (0+0+1) + (0+1) + (1) \Big) = 1.26 \tag{2.2} \]
where the first bracketed term gives the pairwise differences between
a and b-f, the second bracketed term the differences between b and c-f
and so on.
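The calculation in eqn (2.2) is an average over all pairs of sequences. A minimal sketch follows; the function name and the toy sequences are hypothetical, and the list of pairwise counts is copied from the bracketed terms of eqn (2.2).

```python
from itertools import combinations

def pairwise_diversity(seqs):
    """Average number of differences between all pairs of aligned sequences."""
    pairs = list(combinations(seqs, 2))
    total = sum(sum(a != b for a, b in zip(s1, s2)) for s1, s2 in pairs)
    return total / len(pairs)

# Toy aligned sequences, hypothetical and only for illustration.
toy = ["AGTC", "AGTT", "ACTT"]
print(pairwise_diversity(toy))

# The 15 pairwise counts for sequences a-f, as listed in eqn (2.2).
counts = [2, 1, 1, 1, 0, 3, 3, 3, 2, 0, 0, 1, 0, 1, 1]
print(sum(counts) / len(counts))  # 19/15, which the text reports as 1.26
```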
Our π measure will depend on the length of sequence it is calculated for. Therefore, π is usually normalized by the length of sequence, to be a per site (or per base) measure. For example, our ADH sequence covers 397bp of DNA and so π = 1.26/397 = 0.0032 per site in D. simulans for this region. Note that we could also calculate π per synonymous site (or non-synonymous). For synonymous site π, we would count up the number of synonymous differences between our pairs of sequences, and then divide by the total number of sites where a synonymous change could have occurred.1
1 Technically we would need to divide by the total number of possible point mutations that would result in a synonymous change; this is because some mutational changes at a particular nucleotide will result in a non-synonymous or synonymous change depending on the base-pair change.
Number of segregating sites. Another measure of genetic variability is the total number of sites that are polymorphic (segregating) in our sample. One issue is that the number of segregating sites will grow as we sequence more individuals (unlike π). Later in the course, we’ll talk about how to standardize the number of segregating sites for the number of individuals sequenced (see eqn (4.39)).
AA AG GG
42 24 21
Table 2.2: Data for 155 Europeans at the D16S539 microsatellite from CODIS from Algee-Hewitt et al. (2016). The top row gives the number of tetranucleotide repeats for each allele, the bottom row gives the sample counts.
allele name     80   90  100  110  120  121  130  140  150
allele count     3   34   13  102   97    1   44   13    3
Table 2.3: Same as 2.2 but for the TH01 microsatellite.
allele name     60   70   80   90   93  100  110
allele counts   84   42   37   67   77    1    2
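\begin{align}
F_{ij} &= P(\text{I \& J IBD}) \tag{2.3} \\
&= P(\text{I \& J IBD} \mid \text{i \& j 0 IBD})\, P(\text{i \& j 0 IBD}) \notag \\
&\quad + P(\text{I \& J IBD} \mid \text{i \& j 1 IBD})\, P(\text{i \& j 1 IBD}) \notag \\
&\quad + P(\text{I \& J IBD} \mid \text{i \& j 2 IBD})\, P(\text{i \& j 2 IBD}) \tag{2.4} \\
&= 0 \times r_0 + \frac{1}{4} r_1 + \frac{1}{2} r_2. \tag{2.5}
\end{align}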
In the above step, eqn (2.4), we’re summing the conditional probability of alleles I & J being IBD over whether our individuals i & j share 0, 1, or 2 alleles IBD, an example of using the Law of Total Probability (see Appendix eqn (A.12)). We’ve then, in eqn (2.5), used the fact that we can calculate our conditional probabilities of I & J being IBD using the rules of Mendelian transmission. Consider the probability P (I&J IBD| i&j 1 IBD), i.e. that our pair of alleles (I & J) drawn from individuals i and j are IBD given that i and j share one allele IBD. This is 1/4, as we need to draw the allele that is IBD from both i and j, i.e. drawing both black alleles in the middle panel of Figure 2.8, which happens with probability 1/2 × 1/2. The coefficient of kinship will appear multiple times, in both our discussion of inbreeding and in the context of phenotypic resemblance between relatives.
Figure 2.8: A pair of diploid individuals (i and j) sharing 0, 1, or 2 alleles IBD where lines show the sharing of alleles by descent (e.g. from a shared ancestor).
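Eqn (2.5) is a one-line weighted sum, sketched here. The function name is hypothetical; the parent–child values (r0, r1, r2) = (0, 1, 0) come from the text's table, and the identical-twin values (0, 0, 1) are added as an assumption for illustration.

```python
def kinship(r0, r1, r2):
    """Coefficient of kinship F_ij from the probabilities of sharing 0, 1, 2 alleles IBD."""
    if abs(r0 + r1 + r2 - 1.0) > 1e-9:
        raise ValueError("r0, r1 and r2 must sum to 1")
    # eqn (2.5): a random allele from each individual is IBD with
    # probability 0, 1/4 or 1/2 given 0, 1 or 2 alleles shared IBD.
    return 0.0 * r0 + 0.25 * r1 + 0.5 * r2

print(kinship(0, 1, 0))   # parent-child
print(kinship(0, 0, 1))   # identical twins (assumed example)
```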
Question 6. What are r0 , r1 , and r2 for 1/2 sibs? (1/2 sibs share
one parent but not the other).
Relationship (i,j)∗    P (i&j 0 IBD)    P (i&j 1 IBD)    P (i&j 2 IBD)    P (I&J IBD)
                       r0               r1               r2               Fij
parent–child           0                1                0                1/4
If our pair of relatives share 0 alleles IBD, then the probability that they are both homozygous is $P(A_1A_1 \mid 0 \text{ alleles IBD}) = p^2 \times p^2$, as all four alleles represent independent draws from the population. If they share 1 allele IBD, then the shared allele is of type $A_1$ with probability $p$, and then the other non-IBD allele, in both relatives, also needs to be $A_1$, which happens with probability $p^2$, so $P(A_1A_1 \mid 1 \text{ alleles IBD}) = p \times p^2$. Finally, our pair of relatives can share two alleles IBD, in which case $P(A_1A_1 \mid 2 \text{ alleles IBD}) = p^2$, because if one of our individuals is homozygous for the $A_1$ allele, both individuals will be. Putting this all together our equation (2.7) becomes
\[ P(A_1A_1) = p^4 r_0 + p^3 r_1 + p^2 r_2 \tag{2.8} \]
Note that for specific cases we could also calculate this by summing
over all the possible genotypes their shared ancestor(s) had; however,
that would be much more involved and not as general as the form we
have derived here.
We can write out terms like eq (2.8) for all of the possible configu-
rations of genotype sharing/non-sharing between a pair of individuals.
Based on this we can write down the expected number of polymorphic
sites where our individuals are observed to share 0, 1, or 2 alleles.
[Figure, caption partially recovered: "... plotted by their estimated IBD (r1 and r2) from their genetic data. The points are coloured by their known pedigree relationships. Note that most pairs have low kinship, and no ..." Axes: Estimated IBD r1, Estimated IBD r2. Legend: Parent-Offspring, Full Sib, Grandparent, 1/2 siblings, Aunt/Uncle, GreatAunt/Uncle.]
[Figure, caption partially recovered: "... of your grandmother’s 22 autosomes that you inherited are coloured red, those that your cousins inherited are coloured blue. In the third panel we show the overlapping genomic regions in purple, these regions will be IBD in you and your cousin. If you are full first cousins, you will also have shared genomic regions from your shared grandfather, not shown here. Details about how we made these simulations here." Panels: Your genome in your Grandmother; Your 1st cousin's genome in your Grandmother; Both your genomes in your Grandmother. Chromosomes are numbered along one axis.]
Note how if you compare Figure 2.12 and Figure 2.11, individuals inherit less IBD from a shared great, great grandmother than from a shared grandmother, as they inherit from more total ancestors further back. Also notice how the sharing occurs in shorter genomic blocks, as it has passed through more generations of recombination during meiosis. These blocks are still detectable, and so third cousins can be detected using high-density genotyping chips, allowing more distant relatives to be identified than single marker methods alone.4 More distant relations than third cousins, e.g. fourth cousins, start to have a significant probability of sharing none of their genome IBD. But you have many fourth cousins, so you will share some of your genome IBD with some of them; however, it gets increasingly hard to identify the degree of relatedness from genetic data the deeper in the family tree this sharing goes.
4 Indeed the suspect in the case of the Golden State Killer was identified through identifying third cousins that genetically matched a DNA sample from an old crime scene (see here for more details).
2.2.1 Inbreeding
[Figure: pedigree with labels M. Granddad, M. Grandmother, Cousin, Cousin, and Child of 1st cousins, with the inbreeding loop marked.]
more closely related to each other than two random individuals drawn from some reference population.
When two related individuals produce an offspring, that individual can receive two alleles that are identical by descent, i.e. they can
(1 − F )2pq, (2.9)
where we have dropped the indices i and j for simplicity. The offspring can be homozygous for the A1 allele in two different ways. They can have two alleles that are not IBD but happen to be of the allelic type A1, or their two alleles can be IBD, such that they inherited allele A1 by two different routes from the same ancestor. Thus, the probability that an offspring is homozygous for A1 is
\[ (1 - F) p^2 + F p. \tag{2.10} \]
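The genotype probabilities under inbreeding, eqns (2.9) and (2.10), can be sketched as below. The function name and numeric values are hypothetical, and the A2A2 term is written by symmetry with eqn (2.10), which is an assumption since it is not shown in this passage.

```python
def inbred_genotype_freqs(p, F):
    """Expected genotype frequencies for an individual with inbreeding coefficient F."""
    q = 1 - p
    f11 = (1 - F) * p**2 + F * p     # eqn (2.10): non-IBD and both A1, or IBD and A1
    f12 = (1 - F) * 2 * p * q        # eqn (2.9): heterozygotes need non-IBD alleles
    f22 = (1 - F) * q**2 + F * q     # assumed by symmetry with eqn (2.10)
    return f11, f12, f22

# Hypothetical values for illustration.
print(inbred_genotype_freqs(p=0.3, F=0.25))
```

Setting F = 0 recovers Hardy–Weinberg proportions, and F = 1 leaves only homozygotes at frequencies p and q.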
\[ \hat{F} = \frac{H_E - H_O}{H_E} = 1 - \frac{H_O}{H_E}. \tag{2.13} \]
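A minimal sketch of the estimator in eqn (2.13) for a single biallelic locus. It assumes that H_E is computed as 2pq from the sample allele frequency and that H_O is the observed fraction of heterozygotes; the function name and genotype counts are hypothetical.

```python
def inbreeding_estimate(n11, n12, n22):
    """Estimate F as 1 - H_O / H_E from genotype counts at one biallelic locus."""
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)    # sample allele frequency of A1
    h_exp = 2 * p * (1 - p)          # expected heterozygosity under random mating (assumed 2pq)
    h_obs = n12 / n                  # observed fraction of heterozygotes
    return 1 - h_obs / h_exp         # eqn (2.13)

# Hypothetical counts showing a deficit of heterozygotes.
print(inbreeding_estimate(n11=50, n12=20, n22=30))
```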
AA AB BB
0.6 0.2 0.2
of the child sets the proportion of their genome that will be in these autozygous segments. For example, a child of full first cousins is expected to have 1/16 of their genome in these segments. The more distant the loop in the pedigree, the more meioses that chromosomes have been through and the shorter individual blocks will be. A child of first cousins will have longer blocks than a child of second cousins, for example.
Individuals with multiple inbreeding loops in their family tree can have a high inbreeding coefficient due to the combined effect of many small blocks of autozygosity. For example, Charles II had an inbreeding coefficient that is equivalent to that of the child of full-sibs, with a quarter of his genome expected to be homozygous by descent, but this would be made up of many shorter blocks.
We can hope to detect these blocks by looking for unusually long
genomic runs of homozygosity (ROH) sites in an individual’s genome.
One way to estimate an individual’s inbreeding coefficient is then to
total up the proportion of an individual’s genome that falls in such
In Figure 2.20 we can see that English bulldogs have more short ROH than Doberman Pinschers, but that Doberman Pinschers have more of their genome in very large ROH (> 16 Mb). This suggests that English bulldogs have had a long history of inbreeding but that Doberman Pinschers have a lot of recent inbreeding in their history.
\[ F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{2 p_S q_S}{2 p_T q_T}. \tag{3.3} \]
\[ F_{ST} = 1 - \frac{\bar{H}_S}{H_T}, \tag{3.5} \]
where $\bar{H}_S = \frac{1}{K} \sum_{i=1}^{K} H_S^{(i)}$, and $H_S^{(i)} = 2 p_i q_i$ is the expected heterozygosity in subpopulation $i$. It follows that the average heterozygosity of the sub-populations $\bar{H}_S \le H_T$, and so $F_{ST} \ge 0$ and $F_{IS} \le F_{IT}$. This observation that the average heterozygosity of the sub-populations must be less than or equal to that of the total population is called the Wahlund effect. Furthermore, if we have multiple sites, we can replace $H_I$, $H_S$, and $H_T$ with their averages across loci (as above).2
2 Averaging heterozygosity across loci first, then calculating FST, rather than calculating FST for each locus individually and then taking the average, has better statistical properties as statistical noise in the denominator is averaged out.
As an example of comparing a genome-wide estimate of FST to that at individual loci we can look at some data from blue- and golden-winged warblers (Vermivora cyanoptera and V. chrysoptera 1-2 & 5-6 in Figure 3.2).
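As a rough illustration of the FST definition in eqn (3.5), the sketch below computes it at one biallelic locus from subpopulation allele frequencies. It assumes equally sized subpopulations, so that the total frequency is the simple mean of the subpopulation frequencies; the function name and frequencies are hypothetical and are not the warbler data.

```python
def fst(sub_freqs):
    """F_ST = 1 - mean(H_S) / H_T for one biallelic locus (equal-sized subpopulations assumed)."""
    k = len(sub_freqs)
    h_s_bar = sum(2 * p * (1 - p) for p in sub_freqs) / k   # average within-subpopulation heterozygosity
    p_total = sum(sub_freqs) / k                            # assumes equal subpopulation sizes
    h_t = 2 * p_total * (1 - p_total)                       # heterozygosity of the total population
    if h_t == 0:
        raise ValueError("locus is not polymorphic in the total population")
    return 1 - h_s_bar / h_t

print(fst([0.2, 0.8]))   # strongly differentiated subpopulations
print(fst([0.5, 0.5]))   # identical frequencies, so no Wahlund effect
```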
These two species are spread across eastern North America, with the golden-winged warbler having a smaller, more northerly range. They’re quite different in terms of plumage, but have long been known to have similar songs and ecologies. The two species hybridize readily in the wild; in fact two other previously-recognized species, Brewster’s and Lawrence’s warbler (4 & 3 in Figure 3.2), are actually found to just be hybrids between these two species. The golden-winged warbler is listed as ‘threatened’ under the Canadian endangered species act as its habitat is under pressure from human activity and due to increasing hybridization with the blue-winged warbler, which is moving north into its range. Toews et al. (2016) investigated the population genomics of these warblers, sequencing ten golden- and ten blue-winged warblers. They found very low divergence among these species, with a genome-wide FST = 0.0045. In Figure 3.3, per SNP FST is averaged in 2000bp windows moving along the genome. The average is very low, but some regions of very high FST stand out. Nearly all of these regions correspond to large allele frequency differences at loci in, or close to, genes known to be involved in plumage colouration differences in other birds. To illustrate these frequency differences Toews et al. (2016) genotyped a SNP in each of these high-FST regions. Here are their genotyping counts from the SNP, segregating for an allele 1 and 2, in the Wnt region, a key regulatory gene involved in feather development:
Figure 3.2: Blue-, golden-winged, and Lawrence’s warblers (Vermivora). The warblers of North America. Chapman, F.M. 1907. Image from the Biodiversity Heritage Library. Contributed by American Museum of Natural History Library. Not in copyright.
\[ P(\text{ind.} \mid \text{pop } k) = \prod_{l=1}^{S} P(g_l \mid \text{pop } k) \tag{3.10} \]
as the posterior probability that our new individual comes from each
of our 1, · · · , K populations.
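A minimal sketch of population assignment built on eqn (3.10). It assumes Hardy–Weinberg genotype probabilities within each population, independent loci, and equal prior probabilities unless others are supplied; these assumptions, the function names, and the allele frequencies are all hypothetical illustrations.

```python
from math import prod

def genotype_prob(g, p):
    """HW probability of carrying g (0, 1 or 2) copies of an allele at frequency p."""
    return [(1 - p) ** 2, 2 * p * (1 - p), p ** 2][g]

def assignment_posterior(genotypes, pop_freqs, priors=None):
    """Posterior probability that an individual comes from each of K populations."""
    k = len(pop_freqs)
    priors = priors or [1 / k] * k            # equal priors assumed by default
    # eqn (3.10): multiply genotype probabilities across the S loci.
    likes = [prod(genotype_prob(g, p) for g, p in zip(genotypes, freqs))
             for freqs in pop_freqs]
    weighted = [like * prior for like, prior in zip(likes, priors)]
    total = sum(weighted)
    return [w / total for w in weighted]

# Hypothetical allele frequencies at two SNPs in two populations.
post = assignment_posterior(genotypes=[2, 0], pop_freqs=[[0.9, 0.1], [0.1, 0.9]])
print(post)
```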
More sophisticated versions of this are now used to allow for hybrids, e.g., we can have a proportion qk of our individual’s genome come from population k and estimate the set of qk’s.
Question 2.
Returning to our chimp example, imagine that we have genotyped
a set of individuals from the Western and Eastern populations at two
SNPs (we’ll ignore the central population to keep things simpler). The
frequency of the capital allele at two SNPs (A/a and B/b) is given by
Note that this is the sample covariance, and is very similar to those
we encountered in discussing F -statistics as correlations (equation
(3.6)), except now we are asking about the covariance between two
individuals above that expected if they were both drawn from the
total sample at random (rather than the covariance of alleles within a
are known as coupling gametes, and those with different case alleles
are known as repulsion gametes (e.g. a and B, or A and b). Then,
\[ p_{AB} = p_A p_B + D. \tag{3.17} \]
genotypes that could generate a pAB haplotype.
We can then write the change in the frequency of the pAB haplotype as
\[ \Delta p_{AB} = p'_{AB} - p_{AB} = -r p_{AB} + r p_A p_B = -rD \tag{3.20} \]
\[ \begin{aligned} &= p_{AB} + \Delta p_{AB} - p_A p_B \\ &= (1 - r) D \end{aligned} \tag{3.21} \]
where we can cancel out ∆pA and ∆pB above because recombination
\[ D_t = (1 - r)^t D_0 \tag{3.22} \]
Recombination is acting to decrease LD, and it does so geometrically
[Figure: two panels showing D, one with curves for r = 0.01, r = 0.1, and r = 0.5, the other with curves for t = 5, t = 10, and t = 100 plotted against the recombination fraction (r).]
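The geometric decay of eqn (3.22) can be checked numerically with a short sketch. The function name and the starting value D0 = 0.25 are hypothetical choices; the r values are those shown in the figure legend.

```python
def ld_after(d0, r, t):
    """Linkage disequilibrium after t generations of recombination, eqn (3.22)."""
    return (1 - r) ** t * d0

d0 = 0.25                               # hypothetical starting LD
for r in (0.01, 0.1, 0.5):              # recombination fractions from the figure legend
    print(r, [round(ld_after(d0, r, t), 4) for t in (5, 10, 100)])

# Iterating the one-generation update D' = (1 - r) D, eqn (3.21), gives the same answer.
d = d0
for _ in range(10):
    d = (1 - 0.1) * d
print(abs(d - ld_after(d0, 0.1, 10)) < 1e-12)
```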
tween species, because they are quickly weeded out of the population
by selection.
Neutral theory can sound strange given that our first brush with evolution often focuses on adaptation and phenotypic evolution. However, proponents of this world-view didn’t deny the existence of advantageous mutations; they simply thought that beneficial mutations are rare enough that their contribution to the bulk of polymorphism or divergence can be largely ignored. They also often thought that much of phenotypic evolution may well be adaptive, but again the loci responsible for these phenotypes are a small fraction of all the molecular changes that occur. The neutral theory of molecular evolution was originally proposed to explain protein polymorphism. However, we can apply it more broadly to think about neutral evolution genome-wide. With that in mind, what types of molecular changes could be neutral? Perhaps:
1. Changes in non-coding DNA that don’t disrupt regulatory se-
quences. For example, in the human genome only about 2% of the
genome codes for proteins. The rest is mostly made up of old trans-
posable element and retrovirus insertions, repeats, pseudo-genes,
and general genomic clutter. Current estimates suggest that, even
counting conserved, functional, non-coding regions, less than 10%
of our genome is subject to evolutionary constraint (Rands et al.,
2014).
Genetic drift will, in the absence of new mutations, slowly purge our
population of neutral genetic diversity, as alleles slowly drift to high or
low frequencies and are lost or fixed over time.
Imagine a randomly mating population of a constant size of N diploid
individuals, and that we are examining a locus segregating for two
alleles that are neutral with respect to each other. This population is
randomly mating with respect to the alleles at this locus. See Figures
4.1 and 4.2 to see how genetic drift proceeds, by tracking alleles within
a small population.
In generation t our current level of heterozygosity is Ht , i.e. the
probability that two randomly sampled alleles in generation t are
non-identical is Ht . Assuming that the mutation rate is zero (or van-
ishingly small), what is our level of heterozygosity in generation t + 1?
\[ H_t = H_0 e^{-t/(2N)} \tag{4.3} \]
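To get a feel for eqn (4.3), the sketch below compares it with a discrete-generation decay. It assumes that eqn (4.2), not shown in this passage, has the standard form H_t = (1 − 1/(2N))^t H_0. The function names are hypothetical; N = 50 and a starting allele frequency of 30% follow the figure caption.

```python
from math import exp

def het_discrete(h0, n, t):
    """Heterozygosity after t generations, assuming H_t = (1 - 1/(2N))^t H_0."""
    return h0 * (1 - 1 / (2 * n)) ** t

def het_exponential(h0, n, t):
    """Exponential approximation, eqn (4.3)."""
    return h0 * exp(-t / (2 * n))

n = 50                      # diploid population size, as in the figure caption
h0 = 2 * 0.3 * 0.7          # starting heterozygosity for an allele at 30% frequency
for t in (0, 50, 100, 150):
    print(t, round(het_discrete(h0, n, t), 4), round(het_exponential(h0, n, t), 4))
```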
[Figure, caption partially recovered: "... genetic drift in a diploid population of 50 individuals, in the absence of new mutations. We start 40 independent, [...] allele at 30% frequency. The left panel shows the allele frequency over time and the right panel shows the heterozygosity over time, with the mean decay matching eqn. (4.2). Code here." Axes: Frequency, p; Heterozygosity. Legend: 1 sim., Mean sim., Expectation.]
[Figure, caption not recovered: Heterozygosity (HE) for groups labelled (N>10k), (N=62), (N=40), and (N=7).]
The neutral mutation rate. We’ll first want to consider the rate at
which neutral mutations arise in the population. Thinking back to our
discussion of the neutral theory of molecular evolution, let’s suppose
that there are only two classes of mutation that can arise in our ge-
nomic region of interest: neutral mutations and highly deleterious mu-
tations. The total mutation rate at our locus is µ per generation, i.e.
per transmission from parent to child. A fraction C of our mutations
are new alleles that are highly deleterious and so quickly removed
from the population. We’ll call this C parameter the constraint, and
it will differ according to the genomic region we consider. The remain-
ing fraction (1 − C) are our neutral mutations, such that our neutral
mutation rate is (1 − C)µ. This is the per generation rate.
Question 3. It’s worth taking a minute to get familiar with both
how rare, and how common, mutation is. The per base pair mutation
Note the power of 4 is because our two alleles have to have failed to
mutate through 2 meioses each.
More generally, the probability that our alleles coalesce in gener-
ation t + 1 (counting backwards in time) and are identical due to no
mutation to either allele in the subsequent generations is
\[ P(\text{coal. in } t+1 \text{ \& no mutations}) = \frac{1}{2N} \left(1 - \frac{1}{2N}\right)^{t} (1 - \mu)^{2(t+1)} \tag{4.5} \]
To make this slightly easier on ourselves let’s further assume that
t ≈ t + 1 and so rewrite this as:
\[ P(\text{coal. in } t+1 \text{ \& no mutations}) \approx \frac{1}{2N} \left(1 - \frac{1}{2N}\right)^{t} (1 - \mu)^{2t} \tag{4.6} \]
This gives us the approximate probability that two alleles will
coalesce in the (t + 1)th generation. In general, we may not know
when two alleles may coalesce: they could coalesce in generation
t = 1, t = 2, . . ., and so on. Thus, to calculate the probability that
two alleles coalesce in any generation before mutating, we can write:
\[ \frac{1}{2N} \int_0^{\infty} e^{-t(2\mu + 1/(2N))}\, dt = \frac{1/(2N)}{1/(2N) + 2\mu} = \frac{1}{1 + 4N\mu} \tag{4.11} \]
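As a numerical sanity check, one can sum the discrete terms of eqn (4.6) over generations and compare the total with the closed form in eqn (4.11). This is only a sketch: the function names, the truncation point of the sum, and the parameter values are hypothetical choices.

```python
def prob_identical_sum(n, mu, max_t=200_000):
    """Sum eqn (4.6) over t = 0, 1, ..., max_t - 1 (truncated infinite sum)."""
    total = 0.0
    for t in range(max_t):
        total += (1 / (2 * n)) * (1 - 1 / (2 * n)) ** t * (1 - mu) ** (2 * t)
    return total

def prob_identical_closed(n, mu):
    """Closed-form approximation, eqn (4.11)."""
    return 1 / (1 + 4 * n * mu)

n, mu = 1000, 1e-4          # hypothetical population size and mutation rate
print(prob_identical_sum(n, mu), prob_identical_closed(n, mu))
```

The two values agree closely when both µ and 1/(2N) are small, which is the regime in which the exponential approximations hold.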
The equation above gives us the probability that our two alleles
coalesce at some point in time, and do not mutate before reaching
population which matches the rate of genetic drift, is the harmonic mean of the true population size over time. The harmonic mean is very strongly affected by small values, such that if our population size is one million 99% of the time but drops to 1000 every hundred or so generations, Ne will be much closer to 1000 than a million.
Margin note: So that if we want a constant effective population size (Ne) that has the same rate of loss of heterozygosity as our variable population, we need to rearrange and solve this equation to give (4.15).
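The bottleneck example can be worked through directly. The sketch below takes the effective size to be the harmonic mean of the per-generation sizes, as the text states; the function name and the exact schedule (99 generations at one million, then one at 1000) are assumptions made for illustration.

```python
def harmonic_mean_ne(sizes):
    """Effective population size as the harmonic mean of per-generation sizes."""
    return len(sizes) / sum(1 / n for n in sizes)

# One million for 99 generations, then a single generation at 1000 (assumed schedule).
sizes = [1_000_000] * 99 + [1000]
print(harmonic_mean_ne(sizes))          # harmonic mean
print(sum(sizes) / len(sizes))          # arithmetic mean, for comparison
```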
i.e. the rate of coalescence is the harmonic mean of the two sexes’ population sizes, equating this to $\frac{1}{2N_e}$ we find
\[ N_e = \frac{4 N_F N_M}{N_F + N_M} \tag{4.19} \]
Thus if reproductive success is very skewed in one sex (e.g. $N_M \ll N/2$), our effective population size will be much reduced as a result. For more on how different evolutionary forces affect the rate of genetic drift, and their impact on the effective population size, see Charlesworth (2009).
Figure 4.10: Male Hamadryas baboons. Up to ten females live in a harem with a single male. Brehm’s Tierleben (Brehm’s animal life). Brehm, A.E. 1893. Image from the Biodiversity Heritage Library. Contributed by University of Illinois Urbana-Champaign. Not in copyright.
\[ E(T_2) = 2N \tag{4.21} \]
generations. This form of the expectation follows from the fact that the mean of a geometric random variable is 1/p.
\[ \begin{aligned} &= 2\mu E(T_2) \\ &= 4\mu N \end{aligned} \tag{4.22} \]
this makes use of the law of total expectation (see Appendix eqn (A.27)) to average over which generation our pair of sequences coalesce in. We’ll assume that mutation is rare enough that it never happens at the same basepair twice, i.e. no multiple hits, such that we get to see all of the mutation events that separate our pair of sequences. This is assumption that repeat mutation is vanishingly rare at a basepair
\[ E(\pi) = 4N\mu = \theta \tag{4.23} \]
Figure 4.12: The ancestral lineages of a pair of sequences coalesce t generations in the past. There are 2t generations that mutations could occur in that would be differences between our sequences. Three mutations have occurred in this time, changing the ancestral sequence (AGTTT) to the sequences at the bottom of the picture.
while the probability any pair coalesces is $\approx \binom{i}{2}/2N$, again using eqn (A.2).
We can ignore the possibility that groups larger than pairs of alleles (e.g. triples) coalesce simultaneously, as terms of $1/N^2$ and higher are vanishingly small. Obviously in reasonable sample sizes there are many more triples ($\binom{i}{3}$) and higher order combinations than there are pairs ($\binom{i}{2}$), but if $i \ll N$ then we are safe to ignore these terms.
When there are i alleles, the probability that we wait until the t + 1
generation before any pair of alleles coalesces is
\[ P(T_i = t+1) = \frac{\binom{i}{2}}{2N} \left(1 - \frac{\binom{i}{2}}{2N}\right)^{t} \tag{4.29} \]
Thus the waiting time to the first coalescent event while there are $i$ lineages is a geometrically distributed random variable6 with probability of success $p = \binom{i}{2}/2N$, which we denote by
\[ T_i \sim \text{Geo}\left(\binom{i}{2}/2N\right). \tag{4.30} \]
The mean waiting time till any pair within our sample coalesces is
\[ E(T_i) = \frac{2N}{\binom{i}{2}} \tag{4.31} \]
6 see Appendix eqn (A.29).
coalesces, we then switch to having to follow i − 1 alleles back in time. Then when a pair of these i − 1 alleles coalesce, we then only have to follow i − 2 alleles back. This process continues until we coalesce back to a sample of two, and from there to a single most recent common ancestor (MRCA).
Margin note: The waiting time $T_i$ to the first coalescent event in a sample of $i$ alleles is thus exponentially distributed with rate $\binom{i}{2}/2N$, i.e. $T_i \sim \text{Exp}\left(\binom{i}{2}/2N\right)$.
1. Set i = n.
4. Set i = i − 1
5. Continue looping steps 2-4 until i = 1, i.e. the most recent common
ancestor of the sample is found.
By following this algorithm we are generating realizations of the ge-
nealogy of our sample.
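The looping algorithm above can be sketched in a few lines. Steps 2 and 3 are not shown in this passage, so the sketch assumes they are "draw the waiting time T_i from an exponential with rate C(i,2)/2N" and "pick a random pair of lineages to merge"; only the waiting times are tracked here, not the tree shape. The function name and parameter values are hypothetical.

```python
import random
from math import comb

def simulate_coalescent_times(n, N, rng):
    """Return {i: T_i}, the waiting time spent with i lineages, for i = n, ..., 2."""
    times = {}
    i = n                                    # step 1: start with n lineages
    while i > 1:                             # step 5: stop at the MRCA (i = 1)
        rate = comb(i, 2) / (2 * N)
        times[i] = rng.expovariate(rate)     # assumed step 2: exponential waiting time
        # assumed step 3: a random pair merges (topology not recorded in this sketch)
        i -= 1                               # step 4: one fewer lineage to follow
    return times

rng = random.Random(1)                       # fixed seed so the run is repeatable
times = simulate_coalescent_times(n=5, N=1000, rng=rng)
t_mrca = sum(times.values())                 # time to the most recent common ancestor
t_tot = sum(i * t for i, t in times.items()) # total branch length of the genealogy
print(t_mrca, t_tot)
```

Each call produces one realization of the waiting times; repeating the call many times gives the distribution of quantities such as the time to the MRCA.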
[Figure: a genealogy with time in generations and the TMRCA (= 18 gens) marked; the sequences AGGTC, AGTGT, ACTGT, AGTTT, and CGTTT label the figure.]
The expected time to the most recent common ancestor. We will first
consider the time to the most recent common ancestor of the entire
sample (TM RCA ). This is
\[ T_{MRCA} = \sum_{i=n}^{2} T_i \tag{4.33} \]
\[ T_{tot} = \sum_{i=n}^{2} i T_i \tag{4.36} \]
\[ E(T_{tot}) = \sum_{i=n}^{2} i \frac{2N}{\binom{i}{2}} = \sum_{i=n}^{2} \frac{4N}{i-1} = \sum_{i=n-1}^{1} \frac{4N}{i} \tag{4.37} \]
we see that our expected total amount of time in the genealogy scales linearly with our population size N. Our expected total amount of time is also increasing with sample size n, but is doing so very slowly. This again follows from the fact that in large samples, the initial coalescence usually happens very rapidly, so that extra samples add little to the total amount of time in the genealogical tree.
Margin note: To get a better sense of how $T_{tot}$ grows with the sample size, we can approximate the sum (4.37) by an integral, which will work for large n. The result is $\int_1^{n-1} \frac{4N}{i}\, di = 4N \log(n-1)$.
We saw above that the number of mutational differences between a pair of alleles that coalesced T2 generations ago was Poisson with a mean of 2µT2, where 2T2 is the total branch length in this simple 2-sample genealogical tree. A mutation that occurs on any branch of our genealogy will cause a segregating polymorphism in the sample
(meeting our infinitely-many-sites assumption). Thus, if the total time
in the genealogy is Ttot , there are Ttot generations for mutations. So
the total number of mutations segregating in our sample (S) is Poisson
with mean µTtot . Thus the expected number of segregating sites in a
sample of size n is
\[ E(S) = \mu E(T_{tot}) = \sum_{i=n-1}^{1} \frac{4N\mu}{i} = \theta \sum_{i=n-1}^{1} \frac{1}{i} \tag{4.38} \]
Note that this is growing with the sample size n, albeit very slowly
(roughly at the rate of the log of the sample size). We can use this
formula to derive another estimate of the population scaled mutation
rate θ, by setting our observed number of segregating sites in a sample
(S) equal to this expectation. We'll call this estimator $\hat{\theta}_W$:
\[
\hat{\theta}_W = \frac{S}{\sum_{i=n-1}^{1} 1/i} \tag{4.39}
\]
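Equations 4.38 and 4.39 can be sketched in a few lines of Python. This is a minimal illustration, not code from the book: the function names and the example numbers (30 segregating sites in 10 sequences) are hypothetical.

```python
def harmonic_sum(n):
    """Return sum_{i=1}^{n-1} 1/i, the denominator of Watterson's estimator."""
    if n < 2:
        raise ValueError("need a sample of at least two sequences")
    return sum(1.0 / i for i in range(1, n))


def expected_segregating_sites(theta, n):
    """E(S) = theta * sum_{i=1}^{n-1} 1/i (equation 4.38)."""
    return theta * harmonic_sum(n)


def watterson_theta(num_segregating, n):
    """theta_W = S / sum_{i=1}^{n-1} 1/i (equation 4.39)."""
    return num_segregating / harmonic_sum(n)


# Hypothetical data: 30 segregating sites observed in a sample of 10 sequences.
theta_w = watterson_theta(30, 10)
print(f"theta_W = {theta_w:.3f}")

# E(S) grows only slowly (roughly logarithmically) with the sample size n.
for n in (10, 20, 40):
    print(n, round(expected_segregating_sites(theta_w, n), 2))
```

Plugging the estimate back into equation 4.38 with the same n recovers the observed S, which is a quick sanity check on the arithmetic.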
By similar logic, the time where doubletons could arise is $T_2$ and our expected number of doubletons is $E(S_2) = \theta/2$. Thus, there are on average half as many doubletons as singletons.
Extending this logic to larger samples might be doable, but is te-
dious (I mean really tedious: for 10 alleles there are thousands of
possible tree shapes and the task quickly gets impossible even compu-
tationally). A nice, relatively simple proof of the neutral site frequency
spectrum is given by Hudson (2015), but we won't give this here.
The general form is:
\[
E(S_i) = \frac{\theta}{i} \tag{4.41}
\]
i.e. there are twice as many singletons as doubletons, three times
as many singletons as tripletons, and so on. The other thing that
will be helpful for us to know is that neutral alleles at intermediate
frequency tend to be old, and those that are rare in the sample are
young. We expect to see a lot more rare alleles in our sample than
common alleles.
Question 9. There are two possible tree shapes that could relate
four samples. Draw both of them and separately colour (or otherwise
mark) the branches by where singletons, doubletons, and tripleton
derived alleles could arise.
We can also ask the probability of observing a derived allele seg-
regating at frequency i/n given that the site is polymorphic in our
sample of size n (i.e. given that 0 < i < n ). We can obtain this
probability by dividing the expected number of sites segregating for an
allele at frequency i by the expected number segregating at all of the
possible allele frequencies for polymorphisms in our sample
\[
P(i \mid 0 < i < n) = \frac{E(S_i)}{\sum_{j=1}^{n-1} E(S_j)} = \frac{1/i}{\sum_{j=1}^{n-1} 1/j}. \tag{4.42}
\]
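As a small sketch of equations 4.41 and 4.42, the following Python computes the expected neutral site frequency spectrum and the probability of each derived-allele count given that a site is polymorphic. The function names, the sample size of 5, and the value theta = 4 are illustrative assumptions.

```python
def expected_sfs(theta, n):
    """E(S_i) = theta / i for derived-allele counts i = 1, ..., n-1 (equation 4.41)."""
    return [theta / i for i in range(1, n)]


def prob_derived_count(n):
    """P(i | 0 < i < n) = (1/i) / sum_j (1/j) (equation 4.42); theta cancels."""
    weights = [1.0 / i for i in range(1, n)]
    total = sum(weights)
    return [w / total for w in weights]


n = 5                                  # hypothetical sample size
sfs = expected_sfs(theta=4.0, n=n)     # theta = 4 is an arbitrary example value
probs = prob_derived_count(n)

for i, (s, p) in enumerate(zip(sfs, probs), start=1):
    print(f"i = {i}: E(S_i) = {s:.3f}, P(i | polymorphic) = {p:.3f}")
```

Because theta cancels in equation 4.42, the conditional probabilities depend only on the sample size.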
\[
D = \frac{\hat{\theta}_\pi - \hat{\theta}_W}{C} \tag{4.43}
\]
where the numerator is the difference between the estimate of θ based
on pairwise differences and that based on segregating sites. As these
two estimators both have expectation θ under the neutral, constant-
size model, the expectation of D is zero. The denominator C is a
positive constant; it's the square root of an estimator of the variance of
this difference under the constant population size, neutral model. This
constant was chosen for D to have mean zero and variance 1 under the
null model, so we can test for departures from this simple null model.
An excess of rare alleles compared to the constant-size, neutral model will result in a negative Tajima's D, because each additional rare allele increases the number of segregating sites by 1, but only has a small effect on the number of pairwise differences between samples.
Alleles at intermediate frequency increase pairwise diversity more per segregating site than typical, thus increasing $\theta_\pi$ more than $\theta_W$.
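The argument about rare versus intermediate-frequency alleles can be checked numerically. The sketch below computes only the numerator of Tajima's D from a site frequency spectrum; the constant C is left out, and the function names and example spectra are assumptions made for illustration. It uses the standard fact that a site with derived count i in a sample of n differs in i(n − i) of the n(n − 1)/2 pairs.

```python
def theta_w_from_sfs(sfs):
    """Watterson's estimator: S divided by sum_{i=1}^{n-1} 1/i."""
    n = len(sfs) + 1
    a_n = sum(1.0 / i for i in range(1, n))
    return sum(sfs) / a_n


def theta_pi_from_sfs(sfs):
    """Average pairwise differences implied by the spectrum.

    sfs[i-1] is the number of sites with derived-allele count i.
    """
    n = len(sfs) + 1
    pairs = n * (n - 1) / 2
    return sum(count * i * (n - i) for i, count in enumerate(sfs, start=1)) / pairs


def tajima_numerator(sfs):
    """theta_pi - theta_W, the numerator of equation 4.43."""
    return theta_pi_from_sfs(sfs) - theta_w_from_sfs(sfs)


n = 10
neutral = [6.0 / i for i in range(1, n)]        # expected SFS with theta = 6
rare_excess = [neutral[0] + 10] + neutral[1:]    # 10 extra singletons
mid_excess = list(neutral)
mid_excess[n // 2 - 1] += 10                     # 10 extra sites at count n/2

for name, sfs in [("neutral", neutral), ("rare", rare_excess), ("mid", mid_excess)]:
    print(name, round(tajima_numerator(sfs), 3))
```

Under the expected neutral spectrum the two estimators agree, so the numerator is zero up to rounding; extra singletons push it negative and extra intermediate-frequency sites push it positive.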
[Figure residue from three plots: (i) a log-scale plot of θπ with legend Synonymous / Non-Synonymous; (ii) simulated genealogies, x-axis: Generations; (iii) plots for maize and teosinte with axes labelled πbp, Teosinte πbp, Counts, and Tajima's D bin. Surviving caption fragment: "D in maize and teosinte, see how the maize distribution is shifted towards"]
Human accacagcatttgttagttactgccaagaagcctgtatctgtagggtaaaatcctcgctgaagtgggttg
Chimp ......................................g...........c...................
Gorilla ..................................................cc..................
Orangutan .........c.........c..............................c...................
Gibbon ...................c..............................---.................
Crab-eating macaque g.............gg...c..............................c..t.t..............
Table 5.1: Variable positions in a primate alignment of orthologous sequences of a 136bp region. This region starts at position 5242147 of chromosome 11, chosen pretty much at random from the UCSC browser. Dots indicate positions where the other sequences carry the same base as the human reference sequence.

Many substitutions were driven by selection, as there has undoubtedly been plenty of adaptive phenotypic evolution in great apes. However, these adaptive changes may be a small minority of all the substitutions; for a start, many of these substitutions have occurred in non-coding DNA with no known functional effect. Thus it is a reasonable initial position that the majority of substitutions genome-wide may well be neutral. How can we hope to identify regions undergoing adaptive divergence? How could we hope to address the claim that many amino-acid changing substitutions are also neutral, as posited

Many of the topics covered in this chapter also fall within the field of 'molecular evolution', which shares many of its questions and tools with population genetics but often focuses on longer time-scales of evolution using phylogenetic approaches.
[Figure: substitution rates for three proteins. Fibrinopeptides: 0.96 subs per Myr; Hemoglobin: 0.18 subs per Myr; Cytochrome c: 0.05 subs per Myr. Surviving caption fragment: "who revisited this classic study and confirmed the conclusions."]
who last shared a common ancestor in the fossil record around 300 million years ago, are separated by roughly 15 NS substitutions per 100 sites in the Cytochrome c protein. Humans and dogfish, which diverged around 400 million years ago, are separated by 19 NS substitutions per 100 sites in this gene.
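A quick way to see why these two comparisons look clock-like is to divide each count by the divergence time. The sketch below does only that arithmetic; the function name and the labels are hypothetical, and the first pair is labelled by its divergence time because the species names are not given in this passage.

```python
def subs_per_100_sites_per_myr(subs_per_100_sites, divergence_myr):
    """Substitutions per 100 sites per million years since the common ancestor."""
    return subs_per_100_sites / divergence_myr


# (label, NS substitutions per 100 sites, divergence time in Myr), from the text above
comparisons = [
    ("pair diverging ~300 Myr ago", 15, 300),
    ("human vs dogfish", 19, 400),
]

rates = {label: subs_per_100_sites_per_myr(subs, myr)
         for label, subs, myr in comparisons}

for label, rate in rates.items():
    print(f"{label}: {rate:.4f} subs per 100 sites per Myr")
```

The two rates are close to one another, which is what a roughly constant rate of Cytochrome c substitution per year would predict.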
In equation (5.2), if we double the amount of time separating a
pair of species T , we double the number of substitutions predicted.
Note that for this to be true T must be measured in generations. To
explain a protein molecular clock between species that clearly differed
dramatically in generation time it was hypothesized that the mutation
rate actually scaled with generation time, i.e. short-lived organisms
introduced fewer mutations per generation, e.g. as they had fewer
rounds of mitosis. This generation-time assumption meant that the
mutation rate per year could be constant, such that µT would be a
constant for pairs of species that had diverged for similar geological
times, which are measured in years, even if the organisms differed in
generation time. This assumption would allow neutral theory to be
consistent with a protein molecular clock measured in years. We now
know that this critical generation time assumption is false: organisms
with shorter generation times have somewhat higher mutation rates
per year so a strict neutral model is inconsistent with the protein
molecular clock. We’ll return to these ideas when we discuss the fate
of very weakly selected mutations in Chapter 12 and Ohta (1973)’s
Nearly Neutral theory. If you are still reading this send Graham a
picture of Tomoko Ohta receiving the Crafoord Prize, an analog of the
Nobel prize for biology, for her contributions to molecular evolution.
One of the great appeals of neutral models is they offer a simple null
for us to test real data against.
\[
d_N = 2T(1 - C)\mu \tag{5.4}
\]
Dividing by $d_S$, we find
\[
d_N/d_S = (1 - C) \tag{5.5}
\]
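Equations 5.4 and 5.5 can be illustrated with a short sketch. It assumes, as the division above implies, that synonymous divergence is $d_S = 2T\mu$ (all synonymous mutations neutral); the function names and the values of T, mu, and C are hypothetical.

```python
def d_s(T, mu):
    """Synonymous divergence, assuming all synonymous mutations are neutral."""
    return 2 * T * mu


def d_n(T, mu, C):
    """Non-synonymous divergence when a fraction C of mutations is removed (eq. 5.4)."""
    return 2 * T * (1 - C) * mu


def constraint_from_ratio(dn, ds):
    """Rearranging equation 5.5: C = 1 - dN/dS."""
    return 1 - dn / ds


T = 1_000_000      # generations since the split (hypothetical)
mu = 1e-8          # mutation rate per site per generation (hypothetical)
C = 0.8            # fraction of non-synonymous mutations that are deleterious (hypothetical)

ratio = d_n(T, mu, C) / d_s(T, mu)
print(f"dN/dS = {ratio:.2f}, implied constraint C = {constraint_from_ratio(d_n(T, mu, C), d_s(T, mu)):.2f}")
```

Note that T and mu drop out of the ratio, which is why $d_N/d_S$ can be compared across genes and branches with different divergence times.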
strained way. The branches where the gene was likely transitioning from a functional to a non-functional state, i.e. pre-mutation and mixed, had intermediate values of $d_N/d_S$ of 0.83 to 0.98, consistent with a transition from a constrained to an unconstrained mode of protein evolution somewhere along these branches of the phylogeny.
this branch.
B) Aardvarks last shared a common ancestor with Afrosoricida (golden moles, tenrecs) and Macroscelidea (elephant shrews) around 75.1 million years ago in the Cretaceous. Assume that for the portion of the branch while Enamelin was functional $d_N/d_S = 0.51$ and after it was pseudogenized there was no constraint (i.e. $d_N/d_S = 1$). Based on the branch's average $d_N/d_S = 0.75$, can you estimate the time at which Enamelin was pseudogenized? (I.e. when is the star in Figure 5.9?)

Figure 5.10: Aardvarks (Cape ant-eater, Orycteropus afer). Cassell's natural history (1896). Duncan, P. M. Image from the Biodiversity Heritage Library. Contributed by NCSU Libraries. Not in copyright.

Adaptive evolution and $d_N/d_S$. Clearly genes are not only subject
assuming again that all synonymous mutations are neutral. Note that this means that our estimates of C using $1 - d_N/d_S$ will be a lower bound on the true constraint if even a small fraction of mutations are beneficial. Those cases where the gene is evolving more rapidly at the protein level than at synonymous sites, i.e. $d_N/d_S > 1$, are potentially strong candidates for positive selection rapidly driving change at the protein level. We can identify genes that have $d_N/d_S$ significantly greater than one, either on the complete gene phylogeny, or on particular branches. Note that this is a very conservative test that few genes in the genome meet, as many genes that are fixing adaptive non-synonymous substitutions will have $d_N/d_S < 1$; even if adaptive mutations are common, genes may still evolve in a constrained way (i.e. $d_N/d_S < 1$) if the rapid fixation of beneficial mutations due to positive selection is outweighed by the loss of non-synonymous mutations to negative selection.
[Figure: phylogeny of the primate lysozyme gene, data from Yang (1998). For each branch, the numbers give the estimated average number of non-synonymous to synonymous changes in the lysozyme protein. Tips: Douc langur and Angolan colobus (Colobines), Rhesus macaque, Lar gibbon, Human, Squirrel monkey, Marmoset. Branch labels as extracted (their assignment to branches is not recoverable here): 4.7/2.1, 9.3/1.0, 4.3/1.1, 2.5/0.0, 2.1/3.2, 9.3/0.0, 2.0/1.1, 3.1/2.1, 8.9/6.9, 0.0/3.3.]
leading to apes (e.g. gibbons and humans) and Colobines (e.g. colobus and langur monkeys). Colobines have leaf-based diets. They digest these leaves by bacterial fermentation in their foregut, and then use lysozymes to break down the bacteria to extract energy from the leaves. In Colobines, the lysozyme protein has evolved to work well in the low-pH (acidic) environment of the stomach. Remarkably, the Colobine lysozyme has convergently evolved this activity via amino-acid changes at 5 key residues that are very similar to those seen in cows and Hoatzins (a leaf-eating bird, Kornegay et al., 1994).
Figure 5.13: Hoatzin (Opisthocomus hoazin). A leaf-eating bird. A history of birds (1910) Pycraft, W.P. Image from the Biodiversity Heritage Library. Contributed by American Museum of Natural History Library. Not in copyright.
            Poly.                         Fixed
Non-Syn.    $\mu L_N (1-C) T_{tot}$       $\mu L_N (1-C) T'_{div}$
Syn.        $\mu L_S T_{tot}$             $\mu L_S T'_{div}$
Ratio       $L_N (1-C)/L_S$               $L_N (1-C)/L_S$
            Poly.   Fixed
Non-Syn.    2       12
Syn.        9       4
Ratio       2/9     3/1

Note the strong excess of non-synonymous to synonymous divergence compared to polymorphism (p-value of 0.006, Fisher's exact test), which is consistent with the gene evolving in an adaptive manner among the two species. We would expect roughly only 3 non-

Figure 5.15: White admiral (Limenitis arthemis) and Viceroy (Limenitis archippus). Basilarchia is the old genus that these two species were originally placed in. Viceroy and Monarch butterflies are Müllerian mimics. Field book of insects (1918). Lutz, F.E., illustrations by Edna L. Beutenmüller. Image from the Biodiversity Heritage Library. Contributed by MBLWHOI Library. Not in copyright.
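The Fisher's exact test quoted for the 2×2 table of polymorphic and fixed differences can be reproduced with the standard library alone. The sketch below is one common two-sided version (summing the probabilities of all tables no more likely than the observed one); the function name is hypothetical, and other two-sided conventions can give slightly different values.

```python
from math import comb


def fisher_exact_two_sided(a, b, c, d):
    """Two-sided Fisher's exact test p-value for the 2x2 table [[a, b], [c, d]]."""
    row1, row2 = a + b, c + d
    col1 = a + c
    denom = comb(row1 + row2, col1)

    def prob(k):
        # Hypergeometric probability that the top-left cell equals k,
        # holding the row and column totals fixed.
        return comb(row1, k) * comb(row2, col1 - k) / denom

    p_obs = prob(a)
    k_min = max(0, col1 - row2)
    k_max = min(row1, col1)
    # Sum over all tables that are at least as unlikely as the observed one
    # (small tolerance guards against floating-point ties).
    return sum(prob(k) for k in range(k_min, k_max + 1)
               if prob(k) <= p_obs * (1 + 1e-9))


# Rows: non-synonymous, synonymous. Columns: polymorphic, fixed.
p_value = fisher_exact_two_sided(2, 12, 9, 4)
print(f"p = {p_value:.4f}")
```

With the counts from the table this gives a p-value of about 0.006, matching the value quoted in the text.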
\[
H_T = \tfrac{1}{2} H_S + \tfrac{1}{2} H_B \tag{6.2}
\]
where HB is the probability that a pair of alleles drawn from our two
different sub-populations differ from each other. A pair of alleles from
different sub-populations cannot find a common ancestor with each
other for at least T generations into the past as they are in distinct
populations (not connected by migration). Once our alleles find them-
selves back in the combined ancestral population it takes them on
average 2N generations to coalesce. So the total opportunity for mu-
tation between our pair of alleles sampled from different populations is
2(T + 2N) generations of meioses, such that the probability that our pair of alleles differs is
\[
H_B \approx 2\mu(T + 2N) \tag{6.3}
\]
We can plug this into our expression for HT , and then that in turn
into FST . Doing so we find that
\[
F_{ST} \approx \frac{\mu T}{\mu T + 4N_e\mu} = \frac{T}{T + 4N_e} \tag{6.4}
\]
Note that µ cancels out of this equation. In this simple toy model, $F_{ST}$ is increasing because the amount of between-population diversity increases with the divergence time of the two populations (initially linearly with T). While T is small relative to $4N_e$, $F_{ST}$ grows approximately as $T/(4N_e)$, so that differentiation will be higher between populations separated by long divergence times or with small effective population sizes.
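Equation 6.4 is simple enough to tabulate directly. The sketch below evaluates it for a few divergence times; the function name and the value of the effective population size are illustrative assumptions.

```python
def fst_split(T, Ne):
    """F_ST under the simple split model with no migration (equation 6.4)."""
    return T / (T + 4 * Ne)


Ne = 10_000  # hypothetical effective population size

for T in (100, 1_000, 10_000, 40_000, 400_000):
    approx = T / (4 * Ne)   # the small-T approximation
    print(f"T = {T:>7}: F_ST = {fst_split(T, Ne):.4f} (T / 4Ne = {approx:.4f})")
```

For small T the exact value and $T/(4N_e)$ nearly coincide; at $T = 4N_e$ generations $F_{ST}$ reaches one half, and it approaches one only for very long divergence times.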
Figure 6.4: An example of alleles assorting among three populations such that there is no incomplete lineage sorting. Code here.
[Figure: x-axis Generations, running from Past to Present.]
Figure 6.5: An example of alleles assorting among three populations leading to incomplete lineage sorting. Code here.
[Figure: x-axis Generations, running from Past to Present.]
[Figure: two trees, each with tips labelled A, B, C.]
genealogically than either is to their cousin, at any given locus one of
the siblings can share an allele IBD with their cousin that they do not
share with their own sibling, due to the randomness of Mendelian seg-
regation down their pedigree. In these cases, the average relatedness of
the individuals/populations disagrees with the patterns of relatedness
at a particular locus.
As an empirical example of incomplete lineage sorting, let's consider the work of Jennings and Edwards (2005), who sequenced a single allele from three different species of Australian grass finches (Poephila): two sister species of long-tailed finches (Poephila acuticauda and P. hecki) and the black-throated finch (Poephila cincta, see Figure 6.7). They collected sequence data for 30 genes, and constructed phylogenetic gene trees at each of these loci, resulting in 28 well-resolved gene trees. Sixteen of the gene trees showed P. acuticauda and P. hecki as sisters, with P. cincta as the outgroup (the tree ((A,H),C)), while for twelve genes the gene tree was discordant with the population tree: for seven of their genes P. hecki fell as an outgroup to the other two and at five P. acuticauda fell as an outgroup (the trees ((A,C),H) and ((H,C),A) respectively).

Figure 6.7: Banded Grass Finch (P. cincta). Illustration by Elizabeth Gould. Birds of Australia Gould J. 1840. CC BY 4.0 uploaded to Flickr by rawpixel.com.

Let's use the coalescent to understand this discordance between
gene trees and species trees. Let’s assume that two sister populations
(A & B) split t1 generations in the past, with a deeper split from a
third outgroup population (C) t2 generations in the past. We’ll as-
sume that there’s no gene flow among our populations after each split.
We can trace back the ancestral lineages of our three alleles. The first
opportunity for the A & B lineages to coalesce is t1 generations ago.
If they coalesce with each other in their shared ancestral population
before t2 in the past (left side of Figure 6.6) their gene tree will def-
initely agree with the population tree. So the only way for the gene
tree to disagree with the population tree is for the A & B lineages to
fail to coalesce in their shared ancestral population between $t_1$ and $t_2$; this happens with probability $(1 - 1/(2N))^{t_2 - t_1}$. We'll get a discordant gene tree if A & B make it back to the shared ancestral population with C without coalescing, and then one or the other of them coalesces with the C lineage before they coalesce with each other. This happens with probability 2/3, as at the first pairwise-coalescent event there are three possible pairs of lineages that could coalesce, two of which (A & C and B & C) result in a discordant tree. So the probability that we get a coalescent tree that is discordant with the population tree is
\[
\frac{2}{3}\left(1 - \frac{1}{2N}\right)^{t_2 - t_1}. \tag{6.10}
\]
This equation allows us to relate the fraction of loci showing incom-
plete lineage sorting to the population genetics parameters of the
ancestral population.
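As a sketch of how equation 6.10 might be used, the code below evaluates the discordance probability and inverts it to get the length of the internal branch implied by an observed fraction of discordant gene trees. The function names and the value of N are hypothetical; the fraction 12/28 is the finch example above, used purely as an illustration rather than as a reanalysis of those data.

```python
from math import log


def prob_discordant(t1, t2, N):
    """Probability that a gene tree disagrees with the population tree (eq. 6.10)."""
    return (2.0 / 3.0) * (1 - 1 / (2 * N)) ** (t2 - t1)


def internal_branch_generations(frac_discordant, N):
    """Solve equation 6.10 for t2 - t1, given a fraction of discordant loci."""
    if not 0 < frac_discordant <= 2.0 / 3.0:
        raise ValueError("the discordant fraction must lie in (0, 2/3]")
    return log(1.5 * frac_discordant) / log(1 - 1 / (2 * N))


N = 10_000                      # hypothetical ancestral population size
frac = 12 / 28                  # discordant gene trees in the finch example
dt = internal_branch_generations(frac, N)

print(f"t2 - t1 = {dt:.0f} generations, i.e. {dt / (2 * N):.3f} in units of 2N")
```

Only the ratio $(t_2 - t_1)/(2N)$ is identifiable from the discordant fraction, which is why the result is also printed in units of 2N generations.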
Testing for gene flow. We often want to test whether gene flow has
occurred between populations. For example, we might want to es-
tablish a case that interbreeding between humans and Neanderthals
occurred or demonstrate that gene flow occurred after two populations
began to speciate. A broad range of methods have been designed to
test for gene flow and to estimate gene flow rates based on neutral ex-
pectations. Here we’ll briefly just discuss one method based on some
simple coalescent ideas. Above we assumed that gene-tree population-
tree discordance was due to incomplete lineage sorting due to popu-
lations rapidly splitting. However, gene flow among populations can
also lead to gene-tree discordance. While both ILS and gene flow can
lead to discordance, under simplifying assumptions, ILS implies more
symmetry in how these discordances manifest themselves.
in addition to ILS the lineage from B can more recently coalesce with
the lineage from C, and so we should see more ABBAs than BABAs.
To test for this effect of gene flow, we can sample a sequence from each of our 4 populations and count up the number of sites that show the two mutational patterns consistent with the gene-tree discordance, $n_{ABBA}$ and $n_{BABA}$, and calculate
\[
\frac{n_{ABBA} - n_{BABA}}{n_{ABBA} + n_{BABA}} \tag{6.11}
\]
This statistic will have expectation zero if the gene-tree discordance is due to ILS and will be skewed positive if gene flow occurred from C into B, as this produces more ABBAs than BABAs (and skewed negative if gene flow occurred from C into A).
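The counting step behind equation 6.11 can be sketched as follows. This is an illustrative implementation under stated assumptions: the four sequences are aligned, the fourth is an outgroup carrying the ancestral state, and only biallelic sites are used. The function name and the toy sequences are made up.

```python
def abba_baba(seq_a, seq_b, seq_c, seq_out):
    """Count ABBA and BABA sites and return (n_abba, n_baba, statistic of eq. 6.11).

    Site patterns are read in the order (A, B, C, outgroup), with the
    outgroup allele taken as ancestral.
    """
    n_abba = n_baba = 0
    for a, b, c, o in zip(seq_a, seq_b, seq_c, seq_out):
        if len({a, b, c, o}) != 2:
            continue                      # keep biallelic sites only
        if a == o and b == c:
            n_abba += 1                   # B and C share the derived allele
        elif b == o and a == c:
            n_baba += 1                   # A and C share the derived allele
    if n_abba + n_baba == 0:
        raise ValueError("no informative ABBA or BABA sites")
    return n_abba, n_baba, (n_abba - n_baba) / (n_abba + n_baba)


# Toy aligned sequences (hypothetical)
seq_a   = "AATAAAC"
seq_b   = "CGAAATC"
seq_c   = "CGTAGTA"
seq_out = "AAAAAAA"

n_abba, n_baba, d_stat = abba_baba(seq_a, seq_b, seq_c, seq_out)
print(n_abba, n_baba, d_stat)
```

In practice the counts are summed over many sites across the genome, and the significance of a non-zero value is usually assessed by resampling blocks of the genome, which this sketch does not attempt.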
7
Phenotypic Variation and the Resemblance Between
Relatives
There are many different ways to think about studying the path
calculate the phenotypic mean for each genotype at a locus. For example, Wang et al. (2018) explored the genetic basis of budset time in
role in flowering time regulation in other plants.
Figure 7.2: The effect of a flowering time gene (PtFT2) SNP on budset time in European aspen (x-axis: PtFT2 Genotype). Each dot gives the genotype-phenotype combination for an individual. The horizontal lines give the budset mean for each genotype and the vertical lines show the inter-quartile range. The dotted line gives the linear regression of phenotype on genotype. Thanks to Pär Ingvarsson for sharing these data from Wang et al. (2018).

One way for us to assess the relationship between genotype and phenotype is to find the best-fitting straight line through the data, i.e. fit a linear regression of phenotypes for our individuals on their geno-
\[
X \sim \mu + a_l G_l \tag{7.1}
\]
In the equation above, $X$ is a vector of the phenotypes of a set of individuals and $G_l$ is our vector of genotypes at locus $l$, with $G_{i,l}$ taking the value 0, 1, or 2 depending on whether our individual $i$ is a homozygote, a heterozygote, or the alternate homozygote at our locus of interest. Here µ is our phenotypic mean. The slope of this regression line ($a_l$) has the interpretation of being the average effect of substituting a copy of allele 2 for a copy of allele 1. In our Aspen example the slope is −13.6, i.e. swapping a single T for a G allele moves the budset forward by 13.6 days, such that the GG homozygote is predicted to set buds 27.2 days earlier than the TT homozygote.

We'll encounter linear regressions at various points during the next few chapters, see the math appendix around eqn A.43 for more background details.
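To make the regression in equation 7.1 concrete, here is a minimal least-squares sketch on simulated data. Nothing here uses the aspen data: the allele frequency, baseline budset of 200 days, and noise level are hypothetical, and the slope of −13.6 from the text is used only as the simulated truth.

```python
import random


def regression_slope_intercept(genotypes, phenotypes):
    """Least-squares fit of phenotype on genotype: slope = Cov(X, G) / Var(G)."""
    n = len(genotypes)
    mean_g = sum(genotypes) / n
    mean_x = sum(phenotypes) / n
    cov = sum((g - mean_g) * (x - mean_x) for g, x in zip(genotypes, phenotypes)) / n
    var_g = sum((g - mean_g) ** 2 for g in genotypes) / n
    slope = cov / var_g
    return slope, mean_x - slope * mean_g


rng = random.Random(1)
true_effect = -13.6        # days per allele copy (simulated truth)
p = 0.4                    # hypothetical allele frequency
baseline = 200.0           # hypothetical budset day for genotype 0

# Genotypes are counts (0, 1, or 2) of the focal allele; two independent draws per individual.
genotypes = [(rng.random() < p) + (rng.random() < p) for _ in range(2000)]
phenotypes = [baseline + true_effect * g + rng.gauss(0, 10) for g in genotypes]

slope, intercept = regression_slope_intercept(genotypes, phenotypes)
print(f"estimated effect = {slope:.1f} days per allele, intercept = {intercept:.1f}")
print(f"predicted difference between homozygotes = {2 * slope:.1f} days")
```

The estimated slope is the additive effect $a_l$, and twice the slope is the predicted difference between the two homozygotes, as in the 27.2 day figure above.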
As a measure of the significance of this genotype-phenotype rela-
tionship, we can calculate the p-value of our regression. To try and
identify loci that are associated with our trait genome-wide, we can
conduct this regression at each SNP we genotype in the genome.
One common way to display the results of such an analysis (called a genome-wide association study or GWAS for short) is to plot the negative logarithm of the p-value for each SNP along the genome (a so-called Manhattan plot). Here's one from Wang et al. (2018) for their Aspen budset phenotype
is not an allele for budset, nor is PtFT2 a gene for budset. It is an allele that is associated with budset in the sampled environments and populations. In a different set of environments, this allele's effects may be far smaller, and a different set of alleles may contribute to phenotype variation. PtFT2, the gene our focal SNP falls close to, is just one of many genes and molecular pathways involved in budset. A mutant screen for budset may uncover many genes with larger effects; this gene is just a locus that happens to be polymorphic in this particular set of genotyped individuals.

"All that we mean when we speak of a gene [allele] for pink eyes is, a gene which differentiates a pink eyed fly from a normal one, not a gene [allele] which produces pink eyes per se, for the character pink eyes is dependent on the action of many other genes." - Sturtevant (1915)
While phenotypic variation for some phenotypes has a relatively
simple genetic basis, many phenotypes are likely much more geneti-
cally complex, involving the functional effect of many alleles at hun-
dreds or thousands of polymorphic loci. For example hundreds of
small effect loci affecting human height have been mapped in Euro-
pean populations to date. Such genetically complex traits are called
polygenic traits.
In this chapter, we will use our understanding of the sharing of
alleles between relatives to understand the phenotypic resemblance
between relatives in quantitative phenotypes. This will allow us to
understand the contribution of genetic variation to phenotypic varia-
tion. In the next chapter, we will then use these results to understand
the evolutionary change in quantitative phenotypes in response to
selection.
copies of allele 1 she has at this SNP. Her expected phenotype, given
her genotype at all L SNPs, is then
\[
E(X_i \mid G_{i,1}, \cdots, G_{i,L}) = \mu + X_{A,i} = \mu + \sum_{l=1}^{L} G_{i,l} a_l \tag{7.2}
\]
where $X_E$ is the deviation from the mean phenotype due to the environment. This $X_E$ includes the systematic effects of the environment our individual finds herself in and all of the noise during development, growth, and the various random insults that life throws at our individual. If a reasonable number of loci contribute to variation in our trait then we can approximate the distribution of $X_{A,i}$ by a normal distribution due to the central limit theorem (see Figure 7.5).² Thus if we can also approximate the distribution of the effect of environmental variation on our trait ($X_{E,i}$) by a normal distribution, which is reasonable as there are many small environmental effects, then the distribution of phenotypes within the population ($X_i$) will be normally distributed (see Figure 7.5).

² The central limit theorem is discussed briefly in the math appendix section A.2.5.
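The additive model can be explored with a small simulation. The sketch below draws genotypes at independent loci (an assumption: no linkage disequilibrium), sums their additive effects as in the sum of equation 7.2 without µ, and compares the sample mean and variance with the values expected when each genotype is Binomial(2, p_l), whose variance is 2p_l(1 − p_l). The number of loci, the allele frequencies, and the effect sizes are all made up.

```python
import random
from statistics import mean, pvariance

rng = random.Random(7)
num_loci = 50
num_individuals = 5000

# Hypothetical allele frequencies and additive effects for each locus
freqs = [rng.uniform(0.1, 0.9) for _ in range(num_loci)]
effects = [rng.gauss(0, 1) for _ in range(num_loci)]


def additive_value(freqs, effects, rng):
    """Sum over loci of G_l * a_l, with G_l the number of copies of the allele."""
    total = 0.0
    for p, a in zip(freqs, effects):
        genotype = (rng.random() < p) + (rng.random() < p)   # Binomial(2, p)
        total += genotype * a
    return total


values = [additive_value(freqs, effects, rng) for _ in range(num_individuals)]

expected_mean = sum(2 * p * a for p, a in zip(freqs, effects))
expected_var = sum(2 * p * (1 - p) * a * a for p, a in zip(freqs, effects))

print(f"mean: simulated {mean(values):.2f}, expected {expected_mean:.2f}")
print(f"variance: simulated {pvariance(values):.2f}, expected {expected_var:.2f}")
```

Plotting a histogram of `values` (not done here, to keep the example dependency-free) is a simple way to see the approach to a normal distribution as the number of loci grows.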
where the $2p_l(1 - p_l)$ term follows from the binomial sampling of two alleles per individual at each locus.⁴

⁴ These results follow from the properties of variance in math appendix eqn (A.25).

Question 1. You have two biallelic SNPs contributing to variance in human height. At the first SNP you have an allele with an additive
[Figure: frequency distributions of the number of alleles associated with increased height (left) and of the height polygenic score in cm (right). Code here.]
\[
h^2 = \frac{Var(X_A)}{V} = \frac{V_A}{V} \tag{7.8}
\]
Remember that we are thinking about a trait where all of the alleles act in a perfectly additive manner. In this case our heritability $h^2$
is referred to as the narrow sense heritability, the proportion of the
variance explained by the additive effect of our loci. When we allow
dominance and epistasis into our model, we’ll also have to define the
broad sense heritability (the total proportion of the phenotypic vari-
ance attributable to genetic variation).
The narrow sense heritability of a trait is a useful quantity; indeed
we’ll see shortly that it is exactly what we need to understand the
evolutionary response to selection on a quantitative phenotype. We
can calculate the narrow sense heritability by using the resemblance
between relatives. For example, if the phenotypic differences between
individuals in our population were solely determined by environmental
differences experienced by these different individuals, we should not
expect relatives to resemble each other any more than random individ-
uals drawn from the population. Now the obvious caveat here is that
relatives also share an environment, so may resemble each other due to
shared environmental effects.
Note that the heritability is a property of a sample from the pop-
ulation in a particular set of environments at a particular time.
Changes in the environment may change the phenotypic variance.
Changes in the environment may also change how our genetic alleles
are expressed through development and so change VA . Thus estimates
of heritability are not transferable across environments or populations.
To make our task easier, we will make two commonly made assump-
tions:
[Figure panels: Id. Twins, Cov = 0.979; Full Sibs, Cov = 0.443; 1/2 Sibs, Cov = 0.246; 1st Cousins, Cov = 0.101. x-axis: Ind 1's phenotype; y-axis: Ind 2's phenotype.]
\[
h^2 = \frac{Cov(MZ_1, MZ_2)}{V} = \rho_{MZ} \tag{7.12}
\]
where $\rho_{MZ}$ is the correlation of pairs of MZ twins (see Appendix eqn (A.42) for more on correlations). For example, we could estimate the heritability of a measure of body from the MZ correlation in Figure 7.9. In general, this simple estimator isn't great as the correlation of
the twins (i.e. $Cov(X_{1E}, X_{2E})$).
Moreover, it can be inflated by non-additive effects as identical twins don't just share alleles, they share their entire genotypes, and

[Figure: Twin 2 (PBF) against Twin 1 (PBF), with points for MZ and DZ twin pairs.]
Figure 7.11: Parent-midpoint offspring regression for beak depth and tarsus length in song sparrows. The phenotypes have been standardized
[Figure panels: y-axes Offspring Mean Beak Depth and Offspring Mean Tarsus Length.]
types among individuals ($h^2 \approx 1$), then children will closely resemble their parents. Conversely if much of the variation is environmental ($h^2 \approx 0$), and there is no shared environment between parent and child, children will not resemble their parents.
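The link between heritability and parent-offspring resemblance can be checked by simulation. The sketch below is an assumption-laden toy, not the book's own simulation code: it uses unlinked loci with equal effects, random mating, no shared environment, and hypothetical values for the number of loci, allele frequency, and variances. It relies on the standard result that, under these assumptions, the slope of the regression of child on mid-parent phenotype equals $h^2$.

```python
import random

rng = random.Random(42)
L, p = 100, 0.5            # number of loci and allele frequency (assumed values)
V_A, V_E = 1.0, 1.0        # additive genetic and environmental variances (assumed)
# Per-allele effect chosen so that the additive variance is V_A = L * 2p(1-p) * a^2
a = (V_A / (L * 2 * p * (1 - p))) ** 0.5


def random_parent():
    """Two haplotypes of 0/1 alleles at L unlinked loci."""
    return [[int(rng.random() < p) for _ in range(L)] for _ in range(2)]


def make_child(mother, father):
    """Each parent transmits one of its two alleles at each locus at random."""
    maternal = [mother[rng.randrange(2)][locus] for locus in range(L)]
    paternal = [father[rng.randrange(2)][locus] for locus in range(L)]
    return [maternal, paternal]


def phenotype(individual):
    """Additive genetic value plus an independent environmental deviation."""
    additive = a * (sum(individual[0]) + sum(individual[1]))
    return additive + rng.gauss(0.0, V_E ** 0.5)


def regression_slope(x, y):
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    var = sum((xi - mean_x) ** 2 for xi in x)
    return cov / var


midparents, children = [], []
for _ in range(2000):
    mum, dad = random_parent(), random_parent()
    midparents.append((phenotype(mum) + phenotype(dad)) / 2)
    children.append(phenotype(make_child(mum, dad)))

estimated_slope = regression_slope(midparents, children)
h2 = V_A / (V_A + V_E)
print(f"regression slope = {estimated_slope:.2f}, h^2 = {h2:.2f}")
```

Raising `V_E` flattens the regression towards zero and lowering it steepens the slope towards one, which is the pattern described in the paragraph above.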
Figure 7.12: Regression of child's phenotype on the parental mid-point phenotype. The three panels show decreasing levels of environmental variance ($V_E$) holding the additive genetic variance constant ($V_A = 1$). In these figures, we simulate 100 loci, as described in the caption of Figure 7.5. We simulate the genotypes and phenotypes of the two parents, and then simulate the child's genotype
[Figure panels: L = 100, VE = 100, VA = 1; L = 100, VE = 1, VA = 1; L = 100, VE = 0.001, VA = 1. y-axis: Child's phenotype.]
Smith and Zach (1979), this halving of the slope is due to the fact
Estimating heritability by these various parent-offspring regres-
lation arises due to pleiotropy, because loci that tend to affect trait 1
We can store our variance and covariance values in matrices, a way of gathering these terms that will be useful when we discuss selection:
$$\mathbf{V} = \begin{pmatrix} V_1 & V_{1,2} \\ V_{1,2} & V_2 \end{pmatrix} \tag{7.19}$$
and
$$\mathbf{G} = \begin{pmatrix} V_{1,A} & V_{A,1,2} \\ V_{A,1,2} & V_{2,A} \end{pmatrix} \tag{7.20}$$
Figure 7.15: The additive effect sizes of loci associated with female Age at Menarche (AAM) and their effect size on Height in a European population (x-axis: AAM effect size; y-axis: Height effect size). Data from Pickrell et al. (2016). Code here.
Here we’ve shown the matrices for two traits, but we can generalize
this to an arbitrary number of traits.
We can estimate these quantities, in a similar way as before, by
studying the covariance in different traits between relatives:
7.17, i.e. individuals were investing their energy in making either few highly pulsed calls or many calls with few pulses. This phenotypic covariance reflects an underlying genetic covariance between these two frog call characteristics (right side of Figure 7.17). Fathers whose sons have highly pulsed calls also have sons whose calls are more spaced apart.
[Figure: scatter panels; y-axis: Size of son's forehead patch.]
and Canal. Code here.
[Figure: phenotype plotted against the number of copies of allele 1 (0, 1, 2); four panels.]
against the number of allele 1 carried (from 0 for 22 to 2 for 11). Top Row: Additive relationship between genotype and phenotype. Bottom Row: Allele 1 is dominant over allele 2, such that the heterozygote has the
The area of each circle is proportional to the fraction of the population in each genotypic class (p^2, 2pq, and q^2). On the left column p = 0.1
regression between phenotype and
the timing of puberty). The left side of Figure 7.22 shows the age at
sexual maturity in males. The L allele associated with slower sexual
maturity is recessive in males. While the LL homozygotes mature on
average a whole year later, the additive effect of the allele is weak
while the L allele is rare in the population. The right panel shows
the effect of the L allele in females. Note how the allele is much more
dominant in females, and has a much more pronounced additive ef-
fect. The dominance of an allele is not a fixed property of the allele
but rather a statement of the relationship of genotype to phenotype,
such that the dominance relationship between alleles may vary across
phenotypes and contexts (e.g. sexes).
[Figure panels: Age at maturity for each genotype (EE, LE, LL), two panels.]
circle is proportional to the fraction
The total additive variance for the whole genotype can be found by
$$V_A = \sum_{\ell=1}^{L} V_{A,\ell} = \sum_{\ell=1}^{L} 2 p_\ell q_\ell \alpha_\ell^2. \tag{7.26}$$
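The sum in eqn. (7.26) can be sketched in a few lines of Python. This is only an illustration: the function name `additive_variance` and the three-locus frequencies and effect sizes are made up for the example and are not taken from the notes.

```python
def additive_variance(freqs, effects):
    """Total additive variance: the sum over loci of 2 p q alpha^2 (eqn 7.26)."""
    if len(freqs) != len(effects):
        raise ValueError("freqs and effects must have the same length")
    return sum(2.0 * p * (1.0 - p) * a ** 2 for p, a in zip(freqs, effects))


# Hypothetical example: allele frequencies and additive effects at three loci.
example_freqs = [0.5, 0.1, 0.3]
example_effects = [1.0, 2.0, -0.5]

# Per-locus contributions V_{A,l}, then their sum V_A.
per_locus = [2.0 * p * (1.0 - p) * a ** 2
             for p, a in zip(example_freqs, example_effects)]
total_va = additive_variance(example_freqs, example_effects)

print("per-locus V_A:", per_locus)
print("total V_A:", total_va)
```

Note how a common allele of modest effect (the first locus) can contribute about as much variance as a rare allele of large effect (the second locus), because each term weights the squared effect by 2pq.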
away from its additive genetic value, with similar expressions for each
of the homozygotes (dℓ,0 and dℓ,2 ). We can then write the dominance
variance at our locus as the genotype-frequency weighted sum of our
squared dominance deviations
$$V_D = \sum_{\ell=1}^{L} V_{D,\ell}. \tag{7.29}$$
Having now partitioned all of the genetic variance into additive and dominance terms, we can write our total genetic variance as
$$V_G = V_A + V_D. \tag{7.30}$$
When dominance is present in the loci influencing our trait (VD >
0), we need to modify our phenotype covariance among relatives to
account for this non-additivity. Specifically, our equation for the
covariance among a general pair of relatives (eqn. 7.15 for additive
variation) becomes
ample among pairs of loci). This gets a little tricky to think about,
so we will only briefly explain it. We can first estimate the additive
effect of the alleles by considering the effect of the alleles averaging
over their possible genetic backgrounds (including the other interact-
ing alleles they are possibly paired with), just as before. We can then
calculate the additive genetic variance from this. We can estimate the
dominance variance, by calculating the residual variance among geno-
types at a locus unexplained by the additive effect of the loci. We can
then estimate the epistatic variance by estimating the residual vari-
ance left unexplained among the two locus genotypes after accounting
for the additive and dominance deviations calculated from each locus
separately. In practice these higher-order variance components are hard to
estimate, and are usually small, as much of our variance is assigned to the
additive effect. Again we would find that we mostly care about VA for
predicting short-term evolution, but that the contribution of loci to
the additive genetic variance will depend on the epistatic relationships
among loci.
Question 5. How could you use half-sibs vs. full-sibs to estimate VD? Why might this be difficult in practice? Why are identical vs. non-identical twins better suited for this?
tion has acted, and our mean phenotype before selection acts by µBS. This second quantity may be hard to measure, as obviously selection acts throughout the life-cycle, so it might be easier to think of this as

We are interested in predicting the distribution of phenotypes in the next generation. In particular, we are interested in the mean phe-

where the outer expectation is over possible pairs of randomly mating

an expression for this expectation:

point phenotype of pairs of individuals who survive to reproduce. Well this is just the expected phenotype in the individuals who survived to reproduce (µS), so

[Figure panels: "Phenotype distribution after selection, parental mean = 2.48" and "Phenotype distribution in the children, mean in children = 1.2" (x-axis: Phenotype); scatter plot of Child's phenotype against Parental midpoint for VE = 1, VA = 1 (L = 100), with S marked.]
[Figure: scatter plot; y-axis: Offspring corolla flare.]
Galen’s data the covariance of mother and child is 1.3, while the variance of the mother is 2.8. Data from Galen (1996). Code here.
Question 2. From the experiment shown in Figure 8.5, the mean corn oil content in 1897 was 4.78; among the 24 individuals chosen to breed for the next generation the mean was 5.2. The offspring of these individuals had a mean kernel oil content of 5.1. What is the narrow sense realized heritability?

Figure 8.5: Top. Phenotypic distribution of oil content in corn in 1897; the individuals who were selected to breed for the next generation are marked in blue. Bottom. The distribution in the next generation. Data
[Figure panels: Parental gen. (S marked) and Offspring gen. (R marked); x-axis: Oil content (%); y-axis: Frequency.]

To understand the genetic basis of the response to selection take a look at Figure 8.6. The setup is the same as in our previous simulation figures. The individuals who are selected to form our next generation carry more alleles that increase the phenotype in the range of environments currently experienced by the population. The average individual before selection carried 100 of these ‘up’ alleles, while the average individual surviving selection carries 108 ‘up’ alleles.
[Figure panel: Parental generation; x-axis: Num. up alleles; y-axis: Frequency.]
As individuals faithfully transmit their alleles to the next gener-
ation the average child of the selected parents carries 108 up alleles.
Note that the variance has changed little, the children have plenty of
variation in their genotype, such that selection can readily drive evo-
lution in future generations. The average frequency of an ‘up’ allele
has changed from 50% to 54%. Gains due to selection will be stably
inherited to future generations and can be compounded on generation
after generation if selection pressures were to remain constant.
use our breeder’s equation to predict the response. If we are willing to assume that our heritability does not change and we maintain a constant selection differential (S), then after n generations our phenotype mean will have shifted
$$n h^2 S \tag{8.5}$$
i.e. our population will keep up a linear response to selection. There-
[Figure: Illinois long term selection experiment; x-axis: Year; y-axis: % Oil content.]
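As a small numerical illustration of eqn. (8.5), the sketch below accumulates the per-generation response under the stated assumptions of constant heritability and constant selection differential. The function names and the values of h^2, S and the starting mean are invented for the example.

```python
def cumulative_response(h2, S, n):
    """Total shift in the mean after n generations, n * h^2 * S (eqn 8.5)."""
    return n * h2 * S


def mean_trajectory(mu0, h2, S, n):
    """Population mean in generations 0..n, assuming constant h^2 and S."""
    return [mu0 + generation * h2 * S for generation in range(n + 1)]


# Hypothetical values, chosen only for illustration.
h2, S, n, mu0 = 0.3, 0.4, 10, 5.0

shift = cumulative_response(h2, S, n)
trajectory = mean_trajectory(mu0, h2, S, n)
print("total shift after", n, "generations:", round(shift, 3))
print("mean in each generation:", [round(m, 2) for m in trajectory])
```

The linearity is entirely a consequence of holding h^2 and S fixed; if either changes over time, each generation's step h^2 S has to be recomputed before it is added.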
[Figure: upper panel y-axis: Mean body size; lower panel y-axis: Selection gradient; x-axis: year, 1975 to 2000.]
the Medium ground-finch population measured each year. The 1973 95% confidence intervals are shown as
ized selection differentials on body size. The statistical significance of the selection differentials is shown, black points are p < 0.001 and grey p < 0.05. Data from Grant and Grant (2002). Code here.
Great Britain.
survive to reproduce is
$$P(X \mid \textrm{survive}) = \frac{p(x)\,w(x)}{\int_{-\infty}^{\infty} p(x)\,w(x)\,dx}. \tag{8.8}$$
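Eqn. (8.8) can be evaluated numerically on a grid of phenotypes. The following is a minimal sketch, assuming a standard normal phenotype distribution and a fitness function w(x) = exp(0.5 x); both choices, the grid, and all function names are assumptions made for illustration. The integral is approximated by a simple Riemann sum.

```python
import math


def normal_pdf(x, mu=0.0, sd=1.0):
    """Density of a normal distribution; stands in for p(x)."""
    z = (x - mu) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2.0 * math.pi))


def density_after_selection(xs, p, w):
    """Eqn 8.8 on an evenly spaced grid: p(x) w(x) divided by its integral."""
    dx = xs[1] - xs[0]
    unnormalised = [p(x) * w(x) for x in xs]
    total = sum(unnormalised) * dx  # Riemann-sum approximation to the integral
    return [u / total for u in unnormalised]


def grid_mean(xs, density):
    """Mean phenotype under a density tabulated on the grid xs."""
    dx = xs[1] - xs[0]
    return sum(x * d for x, d in zip(xs, density)) * dx


# Assumed inputs: standard normal phenotypes, fitness increasing as exp(0.5 x).
xs = [-8.0 + 0.01 * i for i in range(1601)]
before = [normal_pdf(x) for x in xs]
after = density_after_selection(xs, normal_pdf, lambda x: math.exp(0.5 * x))

mu_before = grid_mean(xs, before)
mu_after = grid_mean(xs, after)
selection_differential = mu_after - mu_before
print("mean before selection:", round(mu_before, 4))
print("mean after selection: ", round(mu_after, 4))
print("selection differential:", round(selection_differential, 4))
```

With these particular choices the post-selection distribution is again normal with the same variance and a mean shifted by 0.5, so the numerical selection differential can be checked against a known answer.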
$$W_i \sim \beta X_i + \overline{w} \tag{8.14}$$
The best-fitting slope of this regression (β), which we'll call the ‘fitness gradient’ (see the math appendix around eqn A.43 for more on linear regression), is given by
$$\beta = \frac{Cov\left(X, w(X)/\overline{w}\right)}{V} \tag{8.15}$$
i.e. the fitness gradient is the phenotype-fitness covariance di-
vided by the phenotypic variance. Using this result we can rewrite
the breeder’s equation as
$$R = V_A \beta \tag{8.16}$$
i.e. we’ll see a directional response to selection if there is a linear
relationship of phenotype on fitness, and if there is additive genetic
variance for the phenotype. As one example of a fitness gradient, in
is plotted against the weight of their antlers. The red line gives the linear regression of fitness (LRS) on antler mass and the slope of this line is the fitness gradient (β).
[Figure: scatter plot of fitness against antler mass, with fitted regression line.]
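To make eqns. (8.15) and (8.16) concrete, here is a minimal Python sketch that estimates a fitness gradient from paired phenotype and fitness measurements and then predicts the response. The five data points and the value of V_A are invented for illustration, and the helper names are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def covariance(xs, ys):
    """Population covariance (divides by n rather than n - 1)."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)


def fitness_gradient(phenotypes, fitnesses):
    """Return (beta, S): beta = Cov(X, w / mean w) / V, as in eqn 8.15."""
    w_bar = mean(fitnesses)
    relative = [w / w_bar for w in fitnesses]
    S = covariance(phenotypes, relative)   # selection differential
    V = covariance(phenotypes, phenotypes)  # phenotypic variance
    return S / V, S


# Made-up data: phenotype and number of offspring for five individuals.
phenotypes = [1.0, 2.0, 3.0, 4.0, 5.0]
offspring = [0.0, 1.0, 1.0, 2.0, 1.0]
V_A = 1.0  # assumed additive genetic variance

beta, S = fitness_gradient(phenotypes, offspring)
V = covariance(phenotypes, phenotypes)
response = V_A * beta  # eqn 8.16
print("beta:", beta, "S:", S, "predicted response R:", response)
```

Because β = S/V, the prediction V_A β is the same number as (V_A/V) S, i.e. the breeder's equation in its original form.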
the mean fitness of our population evolve? If we choose relative fitness
[Figure panels: histograms of Counts (All, Survivors) with Fitness (w) curves; legend: Fitness, Mean Fitness, Selection differential; one axis labelled (1/W) ∂W/∂x.]
fitness optimum is offered by a remarkable time-series of stickleback evolution from a fossil lake-bed in Nevada (Bell et al., 2006). In this lake the layers of sediment are laid down each year, allowing a very detailed time series with over five thousand fossils measured. The time-series documents the evolution towards a new set of optimum phenotypes in the fifteen thousand years after the initial invasion of the lake by a heavily armoured stickleback species. Figure 8.20 shows the population mean number of touching pterygiophores, the bones supporting the dorsal spines, through the fossil record (Figure 8.21). Note how quickly the species evolves toward its new value, presumably a fitness optimum in their new environment, and the long subsequent time interval over which the population mean phenotype fluctuates about its new value.
Hunt et al. (2008) fitted a model of a population adapting to a fitness landscape, with a single peak, to these time-series data. Their fitted fitness surface is shown in the lower panel of Figure 8.20. The arrows show the moves that the population mean phenotype is making on this inferred fitness surface. The population initially takes large steps up toward the peak of this surface and subsequently fluctuates around the peak. Under the interpretation that there is a single stationary peak these fluctuations represent genetic drift randomly knocking the population off its optimum, with selection acting to restore the population towards this local optimum.

Figure 8.20: Top) A time series of stickleback phenotypic evolution from the fossil record. After a heavily armoured stickleback invades the lake it quickly evolves towards fewer touching pterygiophores (the bones supporting the dorsal spines). Fossil measurement means are calculated in 250 year bins. Bottom) How our population moves on the inferred fitness landscape. The arrows show each move made by the population in the 250 year intervals. Data from Bell et al. (2006) and Hunt et al. (2008). Code here.
[Figure axes: Years against Touching Pterygiophores (top); Fitness against Touching Pterygiophores (bottom).]
Issues with the interpretation of fitness landscapes. In practice, fitness landscapes may not be constant. The environment may be constantly changing so our population is constantly forced to change to keep up with the fitness peak. Indeed our environment may change so quickly that our population cannot keep up with the peak. Our population is still trying to increase its mean fitness, to ‘adapt’, but the landscape itself is evolving. In the case of very rapid environmental change our population may slide further and further away from the peak, and as a consequence its mean fitness decreases, which may drive the population to extinction if its mean fitness stays below one (w < 1) for long enough. The conditions for extinction are an active area of research

Figure 8.21: Fossil stickleback. Photo by Peter J. Park from Losos et al. (2013), licensed under CC BY 4.0.
in either tail have lower fitness, the result of which is to reduce the
[Figure: Number of births (left axis) and Mortality (right axis) against Birth Weight (lb).]
number of births with different birth weights (left axis). Dots show the mortality probability for different birth-weight bins (right axis); the red line shows a fitted quadratic model to mortality. Data from Karn and Penrose (1951) Table 2, collapsing male and female births. Code here.
[Figure: Count (histogram) and Prob. Survival against Lower mandible length (mm).]
the remarkable variation in beak size within Black-bellied seedcrackers (P. ostrinus). Right: A histogram of a beak size measurement in Black-bellied seedcrackers. All juveniles are shown in grey, while the black bars show the survivors. The red curve shows the best fitting linear and quadratic model to the probability of survival, fitted using a binomial generalized linear model with a logit link function.
Left illustration from: Size variation in
Pyrenestes by Chapin J.P. in the Bulletin of
the American Museum of Natural History
(Vol. XLIX 1923) Image from the Biodiversity
Heritage Library. Contributed by Toronto
Library. Not in copyright.
When small, ball galls are at risk of parasitism from parasitoid
wasps. When all the ball galls in the population are small there is
strong positive directional selection on gall size, with little
stabilizing selection. Notice in the left panel of Figure 8.26 the good
agreement between the linear selection gradient and the fit including a
linear and quadratic term. However, bigger galls fall under the pall of
predation from downy woodpeckers and black-capped chickadees, who
seek out the tasty larvae. Thus intermediate size galls are favoured, a
fitness peak that the population quickly reaches. Once on this peak,
as shown in the right panel of Figure 8.26 there is no directional selec-
tion, i.e. no linear slope, but there is strong stabilizing selection, i.e. a
quadratic term. Thus the population will be maintained at this fitness
peak indefinitely if the environment remains unchanged.
[Figure panels: Counts and Fitness (w); legend: Fitness, Mean Fitness, Selection differential, Quadratic.]
Figure 8.26: Fitness surface for gall
Library. Contributed by The LuEsther T Mertz Library, the New York Botanical Garden. Not in copyright.
where the 1 and 2 index our two different traits. Here VA,1 and VA,2 are the additive genetic variances for traits 1 and 2 respectively, while VA,1,2 is the additive genetic covariance between our traits. Our selection gradient for trait 1, β1, represents the change in fitness as you change trait 1 alone, holding other traits constant. These β can be estimated by multivariate regression, see below. The multivariate breeder's equation is a statement that our response in any one phenotype is modified by selection on other traits that genetically covary with that trait.
We can also write this equivalently in matrix form, for an arbitrary number of traits. We write the change in the mean of our multiple phenotypes within a generation as the vector S, and our response across generations as the vector R. These two quantities are related by
$$\mathbf{R} = \mathbf{G}\mathbf{V}^{-1}\mathbf{S} = \mathbf{G}\boldsymbol{\beta} \tag{9.2}$$
where V and G are our variance-covariance matrices of phenotypes and additive genetic values (eqns. (7.19) and (7.20)) and β is a vector of selection gradients (i.e. the change within a generation as a fraction of the total phenotypic variance). Note that β = V^{-1}S, such that each β represents the selection gradient on a trait accounting for its phenotypic covariances with other traits.
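A two-trait version of eqn. (9.2) is small enough to work through by hand or in a few lines of Python. The sketch below uses plain lists rather than a matrix library; the matrices V and G and the vector S are invented numbers, and the helper names are hypothetical.

```python
def invert_2x2(m):
    """Inverse of a 2x2 matrix given as [[a, b], [c, d]]."""
    (a, b), (c, d) = m
    det = a * d - b * c
    if det == 0:
        raise ValueError("matrix is singular")
    return [[d / det, -b / det], [-c / det, a / det]]


def mat_vec(m, v):
    """Matrix-vector product for list-of-lists matrices."""
    return [sum(m_ij * v_j for m_ij, v_j in zip(row, v)) for row in m]


# Made-up example: two traits that covary phenotypically and genetically.
V = [[2.0, 0.8], [0.8, 2.0]]   # phenotypic variance-covariance matrix
G = [[1.0, 0.5], [0.5, 1.0]]   # additive genetic variance-covariance matrix
S = [0.4, 0.0]                 # within-generation change in the two trait means

beta = mat_vec(invert_2x2(V), S)  # selection gradients, beta = V^{-1} S
R = mat_vec(G, beta)              # response, R = G beta

print("beta:", [round(b, 4) for b in beta])
print("R:   ", [round(r, 4) for r in R])
```

In this example the mean of trait 2 does not change within the generation (S_2 = 0), yet trait 2 still shows a response because it covaries genetically with trait 1.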
As an example of the outcome of selection on multiple phenotypes, consider the bout of selection measured by Grant and Grant (1995) in the medium ground finch (Geospiza fortis), one of Darwin’s finches. They measured 634 birds in 1976, of which only 15% survived to 1977. The birds who survived were heavier and had longer, deeper bills than average.
[Figure panels A-C; axis labels: Number of offspring, Male antler size, Female leg length, 1/2-sister's leg length.]
within a generation; recording an
are known to have additive genetic variation. The figures left to right are A-C. (Data are simulated. Code here.)
[Figure: Eye-span/body length against generations (0 to 10), two panels.]
Question 2.
At the end of ten generations in Wilkinson’s experiment (Figure
9.2), the males from the up- and down-selected lines had mean eye-
stalk to body ratios of 1.29 and 1.14 respectively, while the females
from the up- and down-selected lines had means of 0.9 and 0.82.
A) Wilkinson estimated that by selecting the top/bottom 10
males, he had on average shifted the mean body ratio by 0.024 within
Figure 9.3: Stalk-eyed Flies (Diopsidae).
Diptera. van der Wulp. 1898. Image from the Biodiversity Heritage Library. Contributed by Smithsonian Libraries. Not in copyright.
[Figure: three scatter panels; x-axis: Reversals; y-axis: Stripes.]
9.1.1 Hamilton’s Rule and the evolution of altruistic and selfish behaviours
“ ‘The only reason for making a buzzing-noise that I know of is be-
cause you’re a bee.’ Then [Pooh] thought another long time, and
said: ‘The only reason for being a bee that I know of is to make
honey...And the only reason for making honey is so as I can eat it.’ ”
–Winnie-the-Pooh, Milne and Shepard (1926).
One of the seismic shifts caused by Darwin’s work was the realisa-
tion that organisms don’t exist for the benefit of other individuals or
other species. Bees didn’t evolve to pollinate flowers, any more than
they evolved to make honey for bears. If we can say that there is a
‘reason’ why an organism exists, it is only to leave offspring to the next
generation. Pooh can be forgiven for straying from Darwinian thought,
as he exists for the benefit of Christopher Robin and other children’s
bedtime stories.
However, there’s a wrinkle to this Darwinian view. Worker bees don’t make honey to benefit their offspring, they are sterile and are working for the benefit of the Queen bee and her offspring. Individuals frequently behave in ways that sacrifice their own fitness for the benefit of others. That selection favours such apparent acts of altruism is puzzling at first sight. Hamilton (1964a,b) supplied the first general evolutionary explanation of such altruism. His intuition was that while an individual is losing out on some reproductive output, the alleles underlying an altruistic behaviour can still spread in the population if this cost is outweighed by benefits gained through the transmission of these alleles through a related individual. Note that this means that the allele is not acting in a self-sacrificing manner, even though individuals may as a result.

Margin note: Maynard Smith (1964) coined the name kin selection to describe Hamilton’s approach to this problem. It’s also sometimes called the inclusive fitness approach, as we need to include not just one individual’s fitness but the weighted sum of the fitness of all their relatives.
Altruism reflects social interactions. So as a simple model let’s
imagine that individuals interact in pairs, with our focal individual i
Where does this result come from? Well, we can use our quantitative genetics framework to gain some intuition by deriving a simple version of Hamilton’s Rule by thinking about the phenotypes of an

where Wi is the contribution to the fitness of individual i due to their own phenotype, and Wj is the contribution to our individual i’s fitness due to the interacting individual j’s behaviour (i.e. j’s phenotype). With the benefit B and cost C, our W(i, j) are depicted in Figure 9.7.
[Figure panels: Fitness of ind. i against Altruistic pheno. of ind. j (0 to 1), with B and C marked.]

$$B\,\frac{2F_{i,j} V_A}{V_A} > C$$
$$2F_{i,j} B > C \tag{9.9}$$
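The inequality in eqn. (9.9) is easy to explore numerically. The short sketch below checks it for a few kinds of relatives, using the standard kinship coefficients for outbred pairs (1/4 for full sibs, 1/8 for half sibs, 1/16 for first cousins); the benefit and cost values and the function name are made up for illustration.

```python
def altruism_favoured(F_ij, B, C):
    """Hamilton's Rule as in eqn 9.9: favoured when 2 F_ij B > C."""
    return 2.0 * F_ij * B > C


# Kinship coefficients F_ij for outbred pairs of relatives.
kinship = {
    "full sibs": 1.0 / 4.0,
    "half sibs": 1.0 / 8.0,
    "first cousins": 1.0 / 16.0,
}

# Hypothetical benefit to the recipient and cost to the actor.
B, C = 3.0, 1.0

results = {pair: altruism_favoured(F, B, C) for pair, F in kinship.items()}
for pair, favoured in results.items():
    print(pair, "-> 2FB =", 2.0 * kinship[pair] * B, "favoured:", favoured)
```

With these numbers the same act is favoured towards a full sibling but not towards a half sibling or cousin, which shows how the threshold benefit scales inversely with 2F.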
to attract males flying by. Females with larger, brighter lanterns have higher fecundity, so males with a preference for brighter flashes will gain a direct benefit to their own fitness. (Note that males will bene-
[Figure: Number of Eggs against Lantern Size (mm^2).]
However, even in the absence of direct benefits of choice, selection
$$R_{♂} = \beta_{♂} V_{A,♂} \tag{9.10}$$
Let’s assume that the female preference trait, the degree to which females are attracted to long tails, is not under direct selection (β♀ = 0). Then the response to selection of the preference trait can be written as
[Figure panels: Mean Orange Area and Mean Female Orange Preference against Generation, for Up Selection and Down Selection lines.]
A response to selection was seen
due to the direct selection on male
“Socrates consisted of the genes his parents gave him, the experiences
they and his environment later provided, and a growth and develop-
ment mediated by numerous meals. For all I know, he may have been
very successful in the evolutionary sense of leaving numerous offspring.
His phenotype, nevertheless, was utterly destroyed by the hemlock
and has never since been duplicated. The same argument holds also
for genotypes. With Socrates’ death, not only did his phenotype dis-
appear, but also his genotype.[...] The loss of Socrates’ genotype is
not assuaged by any consideration of how prolifically he may have
reproduced. Socrates’ genes may be with us yet, but not his genotype,
because meiosis and recombination destroy genotypes as surely as
death.” –Williams (1966)
[Figure: Plasma numbers of SHIV (copies per ml) through time, with the start of drug treatment marked.]
of SHIV in the blood of a macaque (black line), the frequency of a drug
from Feder et al. (2017). Code here.
drug.
The frequency of allele A1 in the next generation is given by
$$p_{t+1} = \frac{P_{t+1}}{P_{t+1} + Q_{t+1}} = \frac{W_1 P_t}{W_1 P_t + W_2 Q_t} = \frac{W_1 p_t}{W_1 p_t + W_2 q_t} = p_t \frac{W_1}{\overline{W}_t}. \tag{10.2}$$
Importantly, eqn. (10.2) tells us that the change in p only depends
on a ratio of fitnesses. Therefore, we need to specify fitness only up
to an arbitrary constant. As long as we multiply all fitnesses by the
same value, that constant will cancel out and eqn. (10.2) will hold.
Based on this argument, it is very common to scale absolute fitnesses
by the absolute fitness of one of the genotypes, e.g. the most or the
least fit genotype, to obtain relative fitnesses. Here, we will use wi for
the relative fitness of genotype i. If we choose to scale by the absolute
fitness of genotype A1 , we obtain the relative fitnesses w1 = W1 /W1 =
1 and w2 = W2 /W1 .
Without loss of generality, we can therefore rewrite eqn. (10.2) as
$$p_{t+1} = p_t \frac{w_1}{\overline{w}}, \tag{10.3}$$
dropping the subscript t for the dependence of the mean fitness on
time in our notation, but remembering it. The change in frequency
from one generation to the next is then given by
Assuming that the fitnesses of the two alleles are constant over time, the numbers of the two allelic types τ generations after time 0 are $P_\tau = (W_1)^\tau P_0$ and $Q_\tau = (W_2)^\tau Q_0$, respectively. Therefore, the relative frequency of allele A1 after τ generations is
$$p_\tau = \frac{(W_1)^\tau P_0}{(W_1)^\tau P_0 + (W_2)^\tau Q_0} = \frac{(w_1)^\tau P_0}{(w_1)^\tau P_0 + (w_2)^\tau Q_0} = \frac{p_0}{p_0 + (w_2/w_1)^\tau q_0}, \tag{10.5}$$
where the last step includes dividing the whole term by $(w_1)^\tau$ and switching from absolute to relative allele frequencies. Rearrange this to obtain
$$\frac{p_\tau}{q_\tau} = \frac{p_0}{q_0} \left(\frac{w_1}{w_2}\right)^\tau. \tag{10.6}$$
Solving this for τ yields
$$\tau = \log\left(\frac{p_\tau q_0}{q_\tau p_0}\right) \Big/ \log\left(\frac{w_1}{w_2}\right). \tag{10.7}$$
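As a consistency check on eqns. (10.2), (10.5) and (10.7), the sketch below iterates the one-generation recursion, compares it with the closed-form solution, and then recovers the number of generations from the start and end frequencies. The starting frequency and fitnesses are arbitrary illustrative values, and the function names are hypothetical.

```python
import math


def iterate_haploid(p0, w1, w2, tau):
    """Apply the one-generation update p' = p w1 / mean fitness, tau times."""
    p = p0
    for _ in range(tau):
        w_bar = w1 * p + w2 * (1.0 - p)
        p = p * w1 / w_bar
    return p


def closed_form(p0, w1, w2, tau):
    """Frequency after tau generations from eqn 10.5."""
    return p0 / (p0 + (w2 / w1) ** tau * (1.0 - p0))


def generations_between(p0, p_tau, w1, w2):
    """Number of generations needed to go from p0 to p_tau (eqn 10.7)."""
    odds_ratio = (p_tau * (1.0 - p0)) / ((1.0 - p_tau) * p0)
    return math.log(odds_ratio) / math.log(w1 / w2)


# Illustrative values: a rare allele with a 5% fitness advantage.
p0, w1, w2, tau = 0.01, 1.05, 1.0, 100

p_iterated = iterate_haploid(p0, w1, w2, tau)
p_closed = closed_form(p0, w1, w2, tau)
tau_recovered = generations_between(p0, p_closed, w1, w2)
print("iterated:", p_iterated, "closed form:", p_closed)
print("generations recovered from eqn 10.7:", tau_recovered)
```

Only the ratio w1/w2 enters either calculation, which is the point made above about fitnesses being defined up to an arbitrary constant.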
$$\tau = \frac{1}{s} \log\left(\frac{1 - 2/N + 1/N^2}{1/N^2}\right) \approx \frac{1}{s}\left(\log(N) + \log(N-2)\right) \approx \frac{2}{s} \log(N) \tag{10.12}$$
where we make the approximations $N^2 - 2N + 1 \approx N^2 - 2N$ and later $N - 2 \approx N$.
Question 1. In our example of the evolution of drug resistance, the drug-resistant SHIV virus spread from undetectable frequencies to ∼65% frequency by 16 weeks post infection. An estimated effective population size of SHIV is 1.5 × 10^5, and its generation time is ∼1 day. Assuming that the mutation arose as a single copy allele very shortly after the start of drug treatment at 12 weeks, what is the selection coefficient favouring the drug resistance allele?
The term
$$\sqrt[\tau]{\prod_{i=1}^{\tau} w_{1,i}} \tag{10.17}$$
is the geometric mean fitness of allele A1 over the τ generations past generation t. Therefore, allele A1 will only increase in frequency if it has a higher geometric mean fitness than allele A2 (at least in our simple deterministic model). This implies that an allele with higher geometric mean fitness can even invade and spread to fixation if its (arithmetic) mean fitness is lower than the dominant type. To see this consider two alleles that experience the fitnesses given in Table 10.1. The allele A1 does much better in dry years, but suffers in wet years; while the A2 is generalist and is not affected by the variable environment. If there is an equal chance of a year being wet or dry, the A1 allele has higher (arithmetic) mean fitness, but it will be replaced by the A2 allele as the A2 allele has higher geometric mean fitness (See Figure 10.2).

                   A1      A2
Dry                2       1.57
Wet                1.16    1.57
Arithmetic Mean    1.58    1.57
Geometric Mean     1.52    1.57

Table 10.1: Fitnesses of two alleles in wet and dry years. Means calculated assuming equal chances of wet and dry years. The geometric mean is calculated as $\sqrt{w_{wet} w_{dry}}$. Example numbers taken from Seger and Brockmann (1987).
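The means in Table 10.1 and the fate of the two alleles can be reproduced with a short Python sketch. The fitness values are those of the table; the strictly alternating dry/wet sequence and the starting frequency of 0.5 are simplifying assumptions made here (the text describes years as wet or dry with equal chance), and the function name is hypothetical.

```python
import math

# Fitnesses from Table 10.1.
fitness = {
    "A1": {"dry": 2.0, "wet": 1.16},
    "A2": {"dry": 1.57, "wet": 1.57},
}

arithmetic = {a: (w["dry"] + w["wet"]) / 2.0 for a, w in fitness.items()}
geometric = {a: math.sqrt(w["dry"] * w["wet"]) for a, w in fitness.items()}


def frequency_after(p0, years):
    """Frequency of A1 after a sequence of years, using the haploid update."""
    p = p0
    for year in years:
        w1, w2 = fitness["A1"][year], fitness["A2"][year]
        p = p * w1 / (p * w1 + (1.0 - p) * w2)
    return p


# Assumed environment: 100 years alternating dry, wet, dry, wet, ...
years = ["dry", "wet"] * 50
p_final = frequency_after(0.5, years)

print("arithmetic means:", {a: round(m, 2) for a, m in arithmetic.items()})
print("geometric means: ", {a: round(m, 2) for a, m in geometric.items()})
print("frequency of A1 after 100 years:", round(p_final, 3))
```

Although A1 has the higher arithmetic mean, its frequency falls over the run, in line with its lower geometric mean.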
[Figure: trajectory plot; x-axis: Generations (0 to 100).]
here.
Evolution of bet hedging. Don’t put all your eggs in one basket; it makes a lot of sense to spread your bets. Financial advisors often advise you to diversify your portfolio, rather than placing all your investments in one stock. Even if that stock looks very strong, you can come a cropper the 1 time in 20 that some particular part of the market crashes. Likewise, evolution can result in risk-averse strategies. Some species of bird lay multiple nests of eggs; some plants don’t put all of their energy into seeds that will germinate next year. It can make sense to hedge your bets even if that comes at an average cost (Seger and Brockmann, 1987).
To see this let’s think more about geometric fitness. We can write the relative fitness of an allele in a given generation i as wi = 1 + si, such that we can write the geometric fitness as
$$\overline{g} = \sqrt[\tau]{\prod_{i=1}^{\tau} (1 + s_i)} \tag{10.18}$$
When we think about products it’s often natural to take the log to turn them into a sum
$$\log\left(\overline{g}\right) = \frac{1}{\tau} \sum_{i=1}^{\tau} \log\left(1 + s_i\right) = E\left[\log\left(1 + s_i\right)\right] \tag{10.19}$$
$$\approx E\left[s_i\right] - var(s_i)/2 \tag{10.20}$$
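The quality of the mean-minus-half-variance approximation can be checked directly. The sketch below compares the exact log geometric fitness with the approximation for two invented strategies: a "risky" allele whose selection coefficient alternates between +0.1 and -0.1, and a "steady" allele with a constant small disadvantage. All names and numbers are illustrative assumptions.

```python
import math


def log_geometric_fitness(s_values):
    """Exact log of the geometric mean of (1 + s_i), as in eqn 10.19."""
    return sum(math.log(1.0 + s) for s in s_values) / len(s_values)


def mean_variance_approximation(s_values):
    """Approximation E[s] - var(s)/2, valid when the s_i are small."""
    n = len(s_values)
    mean_s = sum(s_values) / n
    var_s = sum((s - mean_s) ** 2 for s in s_values) / n
    return mean_s - var_s / 2.0


# Two made-up strategies over 20 generations.
risky = [0.1, -0.1] * 10   # arithmetic mean 0, variance 0.01
steady = [-0.002] * 20     # arithmetic mean -0.002, no variance

exact_risky = log_geometric_fitness(risky)
approx_risky = mean_variance_approximation(risky)
exact_steady = log_geometric_fitness(steady)

print("risky:  exact", round(exact_risky, 6), "approx", round(approx_risky, 6))
print("steady: exact", round(exact_steady, 6))
```

The steady strategy has the lower arithmetic mean but the higher geometric fitness, which is the sense in which reducing variance can pay for itself even at an average cost.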
[Figure panels: Mean; Frequency against Years in Past.]
ing a germination delay. However, they are more likely to have all of
higher geometric fitness by only
                                 A1A1                  A1A2                       A2A2
Absolute no. at birth            N p_t^2               N 2 p_t q_t                N q_t^2
Fitnesses                        W11                   W12                        W22
Absolute no. at reproduction     N W11 p_t^2           N W12 2 p_t q_t            N W22 q_t^2
Relative freq. at reproduction   (W11/W̄) p_t^2         (W12/W̄) 2 p_t q_t          (W22/W̄) q_t^2

Table 10.2: Relative genotype frequencies after one episode of viability selection.

As there is no difference in the fecundity of the three genotypes, the allele frequencies in the zygotes forming the next generation are simply the allele frequency among the reproducing individuals of the previous generation. Hence, the frequency of A1 in generation t + 1 is
$$p_{t+1} = \frac{W_{11} p_t^2 + W_{12} p_t q_t}{\overline{W}}. \tag{10.26}$$
Note that, again, the absolute value of the fitnesses is irrelevant to the
frequency of the allele. Therefore, we can just as easily replace the
absolute fitnesses with the relative fitnesses. That is, we may replace
Wij by wij = Wij /W11 , for instance.
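The bookkeeping in Table 10.2 and eqn. (10.26) can be written out as a short Python sketch. The function names and the example fitnesses are illustrative assumptions; the mean fitness is taken to be the sum of the fitness-weighted genotype frequencies at birth, which is what makes the relative frequencies in the table sum to one.

```python
def genotype_frequencies_after_selection(p, W11, W12, W22):
    """Genotype frequencies at reproduction (last row of Table 10.2)."""
    q = 1.0 - p
    at_birth = (p * p, 2.0 * p * q, q * q)  # Hardy-Weinberg proportions
    weighted = (W11 * at_birth[0], W12 * at_birth[1], W22 * at_birth[2])
    w_bar = sum(weighted)                   # mean fitness
    return tuple(x / w_bar for x in weighted)


def next_allele_frequency(p, W11, W12, W22):
    """Frequency of A1 in the next generation (eqn 10.26)."""
    f11, f12, _ = genotype_frequencies_after_selection(p, W11, W12, W22)
    return f11 + f12 / 2.0


# Illustrative values: absolute fitnesses, then the same fitnesses doubled.
p = 0.3
p_next = next_allele_frequency(p, 1.0, 0.9, 0.8)
p_next_rescaled = next_allele_frequency(p, 2.0, 1.8, 1.6)
print("next-generation frequency:", p_next)
print("with all fitnesses doubled:", p_next_rescaled)
```

Rescaling every fitness by the same constant leaves the result unchanged, which is why absolute fitnesses can be replaced by relative ones.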
Each of our genotype frequencies is responding to selection in a
manner that depends just on its fitness compared to the mean fitness
of the population. For example, the frequency of the A1A1 homozygotes increases from birth to adulthood in proportion to $W_{11}/\overline{W}$. In fact, we can estimate this fitness ratio for each genotype by comparing its frequency at birth to its frequency among adults. As an example of this calculation, we’ll look at some data from sticklebacks.
Marine threespine stickleback (Gasterosteus aculeatus) independently colonized and adapted to many freshwater lakes as glaciers receded following the last ice age, making sticklebacks a wonderful system for studying the genetics of adaptation. In marine habitats, most of the stickleback have armour plates to protect them from predation, but freshwater populations repeatedly evolve the loss of armour

Figure 10.7: Freshwater threespine Stickleback (G. aculeatus).
British fresh-water fishes. Houghton W 1879. Image from the Biodiversity Heritage Library. Contributed by Ernst Mayr Library, Harvard. Not in copyright.
$$\Delta p_t = \frac{(\overline{w}_1 - \overline{w}_2)}{\overline{w}} p_t q_t. \tag{10.31}$$
genotype                       A1 A1                       A1 A2                       A2 A2
absolute fitness               $W_{11}$               ≥    $W_{12}$               ≥    $W_{22}$
relative fitness (generic)     $w_{11} = W_{11}/W_{11}$    $w_{12} = W_{12}/W_{11}$    $w_{22} = W_{22}/W_{11}$
relative fitness (specific)    $1$                         $1 - sh$                    $1 - s$
Here, the selection coefficient s is the difference in relative fitness
between the two homozygotes, and h is the dominance coefficient. For
selection to be directional, we require that 0 ≤ h ≤ 1 holds. The
dominance coefficient allows us to move between two extremes. One
is when h = 0, such that allele A1 is fully dominant and A2 fully
recessive. In this case, the heterozygote A1 A2 is as fit as the A1 A1
homozygote. The inverse holds when h = 1, such that allele
A1 is fully recessive and A2 fully dominant.
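The effect of the dominance coefficient can be seen by iterating the one-generation recursion with the fitnesses 1, 1 − sh, and 1 − s from the table above. The sketch below is an illustration in Python, not the book's own R code; the function names and the choice of 500 generations are assumptions, while p0 = 0.01 and s = 0.01 follow the values quoted for Figure 10.8.

```python
def next_p(p, s, h):
    """Frequency of A1 after one generation with fitnesses 1, 1 - s*h, 1 - s."""
    q = 1.0 - p
    w11, w12, w22 = 1.0, 1.0 - s * h, 1.0 - s
    mean_w = w11 * p**2 + w12 * 2 * p * q + w22 * q**2
    return (w11 * p**2 + w12 * p * q) / mean_w


def trajectory(p0, s, h, generations):
    """List of allele frequencies from generation 0 to `generations`."""
    freqs = [p0]
    for _ in range(generations):
        freqs.append(next_p(freqs[-1], s, h))
    return freqs


s, p0, generations = 0.01, 0.01, 500
# h = 0: A1 dominant; h = 0.5: additive; h = 1: A1 recessive.
final = {h: trajectory(p0, s, h, generations)[-1] for h in (0.0, 0.5, 1.0)}
for h, p in final.items():
    print(f"h = {h}: p after {generations} generations = {p:.4f}")
```

Starting from a low frequency, the dominant beneficial allele should rise fastest and the recessive one slowest, because a rare recessive allele is almost always found in heterozygotes where its advantage is not expressed.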
We can then rewrite eqn. (10.31) as
$$\Delta p_t = \frac{p_t h s + q_t s (1 - h)}{\overline{w}} p_t q_t, \tag{10.33}$$
where
$$\overline{w} = 1 - 2 p_t q_t s h - q_t^2 s. \tag{10.34}$$

Figure 10.8: The trajectory of the frequency of allele A1, starting from p0 = 0.01, for a selection coefficient s = 0.01 and three different dominance coefficients. The recessive beneficial allele (h = 1) will eventually fix in the population, but it takes a long time. Code here.

Question 4. Throughout the Californian foothills are old copper
and gold-mines, which have dumped out soils that are polluted
with heavy metals. While these toxic mine tailings are often depauperate
of plants, Mimulus guttatus and a number of other plant species
have managed to adapt to these harsh soils. Wright et al. (2015)
have mapped one of the major loci contributing to the adaptation to
soils at two mines near Copperopolis, CA. Wright et al. planted
homozygote seedlings out in the mine tailings and found that only
10% of the homozygotes for the non-copper-tolerant allele survived to
flower, while 40% of the copper-tolerant seedlings survived to flower.
A) What is the selection coefficient acting against the non-copper-tolerant
allele on the mine tailings?
[Figure residue, presumably Figure 10.10. Plotted series: red (rr), cross (Rr), silver (RR); x-axis: Year. Surviving caption fragment: "... morph. Data from Elton (1942), compiled by Allendorf and Hard (2009). Code here."]
as a recessive trait, with much stronger selection against the silver ho-
mozygotes. As a result of this price difference, silver foxes were hunted
more intensely and declined as a proportion of the population in East-
ern Canada, see Figure 10.10, as documented by Elton, from 16% to
5% from 1834 to 1937. Haldane reanalyzed these data and showed
that they were consistent with recessive selection acting against the
silver morph alone. Note how the heterozygotes (cross) decline some-
what as a result of selection on the silver homozygotes, but overall the
R allele is slow to respond to selection as it is ‘hidden’ from selection
in the heterozygote state.
the 1950s. One of the ways that they’ve adapted is through the dele-
tion of their aryl hydrocarbon receptor (AHR) gene. Oziolor et al.
(2019) estimated that individuals who were homozygous for the intact
AHR gene had a relative fitness of 20% of that of homozygotes for
the deletion. Assuming an effective population size of 200 thousand
individuals, how long would it take for the deletion to reach fixation,
starting as a single copy in this population?
We’ll turn to these latter two explanations through this chapter and
the next. Note that these explanations are not mutually exclusive.
Each explanation will explain some proportion of the variation, and
these proportions will differ over species and classes of polymorphism.
A central challenge in population genomics is how we can do this in a
systematic way.
genotype                       A1 A1                       A1 A2                       A2 A2
absolute fitness               $W_{11}$               <    $W_{12}$               >    $W_{22}$
relative fitness (generic)     $w_{11} = W_{11}/W_{12}$    $w_{12} = W_{12}/W_{12}$    $w_{22} = W_{22}/W_{12}$
relative fitness (specific)    $1 - s_1$                   $1$                         $1 - s_2$
directional selection, but the reparameterization we have chosen here
makes the math easier.
In this case, when allele A1 is rare, it is often found in a heterozygous
state, while the A2 allele is usually in the homozygous state, and
so A1 is more fit and increases in frequency. However, when the allele
A1 is common, it is often found in a less fit homozygous state, while
the allele A2 is often found in a heterozygous state; thus it is now al-
for the equilibrium frequency of interest. This is also the frequency
of A1 at which the mean fitness of the population is maximized. The
highest possible fitness of the population would be achieved if every
homozygotes.
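As a numerical illustration of the equilibrium under heterozygote advantage, the Python sketch below iterates the recursion with fitnesses 1 − s1, 1, 1 − s2 and separately scans the mean fitness over a grid of frequencies. The values of s1 and s2 are hypothetical. The closed form s2/(s1 + s2) used for comparison is not quoted from the text above; it is what we get by setting the two marginal fitnesses equal under this parameterization.

```python
def next_p(p, s1, s2):
    """One generation of selection with fitnesses 1 - s1, 1, 1 - s2."""
    q = 1.0 - p
    w11, w12, w22 = 1.0 - s1, 1.0, 1.0 - s2
    mean_w = w11 * p**2 + w12 * 2 * p * q + w22 * q**2
    return (w11 * p**2 + w12 * p * q) / mean_w


def mean_fitness(p, s1, s2):
    q = 1.0 - p
    return (1.0 - s1) * p**2 + 2 * p * q + (1.0 - s2) * q**2


s1, s2 = 0.1, 0.3  # hypothetical selection coefficients

# Iterate from a low starting frequency until the frequency settles.
p = 0.01
for _ in range(2000):
    p = next_p(p, s1, s2)

p_predicted = s2 / (s1 + s2)  # from equating the marginal fitnesses

# Frequency on a grid of 1001 points at which mean fitness is largest.
grid = [i / 1000 for i in range(1001)]
p_max_fitness = max(grid, key=lambda x: mean_fitness(x, s1, s2))

print(p, p_predicted, p_max_fitness)
```

With these values all three numbers should be close to 0.75, in line with the statement that the equilibrium frequency is also where the mean fitness is maximized.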
[Figure residue. Surviving caption fragments: "Figure 10.14: Top) The change in fre- [...] how the frequency change is positive below the equilibrium frequency (pe) [...] equilibrium frequency (pe). Code here." Surviving labels from a second figure: "Yearly Viability x Fecundity", "offspring per year"; genotype categories: Ho+ Ho+, Hop Ho+, Hop Hop.]
fitness (right plot Figure 10.15). The allele is thus balanced at intermediate
frequency (∼ 50%) in the population due to this trade off
between fitness at different life history stages.

Margin note: The fitnesses here are chosen to roughly match those of the real Soay sheep example, as a full model would require us to more carefully model the life-histories of the sheep.

Question 7. Assume that the frequency of the Hop allele is 10%,
that there are 1000 males at birth, and that individual adults mate at
random.
A) What is the expected number of males with each of the three
genotypes in the population at birth?
B) Assume that a typical male individual of each genotype has the
following probability of surviving to adulthood:
Ho+ Ho+    Ho+ Hop    Hop Hop
0.5        0.8        0.8
Making the assumptions from above, how many males of each genotype
survive to reproduce?
C) Of the males who survive to reproduce, let’s say that males with
the Ho+ Ho+ and Ho+ Hop genotypes have on
average 2.5 offspring, while Hop Hop males have on average 1 offspring.
Taking into account both survival and reproduction, how many offspring
do you expect each of the three genotypes to contribute to the
total population in the next generation?
D) What is the frequency of the Ho+ allele in the sperm that will
form this next generation?
E) How would your answers to B-D change if the Hop allele was at
90% frequency?
[Figure residue: three panels with x-axis "Genotype" (0, 1, 2). Surviving caption fragments: "... fitness of each genotype away from [...] is shown as black dots. The area [...] The additive genetic fitness of each genotype is shown as a red dot. The [...] linear regression between fitness and [...] the difference between the average mean-centered phenotype and additive genetic value for each genotype. The left panel shows p = 0.1 and the right panel shows p = 0.9; in the middle panel the frequency is set to the equilibrium frequency. Code here."]

To push our understanding of heterozygote advantage a little further,
note that the marginal fitnesses of our alleles are equivalent to
the additive effects of our alleles on fitness. Recall from our discussion
of non-additive variation (Section 7.1.1) that the difference in the
additive effects of the two alleles gives the slope of the regression of
fitness on additive genotype, and that there is additive variance in
fitness when this slope is non-zero. So what’s happening here in our
heterozygote advantage model is that the marginal fitness of the A1
allele, the additive effect of allele A1 on fitness, is greater than the
marginal fitness of the A2 allele (w̄1 > w̄2 ) when A1 is at low fre-
quency in the population. In this case, the regression of fitness on the
number of A1 alleles in a genotype has a positive slope. This is true
when the frequency of the A1 allele is below the equilibrium frequency.
If the frequency of A1 is above the equilibrium frequency, then the
marginal fitness of allele A2 is higher than the marginal fitness of allele
A1 (w̄1 < w̄2 ) and the regression of fitness on the number of copies
of allele A1 that individuals carry is negative. In both cases there is
additive genetic variance for fitness (VA > 0) and the population has
a directional response. Only when the population is at its equilibrium
frequency, i.e. when w̄1 = w̄2 , is there no additive genetic variance
(VA = 0), as the linear regression of fitness on genotype has zero slope.
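The link between marginal fitnesses, the regression slope, and the additive variance can be checked numerically. The Python sketch below uses hypothetical values s1 = 0.1 and s2 = 0.3 (so the equilibrium is at p = 0.75) and computes the regression slope directly as a covariance over a variance under Hardy-Weinberg proportions. The expression VA = 2pq × slope² is the standard single-locus formula, stated here as an assumption since it is not written out in the paragraph above.

```python
def additive_summary(p, s1, s2):
    """Marginal fitnesses, regression slope and additive variance in fitness."""
    q = 1.0 - p
    w = {2: 1.0 - s1, 1: 1.0, 0: 1.0 - s2}      # fitness by number of A1 copies
    freq = {2: p**2, 1: 2 * p * q, 0: q**2}      # Hardy-Weinberg proportions

    w1_bar = p * w[2] + q * w[1]                 # marginal fitness of A1
    w2_bar = p * w[1] + q * w[0]                 # marginal fitness of A2

    # Least-squares slope of fitness on the number of A1 copies.
    mean_x = sum(freq[g] * g for g in freq)
    mean_w = sum(freq[g] * w[g] for g in freq)
    cov = sum(freq[g] * (g - mean_x) * (w[g] - mean_w) for g in freq)
    var_x = sum(freq[g] * (g - mean_x) ** 2 for g in freq)
    slope = cov / var_x

    v_a = 2 * p * q * slope**2                   # additive genetic variance
    return w1_bar, w2_bar, slope, v_a


s1, s2 = 0.1, 0.3
results = {p: additive_summary(p, s1, s2) for p in (0.1, 0.75, 0.9)}
for p, (w1_bar, w2_bar, slope, v_a) in results.items():
    print(f"p = {p}: w1 - w2 = {w1_bar - w2_bar:+.4f}, slope = {slope:+.4f}, VA = {v_a:.6f}")
```

The slope should match the difference in marginal fitnesses at every frequency, being positive below the equilibrium, negative above it, and zero (so VA = 0) at the equilibrium itself.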
genotype                       A1 A1                       A1 A2                       A2 A2
absolute fitness               $W_{11}$               >    $W_{12}$               <    $W_{22}$
relative fitness (generic)     $w_{11} = W_{11}/W_{12}$    $w_{12} = W_{12}/W_{12}$    $w_{22} = W_{22}/W_{12}$
relative fitness (specific)    $1 + s_1$                   $1$                         $1 + s_2$
[Figure residue: three panels with axes p against generations, ∆p against p, and w against p.]
[Figure residue. Surviving labels: "Frequency", "Pinon Flats". Surviving caption fragments: "... Dobzhansky (1946). Right) The frequency of an allele at the Insulin- [...] an Orchard in Pennsylvania. Data ..."]
i.e. if the heterozygote has higher geometric mean fitness than the
A2 A2 homozygote.
The question now is whether allele A1 will approach fixation in
the population, or whether there are cases in which we can obtain a
balanced polymorphism. To investigate that, we can simply repeat our
analysis for q ≪ 1, and see that in that case
$$\frac{p_{t+1}}{q_{t+1}} = \left( \prod_{i=0}^{t} \frac{w_{11,i}}{w_{12,i}} \right) \frac{p_0}{q_0}. \tag{10.44}$$
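The product in eqn. (10.44) is what ties the fate of a rare allele to geometric mean fitnesses. As a small Python illustration, with made-up fitness sequences, the multiplier on the odds p/q over many generations equals the ratio of geometric means raised to the number of generations.

```python
import math


def odds_multiplier(w_hom, w_het):
    """Product over generations of w_hom,i / w_het,i, as in eqn. (10.44)."""
    return math.prod(a / b for a, b in zip(w_hom, w_het))


def geometric_mean(ws):
    return math.exp(sum(math.log(w) for w in ws) / len(ws))


# Hypothetical fluctuating environment: the A1A1 homozygote alternates between
# good and bad years, while the heterozygote fitness stays constant.
n_generations = 100
w11 = [1.5, 0.5] * (n_generations // 2)
w12 = [1.0] * n_generations

multiplier = odds_multiplier(w11, w12)
from_geometric_means = (geometric_mean(w11) / geometric_mean(w12)) ** n_generations
print(multiplier, from_geometric_means)
```

Here the arithmetic mean fitness of the homozygote equals that of the heterozygote, but its geometric mean is lower (about 0.87), so the multiplier is far below one: when A2 is rare the odds p/q keep shrinking, meaning that A2 increases.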
[Figure residue. Surviving caption fragment: "... the left plot (for an initial frequency ..."]
10.2 Sex ratios, sex ratio distorters, and other selfish elements.
[Figure residue. Axes: Sex ratio (% males) against Generation. Surviving caption fragment: "... has manipulable sex ratio due to its three factor sex determination. She started two replicates with a strong female bias (black) and two replicates ..."]
10.2.1 Selfish genetic elements and selection below the level of the
individual.
Selfish sex chromosomes and sex ratio distortion. From the perspective
of the autosomes a 50/50 sex ratio normally represents a stable
strategy, but all is not always harmonious in the genome. In systems
with XY sex determination, fertilization by a male’s Y-bearing sperm
leads to sons, while fertilization by his X-bearing sperm leads to
daughters. From the viewpoint of the X chromosome the Y-bearing
sperm, and a male’s sons, are an evolutionary dead end. We can imagine
a mutation arising on the X chromosome that causes a poison to
be released during gametogenesis that kills Y-bearing sperm. This
would cause much of the ejaculate of the males carrying this mutation
to be X-bearing sperm, and so these males would have mostly daughters.
Such an allele would potentially spread in the population as it
is over-transmitted through males, even if it somewhat reduces the
fitness of the individuals who carry it. The spread of this allele would
strongly bias the population sex ratio towards females. Such ‘selfish’ X
alleles turn out to be relatively common, and they can often substantially
lower the fitness of the bearer. They do not spread because they
are good for the individual but rather because they are favoured due
to selection below the level of the individual.
One example of a selfish X chromosome allele is the Winters sex-ratio
system found in Drosophila simulans, so named as it was found
in flies collected around Winters, California (just a few miles down the
road from Davis). In crosses, males carrying the selfish X chromosome
have > 80% daughters. The gene responsible, Dox (Distorter on the
X), is a gene duplicated by transposition and produces a transcript
which targets a region on the Y chromosome, preventing the Y-bearing
sperm from developing (see Figure 10.29, from Tao et al., 2007).
The spread of such selfish sex chromosomes, distorting the sex ratio
strongly away from 50/50, can have profound effects for population
growth rates.[3] However, the other sex chromosome and autosomes are

Figure 10.29: Top) Normally developing spermatids in D. simulans. Bottom) Abnormally developing spermatids in a male expressing dox. The spermatids that look like rice crispies carry the Y chromosome, the normal, slender spermatids are X-bearing spermatids. Figure from Tao et al. (2007), cropped, licensed under CC BY 4.0.

[3] Indeed people have long discussed using selfish Y chromosomes, driving an over production of sons, for population control of malaria-spreading mosquitos. Natural selfish systems on the Y appear rare, likely because of its low gene content.
Question 10. Kin selection has been proposed as a way that the
male deleterious mitochondrial mutations could be removed from the
population. Can you explain this idea?
It’s not just chromosomes that get in on the act of the battle of
the sexes. Numerous arthropods, including a high proportion of insects,
are infected with the intracellular bacteria Wolbachia, which
are passed to offspring through the maternal cytoplasm. As they are
only transmitted by females, Wolbachia increase their transmission in
a variety of selfish ways including feminization of males and killing
male embryos. In one dramatic case, a male-killing Wolbachia strain
forced a sex ratio of 100 females to every 1 male in Hypolimnas bolina
(eggspot butterflies) throughout Southeast Asia. This extreme sex
ratio persisted for many decades, according to the analysis of museum
collections from the late 19th century, before the sex ratio was rapidly restored
to 50/50 by the spread of an autosomal suppressing allele. The autosomal
suppressor allele spread very rapidly within populations, taking just
5 years to spread through the population from 2001 to 2006.

Figure 10.33: male Eggspot butterfly (Hypolimnas bolina). P. Cramer’s Uitlandsche kapellen (1780). Image from the Biodiversity Heritage Library. Contributed by Smithsonian Libraries. Not in copyright.
Selfish Autosomal Systems. Selfish genetic systems can also arise and
cause genetic conflicts on the autosomes. The interests of autosomal
alleles are usually relatively well aligned with promoting the fitness of
the individual who carries them. However, these interests can diverge
during meiosis and gametogenesis. After all, there are two alleles at
each autosomal locus but only one of them will get passed to a child,
therefore there can be competition to be in the gamete transmitted to the
next generation.
The four products of meiosis in the fungus Podospora anserina are
arrayed in the ascus[4] of the spores for the next generation. There is
a polymorphism S/T at the Spok gene in this species. In spores from
SxS and TxT individuals all four products are present. However, only
two out of four spores are present in ∼ 90% of asci from SxT
individuals (Grognet et al., 2014). The T allele is releasing a toxin
that poisons off the S carrying spores. The jury is still out on whether
the T allele spread due to the advantage created by sabotaging its
rival product of meiosis (Sweigart et al., 2019). However, in other
systems it is clear that alleles have spread due to their selfish actions.

[4] from the Greek word askos meaning wineskin.
male meiosis all four products of meiosis become gametes. However,
only 1 of the four products of female meiosis becomes the egg, the
other 3 products are fated to become the polar bodies. Thus alleles
can cheat in female meiosis by preferentially getting transmitted
into the egg rather than the polar body. If an allele on a red chromosome
(in top panel of Figure 10.35) can manipulate any asymmetry of
meioses so that it can be present in the egg > 50% of the time it will
have a transmission advantage in female heterozygotes.
To see how such drivers can spread through the population let’s
consider the case of a population where an allele drives in both male
and female gametogenesis. (Most selfish alleles will be sex-specific, but
that makes the math a little more tricky.) Imagine a randomly-mating
population of hermaphrodites. In this population, a derived allele (D)
segregates that distorts transmission in its favour over the ancestral
allele (d) in the production of all the gametes of heterozygotes. The
drive leads a fraction α of the gametes of heterozygotes (D/d) to
carry the D allele (α ≥ 0.5). The D allele causes viability problems
such that the relative fitnesses are wdd = 1, 1 > wDd ≥ wDD . If

Figure 10.36: The fate of an unfit transmission distorter allele. If transmission is fair (α = 1/2, blue curve) the allele is lost, but the stronger its drive in heterozygotes the faster its spread and the higher its final frequency in the population (black and red curves, α = 0.7 & 0.9 respectively). With fitnesses wdd = 1, wDd = 0.95, and wDD = 0.1. The dotted lines show the predicted equilibrium. Code here.
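The qualitative behaviour described for Figure 10.36 can be reproduced with a short Python sketch. The recursion below is an assumption on our part (viability selection on zygotes, followed by gamete production in which D/d heterozygotes pass on D with probability α) rather than a formula quoted from the text, and the starting frequency of 0.01 and the 500 generations are also illustrative. The fitnesses wDd = 0.95 and wDD = 0.1 are those given in the figure caption.

```python
def next_freq_D(p, alpha, w_Dd, w_DD, w_dd=1.0):
    """Frequency of the driver D among gametes after selection and drive."""
    q = 1.0 - p
    mean_w = w_DD * p**2 + w_Dd * 2 * p * q + w_dd * q**2
    # DD adults transmit only D; surviving D/d adults transmit D in a
    # fraction alpha of their gametes (alpha = 0.5 is fair Mendelian transmission).
    return (w_DD * p**2 + alpha * w_Dd * 2 * p * q) / mean_w


def final_frequency(alpha, p0=0.01, generations=500, w_Dd=0.95, w_DD=0.1):
    p = p0
    for _ in range(generations):
        p = next_freq_D(p, alpha, w_Dd, w_DD)
    return p


final_fair = final_frequency(0.5)
final_moderate = final_frequency(0.7)
final_strong = final_frequency(0.9)
print(final_fair, final_moderate, final_strong)
```

Under these assumptions the fairly transmitted allele is lost, while the two driving alleles settle at intermediate frequencies, with the stronger driver reaching the higher one.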
$$\left. \frac{\partial W(s, s_p)}{\partial s} \right|_{s^* = s = s_p} = 0 \tag{10.47}$$
rate. Intuitively this is because, given a fixed mutation rate, less dele-
terious alleles can rise to a higher equilibrium frequency, and thus
contribute the same total load as more deleterious (rarer) alleles, but
this load is spread across more individuals in the population. Note
that this result applies only if the mutation is not totally recessive, i.e.
if h > 0.
A fitness reduction of 2µ is very small, given that the mutation
rate of a gene is likely < $10^{-5}$. However, if there are many loci segregating
at mutation–selection balance, small fitness reductions can
accumulate to a substantial so-called genetic load, a major cause of
variation in fitness-related traits among individuals. For example,
the human genome contains over twenty thousand genes, and many
other functional regions, the vast majority of which will be subject
to purifying selection against mutations that disrupt their function.
In humans, most loss of function (LOF) variants, which severely dis-
rupt a protein-coding gene, are found at low frequencies. However,
each human genome typically carries over a hundred LOF variants
(MacArthur et al., 2012; Lek et al., 2016). Not every LOF allele
will be deleterious; some could even be advantageous. However, the
combined load of these LOF alleles must on average lower our fitness,
otherwise selection wouldn’t be removing them from the population.
Each one of us carries a unique set of these LOF alleles, usually in a
heterozygous state. We differ slightly in how many of these alleles we
carry. For example, the left side of Figure 11.2 shows the distribution
of the number of LOF alleles carried by 769 individuals of Dutch an-
cestry. The individuals who carry fewer of these LOF alleles will on
average have higher fitness than those individuals with more.
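The arithmetic behind these statements can be laid out in a few lines of Python. This is a back-of-the-envelope sketch: the function names are our own, the equilibrium frequency µ/(hs) is the mutation–selection balance approximation for a partially recessive allele, the mutation rate of 10^−5 is the upper bound quoted above, and the number of loci is a round figure based on the gene count mentioned in the text. The per-allele cost sh = 10^−2 and the average of 144 LOF alleles are the values given in the caption of Figure 11.2.

```python
def equilibrium_frequency(mu, s, h):
    """Approximate mutation-selection balance frequency when h > 0."""
    return mu / (h * s)


def load_per_locus(mu, s, h):
    """Reduction in mean fitness: heterozygotes (frequency ~2q) each lose h*s."""
    q = equilibrium_frequency(mu, s, h)
    return 2 * q * h * s


mu = 1e-5
load_strong = load_per_locus(mu, s=0.1, h=0.5)    # strongly deleterious allele
load_weak = load_per_locus(mu, s=0.01, h=0.5)     # weakly deleterious allele
print(load_strong, load_weak)                     # both are 2 * mu

# Many loci at mutation-selection balance, combining multiplicatively.
n_loci = 20_000
mean_fitness_all_loci = (1 - 2 * mu) ** n_loci
print(mean_fitness_all_loci)


def relative_fitness(n_lof, sh=1e-2, reference=144):
    """Fitness relative to an individual carrying `reference` LOF alleles."""
    return (1 - sh) ** (n_lof - reference)


print(relative_fitness(130), relative_fitness(144), relative_fitness(160))
```

The per-locus load does not depend on s, yet across twenty thousand loci the mean fitness drops to roughly two thirds of that of a mutation-free genotype, and individuals carrying fewer LOF alleles than average come out measurably fitter.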
[Figure 11.2 residue. Left panel y-axis: Number of Individuals; right panel y-axis: Relative Fitness. Surviving caption fragments: "... project. Data from Francioli et al. [...] The average individual (red line) carries 144 LOF alleles. Right) The relative fitness of individuals carrying these varying numbers of LOF alleles, assuming multiplicative selection and a selection coefficient of sh = $10^{-2}$ acting against these alleles (Cassa ..."]
[Figure residue. y-axis: Germination Rate; cross types: Outbred, Half-sib, Full-sib. Surviving caption fragment: "... depression over different degrees of inbreeding in S. latifolia. Each point is the mean seed germination rates for different family crosses. Data from Richards. Code here."]
[Figure residue. Axis labels: Zn (P.P.M.), Zinc Tolerance.]
ically scales, over which seed and pollen will definitely move, we see
strong local adaptation. Zinc-intolerant alleles are nearly absent from
the mine tailings because they prevent plants from growing on these
zinc-heavy soils; conversely, zinc-tolerant alleles do not spread into
the meadow populations, likely due to some trade-off or fitness cost of
zinc-tolerance.
As a first pass at developing a model of local adaptation, let’s con-
sider a haploid two-allele model with two different populations, see
Figure 11.7, where the relative fitnesses of our alleles are as follows
$$\Delta_{Mig.} q_1 \approx m \tag{11.10}$$
while as noted above $\Delta_S q_1 = -s q_1$, so that migration and selection
are at an equilibrium when $0 = \Delta_S q_1 + \Delta_{Mig.} q_1$, i.e. an equilibrium
frequency of allele 2 in population 1 of
$$q_{e,1} = \frac{m}{s} \tag{11.11}$$

Figure 11.7: Setup of a two-population haploid model of local adaptation.
Here, migration is playing the role of mutation and so migration–
selection balance (at least under strong selection) is analogous to
mutation–selection balance.
We can use this same model by analogy for the case of migration–
selection balance in a diploid model. For the diploid case, we replace
our haploid s by the cost to heterozygotes hs from our directional
selection model, resulting in a diploid migration–selection balance
equilibrium frequency of
$$q_{e,1} = \frac{m}{hs} \tag{11.12}$$
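We can check the migration–selection balance result by iterating the haploid model directly. In the Python sketch below, the order of events (selection within population 1, then replacement of a fraction m of individuals by migrants), the assumption that population 2 is fixed for allele 2, and all parameter values are illustrative choices. For the diploid case the sketch simply plugs hs in place of the haploid s, following the analogy made above.

```python
def migration_selection_equilibrium(m, s, q0=0.5, generations=5000):
    """Iterate selection then migration for allele 2 in population 1."""
    q = q0
    for _ in range(generations):
        # Selection: allele 2 has relative fitness 1 - s in population 1.
        q_after_selection = q * (1 - s) / (1 - s * q)
        # Migration: a fraction m of population 1 is replaced by migrants
        # from population 2, which is assumed to be fixed for allele 2.
        q = (1 - m) * q_after_selection + m * 1.0
    return q


m, s, h = 0.001, 0.1, 0.5
q_haploid = migration_selection_equilibrium(m, s)
q_diploid_analogue = migration_selection_equilibrium(m, h * s)
print(q_haploid, m / s)
print(q_diploid_analogue, m / (h * s))
```

With migration much weaker than selection, the iterated frequencies should sit at about m/s = 0.01 and m/(hs) = 0.02 respectively.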
As an example of fine-scale local adaptation due to a single lo-
cus, consider the case of the rock pocket mice adapting to lava flows.
Throughout the deserts of the American Southwest there are old lava
flows, where the rocks and soils are much darker than the surrounding
desert.
[Figure residue. Plotted series: Melanic Phenotype, MC1R D allele; x-axis: Location.]
The width of a genetic cline. We can also extend these ideas beyond
our discrete model to a model of a population spread out on a land-
les are being moved back and forth over the environmental transition
much faster than selection can act against these alleles and so the cline
would be very wide.
The width of our cline, i.e. the distance over which we make this
shift from allele 2 to allele 1 predominating, can be defined in a number
of different ways. One way to define the cline width, which is
[Figure residue. Axes: Frequency of allele 2, q(x), against Position x, km.]
[Figure residue: two panels, y-axis "Allele frequency", with Treated and Untreated areas marked along the x-axis. Surviving caption fragment: "... allele frequencies, the solid lines clines fit under a migration-selection balance model of a cline. These allele frequencies represent collections over two summers, the frequencies of the alleles are substantially reduced in the winter due to the reduced use of pesticides. Data from Lenormand et al. (1999). Code here."]
higher cost to the Ace 1 allele than the Ester allele in untreated areas
(c = 0.11 and c = 0.07 respectively), potentially explaining the less
extreme cline for the Ester allele than the Ace 1 allele. Despite these
strong selection pressures we still see a cline over tens of kilometers
because dispersal is relatively high (σ = 6.6 km per generation).
Hybrid zones. Local adaptation isn’t the only way that selection can
generate strong spatial patterns. We can also see strong selection-driven
clines when partially-reproductively isolated species spread
back into secondary contact. There they can hybridize, bringing alleles
together that may not work well with each other. One simple model of
this is to think about an under-dominant polymorphism, i.e. where the
heterozygote has lower fitness. The two ancestral populations are
alternatively fixed for the two fitter homozygote states, e.g. ancestral
population 1 fixed for A1 A1 and ancestral population 2 for A2 A2 . The
hybrid population forming at the mating edge between the two ancestral
populations has a high frequency of the less fit heterozygotes.
Thus hybrids are at a disadvantage, potentially acting to keep the two
populations from collapsing into each other.
[Figure residue. x-axis: Distance (m). Surviving caption fragments: "... southern locations to the right of the graph). This represents data from four different valleys in the French [...] The red curve is the fitted cline under a model of heterozygote disadvantage (Bazykin, 1969). Data from Barton and Hewitt (1981), Code here."]
The stronger the selection the more abrupt the transition between
the populations. These wingless grasshoppers move σ ∼ 20 meters
a generation. Thus a reduction in the relative fitness of the hybrid
would be needed to explain this hybrid zone with a width of ∼ 800m.
More generally we can see tension zones arise when hybrids have reduced
fitness compared to either species. For example, this can occur
due to bad epistatic interactions between alleles from each
species. If selection is strong enough on hybrids, often because many
loci are involved in incompatibilities between the species, the entire
genome can be tied up in a tension zone between the two species.
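To obtain q(x + ∆x, t), let’s take a Taylor series expansion of q(x, t):
$$q(x + \Delta x, t) = q(x, t) + \Delta x \frac{dq(x, t)}{dx} + \frac{1}{2} (\Delta x)^2 \frac{d^2 q(x, t)}{dx^2} + \cdots \tag{11.16}$$
then
$$q(x, t+1) = q(x, t) + \left( \int_{-\infty}^{\infty} \Delta x \, g(\Delta x) \, d\Delta x \right) \frac{dq(x, t)}{dx} + \frac{1}{2} \left( \int_{-\infty}^{\infty} (\Delta x)^2 g(\Delta x) \, d\Delta x \right) \frac{d^2 q(x, t)}{dx^2} + \cdots \tag{11.17}$$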
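Because g( ) has a mean of zero, $\int_{-\infty}^{\infty} \Delta x \, g(\Delta x) \, d\Delta x = 0$, and
because g( ) has variance σ², $\int_{-\infty}^{\infty} (\Delta x)^2 g(\Delta x) \, d\Delta x = \sigma^2$. We drop all higher
order terms in our Taylor series expansion (for a normal g( ), the odd
moments are zero, and the higher even moments are of order σ⁴ and above,
which we neglect by assuming that σ is small compared to the distance
over which q(x, t) changes). Looking at the change
in allele frequency, ∆q(x, t) = q(x, t + 1) − q(x, t), so
$$\Delta q(x, t) = \frac{\sigma^2}{2} \frac{d^2 q(x, t)}{dx^2} \tag{11.18}$$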
This is a diffusion equation, so that migration is acting to smooth
out allele frequency differences with a diffusion constant of σ²/2. This is
exactly analogous to the equation describing how a gas diffuses out to
equal density, as both particles in a gas and our individuals of type 2
are performing Brownian motion (blurring our eyes and seeing time as
continuous).
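As a rough numerical check of the diffusion approximation in eqn. (11.18), the Python sketch below compares the exact change in allele frequency after one round of normally distributed dispersal with the prediction (σ²/2) d²q/dx². The logistic frequency profile, its width of 50, the dispersal distance σ = 5, and the evaluation point x = 30 are all hypothetical choices made only for illustration.

```python
import math


def q_profile(x, width=50.0):
    """A hypothetical smooth allele-frequency profile (not from the text)."""
    return 1.0 / (1.0 + math.exp(x / width))


def after_migration(x, sigma, n_steps=2001, span=8.0):
    """Average of q(x + dx) over dx ~ Normal(0, sigma^2), by simple quadrature."""
    lo = -span * sigma
    step = 2 * span * sigma / (n_steps - 1)
    total, weight = 0.0, 0.0
    for i in range(n_steps):
        dx = lo + i * step
        g = math.exp(-dx * dx / (2 * sigma * sigma))  # unnormalised normal density
        total += g * q_profile(x + dx)
        weight += g
    return total / weight


def diffusion_prediction(x, sigma, eps=0.01):
    """(sigma^2 / 2) times a central-difference estimate of d^2 q / dx^2."""
    d2q = (q_profile(x + eps) - 2 * q_profile(x) + q_profile(x - eps)) / eps**2
    return sigma**2 / 2 * d2q


x, sigma = 30.0, 5.0
exact_change = after_migration(x, sigma) - q_profile(x)
predicted_change = diffusion_prediction(x, sigma)
print(exact_change, predicted_change)
```

Because σ is small relative to the scale over which the profile changes, the two numbers should agree to within a few percent; making σ comparable to the profile width would make the neglected higher-order terms matter.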
We will now introduce fitness differences into our model and set the
relative fitnesses of allele 1 and 2 at location x to be 1 and 1 + sγ(x).
To make progress in this model, we’ll have to assume that selection
isn’t too strong, i.e. sγ(x) ≪ 1 for all x. The change in frequency of
allele 2 obtained within a generation due to selection is
$$q'(x, t) - q(x, t) \approx s \gamma(x) q(x, t) \left( 1 - q(x, t) \right) \tag{11.19}$$
i.e. logistic growth of our favoured allele at location x. Putting our
selection and migration terms together, we find the total change in
allele frequency at location x in one generation is
$$q(x, t+1) - q(x, t) = s \gamma(x) q(x, t) \left( 1 - q(x, t) \right) + \frac{\sigma^2}{2} \frac{d^2 q(x, t)}{dx^2} \tag{11.20}$$
In deriving this result, we have essentially assumed that migration
acted upon our original allele frequencies before selection, and in doing
so have ignored terms of the order of σs.
$$s \gamma(x) q(x) \left( 1 - q(x) \right) = - \frac{\sigma^2}{2} \frac{d^2 q(x)}{dx^2} \tag{11.21}$$
We then could solve this differential equation with appropriate bound-
ary conditions (q(−∞) = 1 and q(∞) = 0) to arrive at the appropriate
functional form for our cline. While we won’t go into the solution of
this equation here, we can note that by dividing our distance x by
$\ell = \sigma / \sqrt{s}$, we can remove the effect of our parameters from the above
equation. This compound parameter ℓ is the characteristic length of
our cline, and it is this parameter which determines over what geo-
graphic scale we change from allele 2 predominating to allele 1 pre-
dominating as we move across our environmental shift.
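The scaling of the cline with ℓ can be seen by iterating eqn. (11.20) on a grid until the cline settles. The Python sketch below is one way to do this under several assumptions of our own: a step environment in which allele 2 is favoured (γ = +1) to the left of x = 0 and disfavoured (γ = −1) to the right, a simple finite-difference second derivative, fixed frequencies of 1 and 0 at the ends of the grid, and a measure of cline width taken as the distance between the points where q = 0.8 and q = 0.2.

```python
def equilibrium_cline(s, sigma=1.0, dx=2.0, half_width=150.0):
    """Iterate eqn. (11.20) on a grid until the cline has (nearly) settled."""
    n = int(round(2 * half_width / dx)) + 1
    xs = [-half_width + i * dx for i in range(n)]
    # Assumed step environment: gamma = +1 for x < 0 and -1 for x > 0.
    gamma = [1.0 if x < 0 else (-1.0 if x > 0 else 0.0) for x in xs]
    q = [1.0 if x < 0 else (0.0 if x > 0 else 0.5) for x in xs]
    diffusion = sigma**2 / (2 * dx**2)
    for _ in range(int(10 / s)):
        new_q = q[:]  # the end points stay fixed at q = 1 and q = 0
        for i in range(1, n - 1):
            selection = s * gamma[i] * q[i] * (1 - q[i])
            migration = diffusion * (q[i + 1] - 2 * q[i] + q[i - 1])
            new_q[i] = q[i] + selection + migration
        q = new_q
    return xs, q


def crossing(xs, q, level):
    """Position where the decreasing cline passes through `level`."""
    for i in range(len(q) - 1):
        if q[i] >= level > q[i + 1]:
            frac = (q[i] - level) / (q[i] - q[i + 1])
            return xs[i] + frac * (xs[i + 1] - xs[i])
    raise ValueError("level not crossed")


def cline_width(s):
    xs, q = equilibrium_cline(s)
    return crossing(xs, q, 0.2) - crossing(xs, q, 0.8)


width_strong = cline_width(0.01)    # characteristic length sigma / sqrt(s) = 10
width_weak = cline_width(0.0025)    # characteristic length sigma / sqrt(s) = 20
print(width_strong, width_weak, width_weak / width_strong)
```

Cutting s by a factor of four doubles ℓ, so the measured width should roughly double as well, whatever constant relates this particular width measure to ℓ.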
12
The Impact of Genetic Drift on Selected Alleles
Figure 12.1 panel (each shows generations 1 and 2)    A        B        C          D
Prob.                                                 $P_0$    $P_1$    $P_2$      $P_3$
Prob. loss                                            $1$      $p_L$    $p_L^2$    $p_L^3$
2. Alternatively, our allele could leave one copy of itself to the next
generation (with probability $P_1$), in which case with probability $p_L$
this copy eventually goes extinct (Figure 12.1B).
3. Our allele could leave two copies of itself to the next generation
(with probability $P_2$), in which case with probability $p_L^2$ both of
these copies eventually go extinct (Figure 12.1C).
4. More generally, our allele could leave k copies (k > 0)
of itself to the next generation (with probability $P_k$), in which case
with probability $p_L^k$ all of these copies eventually go extinct (e.g.
Figure 12.1D).
$$P_i = \frac{(1 + s)^i e^{-(1+s)}}{i!} \tag{12.2}$$
Substituting Pk into the equation above, we see
$$p_L = \sum_{k=0}^{\infty} \frac{(1 + s)^k e^{-(1+s)}}{k!} p_L^k = e^{-(1+s)} \left( \sum_{k=0}^{\infty} \frac{\left( p_L (1 + s) \right)^k}{k!} \right) \tag{12.3}$$
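Eqn. (12.3) has no simple closed-form solution, but it is easy to solve numerically. The sum in the brackets is the series for an exponential, so the equation reads p_L = exp(−(1 + s)(1 − p_L)). The Python sketch below finds p_L by repeatedly applying this map starting from zero, which is one simple approach rather than the only one; the function name and tolerance are our own choices. It then compares the probability of escaping loss, 1 − p_L, with the value 2s that the 2hs formula later in this chapter gives when hs is replaced by s.

```python
import math


def prob_loss(s, tol=1e-14, max_iter=1_000_000):
    """Smallest solution of p_L = exp(-(1 + s) * (1 - p_L)), for s > 0."""
    p_L = 0.0
    for _ in range(max_iter):
        new_p_L = math.exp(-(1 + s) * (1 - p_L))
        if abs(new_p_L - p_L) < tol:
            return new_p_L
        p_L = new_p_L
    return p_L


escape = {s: 1 - prob_loss(s) for s in (0.01, 0.05, 0.1)}
for s, p_F in escape.items():
    print(f"s = {s}: probability of escaping loss = {p_F:.4f}, 2s = {2 * s:.4f}")
```

For small s the probability of escaping loss should be close to, but slightly below, 2s, with the gap widening as s grows.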
Solving for pL would give us our probability of loss for any selection
coefficient. Let’s rewrite our result in terms of the probability of
escaping loss, pF = 1 − pL . We can rewrite eqn. (12.4) as
reason why combinations of drugs are used against pathogens like HIV
and malaria is that, even if the pathogen adapts to one of the drugs, the
pathogen load (N ) of the patient is greatly reduced, making it very unlikely
that the population will manage to fix a second drug-resistant
allele.
pF = 2hs (12.8)
Over roughly the past ten thousand years, adaptive alleles con-
ferring resistance to malaria have arisen in a number of genes and
spread through human populations in areas where malaria is en-
demic (Kwiatkowski, 2005). One particularly impressive case of
convergent evolution in response to selection pressures imposed by
malaria are the numerous changes throughout the G6PD gene, which
include at least 15 common variants in Central and Eastern Asia
alone that lower the activity of the enzyme (Howes et al., 2013).
These alleles are now found at a combined frequency of around 8%
in malaria endemic areas, rarely exceeding 20% (Howes
et al., 2012). Whether these variants all confer resistance to malaria
is unknown, but a number of these alleles have demonstrated effects
against malaria and are thought to have a selective advantage to het-
erozygotes sh > 5% where malaria is endemic (Ruwende et al.,
1995; Tishkoff et al., 2001; Louicharoen et al., 2009).
With a 5% advantage in heterozygotes, a G6PD allele present
as a single copy would only have a 10% probability of fixing in the
population. If that’s so, how come malaria adaptation has repeat-
edly occurred via changes at G6PD? Well, maybe adaptation didn’t
start from a single copy of the selected allele. How many copies of the
G6PD-deficiency alleles do we expect were segregating in the popula-
tion before selection pressures changed?
In the absence of malaria, these G6PD alleles are deleterious with
carriers suffering from G6PD deficiency, leading to hemolytic anemia
when individuals are exposed to a variety of different compounds,
notably those present in fava beans. There’s upward of one hundred bases where G6PD-deficiency alleles can arise, so assuming a mutation rate of ≈ $10^{-8}$ per base pair per generation, we can roughly estimate the rate of mutations arising that affect the G6PD gene as $\mu \approx 10^{-6}$ per generation. In the absence of malaria, the selective cost of being a heterozygous carrier of a G6PD-deficient allele must have been on the order of 5% or more, and thus the frequency of the allele under mutation-selection balance would have been ≈ $10^{-6}/0.05 = 2 \times 10^{-5}$. Assuming an effective population size of 2 to 20 million individuals roughly five to ten thousand years ago, that means that there would have been forty to four hundred copies of the G6PD-deficiency allele present in the population when selection pressures shifted at the introduction of malaria. The chance that any one of these newly adaptive alleles is lost is 90%, but the chance that they’re all lost is at most $(0.9)^{40} \approx 0.02$, i.e. there would have been a greater than 98% chance that adaptation would occur via one or more alleles at G6PD. How many alleles would escape drift? Well with 40 to 400 copies of the allele pre-malaria, and each of them having a 10% probability of escaping drift, we expect between 4 and 40 G6PD alleles to escape drift and contribute to adaptation. We see 15 common G6PD alleles in Eurasia, so our simple model of adaptation from mutation-selection balance seems reasonable.

Figure 12.3: Pythagoras’s “just say no to fava beans” campaign. Pythagoras prohibited the consumption of fava beans by his followers; perhaps because favism, the anemia induced in G6PD-deficient individuals by fava beans, is relatively common in the Mediterranean due to adaptation to endemic malaria. French early 16th Century. Woodner Collection, National Gallery of Art. Public Domain, wikimedia.

A full analysis of this case requires modeling of G6PD’s X chromosome inheritance, and the randomness in the number of copies of the allele present at mutation-selection balance (Ralph and Coop, 2015).
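The arithmetic in this example is easy to redo. The following is a minimal sketch in Python (the variable and function names are ours, chosen for illustration), using only the numbers quoted above: a per-base mutation rate of 10^-8, roughly one hundred target bases, a 5% heterozygous cost, and a 10% chance that any one copy escapes drift.

```python
# Back-of-the-envelope numbers for the G6PD example (all values from the text).
mu_bp = 1e-8       # mutation rate per base pair per generation
n_sites = 100      # bases where a G6PD-deficiency allele can arise
cost = 0.05        # selective cost to heterozygous carriers before malaria
p_escape = 0.10    # chance that a single newly adaptive copy escapes drift

mu = mu_bp * n_sites   # total mutation rate to deficiency alleles
q_eq = mu / cost       # frequency at mutation-selection balance


def prob_all_lost(copies, p_escape=p_escape):
    """Probability that every one of `copies` independent copies is lost."""
    return (1.0 - p_escape) ** copies


def expected_escaping(copies, p_escape=p_escape):
    """Expected number of copies that escape drift."""
    return copies * p_escape


print("mutation-selection balance frequency:", q_eq)
for copies in (40, 400):   # range of standing copies quoted in the text
    print(copies, prob_all_lost(copies), expected_escaping(copies))
```

With 40 copies the chance that all are lost comes out at about 0.015, which is the "≈ 0.02" of the text; with 400 copies it is vanishingly small.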
Question 1. ‘Haldane’s sieve’ is the name for the idea that the mutations that contribute to adaptation are likely to be dominant or at least co-dominant.
A) Briefly explain this argument with a verbal model relating to the results we’ve developed in the last two chapters.
B) Haldane’s sieve is thought to be less important for adaptation from previously deleterious standing variation, than adaptation from new mutation. Can you explain the intuition behind this idea?
C) Haldane’s sieve is likely to be less important in inbred, e.g. selfing, populations. Why is this?

Figure 12.4: Haldane’s sieve. To our knowledge Haldane never wore a sieve, but we assume he owned one. Sieve, Flickr licensed under CC BY 2.0. Haldane, Public Domain wikimedia.
dominate the dynamics of alleles and they will behave like they’re
Figure 12.6: Asellid isopods have repeatedly invaded subterranean, ground-water habitats from surface-water habitats, leading to a genome-wide increase in dN/dS and larger genomes (Data from Lefébure et al., 2017, comparing independent isopod species pairs). One possible explanation of this is that the longterm effective population sizes of the subterranean species are lower and so these species are less able to prevent mildly deleterious alleles fixing, and also less able to prevent genome expansion from the accumulation of weakly deleterious, extraneous genomic DNA. Code here.

of selection and genetic drift. But how weak must selection on an allele be for drift to overpower selection? And do these interactions between selection and drift have longterm consequences for genome-wide patterns of evolution?
To model selection and drift each generation, we can first calculate the deterministic change in our allele frequency due to selection using our deterministic formula. Then, using our newly calculated expected allele frequency, we can binomially sample two alleles for each of our offspring to construct the next generation. This approach to jointly modeling genetic drift and selection is called the Wright-Fisher model.
Under the Wright-Fisher model, we will calculate the expected
change in allele frequency due to selection and the variance around this expectation due to drift. To make our calculations simpler, let’s assume an additive model, i.e. h = 1/2, and that s ≪ 1 so that w ≈ 1. Using our directional selection deterministic model, from Chapter 10, and these approximations gives us our deterministic change due to selection
$$\Delta_S p = E(\Delta p) = \frac{s}{2}\, p(1-p) \tag{12.9}$$
To obtain our new frequency in the next generation, $p_1$, we binomially sample from our new deterministic frequency $p' = p + \Delta_S p$, so the variance in our allele frequency change from one generation to the next is given by
$$\mathrm{Var}(\Delta p) = \mathrm{Var}(p_1 - p) = \mathrm{Var}(p_1) = \frac{p'(1-p')}{2N} \approx \frac{p(1-p)}{2N}. \tag{12.10}$$
where the previous allele frequency p drops out because it is a constant and the variance in our new allele frequency follows from the fact that we are binomially sampling 2N new alleles from a frequency p′ to form the next generation.

To see this denote our new count of allele 1 by i, then
$$\mathrm{Var}(p_1 - p) = \mathrm{Var}\left(\frac{i}{2N} - p\right) = \mathrm{Var}\left(\frac{i}{2N}\right) = \frac{\mathrm{Var}(i)}{(2N)^2}$$
and from binomial sampling $\mathrm{Var}(i) = 2N p'(1-p')$ and so we arrive at our answer. Assuming that s ≪ 1, p′ ≈ p, then in practice we can use $\mathrm{Var}(\Delta p) = \mathrm{Var}(p_1 - p) \approx p(1-p)/2N$.

To get our first look at the relative effects of selection vs. drift we can simply look at when our change in allele frequency caused by selection within a generation is reasonably faithfully passed down through the generations. In particular, if our expected change in allele frequency is much greater than the variance around this change, genetic drift will play little role in the fate of our selected allele (once the allele is not at low copy number within the population). When does selection dominate genetic drift? This will happen if E(∆p) ≫ Var(∆p), i.e. when |Ns| ≫ 1. Conversely, any hope of our selected allele following its deterministic path will be quickly undone if our change in allele frequencies due to selection is much less than the variance induced by drift. So if the absolute value of our population-size-scaled selection coefficient |Ns| ≪ 1, then drift will dominate the fate of our allele.
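To make the comparison concrete, here is a minimal Python sketch of one Wright-Fisher generation with additive selection. It is an illustration under the approximations above (eqns (12.9) and (12.10)), not the code behind any figure; the function names are ours, and the binomial draw is done with 2N simple coin flips so that only the standard library is needed.

```python
import random


def expected_change(p, s):
    """Deterministic change under additive selection, eqn (12.9)."""
    return 0.5 * s * p * (1.0 - p)


def drift_variance(p, N):
    """Approximate binomial sampling variance, eqn (12.10)."""
    return p * (1.0 - p) / (2.0 * N)


def wright_fisher_generation(p, s, N, rng=random):
    """One generation: apply selection, then binomially sample 2N alleles."""
    p_sel = p + expected_change(p, s)
    count = sum(1 for _ in range(2 * N) if rng.random() < p_sel)
    return count / (2.0 * N)


p, N = 0.3, 1000
for s in (1e-5, 1e-3, 1e-1):
    # The ratio of expected change to drift variance works out to N * s.
    ratio = expected_change(p, s) / drift_variance(p, N)
    print(f"s={s:g}  N*s={N * s:g}  E/Var={ratio:g}")
```

The printed ratio equals Ns whatever the starting frequency, which is why the population-scaled selection coefficient is the natural yardstick here.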
To make further progress on understanding the fate of alleles with selection coefficients of the order 1/N requires more careful modeling. However, under our diploid model, with an additive selection coefficient s, we can obtain the probability that allele 1 fixes within the population, starting from a frequency p:
$$p_F(p) = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}} \tag{12.11}$$
The proof of this result is sketched out below (see Section 12.2.1). A new allele that arrives in the population at frequency p = 1/(2N) has a probability of reaching fixation of
$$p_F\left(\frac{1}{2N}\right) = \frac{1 - e^{-s}}{1 - e^{-2Ns}} \tag{12.12}$$
If s ≪ 1 but Ns ≫ 1 then $p_F\left(\frac{1}{2N}\right) \approx s$, which nicely gives us back the result that we obtained above for an allele under strong selection (eqn. (12.8)). Our probability of fixation (eqn. (12.12)) is plotted as a function of s and N in Figure 12.7. To recover our neutral result, we can take the limit s → 0 to obtain our neutral fixation probability, 1/(2N).
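The two limits are easy to check numerically. Below is a small Python sketch of eqn (12.11); the function name and the tolerance used to switch to the neutral limit are our own choices.

```python
import math


def prob_fixation(p, N, s):
    """Fixation probability of an additive allele at frequency p, eqn (12.11)."""
    if abs(2 * N * s) < 1e-12:   # neutral limit, s -> 0
        return p
    # expm1(x) = e^x - 1 keeps precision when the exponent is tiny.
    return math.expm1(-2 * N * s * p) / math.expm1(-2 * N * s)


N = 10_000
p_new = 1 / (2 * N)   # a single new copy
for s in (-1e-4, 0.0, 1e-4, 1e-2):
    print(f"s={s:g}  p_F={prob_fixation(p_new, N, s):.3g}")
```

For s = 0.01 and N = 10,000 (so Ns = 100) the result is within a percent of s, while s = 0 returns 1/(2N) exactly; a weakly deleterious allele with Ns = −1 still has an appreciable chance of fixing.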
tions. Thus, if consistent selection pressures are exerted over long time
and optimized the sequence of codons is for translation. Due to the degeneracy of the protein code, multiple codons code for the same amino-acid. For example, there are six different codons that can code for leucine. While these synonymous codons are equivalent at the protein level, cells do differ in the number of tRNA molecules that bind these codons and so the efficacy and accuracy with which proteins can be formed through translation and folding. These slight differences in
particular amino-acids, see Figure 12.9, with the most abundant codon matching the most abundant tRNA in cells. This ’codon bias’ likely
pushing the codon composition of the genome and tRNA abundances

Figure 12.9: Data from Drosophila melanogaster on the frequency of different codons for Leucine (CTT, CTC, CTA, CTG, TTA, TTG). Data from Genscript. Code here.

[Figure: axes labeled Expression Level and dN/dS.]
This is very similar to the technique that we used when deriving our
probability of escaping loss in a very large population above.
So we need an expression for $p_F(p + \Delta p)$. To obtain this, we’ll do a Taylor series expansion of $p_F(p)$, assuming that ∆p is small:
$$p_F(p + \Delta p) \approx p_F(p) + \Delta p\, \frac{dp_F(p)}{dp} + \frac{(\Delta p)^2}{2}\, \frac{d^2 p_F(p)}{dp^2} \tag{12.16}$$
Figure 12.13: Coquerel’s Sifaka
(Propithecus coquereli).
A hand-book to the primates (1894). Forbes,
H. O. Image from the Biodiversity Heritage
Library. Contributed by Smithsonian Libraries.
Licensed under CC BY-2.0.
$$p_F(p) = \frac{1 - e^{-2Nsp}}{1 - e^{-2Ns}} \tag{12.19}$$
This proof can be extended to alleles with arbitrary dominance; however, this does not lead to an analytically tractable expression so we do not pursue this here.
13
The Effects of Linked Selection.
$$p_{NR} = e^{-r\tau/2}. \tag{13.4}$$
[Figure: observed values plotted against distance from dhfr (kb).]
surrounding the dhfr gene in samples of drug-resistant malaria (Plasmodium falciparum) from Thailand. The dotted horizontal line gives the average level of heterozygosity found at these
our $\pi_0$. The dashed line shows our fitted hitchhiking model from equation 13.6 with τ ≈ 40, fitted by non-linear least squares. The recombination rate in P. falciparum is $r_{BP} \approx 10^{-6}$ bp$^{-1}$. Data from Nash et al. (2005). Code here.
To get a sense of the physical scale over which diversity is reduced, consider a region where recombination occurs at a rate $r_{BP}$ per base pair per generation, and a locus ℓ base pairs away from the selected site, such that $r = r_{BP}\ell$ (where $r_{BP}\ell \ll 1$ so we don’t need to worry about more than one recombination event occurring per generation). Typical recombination rates are on the order of $r_{BP} = 10^{-8}$. In Figure 13.5 we show the reduction in diversity, given by eqn. (13.6), for two different selection coefficients.
For our expected diversity level to recover to 50% of its neutral expectation, $E(\pi_r)/\theta = 0.5$, requires a physical distance $\ell^*$ such that $\log(0.5) = -r_{BP}\,\ell^*\,\tau$, and by re-arrangement,
$$\ell^* = \frac{-\log(0.5)}{r_{BP}\,\tau}. \tag{13.7}$$
As τ depends inversely on the selection coefficient s (eqn. (10.38)), the width of our trough of reduced diversity depends on $s/r_{BP}$. All else being equal, we expect stronger sweeps or sweeps in regions of low recombination to have a larger hitchhiking effect. For example, in a genomic region with a recombination rate $r_{BP} = 10^{-8}$ bp$^{-1}$ a selection coefficient of s = 0.1% would reduce diversity over 10’s of kb, while a sweep of s = 1% would affect ∼100kb.
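Eqn (13.7) is a one-liner to evaluate. As a hedged illustration in Python (the function name is ours), we can plug in the values quoted for the dhfr sweep in P. falciparum, τ ≈ 40 and r_BP ≈ 10^-6 per bp, treating τ as a given input rather than deriving it from s.

```python
import math


def half_recovery_distance(r_bp, tau):
    """Distance in bp at which expected diversity is back to 50%, eqn (13.7)."""
    return -math.log(0.5) / (r_bp * tau)


# Values quoted for the dhfr example: tau ~ 40, r_BP ~ 1e-6 per bp.
ell_star = half_recovery_distance(r_bp=1e-6, tau=40)
print(f"diversity recovers to 50% about {ell_star / 1000:.1f} kb from the selected site")
```

Halving τ (a faster, stronger sweep) or halving the recombination rate doubles this distance, which is the s/r_BP scaling described above.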
Figure 13.12: Three types of sweeps. Panels: Hard sweep; Multiple mutation soft sweep; Single mutation soft sweep.
To what extent are patterns of variation along the genome and among
species shaped by linked selection, such as selective sweeps? We can
hope to identify individual cases of strong selective sweeps along the
genome, but how do they contribute to broader patterns of variation?
Two observations have puzzled population geneticists since the in-
ception of molecular population genetics. The first is the relatively
high level of genetic variation observed in most obligately sexual
species. The neutral theory of molecular evolution was developed in
part to explain these high levels of diversity. As we saw in Chapter
4, under a simple neutral model, with constant population size, we
should expect the amount of neutral genetic diversity to scale with the
product of the population size and mutation rate. The second obser-
vation, however, is the relatively narrow range of polymorphism across
species with vastly different census sizes (see Figure 2.3 and Leffler
et al. (2012) for a recent review). As highlighted by Lewontin
(1974) in his discussion of the paradox of variation, this observation
seemingly contradicts the prediction of the neutral theory that genetic
diversity should scale with the census population size. There are a
number of explanations for the discrepancy between genetic diversity
levels and census population sizes. The first is that the effective size
of the population (Ne ) is often much lower than the census size, due
to high variance in reproductive success and frequent bottlenecks (as
discussed in Chapter 4). The second major explanation, put forward
by Maynard Smith and Haigh (1974), is that neutral levels
of diversity are also systematically reduced by the effects of linked
selection. In large populations, selective sweeps and other forms of
linked selection may come to dominate over genetic drift as a source
of stochasticity in allele frequencies, potentially establishing an upper
limit to levels of diversity (Kaplan et al., 1989; Gillespie, 2000).
we find this implies $\nu_{BP} \approx 10^{-12}$. Thus, a really low rate of moderately strong sweeps, roughly one every megabase every million generations, is all we need to explain the profound dip in diversity seen in
regions of the genome with low recombination. However, sweeps from
positively selected alleles are not the only cause of genome-wide signals
of linked selection. Selection against deleterious alleles can also drive
these patterns.
lotypes may be passed on to the next generation, but if they are fully
linked to the deleterious locus they will all eventually be lost because
they carry a deleterious mutation at a site under constraint. Thus, for a neutral polymorphism completely linked to a constrained locus, only $2N(1 - \mu/sh)$ alleles get to contribute to future generations. Therefore, the level of pairwise diversity in a constant population due to BGS at such a locus will be
$$E[\pi] = 2\mu \times 2N\left(1 - \frac{\mu}{sh}\right) = \pi_0 \left(1 - \frac{\mu}{sh}\right) \tag{13.14}$$
where $\pi_0 = 4N\mu$ is the level of neutral pairwise diversity in the absence of linked selection.
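As a quick numerical illustration of eqn (13.14), here is a short Python sketch. The parameter values are hypothetical, picked only to give round numbers, and the function names are ours.

```python
def neutral_diversity(N, mu):
    """pi_0 = 4 N mu, the neutral expectation without linked selection."""
    return 4 * N * mu


def bgs_diversity(N, mu, s, h):
    """Expected diversity at a fully linked neutral site, eqn (13.14)."""
    return neutral_diversity(N, mu) * (1.0 - mu / (s * h))


# Hypothetical values: mu = 1e-3, s = 0.02, h = 0.5, so mu / (s h) = 0.1.
N, mu, s, h = 10_000, 1e-3, 0.02, 0.5
reduction = bgs_diversity(N, mu, s, h) / neutral_diversity(N, mu)
print(f"diversity relative to pi_0: {reduction:.2f}")
```

With these made-up values a tenth of chromosomes carry the deleterious allele at any time, so diversity at the linked neutral site drops to 90% of π0.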
The effects of background selection are more pronounced in regions
of low recombination, where neutral alleles are less able to recombine
off the background of deleterious alleles. Thus, under background
selection, we also expect to see reduced diversity in regions of lower
recombination.
BGS: The balance between a steady flux of deleterious mutations and purifying selection generates a stable partition of chromosomes in a population, depending on how many deleterious mutations they carry. Chromosomes with deleterious mutations will be eliminated relatively quickly from the population by purifying selection, but this class is constantly replenished by new deleterious mutations. In the absence of recombination, a new neutral mutation can remain in the population for a long period of time and rise to high population frequencies only if it appears on a gamete that is free of deleterious mutations, and hence is not destined to be rapidly eliminated. The effect of this “background selection” against deleterious mutations is a reduction in the level of neutral polymorphism [61], as well a downward shift in their population frequencies, because of the relative excess of short-lived (and hence low frequency) neutral mutations [63].
Figure 13.15: Relative diversity compared to the mean diversity in windows ≥ 0.01 cM as a function of the distance to the nearest gene. See (Beissinger et al., 2016) for details. Figure licensed under CC BY 4.0 by Jeff Ross-Ibarra.

compounding equation (13.15) across these loci, the expected reduced diversity is approximately
$$E[\pi] = \pi_0 \prod_{i=1}^{L} \left(1 - \frac{\mu sh}{2(r_i + sh)^2}\right) \approx \pi_0 \exp\left(-\sum_{i=1}^{L} \frac{\mu sh}{2(r_i + sh)^2}\right) \tag{13.16}$$
To model an average neutral locus in a genomic region with a given recombination rate, we can imagine that our neutral locus is situated in the center of a large region with total recombination rate R and total deleterious mutation rate U, where U = µL. Then our expression for diversity, equation (13.16), simplifies to
[Figure: Syn diversity (%) plotted against rec rate (cM/Mb).]
For a first go at fitting this to genome-wide data, we could look
Pushing this idea further, we can look at the dip in diversity sur-
rounding a non-synonymous substitution averaged across all the sub-
stitutions in the genome. Elyashiv et al. (2016) found a stronger
dip in diversity around non-synonymous substitutions than synony-
mous substitutions (see also Sattath et al., 2011a). Extending the
model of McVicker et al. (2009) to fit a model of background se-
lection and hitchhiking to putative neutral diversity along the genome,
they found that the dip in diversity around synonymous substitu-
tions comes mostly from BGS. But to fully explain the dip in diversity
around non-synonymous substitutions, a reasonable proportion of
these non-synonymous substitutions have to have been accompanied
by a classic (hard) sweep. The majority of these sweeps are estimated
to be due to very weak selection, with selection coefficients < $10^{-4}$.
Furthermore, Elyashiv et al. (2016) estimated a 77 - 89% reduc-
tion in neutral diversity due to selection on linked sites, and concluded
that no genomic window was entirely free of the effects of selection.
Thus linked selection has a profound effect in some species such as
Drosophila melanogaster.
14
Interaction of Multiple Selected Loci
Thus, sex and genetic exchange are incredibly widespread. Yet sex
has substantial short-term costs.
The costs of sex. Three broad costs of sex have often been hypothe-
sized:
1. The cost of mating. Finding and attracting a mate are costly and
may be impossible, and mating can be dangerous.
3. The two-fold cost of sex (Smith, 1971). The offspring of sexual organisms have two parents. Therefore, sexual parents only contribute half of their genome to their offspring, while asexual organisms contribute their entire genome to the next generation. Thus a sexual organism has to have twice as many children to leave the same number of copies of their genome to the next generation. That might be doable if both sexual parents were equally committed to contributing to those offspring. However, that is rarely the case. This cost is sometimes called the two-fold cost of males, as males often provide little in terms of resources to their children. Thus any allele that makes its host asexual should initially spread, all else being equal.
Yet sex and other forms of genetic exchange persist, despite these
short-term advantages to asexual reproduction. Indeed asexual lin-
eages often arise and spread within some sexual populations due to
these advantages.
This is the fitness of the ith haplotype averaged over all of the diploid genotypes it could occur in, weighted by their probability under random mating. Using this notation, and with some rearrangement of equation (14.1), we obtain
$$x_1' = \frac{x_1 \bar{w}_1 - w_{14}\, r D}{\bar{w}} \tag{14.3}$$
Here we have assumed that w23 = w14 , i.e. that the fitness of AB/ab
individuals is the same as Ab/aB individuals (i.e. that fitness de-
pends only on the alleles carried by an individual, and not on which
chromosome they are carried; this assumption is sometimes called no
cis-epistasis).
We can then write the change in the frequency of our 1 haplotype as
$$\Delta x_1 = \frac{x_1(\bar{w}_1 - \bar{w}) - r w_{14} D}{\bar{w}} \tag{14.4}$$
Generalizing this result, we write the change in any haplotype i from our set of four haplotypes as
$$\Delta x_i = \frac{x_i(\bar{w}_i - \bar{w}) \mp r w_{14} D}{\bar{w}} \tag{14.5}$$
where the coupling haplotypes 1 and 4 take the upper sign ($-r w_{14} D$, as in eqn. (14.4)) and the repulsion haplotypes 2 and 3 take the lower sign ($+r w_{14} D$). Note that the sum of these four $\Delta x_i$ is zero, as our haplotype frequencies sum to one.
So the change in the frequency of a haplotype (e.g. AB, haplotype
1) is determined by the interplay of two factors: First, the extent
to which the marginal fitness of our haplotype is higher (or lower)
than the mean fitness of the population (the magnitude and sign of
(w̄1 − w̄)/w̄). Second, whether there is a deficit or any excess of our
haplotype compared to linkage equilibrium (the magnitude and sign of
D), modified by the strength of recombination. This tension between
selection promoting particular haplotypic combinations, and recom-
bination breaking up overly common haplotypes is the key to a lot of
interesting dynamics and evolutionary processes.
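The two-locus recursion can be iterated directly. The sketch below is a minimal Python version, written under two assumptions that this section does not spell out: that D is the usual coefficient D = x1 x4 − x2 x3, and that the recombination term is subtracted for haplotypes 1 and 4 and added for haplotypes 2 and 3, matching the sign in eqn (14.4) for haplotype 1. The function name and the fitness-matrix layout are ours.

```python
def two_locus_step(x, w, r):
    """One generation of selection and recombination on four haplotypes.

    x: frequencies of haplotypes [AB, Ab, aB, ab] (haplotypes 1..4)
    w: 4x4 symmetric matrix, w[i][j] = fitness of the diploid carrying i and j
    r: recombination fraction between the two loci
    """
    # Marginal fitness of each haplotype under random mating.
    w_marg = [sum(x[j] * w[i][j] for j in range(4)) for i in range(4)]
    w_bar = sum(x[i] * w_marg[i] for i in range(4))
    D = x[0] * x[3] - x[1] * x[2]   # assumed definition of D
    signs = (-1, +1, +1, -1)        # haplotypes 1 and 4 lose, 2 and 3 gain when D > 0
    return [
        x[i] + (x[i] * (w_marg[i] - w_bar) + signs[i] * r * w[0][3] * D) / w_bar
        for i in range(4)
    ]


# Neutral check: with all fitnesses equal, only recombination acts on the haplotypes.
neutral = [[1.0] * 4 for _ in range(4)]
x = [0.5, 0.0, 0.0, 0.5]
x_next = two_locus_step(x, neutral, r=0.1)
print(x_next)
```

In the neutral check each generation removes a fraction r of D, so recombination alone erodes the excess of AB and ab haplotypes; plugging in a non-trivial fitness matrix lets selection push back against that erosion.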
[Figure: frequencies of the AB, Ab, aB, and ab haplotypes against generations (three panels).]
B arises on the background of a neutral allele whose initial frequency is $p_A = 10\%$. The beneficial allele has a strong, additive selection coefficient of hs = 0.05.

[Figure: frequencies of the AB, Ab, aB, and ab haplotypes against generations (three panels).]
deleterious allele. The beneficial allele B arises on the background of a deleterious allele A, and the extent to which the A allele gets to hitchhike along depends on the recombination rate. Code here.

[Figure: haplotype frequencies against generations for r = 0 and r = 0.001; labels sA = 0.08, sB = 0.06, sAB = 0.14.]

[Figure: frequencies of the AB, Ab, aB, and ab haplotypes (three panels).]
● sex
asex
●
0.35
●
●
dN/dS
0.30
●
● ●
Figure 14.10: dN/dS calculated on sex-
ual (circles) and asexual (diamonds)
0.25
sence of the yellow band. The second locus segregates for B/b where B encodes for the bottom-wing being black, and b for the absence of black on the bottom wing. If Y is recessive and B is dominant, then the silvana phenotype corresponds to a YY bb genotype. Due to the dominance of the y and B alleles the bicoloratus phenotype can be achieved by various genotypes (Yy Bb, yy BB, Yy BB, yy Bb). Let’s assume that both of these phenotypes offer an advantage as they mimic a M. menophilus model. But there are also genotypes that don’t do as well; YY BB individuals have a yellow band and a black bottom and so don’t do a great job mimicking anything and so will be eaten. Thinking about the four possible haplotypes, y-B has high marginal fitness as, due to its combo of dominant alleles, it’ll always produce a bicoloratus phenotype. Likewise the Y-b haplotype has high marginal fitness, as it does well in the homozygous state (silvana phenotype), and when it is paired with the y-B allele. However, the Y-B and y-b haplotypes fare less well as they carry two alleles that don’t work well with each other and so are often found in individuals who suffer high rates of predation.
If no recombination occurs between these loci (r = 0, Figure 14.13), then the Y-B and y-b are selected out of the population, and the y-B and Y-b can be stably maintained. However, when there’s too much recombination between our loci (e.g. r = 0.4, Figure 14.13) the high-fitness haplotypes keep getting ripped apart by recombination and the Y-b haplotype is lost from the population: its recessive advantage disappears because it is too often broken up by recombination in heterozygotes.

[Figure: frequencies of the Y-b, y-B, Y-B, and y-b haplotypes against generations; panel labels include H.n.f. silvana and r = 0.4.]
two different recombination regimes. The model has negative frequency dependent selection acting to increase the frequency of the mimicry morph that is rarer in the population. While all individuals with genotypes corresponding to a mixed phenotype, e.g. YY BB, have very low fitness as

14.3.8 Supergenes to the rescue!

“coadapted combinations of several or many genes locked in inverted sections of chromosomes and therefore inherited as single units.” Dobzhansky (1970) on supergenes.

So our polymorphisms can only be maintained if they are tightly linked, i.e. if these alleles arose at loci that are genetically close to each other. But how is it possible that these alleles arose close to
each other? Well the trick is that they don’t necessarily have to arise very close to each other. If such a system is polymorphic but being regularly broken up by recombination, a chromosomal inversion (the flipping around of a whole section of chromosome) can arise and will suppress recombination. Imagine that our two loci are far apart genetically, and a chromosomal inversion arises on the Y-b background forming the b-Y haplotype. This inverted haplotype will not recombine with the y-B haplotype when it is present in a heterozygote, thus it is not broken down by recombination. This inverted haplotype, which enjoys the fitness benefits of the Y-b, can therefore replace the Y-b haplotype in the population. The two other low fitness haplotypes will disappear as they are no longer being generated by recombination, leaving just the y-B and b-Y. The polymorphism system now behaves like alleles at a single locus, a supergene (e.g. like r = 0 in Figure 14.13).
Now the H. numata system is vastly more complicated than our toy two locus system, presumably involving many changes and refinements, but the same principle holds (Joron et al., 2011). The differences between the different H. numata mimicry morphs are found on a single chromosome, and the inheritance behaves as if controlled by a single locus (albeit with many alleles). The H. n. f. silvana individuals carry a recessive haplotype of alleles that is known to be locked together by a ∼400kb inversion, that is in a different chromosomal orientation from the bicoloratus allele (haplotype), which acts as a dominant allele. Other alleles at this same chromosomal region provide the genetic basis of the other morphs, and sometimes correspond to further inversions with a range of dominance relationships.

Figure 14.14: Left) A coastal perennial and an inland annual Mimulus guttatus (Lowry and Willis, 2010); image from Lowry and Willis (2010) licensed under CC BY 4.0. Right) A reciprocal transplant experiment showing that coastal perennials and inland annuals are locally adapted to their respective habitats (fitness of each ecomorph at the Coastal and Inland sites). Data from Lowry and Willis (2010). Code here.

Local Adaptation, Speciation, and Inversions. Inversions have long been thought to play an important role in local adaptation and speciation. One example of an inversion underlying local adaptation occurs in Mimulus guttatus, in Western North America, where there are annual and perennial ecomorphs with very different life history strategies (see Figure 14.14). The perennial form grows in many places along the Pacific coast, and in other places with year around moisture; it invests a lot of resources in achieving large size and laying down resources for the next year, and as a result flowers late. The annual form grows inland, e.g. the California central valley, where it has to invest all its effort in flowering rapidly before the long, hot, dry summer. Neither ecomorph does well in the other’s environment. The perennials get crisped before they have a chance to flower, while the annuals suffer from high rates of herbivory and cannot tolerate the salt spray. Lowry and Willis (2010) found that a large inversion controlled a lot of the phenotypic variation in flowering time and a range of other morphological differences between these two morphs. They also showed that the inversion controlled a reasonable proportion of the differences in fitness in the field, consistent with it underlying the fitness tradeoffs involved in local adaptation.

Figure 14.15: A two locus, two population migration-selection balance system. Two loci A and B segregate for Inland and Coastal adapted alleles. The two populations are connected by arrows labeled m. Fitnesses shown in the diagram:

                     Locus A    Locus B
Coastal Pop.   Co    1          1
               In    (1 – s)    (1 – s)
Inland Pop.    Co    (1 – s)    (1 – s)
               In    1          1
Why would an inversion be involved in locking together locally adapted alleles? The basic idea, like above, is that an inversion can be selected for when we have two (or more) loci segregating for locally adapted alleles (Figure 14.15). Locally advantageous haplotypes are in danger of being broken up by recombination with maladapted haplotypes, which are constantly being introduced into each population by migration from the other. If an inversion arises that locks these alleles together in one population, it can be selected for as it does not suffer the ill effects from recombination with migrating maladaptive haplotypes.
Sex chromosomes, under this hypothesis, are supergenes locking together sex determination and sexually-antagonistic alleles. Our male-beneficial, female-detrimental alleles work well on the background of the male-determining allele and poorly off it; that’s exactly the supergene setup we encountered in Section 14.3.8. This sketch can be flipped to describe the evolution of ZW systems.

Image sources: https://commons.wikimedia.org/wiki/File:Labeotropheus_fuelleborni_in_Botanic_garden_in_Teplice_(2).JPG and https://www.flickr.com/photos/52993488@N03/4890217915
A.1 Calculus
[Figure: panels showing f(x), f′(x), and f″(x), with points A, B, C, and D marked.]
tion, $f(x) = x - (5/6)x^3 - (1/3)x^4$, and Bottom) its derivative $f'(x) = 1 - 3(5/6)x^2 - 4(1/3)x^3$. Code here.
proximation of f(x) at x = a is given by
$$f(x) \approx f(a) + f'(a)(x - a) \quad \text{for } x \text{ near } a \tag{A.1}$$
Returning to our car example, this corresponds to trying to guess the position of the car by extrapolating from its current location and speed, i.e. assuming that it isn’t accelerating or decelerating too fast.
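To see how quickly the linear approximation degrades away from a, here is a small Python sketch using the example function from the calculus figure, f(x) = x − (5/6)x³ − (1/3)x⁴, and its derivative. The expansion point a = −0.5 is an arbitrary choice of ours.

```python
def f(x):
    return x - (5 / 6) * x**3 - (1 / 3) * x**4


def f_prime(x):
    return 1 - 3 * (5 / 6) * x**2 - 4 * (1 / 3) * x**3


def linear_approx(x, a):
    """First-order Taylor approximation of f around a, eqn (A.1)."""
    return f(a) + f_prime(a) * (x - a)


a = -0.5
for x in (-0.5, -0.49, -0.4, 0.0, 0.5):
    error = f(x) - linear_approx(x, a)
    print(f"x={x:+.2f}  f(x)={f(x):+.4f}  approx={linear_approx(x, a):+.4f}  error={error:+.4f}")
```

The error is tiny close to a and grows roughly with the square of the distance from a, which is what the neglected second-order term would predict.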
Two common first-order Taylor approximations that we’ll encounter
throughout the notes are
parabola instead of a line (see Figure A.4). This is often useful when examining the effects of stochasticity on some process. These second-
A.1.2 Integrals
Regarding integrals $\int_a^b f(x)\,dx$, just remember that they represent the signed area “under” the graph of y = f(x) over the interval [a, b].
A.2 Probability
for example, we could ask the probability that a baby was born some-
where between midnight and 12.18am.
egg. There are 100 eggs in the box, 57 of them are empty. Forty three have a toy in them. There are eggs with a stuffed dog toy, eggs with a cat toy, eggs with a lizard toy, and eggs with both a dog and cat toy in them. The counts of each type of egg are shown in Figure A.5.
Question 1. You reach into the box and pull out one egg:
i) For each egg type (dog alone, cat alone, lizard, dog+cat, and no prize), what is the probability that you get an egg of that type? What do these probabilities sum to?
ii) What’s the probability of getting an egg with a dog? What is the probability of getting an egg with a dog in it or an egg without a dog in it?
iii) What’s the probability of getting an egg with a dog in it or an egg with a lizard?

Figure A.5: Venn diagram of fairground game toys (circles labeled Dog, Cat, and Lizard); there are a hundred eggs in total, including 57 eggs with no prize that are not shown. Code here.
$$P(A) = \sum_{i=1}^{L} P(A \,\&\, B_i) = \sum_{i=1}^{L} P(A|B_i)\,P(B_i) \tag{A.12}$$
$$P(B|A) = \frac{P(A|B)\,P(B)}{P(A)} \tag{A.16}$$
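The law of total probability (A.12) and Bayes’ rule (A.16) chain together naturally. Here is a minimal Python sketch with a hypothetical example of our own: the events B_i are the three genotypes at a locus with a rare allele at frequency 0.1 (Hardy-Weinberg proportions), and A is the event that a parent transmits the rare allele.

```python
def total_probability(p_a_given_b, p_b):
    """P(A) = sum_i P(A|B_i) P(B_i), eqn (A.12)."""
    return sum(pa * pb for pa, pb in zip(p_a_given_b, p_b))


def bayes(p_a_given_b, p_b, k):
    """P(B_k|A) = P(A|B_k) P(B_k) / P(A), eqn (A.16)."""
    return p_a_given_b[k] * p_b[k] / total_probability(p_a_given_b, p_b)


# Hypothetical example: genotype frequencies for an allele at frequency 0.1.
p_b = [0.81, 0.18, 0.01]          # common homozygote, heterozygote, rare homozygote
p_a_given_b = [0.0, 0.5, 1.0]     # chance each genotype transmits the rare allele

p_a = total_probability(p_a_given_b, p_b)
p_het_given_a = bayes(p_a_given_b, p_b, k=1)
print(p_a, p_het_given_a)
```

The total probability recovers the allele frequency, 0.1, and Bayes’ rule tells us that a transmitted rare allele came from a heterozygous parent 90% of the time.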
µ = E[X] = p1 x1 + p2 x2 + · · · + (A.17)
According to Pascal, the expectation is the excitement a gambler feels when placing a bet, i.e. each term in the sum equals the probability of winning times the amount won. Apparently Pascal knew some unusually rational gamblers.

The average outcome¹ over a set of independent events is an estimate of the mean µ̂, where the hat denotes that it is an estimate. A more precise interpretation of the relationship between the average and the expectation is given by the law of large numbers described below.

¹ Recalling that we compute the average, the sample mean, of a set of numbers $X_1, \cdots, X_L$ as

For a continuous random variable,
$$E[X] = \int x\, p(x)\, dx. \tag{A.19}$$
$$\widehat{\sigma^2} = \frac{1}{L-1} \sum_{i=1}^{L} (X_i - \bar{X})^2 \tag{A.23}$$
Note that the units of our variance will be the units of $X^2$, e.g. if X is height measurements in cm the variance will have units cm². One reason that the standard deviation is more intuitive than the variance is that its units are the same as X, e.g. cm.
Another important choice of f is f(x) = log x. Provided that X is positive, exp(E[log X]) corresponds to the geometric mean of X. Alternatively 1/E[1/X] corresponds to the harmonic mean of X.
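These summaries are quick to compute by hand or in code. Below is a short Python sketch, with sample averages standing in for the expectations; the function names and the tiny dataset are ours, chosen so the answers come out round.

```python
import math


def sample_mean(xs):
    return sum(xs) / len(xs)


def sample_variance(xs):
    """Unbiased sample variance, dividing by L - 1 as in eqn (A.23)."""
    m = sample_mean(xs)
    return sum((x - m) ** 2 for x in xs) / (len(xs) - 1)


def geometric_mean(xs):
    """exp(E[log X]), defined for positive values only."""
    return math.exp(sample_mean([math.log(x) for x in xs]))


def harmonic_mean(xs):
    """1 / E[1/X]."""
    return 1.0 / sample_mean([1.0 / x for x in xs])


xs = [1.0, 2.0, 4.0]   # e.g. measurements in cm
print("mean:", sample_mean(xs))
print("variance (cm^2):", sample_variance(xs))
print("standard deviation (cm):", math.sqrt(sample_variance(xs)))
print("geometric mean:", geometric_mean(xs))
print("harmonic mean:", harmonic_mean(xs))
```

For any set of positive numbers that are not all equal, the harmonic mean is smaller than the geometric mean, which in turn is smaller than the arithmetic mean, as this example shows.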
$$E[X|Y = y] = \sum_{i=1}^{L} x_i\, P(X = x_i | Y = y) \tag{A.26}$$
$$E[X] = \sum_{j=1}^{M} E[X|Y = y_j]\, P(Y = y_j) \tag{A.27}$$
this is the law of total expectation, the analog to the law of total probability (eqn (A.12)). We can write this law more generally as
pectation over Y.
Important discrete random variables include
case,
$$p_i = \frac{n!}{i!(n-i)!}\, p^i (1-p)^{n-i}, \quad 0 \le i \le n. \tag{A.28}$$
For a binomial random variable, E[X] = np and σ² = np(1 − p). Examples are shown in Figure A.6. Note how the mass of the distribution becomes more centered on the mean for larger sample sizes, as the standard deviation increases only as $\sqrt{n}$. Another way that we can write that our observation i is drawn from the binomial distribution is i ∼ Binomial(p, n), where i ∼ is read as “i is distributed as” (we will use the notation as short hand for random variable in the notes).

Figure A.6: Binomial distribution for a sample of n = 10 and n = 100, the vertical lines show the means np. Code here.
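As a check on these formulas, here is a minimal Python sketch that builds the binomial probabilities from eqn (A.28) and recovers the mean and variance by direct summation. The function names are ours.

```python
import math


def binomial_pmf(i, n, p):
    """Probability of i successes in n trials, eqn (A.28)."""
    return math.comb(n, i) * p**i * (1 - p) ** (n - i)


def binomial_moments(n, p):
    """Mean and variance computed by summing over the whole distribution."""
    pmf = [binomial_pmf(i, n, p) for i in range(n + 1)]
    mean = sum(i * q for i, q in enumerate(pmf))
    var = sum((i - mean) ** 2 * q for i, q in enumerate(pmf))
    return mean, var


for n in (10, 100):
    mean, var = binomial_moments(n, 0.5)
    # The standard deviation relative to n shrinks as 1 / sqrt(n).
    print(n, mean, var, math.sqrt(var) / n)
```

Going from n = 10 to n = 100 the standard deviation grows only about threefold while the mean grows tenfold, so the distribution looks tighter around np.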
Geometric random variables count the total number of flips X before
case,
p = 1/2 we wait two flips for a head on average, while if the coin-flip
is very biased against heads, p ≪ 1, we can be waiting a very long
time. The variance of a geometric random variable is $\sigma^2 = (1-p)/p^2$,
which means that the mass of the distribution is much more spread
out if we consider the waiting time for rare events. See Figure A.7
for examples of the distribution.
Figure A.7: Geometric distribution for different probabilities of success (p). The vertical lines show the means 1/p. Code here. [Plot legend: p = 0.9, p = 0.5, p = 0.1; horizontal axis: i.]
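The mean and variance quoted for the geometric distribution can be checked numerically. The sketch below assumes the convention P(X = i) = (1 − p)^(i−1) p for i = 1, 2, ..., i.e. X counts the flips up to and including the first head, which is consistent with the mean of 1/p given in the caption of Figure A.7. The function names and the truncation point of the sum are our own choices.

```python
def geometric_pmf(i, p):
    """P(X = i): i - 1 tails followed by a head (assumed convention, i >= 1)."""
    return (1 - p) ** (i - 1) * p

def geometric_summary(p, max_i=2000):
    """Mean and variance from a truncated sum; the tail beyond max_i is
    negligible for the values of p used here."""
    support = range(1, max_i + 1)
    mean = sum(i * geometric_pmf(i, p) for i in support)
    var = sum((i - mean) ** 2 * geometric_pmf(i, p) for i in support)
    return mean, var

for p in (0.5, 0.1):
    mean, var = geometric_summary(p)
    # Compare the summed values with the formulas 1/p and (1 - p)/p^2.
    print(p, round(mean, 6), 1 / p, round(var, 6), (1 - p) / p**2)
```

Dropping p from 0.5 to 0.1 raises the mean waiting time fivefold, but the variance rises far more, which is the spreading out described above.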
$$p_i = \lambda^i e^{-\lambda} / i! \tag{A.30}$$
For a Poisson random variable, $E[X] = \lambda$.
The form of this is less intuitive than that of the binomial. However,
of coin flips (n → ∞), but you’ve set the chance of heads on a single
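The connection between the Poisson and a binomial with a very large number of coin flips can be explored numerically. The sketch below assumes that the chance of heads on a single flip is set to p = λ/n, so that the expected number of heads stays at λ as n grows; that scaling, the value of λ, and the function names are assumptions made for illustration.

```python
from math import comb, exp, factorial

def poisson_pmf(i, lam):
    """P(X = i) for a Poisson variable with mean lam, as in eqn (A.30)."""
    return lam**i * exp(-lam) / factorial(i)

def binomial_pmf(i, n, p):
    """P(X = i) for a Binomial(p, n) variable."""
    return comb(n, i) * p**i * (1 - p) ** (n - i)

def max_abs_difference(n, lam, upto=15):
    """Largest gap between the two distributions over i = 0, ..., upto,
    with the binomial success probability set to lam / n (an assumption)."""
    return max(abs(binomial_pmf(i, n, lam / n) - poisson_pmf(i, lam))
               for i in range(upto + 1))

lam = 2.0
for n in (10, 100, 10_000):
    print(n, max_abs_difference(n, lam))
```

The gap shrinks as n increases, so for many flips with a small chance of heads the Poisson form is a good stand-in for the binomial.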
$$\widehat{\sigma^2_{XY}} = \frac{1}{L-1} \sum_{i=1}^{L} (X_i - \bar{X})(Y_i - \bar{Y}) \tag{A.41}$$
this captures the extent to which two sets of numbers covary. For
example, the running speeds of kids in a race at age 8 and 9 positively
covary. Example datasets are shown in Figure A.9.
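A minimal sketch of the sample covariance (eqn (A.41)) follows, together with the sample correlation, computed here in the standard way as the covariance divided by the product of the two standard deviations. The running speeds are invented numbers in the spirit of the example above, and the function names are our own.

```python
import math

def sample_covariance(xs, ys):
    """Sample covariance with the 1/(L - 1) normalisation of eqn (A.41)."""
    xbar = sum(xs) / len(xs)
    ybar = sum(ys) / len(ys)
    return sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (len(xs) - 1)

def sample_correlation(xs, ys):
    """Covariance rescaled by the two standard deviations, so it lies in [-1, 1]."""
    return sample_covariance(xs, ys) / math.sqrt(
        sample_covariance(xs, xs) * sample_covariance(ys, ys))

# Hypothetical running speeds (m/s) of the same five kids at ages 8 and 9.
speed_age8 = [3.1, 3.5, 4.0, 4.2, 4.8]
speed_age9 = [3.4, 3.6, 4.3, 4.6, 5.0]

print(sample_covariance(speed_age8, speed_age9))
print(sample_correlation(speed_age8, speed_age9))
```

Note that the covariance of a variable with itself is just its sample variance, which the correlation function uses for the rescaling.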
Figure A.9 (three panels of Y against X, with sample correlations 0.15, 0.82, and −0.56): ... where pairs of variables show varying degrees of covariance, the sample correlation ($\widehat{\rho_{XY}}$) is shown in the top corner. Code here.
weight from their height. See Figure A.10 for an example plot. To do
$$Y_i \approx a + bX_i \tag{A.43}$$
Figure A.10: An example of a linear regression with best fitting least-squares line. The sample variance and covariance are given (Var(X) = 1.3, Cov(X,Y) = 0.47, Slope = 0.36), so that you can see for yourself that the best fitting slope is just the ratio of these two. Code here.
What is the best fitting line? Well, one common definition of the
optimal fit is the choice of a and b that minimizes the squared error
between the observed values (Y) and their predicted values, i.e.
$$\sum_{i=1}^{L} (Y_i - a - bX_i)^2 \tag{A.44}$$
here $(Y_i - a - bX_i)^2$ is the squared residual error, the square of the
length of the dotted lines in Figure A.10. The best fitting slope, i.e.
that with least squared error, is
$$b = \widehat{\sigma^2_{XY}} / \widehat{\sigma^2_X} \tag{A.45}$$
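The slope formula (eqn (A.45)) is easy to try out. The sketch below uses made-up data and our own function names. The intercept is taken to be a = Ȳ − bX̄, the standard least-squares result, which is not stated in the passage above and so is flagged here as an addition.

```python
def least_squares_line(xs, ys):
    """Return (a, b) with slope b = sample covariance / sample variance of X."""
    n = len(xs)
    xbar = sum(xs) / n
    ybar = sum(ys) / n
    cov_xy = sum((x - xbar) * (y - ybar) for x, y in zip(xs, ys)) / (n - 1)
    var_x = sum((x - xbar) ** 2 for x in xs) / (n - 1)
    b = cov_xy / var_x
    a = ybar - b * xbar  # standard least-squares intercept (our addition)
    return a, b

def squared_error(a, b, xs, ys):
    """Sum of squared residuals, the quantity in eqn (A.44)."""
    return sum((y - a - b * x) ** 2 for x, y in zip(xs, ys))

# Illustrative data (not from the notes).
xs = [3.0, 4.0, 5.0, 6.0, 7.0]
ys = [1.2, 1.9, 2.1, 2.8, 3.0]

a, b = least_squares_line(xs, ys)
print(a, b, squared_error(a, b, xs, ys))

# Nudging the slope or the intercept away from the fit increases the error.
print(squared_error(a, b + 0.05, xs, ys), squared_error(a + 0.05, b, xs, ys))
```

The same ratio can be read off the values annotated on Figure A.10: 0.47/1.3 ≈ 0.36.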
$$Y_i \sim a + bX_i + \epsilon_i \tag{A.46}$$
where the errors ($\epsilon_i$) are uncorrelated across data points with an
expectation of zero and constant but unknown variance. These
assumptions would hold, for example, if $\epsilon_i \sim \text{Normal}(0, \sigma)$.
We often want to include additional terms in our regression, or have
more complicated error structures, but these can usually be
understood as simple extensions of this machinery. For example,
least-squares can also be used to fit a non-linear function of X, $f(X; \Omega)$,
where we minimize
$$\sum_{i=1}^{L} (Y_i - f(X_i; \Omega))^2 \tag{A.47}$$
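To illustrate the idea of minimizing the squared error for a non-linear function (eqn (A.47)), here is a deliberately crude sketch. It assumes a hypothetical one-parameter model f(x; Ω) = exp(Ωx) and finds Ω by searching over a grid of candidate values. In practice one would normally use a numerical optimizer; the grid search is only a stand-in that keeps the example self-contained.

```python
import math

def model(x, omega):
    """Hypothetical non-linear model f(x; omega) = exp(omega * x)."""
    return math.exp(omega * x)

def squared_error(omega, xs, ys):
    """The sum of squared residuals in eqn (A.47)."""
    return sum((y - model(x, omega)) ** 2 for x, y in zip(xs, ys))

def fit_by_grid_search(xs, ys, grid):
    """Return the candidate omega with the smallest squared error."""
    return min(grid, key=lambda omega: squared_error(omega, xs, ys))

# Noise-free illustrative data generated from omega = 0.5.
xs = [0.0, 1.0, 2.0, 3.0, 4.0]
true_omega = 0.5
ys = [model(x, true_omega) for x in xs]

grid = [k / 1000 for k in range(0, 1001)]  # candidate values 0.000, ..., 1.000
omega_hat = fit_by_grid_search(xs, ys, grid)
print(omega_hat)
```

Because the data here contain no noise and the true value lies on the grid, the search recovers it; with noisy data the minimizer would only be close to the true value.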
Useful Limits.
Law of Large Numbers If X1 , X2 , . . . are a sequence of independent
random variables (i.e. “the outcomes of a sequence of independent
$$\frac{X_1 + \cdots + X_n}{n} \to \mu \text{ as } n \to \infty \text{ with probability one.} \tag{A.49}$$
Hence, the LLN implies that if you repeat a bunch of experiments and
take the average outcome ($\bar{X}$) from the experiments, the value you get
is likely to be close to the expected outcome of the experiment.
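A quick simulation shows the law of large numbers at work. The sketch below uses rolls of a fair six-sided die as the repeated experiment (our own choice of example), for which the expected outcome is 3.5. The seed and the function name are arbitrary.

```python
import random

def average_of_rolls(n, rng):
    """Average outcome of n independent rolls of a fair six-sided die."""
    total = 0
    for _ in range(n):
        total += rng.randint(1, 6)
    return total / n

rng = random.Random(2024)  # fixed seed so that the run is repeatable
averages = {n: average_of_rolls(n, rng) for n in (10, 1_000, 100_000)}

for n, xbar in averages.items():
    # The expected outcome of a single roll is (1 + 2 + ... + 6) / 6 = 3.5.
    print(n, xbar, abs(xbar - 3.5))
```

Averages from small numbers of rolls can stray some way from 3.5, while the average over many rolls is very likely to sit close to it.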
Of course, in the real world, we can only perform a finite number
of experiments, in which case it is useful to have a sense of how much
variation there will be in the average outcome. The central limit theorem is
the key tool for understanding this variation.