Population Genetics Lecture Notes-2016 Biology
Nairobi, KENYA
May 30-June 10, 2016
SELECTION THEORY
BREEDING STRATEGIES
University of Georgia
Athens, GA 30602, USA
saggrey@uga.edu
Preface
These lecture notes were written to cover parts of Population Genetics,
Quantitative Genetics and Molecular Genetics for postgraduate students, and to
serve as a refresher for field geneticists. The course material is not a textbook
and is not meant to be copied, duplicated or sold. This text is unedited and I am
solely responsible for all conceptual mistakes, grammatical errors and typos.
Genetics is a life-long course and cannot be covered in a few lectures. Because of
time constraints, only selected parts of population, quantitative and molecular
genetics will be covered. The course covers some of the evolutionary forces that
change allele frequency between generations, such as natural selection and gene
flow, and some aspects of Quantitative and Molecular Genetics.
To those men who have kept us awake for over two centuries and I believe would
continue to do so for many more centuries!
POPULATION GENETICS
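The study of the composition of biological populations, and of the changes in
genetic composition that result from the operation of various factors, including
(a) natural selection, (b) genetic drift, (c) mutation and (d) gene flow.
Single locus:
Locus A with two alleles A1 and A2. With P, H and Q the frequencies of the
genotypes A1A1, A1A2 and A2A2, the allele frequencies are:
$p = P + \tfrac{1}{2}H$
$q = Q + \tfrac{1}{2}H$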
4. Meiosis is fair. We assume that there is no segregation distortion, no gamete
competition, and no differences in the developmental ability of eggs or the
fertilizing ability of sperm.
5. All matings produce the same number of offspring, on average.
Thus, the frequency of a particular genotype in the pool of newly formed zygotes
is the sum, over all matings, of:
(frequency of mating) × (frequency of genotype produced from mating)
Hardy-Weinberg Law
In a large random mating population in the absence of mutation, migration,
selection and random drift, allele frequency remains the same from generation
to generation. Furthermore, there is a simple relationship between allele
frequency and genotypic frequency
Fig. 1 shows the relationship between allele frequency and the three genotypic
frequencies for a population in Hardy-Weinberg proportions:
1. The heterozygote is the most common genotype at intermediate allele
frequencies.
2. One of the homozygotes is the most common genotype when the allele frequency
is not intermediate.
3. Only 1/3 of the time, when q is between 1/3 and 2/3, is the heterozygote the
most common genotype.
4. When q is between 0 and 1/3, A1A1 is the most common, and when q is
between 2/3 and 1, A2A2 is the most common.
5. The maximum frequency of the heterozygote occurs when q=0.5.
This can be shown directly by setting the derivative of the H-W heterozygosity,
2pq=2q(1-q), equal to zero and solving for q:
$\frac{d[2q(1-q)]}{dq} = 2 - 4q = 0$, so $q = 0.5$.
Here, we assume that the generations are non-overlapping, i.e. the parents die after
producing progeny, and the progeny then become the next parental generation.
then use that to generate the expected genotypic frequencies. We can compute the
chi-square statistic as:
$\chi^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i}$
where O and E are the observed and expected numbers of a particular genotype and
n is the number of genotypic classes. From the calculated value of $\chi^2$ and
the tabulated value of $\chi^2$ we can obtain the probability that the observed
numbers deviate from the expected numbers. The degrees of freedom used to
determine the significance of the $\chi^2$ value are equal to the number of
genotypic classes, n, minus one, then minus the number of parameters estimated
from the data. One degree of freedom is always lost because we use the data to
estimate the allele frequency. We can use the chi-square distribution to test
whether the value of $\chi^2$ is too large to be the result of sampling error. In
doing so we are performing a one-tailed test. The chi-square expression for two
alleles is given as:
$\chi^2 = \frac{(N_{11} - p^2N)^2}{p^2N} + \frac{(N_{12} - 2pqN)^2}{2pqN} + \frac{(N_{22} - q^2N)^2}{q^2N}$
where N11, N12 and N22 are the observed numbers of A1A1, A1A2 and A2A2, and N is
the total number of individuals. With H the observed heterozygote frequency,
$F = \frac{2pq - H}{2pq} = 1 - \frac{H}{2pq}$
It can be shown that
$\chi^2 = F^2 N$
For two alleles, the chi-square goodness-of-fit test for Hardy-Weinberg
proportions is equivalent to the test for inbreeding, F=0. However, F is unstable
as the expected (E) value approaches zero, and is therefore not useful for rare
and very common alleles. For E=0 and O>0, F=−∞, and for E=0 and O=0, F is
undefined. Deviation from Hardy-Weinberg proportions can also be tested using the
likelihood ratio test, which is described in most statistical texts.
The B/b locus is responsible for plumage color in chickens found in the Rift
Valley. The B allele expresses black plumage which is completely dominant over
the b allele for brown plumage.
Phenotype   Genotype   Observed number   Expected number
Black       BB         290               p^2 N = 289.444
Black       Bb         496               2pq N = 497.112
Brown       bb         214               q^2 N = 213.444
Total                  1,000             1,000
P = 290/1000 = 0.29; H = 496/1000 = 0.496; Q = 214/1000 = 0.214; P+H+Q = 1.0
p = P + ½H = 0.29 + ½(0.496) = 0.538; q = Q + ½H = 0.214 + ½(0.496) = 0.462; p+q = 1.0
Note: Chi-square is allergic to fractions and ratios, but really likes integers!
$\chi^2 = \frac{(290 - 289.444)^2}{289.444} + \frac{(496 - 497.112)^2}{497.112} + \frac{(214 - 213.444)^2}{213.444} = 0.0050$
$F = 1 - \frac{H}{2pq} = 1 - \frac{0.496000}{0.497112} = 0.002237$
$\chi^2 = F^2 N = 0.0050$
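The arithmetic of this example can be checked with a short script. The following is a minimal Python sketch and not part of the original notes; the function name hardy_weinberg_test and its argument names are illustrative assumptions. It estimates p from the genotype counts, computes the chi-square statistic from observed and expected numbers, and computes F from the heterozygote frequency.

```python
def hardy_weinberg_test(n11, n12, n22):
    """Return (p, chi-square, F) for one locus with two alleles.

    n11, n12, n22 are observed counts of the two homozygotes and the
    heterozygote (hypothetical argument names).
    """
    n = n11 + n12 + n22
    p = (2 * n11 + n12) / (2 * n)          # p = P + H/2
    q = 1.0 - p
    expected = (p * p * n, 2 * p * q * n, q * q * n)
    observed = (n11, n12, n22)
    # Chi-square is computed from numbers, not from frequencies.
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - (n12 / n) / (2 * p * q)      # F = 1 - H / (2pq)
    return p, chi2, f


p, chi2, f = hardy_weinberg_test(290, 496, 214)
print(round(p, 3), round(chi2, 4), round(f, 6))
print(abs(chi2 - f * f * 1000) < 1e-9)     # checks chi-square = F^2 N
```

With the plumage data, the script should reproduce the values of the worked example to rounding, and the last line checks the identity between chi-square and F²N numerically.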
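Extension of the Hardy-Weinberg Law: Multiple Alleles
Let us consider a single locus with three alleles A1, A2 and A3 with frequencies p,
q and r, respectively.
Please note that $p + q + r = 1$, and the key to solving multiple-allele problems
is to break the problem down so that it resembles a two-allele problem. In what
follows, Nij is the observed number of AiAj individuals and N is the total number
of individuals.
$f(A_3A_3) = r^2 = \frac{N_{33}}{N}$
$r = \sqrt{\frac{N_{33}}{N}}$
From here, let's reduce the problem to a two-allele locus involving the alleles A2
and A3.
Expected genotypes under H-W: A2A2, A2A3 and A3A3, with expected frequency
$q^2 + 2qr + r^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$
From basic algebra: $(a + b)^2 = a^2 + 2ab + b^2$.
This implies: $(q + r)^2 = q^2 + 2qr + r^2$
Therefore: $(q + r)^2 = \frac{N_{22} + N_{23} + N_{33}}{N}$
$q + r = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$
$q = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} - \sqrt{\frac{N_{33}}{N}}$
Since $p + q + r = 1$, then $p = 1 - (q + r)$
$p = 1 - \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}}$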
The ABO blood group in humans is determined by three alleles, A, B and O.
Allele (frequency)   A (p)       B (q)       O (r)
A (p)                AA  p^2     AB  pq      AO  pr
B (q)                AB  pq      BB  q^2     BO  qr
O (r)                AO  pr      BO  qr      OO  r^2
In the year 1825, the director general of ILRI-Musastan ordered a staff nurse to
collect blood samples of all capacity building course participants. Of the 1,825
individuals sampled, 700 were type A, 250 were type B, 75 were type AB and 800
were type O. Determine the frequency of the A, B and O alleles.
Hint:
Phenotype   Genotype   H-W Expectation   Number
A           AA + AO    p^2 + 2pr         700
B           BB + BO    q^2 + 2qr         250
AB          AB         2pq               75
O           OO         r^2               800
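The square-root approach described for three alleles can be applied to the ABO numbers. The following is a minimal Python sketch under that approach; the function name abo_allele_frequencies is an illustrative assumption, and this simple estimator uses the AB count only through the total N (it is not a maximum-likelihood estimate).

```python
from math import sqrt


def abo_allele_frequencies(n_a, n_b, n_ab, n_o):
    """Estimate (p, q, r) for alleles A, B and O from phenotype counts."""
    n = n_a + n_b + n_ab + n_o
    r = sqrt(n_o / n)                   # f(OO) = r^2
    q_plus_r = sqrt((n_b + n_o) / n)    # f(BB) + f(BO) + f(OO) = (q + r)^2
    q = q_plus_r - r
    p = 1.0 - q_plus_r                  # p + q + r = 1
    return p, q, r


p, q, r = abo_allele_frequencies(700, 250, 75, 800)
print(round(p, 4), round(q, 4), round(r, 4))
```

Because p is obtained by subtraction, the three estimates add up to 1 by construction. The expected number of AB individuals, 2pqN, can then be compared with the observed 75 as a check on the fit.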
Natural Selection at One Locus
Differential viability and fertility
Natural selection occurs when genotypes in a population differ in survival,
fertility or reproduction. In this case, we multiply each genotype's frequency by
its fitness, where fitness reflects the genotype's probability of survival and its
relative participation in reproduction. Assume a single autosomal locus with two
alleles, A1 and A2, and three diploid genotypes A1A1, A1A2 and A2A2 with fitnesses
denoted w11, w12 and w22, respectively. Unless w11, w12 and w22 are all equal,
natural selection will occur, possibly leading the genetic composition of the
population to change. Before the operation of natural selection (generation 0),
the genotypes are in Hardy-Weinberg equilibrium and the frequencies of the A1 and
A2 alleles are p0 and q0, respectively (p0 + q0 = 1). The genotypes of generation
0 produce progeny that become generation 1, with frequencies of A1 and A2 denoted
by p1 and q1, respectively (p1 + q1 = 1). In both generations, the allele
frequency is considered at the zygote stage and may differ from the adult allele
frequency if there is differential viability.
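$q_1 = \frac{q_0^2 w_{22} + \tfrac{1}{2}(2 p_0 q_0 w_{12})}{\bar{w}}$
$q_1 = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}}$   [1]
where $\bar{w} = p_0^2 w_{11} + 2 p_0 q_0 w_{12} + q_0^2 w_{22}$ is the average fitness.
Equation [1] is known as a recurrence equation, as it expresses the frequency of
the A2 allele in generation 1 in terms of its frequency in generation 0. The
change in frequency between generations can then be written as:
$\Delta q = q_1 - q_0$
$\Delta q = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}} - q_0$
$\Delta q = \frac{q_0^2 w_{22} + p_0 q_0 w_{12} - q_0 \bar{w}}{\bar{w}}$
If we substitute $\bar{w}$ from Table 3, use $p = 1 - q$ and drop the generation
subscript, the equation above simplifies to:
$\Delta q = \frac{pq\,w_{12} + q^2 w_{22} - q(p^2 w_{11} + 2pq\,w_{12} + q^2 w_{22})}{\bar{w}}$
$\Delta q = \frac{pq(q\,w_{22} - q\,w_{12} - p\,w_{11} + p\,w_{12})}{\bar{w}}$
$\Delta q = \frac{pq[q(w_{22} - w_{12}) - p(w_{11} - w_{12})]}{\bar{w}}$   [2]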
Equations [1] and [2] show, in precise terms, how fitness differences between
genotypes will lead to evolutionary change. If Δq = 0 then no allele frequency
change has occurred and the population is in allelic equilibrium. It is worth
mentioning that Δq = 0 does not mean that no natural selection has occurred. The
condition for no selection is w11=w12=w22. It is possible for natural selection to
occur and have no effect on allele frequency.
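The recurrence and the change in allele frequency can be compared numerically. The following is a minimal Python sketch of the one-locus viability model described above; the function names and the fitness values in the example calls are illustrative assumptions, not values from the notes.

```python
def mean_fitness(q, w11, w12, w22):
    """Average fitness for allele frequency q of A2 (p = 1 - q)."""
    p = 1.0 - q
    return p * p * w11 + 2 * p * q * w12 + q * q * w22


def next_q(q, w11, w12, w22):
    """Frequency of A2 in the next generation (the recurrence equation)."""
    p = 1.0 - q
    return (q * q * w22 + p * q * w12) / mean_fitness(q, w11, w12, w22)


def delta_q(q, w11, w12, w22):
    """Change in the frequency of A2 in one generation (simplified form)."""
    p = 1.0 - q
    numerator = p * q * (q * (w22 - w12) - p * (w11 - w12))
    return numerator / mean_fitness(q, w11, w12, w22)


# Hypothetical fitness values: the two forms agree.
print(next_q(0.3, 1.0, 0.9, 0.4) - 0.3, delta_q(0.3, 1.0, 0.9, 0.4))
# Selection is operating (fitnesses differ) but the frequency does not change.
print(delta_q(1 / 3, 0.8, 1.0, 0.6))
```

The second call illustrates the last point of the text: with unequal fitnesses 0.8, 1.0 and 0.6 and q = 1/3, selection occurs every generation, yet the change in allele frequency is zero up to rounding error.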
Directional selection
If Δq > 0, then natural selection has led the A2 allele to increase in frequency;
if Δq < 0, then natural selection has led the A1 allele to increase in frequency.
If w11>w12>w22, then the A1A1 genotype is fitter than A1A2, which in turn is
fitter than A2A2; in which case Δq must be negative (as long as neither p nor q is
0). At each generation, the frequency of the A1 allele will be greater than in the
previous generation until it eventually reaches fixation and the A2 allele is
eliminated from the population. Once A1 reaches fixation (p=1 and q=0) no further
evolutionary change will occur. In this case, the A1 allele confers a fitness
advantage on the genotypes that carry it, and its relative frequency in the
population will increase from generation to generation until it is fixed. The
opposite outcome (fixation of A2) occurs when w22>w12>w11. Table 4 gives a
numerical example of directional natural selection. Fig. 2 illustrates allele
frequency under Hardy-Weinberg proportions where there is no differential
viability, w11=w12=w22=1.0, and the average fitness w=1.0 from generation to
generation. Assuming w22=0.4 as in Table 4, the frequency of A1 increases and that
of A2 decreases non-linearly until A1 reaches fixation, as illustrated in Fig 3.
Ultimately, the population will be monomorphic for the homozygous genotype with
the highest fitness.
Stabilizing selection
An interesting situation arises when the heterozygote is superior in fitness to
the two homozygotes. In this case, w11<w12>w22, and an equilibrium is reached with
both alleles present in the population. Setting Δq = 0 in Equation [2] with both
alleles present requires q(w22 - w12) = p(w11 - w12). Since q must be
non-negative, this condition can be satisfied only if there is heterozygote
superiority or inferiority. Heterozygote superiority is also known as heterosis.
In this case, natural selection produces heterogeneity and preserves genetic
variation. Unlike directional selection, stabilizing or balancing selection tends
to keep both alleles in the population, and the allele frequencies converge to a
polymorphic equilibrium (Fig 4).
Disruptive selection
Under disruptive selection (w11>w12<w22), the
heterozygote has a lower relative fitness
compared to the two homozygotes. Viability
selection may lead either to an increasing
frequency of A1 allele or to its decreasing
frequency. In the long run, the population will be
monomorphic for one of the homozygous
genotypes (Fig 5). The population converges to
fixation.
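The three outcomes described above (directional, balancing and disruptive selection) can be explored by iterating the recurrence for q over many generations. The following is a minimal Python sketch; all fitness values, starting frequencies and the number of generations are hypothetical choices made for illustration only.

```python
def next_q(q, w11, w12, w22):
    """One generation of viability selection on the frequency q of A2."""
    p = 1.0 - q
    w_bar = p * p * w11 + 2 * p * q * w12 + q * q * w22
    return (q * q * w22 + p * q * w12) / w_bar


def iterate(q0, fitnesses, generations):
    """Apply the recurrence repeatedly and return the final frequency."""
    q = q0
    for _ in range(generations):
        q = next_q(q, *fitnesses)
    return q


directional = (1.0, 1.0, 0.4)     # w11 = w12 > w22: A2 is eliminated
overdominant = (0.8, 1.0, 0.6)    # w11 < w12 > w22: polymorphic equilibrium
underdominant = (1.0, 0.6, 1.0)   # w11 > w12 < w22: fixation of either allele

print(iterate(0.5, directional, 500))
print(iterate(0.9, overdominant, 200), iterate(0.05, overdominant, 200))
print(iterate(0.6, underdominant, 200), iterate(0.4, underdominant, 200))
```

Under these assumed values, the overdominant case approaches the same interior frequency from both starting points, whereas the underdominant case ends near q = 1 or q = 0 depending on which side of the unstable equilibrium (here q = 0.5) the population starts.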
Coefficient of selection
The speed with which allele or genotype frequencies change is driven by the
relative fitness of each allele or genotype. Fitness (w11, w12 and w22) is a
relative value, usually measured in comparison with the most-fit allele/genotype
in the population. The selection coefficient, s, measures the reduction in fitness
of a selected allele or genotype compared to the most-fit allele/genotype in a
population.
Selection against an allele may operate either through reduced viability or reduced
fertility or reduced mating ability or different combinations of the three. Therefore,
allele frequency needs to be deduced from the zygote stage of the parent generation
to the zygote stage of the progeny generation. The coefficient of selection
measures the proportionate reduction in gametic contribution of a genotype
compared to the most-fit genotype. The contribution of the most fit genotype is
taken to be 1, and the contribution of the genotype selected against is 1 - s. If the
selection coefficient for a genotype is 0.60; the fitness is then 0.4, which means
that for every 100 zygotes produced by the most-fit genotype, only 40 are
produced by the genotype selected against.
Dominance
To explore the effects of dominance, we can specify the fitnesses using two
parameters: s, representing the difference in fitness between the two
homozygotes, and h, representing the degree of dominance (which sets the fitness
of the heterozygote). Let
w11 = 1
w12 = 1 - hs
w22 = 1 - s
The parameter h, together with s, determines the fitness of the heterozygote.
a. If h = 0, the heterozygote has fitness 1, the same as the A1A1 homozygote:
the A1 allele is completely dominant.
b. Conversely, if h = 1, the fitness of the heterozygote is the same as that of
the A2A2 homozygote (1-s): the A2 allele is completely dominant.
c. If 0 < h < 1, the heterozygote's fitness is somewhere between those of the
homozygotes: there is incomplete dominance.
d. If h = ½ exactly, the alleles have additive effects: the heterozygote's fitness
is the average of the two homozygotes' fitnesses.
e. If h < 0, the heterozygote's fitness is greater than 1, and thus greater than
that of the A1A1 homozygote; this is called overdominance.
f. Similarly, if h > 1, the heterozygote has lower fitness than the A2A2
homozygote (and of course also the A1A1 homozygote); this is
underdominance.
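The cases a to f can be summarized in a few lines of code. The following is a minimal Python sketch, assuming s > 0; the function names and the returned labels are illustrative choices, not terminology fixed by the notes.

```python
def genotype_fitnesses(s, h):
    """Return (w11, w12, w22) for selection coefficient s and dominance h."""
    return 1.0, 1.0 - h * s, 1.0 - s


def dominance_class(h):
    """Classify the fitness relationship from the degree of dominance h."""
    if h < 0:
        return "overdominance"
    if h == 0:
        return "A1 completely dominant"
    if h == 0.5:
        return "additive"
    if h < 1:
        return "incomplete dominance"
    if h == 1:
        return "A2 completely dominant"
    return "underdominance"


for h in (-0.5, 0, 0.25, 0.5, 1, 1.5):
    print(h, genotype_fitnesses(0.6, h), dominance_class(h))
```

The additive case is listed separately from incomplete dominance only to mirror item d; it is a special case of 0 < h < 1.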
Table 5 Fitness values for different fitness relationships
                            A1A1    A1A2    A2A2    Description
General fitness             w11     w12     w22
Recessive lethal            1       1       0       No dominance, selection against A2A2
Detrimental allele          1       1       1-s     No dominance, selection against A2
Dominance                   1       1-hs    1-s     Partial dominance of A1, selection against A2
Dominance                   1       1       1-s     Complete dominance of A1, selection against A2
Dominance                   1-s     1-s     1       Complete dominance of A1, selection against A1
Heterozygote advantage      1-s1    1       1-s2    Overdominance, selection against A1A1 & A2A2
Heterozygote disadvantage   1+s1    1       1+s2    Underdominance, selection against A1A2
Lethal alleles
These are alleles that cause the death of the organism that carries them; most do
so only when present in the homozygous state. If the mutation is caused by a
dominant lethal allele, the heterozygote for the allele will show the lethal
phenotype, and the homozygous dominant genotype is impossible. If the mutation is
caused by a recessive lethal allele, only the homozygote for the allele will have
the lethal phenotype. Most lethal genes are recessive. Many lethal alleles prevent
cell division and kill an organism at an early age. Some lethal alleles exert
their effect later in life, e.g. Huntington's disease, characterized by
progressive degeneration of the nervous system, dementia and early death between
30 and 50 years of age.
Dominant lethal alleles: They modify the Mendelian 3:1 ratio to 2:1. If the
organism dies before it can produce progeny, the mutant dominant allele is removed
from the population in the same generation in which it arose. Fully dominant
lethal alleles kill the carrier in both the homozygous and heterozygous states.
Huntington's disease in humans and creeper legs (short and stunted) in chickens
are examples of dominant lethals in which the homozygote does not survive.
Recessive lethal alleles: The recessive lethal kills the carrier individual only
in the homozygous state. They may be of two kinds: (1) one which has no obvious
phenotypic effect in the heterozygote, and (2) one which exhibits a distinctive
phenotype in the heterozygous state. In many cases, lethal alleles become
operative at the onset of sexual maturity. Examples of recessive lethals in cattle
are osteopetrosis (Angus and Red Angus) and pulmonary hypoplasia with anasarca
(PHA) (Shorthorn). In humans, common examples are cystic fibrosis (poorly
functioning Cl ion transport proteins in the lungs), Tay-Sachs disease (an enzyme
unable to break down specific membrane lipids), sickle cell anemia and
brachydactyly. The relative fitness for a recessive lethal is presented in
Table 5.
                        A1A1    A1A2    A2A2    Total
Initial frequency       p^2     2pq     q^2     1
Fitness                 1       1       0
Gametic contribution    p^2     2pq     0       w = p(1 + q)
From Equation 1, $q_1 = \frac{q^2 w_{22} + pq\,w_{12}}{\bar{w}}$
The average fitness, w, under a recessive lethal is: $\bar{w} = p(1 + q)$
Therefore, $q_1 = \frac{pq}{p(1 + q)} = \frac{q}{1 + q}$   [3]
$\Delta q = q_1 - q_0 = \frac{q_0}{1 + q_0} - q_0 = \frac{-q_0^2}{1 + q_0}$
The mean fitness reaches 1 when the population is fixed for A1. The relationship
given for q is a recursive relationship. The allele frequency at any time t+1 is a
function of the frequency at time t, or
$q_{t+1} = \frac{q_t}{1 + q_t}$
$q_2 = \frac{q_1}{1 + q_1}$
When we substitute the value of q1 from equation 3 in this expression, it becomes:
$q_2 = \frac{q_0}{1 + 2q_0}$
This relationship can be generalized to give the frequency in generation t as a
function of the frequency at generation 0:
$q_t = \frac{q_0}{1 + t q_0}$
Since recessive homozygotes do not survive, the maximum possible allele frequency
is 0.5, which occurs when all individuals are heterozygotes. Fig 6 demonstrates
the expected decline in the frequency of a recessive lethal allele from two
starting frequencies. When the allele frequency is high, it is reduced very
quickly.
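The decline of a recessive lethal can be followed either generation by generation or with the general expression for generation t. The following is a minimal Python sketch; the function names and the starting frequency of 0.5 are illustrative assumptions.

```python
def recessive_lethal_closed_form(q0, t):
    """Frequency after t generations: q_t = q0 / (1 + t * q0)."""
    return q0 / (1.0 + t * q0)


def recessive_lethal_iterated(q0, t):
    """Same quantity obtained by applying q' = q / (1 + q) t times."""
    q = q0
    for _ in range(t):
        q = q / (1.0 + q)
    return q


for t in (1, 2, 10, 100):
    print(t, recessive_lethal_closed_form(0.5, t), recessive_lethal_iterated(0.5, t))
```

Starting from q = 0.5, the frequency is halved in two generations (to 0.25) but needs four more generations to halve again (to 0.125), which shows how the decline slows as the allele becomes rare.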
Selection against recessives
                        A1A1    A1A2    A2A2        Total
Initial frequency       p^2     2pq     q^2         1
Fitness                 1       1       1-s
Gametic contribution    p^2     2pq     q^2(1-s)    w = 1 - sq^2
From Equation 1, $q_1 = \frac{q^2 w_{22} + pq\,w_{12}}{\bar{w}}$
When selecting against recessives, w12=1, w22=1-s, and w is 1 - sq^2.
Therefore, q1 can be written as:
$q_1 = \frac{q^2(1 - s) + pq}{1 - sq^2}$
$q_1 = \frac{q(1 - sq)}{1 - sq^2}$
The change in frequency of A2 is therefore given as:
$\Delta q = \frac{-sq^2(1 - q)}{1 - sq^2}$
Both the average fitness and change in allele frequency are functions of the allele
frequency and the selection coefficient. Selection against recessive alleles is very
efficient at first, but becomes progressively slower because a sizeable proportion of
the recessive allele is part of the heterozygotes as allele frequency decreases.
Therefore, natural selection alone cannot entirely eliminate the recessive allele
even if it is lethal.
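The slowing of selection against a recessive allele can be made concrete by counting generations. The following is a minimal Python sketch based on the expressions for q1 and Δq above; the function names, the target frequencies and s = 1 in the example calls are illustrative assumptions.

```python
def next_q_recessive(q, s):
    """Frequency of A2 after one generation of selection against A2A2."""
    return q * (1.0 - s * q) / (1.0 - s * q * q)


def delta_q_recessive(q, s):
    """Change in the frequency of A2 in one generation."""
    return -s * q * q * (1.0 - q) / (1.0 - s * q * q)


def generations_until(q0, q_target, s):
    """Number of generations needed for q to fall to q_target or below."""
    q, t = q0, 0
    while q > q_target:
        q = next_q_recessive(q, s)
        t += 1
    return t


# Roughly halving the frequency takes far longer when the allele is rare.
print(generations_until(0.5, 0.26, 1.0))
print(generations_until(0.1, 0.052, 1.0))
```

The targets are set slightly above exact halves only to avoid comparing floating-point numbers at an exact boundary. Even with s = 1 (a lethal), the rarer the allele, the more of its copies sit in heterozygotes, where selection cannot act on them.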
More than one locus: Linkage and linkage disequilibrium
Under random mating, alleles at all autosomal loci combine at random to form
genotypes and attain equilibrium under the Hardy-Weinberg law. The basic
assumption here is that the transmission of alleles at a given locus across
generations is independent of alleles at another locus. We also assume that the
fitness of genotypes at one locus is not affected by genotypes at another locus.
For several loci, these assumptions would likely be violated.
Let's consider an A locus with two alleles A1 and A2 at frequencies pA and qA,
and a B locus also with two alleles B1 and B2 at frequencies pB and qB,
respectively. Under Hardy-Weinberg proportions, pA + qA = 1, pB + qB = 1,
and the expected genotypic frequencies are pA^2 + 2pAqA + qA^2 and
pB^2 + 2pBqB + qB^2, respectively. Alleles at the A locus may combine at random
or in a non-random way with alleles at the B locus.
locus, the attainment of gametic or linkage equilibrium depends on the rate of
recombination in genotypes heterozygous to both loci.
There are two types of double heterozygotes, which differ in gametic phase:
A1B1/A2B2 (coupling)
A1B2/A2B1 (repulsion)
The observed gametic frequency differs from the expected gametic frequency by
an amount D. We measure the non-randomness of the gametic frequencies by
means of the deviation from two-locus equilibrium. D is the gametic disequilibrium
coefficient. Gametic disequilibrium is often referred to as linkage disequilibrium
(LD). This may be confusing because genes or loci need not be linked to be in
gametic disequilibrium. The effect of the gametic disequilibrium coefficient, D,
on gametic frequencies is similar to the effect of inbreeding on genotypic
frequencies at a single locus. Under its heterozygote-deficit interpretation, the
inbreeding coefficient, F, has been called a one-locus disequilibrium coefficient.
With pA and qA the frequencies of A1 and A2, and pB and qB the frequencies of B1
and B2:
$f(A_1B_1) = p_A p_B + D$
$f(A_1B_2) = p_A q_B - D$
$f(A_2B_1) = q_A p_B - D$
$f(A_2B_2) = q_A q_B + D$
The most common expression of D is:
$D = f(A_1B_1)\,f(A_2B_2) - f(A_1B_2)\,f(A_2B_1)$
D is therefore the difference between the products of the frequencies of the
coupling and repulsion gametic types.
$D = (p_A p_B + D)(q_A q_B + D) - (p_A q_B - D)(q_A p_B - D)$
[You can work on the proof in your spare time].
If two genes are in linkage disequilibrium, it means that certain alleles of
each gene are inherited together more often than would be expected by chance.
This may be due to actual genetic linkage, i.e., the genes are closely located on the
same chromosome. Or it could be due to some form of functional interaction where
some combinations of alleles at the two loci affect the viability of potential
offspring. It should be noted that an observed non-random association of
alleles/genotypes need not be caused by their chromosomal location. Any of the
evolutionary forces (mutation, random genetic drift, selection and gene flow) can,
at least temporarily, cause such associations.
Recombination
Let's consider the following:
The gametes produced by the double heterozygote A1B1/A2B2 are of four types:
A1B1 (type 1), A2B2 (type 2), A1B2 (type 3) and A2B1 (type 4).
Gametic types 1 and 2 are called non-recombinants because the alleles are
associated in the same manner as in the previous generation. Gametic types 3 and 4
are known as recombinants because the alleles are associated differently than in
the previous generation. As a result of Mendelian segregation, f(A1B1)=f(A2B2)
and f(A1B2)=f(A2B1). However, f(A1B2) + f(A2B1) does not have to be
equal to f(A1B1) + f(A2B2). The proportion of recombinant gametes produced
by the double heterozygote is called the recombination fraction, c, and the
proportion of non-recombinant gametes is 1-c.
The recombination fraction between genes depends on whether they are on
the same chromosome, and also on the physical distance between them. During
meiosis, the four chromatids (carrying the two genes) align. The two inner
chromatids can undergo breakage and exchange of parts (recombination). Thus, only
50% (0.5) of the chromatids can undergo recombination. Therefore, the maximum
recombination rate is cmax=0.5. For genes on different chromosomes or far apart on
the same chromosome, the recombination fraction is c=0.5, as the four gametic
types are produced in equal frequency. Genes that have c<0.5 must necessarily be
on the same chromosome, and such genes are said to be linked. When c=0, the two
genes are so close to each other that a break between them almost never happens,
and they are transmitted together as one supergene.
Gametic disequilibrium and frequency of gamete change over time
The gametic disequilibrium changes from one generation to the next. Let the
frequencies of A1B1, A1B2, A2B1 and A2B2 be r, s, t and u, respectively. Now,
let's construct the gametic frequencies of the offspring.
              Proportion among gametes
Genotype      A1B1       A1B2       A2B1       A2B2
A1B1/A1B1     1          0          0          0
A1B1/A1B2     ½          ½          0          0
A1B1/A2B1     ½          0          ½          0
A1B1/A2B2     ½(1-c)     ½c         ½c         ½(1-c)
A1B2/A1B2     0          1          0          0
A1B2/A2B1     ½c         ½(1-c)     ½(1-c)     ½c
A1B2/A2B2     0          ½          0          ½
A2B1/A2B1     0          0          1          0
A2B1/A2B2     0          0          ½          ½
A2B2/A2B2     0          0          0          1
There are ten different two-locus genotypes, therefore a full mating table would
take 100 rows. Assuming Hardy-Weinberg equilibrium, we can calculate the frequency
with which any one genotype will produce a particular gamete.
Genotype      Frequency  A1B1       A1B2       A2B1       A2B2
A1B1/A1B1     r^2        r^2
A1B1/A1B2     2rs        rs         rs
A1B1/A2B1     2rt        rt                    rt
A1B1/A2B2     2ru        (1-c)ru    (c)ru      (c)ru      (1-c)ru
A1B2/A1B2     s^2                   s^2
A1B2/A2B1     2st        (c)st      (1-c)st    (1-c)st    (c)st
A1B2/A2B2     2su                   su                    su
A2B1/A2B1     t^2                              t^2
A2B1/A2B2     2tu                              tu         tu
A2B2/A2B2     u^2                                         u^2
Total         1          r - cD0    s + cD0    t + cD0    u - cD0
The frequencies of the four gametes after one generation of random mating are:
$r_1 = r - cD_0$
$s_1 = s + cD_0$
$t_1 = t + cD_0$
$u_1 = u - cD_0$
where D0 = ru - st is the LD in the preceding generation.
$D_1 = r_1 u_1 - s_1 t_1$
$D_1 = [(r - cD_0)(u - cD_0)] - [(s + cD_0)(t + cD_0)] = D_0(1 - c)$
This recursive relationship leads to a general relationship:
$D_t = D_0(1 - c)^t$
where Dt is the D at generation t. The LD decays each generation at a rate
determined by the degree of recombination. The maximum value of D (+0.25)
occurs when there are only coupling gametes (r=u=0.5). The minimum value of D
(-0.25) occurs when there are only repulsion gametes (s=t=0.5). Thus, the value of
D varies from -0.25 to +0.25. If there is free recombination between two loci
(either on different chromosomes or far apart from each other, where c=½), D would
be practically eliminated in about 7 generations (D7=0.00195 when starting from
the maximum of 0.25). However, if c is much less than 0.5, e.g. 0.05, then the
decay in disequilibrium will take a substantial period of time.
A major problem with D is that its maximum value changes as a function of the
allele frequencies at the two loci. As a result, standardizing D to its maximum
possible value was proposed by Lewontin (1964), where
$D' = \frac{D}{D_{max}}$
Dmax is equal to the lesser of pAqB or qApB if D is positive, or the lesser of
pApB or qAqB if D is negative. D' varies between -1 and 1 regardless of the allele
frequencies at the two loci, and it also provides a metric for comparing LD with
the maximum possible value it can take.
To determine how long it takes for D to decay to a given value D*, the recursive
equation for Dt can be solved for the number of generations, t, as:
$t = \frac{\ln(D^*/D_0)}{\ln(1 - c)}$
When c=0.1, it will take 6.58 and 21.85 generations for half and 90% of the LD,
respectively, to disappear; for c=0.05, it will take 13.51 and 44.89 generations,
respectively, for half and 90% of the LD to disappear.
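The decay of D and the time needed to reach a given fraction of its starting value follow directly from the recursion for Dt. The following is a minimal Python sketch; the function names are illustrative assumptions, and time is counted in generations.

```python
from math import log


def ld_after(d0, c, t):
    """D after t generations of random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1.0 - c) ** t


def generations_to_decay(fraction_remaining, c):
    """Generations until D has fallen to fraction_remaining of D_0."""
    return log(fraction_remaining) / log(1.0 - c)


print(ld_after(0.25, 0.5, 7))                  # free recombination
for c in (0.1, 0.05):
    half = generations_to_decay(0.5, c)        # half of the LD has disappeared
    ninety = generations_to_decay(0.1, c)      # 90% of the LD has disappeared
    print(c, round(half, 2), round(ninety, 2))
```

Note that the fraction passed to generations_to_decay is the fraction of D that remains, so the loss of 90% of the LD corresponds to a remaining fraction of 0.1.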
The squared correlation coefficient, r^2, is also used as a measure of LD:
$r^2 = \frac{D^2}{p_A q_A p_B q_B}$
where r is the square root of the above expression. When the allele frequencies
are the same at both loci, r^2 ranges from 0 to 1. When the allele frequencies
differ between the two loci, the maximum values of both r^2 and r are somewhat
smaller. The value of the chi-square, $\chi^2$, is numerically equal to r^2 N,
where N is the total number of chromosomes examined. The biological meaning of r
is that it is the correlation between alleles present on the same chromosome.
APPLICATION
Originally the definition of LD was in terms of gametic frequencies because that
allowed for the possibility that the loci are on different chromosomes. However,
the usual application now is to loci on the same chromosome. In that case, the
allele pair AB is a haplotype, and pAB is the observed haplotype frequency. DAB is
estimated from the allele and haplotype frequencies in the sample.
$D_{AB} = p_{AB} - p_A p_B$
The quantity DAB is the coefficient of linkage disequilibrium defined for a
specific pair of alleles, A and B, and does not depend on how many other alleles
are at the two loci. Each pair of alleles has its own D. The values for different
pairs of alleles are constrained by the fact that the allele frequencies at both
loci and the haplotype frequencies have to add up to 1. If both loci have two
alleles, e.g. SNPs, the constraint is strong enough that only one value of D is
needed to characterize LD between those loci, and
$D_{AB} = -D_{Ab} = -D_{aB} = D_{ab}$, where a and b are the other alleles. In
this case, D is used without a subscript. The sign of D is arbitrary and depends
on which pair of alleles one starts with.
and can be interpreted as the non-independence among these alleles that is not
accounted for by the pairwise coefficients.
Examples of LD
         B1B1   B1B2   B2B2   Total
A1A1     40     60     28     128
A1A2     10     48     36     94
A2A2     4      14     26     44
Total    54     122    90     266
A locus                            B locus
A1A1  PA = 128/266 = 0.4812        B1B1  PB = 54/266 = 0.2030
A1A2  HA = 94/266 = 0.3534         B1B2  HB = 122/266 = 0.4586
A2A2  QA = 44/266 = 0.1654         B2B2  QB = 90/266 = 0.3383
pA = 0.4812 + ½(0.3534) = 0.6579   pB = 0.2030 + ½(0.4586) = 0.4323
qA = 0.1654 + ½(0.3534) = 0.3421   qB = 0.3383 + ½(0.4586) = 0.5677
$D_0 = \frac{266}{266 - 1}\left[\frac{4(40) + 2(60 + 10) + 48}{2(266)} - 2(0.6579)(0.4323)\right] = 0.0856$
What does this mean? Since D is positive, the maximum value of D is the lesser of
qApB or pAqB. Since qApB = 0.3421 × 0.4323 = 0.1479 and pAqB = 0.6579 × 0.5677 =
0.3735, we choose the former. Therefore,
$D' = \frac{D}{D_{max}} = \frac{0.0856}{0.1479} = 0.5790$
This tells us that D is about 57.90% of its maximum value. With a given
recombination rate, c, the value of D will change over time.
$r^2 = \frac{D^2}{p_A p_B q_A q_B} = \frac{0.0856^2}{0.6579 \times 0.4323 \times 0.3421 \times 0.5677} = 0.1327$
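The calculation of D, D' and r^2 from the two-locus genotype table can be scripted. The following is a minimal Python sketch that mirrors the arithmetic used in this example; the function name composite_ld and the layout of the counts argument are illustrative assumptions.

```python
def composite_ld(counts):
    """Return (D, D', r^2) from a 3x3 table of two-locus genotype counts.

    Rows are A1A1, A1A2, A2A2 and columns are B1B1, B1B2, B2B2.
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(col) for col in zip(*counts)]
    p_a = (2 * row_totals[0] + row_totals[1]) / (2 * n)   # frequency of A1
    p_b = (2 * col_totals[0] + col_totals[1]) / (2 * n)   # frequency of B1
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    n11, n12 = counts[0][0], counts[0][1]
    n21, n22 = counts[1][0], counts[1][1]
    # Same expression as in the worked example, with the N/(N-1) correction.
    d = n / (n - 1) * ((4 * n11 + 2 * (n12 + n21) + n22) / (2 * n)
                       - 2 * p_a * p_b)
    if d > 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    return d, d / d_max, d * d / (p_a * q_a * p_b * q_b)


table = [[40, 60, 28],
         [10, 48, 36],
         [4, 14, 26]]
d, d_prime, r2 = composite_ld(table)
print(round(d, 4), round(d_prime, 4), round(r2, 4))
```

Because the script keeps full precision for the allele frequencies, D' may differ from the value in the text in the last decimal places, where rounded intermediate values were used.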
SNP1                            SNP2
No.   Allele   Frequency        No.   Allele   Frequency
1     G        pA               1     A        pB
2     C        qA               2     T        qB
Combination of SNPs into haplotypes
                 SNP2
Allele           A      T
SNP1     G       GA     GT
         C       CA     CT
The gametic frequency in a 1,000 chicken population for the naked neck (Na/na)
and dominant I (I/i) are as follows:
Na-I 0.180 r
Na-i 0.707 s
na-I 0.061 t
na-i 0.052 u
The decay in LD is shown in Fig 7 under two different recombination rates. When
there is no linkage (c=½), LD will be almost zero by generation 7. However, it
takes much longer for LD to decay when the recombination rate is closer to 0.
Since D is negative, the maximum absolute value of D is the lesser of pApB or
qAqB. Since pApB = f(Na) × f(I) = 0.887 × 0.241 = 0.2138 and qAqB = f(na) × f(i) =
0.113 × 0.759 = 0.0858, we choose the latter. Therefore,
$D' = \frac{D}{D_{max}} = \frac{-0.03377}{0.0858} = -0.394$
This tells us that the magnitude of D is about 39.4% of its maximum value.
The observed frequency at generation t = expected frequency at t=0 + Dt, where
$D_t = D_0(1 - c)^t$ and c is the recombination rate. Assuming c=0.1, at
generation 2, D2 = -0.0274. The observed frequency of Na-I will be
0.2138-0.0274=0.1864.
Now we can test whether D0 is significantly different from zero using the
chi-square test.
Null hypothesis: The observed gametic frequencies do not deviate from the
expected gametic frequencies.
Since $\chi^2$ is allergic to frequencies and fractions, we have to use observed
and expected numbers.
$\chi^2 = \frac{(180 - 213.8)^2}{213.8} + \frac{(707 - 673.2)^2}{673.2} + \frac{(61 - 27.2)^2}{27.2} + \frac{(52 - 85.8)^2}{85.8} = 62.3571$
Degrees of freedom = 4 - 1 - 1 (for estimating f(Na) from the data) - 1 (for
estimating f(I) from the data) = 1. The tabulated $\chi^2$ with 1 df at p=0.05 is
3.84. We can reject the null hypothesis and conclude that the observed gametic
frequencies are not in equilibrium, i.e. the two loci are in linkage
disequilibrium.
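The test can be reproduced from the four gametic frequencies and the population size. The following is a minimal Python sketch; the function name gametic_ld_test and its argument order (r, s, t, u, n) are illustrative assumptions.

```python
def gametic_ld_test(r, s, t, u, n):
    """Return (f(A1), f(B1), D, chi-square) from gametic frequencies.

    r, s, t and u are the frequencies of the gametes A1B1, A1B2, A2B1 and
    A2B2 (here Na-I, Na-i, na-I and na-i), and n is the number of gametes.
    """
    p_a = r + s                      # f(Na)
    p_b = r + t                      # f(I)
    d = r * u - s * t
    expected = (p_a * p_b, p_a * (1 - p_b),
                (1 - p_a) * p_b, (1 - p_a) * (1 - p_b))
    observed = (r, s, t, u)
    # Sum of (O - E)^2 / E on numbers equals n times the same sum on frequencies.
    chi2 = n * sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    return p_a, p_b, d, chi2


p_a, p_b, d, chi2 = gametic_ld_test(0.180, 0.707, 0.061, 0.052, 1000)
print(round(p_a, 3), round(p_b, 3), round(d, 5), round(chi2, 2))
print(chi2 > 3.84)                   # compare with the tabulated value, 1 df
```

The chi-square value computed this way is slightly lower than the 62.3571 given in the text because the text rounds the expected numbers to one decimal place; the conclusion is the same.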
Population genetics of LD
Linkage disequilibrium is affected by the following:
Selection (both natural and artificial)
Genetic drift
Population subdivision and bottlenecks
Inbreeding, inversion and gene conversion
Applications of LD
Mutation, gene mapping, QTL studies, Genome breeding value estimation
Detecting natural selection
Population structure and Gene flow
So far we have assumed that a population is homogeneous, and the
characteristics of the subpopulations sampled from the population would be
identical. This assumption may not be true. The distribution of individuals and
gene (allele) flow connections between different subpopulations can be important
in evolution. By population structure, a population geneticist means that, instead
of a single, simple population, the population may have substructure, i.e.,
differences in genetic variation among the subpopulations due to different
evolutionary forces (genetic drift, nonrandom mating, selection, etc.).
The overall population of subpopulations is referred to as the total
population (T). The individual components of the total population are referred to
as subpopulations (S), local populations or demes. In many real populations, there
may not be obvious structure, and the population is continuous. However, even in
effectively continuous populations, different areas or regions can have different
allele frequencies because mating in the total population is usually nonrandom.
Among humans within a country that shares one language, there are often local
language differences suggesting substructure, but it is difficult to find the
exact boundary where the changeover occurs. Such a population is structured, but
continuous in space. Population structure can therefore be defined as the
situation in which subpopulations deviate from Hardy-Weinberg proportions.
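$F = \frac{H_0 - H}{H_0} = 1 - \frac{H}{H_0}$
where H is the observed heterozygote frequency and H0 = 2pq is the heterozygote
frequency expected under Hardy-Weinberg proportions.
The reduction in heterozygote frequency implies a corresponding increase in the
frequency of homozygotes. The reduction in heterozygote frequency is divided
equally between the two homozygotes. The change in heterozygote frequency is given
as
$H_0 - H = 2pq - 2pq(1 - F) = 2pq - [2pq - 2pqF] = 2pqF$
This implies that the two homozygotes would each have their frequency increase by
$\tfrac{1}{2}(2pqF) = pqF$. The reason why the lost heterozygotes are divided
equally between the two homozygotes is that each heterozygote carries one copy of
each of the two alleles.
$P = p^2(1 - F) + pF$
$H = 2pq(1 - F)$
$Q = q^2(1 - F) + qF$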
F coefficients
If individuals mate within subpopulations, they are more likely to mate with
related individuals than if they mated randomly over the entire population. Sewall
Wright provided an approach to partitioning the genetic variation in
subpopulations that gives a clear description of differentiation. If HT and HS are
the measures of heterozygosity in the total population and the average of the
subpopulations, respectively, Wright's fixation index FST measures the average
change in heterozygosity in subpopulations relative to the total heterozygosity
as:
$F_{ST} = \frac{H_T - H_S}{H_T} = 1 - \frac{H_S}{H_T}$
$H_T = 2 p_0 q_0$
where p0 and q0 are the allele frequencies in the total population.
Within each subpopulation, there can be a deviation from the expected
heterozygosity within that subpopulation. Using the same logic, with HI the
observed heterozygosity,
$F_{IS} = \frac{H_S - H_I}{H_S} = 1 - \frac{H_I}{H_S}$
$F_{IT} = \frac{H_T - H_I}{H_T} = 1 - \frac{H_I}{H_T}$
Since $H_I = H_S(1 - F_{IS})$, $1 - F_{IT} = \frac{H_I}{H_T}$ and $\frac{H_S}{H_T} = 1 - F_{ST}$,
$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$
If individuals are mating completely at random over the entire population, then
there will be no local variation in allele frequency and each subpopulation will
have the same expected heterozygosity as the total population. In that case FST=0
and there will be no differentiation among subpopulations. At the other extreme,
if each subpopulation is completely isolated and alleles have become fixed within
each subpopulation, then there is no heterozygosity within the subpopulations. In
that case FST=1 and there is maximum differentiation among subpopulations.
28
Practical example:
A population of 1,600 individuals was divided into three subpopulations and
genotyped for the gene responsible for juicy meat in a delicacy goat breed in
Yourland.
Observed numbers     AA     Aa     aa    Total
Subpopulation 1     125    250    125      500
Subpopulation 2      55     30     15      100
Subpopulation 3      80    440    480    1,000
Total population    260    720    620    1,600
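Subpopulation 1
$$P_1 = \frac{125}{500} = 0.25;\quad H_1 = \frac{250}{500} = 0.50;\quad Q_1 = \frac{125}{500} = 0.25;\quad p_1 = P_1 + \tfrac{1}{2}H_1 = 0.5;\quad q_1 = 1 - p_1 = 0.5$$
Subpopulation 2
$$P_2 = \frac{55}{100} = 0.55;\quad H_2 = \frac{30}{100} = 0.30;\quad Q_2 = \frac{15}{100} = 0.15;\quad p_2 = P_2 + \tfrac{1}{2}H_2 = 0.7;\quad q_2 = 1 - p_2 = 0.3$$
Subpopulation 3
$$P_3 = \frac{80}{1000} = 0.08;\quad H_3 = \frac{440}{1000} = 0.44;\quad Q_3 = \frac{480}{1000} = 0.48;\quad p_3 = P_3 + \tfrac{1}{2}H_3 = 0.3;\quad q_3 = 1 - p_3 = 0.7$$
Total population
$$P_0 = \frac{260}{1600} = 0.1625;\quad H_0 = \frac{720}{1600} = 0.45;\quad Q_0 = \frac{620}{1600} = 0.3875$$
$$p_0 = P_0 + \tfrac{1}{2}H_0 = 0.3875;\quad q_0 = 1 - p_0 = 0.6125$$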
Expected numbers     AA          Aa          aa          Total
Subpopulation 1      125         250         125           500
Subpopulation 2       49          42           9           100
Subpopulation 3       90         420         490         1,000
Total population     240.2496    759.5008    600.2496    1,600
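Expected frequency:
$$\hat{P}_1 = p_1^2 = 0.5^2 = 0.25;\quad \hat{H}_1 = 2p_1q_1 = 2 \times 0.5 \times 0.5 = 0.50;\quad \hat{Q}_1 = q_1^2 = 0.5^2 = 0.25$$
$$\hat{P}_2 = p_2^2 = 0.7^2 = 0.49;\quad \hat{H}_2 = 2p_2q_2 = 2 \times 0.7 \times 0.3 = 0.42;\quad \hat{Q}_2 = q_2^2 = 0.3^2 = 0.09$$
$$\hat{P}_3 = p_3^2 = 0.3^2 = 0.09;\quad \hat{H}_3 = 2p_3q_3 = 2 \times 0.3 \times 0.7 = 0.42;\quad \hat{Q}_3 = q_3^2 = 0.7^2 = 0.49$$
$$\hat{P}_0 = p_0^2 = 0.3875^2 = 0.150156;\quad \hat{H}_0 = 2p_0q_0 = 2 \times 0.3875 \times 0.6125 = 0.474688;$$
$$\hat{Q}_0 = q_0^2 = 0.6125^2 = 0.375156$$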
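$$F_1 = 1 - \frac{H_1}{2p_1q_1} = 1 - \frac{0.50}{0.50} = 0$$
$$F_2 = 1 - \frac{H_2}{2p_2q_2} = 1 - \frac{0.30}{0.42} = 0.2857$$
$$F_3 = 1 - \frac{H_3}{2p_3q_3} = 1 - \frac{0.44}{0.42} = -0.0476$$
$$F_0 = 1 - \frac{H_0}{2p_0q_0} = 1 - \frac{0.450000}{0.474688} = 0.0520$$
$$H_T = 2p_0q_0 = 2 \times 0.3875 \times 0.6125 = 0.474688$$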
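With $H_I = \frac{720}{1600} = 0.450$ (observed heterozygosity) and
$H_S = \frac{500(0.50) + 100(0.42) + 1000(0.42)}{1600} = 0.445$ (average expected heterozygosity within subpopulations):
$$F_{IS} = 1 - \frac{H_I}{H_S} = 1 - \frac{0.450}{0.445} = -0.0112$$
$$F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{0.445}{0.475} = 0.0632$$
$$F_{IT} = 1 - \frac{H_I}{H_T} = 1 - \frac{0.450}{0.475} = 0.0526$$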
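Verification
$$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$$
With $F_{IS} = -0.0112$, $F_{ST} = 0.0632$ and $F_{IT} = 0.0526$: $1 - 0.0526 = 0.9474$ and
$(1 + 0.0112)(1 - 0.0632) = 0.9473$, which agree up to rounding.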
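The hand calculation in the practical example can be checked with a short script. This is a minimal sketch, assuming that $H_S$ is the size-weighted average of the subpopulation expected heterozygosities and that $H_I$ is the pooled observed heterozygosity; the function names are illustrative and not part of the notes. Because the script does not round $H_T$ to 0.475, its $F_{ST}$ and $F_{IT}$ differ slightly in the third decimal from the hand-worked values.

```python
# Observed genotype counts (AA, Aa, aa) for each subpopulation in the example.
COUNTS = {
    "Subpopulation 1": (125, 250, 125),
    "Subpopulation 2": (55, 30, 15),
    "Subpopulation 3": (80, 440, 480),
}


def allele_freq(n_aa_dom, n_het, n_aa_rec):
    """Return (p, q) where p is the frequency of allele A."""
    n = n_aa_dom + n_het + n_aa_rec
    p = (n_aa_dom + 0.5 * n_het) / n
    return p, 1.0 - p


def f_statistics(counts):
    """Compute H_I, H_S, H_T and Wright's F_IS, F_ST, F_IT."""
    total_n = sum(sum(c) for c in counts.values())

    # H_I: observed heterozygosity over all individuals.
    h_i = sum(c[1] for c in counts.values()) / total_n

    # H_S: expected heterozygosity within subpopulations, weighted by size
    # (assumption: weighting by subpopulation size, as in the worked example).
    h_s = 0.0
    for c in counts.values():
        p, q = allele_freq(*c)
        h_s += sum(c) * 2 * p * q
    h_s /= total_n

    # H_T: expected heterozygosity from the pooled allele frequencies.
    pooled = [sum(c[i] for c in counts.values()) for i in range(3)]
    p0, q0 = allele_freq(*pooled)
    h_t = 2 * p0 * q0

    return {
        "H_I": h_i, "H_S": h_s, "H_T": h_t,
        "F_IS": 1 - h_i / h_s,
        "F_ST": 1 - h_s / h_t,
        "F_IT": 1 - h_i / h_t,
    }


if __name__ == "__main__":
    for name, value in f_statistics(COUNTS).items():
        print(f"{name} = {value:.4f}")
```

The identity $1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$ holds exactly for the unrounded values, which is a useful sanity check on any hand calculation.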
QUANTITATIVE GENETICS
Let $+a$, $d$ and $-a$ be the arbitrary genotypic values for A1A1, A1A2 and A2A2,
respectively. The difference between the two homozygotes is 2a. The value of a is
a deviation from 0 (the mid-point), which is the average of the two homozygotes. The
heterozygote, A1A2, has a value of d = ak, where k is the degree of dominance.
The alleles A1 and A2 behave in a completely additive manner when k = 0. When
k = +1, the A1 allele is completely dominant over the A2 allele; and when k = -1,
the A2 allele is completely dominant over the A1 allele. k > +1 means
overdominance, and k < -1 means underdominance.
Let's look at a data set. The genotypic values of an AluI polymorphic site at
the 5'-region of the bovine growth hormone receptor gene for milk fat are as
follows:
AluI (-/-): -25 designated (A2A2)
AluI(+/-): -23 designated (A1A2)
AluI(+/+): -10 designated (A1A1)
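The values a, d and k for the AluI data can be worked out directly from the definitions above. The snippet below is a small sketch; the function name is hypothetical and the only inputs are the three genotypic values listed.

```python
def additive_dominance(g11, g12, g22):
    """Return (midpoint, a, d, k) from the genotypic values of
    A1A1 (g11), A1A2 (g12) and A2A2 (g22)."""
    midpoint = (g11 + g22) / 2   # average of the two homozygotes
    a = g11 - midpoint           # half the difference between homozygotes
    d = g12 - midpoint           # heterozygote deviation from the midpoint
    k = d / a                    # degree of dominance, since d = a * k
    return midpoint, a, d, k


# AluI(+/+) = A1A1 = -10, AluI(+/-) = A1A2 = -23, AluI(-/-) = A2A2 = -25
midpoint, a, d, k = additive_dominance(-10, -23, -25)
print(f"midpoint = {midpoint}, a = {a}, d = {d}, k = {k:.3f}")
```

For these data the heterozygote lies below the midpoint, so k is negative (between -1 and 0), which corresponds to partial dominance of the A2 allele.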
Population mean
$$\mu = \frac{p^2 a + 2pqd - q^2 a}{p^2 + 2pq + q^2}$$
The denominator is equal to 1. The numerator can be rewritten as:
$$a(p^2 - q^2) + 2pqd$$
$$p^2 - q^2 = (p + q)(p - q)$$
Therefore, the population mean can be written as:
$$\mu = a(p - q) + 2pqd$$
The homozygotes contribute a(p - q) and the heterozygote contributes 2pqd to the
population mean.
From Fig 9, the population mean depends on allele frequency. The population
mean decreases with increasing frequency of the unfavorable allele (Fig 9a). The
population mean increases with increasing frequency of the favorable allele (Fig
9b).
Population mean under additivity (k=0):
We have already established that d=ka, therefore, when k=0, d=0.
$$\mu = a(p - q)$$
Since p = 1 - q, $\mu = a(1 - q - q) = a(1 - 2q)$
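To see how the population mean moves with allele frequency, the closed form can be compared against the frequency-weighted average of the three genotypic values. This is a sketch only; the values a = 10 and d = 5 are illustrative assumptions, and the function names are not from the notes.

```python
def population_mean(p, a, d):
    """Closed form: mean = a(p - q) + 2pqd, with q = 1 - p."""
    q = 1.0 - p
    return a * (p - q) + 2 * p * q * d


def population_mean_from_genotypes(p, a, d):
    """Same quantity as a frequency-weighted average of +a, d and -a."""
    q = 1.0 - p
    return p * p * a + 2 * p * q * d + q * q * (-a)


if __name__ == "__main__":
    a, d = 10.0, 5.0  # illustrative values (assumption)
    for p in (0.1, 0.3, 0.5, 0.7, 0.9):
        closed = population_mean(p, a, d)
        weighted = population_mean_from_genotypes(p, a, d)
        additive_only = population_mean(p, a, 0.0)  # k = 0, so d = 0
        print(f"p={p:.1f}  mean={closed:6.2f}  check={weighted:6.2f}  "
              f"additive={additive_only:6.2f}")
```

Under additivity the mean is a straight line in q, $a(1 - 2q)$, whereas dominance adds the curved term 2pqd that is largest at p = q = 0.5.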
Genetic Model
The genotypic value of an individual can be written in term of the genetic
decomposition of the genotype.
$$G = A + D + I$$
The genotypic value equals the sum of the breeding value, A, the dominance deviation, D, and
the epistatic deviation, I. For simplicity, we will ignore the epistatic deviation and
concentrate on the breeding value (additive value) and the dominance deviation.
$$G = A + D$$
Genotypic value, G
The genotypic value can be written as a deviation from the population mean.
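$$G_{11} = a - \mu$$
$$G_{12} = d - \mu$$
$$G_{22} = -a - \mu$$
$$G_{11} = a - [a(p - q) + 2pqd]$$
$$= a - ap + aq - 2pqd = a(1 - p + q) - 2pqd$$
$$= a(1 - 1 + q + q) - 2pqd$$
$$G_{11} = 2q(a - pd)$$
Subsequently,
$$G_{12} = a(q - p) + d(1 - 2pq)$$
and
$$G_{22} = -2p(a + qd)$$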
BREEDING (Additive) VALUES (A)
An individual's breeding value can be said to be the sum of the additive effects of
the individual's alleles. The concept of additive effects arises from the fact that
parents pass on their alleles to their progeny and not their genotype. Therefore, the
value of an individual judged by the mean value of its progeny is called the
individual's breeding value. The breeding value for an individual at a locus is
defined as the sum of the additive effects of the alleles at the locus.
When there are only two alleles at a locus, it is more convenient to express their
additive effects in terms of the additive or average effect of allele substitution.
$$\alpha_1 = q[a + d(q - p)]$$
$$\alpha_2 = -p[a + d(q - p)]$$
The effect of substituting one allele with the other is $\alpha = \alpha_1 - \alpha_2$, that is, the
average change in the genotypic value when an A2 allele is substituted
with an A1 allele.
$$\alpha = \alpha_1 - \alpha_2 = qa + q^2d - pqd + pa + pqd - p^2d = qa + pa + q^2d - p^2d$$
$$\alpha = a(p + q) + d(q^2 - p^2)$$
Note that $p + q = 1$ and $(q^2 - p^2) = (q + p)(q - p)$, so
$$\alpha = a + d(q - p)$$
An individual's breeding value A is the sum of all additive effects of its alleles.
When mating is random, the breeding value of a genotype for an individual is
twice the expected mean deviation of its progeny from the population mean. The
deviation is multiplied by two since only one half of the parental alleles are
transmitted to each progeny. Therefore, we can estimate the breeding value of an
individual by mating it to random individuals from the population and taking
twice the deviation of its offspring mean from the population mean. Breeding
values can be estimated under several scenarios.

Genotype          A1A1                     A1A2                                 A2A2
Frequency         $p^2$                    $2pq$                                $q^2$
Breeding value    $2\alpha_1 = 2q\alpha$   $\alpha_1 + \alpha_2 = (q - p)\alpha$   $2\alpha_2 = -2p\alpha$

Mean breeding value:
$$2p^2q\alpha + 2pq(q - p)\alpha - 2pq^2\alpha = 2pq\alpha(p + q - p - q) = 0$$
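From the genetic model, we can calculate the dominance deviation as:
$$D = G - A$$
Since we have already derived both G and A, we can deduce D. Dominance
deviations arise from interaction between alleles at a locus. In the absence of
dominance, G = A.
Let's write G in terms of $\alpha$:
$$G_{11} = 2q(a - pd), \quad \alpha = a + d(q - p)$$
$$a = \alpha - qd + pd$$
$$G_{11} = 2qa - 2pqd$$
$$G_{11} = 2q(\alpha - qd + pd) - 2pqd = 2q\alpha - 2q^2d + 2pqd - 2pqd$$
Therefore,
$$G_{11} = 2q(\alpha - qd)$$
Subsequently,
$$G_{12} = (q - p)\alpha + 2pqd$$
and
$$G_{22} = -2p(\alpha + pd)$$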
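Genotype                 A1A1                   A1A2                     A2A2
Frequency                $p^2$                  $2pq$                    $q^2$
Genotypic value, G       $2q(\alpha - qd)$      $(q - p)\alpha + 2pqd$   $-2p(\alpha + pd)$
Breeding value, A        $2q\alpha$             $(q - p)\alpha$          $-2p\alpha$
Dominance, D = G - A     $-2q^2d$               $2pqd$                   $-2p^2d$

Mean dominance deviation:
$$-2p^2q^2d + 4p^2q^2d - 2p^2q^2d = 0$$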
( 2
2 = 2 (
) = 2 2
or
(( )2
2 = 2
or
= ( )2
2
However, if = then
2
= 2
GENOTYPIC VARIATION
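$$\sigma_G^2 = p^2 G_{11}^2 + 2pq\,G_{12}^2 + q^2 G_{22}^2$$
$$G = \begin{cases} 2q(\alpha - qd) & A_1A_1 \\ (q - p)\alpha + 2pqd & A_1A_2 \\ -2p(\alpha + pd) & A_2A_2 \end{cases}$$
$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$
Breeding values and dominance deviations are uncorrelated, so we can drop the covariance
from the above model. Therefore,
$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$$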
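Additive genetic variance, $\sigma_A^2$
We can use the same logic used in calculating the genetic variance to calculate the
additive genetic variance. Since we have already calculated A as a deviation from
the population mean $\mu$, then,
$$\sigma_A^2 = E(A^2)$$
$$A = \begin{cases} 2\alpha_1 = 2q\alpha & A_1A_1 \\ \alpha_1 + \alpha_2 = (q - p)\alpha & A_1A_2 \\ 2\alpha_2 = -2p\alpha & A_2A_2 \end{cases}$$
$$\sigma_A^2 = 2pq\alpha^2$$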
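Dominance variance, $\sigma_D^2$
We have already calculated D as a deviation from the population mean $\mu$,
therefore,
$$\sigma_D^2 = E(D^2)$$
$$D = \begin{cases} -2q^2d & A_1A_1 \\ 2pqd & A_1A_2 \\ -2p^2d & A_2A_2 \end{cases}$$
$$\sigma_D^2 = (2pqd)^2$$
$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$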
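The variance partition can be verified numerically by computing each variance as a frequency-weighted mean of squared deviations and comparing it with the closed forms. The sketch below assumes Hardy-Weinberg genotype frequencies; the values a = 10 and d = 5 and the function name are illustrative assumptions.

```python
def genetic_variances(p, a, d):
    """Return (V_A, V_D, V_G) for one locus with two alleles,
    computed from the per-genotype breeding values and dominance deviations."""
    q = 1.0 - p
    alpha = a + d * (q - p)                       # average effect of substitution
    freqs = (p * p, 2 * p * q, q * q)             # A1A1, A1A2, A2A2
    breeding = (2 * q * alpha, (q - p) * alpha, -2 * p * alpha)
    dominance = (-2 * q * q * d, 2 * p * q * d, -2 * p * p * d)
    genotypic = tuple(b + dd for b, dd in zip(breeding, dominance))

    # All three sets of values are deviations from the population mean,
    # so each variance is simply the expectation of the squared value.
    v_a = sum(f * b * b for f, b in zip(freqs, breeding))
    v_d = sum(f * dd * dd for f, dd in zip(freqs, dominance))
    v_g = sum(f * g * g for f, g in zip(freqs, genotypic))
    return v_a, v_d, v_g


if __name__ == "__main__":
    a, d = 10.0, 5.0  # illustrative values (assumption)
    for p in (0.1, 0.25, 0.5, 0.75, 0.9):
        v_a, v_d, v_g = genetic_variances(p, a, d)
        print(f"p={p:.2f}  VA={v_a:7.3f}  VD={v_d:7.3f}  VG={v_g:7.3f}")
```

Because V_G computed from the genotypic values equals V_A + V_D at every allele frequency, the covariance between A and D must be zero, which is why it drops out of the model.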
Fig 10 The genotypic (VG), additive (VA) and dominance (VD) variances at
different allele frequency
Genetic parameter estimations under different allele frequencies

                            q = 0.1                     q = 0.5                     q = 0.8
                            A1A1    A1A2    A2A2        A1A1    A1A2    A2A2        A1A1    A1A2    A2A2
Egg weight                  50      45      30          50      45      30          50      45      30
Genotypic value, G          10      5       -10         10      5       -10         10      5       -10
Genotypic frequency, f      0.81    0.18    0.01        0.25    0.50    0.25        0.04    0.32    0.64
Additive effect
  $\alpha_1$ =              0.6                         5                           10.4
  $\alpha_2$ =              -5.4                        -5                          -2.6
Dominance deviation, D      -0.1    0.9     -8.1        -2.5    2.5     -2.5        -6.4    1.6     -0.4
Mean dominance deviation    -0.081  0.162   -0.081      -0.625  1.25    -0.625      -0.256  0.512   -0.256
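The entries of this table follow from the formulas derived earlier, and recomputing them is a useful exercise. The sketch below takes a = 10 and d = 5 from the egg weights (midpoint 40) and uses $\alpha_1 = q\alpha$, $\alpha_2 = -p\alpha$ with $\alpha = a + d(q - p)$; the function name and dictionary layout are illustrative.

```python
def parameters_at(q, a=10.0, d=5.0):
    """Genetic parameters for the egg-weight example at A2 frequency q.
    a and d come from the egg weights 50, 45, 30 (midpoint 40)."""
    p = 1.0 - q
    alpha = a + d * (q - p)
    freqs = (p * p, 2 * p * q, q * q)                 # A1A1, A1A2, A2A2
    dom = (-2 * q * q * d, 2 * p * q * d, -2 * p * p * d)
    return {
        "frequency": freqs,
        "alpha1": q * alpha,
        "alpha2": -p * alpha,
        "dominance": dom,
        # frequency-weighted dominance deviations; they sum to zero
        "weighted_dominance": tuple(f * x for f, x in zip(freqs, dom)),
    }


if __name__ == "__main__":
    for q in (0.1, 0.5, 0.8):
        row = parameters_at(q)
        print(f"q = {q}")
        for key, value in row.items():
            if isinstance(value, tuple):
                print(f"  {key}: " + ", ".join(f"{v:.3f}" for v in value))
            else:
                print(f"  {key}: {value:.3f}")
```

Checking that the weighted dominance deviations sum to zero in each column is a quick way to catch arithmetic slips when filling in such a table by hand.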
MOLECULAR GENETICS APPLIED TO ANIMAL BREEDING
GENOME ORGANIZATION
What is a genome?
A genome is an organism's complete set of DNA, including all of its genes. Each
genome contains all of the information needed to build and maintain that organism.
The genome is made up of the DNA in chromosomes as well as the DNA in
mitochondria.
The genome contains the instructions, or blueprint, for all activity in an organism. The
instructions are written in the four-letter language of DNA, i.e. Adenine, Cytosine,
Thymine and Guanine (shortened to A, C, T and G). Almost every cell in a
eukaryotic organism contains a complete copy of these instructions. The genetic
instructions are stored in pairs of chromosomes. Each chromosome contains genes,
which contain the direct instructions for a cell to make a protein. The genome
contains coding sequences (genes) and non-coding sequences of DNA.
The genome contains:
1. STRUCTURAL GENES: DNA segments that code for specific
RNAs or proteins. They encode mRNAs, tRNAs, snRNAs, scRNAs, etc.
2. FUNCTIONAL SEQUENCES: Regulatory sequences that occur as regulatory
elements (initiation sites, promoter regions, terminator regions, etc.)
3. NON-FUNCTIONAL SEQUENCES: Introns, repetitive sequences, and all
the unknowns
We know that about 5-10% of the genome encodes genes. What is the function of
the other 90%? So far there are no good answers. In the 1990s, the non-coding
regions were referred to as junk DNA, but nobody uses the term junk DNA
anymore. Our knowledge of the genome keeps improving, and some of the so-called
junk DNA contains elements that control gene transcription. Non-coding RNA,
e.g. microRNA, can affect gene transcription depending on its location. A fairly
balanced article on junk DNA in the post-ENCODE era and the controversy that ensued
can be found in PLoS Genetics
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004351
Ribonucleic acid (RNA), unlike DNA, is a single strand that folds onto itself rather
than a paired double strand. In RNA, the pyrimidine thymine is replaced by uracil.
One of the universal functions of RNA is protein synthesis, where messenger RNA
(mRNA) molecules direct the assembly of proteins on ribosomes. This process
uses transfer RNA (tRNA) molecules to deliver amino acids to the ribosome,
where ribosomal RNA (rRNA) links amino acids together to form proteins.
GENE
A gene was defined at least four decades before the DNA structure was discovered.
To a population geneticist, a gene is the basic unit of heredity which comes in
pairs, and one member of each pair is transmitted from parent to progeny. A more refined
definition of a gene is a sequence (instruction manual) on a chromosome that encodes a
protein or a polypeptide.
A gene has a 5' untranslated region (5' UTR), or leader sequence, that extends to
the position of the first codon used in translation. The 3' UTR is the portion of an
mRNA from the 3' end of the mRNA (trailer sequence) to the position of the last
codon used in translation. The frame of a gene consists of exons and introns.
An exon is any nucleotide sequence encoded by a gene that remains within the final
mature RNA product of that gene.
An intron is a noncoding part of a gene that is spliced out before the RNA is
translated into a protein.
MOLECULAR MARKERS
Genome Studies
1. Improve annotation of the genome
2. Function and regulation of coding genes
3. Posttranslational regulation of genes
4. Extract potential functions from non-coding and intergenic DNA
To date, a large proportion of genome studies have been possible because of
genetic markers.
GENETIC MARKER: A DNA sequence that can be detected and whose inheritance
can be monitored. The three properties that define a genetic marker are: locus
specificity, polymorphism and ease of genotyping. A marker is said to be
polymorphic when it exists in more than one form.
There are several laboratory methods used to detect the aforementioned genetic
markers. Those methods would not be the subject of this course. The most
commonly used markers in farm animal studies are microsatellites and SNPs.
SELECTION THEORY
Selection response (R) is how much gain you make when mating selected parents. Response
to selection can be evaluated in the short- or long-term.
1. How heritable is the trait under selection (i.e. the trait in the breeding goal)?
2. How much genetic variation for that trait is there in the population?
3. What is the average accuracy of the EBV, and thus the accuracy of selection?
4. What proportion of the animals will be selected for breeding?
5. In case genetic gain is to be expressed per year, rather than per generation: how long is a
generation?
$$R = h^2 S$$
LONG-TERM RESPONSE: As selection proceeds, allele frequencies change and the base
population genetic parameters fail to predict long-term response.
The within-generation mean: This reflects the difference between the mean of the entire
population and that of the selected parents. Selection can cause changes in the distribution
of phenotypes. The within-generation change is what is referred to as the Selection
Differential (S).
$$S = \mu_S - \mu_0$$
where $\mu_0$ is the population mean (Generation 0) before selection and $\mu_S$ is the mean of the
selected parents that produce the progeny population (Generation 1).
The between-generation mean: This is the response to selection, R, which measures the change
in mean between the population before and after selection.
$$R = \mu_1 - \mu_0$$
Weighted selection differential: The joint effects of natural and artificial selection affect
selection response. Natural selection is always on the side of fitness and can be in the same
direction as, or oppose, artificial selection.
Let's examine the unweighted and weighted selection differentials and ascertain how they are
influenced by natural selection.
                   Male (ram)    Female (ewe)
Population mean    24 kg         22 kg

Mating    Male (ram)    Female (ewe)    # of offspring measured
1         22            20              2
2         35            29              1
3         23            22              1
4         20            24              2
5         24            20              2
6         30            27              2
7         30            30              0
8         37            22              0
9         22            20              6
10        19            20              10
                                        N=26
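The two selection differentials for this table can be sketched as follows. The calculation is an assumption about the intended method: the ram and ewe columns are read as parental weights in kg, the unweighted differential treats every selected parent equally, the weighted differential weights each parent by the number of its offspring measured, and the two sexes are averaged. Function and variable names are illustrative.

```python
# (ram weight kg, ewe weight kg, offspring measured) for the ten matings
MATINGS = [
    (22, 20, 2), (35, 29, 1), (23, 22, 1), (20, 24, 2), (24, 20, 2),
    (30, 27, 2), (30, 30, 0), (37, 22, 0), (22, 20, 6), (19, 20, 10),
]
MEAN_RAM, MEAN_EWE = 24.0, 22.0  # population means before selection


def selection_differentials(matings, mean_ram, mean_ewe):
    """Return unweighted and offspring-weighted selection differentials."""
    n_pairs = len(matings)
    n_offspring = sum(m[2] for m in matings)

    # Unweighted: simple mean of the selected parents minus the population mean.
    s_ram = sum(m[0] for m in matings) / n_pairs - mean_ram
    s_ewe = sum(m[1] for m in matings) / n_pairs - mean_ewe

    # Weighted: each parent counts once per measured offspring, so parents
    # that leave no offspring contribute nothing.
    sw_ram = sum(m[0] * m[2] for m in matings) / n_offspring - mean_ram
    sw_ewe = sum(m[1] * m[2] for m in matings) / n_offspring - mean_ewe

    return {
        "unweighted": (s_ram + s_ewe) / 2,
        "weighted": (sw_ram + sw_ewe) / 2,
        "unweighted_by_sex": (s_ram, s_ewe),
        "weighted_by_sex": (sw_ram, sw_ewe),
    }


if __name__ == "__main__":
    result = selection_differentials(MATINGS, MEAN_RAM, MEAN_EWE)
    for key, value in result.items():
        print(key, value)
```

Under these assumptions the unweighted differential is positive while the weighted one is negative, because the lightest parents left the most offspring: natural selection is opposing the artificial selection for heavier animals.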
Prediction of response to selection from the proportion selected: Selection intensity (i)
The selection differential is of limited use when comparing the strength of selection on
different traits or in different populations. When planning a selection program, it is useful to
predict the genetic change from a given selection strategy before even selecting the parental
population to breed. This is possible when truncation selection (selection of individuals above or
below a certain truncation point or threshold) is practiced. The selection differential can be
derived from the distribution of predicted breeding values or phenotypic values and knowledge
of the proportion of selected individuals. The standardized selection differential, usually called
the selection intensity (i), is the selection differential expressed as a fraction of the phenotypic
standard deviation, $i = S/\sigma_P$. The selection intensity is a more useful measure for predicting
selection response or comparing different selection strategies or response in different populations.
$$R = i h^2 \sigma_P$$
The breeder's equation theoretically holds for a single generation of selection from an unselected
base population. The reliability of using the breeder's equation to predict response to selection
beyond one generation depends on:
1. The accuracy of the heritability estimate
2. Absence of environmental changes between generations
3. Insignificant change in the heritability estimate from that of the base population
From population genetics, we learned that heritability depends on allele frequency. Selection
changes allele frequency. Therefore, it should be expected that heritability will change with
selection. Thus, in the strictest sense, the breeder's equation is valid only for one generation.
However, heritability is not expected to change significantly in the first few generations of
selection and, in practice, the breeder's equation has been used to predict short-term response (up
to 3-5 generations of selection).
Accuracy
The breeder's equation can be extended beyond choosing an individual solely on the basis of its
phenotype.
$$R = i h^2 \sigma_P = i h (h \sigma_P) = i h \sigma_A$$
where h is the correlation between the phenotypic and breeding values, $h = r_{AP}$, which quantifies
the ability to predict the breeding value of an individual from the individual's phenotype. This is
in essence the accuracy of the selection scheme used to select parents. We can therefore express
the breeder's equation in terms of accuracy of selection as:
$$R = i\, r_{AP}\, \sigma_A$$
The EBV of an animal can be estimated by regressing the animal's BV on its phenotype. With a
single measurement on an animal, the regression coefficient, b, equals the heritability $h^2$:
$$b = \frac{\mathrm{cov}(A, P)}{\sigma_P^2} = \frac{\sigma_A^2}{\sigma_P^2} = h^2$$
The EBV of an animal is $EBV = b(P - \mu) = g h^2 (P - \mu)$ and its accuracy is $r = \sqrt{bg} = g\sqrt{h^2}$,
where P is the phenotypic value of the trait, $\mu$ is the population mean, and g the relationship
between the individual(s) being measured and the individual for which we are estimating BV.
The value of g is 1.0 for an individual's own performance. It is 0.5 for full sibs, progeny or
parents and 0.25 for half sibs or grandparents.
Example 1:
Daily feed consumption (FC) of two individuals A and B are 125g and 135g respectively. The
mean FC is 120g, with heritability of 0.20. Predict the EBV and accuracy of A and B for FC.
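A: $EBV = 1.0 \times 0.20 \times (125 - 120) = 1.0$ g; $r = 1.0 \times \sqrt{0.20} = 0.45$
B: $EBV = 1.0 \times 0.20 \times (135 - 120) = 3.0$ g; $r = 1.0 \times \sqrt{0.20} = 0.45$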
Individual B has a higher EBV for FC than A, but both estimates have the same accuracy.
Some traits can be measured several times during an animal's lifetime, for example feed
consumption, body weight and egg production. If a trait is measured several times during an
animal's life, each value should be used in an estimate of breeding value. The relationship
between repeated records, termed repeatability, becomes important. Repeatability ($r_e$) is a
measure of the reliability or strength of the relationship between repeated measurements on an
individual. When using repeated measurements on an individual, g is still 1.0 since the animal being
measured and the animal the BV is obtained for are still the same. The value of b is now a
function of the number of records (n), heritability ($h^2$) and repeatability ($r_e$).
$$b = \frac{n h^2}{1 + (n - 1) r_e} \quad \text{and} \quad r = \sqrt{\frac{n h^2}{1 + (n - 1) r_e}}$$
Example 2:
Assume that the daily feed intake of individual A (128 g) is an average of 5 measurements, with
a repeatability of 0.40. Predict the EBV and accuracy of A.
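$$EBV = \frac{n h^2}{1 + (n - 1) r_e}(\bar{P} - \mu) = \frac{5 \times 0.20}{1 + (5 - 1)0.40}(128 - 120) = 3.08$$
$$r = \sqrt{\frac{n h^2}{1 + (n - 1) r_e}\, g} = \sqrt{\frac{5 \times 0.20}{1 + (5 - 1)0.40} \times 1.0} = 0.62$$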
Repeated measurements on A improve its EBV and accuracy for feed intake.
Number of measurements
Heritability Repeatability 1 5 10
Traits with low heritability benefit from multiple measurements since each additional record
contributes toward the total information available, especially when the repeatability is low. If the
repeatability is high, multiple measurements do not add much to the accuracy of EBV.
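The effect of the number of records on accuracy can be tabulated with the formulas above. This is a minimal sketch: the heritability and repeatability combinations in the loop are hypothetical values chosen for illustration, not data from the notes, and the function names are assumptions.

```python
from math import sqrt


def record_weight(n, h2, re):
    """Regression coefficient b = n*h2 / (1 + (n - 1)*re) for n own records."""
    return n * h2 / (1 + (n - 1) * re)


def ebv_own_records(mean_p, mu, n, h2, re):
    """EBV from the mean of n records on the animal itself (g = 1)."""
    return record_weight(n, h2, re) * (mean_p - mu)


def accuracy_own_records(n, h2, re, g=1.0):
    """Accuracy r = sqrt(b * g); g = 1 for the animal's own records."""
    return sqrt(record_weight(n, h2, re) * g)


if __name__ == "__main__":
    # Example 1: single records for A and B (repeatability is irrelevant when n = 1)
    print("A:", ebv_own_records(125, 120, 1, 0.20, 0.40), accuracy_own_records(1, 0.20, 0.40))
    print("B:", ebv_own_records(135, 120, 1, 0.20, 0.40), accuracy_own_records(1, 0.20, 0.40))
    # Example 2: mean of five records on A
    print("A, 5 records:", ebv_own_records(128, 120, 5, 0.20, 0.40),
          accuracy_own_records(5, 0.20, 0.40))

    # Hypothetical (h2, re) combinations to show the pattern
    print("h2    re    n=1    n=5    n=10")
    for h2, re in [(0.10, 0.20), (0.10, 0.60), (0.40, 0.50), (0.40, 0.80)]:
        accs = [accuracy_own_records(n, h2, re) for n in (1, 5, 10)]
        print(f"{h2:.2f}  {re:.2f}  " + "  ".join(f"{x:.3f}" for x in accs))
```

As n grows, the accuracy approaches $\sqrt{h^2 / r_e}$, so the gain from extra records is largest when the repeatability is low relative to what a single record already tells us.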
3. Information from Relatives
In a closed population, there are bound to be full sibs (FS) (both parents in common) and half
sibs (HS) (one parent in common) that provide additional information in estimating BV.
Siblings have a proportion of their alleles (genes) in common. Full sibs have half of their alleles
in common, and half sibs have a quarter of their alleles in common. In pigs, cattle, sheep and goats,
siblings are initially reared together, and the common environment among siblings also creates
additional similarity (maternal environment, temperature, food supply). In commercial
poultry, however, similarity due to common environment is non-existent. In non-commercial poultry,
where the hen incubates her own eggs and broods her chicks, similarity of siblings due to common
environment is in play when estimating BV. The similarity among siblings, t, depends on the
siblings involved.
$$t_{FS} = \tfrac{1}{2}h^2 + c^2; \quad t_{HS} = \tfrac{1}{4}h^2 + c^2$$
where $c^2$ is the environmental correlation among sibs. The regression coefficient is given as:
$$EBV = \frac{n g h^2}{1 + (n - 1)t}(\bar{P} - \mu) \quad \text{and} \quad r = \sqrt{\frac{n g h^2}{1 + (n - 1)t}\, g}$$
where n is the number of siblings, t is the correlation among sibs, and g is the genetic relationship
among sibs. For full sibs, g = 1/2, and for half sibs, g = 1/4.
Example 3:
Individual A has 5 half sibs with an average FC of 128 g. Predict the EBV and accuracy of A when
the environmental correlation c2 is (a) 0, and (b) 0.125. The population mean for FC is 120 g and h2
is 0.20. Assume (c) that the 5 records were obtained from full sibs, and c2 is 0.125.
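(a) Half sibs, $c^2 = 0$, $t = 0.25 \times 0.20 = 0.05$:
$$EBV = \frac{n g h^2}{1 + (n - 1)t}(\bar{P} - \mu) = \frac{5 \times 0.25 \times 0.20}{1 + (5 - 1)0.05}(128 - 120) = 1.67$$
$$r = \sqrt{\frac{n g h^2}{1 + (n - 1)t}\, g} = \sqrt{\frac{5 \times 0.25 \times 0.20}{1 + (5 - 1)0.05} \times 0.25} = 0.23$$
(b) Half sibs, $c^2 = 0.125$, $t = 0.25 \times 0.20 + 0.125 = 0.175$:
$$EBV = \frac{n g h^2}{1 + (n - 1)t}(\bar{P} - \mu) = \frac{5 \times 0.25 \times 0.20}{1 + (5 - 1)0.175}(128 - 120) = 1.18$$
$$r = \sqrt{\frac{n g h^2}{1 + (n - 1)t}\, g} = \sqrt{\frac{5 \times 0.25 \times 0.20}{1 + (5 - 1)0.175} \times 0.25} = 0.19$$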
When there is no measurement on the animal, EBV predicted from relatives is low. The higher
the value of t the lower the EBV.
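(c) Full sibs, $c^2 = 0.125$, $t = 0.50 \times 0.20 + 0.125 = 0.225$:
$$EBV = \frac{n g h^2}{1 + (n - 1)t}(\bar{P} - \mu) = \frac{5 \times 0.50 \times 0.20}{1 + (5 - 1)0.225}(128 - 120) = 2.11$$
$$r = \sqrt{\frac{n g h^2}{1 + (n - 1)t}\, g} = \sqrt{\frac{5 \times 0.50 \times 0.20}{1 + (5 - 1)0.225} \times 0.50} = 0.36$$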
Sib information never results in really high accuracy. Full sib information is limited by
environmental correlations among the sibs. It should not replace the individual's own record if it
can be obtained. Rather, it should be used to supplement the information on the individual if sib
information happens to be available.
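The sib calculations in Example 3 can be reproduced with a few helper functions. This is a sketch under two stated assumptions: the sib correlation is taken as $t = g h^2 + c^2$, and the accuracy is taken as $\sqrt{b \cdot g}$ with $b = n g h^2 / (1 + (n - 1)t)$, the same form that reproduces the accuracy in Example 2. The function names are illustrative.

```python
from math import sqrt


def sib_correlation(g, h2, c2):
    """Correlation among sibs: t = g*h2 + c2 (g = 0.5 full sibs, 0.25 half sibs)."""
    return g * h2 + c2


def sib_weight(n, g, h2, c2):
    """Regression coefficient b = n*g*h2 / (1 + (n - 1)*t)."""
    t = sib_correlation(g, h2, c2)
    return n * g * h2 / (1 + (n - 1) * t)


def sib_ebv(mean_p, mu, n, g, h2, c2):
    """EBV of an animal predicted from the mean record of n sibs."""
    return sib_weight(n, g, h2, c2) * (mean_p - mu)


def sib_accuracy(n, g, h2, c2):
    """Accuracy r = sqrt(b * g) (assumed form, see lead-in)."""
    return sqrt(sib_weight(n, g, h2, c2) * g)


if __name__ == "__main__":
    cases = {
        "(a) half sibs, c2 = 0": (0.25, 0.0),
        "(b) half sibs, c2 = 0.125": (0.25, 0.125),
        "(c) full sibs, c2 = 0.125": (0.50, 0.125),
    }
    for label, (g, c2) in cases.items():
        ebv = sib_ebv(128, 120, 5, g, 0.20, c2)
        acc = sib_accuracy(5, g, 0.20, c2)
        print(f"{label}: EBV = {ebv:.2f}, accuracy = {acc:.2f}")
```

Increasing $c^2$ raises t and therefore shrinks both the EBV and the accuracy, which is why common-environment effects limit the value of full-sib information.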
Progeny testing
Using the mean of a parent's progeny to predict the parent's breeding value is an alternative
predictor of an individual's breeding value. The correlation between the mean of n progeny and
the breeding value of the parent is
$$r = \sqrt{\frac{n}{n + k}}, \quad k = \frac{4 - h^2}{h^2}$$
$$r = \sqrt{\frac{n h^2}{4 + h^2 (n - 1)}}$$
Example:
A breeder selects top 20% of sheep based on performance of 10 offspring. The heritability of
udder size is 0.10, with a phenotypic variance of 50. Predict the response to selection that the
breeder will achieve with this strategy. A selected proportion of 20% results in a selection
intensity of 1.4.
$$r = \sqrt{\frac{10 \times 0.10}{4 + 0.10(10 - 1)}}$$
The breeder is disappointed and wants more genetic gain. Predict how much improvement can be
achieved by selecting the top 10% instead of the top 20% for breeding. What
changed?
The breeder is still not completely satisfied because he wants more genetic gain and decides to
base the selection on the performance of 15 instead of 10 offspring. Predict the selection response
for this new situation. What changed?
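The three scenarios in this example can be compared with a short script. It is a sketch, assuming that response is predicted as $R = i\, r\, \sigma_A$ with $\sigma_A = \sqrt{h^2 \sigma_P^2}$ and that the selection intensity for truncation selection on a normally distributed trait is the ordinate at the truncation point divided by the proportion selected; the function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(proportion):
    """Selection intensity i for truncation selection on a normal trait:
    i = z / p, where z is the normal density at the truncation point."""
    std_normal = NormalDist()
    truncation_point = std_normal.inv_cdf(1 - proportion)
    return std_normal.pdf(truncation_point) / proportion


def progeny_test_accuracy(n, h2):
    """Accuracy of a progeny test on n offspring: sqrt(n*h2 / (4 + h2*(n - 1)))."""
    return sqrt(n * h2 / (4 + h2 * (n - 1)))


def response_per_generation(proportion, n, h2, var_p):
    """Predicted response R = i * r * sigma_A."""
    sigma_a = sqrt(h2 * var_p)
    return selection_intensity(proportion) * progeny_test_accuracy(n, h2) * sigma_a


if __name__ == "__main__":
    h2, var_p = 0.10, 50.0
    scenarios = {
        "top 20%, 10 offspring": (0.20, 10),
        "top 10%, 10 offspring": (0.10, 10),
        "top 20%, 15 offspring": (0.20, 15),
        "top 10%, 15 offspring": (0.10, 15),
    }
    for label, (prop, n) in scenarios.items():
        i = selection_intensity(prop)
        r = progeny_test_accuracy(n, h2)
        R = response_per_generation(prop, n, h2, var_p)
        print(f"{label}: i = {i:.3f}, r = {r:.3f}, R = {R:.3f}")
```

Selecting a smaller proportion changes only the selection intensity, while testing more offspring changes only the accuracy; the genetic standard deviation is the same in every scenario.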
The breeder's equation thus far calculates response to selection per generation. However, to
calculate the selection response per year, the generation interval, L, is required.
$$R_{year} = \frac{i\, r\, \sigma_A}{L}$$
The generation interval L can be calculated separately for males and females and averaged.
= 2.75 ;
$$L = \frac{2.65 + 2.90}{2} = 2.775$$
Selection Path
The selection strategy of males and females are different. The major differences between the
sexes are:
1. In mammals there is a limited reproduction capacity in females. We assume that
population size is the same across generations. We should be aware that selected animals
must be capable of producing sufficient progeny to maintain population size. Males
generally can produce more progeny than females and, as a result, selection intensity is
higher in males than in females. We should also be mindful of the direction of natural
selection to ensure that sufficient progeny are produced.
2. The information sources for estimating breeding values in males and females may be
different. Males may be selected based on progeny performance, whereas females are
selected on their own performance leading to differences in accuracy of selection.
3. The generation interval for the sexes may also be different. If males are selected based on
progeny testing, then on average, the age at which males will be used for breeding
will be different from that of females.
The aforementioned differences in males and females require different selection paths when
determining response to selection per year. The breeder's equation can be written as:
$$R_{year} = \frac{R_m + R_f}{L_m + L_f} = \frac{(i_m r_m + i_f r_f)\,\sigma_A}{L_m + L_f}$$
The intensity of selection, accuracy of selection and generation interval may be different in
males and females. The genetic standard deviation, however, is a population parameter and is,
therefore, the same between males and females.
A sheep breeder has a flock of 200 ewes and is selecting for weaning weight. Rams are first
selected at 2 years old and mated for 3 years. Ewes are first selected at 2 years old and mated for
5 years. Each ram is mated to 20 ewes, the lambing rate is 80%, the sex ratio is 50:50, and there is
no significant mortality in adults. The heritability is 0.11 and the phenotypic variance is 0.25 kg^2.
Calculate the response to selection per year.
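One way to work through this exercise is sketched below. Several steps are assumptions that the exercise leaves open: animals are selected on their own weaning weight (so accuracy is h for both sexes), replacements are spread evenly over the years of use, the generation interval of each sex is the mean parental age over its mating years, and selection intensity comes from truncation selection on a normal distribution. Variable and function names are illustrative.

```python
from math import sqrt
from statistics import NormalDist


def selection_intensity(proportion):
    """i = z / p for truncation selection on a normally distributed trait."""
    std_normal = NormalDist()
    return std_normal.pdf(std_normal.inv_cdf(1 - proportion)) / proportion


# Figures given in the exercise
EWES, EWES_PER_RAM = 200, 20
LAMBING_RATE = 0.80
FIRST_AGE = 2            # age (years) at first mating, both sexes
RAM_YEARS, EWE_YEARS = 3, 5
H2, VAR_P = 0.11, 0.25

# Candidates available each year
lambs = EWES * LAMBING_RATE                 # lambs weaned per year
males = females = lambs / 2                 # 50:50 sex ratio

# Replacements needed each year (assumption: even turnover, no adult mortality)
rams = EWES / EWES_PER_RAM
ram_replacements = rams / RAM_YEARS
ewe_replacements = EWES / EWE_YEARS

p_m = ram_replacements / males              # proportion of males selected
p_f = ewe_replacements / females            # proportion of females selected
i_m, i_f = selection_intensity(p_m), selection_intensity(p_f)

# Generation interval: mean age of parents over their years of use (assumption)
L_m = FIRST_AGE + (RAM_YEARS - 1) / 2
L_f = FIRST_AGE + (EWE_YEARS - 1) / 2

# Mass selection on own phenotype: accuracy equals h for both sexes (assumption)
r_m = r_f = sqrt(H2)
sigma_a = sqrt(H2 * VAR_P)

response_per_year = (i_m * r_m + i_f * r_f) * sigma_a / (L_m + L_f)

if __name__ == "__main__":
    print(f"p_m = {p_m:.4f}, p_f = {p_f:.2f}")
    print(f"i_m = {i_m:.3f}, i_f = {i_f:.3f}")
    print(f"L_m = {L_m}, L_f = {L_f}")
    print(f"response per year = {response_per_year:.4f} kg")
```

With these assumptions the males contribute most of the gain because far fewer of them are needed as replacements, while the longer use of ewes lengthens the female generation interval and slows progress per year.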
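We can define four selection paths:
$$R_{year} = \frac{R_{mm} + R_{mf} + R_{fm} + R_{ff}}{L_{mm} + L_{mf} + L_{fm} + L_{ff}}$$
where the subscripts denote males selected to breed males (mm), males selected to breed females
(mf), females selected to breed males (fm) and females selected to breed females (ff).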
LIVESTOCK BREEDING STRATEGIES
Samuel E Aggrey, PhD
University of Georgia
Athens, GA 30602, USA
saggrey@uga.edu
Several panels have been assembled in the past by governments, international agencies and non-
profit organizations to map out strategies to improve livestock productivity in developing
countries. The goals have been laudable but the outcomes have been far below expected goals.
Breeding strategy in the developing world has become synonymous with turning the axle of
poultry and livestock production to mirror that of advanced countries. In the developing world
genetic improvement has come to imply upgrading a herd usually, that of a national livestock
research institute. Several crossbreeding projects were initiated all across Africa with the goal of
quickly upgrading low producing indigenous and adapted breeds with high producing exotic
breeds from Europe or North America. Management of crossbred herds did not match their
genetic potential and as a result the expected productivity was not realized. The crossbreeding
approach to genetic improvement was not done in a sustainable manner and currently only
remnants of such projects exist. It should be pointed out that in a few cases, crossbreeding on
private farms with improved nutrition and management has been successful but they are not
enough to meet the massive demand for meat and livestock products.
Genetic improvement is a long term endeavor and short term approaches are bound to yield
limited or no success at all. Funding for genetic improvement projects from most international
agencies only last for about 5 years. Funding from national governments could be as short as one
year. A total mismatch of a long term endeavor with a very short term funding can only point in
the direction of limited success if not failure.
In recent times, scientific jargon has been embraced in several projects. Biotechnology is the
silver bullet expected to radically transform the whole agricultural sector in the developing
world. The argument here is not about the potential of biotechnology. When a high powered fuel
is put into a non-functioning engine, the vehicle would still not move. All other parts of the
vehicle should also be functioning. Genomics, high throughput science, biotechnology and
nanotechnology when applied in the proper environment can lead to tremendous increase in
productivity. However, I would argue that, before any of these advanced technologies are
adopted en masse, the well proven methodologies need to be adopted first.
In the developing world, breeding strategies need to have at least four basic components:
1. Assessment
2. Preplanning
3. Technical mechanics of genetic improvement
4. Sustainability
A. ASSESSMENT OF EXISTING SYSTEM
Assessment can be done in five broad areas to answer basic questions to determine whether
genetic improvement is even needed at all.
1. Current Production System
a. Who are the breeders?
b. Who are the animal keepers?
c. What are the management practices?
d. Can the current production system support an improvement program?
e. Is reduction in herd size or animal numbers possible?
f. What are the logistics and infrastructure?
g. What is the environmental impact?
h. Is the current production system sustainable?
5. Market Analysis
a. What is the size of the overall market?
b. Can the market improve or grow?
c. Is there demand for the product?
d. What is the purchasing power of the population?
e. Are there export possibilities?
f. Can the market accommodate improvement in the production system?
There should be a fact based justification for genetic improvement. When there is a demand for a
product, there is no need to convince producers to produce more.
GENETIC IMPROVEMENT IS A LONG TERM PROGRAM
B. PREPLANNING
In the preplanning stage, both livestock keepers and consumers should be adequately involved in
the early planning and genetic improvement programs. Some questions also need to be
adequately answered at this stage.
In most cases in Africa, livestock keepers have their own breeding criteria and any genetic
improvement program should take that into account when defining the breeding objective. For
example, the Karamoja pastoralist prefers coat color, body size, conformation, horn
configuration and temperament as traits suitable for marketing. In Ethiopia, there is a preferred
phenotypic characteristic of chickens. After all, the breeding objective should be based on
projected profits under future conditions of production and not merely on the potential to
change a trait genetically. The definition of profit may differ from place to place. Whereas some
places use monetary value to define profit, others may simply use herd size.
It is during the preplanning stage that priorities and the sustainability plan for the entire breeding
strategies should be developed.
PRIORITIES
a. Short terms
b. Medium terms
c. Long terms
C. TECHNICAL MECHANICS OF GENETIC IMPROVEMENT
BREEDING OBJECTIVES
Breeding is always aimed at the future. Decisions you make now will influence the future
generation(s). The breeding goal that you have defined indicates what you think will be
important in the future. You have analyzed the market and have an idea about what customers
will demand some years from now. Will it be mainly milk or butter or cheese? Will it be mainly
pork chops or ham or bacon? Will it be mainly breast meat or legs or full carcasses? Finally, you
have an idea about the expected developments in production systems and regulations. What are
new developments related to housing systems, nutrition, etc and how are they expected to
influence the performance of your animals? Has the (inter)national government announced new
regulations that may limit your current production system? Should you anticipate these
upcoming changes?
This means that the best animals for the future conditions of production need to be developed.
How does one define the best animal? The definition of the best animal is subjective, depending on
(1) the function of the animal, (2) culture, (3) market structure, (4) production environment, (5)
legislation, (6) population structure [pyramidal or segmented] and (7) environmental limitations.
Cattle are kept for meat, milk and draft. Depending on the function of the animal within that
particular society, the best animal can be defined. A high milking cow may be suitable for
Wisconsin, but in the hills of Ethiopia, a hardy cow may be suitable.
The type of bird suited to cut-ups and further processing is different from the type raised for
sale as whole birds. This means that breeders must anticipate future markets and develop birds that
meet future meat demands. Such a bird will also be the best animal for the future.
The best animal may not necessarily be a high performance animal for a particular animal
product (milk, meat or fiber), but could be an average performance animal with reasonable
resistance to an endemic disease. Defining the best animal is not easy and requires inputs
from animal keepers, consumers, breeders and other stakeholders. Matching genotypes with
suitable environments and societal acceptability depends on the availability of a wide range of
genotypes to choose from. A thorough knowledge of similar genotypes in other tropical regions,
including nutrition and local diseases is needed. The phenotypes may be acceptable but may not
necessarily cope in a new environment. The following may be considered in selecting the best
animal:
When the infrastructure for the well proven methods of genetic improvement is in place,
advanced technologies become easy to adopt. Several novel approaches can be devised for data
collection. Models can be developed to predict unmeasured phenotypes from the
measurement of a few easy-to-measure phenotypes.
(a) Breeding Programs with separate breeding and production populations
Separation of breeding and production populations allows the breeder to focus on the objectives
of each population. The purpose of the breeding population is genetic improvement in traits
of interest. The production population is the vehicle through which commercial production is
enhanced. Genetic material from the breeding population should constantly influence the
production population. Most commercial dairy farmers in developed countries and some parts of
Africa purchase semen from improved bulls to constantly upgrade their herds. A breeding
program in Africa can concentrate on developing males and then selling them to local producers
to improve their herds and flocks in exchange for data collection. There are several advantages
to doing so in addition to data collection. It automatically includes the animal keeper in the
breeding scheme. Nobody kills the golden goose: when the farmer sees the benefits of improved
animals without the burden of keeping males, such a scheme is bound to be successful. Over
time, this strategy can become part of the sustainability plan.
Components of the above structure can be adopted for sustainable genetic improvement in the
developing world for cattle and small ruminants and even pigs.
generation, leads to a pyramidal structure of such a breeding and production program. This is a
strategy usually employed by poultry and pig operations in developed countries. Whereas some
companies house and develop only elite pure lines, others develop an integrated system from
pure lines to the commercial animal.
Under the pyramid structure, concerns from consumers, lobby groups and food services at
the bottom of the pyramid bubble up into the pure lines. Over time, these concerns are
addressed in the genetic improvement programs in the pure lines. The poultry breeding
companies develop animals for different markets and have the opportunity to respond more
quickly to market changes than cattle breeders, especially since the generation interval is far
shorter in poultry than in cattle.
In a pyramid structure all sources of genetic variation are exploited. Selection response is
realized in the elite pure lines. The additive genetic variance, the accuracy of estimated breeding
values and the selection intensity become important, as these three factors determine genetic
gain. The grandparent and parent multiplication levels exploit heterosis, which arises from
non-additive genetic variance.
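The three factors named above can be combined in the standard expression for response to selection, gain per generation = i x r x sigma_A, where i is the selection intensity, r the accuracy of the estimated breeding values and sigma_A the additive genetic standard deviation (the square root of the additive genetic variance). Dividing by the generation interval L gives gain per year. The sketch below is only an illustration; the function names and all input values are hypothetical.

```python
import math


def gain_per_generation(intensity, accuracy, additive_variance):
    """Expected genetic gain per generation: i * r * sigma_A."""
    sigma_a = math.sqrt(additive_variance)
    return intensity * accuracy * sigma_a


def gain_per_year(intensity, accuracy, additive_variance, generation_interval):
    """Expected genetic gain per year: (i * r * sigma_A) / L."""
    return gain_per_generation(intensity, accuracy, additive_variance) / generation_interval


# Hypothetical inputs: same i, r and variance, different generation intervals
gain_gen = gain_per_generation(intensity=1.4, accuracy=0.5, additive_variance=100.0)
gain_short = gain_per_year(1.4, 0.5, 100.0, generation_interval=1.0)
gain_long = gain_per_year(1.4, 0.5, 100.0, generation_interval=5.0)

print(gain_gen, gain_short, gain_long)
```

With everything else equal, the shorter generation interval gives the larger annual gain, which is consistent with the remark that poultry programs can respond faster than cattle programs.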
In commercial pig breeding programs, and in some rare cases of poultry breeding, a three-way
cross is usually applied. Figure 4 illustrates a commercial three-way cross. Usually, the
terminal male is a purebred selected on growth, feed efficiency and other production
characteristics. The final female is usually a hybrid, taking advantage of both production and
reproduction traits.
Figure 4. Three-way commercial crossbreeding scheme
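The expected breed composition of the commercial animal in a three-way cross can be sketched as follows, assuming each parent contributes half of the offspring's genes. The line names A, B and C are placeholders (A for the purebred terminal male line, B and C for the lines crossed to produce the hybrid female) and do not refer to actual breeds.

```python
def offspring_composition(sire, dam):
    """Expected breed fractions of offspring: half from sire, half from dam."""
    breeds = set(sire) | set(dam)
    return {b: 0.5 * sire.get(b, 0.0) + 0.5 * dam.get(b, 0.0) for b in breeds}


# Placeholder pure lines
line_a = {"A": 1.0}  # terminal male line
line_b = {"B": 1.0}
line_c = {"C": 1.0}

# Step 1: cross two lines to produce the hybrid female
hybrid_female = offspring_composition(line_b, line_c)

# Step 2: mate the purebred terminal male to the hybrid female
commercial = offspring_composition(line_a, hybrid_female)

print(commercial)
```

The commercial animal is expected to carry half of its genes from the terminal male line and a quarter from each of the two lines behind the hybrid female.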
This stage includes breeding value estimation, selection criteria and genetic models. After
estimating breeding values and evaluating the effect of alternative selection decisions on the
genetic response to selection, the actual practical selection and mating of animals can begin.
Selection programs can maximize genetic gain at a set inbreeding rate, e.g. 1%, or at any level
that will limit the accumulation of inbreeding. It is at this stage that factors such as selection
intensity and generation interval are optimized. Several options can be pursued including:
Mating Strategy
The traits in the breeding objective may not necessarily be the selection traits; therefore, it is
important that both the traits in the breeding objective and the selected traits are evaluated
each year. The following evaluation criteria can be considered:
1. Selection response in selected traits.
2. Selection response in breeding objective traits.
3. Annual rate of inbreeding and inbreeding depression.
4. Annual cost of breeding program including appreciation/depreciation of fixed costs.
The annual rate of inbreeding can be used as an indirect measure of diversity in the elite
populations.
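One simple way to monitor the rate of inbreeding is the classical approximation for a randomly mating population with unequal numbers of breeding males and females, delta F = 1/(8 Nm) + 1/(8 Nf) per generation, with accumulated inbreeding after t generations F_t = 1 - (1 - delta F)^t. The sketch below assumes random mating, discrete generations and no selection, so in a selected population it is likely to understate the true rate. The animal numbers are hypothetical.

```python
def inbreeding_rate_per_generation(n_males, n_females):
    """Approximate rate of inbreeding per generation under random mating."""
    return 1.0 / (8.0 * n_males) + 1.0 / (8.0 * n_females)


def inbreeding_after(generations, delta_f):
    """Accumulated inbreeding coefficient after a number of generations."""
    return 1.0 - (1.0 - delta_f) ** generations


# Hypothetical elite population: 10 breeding males and 100 breeding females
delta_f = inbreeding_rate_per_generation(n_males=10, n_females=100)

# Compare against a target rate of 1% per generation
meets_target = delta_f <= 0.01

print(delta_f, meets_target, inbreeding_after(5, delta_f))
```

Because the formula is per generation, an annual figure would also need the generation interval.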
It is important to compare the theoretical expected response to the realized response. The actual
weighted selection intensity could be used to evaluate the theoretical response. If there is a
discrepancy, then the causes of the discrepancy need to be ascertained. Potential sources of
discrepancy may be:
(a) Bias in the estimation of breeding values.
(b) Inappropriate genetic model.
(c) Some environmental factors not considered or accounted for.
(d) Selection criteria not strictly adhered to.
(e) Unexpected correlated response in other traits.
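The comparison of expected and realized response can be sketched for the simplest case of single-trait selection on own phenotype, where the expected response is R = h^2 x S and S is the selection differential of the selected parents weighted by the number of progeny each contributed. This is only one possible reading of "actual weighted selection intensity"; the function names and all numbers are hypothetical.

```python
def weighted_selection_differential(parent_phenotypes, progeny_counts, population_mean):
    """Selection differential with parents weighted by their progeny numbers."""
    total_progeny = sum(progeny_counts)
    weighted_parent_mean = sum(
        p * n for p, n in zip(parent_phenotypes, progeny_counts)
    ) / total_progeny
    return weighted_parent_mean - population_mean


def expected_response(heritability, selection_differential):
    """Expected response to selection: R = h^2 * S."""
    return heritability * selection_differential


# Hypothetical data for one round of selection
population_mean = 100.0
parents = [110.0, 120.0, 130.0]  # phenotypes of the selected parents
progeny = [10, 20, 10]           # progeny contributed by each parent
offspring_mean = 104.0           # observed mean of the next generation

s = weighted_selection_differential(parents, progeny, population_mean)
expected = expected_response(heritability=0.3, selection_differential=s)
realized = offspring_mean - population_mean
discrepancy = realized - expected

print(s, expected, realized, discrepancy)
```

A realized response below the expected one, as in these made-up numbers, would prompt a check of the potential sources of discrepancy listed above.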
From here on, the alleles (genes) of the improved population are disseminated to the production
population, depending on the population structure. Mostly, several forms of crossbreeding are
pursued to take advantage of heterosis or hybrid vigor. Heterosis is the difference in
performance between crossbred animals and the average of their purebred parents.
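Heterosis is commonly expressed as the deviation of the crossbred mean from the mid-parent mean of the purebreds, either in trait units or as a percentage of the mid-parent mean. The sketch below shows this calculation with made-up performance values.

```python
def heterosis(crossbred_mean, purebred_mean_1, purebred_mean_2):
    """Return heterosis in trait units and as a percentage of the mid-parent mean."""
    mid_parent = (purebred_mean_1 + purebred_mean_2) / 2.0
    absolute = crossbred_mean - mid_parent
    percent = 100.0 * absolute / mid_parent
    return absolute, percent


# Hypothetical trait means for two purebred lines and their cross
h_abs, h_pct = heterosis(crossbred_mean=121.0, purebred_mean_1=100.0, purebred_mean_2=120.0)

print(h_abs, h_pct)
```

A positive value means the crossbreds outperform the average of the purebred parents; they need not outperform the better parent.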
ECONOMIC AND GENETIC SUSTAINABILITY OF BREEDING PROGRAM
A breeding program is the organized structure set up to realize the desired gain in the production
population. It is important for producers to also have a sense of improvement in their
populations. Producers can only judge the benefit of a breeding program when the productivity
of their animals improves and their profit margins go up. Farmers are more willing to pay for
genetic material when they can link their profit margins directly to the genetic material they
received. Economic sustainability can be achieved only when producers of improved
animals can recover their costs and make a profit from the recipients of their improved
animals.
Genetic variation is the raw material for genetic improvement. When a genetic improvement
strategy leads to genetic gain in traits, there is a loss of genetic variation. The inbreeding level
and genetic diversity in the indigenous populations being improved for production also need to
be constantly monitored to ensure that genetic variation between breeds (biodiversity) is
preserved for the future.