Population Genetics Lecture Notes-2016 Biology

Training course in Quantitative Genetics and Genomics
Biosciences East and Central Africa-International Livestock Research Institute (BecA-ILRI) Hub
Nairobi, KENYA
May 30-June 10, 2016
POPULATION AND QUANTITATIVE GENETICS
GENOME ORGANIZATION AND GENETIC MARKERS
SELECTION THEORY
BREEDING STRATEGIES
Samuel E Aggrey, PhD

Professor
Department of Poultry Science
Institute of Bioinformatics
University of Georgia
Athens, GA 30602, USA
saggrey@uga.edu
Preface
These lecture notes were written to cover parts of Population Genetics,
Quantitative Genetics and Molecular Genetics for postgraduate students and to
serve as a refresher for field geneticists. The course material is not a textbook
and is not meant to be copied, duplicated or sold. This text is unedited and I am
solely responsible for all conceptual mistakes, grammatical errors and typos.
Genetics is a life-long course and cannot be covered in a few lectures. Because of
time constraints, only selected parts of population, quantitative and molecular
genetics will be covered in this course. The course will cover some of the
evolutionary forces that change allele frequency between generations, such as
natural selection and gene flow, and some aspects of Quantitative and Molecular
Genetics.
To those men who have kept us awake for over two centuries and I believe would
continue to do so for many more centuries!
POPULATION GENETICS
The study of composition of biological populations, and changes in genetic
composition that result from operation of various factors including (a) natural
selection, (b) genetic drift, (c) mutations and (d) gene flow
Population: a group of breeding individuals
Genetic composition:
1. The number of alleles at a locus
2. The frequency of alleles at a locus
3. The frequency of genotypes at a locus
4. Transmission of alleles from one generation to the next
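Single locus:
Locus A with two alleles A1 and A2. Let P, H and Q be the frequencies of the
genotypes A1A1, A1A2 and A2A2 (P + H + Q = 1). The allele frequencies are:
\[ p = P + \tfrac{1}{2}H \]
\[ q = Q + \tfrac{1}{2}H \]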
Derivation of the Hardy-Weinberg principle

Ideal population
1. There are two sexes and the population consists of sexually mature individuals
2. Matings between males and females are equally probable (independent of
distance between mates, genotype and age of individuals)
3. The population is large and the actual frequency of each mating is equal to
the Mendelian expectation
4. Meiosis is fair. We assume that there is no segregation distortion, no gamete
competition, and no differences in the developmental ability of eggs or the
fertilizing ability of sperm
5. All matings produce the same number of offspring, on average.
Thus, the frequency of a particular genotype in the pool of newly formed zygotes is:
(frequency of mating) x (frequency of genotype produced from mating)
\[ \text{Frequency}(A_1A_1 \text{ in zygotes}) = P^2 + \tfrac{1}{2}PH + \tfrac{1}{2}PH + \tfrac{1}{4}H^2 = (P + \tfrac{1}{2}H)^2 = p^2 \]
\[ \text{Frequency}(A_1A_2) = 2pq \]
\[ \text{Frequency}(A_2A_2) = q^2 \]
6. Generations do not overlap

7. There is no difference among genotype groups in the probability of survival
8. There is no migration, mutation, drift or selection
Hardy-Weinberg Law
In a large random mating population in the absence of mutation, migration,
selection and random drift, allele frequency remains the same from generation
to generation. Furthermore, there is a simple relationship between allele
frequency and genotypic frequency
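The law can be checked numerically with a minimal sketch. The starting genotype frequencies below are hypothetical and deliberately not in Hardy-Weinberg proportions; the function names are illustrative only. Under the ideal-population assumptions, one round of random mating should give p^2, 2pq and q^2, and the allele frequency should stay the same.

```python
def allele_freqs(P, H, Q):
    """Allele frequencies (p, q) from genotype frequencies P (A1A1), H (A1A2), Q (A2A2)."""
    total = P + H + Q
    p = (P + 0.5 * H) / total
    return p, 1.0 - p


def random_mating(p):
    """Zygote genotype frequencies (p^2, 2pq, q^2) after one round of random mating."""
    q = 1.0 - p
    return p * p, 2.0 * p * q, q * q


# Hypothetical starting population that is NOT in Hardy-Weinberg proportions.
start = (0.5, 0.2, 0.3)
p0, q0 = allele_freqs(*start)
gen1 = random_mating(p0)
p1, q1 = allele_freqs(*gen1)
gen2 = random_mating(p1)

print("generation 0:", start, "p =", round(p0, 4))
print("generation 1:", tuple(round(x, 4) for x in gen1), "p =", round(p1, 4))
print("generation 2:", tuple(round(x, 4) for x in gen2))
```

Genotype frequencies change between generation 0 and 1 but not afterwards, while p is unchanged throughout.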
Why is the Hardy-Weinberg principle so important? Is there any population anywhere
in the world or outer space that satisfies all of its assumptions? The possible
evolutionary forces within populations each violate at least one of these
assumptions, and departures from Hardy-Weinberg proportions are one way in which
we detect those forces and estimate their magnitude. The most significant
evolutionary factors are selection (natural or artificial), non-random mating and
gene flow.
Fig. 1 shows the relationship between allele frequency and the three genotypic
frequencies for a population in Hardy-Weinberg proportions:
1. The heterozygote is the most common genotype at intermediate allele frequencies
2. One of the homozygotes is the most common genotype when the allele frequency
is not intermediate
3. Only 1/3 of the time, when q is between 1/3 and 2/3, is the heterozygote the
most common genotype
4. When q is between 0 and 1/3, A1A1 is the most common, and when q is
between 2/3 and 1, A2A2 is the most common.
5. The maximum frequency of the heterozygote occurs when q=0.5
This can be shown directly by setting the derivative of the H-W heterozygosity,
2pq=2q(1-q), equal to zero and solving for q:
\[ \frac{d[2q(1-q)]}{dq} = 2 - 4q = 0 \quad\Rightarrow\quad q = 0.5 \]

Here, we assume that the generations are non-overlapping, i.e. the parents die after
producing progeny, and the progeny then become the next parental generation.
Testing for deviation from Hardy-Weinberg Equilibrium
Departure from Hardy-Weinberg equilibrium can be tested using a sample of
individuals scored for their genotypes. The genetic model provided by
Hardy-Weinberg generates the expected genotypic frequencies at equilibrium. We
can then compare the observed and expected genotype numbers under the assumption
of Hardy-Weinberg proportions. The chi-square test of goodness of fit and the
likelihood ratio test can be used to test departure, or lack thereof, from
Hardy-Weinberg equilibrium. The chi-square test is an approximation to the
likelihood ratio test. To perform a chi-square goodness of fit test, we first
estimate the allele frequencies from the observed genotype numbers, then use them
to generate the expected genotype numbers. We can compute the chi-square
statistic as:
\[ X^2 = \sum_{i=1}^{n} \frac{(O_i - E_i)^2}{E_i} \]

where O and E are the observed and expected numbers of a particular genotype and
n is the number of genotypic classes. From the calculated value of X^2 and the
table value of X^2 we can obtain the probability that a deviation of the observed
numbers from the expected numbers as large as this one arises by chance. The
degrees of freedom used to determine the significance of the X^2 value are equal
to the number of genotypic classes, n, minus one, then minus the number of
parameters estimated from the data. With two alleles, one additional degree of
freedom is lost because we use the data to estimate the allele frequency, leaving
3 - 1 - 1 = 1. We can use the chi-square distribution to test whether the value
of X^2 is too large to be the result of sampling error. In doing so we are
performing a one-tailed test. The chi-square expression for two alleles is given as:
\[ X^2 = \frac{(N_{11} - p^2N)^2}{p^2N} + \frac{(N_{12} - 2pqN)^2}{2pqN} + \frac{(N_{22} - q^2N)^2}{q^2N} \]
An alternative way to measure the difference between observed and expected
frequencies is to calculate the standardized deviation of the observed
heterozygote frequency, H, from its Hardy-Weinberg expectation, 2pq, which
provides the fixation index or, more generally, the inbreeding coefficient, F.
\[ F = \frac{2pq - H}{2pq} = 1 - \frac{H}{2pq} \]
It can be shown that
\[ X^2 = F^2 N \]
For two alleles, the chi-square goodness of fit test for Hardy-Weinberg proportions
is equivalent to the test for inbreeding, F=0. However, F is unstable as the
expected (E) value approaches zero, and is therefore not useful for rare and very
common alleles. For E=0 and O>0, F = -∞, and for E=0 and O=0, F is undefined.
Deviation from Hardy-Weinberg proportions can also be tested using the likelihood
ratio test, which is described in most statistical texts.
The B/b locus is responsible for plumage color in chickens found in the Rift
Valley. The B allele expresses black plumage which is completely dominant over
the b allele for brown plumage.
Phenotype   Genotype   Observed number   Expected number
Black       BB         290               p^2 N = 289.444
Black       Bb         496               2pq N = 497.112
Brown       bb         214               q^2 N = 213.444
Total                  1,000             1,000
P = 290/1000 = 0.290; H = 496/1000 = 0.496; Q = 214/1000 = 0.214; P + H + Q = 1.0
p = P + (1/2)H = 0.290 + (1/2)(0.496) = 0.538; q = Q + (1/2)H = 0.214 + (1/2)(0.496) = 0.462; p + q = 1.0
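Note: Chi-square is allergic to fractions and ratios, but really likes integers!
It is therefore computed from numbers (counts), not from frequencies.
\[ X^2 = \frac{(290 - 289.444)^2}{289.444} + \frac{(496 - 497.112)^2}{497.112} + \frac{(214 - 213.444)^2}{213.444} = 0.0050 \]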
The table value of X^2 at p=0.05 and 1 degree of freedom is 3.84. Since the
calculated X^2 is lower than the table X^2, we conclude that the data do not
deviate significantly from Hardy-Weinberg proportions.
\[ F = 1 - \frac{H}{2pq} = 1 - \frac{0.496000}{0.497112} = 0.002237 \]
\[ X^2 = F^2 N = 0.0050 \]
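The plumage example can be reproduced with a short sketch. The function name and the returned dictionary are illustrative, not part of the course material; the critical value 3.84 is the table value quoted in the text, hard-coded here rather than computed.

```python
def hw_chi_square(n11, n12, n22):
    """Chi-square goodness-of-fit test of Hardy-Weinberg proportions (two alleles)."""
    n = n11 + n12 + n22
    p = (n11 + 0.5 * n12) / n            # allele frequency estimated from the data
    q = 1.0 - p
    observed = (n11, n12, n22)
    expected = (p * p * n, 2.0 * p * q * n, q * q * n)   # numbers, not frequencies
    x2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected))
    f = 1.0 - (n12 / n) / (2.0 * p * q)  # fixation index F = 1 - H / (2pq)
    return {"p": p, "q": q, "expected": expected, "x2": x2, "F": f, "N": n}


result = hw_chi_square(290, 496, 214)    # BB, Bb, bb counts from the example
CRITICAL_1DF_005 = 3.84                  # table value at p=0.05, 1 df (from the text)

print("X2 =", round(result["x2"], 4), " F =", round(result["F"], 6))
print("F^2 * N =", round(result["F"] ** 2 * result["N"], 4))
if result["x2"] > CRITICAL_1DF_005:
    print("significant deviation from Hardy-Weinberg proportions")
else:
    print("no significant deviation from Hardy-Weinberg proportions")
```

The last two printed statistics agree, which illustrates the identity X^2 = F^2 N for two alleles.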
Extension of the Hardy-Weinberg Law: Multiple Alleles
Let us consider a single locus with three alleles A1, A2 and A3 with frequencies p,
q and r, respectively.
Hardy-Weinberg frequencies for three autosomal alleles at a single locus

Allele (frequency)   A1 (p)       A2 (q)       A3 (r)
A1 (p)               A1A1  p^2    A1A2  pq     A1A3  pr
A2 (q)               A2A1  qp     A2A2  q^2    A2A3  qr
A3 (r)               A3A1  rp     A3A2  rq     A3A3  r^2

Genotype   Frequency        Number
A1A1       p^2              N11
A1A2       pq + pq = 2pq    N12
A1A3       pr + pr = 2pr    N13
A2A2       q^2              N22
A2A3       qr + qr = 2qr    N23
A3A3       r^2              N33
TOTAL      1.0              N
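Please note that p + q + r = 1, and the key to solving multiple-allele problems is
to break the problem down so that it resembles a two-allele problem.
\[ f(A_3A_3) = r^2 = \frac{N_{33}}{N} \]
\[ r = \sqrt{\frac{N_{33}}{N}} \]
From here, let's reduce the problem to a two-allele locus involving the allele A3.
Expected genotypes under H-W: A2A2, A2A3 and A3A3 with expected frequency
\[ q^2 + 2qr + r^2 = \frac{N_{22} + N_{23} + N_{33}}{N} \]
From basic algebra: (a + b)^2 = a^2 + 2ab + b^2.
This implies: (q + r)^2 = q^2 + 2qr + r^2
Therefore:
\[ (q + r)^2 = \frac{N_{22} + N_{23} + N_{33}}{N} \]
\[ q + r = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} \]
\[ q = \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} - \sqrt{\frac{N_{33}}{N}} \]
Since p + q + r = 1, then p = 1 - (q + r)
\[ p = 1 - \sqrt{\frac{N_{22} + N_{23} + N_{33}}{N}} \]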
The ABO blood group in humans is determined by three alleles, A, B and O.

Allele (frequency)   A (p)      B (q)      O (r)
A (p)                AA  p^2    AB  pq     AO  pr
B (q)                AB  pq     BB  q^2    BO  qr
O (r)                AO  pr     BO  qr     OO  r^2

Genotype   Frequency        Number
AA         p^2              N11
AB         pq + pq = 2pq    N12
AO         pr + pr = 2pr    N13
BB         q^2              N22
BO         qr + qr = 2qr    N23
OO         r^2              N33
In the year 1825, the director general of ILRI-Musastan ordered a staff nurse to
collect blood samples of all capacity building course participants. Of the 1,825
individuals sampled, 700 were type A, 250 were type B, 75 were type AB and 800
were type O. Determine the frequency of the A, B and O alleles.
Hint:
Phenotype   Genotype   H-W Expectation   Number
A           AA + AO    p^2 + 2pr         700
B           BB + BO    q^2 + 2qr         250
AB          AB         2pq               75
O           OO         r^2               800
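One way to work the exercise is the square-root approach described for three alleles: r comes from the O class, q + r from the B and O classes together, and p = 1 - (q + r). The sketch below follows that approach; the function name is illustrative, and this is not the only possible estimator (a maximum-likelihood estimate would use the AB class as well).

```python
from math import sqrt


def abo_allele_freqs(n_a, n_b, n_ab, n_o):
    """Square-root estimates of the A (p), B (q) and O (r) allele frequencies.

    Uses r^2 = f(O) and (q + r)^2 = f(B) + f(O); the AB class only enters
    through the total sample size.
    """
    n = n_a + n_b + n_ab + n_o
    r = sqrt(n_o / n)                   # OO individuals are the only type O
    q_plus_r = sqrt((n_b + n_o) / n)    # BB + BO + OO = q^2 + 2qr + r^2
    q = q_plus_r - r
    p = 1.0 - q_plus_r
    return p, q, r


p, q, r = abo_allele_freqs(n_a=700, n_b=250, n_ab=75, n_o=800)
print("p(A) =", round(p, 4), " q(B) =", round(q, 4), " r(O) =", round(r, 4))
```

With these estimates the expected number of AB individuals, 2pqN, can be compared with the observed 75 as a rough check on how well Hardy-Weinberg proportions fit.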
Natural Selection at One Locus
Differential viability and fertility
Natural selection occurs when some genotypes in a population have
differential survival, fertility or reproduction. In this case, we multiply each
genotype's frequency by its fitness, where fitness is a reflection of the genotype's
probability of survival and its relative participation in reproduction. Assume a
single autosomal locus with two alleles, A1 and A2, and three diploid
genotypes A1A1, A1A2 and A2A2 with fitnesses denoted w11, w12 and
w22, respectively. Unless w11, w12 and w22 are all equal, natural selection will
occur, possibly leading the genetic composition of the population to change.
Before the operation of natural selection (generation 0), the genotypes are in
Hardy-Weinberg equilibrium and the frequencies of the A1 and A2 alleles are p0 and
q0, respectively (p0 + q0 = 1). The genotypes of generation 0 produce progeny that
become generation 1, with frequencies of A1 and A2 denoted by p1 and q1,
respectively (p1 + q1 = 1). In both generations, the allele frequency is considered
at the zygote stage and may differ from the adult allele frequency if there is
differential viability.
Assuming there is no mutation, and that Mendel's law of segregation is operational,
an A1A1 genotype will produce only A1 gametes, an A2A2 genotype will
produce only A2 gametes, and an A1A2 genotype will produce A1 and A2
gametes in equal proportion. Therefore, the proportion of A2 gametes, and thus the
frequency of the A2 allele in generation 1 at the zygotic stage, is:
\[ q_1 = \frac{q_0^2 w_{22} + \tfrac{1}{2}(2 p_0 q_0 w_{12})}{\bar{w}} \]
\[ q_1 = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}} \qquad [1] \]
where \bar{w} = p_0^2 w_{11} + 2 p_0 q_0 w_{12} + q_0^2 w_{22} is the average fitness
of the population.

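Equation [1] is known as a recurrence equation, as it expresses the frequency of
the A2 allele in generation 1 in terms of its frequency in generation 0. The change
in frequency between generations can then be written as:
\[ \Delta q = q_1 - q_0 \]
\[ \Delta q = \frac{q_0^2 w_{22} + p_0 q_0 w_{12}}{\bar{w}} - q_0 \]
\[ \Delta q = \frac{q_0^2 w_{22} + p_0 q_0 w_{12} - q_0 \bar{w}}{\bar{w}} \]
If we substitute \bar{w} from Table 3, drop the generation subscripts, use
p = 1 - q, and simplify, the equation above becomes:
\[ \Delta q = \frac{pq w_{12} + q^2 w_{22} - q(p^2 w_{11} + 2pq w_{12} + q^2 w_{22})}{\bar{w}} \]
\[ \Delta q = \frac{pq(q w_{22} - q w_{12} - p w_{11} + p w_{12})}{\bar{w}} \]
\[ \Delta q = \frac{pq[q(w_{22} - w_{12}) - p(w_{11} - w_{12})]}{\bar{w}} \qquad [2] \]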

Equations [1] and [2] show, in precise terms, how fitness differences between
genotypes lead to evolutionary change. If Δq = 0, then no allele frequency
change has occurred and the population is in allelic equilibrium. It is worth
mentioning that Δq = 0 does not mean that no natural selection has occurred. The
condition for no selection is w11=w12=w22. It is possible for natural selection to
occur and have no effect on allele frequency.
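The recurrence [1] and the change per generation [2] can be written as two small functions. This is a sketch with illustrative function names and hypothetical fitness values; it checks that q1 - q0 from [1] equals Δq from [2], and shows a case where selection operates (unequal fitnesses) but Δq = 0.

```python
def mean_fitness(q, w11, w12, w22):
    """Average fitness: p^2 w11 + 2pq w12 + q^2 w22, with p = 1 - q."""
    p = 1.0 - q
    return p * p * w11 + 2.0 * p * q * w12 + q * q * w22


def next_q(q, w11, w12, w22):
    """Equation [1]: frequency of A2 among zygotes of the next generation."""
    p = 1.0 - q
    return (q * q * w22 + p * q * w12) / mean_fitness(q, w11, w12, w22)


def delta_q(q, w11, w12, w22):
    """Equation [2]: change in the frequency of A2 in one generation."""
    p = 1.0 - q
    return p * q * (q * (w22 - w12) - p * (w11 - w12)) / mean_fitness(q, w11, w12, w22)


# Hypothetical fitnesses: A2 is deleterious, so its frequency should fall.
q0 = 0.3
w = (1.0, 0.9, 0.5)
print("q1 =", round(next_q(q0, *w), 5), " delta q =", round(delta_q(q0, *w), 5))

# Selection without allele frequency change: symmetric heterozygote
# advantage evaluated at q = 0.5.
print("delta q =", delta_q(0.5, 0.5, 1.0, 0.5))
```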
Directional selection
If Δq > 0, then natural selection has led the A2 allele to increase in frequency;
if Δq < 0, then natural selection has led the A1 allele to increase in frequency.
If w11>w12>w22, then the A1A1 genotype is fitter than A1A2, which in turn is
fitter than A2A2, in which case Δq must be negative (as long as neither p nor q
is 0). In each generation, the frequency of the A1 allele will be greater than in
the previous generation until it eventually reaches fixation and the A2 allele is
eliminated from the population. Once A1 reaches fixation (p=1 and q=0), no
further evolutionary change will occur. In this case, the A1 allele confers a
fitness advantage on the genotypes that carry it, and its relative frequency in
the population will increase from generation to generation until it is fixed. The
opposite outcome (fixation of A2) occurs when w22>w12>w11. Table 4 illustrates a
numerical example of directional natural selection. Fig. 2 illustrates allele
frequency under Hardy-Weinberg proportions where there is no differential
viability, w11=w12=w22=1.0, and the average fitness is 1.0 from generation to
generation. Assuming w22=0.4 as in Table 4, the allele frequency of A1 increases
and that of A2 decreases non-linearly until A1 reaches fixation, as illustrated in
Fig. 3. Ultimately, the population will be monomorphic for the homozygous
genotype with the highest fitness.
Stabilizing selection
An interesting situation arises when the heterozygote is superior in fitness to the
two homozygotes. In this case, w11<w12>w22, and an equilibrium is reached with
both alleles present in the population. Since the equilibrium value of q must lie
between 0 and 1, a polymorphic equilibrium can exist only when there is
heterozygote superiority or inferiority; heterozygote superiority is also known as
heterosis. In this case, natural selection produces heterogeneity and preserves
genetic variation. Unlike directional selection, stabilizing or balancing selection
tends to keep both alleles in the population, and the allele frequencies converge
to a polymorphic equilibrium (Fig. 4).
Disruptive selection
Under disruptive selection (w11>w12<w22), the heterozygote has a lower relative
fitness than the two homozygotes. Viability selection may lead either to an
increase or to a decrease in the frequency of the A1 allele, depending on the
starting allele frequency. In the long run, the population converges to fixation
and will be monomorphic for one of the homozygous genotypes (Fig. 5).
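The three modes of selection can be compared by iterating the one-locus recurrence q' = (q^2 w22 + pq w12) / w-bar. The sketch below uses hypothetical fitness values chosen only for illustration. For the heterozygote-advantage case, setting Δq = 0 with both alleles present gives the interior equilibrium q = (w11 - w12) / (w11 - 2 w12 + w22), which the iteration should approach.

```python
def next_q(q, w11, w12, w22):
    """One generation of selection on the frequency q of allele A2."""
    p = 1.0 - q
    w_bar = p * p * w11 + 2.0 * p * q * w12 + q * q * w22
    return (q * q * w22 + p * q * w12) / w_bar


def iterate(q0, fitness, generations):
    """Apply the recurrence repeatedly and return the final frequency of A2."""
    q = q0
    for _ in range(generations):
        q = next_q(q, *fitness)
    return q


directional = (1.0, 0.7, 0.4)   # w11 > w12 > w22: A2 is lost
stabilizing = (0.8, 1.0, 0.6)   # w11 < w12 > w22: polymorphic equilibrium
disruptive = (1.0, 0.5, 1.0)    # w11 > w12 < w22: outcome depends on the start

w11, w12, w22 = stabilizing
q_hat = (w11 - w12) / (w11 - 2.0 * w12 + w22)   # interior equilibrium

print("directional :", iterate(0.5, directional, 500))
print("stabilizing :", iterate(0.9, stabilizing, 500), "expected", q_hat)
print("disruptive  :", iterate(0.4, disruptive, 500), iterate(0.6, disruptive, 500))
```

In the disruptive case the two starting points lie on either side of the unstable equilibrium at q = 0.5, so one run loses A2 and the other fixes it.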
Coefficient of selection
The speed with which allele or genotype frequency changes is driven by the
relative fitness of each allele or genotype. Fitness (w11, w12 and w22) is a relative
value, usually measured in comparison with the most-fit allele/genotype in the
population. The selection coefficient, s, measures the reduction in fitness of a
selected allele or genotype compared to the most-fit allele/genotype in a population.
Selection against an allele may operate through reduced viability, reduced
fertility, reduced mating ability, or different combinations of the three. Therefore,
allele frequency needs to be followed from the zygote stage of the parent generation
to the zygote stage of the progeny generation. The coefficient of selection
measures the proportionate reduction in the gametic contribution of a genotype
compared to the most-fit genotype. The contribution of the most-fit genotype is
taken to be 1, and the contribution of the genotype selected against is 1 - s. If the
selection coefficient for a genotype is 0.60, its fitness is 0.40, which means
that for every 100 zygotes produced by the most-fit genotype, only 40 are
produced by the genotype selected against.
Dominance
To explore the effects of dominance, we can specify the fitnesses using two
parameters: one representing the difference in fitness between the two
homozygotes, s, and the second representing the degree of dominance, h (which
sets the fitness of the heterozygote). Let,
w11 = 1
w12 = 1 - hs
w22 = 1 - s
The parameter h together with s determines the fitness of the heterozygote.
a. If h = 0, the heterozygote has fitness 1, the same as the A1A1 homozygote:
the A1 allele is completely dominant.
b. Conversely, if h = 1, the fitness of the heterozygote is the same as that of the
A2A2 homozygote (1-s): the A2 allele is completely dominant.
c. If 0 < h < 1, the heterozygote's fitness is somewhere between those of the
homozygotes: there is incomplete dominance.
d. If h = 1/2 exactly, the alleles have additive effects: the heterozygote's fitness
is the average of the two homozygotes' fitnesses.
e. If h < 0, the heterozygote's fitness is greater than 1, and thus greater than that
of the A1A1 homozygote; this is called overdominance.
f. Similarly, if h > 1, the heterozygote has lower fitness than the A2A2
homozygote (and of course also the A1A1 homozygote); this is
underdominance.
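Cases a to f can be summarized in a small sketch that builds the three fitnesses from s and h and names the dominance relationship. The function names and labels are illustrative, and s > 0 is assumed throughout.

```python
def genotype_fitness(s, h):
    """Fitnesses (w11, w12, w22) = (1, 1 - hs, 1 - s) for A1A1, A1A2, A2A2."""
    return 1.0, 1.0 - h * s, 1.0 - s


def dominance_class(h):
    """Name the dominance relationship implied by h (assuming s > 0)."""
    if h < 0:
        return "overdominance"
    if h == 0:
        return "A1 completely dominant"
    if h == 0.5:
        return "additive"
    if h < 1:
        return "incomplete dominance"
    if h == 1:
        return "A2 completely dominant"
    return "underdominance"


# Example from the text: s = 0.60 gives w22 = 0.40.
for h in (-0.5, 0.0, 0.25, 0.5, 1.0, 1.5):
    print(h, dominance_class(h), genotype_fitness(0.60, h))
```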
Table 5 Fitness values for different fitness relationships
                            A1A1    A1A2    A2A2    Description
General fitness             w11     w12     w22
Recessive lethal            1       1       0       No dominance, selection against A2A2
Detrimental allele          1       1       1-s     No dominance, selection against A2
Dominance                   1       1-hs    1-s     Partial dominance of A1, selection against A2
Dominance                   1       1       1-s     Complete dominance of A1, selection against A2
Dominance                   1-s     1-s     1       Complete dominance of A1, selection against A1
Heterozygote advantage      1-s1    1       1-s2    Overdominance, selection against A1A1 & A2A2
Heterozygote disadvantage   1+s1    1       1+s2    Underdominance, selection against A1A2
Lethal alleles
These are alleles that cause an organism to die only when present in the
homozygous state. If the mutation is caused by a dominant lethal allele, the
heterozygote for the allele will show the lethal phenotype, and the dominant
homozygote is impossible. If the mutation is caused by a recessive lethal allele,
the homozygote for the allele will have the lethal phenotype. Most lethal genes are
recessive. Many lethal alleles prevent cell division and kill an organism at an early
age. Some lethal alleles exert their effect later in life, e.g. Huntington's disease,
which is characterized by progressive degeneration of the nervous system, dementia
and early death between 30 and 50 years of age.
Dominant lethal alleles: They modify the Mendelian 3:1 ratio to 2:1. The organism
dies before it can produce progeny, so the mutant dominant allele is removed
from the population in the same generation in which it arose. Fully dominant lethal
alleles kill the carrier in both the homozygous and heterozygous states.
Huntington's disease and creeper legs (short and stunted) in chickens are dominant
lethals where the homozygote does not survive.
Recessive lethal alleles: The recessive lethal kills the carrier individual only in the
homozygous state. They may be of two kinds: (1) one which has no obvious
phenotypic effect in the heterozygotes, and (2) one which exhibits a distinctive
phenotype in the heterozygous state. In many cases, lethal alleles become operative
at the onset of sexual maturity. Examples of recessive lethals in cattle are
osteopetrosis (Angus and Red Angus) and pulmonary hypoplasia and anasarca (PHA)
(Shorthorn). In humans, common examples are cystic fibrosis (poorly functioning
Cl- ion transport proteins in the lungs), Tay-Sachs disease (an enzyme unable to
break down specific membrane lipids), sickle cell anemia and brachydactyly. The
relative fitness for a recessive lethal is presented in Table 5.
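                        A1A1    A1A2    A2A2    Total
Initial frequency       p^2     2pq     q^2     1
Fitness                 1       1       0
Gametic contribution    p^2     2pq     0       \bar{w} = p^2 + 2pq = p(1 + q)
From Equation 1,
\[ q_1 = \frac{q^2 w_{22} + pq w_{12}}{\bar{w}} \]
The average fitness, \bar{w}, under a recessive lethal is: \bar{w} = p(1 + q)
Therefore,
\[ q_1 = \frac{pq}{p(1+q)} = \frac{q}{1+q} \qquad [3] \]
\[ \Delta q = q_1 - q_0 = \frac{q_0}{1+q_0} - q_0 = \frac{-q_0^2}{1+q_0} \]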
The mean fitness reaches 1 when the population is fixed for A1. The relationship
given for q is a recursive relationship. The allele frequency at any time t+1 is a
function of the frequency at time t, or
\[ q_{t+1} = \frac{q_t}{1 + q_t} \]
\[ q_2 = \frac{q_1}{1 + q_1} \]
When we substitute the value of q1 from equation [3] into this expression, it becomes:
\[ q_2 = \frac{q_0}{1 + 2q_0} \]
This relationship can be generalized to give the frequency in generation t as a
function of the frequency at generation 0:
\[ q_t = \frac{q_0}{1 + t q_0} \]
Since there are no surviving recessive homozygotes, the maximum possible allele
frequency is 0.5, which occurs when all individuals are heterozygotes. Fig. 6
demonstrates the expected decline in frequency of a recessive lethal allele from two
starting frequencies. When the allele frequency is high, it is reduced very quickly.
High-throughput data have delineated lethal haplotypes. In theory, this would
allow us to identify carrier animals and avoid mating them. That would eliminate
recessive lethal alleles faster than elimination by natural selection.
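The decline of a recessive lethal can be sketched by comparing the step-by-step recurrence q' = q / (1 + q) with the closed form q_t = q0 / (1 + t q0). Function names and starting frequencies are illustrative. Setting q_t = q0 / 2 in the closed form gives t = 1 / q0, so halving the frequency takes 2 generations from q0 = 0.5 but 100 generations from q0 = 0.01.

```python
def lethal_q_closed_form(q0, t):
    """Frequency of a recessive lethal after t generations: q0 / (1 + t * q0)."""
    return q0 / (1.0 + t * q0)


def lethal_q_iterated(q0, t):
    """Same quantity obtained by applying q' = q / (1 + q) t times."""
    q = q0
    for _ in range(t):
        q = q / (1.0 + q)
    return q


for q0 in (0.5, 0.01):
    halving_time = round(1.0 / q0)   # generations needed to halve q0
    print("q0 =", q0,
          " q after 10 generations =", round(lethal_q_closed_form(q0, 10), 5),
          " generations to halve =", halving_time)
```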
Selection against recessives
                        A1A1    A1A2    A2A2        Total
Initial frequency       p^2     2pq     q^2         1
Fitness                 1       1       1-s
Gametic contribution    p^2     2pq     q^2(1-s)    \bar{w} = 1 - sq^2
From Equation 1,
\[ q_1 = \frac{q^2 w_{22} + pq w_{12}}{\bar{w}} \]
When selecting against recessives, w12 = 1, w22 = 1 - s, and \bar{w} is 1 - sq^2.
Therefore, q1 can be written as:
\[ q_1 = \frac{q^2(1 - s) + pq}{1 - sq^2} = \frac{q(1 - sq)}{1 - sq^2} \]
The change in frequency of A2 is therefore given as:
\[ \Delta q = \frac{-sq^2(1 - q)}{1 - sq^2} \]
Both the average fitness and the change in allele frequency are functions of the
allele frequency and the selection coefficient. Selection against recessive alleles
is very efficient at first, but becomes progressively slower because, as the allele
frequency decreases, an increasing proportion of the remaining recessive alleles is
carried by heterozygotes, where they are shielded from selection. Therefore,
natural selection alone cannot entirely eliminate the recessive allele even if it is
lethal.
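The slowdown can be seen by counting how many generations selection needs to halve q from a high and from a low starting frequency. This sketch uses q' = q(1 - sq) / (1 - sq^2) and Δq = -sq^2(1 - q) / (1 - sq^2); the function names, the value s = 0.5 and the starting frequencies are hypothetical.

```python
def next_q_recessive(q, s):
    """Frequency of A2 after one generation of selection against A2A2."""
    return q * (1.0 - s * q) / (1.0 - s * q * q)


def delta_q_recessive(q, s):
    """Change in the frequency of A2 in one generation."""
    return -s * q * q * (1.0 - q) / (1.0 - s * q * q)


def generations_until(q0, target, s):
    """Number of generations for q to fall from q0 to target or below (s > 0)."""
    q, generations = q0, 0
    while q > target:
        q = next_q_recessive(q, s)
        generations += 1
    return generations


s = 0.5
fast = generations_until(0.50, 0.25, s)     # halving from a high frequency
slow = generations_until(0.05, 0.025, s)    # halving from a low frequency
print("generations to halve q from 0.50:", fast)
print("generations to halve q from 0.05:", slow)
```

With s = 1 the recurrence reduces to q / (1 + q), the recessive lethal case.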
More than one locus: Linkage and linkage disequilibrium
Under random mating, alleles at each autosomal locus combine at random to
form genotypes and attain equilibrium under the Hardy-Weinberg law. The basic
assumption here is that transmission of alleles at a given locus across generations is
independent of alleles at another locus. We also assume that the fitness of genotypes
at one locus is not affected by genotypes at another locus. For several loci, these
assumptions would likely be violated.
Let's consider an A locus with two alleles A1 and A2 at frequencies pA and qA,
and a B locus also with two alleles B1 and B2 at frequencies pB and qB,
respectively. Under Hardy-Weinberg proportions, pA + qA = 1, pB + qB = 1,
and the expected genotypic frequencies are pA^2 + 2pAqA + qA^2 and
pB^2 + 2pBqB + qB^2, respectively. Alleles at the A locus may combine at random
or in a non-random way with alleles at the B locus.
Random association of alleles showing expected gametic frequency under equilibrium
Allele (frequency)    A1 (pA)         A2 (qA)
B1 (pB)               A1B1  pApB      A2B1  qApB
B2 (qB)               A1B2  pAqB      A2B2  qAqB
Let's use some classical notation to represent the actual gametic
frequencies. Let r, s, t and u represent the actual or observed gametic frequencies
of A1B1, A1B2, A2B1 and A2B2, respectively. Under random association of
gametes, r = pApB, s = pAqB, t = qApB and u = qAqB, with r + s + t + u = 1.
The state of random gametic association between alleles of different genes is
called LINKAGE EQUILIBRIUM. If two loci are in linkage equilibrium, it means
that they are inherited completely independently in each generation. An example
would be loci that are on two different chromosomes and encode unrelated,
non-interacting proteins.
Under random mating and the other assumptions of Hardy-Weinberg
equilibrium, linkage equilibrium between loci is attainable. However, unlike the
single-locus case, the attainment of gametic or linkage equilibrium depends on the
rate of recombination in genotypes heterozygous for both loci.
There are two types of double heterozygotes:
A1B1/A2B2 (coupling)
A1B2/A2B1 (repulsive)
Gamete   Expected frequency   Observed frequency   Type
A1B1     pApB                 r                    Coupling
A1B2     pAqB                 s                    Repulsive
A2B1     qApB                 t                    Repulsive
A2B2     qAqB                 u                    Coupling
The observed gametic frequency differs from the expected gametic frequency by
an amount D. We measure the non-randomness of the gametic frequencies by
means of the deviation from two-locus equilibrium. D is the gametic disequilibrium
coefficient. Gametic disequilibrium is often referred to as linkage disequilibrium.
This may be confusing because genes or loci need not be linked to be in gametic
disequilibrium. The gametic disequilibrium coefficient, D, is similar to the effect of
inbreeding on genotypic frequencies at a single locus. The heterozygote-deficit
interpretation of the inbreeding coefficient, F, has been called a one-locus
disequilibrium coefficient.
\[ r = p_A p_B + D \]
\[ s = p_A q_B - D \]
\[ t = q_A p_B - D \]
\[ u = q_A q_B + D \]
The most common expression of D is:
\[ D = ru - st \]
D is therefore the difference between the product of the coupling gamete
frequencies and the product of the repulsive gamete frequencies.
\[ D = (p_A p_B + D)(q_A q_B + D) - (p_A q_B - D)(q_A p_B - D) \]
[You can work on the proof in your spare time].
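A numerical check is not a proof, but it makes the relationships concrete. The sketch below computes D = ru - st from four gametic frequencies and confirms that each observed frequency equals its equilibrium expectation plus or minus D. The gametic frequencies and the function name are hypothetical.

```python
def gametic_disequilibrium(r, s, t, u):
    """Return (pA, pB, D) from the frequencies of A1B1, A1B2, A2B1 and A2B2."""
    p_a = r + s          # frequency of allele A1
    p_b = r + t          # frequency of allele B1
    d = r * u - s * t    # coupling product minus repulsive product
    return p_a, p_b, d


# Hypothetical gametic frequencies (they must sum to 1).
r, s, t, u = 0.4, 0.1, 0.2, 0.3
p_a, p_b, d = gametic_disequilibrium(r, s, t, u)
q_a, q_b = 1.0 - p_a, 1.0 - p_b

print("pA =", p_a, " pB =", round(p_b, 4), " D =", round(d, 4))
print("A1B1:", r, "=", round(p_a * p_b + d, 4))
print("A1B2:", s, "=", round(p_a * q_b - d, 4))
print("A2B1:", t, "=", round(q_a * p_b - d, 4))
print("A2B2:", u, "=", round(q_a * q_b + d, 4))
```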
If two genes are in linkage disequilibrium, it means that certain alleles of
each gene are inherited together more often than would be expected by chance.
This may be due to actual genetic linkage, i.e., the genes are closely located on the
same chromosome. Or it could be due to some form of functional interaction where
some combinations of alleles at the two loci affect the viability of potential
offspring. It should be noted that an observed non-random association of
alleles/genotypes need not be caused by their chromosomal location. Any of the
evolutionary forces (mutation, random genetic drift, selection and gene flow) can,
at least temporarily, cause such associations.
Recombination
Let's consider the following:
The gametes produced by the double heterozygote A1B1/A2B2 are of four types:
Type 1: A1B1 non-recombinant with frequency (1-c)/2
Type 2: A1B2 recombinant with frequency c/2
Type 3: A2B1 recombinant with frequency c/2
Type 4: A2B2 non-recombinant with frequency (1-c)/2
Gametic types 1 and 4 are called non-recombinants because the alleles are
associated in the same manner as in the previous generation. Gametic types 2 and 3
are known as recombinants because the alleles are associated differently than in
the previous generation. As a result of Mendelian segregation, f(A1B1)=f(A2B2)
and f(A1B2)=f(A2B1). However, f(A1B2) + f(A2B1) does not have to be
equal to f(A1B1) + f(A2B2). The proportion of recombinant gametes produced
by the double heterozygote is called the recombination fraction, c, and the
proportion of non-recombinant gametes is 1-c.
The recombination fraction between genes depends on whether they are on
the same chromosome, and also on the physical distance between them. During
meiosis, the four chromatids (carrying the two genes) align. The two inner
chromatids can undergo breakage and exchange of parts (recombination). Thus,
only 50% (0.5) of the chromatids can undergo recombination. Therefore, the
maximum recombination rate is cmax=0.5. For genes on different chromosomes or
far apart on the same chromosome, the recombination fraction is c=0.5, as the
four gametic types are produced in equal frequency. Genes that have c<0.5 must
necessarily be on the same chromosome, and such genes are said to be linked.
When c=0, the two genes are so close to each other that a break between them
almost never happens, and they are transmitted together as one super gene.
Gametic disequilibrium and frequency of gamete change over time
The gametic disequilibrium changes from one generation to the next. Let the
frequencies of A1B1, A1B2, A2B1 and A2B2 be r, s, t and u, respectively. Now,
let's construct the gametic frequencies of the offspring.
Proportion among gametes
Genotype      A1B1       A1B2       A2B1       A2B2
A1B1/A1B1     1          0          0          0
A1B1/A1B2     1/2        1/2        0          0
A1B1/A2B1     1/2        0          1/2        0
A1B1/A2B2     (1-c)/2    c/2        c/2        (1-c)/2
A1B2/A1B2     0          1          0          0
A1B2/A2B1     c/2        (1-c)/2    (1-c)/2    c/2
A1B2/A2B2     0          1/2        0          1/2
A2B1/A2B1     0          0          1          0
A2B1/A2B2     0          0          1/2        1/2
A2B2/A2B2     0          0          0          1
There are ten different two-locus genotypes; therefore a full mating table would
take 100 rows. Assuming Hardy-Weinberg equilibrium, we can instead calculate the
frequency with which any one genotype will produce a particular gamete.
Genotypes and the frequency of their progeny gametes
                           Gametes
Genotype     Frequency   A1B1        A1B2        A2B1        A2B2
A1B1/A1B1    r^2         r^2
A1B1/A1B2    2rs         rs          rs
A1B1/A2B1    2rt         rt                      rt
A1B1/A2B2    2ru         (1-c)ru     (c)ru       (c)ru       (1-c)ru
A1B2/A1B2    s^2                     s^2
A1B2/A2B1    2st         (c)st       (1-c)st     (1-c)st     (c)st
A1B2/A2B2    2su                     su                      su
A2B1/A2B1    t^2                                 t^2
A2B1/A2B2    2tu                                 tu          tu
A2B2/A2B2    u^2                                             u^2
Total        1           r - cD0     s + cD0     t + cD0     u - cD0
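The frequencies of the four gametes after one generation of random mating are:
\[ r_1 = r_0 - cD_0 \]
\[ s_1 = s_0 + cD_0 \]
\[ t_1 = t_0 + cD_0 \]
\[ u_1 = u_0 - cD_0 \]
where D0 is the LD in the preceding generation.
\[ D_1 = r_1 u_1 - s_1 t_1 = [(r_0 - cD_0)(u_0 - cD_0)] - [(s_0 + cD_0)(t_0 + cD_0)] = D_0(1 - c) \]
This recursive relationship leads to a general relationship:
\[ D_t = D_0 (1 - c)^t \]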
where Dt is the value of D at generation t. The LD decays each generation at a rate
determined by the degree of recombination. The maximum value of D (+0.25)
occurs when there are only coupling gametes (r=u=0.5). The minimum value of D
(-0.25) occurs when there are only repulsive gametes (s=t=0.5). Thus, the value of
D varies from -0.25 to +0.25. If there is free recombination between two loci
(either on different chromosomes or far apart on the same chromosome, where
c=0.5), D would be almost eliminated in about 7 generations (starting from
D0=0.25, D7=0.00195). However, if c is much less than 0.5, e.g. 0.05, then the
decay in disequilibrium will take a substantial period of time.
A major problem with D is that its maximum value changes as a function of the
allele frequencies at the two loci. As a result, standardizing D to its maximum
possible value was proposed by Lewontin (1964), where
\[ D' = \frac{D}{D_{max}} \]
Dmax is equal to the lesser of pAqB or qApB if D is positive, or the lesser of
pApB or qAqB if D is negative. D' varies between -1 and 1 regardless of the allele
frequencies at the two loci, and it provides a metric to compare LD to the maximum
possible value it can take.
To determine how long it takes for D to decay to a given value D*, the recursive
equation for Dt can be solved for the number of generations, t, as:
\[ t = \frac{\ln(D^*/D_0)}{\ln(1 - c)} \]
When c=0.1, it will take 6.58 and 21.85 generations for half and 90% of the LD,
respectively, to disappear; for c=0.05, it will take 13.51 and 44.89 generations,
respectively, for half and 90% of the LD to disappear.
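The decay figures can be checked with a short sketch of D_t = D0 (1 - c)^t and its inverse t = ln(D*/D0) / ln(1 - c). The function names are illustrative. "Half of the LD disappears" corresponds to D*/D0 = 0.5, and "90% disappears" to D*/D0 = 0.1.

```python
from math import log


def d_after(d0, c, t):
    """Disequilibrium after t generations of random mating: D0 * (1 - c)^t."""
    return d0 * (1.0 - c) ** t


def generations_to_fraction(fraction_remaining, c):
    """Generations needed for D to decay to the given fraction of D0."""
    return log(fraction_remaining) / log(1.0 - c)


# Free recombination starting from the maximum D.
print("D7 with c = 0.5:", round(d_after(0.25, 0.5, 7), 5))

for c in (0.1, 0.05):
    print("c =", c,
          " half gone after", round(generations_to_fraction(0.5, c), 2),
          " 90% gone after", round(generations_to_fraction(0.1, c), 2),
          "generations")
```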
The correlation coefficient, r, is also used as a measure of LD (this r is not the
gametic frequency of A1B1 used above):
\[ r^2 = \frac{D^2}{p_A q_A p_B q_B} \]
where r is the square root of the above expression. When the allele frequencies are
the same at both loci, r^2 ranges from 0 to 1. When the allele frequencies differ
between the two loci, the maximum values of r^2 and r are somewhat smaller. The
value of the chi-square, X^2, is numerically equal to r^2 N, where N is the total
number of chromosomes examined. The biological meaning of r is that it is the
correlation between alleles present on the same chromosome.
APPLICATION
Originally the definition of LD was in terms of gametic frequencies because that
allowed for the possibility that the loci are on different chromosomes. However,
the usual application now is to loci on the same chromosome. In that case, the
allele pair AB is a haplotype, and p_AB is the observed haplotype frequency. D_AB
is estimated from the allele and haplotype frequencies in the sample.
\[ D_{AB} = p_{AB} - p_A p_B \]
The quantity D_AB is the coefficient of linkage disequilibrium defined for a specific
pair of alleles, A and B, and does not depend on how many other alleles are at the
two loci. Each pair of alleles has its own D. The values for different pairs of alleles
are constrained by the fact that the allele frequencies at both loci and the haplotype
frequencies have to add up to 1. If both loci have two alleles, e.g. SNPs, the
constraint is strong enough that only one value of D is needed to characterize LD
between those loci, and D_AB = -D_Ab = -D_aB = D_ab, where a and b are the
other alleles. In this case, D is used without a subscript. The sign of D is
arbitrary and depends on which pair of alleles one starts with.
Higher-order disequilibria: Disequilibria can also be considered for alleles at
three or more loci. For alleles at three loci (A, B, and C) the third-order coefficient
is:
\[ D_{ABC} = p_{ABC} - p_A D_{BC} - p_B D_{AC} - p_C D_{AB} - p_A p_B p_C \]
where D_AB, D_AC and D_BC are the pairwise disequilibrium coefficients, and D_ABC
can be viewed as analogous to the three-way interaction term in an analysis of
variance and can be interpreted as the non-independence among these alleles that
is not accounted for by the pairwise coefficients.
Another measure is defined to be:
\[ P(A \mid B) = \frac{p_{AB}}{p_B} = p_A + \frac{D_{AB}}{p_B} \]
It is the conditional probability that a chromosome carries an A allele, given that
it carries a B allele. It is useful for characterizing the extent to which a particular
allele is associated with a genetic disease.
Estimating and testing significance of Linkage Disequilibrium

For most populations the only information available is the frequency distribution of
multi-locus genotypes. While the gametic composition of most zygotes can be
resolved from the genotype (e.g. an A1A2B1B1 individual must come from A1B1 and
A2B1 gametes), double heterozygotes, which can come from the union of A1B1 and
A2B2 or of A1B2 and A2B1 gametes, cannot be resolved definitively. Assuming
random mating, it is not necessary to discriminate between coupling and repulsive
heterozygotes. In this case, the unbiased estimator of D is given by
\[ \hat{D}_{11} = \frac{N}{N-1}\left[\frac{4N_{1111} + 2(N_{1112} + N_{1211}) + N_{1212}}{2N} - 2\hat{p}_{A_1}\hat{p}_{B_1}\right] \]
where N is the total sample size, the terms in the numerator are the observed
numbers of the four genotypes A1A1B1B1, A1A1B1B2, A1A2B1B1 and A1A2B1B2, and
\hat{p}_{A_1} and \hat{p}_{B_1} are estimates of the frequencies of alleles A1 and B1.
Examples of LD

          B1B1   B1B2   B2B2   Total
A1A1        40     60     28     128
A1A2        10     48     36      94
A2A2         4     14     26      44
Total       54    122     90     266

A locus                             B locus
A1A1  PA = 128/266 = 0.4812         B1B1  PB = 54/266 = 0.2030
A1A2  HA = 94/266 = 0.3534          B1B2  HB = 122/266 = 0.4586
A2A2  QA = 44/266 = 0.1654          B2B2  QB = 90/266 = 0.3383
pA = 0.4812 + ½(0.3534) = 0.6579    pB = 0.2030 + ½(0.4586) = 0.4323
qA = 0.1654 + ½(0.3534) = 0.3421    qB = 0.3383 + ½(0.4586) = 0.5677

$$\hat{D}_0 = \frac{266}{266-1}\left[\frac{4(40) + 2(60 + 10) + 48}{2(266)} - 2(0.6579)(0.4323)\right] = 0.0856$$
What does this mean? Since D is positive, the maximum value of D is the lesser of
qApB or pAqB. Since qApB = 0.3421 x 0.4323 = 0.1479 and pAqB
= 0.6579 x 0.5677 = 0.3735, we choose the former. Therefore,
$$D' = \frac{D}{D_{max}} = \frac{0.0856}{0.1479} = 0.5788$$
This tells us that D is about 57.88% of its maximum value. With a given
recombination rate, c, the value of D will change over time.
$$r^2 = \frac{D^2}{p_A p_B q_A q_B} = \frac{0.0856^2}{0.6579 \times 0.4323 \times 0.3421 \times 0.5677} = 0.1327$$
$$\chi^2 = r^2 N = 0.1327 \times 266 = 35.2868$$

There are 4 chromosomal types, and since we estimated two allele frequencies
from the data, the degrees of freedom = 4 - 1 - 2 = 1. Since 35.2868 is greater than the
$\chi^2$ value at p = 0.05 and 1 df (3.84), we can conclude that the gametic types are not in
linkage equilibrium.
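The worked example above can be checked with a short script. This is a minimal sketch, assuming the genotype counts are supplied as a 3 x 3 table in the same row and column order as the example; the function name `ld_from_genotype_counts` and the dictionary keys are illustrative choices, not part of these notes.

```python
def ld_from_genotype_counts(counts):
    """LD statistics from a 3x3 table of two-locus genotype counts.

    counts[i][j]: rows are A1A1, A1A2, A2A2; columns are B1B1, B1B2, B2B2.
    """
    n = sum(sum(row) for row in counts)
    row_totals = [sum(row) for row in counts]
    col_totals = [sum(row[j] for row in counts) for j in range(3)]

    # Allele frequencies of A1 and B1: homozygotes plus half the heterozygotes.
    p_a = (row_totals[0] + 0.5 * row_totals[1]) / n
    p_b = (col_totals[0] + 0.5 * col_totals[1]) / n
    q_a, q_b = 1.0 - p_a, 1.0 - p_b

    # Estimator of D built from the four genotype classes carrying A1 and B1.
    numerator = 4 * counts[0][0] + 2 * (counts[0][1] + counts[1][0]) + counts[1][1]
    d = n / (n - 1) * (numerator / (2 * n) - 2 * p_a * p_b)

    # Largest |D| allowed by the allele frequencies, given the sign of D.
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)

    r2 = d ** 2 / (p_a * q_a * p_b * q_b)
    return {"pA": p_a, "pB": p_b, "D": d,
            "D_prime": abs(d) / d_max, "r2": r2, "chi2": r2 * n}


table = [[40, 60, 28],
         [10, 48, 36],
         [4, 14, 26]]
result = ld_from_genotype_counts(table)
for name, value in result.items():
    print(f"{name}: {value:.4f}")
```

Because the script carries full precision through every step, the last digit of a statistic can differ slightly from a hand calculation that rounds intermediate values.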
LD with SNP data

Without considering the distance between two polymorphic SNPs, let us visualize the
following on bovine chromosome 1:
SNP1 SNP2
AGGT CCT..GATT CAA
AGGT CCT..GATT CAA
          SNP1                          SNP2
No.   Allele   Frequency        No.   Allele   Frequency
1     G        pA               1     A        pB
2     C        qA               2     T        qB

Combination of SNPs into haplotypes

                  SNP2
        Allele    A     T
SNP1    G         GA    GT
        C         CA    CT

Haplotype   Expected frequency   Observed frequency
GA          pApB                 r = pApB + D
GT          pAqB                 s = pAqB - D
CA          qApB                 t = qApB - D
CT          qAqB                 u = qAqB + D
Let us consider some SNP data from 1,000 bulls:

GA = 280; GT = 300; CA = 75; CT = 345

            Observed   Observed              Allele       Expected haplotype
Haplotype   number     frequency    Allele   frequency    frequency
GA          280        r = 0.2800   G        pA = 0.580   0.58 x 0.355 = 0.2059
GT          300        s = 0.3000   C        qA = 0.420   0.58 x 0.645 = 0.3741
CA           75        t = 0.0750   A        pB = 0.355   0.42 x 0.355 = 0.1491
CT          345        u = 0.3450   T        qB = 0.645   0.42 x 0.645 = 0.2709

$$D_0 = ru - st = (0.280 \times 0.345) - (0.300 \times 0.075) = 0.0741$$

Alternatively, $D_{GA}$ can also be calculated as:
$$D_{GA} = r - p_A p_B = 0.2800 - 0.2059 = 0.0741$$
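The two routes to D for the bull data can be confirmed numerically. The sketch below assumes haplotype counts are known directly (phased data); `haplotype_ld` is a hypothetical helper name, and the D' and r² values it returns are extra statistics that are not worked out in the text for this data set.

```python
def haplotype_ld(r, s, t, u):
    """LD statistics from haplotype frequencies AB (r), Ab (s), aB (t), ab (u)."""
    p_a, p_b = r + s, r + t            # frequencies of the first allele at each locus
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    d = r * u - s * t                  # same value as r - p_a * p_b when r+s+t+u = 1
    if d >= 0:
        d_max = min(p_a * q_b, q_a * p_b)
    else:
        d_max = min(p_a * p_b, q_a * q_b)
    r2 = d ** 2 / (p_a * q_a * p_b * q_b)
    return {"pA": p_a, "pB": p_b, "D": d, "D_prime": abs(d) / d_max, "r2": r2}


# Haplotype counts from the 1,000 bulls (G/C at SNP1, A/T at SNP2).
counts = {"GA": 280, "GT": 300, "CA": 75, "CT": 345}
total = sum(counts.values())
r, s, t, u = (counts[h] / total for h in ("GA", "GT", "CA", "CT"))

bulls = haplotype_ld(r, s, t, u)
print(f"D = {bulls['D']:.4f}")
print(f"r - pA*pB = {r - bulls['pA'] * bulls['pB']:.4f}")
```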
The gametic frequencies in a population of 1,000 chickens for the naked neck (Na/na)
and dominant I (I/i) loci are as follows:

Gamete   Frequency   Symbol
Na-I     0.180       r
Na-i     0.707       s
na-I     0.061       t
na-i     0.052       u

Allele frequencies

f(Na) = f(Na-I) + f(Na-i) = 0.180 + 0.707 = 0.887 = pA
f(na) = f(na-I) + f(na-i) = 0.061 + 0.052 = 0.113 = qA
f(Na) + f(na) = 0.887 + 0.113 = 1.000
f(I) = f(Na-I) + f(na-I) = 0.180 + 0.061 = 0.241 = pB
f(i) = f(Na-i) + f(na-i) = 0.707 + 0.052 = 0.759 = qB
f(I) + f(i) = 0.241 + 0.759 = 1.000

Expected gametic frequencies under linkage equilibrium

f(Na-I) = f(Na) x f(I) = 0.887 x 0.241 = 0.2138
f(Na-i) = f(Na) x f(i) = 0.887 x 0.759 = 0.6732
f(na-I) = f(na) x f(I) = 0.113 x 0.241 = 0.0272
f(na-i) = f(na) x f(i) = 0.113 x 0.759 = 0.0858

$$D_0 = ru - st = (0.180 \times 0.052) - (0.707 \times 0.061) = -0.0338$$

Observed frequency = Expected frequency + D0
Observed frequency of Na-I = [f(Na) x f(I)] + D0 = 0.2138 - 0.0338 = 0.1800
The decay in LD under two different recombination rates is shown in Fig 7. When there is
no linkage (c = 0.5), LD will be almost zero by generation 7. However, it takes much
longer for LD to decay when the recombination rate is closer to 0.

Since D is negative, the maximum absolute value of D is the lesser of pApB or qAqB,
because no haplotype frequency can fall below zero. Since pApB = f(Na) x f(I) =
0.887 x 0.241 = 0.2138 and qAqB = f(na) x f(i) = 0.113 x 0.759 = 0.0858, we choose the latter.
Therefore,
$$|D'| = \frac{|D|}{D_{max}} = \frac{0.03377}{0.08577} = 0.3937$$
This tells us that D is about 39.37% of its maximum value.

The observed frequency at generation t = expected frequency at t = 0 + $D_t$, where
$D_t = D_0(1 - c)^t$ and c is the recombination rate. Assuming c = 0.1, at generation
2, D2 = -0.0274. The observed frequency of Na-I will be 0.2138 - 0.0274 = 0.1864.
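The decay formula is easy to explore numerically. The following is a minimal sketch; `ld_after` and `generations_until` are illustrative names, and the 1% threshold used to stand for "almost zero" is an assumption made here, not a value given in the notes.

```python
import math


def ld_after(d0, c, t):
    """LD after t generations of random mating: D_t = D_0 * (1 - c)**t."""
    return d0 * (1.0 - c) ** t


def generations_until(fraction, c):
    """Smallest whole number of generations after which |D_t| <= fraction * |D_0|."""
    return math.ceil(math.log(fraction) / math.log(1.0 - c))


d0 = -0.0338                       # naked neck / dominant I example
d2 = ld_after(d0, c=0.1, t=2)
print(f"D2 = {d2:.4f}")
print(f"Expected Na-I frequency at generation 2: {0.2138 + d2:.4f}")

# Generations needed for D to fall to 1% of its starting value.
for c in (0.5, 0.1, 0.01):
    print(c, generations_until(0.01, c))
```

With free recombination (c = 0.5) the 1% threshold is reached at generation 7, which agrees with the statement above that LD is almost zero by then.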
Now we can test whether D0 is significantly different from zero using the Chi-square
test.
Null Hypothesis: The observed gametic frequencies do not deviate from the
expected gametic frequencies.
Since the $\chi^2$ test is allergic to frequencies and fractions, we have to use observed and
expected numbers.
$$\chi^2 = \frac{(180 - 213.8)^2}{213.8} + \frac{(707 - 673.2)^2}{673.2} + \frac{(61 - 27.2)^2}{27.2} + \frac{(52 - 85.8)^2}{85.8} = 62.3571$$
Degrees of freedom = 4 - 1 - 1 (for estimating f(Na) from the data) - 1 (for estimating
f(I) from the data) = 1. The tabulated $\chi^2$ value at 1 df and p = 0.05 is 3.84. Since
62.3571 > 3.84, we can reject the null hypothesis and conclude that the observed gametic
frequencies are not in equilibrium, i.e. the two loci are in linkage disequilibrium.
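The test statistic above can be reproduced as follows. This is a sketch under the same assumptions as the text (gametes counted directly, 1 degree of freedom, critical value 3.84); the helper names are illustrative. It computes the statistic twice, once with the rounded expected numbers used in the text and once with unrounded ones, to show that rounding does not change the conclusion.

```python
def expected_gamete_counts(observed):
    """Expected counts of gametes AB, Ab, aB, ab under linkage equilibrium."""
    n = sum(observed)
    p_a = (observed[0] + observed[1]) / n
    p_b = (observed[0] + observed[2]) / n
    q_a, q_b = 1.0 - p_a, 1.0 - p_b
    return [n * p_a * p_b, n * p_a * q_b, n * q_a * p_b, n * q_a * q_b]


def chi_square(observed, expected):
    """Pearson chi-square statistic from observed and expected numbers."""
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))


observed = [180, 707, 61, 52]                 # Na-I, Na-i, na-I, na-i
rounded_expected = [213.8, 673.2, 27.2, 85.8]  # values used in the text
CRITICAL_1DF_5PCT = 3.84

chi2_text = chi_square(observed, rounded_expected)
chi2_exact = chi_square(observed, expected_gamete_counts(observed))
print(f"chi-square (rounded expected numbers): {chi2_text:.4f}")
print(f"chi-square (unrounded expected numbers): {chi2_exact:.4f}")
print("reject H0:", chi2_exact > CRITICAL_1DF_5PCT)
```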
Population genetics of LD
Linkage disequilibrium is affected by the following:
Selection (both natural and artificial)
Genetic drift
Population subdivision and bottlenecks
Inbreeding, inversion and gene conversion
Applications of LD
Mutation, gene mapping, QTL studies, genomic breeding value estimation
Detecting natural selection
Population structure and Gene flow
So far we have assumed that a population is homogeneous, and that the
characteristics of subpopulations sampled from the population would be
identical. This assumption may not be true. The distribution of individuals and
the gene (allele) flow connections between different subpopulations can be important
in evolution. By population structure, a population geneticist means that, instead of a
single, simple population, the population may have substructure, i.e., differences in
genetic variation among the subpopulations due to different evolutionary causes
(genetic drift, nonrandom mating, selection, etc.).
The overall population of subpopulations is referred to as the total
population (T). The individual components of the total population are referred to as
subpopulations (S), local populations or demes. In many real populations, there
may not be obvious structure, and the population is continuous. However, even in
effectively continuous populations, different areas or regions can have different
allele frequencies because mating in the total population is usually nonrandom.
In humans, even within a country that shares the same language, there are often language
differences suggesting substructure, but it is usually difficult to find the exact
boundary where the changeover occurs. Such a population is structured, but
continuous in space. Population structure can therefore be defined as the situation in
which subpopulations deviate from Hardy-Weinberg proportions.
Reduction in heterozygosity is one of the major consequences of population
substructure. The deviation from the expected heterozygote frequency in a population
is called inbreeding, F. The inbreeding coefficient, F, compares the actual
heterozygote frequency with the heterozygote frequency expected under Hardy-Weinberg
equilibrium.
The heterozygosity ($H_e$) under equilibrium is the frequency of the
heterozygotes (2pq). With inbreeding, $H_e$ is reduced by a factor $1 - F$. Therefore, the
observed frequency of heterozygotes ($H_0$) becomes $2pq(1 - F)$.
$$F = \frac{H_e - H_0}{H_e} = 1 - \frac{H_0}{H_e}$$
The reduction in heterozygote frequency implies an increase in the frequency
of homozygotes. The reduction in heterozygote frequency is divided equally
between the two homozygotes. The change in heterozygote frequency is given as
$$H_e - H_0 = 2pq - 2pq(1 - F) = 2pq - [2pq - 2pqF] = 2pqF$$
This implies that the two homozygotes would each have their frequency
increased by $\frac{2pqF}{2} = pqF$. The reason why the lost heterozygotes are divided
equally between the two homozygotes is that each heterozygote carries one copy of each of
the two alleles.
The observed and expected genotypic frequencies are therefore given as:

Expected genotypic frequency under inbreeding

                               A1A1           A1A2            A2A2
Expected genotype frequency    $p^2$          $2pq$           $q^2$
Observed genotype frequency    $p^2 + pqF$    $2pq(1 - F)$    $q^2 + pqF$

If a gene has multiple alleles, $A_1, A_2, \ldots, A_n$, with respective frequencies
$p_1, p_2, \ldots, p_n$, where $p_1 + p_2 + \cdots + p_n = 1$, and the inbreeding coefficient is F, then
$$P_{ii} = p_i^2(1 - F) + p_iF \quad \text{(homozygotes)}$$
$$P_{ij} = 2p_ip_j(1 - F) \quad \text{(heterozygotes, } i \neq j\text{)}$$
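As a quick numerical illustration of these genotype frequencies, here is a minimal sketch. The function names are hypothetical, and the example values (p = 0.7, F = 0.2857) are taken from subpopulation 2 of the practical example later in these notes.

```python
from itertools import combinations


def genotype_freqs_inbred(p, f):
    """Frequencies of A1A1, A1A2, A2A2 for allele frequency p and inbreeding F."""
    q = 1.0 - p
    return (p * p + p * q * f, 2 * p * q * (1.0 - f), q * q + p * q * f)


def multiallelic_freqs_inbred(freqs, f):
    """Homozygote and heterozygote frequencies for any number of alleles."""
    homozygotes = {(i, i): p * p * (1.0 - f) + p * f
                   for i, p in enumerate(freqs)}
    heterozygotes = {(i, j): 2 * freqs[i] * freqs[j] * (1.0 - f)
                     for i, j in combinations(range(len(freqs)), 2)}
    return homozygotes, heterozygotes


two_allele = genotype_freqs_inbred(0.7, 0.2857)
print([round(x, 4) for x in two_allele])

hom, het = multiallelic_freqs_inbred([0.5, 0.3, 0.2], 0.1)
print(round(sum(hom.values()) + sum(het.values()), 10))
```

In both cases the genotype frequencies still sum to 1: inbreeding redistributes genotypes but leaves the allele frequencies unchanged.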
F coefficients
If individuals mate within subpopulations, they are more likely to mate with related
individuals than if they mated randomly over the entire population. Sewall Wright
provided an approach to partitioning the genetic variation in subpopulations that
provides an obvious description of differentiation. If $H_T$ and $H_S$ are the measures of
heterozygosity in the total population and the average of the subpopulations, respectively,
Wright's fixation index, $F_{ST}$, measures the average change in heterozygosity
in subpopulations relative to the total heterozygosity as:
$$F_{ST} = \frac{H_T - H_S}{H_T} = 1 - \frac{H_S}{H_T}$$
If individuals are mated at random within the whole population, then $H_T = 2pq$.
On the other hand, if there is spatial structure and individuals mate within
subpopulations, then the frequency of heterozygotes will depend on the allele
frequency in that subpopulation,
$$H_i = 2p_iq_i$$
If there are a total of k subpopulations, then
$$H_S = \frac{1}{k}\sum_{i=1}^{k} 2p_iq_i$$
When the subpopulations differ in size, the average is weighted by subpopulation size,
as in the practical example that follows.
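Within each subpopulation, there can be a deviation from the expected heterozygote
frequency within that subpopulation. Using the same logic,
$$F_{IS} = \frac{H_S - H_I}{H_S} = 1 - \frac{H_I}{H_S}$$
where $F_{IS}$ is a measure of the deviation from Hardy-Weinberg proportions of
expected heterozygotes within subpopulations. Similarly, $F_{IT}$ measures the
deviation from Hardy-Weinberg proportions of expected heterozygotes within the
whole population.
$$F_{IT} = \frac{H_T - H_I}{H_T} = 1 - \frac{H_I}{H_T}$$
The heterozygosity within subpopulations, $H_I$, is calculated from the observed
heterozygote frequency within the subpopulations.
Consequently, $1 - F_{IS} = \frac{H_I}{H_S}$; $1 - F_{IT} = \frac{H_I}{H_T}$; $1 - F_{ST} = \frac{H_S}{H_T}$.
Since $H_I = H_S(1 - F_{IS})$, $1 - F_{IT} = \frac{H_S(1 - F_{IS})}{H_T}$ and $\frac{H_S}{H_T} = 1 - F_{ST}$, so
$$1 - F_{IT} = (1 - F_{IS})(1 - F_{ST})$$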
If individuals are mating completely at random over the entire population, then
there will be no local variation in allele frequency and each subpopulation will
have the same expected heterozygosity as the total population. In that case $F_{ST} = 0$
and there will be no differentiation among subpopulations. At the other extreme, if
each subpopulation is completely isolated and alleles have become fixed within
each subpopulation, then there is no heterozygosity within the subpopulations. In
that case $F_{ST} = 1$ and there is maximum differentiation among subpopulations.
Practical example:
A population of 1,600 individuals was divided into three subpopulations and
genotyped for the gene responsible for juicy meat in a delicacy goat breed in
Yourland.

Observed numbers
                     AA     Aa     aa     Total
Subpopulation 1     125    250    125       500
Subpopulation 2      55     30     15       100
Subpopulation 3      80    440    480     1,000
Total population    260    720    620     1,600

Subpopulation 1
$$P_1 = \frac{125}{500} = 0.25;\quad H_1 = \frac{250}{500} = 0.50;\quad Q_1 = \frac{125}{500} = 0.25;\quad p_1 = P_1 + \tfrac{1}{2}H_1 = 0.5;\quad q_1 = 0.5$$
Subpopulation 2
$$P_2 = \frac{55}{100} = 0.55;\quad H_2 = \frac{30}{100} = 0.30;\quad Q_2 = \frac{15}{100} = 0.15;\quad p_2 = P_2 + \tfrac{1}{2}H_2 = 0.7;\quad q_2 = 0.3$$
Subpopulation 3
$$P_3 = \frac{80}{1000} = 0.08;\quad H_3 = \frac{440}{1000} = 0.44;\quad Q_3 = \frac{480}{1000} = 0.48;\quad p_3 = P_3 + \tfrac{1}{2}H_3 = 0.3;\quad q_3 = 0.7$$
Total population
$$P_0 = \frac{260}{1600} = 0.1625;\quad H_0 = \frac{720}{1600} = 0.45;\quad Q_0 = \frac{620}{1600} = 0.3875;$$
$$p_0 = P_0 + \tfrac{1}{2}H_0 = 0.3875;\quad q_0 = 0.6125$$

Expected numbers
                     AA          Aa          aa          Total
Subpopulation 1     125         250         125            500
Subpopulation 2      49          42           9            100
Subpopulation 3      90         420         490          1,000
Total population    240.2496    759.5008    600.2496     1,600
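Expected frequency:
$$E(P_1) = p_1^2 = 0.5^2 = 0.25;\quad E(H_1) = 2p_1q_1 = 2(0.5)(0.5) = 0.50;\quad E(Q_1) = q_1^2 = 0.5^2 = 0.25$$
$$E(P_2) = p_2^2 = 0.7^2 = 0.49;\quad E(H_2) = 2p_2q_2 = 2(0.7)(0.3) = 0.42;\quad E(Q_2) = q_2^2 = 0.3^2 = 0.09$$
$$E(P_3) = p_3^2 = 0.3^2 = 0.09;\quad E(H_3) = 2p_3q_3 = 2(0.3)(0.7) = 0.42;\quad E(Q_3) = q_3^2 = 0.7^2 = 0.49$$
$$E(P_0) = p_0^2 = 0.3875^2 = 0.150156;\quad E(H_0) = 2p_0q_0 = 2(0.3875)(0.6125) = 0.474688;$$
$$E(Q_0) = q_0^2 = 0.6125^2 = 0.375156$$
Inbreeding coefficient in subpopulations and total population
$$F_1 = 1 - \frac{H_1}{E(H_1)} = 1 - \frac{H_1}{2p_1q_1} = 1 - \frac{0.50}{0.50} = 0.000$$
$$F_2 = 1 - \frac{H_2}{E(H_2)} = 1 - \frac{H_2}{2p_2q_2} = 1 - \frac{0.30}{0.42} = 0.2857$$
$$F_3 = 1 - \frac{H_3}{E(H_3)} = 1 - \frac{H_3}{2p_3q_3} = 1 - \frac{0.44}{0.42} = -0.0476$$
$$F_0 = 1 - \frac{H_0}{E(H_0)} = 1 - \frac{H_0}{2p_0q_0} = 1 - \frac{0.450000}{0.474688} = 0.0520$$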
In subpopulation 1, the observed heterozygote frequency is the same as expected.
In subpopulation 2, fewer heterozygotes are observed than expected.
In subpopulation 3, more heterozygotes are observed than expected.
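The observed and expected genotypic frequencies in subpopulation 2:
$$p_2q_2F_2 = 0.21 \times 0.2857 = 0.059997$$

                               A1A1                              A1A2                            A2A2
Expected genotype frequency    $p_2^2 = 0.49$                    $2p_2q_2 = 0.42$                $q_2^2 = 0.09$
Observed genotype frequency    $p_2^2 + p_2q_2F_2 = 0.55 = P_2$  $2p_2q_2(1 - F_2) = 0.30 = H_2$  $q_2^2 + p_2q_2F_2 = 0.15 = Q_2$

$$H_I = \frac{H_1N_1 + H_2N_2 + H_3N_3}{N} = \frac{0.50 \times 500 + 0.30 \times 100 + 0.44 \times 1000}{1600} = 0.4500$$
$$H_S = \frac{2p_1q_1N_1 + 2p_2q_2N_2 + 2p_3q_3N_3}{N} = \frac{0.50 \times 500 + 0.42 \times 100 + 0.42 \times 1000}{1600} = 0.445$$
$$H_T = 2p_0q_0 = 2 \times 0.3875 \times 0.6125 = 0.474688$$
$$F_{IS} = 1 - \frac{H_I}{H_S} = 1 - \frac{0.450}{0.445} = -0.0112$$
$$F_{ST} = 1 - \frac{H_S}{H_T} = 1 - \frac{0.445}{0.475} = 0.0632$$
$$F_{IT} = 1 - \frac{H_I}{H_T} = 1 - \frac{0.450}{0.475} = 0.0526$$
Verification
$$1 - F_{IT} = (1 - F_{ST})(1 - F_{IS})$$
$$(1 - 0.0526) = (1 - 0.0632)(1 - (-0.0112))$$
$$0.9474 \approx 0.9368 \times 1.0112 = 0.9473$$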
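The whole practical example can be recomputed from the genotype counts. The sketch below is one way to do it, assuming (as in the worked example) that heterozygosities are averaged with weights proportional to subpopulation size; `f_statistics` is an illustrative name. Because it does not round $H_T$ to 0.475, its $F_{ST}$ and $F_{IT}$ differ from the hand calculation in the third decimal place.

```python
def f_statistics(counts):
    """Wright's F-statistics from (AA, Aa, aa) counts per subpopulation."""
    sizes = [sum(c) for c in counts]
    n = sum(sizes)

    # H_I: observed heterozygote frequency, pooled over subpopulations.
    h_i = sum(c[1] for c in counts) / n

    # H_S: expected heterozygosity within subpopulations, weighted by size.
    h_s = 0.0
    for (aa, ab, _bb), size in zip(counts, sizes):
        p = (aa + 0.5 * ab) / size
        h_s += 2 * p * (1.0 - p) * size
    h_s /= n

    # H_T: expected heterozygosity from the allele frequency of the total population.
    p0 = sum(c[0] + 0.5 * c[1] for c in counts) / n
    h_t = 2 * p0 * (1.0 - p0)

    return {"H_I": h_i, "H_S": h_s, "H_T": h_t,
            "F_IS": 1.0 - h_i / h_s,
            "F_ST": 1.0 - h_s / h_t,
            "F_IT": 1.0 - h_i / h_t}


goats = [(125, 250, 125), (55, 30, 15), (80, 440, 480)]
stats = f_statistics(goats)
for name, value in stats.items():
    print(f"{name}: {value:.4f}")

# The identity 1 - F_IT = (1 - F_IS)(1 - F_ST) holds exactly without rounding.
lhs = 1.0 - stats["F_IT"]
rhs = (1.0 - stats["F_IS"]) * (1.0 - stats["F_ST"])
print(abs(lhs - rhs) < 1e-12)
```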
Some general conclusions

Subpopulation 1 is consistent with Hardy-Weinberg proportions.
Subpopulation 2 has experienced some inbreeding.
Subpopulation 3 may have experienced heterozygote advantage through
disassortative mating, since it has more heterozygotes than expected.
Conclusion concerning the overall degree of genetic differentiation ($F_{ST}$)

Subdivision of the population, possibly due to genetic drift, accounts for 6.32% of the
total genetic variation. The differentiation led to a deficiency of heterozygotes over
the total population.
QUANTITATIVE GENETICS
Genetic decomposition of a locus on the phenotype
The nature of quantitative traits: A quintessential question all quantitative
geneticists ask is:
How much of the variation in a population with respect to a particular trait
is due to genetic causes and how much is due to environmental factors?
The phenotype (P) can be partitioned into a genotypic value (G) and an
environmental deviation (E).
$$P = G + E$$
We will focus our attention on the genetic component, G. Let us consider a single
gene A with two alleles, A1 and A2, combining into the genotypes A1A1, A1A2 and A2A2.
Let $+a$, $d$ and $-a$ be the arbitrary genotypic values for A1A1, A1A2 and A2A2,
respectively. The difference between the two homozygotes is 2a. The value of a is
a deviation from 0 (the mid-point), which is the average of the two homozygotes. The
heterozygote, A1A2, has a value of d = ak, where k is the degree of dominance.
The alleles A1 and A2 behave in a completely additive manner when k = 0. When
k = +1, the A1 allele is completely dominant over the A2 allele; and when k = -1,
the A2 allele is completely dominant over the A1 allele. k > +1 means
overdominance, and k < -1 means underdominance.
Let us look at a data set. The genotypic values of an AluI polymorphic site in
the 5'-region of the bovine growth hormone receptor gene for milk fat are as
follows:
AluI(-/-): -25 designated (A2A2)
AluI(+/-): -23 designated (A1A2)
AluI(+/+): -10 designated (A1A1)
The midpoint of the two homozygotes = [-25 + (-10)]/2 = -17.5.
The value of a = -10 - (-17.5) = 7.5 and d = -23 - (-17.5) = -5.5; k = d/a = -5.5/7.5 = -0.73.
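The calculation of a, d and k from three genotypic values can be written as a small helper. This is only a sketch; the function name and argument order (A1A1, A1A2, A2A2) are choices made for illustration.

```python
def additive_dominance(g11, g12, g22):
    """Return (midpoint, a, d, k) from the genotypic values of A1A1, A1A2, A2A2."""
    midpoint = (g11 + g22) / 2.0
    a = g11 - midpoint        # half the difference between the homozygotes
    d = g12 - midpoint        # heterozygote deviation from the midpoint
    k = d / a                 # degree of dominance
    return midpoint, a, d, k


# AluI site of the bovine growth hormone receptor gene, milk fat.
midpoint, a, d, k = additive_dominance(g11=-10, g12=-23, g22=-25)
print(midpoint, a, d, round(k, 2))
```

Here k lies between -1 and 0, so the A2 allele is partially dominant over A1 for this trait.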
Population mean
Let us estimate the population mean ($\mu$) of N individuals assuming a single locus
with two alleles.
Expression of Population Mean

Genotype   Frequency   Genotypic value   Frequency x value
A1A1       $p^2$       +a                $p^2a$
A1A2       $2pq$       d                 $2pqd$
A2A2       $q^2$       -a                $-q^2a$

$$\mu = \frac{p^2a - q^2a + 2pqd}{p^2 + 2pq + q^2}$$
The denominator is equal to 1. The numerator can be rewritten as:
$$a(p^2 - q^2) + 2pqd$$
$$p^2 - q^2 = (p + q)(p - q)$$
Therefore, the population mean can be written as:
$$\mu = a(p - q) + 2pqd$$
The homozygotes contribute a(p - q) and the heterozygote contributes 2pqd to the
population mean.
From Fig 9, the population mean depends on allele frequency. The population
mean decreases with increasing frequency of the unfavorable allele (Fig 9a). The
population mean increases with increasing frequency of the favorable allele (Fig
9b).
Population mean under additivity (k=0):
We have already established that d = ka; therefore, when k = 0, d = 0.
$$\mu = a(p - q)$$
Since p = 1 - q, $\mu = a(1 - q - q) = a(1 - 2q)$
Population mean under complete dominance (k=1):

Under complete dominance, k = 1, which means d = a
$$\mu = a(p - q) + 2pqa$$
$$= a(1 - q - q) + 2a(1 - q)q$$
$$= a(1 - 2q + 2q - 2q^2)$$
$$= a(1 - 2q^2)$$
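The general expression for the population mean and its two special cases can be checked numerically. This is a minimal sketch; `population_mean` is an illustrative name, and the values a = 10 and d = 5 are the egg weight genotypic values used in the table later in these notes.

```python
def population_mean(a, d, q):
    """Population mean mu = a(p - q) + 2pqd for frequency q of the A2 allele."""
    p = 1.0 - q
    return a * (p - q) + 2 * p * q * d


a, d = 10.0, 5.0
means = {q: population_mean(a, d, q) for q in (0.1, 0.5, 0.8)}
for q, mu in means.items():
    print(f"q = {q}: mu = {mu:.2f}")

# Special cases: additivity (d = 0) and complete dominance (d = a).
q = 0.3
additive = population_mean(a, 0.0, q)       # equals a(1 - 2q)
dominant = population_mean(a, a, q)         # equals a(1 - 2q^2)
print(round(additive, 6), round(dominant, 6))
```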
Genetic Model
The genotypic value of an individual can be written in terms of the genetic
decomposition of the genotype.
$$G = A + D + I$$
The genotypic value equals the sum of the breeding value, A, the dominance deviation, D,
and the epistatic deviation, I. For simplicity, we will ignore the epistatic deviation and
concentrate on the breeding value (additive value) and the dominance deviation.
$$G = A + D$$
Genotypic value, G
The genotypic value can be written as a deviation from the population mean.
$$G_{11} = a - \mu$$
$$G_{12} = d - \mu$$
$$G_{22} = -a - \mu$$
$$G_{11} = a - [a(p - q) + 2pqd]$$
$$= a - ap + aq - 2pqd = a(1 - p + q) - 2pqd$$
$$= a(1 - 1 + q + q) - 2pqd$$
$$G_{11} = 2q(a - pd)$$
Subsequently,
$$G_{12} = a(q - p) + d(1 - 2pq)$$
and
$$G_{22} = -2p(a + qd)$$
BREEDING (Additive) VALUES (A)
An individual's breeding value can be said to be the sum of the additive effects of
the individual's alleles. The concept of additive effects arises from the fact that
parents pass on their alleles to their progeny and not their genotype. Therefore, the
value of an individual judged by the mean value of its progeny is called the
individual's breeding value. The breeding value for an individual at a locus is
defined as the sum of the additive effects of the alleles at the locus.
Allelic value of A1 ($\alpha_1$)

An A1 gamete can combine at random with either an A1 or an A2 gamete to produce A1A1
with genotypic value +a or A1A2 with genotypic value d. Taking into account the
proportions in which they occur, the allelic value of A1 = pa + qd.
The mean deviation of the progeny from the population mean is:
$$pa + qd - \mu = pa + qd - [a(p - q) + 2pqd] = q[a + d(q - p)]$$
[Note: p + q = 1; and 1 - 2p = p + q - 2p = q - p]
Allelic value of A2 ($\alpha_2$)

An A2 gamete can combine at random with either an A2 or an A1 gamete to produce
A2A2 with genotypic value -a or A1A2 with genotypic value d. Taking into
account the proportions in which they occur, the allelic value of A2 = -qa + pd.
The mean deviation of the progeny from the population mean is:
$$-qa + pd - \mu = -qa + pd - [a(p - q) + 2pqd] = -p[a + d(q - p)]$$
When there are only two alleles at a locus, it is more convenient to express their
additive effects in terms of the additive or average effect of allele substitution.
$$\alpha_1 = q[a + d(q - p)]$$
$$\alpha_2 = -p[a + d(q - p)]$$
The effect of substituting one allele for the other is $\alpha = \alpha_1 - \alpha_2$, that is, the
average change in genotypic value when an A2 allele is replaced by an A1 allele.
$$\alpha = \alpha_1 - \alpha_2 = qa + dq^2 - dpq + pa + dpq - dp^2 = qa + pa + dq^2 - dp^2$$
$$= a(p + q) + d(q^2 - p^2)$$
Note that $p + q = 1$ and $(q^2 - p^2) = (q + p)(q - p)$, so
$$\alpha = a + d(q - p)$$
An individual's breeding value A is the sum of the additive effects of its alleles.
When mating is random, the breeding value of a genotype for an individual is
twice the expected mean deviation of its progeny from the population mean. The
deviation is multiplied by two since only one half of the parental alleles are
transmitted to each progeny. Therefore, we can estimate the breeding value of an
individual by mating it to random individuals from the population and taking
twice the deviation of its offspring mean from the population mean. Breeding
values can be estimated under several scenarios.
The breeding values are:
$$A = \begin{cases} 2\alpha_1 = 2q\alpha & A_1A_1 \\ \alpha_1 + \alpha_2 = (q - p)\alpha & A_1A_2 \\ 2\alpha_2 = -2p\alpha & A_2A_2 \end{cases}$$
Mean breeding value:

The summation of the breeding value multiplied by the frequency for each
genotype will provide the mean breeding value.

                   A1A1         A1A2              A2A2
Frequency          $p^2$        $2pq$             $q^2$
Breeding value     $2q\alpha$   $(q - p)\alpha$   $-2p\alpha$

Mean breeding value
$$2p^2q\alpha + 2pq(q - p)\alpha - 2pq^2\alpha = 2pq\alpha(p + q - p - q) = 0$$
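The average effects and breeding values above translate directly into code. The sketch below uses illustrative function names and, as example input, a = 10, d = 5 and q = 0.8 from the egg weight table later in these notes.

```python
def average_effects(a, d, q):
    """Return (alpha, alpha1, alpha2) for frequency q of the A2 allele."""
    p = 1.0 - q
    alpha = a + d * (q - p)           # average effect of allele substitution
    return alpha, q * alpha, -p * alpha


def breeding_values(a, d, q):
    """Breeding values of A1A1, A1A2 and A2A2."""
    _alpha, alpha1, alpha2 = average_effects(a, d, q)
    return 2 * alpha1, alpha1 + alpha2, 2 * alpha2


a, d, q = 10.0, 5.0, 0.8
p = 1.0 - q
alpha, alpha1, alpha2 = average_effects(a, d, q)
bv = breeding_values(a, d, q)
print(round(alpha, 4), round(alpha1, 4), round(alpha2, 4))
print([round(x, 4) for x in bv])

# Frequency-weighted mean breeding value, which should be zero.
mean_bv = p * p * bv[0] + 2 * p * q * bv[1] + q * q * bv[2]
print(abs(mean_bv) < 1e-9)
```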
Dominance deviation (D)
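From the genetic model, we can calculate the dominance deviation as:
$$D = G - A$$
Since we have already derived both G and A, we can deduce D. Dominance
deviations arise from the interaction between alleles at a locus. In the absence of
dominance, G = A.
Let us write G in terms of $\alpha$:
$$G_{11} = 2q(a - pd), \quad \alpha = a + d(q - p)$$
$$a = \alpha - qd + pd$$
$$G_{11} = 2qa - 2pqd$$
$$G_{11} = 2q(\alpha - qd + pd) - 2pqd = 2q\alpha - 2q^2d + 2pqd - 2pqd$$
Therefore,
$$G_{11} = 2q(\alpha - qd)$$
Subsequently,
$$G_{12} = (q - p)\alpha + 2pqd$$
and
$$G_{22} = -2p(\alpha + pd)$$

                         A1A1                 A1A2                     A2A2
Frequency                $p^2$                $2pq$                    $q^2$
Genotypic value, G       $2q(\alpha - qd)$    $(q - p)\alpha + 2pqd$   $-2p(\alpha + pd)$
Breeding value, A        $2q\alpha$           $(q - p)\alpha$          $-2p\alpha$
Dominance, D = G - A     $-2q^2d$             $2pqd$                   $-2p^2d$

Mean dominance
$$-2p^2q^2d + 4p^2q^2d - 2p^2q^2d = 0$$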
COMPONENTS OF GENETIC VARIATION
Genetics as a subject focuses on variability at several levels. Without variability,
there is nothing to study. It is therefore important to quantify variability and
partition it into its components. A single locus with two alleles
provides us with three genotypes. We can therefore compute the genotypic
variation.
Estimation of variation: In general we study variation by estimating the variance.
Variance can be estimated as:
$$\sigma^2 = \frac{\sum x^2}{N} - \left(\frac{\sum x}{N}\right)^2 = E(x^2) - \mu^2$$
or
$$\sigma^2 = \frac{\sum (x - \mu)^2}{N}$$
or
$$\sigma^2 = \sum f(x - \mu)^2$$
However, if $\mu = 0$ then
$$\sigma^2 = \sum fx^2$$
GENOTYPIC VARIATION
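The genotypic variance, $\sigma_G^2$, can be estimated as:
$$\sigma_G^2 = \sum (fG^2) - \mu_G^2$$
Since we have already calculated G as a deviation from the population mean $\mu$,
then,
$$\sigma_G^2 = \sum (fG^2)$$
$$\sigma_G^2 = p^2G_{11}^2 + 2pqG_{12}^2 + q^2G_{22}^2$$
$$G = \begin{cases} 2q(\alpha - qd) & A_1A_1 \\ (q - p)\alpha + 2pqd & A_1A_2 \\ -2p(\alpha + pd) & A_2A_2 \end{cases}$$
Thus,
$$\sigma_G^2 = p^2[2q(\alpha - qd)]^2 + 2pq[(q - p)\alpha + 2pqd]^2 + q^2[-2p(\alpha + pd)]^2$$
$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$
Partitioning of the Genetic Variance

Earlier on we defined
$$G = A + D$$
The genetic model contains both the additive and dominance values. The variance
of G is:
$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2 + 2\sigma_{AD}$$
In a population under Hardy-Weinberg equilibrium (without inbreeding), the
covariance between the breeding value and the dominance deviation, $\sigma_{AD}$, is zero.
$$\sigma_{AD} = \sum (fAD)$$
$$= [(p^2)(2q\alpha)(-2q^2d)] + [(2pq)((q - p)\alpha)(2pqd)] + [(q^2)(-2p\alpha)(-2p^2d)]$$
$$= -4p^2q^3\alpha d + 4p^2q^2(q - p)\alpha d + 4p^3q^2\alpha d$$
$$= 4p^2q^2\alpha d(-q + q - p + p) = 0$$
Therefore, we can drop the covariance from the above model, giving
$$\sigma_G^2 = \sigma_A^2 + \sigma_D^2$$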
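Additive genetic variance, $\sigma_A^2$
We can use the same logic used in calculating the genetic variance to calculate the
additive genetic variance. Since we have already calculated A as a deviation from
the population mean $\mu$, then,
$$\sigma_A^2 = \sum (fA^2)$$
$$A = \begin{cases} 2\alpha_1 = 2q\alpha & A_1A_1 \\ \alpha_1 + \alpha_2 = (q - p)\alpha & A_1A_2 \\ 2\alpha_2 = -2p\alpha & A_2A_2 \end{cases}$$
$$\sigma_A^2 = p^2(2q\alpha)^2 + 2pq[(q - p)\alpha]^2 + q^2(-2p\alpha)^2$$
$$\sigma_A^2 = 4p^2q^2\alpha^2 + 2pq(q - p)^2\alpha^2 + 4p^2q^2\alpha^2$$
$$= 2pq\alpha^2(2pq + q^2 - 2pq + p^2 + 2pq)$$
$$= 2pq\alpha^2(p^2 + 2pq + q^2)$$
$$\sigma_A^2 = 2pq\alpha^2$$
Dominance variance, $\sigma_D^2$
We have already calculated D as a deviation from the population mean $\mu$,
therefore,
$$\sigma_D^2 = \sum (fD^2)$$
$$D = \begin{cases} -2q^2d & A_1A_1 \\ 2pqd & A_1A_2 \\ -2p^2d & A_2A_2 \end{cases}$$
$$\sigma_D^2 = p^2(-2q^2d)^2 + 2pq(2pqd)^2 + q^2(-2p^2d)^2$$
$$= 4p^2q^4d^2 + 8p^3q^3d^2 + 4p^4q^2d^2$$
$$= 4p^2q^2d^2(q^2 + 2pq + p^2)$$
$$\sigma_D^2 = (2pqd)^2$$
$$\sigma_G^2 = 2pq\alpha^2 + (2pqd)^2$$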
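The partition of the genotypic variance can be verified numerically by computing it in two ways: from the closed-form expressions, and directly from the genotype frequencies and genotypic values. This is a sketch with illustrative function names, assuming Hardy-Weinberg proportions and using a = 10, d = 5 as example values.

```python
def variance_components(a, d, q):
    """Additive, dominance and total genetic variance at one biallelic locus."""
    p = 1.0 - q
    alpha = a + d * (q - p)
    v_a = 2 * p * q * alpha ** 2
    v_d = (2 * p * q * d) ** 2
    return v_a, v_d, v_a + v_d


def genotypic_variance_direct(a, d, q):
    """Variance of genotypic values under Hardy-Weinberg proportions."""
    p = 1.0 - q
    freqs = (p * p, 2 * p * q, q * q)
    values = (a, d, -a)
    mean = sum(f * g for f, g in zip(freqs, values))
    return sum(f * (g - mean) ** 2 for f, g in zip(freqs, values))


a, d = 10.0, 5.0
results = {q: variance_components(a, d, q) for q in (0.1, 0.5, 0.8)}
for q, (v_a, v_d, v_g) in results.items():
    direct = genotypic_variance_direct(a, d, q)
    print(f"q = {q}: VA = {v_a:.2f}, VD = {v_d:.2f}, VG = {v_g:.2f}, direct = {direct:.2f}")
```

The agreement between the last two columns reflects the zero covariance between breeding values and dominance deviations under random mating.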
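Fig 10 The genotypic (VG), additive (VA) and dominance (VD) variances at
different allele frequencies
If there is no dominance (d = 0), the dominance variance $\sigma_D^2 = 0$, resulting in
$\sigma_G^2 = \sigma_A^2$. If there is complete dominance (d = a), the additive variance becomes
$$\sigma_A^2 = 8pq^3a^2$$
When $p = q = 0.5$:
$$\sigma_A^2 = \tfrac{1}{2}a^2; \quad \sigma_D^2 = \tfrac{1}{4}d^2; \quad \sigma_G^2 = \tfrac{1}{2}a^2 + \tfrac{1}{4}d^2$$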
Genetic parameter estimations under different allele frequencies

                                        q = 0.1                    q = 0.5                    q = 0.8
                                  A1A1    A1A2    A2A2       A1A1    A1A2    A2A2       A1A1    A1A2    A2A2
Egg weight                          50      45      30         50      45      30         50      45      30
Genotypic value, G                  10       5     -10         10       5     -10         10       5     -10
Genotypic frequency, f            0.81    0.18    0.01       0.25    0.50    0.25       0.04    0.32    0.64
Population mean = a(p - q) + 2pqd          8.9                        2.5                       -4.4
$\alpha$ = a + d(q - p)                      6                         10                         13
Additive effect
  $\alpha_1$ = q$\alpha$                   0.6                          5                       10.4
  $\alpha_2$ = -p$\alpha$                 -5.4                         -5                       -2.6
Breeding value, A                  1.2    -4.8   -10.8         10       0     -10       20.8     7.8    -5.2
Mean breeding value (f x A)      0.972  -0.864  -0.108        2.5       0    -2.5      0.832   2.496  -3.328
Dominance deviation, D            -0.1     0.9    -8.1       -2.5     2.5    -2.5       -6.4     1.6    -0.4
Mean dominance deviation (f x D) -0.081  0.162  -0.081     -0.625    1.25  -0.625     -0.256   0.512  -0.256
Additive variance                         6.48                         50                      54.08
Dominance variance                        0.81                       6.25                       2.56
Genetic variance                          7.29                      56.25                      56.64
MOLECULAR GENETICS APPLIED TO ANIMAL BREEDING
GENOME ORGANIZATION
What is a genome?
A genome is an organism's complete set of DNA, including all of its genes. Each
genome contains all of the information needed to build and maintain that organism.
The genome is made up of the DNA in chromosomes as well as the DNA in
mitochondria.
The genome contains the instructions, or blueprint, for all activity in an organism. The
instructions are written in the four-letter language of DNA, i.e. Adenine, Cytosine,
Thymine and Guanine (shortened to A, C, T, and G). Almost every cell in a
eukaryotic organism contains a complete copy of these instructions. The genetic
instructions are stored in pairs of chromosomes. Each chromosome contains genes,
which contain the direct instructions for a cell to make a protein. The genome
contains coding sequences (genes) and non-coding sequences of DNA.
The genome contains:
1. STRUCTURAL GENES: DNA segments that code for specific
RNAs or proteins. They encode mRNAs, tRNAs, snRNAs, scRNAs, etc.
2. FUNCTIONAL SEQUENCES: Regulatory sequences that occur as regulatory
elements (initiation sites, promoter regions, terminator regions, etc.)
3. NON-FUNCTIONAL SEQUENCES: Introns, repetitive sequences, and all
the unknowns
DNA: Double-stranded helical structure

NUCLEOSOME: DNA is complexed with histones. Each nucleosome consists of
eight histone proteins around which the DNA wraps 1.65
times.
CHROMATOSOME: A nucleosome plus the H1 histone. Nucleosomes fold up to
produce a 30 nm fiber that forms loops averaging 300 nm in
length, which are compressed and folded to produce a 250 nm
wide fiber. The tight coiling of the 250 nm fiber produces the
chromatid of a chromosome.
We can all agree with these noble, hard-working
scientists that the genome is very complex and that we may
never grasp all of its complexity. Our knowledge about
the genome keeps improving. There are so many
unanswered questions.
We know that about 5-10% of the genome encodes genes. What is the function of
the other 90%? So far there are no good answers. In the 1990s, the non-coding
regions were referred to as junk DNA, but the term has fallen out of favor as
our knowledge of the genome keeps improving: some of the so-called
junk DNA contains elements that control gene transcription. Non-coding RNA,
e.g. microRNA, can affect gene expression depending on its location. A fairly
balanced article on junk DNA in the post-ENCODE era and the controversy that ensued
can be found in PLoS Genetics:
http://journals.plos.org/plosgenetics/article?id=10.1371/journal.pgen.1004351
THE DOUBLE HELIX

Deoxyribonucleic acid (DNA) has a double-stranded helical structure and it encodes
the genetic instructions used in the development and function of all known living
organisms and many viruses. The two strands of DNA run in opposite directions to
each other. Each strand has a sugar-phosphate backbone, and attached to each sugar
is one of four nucleobases. It is the sequence of these four nucleobases along the
backbone that encodes the genetic code, or biological information. The four
nucleobases are two purines (Adenine and Guanine) and two
pyrimidines (Cytosine and Thymine). In the double helix structure, adenine bonds
with thymine (A-T) and guanine bonds with cytosine (G-C). Under the genetic
code, RNA strands are translated to specify the sequence of amino acids within
proteins. The RNA strands are initially created using DNA strands as a template in
a process called transcription.
Ribonucleic acid (RNA), unlike DNA, is a single strand that folds onto itself rather
than a paired double strand. In RNA, the pyrimidine thymine is replaced by uracil.
One of the universal functions of RNA is protein synthesis, where messenger RNA
(mRNA) molecules direct the assembly of proteins on ribosomes. This process
uses transfer RNA (tRNA) molecules to deliver amino acids to the ribosome,
where ribosomal RNA (rRNA) links amino acids together to form proteins.
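The base-pairing rules and the thymine-to-uracil replacement described above can be illustrated with a few lines of code. This is only a toy sketch: the function names are hypothetical, sequences are written 5' to 3', and transcription is represented simply as copying the coding (sense) strand with U in place of T, which ignores all of the biological machinery involved.

```python
# Watson-Crick base pairs: A with T, G with C.
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def reverse_complement(dna):
    """Opposite strand of a DNA sequence, also read 5' to 3'."""
    return "".join(COMPLEMENT[base] for base in reversed(dna))


def transcribe(coding_strand):
    """mRNA with the same sequence as the coding strand, with U replacing T."""
    return coding_strand.replace("T", "U")


coding = "AGGT"
template = reverse_complement(coding)
mrna = transcribe(coding)
print(template, mrna)
```

Applying `reverse_complement` twice returns the original sequence, which reflects the fact that the two strands run antiparallel to each other.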
GENE
A gene was defined at least four decades before the DNA structure was discovered.
To a population geneticist, a gene is the basic unit of heredity, which comes in
pairs, with one member of each pair transmitted from parent to progeny. A more refined
definition of a gene is a sequence (instruction manual) on a chromosome that encodes a
protein or a polypeptide.
A gene consists of a 5' untranslated region (5' UTR), or leader sequence, that ends at
the position of the first codon used in translation. The 3' UTR, or trailer sequence, is
the portion of an mRNA from the position of the last codon used in translation to the
3' end of the mRNA. The frame of a gene consists of exons and introns.
An exon is any nucleotide sequence encoded by a gene that remains within the final
mature RNA product of that gene.
An intron is a noncoding part of a gene that is spliced out before the RNA is
translated into a protein.
MOLECULAR MARKERS
What is the composition of the intergenic noncoding part of the genome?
Genome Studies
1. Improve annotation of the genome
2. Function and regulation of coding genes
3. Posttranslational regulation of genes
4. Extract potential functions from non-coding and intergenic DNA
For Animal and Poultry Breeding

1. Map quantitative trait loci
2. Identify genes associated with traits of economic importance
3. Estimation of genomic breeding values
4. Genetic diversity
5. Gene flow
6. Population studies
7. Epidemiological studies
8. Domestication
9. Toxicity and many others
To date, a large proportion of genome studies have been possible because of
genetic markers.
GENETIC MARKER: A DNA sequence that can be detected and whose inheritance
can be monitored. The three properties that define a genetic marker are locus
specificity, polymorphism and ease of genotyping. A marker is said to be
polymorphic when it exists in more than one form.
Types of genetic markers

1. Restriction fragment length polymorphism (RFLP)
2. Variable number of tandem repeats (VNTR)
a. Minisatellites
b. DNA fingerprinting
c. Microsatellites
3. Sequence-tagged sites (STS) and expressed sequence tags (EST)
4. Random amplified polymorphic DNA (RAPD)
5. Amplified fragment length polymorphism (AFLP)
6. Single-strand conformation polymorphism (SSCP)
7. PCR amplification of specific alleles (PASA)
8. Copy number variation (CNV)
9. Single nucleotide polymorphism (SNP)
a. Anonymous SNP (no known effect on gene function; used
extensively in gene mapping, linkage disequilibrium and diversity
studies)
b. cSNP (located within a protein-coding sequence; may interfere with
gene function by altering the amino acid sequence)
c. Candidate SNP (a SNP thought to have a putative functional effect)
d. rSNP (a SNP in the regulatory region of a gene; the regulatory region
affects gene expression, e.g. a mutation in the 5' UTR of the endoglin
gene affects translational initiation and alters the reading frame in
hereditary hemorrhagic telangiectasia, a vascular disorder)
e. pSNP (when a phenotype is changed as a result of altered protein
function, a cSNP or rSNP may be labelled a pSNP)
f. Synonymous SNP (a base-pair change occurs in a cSNP, but the
cSNP still codes for the same amino acid)
There are several laboratory methods used to detect the aforementioned genetic
markers. Those methods are not the subject of this course. The most
commonly used markers in farm animal studies are microsatellites and SNPs.
SELECTION THEORY
Selection response (R) is the gain made by mating selected parents. Response
to selection can be evaluated in the short or the long term.
The success of selection decisions depends on a number of factors:
1. How heritable is the trait under selection (i.e. the trait in the breeding goal)?
2. How much genetic variation for that trait is there in the population?
3. What is the average accuracy of the EBV, and thus the accuracy of selection?
4. What proportion of the animals will be selected for breeding?
5. In case genetic gain is to be expressed per year, rather than per generation: how long is a
generation?
To optimize the success of a breeding program it is important to
balance the relatively short-term goal of achieving high genetic
gain against the long-term maintenance of the population, i.e.
controlling the rate of inbreeding.
SHORT-TERM RESPONSE: Selection response over a few generations can be predicted
satisfactorily from the additive genetic variance (heritability) of the base population
(generation 0) using the breeder's equation (Lush, 1937)
$$R = h^2S$$
LONG-TERM RESPONSE: As selection proceeds, allele frequencies change and the base
population genetic parameters fail to predict long-term response.
CHANGES IN THE MEAN:
The within-generation mean: This reflects the difference between the entire population and
the selected population. Selection can cause changes in the distribution of phenotypes. The
within-generation change is what is referred to as the Selection Differential (S).
The within-generation change in the mean due to selection is:
$$S = \mu_s - \mu_0$$
where $\mu_0$ is the population mean (Generation 0) before selection and $\mu_s$ is the mean of the
selected parents that produce the progeny population (Generation 1).
The between-generation mean: This is the response to selection, R which measures the changes
in mean between the population before and after selection.
= 1 0
Where 1 is the population mean (Generation 1) before selection.
Weighted selection differential: The joint effects of natural and artificial selection affect
selection response. Natural selection is always on the side of fitness and can be in the same
direction or oppose artificial selection.
Important assumption in evaluating predictions of genetic gain: environmental influences remain
constant across generations.
Let's examine the unweighted and weighted selection differentials and ascertain how they are
influenced by natural selection.
Data from a long term selection program:
1. Calculate the unweighted selection differential.
2. Calculate the weighted selection differential.
3. What is the direction of natural selection?
Mating            Male (ram)   Female (ewe)   # of offspring measured
Population mean   24 kg        22 kg
1                 22           20             2
2                 35           29             1
3                 23           22             1
4                 20           24             2
5                 24           20             2
6                 30           27             2
7                 30           30             0
8                 37           22             0
9                 22           20             6
10                19           20             10
Total                                         N=26
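The three questions can be worked through numerically. The following is a minimal sketch, not the author's own solution: it assumes the parent values are in kg, that the weighted differential weights each parent by its number of measured offspring, and that the overall differential is the simple average of the ram and ewe differentials. The function and variable names are illustrative.

```python
# Parent weights (kg) and offspring counts, copied from the mating table.
rams = [22, 35, 23, 20, 24, 30, 30, 37, 22, 19]
ewes = [20, 29, 22, 24, 20, 27, 30, 22, 20, 20]
offspring = [2, 1, 1, 2, 2, 2, 0, 0, 6, 10]
RAM_MEAN, EWE_MEAN = 24.0, 22.0  # population means before selection


def unweighted_s(values, population_mean):
    """Mean of the selected parents minus the population mean."""
    return sum(values) / len(values) - population_mean


def weighted_s(values, weights, population_mean):
    """Same, but each parent is weighted by its number of measured offspring."""
    total = sum(weights)
    return sum(v * w for v, w in zip(values, weights)) / total - population_mean


# Assumption: the overall differential is the average of the two sexes.
s_unweighted = (unweighted_s(rams, RAM_MEAN) + unweighted_s(ewes, EWE_MEAN)) / 2
s_weighted = (weighted_s(rams, offspring, RAM_MEAN)
              + weighted_s(ewes, offspring, EWE_MEAN)) / 2

print(f"Unweighted S = {s_unweighted:.2f} kg")
print(f"Weighted S   = {s_weighted:.2f} kg")
```

Under these assumptions the unweighted differential is positive while the weighted one is negative: the heaviest parents left few or no offspring, which suggests that natural selection opposes the artificial selection for weight in this data set.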
Prediction of response to selection from the proportion selected: Selection intensity (i)
The selection differential is of limited use when comparing the strength of selection on different
traits or in different populations. When planning a selection program, it would be rather useful to
predict genetic change from a given selection strategy prior to even selecting the parental
population to breed. This is possible when truncation selection (selection of individuals above or
below a certain truncation point or threshold) is practiced. The selection differential can be
derived from the distribution of predicted breeding values or phenotypic values and knowledge
of the proportion of selected individuals. The standardized selection differential, usually called
the selection intensity (i), is the selection differential expressed as a fraction of the phenotypic
standard deviation. The selection intensity is a more useful measure for predicting selection
response or comparing different selection strategies or response in different populations.
$$i = \frac{S}{\sigma_P}$$
where $\sigma_P$ is the phenotypic standard deviation of the trait. This implies $S = i\,\sigma_P$.
The breeder's equation can therefore be written as:
$$R = h^2\, i\, \sigma_P$$
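The selection intensity can be obtained directly from the proportion selected. The sketch below assumes that the selection criterion is normally distributed and that the population is large, in which case i is the height of the standard normal density at the truncation point divided by the proportion selected. The function name `selection_intensity` is illustrative, and tabulated values for small populations will differ slightly.

```python
from statistics import NormalDist


def selection_intensity(p):
    """Selection intensity i for truncation selection of the top fraction p.

    Assumes a normally distributed selection criterion and a large population.
    """
    if not 0.0 < p < 1.0:
        raise ValueError("p must lie strictly between 0 and 1")
    standard_normal = NormalDist()
    truncation_point = standard_normal.inv_cdf(1.0 - p)  # in SD units
    # Mean of the selected group, in SD units above the population mean.
    return standard_normal.pdf(truncation_point) / p


for proportion in (0.50, 0.20, 0.10):
    print(f"p = {proportion:.2f}  i = {selection_intensity(proportion):.3f}")
```

Once i is known, the selection differential follows as S = i x (phenotypic standard deviation), before any parents have actually been chosen.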
The breeder's equation theoretically holds for a single generation of selection from an unselected
base population. The reliability of using the breeder's equation to predict response to selection
beyond one generation depends on:
1. The accuracy of the heritability estimate
2. Absence of environmental changes between generations
3. Insignificant change in the heritability estimate from that of the base population
From population genetics, we learned that heritability depends on allele frequency. Selection
changes allele frequency. Therefore, it should be expected that heritability will change with
selection. Thus, in the strictest sense, the breeder's equation is valid only for one generation.
However, heritability is not expected to change significantly in the first few generations of
selection and, in practice, the breeder's equation has been used to predict short term response (up
to 3-5 generations of selection).
Accuracy
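The breeder's equation can be extended beyond choosing an individual solely on the basis of its
phenotype. Since $h^2 = \sigma_A^2 / \sigma_P^2$:
$$R = h^2 S = \frac{\sigma_A^2}{\sigma_P^2}\,(i\,\sigma_P) = i\,\frac{\sigma_A}{\sigma_P}\,\sigma_A$$
We can rewrite the response to selection equation as:
$$R = i\,h\,\sigma_A$$
where h is the correlation between the phenotypic and breeding values, $h = r_{AP}$, which quantifies
the ability to predict the breeding value of an individual from the individual's phenotype. This is
in essence the accuracy of the selection scheme used to select parents. We can therefore express
the breeder's equation in terms of accuracy of selection as:
$$R = i\,r\,\sigma_A$$
where r is the accuracy of selection and $\sigma_A$ is the additive genetic standard deviation.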
1. Single measurement on an animal
The EBV of an animal can be estimated by regressing the animal's BV on its phenotype. With a
single measurement on an animal, the regression coefficient, b, equals the heritability $h^2$:
$$b = \frac{\mathrm{cov}(A, P)}{\sigma_P^2} = \frac{\sigma_A^2}{\sigma_P^2} = h^2$$
The EBV of an animal is $EBV = g\,h^2 (P - \mu)$ and its accuracy is $Acc = g\sqrt{h^2}$,
where P is the phenotypic value of the trait, $\mu$ is the population mean, and g is the relationship
between the individual(s) being measured and the individual for which we are estimating BV.
The value of g is 1.0 for an individual's own performance. It is 0.5 for full sibs, progeny or
parents and 0.25 for half sibs or grandparents.
Example 1:
Daily feed consumption (FC) of two individuals A and B are 128 g and 135 g respectively. The
mean FC is 120 g, with heritability of 0.20. Predict the EBV and accuracy of A and B for FC.
A:
EBV = $g\,h^2 (P - \mu)$ = 1 x 0.20 x (128 - 120) = 1.6 g
Acc = $g\sqrt{h^2}$ = 1 x $\sqrt{0.20}$ = 0.45
B:
EBV = $g\,h^2 (P - \mu)$ = 1 x 0.20 x (135 - 120) = 3.0 g
Acc = $g\sqrt{h^2}$ = 1 x $\sqrt{0.20}$ = 0.45
Individual B has a higher EBV for FC than A, but both estimates have the same accuracy.
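The calculation in Example 1 can be sketched in a few lines. This is only an illustration of the single-record formulas EBV = g x h2 x (P - mean) and Acc = g x sqrt(h2); the function names are illustrative, not part of any library.

```python
import math


def ebv_single_record(phenotype, population_mean, h2, g=1.0):
    """EBV from one record: g * h2 * (P - mean)."""
    return g * h2 * (phenotype - population_mean)


def accuracy_single_record(h2, g=1.0):
    """Accuracy of an EBV based on one record: g * sqrt(h2)."""
    return g * math.sqrt(h2)


H2 = 0.20
MEAN_FC = 120.0  # g per day

for name, fc in (("A", 128.0), ("B", 135.0)):
    ebv = ebv_single_record(fc, MEAN_FC, H2)
    acc = accuracy_single_record(H2)
    print(f"{name}: EBV = {ebv:.1f} g, accuracy = {acc:.2f}")
```

The accuracy depends only on the heritability and the relationship g, not on the animal's own record, which is why A and B share the same value.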
2. Repeated measurement on an animal
Some traits can be measured several times during an animal's lifetime. For example feed
consumption, body weight, egg production. If a trait is measured several times during an animal's
life, each value should be used in an estimate of breeding value. The relationship between
repeated records, termed repeatability becomes important. Repeatability (re) is a measure of
the reliability or strength of the relationship between repeated measurements on an individual.
When using repeated measurements on an individual, g is still 1.0 since the animal being
measured and the animal the BV is obtained for are still the same. The value of b is now a
function of the number of records (n), heritability ($h^2$) and repeatability ($r_e$).
With repeated measurements on an animal:
$$b = \frac{n h^2}{1 + (n-1) r_e} \quad \text{and} \quad Acc = g\sqrt{\frac{n h^2}{1 + (n-1) r_e}}$$

Example 2:
Assume that the daily feed intake of individual A (128 g) is an average of 5 measurements, with
a repeatability of 0.40. Predict the EBV and accuracy of A.
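$$EBV = \frac{n h^2}{1 + (n-1) r_e}(\bar{P} - \mu) = \frac{5 \times 0.20}{1 + (5-1)0.40}(128 - 120) = 3.08$$
$$Acc = g\sqrt{\frac{n h^2}{1 + (n-1) r_e}} = 1.0 \times \sqrt{\frac{5 \times 0.20}{1 + (5-1)0.40}} = 0.62$$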

Repeated measurements on A improve its EBV and accuracy for feed intake.
Accuracy of estimated breeding values for different heritability, repeatability and number of
measurements on an animal.
                                Number of measurements
Heritability   Repeatability    1       5       10
0.10           0.25             0.32    0.50    0.55
               0.50             0.32    0.41    0.43
               0.75             0.32    0.35    0.36
0.25           0.25             0.50    0.79    0.88
               0.50             0.50    0.65    0.67
               0.75             0.50    0.56    0.57
0.50           0.50             0.71    0.91    0.95
               0.75             0.71    0.79    0.80
Traits with low heritability benefit from multiple measurements since each additional record
contributes toward the total information available, especially when the repeatability is low. If the
repeatability is high, multiple measurements do not add much to the accuracy of EBV.
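The accuracy table and Example 2 can be reproduced from the repeated-records formula. The sketch below assumes b = n x h2 / (1 + (n - 1) x re) and Acc = g x sqrt(b) with g = 1; the function names are illustrative.

```python
import math


def regression_repeated(n, h2, re):
    """Regression coefficient b for the mean of n repeated records."""
    return n * h2 / (1 + (n - 1) * re)


def accuracy_repeated(n, h2, re, g=1.0):
    """Accuracy of an EBV based on the mean of n repeated records."""
    return g * math.sqrt(regression_repeated(n, h2, re))


# Example 2: mean of 5 records (128 g), population mean 120 g, h2 = 0.20, re = 0.40.
example_ebv = regression_repeated(5, 0.20, 0.40) * (128 - 120)
example_acc = accuracy_repeated(5, 0.20, 0.40)
print(f"Example 2: EBV = {example_ebv:.2f} g, accuracy = {example_acc:.2f}")

# The (heritability, repeatability) combinations used in the table.
combinations = [(0.10, 0.25), (0.10, 0.50), (0.10, 0.75),
                (0.25, 0.25), (0.25, 0.50), (0.25, 0.75),
                (0.50, 0.50), (0.50, 0.75)]
for h2, re in combinations:
    row = "  ".join(f"{accuracy_repeated(n, h2, re):.2f}" for n in (1, 5, 10))
    print(f"h2 = {h2:.2f}  re = {re:.2f}  n = 1, 5, 10:  {row}")
```

Printing the rows side by side makes the pattern in the table easy to see: the gain from extra records shrinks as repeatability rises.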
3. Information from Relatives
In a closed population, there are bound to be full sibs (FS) (both parents in common) and half
sibs (HS) (one parent in common) that provide additional information in estimating BV.
Siblings have a proportion of their alleles (genes) in common. Full sibs have half of their alleles
in common, and half sibs have a quarter of their alleles in common. In pigs, cattle, sheep and goats,
siblings are initially reared together, and the common environment among siblings also creates
additional similarity (maternal environment, temperature, food supply). However, in commercial
poultry, similarity due to common environment is non-existent. In non-commercial poultry where
the hen incubates her own eggs and broods her chicks, similarity of siblings due to common
environment is in play when estimating BV. The similarity among siblings, t, depends on the
siblings involved.
$$t_{FS} = \tfrac{1}{2} h^2 + c^2 \qquad t_{HS} = \tfrac{1}{4} h^2 + c^2$$
where $c^2$ is the environmental correlation among sibs. The EBV and its accuracy are given as:
$$EBV = \frac{n\,g\,h^2}{1 + (n-1)t}(\bar{P} - \mu) \quad \text{and} \quad Acc = g\sqrt{\frac{n h^2}{1 + (n-1)t}}$$
where n is the number of siblings, $\bar{P}$ is the mean of the sib records, t is the correlation among
sibs, and g is the genetic relationship among sibs. For full sibs, g = 1/2, and for half sibs, g = 1/4.
Example 3:
Individual A has 5 half sibs with a mean FC of 128 g. Predict the EBV and accuracy of A when the
environmental correlation c2 is (a) 0, and (b) 0.125. The population mean for FC is 120 g and h2
is 0.20. Assume (c) that the 5 records were obtained from full sibs, and c2 is 0.125.
(a) tHS = 1/4 x 0.20 + 0 = 0.05, and g = 0.25
$$EBV = \frac{n\,g\,h^2}{1 + (n-1)t}(\bar{P} - \mu) = \frac{5 \times 0.25 \times 0.20}{1 + (5-1)0.05}(128 - 120) = 1.67$$
$$Acc = g\sqrt{\frac{n h^2}{1 + (n-1)t}} = 0.25\sqrt{\frac{5 \times 0.20}{1 + (5-1)0.05}} = 0.23$$
(b) tHS = 1/4 x 0.20 + 0.125 = 0.175, and g = 0.25
$$EBV = \frac{n\,g\,h^2}{1 + (n-1)t}(\bar{P} - \mu) = \frac{5 \times 0.25 \times 0.20}{1 + (5-1)0.175}(128 - 120) = 1.18$$
$$Acc = g\sqrt{\frac{n h^2}{1 + (n-1)t}} = 0.25\sqrt{\frac{5 \times 0.20}{1 + (5-1)0.175}} = 0.19$$
When there is no measurement on the animal, EBV predicted from relatives is low. The higher
the value of t the lower the EBV.
(c) tFS = 1/2 x 0.20 + 0.125 = 0.225, and g = 0.50
$$EBV = \frac{n\,g\,h^2}{1 + (n-1)t}(\bar{P} - \mu) = \frac{5 \times 0.50 \times 0.20}{1 + (5-1)0.225}(128 - 120) = 2.11$$
$$Acc = g\sqrt{\frac{n h^2}{1 + (n-1)t}} = 0.50\sqrt{\frac{5 \times 0.20}{1 + (5-1)0.225}} = 0.36$$
Sib information never results in really high accuracy. Full sib information is limited by
environmental correlations among the sibs. It should not replace the individual's own record if it
can be obtained. Rather, it should be used to supplement the information on the individual if sib
information happens to be available.
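The three cases of Example 3 can be evaluated with one pair of functions. This is a sketch under the formulas stated for sib information, t = g x h2 + c2, EBV = n x g x h2 / (1 + (n - 1) x t) x (sib mean - population mean) and Acc = g x sqrt(n x h2 / (1 + (n - 1) x t)); the function names are illustrative.

```python
import math


def sib_correlation(g, h2, c2):
    """Correlation among sibs: t = g * h2 + c2."""
    return g * h2 + c2


def ebv_from_sibs(sib_mean, population_mean, n, g, h2, c2):
    """EBV of an animal predicted from the mean of n sib records."""
    t = sib_correlation(g, h2, c2)
    b = n * g * h2 / (1 + (n - 1) * t)
    return b * (sib_mean - population_mean)


def accuracy_from_sibs(n, g, h2, c2):
    """Accuracy of an EBV predicted from the mean of n sib records."""
    t = sib_correlation(g, h2, c2)
    return g * math.sqrt(n * h2 / (1 + (n - 1) * t))


cases = {
    "(a) half sibs, c2 = 0":     dict(g=0.25, c2=0.0),
    "(b) half sibs, c2 = 0.125": dict(g=0.25, c2=0.125),
    "(c) full sibs, c2 = 0.125": dict(g=0.50, c2=0.125),
}
for label, params in cases.items():
    ebv = ebv_from_sibs(128.0, 120.0, n=5, h2=0.20, **params)
    acc = accuracy_from_sibs(n=5, h2=0.20, **params)
    print(f"{label}: EBV = {ebv:.2f} g, accuracy = {acc:.2f}")
```

Because t sits in the denominator of both expressions, a larger common-environment correlation c2 lowers both the EBV and its accuracy for a given number of sibs.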
Progeny testing
Using the mean of a parent's progeny to predict the parent's breeding value is an alternative
predictor of an individual's breeding value. The correlation between the mean of n progeny and
the breeding value of the parent is
$$r = \sqrt{\frac{n}{n + a}}, \qquad a = \frac{4 - h^2}{h^2}$$
$$r = \sqrt{\frac{n h^2}{4 + h^2 (n - 1)}}$$
Example:
A breeder selects the top 20% of sheep based on the performance of 10 offspring. The heritability of
udder size is 0.10, with a phenotypic variance of 50. Predict the response to selection that the
breeder will achieve with this strategy. A selected proportion of 20% results in a selection
intensity of 1.4.
$$r = \sqrt{\frac{10 \times 0.10}{4 + 0.10(10 - 1)}}$$
The breeder is disappointed and wants more genetic gain. Predict how much improvement can be
achieved by selecting the top 10% instead of the top 20% for breeding. What
changed?
The breeder is still not completely satisfied because he wants a higher genetic gain, and decides to
base the selection on the performance of 15 instead of 10 offspring. Predict the selection response
for this new situation. What changed?
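The three scenarios can be compared numerically. The sketch below is one possible way to set up the exercise, not an official answer key: it assumes R = i x r x (additive genetic standard deviation), with the additive standard deviation taken as sqrt(h2 x phenotypic variance), the progeny-test accuracy formula given above, and a selection intensity for the top 10% derived from a normal distribution. Function names are illustrative.

```python
import math
from statistics import NormalDist


def progeny_test_accuracy(n, h2):
    """Correlation between the mean of n progeny and the parent's breeding value."""
    return math.sqrt(n * h2 / (4 + h2 * (n - 1)))


def selection_intensity(p):
    """Intensity for truncation selection of the top fraction p (normality assumed)."""
    standard_normal = NormalDist()
    return standard_normal.pdf(standard_normal.inv_cdf(1.0 - p)) / p


def response(i, n, h2, phenotypic_variance):
    """Response per generation: R = i * r * sigma_A."""
    sigma_a = math.sqrt(h2 * phenotypic_variance)
    return i * progeny_test_accuracy(n, h2) * sigma_a


H2, VAR_P = 0.10, 50.0
i_top20 = 1.4                        # value given in the example
i_top10 = selection_intensity(0.10)  # assumption: normal distribution

base = response(i_top20, 10, H2, VAR_P)
higher_intensity = response(i_top10, 10, H2, VAR_P)
more_progeny = response(i_top10, 15, H2, VAR_P)

print(f"Top 20%, 10 offspring: R = {base:.2f}")
print(f"Top 10%, 10 offspring: R = {higher_intensity:.2f}")
print(f"Top 10%, 15 offspring: R = {more_progeny:.2f}")
```

Moving from the top 20% to the top 10% changes only the selection intensity, whereas moving from 10 to 15 offspring changes only the accuracy. The third scenario here keeps the top 10%, which is one reading of "this new situation".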
From Response per generation to Response per year
The breeder's equation thus far calculates response to selection per generation. However, to
calculate the selection response per year, the generation interval is required.
In quantitative genetics, generation intervals are generally defined as the
average age of parents at birth of their offspring. In this definition, generation
interval is based on the contributions of parental age classes to newborn
offspring; i.e., the average age of parents is calculated as the sum of ages at
birth of offspring weighted by the contribution of each age class to newborn
offspring. This approach is adopted in the well-known gene flow procedure
(Hill 1974).
The breeder's equation can be calculated as:
$$R_{year} = \frac{i\,r\,\sigma_A}{L}$$
where i is the selection intensity, r the accuracy of selection, $\sigma_A$ the additive genetic standard
deviation and L the generation interval in years.
The generation interval L can be calculated separately for males and females and averaged.
Equal numbers of 2 and 3 year old bulls selected as parents: $L_m = 2.5$
Equal numbers of 2, 3 and 4 year old cows selected as parents: $L_f = 3.0$
$L = (L_m + L_f)/2 = 2.75$
Age structure of animals selected for breeding
Age       2      3      4      5      TOTAL
Male      10     7      3             20
Female    200    175    100    25     500
$$L_m = \frac{(10 \times 2) + (7 \times 3) + (3 \times 4)}{10 + 7 + 3} = 2.65$$
$$L_f = \frac{(200 \times 2) + (175 \times 3) + (100 \times 4) + (25 \times 5)}{200 + 175 + 100 + 25} = 2.90$$
$$L = \frac{2.65 + 2.90}{2} = 2.775$$
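The weighted-average calculation can be sketched as follows, using the age structure table. It assumes, as in the worked figures, that the ages listed are the parental ages to be averaged; the function name `generation_interval` is illustrative.

```python
def generation_interval(animals_by_age):
    """Average parental age, weighted by the number of parents in each age class."""
    total = sum(animals_by_age.values())
    return sum(age * count for age, count in animals_by_age.items()) / total


# Age (years) -> number of animals selected for breeding.
males = {2: 10, 3: 7, 4: 3}
females = {2: 200, 3: 175, 4: 100, 5: 25}

l_males = generation_interval(males)
l_females = generation_interval(females)
l_average = (l_males + l_females) / 2

print(f"L males   = {l_males:.2f} years")
print(f"L females = {l_females:.2f} years")
print(f"L average = {l_average:.3f} years")
```

Note that the two sexes are averaged with equal weight even though there are far more females, because each sex contributes half of the genes to the next generation.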
High selection intensity means high generation interval, and low selection intensity means low
generation interval. This does not fit well with maximizing i/L.
i/L should be OPTIMIZED
Optimizing genetic gain will require a balance between increasing the accuracy and increasing
the generation interval.
Selection Path
The selection strategies for males and females differ. The major differences between the
sexes are:
1. In mammals, females have a limited reproductive capacity. We assume that
population size is the same across generations, so selected animals must be capable of
producing sufficient progeny to maintain population size. Males can generally produce
more progeny than females and, as a result, selection intensity is
higher in males than in females. We should also be mindful of the direction of natural
selection to ensure that sufficient progeny are produced.
2. The information sources for estimating breeding values in males and females may be
different. Males may be selected based on progeny performance, whereas females are
selected on their own performance, leading to differences in accuracy of selection.
3. The generation interval for the sexes may also be different. If males are selected based on
progeny testing, then on average the age at which males are used for breeding
will be different from that of females.
The aforementioned differences between males and females require different selection paths when
determining response to selection per year. The breeder's equation can be written as:
$$R_{year} = \frac{R_m + R_f}{L_m + L_f} = \frac{(i_m r_m + i_f r_f)\,\sigma_A}{L_m + L_f}$$
where the subscripts m and f denote males and females, i is the selection intensity, r the accuracy
of selection, L the generation interval and $\sigma_A$ the genetic standard deviation.
The intensity of selection, accuracy of selection and generation interval may be different in
males and females. The genetic standard deviation, however, is a population parameter and is,
therefore, the same for males and females.
A sheep breeder has a flock of 200 ewes and is selecting for weaning weight. Rams are first
selected at 2 years old and mated for 2 years. Ewes are first selected at 2 years old, and mated for
5 years. Each ram is mated to 20 ewes, with an 80% lambing rate and a 50:50 sex ratio, and there is
no significant mortality in adults. The heritability is 0.11 and the phenotypic variance is 0.25 kg2.
Calculate the response to selection per year.
Age structure of animals selected for breeding
Age       2     3     4     5     6     TOTAL
Male      5     5                       10
Female    40    40    40    40    40    200
200 ewes and an 80% lambing rate means 160 lambs in total (80 of each sex). Select 5 out of 80 males
each year. The proportion is 5/80 = 6.25%, corresponding to a selection intensity, i, of ~1.98. Select
40 out of 80 females each year. The proportion is 40/80 = 50%, corresponding to a selection
intensity, i, of 0.798. Calculate the response to selection per year.
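One way to complete the sheep example is sketched below. It is not the author's worked answer and rests on stated assumptions: both sexes are selected on their own weaning weight (mass selection), so the accuracy is sqrt(h2) for rams and ewes; the generation intervals are the average ages in the age structure table; and the response per year is (i_m x r_m + i_f x r_f) x sigma_A / (L_m + L_f).

```python
import math

H2 = 0.11
PHENOTYPIC_VARIANCE = 0.25  # kg^2
I_MALES, I_FEMALES = 1.98, 0.798  # intensities given in the example

# Assumption: mass selection in both sexes, so accuracy = sqrt(h2).
accuracy = math.sqrt(H2)
sigma_a = math.sqrt(H2 * PHENOTYPIC_VARIANCE)  # additive genetic SD (kg)


def mean_age(animals_by_age):
    """Average age weighted by the number of animals in each age class."""
    total = sum(animals_by_age.values())
    return sum(age * count for age, count in animals_by_age.items()) / total


rams = {2: 5, 3: 5}
ewes = {2: 40, 3: 40, 4: 40, 5: 40, 6: 40}
l_rams, l_ewes = mean_age(rams), mean_age(ewes)

response_per_year = ((I_MALES * accuracy + I_FEMALES * accuracy) * sigma_a
                     / (l_rams + l_ewes))

print(f"L rams = {l_rams:.1f} years, L ewes = {l_ewes:.1f} years")
print(f"Response per year = {response_per_year:.4f} kg")
```

If rams were instead selected on a progeny test, only `accuracy` for the male path (and probably the male generation interval) would change; the structure of the calculation stays the same.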
We can define four selection paths:
Sires to breed sires (SS)
This is the most stringent selection path, used to breed the fathers of the next generation of
sires. Only elite sires become sires of sires.
Sires to breed dams (SD)
Within the sires this is a less stringent selection path. These sires will be the fathers of the
breeding females (the dams).
Dams to breed sires (DS)
This is the most stringent selection path within the dams, used to breed new sires. Only the elite
dams become dams of sires.
Dams to breed dams (DD)
This is the least stringent selection path. It depends on the studbook whether there are
selection criteria for new dams.
$$R_{year} = \frac{R_{SS} + R_{SD} + R_{DS} + R_{DD}}{L_{SS} + L_{SD} + L_{DS} + L_{DD}}$$
where R is the response and L the generation interval of each selection path.
Selection response can be divided into a number of selection paths, the number depending on
the number of differences in selection intensity and the accuracy of selection.
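The four-path formula can be sketched as a small function. The numbers below are not from a real four-path program: purely to check the arithmetic, the sheep example's male values are assigned to both sire paths and its female values to both dam paths (a hypothetical assignment, with accuracy taken as sqrt(h2) under mass selection). In that special case the four-path result must equal the two-path result.

```python
import math


def response_per_year(paths, sigma_a):
    """Sum of path responses (i * r * sigma_A) divided by the sum of path intervals.

    `paths` maps a path name to a tuple (intensity, accuracy, generation_interval).
    """
    total_response = sum(i * r * sigma_a for i, r, _ in paths.values())
    total_interval = sum(interval for _, _, interval in paths.values())
    return total_response / total_interval


h2, phenotypic_variance = 0.11, 0.25
sigma_a = math.sqrt(h2 * phenotypic_variance)
accuracy = math.sqrt(h2)  # assumption: selection on own phenotype

# Hypothetical assignment: sire paths share the ram values, dam paths the ewe values.
four_paths = {
    "SS": (1.98, accuracy, 2.5),
    "SD": (1.98, accuracy, 2.5),
    "DS": (0.798, accuracy, 4.0),
    "DD": (0.798, accuracy, 4.0),
}
two_paths = {
    "males": (1.98, accuracy, 2.5),
    "females": (0.798, accuracy, 4.0),
}

four_path_result = response_per_year(four_paths, sigma_a)
two_path_result = response_per_year(two_paths, sigma_a)
print(f"Four paths: {four_path_result:.4f} kg/year")
print(f"Two paths:  {two_path_result:.4f} kg/year")
```

In a real program the four paths would carry different intensities, accuracies and generation intervals, which is exactly why they are kept separate.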
LIVESTOCK BREEDING STRATEGIES
Several panels have been assembled in the past by governments, international agencies and non-
profit organizations to map out strategies to improve livestock productivity in developing
countries. The goals have been laudable but the outcomes have been far below expected goals.
Breeding strategy in the developing world has become synonymous with turning the axle of
poultry and livestock production to mirror that of advanced countries. In the developing world
genetic improvement has come to imply upgrading a herd, usually that of a national livestock
research institute. Several crossbreeding projects were initiated all across Africa with the goal of
quickly upgrading low producing indigenous and adapted breeds with high producing exotic
breeds from Europe or North America. Management of crossbred herds did not match their
genetic potential and as a result the expected productivity was not realized. The crossbreeding
approach to genetic improvement was not done in a sustainable manner and currently only
remnants of such projects exist. It should be pointed out that in a few cases, crossbreeding on
private farms with improved nutrition and management has been successful but they are not
enough to meet the massive demand for meat and livestock products.
Genetic improvement is a long term endeavor and short term approaches are bound to yield
limited or no success at all. Funding for genetic improvement projects from most international
agencies only lasts for about 5 years. Funding from national governments could be as short as one
year. A total mismatch between a long term endeavor and very short term funding can only point in
the direction of limited success if not failure.
In recent times, scientific jargon has been embraced in several projects. Biotechnology is the
silver bullet expected to radically transform the whole agricultural sector in the developing
world. The argument here is not about the potential of biotechnology. When a high powered fuel
is put into a non-functioning engine, the vehicle will still not move. All other parts of the
vehicle should also be functioning. Genomics, high throughput science, biotechnology and
nanotechnology when applied in the proper environment can lead to tremendous increase in
productivity. However, I would argue that, before any of these advanced technologies are
adopted en masse, the well proven methodologies need to be adopted first.
In the developing world, breeding strategies need to have at least four basic components:
1. Assessment
2. Preplanning
3. Technical mechanics of genetic improvement
4. Sustainability
A. ASSESSMENT OF EXISTING SYSTEM
Assessment can be done in five broad areas to answer basic questions to determine whether
genetic improvement is even needed at all.
1. Current Production System
a. Who are the breeders?
b. Who are the animal keepers?
c. What are the management practices?
d. Can the current production system support an improvement program?
e. Is reduction in herd size or animal numbers possible?
f. What are the logistics and infrastructure?
g. What is the environmental impact?
h. Is the current production system sustainable?
2. Existing Input and Support

a. Water
b. Labor
c. Animal health care
d. Extension
e. Training support
f. Research Support
3. Cultural and Social practices

a. What is the cultural/societal value of animals?
b. What is the significance of raising and/or keeping animals?
4. Current Breeding Practices

a. How do genes flow from breeding to producing animals?
i. How do farmers obtain replacement animals?
ii. Pure or crossbred? or no form of improvement?
5. Market Analysis
a. What is the size of the overall market?
b. Can the market improve or grow?
c. Is there demand for the product?
d. What is the purchasing power of the population?
e. Are there export possibilities?
f. Can the market accommodate improvement in the production system?
There should be a fact based justification for genetic improvement. When there is a demand for a
product, there is no need to convince producers to produce more.
GENETIC IMPROVEMENT IS A LONG TERM PROGRAM
What we learned from past attempted programs
1. Short term funding (5 years) has been a colossal FAILURE.
2. An economically sustainable long term plan is required.
3. A genetic diversity (biodiversity) plan should be required for the long term.
Otherwise, do not start!
B. PREPLANNING
In the preplanning stage, both livestock keepers and consumers should be adequately involved in
the early planning of genetic improvement programs. Some questions also need to be
adequately answered at this stage.
1. Is there a demand for increased productivity?

2. Are improved animals needed by livestock keepers without exceeding their capacity to
manage the animals?
3. Will increased supply of external inputs (diet, vaccines, housing, etc.) increase
productivity rather than a new breed?
4. Will consumers accept a new breed, improved strain or crossbred?
In most cases in Africa, livestock keepers have their own breeding criteria and any genetic
improvement program should take that into account when defining the breeding objective. For
example, the Karamoja pastoralists consider coat color, body size, conformation, horn
configuration and temperament as traits suitable for marketing. In Ethiopia, there is a preferred
phenotypic characteristic of chickens. After all, the breeding objective should be based on
projected profits under future conditions of production and not merely on the potential to
change traits genetically. The definition of profit may differ from place to place. Whereas some
places use monetary value to define profit, others may simply use herd size.
It is during the preplanning stage that priorities and the sustainability plan for the entire breeding
strategy should be developed.
PRIORITIES
a. Short term
b. Medium term
c. Long term
1. Can the objectives of the priorities be achieved in the given time?
2. Is there any funding in place or in the future for any of the priority steps?
3. Are outcome benchmarks clearly defined?
4. Can the outcomes be achieved?
C. TECHNICAL MECHANICS OF GENETIC IMPROVEMENT
BREEDING OBJECTIVES
The breeding objective is defined based on projected profits under future conditions of
production, not merely on the potential to change traits genetically.
Breeding is always aimed at the future. Decisions you make now will influence the future
generation(s). The breeding goal that you have defined indicates what you think will be
important in the future. You have analyzed the market and have an idea about what customers
will demand some years from now. Will it be mainly milk or butter or cheese? Will it be mainly
pork chops or ham or bacon? Will it be mainly breast meat or legs or full carcasses? Finally, you
have an idea about the expected developments in production systems and regulations. What are
new developments related to housing systems, nutrition, etc and how are they expected to
influence the performance of your animals? Has the (inter)national government announced new
regulations that may limit your current production system? Should you anticipate these
upcoming changes?
This means that the best animals for the future conditions of production need to be developed.
How does one define the best animal? The definition of the best animal is subjective, depending on
(1) the function of the animal, (2) culture, (3) market structure, (4) production environment, (5)
legislation, (6) population structure [pyramidal or segmented] and (7) environmental limitations.
Cattle are kept for meat, milk and draft. Depending on the function of the animal within that
particular society, the best animal can be defined. A high milking cow may be suitable for
Wisconsin, but in the hills of Ethiopia, a hardy cow may be more suitable.
The best animal should function well within the production and climatic environment and be
culturally acceptable.
Broiler (meat-type) chicken processing changes in the USA (percentage processed)
                     1980    1990
Whole birds          67%     23%
Cut-ups              33%     67%
Further processed            10%
The type of bird needed for cut-ups and further processing is different from that raised for sale as
whole birds. This means breeders must anticipate future markets and develop birds that meet those
demands. Such a bird will be the best animal for the future.
The best animal may not necessarily be a high performance animal for a particular animal
product (milk, meat or fiber), but could be an average performance animal with reasonable
resistance to an endemic disease. Defining the best animal is not an easy one and requires inputs
from animal keepers, consumers, breeders and other stakeholders. Matching genotypes with
suitable environments and societal acceptability depends on the availability of wide range of
genotypes to choose from. A thorough knowledge of similar genotypes in other tropical regions,
including nutrition and local diseases is needed. The phenotypes may be acceptable but may not
necessarily cope in a new environment. The following may be considered in selecting the best
animal:
1. Genetically improving locally adaptable indigenous animals.
2. Introducing breeds/strains from similar environment(s).
3. Crossbreeding of local adaptable animals with high producing animals from similar
environment(s).
4. Crossbreeding with exotic breed(s) with a clear pathway for reliable supply of exotics.
5. Developing a synthetic breed.
India has been successful in developing several local poultry strains most of which are strains of
choice in commercial poultry production. The Australian Brangus cattle are about 3/8 Brahman
and 5/8 Angus in their genetic makeup. The cattle are usually sleek black in color, but reds are
also acceptable. Australian Brangus are also good walkers and foragers and "do well" in a wide
variety of situations. South Africa has successfully developed both cattle and poultry breeds.
Data Recording System

Any serious genetic improvement program should have the infrastructure for collecting data.
Without data collection it is almost impossible to undertake any form of tractable genetic
improvement. Large cattle herds are kept by pastoralists in Nigeria and Eastern Africa. There are
several households who own small numbers of animals. Involvement of animal keepers in a
genetic improvement program offers the opportunity to collect data on their animals. A data
repository center with high storage and computing ability is absolutely essential in developing
any improvement program. In the USA, the US Department of Agriculture is responsible for
storage and analysis of dairy cattle data. Beef cattle data are handled by the various breed
associations and some large cattle ranches. Swine and poultry data are handled by their respective
private breeding companies. A data repository agency needs to be identified in each African
country and its role clearly defined. In recent times, the prospects of biotechnology and
genomic selection have been projected as the savior of genetic improvement in the developing
world. Regardless of the potential of genomic selection, phenotypic data and pedigree
information have to be collected.
While it is possible to realize genetic gain with well-defined phenotypes without genomic
information, it is NOT possible to realize gains without well-defined phenotypes even with
genomic information (Henryon et al. 2014)
When the infrastructure for the well proven methods of genetic improvement is in place,
advanced technologies become easy to adopt. Several novel approaches can be devised for data
collection. For example, models can be developed to predict unmeasured phenotypes from the
measurement of a few easy-to-measure phenotypes.
Figure 1 The livestock breeding and improvement cycle
GENETIC IMPROVEMENT PLAN
1. ANIMAL POPULATION AND POPULATION STRUCTURE

A breeding scheme defines the breeding objectives for the production of the next generation of
animals. An animal breeding scheme is a combination of recording selected traits, the estimation of
breeding values, the selection of potential parents and a mating program for the selected parents
including appropriate (artificial) reproduction methods. The breeding scheme will also depend on
the population structure.
(a) Breeding Programs with separate breeding and production populations
Separation of breeding and production populations allows the breeder to focus on the objectives
of each population. The purpose of the breeding population is for genetic improvements in traits
of interest. The production population is the vehicle through which commercial production is
enhanced. Genetic material from the breeding population should constantly influence the
production population. Most commercial dairy farmers in developed countries and some parts of
Africa purchase semen from improved bulls to constantly upgrade their herds. A breeding
program in Africa can concentrate on developing males and then sell them to local producers to
improve their flocks in exchange for data collection. There are several advantages to doing so in
addition to data collection. This automatically includes the animal keeper in the breeding
scheme. Nobody kills the golden goose. When the farmer sees the benefits of improved animals
without the burden of keeping males, such a scheme is bound to be successful. Over time, this
strategy can become part of the sustainability plan.
When the farmer links the receipt of genetic material to profits, it becomes easy for the farmer
to pay for such genetic material. That is when the breeding strategy becomes sustainable.
Figure 2 The components of a sustainable animal breeding scheme
Components of the above structure can be adopted for sustainable genetic improvement in the
developing world for cattle and small ruminants and even pigs.
(b) Breeding programs with a pyramidal structure

This structure is often seen in species where trait recording is extensive and also very expensive.
Under this structure only a small number of individuals relative to the production population are
recorded. Genetic improvement is done in a limited number of animals and these animals
become the source of gene flow to the production population. The genetic improvement in small
elite pure lines, the multiplication in the next generation with a much larger number of animals
(parents) and the generation of the production animals in very large numbers in the final
generation, leads to a pyramidal structure of such a breeding and production program. This is a
strategy usually employed by poultry and pig operations in developed countries. Whereas some
companies house and develop only elite pure lines, others develop an integrated system from
pure lines to the commercial animal.
Figure 3 The classic pyramidal structure of livestock genetic improvement
Under the pyramid structure, concerns from consumers, lobby groups and food services at
the bottom of the pyramid bubble up into the pure lines. Over time, these concerns are
addressed in the genetic improvement programs in the pure lines. The poultry breeding
companies develop animals for different markets and have the opportunities to respond quicker
to market changes than cattle, especially since generation interval is far shorter in poultry than in
cattle.
In a pyramid structure all sources of genetic variation are exploited. Selection response is
realized in the elite pure lines. The additive genetic variance, accuracy of estimation of breeding
values and the selection intensity become important as these three factors determine genetic
gain. The grandparent and parent multiplication levels exploit heterosis via non-additive genetic
variance.
In commercial pig breeding programs and in some rare cases of poultry breeding, usually a three-
way cross is applied. The next figure illustrates a commercial three way cross. Usually, the
terminal male is a purebred selected on growth, feed efficiency and other production
characteristics. The final female is usually a hybrid taking advantage of both production and
reproduction traits.
Figure 4 Three way commercial cross breeding scheme
2. SELECTION OR IMPROVEMENT STRATEGY
This stage includes breeding value estimation, selection criteria and genetic models. After
estimating breeding values and evaluating alternative selection decisions on the genetic response
to selection, the actual practical selection and mating of animals can begin. Selection programs
can maximize genetic gain at a fixed rate of inbreeding, e.g. 1%, or at any level that will limit
the accumulation of inbreeding. It is at this stage that factors such as selection intensity and
generation interval are optimized. Several options can be pursued including:
(a) Mass selection

(b) Optimum contribution selection (OCS): maximizing long term gains by maximizing the
weighted genetic merit of selected parents while constraining the relationship between
parents
(c) Index selection
(d) Single or multi-trait selection
(e) Correlated traits
Selection allows for choosing of parents of offspring of the next generation. However, a mating
plan needs to be in place to ensure that diversity is always maintained and inbreeding does not
accrue at a faster rate.
Mating Strategy
1. Enables selection to align ancestors closer to exact threshold linear relationship.

2. Reduces the rate of inbreeding and the risk of alleles being lost through genetic drift.
3. Reduces variation in the accuracy of breeding values between selected candidates by
increasing connectivity.
4. Genomic information can enable us to develop mating designs that disperse genetic
contributions more efficiently than pedigree information.
a. Minimizing co-ancestry mating.
b. Minimizing the covariance between ancestral contributions.
c. Maximizing the probability that all ancestors contribute chromosomal segments to all
allocated matings.
EVALUATION OF IMPROVEMENT STRATEGY
The traits in the breeding objective are not necessarily the traits under selection. It is
therefore important that both the breeding objective traits and the selected traits are evaluated
each year. The following evaluation criteria can be considered:
1. Selection response in selected traits.
2. Selection response in breeding objective traits.
3. Annual rate of inbreeding and inbreeding depression.
4. Annual cost of breeding program including appreciation/depreciation of fixed costs.
The annual rate of inbreeding can be used as an indirect measure of diversity in the elite
populations.
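As a rough illustration of how the rate of inbreeding can be monitored, the sketch below computes the rate of inbreeding between two successive periods from the average inbreeding coefficients, using the standard expression (F_t - F_(t-1)) / (1 - F_(t-1)). The function names and the numbers are assumptions chosen for the example. The conversion to effective population size, Ne = 1/(2 x rate), applies to a rate expressed per generation, so an annual rate would first have to be scaled by the generation interval.

```python
def inbreeding_rate(f_prev, f_curr):
    """Rate of inbreeding between two successive periods, given the mean
    inbreeding coefficient in the earlier (f_prev) and later (f_curr) period."""
    return (f_curr - f_prev) / (1.0 - f_prev)

def effective_population_size(delta_f_per_generation):
    """Effective population size implied by a per-generation rate of inbreeding."""
    return 1.0 / (2.0 * delta_f_per_generation)

# Hypothetical example: mean inbreeding rises from 0.00 to 0.01 in one generation.
delta_f = inbreeding_rate(0.0, 0.01)
print(delta_f)                             # rate of inbreeding per generation
print(effective_population_size(delta_f))  # implied effective population size
```

A rate of 1% per generation corresponds to an effective population size of 50 under this relationship.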
It is important to compare the theoretical expected response to the realized response. The actual
weighted selection intensity could be used to evaluate the theoretical response. If there is a
discrepancy, then its causes need to be ascertained. Potential sources of
discrepancy may be:
(a) Bias in the estimation of breeding values.
(b) Inappropriate genetic model.
(c) Some environmental factors not considered or accounted for.
(d) Selection criteria not strictly adhered to.
(e) Unexpected correlated response in other traits.
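The comparison of expected and realized response can be sketched as follows. The expected annual response is taken here as selection intensity x accuracy x additive genetic standard deviation / generation interval, and the realized response is approximated by the least-squares slope of yearly means on year. All names and numbers are illustrative assumptions. Note that a slope of phenotypic means also contains any environmental trend, which is one of the sources of discrepancy listed above, so yearly mean estimated breeding values may be a better input where they are available.

```python
def expected_annual_response(intensity, accuracy, sigma_a, generation_interval):
    """Expected genetic gain per year: i * r * sigma_A / L."""
    return intensity * accuracy * sigma_a / generation_interval

def realized_annual_response(yearly_means):
    """Least-squares slope of the yearly means on year (0, 1, 2, ...)."""
    n = len(yearly_means)
    x_bar = (n - 1) / 2.0
    y_bar = sum(yearly_means) / n
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in enumerate(yearly_means))
    sxx = sum((x - x_bar) ** 2 for x in range(n))
    return sxy / sxx

# Hypothetical inputs.
expected = expected_annual_response(intensity=1.4, accuracy=0.5,
                                    sigma_a=10.0, generation_interval=2.0)
realized = realized_annual_response([100.0, 103.0, 105.0, 109.0])
print(round(expected, 2), round(realized, 2))  # -> 3.5 2.9
print(round(realized / expected, 2))           # fraction of expected gain realized
```

In this made-up example only part of the expected gain is realized, which would prompt a check of the potential sources of discrepancy.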
DISSEMINATION OF GENETIC MATERIAL TO PRODUCTION POPULATIONS
From here on, the alleles (genes) of the improved population are disseminated to the production
population, depending on the population structure. Most often, some form of crossbreeding is
used to take advantage of heterosis, or hybrid vigor. Heterosis is the difference between the
performance of crossbred animals and the average performance of their purebred parents.
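A small worked example may help. The sketch below computes heterosis as the deviation of the crossbred mean from the mid-parent (average purebred) mean, both in trait units and as a percentage of the mid-parent mean. The trait values and the function name are hypothetical.

```python
def heterosis(crossbred_mean, purebred_means):
    """Return (absolute, percent) heterosis relative to the mid-parent mean."""
    mid_parent = sum(purebred_means) / len(purebred_means)
    absolute = crossbred_mean - mid_parent
    percent = 100.0 * absolute / mid_parent
    return absolute, percent

# Hypothetical means: the two purebreds average 105 and 95 units,
# and their crossbred progeny average 110 units.
print(heterosis(110.0, [105.0, 95.0]))  # -> (10.0, 10.0)
```

The crossbreds exceed the mid-parent mean of 100 by 10 units, that is, 10% heterosis.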
ECONOMIC AND GENETIC SUSTAINABILITY OF BREEDING PROGRAM
A breeding program is the organized structure set up to realize the desired gain in the production
population. It is important for producers to also have a sense of improvement in their
populations. Producers can only judge the benefit of a breeding program when the productivity
of their animals improves and their profit margins go up. Farmers are more willing to pay for
genetic material when they can link their profit margins directly to the genetic material they
received. Economic sustainability can be achieved only when producers of improved
animals can recover their costs and make a profit from the recipients of those
animals.
Pertinent questions to ask at this point are:
1. Can breeding programs sponsored for up to five years be economically sustainable?

2. Is the breeding program also genetically sustainable?
Genetic variation is the raw material for genetic improvement. When a genetic improvement
strategy leads to genetic gain in traits, there is a loss of genetic variation. The inbreeding level
and genetic diversity in the indigenous populations being improved for production also need to
be constantly monitored to ensure that genetic variation between breeds (biodiversity) is
preserved for the future.