Annurev Animal 020518 115024

AV07CH05_Hayes ARjats.cls January 11, 2019 12:38

Annual Review of Animal Biosciences


1000 Bull Genomes Project to
Map Simple and Complex
Genetic Traits in Cattle:
Applications and Outcomes
Ben J. Hayes1,2 and Hans D. Daetwyler2,3
1 Queensland Alliance for Agriculture and Food Innovation, The University of Queensland, St Lucia, Queensland 4067, Australia; email: b.hayes@uq.edu.au
2 Agriculture Victoria Research, AgriBio, Bundoora, Victoria 3083, Australia
3 School of Applied Systems Biology, La Trobe University, Bundoora, Victoria 3083, Australia

Annu. Rev. Anim. Biosci. 2019. 7:89–102

First published as a Review in Advance on December 3, 2018

The Annual Review of Animal Biosciences is online at animal.annualreviews.org

https://doi.org/10.1146/annurev-animal-020518-115024

Copyright © 2019 by Annual Reviews. All rights reserved

Keywords
cattle, whole-genome sequences, deleterious, mutations, complex traits

Abstract
The 1000 Bull Genomes Project is a collection of whole-genome sequences
from 2,703 individuals capturing a significant proportion of the world’s cattle
diversity. So far, 84 million single-nucleotide polymorphisms (SNPs) and
2.5 million small insertions and deletions have been identified in the collection,
a very high level of genetic diversity. The project has greatly accelerated
the identification of deleterious mutations for a range of genetic diseases, as
well as for embryonic lethals. The rate of identification of causal mutations
for complex traits has been slower, reflecting the typically small effect size
of these mutations and the fact that many are likely in as-yet-unannotated
regulatory regions. Both the deleterious mutations that have been identified
and the mutations associated with complex trait variation have been included
in low-cost SNP array designs, and these arrays are being genotyped in tens
of thousands of dairy and beef cattle, enabling management of deleterious
mutations in these populations as well as genomic selection.

Downloaded from www.AnnualReviews.org


Guest (guest)
IP: 200.118.60.32
On: Tue, 25 Jun 2024 20:53:57

BACKGROUND
The global cattle population is highly diverse, ranging from breeds specialized for very high milk
production in controlled, temperate environments to numerous local breeds adapted to harsh
tropical conditions. The advent of relatively low-cost whole-genome sequencing has made it pos-
sible to contemplate linking the variation observed in these cattle phenotypes and others important
in dairy and beef cattle production to variation at the genome level.
The first bovine genome reference assembly, based on the Hereford cow Dominette, was pub-
lished in 2009 (1). At the same time, as a result of resequencing, panels of tens of thousands of
single-nucleotide polymorphisms (SNPs) were identified that were polymorphic in a range of
cattle breeds (2, 3). These SNPs, with physically mapped positions on the bovine genome assem-
bly, enabled association trait mapping and reconstructions of past population histories of various
cattle breeds (3).
By 2011, resequencing of the entire genomes of cattle that were foundation animals of breeds
such as Holstein–Friesian dairy cattle had begun. Larkin et al. (4) described the resequencing of
the genomes of Pawnee Farm Arlinda Chief (Chief) and his son Walkway Chief Mark (Mark),
each accounting for ∼7% of all current genomes of Holstein–Friesian dairy cattle. By comparing
the frequencies of the haplotypes of Chief and Mark in the current population, the authors were
able to identify genome regions in which the frequencies of the paternal and maternal haplotype
from each bull differed more than expected owing to drift and to link these regions to possible
candidate polymorphisms for traits that have been under selection.
In the same year, at the 2011 Sir Mark Oliphant Genomics Conference in Melbourne, a group
of researchers met and decided to pool efforts to resequence key ancestors of globally important
dairy and beef breeds. The ultimate aim was to enable accurate imputation of the genotypes of an-
imals genotyped with SNP arrays to whole-genome sequence to find causative mutations affecting
key economic traits in beef and dairy cattle. This was the genesis of the 1000 Bull Genomes Project.

THE 1000 BULL GENOMES PROJECT: A RESOURCE FOR CATTLE RESEARCHERS AND THE CATTLE INDUSTRY
The 1000 Bull Genomes Project proceeds in “runs.” A run takes place approximately every 6–12
months, and in a run all cattle whole-genome sequences in the collection are processed through
the 1000 Bull Genomes Project pipeline, which detects sequence variants in the form of SNPs
and small insertions and deletions (INDELs) from alignments of the sequences. Each animal with
whole-genome sequence is genotyped at all sequence variants detected. The first run of the
1000 Bull Genomes Project took place in 2012 and included 90 key Holstein–Friesian ances-
tor bulls and 43 key Fleckvieh bulls. In the first run, with bulls sequenced at tenfold coverage or
above, 17.4 million variants were detected, including 15.8 million SNPs and 1.6 million INDELs
(Figure 1). A large proportion of these SNPs were confirmed by segregation analysis, and approx-
imately half were already in dbSNP. Since that time, five more runs have been completed. The
number of cattle genomes included in the project has grown to 2,703, and the number of SNPs
detected has grown more than fourfold (Figure 1).
The number of breeds with whole-genome sequences in the project has increased from 2 in
2012 to 121 today (Supplemental Table 1), covering a reasonable proportion of the world’s cattle
diversity, with the possible exception of African breeds (Figure 1b).
The number of variants detected in Bos taurus breeds seems to be slowly plateauing, despite
the large number of additional B. taurus animals in Run 6. In contrast, the number of variants
detected in the combined B. taurus–Bos indicus runs is increasing rapidly, despite a relatively small
number of B. indicus animals in Run 5 and Run 6 (Supplemental Table 1). This confirms the
high levels of genetic diversity that have been suggested for B. indicus breeds from SNP data and
targeted resequencing of certain genome regions (3). The majority of SNPs and INDELs were
in intergenic regions, and more than 185,000 were in coding sequences and predicted to cause
amino acid substitutions (Table 1).

[Figure 1a: plot of the number of variants detected (y-axis: million variants, 0–90) against run (x-axis: Runs 1–6). Series: Bos taurus SNP, Bos taurus INDEL, Bos taurus–Bos indicus SNP, Bos taurus–Bos indicus INDEL.]

Figure 1
(a) Number of sequence variants [single-nucleotide polymorphisms (SNPs, blue) and small insertions and
deletions (INDELs, orange)] detected in successive iterations of the 1000 Bull Genomes Project. (b) Origin of
breeds or samples sequenced in the project (shaded gray). The authors would like to acknowledge Thuy
Nguyen for help with this figure.

Table 1 Annotation of single-nucleotide polymorphisms (SNPs) and small insertions and deletions (INDELs) from Run 6 of the 1000 Bull Genomes Project

Annotation                                   SNP       INDEL
intergenic_variant                    28,353,891   1,144,901
intron_variant                        11,232,495     476,122
upstream_gene_variant                  1,510,605      66,438
downstream_gene_variant                1,282,230      59,886
missense_variant                         185,046         N/A
synonymous_variant                       179,442         N/A
3_prime_UTR_variant                       99,283       4,839
frameshift_variant                           N/A       1,619
inframe_deletion                             N/A       1,325
splice_region_variant                     32,194       1,236
5_prime_UTR_variant                       23,431         849
non_coding_transcript_exon_variant        12,878         346
inframe_insertion                            N/A         253
stop_gained                                3,831          27
splice_donor_variant                       1,876         104
splice_acceptor_variant                    1,618         135
mature_miRNA_variant                         407          17
start_lost                                   292         N/A
stop_lost                                    257         N/A
coding_sequence_variant                      243          72
stop_retained_variant                        126         N/A
non_coding_transcript_variant                 82          20
Total                                 42,920,227   1,758,189

Attempts have been made to detect structural variants, including copy number variants, as
well as large insertions, deletions, and inversions, in the 1000 Bull Genomes sequence collections.
Detecting structural variation from short-read sequence data (the vast majority of sequences in the
data set are 100-bp paired-end reads from Illumina sequencing) is very challenging. Chen et al. (5)
considered structural variants as likely to be real only if there was evidence for transmission from
one generation to the next. Those authors described 3.49 and 0.67 Mb of structural variants that
were validated by sire–son transmission in Holstein and Jersey cattle, respectively. Interestingly,
structural variants were significantly depleted in a set of genes identified as core for eukaryote
function (that is, very few structural variants were discovered in these genes) (5).

BULL GENOMES DATA REVEAL THE IMPACT OF DOMESTICATION, BREED FORMATION, AND SELECTION ON GENETIC DIVERSITY OF CATTLE
Boitard et al. (6) reconstructed the past effective population size of cattle from four breeds
(Holstein, Fleckvieh, Jersey, and Angus) using whole-genome sequence data from the 1000 Bull
Genomes Project. The authors clearly demonstrated that domestication and subsequent breed
formation substantially reduced effective population size. This suggests the high levels of
polymorphism observed in modern cattle genomes (e.g., in the 1000 Bull Genomes data) largely
accumulated prior to domestication, as current effective population sizes and typical SNP
mutation rates (1 × 10⁻⁹) would not give rise to such high levels of polymorphism.
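The reasoning above can be illustrated with the standard neutral expectation that nucleotide diversity at mutation–drift equilibrium is roughly θ = 4Neμ per site. The sketch below is only a back-of-the-envelope check: the mutation rate is the value quoted in the text, whereas both effective population sizes are hypothetical round numbers chosen for illustration, not estimates from Boitard et al. (6).

```python
def expected_diversity(effective_size: float, mutation_rate: float) -> float:
    """Neutral equilibrium expectation theta = 4 * Ne * mu (per site)."""
    return 4.0 * effective_size * mutation_rate


MUTATION_RATE = 1e-9  # per site, the value quoted in the text

# Hypothetical effective population sizes, for illustration only.
scenarios = [
    ("modern breed (assumed Ne = 100)", 100),
    ("pre-domestication population (assumed Ne = 50,000)", 50_000),
]

for label, ne in scenarios:
    theta = expected_diversity(ne, MUTATION_RATE)
    # theta * 1e6 = expected pairwise differences per megabase
    print(f"{label}: about {theta * 1e6:.1f} pairwise differences per Mb")
```

Under these assumed values the small modern population would sustain far less diversity than the large ancestral one, which is the direction of the argument made in the text.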
During domestication, breed formation, and, more recently, intense selection for milk and meat
production, favorable alleles at loci that affect these traits or processes will have been strongly
selected. For instance, coat color or coat pattern is often a breed-defining feature. Boitard et al.
(7) used the 1000 Bull Genomes sequence data and two different approaches to detect significant
signals of positive selection: a within-population approach aimed at identifying selective sweeps
and a population-differentiation approach designed to capture soft or incomplete sweeps. Their
results confirmed already-described, well-known breed-defining or trait-associated loci, including
MC1R (coat color), KIT (coat color and pattern), GHR (growth, milk production), PLAG1 (stature,
age at onset of puberty), and NCAPG/LCORL (stature), and detected several new loci (e.g., ARL15,
PRLR, CYP19A1, and PPM1L). Encouragingly, in some cases, they demonstrated that use of the
sequence data allowed them to pinpoint the underlying causal mutation under selection. They
concluded that the vast majority of adaptive mutations are likely to be regulatory rather than
protein-coding variants.

ACCELERATING DISCOVERY OF DELETERIOUS MUTATIONS FROM YEARS TO DAYS
The availability of a large collection of sequenced cattle has enabled very rapid discovery of
deleterious mutations. For dominant de novo mutations, Bourneuf et al. (8) outlined the approach
and tested it with seven defects (glass-eyed albino, dominant red, neurocristopathy, osteogenesis
imperfecta type 2, and three variants of bulldog calf syndrome). In each case, they sequenced one
affected individual then compared the heterozygous SNPs to the 1000 Bull Genomes animals’
sequences. To be the causative SNP, it needed to be observed (as heterozygous) in the affected
individual but never (heterozygous) in the (healthy) 1000 Bull Genomes individuals. The putative
variant also had to be predicted to be deleterious to protein function according to Ensembl
Variant Effect Predictor annotations. In all seven cases, the authors were able to identify a single
mutation as the causative variant.
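The filtering logic described for dominant de novo mutations can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the pipeline of Bourneuf et al. (8): the `Variant` record, its field names, and the toy data are all hypothetical, and the deleteriousness flag stands in for whatever annotation-based prediction is used.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Variant:
    variant_id: str
    case_genotype: int           # alternate-allele count in the affected animal (0, 1, 2)
    control_alt_carriers: int    # healthy reference animals carrying the alternate allele
    predicted_deleterious: bool  # hypothetical flag from a variant-effect annotation


def candidate_de_novo(variants):
    """Keep variants that are heterozygous in the affected animal,
    never seen in the healthy reference animals, and predicted deleterious."""
    return [
        v for v in variants
        if v.case_genotype == 1
        and v.control_alt_carriers == 0
        and v.predicted_deleterious
    ]


# Toy data, invented for illustration.
variants = [
    Variant("var_A", 1, 0, True),    # passes all three filters
    Variant("var_B", 1, 12, True),   # segregates in healthy animals
    Variant("var_C", 1, 0, False),   # not predicted deleterious
    Variant("var_D", 2, 0, True),    # homozygous, not heterozygous, in the case
]
print([v.variant_id for v in candidate_de_novo(variants)])
```

The power of the approach comes from the second filter: the larger the healthy reference collection, the fewer private heterozygous variants survive.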
A different approach is necessary to identify embryonic lethal mutations. VanRaden et al. (9)
identified haplotypes of SNP markers that carried embryonic lethal mutations because these hap-
lotypes were never observed as homozygotes, even in very large populations. The most economi-
cally significant haplotype effect that has been identified is designated Holstein Haplotype 1. The
mutation responsible was subsequently discovered to have originated in the Holstein key ancestor
bull Chief and was found to be responsible for 525,000 spontaneous abortions worldwide. This
large number demonstrates how quickly lethal recessive mutations can increase in frequency with
the widespread use of artificial insemination (AI) in dairy cattle. The sequencing of Chief and
Mark, the first resequenced bulls, enabled the discovery of the nonsense mutation in APAF1 (10).
The mutation is now included in SNP arrays and used to screen the hundreds of thousands of cat-
tle genotyped each year for carriers. This mutation was discovered without the use of the 1000 Bull
Genomes Project sequence collection; however, discoveries of other lethal recessives have made
extensive use of the collection. The steps used to identify the lethal mutation underlying Holstein
Haplotype 3 (HH3) illustrate the approach when the 1000 Bull Genomes Project data are
available (11). The filters applied to SNPs in the HH3 region were (a) carried in the heterozygous
state by the HH3 carrier bull, (b) absent in the 63 predicted noncarrier Holstein bulls, (c) absent
in the homozygous state in the Holstein bulls with unknown status, and (d) absent in the other
breeds in the 1000 Bull Genomes Project sequence collection (which assumes that the deleterious
mutation was recent). After applying these filters, only one candidate mutation was retained in
the HH3 region: a thymine-to-cytosine transition in the gene Smc2 (structural maintenance of
chromosome protein 2). This gene is largely conserved all the way back to yeast.
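The four HH3 filters, (a) to (d), translate directly into a predicate over per-group genotypes. The sketch below assumes genotypes are coded as alternate-allele counts (0, 1, 2); the dictionary keys, function name, and toy genotypes are hypothetical and serve only to make the logic concrete.

```python
def passes_hh3_style_filters(snp: dict) -> bool:
    """Apply filters (a)-(d) to one SNP; genotypes are alternate-allele counts."""
    return (
        snp["carrier_bull"] == 1                                  # (a) heterozygous in the carrier
        and all(g == 0 for g in snp["noncarrier_holsteins"])      # (b) absent in predicted noncarriers
        and all(g != 2 for g in snp["unknown_status_holsteins"])  # (c) never homozygous
        and all(g == 0 for g in snp["other_breeds"])              # (d) absent in other breeds
    )


# Toy genotypes, invented for illustration.
snps = {
    "snp_1": {"carrier_bull": 1, "noncarrier_holsteins": [0, 0, 0],
              "unknown_status_holsteins": [0, 1, 0], "other_breeds": [0, 0]},
    "snp_2": {"carrier_bull": 1, "noncarrier_holsteins": [0, 1, 0],
              "unknown_status_holsteins": [0, 1, 0], "other_breeds": [0, 0]},
    "snp_3": {"carrier_bull": 1, "noncarrier_holsteins": [0, 0, 0],
              "unknown_status_holsteins": [2, 0, 1], "other_breeds": [0, 0]},
    "snp_4": {"carrier_bull": 1, "noncarrier_holsteins": [0, 0, 0],
              "unknown_status_holsteins": [0, 1, 0], "other_breeds": [0, 1]},
    "snp_5": {"carrier_bull": 0, "noncarrier_holsteins": [0, 0, 0],
              "unknown_status_holsteins": [0, 0, 0], "other_breeds": [0, 0]},
}

candidates = [name for name, snp in snps.items() if passes_hh3_style_filters(snp)]
print(candidates)
```

Filter (d) encodes the assumption stated in the text that the mutation is recent; an older mutation shared across breeds would be discarded by it.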
The 1000 Bull Genomes data have also been used to assist in the rapid identification of causal
mutations for Weaver syndrome in Brown Swiss cattle (12), progressive retinal degeneration in
European cattle (13), oculocutaneous albinism in Braunvieh cattle (14), lethal chondrodyspla-
sia in Holstein cattle (15), the belted phenotype in several cattle breeds (16), familial renal syn-
drome (xanthinuria) in Tyrolean Grey cattle (17), recessive embryonic lethal mutations in Holstein
cattle (18), and recessive embryonic lethal mutations in Montbéliarde cattle (19). Perhaps one of
the most interesting mutations identified was the dominant mutation causing bulldog calf syn-
drome in a particular sire family. However, only a small proportion of the calves were affected,
suggesting mosaicism in the sire germline (11).
The availability of the 1000 Bull Genomes resource, and the fact that cattle producers can be
collaborators in recording genetic defects on a very large scale, led Bourneuf et al. (8, p. 13) to
propose,

The availability of large databases in cattle combined with the typical structure of livestock populations
facilitate the rapid detection and functional characterization of de novo deleterious mutations. The study
of mutations underlying sporadic syndromes in cattle, which also occur in humans, offers an interesting
alternative to laboratory animals for confirming the genetic aetiology of isolated clinical case reports
and gaining insights into the molecular mechanisms involved.

As a result of the detection of recessive defect mutations, particularly embryonic lethals, large-
scale screening for these defects in dairy populations has become routine. When the mutations
are identified, they are included in the next round of low-cost SNP array design (20). These arrays
are then genotyped in hundreds of thousands of industry cattle (to enable genomic selection), and
bull breeders/farmers can make decisions on whether to use carrier animals for breeding. These
deleterious mutations are of course also excellent targets for genome editing, particularly in bulls
that otherwise have very high genetic merit.

ACCELERATING DISCOVERY OF CAUSAL MUTATIONS AFFECTING COMPLEX TRAITS
The discovery of causative mutations underlying variation in complex traits, such as milk and beef
production, has been slow. This reflects the typically small effect sizes of individual mutations,
thousands of which contribute to the total observed genetic variation for a typical complex trait
(21, 22). For example, from 2001 to 2015, three causal mutations affecting milk production traits
were identified in the genes DGAT1 (23), GHR (24), and ABCG2 (25). The identification of these
three mutations followed quantitative trait loci (QTL) linkage studies with microsatellites, fol-
lowed by fine mapping with haplotype-based approaches and then by targeted resequencing of
small QTL intervals.
By using the 1000 Bull Genomes data set, it is now possible to impute whole-genome sequence
data into the tens of thousands of cattle that have been genotyped with SNP arrays, and have phe-
notypes, and then run genome-wide association studies (GWAS) directly on the imputed sequence
data. In proof-of-principle experiments, Daetwyler et al. (11) and Pausch et al. (26) demonstrated
(in a range of breeds) that this approach could identify the DGAT1 and GHR mutations (23, 24).
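At its simplest, an association test on imputed sequence data regresses the phenotype on the imputed allele dosage (a value between 0 and 2) one variant at a time. The sketch below shows that single-variant regression using only the standard library. It is a deliberately simplified illustration: the function name and toy data are hypothetical, and it omits the corrections for relatedness and population structure that a real cattle GWAS would need.

```python
import math


def snp_association(dosages, phenotypes):
    """Ordinary least-squares regression of phenotype on imputed allele dosage.

    Returns (effect estimate, standard error, t statistic).
    """
    n = len(dosages)
    mean_x = sum(dosages) / n
    mean_y = sum(phenotypes) / n
    sxx = sum((x - mean_x) ** 2 for x in dosages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(dosages, phenotypes))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    residual_ss = sum(
        (y - intercept - slope * x) ** 2 for x, y in zip(dosages, phenotypes)
    )
    standard_error = math.sqrt(residual_ss / (n - 2) / sxx)
    return slope, standard_error, slope / standard_error


# Toy data: imputed dosages need not be whole numbers.
dosages = [0.0, 0.1, 1.0, 0.9, 2.0, 1.8, 1.1, 0.2]
phenotypes = [1.1, 0.8, 2.2, 1.9, 3.1, 2.8, 2.0, 1.2]
effect, se, t_stat = snp_association(dosages, phenotypes)
print(f"effect = {effect:.3f}, SE = {se:.3f}, t = {t_stat:.2f}")
```

Using dosages rather than hard-called genotypes carries imputation uncertainty into the test, which matters when imputation accuracy differs between neighbouring variants.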
New putative causal mutations have also been identified with this approach. For example, a
regulatory mutation associated with the AGPAT6 gene affecting early-lactation milk content was
identified in Fleckvieh and Brown Swiss cattle (11, 27). A mutation contributing to curly coat in
Fleckvieh cattle was identified in the KRT27 (keratin 27) gene (11). Using both milk production
trait data and gene expression data, Xiang et al. (28) reported a splice site QTL at Chr6:87392580
within the fifth exon of kappa casein (CSN3) associated with milk production traits.
A more typical outcome from GWAS with imputed (1000 Bull Genomes) sequence data is the
identification not of a definitive causal mutation but rather of a small number of variants, in high
or complete linkage disequilibrium (LD), that are candidate causative mutations. For example,
Kemper et al. (29) identified two SNPs close to the SLC37A1 gene in complete LD associated
with phosphorus content of milk, gene expression, and protein content of milk (BTA1: 144367474
bp and BTA1: 144377960 bp). The gene is a good candidate for affecting phosphorus content of
milk, as it is a phosphorus–glucose antiporter. Sanchez et al. (30) suggested another SNP in close
proximity (BTA1: 144,398,814) as a potential mutation, in their case affecting concentrations of
alpha S1 casein and alpha-lactalbumin. Table 2 describes candidate loci that have been identified
across a range of traits using imputed sequence data.
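The phrase "complete LD" can be made concrete with the squared correlation between genotype counts at two variants, a common genotype-based approximation to the haplotype r² statistic. The sketch below is illustrative only; the function name and genotype vectors are hypothetical. When two variants have r² = 1, an association test cannot distinguish between them.

```python
def genotype_r2(geno_a, geno_b):
    """Squared Pearson correlation between alternate-allele counts at two variants."""
    n = len(geno_a)
    mean_a = sum(geno_a) / n
    mean_b = sum(geno_b) / n
    cov = sum((a - mean_a) * (b - mean_b) for a, b in zip(geno_a, geno_b))
    var_a = sum((a - mean_a) ** 2 for a in geno_a)
    var_b = sum((b - mean_b) ** 2 for b in geno_b)
    return cov * cov / (var_a * var_b)


# Toy genotypes (alternate-allele counts per animal), invented for illustration.
snp_1 = [0, 1, 2, 1, 0, 2, 1, 0]
snp_2 = [0, 1, 2, 1, 0, 2, 1, 0]   # identical pattern: complete LD with snp_1
snp_3 = [0, 1, 2, 0, 0, 2, 1, 1]   # similar but not identical pattern

print(f"r2(snp_1, snp_2) = {genotype_r2(snp_1, snp_2):.3f}")
print(f"r2(snp_1, snp_3) = {genotype_r2(snp_1, snp_3):.3f}")
```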

Table 2 Candidate loci for complex trait variation that have been identified across a range of traits using (1000 Bull Genomes) imputed sequence data and genome-wide association

Trait(s)                            Gene                                  Breeds                      Reference

Early-lactation milk fat content    AGPAT6                                Fleckvieh, Holstein         Daetwyler et al. (11)

Late-lactation milk fat content     GHR, MGST1                            Brown Swiss                 Frischknecht et al. (27)

Early-lactation milk fat content    AGPAT6                                Brown Swiss                 Frischknecht et al. (27)

Fat content, protein content        SLC37A1, TST, MGST1, TBC1D22A,        Holstein, Fleckvieh, Jersey Pausch et al. (26)
                                    ABCG2, CSN1S1, PAEP, DGAT1, FASN,
                                    GHR, LMAN1, AGPAT6, MBL1

Levels of six major milk proteins   SLC37A1, MGST1, ABCG2, CSN1S1,        Montbéliarde, Normande,     Sanchez et al. (30)
(whey proteins α-lactalbumin and    CSN2, CSN1S2, CSN3, PAEP, DGAT1,      Holstein
β-lactoglobulin, casein αs1, αs2,   AGPAT6, ALPL, ANKH, PICALM
β, and κ)

Fatty acid profiles in milk         LARP1B                                Holstein–Friesian           Duchemin et al. (45)

Fat% and protein% in milk           FASN, LALBA                           Holstein–Friesian, Jersey,  Goddard et al. (22)
                                                                          Australian Red

Milk production and composition     ROBO1, SLC37A1, PSMB2, OGDH, MYH9,    Holstein–Friesian, Jersey,  MacLeod et al. (21)
                                    NCF4, ARNTL2, MGST1, CSN2, CSN3, GC,  Australian Red
                                    RDH8, TTC7B, PROM2, PAEP, ABO,
                                    DGAT1, COX6C, TRIM29, KRT19, PTRF,
                                    ERGIC1, GHR, SMEK1, WARS, MLH1,
                                    GMDS, MARF1, SCD, PRDX3

Fertility traits, calving traits    IGLL1, ATP10A                         Brown Swiss                 Frischknecht et al. (46)

Milk production traits              BTRC, MGST1, SLC37A1, STAT5A, PAEP,   Holstein–Friesian, Jersey   Raven et al. (47)
                                    GC, CSF2RB, MUC1, NCF4, GHDC (a)

Cow fertility                       EIF4EBP3 (b)                          Holstein–Friesian           Moore et al. (48)

(a) Genes in bold highly differentially expressed in mammary gland in Chamberlain et al. (49).
(b) Supported by differential expression in endometrium and corpus luteum of high- and low-fertility dairy cows.

The GWAS that have been performed using imputed sequence data demonstrate that this
approach leads to quite accurate identification of causative genes, if not causative mutations.
Although this is certainly a step forward, ideally the approach would lead more directly to
identification of causative mutations. There are at least two limitations here. The first is
imputation error: in some cases the genotypes of the causal mutation are imputed with lower
accuracy than the level of LD between nearby SNPs and the true causal mutation. Association
tests on the imputed data can then give the imputed causal variant a higher (less significant)
P-value than other SNPs that are in high LD with the true causal variant (27, 31). The second limitation is the
lack of annotation of regulatory regions on the bovine genome. Many mutations affecting com-
plex traits are expected to be in regulatory regions [and human genetics studies support this, e.g.,
Zhu et al. (32)]. Without annotation of the regulatory regions, it is difficult to distinguish between
variants that may affect gene regulation and variants that are simply in high LD with the causal
mutation. The Functional Annotation of Animal Genomes (FAANG) Consortium (see 33, also in
this volume) aims to comprehensively annotate the bovine genome in the next few years (34).
As pointed out above, the high levels of LD within a breed often lead to results in which the
SNPs identified with the most significant effects on the trait in the sequence data are in complete
LD, which makes identifying the causative mutation very challenging. One approach to reducing
the levels of LD in the population is to use phenotypes and imputed sequence data from multiple
breeds. Bouwman et al. (31) used this approach in a very large (58,265 cattle) GWAS of stature.
The meta-analysis included 17 populations and 8 breeds. The resulting confidence intervals in
significant regions were small, often including only one gene. In addition, several putative causative
mutations were identified with supporting evidence from an expression QTL (eQTL) study and
(limited) functional annotations.
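A meta-analysis across populations combines per-population effect estimates for the same variant. One widely used option is fixed-effect inverse-variance weighting, sketched below. This is shown as an assumption for illustration, not as the method used by Bouwman et al. (31), and the effect sizes and standard errors are invented.

```python
import math


def inverse_variance_meta(estimates):
    """Fixed-effect meta-analysis of (effect, standard_error) pairs.

    Returns (pooled effect, pooled standard error, z statistic).
    """
    weights = [1.0 / (se * se) for _, se in estimates]
    total_weight = sum(weights)
    pooled = sum(w * beta for w, (beta, _) in zip(weights, estimates)) / total_weight
    pooled_se = math.sqrt(1.0 / total_weight)
    return pooled, pooled_se, pooled / pooled_se


# Hypothetical per-population estimates for one variant: (effect, standard error).
per_population = [(0.42, 0.10), (0.35, 0.15), (0.50, 0.20)]
effect, se, z = inverse_variance_meta(per_population)
print(f"pooled effect = {effect:.3f}, SE = {se:.3f}, z = {z:.2f}")
```

A variant that is causal should show a consistent effect across breeds, whereas a variant that merely tags the causal mutation in one breed may not, because LD patterns differ between breeds. This is the intuition behind the narrower confidence intervals reported above.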
The results of the GWAS with imputed sequence data have in some cases been rapidly trans-
lated into industry applications—the SNPs identified with the most significant effects on the traits
have been included in new SNP array designs that are used for routine genomic evaluations of dairy
and beef cattle (20).

IMPROVING THE ACCURACY OF WITHIN- AND ACROSS-BREED GENOMIC SELECTION WITH SEQUENCE VARIANTS
Although the availability of the 1000 Bull Genomes data has led to rapid identification of genetic
defects and assisted in identification of some causative mutations affecting complex traits, the
impact of the whole-genome sequence data on the accuracy of genomic predictions has been more
limited.
The first study to use (imputed) cattle whole-genome sequence data in genomic prediction
used 12,590,056 SNPs imputed in 5,503 Holstein–Friesian bulls with daughter trait deviations
(accurate phenotypes) for a range of dairy traits (35). Reliabilities (accuracy squared) of predictions
were from 2,087 validation bulls, whereas the other 3,416 bulls were used as the reference set for
estimating the SNP effects. Genomic predictions with the (imputed) whole-genome sequence
data were actually less accurate than with the BovineHD array (631,428 SNPs). Frischknecht
et al. (36) also reported no increase in the accuracy of genomic predictions with imputed sequence
data in Brown Swiss dairy cattle.
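The validation statistics used in these studies are simple to compute: accuracy is the correlation between genomic predictions and observed phenotypes (for example, daughter trait deviations) in the validation animals, and reliability is that accuracy squared. The sketch below uses invented numbers and hypothetical function names purely to show the calculation.

```python
import math


def pearson(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def validation_accuracy_and_reliability(predictions, observations):
    """Return (accuracy, reliability), where reliability = accuracy squared."""
    accuracy = pearson(predictions, observations)
    return accuracy, accuracy * accuracy


# Invented values for four validation bulls.
genomic_predictions = [1.0, 2.0, 3.0, 4.0]
daughter_deviations = [1.0, 3.0, 2.0, 4.0]
acc, rel = validation_accuracy_and_reliability(genomic_predictions, daughter_deviations)
print(f"accuracy = {acc:.2f}, reliability = {rel:.2f}")
```

Because reliability is a squared quantity, a gain reported in reliability points corresponds to a smaller gain when expressed as accuracy.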
Subsequent approaches to using the whole-genome sequence data in genomic predictions have
focused on selecting sequence variant subsets to add to routinely used SNP sets, such as those
on the Illumina SNP50 (54K genome-wide SNPs). Brøndum et al. (37) imputed whole-genome
sequence variant genotype data into Nordic Holstein, Danish Jersey, and Nordic Red cattle with
phenotypes for 16 dairy traits. A GWAS on the imputed sequence data was used to identify the
top 5–15 QTLs per trait per breed, with 3–5 sequence variants used to tag each QTL. A total of
1,623 additional sequence variant QTL markers were selected (for inclusion on a new SNP array
that also included the 54K original markers). As a result of including the new sequence variants,
reliabilities increased by up to 4 percentage points for production traits in Nordic Holsteins, up to
3 percentage points for Nordic Reds, and up to 5 percentage points for French Holsteins. Smaller
gains were observed for mastitis and fertility.
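The selection step described above (a handful of top QTL per trait, each tagged by a few sequence variants) can be sketched as a greedy pass over GWAS results. The procedure below is an assumption made for illustration, not the published method of Brøndum et al. (37): the window size, the function name, and the toy results are hypothetical.

```python
def select_qtl_tags(hits, n_qtl, n_tags, window):
    """Greedy selection of QTL and tag variants from GWAS results.

    hits: list of (chromosome, position, p_value).
    Repeatedly takes the most significant remaining variant as a QTL lead,
    keeps the n_tags most significant variants within +/- window of it,
    and removes that whole region before looking for the next QTL.
    """
    remaining = sorted(hits, key=lambda h: h[2])
    selected = []
    while remaining and len(selected) < n_qtl:
        lead = remaining[0]
        region = [
            h for h in remaining
            if h[0] == lead[0] and abs(h[1] - lead[1]) <= window
        ]
        selected.append(region[:n_tags])  # region is already sorted by p-value
        remaining = [h for h in remaining if h not in region]
    return selected


# Toy GWAS results, invented for illustration.
gwas_hits = [
    ("1", 100, 1e-20), ("1", 150, 1e-18), ("1", 180, 1e-5),
    ("1", 5000, 1e-10), ("2", 300, 1e-12), ("2", 320, 1e-3),
]
for qtl in select_qtl_tags(gwas_hits, n_qtl=2, n_tags=2, window=200):
    print(qtl)
```

Tagging each QTL with several variants rather than one hedges against the lead variant being poorly imputed or not being the causal mutation.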
VanRaden et al. (38) imputed sequence variants into 26,970 progeny-tested Holstein–Friesian
bulls. They found that when 6,648 candidate SNPs from the whole-genome sequence with the
largest estimated effects were added to the 60,671 SNPs used in routine evaluations, the reliabil-
ity of genomic prediction was improved by an average of 2.7 percentage points across 33 traits.
Veerkamp et al. (39) did not observe any increase in reliability of genomic predictions using a
similar approach, although their population was restricted to a single breed.
The greatest gains in accuracy of genomic predictions have been observed for multibreed pop-
ulations, particularly in the situation where one breed is not in the reference (or training) pop-
ulation where the genomic predictions are derived but animals from that breed are among the
selection candidates. MacLeod et al. (21) reported that using whole-genome sequence (in their
case, 994,019 sequence variant SNPs in or close to genes) in across-breed genomic predictions
for dairy cattle improved the accuracy of these genomic predictions by approximately 8% for a
breed (Australian Red) that was not in the reference or training population (Figure 2).
Raymond et al. (40) evaluated the use of whole-genome sequence data for across-breed predic-
tions in 595 New Zealand Jersey bulls; 957 Holstein bulls from New Zealand; and 5,553 Dutch
Holstein bulls. They found the highest accuracies of across-breed prediction (up to 0.35) were
achieved when subsets of SNPs were preselected from the whole-genome sequence data using
GWAS, and only those markers (and the routinely used 54K) were used in the genomic prediction.
Using all the variants in whole-genome sequence data did not significantly improve the propor-
tion of genetic variance captured across breeds compared with scenarios with few but preselected
markers.
Taken together, the results of all these studies suggest the following:

1. Within (dairy) breed, the additional accuracy of genomic prediction that can be achieved as
a result of adding whole-genome sequence data is likely to be small; the best results that have
been observed for within-breed predictions were from VanRaden et al. (38), with an average
increase of 2.7 percentage points in the reliability of genomic predictions across traits, where a very
large reference population was available (and the average reliability with 50K was about 60%).
2. The best results (largest increases in accuracy) to date are observed when SNPs are pres-
elected from the sequence data using GWAS and a nonlinear genomic prediction method
is used (e.g., BayesB, BayesR, BayesSSVS). Use of a BLUP method and sequence data does
not result in any additional accuracy when the SNPs from the sequence data are included
(as the effects of these SNPs are shrunk too much to make an impact with this method).
3. The largest gains in accuracy are for multibreed genomic predictions, particularly where
selection candidates for a breed not in the reference (or training) are predicted.
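The shrinkage argument in point 2 can be illustrated with a simplified calculation. Under SNP-BLUP (ridge regression) every SNP effect shares one penalty, λ = σe² / (σg² / m), where m is the number of SNPs. For a single standardized SNP treated in isolation, the estimate is the least-squares estimate multiplied by n / (n + λ), where n is the number of reference animals. The sketch below makes that single-SNP simplification explicit (it ignores LD between SNPs). The SNP counts and reference size are figures quoted earlier in this section, and the variance components are hypothetical.

```python
def snp_blup_shrinkage(n_animals, n_snps, genetic_variance, residual_variance):
    """Shrinkage factor applied to one standardized SNP effect under SNP-BLUP.

    Simplifying assumption: the SNP is treated in isolation (no LD with others).
    """
    per_snp_variance = genetic_variance / n_snps
    ridge_penalty = residual_variance / per_snp_variance
    return n_animals / (n_animals + ridge_penalty)


N_REFERENCE = 3_416                      # reference bulls in the first sequence study
VAR_G, VAR_E = 0.3, 0.7                  # hypothetical variance components

for label, n_snps in [("54K array", 54_000), ("imputed sequence", 12_590_056)]:
    factor = snp_blup_shrinkage(N_REFERENCE, n_snps, VAR_G, VAR_E)
    print(f"{label}: each SNP effect retains about {factor:.5f} of its least-squares size")
```

Because the penalty grows in proportion to the number of SNPs, a causal variant in sequence data is shrunk far more than the same variant would be on an array, whereas mixture priors such as those in BayesR allow a few variants to escape heavy shrinkage.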

THE FUTURE OF THE 1000 BULL GENOMES PROJECT
We expect the number of sequences in the 1000 Bull Genomes Project to continue to grow as
the cost of sequencing continues to fall. In terms of capturing the diversity of the world’s cattle,
B. taurus beef and dairy cattle are now well represented in the project; however, a concerted effort
to collect sequences from tropically adapted cattle (including B. indicus) should be made.

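[Figure 2: plot of prediction accuracy, r (genomic estimated breeding value, daughter yield deviation), on a y-axis from 0.0 to 0.8, for Red Holstein (RH) and Australian Red (AR) validation sets for fat yield, milk yield, and protein yield. Series: BayesR 800K, BayesR SEQ, BayesRC Lact.]
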
Figure 2
Accuracy (correlation of genomic predictions and trait deviations) for fat, milk, and protein yield from a
reference set of 16,214 Holstein and Jersey bulls and cows, in a Red Holstein (RH) bull validation data set
and Australian Red (AR) cow validation sets (based on table 5 in Reference 21). BayesR 800K (blue) are
genomic predictions based on the BovineHD single-nucleotide polymorphism (SNP) array (Illumina, San Diego),
BayesR SEQ (orange) are genomic predictions from whole-genome sequence, and BayesRC Lact (gray) are
genomic predictions using the BayesRC method described by MacLeod et al. (21). Figure adapted from
Reference 21.

To make more progress with using whole-genome sequence data in genomic predictions, fur-
ther research is required in several areas. Methods for analyzing the sequence data that are com-
putationally efficient are required, given that both the number of sequence variants identified and
the number of animals with imputed sequence variants are likely to grow rapidly. Highly efficient
approaches for implementing Bayesian methods with sequence data exist, and these methods can
be refined and improved (41–43). Further, biological information, including gene expression (both
differential expression and eQTL) and genome annotation information, could be used to identify
classes of sequence variants more likely to harbor mutations affecting complex traits.
Although the GWAS approach to selecting sequence variants to include in genomic predictions
described above is straightforward to implement and has been demonstrated to improve predic-
tion accuracy, it cannot take full advantage of the sequence data. The many mutations of small
effect, which contribute a substantial proportion of the variance for a typical complex trait, will
not be significant in these GWAS (22). Using the biological information such as gene expression
and genome annotation (including enhancers and other regulatory elements) may improve our
ability to identify these mutations of small effect. MacLeod et al. (21) described a genomic pre-
diction method called BayesRC, which groups sequence variants into classes based on annotation,
differential expression, or other information and allows the proportion of variants in each class
that have no effect (excluded from the model) and small, moderate, and large effects (assumed to
come from distributions with 0.0001, 0.001, and 0.01 of the genetic variance, respectively) to vary
between classes. When this method was applied to genomic prediction for milk production traits
in dairy cattle, and classes were determined based on whether sequence variants were in genes
that were differentially expressed in lactation experiments, there was some improvement in the
accuracy of genomic predictions (Figure 2). Zhang et al. (44) evaluated the BayesRC approach
in pigs (with imputed sequence data) and also found improvement in the accuracy of genomic
predictions for some (but not all) traits.
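
The class-specific mixture of effect distributions described above can be sketched as a simulation from the prior. This is a minimal illustration and not the BayesRC implementation of MacLeod et al. (21): the function name sample_effects, the class labels, and the mixing proportions are hypothetical. In the method itself the proportions are inferred from the data, whereas here they are fixed by hand. Only the four variance fractions (0, 0.0001, 0.001, and 0.01 of the genetic variance) come from the text.

```python
import random

# Fractions of the genetic variance for the four effect distributions:
# no effect, small, moderate, and large.
VARIANCE_FRACTIONS = (0.0, 0.0001, 0.001, 0.01)


def sample_effects(n_variants, proportions, genetic_variance, rng):
    """Draw variant effects from a four-component normal mixture.

    proportions gives the share of variants in each of the four
    distributions for one annotation class and must sum to 1.
    """
    if len(proportions) != len(VARIANCE_FRACTIONS):
        raise ValueError("need one proportion per effect distribution")
    if abs(sum(proportions) - 1.0) > 1e-9:
        raise ValueError("proportions must sum to 1")
    effects = []
    for _ in range(n_variants):
        # Pick the distribution this variant's effect comes from
        frac = rng.choices(VARIANCE_FRACTIONS, weights=proportions, k=1)[0]
        if frac == 0.0:
            effects.append(0.0)  # variant is excluded from the model
        else:
            sd = (frac * genetic_variance) ** 0.5
            effects.append(rng.gauss(0.0, sd))
    return effects


# Hypothetical proportions: the "lactation" class is assumed to be
# enriched for variants with non-zero effects relative to "other".
class_proportions = {
    "lactation": (0.90, 0.08, 0.015, 0.005),
    "other": (0.995, 0.004, 0.0009, 0.0001),
}

rng = random.Random(42)
simulated = {
    name: sample_effects(1000, props, genetic_variance=1.0, rng=rng)
    for name, props in class_proportions.items()
}
```

Letting each class carry its own proportions is what allows a class enriched for causal mutations to place more variants in the non-zero distributions than the remainder of the genome.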
MacLeod et al. (21) demonstrated with simulated data that if classes of sequence variants can
be identified that are substantially enriched for causal mutations, the BayesRC approach can result
in significant improvements in the accuracy of genomic predictions, compared with what can be
achieved with high-density array genotypes. As the FAANG Consortium improves annotation of the
bovine genome, identifying such enriched classes may become feasible within the next few years.

DISCLOSURE STATEMENT
The authors are not aware of any affiliations, memberships, funding, or financial holdings that
might be perceived as affecting the objectivity of this review.

ACKNOWLEDGMENTS
The authors would like to sincerely thank the 1000 Bull Genomes Project consortium, without
which this manuscript and many others would not have been possible. The 1000 Bull Genomes
Project consortium includes Ruedi Fries (Technische Universität München, Germany), Mogens
Lund/Bernt Guldbrandtsen (Aarhus University, Denmark), Didier Boichard (INRA, France),
Paul Stothard (University of Alberta, Genome Canada), Roel Veerkamp (Wageningen UR,
Netherlands), Curt Van Tassell (US Department of Agriculture), Tom Druet (University of
Liege), Birgit Gredler (Qualitas AG), Johanna Vilkki (Natural Resources Institute Finland),
Erik Mullaart (CRV), Alessandro Bagnato (Università degli Studi di Milano), Donagh Berry
(TEAGASC), D.-J. De Koning (Swedish University of Agricultural Sciences), Enrico Santus
(Associazione nazionale Allevatori Razza Bruna), James Reecy (Iowa State University), Jerry
Taylor (University of Missouri), Flavio Schenkel (University of Guelph), Cord Drögemüller
(University of Bern), Steve Miller (AgResearch), Dirk Hinrichs (University of Kiel), Beatriz
Villanueva (Spanish National Institute for Agricultural and Food Research and Technology),
Eileen Wall (Scotland’s Rural College), Lorenzo Bomba (Università Cattolica del Sacro Cuore),
Ezequiel Luis Nicolazzi (Fondazione Parco Tecnologico Padano), Luis Varona (Universidad de
Zaragoza), Joanna Szyda (Wroclaw University of Environmental and Life Sciences), Norwegian
University of Life Sciences, Jesús Piedrafita (Universitat Autònoma de Barcelona), Christa
Kuhn (Leibniz Institute for Farm Animal Biology), and Ding Xiang Dong (Chinese Agricultural
University). The authors would also like to thank Mike Goddard for ideas and discussion leading
to some of the results presented in this manuscript.

LITERATURE CITED
1. Bov. Genome Seq. Anal. Consort., Elsik CG, Tellam RL, Worley KC, Gibbs RA, et al. 2009. The genome
sequence of taurine cattle: a window to ruminant biology and evolution. Science 324(5926):522–28
2. Van Tassell CP, Smith TP, Matukumalli LK, Taylor JF, Schnabel RD, et al. 2008. SNP discovery and allele
frequency estimation by deep sequencing of reduced representation libraries. Nat. Methods 5(3):247–52
3. Bov. HapMap Consort., Gibbs RA, Taylor JF, Van Tassell CP, Barendse W, et al. 2009. Genome-wide
survey of SNP variation uncovers the genetic structure of cattle breeds. Science 324(5926):528–32
4. Larkin DM, Daetwyler HD, Hernandez AG, Wright CL, Hetrick LA, et al. 2012. Whole-genome re-
sequencing of two elite sires for the detection of haplotypes under selection in dairy cattle. PNAS
109(20):7693–98
5. Chen L, Chamberlain AJ, Reich CM, Daetwyler HD, Hayes BJ. 2017. Detection and validation of struc-
tural variations in bovine whole-genome sequence data. Genet. Sel. Evol. 49:13
6. Boitard S, Rodríguez W, Jay F, Mona S, Austerlitz F. 2016. Inferring population size history from large
samples of genome-wide molecular data—an approximate Bayesian computation approach. PLOS Genet.
12(3):e1005877
7. Boitard S, Boussaha M, Capitan A, Rocha D, Servin B. 2016. Uncovering adaptation from sequence data:
lessons from genome resequencing of four cattle breeds. Genetics 203(1):433–50
8. Bourneuf E, Otz P, Pausch H, Jagannathan V, Michot P, et al. 2017. Rapid discovery of de novo deleterious
mutations in cattle enhances the value of livestock as model species. Sci. Rep. 7(1):11466
9. VanRaden PM, Olson KM, Null DJ, Hutchison JL. 2011. Harmful recessive effects on fertility detected
by absence of homozygous haplotypes. J. Dairy Sci. 94(12):6153–61
10. Adams HA, Sonstegard TS, VanRaden PM, Null DJ, Van Tassell CP, et al. 2016. Identification of a non-
sense mutation in APAF1 that is likely causal for a decrease in reproductive efficiency in Holstein dairy
cattle. J. Dairy Sci. 99(8):6693–701
11. Daetwyler HD, Capitan A, Pausch H, Stothard P, van Binsbergen R, et al. 2014. Whole-genome sequenc-
ing of 234 bulls facilitates mapping of monogenic and complex traits in cattle. Nat. Genet. 46(8):858–65
12. Kunz E, Rothammer S, Pausch H, Schwarzenbacher H, Seefried FR, et al. 2016. Confirmation of a non-
synonymous SNP in PNPLA8 as a candidate causal mutation for Weaver syndrome in Brown Swiss cattle.
Genet. Sel. Evol. 48:21
13. Michot P, Chahory S, Marete A, Grohs C, Dagios D, et al. 2016. A reverse genetic approach identifies
an ancestral frameshift mutation in RP1 causing recessive progressive retinal degeneration in European
cattle breeds. Genet. Sel. Evol. 48(1):56
14. Rothammer S, Kunz E, Seichter D, Krebs S, Wassertheurer M, et al. 2017. Detection of two non-
synonymous SNPs in SLC45A2 on BTA20 as candidate causal mutations for oculocutaneous albinism
in Braunvieh cattle. Genet. Sel. Evol. 49(1):73
15. Agerholm JS, Menzi F, McEvoy FJ, Jagannathan V, Drögemüller C. 2016. Lethal chondrodysplasia in a
family of Holstein cattle is associated with a de novo splice site variant of COL2A1. BMC Vet. Res. 12:100
16. Rothammer S, Kunz E, Krebs S, Bitzer F, Hauser A, et al. 2018. Remapping of the belted phenotype in
cattle on BTA3 identifies a multiplication event as the candidate causal mutation. Genet. Sel. Evol. 50(1):36
17. Murgiano L, Jagannathan V, Piffer C, Diez-Prieto I, Bolcato M, et al. 2016. A frameshift mutation in
MOCOS is associated with familial renal syndrome (xanthinuria) in Tyrolean Grey cattle. BMC Vet. Res.
12(1):276
18. Fritz S, Hoze C, Rebours E, Barbat A, Bizard M, et al. 2018. An initiator codon mutation in SDE2 causes
recessive embryonic lethality in Holstein cattle. J. Dairy Sci. 101(7):6220–31
19. Michot P, Fritz S, Barbat A, Boussaha M, Deloche MC, et al. 2017. A missense mutation in PFAS (phos-
phoribosylformylglycinamidine synthase) is likely causal for embryonic lethality associated with the MH1
haplotype in Montbéliarde dairy cattle. J. Dairy Sci. 100(10):8176–87
20. Boichard D, Boussaha M, Capitan A, Rocha D, Hoze C, et al. 2018. Experience from large scale use of
the EuroGenomics custom SNP chip in cattle. Proc. World Congr. Genet. Appl. Livest. Prod. 4:675
21. MacLeod IM, Bowman PJ, Vander Jagt CJ, Haile-Mariam M, Kemper KE, et al. 2016. Exploiting bio-
logical priors and sequence variants enhances QTL discovery and genomic prediction of complex traits.
BMC Genom. 17:144
22. Goddard ME, Kemper KE, MacLeod IM, Chamberlain AJ, Hayes BJ. 2016. Genetics of complex traits:
prediction of phenotype, identification of causal polymorphisms and genetic architecture. Proc. Biol. Sci.
B 283(1835):20160569
23. Grisart B, Coppieters W, Farnir F, Karim L, Ford C, et al. 2002. Positional candidate cloning of a QTL
in dairy cattle: identification of a missense mutation in the bovine DGAT1 gene with major effect on milk
yield and composition. Genome Res. 12(2):222–31
24. Blott S, Kim JJ, Moisio S, Schmidt-Küntzel A, Cornet A, et al. 2003. Molecular dissection of a quantitative
trait locus: A phenylalanine-to-tyrosine substitution in the transmembrane domain of the bovine growth
hormone receptor is associated with a major effect on milk yield and composition. Genetics 163(1):253–66
25. Cohen-Zinder M, Seroussi E, Larkin DM, Loor JJ, Everts-van der Wind A, et al. 2005. Identification
of a missense mutation in the bovine ABCG2 gene with a major effect on the QTL on chromosome 6
affecting milk yield and composition in Holstein cattle. Genome Res. 15(7):936–44
26. Pausch H, Emmerling R, Gredler-Grandl B, Fries R, Daetwyler HD, Goddard ME. 2017. Meta-analysis
of sequence-based association studies across three cattle breeds reveals 25 QTL for fat and protein per-
centages in milk at nucleotide resolution. BMC Genom. 18(1):853
27. Frischknecht M, Pausch H, Bapst B, Signer-Hasler H, Flury C, et al. 2017. Highly accurate sequence
imputation enables precise QTL mapping in Brown Swiss cattle. BMC Genom. 18(1):999
28. Xiang R, Hayes BJ, Vander Jagt CJ, MacLeod IM, Khansefid M, et al. 2018. Genome variants associated
with RNA splicing variations in bovine are extensively shared between tissues. BMC Genom. 19(1):521
29. Kemper KE, Littlejohn MD, Lopdell T, Hayes BJ, Bennett LE, et al. 2016. Leveraging genetically simple
traits to identify small-effect variants for complex phenotypes. BMC Genom. 17(1):858
30. Sanchez MP, Govignon-Gion A, Croiseau P, Fritz S, Hozé C, et al. 2017. Within-breed and multi-breed
GWAS on imputed whole-genome sequence variants reveal candidate mutations affecting milk protein
composition in dairy cattle. Genet. Sel. Evol. 49(1):68
31. Bouwman AC, Daetwyler HD, Chamberlain AJ, Ponce CH, Sargolzaei M, et al. 2018. Meta-analysis of
genome-wide association studies for cattle stature identifies common genes that regulate body size in
mammals. Nat. Genet. 50(3):362–67
32. Zhu Z, Zhang F, Hu H, Bakshi A, Robinson MR, et al. 2016. Integration of summary data from GWAS
and eQTL studies predicts complex trait gene targets. Nat. Genet. 48(5):481–87
33. Giuffra E, Tuggle CK, FAANG Consort. 2019. Functional Annotation of Animal Genomes (FAANG):
current achievements and roadmap. Annu. Rev. Anim. Biosci. 7:65–88
34. Andersson L, Archibald AL, Bottema CD, Brauning R, Burgess SC, et al. 2015. FAANG Consortium.
Coordinated international action to accelerate genome-to-phenome with FAANG, the Functional An-
notation of Animal Genomes project. Genome Biol. 16:57
35. van Binsbergen R, Calus MP, Bink MC, van Eeuwijk FA, Schrooten C, Veerkamp RF. 2015. Genomic
prediction using imputed whole-genome sequence data in Holstein Friesian cattle. Genet. Sel. Evol. 47:71
36. Frischknecht M, Meuwissen THE, Bapst B, Seefried FR, Flury C, et al. 2018. Short communication:
genomic prediction using imputed whole-genome sequence variants in Brown Swiss Cattle. J. Dairy Sci.
101(2):1292–96
37. Brøndum RF, Su G, Janss L, Sahana G, Guldbrandtsen B, et al. 2015. Quantitative trait loci markers
derived from whole genome sequence data increases the reliability of genomic prediction. J. Dairy Sci.
98(6):4107–16
38. VanRaden PM, Tooker ME, O’Connell JR, Cole JB, Bickhart DM. 2017. Selecting sequence variants to
improve genomic predictions for dairy cattle. Genet. Sel. Evol. 49(1):32
39. Veerkamp RF, Bouwman AC, Schrooten C, Calus MP. 2016. Genomic prediction using preselected DNA
variants from a GWAS with whole-genome sequence data in Holstein-Friesian cattle. Genet. Sel. Evol.
48(1):95
40. Raymond B, Bouwman AC, Schrooten C, Houwing-Duistermaat J, Veerkamp RF. 2018. Utility of whole-
genome sequence data for across-breed genomic prediction. Genet. Sel. Evol. 50:27
41. Calus MP, Bouwman AC, Schrooten C, Veerkamp RF. 2016. Efficient genomic prediction based on
whole-genome sequence data using split-and-merge Bayesian variable selection. Genet. Sel. Evol. 48(1):49
42. Wang T, Chen YP, Bowman PJ, Goddard ME, Hayes BJ. 2016. A hybrid expectation maximisation and
MCMC sampling algorithm to implement Bayesian mixture model based genomic prediction and QTL
mapping. BMC Genom. 17(1):744
43. van den Berg I, Bowman PJ, MacLeod IM, Hayes BJ, Wang T, et al. 2017. Multi-breed genomic prediction
using Bayes R with sequence data and dropping variants with a small effect. Genet. Sel. Evol. 49(1):70
44. Zhang C, Kemp RA, Stothard P, Wang Z, Boddicker N, et al. 2018. Genomic evaluation of feed efficiency
component traits in Duroc pigs using 80K, 650K and whole-genome sequence variants. Genet. Sel. Evol.
50:14
45. Duchemin SI, Bovenhuis H, Megens H-J, Van Arendonk JAM, Visker MHPW. 2017. Fine-mapping of
BTA17 using imputed sequences for associations with de novo synthesized fatty acids in bovine milk.
J. Dairy Sci. 100(11):9125–35
46. Frischknecht M, Bapst B, Seefried FR, Signer-Hasler H, Garrick D, et al. 2017. Genome-wide association
studies of fertility and calving traits in Brown Swiss cattle using imputed whole-genome sequences. BMC
Genom. 18(1):910
47. Raven LA, Cocks BG, Kemper KE, Chamberlain AJ, Vander Jagt CJ, et al. 2016. Targeted imputation of
sequence variants and gene expression profiling identifies twelve candidate genes associated with lactation
volume, composition and calving interval in dairy cattle. Mamm. Genome 27(1–2):81–97
48. Moore SG, Pryce JE, Hayes BJ, Chamberlain AJ, Kemper KE, et al. 2016. Differentially expressed genes
in endometrium and corpus luteum of Holstein cows selected for high and low fertility are enriched for
sequence variants associated with fertility. Biol. Reprod. 94(1):19
49. Chamberlain AJ, Vander Jagt CJ, Hayes BJ, Khansefid M, Marett LC, et al. 2015. Extensive variation
between tissues in allele specific expression in an outbred mammal. BMC Genom. 16:993
