Genetic Diversity of A Maize Association Population With Restricted Phenology
Pedigree
Pedigree information was compiled from multiple sources (Gerdes et al., 1993; Liu et al., 2003; Flint-Garcia et al., 2005; Mikel, 2006; Mikel and Dudley, 2006; USDA, ARS, National Genetic Resources Program [available at http://www.ars-grin.gov/cgi-bin/npgs/html/tax_site_acc.pl?NC7%20Zea%20mays%20subsp.%20mays; verified 6 Dec. 2010]) and restructured in a previously proposed standard pedigree format (Purdy et al., 1968). Coefficients of coancestry were calculated using the computer program RELATE (Bernardo et al., 1997) to produce a relatedness matrix of all inbred lines (Emik and Terrill, 1949). This program utilizes a modification of the tabular method to calculate the coefficients. The tabular analysis procedure was modified to account for full inbreeding of individuals and unequal parental contributions. For lines derived from sources of unknown origin, the parental contribution from that source was considered zero. This will impact germplasm derived outside of the United States more than U.S. germplasm (Supplemental Table 1). Lines derived from open pollinated populations were assumed unrelated in the analysis due to the lack of sufficient information regarding the degree of relatedness.

Molecular Marker Genotyping
Of the 627 lines included in this study, 411 were genotyped using the Illumina GoldenGate high-throughput single nucleotide polymorphism (SNP) assay (Fan et al., 2006). Seedling leaf tissue from five to ten plants was bulked for DNA extraction. DNA was extracted using the cetyl(trimethyl)ammonium bromide (CTAB) method (Saghai-Maroof et al., 1984). A set of 1536 SNP marker data points was obtained for each genotype. The remaining 216 lines had already been genotyped using the same Illumina GoldenGate high-throughput assay (McMullen et al., 2009). Genotypic data of these inbred lines were obtained from the website (http://www.panzea.org/lit/data_sets.html#genos [verified 6 Dec. 2010]). A subset of the 1536 SNPs was originally designed to have a unique allele in inbred line B73 for mapping the NAM populations, which resulted in a bias toward B73. The biased SNPs and those SNPs with inconsistent quality across the samples were discarded. This resulted in a total of 511 high quality unbiased SNP loci, which were used for the relationship and structure analysis presented in this paper.

Molecular Statistical Analysis
Population substructure was determined using the STRUCTURE software (Pritchard et al., 2000). An admixture model with a burn-in time and replication number set at 50,000 was used for each run. Three runs were performed for each value of K (number of populations) from one to ten. The run with the maximum likelihood of the observed genotypes given the number of subpopulations in the model was used to assign the probability that a line belongs to each substructure group.
Summary statistics including total number of alleles, group specific alleles, average number of alleles per locus, major allele frequency, gene diversity, and polymorphism information content (PIC) were calculated using PowerMarker version 3.25 (Liu and Muse, 2005). Gene diversity (expected heterozygosity) is the probability that two alleles chosen at random from the population are different, estimated as:

$$\hat{D} = \left(1 - \sum_{u=1}^{k} P_u^2\right) \Big/ \left[1 - \frac{1 + f}{n}\right],$$

in which $P_u$ is the frequency of the uth allele, n is the sample size, and f is the inbreeding coefficient estimated from genotype frequencies (Weir, 1996). Polymorphism information content is another related measure of genetic diversity; for a given locus, PIC was calculated as:

$$\mathrm{PIC} = 1 - \sum_{u=1}^{k} P_u^2 - \sum_{u=1}^{k-1} \sum_{v=u+1}^{k} 2 P_u^2 P_v^2,$$

in which for genotype $A_uA_v$, $P_u$ is the frequency of the uth allele at $A_u$ and $P_v$ is the frequency of the vth allele at $A_v$ (Botstein et al., 1980).
These summary statistics were calculated for the entire diversity panel as well as specific subsets of the panel. The so-called “category” subsets included genotypes unique to the Wisconsin diversity set, genotypes from the Goodman-Buckler diversity set that meet our phenology restrictions, genotypes from the Goodman-Buckler diversity set that do not meet the phenology restrictions, and all genotypes from the Goodman-Buckler diversity set. The “maturity group” subsets were determined based on average flowering time measurements from 2008 and 2009. Maturity groups correspond to classic industry groupings in days after planting: time to flowering is less than 75 d after planting (2008: 1210 growing degree days [GDD]; 2009: 1046 GDD), 75 to 85 d after planting (2008: 1210–1430 GDD; 2009: 1046–1214 GDD), 85 to 95 d after planting (2008: 1430–1601 GDD; 2009: 1214–1410 GDD), 95 to 105 d after planting (2008: 1601–1789 GDD; 2009: 1410–1607 GDD), and greater than 105 d after planting (2008: 1789 GDD; 2009: 1607 GDD). Population substructure groups were assigned as the substructure group with the majority membership for each genotype. If a genotype did not have greater than 50% majority membership in any group it was assigned to a “mixed group” (group 9). This membership threshold was used simply for illustration purposes.
PowerMarker was also used to calculate genetic relatedness as pair-wise Rogers distances (Rogers, 1972) based on the 511 unbiased SNP markers. Rogers distances were calculated as:

$$D_R = \frac{1}{m} \sum_{j}^{m} \sum_{i}^{a_j} \left(p_{ij} - q_{ij}\right)^2,$$

in which $p_{ij}$ and $q_{ij}$ are the frequencies of the ith allele at the jth locus in each genotype, $a_j$ is the number of alleles at the jth locus, and m is the number of loci examined. Unweighted pair group method with arithmetic mean (UPGMA)-based phylogeny was constructed using this Rogers distance matrix. FigTree version 1.2.3 was used to produce the UPGMA tree image (http://tree.bio.ed.ac.uk/software/figtree [verified 6 Dec. 2010]). Multidimensional scaling (MDS) analysis was performed using PROC MDS in SAS (SAS Institute, 2003) to evaluate genetic relationships using the Rogers dissimilarity matrix.

Field Evaluations
Trials were grown during the summer of 2008 at the West Madison Agricultural Research Station in Madison, WI, and summer of 2009 at the Arlington Agricultural Research Station in Arlington, WI, using a randomized complete block design with two replications. The 2008 replicated trial contained 611 of the 1411 lines described above. In addition, all 1411 lines were
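
As an illustration of the gene diversity and PIC definitions given under Molecular Statistical Analysis, the following minimal sketch computes both for a single locus. It is not PowerMarker's implementation; the function names and the example allele frequencies are hypothetical, and the choice f = 1 (fully inbred lines) is an assumption made only for the example.

```python
def gene_diversity(freqs, n, f=0.0):
    """Gene diversity at one locus: (1 - sum P_u^2) / (1 - (1 + f) / n).

    freqs: allele frequencies P_u at the locus (should sum to 1)
    n:     sample size
    f:     inbreeding coefficient (hypothetical default of 0)
    """
    sum_sq = sum(p * p for p in freqs)
    return (1.0 - sum_sq) / (1.0 - (1.0 + f) / n)


def pic(freqs):
    """PIC = 1 - sum P_u^2 - sum_{u<v} 2 * P_u^2 * P_v^2 (Botstein et al., 1980)."""
    k = len(freqs)
    sum_sq = sum(p * p for p in freqs)
    cross = sum(
        2.0 * freqs[u] ** 2 * freqs[v] ** 2
        for u in range(k - 1)
        for v in range(u + 1, k)
    )
    return 1.0 - sum_sq - cross


# Hypothetical biallelic SNP with balanced alleles in 100 fully inbred lines.
example_freqs = [0.5, 0.5]
example_d = gene_diversity(example_freqs, n=100, f=1.0)
example_pic = pic(example_freqs)
```

For a biallelic marker, PIC reaches its maximum of 0.375 when both alleles are at frequency 0.5, which is why panel-wide averages for SNP data stay well below 0.5.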
and materials provided by the several collaborators. Lines were visually evaluated in a single replication nursery during the summer of 2008 at the West Madison Agricultural Research Station in Madison, WI. A subset of 611 of those lines was selected based on prior performance and pedigree information and evaluated in a replicated trial at the Madison location that same year. Based on the 2008 replicated evaluation 548 of those lines were further selected to be part of this diversity panel. Selections were based on maturity, agronomic suitability, inbred uniformity, and seed supply. An additional 79 lines were identified based on the single replication nursery evaluation conducted during 2008 and were added to the set of 548 lines for the 2009 evaluation. The final Wisconsin diversity set therefore contains 627 lines (Supplemental Table 1). The entire Wisconsin diversity set is available. Plant introduction information is provided for the 606 (of the 627) lines available through the NC7. Source information is provided for the 21 lines not currently available through the NC7 (Supplemental Table 1).

Relationships by Pedigree
Many of the Wisconsin diversity set lines trace back to eight open-pollinated populations including Iowa Stiff Stalk Synthetic (BSSS), Minnesota No. 13, Reid Yellow Dent, Lancaster Surecrop, Golden Glow, Funk Yellow Dent, Pride of Saline, and Krug among others. In addition, this diversity panel contains 15 lines derived from the Germplasm Enhancement of Maize (GEM; Ames, IA) project (Pollak, 2003). Fourteen of these lines were found to be, by pedigree, 25% unadapted and 75% elite germplasm and one line was 50% unadapted and 50% elite germplasm. When assembling a diversity panel, it is important to have a diverse set of germplasm as well as a balance of alleles. Having multiple genotypes derived from the same open pollinated population helps to maintain a balance of allele frequencies.

Phenotypic Diversity
Variation due to genotype was significant for all phenotypic traits when the analysis included lines unique to the Wisconsin diversity set, lines in common with the Goodman-Buckler diversity set, and the complete Wisconsin diversity set (Table 1). Although the subset of the Goodman-Buckler diversity set that was evaluated here is not representative of all phenotypic diversity in maize, it is representative of the phenotypic diversity that exists in currently characterized inbred lines that will mature in a short day growing environment. The lines unique to this diversity panel expanded the range of phenotypic variation for many of these traits relative to the subset of lines from the Goodman-Buckler diversity set that met our phenology restriction. For instance, the current panel dramatically enhanced the variability for last leaf with epicuticular wax by decreasing the previous minimum by approximately
three leaves and increasing the previous maximum by 2.5 leaves. Among other traits, both the upper and lower limit of variation were expanded for plant height, ear height, uppermost internode with a developed ear, number of elongated internodes above the uppermost ear, percentage of leaves with epicuticular wax, number of leaves with no epicuticular wax, and percentage of leaves with no epicuticular wax. The upper limit of variation was expanded for stover yield and internode length. The lower limit of variation was expanded for days to flowering, 300 kernel weight, stalk diameter, and leaf number (Table 1). A diversity panel intended for association mapping should utilize the maximum phenotypic diversity possible. This increase in phenotypic diversity of characterized inbred lines that will mature in a short day growing environment for all traits measured except flowering time will increase the power to detect trait marker associations for researchers working in short day growing environments.

Genetic Diversity
Assignment into a maturity group was based on the average days after planting (DAP) to flowering from 2008 and 2009. The gene diversity was greatest in the earliest maturity group and decreased in the later maturity groups (Table 2). There was a similar trend for PIC across maturity groups. Maturity group 3 did not follow these trends, which is likely because there are significantly more lines in this maturity group. Also, there was a better balance of alleles in the earlier maturity groups. In the later maturity groups the average major allele frequency across all SNPs is greatest. Maximum power to detect trait loci association occurs when allele frequencies are balanced (Myles et al., 2009). When alleles become rare (<0.05), they are often not included in association mapping or are pooled with other rare alleles. In regions where later maturing lines cannot be grown, haplotypes from germplasm in the earlier maturity groups are the most relevant. Addition of genetic diversity, as indicated by gene diversity and PIC based on lines unique to this diversity panel, was greater at early maturity groups and decreased at later groups due to the consideration of phenology. Although the phenology restricted diversity set overall presents a very minor reduction in gene diversity (0.3197 to 0.3079) and PIC (0.2606 to 0.2509) compared to the Goodman-Buckler diversity set (Table 2), it will serve as an effective resource for its intended use.
Genetic relatedness was calculated as 1 minus the pair-wise Rogers dissimilarity distance (Rogers, 1972) based on 511 unbiased SNP markers (Supplemental Table 2). Only six
of the 474,721 pairs had genetic similarity distances less than 0.5 on a scale from 0 to 1. This is likely an artifact of the biallelic assay used in this analysis. The UPGMA tree constructed from these values shows that the current diversity set is represented on all major branches and most subbranches (the blue and yellow lines in Fig. 2). Several primary branches contain only lines unique to this diversity panel (the blue lines in Fig. 2) indicating increased representation of many groups. Although there is a branch consisting of predominantly red lines denoting the genotypes from the Goodman-Buckler diversity set that do not meet phenology restriction (the Tropical and Tropical/Subtropical groups in Fig. 2), the presence of some yellow lines in this branch indicates that this group is not completely excluded in the current panel. Thus, while phenology restriction did not result in complete loss of representation of late maturity germplasm, it resulted in a major gain in diversity in the areas that did not exist previously.

Genetic Structure
The STRUCTURE (Pritchard et al., 2000) run with the maximum likelihood of the observed genotypes given the number of subpopulations in the model had a K value of eight. This run was used to assign the probability that a line belongs to each substructure group based on membership probability thresholds of 50%. By pedigree, groups 1 and 4 are primarily Stiff Stalk Synthetic (SSS) lines, group 2 is primarily popcorn, sweet corn, and flint lines, groups 3, 5, 7, and 8 are majority Non-Stiff Stalk Synthetic (NSS), and group 6 are tropical and subtropical lines (Fig. 2 and 3, Supplemental Table 1, and Supplemental Fig. 1). The group 1 SSS lines are mostly B73 and B14 types and the group 4 SSS lines are mostly B37 type. Group 3 NSS lines are mostly WF9 type, group 5 NSS lines are mostly Minnesota 13 type, group 7 NSS lines are mostly Oh43 and Iodent types, and the group 8 NSS lines are mostly Mo17 type.
The ideal association mapping panel will have subtle population substructure and familial relatedness (Zhu et al., 2008). False positive associations can be generated in LD analysis due to the unequal distribution of alleles within subpopulations (Flint-Garcia et al., 2003; Yu et al., 2006). Thus, it is important to determine the genetic substructure and include it in the model. In this diversity panel, population structure accounted
for between 3% (last leaf with epicuticular wax) and 22% (days to flowering [GDD]) of the phenotypic variation (Table 1). This moderate percentage of phenotypic variation explained by population substructure is desirable.

Comparison between Pedigree and Genetic Relatedness
Genotypes were grouped into NSS, SSS, tropical, popcorn, and sweet corn by pedigree and are color coded on the MDS plot (Fig. 4; Supplemental Table 1). The purpose of MDS plots is to provide a visual representation of the pattern of relatedness between inbred lines, such that closely related lines will be placed near each other on the plot. The SSS, tropical, and sweet corn classifications form relatively tight clusters on the MDS plot. The popcorn lines included in this diversity panel formed two distinct clusters, one cluster near the SSS group and one within the NSS cloud, consistent with previous observations in popcorn lines (Kantety et al., 1995). The NSS lines did not form tight clusters, which is consistent with the population substructure results. There are four population substructures representing the NSS lines while there are only two population substructures representing SSS. The MDS plot generated from only NSS and SSS classified lines further demonstrates this point (Fig. 5a). When only the NSS and SSS lines are examined there is very little overlap between the NSS groups. C103, Mo17, Oh43, Wf9, and PH207 appear in the pedigrees of 58 of the 171 lines in this panel classified as NSS by pedigree. Lines that have one of these founder lines in common by pedigree tend to cluster tightly by genetic distance (Fig. 5a). There are also subgroups by pedigree that are maintained by genetic distance within the SSS group (Fig. 5b). The SSS subgroups overlap more than the NSS because B73, B37, and B14 were all derived from BSSS. The pedigrees of the other SSS do not trace back to BSSS; however, they do trace back to other Stiff Stalk Synthetics. For example, there are seven GEM lines that were crossed to unknown elite inbred lines classified as SSS. These comparisons indicate that the same population stratification observed in maize by pedigree is also observed at the genetic level.
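
One simple numerical check of whether pedigree groups are also separated by genetic distance is to compare the mean pairwise distance within pedigree groups with the mean distance between groups. The sketch below is only an illustration of that idea, not the analysis performed in this study; the line names, group labels, and distance values are all hypothetical.

```python
from itertools import combinations


def mean_within_between(distance, groups):
    """Return (mean within-group distance, mean between-group distance).

    distance: dict mapping frozenset({line_a, line_b}) -> pairwise distance
    groups:   dict mapping line name -> pedigree group label
    """
    within, between = [], []
    for a, b in combinations(sorted(groups), 2):
        d = distance[frozenset((a, b))]
        if groups[a] == groups[b]:
            within.append(d)
        else:
            between.append(d)

    def mean(values):
        return sum(values) / len(values) if values else float("nan")

    return mean(within), mean(between)


# Hypothetical toy data: two lines per pedigree group.
toy_groups = {"line1": "SSS", "line2": "SSS", "line3": "NSS", "line4": "NSS"}
toy_distance = {
    frozenset(("line1", "line2")): 0.2,  # within SSS
    frozenset(("line3", "line4")): 0.3,  # within NSS
    frozenset(("line1", "line3")): 0.6,
    frozenset(("line1", "line4")): 0.7,
    frozenset(("line2", "line3")): 0.6,
    frozenset(("line2", "line4")): 0.7,
}
toy_within, toy_between = mean_within_between(toy_distance, toy_groups)
```

A within-group mean well below the between-group mean is the numerical counterpart of tight, well-separated clusters on an MDS plot.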
Figure 4. Multidimensional scaling plot of 313 lines based on 511 unbiased single nucleotide polymorphism (SNP) markers. Inbred lines with mixed, unrelated, or unknown pedigree were not included. Color coded classifications are based on pedigrees. Coordinates were determined using PROC MDS in SAS (SAS Institute, 2003) based on a matrix of Rogers distances (Rogers, 1972). NSS, Non-Stiff Stalk Synthetic; SSS, Stiff Stalk Synthetic.

The relationship between pairs of inbred lines was determined based on pedigree and genetic analysis. The parental contribution by pedigree for lines with unknown origin was considered zero for this analysis. In addition, lines derived from open pollinated populations were assumed unrelated due to the lack of sufficient information regarding the degree of relatedness. This, however, resulted in biased pair-wise relationships (Supplemental Table 2). To obtain a correlation value between pedigree and genetic relationships with minimal bias, pair-wise comparisons with a pedigree relationship of zero were removed from the analysis. The correlation comparison between relatedness by pedigree and relatedness by genotype shows a relatively high Pearson correlation (R² = 0.4892; p < 1 × 10⁻⁴) (Fig. 6). There are many reasons that the correlation between genetic and pedigree distances is significantly less than 1.00. The pedigree distances assume that there is no selection, mutation,
or drift when inbred lines are generated, the first of which is obviously violated in any directed breeding program. In addition, pedigree distances rely solely on identity by descent and disregard identity by state. Finally, the genetic distances determined by a biallelic assay are likely to be inflated due to the lack of allelic states. Although there were ambiguous or unknown pedigrees for some genotypes in this diversity panel, this high correlation between genetic and pedigree distances suggests the available pedigree information is reliable for many of the lines (Supplemental Table 1).

Flowering Time Association Analysis
To demonstrate the utility of this diversity panel for mapping, association analysis was conducted on a region known to be linked with flowering time. The flowering time trait was selected for this proof of concept to demonstrate that the phenology restriction placed on the diversity panel did not hinder its utility for association mapping.
It has been shown previously that multiple polymorphisms in Vgt1, a noncoding region upstream of ZmRap2.7, are associated with flowering time in maize (Salvi et al., 2007; Ducrocq et al., 2008). Ducrocq et al. (2008) reported that a 2-bp indel (CGindel587) showed the greatest association with flowering time in this region. CGindel587 is in high LD with a MITE insertion that is also associated with flowering time (Ducrocq et al., 2008). Due to the ease of genotyping, the MITE insertion was used rather than CGindel587. The MITE insertion was scored on 534 of the genotypes in this diversity panel. Flowering time data were collected on one replication in both 2008 and 2009. While genotype × environment interactions could not be evaluated, a highly significant Pearson correlation between the 2008 and 2009 data (r = 0.7576; p < 1 × 10⁻⁴) allowed us to use the average data across 2008 and 2009. Association analysis using a mixed model that accounts for both population substructure and kinship relationships has been shown to reduce both type I and type II error (Yu et al., 2006). Using this method, the MITE insertion was significantly associated with flowering time in this population (p = 3.08 × 10⁻⁶). None of the 511 SNPs analyzed showed any association with flowering time. This result is expected given the relatively low marker density in this study. This successful proof of concept related to the MITE insertion confirms the utility of this diversity panel for association mapping and lack of any negative consequences of phenology restrictions. With the added advantage of inclusion of only those lines maturing in the target area of its use, this panel will be an excellent resource for future association studies.

CONCLUSIONS
Previously described diversity panels offered a limited number of genotypes that mature in the upper Midwest region of the United States. Herein, we have described an expanded