USDA-ARS, Dale Bumpers National Rice Research Center, 2890 Hwy. 130 East, Stuttgart, AR 72160.
Abstract
Rice (Oryza sativa L.) end-use cooking quality is vital for producers and billions of consumers worldwide. Grain quality is a complex trait with interacting genetic and environmental factors. Deciphering the complex genetic architecture associated with grain quality provides essential information for improved breeding strategies to enhance desirable traits that are stable across variable climatic and environmental conditions. In this study, genome-wide association (GWA) analysis of three rice diversity panels, the USDA rice core subset (1364 accessions), the minicore (MC; 173 accessions after removing non-sativa accessions), and the high-density rice array–MC (HDMC; 383 accessions), with simple sequence repeat markers, single nucleotide polymorphism markers, or both, revealed large- and small-effect loci associated with known genes and previously uncharacterized genomic regions. Clustering of the significant regions in the GWA results suggests that multiple grain quality traits are inherited together. The 11 novel candidate loci identified for grain quality traits and the seven candidates identified for grain chalk are involved in the starch biosynthesis pathway. This study highlights the intricate pleiotropic relationships that exist in complex genotype–phenotype associations and provides greater insight into effective breeding strategies for grain quality improvement.

Core Ideas
• We characterized core and minicore subsets of the USDA National Small Grains Collection of global rice accessions for grain quality.
• We identified loci and candidate genes for grain quality and grain chalk traits in rice diversity panels.
• We detected loci with pleiotropic effects across multiple grain quality and agronomic traits.
• We demonstrated the utility of a minicore, selected to maximize diversity with a minimal panel size, for genome-wide association (GWA) discovery.

Abbreviations: AAC, apparent amylose content; ASV, alkali spreading value; BrL, brown rice grain length; BrW, brown rice grain width; Chk, grain chalk; FDR, false discovery rate; FNP, functional nucleotide polymorphisms; GBSS, granule bound starch synthase; GWA, genome-wide association; HD, days to heading; HDRA, high-density rice array; HDMC, HDRA–minicore; indel, insertion–deletion; MAF, minor allele frequency; MC, minicore; MC-pub, MC with phenotype data published prior to 2009; MC09, MC with phenotype data from 2009; MLM, mixed linear model; MSU7, Michigan State University Rice Genome Annotation Project Release version 7; NPBR, Nipponbare marker designed at the functional nucleotide polymorphism; PC, principal component; PHt, plant height; QTL, quantitative trait locus; RCS, USDA rice core subset collection; RDP1, Rice Diversity Panel 1; RDP2, Rice Diversity Panel 2; RgL, seed length; RgW, seed width; SNP, single nucleotide polymorphism; SS, soluble starch synthase; SSR, simple sequence repeat; WxIn1, Waxy Intron 1.

Citation: Huggins, T.D., M.-H. Chen, R.G. Fjellstrom, A.K. Jackson, A.M. McClung, and J.D. Edwards. 2019. Association Analysis of Three Diverse Rice (Oryza sativa L.) Germplasm Collections for Loci Regulating Grain Quality Traits. Plant Genome 12:170085. doi: 10.3835/

Received 30 Sept. 2017. Accepted 3 Apr. 2018.
*Corresponding author (
This is an open access article distributed under the CC BY-NC-ND license (

Crop germplasm collections preserve and provide access to useful genetic diversity that is critical for continued crop improvement (McCouch et al., 2013). These ex situ collections often comprise tens of thousands of plant accessions (Bockelman et al., 2003), and it is impractical to exhaustively explore the entire collection for most traits. For this reason, core collections are developed that represent phenotypic, genotypic, and geographical diversity with minimal redundancy. To enable more intensive phenotyping and genotyping, a subset
(or MC) of maximally diverse accessions may be selected from the core collection. Phenotypic and genotypic characterization of core and MC diversity panels facilitates the discovery of new useful alleles and the introduction of new diversity into breeding programs.

Large international germplasm collections target staple crops that are globally important for food security. Rice is one such staple grain and is consumed by billions of people worldwide (Maclean et al., 2002; Sweeney and McCouch, 2007). Rice is grown in over 100 countries and, through breeding, has become adapted to a wide range of climatic zones, environments, and cultural management practices (Muthayya et al., 2014). O. sativa has been subdivided into two distinct subspecies groups, JAPONICA and INDICA, on the basis of numerous studies of phylogeny, morphology, and genetics (Sweeney and McCouch, 2007). The JAPONICA group is further subdivided into aromatic, temperate japonica, and tropical japonica subpopulations, and the INDICA group is divided into aus and indica subpopulations.

There have been significant global efforts to collect, preserve, and characterize rice germplasm collections. The USDA-ARS National Small Grains Collection of rice consists of approximately 19,000 accessions collected over a century from 116 countries and serves as a diverse genetic resource. A USDA rice core subset (RCS) collection representative of the genetic diversity of the entire collection was selected from 114 countries; it consists of 1794 accessions of the Oryza genus and includes the species O. sativa, Oryza glaberrima Steud., Oryza rufipogon Griff., and Oryza nivara S.D.Sharma & Shastry. The accessions were chosen by random stratification to maintain genetic diversity (Yan et al., 2003a, 2007). The genetic diversity and population structure of the RCS collection were analyzed with 71 simple sequence repeat (SSR) markers and one insertion–deletion (indel) marker covering the entire genome, with genetic distances of approximately 30 cM between markers (Agrama et al., 2009). The USDA MC collection consists of 217 accessions that represent the genotypic and phenotypic diversity of the RCS (Agrama et al., 2009). The MC has been evaluated with SSR markers in numerous studies, identifying quantitative trait loci (QTLs) associated with agronomic traits (Agrama et al., 2009; Li et al., 2010, 2011), grain quality (Agrama et al., 2009), yield components and harvest index (Li et al., 2012), sheath blight resistance (Jia et al., 2012), hull silica content (Bryant et al., 2011), grain protein concentration (Bryant et al., 2013), cold tolerance (Schläppi et al., 2017), and starch biosynthesis (Li et al., 2017). More recently, a Rice Diversity Panel was developed that consists of several collections: Rice Diversity Panel 1 (RDP1), Rice Diversity Panel 2 (RDP2), and a collection from the National Institute of Agrobiology Sciences (McCouch et al., 2016). The accessions originated from approximately 92 countries, represent the five subpopulations of rice, and were genotyped with a fixed array of 700,000 single nucleotide polymorphisms (SNPs) (Liakat Ali et al., 2011; Zhao et al., 2011; Eizenga et al., 2014; McCouch et al., 2016).

Genome-wide association studies are an analytical tool that can decipher the relationship between a trait and its causal genomic region. The diverse phenotypic and genetic variation present in large sets of unrelated accessions can be studied to uncover the genetics underlying complex traits (Zhu et al., 2008; McCouch et al., 2016). Recently, more efficient genotyping techniques have produced large, high-quality SNP datasets. The manageable size and genetic diversity of the MC make it ideal for GWA analysis, and QTLs for pericarp color, amylose content, and seed length have been identified in it (Wang et al., 2016).

Genome-wide association, QTL mapping, and marker discovery present the opportunity to increase the efficiency of breeding improved varieties through marker-assisted selection. Increasing the economic value of the crop requires varieties that have high yield potential and superior grain quality. Rice grain quality encompasses a broad range of traits including grain shape, translucency, milling yield, cooking characteristics, sensory traits, and nutritional aspects (Fitzgerald and Resurreccion, 2009). Standard market classes of rice include short, medium, and long grains, which are determined by both grain dimensions and other specified physicochemical properties required for conventional markets. Translucent grains are desired for essentially all market classes except for opaque waxy (sweet) rice or the chalky rice used for risotto or paella (Calingacion et al., 2014). Chalky grains are considered to be low quality because of their poor grain appearance and the negative impact they have on rice cooking (Lisle et al., 2000) and milling quality (Khush et al., 1978; Kadan et al., 2008).

The market value of a rice variety is ultimately dependent on the end user, whether that is an industrial processor or a consumer. Preference studies have shown that there is tremendous global diversity in what are considered to be desirable sensory quality traits (Calingacion et al., 2014). Amylose content, which is predominantly controlled by the Waxy gene, granule bound starch synthase 1 (GBSS 1), is considered the most important determinant of cooking and sensory (texture) quality (Fitzgerald and Resurreccion, 2009). Single nucleotide polymorphisms within the GBSS 1 gene are associated with amylose content and starch paste viscosity curves, which are predictors of suitability for parboiling and canning processes (Chen et al., 2008a, 2008b). Soluble starch synthase IIa (SSIIa) controls the gelatinization temperature of starch granules in the rice grain, which is important in large-scale industrial processing. Single nucleotide polymorphisms in this gene (Alk) have been shown to differentiate varieties with high or intermediate gelatinization temperatures from those with low gelatinization temperatures (Umemoto and Aoki, 2005; Bao et al., 2006). The chalky endosperm is a result of disordered starch granules and small, rounded, loosely packed amyloplasts (Lisle et al., 2000; Chun et al., 2009). Grain milling yield, palatability, and texture are negatively affected by chalky endosperms (Lisle et al., 2000; Chun et al., 2009). Grain chalk is complexly inherited with 10 QTL being reported thus far,
Genetic Data

USDA Rice Core Collection Markers
The RCS was previously genotyped with 71 SSR markers and an indel as described by Agrama et al. (2009, 2010). One SSR marker was discarded because it had excessive polymorphism, whereas nine additional markers associated with amylose content and major blast (Magnaporthe oryzae) disease resistance genes were used to genotype 1364 accessions of the RCS. Four markers are specific to the Waxy gene of rice: three are SNPs in Waxy Intron 1 (WxIn1), Waxy Exon 6, and Waxy Exon 10; one is an indel in the WxIn1 region that is scored as a 3-bp variation in fragment size (Chen et al., 2010). Two functional markers specific to the Alk gene were developed. The ALK marker targets a region containing two SNPs at positions 6,752,887 and 6,752,888 of Nipponbare [Michigan State University Rice Genome Annotation Project Release version 7 (MSU7); http://rice.plantbiology.msu.edu/, accessed 14 Sept. 2018] that change from GC to TT (TGCCGCGCACCTGGAGC, forward wild-type; CGAGCCGCACAAGC, reverse) (Note: the ALK marker amplifies the functional nucleotide polymorphism in the Alk gene). The NPBR marker (Nipponbare marker designed at the functional nucleotide polymorphism) targets a single A to G SNP at position 6,752,756 of Nipponbare (CGGGTCGAACGCCGAAAC, forward wild-type; AACGGGTCGAACGCCGAAAT, forward mutant; GGCCTCAACCAGCTCTACGC, reverse). The NPBR marker contains a 1-nucleotide mismatch (A instead of C) added at the third base from the 3′ end of the allele-specific primers to aid in SNP detection. Polymerase chain reaction conditions for the ALK and NPBR markers were the same as those described in Costanzo et al. (2011) and used a 67°C annealing temperature. Individual markers were combined to generate haplotypes for Waxy and Alk and were scored as shown in Supplemental Table S1.

Minicore Resequencing
The MC collection was sequenced to an average depth of 1.5× by Wang et al. (2016). Because additional SNP resources are now available for O. sativa that can be used to improve SNP calling, and because nonimputed SNPs were desired for later steps, the SNP genotypes called by Wang et al. (2016) were not used. Instead, raw reads were downloaded from the sequence read archive (BioProject PRJNA301661). The SNPs were then called against the Nipponbare reference genome according to the Genome Analysis Toolkit best practices (McKenna et al., 2010). The HDRA SNP dataset (McCouch et al., 2016) was used for variant recalibration. The resulting SNP calls for the MC were then filtered to exclude SNPs with a minor allele frequency (MAF) below 0.05 and those with >60% missing data. In addition, because these are inbred lines and heterozygosity is expected to be low, SNPs with heterozygosity >5% were removed. Finally, because the seed generation that was phenotyped was not the same as the genotyped individuals, any remaining heterozygous sites were converted to missing data.

The HDRA Dataset
A genotypic dataset of 700,000 SNPs was generated with the HDRA technology as detailed by McCouch et al. (2016). The HDRA genotypic dataset of 1554 accessions, referred to as the HDRA panel, is the first of its kind in rice and captures much of the genetic variation in rice. This high-density SNP set affords much higher resolution than was previously available and can reveal genetic regions of both minor and major effect (McCouch et al., 2016). Single nucleotide polymorphisms from the HDRA dataset for RDP1 and RDP2 were obtained from (index.cfm, accessed 14 Sept. 2018). These SNPs were filtered for MAF, percentage of missing data, and percentage of heterozygosity across accessions as was done for the MC. Additionally, SNPs in the HDRA dataset were removed when non-Nipponbare reference alleles were detected in the Nipponbare controls.

Overlap between the HDRA and the RCS
The MC collection is a subset of the RCS, whereas the RDP1 and RDP2 collections genotyped by the HDRA partially overlap with the RCS collection (Fig. 1a). Because the HDRA and MC datasets are both based on SNPs called against the same reference genome sequence, it was possible to match SNP data from the RCS accessions in the HDRA dataset and generate a merged dataset of the resequencing-derived MC SNPs and the fixed-array-based HDRA SNPs, called the HDMC. The intersection of SNPs shared between the MC and HDRA genotype data was found with VCFTools (Danecek et al., 2011) on the basis of the SNP pseudomolecule coordinates. Only SNPs present in both datasets (HDRA and the sequenced MC) were used for the HDRA genotyped collection. The combined SNPs were filtered as was done for the MC. The compatibility of the two SNP datasets was validated by generating a neighbor-joining tree and verifying that the 23 lines shared between the MC and the HDRA genotyped collections appeared as nearest neighbors on the tree. Following this quality control step, when both MC resequencing and HDRA SNP data were available for an accession, the MC SNPs were used and the HDRA SNPs were discarded.

Population Structure of the HDMC
The compiled panel resulted in a total of 122,102 high-quality SNPs after filtering for MAF (0.05) and removing heterozygous sites. The resulting genotypic data for the 383 diverse accessions were analyzed for population structure with fastSTRUCTURE (Raj et al., 2014). The ‘’ option was used to infer the number of populations (k). Using the admixture model, k was assigned values from 4 to 10 to infer populations. The number of components that explain the structure was determined with the ‘’ option by parsing the output for each assigned k value. The expected admixture
proportions thus inferred were visualized in a distruct plot generated in fastSTRUCTURE (Raj et al., 2014) (Fig. 1b). Principal components (PCs) were calculated from the 122,102 SNPs by applying the principal component function in TASSEL version 5 (Bradbury et al., 2007).

Genome-Wide Association Mapping

The RCS Panel
A total of 83 markers (72 SSRs, 2 indels, 7 SNPs, and 2 haplotypes) were used to perform GWA analysis on the 1364 accessions for grain quality traits, HD, and PHt in the TASSEL version 5 pipeline (Bradbury et al., 2007; Zhang et al., 2010). Before GWA analysis, the markers were converted to ACGT± allele groups as specified for formatting in the TASSEL version 5 manual. The converted genotypic data were filtered with a MAF of 0.05. The filtered data were then used to calculate principal components and a centered identity-by-state kinship matrix. An MLM analysis was performed with the first three PCs and the kinship matrix as covariates to correct for the population structure and relatedness present in the RCS panel. The ‘EachMarker’ and ‘no-compression’ options in TASSEL version 5 (Bradbury et al., 2007), which calculate an association test for each marker and trait combination, were used in the MLM analysis. The p-values returned from the MLM analysis were subjected to false discovery rate (FDR) testing with the R package “qvalue” (Storey and Tibshirani, 2003) to reduce the likelihood of false positives. For the RCS panel, a significance threshold of 10⁻³ was calculated from the FDR correction.

The MC Panel
The 3.3 million SNPs from the MC resequencing were used for GWA analysis in TASSEL version 5 (Bradbury et al., 2007). The genotype data were filtered to retain SNPs with a MAF >0.05 and less than 30% missing sites. After filtering, 3.2 million SNPs remained. The remaining 173 accessions were used in the calculation of the kinship matrix and PCs. A centered identity-by-state kinship matrix and the first three PCs were used as covariates in an MLM to account for relatedness and subpopulation structure. Associations for each marker and trait were calculated using the ‘EachMarker’ and ‘no-compression’ options in TASSEL version 5 (Bradbury et al., 2007) in the MLM analysis.
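The SNP quality-control filters described above (minimum MAF, maximum proportion of missing calls, maximum heterozygosity, and conversion of residual heterozygous calls to missing) can be sketched in plain Python. This is a minimal illustration, not the GATK, VCFtools, and TASSEL workflow actually used: the 0/1/2/None genotype coding, the function names, and the choice to compute heterozygosity over called genotypes only are assumptions. The default thresholds follow the values given in the text (MAF 0.05 and 30% missing for the GWA panels; 5% heterozygosity for the MC resequencing data); passing max_missing=0.60 would mimic the initial MC resequencing filter.

```python
from typing import Dict, List, Optional

# Hypothetical genotype coding for inbred rice accessions:
# 0 = homozygous reference, 2 = homozygous alternate, 1 = heterozygous, None = missing
Genotype = Optional[int]


def snp_stats(calls: List[Genotype]) -> Dict[str, float]:
    """Minor allele frequency, missing rate, and heterozygosity for one SNP."""
    called = [g for g in calls if g is not None]
    missing = (len(calls) - len(called)) / len(calls)
    if not called:
        return {"maf": 0.0, "missing": missing, "het": 0.0}
    alt_freq = sum(called) / (2 * len(called))  # alternate alleles / called alleles
    het = sum(1 for g in called if g == 1) / len(called)  # assumption: over called genotypes
    return {"maf": min(alt_freq, 1.0 - alt_freq), "missing": missing, "het": het}


def filter_snps(
    snps: Dict[str, List[Genotype]],
    min_maf: float = 0.05,
    max_missing: float = 0.30,
    max_het: float = 0.05,
) -> Dict[str, List[Genotype]]:
    """Keep SNPs that pass the MAF, missingness, and heterozygosity filters.

    Residual heterozygous calls in retained SNPs are set to missing, because the
    phenotyped seed generation differs from the genotyped plants.
    """
    kept: Dict[str, List[Genotype]] = {}
    for snp_id, calls in snps.items():
        s = snp_stats(calls)
        if s["maf"] < min_maf or s["missing"] > max_missing or s["het"] > max_het:
            continue
        kept[snp_id] = [None if g == 1 else g for g in calls]
    return kept


if __name__ == "__main__":
    # Toy data for 25 hypothetical accessions (not data from this study)
    toy = {
        "snp_a": [0] * 12 + [2] * 13,              # common, complete: kept
        "snp_b": [0] * 25,                          # monomorphic (MAF = 0): removed
        "snp_c": [0] * 10 + [2] * 5 + [None] * 10,  # 40% missing: removed
        "snp_d": [0] * 12 + [2] * 12 + [1],         # 4% heterozygous: kept, het set to None
        "snp_e": [0] * 20 + [2] + [1] * 4,          # 16% heterozygous: removed
    }
    result = filter_snps(toy)
    print(sorted(result))  # ['snp_a', 'snp_d']
```

In this sketch a retained SNP keeps its homozygous calls unchanged, so downstream kinship and PC calculations would see the heterozygous sites only as missing data, which is the behavior described for the MC resequencing data.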
The HDMC Panel
In total, 123,121 SNPs for the 383 accessions were generated after merging and filtering the MC and the HDRA data. SNPs with MAF <0.05 or more than 30% missing sites were removed with the filtering options available in TASSEL version 5. After filtering, 122,102 SNPs remained for the 383 accessions. A centered identity-by-state kinship matrix and PCs were calculated with the kinship and PC options in TASSEL version 5. The GWA analysis was performed with an MLM incorporating the kinship matrix and the first three PCs as covariates. The ‘no-compression’ and ‘EachMarker’ options in TASSEL version 5 (Bradbury et al., 2007) were used in the analysis.

Post-Processing of GWA Results

Significant Regions
The raw output from TASSEL containing p-values for each SNP was processed with R scripts using the qqman package (Turner, 2014) to generate Manhattan plots and quantile–quantile plots. The q-values were calculated with the R package qvalue (Storey and Tibshirani, 2003). Because multiple significant SNPs can occur in close proximity to each other, a Perl script was used to process the TASSEL output with a threshold p-value for declaring a significant region and rules for determining the borders of the region based on the pseudomolecule distance separating adjacent significant SNPs. The distance threshold was 50,000 bp, and the significance threshold was –log10(p) > 6, except in cases of excessive numbers of significant SNPs, where a more stringent threshold was applied. The script reported the start, the end, and the position of the most significant SNP (peak SNP) within each region.

Candidate Gene Identification
To facilitate analysis of candidate genes, a Perl script was used to extract all annotated genes within each significant region from the MSU7 (Ouyang et al., 2007) and Rice Annotation Project (RAP1; http://rapdb.dna.affrc., accessed 14 Sept. 2018) (Sakai et al., 2013) rice gene annotations. The lists of genes generated by the script were then inspected to identify probable candidate genes on the basis of the annotated gene functions. Candidate genes were inspected via the gene annotation tracks in the Ricebase genome browser (, accessed 11 June 2018) (Edwards et al., 2016). Genes within 250 kb of the significant SNP position were reported as candidates.

Overlapping Regions
Significant regions were identified for each trait in each panel from the output generated by the Perl script mentioned above. An additional Perl script was used to compare significant regions for overlap across traits and to report clusters.

Results

Genome-Wide Association Analysis
In the RCS, GWA analysis identified three significant markers for AAC, six for ASV, two for brown rice grain length (BrL), one for brown rice grain width (BrW), six for HD, five for PHt, three for rice grain seed length (RgL), and one for seed width (RgW) in the combined (aus + indica + aromatic + temperate japonica + tropical japonica + admixed) group analysis (Table 2). Three significant markers were detected for AAC, two for ASV, and one each for HD and PHt in the INDICA (aus + indica + admixed aus–indica) group analysis (Table 2). In the JAPONICA (aromatic + temperate japonica + tropical japonica + admixed temperate–tropical–aromatic) group analysis, three significant markers were detected for AAC, five for ASV, two for HD, one for PHt, and one for RgW (Table 2). The three markers identified for AAC in the combined group were also identified in the INDICA and JAPONICA group analyses. Two markers for ASV (Alk_hap and ALK) were detected in all three groups, two additional markers were detected only in the combined and JAPONICA groups, one only in the combined group, and one only in the JAPONICA group (Table 2). For grain dimension traits, the markers were detected only in the combined group (one for BrW and three for RgL). Four of the detected HD markers were identified only in the combined group, one in the combined and INDICA groups, one in the combined and JAPONICA groups, and one only in JAPONICA, but none were common to INDICA and JAPONICA (Table 2). Three of the five detected PHt markers were identified only in the combined group, one in the combined and INDICA groups, and one in the combined and JAPONICA groups. The lone marker identified for RgW was detected in the combined and JAPONICA groups.

Genome-wide association analysis was conducted on the MC-pub data with the MC high-density resequenced genotypic dataset to investigate the genetic basis of grain quality. The analysis of AAC identified 51 significant genomic segments in the combined group, 60 in the INDICA group (–log10(p) > 8), and 13 in the JAPONICA group (Note: this excludes aromatics for MC09, HDMC, and MC), whereas for ASV, 21 segments were identified in the combined group, 35 in the INDICA group, and 12 in the JAPONICA group (Fig. 2a, 2b; Supplemental Table S2, Supplemental Table S3, Supplemental Table S4; Supplemental Fig. S1, Supplemental Fig. S2). Analysis of BrL identified 12 segments in the combined group, seven in the INDICA group, and two in the JAPONICA group (Fig. 2c; Supplemental Table S4). The other grain length trait, RgL, had 10 segments in the combined group, three in the INDICA group, and three in the JAPONICA group (Supplemental Fig. S7; Supplemental Table S4). For BrW, significant segments were identified in six regions in the combined group, five in the INDICA group, and two in the JAPONICA group (Fig. 2d; Supplemental Table S4). Five segments were identified in the combined group, five
Fig. 2. Genome-wide association analysis for amylose content, alkali spreading value, and grain length and width for the minicore (MC) with phenotype data published before 2009 (MC-pub) (top), the combined high-density rice array–MC (HDMC) (middle), and the MC with phenotype data from 2009 (MC09) (bottom) diversity panels. Manhattan plots of amylose content (a), alkali spreading value (b), brown rice grain length (c), and brown rice grain width (d). Manhattan plots illustrate the p-values obtained from a mixed linear model with high-quality single nucleotide polymorphisms (SNPs) for each trait evaluated. The x-axis displays SNPs along chromosomes and the y-axis displays the –log10(p) value for each SNP. The significance threshold is represented by the black horizontal line on each Manhattan plot. Single nucleotide polymorphisms with p-values less than 10⁻⁶ were classified as significant.
combined group, 10 in INDICA and five in JAPONICA were located in previously reported grain chalk regions in the biparental populations Lemont/TeQing (Zhao et al., 2016) and KBNT lpa/Zhe733 (Edwards et al., 2017) (Table 3). Analysis of grain protein content identified significant segments on all chromosomes. Thirty-five segments were detected in the combined group, seven in the INDICA group, and 39 in the JAPONICA group (Supplemental Table S4; Supplemental Table S6).

Overlapping Genomic Segments
Significant chromosomal regions identified from the GWA results were compared across all traits in the MC-pub, HDMC, and MC09 panels. The chromosomal segment comparison identified genetic regions associated with multiple traits (Fig. 4). Fifty-two significant regions for grain chalk overlapped the regions of other traits. Forty-one of these regions overlapped with AAC, 13 with ASV, and 10 with both AAC and ASV (Fig. 4; Supplemental Table S4). Eight of the detected grain chalk regions did not overlap with either AAC or ASV; however, six of these regions overlapped HD regions (Supplemental Table S4). Grain chalk regions overlapped with a total of four grain length regions and three grain width regions. Additionally, three grain chalk regions overlapped with regions shared by both AAC and grain protein, and two overlapped with regions shared by ASV and grain protein (Supplemental Table S4).
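The Methods describe Perl scripts (not reproduced here) that merge significant SNPs into regions using a 50,000-bp distance rule and a –log10(p) > 6 threshold, and then compare regions across traits for overlap. The Python sketch below is a hypothetical re-implementation under stated assumptions, not the authors' code: significant SNPs separated by no more than 50,000 bp are merged; region borders are padded by 50,000 bp beyond the outermost significant SNPs (our reading of the Table 3 footnote); and two regions are assumed to overlap if they share any base pair on the same chromosome. The adaptive, more stringent threshold used for regions with excessive numbers of significant SNPs is not modeled. The function names, the (chromosome, position, p-value) tuple layout, and the toy positions are illustrative only.

```python
from typing import Dict, List, NamedTuple, Tuple


class Region(NamedTuple):
    chrom: int
    start: int     # bp, padded upstream of the first significant SNP (assumption)
    stop: int      # bp, padded downstream of the last significant SNP (assumption)
    peak_pos: int  # position of the most significant SNP (peak SNP)
    peak_p: float  # p-value of the peak SNP


Hit = Tuple[int, int, float]  # (chromosome, position in bp, p-value)


def _close_region(group: List[Hit], flank: int) -> Region:
    """Summarize a run of nearby significant SNPs as one region."""
    chrom, peak_pos, peak_p = min(group, key=lambda hit: hit[2])
    return Region(chrom, max(0, group[0][1] - flank), group[-1][1] + flank, peak_pos, peak_p)


def call_regions(hits: List[Hit], p_threshold: float = 1e-6,
                 max_gap: int = 50_000, flank: int = 50_000) -> List[Region]:
    """Merge significant SNPs separated by <= max_gap bp into regions."""
    significant = sorted(h for h in hits if h[2] < p_threshold)  # by chromosome, then position
    regions: List[Region] = []
    group: List[Hit] = []
    for hit in significant:
        prev = group[-1] if group else None
        if prev is not None and (hit[0] != prev[0] or hit[1] - prev[1] > max_gap):
            regions.append(_close_region(group, flank))
            group = []
        group.append(hit)
    if group:
        regions.append(_close_region(group, flank))
    return regions


def overlapping_regions(by_trait: Dict[str, List[Region]]) -> List[Tuple[str, str, int, int, int]]:
    """Return (trait_a, trait_b, chrom, overlap_start, overlap_stop) for each overlapping pair."""
    traits = sorted(by_trait)
    pairs = []
    for i, trait_a in enumerate(traits):
        for trait_b in traits[i + 1:]:
            for ra in by_trait[trait_a]:
                for rb in by_trait[trait_b]:
                    if ra.chrom == rb.chrom and ra.start <= rb.stop and rb.start <= ra.stop:
                        pairs.append((trait_a, trait_b, ra.chrom,
                                      max(ra.start, rb.start), min(ra.stop, rb.stop)))
    return pairs


if __name__ == "__main__":
    # Toy association results (hypothetical positions and p-values)
    chalk_hits = [(1, 100_000, 1e-8), (1, 130_000, 5e-7), (1, 400_000, 2e-7), (2, 50_000, 0.01)]
    aac_hits = [(1, 160_000, 3e-9)]
    regions = {"Chk": call_regions(chalk_hits), "AAC": call_regions(aac_hits)}
    print(overlapping_regions(regions))  # [('AAC', 'Chk', 1, 110000, 180000)]
```

Here the two chalk SNPs 30,000 bp apart collapse into one region with its peak at the stronger SNP, while the SNP at 400,000 bp forms a separate region; only the first chalk region shares sequence with the AAC region. A pairwise comparison like this is quadratic in the number of regions, which is adequate for the tens of regions per trait reported here.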
Table 3. Summary of significant genome-wide association analysis single nucleotide polymorphisms (SNPs) for grain chalk in the minicore diversity panel. The analysis consisted of 3.2 million SNPs. Markers detected at or above the threshold [–log10(p) = 6] were considered significant.

Chromosome  Start† (bp)   Stop† (bp)    Peak SNP‡ (bp)  Peak_val‡
1¶          1,654,788     1,754,788     1,704,788       6.12 × 10⁻⁷
1           12,313,419    12,489,991    12,363,419      1.97 × 10⁻⁸
1           25,378,734    25,478,736    25,428,734      8.45 × 10⁻⁷
1§¶         31,444,903    31,545,511    31,494,903      4.34 × 10⁻⁷
2§          23,027,141    23,127,141    23,077,141      4.46 × 10⁻⁷
2           24,474,052    24,574,756    24,524,052      2.02 × 10⁻⁸
3           3,707,484     3,822,358     3,757,484       5.94 × 10⁻⁷
3           9,447,631     9,547,631     9,497,631       4.47 × 10⁻⁷
3           22,388,199    22,502,385    22,438,199      8.00 × 10⁻⁷
4¶          4,754,567     4,991,082     4,804,567       2.53 × 10⁻⁹
4¶          5,218,584     5,358,637     5,291,778       8.65 × 10⁻¹⁰
4           11,551,491    11,879,693    11,601,491      1.13 × 10⁻⁹
4           20,550,548    20,650,548    20,600,548      3.47 × 10⁻⁷
4§          22,029,754    22,129,754    22,079,754      5.84 × 10⁻⁷
5¶          3,041,626     3,141,626     3,091,626       5.07 × 10⁻⁷
5§¶         3,725,967     3,825,967     3,775,967       1.73 × 10⁻⁸
5§          4,111,225     4,211,225     4,161,225       9.65 × 10⁻⁷
5           5,335,281     5,510,712     5,396,755       2.40 × 10⁻⁷
6§          1,729,126     1,876,635     1,779,126       1.07 × 10⁻¹⁰
6           4,575,534     4,775,534     4,725,534       1.95 × 10⁻⁷
6           8,969,806     9,071,605     9,019,806       9.67 × 10⁻⁷
6           15,657,572    15,757,572    15,707,572      8.33 × 10⁻⁷
7           7,289,618     7,460,150     7,339,618       1.56 × 10⁻⁷
7           22,964,373    23,221,163    23,014,373      3.74 × 10⁻⁷
8           15,757,626    16,025,232    15,961,296      1.28 × 10⁻⁹
8§          17,474,125    17,686,054    17,524,125      2.13 × 10⁻⁷
8           18,447,974    19,316,518    19,010,220      4.12 × 10⁻⁷
8           19,570,551    19,670,551    19,620,551      4.37 × 10⁻⁷
8           19,891,463    20,103,321    20,031,164      2.19 × 10⁻¹⁰
8           24,524,827    24,624,827    24,574,827      5.92 × 10⁻⁷
8           26,027,610    26,127,610    26,077,610      3.11 × 10⁻⁷
10          4,262,440     4,362,440     4,312,440       6.34 × 10⁻⁸
11§         8,116,105     8,333,235     8,166,105       4.75 × 10⁻¹⁰
11§         24,255,726    24,355,726    24,305,726      5.42 × 10⁻⁷
12          9,221,916     9,321,916     9,271,916       1.00 × 10⁻⁶

† Start and Stop indicate the regions 50,000 bp upstream and downstream of the peak SNP, respectively.
‡ Peak SNP, most significant SNP in the region; Peak_val, the p-value of the most significant SNP.
§ Regions identified in the Lemont × TeQing biparental populations for grain chalk.
¶ Regions identified in the KBNT lpa × Zhe733 biparental populations for grain chalk.

Fig. 4. Heat map of the overlapping genome segments significantly associated with multiple traits in the rice core subset (RCS), the minicore (MC) with data from before 2009 (MC-pub), the MC with phenotype data from 2009 (MC09), or the combined high-density rice array–MC panels (HDMC). The segments (horizontal) are clustered according to the patterns of shared significant traits (vertical), with red indicating a significant association between that chromosome region and the trait and blue indicating no detected association. Known genes contained within segments associated with grain chalk are annotated on the left.

Candidate Genes Identified for Grain Quality
Some of the significant segments identified in the GWA analysis were in proximity to or located within known and characterized genes. A few of these major genes include Grain Size 3 (Os03g0407400), Grain Weight 5 (DQ991205), the dwarf genes semi-dwarf 1 (Os01g0883800) and OsGH3.1 (Os01g0785400), and the starch biosynthesis genes Waxy (Os06g0133000) and SSIIa (Os06g0229800).

Perl scripts and the MSU7 gene annotation tracks in Ricebase (Edwards et al., 2016) were used to identify candidate genes within 200 kb of significant segments. Six potential candidate genes were identified for AAC: a Na–Ca exchanger gene (LOC_Os02g43110), a triose phosphate translocator gene (LOC_Os05g07870), a cellulose synthase gene (LOC_Os06g39970), a trehalose phosphatase gene
2). RM1339 is located ~184 kb upstream and RM431 is located ~512 kb downstream of the “Green Revolution” semi-dwarf 1 (Os01g0883800) gene, in which a mutation reduces plant height by affecting the final stages of gibberellin biosynthesis (Cho et al., 1994; Monna et al., 2002; Spielmeyer et al., 2002). The semi-dwarf 1 gene is a major-effect gene and produces a ‘mountain range’ distribution of significant SNPs in this region, as described by Atwell et al. (2010). The marker RM489 is significantly associated with HD and grain length. Closer examination of the region showed that it lies between the dwarf gene OsBP-73 (Os03g0183100) and the plant height gene TIFY11b (Os03g0181100), which are located ~21 kb upstream and ~84 kb downstream, respectively. The dwarf gene OsBP-73 inhibits plant growth by reducing tiller and panicle number and shortening culms (Chen et al., 2003), whereas TIFY11b increases plant height and seed size through pronounced accumulation of stem carbohydrates (Nakamura et al., 2007; Hakata et al., 2012). Previous studies have reported similar pleiotropic associations between HD and grain length within a single chromosomal region; for example, the Ghd8 gene affects grain yield, HD, and PHt (Yan et al., 2011).

Possible Candidate Genes
The high resolution afforded by GWA analysis allows detection and identification of regions that are significantly associated with traits of interest. Possible candidate genes were identified on the basis of the biological function of the surrounding characterized genes and the presence of significant regions within 200 kb of a known gene in rice (Supplemental Table S5). The AAC candidate gene, LOC_Os05g07870, is located ~860 kb downstream of a major grain chalk gene, chalk5, which encodes a vacuolar pyrophosphatase with H⁺-translocation activity. The LOC_Os05g07870 gene is characterized as a triose phosphate-related gene. Triose phosphates play an important role in the source–sink relationship in plants; their transport across the chloroplast envelope regulates the export of carbon for sucrose synthesis in the cytosol, connecting photosynthesis, starch synthesis, and glycolysis (Jin-Yue et al., 2004; Toyota et al., 2006). LOC_Os07g30160 encodes a trehalose-6-phosphatase, the enzyme that dephosphorylates trehalose-6-phosphate; trehalose-6-phosphate has been shown to regulate starch use in plants and is an indicator of plant sucrose status (Wingler et al., 2000; Schluepmann et al., 2004; Lunn et al., 2006; Ponnu et al., 2011). More importantly, trehalose metabolism is a known regulator of starch metabolism in plants and specifically induces starch accumulation and synthesis (Wingler et al., 2000; Lunn et al., 2006). The LOC_Os08g30210 gene encodes 1-aminocyclopropane-1-carboxylate oxidase, which is involved in ethylene biosynthesis and thereby regulates plant developmental stages and stress tolerance (Ruduś et al., 2013). Previous studies have reported that high nighttime temperatures can affect endosperm development, resulting in poor packing of starch granules (Ambardekar et al., 2011; Lanning et al., 2011). 1-Aminocyclopropane-1-carboxylate oxidase has been reported to be involved in heat stress tolerance and may affect starch metabolism (Li et al., 2015).

The ASV candidate gene, LOC_Os07g46790, encodes Disproportionating Enzyme 1, a 4-α-glucanotransferase. This protein is involved in starch synthesis and also affects amylose content, amylopectin structure, and starch granule size (Colleoni et al., 1999; Dong et al., 2015). Suppression of Disproportionating Enzyme 1 increased amylose content; reduced the proportions of amylopectin chains with a degree of polymerization (DP) of 6 to 8 and of 16 to 36 glucose units but increased those with DP 9 to 15; and produced loosely packed starch granules in the rice endosperm. Overexpression reduced amylose content; increased the proportions of amylopectin chains with DP 6 to 10 and DP 23 to 38 but reduced those with DP 11 to 22; and produced tightly packed starch granules (Dong et al., 2015).

The candidate gene for grain chalk, LOC_Os03g07480, encodes a sucrose transporter and is located ~1.4 Mb downstream of a low phytic acid gene (XS-lpa2, Os03g0142800) and ~1.0 Mb upstream of rice myo-inositol 3-phosphate synthase 1 (RINO1, Os03g0192700) (Supplemental Table S5). Sucrose transporters are proton-coupled uptake transporters that transport sucrose, maltose, and α- and β-glucosides into sink tissues and the phloem (Kühn and Grof, 2010; Ayre, 2011; Reinders et al., 2012). Edwards et al. (2017) reported that the low phytic acid gene on chromosome 2 (OsLpa1) was a likely cause of grain chalkiness in the KBNT lpa × Zhe733 biparental mapping population. Phytic acid biosynthesis genes regulate seed P and have been reported to be influenced by abscisic acid during seed development (Yoshida et al., 2002; Matsuno and Fujimura, 2014). Phytic acid, abscisic acid, and sucrose accumulate in rice seeds during the same developmental period, and abscisic acid has been reported to regulate grain filling along with sucrose (Akihiro et al., 2005; Tang et al., 2009).

Another grain chalk candidate gene, LOC_Os05g06160, encodes a trehalose phosphatase and is located ~244 kb upstream of the chalk5 gene identified by Li et al. (2014). This chromosomal region has previously been reported to contain a grain chalk QTL (qBCHK5) (Edwards et al., 2017). Two other possible candidate genes for grain chalk were identified on chromosome 5: LOC_Os05g07130 (a fructose-6-phosphate-2-kinase gene) and LOC_Os05g07750 (a sugar transporter gene), located ~436 kb and ~800 kb downstream of chalk5, respectively (Supplemental Table S5). Fructose-6-phosphate-2-kinase is a bifunctional enzyme that modulates fructose-2,6-bisphosphate levels in plants. It is primarily expressed in leaves and regulates leaf sucrose levels (Park et al., 2007; Udomchalothorn et al., 2009). Sugar transporters mediate the movement of sugars from source to sink tissues, especially during grain filling (Kühn and Grof, 2010; Ayre,
Future Research
The genomic regions and candidate genes identified in this study will be targeted for the development of gene-specific markers. These markers could be used to validate the GWA results in biparental populations and to assess the diversity panels fully. Validated markers will be tested in panels of breeding lines and subsequently deployed in marker-assisted selection to accelerate breeding for grain quality. Knowledge of the pleiotropic effects of the various loci is also critical for their deployment in breeding, so that grain defects such as chalk can be reduced without adversely affecting other grain quality traits.

group (bottom). All aromatics and other admixtures were removed from analysis.
Supplemental Figure S2. Genome-wide analysis for alkali spreading value (ASV) using the minicore (MC-pub) data. Manhattan and quantile–quantile plots from genome-wide analysis with NGS SNPs for the trait evaluated. For each Manhattan plot, the x-axis displays SNPs along chromosomes and the y-axis displays the −log10(p) value for each SNP. The significance threshold is represented by the black horizontal line on each Manhattan plot; SNPs with p-values less than 10⁻⁶ (−log10(p) > 6) were classified as significant. For each quantile–quantile plot, the x-axis displays the expected −log10(p) distribution across the SNPs and the y-axis displays the observed −log10(p) distribution.
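The significance rule and the two plot axes described for these figures can be illustrated with a short Python sketch. It is a minimal illustration under stated assumptions, not the analysis pipeline used in this study: SNPs are flagged when p < 10⁻⁶ (equivalently −log10(p) > 6), and the expected quantile–quantile values use the common (i − 0.5)/n plotting positions for uniformly distributed null p-values, which is an assumed convention. The function names and SNP identifiers are hypothetical, and the p-values are invented.

```python
import math

# Threshold used for the Manhattan plots: p < 1e-6, i.e. -log10(p) > 6.
P_THRESHOLD = 1e-6


def neg_log10(p: float) -> float:
    """Convert a p-value to the -log10(p) scale plotted on the y-axis."""
    return -math.log10(p)


def significant_snps(pvalues: dict[str, float],
                     threshold: float = P_THRESHOLD) -> list[str]:
    """Return IDs of SNPs with p below the threshold, most significant first."""
    hits = [snp for snp, p in pvalues.items() if p < threshold]
    return sorted(hits, key=lambda snp: pvalues[snp])


def qq_points(pvalues: list[float]) -> list[tuple[float, float]]:
    """Pair expected and observed -log10(p) values for a quantile-quantile plot.

    Assumption: under the null hypothesis p-values are uniform on (0, 1), so the
    i-th smallest of n p-values is expected near the plotting position (i - 0.5) / n.
    """
    n = len(pvalues)
    observed = sorted(pvalues)  # smallest p (largest -log10 p) first
    return [(neg_log10((i - 0.5) / n), neg_log10(p))
            for i, p in enumerate(observed, start=1)]


if __name__ == "__main__":
    # Hypothetical SNP IDs with invented p-values, for illustration only.
    pvals = {"snp_a": 3.2e-8, "snp_b": 0.04, "snp_c": 7.5e-7, "snp_d": 0.51}
    print(significant_snps(pvals))  # ['snp_a', 'snp_c']
    for expected, observed in qq_points(list(pvals.values())):
        print(f"expected {expected:.2f}  observed {observed:.2f}")
```

The strict inequality means a SNP with p exactly 10⁻⁶ is not flagged, matching the "less than" wording of the threshold rule.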
huggins et al .
