Panchy Et Al (2016) - Plant Physiology


Topical Review on Gene Duplication

Evolution of Gene Duplication in Plants1[OPEN]


Nicholas Panchy, Melissa Lehti-Shiu, and Shin-Han Shiu*
Genetics Program (N.P., S.-H.S.) and Department of Plant Biology (M.L.-S., S.-H.S.), Michigan State University,
East Lansing, Michigan 48824
ORCID IDs: 0000-0002-1551-3517 (N.P.); 0000-0003-1985-2687 (M.L.-S.); 0000-0001-6470-235X (S.-H.S.).

Ancient duplication events and a high rate of retention of extant pairs of duplicate genes have contributed to an abundance of
duplicate genes in plant genomes. These duplicates have contributed to the evolution of novel functions, such as the
production of floral structures, induction of disease resistance, and adaptation to stress. Additionally, recent whole-genome
duplications that have occurred in the lineages of several domesticated crop species, including wheat (Triticum aestivum),
cotton (Gossypium hirsutum), and soybean (Glycine max), have contributed to important agronomic traits, such as grain
quality, fruit shape, and flowering time. Therefore, understanding the mechanisms and impacts of gene duplication will
be important to future studies of plants in general and of agronomically important crops in particular. In this review, we
survey the current knowledge about gene duplication, including gene duplication mechanisms, the potential fates of
duplicate genes, models explaining duplicate gene retention, the properties that distinguish duplicate from singleton
genes, and the evolutionary impact of gene duplication.

1 This work was supported by the National Science Foundation (grant nos. MCB–1119778 and IOS–1126998 to S.-H.S.).
* Address correspondence to shius@msu.edu.
N.P. and S.-H.S. analyzed the data; N.P., M.L.-S., and S.-H.S. wrote the article.
[OPEN] Articles can be viewed without a subscription.
www.plantphysiol.org/cgi/doi/10.1104/pp.16.00523

ADVANCES

• On average, 65% of annotated genes in plant genomes have a duplicate copy. Of these, most were derived from WGD, consistent with the prevalence of paleopolyploidization events in the land plant lineage.
• Multiple mechanisms contribute to duplicate retention. Notably, mechanisms that do not require the evolution of new functions (e.g. dosage balance) may play an important role in the initial retention of duplicate genes.
• Retained duplicates and those that have reverted back to single copy can be predicted with models that incorporate gene functions and multiple sequence properties.
• Fitness differences between two locally adapted plant populations can be explained by genetic variation in duplicate genes involved in abiotic stress tolerance.

Distinct from other eukaryotic genomes, plant genomes tend to evolve at higher rates, leading to higher genome diversity (Kejnovsky et al., 2009; Murat et al., 2012). For example, differences in genome size between closely related plant species are much larger than between other closely related eukaryotes. Among dicotyledonous species that diverged approximately 150 million years ago (MYA), genome size ranges from merely 63 Mb in the carnivorous Genlisea margaretae (Greilhuber et al., 2006) to approximately 150 Gb in the canopy plant Paris japonica (Pellicer et al., 2010). This 2,000-fold difference in genome size among dicots is in stark contrast to that observed among the mammalian species that also radiated approximately 150 MYA (Warren et al., 2008), where genome size ranges from approximately 1.6 Gb in Carriker’s round-eared bat (Smith et al., 2013) to approximately 8 Gb in the tetraploid red viscacha rat (Gallardo et al., 1999).

Plant genomes also have an abundance of duplicate genes. Whole-genome duplication (WGD) has occurred multiple times over the past 200 million years of angiosperm evolution (Lyons et al., 2008; Soltis et al., 2009, 2014; Lee et al., 2013; Renny-Byfield and Wendel, 2014), and genomic sequencing continues to reveal new events (Velasco et al., 2010; D’Hont et al., 2012; Wang et al., 2012; Lu et al., 2013; Myburg et al., 2014; Wang et al., 2014b). In contrast, the most recent WGD event occurred approximately 450 MYA in the lineage leading to humans (Panopoulou et al., 2003; Dehal and Boore, 2005) and approximately 200 MYA in the budding yeast lineage (Wolfe and Shields, 1997; Kellis et al., 2004). Strikingly, many plant species also comprise mixed populations of diploid and polyploid individuals, illustrating the prevalence of polyploidy in plants (Husband et al., 2013). For example, 2.4% of Lythrum salicaria populations have both diploid and polyploid individuals (Kubatova et al., 2008), and this percentage is even higher (greater than 60%) for Chamerion angustifolium (Sabara et al., 2013) and Actinidia chinensis (Li et al., 2010).

WGD, or polyploidization, is an extreme mechanism of gene duplication that leads to a sudden increase in

2294 Plant Physiology®, August 2016, Vol. 171, pp. 2294–2316, www.plantphysiol.org © 2016 American Society of Plant Biologists. All Rights Reserved.
Origin and Evolution of Duplicate Genes

both genome size and the entire gene set. However, it is not the only mechanism that gives rise to duplicated genes. In general, gene duplication generates two gene copies; this theoretically allows one or both to evolve under reduced selective constraint and, on some occasions, to acquire novel gene functions that contribute to adaptation. There is little question that duplicate genes have contributed to novel traits over the course of plant evolution (Van de Peer et al., 2009b). Through comparative analyses of an ever-increasing number of plant genome sequences and functional genomic data sets, we now have an unprecedented understanding of how genes are duplicated, how duplicated genes evolve new functions, and the impact of gene duplication on genome evolution (Conant and Wolfe, 2008; Freeling et al., 2015; Soltis et al., 2015).

Gene duplication is but one type of genomic change that can lead to evolutionary novelties. Novel functions can arise from the co-option of existing genes (True and Carroll, 2002), new genes can arise de novo from intergenic space (Tautz and Domazet-Lošo, 2011; Schlötterer, 2015), and new transcriptional regulatory sites can come into existence that alter gene expression (Wray et al., 2003). In addition, although in this review we focus only on genes, the duplication of other genomic features, including regulatory regions (Nourmohammad and Lässig, 2011), transposable elements (TEs; Lisch, 2013), and repeat elements (Sharopova, 2008), has been reported to influence gene expression and function. Nonetheless, gene duplication remains of specific interest both because of the abundance of plant gene duplicates and their potential to contribute to plant novelties. The goal of this review is to provide an overview of our current state of knowledge about plant gene duplication and its significance. We first focus on the prevalence of gene duplication in plants and the mechanisms that contribute to gene duplication. We then discuss the fate of duplicate genes and the factors that influence whether a duplicate is retained or not. Finally, we consider the influence of duplicate genes on the evolution of plant species and agronomically important traits.

PREVALENCE AND MECHANISMS OF DUPLICATION

Predominance of Duplicate Genes in Plant Genomes

In the green lineage, gene numbers range from 8,166 in the unicellular green alga Ostreococcus tauri (Derelle et al., 2006) to approximately 95,000 in bread wheat (Triticum aestivum; Brenchley et al., 2012). What proportion of genes within each species have shared common ancestry due to duplication (Fitch, 1970)? Although a paralog is well defined conceptually, the criteria (e.g. the threshold sequence similarity) and data sets used to identify paralogs vary among studies, and thus direct comparisons are difficult. For example, between 16% (barley [Hordeum vulgare]) and 49% (rice [Oryza sativa]) of plant genes were defined as paralogous based on transcript data (Blanc and Wolfe, 2004a); however, genome sequence data have yielded paralog frequencies as high as approximately 75% in soybean (Glycine max; Schmutz et al., 2010). In Arabidopsis (Arabidopsis thaliana), the estimate of duplicate gene content ranges from 47% (Blanc and Wolfe, 2004a) to 63% (Ambrosino et al., 2016) due to differences in gene models, methodology, and parameters (i.e. similarity cutoffs).

To obtain a comparable estimate of duplicate gene number across plant genomes, we applied a common methodology and similarity threshold to identify duplicate genes in 41 sequenced land plant genomes (Fig. 1). On average 64.5% of plant genes are paralogous, ranging from 45.5% in the bryophyte Physcomitrella patens to 84.4% in apple (Malus domestica). Given that ancient and/or fast-evolving paralogs are not easily detected due to sequence divergence, these percentages are likely underestimates. Total genic content is correlated significantly with both paralog content (r² = 0.46, P < 7e-6) and the presence of a reported polyploidization event (r² = 0.35, P < 8e-4), demonstrating the large contribution of duplication, particularly WGD, to differences in gene content among plant species.

Another way to illustrate the preponderance of plant duplicates is to look at the number of paralogs within gene families (Dayhoff, 1976). In a survey of eight diverse plant species, the percentage of genes belonging to gene families ranges from 40% in the green alga Chlamydomonas reinhardtii to 95% in the lycophyte Selaginella moellendorffii, with most species having in excess of 65% familial genes (Guo, 2013). Although the proportion of familial genes in plant genomes is high, there can be dramatic differences in the size of gene families across species due to lineage-specific expansions (Lespinet et al., 2002). For example, one of the largest families in plants is the protein kinase superfamily, which has 426 members in the unicellular green alga C. reinhardtii and 2,532 in Eucalyptus grandis (Lehti-Shiu and Shiu, 2012). Another example illustrating the variation in plant gene family size is the large difference in the number of transcription factors, which can differ more than 10-fold among plant species (Jin et al., 2014).

At the other extreme, some genes have few or no paralogs. For example, there is only one Arabidopsis gene encoding DNA gyrase A, despite the fact that repeated rounds of WGD would have generated gyrase A duplicates in the past. Thus, not all gene families are created equal. What contributes to these large differences in gene family size? Integrated analysis of gene family and functional annotation data led to the finding that plant genes involved in transcriptional regulation, signal transduction, and stress response tend to have paralogs (Blanc and Wolfe, 2004b; Maere et al., 2005; Shiu et al., 2005; Hanada et al., 2008) but those involved in essential functions, such as genome repair, genome duplication, and organelles, tend not to (Li et al., 2016). The correlation between duplication and function also appears to be influenced by how duplicates are made (Hanada et al., 2008). For example, transcription factors
Plant Physiol. Vol. 171, 2016 2295
Panchy et al.

Figure 1. Duplication events and paralogous gene content in selected plant species. Left, Phylogeny of selected plant species.
Duplication (squares), triplication (hexagons), and undefined (circles) polyploidization events are indicated on the tree. Middle,
Total (blue) and duplicated (pink) gene numbers in each species. A gene is regarded as duplicated if it is significantly similar to
another gene in a BLAST (Altschul et al., 1997) search (identity ≥ 30%, aligned region ≥ 150 amino acids, expect value ≤ 1e-5).
Right, Species names. The data used to generate this figure were obtained from CoGe and Phytozome 11 (Lyons and Freeling,
2008; Lyons et al., 2008; Goodstein et al., 2012) as well as genome annotations for Eucalyptus grandis (Myburg et al., 2014),
Panicum virgatum (Lu et al., 2013), P. patens (Rensing et al., 2008), Salix purpurea (Phytozome), Populus trichocarpa (Tuskan
et al., 2006), and Spirodela polyrhiza (Wang et al., 2014b). References for the information aggregated on CoGe can be found at
https://genomevolution.org/wiki/index.php/Plant_paleopolyploidy.
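
The similarity thresholds given in the Figure 1 legend can be written as a simple filter over tabular BLAST hits. What follows is a minimal sketch, not the authors' pipeline. It assumes hits in the standard 12-column tab-separated layout (query, subject, percent identity, alignment length, ..., expect value, bit score) from a protein-protein search, so that alignment length is counted in amino acids. The function name `duplicated_genes` and the sample hits are hypothetical.

```python
import csv
import io

# Thresholds quoted in the Figure 1 legend.
MIN_IDENTITY = 30.0   # percent identity
MIN_ALIGNED = 150     # aligned region, in amino acids
MAX_EVALUE = 1e-5     # expect value

def duplicated_genes(blast_tabular):
    """Return the set of genes that have at least one qualifying non-self hit.

    `blast_tabular` is a file-like object of 12-column tab-separated hits
    (assumed layout: qseqid, sseqid, pident, length, ..., evalue, bitscore).
    """
    duplicated = set()
    for row in csv.reader(blast_tabular, delimiter="\t"):
        if not row or row[0].startswith("#"):
            continue  # skip blank lines and comment lines
        query, subject = row[0], row[1]
        identity, aligned, evalue = float(row[2]), int(row[3]), float(row[10])
        if query == subject:
            continue  # a gene matching itself is not evidence of a paralog
        if identity >= MIN_IDENTITY and aligned >= MIN_ALIGNED and evalue <= MAX_EVALUE:
            # Both members of a qualifying pair count as duplicated.
            duplicated.update((query, subject))
    return duplicated

# Hypothetical hits, for illustration only.
SAMPLE = (
    "geneA\tgeneA\t100.0\t400\t0\t0\t1\t400\t1\t400\t0.0\t800\n"    # self hit
    "geneA\tgeneB\t45.2\t310\t170\t3\t5\t314\t8\t317\t2e-60\t210\n"  # passes
    "geneC\tgeneD\t28.0\t200\t144\t2\t1\t200\t1\t200\t1e-8\t60\n"    # identity too low
    "geneE\tgeneF\t60.0\t90\t36\t0\t1\t90\t1\t90\t1e-20\t110\n"      # alignment too short
)
hits = duplicated_genes(io.StringIO(SAMPLE))
```

Dividing the size of the returned set by the total number of annotated genes would give a paralog percentage of the kind reported in the text. As the review notes, the result is sensitive to the chosen cutoffs.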

have higher than average retention after WGD (Maere et al., 2005) but not after local duplication (Hanada et al., 2008). Therefore, to understand how duplication impacts gene content, it is necessary to know how genes are duplicated.

Mechanisms of Gene Duplication

Duplication is a form of mutation in which a genomic region is replicated and, in some cases, is inserted into a physically separate location. Multiple mechanisms contribute to gene duplication (Fig. 2). Considering the impact on gene content, the most dramatic form of gene duplication involves duplication of an entire chromosome or the whole genome (Fig. 2A). Ancient WGD events have taken place in the common ancestors of seed plants (approximately 340 MYA) and of angiosperms (approximately 170 MYA; Jiao et al., 2011). Subsequently, three rounds of WGD events (referred to as α, β, and γ) took place in the Arabidopsis lineage (Blanc et al., 2003; Bowers et al., 2003). In some cases, WGD events involve genome triplication, as is the case for Brassica rapa (Lysak et al., 2005), the wild radish

Figure 2. Mechanisms of gene duplication. A, WGD, or duplication of genes via an increase in ploidy. B, Tandem duplication, or duplication of a gene via unequal crossing over between similar alleles. C, Transposon-mediated duplication, or duplication of a gene associated with a TE via replicative transposition. D, Segmental duplication, or duplication of genes via replicative transposition of LINE elements in the human genome. The extent and causative elements of segmental duplication are ill defined in plants. E, Retroduplication, or duplication of a gene via reverse transcription of processed mRNA.

Raphanus raphanistrum (Moghe et al., 2014), and bread wheat (Salse et al., 2008). Even higher levels of polyploidization also have been observed (e.g. octoploid cultivated strawberry [Fragaria × ananassa]; Byrne and Jelenkovic, 1976). Paleopolyploidy events are thought to be rare, as only a handful of recognizable events occurred over the past 200 million years (Van de Peer et al., 2009a). Nonetheless, the frequency of paleopolyploidization is much higher in plants than in any other eukaryotic lineage (Otto and Whitton, 2000). In addition, although the survival rate of nascent polyploids remains largely unclear (Soltis et al., 2015), diploid and polyploid cytotypes frequently coexist (Ramsey and Schemske, 1998). Given the frequency of ancient and more recent WGD events in plants, it is not surprising that WGD accounts for the majority of duplicate genes. In Arabidopsis, for example, approximately 60% of genes have at least one paralog in a corresponding syntenic block derived from one of three WGD events (Bowers et al., 2003).

The contribution of WGD to existing duplicates is likely much higher than estimated based on analyses of syntenic duplicates, because some WGD syntenic blocks, particularly those arising from older events, are no longer recognizable due to genome rearrangements, insertions, and deletions. Thus, some of the duplicates that cannot currently be ascribed to WGD may actually be WGD duplicates. Alternatively, they could be derived from subgenomic duplication events, which include tandem duplication (Zhang, 2003), duplication mediated by TEs (Jiang et al., 2004), segmental duplication (Bailey et al., 2002), and retroduplication (Drouin and Dover, 1990; Brosius, 1991). Tandem (or local) duplication (Fig. 2B) results from unequal crossing-over events and leads to a cluster of two to many paralogous sequences with no or few intervening gene sequences (Zhang, 2003). The number of tandem duplicates in plants varies widely, from 451 (4.6% of gene content) in Craspedia variabilis to 16,602 (26.1%) in apple (Yu et al., 2015). In Arabidopsis, the proportion of tandem

duplicates is close to the average of approximately 9% observed for 39 plant species. One challenge in estimating the contribution of tandem duplication to duplicate gene content is that, in the case of recent duplication events, paralogous copies may be misannotated as a single gene. For example, there are two SEC10 paralogous genes involved in exocytotic vesicle fusion, but these genes make up only one locus in the assembled Arabidopsis genome (Vukašinovic et al., 2014). Thus, misassembly contributes to an underestimate of tandem genes, and this issue is likely significant as most plant genomes are of draft quality.

In contrast to tandem duplication, which takes place locally, other subgenomic duplication mechanisms form dispersed duplicates. These mechanisms likely involve repetitive sequences and/or replicative transposition by TEs (Alleman and Freeling, 1986; Kapitonov and Jurka, 2007). In the case of TE-mediated gene duplication (Fig. 2C) in plants, gene capture by Mutator-like elements (MULEs) is the most prominent example (Bennetzen, 2005). There are approximately 3,000 rice Pack-MULEs that collectively contain approximately 1,000 fragmented or whole-gene sequences (Jiang et al., 2004). There are a similar number of Helitrons (approximately 2,800) in maize (Zea mays; Du et al., 2009) but only 46 Pack-MULEs in Arabidopsis (Jiang et al., 2011). This difference potentially reflects the historical difference in TE activity. Interestingly, Pack-MULEs preferentially obtain genic sequences (Ferguson et al., 2013), and a subset of Pack-MULE-carried genes are expressed and appear to be under selection (Hanada et al., 2009c). However, the mechanisms underlying this preferential acquisition and the functions of the acquired genes remain unclear. Similarly, in mammals, long interspersed nuclear elements and long terminal repeat retroposons are implicated in the generation of recent duplicates (Bailey et al., 2003; She et al., 2008) in a process referred to as segmental duplication (Fig. 2D; Bailey et al., 2002). Note that this is distinct from the original use of segmental duplication in the plant literature, which referred to rearranged genomic regions derived from WGDs (Arabidopsis Genome Initiative, 2000). Although both Pack-MULEs and mammalian segmental duplicates are associated with TEs, it remains unclear if they are generated via similar mechanisms.

Another TE-associated mechanism that generates dispersed duplicates is retroduplication (Fig. 2E; Drouin and Dover, 1990; Brosius, 1991). Here, mRNAs are reverse transcribed into DNA and inserted into the genome. These duplicate genes are referred to as retrogenes. Similar to Pack-MULEs, more retrogenes are found in rice (1,235) than in Arabidopsis (251; Wang et al., 2006; Abdelsamad and Pecinka, 2014), which also may reflect differences in past and current TE activity. In Drosophila melanogaster, retrogenes tend to be derived from genes that are highly expressed in germ-line tissues (Langille and Clark, 2007). However, because the regulatory sequences in the promoter are usually not duplicated, retrogenes were initially considered dead on arrival (Graur et al., 1989). However, there are several examples of functional retrogenes in mammals and D. melanogaster (Kaessmann et al., 2009). There is also evidence that plant retrogenes may be functional. Rice retrogenes have persisted longer than would be expected for nonfunctional elements and are under selection (Wang et al., 2006). In addition, studies in rice and Arabidopsis found that between one-fourth and one-third of retrogenes, respectively, have similar expression patterns to their parental genes (Sakai et al., 2011; Abdelsamad and Pecinka, 2014), and in Arabidopsis specifically, retrogene expression is up-regulated in pollen (Abdelsamad and Pecinka, 2014). However, while the functions of a few retrogenes have been examined in Arabidopsis (Abdelsamad and Pecinka, 2014), the overall functional significance of these duplicates has yet to be fully demonstrated.

Taken together, WGDs and tandem duplications account for the majority of plant duplicates, but TE-based mechanisms and retroduplication also generate a significant number of duplicates. It should be noted that, for example, in Arabidopsis, these mechanisms combined account for approximately 70% of duplicate genes. It remains to be determined if the remaining 30% were generated by some unknown mechanism or if there was a failure in assigning a duplication mechanism either due to the age-related erasure of specific signatures (i.e. synteny, proximity, and repeats) or too-stringent methods used to assign a mechanism.

GENE DUPLICATE LONGEVITY AND PSEUDOGENIZATION

Half-Life of Duplicate Genes

Despite the contribution of multiple duplication mechanisms and the variance in genome size, plant gene content remains relatively similar across land plant species. Considering that at least five rounds of WGD took place in the land plant lineage leading to Arabidopsis (Bowers et al., 2003; Jiao et al., 2011) and assuming that the common ancestor of land plants had approximately 10,000 genes, the number of genes in extant species would be 320,000 even without taking into account other duplication mechanisms. This expected gene number is approximately 10 times higher than the actual gene number in Arabidopsis and indicates extensive gene loss over time. Thus, although some duplicates have survived over millions to hundreds of millions of years, the predominant fate of most duplicates is loss (Li, 1983; Maere et al., 2005; Hanada et al., 2008). The preponderance of duplicates in plant genomes is driven mainly by the high rate of duplications over evolutionary time accompanied by the preferential retention of some duplicates.

How long will a duplicate survive after duplication? Assuming that the mutational process that leads to duplicate loss is stochastic, the longevity of a duplicate gene can be estimated in the form of half-life (i.e. the amount of time for half of the duplicates derived from a single event [e.g. WGD] to be lost; Lynch and Conery,

2000). The genome-wide half-life of Arabidopsis duplicates is estimated to be 17.3 million years (Lynch and Conery, 2003). For example, if a WGD event happened in the Arabidopsis lineage 17.3 MYA, we would expect that approximately 50% of the duplicates from that event will have been lost. As mentioned earlier, there have been multiple WGDs in the Arabidopsis lineage, so we can evaluate the consistency of duplicate half-life by considering these events independently. The most recent α WGD took place approximately 50 to 65 MYA (Bowers et al., 2003; Beilstein et al., 2010), and the observed duplicate survival rate ranges from 13.3% (Blanc et al., 2003) to 16.3% (Maere et al., 2005). Based on this information, the half-life estimate of α WGD duplicates is 17.2 to 24.8 million years. Although this is not far off from the genome-wide estimate of 17.3 million years (Lynch and Conery, 2003), α duplicates in general have a longer half-life than when all duplicates are considered. How about a more ancient WGD? The γ duplication likely took place approximately 140 MYA (Bowers et al., 2003; Moore et al., 2007), and 4.4% of duplicates are still retained (Maere et al., 2005). This implies a longer half-life, 31.3 million years, compared with α duplicates. On the other hand, certain gene families across angiosperms show bias against having paralogs from older WGD events, suggesting that their half-lives are shorter than average (Li et al., 2016). This difference in estimated half-life between the α and γ WGD events suggests that duplicate longevity is not constant over time; the rate of duplicate loss appears to decrease as the time since duplication increases, perhaps because a greater proportion of older duplicates are retained due to selective constraints.

Based on studies comparing duplicate and singleton (no closely related paralog) genes, the longevity of duplicates may be influenced by molecular and biological functions (Maere et al., 2005; Hanada et al., 2008), structural features (Jiang et al., 2013), number of protein interactions (Makino and McLysaght, 2012), and parent of origin (Song et al., 1995). Additionally, duplication mechanisms have different impacts on gene content. For example, WGD increases gene content dramatically but happens relatively infrequently. In contrast, tandem duplication, although affecting a limited number of genes, can increase or decrease gene number every meiotic division. These types of features that may affect duplicate retention are not taken into account when estimating genome-wide half-life (Lynch and Conery, 2003), and thus significant deviations are expected. Nonetheless, whether certain types of duplicates have longer or shorter half-lives, even the most conservative estimates suggest that most duplicates are lost relatively quickly after they are generated.

Mechanisms of Duplicate Gene Loss

The process of duplicate loss may involve deletion of the entire duplicate sequence and/or pseudogenization through loss-of-function mutations (Fig. 3A). If two duplicate genes are completely identical, there should not be a penalty for deleting either copy. In reality, however, duplicates are rarely equal. For example, copies derived from TE-mediated duplication, retroduplication, or tandem duplication may be missing parts of the parent gene-coding and/or regulatory regions. In these cases, loss of the new duplicate will likely incur no fitness penalty. Even in WGD, particularly when it involves allopolyploidy (merging of two related, but not identical, genomes), the patterns of duplicate loss are far from random (Thomas et al., 2006). The wholesale loss of duplicates is an important feature of fractionation, the reduction of genic content post WGD (Freeling et al., 2012). Analysis of syntenic blocks produced by the α WGD in Arabidopsis revealed that genes in one duplicate block were lost preferentially (Thomas et al., 2006). This fractionation bias has been observed in several plant species, including Gossypium raimondii (diploid cotton; Renny-Byfield et al., 2015), maize (Schnable et al., 2011), and B. rapa (Cheng et al., 2012). Importantly, this bias applies only to gaps in syntenic blocks that span multiple genes but not to gaps resulting from deletions of single genes (Thomas et al., 2006). Simulations of fractionation based on yeast data suggest that the observed bias in gene loss requires deletion events covering multiple genes and that random, single gene losses alone do not explain the observed pattern of fractionation (van Hoek and Hogeweg, 2007). What is the basis for fractionation bias? Looking across multiple WGD events, fractionation appears biased against duplicates with reduced expression level and promoter complexity (Schnable et al., 2012). Supporting this idea, analysis of lowly expressed genes in Arabidopsis suggests that they may be undergoing more rapid divergence and possibly pseudogenization (Yang et al., 2011).

Nonfunctional duplicates are not always deleted; plant genomes are littered with thousands of apparently degenerated, nonfunctional duplicates referred to as pseudogenes (Benovoy and Drouin, 2006; Guo et al., 2009; Zou et al., 2009a). Pseudogenes are identified based on their similarity to annotated genes and the presence of disabling mutations (e.g. premature stop codons and frame shifts in protein-coding genes) that lead to presumed loss of function (Vanin, 1985). Although pseudogenes are presumably nonfunctional, a small subset of pseudogenes in rice and Arabidopsis are clearly expressed (Yamada et al., 2003; Thibaud-Nissen et al., 2009; Zou et al., 2009a). There are three potential explanations for pseudogene expression. First, some pseudogenes may have been falsely predicted due to misannotation. An example is the rice ent-KAURENE SYNTHASE LIKE2 gene, which had mispredicted coding regions (Tezuka et al., 2015). This issue will likely be less prominent as transcript-based annotations improve (Law et al., 2015). The second explanation is that some pseudogenes may still be functional as truncated proteins or as RNA. For example, apomixis in the grass Paspalum simplex is hypothesized to be the consequence of antisense regulation by the transcript of a

Figure 3. Potential fates of duplicate genes. Duplicate genes can be pseudogenized/lost (A), retained by selection on existing functions (B–E), or retained by selection on novel functions (F and G). Models of selection on existing function include the following: gene dosage, or retention of both duplicates because of a beneficial increase in expression (B); duplication-degeneration-complementation (DDC)/subfunctionalization, or retention of both duplicates to preserve the full complement of ancestral functions (C); dosage balance, or retention of both duplicates to maintain the stoichiometric balance (D); and paralog interference, or retention of both duplicates to prevent interference between the products of each paralog (E). Models of selection on novel functions include the following: neofunctionalization, or retention of both duplicates because of a gain of function post duplication (F); and escape from adaptive conflict (EAC), or retention of both duplicates that allows for the independent optimization of conflicting ancestral functions (G).
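
The gene-loss arithmetic quoted in the section on the half-life of duplicate genes can be reproduced with a short calculation. This is a sketch under one stated assumption: duplicates are lost at a constant exponential rate, so the fraction surviving after time t is 0.5^(t / half-life). The function names are hypothetical, and the inputs are the numbers quoted in the text.

```python
import math

def expected_genes(ancestral_genes, wgd_rounds):
    """Gene count if every WGD doubled the gene set and no gene was ever lost."""
    return ancestral_genes * 2 ** wgd_rounds

def half_life(age_my, surviving_fraction):
    """Half-life in million years, assuming constant-rate exponential loss.

    Solves surviving_fraction = 0.5 ** (age_my / half_life) for half_life.
    """
    return age_my * math.log(2) / -math.log(surviving_fraction)

# Five rounds of WGD acting on an assumed ancestral set of 10,000 genes.
no_loss_total = expected_genes(10_000, 5)

# Alpha WGD: 50 to 65 MYA, with 13.3% to 16.3% of duplicates surviving.
alpha_low = half_life(50, 0.133)
alpha_high = half_life(65, 0.163)

# Gamma WGD: approximately 140 MYA, with 4.4% of duplicates surviving.
gamma = half_life(140, 0.044)

print(no_loss_total, round(alpha_low, 1), round(alpha_high, 1), round(gamma, 1))
```

Under these assumptions the no-loss total is 320,000 genes, and the α bounds come out at roughly 17.2 and 24.8 million years, matching the text. The γ estimate is about 31 million years, slightly below the quoted 31.3, which presumably reflects rounding in the published inputs.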

pseudogene related to the ORIGIN RECOGNITION COMPLEX3 gene (Siena et al., 2016). However, there is no direct evidence that the pseudogene is responsible for the antisense transcript. Finally, some pseudogenes may have become pseudogenized relatively recently and are in the process of complete decay. This is

consistent with the finding that expressed pseudogenes tend to be derived from more recent
duplication events (Zou et al., 2009a). In addition, pseudogenes, expressed or not, tend to have
elevated nonsynonymous (amino acid-changing) substitution rates (Zou et al., 2009a), indicating
that they are not subject to the same degree of selective pressure as their intact relatives.

Do duplicates have similar propensities for pseudogenization? In B. rapa and R. raphanistrum,
which experienced a recent genome triplication event, there are significantly more pseudogenes
than in the related species Arabidopsis and Arabidopsis lyrata, which did not experience a recent
WGD (Moghe et al., 2014). Also, there is a significant, positive correlation between gene family
size and the number of pseudogenes within families (Zou et al., 2009a). Thus, in general, the more
members of a family that are duplicated, the more losses occur. However, this correlation is far
from perfect (Zou et al., 2009a), indicating that other factors are important. One such factor is
gene function. For example, Arabidopsis pseudogenes tend to have functional relatives playing
roles in disease resistance, specialized (secondary) metabolism, cell wall modification, and
protein degradation (Zou et al., 2009a), but transcription factor and receptor-like kinase
families tend not to have pseudogenes (Hanada et al., 2008). In addition, pseudogenes tend to be
derived from tandem duplicates (Hanada et al., 2008), although this may be due to the higher rate
of tandem duplication compared with other duplication mechanisms. Taken together, duplicate
longevity depends on functional role and duplication mechanism, which necessarily means that
there is a significant bias in the kinds of duplicates that are retained.

MECHANISMS FOR RETENTION OF DUPLICATE GENES

Genetic Drift and Genetic Redundancy

Over the course of plant evolution, hundreds of thousands of new genes were created by
duplication, and most of these duplicates were lost over time. Nonetheless, considering that more
than half of the gene content in most plant species consists of duplicates, some duplicates have
clearly escaped this fate. Why do some duplicates persist while others are lost? Models for
duplicate gene retention in general have been reviewed elsewhere (Innan and Kondrashov, 2010;
Maere and Van de Peer, 2010); here, we will focus on examples of duplicate retention in plants
(Fig. 3, B–G). It is important to note that these models are not mutually exclusive. For example,
selection on both duplicates to maintain dosage balance (Fig. 3D) contributes to increased
duplicate longevity, which may allow time for the evolution of novel functions (Fig. 3F; Veitia
et al., 2013). Here, we discuss each model independently to emphasize the distinct mechanisms
that contribute to duplicate retention. First, we discuss the idea that both duplicates are
retained without a significant change in function, either because insufficient time has passed
for deleterious mutations to accumulate or because there is selection pressure to retain
redundant functions.

Assuming that mutations accumulate randomly and that selection is not a factor, genetic drift
will be the dominant factor influencing the frequency of mutant alleles (Kimura, 1968, 1983). In
this case, a mutation that appears in a gene would take approximately four Ne generations to
become fixed (where Ne is effective population size; Kimura and Ohta, 1969). In Arabidopsis,
which has an estimated Ne of 250,000 (Cao et al., 2011) and a winter annual life cycle for most
accessions (Michaels et al., 2004), the time to fixation is approximately 1 million years. When a
pair of duplicates is considered, the situation is more complicated because either copy can be
lost without affecting fitness. Assuming that the time for a recent WGD to sweep through the
population is negligible, the time to fixation is a function of Ne, the fitness effect of the
loss of both duplicate genes, and the mutation rate at the duplicated loci (Kimura and King,
1979). The average time to fixation is estimated to be between three and 20 Ne generations. In
Arabidopsis, this translates to an average fixation time of between 0.75 and 5 million years.
Thus, it is expected that some duplicate genes potentially survive for several million years due
to genetic drift and not because their presence is beneficial. In this situation, some duplicates
may be decaying functionally even though there is no apparent sign of pseudogenization
(Lehti-Shiu et al., 2015). However, if drift were the only factor affecting retention and the
expected time to deletion was approximately 1 million years, we would expect a much lower
genome-wide duplicate half-life (mean lifetime approximately 1.44 × half-life). Thus, a
substantial number of duplicates are most likely under selection (i.e. loss of function in either
of the duplicates is expected to reduce the fitness of the individual).

Alternatively, duplicate retention might occur via selection for genetic redundancy (or genetic
buffering), where the effects of a null mutation are ameliorated (or buffered) due to the
presence of an intact, duplicate copy (Zhang, 2012). The prediction is that developmental or
physiological phenotypes are only obvious when a gene and one or more of its relatives (paralogs)
are mutated (Nowak et al., 1997). Consistent with the idea of genetic redundancy, phenotypic
effects when one copy of a duplicated pair is disrupted are significantly smaller compared with
those observed when a singleton gene is disrupted in Arabidopsis (Hanada et al., 2009a). However,
claims of genetic redundancy between duplicates thus far are based on the absence of gross
morphological, developmental, and/or behavioral phenotypes in highly controlled environments.
Thus, relatively subtle phenotypic changes or conditional phenotypes resulting from mutations in
one duplicate copy, which may have fitness consequences, may not be detected. Additionally,
although redundancy may be beneficial in a way that is analogous to a fail-safe in an engineered
system
(McAdams and Arkin, 1999; Kitano, 2004; Kondrashov, 2010), it remains to be shown how selection
can act in anticipation of future need to favor redundancy. Although long-term conservation of
redundant duplicates is feasible in simulated situations, it requires perfect equivalency in gene
functions and in mutation rates between the two duplicates (Nowak et al., 1997), which is highly
unlikely. After gene duplication, various degrees of functional redundancy are expected. But the
redundant functions can be present not necessarily because they are useful but because there has
been insufficient time for their loss. Due to the challenges in experimentally assessing the
functional and fitness contributions of duplicates, the true extent of genetic redundancy and
whether redundancy is the cause or consequence of duplicate retention remain unclear.

Selection on Existing Functions

Duplicates can be retained without acquiring new functions via one of the following four
mechanisms: gene dosage increase (Ohno, 1970), DDC (Force et al., 1999), gene balance (Freeling
and Thomas, 2006), and paralog interference (Baker et al., 2013). Ohno (1970) recognized that, in
situations where increased gene dosage confers an advantage by meeting metabolic demands, the
presence of duplicate copies is beneficial (Fig. 3B). Unlike redundancy, the robustness of
duplicates is clearly selectable (McAdams and Arkin, 1999; Kitano, 2004; Kondrashov, 2010);
although the molecular function of the duplicates may be unchanged, the effect of increased
dosage is new. Models of budding yeast gene networks suggest that WGD likely contributed to
increased flux in the glycolytic pathways, which confers a fitness benefit in high-Glc
environments (van Hoek and Hogeweg, 2009). Similarly, in Arabidopsis, duplicate retention after
WGD is associated with reactions with high metabolic flux (Bekaert et al., 2011), suggesting that
increased gene dosage results in increased metabolic activity, which may be beneficial. The
impact of duplicate gene dosage is emphasized in a recent review (Conant et al., 2014).

In DDC or subfunctionalization (Fig. 3C; Force et al., 1999), after duplication of a
multifunctional gene, each copy randomly loses subfunctions of the original gene (degeneration),
and because each duplicate loses different subfunctions, both copies have to be kept to maintain
the original, ancestral functionality (complementation). An important point to emphasize is that
duplicate retention through DDC does not require any new evolved functions, just partitioning of
the old ones. Evidence suggests that DDC has occurred at the level of protein function (Aklilu
et al., 2014; Aklilu and Culligan, 2016), at the level of gene expression (Duarte et al., 2006;
Throude et al., 2009; Zou et al., 2009a; Ma et al., 2015), and at both levels simultaneously
(Geuten et al., 2011). Under DDC, subfunctions are assumed to be independently mutable, and it is
expected that the ancestral functions are partitioned symmetrically among duplicates.
Interestingly, the pattern of functional partitioning between duplicates tends to be highly
asymmetric, where one copy tends to have significantly more subfunctions than its paralog,
compared with what would be expected randomly (Zou et al., 2009b). Thus, either subfunctions are
not independently mutable and/or other factors affect the partitioning of subfunctions.

An alternative model for duplicate retention is the gene balance hypothesis (Birchler et al.,
2005; Birchler and Veitia, 2007, 2010). Under one version of this model, after WGD, duplicate
genes that are dosage sensitive tend to be retained (dosage balance; Fig. 3D; Thomas et al.,
2006). The idea is that duplication of a gene whose product has a greater number of molecular
interactions (e.g. protein-protein or protein-DNA interactions) will lead to a dosage imbalance
if all its interactors remain single copy. The individual harboring the duplicate will then have
reduced fitness due to this imbalance. A higher degree of imbalance is expected for gene products
with a higher number of interactions. In this situation, the duplication of just one gene will
likely have a deleterious effect. But when this highly connected gene is duplicated along with
its interactors in a WGD event, its retention is favorable because its removal would lead to
imbalance. Consistent with gene balance, the expression patterns of highly interconnected genes
tend to be more highly correlated than random duplicates (Lemos et al., 2004). Furthermore, there
is a greater correlation between the expression patterns of WGD duplicates than between tandem
duplicates (Casneuf et al., 2006; Arabidopsis Interactome Mapping Consortium, 2011).

Gene balance also can result from mechanisms other than dosage balance. In situations where
molecular interaction is important for gene function, degenerative mutations in one duplicate
copy may interfere with the functions of its paralog (paralog interference; Fig. 3E). These
degenerative mutations will be selected against, leading to retention of both duplicates
(Bridgham et al., 2008; Baker et al., 2013). Paralog interference is distinct from dosage balance
in two respects. First, paralog interference occurs at the level of protein interaction and is
independent of changes in the amount of gene products. Second, it is relevant specifically to
situations where the formation of homomultimers is important for the ancestral gene function. The
sequestration of interactors by the degenerate copy effectively creates an imbalance but not in
dosage. Given that homomultimerization is prominent in multiple protein families such as
transcription factors (Amoutzias et al., 2008), paralog interference may have a significant
contribution to duplicate retention post WGD (Kaltenegger and Ober, 2015). Note that paralog
interference also is distinct from DDC, which requires degenerative mutations to explain
retention.
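
The drift-only timescales quoted under Genetic Drift and Genetic Redundancy follow from simple arithmetic, which the short sketch below reproduces. It assumes one generation per year (a simplification of the winter annual life cycle noted in the text) and, for the half-life conversion, exponential decay of duplicates; the function and variable names are illustrative and do not come from the cited studies.

```python
import math

# Effective population size for Arabidopsis quoted in the text (Cao et al., 2011).
NE_ARABIDOPSIS = 250_000

# Assumption: one generation per year, approximating an annual life cycle.
YEARS_PER_GENERATION = 1.0


def fixation_time_years(ne_multiple, ne=NE_ARABIDOPSIS,
                        years_per_generation=YEARS_PER_GENERATION):
    """Convert a fixation time given in units of Ne generations into years."""
    return ne_multiple * ne * years_per_generation


def mean_lifetime(half_life):
    """Mean lifetime under exponential decay: half-life / ln 2 (about 1.44 x half-life)."""
    return half_life / math.log(2)


single_gene = fixation_time_years(4)      # neutral mutation in a single gene: 4 Ne generations
duplicate_low = fixation_time_years(3)    # duplicate pair, lower estimate: 3 Ne generations
duplicate_high = fixation_time_years(20)  # duplicate pair, upper estimate: 20 Ne generations

print(f"single gene: {single_gene / 1e6:.2f} million years")
print(f"duplicate pair: {duplicate_low / 1e6:.2f} to {duplicate_high / 1e6:.2f} million years")
print(f"mean lifetime / half-life: {mean_lifetime(1.0):.2f}")
```

Under these assumptions the sketch recovers the approximately 1 million years for a single gene and the 0.75 to 5 million year range for a duplicate pair; a different generation time would scale all three values proportionally.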

Contribution of Selection on New Functions to Duplicate Retention

Duplicates, although originally sharing the same functions, may acquire new functions. If these
functions are beneficial, selection will act to retain both duplicates. Two models that explain
duplicate retention due to the acquisition of novel adaptive functions are neofunctionalization
(Ohno, 1970) and EAC (Des Marais and Rausher, 2008). Under the neofunctionalization model, one
duplicate retains the ancestral function while its paralog gains a novel function (Fig. 3F). If
the novel function contributes to better fitness, selection should maintain both duplicates. Note
that determining whether neofunctionalization has taken place requires knowledge of gene
functions prior to duplication. Some examples where neofunctionalization after duplication has
likely contributed to duplicate retention include MADS box transcription factors involved in the
evolution of novel floral structures (He and Saedler, 2005; Hu et al., 2015b; Zhang et al., 2015),
4,5-dioxygenase and cytochrome P450 genes that contribute to pigment variation in Caryophyllales
(Brockington et al., 2015), and the recruitment of duplicated primary metabolite genes into
specialized metabolite pathways (Durbin et al., 2000). At the gene expression level, it is
estimated that approximately 10% of Arabidopsis duplicate genes have gained a novel response to
stress conditions (Zou et al., 2009b), although it remains to be determined whether these novel
responses are adaptive or not.

Similar to DDC, EAC (Hittinger and Carroll, 2007; Des Marais and Rausher, 2008) predicts
subfunctionalization followed by the specialization of duplicates, but in EAC, novel functions
arise prior to duplication (Fig. 3G). Subsequent subfunctionalization allows the separate
functions of the ancestral gene to evolve independently. Thus, the distinguishing characteristic
of EAC is improvement of the original ancestral function in one duplicate and of the novel
function in the other (Des Marais and Rausher, 2008). EAC also is similar to the classic
neofunctionalization model in that duplication allows further adaptive changes to accumulate, but
the neofunctionalization model postulates that, after duplication, only one copy acquires a novel
function. Under both neofunctionalization and EAC, duplicates are retained due to their adaptive
contribution: individuals with duplicates that have novel and/or optimized functions are expected
to have higher fitness compared with individuals with single copies. In contrast, under DDC, the
duplicates do not provide an adaptive edge compared with the single-copy gene. Because of the
requirement that the novel function existed prior to duplication, EAC is likely particularly
relevant to genes that are nonessential and not highly conserved but are highly liable to
selection (Sikosek et al., 2012). Examples of EAC in plants include improved enzymatic activity on
ancestral flavonoid substrates after duplication of dihydroflavonol-4-reductase leading to
adaptive changes in anthocyanin synthesis (Des Marais and Rausher, 2008) and salicylic
acid/benzoic acid/theobromine enzymes, where improved enzymatic activities likely are under
positive selection (Huang et al., 2012).

The models that we have presented here as distinct may actually involve a combination of
different mechanisms. For example, EAC invokes both the neofunctionalization and DDC models to
explain duplicate retention. Similarly, under the subneofunctionalization model (He and Zhang,
2005), the expected outcome of duplication is first the partitioning of ancestral functions
followed by neofunctionalization in both duplicates. Subneofunctionalization is distinct from
EAC, however, in that the total number of unique functions of the duplicate pair is expected to
exceed the number of original functions. This is consistent with the finding that the numbers of
protein interactions among two duplicate genes are higher than those of randomly paired
singletons (He and Zhang, 2005). Paralog interference does not require adaptive change as a
mechanism for duplicate retention, but subsequent neofunctionalization may resolve interference
in a way that further increases the adaptive value of both duplicates (Baker et al., 2013;
Kaltenegger and Ober, 2015). Taken together, the retention of a duplicate may involve any
individual or combination of the above mechanisms, and it is of interest to know the relative
contributions of each. Although some studies have explicitly compared different models of
retention (Yang et al., 2006), the major obstacle to assessing the mechanisms of duplicate
retention remains the inference of ancestral function, which requires a number of assumptions and
is hypothetical. Thus, assessing the relative contributions of each duplication mechanism, or
combination of mechanisms, to duplicate retention remains a major challenge.

PROPERTIES OF RETAINED DUPLICATES

Evolutionary Rate of Duplicates

After gene duplication, the rate of evolution (sequence substitutions) should increase, at least
initially, because the presence of two copies relaxes selection against previously deleterious
mutations (Scannell and Wolfe, 2008). Consistent with this expectation, duplicates display relaxed
purifying selection (Carretero-Paulet and Fares, 2012) as well as differences in sequence
structure, such as length of the coding region and the size and distribution of indels (Wang
et al., 2013a). This increase in evolutionary rate is not necessarily equal for both copies: the
proportion of WGD duplicates with evidence of asymmetric evolution ranges from 21.2% in maize to
68.3% in P. patens (Carretero-Paulet and Fares, 2012). Additionally, after a recent genome
triplication approximately 25 MYA, 13% to 19% of B. rapa and R. raphanistrum paralogs evolved
asymmetrically (Moghe et al., 2014). This pattern
is expected under the neofunctionalization model, where one copy maintains the original function
while mutations that contribute to new functions accumulate in the other copy. However, this
pattern also can be explained by the gradual loss of function in one copy: once one degenerative
mutation has occurred in one duplicate, the chance for a second mutation to occur in the same
duplicate is expected to be higher. Thus, for any particular case of asymmetry, the underlying
cause needs to be determined, and neofunctionalization cannot be assumed by default.

Asymmetry in evolutionary rate demonstrates differential evolution between duplicate copies, but
what about relative to the previous, unduplicated copy? To assess whether duplication is
associated with altered evolutionary rates, three different comparisons can be made. First, a
duplicate pair can be compared with the putative ancestral gene. This is exceedingly difficult
due to the challenges associated with inferring ancestral function. Second, a duplicate pair can
be compared with a closely related singleton (that was duplicated in the past but whose paralogs
were lost). In this case, the duplicates and singletons have common ancestry and, presumably,
similar functions. An exemplary study of this type provides evidence for asymmetric sequence
evolution of both WGD and small-scale duplicates in multiple plant species (Carretero-Paulet and
Fares, 2012). Third, all duplicates (genes with paralogs) can be compared with all singletons
(those with no obvious paralog) in order to assess the average trend. In Arabidopsis, the
nonsynonymous (amino acid-changing) substitution rates of duplicates tend to be lower compared
with those of singletons (Yang and Gaut, 2011). The apparently more constrained evolution among
duplicate genes can be the consequence of gene balance and/or paralog interference, but there can
be other confounding factors that complicate the comparison of duplicates and singletons. For
example, the evolutionary rate differences between duplicates and singletons can reflect
differences in gene functions (discussed below) and sequence features such as protein domains,
which tend to be longer, more numerous, and more highly conserved in duplicates than in
singletons (Chapman et al., 2006). Additional sequence features differ between singleton and
duplicate genes across multiple plant species (Jiang et al., 2013), but it remains to be seen
whether these features contribute to the difference in evolutionary rate or are themselves
consequences of duplication.

Expression Patterns of Duplicates

In addition to sequence differences, duplicate genes can have divergent expression patterns. At
the transcriptional level, approximately 70% of duplicate pairs in Arabidopsis have significant
differences in transcript levels (Ganko et al., 2007). In Gossypium (cotton), the transcript
levels of 99.4% of the duplicates derived from a WGD event that occurred 60 MYA have diverged
(Renny-Byfield et al., 2014). Expression divergence at the protein level also has been documented
(Hu et al., 2015a). This divergence in expression may be under selection due to
subfunctionalization and/or neofunctionalization. However, this difference in expression could
result from fractionation among WGD duplicates (Schnable et al., 2011) that may not be subject to
selection, at least initially. If expression differences were purely neutral, paralogs from
younger duplication events would be expected to have more similar expression profiles. Yet the
correlation between duplicate expression similarity and the timing of duplication (using
synonymous substitution rate as a proxy) is weak, and the direction of correlation is not
consistent between WGD (positive) and tandem (negative) duplicates (Haberer et al., 2004; Ganko
et al., 2007). This is also true in rice (Li et al., 2009) and poplar (Populus spp.;
Rodgers-Melnick et al., 2012). Thus, expression divergence is not an entirely neutral process. In
contrast, there is a highly significant negative correlation between nonsynonymous substitution
rate and expression similarity between duplicates (Ganko et al., 2007). That is, the more
divergent the protein sequences are, the more similar the expression profiles. Assuming that the
duplicates that contribute to this pattern are indispensable, it would appear that duplicates
will be retained if they have sufficiently large differences in either expression profiles or
sequences and, in some cases, a combination of both. Consistent with this notion, younger
duplicates that are essential (lethal when mutated) do, in fact, have more divergent expression
profiles compared with older, essential duplicate genes (Lloyd et al., 2015). In this case, the
large difference in expression patterns of young duplicates contributes to the lack of buffering
effect if one duplicate is lost.

Expression divergence between duplicates can arise due to differences at various stages of
expression regulation. First, differences in cis-regulatory elements between Arabidopsis
duplicate genes explain, in part, why duplicates have different responses to stressful
environments (Zou et al., 2009b). In Gossypium hirsutum (allotetraploid cotton), 40% of homologs
derived from a recent WGD display transcriptional divergence that can be attributed to
cis-regulatory divergence between the diploid progenitors (Chaudhary et al., 2009). This result
also is consistent with a recent study that found divergence in DNaseI footprints in the
promoters of recently duplicated genes, suggesting that they are regulated by different sets of
transcription factors (Arsovski et al., 2015). The degree of regulatory interaction divergence is
dependent on the duplication mechanism: most WGD-derived duplicates share some regulatory
interactions, while non-WGD duplicates tend not to have overlapping regulators (Arsovski et al.,
2015). Gene body methylation is found to impact transcription (Lister et al., 2008; Takuno and
Gaut, 2012), and divergence in methylation pattern between duplicates is significantly, although
rather weakly, correlated with expression divergence in Arabidopsis (Wang et al., 2014a), rice
(Wang et al., 2013b), and cassava
(Manihot esculenta; Wang et al., 2015a). Beyond transcriptional regulation, duplicates with
divergent microRNA-binding sites tend to have more divergent expression profiles (Wang and Adams,
2015), and duplicates also can differ significantly in alternative splicing. For example, in
hexaploid bread wheat, 42% and 61% of alternative splicing events differ between homologous gene
pairs in chromosome 3A-3B and 3A-3D comparisons, respectively (Akhunov et al., 2013). In
Arabidopsis, 85% of WGD and 89% of tandem duplicates have divergent alternatively spliced forms
(Tack et al., 2014).

Much like evolutionary rate, the expression of duplicates as a whole can be significantly
different from that of singletons (Holstege et al., 1998; Seoighe and Wolfe, 1999). In plants,
duplicates tend to be consistently more highly expressed than singletons in multiple plant
species (Jiang et al., 2013). Conversely, while breadth of expression also is higher in
duplicates in general, the trend is not universal. The differences in chromatin accessibility
between duplicates and singletons may provide an explanation for expression level differences: in
Arabidopsis, the promoter regions of duplicate genes have nearly twice as many DNase I
hypersensitive sites compared with singleton genes (Arsovski et al., 2015). Taken together, these
studies highlight the molecular mechanisms that underlie expression divergence between duplicates
and between duplicates and singletons. The challenge now is to distinguish between the expression
differences that contribute to differences in duplicate functions and those with no significant
impact.

Functions of Duplicates

Given that some, although not all (e.g. gene balance), models of duplicate retention involve
selection on gain- or loss-of-function mutations, we expect to find evidence of functional
difference in duplicate pairs. For the purposes of this review, we classify functions into two
different categories: molecular function and biological process-based function. Molecular
function is defined as the molecular activity of a gene product (e.g. protein-protein
interaction). Analysis of Arabidopsis interactome data revealed that duplicates tend to have
different interaction partners (Carretero-Paulet and Fares, 2012; Guo et al., 2013). Nearly half
of all WGD duplicate pairs are completely diverged, with no shared protein-protein interactions
(Guo et al., 2013). Younger duplicate pairs tend to have more shared interactions, but a similar
portion of young (92.7%) and old (97.3%) duplicates have at least some divergence in
protein-protein interactions. Greater discordance in protein-protein interactions tends to be
correlated with decreased expression similarity and a greater number of distinct protein domains
(Guo et al., 2013). Duplicate genes as a whole encode proteins with significantly more
protein-protein interactions compared with singleton genes (Alvarez-Ponce and Fares, 2012).
However, while genes derived from WGD have greater regulation connectivity, tandem and WGD
duplicates show no significant differences in the number of associated protein-protein
interactions (Carretero-Paulet and Fares, 2012).

The function of a gene also can be defined as the impact of its gene product on a biological
process. There is a large body of experimental evidence demonstrating how biological process
functions differ between two plant duplicates. For example, paralogous MADS domain transcription
factors control different aspects of plant development, and this functional divergence is due to
changes in both promoter and coding regions (McCarthy et al., 2015). To summarize how the
biological processes involving duplicates differ from those involving singleton genes, one
approach is to examine annotated gene functions. Importantly, the functional differences between
duplicates and singletons are heavily influenced by the duplication mechanism. For example,
Arabidopsis WGD duplicates involved in signal transduction and transcriptional regulation tend to
be retained (Blanc and Wolfe, 2004b; Seoighe and Gehring, 2004; Maere et al., 2005; Shiu et al.,
2005). On the other hand, Arabidopsis and rice genes in Gene Ontology (GO; Ashburner et al., 2000)
categories relevant to response to environmental stress tend to be tandemly duplicated (Rizzon
et al., 2006; Hanada et al., 2008), and Arabidopsis genes that are transcriptionally responsive
to stress have experienced lineage-specific expansion due mainly to tandem duplication (Hanada
et al., 2008). To assess if the functional biases in duplicate retention are consistent across
species, the enrichment of GO terms in WGD and tandem duplicates was evaluated in six flowering
plants (Jiang et al., 2013). The overrepresentation of GO terms was found to be anticorrelated
between WGD and tandem duplication within each species. However, only a subset of GO terms
(transcriptional regulation, ribosomes/translation, amino acid synthesis, and kinase activity)
had the same pattern of enrichment in the majority of species (Jiang et al., 2013). This is
consistent with another study that found differences between the functions of WGD and tandem
genes but little correlation between the overall pattern of GO category enrichment in duplicate
genes across four plant species (Carretero-Paulet and Fares, 2012). Conversely, gene families
that are common across sequenced angiosperms are biased toward single-copy status and retained as
such in all species except those with a recent WGD (less than 50 MYA; Li et al., 2016). Although
GO-based analysis has been informative, there can be significant issues with its use (Rhee et al.,
2008), particularly because there are differences in the quality of functional annotation for
different genomes. Further meta-analysis using compiled experimental data across species, which
is routinely done in evo-devo studies of floral evolution, for example (Fornoni et al., 2016),
may provide the required resolution and accuracy to assess functional differences between
duplicates and singletons.

Taken together, duplicate and singleton genes have significantly different sequence properties,
expression patterns, molecular functions, and biological roles. These properties can be used to
construct quantitative models that can predict whether a duplicate will be retained or not. Among
the properties that distinguish duplicates from singletons, which ones are more important? Based
on duplicate genes from six flowering plant species, the number of unique protein domains (another
metric for the number of functions), nucleotide diversity, and GC content were found to be the
most important predictors (Jiang et al., 2013). In another study, where multiple sequence
properties, measures of conservation, molecular functions, and biological network information
were consolidated in a single quantitative model to predict retention (Moghe et al., 2014), it
was shown that no single property could accurately distinguish between duplicates and singletons.
However, models that used only GO terms, sequence, and conservation features could predict
duplicates equally well as models including the full set of features (Moghe et al., 2014),
highlighting the importance of these features in duplicate retention.

EVOLUTIONARY, ECOLOGICAL, AND AGRONOMIC IMPACTS OF GENE DUPLICATION

Contribution of Duplicate Genes to Evolutionary Novelty

A common theme among models of duplicate retention is that, for both copies of a gene to be
maintained, differences in function, expression, or interaction occur in one if not both
paralogs. In some cases, duplicates will acquire novel functions and contribute to evolutionary
novelties. For certain duplicates, evidence for a novel function can be seen in the morphological
phenotypes caused when they are knocked out, and the presence of such a phenotype is correlated
with the extent of sequence and expression divergence (Hanada et al., 2009b). The impact of
duplication on evolution has been discussed in depth (Van de Peer et al., 2009b). Here, we focus
on plant examples and classify evolutionary novelties into three types: (1) novel molecular
function (e.g. expression in a novel context or interaction with a new protein); (2) novel plant
structure/function that results from a new molecular function (e.g. a new floral organ or a new
cell type); and (3) novel adaptive traits that result from a novel structure/function (e.g.
disease resistance). This

patterns (Liu et al., 2011). Novel leaf gene expression patterns also are observed for some maize
duplicate genes (Hughes et al., 2014). However, a novel expression pattern is not necessarily
adaptive. In the case of novel environmental responses, some of these responses may be adaptive
(Zou et al., 2009b), but direct evidence is not available. Duplication also has contributed to
the generation of novel metabolic functions. Specialized metabolism genes are retained
preferentially following tandem duplication (Yu et al., 2015), and tandemly duplicated clusters
have been found in multiple plant species (Kliebenstein and Osbourn, 2012; Nützmann and Osbourn,
2014). Interestingly, some specialized metabolic genes likely arose from neofunctionalized
duplicates of primary metabolism genes (Qi et al., 2006), and further duplications of specialized
metabolic genes have likely contributed to additional novel biochemical activities (Field and
Osbourn, 2008; Takos et al., 2011). In some cases, these duplicates are implicated in defense
against herbivores and microbes as well as in attracting pollinators (Mizutani and Ohta, 2010;
Moghe and Last, 2015). However, the adaptive significance of most novel biochemical activities
has yet to be demonstrated.

There also is evidence that duplicate genes have contributed to novel plant structures/functions,
and in some cases, there is clear adaptive significance. A key example is the specification of
floral organ identity by MADS box transcription factors (Bowman et al., 1989; Sommer et al.,
1990; Coen and Meyerowitz, 1991; Theissen, 2001). Although the MADS box gene family originated
prior to the divergence between major eukaryotic lineages (Gramzow et al., 2010), extensive
duplication events have resulted in the expansion of MADS box genes involved in floral
development (Purugganan et al., 1995; Kramer et al., 1998, 2006; Causier et al., 2010). Many MADS
box paralogs appear to have redundant functions (Vandenbussche et al., 2004; de Martino et al.,
2006; Geuten and Irish, 2010), but others have clearly diverged in function (Airoldi and Davies,
2012). Some examples include the divergence of the AP3/TM6 paralogs in petunia (Petunia hybrida)
due to a change in regulatory elements (Rijpkema et al., 2006), the modification of preexisting
floral regulators leading to the development of novel floral organ types in the Ranunculales
(Kramer et al., 2007; Rasmussen et al., 2009), and the lineage-specific expansion and divergence
of the DEFICIENS genes in orchids (Orchidaceae spp.), giving rise to unique floral features
(Mondragón-Palomino and Theissen, 2011).
distinction is important because the novel functions/ These examples highlight the importance of duplication
structures in types 1 and 2 need not be adaptive and, and divergence of floral identity genes as a major con-
thus, may be novel without being selected for. What is tributor to the evolution of floral morphology. This
the evidence that gene duplication has contributed to process also has likely been important to the evolution
these three types of novelties in plants? of vegetative organs. For example, a duplicated KNOX
Novel molecular functions acquired after gene du- transcription factor has acquired a novel regulatory
plication can easily be seen in the context of gene ex- pattern that regulates leaf shape and aboveground ar-
pression. For example, Arabidopsis duplicates have chitecture in plants (Furumizu et al., 2015). Finally,
likely accumulated novel environmental responses duplication can contribute to the interactions of plants
(Zou et al., 2009b) and novel developmental regulatory with other organisms: the duplication of a receptor-like
kinase gene originally involved in mycorrhizal symbiosis gave rise to the lysin motif receptor-like
kinase SlLYK10 in tomato (Solanum lycopersicum), which likely adopted a new role in nodulation with
clear adaptive significance (Buendia et al., 2016). These are but a few examples of evolutionary
novelties that can be found in the large body of plant genomic and functional studies.

It should be emphasized that a novel molecular function may not necessarily contribute to a novel
plant structure/function. In addition, a novel plant structure/function may not necessarily be
adaptive. The challenge lies in not only determining the presence of a novel activity but also
assessing its adaptive significance. Using the disease resistance (R) genes as an example, tandem
duplication (Rizzon et al., 2006; Yu et al., 2015) and WGD (Cannon et al., 2002; Plocik et al., 2004;
Zhang et al., 2014) have contributed to a net gain of R genes over the course of plant evolution.
Based on detailed functional studies of selected R genes, they are important for eliciting proper
host defense responses; thus, their presence is clearly adaptive (Dangl and Jones, 2001; Jones and
Dangl, 2006). In addition, strong positive selection driving sequence divergence between some R gene
duplicates has been noted (Bakker et al., 2006; Ratnaparkhe et al., 2011). Thus, the proliferation
of R genes has been hypothesized to be an indication of their adaptive value (Holub, 2001).
Nonetheless, R genes belong to one of the most highly variable gene families (Clark et al., 2007)
and tend to have an excess of pseudogenes (Zou et al., 2009a). A substantial number of R genes may
not be under selection, and for those that are, assessing the adaptive significance is challenging
because the biotic factors that interact with the R gene products are largely unknown. This
challenge is not specific to R genes but applies to any duplicate gene thought to contribute to
increased fitness.

Ecological Impacts of Duplicate Genes

The evolutionary innovations ascribed to duplicate genes can have important ecological implications.
Gene duplication has contributed to developmental novelties that facilitate interactions between
plants and other species. For example, the evolution of floral characteristics is strongly
associated with functional groups of pollinators (Fenster et al., 2004). The concept of a pollinator
syndrome suggests that selective pressure exerted by pollinators results in the convergent evolution
of a common set of floral traits. In orchids, the majority of species have only a single pollinator
(Tremblay, 1992), but the evolution of specialized morphology for certain pollinators has occurred
multiple times (Johnson et al., 1998). According to the orchid code hypothesis, the proximal cause
of the diversity of orchid floral structures can be attributed to two DEFICIENS-like transcription
factor duplication events followed by gain and/or loss of function in different orchid lineages
(Mondragón-Palomino and Theissen, 2008, 2009). Similarly, the duplication and subsequent
diversification of CYCLOIDEA2-like transcription factors in Malpighiaceae is thought to have been
important for the evolution of bilateral symmetry (zygomorphic) in flowers from radial symmetry
(actinomorphic; Zhang et al., 2010).

In addition to developmental novelties that facilitate ecological interactions, duplicate genes are
important components in the evolutionary arms race between plants and pathogens/herbivores. Plant
defense against herbivores involves specialized metabolites such as the glucosinolates in
Brassicales (Demain and Fang, 2000; Halkier and Gershenzon, 2006) and novel glucosinolate pathway
components, which likely derived from duplication events (Edger et al., 2015). In particular, the
retention of the core Trp pathway gene duplicates derived from the β WGD event appears to have led
to the evolution of the glucosinolate pathway (Edger et al., 2015). Similarly, the previously noted
function and expansion of R gene families are suggestive of the importance of R gene duplication in
the interactions between plants and pathogens.

Interestingly, aside from plant-nonplant interactions, both R genes and specialized metabolic genes
also can impact plant speciation. Speciation results from barriers in gene flow, which can arise at
different times during development and can involve many different mechanisms, from molecular to
environmental (Rieseberg and Willis, 2007). At the genetic level, incompatibility between related
species can be explained by the Bateson-Dobzhansky-Muller model, which involves negative epistasis
between two or more divergent loci in hybrids (Orr, 1996). After gene duplication, differential loss
of function or patterns of subfunctionalization between duplicate loci can, in some situations,
result in net loss of function for certain gamete combinations (Force and Lynch, 2000).

Multiple examples of negative interactions resulting from copy number variation and reciprocal
silencing of duplicate genes were observed in a survey of speciation genes (Rieseberg and Blackman,
2010). For example, R gene duplicates have been hypothesized to play an important role in speciation
due to their ability to induce necrosis in hybrids (Bomblies and Weigel, 2007). Defensive compounds
such as glucosinolates may function as part of a pollinator syndrome (Demain and Fang, 2000; Halkier
and Gershenzon, 2006), which can serve as a reproductive barrier. Hybrid incompatibility has been
shown to result from the differential expression of tandemly duplicated receptor-like kinases that
are involved in innate immunity in Arabidopsis (Smith et al., 2011) and from negative interactions
between tandemly duplicated clusters of receptor-like kinases and subtilisin-like proteases in wild
rice (Oryza rufipogon) and domesticated rice (Chen et al., 2014).

Because a change in ploidy levels is expected to result in instant reproductive isolation, WGD is
seen as a major mechanism of plant speciation (Ramsey and
Schemske, 1998). Consistent with this expectation, there is a significant correlation between the
presence of a recent WGD event and the number of extant species in Brassicaceae, Cleomaceae,
Fabaceae, Poaceae, and Solanaceae (Soltis et al., 2009). Similarly, an estimated 15% of speciation
events in angiosperms and 31% in ferns were associated with an increase in ploidy (Wood et al.,
2009). Duplication also has been associated with the evolution of interspecies interaction in plants
(Edger et al., 2015) as well as the evolution of novel structures/functions in angiosperms (Soltis
and Soltis, 2016), which are thought to be important to speciation and diversification. The
radiation of plant species appears to lag significantly behind WGDs, which has led to the
proposition that WGD and the subsequent development of novel traits primes a population for
speciation by a subsequent dispersal event (Schranz et al., 2012; Tank et al., 2015). Consistent
with this model, the timing of recent ploidy events in angiosperms is clustered around the
Cretaceous-Paleogene extinction (Vanneste et al., 2014), and WGD events and speciation events in
conifers occurred around the time of the Permian-Triassic extinction (Lu et al., 2014; Li et al.,
2015b). However, while the lag-modulated association between WGD and diversification has been shown
to be significant (Tank et al., 2015), the connections between WGD and species radiation are
correlational, not cause and effect. Neopolyploid lineages are more prone to extinction than
diploids (Mayrose et al., 2011), which is consistent with the observation that the rate of
neopolyploidization in plants is high yet detectable speciation events due to WGD are relatively
rare (Otto and Whitton, 2000; Ramsey and Schemske, 2002; Jiao et al., 2011; Moghe and Shiu, 2014).
Given that speciation by WGD is rare, the correlation between WGD and radiating events may be
because they are unlikely to be observed in association with dispersed events.

Taken together, gene duplications, both small and large scale, may be important in plant adaptation
to variable abiotic and biotic environments, in speciation, and in contributing to the diversity of
angiosperm species. Although there is indirect evidence, direct demonstration of the role of
duplicate genes in ecological adaptation is challenging to come by. In a landmark study examining
the genetic basis of how plants adapt to their local environment, 15 fitness quantitative trait loci
were identified using an Arabidopsis recombinant inbred population derived from a Swedish accession
and an Italian accession (Ågren et al., 2013). Interestingly, one major quantitative trait locus
contains the C-REPEAT-BINDING FACTOR (CBF) locus, which has three CBF genes known to control plant
cold tolerance (Gehan et al., 2015). In the Italian population, CBF2 is nonfunctional, resulting in
reduced cold tolerance. Although this is not an example of a new duplicate attaining a novel,
adaptive function, it provides direct evidence of the importance of duplicate genes in influencing
species range due to abiotic constraints.

Contribution of Duplicate Genes to Agronomic Traits

Gene duplication facilitates the evolution of novel traits that can be subjected not only to natural
selection but also to artificial selection, which is important to crop improvement. A recent review
has summarized studies revealing that duplicate genes derived from polyploidy can be key to crop
domestication and the evolution of stress resistance/tolerance traits (Renny-Byfield and Wendel,
2014). Many crop plants have experienced relatively recent WGDs: 1.5 MYA in G. hirsutum (Li et al.,
2015a), 13 MYA in soybean (Schmutz et al., 2010), and 0.5 MYA and approximately 3.5 MYA in wheat
(Brenchley et al., 2012). In bread wheat, polyploidy contributed to the grain free-threshing
character controlled by complex interactions between the duplicate Q transcription factors (Zhang et
al., 2011) and to the soft grain character controlled by the Hardness locus (Chantret et al., 2005).
It is also hypothesized that polyploidization has contributed to the expansion of wheat storage
protein genes (Brenchley et al., 2012). In G. hirsutum, the recent WGD has resulted in up-regulation
and increased selection of fiber genes on the A(t) subgenome compared with the progenitor genes,
which is correlated with the production of longer, spinnable fibers (Li et al., 2015a). Similarly,
in soybean, nodulation and oil production gene duplicates tend to be retained post WGD (Schmutz et
al., 2010) and may contribute to soybean domestication traits. Based on comparative genomic
evidence, it is also argued that selection in Brassica napus has led to the preservation of
duplicate oil biosynthesis genes (Chalhoub et al., 2014). Although not all of the above examples
provide direct evidence, they are suggestive of the impact of gene duplication, particularly
polyploidy, on agronomically relevant traits.

In addition to WGD, smaller scale duplications can contribute to agronomic traits. Given that a
number of genes involved in plant stress tolerance or resistance are found in tandem clusters and
display copy number variation (Ellis et al., 2000), it is expected that tandem duplication has
played a major role in these traits. For example, the NBS-LRR gene family, whose members have roles
in biotic stress resistance, is highly variable between rice cultivars (Yang et al., 2008). In terms
of abiotic stress tolerance, one prominent example is the rice Submergence1 (Sub1) tandem cluster,
which contains multiple ethylene-responsive factor genes (Sub1A, Sub1B, and Sub1C) and is involved
in submergence tolerance (Fukao et al., 2006). Through comparisons of rice cultivars and wild rice
species, it was shown that the Sub1A haplotype likely arose recently, potentially after rice
domestication (Fukao et al., 2009).

Tandem duplicates also are implicated in the evolution of other agronomic traits. For example, copy
number variation in a tandem duplicated region controls rice grain size diversity, an important
quality trait (Wang et al., 2015b). This region contains two copies of Grain Length on Chromosome7
(GL7) that are homologs of the Arabidopsis LONGIFOLIA gene, which regulates
cell elongation. Tandem duplication leads to elevated levels of GL7 expression and increased grain
length (Wang et al., 2015b). Based on comparisons of wild and domesticated pepper (Capsicum annuum)
species, tandem duplicate genes involved in capsaicin biosynthesis have likely contributed to the
diversification of pungency in peppers (Qin et al., 2014), which may be the basis of pungency
variation among domesticated pepper varieties. Additionally, domesticated tomato provides an example
where an agronomic trait was influenced by transposon-mediated duplication. Fruit shape difference
between domesticated and wild (Solanum pimpinellifolium) tomato is due to the increased expression
of IQD12 after retrotransposon-mediated duplication to a new genomic region (Xiao et al., 2008).

In the previous sections, we have discussed several domestication-related traits that can be
attributed to gene duplications. One obvious omission is flowering time. When plants are grown in
new environments, the change in photoperiod makes selection for proper flowering time essential. In
sunflower (Helianthus annuus), the expression of Flowering Locus T (FT) duplicates is central to the
control of flowering time (Blackman et al., 2010). During sunflower domestication, a dominant
negative mutation likely occurred in the FT1 duplicate and contributed to delayed flowering. Another
example is the Heading Date1 (HD1) locus, which encodes a member of the CONSTANS transcription
factor family (Liu et al., 2015). In sorghum (Sorghum bicolor), foxtail millet (Setaria italica),
and rice, mutant HD1 orthologs have contributed to delayed flowering time; however, the presence of
distinct mutations in each lineage suggests independent origins over the course of parallel
domestication (Liu et al., 2015). In summary, because of their abundance and the potential for
functional divergence and the acquisition of new functions, duplicate genes have contributed to the
evolution of morphological, nutritional, and physiological traits in crops.

FUTURE DIRECTIONS

Studies based on accumulating comparative and functional genomic data have contributed to our
understanding of the life cycle of duplicated genes, including their origins, longevity, mechanisms
of retention, molecular functional implications, and impacts on plant evolution and ecology.
Nonetheless, there are still many unresolved questions (see “Outstanding Questions”). Although
metrics like half-life provide a good indicator of the average behavior of duplicate genes, there is
large variance in duplicate longevity (Lynch and Conery, 2003; Maere et al., 2005). Why are some
duplicates retained longer than predicted by the average half-life? What are the factors
contributing to short-lived duplicates? The answers to these questions lie in a better understanding
of retention mechanisms, including neofunctionalization, dosage effect, EAC, DDC, gene balance, and
paralog interference. In particular, the challenge is to determine the relative contributions of
these retention mechanisms. Another challenge is that knowledge of functional divergence alone is
insufficient to distinguish between retention mechanisms. Knowledge of ancestral functions and
expression state, which can only be inferred, also is required.

The retention mechanisms outlined above all involve natural selection on existing and/or novel
functions. Thus, there must be a fitness cost if one of the duplicates is lost. There are but a few
studies that have directly addressed the fitness contribution of duplicates (DeLuna et al., 2008;
Qian and Zhang, 2014). Instead, the great majority of studies on plant duplicate genes have focused
on morphological, developmental, and/or physiological phenotypes of loss-of-function mutants in
highly controlled environments. In many cases, the lack of a phenotype when one duplicate is lost is
attributed to genetic redundancy (Hanada et al., 2009a). Although genetic redundancy is an authentic
phenomenon, one cannot rule out the possibility that the specific environment requiring the function
of the apparently redundant duplicate has not yet been identified. It is also possible that there
are subtle phenotypes that remain undetected but have significant fitness consequences (Ågren et
al., 2013). Testing these two possibilities requires assessing the fitness cost of loss-of-function
mutations in the field, preferably in native environments. Recent studies of Arabidopsis local
adaptation show that it is feasible to detect minute fitness effects (Ågren et al., 2013). In
addition, especially for recently duplicated genes (i.e. from duplication events that occurred
approximately 1 MYA), one cannot rule out the possibility that some duplicates persist because not
enough time has passed for them to be removed by genetic drift, even though they are no longer
functional (Kimura and Ohta, 1969). This is particularly true for selfing plants like Arabidopsis,
where even deleterious alleles are not efficiently eliminated (Bustamante et al., 2002). Given the
diversity in life histories and environments encountered by different plant species, knowledge of
the past history (i.e. history of selection, bottlenecks, gene flow, changes in effective population
size, and mating system) will be necessary to better estimate the contribution of drift to the
persistence of duplicates. This knowledge can be partially acquired from comparative studies of
variation in duplicate gene content within and between related plant species. Initiatives like the
Arabidopsis 1001 Genomes Project (Cao et al., 2011) and parallel efforts in other species may soon
provide some insights in this regard.

Genome-wide studies of duplicate genes have revealed that retained duplicates tend to have
particular patterns at the sequence, expression, and molecular function levels (Ganko et al., 2007;
Carretero-Paulet and Fares, 2012; Jiang et al., 2013). However, each pattern can only marginally
predict duplicate retention (Jiang et al., 2013; Moghe et al., 2014), and there remains much
unexplained variance even when most factors that have been shown to be correlated with retention are
combined
in a single predictive framework (Jiang et al., 2013; Moghe et al., 2014). This suggests that
additional, unknown factors may contribute to duplicate retention. Considering that the function of
a gene is directly or indirectly influenced by many other genes, the retention of a duplicate gene
should be influenced by other genes that are closely linked to the duplicate in the gene network
(i.e. in the same functional module), a possibility consistent with the expected outcome of the
gene-balance model (Birchler et al., 2005; Birchler and Veitia, 2007, 2010). This network idea can
be pushed to an even higher level of organization by asking how other modules influence the
retention of duplicate genes in an entire module. Thus, a systems-level understanding (i.e.
knowledge of the architecture of the gene networks and the nature of the connections between genes)
is essential to assess how duplicates are influenced by other network components. This knowledge
will be helpful not only for addressing the rather esoteric question of how duplicate genes are
retained but also for understanding how duplicate genes collectively influence molecular functions,
physiology, and development in plants. For example, due to the high rate of transcription factor
retention, a gene is not only regulated by transcription factors from different families but also
bound by multiple members of the same family (Macneil and Walhout, 2011). The expression level of
the gene in question is thus determined by a large number of duplicate factors. Without a
systems-level understanding of how duplicates differ in their functions, it will not be possible to
gain a complete picture of how a gene is regulated.

Ultimately, our interest in studying duplicate genes lies in their evolutionary, ecological, and
agronomic impacts. Although the acquisition of novel molecular functions among duplicates is common
(Blanc and Wolfe, 2004b), this alone is not sufficient to conclude that there was an impact on
evolution. An apparently novel function (e.g. a new biochemical activity, pattern of expression, or
interaction) with no effect on fitness could have been fixed by genetic drift. When a duplicate pair
is retained because of dosage sensitivity and paralog interference, an unrelated change in function
may be misidentified as a functional novelty. Furthermore, apparent novelty at the single gene level
may result from larger scale changes in the genome following WGD, such as fractionation (Schnable et
al., 2011). To be able to claim that natural selection is important, either a direct study of
fitness or molecular evidence of a nonneutral mutation is required. Similarly, the contribution of
duplication to the diversification of a multitude of plant traits has been explored, but few have
examined the adaptive significance of those traits (Ågren et al., 2013; Gehan et al., 2015). More
examples illustrating the impact of novel traits on plant adaptation are needed.

Understanding the impact of duplicated genes is important in light of the challenges facing
agriculture in the 21st century, including both the old problems of yield, disease resistance, and
stress tolerance as well as new issues related to global climate change. Addressing the grand
challenge of food security will not only require improving our ability to modify plant traits
(Halpin, 2005) but also our ability to identify the causative loci of desirable traits (Mickelbart
et al., 2015) and the genomic context in which they exist (Vaughan et al., 2007). In this regard, a
continuing effort to understand how duplicate genes have contributed to novel functions, expansion
of gene families, and the structure of the genome as a whole is necessary. Considering how duplicate
genes have contributed to evolutionary novelties and diversity in plants, understanding the
evolution of duplicate gene functions holds the key to understanding the future of both natural and
domesticated populations, particularly in light of impending environmental shift due to global
climate change.

OUTSTANDING QUESTIONS

- Why are some duplicates retained for longer periods of time than expected, and what factors
  contribute to differences in duplicate gene longevity?
- To what extent are duplicate genes functionally redundant? Is pure genetic redundancy possible?
- What are the systems-level impacts of gene duplication? Are these impacts dependent on duplication
  mechanism?
- What are the relative contributions of retention mechanisms, singly or in combination, to the
  maintenance of duplicate pairs?
- What is the adaptive significance of novel structures and functions that are derived from gene
  duplication?
- How has gene duplication contributed to the evolution of new species, particularly those of
  ecological, evolutionary, and agricultural significance?

Received April 2, 2016; accepted May 17, 2016; published June 10, 2016.

LITERATURE CITED

Abdelsamad A, Pecinka A (2014) Pollen-specific activation of Arabidopsis retrogenes is associated
with global transcriptional reprogramming. Plant Cell 26: 3299–3313
Ågren J, Oakley CG, McKay JK, Lovell JT, Schemske DW (2013) Genetic mapping of adaptation reveals
fitness tradeoffs in Arabidopsis thaliana. Proc Natl Acad Sci USA 110: 21077–21082
Airoldi CA, Davies B (2012) Gene duplication and the evolution of plant MADS-box transcription
factors. J Genet Genomics 39: 157–165
Akhunov ED, Sehgal S, Liang H, Wang S, Akhunova AR, Kaur G, Li W, Forrest KL, See D, Simková H, et al
(2013) Comparative analysis of syntenic genes in grass genomes reveals accelerated rates of gene
structure and coding sequence evolution in polyploid wheat. Plant Physiol 161: 252–265
Aklilu BB, Culligan KM (2016) Molecular evolution and functional diversification of Replication
Protein A1 in plants. Front Plant Sci 7: 33
Aklilu BB, Soderquist RS, Culligan KM (2014) Genetic analysis of the Replication Protein A large
subunit family in Arabidopsis reveals unique and overlapping roles in DNA repair, meiosis and DNA
replication. Nucleic Acids Res 42: 3104–3118
Alleman M, Freeling M (1986) The Mu transposable elements of maize: evidence for transposition and
copy number regulation during development. Genetics 112: 107–119


Altschul SF, Madden TL, Schäffer AA, Zhang J, Zhang Z, Miller W, of the bread wheat genome using whole-genome shotgun sequencing.
Lipman DJ (1997) Gapped BLAST and PSI-BLAST: a new generation of Nature 491: 705–710
protein database search programs. Nucleic Acids Res 25: 3389–3402 Bridgham JT, Brown JE, Rodríguez-Marí A, Catchen JM, Thornton JW
Alvarez-Ponce D, Fares MA (2012) Evolutionary rate and duplicability in (2008) Evolution of a new function by degenerative mutation in ceph-
the Arabidopsis thaliana protein-protein interaction network. Genome alochordate steroid receptors. PLoS Genet 4: e1000191
Biol Evol 4: 1263–1274 Brockington SF, Yang Y, Gandia-Herrero F, Covshoff S, Hibberd JM,
Ambrosino L, Bostan H, di Salle P, Sangiovanni M, Vigilante A, Chiusano Sage RF, Wong GK, Moore MJ, Smith SA (2015) Lineage-specific gene
ML (2016) pATsi: paralogs and singleton genes from Arabidopsis thali- radiations underlie the evolution of novel betalain pigmentation in
ana. Evol Bioinform Online 12: 1–7 Caryophyllales. New Phytol 207: 1170–1180
Amoutzias GD, Robertson DL, Van de Peer Y, Oliver SG (2008) Choose Brosius J (1991) Retroposons: seeds of evolution. Science 251: 753
your partners: dimerization in eukaryotic transcription factors. Trends Buendia L, Wang T, Girardin A, Lefebvre B (2016) The LysM receptor-like
Biochem Sci 33: 220–229 kinase SlLYK10 regulates the arbuscular mycorrhizal symbiosis in to-
Arabidopsis Genome Initiative (2000) Analysis of the genome sequence of mato. New Phytol 210: 184–195
the flowering plant Arabidopsis thaliana. Nature 408: 796–815 Bustamante CD, Nielsen R, Sawyer SA, Olsen KM, Purugganan MD,
Arabidopsis Interactome Mapping Consortium (2011) Evidence for net- Hartl DL (2002) The cost of inbreeding in Arabidopsis. Nature 416: 531–
work evolution in an Arabidopsis interactome map. Science 333: 601– 534
607 Byrne DH, Jelenkovic G (1976) Cytological diploidization in the cultivated
Arsovski AA, Pradinuk J, Guo XQ, Wang S, Adams KL (2015) Evolution of octoploid strawberry Fragaria × ananassa. Can J Genet Cytol 18: 653–
cis-regulatory elements and regulatory networks in duplicated genes of 659
Arabidopsis. Plant Physiol 169: 2982–2991 Cannon SB, Zhu H, Baumgarten AM, Spangler R, May G, Cook DR,
Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM, Davis Young ND (2002) Diversity, distribution, and ancient taxonomic rela-
AP, Dolinski K, Dwight SS, Eppig JT, et al (2000) Gene Ontology: tool tionships within the TIR and non-TIR NBS-LRR resistance gene sub-
for the unification of biology. Nat Genet 25: 25–29 families. J Mol Evol 54: 548–562
Bailey JA, Gu Z, Clark RA, Reinert K, Samonte RV, Schwartz S, Adams Cao J, Schneeberger K, Ossowski S, Günther T, Bender S, Fitz J, Koenig
MD, Myers EW, Li PW, Eichler EE (2002) Recent segmental duplica- D, Lanz C, Stegle O, Lippert C, et al (2011) Whole-genome se-
tions in the human genome. Science 297: 1003–1007 quencing of multiple Arabidopsis thaliana populations. Nat Genet 43:
Bailey JA, Liu G, Eichler EE (2003) An Alu transposition model for the 956–963
origin and expansion of human segmental duplications. Am J Hum Carretero-Paulet L, Fares MA (2012) Evolutionary dynamics and func-
Genet 73: 823–834 tional specialization of plant paralogs formed by whole and small-scale
Baker CR, Hanson-Smith V, Johnson AD (2013) Following gene duplica- genome duplications. Mol Biol Evol 29: 3541–3551
tion, paralog interference constrains transcriptional circuit evolution. Casneuf T, De Bodt S, Raes J, Maere S, Van de Peer Y (2006) Nonrandom
Science 342: 104–108 divergence of gene expression following gene and genome duplications
Bakker EG, Toomajian C, Kreitman M, Bergelson J (2006) A genome-wide in the flowering plant Arabidopsis thaliana. Genome Biol 7: R13
survey of R gene polymorphisms in Arabidopsis. Plant Cell 18: 1803–1818 Causier B, Castillo R, Xue Y, Schwarz-Sommer Z, Davies B (2010) Tracing
Beilstein MA, Nagalingum NS, Clements MD, Manchester SR, Mathews the evolution of the floral homeotic B- and C-function genes through
S (2010) Dated molecular phylogenies indicate a Miocene origin for genome synteny. Mol Biol Evol 27: 2651–2664
Arabidopsis thaliana. Proc Natl Acad Sci USA 107: 18724–18728 Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J,
Bekaert M, Edger PP, Pires JC, Conant GC (2011) Two-phase resolution of Belcram H, Tong C, Samans B, et al (2014) Early allopolyploid evolu-
polyploidy in the Arabidopsis metabolic network gives rise to relative tion in the post-Neolithic Brassica napus oilseed genome. Science 345:
and absolute dosage constraints. Plant Cell 23: 1719–1728 950–953
Bennetzen JL (2005) Transposable elements, gene creation and genome Chantret N, Salse J, Sabot F, Rahman S, Bellec A, Laubin B, Dubois I,
rearrangement in flowering plants. Curr Opin Genet Dev 15: 621–627 Dossat C, Sourdille P, Joudrier P, et al (2005) Molecular basis of evo-
Benovoy D, Drouin G (2006) Processed pseudogenes, processed genes, and lutionary events that shaped the hardness locus in diploid and polyploid
spontaneous mutations in the Arabidopsis genome. J Mol Evol 62: 511– wheat species (Triticum and Aegilops). Plant Cell 17: 1033–1045
522 Chapman BA, Bowers JE, Feltus FA, Paterson AH (2006) Buffering of
Birchler JA, Riddle NC, Auger DL, Veitia RA (2005) Dosage balance in crucial functions by paleologous duplicated genes may contribute cy-
gene regulation: biological implications. Trends Genet 21: 219–226 clicality to angiosperm genome duplication. Proc Natl Acad Sci USA
Birchler JA, Veitia RA (2007) The gene balance hypothesis: from classical 103: 2730–2735
genetics to modern genomics. Plant Cell 19: 395–402 Chaudhary B, Flagel L, Stupar RM, Udall JA, Verma N, Springer NM,
Birchler JA, Veitia RA (2010) The gene balance hypothesis: implications for Wendel JF (2009) Reciprocal silencing, transcriptional bias and func-
gene regulation, quantitative traits and evolution. New Phytol 186: tional divergence of homeologs in polyploid cotton (Gossypium). Ge-
54–62 netics 182: 503–517
Blackman BK, Strasburg JL, Raduski AR, Michaels SD, Rieseberg LH Chen C, Chen H, Lin YS, Shen JB, Shan JX, Qi P, Shi M, Zhu MZ, Huang
(2010) The role of recently derived FT paralogs in sunflower domesti- XH, Feng Q, et al (2014) A two-locus interaction causes interspecific
cation. Curr Biol 20: 629–635 hybrid weakness in rice. Nat Commun 5: 3357
Blanc G, Hokamp K, Wolfe KH (2003) A recent polyploidy superimposed Cheng F, Wu J, Fang L, Sun S, Liu B, Lin K, Bonnema G, Wang X (2012)
on older large-scale duplications in the Arabidopsis genome. Genome Biased gene fractionation and dominant gene expression among the
Res 13: 137–144 subgenomes of Brassica rapa. PLoS ONE 7: e36442
Blanc G, Wolfe KH (2004a) Widespread paleopolyploidy in model plant Clark RM, Schweikert G, Toomajian C, Ossowski S, Zeller G, Shinn P,
species inferred from age distributions of duplicate genes. Plant Cell 16: Warthmann N, Hu TT, Fu G, Hinds DA, et al (2007) Common sequence
1667–1678 polymorphisms shaping genetic diversity in Arabidopsis thaliana. Sci-
Blanc G, Wolfe KH (2004b) Functional divergence of duplicated genes ence 317: 338–342
formed by polyploidy during Arabidopsis evolution. Plant Cell 16: 1679– Coen ES, Meyerowitz EM (1991) The war of the whorls: genetic interac-
1691 tions controlling flower development. Nature 353: 31–37
Bomblies K, Weigel D (2007) Hybrid necrosis: autoimmunity as a potential Conant GC, Birchler JA, Pires JC (2014) Dosage, duplication, and dip-
gene-flow barrier in plant species. Nat Rev Genet 8: 382–393 loidization: clarifying the interplay of multiple models for duplicate
Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling an- gene evolution over time. Curr Opin Plant Biol 19: 91–98
giosperm genome evolution by phylogenetic analysis of chromosomal Conant GC, Wolfe KH (2008) Turning a hobby into a job: how duplicated
duplication events. Nature 422: 433–438 genes find new functions. Nat Rev Genet 9: 938–950
Bowman JL, Smyth DR, Meyerowitz EM (1989) Genes directing flower Dangl JL, Jones JD (2001) Plant pathogens and integrated defence re-
development in Arabidopsis. Plant Cell 1: 37–52 sponses to infection. Nature 411: 826–833
Brenchley R, Spannagl M, Pfeifer M, Barker GL, D’Amore R, Allen AM, Dayhoff MO (1976) The origin and evolution of protein superfamilies. Fed
McKenzie N, Kramer M, Kerhornou A, Bolser D, et al (2012) Analysis Proc 35: 2132–2138

Dehal P, Boore JL (2005) Two rounds of whole genome duplication in the ancestral vertebrate. PLoS Biol 3: e314
DeLuna A, Vetsigian K, Shoresh N, Hegreness M, Colón-González M, Chao S, Kishony R (2008) Exposing the fitness contribution of duplicated genes. Nat Genet 40: 676–681
Demain AL, Fang A (2000) The natural functions of secondary metabolites. Adv Biochem Eng Biotechnol 69: 1–39
de Martino G, Pan I, Emmanuel E, Levy A, Irish VF (2006) Functional analyses of two tomato APETALA3 genes demonstrate diversification in their roles in regulating floral development. Plant Cell 18: 1833–1845
Derelle E, Ferraz C, Rombauts S, Rouzé P, Worden AZ, Robbens S, Partensky F, Degroeve S, Echeynié S, Cooke R, et al (2006) Genome analysis of the smallest free-living eukaryote Ostreococcus tauri unveils many unique features. Proc Natl Acad Sci USA 103: 11647–11652
Des Marais DL, Rausher MD (2008) Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature 454: 762–765
D’Hont A, Denoeud F, Aury JM, Baurens FC, Carreel F, Garsmeur O, Noel B, Bocs S, Droc G, Rouard M, et al (2012) The banana (Musa acuminata) genome and the evolution of monocotyledonous plants. Nature 488: 213–217
Drouin G, Dover GA (1990) Independent gene evolution in the potato actin gene family demonstrated by phylogenetic procedures for resolving gene conversions and the phylogeny of angiosperm actin genes. J Mol Evol 31: 132–150
Du C, Fefelova N, Caronna J, He L, Dooner HK (2009) The polychromatic Helitron landscape of the maize genome. Proc Natl Acad Sci USA 106: 19916–19921
Duarte JM, Cui L, Wall PK, Zhang Q, Zhang X, Leebens-Mack J, Ma H, Altman N, dePamphilis CW (2006) Expression pattern shifts following duplication indicative of subfunctionalization and neofunctionalization in regulatory genes of Arabidopsis. Mol Biol Evol 23: 469–478
Durbin ML, McCaig B, Clegg MT (2000) Molecular evolution of the chalcone synthase multigene family in the morning glory genome. Plant Mol Biol 42: 79–92
Edger PP, Heidel-Fischer HM, Bekaert M, Rota J, Glöckner G, Platts AE, Heckel DG, Der JP, Wafula EK, Tang M, et al (2015) The butterfly plant arms-race escalated by gene and genome duplications. Proc Natl Acad Sci USA 112: 8362–8366
Ellis J, Dodds P, Pryor T (2000) Structure, function and evolution of plant disease resistance genes. Curr Opin Plant Biol 3: 278–284
Fenster CB, Armbruster WS, Wilson P, Dudash MR, Thomson JD (2004) Pollination syndromes and floral specialization. Annu Rev Ecol Evol Syst 35: 375–403
Ferguson AA, Zhao D, Jiang N (2013) Selective acquisition and retention of genomic sequences by Pack-Mutator-like elements based on guanine-cytosine content and the breadth of expression. Plant Physiol 163: 1419–1432
Field B, Osbourn AE (2008) Metabolic diversification: independent assembly of operon-like gene clusters in different plants. Science 320: 543–547
Fitch WM (1970) Distinguishing homologous from analogous proteins. Syst Zool 19: 99–113
Force A, Lynch M (2000) The origin of interspecific genomic incompatibility via gene duplication. Am Nat 165: 590–605
Force A, Lynch M, Pickett FB, Amores A, Yan YL, Postlethwait J (1999) Preservation of duplicate genes by complementary, degenerative mutations. Genetics 151: 1531–1545
Fornoni J, Ordano M, Pérez-Ishiwara R, Boege K, Domínguez CA (2016) A comparison of floral integration between selfing and outcrossing species: a meta-analysis. Ann Bot (Lond) 117: 299–306
Freeling M, Scanlon MJ, Fowler JE (2015) Fractionation and subfunctionalization following genome duplications: mechanisms that drive gene content and their consequences. Curr Opin Genet Dev 35: 110–118
Freeling M, Thomas BC (2006) Gene-balanced duplications, like tetraploidy, provide predictable drive to increase morphological complexity. Genome Res 16: 805–814
Freeling M, Woodhouse MR, Subramaniam S, Turco G, Lisch D, Schnable JC (2012) Fractionation mutagenesis and similar consequences of mechanisms removing dispensable or less-expressed DNA in plants. Curr Opin Plant Biol 15: 131–139
Fukao T, Harris T, Bailey-Serres J (2009) Evolutionary analysis of the Sub1 gene cluster that confers submergence tolerance to domesticated rice. Ann Bot (Lond) 103: 143–150
Fukao T, Xu K, Ronald PC, Bailey-Serres J (2006) A variable cluster of ethylene response factor-like genes regulates metabolic and developmental acclimation responses to submergence in rice. Plant Cell 18: 2021–2034
Furumizu C, Alvarez JP, Sakakibara K, Bowman JL (2015) Antagonistic roles for KNOX1 and KNOX2 genes in patterning the land plant body plan following an ancient gene duplication. PLoS Genet 11: e1004980
Gallardo MH, Bickham JW, Honeycutt RL, Ojeda RA, Köhler N (1999) Discovery of tetraploidy in a mammal. Nature 401: 341
Ganko EW, Meyers BC, Vision TJ (2007) Divergence in expression between duplicated genes in Arabidopsis. Mol Biol Evol 24: 2298–2309
Gehan MA, Park S, Gilmour SJ, An C, Lee CM, Thomashow MF (2015) Natural variation in the C-repeat binding factor cold response pathway correlates with local adaptation of Arabidopsis ecotypes. Plant J 84: 682–693
Geuten K, Irish V (2010) Hidden variability of floral homeotic B genes in Solanaceae provides a molecular basis for the evolution of novel functions. Plant Cell 22: 2562–2578
Geuten K, Viaene T, Irish VF (2011) Robustness and evolvability in the B-system of flower development. Ann Bot (Lond) 107: 1545–1556
Goodstein DM, Shu S, Howson R, Neupane R, Hayes RD, Fazo J, Mitros T, Dirks W, Hellsten U, Putnam N, et al (2012) Phytozome: a comparative platform for green plant genomics. Nucleic Acids Res 40: D1178–D1186
Gramzow L, Ritz MS, Theissen G (2010) On the origin of MADS-domain transcription factors. Trends Genet 26: 149–153
Graur D, Shuali Y, Li WH (1989) Deletions in processed pseudogenes accumulate faster in rodents than in humans. J Mol Evol 28: 279–285
Greilhuber J, Borsch T, Müller K, Worberg A, Porembski S, Barthlott W (2006) Smallest angiosperm genomes found in Lentibulariaceae, with chromosomes of bacterial size. Plant Biol (Stuttg) 8: 770–777
Guo H, Lee TH, Wang X, Paterson AH (2013) Function relaxation followed by diversifying selection after whole-genome duplication in flowering plants. Plant Physiol 162: 769–778
Guo X, Zhang Z, Gerstein MB, Zheng D (2009) Small RNAs originated from pseudogenes: cis- or trans-acting? PLOS Comput Biol 5: e1000449
Guo YL (2013) Gene family evolution in green plants with emphasis on the origination and evolution of Arabidopsis thaliana genes. Plant J 73: 941–951
Haberer G, Hindemitt T, Meyers BC, Mayer KF (2004) Transcriptional similarities, dissimilarities, and conservation of cis-elements in duplicated genes of Arabidopsis. Plant Physiol 136: 3009–3022
Halkier BA, Gershenzon J (2006) Biology and biochemistry of glucosinolates. Annu Rev Plant Biol 57: 303–333
Halpin C (2005) Gene stacking in transgenic plants: the challenge for 21st century plant biotechnology. Plant Biotechnol J 3: 141–155
Hanada K, Kuromori T, Myouga F, Toyoda T, Li WH, Shinozaki K (2009a) Evolutionary persistence of functional compensation by duplicate genes in Arabidopsis. Genome Biol Evol 1: 409–414
Hanada K, Kuromori T, Myouga F, Toyoda T, Shinozaki K (2009b) Increased expression and protein divergence in duplicate genes is associated with morphological diversification. PLoS Genet 5: e1000781
Hanada K, Vallejo V, Nobuta K, Slotkin RK, Lisch D, Meyers BC, Shiu SH, Jiang N (2009c) The functional role of pack-MULEs in rice inferred from purifying selection and expression profile. Plant Cell 21: 25–38
Hanada K, Zou C, Lehti-Shiu MD, Shinozaki K, Shiu SH (2008) Importance of lineage-specific expansion of plant tandem duplicates in the adaptive response to environmental stimuli. Plant Physiol 148: 993–1003
He C, Saedler H (2005) Heterotopic expression of MPF2 is the key to the evolution of the Chinese lantern of Physalis, a morphological novelty in Solanaceae. Proc Natl Acad Sci USA 102: 5779–5784
He X, Zhang J (2005) Rapid subfunctionalization accompanied by prolonged and substantial neofunctionalization in duplicate gene evolution. Genetics 169: 1157–1164
Hittinger CT, Carroll SB (2007) Gene duplication and the adaptive evolution of a classic genetic switch. Nature 449: 677–681
Holstege FC, Jennings EG, Wyrick JJ, Lee TI, Hengartner CJ, Green MR, Golub TR, Lander ES, Young RA (1998) Dissecting the regulatory circuitry of a eukaryotic genome. Cell 95: 717–728
Holub EB (2001) The arms race is ancient history in Arabidopsis, the wildflower. Nat Rev Genet 2: 516–527
Hu G, Koh J, Yoo MJ, Chen S, Wendel JF (2015a) Gene-expression novelty in allopolyploid cotton: a proteomic perspective. Genetics 200: 91–104
Hu Y, Liang W, Yin C, Yang X, Ping B, Li A, Jia R, Chen M, Luo Z, Cai Q, et al (2015b) Interactions of OsMADS1 with floral homeotic genes in rice flower development. Mol Plant 8: 1366–1384

Huang R, Hippauf F, Rohrbeck D, Haustein M, Wenke K, Feike J, Sorrelle N, Piechulla B, Barkman TJ (2012) Enzyme functional evolution through improved catalysis of ancestrally nonpreferred substrates. Proc Natl Acad Sci USA 109: 2966–2971
Hughes TE, Langdale JA, Kelly S (2014) The impact of widespread regulatory neofunctionalization on homeolog gene evolution following whole-genome duplication in maize. Genome Res 24: 1348–1355
Husband BC, Baldwin SJ, Suda J (2013) The incidence of polyploidy in natural plant populations: major patterns and evolutionary processes. In J Greilhuber, J Dolezel, JF Wendel, eds, Plant Genome Diversity. Volume 2. Physical Structure, Behavior and Evolution of Plant Genomes. Springer-Verlag, Vienna, pp 255–276
Innan H, Kondrashov F (2010) The evolution of gene duplications: classifying and distinguishing between models. Nat Rev Genet 11: 97–108
Jiang N, Bao Z, Zhang X, Eddy SR, Wessler SR (2004) Pack-MULE transposable elements mediate gene evolution in plants. Nature 431: 569–573
Jiang N, Ferguson AA, Slotkin RK, Lisch D (2011) Pack-Mutator-like transposable elements (Pack-MULEs) induce directional modification of genes through biased insertion and DNA acquisition. Proc Natl Acad Sci USA 108: 1537–1542
Jiang WK, Liu YL, Xia EH, Gao LZ (2013) Prevalent role of gene features in determining evolutionary fates of whole-genome duplication duplicated genes in flowering plants. Plant Physiol 161: 1844–1861
Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, et al (2011) Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97–100
Jin J, Zhang H, Kong L, Gao G, Luo J (2014) PlantTFDB 3.0: a portal for the functional and evolutionary study of plant transcription factors. Nucleic Acids Res 42: D1182–D1187
Johnson S, Linder H, Steiner K (1998) Phylogeny and radiation of pollination systems in Disa (Orchidaceae). Am J Bot 85: 402
Jones JD, Dangl JL (2006) The plant immune system. Nature 444: 323–329
Kaessmann H, Vinckenbosch N, Long M (2009) RNA-based gene duplication: mechanistic and evolutionary insights. Nat Rev Genet 10: 19–31
Kaltenegger E, Ober D (2015) Paralogue interference affects the dynamics after gene duplication. Trends Plant Sci 20: 814–821
Kapitonov VV, Jurka J (2007) Helitrons on a roll: eukaryotic rolling-circle transposons. Trends Genet 23: 521–529
Kejnovsky E, Leitch IJ, Leitch AR (2009) Contrasting evolutionary dynamics between angiosperm and mammalian genomes. Trends Ecol Evol 24: 572–582
Kellis M, Birren BW, Lander ES (2004) Proof and evolutionary analysis of ancient genome duplication in the yeast Saccharomyces cerevisiae. Nature 428: 617–624
Kimura M (1968) Evolutionary rate at the molecular level. Nature 217: 624–626
Kimura M (1983) The Neutral Theory of Molecular Evolution. Cambridge University Press, Cambridge, UK
Kimura M, King JL (1979) Fixation of a deleterious allele at one of two “duplicate” loci by mutation pressure and random drift. Proc Natl Acad Sci USA 76: 2858–2861
Kimura M, Ohta T (1969) The average number of generations until fixation of a mutant gene in a finite population. Genetics 61: 763–771
Kitano H (2004) Biological robustness. Nat Rev Genet 5: 826–837
Kliebenstein DJ, Osbourn A (2012) Making new molecules: evolution of pathways for novel metabolites in plants. Curr Opin Plant Biol 15: 415–423
Kondrashov F (2010) Gene dosage and duplication. In K Dittmar, DA Liberles, eds, Evolution after Gene Duplication. John Wiley & Sons, Hoboken, NJ, pp 215–218
Kramer EM, Dorit RL, Irish VF (1998) Molecular evolution of genes controlling petal and stamen development: duplication and divergence within the APETALA3 and PISTILLATA MADS-box gene lineages. Genetics 149: 765–783
Kramer EM, Holappa L, Gould B, Jaramillo MA, Setnikov D, Santiago PM (2007) Elaboration of B gene function to include the identity of novel floral organs in the lower eudicot Aquilegia. Plant Cell 19: 750–766
Kramer EM, Su HJ, Wu CC, Hu JM (2006) A simplified explanation for the frameshift mutation that created a novel C-terminal motif in the APETALA3 gene lineage. BMC Evol Biol 6: 30
Kubatova B, Travnicek P, Bastlova D, Curn V, Jarolimova V, Suda J (2008) DNA ploidy-level variation in native and invasive populations of Lythrum salicaria at a large geographical scale. J Biogeogr 35: 167–176
Langille MG, Clark DV (2007) Parent genes of retrotransposition-generated gene duplicates in Drosophila melanogaster have distinct expression profiles. Genomics 90: 334–343
Law M, Childs KL, Campbell MS, Stein JC, Olson AJ, Holt C, Panchy N, Lei J, Jiao D, Andorf CM, et al (2015) Automated update, revision, and quality control of the maize genome annotations using MAKER-P improves the B73 RefGen_v3 gene models and identifies new genes. Plant Physiol 167: 25–39
Lee TH, Tang H, Wang X, Paterson AH (2013) PGDD: a database of gene and genome duplication in plants. Nucleic Acids Res 41: D1152–D1158
Lehti-Shiu MD, Shiu SH (2012) Diversity, classification and function of the plant protein kinase superfamily. Philos Trans R Soc Lond B Biol Sci 367: 2619–2639
Lehti-Shiu MD, Uygun S, Moghe GD, Panchy N, Fang L, Hufnagel DE, Jasicki HL, Feig M, Shiu SH (2015) Molecular evidence for functional divergence and decay of a transcription factor derived from whole-genome duplication in Arabidopsis thaliana. Plant Physiol 168: 1717–1734
Lemos B, Meiklejohn CD, Hartl DL (2004) Regulatory evolution across the protein interaction network. Nat Genet 36: 1059–1060
Lespinet O, Wolf YI, Koonin EV, Aravind L (2002) The role of lineage-specific gene family expansion in the evolution of eukaryotes. Genome Res 12: 1048–1059
Li D, Liu Y, Zhong C, Huang H (2010) Morphological and cytotype variation of wild kiwifruit (Actinidia chinensis complex) along an altitudinal and longitudinal gradient in central-west China. Bot J Linn Soc 164: 72–83
Li F, Fan G, Lu C, Xiao G, Zou C, Kohel RJ, Ma Z, Shang H, Ma X, Wu J, et al (2015a) Genome sequence of cultivated upland cotton (Gossypium hirsutum TM-1) provides insights into genome evolution. Nat Biotechnol 33: 524–530
Li WH (1983) Evolution of duplicate genes and pseudogenes. In M Nei, RK Koehn, eds, Evolution of Genes and Proteins. Sinauer Associates, Sunderland, MA, pp 14–37
Li Z, Baniaga AE, Sessa EB, Scascitelli M, Graham SW, Rieseberg LH, Barker MS (2015b) Early genome duplications in conifers and other seed plants. Sci Adv 1: e1501084
Li Z, Defoort J, Tasdighian S, Maere S, Van de Peer Y, De Smet R (2016) Gene duplicability of core genes is highly consistent across all angiosperms. Plant Cell 28: 326–344
Li Z, Zhang H, Ge S, Gu X, Gao G, Luo J (2009) Expression pattern divergence of duplicated genes in rice. BMC Bioinformatics (Suppl 6) 10: S8
Lisch D (2013) How important are transposons for plant evolution? Nat Rev Genet 14: 49–61
Lister R, O’Malley RC, Tonti-Filippini J, Gregory BD, Berry CC, Millar AH, Ecker JR (2008) Highly integrated single-base resolution maps of the epigenome in Arabidopsis. Cell 133: 523–536
Liu H, Liu H, Zhou L, Zhang Z, Zhang X, Wang M, Li H, Lin Z (2015) Parallel domestication of the heading date 1 gene in cereals. Mol Biol Evol 32: 2726–2737
Liu SL, Baute GJ, Adams KL (2011) Organ and cell type-specific complementary expression patterns and regulatory neofunctionalization between duplicated genes in Arabidopsis thaliana. Genome Biol Evol 3: 1419–1436
Lloyd JP, Seddon AE, Moghe GD, Simenc MC, Shiu SH (2015) Characteristics of plant essential genes allow for within- and between-species prediction of lethal mutant phenotypes. Plant Cell 27: 2133–2147
Lu F, Lipka AE, Glaubitz J, Elshire R, Cherney JH, Casler MD, Buckler ES, Costich DE (2013) Switchgrass genomic diversity, ploidy, and evolution: novel insights from a network-based SNP discovery protocol. PLoS Genet 9: e1003215
Lu Y, Ran JH, Guo DM, Yang ZY, Wang XQ (2014) Phylogeny and divergence times of gymnosperms inferred from single-copy nuclear genes. PLoS ONE 9: e107679
Lynch M, Conery JS (2000) The evolutionary fate and consequences of duplicate genes. Science 290: 1151–1155
Lynch M, Conery JS (2003) The evolutionary demography of duplicate genes. J Struct Funct Genomics 3: 35–44
Lyons E, Freeling M (2008) How to usefully compare homologous plant genes and chromosomes as DNA sequences. Plant J 53: 661–673
Lyons E, Pedersen B, Kane J, Alam M, Ming R, Tang H, Wang X, Bowers J, Paterson A, Lisch D, et al (2008) Finding and comparing syntenic regions among Arabidopsis and the outgroups papaya, poplar, and grape: CoGe with rosids. Plant Physiol 148: 1772–1781
Lysak MA, Koch MA, Pecinka A, Schubert I (2005) Chromosome triplication found across the tribe Brassiceae. Genome Res 15: 516–525
Ma Y, Wang J, Zhong Y, Geng F, Cramer GR, Cheng ZM (2015) Subfunctionalization of cation/proton antiporter 1 genes in grapevine in response to salt stress in different organs. Hortic Res 2: 15031
Macneil LT, Walhout AJ (2011) Gene regulatory networks and the role of robustness and stochasticity in the control of gene expression. Genome Res 21: 645–657
Maere S, De Bodt S, Raes J, Casneuf T, Van Montagu M, Kuiper M, Van de Peer Y (2005) Modeling gene and genome duplications in eukaryotes. Proc Natl Acad Sci USA 102: 5454–5459
Maere S, Van de Peer Y (2010) Duplicate retention after small- and large-scale duplications. In K Dittmar, DA Liberles, eds, Evolution after Gene Duplication. John Wiley & Sons, Hoboken, NJ, pp 31–56
Makino T, McLysaght A (2012) Positionally biased gene loss after whole genome duplication: evidence from human, yeast, and plant. Genome Res 22: 2427–2435
Mayrose I, Zhan SH, Rothfels CJ, Magnuson-Ford K, Barker MS, Rieseberg LH, Otto SP (2011) Recently formed polyploid plants diversify at lower rates. Science 333: 1257
McAdams HH, Arkin A (1999) It’s a noisy business! Genetic regulation at the nanomolar scale. Trends Genet 15: 65–69
McCarthy EW, Mohamed A, Litt A (2015) Functional divergence of APETALA1 and FRUITFULL is due to changes in both regulation and coding sequence. Front Plant Sci 6: 1076
Michaels SD, Bezerra IC, Amasino RM (2004) FRIGIDA-related genes are required for the winter-annual habit in Arabidopsis. Proc Natl Acad Sci USA 101: 3281–3285
Mickelbart MV, Hasegawa PM, Bailey-Serres J (2015) Genetic mechanisms of abiotic stress tolerance that translate to crop yield stability. Nat Rev Genet 16: 237–251
Mizutani M, Ohta D (2010) Diversification of P450 genes during land plant evolution. Annu Rev Plant Biol 61: 291–315
Moghe GD, Hufnagel DE, Tang H, Xiao Y, Dworkin I, Town CD, Conner JK, Shiu SH (2014) Consequences of whole-genome triplication as revealed by comparative genomic analyses of the wild radish Raphanus raphanistrum and three other Brassicaceae species. Plant Cell 26: 1925–1937
Moghe GD, Last RL (2015) Something old, something new: conserved enzymes and the evolution of novelty in plant specialized metabolism. Plant Physiol 169: 1512–1523
Moghe GD, Shiu SH (2014) The causes and molecular consequences of polyploidy in flowering plants. Ann N Y Acad Sci 1320: 16–34
Mondragón-Palomino M, Theissen G (2008) MADS about the evolution of orchid flowers. Trends Plant Sci 13: 51–59
Mondragón-Palomino M, Theissen G (2009) Why are orchid flowers so diverse? Reduction of evolutionary constraints by paralogues of class B floral homeotic genes. Ann Bot (Lond) 104: 583–594
Mondragón-Palomino M, Theissen G (2011) Conserved differential expression of paralogous DEFICIENS- and GLOBOSA-like MADS-box genes in the flowers of Orchidaceae: refining the ‘orchid code.’ Plant J 66: 1008–1019
Moore MJ, Bell CD, Soltis PS, Soltis DE (2007) Using plastid genome-scale data to resolve enigmatic relationships among basal angiosperms. Proc Natl Acad Sci USA 104: 19363–19368
Murat F, Van de Peer Y, Salse J (2012) Decoding plant and animal genome plasticity from differential paleo-evolutionary patterns and processes. Genome Biol Evol 4: 917–928
Myburg AA, Grattapaglia D, Tuskan GA, Hellsten U, Hayes RD, Grimwood J, Jenkins J, Lindquist E, Tice H, Bauer D, et al (2014) The genome of Eucalyptus grandis. Nature 510: 356–362
Nourmohammad A, Lässig M (2011) Formation of regulatory modules by local sequence duplication. PLOS Comput Biol 7: e1002167
Nowak MA, Boerlijst MC, Cooke J, Smith JM (1997) Evolution of genetic redundancy. Nature 388: 167–171
Nützmann HW, Osbourn A (2014) Gene clustering in plant specialized metabolism. Curr Opin Biotechnol 26: 91–99
Ohno S (1970) Evolution by Gene Duplication. Springer-Verlag, New York
Orr HA (1996) Dobzhansky, Bateson, and the genetics of speciation. Genetics 144: 1331–1335
Otto SP, Whitton J (2000) Polyploid incidence and evolution. Annu Rev Genet 34: 401–437
Panopoulou G, Hennig S, Groth D, Krause A, Poustka AJ, Herwig R, Vingron M, Lehrach H (2003) New evidence for genome-wide duplications at the origin of vertebrates using an amphioxus gene set and completed animal genomes. Genome Res 13: 1056–1066
Pellicer J, Fay M, Leitch I (2010) The largest eukaryotic genome of them all. Bot J Linn Soc 164: 10–15
Plocik A, Layden J, Kesseli R (2004) Comparative analysis of NBS domain sequences of NBS-LRR disease resistance genes from sunflower, lettuce, and chicory. Mol Phylogenet Evol 31: 153–163
Purugganan MD, Rounsley SD, Schmidt RJ, Yanofsky MF (1995) Molecular evolution of flower development: diversification of the plant MADS-box regulatory gene family. Genetics 140: 345–356
Qi X, Bakht S, Qin B, Leggett M, Hemmings A, Mellon F, Eagles J, Werck-Reichhart D, Schaller H, Lesot A, et al (2006) A different function for a member of an ancient and highly conserved cytochrome P450 family: from essential sterols to plant defense. Proc Natl Acad Sci USA 103: 18848–18853
Qian W, Zhang J (2014) Genomic evidence for adaptation by gene duplication. Genome Res 24: 1356–1362
Qin C, Yu C, Shen Y, Fang X, Chen L, Min J, Cheng J, Zhao S, Xu M, Luo Y, et al (2014) Whole-genome sequencing of cultivated and wild peppers provides insights into Capsicum domestication and specialization. Proc Natl Acad Sci USA 111: 5135–5140
Ramsey J, Schemske DW (1998) Pathways, mechanisms, and rates of polyploid formation in flowering plants. Annu Rev Ecol Syst 23: 467–501
Ramsey J, Schemske DW (2002) Neopolyploidy in flowering plants. Annu Rev Ecol Syst 33: 589–639
Rasmussen DA, Kramer EM, Zimmer EA (2009) One size fits all? Molecular evidence for a commonly inherited petal identity program in Ranunculales. Am J Bot 96: 96–109
Ratnaparkhe MB, Wang X, Li J, Compton RO, Rainville LK, Lemke C, Kim C, Tang H, Paterson AH (2011) Comparative analysis of peanut NBS-LRR gene clusters suggests evolutionary innovation among duplicated domains and erosion of gene microsynteny. New Phytol 192: 164–178
Renny-Byfield S, Gallagher JP, Grover CE, Szadkowski E, Page JT, Udall JA, Wang X, Paterson AH, Wendel JF (2014) Ancient gene duplicates in Gossypium (cotton) exhibit near-complete expression divergence. Genome Biol Evol 6: 559–571
Renny-Byfield S, Gong L, Gallagher JP, Wendel JF (2015) Persistence of subgenomes in paleopolyploid cotton after 60 my of evolution. Mol Biol Evol 32: 1063–1071
Renny-Byfield S, Wendel JF (2014) Doubling down on genomes: polyploidy and crop plants. Am J Bot 101: 1711–1725
Rensing SA, Lang D, Zimmer AD, Terry A, Salamov A, Shapiro H, Nishiyama T, Perroud PF, Lindquist EA, Kamisugi Y, et al (2008) The Physcomitrella genome reveals evolutionary insights into the conquest of land by plants. Science 319: 64–69
Rhee SY, Wood V, Dolinski K, Draghici S (2008) Use and misuse of the Gene Ontology annotations. Nat Rev Genet 9: 509–515
Rieseberg LH, Blackman BK (2010) Speciation genes in plants. Ann Bot (Lond) 106: 439–455
Rieseberg LH, Willis JH (2007) Plant speciation. Science 317: 910–914
Rijpkema AS, Royaert S, Zethof J, van der Weerden G, Gerats T, Vandenbussche M (2006) Analysis of the petunia TM6 MADS box gene reveals functional divergence within the DEF/AP3 lineage. Plant Cell 18: 1819–1832
Rizzon C, Ponger L, Gaut BS (2006) Striking similarities in the genomic distribution of tandemly arrayed genes in Arabidopsis and rice. PLOS Comput Biol 2: e115
Rodgers-Melnick E, Mane SP, Dharmawardhana P, Slavov GT, Crasta OR, Strauss SH, Brunner AM, Difazio SP (2012) Contrasting patterns of evolution following whole genome versus tandem duplication events in Populus. Genome Res 22: 95–105
Sabara HA, Kron P, Husband BC (2013) Cytotype coexistence leads to triploid hybrid production in a diploid-tetraploid contact zone of Chamerion angustifolium (Onagraceae). Am J Bot 100: 962–970
Sakai H, Mizuno H, Kawahara Y, Wakimoto H, Ikawa H, Kawahigashi H, Kanamori H, Matsumoto T, Itoh T, Gaut BS (2011) Retrogenes in rice (Oryza sativa L. ssp. japonica) exhibit correlated expression with their source genes. Genome Biol Evol 3: 1357–1368

Salse J, Bolot S, Throude M, Jouffe V, Piegu B, Quraishi UM, Calcagno T, Cooke R, Delseny M, Feuillet C (2008) Identification and characterization of shared duplications between rice and wheat provide new insight into grass genome evolution. Plant Cell 20: 11–24
Scannell DR, Wolfe KH (2008) A burst of protein sequence evolution and a prolonged period of asymmetric evolution follow gene duplication in yeast. Genome Res 18: 137–147
Schlötterer C (2015) Genes from scratch: the evolutionary fate of de novo genes. Trends Genet 31: 215–219
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, et al (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178–183
Schnable JC, Springer NM, Freeling M (2011) Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci USA 108: 4069–4074
Schnable JC, Wang X, Pires JC, Freeling M (2012) Escape from preferential retention following repeated whole genome duplications in plants. Front Plant Sci 3: 94
Schranz ME, Mohammadin S, Edger PP (2012) Ancient whole genome duplications, novelty and diversification: the WGD radiation lag-time model. Curr Opin Plant Biol 15: 147–153
Seoighe C, Gehring C (2004) Genome duplication led to highly selective expansion of the Arabidopsis thaliana proteome. Trends Genet 20: 461–464
Seoighe C, Wolfe KH (1999) Yeast genome evolution in the post-genome era. Curr Opin Microbiol 2: 548–554
Sharopova N (2008) Plant simple sequence repeats: distribution, variation, and effects on gene expression. Genome 51: 79–90
She X, Cheng Z, Zöllner S, Church DM, Eichler EE (2008) Mouse segmental duplication and copy number variation. Nat Genet 40: 909–914
Shiu SH, Shih MC, Li WH (2005) Transcription factor families have much higher expansion rates in plants than in animals. Plant Physiol 139: 18–26
Siena LA, Ortiz JP, Calderini O, Paolocci F, Cáceres ME, Kaushal P, Grisan S, Pessino SC, Pupilli F (2016) An apomixis-linked ORC3-like pseudogene is associated with silencing of its functional homolog in apomictic Paspalum simplex. J Exp Bot 67: 1965–1978
Sikosek T, Chan HS, Bornberg-Bauer E (2012) Escape from adaptive conflict follows from weak functional trade-offs and mutational robustness. Proc Natl Acad Sci USA 109: 14888–14893
Smith JD, Bickham JW, Gregory TR (2013) Patterns of genome size diversity in bats (order Chiroptera). Genome 56: 457–472
Smith LM, Bomblies K, Weigel D (2011) Complex evolutionary events at a tandem cluster of Arabidopsis thaliana genes resulting in a single-locus genetic incompatibility. PLoS Genet 7: e1002164
Soltis DE, Albert VA, Leebens-Mack J, Bell CD, Paterson AH, Zheng C, Sankoff D, Depamphilis CW, Wall PK, Soltis PS (2009) Polyploidy and angiosperm diversification. Am J Bot 96: 336–348
Soltis DE, Visger CJ, Soltis PS (2014) The polyploidy revolution then...and now: Stebbins revisited. Am J Bot 101: 1057–1078
Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35: 119–125
Soltis PS, Soltis DE (2016) Ancient WGD events as drivers of key innovations in angiosperms. Curr Opin Plant Biol 30: 159–165
Sommer H, Beltrán JP, Huijser P, Pape H, Lönnig WE, Saedler H, Schwarz-Sommer Z (1990) Deficiens, a homeotic gene involved in the control of flower morphogenesis in Antirrhinum majus: the protein shows homology to transcription factors. EMBO J 9: 605–613
Song K, Lu P, Tang K, Osborn TC (1995) Rapid genome change in synthetic polyploids of Brassica and its implications for polyploid evolution. Proc Natl Acad Sci USA 92: 7719–7723
Tack DC, Pitchers WR, Adams KL (2014) Transcriptome analysis indicates considerable divergence in alternative splicing between duplicated genes in Arabidopsis thaliana. Genetics 198: 1473–1481
Takos AM, Knudsen C, Lai D, Kannangara R, Mikkelsen L, Motawia MS, Olsen CE, Sato S, Tabata S, Jørgensen K, et al (2011) Genomic clustering of cyanogenic glucoside biosynthetic genes aids their identification in Lotus japonicus and suggests the repeated evolution of this chemical defence pathway. Plant J 68: 273–286
Takuno S, Gaut BS (2012) Body-methylated genes in Arabidopsis thaliana are functionally important and evolve slowly. Mol Biol Evol 29: 219–227
Tank DC, Eastman JM, Pennell MW, Soltis PS, Soltis DE, Hinchliff CE, Brown JW, Sessa EB, Harmon LJ (2015) Nested radiations and the pulse of angiosperm diversification: increased diversification rates often follow whole genome duplications. New Phytol 207: 454–467
Tautz D, Domazet-Lošo T (2011) The evolutionary origin of orphan genes. Nat Rev Genet 12: 692–702
Tezuka D, Ito A, Mitsuhashi W, Toyomasu T, Imai R (2015) The rice ent-KAURENE SYNTHASE LIKE 2 encodes a functional ent-beyerene synthase. Biochem Biophys Res Commun 460: 766–771
Theissen G (2001) Development of floral organ identity: stories from the MADS house. Curr Opin Plant Biol 4: 75–85
Thibaud-Nissen F, Ouyang S, Buell CR (2009) Identification and characterization of pseudogenes in the rice gene complement. BMC Genomics 10: 317
Thomas BC, Pedersen B, Freeling M (2006) Following tetraploidy in an Arabidopsis ancestor, genes were removed preferentially from one homeolog leaving clusters enriched in dose-sensitive genes. Genome Res 16: 934–946
Throude M, Bolot S, Bosio M, Pont C, Sarda X, Quraishi UM, Bourgis F, Lessard P, Rogowsky P, Ghesquiere A, et al (2009) Structure and expression analysis of rice paleo duplications. Nucleic Acids Res 37: 1248–1259
Tremblay RL (1992) Trends in the pollination ecology of the Orchidaceae: evolution and systematics. Can J Bot 70: 642–650
True JR, Carroll SB (2002) Gene co-option in physiological and morphological evolution. Annu Rev Cell Dev Biol 18: 53–80
Tuskan GA, Difazio S, Jansson S, Bohlmann J, Grigoriev I, Hellsten U, Putnam N, Ralph S, Rombauts S, Salamov A, et al (2006) The genome of black cottonwood, Populus trichocarpa (Torr. & Gray). Science 313: 1596–1604
Vandenbussche M, Zethof J, Royaert S, Weterings K, Gerats T (2004) The duplicated B-class heterodimer model: whorl-specific effects and complex genetic interactions in Petunia hybrida flower development. Plant Cell 16: 741–754
Van de Peer Y, Fawcett JA, Proost S, Sterck L, Vandepoele K (2009a) The flowering world: a tale of duplications. Trends Plant Sci 14: 680–688
Van de Peer Y, Maere S, Meyer A (2009b) The evolutionary significance of ancient genome duplications. Nat Rev Genet 10: 725–732
van Hoek MJ, Hogeweg P (2007) The role of mutational dynamics in genome shrinkage. Mol Biol Evol 24: 2485–2494
van Hoek MJ, Hogeweg P (2009) Metabolic adaptation after whole genome duplication. Mol Biol Evol 26: 2441–2453
Vanin EF (1985) Processed pseudogenes: characteristics and evolution. Annu Rev Genet 19: 253–272
Vanneste K, Maere S, Van de Peer Y (2014) Tangled up in two: a burst of genome duplications at the end of the Cretaceous and the consequences for plant evolution. Philos Trans R Soc Lond B Biol Sci 369: 369
Vaughan DA, Balázs E, Heslop-Harrison JS (2007) From crop domestication to super-domestication. Ann Bot (Lond) 100: 893–901
Veitia RA, Bottani S, Birchler JA (2013) Gene dosage effects: nonlinearities, genetic interactions, and dosage compensation. Trends Genet 29: 385–393
Velasco R, Zharkikh A, Affourtit J, Dhingra A, Cestaro A, Kalyanaraman A, Fontana P, Bhatnagar SK, Troggio M, Pruss D, et al (2010) The genome of the domesticated apple (Malus × domestica Borkh.). Nat Genet 42: 833–839
Vukašinovic N, Cvrcková F, Eliáš M, Cole R, Fowler JE, Zárský V, Synek L (2014) Dissecting a hidden gene duplication: the Arabidopsis thaliana SEC10 locus. PLoS ONE 9: e94077
Wang H, Beyene G, Zhai J, Feng S, Fahlgren N, Taylor NJ, Bart R, Carrington JC, Jacobsen SE, Ausin I (2015a) CG gene body DNA methylation changes and evolution of duplicated genes in cassava. Proc Natl Acad Sci USA 112: 13729–13734
Wang J, Marowsky NC, Fan C (2014a) Divergence of gene body DNA methylation and evolution of plant duplicate genes. PLoS ONE 9: e110357
Wang S, Adams KL (2015) Duplicate gene divergence by changes in microRNA binding sites in Arabidopsis and Brassica. Genome Biol Evol 7: 646–655
Wang W, Haberer G, Gundlach H, Gläßer C, Nussbaumer T, Luo MC, Lomsadze A, Borodovsky M, Kerstetter RA, Shanklin J, et al (2014b) The Spirodela polyrhiza genome reveals insights into its neotenous reduction fast growth and aquatic lifestyle. Nat Commun 5: 3311
Wang W, Zheng H, Fan C, Li J, Shi J, Cai Z, Zhang G, Liu D, Zhang J, Vang S, et al (2006) High rate of chimeric gene origination by retroposition in plant genomes. Plant Cell 18: 1791–1802


Wang Y, Tan X, Paterson AH (2013a) Different patterns of gene structure divergence following gene duplication in Arabidopsis. BMC Genomics 14: 652
Wang Y, Wang X, Lee TH, Mansoor S, Paterson AH (2013b) Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice). New Phytol 198: 274–283
Wang Y, Xiong G, Hu J, Jiang L, Yu H, Xu J, Fang Y, Zeng L, Xu E, Xu J, et al (2015b) Copy number variation at the GL7 locus contributes to grain size diversity in rice. Nat Genet 47: 944–948
Wang Z, Hobson N, Galindo L, Zhu S, Shi D, McDill J, Yang L, Hawkins S, Neutelings G, Datla R, et al (2012) The genome of flax (Linum usitatissimum) assembled de novo from short shotgun sequence reads. Plant J 72: 461–473
Warren WC, Hillier LW, Marshall Graves JA, Birney E, Ponting CP, Grützner F, Belov K, Miller W, Clarke L, Chinwalla AT, et al (2008) Genome analysis of the platypus reveals unique signatures of evolution. Nature 453: 175–183
Wolfe KH, Shields DC (1997) Molecular evidence for an ancient duplication of the entire yeast genome. Nature 387: 708–713
Wood TE, Takebayashi N, Barker MS, Mayrose I, Greenspoon PB, Rieseberg LH (2009) The frequency of polyploid speciation in vascular plants. Proc Natl Acad Sci USA 106: 13875–13879
Wray GA, Hahn MW, Abouheif E, Balhoff JP, Pizer M, Rockman MV, Romano LA (2003) The evolution of transcriptional regulation in eukaryotes. Mol Biol Evol 20: 1377–1419
Xiao H, Jiang N, Schaffner E, Stockinger EJ, van der Knaap E (2008) A retrotransposon-mediated gene duplication underlies morphological variation of tomato fruit. Science 319: 1527–1530
Yamada K, Lim J, Dale JM, Chen H, Shinn P, Palm CJ, Southwick AM, Wu HC, Kim C, Nguyen M, et al (2003) Empirical analysis of transcriptional activity in the Arabidopsis genome. Science 302: 842–846
Yang L, Gaut BS (2011) Factors that contribute to variation in evolutionary rate among Arabidopsis genes. Mol Biol Evol 28: 2359–2369
Yang L, Takuno S, Waters ER, Gaut BS (2011) Lowly expressed genes in Arabidopsis thaliana bear the signature of possible pseudogenization by promoter degradation. Mol Biol Evol 28: 1193–1203
Yang S, Gu T, Pan C, Feng Z, Ding J, Hang Y, Chen JQ, Tian D (2008) Genetic variation of NBS-LRR class resistance genes in rice lines. Theor Appl Genet 116: 165–177
Yang X, Tuskan GA, Cheng MZ (2006) Divergence of the Dof gene families in poplar, Arabidopsis, and rice suggests multiple modes of gene evolution after duplication. Plant Physiol 142: 820–830
Yu J, Ke T, Tehrim S, Sun F, Liao B, Hua W (2015) PTGBase: an integrated database to study tandem duplicated genes in plants. Database (Oxford) 2015: bav017
Zhang H (2003) Evolution by gene duplication: an update. Trends Ecol Evol 18: 292
Zhang J (2012) Genetic redundancies and their evolutionary maintenance. Adv Exp Med Biol 751: 279–300
Zhang R, Murat F, Pont C, Langin T, Salse J (2014) Paleo-evolutionary plasticity of plant disease resistance genes. BMC Genomics 15: 187
Zhang S, Zhang JS, Zhao J, He C (2015) Distinct subfunctionalization and neofunctionalization of the B-class MADS-box genes in Physalis floridana. Planta 241: 387–402
Zhang W, Kramer EM, Davis CC (2010) Floral symmetry genes and the origin and maintenance of zygomorphy in a plant-pollinator mutualism. Proc Natl Acad Sci USA 107: 6388–6393
Zhang Z, Belcram H, Gornicki P, Charles M, Just J, Huneau C, Magdelenat G, Couloux A, Samain S, Gill BS, et al (2011) Duplication and partitioning in evolution and function of homoeologous Q loci governing domestication characters in polyploid wheat. Proc Natl Acad Sci USA 108: 18737–18742
Zou C, Lehti-Shiu MD, Thibaud-Nissen F, Prakash T, Buell CR, Shiu SH (2009a) Evolutionary and expression signatures of pseudogenes in Arabidopsis and rice. Plant Physiol 151: 3–15
Zou C, Lehti-Shiu MD, Thomashow M, Shiu SH (2009b) Evolution of stress-regulated gene expression in duplicate genes of Arabidopsis thaliana. PLoS Genet 5: e1000581

