Plant Physiology Preview. Published on March 21, 2017, as DOI:10.1104/pp.16.01981
Running Title: Hierarchical alignment of legume genomes

Corresponding author:
Xiyin Wang
School of Life Sciences and Center for Genomics and Computational Biology, North
China University of Science and Technology, Tangshan, Hebei 063000, China
Tel: 86-315-3721512
E-mail: [email protected]
Downloaded from on July 21, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved.
Copyright 2017 by the American Society of Plant Biologists
Title: Hierarchically aligning 10 legume genomes establishes a family-level genomics
platform

Authors: Jinpeng Wang, Pengchuan Sun, Yuxian Li, Yinzhe Liu, Jigao Yu, Xuelian Ma,
Sangrong Sun, Nanshan Yang, Ruiyan Xia, Tianyu Lei, Xiaojian Liu, Beibei Jiao, Yue
Xing, Weina Ge, Li Wang, Zhenyi Wang, Xiaoming Song, Min Yuan, Di Guo, Lan
Zhang, Jiaqi Zhang, Dianchuan Jin, Wei Chen, Yuxin Pan, Tao Liu, Ling Jin, Jinshuai
Sun, Jiaxiang Yu, Rui Cheng, Xueqian Duan, Shaoqi Shen, Jun Qin, Meng-chen Zhang,
Andrew H. Paterson, Xiyin Wang*

School of Life Sciences, North China University of Science and Technology, Tangshan,
Hebei 063000, China (J.W., Y.Li, Y.Liu, J.Y., X.M., S.Sun, N.Y., R.X., T.Lei, X.L.,
W.G., L.W., Z.W., X.S., M.Y., D.G., L.Z., J.Z., Y.P., J.S., J.Y., R.C., X.D., S.Shen,
X.W.); Center for Genomics and Computational Biology, North China University of
Science and Technology, Tangshan, Hebei 063000, China (J.W., P.S., Y.Li, Y.Liu, J.Y.,
S.Sun, N.Y., T.Lei, B.J., Y.X., W.G., L.W., Z.W., X.S., M.Y., D.G., L.Z., J.Z., D.J.,
W.C., Y.P., T.Liu, L.J., J.Y., X.W.); Cereal & Oil Crop Institute, Hebei Academy of
Agricultural and Forestry Sciences No. 162, Hengshanjie Street, Shijiazhuang, 050035,
China (J.Q., M.Z.); Plant Genome Mapping Laboratory, University of Georgia, Athens,
GA, 30605, USA (A.P.)

One-sentence summary: A hierarchical and event-related alignment laid a solid
foundation for further genomics exploration in the legume research community and
beyond.

Footnotes:
1 We appreciate financial support from the China Department of Science and Technology
Key Research Project “Seven Key Crop Breeding Project” (SQ2016ZY03002918), China
National Science Foundation (3151333 to J.W. and 31371282 to X.W.), Natural Science
Foundation of Hebei Province (C2015209069 to J.W. and C2016209097 to W.G.), Hebei
New Century 100 Creative Talents Project, Hebei 100 Talented Scholars project, and
Tangshan Key Laboratory Project to X.W.; National fund cultivation project of North
China University of Science and Technology (GP201508) to D.J.; US National Science
Foundation (ACI1339727) to X.W. and A.P.; and GA Peanut Commission and
Southeastern Peanut Research Initiative to A.P.

* Address correspondence to [email protected]

The author responsible for distribution of materials integral to the findings presented in
this article in accordance with the policy described in the Instructions for Authors
(www.plantphysiol.org) is: Xiyin Wang ([email protected]).

X.W. conceived and led the research. J.W. implemented and coordinated the analysis.
P.S., Y.Li, Y.Liu, R.X., X.M., J.Y., N.Y., S.Sun, X.L., B.J., Y.X., X.S., J.Z., L.J., J.S.,
J.Y., R.C., X.D., and S.Shen performed the analysis. T.Liu and T.Lei contributed analysis
tools. W.G., L.W., Z.W., L.Z., D.G., D.J., Y.P., J.Q., and M.Z. participated in the analysis
and contributed constructive discussions. X.W., A.P., and J.W. wrote the manuscript.

ABSTRACT
Mainly due to their economic importance, genomes of 10 legumes, including soybean,
wild peanuts, barrel medic, etc., have been sequenced. However, a family-level
comparative genomics analysis has been unavailable. With grape and selected legume
genomes as outgroups, we performed a hierarchical and event-related alignment
of these genomes and deconvoluted layers of homologous regions produced by ancestral
polyploidizations or speciations. Consequently, we illustrated genomic fractionation
characterized by widespread gene losses after the polyploidizations. Notably, high
similarity in gene retention between recently duplicated chromosomes in soybean
supported a likely autopolyploid nature of its tetraploid ancestor. Moreover, though most
gene losses were nearly random, largely but not fully described by a geometric distribution,
we showed that polyploidization contributed divergently to copy number variation of
important gene families. In addition, we showed significantly divergent evolutionary levels
among legumes and, by performing Ks correction, re-dated major evolutionary events
during their expansion. The present effort laid a solid foundation for further genomics
exploration in the legume research community and beyond. We described only a tiny
fraction of the legume comparative genomics analysis that we performed, and more
information was stored in the newly constructed Legume Comparative Genomics
Research Platform (www.legumegrp.org).

Key words: Legume, Polyploidization, Whole-genome alignment, Genomic
fractionation, Gene colinearity

INTRODUCTION
The Fabaceae, Leguminosae or Papilionaceae, commonly known as the legume, pea, or
bean family, is a large and economically important monophyletic family of flowering
plants. It includes trees, shrubs, and perennial or annual herbaceous plants, which are
easily recognized by their fruit (legume) and their compound, stipulate leaves (Goebel,
1969). As the third-largest land plant family, legumes are widely distributed and divided
into 650 genera and over 18,860 species, accounting for about 7% of flowering plant
species (Magallon and Sanderson, 2001). Along with cereals, fruits, and tropical roots, a
number of legumes have been staple human foods, and their use is closely related to
human evolution (Zhu et al., 2005). Further, legumes are an important part of natural
ecosystems as they fix atmospheric nitrogen through intimate symbioses with microorganisms
(Doyle, 2011).

Mainly due to their economic importance, whole-genome sequences for a number of
legumes have been deciphered, including Glycine max (L.) Merr. (soybean) (Schmutz et
al., 2010), Cicer arietinum (L.) (chickpea) (Varshney et al., 2013), Medicago truncatula
Gaertn. (barrel medic) (Young et al., 2011; Tang et al., 2014), Lotus japonicus L. (lotus)
(Sato et al., 2008), Vigna radiata (L.) R. Wilczek (mung bean) (Kang et al., 2014) and
Vigna angularis (Willd.) Ohashi (adzuki bean) (Kang et al., 2015), Cajanus cajan (L.)
Millsp. (pigeon pea) (Varshney et al., 2012), Phaseolus vulgaris (L.) (common bean)
(Schmutz et al., 2014), and two wild peanuts (Arachis duranensis Krapov. &
W.C.Gregory and Arachis ipaensis Krapov. & W.C.Gregory) (Bertioli et al., 2016; Chen
et al., 2016). These legume genomes have sizes ranging from ~400 Mb (barrel medic) to
1,150 Mb (soybean), packaged into 6 to 20 chromosomes.

Most if not all legumes, having originated from a common ancestor about 60 million
years ago (Mya), shared a tetraploid ancestor (named legume-common tetraploid, or
LCT) of similar age (Schmutz et al., 2010) that played a major role in shaping legume
genome organization (Young et al., 2011). Before the LCT, legumes shared an ancient
core-eudicot-common hexaploid ancestor (ECH, often named gamma), which was
revealed first with the Arabidopsis genome sequence (Bowers et al., 2003), then
described in detail based on the grape genome (Jaillon et al., 2007; Jiao et al., 2012),
which is often taken as a valuable reference to explore the genome structure of eudicots.
More recent polyploidizations continued to occur in some legume lineages, offering the
opportunity for punctuational change in the evolution of these plants, e.g., one occurring
~13 Mya and specifically contributing to the formation of the extant soybean genome
(Schmutz et al., 2010) (named soybean-specific tetraploid, or SST).

Polyploidization, as an abrupt evolutionary event, can occur overnight, but exerts an
enormous effect on the evolution of a plant, and even triggers speciation and
diversification processes (Paterson et al., 2004; Soltis et al., 2008; Jiao et al., 2011).
Recently, polyploidization has been suggested to explain the long-standing mystery of the
rapid formation and diversification of land plants (Frohlich and Chase, 2007; Van de
Peer, 2011). Polyploidization can have short-term and long-term effects, genetically or
epigenetically, and/or at single-gene or whole-genome scale. After a new polyploid
forms, the genome can be very unstable, and in the first generations, it may lose much of
its DNA content, as evidenced for example by the production of synthetic tetraploid
wheat (Kashkush et al., 2002). Evolutionary analysis also supports this inference.
Comparative analysis of the cereal genomes, sharing a 100-Mya tetraploid ancestor,
suggested that the majority of gene losses (97% or more) occurred before the divergence
of sorghum (panicoids) and rice (oryzoids) (Paterson et al., 2009). Nonetheless,
thousands of polyploidy-derived duplicated genes can still be preserved in extant
genomes. These duplicated genes may take different evolutionary avenues, to share or
divide ancestral gene functions, or develop novel genetic functions (Feldman et al., 2012;
Lin et al., 2014). As to gene expression, it has been proposed that at least 57-85% of
paleopolyploid-produced duplicates have diverged in rice (Throude et al., 2009), and
duplicates with high expression tend to have higher CG body methylation (Wang et al.,
2013). This suggests that epigenetic changes may have contributed to genomic
preservation, maintenance, and restoration of genomic stability (Wang et al., 2013).

The availability of 10 hard-won legume genomes provides a precious opportunity to
understand legume biology. Here, by developing approaches to perform hierarchical
comparative genomics analysis, we produced multiple alignments of all these 10 legume
genomes. By tracking information about ancestral polyploidization, we deconvoluted the
layer-by-layer homology between the legume genomes. This enabled us to evaluate
evolutionary divergence among legumes, re-date major evolutionary events, and reveal
rules of massive gene losses and expression changes between duplicated genes. The
hierarchical alignment yielded a homologous gene list, relating to different evolutionary
events such as recursive polyploidizations and plant divergences. The present efforts
provided a valuable genomic platform for researchers in the plant community to
investigate evolutionary changes, functional innovations, and phylogenetic structures of
gene families and regulatory pathways.

RESULTS
Gene colinearity within and among genomes

Intragenomic homology

By inferring gene colinearity, we detected colinear genes within each legume
genome, between each pair of them, and between them and grape, which was used as an
outgroup reference. Homologous blocks with more than 4, 10, 20, and 50 colinear genes
were checked (Supplemental Table S1-2).

The legume genomes were divergent in numbers of duplicated blocks and colinear
genes residing in them. For blocks containing more than 4 colinear genes, we found the
most duplicated genes in soybean (25,302 pairs), and the fewest in adzuki bean (1,956
pairs) (Supplemental Table S1). The large difference in duplicated gene numbers among
genomes might be related to the SST in soybean, or to incomplete assembly of the
legume genomes. In soybean, 434, 224, and 87 blocks had more than 10, 20, and 50
colinear genes, which contain 20,365, 17,578, and 13,191 colinear genes, accounting for
44.9%, 38.8%, and 29.1% of total gene content, respectively. The longest homologous
region supported by gene colinearity was from soybean chromosomes Gm10 and Gm20,
having 824 colinear genes in a 12.87 Mb region. The other genomes had much shorter
duplicated blocks, often with fewer than 10 blocks having more than 50 colinear genes.
For example, among hundreds of duplicated blocks in barrel medic and mung bean, each
had only 9 duplicated blocks with more than 50 colinear genes. Among these other
genomes, common bean has the most (12) duplicated blocks of more than 50 colinear genes.

Intergenomic homology

Intergenomic homology among legumes is often better preserved than intragenomic
homology, consistent with speciations often being more recent than genome duplications.
Between these legume genomes, there were often many thousands of colinear genes
(Supplemental Table S1). Soybean had more colinear genes with other legumes than
were found between any other legumes, due to the SST. For example, soybean and barrel
medic genes form 50,672 colinear gene pairs located in 2,824 homologous blocks with
more than 4 colinear genes, involving 21,103 (~35.4%) and 34,822 (~47.7%) genes from
the two genomes, respectively. There were often tens of intergenomic blocks with more
than 50 colinear genes. The two peanut genomes have 16,484 colinear genes in 50 blocks,
each containing at least 50 colinear genes. Detailed statistics of numbers of inferred
paralogous and orthologous genes, gene pairs, and blocks are in Supplemental
Table S2-5.

Multiple genome/chromosome alignment

Event-related genomic homology

Intergenomic comparison helped to unravel the structural complexity of legume
genomes, which results from recursive polyploidization events successively
doubling or tripling the numbers of existing homologous regions (Fig. 1). Analysis of the
grape genome contributed to understanding the triplicated nature of the ancestral core
eudicot genome, which appears to have transitioned from 2n = 2x = 14 to 2n = 6x = 42
chromosomes (Jaillon et al., 2007). Here, we used the grape genome to distinguish
orthologous from outparalogous regions between legumes, and paralogous regions within
each legume. Homologous regions in different genomes are called outparalogous when
they were produced by a genomic duplication in the two species’ common ancestor, to
distinguish them from paralogous regions produced by a duplication specific to one species.
Homologous gene dotplots (Supplemental Fig. S1-Fig. S4) depict genomic comparisons
and provide for inferences of orthology and paralogy. Orthologous regions between grape
and legumes show much higher DNA similarity than outparalogous regions, the
latter being a result of the ECH. Details on inferring orthology and paralogy can be
found in the Methods and Supplemental text. Similar analyses have been described for
grass genomes and the cotton genome (Paterson et al., 2012; Wang et al., 2015; Wang et
al., 2016). Given the additional LCT shared by all legumes, a 1:2 ratio of orthologous
regions is expected between grape and most legumes, with the additional SST
conferring a 1:4 ratio between grape and soybean. In partial summary, intergenomic
analysis revealed layers of genomic homology in the complex legume genomes. Above,
we used grape as the outgroup reference to deconvolute the genomic complexity of barrel
medic and other legumes, finding duplicated blocks in each of them and homology
between them. In a similar manner, we adopted barrel medic and common bean as
references to distinguish recent SST duplicated regions in soybean.

Multiple alignment

With the grape genome as a reference, we produced a table to store inter- and
intra-genomic homology information. First, we filled in all grape gene IDs in the first
column of the table, then added gene IDs from legumes column by column, species by
species, according to the colinearity inferred by multiple alignments. As noted above, in
the absence of gene loss the grape genes would have 2 colinear orthologous genes in most
legumes, and 4 in soybean. When a legume species contained a gene showing colinearity
with a grape gene, a gene ID was filled into the appropriate cell in the table. When a
legume species did not have an expected colinear gene, often due to gene loss,
translocation, or insufficient assembly, a dot (signifying missing) was filled into the
appropriate cell. For 11 (sub)genomes (including two subgenomes for soybean) there
are 23 (9x2+4+1) columns in the table. Moreover, due to the ECH, each chromosomal
segment would repeat three times in each genome. Based on homology inferred in grape,
we therefore extended the table to 69 columns. Finally, we constructed a table of colinear
genes reflecting three polyploidizations and all salient speciations. In partial summary,
the table summarized results of multiple-genome and event-related alignment, reflecting
layers of tripled and/or doubled homology due to recursive polyploidizations (Fig. 2).
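
The column arithmetic of the alignment table can be sketched as follows. This is a minimal illustration, not the pipeline used in this work: the species labels, the `build_columns` and `make_row` helpers, and the gene IDs are hypothetical, and only the expected copy numbers (1 for grape, 2 for most legumes, 4 for soybean) and the threefold ECH expansion are taken from the text.

```python
# Expected colinear orthologs per grape gene, as stated in the text:
# grape itself 1, soybean 4 (LCT + SST), every other legume 2 (LCT).
LEGUMES = ["soybean", "common bean", "mung bean", "adzuki bean", "pigeon pea",
           "barrel medic", "chickpea", "lotus", "A. duranensis", "A. ipaensis"]
EXPECTED_COPIES = {"grape": 1}
EXPECTED_COPIES.update({sp: (4 if sp == "soybean" else 2) for sp in LEGUMES})


def build_columns(expected_copies, ech_multiplicity=3):
    """Return one (ech_copy, species, copy_index) key per table column."""
    columns = []
    for ech_copy in range(1, ech_multiplicity + 1):
        for species, n_copies in expected_copies.items():
            for copy_index in range(1, n_copies + 1):
                columns.append((ech_copy, species, copy_index))
    return columns


def make_row(columns, hits):
    """Fill a row with gene IDs; a dot marks a missing colinear gene."""
    return [hits.get(column, ".") for column in columns]


columns = build_columns(EXPECTED_COPIES)
per_ech_copy = sum(EXPECTED_COPIES.values())   # 9 x 2 + 4 + 1
# Hypothetical gene IDs, for illustration only.
row = make_row(columns, {(1, "grape", 1): "grape_gene_A",
                         (1, "soybean", 3): "soybean_gene_B"})
print(per_ech_copy, len(columns), row.count("."))
```

Keying each column by ECH copy, species, and copy index keeps the event that produced a column explicit, which is the sense in which the table is event-related.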

The genomic alignment table for 10 legumes with grape as a reference is not
complete; in particular, it cannot include all duplicated genes produced by the SST. That
is, genes specific to legumes and absent from the grape genome are not represented.
Therefore, the grape-legume homology table was supplemented by a genomic homology
table with barrel medic as reference (Supplemental Fig. S5), to better represent pan-
legume gene content.

Event-related duplicated genes

The cross-legume genome analyses described above helped to identify duplicated
genes produced by each polyploidization event, and to infer gene content in the ancestral
genomes before each polyploidization and speciation event. In grape, we inferred 1,764
pairs of genes in 86 homoeologous regions derived from the ECH, involving 2,893 extant
genes (Table 1). Being affected by more polyploidizations, legume genomes contain
more duplicates. In barrel medic, 2,504 gene pairs involving 2,961 genes were inferred in
194 ECH-derived homoeologous regions. However, fewer ECH-derived duplicates were
inferred in some legumes. For example, only 300-1,400 ECH gene pairs were inferred for
pigeon pea, adzuki bean, and Lotus japonicus. The most ECH-derived gene pairs were
inferred from soybean, with 3,663 gene pairs involving 2,575 genes from 344
homoeologous regions. The high numbers of soybean ECH genes result partly from the
additional SST, which would have produced up to 5 times ((6,2)/(3,2)) the number of
various combinations of homoeologous gene pairs found in other legumes. Here, (m, n)
defines the combinatorial number.
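
Reading (m, n) as the binomial coefficient, the factor of 5 follows directly. Interpreting 3 and 6 as the numbers of ECH-derived homoeologous copies of a region before and after the SST doubling is an assumed reading of the text rather than something stated there explicitly.

```latex
\[
\frac{\binom{6}{2}}{\binom{3}{2}} = \frac{15}{3} = 5,
\qquad \text{where } \binom{m}{n} = \frac{m!}{n!\,(m-n)!}.
\]
```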

We also characterized LCT-derived gene pairs, which showed 10-fold variation
among legumes. In barrel medic, 4,796 gene pairs involving 4,198 genes were inferred
from 309 LCT-derived homoeologous regions. In soybean, 8,317 gene pairs involving
9,486 genes were derived from 343 LCT-derived homoeologous regions. Pigeon pea has
the fewest LCT-derived gene pairs (869). The reduced abundance of inferred LCT-
derived gene pairs may result from poor assembly. The soybean-specific
tetraploidization (SST) produced 17,104 gene pairs involving 19,210 genes in
133 homoeologous regions.

Genomic fractionation

Genomic fractionation reshapes plant genomes. Key forces driving genomic
fractionation include polyploidization, multiplying the gene content of an entire genome,
and transposon activities, duplicating and relocating individual genes (Wang et al., 2011).
Here, by using grape, barrel medic, and common bean as references, we show how gene
removal eroded colinearity between homologous genomic regions.

Using the grape genome and genes as a reference, it is clear that there has been
widespread genomic fractionation following the LCT (Supplemental Table S6). For
example, taking grape chromosome 1 as outgroup, in pairwise alignments of
grape with each barrel medic duplicate, 75% and 77% of grape genes were not found at
the respective colinear locations; in the triple-wise alignment of the barrel medic duplicated
regions and the outgroup, 70% of the grape genes were absent from both colinear
locations. For common bean, the corresponding numbers are 94%, 89%, and 83%,
respectively. Using barrel medic chromosome 1 as a reference, 74%, 73%, and 69% of its
genes were not found at the respective colinear locations in each or both of the duplicated
soybean regions produced by the SST. A local alignment of colinear blocks among genomes
shows the pattern of genomic fractionation (Fig. 3). Some genes missing from the
homologous locations may be related to deletions of adjacent transposons or to transposon
movements that disrupted gene order, and may also be related to poor assemblies or
annotations, as further discussed below.

To investigate the scale and potential mechanisms of fractionation, we counted
the numbers of runs of removed genes in each legume genome relative to a reference
genome, that is, the numbers of consecutive genes from the reference not appearing in the
studied genome. Many missing genes comprised small runs, i.e., of only 1 or 2 genes (Fig.
4). For example, these small runs comprise 53% of missing genes and up to 71% of all
10,604 runs with common bean as reference; 15% of genes and up to 49% of all 13,936
runs with barrel medic as reference; and 15.2% of genes and up to 44.7% of all 7,984 runs
with grape as reference. From another perspective, 77.6%, 56.5%, and 47% of genes were removed from
their anticipated locations, in runs of 10 genes or fewer that account for up to 48.4%,
89.5%, and 85.6% of all runs for each of the reference grape, barrel medic, and common
bean genomes, respectively. The references work as temporal outgroups, with common
bean, barrel medic, and grape being successively more diverged from soybean
(Supplemental Table S7-8). Missing genes were more likely to appear in small runs
using common bean as reference than using barrel medic or grape. This suggests an
accumulating effect, with initial gene loss resulting in small runs that are gradually
extended over time.
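
Counting runs of removed genes, as defined above, can be sketched as follows. This is a simplified illustration under stated assumptions: the reference is treated as a single ordered gene list (chromosome and block boundaries are ignored), and the function names and toy gene IDs are hypothetical.

```python
from collections import Counter
from itertools import groupby


def missing_run_lengths(reference_order, retained):
    """Lengths of runs of consecutive reference genes absent from `retained`."""
    runs = []
    for is_missing, group in groupby(reference_order, key=lambda g: g not in retained):
        if is_missing:
            runs.append(sum(1 for _ in group))
    return runs


def small_run_shares(runs, max_len=2):
    """Share of missing genes and share of runs found in runs of <= max_len genes."""
    small = [r for r in runs if r <= max_len]
    return sum(small) / sum(runs), len(small) / len(runs)


# Hypothetical toy example: eight reference genes, four retained in the studied genome.
reference_order = ["r1", "r2", "r3", "r4", "r5", "r6", "r7", "r8"]
retained = {"r1", "r3", "r4", "r8"}
runs = missing_run_lengths(reference_order, retained)
run_counts = Counter(runs)
gene_share, run_share = small_run_shares(runs)
print(runs, dict(run_counts), gene_share, run_share)
```

The two shares returned by `small_run_shares` correspond to the two kinds of percentages quoted in the text: the fraction of missing genes and the fraction of runs.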

The lengths and numbers of runs of removed genes closely approximated a
geometric distribution. We fitted the observed distribution of numbers of different runs
by using different density curves of the geometric distribution, with extension parameters
0.33, 0.31, and 0.30, respectively for common bean, barrel medic, and grape as references,
finding goodness of fit of 0.995, 0.991, and 0.994 with p-values of 0.92, 0.91, and 0.89
(F-test), respectively (Supplemental Table S9). The closer the reference plant is to
soybean, the shorter the runs of lost genes (Fig. 4), showing a better gene-sharing
pattern. The deviation between the observed numbers and the theoretically predicted
numbers becomes larger when the gene loss runs are longer, which also supports the length
extension of removed-gene runs over time.
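
A geometric fit to run-length counts can be sketched as below. The fitting procedure used in this work is not described in this section, so the sketch makes explicit assumptions: the extension parameter is read as the probability q that a run extends by one more gene, so that P(L = k) = (1 - q) q^(k-1); q is estimated by maximum likelihood; and goodness of fit is summarized by R squared. The counts are hypothetical.

```python
def fit_extension_parameter(run_counts):
    """Maximum-likelihood q for P(L = k) = (1 - q) * q**(k - 1), k = 1, 2, ..."""
    n_runs = sum(run_counts.values())
    n_genes = sum(length * count for length, count in run_counts.items())
    return 1.0 - n_runs / n_genes


def expected_counts(q, n_runs, lengths):
    """Expected number of runs of each length under the fitted distribution."""
    return {k: n_runs * (1.0 - q) * q ** (k - 1) for k in lengths}


def r_squared(observed, expected):
    """Coefficient of determination between observed and expected counts."""
    lengths = sorted(observed)
    mean_obs = sum(observed[k] for k in lengths) / len(lengths)
    ss_res = sum((observed[k] - expected[k]) ** 2 for k in lengths)
    ss_tot = sum((observed[k] - mean_obs) ** 2 for k in lengths)
    return 1.0 - ss_res / ss_tot


# Hypothetical run-length counts (length -> number of runs), not data from this study.
observed = {1: 4, 2: 2, 3: 1}
q = fit_extension_parameter(observed)
expected = expected_counts(q, sum(observed.values()), observed.keys())
r2 = r_squared(observed, expected)
print(round(q, 3), round(r2, 3))
```

Under this reading, a growing gap between observed and expected counts at large k is what the text describes as the deviation for longer runs.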

Correspondingly balanced fractionation between the SST homoeologous chromosomes

Aligning duplicated soybean regions onto corresponding single barrel medic
chromosomes permitted us to ‘reconstruct’ (infer) the gene composition of ancestral
duplicated SST paralogous chromosomes, which often show significant divergence of
gene retention rates. Among 8 barrel medic chromosomes, 7 have significantly divergent
paralogous soybean chromosomal regions at a chi-squared test significance level of 0.05,
or 6 at 0.01 (Supplemental Table S10 and Table S7). This finding shows unbalanced gene
retention between homoeologous chromosomes. However, scrutiny of gene retention/loss
using a sliding window along chromosomes showed that in nearly all local regions, with
the exception of large patches of DNA losses in one copy of the duplicated chromosomes,
genomic retention and loss are often highly similar (Fig. 5). The difference in gene
retention between corresponding paralogous regions consistently fluctuates around zero.
The difference observed above at the chromosome level is likely to have been caused by
large patches of alternative segmental DNA losses due to genomic instability (Fig. 5).
Overall, this finding suggests little if any dominance between members of homoeologous
chromosome pairs, providing further evidence of the likely autotetraploid nature of the
SST (Garsmeur et al., 2014).
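
A chi-squared comparison of gene retention between two homoeologous soybean regions aligned to one reference chromosome can be sketched as follows. The exact contingency design used in this work is not given in this section, so a 2 x 2 table (retained versus lost, copy A versus copy B) is assumed here, and the counts are hypothetical.

```python
def chi_square_2x2(retained_a, lost_a, retained_b, lost_b):
    """Pearson chi-squared statistic (1 degree of freedom, no continuity correction)."""
    observed = [[retained_a, lost_a], [retained_b, lost_b]]
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    total = sum(row_totals)
    stat = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            stat += (observed[i][j] - expected) ** 2 / expected
    return stat


# Standard critical values of the chi-squared distribution with 1 degree of freedom.
CRITICAL = {0.05: 3.841, 0.01: 6.635}

# Hypothetical counts: 1,000 reference genes, retained or lost in each soybean copy.
stat = chi_square_2x2(300, 700, 250, 750)
significant = {alpha: stat > cutoff for alpha, cutoff in CRITICAL.items()}
print(round(stat, 3), significant)
```

With these toy counts the difference is significant at 0.05 but not at 0.01, which illustrates how the number of significantly divergent chromosomes can change between the two levels.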

Karyotype changes and inter-genomic representation

After recursive polyploidizations, plants often restore chromosome numbers to
relatively small values. Grape and legumes share a eudicot common ancestor inferred to
have had 2n = 6x = 42 chromosomes, resulting from triplication of a basal set (x) of 7
chromosomes (2n = 2x = 14) by the ECH. By using gene colinearity information, the 19
grape chromosomes or chromosomal regions were grouped into 7 sets of paralogous
triplets, which were mapped onto the chromosomes of legumes (Fig. 1).

After the LCT, the non-soybean legumes under consideration have 6 to 11 haploid
chromosomes, suggesting considerable chromosome number reduction. The legume-
common ancestor may have had 11 chromosomes, still found in common bean and its
indigoferoid/millettioid relatives, while the Dalbergioid (peanut) and Hologalegina
(chickpea and barrel medic) legumes may have experienced chromosome number
reductions. Soybean tetraploidization (SST) might have produced 22 chromosomes, with
a chromosome fusion resulting in 20 extant chromosomes. Within the indigoferoid clade,
3 legumes here have the same chromosome number (n = 11), but their chromosomes
differ in composition (Fig. 6). At least 6 common bean chromosomes were largely
preserved in other legumes (Fig. 6).

Evolutionary divergence and dating

We found that legume genes evolve at considerably divergent rates in different
genomes. By estimating nucleotide substitutions per synonymous site (Ks),
we characterized divergence levels between colinear homoeologs between different
legumes or within a legume. Recursive polyploidization events can be identified based on
Ks distributions for duplicated genes, as ‘Ks peaks’ that deviate from a general decline in
frequency with increasing Ks value. For example, the soybean duplicates form a
distribution with three peaks reflecting three polyploidizations (SST, LCT, ECH) over
time, although the peaks resulting from more ancient events can be difficult to discern.
Ks distributions of inter-legume colinear homoeologs reflect both polyploidization events
common to them and speciation events that differentiate them. The peak corresponding to
their differentiation is often more prominent than the polyploidization-derived one(s) due
to widespread gene losses following polyploidization(s). We adopted kernel function
analysis to distinguish different components in Ks distributions (see Methods for details),
and each Ks distribution was represented by a linear combination of multiple normal
distributions, each corresponding to an ancestral event (polyploidization or speciation)
(Supplemental Table S11).

Both the LCT and ECH produced Ks peaks with divergent locations in different
legumes (Fig. 7, Supplemental Table S11), revealing divergent gene evolutionary rates.
Lotus japonicus has evolved the slowest and peanut the fastest (with a nearly 25%
difference). Relative to soybean, gene sequences of other legumes have evolved 17%-24%
faster (peanut: 23.9%; adzuki bean: 20.7%; mung bean: 19.4%; chickpea: 19.1%; barrel
medic: 18.8%; common bean: 17.0%), or 3.9%-11.2% slower (pigeon pea: 3.9%; lotus:
11.2%) (Supplemental Table S11).

Such high divergence in evolutionary rates may jeopardize efforts to date
evolutionary events and perform phylogenetic analysis, hindering understanding of
legume biology and evolution. Using soybean as a reference, we performed a ‘correction’
of the other legumes’ evolutionary rates, calibrating the LCT peaks in the other legumes’ Ks
distributions to that in soybean (see Materials and Methods for details) (Fig. 7B-C and
Supplemental Table S12). Supposing that the ECH occurred ~130 Mya (Jiao et al., 2012),
we estimated that the LCT occurred ~59 Mya, that peanut (from the Dalbergioid tribe)
split from the other legumes about 49.1 Mya, and that the Hologalegina (including barrel
medic, lotus, and chickpea) and the Millettioid tribe (including soybean, pigeon pea, mung
bean, adzuki bean, common bean) split 48.1 Mya.
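
The final dating step can be illustrated with a proportional scaling of corrected Ks peaks against the ECH calibration. This is only a sketch under assumptions: it presumes a constant substitution rate after correction, it does not reproduce the correction procedure itself (which is described in Materials and Methods), and the Ks values below are hypothetical placeholders rather than values from Supplemental Table S12.

```python
def date_event(ks_event, ks_calibration, age_calibration_mya):
    """Scale a corrected Ks peak to time, assuming a constant substitution rate."""
    return age_calibration_mya * ks_event / ks_calibration


ECH_AGE_MYA = 130.0  # calibration age used in the text (Jiao et al., 2012)

# Hypothetical corrected Ks peak locations, placeholders only.
ks_ech = 1.50
ks_peaks = {"LCT": 0.68, "example speciation": 0.30}

ages = {name: date_event(ks, ks_ech, ECH_AGE_MYA) for name, ks in ks_peaks.items()}
print({name: round(age, 1) for name, age in ages.items()})
```

Because every date is a ratio to the same calibration peak, any rate difference left uncorrected between lineages shifts the inferred dates proportionally, which is why the correction step matters.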

Inference of ancestral genome content

Using event-related colinearity information, we inferred gene content at the major
evolutionary nodes of legumes (Fig. 8). Two colinear orthologs from different genomes
show that the most recent common ancestor had a single ancestral gene at the
corresponding location in its genome, whereas two colinear (out)paralogous genes
produced by the same polyploidization would derive from an ancestral gene in the
paleogenome before the event. Therefore, by referring to the event-related colinear gene
table (Table 1), it was straightforward to infer the ancestral gene content at any evolutionary
node during the evolution and divergence of these legumes. For example, the most recent
common ancestors had at least 22,177 genes for soybean and common bean; 18,935
genes for the two peanut genomes; and 28,900 genes for all legumes after the LCT. After
the ECH, there were at least 11,672 genes in the eudicot common ancestor.
GO analysis
By counting genes still in colinearity, we explored how each polyploidization event contributed to copy number variation for genes with different functions. Characterizing Gene Ontology functions showed that each event increased copy numbers for genes of all functional categories, but by divergent increments (Supplemental Fig. S6), and that different events made divergent contributions to the enhancement of functions. After the SST, genes related to macromolecular complexes, membrane function, and organelle function (classified by cellular component), and to metallochaperone, molecular regulator, and structural activities (classified by molecular function) were significantly retained. The most significantly preserved genes were related to macromolecular complexes, accounting for up to 9.24% of the SST alpha duplicates but only 6.15% of all genes in the genome (Fisher’s exact test p-value = 6.75 × 10^-35) (Supplemental Table S13).
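An enrichment test of this kind can be sketched with the hypergeometric tail that underlies a one-sided Fisher’s exact test. This is a minimal illustration under stated assumptions: the study does not specify here whether a one- or two-sided test was used, and the function name and the small counts below are hypothetical rather than the study’s data.

```python
from math import comb


def fisher_enrichment_p(k: int, n: int, K: int, N: int) -> float:
    """One-sided Fisher's exact test for over-representation.

    N: all genes in the genome
    K: genes in the genome annotated with the GO category
    n: retained duplicates
    k: retained duplicates annotated with the GO category
    Returns P(X >= k) for X ~ Hypergeometric(N, K, n).
    """
    upper = min(n, K)
    # math.comb returns 0 when the second argument exceeds the first,
    # so impossible tables contribute nothing to the sum.
    tail = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1))
    return tail / comb(N, n)


# Hypothetical toy counts: 5 of 20 genes are in the category,
# and 3 of the 4 retained duplicates are in it.
p = fisher_enrichment_p(k=3, n=4, K=5, N=20)
print(f"p = {p:.4f}")
```

With genome-scale counts, a shift such as 9.24% versus 6.15% can yield very small p-values because the sample sizes are large.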
In contrast, the genes least increased by the SST were related to catalytic activities (p-value = 1.04 × 10^-70), and nearly all genes relating to biological processes were not increased, with the exception of those relating to localization.
By checking the barrel medic genome, we can evaluate which genes are likely to have been removed from soybean after the SST. These genes are still in the barrel medic genome but have no corresponding copies at the expected locations in soybean, which could be a result of post-polyploidy instability (Supplemental Fig. S7). Genes in metabolic processes (P-value = 5.08 × 10^-8), catalytic activity (P-value = 3.7 × 10^-12), and molecular binding (P-value = 4.5 × 10^-4) were frequently not deleted or transposed. Comparatively, genes related to biological regulation (P-value = 8.6 × 10^-10), membrane part (P-value = 4.24 × 10^-5), and nucleic acid binding transcription factors (P-value = 2.6 × 10^-6) were frequently deleted or transposed (Supplemental Table S14).
Nodulation and oil synthesis
A topic of singular importance to legume biology is whether recursive polyploidizations have contributed to the evolution of key traits such as nodulation, which is associated with the symbiotic nitrogen fixation that is a distinguishing feature of legumes. Legumes have divergent numbers of nodulation-related genes (Supplemental Table S15). Using the reported soybean nodulation genes as seeds (Schmutz et al., 2010), we detected their homologs in all legumes at Blastp E-value < 1e-10 and score > 150 (Supplemental Table S15). Soybean has the most nodulation-related genes (1,702), comprising 4 families of 50 or fewer genes and 3 families of more than 200 genes (Supplemental Table S16). We wanted to know whether the recursive polyploidizations had contributed to their expansion. Because large gene families are excluded from inferences of colinearity (see Methods) and are therefore under-represented in the colinear gene table, we plotted the distribution of the nodulation-related genes across the whole genome, together with the colinear genes related to each polyploidization (Fig. 9). Notably, in soybean we found that 78%, 74%, and 66% of nodulation-related genes could be located in paralogous chromosomal regions related to the three polyploidization events (SST, LCT, and ECH), respectively. Genes involved in younger polyploidization(s) could also be involved in older events if they have a paralogous copy produced by the latter. Nonetheless, these findings show that polyploidizations may have contributed to the increase of nodulation-related gene copy numbers, with increases of 73 in the soybean-specific tetraploidization (Fig. 9), 284 in the ECH (based on barrel medic), and 852 related to the LCT. Similar findings have been observed in the other legumes.
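The seed-based homolog search described above can be sketched as a filter over tabular BLASTP output. The sketch assumes the standard 12-column tabular layout (query, subject, identity, length, mismatches, gap opens, query start/end, subject start/end, E-value, bit score) and interprets "score" as the bit score; both are assumptions, as are the function name and the gene IDs in the example.

```python
from typing import Iterable, Set


def seed_homologs(blast_lines: Iterable[str], seeds: Set[str],
                  max_evalue: float = 1e-10,
                  min_score: float = 150.0) -> Set[str]:
    """Collect subject genes hit by any seed gene at
    E-value < max_evalue and bit score > min_score."""
    hits: Set[str] = set()
    for line in blast_lines:
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.rstrip("\n").split("\t")
        query, subject = fields[0], fields[1]
        evalue, score = float(fields[10]), float(fields[11])
        if query in seeds and evalue < max_evalue and score > min_score:
            hits.add(subject)
    return hits


# Hypothetical tabular BLASTP lines.
example = [
    "seedA\tgene1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-50\t420",
    "seedA\tgene2\t40.0\t120\t72\t3\t1\t120\t5\t124\t2e-08\t60",   # weak E-value
    "seedB\tgene3\t55.0\t200\t90\t2\t1\t200\t1\t200\t1e-20\t140",  # low score
    "other\tgene4\t90.0\t300\t30\t0\t1\t300\t1\t300\t1e-80\t500",  # not a seed
]
found = seed_homologs(example, {"seedA", "seedB"})
print(sorted(found))
```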
While new genes can be produced by tandem duplications and transposon activities, these events produced fewer genes than polyploidization. At Ks < 0.15, a time after or overlapping the SST event, we found more than 13 genes residing in duplicated regions of soybean chromosomes 4 and 5; 7 and 8; and 11 and 12 that were clearly produced by the SST. We also found young tandem gene clusters on chromosomes 16, 14, 9, and others, and young transposed genes on many other chromosomes (Fig. 9). One tandem cluster on chromosome 16 contains more than 20 young duplicated genes, some with Ks ~ 0 and four pairs with Ks < 0.015 involving 6 genes (Glyma16g07010.1, Glyma16g07051.1, Glyma16g07031.1, Glyma16g07060.1, Glyma16g30695.1, and Glyma16g30911.1), indicating a hotspot of new gene production.
We then checked how polyploidizations affected the copy number variation of genes participating in the synthesis of the high concentrations of seed oils that are an important economic product of many legumes. Oil synthesis related (OSR) genes could be classified into 9 different functions: synthesis of fatty acids in plastids, synthesis and storage of oil, metabolism of acyl lipids in mitochondria, lipid signaling, fatty acid elongation and wax and cutin metabolism, synthesis of membrane lipids in endomembrane system, degradation of storage lipids and straight fatty acids, and miscellaneous functions, as reported previously (Wang and Brendel, 2006; Schmutz et al., 2010). Each of these families has more than 50 genes in soybean (Supplemental Table S17). There are more than 850 OSR genes in the peanut genomes, and 1,528 in soybean (Supplemental Table S18). In peanut, 42% and 22% of OSR genes can be related to paralogous regions produced by the LCT and ECH events, respectively; in soybean, 65%, 58%, and 27% of OSR genes can be related to the SST, LCT, and ECH events, respectively (Supplemental Fig. S8). This shows that each of these polyploidizations may have expanded the OSR families, which also seems true in other legumes. As with nodulation genes, tandem duplications and transposon activities might also have contributed to expansion of the OSR families. At Ks < 0.15, more than 13 genes residing in duplicated regions of soybean chromosomes 4 and 6; 7 and 8; 11 and 12; and 14 and 17 were clearly produced by the SST. We also found young small tandem clusters
= 50% and Identity >= 60%). This could have resulted from gene divergence and gene loss. Of the best-matched genes, about half share gene colinearity between genomes. We obtained similar findings with barrel medic as a reference. These findings suggest that gene movement, possibly involving transposons, may contribute to genomic fractionation. With ESTs, at coverage >= 30% and identity >= 90%, we found that at least 50% of genes have no EST support, which suggests that legume gene annotations need much improvement. The annotation of genes affects the inference of gene colinearity, and therefore the characterization of gene losses and genomic fractionation. We will update our inferences based on the latest versions of annotated genes in the future.
Unbalanced evolutionary rates among legumes
Duplicated genes deriving from a shared duplication event provide a direct means to compare evolutionary rates among taxa. In grasses, duplicated genes produced by a grass-common tetraploidization show 8.5-48% divergence in evolutionary rates, with rice being the slowest (Wang et al., 2015). A phylogenetic analysis with mulberry genes and their orthologs from Rosales relatives showed that mulberry evolved much faster (up to 3 times) than other Rosales (He et al., 2013).
Polyploidization itself may drive genes with duplicates to evolve faster, as duplicated genes may buffer mutations in one another, possibly resulting in neofunctionalization or subfunctionalization. For example, cotton genes affected by a decaploidization may have evolved 19% and 15% faster than orthologs in cacao that have not experienced duplication since the two taxa diverged (Wang et al., 2016). Further, genes from a duplicated pair of grass chromosomes affected by gene conversion evolved faster than those not affected by gene conversion (Wang et al., 2009).
Unexpectedly, duplicated genes in soybean, affected by the SST, did not necessarily evolve faster than those of other legumes. With soybean as a reference, genes in peanut, adzuki bean, mung bean, chickpea, barrel medic, and common bean evolve faster, and those in pigeon pea and lotus evolve slower, than soybean genes. This weakens the generalization that duplicated genes evolve faster than single-copy genes, perhaps pointing to the importance of other factors such as living in different environments for millions of years.
MATERIALS AND METHODS
Genomic materials
We downloaded genomic sequences and annotations from the respective website of each genome project; complete information can be found in Supplemental Table S29.
Inferring gene colinearity
With annotated genes as input, chromosomes within a genome or between different genomes were compared. First, protein sequences were searched against one another using BLASTP (Altschul et al., 1990) to find potentially homologous genes (E-value < 1e-5). This relatively relaxed E-value threshold admits more-diverged homologous genes and helps find ancient duplicated genes. Second, the gene homology information was used as input for the software ColinearScan (Wang et al., 2006) to locate homologous gene pairs in colinearity. The key parameter, the maximum gap, was set to 50 intervening genes, as adopted in previous genomics research (Wang et al., 2015; Wang et al., 2016). Large gene families with 30 or more copies in a genome were excluded from colinearity inference.
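The role of the maximum-gap parameter can be illustrated with a minimal chaining sketch. This is not the ColinearScan algorithm; it is a simple quadratic-time dynamic program, written here only to show how a limit of 50 intervening genes constrains which homologous pairs may be joined into one colinear block. It handles same-orientation chains only, and the anchor coordinates (gene order indices on two chromosomes) are hypothetical.

```python
from typing import List, Tuple

# (gene order index on chromosome A, gene order index on chromosome B)
Anchor = Tuple[int, int]


def longest_colinear_chain(anchors: List[Anchor],
                           max_gap: int = 50) -> List[Anchor]:
    """Longest same-orientation chain in which consecutive anchors are
    separated by at most max_gap intervening genes on both chromosomes."""
    pts = sorted(set(anchors))
    if not pts:
        return []
    best = [1] * len(pts)    # best chain length ending at each anchor
    prev = [-1] * len(pts)   # back-pointer to the previous anchor
    for j, (aj, bj) in enumerate(pts):
        for i in range(j):
            ai, bi = pts[i]
            da, db = aj - ai, bj - bi
            # da - 1 and db - 1 are the numbers of intervening genes
            if 0 < da <= max_gap + 1 and 0 < db <= max_gap + 1:
                if best[i] + 1 > best[j]:
                    best[j] = best[i] + 1
                    prev[j] = i
    end = max(range(len(pts)), key=best.__getitem__)
    chain: List[Anchor] = []
    while end != -1:
        chain.append(pts[end])
        end = prev[end]
    return chain[::-1]


pairs = [(1, 10), (3, 12), (4, 200), (60, 15), (5, 14)]
block = longest_colinear_chain(pairs, max_gap=50)
print(block)
```

In this toy input, the pair at (60, 15) is left out at a gap of 50 because 54 genes intervene on chromosome A; relaxing the gap to 60 would admit it.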
Inferring genomic homology
To infer chromosomal homology in legumes, we used the grape genome as an outgroup reference, which provides information on chromosome homology transitively. The grape genome preserves much of the ancestral genome structure before and after the ECH that was common to most eudicot plants (Bowers et al., 2003; Jaillon et al., 2007), much better than other sequenced eudicot genomes, which are often affected by further polyploidization(s). The grape genome was important for revealing paralogous blocks within legume genomes and for distinguishing those produced by the ECH from those that were not. Due to the ECH, any one grape genomic region often has 2 paralogous regions within grape itself, and more in legume genomes. Dotplots of genomic homology between genomes, produced by our custom software, were used to help distinguish orthologous and outparalogous regions between different genomes.
We produced dotplots between grape and each of the legumes. As an example, we show how the grape-barrel medic homology dotplot helps in understanding barrel medic genome structure. The 19 chromosomes of grape were denoted with blocks in 7 colors, corresponding to the 7 ancestral eudicot chromosomes before the ECH. Due to the ECH and the legume-specific LCT, we anticipated that a grape region would have 2 orthologous barrel medic regions, which are paralogous to one another, and 4 outparalogous regions (Supplemental Fig. S1). In the grape-barrel medic dotplot, orthologous and outparalogous blocks can be inferred without much difficulty. A grape chromosomal region is often much more similar, as measured by colinear gene number, to its barrel medic orthologous regions than to the outparalogous regions. Some outparalogous blocks have few homologous gene dots and can only be inferred by transitively using paralogy between grape chromosomes (Supplemental Fig. S11, and detailed in Supplemental Text). Ideally, a grape chromosome would have 2 orthologous corresponding regions. However, these are often broken into pieces by chromosomal rearrangement. A complementary pattern of broken segments helps infer that they derive from the same ancestral chromosome.
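The expected numbers of orthologous and outparalogous regions follow from simple arithmetic on the polyploidization history, sketched below. The sketch assumes no loss of duplicated regions; the function name is hypothetical, and the soybean outparalog count is derived here rather than stated in the text.

```python
from typing import Tuple


def expected_regions(shared_fold: int, lineage_fold: int) -> Tuple[int, int]:
    """Expected region counts in a target genome for one grape region.

    shared_fold:  fold increase from polyploidy shared with grape
                  (3 for the hexaploid ECH)
    lineage_fold: product of fold increases specific to the target
                  lineage (2 for the LCT; 2 * 2 for LCT plus SST)
    Returns (orthologous, outparalogous).
    """
    orthologous = lineage_fold
    outparalogous = (shared_fold - 1) * lineage_fold
    return orthologous, outparalogous


barrel_medic = expected_regions(shared_fold=3, lineage_fold=2)
soybean = expected_regions(shared_fold=3, lineage_fold=2 * 2)

# Total aligned tracks: 3 grape paralogous regions, plus 9 non-soybean
# legumes and soybean, each contributing orthologs and outparalogs.
total_tracks = 3 + 9 * sum(barrel_medic) + sum(soybean)
print(barrel_medic, soybean, total_tracks)
```

Under these assumptions the total equals the 69 circles described in the legend of Figure 2.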
The above strategy was also applied to the comparative analysis between grape and the other legumes. To infer intragenomic homology in soybean after its specific SST, we used the barrel medic genome as a reference.
Supplemental Methods
Details about inferring genomic colinearity, estimating nucleotide substitution, evolutionary dating, modeling gene loss, inferring Gene Ontology, and polyploidization and NBS-LRR genes can be found in the Supplemental Text.
SUPPLEMENTAL DATA
The following supplemental materials are available.
Supplemental Text. Description of details about inferring genomic colinearity, estimating nucleotide substitution, evolutionary dating, modeling gene loss, and inferring Gene Ontology.
Supplemental Figure S1. Homologous dotplot between Vitis vinifera and Medicago truncatula genomes.
Supplemental Figure S2. Homologous dotplot between Vitis vinifera and Arachis duranensis genomes.
Supplemental Figure S3. Homologous dotplot between Vitis vinifera and Arachis ipaensis genomes.
Supplemental Figure S4. Homologous dotplot between Medicago (Medicago truncatula) and soybean (Glycine max) genomes.
Supplemental Figure S5. Homologous alignments of 10 legume genomes with Medicago truncatula as reference.
Supplemental Figure S6. GO analysis distribution of Glycine max retention genes produced by ECH, LCT and SST.
Supplemental Figure S7. GO analysis distribution of Glycine max lost genes in ECH, LCT, SST and LCT-SST.
Supplemental Figure S8. Oil genes amplification model related to gene duplication events in soybean.
Supplemental Figure S9. NBS-class genes amplification model related to gene duplication events in soybean.
Supplemental Figure S10. NBS-domains genes amplification model related to gene duplication events in soybean.
Supplemental Figure S11. Homologous dotplot between grape and barrel medic chromosomes.
Supplemental Table S1. Number of homologous blocks and gene pairs within a genome or between genomes.
Supplemental Table S2. Number of homologous genes within a genome or between genomes.
Supplemental Table S3. Number of paralogous, orthologous and out-paralogous gene pairs within a genome or between genomes.
Supplemental Table S4. Number of paralogous, orthologous and out-paralogous genes within a genome or between genomes.
Supplemental Table S5. Number of paralogous, orthologous and out-paralogous blocks within a genome or between genomes.
Supplemental Table S6. Legume gene loss rates and gene translocation with grape as reference genome.
Supplemental Table S7. Legume gene loss and gene translocation rates with Medicago as reference genome.
Supplemental Table S8. Legume gene loss and gene translocation rates with common bean as reference genome.
Supplemental Table S9. The observed distribution of gene loss and translocation numbers fitted by using different density curves of the geometric distribution.
Supplemental Table S10. Gene retention in soybean duplicated chromosomes.
Supplemental Table S11. Kernel function analysis of Ks distribution related to duplication events within each genome and between selected legumes (before evolutionary rate correction).
Supplemental Table S12. Kernel function analysis of Ks distribution related to duplication events within each genome and between selected legumes (after evolutionary rate correction).
Supplemental Table S13. GO analysis distribution of Glycine max retention genes produced by ECH, LCT and SST.
Supplemental Table S14. GO analysis distribution of Glycine max lost genes in ECH, LCT, SST and LCT-SST.
Supplemental Table S15. Nodulation genes related to duplication events in each legume genome.
Supplemental Table S16. Nodulation 7 subfamily genes related to duplication events in soybean genome.
Supplemental Table S17. Oil 9 subfamily genes related to duplication events in soybean genome.
Supplemental Table S18. Oil genes related to duplication events in each legume genome.
Supplemental Table S19. NBS-CC genes related to duplication events in each legume genome.
Supplemental Table S20. NBS-TIR genes related to duplication events in each legume genome.
Supplemental Table S21. NBS-TNL genes related to duplication events in each legume genome.
Supplemental Table S22. NBS-TNx genes related to duplication events in each legume genome.
Supplemental Table S23. NBS-xNL genes related to duplication events in each genome.
Supplemental Table S24. NBS-xNx genes related to duplication events in each genome.
Supplemental Table S25. Bidirectional BLAST searched against all annotated genes between grape and legume.
Supplemental Table S26. Bidirectional BLAST searched against all annotated genes between barrel medic and other legumes.
Supplemental Table S27. Barrel medic, soybean, and lotus genes against their respective EST sequences (Alignment of coverage≥30%).
Supplemental Table S29. Information of original data material.
ACKNOWLEDGEMENTS
We thank Liming Zhou for helpful discussions about the manuscript.
TABLES
Table 1. Number of duplicated genes within legume genomes related to ECH, LCT and SST.
Species          ECH-related (a)       LCT-related (b)       SST-related (c)
V. vinifera      86/2,423/3,851 (d)    ---                   ---
M. truncatula    194/2,504/2,961       309/3,600/4,796       ---
C. arietinum     317/2,998/3,936       257/2,913/4,743       ---
A. duranensis    124/1,891/2,747       96/2,094/3,847        ---
A. ipaensis      115/1,861/2,697       106/2,205/3,928       ---
V. radiata       100/1,521/2,223       68/1,378/2,529        ---
V. angularis     25/447/611            53/939/1,482          ---
P. vulgaris      126/2,579/3,440       109/3,043/4,853       ---
L. japonicus     63/1,185/1,710        97/2,082/3,116        ---
C. cajan         26/341/588            30/464/869            ---
G. max           344/3,663/2,575       343/8,317/9,486       133/10,312/19,210
(a) Core eudicot-common hexaploid; (b) Legume-common tetraploid; (c) Soybean-specific tetraploid; (d) blocks/gene pairs/gene number.
FIGURE LEGENDS
Figure 1. Species and gene phylogenetic tree. A, Phylogenetic tree of G. max (G), A. duranensis (A), A. ipaensis (B), M. truncatula (M), P. vulgaris (P), L. japonicus (L), C. arietinum (E), C. cajan (C), V. angularis (U), V. radiata (R), and V. vinifera (V). The eudicot-common hexaploidy (ECH) is denoted by a blue hexagon, the legume-common tetraploidy (LCT) by a red square, and the soybean-specific tetraploidy (SST) by a yellow square. B, Gene phylogenetic tree: three paralogous genes in the V. vinifera genome, V1, V2 and V3, produced by the ECH, each have two orthologs in non-soybean legume genomes, and four orthologs in soybean.
Figure 2. Homologous alignments of legume genomes with V. vinifera as a reference. Genomic paralogy, orthology, and outparalogy information within and among 10 legumes, with the same name abbreviations as in Fig. 1, is displayed in 69 circles, each corresponding to an extant gene in Fig. 1B. The curved lines within the inner circle are formed by 19 grape chromosomes color-coded to correspond to the 7 ancestral chromosomes before the ECH. The short lines forming the innermost grape chromosome circles represent predicted genes, which have 2 sets of paralogous regions, forming another two circles. Each of the three sets of grape paralogous chromosomal regions has 2 orthologous copies in a legume, with the exception of soybean, which has 4. The resulting 69 circles are marked according to species by a capital letter, as defined in Fig. 1. Each circle has an underline colored according to its source plant, corresponding to the color scheme in Fig. 1A, and each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme.
Figure 3. Local alignment in selected genomes: grape, barrel medic, and soybean. The graph shows details of a short segment of alignment marked out by a triangle in Fig. 2. Homologous block phylogeny (left): three paralogous chromosome segments in the grape genome, Grape-14, Grape-05 and Grape-07, from ancestral chromosomes affected by the ECH, each with two orthologous barrel medic and four soybean chromosome segments. Chromosome numbers are shown after the names of plants, and locations on chromosomes are also shown. A gene is shown by a rectangle with a small arrow indicating its transcriptional direction. Homologous genes between neighboring chromosomal regions are linked with lines.
Figure 4. Fitting a geometric distribution and gene loss rates: G. max compared to the V. vinifera (A), M. truncatula (B), and P. vulgaris (C) genomes. The x-axis shows the number of continuously missing genes in gene-colinear regions.
Figure 5. Homologous alignments and G. max gene retention along corresponding orthologous M. truncatula chromosomes. Genomic paralogy and orthology information within and among genomes is displayed in 3 circles. The short lines forming the innermost M. truncatula chromosome circles represent predicted genes. Each of the M. truncatula paralogous chromosomal regions has 2 orthologous copies in soybean. Each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme. (A) Rates of retained genes in sliding windows of soybean homoeologous region group 1 (red) and homoeologous region group 2 (black). (B) The difference between the two groups (blue).
Figure 6. Chromosome representation by using the 7 eudicot ancestral chromosomes and those of P. vulgaris. Each chromosome from the grape and legume genomes is first represented by genes colinear to grape. Genes are denoted by short lines in 7 different colors related to the ancestral chromosomes before the ECH. Second, with the exception of P. vulgaris, chromosomes from the other legumes are represented by genes having P. vulgaris colinear genes, and these colinear genes in each plant are colored according to the P. vulgaris chromosomes where their orthologs reside. Thus, a chromosome in the legume genomes is displayed as two sets of short lines arranged side by side.
Figure 7. Dating evolutionary events within and among the legume genomes: soybean (G), peanut (A&B), barrel medic (M), common bean (P), lotus (L), chickpea (E), pigeon pea (C), adzuki bean (U), mung bean (R), and grape (V). A, Distribution of average synonymous substitution levels (Ks) between colinear gene pairs in inter-genomic (solid curves) and intra-genomic blocks (dashed curves). B, Distribution of average synonymous substitution levels after correction to account for the evolutionary rate of soybean genes. C, Correction to the Ks distribution and occurrence of key evolutionary events.
Figure 8. Inferred ancestral gene numbers during the evolution of legumes.
Figure 9. Nodulation gene amplification model related to gene duplication events in soybean. Curved lines within the inner circle, colored green, link paralog pairs on the 20 soybean chromosomes produced by (A) the SST, (B) the LCT, and (C) the ECH. Nodulation subfamily genes are displayed in colors: light salmon (subfamily 1), green (subfamily 2), grey (subfamily 3), yellow (subfamily 4), black (subfamily 5), blue (subfamily 6), and red (subfamily 7). Colored curved lines link nodulation gene pairs with Ks < 0.15.
LITERATURE CITED Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410 Barker MS, Husband BC, Pires JC (2016) Spreading Winge and flying high: The evolutionary importance of polyploidy after a century of study. Am J Bot 103: 1139-1145 Bertioli DJ, Cannon SB, Froenicke L, Huang G, Farmer AD, Cannon EK, Liu X, Gao D, Clevenger J, Dash S, Ren L, Moretzsohn MC, Shirasawa K, Huang W, Vidigal B, Abernathy B, Chu Y, Niederhuth CE, Umale P, Araujo AC, Kozik A, Kim KD, Burow MD, Varshney RK, Wang X, Zhang X, Barkley N, Guimaraes PM, Isobe S, Guo B, Liao B, Stalker HT, Schmitz RJ, Scheffler BE, Leal-Bertioli SC, Xun X, Jackson SA, Michelmore R, Ozias-Akins P (2016) The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet 48: 438-446 Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, Exum HE, Goff VH, Herrick KL, Steele CL, Karunakaran S, Lafayette GK, Lemke C, Marler BS, Masters SL, McMillan JM, Nelson LK, Newsome GA, Nwakanma CC, Odeh RN, Phelps CA, Rarick EA, Rogers CJ, Ryan SP, Slaughter KA, Soderlund CA, Tang H, Wing RA, Paterson AH (2005) Comparative physical mapping links conservation of 30
Downloaded from on July 21, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved.
832 833 834 835 836 837 838 839 840 841 842 843 844 845 846 847 848 849 850 851 852 853 854 855 856 857 858 859 860 861 862 863 864 865 866 867 868 869 870 871 872 873 874 875
microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci U S A 102: 13206-13211 Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433-438 Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B, Correa M, Da Silva C, Just J, Falentin C, Koh CS, Le Clainche I, Bernard M, Bento P, Noel B, Labadie K, Alberti A, Charles M, Arnaud D, Guo H, Daviaud C, Alamery S, Jabbari K, Zhao M, Edger PP, Chelaifa H, Tack D, Lassalle G, Mestiri I, Schnel N, Le Paslier MC, Fan G, Renault V, Bayer PE, Golicz AA, Manoli S, Lee TH, Thi VH, Chalabi S, Hu Q, Fan C, Tollenaere R, Lu Y, Battail C, Shen J, Sidebottom CH, Wang X, Canaguier A, Chauveau A, Berard A, Deniot G, Guan M, Liu Z, Sun F, Lim YP, Lyons E, Town CD, Bancroft I, Wang X, Meng J, Ma J, Pires JC, King GJ, Brunel D, Delourme R, Renard M, Aury JM, Adams KL, Batley J, Snowdon RJ, Tost J, Edwards D, Zhou Y, Hua W, Sharpe AG, Paterson AH, Guan C, Wincker P (2014) Plant genetics. Early allopolyploid evolution in the postNeolithic Brassica napus oilseed genome. Science 345: 950-953 Chen X, Li H, Pandey MK, Yang Q, Wang X, Garg V, Li H, Chi X, Doddamani D, Hong Y, Upadhyaya H, Guo H, Khan AW, Zhu F, Zhang X, Pan L, Pierce GJ, Zhou G, Krishnamohan KA, Chen M, Zhong N, Agarwal G, Li S, Chitikineni A, Zhang GQ, Sharma S, Chen N, Liu H, Janila P, Li S, Wang M, Wang T, Sun J, Li X, Li C, Wang M, Yu L, Wen S, Singh S, Yang Z, Zhao J, Zhang C, Yu Y, Bi J, Zhang X, Liu ZJ, Paterson AH, Wang S, Liang X, Varshney RK, Yu S (2016) Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens. Proc. Natl. Acad. Sci. U. S. A 113: 6785-6790 Doyle JJ (2011) Phylogenetic Perspectives on the Origins of Nodulation. 
Mol Plant Microbe In 24: 1289-1295 Feldman M, Levy AA, Fahima T, Korol A (2012) Genomic asymmetry in allopolyploid plants: wheat as a model. J Exp Bot 63: 5045-5059 Frohlich MW, Chase MW (2007) After a dozen years of progress the origin of angiosperms is still a great mystery. Nature 450: 1184-1189 Garsmeur O, Schnable JC, Almeida A, Jourda C, D'Hont A, Freeling M (2014) Two evolutionarily distinct classes of paleopolyploidy. Mol Biol Evol 31: 448-454 Goebel K (1969) Organography of plants; especially of the Archegoniatae and Spermaphyta, Ed Authorized English. Hafner Pub. Co., New York, He N, Zhang C, Qi X, Zhao S, Tao Y, Yang G, Lee TH, Wang X, Cai Q, Li D, Lu M, Liao S, Luo G, He R, Tan X, Xu Y, Li T, Zhao A, Jia L, Fu Q, Zeng Q, Gao C, Ma B, Liang J, Wang X, Shang J, Song P, Wu H, Fan L, Wang Q, Shuai Q, Zhu J, Wei C, Zhu-Salzman K, Jin D, Wang J, Liu T, Yu M, Tang C, Wang Z, Dai F, Chen J, Liu Y, Zhao S, Lin T, Zhang S, Wang J, Wang J, Yang H, Yang G, Wang J, Paterson AH, Xia Q, Ji D, Xiang Z (2013) Draft genome sequence of the mulberry tree Morus notabilis. Nat Commun 4: 2445
31
Downloaded from on July 21, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved.
876 877 878 879 880 881 882 883 884 885 886 887 888 889 890 891 892 893 894 895 896 897 898 899 900 901 902 903 904 905 906 907 908 909 910 911 912 913 914 915 916 917 918 919 920
International-Wheat-Genome-Sequencing-Consortium (2014) A chromosome-based draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345: 1251788
Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyere C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pe ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quetier F, Wincker P, French-Italian Public Consortium for Grapevine Genome C (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463-467
Jannoo N, Grivet L, David J, D'Hont A, Glaszmann JC (2004) Differential chromosome pairing affinities at meiosis in polyploid sugarcane revealed by molecular markers. Heredity 93: 460-467
Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers JE, McKain MR, McNeal J, Rolf M, Ruzicka DR, Wafula E, Wickett NJ, Wu X, Zhang Y, Wang J, Zhang Y, Carpenter EJ, Deyholos MK, Kutchan TM, Chanderbali AS, Soltis PS, Stevenson DW, McCombie R, Pires JC, Wong GK, Soltis DE, Depamphilis CW (2012) A genome triplication associated with early diversification of the core eudicots. Genome Biol 13: R3
Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, dePamphilis CW (2011) Ancestral polyploidy in seed plants and angiosperms. Nature 473: 97-100
Kang YJ, Kim SK, Kim MY, Lestari P, Kim KH, Ha BK, Jun TH, Hwang WJ, Lee T, Lee J, Shim S, Yoon MY, Jang YE, Han KS, Taeprayoon P, Yoon N, Somta P, Tanya P, Kim KS, Gwag JG, Moon JK, Lee YH, Park BS, Bombarely A, Doyle JJ, Jackson SA, Schafleitner R, Srinives P, Varshney RK, Lee SH (2014) Genome sequence of mungbean and insights into evolution within Vigna species. Nat Commun 5: 5443
Kang YJ, Satyawan D, Shim S, Lee T, Lee J, Hwang WJ, Kim SK, Lestari P, Laosatit K, Kim KH, Ha TJ, Chitikineni A, Kim MY, Ko JM, Gwag JG, Moon JK, Lee YH, Park BS, Varshney RK, Lee SH (2015) Draft genome sequence of adzuki bean, Vigna angularis. Sci Rep 5: 8069
Kashkush K, Feldman M, Levy AA (2002) Gene loss, silencing and activation in a newly synthesized wheat allotetraploid. Genetics 160: 1651-1659
Kellogg EA (2016) Has the connection between polyploidy and diversification actually been tested? Curr Opin Plant Biol 30: 25-32
Lin Y, Cheng Y, Jin J, Jin X, Jiang H, Yan H, Cheng B (2014) Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes. PLoS One 9: e102825
Magallon S, Sanderson MJ (2001) Absolute diversification rates in angiosperm clades. Evolution 55: 1762-1780
Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob ur R, Ware D, Westhoff P, Mayer KF, Messing J, Rokhsar DS (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551-556
Paterson AH, Bowers JE, Chapman BA (2004) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc Natl Acad Sci U S A 101: 9903-9908
Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, Nakao M, Sasamoto S, Watanabe A, Ono A, Kawashima K, Fujishiro T, Katoh M, Kohara M, Kishida Y, Minami C, Nakayama S, Nakazaki N, Shimizu Y, Shinpo S, Takahashi C, Wada T, Yamada M, Ohmido N, Hayashi M, Fukui K, Baba T, Nakamichi T, Mori H, Tabata S (2008) Genome structure of the legume, Lotus japonicus. DNA Res 15: 227-239
Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang XC, Shinozaki K, Nguyen HT, Wing RA, Cregan P, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA (2010) Genome sequence of the palaeopolyploid soybean. Nature 463: 178-183
Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C, Torres-Torres M, Geffroy V, Moghaddam SM, Gao D, Abernathy B, Barry K, Blair M, Brick MA, Chovatia M, Gepts P, Goodstein DM, Gonzales M, Hellsten U, Hyten DL, Jia G, Kelly JD, Kudrna D, Lee R, Richard MM, Miklas PN, Osorno JM, Rodrigues J, Thareau V, Urrea CA, Wang M, Yu Y, Zhang M, Wing RA, Cregan PB, Rokhsar DS, Jackson SA (2014) A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46: 707-713
Schnable JC, Springer NM, Freeling M (2011) Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A 108: 4069-4074
Soltis DE, Bell CD, Kim S, Soltis PS (2008) Origin and early evolution of angiosperms. Ann N Y Acad Sci 1133: 3-25
Soltis DE, Visger CJ, Marchant DB, Soltis PS (2016) Polyploidy: Pitfalls and paths to a paradigm. Am J Bot 103: 1146-1166
Soltis DE, Visger CJ, Soltis PS (2014) The polyploidy revolution then...and now: Stebbins revisited. Am J Bot 101: 1057-1078
Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35: 119-125
Tang H, Krishnakumar V, Bidwell S, Rosen B, Chan A, Zhou S, Gentzbittel L, Childs KL, Yandell M, Gundlach H, Mayer KF, Schwartz DC, Town CD (2014) An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15: 312
Throude M, Bolot S, Bosio M, Pont C, Sarda X, Quraishi UM, Bourgis F, Lessard P, Rogowsky P, Ghesquiere A, Murigneux A, Charmet G, Perez P, Salse J (2009) Structure and expression analysis of rice paleo duplications. Nucleic Acids Res 37: 1248-1259
Van de Peer Y (2011) A mystery unveiled. Genome Biol 12: 113
Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM, Farmer AD, Sheridan J, Iwata A, Tuteja R, Penmetsa RV, Wu W, Upadhyaya HD, Yang SP, Shah T, Saxena KB, Michael T, McCombie WR, Yang B, Zhang G, Yang H, Wang J, Spillane C, Cook DR, May GD, Xu X, Jackson SA (2012) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30: 83-89
Varshney RK, Song C, Saxena RK, Azam S, Yu S, Sharpe AG, Cannon S, Baek J, Rosen BD, Tar'an B, Millan T, Zhang X, Ramsay LD, Iwata A, Wang Y, Nelson W, Farmer AD, Gaur PM, Soderlund C, Penmetsa RV, Xu C, Bharti AK, He W, Winter P, Zhao S, Hane JK, Carrasquilla-Garcia N, Condie JA, Upadhyaya HD, Luo MC, Thudi M, Gowda CL, Singh NP, Lichtenzveig J, Gali KK, Rubio J, Nadarajan N, Dolezel J, Bansal KC, Xu X, Edwards D, Zhang G, Kahl G, Gil J, Singh KB, Datta SK, Jackson SA, Wang J, Cook DR (2013) Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat Biotechnol 31: 240-246
Wang BB, Brendel V (2006) Genomewide comparative analysis of alternative splicing in plants. Proc Natl Acad Sci U S A 103: 7175-7180
Wang X, Guo H, Wang J, Lei T, Liu T, Wang Z, Li Y, Lee TH, Li J, Tang H, Jin D, Paterson AH (2016) Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation. New Phytol 209: 1252-1263
Wang X, Shi X, Li Z, Zhu Q, Kong L, Tang W, Ge S, Luo J (2006) Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC Bioinformatics 7: 447
Wang X, Tang H, Bowers JE, Paterson AH (2009) Comparative inference of illegitimate recombination between rice and sorghum duplicated genes produced by polyploidization. Genome Res 19: 1026-1032
Wang X, Wang J, Jin D, Guo H, Lee TH, Liu T, Paterson AH (2015) Genome alignment spanning major Poaceae lineages reveals heterogeneous evolutionary rates and alters inferred dates for key evolutionary events. Mol Plant 8: 885-898
Wang Y, Wang X, Lee TH, Mansoor S, Paterson AH (2013) Gene body methylation shows distinct patterns associated with different gene origins and duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice). New Phytol 198: 274-283
Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH (2011) Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS One 6: e28150
Young ND, Debelle F, Oldroyd GE, Geurts R, Cannon SB, Udvardi MK, Benedito VA, Mayer KF, Gouzy J, Schoof H, Van de Peer Y, Proost S, Cook DR, Meyers BC, Spannagl M, Cheung F, De Mita S, Krishnakumar V, Gundlach H, Zhou S, Mudge J, Bharti AK, Murray JD, Naoumkina MA, Rosen B, Silverstein KA, Tang H, Rombauts S, Zhao PX, Zhou P, Barbe V, Bardou P, Bechner M, Bellec A, Berger A, Berges H, Bidwell S, Bisseling T, Choisne N, Couloux A, Denny R, Deshpande S, Dai X, Doyle JJ, Dudez AM, Farmer AD, Fouteau S, Franken C, Gibelin C, Gish J, Goldstein S, Gonzalez AJ, Green PJ, Hallab A, Hartog M, Hua A, Humphray SJ, Jeong DH, Jing Y, Jocker A, Kenton SM, Kim DJ, Klee K, Lai H, Lang C, Lin S, Macmil SL, Magdelenat G, Matthews L, McCorrison J, Monaghan EL, Mun JH, Najar FZ, Nicholson C, Noirot C, O'Bleness M, Paule CR, Poulain J, Prion F, Qin B, Qu C, Retzel EF, Riddle C, Sallet E, Samain S, Samson N, Sanders I, Saurat O, Scarpelli C, Schiex T, Segurens B, Severin AJ, Sherrier DJ, Shi R, Sims S, Singer SR, Sinharoy S, Sterck L, Viollet A, Wang BB, Wang K, Wang M, Wang X, Warfsmann J, Weissenbach J, White DD, White JD, Wiley GB, Wincker P, Xing Y, Yang L, Yao Z, Ying F, Zhai J, Zhou L, Zuber A, Denarie J, Dixon RA, May GD, Schwartz DC, Rogers J, Quetier F, Town CD, Roe BA (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480: 520-524
Zhu H, Choi HK, Cook DR, Shoemaker RC (2005) Bridging model and crop legumes through comparative genomics. Plant Physiol 137: 1189-1196
Figure 1. Species and gene phylogenetic trees. A, Phylogenetic tree of G. max (G), A. duranensis (A), A. ipaensis (B), M. truncatula (M), P. vulgaris (P), L. japonicus (L), C. arietinum (E), C. cajan (C), V. angularis (U), V. radiata (R), and V. vinifera (V). The eudicot-common hexaploidy (ECH) is denoted by a blue hexagon, the legume-common tetraploidy (LCT) by a red square, and the soybean-specific tetraploidy (SST) by a yellow square. B, Gene phylogenetic tree: three paralogous genes in the V. vinifera genome, V1, V2 and V3, produced by the ECH, each have two orthologs in non-soybean legume genomes and four orthologs in soybean.
Figure 2. Homologous alignments of 10 legume genomes with V. vinifera as a reference. Genomic paralogy, orthology, and outparalogy information within and among the 10 legumes is displayed in 69 circles, each corresponding to an extant gene in Fig. 1B. The curved lines within the inner circle are formed by 19 grape chromosomes color-coded to correspond to the 7 ancestral chromosomes before the ECH. The short lines forming the innermost grape chromosome circles represent predicted genes, which have 2 sets of paralogous regions, forming another two circles. Each of the three sets of grape paralogous chromosomal regions has 2 orthologous copies in a legume, with the exception of soybean, which has 4. The resulting 69 circles are marked according to species by a capital letter, as defined in Fig. 1. Each circle has an underline colored according to its source plant, corresponding to the color scheme in Fig. 1A, and each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme.
Core eudicot common hexaploidy (gamma, ~130 mya)
Legume common tetraploidy (beta, ~59 mya)
Soybean specific tetraploidy (alpha, ~13 mya)
Figure 3. Local alignment in selected genomes: grape, barrel medic, and soybean. The graph shows details of a short segment of the alignment marked by a triangle in Figure 2A. Homologous block phylogeny (left): three paralogous chromosome segments in the grape genome, Grape14, Grape05 and Grape07, from ancestral chromosomes affected by the ECH, each with two orthologous medicago and four soybean chromosome segments. Chromosome numbers are shown after the names of plants, and locations on chromosomes are also shown. A gene is shown by a rectangle with a small arrow indicating its transcriptional direction. Homologous genes between neighboring chromosomal regions are linked with lines.
Figure 4. Fitting a geometric distribution and gene loss rates. G. max compared to the V. vinifera (A), M. truncatula (B) and P. vulgaris (C) genomes. The x axis denotes the number of continuously missing genes in gene colinearity regions.
Figure 5. Homologous alignments and G. max gene retention along corresponding orthologous M. truncatula chromosomes. Genomic paralogy and orthology information within and among genomes is displayed in 3 circles. The short lines forming the innermost M. truncatula chromosome circles represent predicted genes. Each of the M. truncatula paralogous chromosomal regions has 2 orthologous copies in soybean. Each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme. (A) Rates of retained genes in sliding windows of soybean homoeologous region group 1 (red) and homoeologous region group 2 (black). (B) The difference between the two groups (blue).
Figure 6. Chromosome representation using the 7 eudicot ancestral chromosomes and those of P. vulgaris. Each chromosome from the grape and 10 legume genomes is first represented by genes colinear to grape. Genes are denoted by short lines in 7 different colors related to the ancestral chromosomes before the ECH. Second, with the exception of P. vulgaris, chromosomes from the other 9 legumes are represented by genes having P. vulgaris colinear genes, and these colinear genes in each plant are colored according to the P. vulgaris chromosomes where their orthologs reside. Thus, a chromosome in the 9 legume genomes is displayed in two sets of short lines arranged side by side.
Figure 7. Dating evolutionary events within and among 9 legume genomes: soybean (G), peanut (A&B), barrel medic (M), common bean (P), lotus (L), chickpea (E), pigeonpea (C), adzuki bean (U), mungbean (R), and grape (V). A, Distribution of average synonymous substitution levels (Ks) between colinear gene pairs in intergenomic (solid curves) and intragenomic blocks (dashed curves). B, Distribution of average synonymous substitution levels after correction to account for the evolutionary rate of soybean genes. C, Correction to the Ks distribution and occurrence of key evolutionary events.
Figure 8. Inferred ancestral gene numbers during the evolution of legumes.
Figure 9. Nodulation gene amplification model related to gene duplication events in soybean. A, Curved lines within the inner circle, colored green, link paralog pairs on the 20 soybean chromosomes produced by SST, B, LCT and C, ECH. Nodulation subfamily genes are displayed in colors: light salmon (subfamily 1), green (subfamily 2), grey (subfamily 3), yellow (subfamily 4), black (subfamily 5), blue (subfamily 6) and red (subfamily 7). Colored curved lines link nodulation gene pairs with Ks