Hierarchically aligning 10 legume genomes ... - Plant Physiology

Plant Physiology Preview. Published on March 21, 2017, as DOI:10.1104/pp.16.01981

Running Title: Hierarchical alignment of legume genomes

Corresponding author:

Xiyin Wang

School of Life Sciences and Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Hebei 063000, China

Tel: 86-315-3721512

E-mail: [email protected]

Downloaded from on July 21, 2017 - Published by www.plantphysiol.org Copyright © 2017 American Society of Plant Biologists. All rights reserved.

Copyright 2017 by the American Society of Plant Biologists

Title: Hierarchically aligning 10 legume genomes establishes a family-level genomics platform

Authors: Jinpeng Wang, Pengchuan Sun, Yuxian Li, Yinzhe Liu, Jigao Yu, Xuelian Ma, Sangrong Sun, Nanshan Yang, Ruiyan Xia, Tianyu Lei, Xiaojian Liu, Beibei Jiao, Yue Xing, Weina Ge, Li Wang, Zhenyi Wang, Xiaoming Song, Min Yuan, Di Guo, Lan Zhang, Jiaqi Zhang, Dianchuan Jin, Wei Chen, Yuxin Pan, Tao Liu, Ling Jin, Jinshuai Sun, Jiaxiang Yu, Rui Cheng, Xueqian Duan, Shaoqi Shen, Jun Qin, Meng-chen Zhang, Andrew H. Paterson, Xiyin Wang*

School of Life Sciences, North China University of Science and Technology, Tangshan, Hebei 063000, China (J.W., Y.Li, Y.Liu, J.Y., X.M., S.Sun, N.Y., R.X., T.Lei, X.L., W.G., L.W., Z.W., X.S., M.Y., D.G., L.Z., J.Z., Y.P., J.S., J.Y., R.C., X.D., S.Shen, X.W.); Center for Genomics and Computational Biology, North China University of Science and Technology, Tangshan, Hebei 063000, China (J.W., P.S., Y.Li, Y.Liu, J.Y., S.Sun, N.Y., T.Lei, B.J., Y.X., W.G., L.W., Z.W., X.S., M.Y., D.G., L.Z., J.Z., D.J., W.C., Y.P., T.Liu, L.J., J.Y., X.W.); Cereal & Oil Crop Institute, Hebei Academy of Agricultural and Forestry Sciences, No. 162, Hengshanjie Street, Shijiazhuang, 050035, China (J.Q., M.Z.); Plant Genome Mapping Laboratory, University of Georgia, Athens, GA, 30605, USA (A.P.)

One-sentence summary: A hierarchical and event-related alignment laid a solid foundation for further genomics exploration in the legume research community and beyond.


Footnotes:

1 We appreciate financial support from the China Department of Science and Technology Key Research Project “Seven Key Crop Breeding Project” (SQ2016ZY03002918), China National Science Foundation (3151333 to J.W. and 31371282 to X.W.), Natural Science Foundation of Hebei Province (C2015209069 to J.W. and C2016209097 to W.G.); Hebei New Century 100 Creative Talents Project, Hebei 100 Talented Scholars project, and Tangshan Key Laboratory Project to X.W.; National fund cultivation project of North China University of Science and Technology (GP201508) to D.J.; US National Science Foundation (ACI1339727) to X.W. and A.P.; and GA Peanut Commission and Southeastern Peanut Research Initiative to A.P.

* Address correspondence to [email protected]

The author responsible for distribution of materials integral to the findings presented in this article in accordance with the policy described in the Instructions for Authors (www.plantphysiol.org) is: Xiyin Wang ([email protected]).

X.W. conceived and led the research. J.W. implemented and coordinated the analysis. P.S., Y.Li, Y.Liu, R.X., X.M., J.Y., N.Y., S.Sun, X.L., B.J., Y.X., X.S., J.Z., L.J., J.S., J.Y., R.C., X.D., and S.Shen performed the analysis. T.Liu and T.Lei contributed analysis tools. W.G., L.W., Z.W., L.Z., D.G., D.J., Y.P., J.Q., and M.Z. performed the analysis and contributed constructive discussions. X.W., A.P., and J.W. wrote the manuscript.


ABSTRACT

Mainly due to their economic importance, genomes of 10 legumes, including soybean, wild peanuts, barrel medic, etc., have been sequenced. However, a family-level comparative genomics analysis has been unavailable. With grape and selected legume genomes as outgroups, we performed a hierarchical and event-related alignment of these genomes and deconvoluted layers of homologous regions produced by ancestral polyploidizations or speciations. Consequently, we illustrated genomic fractionation characterized by widespread gene losses after the polyploidizations. Notably, high similarity in gene retention between recently duplicated chromosomes in soybean supported a likely autopolyploid nature of its tetraploid ancestor. Moreover, though most gene losses were nearly random, largely but not fully described by a geometric distribution, we showed that polyploidization contributed divergently to copy number variation of important gene families. In addition, we showed significantly divergent evolutionary rates among legumes and, by performing Ks correction, re-dated major evolutionary events during their expansion. The present effort lays a solid foundation for further genomics exploration in the legume research community and beyond. We describe here only a small fraction of the legume comparative genomics analysis that we performed; more information is stored in the newly constructed Legume Comparative Genomics Research Platform (www.legumegrp.org).

Key words: Legume, Polyploidization, Whole-genome alignment, Genomic fractionation, Gene colinearity


INTRODUCTION

The Fabaceae (also called Leguminosae or Papilionaceae), commonly known as the legume, pea, or bean family, is a large and economically important monophyletic family of flowering plants. It includes trees, shrubs, and perennial or annual herbaceous plants, which are easily recognized by their fruit (legume) and their compound, stipulate leaves (Goebel, 1969). As the third-largest land plant family, legumes are widely distributed and divided into 650 genera and over 18,860 species, accounting for about 7% of flowering plant species (Magallon and Sanderson, 2001). Along with cereals, fruits, and tropical roots, a number of legumes have been staple human foods, and their use is closely related to human evolution (Zhu et al., 2005). Further, legumes are an important part of natural ecosystems, as they fix atmospheric nitrogen through intimate symbioses with microorganisms (Doyle, 2011).

Mainly due to their economic importance, whole-genome sequences for a number of legumes have been deciphered, including Glycine max (L.) Merr. (soybean) (Schmutz et al., 2010), Cicer arietinum (L.) (chickpea) (Varshney et al., 2013), Medicago truncatula Gaertn. (barrel medic) (Young et al., 2011; Tang et al., 2014), Lotus japonicus L. (lotus) (Sato et al., 2008), Vigna radiata (L.) R. Wilczek (mung bean) (Kang et al., 2014) and Vigna angularis (Willd.) Ohashi (adzuki bean) (Kang et al., 2015), Cajanus cajan (L.) Millsp (pigeon pea) (Varshney et al., 2012), Phaseolus vulgaris (L.) (common bean) (Schmutz et al., 2014), and two wild peanuts (Arachis duranensis Krapov. & W.C.Gregory and Arachis ipaensis Krapov. & W.C.Gregory) (Bertioli et al., 2016; Chen et al., 2016). These legume genomes have sizes ranging from ~400 Mb (barrel medic) to 1150 Mb (soybean), packaged into 6 to 20 chromosomes.

Most if not all legumes, having originated from a common ancestor about 60 million years ago (Mya), shared a tetraploid ancestor (named legume-common tetraploid, or LCT) of similar age (Schmutz et al., 2010) that played a major role in shaping legume genome organization (Young et al., 2011). Before the LCT, legumes shared an ancient core-eudicot-common hexaploid ancestor (ECH, often named gamma), which was revealed first with the Arabidopsis genome sequence (Bowers et al., 2003) and then described in detail based on the grape genome (Jaillon et al., 2007; Jiao et al., 2012), which is often taken as a valuable reference to explore the genome structure of eudicots. More recent polyploidizations continued to occur in some legume lineages, offering the opportunity for punctuational change in the evolution of these plants, e.g., one occurring ~13 Mya and specifically contributing to the formation of the extant soybean genome (Schmutz et al., 2010) (named soybean-specific tetraploid, or SST).

Polyploidization, as an abrupt evolutionary event, can occur overnight, yet it exerts an enormous effect on the evolution of a plant and can even trigger speciation and diversification (Paterson et al., 2004; Soltis et al., 2008; Jiao et al., 2011). Recently, polyploidization has been suggested to explain the long-standing mystery of the rapid formation and diversification of land plants (Frohlich and Chase, 2007; Van de Peer, 2011). Polyploidization can have short-term and long-term effects, genetic or epigenetic, at single-gene or whole-genome scale. After a new polyploid forms, the genome can be very unstable, and in the first generations it may lose much of its DNA content, as evidenced, for example, by the production of synthetic tetraploid wheat (Kashkush et al., 2002). Evolutionary analysis also supports this inference. Comparative analysis of the cereal genomes, which share a 100-Mya tetraploid ancestor, suggested that the majority of gene losses (97% or more) occurred before the divergence of sorghum (panicoids) and rice (oryzoids) (Paterson et al., 2009). Nonetheless, thousands of polyploidy-derived duplicated genes can still be preserved in extant genomes. These duplicated genes may take different evolutionary avenues, sharing or dividing ancestral gene functions or developing novel genetic functions (Feldman et al., 2012; Lin et al., 2014). As to gene expression, it has been proposed that at least 57-85% of paleopolyploid-produced duplicates have diverged in rice (Throude et al., 2009), and duplicates with high expression tend to have higher CG body methylation (Wang et al., 2013). This suggests that epigenetic changes may have contributed to genomic preservation, maintenance, and restoration of genomic stability (Wang et al., 2013).

The availability of 10 hard-won legume genomes provides a precious opportunity to understand legume biology. Here, by developing approaches to perform hierarchical comparative genomics analysis, we produced multiple alignments of all 10 legume genomes. By tracking information about ancestral polyploidization, we deconvoluted the layer-by-layer homology between the legume genomes. This enabled us to evaluate evolutionary divergence among legumes, re-date major evolutionary events, and reveal rules of massive gene losses and expression changes between duplicated genes. The hierarchical alignment yielded a homologous gene list relating to different evolutionary events such as recursive polyploidizations and plant divergences. The present effort provides a valuable genomic platform for researchers in the plant community to investigate evolutionary changes, functional innovations, and phylogenetic structures of gene families and regulatory pathways.

RESULTS

Gene colinearity within and among genomes

Intragenomic homology

By inferring gene colinearity, we detected colinear genes within each legume genome, between each pair of them, and between them and grape, which was used as an outgroup reference. Homologous blocks with more than 4, 10, 20, and 50 colinear genes were checked (Supplemental Table S1-2).

The legume genomes differed in the numbers of duplicated blocks and of colinear genes residing in them. For blocks containing more than 4 colinear genes, we found the most duplicated genes in soybean (25,302 pairs) and the fewest in adzuki bean (1,956 pairs) (Supplemental Table S1). The large difference in duplicated gene numbers among genomes might be related to the SST in soybean, or to incomplete assembly of the legume genomes. In soybean, 434, 224, and 87 blocks had more than 10, 20, and 50 colinear genes, containing 20,365, 17,578, and 13,191 colinear genes and accounting for 44.9%, 38.8%, and 29.1% of total gene content, respectively. The longest homologous region supported by gene colinearity was between soybean chromosomes Gm10 and Gm20, with 824 colinear genes in a 12.87 Mb region. The other genomes had much shorter duplicated blocks, often with fewer than 10 blocks having more than 50 colinear genes. For example, among hundreds of duplicated blocks in barrel medic and mung bean, each had only 9 duplicated blocks with more than 50 colinear genes. Among the non-soybean genomes, common bean had the most (12) duplicated blocks of more than 50 colinear genes.

Intergenomic homology

Intergenomic homology among legumes is often better preserved than intragenomic homology, consistent with speciations often being more recent than genome duplications. Between these legume genomes, there were often many thousands of colinear genes (Supplemental Table S1). Soybean had more colinear genes with other legumes than were found between any other legumes, due to the SST. For example, soybean and barrel medic genes form 50,672 colinear gene pairs located in 2,824 homologous blocks with more than 4 colinear genes, involving 21,103 (~35.4%) and 34,822 (~47.7%) genes from the two genomes, respectively. There were often tens of intergenomic blocks with more than 50 colinear genes. The two peanut genomes share 16,484 colinear genes in 50 blocks, each containing at least 50 colinear genes. Detailed statistics on the numbers of inferred paralogous and orthologous genes, gene pairs, and blocks are in Supplemental Table S2-5.

Multiple genome/chromosome alignment

Event-related genomic homology

Intergenomic comparison helped to unravel the structural complexity of legume genomes, which results from recursive polyploidization events successively doubling or tripling the numbers of existing homologous regions (Fig. 1). Analysis of the grape genome contributed to understanding the triplicated nature of the ancestral core eudicot genome, which appears to have transitioned from 2n = 2x = 14 to 2n = 6x = 42 chromosomes (Jaillon et al., 2007). Here, we used the grape genome to distinguish orthologous from outparalogous regions between legumes, and paralogous regions within each legume. Homologous regions in different genomes are called outparalogous when they were produced by a genomic duplication in the two species’ common ancestor, to distinguish them from paralogous regions produced by a duplication specific to one species. Homologous gene dotplots (Supplemental Fig. S1-Fig. S4) depict genomic comparisons and support inferences of orthology and paralogy. Orthologous regions between grape and legumes have much higher DNA similarity than outparalogous regions, the latter being a result of the ECH. Details of inferring orthology and paralogy can be found in the Methods and Supplemental text. Similar analyses have been described for grass genomes and the cotton genome (Paterson et al., 2012; Wang et al., 2015; Wang et al., 2016). Given the extra LCT shared by all legumes, there would be an expected 1:2 ratio of orthologous regions between grape and most legumes, with the additional SST conferring a 1:4 ratio between grape and soybean. In partial summary, intergenomic analysis revealed layers of genomic homology in the complex legume genomes. Above, we used grape as the outgroup reference to deconvolute the genomic complexity of barrel medic and other legumes, finding duplicated blocks in each of them and homology between them. In a similar manner, we adopted barrel medic and common bean as references to distinguish recent SST-duplicated regions in soybean.

Multiple alignment

With the grape genome as a reference, we produced a table to store inter- and intra-genomic homology information. First, we filled in all grape gene IDs in the first column of the table, then added gene IDs from legumes column by column, species by species, according to the colinearity inferred by multiple alignments. As noted above, in the absence of gene loss each grape gene would have 2 colinear orthologous genes in most legumes, and 4 in soybean. When a legume species contained a gene showing colinearity with a grape gene, its gene ID was filled into the appropriate cell in the table. When a legume species did not have an expected colinear gene, often due to gene loss, translocation, or insufficient assembly, a dot (signifying missing) was filled into the appropriate cell. For the 11 legume (sub)genomes (including two subgenomes for soybean) plus grape, there are 23 (9 × 2 + 4 + 1) columns in the table. Moreover, due to the ECH, each chromosomal segment would repeat three times in each genome. Based on homology inferred in grape, we therefore extended the table to 69 columns. Finally, we constructed a table of colinear genes reflecting three polyploidizations and all salient speciations. In partial summary, the table summarizes the results of multiple-genome and event-related alignment, reflecting layers of tripled and/or doubled homology due to recursive polyploidizations (Fig. 2).
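
The column layout described above can be sketched in a few lines of Python. This is an illustrative sketch only: the species labels, the `row_for` helper, and all gene IDs are hypothetical, and the actual table was built from the colinearity inference described in the Methods.

```python
# Illustrative sketch of the column layout of the grape-referenced table.
# Species labels and gene IDs below are hypothetical placeholders.
NON_SOYBEAN_LEGUMES = 9   # each expected to hold 2 orthologs per grape gene (LCT)
SOYBEAN_COPIES = 4        # LCT followed by SST gives 2 x 2 expected orthologs
GRAPE_COLUMNS = 1         # the reference column
ECH_COPIES = 3            # each segment is triplicated by the ECH

columns_per_ech_copy = NON_SOYBEAN_LEGUMES * 2 + SOYBEAN_COPIES + GRAPE_COLUMNS
total_columns = columns_per_ech_copy * ECH_COPIES


def row_for(grape_gene, colinear, expected):
    """Build one table row: the grape gene ID, then one cell per expected
    ortholog slot; a dot marks a missing (lost, moved, or unassembled) gene."""
    row = [grape_gene]
    for species, n_slots in expected.items():
        found = colinear.get(species, [])[:n_slots]
        row.extend(found + ["."] * (n_slots - len(found)))
    return row


expected_slots = {"barrel_medic": 2, "soybean": 4}
example = row_for(
    "Vv01g0001",
    {"barrel_medic": ["Mt1g0005"], "soybean": ["Gm10g001", "Gm20g007"]},
    expected_slots,
)
print(columns_per_ech_copy, total_columns)
print(example)
```

Keeping a fixed number of slots per species, with dots for absent genes, is what lets every row be compared position by position across genomes.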

The genomic alignment table for 10 legumes with grape as a reference is not complete; in particular, it cannot include all duplicated genes produced by the SST. That is, genes specific to legumes and absent from the grape genome are not represented. Therefore, the grape-legume homology table was supplemented by a genomic homology table with barrel medic as reference (Supplemental Fig. S5), to better represent pan-legume gene content.

Event-related duplicated genes

The cross-legume genome analyses described above helped to identify duplicated genes produced by each polyploidization event, and to infer gene content in the ancestral genomes before each polyploidization and speciation event. In grape, we inferred 1,764 pairs of genes in 86 homoeologous regions derived from the ECH, involving 2,893 extant genes (Table 1). Being affected by more polyploidizations, legume genomes contain more duplicates. In barrel medic, 2,504 gene pairs involving 2,961 genes were inferred in 194 ECH-derived homoeologous regions. However, fewer ECH-derived duplicates were inferred in some legumes. For example, only 300-1,400 ECH gene pairs were inferred for pigeon pea, adzuki bean, and Lotus japonicus. The most ECH-derived gene pairs were inferred from soybean, with 3,663 gene pairs involving 2,575 genes from 344 homoeologous regions. The high number of soybean ECH gene pairs results partly from the additional SST, which would have produced up to 5 times ((6, 2)/(3, 2) = 15/3) the number of combinations of homoeologous gene pairs found in other legumes. Here, (m, n) denotes the combinatorial number, that is, the number of ways to choose n items from m.
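
Reading (m, n) as the binomial coefficient, the factor of 5 quoted above follows from simple counting. Taking 3 and 6 as the numbers of homoeologous copies before and after the SST doubling is one reading of the text, not a statement from the Methods:

```latex
\frac{\binom{6}{2}}{\binom{3}{2}}
  = \frac{6!/(2!\,4!)}{3!/(2!\,1!)}
  = \frac{15}{3}
  = 5
```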

We also characterized LCT-derived gene pairs, which showed 10-fold variation among legumes. In barrel medic, 4,796 gene pairs involving 4,198 genes were inferred from 309 LCT-derived homoeologous regions. In soybean, 8,317 gene pairs involving 9,486 genes were derived from 343 LCT-derived homoeologous regions. Pigeon pea has the fewest LCT-derived gene pairs (869). The reduced abundance of inferred LCT-derived gene pairs may have resulted from poor assembly. The soybean-specific tetraploidization (SST) produced 17,104 gene pairs involving 19,210 genes, derived from 133 homoeologous regions.

Genomic fractionation

Genomic fractionation reshapes plant genomes. Key forces driving genomic fractionation include polyploidization, which multiplies the gene content of an entire genome, and transposon activity, which duplicates and relocates individual genes (Wang et al., 2011). Here, by using grape, barrel medic, and common bean as references, we show how gene removal eroded colinearity between homologous genomic regions.

Using the grape genome and genes as a reference, it is clear that there has been widespread genomic fractionation following the LCT (Supplemental Table S6). For example, taking grape chromosome 1 as the outgroup, in pairwise alignments of grape with each barrel medic duplicate, 75% and 77% of grape genes were not found at the respective colinear locations; in the three-way alignment of both barrel medic duplicated regions and the outgroup, 70% of the grape genes were absent from both colinear locations. For common bean, the corresponding numbers are 94%, 89%, and 83%, respectively. Using barrel medic chromosome 1 as a reference, 74%, 73%, and 69% of its genes were not found at the respective colinear locations in each or both of the soybean duplicated regions produced by the SST. A local alignment of colinear blocks among genomes shows the pattern of genomic fractionation (Fig. 3). Some genes missing from the homologous locations may be related to deletions of adjacent transposons or to transposon movements that disrupted gene order, and some may be related to poor assemblies or annotations, as further discussed below.

To investigate the scale and potential mechanisms of fractionation, we counted the numbers of runs of removed genes in each legume genome relative to a reference genome, that is, the numbers of consecutive genes from the reference not appearing in the studied genome. Many missing genes comprised small runs, i.e., of only 1 or 2 genes (Fig. 4). For example, these small runs comprise 53% of missing genes and up to 71% of all 10,604 runs in common bean; 15% of genes and up to 49% of all 13,936 runs in barrel medic; and 15.2% of genes and up to 44.7% of all 7,984 runs in the reference grape genome. From another perspective, 77.6%, 56.5%, and 47% of genes were removed from their anticipated locations in runs of 10 genes or fewer, which account for up to 48.4%, 89.5%, and 85.6% of all runs for the reference grape, barrel medic, and common bean genomes, respectively. The references work as temporal outgroups, with common bean, barrel medic, and grape being successively more diverged from soybean (Supplemental Table S7-8). Missing genes were more likely to appear in small runs when using common bean as the reference than when using barrel medic or grape. This suggests an accumulating effect, with initial gene loss resulting in small runs that are gradually extended over time.
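
The run counting described above can be illustrated with a short Python sketch. The function names and the presence/absence vector are hypothetical and only show the bookkeeping; they are not the pipeline used in the study.

```python
from itertools import groupby


def missing_run_lengths(present):
    """Return lengths of maximal runs of consecutive missing genes.

    `present` lists reference genes in chromosomal order; True means a
    colinear counterpart was found in the studied genome, False means missing.
    """
    return [sum(1 for _ in group)
            for is_present, group in groupby(present)
            if not is_present]


def small_run_summary(run_lengths, max_len=2):
    """Fraction of runs, and of missing genes, in runs of at most max_len."""
    small = [r for r in run_lengths if r <= max_len]
    return len(small) / len(run_lengths), sum(small) / sum(run_lengths)


# Hypothetical presence/absence vector along one reference chromosome.
present = [True, False, True, False, False, True, True,
           False, False, False, False, True]
runs = missing_run_lengths(present)
frac_runs, frac_genes = small_run_summary(runs)
print(runs, frac_runs, frac_genes)
```

Reporting both fractions matters because a few long runs can hold most of the missing genes even when short runs dominate the run count.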

The lengths and numbers of runs of removed genes closely approximated a geometric distribution. We fitted the observed distribution of run numbers by length using density curves of the geometric distribution, with extension parameters of 0.33, 0.31, and 0.30 for common bean, barrel medic, and grape as references, respectively, finding goodness of fit of 0.995, 0.991, and 0.994 with p-values of 0.92, 0.91, and 0.89 (F-test), respectively (Supplemental Table S9). The closer the reference plant is to soybean, the shorter the runs of lost genes (Fig. 4), showing a better gene-sharing pattern. The deviation between the observed and theoretically predicted numbers becomes larger when the gene-loss runs are longer, which also supports the extension of removed-gene runs over time.
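
A minimal sketch of fitting a geometric distribution to run-length counts is given below. It assumes the standard parameterization P(k) = p(1 - p)^(k - 1), estimates p by maximum likelihood, and uses R^2 as a stand-in for goodness of fit; how the paper defines its "extension parameter" and performs the F-test is not specified in this section, and the counts are hypothetical.

```python
def geometric_pmf(k, p):
    """P(run length = k) for k = 1, 2, ... under a geometric distribution."""
    return p * (1 - p) ** (k - 1)


def fit_geometric(run_counts):
    """Fit p by maximum likelihood (p = 1 / mean run length) and report
    R^2 between observed and expected numbers of runs of each length.

    `run_counts` maps run length -> number of runs observed with that length.
    """
    n_runs = sum(run_counts.values())
    mean_len = sum(k * c for k, c in run_counts.items()) / n_runs
    p = 1.0 / mean_len
    lengths = sorted(run_counts)
    observed = [run_counts[k] for k in lengths]
    expected = [n_runs * geometric_pmf(k, p) for k in lengths]
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - e) ** 2 for o, e in zip(observed, expected))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return p, 1 - ss_res / ss_tot


# Hypothetical counts, not taken from the paper.
counts = {1: 500, 2: 250, 3: 125, 4: 62, 5: 31, 6: 16}
p_hat, r_squared = fit_geometric(counts)
print(round(p_hat, 3), round(r_squared, 3))
```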

Correspondingly balanced fractionation between the SST homoeologous chromosomes

Aligning duplicated soybean regions onto the corresponding single barrel medic chromosomes permitted us to ‘reconstruct’ (infer) the gene composition of ancestral duplicated SST paralogous chromosomes, which often show significant divergence of gene retention rates. Among 8 barrel medic chromosomes, 7 have significantly divergent paralogous soybean chromosomal regions at a chi-squared test significance level of 0.05, or 6 at 0.01 (Supplemental Table S10 and Table S7). This finding shows unbalanced gene retention between homoeologous chromosomes at the whole-chromosome level. However, scrutiny of gene retention/loss using a sliding window along chromosomes showed that in nearly all local regions, with the exception of large patches of DNA loss in one copy of the duplicated chromosomes, gene retention and loss are often highly similar (Fig. 5). The difference in gene retention between corresponding paralogous regions consistently varies around zero. The difference observed above at the chromosome level is therefore likely caused by large patches of alternative segmental DNA loss due to genomic instability (Fig. 5). Overall, this finding suggests little if any dominance between members of homoeologous chromosome pairs, providing further evidence of the likely autotetraploid nature of the SST (Garsmeur et al., 2014).
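
The sliding-window comparison can be sketched as follows. The window size, step, and retention flags are hypothetical choices for illustration; the parameters actually used for Fig. 5 are not given in this section.

```python
def window_retention(retained, window, step=1):
    """Retention rate (fraction of True) in each sliding window."""
    return [sum(retained[i:i + window]) / window
            for i in range(0, len(retained) - window + 1, step)]


def retention_difference(copy_a, copy_b, window, step=1):
    """Per-window difference in retention between two homoeologous copies,
    both indexed by the same ordered reference genes."""
    rates_a = window_retention(copy_a, window, step)
    rates_b = window_retention(copy_b, window, step)
    return [a - b for a, b in zip(rates_a, rates_b)]


# Hypothetical retention flags for 8 reference genes in two soybean copies.
copy_a = [True, False, True, True, False, False, True, False]
copy_b = [False, True, True, False, True, False, False, True]
diffs = retention_difference(copy_a, copy_b, window=4, step=2)
print(diffs)
```

Differences that fluctuate around zero along the chromosome correspond to the balanced local retention described above, whereas a long stretch of same-sign values would point to a segmental loss in one copy.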

Karyotype changes and inter-genomic representation

After recursive polyploidizations, plants often restore chromosome numbers to relatively small values. Grape and legumes share a eudicot common ancestor inferred to have had 2n = 6x = 42 chromosomes, resulting from triplication of a basal set (x) of 7 chromosomes (2n = 2x = 14) by the ECH. Using gene colinearity information, the 19 grape chromosomes or chromosomal regions were grouped into 7 sets of paralogous triplets, which were mapped onto the chromosomes of legumes (Fig. 1).

After the LCT, the non-soybean legumes under consideration have 6-11 haploid chromosomes, suggesting considerable chromosome number reduction. The legume-common ancestor may have had 11 chromosomes, still found in common bean and its indigoferoid/millettioid relatives, while the Dalbergioid (peanut) and Hologalegina (chickpea and barrel medic) legumes may have experienced chromosome number reductions. Soybean tetraploidization (SST) might have produced 22 chromosomes, with a chromosome fusion resulting in 20 extant chromosomes. Within the indigoferoid clade, 3 legumes here have the same chromosome number (n = 11), but their chromosomes differ in composition (Fig. 6). At least 6 common bean chromosomes were largely preserved in other legumes (Fig. 6).

Evolutionary divergence and dating

We found that legume genes evolve at considerably divergent rates in different genomes. By estimating nucleotide substitutions at synonymous sites (Ks), we characterized divergence levels between colinear homoeologs from different legumes or within a legume. Recursive polyploidization events can be identified based on Ks distributions for duplicated genes, as ‘Ks peaks’ that deviate from a general decline in frequency with increasing Ks value. For example, the soybean duplicates form a distribution with three peaks reflecting three polyploidizations (SST, LCT, ECH) over time, although the peaks resulting from more ancient events can be difficult to discern. Ks distributions of inter-legume colinear homoeologs reflect both polyploidization events common to them and speciation events that differentiate them. The peak corresponding to their differentiation is often more prominent than the polyploidization-derived one(s) due to widespread gene losses following polyploidization(s). We adopted kernel function analysis to distinguish different components in Ks distributions (see Methods for details), and each Ks distribution was represented by a linear combination of multiple normal distributions, each corresponding to an ancestral event (polyploidization or speciation) (Supplemental Table S11).
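
Representing a Ks distribution as a linear combination of normal densities can be sketched as below. The weights, peak locations, and widths are hypothetical and are not the fitted values in Supplemental Table S11; the sketch only evaluates such a mixture and does not reproduce the kernel function analysis described in the Methods.

```python
import math


def normal_pdf(x, mean, sd):
    """Density of a normal distribution at x."""
    z = (x - mean) / sd
    return math.exp(-0.5 * z * z) / (sd * math.sqrt(2 * math.pi))


def mixture_density(x, components):
    """Linear combination of normal densities.

    `components` is a list of (weight, mean, sd); each triple stands for one
    ancestral event (a polyploidization or a speciation).
    """
    return sum(w * normal_pdf(x, m, s) for w, m, s in components)


# Hypothetical components, for illustration only.
components = [(0.5, 0.15, 0.05), (0.3, 0.60, 0.12), (0.2, 1.50, 0.30)]
grid = [i * 0.01 for i in range(0, 301)]
density = [mixture_density(x, components) for x in grid]
peak_x = grid[density.index(max(density))]
print(round(peak_x, 2))
```

In such a mixture, the component with the smallest spread and largest weight dominates the curve, which mirrors the observation that the most recent event often gives the most prominent peak.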

Both the LCT and ECH produced Ks peaks with divergent locations in different legumes (Fig. 7, Supplemental Table S11), revealing divergent gene evolutionary rates. Lotus japonicus has evolved the slowest and peanut the fastest (with a nearly 25% difference). Relative to soybean, gene sequences of other legumes have evolved 17%-24% faster (peanut: 23.9%; adzuki bean: 20.7%; mung bean: 19.4%; chickpea: 19.1%; barrel medic: 18.8%; common bean: 17.0%), or 3.9%-11.2% slower (pigeon pea: 3.9%; lotus: 11.2%) (Supplemental Table S11).

Such high divergence in evolutionary rates may jeopardize efforts to date evolutionary events and perform phylogenetic analysis, hindering understanding of legume biology and evolution. Using soybean as a reference, we performed a ‘correction’ to other legumes’ evolutionary rates, calibrating the LCT peaks in the other legumes’ Ks distributions to that in soybean (see Materials and Methods for details) (Fig. 7B-C and Supplemental Table S12). Supposing that the ECH occurred ~130 Mya (Jiao et al., 2012), we estimated that the LCT occurred ~59 Mya, that peanut (from the Dalbergioid tribe) split from the other legumes about 49.1 Mya, and that the Hologalegina (including barrel medic, lotus, and chickpea) and Millettioid tribes (including soybean, pigeon pea, mung bean, adzuki bean, and common bean) split 48.1 Mya.
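
The arithmetic behind such a correction and dating can be sketched under a simple assumption: Ks values of a species are rescaled by a single factor so that its LCT peak matches the soybean one, and event ages scale linearly with corrected Ks relative to the ECH. Both the proportional model and every peak value below are assumptions made for illustration; the procedure actually used is described in the paper's Materials and Methods.

```python
ECH_AGE_MYA = 130.0  # age assumed in the text (Jiao et al., 2012)


def correction_factor(lct_peak_reference, lct_peak_species):
    """Scale factor mapping a species' Ks values onto the reference
    (soybean) rate, so that its LCT peak coincides with the reference one."""
    return lct_peak_reference / lct_peak_species


def date_event(ks_event, ks_ech, ech_age=ECH_AGE_MYA):
    """Date an event by proportional scaling against the ECH peak, assuming
    a constant substitution rate after correction."""
    return ech_age * ks_event / ks_ech


# Hypothetical peak locations, chosen only to illustrate the arithmetic.
soybean_lct, species_lct = 0.60, 0.72
factor = correction_factor(soybean_lct, species_lct)
corrected_peak = 0.55 * factor  # a species-side Ks peak after rescaling
example_age = date_event(0.50, 1.30)
print(round(factor, 3), round(corrected_peak, 3), round(example_age, 1))
```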

Inference of ancestral genome content

By using information on event-related colinearity, we inferred gene content at the major evolutionary nodes of legumes (Fig. 8). Two colinear orthologs from different genomes show that the most recent common ancestor had a single ancestral gene at the corresponding location in its genome, whereas two colinear (out)paralogous genes produced by the same polyploidization would derive from an ancestral gene in the paleogenome before the event. Therefore, by referring to the event-related colinear gene table (Table 1), it was straightforward to infer the ancestral gene content at any evolutionary node during the evolution and divergence of these legumes. For example, the most recent common ancestors had at least 22,177 genes for soybean and common bean; 18,935 genes for the two peanut genomes; and 28,900 genes for all legumes after the LCT. After the ECH, there were at least 11,672 genes in the eudicot common ancestor.
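The counting rule described above can be sketched as follows, assuming the colinear gene table is available as rows in which each column holds the colinear gene of one genomic copy (or nothing if the gene is missing). The column names and gene identifiers are hypothetical, and the function gives a lower bound because ancestral genes lost from every descendant copy cannot be counted.

```python
def ancestral_gene_count(rows, lineage_a, lineage_b):
    """Count rows with a colinear gene in at least one column of each lineage.

    Each such row implies one gene in the most recent common ancestor.
    """
    count = 0
    for row in rows:
        has_a = any(row.get(col) for col in lineage_a)
        has_b = any(row.get(col) for col in lineage_b)
        if has_a and has_b:
            count += 1
    return count

# Hypothetical rows of a colinear gene table (None marks a missing gene).
rows = [
    {"soybean_1": "Gm_a1", "soybean_2": "Gm_a2", "common_bean": "Pv_a"},
    {"soybean_1": None,    "soybean_2": "Gm_b2", "common_bean": "Pv_b"},
    {"soybean_1": "Gm_c1", "soybean_2": None,    "common_bean": None},
]
print(ancestral_gene_count(rows, ["soybean_1", "soybean_2"], ["common_bean"]))
```

The third row is not counted because the gene has no colinear ortholog in the second lineage, so colinearity alone cannot place it in the common ancestor.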

GO analysis

By counting genes still in colinearity, we explored how each polyploidization event contributed to copy number variation for genes with different functions. Characterizing Gene Ontology functions showed that each event increased copy numbers for genes in all functional categories, but by divergent increments (Supplemental Fig. S6), and that different events made divergent contributions to the enhancement of functions. After the SST, genes related to macromolecular complexes, membrane function and organelle function (classified by cellular component), and metallochaperone, molecular regulator and structural activities (classified by molecular function) were significantly retained. The most significantly preserved genes were related to macromolecular complexes, accounting for up to 9.24% of the SST alpha duplicates but only 6.15% of all genes in the genome (Fisher’s exact test P-value = 6.75 × 10^-35) (Supplemental Table S13).
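An enrichment comparison of this kind can be illustrated with a one-sided Fisher's exact test on a 2 × 2 table. The sketch below is a minimal standard-library version; the gene counts are hypothetical, and the exact table layout and sidedness used in the study are not stated in the text, so both are assumptions here.

```python
from math import comb

def fisher_exact_greater(a, b, c, d):
    """One-sided Fisher's exact test for the table [[a, b], [c, d]].

    Returns P(X >= a) under the hypergeometric null, where X is the
    count in the top-left cell given fixed row and column totals.
    """
    row1 = a + b          # e.g. retained duplicates
    col1 = a + c          # e.g. genes annotated with the GO term
    n = a + b + c + d
    denom = comb(n, col1)
    p = 0.0
    for k in range(a, min(row1, col1) + 1):
        p += comb(row1, k) * comb(n - row1, col1 - k) / denom
    return p

# Hypothetical counts: rows are duplicates retained / other genes,
# columns are genes with / without the GO term of interest.
print(fisher_exact_greater(92, 908, 580, 9420))
```

A small P-value from this test indicates that the GO term is over-represented among the retained duplicates relative to the remaining genes.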

In contrast, the genes least increased by the SST were related to catalytic activities (P-value = 1.04 × 10^-70), and nearly all genes relating to biological process were not increased, with the exception of those relating to localization.

By checking the barrel medic genome, we can evaluate which genes are likely to have been removed from soybean after the SST. These genes are still in the barrel medic genome but have no corresponding copies at the expected locations in soybean, which could be a result of post-polyploidy instability (Supplemental Fig. S7). Genes in metabolic processes (P-value = 5.08 × 10^-8), catalytic activity (P-value = 3.7 × 10^-12), and molecular binding (P-value = 4.5 × 10^-4) were frequently not deleted or transposed. Comparatively, genes related to biological regulation (P-value = 8.6 × 10^-10), membrane part (P-value = 4.24 × 10^-5), and nucleic acid binding transcription factors (P-value = 2.6 × 10^-6) were frequently deleted or transposed (Supplemental Table S14).

Nodulation and oil synthesis

A topic of singular importance to legume biology is whether recursive polyploidizations have contributed to the evolution of key traits such as nodulation, which is associated with the symbiotic nitrogen fixation that is a distinguishing feature of legumes. Legumes have divergent numbers of nodulation-related genes (Supplemental Table S15). Using the reported soybean nodulation genes as seeds (Schmutz et al., 2010), we detected their homologs in all legumes at BLASTP E-value < 1e-10 and score > 150 (Supplemental Table S15). Soybean has the most nodulation-related genes (1,702), comprising 4 families of 50 or fewer genes and 3 families of more than 200 genes (Supplemental Table S16). We wanted to know whether the recursive polyploidizations had contributed to their expansion. Because large gene families are excluded from inferences of colinearity (see Methods) and are therefore under-represented in the colinear gene table, we plotted the distribution of the nodulation-related genes across the whole genome, together with the colinear genes related to each polyploidization (Fig. 9). Notably, in soybean we found that 78%, 74%, and 66% of nodulation-related genes could be located in paralogous chromosomal regions related to the three polyploidization events (SST, LCT, and ECH), respectively. Genes involved in younger polyploidization(s) could also be involved in older events if they have a paralogous copy produced by the latter. Nonetheless, these findings showed that polyploidizations may have contributed to the increase of nodulation-related gene copy numbers, with increases of 73 in the soybean-specific tetraploidization (Fig. 9), 284 in the ECH (based on barrel medic) and 852 related to the LCT. Similar findings have been observed in the other legumes.
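The homolog detection thresholds quoted above (E-value < 1e-10 and score > 150) can be applied to BLAST output with a short filter such as the sketch below. It assumes the hits are in the default 12-column tabular format of BLAST+ (`-outfmt 6`), with the E-value in column 11 and the bit score in column 12; the text does not state which output format was used or whether "score" means bit score, so both are assumptions. The gene identifiers in the example are hypothetical.

```python
def filter_blast_hits(lines, max_evalue=1e-10, min_score=150.0):
    """Keep tabular BLAST hits with E-value < max_evalue and bit score > min_score."""
    hits = []
    for line in lines:
        line = line.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comment lines
        fields = line.split("\t")
        query, subject = fields[0], fields[1]
        evalue, bitscore = float(fields[10]), float(fields[11])
        if evalue < max_evalue and bitscore > min_score:
            hits.append((query, subject, evalue, bitscore))
    return hits

# Hypothetical hits: seed gene, candidate homolog, then the standard columns.
example = [
    "seedA\tgene1\t85.0\t300\t45\t0\t1\t300\t1\t300\t1e-120\t420",
    "seedA\tgene2\t40.0\t120\t72\t3\t5\t124\t8\t127\t1e-12\t95.5",
    "seedA\tgene3\t35.0\t90\t58\t2\t1\t90\t4\t93\t2e-4\t160",
]
for hit in filter_blast_hits(example):
    print(hit)
```

Only the first hit passes both thresholds: the second fails on score and the third on E-value.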


While new genes can be produced by tandem duplications and transposon activities, these events produced fewer genes than polyploidization. At Ks < 0.15, a time after or overlapping the SST event, we found more than 13 genes residing in duplicated regions of soybean chromosomes 4 and 5; 7 and 8; and 11 and 12 that were clearly produced by the SST. We also found young tandem gene clusters on chromosomes 16, 14, 9 and others, and young transposed genes on many other chromosomes (Fig. 9). One tandem cluster on chromosome 16 contains more than 20 young duplicated genes, some with Ks ~ 0 and four pairs with Ks < 0.015 involving 6 genes (Glyma16g07010.1, Glyma16g07051.1, Glyma16g07031.1, Glyma16g07060.1, Glyma16g30695.1 and Glyma16g30911.1), indicating a hotspot of new gene production.

Then, we checked how polyploidizations affected the copy number variation of genes participating in the synthesis of the high concentrations of seed oils that are an important economic product of many legumes. Oil synthesis related (OSR) genes could be classified into 9 different functions: synthesis of fatty acids in plastids, synthesis and storage of oil, metabolism of acyl lipids in mitochondria, lipid signaling, fatty acid elongation and wax and cutin metabolism, synthesis of membrane lipids in the endomembrane system, degradation of storage lipids and straight fatty acids, and miscellaneous functions, as reported previously (Wang and Brendel, 2006; Schmutz et al., 2010). Each of these families has more than 50 genes in soybean (Supplemental Table S17). There are more than 850 OSR genes in the peanut genomes, and 1,528 in soybean (Supplemental Table S18). In peanut, 42% and 22% of OSR genes can be related to paralogous regions produced by the LCT and ECH events, respectively; in soybean, 65%, 58%, and 27% of OSR genes can be related to the SST, LCT, and ECH events, respectively (Supplemental Fig. S8). This shows that each of these polyploidizations may have expanded the OSR families, which also seems true in other legumes. As with nodulation genes, tandem duplications and transposon activities might also have contributed to expansion of the OSR families. At Ks < 0.15, more than 13 genes residing in duplicated regions of soybean chromosomes 4 and 6; 7 and 8; 11 and 12; and 14 and 17 were clearly produced by the SST. We also found young small tandem clusters

= 50% and Identity >= 60%).

This could have resulted from gene divergence and gene loss. Among the best-matched genes, about half share gene colinearity between genomes. We obtained similar findings with barrel medic as a reference. These findings suggest that gene movement, possibly involving transposons, may contribute to genomic fractionation. With ESTs, at coverage >= 30% and identity >= 90%, we found that at least 50% of genes had no EST support, which suggests that legume gene annotations need considerable improvement. The annotation of genes would affect the inference of gene colinearity, and therefore affect the characterization of gene losses and genomic fractionation. We will update our inferences based on the latest versions of annotated genes in the future.

Unbalanced evolutionary rates among legumes

Duplicated genes deriving from a shared duplication event provide a direct means to compare evolutionary rates among taxa. In grasses, duplicated genes produced by a grass-common tetraploidization show 8.5-48% divergence in evolutionary rates, with rice being the slowest (Wang et al., 2015). A phylogenetic analysis with mulberry genes and their orthologs from Rosales relatives showed that mulberry evolved much (even 3 times) faster than other Rosales (He et al., 2013).

Polyploidization itself may drive genes with duplicates to evolve faster, as duplicated genes may buffer mutations in one another, possibly resulting in neofunctionalization or subfunctionalization. For example, cotton genes affected by a decaploidization may have evolved 19% and 15% faster than orthologs in cacao that have not experienced duplication since the two taxa diverged (Wang et al., 2016). Further, genes from a duplicated pair of grass chromosomes affected by gene conversion evolved faster than those not affected by gene conversion (Wang et al., 2009).

Unexpectedly, duplicated genes in soybean, affected by the SST, did not necessarily evolve faster than those of other legumes. With soybean as a reference, genes in peanut, adzuki bean, mung bean, chickpea, barrel medic, and common bean evolved faster, and those in pigeon pea and lotus evolved slower, than soybean genes. This weakens the generalization that duplicated genes evolve faster than single-copy genes, perhaps pointing to the importance of other factors such as living in different environments for millions of years.

MATERIALS AND METHODS

Genomic materials

We downloaded genomic sequences and annotations from the respective website of each genome project; complete information can be found in Supplemental Table S29.

Inferring gene colinearity

With annotated genes as input, chromosomes within a genome or between different genomes were compared. First, using BLASTP (Altschul et al., 1990), protein sequences were searched against one another to find potentially homologous genes (E-value < 1e-5). This relatively relaxed E-value threshold may admit more-diverged homologous genes and help find anciently duplicated genes. Second, the gene homology information was used as input for the software ColinearScan (Wang et al., 2006) to locate homologous gene pairs in colinearity. The key parameter, the maximum gap, was set to 50 intervening genes, as adopted in previous genomics research (Wang et al., 2015; Wang et al., 2016). Large gene families with 30 or more copies in a genome were excluded from colinearity inference.
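The two parameters above (a maximum gap of 50 intervening genes and exclusion of families with 30 or more copies) can be illustrated with a toy chaining routine. This is a simplified greedy sketch for same-orientation pairs only, not the ColinearScan algorithm; the function names and the `min_pairs` threshold are hypothetical and introduced only for this example.

```python
def genes_in_small_families(family_size, limit=30):
    """Genes whose family has fewer than `limit` copies in the genome."""
    return {gene for gene, n in family_size.items() if n < limit}

def chain_colinear_pairs(pairs, max_gap=50, min_pairs=4):
    """Greedily chain homologous pairs given as (order_a, order_b) gene indices.

    A pair extends the current chain if it lies downstream on both
    chromosomes with at most `max_gap` intervening genes on each.
    Chains with fewer than `min_pairs` pairs are discarded.
    """
    chains, current = [], []
    for a, b in sorted(pairs):
        if current:
            last_a, last_b = current[-1]
            gap_a = a - last_a - 1   # intervening genes on chromosome A
            gap_b = b - last_b - 1   # intervening genes on chromosome B
            if 0 <= gap_a <= max_gap and 0 <= gap_b <= max_gap:
                current.append((a, b))
                continue
            if len(current) >= min_pairs:
                chains.append(current)
            current = []
        current.append((a, b))
    if len(current) >= min_pairs:
        chains.append(current)
    return chains

# Hypothetical gene-order indices of homologous pairs on two chromosomes.
pairs = [(1, 1), (2, 2), (5, 4), (60, 70), (61, 71)]
print(chain_colinear_pairs(pairs, max_gap=50, min_pairs=3))
```

In this example the pair (60, 70) is 54 genes downstream of the previous pair on the first chromosome, so it exceeds the gap limit and starts a new chain that is too short to be kept.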

Inferring genomic homology

To infer chromosomal homology in legumes, we used the grape genome as an outgroup reference, which provides information on chromosome homology transitively. The grape genome preserves much of the ancestral genome structure before and after the ECH that was common to most eudicot plants (Bowers et al., 2003; Jaillon et al., 2007), much better than other sequenced eudicot genomes, which are often affected by further polyploidization(s). The grape genome was important for distinguishing paralogous blocks within legume genomes that were produced by the ECH event from those that were not. Due to the ECH, any one grape genomic region often has 2 paralogous regions within grape itself, and more in legume genomes. Dotplots of genomic homology between genomes, produced by our custom software, were used to help distinguish orthologous and outparalogous regions between different genomes.

We produced dotplots between grape and the legumes. As an example, we show how the grape-barrel medic homology dotplot helps to understand barrel medic genome structure. The 19 chromosomes of grape were denoted with blocks in 7 colors, corresponding to the 7 ancestral eudicot chromosomes before the ECH. Due to the ECH and the legume-specific LCT, we anticipated that a grape region would have 2 orthologous barrel medic regions, which are paralogous to one another, and 4 outparalogous regions (Supplemental Fig. S1). In the grape-barrel medic dotplot, orthologous and outparalogous blocks can be inferred without much difficulty. A grape chromosomal region is often much more similar, as measured by colinear gene number, to its barrel medic orthologous regions than to the outparalogous regions. Some outparalogous blocks have few homologous gene dots and can only be inferred transitively, by using paralogy between grape chromosomes (Supplemental Fig. S11, and detailed in the Supplemental Text). Ideally, a grape chromosome would have 2 orthologous corresponding regions. However, these are often broken into pieces by chromosomal rearrangement. A complementary pattern of broken segments helps to infer that they derive from the same ancestral chromosome.
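The expected numbers of orthologous and outparalogous regions follow from multiplying the copy numbers created by each event. The sketch below works through that arithmetic under the idealized assumption that no duplicated region has been lost; the function names are hypothetical. The same multiplication reproduces the 69 circles mentioned in the Fig. 2 legend.

```python
def expected_regions(ech_copies=3, lct_copies=2, sst_copies=1):
    """Expected (orthologous, outparalogous) legume regions per grape region.

    Orthologs descend from the same ECH subgenome copy as the grape region;
    outparalogs descend from the other ECH copies.
    """
    orthologous = lct_copies * sst_copies
    outparalogous = (ech_copies - 1) * lct_copies * sst_copies
    return orthologous, outparalogous

def alignment_circles(non_soybean_legumes=9, ech_copies=3):
    """Circles in a grape-referenced alignment: grape, other legumes, soybean."""
    grape = ech_copies
    others = non_soybean_legumes * ech_copies * 2   # LCT doubles each copy
    soybean = ech_copies * 2 * 2                    # LCT and SST both double
    return grape + others + soybean

print(expected_regions())                 # barrel medic versus grape
print(expected_regions(sst_copies=2))     # soybean versus grape
print(alignment_circles())
```

For barrel medic this gives 2 orthologous and 4 outparalogous regions per grape region, as anticipated in the text; for soybean, with the additional SST, the same reasoning gives 4 and 8.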

The above strategy was also applied to comparative analysis between grape and the other legumes. To infer intragenomic homology in soybean after its specific SST, we used the barrel medic genome as a reference.

Supplemental Methods

Details about inferring genomic colinearity, estimating nucleotide substitution, evolutionary dating, modeling gene loss, inferring Gene Ontology, and polyploidization and NBS-LRR genes can be found in the Supplemental Text.


SUPPLEMENTAL DATA

The following supplemental materials are available.

Supplemental Text. Details about inferring genomic colinearity, estimating nucleotide substitution, evolutionary dating, modeling gene loss, and inferring Gene Ontology.

Supplemental Figure S1. Homologous dotplot between Vitis vinifera and Medicago truncatula genomes.

Supplemental Figure S2. Homologous dotplot between Vitis vinifera and Arachis duranensis genomes.

Supplemental Figure S3. Homologous dotplot between Vitis vinifera and Arachis ipaensis genomes.

Supplemental Figure S4. Homologous dotplot between Medicago (Medicago truncatula) and soybean (Glycine max) genomes.

Supplemental Figure S5. Homologous alignments of 10 legume genomes with Medicago truncatula as reference.

Supplemental Figure S6. GO analysis distribution of Glycine max retention genes produced by ECH, LCT and SST.

Supplemental Figure S7. GO analysis distribution of Glycine max lost genes in ECH, LCT, SST and LCT-SST.

Supplemental Figure S8. Oil genes amplification model related to gene duplication events in soybean.

Supplemental Figure S9. NBS-class genes amplification model related to gene duplication events in soybean.

Supplemental Figure S10. NBS-domains genes amplification model related to gene duplication events in soybean.

Supplemental Figure S11. Homologous dotplot between grape and barrel medic chromosomes.

Supplemental Table S1. Number of homologous blocks and gene pairs within a genome or between genomes.

Supplemental Table S2. Number of homologous genes within a genome or between genomes.

Supplemental Table S3. Number of paralogous, orthologous and out-paralogous gene pairs within a genome or between genomes.

Supplemental Table S4. Number of paralogous, orthologous and out-paralogous genes within a genome or between genomes.

Supplemental Table S5. Number of paralogous, orthologous and out-paralogous blocks within a genome or between genomes.

Supplemental Table S6. Legume gene loss rates and gene translocation with grape as reference genome.

Supplemental Table S7. Legume gene loss and gene translocation rates with Medicago as reference genome.

Supplemental Table S8. Legume gene loss and gene translocation rates with common bean as reference genome.

Supplemental Table S9. The observed distribution of gene loss and translocation numbers fitted by using different density curves of the geometric distribution.

Supplemental Table S10. Gene retention in soybean duplicated chromosomes.

Supplemental Table S11. Kernel function analysis of Ks distribution related to duplication events within each genome and between selected legumes (before evolutionary rate correction).

Supplemental Table S12. Kernel function analysis of Ks distribution related to duplication events within each genome and between selected legumes (after evolutionary rate correction).

Supplemental Table S13. GO analysis distribution of Glycine max retention genes produced by ECH, LCT and SST.

Supplemental Table S14. GO analysis distribution of Glycine max lost genes in ECH, LCT, SST and LCT-SST.

Supplemental Table S15. Nodulation genes related to duplication events in each legume genome.

Supplemental Table S16. Nodulation 7 subfamily genes related to duplication events in soybean genome.

Supplemental Table S17. Oil 9 subfamily genes related to duplication events in soybean genome.

Supplemental Table S18. Oil genes related to duplication events in each legume genome.

Supplemental Table S19. NBS-CC genes related to duplication events in each legume genome.

Supplemental Table S20. NBS-TIR genes related to duplication events in each legume genome.

Supplemental Table S21. NBS-TNL genes related to duplication events in each legume genome.

Supplemental Table S22. NBS-TNx genes related to duplication events in each legume genome.

Supplemental Table S23. NBS-xNL genes related to duplication events in each genome.

Supplemental Table S24. NBS-xNx genes related to duplication events in each genome.

Supplemental Table S25. Bidirectional BLAST searched against all annotated genes between grape and legume.

Supplemental Table S26. Bidirectional BLAST searched against all annotated genes between barrel medic and other legumes.

Supplemental Table S27. Barrel medic, soybean, and lotus genes against their respective EST sequences (alignment coverage ≥ 30%).

Supplemental Table S29. Information of original data material.

ACKNOWLEDGEMENTS

We thank Liming Zhou for helpful discussions about the manuscript.


TABLES

Table 1. Number of duplicated genes within legume genomes related to ECH, LCT and SST.

Species         ECH(a)-related       LCT(b)-related     SST(c)-related
V. vinifera     86/2,423/3,851(d)    ---                ---
M. truncatula   194/2,504/2,961      309/3,600/4,796    ---
C. arietinum    317/2,998/3,936      257/2,913/4,743    ---
A. duranensis   124/1,891/2,747      96/2,094/3,847     ---
A. ipaensis     115/1,861/2,697      106/2,205/3,928    ---
V. radiata      100/1,521/2,223      68/1,378/2,529     ---
V. angularis    25/447/611           53/939/1,482       ---
P. vulgaris     126/2,579/3,440      109/3,043/4,853    ---
L. japonicus    63/1,185/1,710       97/2,082/3,116     ---
C. cajan        26/341/588           30/464/869         ---
G. max          344/3,663/2,575      343/8,317/9,486    133/10,312/19,210

(a) Core eudicot-common hexaploid; (b) Legume-common tetraploid; (c) Soybean-specific tetraploid; (d) blocks/gene pairs/gene number.

FIGURE LEGENDS

Figure 1. Species and gene phylogenetic tree. A, Phylogenetic tree of G. max (G), A. duranensis (A), A. ipaensis (B), M. truncatula (M), P. vulgaris (P), L. japonicus (L), C. arietinum (E), C. cajan (C), V. angularis (U), V. radiata (R), and V. vinifera (V). The eudicot-common hexaploidy (ECH) is denoted by a blue hexagon, the legume-common tetraploidy (LCT) by a red square, and the soybean-specific tetraploidy (SST) by a yellow square. B, Gene phylogenetic tree: three paralogous genes in the V. vinifera genome, V1, V2 and V3, produced by the ECH, each have two orthologs in non-soybean legume genomes and four orthologs in soybean.

Figure 2. Homologous alignments of legume genomes with V. vinifera as a reference. Genomic paralogy, orthology, and outparalogy information within and among 10 legumes, with the same name abbreviations as in Fig. 1, is displayed in 69 circles, each corresponding to an extant gene in Fig. 1B. The curved lines within the inner circle are formed by 19 grape chromosomes color-coded to correspond to the 7 ancestral chromosomes before the ECH. The short lines forming the innermost grape chromosome circles represent predicted genes, which have 2 sets of paralogous regions, forming another two circles. Each of the three sets of grape paralogous chromosomal regions has 2 orthologous copies in a legume, with the exception of soybean, which has 4. The resulting 69 circles are marked according to species by a capital letter, as defined in Fig. 1. Each circle has an underline colored according to its source plant, corresponding to the color scheme in Fig. 1A, and each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme.

Figure 3. Local alignment in selected genomes: grape, barrel medic, and soybean. The graph shows details of a short segment of alignment marked out by a triangle in Fig. 2. Homologous block phylogeny (left): three paralogous chromosome segments in the grape genome, Grape-14, Grape-05 and Grape-07, from ancestral chromosomes affected by the ECH, each with two orthologous barrel medic and four soybean chromosome segments. Chromosome numbers are shown after the names of plants, and locations on chromosomes are also shown. A gene is shown by a rectangle with a small arrow indicating its transcriptional direction. Homologous genes between neighboring chromosomal regions are linked with lines.

Figure 4. Fitting a geometric distribution and gene loss rates. G. max compared with the V. vinifera (A), M. truncatula (B) and P. vulgaris (C) genomes. The x-axis shows the number of consecutively missing genes in gene-colinearity regions.

Figure 5. Homologous alignments and G. max gene retention along corresponding orthologous M. truncatula chromosomes. Genomic paralogy and orthology information within and among genomes is displayed in 3 circles. The short lines forming the innermost M. truncatula chromosome circles represent predicted genes. Each of the M. truncatula paralogous chromosomal regions has 2 orthologous copies in soybean. Each circle is formed by short vertical lines that denote homologous genes, colored according to chromosome number in their respective source plant as shown in the inset color scheme. (A) Rates of retained genes in sliding windows of soybean homoeologous region group 1 (red) and homoeologous region group 2 (black). (B) The difference between the two groups (blue).

Figure 6. Chromosome representation by using the 7 eudicot ancestral chromosomes and those of P. vulgaris. First, each chromosome from the grape and legume genomes is represented by genes colinear to grape. Genes are denoted by short lines in 7 different colors related to ancestral chromosomes before the ECH. Second, with the exception of P. vulgaris, chromosomes from the other 10 legumes are represented by genes having P. vulgaris colinear genes, and these colinear genes in each plant are colored according to the P. vulgaris chromosomes where their orthologs reside. Thus, a chromosome in the legume genomes is displayed in two sets of short lines arranged side by side.

Figure 7. Dating evolutionary events within and among the legume genomes: soybean (G), peanut (A&B), barrel medic (M), common bean (P), lotus (L), chickpea (E), pigeon pea (C), adzuki bean (U), mung bean (R), and grape (V). A, Distribution of average synonymous substitution levels (Ks) between colinear gene pairs in inter-genomic (solid curves) and intra-genomic blocks (dashed curves). B, Distribution of average synonymous substitution levels after correction to account for the evolutionary rate of soybean genes. C, Correction to the Ks distribution and occurrence of key evolutionary events.


Figure 8. Inferred ancestral gene numbers during the evolution of legumes.

Figure 9. Nodulation gene amplification model related to gene duplication events in soybean. Curved green lines within the inner circle link paralog pairs on the 20 soybean chromosomes produced by (A) the SST, (B) the LCT and (C) the ECH. Nodulation subfamily genes are displayed in colors: light salmon (subfamily 1), green (subfamily 2), grey (subfamily 3), yellow (subfamily 4), black (subfamily 5), blue (subfamily 6) and red (subfamily 7). Colored curved lines link nodulation gene pairs with Ks < 0.15.

LITERATURE CITED

Altschul SF, Gish W, Miller W, Myers EW, Lipman DJ (1990) Basic local alignment search tool. J Mol Biol 215: 403-410

Barker MS, Husband BC, Pires JC (2016) Spreading Winge and flying high: The evolutionary importance of polyploidy after a century of study. Am J Bot 103: 1139-1145

Bertioli DJ, Cannon SB, Froenicke L, Huang G, Farmer AD, Cannon EK, Liu X, Gao D, Clevenger J, Dash S, Ren L, Moretzsohn MC, Shirasawa K, Huang W, Vidigal B, Abernathy B, Chu Y, Niederhuth CE, Umale P, Araujo AC, Kozik A, Kim KD, Burow MD, Varshney RK, Wang X, Zhang X, Barkley N, Guimaraes PM, Isobe S, Guo B, Liao B, Stalker HT, Schmitz RJ, Scheffler BE, Leal-Bertioli SC, Xun X, Jackson SA, Michelmore R, Ozias-Akins P (2016) The genome sequences of Arachis duranensis and Arachis ipaensis, the diploid ancestors of cultivated peanut. Nat Genet 48: 438-446

Bowers JE, Arias MA, Asher R, Avise JA, Ball RT, Brewer GA, Buss RW, Chen AH, Edwards TM, Estill JC, Exum HE, Goff VH, Herrick KL, Steele CL, Karunakaran S, Lafayette GK, Lemke C, Marler BS, Masters SL, McMillan JM, Nelson LK, Newsome GA, Nwakanma CC, Odeh RN, Phelps CA, Rarick EA, Rogers CJ, Ryan SP, Slaughter KA, Soderlund CA, Tang H, Wing RA, Paterson AH (2005) Comparative physical mapping links conservation of microsynteny to chromosome structure and recombination in grasses. Proc Natl Acad Sci U S A 102: 13206-13211

Bowers JE, Chapman BA, Rong J, Paterson AH (2003) Unravelling angiosperm genome evolution by phylogenetic analysis of chromosomal duplication events. Nature 422: 433-438

Chalhoub B, Denoeud F, Liu S, Parkin IA, Tang H, Wang X, Chiquet J, Belcram H, Tong C, Samans B, Correa M, Da Silva C, Just J, Falentin C, Koh CS, Le Clainche I, Bernard M, Bento P, Noel B, Labadie K, Alberti A, Charles M, Arnaud D, Guo H, Daviaud C, Alamery S, Jabbari K, Zhao M, Edger PP, Chelaifa H, Tack D, Lassalle G, Mestiri I, Schnel N, Le Paslier MC, Fan G, Renault V, Bayer PE, Golicz AA, Manoli S, Lee TH, Thi VH, Chalabi S, Hu Q, Fan C, Tollenaere R, Lu Y, Battail C, Shen J, Sidebottom CH, Wang X, Canaguier A, Chauveau A, Berard A, Deniot G, Guan M, Liu Z, Sun F, Lim YP, Lyons E, Town CD, Bancroft I, Wang X, Meng J, Ma J, Pires JC, King GJ, Brunel D, Delourme R, Renard M, Aury JM, Adams KL, Batley J, Snowdon RJ, Tost J, Edwards D, Zhou Y, Hua W, Sharpe AG, Paterson AH, Guan C, Wincker P (2014) Plant genetics. Early allopolyploid evolution in the post-Neolithic Brassica napus oilseed genome. Science 345: 950-953

Chen X, Li H, Pandey MK, Yang Q, Wang X, Garg V, Li H, Chi X, Doddamani D, Hong Y, Upadhyaya H, Guo H, Khan AW, Zhu F, Zhang X, Pan L, Pierce GJ, Zhou G, Krishnamohan KA, Chen M, Zhong N, Agarwal G, Li S, Chitikineni A, Zhang GQ, Sharma S, Chen N, Liu H, Janila P, Li S, Wang M, Wang T, Sun J, Li X, Li C, Wang M, Yu L, Wen S, Singh S, Yang Z, Zhao J, Zhang C, Yu Y, Bi J, Zhang X, Liu ZJ, Paterson AH, Wang S, Liang X, Varshney RK, Yu S (2016) Draft genome of the peanut A-genome progenitor (Arachis duranensis) provides insights into geocarpy, oil biosynthesis, and allergens. Proc Natl Acad Sci U S A 113: 6785-6790

Doyle JJ (2011) Phylogenetic Perspectives on the Origins of Nodulation. Mol Plant Microbe In 24: 1289-1295

Feldman M, Levy AA, Fahima T, Korol A (2012) Genomic asymmetry in allopolyploid plants: wheat as a model. J Exp Bot 63: 5045-5059

Frohlich MW, Chase MW (2007) After a dozen years of progress the origin of angiosperms is still a great mystery. Nature 450: 1184-1189

Garsmeur O, Schnable JC, Almeida A, Jourda C, D'Hont A, Freeling M (2014) Two evolutionarily distinct classes of paleopolyploidy. Mol Biol Evol 31: 448-454

Goebel K (1969) Organography of plants; especially of the Archegoniatae and Spermaphyta, Ed Authorized English. Hafner Pub. Co., New York

He N, Zhang C, Qi X, Zhao S, Tao Y, Yang G, Lee TH, Wang X, Cai Q, Li D, Lu M, Liao S, Luo G, He R, Tan X, Xu Y, Li T, Zhao A, Jia L, Fu Q, Zeng Q, Gao C, Ma B, Liang J, Wang X, Shang J, Song P, Wu H, Fan L, Wang Q, Shuai Q, Zhu J, Wei C, Zhu-Salzman K, Jin D, Wang J, Liu T, Yu M, Tang C, Wang Z, Dai F, Chen J, Liu Y, Zhao S, Lin T, Zhang S, Wang J, Wang J, Yang H, Yang G, Wang J, Paterson AH, Xia Q, Ji D, Xiang Z (2013) Draft genome sequence of the mulberry tree Morus notabilis. Nat Commun 4: 2445

International-Wheat-Genome-Sequencing-Consortium (2014) A chromosomebased draft sequence of the hexaploid bread wheat (Triticum aestivum) genome. Science 345: 1251788 Jaillon O, Aury JM, Noel B, Policriti A, Clepet C, Casagrande A, Choisne N, Aubourg S, Vitulo N, Jubin C, Vezzi A, Legeai F, Hugueney P, Dasilva C, Horner D, Mica E, Jublot D, Poulain J, Bruyere C, Billault A, Segurens B, Gouyvenoux M, Ugarte E, Cattonaro F, Anthouard V, Vico V, Del Fabbro C, Alaux M, Di Gaspero G, Dumas V, Felice N, Paillard S, Juman I, Moroldo M, Scalabrin S, Canaguier A, Le Clainche I, Malacrida G, Durand E, Pesole G, Laucou V, Chatelet P, Merdinoglu D, Delledonne M, Pezzotti M, Lecharny A, Scarpelli C, Artiguenave F, Pe ME, Valle G, Morgante M, Caboche M, Adam-Blondon AF, Weissenbach J, Quetier F, Wincker P, French-Italian Public Consortium for Grapevine Genome C (2007) The grapevine genome sequence suggests ancestral hexaploidization in major angiosperm phyla. Nature 449: 463-467 Jannoo N, Grivet L, David J, D'Hont A, Glaszmann JC (2004) Differential chromosome pairing affinities at meiosis in polyploid sugarcane revealed by molecular markers. Heredity 93: 460-467 Jiao Y, Leebens-Mack J, Ayyampalayam S, Bowers JE, McKain MR, McNeal J, Rolf M, Ruzicka DR, Wafula E, Wickett NJ, Wu X, Zhang Y, Wang J, Zhang Y, Carpenter EJ, Deyholos MK, Kutchan TM, Chanderbali AS, Soltis PS, Stevenson DW, McCombie R, Pires JC, Wong GK, Soltis DE, Depamphilis CW (2012) A genome triplication associated with early diversification of the core eudicots. Genome Biol 13: R3 Jiao Y, Wickett NJ, Ayyampalayam S, Chanderbali AS, Landherr L, Ralph PE, Tomsho LP, Hu Y, Liang H, Soltis PS, Soltis DE, Clifton SW, Schlarbaum SE, Schuster SC, Ma H, Leebens-Mack J, dePamphilis CW (2011) Ancestral polyploidy in seed plants and angiosperms. 
Nature 473: 97-100 Kang YJ, Kim SK, Kim MY, Lestari P, Kim KH, Ha BK, Jun TH, Hwang WJ, Lee T, Lee J, Shim S, Yoon MY, Jang YE, Han KS, Taeprayoon P, Yoon N, Somta P, Tanya P, Kim KS, Gwag JG, Moon JK, Lee YH, Park BS, Bombarely A, Doyle JJ, Jackson SA, Schafleitner R, Srinives P, Varshney RK, Lee SH (2014) Genome sequence of mungbean and insights into evolution within Vigna species. Nat Commun 5: 5443 Kang YJ, Satyawan D, Shim S, Lee T, Lee J, Hwang WJ, Kim SK, Lestari P, Laosatit K, Kim KH, Ha TJ, Chitikineni A, Kim MY, Ko JM, Gwag JG, Moon JK, Lee YH, Park BS, Varshney RK, Lee SH (2015) Draft genome sequence of adzuki bean, Vigna angularis. Sci Rep 5: 8069 Kashkush K, Feldman M, Levy AA (2002) Gene loss, silencing and activation in a newly synthesized wheat allotetraploid. Genetics 160: 1651-1659 Kellogg EA (2016) Has the connection between polyploidy and diversification actually been tested? Curr Opin Plant Biol 30: 25-32 Lin Y, Cheng Y, Jin J, Jin X, Jiang H, Yan H, Cheng B (2014) Genome duplication and gene loss affect the evolution of heat shock transcription factor genes in legumes. PLoS One 9: e102825

Magallon S, Sanderson MJ (2001) Absolute diversification rates in angiosperm clades. Evolution 55: 1762-1780 Paterson AH, Bowers JE, Bruggmann R, Dubchak I, Grimwood J, Gundlach H, Haberer G, Hellsten U, Mitros T, Poliakov A, Schmutz J, Spannagl M, Tang H, Wang X, Wicker T, Bharti AK, Chapman J, Feltus FA, Gowik U, Grigoriev IV, Lyons E, Maher CA, Martis M, Narechania A, Otillar RP, Penning BW, Salamov AA, Wang Y, Zhang L, Carpita NC, Freeling M, Gingle AR, Hash CT, Keller B, Klein P, Kresovich S, McCann MC, Ming R, Peterson DG, Mehboob ur R, Ware D, Westhoff P, Mayer KF, Messing J, Rokhsar DS (2009) The Sorghum bicolor genome and the diversification of grasses. Nature 457: 551-556 Paterson AH, Bowers JE, Chapman BA (2004) Ancient polyploidization predating divergence of the cereals, and its consequences for comparative genomics. Proc. Natl. Acad. Sci. U.S.A 101: 9903-9908 Sato S, Nakamura Y, Kaneko T, Asamizu E, Kato T, Nakao M, Sasamoto S, Watanabe A, Ono A, Kawashima K, Fujishiro T, Katoh M, Kohara M, Kishida Y, Minami C, Nakayama S, Nakazaki N, Shimizu Y, Shinpo S, Takahashi C, Wada T, Yamada M, Ohmido N, Hayashi M, Fukui K, Baba T, Nakamichi T, Mori H, Tabata S (2008) Genome structure of the legume, Lotus japonicus. DNA Res 15: 227-239 Schmutz J, Cannon SB, Schlueter J, Ma J, Mitros T, Nelson W, Hyten DL, Song Q, Thelen JJ, Cheng J, Xu D, Hellsten U, May GD, Yu Y, Sakurai T, Umezawa T, Bhattacharyya MK, Sandhu D, Valliyodan B, Lindquist E, Peto M, Grant D, Shu S, Goodstein D, Barry K, Futrell-Griggs M, Abernathy B, Du J, Tian Z, Zhu L, Gill N, Joshi T, Libault M, Sethuraman A, Zhang XC, Shinozaki K, Nguyen HT, Wing RA, Cregan P, Specht J, Grimwood J, Rokhsar D, Stacey G, Shoemaker RC, Jackson SA (2010) Genome sequence of the palaeopolyploid soybean. 
Nature 463: 178-183 Schmutz J, McClean PE, Mamidi S, Wu GA, Cannon SB, Grimwood J, Jenkins J, Shu S, Song Q, Chavarro C, Torres-Torres M, Geffroy V, Moghaddam SM, Gao D, Abernathy B, Barry K, Blair M, Brick MA, Chovatia M, Gepts P, Goodstein DM, Gonzales M, Hellsten U, Hyten DL, Jia G, Kelly JD, Kudrna D, Lee R, Richard MM, Miklas PN, Osorno JM, Rodrigues J, Thareau V, Urrea CA, Wang M, Yu Y, Zhang M, Wing RA, Cregan PB, Rokhsar DS, Jackson SA (2014) A reference genome for common bean and genome-wide analysis of dual domestications. Nat Genet 46: 707-713 Schnable JC, Springer NM, Freeling M (2011) Differentiation of the maize subgenomes by genome dominance and both ancient and ongoing gene loss. Proc Natl Acad Sci U S A 108: 4069-4074 Soltis DE, Bell CD, Kim S, Soltis PS (2008) Origin and early evolution of angiosperms. Ann N Y Acad Sci 1133: 3-25 Soltis DE, Visger CJ, Marchant DB, Soltis PS (2016) Polyploidy: Pitfalls and paths to a paradigm. Am J Bot 103: 1146-1166 Soltis DE, Visger CJ, Soltis PS (2014) The polyploidy revolution then...and now: Stebbins revisited. Am J Bot 101: 1057-1078

Soltis PS, Marchant DB, Van de Peer Y, Soltis DE (2015) Polyploidy and genome evolution in plants. Curr Opin Genet Dev 35: 119-125 Tang H, Krishnakumar V, Bidwell S, Rosen B, Chan A, Zhou S, Gentzbittel L, Childs KL, Yandell M, Gundlach H, Mayer KF, Schwartz DC, Town CD (2014) An improved genome release (version Mt4.0) for the model legume Medicago truncatula. BMC Genomics 15: 312 Throude M, Bolot S, Bosio M, Pont C, Sarda X, Quraishi UM, Bourgis F, Lessard P, Rogowsky P, Ghesquiere A, Murigneux A, Charmet G, Perez P, Salse J (2009) Structure and expression analysis of rice paleo duplications. Nucleic Acids Res 37: 1248-1259 Van de Peer Y (2011) A mystery unveiled. Genome Biol 12: 113 Varshney RK, Chen W, Li Y, Bharti AK, Saxena RK, Schlueter JA, Donoghue MT, Azam S, Fan G, Whaley AM, Farmer AD, Sheridan J, Iwata A, Tuteja R, Penmetsa RV, Wu W, Upadhyaya HD, Yang SP, Shah T, Saxena KB, Michael T, McCombie WR, Yang B, Zhang G, Yang H, Wang J, Spillane C, Cook DR, May GD, Xu X, Jackson SA (2012) Draft genome sequence of pigeonpea (Cajanus cajan), an orphan legume crop of resource-poor farmers. Nat Biotechnol 30: 83-89 Varshney RK, Song C, Saxena RK, Azam S, Yu S, Sharpe AG, Cannon S, Baek J, Rosen BD, Tar'an B, Millan T, Zhang X, Ramsay LD, Iwata A, Wang Y, Nelson W, Farmer AD, Gaur PM, Soderlund C, Penmetsa RV, Xu C, Bharti AK, He W, Winter P, Zhao S, Hane JK, Carrasquilla-Garcia N, Condie JA, Upadhyaya HD, Luo MC, Thudi M, Gowda CL, Singh NP, Lichtenzveig J, Gali KK, Rubio J, Nadarajan N, Dolezel J, Bansal KC, Xu X, Edwards D, Zhang G, Kahl G, Gil J, Singh KB, Datta SK, Jackson SA, Wang J, Cook DR (2013) Draft genome sequence of chickpea (Cicer arietinum) provides a resource for trait improvement. Nat Biotechnol 31: 240-246 Wang BB, Brendel V (2006) Genomewide comparative analysis of alternative splicing in plants. 
Proc Natl Acad Sci U S A 103: 7175-7180 Wang X, Guo H, Wang J, Lei T, Liu T, Wang Z, Li Y, Lee TH, Li J, Tang H, Jin D, Paterson AH (2016) Comparative genomic de-convolution of the cotton genome revealed a decaploid ancestor and widespread chromosomal fractionation. New Phytol 209: 1252-1263 Wang X, Shi X, Li Z, Zhu Q, Kong L, Tang W, Ge S, Luo J (2006) Statistical inference of chromosomal homology based on gene colinearity and applications to Arabidopsis and rice. BMC Bioinformatics 7: 447 Wang X, Tang H, Bowers JE, Paterson AH (2009) Comparative inference of illegitimate recombination between rice and sorghum duplicated genes produced by polyploidization. Genome Res 19: 1026-1032 Wang X, Wang J, Jin D, Guo H, Lee TH, Liu T, Paterson AH (2015) Genome Alignment Spanning Major Poaceae Lineages Reveals Heterogeneous Evolutionary Rates and Alters Inferred Dates for Key Evolutionary Events. Mol Plant 8: 885-898 Wang Y, Wang X, Lee TH, Mansoor S, Paterson AH (2013) Gene body methylation shows distinct patterns associated with different gene origins and

duplication modes and has a heterogeneous relationship with gene expression in Oryza sativa (rice). New Phytol 198: 274-283 Wang Y, Wang X, Tang H, Tan X, Ficklin SP, Feltus FA, Paterson AH (2011) Modes of gene duplication contribute differently to genetic novelty and redundancy, but show parallels across divergent angiosperms. PLoS One 6: e28150 Young ND, Debelle F, Oldroyd GE, Geurts R, Cannon SB, Udvardi MK, Benedito VA, Mayer KF, Gouzy J, Schoof H, Van de Peer Y, Proost S, Cook DR, Meyers BC, Spannagl M, Cheung F, De Mita S, Krishnakumar V, Gundlach H, Zhou S, Mudge J, Bharti AK, Murray JD, Naoumkina MA, Rosen B, Silverstein KA, Tang H, Rombauts S, Zhao PX, Zhou P, Barbe V, Bardou P, Bechner M, Bellec A, Berger A, Berges H, Bidwell S, Bisseling T, Choisne N, Couloux A, Denny R, Deshpande S, Dai X, Doyle JJ, Dudez AM, Farmer AD, Fouteau S, Franken C, Gibelin C, Gish J, Goldstein S, Gonzalez AJ, Green PJ, Hallab A, Hartog M, Hua A, Humphray SJ, Jeong DH, Jing Y, Jocker A, Kenton SM, Kim DJ, Klee K, Lai H, Lang C, Lin S, Macmil SL, Magdelenat G, Matthews L, McCorrison J, Monaghan EL, Mun JH, Najar FZ, Nicholson C, Noirot C, O'Bleness M, Paule CR, Poulain J, Prion F, Qin B, Qu C, Retzel EF, Riddle C, Sallet E, Samain S, Samson N, Sanders I, Saurat O, Scarpelli C, Schiex T, Segurens B, Severin AJ, Sherrier DJ, Shi R, Sims S, Singer SR, Sinharoy S, Sterck L, Viollet A, Wang BB, Wang K, Wang M, Wang X, Warfsmann J, Weissenbach J, White DD, White JD, Wiley GB, Wincker P, Xing Y, Yang L, Yao Z, Ying F, Zhai J, Zhou L, Zuber A, Denarie J, Dixon RA, May GD, Schwartz DC, Rogers J, Quetier F, Town CD, Roe BA (2011) The Medicago genome provides insight into the evolution of rhizobial symbioses. Nature 480: 520-524 Zhu H, Choi HK, Cook DR, Shoemaker RC (2005) Bridging model and crop legumes through comparative genomics. Plant Physiol 137: 1189-1196

Figure 1. Species and gene phylogenetic tree. A, Phylogenetic tree of G. max (G), A. duranensis (A), A. ipaensis (B), M. truncatula (M), P. vulgaris (P), L. japonicus (L), C. arietinum (E), C. cajan (C), V. angularis (U), V. radiata (R), and V. vinifera (V). The eudicot common hexaploidy (ECH) is denoted by a blue hexagon, the legume common tetraploidy (LCT) by a red square, and the soybean specific tetraploidy (SST) by a yellow square. B, Gene phylogenetic tree: three paralogous genes in the V. vinifera genome, V1, V2 and V3, produced by the ECH, each have two orthologs in non-soybean legume genomes and four orthologs in soybean.

Figure 2. Homologous alignments of 10 legume genomes with V. vinifera as a reference. Genomic paralogy, orthology, and outparalogy information within and among 10 legumes are displayed in 69 circles, each corresponding to an extant gene in Fig. 1B. The curved lines within the inner circle are formed by 19 grape chromosomes color coded to correspond to the 7 ancestral chromosomes before the ECH. The short lines forming the innermost grape chromosome circles represent predicted genes, which have 2 sets of paralogous regions, forming another two circles. Each of the three sets of grape paralogous chromosomal regions has 2 orthologous copies in a legume with the exception of soybean, which has 4. The resulting 69 circles were marked according to species by a capital letter, as defined in Fig. 1. Each circle has an underline colored as to its source plant corresponding to the color scheme in Fig. 1A, and each circle is formed by short vertical lines that denote homologous genes, colored as to chromosome number in their respective source plant as shown in the inset color scheme.

Core eudicot common hexaploidy (gamma, ~130 mya)

Legume common tetraploidy (beta, ~59 mya)

Soybean specific tetraploidy (alpha, ~13 mya)

Figure 3. Local alignment in selected genomes: grape, barrel medic, and soybean. The graph shows details of a short segment of alignment marked out by a triangle in Figure 2A. Homologous block phylogeny (left): three paralogous chromosome segments in the grape genome, Grape14, Grape05 and Grape07, from ancestral chromosomes affected by ECH, each with two orthologous medicago and four soybean chromosome segments. Chromosome numbers are shown after the names of plants, and locations on chromosomes are also shown. A gene is shown by a rectangle with a small arrow indicating its transcriptional direction. Homologous genes between neighboring chromosomal regions are linked with lines.

Figure 4. Fitting a geometric distribution and gene loss rates. G. max to the V. vinifera (A), M. truncatula (B) and P. vulgaris (C) genomes. The x axis denotes the number of continuously missing genes in gene colinearity regions.

Figure 5. Homologous alignments and G. max gene retention along corresponding orthologous M. truncatula chromosomes. Genomic paralogy and orthology information within and among genomes is displayed in 3 circles. The short lines forming the innermost M. truncatula chromosome circles represent predicted genes. Each of the M. truncatula paralogous chromosomal regions has 2 orthologous copies in soybean. Each circle is formed by short vertical lines that denote homologous genes, colored as to chromosome number in their respective source plant as shown in the inset color scheme. (A) Rates of retained genes in sliding windows of soybean homoeologous region group 1 (red) and homoeologous region group 2 (black). (B) The difference between the two groups (blue).

Figure 6. Chromosome representation by using the 7 eudicot ancestral chromosomes and those of P. vulgaris. Each chromosome from grape and the 10 legume genomes is first represented by genes colinear to grape. Genes are denoted by short lines in 7 different colors related to ancestral chromosomes before the ECH. Second, with the exception of P. vulgaris, chromosomes from the other 9 legumes are represented by genes having P. vulgaris colinear genes, and these colinear genes in each plant are colored as to the P. vulgaris chromosomes where their orthologs reside. Thus, a chromosome in the 9 legume genomes is displayed in two sets of short lines arranged side by side.

Figure 7. Dating evolutionary events within and among 9 legume genomes: soybean (G), peanut (A&B), barrel medic (M), common bean (P), lotus (L), chickpea (E), pigeonpea (C), adzuki bean (U), mungbean (R), and grape (V). A, Distribution of average synonymous substitution levels (Ks) between colinear gene pairs in intergenomic (solid curves) and intragenomic blocks (dashed curves). B, Distribution of average synonymous substitution levels after correction to account for the evolutionary rate of soybean genes. C, Correction to the Ks distribution and occurrence of key evolutionary events.

Figure 8. Inferred ancestral gene numbers during the evolution of legumes.

Figure 9. Nodulation gene amplification model related to gene duplication events in soybean. A, Curved lines within the inner circle, colored green, link paralog pairs on the 20 soybean chromosomes produced by SST, B, LCT and C, ECH. Nodulation subfamily genes are displayed in colors: light salmon (subfamily 1), green (subfamily 2), grey (subfamily 3), yellow (subfamily 4), black (subfamily 5), blue (subfamily 6) and red (subfamily 7). Colored curved lines link nodulation gene pairs with Ks