Sequencing of Panax notoginseng genome reveals

0 downloads 0 Views 10MB Size Report
Jul 4, 2018 - 6Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, China. 7Hong Kong ...... [Status and prospective on nutritional physiology and ... Gururani MA, Venkatesh J, Upadhyaya CP, Nookaraju A, Pandey SK, Park SW. ... The Arabidopsis book / American Society of Plant Biologists.
bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Sequencing of Panax notoginseng genome reveals genes involved in disease resistance and ginsenoside biosynthesis Guangyi Fan1,2,*, Yuanyuan Fu2,3,*, Binrui Yang1,*, Minghua Liu4,*, He Zhang2, Xinming Liang2, Chengcheng Shi2, Kailong Ma2, Jiahao Wang2, Weiqing Liu2, Libin Shao 2, Chen Huang1, Min Guo1, Jing Cai1, Andrew KC Wong5, Cheuk-Wing Li1, Dennis Zhuang5, Ke-Ji Chen6, Wei-Hong Cong6, Xiao Sun3, Wenbin Chen2, Xun Xu2, Stephen Kwok-Wing Tsui4,7,†, Xin Liu2,†, Simon Ming-Yuen Lee1,†. 1

State Key Laboratory of Quality Research in Chinese Medicine, Institute of Chinese Medical

Sciences, University of Macau, Macao, China. 2

BGI-Qingdao, BGI-Shenzhen, Qingdao 266000, China.

3

State Key Laboratory of Bioelectronics, School of Biological Sciences and Medical Engineering,

Southeast University, Nanjing 210096, China 4

School of Biomedical Sciences, The Chinese University of Hong Kong, Hong Kong, China

5

System Design Engineering, University of Waterloo, Ontario, Canada

6

Xiyuan Hospital, China Academy of Chinese Medical Sciences, Beijing, 100091, China

7

Hong Kong Bioinformatics Centre, The Chinese University of Hong Kong, Hong Kong, China

*These authors contributed equally to this work.

†Correspondence authors:

Simon Ming-Yuen Lee ([email protected]), Wenbin Chen

([email protected]) and Stephen Kwok-Wing Tsui ([email protected]).

Abstract Background: Panax notoginseng is a traditional Chinese herb with high medicinal and economic value. There has been considerable research on the pharmacological activities of ginsenosides contained in Panax spp.; however, very little is known about the ginsenoside biosynthetic pathway. Results: We reported the first de novo genome of 2.36 Gb of sequences from P. notoginseng with 35,451 protein-encoding genes. Compared to other plants, we found notable gene family contraction of disease-resistance genes in P. notoginseng, but

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

notable expansion for several ATP-binding cassette (ABC) transporter subfamilies, such as the Gpdr subfamily, indicating that ABCs might be an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, we identified several key genes, their transcription factor binding sites and all their family members involved in the synthesis pathway of ginsenosides in P. notoginseng, including dammarenediol synthase, CYP716 and UGT71. Conclusions: The complete genome analysis of P. notoginseng, the first in genus Panax, will serve as an important reference sequence for improving breeding and cultivation of this important nutraceutical and medicinal but vulnerable plant species. Keywords: Panax notoginseng, Ginsenosides biosynthesis, Genome, Transcriptome, Traditional Chinese herb.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Background Genus Panax in the Araliaceae, contains some medicinally and economically important ginseng species including P. ginseng, P. quinquefolius and P. notoginseng (Sanqi in Chinese)[1]. The United States, Canada, China and South Korea are the biggest producers of ginseng, with their total production of fresh ginseng being approximately 175.85 million pounds and constituting more than 99% of the world’s total ginseng production[2]. The world ginseng market is estimated to be worth $2,084 million[2]. Unlike P. ginseng and P. quinquefolius, which are widely distributed in several countries in the northern hemisphere, growth of P. notoginseng is mainly restricted to mountain areas with altitudes of 1200–2000 m around 23.5°N and 104°E in Wenshan Prefecture, Yunnan Province, in China[3]. In China, P. notoginseng has been cultivated for about 400 years[4] and the cultivated area was estimated at around 8300 ha in late 2005, harvesting 7.03 × 106 kg of fresh roots of P. notoginseng[5]. The stable P. notoginseng population is relatively genetically homogeneous compared with the other ginseng species. Panax notoginseng is susceptible to a wide range of pathogens and identification of the genes conferring disease resistance has been a major focus of research[6], which greatly reduces its economic benefits. The main class of disease-resistance genes (R-genes) consist of a NBS, a C-terminal LRR and a putative coiled coil domain (CC) at the N-terminus or a N-terminal domain with homology to the mammalian toll interleukin 1 receptor (TIR) domain[7]. Generally, the LRR domain is often involved in protein–protein interactions, with an important role in recognition specificity as well as ligand binding[8]. Besides, ATP-binding cassette (ABC) proteins have also been reported to play a crucial role in the regulation of resistance processes in plants[9], most of which modulate the activity of heterologous channels or have intrinsic channel activity. The reason underlying the poor resistance of P. notoginseng is still unclear. Saponin constituents, also known as ginsenosides, are the major active ingredients in genus Panax[10, 11] and have diverse pharmacological activities, such as

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

hepatoprotection, renoprotection, estrogen-like activities and protection against cerebro-cardiovascular ischemia and dyslipidaemia in experimental models[12-16]. Xuesaitong (in Chinese) is a prescription botanical drug, manufactured from saponins in the root of P. notoginseng[12], used for the prevention and treatment of cardiovascular diseases, and is among the top best-selling prescribed Chinese medicines in China[17, 18]. Compound Danshen Dripping Pill, a prescription botanical drug for cardiovascular disease[19], comprises P. notoginseng, Salvia miltiorrhiza and synthetic borneol, and has successfully completed Phase II and is undergoing Phase III clinical trials in the United States. Although different ginseng species have been one of the most important traded herbs for human health around the world, the industry is facing elevating challenges, such as increasing disease vulnerability of the cultivated plant and an as yet undetermined disease preventing continuous cultivation on the same land, leading to a trend of decreasing production[2]. Plant metabolites play very important roles in plant defense mechanisms against stress and disease, and some have important nutritional value for human consumption. For instance, capsaicinoids and caffeine present in pepper and coffee, respectively, have important health benefits. The recently completed whole-genome sequencing of coffee and pepper provide new clues in understanding evolution and transcriptional control of the biosynthesis pathway of these metabolites[20, 21]. Ginsenosides, the oligosaccharide glycosides of a series of dammarane- or oleanane-type triterpenoid glycosides[11], are a group of secondary metabolites of isoprenoidal natural products, specifically present in genus Panax. Based on the structure differentiation of the sapogenins (aglycones of saponin), dammarane-type ginsenosides can be classified into two subtypes: protopanaxadiol (PPD) and protopanaxatriol (PPT) types[11]. Up to now, over 100 structurally diversified ginsenosides have been isolated. Interestingly, the two types (PPD and PPT) of ginsenosides have been reported to exhibit opposing biological activities; for instance, pro-angiogenesis and anti-angiogenesis[22, 23]. The whole plant of P. notoginseng contains both PPD- and PPT-type ginsenosides but the aerial parts (e.g. leaf and flower) contain a higher abundance of PPD compared to

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

roots[24, 25]. Herein, we characterize the genome of P. notoginseng, the population of which has a relatively uniform genetic background. This was aimed to determine the genetic causes of weakened resistance of P. notoginseng as well as to understand evolution and regulation of the triterpenoid saponin biosynthesis pathway leading to the production of specialized ginsenosides in genus Panax.

Data descriptions Genomic DNA was isolated from leaf of P. notoginseng from Wenshan City of China and then six libraries were constructed with various insert sizes ranging from 170 bp to 10 kb (including 170bp, 500bp, 800bp, 2kb, 5kb and 10kb). In total, 377.44 Gb (~153-fold) of raw sequences were generated using the Illumina Hiseq 2000 platform and ~13 Gb (~6-fold, average read length of ~9kb) SMRT sequence data generated by Pacbio RSII sequencing system (Supplementary Table 1). To meet the requirement of further analysis, we obtained ~256.07 Gb data (~104.09-fold) by filtering the low quality (≤ 7) and adapter contaminated reads. Leaves and roots of P. notoginseng were collected from 1-, 2- and 3-year-old P. notoginseng and flowers collected from 2- and 3-year-old P. notoginseng. Total RNA was extracted from each part and then mRNA was isolated. An average of ~6.06 Gb raw data were obtained from each sample sequenced using Illumina HiSeq 2000 (Supplementary Table 2).

Analyses Genome assembly and annotation We de novo sequenced and assembled the P. notoginseng genome using ~256.07 Gb data (~104.09-fold) generated by Illumina HiSeq2000 and ~13 Gb (~6-fold, average read length of ~9kb) SMRT sequence data generated by Pacbio RSII sequencing system. The final genome assembly spanned about 2.36 Gb (~95.93% of the estimated genome size) with scaffold N50 of 72.37 kb and contig N50 of 16.42 kb (Supplementary Table 3, 4 and Supplementary Fig. 1). To evaluate our assembly, we firstly checked the links within scaffolds based on the pair-end relationships of sequencing reads and the result revealed that pair-end information supported the links

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

of scaffolds well (Supplementary Fig. 2). Secondly, we obtained the unmapped sequencing reads against our genome assembly and re-assembled them into ~359 Mb sequences which contained ~95.05% repetitive sequences (Supplementary Table 5). Using KAT sect tool[26], we also calculated the sequencing depth distribution of flanking sequences of gap regions, we observed that ~71% of sequences nearby junction regions have higher sequencing depth (Supplementary Fig. 3), indicating that majority of the gap regions were repetitive sequences. Finally, more than 95.42% of transcripts (coverage ratio >= 90%) could be unambiguously mapped into assembly sequences revealing the good integrity of genome assembly (Supplementary Table 6). Of the genome, 72.45% of repetitive sequences ratio was estimated by Jellyfish[27] but only 1.24 Gb (51.03%) was found to be repetitive sequences, which is more than that of carrot genome (~46%)[28], with long terminal repeats the most abundant (~95.08% of transposable elements) (Supplementary Table 7). We totally predicted 35,451 protein-encoding genes in the P. notoginseng genome by integrating the evidences of ab initio prediction, homologous

alignment and transcripts

(Supplementary Table 8). Assessing the P. notoginseng genome assembly and annotation completeness with single-copy orthologs approach (BUSCO)[29] showed that 895 (93%) complete single-copy genes containing 246 (25%) complete duplicated genes were validated, similar to that of cotton[30] and Dendrobium officinale[31]. In combination with the RNA-seq mapping results above, it was unambiguous that our gene set was available for further analysis. Genome evolution of Panax species Panax notoginseng is the first sequenced genome of genus Panax, thus we compared its genome to other related species to reveal genome evolution of Panax species. To analyze P. notoginseng evolution, we first constructed the phylogenetic tree using published closely-related species (Supplementary Fig. 4). We estimated the species divergence time between P. notoginseng and D. carota to be about 71.9 million years ago (Supplementary Fig. 4). Compared to closely related species (D. carota, S. tuberosum, S. lycopersicum and C. annuum), we found 1,144 unique gene families (containing 2,714 genes) with no orthologs in other species (Supplementary Fig. 5),

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

which were possibly related to specific features of P. notoginseng. Thus, we performed functional enrichment of these genes to find several interesting Gene Ontology

(GO)

terms

likely

related

to

saponin

biosynthesis,

such

as

UDP-glucosyltransferase (UGT) activity (P = 0.003, Supplementary Fig. 6). To determine features and specific functions in P. notoginseng, we identified a total of 1,520 expanded gene families (Supplementary Fig. 4) and further performed Kyoto Encyclopedia of Genes and Genomes (KEGG) and GO functional enrichment analyses for these genes. We found several significant gene families related to saponin biosynthesis, such as sesquiterpenoid and triterpenoid biosynthesis (P = 0.006, Supplementary Table 9). Resistance genes Most

cloned

R-genes

in

the

plant

encode

nucleotide-binding

site

and

leucine-rich-repeat (NBS-LRR) domains. In our P. notoginseng genome assembly, 129 NBS-LRR-encoding genes were detected, which was notably fewer than Arabidopsis thaliana (183), D. carota (170), S. tuberosum (441), S. lycopersicum (282) and C. annuum (778), and similar to that of C. sativus (89)[32] (Supplementary Table 10). Considering R genes were known to be small and in repetitive areas and our assembly with much gaps in the repetitive regions, the total number of R gene of P. notoginseng would be likely to underestimated. However, the number of genes in the TIR_NBA_LRR subfamily was still relatively expanded according to the phylogenetic analysis using P. notoginseng, tomato and carrot, which possibly be related to the resistance character of P. notoginseng (Fig. 1a and 1b). ABC transporter gene family We identified a total of 153 ATP-binding cassette (ABC) genes in P. notoginseng, which were classified into nine subfamilies (Supplementary Table 11 and Supplementary Fig. 7). Generally, compared with A. thaliana (12), Daucus carota (5), Solanum lycopersicum (10), S. tuberosum (13) and Capsicum annuum (10), there were notably fewer members of subfamily A in P. notoginseng (3); whereas, there were notably more of subfamily B in P. notoginseng (28, 28, 33, 38, 34 and 47 for subfamily B, respectively) (Supplementary Table 11). In detail, two genes were

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

notably contracted for subfamily A (Supplementary Fig. 8) – a subfamily reportedly involved in cellular lipid transport[33]. For subfamily B, we found a specific expanded clade in P. notoginseng containing 11 genes, three duplication expansions and one contraction (Supplementary Fig. 7 and Supplementary Fig. 9). In particular, duplication expansions were Pno037415.1 and Pno028859.1

(AT2G36910.1

homologs);

Pno009402.1

and

Pno037963.1

(AT3G28860.1 homologs); and Pno034123.1 and Pno018232.1 (AT5G39040.1 homologs). One contraction was Pno003332.1 (AT4G28620.1, AT4G28630.1 and AT5G58270.1 homolog). Interestingly, the expanded genes have been reported as involved in aluminum resistance and auxin transport in stems and roots, respectively[34, 35]. The contracted gene was found to be involved in iron export[36]. In the Gpdr subfamily, we also found a notable expansion (P. notoginseng: 10 vs. A. thaliana: 1, AT1G15520.1) (Fig. 1c). The homolog of AT1G15520.1 is known to be related to pleiotropic drug resistance and abscisic acid (ABA) uptake transport[37, 38]. Key genes involved in ginsenoside biosynthesis of P. notoginseng Dammarenediol synthase Ginsenosides are committed to be synthesized from dammarenediol-II after hydroxylation via cytochrome P450 (CYP450)[39] and then glycosylation by glycosyltransferase (GT)[40]. Dammarenediol-II is synthesized from farnesyl-PP, which is synthesized via the terpenoid backbone biosynthesis pathway. There are three enzymes involved from farnesyl-PP to dammarenediol-II: farnesyl-diphosphate farnesyltransferase (FDFT1, K00801), squalene monooxygenase (SQLE, K00511) and dammarenediol synthase (DDS, K15817) (Fig. 2a). In total, we identified three SQLE, two FDFT1 and one DDS in P. notoginseng. Combining with the results of real-time PCR, DDS (Pno034035.1) was found with extensive higher expression in the root than aerial parts and in 3-year old plant both root and aerial parts were found with higher expression of DDS (Pno034035.1). We also found that expressions of one FDFT1 (Pno029444.1) and one SQLE (Pno022472.1) were higher in roots compared with flowers and leaves, and one SQLE (Pno009162) gene showed the opposite

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

pattern according to RNA-seq, while the real-time PCR results suggested an increasing expression year after year of all these three genes (Fig. 2b). Phylogenetic analysis showed that DDS of P. notoginseng and P. ginseng had a very high (98.96%) amino acid sequence identity, differing in only eight amino acids, and with fewer genes than other species (Supplementary Fig. 10). We further found there were three amino-acid residue insertions (L194, A195 and E196) in the DDS sequences of P. notoginseng and P. ginseng, which were located in the cyclase-N domain (Fig. 2c). We used the SWISS-MODEL[41] to conduct structural modeling of DDS protein using

the

human

oxidosqualene

cyclase

(OSC)

in

complex

with

Ro

48-80771[42](SMTL: 1w6j.1) as the template. When the locations of the amino acids were mapped to the protein 3D structure with amino-acid identity of 40%, two residue insertions (L194 and A195) were located between two DNA helices, which are the presumed substrate-binding pocket and the catalytic residues (Fig. 2d). These three amino acids of DDS sequences of P. notoginseng probably played an important role in the Dammarenediol-II biosynthesis. CYP450s CYP450s and GTs have been demonstrated to be involved in hydroxylation or glycosylation of aglycones for triterpene saponin biosynthesis. We identified a total of 268 CYP450 genes in P. notoginseng using 243 CYP450 homolog genes of A. thaliana, which is consistent with the report that there are around 300 CYP450 genes in genomes of flowering plants[43]. To compare with other species, we also identified CYP450 genes of four other genomes—D. carota (325), S. tuberosum (465), S. lycopersicum (252) and C. annuum (256)—using the same parameters as for P. notoginseng (Supplementary Table 12 and Supplementary Fig. 11). We further classified the subfamilies of CYP450 genes for each species based on the category of CYP450 in A. thaliana and nine CYP716 genes were identified in P. notoginseng (Fig. 3a). CYP716A47 was reported to produce PPD[44] and CYP716A53v2 was reported to be involved in the synthesis of PPT in P. ginseng (Fig. 3b). Moreover, considering the role of different types of notoginsenoside in different parts/tissues of P. notoginseng— that is, PPT was relatively higher in roots while PPD was higher in

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

aerial parts (leaves and flowers) (Fig. 3c)— we sequenced the transcriptomes of leaves, roots and flowers collected from different ages of P. notoginseng. We found Pno012347.1 (CYP716A47 homolog), Pno002960.1 (CYP716A53v2 homolog) and Pno011760.1 were always more highly expressed in roots than in leaves and flowers for all ages; while Pno021283.1 (CYP716A47 homolog) was the opposite (Fig. 3d). Thus, Pno012347.1 and Pno021283.1 were the candidate genes involved in PPD biosynthesis and Pno002960.1 was the candidate gene involved in PPT biosynthesis in P. notoginseng. The expression levels of three CYP716 genes (Pno012347.1, Pno002960.1 and Pno011760.1) validated using real-time PCR were consistent with RNA-seq. Meanwhile, we identified another 22 CYP450 genes (including CYP78, CYP71, CYP72 and CYP86), which had the same expression pattern as three CYP716A53v2-homolog genes (Fig. 3d), indicated they may be involved into PPT biosynthesis. UGTs Three UGTs (UGT71A27, UGT74AE2 and UGT94Q2) that participate in the biosynthesis of PPD-type ginsenosides and three UGTs (UGTPg1, UGTPg100 and UGTPg101) participating in the formation of PPT-type saponins have previously been identified in P. ginseng. We identified a total of 160 UGT genes in the P. notoginseng genome based on 116 homologous UGT genes of A. thaliana[45]. We classified these 160 UGTs of P. notoginseng into 12 subfamilies containing 19 UGT71A homologs and 14 UTG74 homologs (Supplementary Fig. 12). UGTPg1, UGTPg100, UGTPg101, UGTPg102 and UGTPg103 of P. ginseng were UGT71A27 homologs; however, UGTPg102 and UGTPg103 were found to have no detectable activity on PPT due to the lack of several key amino acids in their proteins (Fig. 4a and b). We identified 38 differentially expressed UGT genes between roots and aerial parts (flowers and leaves), including 23 expressed more highly in roots (HR) and 15 in aerial tissues (HA) (Fig. 4c and d). Interestingly, there were three UGT71A27 homolog (Pno020280.1, Pno027722.1 and Pno026280.1) genes in the HR, and HA included four UGT74 genes (Pno031515.1, Pno033394.1, Pno002810.1 and Pno000722.1) and one UGT71A27 homolog (Pno013844.1) gene (Fig. 4c and d).

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Furthermore, when comparing the UGT71A27 homolog genes of P. notoginseng with UTGPg1, UGTPg100, UGTPg101, UGTPg102 and UGTPg103, we checked the known key amino-acid residues determining the function of UGTs. We found that the key amino-acid residues UGTPg100-A142T, UGTPg100-L186S, UGTPg100-G338R and UGTPg1-H82C were conserved in both Pno026280.1 and Pno27722.1. However, UGTPg1-H144F is a specific mutation F in the Pno026280.1 (Supplementary Fig. 13). All these candidate genes probably play important roles in the biosynthesis of ginsenoside. Identification of transcription factor binding sites We have predicted the transcription factor binding site (TFBS) in the upstream regions of ten genes encoding enzymes involved in the ginsenoside biosynthesis, including HMG-CoA synthase (HMGS),

HMG-CoA reductase (HMGR),

mevalonate kinase (MK), phosphomevalonate kinase (PMK), mevalonate diphosphate decarboxylase (MDD), isopentenylpyrophosphate isomerase (IPI), geranylgeranyl diphosphate synthase (GGR), farnesyl diphosphate synthase (FPS), squalene epoxidase (SE) and dammarenediol-II synthase (DS) (Supplementary Fig. 14). DNA-binding with one finger 1 (Dof1) and prolamin box binding factor (PBF) binding sites were detected in all these ten genes which used the same binding site motif (5'-AA[AG]G-3') (Supplementary Table 13). Therefore, the expression levels of Dof1 and PBF with these ten genes in different developmental stages of roots and leaves from P. notoginseng were compared (Supplementary Table 14). Most of genes’ expression levels were higher in the 3 roots than that in the 3 leaves, expecting PMK and Dof1 (Supplementary Fig. 15). Dof1 belong to the Dof family, which is plant-specific transcription factor [46]. Yanagisawa’s studies suggested that Dof1 in maize is related with the light-regulated plant-specific gene expression [47, 48]. PBF encodes Dof zinc finger DNA binding proteins, which could enhance the DNA binding of the bZIP transcription factor Opaque-2 to O2 binding site elements [49]. In three stages of roots, the trend of PBF expression level was consistent with that of DS, HMGR, SE, HMGS, MK, MDD, FPS and GGR (Supplementary Table 15). Whereas, in different developmental stage of leaves, there was almost no significant change in

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

genes expression levels in these genes, excluding Dof1 and DS with a rising trend (Supplementary Table 16). We inferred that the PBF and Dof1 were probably associated with the transcriptional controls of these genes, involved in triterpene saponins biosynthesis pathway, in different parts, the root and the leaf of P. notoginseng, respectively.

Discussion Compared with other phylogenetically-related plant species, there was notable gene family contraction of R-genes in P. notoginseng. The encoded resistance proteins play important roles in plant defense by controlling the host plant’s ability to detect a pathogen attack and facilitate a counter attack against the pathogen[7]. Whether P. notoginseng has evolved other defenses and/or anti-stress mechanisms to compensate for the contraction of R-genes remains to be determined in data mining of the first draft genome of this genus. Given that the notable contraction of R-genes of P. notoginseng may provide clues to its poor stress and disease resistance, we also observed expansion and contraction of different subfamilies of the ABC transporter gene family in P. notoginseng. The substrates of ABC proteins include a wide range of compounds such as peptides, lipids, heavy metal chelates and steroids[50]. Interestingly, one of the contracted gene candidates (Pno003332.1) in P. notoginseng was reported to be involved in iron export[36]. A previous experimental study showed that optimal iron concentration could significantly determine the plant biomass and stabilize arsenic content in soil to reduce arsenic contamination[51] where arsenic contamination of P. notoginseng is a serious agricultural problem due to arsenic pollution in the environment[51]. Phenolics, alkaloids and terpenoids are three major classes of chemicals involved in plant defenses[52]. Another possible complementary mechanism to cope with reduced pathogen resistance, which possibly resulted from R-gene contraction, is through evolving synthesis of defensive secondary plant metabolites. Pepper and tomato, two species phylogenetically close to P. notoginseng, produce alkaloids such as capsaicinoids and tomatine, respectively, which have been found to function as

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

deterrents against pathogens[21, 53]. Azadiracht, a meliacane-type triterpenoid with very similar skeleton to dammarane from the neem tree (Azadirachta indica)[54], has insecticidal properties. Ginsenosides are the oligosaccharide glycosides of a series of dammarane- or oleanane-type triterpenoid glycosides[11]. Palazón et al. reported that during in vitro culture of ginseng hair root lines, the addition of methyl jasmonate, a signaling molecule specifically expressed by plants in response to insect and pathogenic attacks, could enhance overall ginsenoside production and conversion of PPD-type (e.g. Rb1, Rb2, Rc and Rd) to PPT-type (e.g. Re, Rf and Rg1) ginsenoside[55]. In addition, natural ginsenosides have antimicrobial and antifungal action, as shown in numerous laboratory studies[56, 57]. The exact natural roles of ginsenosides in plants, particularly controlling plant growth and defense against pathogenic infection, remain unclear and require further investigation. DDS is the critical determinant for the initial step of producing the dammarane-type sapogenin (aglycone of saponin) template. This can be further subjected to structural diversification by CYP450 and UGT enzymes. Along the triterpenoid saponin biosynthesis pathway, DDS of the OSC group is responsible for the cyclization of 2,3-oxidosqualene to dammarenediol-II[58]. A highly-conserved DDS homolog of P. ginseng was also identified in P. notoginseng. Upstream of DDS in the triterpenoid saponin biosynthesis pathway, three SQLE and two FDFT1 genes were identified and shown to exhibit significant differential expression in different parts in P. notoginseng. Han et al. identified that CYP716A47 produces PPD through the hydroxylation of C-12 in dammarenediol-II[44], CYP716A53v2 hydroxylates the C-6 of PPD to produce PPT[59], and CYP716A52v2 is involved in oleanane-type ginsenoside biosynthesis[60] in P. ginseng. We were interested in determining how the transcriptional regulation of different CYP450 genes respectively correlated with synthesis of higher content of PPT- and PPD-type ginsenosides in roots and aerial parts (leaves and flowers). Combined the previous research and transcriptome analysis, we identified the key genes which produces PPD and PPT in the P. notoginseng.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

UGTs are GTs that use uridine diphosphate (UDP)-activated sugar molecules as donors. Several types of UGTs, which catalyze the synthesis of specific sapogenins to ginsenosides, were found in previous studies of P. ginseng[61]. For example, UGTPg1 was demonstrated to region-specifically glycosylate C20-OH of PPD and PPT[62]. UGTPg100 specifically glycosylates PPT to produce bioactive ginsenoside Rh1, and UGTPg101 catalyzes PPT to produce F1[62]. Up to now, very few of the UGTs characterized that are able to glycosylate triterpenoid saponins have been found in P. ginseng[63-66] and P. notoginseng[67, 68]. However, in the present study, an almost complete collection of 160 UGT genes, representing many new homologs and putative UGT members, was disclosed in the P. notoginseng genome. In our study, we found the lack of correlation between qPCR and RNA-seq in the Pno026280, Pno002772 and Pno020280. In order to eliminate error because of improper normalization method, we used other two quantification methods. Using the raw RNA-seq reads counting and normalized by quantiles (limma) [69]and DESeq[70] showed the completely consistent tendency with FPKM results (Supplementary Table 17). The probable reasons are the nonspecific binding of qPCR primers to cDNA template and the bias in the RNA-seq sequencing due to sequencing library preparation, namely, it was due to nature of these two methods (RNA-seq vs. qPCR). In conclusion, we sequenced and assembled the first de novo genome of P. notoginseng. Using a combination of data from genome and transcriptomes, we identified contraction and expansion in numerous important genome events during the evolution of this plant. The contraction of disease-resistance genes families figured out the vulnerability to disease of P. notoginseng on genetic level and the expansion of several ABC transporter subfamilies indicated the potential of ABCs as an additional mechanism for the plant to cope with biotic stress. Combining eight transcriptomes of roots and aerial parts, several key genes and all their family members involved in the synthesis pathway of ginsenosides were also identified including dammarenediol synthase, CYP716 and UGT71. These findings provide new insight into its poor resistance and the biosynthesis of high-valued pharmacological molecules. This reference genome will serve as an important platform for improving

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

breeding and cultivation of this vulnerable plant species to increase the medicinal and economic value of Panax species.

Methods K-mer analysis We used the total length of sequence reads divided by sequencing depth to calculate the genome size. To estimate the sequencing depth, the copy number of 17-mers present in sequence reads was counted and the distribution of sequencing depth of the assembled genome was plotted. The peak value in the curve represents the overall sequencing

depth,

and

can

be

used

to

estimate

genome

size

(G):

G = Number17-mer/Depth17-mer. Genome assembly The first genome assembly version (v1.0) using SOAPdenovo[71] (v2.04) based on the short length reads data sequencing by Hiseq2000, was highly fragmented. It contained more than 600Mb gaps with missing sequence because of presence of high proportion of repetitive DNA sequences in the genome. The contig N50 sizes of v1.0 was 4.2kb. For the short contig N50 size, therefore, we further generated ~13Gb SMRT sequence data (~6-fold of whole genome size) with average sub-read length of ~9kb. Considering the inadequate depth of our Pacbio third generation sequencing, we used the PBJelly[72] to fill the gap and upgrade the genome assembly with long length reads. Concretely, we generated our protocol file containing full paths of the reference assembly above, the output directory and the input files. Then, we executed multiple steps (including setup, mapping, support, extraction, assembly, output) in a consecutive order. From the statistics of the result of PBJelly, we found that the gap number was lowered to 533Mb while sizes of contig N50 was increased to 10.7kb. Next, we used the libraries with the insert sizes of 500bp and 800bp as the gap-filling input data of the GapCloser (v1.0, http://soap.genomics.org.cn/). After running, the gap number was decreased to 493Mb and the length of contig N50 was increased to 16.4kb. Finally, the sizes of contig and scaffold N50 of the final version (v1.1) were

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

16.4kb and 70.6kb, respectively. In addition, the total length of the genome changed from 2.1Gb to 2.4Gb with 493Mb gap sequences. The detailed parameters of SOAPdenovo were “pregraph -s lib.cfg.1 -z 2800000000 -K 45 -R -d 1 -o SAN_63 -p 12; contig -g SAN_63 -R -p 24 ; map -s lib.cfg.1 -g SAN_63 -k 45 -p 24; scaff -g SAN_63 -F -p 24”. We also adopted ABySS[73] to estimate the optimized K value for genome assembly from 35 to 75 using about 20-fold high quality sequencing reads. The result revealed the K=45 is the best K value (Supplementary Table 18), which is completely consistent with the best K value of SOAPdenovo. Transcript de novo assembly Leaves and roots of P. notoginseng were collected from 1-, 2- and 3-year-old P. notoginseng and flowers collected from 2- and 3-year-old P. notoginseng. All of these 8 samples were randomly collected from the fileld of commercial planting base in Yanshan county of Wenshan Zhuang and Miao minority autonomous prefecture in Yunnan province. After cleansing, the leaves, roots and flowers were collected separately, cut into small pieces, immediately frozen in liquid nitrogen, and stored at -80 °C until further processing. Subsequently, mRNA isolation, cDNA library construction and sequencing were performed orderly. Briefly, total RNA was extracted from each tissue using TRIzol reagent and digested with DNase I according to the manufacturer’s protocol. Next, Oligo magnetic beads were used to isolate mrna from the total rna. By mixing with fragmentation buffer, the mrna was broken into short fragments. The cDNA was synthesized using the mRNA fragments as templates. The short fragments were purified and resolved with EB buffer for end repair and single nucleotide A (adenine) addition, and then connected with adapters. Suitable fragments were selected for PCR amplification as templates. During the quality control steps, an Agilent 2100 Bioanalyzer and ABI StepOnePlus Real-Time PCR System were used for quantification and qualification of the sample library. Each cDNA library was sequenced in a single lane of the Illumina HiSeqTM 2000 system using paired end protocols. Before performing the assembly, firstly, raw sequencing data that generated by Illumina HiSeq2000, was subjected to quality control (QC) check including the analysis of base composition and quality. After QC, raw reads

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

which contained the sequence of adapter (adapters are added to cDNAs during library construction and part of them may be sequenced), more than 10% unknown bases or 5% low quality bases were filtered into clean reads. Finally, Trinity[74] was used to perform the assembly of clean reads (detail information could be retrieved in Trinity website http://trinityrnaseq.sourceforge.net/). Based on the Trinity original assembly result, TGICL[75] and Phrap[76] were used to acquire sequences that cannot be extended on either end, the sequences are final assembled transcripts. We mapped all transcripts that were de novo assembled from nine transcriptome short reads by Trinity into the genome assembly using Blat[77], and then calculated their coverage and mapping rate. The sequencing depth distribution was examined by aligning all short reads against assembly using SOAP2[78] and then the depth of each base was calculated. Gene expression analysis All the QC and filter processes of raw sequencing transcriptome data were the same as that of de novo assembly. All the clean reads are mapped to reference gene sequences by using SOAP2[78] with no more than 5 mismatches allowed in the alignment. We also performed the QC of alignment results because mRNAs were firstly broken into short segments by chemical methods during the library construction and then sequenced. If the randomness was poor, read preference for specific gene region could influence the calculation of gene expression. The gene expression level of each gene was calculated by using RPKM method[79] (reads per kilobase transcriptome per million mapped reads) based on the unique alignment results. In the previous test, comparative analysis between RPKM and FPKM showed that the correlation between RPKM and QPCR was better than the correlation between FPKM and QPCR. Referring to the previous study[80], we have developed a stringent algorithm to identify differentially expressed genes between two samples. The probability of a gene expressed equally between two samples was calculated based on the Poisson distribution. P-values corresponding to differential gene expression test were generated using Benjamini, Yekutieli.2001 FDR method[81]. Moreover, because of some debates in the normalization method by gene size like

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

RPKM and then two new quantification methods were conducted to validate the existing results. Concretely, based on the tophat mapping results, R package ‘Rsubread’ was used to do the raw reads counting then we used ‘limma’ with the normalize parameter "quantile". Additionally, DESeq also was conducted. Finally, aiming at the differential expressed genes mentioned in the article based on RNA-seq (RPKM), we found the completely same tendency. Identification of repetitive sequences Transposable elements (TEs) in the P. notoginseng genome were annotated with structure-based

analyses

and

homology-based

comparisons.

In

detail,

RepeatModeler[23] was used to de novo find these TEs based on features of structures. RepeatMasker and RepeatProteinMask were applied using RepBase[82] for TE identification at the DNA and protein level, respectively. Overlapped TEs belonging to the same repeat class were checked and redundant sequences were removed. TRF software[83] was applied to annotate tandem repeats in the genome. Gene prediction and annotation We predicted protein-encoding genes by homolog, de novo and RNA-seq evidence; and results of the former two methods were integrated by the GLEAN program[84] then filtered by threshold level of 20% percent overlap with at least 3 homolog species support. Subsequently, the inner auto pipeline was used to integrated RNA-seq data within and then checked manually. Firstly, protein sequences from five closely related species—A. thaliana, S. lycopersicum, S. tuberosum, Capsicum annuum and Daucus carota—were applied for homolog prediction by respectively mapping them to the genome assembly using TblastN software with E-value -1e-5. Aiming at these target areas, we used Genewise[85] to cluster and filter pseudogenes. Then Fgenesh[86], AUGUSTUS[87] and GlimmerHMM[88] were used for de novo prediction with parameters trained on A. thaliana. We merged three de novo predictions into a unigene set. De novo gene models that were supported by one more de novo methods were retained. For overlapping gene models, the longest one was selected and finally, we got de novo-based gene models. Using de novo gene set (30,660-41,495) and five homolog-based results as gene models (24,358 to 36,116)

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

integration was done using the GLEAN program. Finally, we got the GLEAN gene set (referred to as G-set, 31,678) with parameter “value 0.01, fixed 0”. Combined with the RNA transcriptome sequencing data, we used Tophat2[89] to map and Cufflinks[90] to assemble reads into transcripts and finally integrated the transcripts into the gene set using inner pipeline. The process of inner pipeline was: (1) The assembly of transcriptome data. RNA-seq reads were mapped to P. notoginseng genome using tophat, and then cufflinks pipeline was used to conduct the assembly of transcripts. (2) Prediction of the ORF in transcripts (CDS length >=300bp, score >-15). (3) Comparison of transcripts and Glean results to count the overlap region (identity >=95%, total/len >=90%). (4) Integration of the existing results. Aiming at the results, we checked manually by discarding genes without any functional annotation information, any homolog support or cufflinks results support. The final gene set (35,451) was obtained. Then, with the known protein databases (SwissProt[91], Trembl, KEGG[92], InterPro[93] and GO[94]), we annotated these functional proteins in the gene set by aligning with template sequences in the databases using BLASTP (1e-5) and InterProScan. Identification of R-genes Most R-genes in plants encode NBS-LRR proteins. According to the conservative structural

characteristics

of

domains,

we

used

HMMER

(V3,

http://hmmer.janelia.org/software) to screen the domains in the Pfam NBS (NB-ARC) family. Then we compared all NBS-encoding genes with TIR HMM (PF01582) and LRR 1 HMM (PF00560) data sets using HMMER (V3). For the CC domains, we used the MARCOIL program[95] with a threshold probability of 0.9 and double-checked using paircoil2[96] with a P-score cut-off of 0.025. Gene cluster analysis We used OrthoMCL[97] to identify the gene family and so obtained the single-copy gene families and multi-gene families that were conserved among species. Then we constructed a phylogenetic tree based on the single-copy orthologous gene families using PhyML[98]. The different molecular clock (divergence rate) might be explained by the generation-time hypothesis. Here, we used the MCMCTREE program[99] (a

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

BRMC approach from the PAML package) to estimate the species divergence time. ‘Correlated molecular clock’ and ‘JC69’ model in the MCMCTREE program were used in our calculation. Analysis of key gene families The reference genes of interest in A. thaliana (such as ABC, CYP450 and UGT) were found in the TAIR10 functional descriptions file. Then, the target genes of P. notoginseng were identified and classified using BlastP with cut-off value of 1e-5 and constructing gene trees using PhyML (-d aa -b 100). The tree representation was constructed using MEGA software. Real-time PCR analysis Isolated RNA from different P. notoginseng tissues were reverse-transcribed to single-strand cDNA using the Super Script™ III First-Strand Synthesis System (Invitrogen™, USA). Quantitative reactions were performed on the Real-Time PCR Detection System (VIIA 7TM Real-Time PCR System, Applied Biosystems, USA) using FastStart Universal Probe Master and Universal ProbeLibrary Probes (Roche, Switzerland). All primers used in this study are listed in Supplementary Table 19. The relative gene expression was calculated with the 2-ΔΔCT method. For each sample, the mRNA levels of the target genes were normalized to that of 18S rRNA.

Availability of supporting data and materials The genome assembly and sequencing data have been deposited into NCBI Sequence Read Archive (SRA) under project number PRJNA299863 and into GigaDB.

Declarations Acknowledgements This work was supported by the Science and Technology Development Fund (FDCT) of Macao SAR (Ref. No. 134/2014/A3), Research Committee of University of Macau (MYRG139(Y1-L4)-ICMS12-LMY and MYRG2015-00214-ICMS-QRCM), the

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Overseas and Hong Kong, Macau Young Scholars Collaborative Research Fund by the Natural National Science Foundation of China (Grant no. 81328025), Technology Innovation

Program

Support

by

Shenzhen

Municipal

Government

(NO.JSGG20130918102805062) and Basic Research Program Support by Shenzhen Municipal

Government

(NO.JCYJ2014072916361729

and

NO.

JCYJ20150831201123287).

Additional files Additional file 1: Supplementary information includes supporting figures and supporting tables.

Competing interests The authors declare that they have no competing interests.

Author’s contributions S. M.Y.L, X.X and X.L designed the project. G.F, B.R.Y, Y. F, H.Z, X.L, C.S, K.M, J.H, R.G, L.S, S.D, Q.X, W.L, M.L, A.K.W, D.Z and M.C analyzed the data. W.C, X.L, G.F, Y. F, J.C and B.R.Y wrote the manuscript. B.R.Y, M.G and C. H prepared the samples and conducted the experiments.

Authors’ Affiliations (1)

State Key Laboratory of Quality Research in Chinese Medicine, Institute of

Chinese Medical Sciences, University of Macau, Macao, China. (2)

BGI-Shenzhen, Shenzhen 518083, China.

(3)

State Key Laboratory of Bioelectronics, School of Biological Sciences and

Medical Engineering, Southeast University, Nanjing 210096, China (4)

Faculty of Science and Technology, Department of Civil and Environmental

Engineering, University of Macau, Macao, China

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

Figure legends Fig. 1. The gene families related to resistance of P. notoginseng. (a) The evolution of R-genes with NBS domains among S. lycopersicum (Sl), D. carota (Dc) and P. notoginseng (Pn). The numbers in circles represent the gene numbers of a species and the numbers in clades represent the gene number of expanded gene families (E), contracted gene families (C) and extinct gene families (D). NBS represents a gene that only contains the NBS domain, and NBS_LRR, TIR_NBS_LRR and CC_NBS_LRR follow a similar representation. (b) Phylogenetic tree of R genes of P. notoginseng, D. carota and Oryza sativa (outgroup). Blue label is P. notoginseng, orange is D. carota and purple is Oryza sativa. (c) The gene trees of ABC Gpdr subfamily of P. notoginseng and other five plants. The red represents genes of P. notoginseng, the green is pepper, the orange is potato, dark blue is tomato, the cyan is carrot and the blue is Arabidopsis. .

Fig. 2. The preliminary steps of ginsenoside biosynthesis and several key gene families. (a) The dammarenediol-II synthesis pathway from glycolysis involves terpenoid backbone biosynthesis and finally catalysis by DDS. (b) Comparison of gene expressions of three key enzyme gene families between the roots and aerial parts (flowers and leaves) and the qPCR results. R1 represents one-year-old roots of P. notoginseng, R2: two-year-old roots, R3: three-year-old roots from plant B, L1: one-year-old leaves, L2, two-year-old leaves, L3: three-year-old leaves, F2: two-year-old flowers, F3: three-year-old flowers. (c) Protein sequence multiple comparisons of DDS among P. notoginseng, P. ginseng and other representative plants. Stars represent amino acids conserved in all protein sequences, red circles represent amino acids that are the same in P. notoginseng and P. ginseng, but missing in all other plants. (d) The protein 3D structure of DDS of P. notoginseng constructed by SWISS-MODEL. The arrows indicate two amino acids specific to P. notoginseng and

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

P. ginseng.

Fig. 3. The CYP450 gene families of P. notoginseng. (a) Phylogenetic analysis of CYP716 subfamily among P. notoginseng, P. ginseng and other plants. Red circles represent P. notoginseng, purple diamonds represent P. ginseng, green circles represent S. tuberosum, yellow triangles represent D. carota, purple squares represent S. lycopersicum and green diamonds represent A. thaliana. The lengths of clades in the gene tree show the protein similarity between two clades. (b) Synthesis pathway of protopanaxadiol (PPD) and protopanaxatriol (PPT) of P. notoginseng. (c)The relative amount of PPD was more than that of PPT in the aerial parts of P. notoginseng, but the opposite in roots. (d) The differentially expressed CYP450 genes of P. notoginseng between the aerial parts and roots and the qPCR results. The pink and green genes in the heatmap represent CYP716 subfamily genes that were involved in the synthesis pathway in P. ginseng.

Fig. 4. The UGT gene families of P. notoginseng. (a) The synthesis pathway of different PPT-type ginsenosides that were catalyzed by different UGT genes. (b) The synthesis pathway of different PPD-type ginsenosides that were catalyzed by different UGT genes. (c) Highly expressed UGT genes of P. notoginseng in roots than in aerial parts and the qPCR results. (d) Highly expressed UGT genes of P. notoginseng in aerial parts than in roots and the qRCR results. The purple, green and blue genes in the heatmap represent gene families that were involved in the synthesis pathway in P. ginseng.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

References 1. Briskin DP. Medicinal plants and phytomedicines. Linking plant biochemistry and physiology to human health. Plant physiology. 2000;124(2):507-14. 2. Baeg IH, So SH. The world ginseng market and the ginseng (Korea). Journal of ginseng research. 2013;37(1):1-7. doi:10.5142/jgr.2013.37.1. 3. Guo H, Cui X, An N, Cai G. Sanchi ginseng (Panax notoginseng (Burkill) F. H. Chen) in China: distribution, cultivation and variations. Genet Resour Crop Evol. 2010;57(3):453-60. doi:10.1007/s10722-010-9531-2. 4. Lee CH, Kim JH. A review on the medicinal potentials of ginseng and ginsenosides on cardiovascular diseases. Journal of ginseng research. 2014;38(3):161-6. doi:10.1016/j.jgr.2014.03.001. 5. Guo XC. New green industry – takes Wenshan Panax notoginseng status and its future as an example. Ecol Econ 2007;1:114-7. 6. Ou X, Jin H, Guo L, Yang Y, Cui X, Xiao Y et al. [Status and prospective on nutritional physiology and fertilization of Panax notoginseng]. Zhongguo Zhong yao za zhi = Zhongguo zhongyao zazhi = China journal of Chinese materia medica. 2011;36(19):2620-4. 7. Gururani MA, Venkatesh J, Upadhyaya CP, Nookaraju A, Pandey SK, Park SW. Plant disease resistance genes: Current status and future directions. Physiological and Molecular Plant Pathology. 2012;78:51-65. doi:http://dx.doi.org/10.1016/j.pmpp.2012.01.002. 8. Knepper C, Day B. From perception to activation: the molecular-genetic and biochemical landscape of disease resistance signaling in plants. The Arabidopsis book / American Society of Plant Biologists. 2010;8:e012. doi:10.1199/tab.0124. 9. Andolfo G, Ruocco M, Di Donato A, Frusciante L, Lorito M, Scala F et al. Genetic variability and evolutionary diversification of membrane ABC transporters in plants. BMC plant biology. 2015;15:51. doi:10.1186/s12870-014-0323-2. 10. Leung KW, Wong AS. Pharmacology of ginsenosides: a literature review. Chinese medicine. 2010;5:20. doi:10.1186/1749-8546-5-20. 11. Yang WZ, Hu Y, Wu WY, Ye M, Guo DA. Saponins in the genus Panax L. (Araliaceae): a systematic review of their chemical diversity. Phytochemistry. 2014;106:7-24. doi:10.1016/j.phytochem.2014.07.012. 12. Ng TB. Pharmacological activity of sanchi ginseng (Panax notoginseng). The Journal of pharmacy and pharmacology. 2006;58(8):1007-19. doi:10.1211/jpp.58.8.0001. 13. Son HY, Han HS, Jung HW, Park YK. Panax notoginseng Attenuates the Infarct Volume in Rat Ischemic Brain and the Inflammatory Response of Microglia. Journal of pharmacological sciences. 2009;109(3):368-79. 14. Li H, Deng CQ, Chen BY, Zhang SP, Liang Y, Luo XG. Total saponins of Panax notoginseng modulate the expression of caspases and attenuate apoptosis in rats following focal cerebral ischemia-reperfusion. Journal of ethnopharmacology. 2009;121(3):412-8. doi:10.1016/j.jep.2008.10.042. 15. Yang CY, Wang J, Zhao Y, Shen L, Jiang X, Xie ZG et al. Anti-diabetic effects of Panax notoginseng saponins and its major anti-hyperglycemic components. Journal of ethnopharmacology. 2010;130(2):231-6. doi:10.1016/j.jep.2010.04.039. 16. Xiang H, Liu Y, Zhang B, Huang J, Li Y, Yang B et al. The antidepressant effects and mechanism of action of total saponins from the caudexes and leaves of Panax notoginseng in animal models of

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

depression. Phytomedicine : international journal of phytotherapy and phytopharmacology. 2011;18(8-9):731-8. doi:10.1016/j.phymed.2010.11.014. 17. Yang X, Xiong X, Wang H, Yang G, Wang J. Xuesaitong soft capsule (chinese patent medicine) for the treatment of unstable angina pectoris: a meta-analysis and systematic review. Evidence-based complementary and alternative medicine : eCAM. 2013;2013:948319. doi:10.1155/2013/948319. 18. Wang L, Li Z, Zhao X, Liu W, Liu Y, Yang J et al. A network study of chinese medicine xuesaitong injection to elucidate a complex mode of action with multicompound, multitarget, and multipathway. Evidence-based complementary and alternative medicine : eCAM. 2013;2013:652373. doi:10.1155/2013/652373. 19. Luo J, Song W, Yang G, Xu H, Chen K. Compound Danshen (Salvia miltiorrhiza) dripping pill for coronary heart disease: an overview of systematic reviews. The American journal of Chinese medicine. 2015;43(1):25-43. doi:10.1142/S0192415X15500020. 20. Denoeud F, Carretero-Paulet L, Dereeper A, Droc G, Guyot R, Pietrella M et al. The coffee genome provides insight into the convergent evolution of caffeine biosynthesis. Science. 2014;345(6201):1181-4. doi:10.1126/science.1255274. 21. Kim S, Park M, Yeom SI, Kim YM, Lee JM, Lee HA et al. Genome sequence of the hot pepper provides insights into the evolution of pungency in Capsicum species. Nature genetics. 2014;46(3):270-8. doi:10.1038/ng.2877. 22. Yue PY, Mak NK, Cheng YK, Leung KW, Ng TB, Fan DT et al. Pharmacogenomics and the Yin/Yang actions of ginseng: anti-tumor, angiomodulating and steroid-like activities of ginsenosides. Chinese medicine. 2007;2:6. doi:10.1186/1749-8546-2-6. 23. Sengupta S, Toh SA, Sellers LA, Skepper JN, Koolwijk P, Leung HW et al. Modulating angiogenesis: the yin and the yang in ginseng. Circulation. 2004;110(10):1219-25. doi:10.1161/01.CIR.0000140676.88412.CF. 24. Yang WZ, Bo T, Ji S, Qiao X, Guo DA, Ye M. Rapid chemical profiling of saponins in the flower buds of Panax notoginseng by integrating MCI gel column chromatography and liquid chromatography/mass spectrometry analysis. Food Chem. 2013;139(1-4):762-9. doi:10.1016/j.foodchem.2013.01.051. 25. Wan JB, Zhang QW, Hong SJ, Li P, Li SP, Wang YT. Chemical investigation of saponins in different parts of Panax notoginseng by pressurized liquid extraction and liquid chromatography-electrospray ionization-tandem mass spectrometry. Molecules. 2012;17(5):5836-53. doi:10.3390/molecules17055836. 26. Mapleson D, Garcia Accinelli G, Kettleborough G, Wright J, Clavijo BJ. KAT: a K-mer analysis toolkit to quality control NGS datasets and genome assemblies. Bioinformatics. 2016. doi:10.1093/bioinformatics/btw663. 27. Marcais G, Kingsford C. A fast, lock-free approach for efficient parallel counting of occurrences of k-mers. Bioinformatics. 2011;27(6):764-70. doi:10.1093/bioinformatics/btr011. 28. Iorizzo M, Ellison S, Senalik D, Zeng P, Satapoomin P, Huang J et al. A high-quality carrot genome assembly provides new insights into carotenoid accumulation and asterid genome evolution. Nature genetics. 2016;48(6):657-66. doi:10.1038/ng.3565. 29. Simao FA, Waterhouse RM, Ioannidis P, Kriventseva EV, Zdobnov EM. BUSCO: assessing genome assembly and annotation completeness with single-copy orthologs. Bioinformatics. 2015;31(19):3210-2. doi:10.1093/bioinformatics/btv351. 30. Zhang T, Hu Y, Jiang W, Fang L, Guan X, Chen J et al. Sequencing of allotetraploid cotton

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

(Gossypium hirsutum L. acc. TM-1) provides a resource for fiber improvement. Nature biotechnology. 2015;33(5):531-7. doi:10.1038/nbt.3207. 31. Yan L, Wang X, Liu H, Tian Y, Lian J, Yang R et al. The Genome of Dendrobium officinale Illuminates the Biology of the Important Traditional Chinese Orchid Herb. Molecular plant. 2015;8(6):922-34. doi:10.1016/j.molp.2014.12.011. 32. Huang S, Li R, Zhang Z, Li L, Gu X, Fan W et al. The genome of the cucumber, Cucumis sativus L. Nature genetics. 2009;41(12):1275-81. doi:10.1038/ng.475. 33. Rea PA. Plant ATP-binding cassette transporters. Annual review of plant biology. 2007;58:347-75. doi:10.1146/annurev.arplant.57.032905.105406. 34. Martinoia E, Klein M, Geisler M, Bovet L, Forestier C, Kolukisaoglu U et al. Multifunctionality of plant ABC transporters--more than just detoxifiers. Planta. 2002;214(3):345-55. 35. Geisler M, Murphy AS. The ABC of auxin transport: the role of p-glycoproteins in plant development. FEBS letters. 2006;580(4):1094-102. doi:10.1016/j.febslet.2005.11.054. 36. Chen S, Sanchez-Fernandez R, Lyver ER, Dancis A, Rea PA. Functional characterization of AtATM1, AtATM2, and AtATM3, a subfamily of Arabidopsis half-molecule ATP-binding cassette transporters implicated in iron homeostasis. The Journal of biological chemistry. 2007;282(29):21561-71. doi:10.1074/jbc.M702383200. 37. Kang J, Hwang JU, Lee M, Kim YY, Assmann SM, Martinoia E et al. PDR-type ABC transporter mediates cellular uptake of the phytohormone abscisic acid. Proceedings of the National Academy of Sciences of the United States of America. 2010;107(5):2355-60. doi:10.1073/pnas.0909222107. 38. Campbell EJ, Schenk PM, Kazan K, Penninckx IA, Anderson JP, Maclean DJ et al. Pathogen-responsive expression of a putative ATP-binding cassette transporter gene conferring resistance to the diterpenoid sclareol is regulated by multiple defense signaling pathways in Arabidopsis. Plant physiology. 2003;133(3):1272-84. doi:10.1104/pp.103.024182. 39. Shibuya M, Hoshino M, Katsube Y, Hayashi H, Kushiro T, Ebizuka Y. Identification of beta-amyrin and sophoradiol 24-hydroxylase by expressed sequence tag mining and functional expression assay. The FEBS journal. 2006;273(5):948-59. doi:10.1111/j.1742-4658.2006.05120.x. 40. Choi DW, Jung J, Ha YI, Park HW, In DS, Chung HJ et al. Analysis of transcripts in methyl jasmonate-treated ginseng hairy roots to identify genes involved in the biosynthesis of ginsenosides and other secondary metabolites. Plant cell reports. 2005;23(8):557-66. doi:10.1007/s00299-004-0845-4. 41. Schwede T, Kopp J, Guex N, Peitsch MC. SWISS-MODEL: An automated protein homology-modeling server. Nucleic acids research. 2003;31(13):3381-5. 42. Thoma R, Schulz-Gasch T, D'Arcy B, Benz J, Aebi J, Dehmlow H et al. Insight into steroid scaffold formation from the structure of human oxidosqualene cyclase. Nature. 2004;432(7013):118-22. doi:10.1038/nature02993. 43. Nelson D, Werck-Reichhart D. A P450-centric view of plant evolution. The Plant journal : for cell and molecular biology. 2011;66(1):194-211. doi:10.1111/j.1365-313X.2011.04529.x. 44. Han JY, Hwang HS, Choi SW, Kim HJ, Choi YE. Cytochrome P450 CYP716A53v2 catalyzes the formation of protopanaxatriol from protopanaxadiol during ginsenoside biosynthesis in Panax ginseng. Plant & cell physiology. 2012;53(9):1535-45. doi:10.1093/pcp/pcs106. 45. Yonekura-Sakakibara K, Hanada K. An evolutionary view of functional diversity in family 1 glycosyltransferases. The Plant journal : for cell and molecular biology. 2011;66(1):182-93. doi:10.1111/j.1365-313X.2011.04493.x.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

46. Yanagisawa S. A novel DNA-binding domain that may form a single zinc finger motif. Nucleic Acids Res. 1995;23(17):3403-10. 47. Yanagisawa S. Dof1 and Dof2 transcription factors are associated with expression of multiple genes involved in carbon metabolism in maize. The Plant journal : for cell and molecular biology. 2000;21(3):281-8. 48. Yanagisawa S, Sheen J. Involvement of maize Dof zinc finger proteins in tissue-specific and light-regulated gene expression. Plant Cell. 1998;10(1):75-89. 49. Vicente-Carbajosa J, Moose SP, Parsons RL, Schmidt RJ. A maize zinc-finger protein binds the prolamin box in zein gene promoters and interacts with the basic leucine zipper transcriptional activator Opaque2. Proc Natl Acad Sci U S A. 1997;94(14):7685-90. 50. Theodoulou FL. Plant ABC transporters. Biochimica et Biophysica Acta (BBA) - Biomembranes. 2000;1465(1–2):79-103. doi:http://dx.doi.org/10.1016/S0005-2736(00)00132-2. 51. Yan XL, Lin LY, Liao XY, Zhang WB, Wen Y. Arsenic stabilization by zero-valent iron, bauxite residue, and zeolite at a contaminated site planting Panax notoginseng. Chemosphere. 2013;93(4):661-7. doi:10.1016/j.chemosphere.2013.05.083. 52. Freeman BC, Beattie GA. An Overview of Plant Defenses against Pathogens and Herbivores. The Plant Health Instructor.; 2008. 53. Friedman M. Tomato glycoalkaloids: role in the plant and in the diet. Journal of agricultural and food chemistry. 2002;50(21):5751-80. 54. Aerts R, Mordue AJ. Feeding Deterrence and Toxicity of Neem Triterpenoids. J Chem Ecol. 1997;23(9):2117-32. doi:10.1023/b:joec.0000006433.14030.04. 55. Palazón J, Cusidó RM, Bonfill M, Mallol A, Moyano E, Morales C et al. Elicitation of different Panax ginseng transformed root phenotypes for an improved ginsenoside production. Plant Physiology and Biochemistry. 2003;41(11–12):1019-25. doi:http://dx.doi.org/10.1016/j.plaphy.2003.09.002. 56. Bernards MA YL, Nicol RW. The allelopathic potential of ginsenosides. . In Allelochemicals: Biological Control of Plant Pathogens and Diseases. Netherlands: Springer; 2006. 57. Nicol RW, Traquair JA, MA. B. Ginsenosides as host resistance factors in American ginseng (Panax quinquefolius). Canadian Journal of Botany. 2002;80:557-62. . 58. Kushiro T, Ohno Y, Shibuya M, Ebizuka Y. In vitro conversion of 2,3-oxidosqualene into dammarenediol by Panax ginseng microsomes. Biol Pharm Bull. 1997;20(3):292-4. 59. Han JY, Kim HJ, Kwon YS, Choi YE. The Cyt P450 enzyme CYP716A47 catalyzes the formation of protopanaxadiol from dammarenediol-II during ginsenoside biosynthesis in Panax ginseng. Plant & cell physiology. 2011;52(12):2062-73. doi:10.1093/pcp/pcr150. 60. Han JY, Kim MJ, Ban YW, Hwang HS, Choi YE. The involvement of beta-amyrin 28-oxidase (CYP716A52v2) in oleanane-type ginsenoside biosynthesis in Panax ginseng. Plant & cell physiology. 2013;54(12):2034-46. doi:10.1093/pcp/pct141. 61. Jung SC, Kim W, Park SC, Jeong J, Park MK, Lim S et al. Two ginseng UDP-glycosyltransferases synthesize ginsenoside Rg3 and Rd. Plant & cell physiology. 2014;55(12):2177-88. doi:10.1093/pcp/pcu147. 62. Wei W, Wang P, Wei Y, Liu Q, Yang C, Zhao G et al. Characterization of Panax ginseng UDP-Glycosyltransferases Catalyzing Protopanaxatriol and Biosyntheses of Bioactive Ginsenosides F1 and Rh1 in Metabolically Engineered Yeasts. Molecular plant. 2015. doi:10.1016/j.molp.2015.05.010. 63. Yan X, Fan Y, Wei W, Wang P, Liu Q, Wei Y et al. Production of bioactive ginsenoside compound K in metabolically engineered yeast. Cell Res. 2014;24(6):770-3. doi:10.1038/cr.2014.28.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

64. Sun C, Li Y, Wu Q, Luo H, Sun Y, Song J et al. De novo sequencing and analysis of the American ginseng root transcriptome using a GS FLX Titanium platform to discover putative genes involved in ginsenoside biosynthesis. BMC genomics. 2010;11:262. doi:10.1186/1471-2164-11-262. 65. Chen S, Luo H, Li Y, Sun Y, Wu Q, Niu Y et al. 454 EST analysis detects genes putatively involved in ginsenoside biosynthesis in Panax ginseng. Plant cell reports. 2011;30(9):1593-601. doi:10.1007/s00299-011-1070-6. 66. Li C, Zhu Y, Guo X, Sun C, Luo H, Song J et al. Transcriptome analysis reveals ginsenosides biosynthetic genes, microRNAs and simple sequence repeats in Panax ginseng C. A. Meyer. BMC genomics. 2013;14:245. doi:10.1186/1471-2164-14-245. 67. Liu MH, Yang BR, Cheung WF, Yang KY, Zhou HF, Kwok JS et al. Transcriptome analysis of leaves, roots and flowers of Panax notoginseng identifies genes involved in ginsenoside and alkaloid biosynthesis. BMC genomics. 2015;16:265. doi:10.1186/s12864-015-1477-5. 68. Luo H, Sun C, Sun Y, Wu Q, Li Y, Song J et al. Analysis of the transcriptome of Panax notoginseng root uncovers putative triterpene saponin-biosynthetic genes and genetic markers. BMC genomics. 2011;12 Suppl 5:S5. doi:10.1186/1471-2164-12-S5-S5. 69. Ritchie ME, Phipson B, Wu D, Hu Y, Law CW, Shi W et al. limma powers differential expression analyses for RNA-sequencing and microarray studies. Nucleic acids research. 2015;43(7):e47. doi:10.1093/nar/gkv007. 70. Anders S, Huber W. Differential expression analysis for sequence count data. Genome biology. 2010;11(10):R106. doi:10.1186/gb-2010-11-10-r106. 71. Luo R, Liu B, Xie Y, Li Z, Huang W, Yuan J et al. SOAPdenovo2: an empirically improved memory-efficient short-read de novo assembler. GigaScience. 2012;1(1):18. doi:10.1186/2047-217X-1-18. 72. English AC, Richards S, Han Y, Wang M, Vee V, Qu J et al. Mind the gap: upgrading genomes with Pacific Biosciences RS long-read sequencing technology. PloS one. 2012;7(11):e47768. doi:10.1371/journal.pone.0047768. 73. Simpson JT, Wong K, Jackman SD, Schein JE, Jones SJ, Birol I. ABySS: a parallel assembler for short read sequence data. Genome research. 2009;19(6):1117-23. doi:10.1101/gr.089532.108. 74. Grabherr MG, Haas BJ, Yassour M, Levin JZ, Thompson DA, Amit I et al. Full-length transcriptome assembly from RNA-Seq data without a reference genome. Nature biotechnology. 2011;29(7):644-52. doi:10.1038/nbt.1883. 75. Pertea G, Huang X, Liang F, Antonescu V, Sultana R, Karamycheva S et al. TIGR Gene Indices clustering tools (TGICL): a software system for fast clustering of large EST datasets. Bioinformatics. 2003;19(5):651-2. 76. Vogel JP, Gu YQ, Twigg P, Lazo GR, Laudencia-Chingcuanco D, Hayden DM et al. EST sequencing and phylogenetic analysis of the model grass Brachypodium distachyon. TAG Theoretical and applied genetics Theoretische und angewandte Genetik. 2006;113(2):186-95. doi:10.1007/s00122-006-0285-3. 77. Kent WJ. BLAT--the BLAST-like alignment tool. Genome research. 2002;12(4):656-64. doi:10.1101/gr.229202. Article published online before March 2002. 78. Li R, Yu C, Li Y, Lam TW, Yiu SM, Kristiansen K et al. SOAP2: an improved ultrafast tool for short read alignment. Bioinformatics. 2009;25(15):1966-7. doi:10.1093/bioinformatics/btp336. 79. Mortazavi A, Williams BA, McCue K, Schaeffer L, Wold B. Mapping and quantifying mammalian transcriptomes by RNA-Seq. Nature methods. 2008;5(7):621-8. doi:10.1038/nmeth.1226.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

80. Audic S, Claverie JM. The significance of digital gene expression profiles. Genome research. 1997;7(10):986-95. 81. Yoav Benjamini DY. The control of the false discovery rate in multiple testing under dependency. Ann Statist. 2001;29(4):1165-88. 82. Jurka J, Kapitonov VV, Pavlicek A, Klonowski P, Kohany O, Walichiewicz J. Repbase Update, a database of eukaryotic repetitive elements. Cytogenetic and genome research. 2005;110(1-4):462-7. doi:10.1159/000084979. 83. Benson G. Tandem repeats finder: a program to analyze DNA sequences. Nucleic acids research. 1999;27(2):573-80. 84. Elsik CG, Mackey AJ, Reese JT, Milshina NV, Roos DS, Weinstock GM. Creating a honey bee consensus gene set. Genome biology. 2007;8(1):R13. doi:10.1186/gb-2007-8-1-r13. 85. Birney E, Clamp M, Durbin R. GeneWise and Genomewise. Genome research. 2004;14(5):988-95. doi:10.1101/gr.1865504. 86. Salamov AA, Solovyev VV. Ab initio gene finding in Drosophila genomic DNA. Genome research. 2000;10(4):516-22. 87. Stanke M, Keller O, Gunduz I, Hayes A, Waack S, Morgenstern B. AUGUSTUS: ab initio prediction of alternative transcripts. Nucleic acids research. 2006;34(Web Server issue):W435-9. doi:10.1093/nar/gkl200. 88. Majoros WH, Pertea M, Salzberg SL. TigrScan and GlimmerHMM: two open source ab initio eukaryotic gene-finders. Bioinformatics. 2004;20(16):2878-9. doi:10.1093/bioinformatics/bth315. 89. Trapnell C, Pachter L, Salzberg SL. TopHat: discovering splice junctions with RNA-Seq. Bioinformatics. 2009;25(9):1105-11. doi:10.1093/bioinformatics/btp120. 90. Trapnell C, Williams BA, Pertea G, Mortazavi A, Kwan G, van Baren MJ et al. Transcript assembly and quantification by RNA-Seq reveals unannotated transcripts and isoform switching during cell differentiation. Nature biotechnology. 2010;28(5):511-5. doi:10.1038/nbt.1621. 91. Bairoch A, Apweiler R. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000. Nucleic acids research. 2000;28(1):45-8. 92. Kanehisa M, Goto S. KEGG: kyoto encyclopedia of genes and genomes. Nucleic acids research. 2000;28(1):27-30. 93. Zdobnov EM, Apweiler R. InterProScan--an integration platform for the signature-recognition methods in InterPro. Bioinformatics. 2001;17(9):847-8. 94. Ashburner M, Ball CA, Blake JA, Botstein D, Butler H, Cherry JM et al. Gene ontology: tool for the unification of biology. The Gene Ontology Consortium. Nature genetics. 2000;25(1):25-9. doi:10.1038/75556. 95. Delorenzi M, Speed T. An HMM model for coiled-coil domains and a comparison with PSSM-based predictions. Bioinformatics. 2002;18(4):617-25. 96. McDonnell AV, Jiang T, Keating AE, Berger B. Paircoil2: improved prediction of coiled coils from sequence. Bioinformatics. 2006;22(3):356-8. doi:10.1093/bioinformatics/bti797. 97. Li L, Stoeckert CJ, Jr., Roos DS. OrthoMCL: identification of ortholog groups for eukaryotic genomes. Genome research. 2003;13(9):2178-89. doi:10.1101/gr.1224503. 98. Guindon S, Dufayard JF, Lefort V, Anisimova M, Hordijk W, Gascuel O. New algorithms and methods to estimate maximum-likelihood phylogenies: assessing the performance of PhyML 3.0. Systematic biology. 2010;59(3):307-21. doi:10.1093/sysbio/syq010. 99. Yang Z. PAML 4: phylogenetic analysis by maximum likelihood. Molecular biology and evolution.

bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046. The copyright holder for this preprint (which was not peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

2007;24(8):1586-91. doi:10.1093/molbev/msm088.

TIR_NBS_LRR 25

13

23 E23/C0/D0

MRCA

NBS_LRR

E11/C9/D0

73

E12/C9/D6

MRCA

9

75

Sl

52

Dc

A1

2g

19

12

tu

b0 g1 So 5g 93 lyc 0 80 tub 05g 264 05 0 05 35 0.1 g0 .1 90 26 05 .2 4 lyc g053 10.1 .1 05 .1 6 00 g CA 0535 .2.1 70 05 CA 05g g158 .2.1 90 159 So 0 tub 05g 0 Soly 026 c05 420 g05 .1.1 361 CA0 0.2 5g1 CA0 588 .1 0g2 0 166 0 CA0 4g01 Sotub 140 09g0 2804 CA03g 0.1.1 0117 Dcarot 0 a01623 7.1 Dcarota0 16246.1 Pno00051 5.1 Dcarota007 547.1 Pno017868.1

39

0

1 1. 1 . 1 42 0. 15 80.1 1 79 a0 2 1. 34 rot 0 07 80. 0 a 57 4 Dc 11g 011 g05 c 2 g .1.1 ly 1 A1 290 .1 So ub1 C t .1 07 70 g0 So 4 1 1 50 c1 01 25 ly g g7 So b11 40 00 tu 25 CA .1.1 So 0g7 40 0 84 CA 0 02 6 g 5 72 05 tub 00g So CA 20 725 00g CA 530 2 7 0g CA0 0.2.1 533 g05 c05 0.1.1 Soly 2843 05g0 Sotub .1 24 76 1 Pno02 ota009141. .1 Dcar 009142 Dcarota

lyc

ABC_Gpdr

09144.1

.1

CA12g07800

Dcarota026298.1 Dcarota007545.1 543.1 514.1 Pno000 869.1 Pno017 1 08727. Pno0 46.1 .1 5854 ta0075 Dcaro rota01 .1 ca D 622 018 .1 Pno 767 2 .1 2 0 585 .1 Pno 001 911 Pno 027 8.1 Pno 3830 o0 Pn

Dcarota007

Dca

rota

.1 2.2 42 08 84 01.1 11 o0 0104 0.1.1 Pn .1 ta 53 aro 017 10.2 c g 5 1 D 0. 05 18 tub 5g0 691 2 0 So c G 0 ly 2 4 So AT 122 50 1 2 1. 6g 12 0 1 0. g CA 06 472 .2. CA 02 670 6g 065 b0 g u 6 t So lyc0 So

3 3G

aro

Dc

aro

CA

92

02

Pn o

56

16

aro

g0

70

g0

ar

D

ca

ot

a0

10

ot

14

00

84

.1

1

7.

96

29

a0

ot

20

64

a0

30

ta

ar

Dc

1

2.

90

30

17

g0

40 90 2.1 g2 48 00 03 CA ta0 aro Dc

15

ro

9.1

90

ar

g1

67

35

Dc

11

8.1

20

90

09

9.

29

10

g2

90

ta0

87

CA

Dc

12

Pn o0 ot 10 a0 06 05 5. 72 7. 1 1 Pn Dc o0 aro Pn 02 o0 ta0 51 04 02 6. Pn 2 93 o0 0.1 47.1 1 36 10 3.1 Dc Pn aro o 01 ta0 22 26 0 5.1 95 6.1 Dc aro Dca ta0 rota 233 011 97.1 123 Dca .1 rota 0 1 Dcar 112 ota0 0.1 1111 8.1 CCG 0032 55.1

1

ar

8.

Dc

24

g1

CA

a0 ot ar

18

07

Dc

ot a0

CA

CA

AT2G37280.1

AT3G53480.1

CA00g62170 CA00g621 80 Pno038 386.1

.1 379 004 00 .1 Pno 10 55 52 .1.1 g0 .1 08 G1 70 330 CA AT1 598 g022 1G 12 AT otub S 20 463 00g 980.2.1 CA 0 g12 .1.1 c03 980 Soly 03g035 .1 b 6190 Sotu ta01 Dcaro 3.1 1.2 1318 a00758 Pno0 Dcarot 07582.1 Dcarota0 36.1 AT4G152 .1 AT4G15215 AT4G15230.1

36

.1

01

Dc

1

96

05.1

068

01

6.

1

.1

309

01g

CA

ta0

069

ta0

CA

Solyc01g101070.2.1 Sotub01g038890.1.1

ar

017

Dc

Dcarota0 05260.1 Dcarot a00526 CCG0 2.1 38595 .1 CCG 0154 01.1

Dc

AT

_D

276 Dcarota000

AT2G29940.1 Dcarota007372.1 Pno015467.1 Pno031281.1 CA06g1879 0 Solyc06g 076930.1.1 Sotub06 g03271 0.1.1 CA02g1 8750 Solyc0 2g08 1870 Sotub .2.1 02g0 2518 0.1.1 Dca rota 0072 CA0 82.1 0g8 000 0

Dcarota026296.1

9.1

BS

Dcarota0

CA02g18760

Dcarota026297.1

44.1

Pn

50.1 .1 So lyc 12g So 019 tub 620 12 g0 .1.1 20 34 0.1 .1

P

So

Dcarota0075

52

o0

0 no

So

So

Dc

5_

ca ro ta _N LO BS C _O P _L Pn P nPoPn0RnR Pn s0 08P o no Po0o Po 83no L8Og 4.1 01 P0n2o4 0P2PP3nn0n8o0o08803 PP0nn2o5 C430 _9D3P P0P428 nP9o0008858 noo0012 _O0 c5a9no nno5o61.1Po0n1o0400838.1830_4 0226551 .10_2 PP0n02327_no310.38880.1 ro .1 s00.1 taD _31D33D 3.1 5 D 3P _a5DP11n5768o6_.1D cD 72_.1 8g_O _5C c1an6o2o0051.12030_c742aD55ro .1 5 00.12_c c 2 .1 r c P nno.1 3 ___DD Cro_.1055P16.1.1 octa 43s __5D aDr 1o9a030_r3.2 raccora.1 ta__D78n3c_16ata oData D 0a1ti otaC _D _ Pn N ro _ D1coccata BS o c .1 _ 1 3 D 5 .1 N C c 0 _acNrrta a4ro_ .1c_5ta o0 cBaSro 0v.1a _15D5D acDaro C C_C5C r4oaooB_ta __C 25 2_ta _a32cN rootaN __CaNCr.1 D1_c1ro ta LLOO cNta taD_B 5 .1 ta r a S B 5 a ta o _CC_BC .1 B _ 8 OC a 7P7 C ro _S_Nro_S___ta ro _N_ta __NcSN LO ta .1 __OLOO sa_N LCC n.1 CSN_B__N C _taND O B__L_BDSND o0_SPC BSLNSaBBBrS C _ta _c_BcRN LS Cs_Ls1OC a R tivB S n N N 1 3 c C a LO _Os1 _ r _ 2 N R _oN 2A3No02BSaBBro OC _LS R ta_ _LB_N BR_SLoRSLSR aS__ LB 2O s2g1_gO Rrota L1Og3 LC O _RLRL_RN 51_S O_COLsOLC _Sta 8.1 _R__LSBN RLta 029sg92s02869g P 2.3 _R NLR CC L 0 R R L W _ 1 0 _ R 5 _ S _ R 0 R 0 N n 0 BSR RBR_R O 1_gO OO50 LOC 7 o01 _5S_A7N.1RRCRC R g4.1 PP7840.1 SRSLRRRBS 0Os0 2 NB_D _B _Os0L s113g0s201Ls611sg1.1 _L LO Pnnnoo0001_028_O g1L2_g9O N .1 7 7 0 2 .3 C_O9OC LO S .1 s R 9 s 3 O 4 a 1 c o02315038_3O 92.1 ati_v O 53 W R _ a BS 979a09tiv C00_O g14 _OLs0 s0 OCC R 123547002ti9.2 .12O_1Cs1 0.1 LOLO a__N 0_a.1O __O sO .1 s_NaN.2_SL5R_ro ava_tia_O ta_ 8s.1 R 90.1 C1g 9 s .1 ti N C4_O 33 _ s0 _ 1 O v 1 ti 1 _O O 3 v s 1.2 s03sa vtiSaS AN BSNB aO 68g_ a_B sB LOs0 2_1s9agO .1___S gg3gtiv 0sa as0S DSaAcB 4.2O2_ 723tiv Nv_B_a_LNL .3 _LS A 4g7g Cs0 ti3C S 38829a35N _O N vStia.1 _08B aN 0sa 1v__ti_1aLv0_aR_R 43 A a 0 17 _ O R W .1 ro .3 S B .3 R N R s0 34 N a C _ 3 N R LOC3g25 Pno PLO SR 5_ R W W__55LB N .3ta _LN O 0.1.1__O R.1 tiv B N B NO B saS __O W LOC_ C01 200. 1_ R B S 01no S_s _ S a_ _O0. 1_ sa _O B _O tiv _BL NB 75 84 O0. sa 5N___RB S C N O1_ NR LtiR tivLS s05g Pn Pno0 OsLO s0 sa sa a L 88 a C 61 sa tiv R S S _ B R N 5g o0 R a 09 _N tiv .1 tiv N 11 vRaR_ aa__N _RN SL__R B_S R _L 07 4129 _S.1 B a_ Otiv LOC_ g2 BN C_00 Pn7. S 36 91 31 LOC_ 18Pn saa_ A41 SS B _D C Os40 _LLLRR NB NBB N.3 _L_S 4. 0. tivC OsC_ RRR RR LR 0. 09.1g2 ca 1_ LO SSS B 21 1_ o0o0 _N 1_ LO _O 02Os 1_ R O a_ W _ S ro SA 69 13 D R s1 g1 R L C_ _L _ O sa 5_ 00 B R NBSSR taCO ca 0g36Pn12 81 N6. LRRRR sa LOC_Os tiv.1a_ Os 20 sa _N LO 02 .3 ro 1_ 40 _L 8. C_N 09 tiva_ RRR g1 W C_ ta LO .1_ 2_ SA Btiv 27o0 34 _L _O LOLO NB g2 80 5_ a_ Os S 12g305 LO C_ SA Os 0.1_ C_C_ 91 sa 00 N_N 70 S_ B_L 02 N Os C_ OsOs LO CNR BS ati .3 9. tiv R .1_ S_L 30 N. BS g1 LROs 04g R 02 Os B_N C_ CR O va 1_ 90LO _L a_ .1_ 3W 80Os S g1 sa 247 Os 04 .1_C_ _L R ati _N 06g SAW NB LO R 0080 ati 80 g3 tiv RN 04 OsOs 00. 5_ 04 BS R C_ 157 R_L BS .1_ va S_ N.5_ 06 RBS g3 ativ a_ g3 1_O Os R R_L NB 50. _N .1_ 90 06 Os 3W 06 05g vaLR a_ RR NB sat 1_O 60 .1_ 10 BS Os NB LOC_Os ati S_ 125 _NRBS iva .1_ 5_ _L LOC_Os11g395 .1_ sat Os va ati 70. _C LRR iva Os Os _N ati va 1_O LOC_O C_SC_ R R 07g33730.2 _C ati _LRR S_LRRNB ativ va _N R BS NBNB 90.1_Osati sat LOC S_ LOC_Os LOC va _N a_ BS S_C_ _Os0 iva s03g37720 _L _Os0 _N LRR BS NB LRLR _C va_N 2g272g27 RR _Os BS LOCLOC LOC S_L RR BS 540.1 RR 037 680. ativ07g NB _Os _Os12g _Os _Osa 1_Os 50.1 .1_OsaS_ a_N336 S_LRR 07g3 07g tiva_ _Os ativa 337 BS_ NBS _NB ativa tiva_NB 40.1 LRR .1_O _LRR 90.1 S 3720 _Os _NB sativ _Os ativa S_L a_N ativRR _NB RR BS_ RR a_NS_L S_L LRR LOC_Os04g53160.1_LOC_O BS_ LOC_Os04g11430 LRR Osativa s04g5 .1_Osativa_NBS LOC_O _NBS_ 3120.1 s04g220 BED_L _Osat 90.1_Os LOC_Os RR iva_NB ativa_N 04g5305 LOC_Os11g29520.1 S_BED_LRR BS_LRR 0.1_Osat iva_NBS _Osativa_ LOC_Os04g LOC_Os02g02 _LRRRR NBS_LRR 52970.1_Os 640.1_Osativa_ ativa_NBS_L LOC_Os04g53030.1 CC_NBS_LRR LOC_Os02g02670.1_Osativa_ _Osativa_NBS CC_NBS LOC_Os06g45690.1_Osativa_NBS_LRR LOC_Os11g24170.1_Osativa_NBS LRR Pno013210.1_SAN.3W5_NBS _Osativa_CC_NBS_ NBS_LRR LOC_Os11g16500.1_Osativa_NBS LOC_Os08g20000.1 LOC_Os08g19980.1_Osativa_ S LRR _LRR iva_NB NBS_LRR _NBS_ ativa_NBS iva_NBS NBS RR _Osat _Osativa_ 15700.1_Os Osativa 0.1_Osat BS ativa_ BS_L 2g29710.1 9390.1 LOC_Os11g 380.5_ ativa_N va_N .1_Os LOC_Os1 s06g4 s06g49 _CC_NBS Osati 60.1_Os 04900 LOC_O LOC_O NBS ativa 96.1_ s06g49312g2969 Os07g S_LRR tiva_ BS g534 LOC_OLOC_Os 1_Os LOC_ _NB _Osa C_N Os04 060. S ativa 270.2 a_C LOC_ 2g16 _Os 2g16 _NB _Os0 50.1 _Os0 .1_O ativa LOC 022 LOC 6330 _Ossativ 01g 80.1 02g1 _Os _LR RR R LRRR LRR _Os 022 BS LOC S_L S_ BS_ _LR 01gBS LOC _N NB BS a_N _NB _Os iva R BS BS_LRR a_ _N _L ativ tiva sat R LOC LR _NRR iva ativ BS _Os Osa S_ 1_O _Osativa_N sat Os .1_ _N 10.2 80. NB _N _N RR SvaBS .1__LR va 2_O 340 a_ 573 _N va vaati 572 _L Os R 20BS ati 70. NB g57 01g iva ati R RR ativ ati 01g 02g16250.1 02 BS .1_ Os 572 R _LRR s01 C_ sat _Os Os Os g3 _L Os _Os _N LR Os 80 .1_ C_ 01g 05 C_O .1_20 va LOC 1_O LOC BS .1__C _N 45 SS N_N BSR S_ LO 30 .1_ Os Os LO ati 30Os S R va 10. _LRNB 90 g4 BS C_ C_ 50 45.1_ ati NB 11 67 352 LO g0 va g0 LO 01 Os a_ RS_LRR g1 Os 04 _LBRRSRRa_ 02 11g ati NB CCS_L Sa_a_ g3 N tiv 06 BS RR R .1_ Os Os C_ Os a_ 02 sa B 00 Os C_04 LLLR R C_50 LO g024 _N tiv Rtiv sa .1_ NBBa_ tivtiv S _L Os LOC_OsLOOs 59 N C_ LO a_ _O S_L tiv ONB sa SB__S CC C_ .11_ sa a_N tiv O1_ 09 1_ 90 sa O a_ 04g2 O 10 LO30 sa tiv Ssa aN 0. N_BBBNSSaa___NB Oa_ BS tiv LOC_ C_OsLO O46 27 0. 04g3 ___N 1_ sa 0. NB 1_tiv aO g3 2_ 68 Osa O32 0.N satitiv 10 34 LO LOC_OsOs 0. a_ sa 12 a_ SBS _N tiv_aativ tiv 1_ 1_ 2g S0. 22 C O 10 tiv tiv 0. 18O0.sasa34 2g66 a v tiv BB10 C sa s1 32 2g sa C_67 s1 NN71 R O_sa O a_ a_ O2g O _O 0.1_Os s1 32 O5g 0___.1 0.1_ a_ C O tiv tiv 2sa 7.1 s1 102g C_ 1_ 2g .1 _O .1 9 1_ tiv 6 O s0 05 sa s1 sa _LR 0. .2 LO s1LO 0 0 C 2g 6 LO 1 0. O sa O 0 O 6 4 C_0.LO 39 s1_O20 23 OC 1_ BS R 1_ Cgg_O 10 61g16446830g03211 1_ 0. N _O LOC_ LO40 34 2g0. s06g LO LR LO 36 6g_444O1s0 s133 g O6s0 10 __L_R vaS 10 LOC_O 6C s0 s0O_5g S _O 2g 10 2g OO_s0 C2g aBtiB s04 s0 s1 _C s1 LOCC_OOLCC LO _N s1 O_asN C_O_O OC_OLOOC_O RR LO LLLO LOLO tiavti_av LOCC_O L _L 7s0as.1 BS O 1 BS _O _3 4.1 g.2 _N 4 a 0 _N 0 v 0 9 ati Os0270 W5 Os C1g_14g24 N.3 .1_ LsO1s1S R R 510 SA _NO __O LR CaC _B R_LRR 1.1_ g58 O O S 1 R L L v S B 0 S LBRRLR 13 Ss sa_tiN LS_ 00 N_BO _R_N NSB _BS BRaRSS R aO_C _O W5 0 _NLLvB .1.3 N 5_B BR NR no sativL vBasSS_a__tiN NR L_R W_ 2S8A0N _C O _O SCta __L P C.3C s_a_a_NtiN tiBv5Baro g3.19_ 0.1 0vaa.1 _O BSS s cNN a _BS taA_N s1171 153 BS _ro .1 O sa5ati_2tivC CtaD_W _ _O 6O 333 7g0 _SN 5_c.1N 500s8.1 a N_.3 _.1 5BS OnCo0 .1104__g1OW R Os0 SD9 LP BS g84s002.1 NNB 85c7Aaro BW _5DS RR C_ .3 .3S__W 2 .3 0_2 L_R _N s411_13O 07N N .1 1 .1 N N R R 4 ta 5 LO _ C B ta _ g S 8 2 R 1 A LL Og C 21 A N roB RSA5o560 ro a_SC C _ta 0_L4O4s00_3Sn4o030.1 W ca__a_SN.1N_.3_LBS SS__ cS Os03_gO.100P11 ativNBLO 23n ta 5_.1 ro BS P _DD NBBR ta Ovsa_OC__LOOs9C5P7no6o009 4BA5N o0 9130c.1 c.1 aaaro_0roN RtaL_R_LRNRRRR 5_N .1s_ati LO 2701D C 1 n ta1 _9SPn R BCSC 2_0O L_R o01.1 84_7_a_Dro L Pno0 P D0c.2 _5N BLa_BSSrBoS_RSR__L .3W 0201.1 Po2on0201022.2.3 c 0n4o0 5WN_cN BNRRS N B5_S 04g79 9 P1n0n40_7DP NW __taD_N__N Osg50 _SLLB A _.3W o001BP612S03.15o500 N.3ota C_05 BS WABS5NS .30Ar0oar.1ota rCBta CNSS_B _S Pnn_o3N n B 5 0 N L_OOs P N o No_ 5.1 PC2n S A_82cDcaca_a_ta C 5_ N5S_.3.1S0N RrN 0 ta LO N_BnCo0P cCRo _6 _D_Dota BWS 2A.2 .1 aLr 52 7n03o6.108.1c_a.1a_rcro_DC taro_taP _N.3.10_32.3S583W 5N ctaDSBS 02 ca aro 0178P803_54_3Dr6ND1o_B.1 SA 205A62o4N N no _W D_cD .1 c38a_ta.1_ P N.1.3 9n5_o0S1n 001470.15ta .11_.1 3A6 01Pn.1oPono03o040_3r22oD 2162 3r2o 6_4S no9P8 PPnPn553.1 oa5a 021 30.11 P159 0022P2oD9n_0cD2 c 0n1o o 3 o o o 3 0 7n n_ PnP PPn2n5.1P3.1 13P Pno o0 o03323 Pn P2n05235 on0o PnP Pn

AT 1 AT G66 2G 95 0 CA 3638 .1 So 00g 0.1 14 So lyc1 8 tub 1g0 20 CA 11g 670 So 12 024 00.1 8 g .1 So lyc1 055 10. 1.1 Dc tub 1g0 80 07 ar 11 g ot 3 a0 011 00 . 1 19 4 01 60. .1 1. 2. 1 1

C

CA

So

72

15

98

50

CA

07g

189

80

RR

S_L

NB TIR_ R

R _L

BS

BS

N C_

BS

_N

CC

ta_

aro

Dc

.1_

6 43

21 BS _N _C o0 BS _N _CC rN ota Pn C R a_ _CR ota RDcC taLR ar R _taL_ C rSo_ Dc LSRBB8SrS5o.1 R BcSaBS R.1RB__SN__B 8Nc8a_LR _N S 0D C C B __NN_DNBS_L3RL3_CRNCC _ 0 S N C o B Co.3ta_BSBSS1S3C _ S _6_n.1 RR ta_ R RBNB1BCtaBoo_8ta ta8P _N _6Co7rRta _L _No0N_r_oaNar8r C C __N aro LR ta5a_crata L_taN _ta ac0c0 _NBSS NBS Dc S_ c ta CD r0o_4D ootaaCnDrC_o_cD 5 .1_ NBRR RR BracSoraror_cCP_.1 coa.1_0aD _ no Wta__NB CC_ RR S 14 a__L D59n.1 R SS5_ 97.1PNa.3Rrota _L_L _8DNcDccata5oD.2ta _7Pta SS NB 58 ativBS cLroNBBW R __D__Dr.1oa5_8r888SA BB o57.1 5_ 00 s_NR _N.3 810.12r88.1 _N LR .1.1ca34003c88 .1 _N _B_DSDc_5a5_N W PnBoS.1a_tiOva_LR S vtiava BS_ 50o261c01282a971228_o8.1nDn08o_o280D00.1 W A .3 _ ti R 1 S a N N 0 n 4 W a s S N s R NB .3 N 3 D B _ 1 _ 2 NRR_L OO s _ 0nP_o1260.1n09PP 10513ta2.1N.3_ _LRS .1 va_ SAativ3a7.18_CO _ _arota _R _R o7o0Pno.1 0o00201n80o8P7030a8610ro2_S0SA6AN3.1 BS B .1 g 0 0 S C o sati n 0 s .1 c N 5 6 n L 6 9 P _N 5 o R _ B__ 4663 _D 2 O 03 N _ PP2PP7n0Pn000 PnonoD0c61.1.4 _O W5 _B 16_L _Os085Nta _ 34 roSta 1g2 .1 0.1 N.3 01nono P.1 SBS 14BS0.1_4gc5aW_ro P3n91n46o0 cas0s40Bg4S3526 76 SA S N_BN .3 o0 P P 42203016P o0_N012OsC_0NDW 36 O 1 _D_O .15__NB 1g vtiav_aS A7.3.1O Ptinv1ag1C1_L2OS.1 S3378W C_5n_oN0 ti a Pn 01P1nnoo0 1 C B s N a s 2 B a sO N L.3WP O S s s1 O911.1 _N 1 .3 o P S3_A5 LO C__NSB07.10_.1O_tiva_ ativPano_0SAN BS _O_O o054L0.1 0o0_1 SAN Pn O a s .1 L .1 9 ta 8 C B N s n 9 _O 796S1 rota_ 30 O P0n50P7 .1_ _Ng0338S_O aro cW 85 L Po0n1o9 776 Bs95S05g58_33N0B.1 6680.1 _D ota05_3NB nW 1 _Dca _.3 g5 P _NsO0W N 0.1 55 g4 Nca .3ro _CO Ata 1.1 Pn no00 01 37 SroC 01.3g Os02 _S_A D 520 AsN Os P 08 c_aOLO 9.1 o01 .3 o0 _5D.1L__SO C_ NBS C_ Pn038.139L9O.1C LO 3051907623 Pn BS W5_ S LO .3 0o o7061013 B R SBS a_N _SAN tiv PnPon5_NBNsa _LR W5_N NBS oP0n1 o0 .1BS NBS 05 a_ O R Pn Pn W N a_ N R tiv 92 A .3sativ a__L _S tiva_ N.3 00tiv sa0.R1_ nosa A1_ O38 Osa .2 BS1_O1_ BS R 1_ _N P O5_N38 _S 96 16 ta 0. 0. .1 _L 0. ro 1g 6 S 89 W 59 0..3 no03 78 ca7g02 s0NB 49 NBS 6445 00N 371g 29 W5_a_NBS .1_Ds0 s0.34g C_O _SA 63P o0s1 tiva_s11g _ON _O LO CSA PCn_O tiv 94.10143 C OsaC_O sa LO _NOBS SS ta1_ 5.1_ 0.1_ LO S 0253Pno 13LO ro0. LO46 21 NB NB ca NB 5_ 5_ 95 D Pno a_ o0 20 71 3W sativ 3W 1_ N.N. Pn607. 1g 2g s0 SA SA _ONBS S O2. .15_ 20 1_1_ C_6. 00 _Os0 o010 09 LO Pn 5_NB 3W.3W 248019 02g1 S LOC N.AN o0o0 NB SA_S NBS _CC_ Os PnPn BS C_ _N BS 2.1_3.1 rota_ va va _N 12 LO10 SS ati ati va _Dca .1_.1_ NB NB ati Os BS Os 3577 5_ a_ _N Os ativa_NB o05.1 Pno0 rot BS ota .1_ 70 Pn04 ca.3W _NBS 30 _Niva car AN 8030 57 18 _D ota _D 05 _S g0 52690.3_Os g3 _NBS sat 4.1 g3 8.1 Pno0 Os 03 S 9.1 08 3.1 82 1_O 68 10 85 Os _NB 20 499 rota_CCBSLOC_Os04g 80.car 3.1_D 17 05 Os W5 C_ o0 o0 C_ o0 o01 019 937 C_ Pn LO Pn .1_Dca Pn LO _Osativa_N S SAN.3 LO 38830.1 Os Pn _NBBS .1_06g 030062 C_o00 3W5a_N 012 LO Pno AN.sativ 11g 038 .1_S Pno 714 .1_O LOC_Os Pno031 8470 08g2 _Os _NBS AN.3W5_NBS LOC .3W5 _Osativa_NBS 5.1_SAN Pno005998.1_S NBS 0984430.1 Pno06g02 sativa_CC_ 0.1_O a_NBS LOC_Os0 g3902 Osativ BS Os11 5_NBS 010.1_ ota_N LOC_ AN.3W _Dcar s08g45 442.1_S 8348.1 LOC_O Pno02Pno030 arota_NBS 93.1_Dc 3W5_NBS Pno0292 1.1_SAN. Pno02167 sativa_NBS g39570.1_O BS _LRR LOC_Os08 _SAN.3W5_N Pno025552.1 Pno018589.1_SA ativa_NBS N.3W5_NBS LOC_Os12g28050.1_Os Pno005919.1_SAN.3W5_NBS Pno030226.1_Dcarota_NBS

LOC_Os06g48990.2_Osativa_NBS_LRR Pno017303.1_SAN.3W5_NBS_LRR Pno007540.1_Dcarota_N BS_LRR

LOC_Os11g4 LOC_Os11g2903 LOC_Os10 6210.1_Osati LOC_Os 0.1_Osativa_NBS g04180.1_O va_NBS_LRR LOC_O LOC_O 11g3932 LOC_Os1 sativa_NBS s11g39 1g39260. 0.1_Osa s11g29 _LRR 160.1_O 1_Osativa tiva_NBS 050.1_ sativa_ _LRR Osativ NBS_L a_NBS RR LOC_ _LRR_NBS LO LO LO Os11 LOC_0.1_O C_ LOC C_ g4342 C_ Os11g _Os1 LO Os LOC Os 42660 Os LO 1g43 LOC_ _Os sativa C_ 06 .1_Os 06 06 320.1 C_ 11g4 Os11 LOC _CC_ LO ativa_ g0 g0 g0 Os _Osa g397 3410 Os NBS_ _Os NBS_LRR LOC LOC 63 63 64 C_ 30.1_ tiva_ 07 11g4 LRR .1_O 01 80 _Os 90 _Os 00 LO NBS LO Osati Og4 sativ 3480 g1 .1_ .1_ .1_ 11g 11g _LRR va_NBS_LRR s0 C_ a_N C_ .1_O 08 LOC 63 432 433 Os Os Os BS_ 3g Os sativ LOC 10 Os _Os 50.1 70 ati LRR ati ati LO90. 30 a_N 01 _Os 07g 1_O va_N _Os .1 01 va va C_O BS _Os 347 91 11g _O _NBS g1 sati ativ Os g1 50.1 s11 370 va_ a_C 0..1_ 63 BS sa 64_N ati LOC_ LO 40.1_Os g43 1_ NB.1_ C_N 90 _L tiv _L va 00BS C_ _L LO S LO O RR .1_ BS_ LO a_ RR Os _N _NB RR .1 Os C_ ativ sa C_140 LOC_ C_ LRRativa 11g _O Ossa NB a_NS_L BS 06 Osa Os LO tiv Os RR 11g BS_ 11g 384 g1 atitiv tiva _L C_ S_ a_ 05Os 64 LRR 385 370 va 40. _CC_NB Os RR Os01g g3.1_ NB LR 50 _N 50. 1_O a_ 11 16 RLR S_ 1_O g320. 1_O 10 NB BS sat S_LRR Os 85 .1_ satBS sat iva 2338 _L S_ 80 ati Os iva iva _N .1_ R va RR LR _NBS ati _N _C Os BS 0.1_ va LO LO R _LR ativ C_ _LRRR _C _LR LO RCC NB a_ C_ Osativ C_ LLOOCCLO S_NB C LO _O O _O _NBS_L LR S_LR LO LLO C s0 R LO a_NB O LO_O C_L_OO LO 6g Cs0 LC R C 8g 6g RR CLL_O _O s0 O 06 _O CCs0 s0 06 O LO 07 S_LR _O 8g _O 86 _O CsC s1 s0 s0_C O s0 85 21 92 0. 07 _11O LO C0. C _O LsO s1 LO O LLO_C s0 g LO s0 8g _sO 0. 1_ 8g 7 _O O R 93 3 g 2g C O 1_ C 8g C_O OOsa 28 1 28 0.s0 1_ 221C 2542 8g _O s1 30 _LO LCLO_OC 07 s0 1_ gsgs1 _3gO 17O 28 sasa 54 46 tiv O0. 2g s0 2s3C 08 94 O tivtiv sa a_ 7L100O.1 57 0. 7_13177g7sg411712535g158 O OC__sO 8g 30 0.8g LOC 0. a_ NB 0LO tiv 1_ _ 1_ 07 07 1_ s .1 0. L 07 1_ L N 8 7 O 7 1 O 0 7 L S_ a_ a_ O C 77 0 O 1_ 89 _ BS _ Osa g8C3g7s0061.1 g0130 0.O C_OO8sO sa OC 003.1 LRRR sa N1_ sa 0__4O C 4.0.1_ 1__937 1_ _L Osa O.1 BS tivaO sa C 13O81.1 tiv tiv tiv sa Otiv s0 Ossg1C _8___O _N _L RR O a_ 120C g08O a_ a_ sa tiv _tiv sativ _11s.1 LO O107.1 4gsa R O s2_O B 4O Osa NO a__NN tivN NN LLO a_ tiva_ Sa_ BS C7BB sO Bsa B a_ titi.1 80ggg6004ss4s031s11.1 0O_1aas0O a_NN tiv 1 .1 SS NB C S C0C tisv1aa1tiv NSB _L BS __N _L O aa_v_a_O 0vsvti.1 29 1411112g_ggg1.1 11133_7aO S_L a8B_0 R LLO RR _LLB S_L SR _L O L_CLO a_ti_vNN aCN RRBBS _LRR .1 R R _L R O L__OO a S 4 3 N _ B 8 O 3 _ 0 R C .1 s OC s _ 4 ti _ 1 5 O B N S 4 5 s S B R .1 R R a O_saLR 1C R vtiavs__aBNS R LC _aL_RN 4.1_s_O S__LS LO a3ti410000.1 _s111_Lg11O _LCC .1__aON tivRa .1 Pno L_O R 3g4C LLOLOLO _stiCBvB_S CO L _RBLLSRRRLR 3 _OOssva.1 OsO aS OOCO s_O O aa___O 0048 NRBRRR R _NBS O1O O CC__s_1C sasaatiaCtitiv__RN 2_sg1gs04009134g_1278Og08s0satititivvN aaO LC 35.1 _BsC S OC_C_LO_O_O SNCtivvavaaLa_R _LR 131s g0 .1.1 Oss2gssO R _Dca SB__LR CL_OsO _18Og_vOa__N _BN 020232g601g367403g2105148.1 R _C LLsC LLs1OO_O CC L R rota 30sa4s BB_SN__B C CC 177.1 90_2O 69060.1 LLOOO1CsO _CC 1_O 1CCC1g_sg21gL6813O _S __RR ti9a_v7ti0vS_LLS .1_.1 sO 223_0600_.1 _NB 171 597C BBB aOtiO a a_ RRRNNN 3C_g_3_O CC1Lg1OO SSS O O0_.1 .18.1 S ssva_as.1 __LL N_BNOBR __OOgs74O 137OOs78ssg8030718709_.1 0O s_a_sO RRRR atitia_vtiN S s S v 1g10.1 gg8717.1_sLO 1_O1atiOs_vtisaaO sCs017_194O1gs701134.1 a v _ a a _ _LNtiRvLaR g ati_asti_avtia_B_CSN C 81gg4s77_3g8.1373_89_7O03.1O RS_RN a_v_Ca_C_B_LBR 857s26a0_sas3tia_v2tiOv1CvaCC 00713.11O5_07sO _CN _NNSNB R_ B a79tis04.1 ti.1O avs71_01C 36L3g54O.1 .1gC_C B__BSBNSS SLRSR_L tiv_v_a_s_Oa _aN 4LO s1a_vaOa.1 .1 N 00.26C.1 R 3 N ti B O O a _ v O 2 _BLSR R O.1__4ti0vs__aCOs_NssNB SS aaati B _ 2s1LBa0LtiSR C OO.1 _RLR _Osas__tiNCvas_aatiBvtitiSS_vv_aaNL_B L_R v .1 R RR va__LRN _NaB S_ RRa__O BsN sa01ati9tivO C N RBRB BRSLR sCa 1ggva0a7Sa_ti_LBS_NSN 2_7N5N8vBaR_R_BLSSS___LLRR _tivNaB LRRR _SC_ 4B40SS.1C R_R L CL_R 0.1 __L_LCOR_ RRRR NRB _RORsRN S_ aB sativS_L LR a tiv _ R R a_ NBR N S BS _L _L RR R R

RS RNB S _aL_ NB BaStiSv a_ tiv _NNsB sa aO _O ti.1va_ 0.1 sa4ati0v 00 8 _5Os4 g3 g_0O 08.1 11 Os 0s40B.1S C_ O59N S g_17a5_ LO NB 0g5Cti0v O BS a_ BS 0Lss9a N v R ti _ _OsO R RNRR BS LR sa va C_.1O_ LRRLvRa_R R _N R R S_ _O LLRR sati RR RS LOC0 LSR_sSRa__tiRL_R LR _TIR NB30.1_NLBR B SSL_RR O _L S_ L3O6 B_NO R W5 _NBLNBSS_B_LR .1_BS S R Rva_ 6N1RBvaS_LRS NRBRR R 33 _BNS_S_N _LR N.3 BS a_R BSLRaRti g4L_R B_aa_N_BS S5a0_BN_SNLBR BS _SA v L_RLR LR ati S_NBN 6g vNav0a.1 _ R_RsRR11Sv_as SBa_a_ atiSB__SL S_ B R v O L O ti s S N s B a_N16.1 S s0 svasatia4_ti_tia3vaN tiativa_v_NB 2N8aB2tiav_BvNaS L B N _ N _ a _ __N B __L_ONs.1 as_astiavti a__vO_aNN_BB O aOtiO _N g_stiv_aNtiv_aSN .1SC ativ 32 a_vS 1tiOsv7Oasasvtiav 1v2aO B N0B BOtiS.1 _s _gaO v tia tiva BS _NOBsnSo01 v3a_a6Otis0vaNti_BO_O 9a0tis.1 C_ .1.1 ti _savOsa_tisaN vtia4_3N s_.2_Os_asti Oas.1 2_ssOatiavOsa C_N70.1 _P 102.1 9OO 18svOa08_.14.1 3_.1 aa 2v6Ca_ sL1ga_0.1 _ s0O.1 LO 09_9_02O _s0_sO.1ati_ti_vO 6 W5 g.10_S 2 aOti5_84 .1_ _C R2R.3 7sa_gtiC1_s101O8.1 8.14O8090.2_O _CO 00724_.160Os0.1 5g001sg122 s911930103.1 _asOO B810 tivaS0_7SLgA1N g970_111O v0s2a10_1.1 25911C07170_1475.174300.1 L.1O 22814.1 05_08O s_ 14 Os_aNOB 9931C_gN 4.11s C1_33gO 5g0O a_1ti9sC _s.1O 3C 86gL25gO4g21gg647 0220g15g04.1 _ .1 0.1 _sO 1O Og08_45C4O 11LO 1s11gg_sC08 0.3ti_vaC 006 g0gs41s11s21g17121g16 28s1111sg34085148g7L006O _9O 1C_gsg_O 41s5C O COO_sLasO 3Os2O ti1v_aO 460_7Osoa0L2O8 gs800g10.1sL1OO 0OO R BS __ s C 1LOL _ssO1 2g_sO0O 2 1 _ O g n 4 O 1 C g C .1 s___C1O O L s C_sC P L CCO 2 1 COs_C _CsOC81_sg_sO19010C a_NBS_LR S sO1O _OLO O O LLO .1_ Os71950 _ _O LO N R NB _OLOC L_LO sativa_S LCLO g0O C LOOs_C0OO RSR _O OL_CC C 60 _ 0 Sa_ B RB NLBR .1_O tiv 11 LOsC LOC OCOLLC_OLO S__R C1_2 LLO 08g RNRvB _N S_L LLOOLO _N sativRR RR g3 B OSCsa LO L COs ti La_RLR08a0_.1NS NC1_ O S_asL_aS _Ba_ CCO BS_L 12 a_ LC_L Btiva_ v_BNB_S1g46tiv 0tiv 0. C_ O tiv B N a_N a_NBS_L Os 66 a__tiaN _8O sa sa 5Oa9Osa LO a_O0Ns.1 s1_Ogsa R 30 LO Stiv 4 C_ ativ S tiv sa Rsa O.2 tiv BN 8g O v4_2tiv 1_ 1_ 10.1_ RR N _L sa ti7.1 C_ LO s0 4. Sa_ 1_ 0. OB_L a_ s1 a10O 0_79 O B0. Osa 0. 5.1 37 93 O 1_ _O tiv N _sa tiv BS LO1C5045 R 51 C N O79sg0_.1 18 sa sa 45 a_ 111g 20a_ O O 22 61g_2g tiv LO g0O S_SLR NBS 31tiv 1_ 1_ g2s0_90090.10 0g 1L sa s1 s1 a_ sa 0. s1 1 0. 2g 4 O NB 9.1 56074O s1 tiv NB O g 79 s1 43 _O a_ 1_ 5s1 _O O _O 9g7sC CC_O C Osa 0. 11 tiv 00g_ _s0 C 27 tiva_ 0.1_ NBS sa 2O 1g gO_1Ls1 LO 1g 83_O OLO 0.1_ LO LO Osa s194 11 LO s1 tiva_ 14C s1 LO_C 1_O 91 02C 1_ C O _ sa 1g O 0. _O 09 0. _O O O C 20s0 OsL_C s08g 7g 94 LLO Cs1 46O LO 09 _O _O 0.1_ C_OLCO 1gC_ 7g LOCLOC 2229 s1s0 LO L O O LO 0g C_ C_ LRR RR RR C_NBS _Os1LOLO S_ R _L RR NB LR RR RR LOC sativa_C a_N _L BS BS_L a_ S_ _L BS _L BSS S BS _N 1_ORR tiv va_NB BS NB BS _N va va_NRR _N 10. sati sa BS ativ ati va a_ va _N _N _N 573 1_O S_L _O BS _Os atiati Os ati tiv va va iva Os _Osativa_NB 50. 02g .1 LR C_1_O _NB 00.1 sat Os ati ati sa 40.1 Os RR 395 .1_ Os ativ S__LR 50 tiva 396 378a_N 00.1_ Os .1_ Os 11g BS 20 _O _Os .1_ NB 79 11g 06g Osa 38 00 90. LO _N .1 _Os .1_ a__L .1_ 30.1 00 .1_ _Os g1 g1 7979 iva g1 372 59 30 128 80 LOC ativBS 35 08 850 LOC g1 va_N LOC 06 sat 06 12g 04g_Os 53 79 78 Os Os g0 ati g14 06 R Os Os _Os g1 1_O g0 g1 .1_ Os C_ 06 s08 Os C_Os 06 C_ 60. LOC 06 06 C_ LRR LRR .1_ LO LOC_O C_ Os _NB 9710 R S_LR Os LO BS_ BS_ 50 Os Os ativa LO LO g0341 LO C_ a_N a_N S_LR 41 C_ 09g 1_Os 12 C_ C_ sativ sativ g3 C_NB LO Os 280. LO LO 09 .1_O LO .1_O iva_C C_Os 2g37 _LRR Osat Os 2710 2700 LOC_ _Os1 _NBS 70.1_ 08g4 LOC 08g4 sativa g426 _Os LOC_LO _Os 0.1_O Os08 LOC g2750 LOC_ Os02 LOC_LOC

20

Sl

W

N

25

C

RR

S_L

B C_N

LRR NBS_ sativa_NBS_LRR 1_SAN 90.1_OS_LRR 28va_NB 8g3.3W5_ s0_Osati C_O Pno00 O2880.1 s08g3 L5152. LOC_O RRR _NBS_LR BS_LS_LRR _N ota_CC_CC_NB r.3W5_CC 5.1_DcaAN.3W5 __LLRRRR 7800.1_S SSR _t_iN aaaS G019657.1_SAN ttivisvB CPno007 Pno0006 C O ..4 O 6a__ 2s2 R LBB N m v9tsasi_va_aO _ fU 1 .r._ fgsg vN 6.m S 6h0.e U_yC 3sghn hhLrrC A 0esn N CO C R m h0R3..N s45R LNRBRS_LRR ge0se __ Sa L .1nf1e B_ n _N aN s.t2ia O hnO _ 1O .A C 4A 7 Oe O

57

.3

158

E16/C16/D13

Pno023723.1 Pno000157.1

So

g1

AN

65

Pn

45

4.1

o0

E25/C12/D6

MRCA

90 70 10 57 57 193 g0 09 g g1 12 A12 CA 0 1.1 CA C . 60 18 970 .1.1 30 g 03 029 210 CA 2g 98 0 b1 tu 12g lyc So

54

g1

12 CA

12

Pn

98

Pn

E33/C6/D0

Sotub 08g0 1 10 6 Solyc0 0.1.1 8g067 620.2 Sotub0 .1 8g01 61 10 .1.1 Solyc08 g06761 0.2.1 CA06g0 2280 CA08g056 00 CA08g05490 CA08g05620 Dcarota001031.1

c

22

37

_S

CA06 g022 70 CA0 6g03 730 So Pno 006 tub06g 016 054 4 .1

50

CA

o0

130

.1 Dcarota031692 690.2 Dcarota031 29797.1 Dcarota0 0.1.1 g01964 1 Solyc12 15520. AT1G .1.1 8050 9g02 0.2.1 Sotub0 9167 0 09g0 118 Solyc 3g0 CA0 0 50 861 219 3g1 .1 12g CA0 CA 70.1 4 2 .1.1 g03 180 .1 b12 100 Sotu 0.1 12g 3248 .1 lyc g0 So 0.1 12 19 970 tub 00 g1 2g21 So 12 1 lyc CA So

91

Dc Pn

E6/C51/D20 E8/C13/D5

4

E6/C0/D0 NBS

b

Sl

16

Pn

55

E3/C15/D6

g1

MRCA

Dc

18

E0/C9/D3 E4/C1/D0

E0/C8/D2

12

42

19

E0/C3/D1

CA

E5/C0/D0

Sl

OsC1_1O LOC_LO 12 12 g0L.62 2_C 1O0_s.O 1a_stiO tivBaS_NBS g0s6 asg _a1 N O 0v8 1 50 CL1_O 0O 1s_ _O 1O a_tivaB_SNBS 1sO 2 4_4 7O0 0 .215_0 Cs0_ LOC_L tivssaa OO s0 ..1 gO4 C_g 02Lg v_aO _sNaBtiS 00.11 4_0aNO O 0B ssa03t1 4S 0.t1iN ivga LOC_Os12g2517 va_NBS 0.1_Osa LOC_LOC_O O s02g1s02g1 0tiva_NB Ssativa_C 580.0550.1 1_O _OsatiC _NBSS va_NB 01279.1_Dcarota LOCPno0 _Os10g0 _NBS_LRR LOC_Os04g 4510 LOC .1_O 30530 LOC_Os _Os11g29110 LOC sativa_NBSLOC ativa_NBS LOC_Os08.1_Os 06g_Os 01g399 _Os10g04 .1_Osativa_N 43670. 90.1 g30634.1_Osat 1_O _Os LOC BS_LRR 550.1_Osativa ativ sati LO _Os iva_NBS a_C PnC_O va_ _NBS 012093 o02 C_N CC06g PnPno 179 s129.1 _NB 994 BS 10.1 o0 g39 .1_ S_L 18o0 Pn 620 SA _D RR_Osativa_NPno008534.1_D 1912 .2_ N.3 car 9.2 Pn Osa W5 ota 38 _S o0 BS_LRR tiva Pn _N _N carota_N 3.1 AN 21 _NB o04.1 BS BS 43 _S .3W 21_D _LR _LR Pn BS 433.1 Pn 5_ R RR o0 RS_L .3W o0 NB Pn Pn 00AN ca _D Pno0 37 S_ o01 rot Pno0 5_ 35Pn car 00 o022 a_ LR o0 NB 092 5.1 ota 9.1 NB R 15 35 S_ 29 8.1 0519 _C _S 52_S Pn LR 50 _D 0. C_ AN Pn LR 2.1 AN P 1_ car RS_ NB 2.o0 0.1_D R .3W o017 Pno .3W _D ota SA 1_ S no01 27 _N ca SA 5_ N. Pn 65 5_a_ carota_ BS 01P23 61 Pn 3W NBrot o0 N.3. 9.1_ 56 o0 S_NB S_SLR 3W 5_ P81 11 Pn 1_ NB no SA no LR 05 73 TI TIR_ .6 77 SA 5_ o0 03 R_ N. R R _S 10 .202 2. TI N. Pn 04 3W NB NB 71 1_ _S 6. 75 R_ A 3W o0 3_ S_LR N S_ 5_ SANB PPno 29 A41 5_ .3 18 2. no00 SA TITI LR N26 N. .1 .2 W 1_ S_ R_ .3 66SA R RNB 0176 N 3W _S _S R_ W LR .3 2. 21 NB 5_ A.3 TI A5_ W Pno 5_ 1_ R S N N N R 5_ TI 20 S .3 SA 29 TI .3 _N R_ TI.3 W Pn 03 R5_ W5_ .2_S N R 5_ B NBS _N _SA W 85 _N S o01 .4 TIW BR TI_L Pn Pno TI BS R5_ P A S R N.3 Pn3P7nPPnPnono012 _L _N _N o01N 0083 oP0 Pnon0o P R RR _S R S_N 0n22.2 W5_ BTI oP 171.3 SBS 5_ 04o300P 3 n3o7 _L_N P 41o7W 99B.1R NTT 90 _L n90 6824A 0Pn3nno2oPo0o .3IR RRBS IR R 94 00147 035140Pn 2n7o.1 W_N .1 _N _ 0.2 7_S Pn Pn 5_BT 100770959.1 176.2o0 7023_58SP.2 80_SA 6_.1 ANR.3 An7001.1 0P9.1 B_S SN A .1 S_L oL SIR _L _AN.3 o0 9967642S.1 27 50P_No7nS NA.3 .3o0_W .1__S 0AN .3 S n3o3_.2 _N .1 W .1 RRB RW5_TIR W W 17L 03O _.3 A9cN59 .1 N W A07S7A 0S1A 32C_ R ___SSSA S .3 5N 9T.1 _7S N.3Pn D 5___ .3 A55N _B PS NN P _N0S590_NN .4 W 1O7C P N _NB W n_L Pn4 Os NN.3 __AS5AN IR 5 o0R 2A0NWo00a.3 ro.2 .3 B SB _5Sno0.3AP6W W BS_ .3.3W S 5.1 0_.1 n2o.3 11 S_L 22R3 W _OL noo700.1 _ ta A 5 W 0 0 2 N 2 BS A 5 W .3 5 N 5SN _ .1_W _N 21 __TTIR RR BW 2 _N N 98403T_5IR LOL_DsO0 Pn027832S45gA29 55_5___NN 5SBS.3 S W BS 0.1_D S5AB 6.1 IR_.3 TBBIR W5 9.10_9A NB.3 _N3N 1ag_ o08P682N 09 5L_R _NN _ S LCOO_cCC .3 .2 caro .3 N N S _ _ 2 O NRSS_LNB B 0 .1 _ n 57s 5o9.1_SW0.1 LSR Dc BBSS_TIRDc _SW OO_sro LC .3BW ta_N 5_B S RS L 4 090_SA5__O O0ta _5LR aro __LLRR_ aroSA LLO s08_O07CC 0 BS O_CO N 1 NN.3BS R _LRR Pn g 0 N 1 C A R s .1 _ _ ta ta 7 .3 C sO 37 C_.243 _1N0.3B NR Lg0O Sa_tiva BS _TIRRR C_N _TIRW5 o0 44 SA.1W LCO__O O9ssg20gC169_08O_NsO 5_LR_ W BS _ _T 0 _ 0 s 15Pn Pn _ N _ B 9 1 L C N 4 5 a C s003_g4O010.10Sg.1 D _ BR C _ N IR RR NB 41o0 o0 P LOO9gg01s461_.1_ti3Lv9_aO.3cW aN5roBS_ _N S_ LRRBS_L_NB L_O _ 0 s 4 0 R O 5P.10P8n 08 n s S _ B L 1 S 0 L 4 R C a C _ 9 ta 6 RR C1_2_00L9.1g0_s.1Oa sR 0Ctiv N__LRR S_ R _LR n_o7o70 88o0 001O LR a_ BC ati.1__N OgOL3s5OO 4_0tisO R SRC v D0Pn20.18 8.108 R 4 .1 C a _ O R s N v B 0 _ .1 c1a1o _7 _ 77 a0tis_aC L a 0 1C __1O v ti_CNsSaBti_S_RNRBS .1 r3o030D69 D 0.1 P 37g22g_0_3OO 1 .1O n L _ c ta589c.1 0sss1saa2ati__vOaC__BNSvLaR _.2N2ar_oDaro_D o02 091_71O 61tigv sCNB_SL_RNRBR LRR _BD4.1tac ta ca 9 R 0 0sgva3a16a_0_4aN_tiBN aB___SLLRR S_L SPc_an__aCr _C ro 75 .1.1 C vS o S_C_RR RR 7 _ti_v4O11C0B_.1 LroD o0tac0aCta_ _ C_ta_N.1_ Osaas_0aC.1 CLR R _R tiN__BLOR R_8r7oNCB NB B D c tivCv_aO_Sss_aRtiNRBS N9ta C B9S_.1SC_NS_LS aro a_CN CBaCtiLRva _ _CD_BS RR ta_ C_Sv__aNR__NLRR CC N LRBN BS cNaB _N BS_RSB_SL_ roS ta BS LR RLRR _C R R C _N BS

E3/C5/D0

Pn Pon0 o208 0644P 867n.1 7o.1_0Pn D_0c9o Da4c072 rao787 o.1ta Pnrta _6_3.2 Pn o0 N_DN Bc o0 04 SBa_rDoPn 04 97 PPnn 8P76SP_ctaaLo_r 0 0P 5.1 oo0 no.1noRoNta Pn 0PRB4n_8 P _PDPn00044P0n_0D o0 ncoao0 9988o40P08cn4Po8SoN70B90n.1o0 26 a 0 S4 2 0ro 1P7PP.1 0P40taPnP 65 nn0ono74or28n0o04.1 o5 8_4D52 o08n.1 1.1 94_no89nN o__0D 0D 0 40ta0_4.1 _08N4_4D21c.3a 04 2B _PDn 0o0.1 004.1 85_c8Ba8c.1 r_o .1 049009c0c447a4a6994r9r.2D coa02 Pn S_4_D69D 8ta 9710.1o78o.1 Sr0oa_.1 D.1 53.1 roDDtac__S Pnn6ota60_4 P PPn Pn Pro o ca_ta _88ta __.1 00P cac6aro _N .1 o o 66.1 roBAN nPoo0 rDo__taccDNaa_N _D 9n4o roD ta ta c_N_BD 2 B noP Pn 02P2n2o 02PPN 00nn0o118 taS .3 coata ac_BrrCoaSC S 6072 D_ca_D o00 5063Pnn6no64oS_0470_D.1 caro rro ta 3o50052582 acB rtao___NW ta .12_3 NN rr_oooNta Bata ro 896 .10o_2082252L6c.1 _a6P_D ro P RnDcota 6R BtaN_BB 5_ taN _N 08021328.1 147n.1 .1o_ 5 6 4 S9A P 4.1 D2cP9.1 4.1SSta 8 5 5 9 RNSS N a ____LSNN n _ 578_23D_.1 0.1 .1 0D _o0D _D anro D 40ota N.3 SBSBBBR 0c.1aroN _S BNBN ta Bta B B _ro D c1_a_4ccD7aaro caro 6.1 5D D 2c_4a0__N _1cSaN c .1 WAN SSSS S S_ 8 ro _ 4 ro D _ ro 1 3 c c a PPnnoP L _ B ta_Pn7ta ro Bta N ta 5_.3 3__ro ccaa9_ro ro R D LR .1___ ta .1 BS on 0 NBo5S0a.1 _R_ta NW PP _taS L_ta aroro ND_ Nta taBN L_N B n0o0n30o D R 86n0 1N 7P0 BS5 _ S D R R N S L 9 4 c R N 8 B c B c 4 7 R _ R _ ta aLBBro o1021o27091.1 _N N_aSro SSR S_N 36a.1 ro S R ta LtaR NBLB_S 5238.1 _.1 _6.2 __L_R BS PPno R 1572 SS4A_AS RS noP P00 _taD no P 0 N _ R _ 02 N R _ N N n .1 5 R c A L LOLO 00o66 no91 .3.3 14 .3 aBroBSS_LRNRB _D NW 009 D ca C_O 91 _ro 02 C_O 22 .3 SW ca ta_ RRS 65 14 W 1_S .1 .2 A55ta s0 s0 __N ta ro 64_S N _S LO 23 .1 5 7g LO N 3g N _ .3 .1 _ .1 B AA C_O LO N BN C31 BS 2626LO _N NN_A _S S _O S_ P B .3SN S C_O nBW A.3 s12g s00. A _LR SLN W 0.19 ___LL_ W LOC_80 C32 oB LOC .3 N N 1_ 8g R _RBR 1_ 05S _O 5_ .3 W 5_ LO .3 L s0 O R 3 R L LO O W R R sa W 6 N 5_ s1 C_ Os0 N R S _O 3g 69 sa 671R_RL 590. BBS 5_ tiv1_ 5N 0g1_ OC09 N 4. tiv _O s1 8g S__L a_O s1 B 10O1g NSR a_ _L .1_RR s0 B 1g N50 B N15 36 0. BS 11 1g _L R 17 sa S BS SAN R 0. sa R_ 70 0.sa 01 _L tiv RLR 54 1_ _L O41 1_ 08 a_ R1_ O4.sa RRR .3W 0. 1_ RBS R 0. a_ N 1_ Rtiv tivO 1_ O sa tiv N sa a_ O Oa_ BS tiv a_ _L 5_N tivsa CC R a_ tiv Bsa tiv RN S_L NBa_ _N a_ NB BS_ LOC_ BS N S_ RSNBS BS _L LR LRR Os08LO _L RR R RR LOLO g094 R C_30 C_LO C_ C_ Os Os Os .1 04 09 1297 _Og4 0716 LOOs g0 LOLO g3 g0 C_ sa 63 C_ C_ 88 50 Os tiv 00 Os LO 09 Os 20 LO LOC_ 90 .1_ a_ 06g4 LOC_ C_ 04Os g0 .1NB 12 C_Os .1_ LO .1 Os g004 _OS_ Os 94 g1 Os C_ Os _O 11g 85 28 LO Os 43 90 09g3 ati LOC saLR 08g LOC C_O 20 6018 ati 120 11g sa 30 va tivR _Os .1_ LOC LOC 161 .1_ g2 _Os 0220. vativ .1_ s11 119 50. _N a_ 20 _Os 11g Os _Os 04g _N Os 1_O a_ g11 90. LOC Os LOC BS .1_ ati 90 11g 119 NB 11g ati 142 2_O 1_O 960 _Os BS NB ati _Os1 satsat .1_ 123 Os _L 123 vava 50. 20.2 va 11g1 .1_ S_LR iva satativ 1g12 40.2 _L LOC S_ Os _Cva 1_O 30.1 RR _Os _N LOC_ Osa 2000 _N iva ivaa_ _Os1 300. _Os RR LOC_Os11g42040 ati C_ _Os sati LR ativ Os11 BS BS _C .1_O BS _NNB tiva 1_O ativa 1g12 LOC_ ativ NB va_ a_N C_ g120 _LR R R sativ BS_N sativ _NB _N _LRR 320.1 _L _CC Os11 a_N NB S_ NB 40.1_ S BS_ a_NB _LR a_N .1_Os RR BS R _Osa g1177 _NB S_L BS_ S_L LR S_L Osati LOC_O LRR BS_ ativa_ S 0.1_O tiva_ _L S_L R LRR R RR RR va_N RR s02g09CC_N LRR NBS RR sativa RR BS_L BS_LR 790.1_ _LRR _NBS RR LOC_Os01gL R _CC_N _LRR O3O C_ O 8Og gg 1a1 01O 6Osativa .ss1g10 L 0 _1 78 0_.0 _ .C 1.s_ L3O _CaO i.v0 C8C O _11O R O sva LOC aBta 5 tiiS _O aBS_LR sL _ _ N 513ss2 1 B 3130t04 LtN S 1a00 R ia _ _R a R LBRSR_LRR _ vvtN Cs_40 _ B O S B S g5 S 4O _ 617 4_tO .isN 1sa __ O sv_B a ivN aL _R N L6 OgC3L_0O O LOC_Os0 sa 4C 3s0_.6 ivsaa_tN 1g_ O20s4ga60t0i7v.91 a7_8O N.B _B LR O s21 1 _StO ivB a_SN SR _LRR

CC_NBS_LRR

vaB_SNBS _.1N C4O igv00a06_.2C .s1_1_O 0C 9OO 2L_ LOC_Os11gL0O6C tiSvsaa_tiN s_aBO 13_ 1Ogss01a61t3 vtaisv_aaN N _4 aSB 44 i0vBN t_ g_0sN 0O S5B7S0tiva_NBS sCt1i_ _LO 1as_ti_aO Osativa_NBS 0s.O .1__NBS _O 1C .L 045gS4009.4104_.7 1O 0.1_Osa 0 g NBS 0B s1.1_Os LOC_Os10g0416L0O 0g1LOC_Os sv1a ativa_ C O _aO CsO O04540 O L_10g0456 sativa BS BSBS NOs10 BS tiva_Os10g va_N aLOC_ g04380.1_O CC_N C_N Osati 0.1_OsLOC_ BS tiva_ a_C S a_N 60.1_ NBS sativ _Osa a_NB g417 ativ va_ LOC_Os10g0448 .1_O 380.1 sativ Os02 S S _Os sati 1g52 1_O 2340 LOC_ _NB S 20.1 _NB 1_O 330. BS _Os0 01g5 ativa 523 NB 80. LOC 1g52 S _Os C_N _CC _Os 01g C_ 522 BS NB _Os0 LOC tiva S BS a_C 04.1 _Os _N BS _C 01g C_ LOC BS ativ 523 Osa NB iva va LOC _C _N _Os 01g _Os _N .1_ CC_N sat ati iva a_ va va LOC 290 00.1 a_ BS sat Os ati tiv 1_O atisa LOC_Os 457s01 g52 tiv 90. .1_ _N 1_O 05g Os 60 Sg0BS sa va 456 70. .2_ _O .1_ 25 C_O atiOs NB BSa_ 05g 30 .1 10 _N LOC_Os LO 70 _N Os 26 Os 26 .1_O 60 05g BS 07 25 C_ iva456 ativ g0 Os Osva LO _N26 g0 sat 07 Os g0 2620 C_ati va.1_ 07 .1_ Os LO 1_O 26 07 ati24 LOOs Os C_ 30.1_ 84.C_ g0 227025 07g0 OsOs C_ LO 07C_ .1_ LO 02g398 Os Os01g507g0 40LO C_SOs a_NB 26 BS LOC_ LOC_Os g0C_ LOC_Os Os07LO sativ LO C_N C_ 40.1_O ta_C R R LOg2 ro 53 R R DcaRBSB_L S_L RR Os01 N 9.1__La_ R NBS _LLBR S R S LOC_ RRCR_N NB R tiva_ R RRN Stiv 1572 sa R RRR B_L __LLR RR _LR LR _LCR a_ R S _S S_LR Osa _L _N _LLR _LN _ RR Pno0C BB SS_S C BSa_ BONStiv _ R NBS NB 1_ tiv N B R 0.C S 160.1_ sa B BB _LS_RLRRR R a_ LB N BS ota_ a_sa R__N _N NN tiva_ 81 OR __N aCS__C R NB40 S tiv ara_ 1_ N C R tiv tivO RRR S RRR C 29 0.a_ C Dctiv R tiCvC Osa BSR C a_B BC_ sa _R Osa 0.SSa1_ __L s11g R _C _LC O Osa R_LLR C 4.1_ L_L 1_s0 0.1_ 94 _tiOLvCsaRaa_aC _NR_SLL__R 29 Osa_O tiv 2_7g BBaa_ a_ti_LvN 1_82 C_a__N tiv S S a R tiv v _ _O C 0. 13 N a _ 7g 0. _N 2936 sa R R S 41 C C0. L _ s LOC1g4259 0.1_ C C N 44 B O sa 1g s0 _tiv 45 aBtis_S 14 BSRB _SSBB NO a_ti_sNvO sa C OO asa LO 0B.1 B _NN_L_S Pno0 sO __tiv s11_ sa __ 14 1049 _O O s1 _C _C RR 0g C33 _LN _aNN s09g 9g .1 BS tivavBaCS__CB .1 tiOv05a.116_0O0v_a.1 aavS s1LO sa 0_.1 19tiv0.a0a_ C_O Rtiv tiv 5 s0 N a _ 0 4 7 a .1 N Os12g 8 .1 .1 ti sa O R R a_N C_O s C_O 6 45 0 a S _L R ti LO ti L O avNvtiB 50S C_O_O asa_aC 4sa 025s06a_tiOO LOLO 1gO 459.1 71__8gsO 414s19g02g06O LOC_Osativ LO R vastia_av_vsC LR RBBS s1.11_ LOC as_ 1_1g91g4g9gN400B01.1 1_ _vO LSR_BS__LNN 06 _1O 110015_0.2sa_tistisava_tiOO 900.s1 s1 g R 4 3 1 _ O s .1 s1 ti C 0 0 S a _ 9 4 O _ 620. O _ 0 5_O O O 9 s a v g 0 _1O 0821s05g.1 Rava_ vCa5L__1OOC 445 S 0s.1 _s3O _C sC NB_NBLtiR 4 02.1.1 gO O CC gtis1 1g45 11g .180_2.17O383__739O LO s1 s1LO C1s_L1aO LO NCB_CRLCR_SRs_N_asLaRti S RRR O g_2O42g 817935.109_.1 C_LO a O _ 8 s 8 Os1CL_.1 _Os1 C _ g _ 4 4 0 C 8 0 LO _ C__O BR R a O S LOOLsO0_8Osg02s105S tiNv_BB_O O LOC g92210702730g7g374g0.1 OC L8OO tiavti__vBsLaaS a_.1 80C_ LRRRR__LLRRR LLO __RN 1 1O6gs4O Cs1_0sO R aSLR stiasvB_aSN s s2 S Ov70_0N.1 15L CL_OO CO_0N 5g1B _g1 _5O LRR _tiSLBvR_R _Os_aO LO C _L BS_L vNa1s_a0titi75v5a8 08g _LCOO O OCsO C_C v_a_sO LR_ NsS_LaBN RRRSR _NNBS Os S 0.1_4sv.1aa6ti__0Og.1 OLOa_tiC L s0 LLO sa1 8.1 ti 1 1 L 4 C_ 0 g SB_ _SO 0 _Ba_ LRRB a_ SB a O OsLCO _O s.1R1_R LO 106L.1 _BRN .1a_vBBtiaNLvS_SaSRL__RNLatitivvaN_BN 03991.11s_g5O 0O1 .1_L LOC _C 85__O _vNaR 2_0satiN _B 2g213g_70O 0 s O 80 g 0 1 0 S 5 .1 1 C 0 _ _NSBNtiSBvOssaCa vaati_L5tiOa7v_tiasaOvaN 1 61 Os0s5s0508OgsO 0B6C N s v aBs__aN _Oati_v atiBs SgsO2a_.1 g1 O sN C_O _6Og01C_11gL_1LO a 10_4.1s0aaavti_tiati_Nvv_Oa 0.1.1_tisva _ _ 05 s C O _.1 LOC Og01L9_OOasti1v1 a_ s_05O Os 2O tiaR0vssti.1a1555730_sOa .1 RO LOs_L0O9sCO s 94v O.16013.1 C_ sL7_as6O 9001ati 3_20g03.1 OC O _O __O 3311 0_.1O LO 2g34Os O566C03g436B0.1 CL_O 0LL.1O_C _S0O_.15_ 5ggg3 3.1 g73 _ 0gL3s2Og05_s3076N.1 5 02g02.1sss00055402590 LO 0 _g2a100401811_O 013s0.1s0_s1CO 32 OO g3 2 O_CO g2 O_s7O O01ti00v2g4O2s5CC__s0g84 05 8 CO_gC1 C_LCOLssg0a21g01_gLOOC Os _sO0 LO0L7 OLO_s_O0_OsL2O1OssC0 LLCO C_ .1_CO_O O_O Os L O_0CO LO C_ 0LC9LCO C OLC LO L 0L4OLOLO 0g s1 O C_ LO

a

70

56

g1

06

CA

a

Acetoacetyl-CoA 2. 3. 1. 9

Isopentenyl-PP

Mevalonate-5P

1. 1. 1. 34

bioRxiv preprint first posted online 2. 7. 1. 36 . The copyright 2. holder 7. 4. 2 for this preprint 4. 1.(which 1. 33 was not 2. 3. Jul. 3. 104, 2018; doi: http://dx.doi.org/10.1101/362046 Glycolysis peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission.

3-Hydroxy-3-methy Mevalonate -lglutaryl-CoA

Acetyl-CoA

K15817(DDS) DammarenediolII

b

K00801(FDFT1)

(S)-Squalene-2,3-epoxide 3000

Squalene

1500 1000

2. 5. 1. 21

Presqualene-PP

Pno022472.1(SQLE) Pno009162.1(SQLE) Pno029444.1(FDFT1) Pno034035.1(DDS) Pno005189.1(FDFT1) Pno035240.1(SQLE)

2500

K00801(FDFT1)

2. 5. 1. 21

1. 14. 13132

4. 2. 1. 125

2000

Log(RPKM)

K00511(SQLE)

Mevalonate-5PP

2. 5. 1. 10

Farnesyl-PP

d

500 150 100 50 0

R1

R2

R3

L1

L2

L3

F2

F3

40

Pno022472.1(SQLE) Pno009162.1(SQLE) Pno029444.1(FDFT1) Pno034035.1(DDS) Pno005189.1(FDFT1) Pno035240.1(SQLE)

35

Relative Quantities

α helix (Front)

30 25

195A 194L

20 15 10 5 0

c

R1

R2

R3

L1

L2

L3

F2

F3 α helix (Front) aa:194-196

α helix (Back)

Dc020454.1 Achn32 9921 Pno0 0904 8.1 AT5 G3 Ac 611 hn 0.1 31 AT5G 35 36 PG 51 14 0 .1 SC 00 03 DM T4 00 06 71 71

T4 DM 3 0 00 31 SC 3299 1 G n P ch 347 2 A hn0 1 Ac 1 99 n32 Ac h 68931 A c hn3 Pno011760.1

0 00

19

56

1

a

4

02

.1

Dc

.1

0 42

b

.1

8 33

5

06

.1

.2

39 0

21

13

53

03

40 0

T

g bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046 . The copyright holder for this preprint (which was not 06 yc peer-reviewed) is the author/funder. All rights Sreserved. No reuse allowed without permission. ol

PG S So C000 lyc 3D M 0 T40 S PG 7g0 009 o ly SC0 4288 394 0.1 1 .1

CYP716

Achn130041

Dc032272.1

Dc03227

0.1 Solyc 06g0 6543 PG 0.2.1 SC0 003 Pn DM o0 T 21 400 28 067 Pn 3.1 089 o0 CY 12 34 P7 7.1 16 A4 7

3001

Achn33

11

.1 Pn

o0

029 60. 1 1 0003D 6A5 3v2 MT40 00544 57 Solyc02g069 600.2.1

64

89

19 67 05 T4

DM 03 PG

PGSC

1

00

170

SC

Protopanaxadiol (PPD)

n20

Ach

12

Relative Quantities

91

16

20

14

d

11 2131 Achn 3051 Achn22

00

SC

PG

Achn038071

SC

PG

T4

M

D 03

CYP 7

M

3D

0 00

2

43

08

0 00

71 39 21 36 53 hn 11 Ac hn Ac

PG

T4

hn Ac

SC

6

25

38

0 00

03

M

3D

0 00

00

311

148

n Ach

0

24

38

00

0 T4

Pn o

3039

Achn

c

PPDS>PPTS

Pno012347 Pno002960 Pno011760 Pno021283

10

Dammarenediol-II CYP716A47

DM

00 3

.1

CYP716A53v2

g0

c0 5

Solyc07g064450.2

1 40. 073 0 o .1 Pn 234 032 Pno 3.1 1135 Pno0 Dc014537.1

8 6 4 2 R1

R2

R3

L1

L2

L3

F2

F3

8

8

6

Lg(RPKM)

6 4 2 0

4 2 0

−2

−2

−5

F3

F2

L3

L2

L1

Pno012265.1 Pno028322.1 Pno038184.1 Pno032082.1 Pno005453.1 Pno021401.1 Pno021801.1 Pno002334.1 Pno014418.1 Pno023041.1 Pno027184.1 Pno019492.1 Pno037322.1 Pno000020.2 Pno008683.1 Pno015535.1 Pno006032.1 Pno017577.2 Pno020413.1 Pno008682.1 Pno020120.1 Pno002917.1 Pno021283.1 Pno011544.1

R3

CYP87 CYP81 CYPCYP51 CYP77 CYP716 CYP716 CYP716 CYPCYP78 CYP78 CYP82 CYP94 CYP86 CYP86 CYP51 CYP72 CYP83 CYP72 CYP96 CYP76 CYP71 CYPCYP71 CYP71

R2

Pno035711.1 Pno008582.1 Pno020287.1 Pno033997.1 Pno012456.2 Pno012347.1 Pno002960.1 Pno011760.1 Pno012725.1 Pno031133.1 Pno009840.2 Pno027966.1 Pno022529.1 Pno033513.1 Pno037530.1 Pno001661.1 Pno008188.1 Pno006076.1 Pno020138.2 Pno011440.1 Pno008576.1 Pno011050.1 Pno032385.1 Pno017963.1 Pno005525.1

R1

F3

F2

L3

L2

L1

R3

R2

−4

R1

Lg(RPKM)

Protopanaxatriol (PPT)

PPTS>PPDS

0 20

0

0 5 Value CYP87 CYP77 CYP82 CYP71 CYP76 CYP71 CYP82 CYP90 CYP98 CYP94 CYP718 CYP71 CYP707 CYP71 CYP71 CYP707 CYP71 CYP79 CYP72 CYP715 CYP77 CYP71 CYP716 CYP--

a

b bioRxiv preprint first posted online Jul. 4, 2018; doi: http://dx.doi.org/10.1101/362046 . The copyright holder for this preprint (which was not UGTPg100 UGTPg1 peer-reviewed) is the author/funder. All rights reserved. No reuse allowed without permission. UGTPg101 UGTPg101

c

ginsenoside Rg1

8

UGT71A27

ginsenoside F1

Protopanaxatriol (PPT)

UGT74AE2

Protopanaxadiol (PPD)

6 4

R1 R2 R3 L1 L2 L3 F2 F3

Pno017659.1 Pno026280.1 Pno033035.1 Pno036629.1 Pno007320.1 Pno020571.1 Pno010455.1 Pno018210.1 Pno027722.1 Pno008958.1 Pno015224.1 Pno012461.1 Pno014832.1 Pno014565.1 Pno016510.1 Pno018314.1 Pno003277.1 Pno003692.1 Pno003883.1 Pno020280.1 Pno033219.1 Pno013811.1 Pno024024.1

UGT-UGT71A UGT74 UGT72B UGT-UGT-UGT73 UGT-UGT71A UGT74 UGT-UGT71 UGT72E UGT-UGT85A UGT85A UGT-UGT-UGT-UGT71A UGT-UGT-UGT85A

Pno026280 Pno0027722 Pno020280

10 9

UGT74AE2

0 5 Value

Compound K

8 7

ginsenoside Rg3

6 5

ginsenoside F2 UGT94Q2

−4

UGT94Q2

−5

−2

Relative Quantities

0

4 3 2 1 0

R1

R2

R3

L1

L2

L3

F2

F3

6

ginsenoside Rd

4 2 0 8

0

−5

0 5 Value

Pno013844.1 Pno019712.1 Pno019168.1 Pno020304.3 Pno013982.1 Pno026117.1 Pno034813.1 Pno003567.1 Pno024781.1 Pno031515.1 Pno033394.1 Pno002810.1 Pno000722.1 Pno036328.1 Pno018315.1

UGT71A UGT75 UGT-UGT85A UGT80 UGT-UGT73B UGT-UGT-UGT74 UGT74 UGT74 UGT74 UGT85 UGT85A

Pno033394 Pno002810 Pno00732

Pno013844 Pno031515

12 10

Relative Quantities

−2

R1 R2 R3 L1 L2 L3 F2 F3

d

ginsenoside Rh2

0 12

2

8 6 4 2 0

R1

R2

R3

L1

L2

L3

F2

F3