Research

Tracing the origin and evolutionary history of plant nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes

Jia-Xing Yue1,2, Blake C. Meyers3, Jian-Qun Chen1, Dacheng Tian1 and Sihai Yang1

1State Key Laboratory of Pharmaceutical Biotechnology, School of Life Sciences, Nanjing University, Nanjing 210093, China; 2Department of Ecology and Evolutionary Biology, Rice University, Houston, TX 77005, USA; 3Department of Plant and Soil Sciences, and Delaware Biotechnology Institute, University of Delaware, Newark, DE 19711, USA

Authors for correspondence: Sihai Yang Tel: +86 25 83686406 Email: [email protected]; Dacheng Tian Tel: +86 25 83686406 Email: [email protected]

Received: 4 October 2011 Accepted: 11 November 2011

New Phytologist (2012) 193: 1049–1063 doi: 10.1111/j.1469-8137.2011.04006.x

Key words: domain reorganization, evolutionary history, nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes, origin, resistance (R) genes.

Summary

• Plant disease resistance genes (R genes) encode proteins that function to monitor signals indicating pathogenic infection, thus playing a critical role in the plant’s defense system. Although many studies have been performed to explore the functional details of these important genes, their origin and evolutionary history remain unclear.
• In this study, focusing on the largest group of R genes, the nucleotide-binding site–leucine-rich repeat (NBS-LRR) genes, we conducted an extensive genome-wide survey of 38 representative model organisms and obtained insights into the evolutionary stage and timing of NBS-LRR genes.
• Our data show that the two major domains, NBS and LRR, existed before the split of prokaryotes and eukaryotes but their fusion was observed only in land plant lineages. The Toll/interleukin-1 receptor (TIR) class of NBS-LRR genes probably had an earlier origin than its nonTIR counterpart. The similarities of the innate immune systems of plants and animals are likely to have been shaped by convergent evolution after their independent origins.
• Our findings start to unravel the evolutionary history of these important genes from the perspective of comparative genomics and also highlight the important role of reorganizing pre-existing building blocks in generating evolutionary novelties.

Introduction

Plant disease resistance (R) genes encode proteins that serve to sense the invasion of viral, bacterial or fungal pathogens and subsequently trigger a series of downstream immune responses, thereby playing a critical role in the plant innate immune system (Dangl & Jones, 2001). There are two major classes of immune receptors in plants: nonspecific transmembrane pattern recognition receptors (PRRs), for example, receptor-like kinases (RLKs), and specific cytoplasmic receptors, for example, the nucleotide-binding site–leucine-rich repeat (NBS-LRR) proteins. Proteins in the former class recognize conserved pathogen-associated or microbe-associated molecular patterns (PAMPs or MAMPs), whereas those in the latter class recognize pathogen-specific effectors (Chisholm et al., 2006; Jones & Dangl, 2006; Maekawa et al., 2011).

In 1992, Johal & Briggs isolated the first plant R gene, Hm1, from maize (Zea mays). Since then, > 70 R genes have been cloned. Among them, NBS-LRR genes comprise the largest class and account for more than half of plant R genes (McHale et al., 2006). NBS-LRR genes are characterized by encoding an N-terminal variable domain, a central nucleotide-binding site (NBS) domain, and a C-terminal leucine-rich repeat (LRR) domain. Based on whether they also encode an N-terminal Toll/interleukin-1 receptor (TIR) domain, NBS-LRR genes can be further divided

into two subclasses, the TIR subclass and the nonTIR subclass (Meyers et al., 1999). In addition to their different domain architectures, NBS-LRR genes from these two subclasses also differ considerably in their phyletic distribution and downstream signaling pathways, suggesting possible functional divergence between them (Aarts et al., 1998; Tarr & Alexander, 2009).

In the plant innate immune system, receptors encoded by NBS-LRR genes are localized in the cytoplasm and can specifically recognize effectors secreted by pathogens, either directly or indirectly; this recognition subsequently activates a complex downstream signaling pathway usually leading to rapid local cell death, termed the hypersensitive response (HR), around infected sites (Chisholm et al., 2006). Genome-wide investigations conducted in Arabidopsis thaliana, Oryza sativa, Medicago truncatula, Vitis vinifera, and Populus trichocarpa revealed that there are generally hundreds of NBS-LRR genes in plant genomes, reflecting the important roles of these genes (Meyers et al., 2003; Zhou et al., 2004; Ameline-Torregrosa et al., 2008; Yang et al., 2008). Moreover, NBS-LRR genes are often polymorphic between individuals of a host population, and the complete set of these genes defines the repertoire for the detection of polymorphic pathogen effectors (Bakker et al., 2006; Zhang et al., 2009; Maekawa et al., 2011). It is therefore of particular interest to elucidate the origin and evolutionary history of this important gene family and also the


evolutionary relationship between TIR- and nonTIR-NBS-LRR genes.

There is another line of plant defenses represented by the LRR-RLK transmembrane receptors which are localized on the plant cell surface. Containing an extracellular LRR domain and an intracellular Pkinase domain, LRR-RLKs are typically able to recognize highly conserved PAMPs, such as flagellin, lipopolysaccharides (LPSs), cold-shock protein, chitin, and β-glucans (Nürnberger & Brunner, 2002). These molecular patterns are often common components that participate in important functions for various microbial species (pathogenic or not). Therefore, resistance driven by LRR-RLKs is generally non-host-specific and may represent a more basal defense strategy. If so, one unanswered question about the plant innate immune system is whether LRR-RLK genes have a more ancient origin than NBS-LRR genes.

Analogous to the case for plants, animal innate immune systems also rely on two groups of receptors, the transmembrane Toll-like receptors (TLRs) and cytosolic Nod-like receptors (NLRs) (Ausubel, 2005). TLRs are characterized by an extracellular LRR domain and an intracellular TIR domain, whereas NLRs often demonstrate a tripartite domain architecture consisting of a variable N-terminal domain, a central NACHT domain and also a C-terminal LRR domain, like NBS-LRR proteins in plants (Ausubel, 2005; Staal & Dixelius, 2007). In addition, NBS-LRR and NACHT-LRR proteins are members of the signal-transduction ATPases with numerous domains (STAND) class of ATPases (Leipe et al., 2004; Rairdan & Moffett, 2007; Maekawa et al., 2011) which share similar biological functions and protein structures; however, the NACHTs have been placed in a separate phylogeny, suggesting different evolutionary histories of these two domains (Leipe et al., 2004; Rairdan & Moffett, 2007; Maekawa et al., 2011). Therefore, it is interesting to ask: what forces drove such striking similarities between the plant and animal innate immune systems?
Did they come from the same common ancestor (divergent evolution) or did they evolve independently (convergent evolution), as suggested by a number of reviews (Ausubel, 2005; Rairdan & Moffett, 2007; Staal & Dixelius, 2007)?

The recent avalanche of whole-genome data from diverse species offers us an unprecedented opportunity to explore this series of questions via comparative analysis at the genomic level. In this study, we sampled 38 representative genomes, covering all major kingdoms of organisms (eubacteria, archaebacteria, fungi, protists, plants and animals) and identified all homologous genes of NBS-LRR, LRR-RLK, TLR, and also NLR genes in these genomes. For these well-defined immune receptor genes, we conducted a set of comparative analyses on their phyletic distribution, domain architecture, phylogenetic topology, and conserved motif evolution patterns. These data shed light on the origin and history of NBS-LRR genes and their evolutionary relationship with LRR-RLK genes in plants, and also those of the TLR and NLR genes in animals.

In addition to R genes, a number of important chaperones and regulators involved in the plant immune response have been identified through genetic screens; these include WRKY in the LRR-RLK defense pathway and

SGT1, RAR1, EDS1, PAD4, SAG101, HSP90 and SID2 in the NBS-LRR defense pathway (Rusterucci et al., 2001; Azevedo et al., 2002; Hubert et al., 2003; Leister et al., 2005; Eulgem & Somssich, 2007). Because of the essential functions of these signaling components for plant immunity, they were also incorporated in our study.

The clarification of the evolutionary relationships and timelines of plant R genes, especially NBS-LRR genes, in this study paves the way for elucidation of how plant disease resistance originated and evolved, and how this evolutionary innovation helped plants to survive and thrive in diverse terrestrial habitats. In addition, the findings of our study provide a good example of how the recruitment and reorganization of pre-existing building blocks can lead to functional innovation, thus shedding new light on the origin of evolutionary novelties.

Materials and Methods

Data sampling

For this study, we selected a total of 38 different whole-genome sequenced species, including six eubacteria, six archaebacteria, six fungi, six protists, seven plants and seven animals. These species were sampled because they represent all six major taxonomic kingdoms of organisms on earth. Their complete genomic sequences and corresponding annotation information were downloaded from online databases (Supporting Information Table S1). For each species, protein entries matching the NBS domain (Pfam: PF00931) in the Pfam database v23.0 (Finn et al., 2009) were identified as NBS-encoding genes using HMMER searches with an E-value cut-off of 10⁻⁴ (Eddy, 1998). Likewise, we also identified all NACHT-encoding, TIR-encoding, and Pkinase-encoding genes in these 38 species (Table S1). For the two dubious TLR genes found in grape (Vitis vinifera) and poplar (Populus trichocarpa), we used their nucleotide sequences to run BLAST against the GenBank expressed sequence tag (EST) database to seek EST support for these two genes with > 85% sequence similarity. In addition, the nucleotide sequences of these two genes, together with their 5′ and 3′ flanking regions (5000 bp each), were used for re-annotation with FGENESH (http://linux1.softberry.com/berry.phtml) and the domain architecture of the resulting predicted proteins was assessed using Pfam (Finn et al., 2009). We also randomly selected 30 genes from each of four further sets drawn from other eubacterial and fungal species (NBS-encoding genes in eubacteria, NACHT-encoding genes in eubacteria, NBS-encoding genes in fungi, and NACHT-encoding genes in fungi) and downloaded their amino acid sequences from the European Bioinformatics Institute (EBI) UniProt database (Supporting Information Notes S1).
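The domain-based identification step described above can be illustrated with a small sketch. It assumes that domain hits have already been parsed into (protein, domain, E-value) records; the record layout, the function name `classify_architectures`, and the example proteins are hypothetical. Only the 10⁻⁴ cut-off and the Table 1 style labels (T, NB, L, with X for a non-TIR N terminus) are taken from the text. Domain order along the protein is ignored for simplicity.

```python
from collections import defaultdict

E_VALUE_CUTOFF = 1e-4  # cut-off stated in the text


def classify_architectures(hits, cutoff=E_VALUE_CUTOFF):
    """Label NBS-encoding proteins by their set of significant domains.

    hits: iterable of (protein_id, domain, e_value) tuples, where domain is
    one of "TIR", "NBS" or "LRR" (hypothetical, pre-parsed search results).
    Returns {protein_id: label} with labels such as "T-NB-L" or "X-NB".
    """
    domains = defaultdict(set)
    for protein_id, domain, e_value in hits:
        if e_value < cutoff:  # keep only significant hits
            domains[protein_id].add(domain)

    labels = {}
    for protein_id, found in domains.items():
        if "NBS" not in found:
            continue  # only NBS-encoding genes are labelled in this sketch
        prefix = "T" if "TIR" in found else "X"
        suffix = "-L" if "LRR" in found else ""
        labels[protein_id] = f"{prefix}-NB{suffix}"
    return labels


# Made-up example records, for illustration only
example_hits = [
    ("geneA", "TIR", 1e-20), ("geneA", "NBS", 1e-35), ("geneA", "LRR", 1e-8),
    ("geneB", "NBS", 1e-12), ("geneB", "LRR", 0.5),  # LRR hit is too weak
    ("geneC", "LRR", 1e-10),                         # no NBS domain
]
print(classify_architectures(example_hits))
```

Counting the resulting labels per genome would give the kind of per-species tallies reported in Table 1, although the authors' actual pipeline may differ in detail.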
Fifteen NBS-encoding genes in gymnosperms (Pinus monticola) were also downloaded from EBI UniProt and National Center for Biotechnology Information (NCBI) GenBank and incorporated into our data set (Notes S1). Finally, 43 NBS- and NACHT-encoding

genes of known function were selected for further analysis, and their amino acid sequences were downloaded from NCBI GenBank (Notes S1). The presence or absence of our domains of interest within these genes (e.g. NBS, NACHT, LRR, TIR, Pkinase, WD40, TPR, and HEAT) was verified based on both the Pfam (Finn et al., 2009) and SMART (Letunic et al., 2009) databases with an E-value cut-off of 10⁻⁴.

Additionally, all currently available sequence data (nucleotides, ESTs and peptides) for 5126 species from nine major early plant lineages (Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, Charales, liverworts, bryophytes, hornworts and lycophytes) and the trace data for a liverwort species, Marchantia polymorpha, were downloaded from NCBI GenBank (Table S2). For these nucleotide, EST, and trace sequences, TBLASTN was first implemented to find potential hits for the NBS domain under a threshold of 1. For significant hits, the corresponding amino acid sequences given by high-scoring segment pairs (HSPs) were retrieved to assess the presence or absence of the NBS domain by Pfam. For the peptide sequences, Pfam searches were directly performed to detect their domain architecture. The amino acid sequences of these verified NBS domains were then analyzed by BLASTP against all other NBS-encoding genes used in this study to assess their relatedness with TIR- and nonTIR-type NBS domains.

For the other essential genes in plant disease resistance response pathways (Table S2), we first identified their key functional domains using Pfam. Then all known accessions encoding those domains were downloaded from EBI UniProt based on Pfam species distribution analysis results (Finn et al., 2009).

Sequence alignment and phylogenetic reconstruction

For all NBS/NACHT-encoding genes in our data set, the amino acid sequences of their NBS or NACHT region were aligned using MUSCLE v3.8 (Edgar, 2004). All alignments used in this study are shown in Notes S1.
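Distance-based tree building starts from aligned regions such as these. As a minimal sketch, the Poisson-corrected distance (the correction named for the neighbor-joining trees in this section) between two aligned sequences is d = −ln(1 − p), where p is the proportion of compared sites that differ. The function name and the pairwise deletion of gapped columns are assumptions made for illustration; the text does not state how gaps were treated.

```python
import math


def poisson_distance(seq_a, seq_b, gap="-"):
    """Poisson-corrected distance d = -ln(1 - p) for two aligned sequences."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must come from the same alignment")
    compared = 0
    differences = 0
    for a, b in zip(seq_a, seq_b):
        if a == gap or b == gap:
            continue  # pairwise deletion of gapped columns (an assumption)
        compared += 1
        if a != b:
            differences += 1
    if compared == 0:
        raise ValueError("no ungapped columns to compare")
    p = differences / compared
    if p >= 1.0:
        return math.inf  # correction is undefined when every site differs
    return -math.log(1.0 - p)


# Two short made-up aligned fragments: 7 compared sites, 1 difference
print(poisson_distance("GIGKTTLA", "GMGKTT-A"))
```

A matrix of such pairwise distances is the usual input to a neighbor-joining routine.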
Phylogenetic trees were constructed based on the neighbor-joining (NJ) method with a Poisson correction model using MEGA v4.0 (Tamura et al., 2007). The confidence for each branching node was assessed by bootstrap analysis with 1000 replicates (Felsenstein, 1985).

Identification and analysis of conserved motifs within each characteristic domain

For all NBS/NACHT-encoding genes sampled in our study, MEME 4.0.0 (Bailey & Elkan, 1994) was employed to identify and analyze conserved motifs among amino acid sequences of NBS/NACHT regions. Based on expectation maximization, conserved motifs were identified in a group of sequences without a priori assumptions about their alignments. The individual profile for each conserved motif was described and a summary report was generated after motif summary tiling. Only those conserved motifs with position P-values ≤ 10⁻⁴ and no overlap with each other were reported. The motif occurrence profile of the NBS/NACHT region for each gene in the MEME motif summary report was assigned


to different groups corresponding to ten phylogenetic clades (Eubacteria, Protista, Fungi, Bryophyta, Lycopodiophyta, Gymnospermophyta, Angiospermophyta, Protostomia, Echinodermata, and Chordata). The motif occurrence percentages of these ten motifs were determined for each group as the motif usage frequencies. For other essential genes that participate in the plant disease resistance response, similar analyses were performed. The number of conserved motifs in our initial setting varied from three to ten according to the length of the corresponding domains.

Ancestral state reconstruction

The evolutionary age of the domain inferred by ancestral state reconstruction (ASR) was used to give further support to inferences regarding when and where the major domain fusion events (NBS with LRR, NACHT with LRR and also LRR with Pkinase) occurred. ASR using maximum likelihood (ML) methods was performed in MESQUITE (v2.74) (Maddison & Maddison, 2010). For the ML-based ASR, the Markov k-state 1 parameter (Mk1) model was employed. Character states and the corresponding relative likelihood supports were mapped onto the phylogeny of those major evolutionary lineages.
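To make the Mk1 model concrete, the sketch below computes its transition probabilities and the relative likelihoods of the two states (0 = no domain fusion, 1 = domain fusion; this coding is an assumption) at a single node whose descendants are all tips. It is a simplified illustration, not a reproduction of the MESQUITE analysis: the rate is fixed rather than estimated by maximum likelihood, and a full reconstruction would apply the same calculation recursively over the whole tree. The function names, tip states and branch lengths are hypothetical.

```python
import math


def mk1_transition(t, rate=1.0, k=2):
    """Mk1 transition probabilities over a branch of length t.

    Returns (p_same, p_diff): the probability of ending in the starting
    state, and of ending in one particular different state.
    """
    decay = math.exp(-k * rate * t)
    p_same = 1.0 / k + (k - 1.0) / k * decay
    p_diff = 1.0 / k - decay / k
    return p_same, p_diff


def relative_likelihoods(tip_states, branch_lengths, rate=1.0, k=2):
    """Relative likelihood of each state at a node whose children are tips."""
    likelihoods = []
    for state in range(k):
        like = 1.0
        for observed, t in zip(tip_states, branch_lengths):
            p_same, p_diff = mk1_transition(t, rate, k)
            like *= p_same if observed == state else p_diff
        likelihoods.append(like)
    total = sum(likelihoods)
    return [like / total for like in likelihoods]


# Two close tips with the fusion (1) and one distant tip without it (0)
print(relative_likelihoods([1, 1, 0], [0.1, 0.1, 0.5]))
```

The returned proportions correspond to the kind of pie chart drawn at each node in an ancestral state reconstruction figure.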

Results

Genome-wide identification of NBS-LRR and LRR-RLK genes in multiple representative species

All genes encoding R gene characteristic domains (e.g. NBS, LRR and Pkinase) were identified in our sampled genomes (Table S1) based on a Pfam homologous search. Interestingly, all these characteristic domains can be detected in several eubacteria and archaebacteria (Table 1). For example, in our sampled six eubacteria genomes, 15 NBS-encoding, 153 Pkinase-encoding, and seven LRR-encoding genes were identified (Table 1). This observation, together with the wide distribution of these domains in nearly all eukaryotic phyla, collectively suggests very ancient origins of these functional modules. Yet no domain fusion events were detected in our sampled eubacteria, archaebacteria and fungi genomes (Table 1). A scan of all currently available sequence data for eubacteria, archaebacteria and fungi by BLAST further confirmed this finding. Therefore, the domain fusion events between NBS and LRR and also between LRR and Pkinase may have occurred exclusively in eukaryotes after the divergence of fungi.

In our sampled genomes, the first structurally complete LRR-RLK gene that encodes both LRR and Pkinase domains was observed in protists. This observation suggests that the origin of the LRR-RLK genes occurred at an early stage of eukaryotic evolution, which is confirmed by our ASR analysis based on the ML method (Fig. 1). Contrasting with their loss in fungi and metazoans, the number of LRR-RLK genes in land plants was generally one or two orders of magnitude higher than in other phyla (Table 1). Very possibly, such substantial expansion of this gene family in the land plants was driven by their functional specialization in plant immunity (Tang et al., 2010).


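Table 1 Genes encoding NBS/NACHT/Toll/interleukin-1 receptor (TIR)/Pkinase/leucine-rich repeat (LRR) domains identified in a genome-wide survey

Column groups: NBS (T-NB-L, X-NB-L, T-NB, X-NB); NACHT (NA-L, NA); TIR (L-T, T); Pkinase (L-Pk, Pk); LRR (L).

Kingdom | Phylum or clade | Species | T-NB-L | X-NB-L | T-NB | X-NB | NA-L | NA | L-T | T | L-Pk | Pk | L
Eubacteria | | Escherichia coli | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 1
Eubacteria | | Streptomyces coelicolor | 0 | 0 | 0 | 6 | 0 | 3 | 0 | 3 | 0 | 33 | 0
Eubacteria | | Rhodococcus opacus | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 25 | 0
Eubacteria | | Trichodesmium erythraeum | 0 | 0 | 0 | 4 | 0 | 8 | 0 | 0 | 0 | 40 | 3
Eubacteria | | Nostoc punctiforme | 0 | 0 | 0 | 5 | 0 | 3 | 2 | 1 | 0 | 54 | 4
Eubacteria | | Synechococcus sp. RCC307 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
Archaebacteria | | Methanocaldococcus jannaschii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0
Archaebacteria | | Natronomonas pharaonis | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0
Archaebacteria | | Methanospirillum hungatei | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 6 | 0
Archaebacteria | | Sulfolobus tokodaii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 7 | 0
Archaebacteria | | Methanosarcina barkeri | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 1
Archaebacteria | | Methanosarcina acetivorans | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 0 | 0 | 1
Protista | Algae-like (plant-like) | Cyanidioschyzon merolae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 59 | 9
Protista | Algae-like (plant-like) | Chlamydomonas reinhardtii | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 2 | 2 | 447 | 71
Protista | Algae-like (plant-like) | Volvox carteri | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 366 | 60
Protista | Protozoa-like (animal-like) | Paramecium tetraurelia | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 3 | 2363 | 172
Protista | Protozoa-like (animal-like) | Tetrahymena thermophila | 0 | 0 | 0 | 0 | 0 | 2 | 0 | 0 | 1 | 870 | 110
Protista | Protozoa-like (animal-like) | Monosiga brevicollis | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 2 | 12 | 338 | 54
Fungi | | Saccharomyces cerevisiae | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 114 | 9
Fungi | | Schizosaccharomyces pombe | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 105 | 9
Fungi | | Aspergillus terreus | 0 | 0 | 0 | 2 | 0 | 4 | 0 | 0 | 0 | 116 | 9
Fungi | | Coprinus cinereus | 0 | 0 | 0 | 0 | 0 | 3 | 0 | 0 | 0 | 139 | 12
Fungi | | Magnaporthe grisea | 0 | 0 | 0 | 2 | 0 | 2 | 0 | 0 | 0 | 0 | 0
Fungi | | Chaetomium globosum | 0 | 0 | 0 | 1 | 0 | 11 | 0 | 0 | 0 | 103 | 9
Plantae | Bryophyta | Physcomitrella patens | 4 | 0 | 2 | 20 | 0 | 0 | 0 | 4 | 105 | 536 | 170
Plantae | Lycopodiophyta | Selaginella moellendorffii | 0 | 2 | 0 | 14 | 0 | 0 | 0 | 1 | 158 | 848 | 261
Plantae | Gymnospermophyta | Pinus monti | > 67(a) | > 61(a) | ?(b) | ?(b) | ?(b) | ?(b) | ?(b) | ?(b) | ?(b) | ?(b) | ?(b)
Plantae | Angiospermophyta | Arabidopsis thaliana | 93(c) | 53(c) | 17(c) | 4(c) | 0 | 0 | 0 | 39 | 220 | 783 | 264
Plantae | Angiospermophyta | Vitis vinifera | 97(c) | 362(c) | 14(c) | 62(c) | 0 | 0 | 1 | 29 | 274 | 986 | 346
Plantae | Angiospermophyta | Populus trichocarpa | 78(c) | 252(c) | 10(c) | 78(c) | 0 | 0 | 1 | 116 | 353 | 1348 | 471
Plantae | Angiospermophyta | Oryza sativa | 0(c) | 464(c) | 0(c) | 52(c) | 0 | 0 | 0 | 1 | 302 | 1152 | 173
Plantae | Angiospermophyta | Sorghum bicolor | 0 | 184 | 0 | 61 | 0 | 0 | 0 | 2 | 228 | 927 | 179
Animalia | Protostomia | Nematostella vectensis | 0 | 0 | 0 | 1 | 3 | 13 | 0 | 7 | 1 | 327 | 129
Animalia | Protostomia | Caenorhabditis elegans | 0 | 0 | 0 | 2 | 0 | 1 | 1 | 5 | 1 | 561 | 73
Animalia | Echinodermata | Strongylocentrotus purpuratus | 0 | 0 | 0 | 2 | 208(d) | 41 | 222(d) | 49 | 3 | 386 | 171
Animalia | Chordata | Branchiostoma floridae | 0 | 0 | 0 | 8 | 40 | 69 | 28 | 64 | 20 | 843 | 1399
Animalia | Chordata | Ciona intestinalis | 0 | 0 | 0 | 0 | 2 | 17 | 0 | 1 | 1 | 295 | 107
Animalia | Chordata | Mus musculus | 0 | 0 | 0 | 2 | 27 | 12 | 12 | 11 | 4 | 662 | 232
Animalia | Chordata | Homo sapiens | 0 | 0 | 0 | 5 | 28 | 10 | 11 | 10 | 6 | 702 | 281

(a) Data from Liu & Ekramoddoullah (2003, 2007).
(b) The question mark here indicates that data are currently not available.
(c) Data from Yang et al. (2008).
(d) Data from the Sea Urchin Genome Sequencing Consortium (Sodergren et al., 2006).
The threshold of the hmmpfam search here is E-value < 10⁻⁴. T, TIR domain; NB, nucleotide-binding site (NBS) domain; NA, NACHT domain; L, LRR domain; Pk, Pkinase domain. The number of genes encoding L-Pk and encoding only L may be underestimated because of the difficulty of detecting the LRR domain.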

Compared with LRR-RLK genes, NBS-LRR genes showed a much later origin and are probably specific to land plants according to both our genome-wide survey and the ASR (Table 1 and Fig. 1). In the land plant lineage, four NBS-LRR genes were first detected in the moss Physcomitrella patens, associated with the

TIR domain in their N termini (Table 1). This observation is consistent with previous studies showing the presence of TIR-NBS-like and TIR-encoding genes in this species (Akita & Valkonen, 2002; Meyers et al., 2002). As P. patens diverged at an early stage of land plant evolution, these NBS-LRR genes might

[Figure 1 panels: LRR-RLK gene; TLR gene; NBS-LRR gene; NLR gene.]

Fig. 1 Maximum likelihood (ML) ancestral state reconstruction for the domain fusion events in the evolution of plant and animal immune receptors. White circles, without domain fusion; black circles, with domain fusion. The pie chart at each node shows the relative likelihood of these two possible states at that particular node. LRR, leucine-rich repeat; NBS, nucleotide-binding site; NLR, Nod-like receptor; RLK, receptor-like kinase; TLR, Toll-like receptor.

represent the archetype of NBS-LRR genes in land plants. In spikemoss (Selaginella moellendorffii), a total of 16 NBS-encoding genes were detected and two of them also encoded a C-terminal LRR domain (Table 1). In contrast to the limited number of NBS-LRR genes in these two basal plants, substantial lineage-specific gene expansion occurred thereafter and hundreds of NBS-LRR genes were found in gymnosperm and angiosperm genomes. In addition, as shown in Table 1, TIR-NBS-LRR genes were first observed in P. patens, whereas nonTIR-NBS-LRR genes were first found in S. moellendorffii, both of which branched before the split of gymnosperms and angiosperms. As P. patens represents an evolutionarily older lineage (bryophytes) than S. moellendorffii (lycophytes), this observation also hints at an earlier origin of TIR-NBS-LRR than their nonTIR counterparts.

Further investigation of NBS-LRR genes in early land plants

To improve the resolution of the evolutionary history of NBS-LRR genes in early land plants, a further scan of all currently available early plant sequence data (nucleotides, EST, peptides and

trace) in GenBank, covering 5126 species from nine major early plant lineages (Chlorokybales, Klebsormidiales, Zygnematales, Coleochaetales, Charales, liverworts, bryophytes, hornworts and lycophytes) (Qiu & Palmer, 1999), was conducted (Table S2). Based on this data set, 16 NBS-encoding fragments were identified and the existence of the NBS domain in the plant lineage can be further dated back to the origin of the Coleochaetales (Table S3). However, no evidence of the fusion between the NBS domain and either the LRR or the TIR domain was found based on this data set. To characterize these 16 NBS-encoding fragments, their amino acid sequences were used to run BLASTP against all other NBS-encoding genes used in our study. The best hits for the three NBS-encoding fragments in Coleochaetales were NBS domains from one NBS-WD40 gene in archaebacteria and two TIR-NBS-LRR genes in dicot plants (Table S4). For the 12 NBS-encoding fragments in liverworts, half of them had the best hits with TIR-NBS-LRR or TIR-NBS genes and the other half showed the highest similarity with two NBS genes, three PK-NBS genes, and one NBS-LRR gene (Table S4). Although the latter six genes with best hits do not encode TIR domains, their

[Figure 2 panels: (a), (b), (c). Legend: Eubacteria, Archaebacteria, Fungi, Protista, Protostomia, Echinodermata, Chordata, Bryophyta, Lycopodiophyta, Gymnospermophyta, Angiospermophyta. Scale bars: 0.2, 0.1, 0.2.]

Fig. 2 Phylogenetic analyses for nucleotide-binding site (NBS) and NACHT domains. NBS/NACHT domains of representative NBS/NACHT-encoding genes in multiple species were used to construct this phylogenetic tree. (a) The entire tree combining the NBS and NACHT domains and separate trees for (b) NBS and (c) NACHT domains are shown. NBS and NACHT domains are denoted by triangles and squares, respectively, with different colors to represent different phylogenetic clades. Blue branches represent the Toll/interleukin-1 receptor (TIR) clades and the light green branches represent the TIR-type NBS-like PpC NBS gene family in Physcomitrella patens. The red branches denote those function-verified NBS- or NACHT-encoding genes.

NBS domains were either within or adjacent to the monophyletic clades of the TIR-type NBS domain in our later phylogenetic analysis (Fig. 2). By contrast, the NBS-encoding fragment found in lycophytes (Isoetes lacustris) had the best hit for a typical nonTIR-type NBS domain from a rice CC-NBS-LRR gene (Table S4). As Coleochaetales and liverworts diverged before the split of bryophytes and lycophytes, this result provided further support for the theory that the TIR-type NBS is more closely related to the NBS domains in early land plants and the nonTIR-type NBS is a later, derived innovation that arose before the split of lycophytes.

Independent domain recruitments in the evolution of plant and animal immune receptor genes

The genomic investigation of the TLR and NLR genes of animal genomes revealed a similar story. First, as with our observations for Pkinase, NBS and LRR domains, the phyletic distribution pattern of NACHT and TIR domains suggested that these two domains also had ancient origins before the split of prokaryotes and eukaryotes (Table 1). Moreover, it is worth noting that almost all of our identified TLRs and NLRs exist only in metazoans. There are a total of four nonmetazoan genes that encode both the LRR and TIR domains, two from the bacterium Nostoc

punctiforme and two from land plants (grape and poplar). For the two bacterial genes, some other domains, such as Ras and Miro, were found between the LRR and TIR domains. Therefore, these two genes cannot be considered as typical TLR genes like those identified in metazoans. For the two genes in grape and poplar, we ran a BLAST search against the GenBank EST database and found no hits with > 85% sequence similarity. In addition, we re-annotated these two genes and found no support for their LRR-TIR domain architecture. Therefore, they probably result from annotation error. In metazoans, NLR genes were first detected in the sea anemone Nematostella vectensis, and TLR genes were first observed in the nematode Caenorhabditis elegans (Table 1). Based on the presence/absence pattern of these two genes and also our ASR (Fig. 1), we propose that the fusion events between the LRR and TIR domains and also between the NACHT and LRR domains both occurred in the early evolution of metazoans, before the split of protostomia and deuterostomia. In deuterostomia, the numbers of both TLR and NLR genes appear to reach a relatively stable level of approximately two dozen, with striking exceptions in the sea urchin (Strongylocentrotus purpuratus) and sea squirt (Ciona intestinalis) (Table 1). For the sea urchin, > 200 TLR and NLR genes were reported, showing substantial expansion for these two immune system genes (Rast et al., 2006; Sodergren

[Figure 3: presence (+) / absence matrix. Columns, individual domains: NBS (NB), NACHT (NA), Phosphokinase (PK), LRR (L), TIR (T), HEAT (H), TPR (TP), WD40 (W). Columns, domain combination patterns: NA-H, NA-W, NA-L, NB-TP, NB-W, T-NB, NB-L (TNL, Non-TNL), L-PK. Rows: Eubacteria, Archaebacteria, Protista, Fungi, Protostomia, Echinodermata, Chordata, Bryophyta, Lycopodiophyta, Gymnospermophyta, Angiospermophyta.]

Fig. 3 Phyletic distribution and domain organization patterns of nucleotide-binding site (NBS)-, NACHT- and Pkinase-encoding genes. The distribution and domain organization patterns of NBS-, NACHT- and Pkinase-encoding genes were mapped to the phyletic tree. Plus (+) and minus (–) indicate whether the domain or organization pattern was detected in the corresponding phylogenetic clade under the cut-off of E-value < 10⁻⁴ in the hmmpfam search. It is worth noting that two major innovations took place in the shift from prokaryotes to eukaryotes (denoted by light-gray blocks) and from unicellular organisms to multicellular organisms (denoted by dark-gray blocks), respectively. LRR, leucine-rich repeat; TIR, Toll/interleukin-1 receptor.

et al., 2006). In the genome of the sea squirt, however, a considerable number of gene loss events have been observed, leaving only two NLRs and no TLRs in this genome (Table 1).

While these findings are similar to our findings for the plant NBS-LRR and LRR-RLK genes, it is worth noting that their shared TIR and LRR domains seemed to be recruited into animal and plant immune receptor genes independently. To further test this hypothesis, we examined the phyletic distribution pattern of domain organization in different evolutionary lineages (Fig. 3). We found that all the characteristic domains (e.g. NBS, NACHT, TIR, LRR, TPR, HEAT and WD40) originated before the split of prokaryotes and eukaryotes. In prokaryotes, the organizations of NBS-TPR and NBS-WD40 were fairly common, as were the organizations of NACHT-HEAT and NACHT-WD40. By contrast, the novel organization of NBS-LRR and NACHT-LRR dominated the land plant and metazoan lineages, respectively. The organization of LRR-Pkinase and LRR-TIR also emerged in different evolutionary lineages. Therefore, the innovations of domain organization patterns in plant and animal immune receptor genes seemed to be two independent processes that occurred in different evolutionary lineages and stages.

Contrasting evolutionary patterns of NBS and NACHT domains

Two clearly separated clades, the NBS clade and the NACHT clade, were observed in the phylogenetic tree based on the NBS/NACHT region (Fig. 2 and Notes S1). In particular, the NBS and NACHT sequences from eubacteria and archaebacteria were assigned to the corresponding clades without mixture. This observation suggests that there were already some fundamental differences between the NBS and NACHT domains before the split of eubacteria, archaebacteria, and eukaryotes. Therefore, these two apparently similar domains must have either diverged a long time ago or actually originated independently.
In addition, in contrast to the substantially diversified nonTIR NBS subclass, a striking monophyletic clade of the TIR-type NBS subclass was observed both in the phylogenetic tree based on the NBS/NACHT combined alignment and in the tree based on the NBS-only alignment, which further suggests a distinct evolutionary history and probable functional divergence of TIR and nonTIR NBS subclass genes. To further explore the evolutionary relationship between the NBS and NACHT domains, the ten most conserved motifs within these two domains were identified by MEME using the NBS/NACHT combined data (Table 2 and Fig. 4). Consistent with previous studies on the NBS domain, a number of functionally critical motifs were found in our analysis, including the P-loop (motif 1), kinase-2 (motif 5), GLPL (motif 4), RNBS-B (motif 3) and other previously described motifs (Meyers et al., 1999), as shown in Table 2. In addition to comparing these motifs in TIR- and nonTIR-type NBS domains in land plants, as was generally done in previous studies, the much larger data set in our study enabled us to map the motif combination and usage frequency patterns of these conserved motifs on a phylogeny that included all major kingdoms of organisms. Clearly

Table 2 Conserved motifs identified in the nucleotide-binding site (NBS) and NACHT regions of aligned sequences

Motif ID (a)   Motif name (b)        Consensus sequence (a)
Motif 1        Kinase a/P-loop       IVGMGGIGKTTLAK
Motif 2        RNBS-D (nonTIR)       LPSHLKQCFLYCSIFPEDY
Motif 3        Kinase-3/RNBS-B       GxGSRIIVTTRNExVAxxM
Motif 4        GLPL                  KCGGLPLAIKVLGS
Motif 5        Kinase-2              KGKRFLLVLDDVWN
Motif 6        RNBS-A (nonTIR)       RVKxHFDLRAWVCVSQDFD
Motif 7        Unknown               KEELIRLWIAEGFI
Motif 8        RNBS-C                SEEEAWxLFxKxAF
Motif 9        Unknown               xDxILPILKLSYDD
Motif 10       RNBS-D (TIR)          IFLDIACFFKGEDKDYVSRILD

(a) Motif IDs and consensus sequences were generated by MEME analysis.
(b) Motif names are adapted from Meyers et al. (1999).
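As a rough illustration of how the consensus sequences in Table 2 could be used to locate motifs in a protein sequence, the sketch below turns a consensus string into a regular expression. It assumes that the lowercase "x" in a consensus stands for any residue, and it uses exact matching at all other positions; MEME itself scores motifs probabilistically, so this is a deliberate simplification. The toy sequence and the function names are hypothetical.

```python
import re

# Consensus sequences copied from Table 2 (a subset of the ten motifs).
CONSENSUS = {
    "motif 1 (P-loop)": "IVGMGGIGKTTLAK",
    "motif 3 (RNBS-B)": "GxGSRIIVTTRNExVAxxM",
    "motif 4 (GLPL)": "KCGGLPLAIKVLGS",
}


def consensus_to_regex(consensus):
    """Compile a consensus string; 'x' is assumed to match any single residue."""
    pattern = "".join("." if ch == "x" else re.escape(ch) for ch in consensus)
    return re.compile(pattern)


def find_motifs(sequence, consensus_by_name=CONSENSUS):
    """Return motif name -> 0-based start of the first exact consensus match."""
    hits = {}
    for name, consensus in consensus_by_name.items():
        match = consensus_to_regex(consensus).search(sequence)
        if match:
            hits[name] = match.start()
    return hits


# Hypothetical toy sequence: two residues, a P-loop instance, two residues,
# then an RNBS-B instance with arbitrary residues at the 'x' positions.
TOY_SEQUENCE = "MA" + "IVGMGGIGKTTLAK" + "QQ" + "GAGSRIIVTTRNEKVAAAM"
HITS = find_motifs(TOY_SEQUENCE)
print(HITS)
```

Real domain sequences diverge from the consensus at many positions, so a position-specific scoring approach would be needed for anything beyond illustration.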


[Fig. 4 bar plots. Panel labels: NBS, NACHT; Eubacteria, Archaebacteria, Protista, Fungi, Protostomia, Echinodermata, Chordata, Bryophyta, Lycopodiophyta, Gymnospermophyta, Angiospermophyta; TIR type, nonTIR type.]

contrasting patterns were revealed in our analysis: while the NACHT domain remains quite conserved from prokaryotes to eukaryotes, the NBS domain shows considerable flexibility in different evolutionary lineages (Fig. 4). For the NBS domain, motifs 1, 3, 4, and 5 were well preserved, with > 80% usage frequency throughout the eukaryotic

Fig. 4 Fingerprint analyses for conserved motifs within nucleotide-binding site (NBS) and NACHT domains. The usage frequency of each conserved motif in the NBS and NACHT domains from different evolutionary clades was counted and is shown in the bar plot. Bars 1 to 10 represent different conserved motifs (motifs 1 to 10) and the height of each bar represents its specific usage frequency (from 0 to 1). The characteristic motifs of the Toll ⁄ interleukin-1 receptor (TIR) and nonTIR subclasses of the NBS domain are highlighted in red and blue, respectively.
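The usage frequencies plotted in Fig. 4 can be sketched as a simple tally. The example below assumes that the usage frequency of a motif in a clade is the fraction of domain sequences in that clade in which the motif was detected at least once; this definition, the function name and the toy records are assumptions made for illustration and are not taken from the study's data.

```python
from collections import defaultdict

MOTIF_IDS = range(1, 11)  # motifs 1 to 10, as in Table 2


def motif_usage_frequency(domains):
    """domains: iterable of (clade, motifs found in one domain sequence).

    Returns clade -> {motif number: fraction of sequences containing it}.
    """
    totals = defaultdict(int)
    counts = defaultdict(lambda: defaultdict(int))
    for clade, motifs in domains:
        totals[clade] += 1
        for motif in set(motifs):  # count each motif once per sequence
            counts[clade][motif] += 1
    return {
        clade: {m: counts[clade][m] / totals[clade] for m in MOTIF_IDS}
        for clade in totals
    }


# Hypothetical toy records, NOT data from the study.
TOY_DOMAINS = [
    ("TIR type", {1, 3, 4, 5, 10}),
    ("TIR type", {1, 3, 4, 5}),
    ("nonTIR type", {1, 2, 3, 4, 5, 6, 7}),
    ("nonTIR type", {1, 2, 4, 5, 6}),
]
FREQ = motif_usage_frequency(TOY_DOMAINS)
print(FREQ["TIR type"][10], FREQ["nonTIR type"][2])
```

Each value lies between 0 and 1, matching the bar heights described in the caption.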

lineages (Fig. 4). The remaining motifs, however, showed considerable variation in their usage frequency across different lineages. Within the land plant lineage, motifs 2, 6 and 7 appear to be specific to the nonTIR-type NBS, while motif 10 was detected only in the TIR-type NBS. Notably, judging from the motif usage frequency pattern, the NBS domain of the TIR subclass is more

similar to the prokaryotic NBS domain than its nonTIR counterparts (Fig. 4). This observation provided another clue to suggest that the TIR-type NBS-LRR genes in plants may be more evolutionarily ancient than nonTIR-NBS-LRR genes. For the NACHT domain, a much simpler pattern was observed, in which motifs 1, 3, and 5 were detected with constantly high representation across different evolutionary lineages. Such clearly contrasting patterns of motif combinations within the NBS and NACHT domains were in concordance with our phylogenetic analysis above, both of which support an independent origin and evolution of the NBS and NACHT domains. In addition, the finding that motif 1 (P-loop) is shared between the NBS and NACHT domains here is consistent with a previous observation of the P-loop in multiple ATP- or GTP-binding domains (Saraste et al., 1990), thus highlighting the structural and functional importance of this element.

Other essential genes in the plant disease resistance response pathway

We extended our investigation to a number of important regulator and cofactor genes in the plant disease resistance response pathway (Table 3 and Fig. S1) and found considerable variation with respect to the origins and evolutionary patterns of their characteristic domains. The key domains of NDR1, HSP90, EDS1, PAD4, SAG101, and SID2 were found in multiple eubacteria species, suggesting that their origins predate the split of prokaryotes and eukaryotes. The corresponding domains of SGT1, RAR1, and WRKY were found to first appear in protists and fungi, and these domains were widespread throughout the eukaryotic lineage. The AvrRpt cleavage domain encoded by RIN4, however, seems to be specific to land plants, and may have a co-evolutionary relationship with plant NBS-LRR genes. Nevertheless, it is worth noting that the origin of these key domains occurred no later than that of the corresponding immune receptor genes in the respective signaling pathway (e.g.
RIN4 vs NBS-LRR and WRKY vs LRR-RLK). Therefore, it seems that, concurrently with the origin of the NBS-LRR and LRR-RLK genes, other necessary components in the respective signaling pathways were established, thus ensuring successful pathogen recognition and downstream signal transduction in plants. Conserved motifs within these key domains were also identified using the MEME program and their combinatorial patterns and usage frequencies were analyzed (Fig. S2). We found diverse patterns of motif evolution for these domains. For some domains, such as RIN4 and RAR1, the motif usage patterns seemed to be quite constant in evolution (Fig. S2). In contrast, the motif usage patterns of some other domains showed dramatic changes across different evolutionary lineages (Fig. S2).

Evolutionary conservation of the flagellin receptor in land plants

A member of the LRR-RLK proteins called FLS2 has been well studied in Arabidopsis thaliana. Experiments have shown that this protein can detect a 22-amino acid fragment (flg22) of a


microbial flagellin polypeptide; detection initiates a cascade of immune defenses (Zipfel et al., 2004). Mutations in the FLS2 receptor lead to susceptibility to bacterial pathogens in A. thaliana (Chinchilla et al., 2006). Interestingly, Takai et al. (2008) found that the homologous gene in rice remains capable of recognizing flg22 and can also induce a corresponding immune response. In our study, we found a uniform copy number of FLS2-homologous genes in all our sampled land plants, usually as a single copy (Fig. 5). The protein IDs of these FLS2-homologous genes are GSVIVP00003414001 and GSVIVP00003415001 in V. vinifera, 197404 in P. trichocarpa, 5057881 in Sorghum bicolor, 119860 in S. moellendorffii and 151614 in P. patens. As shown in Fig. 5, these genes clustered as a single clade in the phylogenetic tree with high bootstrap support. This observation reflects the considerable evolutionary conservation of the flagellin perception strategy in the plant innate immune system. It will be interesting to investigate the functional conservation of these FLS2-homologous genes in other land plants in future experiments.

Discussion

The origin of plant NBS-LRR and LRR-RLK genes

Numerous experimental studies have revealed that all these major domains (e.g. NBS, LRR and Pkinase) are indispensable for the function of plant R genes (Dangl & Jones, 2001), suggesting that domain origin and fusion events are critical steps in the origin of plant R genes. Until whole-genome sequencing data recently became available in a series of land plant species, the two major approaches for making inferences about domain appearance and fusion history were BLAST-based homology searches and degenerate primer-based PCR. For example, in 2002, the discovery of two TIR-encoding sequences in the bryophyte P. patens by a BLAST search against an EST database highlighted the widespread distribution of this domain in land plants (Meyers et al., 2002). In the same year, Akita & Valkonen (2002) identified several TIR-type-like NBS-encoding fragments (without the TIR and LRR regions) in this species by degenerate primer-based PCR. However, the inherent bias underlying these two approaches meant that the resulting data were incomplete. Without obtaining the full-length NBS-LRR and TIR-NBS-LRR genes, it is still difficult to make inferences about the history of domain fusions. In this study, in which not only an array of representative land plant genomes but also all currently available early land plant sequence data were included, we were able to identify and date these important domain fusion events with much higher confidence and better resolution. First, the NBS, LRR, and TIR domains are all very ancient, with their origins probably predating the split of prokaryotes and eukaryotes. In plant lineages, the emergence of the NBS domain can be dated back to the Coleochaetales. Secondly, the NBS-LRR and TIR-NBS domain fusion events both occurred no later than the divergence of bryophytes; no fusion event was found in the currently available liverwort sequence data.
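The "no later than" reasoning used to date these events can be made concrete with a small sketch: given lineages ordered by divergence and the features observed in each, the earliest lineage carrying a feature gives the latest possible time of its origin. The lineage ordering and the observation table below are a simplified transcription of the statements in this section (NBS in the Coleochaetales, no fusion in liverwort data, NBS-LRR and TIR-NBS fusions in bryophytes); the names and the data structure are illustrative assumptions.

```python
# Lineages ordered from earliest- to latest-diverging (an assumed ordering).
LINEAGE_ORDER = ["Coleochaetales", "liverworts", "bryophytes"]

# Features reported as observed in each lineage, simplified from the text.
OBSERVED = {
    "Coleochaetales": {"NBS"},
    "liverworts": {"NBS"},                          # no fusion event found
    "bryophytes": {"NBS", "NBS-LRR", "TIR-NBS"},
}


def earliest_lineage(feature, order=LINEAGE_ORDER, observed=OBSERVED):
    """Return the earliest-diverging lineage in which the feature is observed.

    The feature must have originated no later than the divergence of this
    lineage; None means it was not observed in any sampled lineage.
    """
    for lineage in order:
        if feature in observed.get(lineage, set()):
            return lineage
    return None


for feature in ("NBS", "NBS-LRR", "TIR-NBS"):
    print(feature, "->", earliest_lineage(feature))
```

Because absence from sparse sequence data is weak evidence, the bound can only move earlier as more early land plant genomes become available.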
With more whole-genome sequencing data from early land plants becoming


Table 3 Phyletic distribution of characteristic domains within some essential signaling components in the plant disease resistance response pathway across multiple species

Family name (Pfam domain): NDR1 (HIN1, PF03168); Rin4 (AvrRpt-cleavage, PF05627); HSP90 (HSP90); SGT1 (SGS, PF05002); Rar1 (CHORD, PF04968); EDS1, PAD4, SAG101 (Lipase_3, PF01764); WRKY (WRKY, PF03106); SID2 (EDS16) (Chorismate_bind, PF00425).

Each cell gives Species (a) / Sequences (b) / Regions (c).

Clade             NDR1              Rin4              HSP90             SGT1              Rar1              EDS1/PAD4/SAG101  WRKY              SID2 (EDS16)
Eubacteria        159/195/207       0/0/0             941/1000/1098     0/0/0             0/0/0             339/535/535       0/0/0             16/175/175
Archaebacteria    20/23/26          0/0/0             0/0/0             0/0/0             0/0/0             1/1/1             0/0/0             10/99/99
Protista          3/3/3             1/10/11           215/283/283       28/45/45          17/20/34          26/208/214        7/13/17           13/40/40
Fungi             0/0/0             0/0/0             85/100/102        61/63/63          34/37/72          91/349/349        5/10/11           59/154/154
Protostomia       0/0/0             0/0/0             75/122/124        4/5/5             6/6/12            4/65/65           0/0/0             0/0/0
Echinodermata     0/0/0             0/0/0             2/2/2             1/2/2             18/20/40          0/0/0             0/0/0             0/0/0
Chordata          0/0/0             0/0/0             60/210/223        17/42/42          14/26/51          7/27/27           0/0/0             0/0/0
Bryophyta         1/25/27           1/3/3             1/11/11           1/1/1             1/2/3             1/43/43           2/46/49           1/5/5
Lycopodiophyta    1/9/10            1/3/3             1/9/9             1/1/1             1/2/3             1/50/50           1/29/36           1/5/5
Gymnospermophyta  25/57/58          16/137/174        3/4/4             1/1/1             1/1/2             1/22/22           3/67/68           ?/?/? (d)
Angiospermophyta  41/562/587        1/10/11           13/68/76          24/51/51          17/34/51          31/508/510        111/1446/1718     18/77/77

(a) 'Species' is the number of species in which the domain is found.
(b) 'Sequences' is the number of unique sequences in which each domain is found.
(c) 'Regions' is the number of domains that are found in all sequences in the alignments.
(d) The question mark here indicates that data are currently not available.


Fig. 5 The FLS2 clade in the phylogenetic tree of leucine-rich repeat (LRR)-Pkinase genes across multiple species. Representative LRR-receptor-like kinase (RLK) genes across multiple species were used to construct this phylogenetic tree. Both a snapshot of the homologous gene group including the FLS2 gene (FLS2 clade) (a) and the entire tree (b) are shown. Different colors represent different phylogenetic clades (Eubacteria, Archaebacteria, Fungi, Protostomia, Echinodermata, Chordata, Bryophyta, Lycopodiophyta, Gymnospermophyta and Angiospermophyta). Blue branches in the LRR-RLK gene tree denote the FLS2 clade.

available in the future, we should be able to further narrow the window of time in which these domain fusion events could have occurred.

A rather controversial issue in the plant R gene community is the chronological order of the origin of TIR-type and nonTIR-type NBS-LRR genes. According to the phylogenetic tree, nonTIR-NBS-LRR genes appear to be much more diverse than their TIR counterparts, and it is therefore tempting to speculate that nonTIR-NBS-LRR genes are more ancient than the TIR type (Cannon et al., 2002). In 2003, Meyers et al. also found that nonTIR-NBS-LRR genes show greater diversity in their intron positions and phases, which makes this hypothesis even more appealing. However, our analysis showed that the NBS domains found in early land plants (Coleochaetales, liverworts, and bryophytes) are closer to the TIR-type NBS domain at the sequence level (Table S4 and Fig. 4). Moreover, across the early land plant lineages, the complete TIR-NBS-LRR genes were first identified in bryophytes, whereas the complete nonTIR-NBS-LRR genes were found in lycophytes, which diverged c. 50 million yr later than bryophytes. Finally, compared with the nonTIR-type NBS domain, the motif usage frequency of the TIR-type NBS domain in land plants showed much higher similarity to that of the NBS domain in several distant evolutionary lineages such as eubacteria, archaebacteria and fungi, suggesting that they are closer to the ancestral state. Therefore, collectively, our findings support an earlier origin of the TIR-type NBS than the nonTIR type. A higher selective constraint for TIR-NBS-LRR genes or a higher diversifying selection pressure for nonTIR-NBS-LRR genes may explain the different patterns of these two types of NBS-LRR genes found in phylogenetic studies.

Compared with NBS-LRR genes, LRR-RLK genes seem to be more ancient and probably originated in the early-branching


unicellular eukaryotes, after the split of the fungal lineage. The timeline of the origin of plant R genes revealed in our study provides further support for the widely known PAMP-triggered immunity–effector-triggered immunity (PTI-ETI) model (Jones & Dangl, 2006). The first layer of defense provided by transmembrane receptors such as LRR-RLKs can recognize conserved features of certain pathogen-/microbe-associated molecules (e.g. lipopolysaccharide, flagellin and peptidoglycans) and initiate downstream self-protection responses, collectively known as PAMP-triggered immunity (PTI) (Jones & Dangl, 2006). A further investigation of LRR-RLK genes in these early-branching unicellular eukaryotes will shed light on the ancestral function of these genes in their evolutionary history. However, just one layer of defense does not provide plants with sufficient protection. Some successful pathogens gradually evolved the ability to suppress or evade the surveillance of LRR-RLKs by deploying certain virulence effectors (Jones & Dangl, 2006). This new challenge explains why another layer of defense, the intracellular NBS-LRR receptors, emerged in land plant genomes during the arms race. These new weapons are capable of specifically sensing the presence of corresponding virulence effectors and triggering the plant hypersensitive response. The consequent striking expansion of this gene family throughout the land plant lineages, together with its frequent recombination, provides a powerful arsenal for land plants, allowing them to evolve novel resistance specificities quickly and effectively to fight diverse pathogens in the terrestrial environment (Leister, 2004).

Convergent evolution of the plant and animal innate immune systems

Noting the constitutional and structural similarities of plant and animal immune receptors, it is tempting to speculate that plant and animal innate immune systems have a common ancestry. However, further reflection suggests that this may not be the case.
There are some noteworthy differences between plant and animal innate immune systems. First, while both transmembrane TLRs and cytoplasmic NLRs in animals are able to recognize conserved PAMPs (or MAMPs), such recognition in plants is exclusively localized to the cell membranes, and is performed by LRR-RLKs and other transmembrane PRRs (Girardin et al., 2002; Jones & Takemoto, 2004). Instead of performing intracellular recognition of PAMPs like animal NLRs, the NBS-LRRs in plants can recognize pathogen-specific effectors and mount effector-triggered immunity (ETI) (Jones & Dangl, 2006). Secondly, consistent with different recognition strategies, NBS-LRRs in plants and NLRs in animals differ considerably in copy number and the types of molecules that they can recognize; plant NBS-LRRs show much higher diversity and can recognize a much broader spectrum of pathogens than their counterparts in animals. Thirdly, the downstream signaling pathways in the animal and plant innate immune systems are also unlikely to be derived from a common origin. A large set of signaling components identified in these two systems are different. For example, the WRKY transcription factors critical in downstream activation of LRR-RLKs are absent from metazoans, whereas plants lack nuclear factor

(NF)-κB, an indispensable regulator for the animal immune response (Ausubel, 2005). These fundamental differences, together with the seemingly independent recruitment of those common domains encoded in plant and animal immune receptors, collectively suggest that plant and animal innate immune systems have different origins and their striking resemblance was shaped by convergent evolution. Given the compelling scenario of convergent evolution for plant and animal innate immune systems, it is of interest to ask why the apparently similar or even common functional modules were chosen for these immune systems and why they are assembled in such an analogous way. A noteworthy fact is that the NBS- and NACHT-encoding genes in plants, animals and also fungi, especially when found together with a C-terminal tandemly repetitive domain such as an LRR or WD40, are frequently associated with the ability to discriminate self and nonself, and they play a role in triggering immune responses such as programmed cell death (PCD) (Leipe et al., 2004; Fedorova et al., 2005; Petrilli et al., 2005). Similarly, most TIR-encoding genes identified in plants and animals have been shown to function in immune responses and their TIR domains play important roles in transducing signals to downstream signaling partners (Meyers et al., 2002; Akira & Takeda, 2004). LRR- and Pkinase-encoding genes, in contrast, are involved in the signal transduction pathways of many important biological processes. Therefore, there should be some common functional constraints for both modules in choosing and assembling immune receptor genes for plant and animal innate immune systems. Future functional studies on these genes and also their regulatory networks should provide further elucidation of these aspects of the immune systems.
Evolutionary novelty based on reorganization of pre-existing building blocks

How morphological or physiological novelties are generated is one of the most exciting yet also most challenging questions in evolutionary biology. It is commonly believed that gene duplications play a major role in this process by providing additional copies of raw materials and functional redundancy for evolutionary modification (Ohno et al., 1968; Hughes, 1994; Des Marais & Rausher, 2008; Flagel & Wendel, 2009). Alternatively, recent progress in ‘evo-devo’ has suggested that alterations in gene expression patterns could also lead to considerable phenotypic changes, especially in morphology, without the need for new genes (Carroll, 2000; Wray, 2007). We believe that our studies may provide an excellent example of how reorganization of pre-existing functional modules, at both domain and motif levels, can lead to evolutionary novelties. Domains are considered to be the basic unit of proteins, and reorganizing these building blocks may lead to significant changes in the physical structure and also biochemical activity of the corresponding proteins. Therefore, the reshuffling of pre-existing protein domains could, to a large extent, contribute to functional innovation in the course of evolution. Taking NBS-encoding genes as an example, the NBS domain is a member of the STAND superfamily which has ATPase activity and exists in

multiple genes that undertake diverse functions (Leipe et al., 2002). In our study, NBS-TPR and NBS-WD40 organization patterns were frequently detected in eubacteria, archaebacteria, and also in higher animals, suggesting that such organizations may be quite ancient and undertake important biological functions. Surprisingly, they were entirely lost in plant lineages and replaced by a novel organization pattern, the NBS-LRR. Such innovation may have led land plants to evolve a second layer of defense that helped them to better adapt to their new habitat. Similarly, the NLR genes in the animal innate immune system have a similar story. The results of several other studies also suggest that this domain reorganization process is fairly common for a substantial number of functionally novel genes (Apic et al., 2001; Basu et al., 2008; Deshmukh et al., 2010). Moreover, it is worth noting that the critical domain organization changes leading to NBS-LRR and NLR genes both occurred during the shift from unicellular organisms to multicellular organisms. Is this merely a coincidence or does it hint at a more general principle? In fact, consistent with our observation of the reorganization of domains, a series of studies in animal evolution have revealed a correlation between such domain recruitment and reorganization events and the evolution of biological complexities, including multicellularity (Itoh et al., 2007; Kawashima et al., 2009). Further focusing on the evolution of the protein domain itself, we propose that the strategy of reorganizing pre-existing building blocks also works at the motif level. These conserved motifs may contain important co-factor binding sites or enzyme catalytic sites, and thus alterations in the motif combinatorial pattern will have a considerable effect on the functional specificities of the corresponding domains.
In this study, we examined the evolution of the motif combinatorial patterns of several immune-related domains, and a total of five main patterns were discerned based on our findings (Fig. 6). First, for some highly conserved domains, as illustrated by pattern 1, the motif combinatorial patterns are well preserved across different evolutionary lineages.

Fig. 6 The evolution of conserved motif combinations within domains. Five basic patterns of motif combination evolution are shown in cartoons. Different color blocks represent different motifs. A brief description and examples are also given for each pattern.
[Fig. 6 panel labels, with exemplary domains in parentheses. Pattern 1: preserving the original motif combination (Chorismate_bind). Pattern 2: losing a certain motif (SGS, lipase_3). Pattern 3: adding a duplicated copy of the original motif (HSP90). Pattern 4: adding certain new motifs (NBS). Pattern 5: ab initio establishing new motif combination (CHORD, AvrRpt cleavage).]

However, more typically, the combination of conserved motifs within a given domain will undergo modification during evolutionary history. For example, pattern 2 shows that some domains, such as SGS and lipase_3, will experience motif loss during this process. Conversely, a copy of a duplicated motif or an entirely new motif can also be added to the original motif combination, as indicated by patterns 3 and 4, respectively. Finally, for entirely novel domains, their motif combinations are generally formed through ab initio establishment (pattern 5).

The strategy of reorganizing pre-existing building blocks is more than a theoretical model: a growing number of experiments have demonstrated its efficiency in generating new functions. For example, a number of synthetic proteins of fused regulatory or catalytic modules can display novel signaling input/output relationships and thus generate the potential to rewire the corresponding cellular pathways (Dueber et al., 2003; Yeh et al., 2007). Similarly, Peisajovich et al. (2010) investigated the effect of domain reorganization on the behavior of the yeast mating pathway and observed great diversity in pathway response dynamics generated by chimeric domain recombinants. Therefore, many functional novelties in evolution may not require the invention of entirely new genetic materials, but instead can arise through the recruitment and reorganization of pre-existing building blocks and the establishment of novel linkages among them. Simple, economical but also powerful, this may be an eternal theme in evolution.
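The five patterns of motif combination evolution described in the Discussion can be expressed as a small classification rule. The sketch below compares an ancestral and a derived list of motif labels and returns the pattern number; the function name, the use of arbitrary motif labels and the tie-breaking rule (any motif loss is reported as pattern 2, even if other motifs were gained) are assumptions made for illustration rather than the procedure used in the study.

```python
from collections import Counter


def classify_motif_change(ancestral, derived):
    """Return the pattern number (1 to 5) for a change in motif combination.

    ancestral, derived: lists of motif labels found in a domain; an empty
    ancestral list stands for a domain with no older counterpart.
    """
    anc, der = Counter(ancestral), Counter(derived)
    if not anc:
        return 5  # ab initio establishment of a new motif combination
    if anc == der:
        return 1  # original motif combination preserved
    if any(der[motif] < count for motif, count in anc.items()):
        return 2  # at least one ancestral motif (or copy) was lost
    if any(motif not in anc for motif in der):
        return 4  # an entirely new motif was added
    return 3      # only duplicated copies of existing motifs were added


# Hypothetical motif labels, NOT data from the study.
EXAMPLES = {
    "preserved": classify_motif_change(["A", "B", "C"], ["A", "B", "C"]),
    "loss": classify_motif_change(["A", "B", "C"], ["A", "C"]),
    "duplication": classify_motif_change(["A", "B"], ["A", "B", "B"]),
    "new motif": classify_motif_change(["A", "B"], ["A", "B", "D"]),
    "ab initio": classify_motif_change([], ["X", "Y"]),
}
print(EXAMPLES)
```

Motif order within the domain is ignored here; a rule that also tracked positional rearrangements would need sequences of motifs rather than counts.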

Acknowledgements

This work was supported by the National Natural Science Foundation of China (30930008, 30970198 and 30930049), the Fundamental Research Funds for the Central Universities and the Qing Lan Project. We thank the Editor and two anonymous reviewers of New Phytologist for their critical comments on the manuscript.


References Aarts N, Metz M, Holub E, Staskawicz BJ, Daniels MJ, Parker JE. 1998. Different requirements for EDS1 and NDR1 by disease resistance genes define at least two R gene-mediated signaling pathways in Arabidopsis. Proceedings of the National Academy of Sciences, USA 95: 10306–10311. Akira S, Takeda K. 2004. Toll-like receptor signalling. Nature Reviews Immunology 4: 499–511. Akita M, Valkonen JP. 2002. A novel gene family in moss (Physcomitrella patens) shows sequence homology and a phylogenetic relationship with the TIR-NBS class of plant disease resistance genes. Journal of Molecular Evolution 55: 595–605. Ameline-Torregrosa C, Wang BB, O’Bleness MS, Deshpande S, Zhu H, Roe B, Young ND, Cannon SB. 2008. Identification and characterization of nucleotide-binding site-leucine-rich repeat genes in the model plant Medicago truncatula. Plant Physiology 146: 5–21. Apic G, Gough J, Teichmann SA. 2001. An insight into domain combinations. Bioinformatics 17(Suppl 1): S83–S89. Ausubel FM. 2005. Are innate immune signaling pathways in plants and animals conserved? Nature Immunology 6: 973–979. Azevedo C, Sadanandom A, Kitagawa K, Freialdenhoven A, Shirasu K, SchulzeLefert P. 2002. The RAR1 interactor SGT1, an essential component of R gene-triggered disease resistance. Science 295: 2073–2076. Bailey TL, Elkan C. 1994. Fitting a mixture model by expectation maximization to discover motifs in biopolymers. Proceedings of International Conference on Intelligent Systems for Molecular Biology 2: 28–36. Bakker EG, Toomajian C, Kreitman M, Bergelson J. 2006. A genomewide survey of R gene polymorphisms in Arabidopsis. The Plant cell 18: 1803–1818. Basu MK, Carmel L, Rogozin IB, Koonin EV. 2008. Evolution of protein domain promiscuity in eukaryotes. Genome Research 18: 449–461. Cannon SB, Zhu H, Baumgarten AM, Spangler R, May G, Cook DR, Young ND. 2002. Diversity, distribution, and ancient taxonomic relationships within the TIR and non-TIR NBS-LRR resistance gene subfamilies. 
Journal of Molecular Evolution 54: 548–562. Carroll SB. 2000. Endless forms: the evolution of gene regulation and morphological diversity. Cell 101: 577–580. Chinchilla D, Bauer Z, Regenass M, Boller T, Felix G. 2006. The Arabidopsis receptor kinase FLS2 binds flg22 and determines the specificity of flagellin perception. The Plant Cell 18: 465–476. Chisholm ST, Coaker G, Day B, Staskawicz BJ. 2006. Host-microbe interactions: shaping the evolution of the plant immune response. Cell 124: 803–814. Dangl JL, Jones JD. 2001. Plant pathogens and integrated defence responses to infection. Nature 411: 826–833. Des Marais DL, Rausher MD. 2008. Escape from adaptive conflict after duplication in an anthocyanin pathway gene. Nature 454: 762–765. Deshmukh K, Anamika K, Srinivasan N. 2010. Evolution of domain combinations in protein kinases and its implications for functional diversity. Progress in Biophysics and Molecular Biology 102: 1–15. Dueber JE, Yeh BJ, Chak K, Lim WA. 2003. Reprogramming control of an allosteric signaling switch through modular recombination. Science 301: 1904–1908. Eddy SR. 1998. Profile hidden Markov models. Bioinformatics 14: 755–763. Edgar RC. 2004. Muscle: multiple sequence alignment with high accuracy and high throughput. Nucleic Acids Research 32: 1792–1797. Eulgem T, Somssich IE. 2007. Networks of WRKY transcription factors in defense signaling. Current Opinion in Plant Biology 10: 366–371. Fedorova ND, Badger JH, Robson GD, Wortman JR, Nierman WC. 2005. Comparative analysis of programmed cell death pathways in filamentous fungi. BMC Genomics 6: 177. Felsenstein J. 1985. Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39: 19. Finn RD, Mistry J, Tate J, Coggill P, Heger A, Pollington JE, Gavin OL, Gunasekaran P, Ceric G, Forslund K et al. 2009. The Pfam protein families database. Nucleic Acids Research 38: D211–D222.

New Phytologist (2012) 193: 1049–1063 www.newphytologist.com

Flagel LE, Wendel JF. 2009. Gene duplication and evolutionary novelty in plants. New Phytologist 183: 557–564.
Girardin SE, Sansonetti PJ, Philpott DJ. 2002. Intracellular vs extracellular recognition of pathogens – common concepts in mammals and flies. Trends in Microbiology 10: 193–199.
Hubert DA, Tornero P, Belkhadir Y, Krishna P, Takahashi A, Shirasu K, Dangl JL. 2003. Cytosolic HSP90 associates with and modulates the Arabidopsis RPM1 disease resistance protein. EMBO Journal 22: 5679–5689.
Hughes AL. 1994. The evolution of functionally novel proteins after gene duplication. Proceedings of the Royal Society of London Series B – Biological Sciences 256: 119–124.
Itoh M, Nacher JC, Kuma K, Goto S, Kanehisa M. 2007. Evolutionary history and functional implications of protein domains and their combinations in eukaryotes. Genome Biology 8: R121.
Johal GS, Briggs SP. 1992. Reductase activity encoded by the HM1 disease resistance gene in maize. Science 258: 985–987.
Jones DA, Takemoto D. 2004. Plant innate immunity – direct and indirect recognition of general and specific pathogen-associated molecules. Current Opinion in Immunology 16: 48–62.
Jones JD, Dangl JL. 2006. The plant immune system. Nature 444: 323–329.
Kawashima T, Kawashima S, Tanaka C, Murai M, Yoneda M, Putnam NH, Rokhsar DS, Kanehisa M, Satoh N, Wada H. 2009. Domain shuffling and the evolution of vertebrates. Genome Research 19: 1393–1403.
Leipe DD, Koonin EV, Aravind L. 2004. STAND, a class of P-loop NTPases including animal and plant regulators of programmed cell death: multiple, complex domain architectures, unusual phyletic patterns, and evolution by horizontal gene transfer. Journal of Molecular Biology 343: 1–28.
Leipe DD, Wolf YI, Koonin EV, Aravind L. 2002. Classification and evolution of P-loop GTPases and related ATPases. Journal of Molecular Biology 317: 41–72.
Leister D. 2004. Tandem and segmental gene duplication and recombination in the evolution of plant disease resistance genes. Trends in Genetics 20: 116–122.
Leister RT, Dahlbeck D, Day B, Li Y, Chesnokova O, Staskawicz BJ. 2005. Molecular genetic evidence for the role of SGT1 in the intramolecular complementation of Bs2 protein activity in Nicotiana benthamiana. The Plant Cell 17: 1268–1278.
Letunic I, Doerks T, Bork P. 2009. SMART 6: recent updates and new developments. Nucleic Acids Research 37: D229–D232.
Liu JJ, Ekramoddoullah AK. 2003. Isolation, genetic variation and expression of TIR-NBS-LRR resistance gene analogs from western white pine (Pinus monticola Dougl. ex D. Don). Molecular Genetics and Genomics 270: 432–441.
Liu JJ, Ekramoddoullah AK. 2007. The CC-NBS-LRR subfamily in Pinus monticola: targeted identification, gene expression, and genetic linkage with resistance to Cronartium ribicola. Phytopathology 97: 728–736.
Maddison WP, Maddison DR. 2010. Mesquite: a modular system for evolutionary analysis. Version 2.74. [WWW document]. URL http://mesquiteproject.org [last accessed on 20 June 2011].
Maekawa T, Kufer TA, Schulze-Lefert P. 2011. NLR functions in plant and animal immune systems: so far and yet so close. Nature Immunology 12: 817–826.
McHale L, Tan X, Koehl P, Michelmore RW. 2006. Plant NBS-LRR proteins: adaptable guards. Genome Biology 7: 212.
Meyers BC, Dickerman AW, Michelmore RW, Sivaramakrishnan S, Sobral BW, Young ND. 1999. Plant disease resistance genes encode members of an ancient and diverse protein family within the nucleotide-binding superfamily. Plant Journal 20: 317–332.
Meyers BC, Kozik A, Griego A, Kuang H, Michelmore RW. 2003. Genome-wide analysis of NBS-LRR-encoding genes in Arabidopsis. The Plant Cell 15: 809–834.
Meyers BC, Morgante M, Michelmore RW. 2002. TIR-X and TIR-NBS proteins: two new families related to disease resistance TIR-NBS-LRR proteins encoded in Arabidopsis and other plant genomes. Plant Journal 32: 77–92.
Nurnberger T, Brunner F. 2002. Innate immunity in plants and animals: emerging parallels between the recognition of general elicitors and pathogen-associated molecular patterns. Current Opinion in Plant Biology 5: 318–324.


Ohno S, Wolf U, Atkin NB. 1968. Evolution from fish to mammals by gene duplication. Hereditas 59: 169–187.
Peisajovich SG, Garbarino JE, Wei P, Lim WA. 2010. Rapid diversification of cell signaling phenotypes by modular domain recombination. Science 328: 368–372.
Petrilli V, Papin S, Tschopp J. 2005. The inflammasome. Current Biology 15: R581.
Qiu YL, Palmer JD. 1999. Phylogeny of early land plants: insights from genes and genomes. Trends in Plant Science 4: 26–30.
Rairdan G, Moffett P. 2007. Brothers in arms? Common and contrasting themes in pathogen perception by plant NB-LRR and animal NACHT-LRR proteins. Microbes and Infection ⁄ Institut Pasteur 9: 677–686.
Rast JP, Smith LC, Loza-Coll M, Hibino T, Litman GW. 2006. Genomic insights into the immune system of the sea urchin. Science 314: 952–956.
Rusterucci C, Aviv DH, Holt BF 3rd, Dangl JL, Parker JE. 2001. The disease resistance signaling components EDS1 and PAD4 are essential regulators of the cell death pathway controlled by LSD1 in Arabidopsis. The Plant Cell 13: 2211–2224.
Saraste M, Sibbald PR, Wittinghofer A. 1990. The P-loop – a common motif in ATP- and GTP-binding proteins. Trends in Biochemical Sciences 15: 430–434.
Sodergren E, Weinstock GM, Davidson EH, Cameron RA, Gibbs RA, Angerer RC, Angerer LM, Arnone MI, Burgess DR, Burke RD et al. 2006. The genome of the sea urchin Strongylocentrotus purpuratus. Science 314: 941–952.
Staal J, Dixelius C. 2007. Tracing the ancient origins of plant innate immunity. Trends in Plant Science 12: 334–342.
Takai R, Isogai A, Takayama S, Che FS. 2008. Analysis of flagellin perception mediated by flg22 receptor OsFLS2 in rice. Molecular Plant-Microbe Interactions 21: 1635–1642.
Tamura K, Dudley J, Nei M, Kumar S. 2007. MEGA4: molecular evolutionary genetics analysis (MEGA) software version 4.0. Molecular Biology and Evolution 24: 1596–1599.
Tang P, Zhang Y, Sun X, Tian D, Yang S, Ding J. 2010. Disease resistance signature of the leucine-rich repeat receptor-like kinase genes in four plant species. Plant Science 179: 399–406.
Tarr DE, Alexander HM. 2009. TIR-NBS-LRR genes are rare in monocots: evidence from diverse monocot orders. BMC Research Notes 2: 197.
Wray GA. 2007. The evolutionary significance of cis-regulatory mutations. Nature Reviews Genetics 8: 206–216.
Yang S, Zhang X, Yue JX, Tian D, Chen JQ. 2008. Recent duplications dominate NBS-encoding gene expansion in two woody species. Molecular Genetics and Genomics 280: 187–198.
Yeh BJ, Rutigliano RJ, Deb A, Bar-Sagi D, Lim WA. 2007. Rewiring cellular morphology pathways with synthetic guanine nucleotide exchange factors. Nature 447: 596–600.
Zhang Y, Wang J, Zhang X, Chen JQ, Tian D, Yang S. 2009. Genetic signature of rice domestication shown by a variety of genes. Journal of Molecular Evolution 68: 393–402.


Zhou T, Wang Y, Chen JQ, Araki H, Jing Z, Jiang K, Shen J, Tian D. 2004. Genome-wide identification of NBS genes in japonica rice reveals significant expansion of divergent non-TIR NBS-LRR genes. Molecular Genetics and Genomics 271: 402–415.
Zipfel C, Robatzek S, Navarro L, Oakeley EJ, Jones JD, Felix G, Boller T. 2004. Bacterial disease resistance in Arabidopsis through flagellin perception. Nature 428: 764–767.

Supporting Information

Additional supporting information may be found in the online version of this article.

Fig. S1 Phyletic distribution pattern of characteristic domains within other essential signaling components in the plant disease resistance response pathway.
Fig. S2 Evolution of motif combinatorial patterns within the key domains of some essential genes in plant disease resistance response pathways.
Table S1 Whole-genome data used in this study
Table S2 Currently available sequence data for nine early plant lineages from NCBI GenBank
Table S3 The characteristic domains within plant immune receptor genes and other essential signaling components in plant and animal innate immune systems
Table S4 Pfam and BLASTP analysis for candidate nucleotide-binding site (NBS) domains encoded in early plant lineages
Notes S1 Sequence and alignment files.

Please note: Wiley-Blackwell are not responsible for the content or functionality of any supporting information supplied by the authors. Any queries (other than missing material) should be directed to the New Phytologist Central Office.
