Feb 12, 2010
Watanabe et al. BMC Evolutionary Biology 2010, 10:40 http://www.biomedcentral.com/1471-2148/10/40

RESEARCH ARTICLE

Open Access

Structural diversity and evolution of the N-terminal isoform-specific region of ecdysone receptor-A and -B1 isoforms in insects

Takayuki Watanabe*, Hideaki Takeuchi, Takeo Kubo*

Abstract

Background: The ecdysone receptor (EcR) regulates various cellular responses to ecdysteroids during insect development. Insects have multiple EcR isoforms with different N-terminal A/B domains that contain the isoform-specific activation function (AF)-1 region. Although distinct physiologic functions of the EcR isoforms have been characterized in higher holometabolous insects, they remain unclear in basal direct-developing insects, in which only the A isoform has been identified. To examine the structural basis of the EcR isoform-specific AF-1 regions, we performed a comprehensive structural comparison of the isoform-specific region of the EcR-A and -B1 isoforms in insects.

Results: The EcR isoforms were newly identified in 51 species of insects and non-insect arthropods, including direct-developing ametabolous and hemimetabolous insects. The comprehensive structural comparison revealed that the isoform-specific region of each EcR isoform contained evolutionarily conserved microdomain structures and insect subgroup-specific structural modifications. The A isoform-specific region generally contained four conserved microdomains, including the SUMOylation motif and the nuclear localization signal, whereas the B1 isoform-specific region contained three conserved microdomains, including an acidic activator domain-like motif. In addition, the EcR-B1 isoform of holometabolous insects had a novel microdomain at the N-terminal end.

Conclusions: Given that the nuclear receptor AF-1 is involved in cofactor recruitment and transcriptional regulation, the microdomain structures identified in the isoform-specific A/B domains might function as signature motifs and/or as targets for cofactor proteins that play essential roles in the EcR isoform-specific AF-1 regions. Moreover, the novel microdomain in the isoform-specific region of the holometabolous insect EcR-B1 isoform suggests that the holometabolous insect EcR-B1 acquired additional transcriptional regulation mechanisms.

* Correspondence: [email protected]; [email protected]
Department of Biological Sciences, Graduate School of Science, The University of Tokyo, Bunkyo-ku, Tokyo 113-0033, Japan

© 2010 Watanabe et al; licensee BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (http://creativecommons.org/licenses/by/2.0), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Background

Morphogenetic events during insect development are triggered by ecdysteroids. The major insect ecdysteroid, 20-hydroxyecdysone (20E), binds directly to a heterodimeric transcription factor comprising two nuclear receptors, the ecdysone receptor (EcR) and the ultraspiracle, to activate a complex transcriptional cascade [1]. In the holometabolous insects, there are two or three EcR isoforms (A and B1 isoforms; Drosophila has an additional B2 isoform) that are produced from a single genetic locus by differential promoter usage and alternative splicing [2]. The EcR isoforms have a common C-terminal region that includes the DNA binding domain and the ligand binding domain (the C-F domains), but also have isoform-specific regions in the N-terminal A/B domain.

In the holometabolous insects, each EcR isoform governs distinct steroid-stimulated responses during metamorphosis. In Drosophila, the EcR-A isoform is predominantly expressed in the proliferative tissues during metamorphosis, and is required for adult-specific developmental processes. In contrast, the EcR-B isoforms (B1 and B2 isoforms) are expressed in the larva-specific tissues, and are involved in 20E-triggered larval tissue remodeling during metamorphosis [2-7]. In the red flour beetle Tribolium castaneum, the EcR-A isoform has a dominant role in transduction of the ecdysteroid action by regulating the expression of 20E-response genes [8]. Only the A isoform has been identified in basal direct-developing (ametabolous and hemimetabolous) insects [9,10]; therefore, it is not known whether direct-developing insects have multiple EcR isoforms with distinct physiologic functions.

In addition to the differential expression patterns and the distinct physiologic functions during development, EcR isoforms have differential transcriptional regulation mechanisms. The A/B domain of the nuclear receptors generally contains the ligand-independent activation function (AF)-1 region [11]. Several studies of the AF-1 function of Drosophila EcR isoforms have revealed that the A/B domain of the Drosophila EcR-B isoforms has strong transactivation activity, whereas that of the A isoform has weaker transactivation activity [12]. The AF-1 of Drosophila EcR-B1 is located mainly in the N-terminal region (amino acid residues 1-53), whose sequence is considerably conserved among higher holometabolous insects such as flies, mosquitoes, and moths [13]. Although structural differences in the AF-1 region among the EcR isoforms result in differential transcriptional regulation, the structural basis and molecular mechanisms underlying the isoform-specific AF-1 functions remain obscure.

In the present study, to examine the detailed structural characteristics of the isoform-specific region of each EcR isoform, we performed a comprehensive structural comparison of the isoform-specific regions of the insect EcR-A and -B1 isoforms. First, we newly identified EcR isoforms in 51 insect and non-insect arthropod species, focusing mainly on the basal direct-developing insects. Then, we compared the deduced amino acid sequences of the isoform-specific region of the EcR-A and -B1 isoforms (45 and 76 species, respectively). Sequence comparison revealed that the isoform-specific region of each EcR isoform contains several conserved microdomain structures, some of which were evolutionarily acquired or modified. Interestingly, the isoform-specific region of the holometabolous insect EcR-B1 isoform contained a highly conserved (K/R)RRW motif at the N-terminus, which was lacking in basal direct-developing insects. Given that the nuclear receptor AF-1 regions provide an interaction interface for co-regulatory proteins, the conserved microdomain structures in the EcR isoform-specific AF-1 regions might play essential roles in transcriptional regulation. Moreover, the holometabolous insect EcR-B1 might have novel transcriptional regulation mechanisms that are mediated by the newly acquired (K/R)RRW motif.

Results

Cloning of cDNA fragments encoding the A/B domain of the EcR-A and -B1 isoforms

The EcR gene encodes two or three isoforms in the holometabolous insects (A, B1 and B2 isoforms), but only the A isoform was previously identified in the direct-developing insects. In the present study, to perform a comprehensive sequence comparison of the isoform-specific region of the EcR-A and -B1 isoforms among insects and non-insect arthropods, we obtained cDNA fragments encoding the A/B domains of the EcR-A and -B1 isoforms from 18 and 51 arthropod species, respectively, by using reverse transcription polymerase chain reaction (see Additional file 1, Table S1 and Additional file 2, Table S2). We newly identified the EcR-A isoform in the direct-developing insects, including Paleoptera (1 species), Polyneoptera (2 species), and Paraneoptera (8 species). For a detailed analysis, we also identified the EcR-A isoform in 5 holometabolous insect species and one non-insect arthropod species. Although the EcR-B1 isoform had been identified in many holometabolous insects and several non-insect arthropods, it had not been identified in basal hemimetabolous and ametabolous insects. Therefore, we newly identified the EcR-B1 isoform in basal insects, including Apterygota (2 species), Paleoptera (2 species), Polyneoptera (8 species), and Paraneoptera (12 species). In addition, we identified the EcR-B1 isoform in 25 holometabolous insect species and two non-insect arthropod species. To our knowledge, this is the first report of the identification of the EcR-B1 isoform in the direct-developing insects. In the direct-developing insects, the EcR isoforms were cloned from the whole body of the nymph and adult, and from isolated tissues, including the nymphal wing pad, fat body, testis, and ovaries (data not shown), suggesting that the EcR isoforms are broadly distributed in the direct-developing insects.

The A isoform-specific region was classified into five structural types

We compared the deduced amino acid sequences of the isoform-specific region in the A/B domain of the EcR-A isoform identified from 45 arthropod species (5 non-insect arthropod species [Crustacea and Chelicerata], 15 direct-developing insect species [1 paleopteran species, 4 polyneopteran species, and 9 paraneopteran species] and 26 Endopterygota species). GenBank accession numbers of the EcR-A isoforms are listed in Additional file 1, Table S1.

General structural features of the A isoform-specific region

The C-terminal region of the A isoform-specific region contained a conserved Ser/Thr-rich region, which was formerly referred to as the “A-box” [10,14]. In the N-terminal region of the A isoform-specific region, we found two conserved microdomains: the SUMOylation motif and the nuclear localization signal (NLS). In addition, we found conserved (D/E)(D/E)W residues between the NLS and the C-terminal A-box. The SUMOylation motif in the A isoform-specific region contained a canonical consensus sequence for SUMOylation (ΨKxE, where Ψ represents a large hydrophobic amino acid) [15]. The NLS of the A isoform-specific region contained a consensus sequence of the monopartite NLS [K(K/R)x(K/R)] [16]. Comprehensive structural comparison revealed several subgroup-specific structural modifications. Based on the structural similarity, we classified the isoform-specific region of the EcR-A isoform into five structural types.
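The two consensus patterns named in this paragraph, ΨKxE and K(K/R)x(K/R), can be written as regular expressions. The following is a minimal sketch, not the authors' procedure: the residue set chosen for Ψ (I, L, V, M, F) and the helper name `find_motifs` are assumptions made for illustration, and the test fragments are ungapped Daphnia magna rows read from the Figure 1 alignment.

```python
import re

# Assumption: "large hydrophobic" (Ψ) is approximated here by I, L, V, M and F.
PSI = "ILVMF"

# ΨKxE, the SUMOylation consensus; x is any residue.
# The lookahead lets overlapping hits be reported.
SUMO_RE = re.compile(rf"(?=([{PSI}]K.E))")
# K(K/R)x(K/R), the monopartite NLS consensus.
NLS_RE = re.compile(r"(?=(K[KR].[KR]))")


def find_motifs(seq: str, pattern: re.Pattern) -> list[tuple[int, str]]:
    """Return (1-based start, matched text) for every hit, overlaps included."""
    return [(m.start() + 1, m.group(1)) for m in pattern.finditer(seq)]


# Ungapped fragments read from the Daphnia magna rows of Figure 1.
sumo_fragment = "IIKTEPRNSDSPTTT"
nls_fragment = "SPLKRSRHININNNNNNSSSSAMEEWLPSP"

print(find_motifs(sumo_fragment, SUMO_RE))
print(find_motifs(nls_fragment, NLS_RE))
```

A hit only shows that a fragment fits the consensus; it says nothing about whether the site is modified or functional in vivo.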

Type-1 A isoform-specific region (Figure 1)

The type-1 A isoform-specific region contained the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box. The consensus sequence of the A-box of the type-1 A isoform-specific region was NGxSPSxxSSYDxxYSP. The EcR-A isoform with the type-1 A isoform-specific region was identified in non-insect arthropods.

Type-2 A isoform-specific region (Figure 2)

The type-2 A isoform-specific region contained the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box. In addition, the type-2 A isoform-specific region generally contained two conserved N-terminal sequences (DLKHE and ΨAYRG, where Ψ represents a large hydrophobic amino acid). The consensus sequence of the A-box of the type-2 A isoform-specific region was NGYSSP(M/L)SSGSYDPYSP. The EcR-A isoform with the type-2 A isoform-specific region was identified in the direct-developing insects, except for Heteroptera (Order Hemiptera), and in three holometabolous orders (Coleoptera, Trichoptera, and Lepidoptera). The EcR-A isoforms identified from two psocopteran insects (Pediculus humanus corporis and Liposcelis sp.) did not contain the DLKHE sequence at their N-termini. The EcR-A isoforms identified from several polyneopteran insects (Blattella germanica [Order Dictyoptera], Locusta migratoria [Order Orthoptera], and Anisolabis maritima [Order Dermaptera]), and non-neopteran insects (Ephemera strigata [Order Ephemeroptera]) did not contain the ΨAYRG sequence at their N-termini. The EcR-A isoforms identified in Lepidoptera lacked the NLS.

Type-3 A isoform-specific region (Figure 3)

The type-3 A isoform-specific region contained the (D/E)(D/E)W residues and the A-box. The type-3 A isoform-specific region lacked the N-terminal region containing the SUMOylation motif and the NLS. The consensus sequence of the A-box of the type-3 A isoform-specific region was NGxPSPTMSSMSYDPYSP. The EcR-A isoform with the type-3 A isoform-specific region was identified in Heteroptera (Order Hemiptera).

Type-4 A isoform-specific region (Figure 4)

The type-4 A isoform-specific region contained the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box. In addition, the type-4 A isoform-specific region contained D(T/S)S repeats in the N-terminus. The consensus sequence of the A-box of the type-4 A isoform-specific region was NGYASPMS(T/S)GSYDPYSP. The EcR-A isoform with the type-4 A isoform-specific region was identified in Hymenoptera and Mecoptera. The NLS of the EcR-A isoforms identified from several hymenopteran insects (Apis mellifera, Pheidole megacephala, and Camponotus japonicus) was modified (consensus: KxxR).
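The A-box consensus sequences quoted for the structural types use a compact notation: x for any residue and (M/L) for alternatives. A small translator makes that notation machine-readable. This is an illustrative sketch; `consensus_to_regex` is a hypothetical helper, it does not expand Ψ, and the two test fragments are ungapped A-box rows as read from the Figure 2 alignment.

```python
import re


def consensus_to_regex(consensus: str) -> str:
    """Translate the consensus notation into a regular expression.

    'x' becomes '.', '(A/B)' becomes '[AB]', other characters stay literal.
    """
    parts = []
    i = 0
    while i < len(consensus):
        ch = consensus[i]
        if ch == "(":
            end = consensus.index(")", i)
            options = consensus[i + 1:end].split("/")
            parts.append("[" + "".join(options) + "]")
            i = end + 1
        elif ch == "x":
            parts.append(".")
            i += 1
        else:
            parts.append(re.escape(ch))
            i += 1
    return "".join(parts)


TYPE2_ABOX = consensus_to_regex("NGYSSP(M/L)SSGSYDPYSP")

# Ungapped A-box rows read from the Figure 2 alignment.
tribolium = "SNGYSSPMSSGSYDPYSPNG"  # Tribolium castaneum
bombyx = "SNGYSSPLSSGSYGPYSPNG"     # Bombyx mori

print(TYPE2_ABOX)
print(bool(re.search(TYPE2_ABOX, tribolium)))
print(bool(re.search(TYPE2_ABOX, bombyx)))
```

The Bombyx row carries G where the consensus has D, so a strict match fails. A consensus summarizes the majority residue per column, and individual sequences of the same structural type can still deviate from it.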

Figure 1 Type-1 A isoform-specific region. The type-1 A isoform-specific region contained four conserved microdomains. (A) Schematic representation of the type-1 A isoform-specific regions. Positions of the SUMOylation motif, the NLS, the (D/E)(D/E) residues, and the A-box are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the conserved microdomains of the type-1 A isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

Figure 2 Type-2 A isoform-specific region. The type-2 A isoform-specific region contained four conserved microdomains and two conserved N-terminal sequences (DLKHE and ΨAYRG). (A) Schematic representation of the type-2 A isoform-specific regions. Positions of the two N-terminal sequences, the SUMOylation motif, the NLS, the (D/E)(D/E) residues, and the A-box are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-2 A isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

Type-5 A isoform-specific region (Figure 5)

The type-5 A isoform-specific region contained the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box. In addition, the type-5 A isoform-specific region contained YRLN residues at the N-terminus. The consensus sequence of the A-box of the type-5 A isoform-specific region was NGYASPMSAGSYDPYSPNG. The EcR-A isoform with the type-5 A isoform-specific region was identified in Diptera. The EcR-A isoforms identified from several species in the genus Drosophila (D. melanogaster, D. simulans, and D. yakuba) did not contain the YRLN sequence at their N-termini.

The B1 isoform-specific region was classified into six structural types

We compared the deduced amino acid sequences of the isoform-specific region in the A/B domain of the EcR-B1 isoforms identified from 76 arthropod species (4 non-insect arthropod species [Crustacea, Myriapoda, and Chelicerata], 24 direct-developing insect species [2 apterygote species, 2 paleopteran species, 8 polyneopteran species, and 12 paraneopteran species] and 48 Endopterygota species). GenBank accession numbers of the EcR-B1 isoforms are listed in Additional file 2, Table S2.
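The species counts in this paragraph can be tallied to confirm that the subgroups add up to the stated totals. The snippet below is only an arithmetic check on the numbers quoted in the text; the dictionary layout and variable names are illustrative.

```python
# Species counts for the EcR-B1 comparison, as quoted in the paragraph above.
direct_developing = {
    "apterygote": 2,
    "paleopteran": 2,
    "polyneopteran": 8,
    "paraneopteran": 12,
}
non_insect_arthropods = 4
endopterygota = 48

direct_total = sum(direct_developing.values())
grand_total = non_insect_arthropods + direct_total + endopterygota

print(direct_total)
print(grand_total)
```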


General structural features of the B1 isoform-specific region

The N-terminal region of the B1 isoform-specific region essentially contained three microdomains: the S-rich motif, the SP residues, and the DL-rich motif. The S-rich motif was composed of acidic residues (D and E) and neutral polar residues (S and T), and contained a conserved core EV(T/S)SS sequence. The DL-rich motif was composed of repeats of acidic residues and aromatic/bulky hydrophobic residues (W, A, F, I, L, M, and V). The region between the S-rich and the DL-rich motifs generally contained conserved SP residues. In addition to these microdomains, the isoform-specific region of the holometabolous insect EcR-B1 isoform contained an additional conserved (K/R)RRW motif at the N-terminus. Comprehensive structural comparison revealed several subgroup-specific structural modifications. Based on the structural similarity, we classified the isoform-specific region of the EcR-B1 isoform into seven structural types.
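The composition rules given for the S-rich and DL-rich motifs lend themselves to a simple scan. The sketch below is an illustration under stated assumptions, not the method used in the study: the 14-residue window (the length of the type-1 DL-rich consensus), the helper names, and the scoring by residue fraction are all choices made here. The fragments are ungapped Ephemera strigata rows read from the Figure 6 alignment.

```python
import re

ACIDIC = set("DE")
# Aromatic/bulky hydrophobic residues as listed in the text.
HYDROPHOBIC = set("WAFILMV")

# Conserved core of the S-rich motif, EV(T/S)SS.
S_RICH_CORE = re.compile(r"EV[TS]SS")


def dl_rich_fraction(window: str) -> float:
    """Fraction of residues that are acidic or aromatic/bulky hydrophobic."""
    hits = sum(1 for aa in window if aa in ACIDIC or aa in HYDROPHOBIC)
    return hits / len(window)


def best_dl_window(seq: str, width: int = 14) -> tuple[int, float]:
    """Return (1-based start, fraction) of the best window; ties go to the first."""
    scores = [
        (dl_rich_fraction(seq[i:i + width]), -i)
        for i in range(len(seq) - width + 1)
    ]
    fraction, neg_start = max(scores)
    return -neg_start + 1, fraction


# Ungapped fragments read from the Ephemera strigata rows of Figure 6.
s_rich_fragment = "MQRLGAESSPEVSSSSP"
dl_fragment = "SPNPTGSSSSDIGEVDLDLWDLDLN"

core = S_RICH_CORE.search(s_rich_fragment)
print(core.start() + 1, core.group())
print(best_dl_window(dl_fragment))
```

A composition score of this kind only flags candidate regions. Assigning a sequence to one of the structural types still depends on the alignment itself.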

Heteroptera (Order Hemiptera): (D/E)(D/E)W + A-box alignment

Corythucha marmorata 1 MMGEE-KRTEDDWMVTGGSPRTYQANGGAGFPSPTMSSNSYDPYSPSSKLG 50
Physopelta gutta 1 -MCEDSRR--DEWSVGGG-----Q-NG---CPSPTMSSNSYDPYSPASKIG 39
Pachygrontha antennata 1 -MCEESRR--DEWTVGGG-----Q-NG---CPSPTMSSNSYDPYSPATKIG 39

Figure 3 Type-3 A isoform-specific region. The type-3 A isoform-specific region contained two conserved microdomains. (A) Schematic representation of the type-3 A isoform-specific regions. Positions of the (D/E)(D/E) residues and the A-box are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the full-length of the type-3 A isoform-specific regions. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.
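The asterisk convention in the figure legends can be reproduced for fully identical columns with a few lines of code. This is a simplified sketch: it marks only columns where all rows share the same residue, and it does not attempt the similarity groups that ClustalX uses for colons. The function name `identity_line` is illustrative, and the rows are the three type-3 sequences of Figure 3(B).

```python
def identity_line(rows: list[str]) -> str:
    """Mark alignment columns in which every row has the same residue."""
    if len({len(r) for r in rows}) != 1:
        raise ValueError("aligned rows must have equal length")
    marks = []
    for column in zip(*rows):
        # A column of gaps is not counted as conserved.
        same = len(set(column)) == 1 and column[0] != "-"
        marks.append("*" if same else " ")
    return "".join(marks)


# Aligned rows of the type-3 A isoform-specific region (Figure 3B).
rows = [
    "MMGEE-KRTEDDWMVTGGSPRTYQANGGAGFPSPTMSSNSYDPYSPSSKLG",  # Corythucha marmorata
    "-MCEDSRR--DEWSVGGG-----Q-NG---CPSPTMSSNSYDPYSPASKIG",  # Physopelta gutta
    "-MCEESRR--DEWTVGGG-----Q-NG---CPSPTMSSNSYDPYSPATKIG",  # Pachygrontha antennata
]

line = identity_line(rows)
print(rows[0])
print(line)
print(line.count("*"), "identical columns of", len(line))
```

The long run of asterisks near the C-terminal end corresponds to the PSPTMSSNSYDPYSP stretch of the A-box, which is identical in all three heteropteran sequences.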

Figure 4 Type-4 A isoform-specific region. The type-4 A isoform-specific region contained four conserved microdomains and conserved N-terminal D(T/S)S repeats. (A) Schematic representation of the type-4 A isoform-specific regions. Positions of the D(T/S)S repeats, the SUMOylation motif, the NLS, the (D/E)(D/E) residues, and the A-box are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-4 A isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

Figure 5 Type-5 A isoform-specific region. The type-5 A isoform-specific region contained four conserved microdomains and conserved N-terminal YRLN residues. (A) Schematic representation of the type-5 A isoform-specific regions. Positions of the YRLN residues, the SUMOylation motif, the NLS, the (D/E)(D/E) residues, and the A-box are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-5 A isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

Type-1 B1 isoform-specific region (Figure 6)

The type-1 B1 isoform-specific region contained three conserved microdomains (the S-rich motif, the SP residues, and the DL-rich motif). The consensus sequences of the S-rich motif and the DL-rich motif of the type-1 B1 isoform-specific region were GDExSxEVSSSS and DIGEVDLDFWDLDL, respectively. The N-terminal region and the region between each two conserved microdomains varied in length and sequence. The EcR-B1 isoform with the type-1 B1 isoform-specific region was identified in non-insect arthropods and non-neopteran insects. Moreover, the EcR isoforms identified from the filarial nematode Brugia malayi [EcR-A and -C isoforms (GenBank accession numbers: ABQ28713 and ABQ28714)] contained the S-rich motif-like sequence and the DL-rich motif in the N-terminal A/B domain (amino acid residues 1-153; both isoforms contained the common A/B domain; Figure 6C).

Type-2 B1 isoform-specific region (Figure 7)

The type-2 B1 isoform-specific region contained three conserved microdomains (the S-rich motif, the SP residues, and the DL-rich motif). In addition, the type-2 B1 isoform-specific region generally contained a partially conserved TxxΨW sequence at the N-terminus. The consensus sequences of the S-rich motif and the DL-rich motif of the type-2 B1 isoform-specific region were GxESSPEVTSSS and DIGEVDLEFWDLDL, respectively. The EcR-B1 isoform with the type-2 B1 isoform-specific region was identified in Polyneoptera and Paraneoptera except for Heteroptera (Order Hemiptera).

Type-2’ B1 isoform-specific region (Figure 8)

The type-2’ B1 isoform-specific region structurally resembled the type-2 B1 isoform-specific region, but the motif structures were modified. The consensus sequences of the S-rich motif and the DL-rich motif of the type-2’ B1 isoform-specific region were EDxxxQVSSS and EVDLELWDLGL, respectively. Moreover, the region between the S-rich motif and the DL-rich motif was shortened. In some cases, these two motifs were completely fused (e.g. EcR-B1 isoforms of Physopelta gutta and Pachygrontha antennata). Like the type-2 B1 isoform-specific region, the type-2’ B1 isoform-specific region contained the N-terminal TxxΨW sequence. The EcR-B1 isoform with the type-2’ B1 isoform-specific region was identified in Heteroptera (Order Hemiptera).
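The type-1 and type-2 consensus sequences have the same length, so they can be compared position by position to see how little the motifs changed. The helper below is a hypothetical illustration: it treats x as a wildcard and reports only positions where both consensus strings name a specific, different residue. It does not apply to the type-2’ consensus sequences, which differ in length and would need an alignment first.

```python
def consensus_differences(a: str, b: str) -> list[tuple[int, str, str]]:
    """List (1-based position, residue in a, residue in b) where two
    equal-length consensus strings disagree, ignoring 'x' wildcards."""
    if len(a) != len(b):
        raise ValueError("consensus strings must have equal length")
    return [
        (pos, x, y)
        for pos, (x, y) in enumerate(zip(a, b), start=1)
        if x != y and "x" not in (x, y)
    ]


# Consensus sequences quoted in the text for type 1 and type 2.
s_rich_type1, s_rich_type2 = "GDExSxEVSSSS", "GxESSPEVTSSS"
dl_rich_type1, dl_rich_type2 = "DIGEVDLDFWDLDL", "DIGEVDLEFWDLDL"

print(consensus_differences(s_rich_type1, s_rich_type2))
print(consensus_differences(dl_rich_type1, dl_rich_type2))
```

Each pair differs at a single defined position: S versus T in the EV(T/S)SS core, and D versus E in the DL-rich motif. Both are substitutions between chemically similar residues.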

Figure 6 Type-1 B1 isoform-specific region. The type-1 B1 isoform-specific region contains three conserved microdomains. (A) Schematic representation of the type-1 B1 isoform-specific regions. Positions of the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the conserved microdomains found in the type-1 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues. (C) The N-terminal region of Brugia malayi EcR contained the S-rich motif-like sequence (modified S-rich motif) and the DL-rich motif. Positions of the modified S-rich motif and the DL-rich motif are indicated in the schematic representation of the N-terminal region of B. malayi EcR (amino acid residues 1-151). The alignment of the microdomains found in B. malayi with their corresponding microdomains in the type-1 B1 isoform-specific region reveals their sequence similarity. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

[Figure 7 panels. (A) Schematic of the TxxΨW sequence, the S-rich motif, the SP residues, and the DL-rich motif (scale bar: 50 amino acids). (B) Alignments of the TxxΨW, S-rich, and SP + DL-rich regions. Species shown: Paraneoptera: Graptopsaltria nigrofuscata, Oncotympana maculaticollis, Aphrophora pectoralis, Bothrogonia ferruginea, and Acyrthosiphon pisum (Hemiptera without Heteroptera), Franklinothrips vespiformis (Thysanoptera), and Liposcelis sp. (Psocoptera); Polyneoptera: Tenodera angustipennis, Reticulitermes speratus, and Periplaneta fuliginosa (Dictyoptera), Gryllus bimaculatus, Achetus domesticus, and Locusta migratoria (Orthoptera), Anisolabis maritima (Dermaptera), and Nemoura sp. (Plecoptera).]

Figure 7 Type-2 B1 isoform-specific region. The type-2 B1 isoform-specific region contained three conserved microdomains and the N-terminal TxxΨW sequence. (A) Schematic representation of the type-2 B1 isoform-specific regions. Positions of the TxxΨW sequence, the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-2 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

[Figure 8 panels. (A) Schematic of the TxxΨW sequence, the S-rich motif, the SP residues, and the DL-rich motif (scale bar: 50 amino acids). (B) Alignment of the TxxΨW + S-rich + SP + DL-rich region. Species shown (Heteroptera): Physopelta gutta, Pachygrontha antennata, Orius strigicollis, Corythucha marmorata, and Appasus japonicus.]

Figure 8 Type-2’ B1 isoform-specific region. The type-2’ B1 isoform-specific region contained three conserved microdomains and the N-terminal TxxΨW sequence. (A) Schematic representation of the type-2’ B1 isoform-specific regions. Positions of the TxxΨW sequence, the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region of the type-2’ B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

Type-3 B1 isoform-specific region (Figure 9)

The type-3 B1 isoform-specific region contained four conserved microdomains [the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif]. The consensus sequences of the S-rich motif and the DL-rich motif of the type-3 B1 isoform-specific region were EESSSEVTSSS and DIGDVDLEFWDLDL, respectively. The EcR-B1 isoform with the type-3 B1 isoform-specific region was identified in Coleoptera, Strepsiptera, Neuroptera, Megaloptera, and Raphidioptera.

Type-4 B1 isoform-specific region (Figure 10)

The type-4 B1 isoform-specific region contained four conserved microdomains [the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif]. In addition, the type-4 B1 isoform-specific region contained a conserved LQTVPRVPVAGV sequence immediately N-terminal to the (K/R)RRW motif. The consensus sequences of the S-rich motif and the DL-rich motif of the type-4 B1 isoform-specific region were ESSPEVSSS and EDLQLWDLDL, respectively. The EcR-B1 isoform with the type-4 B1 isoform-specific region was identified in the aculeate Hymenoptera and horntails (Family Siricidae, Order Hymenoptera). The EcR-B1 isoform identified from two vespine wasps (Vespa mandarinia and Anterhynchium flavomarginatum) contained an S/H-rich insertion between the (K/R)RRW and the S-rich motifs.

Type-5 B1 isoform-specific region (Figure 11)

The type-5 B1 isoform-specific region structurally resembled the type-4 B1 isoform-specific region, but contained an additional N-terminal extension region. The consensus sequences of the conserved N-terminal sequence, the S-rich motif, and the DL-rich motif of the type-5 B1 isoform-specific region were LAVPRVPVAGV, ESSPEVSSS, and EDLQLWDLDL, respectively. The EcR-B1 isoform with the type-5 B1 isoform-specific region was identified in sawflies, the basal lineage of Hymenoptera. Moreover, the isoform-specific region of the EcR-B1 isoform identified in Panorpa pryeri (Order Mecoptera) contained an N-terminal extension region similar to that of the type-5 B1 isoform-specific region. The P. pryeri EcR-B1 isoform contained the S-rich motif (amino acid residues 75-83; ESSPEVSSS) and the DL-rich motif (amino acid residues 99-112; ELGEVDLDFWDLDI), which were structurally similar to the S-rich motif of the type-4 B1 isoform-specific region and the DL-rich motif of the type-3 B1 isoform-specific region, respectively.

Type-6 B1 isoform-specific region (Figure 12)

The type-6 B1 isoform-specific region contained conserved microdomains [the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif]. The consensus sequence of the S-rich motif of the type-6 B1 isoform-specific region was EESSSEVTSSS. Although the DL-rich motif of the type-6 B1 isoform-specific region was composed of acidic residues and bulky hydrophobic residues, its sequence was highly modified [consensus: (D/E)Y(C/G)(E/D)LWxxxxD]. The EcR-B1 isoform with the type-6 B1 isoform-specific region was identified in Diptera, Lepidoptera, and Trichoptera.

Phylogenetic evolution of motif structures in the EcR isoform-specific regions in insects

Based on the phylogenetic relationship of insect orders, we constructed evolution models for the structural evolution of the isoform-specific region of the EcR isoforms in insects (Figure 13). The structural modifications in the A isoform-specific region that accompanied insect evolution are represented in Figure 13A. The A isoform-specific region of non-insect arthropods (type-1 A isoform-specific region) was composed of four conserved microdomains [the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box]. The EcR-A isoform of direct-developing insects acquired two N-terminal sequences (DLKHE and ΨAYRG) (type-2 A isoform-specific region), which were conserved among most direct-developing insect groups except for Heteroptera, whose EcR-A isoform lacked the N-terminal half of the isoform-specific region, including the SUMOylation motif and the NLS (type-3 A isoform-specific region). The two N-terminal sequences and four conserved microdomains were also conserved among most holometabolous insect groups except for Hymenoptera, Mecoptera, and Diptera. The isoform-specific region of

[Figure 9 panels. (A) Schematic of the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif (scale bar: 50 amino acids). (B) Alignments of the (K/R)RRW, S-rich, and SP + DL-rich regions. Species shown: Anthonomus grandis, Leptinotarsa decemlineata, Tenebrio molitor, and Tribolium castaneum (Coleoptera), Pseudoxenos iwatai (Strepsiptera), and, within Neuropterida, Hagenomyia micans (Neuroptera), Protohermes grandis (Megaloptera), and Inocellia japonica (Raphidioptera).]

Figure 9 Type-3 B1 isoform-specific region. The type-3 B1 isoform-specific region contained four conserved microdomains. (A) Schematic representation of the type-3 B1 isoform-specific regions. Positions of the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the conserved microdomains of the type-3 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

[Figure 10 panels. (A) Schematic of the LQTVPRVPVAGV sequence, the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif (scale bar: 50 amino acids). (B) Alignments of the LQTVPRVPVAGV + (K/R)RRW, S-rich + SP, and DL-rich regions. Species shown (Order Hymenoptera): Suborder Apocrita: Apis mellifera, Bombus hypocrita, Xylocopa appendiculata, Osmia cornifrons, Coelioxys fenestrata, and Ammophila infesta (Apoidea), Vespa mandarinia, Anterhynchium flavomarginatum, Scolia oculata, Campsomeris schulthessi, Pheidole megacephala, and Anospilus sp. (Vespoidea), Amblyjoppa sp. (Ichneumonoidea), and Eretmocerus eremicus and Nasonia vitripennis (Chalcidoidea); Suborder Symphyta: Urocerus antennatus (Siricoidea).]

Figure 10 Type-4 B1 isoform-specific region. The type-4 B1 isoform-specific region contained four conserved microdomains and the N-terminal LQTVPRVPVAGV sequence. (A) Schematic representation of the type-4 B1 isoform-specific regions. Positions of the LQTVPRVPVAGV sequence, the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-4 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

[Figure 11 panels. (A) Schematic of the N-terminal extension, the LAVPRVPVAGV sequence, the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif (scale bar: 50 amino acids). (B) Alignments of the N-terminal extension + LAVPRVPVAGV + (K/R)RRW, S-rich, and SP + DL-rich regions. Species shown: Cimbex femoratus, Arge similis, Allantus luctifer, and Tenthredo fagi (sawflies; Order Hymenoptera, suborder Symphyta) and Panorpa pryeri (Mecoptera).]

Figure 11 Type-5 B1 isoform-specific region. The type-5 B1 isoform-specific region contained four conserved microdomains, the N-terminal LAVPRVPVAGV sequence, and the N-terminal extension region. (A) Schematic representation of the type-5 B1 isoform-specific regions. Positions of the N-terminal extension, the LAVPRVPVAGV sequence, the (K/R)RRW motif, the S-rich motif, the SP residues, and the DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the N-terminal region and the conserved microdomains of the type-5 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

the hymenopteran/mecopteran EcR-A isoform and the dipteran EcR-A isoform lacked the two N-terminal sequences, but contained subgroup-specific sequences: the A isoform-specific region of the hymenopteran/mecopteran EcR-A isoform (type-4 A isoform-specific region) contained the D(S/T)S repeats, and that of the dipteran EcR-A isoform generally contained the YRLN sequence at the N-terminus. In addition, although the isoform-specific region of the lepidopteran EcR-A isoform was structurally classified as the type-2 A isoform-specific region, it lacked the NLS.

The structural modifications in the B1 isoform-specific region that accompanied insect evolution are represented in Figure 13B. All types of the B1 isoform-specific region contained three conserved microdomains: the S-rich motif, the SP residues, and the DL-rich motif (the B1 isoform-specific region of several heteropteran insects did not contain the SP residues). The DL-rich motif of the type-6 B1 isoform-specific region was structurally modified and showed limited sequence similarity with the corresponding motif of other insects. Interestingly, although Mecoptera is thought to be a sister taxon of Diptera [17], the P. pryeri EcR-B1 isoform had a DL-rich motif quite similar to the corresponding motif of the type-3 B1 isoform-specific region. In addition to the three conserved microdomains, the isoform-specific region of the EcR-B1 isoform contained subgroup-specific N-terminal sequences. The EcR-B1 isoform of polyneopteran and paraneopteran insects (types 2 and 2’ B1 isoform-specific region) generally contained the N-terminal TxxΨW sequence. In the holometabolous insects, the TxxΨW sequence was replaced with the (K/R)RRW motif, a novel conserved microdomain at the N-terminal end of the holometabolous insect EcR-B1 (types 3-6 B1 isoform-specific region). The hymenopteran/mecopteran EcR-B1 isoform contained the N-terminal LQTVPRVPVAGV (or LAVPRVPVAGV) sequence immediately N-terminal to the (K/R)RRW motif (types 4 and 5 B1 isoform-specific region). Moreover, the EcR-B1 isoform of the basal hymenopteran insects (sawflies) and the mecopteran insect had the N-terminal extension region (type 5 B1 isoform-specific region).
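As a rough illustration of how the subgroup-specific N-terminal sequences summarized above could be searched for in a protein sequence, the following Python sketch encodes three of them as regular expressions. Several points are assumptions made only for this example: Ψ is approximated by the bulky hydrophobic residue set [AFILMVW], the function name find_motifs is hypothetical, and the input strings are short, ungapped, illustrative fragments built around the motif sequences named in the text rather than full-length EcR-B1 sequences. It is not the procedure used in this study.

```python
import re

# Assumption: the symbol Psi is approximated by bulky hydrophobic residues.
PSI_CLASS = "[AFILMVW]"

# Patterns transcribed from the motif names used in the text.
MOTIF_PATTERNS = {
    "Txx[Psi]W": re.compile("T.." + PSI_CLASS + "W"),
    "(K/R)RRW": re.compile("[KR]RRW"),
    "LQTVPRVPVAGV/LAVPRVPVAGV": re.compile("L(?:QT|A)VPRVPVAGV"),
}


def find_motifs(sequence):
    """Return {motif name: 0-based start index} for each motif found."""
    sequence = sequence.upper()
    hits = {}
    for name, pattern in MOTIF_PATTERNS.items():
        match = pattern.search(sequence)
        if match:
            hits[name] = match.start()
    return hits


# Hypothetical, ungapped fragments used only to exercise the patterns.
EXAMPLES = {
    "TxxPsiW-type N-terminus": "MTTSIWLRPGG",
    "(K/R)RRW-type N-terminus": "MKRRWSG",
    "LQTVPRVPVAGV-type N-terminus": "MLQTVPRVPVAGVKRRWG",
}

if __name__ == "__main__":
    for label, fragment in EXAMPLES.items():
        print(label, find_motifs(fragment))
```

A pattern match of this kind only flags candidate positions; whether a hit corresponds to a conserved microdomain would still have to be judged from an alignment such as those in Figures 7-12.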

Discussion

Generally, the nuclear receptor A/B domain contains the ligand-independent AF-1 transactivation domain that is involved in the interaction with transcriptional cofactors [18,19], and some nuclear receptors have multiple isoforms with different A/B domains that exhibit isoform-specific transcriptional regulation and physiological functions [20-23]. In the present study, we performed a structural analysis of the isoform-specific A/B domain of the EcR-A and -B1 isoforms. Our data revealed the presence of the conserved microdomain structures as well as the

[Figure 12 panels. (A) Schematic of the (K/R)RRW motif, the S-rich motif, the SP residues, and the modified DL-rich motif (scale bar: 50 amino acids). (B) Alignments of the (K/R)RRW + S-rich and SP + modified DL-rich regions. Species shown: Drosophila melanogaster, Drosophila simulans, Drosophila yakuba, Drosophila ananassae, Drosophila grimshawi, Drosophila mojavensis, Drosophila virilis, Drosophila persimilis, Drosophila pseudoobscura pseudoobscura, Calliphora vicina, Lucilia cuprina, Ceratitis capitata, and Aedes albopictus (Diptera), Bombyx mori, Spodoptera frugiperda, Omphisa fuscidentalis, Chilo suppressalis, and Manduca sexta (Lepidoptera), and Stenopsych marmorata (Trichoptera).]

Figure 12 Type-6 B1 isoform-specific region. The type-6 B1 isoform-specific region contained four conserved microdomains. The DL-rich motif was structurally modified in the type-6 B1 isoform-specific region (termed as modified DL-rich motif). (A) Schematic representation of the type-6 B1 isoform-specific regions. Positions of the (K/R)RRW motif, the S-rich motif, the SP residues, and the modified DL-rich motif are indicated by colored boxes. (B) The alignment and Sequence Logo representation of the conserved microdomains of the type-6 B1 isoform-specific region. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues.

insect subgroup-specific structural variations in the isoform-specific region in the A/B domain of each EcR isoform. The conserved microdomain structures in the EcR isoform-specific regions might be the fundamental structural basis for the EcR isoform-specific AF-1 functions.

The A isoform-specific region generally contained the SUMOylation motif, the NLS, the (D/E)(D/E)W residues, and the A-box. Among these structural motifs, the (D/E)(D/E)W residues and the A-box were highly conserved in the C-terminal half of the A isoform-specific region across arthropod species, suggesting that these two microdomains are the essential structures for the A isoform-specific AF-1. Moreover, the NLS was conserved among all species except for Heteroptera and Lepidoptera. The NLS in the isoform-specific region of the Drosophila EcR-A isoform is involved in nucleo-cytoplasmic shuttling in mammalian cells [24]. Our data suggest that the isoform-specific region-mediated nuclear localization mechanism of the EcR-A isoform is conserved among most species.

The A/B domain of the Drosophila EcR-A isoform exhibits weak transactivation activity in Drosophila Kc167 cells [12], whereas it exhibits transcriptional inhibition activity in HeLa cells [13]. In the red flour beetle T. castaneum, the EcR-A isoform initiates ecdysone responses by regulating the expression of downstream 20E-response genes, indicating that the EcR-A isoform functions as a transactivating factor. In the N-terminal half of the A isoform-specific region, we found a conserved SUMOylation motif. SUMOylation is a reversible post-translational modification involved in various cellular processes, such as nucleo-cytoplasmic shuttling and transcriptional regulation [25-27]. SUMOylation of transcription factors generally results in the repression of transcriptional activation activity [28]. The high level of conservation of the SUMOylation motif in the A isoform-specific region suggests that it is functionally important. SUMOylation of the A isoform-specific region might contribute to the cell-type/organism-dependent transcriptional regulation mediated by the A/B domain of the EcR-A isoform.

Figure 13 Structural diversity of the EcR isoform-specific region in insects. (A) Evolution model for the structural modification in the A isoform-specific region in insects. (B) Evolution model for the structural modification in the B1 isoform-specific region in insects. Schematic representation shows the phylogenetic relationship of insect subgroups with the structural types of the isoform-specific region. Conserved microdomains are indicated by colored boxes.

All types of the B1 isoform-specific region contained the S-rich and the DL-rich conserved motifs. The isoform-specific region of the holometabolous insect EcR-B1 isoform (types 3-6 B1 isoform-specific region) contained an additional (K/R)RRW motif at the N-terminus. In Drosophila, strong transactivation activity was mapped at the N-terminal end of the A/B domain of the EcR-B1 isoform (amino acid residues 1-53) [13], which includes all three conserved motifs. Moreover, the S-rich motif-like sequences and the DL-rich motif are found in the A/B domain of the ecdysone receptor identified from the filarial nematode B. malayi. The high level of conservation of the S-rich and DL-rich motifs across ecdysozoan species suggests that these motifs are the essential structures for the B1 isoform-specific AF-1 function.

The length and sequences of the linker regions between microdomains, as well as the distribution of microdomains in the isoform-specific region, varied across species. For example, the B1 isoform-specific region of the non-insect arthropod EcR-B1 isoforms (type-1 B1 isoform-specific region) contained a long linker region between the S-rich and the DL-rich motifs (e.g., Daphnia magna EcR-B1 contained a 137-amino acid linker region). In the holometabolous insects, the three conserved motifs [the (K/R)RRW, the S-rich, and the DL-rich motifs] were contained in the N-terminal half of the B1 isoform-specific region, and the linker regions between the motifs were shortened (e.g., type-6 B1 isoform-specific region). The isoform-specific region of the EcR-B1 isoform identified in vespoid wasps contained a long S/H-rich insertion between the holometabolous insect-specific (K/R)RRW motif and the S-rich motif. These data suggest that the conserved microdomains in the EcR isoform-specific regions are structurally, and even functionally, independent of each other.

In vitro and in silico structural analysis of the A/B domain of the Drosophila EcR isoforms revealed that the N-terminal region of the Drosophila EcR-B1 isoform-specific region, which contains the conserved microdomains, is structurally disordered [29]. An intrinsically disordered AF-1 region has been found in many other nuclear receptors, such as the androgen receptor and the glucocorticoid receptor [30-34]. Intrinsically disordered regions are found in many proteins involved in protein-protein interactions [35,36], and the nuclear receptor AF-1 region is involved in cofactor recruitment and transcriptional regulation. Thus, the conserved microdomains found in the EcR isoform-specific region might function as signature motifs and/or target sites for the EcR isoform-specific transcription cofactors.

The A/B domain of the Drosophila EcR-B1 isoform activates transcription even in yeast and mammalian cells [13,37,38]; therefore, it is likely that the AF-1 region of the EcR-B1 isoform activates transcription through mechanisms conserved among eukaryotic organisms. Among the conserved microdomains contained in the Drosophila EcR-B1 isoform-specific region, the DL-rich motif, which comprises tandem repeats of acidic residues (D and E) and aromatic/bulky hydrophobic residues (W, A, F, I, L, M, and V), was structurally similar to the acidic activation domain (Figure 14A). Acidic activation domains are found in viral and cellular transcription factors such as VP16, GAL4, and p53 [39-42], as well as in the AF-1 region of several nuclear receptors [43,44]. Further studies, such as protein-protein interaction studies, will shed light on the transcriptional regulation mechanism of the EcR-B1 isoform-specific AF-1.

Figure 14 Hypothetical model of the B1 isoform-specific region-mediated transcriptional regulation. (A) Sequence comparison between two acidic activator domains of herpes virus protein VP16 and the consensus sequences of the DL-rich motif of the EcR-B1 isoform. Two acidic activation domains of VP16 (GenBank accession number; NP_044650) were aligned with the consensus sequences of the DL-rich motifs of the type 1-5 B1 isoform-specific regions. The DL-rich motif of EcR-B1 isoform and the VP16 acidic activation domains comprise tandem repeats of acidic residues and aromatic/bulky hydrophobic residues. The amino acid residues are represented in the default color scheme of ClustalX. Asterisks indicate identical residues and colons indicate similar residues. (B) Schematic model of the action of the EcR-B1 isoform-specific AF-1. The S-rich and DL-rich motifs might interact with co-regulatory proteins essential for B1 isoform-specific AF-1. The holometabolous insect EcR-B1 isoform might have acquired a novel interaction partner(s) in the holometabolous insect-specific (K/R)RRW motif.

In the present study, we discovered that the N-termini of holometabolous insect EcR-B1 isoforms contain a novel conserved (K/R)RRW motif, which is lacking in the direct-developing insects. The (K/R)RRW motif was found in the N-terminal region of the EcR-B1 isoform identified in Hymenoptera, the most basal lineage of holometabolous insects, suggesting that the (K/R)RRW motif was acquired in association with the evolution of holometabolous insects. Strikingly high conservation of the (K/R)RRW motif across holometabolous insect species suggests its importance in the AF-1 function of the holometabolous insect EcR-B1 isoform. Taken together, we hypothesize that the holometabolous insect EcR-B1 isoform acquired a novel transcriptional regulation mechanism that is mediated by the (K/R)RRW motif (Figure 14B).
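The acidic/hydrophobic character of the DL-rich motif discussed above can be tallied with a few lines of Python. This is only a sketch under stated assumptions: the residue classes follow the sets named in the text (acidic: D and E; aromatic/bulky hydrophobic: W, A, F, I, L, M, and V), the two inputs are the type-3 and type-4 DL-rich consensus sequences reported in the Results, and the function name residue_classes is hypothetical. It does not reproduce any analysis performed in this study.

```python
# Residue classes as named in the text.
ACIDIC = set("DE")
HYDROPHOBIC = set("WAFILMV")


def residue_classes(motif):
    """Count acidic, hydrophobic, and other residues in an ungapped motif."""
    counts = {"acidic": 0, "hydrophobic": 0, "other": 0}
    for residue in motif.upper():
        if residue in ACIDIC:
            counts["acidic"] += 1
        elif residue in HYDROPHOBIC:
            counts["hydrophobic"] += 1
        else:
            counts["other"] += 1
    return counts


# DL-rich consensus sequences quoted in the Results for types 3 and 4.
DL_RICH_CONSENSUS = {
    "type-3": "DIGDVDLEFWDLDL",
    "type-4": "EDLQLWDLDL",
}

if __name__ == "__main__":
    for label, motif in DL_RICH_CONSENSUS.items():
        print(label, motif, residue_classes(motif))
```

For both consensus sequences, all but one residue fall into the acidic or the hydrophobic class, which is consistent with the description of the motif as alternating acidic and bulky hydrophobic residues.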

Conclusions

A comprehensive structural comparison of the N-terminal isoform-specific region in the A/B domain of the EcR-A and -B1 isoforms in insects revealed several microdomain structures that are conserved across insects or acquired/modified in a specific subgroup of insects. These microdomains might be the essential structural basis for the EcR isoform-specific AF-1 transactivation functions by serving as binding sites for co-regulatory proteins or as post-translational modification sites. The novel (K/R)RRW motif in the isoform-specific region of the holometabolous insect EcR-B1 isoform might provide additional interaction sites for co-regulatory proteins and mediate the holometabolous insect-specific regulation of the B1 isoform-specific AF-1 transactivation function. Identification of binding partners of the EcR isoform-specific regions and structural analysis of the interaction interface will provide further insight into the evolutionary importance of the structural modifications of the EcR isoform-specific AF-1 regions.

Methods

Animals

We collected 51 species of insects and non-insect arthropods for EcR cDNA cloning. Some were collected in the field and others were purchased. Their collection sites are listed in Additional file 4, Table S4. Franklinothrips vespiformis, Orius strigicollis and Eretmocerus eremicus were purchased from Arysta LifeScience (Tokyo, Japan). Colonies of Apis mellifera were purchased from Kumagaya Bee Farm (Saitama, Japan). Pupae of Osmia cornifrons were purchased from Mason Bee Research Institute (Sendai, Miyagi, Japan). Pupae of Nasonia vitripennis were purchased from Nosan Corporation (Tsukuba, Ibaraki, Japan). Gryllus bimaculatus, Acheta domestica and Thermobia domestica were purchased from local pet stores.

RNA isolation and reverse transcription

Tissues were dissected in ice-cold phosphate-buffered saline (138 mM NaCl, 3 mM Na2HPO4, 10 mM NaH2PO4, pH 7.4), and immediately homogenized in TRIzol reagent (Invitrogen, Carlsbad, CA) or stored in RNAlater reagent (Ambion, Austin, TX) at -20°C until use. Total RNA was isolated with TRIzol reagent and treated with DNase I (TaKaRa, Shiga, Japan) at 37°C for 1 h. For standard reverse transcription-polymerase chain reaction (RT-PCR) experiments, total RNA (1 μg) was reverse transcribed using Superscript III reverse transcriptase (Invitrogen), with the 3’ rapid amplification of cDNA ends (3’ RACE) adapter packaged in the FirstChoice RLM-RACE kit as a primer. For 5’ RACE, total RNA was modified using the FirstChoice RLM-RACE kit and reverse transcribed using Superscript III reverse transcriptase with a random decamer as a primer.

Cloning of cDNA fragments that encode the A/B and C domains of EcR isoforms

First, we cloned cDNA fragments encoding the C domain and the C-terminal region of the A/B domain of EcR isoforms by RT-PCR. Then, we performed 5’ RACE using the FirstChoice RLM-RACE kit (Ambion) to determine the translation initiation site of the gene. Finally, we cloned cDNA fragments encoding the full-length A/B domain of EcR isoforms. The obtained cDNA sequences were registered with GenBank. Detailed experimental procedures are described below. Generally, we performed nested or semi-nested PCR to amplify a desired cDNA fragment in the RT-PCR and the 5’ RACE experiments. The first-round PCR was performed using either KOD-plus polymerase (TOYOBO, Osaka, Japan) or TaKaRa LA Taq (TaKaRa), and the nested PCR was performed using either TaKaRa Ex Taq (TaKaRa) or TaKaRa LA Taq. The amplified DNA fragments were subcloned into the plasmid vector pGEM-T easy (Promega, Madison, WI), and the nucleotide sequences were determined using an ABI PRISM 3100 Genetic Analyzer (Applied Biosystems, Foster City, CA). The sequences of the degenerate/consensus primers used for the partial cDNA cloning are listed in Additional file 3, Table S3. The sequences of the primers used to amplify a cDNA encoding the full-length A/B domain of EcR isoforms are listed in Additional file 4, Table S4.

Cloning of cDNA fragments encoding the EcR C domain

To obtain the partial cDNA fragments encoding the ecdysone receptor (EcR) C domain, we performed reverse transcription polymerase chain reaction (RT-PCR) with primers designed on the basis of highly conserved amino acid residues in the insect EcR C domain (EELCLVCGD and TCEGCKGFF for the forward degenerate primers; MYMRRKCQE and VGMRAECV for the reverse degenerate primers).

Cloning of cDNA fragments encoding the C-terminal region of the A/B domain of EcR-B1 isoform from insects and non-insect arthropods except for Hemiptera, Hymenoptera, and Trichoptera

To obtain the cDNA fragment encoding the C-terminal region of the A/B domain of EcR-B1 isoform from insects and non-insect arthropods other than Hemiptera, Hymenoptera, and Trichoptera, we performed RT-PCR with a forward consensus primer designed on the basis of the nucleotide sequence conserved among three coleopteran EcR-B1s (GenBank accession numbers: CAL25731, CAA72296, and BAD99297) and two crustacean EcRs (GenBank accession numbers: AAC33432 and BAF49033). The reverse gene-specific primers were designed on the basis of the obtained cDNA sequence encoding the C domain. To determine the translation initiation site, we performed 5’ RACE with gene-specific primers designed on the basis of the obtained partial cDNA encoding the A/B domain of EcR-B1 isoform.

Cloning of cDNA fragments encoding the C-terminal region of the A/B domain of EcR-B1 isoform from Hymenoptera

We designed an alternative forward primer on the basis of the consensus nucleotide sequence conserved among three hymenopteran insects. Briefly, we performed a genomic tBlastn search with the amino acid sequence of the Pheidole megacephala EcR-B1 isoform (GenBank accession number: BAE47510) against the genomic DNA sequences of Nasonia vitripennis and Apis mellifera to predict the DNA sequences encoding the A/B domain of EcR-B1 isoform. We then designed two forward consensus primers on the basis of the conserved nucleotide sequence encoding the A/B domain of hymenopteran insect EcR-B1 isoform. We simultaneously designed a reverse consensus primer on the basis of the conserved nucleotide sequence encoding the C domain of EcR. To obtain the partial cDNA fragment encoding the C-terminal region of the A/B domain of EcR-B1 isoform from Hymenoptera, we performed RT-PCR with the forward and reverse consensus primers.

Cloning of cDNA fragments encoding the C-terminal region of the A/B domain of EcR-B1 isoform from Hemiptera and Trichoptera

Because the forward degenerate primer described above did not work for the hemipteran and trichopteran EcR-B1 genes, we performed 5’ RACE with reverse gene-specific primers designed on the basis of the obtained cDNA sequence encoding the C domain of EcR.
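A degenerate primer of the kind used in the cloning steps above is a mixture of oligonucleotides that together cover the codon choices for a conserved peptide, such as the C-domain peptide EELCLVCGD. The following Python sketch back-translates a peptide into IUPAC nucleotide codes using the standard genetic code and counts the size of the resulting mixture. It is an illustration under stated assumptions, not the primer design procedure used in this study: the function name `degenerate_primer` is hypothetical, the actual primer sequences are those listed in Additional file 3, and collapsing each codon position independently overestimates the mixture for amino acids with six codons (L, S, R).

```python
from itertools import product
from math import prod

# Standard genetic code, codons enumerated in TCAG order (TTT, TTC, TTA, ...).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODONS = ["".join(c) for c in product(BASES, repeat=3)]

CODONS_FOR = {}
for codon, aa in zip(CODONS, AMINO):
    CODONS_FOR.setdefault(aa, []).append(codon)

# IUPAC ambiguity codes keyed by the set of bases they stand for.
_IUPAC_BASES = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}
IUPAC = {frozenset(bases): code for code, bases in _IUPAC_BASES.items()}

def degenerate_primer(peptide):
    """Back-translate `peptide` into (IUPAC sequence, degeneracy).

    Degeneracy is the product of the number of bases allowed at each
    position, i.e. the number of distinct oligonucleotides in the mixture.
    """
    position_sets = []
    for aa in peptide.upper():
        codons = CODONS_FOR[aa]
        for i in range(3):
            position_sets.append(frozenset(c[i] for c in codons))
    sequence = "".join(IUPAC[s] for s in position_sets)
    degeneracy = prod(len(s) for s in position_sets)
    return sequence, degeneracy

# Forward-strand example with one of the peptides named in the text.
primer, degeneracy = degenerate_primer("EELCLVCGD")
print(primer, degeneracy)
```

For a reverse primer, the reverse complement of the back-translated sequence would be needed, which this sketch does not compute. A large degeneracy is one general reason why a degenerate primer may amplify poorly in some taxa.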


Cloning of cDNA fragments encoding the C-terminal region of the A/B domain of EcR-A isoform

To obtain the cDNA fragment encoding the A/B domain of EcR-A isoform from arthropods other than Hymenoptera, we performed 5’ RACE with gene-specific primers designed on the basis of the obtained partial cDNA sequence encoding the C domain of EcR. To obtain the cDNA fragment encoding EcR-A isoform from hymenopteran insects, we first designed a forward consensus primer encoding the C-terminal region of the A isoform-specific region. The forward consensus primer for hymenopteran EcR-A isoform was designed on the basis of the conserved nucleotide sequences of three hymenopteran EcR-A isoforms (GenBank accession numbers: XP_001602885, XP_394760, BAF79665, and BAE47509). We then performed RT-PCR with the forward and reverse consensus primers to obtain the partial cDNA encoding the C-terminal region of the A/B domain of EcR-A isoform.

5’ RACE

To determine the translation initiation site of the gene, we performed 5’ RACE using the FirstChoice RLM-RACE kit (Ambion) according to the manufacturer’s protocol. The reverse gene-specific primers were designed on the basis of the obtained cDNA encoding the isoform-specific region of EcR isoforms.

Cloning of cDNA fragments encoding the full-length A/B domain of EcR isoforms

To obtain a cDNA fragment encoding the full-length A/B domain of EcR isoforms, we designed a forward gene-specific primer at the 5’ untranslated region (UTR) of the gene. Reverse gene-specific primers were designed on the basis of the obtained cDNA corresponding to the C domain of EcR. To clone A. mellifera EcR isoforms, we designed a reverse gene-specific primer at the 3’ UTR of the gene on the basis of the predicted full-length cDNA encoding EcR-A isoform (GenBank accession number: XM_394760).

Sequence comparison

GenBank accession numbers of the EcR-A and -B1 isoforms used for sequence comparison are listed in Additional file 1, Table S1 and Additional file 2, Table S2, respectively. The deduced amino acid sequences of the isoform-specific region of EcR-A and -B1 isoforms were aligned with the ClustalX 2.0 program [45] or the MAFFT 6.0 program [46], and then manually adjusted. Potential SUMOylation sites were predicted using the SUMOsp 2.0 program [47]. The Sequence Logos and the consensus sequences were generated using the Geneious 4.8 program [48] to represent sequence conservation. All alignment files are provided in FASTA format (Additional file 5).
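The idea behind a consensus sequence and a Sequence Logo can be sketched from an alignment as follows. This Python example takes the most frequent residue in each column as the consensus and computes the information content of each column as log2(20) minus the Shannon entropy, which is the quantity that determines stack height in a logo. It is a simplified illustration, not the procedure implemented in Geneious: the function name `column_profile` and the toy alignment are assumptions, gaps are simply ignored, and no small-sample correction is applied.

```python
from collections import Counter
from math import log2

def column_profile(aligned, alphabet_size=20):
    """Return (consensus, bits) for a list of equal-length gapped sequences.

    bits[i] = log2(alphabet_size) - Shannon entropy of column i,
    computed over non-gap residues only (an assumption of this sketch).
    """
    length = len(aligned[0])
    if any(len(seq) != length for seq in aligned):
        raise ValueError("sequences must be aligned to the same length")
    consensus, bits = [], []
    for column in zip(*aligned):
        residues = [r for r in column if r != "-"]
        if not residues:  # all-gap column carries no information
            consensus.append("-")
            bits.append(0.0)
            continue
        counts = Counter(residues)
        total = len(residues)
        entropy = -sum((n / total) * log2(n / total) for n in counts.values())
        consensus.append(counts.most_common(1)[0][0])
        bits.append(log2(alphabet_size) - entropy)
    return "".join(consensus), bits

# Hypothetical toy alignment for illustration only (not data from this study).
toy_alignment = [
    "MKRRWS",
    "MRRRWS",
    "MKRRWT",
    "M-RRWS",
]
consensus, bits = column_profile(toy_alignment)
print(consensus)
print([round(b, 2) for b in bits])
```

Columns that are identical in all sequences reach the maximum of log2(20) bits, whereas variable columns score lower; the alignments in Additional file 5 could be read from FASTA and passed to a function of this kind.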


Additional file 1: Table S1. Taxa used in the structural comparison of the EcR-A isoform-specific region. [ http://www.biomedcentral.com/content/supplementary/1471-2148-1040-S1.PDF ]

Additional file 2: Table S2. Taxa used in the structural comparison of the EcR-B1 isoform-specific region. [ http://www.biomedcentral.com/content/supplementary/1471-2148-1040-S2.PDF ]

Additional file 3: Table S3. Degenerate/consensus primers used for cDNA cloning. [ http://www.biomedcentral.com/content/supplementary/1471-2148-1040-S3.PDF ]

Additional file 4: Table S4. Gene-specific primers used for cDNA cloning and the collection sites of animals. [ http://www.biomedcentral.com/content/supplementary/1471-2148-1040-S4.PDF ]

Additional file 5: Sequence alignments of the isoform-specific regions of the EcR-A and -B1 isoforms. Aligned sequences of the five types of the A isoform-specific regions and seven types of the B1 isoform-specific regions. These files can be viewed with CLUSTAL X. [ http://www.biomedcentral.com/content/supplementary/1471-2148-1040-S5.ZIP ]

Acknowledgements

This work was supported by the Program for Promotion of Basic Research Activities for Innovative Bioscience (PROBRAIN) and Grant-in-Aid from the Ministry of Education, Science, Sports.

Authors’ contributions

TW conceived of the study, collected specimens, carried out the molecular cloning studies and the sequence alignment and drafted the manuscript. TK helped to draft the manuscript. TH advised on the manuscript. All authors read and approved the final manuscript.

Received: 21 August 2009 Accepted: 12 February 2010 Published: 12 February 2010

References

1. Yao TP, Forman BM, Jiang Z, Cherbas L, Chen JD, McKeown M, Cherbas P, Evans RM: Functional ecdysone receptor is the product of EcR and Ultraspiracle genes. Nature 1993, 366:476-479.
2. Talbot WS, Swyryd EA, Hogness DS: Drosophila tissues with different metamorphic responses to ecdysone express different ecdysone receptor isoforms. Cell 1993, 73:1323-1337.
3. Robinow S, Talbot WS, Hogness DS, Truman JW: Programmed cell death in the Drosophila CNS is ecdysone-regulated and coupled with a specific ecdysone receptor isoform. Development 1993, 119:1251-1259.
4. Truman JW, Talbot WS, Fahrbach SE, Hogness DS: Ecdysone receptor expression in the CNS correlates with stage-specific responses to ecdysteroids during Drosophila and Manduca development. Development 1994, 120:219-234.
5. Schubiger M, Wade AA, Carney GE, Truman JW, Bender M: Drosophila EcR-B ecdysone receptor isoforms are required for larval molting and for neuron remodeling during metamorphosis. Development 1998, 125:2053-2062.
6. Davis MB, Carney GE, Robertson AE, Bender M: Phenotypic analysis of EcR-A mutants suggests that EcR isoforms have unique functions during Drosophila development. Dev Biol 2005, 282:385-396.

7. Choi YJ, Lee G, Park JH: Programmed cell death mechanisms of identifiable peptidergic neurons in Drosophila melanogaster. Development 2006, 133:2223-2232.
8. Tan A, Palli SR: Ecdysone receptor isoforms play distinct roles in controlling molting and metamorphosis in the red flour beetle, Tribolium castaneum. Mol Cell Endocrinol 2008, 291:42-49.
9. Saleh DS, Zhang J, Wyatt GR, Walker VK: Cloning and characterization of an ecdysone receptor cDNA from Locusta migratoria. Mol Cell Endocrinol 1998, 143:91-99.
10. Cruz J, Mané-Padrós D, Bellés X, Martín D: Functions of the ecdysone receptor isoform-A in the hemimetabolous insect Blattella germanica revealed by systemic RNAi in vivo. Dev Biol 2006, 297:158-171.
11. Wärnmark A, Treuter E, Wright AP, Gustafsson JA: Activation functions 1 and 2 of nuclear receptors: molecular strategies for transcriptional activation. Mol Endocrinol 2003, 17:1901-1909.
12. Hu X, Cherbas L, Cherbas P: Transcription activation by the ecdysone receptor (EcR/USP): identification of activation functions. Mol Endocrinol 2003, 17:716-731.
13. Mouillet JF, Henrich VC, Lezzi M, Vögtli M: Differential control of gene activity by isoforms A, B1 and B2 of the Drosophila ecdysone receptor. Eur J Biochem 2001, 268:1811-1819.
14. Wang SF, Li C, Sun G, Zhu J, Raikhel AS: Differential expression and regulation by 20-hydroxyecdysone of mosquito ecdysteroid receptor isoforms A and B. Mol Cell Endocrinol 2002, 196:29-42.
15. Anckar J, Sistonen L: SUMO: getting it on. Biochem Soc Trans 2007, 35:1409-1413.
16. Boulikas T: Nuclear localization signals (NLS). Crit Rev Eukaryot Gene Expr 1993, 3:193-227.
17. Grimaldi D, Engel MS: Evolution of the Insects. Cambridge: Cambridge University Press 2005.
18. Lavery DN, McEwan IJ: Structure and function of steroid receptor AF1 transactivation domains: induction of active conformations. Biochem J 2005, 391:449-464.
19. Kumar R, Litwack G: Structural and functional relationships of the steroid hormone receptors’ N-terminal transactivation domain. Steroids 2009, 74:877-883.
20. Keightley MC: Steroid receptor isoforms: exception or rule? Mol Cell Endocrinol 1998, 137:1-5.
21. Bender M, Imam FB, Talbot WS, Ganetzky B, Hogness DS: Drosophila ecdysone receptor mutations reveal functional differences among receptor isoforms. Cell 1997, 91:777-788.
22. Giangrande PH, McDonnell DP: The A and B isoforms of the human progesterone receptor: two functionally different transcription factors encoded by a single gene. Recent Prog Horm Res 1999, 54:291-313.
23. Giangrande PH, Kimbrel EA, Edwards DP, McDonnell DP: The opposing transcriptional activities of the two isoforms of the human progesterone receptor are due to differential cofactor binding. Mol Cell Biol 2000, 20:3102-3115.
24. Gwóźdź T, Dutko-Gwóźdź J, Nieva C, Betańska K, Orłowski M, Kowalska A, Dobrucki J, Spindler-Barth M, Spindler KD, Ozyhar A: EcR and Usp, components of the ecdysteroid nuclear receptor complex, exhibit differential distribution of molecular determinants directing subcellular trafficking. Cell Signal 2007, 19:490-503.
25. Wilson VG, Rangasamy D: Intracellular targeting of proteins by sumoylation. Exp Cell Res 2001, 271:57-65.
26. Zhao J: Sumoylation regulates diverse biological processes. Cell Mol Life Sci 2007, 64:3017-3033.
27. Ulrich HD: The SUMO system: an overview. Methods Mol Biol 2009, 497:3-16.
28. Gill G: Post-translational modification by the small ubiquitin-related modifier SUMO has big effects on transcription factor activity. Curr Opin Genet Dev 2003, 13:108-113.
29. Nocula-Ługowska M, Rymarczyk G, Lisowski M, Ozyhar A: Isoform-specific variation in the intrinsic disorder of the ecdysteroid receptor N-terminal domain. Proteins 2009, 76:291-308.
30. Takimoto GS, Tung L, Abdel-Hafiz H, Abel MG, Sartorius CA, Richer JK, Jacobsen BM, Bain DL, Horwitz KB: Functional properties of the N-terminal region of progesterone receptors and their mechanistic relationship to structure. J Steroid Biochem Mol Biol 2003, 85:209-219.


31. Lavery DN, McEwan IJ: The human androgen receptor AF1 transactivation domain: interactions with transcription factor IIF and molten-globule-like structural characteristics. Biochem Soc Trans 2006, 34:1054-1057.
32. Lavery DN, McEwan IJ: Structure and function of steroid receptor AF1 transactivation domains: induction of active conformations. Biochem J 2005, 391:449-464.
33. McEwan IJ, Lavery D, Fischer K, Watt K: Natural disordered sequences in the amino terminal domain of nuclear receptors: lessons from the androgen and glucocorticoid receptors. Nucl Recept Signal 2007, 5:e001.
34. Krasowski MD, Reschly EJ, Ekins S: Intrinsic disorder in nuclear hormone receptors. J Proteome Res 2008, 7:4359-4372.
35. Dunker AK, Cortese MS, Romero P, Iakoucheva LM, Uversky VN: Flexible nets. The roles of intrinsic disorder in protein interaction networks. FEBS J 2005, 272:5129-5148.
36. Haynes C, Oldfield CJ, Ji F, Klitgord N, Cusick ME, Radivojac P, Uversky VN, Vidal M, Iakoucheva LM: Intrinsic disorder is a common feature of hub proteins from four eukaryotic interactomes. PLoS Comput Biol 2006, 2:e100.
37. Dela Cruz FE, Kirsch DR, Heinrich JN: Transcriptional activity of Drosophila melanogaster ecdysone receptor isoforms and ultraspiracle in Saccharomyces cerevisiae. J Mol Endocrinol 2000, 24:183-191.
38. Beatty J, Fauth T, Callender JL, Spindler-Barth M, Henrich VC: Analysis of transcriptional activity mediated by Drosophila melanogaster ecdysone receptor isoforms in a heterologous cell culture system. Insect Mol Biol 2006, 15:785-795.
39. Cress WD, Triezenberg SJ: Critical structural elements of the VP16 transcriptional activation domain. Science 1991, 251:87-90.
40. Regier JL, Shen F, Triezenberg SJ: Pattern of aromatic and hydrophobic amino acids critical for one of two subdomains of the VP16 transcriptional activator. Proc Natl Acad Sci USA 1993, 90:883-887.
41. Ruden DM: Activating regions of yeast transcription factors must have both acidic and hydrophobic amino acids. Chromosoma 1992, 101:342-348.
42. Hulboy DL, Lozano G: Structural and functional analysis of p53: the acidic activation domain has transforming capability. Cell Growth Differ 1994, 5:1023-1031.
43. Kistanova E, Dell H, Tsantili P, Falvey E, Cladaras C, Hadzopoulou-Cladaras M: The activation function-1 of hepatocyte nuclear factor-4 is an acidic activator that mediates interactions through bulky hydrophobic residues. Biochem J 2001, 356:635-642.
44. Folkers GE, van Heerde EC, van der Saag PT: Activation function 1 of retinoic acid receptor β2 is an acidic activator resembling VP16. J Biol Chem 1995, 270:23552-23559.
45. Thompson JD, Higgins DG, Gibson TJ: CLUSTAL W: improving the sensitivity of progressive multiple sequence alignments through sequence weighting, position specific gap penalties and weight matrix choice. Nucl Acids Res 1994, 22:4673-4680.
46. Katoh K, Toh H: Recent developments in the MAFFT multiple sequence alignment program. Brief Bioinform 2008, 9:286-298.
47. Ren J, Gao X, Jin C, Zhu M, Wang X, Shaw A, Wen L, Yao X, Xue Y: Systematic study of protein sumoylation: Development of a site-specific predictor of SUMOsp 2.0. Proteomics 2009, 9:3409-3412.
48. Drummond AJ, Ashton B, Cheung M, Heled J, Kearse M, Moir R, Stones-Havas S, Thierer T, Wilson A: Geneious v4.8. http://www.geneious.com/.

doi:10.1186/1471-2148-10-40

Cite this article as: Watanabe et al.: Structural diversity and evolution of the N-terminal isoform-specific region of ecdysone receptor-A and -B1 isoforms in insects. BMC Evolutionary Biology 2010 10:40.
