J Mol Evol (1998) 47:211–221
© Springer-Verlag New York Inc. 1998
Shaping of Drosophila Alcohol Dehydrogenase Through Evolution: Relationship with Enzyme Functionality
Sílvia Atrian,1 Luis Sánchez-Pulido,2 Roser Gonzàlez-Duarte,1 Alfonso Valencia2
1 Departament de Genètica, Facultat de Biologia, Universitat de Barcelona, Avenida Diagonal 645, E-08028 Barcelona, Spain
2 Protein Design Group, Centro Nacional de Biotecnología-CSIC, Cantoblanco, E-28049 Madrid, Spain
Received: 11 August 1997 / Accepted: 30 December 1997
Abstract. Drosophilidae is a large, widely distributed family of Diptera comprising 61 genera, of which Drosophila is the most representative. Drosophila feeding is part of the saprophytic trophic chain, because the flies depend upon decomposing organic matter. Many species have adapted to feeding on fermenting fruit or to artificial (man-made) fermentation habitats, such as cellars and breweries. Indeed, the efficient exploitation of alcohol-containing niches is considered one of the reasons for the worldwide success of this genus. Drosophila alcohol dehydrogenase (ADH), a member of the short-chain dehydrogenase/reductase (SDR) family, is responsible for the oxidation of alcohols, but its direct involvement in fitness, including alcohol tolerance and utilization, remains controversial. Thus, it is unclear whether ADH differentiation through evolution is associated with natural adaptation to new feeding niches, and therefore possibly with Drosophila speciation, or whether it simply reflects neutral divergence correlated with the time of separation between species. To build a hypothesis that could shed light on this dilemma, we analyzed the amino acid variability found in the 57 ADH protein sequences reported to date, identified the taxon-specific residues, and localized them in a three-dimensional ADH model. Our results define three regions whose shaping has been crucial for ADH differentiation and that would be compatible with a contribution of ADH to Drosophila speciation.
Correspondence to: S. Atrian
Key words: Drosophila alcohol dehydrogenase — Short-chain dehydrogenase/reductases — Protein sequence analysis — SequenceSpace — Molecular evolution — Speciation — Function specificity — Active site
Introduction

Drosophila Alcohol Dehydrogenase

Drosophilidae is a large, highly diverse, widely distributed family of Diptera (Invertebrata, Arthropoda, Insecta, Diptera) comprising 61 genera, of which Drosophila is the most representative. The divergence time of the genus Drosophila has been estimated at approximately 40 My (Kwiatowski et al. 1994; Russo et al. 1995). Since then, more than 1,300 species have spread to different geographic areas, including deserts, swamps, savannas, woodlands, and cities, from sea level to high mountains (Wheeler 1981, 1986). Drosophila feeding is part of the saprophytic trophic chain, because the flies depend upon decomposing organic matter, mainly in the larval stage (David et al. 1983). Thus, the efficient exploitation of niches such as fermenting fruit and rotting roots or leaves, of which alcohols are clear components, is considered one of the reasons for the worldwide success of this genus (Throckmorton 1975). Furthermore, many phylogenetically unrelated species have also adapted to habitats with a high alcohol content, such as breweries and cellars. The biochemical pathway leading to the energetic assimilation or detoxification of alcohols, the main products of fermentation, begins with their oxidation to aldehydes or ketones by alcohol dehydrogenase (ADH). It is clear that flies require functional ADH to survive exposure to ethanol and that flies collected in places with high alcohol concentrations are usually more tolerant to ethanol under laboratory conditions (Chambers 1991). However, the amount of ADH found in most Drosophila species and the level of its activity do not correlate directly with their alcohol tolerance or with their ability to breed on decomposing fruit (Merçot et al. 1994). It is unlikely that these parameters are a limiting factor in the colonization of new habitats, unless alcoholic-beverage environments are considered. Therefore, it is unclear whether Drosophila ADH differentiation is simply a consequence of natural variability fixed by time divergence among species or whether it is related to fitness parameters and natural selection. Drosophila ADH (EC 1.1.1.1), a highly abundant protein in larvae and adults, is a nonmetalloenzyme that is active as a dimer of two identical subunits of about 255 amino acids; it is present in all Drosophila species, with very similar biochemical features (Atrian and Gonzàlez-Duarte 1982). It is by far the Drosophila gene/enzyme system for which the most genetic and biochemical information has been gathered (for reviews see Lindsley and Zimm 1992; Heinstra 1993). From a structural and evolutionary point of view, Drosophila ADH is a member of the short-chain dehydrogenase/reductase (SDR) family (Jörnvall et al. 1981). Although this heterogeneous group of proteins was proposed in 1981 (Jörnvall et al. 1981), it was not properly characterized as a protein family until recently (Persson et al. 1991; Jörnvall et al. 1995), mainly because of the low degree of residue identity (15–30%) among its members. These are mostly steroid and prostaglandin dehydrogenases/reductases of prokaryotes and mammals.
However, as a consequence of the early origin and radial divergence of the family, SDR proteins have also been isolated from other taxonomic groups, including seed plants, Ascomycetes, and invertebrates, and with different biological functions, e.g., redox reactions of other substrates, and structural or storage roles. Drosophila ADH is one of these atypical SDR members, and it is also unrelated to the medium-chain (MDR) ADH lineage (Jörnvall et al. 1981). Drosophila ADH; the ADHs of two non-Drosophilidae Diptera, Sarcophaga peregrina (Horio et al. 1996) and Ceratitis capitata (SwissProt Nos. P48814–P48815); and the paralogous fat body storage proteins k25 of Sarcophaga (Matsumoto et al. 1985) and P6 of Drosophila (Rat et al. 1991) are the only SDR proteins isolated from invertebrates. It is worth noting that genes derived from ADH duplication have recently been reported to be active in different Drosophila species: the Ψ-pseudogene in the repleta group (Begun et al. 1997) and the Adhr gene in the Sophophora subgenus (Brogna and Ashburner 1997). In both cases, the function of the paralogous protein remains unknown.
ADH divergence and evolution in the Drosophilidae have been broadly investigated at the DNA level as a paradigm of gene polymorphism in populations and of the evolutionary history of Drosophila coding regions (e.g., Albalat et al. 1994; Russo et al. 1995). This wealth of data contrasts with the scarcity of studies on the evolution and divergence of the Drosophila ADH protein (Atrian et al. 1992; Dorit and Ayala 1995), which could give more clues about a possible link between ADH features and biological fitness. One reason for this is the lack of a three-dimensional structure of Drosophila ADH, which precludes any detailed study of the structure/function relationship and, thus, any direct correlation between protein variability and adaptive trends of the molecule. Nevertheless, the 57 Drosophila ADHs constitute unique material for studying proteins: a large set of 50 orthologous sequences, homologous in different genomes, and 7 paralogous pairs, homologous within the same genome. All of them are very close in evolution and identical in biochemical function, but they show enough amino acid diversity to analyze evolutionary trends. This situation is the opposite of that of all other SDR members, for which only one representative of each protein family is known, and which share such a low level of identity that alignments are often ambiguous and include many gaps (Jörnvall et al. 1995). The kind of information given by ADH sequences also contrasts with that available for most protein families, which are usually composed of paralogous relatives carrying out well-differentiated cellular functions, e.g., SH2 domains in several proteins (Casari et al. 1995).
Methods Available for the Detection of Taxon-Specific Residues

Several methods have recently been developed to extract the positions in multiple sequence alignments that are specific to the main divisions of protein families. SequenceSpace (Casari et al. 1995) is based on the simultaneous use of principal-component analysis for clustering protein sequences and for detecting the residues with maximum information on the protein classification. Livingstone and Barton (1993) approached the same goal using various definitions of conservation to analyze the differences between subfamilies and groups of sequences of a protein family. They considered as specific those positions in which one or more subfamilies have a completely conserved residue, even if, at that particular position, other subfamilies do not contain any conserved residue. More recently, a different implementation of the same idea, the evolutionary trace method (Lichtarge et al. 1996), analyzed the protein family tree manually, in search of conserved residues specific to the most populated branches of the tree. In this case only positions in which all the subfamilies contained a completely conserved residue were considered. This additional criterion strongly reduced the sensitivity of the algorithm. Andrade et al. (1997) implemented a self-organizing map algorithm for the analysis of protein families. Multiple sequence alignments were coded as linear vectors and clustered into groups by comparing the different vectors in an iterative training procedure. The final relations between sequences are represented by their topological positions in a two-dimensional map of linear vectors. Positions in these vectors were scanned to detect group-specific residues from the difference in the values corresponding to a single position in the underlying multiple sequence alignment. So far no systematic comparison of these methods has been carried out, and only partial studies have been done on well-known protein families (Pazos et al. 1997). In these test cases all the available methods agreed in important aspects, such as the detection of a similar number and distribution of the main subfamilies, although only when the appropriate level of resolution was selected. Important differences were also noted, for example, in the treatment of subfamilies represented by a small number of sequences. Because the method of Lichtarge et al. (1996) is based on the identification of completely conserved residues, divergent sequences can distort the analysis, a drawback that makes this method unstable toward small changes in the underlying alignments. The method of Andrade et al. (1997) was able to reproduce automatically the predictions obtained by the manual method of Lichtarge et al., and it turned out to be more stable to the perturbations created by the inclusion of distantly related sequences.
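The difference between the two conservation criteria described above, and why requiring a conserved residue in every subfamily lowers sensitivity, can be shown with a small sketch. This is a simplified reading of the two criteria, not the published implementations of Livingstone and Barton (1993) or Lichtarge et al. (1996); the function names, the toy alignment, and the rule that positions invariant across the whole alignment are skipped are assumptions made for illustration. Positions are 0-based.

```python
from typing import Dict, List, Optional


def conserved_residue(residues: List[str]) -> Optional[str]:
    """Return the residue if every sequence in the column carries it, else None."""
    first = residues[0]
    return first if all(r == first for r in residues) else None


def specific_positions(groups: Dict[str, List[str]], require_all: bool) -> List[int]:
    """Return 0-based alignment positions whose conserved residues mark groups.

    require_all=False: at least one group is completely conserved
        (simplified reading of the Livingstone and Barton criterion).
    require_all=True: every group is completely conserved
        (simplified reading of the evolutionary trace criterion).
    Positions invariant across the whole alignment carry no group
    information and are skipped in both modes.
    """
    # Transpose each group's aligned sequences into per-position columns.
    columns = {name: list(zip(*seqs)) for name, seqs in groups.items()}
    length = len(next(iter(groups.values()))[0])
    hits = []
    for i in range(length):
        everyone = [r for cols in columns.values() for r in cols[i]]
        if conserved_residue(everyone) is not None:
            continue
        per_group = [conserved_residue(list(cols[i])) for cols in columns.values()]
        test = all if require_all else any
        if test(c is not None for c in per_group):
            hits.append(i)
    return hits


# Hypothetical toy alignment with two subfamilies.
toy = {"A": ["MKLV", "MKLI"], "B": ["MRLA", "MRIS"]}
print(specific_positions(toy, require_all=False))  # [1, 2]
print(specific_positions(toy, require_all=True))   # [1]
```

Position 2 is conserved in subfamily A only, so the stricter criterion drops it; with many divergent sequences such losses accumulate, which is the kind of sensitivity reduction discussed above.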
Finally, SequenceSpace was, on the whole, more stable in critical cases: it was able to extract additional information pinpointing interesting regions, e.g., residues shared between different subfamilies. This makes SequenceSpace a better choice for the analysis of difficult cases. SequenceSpace also differs from the other methods in that the relative contribution of each sequence to the determination of the specific residues is weighted by its distance from the other proteins of the family. We have therefore followed the SequenceSpace approach to study the amino acid variability of the 57 Drosophila ADH sequences. With our results we aim, first, to confirm the conservation of residues that have been proposed as responsible for catalysis, mainly from the analysis of the ADH of a single species, D. melanogaster, and, second, to obtain information about the positions occupied by different residues in the ADH of different taxonomic groups, which we call taxon-specific residues. In other protein families (Casari et al. 1995; Lichtarge et al. 1997; Pazos et al. 1997) it has been shown that such residues are localized preferentially in regions of the three-dimensional protein structure involved in functional specificity. Finally, we discuss the possible consequences of the localization of the taxon-specific residues for our understanding of the role of ADH in the evolution of Drosophila.

Table 1. Drosophilidae species [a]
Genus Zaprionus
  Z. tuberculatus
Genus Scaptomyza
  S. albovittata
Genus Drosophila
  Subgenus Scaptodrosophila
    Group victoria: D. lebanonensis
  Subgenus Sophophora
    Group melanogaster: D. melanogaster (AdhS/AdhF) [b], D. simulans, D. erecta [c], D. orena [c], D. teissieri [c], D. tsacasi [c], D. yakuba [c], D. mauritiana [d], D. sechellia [d]
    Group obscura: D. ambigua, D. subobscura, D. miranda [e], D. persimilis [e], D. pseudoobscura [e], D. guanche [d], D. madeirensis [d]
    Group willistoni: D. willistoni [e]
  Subgenus Engiscaptomyza
    D. crassifemur [d]
  Subgenus Drosophila
    Group immigrans: D. immigrans
    Group repleta: D. buzzatii, D. hydei (Adh1 & Adh2) [f], D. arizonae [e], D. mayaguana [e], D. mettleri [e], D. mojavensis [e] (Adh1 & Adh2) [f], D. mulleri (Adh1 & Adh2) [f], D. navojoa [e] (Adh1 & Adh2) [f], D. wheeleri [e]
    Group virilis: D. lummei, D. montana (Adh1 & Adh2) [f], D. virilis (Adh1 & Adh2) [f], D. americana [e], D. borealis [e], D. flavomontana [e], D. lacicola (Adh1 & Adh2) [f], D. texana [e]
    Hawaiian species [d]: D. adiastola, D. affinidisjuncta, D. differens, D. grimshawi, D. heteroneura, D. mimica, D. nigra, D. picticornis, D. planitibia, D. silvestris
[a] According to Wheeler (1981).
[b] AdhS/AdhF are allozymic forms (orthologous pairs).
[c] African endemism.
[d] Island endemism.
[e] Nearctic species.
[f] Adh1 and Adh2 are isozymic forms (paralogous pairs).
Materials and Methods

Drosophila ADH Sequences

All available Drosophilidae ADH sequences (57) were retrieved from the SwissProt (Bairoch and Boeckmann 1993) or TREMBL (Bairoch and Apweiler 1996) databases. All Drosophilidae species whose ADH is reported in this work are shown in Table 1 according to their taxonomic position (Wheeler 1981). Table 1 also includes basic information on geographic distribution and type of protein homology. The species belong to three genera: Zaprionus, Scaptomyza, and Drosophila. In the Sophophora subgenus of Drosophila there are representatives of the melanogaster, obscura, and willistoni groups. For D. melanogaster the two main allelomorphs, AdhF (fast) and AdhS (slow), are reported and considered orthologous forms, because only one of them is present per haploid genome. The Drosophila subgenus of Drosophila is represented by species belonging to four groups: immigrans, repleta, virilis, and the Hawaiian endemics, which we also treat as members of the genus Drosophila. The paralogous pairs of ADH sequences belong to repleta and virilis members, but they are isozymic proteins with the same function, differing only in the developmental stage of expression; for this reason they have been included in this study. We follow the classical consideration of D. lebanonensis as a Drosophila species (victoria group), although some authors classify it as Scaptodrosophila (Wheeler 1981, 1986; Grimaldi 1990). D. crassifemur is the only representative of the Engiscaptomyza subgenus.

Fig. 1. Amino acid sequence of five Drosophila alcohol dehydrogenases, representatives of the five analyzed phylogenetic groups. Subgenus Drosophila: aff, D. affinidisjuncta, belonging to the Hawaiian group; vir1, D. virilis isozyme 1, of the virilis group; and hyd1, D. hydei isozyme 1, of the repleta group. Subgenus Sophophora: sub, D. subobscura, of the obscura group; and mels, D. melanogaster, slow allelozyme. Black boxes correspond to fully conserved residues and gray boxes to partially conserved residues. The complete alignment of the 57 sequences is available at the web site http://www.cnb.uam.es/~cnbprot/dropso.html.
Sequence Analysis Method

Protein alignments and the corresponding dendrograms were calculated with the ClustalW package (Higgins et al. 1996), which implements the neighbor-joining method (Saitou and Nei 1987). The stability of different branches with respect to different choices of subsets of residue positions was checked by bootstrapping (Felsenstein 1985). The SequenceSpace algorithm (Casari et al. 1995) was used to classify sequences into groups and, at the same time, to determine the positions of the multiple sequence alignment responsible for the sequence classification. A convenient vector representation and a classical principal-component analysis are the main features of this approach. Taxon-specific residues were detected by their degree of conservation and specificity in different groups of sequences. It is important to note that not all the sequences contributed equally to the division into subfamilies and the recognition of specific residues; in particular, distantly related sequences contributed less than more closely related ones.

Three-Dimensional (3-D) Localization of ADH Residues

Drosophila ADH residues were represented in their corresponding structural positions on the 3α,20β-hydroxysteroid dehydrogenase structure (Ghosh et al. 1994), the most closely related SDR structure available. A molecular model of ADH was generated with the WHAT IF package (Vriend 1990) following the sequence alignment available at our web site http://www.cnb.uam.es/~cnbprot/dropso.html. We are interested only in the general topological features, not in the molecular details, which are impossible to model at this level of sequence similarity. Therefore, in the following figures and text the positions of the different ADH residues are discussed after localizing them on the 3α,20β-hydroxysteroid dehydrogenase framework structure. Highly divergent regions between the two proteins are shown solely for completeness, with no intention of representing a structural prediction.

Results
Drosophila ADH Amino Acid Sequence Composition

Figure 1 shows five Drosophila ADH sequences, one representative of each major taxonomic group: D. melanogaster, D. subobscura, D. virilis, D. hydei, and D. affinidisjuncta. The complete alignment and identification of all 57 Drosophila ADH protein sequences are available at http://www.cnb.uam.es/~cnbprot/dropso.html. The length of the ADH monomer takes one of three values: 255 for the members of the melanogaster group, 254 for D. lebanonensis, and 253 for the remaining species. This is because the species in the melanogaster group carry an insertion of two amino acids at the N-terminal end (FT), and D. lebanonensis keeps its initial methionine in the mature form. Of 253 residues, 129 are strictly conserved, which represents 51% of the ADH polypeptide, a rather low percentage considering that we are dealing with species mainly within the same genus. No significant differences are found between ADH segments coded by different exons. Among the highly conserved residues are G14, G16, G17, and G19 and the so-called "SDR catalytic triad," S139, Y152, and K156 (Jörnvall et al. 1995); their significance is discussed later. The dendrogram derived from our multiple sequence alignment is shown in Fig. 2.

Fig. 2. Dendrogram for Drosophila alcohol dehydrogenase obtained with the program ClustalW, which implements the neighbor-joining method of Saitou and Nei (1987). The main taxonomic groups and subgenera are indicated by thick vertical bars. The six species corresponding to ill-represented taxa are in the outer branches of the tree: D. willistoni, in the Sophophora subgenus, and D. lebanonensis, Zaprionus, D. immigrans, and the doublet D. crassifemur–Scaptomyza.

Analysis of ADH with SequenceSpace

Analysis of the Protein Space. The first step in the analysis of the SequenceSpace results is the clustering of the sequences of the protein family according to the pairwise distances derived from the given multiple sequence alignment. This clustering is referred to as the Protein Space. Applied to the alignment of the 57 ADHs, SequenceSpace produced five well-defined groups in the Protein Space (Fig. 3), while the Drosophila species corresponding to the ill-represented groups appeared as scattered points. This division is very similar to that obtained with standard trees derived from sequence comparisons, which, in turn, follows the conventional taxonomy of Drosophilidae (Throckmorton 1975). Sequences contributed to the separation of their group according to their degree of similarity with other groups, as shown by the position of sequences of the same group along the axis that best separates this group from the rest. Consequently, the contribution of different sequences to the characterization of taxon-specific residues is weighted by their similarity to other sequences. In the plane of the first two principal axes (x1x2), Drosophila subgenus and Sophophora subgenus species separate neatly along the x2 direction (Fig. 3a). The third and fourth axes (x3x4) separate the three Drosophila groups, virilis, repleta, and the Hawaiian species (Fig. 3b), while the Sophophora species remain clustered. Representation in higher dimensions (x5x6 axes) is needed to differentiate further the melanogaster and obscura group members (Fig. 3c), and in this plane the Drosophila groups are also well split.
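The Protein Space step can be illustrated with a minimal sketch, assuming a simple one-hot (binary) encoding of each aligned sequence followed by principal-component analysis computed with a singular value decomposition. This is a simplified reading in the spirit of SequenceSpace, not the published implementation (which, as noted above, also weights sequences by their similarity); the 21-letter alphabet, the function names, and the toy alignment are illustrative assumptions.

```python
import numpy as np

# Assumed alphabet: 20 amino acids plus the gap symbol.
ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"
INDEX = {aa: k for k, aa in enumerate(ALPHABET)}


def one_hot(alignment):
    """Encode N aligned sequences of length L as an N x (L * 21) binary matrix."""
    n, length = len(alignment), len(alignment[0])
    x = np.zeros((n, length * len(ALPHABET)))
    for i, seq in enumerate(alignment):
        for j, aa in enumerate(seq):
            x[i, j * len(ALPHABET) + INDEX[aa]] = 1.0
    return x


def protein_space(alignment, n_axes=2):
    """Project sequences onto the leading principal axes of the centered encoding."""
    x = one_hot(alignment)
    centered = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(centered, full_matrices=False)
    # Sequence coordinates along each principal axis (no sequence weighting here).
    return u[:, :n_axes] * s[:n_axes]


# Hypothetical toy alignment: two groups that differ at positions 0-3
# and vary independently of group membership at position 4.
toy = ["KKKKA", "KKKKC", "KKKKD", "RRRRA", "RRRRC", "RRRRD"]
coords = protein_space(toy)
print(np.round(coords[:, 0], 3))
```

On the first axis the three K sequences share one sign and the three R sequences the opposite sign (the overall sign of a principal axis is arbitrary), which is the toy analogue of one axis splitting two subgenera while variation unrelated to the grouping is pushed to later axes.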
Fig. 3. SequenceSpace analysis of Drosophila alcohol dehydrogenases. Principal-component analyses of the protein sequences are shown on (a) the 1–2 axes, (b) the 3–4 axes, and (c) the 5–6 axes. The first plot splits the two main subgenera, Drosophila and Sophophora, and each of them is later separated into several groups. Simultaneous analysis of the taxon-specific residues is shown for the same sets of axes in d, e, and f, respectively. Along axis 1 (horizontal axis) in d, conserved residues are on the right and variable residues on the left. The axis 2 (vertical axis) direction separates residues according to their amount of information on the split of the two main groups: up for the Sophophora species and down for the Drosophila species. Other axes are used for detecting the residues characteristic of each of the groups. In a, some sequences are not restricted to a definite group. The six isolated spots, from top to bottom, correspond to D. willistoni, D. lebanonensis, Zaprionus, D. immigrans, and the doublet D. crassifemur–Scaptomyza.

Analysis of the Amino Acid Space. In the second step of the application of SequenceSpace, the clustering of sequences is analyzed in terms of the amino acids and positions that are most informative about (correlated with) the clustering. This step is what we call the Amino Acid Space. Its analysis (Figs. 3d–f) allows the identification of the residues distinguishing each of the defined groups. In x1x2, the direction of the first principal axis corresponds to the Drosophila ADH consensus; thus the points most distant from the origin represent the most conserved residues (Fig. 3d). Furthermore, residues in the upper and lower corners of Fig. 3d, pointing to the Sophophora and the Drosophila directions, respectively, are those responsible for the differentiation of ADH between these two main subgenera. Figure 4a shows these subgenus-specific positions, together with the specific amino acid in each case. When localized in the corresponding positions of the 3-D structure of 3α,20β-hydroxysteroid dehydrogenase, they define two regions in the ADH monomer: the first around the ACTIVE CORE of the enzyme (Jörnvall et al. 1995) and the second at a lower corner, which we call the MONOMER BASE, an area with no assigned function so far. Variation between aligned sequences at the MONOMER BASE is highly conservative and involves hydrophobic residues: four of five leucines in Sophophora species are changed to isoleucine, alanine, or proline in Drosophila members. Positions at the ACTIVE CORE show a more heterogeneous pattern of substitution: although the changes are mostly conservative ones among nonpolar residues (valine to alanine, leucine, or isoleucine), some nonconservative substitutions are found, e.g., asparagine/serine at position 161 and lysine/leucine at position 206.
Once the subgenus-specific residues are discarded, the different Drosophila groups (repleta, virilis, and Hawaiian) are best defined by the positions situated at the ends of the directions shown in Figs. 3e and 3f. Among these positions (Fig. 4b), hydrophilic residues are more represented than among the subgenus-specific residues. They are clustered mainly around two exposed protein regions: the above-mentioned MONOMER BASE and a new one, where most of them concentrate, which we refer to as the ACTIVE LOOP. The ACTIVE LOOP is also a hypothesized catalytic element of the enzyme, as it would delimit and close the entrance to the central reactive pocket (Jörnvall et al. 1995) (Fig. 4b). Lysine and serine accumulate among the Hawaiian-specific residues. In contrast, virilis-specific residues show no particular composition but are more concentrated in the ACTIVE LOOP, and the repleta-specific residues show the most scattered distribution. Similarly, once the subgenus-specific residues have been discarded, the Sophophora groups, obscura and melanogaster, are defined by the residues shown in Fig. 4c. These are again located mainly at the ADH ACTIVE LOOP, with some at the MONOMER BASE, which is consistent with their more hydrophilic character and coincides with the specific residues for the Drosophila groups. Substitutions involving T, S, and Q are overrepresented (Fig. 4c).
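The Amino Acid Space step can be sketched in the same simplified setting: the (position, residue) columns of a one-hot encoding are placed on the principal axis that separates the groups, and the features with the largest absolute loadings are read as candidate group-specific residues, each attributed to the group toward which it points (as residues pointing "up" or "down" in Fig. 3d are attributed to Sophophora or Drosophila). This is an assumption-laden illustration rather than SequenceSpace itself; the encoding, the `top` cutoff, the attribution rule, and the toy groups are hypothetical, and positions are 0-based.

```python
import numpy as np

ALPHABET = "ACDEFGHIKLMNPQRSTVWY-"  # assumed: 20 amino acids plus gap


def encode(alignment):
    """One-hot encode aligned sequences; return the matrix and a (position, residue) label per column."""
    length = len(alignment[0])
    labels = [(j, aa) for j in range(length) for aa in ALPHABET]
    x = np.array([[1.0 if seq[j] == aa else 0.0 for (j, aa) in labels]
                  for seq in alignment])
    return x, labels


def amino_acid_space(alignment, groups, axis=0, top=8):
    """Rank (position, residue) features by |loading| on one principal axis.

    Each feature is attributed to the group whose mean sequence coordinate
    on that axis has the same sign as the feature's loading.
    """
    x, labels = encode(alignment)
    centered = x - x.mean(axis=0)
    u, s, vt = np.linalg.svd(centered, full_matrices=False)
    seq_coord = u[:, axis] * s[axis]  # sequences on the chosen axis
    loading = vt[axis]                # features on the same axis
    group_mean = {g: seq_coord[[i for i, gi in enumerate(groups) if gi == g]].mean()
                  for g in set(groups)}
    order = np.argsort(-np.abs(loading))[:top]
    result = []
    for k in order:
        side = np.sign(loading[k])
        owner = next(g for g, m in group_mean.items() if np.sign(m) == side)
        result.append((labels[k], owner, float(loading[k])))
    return result


# Hypothetical toy data: positions 0-3 separate the groups, position 4 does not.
toy = ["KKKKA", "KKKKC", "KKKKD", "RRRRA", "RRRRC", "RRRRD"]
groups = ["A", "A", "A", "B", "B", "B"]
for (pos, aa), owner, w in amino_acid_space(toy, groups):
    print(pos, aa, owner, round(w, 3))
```

The eight reported features are K and R at positions 0-3, with every K attributed to group A and every R to group B, while the residues at position 4, which vary independently of the grouping, receive no weight on this axis.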
Discussion

The 57 Drosophilidae ADH sequences offer a unique opportunity to analyze the features of this SDR enzyme and the correlation between ADH sequence changes fixed by evolutionary forces, such as natural selection and genetic drift, and some structural/functional traits of the molecule. Two antagonistic forces act on proteins to modulate primary structure variability. On the one hand, functional requirements are powerful constraints that maintain a structure compatible with the protein's biological role. On the other hand, evolutionary change must produce sequence diversity allowing at least slight differentiation of the molecule in association with speciation and selective adaptation to novel conditions. Analysis of the 57 ADH sequences provided us with sufficient information about both phenomena.
Positions S139, Y152, and K156, the residues of the SDR catalytic triad (Jörnvall et al. 1995), are highly conserved among the 57 Drosophila ADHs, which supports their leading role in the enzymatic mechanism of ADH. The involvement of Drosophila ADH Y152 and K156 in catalysis, first predicted from sequence conservation (Jörnvall et al. 1981) and chemical modification (Krook et al. 1992; Prozorovski et al. 1992), was later confirmed by site-directed mutagenesis. The requirement for the Y152 hydroxyl group to mediate proton/hydride transfer to the coenzyme is consistent with the null activity of Y152F (Albalat et al. 1992), Y152E, and Y152Q (Cols et al. 1993). Y152H and Y152C, reported by Chen et al. (1993), are also inactive or nearly dead enzymes. In accordance with the need for a basic side chain at position 156, to stabilize the coenzyme and/or lower the pKa of the Y152 hydroxyl group, the mutant K156I has been shown to be inactive (Cols et al. 1993; Chen et al. 1993), while K156R retained 2.2% of its activity. Surprisingly, a glutamic acid at position 156 of D. virilis ADH has been reported (Nurminsky et al. 1996), although this enzyme exhibits a reaction behavior similar to that of other Drosophila species (Juan and Gonzàlez-Duarte 1980). Recently we have also shown the null activity of the S139A and S139C enzymes (Cols et al. 1997).
Conserved positions other than the catalytic triad can easily be identified along the ADH monomer. The typical "G-box" of the coenzyme-binding domain of all NAD+/NADP+-dependent dehydrogenases/reductases, located at the N terminus, has also been analyzed by site-directed mutagenesis (Chen et al. 1990; Ribas de Pouplana and Fothergill-Gilmore 1994). Furthermore, the central region of the molecule, approximately from position 100 to position 170, includes two long α-helices that are believed to constitute the monomer–monomer interface for ADH dimerization, owing to the hydrophobic side chains of their outer-side residues (Chenevert et al. 1996). Indeed, all other known SDRs, which are dimers or tetramers in the active form, share a similar dimerization interface architecture (Jörnvall et al. 1995). Moreover, the fact that the two catalytic residues Y152 and K156 are part of one of these helices would explain the lack of activity of the ADH monomer, as the hydrophobic residues on the outside of the helix, exposed to the solvent, would distort the correct positioning of Y152 and K156 at the active site (Chenevert et al. 1996). External loops, or even the C-terminal tail of the polypeptide (Albalat et al. 1995), may be involved in catalysis, as they could contribute to closing the active pocket of the enzyme. These regions are also highly conserved and concentrate in segments 175–200 and the last 10 residues (Fig. 1).
Once the positions restricted by functional requirements have been considered, and taking into account that we are dealing with sequences with the same biochemical function, we can assume that the amino acid variability corresponds to positions that allow variability between taxa without disturbing the catalytic mechanism of the enzyme. When we localized the taxon-specific residues, our results were rather surprising, as these appear to be clearly clustered around three circumscribed protein regions. Drosophila/Sophophora-specific positions (Fig. 4a), i.e., those correlated with the ancient split of the subgenera, appear to be gathered mainly in the ACTIVE CORE, with some in the MONOMER BASE (Fig. 5). It is plausible that replacements in the central area cause slight differences in ADH catalytic properties, although they do not impair the main enzymatic activity. Some of the changes between subgenera involve nonconservative substitutions (N/S, K/L), which is consistent with a region allowing structural modulation. The second area, the MONOMER BASE, has no specific role assigned so far in Drosophila ADH, nor has it been the subject of specific attention; it is discussed below. Group-specific residues within subgenera have a similarly clustered distribution pattern but are located in outer parts of the molecule, although a few positions also map near the ACTIVE CORE. One of the clusters is in the putative ACTIVE LOOP, which would close the central reactive cavity once the substrate and coenzyme were inside; thus, another functionally significant region of the molecule appears to harbor group-specific residues. The second cluster is again at the MONOMER BASE, supporting a possible significance of this area. Thus, more recent differentiation events seem to involve more peripheral positions of ADH.
It is interesting that a new location, the MONOMER BASE, showing clear significance in molecular diversification, has been identified in Drosophila ADH. This region, formed by the antiparallel contact between an α-helix and a β-sheet (Ghosh et al. 1994), corresponds to part of the dimer–dimer interface of the SDR proteins that are active as tetramers (Fig. 6, at http://www.cnb.uam.es/~cnbprot/dropso.html). This is not the case for ADH, which is a dimer with no specific subunit interaction involving this surface, which is, moreover, structurally different from that of other SDRs. However, our results strengthen the hypothesis that the MONOMER BASE may have some functional and/or structural significance, or at least had it at the time of the differentiation of these groups of species, between 40 and 25 My ago. The amino acids at these positions in Sophophora species are also conserved with respect to other SDR sequences, consistent with a strong constraint acting in this area. Thus, either the region retains restrictions on free variability inherited from an ancient tetrameric SDR precursor, or it is subject to some present functional/structural constraint that would be worth investigating.

Fig. 4. Group-specific residues of the Drosophila alcohol dehydrogenases and their localization in a related SDR enzyme structure. Tables list the specific residues, whose localization in the 3-D model is shown with balls in the corresponding diagrams. a Subgenus-specific residues. b Group-specific residues of subgenus Drosophila species: Hawaiian (B1), white balls; virilis (B2), gray balls; and repleta (B3), black balls. c Group-specific residues of subgenus Sophophora species: obscura (A1), gray balls; and melanogaster (A2), black balls. Regions of sequence similarity between ADH and the 3-D framework of 3α,20β-hydroxysteroid dehydrogenase are shown as gray ribbons. Black ribbons indicate regions where the two structures are expected to differ substantially. Catalytic Y152 and K156 are also represented to locate the catalytic center of the enzyme. Residues that determine two taxonomic groups are not represented twice. More detailed information about the localization of the different residues is available at http://www.cnb.uam.es/~cnbprot/dropso.html.

Fig. 5. The 3-D framework of 3α,20β-hydroxysteroid dehydrogenase showing the regions referred to in the text as the ACTIVE CORE, ACTIVE LOOP, and MONOMER BASE, where the Drosophila ADH taxon-specific residues are clustered. Black ribbons indicate regions where the two structures are expected to differ substantially, given the low level of sequence similarity.

Evolutionary Significance of the 3-D Localization of Taxon-Specific Residues
The pattern formed by the localization of the taxon-specific residues could fit alternative models, which we hypothesize would correspond to different evolutionary situations. On the one hand, the residues may lie at scattered positions in the molecule, indicating random ADH divergence in Drosophilidae. That is, the fixation of naturally occurring nucleotide substitutions would have allowed different amino acids at noncritical positions, which would not be expected to concentrate in any defined area of the 3-D structure of the protein. On the other hand, a localization of taxon-specific residues at clustered positions of the structure would indicate that some regions have accumulated the variability characteristic of different taxa. The simplest interpretation is that such clustering has structural and/or functional causes. If these residues lie in functional regions, a correlation between the process of differentiation and the modulation of protein activity can be argued. In our case, this would support the idea that some adaptive traits conferred by ADH have been associated with environmental adaptation, reproductive isolation, and thus, perhaps, the generation of new species. We have described how ADH taxon-specific residues are concentrated in three regions, two of which have already been reported to be involved in enzymatic catalysis. Their distribution therefore fits the second model better, and we hypothesize that the Drosophila ADH taxon-specific residues, which are correlated with the spread, differentiation, and evolution of this enzyme, may produce slight functional differences among ADHs and could therefore have been responsible for different functional adaptations. As the oxidation of alcohols present in the feeding niches can be considered an important adaptive trait of the flies, a significant role of ADH in the speciation of Drosophilidae cannot be ruled out.
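The two models contrasted above can be framed as a simple statistical question: are the taxon-specific residues closer together in the 3-D model than randomly chosen residues would be? The sketch below is one possible formulation, assumed for illustration only and not the procedure used in this study. It takes a hypothetical mapping from residue number to Cα coordinates (for example, read from a structural model) and a list of taxon-specific positions, and estimates by permutation how often a random set of the same size is at least as compact, using mean pairwise Cα distance as the (assumed) compactness measure. A small p-value favors the clustered model; a large one is compatible with scattered, random divergence.

```python
import itertools
import math
import random


def mean_pairwise_distance(coords, positions):
    """Mean Euclidean distance between the C-alpha atoms of `positions`."""
    pairs = list(itertools.combinations(positions, 2))
    if not pairs:
        raise ValueError("need at least two positions")
    return sum(math.dist(coords[a], coords[b]) for a, b in pairs) / len(pairs)


def clustering_p_value(coords, positions, n_perm=10000, seed=0):
    """Permutation p-value for 3-D clustering of `positions`.

    coords: dict mapping residue number -> (x, y, z) C-alpha coordinates.
    positions: residue numbers of the taxon-specific sites.
    Returns the fraction of random residue sets of the same size whose mean
    pairwise distance is <= the observed one (with an add-one correction,
    so the estimate is never exactly zero).
    """
    rng = random.Random(seed)  # fixed seed for reproducibility
    residues = sorted(coords)
    observed = mean_pairwise_distance(coords, positions)
    k = len(positions)
    # Count random sets that are at least as compact as the observed one.
    hits = sum(
        mean_pairwise_distance(coords, rng.sample(residues, k)) <= observed
        for _ in range(n_perm)
    )
    return (hits + 1) / (n_perm + 1)


if __name__ == "__main__":
    # Synthetic toy "structure": 200 residues on a straight line, 3.8 A apart.
    toy = {i: (3.8 * i, 0.0, 0.0) for i in range(1, 201)}
    print(clustering_p_value(toy, list(range(150, 160)), n_perm=2000))  # compact set
    print(clustering_p_value(toy, list(range(1, 201, 20)), n_perm=2000))  # spread-out set
```

Mean pairwise distance is only one choice of statistic; counting residue pairs within a contact cutoff, or drawing the random sets only from positions that are free to vary (excluding those fixed by catalytic constraints), would be reasonable alternatives and would make the null model more conservative.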
Overall, the method, data, and conclusions presented can contribute to shedding light on the evolutionary processes which drove the spread of the Drosophila ADH family and show how new protein regions of previously unsuspected evolutionary significance can be identified by this kind of analysis.
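The kind of analysis referred to above starts from identifying taxon-specific positions in a multiple alignment. A deliberately simple criterion is sketched below for illustration: a column is flagged when every predefined group of sequences is invariant (and ungapped) at that position but the groups do not all share the same residue. This strict rule is an assumption made for clarity and is not presented as the method used in this study; the function name, group labels, and toy sequences are hypothetical.

```python
def taxon_specific_columns(alignment, groups):
    """Flag alignment columns that are invariant within each group but
    differ between groups.

    alignment: dict mapping sequence name -> aligned sequence ("-" = gap).
    groups: dict mapping group label -> list of sequence names.
    Returns a list of (1-based column, {group label: residue}) tuples.
    """
    lengths = {len(seq) for seq in alignment.values()}
    if len(lengths) != 1:
        raise ValueError("aligned sequences must all have the same length")
    (length,) = lengths
    hits = []
    for col in range(length):
        per_group = {}
        for label, members in groups.items():
            residues = {alignment[name][col] for name in members}
            if len(residues) != 1 or "-" in residues:
                break  # variable or gapped within a group: not taxon-specific
            per_group[label] = residues.pop()
        else:
            # Reached only if no group broke the loop, i.e. all are invariant.
            if len(set(per_group.values())) > 1:
                hits.append((col + 1, per_group))
    return hits


if __name__ == "__main__":
    toy_alignment = {
        "seq1": "MSKNLV",
        "seq2": "MSKNLV",
        "seq3": "MSLSLV",
        "seq4": "MSLSIV",
    }
    toy_groups = {"group_A": ["seq1", "seq2"], "group_B": ["seq3", "seq4"]}
    for column, residues in taxon_specific_columns(toy_alignment, toy_groups):
        print(column, residues)
```

In this toy case columns 3 and 4 are flagged, column 5 is not (group_B is variable there), and fully conserved columns are ignored. With real data, strict invariance is fragile, since a single misreported residue would hide a position; a tolerance threshold or a score-based method would be more robust.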
Note Added in Proof
The authors have recently sequenced the DNA region encompassing the catalytic triad in D. virilis ADH. From these data we conclude that two amino acid positions, 141 and 156, may have been reported incorrectly as Ser and Glu instead of the conserved Thr and Lys, respectively.
Acknowledgments. We are grateful to R. Rycroft and B. Malik for revising the English in the manuscript. This work was supported by CICYT (Plan Nacional I + D, Ministerio de Educación y Ciencia, España) Grant BIO94-1067 (Protein Design Group) and EC Contract BIO4-CT97-2123 (Departament de Genètica).
References
Albalat R, González-Duarte R, Atrian S (1992) Protein engineering of Drosophila alcohol dehydrogenase. The hydroxyl group of Tyr152 is involved in the active site of the enzyme. FEBS Lett 308:235–239
Albalat R, Marfany G, González-Duarte R (1994) Analysis of nucleotide substitutions and amino acid conservation in the Drosophila Adh genomic region. Genetica 94:27–36
Albalat R, Valls M, Fibla J, Atrian S, González-Duarte R (1995) Involvement of the C-terminal tail in the activity of Drosophila alcohol dehydrogenase. Evaluation of truncated proteins constructed by site-directed mutagenesis. Eur J Biochem 233:498–505
Andrade MA, Casari G, Sander C, Valencia A (1997) Classification of protein families and detection of the correlated residues with an improved self-organizing map. Biol Cybernet 76:441–450
Atrian S, González-Duarte R (1982) Comparison of some biochemical features of the enzyme alcohol dehydrogenase in sixteen species of Drosophila. In: Lakovaara S (ed) Advances in genetics, development and evolution of Drosophila. Plenum Press, New York, p 251
Atrian S, Marfany G, Albalat R, González-Duarte R (1992) Primary structure analysis of Drosophila alcohol dehydrogenase. Biochem Genet (Life Sci Adv) 11:19–29
Bairoch A, Apweiler R (1996) The SWISS-PROT protein sequence databank and its new supplement TREMBL. Nucleic Acids Res 24:21–25
Bairoch A, Boeckmann B (1993) The SWISS-PROT protein sequence data bank, recent developments. Nucleic Acids Res 21:3093–3096
Begun DJ (1997) Origin and evolution of a new gene descended from alcohol dehydrogenase in Drosophila. Genetics 145:375–382
Brogna S, Ashburner M (1997) The Adh-related gene of Drosophila melanogaster is expressed as a functional dicistronic messenger RNA: multigenic transcription in higher organisms. EMBO J 16:2023–2031
Casari G, Sander C, Valencia A (1995) A method to predict functional residues in proteins. Nat Struct Biol 2:171–178
Chambers GK (1991) Gene expression, adaptation and evolution in higher organisms. Evidence from studies of Drosophila alcohol dehydrogenases. Comp Biochem Physiol 99B:723–730
Chen Z, Lu L, Shirley M, Lee WR, Chang SH (1990) Site-directed mutagenesis of glycine-14 and two critical cysteinyl residues in Drosophila alcohol dehydrogenase. Biochemistry 29:1112–1118
Chen Z, Jiang JC, Lin Z, Lee WR, Baker ME, Chang SH (1993) Site-specific mutagenesis of Drosophila alcohol dehydrogenase. Evidence for involvement of tyrosine-152 and lysine-156 in catalysis. Biochemistry 32:3342–3346
Chenevert SW, Fossett NG, Chang SH, Tsigelny I, Baker ME, Lee WR (1995) Amino acids important in enzyme activity and dimer stability for Drosophila alcohol dehydrogenase. Biochem J 308:419–423
Cols N, Marfany G, Atrian S, González-Duarte R (1993) Effect of site-directed mutagenesis on conserved positions of Drosophila alcohol dehydrogenase. FEBS Lett 319:90–94
Cols N, Atrian S, Benach J, Ladenstein R, González-Duarte R (1997) Drosophila alcohol dehydrogenase: evaluation of Ser139 site-directed mutants. FEBS Lett 413:191–193
David JR, Allemand R, Van Herrewege J, Cohet Y (1983) Ecophysiology: abiotic factors. In: Ashburner M, Carson HL, Thompson JN (eds) The genetics and biology of Drosophila, vol 3d. Academic Press, New York, p 105
Dorit RL, Ayala FJ (1995) ADH evolution and the phylogenetic footprint. J Mol Evol 40:658–662
Felsenstein J (1985) Confidence limits on phylogenies: an approach using the bootstrap. Evolution 39:783–791
Ghosh D, Wawrzak Z, Weeks CM, Duax WL, Erman M (1994) The refined three-dimensional structure of 3α,20β-hydroxysteroid dehydrogenase and possible roles of the residues conserved in short-chain dehydrogenases. Structure 2:629–640
Grimaldi DA (1990) A phylogenetic, revised classification of genera in the Drosophilidae (Diptera). Bull Am Mus Nat Hist 197:1–139
Heinstra PWH (1993) Evolutionary genetics of the Drosophila alcohol dehydrogenase gene-enzyme systems. Genetica 92:1–22
Higgins DG, Thompson JD, Gibson TJ (1996) Using CLUSTAL for multiple sequence alignments. In: Doolittle RF (ed) Computer methods for macromolecular sequence analysis. Methods in Enzymology 266. Academic Press, New York, p 383
Horio T, Kubo T, Natori S (1996) Purification and cDNA cloning of the alcohol dehydrogenase of the flesh fly Sarcophaga peregrina: a structural relationship between alcohol dehydrogenase and a 25-kDa protein. Eur J Biochem 237:698–703
Jörnvall H, Persson B, Jeffery J (1981) Alcohol and polyol dehydrogenases are both divided into two protein types, and structural properties cross-relate the different enzyme activities within each type. Proc Natl Acad Sci USA 78:4226–4230
Jörnvall H, Persson B, Krook M, Atrian S, González-Duarte R, Jeffery J, Ghosh D (1995) Short-chain dehydrogenases/reductases (SDR). Biochemistry 34:6003–6013
Juan E, González-Duarte R (1980) Determination of some biochemical and structural features of alcohol dehydrogenases from Drosophila simulans and Drosophila virilis. Biochem J 189:105–110
Krook M, Prozorovski V, Atrian S, González-Duarte R, Jörnvall H (1992) Short-chain dehydrogenases. Proteolysis and chemical modification of prokaryotic 3α/20β-hydroxysteroid, insect alcohol and human 15-hydroxyprostaglandin dehydrogenases. Eur J Biochem 209:233–239
Kwiatowski J, Skarecky D, Bailey K, Ayala FJ (1994) Phylogeny of Drosophila and related genera inferred from the nucleotide sequence of the Cu,Zn Sod gene. J Mol Evol 38:443–454
Lichtarge O, Bourne HR, Cohen FE (1996) An evolutionary trace method defines binding surfaces common to protein families. J Mol Biol 257:342–358
Lindsley DL, Zimm GG (1992) Adh: alcohol dehydrogenase. In: The genome of Drosophila melanogaster. Academic Press, New York, p 16
Livingstone CD, Barton GJ (1993) Protein sequence alignments: a strategy for the hierarchical analysis of residue conservation. CABIOS 9:745–756
Matsumoto N, Sekimizu K, Soma G, Ohmura Y, Andoh T, Nakanishi Y, Obinata M, Natori S (1985) Structural analysis of a developmentally regulated 25 kDa protein gene of Sarcophaga peregrina. J Biochem 97:1501–1508
Merçot H, Defaye D, Capy P, Pla E, David JR (1994) Alcohol tolerance, ADH activity, and ecological niche of Drosophila species. Evolution 48:746–757
Nurminsky DI, Moriyama EN, Lozovskaya ER, Hartl DL (1996) Molecular phylogeny and genome evolution in the Drosophila virilis species group: duplications of the alcohol dehydrogenase gene. Mol Biol Evol 13:132–149
Pazos F, Sanchez-Pulido L, García-Ranea JA, Andrade MA, Atrian S, Valencia A (1997) Comparative analysis of different methods for the detection of specificity regions in protein families. In: Lundh D, Olsson B, Narayanan A (eds) Bio-computing and emergent computation. World Scientific, Singapore London Hong Kong, p 133
Persson B, Krook M, Jörnvall H (1991) Characteristics of short-chain alcohol dehydrogenases and related enzymes. Eur J Biochem 200:537–543
Prozorovski V, Krook M, Atrian S, González-Duarte R, Jörnvall H (1992) Identification of reactive tyrosine residues in cysteine-reactive dehydrogenases. Differences between liver sorbitol, liver alcohol and Drosophila alcohol dehydrogenases. FEBS Lett 304:46–50
Rat L, Veuille M, Lepesant JA (1991) Drosophila fat body protein P6 and alcohol dehydrogenase are derived from a common ancestral protein. J Mol Evol 33:194–203
Ribas de Pouplana L, Fothergill-Gilmore L (1994) The active site architecture of a short-chain dehydrogenase defined by site-directed mutagenesis and structure modeling. Biochemistry 33:7047–7055
Russo CAM, Takezaki N, Nei M (1995) Molecular phylogeny and divergence times of drosophilid species. Mol Biol Evol 12:391–404
Saitou N, Nei M (1987) The neighbor-joining method: a new method for reconstructing phylogenetic trees. Mol Biol Evol 4:406–425
Throckmorton LH (1975) The phylogeny, ecology and geography of Drosophila. In: King RC (ed) Handbook of genetics, vol 3. Plenum Press, New York, p 421
Vriend G (1990) WHAT IF: a molecular modeling and drug design program. J Mol Graph 8:52–56
Wheeler MR (1981) The Drosophilidae: a taxonomic overview. In: Ashburner M, Carson HL, Thompson JN (eds) The genetics and biology of Drosophila, vol 3a. Academic Press, New York, p 1
Wheeler MR (1986) Additions to the catalog of the world's Drosophilidae. In: Ashburner M, Carson HL, Thompson JN (eds) The genetics and biology of Drosophila, vol 3e. Academic Press, New York, p 395