CABIOS COMMUNICATION
Vol. 12 no. 3 1996 Pages 237-240
The Directory of P450-containing systems on Worldwide Web
Kirill N.Degtyarenko1* and Peter Fabian2
International Centre for Genetic Engineering and Biotechnology, Area Science Park, 34012 Trieste, Italy
1 Present address: Department of Biochemistry and Molecular Biology, University of Leeds, Leeds LS2 9JT, UK. E-mail: [email protected]
2 Present address: Institute for Biochemistry, Agricultural Biotechnology Center, Gödöllő, P.O. Box 411, 2100, Hungary. E-mail: [email protected]
* To whom correspondence should be addressed. E-mail: [email protected]
© Oxford University Press

Abstract
To facilitate access to electronic resources for all researchers working in the field of P450 proteins and P450-containing systems, a Worldwide Web server called The Directory of P450-containing Systems has been established. Currently it contains the most up-to-date list of sequences of both the P450 superfamily and proteins mediating electron transfer to P450, i.e. NADPH:P450 reductases, specific NAD(P)H:ferredoxin reductases, cytochrome b5 reductases, ferredoxins and cytochromes b5, and their homologues from different enzyme systems. All the referenced sequences are provided with accession numbers and cross-links to major sequence databanks: PIR, SWISS-PROT, EMBL/GenBank and PRF.

Introduction
P450 enzymes constitute a superfamily of haem-thiolate proteins, widely distributed in bacteria, fungi, plants and animals. The enzymes are involved in the metabolism of a plethora of both exogenous and endogenous compounds. Usually, they act as terminal oxidases in multicomponent electron transfer chains, termed P450-containing monooxygenase systems (Degtyarenko, 1995). All known P450-containing systems share a common structural and functional domain architecture. Apart from P450 itself, these systems can comprise several fundamentally different protein components and domains, all of which can be shared by other multicomponent/multidomain enzyme systems with various functions: FAD flavoprotein or domain, FMN domain, 2Fe-2S ferredoxin, 3Fe-4S ferredoxin, cytochrome b5. An FMN (flavodoxin-like) domain, a ferredoxin, or cytochrome b5 serves as the electron transport intermediate between the FAD domain and P450. In turn, on the basis of sequence similarity, FAD flavoproteins can be categorised into three major families:
(i) ferredoxin:NADP+ reductase (FNR) family: NADPH:P450 reductase, cytochrome b5 reductase;
(ii) glutathione reductase (GR) family: putidaredoxin reductase, terpredoxin reductase;
(iii) NADPH:adrenodoxin reductase, which shares local similarity with glutamate synthases and NADH peroxidase.
The rapid growth of sequence information is making the gathering together of a comprehensive and up-to-date data set for extensive protein families increasingly laborious. For example, at least 450 different P450 sequences have been published to date (Nelson, 1995), which represents a doubling of the collection since the 1993 nomenclature update (Nelson et al., 1993). In an effort to facilitate access to electronic resources for all researchers working in the field of P450 proteins and P450-containing systems, we have created The Directory of P450-containing Systems (DPS), a Worldwide Web server available at < http://www.icgeb.trieste.it/p450/ >.

DPS contents and structure
Currently DPS presents the following data:

1. 'Core' (lists of accession numbers for components of P450-containing systems)
• Full list of P450 superfamily
• Animal P450s
• Fungal P450s
• Plant P450s
• Bacterial P450s
• NADPH:P450 reductases (CPR)
• Fe-S protein- and cytochrome b5 reductases
• Fe-S proteins
• Cytochromes b5
• Cytochrome b5-like domain containing proteins
• Ferredoxin:NADP+ reductases and FNR-like domain containing proteins
• GR-like oxidoreductases
• Flavodoxins
• Genes with known exon/intron structure
K.N.Degtyarenko and P.Fabian
Table I. Contents of DPS 'core'

Group of proteins                                           No. of genes   No. of species
P450 superfamily, including                                          434               90
    Animal P450s                                                     323               37
    Fungal P450s                                                      35               18
    Plant P450s                                                       44               17
    Bacterial P450s                                                   32               21
CPR family (flavodoxin/FNR fusion proteins), including                39               28
    NADPH:P450 reductases                                             22               20
    Nitric oxide synthases                                            11                6
    NADPH:sulphite reductases                                          5                5
FNR superfamily (except for CPR family), including                    99               69
    NADH:cytochrome b5 reductases                                      5                4
GR superfamily, including                                            138               83
    P450-specific Fe-S reductases                                      4                4
    Adrenodoxin reductases                                             3                3
Iron-sulphur proteins                                                 20               17
Flavodoxins                                                           26               22
Cytochromes b5                                                        18               15
Proteins containing cytochrome b5-like domain                         45               35
Total                                                                781              226
2. Supplementary information:
• Families of structural domains of P450-containing systems
• PRINTS entries for structural domains of P450-containing systems
• List of known three-dimensional structures
• OMIM entries related to P450-containing systems
• Bibliography on P450-containing systems

The 'core' of DPS is systematic information on protein and nucleic acid sequences of the P450 superfamily as well as other components of P450-containing systems and their homologues. Strictly speaking, DPS is more a 'virtual' than a 'real' tool (Harper, 1995), i.e. rather than containing sequence information itself, it is composed mainly of direct links to 'primary' sequence databanks: PIR (George et al., 1994), SWISS-PROT (Bairoch and Boeckmann, 1994), EMBL (Emmert et al., 1994) and PRF/SEQDB (Protein Research Foundation, Osaka). In addition, DPS has cross-references to other databases such as PROSITE (Bairoch and Bucher, 1994), PRINTS (Attwood et al., 1994), ProDom (Sonnhammer and Kahn, 1994), ProtFam at MIPS (George et al., 1994), Entrez Taxonomy Database at NCBI (Benson et al., 1994), ENZYME (Bairoch, 1994), LIGAND (Suyama et al., 1993) and OMIM (Pearson et al., 1994).

Table 3 from Nelson et al. (1993), containing accession numbers for known P450 sequences in the PIR, SWISS-PROT and EMBL/GenBank databases, was used as a prototype of the page layout in DPS (Fig. 1). This table has been supplemented by the accession numbers in the PRF/SEQDB database. All the accession numbers are linked to the corresponding databases: 'clicking' the code retrieves the sequence. This allows the user to get the most up-to-date versions of sequences as soon as they appear in the major databases. The table of P450 sequences is being continuously updated in accordance with the P450 nomenclature recommendations. Analogous tables have been compiled for other structural components of P450-containing systems. As of November 28, 1995, DPS contains reference data on 781 genes from 226 different species (Table I).

Problems with data redundancy are reduced without loss of information merely through the siting of the entries in the table. For example, EMBL contains six entries corresponding to the human CYP1A1 gene; all of them are
Table II. PRINTS entries for structural domains of P450-containing systems

PRINTS entry    Number of elements   Description                                                                True positives
P450                                 P450 superfamily                                                                      354
BP450                                B-class P450                                                                           23
MITP450                              Mitochondrial P450                                                                     36
CYTOCHROMEB5                         Cytochrome b5 superfamily                                                              54
ADRENODOXIN                          Adrenodoxin family                                                                     15
3FE4SFRDOXIN                         3Fe-4S ferredoxins                                                                     15
FLAVODOXIN                           Flavodoxin superfamily                                                                 66
FPNCR                                Flavoprotein pyridine nucleotide cytochrome reductases (FNR superfamily)               81
CYTB5RDTASE                          Cytochrome b5 reductases                                                               47
FADPNR                               FAD-dependent pyridine nucleotide reductases (GR superfamily)                          79
ADXRDTASE                            Adrenodoxin reductases                                                                 15
[Figure 1: screenshot of the DPS page 'NAD(P)H:Fe-S reductases', with sections A. NADPH:adrenodoxin reductases (EC 1.18.1.2), B. (P450-specific) ferredoxin:NADH reductases (EC 1.18.1.-) and C. NADH:cytochrome b5 reductases (EC 1.6.2.2); each section is a table with the columns Gene, Species, PIR, SwissProt, GenBank/EMBL/DDBJ and PRF.]

Fig. 1. A typical page from DPS (boxed). Underlined text indicates the hypertext links to a number of databases (listed on the top and along the left margin).
positioned in the CYP1A1 hum field and therefore the user will not treat them as different genes in spite of their distinct 'trivial' names [P-450 6, P(l)-450, P-1-450, P450c, P450IA1]. In the EMBL/GenBank database, multi-exon genes are sometimes presented as a series of entries. For example, entries M30795 through M30804 represent the human CYP19 gene, where each entry corresponds to one exon; six entries (M58932-M58937) span the rat CPR gene, which consists of 16 exons. A special page in DPS is dedicated to genes with a known exon/intron structure.

Search in DPS
To provide simple and fast textual searching in the whole P450 directory, glimpse version 3.0 (Manber and Wu, 1994) was installed.

Tools for updating DPS
To shorten the time-consuming process of searching for novel related sequences, we have created a small program in Perl (Wall and Schwartz, 1991). The program works in the following way:
• It connects to selected Worldwide Web servers (currently to expasy.hcuge.ch, www.genome.ad.jp and www.ebi.ac.uk). These servers offer the possibility of searching in the SWISS-PROT, PRF and EMBNew databanks, respectively
• The program searches the above databanks using 30 pre-specified keywords
• It then retrieves the results of each search and extracts the accession number for each sequence found
• After comparing the accession numbers with those already included in DPS, the program extracts the missing ones and inserts them into a hypertext file
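The steps listed above can be illustrated with a minimal sketch. It is written in Python rather than Perl and is not the authors' program: the accession-number patterns, the function names and the stand-in `demo_search` function are all assumptions made for illustration, and the actual query step against the three servers is deliberately left as a caller-supplied function because the query interfaces are not described here.

```python
import re
from html import escape
from typing import Callable, Dict, Iterable, List

# Assumed accession-number formats, one per databank (illustrative only;
# they are modelled on codes such as P08165, 1309175A and M17029).
ACCESSION_PATTERNS: Dict[str, str] = {
    "SWISS-PROT": r"\b[OPQ]\d{5}\b",
    "PRF": r"\b\d{7}[A-Z]\b",
    "EMBNew": r"\b[A-Z]\d{5}\b",
}


def extract_accessions(databank: str, text: str) -> List[str]:
    """Return accession numbers found in text, in order, without duplicates."""
    found = re.findall(ACCESSION_PATTERNS[databank], text)
    return list(dict.fromkeys(found))


def find_missing(
    search: Callable[[str, str], str],
    keywords: Iterable[str],
    known: Iterable[str],
) -> Dict[str, List[str]]:
    """Search each databank with each keyword; keep accessions not yet known."""
    keywords = list(keywords)
    missing: Dict[str, List[str]] = {}
    for databank in ACCESSION_PATTERNS:
        seen = set(known)  # accessions already listed in the directory
        new: List[str] = []
        for keyword in keywords:
            result_text = search(databank, keyword)  # hypothetical query step
            for accession in extract_accessions(databank, result_text):
                if accession not in seen:
                    seen.add(accession)
                    new.append(accession)
        missing[databank] = new
    return missing


def render_html(missing: Dict[str, List[str]]) -> str:
    """Write the missing accession numbers into a small hypertext page."""
    lines = ["<html><body>"]
    for databank, accessions in missing.items():
        lines.append(f"<h2>{escape(databank)}</h2>")
        lines.append("<ul>")
        lines.extend(f"<li>{escape(acc)}</li>" for acc in accessions)
        lines.append("</ul>")
    lines.append("</body></html>")
    return "\n".join(lines)


def demo_search(databank: str, keyword: str) -> str:
    """Stand-in for a real server query; returns canned result text."""
    pages = {
        ("SWISS-PROT", "P450"): "P08165 adrenodoxin reductase\nP22570 adrenodoxin reductase",
        ("SWISS-PROT", "cytochrome b5"): "P22570 again P00387",
        ("PRF", "P450"): "1309175A",
    }
    return pages.get((databank, keyword), "")
```

Calling `find_missing(demo_search, ["P450", "cytochrome b5"], {"P08165"})` and passing the result to `render_html` yields a page listing only the accession numbers that are not yet in the directory. Keeping the search step as a parameter means the comparison and the hypertext output can be exercised without any network access.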
Finally, the new sequences are classified manually and inserted into DPS. The above procedure can easily be employed for the semi-automatic update of any other sequence library.

Fingerprinting the structural domains of P450-containing systems
Structural domains can be characterized in terms of a number of separate sequence motifs. The groups of motifs constituting the signature for a particular (super)family of sequences (also referred to as 'fingerprints') provide a tool for the effective search for distantly homologous sequences (Attwood et al., 1994). Fingerprints were constructed for all the components/domains of P450 systems; they can be used to identify new members of these protein superfamilies even if the overall amino acid sequence identities are at or below the limit of significance (Table II). The fingerprint entries are integrated in the PRINTS database and are available both via the WWW server at < http://www.biochem.ucl.ac.uk/bcrn/dbbrowser/ > and via anonymous ftp (Attwood et al., 1994). Extra fingerprints will be added in forthcoming releases of PRINTS to allow further discrimination between protein families, e.g. P450 families.

Acknowledgements
The authors thank Dr T.K.Attwood for her inestimable help in the creation of the fingerprints. This project was launched during the authors' fellowship at the Protein Structure and Function Laboratory of the International Centre for Genetic Engineering and Biotechnology, Trieste, Italy.
References
Attwood,T.K., Beck,M.E., Bleasby,A.J. and Parry-Smith,D.J. (1994) PRINTS - a database of protein motif fingerprints. Nucleic Acids Res., 22, 3590-3596.
Bairoch,A. (1994) The ENZYME data bank. Nucleic Acids Res., 22, 3626-3627.
Bairoch,A. and Boeckmann,B. (1994) The SWISS-PROT protein sequence data bank: current status. Nucleic Acids Res., 22, 3579-3580.
Bairoch,A. and Bucher,P. (1994) PROSITE: recent developments. Nucleic Acids Res., 22, 3583-3589.
Benson,D.A., Boguski,M., Lipman,D.J. and Ostell,J. (1994) GenBank. Nucleic Acids Res., 22, 3441-3444.
Degtyarenko,K.N. (1995) Structural domains of P450-containing monooxygenase systems. Protein Engineering, 8, 737-747.
Emmert,D.B., Stoehr,P.J., Stoesser,G. and Cameron,G. (1994) The European Bioinformatics Institute (EBI) databases. Nucleic Acids Res., 22, 3445-3449.
George,D.G., Barker,W.C., Mewes,H.-W., Pfeiffer,F. and Tsugita,A. (1994) The PIR-International Protein Sequence Database. Nucleic Acids Res., 22, 3569-3573.
Harper,R. (1995) World Wide Web resources for the biologist. Trends Genet., 11, 223-228.
Nelson,D.R., Kamataki,T., Waxman,D.J., Guengerich,F.P., Estabrook,R.W., Feyereisen,R., Gonzalez,F.J., Coon,M.J., Gunsalus,I.C., Gotoh,O., Okuda,K. and Nebert,D.W. (1993) The P450 superfamily: update on new sequences, gene mapping, accession numbers, early trivial names of enzymes, and nomenclature. DNA Cell Biol., 12, 1-51.
Nelson,D.R. (1995) 450 cytochrome P450s. In Proceedings of the Third International Symposium on Cytochrome P450 Biodiversity, Woods Hole, MA, p. 14.
Manber,U. and Wu,S. (1994) GLIMPSE: A tool to search through entire file systems. In Proceedings of the USENIX Winter Technical Conference, San Francisco, pp. 23-32. Glimpse is available by anonymous ftp at ftp.cs.arizona.edu.
Pearson,P., Francomano,C., Foster,P., Bocchini,C., Li,P. and McKusick,V. (1994) The status of online Mendelian inheritance in man (OMIM) medio 1994. Nucleic Acids Res., 22, 3470-3473.
Sonnhammer,E.L.L. and Kahn,D. (1994) The modular arrangement of proteins as inferred from analysis of homology. Protein Science, 3, 482-492.
Suyama,M., Ogiwara,A., Nishioka,T. and Oda,J. (1993) Searching for amino acid sequence motifs among enzymes: the enzyme-reaction database. Comput. Applic. Biosci., 9, 9-15.
Wall,L. and Schwartz,R. (1991) Programming Perl. O'Reilly and Associates, Inc.

Received on January 2, 1996; accepted on March 8, 1996