Prediction of N-terminal protein sorting signals ... - Semantic Scholar

3 downloads 38731 Views 528KB Size Report
MacProt for Apple Macintosh computers (P Markiewicz, unpublished), SIGNAL [22] for UNIX, ... [25]. PLOT.A/SlG ftp://ftp/ebi.ac.uk/pub/software/mac. (a). SIGNAL.
394

Prediction of N-terminal protein sorting signals Manuel G Claros*, Soren Brunak and Gunnar von Heijne Recently, neural networks have been applied to a widening range of problems in molecular biology. An area particularly suited to neural-network methods is the identification of protein sorting signals and the prediction of their cleavage sites, as these functional units are encoded by local, linear sequences of amino acids rather than global 3D structures.

Addresses *Laboratorio de Bioqumica y Biologa Molecular, Facultad de Ciencias, Universidad de M&laga, E-29071 M&laga, Spain tDepartment of Chemistry, Centre for Biological Sequence Analysis, Technical University of Denmark, DK-2800 Lyngby, Denmark ~Department of Biochemistry, Arrhenius Laboratory, Stockholm University, S-106 91 Stockholm, Sweden Current Opinion in Structural Biology 1997, 7:394-398 http://biomednet.com/elecref/0959440X00700394 © Current Biology Ltd ISSN 0959-440X Abbreviations c-region C-terminal region cTP chloroplasttransit peptide h-region hydrophobic core mTP mitochondrialtargeting peptide n-region N-terminal region SP signal peptide

Introduction Many proteins that are synthesized in the cytoplasm are ultimately found in noncytoplasmic locations. This requires an efficient and specific addressing system. The 'address labels' arc encoded by a specific sequence or structure in the protein [1"], and various methods to detect different classes of targeting signals in protein sequences have been devised over the years. In this review, we will briefly discuss methods designed to identify signals that target proteins to the secretory pathway (signal peptides) and to mitochondria and chloroplasts. Our emphasis will be on the most recent work in this area, which is based mostly on the use of neural networks. Signal peptide characteristics Proteins destined for the cndoplasmic reticulum, the Golgi compartment, the lysosome, the plasma membrane, and the exterior of the cell are initially transported across the endoplasmic reticulum membrane [2]. These proteins contain an N-terminal signal peptide (SP) of about 15-25 residues. SPs consist of a short, positively charged N-terminal region (n-region), a 7-15 residue hvdrophobic core (h-region), and a more polar 3-7 residue C-terminal region (c-region) leading up to the signal peptidasc cleavage site between the SP and the mature protein [3]. Although minor differences in composition and length exist, the characteristics of eukaryotic SPs are essentially shared by the SPs of prokaryotic and archaeal proteins

destined to the periplasm [4°] (H Nielsen, S Brunak, yon Heijne, unpublished data). Prokaryotic lipoproteins are cleaved by a distinct signal peptidase and their SPs have a different cleavage-site pattern [3]. Mitochondrial targeting peptide characteristics Most mitochondrial targeting peptides (mTPs) consist of an N-terminal presequence of about 20-80 residues, although several mitochondrial proteins are synthesized without a cleavable extension [5]. mTPs arc enriched in arginine, leucine, serine and alanine, contain none or a few acidic residues and are believed to form amphiphilic or-helices [6,7], A loosely conserved cleavage-site motif has been proposed [8]. Proteins located in the inner mitochondrial membrane or the intermembrane space are often targeted by means of a bipartite mTP [9]. Although not related specifically to the mTPs, it may be noted that mitochondrially targeted proteins have a high isoelectric point [10] and do not present physical constraints such as highly hydrophobic transmembrane segments that obstruct their import [11*1. Chloroplast transit peptide characteristics Chloroplast transit peptides (cTPs) are mainly enriched in serine and other hydroxylated residues. Because of a rather low content of acidic residues, their net charge is often positive. Their length varies from 20 to more than 100 residues, cTPs seem to be organized in three domains [7]: an N-terminal part, which is around ten amino acids long, uncharged, poor in glycine and proline, and which has an alanine following the initial methionine in most cases; a central part, which is enriched in serine and contains few, if any, acidic residues; and a C-terminal part, which is enriched in arginine, and which may form an amphiphilic ]3 strand, cTPs that target proteins to the thylakoid are bipartite [12]. Two kinds of cleavage-site consensus motifs have been proposed: one for thylakoidal proteins, which is similar to the signal peptidase motif found in SPs [13], and another, less conserved motif for stromal proteins [14]. Neural networks Neural networks consist of a number of simple, nonlinear computational units that operate in parallel. The technique was originally developed with the purpose of modeling brain function, and neurons and synapses in biological neural networks, but it has so far had its greatest impact in practical pattern-recognition applications [15,16].

In the most popular network architecture, the units are organized into layers: an input layer, which receives signals from external sources, such as the numerical encoding of sequence data; one or more hidden layers; and an output layer, which sends analog signals to the environment for later interpretation and assignment of

Prediction of N-terminal protein sorting signals Claros, Brunak and yon Heijne

395

functional categories. When only input and output layers are present, the architecture is often called a 'Perceptron', and it has the computational ability to resolve only linear classification tasks.

that takes place in the cellular environment. Below, we discuss the 'conventional' and neural-network methods that have been developed for predicting protein targeting signals (Table 1).

A more powerful pattern-recognition device results when hidden layers are present. These layers are termed hidden because the activation of their units is not directly observable from the outside; they receive signals from the input units and relay weighted signals to the output units. Hidden units give the network the capability of handling nonlinear classification tasks. T h e optimal numbers of hidden layers and units are normally determined empirically, although tools originating from a more general Bayesian framework can be used for estimating a reasonable number of weight parameters [15-17].

Prediction of signal peptides Statistical methods For a full characterization, one needs to predict both the existence of a particular sorting signal in a protein sequence and the cleavage site between the signal and the mature protein. A procedure to do the former was proposed early on by McGcoch [19], who built a discriminant function based on the net charge and length of the n-region, and the length and hydrophobicity of the h-region. Weight matrices were used early on to identify signal peptidase cleavage sites [20], with 75-80% predictions of cleavage sites in SPs outside the training set reported to be correct. These matrices have been widely used in several software applications such as SIGSEQ [21] for UNIX and MS-DOS operating systems, the PLOT.A/SIG program included in MacProt for Apple Macintosh computers (P Markiewicz, unpublished), SIGNAL [22] for UNIX, and Signalase (N Matei, unpublished) for Apple Macintosh computers. T h e most complete software that uses the original SP weight matrices is SIGCLEAVE in the extended pack of programs for GCG (EGCG) for VMS and UNIX machines [23]. SIGCLEAVE has been modificd following the recommendations in [24], although the reported success rate is similar to the original method.

One of the main advantages of neural networks is that a preconceived model is not required. The weights in a network (i.e. the strengths of the connections between units in adjacent layers) are determined automatically by so called 'learning procedures' that use sequence data to adjust the parameter values. Information from both positive and negative cases is taken into account, such that residues and positions that are important will be weighted accordingly: Thus, no a priori relationships exist between the chosen parameters, leading to essentially unbiased, data-driven results [17]. Importantly, the trained network can detect nonlinear correlations between the residues in the input window and exploit them for improving the quality of the predictions. As expected, larger, nonhomologous, data sets increase the accuracy of neural-network methods [18]. A special benefit is the possibility of interpreting the weights resulting from training in a biochemically meaningful way, possibly leading to new insights into the recognition

Whereas most of the early methods focused only on cleavage-site predictions, Folz and Gordon [25] have combined two different algorithms to detect SPs: one algorithm (SIGSEQ1) is based on a series of empirically derived rules to generate a probability score for each position, and the second algorithm (SIGSEQ2) applies the statistical weight matrix approach. By combining the two

Table 1 Software for predicting N-terminal sorting signals. Sorting signal

Programs

Internet addresses

Signal peptides

SlGSED SlGSEQ1, SlGSEQ2 PLOT.A/SlG SIGNAL Signalase SlGCLEAVE

ftp://ftp.ebi.ac.uk/pub/software/ [email protected]*

PROFI PROSA, PROMIS SIGNALP Mitochondrial targeting peptides

Subcellular Iocalizations

MitoProt II

ftp://ftp/ebi.ac.uk/pub/software/mac ftp://ftp/ebi.ac.uk/publsoftwarelunix/ ftp://ftp/ebi.ac.uk/publsoftware/mac/ ftp://ftp/ebi.ac.uk/pub/software/ [email protected] [email protected] http://www.cbs.dtu.dk/servers/SignalP [email protected] (e-mail server)

PROSA, PROMIS

ftp://ftp.ebi.ac.uldpub/software/ mitoprot@biologie,ens.fr (e-mail server) [email protected]

PSORT

http://psort.nibb.ac.jp/

Reference [21 ] [25] (a) (b) (c) [23] [29] [32] [41 ] [36"] [32] [40]

*This published e-mail address does not exist. (a) P Marckiewicz, personal communication. (b) R Colgrove, personal communication. (c) N Matei, personal communication.

396

Sequencesand topology

methods, 80% of the SP cleavage sites in a collection of eukaryotic proteins was correctly predicted.

Neural network approaches Neural networks were first applied to SP predictions in combination with weight matrices [26]. A multilayer architecture that self-adjusts to the data was trained to detect 20 N-terminal residues belonging to SPs. 82% and 74% of the SPs were correctly identified in the training and test set, respectively. When the network was applied to sequences first selected with the weight matrices, the success rate increased to 95%. This network failed to outperform the discrimination ability of the weight matrix method, even though a larger training set was used. An unsupervised neural network based on a self-organizing Kohonen feature map has unexpectedly been found to be able to extract SP motifs from insulin cDNA sequences [27]. How well this method performs on more general data sets is unclear. Using a training set of only 24 proteins, a neural network, trained by a genetic algorithm and representing each amino acid by four physicochemical properties, has been able to correctly predict all SPs in the training set [28]. It uses an asymmetric window o f - 1 0 to +2 residues with a Perceptron-type neural network, This method, PROFI, has been improved by using a three-layer network and by dividing the same 24 proteins into training and test sets: the optimal input layer employs 4-7 physicochemical amino acid properties for the sequence description, one hidden layer for the feature extraction, and a single output layer for classification [29]. The success rate for predicting the cleavage sites is 99% for the training set and 91% for the test set. The network has been further used to design amino acid sequences, by using simulated molecular evolution, that should have 'ideal' E. coli leader peptidase cleavage sites [30]. Two SP cleavage sites designed in this way have been studied in vivo, and the experimentally observed processing of the protein confirms the predictions made by the neural network [31]. A similar strategy of using a genetic neural network, but using a larger set of proteins (65 from eubacteria and 95 from eukaryotes), has enabled the generalization of E. coli SP characteristics to other organisms. Both the characteristics of the SP and the cleavage site have been studied [32]. The statistical program PROSA is used to extract rules that describe SPs and is able to predict 80% of the cleavage sites in both training and test sets. The neural network PROMIS has been used to predict SPs and their cleavage sites. PROMIS can predict 80% of the SP cleavage sites in the training and test sets. The most efficient method currently available to identify SPs and their cleavage sites is SIGNALP [33"°], which uses neural networks trained on very large sets of nonhomologous prokaryotic (266 sequences from Gram-neg-

ative and 141 from Gram-positive bacteria) and eukaryotic (1011 sequences) SPs [34°]. SIGNALP combines two networks, one to recognize the cleavage site, and another to distinguish between SPs and non-SPs. The best results have been obtained with networks that have zero or one hidden layer with up to four hidden units. SIGNALP uses asymmetric windows (including more positions upstream than downstream of the cleavage site) for predicting the SP, and a symmetric or nearly symmetric window to predict the cleavage site. Cleavage-site prediction on the test sets correctly identified 78% of the eukaryotic and 89% of the prokaryotic signal peptides. This is lower than the accuracy reported for the original weight matrix method (see above), but when new weight matrices were constructed based on the same training set as used for SIGNALP, the performance was much worse than in the original publication. This suggests that signal peptides are somewhat more variable than was apparent from the early, rather small collections, and that neural networks can extract useful information beyond what can be captured with weight matrices. It should be noted that the prediction performances reported for SIGNALP correspond to minimal values due to the low similarity between the sequences in the test set. Thus, the prediction accuracy will generally be higher when the method is applied to an unbiased sample of new sequences, as these will often be similar to sequences used in the training set. Interestingly, the difficulty of SP prediction has been found to increase in the order Gram-negative bacteria < eukaryotes < Gram-positive bacteria. This neural network is also, to some limited extent, able to discriminate among SPs and uncleaved signal anchor sequences.

Prediction of mitochondrial and chloroplast sorting signals Statistical methods The program MitoProt for Apple Macintosh computers [35"] represents an early attempt to help identifying mTPs. It calculates a number of characteristics thought to describe mTPs but leaves the final decision to the user. In the search for a more objective way to identify mTPs, discriminant analysis turns out to be very useful [36°°]. An analysis based on 47 physicochemical parameters and a large set of mitochondrial and nonmitochondrial proteins has reached a success rate of 80% for the training set and 75% for the test set when predicting the existence of an mTR Man'/ of the mitochondrial proteins not detected by the method have been found to possess internal, noncleavable mTPs or to be localized in the outer mitochondrial membrane. Among matrix and inner membrane proteins, only 7% of the mTPs were missed. The method has been implemented in the program MitoProt II for Macintosh and UNIX machines. A discriminant function to separate both mitochondrial and chloroplast proteins from non-organellar proteins has also been included in the last release of MitoProt II

Prediction of N-terminal protein sorting signals Claros, Brunak and von Heijne

(MG Claros, P Vincens, unpublished). T h e crossvalidated rate predicting m T P s and cTPs (77% for both the training and test sets) is slightly lower than for only mitochondrial proteins. T h e discrimination between m T P s and cTPs is performed with the equation described in [7], based on the observation that the frequencies of arginine and serine are very different in the two kinds of signals.

success

Thylakoid lumen proteins have bipartite presequences that are cleaved off by a prokaryotic-like signal peptidase, and a weight matrix similar to that used for cleavage sites in bacterial SPs has been derived for such proteins [13]. T h e prediction accuracy is not known in this case, but it is probably higher than for Gram-negative SPs as the substrate specificity of the thylakoidal protease is more restricted than for the E. coli signal peptidase [37].

Neural network approaches

Neural networks were first applied to a small set of proteins to infer m T P s and cTPs by Schneider eta]. [28], who reported a success rate of 75% for the test set. Cleavage-site predictions based on the PROSA and PROMIS programs, however, have so far not been very successful [32]. T h e main problem in both cases has been overtraining and an inability to extract significant properties describing transit peptides. Using Neurospora crassa mitochondrial precursors, scanned with asymmetric windows and utilizing 13 Perceptron-type networks [38], 81% of the m T P cleavage sites have been correctly predicted. Recently, self-organizing Kohonen networks have been trained on mTPs (with each residue represented by its size and hydrophobicity values) from a range of different organisms and have been found to be able to cluster these sequences according to three previously known cleavage-site motifs (G Schneider et al., unpublished data).

Prediction of subcellular expert systems

localization

by

Expert systems are programs that use domain-specific knowledge and heuristics to solve problems in a narrow and realistic problem area. T h e difference between an expert system and the algorithms described above is that the expert system applies a large number of different sorting signal detection algorithms at the same time, utilizing a collection of 'if-then' rules as a knowledge base to interpret their respective outputs. In the first application of expert systems to the protein sorting problem, four different locations (cytoplasm, inner membrane, periplasm, and outer membrane) in Gramnegative bacteria have been considered [39]. The basic if-then rules are intended to simulate the actual sorting pathways, and the system uses the SP-detection methods

397

described above as well as empirical rules relating to the amino acid composition in the mature part of the protein. It has been able to correctly distribute 83% of 106 nonhomologous prokaryotic proteins among the four localization sites. A similar expert system (PSORT) has been developed for predicting protein locations in eukaryotic cells [40]. 401 proteins from 14 different subcellular sites (cytoplasm, nucleus, four mitochondrial sites, peroxisome, two endoplasmic reticulum sites, Golgi, two lysosome sites, vacuole, two plasma membrane sites, and the extracellular space; and for plant cells, three more sites corresponding to chloroplast proteins) have been collected. Many of the rules are based on discriminant functions calculated on the overall amino acid composition of the protein. The organization of the if-then rules follows a reasoning that tries to emulate the real pathway of sorting in vivo. Around 65% of all proteins can be localized to the correct compartment by the method, whereas a random guess would result in less than 10% accuracy. 66% of the mitochondrial proteins and 86% of the chloroplast proteins have been correctly localized. For proteins targeted by SPs to the secretory, pathway, the accuracy is similar to that obtained using the non-neural network algorithms discussed above. Conclusions

In recent years, workers trying to predict protein sorting signals have turned increasingly to the use of neural networks. By and large, this approach appears to give slight but significant improvements over previous methods, although the larger size and higher quality of the more recent training and test sets may be equally important. SPs, mTPs, and cTPs can now be quite reliably identified, but cleavage sites can so far only be predicted with an acceptable degree of confidence for SPs. Much of the development in the coming years will probably be an adaptation of the methods to the analysis of data from genome-sequencing projects. A particularly interesting (and difficult) problem in this context is the identification of sorting signals in incomplete sequences (such as ESTs). In addition, slight improvements will probably be obtained by developing species-specific prediction methods.

References and recommended reading Papers of particular interest, publishedwithin the annual period of review, have been highlightedas: • ••

of special interest of outstanding interest

Zheng N, Gierasch LM: Signal sequences: the same yet differenL Cell 1996, 86:849-852. Different targeting signalsare compared, inferringevolutionary and functional relationships.Special emphasis is put on the fact that most of them fold into helical structures. 1. •

398

Sequences and topology

2.

Rapoport TA, Jungnickel B, Kutay U: Protein transport across the eukaryotic endoplasmic reticulum and bacterial inner membranes. Annu Ray Biochem 1996, 65:271-303.

25.

Folz PJ, Gordon Jh Computers-assisted predictions of signal peptidase processing sites. Biochem Biophys Res Comm 1987, 146:870-877.

3.

Von Heijne G: The signal peptide. J Membr Bio/1990, 115:195-201.

26.

Ladunga I, Czako F, Csabai I, Geszti T: Improving signal peptide prediction accuracy by simulated neural network. Comput App/ Biosci 1991, 7:485-487.

2"7.

Arrigo P, Giuliano F, Scalia F, Rapallo A, Damiani G: Identification of a new motif on nucleic acid sequence data using Kohonen's self-organizing map. Comput App/Biosci 1991,7:353-357.

28.

Schneider G, R6hlk S, Wrede P: Analysis of cleavage-site patterns in protein precursor sequences with a perceptrontype neural network= Biochem Biophys Res Comm 1993, 194:951-959.

Rusch SL, Kendall DA: Protein transport via amino-terminal targeting sequences: common themes in diverse systems. Mo/ Membr Bio/1995, 12:295-307. This review describes the sequence characteristics of N-terminal targeting signals from different organisms and sorting pathways. 4.



5.

Schwarz E, Neupert W: Mitochondrial protein import: mechanisms, components and energetics. Biochim Biophys Acta 1994, 1187:270-274.

6.

Von Heijne G: Mitochondrial targeting sequences may form amphiphilic helices. EMBO J 1986, 5:1335-1342.

29.

7.

Yon Heijne G, Steppuhn J, Herrmann RG: Domain structure of mitochondrial and chloroplast targeting peptides. Eur J Biochem 1989, 180:535-545.

Schneider G, Wrede P: Development of artificial neural filters for pattern recognition in protein sequences. J Mol Evo11993, 36:586-595.

30.

Schneider G, Wrede P: The rational design of amino acid sequences by artificial neural networks and simulated molecular evolution: de novo design of an idealized leader peptidase cleavage site. Biophys J 1994, 66:335-344.

8.

Gavel Y, Von Heijne G: Cleavage-site motifs in mitochondrial targeting peptides. Protein Eng 1990, 4:33-37.

9.

Stuart RA, Neupert W: Topogenesis of inner membrane proteins of mitochondria. Trends Biochem Sci 1996, 21:261-267.

31.

Schneider G, Hahn U, Fatemi A, MOiler G, Wrede P: Peptide design in machina: artificial signal peptidase I cleavage sites are processed in vivo. Minim Invas Mad 1995, 6:72-77.

10.

Jaussi R: Homologous nuclear-encoded mitochondrial and

32.

Schneider G, Wrede P: Signal analysis of protein targeting sequences. Protein Seq Data Anal 1993, 5:227-236.

cytosolic isoproteins. A review of structure, biosynthesis and genes. Eur J Biochem 1995, 228:389-403. 11. •

Claros MG, Perea J, Shu Y, Samatey FA, Popot JL, Jacq C: Limitations of the in vivo import of hydrophobic proteins into yeast mitochondria. The case of cytoplasmically synthesized apocytochrome b. Eur J Biochem 1995, 228:762-771. Using different parts o1 the cytochrome b gene, the authors demonstrate that imported mitochondrial proteins should not contain too many hydrophobic segments. A new parameter called mesohydrophobicity is developed to quantify a part of the constraints on hydrophobicity. 12.

Robinson C, KI6sgen RB: Targeting of proteins into and across the thylakoid membrane-a multitude of mechanisms. Plant Mol Biol 1994, 26:15-24.

13.

Howe CJ, Wallace TP: Prediction of leader peptide cleavage sites for polypeptides of the thylakoid lumen. Nucleic Acids Res 1990, 18:3417.

14.

Gavel Y, Von Heijne G: A conserved cleavage-site motif in chloroplast transit peptides. FEBS Lett 1990, 261:455-458.

15.

PresnellSR, Cohen FE: Artificial neural networks for pattern recognition in biochemical sequences. Annu Rev Biophys Biomol Struct 1993, 22:283-298.

16.

Bishop C: Neural Networks for Pattern Recognition. Oxford: Clarendon Press; 1995.

17.

Hirst JD: Prediction of structural, functional features of protein and nucleic acid sequences by artificial neural networks. Biochemistry 1992, 31:7211-7218.

18.

Chadonia JM, Karplus M: Importance of larger data sets for protein secondary structure prediction with neural networks. Protein Sci 1996, 5:768-774.

19.

McGeoch DJ: On the predictive recognition of signal peptide sequences. Virus Res 1985, 3:271-286.

20.

Von Heijne G: A new method for predicting signal sequence cleavage sites. Nucleic Acids Res 1986, 14:4683-4690.

21.

Popowicz AM, Dash PF: SIGSEQ: a computer program for predicting signal sequence cleavage sites. Comput Appl Biosci 1988, 4:405-406.

22.

Colgrove R: SIGNAL for UNIX [PhD Thesis]. San Francisco: University of California, San Francisco; 1990.

23.

Rice P, Lopez R, Doelz R, Leunissen J: EGCG 8.0. Embnet news 1995, 2:5-7.

24.

Von Heijne G: Sequence Analysis in Molecular Biology: Treasure Trove or Trivial Pursuit? London: Academic Press; 1987.

33. ••

Nielsen H, Engelbrecht J, Brunak S, Von Heijne G: Identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites. Protein Eng 1997, 10:1-6. Neural networks are trained to identify signal peptides in prokaryotic and eukaryotic proteins. Sequence Iogos are calculated for signal peptides from different groups of organisms. 34. •

Nielsen H, Enge~brecht J, Von Heijne G, Brunak S: Defining a similarity threshold for a functional protein sequence pattern: the signal peptide cleavage site. Proteins 1996, 24:165-177. A method to exclude redundant or homologous entries from sets of sequences that contain a common, we~l defined feature, such as a proteolytic cleavage site is presented. 35. Claros MG: MitoProt, a Macintosh application for studying • mitochondrial proteins. Comput Appl Biosci 1995, 11:441-447. Software that calculates and presents all the characteristics proposed to be important for mitochondrial targeting sequences is described. 36. •-

Claros MG, Vincens P: Computational method to predict mitochondrially imported proteins and their transit peptides. Eur J Biochem 1996, 241:779-786. A discriminant analysis is used to identify mitochondrial proteins. The final function is implemented in a software that can also be used to identify the targeting sequence. The advantage of this approach is the large number of parameters and sequences considered to elaborate the discriminant function, and the rules to infer the mitochondrial targeting sequence. 37

Shackleton JB, Robinson C: Transport of proteins into chloroplasts--the thylakoidal processing peptidase is a signaltype peptidase with stringent substrate requirements at the ~3 position and -1 position. J Biol Ch em 1991,266:12152-12156.

38.

Schneider G, Schuchhardt J, Wrede P: Peptide design in machina: development of artificial mitochondrial protein precursor cleavage sites by simulated molecular evolution. Biophys J 1995, 68:434-447.

39.

NakaiK, Kanehisa M: Expert system for predicting protein localization sites in Gram-negative bacteria. Proteins 1991, 11:95-110.

40.

NakaiK, Kanehisa M: A knowledge base for predicting protein localization sites in eukaryotic cells. Genomics 1992, 14:897-911.

41.

Nielsen H, Engelbrecht J, Brunak S, Von Heijne G: A neural network method for identification of prokaryotic and eukaryotic signal peptides and prediction of their cleavage sites./nt J Neural Syst 1997, in press.