ESyPred3D: Prediction of proteins 3D structures

BIOINFORMATICS

Vol. 18 no. 9 2002 Pages 1250–1256

Christophe Lambert∗, Nadia Léonard, Xavier De Bolle and Eric Depiereux

Facultés Universitaires Notre-Dame de la Paix, Unité de Recherche en Biologie Moléculaire, Rue de Bruxelles, 61, B-5000 Namur, Belgium

Received on February 1, 2002; revised on March 7, 2002; accepted on March 18, 2002

ABSTRACT
Motivation: Homology or comparative modeling is currently the most accurate method to predict the three-dimensional structure of proteins. It generally consists of four steps: (1) databank searching to identify the structural homolog, (2) target–template alignment, (3) model building and optimization, and (4) model evaluation. The target–template alignment step is generally accepted as the most critical step in homology modeling.
Results: We present here ESyPred3D, a new automated homology modeling program. The method benefits from the increased alignment performance of a new alignment strategy. Alignments are obtained by combining, weighting and screening the results of several multiple alignment programs. The final three-dimensional structure is built using the modeling package MODELLER. ESyPred3D was tested on 13 targets in the CASP4 experiment (Critical Assessment of Techniques for Proteins Structural Prediction). Our alignment strategy obtains better results than PSI-BLAST alignments, and ESyPred3D alignments are among the most accurate compared to those of participants having used the same template.
Availability: ESyPred3D is available through its web site at http://www.fundp.ac.be/urbm/bioinfo/esypred/.
Contact: [email protected]; http://www.fundp.ac.be/~lambertc

INTRODUCTION
Three-dimensional (3D) protein structure is an important source of information for understanding the function of a protein, its interactions with other compounds (ligands, proteins, DNA, etc.) and the phenotypic effects of mutations (Tramontano, 1998). Methods for predicting 3D protein structure fall into three main categories (Rost and O’Donoghue, 1997): (1) homology or comparative modeling (described below); (2) fold recognition (predicting the global fold of a protein); (3) ab initio techniques (trying to model the 3D structure of proteins using only the sequence and a force field).

∗ To whom correspondence should be addressed.


Homology modeling is historically the first method (Browne et al., 1969) and the most accurate one (Sanchez and Sali, 1997). The last CASP experiment (Critical Assessment of Techniques for Proteins Structural Prediction) (Venclovas et al., 1999) showed that the critical steps are: (1) template selection, (2) target–template alignment, (3) modeling of regions that are absent from the template or significantly different from it and (4) modeling of side chains. Among these critical steps, the target–template alignment is commonly accepted as the most critical (Mosimann et al., 1995; Martin et al., 1997). Above 50% identity between target and template, pairwise alignments are known to provide accurate models. Between 30% and 50% identity, multiple alignments between target, template and similar proteins must be used, and the pairwise alignment between target and template must be extracted from this multiple alignment. Below 30% identity, only heuristic combinations of multiple alignments, experimental data and the know-how of an expert are able to generate an accurate model.

A large number of techniques have been developed to predict 3D structures of proteins by homology modeling. For the target–template alignment step, most of them use PSI-BLAST (Altschul et al., 1997), PileUp (Wisconsin Package Version 9.1, Genetics Computer Group (GCG), Madison, Wisc.), ClustalW (Thompson et al., 1994), 3D-PSSM (Fischer et al., 1999), SAM-T99 (Karplus et al., 1997), or the alignment producing the best model out of a collection computed from various alignment programs (Yang and Honig, 1999). Our laboratory developed the Match-Box multiple sequence alignment software in the early 1990s (Depiereux and Feytmans, 1992), and it has proved to be one of the most accurate in terms of specificity (Depiereux et al., 1997). Much effort has been devoted to improving alignment accuracy by adding information such as secondary structure predictions, solvent accessibility predictions, specific scoring matrices and combination with ClustalW. In all cases, it was only possible to slightly improve multiple alignment accuracy (unpublished results). Meanwhile, no significant improvements in alignment performance have been published by other groups. Furthermore, no alignment method can be singled out as the most reliable one. Indeed, benchmarks (Briffeuil et al., 1998; Thompson et al., 1999) have shown that the comparative performance of alignment programs depends strongly on the set of aligned sequences.

In this work, we tackle the target–template alignment problem by developing a specific program to align target and template sequences in homology modeling. Matching of homologous segments is improved by incorporating the results of several multiple alignment programs. Results are scored to optimize performance and screened to remove incompatible matches. Several algorithmic problems required specific developments in order to generate and efficiently screen the database of the various and often incompatible alignments proposed by the different algorithms. This new alignment strategy is included in our ESyPred3D program (http://www.fundp.ac.be/urbm/bioinfo/esypred/), which predicts the 3D structure of proteins using the homology modeling approach.

© Oxford University Press 2002

SYSTEM AND METHODS
Our automatic program (ESyPred3D) implements the four steps of the homology modeling approach (Eisenhaber et al., 1995): (1) databank searching to identify the structural homolog, (2) target–template alignment, (3) model building and optimization and (4) model evaluation (not implemented at the time of the CASP4 experiment). ESyPred3D was run on an SGI Octane dual-processor 225 MHz workstation under IRIX 6.5.

Identifying the structural homolog
To find homologs of the target sequence, PSI-BLAST 2.0.14 (downloaded from NCBI and run locally) is run against the latest available version of the NR databank (NCBI). The chosen template is the sequence from the latest version of the PDB databank with the lowest expected value after four iterations. The cutoff for the expected value is 0.0001 (-h flag). If no template is found with these criteria, the program stops.

Aligning sequences and constructing the 3D model
According to Thompson et al. (1999), the quality of a sequence alignment depends strongly on its context: the results obtained for a given pair of sequences may differ depending on the set of sequences submitted to the multiple alignment programs. So, after fetching all sequences retrieved by PSI-BLAST, two sets of sequences are generated in order to create two different computational conditions for running multiple alignment programs. Set A contains the 50 best hits, including the target and the template (the number of sequences is limited to 50 to reduce computing time). Set B is a subset of at least seven sequences, including the target and the template, produced by dropping overly redundant sequences with the PURGE program (provided with the Gibbs package (Neuwald et al., 1995)). The BLAST score (using the -b flag) to select or eliminate sequences during the PURGE operation is 250.

The target–template alignment is built in the following steps (see Figure 1):
(a) Matching. Both sets of sequences (A and B) are aligned by five alignment programs that emerged from two benchmarks (Briffeuil et al., 1998; Thompson et al., 1999). These programs are: ClustalW, Dialign2 (Morgenstern, 1999), Match-Box (Depiereux et al., 1997), Multalin (Corpet, 1988) and PRRP (Gotoh, 1996). Ten multiple alignments are generated, each one including the target and template sequences. The pairwise alignment between the target and template sequences is then extracted from each of them, leading to ten different pairwise alignments between the target and the template.
(b) Database building. Each aligned position is stored in a database; redundant results, i.e. the same amino acid placed at the same position by different programs, are counted in a frequency table.
(c) Screening. The position with the highest score is taken as the first anchor point to build the final target–template alignment. Incompatible results (see Figure 2), aligning regions located up- and downstream of anchor points, are removed from the database. The process is repeated, with new anchor positions being determined and incompatible regions being eliminated, until all results are either selected or removed. The final target–template alignment is thus composed of the most frequent aligned positions, under the condition of compatibility.

This final pairwise alignment is used by the MODEL homology modeling routine of MODELLER release 4 (Sali and Blundell, 1993; Sali et al., 1997) to build a 3D model of the target protein. This routine includes the satisfaction of spatial and geometric restraints and a very fast molecular dynamics annealing; no other refinements were applied.
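The matching, database building and screening steps can be sketched in a few lines of Python. This is a minimal illustration and not the ESyPred3D implementation: the function names (`extract_pairs`, `compatible`, `consensus_alignment`), the unweighted vote count and the tie-breaking rule are assumptions made for the example, and each pairwise alignment is assumed to be given as two gapped strings of equal length with `-` as the gap character.

```python
from collections import Counter


def extract_pairs(target_row, template_row):
    """Return (target_index, template_index) for every column holding two residues."""
    pairs = []
    i = j = 0  # residue counters for target and template
    for a, b in zip(target_row, template_row):
        if a != "-" and b != "-":
            pairs.append((i, j))
        if a != "-":
            i += 1
        if b != "-":
            j += 1
    return pairs


def compatible(p, q):
    """Matches are compatible if identical or ordered the same way in both sequences."""
    return p == q or (p[0] - q[0]) * (p[1] - q[1]) > 0


def consensus_alignment(pairwise_alignments):
    # Database building: count how many alignments propose each matched pair.
    freq = Counter()
    for target_row, template_row in pairwise_alignments:
        freq.update(extract_pairs(target_row, template_row))

    # Screening: pick the best-scoring pair as anchor, drop incompatible pairs, repeat.
    remaining = dict(freq)
    anchors = []
    while remaining:
        # Assumed tie-break: highest count first, then the leftmost position.
        anchor = max(remaining, key=lambda p: (remaining[p], -p[0], -p[1]))
        anchors.append(anchor)
        remaining = {
            p: s for p, s in remaining.items()
            if p != anchor and compatible(p, anchor)
        }
    return sorted(anchors)
```

With three toy alignments of a four-residue target to a three-residue template, `consensus_alignment([("ACDE", "AC-E"), ("ACDE", "A-CE"), ("ACDE", "AC-E")])` keeps the matches `(0, 0)`, `(1, 1)` and `(3, 2)` and discards the minority match `(2, 1)`, which would reuse template position 1.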

Participation in the CASP4 experiment
The ESyPred3D server participated in the CASP4 experiment (see complete results at http://PredictionCenter.llnl.gov/casp4/; group 218: LAMBERT-CHRISTOPHE). All the models generated by MODELLER were submitted to the CASP4 contest without any geometric or energetic evaluation. The number of homology modeling targets used during the CASP4 competition was too small to draw a statistically robust conclusion about the performance of our method. However, the results obtained provide a first estimate of its performance. For more statistical results, see the continuous evaluation of servers performed by EVA (Eyrich et al., 2001) (http://cubic.bioc.columbia.edu/eva/).

[Figure 1: flowchart of the ESyPred3D target–template alignment. Matching (PSI-BLAST sequences aligned with PRRP, ClustalW, Multalin, Dialign and Match-Box; extract target–template alignment), then Database building (extract aligned positions; scoring aligned positions), then Screening (anchor points: choose position with highest score as anchor point; remove incompatible positions), then Target–template alignment building.]

Fig. 1. Flowchart of the ESyPred3D target–template alignment method. See the text for details.

Sequence 1: A D L I I Y L R T S P E V A Y E
Sequence 2: L P G T N I V L G A L P E D R H

Fig. 2. Example of compatible and incompatible results on two hypothetical sequences. Three cases are reported: (a) Alignments I–I and I–L are not compatible because the same amino acid in sequence 1 is aligned to two amino acids in sequence 2. (b) Alignments P–P and A–A are not compatible: P in sequence 1 is to the right of A, but P in sequence 2 is to the left of A. (c) Alignments I–I and P–P are compatible: the prolines are both to the right of the isoleucines.

For the purpose of the CASP4 experiment, two models were built for each of the 13 comparative modeling targets (Table 1) for which ESyPred3D was able to predict a 3D structure: (1) The first model was built using the complete strategy described above (ESyPred3D) (models T0xxxTS218_1). (2) The second model was built using the same strategy as ESyPred3D, but with the rough sequence–structure alignment provided by PSI-BLAST (models T0xxxTS218_2).

Scoring schemes used to compare target structures to models
To compare ESyPred3D models to PSI-BLAST models, and ESyPred3D models to models of other CASP4 participants, the AL0 and the GDT_TS scores were chosen. Both scores were calculated using the LGA (Local–Global Alignment) program (Zemla, 2000).

AL0
AL0 is the number of correctly aligned residues in the target–template alignment. This score is very significant in this case because our method was designed to generate optimal alignment performance. This number is evaluated by first making a structural alignment of the prediction and the target structure with the DALI server, and then counting the number of residues in the model for which the closest residue in the target is the correct one (the distance between their α-carbons being less than 3.8 Å).

GDT_TS
The Global Distance Test, $\mathrm{GDT}_{d_i}$, is the number of α-carbons of a prediction that do not deviate by more than $d_i$ Å from the α-carbons of the target, after optimal superimposition. This optimal superimposition is computed in such a way that the number of residues (α-carbons) that fit under the distance cutoff $d_i$ is maximal. If $N_T$ is the total number of residues of the target, GDT_TS (GDT Total Score), computed according to the formula given below, is the mean fraction of residues of the target not deviating from the prediction after four optimal superimpositions with 1.0, 2.0, 4.0 and 8.0 Å α-carbon distance cutoffs. The GDT_TS score represents the overall quality of the model. This score was used to evaluate the complete procedure of ESyPred3D: identifying the structural homolog, aligning target to template and building the 3D model.

$$\mathrm{GDT\_TS} = 100 \times \frac{1}{4} \sum_{d_i} \frac{\mathrm{GDT}_{d_i}}{N_T}, \qquad d_i \in \{1.0, 2.0, 4.0, 8.0\}$$
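As a worked illustration of the GDT_TS definition (the mean, over the four cutoffs, of the fraction of target residues that fit under each cutoff, scaled to 100), the following Python sketch computes the score from per-cutoff counts. The function name `gdt_ts` and the dictionary input format are assumptions made for this example; the counts themselves would come from a superimposition program such as LGA, which is not reproduced here.

```python
CUTOFFS = (1.0, 2.0, 4.0, 8.0)  # α-carbon distance cutoffs in Å


def gdt_ts(counts_within_cutoff, n_target_residues):
    """Mean percentage of target residues within each cutoff.

    counts_within_cutoff maps each cutoff to the number of α-carbons of the
    prediction found within that distance after optimal superimposition.
    """
    fractions = [counts_within_cutoff[d] / n_target_residues for d in CUTOFFS]
    return 100.0 * sum(fractions) / len(CUTOFFS)


# Hypothetical counts for a 200-residue target.
example = {1.0: 50, 2.0: 100, 4.0: 150, 8.0: 200}
print(gdt_ts(example, 200))  # 62.5
```

In this made-up case the four fractions are 0.25, 0.50, 0.75 and 1.00, whose mean is 0.625, hence a GDT_TS of 62.5.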


Table 1. Homology modeling targets for the CASP4 experiment

Target  Description                                                 PDB code
T0090   ADP-ribose pyrophosphatase, E. coli                         1g0s, 1g9q, 1ga7
T0092   Hypothetical protein HI0319, H. influenzae
T0099   No description
T0103   Pepstatin insensitive carboxyl proteinase, Pseudomonas sp.  1ga6
T0111   Enolase, E. coli
T0112   Ketose Reductase / Sorbitol Dehydrogenase, B. argentifolii  1e3j
T0113   Short chain 3-hydroxyacyl-CoA dehydrogenase, rat            1e3w, 1e3s, 1e6w
T0117   Deoxyribonucleoside kinase, D. melanogaster
T0121   MalK, T. litoralis                                          1g29
T0122   Tryptophan Synthase alpha subunit, P. furiosus              1geq
T0123   Beta-lactoglobulin, pig                                     1exs
T0125   Sp18 protein, H. fulgens                                    1gak
T0128   Manganese superoxide dismutase homolog, P. aerophilum

RESULTS AND DISCUSSION
The performance of our homology modeling server is analyzed in three steps. In the first section, ESyPred3D alignments are compared to PSI-BLAST alignments. In the second section, ESyPred3D alignments are compared to those of other participants having used the same template. Since our alignment method is specifically designed for homology modeling, in the third section ESyPred3D models are compared to those of other CASP4 competitors in order to evaluate the global performance of our homology modeling strategy.

Alignment performance of ESyPred3D models compared to that of PSI-BLAST models
Table 2 contains AL0 scores for all models. Out of the 13 targets, ESyPred3D obtains nine AL0 scores greater than PSI-BLAST and only one AL0 score significantly lower than PSI-BLAST (more than two additional amino acids incorrectly aligned), for T0112. Two reasons explain the poor alignment for T0112: (1) The number of homologs found by PSI-BLAST was so large that the non-redundant set could not be computed with PURGE. (2) Four regions of T0112 shared only a very low similarity with homologs, so the different alignment programs produced contradictory results in these regions, and only a poor alignment could be established by our method. From this first evaluation, we can conclude that the quality of the target–template alignment is generally better when using the ESyPred3D alignment methodology than when using the target–template alignment provided by PSI-BLAST.
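The counts quoted in this comparison can be checked against the AL0 column of Table 2. The short Python sketch below does so; the AL0 values are copied from Table 2, while the names `AL0_SCORES` and `compare_al0` and the reading of "significantly lower" as a deficit of more than two residues are illustrative choices for this example.

```python
# AL0 values from Table 2: target -> (ESyPred3D model _1, PSI-BLAST model _2)
AL0_SCORES = {
    "T0090": (41, 30), "T0092": (73, 66), "T0099": (26, 21),
    "T0103": (128, 113), "T0111": (383, 381), "T0112": (174, 197),
    "T0113": (214, 207), "T0117": (114, 109), "T0121": (143, 141),
    "T0122": (203, 190), "T0123": (102, 102), "T0125": (74, 75),
    "T0128": (185, 187),
}


def compare_al0(scores, margin=2):
    """List targets where ESyPred3D is better, and where it is worse by more than margin."""
    better = [t for t, (esy, psi) in scores.items() if esy > psi]
    clearly_worse = [t for t, (esy, psi) in scores.items() if psi - esy > margin]
    return better, clearly_worse


better, clearly_worse = compare_al0(AL0_SCORES)
print(len(better), clearly_worse)  # 9 ['T0112']
```

The remaining three targets are a tie (T0123) and two deficits of one and two residues (T0125 and T0128), which fall within the margin.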

Table 2. Scores for ESyPred3D and PSI-BLAST models. The last column shows templates that lead to the best models presented at CASP4

Model         RMSD (all α carbons)  GDT_TS  AL0  Template  Templates leading to the best models
T0090TS218_1  6.52                  30.15   41   1tum      1mut
T0090TS218_2  6.44                  23.37   30   1tum      1mut
T0092TS218_1  14.70                 35.69   73   1d2g      1xva, 1d2h
T0092TS218_2  5.22                  34.69   66   1d2g      1xva, 1d2h
T0099TS218_1  5.54                  52.23   26   1qly      1a0n, 1ad5, 2hck, 1qcf, 2src
T0099TS218_2  5.56                  50.00   21   1qly      1a0n, 1ad5, 2hck, 1qcf, 2src
T0103TS218_1  11.95                 38.59   128  1sbh      1mee, 1sup
T0103TS218_2  12.41                 33.76   113  1sbh      1mee, 1sup
T0111TS218_1  2.29                  83.55   383  1one      1pdz, 1pdy, 1ykf, 4-7enl
T0111TS218_2  2.26                  82.39   381  1one      1pdz, 1pdy, 1ykf, 4-7enl
T0112TS218_1  5.35                  54.31   174  1hdy      1teh, 1ykf
T0112TS218_2  4.13                  59.19   197  1hdy      1teh, 1ykf
T0113TS218_1  3.32                  81.86   214  1hdc      1hdc, 2hsd
T0113TS218_2  3.68                  80.49   207  1hdc      1hdc, 2hsd
T0117TS218_1  8.24                  56.85   114  1qhi      1e2k, 1kim, 1ki2-7
T0117TS218_2  3.87                  55.71   109  1qhi      1e2k, 1kim, 1ki2-7
T0121TS218_1  3.35                  41.94   143  1b0u      1b0u
T0121TS218_2  3.38                  40.93   141  1b0u      1b0u
T0122TS218_1  2.43                  79.15   203  1cw2      1a5a, 1a5b, 1beu, 1cw2
T0122TS218_2  2.41                  74.58   190  1cw2      1a5a, 1a5b, 1beu, 1cw2
T0123TS218_1  4.15                  63.91   102  2a2u      2a2g, 1beb
T0123TS218_2  3.75                  65.47   102  2a2u      2a2g, 1beb
T0125TS218_1  4.15                  61.13   74   3lyn      2lis, 3lyn
T0125TS218_2  4.07                  60.40   75   3lyn      2lis, 3lyn
T0128TS218_1  1.74                  86.73   185  1abm      1b06, 1sss
T0128TS218_2  1.65                  87.32   187  1abm      1b06, 1sss

Comparison of ESyPred3D alignments with those of the participants having used the same template
The number of groups that used the same templates as ESyPred3D varies widely (from two to 65 groups, Figure 3). In Figure 3, using AL0 scores, ESyPred3D models obtained first place once, second place five times, third place three times and fourth place once. ESyPred3D models are thus in the top four places for ten of the 13 targets. Taking into account that a group that performs better than the ESyPred3D model for one target is rarely the same as the one that performs better for another target, one can conclude that our methodology is among the most efficient.

[Figure 3: plot of AL0 (in % of the length; axis from 0 to 100) against targets T0090 to T0128.]

Fig. 3. AL0 scores for targets studied in this work. Two series are reported for each target: the score of ESyPred3D models (black bullets) and the scores of models of other CASP4 participants having used the same template (blank bullets). AL0 scores are expressed as a fraction of the length of the target.

Comparison of ESyPred3D models with those of all CASP4 participants
In this section, the complete strategy of ESyPred3D is evaluated and its performance is compared to that of other CASP4 participants using the GDT_TS score. The number of models submitted for each target was always above 200. So, to enable a rapid interpretation of the distributions of scores, we computed the third quartile of these distributions. The third quartile (Q3) of a distribution is the value such that 75% of the values in the list are less than or equal to it. All information provided in Figure 4 has been normalized by the Q3 value, for each target. For each target, Figure 4 contains: (1) the GDT_TS score of the ESyPred3D model; (2) the GDT_TS score of the best model received by the CASP4 organizers and (3) the third quartile, which is equal to 1.0 because of the normalization.

[Figure 4: plot of GDT_TS (in % of Q3; axis from 0.8 to 1.5) against targets T0090 to T0128. Plot labels (group ID and template of the best predictors): P031 1kim; P406 1xva; P023 1mut; P094 1sup; P237 1a0n; P406 2hsd; P028 4enl; P042 1ykf; P381 1b0u; P526 1cw2; P095 2lis; P023 1beb; P526 1b06.]

Fig. 4. GDT_TS scores for targets studied in this work. Three points are reported for each target: the score of the model that obtained the best score (bold line), the ESyPred3D model (box) and the third quartile value (dotted line). All values are expressed as a fraction of the third quartile value. The group IDs of best predictors with their selected templates are also reported.

Figure 4 shows that ESyPred3D built three models with scores close to the best model: second place was obtained for targets T0103, T0121 and T0122 (see full tables at http://PredictionCenter.llnl.gov/casp4/). ESyPred3D predicted eight models above the Q3 values, i.e. in the top 25% of participants. Further analysis of the data at the CASP4 web site shows that few groups reached such a number of scores above Q3. It is also important to note that the group that obtained the best model for one target is rarely the same as the one that submitted the best model for another target.

The analysis of GDT_TS scores (Figure 4) showed that seven targets (T0090, T0092, T0099, T0112, T0117, T0123 and T0128) obtained values significantly lower than those of the best models. For targets T0090, T0092, T0099, T0117, T0123 and T0128, the low GDT_TS values are due to the selection of a template that was not fully adequate. Indeed, for these targets, the alignment performance remains good when comparing only to groups that used the same template, as shown by the AL0 scores in Figure 3. Although the template selection process in our methodology has to be improved, it is important to note that a completely inadequate template was never chosen. The result for T0112 is due to the quality of the alignment, as can be seen in Figure 3.

The fact that eight models out of 13 are above Q3 shows that our alignment method, combined with PSI-BLAST template selection and the use of MODELLER to obtain the 3D model, is a good strategy. Even if the template selection or the alignment quality is not optimal, the global quality of the ESyPred3D modeling strategy remains good.
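The Q3 normalization used for this comparison can be illustrated with a short Python sketch. Quartile conventions differ between statistics packages; this example follows the definition given in the text (75% of the values are less than or equal to Q3) and picks the smallest element of the list that satisfies it. The function names and the sample scores are hypothetical.

```python
import math


def third_quartile(values):
    """Smallest value in the list such that at least 75% of the values are <= it."""
    ordered = sorted(values)
    k = math.ceil(0.75 * len(ordered))  # number of values that must be <= Q3
    return ordered[k - 1]


def normalize_by_q3(score, all_scores):
    """Express a score as a fraction of the third quartile of all submitted scores."""
    return score / third_quartile(all_scores)


# Hypothetical GDT_TS scores submitted for one target.
submitted = [10.0, 20.0, 30.0, 40.0]
print(third_quartile(submitted))         # 30.0
print(normalize_by_q3(45.0, submitted))  # 1.5
```

A normalized value above 1.0 means that the model scores above the third quartile for that target, i.e. within the top 25% of submitted models.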

CONCLUSION
A new alignment methodology for homology modeling of proteins has been developed. The program has been tested on 13 CASP4 targets, both for its alignment performance and for the general quality of the models it provides. Our alignment strategy produced better results than PSI-BLAST alignments, and ESyPred3D alignments are among the most accurate when compared to those of participants having used the same template. Furthermore, our ESyPred3D program provides models that are among the best of the CASP4 experiment.

Nevertheless, our alignment methodology could be improved. The benchmarks of Thompson et al. (1999) and Briffeuil et al. (1998) showed that alignment programs differ in their level of performance. We plan to use this information to improve the computation of the alignment by weighting each multiple alignment method with a numeric representation of its mean performance. Additional information such as secondary structure predictions can also be used in the box selection in order to improve the alignment quality.

The template problem remains troublesome in homology modeling, especially when the target and template sequences share a low identity rate. To improve the template selection, the use of better parameters or better scoring matrices for PSI-BLAST (like the one described in Kann et al. (2000)) needs to be investigated. In the same way, PSI-BLAST can also be replaced by SAM-T99 (Karplus et al., 1998) or other programs. The intrinsic quality of the possible template structures (NMR, resolution, etc.) and the selection of multiple templates will also be taken into account to improve our modeling strategy.

The model evaluation step of our homology modeling methodology has not yet been developed. Geometric and energetic evaluation of the model can be done using ANOLEA (Melo and Feytmans, 1997), PROCHECK (Laskowski et al., 1993) or Verify3D (Luthy et al., 1992). The results of these evaluations will be used to change our target–template alignment or to select a more appropriate template. The process will be iterated in order to find the template that provides the best evaluated model. A similar iteration procedure was used by the Blundell group in CASP4.

ACKNOWLEDGMENTS
We thank the organizers and assessors of the CASP4 experiment for their valuable contributions to the structure prediction field. Christophe Lambert holds a specialized grant from the ‘Fonds pour la Formation à la Recherche dans l’Industrie et dans l’Agriculture’ (F.R.I.A.). We particularly want to thank Guy Baudoux, Katalin de Fays and Johan Wouters for helpful and fruitful discussions.

REFERENCES
Altschul,S.F., Madden,T.L., Schäffer,A.A., Zhang,J., Zhang,Z., Miller,W. and Lipman,D.J. (1997) Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res., 25, 3389–3402.
Briffeuil,P., Baudoux,G., Reginster,I., De Bolle,X., Vinals,C., Feytmans,E. and Depiereux,E. (1998) Comparative analysis of seven multiple protein sequence alignment servers: clues to enhance predictions reliability. Bioinformatics, 14, 357–366.
Browne,W.J., North,A.C. and Phillips,D.C. (1969) A possible three-dimensional structure of bovine alpha-lactalbumin based on that of hen’s egg-white lysozyme. J. Mol. Biol., 42, 65–86.
Corpet,F. (1988) Multiple sequence alignment with hierarchical clustering. Nucleic Acids Res., 16, 10881–10890.
Depiereux,E., Baudoux,G., Briffeuil,P., Reginster,I., De Bolle,X., Vinals,C. and Feytmans,E. (1997) Match-Box server: a multiple sequence alignment tool placing emphasis on reliability. Comput. Appl. Biosci., 13, 249–256.
Depiereux,E. and Feytmans,E. (1992) Match-Box: a fundamentally new algorithm for simultaneous alignment of several protein sequences. Comput. Appl. Biosci., 8, 501–509.
Eisenhaber,F., Persson,B. and Argos,P. (1995) Protein structure prediction: recognition of primary, secondary, and tertiary structural features from amino acid sequence. Crit. Rev. Biochem. Mol. Biol., 30, 1–94.
Eyrich,V.A., Marti-Renom,M.A., Przybylski,D., Madhusudhan,M.S., Fiser,A., Pazos,F., Valencia,A., Sali,A. and Rost,B. (2001) EVA: continuous automatic evaluation of protein structure prediction servers. Bioinformatics, 17, 1242–1243.
Fischer,D., Barret,C., Bryson,K., Elofsson,A., Godzik,A., Jones,D., Karplus,K.J., Kelley,L.A., MacCallum,R.M., Pawlowski,K., Rost,B., Rychlewski,L. and Sternberg,M. (1999) CAFASP-1: critical assessment of fully automated structure prediction methods. Proteins, (Suppl 3), 209–217.
Gotoh,O. (1996) Significant Improvement in Accuracy of Multiple Protein Sequence Alignments by Iterative Refinement as Assessed by Reference to Structural Alignments. J. Mol. Biol., 264, 823–838.
Kann,M., Qian,B. and Goldstein,R.A. (2000) Optimization of a new score function for the detection of remote homologs. Proteins, 41, 498–503.
Karplus,K., Barrett,C. and Hughey,R. (1998) Hidden Markov models for detecting remote protein homologies. Bioinformatics, 14, 846–856.
Karplus,K., Sjolander,K., Barrett,C., Cline,M., Haussler,D., Hughey,R., Holm,L. and Sander,C. (1997) Predicting protein structure using hidden Markov models. Proteins, (Suppl 1), 134–139.
Laskowski,R.A., Moss,D.S. and Thornton,J.M. (1993) Main-chain bond lengths and bond angles in protein structures. J. Mol. Biol., 231, 1049–1067.
Luthy,R., Bowie,J.U. and Eisenberg,D. (1992) Assessment of protein models with three-dimensional profiles. Nature, 356, 83–85.
Martin,A.C., MacArthur,M.W. and Thornton,J.M. (1997) Assessment of comparative modeling in CASP2. Proteins, (Suppl 1), 14–28.
Melo,F. and Feytmans,E. (1997) Novel Knowledge-based Mean Force Potential at Atomic Level. J. Mol. Biol., 267, 207–222.
Morgenstern,B. (1999) DIALIGN 2: improvement of the segment-to-segment approach to multiple sequence alignment. Bioinformatics, 15, 211–218.
Mosimann,S., Meleshko,R. and James,M.N. (1995) A critical assessment of comparative molecular modeling of tertiary structures of proteins. Proteins, 23, 301–317.
Neuwald,A.F., Liu,J.S. and Lawrence,C.E. (1995) Gibbs motif sampling: detection of bacterial outer membrane protein repeats. Protein Sci., 4, 1618–1632.
Rost,B. and O’Donoghue,S. (1997) Sisyphus and prediction of protein structure. Comput. Appl. Biosci., 13, 345–356.
Sali,A. and Blundell,T.L. (1993) Comparative protein modelling by satisfaction of spatial restraints. J. Mol. Biol., 234, 779–815.
Sali,A., Sanchez,R. and Badretdinov,A. (1997) MODELLER: A Program for Protein Structure Modeling Release 4.
Sanchez,R. and Sali,A. (1997) Advances in comparative protein-structure modeling. Curr. Opin. Struct. Biol., 7, 206–214.
Thompson,J.D., Higgins,D.G. and Gibson,T.J. (1994) CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Res., 22, 4673–4680.
Thompson,J.D., Plewniak,F. and Poch,O. (1999) A comprehensive comparison of multiple sequence alignment programs. Nucleic Acids Res., 27, 2682–2690.
Tramontano,A. (1998) Homology modeling with low sequence identity. Methods, 14, 293–300.
Venclovas,C., Zemla,A., Fidelis,K. and Moult,J. (1999) Some measures of comparative performance in the three CASPs. Proteins, (Suppl 3), 231–237.
Yang,A.S. and Honig,B. (1999) Sequence to structure alignment in comparative modeling using PrISM. Proteins, (Suppl 3), 66–72.
Zemla,A. (2000) LGA program: A Method for Finding 3D Similarities in Protein Structures. Accessed at http://PredictionCenter.llnl.gov/local/lga.