Proceedings of the 28th Annual Hawaii International Conference on System Sciences -
1995
Protein Folding Potential Functions

Gordon M. Crippen
College of Pharmacy, University of Michigan, Ann Arbor, MI

Abstract

There has been a great deal of activity recently on approaches to the calculation of protein folding using specially devised empirical potential functions. We have developed one such function that solves the protein structure recognition problem: given the sequence for a globular protein and a collection of plausible protein conformations, including the native conformation for that sequence, identify the correct, native conformation. Although it has been trained on only 58 single-chain proteins, it recognizes the native conformation for essentially all compact, soluble, globular proteins having known native conformations in comparisons with 10^4 to 10^6 reasonable alternative conformations apiece. Furthermore, it correctly discriminates between native and nonnative structures of multichain aggregates without additional information about disulfide bonds or bound ligands. Given its broad successes, we can use it to gain insight into the differences between several seemingly related computational problems.

1: Problem definition

There has been a lot of excitement in the recent literature about computer calculations that "fold up proteins", particularly methods that employ specially designed potential functions that are not general purpose molecular mechanics force fields, yet somehow incorporate information about protein folding. Along with all the excitement and optimistic claims of success has come a great deal of confusion over who has really done what, and what it means in other contexts. Our purpose here is not to give an authoritative review of the field, but rather to clear up some of the misconceptions and define some terms precisely enough to explain what we have been doing.

The long term goal of many investigators has been the protein folding problem (FP): given only the amino acid sequence, calculate the detailed three-dimensional (3D) structure of the protein. However, this statement of FP is insufficient. We must add the target accuracy of the prediction and a requirement of generality. The experimental answer the calculation is trying to match is almost always taken to be a high-resolution X-ray crystal or NMR structure. There is also consensus that the appropriate measure of protein conformational similarity is the root-mean-square deviation in Cα coordinates after optimal superposition by rigid body translation and rotation (denoted here by RMSD). (We believe that the test of "topological similarity" is so ill-defined as to be a meaningless measure.) Unfortunately, there is no consensus about how small the RMSD between the calculated and crystal structure must be to count as success. We have recently proposed an objective RMSD cutoff between similar and dissimilar protein structures that depends on chain length, but is free of arbitrary decisions.[1] Using this criterion we find several cases where various authors have calculated the tertiary structure of small proteins nearly well enough, and one or two cases where they have barely succeeded. Yet no one has succeeded on more than one protein, to our knowledge.

This brings up the requirement of generality. A particular folding algorithm may inadvertently or intentionally incorporate information about the protein being predicted, or it may be subtly biased toward producing structures of that type. Success in FP must include the ability to function on a variety of different protein structural types, as well as extension beyond the set of proteins that may have been used to develop the method. Just how broad the range of proteins must be is up for debate, but we would propose that a successful method should work at least for α, β, and α/β types of globular, water soluble proteins.

There is a second major goal that is of more recent interest than FP and opposite to it, namely the inverse folding problem (IFP). Here the intent is to calculate a sequence or sequences that will uniquely fold to a given 3D structure. In IFP the solution is not unique, as we know from the great similarity of the many mutant T4 lysozyme crystal structures, but otherwise the accuracy and generality issues are the same. The experimentally determined 3D structure of the designed amino acid sequence must be unique enough to crystallize, and must lie within the proposed RMSD limit compared to the given target structure. Furthermore, this must be generally true for some wide variety of proteins, particularly those that are significantly
Proceedings of the 28th Hawaii International Conference on System Sciences (HICSS '95) 1060-3425/95 $10.00 © 1995 IEEE
different from structures used to develop the method.

FP and IFP so formulated are probably distant goals, and we need easier problems for now, such that their solution will lead the way to success on the more difficult ones. Certainly if we can't even distinguish between correctly and incorrectly folded structures of the same sequence, we have little hope of solving FP. Our research has therefore centered around what we will call the structure identification problem (3DID): given a particular amino acid sequence and a large collection of 3D protein structures of the correct chain length, one of which is the correct native structure, select that native structure.

Once again, the problem statement needs to be refined as to accuracy and generality. Most investigators of 3DID have used a statistical treatment of accuracy by developing a scalar function of structure that can be used to rank all the structures given, and then noting that the native lies far out on the favorable end of the distribution.[2, 3, 4] We have adopted the much more stringent, nonstatistical requirement that the native must always be ranked first, just as the real protein folds to its one native structure and no other.

As for generality, there is the range of applicability concerning sizes and types of native proteins, as well as the range of nonnative structural types. We consider only native proteins that are compact, globular, and water soluble, consisting of one or more polypeptide chains of naturally occurring amino acids. They may be of any folding motif and have associated ions, small ligands and prosthetic groups, but otherwise the set of polypeptide chains comprising the native structure must be able to fold up independently of other macromolecules. For example, if the experimentally stable state of a protein is the dimer, we must consider both chains at once, not just the monomer.

As far as the diversity of the alternative structures goes, we assume that the obviously bad ones have already been rejected by some structure quality assessment program that looks for left-handed α-helices, van der Waals contacts, unusual φ/ψ values, etc. Otherwise, they may be compact or noncompact, similar to the native or very dissimilar. As does the Sippl group, we generate our alternatives by cutting out contiguous segments of polypeptide chain the length of the native from larger PDB entries.

The opposite of 3DID and a restriction of IFP is what we will call the sequence identification problem (SEQID): given a particular native sequence and its high-resolution 3D structure, select from a large set of sequences the one or more that will fold to the target structure. As in IFP, the answer is clearly not unique, given a large assortment of sequences, although the native sequence should certainly be one of the hits. A successful algorithm should be applicable to a broad class of native protein 3D types. The accuracy issue is not so straightforward. Suppose the algorithm ranks sequences according to their suitability for the target structure, which is a globin, for instance. If the target is globin A, and it differs only a little in RMSD from globin B, then is ranking sequence B ahead of sequence A a mistake? Perhaps sequence B is more strongly biased toward the globin folding motif than sequence A is.

One further question of problem definition common to both 3DID and SEQID is the treatment of insertions and deletions. In the same way that permitting indels is essential to the success of sequence alignment algorithms, this is a reasonable feature of SEQID. Without it, one can well expect to identify only the native sequence, even when clearly homologous sequences are available for selection. Similarly, it has often been argued that 3DID must select the native structure on the basis of the conserved interior strands (the "core" residues) alone, so that if the assortment of structures to choose from does not happen to include the native, the algorithm will at least recognize some homologous structure. We present the argument in the next section that such a goal for a 3DID algorithm is incompatible with experiment and with the previously stated objectives of the problem. In any case, we have strictly considered 3DID without gaps of any sort.
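The difference between the statistical and the strict accuracy criteria for 3DID described above can be illustrated with a minimal Python sketch. The function names and the score values are hypothetical and are not taken from the paper; lower values are assumed to be more favorable, as for the potential discussed later.

```python
from statistics import mean, pstdev


def native_z_score(native_value, alternative_values):
    """Statistical criterion: number of standard deviations by which the
    native lies below the mean of the alternatives (negative = favorable)."""
    mu = mean(alternative_values)
    sigma = pstdev(alternative_values)
    return (native_value - mu) / sigma


def native_ranked_first(native_value, alternative_values):
    """Strict criterion: the native must score lower than every alternative."""
    return all(native_value < alt for alt in alternative_values)


# Hypothetical scores (lower = more favorable); not data from the paper.
alternatives = [0.0, 1.0, -1.0, 2.0, -2.0, 0.5, -0.5, -9.0]
native = -8.0

z = native_z_score(native, alternatives)
strict = native_ranked_first(native, alternatives)
print(f"z-score of native: {z:.2f}; ranked first: {strict}")
```

In this made-up case the native lies more than two standard deviations below the mean of the alternatives, yet a single alternative scores lower, so the strict criterion fails even though the statistical one would likely be reported as a success.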
2: Misconceptions

Several persistent myths are widely shared in the protein folding field. The first is that 3DID is easy to solve, and that several different approaches have already solved it. The fact is that the problem is rather hard if the criteria for success are high, but it can always be trivialized by relaxing the standards. One should keep in mind the numbers and structural diversity of the alternative conformations compared to the native. It is quite easy to discriminate against a few tens of alternatives, or even hundreds, particularly if they are all clustered near the native in conformation or all are near each other but far away from the native. When there are 10^4 or even 10^6 alternatives taken from all over the structures in the Protein Data Bank, the problem is qualitatively different. When the native must be recognized as the single best structure for many natives, the problem is much more demanding than asking that it be statistically outstanding.

The second myth is that a method adapted from statistical mechanics is essential for success in 3DID. Both the Boltzmann distribution approach of Sippl et al.[5] and the spin glass approach of Wolynes et al.[4] view a collection of alternative conformations derived from PDB as an ensemble of conformational states resulting from the energy function that will be subsequently used as the ranking potential in 3DID. As a heuristic, these approaches have been empirically validated by producing reasonably successful potentials. However, since PDB structures or fragments of
them neither sample all polypeptide conformational possibilities nor are in equilibrium with each other, one cannot take the underlying physical theory too seriously. In particular, the fact that the associated physical theory may produce a potential with units of energy does not necessarily lend it more validity. In 3DID, the distribution of energies of the alternatives is irrelevant; the only thing that counts is that the native is more favored than any alternative. If that is not true, the method doesn't solve the problem. If it is true, the method does solve the problem regardless of the shape of the distribution of the alternatives.

The third myth reasons that since 3DID and SEQID both deal with the compatibility of sequence and structure, an algorithm that solves one must necessarily solve the other. To paraphrase Sippl's argument[5] against this common error, think of the potential function f(x, y), where x is sequence and y is structure, expressed in some sort of parameterization. If f solves 3DID, then f(x_i, y) always has its global minimum at y = y_i, where x_i and y_i are the native sequence and native structure for protein i = 1, 2, ..., 1000. Alternatively, if f solves SEQID, then f(x, y_i) always has its global minimum at x = x_i for all i. There is absolutely no mathematical reason in the world that the first property of f implies the second, or vice versa. For example, f(x_i, y) might have its global minimum at y_i for a particular protein i, but the global minimum of f(x, y_i) can easily be at x = x_j. Now this does leave open the possibility that one could devise some f(x, y) that handles both problems, although success so far has been limited.[2] We view this as a dubious undertaking because while we do know the native structure for a few sequences and that these sequences do not fold to other structures, we don't know what sequence most prefers a given structure. Empirical as these potentials may be, at least they are trying to mimic a property of the real free energy in 3DID, and there are experimental data to compare with. In SEQID, there is no corresponding physical energy, nor a corresponding experimental process like protein refolding.

The fourth myth has it that gapped alignment is an essential feature of any 3DID method. As we have recently summarized elsewhere,[6] our potential solves the ungapped 3DID for practically all proteins in PDB compared with literally millions of alternatives. Furthermore, it correctly disfavors a fragment taken out of context, such as one chain of a multimeric complex or a postprocessed protein locked by disulfide bonds into the conformation favored by its proprotein. In particular it treats proteins having more than one polypeptide chain as well as those having only one. Having thus reached a high level of confidence with our potential, consider what gapped structure identification looks like to it. The isolated core segments are supposed to prefer their relative positions taken from the native structure over any other arrangement, even more compact ones. Breaking the chain in a few places within a domain and removing many residues, even those on the surface, would almost certainly prevent the remaining fragments from folding up as they did before. Why should anyone expect a potential to correctly agree with so many experimental facts and then give a bogus result that violates common experience in protein chemistry? These potentials are indeed empirical, but in some sense they have assimilated a lot of knowledge about protein folding.

A second way to look at this last myth is that introducing gaps adds degrees of freedom to the calculation that a real protein doesn't have. When different members of a homologous series of proteins fold to similar core structures except for the positioning of their different surface loops, alternative placements for all the residues must be taken into consideration, and the native fold is the most favorable structure each sequence can choose. In the corresponding calculations, when we introduce gaps into the native sequence and/or gaps into the structures we are comparing, some residues and their interactions with the rest of the protein effectively disappear. This can lead to the curious result of a nonnative gapped alignment of the native sequence onto the native structure being preferred over the native alignment.
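The argument against the third myth can be made concrete with a toy table of potential values. The following Python sketch is only an illustration under stated assumptions: the 2 x 2 table `f`, in which `f[i][j]` is the value for sequence i placed on structure j, and the function names are hypothetical, and the strict "global minimum at the native" form of both problems is used.

```python
def solves_3did(f):
    """For every sequence i, the most favorable structure must be structure i."""
    return all(
        min(range(len(row)), key=lambda j: row[j]) == i
        for i, row in enumerate(f)
    )


def solves_seqid(f):
    """For every structure j, the most favorable sequence must be sequence j."""
    n = len(f)
    return all(
        min(range(n), key=lambda i: f[i][j]) == j
        for j in range(n)
    )


# Hypothetical values, lower = more favorable: f[i][j] = f(x_i, y_j).
f = [
    [-5.0, -1.0],  # sequence 0 prefers its native structure 0
    [-7.0, -8.0],  # sequence 1 prefers its native structure 1
]

# Transposing the table swaps the roles of sequence and structure.
f_transposed = [list(column) for column in zip(*f)]

print(solves_3did(f), solves_seqid(f))                        # True False
print(solves_3did(f_transposed), solves_seqid(f_transposed))  # False True
```

Each sequence picks its own native structure, yet structure 0 is scored more favorably by sequence 1 (-7.0) than by its native sequence 0 (-5.0), so the same table fails the sequence identification test.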
3: Results

3.1: 3DID validation of the potential

By methods described in detail elsewhere,[7] we have determined a potential function of interresidue contacts that solves 3DID by giving a lower value for the native structure than for all alternatives by a margin of at least 3 × RMSD between the native and the alternative structures. In what follows, any native and alternative pair failing to satisfy this inequality will be called a violation. The contacts are grouped into 112 classes, depending on the types of the two interacting residues and their sequence separation, so there are 112 corresponding adjustable parameters. The latest version of the potential[6] was determined by a training set consisting of 58 compact, single chain proteins (50 X-ray + 8 NMR), and a total of 6,566 of their alternatives. These native proteins were an arbitrarily chosen set of high-resolution, small to medium chain length proteins from all folding classes, some of them apparently important for the training process.

We tested it on a superset of altogether 212 native proteins (176 X-ray + 17 NMR + 19 models) and 477 homologues of these, contrasted to a grand total of 1,627,714 alternatives. The template proteins used to construct the alternatives encompassed essentially all PDB entries of medium to high resolution. Of these 212, only 161 are compact (totalling
1,242,182 alternatives). Compact polypeptide conformers satisfy certain precise requirements of small radius of gyration and large numbers of inter-residue contacts.[7] Out of the non-compact proteins, for which the potential is not required to work, just 21 (15 X-ray + 5 NMR + 1 model) had a few violations, and transforming growth factor α (4tgf, by NMR) had many. Six compact test proteins had in all 12 violations. Model structures (1mca.A, 1apk, and 2apk) account for 9 of these violations, but this likely says more about the calculations that determined them than about our agreement with experiment. One violation for pike parvalbumin (1pal) is due to a very similar structure from the homologous carp parvalbumin (5cpv), which is an example of the desirable property that very similar conformations have very close potential function values. Finally, there was one violation each for myohemerythrin (2mhr) and chain 1 of bean pod mottle virus (1bmv.1). These two constraints need to be introduced into the training set when the next update of the potential is calculated. Still, two genuine errors out of 1,200,000 tests gives one confidence the potential has learned a lot about a wide variety of proteins.

A gratifying consequence of training the potential on single chain proteins is that it behaves correctly on multichain aggregates. For example, the native conformations of ten compact two-chain proteins (2ltn, 2fbj, 2fdl, 4fab, 3fab, 3hfm, 1f19, 2igf, 1mcp, and 2ig2) were correctly favored over the 2,000 to 500,000 alternatives we could generate for each. Similarly, the potential favored the native melittin tetramer (2mlt) over all the 1.28 × 10^6 alternatives we tried. In contrast, the potential correctly rejects the crystal conformations of the monomer and dimer as possible native conformations. Indeed, the stable form of melittin in solution and in the crystal is the tetramer.[8] Similarly, the crystal structure of insulin (3ins) is not preferred over many alternatives by the potential, which treats the A and B chains as independently rearrangeable chains whose Cys residues may or may not engage in any combination of disulfide bridges. This negative result is in agreement with the experimental fact that proinsulin spontaneously folds, is stabilized by internal crosslinks, and then is cut to form insulin's two chains joined by disulfide bridges. Indeed, the reoxidation of reduced insulin forms a random mixture of different cysteine pairings.[9]

Not surprisingly, the potential is also able to correctly discriminate between the 12 cases in the literature of correct vs. intentionally incorrect protein folds,[10, 11] the correct crystal structure of RuBisCO[12] vs. the earlier incorrect chain tracing,[13] the correct crystal structure of HIV protease (5hvp)[14] vs. the earlier model structure (1hvp),[15] and the correct NMR structure[16] of interleukin 4 vs. the model structure.[17] The last case is an interesting discrimination between two α-helix bundles differing in the handedness of the helix packing arrangement.
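The violation criterion used in this section can be written out as a short Python sketch. It assumes only what is stated above: a native and alternative pair is a violation when the alternative's potential value does not exceed the native's by at least 3 × RMSD. The function names and the example numbers are hypothetical and are not results from the paper.

```python
def is_violation(native_value, alternative_value, rmsd, margin_factor=3.0):
    """True if the alternative is NOT disfavored by at least margin_factor * rmsd."""
    return alternative_value - native_value < margin_factor * rmsd


def count_violations(native_value, alternatives, margin_factor=3.0):
    """alternatives: iterable of (potential_value, rmsd_to_native) pairs."""
    return sum(
        is_violation(native_value, value, rmsd, margin_factor)
        for value, rmsd in alternatives
    )


# Hypothetical native value and (value, RMSD) pairs for three alternatives.
native_value = -100.0
alternatives = [
    (-60.0, 8.0),  # gap 40 >= 24: satisfied
    (-95.0, 1.0),  # gap  5 >=  3: satisfied (near-native, small margin needed)
    (-90.0, 4.0),  # gap 10 <  12: violation
]
n_violations = count_violations(native_value, alternatives)

# Error rate implied by the figures quoted in the text above.
genuine_error_rate = 2 / 1_200_000

print(n_violations, genuine_error_rate)
```

Scaling the required margin with RMSD means that near-native alternatives need only be slightly disfavored, which is consistent with the remark above that very similar conformations have very close potential values.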
3.2: Free energy and potential comparison

Since our potential is successful at 3DID, it resembles the free energy of proteins insofar as it assigns a lower number to the native conformation than to nonnative ones. One might reasonably inquire whether there is any closer correlation, perhaps as an unforeseen consequence of all the training. A good test case is the wild type T4 lysozyme and 13 single point mutants, for which Alber et al.[18] have measured the relative ΔG of folding.

The first observation is that our potential gave no change in value whatsoever for the native structure when changing from the wild sequence to some mutants. The reason is that the potential is calculated as a sum of contact contributions, where contacts are classified according to the types of the amino acids interacting by grouping them into classes, such as large hydrophobic, acidic polar, etc. A mutation from one member of a group to another still gives exactly the same contributions for the same list of contacts. Consequently, we rederived a new potential using a detailed classification scheme, where contacts are classed by sequence separation (short, medium, and long range) and which of the 20 amino acid types the two interacting residues are (hence 20 × 21/2 = 210 pair types). Instead of the 112 adjustable parameters of our standard potential, this detailed potential had 3 × 210 = 630 parameters. Adjusting these parameters with the same training set as before resulted in the "detailed potential", having comparable predictive power.

A plausible calculated quantity that should correspond to the ΔG of folding is the detailed potential difference between the native and the lowest (most favorable) alternative. However, we found no significant correlation between them for the 14 T4 lysozymes. We conclude that success at 3DID does not necessarily imply a quantitative correlation with conformational free energy, and that our potential functions, standard and detailed, tend to be rough surfaces, sometimes giving large changes in value for only moderate changes in sequence or structure.
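The parameter count of the detailed potential follows from simple enumeration, sketched here in Python. The indexing scheme, the names `PAIR_TYPES`, `PARAM_INDEX`, and `parameter_index`, and the ordering of parameters are illustrative assumptions; the source does not say how parameters were ordered, nor does it give the sequence separation thresholds, so the range is passed in as a label.

```python
from itertools import combinations_with_replacement

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"   # 20 one-letter codes, alphabetical
RANGES = ("short", "medium", "long")   # sequence separation classes

# Unordered residue pairs, including identical pairs: 20 * 21 / 2 = 210.
PAIR_TYPES = list(combinations_with_replacement(AMINO_ACIDS, 2))

# One adjustable parameter per (range, pair type): 3 * 210 = 630.
PARAM_INDEX = {
    key: k
    for k, key in enumerate((r, p) for r in RANGES for p in PAIR_TYPES)
}


def parameter_index(res_a, res_b, separation_range):
    """Index of the parameter for a contact; residue order does not matter."""
    pair = tuple(sorted((res_a, res_b)))
    return PARAM_INDEX[(separation_range, pair)]


print(len(PAIR_TYPES), len(PARAM_INDEX))  # 210 630
```

Under this scheme a point mutation such as Ile to Val changes the parameter used for each of that residue's contacts, whereas a grouped classification that places both residues in the same class would leave the potential value unchanged, as observed for some of the T4 lysozyme mutants.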
3.3: Gapped 3DID

Lathrop, Smith, and coworkers have been examining the gapped 3DID problem using pairwise interaction potentials.[19] Assuming the native structure must be recognizable from the interactions between residues in the structural core (as opposed to core-loop or loop-loop interactions), they represent each native globular protein by only a set of core strands, determined by an examination of the crystal structure. For example, for sea snake neurotoxin (1sn3) these are chosen to be residues 1-4 (extended), 23-32 (helix), 37-41 (extended), and 46-50 (extended).[20] Then the given native sequence is presented with the native structure's core strands and the cores of any other known protein structures such that the sum of the residues in the core does not exceed the length of the given sequence. A threading of the sequence onto some core amounts to assigning a contiguous sequence segment of the correct length to each core structure segment. Each core residue position is used exactly once, each sequence residue is used at most once, and the order of core segments matches the order of used sequence segments. In other words, all the core residue positions are assigned residues, but there will be unassigned sequence residues that effectively vanish. This formulation of gapped 3DID then amounts to finding the optimal (lowest) potential threading of the given sequence onto all the cores and seeing whether this one is the native alignment of the native sequence onto the native core.

Since loop segments have been removed from the structures, there are many ways to thread even the native sequence onto the native core. Using our standard 112 parameter potential, the native threading of the 1sn3 sequence onto its core corresponds to a value of -229.7 arbitrary units.[20] However, a nonnative threading of this sequence onto the same core yields the markedly better score of -395.7 units (sequence segments 5-8, 14-23, 37-41, and 44-48 onto core structure segments 1-4, 23-32, 37-41, and 46-50, respectively). Furthermore, the same sequence prefers some threadings onto the cores of other proteins even more.

We have two explanations for why our potential does so well with ungapped threading and so poorly at gapped threading. The first is the idea that gapped threading allows the calculation some freedom that the real protein does not have, and our potential has been trained in more realistic comparisons. Suppose the native threading of the native sequence onto the native structure has some residues ...Phe-Ser-X-X-Val-Ala... where the first four residues are in an interior core strand, so that Phe and Ser are buried, but Val and Ala are exposed. Shifting the alignment one residue to the left would expose Phe in exchange for burying Val, which would probably be a net loss in stability for the real protein and for our potential in an ungapped calculation. On the other hand, considering only core strands means that the Phe would be pushed off the end of the strand so that it disappears and no longer interacts with anything, Ser becomes more exposed to solvent, and Val enters the buried core strand, a net favorable move.

Our second explanation questions the assumption that only core-core interactions are important in determining protein folding. Given reasonable assignments of core strands for several proteins,[20] we calculated for several native proteins the fractions of the total contact potential values arising from core-core, core-loop, and loop-loop interactions (Table 1). Depending on the protein, core-core interactions accounted for anywhere between 30% and 90% of the stabilization of the native structure. (For 1hoe, the calculated loop-loop interactions are a net destabilization.) Of course, our potential is only an empirical construct that does not necessarily quantitatively translate into free energy, but it is so successful at 3DID that these observations cannot be dismissed out of hand.

Table 1. Residue-residue contact contributions for some protein crystal structures, broken down according to core and loop interactions.

Protein         % of Total Potential Value
(PDB code)      Core-Core    Core-Loop    Loop-Loop
1sn3              42.1         44.1         13.7
351c              72.0         10.5         17.5
1ubs              65.5         30.9          3.7
1hoe              86.8         18.5         -5.4
5cyt.R            40.1         36.5         23.4
5cpv              35.7         42.4         21.9

This work was supported by a grant from the National Institute on Drug Abuse (DA06746). We are indebted to all the researchers who deposited their protein structural data in the Protein Data Bank.
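The threading constraints stated in Section 3.3 (one contiguous sequence segment of matching length per core segment, each sequence residue used at most once, and segment order preserved) can be checked with a small Python sketch. The function names are hypothetical; the segment boundaries are the 1sn3 values quoted in the text, with residues numbered from 1 and both ends inclusive. No potential values are computed here.

```python
def segment_length(segment):
    """Number of residues in an inclusive (start, end) segment."""
    start, end = segment
    return end - start + 1


def is_valid_threading(core_segments, sequence_segments):
    """Check the gapped-threading constraints described in Section 3.3."""
    if len(core_segments) != len(sequence_segments):
        return False
    previous_end = 0  # residues are numbered from 1
    for core, seq in zip(core_segments, sequence_segments):
        # Every core position must receive exactly one sequence residue.
        if segment_length(core) != segment_length(seq):
            return False
        # Segments must keep their order and must not share residues.
        if seq[0] <= previous_end:
            return False
        previous_end = seq[1]
    return True


# Core segments of sea snake neurotoxin (1sn3) as quoted in the text.
core_1sn3 = [(1, 4), (23, 32), (37, 41), (46, 50)]

native_threading = list(core_1sn3)                          # scored -229.7
nonnative_threading = [(5, 8), (14, 23), (37, 41), (44, 48)]  # scored -395.7
overlapping_threading = [(5, 8), (8, 17), (37, 41), (44, 48)]  # reuses residue 8

core_size = sum(segment_length(s) for s in core_1sn3)
print(core_size, is_valid_threading(core_1sn3, nonnative_threading))
```

Both the native and the better-scoring nonnative threading satisfy the constraints, which illustrates the point made in the text: once loop residues are allowed to vanish, many formally admissible alignments of the native sequence onto the native core exist.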
Bibliography

1. Maiorov, V. N. and Crippen, G. M. (1994) Significance of root-mean-square deviation in comparing three-dimensional structures of globular proteins, J. Mol. Biol. 235, 625-634.
2. Bryant, S. H. and Lawrence, C. E. (1993) An empirical energy function for threading protein sequence, Proteins: Struct. Funct. Genet. 16, 92-112.
3. Covell, D. G. and Jernigan, R. L. (1990) Conformations of folded proteins in restricted spaces, Biochemistry 29, 3287-3294.
4. Goldstein, R. A., Luthey-Schulten, Z. A., and Wolynes, P. G. (1992) Protein tertiary structure recognition using optimized Hamiltonians with local interactions, Proc. Natl. Acad. Sci. U. S. A. 89, 9029-33.
5. Sippl, M. J. (1993) Boltzmann's principle, knowledge-based mean fields and protein folding. An approach to the computational determination of protein structures, J. Comp.-Aided Mol. Design 7, 473-501.
6. Maiorov, V. N. and Crippen, G. M. (1994) Learning about protein folding via potential functions, Proteins: Struct. Funct. Genet. in press.
7. Maiorov, V. N. and Crippen, G. M. (1992) Contact potential that recognizes the correct folding of globular proteins, J. Mol. Biol. 227, 876-888.
8. Terwilliger, T. and Eisenberg, D. (1982) The structure of melittin. I. Structure determination and partial refinement, J. Biol. Chem. 257, 6016-6022.
9. Srinivasa, B. R. (1984) Sulfhydryl oxidation of reduced insulin in dilute solution, Biochemistry International 9, 523429.
10. Novotny, J., Rashin, A. A., and Bruccoleri, R. E. (1988) Criteria that discriminate between native proteins and incorrectly folded models, Proteins: Struct. Funct. Genet. 4, 19-30.
11. Holm, L. and Sander, C. (1992) Evaluation of protein models by atomic solvation preference, J. Mol. Biol. 225, 93-105.
12. Curmi, P., Cascio, D., Sweet, R., Eisenberg, D., and Schreuder, H. (1992) Crystal structure of the unactivated form of ribulose-1,5-bisphosphate carboxylase/oxygenase from tobacco refined at 2.0-Å resolution, J. Biol. Chem. 267, 16980-16989.
13. Eisenberg, D. personal communication.
14. Fitzgerald, P., McKeever, B., vanMiddlesworth, J., Springer, J., Heimbach, J., Leu, C.-T., Herber, W. K., Dixon, R., and Darke, P. (1990) Crystallographic analysis of a complex between human immunodeficiency virus type 1 protease and acetyl-pepstatin at 2.0-Å resolution, J. Biol. Chem. 265, 14209-14219.
15. Weber, I., Miller, M., Jaskolski, M., Leis, J., Skalka, A., and Wlodawer, A. (1989) Molecular modeling of the HIV-1 protease and its substrate binding site, Science 243, 928-931.
16. Smith, L., Redfield, C., Boyd, J., Lawrence, G., Edwards, R. G., Smith, R. A. G., and Dobson, C. M. (1992) Human interleukin 4: The solution structure of a four-helix-bundle protein, J. Mol. Biol. 224, 899-904.
17. Cohen, F. and Presnell, S. personal communication.
18. Alber, T., Sun, D. P., Wilson, K., Wozniak, J. A., Cook, S. P., and Matthews, B. W. (1987) Contributions of hydrogen bonds of Thr 157 to the thermodynamic stability of phage T4 lysozyme, Nature 330, 41-46.
19. Lathrop, R. and Smith, T. F. (1994) A branch-and-bound algorithm for optimal protein threading with pairwise (contact potential) amino acid interactions, in L. Hunter (ed.), Proceedings of the 27th Annual Hawaii International Conference on System Sciences, Los Alamitos, CA: IEEE Computer Society Press, pp. 365-374.
20. Buturovic, L. (1994) personal communication.