Boris A. Reva'?' , AIexei V. PinkeIstein3 and Jeffrey Skoh&. Abstract. Lattice modeling of proteins is commonly used in studying the protein folding problem.
A self-consistent
field optimization approach to build energetically and geometrically correct lattice models of proteins
Boris A. Reva’?’, AIexei V. PinkeIstein3 and Jeffrey Skoh&
combined pseudoenergy, one can also use the model
Abstract
building to test the accuracy of potentials:the better the
Lattice modeling of proteins is commonly used in
potentials i.e., the more accuratethe “interaction field”, the
studying the protein folding problem. A finite number of
smaller contribution of the “geometrical field” is required
possible conformations of lattice models enormously
for building accuratelattice models.
facilitates exploration of the conformational space.In this
Introduction
work, we suggesta methodto searchfor the optimal lattice
Lattice modeling of proteins is widely used in the
models that reproduced the off-lattice structures with
numerical investigations of protein folding kinetics and
minimal errorsin geometry and energetics.The methodis
thermodynamics’-4. A
based on the self-consistent field optimization of a
finite
number of possible
conformations of lattice models enormously facilitates
combined pseudoenergyfunction that includes two force
exploration of the conformationalspaceof a molecule.
fields: an “interaction field”, which drives the residuesto
The very first problem in lattice modeling is to build a
optimize the chain energy, and a “geometrical field”,
lattice model, given molecular coordinates and a lattice.
which attracts the residuestowards their native positions.
This model hasto be reasonablyprecisein two respects:in
By varying the contributions of theseforce iields in the
reproducingthe protein chain geometryand in reproducing the protein energetics.This is not a trivial task becausethe
‘Departmentof MolecularBiology, The ScrippsResearch Institute,10666NorthTorreyPinesRoad,Ca92037,USh; (tel:
model also has to satisfy the conditions of chain connectivity and self-avoidin?-‘.
6197848831,fax: 6197848895)
A
geometrically
accurate lattice model that preserveschain connectivity
2on leave from Institute of MathematicalProblemsof Biology RussianAcademyof Scicnccs,142292,Pushchino,
can be built by dynamic programing6=l. Geometrically
MoscowRegion,RussianFederation;
accurate self-avoiding models can be built by n self-
“institute of Protein Research,Russian Academy of Sciences, 142292, Pushchino, Moscow Region, Russian Federation
consistent field (S(X)-bnsed optimization of the error function which includes terms penalizing overapping of the chain residues’.
Permission to make digit&an? copies ofall or part ofthis mata% for pawnal of classroom use is granted without fee provided that the copies are not made or diiioted for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is - -on of the ACM, Inc. To copyothenvise, girtn that copyright is by perrmsn to republii to post on senws or to redktribute to lists, requires specific pellnksionantiorfee.
RECOMB9sNew YorkNY USA copyright 1998 cl-s9791-97~9~98l33.00
214
Jn this work, we extend the approach of SCF-based optimization for building energetically and geometrically accuratemodels.The method is demonstratedon a coarse cubic lattice of spacing3.8& which is difficult for building
satisfactorymodels.
I
(Xi - R)
M&OdS f,(R)
(a) A comEned energy function for hzttice models
=
if R is one of the lattice points surrounding Xi
‘(3)
I+ 03, otherwise
Let us consider a chain of N monomers (residues al,..., aN) with 3D coordinatesx1,...,+
2
(for simplicity
we will assume further that Xi is given by Cc-atom
gives the deviation of the model from the actual 3D
coordinate of ith residue of a protein chain; a
structure. The smaller the error function, the better the
generalization for chains with side groups can be done
model. One can see that the standardroot mean squared
according to Ref.7) and let, vectors Ri, i = I ,..., N,
deviation of the lattice model with respect to the native
give the lattice points correspondingto theseresidues in
structure(RMSD) is :E,,,/N)l”.
the lattice chain model.
The condition that Ri must be one of the lattice points
To optimize a lattice model of a protein chain with
surrounding Xi is specified for computational efficiency.
respect to both geometry and energy, we minimke a
(Theoretically, one can consider all lattice points as
pseudoenergyfunction consisting of three terms: the term
allowed for eachof the chain residues.)
mainmining the chain connectivity, geometrical error and
In this study, we allow the points Ri to belong to only
chain energy.
the first shell of 8 lattice points surrounding the point
Xi.
Our experiments show (see below) that the first shell is
The chain connectivity condition is included by the
sufficient for building continuouslattice models.
tHIIlS:
To take into account both lattice model energy, ECaI, RI,..., aN, RN},
and geometrical accuracy
simultaneously, we suggest the following combined
I
+=,
pseudoenergyfunction:
otherwise
N-I
where
i = I,...,N-
pi = lXi-Xi+Il
I _
ht
this
VCal, RI-
expression
is the actual distancebetweenresidues
i and i + 1 (for a protein or-carbonchain without cys-Pro residues &3-S&
aN’ RN } =
AECal, Rp---, aN’ RN )+(‘-A)
of inter-residue distancein the lattice model from its actual
It is easyto seethat by changingA from 0 to 1 one can
A is a lattice spacing.
scanall the possiblecasesbetweenthe most geometrically
The geometricalerror ftmctiorP, presentedas:
accuratemodelsand the lowest energymodels. Thus, the problem is to find the minimum of the
N
EeJR,, ---2~) = C fi(Ri) ,
C fi(Ri) i=l
and y limits the allowed deviation
value; in this work, y = A/2,
C ‘i(Ri’Ri+I)+ i= 1 N
pseudoenergyV;
(2)
min --- mm . V{al, RI,..., aN, RN> =
i=I
Rl
where
RN * * V aI, Rl,..-, alv, RN
215
(5)
and to obtain the lattice model coordinates R;,..., Ri
The idea of the SCF approximationis to representthe
correspondingto this minimum. Then one can find the * * model chain energy E{al, RI,..: aNp = N-5
N Ui(Ri, Ri + 1I+ C yi(Ri) i=l
Here Y,(R)
N
(7)
f i(R) +AYi(R)
=
isthepotential
acting on a residue i at a point R. The term AYi (which modifks the potential fi) is the averagepotential “felt” by
N-2
c
p1
aiai + 2
i=I N-3
(IRi-Ri+21)+
h(3) aiai+3(lRi-Ri+31)+
i= c c
1
otherresiduesin space. The distribution of residues is given by functions
{FV,-(R)}, G=l,...,N); ~j(R) (6)
is a probability that
residuej occupieslattice point R. The force field createdby this distribution of residues in space is given by the
N-4
h(4) aiai+4(lRimRi+41)+
potential:
i= 1 N-2 _. c
a residue i at a point R under a given distribution of the
Ayi(R) = b~+l(IRi-Ri+21)
i=l N-3
5 j#i, i-l,i+
C&a a (IR-rI)wj(rl+ ij r 1
4
b(3) ai+lai+2(lRi-Ri+31) c i=1 The energy terms E, h”‘, hC3), ht4’, b(2) and b(3) are described in the legend to Fig.1
(b) SCF-based optimization jkcti4m
of the energy
The energyfunction in the form given by Equation (4) cannot be
* - - kl (as iu Refs.67) using dynamic
programmingbecausethe “long-range” terms of Equation (6) dependon coordinatesof non-neighborresidues. However, one can use a self-consistentfield (SCF) ~eorys’loJl to minimize sucha function.
c
O(N-i-k+l)
.
k=2 &k) c aiai+k(lR-~)Wi+k(r)+ T 8(i-2)Cbg2
r
l (IR-d)wi-2(~)
(8)
+
o(N-i-~)Cb~)+l(IR-rI)Wi+Z(r)+ r Wi-31~b~~2ai r
l(IR-d)Wi-3(rl+
Here O(Z) =
I,l>O
Then one usesthe updated CAY”} { Yi}
0, othewise -
iteration numbers when a value is chosencorrectly, that is
short-range and long-range interactions in Equation (9)
small enough. As a rule, we take ~0.8, but sometimes
equally. The form of the energyfunction m-7) enablesthe
haveto decreaseit to eO.3 when otherwise we observean
use of 1D statistical mechanics of chain molecules to potentials {Yi(R)}
potentials and then calculates WT* , { AY;}
values. According to general theoH lo, efore selfconsistency is attained, F always decreaseswith the
For the sake of computational efficiency we treat
compute the probabilities {w:(R)}
val es to obtain
increaseof I;b comparedwith Fr-‘.
provided the
The self-consistent solution is obtained when the
are given. The corresponding
algorithm can be found in Ref8, Equations Q-(13). As a
probabilities {W’}
result, one obtains {W:(X)},
This meansthat a self-consistent field is found and free
for distribution
the probability functions
of residues in
free energy, F, of this distribution. This
self-
consistent
found
solution
iteratively:
one
initial
field
i-e, randomly the
be with
AY$‘(R)
the
{q=‘}
next
step
starting field. Below we show that this dependenceis minor in our calculations.
with
),
(c)A temperatureprotocolfor lattice modeks search
and
probabilities. of
the
is a set of probabilities {W,(R)),
takes
(usually
i=I,...N. To get a
unique model one needsto decreasea temperatureto zero.
= cxAYy=-’ (R) + (I - tx)AY;-l(R) O-xx1
the lowest energy
A solution of SCF-equationsat non-zerotemperatures
iteration
one
AY;(R)
In principle, the SCF solution can depend on the
_ _ ,N) ,
(or
is achieved; typically it takes -20-40
iterations.
some
(i=l,
AY?i=‘(R) ,
(s12) where
starts
generated
obtains In
can
A’3?ic1(R) = 0
with
energy minims
field {‘Yi} , and
no longer change upon iteration.
Thus,
cx-O-5-0.9).
one needs
procedure
a
:
to
to start
use
an annealing
at
some T #O ,
Fig. 1. Long-range and short-rangeinteractions: residuesfor which potentials are derived are shown by filled circles. (a) long-range interactions: Ii - ji 2 5 ; the potential &ai,(l)
dependson the distance 1 between
remote chain residues and on chemical sorts of these residues ai and aj; (b) short-range potentials
c&&
“A ii-l i+2 A i
i+3 ii-l i+2 i+3
A-CL i
i+l
i-k2
1
hgii,,(lRi-Ri+,I)
h$i+3ClRi-Ri+jl)~
h~~i+,( IRi - Ri + 41) dependon the distancebetween
i i-l-1 A i-l
2
terminal residues and their chemical sorts; (c) short-
I,
raw
i+2
potentials
‘~‘(lRi-~-Ri+,l)
ad
b~~i+,( IRi_ I - Ri + 21) dependon chain bending in the intervening residuesi (or i and i+l) that affectsthe distancebetweenthe terminal residuesi-l, i+l.
i-r-4 217
.
Rhm(A)
Self-avoidance(%)
4
3
T=O.5 T=O.O I
.
.
0 2 4
.
.
.
.
.
1
...+
0*
6 8 IO 12 14 16 18 20 Iterations
0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 weighting factor A
S
T=l.O
30
20
IO 0.0
0.1
0.2
0.3
0.4
0.5
0.6
0.7
0.6
0.9
1.0
weighting factor A
0 0 2 4 6 8 IO 12 14 16 18 20 Iterations
Fig. 2. Energy and entropy of a cram~m mokcule
Fig. 3.
as a fimction of number of iterations in SCF-optimkation. The calculation are done with the combined energy function (5) at A=O.9 on a lattice of 3.S& The protocols shown by filled, dashedand dotted lines correspond to three different randomly assignedstarting fields; the temperature protocol used in annealing: T=l.O until the SCF solution is obtained; then T=O.5 until the new SCF-solution is not obtained; and then T=O
(a) RMSD (fill ed circles) and percentage of self-avoiding models (open circles) as a function of the weighting factor A. The results are averaged over 10 proteins (PDB codes: lcm (46); ldtx (58); lptx (64); 2ctx (71); lcks (78); 2bop (85); 7pcy (98); 3sic (107); lbnd (109); 2rsl (120); chain lengths are given in parentheses);for each of the proteins 100 lattice models were built on a lattice of spacing 3.SA using the constraint of 8 allowed lattice points per residue; (b) Averaged per residue energy (filled circles) and the range of dispersion (shown by the dashedlines) as a function of the weighting factor A; the mean energy of the actual off-lattice structures(-0.83) is given by the thin line.
to obtain the correspondingSCF solution, then decrease temperature, obtain a new solution, etc., until zero
nly one of the lowest energychain pathways.
Results and Discussion
temperature is reached. To calculate the chain
(a) Choice of the temperature protocol
distribution at T = 0 , we usethe statisticalmechanics
In general, the chain models found by the SCF-
of 1D systemsespeciallyadapted to zero temperature12. This approach finds all the lowest energy pathways
optimization depend on the starting field and on the
while taking into account a possibility of ground state
temperature protocol. This dependency results from
degeneracy.Finally, dynamic programming singles out
using of a self-consistentfield approximationlo-*I.
218
Only de ‘iTinteractionfield” (see Equation (9)) is a
lower than the true off-lattice energy. The RMSD of the
A = 0, (Equation
lattice models built at A - 1 approachesthe maximal
subject of this approximation. When
. 4), this field doesnot act; in this caseonecantake T = 0
possible deviation 3.2L
- A when 8 lattice points of
from the very beginning (or that is absolutely the same,
the first shells are allowed per residue and even greater
one use dynamic programming) and come to the lowest
when two (43=64 lattice points) or three shells (63=216
When A # 0,
different
lattice points) are allowed. Fig.3 also shows the reduction
temperature protocols have different qualities. For
in the number of self-avoiding models at A 2 0.8, i.e.
example, one should not start at too low temperatures
when the energy term dominates.One of the reasonsfor
becauseit will trap de molecule in local minima. Our
this reduction is the use of a too narrow “tube” (only 8
experiments show that one has to start with a moderate
lattice points per residue in width) accurate enough to
temperature (e.g. with T = 1
select the native structure, although they gave quite
pseudoenergy solution.
and decrease this
temperaturegradually-
reasonableresults in recognition of the native structure in threadingg.
Figure 2 shows energy and entropy changesin the
Thus, the SCF-based optimization algorithm for
course of iterations for the temperature protocol which
lattice model building appearsto provide a more severe
turned out to be one of the best.
testfor lattice potentials.
We checked this temperature protocol using 100 different randomly chosenstarting fields and found for a
Conclusion
crambm molecule at A = I that the dispersion of the
In this work we have suggested and tested the new
lattice model energiesis -1 (m RT, tits). This is smaller
approachfor building lattice models of protein structure.
than the energyvariation causedby different lattice-protein
The method builds lattice models using a SCF-based
orientation -4.
(3) Searchfor the optiml
optimization of the combined pseudoenergy energy
models
function which includes both potential energy and geometricalconstraintsterms (error function). geometrical
In Figure 3 we presentaveragedgeometricalaccuracy
constraintsterms (error function).
(RMSD), averageresidueenergyand dispersion,and also a fraction of seIf-avoiding allowed to the chain to searchfor
We have found the optimal combination of the energy
the lowest energyconformation.However, a deeperreason
and the geometrical constraints and have shown that one
is that the employed lattice potentials are not models as
can reproduceoff-lattice structureswith minimal errors in
functions of the weighting factor A of the pseudoenergy
geometryand energetics.The obtainedmodelscan be used
(seew-4). One can see that at A = 0.7 the SCF-based
as target structuresin protein folding simulations held on
optimization algorithm builds a self-avoiding, and rather
3D lattices.
accurate(geometrically and energetically) lattice models-
Acknowledgments
However, one can see also that at A 2 0.8 when the
This work was supportedby NIH Grant GM48835 (to
energy term dominates the pseudoenergy(4) the protein
JS). AVF acknowledges support by an International
chain choosesan optimal lattice model which rather far
ResearchScholar’s Award No. 75195-544702 from the
from the true off-lattice chain pathway, i.e. RMSD is large.
Howard Hughes Medical Institute and by NIH Fogarty
Theenergy of such a lattice model is significantly
ResearchCollaboration Grant No. TWO0546. 219
References [l] Dashevskii, V.G. Mol.Biol. (Moscow) 14,105, (1980)
[2] Covell, D., Jemigan, R. Biochemisrry 29,3287, (1990) [3] Hind, D., Levitt, M. J.MoZ.BioZ.243,668, (1994)
[4] Kolinski, A, Skolnick, J. Yuttice Models of Protein Folding, Dynamics and Thermodynamics’*, RGLandes Co., Austin, (1996) [s] Godzik, A, Kolinski, A, Skotick, J. JXomChem. 14, 1194, (1992)
161Rykunov, D.S., Reva, B-A, Finkelsteiu, AV. Proteins 22,100,(1995) [7] Reva, B.A., Rykunov, D-S., Olson, AJ., Finkelsteiu,
.
AV. J. Corn BioL 2,527, (1995)
[8] Reva, B-A, Fir&e&in,
AV., Rykunov, D-S., Olson,
AL, Proteins 26,1, (1996) [9] Reva, B.A, Ftielstein,
AV., Sanner, M-F., Olson A.J,
Skolnick, J. ProtEngng. in press, (1997) [lo] Fiukelsteiu, AV., Reva, B.A ProtEngng. 9, 387, (1996) [ll]
Kubo, R “Sfatisicul Physics” Amsterdam: North-
Holand Publishing, (1965)
220