Document not found! Please try again

A self-consistent field optimization approach to build energeti- cally ...

2 downloads 0 Views 520KB Size Report
Boris A. Reva'?' , AIexei V. PinkeIstein3 and Jeffrey Skoh&. Abstract. Lattice modeling of proteins is commonly used in studying the protein folding problem.
A self-consistent

field optimization approach to build energetically and geometrically correct lattice models of proteins

Boris A. Reva’?’, AIexei V. PinkeIstein3 and Jeffrey Skoh&

combined pseudoenergy, one can also use the model

Abstract

building to test the accuracy of potentials:the better the

Lattice modeling of proteins is commonly used in

potentials i.e., the more accuratethe “interaction field”, the

studying the protein folding problem. A finite number of

smaller contribution of the “geometrical field” is required

possible conformations of lattice models enormously

for building accuratelattice models.

facilitates exploration of the conformational space.In this

Introduction

work, we suggesta methodto searchfor the optimal lattice

Lattice modeling of proteins is widely used in the

models that reproduced the off-lattice structures with

numerical investigations of protein folding kinetics and

minimal errorsin geometry and energetics.The methodis

thermodynamics’-4. A

based on the self-consistent field optimization of a

finite

number of possible

conformations of lattice models enormously facilitates

combined pseudoenergyfunction that includes two force

exploration of the conformationalspaceof a molecule.

fields: an “interaction field”, which drives the residuesto

The very first problem in lattice modeling is to build a

optimize the chain energy, and a “geometrical field”,

lattice model, given molecular coordinates and a lattice.

which attracts the residuestowards their native positions.

This model hasto be reasonablyprecisein two respects:in

By varying the contributions of theseforce iields in the

reproducingthe protein chain geometryand in reproducing the protein energetics.This is not a trivial task becausethe

‘Departmentof MolecularBiology, The ScrippsResearch Institute,10666NorthTorreyPinesRoad,Ca92037,USh; (tel:

model also has to satisfy the conditions of chain connectivity and self-avoidin?-‘.

6197848831,fax: 6197848895)

A

geometrically

accurate lattice model that preserveschain connectivity

2on leave from Institute of MathematicalProblemsof Biology RussianAcademyof Scicnccs,142292,Pushchino,

can be built by dynamic programing6=l. Geometrically

MoscowRegion,RussianFederation;

accurate self-avoiding models can be built by n self-

“institute of Protein Research,Russian Academy of Sciences, 142292, Pushchino, Moscow Region, Russian Federation

consistent field (S(X)-bnsed optimization of the error function which includes terms penalizing overapping of the chain residues’.

Permission to make digit&an? copies ofall or part ofthis mata% for pawnal of classroom use is granted without fee provided that the copies are not made or diiioted for profit or commercial advantage, the copyright notice, the title of the publication and its date appear, and notice is - -on of the ACM, Inc. To copyothenvise, girtn that copyright is by perrmsn to republii to post on senws or to redktribute to lists, requires specific pellnksionantiorfee.

RECOMB9sNew YorkNY USA copyright 1998 cl-s9791-97~9~98l33.00

214

Jn this work, we extend the approach of SCF-based optimization for building energetically and geometrically accuratemodels.The method is demonstratedon a coarse cubic lattice of spacing3.8& which is difficult for building

satisfactorymodels.

I

(Xi - R)

M&OdS f,(R)

(a) A comEned energy function for hzttice models

=

if R is one of the lattice points surrounding Xi

‘(3)

I+ 03, otherwise

Let us consider a chain of N monomers (residues al,..., aN) with 3D coordinatesx1,...,+

2

(for simplicity

we will assume further that Xi is given by Cc-atom

gives the deviation of the model from the actual 3D

coordinate of ith residue of a protein chain; a

structure. The smaller the error function, the better the

generalization for chains with side groups can be done

model. One can see that the standardroot mean squared

according to Ref.7) and let, vectors Ri, i = I ,..., N,

deviation of the lattice model with respect to the native

give the lattice points correspondingto theseresidues in

structure(RMSD) is :E,,,/N)l”.

the lattice chain model.

The condition that Ri must be one of the lattice points

To optimize a lattice model of a protein chain with

surrounding Xi is specified for computational efficiency.

respect to both geometry and energy, we minimke a

(Theoretically, one can consider all lattice points as

pseudoenergyfunction consisting of three terms: the term

allowed for eachof the chain residues.)

mainmining the chain connectivity, geometrical error and

In this study, we allow the points Ri to belong to only

chain energy.

the first shell of 8 lattice points surrounding the point

Xi.

Our experiments show (see below) that the first shell is

The chain connectivity condition is included by the

sufficient for building continuouslattice models.

tHIIlS:

To take into account both lattice model energy, ECaI, RI,..., aN, RN},

and geometrical accuracy

simultaneously, we suggest the following combined

I

+=,

pseudoenergyfunction:

otherwise

N-I

where

i = I,...,N-

pi = lXi-Xi+Il

I _

ht

this

VCal, RI-

expression

is the actual distancebetweenresidues

i and i + 1 (for a protein or-carbonchain without cys-Pro residues &3-S&

aN’ RN } =

AECal, Rp---, aN’ RN )+(‘-A)

of inter-residue distancein the lattice model from its actual

It is easyto seethat by changingA from 0 to 1 one can

A is a lattice spacing.

scanall the possiblecasesbetweenthe most geometrically

The geometricalerror ftmctiorP, presentedas:

accuratemodelsand the lowest energymodels. Thus, the problem is to find the minimum of the

N

EeJR,, ---2~) = C fi(Ri) ,

C fi(Ri) i=l

and y limits the allowed deviation

value; in this work, y = A/2,

C ‘i(Ri’Ri+I)+ i= 1 N

pseudoenergyV;

(2)

min --- mm . V{al, RI,..., aN, RN> =

i=I

Rl

where

RN * * V aI, Rl,..-, alv, RN

215

(5)

and to obtain the lattice model coordinates R;,..., Ri

The idea of the SCF approximationis to representthe

correspondingto this minimum. Then one can find the * * model chain energy E{al, RI,..: aNp = N-5

N Ui(Ri, Ri + 1I+ C yi(Ri) i=l

Here Y,(R)

N

(7)

f i(R) +AYi(R)

=

isthepotential

acting on a residue i at a point R. The term AYi (which modifks the potential fi) is the averagepotential “felt” by

N-2

c

p1

aiai + 2

i=I N-3

(IRi-Ri+21)+

h(3) aiai+3(lRi-Ri+31)+

i= c c

1

otherresiduesin space. The distribution of residues is given by functions

{FV,-(R)}, G=l,...,N); ~j(R) (6)

is a probability that

residuej occupieslattice point R. The force field createdby this distribution of residues in space is given by the

N-4

h(4) aiai+4(lRimRi+41)+

potential:

i= 1 N-2 _. c

a residue i at a point R under a given distribution of the

Ayi(R) = b~+l(IRi-Ri+21)

i=l N-3

5 j#i, i-l,i+

C&a a (IR-rI)wj(rl+ ij r 1

4

b(3) ai+lai+2(lRi-Ri+31) c i=1 The energy terms E, h”‘, hC3), ht4’, b(2) and b(3) are described in the legend to Fig.1

(b) SCF-based optimization jkcti4m

of the energy

The energyfunction in the form given by Equation (4) cannot be

* - - kl (as iu Refs.67) using dynamic

programmingbecausethe “long-range” terms of Equation (6) dependon coordinatesof non-neighborresidues. However, one can use a self-consistentfield (SCF) ~eorys’loJl to minimize sucha function.

c

O(N-i-k+l)

.

k=2 &k) c aiai+k(lR-~)Wi+k(r)+ T 8(i-2)Cbg2

r

l (IR-d)wi-2(~)

(8)

+

o(N-i-~)Cb~)+l(IR-rI)Wi+Z(r)+ r Wi-31~b~~2ai r

l(IR-d)Wi-3(rl+

Here O(Z) =

I,l>O

Then one usesthe updated CAY”} { Yi}

0, othewise -

iteration numbers when a value is chosencorrectly, that is

short-range and long-range interactions in Equation (9)

small enough. As a rule, we take ~0.8, but sometimes

equally. The form of the energyfunction m-7) enablesthe

haveto decreaseit to eO.3 when otherwise we observean

use of 1D statistical mechanics of chain molecules to potentials {Yi(R)}

potentials and then calculates WT* , { AY;}

values. According to general theoH lo, efore selfconsistency is attained, F always decreaseswith the

For the sake of computational efficiency we treat

compute the probabilities {w:(R)}

val es to obtain

increaseof I;b comparedwith Fr-‘.

provided the

The self-consistent solution is obtained when the

are given. The corresponding

algorithm can be found in Ref8, Equations Q-(13). As a

probabilities {W’}

result, one obtains {W:(X)},

This meansthat a self-consistent field is found and free

for distribution

the probability functions

of residues in

free energy, F, of this distribution. This

self-

consistent

found

solution

iteratively:

one

initial

field

i-e, randomly the

be with

AY$‘(R)

the

{q=‘}

next

step

starting field. Below we show that this dependenceis minor in our calculations.

with

),

(c)A temperatureprotocolfor lattice modeks search

and

probabilities. of

the

is a set of probabilities {W,(R)),

takes

(usually

i=I,...N. To get a

unique model one needsto decreasea temperatureto zero.

= cxAYy=-’ (R) + (I - tx)AY;-l(R) O-xx1

the lowest energy

A solution of SCF-equationsat non-zerotemperatures

iteration

one

AY;(R)

In principle, the SCF solution can depend on the

_ _ ,N) ,

(or

is achieved; typically it takes -20-40

iterations.

some

(i=l,

AY?i=‘(R) ,

(s12) where

starts

generated

obtains In

can

A’3?ic1(R) = 0

with

energy minims

field {‘Yi} , and

no longer change upon iteration.

Thus,

cx-O-5-0.9).

one needs

procedure

a

:

to

to start

use

an annealing

at

some T #O ,

Fig. 1. Long-range and short-rangeinteractions: residuesfor which potentials are derived are shown by filled circles. (a) long-range interactions: Ii - ji 2 5 ; the potential &ai,(l)

dependson the distance 1 between

remote chain residues and on chemical sorts of these residues ai and aj; (b) short-range potentials

c&&

“A ii-l i+2 A i

i+3 ii-l i+2 i+3

A-CL i

i+l

i-k2

1

hgii,,(lRi-Ri+,I)

h$i+3ClRi-Ri+jl)~

h~~i+,( IRi - Ri + 41) dependon the distancebetween

i i-l-1 A i-l

2

terminal residues and their chemical sorts; (c) short-

I,

raw

i+2

potentials

‘~‘(lRi-~-Ri+,l)

ad

b~~i+,( IRi_ I - Ri + 21) dependon chain bending in the intervening residuesi (or i and i+l) that affectsthe distancebetweenthe terminal residuesi-l, i+l.

i-r-4 217

.

Rhm(A)

Self-avoidance(%)

4

3

T=O.5 T=O.O I

.

.

0 2 4

.

.

.

.

.

1

...+

0*

6 8 IO 12 14 16 18 20 Iterations

0.0 0.1 0.2 0.3 0.4 0.5 0.6 0.7 0.8 0.9 1.0 weighting factor A

S

T=l.O

30

20

IO 0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.6

0.9

1.0

weighting factor A

0 0 2 4 6 8 IO 12 14 16 18 20 Iterations

Fig. 2. Energy and entropy of a cram~m mokcule

Fig. 3.

as a fimction of number of iterations in SCF-optimkation. The calculation are done with the combined energy function (5) at A=O.9 on a lattice of 3.S& The protocols shown by filled, dashedand dotted lines correspond to three different randomly assignedstarting fields; the temperature protocol used in annealing: T=l.O until the SCF solution is obtained; then T=O.5 until the new SCF-solution is not obtained; and then T=O

(a) RMSD (fill ed circles) and percentage of self-avoiding models (open circles) as a function of the weighting factor A. The results are averaged over 10 proteins (PDB codes: lcm (46); ldtx (58); lptx (64); 2ctx (71); lcks (78); 2bop (85); 7pcy (98); 3sic (107); lbnd (109); 2rsl (120); chain lengths are given in parentheses);for each of the proteins 100 lattice models were built on a lattice of spacing 3.SA using the constraint of 8 allowed lattice points per residue; (b) Averaged per residue energy (filled circles) and the range of dispersion (shown by the dashedlines) as a function of the weighting factor A; the mean energy of the actual off-lattice structures(-0.83) is given by the thin line.

to obtain the correspondingSCF solution, then decrease temperature, obtain a new solution, etc., until zero

nly one of the lowest energychain pathways.

Results and Discussion

temperature is reached. To calculate the chain

(a) Choice of the temperature protocol

distribution at T = 0 , we usethe statisticalmechanics

In general, the chain models found by the SCF-

of 1D systemsespeciallyadapted to zero temperature12. This approach finds all the lowest energy pathways

optimization depend on the starting field and on the

while taking into account a possibility of ground state

temperature protocol. This dependency results from

degeneracy.Finally, dynamic programming singles out

using of a self-consistentfield approximationlo-*I.

218

Only de ‘iTinteractionfield” (see Equation (9)) is a

lower than the true off-lattice energy. The RMSD of the

A = 0, (Equation

lattice models built at A - 1 approachesthe maximal

subject of this approximation. When

. 4), this field doesnot act; in this caseonecantake T = 0

possible deviation 3.2L

- A when 8 lattice points of

from the very beginning (or that is absolutely the same,

the first shells are allowed per residue and even greater

one use dynamic programming) and come to the lowest

when two (43=64 lattice points) or three shells (63=216

When A # 0,

different

lattice points) are allowed. Fig.3 also shows the reduction

temperature protocols have different qualities. For

in the number of self-avoiding models at A 2 0.8, i.e.

example, one should not start at too low temperatures

when the energy term dominates.One of the reasonsfor

becauseit will trap de molecule in local minima. Our

this reduction is the use of a too narrow “tube” (only 8

experiments show that one has to start with a moderate

lattice points per residue in width) accurate enough to

temperature (e.g. with T = 1

select the native structure, although they gave quite

pseudoenergy solution.

and decrease this

temperaturegradually-

reasonableresults in recognition of the native structure in threadingg.

Figure 2 shows energy and entropy changesin the

Thus, the SCF-based optimization algorithm for

course of iterations for the temperature protocol which

lattice model building appearsto provide a more severe

turned out to be one of the best.

testfor lattice potentials.

We checked this temperature protocol using 100 different randomly chosenstarting fields and found for a

Conclusion

crambm molecule at A = I that the dispersion of the

In this work we have suggested and tested the new

lattice model energiesis -1 (m RT, tits). This is smaller

approachfor building lattice models of protein structure.

than the energyvariation causedby different lattice-protein

The method builds lattice models using a SCF-based

orientation -4.

(3) Searchfor the optiml

optimization of the combined pseudoenergy energy

models

function which includes both potential energy and geometricalconstraintsterms (error function). geometrical

In Figure 3 we presentaveragedgeometricalaccuracy

constraintsterms (error function).

(RMSD), averageresidueenergyand dispersion,and also a fraction of seIf-avoiding allowed to the chain to searchfor

We have found the optimal combination of the energy

the lowest energyconformation.However, a deeperreason

and the geometrical constraints and have shown that one

is that the employed lattice potentials are not models as

can reproduceoff-lattice structureswith minimal errors in

functions of the weighting factor A of the pseudoenergy

geometryand energetics.The obtainedmodelscan be used

(seew-4). One can see that at A = 0.7 the SCF-based

as target structuresin protein folding simulations held on

optimization algorithm builds a self-avoiding, and rather

3D lattices.

accurate(geometrically and energetically) lattice models-

Acknowledgments

However, one can see also that at A 2 0.8 when the

This work was supportedby NIH Grant GM48835 (to

energy term dominates the pseudoenergy(4) the protein

JS). AVF acknowledges support by an International

chain choosesan optimal lattice model which rather far

ResearchScholar’s Award No. 75195-544702 from the

from the true off-lattice chain pathway, i.e. RMSD is large.

Howard Hughes Medical Institute and by NIH Fogarty

Theenergy of such a lattice model is significantly

ResearchCollaboration Grant No. TWO0546. 219

References [l] Dashevskii, V.G. Mol.Biol. (Moscow) 14,105, (1980)

[2] Covell, D., Jemigan, R. Biochemisrry 29,3287, (1990) [3] Hind, D., Levitt, M. J.MoZ.BioZ.243,668, (1994)

[4] Kolinski, A, Skolnick, J. Yuttice Models of Protein Folding, Dynamics and Thermodynamics’*, RGLandes Co., Austin, (1996) [s] Godzik, A, Kolinski, A, Skotick, J. JXomChem. 14, 1194, (1992)

161Rykunov, D.S., Reva, B-A, Finkelsteiu, AV. Proteins 22,100,(1995) [7] Reva, B.A., Rykunov, D-S., Olson, AJ., Finkelsteiu,

.

AV. J. Corn BioL 2,527, (1995)

[8] Reva, B-A, Fir&e&in,

AV., Rykunov, D-S., Olson,

AL, Proteins 26,1, (1996) [9] Reva, B.A, Ftielstein,

AV., Sanner, M-F., Olson A.J,

Skolnick, J. ProtEngng. in press, (1997) [lo] Fiukelsteiu, AV., Reva, B.A ProtEngng. 9, 387, (1996) [ll]

Kubo, R “Sfatisicul Physics” Amsterdam: North-

Holand Publishing, (1965)

220

Suggest Documents