Computer Physics Communications 167 (2005) 34–42 www.elsevier.com/locate/cpc
AFMM: A molecular mechanics force field vibrational parametrization program ✩ A.C. Vaiana 1 , Z. Cournia, I.B. Costescu, J.C. Smith ∗ Computational Molecular Biophysics, Interdisciplinary Center for Scientific Computing (IWR), Im Neuenheimer Feld 368, Universität Heidelberg, 69120 Heidelberg, Germany Received 5 July 2004; accepted 14 December 2004
Abstract AFMM (Automated Frequency Matching Method) is a program package for molecular mechanics force field parametrization. The method used fits the molecular mechanics potential function to both vibrational frequencies and eigenvector projections derived from quantum chemical calculations. The program optimizes an initial parameter set (either pre-existing or using chemically-reasonable estimation) by iteratively changing them until the optimal fit with the reference set is obtained. By implementing a Monte Carlo-like algorithm to vary the parameters, the tedious task of manual parametrization is replaced by an efficient automated procedure. The program is best suited for optimization of small rigid molecules in a well-defined energy minimum, for which the harmonic approximation to the energy surface is appropriate for describing the intra-molecular degrees of freedom. Program summary Title of program: AFMM Catalogue identifier: ADUZ Program summary URL: http://cpc.cs.qub.ac.uk/summaries/ADUZ Program obtainable from: CPC Program Library, Queen’s University of Belfast, N. Ireland Computer: x86 PC, SGI, Sun Microsystems Operating system: GNU/Linux, BSD, IRIX, Solaris Programming language used: Python Memory required: 10 Mbytes
✩ This paper and its associated computer program are available via the Computer Physics Communications homepage on ScienceDirect (http://www.sciencedirect.com/science/journal/00104655). * Corresponding author. Tel.: +49-6221-548857; fax: +49-6221-54886. E-mail address:
[email protected] (J.C. Smith). 1 Present address: Institut de Biologie Moléculaire et Cellulaire CNRS (UPR 9002, SMBMR) 15, rue René Descartes, 67084 Strasbourg Cedex.
0010-4655/$ – see front matter 2005 Elsevier B.V. All rights reserved. doi:10.1016/j.cpc.2004.12.005
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
35
No. of bits in a word: 32 or 64 No. of processors used: 1 Parallelized?: No No. of lines in distributed program, including test data, etc.: 13 127 No. of bytes in distributed program, including test data, etc.: 182 550 Distribution format: tar.gz Typical running time: 24 h Nature of the physical problem: Molecular mechanics force field parametrization. Method of solution: Fitting of the molecular mechanics potential to normal modes derived from quantum chemical calculations. The missing force field parameters are optimized via a merit function to obtain the optimal fit with the reference quantum mechanical set. 2005 Elsevier B.V. All rights reserved. PACS: 87.15; 87.10; 02.70.N; 33.20.Tp Keywords: Molecular mechanics; Force field parametrization; Parameter optimization; Normal mode analysis
1. Introduction During a Molecular Dynamics (MD) run, the potential energy of the system is described by an empirical molecular mechanics energy function or “force field”. The functional form of the force field generally includes a set of empirical parameters which are system dependent and must be tuned on a new system or molecule prior to performing simulations. This tuning step is generally referred to as parametrization of the force field. The reliability of a molecular mechanics calculation is dependent on both the functional form of the force field and on the numerical values of the parameters implemented in the force field itself. Thus, the first necessary step for a reliable MD simulation is the parametrization procedure. A typical empirical potential energy function is given by Eq. (1) [1] Kb (b − b0 )2 + Kub (s − s0 )2 + Kθ (θ − θ0 )2 + Kχ 1 + cos(nχ − χ0 ) V (rN ) = ub
bonds
+
impropers
Kφ (φ − φ0 )2 +
nonbond
ij
angles
Rijmin 12 rij
−
dihedrals
Rijmin 6 rij
+
qi qj , Drij
(1)
where Kb , Kub , Kθ , Kχ , Kφ are, respectively, the bond, Urey–Bradley, angle, dihedral and improper dihedral constants. b, s, θ , χ , and φ represent, respectively, bond length, Urey–Bradley 1–3 distance, bond angle, dihedral angle and improper torsion angle (the subscript zero is used to represent the corresponding equilibrium value). Nonbonded interactions between pairs of atoms (labeled i and j ) at a relative distance rij are described by the Lennard–Jones 6–12 (LJ) and Coulomb interaction terms. Rijmin and ij are, respectively, the distance between atoms i and j at which the LJ potential is minimum and the depth of the LJ potential well for the same pair of atoms. D is the effective dielectric constant and qi the partial atomic charge on atom i. General procedures for automated parametrization of molecular mechanics (MM) force fields based on fitting to any given set of reference data involve the following main steps [2,3]: • • • • •
Definition of a merit function based on the available reference data. Choice of atom types to be used and definition of new atom types when needed. Careful choice of initial parameter values. Refinement of parameters (optimization of the merit function). Testing and validation of the final parameter set.
36
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
2. Theoretical background AFMM (Automated Frequency Matching Method) provides an efficient automated way to generate intramolecular force field parameters using normal modes. The method can be, in principle, used with any atom-based molecular mechanics program which has the facility of calculating normal modes and the corresponding eigenvectors. The basic principle behind the AFMM method is to iteratively tune an initial MM (e.g., CHARMM [1]) parameter set in order to reproduce the normal modes generated from a quantum mechanical (QM) calculation. The program refines an initial parameter set, which can either be a pre-existing set or using chemically-reasonable estimation. For parametrization of new molecules, the starting parameters should be based on analogy to other similar existing molecular mechanics parameters and on chemical intuition. Equilibrium values and hybridization of the atoms involved should be carefully taken into account when designing a set of initial parameters. Another way to ensure a good choice of the initial parameter set involves checking it by visual inspection of the motions involved in exchanged eigenvector modes, using the Molden program [4], for example, and manually adjusting the parameters concerned. This approach is particularly useful for critical torsion parameters. In some cases it is necessary to derive initial parameters from rotational potential energy profiles (single point calculations from QM programs) before achieving good optimization. Equilibrium values for bonds b0 , angles, θ0 and dihedrals χ0 can be derived from the quantum chemical ground state structure or from experimental X-ray or NMR structures. A set of partial atomic charges can be computed from the QM packages using various methods as well. Use of AFMM for optimizing van der Waals parameters is not recommended. The van der Waals constants ij and Rijmin depend mostly on atomic properties and are relatively insensitive to changes in the molecular environment. Therefore, they can often be transferred from existing values and should not be modified during refinement. The reference quantum mechanical normal modes can be calculated with any QM programs (e.g., Gaussian 94/98 [5,6], NWChem [7], ADF [8]) and using any levels of theory (e.g., Hartree–Fock, DFT). Frequencies resulting from the quantum calculations often need to be scaled by an “empirical scaling factor” to compensate for approximations in the electronic structure calculations [9]. The choice of reference data upon which to base the MM parametrization is a critical step in the parametrization procedure. The reliability and accuracy of the new parameters in reproducing various properties of the molecule depend on the quality of the reference data. 2.1. Description of the method, the merit function Here we briefly sketch the principles of the parametrization method implemented in the program. For a more detailed theoretical treatment and applications, see Refs. [10,11]. Automated refinement methods are mostly based on optimizing a “merit function”, which usually corresponds to minimizing a weighted sum of square deviations from a set of reference values. One of the major problems of parametrization methods that fit to vibrational frequencies is identifying a calculated mode with the corresponding reference mode. Incorrect mode matching can lead to an unfaithful reproduction of the correct distribution of energy among the intra-molecular modes, and, consequently of the dynamical properties of the molecule. It is therefore important to use a merit function which requires that both the reference frequencies and eigenvectors should coincide with those resulting from the new parameter set. This implies that each eigenvector from the set calculated by molecular mechanics should, in principle, be orthonormal to all but one (its corresponding eigenvector) of the vectors from the reference set. AFMM projects each of the MM eigenvectors {χiM } (where the subscript i indicates the normal mode number and the superscript M indicates that the modes are calculated with MM) onto the reference set of eigenvectors Q
{χi } (the superscript Q indicates that these modes are calculated with a QM program) to find the frequency νjmax Q
corresponding to the highest projection (j : χiM · χi = max) and compares this frequency with the corresponding Q
frequency, νi . In the ideal case νi = νjmax and χiM · χi = δij where δij is the Kroenecker delta. Frequencies νi that deviate from this ideal situation may indicate exchanged eigenvectors or mismatched frequencies. AFMM is based
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
on minimizing the merit function, Y 2 , which in this case, is the deviation from the ideal situation: 2 Y2 = Ωi2 νi − νjmax ,
37
(2)
3N −6
Ωi(1) = (2)
Ωi
=
1 Q
maxj (χiM · χi ) 1 , ωiM
,
(3)
(4)
where N is the number of atoms in the molecule and there are 3N − 6 independent vibrational frequencies. The program allows three possibilities to weight the merit function: (1) The weights Ωi are chosen to be the inverse of the highest eigenvector projection. This has the effect of biasing the merit function, even in the case of a good frequency assignment, such that minimization of Y 2 leads to an improved eigenvector projection distribution (Eq. (3)). (2) The weights Ωi are chosen to be the inverse of the MD frequency. This has the effect of biasing the merit function towards better fitting of the lowest, biologically more relevant frequencies (Eq. (4)). (3) No weights (Ωi = 1). 2.1.1. Parameter refinement For the automatic optimization of the chosen subset of parameters a standard Monte Carlo (MC) scheme is used to minimize Y 2 . Although the subset of parameters to be optimized can be chosen at wish by the user, it is advisable to perform optimizations separately on bond, angle, and torsion constants. At each step i, all chosen parameters are iteratively varied in the MC algorithm with a uniform distribution within a fixed range, Yi2 is evaluated, and, if 2 , the new parameter set is used in the next step, i + 1. The optimization algorithm is illustrated in Fig. 1. Yi2 < Yi−1
Fig. 1. Schematic representation of the optimization algorithm used in the AFMM method. The method iteratively changes the parameters and matches both frequencies and eigenvector projections from the molecular mechanics (CHARMM in this case) normal mode analysis (NMA) with reference QM NMA.
38
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
When comparing results for different molecules, normalization of Y 2 can be rather tedious due to the different weights Ωi . For comparison purposes after minimization of Y 2 , the root-mean-square deviation σ from the reference case is calculated: max 2 3N −6 (νi − νj ) . σ= (5) 3N − 6 For comparison between different molecules or optimizations with different weights, the nonweighted σ should be used. 3. Description of the program The present version of the program is interfaced with the Gaussian 94/98, NWChem 4.5 and older and Molden format output files as input for reading the QM normal modes set and optimizes parameters for the CHARMM program [1]. The program is written in Python and requires Python version 2 or newer; however, it does not require any nonstandard Python modules. The program is composed of only one source code file (afmm.py) which contains definitions of two classes and the normal modes import functions. The param class contains information about the parameter set to be optimized and a Monte Carlo-like method to generate new random values of the parameters that are different from both the starting and the current values. The names of the normal modes import functions are composed of read_ followed by the program name. For the QM output files, there are also functions that identify the type of file, their name being composed of is_ followed by the program name. All the normal modes import functions return lists of nonzero frequencies and corresponding eigenvectors. The afmm class is the core of the program and contains the following methods: • • • • • • • • •
ReadConfig—read the configuration file. WriteNewParams—write new parameters in the CHARMM stream file. WriteStreamedInput—write a new CHARMM input file. RunMD—run CHARMM, checking for normal termination. DotProduct—calculates the dot product between the eigenvectors. TooLow—checks if a value is too close to zero (before division). Compute—matches the modes and computes the merit function. Optimize—main routine that iteratively assigns new values to parameters and minimizes the merit function. OutputResults—writes out the minimum weighted σ (Eq. (5)), the corresponding nonweighted σ , the optimized parameters and the frequency matching file.
The main program consists of only 3 calls: reading the configuration file, calling Optimize for computation then calling OutputResults. The program can be run as follows: python afmm.py
Before running the program the user must verify that the order of the atoms and the orientation of the molecule in the MM and QM files are the same when matching the normal modes. While running, the program will print on standard output the values of the weighted σ that result during the optimization. The program will stop in either of two cases: when the maximum number of steps for which σ remains constant is reached (convergence criterion) or when the maximum number of optimization steps is reached. Upon program completion, the minimum weighted σ , the corresponding nonweighted σ and the final parameter set are printed on standard output and the frequency matching file is created. The frequency matching file contains 2 columns, the first containing the scaled QM frequencies (if a scaling factor was given to the program) and the second containing corresponding MM frequency values.
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
39
3.1. Structure of the AFMM configuration file The AFMM configuration file must be named “afmm.cfg” and has to be located in the working directory. The configuration file is composed of two sections. The first section is called [parameters] and contains information on the parameters to be optimized. The general syntax is: parameter_name = min_value max_value start_value
The parameters to be optimized must be replaced by variables, e.g., @p1, @p2 in the CHARMM parameter file. The parameter names in the configuration file (parameter_name) should correspond to the variable names in the CHARMM parameter file. The first and second numbers are the minimum and maximum values respectively that the parameter can take during the optimization. The third number is the starting value (the initial guess). The numbers can be specified as integers or reals. The items can be separated by any number of whitespaces or tabs. The second section is called [general] and contains settings that control the flow of the program. The settings that are recognized by the current version are listed below: • • • • • • • •
maxsteps—the maximum number of optimization steps for each parameter. maxsigmasteps—convergence criterion. mdexec—path of the CHARMM executable. mdinp—path of the CHARMM input file to be run. qmout—path of the output QM file. qmfactor—empirical scaling factor for the QM frequencies [9]. afmmfile—AFMM frequency matching. weighting—frequency, projection or none (see Eqs. (3) and (4)).
If any errors occur in the configuration file, they will be printed when the program is started; errors in the [parameters] section lead to ignoring the parameter for which they occurred, while errors in the [general] section lead to the termination of the program.
Acknowledgements The authors thank G.M. Ullmann, S. Fischer, M. Sauer, A. Schultz, and J. Wolfrum for fruitful discussions. Z.C. acknowledges funding from BMBF project 03SHE2HD.
Appendix A. Test run This test run will optimize the CE2 CE1 and CE1 CT3 bond force constants for propene. The QM output file and the CHARMM input file are not provided verbatim due to their size. They are however included in the archive submitted to the CPC Program Library. [parameters] p1 = 200.0 1000.0 500.0 p2 = 200.0 600.0 383.0 [general] maxsteps = 20 maxsigmasteps = 100 mdexec = /usr/local/charmm/c28b1/cmf
40
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
mdinp = /home/cournia/propene/propene.inp qmout = /home/cournia/propene/propene_freq.out qmfactor = 0.89 afmmfile = freq.dat weighting = projection
The following output is produced by AFMM: Reading NWChem output file. Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
170.365843, 170.365843, 170.365843, 170.365843, 170.365843,
current current current current current
sigma: sigma: sigma: sigma: sigma:
175.442685 305.851532 174.324009 192.533119 176.844623
Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
170.365843, 170.365843, 170.365843, 170.365843, 170.365843,
current current current current current
sigma: sigma: sigma: sigma: sigma:
219.731076 410.539643 243.455826 362.493006 198.129129
Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
170.365843, 170.365843, 170.365843, 170.365843, 170.365843,
current current current current current
sigma: sigma: sigma: sigma: sigma:
174.542701 208.111639 337.061233 262.165510 413.674609
Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
170.365843, 170.365843, 170.365843, 170.365843, 170.365843,
current current current current current
sigma: sigma: sigma: sigma: sigma:
200.022441 173.682637 215.698296 180.699470 193.090005
Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
170.365843, 170.365843, 170.365843, 164.750416, 164.750416,
current current current current current
sigma: sigma: sigma: sigma: sigma:
171.177135 190.924645 164.750416 198.865795 386.049561
Minimum Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma: sigma:
164.750416, 164.750416, 164.750416, 164.750416, 164.750416,
current current current current current
sigma: sigma: sigma: sigma: sigma:
313.314267 182.387657 255.651696 170.806263 162.969646
Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma:
162.969646, 162.969646, 162.969646, 162.969646,
current current current current
sigma: sigma: sigma: sigma:
165.932617 318.345362 309.780318 180.099516
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
Minimum Minimum Minimum Minimum
sigma: sigma: sigma: sigma:
162.969646, 162.969646, 162.969646, 162.969646,
current current current current
sigma: sigma: sigma: sigma:
Minimum Minimum Maximum Minimum Minimum
sigma: 162.969646, current sigma: 425.721911 sigma: 162.969646, current sigma: 168.487581 number of steps (20) exceeded. Exiting. weighted sigma: 162.9696463 nonweighted sigma: 121.468295647
41
169.477198 270.833865 374.927967 321.229604
Optimized parameters: p2 352.406156 p1 489.530057
We would like to point out that the nonweighted σ value = 121.468295647 cm−1 resulting from this test run is only indicative. Based on previous benchmark studies on test molecules [10], a good optimization is reached when the value of σ is within 0–100 cm−1 . This is normally achieved with a maxsteps value of 100000 optimization cycles and a convergence criterion (maxsigmasteps) of 10000 cycles for each parameter. The test run should output the frequency matching file “freq.dat” file: 190.557900 192.037937 371.922100 526.452800 834.793300 829.987300 921.684000
423.786768 569.209935 863.725975 875.847246 960.680246
854.408900 986.111065 921.684000 1024.520025 1188.648400 1069.574113 1188.648400 1180.393724 1278.351500 1326.557541 1345.769000 1358.638400 1319.158000 1543.963100 2686.651900
1410.834322 1414.451913 1467.791268 1627.117128 2853.888621
2734.703000 2770.258500 2815.684100 2799.877700 2872.768700
2909.492726 2910.631225 2995.317222 3034.498554 3106.175468
References [1] B.R. Brooks, O. Bruccoleri, B.D. Olafson, D.J. States, S. Swaminathan, M. Karplus, CHARMM: A program for macromolecular energy, minimization and dynamics calculations, J. Comp. Biol. 4 (1983) 187.
42
A.C. Vaiana et al. / Computer Physics Communications 167 (2005) 34–42
[2] P. Norrby, T. Liljefors, Automated molecular mechanics parametrization with simultaneous utilization of experimental and quantum mechanical data, J. Comp. Chem. 10 (1998) 1146. [3] A. Hopfinger, R. Pearlstein, Molecular mechanics force-field parameterization procedures, J. Comp. Chem. 5 (1984) 486. [4] G. Schaftenaar, J. Noordik, Molden: a pre- and post-processing program for molecular and electronic structures, J. Comput.-Aided Mol. Design 14 (2000) 123–134. [5] M.J. Frisch, G.W. Trucks, H.B. Schlegel, P.M.W. Gill, B.G. Johnson, M.A. Robb, J.R. Cheeseman, T. Keith, G.A. Petersson, J.A. Montgomery, K. Raghavachari, M.A. Al-Laham, V.G. Zakrzewski, J.V. Ortiz, J.B. Foresman, J. Cioslowski, B.B. Stefanov, A. Nanayakkara, M. Challacombe, G.Y. Peng, P.Y. Ayala, W. Chen, M.W. Wong, J.L. Andres, E.S. Replogle, R. Gomperts, R.L. Martin, D.J. Fox, J.S. Binkley, D.J. Defrees, J. Baker, J.P. Stewart, M. Head-Gordon, C. Gonzalez, J.A. Pople, Gaussian 94, Revision E.1, Gaussian, Inc., Pittsburgh, PA, 1995. [6] M.J. Frisch, G.W. Trucks, H.B. Schlegel, G.E. Scuseria, M.A. Robb, J.R. Cheeseman, V.G. Zakrzewski, J.J.A. Montgomery, R.E. Stratmann, J.C. Burant, S. Dapprich, J.M. Millam, A.D. Daniels, K.N. Kudin, M.C. Strain, O. Farkas, J. Tomasi, V. Barone, M. Cossi, R. Cammi, B. Mennucci, C. Pomelli, C. Adamo, S. Clifford, J. Ochterski, G.A. Petersson, P.Y. Ayala, Q. Cui, K. Morokuma, D.K. Malick, A.D. Rabuck, K. Raghavachari, J.B. Foresman, J. Cioslowski, J.V. Ortiz, A.G. Baboul, B.B. Stefanov, G. Liu, A. Liashenko, P. Piskorz, I. Komaromi, R. Gomperts, R.L. Martin, D.J. Fox, T. Keith, M.A. Al-Laham, C.Y. Peng, A. Nanayakkara, M. Challacombe, P.M.W. Gill, B. Johnson, W. Chen, M.W. Wong, J.L. Andres, C. Gonzalez, M. Head-Gordon, E.S. Replogle, J.A. Pople, Gaussian 98, Revision A.9, Gaussian, Inc., Pittsburgh, PA, 1998. [7] R.J. Harrison, J.A. Nichols, T.P. Straatsma, M. Dupuis, E.J. Bylaska, G.I. Fann, T.L. Windus, E. Apra, J. Anchell, D. Bernholdt, P. Borowski, T. Clark, D. Clerc, H. Dachsel, B. de Jong, M. Deegan, K. Dyall, D. Elwood, H. Fruchtl, E. Glendenning, M. Gutowski, A. Hess, J. Jaffe, B. Johnson, J. Ju, R. Kendall, R. Kobayashi, R. Kutteh, Z. Lin, R. Littlefield, X. Long, B. Meng, J. Nieplocha, S. Niu, M. Rosing, G. Sandrone, M. Stave, H. Taylor, G. Thomas, J. van Lenthe, K. Wolinski, A. Wong, Z. Zhang, NWChem, A Computational Chemistry Package for Parallel Computers, Version 4.0.1, Pacific Northwest National Laboratory, Richland, Washington, USA. [8] G. te Velde, R.M. Bickelhaupt, E.J. Baerends, C.F. Guerra, S.J.A. van Gisbergen, J.G. Snijders, T. Ziegler, Chemistry with ADF, J. Comp. Chem. 22 (2001) 931. [9] J. Foresman, A. Frisch, Exploring Chemistry with Electronic Structure Methods, second ed., Gaussian, Inc., Pittsburgh, PA, 1993. [10] A. Vaiana, A. Schulz, J. Worfrum, M. Sauer, J. Smith, Molecular mechanics force field parametrization of the fluorescent probe rhodamine 6G using automated frequency matching, J. Comp. Chem. 24 (2002) 632. [11] Z. Cournia, A. Vaiana, J.C. Smith, G.M. Ullmann, Derivation of a molecular mechanics force field for cholesterol, Pure Appl. Chem. 76 (1) (2004) 189.