Development and application of a hybrid method involving ... - Berkeley

0 downloads 0 Views 554KB Size Report
Nov 6, 2008 - These modifications were tested to measure the reduction in .... The second modification to the GSM is to use the con- jugate gradient method ... In their initial formulation, Collins and ... The second derivative term,. 2E/ n ..... surface showing the three minima A, B, C and two saddle points, or TSs. TS1, TS2. 1.
THE JOURNAL OF CHEMICAL PHYSICS 129, 174109 共2008兲

Development and application of a hybrid method involving interpolation and ab initio calculations for the determination of transition states Anthony Goodrow,1 Alexis T. Bell,2,a兲 and Martin Head-Gordon3,a兲 1

University of California—Berkeley, Berkeley, California 94720-1462, USA Department of Chemical Engineering, University of California, Berkeley, Berkeley, California 94720-1462, USA 3 Department of Chemistry, University of California, Berkeley, Berkeley, California 94720-1462, USA 2

共Received 18 July 2008; accepted 11 September 2008; published online 6 November 2008兲 Transition state search algorithms, such as the nudged elastic band can fail, if a good initial guess of the transition state structure cannot be provided. The growing string method 共GSM兲 关J. Chem. Phys. 120, 7877 共2004兲兴 eliminates the need for an initial guess of the transition state. While this method only requires knowledge of the reactant and product geometries, it is computationally intensive. To alleviate the bottlenecks in the GSM, several modifications were implemented: Cartesian coordinates were replaced by internal coordinates, the steepest descent method for minimization of orthogonal forces to locate the reaction path was replaced by the conjugate gradient method, and an interpolation scheme was used to estimate the energy and gradient, thereby reducing the calls to the quantum mechanical 共QM兲 code. These modifications were tested to measure the reduction in computational time for four cases of increasing complexity: the Müller–Brown potential energy surface, alanine dipeptide isomerization, H abstraction in methanol oxidation, and C–H bond activation in oxidative carbonylation of toluene to p-toluic acid. These examples show that the modified GSM can achieve two- to threefold speedups 共measured in terms of the reduction in actual QM gradients computed兲 over the original version of the method without compromising accuracy of the geometry and energy of the final transition state. Additional savings in computational effort can be achieved by carrying out the initial search for the minimum energy pathway 共MEP兲 using a lower level of theory 共e.g., HF/STO-3G兲 and then refining the MEP using density functional theory at the B3LYP level with larger basis sets 共e.g., 6-31Gⴱ, LANL2DZ兲. Thus, a general strategy for determining transition state structures is to initiate the modified GSM using a low level of theory with minimal basis sets and then refining the calculation at a higher level of theory with larger basis sets. © 2008 American Institute of Physics. 关DOI: 10.1063/1.2992618兴 I. INTRODUCTION

Efficient methods for finding transition states 共TSs兲 are required for studies of the mechanism and kinetics of chemical reactions.1–5 During the past several decades, two classes of TS-finding methods have been developed, often referred to as surface-walking methods and interpolation methods.6–8 Surface-walking methods require calculation of the potential gradient and Hessian in addition to the reactant configuration. The most common of these algorithms are those based on restricted-step Newton–Raphson methods,9 such as that developed by Cerjan and Miller10 and later modified by Banerjee et al.11 and Baker12 as the partitioned rational function optimization 共P-RFO兲 method. Such methods tend to work well if information about the potential energy surface is known or can be calculated easily, but they are slow to converge, particularly for large systems with a large number of low frequency modes. For complex systems, the computational burden can become prohibitive for determining the Hessian matrix analytically or numerically. Although methods exist for updating the Hessian matrix, there are convera兲

Authors to whom correspondence should be addressed. Electronic addresses: [email protected] and [email protected].

0021-9606/2008/129共17兲/174109/12/$23.00

gence problems associated with them and they may not lead to the correct TS.5 Interpolation methods require knowledge of the reactant and product configurations in order to generate a sequence of structures 共nodes兲 between them. The sum of the orthogonal forces acting on each node is then minimized, resulting in the placement of the nodes on the minimum energy pathway 共MEP兲. A common interpolation method is the nudged elastic band 共NEB兲.13,14 The NEB connects the reactant and product states by a series of intermediate structures 共typically 10–20兲 with springlike interactions between each node. Upon convergence, the band joining all the structures is the MEP. It is noted, though, that for this method to be successful, a good initial guess of the TS is needed, which can be a challenge for complex reaction systems. When the guess for the initial reaction path is far from the MEP, electronic structure calculations along the path can fail to converge. Finally, if Hooke’s constants used to keep the nodes equally spaced along the path are not chosen appropriately, convergence can be slow or the resulting path can deviate significantly from the MEP.7 Similar to the NEB, E et al.15 introduced the zero temperature string method 共SM兲 for finding the MEP. A string joining the reactant and product configurations in the SM

129, 174109-1

© 2008 American Institute of Physics

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-2

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon

moves in the direction of the orthogonal force at each node on the string. Rather than using Hooke’s constants to keep the nodes equally spaced along the pathway in the NEB, the SM redistributes the nodes along the string after each move. A class of methods based on the SM has emerged recently.7,16,17 One such method is the growing SM 共GSM兲 developed by Peters et al.7 In contrast to the NEB and SM, the GSM does not require a guess of the TS geometry or MEP,7 and consequently, electronic structure calculations are far less likely to fail. Several improvements have been made recently to the SM and GSM by E et al.18 and Aguilar-Mogas et al.,19 respectively. Rather than equally spacing the nodes in terms of arclength, the spacing is determined by using energy-weighted arclength, enabling the achievement of a finer resolution of nodes near the TS.18,19 In addition, a method has been proposed for avoiding numerical instability in the calculation of the tangent vector for nodes near a reactant or product minimum or near the TS.19 However, even with these improvements, the GSM remains quite computationally intensive and, hence, there is incentive to find additional methods for decreasing the computational burden. This paper presents a modified version of the GSM developed to alleviate the computational bottleneck associated with the original version of this algorithm. Three separate improvements are introduced: 共1兲 the use of internal coordinates instead of Cartesian coordinates, 共2兲 the use of the conjugate gradient method instead of the steepest descent method for the minimization of orthogonal forces during the evolution step, and 共3兲 the use of an interpolation scheme for points on the potential energy surface in order to minimize calls to the quantum mechanics 共QM兲 code. These modifications are then implemented within the original GSM. The methodology and strategy for the use of a hybrid low level/ high level of theory to determine TSs using the modified GSM is also described. The performance of the modified GSM is then assessed for four reactions of increasing complexity. These examples demonstrate that significant reduction in computational effort can be achieved without a loss in accuracy using the modified GSM. The computational savings associated with the use of a hybrid low level/high level of theory approach are discussed with each example. II. THEORY

Prior to discussing the improvements made to the GSM, it is useful to summarize the method briefly. There are four distinct steps involved in growing a string of nodes and then placing these nodes along the MEP. The creation of a string is initiated by locating the geometries and energies of the reactant and product states. Then two additional nodes are located along the linear synchronous transit path joining the reactant and product configurations. One of these nodes is added close to the reactant state and the other is added close to the products. The string is then grown independently from the reactant and product ends and the string segments are joined together once the fraction of vacant arclength between the two string segments reaches zero. The second step is evolution of the two string segments. During this step, the QM code is used to calculate the energy

and forces at each node. The orthogonal force, f⬜, on each node is then minimized using the steepest descent method. This method generates a new geometry for the node by moving it a prescribed distance along the direction of steepest descent. The QM code is then used again to calculate the energy and forces for the new node geometry. This sequence of steps is repeated until the norm of the orthogonal force on tol . each node, 储f⬜储, falls below a user-specified tolerance, f ⬜ The resulting string is parallel to the gradient at each point along the path. The evolution of the string is the most computationally demanding part of the GSM method because of repeated calls to the QM code to calculate the energy and forces. In the third step, the string is reparametrized by redistributing the nodes uniformly along a string of normalized arclength after each evolution step. The arclength of the string, s, is calculated by integrating along a cubic spline joining the nodes in each string segment. If the two string segments have been joined, then a single cubic spline through all nodes is integrated. The arclength is then normalized such that the reactant and product are located at s = 0 and s = 1, respectively. The remaining nodes in the normalized string are equally spaced in terms of arclength. For example, in a fully joined string containing 11 nodes, the nodes are 1 2 , 10 , . . ., and 1. The computational cost of located at s = 0, 10 the reparametrization is negligible compared to a single QM force calculation. Once the string has been evolved and reparametrized, the fourth step is the growth of the string. As the string grows inward from both the reactant and product ends independently, a new node can be added to each advancing string segment. A new node is added to the string if the number of nodes, n, is less than the maximum number, Nnode. The location of the node on the string is determined by extrapolating a fitted cubic spline through the existing nodes. Once the distance between the two advancing nodes reaches zero, the string grown from the reactant end merges with the string grown from the product end. The number of nodes remains constant and the evolution step proceeds until the sum of the Nnode 储f⬜,n储, is norm of the orthogonal forces on all nodes, 兺n=1 tol less than a user-specified value, F⬜ . The geometry corresponding to the highest energy on the converged string is then used as the TS estimate in a conventional TS search calculation, using the P-RFO method.11,12 A. Coordinate system

The first modification to the GSM is the replacement of the Cartesian coordinates by nonredundant delocalized internal coordinates 共henceforth referred to simply as internal coordinates兲 for the description of molecular geometries. Previous work has shown that the choice of coordinate system can have a major influence on the rate of convergence in geometry optimizations and that a coordinate system involving minimal coupling between individual coordinates is preferred.20–23 Internal coordinates, i.e., bond lengths, bond angles, and dihedral 共or torsion兲 angels, are weakly coupled in contrast to the Cartesian coordinates, which are heavily coupled. Baker et al.23 described a general procedure for

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-3

J. Chem. Phys. 129, 174109 共2008兲

Modified growing string method

generating a complete, nonredundant set of delocalized internal coordinates from the Cartesian coordinates for any molecular configuration. This procedure involves two steps: 共1兲 relating displacements in the Cartesian coordinates, ⌬X, to displacements in primitive internal coordinates, ⌬q 共which may be redundant兲, using the Wilson B matrix24 共i.e., ⌬q = B⌬X兲 and 共2兲 diagonalizing the matrix G = BBT to give eigenvalues that are positive or zero. For a species of N atoms, there are 3N-6 eigenvectors with positive eigenvalues, and the remaining eigenvalues are zero. The eigenvectors corresponding to the 3N-6 positive eigenvalues of G represent the set of nonredundant delocalized internal coordinates, ␰. B. Conjugate gradient

The second modification to the GSM is to use the conjugate gradient method, instead of the previously used steepest descent method, to minimize the orthogonal forces on the nodes during the evolution of the string. The steepest descent method moves a node from point xi to xi+1 on the potential energy surface by moving in the direction of the local downhill gradient, −ⵜE共xi兲. While this minimization scheme incorporates gradient information, it has the disadvantage of taking many small steps.25 The conjugate gradient removes these difficulties. While this method involves more computations to compute the search direction, it requires fewer overall steps to achieve convergence. Additionally, the convergence characteristics of the conjugate gradient method are much better than that of the steepest descent method because the current gradient is the Hessian-conjugate with respect to the previous gradient search directions.25 As a consequence, the convergence of the conjugate gradient is superlinear, whereas the convergence of the steepest descent method is linear. The difference in the rates of convergence between these two methods can lead to large savings in computational time when the conjugate gradient method is used, especially in cases where the potential energy surface is relatively flat.25,26 It should also be noted that still more efficient minimization procedures exist, such as global-BFGS,27 where all available gradient information 共from each node兲 is used in each individual local optimization. In our present context, it is not clear whether the reduction in the number of force calculations would offset the advantage of using an interpolation method to reduce the number of calls to the QM code 共see below兲. C. Potential energy surface interpolation

The third modification to the GSM is the use of a potential energy surface 共PES兲 interpolation method. Currently, the GSM makes frequent calls to the QM code to perform force calculations. Significant savings in computational time can be made by estimating the energy of a given geometry through an interpolation method, rather than computing it directly. An efficient method for estimating the energy of points along the potential energy surface has been developed by Collins and co-workers.28–33 This method uses a modified-Shepard interpolation scheme.34 Data points on the PES calculated previously by the QM code are used to esti-

mate the energy of new points by means of a weighted sum of Taylor series. In their initial formulation, Collins and co-workers28–33 used reciprocal atom-atom distances to describe the geometry of a molecule. Since internal coordinates were used for the present study, the interpolation approach of Collins and co-workers was rewritten in these coordinates. A second-order Taylor series expansion as a function of the internal coordinates, ␰, was used to approximate the local potential energy in the vicinity of a calculated point, ␰共i兲, as shown in Eq. 共1兲. The term E关␰共i兲兴 is the QM calculated energy for configuration ␰共i兲 and ⳵E / ⳵␰n is the QM calculated analytical gradient. The second derivative term, ⳵2E / ⳵␰n ⳵ ␰m, is approximated by using a unit matrix for the Hessian 共the default for many QM codes兲. An analytical Hessian is not used because its calculation is prohibitively expensive and does not provide a significant improvement over using an approximate Hessian in the accuracy of the energy for interpolated points. Thus, Ti共␰兲 represents the Taylor series estimate of the energy of geometry ␰ based on data point i, where the indices n and m in Eq. 共1兲 refer to one of the 3N-6 components of ␰.

冏 冏

3N−6

Ti共␰兲 = E关␰共i兲兴 +

+

1 2





⳵E

兺 关␰n − ␰n共i兲兴 ⳵␰n n=1

␰共i兲

3N−6 3N−6

兺 兺

关␰n − ␰n共i兲兴关␰m − ␰m共i兲兴

n=1 m=1

⳵ 2E ⳵␰n ⳵ ␰m



. ␰共i兲

共1兲

A weighted sum of Taylor series expansions is used to estimate the energy of a known geometry, ␰, on the potential energy surface, Eq. 共2兲. The summation is over all data points Ndata calculated previously by the QM code. Ndata

E共␰兲 ⬇

wi共␰兲Ti共␰兲. 兺 i=1

共2兲

Data points closer to the configuration that is to be calculated by interpolation are weighted more heavily than data points that are further away. The weighting function wi共␰兲 is based on confidence lengths determined from a Bayesian analysis of the errors in the gradient.31,35 Each Taylor series expansion has an associated confidence length for each of the 3N-6 directions in space. The normalized weighting function wi共␰兲 is given by Eq. 共3兲 and involves determination of the primitive weights vi共␰兲. The un-normalized primitive weights vi共␰兲 and the confidence lengths dn共i兲 are determined from Eqs. 共4兲 and 共5兲, respectively. Previous work by Thompson et al.30 and Bettens and Collins35 showed empirically that the best values for the parameters in Eq. 共4兲 are q = 2 and p = 12 for a wide range of PES. These values for q and p are consistent with recent findings by Dawes et al.36,37 where the range q = 2 – 4 and p = 8 – 12 were found to work best for fits that included the energy, gradient, and Hessian, as in Eq. 共1兲. In the two-part weighting function in Eq. 共4兲, if ␰n − ␰n共i兲 ⬍ dn共i兲, then the first term on the right-hand side dominates and vi decreases slowly as ␰n − ␰n共i兲 increases. However, if ␰n − ␰n共i兲 ⬎ dn共i兲, then the second term dominates

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-4

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon

Start with n Nodes Geometry in Cartesian Coordinates For each node Yes

QChem Converts X  2 Determines E,∇ E,∇ E

Is || f ⊥|| < f

tol ? ⊥

Advance to Next Node

No

Conjugate Gradient Computes Step Size

Library of Points {, E,∇ E,∇2E} Interpolation Criterion Satisfied

No

Yes

Interpolation Scheme Estimates E, ∇E

FIG. 1. A flow sheet for the evolution step of the modified GSM. The orthogonal force f⬜, refers to the negative of the component of the gradient that is orthogonal to the reaction path. The criterion for using the interpolation scheme is if there are M points within a confidence length dtol.

and vi decreases rapidly to zero. In other words, points within a confidence length dn共i兲 contribute heavily to the estimate of E共␰兲, while points that lie outside contribute very little. The accuracy of the interpolated energy in Eq. 共1兲 was shown to not be strongly dependent on the number of points M 共Ref. 37兲 or the value of the error tolerance Etol 共Ref. 35兲 used in calculating the confidence length dn共i兲. Details concerning Eqs. 共4兲 and 共5兲 have been discussed by Collins and co-workers.31,35 w i共 ␰ 兲 =

v i共 ␰ 兲 Ndata

共3兲

,

v j共␰兲 兺 j=1

再冋 兺 冉 冋兺 冉 3N−6

v i共 ␰ 兲 =

n=1

3N−6

+

n=1

M

dn共i兲−6 =

1 兺 M j=1

冊册 冊册 冎 再冋冏 冏 冏 冏 册 ␰n − ␰n共i兲 d n共 ␰ 兲

2

q/2

␰n − ␰n共i兲 d n共 ␰ 兲

2

p/2

⳵E ⳵␰n

共4兲

,



␰共j兲

−1

⳵ Ti ⳵␰n

关␰n共j兲 − ␰n共i兲兴

␰共j兲 2 Etol储␰共j兲 − ␰共i兲储6



2

.

共5兲

D. Modified-GSM algorithm

The three modifications described above were implemented in the evolution step of the GSM, as shown in Fig. 1. The reactant and product geometries are specified in the Cartesian coordinates, which is the default output for many molecular visualization software packages. The center of mass and list of coordinates are adjusted to match in the reactant and product states, so as to ensure that the geometries of both states are defined consistently in the standard orientation. For

instance, the ith atom in the reactant matches the ith atom in the product, and the center of mass of both species is at the origin. Using the standard orientation for each geometry, QCHEM 3.0 共Ref. 38兲 was used to determine the set of internal coordinates following the procedure of Baker et al.,23 described previously. A library of information about the PES surface is built by saving the delocalized internal coordinates, energy, analytical gradient, and approximate Hessian from each force calculation for a given molecular geometry. This information is used to determine the search direction for the conjugate gradient method and for the potential energy surface interpolation scheme. If the norm of the orthogonal force, 储f⬜储, on a node is tol less than a user-specified tolerance, f ⬜ , then the evolution step for that node is complete and the calculation proceeds to the next node. Once the evolution step is complete for all nodes, a new node is added 共if the number of nodes is less than Nnode兲, and the string is reparametrized. However, if tol 储f⬜储 ⬎ f ⬜ then the conjugate gradient method is used to determine the direction to move on the potential energy surface. The Polak-Ribiére method is used within the conjugate gradient method to calculate the scalar ␤, a multiplicative factor needed to determine successive conjugate direction vectors. The Polak–Ribière method is preferred over the Fletcher–Reeves method because the it converges much more rapidly.25,26 Whereas the Polak–Ribière and Fletcher– Reeves methods are identical for a quadratic surface, for a nonquadratic surface where the conjugate property between successive search directions does not hold rigorously, the Polak–Ribière method periodically restarts the conjugate gradient method by setting ␤ = 0. Previous work has shown that as the number of iterations in the conjugate gradient method increases, the Polak–Ribière method proceeds more smoothly toward a minimum than the Fletcher–Reeves method.25 Moving in the conjugate gradient direction, the energy and forces of the new configuration can be calculated either using the QM code or by interpolation. If the criterion for interpolation is satisfied, then the library of previous calculated structures is used as the basis for estimation. The criterion that was used for using the interpolation scheme is that there must be at least M points within a confidence length of dtol. Choosing a large value for M and dtol would result in interpolating often, but with less accuracy; however, choosing a low value of M and dtol would result in interpolating rarely, but with high accuracy. In this work, a value of M = 5 and dtol = 1.0dnode, where dnode is the curvilinear distance between adjacent nodes, was found to balance the frequency of interpolating with the accuracy of the interpolated results. The energy tolerance, Etol in Eq. 共5兲 was set to 0.062 kcal/ mol 共0.1 mH兲, as in previous work.35 If any of the points in the final string were determined by interpolation, then a single-point energy calculation was performed using the QM code. This ensures that interpolated points with an erroneous energy estimate are corrected. The geometry corresponding to the highest energy in the converged string is used as the starting geometry in a TS

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-5

J. Chem. Phys. 129, 174109 共2008兲

Modified growing string method

GSM mod-GSM Interpolated Pt

Grow String at Low-Level of Theory

Improve TS Estimate from Low-Level String using High-Level Theory

Yes

Did TS Search Converge?

|xsp-xest|

No

Yes

Refine String at High-Level of Theory

Optimize TS Estimate from High-Level String using QChem TS Search

No

Did TS Search Converge?

0.1

TS Found

Add More Nodes to String & Restart Calculation

0.01

FIG. 2. A flow sheet for a TS-finding strategy that uses a combined low level/high level of theory approach. A TS search using a high level of theory can be initiated directly from the converged low-level string. Otherwise, if the topology of the PES is correct, the low-level string can be further refined at a high level of theory.

search calculation. The eigenvector following 共EF兲 algorithm, which uses the P-RFO method described by Baker,12 is used in QCHEM with internal coordinates to optimize the TS estimate from the GSM to the final converged TS geometry. The EF scheme in QCHEM works by maximizing along the lowest Hessian mode while at the same time minimizing along all other modes. Often, the starting TS estimate has many negative Hessian eigenvalues that are eliminated during the calculation, with the final converged TS geometry having one negative Hessian eigenvalue. E. Hybrid low-level/high-level strategy

A further aim of this work was to find a means for reducing the computational time required to find TSs. To this end, we investigated a strategy in which the search for a TS was initiated using a low level of theory/small basis sets 共i.e., Hartree–Fock, force field, and semiempirical methods兲 and then refined using a higher level of theory/larger basis sets,

1

10

100

1000

Gradient Evaluations

FIG. 4. 共Color online兲 Error in the estimate of the highest saddle point from the true saddle point 共TS1兲 on the MB PES. Results are shown at 18 nodes for the GSM and modified GSM. The open circles represent points where an interpolation step was performed in the modified GSM.

which is more computationally demanding. For instance, the computational time for a single force calculation can be reduced by two orders of magnitude by switching from using large basis sets 共e.g., 6-31Gⴱ兲 to smaller basis sets 共e.g., STO-3G兲 because the calculation time is O共b3兲 – O共b4兲, where b is number of basis functions. Hence, a small reduction in the number of basis functions used has a large impact on reducing the computational time. This strategy, shown in Fig. 2, is conceptually similar to that used for geometry optimizations, in which an initial geometry is sought using a low level of theory and then refined using a higher level of theory.5 In the context of the GSM, a string connecting the reactant and product is grown using a low level of theory and a minimal basis set 共e.g., HF/STO-3G兲. Next, the TS estimate from this calculation is used as a starting geometry for a TS search at a higher level of theory carried out with larger basis sets 共e.g., density functional theory using

2

GSM mod-GSM

A

1.5

-40

-60

y

TS1

C

0.5

TS2 0

-0.5 -1.5

Energy (arb. units)

1

B

-80

-100

-120

-140 -1

Minimum

-0.5

x

0

0.5

1

Saddle Point (Transition State)

FIG. 3. 共Color online兲 Contour plot of the Müller–Brown potential energy surface showing the three minima 共A, B, C兲 and two saddle points, or TSs 共TS1, TS2兲.

0.0

0.2

0.4

0.6

0.8

1.0

Normalized Arclength

FIG. 5. 共Color online兲 Final energy profile of the MEP for the MB PES using the GSM and the modified GSM for 18 nodes. The points in each curve are joined with a spline.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-6

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon

GSM mod-GSM 8

Energy (kcal/mol)

6

4

2

0

-2

FIG. 6. 共Color online兲 Alanine dipeptide isomerization from species C5 共reactant兲 to C7AX 共product兲.

B3LYP/ 6-31Gⴱ兲. However, if the TS search calculation fails to converge, then the converged string at the low level of theory can be used as the initial pathway for a subsequent string calculation at a high level of theory. The TS estimate from this refined calculation can then be used to start a TS search. It is noted that for each of the systems investigated, a TS search performed using the TS estimate attained from a lowlevel string calculation could not be converged. This finding indicates that it is essential to refine the string calculation using a high level of theory prior to completing the TS search. Without refining the string at a high level of theory, the energetics of the overall reaction tend to be erroneous and the resulting TS estimate is often a poor initial guess containing many negative Hessian eigenvalues. However, this strategy is valuable in providing an initial reaction pathway with a small computational cost that can then be refined at a higher level of theory.

GSM mod-GSM Interpolated Pt

3.5 3.0

0.0

0.2

0.4

0.6

0.8

1.0

Normalized Arclength FIG. 8. 共Color online兲 Final energy profile of the MEP for alanine dipeptide isomerization using the GSM and the modified GSM for 11 nodes. The reaction path shown for the modified GSM is after single-point QM calculations have been performed on interpolated points that exist in the final string. The points in each curve are joined with a spline.

III. RESULTS AND DISCUSSION

The following four examples compare the efficiency of the modified GSM with the original GSM for finding TSs for a set of reactions of increasing complexity. Unless noted, 11 nodes were used in the final string. The GSM and modified GSM were interfaced with QCHEM 3.0 共Ref. 38兲 to perform the QM energy and force calculations. The same convergence criteria were used for both forms of the GSM. A pertol 兲 was rependicular gradient 储f⬜储 ⬍ 0.040 hartree/ bohr 共f ⬜ quired before a new node was added to the string, the extrapolated distance for placement of a new node was 0.08共⬍共1 / Nnode兲兲, and a perpendicular gradient 储f⬜储 tol 兲 was required on each node for ⬍ 0.002 hartree/ bohr 共F⬜ convergence of the string. The highest energy structure in each string calculation was used as the initial geometry for a TS search. Each refined TS structure was found to have only one imaginary frequency. The GSM and modified GSM were compared on the basis of the number of QM gradient calculations required to

RMS Error

2.5

TABLE I. Comparison of TS geometries from final string calculations for alanine dipeptide isomerization. The energies are relative to the reactant, C5.

2.0 1.5

Calculation method

1.0 0.5

GSM 共B3LYP/ 6-31Gⴱ兲

0.0 10

100

1000

QM Gradients

FIG. 7. 共Color online兲 Error in the estimate of the highest saddle point from the true saddle point for alanine dipeptide isomerization. Results are shown at 11 nodes for the GSM and modified-GSM. The open circles represent points where an interpolation step was performed in the modified GSM.

Modified GSM 共B3LYP/ 6-31Gⴱ兲

Perczel et al.a 共B3LYP/ 6-31Gⴱ兲

Species C5 TS C7AX C5 TS C7AX C5 TS C7AX

␾ 共deg兲

␺ 共deg兲

−161.2 165.1 107.3 −139.7 71.7 −58.2 −161.2 165.1 105.8 −150.9 71.7 −58.2 −156.8 162.6 114.4 −147.9 72.3 −54.7

E 共kcal/mol兲 0.0 7.6 −1.40 0.0 7.2 −1.40 0.0 7.33 −1.35

a

Reference 44.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-7

J. Chem. Phys. 129, 174109 共2008兲

Modified growing string method

TABLE II. Time required to determine the TS for alanine dipeptide isomerization.

Calculation method GSM string grown at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ Modified-GSM string grown at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ Modified-GSM string grown at HF/STO-3G, refined at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ

Functional/basis set

No. of QM gradient

CPU time/gradient

CPU time for TS optimization

Total CPU timea

B3LYP/ 6-31Gⴱ

980

⬃8 min

35 min

130 h

B3LYP/ 6-31Gⴱ

339

⬃8 min

36 min

46 h

HF/STO-3G

324

⬃10– 15 s

35 min

32 h

216

⬃8 min

B3LYP/ 6-31G



a

Total CPU time= CPU time for growing string+ CPU time for optimizing TS estimate.

reach the estimated TS. For both methods, gradient calculations represent nearly all the computational time. The time required for calculating the search direction in the conjugate gradient method and for estimating the energy of interpolated geometries is negligible compared to the time required for a single gradient calculation. A. Müller–Brown potential energy surface

The Müller–Brown PES 共Ref. 39兲 is commonly used to benchmark the performance of new TS-finding methods.7,17,39–41 The MEP on this PES differs significantly from the linearly interpolated path and, as seen in Fig. 3, there are two saddle points 共TS1, TS2兲 located between three local minima 共A, B, C兲 on the MEP. Since the Müller–Brown PES is analytic, the energy and gradient at any point can be evaluated rapidly. Figure 4 shows the number of gradient evaluations versus the root-mean-square error between the highest saddle point 共TS1兲 and the TS estimate in going from point A 共reactant兲 to point B 共product兲, using 18 nodes. In the original paper by Müller and Brown,39 124-155 function evaluations were needed to determine the TS1, and 73-74 to determine TS2, using the constrained simplex optimization procedure. 共The exact number of evaluations required varied depending on the starting point.兲 Using the GSM, both TSs were determined from one calculation, requiring 152 gradient evaluations.7 共The time needed to perform a gradient evaluation is equivalent to that for a single function evaluation on the Müller–Brown PES.兲 A clear advantage of the GSM over other methods, such as the NEB, can be seen from this example: the GSM does not require initial information about the reaction path and it is able to find multiple TSs 共when they exist兲 by connecting the reactant and product. The capability of the GSM to determine multiple TSs has also been noted recently by Chempath and Bell42 in their study of methane oxidation to formaldehyde on isolated molybdate species supported on silica. By implementing the modifications to the GSM, only 75 gradient evaluations were required to find both TSs. The points in Fig. 4 identify geometries for which the energy was estimated using the interpolation scheme. In total, 62 points were interpolated on the Müller–Brown PES in the course of identifying the MEP. Figure 5 compares the final strings obtained from the GSM and the modified GSM. As can be seen, the results of both methods agree quite well, and the values of the energy for both TS1 and TS2 agree with the actual value. Thus, the results of the modified GSM on the Müller–

Brown PES give a hint about the speedup in cases where calculating the energy and gradient are nontrivial.

B. Alanine dipeptide isomerization

Alanine dipeptide isomerization is another well-studied example that has been used to evaluate the performance of TS-finding methods.43 The system consists of 22 atoms and 66 Cartesian coordinates. Using internal coordinates, the isomerization of alanine dipeptide can be described by the rotation of two dihedral angles ␾ and ␺, shown in Fig. 6.44 The GSM and modified GSM calculations were performed at the B3LYP/ 6-31Gⴱ level of theory. A single gradient 共i.e., force兲 calculation for the geometries of alanine dipeptide required ⬃8 min. Figure 7 compares the number of gradients computed by the QM code required to achieve convergence using the GSM and the modified GSM. The GSM requires 980 QM gradient calculations,7 whereas the modified GSM requires only 339. Several factors contribute to the marked improvement. First, internal coordinates are much better at describing this reaction than the Cartesian coordinates. Rather than keeping track of all the 66 Cartesian coordinates, the isomerization reaction can be described just by examining the two dihedral angles. Second, the conjugate gradient method results in very efficient convergence of the final string containing 11 nodes 共requiring 87 QM gradient calculations兲. Third, 45 points are interpolated by using the PES interpolation method. These results are similar to those reported recently by Quapp45 using a modified conjugate gradient method and the Newton projections,46,47 which are used to determine where to move each node on the string, for the determination of the TS. An analysis of the relative contribution of various improvements to the GSM was carried out. As noted above, the original version of the GSM algorithm, which uses Cartesian coordinates and the steepest decent method, required 980 QM gradient calculations. Replacement of the Cartesian coordinates by internal coordinates reduced the number of QM gradient calculations to 651, and the replacement of the steepest decent algorithm by the conjugate gradient algorithm reduced the number of QM gradient calculations to 435. Finally, use of the PES interpolation scheme, further reduced the number of QM gradient calculations to 339. The use of internal coordinates is very helpful because the reaction involves the movement of 16 atoms around two dihedral angles.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-8

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon

GSM mod-GSM 45 40

FIG. 9. 共Color online兲 H abstraction in methanol oxidation on VOx / SiO2.

Energy (kcal/mol)

35 30 25 20 15 10

The large deviation from the TS geometry occurring from 20 to 28 QM gradients in the case of the modified GSM can be explained by looking at the PES calculated by Perczel et al.44 The string starts moving uphill to a TS that is higher in energy than the actual TS for this reaction. The problem is rapidly corrected by calculating QM gradients instead of using the interpolation scheme. This same problem is encountered in the GSM from 102 to 108 QM gradients. Neither the GSM nor the modified GSM approaches the TS monotonically, as shown in Fig. 7. The reason why there are periodic increases in the error of the TS estimate as more QM gradients are computed is due to the addition of a new node 共or nodes兲 along the reaction pathway. When a new node is added along the reaction pathway, there is initially little information about the local PES, and this can result in deviations away from the TS. These deviations are quickly corrected in the GSM and modified GSM. Typically, when points are interpolated successfully on the PES, the estimate of the TS becomes better without having to make additional QM gradient calculations. However, at points where the interpolation criterion is barely satisfied, the next step often produces a point for which the criterion is no longer satisfied and a QM gradient must be computed. This problem is evident for the interpolated point at 40 QM GSM mod-GSM Interpolated Pt

2.0 1.8 1.6

RMS Error

1.4 1.2 1.0 0.8 0.6 0.4 0.2 0.0 10

100

1000

QM Gradients

FIG. 10. 共Color online兲 Error in the estimate of the highest saddle point from the true saddle point for H abstraction in methanol oxidation on VOx / SiO2. Results are shown at 11 nodes for the GSM and modified GSM. The open circles represent points where an interpolation step was performed in the modified GSM.

5 0 0.0

0.1

0.2

0.3

0.4

0.5

0.6

0.7

0.8

0.9

1.0

Normalized Arclength

FIG. 11. 共Color online兲 Final energy profile of the MEP for H abstraction in methanol oxidation on VOx / SiO2 using the GSM and the modified GSM for 11 nodes. The reaction path shown for the modified GSM is after singlepoint QM calculations have been performed on interpolated points that exist in the final string.

gradients shown in Fig. 7. The estimate of the energy at this point brings the calculation closer to the actual TS, although the next point is not vertically below it, as one might expect. The reason for this behavior is that at the next point a QM gradient is computed since the interpolation criterion is not met. The final energy profile of the MEP calculated by both methods is shown in Fig. 8. The energies of all interpolated points in the final string obtained using the modified GSM were updated by carrying out single-point energy calculations. Table I shows the energies and dihedral angles for the reactant 共C5兲, product 共C7AX兲, and the estimated TS obtained using both methods. In both cases, the geometry of the TS and the activation energy agree well with one another, and with the previous work of Perczel et al.44 The close agreement between the GSM and the modified GSM indicates that the accuracy of the estimated TS is not compromised by the use of the modified GSM. As a final test, a TS search was initiated using the geometry corresponding to the highest energy in each string calculation. Both calculations converged to the same TS structure and the difference in the imaginary frequency was only 0.05 cm−1. An investigation was carried out as well to establish whether the search for the TS could be initiated using a string first grown at a low level of theory and then refined at a higher level of theory. The calculation approach for this strategy is shown in Fig. 2. The string was first grown using HF/STO-3G and then converged using the modified GSM. This process required 324 QM gradients calculations. The calculated reaction path from this level of theory was then used as the starting string for the calculation of the string at the B3LYP/ 6-31Gⴱ level, a process which required an additional 216 QM gradient calculations. The advantage of this strategy is that a force calculation at the HF/STO-3G level took ⬃10– 15 s, whereas a force calculation at the

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-9

J. Chem. Phys. 129, 174109 共2008兲

Modified growing string method

TABLE III. Comparison of TS geometries from final string calculations for H abstraction in methanol oxidation on VOx / SiO2. The strings were grown using the functional/basis sets listed. All TS estimates were refined at B3LYP/ 6-31Gⴱ / LANL2DZ. Calculation method

RC–H 共Å兲

RO–H 共Å兲

vimg 共cm−1兲

⌬Eact 共kcal/ mol兲

1.63 1.63

1.11 1.11

−870.9 −870.9

40.3 40.8

1.63

1.11

−871.9

40.8



GSM Goodrow and Bell 共B3LYP/ 6-31G 兲 Modified GSM 共B3LYP/ 6-31Gⴱ兲 Modified-GSM string grown at HF/STO-3G, refined at B3LYP/ 6-31Gⴱ a

a

Reference 48.

B3LYP/ 6-31Gⴱ level took ⬃8 min. The final TS estimate obtained from the string at the B3LYP/ 6-31Gⴱ level converged to the same TS structure as that determined previously. As shown in Table II, a modified GSM calculation and TS search, both using B3LYP/ 6-31Gⴱ, took a total of 46 h to determine the TS, whereas the time required for the hybrid, low-level/high-level approach was 32 h. The total CPU time in Table II was calculated as the CPU time to grow the string plus the CPU time to optimize the TS estimate to the final TS structure. The CPU time needed to grow the string is the product of the number of QM gradients and the time per QM gradient. In the case of the hybrid low-level/high-level approach, the CPU time to grow the string is the sum of the CPU time for growing both the HF/STO-3G and B3LYP/ 6-31Gⴱ strings. Thus, while the total number of gradient calculations computed using the hybrid approach 共324+ 216= 540兲 is more than the number required if the whole process were performed at the B3LYP/ 6-31Gⴱ level of theory 共339兲, it represents a significant reduction in the overall computational time because a force calculation at the HF/STO-3G level is an order of magnitude faster than one at carried out at the B3LYP/ 6-31Gⴱ level. C. H-abstraction in isolated vanadate sites supported on silica

The third test case investigated was the rate-limiting step for methanol oxidation occurring on isolated vanadate species supported on silica, VOx / SiO2.48 This reaction system involves 34 atoms. The rate-limiting step is the abstraction of a H atom from an adsorbed methoxy group bonded to V to the vanadyl O, as shown in Fig. 9. This test case was chosen because of the large number of atoms involved, including a transition metal, and the long computational time needed to determine the TS using the GSM. The GSM and modified GSM were carried out at the B3LYP/ 6-31Gⴱ level. The LANL2DZ effective core potential was used for vanadium.

Figure 10 shows the number of gradients required for convergence for both methods. The modified GSM only required 381 QM gradients as compared to 651 for the original GSM. The reduction in the number of QM gradient calculations represents a significant savings in computational time since a single gradient 共i.e., force兲 calculation for this system at the B3LYP/ 6-31Gⴱ / LANL2DZ level took ⬃80– 120 min. A total of 76 points were interpolated. It is observed that once a sufficient number of points have been determined on the potential energy surface by means of QM calculations, the interpolation scheme was able to estimate many points with reasonable accuracy, as seen from the end of the calculation. As done previously, an analysis of the relative contribution of each improvement to the GSM was performed. The original version of the GSM algorithm required 651 QM gradient calculations. Replacement of the Cartesian coordinates by internal coordinates reduced the number of QM gradient calculations by 8% to 599. This reduction is less than the 34% seen in alanine dipeptide isomerization because the reaction mechanism involves movement of only a single H atom. The replacement of the steepest decent algorithm by the conjugate gradient algorithm reduced the number of QM gradient calculations to 467. Finally, use of the PES interpolation scheme, further reduced the number of QM gradient calculations to 381. The final energy profile of the MEP obtained by both methods is shown in Fig. 11. Single-point energy calculations were performed on the points on the final string obtained from the modified GSM that had been determined by interpolation. The single-point energy calculations changed the energy of several of the interpolated points in the final string by as much as 2–3 kcal/mol. At the B3LYP/ 6-31Gⴱ / LANL2DZ level, the change in energy for the overall reaction shown is ⌬E = 38.1 kcal/ mol and the activation energy is ⌬E‡ = 40.8 kcal/ mol. These values compare well with those reported by Goodrow and Bell,48 ⌬E

TABLE IV. Time required to determine the TS for H transfer in VOx / SiO2.

Calculation method GSM string grown at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ Modified-GSM string grown at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ Modified-GSM string grown at HF/STO-3G, refined at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ

Functional/basis set

No. of QM gradient

CPU time/gradient 共min兲

CPU time for ts optimization 共h兲

Total CPU time 共h兲a

B3LYP/ 6-31Gⴱ

651

⬃80– 120

140

1225

B3LYP/ 6-31Gⴱ HF/STO-3G B3LYP/ 6-31Gⴱ

339 330 151

⬃80– 120 ⬃4 – 6 ⬃80– 120

134 114

700 393

a

Total CPU time= CPU time for growing string+ CPU time for optimizing TS estimate.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-10

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon 3.0

mod-GSM Interpolated Pt

2.5

RMS Error

2.0

1.5

1.0

FIG. 12. 共Color online兲 C–H bond activation in toluene on a Rh complex coordinated with two CO species and three TFA solvent ligands.

= 37.1 and ⌬E‡ = 39.8 kcal/ mol, respectively, obtained using the LACV3Pⴱⴱ + + basis set. A TS search calculation using the estimated geometry from the modified GSM converged to the same energy and geometry as the GSM. The modified GSM was used at the HF/STO-3G level to identify the initial string. While the energetics were not properly captured by this low level of theory, the topology of the potential energy surface was correct. The converged reaction path from this calculation was reasonably close to the exact reaction path determined at a higher level of theory. The reaction path from HF/STO-3G was refined by beginning another string calculation at the B3LYP/ 6-31Gⴱ / LANL2DZ level. A comparison of the final TS structures obtained from each method are shown in Table III. The C–H and O–H bond lengths, as well as the imaginary vibrational frequency characteristic of the TS, agree well for all three cases. Table IV lists the CPU time required for each type of calculation. It is important to note that the times listed are for a single CPU, whereas, in practice, each gradient and TS search calculation was performed using 2 CPUs. Using this hybrid method, 330 QM gradients were calculated with HF/STO-3G and an addition 151 QM gradients were calculated for convergence of the string at B3LYP/ 6-31Gⴱ / LANL2DZ, for a total of 481 QM gradient calculations. The time required to identify the TS in this case was 393 h, whereas using the modified GSM to grow the entire string at the B3LYP/ 6-31Gⴱ / LANL2DZ level required 700 h.

0.5

0.0 10

1000

FIG. 13. 共Color online兲 Error in the estimate of the highest saddle point from the true saddle point for C–H bond activation in toluene over Rh/TFA complex. The result shown is at 11 nodes for the modified GSM. The GSM is not shown because it was unable to converge. The open circles represent points where an interpolation step was performed in the modified GSM.

fied GSM at a low level of theory 共HF/STO-3G兲, 共2兲 refinement of the string calculation at a higher level of theory 共B3LYP/ 6-31Gⴱ and LANL2DZ for Rh兲, and 共3兲 utilization of the TS estimated from the refined string to determine the final TS geometry. HF/STO-3G was used as a low level of theory to gain an initial estimate of the reaction pathway. This method was chosen because the change in energy for the reaction 共⌬EHF = 0.12 kcal/ mol兲 is comparable to that computed using a more accurate method 共⌬EB3LYP = 0.43 kcal/ mol for gas-phase species, ⌬EB3LYP = 0.9 kcal/ mol for the inclusion of solvent effects49兲. A total of 852 gradients were computed and 166 points were interpolated using the modified GSM, shown in Fig. 13. Growth and convergence of the initial string at the HF/ STO-3G level required 738 gradients, with each calculation mod-GSM E(HF) mod-GSM E(B3LYP)

14

D. C–H bond activation in the oxidative carbonylation of toluene to p-toluic acid

12

10

Energy (kcal/mol)

A theoretical investigation of the oxidative carbonylation of toluene to p-toluic acid by the complex Rh共CO兲2共TFA兲3 共TFA= trifluoroacetate兲 has recently been reported by Zheng and Bell.49 As shown in Fig. 12, the rate-limiting step for the reaction is C–H bond activation of coordinated toluene. This particular reaction was chosen because it involves the collective movement of many atoms and the TS for the reaction could not be found using the GSM by itself. Instead, the GSM had to be used in association with an iterative partial optimization approach 共IPOA兲 that involved the freezing and unfreezing of the Cartesian coordinates.49 While this approach proved to be successful, the IPOA involved a lot of human intervention and required over 1800 h of CPU time to converge to the correct TS structure. The strategy that we have used to determine the TS consisted of three steps: 共1兲 growth of the string using the modi-

100

QM Gradients

8

6

4

2

0 0.0

0.2

0.4

0.6

0.8

1.0

Normalized Arclength

FIG. 14. 共Color online兲 Final energy profile of the MEP for C–H bond activation in toluene over Rh/TFA complex using the modified GSM for 11 nodes. The modified GSM was first used using HF/STO-3G, and then refined using B3LYP/6-31G/LANL2DZ. The points in each curve are joined with a spline.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-11

J. Chem. Phys. 129, 174109 共2008兲

Modified growing string method

TABLE V. Comparison of TS geometry using different functionals/basis sets for C–H bond activation in toluene on Rh共CO兲2共TFA兲.

Bond length 共Å兲 C–H O–H Rh–C Rh–O

Estimated TS from string 共HF/STO-3G兲

Estimated TS from string 共B3LYP/ 6-31Gⴱ兲

1.40 1.43 2.12 2.04

Optimized TS from string 共B3LYP/ 6-31Gⴱ兲

1.33 1.30 2.19 2.12

TS determined by Zheng and Bella

1.28 1.38 2.18 2.11

1.30 1.35 2.20 2.11

a

Reference 49.

taking ⬃9.5 min, and 134 interpolated points. The string was then refined at the B3LYP/ 6-31Gⴱ / LANL2DZ level, which required the calculation of an additional 114 gradients, each calculation taking ⬃100– 120 min, and 32 interpolated points. Figure 13 shows a gap between when the last point interpolated in the HF/STO-3G string 共at 737 QM gradients兲 and the first point interpolated in the B3LYP/ 6-31Gⴱ / LANL2DZ string 共at 793 gradients兲. The reason for this gap is that points computed in the HF/ STO-3G string cannot be used in the B3LYP/ 6-31Gⴱ / LANL2DZ string because the energies were very different, even though the geometry was correct. As a result, it was necessary to perform 56 QM calculations at the B3LYP/ 6-31Gⴱ / LANL2DZ level before the criterion for interpolating a point was satisfied. An analysis of the relative contribution of various improvements to the GSM was carried out; however, the calculations did not converge. The original version of the GSM did not converge after 1000 QM gradients using B3LYP/ 6-31Gⴱ / LANL2DZ and the calculation was terminated after 1800 h. Replacement of the Cartesian coordinates by internal coordinates and replacement of the steepest decent algorithm by the conjugate gradient algorithm were insufficient for the GSM to converge to a TS after 1000 QM gradients. Only by incorporating the PES interpolation scheme with the previous modifications was the GSM able to converge, requiring 852 QM gradient calculations. All the modifications to the GSM work together in a synergistic fashion and are required for this reaction system because of the complexity involved in the movement of a large number of atoms. In particular, the interpolation method is relied on heavily, with the fraction of points interpolated versus QM gradients calculated at nearly 20%. Lastly, the hybrid approach of using a low-level/high-level method provides a significant reduction in computational time for this reaction. The converged strings using HF/STO-3G and

B3LYP/ 6-31Gⴱ / LANL2DZ are shown in Fig. 14. The estimate of the activation energy is 12.8 kcal/mol using HF and 11.9 kcal/mol using B3LYP. A TS search was then initiated using the estimate from the final B3LYP string in order to converge to the correct TS structure. The activation energy for the final converged TS from the modified GSM was 9.4 kcal/mol, in close agreement to the previously computed value of 9.1 kcal/mol 共⌬E‡ = 7.7 kcal/ mol when accounting for solvent effects兲.49 A comparison of the geometries of the TS determined by different approaches is shown in Table V. The TS from the converged string grown at the HF/STO-3G level provides a reasonable estimate of the actual TS. The geometry of the final converged TS at the B3LYP level agrees closely with that reported by Zheng and Bell.49 As shown in Table VI, the CPU time required for these three steps are 117, 209, and 228 h, respectively, for a total CPU time of 554 h. This strategy represents a large reduction in computational time and human intervention over the previous IPOA 共1800 h兲. It also demonstrates that a low level of theory 共in this case HF兲 can be used to generate a reasonable depiction of the MEP and a good estimate of the TS, which can then be refined using a higher level of theory. IV. CONCLUSIONS

Several modifications were made to the GSM with the aim of decreasing the computational requirements of the method: 共1兲 the Cartesian coordinates were replaced by internal coordinates, 共2兲 the steepest descent method used for the minimization of orthogonal forces at each node along the string was replaced by the conjugate gradient method, and 共3兲 a PES interpolation scheme was used to generate an estimate of the energy when possible. All three modifications contributed to a reduction in the number of calls made to the QM code. The degree to which a particular modification alters the number of QM code calls is problem specific but is,

TABLE VI. Time required to determine the TS for C–H activation in toluene over Rh共CO兲2共TFA兲.

Calculation method IPOA by Zheng and Bella Modified-GSM string grown at HF/STO-3G, refined at B3LYP/ 6-31Gⴱ TS search at B3LYP/ 6-31Gⴱ

Functional/basis set

No. of QM gradient

CPU time/gradient 共min兲

CPU time for ts optimization

Total CPU timeb

B3LYP/ 6-31Gⴱ

⬎1000

⬃100– 120

¯

⬎1800 h did not converge

HF/STO-3G

738

⬃9 – 10

114

⬃100– 120

228 h

554 h

B3LYP/ 6-31G



a

Reference 49. Total CPU time= CPU time for growing string+ CPU time for optimizing TS estimate.

b

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

174109-12

in general, about two to three times lower than that for the original version of the method. It is significant that the estimate of the TS geometry obtained using the modified GSM led to refined TS structures that were equivalent to those found using the original method. The idea of using a low level of theory to determine the initial locations of the string nodes on the PES followed by refinement of the string using a higher level of theory was also explored. Implementation of this approach led to further reductions in computational time, without any loss in the quality of the final TS geometry or energy. Thus, a recommended strategy for determining TSs is to initiate a calculation using the modified GSM at a low level of theory or with small basis sets and then refine the calculation at a higher level of theory with larger basis sets. 1

J. Chem. Phys. 129, 174109 共2008兲

Goodrow, Bell, and Head-Gordon

D. G. Truhlar, B. C. Garrett, and S. J. Klippenstein, J. Phys. Chem. 100, 12771 共1996兲. 2 D. G. Truhlar, Faraday Discuss. 110, 521 共1998兲. 3 D. G. Truhlar and K. Morokuma, Transition State Modeling for Catalysis 共ACS, Dallas, TX, 1998兲. 4 F. Jensen, Introduction to Computational Chemistry 共Wiley, Chichester, 1999兲. 5 H. B. Schlegel, J. Comput. Chem. 24, 1514 共2003兲. 6 A. T. Bell, Mol. Phys. 102, 319 共2004兲. 7 B. Peters, A. Heyden, A. T. Bell, and A. Chakraborty, J. Chem. Phys. 120, 7877 共2004兲. 8 A. Heyden, A. T. Bell, and F. J. Keil, J. Chem. Phys. 123, 224101 共2005兲. 9 R. Fletcher, Practical Methods of Optimization 共Wiley, New York, 1987兲. 10 C. J. Cerjan and W. H. Miller, J. Chem. Phys. 75, 2800 共1981兲. 11 A. Banerjee, N. Adams, J. Simons, and R. Shepard, J. Phys. Chem. 89, 52 共1985兲. 12 J. Baker, J. Comput. Chem. 7, 385 共1986兲. 13 G. Henkelman, B. P. Uberuaga, and H. Jonsson, J. Chem. Phys. 113, 9901 共2000兲. 14 G. Henkelman and H. Jonsson, J. Chem. Phys. 113, 9978 共2000兲. 15 W. E, W. Ren, and E. Vanden-Eijnden, Phys. Rev. B 66, 052301 共2002兲. 16 W. Quapp, J. Chem. Phys. 122, 174106 共2005兲. 17 S. K. Burger and W. T. Yang, J. Chem. Phys. 124, 054109 共2006兲. 18 W. E, W. Ren, and E. Vanden-Eijnden, J. Chem. Phys. 126, 164103 共2007兲. 19 A. Aguilar-Mogas, X. Gimenez, and J. M. Bofill, J. Chem. Phys. 128, 104102 共2008兲. 20 P. Pulay, G. Fogarasi, F. Pang, and J. E. Boggs, J. Am. Chem. Soc. 101, 2550 共1979兲. 21 G. Fogarasi, X. F. Zhou, P. W. Taylor, and P. Pulay, J. Am. Chem. Soc. 114, 8191 共1992兲. 22 P. Pulay and G. Fogarasi, J. Chem. Phys. 96, 2856 共1992兲. 23 J. Baker, A. Kessi, and B. Delley, J. Chem. Phys. 105, 192 共1996兲.

24

E. B. Wilson, J. C. Decius, and P. C. Cross, Molecular Vibrations 共McGraw-Hill, New York, 1955兲. 25 W. H. Press, S. A. Teukolsky, W. T. Vetterling, and B. P. Flannery, Numerical Recipes in C⫹⫹ 共Cambridge University Press, New York, 2005兲. 26 J. R. Shewchuk, An Introduction to the Conjugate Gradient Method Without the Agonizing Pain 共Carnegie Mellon University, Pittsburgh, PA, 1994兲. 27 D. Sheppard, R. Terrell, and G. Henkelman, J. Chem. Phys. 128, 134106 共2008兲. 28 J. Ischtwan and M. A. Collins, J. Chem. Phys. 100, 8080 共1994兲. 29 K. C. Thompson, M. J. T. Jordan, and M. A. Collins, J. Chem. Phys. 108, 564 共1998兲. 30 K. C. Thompson, M. J. T. Jordan, and M. A. Collins, J. Chem. Phys. 108, 8302 共1998兲. 31 M. A. Collins, Theor. Chem. Acc. 108, 313 共2002兲. 32 G. E. Moyano and M. A. Collins, J. Chem. Phys. 121, 9769 共2004兲. 33 G. E. Moyano and M. A. Collins, Theor. Chem. Acc. 113, 225 共2005兲. 34 D. Shepard, in A Two-Dimensional Interpolation Function for Irregularly-Spaced Data, Proceedings of the ACM National Conference, 1968 共unpublished兲. 35 R. P. A. Bettens and M. A. Collins, J. Chem. Phys. 111, 816 共1999兲. 36 R. Dawes, D. L. Thompson, Y. Guo, A. F. Wagner, and M. Minkoff, J. Chem. Phys. 126, 184108 共2007兲. 37 R. Dawes, D. L. Thompson, A. F. Wagner, and M. Minkoff, J. Chem. Phys. 128, 084107 共2008兲. 38 Y. Shao, L. F. Molnar, Y. Jung, J. Kussmann, C. Ochsenfeld, S. T. Brown, A. T. B. Gilbert, L. V. Slipchenko, S. V. Levchenko, D. P. O’Neill, R. A. DiStasio, R. C. Lochan, T. Wang, G. J. O. Beran, N. A. Besley, J. M. Herbert, C. Y. Lin, T. Van Voorhis, S. H. Chien, A. Sodt, R. P. Steele, V. A. Rassolov, P. E. Maslen, P. P. Korambath, R. D. Adamson, B. Austin, J. Baker, E. F. C. Byrd, H. Dachsel, R. J. Doerksen, A. Dreuw, B. D. Dunietz, A. D. Dutoi, T. R. Furlani, S. R. Gwaltney, A. Heyden, S. Hirata, C. P. Hsu, G. Kedziora, R. Z. Khalliulin, P. Klunzinger, A. M. Lee, M. S. Lee, W. Liang, I. Lotan, N. Nair, B. Peters, E. I. Proynov, P. A. Pieniazek, Y. M. Rhee, J. Ritchie, E. Rosta, C. D. Sherrill, A. C. Simmonett, J. E. Subotnik, H. L. Woodcock, W. Zhang, A. T. Bell, A. K. Chakraborty, D. M. Chipman, F. J. Keil, A. Warshel, W. J. Hehre, H. F. Schaefer, J. Kong, A. I. Krylov, P. M. W. Gill, and M. Head-Gordon, Phys. Chem. Chem. Phys. 8, 3172 共2006兲. 39 K. Müller and L. D. Brown, Theor. Chim. Acta 53, 75 共1979兲. 40 B. Peters, W. Z. Liang, A. T. Bell, and A. Chakraborty, J. Chem. Phys. 118, 9533 共2003兲. 41 S. K. Burger and W. Yang, J. Chem. Phys. 127, 164107 共2007兲. 42 S. Chempath and A. T. Bell, J. Catal. 247, 119 共2007兲. 43 J. W. Chu, B. L. Trout, and B. R. Brooks, J. Chem. Phys. 119, 12708 共2003兲. 44 A. Perczel, O. Farkas, I. Jakli, I. A. Topol, and I. G. Csizmadia, J. Comput. Chem. 24, 1026 共2003兲. 45 W. Quapp, J. Comput. Chem. 28, 1834 共2007兲. 46 W. Quapp, J. Comput. Chem. 22, 537 共2001兲. 47 R. Crehuet, J. M. Bofill, and J. M. Anglada, Theor. Chem. Acc. 107, 130 共2002兲. 48 A. Goodrow and A. T. Bell, J. Phys. Chem. C 111, 14753 共2007兲. 49 X. B. Zheng and A. T. Bell, J. Phys. Chem. C 112, 2129 共2008兲.

Downloaded 26 Nov 2008 to 128.32.49.120. Redistribution subject to AIP license or copyright; see http://jcp.aip.org/jcp/copyright.jsp

Suggest Documents