SciDAC 2007 Journal of Physics: Conference Series 78 (2007) 012067
IOP Publishing doi:10.1088/1742-6596/78/1/012067
Advanced Software for the Calculation of Thermochemistry, Kinetics, and Dynamics
R. Shepard
Chemistry Division, Argonne National Laboratory, Argonne, IL 60439

Abstract. The Born-Oppenheimer separation of the Schrödinger equation allows the electronic and nuclear motions to be treated in three steps: 1) solution for the electronic wave function at a discrete set of molecular conformations; 2) fitting of this discrete set of energy values to construct an analytical approximation to the potential energy surface (PES) at all molecular conformations; 3) use of this analytical PES to solve for the nuclear motion, with either time-dependent or time-independent formulations, to compute molecular energy values, chemical reaction rates, and cumulative reaction probabilities. This project involves the development of technology to address all three of these steps. This report focuses on our recent work on the optimization of nonlinear parameters in electronic wave functions.
1. Introduction
Our work this past year has focused on the optimization of nonlinear wave function parameters for electronic wave functions and on the extension of our nonlinear wave function expansion approach to include the spin-orbit interaction. We have recently proposed a new expansion form for molecular electronic wave functions [1-6]. The wave function is written as a linear combination of product basis functions, and each product basis function is in turn formally equivalent to a linear combination of configuration state functions (CSFs) that span the underlying linear expansion space of dimension Ncsf. The CSF coefficients that define the basis functions are nonlinear functions of a much smaller number of variables, Nϕ ≪ Ncsf. This new approach directly addresses the main bottlenecks of previous electronic structure methods: much of the computational effort and data storage it requires depends only on log(Ncsf) rather than on Ncsf. The method is formulated in terms of spin eigenfunctions using the Graphical Unitary Group Approach (GUGA) [7], and consequently it does not suffer from the spin contamination or spin instability often associated with single-reference methods. The expansion form is appropriate for both ground and excited states and for both closed- and open-shell molecules. Because the method is not based on expansion about a "reference" wave function, its accuracy is not inherently limited by failures of the Hartree-Fock method or by artificially imposed excitation-level restrictions. Our initial applications to some small molecules are encouraging.
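The dimension Ncsf of the underlying linear expansion space, for given numbers of orbitals n, electrons N, and total spin S, follows from the Weyl-Paldus dimension formula. That formula is not stated in this report; it is supplied here as standard background. A minimal Python sketch, assuming singlet states (S=0), reproduces the Ncsf≈5.5·10²⁴ quoted in Section 2 for n=N=46:

```python
from math import comb

def n_csf(n_orb: int, n_elec: int, two_s: int) -> int:
    """Number of spin-adapted CSFs for n_elec electrons in n_orb orbitals
    with total spin S = two_s / 2 (Weyl-Paldus dimension formula)."""
    if (n_elec + two_s) % 2 or two_s > n_elec:
        raise ValueError("inconsistent electron count and spin")
    a = (n_elec - two_s) // 2      # N/2 - S
    b = (n_elec + two_s) // 2 + 1  # N/2 + S + 1
    numerator = (two_s + 1) * comb(n_orb + 1, a) * comb(n_orb + 1, b)
    return numerator // (n_orb + 1)  # the division is always exact

AVOGADRO = 6.02214076e23  # mol^-1

for n in (4, 6, 46):
    dim = n_csf(n, n, 0)  # singlet, n = N
    print(f"n=N={n:2d}: Ncsf = {dim:.3e}  ({dim / AVOGADRO:.2e} mol)")
```

The exponential growth of this count with n is what makes a method whose cost depends on log(Ncsf) attractive.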
As demonstrated last year [4], even a single product function (Nα=1) based on a 66 Shavitt graph is sufficient to dissociate the N≡N triple bond correctly to ground-state ⁴S atomic fragments, two product functions approach the full-valence CI electronic energy to within 3 mEh, and three product functions reproduce the full-valence CI energy exactly at all bond lengths. More recently, we have demonstrated [5] convergence of the total energy to within 1 kcal/mol of the full-CI limit, with only 13 expansion terms, along the curve for the symmetric dissociation of H₂O to its atomic fragments O(³P)+2H(²S).

2. Wave Function Optimization
Although the results in the previous section are very encouraging, the most remarkable feature of our new method is the relatively small effort required to construct hamiltonian matrix elements and transition density matrices in the product function basis [2,3]. Given a wave function expansion of the form

$$|\psi\rangle = \sum_{M=1}^{N_\alpha} c_M\,|M\rangle$$
in which the Nα product functions are denoted |M⟩, each basis function |M⟩ depends on a corresponding set of nonlinear parameters ϕ_M. The wave function therefore depends on both the linear expansion coefficients c and the nonlinear parameters ϕ. Optimizing the expansion coefficients c to minimize the energy leads to the symmetric generalized eigenvalue equation Hc = ScE, with H_MN = ⟨M|Ĥ|N⟩ and S_MN = ⟨M|N⟩. An efficient procedure to compute hamiltonian matrix elements and reduced one- and two-particle density matrices for this nonlinear expansion form has been developed [2]. The effort required to construct an individual hamiltonian matrix element H_MN between two product basis functions scales as O(βn⁴) for a wave function expanded in n molecular orbitals. The prefactor β itself scales between N⁰ and N² for N electrons, depending on the complexity of the underlying Shavitt graph. The corresponding metric matrix element S_MN = ⟨M|N⟩ requires effort that scales as O(βn). No component of the effort or storage for matrix element computation scales as Ncsf. Hamiltonian matrix element timings with our initial implementation of this method are very promising: wave function expansions that are orders of magnitude larger than can be treated with traditional CI methods require only modest effort. A matrix element involving product basis functions with n=N=46, corresponding to an underlying linear expansion space of dimension Ncsf≈5.5·10²⁴ (about 9.2 mol of CSFs), requires only a few seconds with our new method [24]. Computing the same hamiltonian matrix element with traditional full-CI technology is estimated to require about 10²⁴ seconds, roughly a million times longer than the age of the universe. Of course, a traditional full-CI code can compute H_MN = x_Mᵀ H x_N for arbitrary vectors x_M and x_N, whereas our new method is restricted to vectors that can be represented in our graph-based nonlinear expansion form.
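The linear part of the optimization, Hc = ScE, is an ordinary symmetric generalized eigenproblem in a non-orthogonal basis. The toy sketch below assumes the product functions can be written as explicit vectors in a small random "CSF" space (hypothetical data), which is exactly what the recursive method avoids; it illustrates only the algebra, using SciPy's `scipy.linalg.eigh`.

```python
import numpy as np
from scipy.linalg import eigh

rng = np.random.default_rng(0)
n_big, n_alpha = 200, 3   # toy CSF-space dimension and number of product functions

# Hypothetical full hamiltonian in the CSF basis (symmetric).
A = rng.standard_normal((n_big, n_big))
H_full = 0.5 * (A + A.T)

# Columns of X stand in for the product functions |M> expanded in CSFs;
# they are deliberately non-orthogonal.
X = rng.standard_normal((n_big, n_alpha))

H = X.T @ H_full @ X      # H_MN = <M|H|N>
S = X.T @ X               # S_MN = <M|N>

E, C = eigh(H, S)         # solves H c = S c E; columns satisfy c^T S c = 1
E0, c0 = E[0], C[:, 0]    # lowest root gives the optimal linear coefficients

psi = X @ c0              # the optimized wave function in the toy CSF basis
print("lowest root:", E0, " exact lowest eigenvalue:", np.linalg.eigvalsh(H_full)[0])
```

By the Rayleigh-Ritz principle the lowest root is an upper bound to the exact lowest eigenvalue; adding product functions (larger Nα) can only lower it.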
This restriction ultimately affects Nα, the number of basis functions required to converge to chemical accuracy. We note that our new method does not approximate H_MN in any way (e.g., with local density or fast multipole approximations); our recursive procedure computes exactly the same H_MN value as the traditional full-CI method. These timings show the tremendous potential of our new method: H_MN computation scales approximately like the simple Hartree-Fock method, yet the method is capable of approaching the accuracy of full-CI wave functions.

A major challenge presented by the new method is the optimization of the nonlinear parameters. A straightforward application of standard optimization methods, such as conjugate-gradient and quasi-Newton approaches, results in relatively slow convergence, requiring hundreds or thousands of gradient evaluations. Each gradient evaluation [3] requires O(βn⁵) effort, significantly more than an energy evaluation, so it is critical to improve these methods and to develop other optimization methods that require less effort. In general, we find that our analytic gradient computation is 10 to 100 times faster than a finite-difference approximation, and 10 to 100 times slower than a single energy computation. Fig. 4 in Ref. [4] shows the convergence of our initial implementation of various optimization methods. We have improved these methods in two different ways. First, there are sometimes arc factors that never contribute to a particular wave function because of point-group symmetry restrictions. In our initial optimizations, these arc factors were included in the optimization space. They produce gradient and hessian elements that are identically zero, and their inclusion in the optimization procedure hampered the convergence of the remaining arc factors. Pruning the corresponding arcs eliminates these arc factors from the optimization process and significantly improves the convergence of several standard methods, including quasi-Newton [8] and conjugate gradient [9]. Our current results for these methods are shown in Fig. 1, for the same molecule and initial starting point as Fig. 4 in Ref. [4].
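The cost gap between analytic and finite-difference gradients can be illustrated on a Rayleigh quotient E(x) = xᵀGx / xᵀSx, the form the energy takes with respect to the arc factors of a single orbital level (see the G[u], S[u] eigenproblem in this section). In this sketch, G and S are random hypothetical stand-ins for G[u] and S[u]; it checks the analytic gradient 2(Gx − ESx)/(xᵀSx) against central differences, which cost two energy evaluations per parameter.

```python
import numpy as np

def energy(x, G, S):
    """Rayleigh quotient E(x) = x^T G x / x^T S x."""
    return (x @ G @ x) / (x @ S @ x)

def analytic_grad(x, G, S):
    """dE/dx = 2 (G x - E S x) / (x^T S x), valid for symmetric G and S."""
    e = energy(x, G, S)
    return 2.0 * (G @ x - e * (S @ x)) / (x @ S @ x)

def finite_diff_grad(x, G, S, h=1e-6):
    """Central differences: two energy evaluations per parameter."""
    g = np.empty_like(x)
    for i in range(x.size):
        step = np.zeros_like(x)
        step[i] = h
        g[i] = (energy(x + step, G, S) - energy(x - step, G, S)) / (2.0 * h)
    return g

rng = np.random.default_rng(2)
p = 8                                  # hypothetical number of arc factors at one level
A = rng.standard_normal((p, p))
G = 0.5 * (A + A.T)                    # hypothetical symmetric G[u]
B = rng.standard_normal((p, p))
S = B @ B.T + p * np.eye(p)            # hypothetical positive-definite S[u]
x = rng.standard_normal(p)

g_exact = analytic_grad(x, G, S)
g_fd = finite_diff_grad(x, G, S)
print("max |analytic - FD| =", np.max(np.abs(g_exact - g_fd)))
print("energy evaluations for the FD gradient:", 2 * p)
```

The analytic gradient is orthogonal to x because E is invariant to the scale of x; a parameter that never enters E (such as a symmetry-forbidden arc factor) would likewise show an identically zero gradient component.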
Fig. 1. Convergence of the nonlinear wave function parameters using various optimization approaches for the O₃ molecule. The LBFGS curves are limited-memory quasi-Newton results with storage of various numbers of history vectors. The conjugate gradient curve uses analytic gradients both for determining the search direction and for the line searches. The substitution approach replaces the arc factors at a single level with optimal values computed from the G[u] eigenproblem. The Split approach uses contractions from the G[u] eigenproblem for all orbital levels to define a 12-D subspace of the full 147-D variational space.

The LBFGS curves in Fig. 1 use the limited-memory quasi-Newton method of Liu and Nocedal [8]. Each step of this procedure requires a gradient and an energy evaluation. Various numbers of history vectors are stored, and the curves show clearly that increasing the number of vectors improves convergence in this case. However, none of the convergence curves is entirely satisfactory. The conjugate gradient curve uses the CG_DESCENT code of Hager and Zhang [9]. This method requires on average two to three function evaluations per gradient evaluation. As with the quasi-Newton curves, convergence is relatively slow. The Newton curve in Fig. 1 is the result of an exact Newton method stabilized with rational-function shifts [10], with the hessian matrix approximated by finite differences of analytic gradients. This procedure is very expensive (each step requires approximately the same effort as an entire quasi-Newton or CG optimization), but it is included to show the convergence rate that a quasi-Newton or CG procedure might achieve with optimal preconditioning. The substitution curve is based on a special-purpose optimization method for this specific problem.
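The effect of the number of stored history vectors can be explored on a toy problem. This sketch minimizes a Rayleigh quotient with SciPy's L-BFGS-B (the bound-constrained variant of L-BFGS; with no bounds it acts as plain L-BFGS, and its `maxcor` option sets the number of stored correction pairs) and with SciPy's nonlinear CG (a Polak-Ribière variant, not the CG_DESCENT code used for Fig. 1). The matrix G is a hypothetical stand-in with eigenvalues spread from 1 to 100, not the O₃ arc-factor problem, so the counts it prints say nothing quantitative about Fig. 1.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(1)
n = 50
# Hypothetical symmetric test matrix with known eigenvalues 1, ..., 100.
Q, _ = np.linalg.qr(rng.standard_normal((n, n)))
G = Q @ np.diag(np.linspace(1.0, 100.0, n)) @ Q.T
e_exact = 1.0  # lowest eigenvalue by construction

def energy_and_grad(x):
    """Rayleigh quotient x^T G x / x^T x and its gradient (S = identity)."""
    xx = x @ x
    e = (x @ G @ x) / xx
    return e, 2.0 * (G @ x - e * x) / xx

x0 = rng.standard_normal(n)
results = {}
for m in (1, 3, 10):  # number of stored history (correction) pairs
    results[f"L-BFGS m={m}"] = minimize(
        energy_and_grad, x0, jac=True, method="L-BFGS-B",
        options={"maxcor": m, "maxiter": 10000, "ftol": 1e-15, "gtol": 1e-10})
results["CG"] = minimize(energy_and_grad, x0, jac=True, method="CG",
                         options={"gtol": 1e-8, "maxiter": 10000})

for name, r in results.items():
    print(f"{name:12s} E - E0 = {r.fun - e_exact:.1e}  "
          f"iterations = {r.nit}  function evaluations = {r.nfev}")
```

Because `jac=True`, each function evaluation returns both the energy and the gradient, which mirrors the per-step cost described above for the LBFGS curves.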
If all of the nonlinear arc factors in the Shavitt graph are frozen except those at a single orbital level u, then the optimal values of the arc factors α[u] at that level are given by the eigenvector of the lowest eigenpair of the generalized symmetric eigenproblem

$$G^{[u]}\,\alpha^{[u]} = S^{[u]}\,\alpha^{[u]}\,E$$

Furthermore, if these optimal arc factors are substituted for the original arc factors at level u, the computed energy is exactly the eigenvalue of this eigenproblem. That is, the Rayleigh quotient
$$E = \frac{\alpha^{[u]T}\,G^{[u]}\,\alpha^{[u]}}{\alpha^{[u]T}\,S^{[u]}\,\alpha^{[u]}}$$
is an exact representation of the energy as a function of this subset of nonlinear parameters. Computing the orbital-level matrices G[u] and S[u] is comparable in effort to a gradient evaluation. The substitution approach converges faster than the quasi-Newton and CG methods in the initial iterations, but its convergence rate then slows. Our final method in Fig. 1 uses the optimal α[u] values from the orbital-level eigenproblems to define sets of arc factors, one set for each orbital level u. These n sets of arc factors are then used as contractions to define an n-dimensional subspace of the full optimization space. In Fig. 1 the dimension is 12 because there are 12 valence orbitals. The set of arc factors at each orbital level u is defined as
$$\alpha^{[u]}(\theta_u) = \cos\theta_u\,\alpha_0^{[u]} + \sin\theta_u\,\alpha_1^{[u]}$$
where α₀[u] corresponds to the original arc factors at this optimization step and α₁[u] corresponds to the new set. The entire set of arc factors is then parameterized in terms of the n parameters θ. This subspace optimization has been performed with several methods, including conjugate gradient and exact Newton methods, but the most efficient approach so far is a greedy sequential optimization in which each θu is optimized in closed form from an eigenvalue equation of dimension 2Nα. This is efficient because each step costs only about three times as much as an energy evaluation, which is relatively cheap, and convergence is rapid, rarely requiring more than 1.5n to 2n steps. As seen in Fig. 1, the convergence of this method with respect to the number of gradient evaluations is much improved over the other methods. However, the convergence rate eventually slows, indicating that the contractions α₁ are less effective later in the iterative procedure than at the beginning. Consequently, we feel that this approach shows promise, but we still seek improvement. In the future, we plan to explore other optimization approaches, including those based on stepwise optimization of nested subgraphs and on various interpolation approaches [3,5,11].

3. Spin-Orbit Interaction
Our method was originally developed to compute energies and wave functions for the nonrelativistic spin-free electronic hamiltonian operator. We have begun to extend it to include the spin-orbit interaction, which is important for molecules containing transition metals, lanthanides, and actinides. Our initial approach will include the same relativistic core potentials and valence spin-orbit operators as those in the COLUMBUS Program System. In this approach we use multiheaded Shavitt graphs [6], in which each head corresponds to one of the spin states that interact through the spin-orbit operator. Fig. 2 shows an example of such a graph for the simple case of four electrons in four orbitals.
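The subblock dimensions quoted below for the multiheaded graphs follow from counting spin components. Assuming, as in the N=n=4 example of Fig. 2, that every spin state allowed by N and n is a head of the graph, and that each head with b=2S contributes 2S+1 M_S components, the subblock dimension is the sum of b+1 over the heads. The orbital count n=30 used for the 20-electron case is a hypothetical choice; any n ≥ 20 gives the same result.

```python
def spin_heads(n_elec: int, n_orb: int) -> list[int]:
    """Allowed b = 2S values for n_elec electrons in n_orb spatial orbitals.
    Assumption: every allowed spin state is a head of the graph."""
    max_b = min(n_elec, 2 * n_orb - n_elec)  # limited by electrons or holes
    return list(range(n_elec % 2, max_b + 1, 2))

def so_block_dim(n_elec: int, n_orb: int) -> int:
    """Hamiltonian subblock dimension: each head contributes 2S+1 = b+1 components."""
    return sum(b + 1 for b in spin_heads(n_elec, n_orb))

print(spin_heads(4, 4), so_block_dim(4, 4))           # the Fig. 2 case
print(len(spin_heads(20, 30)), so_block_dim(20, 30))  # 20 electrons, hypothetical n=30
```

For even N with n ≥ N this count is (N/2+1)², so the number of matrix elements produced per recursive step grows quadratically with the number of heads.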
The important feature of this approach is that the transition density matrix elements for this graph can be computed recursively with roughly the same effort as for the single-headed spin-free case, yet these elements can be combined with the hamiltonian integrals so that entire blocks of hamiltonian elements are computed simultaneously. For example, for the Shavitt graph shown in Fig. 2, each O(βn⁴) recursive step yields a hamiltonian subblock of dimension 9×9. For a 20-electron calculation (e.g., the valence orbitals of a molecule containing a lanthanide atom), each such step yields a hamiltonian subblock of dimension 121×121. Thus the cost per hamiltonian matrix element actually decreases in our approach with increasing molecular complexity. This feature is particularly appealing because with the traditional spin-orbit CI approach the cost increases dramatically with increasing complexity.

4. Conclusions
We have presented our latest results for a general formulation of energy-based optimization of the arc factors in our recently developed nonlinear expansion form for electronic wave functions. The energy-based optimization is formulated in terms of analytic energy gradients and orbital-level hamiltonian matrices, which correspond to a specific kind of uncontraction of each of the product basis functions. These orbital-level hamiltonian matrices give an intuitive representation of the energy as a function of disjoint subsets of the arc factors, they allow efficient computation of gradients of the energy with respect to the arc factors, and they allow optimal arc factors to be determined in closed form for subspaces of the full variational problem. Our analytic gradient computations are between 10 and 100 times faster than a finite-difference approximation. Our optimization procedures have improved over the past year, but we still seek better methods in order to handle molecules with 20 to 30 electrons efficiently. Energy and gradient computations with our new method require relatively little effort compared to other electronic structure methods. Timings for energy and arc factor gradient computations involving expansion spaces of over 10²⁴ CSFs have been reported. We have begun to extend our nonlinear expansion method to include spin-orbit interactions. An appealing feature of our method is that the cost per hamiltonian matrix element decreases as the number of interacting spin states increases. This suggests that molecules containing transition metal, lanthanide, and actinide atoms may be treated with our new method with a flexibility and accuracy that have not been possible before.
Fig. 2. Multiheaded Shavitt graph for N=4 and n=4. The heads correspond to the spin states that interact through the spin-orbit operator; in this example, singlets (b=0), triplets (b=2), and quintets (b=4). The arc factors at the top level of the graph are associated with individual spin states, while those at lower levels are shared by one or more spin states.

References
[1] R. Shepard, J. Phys. Chem. A 109, 11629 (2005).
[2] R. Shepard, J. Phys. Chem. A 110, 8880 (2006).
[3] R. Shepard and M. Minkoff, Int. J. Quantum Chem. 106, 3190-3207 (2006).
[4] R. Shepard, A. F. Wagner, and S. K. Gray, J. Phys.: Conf. Ser. 46, 239-243 (2006).
[5] R. Shepard, M. Minkoff, and S. R. Brozell, Int. J. Quantum Chem. (submitted, 2007).
[6] S. R. Brozell, R. Shepard, and Z. Zhang, Int. J. Quantum Chem. (submitted, 2007).
[7] I. Shavitt, The Graphical Unitary Group Approach and its Application to Direct Configuration Interaction Calculations, in The Unitary Group for the Evaluation of Electronic Energy Matrix Elements (Lecture Notes in Chemistry 22), J. Hinze, ed., pp. 51-99 (Springer-Verlag, Berlin, 1981).
[8] D. Liu and J. Nocedal, Mathematical Programming B 45, 503 (1989).
[9] W. W. Hager and H. Zhang, ACM Transactions on Mathematical Software 32, 113 (2006). http://www.math.ufl.edu/~hager/papers/CG.
[10] A. Banerjee, N. Adams, J. Simons, and R. Shepard, J. Phys. Chem. 89, 52 (1985).
[11] P. Pulay, Chem. Phys. Lett. 73, 393 (1980); P. Pulay, J. Comput. Chem. 3, 556 (1982).