THE JOURNAL OF CHEMICAL PHYSICS 130, 214110 (2009)

Computing vibrational energy levels by using mappings to fully exploit the structure of a pruned product basis

Jason Cooper a) and Tucker Carrington, Jr. b)
Chemistry Department, Queen's University, Kingston, Ontario K7L 3N6, Canada

a) Electronic mail: [email protected].
b) Electronic mail: [email protected]. Fax: 613-533-6669.

(Received 2 October 2008; accepted 1 May 2009; published online 4 June 2009)

For the purpose of calculating (ro-)vibrational spectra, rate constants, scattering cross sections, etc., product basis sets are very popular. They have, however, the important disadvantage that they are unusably large for systems with more than four atoms. In this paper we demonstrate that it is possible to efficiently use a basis set obtained by removing, from a product basis set, the functions associated with the largest diagonal Hamiltonian matrix elements. This is done by exploiting the fact that for every factor of every term in the Hamiltonian there is a basis-set order in which the matrix representation of the factor is block diagonal. Due to this block diagonality the Lanczos algorithm can be implemented efficiently. Tests with model Hamiltonians with as many as 32 coordinates illustrate the merit of the ideas. © 2009 American Institute of Physics. [DOI: 10.1063/1.3140272]

I. INTRODUCTION

The best approaches for solving the time-dependent and the time-independent Schrödinger equations make use of a basis set.1–6 Wave functions or wave packets are written as linear combinations of basis functions, and methods of linear algebra are used to determine the coefficients. The simplest basis functions are product functions: each is a product of functions of a single variable. For a D-dimensional system with vibrational coordinates {q_1, q_2, ..., q_D} a product basis function φ_{i_1,i_2,...,i_D} is

    \phi_{i_1,i_2,\ldots,i_D} = \alpha^{(1)}_{i_1}(q_1)\,\alpha^{(2)}_{i_2}(q_2)\cdots\alpha^{(D)}_{i_D}(q_D).    (1)

If all of the indices i_1, i_2, ..., i_D take values from 1 to n (in practice it is easy to use different upper limits for different coordinates), then one has a direct-product basis with n^D functions. For the purpose of dealing with important singularities it is advantageous to use a product basis that has factors with shared indices, a nondirect-product basis.7 A great advantage of a product basis is that it enables one to do matrix-vector products at a cost that scales as n^{D+1}.8 This is true regardless of the complexity of the kinetic energy operator (KEO) and the potential. Matrix-vector products must be evaluated to propagate a wave packet or to use an iterative eigensolver. The most important advantage of the product basis is simplicity. However, because a product basis includes all possible products of the one-dimensional (1D) functions, it is so large that it becomes unusable if D is larger than about 6; one must keep in memory vectors as large as the basis. Clearly many of the functions in a product basis are unimportant. It therefore seems obvious that one should remove some functions from the full product.


In principle this is easy to do; in practice it is not easy to remove functions without increasing the cost of a matrix-vector product.9–11 In this paper we present new ideas for using "pruned" basis sets, i.e., direct-product (DP) basis sets from which some functions have been removed. Pruning can reduce the size of a product basis by orders of magnitude. The pruned basis set is not a product basis set, although it is obtained from one. The key point of the paper is that even after pruning considerable structure remains: the matrix representing an operator that depends on a single variable is, if the basis functions are in the right order, block diagonal, and pruning does not ruin this block diagonality. Previous attempts to exploit this structure have used constrained sums10,12 or mapping arrays.8,9,11,13 We demonstrate the efficiency of two new approaches by computing energy levels for model problems with 6 to 32 coordinates.

II. MATRIX-VECTOR PRODUCTS IN THE PRUNED BASIS

There are various criteria for retaining basis functions. We choose to retain the basis functions with the smallest diagonal Hamiltonian matrix elements.14 More details will be presented in Sec. IV. With a basis of this type it is not in general possible to do matrix-vector products using the constrained-index idea of Refs. 10 and 12. In this section we focus on how to do matrix-vector products in a pruned basis. We write the Hamiltonian as

    H = \sum_{l=1}^{g} \prod_{d=1}^{D} \hat{t}^{(d,l)}(q_d),    (2)

and do matrix-vector products term by term.8 In this equation g is the number of terms in the Hamiltonian. The t^(d,l)(q_d) factors can be derivatives or functions, and in some terms many of them are unit operators. Terms for which all t^(d,l)(q_d) factors, except one, are identical are combined in what follows.
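To make the term-by-term scheme concrete, here is a minimal sketch, not the authors' code, of how one product term of Eq. (2) is applied to a vector in a full DP basis, where the vector is stored as an n × n × ... × n tensor. All names and sizes are hypothetical placeholders; each 1D contraction costs n^{D+1} operations.

```python
# Minimal sketch (not the authors' code) of the term-by-term matrix-vector
# product of Eq. (2) in a FULL direct-product basis.  The vector is an
# n x n x ... x n tensor; each 1D factor is an n x n matrix acting on one
# tensor index, so each contraction costs n**(D+1) operations.
import numpy as np

def apply_term(v, factors):
    """Apply one product term.  `factors` maps a coordinate index to the
    n x n matrix of its nonidentity 1D factor; identity factors are omitted."""
    for axis, t in factors.items():
        # contract t with tensor axis `axis`, then restore the axis order
        v = np.moveaxis(np.tensordot(t, v, axes=(1, axis)), 0, axis)
    return v

def apply_hamiltonian(v, terms):
    """`terms` is a list of factor dictionaries, one per term of Eq. (2)."""
    return sum(apply_term(v, factors) for factors in terms)

# toy example (hypothetical matrices): D = 3, n = 4, one two-factor term
n, D = 4, 3
rng = np.random.default_rng(0)
term = {0: rng.standard_normal((n, n)), 1: rng.standard_normal((n, n))}
w = apply_hamiltonian(rng.standard_normal((n,) * D), [term])
```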


TABLE I. Index mapping as described in Sec. II. On the left, the indices k_d^p for the elements of a pruned product basis set, labeled by p. On the right, the mappings p_i^d for each degree of freedom d, sorted as described in the text. An entry marked with a trailing underscore is the last member of a block.

    p   k_1^p  k_2^p  k_3^p        i   p_i^1  p_i^2  p_i^3
    1     1      1      1          1    1      1_     1
    2     2      1      1          2    2_     3_     3_
    3     1      1      2          3    3      2_     2
    4     2      1      2          4    4_     4      4_
    5     2      2      2          5    5_     5      5_
    6     2      3      2          6    6_     6      6_

All the curvilinear coordinate systems customarily used in three- and four-atom variational vibrational calculations have KEOs of this form. Frequently the potential is also written as a sum of products of functions of a single coordinate. In particular, this sum-of-products form facilitates multiconfiguration time-dependent Hartree calculations.15 There are several algorithms for fitting or refitting potentials so that they have this sum-of-products form.16,17 The use of a sum-of-products form is a common means of avoiding a full basis and/or a full grid. For larger molecules a simplified representation of the potential not only facilitates solving the vibrational Schrödinger equation, it is also essential, because a complete general potential cannot be extracted from a reasonable number of ab initio calculations.

To evaluate a matrix-vector product for a single term \prod_{d=1}^{D} \hat{t}^{(d,l)}(q_d) we successively apply N_r × N_r matrices, one for each of the t^(d,l)(q_d) factors, to a vector, where N_r is the number of retained basis functions. This is equivalent to inserting the operator that projects onto the retained basis between each of the factors. In many papers the sum-of-products structure has been exploited by using the fact that in a product basis the matrix representation of a product of operators is a tensor product of small matrices. In this paper we exploit this structure in a different way: we approximate the matrix representation of a product of operators with a product of matrix representations and, for each term, evaluate matrix-vector products factor by factor.

We have tested two (related) ideas for doing matrix-vector products for the factors. Both ideas exploit the fact that a t^(d,l)(q_d) factor couples only 1D basis functions for q_d. In the first method, basis functions are ordered so that matrices representing t^(d,l)(q_d) factors are block diagonal, and matrix-vector products are done by using a mapping array. We call this the block-diagonal method. Each retained product basis function is denoted φ_p, where p corresponds to a set of D indices (for the D 1D factors), denoted k_1^p, k_2^p, ..., k_D^p. The pth retained basis function is

    \phi_p = \prod_{d=1}^{D} \alpha^{(d)}_{k_d^p}(q_d).    (3)

The correspondence between p and sets of (k_1^p, k_2^p, k_3^p) values is given on the left side of Table I for a three-dimensional (3D) problem. In this example the basis functions are ordered so that the matrix representing t^(1,l)(q_1) is block diagonal (with 2 × 2 diagonal blocks). The block diagonality of the matrix reduces the cost of multiplying it onto a vector. Clearly, with this basis-function order the matrix representing t^(2,l)(q_2) is not block diagonal, and applying it to a vector would be more expensive. There is, however, for each d, a basis-function order that makes the matrices representing operators t^(d,l)(q_d) block diagonal. An element of the N_r × N_r matrix representing t^(d,l)(q_d) is

    \langle \phi_p | \hat{t}^{(d,l)}(q_d) | \phi_q \rangle
      = \langle \alpha^{(d)}_{k_d^p} | \hat{t}^{(d,l)}(q_d) | \alpha^{(d)}_{k_d^q} \rangle
        \prod_{e \neq d} \langle \alpha^{(e)}_{k_e^p} | \alpha^{(e)}_{k_e^q} \rangle    (4)

      = \langle \alpha^{(d)}_{k_d^p} | \hat{t}^{(d,l)}(q_d) | \alpha^{(d)}_{k_d^q} \rangle
        \prod_{e \neq d} \delta_{k_e^p, k_e^q},    (5)

assuming that the |α^(d)_{k_d}⟩ are orthogonal. The matrix is block diagonal; the diagonal blocks are labeled by the indices k_e with e ≠ d. Because a block-diagonal matrix-vector product is much less costly than a general matrix-vector product, it would clearly be advantageous to be able to apply each factor in block-diagonal form.

For each coordinate q_d we choose a basis-function order that makes the N_r × N_r matrices representing t^(d,l)(q_d) block diagonal. The ideas are illustrated in Table I for a small (N_r = 6) 3D basis. First, the basis functions, in some original order, are labeled by p; the left side of Table I shows an original order. For all of the orderings (i.e., for all the coordinates) the index i labels successive basis functions, i = 1, ..., N_r. For each coordinate we must associate each value of i with a value of p; it is therefore necessary to create D separate sets of indices p_i^d. When rows and columns of the matrix are labeled by i, the N_r × N_r matrices of all t^(d,l)(q_d) are block diagonal. The right side of Table I contains the i → p mappings required to make the matrix of a function of any of the three coordinates block diagonal. For example, the last column indicates that to block diagonalize the matrix representing a function of q_3 one should use the order (k_1, k_2, k_3) = (1,1,1), (1,1,2), (2,1,1), .... Underscored entries in a column on the right side indicate the end of a block. In Sec. V we demonstrate that the cost of creating, storing, and using these lists is acceptable. Neither p nor i needs to be stored; for this example we would store only the three rightmost columns on both sides of the table. For coordinate d, the required ordering is determined by sorting the p so that the associated sets {k_e}, ignoring the index with e = d, occur in lexicographic order. The mappings need only be calculated once, and only one such mapping needs to be stored for each degree of freedom. When applying each 1D operator, the corresponding mapping is traversed sequentially, an operation that can be performed very efficiently.
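As an illustration of the block-diagonal mapping method, the sketch below (our own reconstruction, not the published code; the array names are invented) builds, for a chosen coordinate d, the sorted index list p_i^d and the block boundaries from the table of retained indices k_d^p, and then applies the block-diagonal matrix of one 1D factor to a vector. Applied to the left side of Table I it reproduces the mappings on the right side.

```python
# Reconstruction (not the published code) of the block-diagonal mapping
# method.  `keys` is the Nr x D integer array of retained 1D indices k_d^p,
# i.e., the left side of Table I; all array names are invented.
import numpy as np

def build_mapping(keys, d):
    """Return the i -> p mapping p_i^d and the block boundaries for
    coordinate d: functions differing only in k_d become contiguous."""
    others = np.delete(keys, d, axis=1)          # the indices k_e, e != d
    perm = np.lexsort(others.T[::-1])            # lexicographic sort over them
    sorted_others = others[perm]
    change = np.any(sorted_others[1:] != sorted_others[:-1], axis=1)
    starts = np.concatenate(([0], np.nonzero(change)[0] + 1, [len(perm)]))
    return perm, starts

def apply_factor(t1d, kd, perm, starts, v):
    """Multiply the block-diagonal Nr x Nr matrix of one 1D factor (1D
    matrix `t1d`) onto the vector v; kd = keys[:, d]."""
    w = np.zeros_like(v)
    for b in range(len(starts) - 1):
        idx = perm[starts[b]:starts[b + 1]]      # basis functions in block b
        k = kd[idx]                              # their 1D indices
        w[idx] = t1d[np.ix_(k, k)] @ v[idx]      # dense block times subvector
    return w

# the left side of Table I (0-based indices) reproduces its right side:
keys = np.array([[0, 0, 0], [1, 0, 0], [0, 0, 1],
                 [1, 0, 1], [1, 1, 1], [1, 2, 1]])
perm, starts = build_mapping(keys, 2)            # perm + 1 -> 1 3 2 4 5 6
```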


The second method is a modified version of compressed row storage (CRS).18 If we used standard CRS for each factor, it would be necessary to store a different mapping array for every factor; using standard CRS would therefore be more costly than the block-diagonal mapping method of the previous paragraph. Rather than storing vectors that specify the positions of the nonzero elements of a particular t^(d,l)(q_d) factor, we store vectors that specify the positions of all elements for which the 1D indices match in every coordinate e ≠ d. This requires larger vectors than would be necessary for a single factor using standard CRS, but it has the advantage that only one set of vectors needs to be stored for each coordinate. Instead of storing the matrix elements themselves, we store two indices that enable us to retrieve the matrix element from each 1D matrix. This modified CRS approach requires slightly more storage than the block-diagonal mapping method, but it also works well.

When the factors of a term are represented in the pruned basis and applied sequentially, the order in which they are applied matters. For example, t^(d,l)(t^(d',l)v) and t^(d',l)(t^(d,l)v), where v is a vector, are not equal. To apply a product, for example t^(d,l)(q_d) t^(d',l)(q_{d'}), we insert a basis-set resolution of the identity between the two operators. Because the basis set is not complete, this introduces an approximation. This sort of approximation is called a "product approximation" in Refs. 3 and 19. The order matters because applying the first (rightmost) operator to a function can create a function that is not a linear combination of the pruned-basis functions, but becomes a linear combination of the pruned functions when the second operator is applied. Because this "leakage" is discarded, the order becomes important. For example, consider a two-dimensional (2D) problem for which the retained basis is composed of (n_1, n_2) ∈ {(0,0), (0,1), (1,0), (2,0), (1,1), (0,2)}. If a_2 a_1^† (where a_1^† and a_2 are raising and lowering operators20) is applied to a function F(q_1, q_2) that is a linear combination of the retained basis functions, one obtains another linear combination of the basis functions. However, if one first computes G(q_1, q_2) = a_1^† F(q_1, q_2) and retains only its projection on the basis, and then applies a_2 to G(q_1, q_2), one makes an error because G(q_1, q_2) "leaks" out of the basis. Due to this leakage, matrix representations of factors that commute in a DP basis do not commute in a pruned basis, and products that are symmetric in a DP basis may not be symmetric in a pruned basis. To ensure that the matrix for which we compute eigenvalues is symmetric, we make the Hamiltonian operator explicitly Hermitian: each product of 1D operators ABC... is replaced with (1/2)(ABC... + ...CBA). Computation time can be reduced somewhat by combining terms with identical t^(d,l)(q_d) factors; for example, q_1^2 + q_1^2 q_2 can be replaced by q_1^2(1 + q_2).
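A sketch of the modified CRS idea follows, reusing the invented names (`keys`, `build_mapping`) of the previous sketch. For each coordinate d we store, once, the row and column positions of every element that can be nonzero for any factor t^(d,l), plus the pair of 1D indices (kp, kq) used to fetch the actual element from a given factor's 1D matrix; the index vectors are shared by all factors of that coordinate.

```python
# Sketch of the modified CRS idea, under the same assumptions and invented
# names as the previous sketch.
import numpy as np

def build_crs(kd, perm, starts):
    rows, cols, kp, kq = [], [], [], []
    for b in range(len(starts) - 1):
        idx = perm[starts[b]:starts[b + 1]]
        for p in idx:                    # all pairs within a block may couple
            for q in idx:
                rows.append(p); cols.append(q)
                kp.append(kd[p]); kq.append(kd[q])
    return tuple(np.array(a) for a in (rows, cols, kp, kq))

def apply_factor_crs(t1d, crs, v):
    rows, cols, kp, kq = crs
    w = np.zeros_like(v)
    # one sequential pass; t1d changes from factor to factor, but the index
    # vectors (built once per coordinate) do not
    np.add.at(w, rows, t1d[kp, kq] * v[cols])
    return w
```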

III. MODEL PROBLEMS

We demonstrate the efficiency of the methods using two model Hamiltonians. The approaches we propose will be most useful when one wishes to do quantum calculations for a Hamiltonian that does not have a huge number of terms. The cost of the calculation scales linearly with the number of terms in the Hamiltonian; this is of course true for many methods that exploit the sum-of-products form. We have computed levels for H_L, for which the number of terms is large when D is large, and for H_S, which has, for a given D, fewer terms. For H_L we can check our results because exact energies are easily obtained. For H_S exact energies are not available; H_S is used to study the scaling and computational cost of the method. To make H_L we start, as in Ref. 21, with an uncoupled Hamiltonian of the form

    H_{\mathrm{sep}}(\mathbf{x}) = \sum_{n=1}^{D} -\frac{1}{2}\frac{\partial^2}{\partial x_n^2}
      + \sum_{n=1}^{D} \left[ c_2 x_n^2 + c_3 x_n^3 + c_4 x_n^4 \right],    (6)

and then make a coordinate transformation. We use c_2 = −1.0, c_3 = −0.2, and c_4 = 0.1. Because it is separable, exact energies can be obtained for this Hamiltonian by solving 1D problems in a large sinc discrete variable representation (DVR) basis.22 The coupled Hamiltonian H_L(q) is obtained by writing x as

    \mathbf{x} = \mathbf{M}\mathbf{q},    (7)

where M is a square matrix with unit determinant. We set M = kT, where

    T_{ij} = \begin{cases} 1 & i = j, \\ a & i \text{ and } j \text{ adjacent}, \\ 0 & \text{otherwise}, \end{cases}    (8)

with "i and j adjacent" taken in a cyclic sense, i.e., i and j are adjacent if either |i − j| = 1 or |i − j| = D − 1. The constant k is chosen to give det(M) = 1. The coupled Hamiltonian obtained is not separable into two or more uncoupled problems. Our choice of transformation matrix T differs from that of Ref. 21, where all off-diagonal elements of T take the value a; we have chosen the transformation matrix to give a manageable number of terms in the sum-of-products Hamiltonian. Even so, for a 32-dimensional (32D) problem, the kinetic energy (which contains cross terms ∂²/∂q_i∂q_j for all pairs {i, j} of degrees of freedom) has 528 terms, most of them a product of two 1D operators, while the potential energy has 608 terms. The larger the number of terms, the more computer time is required to obtain converged energy levels. For most high-dimensional problems of practical importance one will not have enough information to build such a comprehensive Hamiltonian, and we have therefore also done tests with a Hamiltonian H_S with fewer terms. It is obtained by making the coordinate transformation only for the potential terms in Eq. (6) and simply replacing x_i with q_i in the KEO; the KEO of H_S therefore has no cross terms. The energy levels of H_S are not the same as those of Eq. (6), so exact solutions cannot be obtained.
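The construction is easy to reproduce. The sketch below (assumed parameter values; ħ = m = 1) builds the cyclic transformation matrix M = kT of Eqs. (7) and (8) with det(M) = 1, and computes exact 1D levels of H_sep with a sinc DVR (Ref. 22). Because H_sep is separable, exact levels of H_sep (and hence of H_L, which is the same operator in transformed coordinates) are sums of D of these 1D eigenvalues.

```python
# Sketch (assumed parameters; hbar = m = 1) of the model construction of
# Eqs. (6)-(8): the cyclic matrix M = kT and exact 1D levels of H_sep from
# a sinc DVR (Colbert-Miller, Ref. 22).
import numpy as np

def transformation_matrix(D, a):
    T = np.eye(D)
    for i in range(D):                       # cyclic nearest-neighbour coupling
        T[i, (i + 1) % D] = T[i, (i - 1) % D] = a
    k = np.linalg.det(T) ** (-1.0 / D)       # det(M) = k**D det(T) = 1
    return k * T                             # assumes a small enough: det(T) > 0

def sinc_dvr_levels(c2=-1.0, c3=-0.2, c4=0.1, npts=400, xmax=9.0):
    x = np.linspace(-xmax, xmax, npts)
    dx = x[1] - x[0]
    dij = np.subtract.outer(np.arange(npts), np.arange(npts))
    with np.errstate(divide="ignore"):       # diagonal is overwritten below
        K = (-1.0) ** dij / (dx * dij.astype(float)) ** 2
    np.fill_diagonal(K, np.pi ** 2 / (6.0 * dx ** 2))
    H = K + np.diag(c2 * x ** 2 + c3 * x ** 3 + c4 * x ** 4)
    return np.linalg.eigvalsh(H)             # 1D levels of -1/2 d2/dx2 + v(x)
```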

IV. THE BASIS SET

We retain functions from a product basis set each of whose functions is a product of 1D eigenfunctions. The 1D eigenfunctions for each q_k are obtained by evaluating the potential of H(q) at q_j = 0, j ≠ k, and retaining in the KEO only the term with ∂²/∂q_k². The 1D eigenfunctions are computed in a DVR primitive basis set, so that there is no restriction on the form of the 1D Hamiltonians H^(i)(q_i).3,23


Other choices for the 1D basis set, such as eigenstates of the 1D Hamiltonians in Eq. (6), might yield better convergence for the full-dimensional problem, but we prefer our choice because it is general and does not take advantage of the separability of H_sep(x). It is possible that pruning a product basis set made of products of simultaneous-diagonalization21,24 or wavelet25 functions might enable one to make an even smaller pruned basis.

From the full DP basis, we retain the N_r basis functions φ_p for which the diagonal elements ⟨φ_p|H|φ_p⟩ are smallest. If D is large it is far too costly to compute all the diagonal elements and select those that are smallest; this is true even if the maximum index for each set of 1D functions is quite small. Instead, we employ a method similar to that used in Refs. 24 and 25 for a 16D problem. The Hamiltonian is divided into three pieces, H = H_low + H_high + H_coup, where H_low includes all terms involving only q_n with 0 < n ≤ D/2, H_high includes all terms involving only q_n with D/2 < n ≤ D, and H_coup includes the remaining terms, which couple coordinates in the two groups. This separation can be applied recursively, subdividing H_low and H_high until they are of a tractable dimensionality D′. We use D′ = 4 and denote by level = f the number of divisions required to attain D′ = 4. For each of the smallest problems H′ ∈ {H_low, H_high}, we then determine the N_r′ product basis functions of dimension D′ for which the diagonal elements of H′ are lowest. Products of these basis functions are then used to determine which functions to retain for the Hamiltonians of level = f − 1. Working back in this fashion we obtain the N_r product functions for the full problem. The size N_r′ of the intermediate basis sets must be chosen such that (1) no basis function that would otherwise enter into the final basis set is omitted, but (2) each intermediate set of N_r′ × N_r′ product functions is of a manageable size. If the coupling is strong and N_r′ is not large enough, one could miss low-energy basis functions. Here, as in Ref. 24, we use N_r′ = 5000.
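A schematic version of this hierarchical pruning is sketched below. It is not the authors' implementation: the routine `diag_fn`, which returns the diagonal elements of the sub-Hamiltonian spanning a given group of coordinates, is an assumed user-supplied function, and a single retention size is used at every level for simplicity (the paper uses N_r′ = 5000 at intermediate levels and N_r at the final one).

```python
# Schematic sketch of the hierarchical pruning (not the authors' code).
# `diag_fn(coords, keys)` is an ASSUMED user-supplied routine returning the
# diagonal elements <phi_p|H'|phi_p> of the sub-Hamiltonian H' spanning the
# listed coordinates, for each row of 1D indices in `keys`.
import numpy as np
from itertools import product

def prune(coords, n1d, diag_fn, nkeep):
    """Recursively select the nkeep product functions of `coords` with the
    smallest diagonal elements."""
    if len(coords) <= 4:                      # tractable dimensionality D' = 4
        keys = np.array(list(product(range(n1d), repeat=len(coords))))
    else:
        half = len(coords) // 2
        low = prune(coords[:half], n1d, diag_fn, nkeep)
        high = prune(coords[half:], n1d, diag_fn, nkeep)
        # candidates: all nkeep x nkeep products of the two retained halves
        keys = np.array([np.concatenate(pq) for pq in product(low, high)])
    d = diag_fn(coords, keys)
    return keys[np.argsort(d)[:nkeep]]
```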

V. RESULTS

Using the ideas outlined in the previous sections we have computed energy levels of Hamiltonians with between 6 and 32 coordinates. We present numerical results obtained with the block-diagonal method; the modified CRS results and timings are similar. The mere fact that it is possible to compute levels for a 32D Hamiltonian indicates that the methods are powerful. However, we begin this section with a discussion of a 6D test, because in 6D we can also use a DP basis and thereby test and assess the efficiency of the mapping procedures.

For these tests we use H_L with a = 0.1. A basis of 14 1D slice eigenfunctions, computed in a sinc DVR with 400 functions in the range x ∈ (−9, +9), is used for each coordinate. The Hamiltonian is a sum of 75 terms, with a total of 162 1D factors. Pruned-basis results were obtained for this Hamiltonian using 1000 iterations of the Lanczos algorithm, with an initial vector that has a one in the entry corresponding to one of the basis functions with the lowest diagonal elements and zeros in all other entries. Lanczos vectors were not orthogonalized.
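A bare-bones version of this procedure might look as follows; `matvec` stands for the pruned-basis matrix-vector product of Sec. II, and the duplicate-agreement filter is a simplified stand-in for the ten-significant-figure test described below.

```python
# Bare-bones sketch: Lanczos without reorthogonalization, followed by a
# simplified duplicate filter (converged eigenvalues appear as agreeing
# copies; isolated values are treated as spurious).  Parameters are
# illustrative.
import numpy as np
from scipy.linalg import eigh_tridiagonal

def lanczos_levels(matvec, v0, niter=1000, rel_tol=1e-10):
    alpha, beta = [], []
    v_prev, b = np.zeros_like(v0), 0.0
    v = v0 / np.linalg.norm(v0)
    for _ in range(niter):
        w = matvec(v) - b * v_prev
        a = np.dot(v, w)
        w -= a * v
        b = np.linalg.norm(w)
        alpha.append(a); beta.append(b)
        v_prev, v = v, w / b
    ev = eigh_tridiagonal(np.array(alpha), np.array(beta[:-1]),
                          eigvals_only=True)
    # keep one entry per adjacent pair of agreeing eigenvalues
    return [e for i, e in enumerate(ev[:-1])
            if abs(ev[i + 1] - e) <= rel_tol * abs(e)]
```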

TABLE II. Calculated energies for the first few levels of a 6D H_L problem with a = 0.1 in: a pruned basis with N_r = 30 000 basis functions; and a direct-product (DP) basis with 14 basis functions per degree of freedom and 7 529 536 functions. Exact levels in the last column are obtained from a separable Hamiltonian. Figures in parentheses are the degeneracies of the exact levels. Minimum and maximum values are the smallest and largest eigenvalues in a cluster associated with an exact level. Where no maximum value is indicated, max = min.

           N_r = 30 000                DP, n = 14
         Min          Max           Min          Max            Exact
    −30.928 54        ...       −30.928 76        ...       −30.928 75 (1)
    −28.721 44   −28.719 32     −28.722 00   −28.722 00     −28.722 00 (6)
    −26.810 71   −26.806 93     −26.814 14   −26.814 14     −26.814 16 (6)
    −26.513 65   −26.497 64     −26.515 24   −26.515 24     −26.515 24 (15)
    −26.083 89   −26.083 64     −26.098 53   −26.098 53     −26.100 97 (6)
    −25.351 98   −25.349 20     −25.368 82   −25.368 09     −25.371 19 (6)

Because a block algorithm was not used, degeneracies are not obtained in these calculations.26,27 Spurious eigenvalues were eliminated by including in our results only those energies for which duplicates agree to at least ten significant figures. In Table II we compare the first few levels computed with a basis of 30 000 retained functions to those computed with a full DP basis set with n = 14 functions for each coordinate (i.e., ~7.5 × 10^6 functions), and to the corresponding exact values. Both the DP and the pruned-basis energies were obtained using the Lanczos algorithm. Even after the removal of copies introduced by the Lanczos algorithm, the DP and pruned eigenvalues that correspond to an exact degenerate eigenvalue occur in tight clusters; for each cluster we report maximum and minimum values in the table.

Why do we find clusters? First, what degeneracy pattern does one expect? The Hamiltonian operator commutes with all cyclic permutations of the D coordinates, and its symmetry group is therefore isomorphic with the point group C_D. For odd D, this gives a 1D irreducible representation (irrep) A and (D − 1)/2 2D irreps E_n. For even D, there are two 1D irreps, A and B, and (D/2 − 1) 2D irreps E_n. The degeneracies of the exact values in Table II are therefore largely accidental, i.e., groups of levels that are degenerate by symmetry are themselves accidentally degenerate. The DP eigenvalues occur in clusters due to finite-basis effects: groups of levels that are degenerate by symmetry, and accidentally degenerate when levels are exact, merge as the size of the DP basis is increased, but if the basis is not large enough they appear as clusters.

For the pruned-basis calculation the clusters are broader, for two reasons. First, we retain 30 000 functions but make no attempt to ensure that all basis functions that can be obtained by applying symmetry operations to functions in the pruned basis are also in the pruned basis. This breaks the symmetry, and therefore even the degeneracy imposed by the symmetry of the Hamiltonian operator will not be perfect. Second, although we write the Hamiltonian in explicitly Hermitian form (see Sec. II, where we discuss loss of symmetry) so that the matrix for which we compute eigenvalues is symmetric, leakage causes an effective noncommutativity of matrix representations of factors that should commute.


TABLE III. Calculated energies for the first few levels of a 32D H_L problem with a = 0.02 in a pruned basis with 300 000 basis functions. Results are compared against diagonal values of the Hamiltonian, the results of second-order perturbation calculations, and the exact levels. Figures in parentheses give the degeneracies of the exact levels. Min and Max have the same meaning as in Table II.

          H_L diagonal                 Second order                N_r = 300 000
         Min          Max            Min          Max            Min          Max             Exact
    −163.580 67       ...       −165.209 71       ...       −164.951 62       ...       −164.953 38 (1)
    −161.386 13       ...       −162.999 36  −162.999 30    −162.737 36  −162.727 40    −162.746 62 (32)
    −159.498 80       ...       −161.069 11  −161.068 62    −160.798 48  −160.785 29    −160.838 78 (32)
    −159.191 58  −159.190 32    −160.788 84  −160.742 67    −160.407 00  −160.337 70    −160.539 87 (496)
    −158.854 45       ...       −160.239 90  −160.233 28    −159.970 62  −159.962 93    −160.125 60 (32)
    −158.087 12       ...       −159.500 11  −159.497 19    −159.225 53  −159.216 97    −159.395 82 (32)

The retained-basis Hamiltonian matrix therefore does not have the same symmetry group as the Hamiltonian operator. If D is small this problem can be rectified by replacing each term with a sum of terms obtained by applying all permutations of the 1D operators, but even for H_L with D = 6 this is costly.

Despite the fact that we must compare DP and pruned-basis clusters, it is clear that the pruned basis with only 30 000 functions gives results which are very nearly as good as those of the full DP basis with 14^6 = 7 529 536 functions. Whereas the pruned-basis calculation requires a little over 47 × 10^6 double-precision multiplications per Lanczos iteration, the full direct-product calculation requires 17 × 10^9; in six dimensions that is already 360 times as many. It is not surprising that not all of the n^D functions of a DP basis are necessary; what is not obvious is that it is possible to use the pruned basis efficiently. To test the efficiency of the block-diagonal mapping method we use the algorithm and program used to generate the pruned results of Table II, but retain all the basis functions, and compare the cost of the calculation with the cost of a DP calculation that fully exploits the product structure of the basis. By exploiting the product structure one can do matrix-vector products by summing only over 1D indices.8 If a given term l has b_l nonidentity t^(d,l) matrices, the cost of the matrix-vector multiplication for that term is b_l n^{D+1}. If the matrix-vector product is instead done with the approach of Sec. II, by applying n^D × n^D block-diagonal matrices for each factor, one expects the cost to increase. We have tested the method of Sec. II with several DP basis sizes and, for all sizes tested, it requires only about 60% more computer time than the sequential-summation DP approach; the overhead is low. Of course, one would not use the approach of Sec. II if none (or few) of the product basis functions were discarded. Mapping approaches are important because a very large fraction of the basis functions can be discarded, and the fraction increases as D increases. For the H_L calculation with D = 6 the fraction of retained basis functions is less than 0.04 (and for the 32D calculation the fraction is about 10^−40). The huge reduction in basis size far outweighs the overhead incurred.

There is no need to use a pruned basis set for a 6D Hamiltonian. To demonstrate the power of the sequential application of block-diagonal matrices we apply it to compute low-lying levels of a 32D Hamiltonian. With diffusion Monte Carlo28,29 one can compute the ground state of high-dimensional problems, but there are no other methods with which one can compute many levels of 32D Hamiltonians. In Table III we present the low-lying levels of H_L with D = 32 and a = 0.02. The pruned-basis results are compared against the exact energies, the diagonal elements, and second-order perturbation theory results for the Hamiltonian H_L. Diagonal elements are simple approximations for the energy levels, so the difference between a diagonal element and an exact level is a measure of the effect of the coupling. The first few energy levels (actually 65 eigenvalues, due to degeneracy) computed in the retained basis are significantly better than those obtained from second-order perturbation theory, while subsequent levels are comparable. Note that the cost of the second-order perturbation theory calculation is similar to the cost of doing the Lanczos calculation, because to apply second-order perturbation theory one must do N matrix-vector products to get N levels. Because the degeneracy of the diagonal elements is not necessarily the same as the degeneracy of the exact eigenvalues, more than one diagonal element may correspond to an exact eigenvalue.

approaches are important because a very large fraction of the basis functions 共and the fraction increases as D increases兲 can be discarded. For the HL calculation with D = 6 the fraction of retained basis functions is less than 0.04 共and for the 32D calculation the fraction is about 10−40兲. The overhead incurred clearly does not compensate for the huge reduction in basis size. There is no need to use a pruned basis set for a 6D Hamiltonian. To demonstrate the power of the sequential application of block-diagonal matrices approach we apply it to compute low-lying levels of a 32D Hamiltonian. With diffusion Monte Carlo28,29 one can compute the ground state of high dimensional problems, but there are no other methods with which one can compute many levels of 32D Hamiltonians. In Table III we present the low-lying levels of HL with D = 32 and a = 0.02. The pruned-basis results are compared against the exact energies, the diagonal elements, and second-order perturbation theory results for the Hamiltonian HL. Diagonal elements are simple approximations for the energy levels, so that the difference between a diagonal element and an exact level is a measure of the effect of the coupling. The first few energy levels 共actually 65 eigenvalues due to degeneracy兲 computed in the retained basis are significantly better than those obtained from second-order perturbation theory, while subsequent levels are comparable. Note that the cost of the second-order perturbation theory calculation is similar to the cost of doing the Lanczos calculation, because to apply second-order perturbation theory one must do N matrix-vector products to get N levels. Because the degeneracy of the diagonal elements is not necessarily the same as the degeneracy of the exact eigenvalues more than one diagonal element may correspond to an exact eigenvalue. The calculations were done using a pruned basis con-

TABLE IV. Timing results for the exactly solvable model Hamiltonian H_L. The number of 1D operations per Lanczos iteration is 272, 704, 1664, and 4352 for the 4D, 8D, 16D, and 32D problems, respectively.

               Time per 1D operation, ms (percentage of total calculation time)
    N_r          D = 4             D = 8             D = 16            D = 32
    300       0.018 43 (44%)    0.011 01 (74%)    0.009 898 (72%)   0.009 087 (70%)
    3000      0.3016 (90%)      0.1509 (82%)      0.1228 (78%)      0.1248 (78%)
    30 000    5.312 (91%)       2.230 (80%)       2.102 (78%)       2.072 (78%)
    300 000   180.2 (81%)       126.8 (76%)       106.9 (74%)       120.4 (76%)


TABLE V. Timing results for the model Hamiltonian H_S. The number of 1D operations per Lanczos iteration is 248, 592, 1184, and 2368 for the 4D, 8D, 16D, and 32D problems, respectively.

               Time per 1D operation, ms (percentage of total calculation time)
    N_r          D = 4             D = 8             D = 16            D = 32
    300       0.018 43 (82%)    0.011 10 (74%)    0.009 904 (72%)   0.009 064 (71%)
    3000      0.3004 (90%)      0.1476 (82%)      0.1205 (78%)      0.1209 (78%)
    30 000    5.299 (91%)       2.722 (81%)       1.684 (74%)       1.835 (77%)
    300 000   163.8 (81%)       129.4 (76%)       95.61 (72%)       104.8 (73%)

The calculations were done using a pruned basis consisting of 300 000 basis functions and 2000 Lanczos iterations. The full calculation, including basis-set pruning, took a little more than 12 days on a single node of the SGI Altix 4700 computer (which has dual-core 1.6 GHz Itanium 2 processors) of the Réseau québécois de calcul de haute performance (RQCHP). It should be emphasized that no attempt was made to exploit the symmetry of the Hamiltonian; indeed, the same code could be applied to less symmetrical problems of the same type (using a more complicated q to x transformation matrix, for example). The quality of the levels we compute could be improved by retaining more basis functions, although doing so would increase the computer time required. Note that, unlike most basis-set methods, memory cost is not a serious problem: even the 32D problem requires only a few hundred megabytes of memory.

Tables IV and V present information about how the cost of the block-diagonal mapping approach scales with D. The average time required for the application of a single t^(d,l) factor is given for various basis sizes and D values; for each table entry 10 Lanczos iterations were performed. Only part of the total time is required for applying the t^(d,l) factors; the percentage is indicated. The rest of the time is associated with the other vector operations (the computation of scalar products and the addition of vectors) involved in the Lanczos method. All of these calculations were done with 25 1D functions for each degree of freedom, on the RQCHP Altix. To ensure that computer times for different calculations are comparable, when we did the 10 Lanczos iterations we also recorded the time required to do 10^9 double-precision multiplications; these times vary by less than a part in a thousand.

FIG. 1. Scaling of the average time for applying a single 1D operator represented in a pruned basis with N_r functions for (a) the H_L and (b) the H_S models. For each value of N_r, points are plotted for models with (from top to bottom) 4, 8, 16, and 32 degrees of freedom. The line plotted in both cases is t = 3 × 10^−9 N_r^1.35.

For both H_L and H_S, the time per t^(d,l) factor shows no strong trend with increasing dimensionality, though it generally decreases somewhat as the dimensionality increases (probably due to the diminishing importance of overhead). The times for H_L and H_S are very similar. In almost all cases, the application of the 1D operators takes 70%–80% of the total calculation time. As expected, the time per t^(d,l) factor increases quickly with N_r. The matrix representing t^(d,l) is always applied in block-diagonal form. If the size of these blocks were independent of N_r, only the number of blocks would increase as N_r is increased, and one would expect the computational time to scale linearly with N_r. In practice this is not the case: the inclusion of a larger number of basis functions results in a larger average block size, and we find that the computational cost scales as N_r^{1+α} with α > 0. In Fig. 1 we present timing results for both the H_L and H_S models. For both models, and for all values of D we tested, the average time for the application of a t^(d,l) factor is fit well by taking α = 0.35.
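As a quick consistency check (ours, not from the paper), the log-log slope of the D = 32 column of Table IV reproduces the fitted exponent 1 + α:

```python
# Log-log slope of time per 1D operation versus basis size, using the
# D = 32 column of Table IV; should be close to the fitted 1 + alpha.
import numpy as np

nr = np.array([300.0, 3000.0, 30000.0, 300000.0])
t_ms = np.array([0.009087, 0.1248, 2.072, 120.4])   # time per 1D operation
slope = np.polyfit(np.log10(nr), np.log10(t_ms), 1)[0]
print(f"scaling exponent ~ {slope:.2f}")            # ~1.36, close to 1.35
```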

VI. CONCLUSION

Product basis sets are very popular. They are easy to build and, due to the structure of the basis, product-basis matrix-vector products are efficient.8,30,31 For a problem with three or four atoms, it is fairly straightforward to write a product-basis computer program to calculate spectra, cross sections, rate constants, etc. For most systems with more than four atoms, however, a product basis is unusably large. It is natural to use product basis functions but not to retain all the products. It is not difficult to use such a pruned basis if one uses a direct eigensolver; however, a direct eigensolver limits one to keeping (with today's computers) no more than about 50 000 basis functions. To make the pruning idea practical and useful for difficult problems, one must find a technique for using the pruned basis set with iterative methods that do not require storing the Hamiltonian matrix. The matrix-vector product is at the heart of all iterative approaches, so what is needed is a method for evaluating matrix-vector products in a pruned basis. In this paper we present two such methods. Using these approaches it is possible to use pruned basis sets to compute energy levels of a Hamiltonian with 32 coordinates. The methods are applicable to any Hamiltonian in sum-of-products form, with no restriction on the form of the 1D factors. Pruning drastically reduces the memory cost of quantum dynamics calculations. In addition, because it reduces the spectral range of the Hamiltonian matrix, it decreases the number of matrix-vector products required to obtain a converged spectrum.


Both methods we propose have low overhead and scale favorably with the dimensionality of the problem and the number of retained basis functions. The cost of the mapping methods scales linearly with the number of terms in the sum-of-products Hamiltonian. Although one should do everything possible to reduce the number of terms,32–34 for high-dimensional problems there will, in general, be many terms in the Hamiltonian; this is also a problem for other methods using a sum-of-products Hamiltonian.15 A simple means of speeding up the calculation is to parallelize over the different terms.8 For Hamiltonians with hundreds of terms this parallelization would significantly reduce the time required to solve the Schrödinger equation. There are also other parallelization options; for example, the increasing prevalence of massively multiprocessor architectures means that it should also be possible to parallelize the multiplication of the blocks of each 1D factor with parts of a vector. The fact that the blocks are constant throughout the calculation would reduce the bandwidth requirements of such a scheme.
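As an illustration of the parallelization over terms (our sketch, not an implementation from the paper), each term's matrix-vector product is independent, so the terms can be dispatched to a thread pool and the partial vectors summed. Here `apply_term` stands for a per-term routine like the one sketched in Sec. II, and the worker count is arbitrary.

```python
# Sketch (ours) of parallelization over Hamiltonian terms: the per-term
# matrix-vector products are independent, so they can run concurrently and
# the partial results are summed afterwards.
from concurrent.futures import ThreadPoolExecutor

def parallel_matvec(terms, v, workers=8):
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(lambda term: apply_term(v, term), terms))
```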

ACKNOWLEDGMENTS

Some of the calculations were done on a computer of the Réseau québécois de calcul de haute performance (RQCHP). This work has been supported by the Natural Sciences and Engineering Research Council of Canada, the Centre de recherches mathématiques in Montreal, and the Canada Research Chairs Programme.

1. S. Carter and N. C. Handy, Comput. Phys. Rep. 5, 115 (1986).
2. J. Tennyson, Comput. Phys. Rep. 4, 1 (1986).
3. J. C. Light and T. Carrington, Jr., Adv. Chem. Phys. 114, 263 (2000).
4. R. Schinke, Photodissociation Dynamics (Cambridge University Press, Cambridge, England, 1993).
5. Dynamics of Molecules and Chemical Reactions, edited by R. E. Wyatt and J. Z. H. Zhang (Dekker, New York, 1996).
6. T. Carrington, Jr., in Encyclopedia of Computational Chemistry, edited by P. von Ragué Schleyer (Wiley, New York, 1998), Vol. 5.
7. M. J. Bramley, J. W. Tromp, T. Carrington, Jr., and G. C. Corey, J. Chem. Phys. 100, 6175 (1994).
8. M. J. Bramley and T. Carrington, Jr., J. Chem. Phys. 99, 8519 (1993).
9. A. Viel and C. Leforestier, J. Chem. Phys. 112, 1212 (2000).
10. X.-G. Wang and T. Carrington, Jr., J. Phys. Chem. A 105, 2575 (2001).
11. H.-G. Yu, J. Chem. Phys. 117, 2030 (2002).
12. H. S. Lee and J. C. Light, J. Chem. Phys. 120, 4626 (2004).
13. D. Bégué, N. Gohaud, C. Pouchan, P. Cassam-Chenai, and J. Liévin, J. Chem. Phys. 127, 164115 (2007).
14. R. Dawes and T. Carrington, Jr., J. Chem. Phys. 122, 134101 (2005).
15. M. H. Beck, A. Jäckle, G. A. Worth, and H.-D. Meyer, Phys. Rep. 324, 1 (2000).
16. A. Jaeckle and H.-D. Meyer, J. Chem. Phys. 109, 3772 (1998).
17. S. Manzhos and T. Carrington, Jr., J. Chem. Phys. 125, 194105 (2006).
18. J. Dongarra, in Templates for the Solution of Algebraic Eigenvalue Problems: A Practical Guide, edited by Z. Bai, J. Demmel, J. Dongarra, A. Ruhe, and H. van der Vorst (SIAM, Philadelphia, 2000), p. 315.
19. H. Wei and T. Carrington, Jr., J. Chem. Phys. 101, 1343 (1994).
20. D. Papoušek and M. R. Aliev, Molecular Vibrational-Rotational Spectra (Elsevier Science, New York, 1982).
21. R. Dawes and T. Carrington, Jr., J. Chem. Phys. 124, 054102 (2006).
22. D. T. Colbert and W. H. Miller, J. Chem. Phys. 96, 1982 (1992).
23. J. C. Light, I. P. Hamilton, and J. V. Lill, J. Chem. Phys. 82, 1400 (1985).
24. R. Dawes and T. Carrington, Jr., J. Chem. Phys. 122, 134101 (2005).
25. B. Poirier and A. Salam, J. Chem. Phys. 121, 1690 (2004).
26. C. C. Paige, J. Inst. Math. Appl. 10, 373 (1972).
27. J. K. Cullum and R. A. Willoughby, Lanczos Algorithms for Large Symmetric Eigenvalue Computations (Birkhäuser, Boston, 1985), Vol. 1.
28. A. B. McCoy, X. Huang, S. Carter, and J. M. Bowman, J. Chem. Phys. 123, 064317 (2005).
29. J. B. Anderson, J. Chem. Phys. 63, 1499 (1975).
30. H.-G. Yu and J. T. Muckerman, J. Mol. Spectrosc. 214, 11 (2002).
31. G. Czako, T. Furtenbacher, A. G. Csaszar, and V. Szalay, Mol. Phys. 102, 2411 (2004).
32. S. Manzhos and T. Carrington, Jr., J. Chem. Phys. 127, 014103 (2007).
33. O. Vendrell, F. Gatti, D. Lauvergnat, and H.-D. Meyer, J. Chem. Phys. 127, 184302 (2007).
34. S. Manzhos and T. Carrington, J. Chem. Phys. 129, 224104 (2008).
