Document not found! Please try again

Cubic-scaling algorithm and self-consistent field for the random-phase

0 downloads 0 Views 444KB Size Report
Jan 15, 2014 - the G2-1 test set13, and 0.010 Ha/atom for correlation energies of first and ... established electronic structure codes, basis set convergence problems, and ...... We approximate Eq. (A5) with a 1-parameter quadrature, sgn(Re(b )) ≈ ...... G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda,.
Cubic-scaling algorithm and self-consistent field for the random-phase approximation with second-order screened exchange Jonathan E. Moussaa)

arXiv:1303.3847v6 [cond-mat.mtrl-sci] 15 Jan 2014

Sandia National Laboratories, Albuquerque, NM 87185, USA

The random-phase approximation with second-order screened exchange (RPA+SOSEX) is a model of electron correlation energy with two caveats: its accuracy depends on an arbitrary choice of mean field, and it scales as O(n5 ) operations and O(n3 ) memory for n electrons. We derive a new algorithm that reduces its scaling to O(n3 ) operations and O(n2 ) memory using controlled approximations and a new self-consistent field that approximates Brueckner coupled-cluster doubles (BCCD) theory with RPA+SOSEX, referred to as Brueckner RPA (BRPA) theory. The algorithm comparably reduces the scaling of second-order Møller-Plesset (MP2) perturbation theory with smaller cost prefactors than RPA+SOSEX. Within a semiempirical model, we study H2 dissociation to test accuracy and Hn rings to verify scaling. I.

INTRODUCTION

Density functional theory (DFT) and coupled-cluster (CC) theory are the two dominant paradigms for the computation of many-electron ground states with complementary capabilities. Both theories are built upon the independent-orbital and selfconsistent field (SCF) structure of Hartree-Fock (HF) theory1 . DFT proves the existence of an exact density functional2 for which there are approximations3,4 enabling routine simulation of thousands of electrons5 that scale as O(n3 ) operations and O(n2 ) memory for n electrons. CC theory is a systematically improvable hierarchy of methods6 indexed by p ≥ 2 that scale as O(n2p+2 ) operations and O(n2p ) memory. In practice, DFT is limited by accuracy and CC theory is limited by cost. The random phase approximation (RPA) is a natural point of convergence in the ongoing development of CC theory and DFT. RPA emerges from truncated versions of CC methods as reduced-complexity methods that retain a significant fraction of the CC correlation energy7 . RPA also occurs when DFT is used to approximate the polarization function in the adiabaticconnection fluctuation-dissipation formula for the correlation energy8 . This formula might become a part of more accurate density functionals, but it has a higher cost and complexity9 than conventional density functionals. In both cases, a balance of cost and accuracy is being sought with RPA. The electron correlation energy model that we consider in this paper is the RPA correlation energy plus the second-order screened exchange (SOSEX) energy10 . RPA+SOSEX is exact for one electron and to second order in perturbation theory. It agrees with quantum Monte Carlo benchmarks of the uniform electron gas11 to within 0.002 Ha/electron12 . Benchmarks on inhomogeneous systems show mean absolute errors of 0.002 Ha/atom for cohesive energies of solids in a 5-solid test set10 , 0.008 Ha/molecule for atomization energies of molecules in the G2-1 test set13 , and 0.010 Ha/atom for correlation energies of first and second row atoms13 . These atomic and molecular RPA+SOSEX results approach but fail to surpass the accuracy of second-order Møller-Plesset (MP2) perturbation theory14,15 and the B3LYP density functional16 . Also, any RPA+SOSEX energy depends on a mean field choice and is not unique10 .

a) Electronic

mail: [email protected]

The goal of this paper is to further develop RPA+SOSEX as a compromise between the cost of DFT and the accuracy of CC theory. Known algorithms for RPA+SOSEX10 and similar models15 require O(n5 ) operations and either O(n3 ) or O(n4 ) memory. By utilizing an auxiliary basis set17 , fast interaction kernel summation18 , and a low-rank approximation of energy denominators19 , we design a new algorithm to reduce the cost to O(n3 ) operations and O(n2 ) memory. We define a precise RPA+SOSEX total energy with a unique choice of mean field based on Brueckner orbitals20 that is designed to approximate Brueckner coupled-cluster doubles (BCCD) theory21 . This is referred to as Brueckner RPA (BRPA) theory. It is important to be pragmatic about the short-term value of BRPA theory and a fast RPA+SOSEX algorithm. Despite its construction, BRPA is not a good approximation of BCCD. The approximations used to retain the RPA+SOSEX form are too crude. For a given basis, a fast RPA+SOSEX calculation will have a large cost relative to conventional DFT. With the continued reduction of average errors in density functionals22 and a correlation of DFT and RPA error outliers23 , it is unclear whether the cost is warranted without additional research. The paper proceeds as follows. BRPA theory is derived in Sec. II as a truncation of BCCD theory. The SCF structure of BRPA theory is emphasized. The three main components of a fast RPA+SOSEX algorithm are introduced in Sec. III. This includes a nonstandard choice of primary and auxiliary basis, which simplifies the cost accounting and algorithm design. In Sec. IV, the tensor structure in existing RPA algorithms9,15 is reviewed and extended to RPA+SOSEX. The novel structure of RPA+SOSEX theory enables a significant reduction in the number of its variables. Fast and conventional RPA+SOSEX algorithms are designed in Sec. V as pseudocode. A detailed leading-order cost analysis is given instead of a crude scaling analysis. We study applications to a semiempirical Hn model in Sec. VI. The accuracy of BRPA theory is tested on H2 , and the scaling of RPA+SOSEX algorithms is confirmed with Hn calculations for large n. The implications for future work are discussed in Sec. VII. This includes implementation in established electronic structure codes, basis set convergence problems, and further development of correlation models. We conclude in Sec. VIII with a brief consideration of RPA as a distinct electronic structure paradigm.

2 II.

BRUECKNER RPA (BRPA) THEORY

A.

The common origin of all post-HF methods is the manyelectron Hamiltonian in second quantization notation1 , erspq cˆ †p cˆ †q cˆ s cˆ r , Hˆ = E0 + hqp cˆ †p cˆ q + 41 V

(1)

where cˆ p are the fermion lowering operators of spin-orbitals. Spin-orbital indices are labeled by {p, q, r, s}. Any tensor or product of tensors with repeated indices has an implicit sum over those indices unless they appear unrepeated in a tensor or product of tensors or in an explicit sum in the same equation. This notation is related to the standard bracket notation1 as Z p hq = hp|h|qi = dxdyφ∗p (x)h(x, y)φq (y), Z Vrspq = hpq|rsi = dxdyφ∗p (x)φ∗q (y)V(x, y)φr (x)φ s (y), erspq = hpq||rsi = hpq|rsi − hpq|sri, V

cˆ †i |Φi = 0, cˆ a |Φi = 0, hΦ|Φi = 1,

(4)

or equivalently a Fock operator in spin-space, f 0 (x, y) = h(x, y) + v(x)δ(x − y) − ρ(x, y)V(x, y), Z ∗ ρ(x, y) = φi (x)φi (y), v(x) = dyV(x, y)ρ(y, y), Z dy f 0 (x, y)φ p (y) =  p φ p (x),

= E0 + 12 (hii + fii0 ).

(5)

(6a) (6b)

The orbital energies are approximate excitation energies, a = hΦ|ˆca Hˆ cˆ †a |Φi − EHF , i = EHF − hΦ|ˆc† Hˆ cˆ i |Φi, i

which is known as Koopmans’ theorem24 .

At the singles-and-doubles level of theory (CCSD), |Φi is the ˆ cˆ †a cˆ i , cˆ †a cˆ † cˆ j cˆ i }. In its Brueckner HF ground state and S = {I, b ˆ cˆ †a cˆ i }). variant (BCCD), |Φi is varied and Tˆ ∈ span(S \ {I, We use nonstandard notation to simplify the tensor form of the BCCD equations. The first is Tˆ = 12 T iabj cˆ †a cˆ †b cˆ j cˆ i with ab ab partial symmetry constraints: T iabj = T ba ji but not T i j = −T ji . 21 The second is a non-Hermitian Brueckner matrix , ei j )Teiabj . e p j − 1 δap V bqp = fqp0 + (δap δiq fbj0 + 12 δiq V qb ab 2

ei j + 1 V ei j eab E = E0 + hii + 21 V ij 4 ab T i j

(10a)

+

(10b)

= E0 +

1 i 2 (hi

bii ).

Also, the single-excitation equations simplify to bai = 0. The most complicated part of BCCD theory is the doubleexcitation equations. We separate out a “ring” tensor, eak ecb eac ekb eac ekl edb Rab i j = Vic T k j + T ik Vc j + T ik Vcd T l j ,

(11)

which simplifies the remaining equations to eiabj + bac Teicbj − bki Teab + bbc Teiacj − bkj Teab + R eab V ij kj ik eab Teicdj + 1 Teab V ekl 1 eab ekl ecd + 12 V cd 2 kl i j + 4 T kl Vcd T i j = 0.

(12)

Only Teiabj appears in the BCCD tensor equations. Changes in T ab that do not alter Teab are a redundancy in the theory. ij

ij

BRPA as a truncation of BCCD

We construct BRPA theory by truncating the BCCD tensor equations to extract the RPA+SOSEX correlation model. We crudely attempt to minimize errors by minimizing the number of truncations. Only Eq. (12) is truncated by (1) removing the “ladder” terms on the second line of the equation, (2) reducing eab ⇒ D eab , for the ring tensor to a “direct-ring” tensor, R ij ij ak cb ac kb ac kl db Dab i j = Vic T k j + T ik Vc j + T ik Vcd T l j ,

(13)

and (3) reducing bqp to its Hermitian part, b˜ qp = 12 (bqp + bq∗ p ), to guarantee orthogonal orbitals with real energies. The resulting equations depend on T iabj but only define Teiabj . To define T iabj , we expand the set of equations by removing all ‘e’, Viabj + b˜ ac T icbj − b˜ ki T kabj + b˜ bc T iacj − b˜ kj T ikab + Dab i j = 0.

(7)

(9)

This reduces the total energy to a form similar to Eq. (6b),

B.

with the canonical SCF structure between φ p (x) and f 0 (x, y). The total energy consistent with the Fock matrix is ei j EHF = E0 + hii + 21 V ij

ˆ hΦ|Xˆ † exp(−Tˆ )H|Ψi = EhΦ|Xˆ † exp(−Tˆ )|Ψi, Xˆ ∈ S, ˆ |Ψi = exp(Tˆ )|Φi, Tˆ ∈ span(S \ {I}). (8)

(3)

uniquely up to a global phase. When combined with fermion anticommutation, cˆ †p cˆ q = δqp − cˆ q cˆ †p and cˆ p cˆ q = −ˆcq cˆ p , Eq. (3) is sufficient to calculate all expectation values of |Φi. ˆ HF theory posits the minimization of EHF = hΦ|H|Φi and diagonalization of particle and hole subspaces, hΦ|ˆca Hˆ cˆ †b |Φi and hΦ|ˆc†i Hˆ cˆ j |Φi. For local minima in EHF , this is equivalent to diagonalization of a Fock matrix, eip =  p δqp , fqp0 = hqp + V iq

ˆ The basic structure of CC theory is to solve H|Ψi = E|Ψi approximately as a projected eigenvalue problem6 . It defines a cluster operator, Tˆ , from the linear span of a set of operators, ˆ which only alters the normalization, S, omitting identity, I,

(2)

where x and y are spin-space coordinates, h(x, y) is the singleelectron Hamiltonian, V(x, y) is the inter-electron interaction, erspq = Xrspq − X srpq φ p (x) are the spin-orbital wavefunctions, and X is a tensor antisymmetrization operation. The operators have symmetries h(x, y) = h(y, x)∗ and V(x, y) = V(y, x). We define a reference Slater determinant, |Φi, by splitting the set of orbitals into virtual orbitals labeled by {a, b, c, d} and n occupied orbitals labeled by {i, j, k, l}. |Φi is defined by

Brueckner coupled-cluster doubles (BCCD) theory

(14)

These are known as the RPA Riccati equations7 . They have a unique, positive-definite “stabilizing” solution25 that evolves over Viabj ⇒ λViabj from the unique λ → 0 solution to λ = 1.

3 C.

III.

Self-consistent field structure of BRPA

In conjunction with Eqs. (13) and (14), it is convenient to describe BRPA theory with SCF structure. We expand Eq. (4) to the diagonalization of a generalized Fock matrix, fqp = fqp0 + Σqp =  p δqp ,

(15)

with a static and Hermitian self-energy matrix, Σqp , in analogy to many-body Green’s function theory1 . With the choice fba = b˜ ab ,

f ji = b˜ ij ,

fia = bai ,

fai = ba∗ i ,

(16)

Eq. (15) contains bai = 0, Eqs. (6b) and (10b) are analogous, and fqp diagonalization enables a rearrangement of Eq. (14), T iabj

=−

(δap δri + T ikac δkp δrc )Vrspq (δlq δds T ldbj + δbq δ sj ) a − i + b −  j

.

(17)

It is convenient to write Σqp using a non-Hermitian matrix,

(18)

whereas Σqp ⇒ 0 in HF theory. A generalized SCF cycle is used for BRPA calculations. For a given { p , φ p (x)}, we solve Eq. (17) iteratively for T iabj as an inner SCF cycle. We then calculate σqp from T iabj , which is used to recalculate { p , φ p (x)} in the outer SCF cycle. This is appropriate for reducing the number of σqp calculations when there is a large relative cost to calculate σqp over T iabj . The SCF cycle is summarized as σqp 7→ { p , φ p (x)} 7→ T iabj 7→ σqp . The inner SCF cycle can be avoided by using a first-order approximation for T iabj instead of solving Eq. (17), T iab0 j =

−Viabj a − i + b −  j

.

(19)

p0 p T iabj ⇒ T iab0 j in Eq. (18) defines σq ⇒ σq , which is similar to a canonical transformation MP2 (CT-MP2) theory26 . Other theories use σqp0 to calculate E but calculate { p , φ p (x)} using an inconsistent Σqp . Brueckner CC2 (BCC2) theory27 uses p a a0∗ Σqp ⇒ δap δiq σa0 i + δi δq σi ,

(20)

and conventional MP2 theory uses Σqp ⇒ 0. A generalized Koopmans’ theorem28 applies when E and { p , φ p (x)} are calculated from the same T iabj and Σqp , a = hΦ|ˆca exp(−Tˆ )Hˆ exp(Tˆ )ˆc†a |Φi − E, i = E − hΦ|ˆc† exp(−Tˆ )Hˆ exp(Tˆ )ˆci |Φi, i

None of the three components of the fast BRPA algorithm are fundamentally new. However, some details of their use are nonstandard. We review these details and how they compare with modern standards in the Gaussian-orbital and planewavepseudopotential electronic structure methodologies. A.

Primary and auxiliary local basis sets

In this paper, we use a grid in x with αn points as both the primary and auxiliary basis, labeled by {x, y, z, w}. α is a basis set efficiency factor. The relation between primary and orbital fermion operators is aˆ x = φ px cˆ p with orthonormal structure, φ px φ∗py = δ xy . Single-electron operators in the primary basis are X xy = φ px Xqp φ∗qy . This notation transparently interchanges with basis-free notation (e.g. φ p (x) ⇔ φ px ). In terms of aˆ x , Hˆ = E0 + h xy aˆ †x aˆ y + 12 V xy nˆ x nˆ y − 12 V xx nˆ x ,

(22)

with number operators, nˆ x = aˆ †x aˆ x , and a “kernel”, V xy , from

p i a i q a∗ Σqp = 21 (σqp + σq∗ p + δa δq σi + δ p δa σi ),

ei j )Teiabj , e p j − 1 δap V σqp = (δap δiq hbj + 21 δiq V qb ab 2

FAST ALGORITHM COMPONENTS

(21)

which generalizes Eq. (7). We assume bqp to be real for p = q. This partially corrects for the absence of electron correlation in Koopmans’ theorem, but it still has an “orbital relaxation” error. Because solutions to the BRPA equations depend on the choice of orbital occupations,  p is not exactly the difference between a pair of converged BRPA total energies. In contrast, many-body Green’s function theory is formally exact29 .

p0 p0 ∗ Vrspq = S rx V xy S q0 sy , S qx = φ px φqx .

(23)

This is similar to a tensor hypercontraction (THC) form30 for Vrspq . However, here we use it as a theoretical basis to simplify accounting and not necessarily as a computational basis. A computational basis must enable decomposition of Vrspq similar to Eq. (23) to be suitable for a fast BRPA calculation. Its kernel, V xy , must be amenable to fast summation methods p0 discussed in Sec. III C. A general form, S qz = φ∗px S 0xyz φqy , is 0 acceptable if the “vertex”, S xyz , is sparse. In S 0xyz , {x, y} are primary basis indices, and z is an auxiliary basis index. These primary and auxiliary bases need not be orthogonal or equal, but we assume both to simplify method development. In Gaussian-orbital electronic structure, both primary and auxiliary basis functions are atom-centered polynomials times Gaussians. The resolution-of-identity (RI) form of Vrspq is17 Z S 0xyz = dxdy f x∗ (x) fy (x)V(x, y)gz (y), (24a) Z −1 = (24b) Vzw dxdygz (x)V(x, y)g∗w (y), for primary, f x (x), and auxiliary, gz (x), basis functions and a matrix inverse, ‘−1 ’. The RI vertex is not sparse, which is also the case in Cholesky31 and pseudospectral32 decompositions. The THC vertex is sparse but requires O(α4 n4 ) operations to be calculated at present30 . In a standard RI-RPA calculation9 , α = 14.5 for the cc-pVTZ basis and α = 40.5 for its auxiliary basis33 of a frozen-core C atom (n = 4). In planewave-pseudopotential electronic structure, all the basis functions are planewaves, but pseudopotentials augment the primary basis inside atomic spheres34 . No augmentation of the auxiliary basis is a source of errors when core-valence polarization is important35 . Planewaves have a sparse vertex because the fast Fourier transform (FFT) enables efficient grid operations. Also, the Coulomb kernel, V xy , is diagonal in the planewave basis. In a standard planewave RPA calculation35 , α = 57.3 for the primary basis and α = 28.3 for the auxiliary basis of a frozen-core C atom in diamond at equilibrium.

4 B.

Low-rank energy denominator approximation

In this paper, we build low-rank approximations to energy denominators using numerical quadratures of three integrals. The first integral is over the imaginary frequency axis, Z i∞ 1 dΩ 1 = ωai + ωb j 2πi (ω − Ω)(ω ai b j + Ω) −i∞ Ωe ≈ , (25) (ωai − ωe )(ωb j + ωe ) with ωai = a − i > 0 and a quadrature of β1 points, ωe , and weights, Ωe , indexed by {e, f, g}. We use a permuted index, e, to write a required quadrature symmetry: ωe = ω∗e = −ωe and Ωe = Ω∗e = Ωe . The remaining two integrals are δap δqi = ωai + ωe ≈

I Γv

dΩ 2πi

I Γo

Numerical quadratures of the Laplace transform19 are an alternative low-rank energy denominator approximation, Z ∞ 1 = dse−sa e si e−sb e s j a − i + b −  j 0 ≈ we e−se a e se i e−se b e se  j . (31) Numerical methods for calculating quadratures are known37 . The main advantage of Eq. (31) over Eqs. (25) and (26) is the single quadrature summation rather than nested summations. However, Eqs. (25) and (26) have better numerical behavior in the case of complex orbital energies and lead to equations with connections to many-body Green’s function theory29 . Kernel interpolation (i.e. skeleton decomposition38 ) is yet another low-rank approximation of energy denominators,

dΩ0 1/(Ω − Ω0 + ωe ) 2πi ( p − Ω)(q − Ω0 )

Ωeai

( p − ωa )(q − ωi )

,

(26)

with closed counterclockwise contours, Γv separating a from i − ωe , and Γo separating i from a + ωe . Their quadratures are β2 points, ωa , indexed by {a, b} and β3 points, ωi , indexed by {i, j} with different weights, Ωeai , for each ωe . We combine a and i into one index, p, for notational convenience. With a 4-parameter model of the orbital energy spectrum, a ∈ [vmin , vmax ] and i ∈ [omin , omax ], and a target error, εQ , we analytically construct numerical quadratures in Appendix A. Comparable to other results36 , the quadrature sizes are !  max − omin 2 ln ε−1 β1 ≈ 2 ln vmin Q , π v − omax !  max − omax 4 β2 ≈ 2 ln vmin ln ε−1 Q , π v − omax !  min − omin 4 β3 ≈ 2 ln vmin ln ε−1 (27) Q , π v − omax without an explicit n-dependence. When vmin − omax is zero or very small, a low-energy cutoff must be introduced. In conjunction with Eq. (25), the fast RPA algorithm also requires an approximate product closure relation, ∆eg f 1 ≈ . (ωai + ωe )(ωai + ω f ) ωai + ωg

(28)

For e , f , this equation is closed by partial fractions, ∆eg f = −δef ∇gf − (δeg − δgf )/(ωe − ω f + δef ).

(29)

For e = f , this requires coefficients, ∇ef , of a finite difference d approximation to − dω (ωai + ω)−1 for ω ∈ {ωe }. It is a standard linear approximation problem that we solve by minimizing a root-mean-square (RMS) error metric, v u t X 2 2 1 e (ωai + ωe ) εFD = min 1 + ∇ f . (30) ∇ef ωai + ω f (α − 1)β1 n2 a,i,e

In practice, we find that εFD is proportional to εQ .

1 1 1 ≈ [K−1 ]e f , ωai + ωb j ωai + ωe ωb j + ω f 1 [K]e f = . ωe + ω f

(32)

It is not based on reduction of an integral to quadrature, thus it does not have a simple exact limiting expression. However, it is exact if ωai or ωb j is in {ωe } and may be efficient for treating isolated spectral features and large interior energy gaps. C.

Fast summation of the interaction kernel

In this paper, fast V xy summation methods are accounted for by assigning O(αγn) operations to ρ x 7→ V xy ρy . γ is an efficiency factor with either weak or no n-dependence for fast methods. Further details are not important for cost analysis. Fast V xy summation methods are ubiquitous in electronic structure. FFTs are used in planewave-pseudopotential codes to solve the Poisson equation and apply the Hamiltonian to an orbital. The fast multiple method (FMM) is used in Gaussianorbital codes to efficiently calculate the matrix elements of the Hartree potential39 . Except for planewave calculations of the Fock exchange40 , these fast methods are not a bottleneck and their γ values do not contribute to the leading-order cost. The leading-order cost of the fast BRPA algorithm has a dependence on γ. This is commonly the case for classical nbody simulations of electrostatics in molecular dynamics and gravity in astrophysics. As a result, the method development in those fields has been driven to develop new fast summation techniques. Such modern innovations include the accelerated cartesian expansion41 (ACE), multilevel summation method42 (MSM), and hierarchical matrix decompositions43 . However, electronic structure applications have special requirements for summation methods such as the ability to treat point nuclei, smooth valence electron charge, and the multiple length scales of core electron charge in a unified framework. Applications such as BRPA might motivate this type of development. Fast summation methods exist for other V xy besides the Coulomb kernel. The fast BRPA algorithm applies to any V xy for which γ is small for any reason. Such generality might be appealing, but further improvements to accuracy or efficiency could result from better use of structure within the Coulomb kernel and elementary operations beyond just ρ x 7→ V xy ρy . An example is the low off-diagonal rank of the Coulomb kernel shared by a general class of structured matrices43 .

5 IV.

RPA TENSOR STRUCTURE

B.

There are three distinct approaches to calculating the RPA correlation energy. The first is calculating T iabj by solving the RPA Riccati equation in Eq. (14) and evaluating i j ab EcRPA = 21 Vab Ti j .

(33)

The RPA Riccati equation is equivalent to the RPA symplectic eigenvalue problem7 that determines electron-hole excitation energies. EcRPA is also half the sum of the difference between these excitation energies and the corresponding energies in the Tamm-Dancoff approximation8 . These energies are the poles of frequency-dependent polarization functions and this sum is extracted from them by the residue theorem when used on the adiabatic-connection fluctuation-dissipation (ACFD) formula for EcRPA . This formula can be rearranged as rs pq EcRPA = 41 V pq Crs ,

(34)

pq where Crs is the correlated part of the RPA two-body density matrix averaged over interaction strength44 . We use the wellpq studied structure of Crs as a reference point for the discussion ab of structure in T i j , which has yet to be elucidated.

A.

Adiabatic-connection fluctuation-dissipation RPA

In an auxiliary basis, the ACFD formula for EcRPA is8 EcRPA = −

Z

1

i∞

Z

dΩ V xy [Pλxy (Ω) − P0xy (Ω)], 4πi

dλ −i∞

0

(35)

where Pλxy (ω) is the RPA polarization function for interaction strength λ. Pλxy (ω) is defined in relation to its λ = 0 value as P0xy (ω) = −

a0 i0 S ay S ix



i0 a0 S ax S iy

,

ωai + ω ωai − ω Pλxy (ω) = P0xy (ω) + λPλxz (ω)Vzw P0wy (ω).

(36)

The screened Coulomb interaction is related to Pλxy (ω) as λ W xy (ω) = λV xy + λ2 V xz Pλzw (ω)Vwy ,

Pλxy (ω)

=

+

P0xy (ω)

λ (ω)P0wy (ω). P0xz (ω)Wzw

(37a) (37b)

Using Eq. (37b), we encapsulate λ in Eq. (35) with Z 1 λ W xy (ω) = 2 dλW xy (ω), [E(ω)] xy = P0xz (ω)Vzy , (38)

pq by splitting P0xy (ω) over its internal indices, and extract Crs

pq Crs = −δap δqb δir δ sj



δip δqj δar δbs

i∞

Z

−i∞

Z

i∞

−i∞

a0 b0 dΩ S ix W xy (Ω)S jy 2πi (ωai − Ω)(ωb j + Ω) j0 i0 dΩ S ax W xy (Ω)S by . 2πi (ωai + Ω)(ωb j − Ω)

From observing Eqs. (17), (25), and (39), it is reasonable pq to expect similar tensor structure in Crs and T iabj . To that end, pq we expand Vrs in Eq. (17) using Eq. (23) and regroup terms, T iabj

=

a −S ix V xy S bjy

ωai + ωb j

j0 a a0 , S ix = S ix + T iabj S bx .

(40)

a This can be rewritten to define S ix without reference to T iabj , a a0 S ix = S ix + S iya Vyz Bzx (ωai ),

Bxy (ω) = −

a i0 S ix S ay

ωai + ω

.

(41a) (41b)

a a0 The iteration of Eq. (41a) starting from S ix = S ix elucidates a a factorization ansatz that further simplifies S ix , a a0 S ix = S ix + S iya0 Vyz Azx (ωai ).

(42)

It enables a reduction of Eq. (41) to Dyson-like equations, A0xy (ω) = −

a0 i0 S ay S ix

, ωai + ω A xy (ω) = Bxy (ω) + A xz (ω)Vzw Bwy (ω), I dΩ 0 Bxy (ω) = A xy (ω) − Azx (−Ω)Vzw A0wy [Ω, ω]. 2πi Γ

(43a) (43b) (43c)

This uses a divided difference, f [x, y] = [ f (x) − f (y)]/(x − y), and an analytic reconstruction of A xy (ωai ) with the form I dΩ A xy (−Ω) (44) A xy (ωai ) = Γ 2πi ωai + Ω for a closed counterclockwise contour Γ separating −ωai from the poles of A xy (−ω), which are disjoint if A xy (ωai ) is finite. pq We maximize the superficial similarity of Crs and T iabj by ab again regrouping terms in T i j and applying Eq. (25), a0 b0 dΩ S ix U xy (ωai , ωb j )S jy , (45a) −i∞ 2πi (ωai − Ω)(ωb j + Ω) U xy (ω, ω0 ) = V xy + V xz Qzw (ω, ω0 )Vwy , (45b) Q xy (ω, ω0 ) = A xy (ω) + Ayx (ω0 ) + A xz (ω)Vzw Ayw (ω0 ). (45c)

T iabj = −

Z

i∞

pq and T iabj produce the same value of EcRPA , their While both Crs

0

W xy (ω) = −2V xz [E(ω)−1 + E(ω)−2 ln(I − E(ω))]zy ,

Structure of T iabj in BRPA

(39)

pq ij Other forms9,15,44 of Crs combine the ‘ab i j ’ and ‘ab ’ sectors and use different but equivalent energy denominators.

i j ab corresponding SOSEX-like values, EcSOSEX = − 12 Vba T i j and pq 1 rs Crs , are not equal and begin to differ at EcAC−SOSEX = − 4 Vqp third order in perturbation theory44 . However, EcAC−SOSEX and EcSOSEX are numerically similar for small molecules15 . In a BRPA calculation based on Sec. II C, T iabj will appear in its own calculation and also in the calculation of σqp . There are O(α2 n4 ) variables in T iabj , which is the memory bottleneck. a a Direct calculation of S ix in Eq. (41) and of σqp from S ix will 2 3 reduce the number of variables to O(α n ) if the full storage of Bxy (ωai ) is avoided. A xy (ωai ) is not directly useful because it contains O(α3 n4 ) variables. Nevertheless, the interpolation of A xy (ωai ) from an effective quadrature of Eq. (44) reduces it to O(α2 β1 n2 ) variables and enables fast algorithms.

6 V.

FAST ALGORITHM DESIGN

ALGORITHM 1. Conventional MP2

Here we design fast and conventional algorithms for MP2 and RPA+SOSEX calculations. In each of these four designs, we begin by identifying the tensor equations to be evaluated. Many of these equations are sufficiently complicated that an algorithm for efficient evaluation is not immediately obvious. We use simple pseudocode to decompose these equations into intermediate variables and elementary operations. Algorithm costs are then easy to account for in the pseudocode. Without the simple primary and auxiliary basis sets in Sec. III A, this design process becomes significantly more difficult. Even so, the results here demonstrate what is possible with a practical basis set if sufficient effort is given to algorithm design. The pseudocode is grouped into functions. Memory usage is delineated at the beginning of the function declaration with inputs then outputs separated by a semicolon in the function argument and a workspace statement that lists all temporary variables. The body of each function contains ‘for’ loops and elementary ‘A := B’ operations that denote the calculation of expression B and its storage in variable A. Each bottleneck is commented with its leading-order operation count in the cost polynomial of {α, β1 , β2 , β3 , γ, n}. We count as one operation each addition, multiplication, and division. Real and complex are not distinguished in operation and variable counts. The basic design strategies for minimizing cost are reuse of calculations and avoidance of concurrent data storage. All computational costs that are leading order in n are arranged as either matrix-matrix multiplications with the minimum matrix dimension maximized or as fast V xy summations. This choice hides the cost of data movement in efficient implementations of linear algebra and V xy summation. These strategies suffice for the design of serial algorithms, but parallel algorithms will need to explicitly account for communication costs. A common element of the four algorithms is calculation of σqp0 or σqp directly in the primary basis. The SCF equations from Sec. II C in the primary basis are analogous to Eq. (5), E = E0 + 21 ρ xy (hyx + fyx ), f xy = h xy + v x δ xy − ρ xy V xy + Σ xy , f xy φ py =  p φ px , ρ xy = φix φ∗iy , v x = V xy ρyy , Σ xy =

1 2 (σ xy

+

(46e) σ∗yx

+ σ xz ρzy +

ρ xz σ∗yz

− ρ xz σzw ρwy − ρ xz σ∗wz ρwy ), j0 j0 σ xy = φax φ∗iy Teiabj ( fbj0 + V xz S bz − S bz Vzy ), with E and A.

Σqp

(46a) (46b) (46c) (46d)

1: function MP2c( fai0 , V xy ,  p , φ px ; σ0xy ) 2: workspace : {X, X x , Xia , Xix , Yix } 3: σ0xy := 0 4: for each a do 5: Xix := φix φ∗ax 6: Xix := V xy Xiy 7: for each i do 8: Y jx := X jx φix − Xix φ jx 9: X bj := φ∗bx Y jx 10: 11:

also calculated in the primary basis.

Conventional MP2 algorithm

Algorithm 1 calculates σ0xy using Eqs. (19) and (46g) by substituting σ xy ⇒ σ0xy and T iabj ⇒ T iab0 j . It is comparable to 45 an RI-MP2 algorithm when Eq. (46) is used to calculate E from σ0xy . Both require O(α3 n5 ) operations, but the O(α2 n3 ) memory cost of RI-MP2 reduces to an O(α2 n2 ) memory cost by avoiding storage of the dense RI vertex in Eq. (24a). The intermediate calculation of σ0xy instead of a direct calculation of E enables the self-consistent MP2 methods in Sec. II C.

. 2α3 n5

− i + b −  j ) . 2α3 n5

12: 13: 14: X x := V xy Xy 15: σ0xy := σ0xy + φax φ∗iy (X + X x − Xy ) 16: end for 17: end for 18: end function

B.

Conventional BRPA algorithm

Because of the T iabj structure found in Sec. IV B, the SCF a inner loop proposed in Sec. II C only needs to determine S ix ab rather than T i j . We rearrange Eq. (41) into a residual tensor, Raix

=

a S ix



a0 S ix

+

S iya

j0 Vyz S bjz S bx

ωai + ωb j

,

(47)

and recast the SCF inner loop as solving Raix = 0. Algorithm 2 calculates Raix with O(α3 n5 ) operations and O(α2 n3 ) memory. As in the MP2 case, the operation count is the same as other RPA+SOSEX algorithms10 , but avoiding the storage of T iabj reduces the O(α2 n4 ) memory cost. Algorithm 3 calculates σ xy using Eqs. (40) and (46g). It is comparable in cost with the Raix inner loop calculations, which violates the assumption in Sec. II C of an inexpensive inner loop. As a result, it is more efficient to calculate Raix and σ xy concurrently and solve the orbital self-consistency and Raix = 0 problems simultaneously. There will be one iterative cycle as opposed to inner iterations nested within outer iterations. This is an example of modifying the overall design of an algorithm based on a change in the relative cost of its components.

(46f) (46g)

:= X bj /(a X := X bj fbj0 Y jx := X bj φbx X x := φ∗jx Y jx X bj

ALGORITHM 2. Conventional BRPA inner loop a 1: function inBRPAc(V xy ,  p , φ px , S ix ; Raix ) 2: workspace : {Xia , Xix , Yix } a 3: Raix := S ix − φ∗ax φix 4: for each a do 5: Xix := V xy S iya 6: for each i do b 7: X bj := S ix X jx

8:

:= X bj /(a − i + b Y jx := X bj φbx Rajx := Rajx + Y jx φ∗ix X bj

9: 10: 11: end for 12: end for 13: end function

. 2α3 n5

−  j)

. 2α3 n5

7 ALGORITHM 4. Fast MP2

ALGORITHM 3. Conventional BRPA outer loop a 1: function outBRPAc( fai0 , V xy ,  p , φ px , S ix ; σ xy ) a a a 2: workspace : {Xi , Yi , Xix , Yix , Xix } 3: Xia := 0 4: Xixa := 0 5: for each a do 6: Xix := V xy S iya 7: for each i do b 8: Y bj := S ix X jx

:= Y bj /(a − i + b := X aj − Y bj fbi0 := Xia + Y bj fbj0 Y jx := Y bj φbx X ajx := X ajx − Y jx φ∗ix Xixa := Xixa + Y jx φ∗jx

9: 11:

2: 3: 4:

. 2α3 n5

−  j)

Y bj X aj Xia

10:

p

0 1: function MP2f(Ωe , Ωeai , f xy , V xy , G xy ; σ0xy )

9: 10: 11: 12: 13:

end for e0 W xy := V xz Wzye0 e0 e0 W xy := Vyz W xz e0 e0 v x := V xy Jyy

14: 15: 16:

F xy := 0 for each x do ea i Xy := Ωeai G xy

19:

MP2 corresponds to A xy (ω) = 0 in terms of the structure discussed in Sec. IV B. However, A0xy (ωai ) is still an element of fast MP2 calculations. To efficiently calculate A0xy (ωai ), we introduce a mean-field Green’s function, G xy (ω) =

φ px φ∗py ω − p

p

, G xy = G xy (ω p ).

(48)

With Eqs. (25) and (26), we reduce A0xy (ωai ) in Eq. (43a) to A0xy (ωai ) ≈

Ωe Ae0 xy ωai − ωe

0 , Ae0 xy = A xy (ωe ),

i

p G xy ,

i0

i

Ae0 xy .

a0

a

a

e0 e0 F xy = ΩeaiG xy Wyx + ΩeaiG xz Vzy (Ξe0 zyx + Jzy ), a0 F xy

Ξe0 xyz

=

i e0 ΩeaiG xy (W xy

=

i a Ωe ΩeaiG xwGwy Vwz ,

+

i

ve0 x )

. 2β1 β2 α3 n3 . 2β1 β2 α3 n3 . β1 γα3 n3 . 2β1 β3 α3 n3

27: 28: end for i0 i a a0 29: σ0xy := F xz Gzy − G xz Fzy 30: end function

Fast BRPA algorithm

As in Eq. (49a), we use Eq. (25) to approximate Bxy (ωai ) with Bexy = Bxy (ωe ) and postulate a similar form for A xy (ωai ), Ωe Bexy

Bxy (ωai ) ≈

ωai − ωe

, A xy (ωai ) ≈

Ωe Aexy ωai − ωe

,

+

i

a

e Ae0 xy = −Ωai G xy G yx ,

i ΩeaiGzy V xz Ξe0 xzy ,

Aexy = Bexy + Aexz Vzw Bewy − Bexy

a

e0 0 J xy = Ωe ΩeaiG xz fzw Gwy ,

(51)

=

Ae0 xy

+

(52a) f Ω f ∆eg f A xz Vzw Bgwy ,

f Ω f ∆eg f Azx Vzw Ag0 wy ,

(52b) (52c)

for ω ∈ {ωai }. These equations are equivalent to Rexy = 0 for

e0 W xy = Ωe V xz Ae0 zw Vwy , e0 ve0 x = V xy Jyy .

24: 25: 26:

. 2β1 β3 α3 n3 . β1 γα3 n3

with Aexy , A xy (ωe ). Using Eq. (28), this reduces Eq. (43) to

σ0xy ≈ F xzGzy − G xz Fzy , i0

23:

. 2β1 β2 β3 α2 n2

(49b)

2

a

. 2β1 β2 β3 α2 n2

(49a)

Given we need O(α β1 β2 β3 n ) operations to form Repeated application of Eqs. (25) and (26) to σ0xy enables p its decomposition into G xy and rearrangement into 2

20: 21: 22:

D.

a

A0xy (ωe ) ≈ −ΩeaiG xyGyx .

. 2β1 β3 α3 n3

ei a Xy := Ωeai Gyx a0 ea a0 e0 + ve0 F xy := F xy + Xy (W xy x ) i0 ei e0 i0 Fyx := Fyx + Xy W xy ei i e Xzy := Xy Gzy e e Xzy := Vyw Xzw e e Xzy := Ωe Xzy V xz a0 ea e a0 Fzy := Fzy + Xy Xzy ea a e Xzy := Xz Gzy e e Xzy := Vzw Xwy e e e0 Xzy := (Ωe Xzy + J xy )V xy i0 ei e i0 Fzy := Fzy + Xz Xzy

17:

Fast MP2 algorithm

. 2β1 β2 β3 α2 n2

p0

18:

C.

p0

i e e0 e0 W xy := W xy − G xy Xyx i0 e e0 e0 J xy := J xy + F xz Xzy

8:

. 2α3 n5

12: 13: 14: 15: end for 16: end for 17: Xixa := V xy Xiya 18: σ xy := φax φ∗iy (Xia + Xixa − Xiya ) 19: end function

5: 6: 7:

ep

e0 e e workspace : {ve0 x , X x , J xy , W xy , X xy , F xy } e0 J xy := 0 e0 W xy := 0 i i0 F xy := G xz fzy0 for each i do a e X xy := Ωe Ωeai G xy

(50)

σ0xy is calculated by Algorithm 4 in O(α3 β1 (β2 + β3 + γ)n3 ) operations and O(α2 (β1 + β2 + β3 )n2 ) memory. This improves on the O(α3 n4 ) operations used by THC-MP246 and O(α4 n4 ) operations used by atomic-orbital Laplace MP2 with integral screening47 . Neither method uses fast V xy summation, which enables the extra factor of n speedup in the new algorithm.

f Rexy = Aexy − Bexy − Aexz Vzw Bewy + Ω f ∆eg f A xz Vzw Bgwy .

(53)

Rexy is calculated by Algorithm 5 in O(α3 β1 n3 ) operations and O(α2 β1 n2 ) memory. Compared to the conventional algorithm in Sec. V B, this is more efficient in operations when β1  n2 and in memory when β1  n. The calculation of W xy (ω) at quadrature points in Eq. (39) is noniterative and has the same cost, but we lack efficient formulas relating W xy (ω) to σ xy .

8 ALGORITHM 5. Fast BRPA inner loop

ALGORITHM 6. Fast BRPA outer loop p

e e 1: function inBRPAf(ωe , Ωe , ∇ef , V xy , Ae0 xy , A xy ; R xy ) e e e 2: workspace : {X f , X xy , Y xy } 3: X ef := Ω f (1 − δef )/(ωe − ω f + δef )

4: 5: 6: 7: 8: 9: 10: 11: 12: 13: 14:

0 1: function outBRPAf(ωe , Ωe , ∇ef , Ωeai , f xy , V xy , Aexy , G xy ; σ xy )

2:

. β1 γα2 n2 . 2β21 α2 n2 . 2β1 α3 n3 . 2β1 α3 n3 . 2β21 α2 n2

e X xy := V xz Ae0 zy e Y xy := Ωe ∇ef X xyf e e Rexy := Ae0 xy − Azx Yzy e e e Y xy := Azx Xzy Rexy := Rexy + X ef Y xyf e Y xy Rexy e X xy e R xy e Y xy Rexy e Y xy e R xy

:= := := := := :=

. . . . . .

f X ef A xy e e Rexy + Yzx Xzy V xz Rezy e e Aexy − Rexy − Y xz Xzy e e A xz Xzy Rexy + X ef Y xyf e X xy + Ωe ∇ef X xyf e R xy − Aexz Yzye

. .

:= 16: := 17: end function 15:

2β21 α2 n2 2β1 α3 n3 β1 γα2 n2 2β1 α3 n3 2β1 α3 n3 2β21 α2 n2 2β21 α2 n2 2β1 α3 n3

a

Gax =

9: 10: 11: 12: 13: 14:

19:

φ∗ax φix i , G xi = , ωa − a ωi − i

(54)

a

j

b

i

ef f T iabj ≈ −GaxG xi Ωeai U xy Ωb jGbyGy j , ef U xy

fg Leg xz Ωg Vzw Lyw ,

=

Lexyf = δef δ xy + Ωg ∆ef g V xz Agzy .

(55)

This bears some resemblance to the approximate THC-CCSD ef form48 of T iabj . U xy appears in the BRPA analog of Eq. (50), i i F xzGzy

i

a

a

i

jb

a

i

jb

a

i

a

0 J xy = G xz fzw Gwy , e W xy = Ωe V xz Aezw Vwy , ia

ef f vex = U xy Ωai Jyy .

(56)

σ xy is calculated by Algorithm 6 using O(α3 β2 β3 (β1 + γ)n3 ) operations and O(α2 (β1 + β2 + β3 )n2 ) memory. A fast BRPA calculation of σ xy is a factor of O(β−1 1 β2 β3 ) more expensive than a fast MP2 calculation of σ0xy in the γ  βm regime. The calculations in Eq. (56) are especially difficult when memory ia ia ef limitations prohibit concurrent storage of U xy , J xy , and Ξ xyz . ia Full evaluation of Ξ xyz requires O(α3 β2 β3 γn3 ) operations, and without concurrent storage it must be evaluated twice overall. Without the simple primary and auxiliary basis structure from Sec. III A, pseudocode that is equivalent to Algorithm 6 will become significantly longer and more difficult to design.

for each x do ea i Xy := Ωeai G xy a

ei

Xy := Ωeai Gyx a ea a e F xy := F xy + Xy (W xy + vex ) i ei e i Fyx := Fyx + Xy W xy for each y do e Z ef := X xz Wyzf U ef := Y ef Z ef + Ωg (∇ge ∇gf Zgg − Xeg ∇gf Zge − ∇ge X gf Z gf )

24:

W ef := W xyf − Xge Zgf

25:

U ef := U ef + Ω f (Xef Wef − ∇ef W f )

26:

f

27:

W ef := Wyxf − Xge Z gf U ef := U ef + Ωe (X ef W ef − ∇ef Wee )

28:

Uee := Uee + Ωe (V xy − X ef W xyf − X ef W ef )

29:

Xaie := U ef Ωaif

30:

35: 36:

Ξ xyz = G xwGwy Vwz ,

ep ep p ai i e e X x , X x , Y x , W xy , X xy , F xy } , X xy p F xy := 0 X ef := (1 − δef )/(ωe − ω f + δef ) Y ef := Ωg Xeg X gf i i X xy := G xz fzy0 e X xy := Ωe V xz Aezy e e W xy := Vyz X xz ai i a X x := X xy Gyx ai vex := Ωeai X x e f X xe f := Xyx vy e e v x := v x + X ef (X xf f − X xf e ) − ∇ef X xe f vex := Ωe V xy vey e f vy X xe f := X xy e e v x := v x + Xef (X xe f + X xf e ) − ∇ef X xf f

23:

34:

jb

ef e F xy = ΩeaiG xy (W xy + vex ) + Ωea j Ωbif Gzy U xz Ξ xzy , i

20: 21: 22:

31: 32: 33:

a a − G xz Fzy ,

ef e (Ξzyx + Jzy ), F xy = ΩeaiG xy Wyx + Ωea j Ωbif G xz Uzy

ia

6: 7: 8:

18:

Eqs. (25), (26), (28), (40), and (42) reduce T iabj to

ia

5:

15: 16: 17:

In terms of half-transformed Green’s functions,

σ xy ≈

3: 4:

workspace : {U ef , W ef , X ef , Y ef , Z ef , Xaie , vex , X xe f ,

a i ai Xz := Gzx Gyz ai ai Xz := Vzw Xw ea ai ei Yz := Xz Xz a a ei Fyz := Fyz + Xaie Yz ai a i Xz := Gzy G xz ai ai Xz := Vzw Xw ai ai i a Xz := Xz + X xw Gwy ea ei ai Yz := Xz Xz i ea i Fzy := Fzy + Xaie Yz

37: 38: 39: end for 40: end for i i a a 41: σ xy := F xz Gzy − G xz Fzy 42: end function

VI.

. 2β21 β2 β3 α2 n2 . β2 β3 γα3 n3 . 2β1 β2 β3 α3 n3 . β2 β3 γα3 n3

. 2β1 β2 β3 α3 n3

APPLICATIONS

Before committing further to its development, it is prudent to test the accuracy of BRPA theory against other popular total energy methods and performance of the fast MP2 and BRPA algorithms against their conventional counterparts. Direct use of the algorithms in Sec. V is possible for Hamiltonians that are limited to a zero-differential-overlap (ZDO) form49 . There have been proposals to calculate electron correlation within a semiempirical framework50 , but they remains undeveloped.

9

0.75

0.75

0.85

∆ (Ha)

E (Ha)

0.90

(b)

0.80 0.85 0.90

0.95

0.95

1.00

1.00

1.05

1.05

1.10

1.10

1.15

1.15

0.9 0.8 0.7 0.6 0.5 0.4 0.3 0.2 0.1 0.0

(c)

1

2

3

5

4

6

7

8

9

10 11 12

R (Bohr)

(d)

1

2

3

4

5

6

7

8

9

0.05 0.04 0.03 0.02 0.01 0.00 0.01

E (Ha)

HF BRPA MP2 exact B3LYP symmetry-broken gap noninteracting gap (2t) optimized gap

∆E (Ha)

(a)

0.80

10 11 12

R (Bohr)

FIG. 1. Total energy, E, with symmetry (a) preserved and (b) broken, (c) virtual-occupied energy gap, ∆, and (d) error in the symmetry-broken total energy, ∆E, for H2 as a function of bond length, R. The RPA+SOSEX correlation energy is exact for an optimized gap when R < 3.7.

We consider Hn because it gives us fine control over the number of electrons and also because H2 is the simplest twoelectron molecule with an internal coordinate. The ZDO form of H2 is an extended Hubbard model with α = 2 and  µ −t 0 0  U V U V     −t µ 0 0 h =  0 0 µ −t  , V =  UV UV UV UV  , (57) 0

0

−t

µ

V U V U

in the notation of Eq. (22). It has a natural pairwise extension to the n > 2 case. Separate H2 and Hn Hamiltonians are fit in Appendix B. The H2 version is fit to reproduce exact, HF, and MP2 total energies. The Hn version is fit to a simple distancedependent form that gives proper asymptotic behavior. These models have limited transferability, which is a standard caveat of semiempirical modeling with simple Hamiltonian forms. A.

Accuracy of BRPA on H2

Symmetric dissociation of H2 is used as a critical test of new electron correlation methods51–53 . The model in Eq. (57) has simple formulas for HF, MP2, BRPA, and exact energies, EHF = E0 + 2(µ − t) + 12 (U + V), EMP2 = EHF −

(U−V)2 16t+8V ,

EBRPA = EHF −

q

4t2 + 41 (U − V)2 .

E = EHF + 2t −

(U−V)2 16t+8U ,

(58)

Dissociation curves are shown in Fig. 1a. BRPA is a uniform improvement over HF but has three times the error of MP2 or B3LYP at equilibrium. These methods fail at dissociation, but BRPA is stable at twice the B3LYP error as MP2 diverges.

RPA+SOSEX total energies are strongly modulated by the choice of mean field. As shown in Fig. 1c, errors in H2 near equilibrium can be corrected by significantly reducing the gap. This is typical behavior, and the smaller orbital gaps in DFT reference mean fields systematically improve total energies54 over HF-based calculations14 . Reducing gaps to reduce errors conflicts with the generalized Koopmans theorem in Eq. (21) that requires the BRPA gap to approximate the physical gap. On H2 , BRPA increases the HF gap towards the exact gap, but some methods for calculating excitations such as GW theory decrease the HF gap53 . One possible resolution is to derive an RPA+SOSEX model where orbital energies are not required to approximate electron addition and removal energies, such as by modifying Pλxy (ω) in Eq. (35) with exchange terms55 . Dissociation failures can be fixed by enabling symmetry breaking in the reference state. As shown in Fig. 1d, the error now peaks at around twice the equilibrium bond length. MP2 now has uniformly less error than BRPA, but BRPA fixes the derivative discontinuity in the MP2 energy surface. Symmetry breaking is needed for size consistency at this level of theory when considering the dissociation of a closed-shell molecule into open-shell fragments. It also serves as a simple model of static correlation. Broken symmetries can be restored with a projection56 , but the resulting total energy corrections are not size consistent. An accurate, size-consistent model of electron correlation should be able to repair symmetries broken by the reference state accurately but not necessarily exactly. A single example such as H2 is not enough to determine error statistics and compare the average errors of total energy methods. However, while errors on a system as important and simple as H2 remain large, it suffices to discount methods.

10 TABLE I. Leading-order, per-iteration floating-point costs and memory footprints of MP2 and the inner and outer loops of BRPA. Algorithm 1. 2. 3. 4. 5. 6.

Memory

Scaling of BRPA on Hn

β1 ≈ 1.7 + 2.1 ln n, β2 ≈ −0.7 + 4.1 ln n, β3 ≈ 3.3 + 4.1 ln n.

(59)

These are not asymptotic scalings. When the orbital gap falls below a numerical low-energy cutoff, we will add an artificial gap comparable to this cutoff that limits quadrature size. Benchmarks are measured on a c implementation57 . The HF Hamiltonian matrix is diagonalized with the dsyev routine in lapack58 . Tensors are contracted with the real dgemm and complex zgemm routines in blas59 . A conversion factor from runtimes to operation counts is determined by timing the 2n3 operations of n-by-n matrix-matrix multiplications in dgemm calls for large n. Conventional algorithms use real arithmetic, while fast algorithms use complex arithmetic. The increased cost of complex arithmetic is mitigated by using symmetry60 . V xy ρy is summed with fftw61 . For regular behavior of γ, we further restrict Hn to n = 2 p − 2 and embed V xy ρy in a cyclic convolution of size 2 p+1 . We observe an fftw scaling of γ ≈ 23 + 13 ln n.

108 107 106 105 time per iteration (s)

We benchmark the algorithms in Sec. V on Hn rings, with the intent to produce representative scaling behavior and not necessarily to optimize performance. To this end, we consider only n = 4m + 2 for a closed shell and a finite orbital gap. We use an HF mean field in all calculations to enable the use of common quadratures. The algorithm runtimes are comparable to per-iteration costs and a full self-consistent calculation will have number of iterations as an additional cost multiplier. The HF orbital gap decreases as ∆ ≈ 2.7n−0.83 . Using the quadrature formulas in Appendix A with errors that are set to εQ = 10−5 , the quadrature sizes are observed to increase as

104 103 102

(a)

RPAout conv. MP2conv. RPAinconv. RPAout fast MP2fast RPAinfast HF

β1 β2 β3 γ

101 100 10-1 10-2 10-3 10-4 105

(b)

104 memory (MiB)

B.

Operations

Conventional MP2 4α3 n5 2.5α2 n2 Conventional Inner BRPA 4α3 n5 2α2 n3 Conventional Outer BRPA 4α3 n5 2α2 n3 Fast MP2 6β1 β2 β3 α2 n2 + 2β1 (2β2 + 3β3 + γ)α3 n3 β1 β2 β3 + (β1 β2 + β1 β3 )αn + (3β1 + 2β2 + 2β3 )α2 n2 Fast Inner BRPA 2β1 (5β1 + γ)α2 n2 + 12β1 α3 n3 2β21 + 4.5β1 α2 n2 Fast Outer BRPA 2β21 β2 β3 α2 n2 + 2(2β1 + γ)β2 β3 α3 n3 2β1 β2 β3 + (β21 + 2β1 β2 + 2β1 β3 + β2 β3 )αn + (3β1 + 2β2 + 3β3 )α2 n2

103 102 101 100

Other fast summation methods18 avoid a ln n prefactor. Benchmark results are shown in Fig. 2. The runtimes are compared to the model estimates summarized in Table I. The agreement is good, and all discrepancies can be rationalized. The inBRPAf cost is increased 50% by complex arithmetic60 . The excess cost of MP2f is caused by zgemm calls with small O(β1 ) matrix dimensions. The excess cost of outBRPAf is caused by subleading-order terms with large prefactors. Fast and conventional calculations agree to 10−7 Ha in σ xy . In the cost model, the crossover between conventional and fast runtimes occurs at n = 62 for MP2 and n = 422 for BRPA. As all algorithms have an O(α3 ) scaling, larger basis sets will not shift the crossover. Other prefactors will but are indicative of realistic values as quadrature sizes are reported9,35 from 16 to 40 and the FFT is a standard fast summation method.

10-1

prefactor

(60)

120 100 80 60 40 20 0 1 10

(c)

102

n

103

FIG. 2. The (a) operation, (b) memory, and (c) prefactor costs of Hn on a single core of an Intel Xeon X5650. Points are measured costs, and the equivalently colored lines are the model costs62 . The model conversion factor between operation count and runtime is 10 Gflops.

11 VII.

B.

DISCUSSION

A natural continuation of the research in this paper is the extension of the algorithms in Sec. V for implementation in established electronic structure codes. For a Gaussian-orbital code, this means nontrivial overlap matrices between primary basis functions and between pairs of primary basis functions and auxiliary basis functions. It needs an implementation of THC or some other decomposition of Vrspq with a sparse vertex tensor and a Gaussian-compatible FMM39 . This combination of features is not yet available in any code. For a planewavepseudopotential code, algorithms for periodic systems need to be developed. Suitable FFT and grid operations for efficient Vrspq decomposition are widely available in these codes. An orthogonal direction for continued research is further development of basic algorithms and numerical analysis that improve cubic-scaling correlation models. One such direction is analysis of the slow basis set convergence that is a universal detriment to non-DFT electron correlation models. Another is the development of more accurate correlation models that are restricted to O(α3 n3 ) operations and O(α2 n2 ) memory. These two directions are discussed in more detail below. A.

The operator approximation problem

The α prefactor depends on the choice of basis set. Basis construction is widely considered as a function approximation problem of electron orbitals. It is usually focused on occupied orbitals, their response to perturbations, and low-lying virtual orbitals. Larger α values improve their approximation, while smaller α values reduce costs. The number of virtual orbitals defined by the basis set is also determined by α. Slow basis set convergence manifests in the virtual orbital summations in MP2 and related calculations. Convergence is often accelerated with basis set extrapolation to avoid large α values63 . In wavefunction-based methods, slow convergence is attributed to electron-electron cusps in the wavefunction64 that are corrected by R12/F1265 or Jastrow66 methods. In the fast algorithms in Sec. V, there are only operators and their finite-basis errors instead of summations over virtual orbitals or wavefunction cusps. In the case of a spinless nonrelativistic Hamiltonian, the BRPA operator variables asymptote to G(~x, ~y, ω) →

−1 −ρ(~x, ~y) , A(~x, ~y, ω) → , 2π|~x − ~y| 2π|~x − ~y|

(61)

as |~x −~y| → 0. These singularities converges slowly with basis set size and are better approximated by functions of |~x − ~y|. The efficient and accurate representation of A(x, y, ω) and G(x, y, ω) is an operator approximation problem. G(x, y, ω) is the simplest case where we seek to satisfy Z dz[ωδ(x − z) − f (x, z)]G(z, y, ω) ≈ δ(x − y). (62) The conventional approach is to approximate G(x, y, ω) with a sum of pairwise products of basis functions. The possibilities beyond this include basis operators that are added linearly or as pairwise products and the direct modeling of singularities. An operator-based approach can also use spatial localization67 that does not manifest in molecular orbitals. These ideas form a distinct alternative to orbital function approximation.

The cost-restricted electron correlation problem

The BRPA model developed in this paper is in the class of RPA models, but it is derived from BCCD theory. Historically, the mathematical structure of RPA has emerged from multiple theories: resummation of diagrammatic perturbation theory68 , boson approximation of electron-hole excitation operators69 , equations-of-motion for ground state annihilation operators70 , integration of response functions over interaction strength71 , integration of Slater determinants over generator variables72 , 1/N expansion in N interacting copies of the Hilbert space73 , total energy functionals of the many-body Green’s function29 , and an adaptation of interaction strength integration to DFT74 . This diverse set of theoretical ideas is responsible for the large number of modern RPA variants54,75,76 . Much like DFT, there is no rigorous delineation on what can be incorporated into an RPA model of electron correlation energy. The results of this paper suggest a delineation of models based on computational complexity: a cost strictly bound by a scaling of O(n3 ) operations and O(n2 ) memory with a limit on the form of allowed approximations. Other approximations such as orbital localization77 and stochastic sampling78,79 can further reduce the scaling of MP2-like methods but introduce localization length and number of samples as complications. A critical question is whether or not these cost restrictions are sufficient to enable method development based on the direct approximation of a quantum state as in CC theory or if it is to be confined to the indirect and often empirical development that is characteristic of modern DFT80 . VIII.

CONCLUSIONS

The results of this paper cross an important threshold. For the first time, the computational cost of an electron correlation model with a direct connection to an underlying wavefunction has been reduced to the canonical cost of an SCF calculation, O(n3 ) operations and O(n2 ) memory, without localization or stochastic approximations. This includes MP2 theory and an RPA+SOSEX approximation of BCCD theory. The new ideas used to achieve this result may also contribute to reducing the cost of more accurate electron correlation models. Electron correlation models that require O(n3 ) operations and O(n2 ) memory could constitute a distinct paradigm with continued development. They are a compromise between the low cost of DFT and the high accuracy of CC theory. For this compromise to be worthwhile, we need to explore the balance between cost and accuracy to identify its fundamental limits. ACKNOWLEDGMENTS

I thank Jay Sau, Norm Tubman, Jeff Hammond, Andrew Baczewski, Rick Muller, and Toby Jacobson for discussions. I thank Andrew Baczewski for checking the mathematics and proofreading the manuscript. This work was supported by the Laboratory Directed Research and Development program at Sandia National Laboratories. Sandia National Laboratories is a multi-program laboratory managed and operated by Sandia Corporation, a wholly owned subsidiary of Lockheed Martin Corporation, for the U.S. Department of Energy’s National Nuclear Security Administration under contract DE-AC0494AL85000.

12 We approximate Eq. (A4) with a 1-parameter quadrature,

Appendix A: Numerical quadrature

An implementation of the fast MP2 and BRPA algorithms requires numerical quadratures for the integrals appearing in Eqs. (25) and (26). We restate these integrals and quadratures in a more general notation with explicit summations, Z i∞ 1 dx 1 = 0 a + a0 −i∞ 2πi (a − x)(a + x) nx X wi ≈ , a, a0 ∈ A, 0+x) (a − x )(a i i i=1 I I θ(b, B)θ(c, C) dy dz 1 = b−c+x Y 2πi Z 2πi (b − y)(c − z)(y − z + x) ny X nz X Wi j (x) ≈ , b, c ∈ B ∪ C, (b − yi )(c − z j ) i=1 j=1 θ(s, S ) = 1 ∀s ∈ S , θ(s, S ) = 0 ∀s < S .

(A1)

A is the set of orbital transition energies (a − i ), B is the set of virtual orbital energies (a ), C is the set of occupied orbital energies (i ), X is the set of imaginary reals, and Y and Z are closed counterclockwise contours that satisfy B ⊂ in(Y), (B + X) ∩ in(Z) = ∅, C ⊂ in(Z), (C − X) ∩ in(Y) = ∅,

(A2)

for interior, in(S ) = {ts + (1 − t)s0 : t ∈ [0, 1], s, s0 ∈ S }, and elemental arithmetic, S ± S 0 = {s ± s0 : s ∈ S , s0 ∈ S 0 }, set operations. We assume A = B − C and a > 0 ∀a ∈ A. Eq. (A1) is assembled from more elementary quadratures. We split the first integral into two terms with partial fractions, " # Z i∞ dx 1 1 1 1 = − , (A3) a + a0 a + a0 −i∞ 2πi a − x −a0 − x and fit quadrature to its two terms on a merged domain, 1 θ(a, A) − = 2

Z

i∞

−i∞

2n+1 X w0 1 i sgn(a0 ) ≈ , a0 ∈ [−1, −k] ∪ [k, 1], 0 − x0 2 a i i=1

for 0 < k < 1 and map back to the original quadrature with " # " 0# wi wi min(A) (A7) xi = max(A) xi0 , k = max(A) , n x = 2n + 1. a0

a

Zolotarev’s rational approximation of the sign function using Jacobi elliptic functions minimizes the maximum error81 , ε=

for A0 = A ∪ {−a : a ∈ A}. For the second integral, we split with partial fractions after performing one of the integrations, " # I θ(b, B)θ(c, C) θ(c, C) dy 1 1 = − , b−c+x b − c + x Y 2πi b − y c − x − y " # I θ(b, B) dz 1 1 = − , b − c + x Z 2πi c − z b + x − z

θ(b, B) = θ(c, C) =

I

Y

Z

ny

X ui dy 1 ≈ , b ∈ B0 , 2πi b − y i=1 b − yi n

z X dz 1 vi ≈ , c ∈ C0, 2πi c − z i=1 c − zi

(A5)

for B0 = B ∪ (C − X) and C 0 = C ∪ (B + X). The fits combine to produce the quadrature weight, Wi j (x) = ui v j /(yi − z j + x). We assume that the reassembly of these quadratures is stable and leave further error analysis to future work.

r(x) =

1 x

n Y m=1

sn(mθ,k ) fm = k cn(mθ,k w01 = 0) ,

1 r(k)+r(κ)

2 x2 + f2m−1 2 , x2 + f2m

n Y

2 f2m−1 2 , f2m

m=1

w02m = w02m+1 =

1 1 2 r(k)+r(κ)

n Y p=1

2 2 f2m − f2p−1 2 − f 2 (1−δ ) f2m pm 2p

0 0 x10 = 0, x2m = −x2m+1 = i f2m , √ 0 k 2 k = 1 − k , κ = dn(θ,k0 ) , θ =

K 0 (k) 2n+1 .

,

(A8)

ε is the maximum error, which decays exponentially in n with an exponent of 2πK(k)/K 0 (k) that asymptotes to π2 / ln(4/k) in the k → 0 limit82 . sn(u, k), cn(u, k), and dn(u, k) are Jacobi elliptic functions. K(k) and iK 0 (k) are their quarter periods. We approximate Eq. (A5) with a 1-parameter quadrature, sgn(Re(b0 )) ≈

2n+1 X i=1

b0

u0i , b0 ∈ X ∪ [λ, 1], − y0i

(A9)

for 0 < λ < 1 and map back to the original quadratures with " # " 0# ui ui yi − max(C) = [max(B) − max(C)] y0i , b0

b − max(C)

λ= "

min(B)−max(C) max(B)−max(C) ,

#

vi zi − min(B) c − min(B)

λ=

ny = 2n + 1, or " 0# ui = [min(C) − min(B)] y0i , b0

max(C)−min(B) min(C)−min(B) ,

nz = 2n + 1.

(A10)

The interiors of B + X and C − X are omitted from Eq. (A9) by the maximum modulus principle. We map from Eq. (A6) to Eq. (A9) with a shift of sgn(a0 ) by 1/2 and a transformation, a0 =

and again fit quadratures to each contour on a merged domain, I

1 r(k)−r(κ) 2 r(k)+r(κ) , 0

n

x X wi dx 1 ≈ , a ∈ A0 , (A4) 2πi a − x i=1 a − xi

(A6)

(1+k)(b0 )2 −2k , 2−(1+k)(b0 )2

k=

λ2√ . 2−λ2 +2 1−λ2

(A11)

The rational transformation creates a reflection of the original contour about the imaginary axis and doubles the number of poles. We remove the new contour and its poles leaving r u0i =

w0i 1−k y0i (1+k)(1+xi0 )2 ,

y0i =

k+x0

i 2 (1+k)(1+x 0) .

(A12)

i

Empirically, we observe that the maximum error is twice ε in Eq. (A8). This approximant does not minimize the maximum error exactly, but we conjecture that the exponential decay rate of the error with n is optimal.

13 Appendix B: Semiempirical Hn model

To solve the semiempirical H2 model in Eq. (57), we use a family of reference orbitals parameterized by θ,      0   sin θ     0  θ  , φa = − cos  , φb = − cos θ 0   sin θ 0      0  cos θ (B1) φi =  sin0 θ  , φ j =  sin0 θ  , cos θ

0

where {i, j} label the two occupied orbitals and {a, b} label the two virtual orbitals. The nonzero matrix elements are hii = h jj = µ − t sin 2θ, haa pp V pp

= =

Vaiai =

(B2)

pq with symmetric extensions for qp ⇔ qp and rs ⇔ qp sr . The BCCD total energy in terms of T = T iabj is

E = E0 + 2(µ − t sin 2θ) + V + 21 (U − V)(1 − T )(sin 2θ)2 .

(B3)

The mean field equations are an orbital energy difference, ∆, and a coupling between occupied and virtual orbitals, ∆ = 2t sin 2θ + U − (U − V)(1 − T )(sin 2θ)2 ,

(B4a)

0 = [t(1 + T ) −

(B4b)

− V)(1 − T ) sin 2θ] cos 2θ.

These are the complete HF equations if T = 0. There are up to two solutions: a nonsymmetric state and a symmetric state for sin 2θ = 1. In BCCD theory, T is the solution to Eq. (12), − V)(1 − 4T + 3T )(sin 2θ) . 2

2

(B5)

The RPA Riccati equation in Eq. (14) similarly reduces to 0 = 2∆T − 21 (U − V)(1 − 4T + 4T 2 )(sin 2θ)2 ,

(B6)

which simplifies the 4-by-4 matrix equation for T iabj to a direct condition on T by exploiting matrix symmetries. For θ = 41 π, the quadratic formula is used to solve for T , T (exact) =

h

i 4t 2 U−V

+1−

4t U−V

≤ 1,

r h T (RPA) = 21 1 + T (BRPA) =

1 U−V 4 U+2t

∆ U−V

i

≤ 14 .



1 2

h 1+

∆ U−V

i2

− E), (B8)

where E0 = 1/R for interatomic separation R and U is set to its value at the dissociation limit, U = 0.5695. The H2 data is generated in gaussian09 using the cc-pV6Z basis83 . The data and parameters are compiled in Table II. We consider an extension to Hn having a pairwise form, 1X 1 , x, y ∈ {1, · · · , n}, 2 x,y R xy

z,x

jb = −Vbb = −Vbiji = 41 (U − V) sin 4θ,

r

µ=

2 1 1 16 (U − V) /(E HF − E) − 4 (E HF t − 14 (U + V) − 21 (E0 − EHF ).

V(xσ)(yσ0 ) = δ xy 0.5695 + (1 − δ xy )V(R xy ),

ja ib ia Vab = Vaiii = Vba = Vbj jj = −Vaa = −Vai jj

0 = 2(∆ − U)T −

t=

h(xσ)(yσ0 ) = −δσσ0 δ xy 21 − δσσ0 (1 − δ xy )t(R xy ) X + δσσ0 δ xy ∆µ(R xz ),

jj ij ii Vaa = Vbb = −Vab = 21 (U − V)(sin 2θ)2 ,

1 2 (U

E1 = 4(EHF − EMP2 )(EHF − E)/(EMP2 − E), h i p V = U + E1 1 − 1 + (E − EHF + 2U)/E1 ,

E0 =

hbb = µ + t sin 2θ, hia = hbj = t cos 2θ, Vaajj = Vbibi = U − 21 (U − V)(sin 2θ)2 , ab Vbbjj = Vab = Viijj = V + 21 (U − V)(sin 2θ)2 ,

1 2 (U

We determine one set of parameters for the semiempirical model by solving for {t, µ, V} given {EHF , EMP2 , E} at θ = 14 π,

(B9)

with spin indices, σ ∈ {↑, ↓}, and interatomic distances, R xy . Such a simple model will have limited transferability and care is required in fitting the parameter functions. For simplicity, we restrict the scaling study to uniform Hn rings that obey H¨uckel’s rule, n = 4m + 2. Here, R xy is qh  i . h  i R xy = R 1 − cos 2π(x−y) 1 − cos 2π (B10) n n , for nearest-neighbor distance R. Spin and spatial symmetries simplify the normalized orbitals to q q φ(1σ)(xσ0 ) = δσσ0 1n , φ(nσ)(xσ0 ) = δσσ0 1n (−1) x , q   φ(2mσ)(xσ0 ) = δσσ0 2n sin 2πmx , 1 ≤ m ≤ n2 − 1, n q   φ(2m+1σ)(xσ0 ) = δσσ0 2n cos 2πmx , (B11) n The density matrix sums to form the Dirichlet kernel, ρ(xσ)(yσ0 ) =

1 sin(π(x − y)/2) δσσ0 . n sin(π(x − y)/n)

(B12)

We use the HF energy of H6 , E6 , for fitting to these Hn rings. R is optimized by minimizing the HF total energy. We fit Eq. (B9) to {EHF , E6 , E} with exponential forms for t(R), ∆µ(R), and V(R) plus long-range asymptotic corrections for ∆µ(R) and V(R) with the form of an electrostatic potential between a 1s Slater-type orbital and point charge. With 6 free parameters, the result of this nonlinear least-squares fit is V(R) = 0.304 exp(−0.616R) + [1 − exp(−0.616R)]/R, ∆µ(R) = 0.000 exp(−1.235R) − [1 − exp(−1.235R)]/R, t(R) = 0.206 exp(−0.383R), (B13)

− 1 ≤ 12 , (B7)

The bounds on T are achieved at dissociation and for ∆ = 0 in the RPA case, which deviates from the BRPA value of ∆. The approximations change the fundamental range of T values.

with a root-mean-square error of 0.008 Ha/atom. The primed parameters calculated from Eq. (B13) are given in Table II. In this model, Hn rings have optimal R values between 1.36 and 1.89 and their HF energy asymptotes to −0.527 Ha/atom.

R 0.5 0.7 0.9 1.1 1.3 1.5 1.7 1.9 2.1 2.3 2.5 3.0 3.5 4.0 4.5 5.0 5.5 6.0 7.0 8.0 10.0 12.0 16.0 32.0 64.0 ∞

HF(H6 ) 3.35175 −0.27931 −1.86432 −2.62333 −2.99665 −3.17349 −3.24462 −3.25702 −3.23641 −3.19745 −3.14877 −3.01377 −2.88444 −2.77154 −2.67665 −2.59835 −2.53436 −2.48232 −2.40582 −2.35516 −2.29692 −2.26638 −2.23446 −2.18483 −2.16530 −2.14577

HF −0.48666 −0.88259 −1.04433 −1.11045 −1.13202 −1.13137 −1.11934 −1.10160 −1.08123 −1.05996 −1.03878 −0.98932 −0.94687 −0.91161 −0.88272 −0.85919 −0.84008 −0.82454 −0.80152 −0.78604 −0.76768 −0.75758 −0.74655 −0.73088 −0.72307 −0.71526

B3LYP −0.53368 −0.92834 −1.08972 −1.15616 −1.17854 −1.17909 −1.16858 −1.15262 −1.13425 −1.11519 −1.09640 −1.05352 −1.01831 −0.99060 −0.96928 −0.95312 −0.94096 −0.93187 −0.92003 −0.91341 −0.90729 −0.90475 −0.90242 −0.89929 −0.89772 −0.89616

UB3LYP −0.53368 −0.92834 −1.08972 −1.15616 −1.17854 −1.17909 −1.16858 −1.15262 −1.13425 −1.11519 −1.09640 −1.05444 −1.02916 −1.01666 −1.01074 −1.00788 −1.00646 −1.00573 −1.00513 −1.00496 −1.00489 −1.00488 −1.00488 −1.00488 −1.00488 −1.00488

MP2 −0.52150 −0.91676 −1.07811 −1.14410 −1.16578 −1.16546 −1.15399 −1.13702 −1.11764 −1.09758 −1.07784 −1.03302 −0.99677 −0.96934 −0.94992 −0.93749 −0.93098 −0.92941 −0.93770 −0.95637 −1.00903 −1.06870 −1.19169 −1.70376 −2.75324 −∞

exact −0.52641 −0.92184 −1.08348 −1.14991 −1.17221 −1.17273 −1.16233 −1.14673 −1.12904 −1.11105 −1.09380 −1.05718 −1.03171 −1.01625 −1.00788 −1.00369 −1.00170 −1.00078 −1.00017 −1.00004 −1.00000 −1.00000 −1.00000 −1.00000 −1.00000 −1.00000

µ −1.09542 −1.01031 −0.93969 −0.88345 −0.84003 −0.80743 −0.78364 −0.76666 −0.75476 −0.74644 −0.74049 −0.73048 −0.72172 −0.71169 −0.70042 −0.68868 −0.67710 −0.66608 −0.64641 −0.63013 −0.60596 −0.58946 −0.56858 −0.53588 −0.51848 −0.50000

µ0 −1.42142 −1.32677 −1.24548 −1.17541 −1.11478 −1.06210 −1.01617 −0.97595 −0.94059 −0.90939 −0.88175 −0.82513 −0.78192 −0.74821 −0.72136 −0.69958 −0.68161 −0.66657 −0.64283 −0.62499 −0.60000 −0.58333 −0.56250 −0.53125 −0.51562 −0.50000 t 0.31841 0.31680 0.31076 0.30041 0.28621 0.26887 0.24918 0.22810 0.20649 0.18512 0.16459 0.11939 0.08458 0.05929 0.04140 0.02880 0.01988 0.01352 0.00567 0.00157 −0.00176 −0.00272 −0.00300 −0.00231 −0.00142 0.00000

t0 0.17010 0.15756 0.14594 0.13518 0.12521 0.11598 0.10742 0.09950 0.09216 0.08537 0.07907 0.06529 0.05391 0.04452 0.03676 0.03035 0.02506 0.02069 0.01411 0.00962 0.00447 0.00208 0.00045 0.00000 0.00000 0.00000

U & U0 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 0.56950 V 0.11250 0.11662 0.12141 0.12688 0.13300 0.13964 0.14664 0.15375 0.16070 0.16727 0.17324 0.18470 0.19054 0.19123 0.18795 0.18205 0.17464 0.16651 0.15010 0.13524 0.11195 0.09564 0.07473 0.04056 0.02134 0.00000

V0 0.75358 0.69790 0.64749 0.60180 0.56036 0.52271 0.48849 0.45734 0.42896 0.40307 0.37942 0.32871 0.28783 0.25460 0.22734 0.20478 0.18595 0.17008 0.14502 0.12630 0.10043 0.08347 0.06251 0.03125 0.01563 0.00000

TABLE II. Total energies (of H2 unless otherwise stated) and model parameters (in Hartrees) at nearest-neighbor interatomic separation R (in Bohrs).

14

15 1 A.

Szabo and N. S. Ostlund, Modern Quantum Chemistry: Introduction to Advanced Electronic Structure Theory (Dover, New York, 1996). 2 P. Hohenberg and W. Kohn, Phys. Rev. 136, B864 (1964). 3 W. Kohn and L. J. Sham, Phys. Rev. 140, A1133 (1965). 4 J. P. Perdew and A. Zunger, Phys. Rev. B 23, 5048 (1981). 5 P. R. Briddon and M. J. Rayson, Phys. Status Solidi B 248, 1309 (2011). 6 R. J. Bartlett and M. Musiał, Rev. Mod. Phys. 79, 291 (2007). 7 G. E. Scuseria, T. M. Henderson, and D. C. Sorensen, J. Chem. Phys. 129, 231101 (2008). 8 F. Furche, J. Chem. Phys. 129, 114105 (2008). 9 H. Eshuis, J. Yarkony, and F. Furche, J. Chem. Phys. 132, 234114 (2010). 10 A. Gr¨ uneis, M. Marsman, J. Harl, L. Schimka, and G. Kresse, J. Chem. Phys. 131, 154115 (2009). 11 D. M. Ceperley and B. J. Alder, Phys. Rev. Lett. 45, 566 (1980). 12 D. L. Freeman, Phys. Rev. B 15, 5512 (1977). 13 J. Paier, B. G. Janesko, T. M. Henderson, G. E. Scuseria, A. Gr¨ uneis, and G. Kresse, J. Chem. Phys. 132, 094103 (2010); 133, 179902 (2010). 14 A. Heßelmann, J. Chem. Phys. 134, 204107 (2011). 15 X. Ren, P. Rinke, G. E. Scuseria, and M. Scheffler, Phys. Rev. B 88, 035120 (2013). 16 J. Paier, X. Ren, P. Rinke, G. E. Scuseria, A. Gr¨ uneis, G. Kresse, and M. Scheffler, New J. Phys. 14, 043002 (2012). 17 C. Van Alsenoy, J. Comput. Chem. 9, 620 (1988). 18 L. Greengard and V. Rokhlin, J. Comput. Phys. 73, 325 (1987). 19 J. Alml¨ of, Chem. Phys. Lett. 181, 319 (1991). 20 R. K. Nesbet, Phys. Rev. 109, 1632 (1958). 21 L. Z. Stolarczyk and H. J. Monkhorst, Int. J. Quantum Chem., Symp. 26, 267 (1984). 22 L. Goerigkab and S. Grimme, Phys. Chem. Chem. Phys. 13, 6670 (2011). 23 A. J. Cohen, P. Mori-S´ anchez, and W. Yang, Chem. Rev. 112, 289 (2012). 24 T. Koopmans, Physica 1, 104 (1934). (in German) 25 B. P. Molinari, SIAM J. Control 11, 262 (1973). 26 T. N. Lan and T. Yanai, J. Chem. Phys. 138, 224108 (2013). 27 Y. Akinaga, Y. Kawashima, and S. Ten-no, Chem. Phys. Lett. 506, 276 (2011). 28 I. Lindgren and S. Salomonson, Int. J. Quantum Chem. 90, 294 (2002). 29 L. Hedin, Phys. Rev. 139, A796 (1965). 30 R. M. Parrish, E. G. Hohenstein, T. J. Mart´ınez, and C. D. Sherrill, J. Chem. Phys. 137, 224106 (2012). 31 H. Koch, A. S´ anchez de Mer´as, and T. B. Pedersen, J. Chem. Phys. 118, 9481 (2003). 32 T. J. Mart´ınez and E. A. Carter, J. Chem. Phys. 100, 3631 (1994). 33 F. Weigend, A. K¨ ohn, and C. H¨attig, J. Chem. Phys. 116, 3175 (2002). 34 P. E. Bl¨ ochl, Phys. Rev. B 50, 17953 (1994). 35 J. Harl, L. Schimka, and G. Kresse, Phys. Rev. B 81, 115126 (2010). The α values are generated by vasp using the reported geometries and planewave cutoffs. For an energy cutoff E in Ha and a volume per electron V in Bohr, α ≈ V(2E)3/2 /(3π2 ) using basic accounting of planewaves inside a cubic unit cell in real space and a spherical Fermi sea in momentum space. 36 L. Lin, J. Lu, L. Ying, and W. E, Chinese Ann. Math. B 30, 729 (2009). 37 A. Takatsuka, S. Ten-no, and W. Hackbusch, J. Chem. Phys. 129, 044112 (2008). 38 S. A. Goreinov, E. E. Tyrtyshnikov , and N. L. Zamarashkin, Linear Algebra Appl. 261, 1 (1997). 39 C. A. White, B. G. Johnson, P. M.W. Gill, and M. Head-Gordon, Chem. Phys. Lett. 230, 8 (1994). 40 J. Paier, R. Hirschl, M. Marsman, and G. Kresse, J. Chem. Phys. 122, 234102 (2005). 41 B. Shanker and H. Huang, J. Comput. Phys. 226, 732 (2007). 42 R. D. Skeel, I. Tezcan, and D. J. Hardy, J. Comput. Chem. 23, 673 (2002). 43 S. B¨ orm, L. Grasedyck, and W. Hackbusch, Eng. Anal. Bound. Elem. 27, 405 (2003). 44 G. Jansen, R.-F. Liu, and J. G. Angy´ ´ an, J. Chem. Phys. 133, 154106 (2010). 45 M. Feyereisen, G. Fitzgerald, and A. Komornicki, Chem. Phys. Lett. 208, 359 (1993). 46 E. G. Hohenstein, R. M. Parrish, and T. J. Mart´ınez, J. Chem. Phys. 137, 044103 (2012). 47 M. H¨ aser, Theor. Chim. Acta 87, 147 (1993). 48 E. G. Hohenstein, R. M. Parrish, C. D. Sherrill, and T. J. Mart´ınez, J. Chem. Phys. 137, 221101 (2012). 49 I. Fischer-Hjalmars, J. Chem. Phys. 42, 1962 (1965). 50 W. Thiel, J. Am. Chem. Soc. 103, 1413 (1981). 51 T. M. Henderson and G. E. Scuseria, Mol. Phys. 108, 2511 (2010).

52 P.

Mori-S´anchez, A. J. Cohen, and W. Yang, Phys. Rev. A 85, 042507 (2012). 53 F. Caruso, D. R. Rohr, M. Hellgren, X. Ren, P. Rinke, A. Rubio, and M. Scheffler, Phys. Rev. Lett. 110, 146403 (2013). 54 A. Heßelmann and A. G¨ orling, Mol. Phys. 109, 2473 (2011). 55 J. Hubbard, Proc. R. Soc. A 243, 336 (1957). 56 C. A. Jim´ enez-Hoyos, T. M. Henderson, T. Tsuchimochi, and G. E. Scuseria, J. Chem. Phys. 136, 164109 (2012). 57 See the supplementary material at http://arxiv.org/e-print/1303.3847 for a c implementation of the algorithms in Sec. V. 58 E. Anderson, Z. Bai, C. Bischof, S. Blackford, J. Demmel, J. Dongarra, J. Du Croz, A. Greenbaum, S. Hammarling, A. McKenney, and D. Sorensen, LAPACK Users’ Guide, Third Edition (SIAM, Philadelphia, 1999). 59 J. J. Dongarra, J. Du Croz, I. S. Duff, and S. Hammarling, ACM Trans. Math. Soft. 16, 1 (1990). 60 In inBRPAf, tensors of the form Z e and Z e have symmetries Z e = (Z e )∗ xy f f  ∗ f∗ e = (Z e )∗ . MP2f contains these symmetries and adds Z ep = Z ep and Z xy , x x xy  p0 ∗ ∗  p∗ 0 e = Ze . outBRPAf contains these symmetries Z xy = Z xy , and Zai a∗ i∗  ∗ ∗ ∗ a i ai and adds Z x = Z x . Only half the entries of these symmetric tensors

need to be stored and calculated, which halves the memory and operation counts. Complex arithmetic doubles memory usage and FFT runtimes and triples matrix multiplication runtimes over real arithmetic. The additional cost of complex arithmetic is mitigated for all the operations except matrix multiplication (zgemm), which is reduced to 1.5 times the cost model. 61 M. Frigo and S. G. Johnson, Proc. IEEE 93, 216 (2005). 62 The leading-order cost model assumes that all parameters are much larger than one, which is false for α = 2. We correct the conventional algorithm costs with α3 → α(α − 1)2 for the operation counts, 3α2 → 3α2 + 4α − 2 for the MP2 memory cost, and α2 → α(α − 1) for both BRPA memory costs. 63 D. G. Truhlar, Chem. Phys. Lett. 294, 45 (1998). 64 T. Kato, Commun. Pure Appl. Math. 10, 151 (1957). 65 L. Kong , F. A. Bischoff , and E. F. Valeev, Chem. Rev. 112, 75 (2012). 66 N. D. Drummond, M. D. Towler, and R. J. Needs, Phys. Rev. B 70, 235119 (2004). 67 S. Goedecker, Rev. Mod. Phys. 71, 1085 (1999). 68 M. Gell-Mann and K. A. Brueckner, Phys. Rev. 106, 364 (1957). 69 K. Sawada, Phys. Rev. 106, 372 (1957). 70 M. Baranger, Phys. Rev. 120, 957 (1960). 71 A. D. McLachlan and M. A. Ball, Rev. Mod. Phys. 36, 844 (1964). 72 B. Jancovici and D. H. Schiff, Nucl. Phys. 58, 678 (1964). 73 A. J. Glick, H. J. Lipkin, and N. Meshkov, Nucl. Phys. 62, 211 (1965). 74 J. Harris and R. O. Jones, J. Phys. F 4, 1170 (1974). 75 H. Eshuis, J. E. Bates, and F. Furche, Theor. Chem. Acc. 131, 1084 (2012). 76 X. Ren, P. Rinke, C. Joas, and M. Scheffler, J. Mater. Sci. 47, 7447 (2012). 77 P. Y. Ayala and G. E. Scuseria, J. Chem. Phys. 110, 3660 (1999). 78 S. Y. Willow, K. S. Kim, and S. Hirata, J. Chem. Phys. 137, 204122 (2012). 79 D. Neuhauser, E. Rabani, and R. Baer, J. Chem. Theory Comput. 9, 24 (2013). 80 J. P. Perdew and K. Schmidt, AIP Conf. Proc. 577, 1 (2001). 81 E. I. Zolotarev, Zap. Imp. Akad. Nauk, St. Petersburg 30, 5 (1877); reprinted in Collected works (Izdat. Akad. Nauk SSSR, Leningrad, 1932), Vol. 2, pp. 1-59. (in Russian) 82 A. A. Gonˇ car, Math. USSR Sb. 7, 623 (1969). 83 M. J. Frisch, G. W. Trucks, H. B. Schlegel, G. E. Scuseria, M. A. Robb, J. R. Cheeseman, G. Scalmani, V. Barone, B. Mennucci, G. A. Petersson, H. Nakatsuji, M. Caricato, X. Li, H. P. Hratchian, A. F. Izmaylov, J. Bloino, G. Zheng, J. L. Sonnenberg, M. Hada, M. Ehara, K. Toyota, R. Fukuda, J. Hasegawa, M. Ishida, T. Nakajima, Y. Honda, O. Kitao, H. Nakai, T. Vreven, J. A. Montgomery, Jr., J. E. Peralta, F. Ogliaro, M. Bearpark, J. J. Heyd, E. Brothers, K. N. Kudin, V. N. Staroverov, R. Kobayashi, J. Normand, K. Raghavachari, A. Rendell, J. C. Burant, S. S. Iyengar, J. Tomasi, M. Cossi, N. Rega, J. M. Millam, M. Klene, J. E. Knox, J. B. Cross, V. Bakken, C. Adamo, J. Jaramillo, R. Gomperts, R. E. Stratmann, O. Yazyev, A. J. Austin, R. Cammi, C. Pomelli, J. W. Ochterski, R. L. Martin, K. Morokuma, V. G. Zakrzewski, G. A. Voth, P. Salvador, J. J. Dan¨ Farkas, J. B. Foresman, J. V. Ortiz, nenberg, S. Dapprich, A. D. Daniels, O. J. Cioslowski, and D. J. Fox, gaussian 09, Revision C.01, Gaussian, Inc., Wallingford CT, 2009.

Suggest Documents