Algorithms and Computational Aspects of DFT Calculations - Part II


Juan Meza and Chao Yang
High Performance Computing Research, Lawrence Berkeley National Laboratory

IMA Tutorial: Mathematical and Computational Approaches to Quantum Chemistry
Institute for Mathematics and its Applications, University of Minnesota
September 26-27, 2008

Juan Meza (LBNL)

Algorithms and Computational Aspects of DFT Calculations

September 27, 2008

1 / 37

1. Goals and Motivation
2. Review of Equations
3. Plane Wave DFT Computational Components
4. Parallelization Strategies
5. Future Computational Challenges
   - Linear Scaling Methods
   - Parallelism Issues
6. Software
   - Available Codes
   - KSSOLV
7. Summary


Goals

1. The Role of Computation
2. Review Equations and Solution Techniques
3. Discuss Major Computational Aspects of Plane Wave DFT codes
4. Present Some Parallelization Issues
5. Highlight Computational Challenges


Materials by design

"Advances in density functional theory coupled with multinode computational clusters now enable accurate simulation of the behavior of multi-thousand atom complexes that mediate the electronic and ionic transfers of solar energy conversion. These new and emerging nanoscience capabilities bring a fundamental understanding of the atomic and molecular processes of solar energy utilization within reach."

Source: Basic Research Needs for Solar Energy Utilization, Report of the BES Workshop on Solar Energy Utilization, April 18-21, 2005


DFT codes are widely used for science applications

- 9470 nodes; 19,480 cores
- 13 Tflop/s SSP (100 Tflop/s peak)
- Upgrade to quad-core (355 Tflop/s peak)
- DFT methods account for 75% of the materials science simulations at NERSC, totaling over 5 million hours of computer time in 2006


We can now simulate some realistic structures

The charge density of a 15,000 atom quantum dot, $\mathrm{Si}_{13607}\mathrm{H}_{2236}$. Using 2048 processors at NERSC, the calculation took about 5 hours.

The calculated dipole moment of a 2633 atom CdSe quantum rod, $\mathrm{Cd}_{961}\mathrm{Se}_{724}\mathrm{H}_{948}$. Using 2560 processors at NERSC, the calculation took about 30 hours.


Kohn-Sham Equations

Recall that our goal is to find the ground state energy by minimizing the Kohn-Sham total energy, $E_{total}$. This leads to the Kohn-Sham equations:

$$ H \psi_i = E_i \psi_i, \qquad i = 1, 2, \ldots, n_e $$

$$ H = -\frac{1}{2}\nabla^2 + V(\rho(r)) $$

$$ V(\rho(r)) = V_{ext}(r) + \int \frac{\rho(r')}{|r - r'|}\,dr' + V_{xc}(\rho) $$

This is a nonlinear eigenvalue problem, since the Hamiltonian $H$ depends on $\psi$ through the charge density $\rho$.


Discretized Kohn-Sham Equations

KKT conditions:

$$ \nabla_X L(X, \Lambda) = 0, \qquad X^* X = I_{n_e}. $$

The discretized Kohn-Sham equations can now be written as:

$$ H(X) X = X \Lambda, \qquad X^* X = I_{n_e}. $$

The Kohn-Sham Hamiltonian is given by:

$$ H(X) = \frac{1}{2} L + V(X), $$

$$ V(X) = V_{ext} + \mathrm{Diag}\left(L^\dagger \rho(X)\right) + \mathrm{Diag}\, g_{xc}(\rho(X)). $$


The SCF Iteration

The SCF cycle: solve $\left[-\frac{1}{2}\nabla^2 + V(\rho(r))\right]\psi_i = E_i \psi_i$ for $\{\psi_i\}_{i=1,\ldots,n_e}$, form $\rho(r) = \sum_{i=1}^{n_e} |\psi_i(r)|^2$, recompute $V(\rho(r))$, and repeat.

1. Given an initial charge density $\rho$, compute a potential $V_k(\rho(r))$
2. Solve the linear eigenvalue problem for the $\psi_i$, $i = 1, \ldots, n_e$
3. Compute the new charge density $\rho$
4. Update $\rho$ using your favorite mixing scheme
5. Compute $V_{k+1}$ and repeat until converged

Overall computational complexity is $O(N n_e^2)$ due to linear algebra

Major computational components:
- CG method
- Orthogonalization
- Computation of potentials
- 3D FFT
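
The five steps above can be sketched on a toy problem. The following is a minimal illustration, not the method of any code mentioned in these slides: it assumes a hypothetical two-site model with one electron, a density-dependent diagonal potential `v_ext[i] + u * rho[i]`, hopping `t`, and simple linear mixing with parameter `alpha`. All of these names and values are made up for the example.

```python
import math

def lowest_eigenpair(a, b, d):
    """Lowest eigenpair of the symmetric 2x2 matrix [[a, b], [b, d]], assuming b != 0."""
    mean, half_gap = 0.5 * (a + d), 0.5 * (a - d)
    e = mean - math.hypot(half_gap, b)
    # (a - e) x + b y = 0, so the eigenvector is proportional to (b, e - a)
    x, y = b, e - a
    norm = math.hypot(x, y)
    return e, (x / norm, y / norm)

def scf(v_ext=(0.0, 0.5), t=1.0, u=1.0, alpha=0.5, tol=1e-10, max_iter=200):
    """Toy SCF loop; returns (eigenvalue, density, iterations used)."""
    rho = [0.5, 0.5]                                   # initial charge density
    for k in range(1, max_iter + 1):
        # Step 1: potential from the current density (toy model, assumed form)
        v = [v_ext[i] + u * rho[i] for i in range(2)]
        # Step 2: linear eigenvalue problem with the potential held fixed
        e, psi = lowest_eigenpair(v[0], -t, v[1])
        # Step 3: new charge density from the occupied state
        rho_new = [psi[0] ** 2, psi[1] ** 2]
        if max(abs(rho_new[i] - rho[i]) for i in range(2)) < tol:
            return e, rho_new, k
        # Step 4: simple linear mixing of old and new densities
        rho = [(1 - alpha) * rho[i] + alpha * rho_new[i] for i in range(2)]
        # Step 5: loop back and rebuild the potential from the mixed density
    raise RuntimeError("SCF did not converge")

energy, density, iterations = scf()
```

In a plane-wave code, step 2 is the expensive part: the 2x2 solve here stands in for an iterative eigensolver applied to an $N \times N$ Hamiltonian.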


What Are the Computational Issues?

- DFT methods account for 75% of the materials science simulations at NERSC
- Parallel efficiencies can be quite high:
  - parallelizing on the plane wave basis can scale to ≈ 1000 processors
  - parallelizing on the plane wave basis and the wavefunction index can scale to ≈ 10,000 processors

- Most codes are still based on $O(N^3)$ algorithms
- Not systematically improvable
- Inadequate for strong and/or non-local correlations
- Parallel efficiencies can be difficult to achieve; 10-20% parallel efficiency is not uncommon


Major Computational Components of Plane Wave DFT Codes

- Eigenvalue solver
- Orthogonalization
- 3D FFTs
- Computation of potentials


Eigenvalue Solver

- Need to solve one linear eigenvalue problem at each SCF iteration, for an $N \times n_e$ block of eigenvectors
- $N$ can easily be 10,000 to 100,000
- Only the $n_e$ (≈ number of atoms) lowest eigenvalues and corresponding eigenvectors are needed
- Called diagonalization in chemistry/materials science circles
- Various approaches, including CG, Grassmann CG, and residual minimization
- A distinction is usually made between all-band and band-by-band methods, which correspond to solving for all eigenvectors simultaneously versus one eigenvector at a time. We would call this blocked vs. unblocked
- Use of optimized Level 3 BLAS routines can significantly improve performance
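
To make the band-by-band idea concrete, here is a small sketch that computes only the few lowest eigenpairs of a real symmetric matrix, one vector at a time. It is an illustration under stated assumptions, not the CG or residual-minimization schemes named above: it uses plain fixed-step descent on the Rayleigh quotient, and the step size `step`, the iteration count `iters`, and the start vectors are arbitrary choices that happen to suit a small test matrix.

```python
import math

def dot(x, y):
    return sum(a * b for a, b in zip(x, y))

def matvec(h, x):
    return [dot(row, x) for row in h]

def orthonormalize(x, bands):
    """Project out the already converged bands, then normalize."""
    for q in bands:
        c = dot(q, x)
        x = [xi - c * qi for xi, qi in zip(x, q)]
    norm = math.sqrt(dot(x, x))
    return [xi / norm for xi in x]

def lowest_bands(h, n_bands, step=0.4, iters=500):
    """Band-by-band (unblocked) solve for the n_bands lowest eigenpairs of h."""
    n = len(h)
    values, bands = [], []
    for b in range(n_bands):
        x = orthonormalize([1.0 / (1 + i + b) for i in range(n)], bands)
        for _ in range(iters):
            hx = matvec(h, x)
            theta = dot(x, hx)                      # Rayleigh quotient
            # Fixed-step descent along the residual H x - theta x
            x = [xi - step * (hxi - theta * xi) for xi, hxi in zip(x, hx)]
            x = orthonormalize(x, bands)
        values.append(dot(x, matvec(h, x)))
        bands.append(x)
    return values, bands

# 1D discrete Laplacian as a stand-in Hamiltonian
n = 8
h = [[2.0 if i == j else -1.0 if abs(i - j) == 1 else 0.0 for j in range(n)]
     for i in range(n)]
values, bands = lowest_bands(h, 3)
```

An all-band (blocked) variant would update all columns of $X$ together, which turns the inner products and updates into matrix-matrix operations and is where Level 3 BLAS pays off.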


Orthogonalization

- Due to physical constraints, the electronic wavefunctions must be orthonormal
- This adds a constraint to the KS equations in the form $X^* X = I_{n_e}$
- Can be time consuming for large systems
- Complexity is $O(N n_e^2)$, where $N$ is the size of the discretization and $n_e$ is the number of electrons
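
The $O(N n_e^2)$ cost can be read off a simple orthogonalization loop. The sketch below uses modified Gram-Schmidt on real vectors purely as an illustration; it is one of several possible schemes and is not claimed to be what any particular DFT code uses. The function name and the list-of-lists layout are assumptions of this example.

```python
import math

def modified_gram_schmidt(columns):
    """Orthonormalize n_e vectors of length N (assumed linearly independent)."""
    q = []
    for v in columns:                       # n_e vectors ...
        w = list(v)
        for u in q:                         # ... each compared with up to n_e earlier ones ...
            proj = sum(ui * wi for ui, wi in zip(u, w))     # ... at O(N) per pair
            w = [wi - proj * ui for ui, wi in zip(u, w)]
        norm = math.sqrt(sum(wi * wi for wi in w))
        q.append([wi / norm for wi in w])
    return q

x = [[1.0, 1.0, 0.0, 0.0], [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 1.0, 1.0]]
q = modified_gram_schmidt(x)
```

There are about $n_e^2/2$ vector pairs and each pair costs $O(N)$ operations, which gives the $O(N n_e^2)$ total.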


FFTs

- Recall that the kinetic energy operator takes on a particularly simple form in Fourier space (also called G-space)
- Most DFT codes take advantage of this fact by converting from real space to G-space for computation of the Hamiltonian
- Since systems are usually 3D, codes need to compute the 3D FFTs through a series of 1D FFTs
- This affects both the total amount of work and the parallelization of the codes
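
As a reminder of why the kinetic energy operator is simple in G-space, here is the standard identity for a plane-wave expansion. The coefficient symbol $c_G$ is notation assumed for this note and does not appear in the slides.

```latex
\psi(r) = \sum_{G} c_G \, e^{i G \cdot r}
\quad\Longrightarrow\quad
-\tfrac{1}{2}\nabla^2 \psi(r) = \sum_{G} \tfrac{1}{2}\,|G|^2 \, c_G \, e^{i G \cdot r}
```

The kinetic term is therefore a diagonal scaling of the coefficients by $\tfrac{1}{2}|G|^2$, while a local potential is a pointwise multiplication in real space. FFTs move the wavefunction between the two representations so that each term is applied where it is diagonal.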


Computation of potentials

- The Hartree potential, $V_{Hartree} = \int \frac{\rho(r')}{|r - r'|}\,dr'$, can be computed in several ways
- The calculation can be posed as the solution of a Poisson problem. Fast Poisson solvers or multigrid can also be used
- Because the potential can be viewed as a convolution, it can also be computed using FFTs
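
The Fourier route can be sketched in a few lines. The example below is a one-dimensional periodic toy analogue, not a 3D production solver: it assumes the Poisson form $\nabla^2 V = -4\pi\rho$ (atomic units), so that $V(G) = 4\pi\rho(G)/|G|^2$ for $G \neq 0$, and it drops the $G = 0$ term, which assumes a charge-neutral cell. A naive $O(n^2)$ DFT stands in for an FFT library; `dft` and `hartree_1d` are hypothetical helper names.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive discrete Fourier transform (unnormalized); sign=+1 gives the inverse kernel."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]

def hartree_1d(rho, length):
    """Solve V'' = -4*pi*rho on a periodic grid by dividing by |G|^2 in Fourier space."""
    n = len(rho)
    rho_g = dft(rho, -1)
    v_g = [0j] * n                            # G = 0 term dropped (neutral cell assumed)
    for k in range(1, n):
        m = k if k <= n // 2 else k - n       # signed frequency index
        g = 2 * math.pi * m / length
        v_g[k] = 4 * math.pi * rho_g[k] / (g * g)
    return [val.real / n for val in dft(v_g, +1)]

# Check on rho(x) = cos(x) with period 2*pi, where the exact answer is 4*pi*cos(x)
n, length = 16, 2 * math.pi
rho = [math.cos(2 * math.pi * i / n) for i in range(n)]
v = hartree_1d(rho, length)
```

The same division by $|G|^2$ carries over to three dimensions, with the 1D transforms replaced by 3D FFTs.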


Parallel Calculations Milestones

- 1991: Silicon surface reconstruction (7x7), Meiko i860, 64 processors (Stich, Payne, King-Smith, Lin, Clarke)
- 1998: FeMn alloys (exchange bias), Cray T3E, 1500 processors; first > 1 Tflop/s simulation, Gordon Bell prize (Ujfalussy, Stocks, Canning, Y. Wang, Shelton et al.)
- 2005: 1000 atom molybdenum simulation with Qbox, BlueGene/L at LLNL with 32,000 processors (F. Gygi et al.)
- 2008: Band-gap calculation of a 13,824 atom ZnTeO alloy proposed as a new solar cell material. Used 131,072 processors on Blue Gene/P at ANL and achieved 107.5 Tflop/s


Parallelization Strategies

- Parallel across k-points: not useful for large systems, as the number of k-points is usually small
- Parallel over electrons: the number of processors is limited by the number of electrons
- Parallel over the plane-wave basis (of size $n_g$): most commonly used in plane-wave codes
- Parallelization of DFT codes is nontrivial, and most codes cannot scale to large numbers of processors with even moderate efficiency. 30% parallel efficiency is usually considered very good
- Parallelization issues for Hartree-Fock codes are similar, especially for SCF
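
For reference, the usual definitions of speedup and parallel efficiency are given below. The slides do not state which definition they use, so this is an assumption; $T_1$ and $T_p$ denote the run times on one and on $p$ processors.

```latex
S_p = \frac{T_1}{T_p}, \qquad E_p = \frac{S_p}{p} = \frac{T_1}{p \, T_p}
```

Under this definition, 30% efficiency on $p = 1000$ processors corresponds to a speedup of $S_p = 0.30 \times 1000 = 300$.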


Parallelization of 3D FFT

- 3D FFTs are computed via 3 sets of 1D FFTs and 2 transposes
- Most of the communication is in the global transpose, from (b) to (c) in the slide's figure
- Ratio of flops/comm ≈ log N
- Many FFTs are computed at the same time to avoid latency issues
- Only non-zero elements are computed/communicated
- For details see (Canning et al.): http://www.nersc.gov/projects/paratec/
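
The "3 sets of 1D transforms and 2 transposes" structure can be shown serially. The sketch below is an illustration only: it uses a naive $O(n^2)$ 1D DFT in place of an FFT, and plain nested lists in place of a distributed array. The helper names are hypothetical. In a parallel code, each transpose is where the global communication happens.

```python
import cmath
import math

def dft1d(x):
    """Naive 1D discrete Fourier transform (stand-in for a 1D FFT)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * math.pi * m * k / n) for m in range(n))
            for k in range(n)]

def transform_last_axis(a):
    """Apply the 1D transform to every line along the last (contiguous) axis."""
    return [[dft1d(line) for line in plane] for plane in a]

def rotate_axes(a):
    """Transpose: b[j][k][i] = a[i][j][k], so the first axis becomes the last."""
    n0, n1, n2 = len(a), len(a[0]), len(a[0][0])
    return [[[a[i][j][k] for i in range(n0)] for k in range(n2)] for j in range(n1)]

def dft3d(a):
    """3D transform of a[x][y][z]; the result is left in the layout out[kz][kx][ky]."""
    a = transform_last_axis(a)     # set 1: transforms along z
    a = rotate_axes(a)             # transpose 1: layout (y, kz, x)
    a = transform_last_axis(a)     # set 2: transforms along x
    a = rotate_axes(a)             # transpose 2: layout (kz, kx, y)
    return transform_last_axis(a)  # set 3: transforms along y

nx, ny, nz = 2, 3, 4
grid = [[[complex(x + 2 * y - 0.5 * z, x * y * z) for z in range(nz)]
         for y in range(ny)] for x in range(nx)]
out = dft3d(grid)
```

Leaving the result in the permuted layout avoids a third transpose, which is why two transposes suffice.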


Linear Scaling Electronic Structure Methods

- Goal is to reduce the computational work from $O(N^3)$ to $O(N)$
- Quantum mechanical effects are near-sighted, e.g. treat the computation of the exchange-correlation potential locally
- Need to introduce the concept of a localization region: the quantity of interest is computed inside the region and is assumed to vanish outside it
- Six strategies for taking advantage of this fact (see Goedecker (1999)):
  1. Fermi operator expansion
  2. Fermi operator projection
  3. Divide-and-conquer
  4. Density-matrix minimization
  5. Orbital minimization approach
  6. Optimal basis density-matrix minimization


LS3DF

- Based on the divide-and-conquer approach
- Divide a large system into smaller sub-domains that can be solved independently, then stitch the sub-domains back together
- Classical electrostatic interactions are long-ranged, so one global Poisson equation is solved
- Requires minimal communication between the sub-domains
- Artificial boundary effects due to sub-dividing the domain can be cancelled out
- Based on ideas from fragment molecular method
- We call our method Linear Scaling 3D Fragment, or LS3DF [1]

[1] L.W. Wang, Z. Zhao, J. Meza, LBNL-61691 (2006)


Parallelism Issues

- Multi-core and many-core architectures are the wave of the future
- Current algorithms are difficult to parallelize with high efficiency
- Many quantum chemistry codes do not parallelize well even at medium-scale parallelism

Figure: IBM Cell Blade. Same processor as found in a Sony Playstation 3


Electronic Structure Codes

- ABINIT – www.abinit.org
- PARATEC – www.nersc.gov/projects/paratec
- PEtot – hpcrd.lbl.gov/linwang/PEtot/PEtot.html
- PWscf – www.pwscf.org
- NWChem – www.emsl.pnl.gov/docs/nwchem/nwchem.html
- Q-Chem – www.q-chem.com/
- Quantum Espresso – www.quantum-espresso.org
- Socorro – dft.sandia.gov/Socorro
- VASP – cms.mpi.univie.ac.at/vasp
- Many, many more – apologies if your favorite code was not listed


KSSOLV Matlab package

- KSSOLV: Matlab code for solving the Kohn-Sham equations
- Open source package
- Handles SCF, DCM (direct constrained minimization), and trust region methods
- Example problems to get started with
- Object-oriented design, easy to extend
- Good starting point for students
- Beta version of KSSOLV available, ask one of us for more information!


Example: SiH4

a1 = Atom('Si');
a2 = Atom('H');
alist = [a1 a2 a2 a2 a2];
% Atomic coordinates, one row per atom; the remaining rows are elided on the slide
xyzlist = [ 0.0  0.0  0.0
            1.61 1.61 1.61
            ... ];
mol = Molecule();
% C is the supercell, defined elsewhere (not shown on the slide)
mol = set(mol,'supercell',C);
mol = set(mol,'atomlist',alist);
mol = set(mol,'xyzlist',xyzlist);
mol = set(mol,'ecut', 25);
mol = set(mol,'name','SiH4');
...
isosurface(rho);


Convergence

[Etot, X, vtot, rho] = scf(mol);
[Etot, X, vtot, rho] = dcm(mol);


Charge Density

isosurface(rho);


Example: $\mathrm{Pt}_6\mathrm{Ni}_2\mathrm{O}$

cell: 19.59 0.0 0.0 ...
sampling size: n1 = 96, n2 = 48, n3 = 48
atoms and coordinates:
1 Pt 1.3 -0.180 -0.015
...
7 Ni 8.4 0.003 3.069
8 Ni 8.5 7.998 7.762
9 O 14.9 2.644 1.511
number of electrons : 86
spin type : 1
kinetic energy cutoff: 60.0


Comparison of DCM vs. SCF


Summary

- Described most common PW DFT computational components
- Overview of standard numerical methods used
- Brief introduction to some parallelization issues
- Listed some computational challenges
- Introduced KSSOLV, a Matlab package for solving the KS equations


References

- Aron J. Cohen, Paula Mori-Sánchez, Weitao Yang, Insights into Current Limitations of Density Functional Theory, Science, Vol. 321, no. 5890, pp. 792-794 (2008).
- F. Gygi, R. K. Yates, J. Lorenz, E. W. Draeger, F. Franchetti, C. W. Ueberhuber, B. R. de Supinski, S. Kral, J. A. Gunnels, J. C. Sexton, Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (2005).
- S. Goedecker, Linear Scaling Electronic Structure Methods, Rev. Mod. Phys. 71, 1085 (1999).
- Curtis L. Janssen and Ida M. B. Nielsen, Parallel Computing in Quantum Chemistry, CRC Press (2008).
