Algorithms and Computational Aspects of DFT Calculations Part II
Juan Meza and Chao Yang
High Performance Computing Research, Lawrence Berkeley National Laboratory
IMA Tutorial: Mathematical and Computational Approaches to Quantum Chemistry
Institute for Mathematics and its Applications, University of Minnesota, September 26-27, 2008
Juan Meza (LBNL)
Algorithms and Computational Aspects of DFT Calculations
September 27, 2008
1. Goals and Motivation
2. Review of Equations
3. Plane Wave DFT Computational Components
4. Parallelization Strategies
5. Future Computational Challenges
   Linear Scaling Methods
   Parallelism Issues
6. Software
   Available Codes
   KSSOLV
7. Summary
Goals
1. The Role of Computation
2. Review Equations and Solution Techniques
3. Discuss Major Computational Aspects of Plane Wave DFT Codes
4. Present Some Parallelization Issues
5. Highlight Computational Challenges
Materials by design
"Advances in density functional theory coupled with multinode computational clusters now enable accurate simulation of the behavior of multi-thousand atom complexes that mediate the electronic and ionic transfers of solar energy conversion. These new and emerging nanoscience capabilities bring a fundamental understanding of the atomic and molecular processes of solar energy utilization within reach."
Source: Basic Research Needs for Solar Energy Utilization, Report of the BES Workshop on Solar Energy Utilization, April 18-21, 2005
DFT codes are widely used for science applications
9470 nodes; 19,480 cores
13 Tflop/s SSP (100 Tflop/s peak)
Upgrade to quad-core (355 Tflop/s peak)
DFT methods account for 75% of the materials science simulations at NERSC, totaling over 5 million hours of computer time in 2006
We can now simulate some realistic structures
The charge density of a 15,000 atom quantum dot, $\mathrm{Si}_{13607}\mathrm{H}_{2236}$. Using 2048 processors at NERSC, the calculation took about 5 hours.
The calculated dipole moment of a 2633 atom CdSe quantum rod, $\mathrm{Cd}_{961}\mathrm{Se}_{724}\mathrm{H}_{948}$. Using 2560 processors at NERSC, the calculation took about 30 hours.
Kohn-Sham Equations
Recall that our goal is to find the ground state energy by minimizing the Kohn-Sham total energy, $E_{total}$. This leads to the Kohn-Sham equations:
$$H\psi_i = E_i \psi_i, \qquad i = 1, 2, \ldots, n_e$$
$$H = -\frac{1}{2}\nabla^2 + V(\rho(r))$$
$$V(\rho(r)) = V_{ext}(r) + \int \frac{\rho(r')}{|r - r'|}\,dr' + V_{xc}(\rho)$$
This is a nonlinear eigenvalue problem, since the Hamiltonian $H$ depends on $\psi$ through the charge density $\rho$.
Discretized Kohn-Sham Equations
KKT conditions:
$$\nabla_X \mathcal{L}(X, \Lambda) = 0, \qquad X^* X = I_{n_e}.$$
The discretized Kohn-Sham equations can now be written as:
$$H(X) X = X \Lambda, \qquad X^* X = I_{n_e}.$$
The Kohn-Sham Hamiltonian is given by:
$$H(X) = \frac{1}{2} L + V(X), \qquad V(X) = V_{ext} + \mathrm{Diag}\,(L^{\dagger} \rho(X)) + \mathrm{Diag}\, g_{xc}(\rho(X)).$$
The SCF Iteration
[Diagram: the SCF cycle. Solve $\left(-\frac{1}{2}\nabla^2 + V(\rho(r))\right)\psi_i = E_i \psi_i$ for $\{\psi_i\}_{i=1,\ldots,n_e}$, form $\rho(r) = \sum_{i}^{n_e} |\psi_i(r)|^2$, recompute $V(\rho(r))$, and repeat.]
1. Given an initial charge density $\rho$, compute a potential $V_k(\rho(r))$
2. Solve the linear eigenvalue problem for the $\psi_i$, $i = 1, \ldots, n_e$
3. Compute the new charge density $\rho$
4. Update $\rho$ using your favorite mixing scheme
5. Compute $V_{k+1}$ and repeat until converged
Overall computational complexity is $O(N n_e^2)$ due to linear algebra
Major computational components:
   CG method
   Orthogonalization
   Computation of potentials
   3D FFT
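The five steps above can be sketched on a deliberately tiny model. The following is an illustrative toy, not the scheme of any particular code: one electron on two grid points, a hypothetical density-dependent potential `v_ext[i] + u * rho[i]`, and simple linear mixing with parameter `alpha`. All names and parameter values are assumptions made for the sketch.

```python
import math

def lowest_eigenpair_2x2(a, b, d):
    """Lowest eigenvalue and unit eigenvector of [[a, b], [b, d]] (requires b != 0)."""
    eig = 0.5 * (a + d) - math.hypot(0.5 * (a - d), b)
    v0, v1 = b, eig - a              # satisfies (a - eig) * v0 + b * v1 = 0
    norm = math.hypot(v0, v1)
    return eig, (v0 / norm, v1 / norm)

def toy_scf(v_ext=(0.0, 0.5), hop=1.0, u=2.0, alpha=0.5, tol=1e-10, max_iter=200):
    """SCF loop for one electron on two sites; returns (eigenvalue, rho, iterations)."""
    rho = [0.5, 0.5]                                         # step 1: initial density
    for k in range(1, max_iter + 1):
        v = [v_ext[i] + u * rho[i] for i in range(2)]        # potential V_k(rho)
        eig, psi = lowest_eigenpair_2x2(v[0], -hop, v[1])    # step 2: linear eigenproblem
        rho_new = [psi[0] ** 2, psi[1] ** 2]                 # step 3: new density
        change = max(abs(rho_new[i] - rho[i]) for i in range(2))
        if change < tol:                                     # step 5: converged
            return eig, rho_new, k
        # step 4: linear mixing of old and new densities
        rho = [(1 - alpha) * rho[i] + alpha * rho_new[i] for i in range(2)]
    raise RuntimeError("SCF did not converge in max_iter iterations")

if __name__ == "__main__":
    eig, rho, iterations = toy_scf()
    print(f"eigenvalue = {eig:.6f}, rho = {rho}, iterations = {iterations}")
```

In this toy model plain substitution (`alpha = 1.0`) contracts far more slowly than the mixed iteration, which is the usual motivation for step 4.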
What Are the Computational Issues?
DFT methods account for 75% of the materials science simulations at NERSC
Parallel efficiencies can be quite high
   parallelizing over the plane wave basis can scale to ≈ 1,000 processors
   parallelizing over the plane wave basis and the wavefunction index can scale to ≈ 10,000 processors
Most codes are still based on $O(N^3)$ algorithms
Not systematically improvable
Inadequate for strong and/or non-local correlations
Parallel efficiencies can be difficult to achieve; 10-20% parallel efficiency is not uncommon
Major Computational Components of Plane Wave DFT Codes
Eigenvalue solver
Orthogonalization
3D FFTs
Computation of potentials
Eigenvalue Solver
Need to solve one linear eigenvalue problem at each SCF iteration: $H$ is $N \times N$ and the block of wanted eigenvectors $X$ is $N \times n_e$
$N$ can easily be 10,000 to 100,000
Only the $n_e$ (≈ number of atoms) lowest eigenvalues and corresponding eigenvectors are needed
Called diagonalization in chemistry/materials science circles
Various approaches, including CG, Grassmann CG, and residual minimization
A distinction is usually made between all-band and band-by-band methods, which corresponds to solving for all eigenvectors simultaneously vs. solving for one eigenvector at a time. We would call this blocked vs. unblocked
Use of optimized Level 3 BLAS routines can significantly improve performance
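To make the band-by-band (unblocked) idea concrete, here is a minimal sketch under simplifying assumptions. It uses fixed-step steepest descent on the Rayleigh quotient rather than the CG variants named above, and a small tridiagonal matrix as a stand-in Hamiltonian. The function names, the step size, and the test matrix are all choices made for illustration.

```python
import math
import random

def apply_h(x):
    """Apply H = tridiag(-1, 2, -1), a 1D finite-difference stand-in for -d^2/dx^2."""
    n = len(x)
    return [2.0 * x[i]
            - (x[i - 1] if i > 0 else 0.0)
            - (x[i + 1] if i < n - 1 else 0.0)
            for i in range(n)]

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def project_out(v, bands):
    """Remove from v its components along the already converged (orthonormal) bands."""
    for q in bands:
        c = dot(q, v)
        v = [vi - c * qi for vi, qi in zip(v, q)]
    return v

def band_by_band(n, n_bands, step=0.25, tol=1e-8, max_iter=20000, seed=0):
    """Compute the n_bands lowest eigenpairs one vector at a time."""
    rng = random.Random(seed)
    bands, values = [], []
    for _band in range(n_bands):
        x = [rng.random() for _ in range(n)]
        for _it in range(max_iter):
            x = project_out(x, bands)               # stay orthogonal to lower bands
            norm = math.sqrt(dot(x, x))
            x = [xi / norm for xi in x]
            hx = apply_h(x)
            theta = dot(x, hx)                      # Rayleigh quotient
            r = project_out([h - theta * xi for h, xi in zip(hx, x)], bands)
            if math.sqrt(dot(r, r)) < tol:          # projected residual small: converged
                break
            x = [xi - step * ri for xi, ri in zip(x, r)]
        else:
            raise RuntimeError("band did not converge")
        bands.append(x)
        values.append(theta)
    return values, bands

if __name__ == "__main__":
    values, _ = band_by_band(8, 3)
    print(values)
```

Each inner update touches a single vector, so the work is matrix-vector (Level 2 BLAS) in character. An all-band method would update all $n_e$ columns at once and can use matrix-matrix (Level 3 BLAS) kernels instead.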
Orthogonalization
Due to physical constraints, the electronic wavefunctions must be orthonormal
This adds a constraint to the KS equations in the form of $X^* X = I_{n_e}$
Can be time consuming for large systems
Complexity is $O(N n_e^2)$, where $N$ is the size of the discretization and $n_e$ is the number of electrons
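One way to see where the $O(N n_e^2)$ cost comes from is modified Gram-Schmidt, sketched below for real-valued vectors. This is an illustrative choice of algorithm; production codes may use other schemes (for example Cholesky-based orthogonalization), and the helper names are made up for the example.

```python
import math

def modified_gram_schmidt(columns):
    """Orthonormalize n_e real vectors of length N (assumed linearly independent)."""
    q = []
    for col in columns:
        v = list(col)
        for u in q:                       # up to n_e - 1 projections, each O(N)
            proj = sum(ui * vi for ui, vi in zip(u, v))
            v = [vi - proj * ui for ui, vi in zip(u, v)]
        norm = math.sqrt(sum(vi * vi for vi in v))
        q.append([vi / norm for vi in v])
    return q

def gram(q):
    """Gram matrix X^T X; equals the identity when the vectors are orthonormal."""
    return [[sum(a * b for a, b in zip(u, w)) for w in q] for u in q]

if __name__ == "__main__":
    cols = [[1.0, 2.0, 0.0, 1.0, 3.0],
            [0.0, 1.0, 1.0, 2.0, 1.0],
            [2.0, 0.0, 1.0, 1.0, 0.0]]
    for row in gram(modified_gram_schmidt(cols)):
        print([round(val, 12) for val in row])
```

Each of the $n_e$ vectors is projected against all earlier ones at $O(N)$ per projection, which gives the $O(N n_e^2)$ total.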
FFTs
Recall that the kinetic energy operator takes on a particularly simple form in Fourier space (also called G-space)
Most DFT codes take advantage of this fact by converting from real space to G-space for computation of the Hamiltonian
Since systems are usually 3D, codes need to compute the 3D FFTs through a series of 1D FFTs
This has consequences both for the total amount of work and for parallelizing the codes
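A small 1D sketch of the first point: in G-space, applying $-\frac{1}{2}\,d^2/dx^2$ reduces to multiplying each Fourier coefficient by $\frac{1}{2}G^2$. The example assumes a periodic cell of length `length` and uses a naive $O(n^2)$ DFT so that it stays self-contained; a real code would call an FFT library. Function names are illustrative.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 gives the unnormalized inverse)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def apply_kinetic(psi, length):
    """Apply -1/2 d^2/dx^2 to periodic samples psi by multiplying by G^2/2 in G-space."""
    n = len(psi)
    psi_g = dft(psi)
    out_g = []
    for k in range(n):
        m = k if k <= n // 2 else k - n           # signed frequency index
        g = 2.0 * math.pi * m / length            # reciprocal-space vector G
        out_g.append(0.5 * g * g * psi_g[k])
    return [val / n for val in dft(out_g, sign=+1)]

if __name__ == "__main__":
    n, length = 16, 2.0 * math.pi
    psi = [math.cos(3.0 * length * i / n) for i in range(n)]   # cos(3x)
    t_psi = apply_kinetic(psi, length)
    # For cos(3x) the exact result is (9/2) cos(3x).
    print(max(abs(t_psi[i] - 4.5 * psi[i]) for i in range(n)))
```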
Computation of potentials
The Hartree potential,
$$V_{Hartree}(r) = \int \frac{\rho(r')}{|r - r'|}\,dr',$$
can be computed in several ways
The calculation can be posed as the solution of a Poisson problem
Fast Poisson solvers or multigrid can also be used
Because the potential can be viewed as a convolution, it can also be computed using FFTs
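As a hedged illustration of the Fourier route, the sketch below solves a 1D periodic Poisson problem $-V'' = 4\pi\rho$ as a stand-in for the 3D case: each coefficient becomes $V(G) = 4\pi\rho(G)/G^2$. It assumes Gaussian-style units (the factor $4\pi$) and sets the $G = 0$ term to zero, a common convention for periodic cells. The naive DFT and all names are choices made for the example.

```python
import cmath
import math

def dft(x, sign=-1):
    """Naive O(n^2) discrete Fourier transform (sign=+1 gives the unnormalized inverse)."""
    n = len(x)
    return [sum(x[m] * cmath.exp(sign * 2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def hartree_periodic_1d(rho, length):
    """Solve -V'' = 4*pi*rho on a periodic cell by dividing by G^2 in Fourier space."""
    n = len(rho)
    rho_g = dft(rho)
    v_g = [0j] * n                                # G = 0 term left at zero (assumption)
    for k in range(1, n):
        m = k if k <= n // 2 else k - n           # signed frequency index
        g = 2.0 * math.pi * m / length
        v_g[k] = 4.0 * math.pi * rho_g[k] / (g * g)
    return [(val / n).real for val in dft(v_g, sign=+1)]

if __name__ == "__main__":
    n, length = 16, 2.0 * math.pi
    rho = [math.cos(2.0 * length * i / n) for i in range(n)]   # cos(2x)
    v = hartree_periodic_1d(rho, length)
    # Exact solution for rho = cos(2x) is V = pi * cos(2x).
    print(max(abs(v[i] - math.pi * rho[i]) for i in range(n)))
```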
Parallel Calculations Milestones
1991: Silicon surface reconstruction (7x7), Meiko i860, 64 processors (Stich, Payne, King-Smith, Lin, Clarke)
1998: FeMn alloys (exchange bias), Cray T3E, 1500 processors; first > 1 Tflop/s simulation, Gordon Bell prize (Ujfalussy, Stocks, Canning, Y. Wang, Shelton et al.)
2005: 1000-atom molybdenum simulation with Qbox on BlueGene/L at LLNL with 32,000 processors (F. Gygi et al.)
2008: Band-gap calculation of a 13,824-atom ZnTeO alloy proposed as a new solar cell material; used 131,072 processors on Blue Gene/P at ANL and achieved 107.5 Tflop/s
Parallelization Strategies
Parallel across k-points: not useful for large systems, as the number of k-points is usually small
Parallel over electrons: number of processors limited by the number of electrons
Parallel over the plane-wave basis functions, $n_g$: most commonly used in plane-wave codes
Parallelization of DFT codes is nontrivial, and most codes cannot scale to large numbers of processors with even moderate efficiencies
30% parallel efficiency is usually considered very good
Parallelization issues for Hartree-Fock codes are similar, especially for SCF
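The slides do not define parallel efficiency. Assuming the standard definition, with $T_1$ the run time on one processor and $T_p$ the run time on $p$ processors:

```latex
S_p = \frac{T_1}{T_p}, \qquad
E_p = \frac{S_p}{p} = \frac{T_1}{p\,T_p}.
% Hypothetical numbers, for illustration only:
% T_1 = 1000 s and p = 1000 at E_p = 0.30 give
% T_p = T_1 / (E_p \, p) = 1000 / 300 \approx 3.3 s,
% i.e. a speedup of S_p = 300 on 1000 processors.
```

Under this definition, 30% efficiency means that 1000 processors deliver the speedup of about 300 ideal ones.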
Parallelization of 3D FFT
3D FFTs are computed via 3 sets of 1D FFTs and 2 transposes
Most of the communication is in the global transpose, (b) to (c) in the figure
Ratio of flops/comm ≈ $\log N$
Many FFTs are computed at the same time to avoid latency issues
Only non-zero elements are computed/communicated
For details see (Canning et al.): http://www.nersc.gov/projects/paratec/
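The decomposition into three sets of 1D transforms can be checked on a tiny serial example. This is a sketch only: it uses a naive 1D DFT in place of an FFT library, nested Python lists in place of distributed arrays, and gather/scatter loops where a parallel code would perform the transposes. All function names are illustrative.

```python
import cmath

def dft1d(x):
    """Naive O(n^2) 1D DFT; a production code would call an FFT library instead."""
    n = len(x)
    return [sum(x[m] * cmath.exp(-2j * cmath.pi * m * k / n) for m in range(n))
            for k in range(n)]

def fft3d_by_1d(a):
    """3D DFT of a nested list a[i][j][k] using three sets of 1D transforms."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    # Set 1: transform along the last axis, where each line is already contiguous.
    out = [[dft1d(a[i][j]) for j in range(n2)] for i in range(n1)]
    # Set 2: gather each line along axis 2 (the gather/scatter stands in for a transpose).
    for i in range(n1):
        for k in range(n3):
            line = dft1d([out[i][j][k] for j in range(n2)])
            for j in range(n2):
                out[i][j][k] = line[j]
    # Set 3: the same along axis 1.
    for j in range(n2):
        for k in range(n3):
            line = dft1d([out[i][j][k] for i in range(n1)])
            for i in range(n1):
                out[i][j][k] = line[i]
    return out

def dft3d_direct(a):
    """Reference: direct triple-sum 3D DFT, used only to check the result."""
    n1, n2, n3 = len(a), len(a[0]), len(a[0][0])
    def term(i, j, k, p, q, r):
        phase = i * p / n1 + j * q / n2 + k * r / n3
        return a[i][j][k] * cmath.exp(-2j * cmath.pi * phase)
    return [[[sum(term(i, j, k, p, q, r)
                  for i in range(n1) for j in range(n2) for k in range(n3))
              for r in range(n3)] for q in range(n2)] for p in range(n1)]

if __name__ == "__main__":
    a = [[[((i + 1) * (j + 2) + k * k) % 7 + 0.5 * k for k in range(4)]
          for j in range(3)] for i in range(2)]
    fast, ref = fft3d_by_1d(a), dft3d_direct(a)
    print(max(abs(fast[i][j][k] - ref[i][j][k])
              for i in range(2) for j in range(3) for k in range(4)))
```

For an $n_1 \times n_2 \times n_3$ grid this performs $n_1 n_2 + n_1 n_3 + n_2 n_3$ one-dimensional transforms, which can be batched as the slide suggests.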
Linear Scaling Electronic Structure Methods
Goal is to reduce the computational work from $O(N^3)$ to $O(N)$
Quantum mechanical effects are near-sighted, e.g. treat the computation of the exchange-correlation potential locally
Need to introduce the concept of a localization region, inside which the quantity of interest is computed and outside which it is assumed to vanish
Six strategies for taking advantage of this fact (see Goedecker (1999)):
1. Fermi operator expansion
2. Fermi operator projection
3. Divide-and-conquer
4. Density-matrix minimization
5. Orbital minimization approach
6. Optimal basis density-matrix minimization
LS3DF
Based on the divide-and-conquer approach
Divide a large system into smaller sub-domains that can be solved independently, then stitch the sub-domains back together again
Classical electrostatic interactions are long-ranged, so one global Poisson equation is solved
Requires minimal communication between the sub-domains
Artificial boundary effects due to sub-dividing domains can be cancelled out
Based on ideas from fragment molecular method
We call our method Linear Scaling 3D Fragment, or LS3DF [1]
[1] L.W. Wang, Z. Zhao, J. Meza, LBNL-61691 (2006)
Parallelism Issues
Multi-core and many-core architectures are the wave of the future
Current algorithms are difficult to parallelize with high efficiency
Many quantum chemistry codes do not parallelize well even at medium-scale parallelism
[Figure: IBM Cell Blade, the same processor as found in a Sony PlayStation 3]
Electronic Structure Codes
ABINIT: www.abinit.org
PARATEC: www.nersc.gov/projects/paratec
PEtot: hpcrd.lbl.gov/linwang/PEtot/PEtot.html
PWscf: www.pwscf.org
NWChem: www.emsl.pnl.gov/docs/nwchem/nwchem.html
Q-Chem: www.q-chem.com/
Quantum Espresso: www.quantum-espresso.org
Socorro: dft.sandia.gov/Socorro
VASP: cms.mpi.univie.ac.at/vasp
Many, many more; apologies if your favorite code was not listed
KSSOLV Matlab package
KSSOLV: Matlab code for solving the Kohn-Sham equations
Open source package
Handles SCF, DCM (direct constrained minimization), Trust Region
Example problems to get started with
Object-oriented design, easy to extend
Good starting point for students
Beta version of KSSOLV available; ask one of us for more information!
Example: SiH4
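    % Define the atoms: one silicon and four hydrogens
    a1 = Atom('Si');
    a2 = Atom('H');
    alist = [a1 a2 a2 a2 a2];
    % Atomic coordinates, one row per atom (remaining rows elided on the slide)
    xyzlist = [ 0.0  0.0  0.0
                1.61 1.61 1.61
                ... ];
    % Build the molecule; the supercell C is defined elsewhere (not shown on the slide)
    mol = Molecule();
    mol = set(mol,'supercell',C);
    mol = set(mol,'atomlist',alist);
    mol = set(mol,'xyzlist' ,xyzlist);
    mol = set(mol,'ecut', 25);        % kinetic energy cutoff
    mol = set(mol,'name','SiH4');
    ...
    isosurface(rho);                  % plot the computed charge density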
Convergence
    [Etot, X, vtot, rho] = scf(mol);
    [Etot, X, vtot, rho] = dcm(mol);
Charge Density
    isosurface(rho);
Example: Pt6Ni2O
    cell:
        19.59 0.0 0.0
        ...
    sampling size: n1 = 96, n2 = 48, n3 = 48
    atoms and coordinates:
        1 Pt 1.3 -0.180 -0.015
        ...
        7 Ni 8.4 0.003 3.069
        8 Ni 8.5 7.998 7.762
        9 O 14.9 2.644 1.511
    number of electrons: 86
    spin type: 1
    kinetic energy cutoff: 60.0
Comparison of DCM vs. SCF
Summary
Described the most common PW DFT computational components
Overview of standard numerical methods used
Brief introduction to some parallelization issues
Listed some computational challenges
Introduced KSSOLV, a Matlab package for solving the KS equations
References
Aron J. Cohen, Paula Mori-Sánchez, Weitao Yang, Insights into Current Limitations of Density Functional Theory, Science, Vol. 321, no. 5890, pp. 792-794 (2008).
F. Gygi, R. K. Yates, J. Lorenz, E. W. Draeger, F. Franchetti, C. W. Ueberhuber, B. R. de Supinski, S. Kral, J. A. Gunnels, J. C. Sexton, Proceedings of the 2005 ACM/IEEE Conference on Supercomputing (2005).
S. Goedecker, Linear Scaling Electronic Structure Methods, Rev. Mod. Phys. 71, 1085 (1999).
Curtis L. Janssen and Ida M. B. Nielsen, Parallel Computing in Quantum Chemistry, CRC Press (2008).