Computer Physics Communications 134 (2001) 23–32 www.elsevier.nl/locate/cpc
A domain decomposition molecular dynamics program for the simulation of flexible molecules with an arbitrary topology of Lennard–Jones and/or Gay–Berne sites

Jaroslav Ilnytskyi *, Mark R. Wilson
Department of Chemistry, University of Durham, South Road, Durham, DH1 3LE, UK

Received 12 May 2000; accepted 2 July 2000
Abstract

We describe a new parallel molecular dynamics program, GBMOLDD, which uses the domain decomposition algorithm. The program is designed to simulate molecular systems composed of both spherically-symmetric and anisotropic sites connected via arbitrary topology and described by standard force fields. The program is oriented mainly towards simulations of liquid crystalline systems, including mixtures of mesogenic molecules and mesogens confined in host media. Benchmark results are presented for a model liquid crystal dimer composed of two mesogenic units linked via a flexible alkyl chain. The benchmarks compare favorably to those obtained via a parallel replicated data algorithm. © 2001 Elsevier Science B.V. All rights reserved.

PACS: 02.70; 61.30.Cz; 82.20.Wt

Keywords: Molecular dynamics; Atomistic simulations; Parallel computing; Domain decomposition; Liquid crystals
* Corresponding author. Permanent address: Institute for Condensed Matter Physics, National Academy of Sciences of Ukraine, 1 Svientsitskoho Str., UA-290011, Lviv, Ukraine. E-mail address: [email protected] (J. Ilnytskyi).

1. Introduction

Molecular dynamics (MD) simulations [1] have become a powerful technique for studying the behaviour of complex liquids, liquid crystals and polymers. Adequate chemical descriptions of these systems typically require a detailed model for both inter- and intramolecular interactions, and also require system sizes that are large enough to observe the effects of molecular ordering over large length scales. Until recently, both requirements were difficult to satisfy at the same time. For example, in liquid crystal systems recent studies have concentrated on large-scale simulations of single-site potentials [2] (variants of the Gay–Berne potential [3,4] are generally used), or on small systems of molecules (∼100) using a fully atomistic force field treatment for relatively short simulation times (∼1 ns) [4–10]. There is a clear need to extend the latter to much larger systems and longer simulation times in order to study the structural and dynamical properties of liquid crystal phases using chemically-realistic models. There is also a need to extend the more idealized models of the Gay–Berne form to incorporate molecular flexibility and facilitate the simulation of liquid crystalline polymers, elastomers, dendrimers and lyotropic systems. A typical large-scale simulation of a force field based model requires considerable expenditure of
CPU time. The most cost-efficient solution involves the use of parallel programming techniques, where the total computing workload is distributed (in the ideal case, evenly) among a number of processing nodes. Several general algorithms have been developed to parallelize MD simulations; see the reviews [11–15]. The replicated data (RD) algorithm is the easiest to implement and often requires only minor changes to a scalar code. Each node stores a copy of all the atomic data in the whole system, but performs only its own part of the computational workload. The pay-off for the simplicity of this algorithm is poor scaling: both the memory and the communication cost scale as Nat (the number of sites), independently of Np (the number of processors). As a result, for large Np communication costs dominate and the code becomes communications bound [2,14]. A second approach, the force decomposition (FD) algorithm, is similar in design to the RD algorithm but aims at reducing the amount of data flowing between nodes. This is done by using a permuted matrix for the pair forces, which allows less atomic data to be stored per processor [15,16]. This algorithm scales as Nat/√Np and is especially effective for the simulation of moderately sized systems. A third approach, the domain decomposition (DD) algorithm, divides the entire simulation box geometrically and assigns each subdomain to one node [2,12,14]. Individual processors must exchange information with their neighbors about the boundary particles, but afterwards each node is able to compute the forces and potentials in parallel. This algorithm is especially effective for short-ranged interactions, scaling (in the ideal case) as Nat/Np if the cutoff is much smaller than the domain dimension. Unfortunately, several complications arise when applying the DD algorithm to complex molecular systems. For instance, each node must keep track of the bonded interactions inside its own subdomain, special care must be taken when bonded interactions are split between different nodes, and bonded pairs should not appear in the list of non-bonded interactions [17,18]. There are several implementations of the DD algorithm for MD simulations of molecular systems composed of spherically-symmetric sites (such as Lennard–Jones (LJ) particles), in particular for linear chains [19–22] and for more general cases [23–26]. A further problem can arise when implementing the DD algorithm in highly inhomogeneous systems (for instance, when studying condensation or surface
effects), where it is difficult to secure good load balancing. To tackle this problem several techniques of dynamic load balancing have been developed. For example, one can move the subdomain boundaries while keeping the same (say, cuboidal) subdomain shape [26], or one can construct a point-centered non-cuboidal DD [27].

A number of programs have been developed to implement the RD and DD strategies for anisotropic systems. The GBMESO (RD) and GBMEGA (DD) codes [2] use these approaches for Gay–Berne (GB) type particles. These programs have already proved successful in modeling a range of liquid crystal phases (nematic, smectic-A, smectic-B, twist grain boundary) and in studying several aspects of liquid crystalline behaviour [2,28,29]. The most recent addition to this group of codes was the GBMOL program, which implements the RD strategy for composite systems composed of mixtures of Lennard–Jones (LJ) and anisotropic particles [30]. This allows for the treatment of molecular flexibility, which is crucial in many liquid crystalline systems. This approach has already proved useful in studying the properties of liquid crystalline dimers, main-chain polymers and mesogens with two flexible alkyl chains [31–36].

The present paper reports the development of a DD version of GBMOL (GBMOLDD) capable of simulating systems composed of both (or solely) LJ and GB sites. The code is completely general and may be used for a wide variety of topologies including linear chains, rings, and complex networks composed of both types of particles. It has been developed for both single-component and multicomponent systems. The volume of message passing has been reduced considerably compared with typical RD methods, providing code that is suitable for large-scale simulations. The use of Fortran-90 and standard message passing packages (MPI and PVM) has allowed successful porting to a range of parallel systems, including dedicated parallel computers (Cray T3E) and workstation clusters linked by fast Ethernet connections.
2. Description of the molecular model

The GBMOLDD program is designed in a universal manner and is capable of working with molecules of virtually arbitrary topology. Each molecule consists of
either LJ or GB sites, or both, interacting via bonded forces (here and hereafter we use the more general term "site" instead of "atom", since GB sites are not atoms in the proper sense). The use of GB sites is especially effective for the simulation of liquid crystalline molecules [2,35]. The important molecular topologies include: (1) single-site LJ or GB molecules; (2) complex polymeric molecules built of LJ sites (including chains, rings, star-like molecules, etc.); (3) flexible liquid crystalline molecules built of GB and LJ sites; (4) complex polymeric networks described as one complex molecule. The possibility of mixing different types of molecules in one system is useful for the simulation of binary or higher mixtures, dilute liquid crystalline systems, and fluids or mesogens confined in complex networks, etc.

The bonded interactions within molecule m are described via the force field

E_m^bon = Σ_{i=1}^{nb} ½ k_i^(b) (l_i − l_i^(0))² + Σ_{i=1}^{na} ½ k_i^(a) (θ_i − θ_i^(0))² + Σ_{i=1}^{nz} ½ k_i^(z) (ζ_i − ζ_i^(0))² + Σ_{i=1}^{nt} U_i^(tors).    (1)

Here, nb and na represent the number of bonds and bond angles, and nz is the number of additional angle deformations involving GB sites (described below) in molecule m. The values l_i, θ_i, ζ_i are, respectively, the actual bond length, bond angle and additional angle, and l_i^(0), θ_i^(0), ζ_i^(0) are the corresponding equilibrium values. The harmonic force constants are given by k_i^(b), k_i^(a) and k_i^(z). ζ_i is measured as the angle between the long axis of a GB site and a bond from this GB site to another site. The torsional interaction U^(tors) may take two forms: the Ryckaert–Bellemans potential [37]

U_i^(tors1) = Σ_{n=1}^{5} c_i^(n) cos^n φ_i,

or a truncated Fourier series

U_i^(tors2) = ½ [ V_i^(1) (1 + cos φ_i) + V_i^(2) (1 − cos 2φ_i) + V_i^(3) (1 + cos 3φ_i) ].

Each bonded interaction marked i in Eq. (1) may have a unique set of force parameters. The pairs of sites bonded in a molecule are excluded from the non-bonded interactions, and a list of these excluded pairs is stored for each molecular topology to speed up calculations [30]. All the other pairs of sites interact via different kinds of non-bonded interactions. The interaction of a pair of LJ sites is described by a factorized LJ potential

U_ij^(LJ) = (A_i^(LJ) A_j^(LJ) / r_ij)^12 − (C_i^(LJ) C_j^(LJ) / r_ij)^6.

The interaction of GB sites is described via the GB potential [3]

U_ij^(GB) = 4 ε_ij^(GB) [ (ρ_ij^(GB))^12 − (ρ_ij^(GB))^6 ],

where

ε_ij^(GB) = ε_0^(GB) [ε^(GB)(û_i, û_j)]^ν [ε′^(GB)(û_i, û_j, r̂_ij)]^µ

is the orientationally dependent well depth and we have introduced the shorthand

ρ_ij^(GB) = σ_0^(GB) / [ r_ij − σ^(GB)(û_i, û_j, r̂_ij) + σ_0^(GB) ].
Here σ_0^(GB) is the side-to-side contact distance for two GB sites, and σ^(GB)(û_i, û_j, r̂_ij) is the orientationally dependent distance at which U_ij^(GB) = 0. û_i and û_j are the unit vectors along the GB molecular axes and r̂_ij is the unit vector along the radius-vector between the centers of mass of the ith and jth GB sites. The exponents µ and ν define different parametrizations of the potential [3]. The full expressions for ε^(GB), ε′^(GB) and σ^(GB) can be found in Refs. [3,34]. The mixed interaction between LJ and GB sites can be derived by analogy from the generalized GB potential [4] and is represented by

U_ij^(LJGB) = 4 ε_ij^(LJGB) [ (ρ_ij^(LJGB))^12 − (ρ_ij^(LJGB))^6 ],

where the ith site is of the LJ type and the jth is of the GB type. Here

ε_ij^(LJGB) = ε_0^(LJGB) [ε^(LJGB)(û_j, r̂_ij)]^µ
is the effective well depth for the mixed interaction and

ρ_ij^(LJGB) = σ_0^(LJGB) / [ r_ij − σ^(LJGB)(û_j, r̂_ij) + σ_0^(LJGB) ].

The unit vectors have the same meaning as for the pure GB interaction, and σ^(LJGB) and ε^(LJGB) are given in Refs. [4,34]. Each non-bonded interaction is truncated at an appropriate cutoff and shifted to go smoothly to zero at the cutoff distance. The GBMOLDD program is designed to work with different parametrizations of the GB potential, but with only one type of GB site in a given system. Finally, the complete expression for the potential energy of the system is given by

E_total = Σ_{m=1}^{Nmol} E_m^bon + Σ_{i=1}^{NLJ} Σ_{j>i}^{NLJ} U_ij^(LJ) + Σ_{i=1}^{NGB} Σ_{j>i}^{NGB} U_ij^(GB) + Σ_{i=1}^{NLJ} Σ_{j=1}^{NGB} U_ij^(LJGB),    (2)

where Nmol, NLJ and NGB are, respectively, the total number of molecules, LJ sites and GB sites in the entire system.
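To make the structure of Eqs. (1)–(2) concrete, the following minimal Python sketch evaluates a single harmonic term, a Ryckaert–Bellemans torsion and a truncated-and-shifted factorized LJ pair energy. It is only an illustration: GBMOLDD itself is written in Fortran-90, the GB and LJ–GB terms (omitted here) follow the same pattern with orientation-dependent ε and σ, and all numerical values below are made up rather than taken from a real force field.

```python
import numpy as np

def harmonic_term(k, x, x0):
    """Generic harmonic contribution 0.5*k*(x - x0)**2, used in Eq. (1) for
    bond lengths, bond angles and the additional GB angles alike."""
    return 0.5 * k * (x - x0) ** 2

def rb_torsion(c, phi):
    """Ryckaert-Bellemans torsion: sum over n = 1..5 of c[n-1] * cos(phi)**n."""
    cos_phi = np.cos(phi)
    return sum(cn * cos_phi ** (n + 1) for n, cn in enumerate(c))

def lj_factorized_shifted(ai, aj, ci, cj, rij, rcut):
    """Factorized LJ pair energy, truncated and shifted so that U(rcut) = 0."""
    if rij >= rcut:
        return 0.0
    def u(r):
        return (ai * aj / r) ** 12 - (ci * cj / r) ** 6
    return u(rij) - u(rcut)

# Illustrative numbers only, not a real force-field parametrization:
print(harmonic_term(k=400.0, x=1.54, x0=1.53))
print(rb_torsion(c=[9.3, 12.2, -13.1, -3.1, 26.2], phi=np.pi / 3))
print(lj_factorized_shifted(ai=1.8, aj=1.8, ci=1.3, cj=1.3, rij=4.0, rcut=9.8))
```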
3. Domain decomposition algorithm

To apply the DD algorithm the simulation box is divided spatially into equal-sized cuboidal subdomains (regions), one for each computing node. The subdomain dimension in any direction must be no less than the maximal cutoff for the non-bonded interactions. At the start of a simulation each node reads the molecular topology file, and a copy of the topology for each different molecule type is stored on each node. At the next step, each node reads the coordinates, velocities, orientations and orientational derivatives (for GB sites only) and stores the sites corresponding to its own subdomain. Consequently, each site is stored on a certain node at an essentially arbitrary local index, depending on its order of appearance in the coordinate file. To be able to restore the connectivity information for each site in the system, a unique absolute site number Ni is assigned. This is done in the following order. The sites of the first molecule with topology index 1 are counted first. Then we count sequentially the sites of the remaining molecules of the same topology. The sites
of the molecules with topology index 2 are counted afterwards, and so on. The convenience of this ordering will become clear later, when the algorithm for the bonded forces is described. Each node must store key information for all its ires resident sites, which includes: the absolute site number Ni of the particle, the site type (LJ or GB), the molecule index im, the topology index of the host molecule it, the number ni within the molecule, the site mass mi, the moment of inertia Ii (for GB sites only), the coordinates r, the velocity v, the orientation e and the orientational derivative u (the last two for GB sites only). This is presented schematically in Table 1. Consequently, our DD algorithm requires a site to have three different numbers: its number within the host molecule ni, its absolute number Ni, and its index idx on the host node. The absolute site number is needed to restore all the topology-related information for the site (which is necessary for the bonded forces), and idx is needed to retrieve the coordinates, velocities, etc. The absolute site number Ni can easily be retrieved as Ni(idx) if we know idx for the site (see Table 1). However, the reverse operation of finding whether site Ni is on a particular node, and what its host-node index is, requires a search through the Ni array. This is time consuming. Instead, we introduce a long array of indices

idxnod(1 . . . Nat)    (3)

that is initially filled in when the sites are stored on a node. If site N resides on another node then idxnod(N) = 0. This provides a fast way of retrieving the site parameters for any site, with absolute number N, which resides on a particular node.

Table 1
Schematic representation of data storage after reading the coordinates. (The meaning of the stored data is explained in the text; the dots stand for instantaneous data and the empty cells do not necessarily mean unused allocated memory.)

idx     Ni    type  it  im  ni  mi   Ii   r_i, v_i, e_i, u_i
1       254   1     1   21  8   ···  ···  ···
2       12    2     1   2   9   ···  ···  ···
3       347   1     1   33  2   ···  ···  ···
4       49    1     1   4   5   ···  ···  ···
ires    320   2     1   33  9   ···  ···  ···
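This bookkeeping can be sketched as follows (an illustrative Python fragment built around the example entries of Table 1; GBMOLDD itself stores these quantities as Fortran-90 arrays, and the names below merely mirror the text):

```python
# Absolute site numbers of the sites resident on this node, indexed by the
# local storage index idx = 1 .. ires (slot 0 unused to mimic 1-based arrays).
n_at = 1000                         # total number of sites in the whole system
Ni = [None, 254, 12, 347, 49, 320]  # the example column of Table 1
ires = len(Ni) - 1

# idxnod(N) gives the local index of absolute site N, or 0 if N is not resident.
idxnod = [0] * (n_at + 1)           # initialised to zero for every absolute number
for idx in range(1, ires + 1):
    idxnod[Ni[idx]] = idx           # filled in when sites are stored on the node

def local_index(N):
    """O(1) lookup, replacing a search through the Ni array."""
    return idxnod[N]

assert local_index(347) == 3        # resident site
assert local_index(500) == 0        # site stored on another node
```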
4. Calculation of forces

To calculate all the forces that act on its resident sites, each node needs to know the positions and orientations of sites on neighboring nodes that lie within the cutoff distance of any resident site. To this end all the nodes need to exchange information about their boundary sites with the corresponding neighboring nodes. There are two common approaches to this problem that have been implemented elsewhere for similar systems [12,14]. In the first approach, the coordinate/orientation information is exchanged between neighboring nodes in both + and − directions along each spatial axis in turn [2]. In total, six blocks of data are sent/received by each node. At the end of the exchange each node has complete information about all the surrounding sites within the extended box x_i ∈ [−Δ_i, b_i + Δ_i], and is capable of calculating all the forces acting on its resident sites. (Here b_i is the subdomain extent along axis i and Δ_i is the extent of the export/import area.) In the second approach [18], the information is exchanged in only one direction (say +) for each axis, so each node has incomplete information for calculating all the forces. An additional exchange of some pair forces between the nodes is therefore required in the reverse direction to obtain all the pair forces for resident sites. The second approach gains by eliminating redundant force calculations for the boundary sites, but it requires more checking routines and additional communication to exchange the pair forces. However, if the volume of the export/import region is much less than that of the bulk region of each node, as occurs in large-scale simulations, then the fraction of time spent on redundant force calculations is reasonably small. In the GBMOLDD program we have implemented the first approach. Sites imported from other nodes are appended after the last resident site with index ires, so they appear in the force calculations in the same manner as the resident sites. At the same time, by checking whether
i ≤ ires we know whether a site is resident or temporarily imported. After the exchange of boundary sites the arrays look schematically as shown in Table 2. For each exported site we pass only its absolute number Ni and the site coordinates, velocities, etc. All other parameters (listed in Table 2) are easily retrieved from Ni when a site is reallocated on a new node. In practice, the extent of the export region Δ_i is chosen to be the same in all spatial directions and is defined by the maximal cutoff among the three non-bonded interactions. Fortunately, as was noted in Ref. [20], the maximal range of the torsional interactions (the most long-ranged of the bonded interactions) is always less than the cutoff length for typical non-bonded interactions. Hence, the existence of bonded interactions does not require any extra communication beyond that already carried out for the purely non-bonded case. The approach used for the export/import of sites in the GBMOLDD program is similar to the one used in the GBMEGA program for single GB sites and is described in detail in Ref. [2]. After the transfer of coordinates/orientations, the potential energy, the virial and the forces acting on each resident site are evaluated in parallel by each node. A global sum operation [2] is then required to obtain the total potential energy and virial for the system as a whole.

Table 2
Schematic representation of data storage after the exchange of boundary sites (the comment to Table 1 applies here as well)

idx      Ni    type  it  im  ni  mi   Ii   r_i, v_i, e_i, u_i
1        254   1     1   21  8   ···  ···  ···
2        12    2     1   2   9   ···  ···  ···
3        347   1     1   33  2   ···  ···  ···
4        49    1     1   4   5   ···  ···  ···
ires     320   2     1   33  9   ···  ···  ···
ires+1   125   1     1   12  5   ···  ···  ···
···      20    2     1   2   10  ···  ···  ···
imax     57    1     1   5   7   ···  ···  ···
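The geometric part of the export step, selecting which resident sites fall within the slab of width Δ next to a given face, might look as follows (a schematic numpy fragment assuming an orthorhombic subdomain with its origin at zero; the actual message passing in GBMOLDD is done with MPI or PVM and is not shown):

```python
import numpy as np

def export_indices(x, b, delta, axis, direction):
    """Local indices of resident sites lying within the export slab of width
    delta next to one face of the subdomain along the given axis.

    x         : (n, 3) array of site coordinates relative to the subdomain origin
    b         : (3,) subdomain extents
    delta     : width of the export/import region (the maximal non-bonded cutoff)
    direction : +1 for the face at x[axis] = b[axis], -1 for the face at 0
    """
    if direction > 0:
        mask = x[:, axis] >= b[axis] - delta
    else:
        mask = x[:, axis] < delta
    return np.nonzero(mask)[0]

# Toy usage: which of these sites would be sent to the +x neighbor?
coords = np.array([[1.0, 5.0, 5.0], [19.5, 2.0, 3.0], [10.0, 10.0, 10.0]])
print(export_indices(coords, b=np.array([20.0, 20.0, 20.0]), delta=2.0,
                     axis=0, direction=+1))   # -> [1]
```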
4.1. Non-bonded forces

The evaluation of the non-bonded forces is the most time-consuming operation in an MD simulation [1]. In particular, the search for interacting pairs (by calculating the separations and comparing them with the cutoff radius) is quite expensive, being of order N², if no special algorithms are applied. To speed up this process, either the linked-cells (LC) algorithm [11] or a Verlet list [38] (or both) can be used. Building the LC list for one type of non-bonded interaction requires each region to be divided into cells. We use equally-sized cells of cuboidal shape. The extent of each cell in any direction should not be smaller than the cutoff radius. Linked lists of all the sites within each cell are built, and only pairs within neighboring cells need to be considered when computing pair interactions. Using this method, the number of extra pairs tested is reduced considerably. The key to an efficient LC list is to keep the actual cell size as close to the cutoff as possible. In our program there are, in general, three types of non-bonded interactions, Eq. (2). (If the sites of some type are absent then, obviously, there is only one type of non-bonded interaction.) Typically, the cutoff radius for each non-bonded interaction is quite different [34]. An LC list based on the most long-ranged interaction will work inefficiently for the more short-ranged interactions.
Therefore, three separate sets of LC lists must be built in the GBMOLDD program, one for each kind of non-bonded interaction. This operation is performed in parallel by all the nodes. However, even in this case the efficiency of three separate LC lists is influenced by the box dimensions. This is especially problematic when the number of cells per region is relatively small or the cell size is incommensurate with the dimensions of a region. In these cases the effective volume for the pair-interaction search expands enormously, and the speed-up of the simulations with increasing number of nodes becomes unsatisfactory. In GBMOLDD we overcome these problems by using the LC lists to build three Verlet neighbor lists for all the resident sites on a node. The absolute site numbers Ni are used to identify the neighbors, and this helps to keep the neighbor lists universal when sites move from one node to another. To make the neighbor lists effective, the neighbors within a shell wider than the cutoff, r ∈ [0, rc + δr], are stored. Hence, the minimal cell size required for the previously built LC lists becomes rc + δr, instead of rc as when the LC lists are used alone. (We note that we do not need to compile a list of neighbors for non-resident, imported sites.) The operation of building the neighbor lists is performed in parallel on all the nodes, and at the end of this process each node has a complete neighbor list for all its resident sites. This is shown schematically in Table 3. To evaluate the non-bonded forces each node performs a loop over all its resident sites, idx ∈ [1 . . . ires], and then a further loop over the neighbors of each site.

Table 3
Schematic representation of data storage after the neighbor lists have been built; A–B stands for the list of B-type neighbors of an A-type site (the comment to Table 1 applies here as well)

idx     Ni   type  it  im  ni  mi   Ii   r_i, v_i, e_i, u_i  LJ–LJ           GB–GB           LJ–GB/GB–LJ
1       254  1     1   21  8   ···  ···  ···                 62, 65, 32 ...                  39, 19, 70 ...
2       12   2     1   2   9   ···  ···  ···                                 90, 49, 19 ...  49, 20, 10 ...
3       347  1     1   33  2   ···  ···  ···                 46, 97, 11 ...                  89, 40, 80 ...
4       49   1     1   4   5   ···  ···  ···                 34, 13, 87 ...                  69, 70, 30 ...
ires    320  2     1   33  9   ···  ···  ···                                 17, 6, 45 ...   32, 75, 72 ...
ires+1  724  1     1   72  4   ···  ···  ···
···     726  1     1   72  6   ···  ···  ···
imax    729  2     1   72  9   ···  ···  ···
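The construction of a Verlet list from the linked cells, as described above, can be sketched for a single interaction type as follows (illustrative Python; it assumes non-negative coordinates, ignores periodic wrapping, and stores local indices, whereas GBMOLDD stores the absolute site numbers Ni):

```python
import numpy as np
from collections import defaultdict

def build_verlet_list(x, n_resident, r_cut, delta_r):
    """Neighbor list for one interaction type, built via linked cells.

    x          : (n, 3) coordinates; indices 0 .. n_resident-1 are resident
                 sites, the rest are imported boundary sites
    n_resident : only resident sites need a neighbor list
    """
    r_list = r_cut + delta_r                 # minimal cell size, as in the text
    inv = 1.0 / r_list
    cells = defaultdict(list)                # linked-cell binning
    for i, xi in enumerate(x):
        cells[tuple((xi * inv).astype(int))].append(i)

    neighbors = [[] for _ in range(n_resident)]
    for i in range(n_resident):
        ci = tuple((x[i] * inv).astype(int))
        for dc in np.ndindex(3, 3, 3):       # scan the 27 surrounding cells
            cj = (ci[0] + dc[0] - 1, ci[1] + dc[1] - 1, ci[2] + dc[2] - 1)
            for j in cells.get(cj, ()):
                if j != i and np.linalg.norm(x[i] - x[j]) < r_list:
                    # a production code would typically store each pair only once
                    neighbors[i].append(j)
    return neighbors

x = np.random.default_rng(1).random((50, 3)) * 10.0
print(len(build_verlet_list(x, n_resident=40, r_cut=2.0, delta_r=0.5)[0]))
```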
4.2. Bonded forces

In the DD algorithm a complete molecule can, in general, be split between two or more nodes. Sites are stored on the nodes according to their spatial position, regardless of molecular structure. In this sense one might say that the DD algorithm is more site-oriented than the RD one. In the latter case each node has complete information about all the molecules (and consequently about all the bonded forces) in the system. Therefore, loops over the number of bonds, bond angles, etc. may be parallelized directly, with each node computing the forces for alternate iterations [14,30]. In contrast, within the DD algorithm loops over bonds, bond angles, etc. become rather impractical because the residence of sites on the nodes changes constantly. In this case it is convenient to switch from the concept of molecular topology to the concept of site connectivity. This means that each site should be able to reproduce all its neighbors involved in the different bonded interactions, so that a loop over resident sites may be performed rather than loops over pairs, triplets, etc. To this end, after the molecular topology is read in and stored, a site connectivity list is built for each site participating in each different topology (identified by its unique number ni within the molecule). This structure contains the list of all neighbors of a given site, sorted by the shared bonds, bond angles, and so on. This information is static during the simulation: the lists are built by all nodes once at the beginning of the program. Since both the topology index it and the site number ni may be retrieved instantly for any site stored under index i on the current node (via the arrays it(i) and ni(i), see Table 1), there is no need to build a connectivity list for every site in the system. In the scheme of site connectivity each site becomes independent, in the sense that it can retrieve instantly all its neighbors that share bonded interactions. Effectively this means performing several transformations of the form

i --ni(i)--> ni --connect--> (nj, nk, . . .) --Nj = Ni + nj − ni--> (Nj, Nk, . . .) --idxnod(N)--> (j, k, . . .).    (4)
At the end of this operation the whole group (i, j, k, . . .) of sites participating in a certain bonded interaction is known by means of the indices describing where the sites are stored on the current node. There are no arithmetic operations (except Nj = Ni + nj − ni) in these transformations, only the retrieval of data from different arrays, so the whole process is computationally efficient. To evaluate the bonded interactions a loop is performed over all the resident sites on each node, with indices i ∈ [1 . . . ires], and for each site i all the neighbors sharing bonded interactions are retrieved. To avoid double counting of the bonded interactions the following rule is used: the potential and force contributions to all the sites within the group of interacting sites (i, j, k, . . .) are calculated when the loop passes through the site with the lowest index. Hence, if any of the neighbors of the ith site has an index j < i, the bonded interaction of this group is skipped, as it has been calculated before. To avoid double counting of energy terms on different nodes, the corresponding contribution is added only if the ith site is resident on the current node. When a site moves to another node it participates in the loop together with the other sites of its new host node, but the scheme of transformations, Eq. (4), will always produce the same groups of bonded interactions.
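A schematic Python rendering of the chain of look-ups in Eq. (4) is given below (the array names mirror those used in the text, but the data structures and the toy topology are purely illustrative):

```python
def bonded_partners(i, ni, it, Ni, idxnod, connect):
    """Local indices of the sites bonded to the site stored at local index i,
    following the chain of look-ups in Eq. (4)."""
    partners_in_mol = connect[it[i]][ni[i]]                        # i -> ni -> (nj, nk, ...)
    abs_numbers = [Ni[i] + nj - ni[i] for nj in partners_in_mol]   # Nj = Ni + nj - ni
    return [idxnod[Nj] for Nj in abs_numbers]                      # (Nj, ...) -> (j, k, ...)

# Toy data: one topology (index 1), a 3-site chain 1-2-3; site 2 is bonded to 1 and 3.
connect = {1: {1: (2,), 2: (1, 3), 3: (2,)}}
ni = {7: 2}                          # the site at local idx 7 is site 2 of its molecule
it = {7: 1}                          # ... belonging to a molecule of topology 1
Ni = {7: 102}                        # ... with absolute site number 102
idxnod = {101: 4, 102: 7, 103: 0}    # absolute number -> local index (0 = not resident)

print(bonded_partners(7, ni, it, Ni, idxnod, connect))   # -> [4, 0]
```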
5. Integration and reallocation of sites

A form of the leap-frog algorithm that is suitable for rotational motion is used to integrate the equations of motion [39]. After updating the coordinates, some sites leave the current node and are transferred to a new host node; this leaves behind holes in the arrays used to store their data. Newly arriving sites are stored either in empty holes, if any are available, or are appended after the last resident site ires. All three neighbor lists associated with a site being moved are transferred to the new host node. All neighbor lists are built by means of the absolute site numbers of the neighbors Nj, Nk, . . . and are therefore independent of the host node. At the end of the reallocation the maximal site displacement since the last update of the neighbor lists is found. If it is more than δr/2, the neighbor lists will be updated during the next MD step.
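The update criterion can be sketched as follows (illustrative Python; δr is written as delta_r):

```python
import numpy as np

def lists_need_update(x_now, x_at_last_build, delta_r):
    """True once the largest displacement since the last neighbor-list build
    exceeds delta_r / 2 (the criterion quoted above)."""
    max_disp = np.max(np.linalg.norm(x_now - x_at_last_build, axis=1))
    return max_disp > 0.5 * delta_r

x0 = np.zeros((4, 3))
x1 = x0 + np.array([0.3, 0.0, 0.0])
print(lists_need_update(x1, x0, delta_r=0.5))   # True: 0.3 > 0.25
```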
6. Benchmarks

To test the efficiency of the GBMOLDD program we performed a series of simulations of a model liquid crystal dimer system composed of two mesogenic units linked via a flexible alkyl chain. The latter is modeled as a chain of 8 LJ sites (Fig. 1). The force field for the model molecule is described in terms of nb = 9 bonds (7 of the LJ–LJ type and 2 of the LJ–GB type), na = 8 angles, nz = 2 additional angles and nt = 7 torsional angles. Pairs of sites separated by three or fewer bonds are excluded from the non-bonded interactions. The force field parameters for the bonded interactions are described in detail elsewhere [31].

Fig. 1. Force field model for a liquid crystal dimer. l, θ, ζ and φ describe, respectively, the bond length, bond angle, additional angle and torsional angle (numerical labels represent the site numbers).

The simulated system contained 4096 model molecules (40,960 sites in total). The initial configuration was prepared in the following way. First we took a smaller system of 512 molecules equilibrated at the temperature T = 330 K by means of NPT-ensemble simulations [31]. Cutoffs rc of 9.8 Å, 16.5 Å and 18.9 Å (and shell dimensions rc + δr of 13.7 Å, 17.7 Å and 20.3 Å) were used for the LJ–LJ, LJ–GB and GB–GB interactions, respectively. At this temperature the system was found to be in the isotropic phase but quite near to the isotropic–smectic-A phase transition [31]. This rather small system was replicated 8 times in a 2 × 2 × 2 manner and then equilibrated for 0.1 ns with a time step of 1 fs to destroy memory of the replication process. The initial configuration prepared in this way was used for several NVE-ensemble runs of GBMOLDD, each of 1000 MD steps with a time step of 1 fs. Each run was performed on a different number of processing nodes (8, 16, 32 and 64) of a Cray T3E parallel computer. (The simulation of this system on a larger number of processing nodes would be ineffective, since the subdomain dimensions become comparable with rc.) The average frequency of neighbor list updates was approximately once per 60 MD steps. Typically about 5–10 sites were reallocated from/to each node after each MD step. We estimated that the use of a Verlet neighbor list led to speed-ups of approximately 1.7× for each number of nodes tested. However, we note that these speed-ups are somewhat dependent on the temperature, the system density and the latency of interprocessor communications. Simulations of a low-density system on machines with high latency (e.g., workstation clusters) are likely to perform better without the additional communication cost associated with passing the list of stored neighbors when a site moves to another node.

The same initial configuration was then used to carry out replicated data simulations using the GBMOL program. The comparison of the speed-up achieved by both programs with increasing number of processors is given in Fig. 2. One may see that the speed-up with the GBMOLDD program is more nearly linear than that obtained with the GBMOL program. Even on a machine with as low a latency as the Cray T3E, the reduction in communication costs made possible by the DD algorithm leads to substantial performance gains. On machines with higher latency, including Beowulf-type systems that have a lower communication-to-computation speed ratio, we would expect the performance improvement to be even more dramatic.

Fig. 2. Benchmarks for a model liquid crystal dimer system simulated by the GBMOLDD (I) and GBMOL (II) programs on 8, 16, 32 and 64 nodes of a Cray T3E. The dashed line is the ideal linear speed-up.

However, there are several reasons why the speed-up for the DD program is not completely linear. The loss in scalability for the DD algorithm originates both from extra communication costs and from extra calculations of bonded and non-bonded interactions within the boundary regions. The latter should be kept as small as possible, but the dimensions of the boundary regions are constrained by the largest cutoff for the non-bonded interactions. A typical value of the cutoff for the GB–GB interaction is almost twice as large as the value for the LJ–LJ interaction. Thus, with the introduction of GB sites in our model, the boundary regions increase substantially. As a result, optimal scaling of the DD algorithm for the mixed LJ/GB force field model can only be achieved with much larger system sizes than for force fields with pure LJ sites. For instance, in the case of the model system tested here and for 64 nodes, the maximal rc is about half of the subdomain dimension. This is far from the ideal case for the optimum working of the DD algorithm. However, we present the results of these simulations for realistic system sizes and densities rather than attempting to increase the system size deliberately to gain better scalability. We note, however, that as the number of particles increases the efficiency of the DD algorithm improves further, and we expect dramatic speed-ups over the RD algorithm for large system sizes (>100,000 sites).
7. Conclusions

In this report a new parallel molecular dynamics program, GBMOLDD, has been presented. The program is based on the domain decomposition algorithm and is able to simulate a wide class of molecular topologies composed of spherically-symmetric and/or anisotropic sites. The program uses several new features. To make the evaluation of the bonded interactions more independent of the node residence of sites, site connectivity data is compiled for each different site. The universal (absolute) site number provides a fast way to
retrieve this connectivity data for any required site in the system. To achieve maximal speed-up when evaluating the non-bonded forces, both linked-cells and Verlet neighbor lists are built. Atom-based neighbor information is transferred along with the coordinates/velocities when a site changes its host node. This avoids rebuilding the full neighbor lists on the next MD step. Benchmarks of the GBMOLDD program have been obtained from short runs on a well-equilibrated system of 4096 liquid crystal dimer molecules. The speed-up of the simulations obtained on 8, 16, 32 and 64 nodes shows excellent improvements in performance compared to the replicated data version of the program developed previously. The GBMOLDD program may be used for effective molecular dynamics simulations of a variety of molecular systems. These include atomic systems; polymer chains, rings and stars; and complex networks of both spherically-symmetric and anisotropic sites. However, the program is particularly useful for liquid crystalline molecules, where flexibility plays a key role in determining phase behaviour.
References

[1] M.P. Allen, D.J. Tildesley, Computer Simulation of Liquids, Clarendon Press, Oxford, 1987.
[2] M.R. Wilson, M.P. Allen, M.A. Warren, A. Sauron, W. Smith, J. Comput. Chem. 18 (1997) 478.
[3] J.G. Gay, B.J. Berne, J. Chem. Phys. 74 (1981) 3316.
[4] D.J. Cleaver, C.M. Care, M.A. Allen, M.P. Neal, Phys. Rev. E 54 (1996) 559.
[5] D.J. Cleaver, D.J. Tildesley, Mol. Phys. 81 (1994) 781.
[6] D.J. Cleaver, M.J. Callaway, T. Forester, W. Smith, D.J. Tildesley, Mol. Phys. 86 (1995) 613.
[7] M.A. Glaser, N.A. Clark, E. Garcia, D.M. Walba, Spectrochim. Acta, Part A 53 (1997) 1325.
[8] M. Yoneya, Y. Iwakabe, Liq. Cryst. 18 (1995) 45; Liq. Cryst. 21 (1996) 347.
[9] D. Pavel, J. Ball, S. Bhattacharya, R. Shanks, V. Toader, V. Bulacovschi, N. Hurduc, Comp. Theor. Pol. Sci. 9 (1999) 1.
[10] S.S. Patnaik, R. Pachter, Polymer 40 (1999) 6507.
[11] D.C. Rapoport, Comp. Phys. Rep. 1 (1988).
[12] W. Smith, Comput. Phys. Comm. 62 (1991) 229.
[13] M.P. Allen, Theor. Chim. Acta 84 (1993) 399.
[14] M.R. Wilson, in: Advances in the Computer Simulations of Liquid Crystals, P. Pasini, C. Zannoni (Eds.), Kluwer, Dordrecht, 2000, p. 389.
[15] S. Plimpton, B. Hendrickson, J. Comput. Chem. 17 (1996) 326.
[16] R. Murty, D. Okunbor, Parallel Comput. 25 (1999) 217.
[17] K. Esselink, B. Smit, P.A.J. Hilbers, J. Comput. Phys. 106 (1993) 101.
[18] D. Brown, J.H.R. Clarke, M. Okuda, T. Yamazaki, Comput. Phys. Comm. 74 (1993) 67.
[19] K. Esselink, P.A.J. Hilbers, J. Comput. Phys. 106 (1993) 108.
[20] D. Brown, J.H.R. Clarke, M. Okuda, T. Yamazaki, Comput. Phys. Comm. 83 (1994) 1.
[21] M. Surridge, D.J. Tildesley, Y.C. Kong, D.B. Adolf, Parallel Comput. 22 (1996) 1053.
[22] A. Jabbarzadeh, J.D. Atkinson, R.I. Tanner, Comput. Phys. Comm. 107 (1997) 123.
[23] M.T. Nelson, W. Humphrey, A. Gursoy, A. Dalke, L.V. Kalé, R.D. Skeel, K. Schulten, Int. J. Sup. Comp. Appl. 10 (1996) 251.
[24] K. Lim, S. Brunett, M. Iotov, R.B. McClurg, N. Vaidehi, S. Dasgupta, S. Taylor, W.A. Goddard III, J. Comput. Chem. 18 (1997) 501.
[25] D. Brown, H. Minoux, B. Maigret, Comput. Phys. Comm. 103 (1997) 170.
[26] S.G. Srinivasan, I. Ashok, H. Jónsson, G. Kalonji, J. Zahorjan, Comput. Phys. Comm. 102 (1997) 28; Comput. Phys. Comm. 102 (1997) 44.
[27] R. Koradi, M. Billeter, P. Güntert, Comput. Phys. Comm. 124 (2000) 139.
[28] M.P. Allen, M.A. Warren, M.R. Wilson, Phys. Rev. E 57 (1998) 5585.
[29] M.A. Bates, Chem. Phys. Lett. 288 (1998) 209.
[30] M.R. Wilson, GBMOL (1996): A replicated data molecular dynamics program to simulate combinations of Gay–Berne and Lennard–Jones sites, University of Durham, UK.
[31] M.R. Wilson, J. Chem. Phys. 107 (1997) 8654.
[32] C. McBride, M.R. Wilson, J.A.K. Howard, Mol. Phys. 93 (1998) 955.
[33] A.V. Lyulin, M.S. Al-Barwani, M.P. Allen, M.R. Wilson, I. Neelov, N.K. Allsopp, Macromolecules 31 (1998) 4626.
[34] C. McBride, M.R. Wilson, Mol. Phys. 97 (1999) 511.
[35] M.R. Wilson, in: Structure and Bonding 94, Springer, Berlin, 1999, p. 41.
[36] M.R. Wilson, M.J. Cook, C. McBride, in: Advances in the Computer Simulations of Liquid Crystals, P. Pasini, C. Zannoni, M.R. Wilson (Eds.), Kluwer, Dordrecht, 2000, p. 251.
[37] J.P. Ryckaert, A. Bellemans, Chem. Phys. Lett. 30 (1975) 123.
[38] L. Verlet, Phys. Rev. 159 (1967) 98.
[39] D. Fincham, CCP5 Quarterly 12 (1984) 47.