Introduction to Particle Methods for the Simulation of Continuous Systems
Ivo F. Sbalzarini∗
April 27, 2007

Abstract
We introduce Lagrangian particle methods and hybrid particle-mesh methods to discretize the governing PDE of continuously modeled systems. After presenting particle-based discretizations for continuous functions and differential operators, we consider two different strategies: pure particle methods and hybrid particle-mesh methods. The latter reduce the algorithmic complexity of long-range operators to linear time. For local interactions, we discuss cell lists and Verlet lists as efficient neighbor finding algorithms.
1 Introduction to continuum particle methods
Continuum particle methods are based on the approximation of smooth functions by integrals that are discretized onto computational elements called particles. A particle $p$ occupies a certain position $x_p$ and carries an extensive physical quantity $\omega_p$, referred to as its strength. The particle attributes – strength and location – evolve so as to satisfy the underlying governing equation in a Lagrangian frame of reference [10]. The simulation of the physical system thus amounts to tracking the dynamics of all $N$ computational particles that carry the physical properties of the system being simulated. The dynamics of the particles are governed by sets of Ordinary Differential Equations (ODE) that determine the trajectories of the particles $p$ and the evolution of their properties $\omega$, thus:

$$
\begin{aligned}
\frac{dx_p}{dt} &= v(x_p, t) = \sum_{q=1}^{N} K(x_p, x_q; \omega_p, \omega_q), & p &= 1, \dots, N \\
\frac{d\omega_p}{dt} &= \sum_{q=1}^{N} F(x_p, x_q; \omega_p, \omega_q), & p &= 1, \dots, N,
\end{aligned}
\tag{1}
$$
where $v_p = v(x_p, t)$ is the velocity of particle $p$. The dynamics of the simulated system are completely defined by the functions $K$ and $F$ that represent the physics of the problem. In pure particle methods, $K$ and $F$ emerge from integral approximations of differential operators; in hybrid particle-mesh methods, they entail solutions of field equations that are discretized on a superimposed mesh. The sums on the right-hand side of the above equations correspond to quadrature (numerical integration) of integrals with kernels $K$ and $F$.

∗ Computational Biophysics Lab, Institute of Computational Science, ETH Zürich (http://www.cbl.ethz.ch). E-mail: [email protected]. Written during a visiting Professorship at Ecole Normale Supérieure, Paris, France.

In order to situate this in the bigger picture of numerical methods, consider the different solution strategies as outlined in Fig. 1. One way consists of discretizing the equation for the intensive property $u$ onto a computational mesh and then solving the resulting system of ODEs numerically. The discretization needs to be done consistently in order to ensure that the discretized equations describe the same physical problem as the original PDE, and the numerical solution of the resulting ODE is subject to stability criteria. An alternative route solves the PDE analytically using Green's function¹. The resulting integral constitutes an extensive quantity, which is then discretized and computed by a quadrature. The advantages of this procedure are that the integral solution is always consistent (even analytically exact), and that numerical quadrature is always stable. The only property that remains to be concerned about is the solution accuracy. The price we pay for these advantages is that the quadrature corresponds to an $N$-body problem, making it of potentially $O(N^2)$ complexity to solve. It is this high computational cost that long prevented the use of particle methods in computational science. Fortunately, this can be circumvented: if the functions $K$ and $F$ are local, the algorithmic complexity of the sums in Eq. (1) naturally reduces to $O(N)$ by means of an interaction cutoff. For long-range interactions such as electrostatics, fast algorithms such as multipole expansions [7] are available to reduce the complexity to $O(N)$ in these cases as well. The issue of efficient parallel implementation of particle methods has also been successfully addressed [13], making them a viable alternative to grid-based methods. The first solution strategy is sometimes referred to as the “intensive method” and the second as the “extensive method”.

Figure 1: Strategies to numerically solve a differential equation: (1) discretization of the differential equation followed by numerical solution of the discretized equations for the intensive property $u$, or (2) integral solution for the extensive property that is numerically approximated by quadrature. [Diagram: the governing equation $du/dt = f$ leads either via route (1) to the discretized equation $(u_n - u_{n-1})/\delta t = f_n$ (discretization error; consistency) and the discrete solution $u_n = u_{n-1} + \delta t\, f_n$ (accuracy; stability), or via route (2) to the exact analytic solution $u = \int f(t)\, dt$ and its quadrature $u = \sum_i w_i f(x_i)$ (accuracy; quadrature error).]
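The direct evaluation of Eq. (1) can be sketched in a few lines. The following is a minimal illustration in one dimension, assuming forward-Euler time stepping (the text does not prescribe a time integrator) and two hypothetical kernels, `no_motion` and `exchange`, that are not taken from the text. It makes the $O(N^2)$ cost of the unaccelerated sums explicit.

```python
import math
from typing import Callable, List, Tuple

# Pairwise function signature: (x_p, x_q, w_p, w_q) -> contribution to the sum.
PairFn = Callable[[float, float, float, float], float]


def euler_step(x: List[float], w: List[float], K: PairFn, F: PairFn,
               dt: float) -> Tuple[List[float], List[float]]:
    """Advance 1D positions x and strengths w by one forward-Euler step of Eq. (1)."""
    n = len(x)
    # Direct quadrature: each of the N particles sums over all N particles,
    # which is the O(N^2) cost discussed in the text.
    dxdt = [sum(K(x[p], x[q], w[p], w[q]) for q in range(n)) for p in range(n)]
    dwdt = [sum(F(x[p], x[q], w[p], w[q]) for q in range(n)) for p in range(n)]
    x_new = [x[p] + dt * dxdt[p] for p in range(n)]
    w_new = [w[p] + dt * dwdt[p] for p in range(n)]
    return x_new, w_new


# Hypothetical kernels, chosen only for illustration.
def no_motion(xp: float, xq: float, wp: float, wq: float) -> float:
    return 0.0  # particles stay where they are


def exchange(xp: float, xq: float, wp: float, wq: float) -> float:
    # Antisymmetric under swapping p and q, so the total strength is conserved.
    return (wq - wp) * math.exp(-(xp - xq) ** 2)
```

Calling `euler_step([0.0, 0.5, 1.0], [1.0, 0.0, 0.0], no_motion, exchange, 0.1)` spreads strength from the first particle to the others while leaving the positions and the total strength unchanged.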
1.1 Function approximation by particles
The approximation of a continuous function $u(x): \mathbb{R}^d \to \mathbb{R}$ by particles can be developed in three steps:

• Step 1: Integral Representation. Using the Dirac δ-function identity, the function $u$ can be expressed in integral form as

$$ u(x) = \int u(y)\, \delta(y - x)\, dy \quad \text{for } x, y \in \Omega. \tag{2} $$

In point particle methods, this integral is directly discretized on the set of particles using a quadrature rule with the particle locations as quadrature points. Such a discretization does not, however, enable the recovery of the function values at locations other than those occupied by the particles.

• Step 2: Integral Mollification. Smooth particle methods circumvent this difficulty by regularizing the δ-function by a mollification kernel $\zeta_\epsilon = \epsilon^{-d} \zeta(x/\epsilon)$, with $\lim_{\epsilon \to 0} \zeta_\epsilon = \delta$, that conserves the first $r - 1$ moments of the δ-function identity (see Ref. [3] for details). The kernel $\zeta_\epsilon$ can be thought of as a cloud or blob of mass, centered at the particle location, as illustrated in Fig. 2. The core size $\epsilon$ defines the characteristic width of the kernel and thus the spatial resolution of the method. The regularized function approximation is defined as

$$ u_\epsilon(x) = \int u(y)\, \zeta_\epsilon(y - x)\, dy \tag{3} $$

and can be used to recover the function values at arbitrary locations $x$. The approximation error is of order $\epsilon^r$, hence

$$ u_\epsilon(x) = u(x) + O(\epsilon^r), \tag{4} $$

where $r$ depends on the vanishing moments of the mollification kernel [3, 10]. For positive symmetric kernels, such as a Gaussian, $r = 2$ [3].

Figure 2: Two particles of strengths $\omega_1$ and $\omega_2$, carrying mollification kernels $\zeta_\epsilon$.

• Step 3: Mollified Integral Discretization. The regularized integral is discretized over $N$ particles using the quadrature rule

$$ u_\epsilon^h(x) = \sum_{p=1}^{N} \omega_p^h\, \zeta_\epsilon(x_p^h - x), \tag{5} $$

where $x_p^h$ and $\omega_p^h$ are the numerical solutions of the particle positions and strengths, determined by discretizing the ODEs in Eq. (1) in time. The strength $\omega_p$ of particle $p$ is an extensive property that depends on the particular quadrature rule. The most frequent choice is to use the rectangular rule, thus setting $\omega_p = u(x_p) V_p$, where $V_p$ is the volume of particle $p$. Using this discretization, we obtain the function approximation

$$ u_\epsilon^h(x) = u_\epsilon(x) + O\!\left(\left(\frac{h}{\epsilon}\right)^{s}\right) = u(x) + O(\epsilon^r) + O\!\left(\left(\frac{h}{\epsilon}\right)^{s}\right), \tag{6} $$

where $s$ depends on the number of continuous derivatives of the mollification kernel $\zeta_\epsilon$ [3, 10], and $h$ is the inter-particle distance. For a Gaussian, $s \to \infty$.

¹ Note that Green's function always exists, even though it is not explicitly known in most cases.
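The three steps above can be tried out numerically. The sketch below is an illustration under stated assumptions, not a reference implementation: it works in one dimension, uses a Gaussian mollification kernel (so $r = 2$), places particles on a regular lattice of spacing $h$, and sets $\omega_p = u(x_p)\,h$ by the rectangular rule. The function names are hypothetical.

```python
import math
from typing import Callable, List, Tuple


def gaussian_kernel(x: float, eps: float) -> float:
    """1D Gaussian mollification kernel zeta_eps(x) = zeta(x / eps) / eps."""
    return math.exp(-0.5 * (x / eps) ** 2) / (eps * math.sqrt(2.0 * math.pi))


def make_particles(u: Callable[[float], float], x_min: float, x_max: float,
                   h: float) -> Tuple[List[float], List[float]]:
    """Place particles with spacing h and strengths omega_p = u(x_p) * V_p, V_p = h."""
    n = int(round((x_max - x_min) / h)) + 1
    xs = [x_min + p * h for p in range(n)]
    ws = [u(xp) * h for xp in xs]
    return xs, ws


def approximate(x: float, xs: List[float], ws: List[float], eps: float) -> float:
    """Evaluate the particle function approximation of Eq. (5) at location x."""
    return sum(wp * gaussian_kernel(xp - x, eps) for xp, wp in zip(xs, ws))


# Usage: approximate u(x) = exp(-x^2) with overlapping particles (h / eps = 0.5).
u = lambda x: math.exp(-x * x)
xs, ws = make_particles(u, -6.0, 6.0, h=0.05)
value = approximate(0.03, xs, ws, eps=0.1)  # 0.03 is not a particle location
```

With these parameters the value differs from $u(0.03)$ by roughly $\epsilon^2$, which is the mollification error of Eq. (4); the quadrature error is negligible because the particles overlap.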
From the approximation error in Eq. (6), we see that it is imperative that the distance $h$ between any two particles always be less than their mollified support $\epsilon$, thus maintaining

$$ \frac{h}{\epsilon} < 1. \tag{7} $$

[…]

where $s = |x|/h$. Here, $h$ denotes the mesh spacing and $x$ is the distance of the particle from the respective mesh node, as illustrated in Fig. 3. For each particle-node pair we thus compute one weight $0 \le W_{ip} \le 1$, and the fraction $\omega_i^m = W_{ip}\, \omega_p$ of the strength of particle $p$ is attributed to mesh node $i$. The M4′ kernel is third-order accurate, exactly conserving moments up to and including the second moment. In higher dimensions, the kernels are tensorial products of this one-dimensional kernel. Their values can thus be computed independently and then multiplied to form the final interpolation weight for a given particle and mesh node: $W(x, y, z) = W_x(x)\, W_y(y)\, W_z(z)$.

Meshes can serve two different purposes in hybrid PM methods: they allow the fast evaluation of long-range interactions through the corresponding differential operators, and they can be used to reinitialize the particle locations to regular positions in order to maintain the overlap condition of Eq. (7).
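Particle-to-mesh interpolation with such a kernel can be sketched as follows. The piecewise form of `m4_prime` is an assumption: it is the M4′ kernel as commonly given in the literature following Monaghan [11], since the defining equation is not reproduced here. The mesh layout (nodes at $i\,h$, $i = 0, \dots, n-1$) and the function names are likewise hypothetical.

```python
import math
from typing import List


def m4_prime(s: float) -> float:
    """Assumed standard M4' kernel as a function of s = |x| / h (support |s| <= 2)."""
    s = abs(s)
    if s <= 1.0:
        return 1.0 - 2.5 * s ** 2 + 1.5 * s ** 3
    if s <= 2.0:
        return 0.5 * (2.0 - s) ** 2 * (1.0 - s)
    return 0.0


def particle_to_mesh(xs: List[float], ws: List[float], n_nodes: int,
                     h: float) -> List[float]:
    """Distribute particle strengths onto a 1D mesh with nodes at i * h."""
    mesh = [0.0] * n_nodes
    for xp, wp in zip(xs, ws):
        i0 = math.floor(xp / h)          # node to the left of the particle
        for i in range(i0 - 1, i0 + 3):  # the four nodes inside the kernel support
            if 0 <= i < n_nodes:
                mesh[i] += m4_prime((xp - i * h) / h) * wp
    return mesh


def weight_3d(dx: float, dy: float, dz: float, h: float) -> float:
    """Tensor-product weight W(x, y, z) = Wx(x) * Wy(y) * Wz(z)."""
    return m4_prime(dx / h) * m4_prime(dy / h) * m4_prime(dz / h)
```

For particles that lie at least two mesh spacings away from the mesh boundary, the sum of the node values equals the sum of the particle strengths, and the first moment is preserved as well.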
Figure 3: Particle-to-mesh interpolation in one dimension. The interpolation weight is computed from the mesh spacing $h$ and the distance $x$ between the particle and the mesh node. For each particle-node pair a different weight is computed. The particle strength is then multiplied by the weight and assigned to the mesh node.
In the first application, the long-range components of the differential operator need to be solved for on the mesh. In many physical applications, the fields to be discretized are gradient fields, such that the corresponding long-range differential operator is given by the Laplace operator. The most common equation to be solved on the mesh is thus the Poisson equation. This can be done efficiently using finite differences [14], most efficiently implemented as a multigrid algorithm [15], or using fast Poisson solvers that are based on Fast Fourier Transforms.

Reinitialization using a mesh is needed if particles tend to accumulate in certain areas of the computational domain and to thin out in others. In such cases, the function approximation will cease to be consistent as soon as the condition of Eq. (7) is violated in the sparse regions of space. As a remedy, the particle positions are periodically reset to regular locations. This is conveniently done by interpolating the particle properties to the nodes of a regular mesh as outlined above, discarding the present set of particles, and generating new particles at the locations of the mesh nodes. This procedure is called remeshing.
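As a small illustration of solving the Poisson equation on the mesh with finite differences, the sketch below treats the one-dimensional problem $-u'' = f$ with homogeneous Dirichlet boundaries. The sign convention, the boundary conditions, and the use of a direct tridiagonal (Thomas) solve instead of multigrid or FFT are all assumptions made to keep the example short.

```python
import math
from typing import List


def solve_poisson_1d(f: List[float], h: float) -> List[float]:
    """Solve -u'' = f at the interior mesh nodes, with u = 0 on both boundaries.

    The second-order central difference (-u[i-1] + 2 u[i] - u[i+1]) / h^2 = f[i]
    yields a tridiagonal system, solved here in O(M) by the Thomas algorithm.
    """
    m = len(f)
    a, b, c = -1.0, 2.0, -1.0  # sub-, main- and super-diagonal entries
    c_prime = [0.0] * m
    d_prime = [0.0] * m
    c_prime[0] = c / b
    d_prime[0] = h * h * f[0] / b
    for i in range(1, m):  # forward elimination
        denom = b - a * c_prime[i - 1]
        c_prime[i] = c / denom
        d_prime[i] = (h * h * f[i] - a * d_prime[i - 1]) / denom
    u = [0.0] * m
    u[-1] = d_prime[-1]
    for i in range(m - 2, -1, -1):  # back substitution
        u[i] = d_prime[i] - c_prime[i] * u[i + 1]
    return u


# Usage: f = pi^2 sin(pi x) on (0, 1) has the exact solution u = sin(pi x).
h = 0.01
nodes = [(i + 1) * h for i in range(99)]
u_mesh = solve_poisson_1d([math.pi ** 2 * math.sin(math.pi * x) for x in nodes], h)
```

In a hybrid particle-mesh setting, the right-hand side `f` would be the field obtained by interpolating particle strengths onto the mesh.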
2 Implementation notes

2.1 Neighbor lists for short-range interactions
The evaluation of Particle-Particle (PP) interactions is a key component of particle methods and PM algorithms. Sub-grid scale phenomena can require local particle-based corrections, differential operators can be evaluated on irregular locations [6], or the main dynamics of the system can be governed by particle interactions. Short-range interactions are characterized by a rapidly decaying interaction potential (kernel), such that only nearby neighbors contribute significantly to the right-hand side of Eq. (1) for a given particle. Typically, this is exploited by only considering interactions within a certain cutoff radius $r_c$ around each particle. The specific value of $r_c$ depends on the interaction law, i.e., the kernel functions $K$ and $F$ in Eq. (1), and has to be chosen to meet the desired simulation accuracy. The most conservative choice of $r_c$ is given by the radius where the interaction contributions fall below the machine epsilon of the computer. Considering only interactions within an $r_c$-neighborhood naturally reduces the algorithmic complexity of the PP evaluation from $O(N^2)$ to $O(N)$, with a pre-factor that depends on the value of $r_c$ and the local particle density. This requires, however, that the set of neighbors to interact with is known or can be determined with at most $O(N)$ cost. Searching over all other particles to find the neighbors would obviously annihilate all benefits.

Another point of possible optimization concerns the symmetry of the PP interaction. By construction (kernel-based interactions), the effect of particle $i$ on particle $j$ is the same (with a possible sign inversion) as the effect of particle $j$ on $i$. Looping over all particles and computing the interactions with all neighbors within the cutoff radius thus amounts to computing every interaction twice. The computational cost can be reduced by a factor of (at most) two if interactions are evaluated symmetrically. This means that we only loop over half of the neighbors and attribute the interaction contributions to both participating particles at once. But how can we make sure that all interactions are considered exactly once? We will see that shortly, but first we need to introduce the algorithm that is used for neighbor search.

Figure 4: Cell-cell interactions. (a) For non-symmetric particle-particle interactions, all interactions are one-sided. (b) In traditional symmetric cell list algorithms, interactions are required on all but one boundary. (c) Introducing diagonal interactions (1–3), the cell layers also become symmetric and do not overlap with any neighboring layers. This results in less communication, better scaling in memory and simpler algorithms (e.g. when considering connected particles). The two-dimensional case is depicted. See text for interactions in the three-dimensional case.

Two standard methods are available to find the interaction partners in $O(N)$ time: cell lists and Verlet lists. In cell lists, particles are sorted into equisized cuboidal cells, whose size reflects the interaction cutoff. Each cell contains a (linked) list of the particles that reside inside it. Interactions are then easily computed by sweeping through the lists. This has to be done for the cell in which the center particle lies as well as for the immediately adjacent cells in order to ensure that all possible interactions within the $r_c$ cutoff are considered. If the interactions are computed asymmetrically, all adjacent cells around the center cell have to be considered. For symmetric interactions, it is sufficient to loop over half of the neighboring cells. Figure 4 illustrates the cell-cell interactions in the asymmetric and symmetric cases. In Fig. 4c, diagonal interactions are introduced in order to further reduce the memory overhead and communication for symmetrically evaluated particle interactions by 33% in two dimensions and 40% in three dimensions [13]. In parallel implementations, the diagonal interaction scheme moreover has the advantage of constant communication overhead with increasing number of processors, amounting to a parallel shift in the speedup plot rather than a convex curve.
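The cell list idea can be sketched for the asymmetric case in two dimensions. This is a minimal illustration with hypothetical function names: it uses a dictionary of Python lists instead of linked lists, and a cell edge length equal to $r_c$, so that all partners of a particle lie in its own cell or in one of the eight adjacent cells.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]


def dist2(a: Point, b: Point) -> float:
    """Squared Euclidean distance between two 2D points."""
    return (a[0] - b[0]) ** 2 + (a[1] - b[1]) ** 2


def build_cell_list(points: List[Point], rc: float) -> Dict[Tuple[int, int], List[int]]:
    """Sort particle indices into square cells of edge length rc."""
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // rc), int(y // rc))].append(idx)
    return cells


def neighbors_within_cutoff(points: List[Point], rc: float) -> List[List[int]]:
    """For every particle, list all other particles closer than rc (asymmetric)."""
    cells = build_cell_list(points, rc)
    result: List[List[int]] = [[] for _ in points]
    for (cx, cy), members in cells.items():
        for dx in (-1, 0, 1):          # sweep the cell itself and its 8 neighbors
            for dy in (-1, 0, 1):
                other = cells.get((cx + dx, cy + dy))  # .get() adds no empty cells
                if other is None:
                    continue
                for p in members:
                    for q in other:
                        if p != q and dist2(points[p], points[q]) < rc * rc:
                            result[p].append(q)
    return result
```

Each particle is only compared against the occupants of nine cells, so for a bounded particle density the total work grows linearly with the number of particles.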
Given that the cells are numbered in ascending x, y, (z), starting from the center cell with number 0, the symmetric cell-cell interactions are: 0–0, 0–1, 0–3, 0–4, and 1–3 in two dimensions, and 0–0, 0–1, 0–3, 0–4, 0–9, 0–10, 0–12, 0–13, 1–3, 1–9, 1–12, 3–9, 3–10, and 4–9 in three dimensions.

For spherically symmetric interactions, cell lists contain up to $27/(4\pi/3) = 81/(4\pi) \approx 6$ times more particles than actually needed. Verlet lists [16] are available to reduce this overhead. For each particle, they store an explicit list of all other particles it has to interact with. The radius of the Verlet sphere is usually chosen to be the interaction cutoff plus a certain safety margin (skin). The lists need to be rebuilt as soon as any particle has moved farther than this safety margin. In three dimensions, interactions using Verlet lists are at most $81/\left(4\pi(1 + \mathrm{skin})^3\right)$ times faster than cell list interactions. In order to ensure overall $O(N)$ scaling, Verlet lists are constructed using intermediate cell lists. Memory requirements usually limit the application of Verlet lists.
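The construction of a symmetric Verlet list from an intermediate cell list can be sketched in two dimensions as follows. The sketch assumes that cell number $k$ in the $3 \times 3$ neighborhood corresponds to the offset $(d_x, d_y)$ with $k = d_x + 3 d_y$, which is one reading of the numbering described above; the function names are hypothetical. Because the pair 1–3 involves two cells that are both different from the center cell, the sweep also visits one extra layer of (possibly empty) center cells.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float]

# Assumed numbering k = dx + 3 * dy: 0 = (0, 0), 1 = (1, 0), 3 = (0, 1), 4 = (1, 1).
OFFSET = {0: (0, 0), 1: (1, 0), 3: (0, 1), 4: (1, 1)}
CELL_PAIRS = [(0, 0), (0, 1), (0, 3), (0, 4), (1, 3)]  # 2D symmetric interactions


def build_verlet_pairs(points: List[Point], rc: float, skin: float) -> List[Tuple[int, int]]:
    """Return every particle pair closer than rc + skin exactly once."""
    if not points:
        return []
    r_list = rc + skin  # radius of the Verlet sphere
    cells: Dict[Tuple[int, int], List[int]] = defaultdict(list)
    for idx, (x, y) in enumerate(points):
        cells[(int(x // r_list), int(y // r_list))].append(idx)
    cxs = [c[0] for c in cells]
    cys = [c[1] for c in cells]
    pairs: List[Tuple[int, int]] = []
    # Start one layer below the occupied cells so that the diagonal pair 1-3
    # is also found when the center cell itself is empty.
    for cx in range(min(cxs) - 1, max(cxs) + 1):
        for cy in range(min(cys) - 1, max(cys) + 1):
            for a, b in CELL_PAIRS:
                cell_a = cells.get((cx + OFFSET[a][0], cy + OFFSET[a][1]), [])
                cell_b = cells.get((cx + OFFSET[b][0], cy + OFFSET[b][1]), [])
                for i, p in enumerate(cell_a):
                    # Inside one cell, start after p so no pair is listed twice.
                    partners = cell_a[i + 1:] if a == b else cell_b
                    for q in partners:
                        dx = points[p][0] - points[q][0]
                        dy = points[p][1] - points[q][1]
                        if dx * dx + dy * dy < r_list * r_list:
                            pairs.append((min(p, q), max(p, q)))
    return pairs
```

The resulting pair list can be reused over several time steps; each pair then only needs a distance check against $r_c$ before its interaction is attributed to both particles.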
2.2 Fast Multipoles for long-range interactions
Fast Multipole Methods (FMM) are algorithms that reduce the algorithmic complexity of long-range particle-particle interactions from $O(N^2)$ to $O(N)$, albeit with a large pre-factor. This large pre-factor causes FMM implementations to be several orders of magnitude slower than the corresponding hybrid particle-mesh method. Nonetheless, FMM methods are very appealing from a conceptual point of view, so we include a short description hereafter. This chapter is taken from the excellent Semester Thesis by Bettina Polasek [12].

2.2.1 The Barnes-Hut Algorithm
Josh Barnes and Piet Hut [1] introduced a hierarchical $O(N \log N)$ force-calculation algorithm to improve the efficiency of gravitational $N$-body problems. The domain is divided into regular cells, each having half the size (length, breadth and width) of its parent cell. The resulting structure is an oct-tree in three dimensions. These cells store information about the center of mass and the total mass of their particles. The tree is then traversed for each target point. The decision whether a cell is far enough away to be used as an approximation, or whether it needs to be resolved further into its children, is made using the criterion

$$ \frac{d}{D} < \theta, \tag{13} $$

where $d$ is the diagonal of the cell currently being processed, $D$ is the distance from the cell's center of mass to the target point, and $\theta$ is a fixed accuracy parameter $\sim 1$.

2.2.2 The Greengard-Rokhlin Formulation
Leslie Greengard and Vladimir Rokhlin presented several papers about the Fast Multipole Method [2]. Their formulation using spherical harmonics has found wide acceptance and is therefore also used in this implementation. When cell-cell interactions are used, the complexity of the algorithm is further reduced to $O(N)$.

2.2.3 Definition of the Potential
In order to introduce the subject of multipole expansion, we consider a unit source at a point $Q(x')$ (Fig. 5). This unit source induces a potential at point $P(x)$ given by:

$$ \Psi(P; Q) = \Psi(x; x') = \frac{1}{|x - x'|}, \tag{14} $$

where the spherical coordinates of $x$ and $x'$ are given by $(r, \theta, \phi)$ and $(\rho, \alpha, \beta)$, respectively. The distance between the two points is denoted by $R$ and the angle between vectors $x$ and $x'$ is denoted by $\gamma$. For $\mu = \rho/r \le 1$, we use the generating formula for Legendre polynomials $P_n$, so that the potential is expressed as:

$$ \Psi(P; Q) = \sum_{n=0}^{\infty} \frac{\rho^n}{r^{n+1}}\, P_n(u), \qquad u = \cos\gamma. \tag{15} $$
Figure 5: Coordinate definition for multipole expansion.

This equation describes the far-field potential at a point $P$, due to a charge of unit strength centered at $Q$. To obtain a computationally tractable formulation, we proceed to express the Legendre functions in terms of spherical harmonics:

$$ P_n(\cos\gamma) = \sum_{m=-n}^{n} Y_n^{-m}(\alpha, \beta)\, Y_n^{m}(\theta, \phi) \tag{16} $$
and the spherical harmonics in terms of the associated Legendre polynomials:

$$ Y_n^m(\theta, \phi) = \sqrt{\frac{(n - |m|)!}{(n + |m|)!}}\; P_n^{|m|}(\cos\theta)\, e^{i m \phi}. \tag{17} $$

The following numerically stable formulas are used for calculations:

$$ (n - m)\, P_n^m(u) = (2n - 1)\, u\, P_{n-1}^m(u) - (n + m - 1)\, P_{n-2}^m(u) \tag{18} $$

and

$$ P_m^m(u) = (-1)^m\, (2m - 1)!!\, (1 - u^2)^{m/2}, \tag{19} $$

where $(2m - 1)!!$ denotes the double factorial.
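The recurrences of Eqs. (18) and (19) translate directly into code. The following sketch (function names are hypothetical) evaluates $P_n^m(u)$ for $0 \le m \le n$, using the double factorial in Eq. (19) and the fact that Eq. (18) with $n = m + 1$ gives $P_{m+1}^m(u) = (2m + 1)\, u\, P_m^m(u)$. It then checks the truncated series of Eq. (15) against the exact potential $1/R$; the truncation order `n_max` is an assumption of the example.

```python
import math


def assoc_legendre(n: int, m: int, u: float) -> float:
    """Associated Legendre function P_n^m(u) for 0 <= m <= n and |u| <= 1."""
    # Eq. (19): P_m^m(u) = (-1)^m (2m - 1)!! (1 - u^2)^(m/2)
    root = math.sqrt(max(0.0, 1.0 - u * u))
    p_mm = 1.0
    for k in range(1, m + 1):
        p_mm *= -(2 * k - 1) * root
    if n == m:
        return p_mm
    # Eq. (18) with n = m + 1, where P_{m-1}^m vanishes.
    p_prev, p_curr = p_mm, (2 * m + 1) * u * p_mm
    # Eq. (18) upward in the degree k = m + 2, ..., n.
    for k in range(m + 2, n + 1):
        p_prev, p_curr = p_curr, ((2 * k - 1) * u * p_curr
                                  - (k + m - 1) * p_prev) / (k - m)
    return p_curr


def unit_source_series(r: float, rho: float, cos_gamma: float, n_max: int) -> float:
    """Truncated Legendre series of Eq. (15) for a unit source, valid for rho < r."""
    return sum(rho ** n / r ** (n + 1) * assoc_legendre(n, 0, cos_gamma)
               for n in range(n_max + 1))


# Usage: compare with 1 / R, where R^2 = r^2 + rho^2 - 2 r rho cos(gamma).
r, rho, cos_gamma = 2.0, 0.5, 0.3
series = unit_source_series(r, rho, cos_gamma, n_max=30)
exact = 1.0 / math.sqrt(r * r + rho * rho - 2.0 * r * rho * cos_gamma)
```

The truncation error decays like $(\rho/r)^{n_{\max}+1}$, which is why well-separated sources need only a few terms.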
In summary, the far-field representation of the potential induced by a collection of $N_v$ sources centered around $Q$, with coordinates $(\rho_i, \alpha_i, \beta_i)$, is expressed as:

$$ \Psi(P; \{q_i\}) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \frac{M_n^m}{r^{n+1}}\, Y_n^m(\theta, \phi), \tag{20} $$

where

$$ M_n^m = \sum_{i=1}^{N_v} q_i\, \rho_i^n\, Y_n^{-m}(\alpha_i, \beta_i). \tag{21} $$
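Equations (17), (20) and (21) can be combined into a small numerical check. The sketch below is an illustration under assumptions: the expansion center is the origin, the infinite sum over $n$ is truncated at a user-chosen `order`, and all function names are hypothetical. It computes the moments $M_n^m$ of a few sources and compares the resulting far-field potential with the direct sum $\sum_i q_i / |x - x_i|$.

```python
import cmath
import math
from typing import Dict, List, Tuple

Vec = Tuple[float, float, float]


def assoc_legendre(n: int, m: int, u: float) -> float:
    """P_n^m(u), 0 <= m <= n, from Eqs. (18)-(19) with the double factorial."""
    root = math.sqrt(max(0.0, 1.0 - u * u))
    p_mm = 1.0
    for k in range(1, m + 1):
        p_mm *= -(2 * k - 1) * root
    if n == m:
        return p_mm
    p_prev, p_curr = p_mm, (2 * m + 1) * u * p_mm
    for k in range(m + 2, n + 1):
        p_prev, p_curr = p_curr, ((2 * k - 1) * u * p_curr
                                  - (k + m - 1) * p_prev) / (k - m)
    return p_curr


def sph_harmonic(n: int, m: int, theta: float, phi: float) -> complex:
    """Spherical harmonic Y_n^m with the normalization of Eq. (17)."""
    am = abs(m)
    norm = math.sqrt(math.factorial(n - am) / math.factorial(n + am))
    return norm * assoc_legendre(n, am, math.cos(theta)) * cmath.exp(1j * m * phi)


def to_spherical(x: float, y: float, z: float) -> Tuple[float, float, float]:
    """Cartesian to spherical coordinates (radius, polar angle, azimuth)."""
    r = math.sqrt(x * x + y * y + z * z)
    if r == 0.0:
        return 0.0, 0.0, 0.0
    return r, math.acos(max(-1.0, min(1.0, z / r))), math.atan2(y, x)


def multipole_moments(charges: List[float], positions: List[Vec],
                      order: int) -> Dict[Tuple[int, int], complex]:
    """Moments M_n^m of Eq. (21) about the origin, for n = 0, ..., order."""
    sph = [to_spherical(*pos) for pos in positions]
    return {(n, m): sum(q * rho ** n * sph_harmonic(n, -m, alpha, beta)
                        for q, (rho, alpha, beta) in zip(charges, sph))
            for n in range(order + 1) for m in range(-n, n + 1)}


def far_field_potential(moments: Dict[Tuple[int, int], complex], order: int,
                        point: Vec) -> float:
    """Truncated far-field potential of Eq. (20) at a point outside the sources."""
    r, theta, phi = to_spherical(*point)
    total = sum(moments[(n, m)] * sph_harmonic(n, m, theta, phi) / r ** (n + 1)
                for n in range(order + 1) for m in range(-n, n + 1))
    return total.real  # the imaginary parts cancel between +m and -m
```

Once the moments are known, the cost of evaluating the potential at a far-away point no longer depends on the number of sources, only on the expansion order.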
2.2.4 Translation of Multipole Expansions
Once the multipole expansions due to a collection of sources have been computed, one is usually interested in computing the far-field coefficients of the same collection expanded about some other point, say $S$, so that the potential would be represented as:

$$ \Psi(S; P) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} \frac{L_n^m}{\sigma^{n+1}}\, Y_n^m(\Theta, \Phi), \tag{22} $$

where $(\sigma, \Theta, \Phi)$ are the spherical coordinates of the distance between points $P$ and $S$. This defines the translation problem for multipole expansions in fast multipole methods. We make use of the following definitions of harmonic outer functions $O_n^m$ and inner functions $I_n^m$:

$$ O_n^m(x) = O_n^m(r, \theta, \phi) = \frac{(-1)^n\, i^{|m|}}{A_n^m}\, \frac{Y_n^m(\theta, \phi)}{r^{n+1}} \tag{23} $$

$$ I_n^m(x) = I_n^m(r, \theta, \phi) = i^{-|m|}\, A_n^m\, r^n\, Y_n^m(\theta, \phi), \tag{24} $$

where

$$ A_n^m = \frac{(-1)^n}{\sqrt{(n - m)!\,(n + m)!}}. \tag{25} $$
More specifically, for $|x| \ge |x'|$ we obtain:

$$ O_n^m(x - x') = \sum_{n'=0}^{\infty} \sum_{m'=-n'}^{n'} (-1)^{n'}\, I_{n'}^{-m'}(x')\, O_{n+n'}^{m+m'}(x) \tag{26} $$

and for the inner expansion we get:

$$ I_n^m(x - x') = \sum_{n'=0}^{\infty} \sum_{m'=-n'}^{n'} (-1)^{n'}\, I_{n'}^{-m'}(x')\, I_{n-n'}^{m-m'}(x), \tag{27} $$

so that we may now express the potential induced at point $x$ by a unit source at point $x'$ as:

$$ \frac{1}{|x - x'|} = O_0^0(x - x'). \tag{28} $$
In order to further illustrate the formulation of these translation operators, we consider the configuration shown in Fig. 6. We wish to determine the potential induced by a collection of sources within a sphere centered at $x_0$ and having radius $R_0$ (denoted as $s(x_0, R_0)$) on a collection of points/sources within a sphere $s(x_3, R_3)$. This is achieved in the following steps:

(i) We compute a set of multipole expansion coefficients $C_n^m$ for the far-field representation of a set of sources distributed within $s(x_0, R_0)$. The far-field representation of the field induced by this cluster of particles at a location $x$ is then given by:

$$ \Psi(x) = \sum_{n=0}^{\infty} \sum_{m=-n}^{n} C_n^m\, O_n^{-m}(x - x_0), \tag{29} $$

where

$$ C_n^m = \sum_{i=1}^{N_v} q_i\, \rho_i^n\, Y_n^m(\alpha_i, \beta_i)\, (-1)^{-n}\, i^{-|m|}\, A_n^m. \tag{30} $$

(ii) We translate the Outer expansion about $x_0$ to an Outer expansion about $x_1$ (child to parent):

$$ \Psi(x) = \sum_{l=0}^{\infty} \sum_{j=-l}^{l} D_l^j\, O_l^{-j}(x - x_1), \tag{31} $$

where

$$ D_l^j = \sum_{n=0}^{l}\; \sum_{m=\max(j+n-l,\,-n)}^{\min(j+l-n,\,n)} C_n^m\, I_{l-n}^{j-m}(x_1 - x_0). \tag{32} $$

(iii) Once the coefficients of the multipole expansions have been computed in the sphere $s(x_3, R_3)$, we perform a local expansion using Eq. (22) to compute the potential at the individual points.

Figure 6: Sketch for the translation of the multipole expansion.

2.2.5 Wang's Variable Order Revised Binary Treecode
Wang [17] proposes to use a fixed expansion order $p_{\max}$ to calculate the multipole expansion coefficients for all source boxes, since this step accounts for only a small part of the total CPU time. A variable expansion order, in contrast, should be used to calculate the target values, which is the bottleneck of the tree code. The expansion order is given in terms of the separation distance $D$ between a field leaf and a source box and the diagonal length $d$ of the source box:

$$ \frac{D}{d} \ge \theta(p), \tag{33} $$

where $\theta(p)$ depends on the required accuracy and the expansion order $p$. It can, however, be approximated using

$$ \theta(p) = 0.75 + 0.2\,(p_{\max} - p) + 0.05\,(p_{\max} - p)^2 \quad \text{for } p = 4, \dots, p_{\max}. \tag{34} $$

We use a fixed expansion order in this implementation. Furthermore, Wang proposes to use an adaptive binary tree rather than an oct-tree.
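The order selection implied by Eqs. (33) and (34) can be sketched as follows. The selection rule used here, picking the smallest order $p$ whose threshold $\theta(p)$ is satisfied and signalling that the source box must be subdivided if none is, is one plausible reading of the criterion and not necessarily the exact procedure of Ref. [17]; the function names are hypothetical.

```python
from typing import Optional


def theta(p: int, p_max: int) -> float:
    """Approximate separation threshold theta(p) of Eq. (34)."""
    gap = p_max - p
    return 0.75 + 0.2 * gap + 0.05 * gap * gap


def expansion_order(D: float, d: float, p_max: int, p_min: int = 4) -> Optional[int]:
    """Smallest order p in p_min..p_max with D / d >= theta(p), per Eq. (33).

    Returns None when even p_max does not satisfy the criterion, in which
    case the source box is too close and has to be subdivided (assumption).
    """
    ratio = D / d
    for p in range(p_min, p_max + 1):
        # theta decreases with increasing p, so the first hit is the cheapest order.
        if ratio >= theta(p, p_max):
            return p
    return None
```

With $p_{\max} = 10$, a well-separated box with $D/d = 4$ is evaluated at order 4, a box with $D/d = 1.1$ needs order 9, and a box with $D/d = 0.5$ is rejected.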
References

[1] Josh Barnes and Piet Hut. A hierarchical O(N log N) force-calculation algorithm. Nature, 324:446–449, 1986.
[2] H. Cheng, L. Greengard, and V. Rokhlin. A fast adaptive multipole algorithm in three dimensions. J. Comput. Phys., 155:468–498, 1999.
[3] G.-H. Cottet and P. Koumoutsakos. Vortex Methods – Theory and Practice. Cambridge University Press, New York, 2000.
[4] P. Degond and S. Mas-Gallic. The weighted particle method for convection-diffusion equations. Part 1: The case of an isotropic viscosity. Math. Comput., 53(188):485–507, 1989.
[5] P. Degond and S. Mas-Gallic. The weighted particle method for convection-diffusion equations. Part 2: The anisotropic case. Math. Comput., 53(188):509–525, October 1989.
[6] Jeff D. Eldredge, Anthony Leonard, and Tim Colonius. A general deterministic treatment of derivatives in particle methods. J. Comput. Phys., 180:686–709, 2002.
[7] L. Greengard and V. Rokhlin. The rapid evaluation of potential fields in three dimensions. Lect. Notes Math., 1360:121–141, 1988.
[8] F. H. Harlow. Particle-in-cell computing method for fluid dynamics. Methods Comput. Phys., 3:319–343, 1964.
[9] R. W. Hockney and J. W. Eastwood. Computer Simulation using Particles. Institute of Physics Publishing, 1988.
[10] Petros Koumoutsakos. Multiscale flow simulations using particles. Annu. Rev. Fluid Mech., 37:457–487, 2005.
[11] J. J. Monaghan. Extrapolating B splines for interpolation. J. Comput. Phys., 60:253–262, 1985.
[12] Bettina Polasek. A fast multipole method (FMM) implementation for the parallel particle mesh library (PPM). Semester thesis, Institute of Computational Science, 2005.
[13] I. F. Sbalzarini, J. H. Walther, M. Bergdorf, S. E. Hieber, E. M. Kotsalis, and P. Koumoutsakos. PPM – a highly efficient parallel particle-mesh library for the simulation of continuum systems. J. Comput. Phys., 215(2):566–588, 2006.
[14] G. D. Smith. Numerical Solution of Partial Differential Equations: Finite Difference Methods. Oxford Appl. Math. Comput. Sci. Ser., Oxford, 3rd edition, 1985.
[15] U. Trottenberg, C. Oosterlee, and A. Schueller. Multigrid. Academic Press, San Diego, 2001.
[16] L. Verlet. Computer experiments on classical fluids. I. Thermodynamical properties of Lennard-Jones molecules. Phys. Rev., 159(1):98–103, 1967.
[17] Qian Xi Wang. Variable order revised binary treecode. J. Comput. Phys., 200:192–210, 2004.