SIAM J. NUMER. ANAL. Vol. 38, No. 1, pp. 98–128
© 2000 Society for Industrial and Applied Mathematics
THE FAST MULTIPOLE METHOD I: ERROR ANALYSIS AND ASYMPTOTIC COMPLEXITY∗

ERIC DARVE†

Abstract. This paper is concerned with the application of the fast multipole method (FMM) to the Maxwell equations. This application differs in many respects from others, such as the N-body problem, the Laplace equation, and quantum chemistry. The FMM leads to a significant speed-up in CPU time together with a major reduction in the amount of computer memory needed when performing matrix-vector products, and thus to a faster resolution of the scattering of harmonic plane waves from perfectly conducting obstacles. Emphasis here is on a rigorous mathematical approach to the problem. We focus on the estimation of the error introduced by the FMM and on a rigorous analysis of the complexity (O(n log n)) of the algorithm. We show that error estimates reported previously are not entirely satisfactory, and we provide sharper and more precise estimates.

Key words. fast multipole method, electromagnetic theory, scattering, iterative method, matrix compression algorithms, computational aspects

AMS subject classifications. 65F10, 65R20, 65Y20, 65D05, 78A45, 78A99

PII. S0036142999330379
Introduction. The fast multipole method (FMM) is a tool that has been widely applied in several fields with great success: the N-body problem, the Laplace equation, quantum chemistry (see [14], [15], [7, 8, 6]). Another field of application for this method is the compression of dense matrices for the Maxwell equation in an integral formulation. This formulation consists of the computation of the currents on the surface of the scattering object. A Galerkin method leads to a linear system with a dense matrix, which can be solved with an iterative method. The FMM allows fast matrix-vector products and an optimal compression of the matrix. After compression, the matrix can usually be stored in core instead of out of core, as must be done if one uses a Gauss method or computes all the terms in the matrix. Thus the FMM allows for the resolution of problems of unprecedented size.

However, the FMM is a complex tool, especially when applied to the Maxwell case, and it involves several approximations. The error that is introduced in the computation has to be evaluated a priori. Moreover, we will prove in this paper that there is an essential link between the error estimates and the asymptotic complexity of the algorithm. This can be easily understood, since the precision is linked to the number of terms used in the multipole expansion, which in turn affects the total complexity of the code. Few studies have so far been devoted to a serious estimation of the complexity of the FMM. As far as the author knows, regarding theoretical aspects of the FMM applied to the Maxwell equation, there are two articles: the first by Epton and Dembart [12] and the second by Rahola [20] (also an early paper by Rokhlin [21]). However, those two papers do not entirely solve the problem of the error estimation, as we will see in section 7. The article by Epton and Dembart gives very detailed formulae regarding the translation operators used in the FMM and the way to diagonalize them. They give

∗Received by the editors February 24, 1999; accepted for publication (in revised form) December 16, 1999; published electronically June 20, 2000. http://www.siam.org/journals/sinum/38-1/33037.html
†Laboratoire d'Analyse Numérique, Université Paris 6, 4 place Jussieu, 75252 Paris, France. Current address: Center for Turbulence Research, Building 500, Stanford, CA 94305-3030 (darve@ctr.stanford.edu, www.ann.jussieu.fr/∼darve).
the key formulae for the FMM, and we will rely on this article for their proofs. They involve the Wigner 3-j symbol, which appears as the value of the integral of a spherical harmonic against the product of two other spherical harmonics. They also express those translation operators in a convolutional form. However, this article addresses neither the problem of the error introduced by the FMM in the solver nor the problem of the complexity of the algorithm.

The second article, by Rahola, is much closer to our own. The formalism used is totally equivalent to ours. He gives two error estimates, corresponding to his Theorems 1 and 2. It should be mentioned that the first estimate given by Rahola (Lemma 4.1) can be found in a few other papers: for example, see [21, Theorem 3.3] and also [18, sections 3.1.1 and 3.2]. The original paper of Rokhlin [21], which was the first to introduce the single-step FMM (for Maxwell), included straightforward error estimates. All those papers are based on the asymptotic behavior of the Bessel functions,

$$ j_l(z) \sim \frac{1}{2l+1}\sqrt{\frac{e}{2}}\left(\frac{ez}{2l+1}\right)^{l}, \qquad y_l(z) \sim -\frac{1}{z}\sqrt{\frac{2}{e}}\left(\frac{ez}{2l+1}\right)^{-l}, $$
as l → +∞ (formula (9.3.1) in [1]). Using those formulae has two serious drawbacks. First, they lead to inequalities which are valid when l is large compared to |z|; they are not adapted to our case, where we need estimates for l ∼ |z| to study the convergence. Moreover, they lead to bounds which involve constants depending on z. Our results involve true constants and lead to bounds and estimates which can be used in a real-life code, for all configurations (shape, size) of the scattering object. This implies a total control of the error and ensures that the code has an optimal complexity. We believe this to be a considerable improvement for real-life applications.

Sections 1 and 2 give a quick general overview of the FMM applied to the Maxwell equation. Section 3 gives the main theorems and results proved in this paper. In section 4 we deduce the previous results from a more general theorem, which is our key tool to study the convergence of the FMM. It is proved in the two following sections, 5 and 6.

1. General presentation of the multipole method.

1.1. Maxwell equation and integral formulation. We want to solve the Maxwell equation with an integral formulation for a monochromatic signal of frequency κ/(2π). Several integral formulations can be derived. In the case of a perfectly conducting object the most popular ones are the EFIE and MFIE: the electric/magnetic field integral equations. With a Galerkin method, the EFIE formulation leads to a linear system Ax = b. We denote the basis functions of the Galerkin method by J_l(x) for edge l and J_k(x) for edge k, 1 ≤ k, l ≤ n:

$$ b_l \stackrel{\mathrm{def}}{=} \frac{\imath}{\kappa Z} \int_\Gamma J_l(x)\cdot E_t^{\mathrm{inc}}(x)\, d\gamma(x), $$

where E_t^inc is the tangential part of the incident field, and
$$ A_{(k,l)} \stackrel{\mathrm{def}}{=} \int_\Gamma \int_\Gamma G(x,y)\left( J_k(x)\cdot J_l(y) - \frac{1}{\kappa^2}\, \nabla_\Gamma\!\cdot J_k(x)\; \nabla_\Gamma\!\cdot J_l(y) \right) d\gamma(x)\, d\gamma(y), $$

$$ G(x,y) \stackrel{\mathrm{def}}{=} \frac{\imath\kappa}{4\pi}\, h_0^{(1)}(\kappa|x-y|) = \frac{e^{\imath\kappa|x-y|}}{4\pi|x-y|}. $$

Notation.
ı : √(−1),
λ : wavelength,
κ : wave number, κ = 2π/λ,
∇_Γ· : two-dimensional divergence on the surface,
h_l^{(1)} : spherical Hankel function of the first kind,
Z : Z = 120π.
While some integrals must be computed partly analytically when the edges k and l are too close, for most pairs (k, l) a simple Gauss quadrature gives a very good approximation of A_{(k,l)}. Using an iterative method (conjugate gradient, GMRES, ...) and a preconditioner (such as SPAI; see [16] and [2]), the computer-intensive part of the resolution of the linear problem reduces to the computation of matrix-vector products of the kind

$$ \sum_j \frac{e^{\imath\kappa|x_i - x_j|}}{4\pi|x_i - x_j|}\, \sigma_j = \sum_j M_{(i,j)}\, \sigma_j $$

for all i such that |x_i − x_j| is larger than some threshold. The points x_i are the Gauss points on the surface of the object.

1.2. Fast multipole method. Fast products. A fast matrix-vector product may be performed if the matrix M can be approximated, with an error η, as

$$ (1) \qquad M_{(i,j)} = \sum_{m \le M} f_i^m\, g_j^m + \eta. $$

Such a formula leads to a matrix-vector product with 2Mn floating-point operations:

$$ \sum_{j \le n} M_{(i,j)}\, \sigma_j = \sum_{m \le M} f_i^m \sum_{j \le n} g_j^m\, \sigma_j + \eta\, O(\sigma). $$
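The two-stage evaluation behind (1) can be sketched in a few lines. This is an illustrative sketch only, not code from the paper; the function names and the small dense test matrices are hypothetical.

```python
def fast_matvec(f, g, sigma):
    """O(M*n) product using a separated (rank-M) approximation
    M_(i,j) ~= sum_{m<=M} f[i][m] * g[j][m], as in equation (1)."""
    n, M = len(sigma), len(f[0])
    # Stage 1: contract the sources with g: s[m] = sum_j g[j][m] * sigma[j]
    s = [sum(g[j][m] * sigma[j] for j in range(n)) for m in range(M)]
    # Stage 2: expand at the targets: out[i] = sum_m f[i][m] * s[m]
    return [sum(f[i][m] * s[m] for m in range(M)) for i in range(n)]

def dense_matvec(f, g, sigma):
    """O(n^2) reference product with the same separated entries."""
    n, M = len(sigma), len(f[0])
    return [sum(sum(f[i][m] * g[j][m] for m in range(M)) * sigma[j]
                for j in range(n)) for i in range(n)]
```

The two routines return the same vector (up to rounding); only the operation count differs, which is the whole point of the compression.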
A similar decomposition can be obtained in the Maxwell case. We consider the kernel h_0^{(1)}(κ|x − y|). We denote by P_m the Legendre polynomial defined by

$$ P_m(x) = \frac{(-1)^m}{2^m\, m!}\, \frac{d^m}{dx^m}(1 - x^2)^m. $$

We call the "Gegenbauer" series (or sequence) the following expansion of h_0^{(1)}(κ|x − y|) (see the addition theorems in [1, Formulae (10.1.45) and (10.1.46)]):

$$ h_0^{(1)}(\kappa|x-y|) = \sum_{m=0}^{+\infty} (2m+1)\, h_m^{(1)}(u)\, j_m(v)\, P_m(w), $$

where

$$ u = \kappa|U|, \qquad v = \kappa|V|, \qquad w = \cos\gamma, $$
Fig. 1. Two clusters of centers X and Y.
and where U and V are any two vectors such that

$$ x - y = U - V \qquad\text{and}\qquad |V| < |U|, $$

and γ is the angle between the vectors U and V. The quadrature points x and y are gathered inside parallelepipedic clusters of centers X and Y. The vectors U and V are

$$ V = y - Y + X - x, \qquad U = X - Y. $$

Thus the condition v < u is satisfied if the two clusters are sufficiently far away from each other. See Figure 1. We define the truncated Gegenbauer series by

$$ \tilde h_0^M(\kappa|x-y|) \stackrel{\mathrm{def}}{=} \sum_{m=0}^{M} (2m+1)\, h_m^{(1)}(u)\, j_m(v)\, P_m(w). $$
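The truncated series can be checked numerically against the closed form h_0^{(1)}(κ|x − y|) = e^{ıκ|x−y|}/(ıκ|x−y|). The sketch below is mine, not from the paper: it builds j_m by Miller's downward recurrence, y_m by upward recurrence, and P_m by the three-term recurrence, and sums the first M + 1 Gegenbauer terms; the helper names and the sample vectors U, V are hypothetical.

```python
import cmath
import math

def spherical_jn_all(nmax, x):
    """j_0..j_nmax by Miller's downward recurrence (stable when n > x)."""
    N = nmax + 30                      # start well above nmax for accuracy
    jp, j = 0.0, 1e-30                 # seeds for j_{N+1}, j_N (arbitrary scale)
    out = [0.0] * (N + 1)
    for n in range(N, 0, -1):
        out[n] = j
        jp, j = j, (2 * n + 1) / x * j - jp   # j_{n-1} = (2n+1)/x j_n - j_{n+1}
    out[0] = j
    scale = (math.sin(x) / x) / out[0]        # normalize with the exact j_0
    return [t * scale for t in out[: nmax + 1]]

def spherical_yn_all(nmax, x):
    """y_0..y_nmax by upward recurrence (the stable direction for y)."""
    y = [-math.cos(x) / x, -math.cos(x) / x ** 2 - math.sin(x) / x]
    for n in range(1, nmax):
        y.append((2 * n + 1) / x * y[n] - y[n - 1])
    return y[: nmax + 1]

def legendre_all(nmax, w):
    """P_0(w)..P_nmax(w) by the three-term recurrence."""
    p = [1.0, w]
    for n in range(1, nmax):
        p.append(((2 * n + 1) * w * p[n] - n * p[n - 1]) / (n + 1))
    return p[: nmax + 1]

def gegenbauer_sum(M, u, v, w):
    """Truncated Gegenbauer series: sum_{m<=M} (2m+1) h_m(u) j_m(v) P_m(w)."""
    jv = spherical_jn_all(M, v)
    hu = [complex(a, b) for a, b in zip(spherical_jn_all(M, u),
                                        spherical_yn_all(M, u))]
    P = legendre_all(M, w)
    return sum((2 * m + 1) * hu[m] * jv[m] * P[m] for m in range(M + 1))
```

With v < u the truncation error decays roughly like the f_k(v) ratios studied later in the paper, so a modest M already reproduces the kernel to many digits.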
It is on \tilde h_0^M(κ|x − y|) that the FMM is based. We recall the representation of \tilde h_0^M(κ|x − y|).

Lemma 1.1 (representation of \tilde h_0^M(κ|x − y|)). For all ε > 0, there is a set of points s_k on the unit sphere S², and a set of weights 0 < ω_k < 1, 0 ≤ k ≤ K, such that, with

$$ T_M(s_k, U) \stackrel{\mathrm{def}}{=} \sum_{m=0}^{M} \frac{2m+1}{4\pi\, \imath^m}\, h_m^{(1)}(\kappa|U|)\, P_m(\cos(s_k, U)), $$

$$ (2) \qquad \tilde h_0^{M,K}(\kappa|x-y|) \stackrel{\mathrm{def}}{=} \sum_{k=0}^{K} \omega_k\, e^{\imath\kappa\langle s_k,\, y-Y\rangle}\, T_M(s_k, U)\, e^{\imath\kappa\langle s_k,\, X-x\rangle}, $$

we have

$$ \bigl| \tilde h_0^M(\kappa|x-y|) - \tilde h_0^{M,K}(\kappa|x-y|) \bigr| < \varepsilon. $$
This is proved using the decomposition of a plane wave as a sum of Bessel functions jm . Such a formula is given in several papers related to the Maxwell FMM: [21, 22, 10, 19, 23, 17]. Equation (2) is similar to (1) and thus leads to a fast algorithm for matrix-vector products. For a more complete description of the algorithm, see [20].
2. The multilevel algorithm.

2.1. General idea. The multilevel scheme starts with an oct-tree decomposition of the quadrature points on the surface of the scattering object. We give here a rough description of the algorithm; it will be detailed later.
• Step 1: We denote by P one of the clusters at the lowest level and compute its radiation function, defined by
$$ F_k^P \stackrel{\mathrm{def}}{=} \sum_{x_i \in P} \sigma_i\, e^{\imath\kappa\langle s_k,\, X_P - x_i\rangle}, $$
where X_P is the center of cluster P.
• Step 2: All those contributions are added to the father, one level up in the oct-tree; i.e., if cluster Q is the father of P, we perform
$$ F_k^Q = F_k^Q + F_k^P\, e^{\imath\kappa\langle s_k,\, X_Q - X_P\rangle}. $$
• Step 3: The "transfers" are performed; i.e., if clusters P and Q are such that they are not neighbors but their fathers are neighbors, we perform
$$ G_k^P = G_k^P + F_k^Q\, T_M(s_k, X_Q - X_P). $$
• Step 4: All those contributions are added to the sons; i.e., if Q is the son of P, we perform
$$ G_k^Q = G_k^Q + G_k^P\, e^{\imath\kappa\langle s_k,\, X_Q - X_P\rangle}. $$
• Step 5: We sum on S² to obtain the final value of the matrix-vector product for each point x_i:
$$ \sum_k \omega_k\, e^{\imath\kappa\langle s_k,\, x_i - X_Q\rangle}\, G_k^Q. $$
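Steps 1 and 2 rest on an exact phase-shift identity: re-centering a radiation function only multiplies it by e^{ıκ⟨s, X_Q − X_P⟩}. The toy check below is mine (the function names, points, and weights are hypothetical); it verifies that shifting F^P to the father's center reproduces the radiation function computed directly at that center.

```python
import cmath

def dot(a, b):
    return sum(p * q for p, q in zip(a, b))

def radiation(points, sigmas, center, s, kappa=1.0):
    """Step 1: F_k for one cluster, sum_i sigma_i exp(i kappa <s, center - x_i>)."""
    return sum(sig * cmath.exp(1j * kappa *
                               dot(s, tuple(c - x for c, x in zip(center, p))))
               for p, sig in zip(points, sigmas))

def shift_to_parent(F, old_center, new_center, s, kappa=1.0):
    """Step 2: re-center a radiation function by a single phase factor."""
    d = tuple(n - o for n, o in zip(new_center, old_center))
    return F * cmath.exp(1j * kappa * dot(s, d))
```

Because the identity is exact, the only error introduced by the multilevel scheme comes from truncating T_M and from sampling S² at finitely many directions s_k.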
2.2. Fast interpolation methods. As will be proved later, the functions F_k^P contain more and more oscillatory modes as we go up in the tree. This is due to the multiplication by the factors e^{ıκ⟨s_k, X_Q − X_P⟩}, of frequency roughly κ|X_Q − X_P|. Thus, to represent those functions accurately, more and more points must be taken on S² to account for this oscillatory behavior. An interpolation must be performed on F_k^P to compute the value of the radiation function at nodes s′_k, 1 ≤ k ≤ K′, from its values at the nodes s_k, 1 ≤ k ≤ K. It will be proved later that the number of significant frequencies in F_k^P is proportional to the size of the cluster. We will not write down the constant in what follows, for clarity. Let |P| be equal to κ times the diameter of the cluster. Among the steps described in the previous subsection, two have to be modified:
• Step 2: The radiation function is sampled at |P|² points on S². We have to compute its value at |Q|² points on S², which is the number of points used to represent the function F^Q. In order to do so, we compute the Fourier coefficients on S² for the first |P|² frequencies. A backward Fourier transform on S² is then performed to obtain |Q|² terms on the unit sphere. This is done prior to the multiplication by e^{ıκ⟨s_k, X_Q − X_P⟩}.
• Step 4: In a similar fashion, G^P is sampled at |P|² points after step 3 of the FMM. However, after the multiplication by e^{ıκ⟨s_k, X_Q − X_P⟩} we can represent the function G^Q accurately with fewer sample points. This subsampling is done in order to reduce the CPU time. The higher frequencies can be eliminated as we go down the tree from one level to the next. This is done by performing a forward Fourier transform on S² and then a backward Fourier transform using only the first |Q|² frequencies.
In practical implementations |Q| is equal to 2|P|, for example, if we partition the surface with an oct-tree. With this assumption we now give an algorithm to perform those transformations (interpolation or subsampling) with a complexity of O(|P|² log |P|). It consists of the following operations; the reference for the algorithm can be found in [3]. Let us consider a discrete set of values F(s_k). The set s_k is such that there exist angles φ_i and θ_j such that s_k = (φ_i, θ_j) in spherical coordinates.
• Perform a fast Fourier transform in φ_i, 1 ≤ i ≤ I, and obtain
$$ F_m(\theta_j) = \sum_i F(\phi_i, \theta_j)\, e^{-\imath m \phi_i}. $$
• Forward and backward θ transform. We denote by \bar P_n^m the associated Legendre function with the proper normalization,
$$ \bar P_l^m(x) = \sqrt{(2l+1)\frac{(l-m)!}{(l+m)!}}\; P_l^m(x), \qquad P_l^m(x) = (-1)^m (1-x^2)^{m/2}\, \frac{d^m}{dx^m} P_l(x), $$
and by ε_{lm} the following sequence:
$$ \varepsilon_{lm} = \sqrt{\frac{l^2 - m^2}{4l^2 - 1}}. $$
The θ transform is defined by
$$ \tilde F_m(\tilde\theta_k) = \sum_j F_m(\theta_j)\, \omega_j \sum_{n=|m|}^{N} \bar P_n^m(\cos\theta_j)\, \bar P_n^m(\cos\tilde\theta_k), $$
the integer N being the highest degree of the spherical harmonics. Thanks to the Christoffel–Darboux formula,
$$ \tilde F_m(\tilde\theta_k) = \varepsilon_{N+1,m} \sum_j F_m(\theta_j)\, \omega_j\, \frac{\bar P_{N+1}^m(\cos\tilde\theta_k)\, \bar P_N^m(\cos\theta_j) - \bar P_N^m(\cos\tilde\theta_k)\, \bar P_{N+1}^m(\cos\theta_j)}{\cos\tilde\theta_k - \cos\theta_j} $$
$$ = \varepsilon_{N+1,m}\left( \bar P_{N+1}^m(\cos\tilde\theta_k) \sum_j \frac{F_m(\theta_j)\, \omega_j\, \bar P_N^m(\cos\theta_j)}{\cos\tilde\theta_k - \cos\theta_j} - \bar P_N^m(\cos\tilde\theta_k) \sum_j \frac{F_m(\theta_j)\, \omega_j\, \bar P_{N+1}^m(\cos\theta_j)}{\cos\tilde\theta_k - \cos\theta_j} \right). $$
Such a computation can be seen as two matrix-vector products with the matrix
$$ \frac{1}{\cos\tilde\theta_k - \cos\theta_j}. $$
A one-dimensional FMM can be applied successfully to this type of matrix, yielding an optimal algorithm (see the article by Yarvin and Rokhlin on one-dimensional FMMs [26] and also [25] and [11]).
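The interpolation/subsampling idea is easiest to see on its one-dimensional analogue: a band-limited periodic function can be resampled exactly on a finer grid by taking its Fourier coefficients and evaluating them at the new nodes (zero-padding in frequency). The sketch below is a minimal illustration with a naive O(I²) DFT, assuming a band-limited test signal; it is not the spherical algorithm of [3], and the names are mine.

```python
import cmath
import math

def dft(samples):
    """Forward DFT: c_m = (1/I) sum_i f_i exp(-2 pi i m i / I)."""
    I = len(samples)
    return [sum(samples[i] * cmath.exp(-2j * math.pi * m * i / I)
                for i in range(I)) / I for m in range(I)]

def interpolate(samples, I_new):
    """Evaluate the band-limited interpolant of the samples on a finer grid."""
    I = len(samples)
    c = dft(samples)
    freqs = [m if m <= I // 2 else m - I for m in range(I)]  # signed frequencies
    return [sum(cm * cmath.exp(2j * math.pi * f * l / I_new)
                for cm, f in zip(c, freqs)) for l in range(I_new)]
```

If the signal really contains only the resolved frequencies, the resampling is exact up to rounding; replacing the naive DFT by an FFT gives the O(K log K) cost assumed in the text.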
• A backward FFT is performed:
$$ \tilde F(\tilde\phi_l, \tilde\theta_k) = \frac{2\pi}{I} \sum_m \tilde F_m(\tilde\theta_k)\, e^{\imath m \tilde\phi_l}. $$

3. Asymptotic complexity.

3.1. Error analysis. Our goal here is to prove the asymptotic complexity of the FMM. We will see that the proof relies on two error estimates, stated in Propositions 1 and 2.

Proposition 1 (number of terms in the transfer function). Let v, u ∈ R⁺ be such that
$$ \frac{u}{v} \ge \frac{\sqrt 5}{2}, $$
and for all m ≥ 0 denote the Gegenbauer sequence by
$$ G_m \stackrel{\mathrm{def}}{=} (2m+1)\, j_m(v)\, h_m^{(1)}(u)\, P_m(w). $$
Then for all ε < 1 there exist four constants C₁, C₂, C₂′, and C₃ such that
$$ m \ge M \stackrel{\mathrm{def}}{=} C_1 + C_2\, v + C_2'\, \log v + C_3 \log \varepsilon^{-1} \quad\Longrightarrow\quad \Bigl|\sum_{k=m}^{+\infty} G_k\Bigr| \le \varepsilon. $$
Thus we can use M terms to compute the transfer function.

Remark. This is a much better result than is usually given in earlier papers, since the condition is independent of u (provided that u/v ≥ √5/2) and depends only on v. Moreover, the constants appearing in the previous proposition are independent of all the parameters of the problem. We have thus proved a uniform convergence of the series for all the points in a cluster of a given size.

The second proposition can be proved once the previous one has been established. It corresponds to the error due to the fact that we can only consider a finite number of directions s_k in our computation. It gives us a criterion to calculate the number of points needed on S².

Proposition 2 (number of directions on S²). We suppose that the hypotheses of Proposition 1 are satisfied. We denote by ε, 0 < ε < 1, some bound on the error. There exist constants C₁, C₂, C₂′, C₃ and constants D₁, D₂, D₂′, D₃ such that, if for some level we keep M terms in the transfer function (see Proposition 1) and take enough points on the sphere to integrate exactly the first K spherical harmonics on S², then
$$ \begin{aligned} M &\ge C_1 + C_2\, v + C_2'\, \log v + C_3 \log \varepsilon^{-1}, \\ K &\ge D_1 + D_2\, v + D_2'\, \log v + D_3 \log \varepsilon^{-1} \end{aligned} \quad\Longrightarrow\quad \bigl| h_0^{(1)}(\kappa|x_i - x_j|) - \tilde h_0^{M,K}(\kappa|x_i - x_j|) \bigr| \le \varepsilon. $$
In practice a good choice for K is K ≥ 2M.

Remark. Using a Gaussian quadrature for θ and a uniform quadrature for φ, we know that to integrate exactly the first K = 2M spherical harmonics we need 2M² points on S². From Proposition 2 we deduce that the number of points s_k, #{s_k} ≈ 2M², is roughly proportional to the number of points in the cluster for
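The empirical rule quoted just below (M = v + 3 log(v + π) for roughly three digits, M = v + 5 log(v + π) for roughly six) is easy to package as a per-level helper. This tiny function is my own illustration of that rule, not code from the paper.

```python
import math

def num_multipoles(v, c=3.0):
    """Empirical truncation rule quoted in the text: M = v + c*log(v + pi),
    with c = 3 for ~1e-3 accuracy and c = 5 for ~1e-6 accuracy."""
    return math.ceil(v + c * math.log(v + math.pi))
```

In an oct-tree code v doubles from one level to the next, so M (and hence the sampling rate on S²) is recomputed per level; as the text notes, the term proportional to v dominates in practice.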
a uniform distribution of points on the surface (see below for the definition of this term).

The two previous propositions were already found empirically by others (see [22], [10], and [24], for example). From numerical tests it is found that with M = v + 3 log(v + π) the error is 10⁻³, while with M = v + 5 log(v + π) the error is 10⁻⁶. This is in total accordance with the theory. As a simplification we will suppose from now on that ε has been set to some value, and we will consider it as a constant. The reader will note that in practice the terms proportional to v are the dominant ones.

3.2. Asymptotic complexity theorem.

Definition 3.1. We call a "uniform distribution" of points on the surface of an object a distribution such that there exist two constants γ₂ > γ₁ > 0 such that the distance between one point and its closest neighbor is greater than γ₁λ and less than γ₂λ, where λ is the wavelength.

Typically γ₁ and γ₂ are greater than 1/20 and less than 1/5. Another algorithm, developed by Greengard, addresses the problem of γ₁ tending to zero; this is the case if there is an accumulation of points somewhere on the surface (see [13]). These constants fix some of the other constants appearing later in the paper; in particular, in the theorem below, we have the following.

Theorem 3.2 (asymptotic complexity theorem). We consider a set of points uniformly distributed on the surface of an object. Those points are partitioned using an oct-tree until the leaves of the tree have a size of a fraction of a wavelength (typically κ × Diam is of order 1). Then, using a fast interpolation/subsampling as described above, we can perform a matrix-vector product Mσ with complexity O(n log² n), where n is the size of the matrix and
$$ M_{(i,j)} = \frac{e^{\imath\kappa|x_i - x_j|}}{4\pi|x_i - x_j|}. $$
There is a minimum size for the leaves of the tree: under a certain critical size, the roundoff errors due to the explosion of the functions h_l^{(1)} prevent the use of the FMM on a finite-precision computer (even with 16 digits). Moreover, the interaction between close neighbors at the lowest level in the tree has to be computed in a traditional way. With a uniform distribution of points this additional cost is O(n) (the constant depends on the critical size of the leaves and on γ₁ and γ₂). However, this cost is O(n²) if there is an accumulation of points somewhere in the mesh. Note that the constant in O(n log² n) also depends on γ₁ and γ₂. The true expression is actually (we denote by R the diameter of the object)
$$ O(n) + O\bigl((\kappa R)^2 \log^2(\kappa R)\bigr), $$
the O(n) term being the only one depending on γ₁ and γ₂. We have n ∼ (R/λ)² ∼ (κR)².

Proof. The hypothesis of uniform distribution of the points is equivalent to the fact that |P|² is proportional to the number of points contained in P.
We now consider all the operations to be performed for all the clusters of a given level. For one cluster, the transfer + interpolation + subsampling require |P|² log |P| operations (see Propositions 1 and 2). Thus for the clusters at a given level the complexity is proportional to n log n; this is possible only with the use of fast interpolation methods. Since there are O(log n) levels, the total complexity is n log² n.

4. Gegenbauer sequence theorem. We now prove Propositions 1 and 2. They can be deduced from the following key theorem.

Theorem 4.1 (Gegenbauer sequence theorem). Let v, u ∈ R⁺ be such that 1 < v < u, and let m ∈ N. Let p be the smallest integer greater than v − 1/2 and q the smallest integer greater than u + 7/2. We denote by f_k(v) and g_k(u, v) the two following sequences:
$$ f_k(v) = \frac{v/(2k+1)}{1 - \dfrac{2v^2}{(2k+1)(2k+3)}}, $$
$$ g_k(u,v) = \left(1 - \left[\frac{(2k+1)(2k+3)}{v^2} - \frac{2(2k+1)}{2k+5}\right]^{-1}\right)^{-1} \left(1 - \left[\frac{(2k-1)(2k-3)}{u^2} - \frac{2k-1}{2k-5}\right]^{-1}\right). $$
For some constant C > 1,
$$ |G_m| \le \begin{cases} \dfrac{C}{v^{5/6}}\,(2m+1) \displaystyle\prod_{k=p+1}^{m} f_k(v) & \text{for } p \le m \le u - \tfrac12, \\[2ex] \dfrac{C}{(uv)^{5/6}}\,(2m+1) \displaystyle\prod_{k=p+1}^{m} f_k(v) & \text{for } u - \tfrac12 < m < q, \\[2ex] \dfrac{C}{(uv)^{5/6}}\,\Bigl(\dfrac{v}{u}\Bigr)^{m-q} (2q+1) \displaystyle\prod_{k=p+1}^{q} f_k(v) \prod_{k=q+1}^{m} g_k(u,v) & \text{for } q \le m. \end{cases} $$
Moreover, with the additional hypothesis that v/u ≤ 2/√5 < 1,
$$ k \ge q \;\Longrightarrow\; 0 < g_k(u,v) < 1. $$
A similar theorem may be proved for the case v < 1. The mathematical tools which have to be used when v < 1 are very similar to the ones presented here. However, this case is of limited practical interest, since in numerical experiments v is always greater than 1.

The behavior of G_m is thus the following. For v ≪ m ≪ u we observe a fast convergence; indeed, f_k(v) decreases very rapidly to 0. For k ∼ v, f_k(v) ∼ v/k, while for k ≫ v, f_k(v) ∼ v/(2k). This is thus a fast convergence region. For m ≫ u, G_m converges to zero like a geometrical sequence of ratio v/u.

4.1. Proof of the error analysis. First we notice that if the points on the surface are partitioned using an oct-tree, then v/u ≤ √3/2 < 2/√5; thus g_k(u, v) < 1.

Proof of Proposition 1. We now suppose that v ≥ 1.
1. Convergence for m ≤ u + 7/2. The first point consists in the following bound, which simplifies the expression. Suppose that u + 7/2 ≥ m ≥
2v + 1. Then
$$ f_m(v) \le \frac{8}{7}\,\frac{v}{2m+1}, \qquad |G_m| \le \frac{8C}{7}\, v^{1/6} \prod_{k=p+1}^{m-1} f_k(v). $$
We have the following lemma.

Lemma 4.2. We denote
$$ F(\alpha) \stackrel{\mathrm{def}}{=} \bigl(1 + (\alpha - 1)(2 + \alpha^{-1})\bigr)^{-1}. $$
F(α) is a monotonically decreasing function with F(1) = 1, and we have |f_k(v)| ≤ F(v/k). The proof is left to the reader.

We now have a bound on |G_k| for all q > k ≥ 2v + 1:
$$ (3) \qquad |G_k| \le \frac{8C}{7}\, v^{1/6}\, F(2)^{\,k-2v}. $$
Before using (3), which applies to q > k, we bound the series for k ≥ q.

Lemma 4.3 (bound for k ≥ q). There exists a constant C such that
$$ \Bigl|\sum_{k \ge q} G_k\Bigr| \le C\,|G_{q-1}|. $$
We see that the proof is complete if we bound G_k for all k < q. Indeed, for all m < q,
$$ \Bigl|\sum_{k=m}^{+\infty} G_k\Bigr| \le (1 + C) \sum_{k=m}^{q-1} |G_k|. $$
The bound for all k < q is obtained below, after the proof of Lemma 4.3, with the help of (3).

Proof. For k ≥ q, thanks to Theorem 4.1, the sequence G_k may be bounded by
$$ |G_k| \le C \Bigl(\frac{v}{u}\Bigr)^{k-q} |G_{q-1}|. $$
Thus
$$ \sum_{k \ge q} |G_k| \le C_1\, |G_{q-1}|\, \frac{1}{1 - v/u}. $$
where \tilde h_0^{M,K}(κ|x_i − x_j|) is the contribution to node i from node j as computed with the FMM. Since |I^C_{m,n} − I^D_{m,n}| ≤ 8π,
$$ \bigl| h_0^{(1)}(\kappa|x_i - x_j|) - \tilde h_0^{M,K}(\kappa|x_i - x_j|) \bigr| \le \eta + 2(M+1)(2M+1)\, \max\bigl(1, |h_M^{(1)}(u)|\bigr) \sum_{n > H} (2n+1)\, |j_n(v)|. $$
Using Proposition 3 we obtain
$$ |j_n(v)| \le \frac{C}{v^{5/6}} \prod_{k=p+1}^{n} f_k(v) $$
for n + 1/2 ≥ v. We now suppose that M is such that M ≥ 2v.
1. Case M + 1/2 ≤ u. Let us suppose, moreover, that M + 1/2 ≤ u; the case M + 1/2 > u will be considered below. If M + 1/2 ≤ u,
$$ |h_M^{(1)}(u)\, j_n(v)| \le \frac{C}{v^{5/6}} \prod_{k=p+1}^{n} f_k(v). $$
Using the fact that H ≥ M and that
$$ (2M-1)(2M+1)\, f_{M-1}(v)\, f_M(v) \le C\, v^2, \qquad (2n+1)\, f_n(v) \le C\, v, $$
we get, for some constants C₁ and C₂,
$$ 2(M+1)(2M+1)\, \max\bigl(1, |h_M^{(1)}(u)|\bigr) \sum_{n>H} (2n+1)\, |j_n(v)| \le \frac{C_1\, v^3}{v^{5/6}} \sum_{n>H}\; \prod_{p+1 \le k \le n} f_k(v) \le C_2\, v^{13/6} \sum_{n>H}\; \prod_{p+1 \le k \le n} f_k(v). $$
2. Case M + 1/2 > u. In this case, thanks to Theorem 4.1,
$$ |h_M^{(1)}(u)| \sum_{n>H} (2n+1)\, |j_n(v)| \le C\,(M+1)\Bigl(\frac{v}{u}\Bigr)^{M-q} \sum_{n>H} \Bigl(\prod_{k=p+1}^{q} f_k(v)\Bigr) \Bigl(\prod_{k=M+1}^{n-1} f_k(v)\Bigr). $$
With the same type of arguments as for the proof of Proposition 1, the proof is complete.

5. Preliminary results: spherical Bessel functions. In this section and the following we will prove the Gegenbauer convergence theorem. We start with some basic observations regarding continued fractions (Lemmas 5.1 and 5.2). Then we will observe that j_k(v)/j_{k−1}(v) as well as y_k(u)/y_{k−1}(u) are equal to simple continued fractions (Lemmas 5.3 and 5.4). Finally (Propositions 3 and 4) we will derive from the previous results two bounds on j_k(v)/j_{k−1}(v) and y_k(u)/y_{k−1}(u); these bounds are valid, respectively, for k ≥ v and for k ≥ u. In the last subsection we will look for bounds in the cases k ≤ v and k ≤ u, respectively. The Gegenbauer convergence theorem will be deduced from these results in the next section.

Continued fractions. We denote by
$$ b_0 + \frac{a_1}{b_1+}\,\frac{a_2}{b_2+}\,\frac{a_3}{b_3+}\cdots $$
the following continued fraction:
$$ b_0 + \cfrac{a_1}{b_1 + \cfrac{a_2}{b_2 + \cfrac{a_3}{b_3 + \cdots}}}. $$
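Truncations of such a fraction are most simply evaluated backward, from the last partial quotient to the first. The helper below is my own illustration (the function name is mine); the second test reproduces the continued fraction of tan z used in the proof of Lemma 5.3.

```python
import math

def continued_fraction(b0, a, b):
    """Evaluate b0 + a[0]/(b[0] + a[1]/(b[1] + ...)) by backward recurrence.
    a and b are equal-length sequences of partial numerators/denominators."""
    tail = 0.0
    for ai, bi in zip(reversed(a), reversed(b)):
        tail = ai / (bi + tail)
    return b0 + tail
```

Backward evaluation costs O(N) for N partial quotients and avoids the overflow issues of the forward convergent recurrences (11)-(12) used later in the proof of Lemma 5.5.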
We need two technical lemmas to proceed in our proof.

Lemma 5.1. Let a_i, b_i be two sequences such that a_i ≤ 0 and b_i ≥ 0. Then
$$ (7) \qquad \text{for all } i \ge 0, \quad b_i b_{i+1} + 4 a_{i+1} \ge 0 $$
$$ (8) \qquad \Longrightarrow \quad \text{for all } i \ge 0 \text{ and } p \ge 0, \quad b_i + \frac{a_{i+1}}{b_{i+1}+}\cdots\frac{a_{i+p}}{b_{i+p}} \ge \frac{1}{2}\, b_i. $$
Proof. It is true for p = 1, since by (7)
$$ b_{i+p-1} + \frac{a_{i+p}}{b_{i+p}} \ge \frac{1}{2}\, b_{i+p-1}. $$
Then
$$ b_{i+p-2} + \frac{a_{i+p-1}}{b_{i+p-1} + \dfrac{a_{i+p}}{b_{i+p}}} \;\ge\; b_{i+p-2} + 2\,\frac{a_{i+p-1}}{b_{i+p-1}} \;\ge\; b_{i+p-2} - \frac{1}{2}\, b_{i+p-2} \;\ge\; \frac{1}{2}\, b_{i+p-2}. $$
By induction from j = i + p − 1 down to j = i, the lemma is proved.

Lemma 5.2. Let a_i, b_i and a′_i, b′_i be four real sequences such that
$$ a_i \le 0 \text{ and } a_i' \le 0, \qquad b_i \ge 0 \text{ and } b_i' \ge 0, $$
$$ b_i b_{i+1} + 4 a_{i+1} \ge 0 \quad\text{and}\quad b_i' b_{i+1}' + 4 a_{i+1}' \ge 0. $$
Then
$$ a_i \le a_i',\; b_i \le b_i' \quad\Longrightarrow\quad b_0 + \frac{a_1}{b_1+}\,\frac{a_2}{b_2+}\,\frac{a_3}{b_3+}\cdots \;\le\; b_0' + \frac{a_1'}{b_1'+}\,\frac{a_2'}{b_2'+}\,\frac{a_3'}{b_3'+}\cdots. $$
Proof. Lemma 5.2 is a direct consequence of Lemma 5.1. We suppose that the hypotheses of the lemma are true for a_i, b_i and a′_i, b′_i. For all x > 0 and x′ > 0 such that x ≤ x′, since a_i ≤ a′_i ≤ 0,
$$ \frac{a_i}{x} \le \frac{a_i}{x'} \le \frac{a_i'}{x'}. $$
Thus for all i ≥ 0,
$$ b_i + \frac{a_i}{x} \le b_i' + \frac{a_i'}{x'}. $$
For all n ≥ 0 we have b_{n−1} + a_n/b_n ≤ b′_{n−1} + a′_n/b′_n. Because (8) is true for a_i, b_i and a′_i, b′_i, an induction from j = n − 1 down to j = 0 proves the lemma.

5.1. Continued fraction: j_n/j_{n−1}.

Lemma 5.3. For all n ∈ N and z ∈ C we have
$$ \frac{j_n(z)}{j_{n-1}(z)} = \frac{1}{\frac{2n+1}{z}-}\;\frac{1}{\frac{2n+3}{z}-}\;\frac{1}{\frac{2n+5}{z}-}\cdots. $$
Proof. We prove it by induction. We have
$$ j_0(z) = \frac{\sin z}{z} \quad\text{and}\quad j_1(z) = \frac{\sin z}{z^2} - \frac{\cos z}{z} \;\Longrightarrow\; \frac{j_1(z)}{j_0(z)} = \frac{1}{z} - \frac{1}{\tan z}. $$
The function tan is equal to the following continued fraction:
$$ \tan z = \frac{z}{1-}\,\frac{z^2}{3-}\,\frac{z^2}{5-}\cdots = \frac{1}{z^{-1}-}\,\frac{1}{3z^{-1}-}\,\frac{1}{5z^{-1}-}\cdots. $$
Thus the lemma is true for n = 1:
$$ \frac{j_1(z)}{j_0(z)} = \frac{1}{3z^{-1}-}\,\frac{1}{5z^{-1}-}\cdots. $$
Using the recurrence relation
$$ (9) \qquad j_{n-1}(z) + j_{n+1}(z) = \frac{2n+1}{z}\, j_n(z), $$
we have
$$ \frac{j_{n-1}(z)}{j_n(z)} + \frac{j_{n+1}(z)}{j_n(z)} = \frac{2n+1}{z}, $$
from which the lemma can be proved for all n.

5.2. Continued fraction: y_n/y_{n−1}. A similar lemma can be proved for y_n.

Lemma 5.4. For all n ∈ N and z ∈ C we have
$$ \frac{y_{n-1}(z)}{y_n(z)} = \frac{1}{\frac{2n-1}{z}-}\;\frac{1}{\frac{2n-3}{z}-}\;\frac{1}{\frac{2n-5}{z}-}\cdots. $$
Proof. We prove it by induction. We have
$$ y_0(z) = -\frac{\cos z}{z} \quad\text{and}\quad y_1(z) = -\frac{\cos z}{z^2} - \frac{\sin z}{z} \;\Longrightarrow\; \frac{y_1(z)}{y_0(z)} = \frac{1}{z} + \tan z. $$
Thus the lemma is true for n = 1:
$$ \frac{y_1(z)}{y_0(z)} = z^{-1} + \frac{1}{z^{-1}-}\,\frac{1}{3z^{-1}-}\cdots = z^{-1} - \frac{1}{-z^{-1}-}\,\frac{1}{-3z^{-1}-}\cdots. $$
Using the recurrence relation
$$ (10) \qquad y_{n-1}(x) + y_{n+1}(x) = \frac{2n+1}{x}\, y_n(x), $$
we prove the lemma for all n.

5.3. Continued fraction with constant coefficients.

Lemma 5.5. For all t ∈ R⁺,
$$ 2\cosh t - \frac{1}{2\cosh t-}\,\frac{1}{2\cosh t-}\cdots = e^t. $$
Proof. If we denote by f_n the nth convergent,
$$ f_n = b_0 + \frac{a_1}{b_1+}\cdots\frac{a_n}{b_n}, $$
then we can define two sequences A_n and B_n by A_{−1} = 1, A_0 = b_0, B_{−1} = 0, B_0 = 1 and
$$ (11) \qquad A_n = b_n A_{n-1} + a_n A_{n-2}, $$
$$ (12) \qquad B_n = b_n B_{n-1} + a_n B_{n-2}, $$
such that f_n = A_n/B_n. Here b_n = 2 cosh t and a_n = −1 for all n, and we have the relation B_n = A_{n−1}. Thus, with (12),
$$ f_n = 2\cosh t - \frac{1}{f_{n-1}} \;\Longrightarrow\; \lim_{n\to\infty} f_n = e^t. $$
We have an immediate corollary.

Corollary 1. For all x ∈ R,
$$ x \ge 1 \;\Longrightarrow\; x^{-1} - \frac{1}{x + x^{-1}-}\,\frac{1}{x + x^{-1}-}\cdots = 0. $$
Proof. Consider x = e^t, so that 2 cosh t = x + x^{−1}. Thus
$$ x + x^{-1} - \frac{1}{x + x^{-1}-}\,\frac{1}{x + x^{-1}-}\cdots = x. $$
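The fixed-point iteration hidden in the proof of Lemma 5.5 converges fast and is easy to observe numerically. The snippet is my own illustration (name and iteration count are arbitrary): iterating f ← 2 cosh t − 1/f from f₀ = 2 cosh t contracts toward the larger fixed point e^t, with contraction factor e^{−2t} per step.

```python
import math

def cf_exp(t, iterations=60):
    """Iterate f <- 2*cosh(t) - 1/f from f0 = 2*cosh(t);
    by Lemma 5.5 the limit is exp(t)."""
    f = 2.0 * math.cosh(t)
    for _ in range(iterations):
        f = 2.0 * math.cosh(t) - 1.0 / f
    return f
```

For small t the contraction factor e^{−2t} approaches 1 and convergence slows, which is consistent with the continued fraction degenerating at t = 0.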
5.4. Bound on j_n/j_{n−1}.

Proposition 3. For all n and x ∈ R⁺, let
$$ G(\alpha, n, x) = \left(1 - \left[\frac{(2n+1)(2n+3)}{x^2} - \frac{\alpha(2n+1)}{2n+5}\right]^{-1}\right)^{-1}. $$
Then
$$ n + \frac12 \ge x \;\Longrightarrow\; \frac{x}{2n+1}\, G(1, n, x) \le \frac{j_n(x)}{j_{n-1}(x)} \le \frac{x}{2n+1}\, G(2, n, x), $$
$$ \frac{x/(2n+1)}{1 - \dfrac{x^2}{(2n+1)(2n+3)}} \le \frac{j_n(x)}{j_{n-1}(x)} \le \frac{x/(2n+1)}{1 - \dfrac{2x^2}{(2n+1)(2n+3)}}. $$
The second pair of inequalities is not as sharp as the first, but the expression is simpler.

Proof. We consider the continued fraction of Lemma 5.3. It satisfies the hypotheses of Lemmas 5.2 and 5.1; hence the bounds.
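The simpler pair of bounds can be spot-checked numerically. The sketch below is mine, not from the paper: it computes the ratio j_n(x)/j_{n−1}(x) by Miller's downward recurrence (the arbitrary seed scale cancels in the ratio) and compares it with the two simple bounds, which are valid for n + 1/2 ≥ x.

```python
import math

def jn_ratio(n, x, extra=40):
    """j_n(x)/j_{n-1}(x) via Miller's downward recurrence; the seed cancels."""
    jp, j = 0.0, 1e-30
    vals = {}
    for k in range(n + extra, 0, -1):
        vals[k] = j
        jp, j = j, (2 * k + 1) / x * j - jp   # j_{k-1} = (2k+1)/x j_k - j_{k+1}
    return vals[n] / vals[n - 1]

def prop3_simple_bounds(n, x):
    """The simpler bounds of Proposition 3 (valid for n + 1/2 >= x)."""
    base = x / (2 * n + 1)
    lo = base / (1 - x * x / ((2 * n + 1) * (2 * n + 3)))
    hi = base / (1 - 2 * x * x / ((2 * n + 1) * (2 * n + 3)))
    return lo, hi
```

The ratio sits strictly between the two bounds, and both approach x/(2n + 1) as n grows, which is the decay rate used throughout section 4.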
5.5. Bound on y_n/y_{n−1}.

Proposition 4. For all n and x ∈ R⁺, let
$$ F(\alpha, n, x) = 1 - \left[\frac{(2n-1)(2n-3)}{x^2} - \frac{\alpha(2n-1)}{2n-5}\right]^{-1}. $$
Then
$$ n - \frac72 \ge x \;\Longrightarrow\; \frac{2n-1}{x}\, F(2, n, x) \le \frac{y_n(x)}{y_{n-1}(x)} \le \frac{2n-1}{x}\, F(1, n, x). $$
Proof. We cannot apply Lemmas 5.1 and 5.2 directly this time, because the continued fraction of Lemma 5.4,
$$ \frac{1}{\frac{2n-1}{z}-}\;\frac{1}{\frac{2n-3}{z}-}\;\frac{1}{\frac{2n-5}{z}-}\cdots, $$
does not satisfy the hypotheses. However, let p ∈ N be the smallest integer such that p − 1/2 ≥ x. We have the following recurrence relation, which can be found in [1, Equation (10.1.22)]:
$$ \frac{k}{x}\, y_k(x) - \frac{dy_k}{dx}(x) = y_{k+1}(x) \;\Longrightarrow\; \frac{y_{k+1}(x)}{y_k(x)} = \frac{k}{x} - \frac{\frac{dy_k}{dx}(x)}{y_k(x)}. $$
The inequality k + 1/2 ≥ x implies that y_k(x) ≤ 0. Moreover,
$$ \frac{dy_k}{dx}(x) = \sqrt{\frac{\pi}{2x}} \left( Y'_{k+1/2}(x) - \frac{1}{2x}\, Y_{k+1/2}(x) \right), $$
and the study of the zeros of Y_ν and Y′_ν shows that this quantity is always positive for k + 1/2 ≥ x. Therefore, if k + 1/2 ≥ x, dy_k/dx (x) ≥ 0 and y_{k+1}(x)/y_k(x) ≥ 1; in particular y_p(x)/y_{p−1}(x) ≥ 1.

For all n ≥ p + 3, thanks to Lemma 5.4,
$$ \frac{y_n(z)}{y_{n-1}(z)} = \frac{2n-1}{z} - \frac{1}{\frac{2n-3}{z}-}\;\frac{1}{\frac{2n-5}{z}-}\cdots \frac{1}{\;y_p/y_{p-1}\;}. $$
Now we can apply Corollary 1:
$$ \frac{y_n(z)}{y_{n-1}(z)} = \frac{2n-1}{z} - \frac{1}{\frac{2n-3}{z}-}\;\frac{1}{\frac{2n-5}{z}-}\cdots \frac{1}{\;y_p/y_{p-1} + y_{p-1}/y_p\,-}\;\frac{1}{\;y_p/y_{p-1} + y_{p-1}/y_p\,-}\cdots. $$
Because y_p(x)/y_{p−1}(x) ≥ 1, we have y_p/y_{p−1} + y_{p−1}/y_p ≥ 2. Therefore the hypotheses of Lemmas 5.2 and 5.1 are satisfied, and thus
$$ \frac{y_n(x)}{y_{n-1}(x)} \le \frac{2n-1}{x}\, F(1, n, x) \quad\text{and}\quad \frac{2n-1}{x}\, F(2, n, x) \le \frac{y_n(x)}{y_{n-1}(x)}. $$
5.6. Bound on j_n and y_n for n ≤ x.

Proposition 5. For all z ∈ C and n ∈ N,
$$ n + \frac12 \le |z| \;\Longrightarrow\; |j_n(z)| \le \frac{\sqrt{n}}{|z|} \quad\text{and}\quad |y_n(z)| \le \frac{\sqrt{n}}{|z|}. $$
In particular they are bounded by 1.

Proof. From Formulae (10.1.26) and (10.1.27) in [1], we have
$$ j_n(z) = \sqrt{\frac{\pi}{2z}}\, M_{n+1/2}(z) \cos\theta_{n+1/2}(z), \qquad y_n(z) = \sqrt{\frac{\pi}{2z}}\, M_{n+1/2}(z) \sin\theta_{n+1/2}(z), $$
where
$$ \frac{\pi}{2z}\, M^2_{n+1/2}(z) = \frac{1}{z^2} \sum_{k=0}^{n} \frac{(2n-k)!\,(2n-2k)!}{k!\,[(n-k)!]^2\,(2z)^{2n-2k}}. $$
We now prove that
$$ \frac{\pi}{2z}\, M^2_{n+1/2}(z) \le \frac{n}{|z|^2}, $$
from which Proposition 5 results. Suppose the following lemma has been proved.

Lemma 5.6. For all z ∈ C and k, n ∈ N such that 0 ≤ k ≤ n,
$$ (13) \qquad n + \frac12 \le |z| \;\Longrightarrow\; \frac{(2n-k)!\,(2n-2k)!}{k!\,[(n-k)!]^2\,(2|z|)^{2n-2k}} \le 1. $$
With this lemma, each term of the sum above is bounded by 1, which proves the first part of Proposition 5. Moreover, |z|² ≥ |z| − 1/2 for all z ∈ C. Since |z| ≥ n + 1/2 ≥ 1/2, |z| − 1/2 is positive and n ≤ |z| − 1/2, so that
$$ \frac{n}{|z|^2} \le \frac{|z| - \frac12}{|z|^2} \le 1. $$
Therefore j_n and y_n are bounded by 1.

Proof of Lemma 5.6. First we bound (2n − 2k)!/[(n − k)!]²:
$$ \frac{(2n-2k)!}{[(n-k)!]^2} = \frac{2^{n-k}\,(n-k)!\;(2n-2k-1)\cdots 5\cdot 3\cdot 1}{[(n-k)!]^2} = 4^{n-k}\, \frac{(2n-2k-1)\cdots 5\cdot 3\cdot 1}{(2n-2k)\cdots 6\cdot 4\cdot 2}. $$
Thus (2n − 2k)!/[(n − k)!]² ≤ 4^{n−k}. We can bound the expression in (13):
$$ \frac{(2n-k)!\,(2n-2k)!}{k!\,[(n-k)!]^2\,(2|z|)^{2n-2k}} \le \frac{(2n-k)!}{k!\,|z|^{2n-2k}}. $$
We can rewrite the product as
$$ \frac{(2n-k)!}{k!\,|z|^{2n-2k}} = \prod_{p=1}^{2n-2k} \frac{k+p}{|z|} = \prod_{p=1}^{n-k} \frac{(n+p)(n+1-p)}{|z|^2} = \prod_{p=1}^{n-k} \frac{\bigl(n+\frac12+(p-\frac12)\bigr)\bigl(n+\frac12-(p-\frac12)\bigr)}{|z|^2} $$
$$ = \prod_{p=1}^{n-k} \frac{\bigl(n+\frac12\bigr)^2 - \bigl(p-\frac12\bigr)^2}{|z|^2} \le \left(\frac{n+\frac12}{|z|}\right)^{2(n-k)} \le 1. $$
The lemma is proved.

6. Convergence of the Gegenbauer series. The main trick of the proof is the following. Instead of trying to prove a uniform bound on j_m and y_m, which is a very difficult task, we use the following "expansion":
$$ j_m = j_p \prod_{k=p+1}^{m} \frac{j_k}{j_{k-1}}, \qquad y_m = y_q \prod_{k=q+1}^{m} \frac{y_k}{y_{k-1}}. $$
Lemma 6.1 will provide us with a good bound on j_p(v) and y_q(u). Then, using Propositions 3 and 4, we obtain bounds on j_m(v) and y_m(u). Finally, in the next subsection we will prove that g_k(u, v) is smaller than 1 when v/u ≤ 2/√5. The proof of Theorem 4.1 results from all these elements.

As a summary of our results, we notice that, with the bounds of Propositions 3 and 4, we have proved that j_m(v) y_m(u) has a very simple behavior. For v ≪ m ≪ u, we can roughly state that j_m(v) y_m(u) decreases faster than a "geometrical" series of ratio v/(2m + 1); this is a very fast convergence region. On the contrary, for m ≫ u, this sequence behaves like a geometrical series of ratio v/u, which converges much more slowly. This is summarized in Figure 4.

Lemma 6.1 (estimation of j_p(v) and y_q(u)). With the hypotheses of Theorem 4.1, there exists some constant C > 0 such that
$$ |j_p(v)| \le C\, \frac{1}{v^{1-1/6}}, \qquad |y_q(u)| \le C\, \frac{1}{u^{1-1/6}}. $$
Fig. 4. Convergence of the Gegenbauer sequence. [Schematic, against the integer m: no convergence for m < v; geometrical ratio in v/(2m+1) for v < m < u; a transition region around q = E(u - 1/2); geometrical ratio in v/u beyond.]
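The two regimes summarized in Figure 4 can be observed numerically. The sketch below is illustrative only: the recurrences for $j_n$ and $y_n$ are the standard ones, and the parameters v = 5, u = 20 are arbitrary choices, not values taken from the paper.

```python
import math

def sph_j(nmax, x):
    # j_n by downward (Miller) recurrence, normalized with j_0(x) = sin(x)/x;
    # downward recursion is stable for the minimal solution j_n.
    big = nmax + int(x) + 25
    out = [0.0] * (big + 1)
    out[big], jp1 = 1e-300, 0.0
    for k in range(big, 0, -1):
        out[k - 1] = (2 * k + 1) / x * out[k] - jp1
        jp1 = out[k]
    scale = (math.sin(x) / x) / out[0]
    return [t * scale for t in out[: nmax + 1]]

def sph_y(nmax, x):
    # y_n by upward recurrence, stable for the dominant solution y_n.
    out = [-math.cos(x) / x, -math.cos(x) / x ** 2 - math.sin(x) / x]
    for n in range(1, nmax):
        out.append((2 * n + 1) / x * out[n] - out[n - 1])
    return out[: nmax + 1]

v, u, nmax = 5.0, 20.0, 45
j, y = sph_j(nmax, v), sph_y(nmax, u)
for m in (10, 20, 40):  # below, at, and beyond the transition m ~ u
    r = abs(j[m] * y[m]) / abs(j[m - 1] * y[m - 1])
    print(f"m={m:2d}  step ratio={r:.3f}  v/(2m+1)={v / (2 * m + 1):.3f}  v/u={v / u:.3f}")
```

For m well beyond the transition the printed step ratio settles near v/u = 0.25; for v < m < u the ratios fluctuate around v/(2m+1), since y_m oscillates below the transition and the fast decay holds only on average.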
Proof. From [1] we have two lemmas.

Lemma 6.2 (bound on $J_\nu(\nu)$ and $Y_\nu(\nu)$). The following asymptotic estimates are valid for $\nu \to +\infty$:

(14)
\[
J_\nu(\nu) \sim \frac{2^{1/3}}{3^{2/3}\Gamma(2/3)}\,\frac{1}{\nu^{1/3}}, \qquad
Y_\nu(\nu) \sim -\frac{2^{1/3}}{3^{1/6}\Gamma(2/3)}\,\frac{1}{\nu^{1/3}}.
\]

We also have

(15)
\[
J_\nu(\nu + z\nu^{1/3}) = 2^{1/3}\nu^{-1/3}\,\mathrm{Ai}(-2^{1/3}z) + O(\nu^{-1}), \qquad
Y_\nu(\nu + z\nu^{1/3}) = -2^{1/3}\nu^{-1/3}\,\mathrm{Bi}(-2^{1/3}z) + O(\nu^{-1}),
\]
where Ai and Bi are the Airy functions. The second set of equations gives sharper asymptotic estimates. Recall that
\[
j_n(z) = \sqrt{\frac{\pi}{2z}}\,J_{n+1/2}(z), \qquad
y_n(z) = \sqrt{\frac{\pi}{2z}}\,Y_{n+1/2}(z).
\]
For all $v \in \mathbb{R}^+$ with $v \le n + \frac12$, $J_{n+1/2}(v)$ is an increasing function of $v$ (see the zeros in 9.5.2 of [1]). Thus there exists $C > 1$ such that
\[
J_{n+1/2}(v) \le J_{n+1/2}\!\left(n+\tfrac12\right) \le C\left(n+\tfrac12\right)^{-1/3}
\]
with (14). For $j_p(v)$ we therefore have, for some $C$,
\[
|j_p(v)| \le \frac{C}{v^{1-1/6}}.
\]
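Estimate (14) can be checked directly. In the sketch below (illustrative, pure Python), $J_{n+1/2}(n+\frac12)$ is obtained from $j_n$ via $J_{n+1/2}(z)=\sqrt{2z/\pi}\,j_n(z)$, with $j_n$ computed by the standard downward (Miller) recurrence.

```python
import math

def sph_j(nmax, x):
    # j_n by downward (Miller) recurrence, normalized with j_0(x) = sin(x)/x
    big = nmax + int(x) + 25
    out = [0.0] * (big + 1)
    out[big], jp1 = 1e-300, 0.0
    for k in range(big, 0, -1):
        out[k - 1] = (2 * k + 1) / x * out[k] - jp1
        jp1 = out[k]
    scale = (math.sin(x) / x) / out[0]
    return [t * scale for t in out[: nmax + 1]]

c = 2 ** (1 / 3) / (3 ** (2 / 3) * math.gamma(2 / 3))  # constant in (14)
for n in (10, 20, 40):
    nu = n + 0.5
    J = math.sqrt(2 * nu / math.pi) * sph_j(n, nu)[n]  # J_{n+1/2}(n+1/2)
    print(f"nu={nu:5.1f}  J_nu(nu)={J:.5f}  asymptotic={c * nu ** (-1 / 3):.5f}")
```

Already at $\nu \approx 10$ the two columns agree to about three digits, consistent with the correction terms hidden in (15).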
For $|y_q(u)|$ we cannot use the same proof, since this is a decreasing function of $u$ and thus cannot be bounded by $|y_q(q+\frac12)|$. We need the more powerful estimate for $y_n(z)$ given by (15). Let us consider $z = -5$, for example, and denote $\nu = q + \frac12$. Then
\[
\nu + z\nu^{1/3} \le q - 9/2 \qquad (\nu \ge 1).
\]
We now use the monotonicity of $Y_\nu$: for some $C > 1$,
\begin{align*}
|y_q(u)| &\le \sqrt{\frac{\pi}{2|u|}}\,\bigl|Y_{q+1/2}(q - 9/2)\bigr| \qquad \text{since } q - 7/2 \ge u > q - 9/2 \\
&\le \sqrt{\frac{\pi}{2|u|}}\,\bigl|Y_{q+1/2}(\nu + z\nu^{1/3})\bigr|
\le \sqrt{\frac{\pi}{2|u|}}\,C\,|u|^{-1/3}
\le \frac{C}{|u|^{1-1/6}}.
\end{align*}
6.1. Bound on $g_k(u,v)$. In a code based on an octree decomposition, $v$ is equal to the diameter of a cluster and $u$ is the smallest distance between the centers of two "sufficiently" far apart clusters: such clusters cannot have a point or an edge in common. Thus if we choose to pave the space with cubes, we readily have
\[
v \le \sqrt{3}\,D, \qquad u \ge 2D,
\]
where $D$ is the length of an edge. Hence a realistic bound on $u/v$ is indeed
\[
\frac{u}{v} \ge \frac{2}{\sqrt{3}}.
\]
We prove that $g_k(u,v) \le 1$ if $u/v \ge \sqrt{5}/2$, which is a sufficient condition in our case, since $\frac{2}{\sqrt{3}} \ge \frac{\sqrt{5}}{2}$. Bounding $g_k(u,v)$ by 1 is equivalent to
\[
\frac{(2k+1)(2k+3)}{v^2} - \frac{(2k-1)(2k-3)}{u^2}
\ge \frac{2(2k+1)}{2k+5} - \frac{2k-1}{2k-5}
= 1 - \frac{24k-20}{4k^2-25}.
\]
We prove a result that is sufficient to complete the proof,
\[
\frac{(2k+1)(2k+3)}{v^2} - \frac{(2k-1)(2k-3)}{u^2} \ge 1,
\]
since $\frac{24k-20}{4k^2-25} \ge 0$. This is proved by
\begin{align*}
\frac{(2k+1)(2k+3)}{v^2} - \frac{(2k-1)(2k-3)}{u^2}
&\ge (2k-1)(2k-3)\left(\frac{1}{v^2} - \frac{1}{u^2}\right) \\
&\ge 4\left(\frac{k-3/2}{u}\right)^2\left(\frac{u^2}{v^2}-1\right)
\ge 4\left(\frac{u^2}{v^2}-1\right) \quad (\text{since } k-3/2 \ge u) \\
&\ge 1 \quad (\text{since } u/v \ge \sqrt{5}/2).
\end{align*}
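Both the geometric bound $u/v \ge 2/\sqrt{3} \ge \sqrt{5}/2$ and the sufficient inequality above are easy to verify mechanically. The grid of test values in this sketch is arbitrary; the worst admissible ratio $u/v = \sqrt{5}/2$ is tested for each pair $(k, u)$ with $k - 3/2 \ge u$.

```python
import math

# Octree geometry: v <= sqrt(3)*D and u >= 2D give u/v >= 2/sqrt(3) >= sqrt(5)/2.
assert 2 / math.sqrt(3) >= math.sqrt(5) / 2

# Sufficient inequality from the proof:
# (2k+1)(2k+3)/v^2 - (2k-1)(2k-3)/u^2 >= 1 when k - 3/2 >= u and u/v >= sqrt(5)/2.
for k in range(3, 60):
    for u in [x / 4 for x in range(4, 4 * int(k - 1.5) + 1)]:  # 1 <= u <= k - 3/2
        v = u / (math.sqrt(5) / 2)  # worst admissible ratio u/v = sqrt(5)/2
        lhs = (2 * k + 1) * (2 * k + 3) / v ** 2 - (2 * k - 1) * (2 * k - 3) / u ** 2
        assert lhs >= 1
print("inequality verified on the grid")
```

Since the left-hand side only grows as $u$ decreases (the numerator is positive), checking the grid up to $u = k - 3/2$ exercises the binding cases.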
7. Previous error estimations. We state Rahola's lemma (see Rahola's article [20]).

Lemma 7.1 (Rahola's Lemma 4.1). For each $\kappa$ and $u$ there exists a constant $C$ such that the truncation error of the dipole potential is bounded by
\[
\epsilon_T \le \frac{C}{u-v}\left(\frac{v}{u}\right)^{L+1},
\]
where $L$ is the number of terms in the truncated expansion, $u > v$, and $L > u$. Similar lemmas can be found in [21] and [18]. Our new result improves on Rahola's for two reasons:
• Rahola's error estimation is valid when the number of poles in the multipole expansion, $L$, is greater than the distance between two clusters. Our estimation, however, is valid as soon as the convergence takes place, that is, as soon as $L$ is greater than the diameter of a cluster. This is an important improvement, since we proved that the error and the convergence do not depend on the distance between the two clusters but rather on their size. Let us introduce some notation:
v = κ × diameter of a cluster,
u = κ × distance between two clusters, with v < u.
Grosso modo, to reach an accuracy $\epsilon$ we can truncate the expansion after
\[
L > u + C\,\frac{\log(1/\epsilon)}{\log(u/v)} \quad \text{in Rahola's theorem}, \qquad
L > Cv + C\log(1/\epsilon) \quad \text{in Theorem 1.}
\]
• A major improvement is that in our theorems all the constants are true constants and depend on no parameter of the problem (position of the points, distance between two clusters, or size of the clusters). In Rahola's article this limitation was emphasized. Lemma 4.1 states that there exists a "constant" $C$ such that the error in the FMM is bounded by a function of $C$, $u$, and $v$. The "constant" $C$ is independent of the number of poles taken in the multipole expansion, but it depends on the distance $u$ between two clusters. Thus it cannot be used when we need constants valid for all $u$ and $v$, that is, for an implementation which allows any configuration (shape, size) of the object. On the contrary, our estimates involve true constants and thus can be used safely.

Rahola uses the following expansion for $j_l$:
\[
j_l(v) = \frac{v^l}{1\cdot3\cdot5\cdots(2l+1)}\left(1 - \frac{\tfrac12 v^2}{1!\,(2l+3)} + \frac{(\tfrac12 v^2)^2}{2!\,(2l+3)(2l+5)} - \cdots\right)
\]
and the following approximation:

(16)
\[
j_l(v) \sim \frac{v^l}{1\cdot3\cdot5\cdots(2l+1)}\,e^{-v^2/4l},
\]
but this approximation is only valid if $l \gg 1$ and if the infinite sum can be approximated by its first terms, i.e., iff $v^2/4l \ll 1$.
Fig. 5. Comparison of our error estimation with the other papers, C = 1. [Log-scale plot of the truncation error versus |x| for |y| = 0.7|x| and L = |x| + 1, showing the real truncation error, the estimate of previous papers with C = 1, and our estimate.]
The same type of approximation is used for $y_l$:

(17)
\[
y_l(u) = -\frac{1\cdot3\cdot5\cdots(2l-1)}{u^{l+1}}\left(1 + \frac{\tfrac12 u^2}{1!\,(2l-1)} + \frac{(\tfrac12 u^2)^2}{2!\,(2l-1)(2l-3)} + \cdots\right)
= -\frac{1\cdot3\cdot5\cdots(2l-1)}{u^{l+1}} \sum_{r} \frac{(\tfrac12 u^2)^r}{r!\,(2l-1)(2l-3)\cdots(2l-2r+1)}
\sim -\frac{1\cdot3\cdot5\cdots(2l-1)}{u^{l+1}}\,e^{u^2/4l}.
\]
However, the infinite sum involves denominator factors that are decreasing: $2l-1, 2l-3, \ldots, 2l-2r+1, \ldots, 1, -1, \ldots, -\infty$. Thus the approximation by $e^{u^2/4l}$ is not justified unless $l \to +\infty$. The domain of validity of
\[
y_l(u) \sim -\frac{1\cdot3\cdot5\cdots(2l-1)}{u^{l+1}}\,e^{u^2/4l}
\]
cannot be assessed for finite $l$.

Figure 5 illustrates the fact that the constant $C$ in Rahola's lemma depends on $u$. Here we consider two vectors $x$ and $y$ with the relation $|y| = 0.7|x|$, and we compute the error
\[
\epsilon_{\mathrm{true}}(x,y) = \left|\,h_0^{(1)}(|x-y|) - \sum_{l=0}^{L}(2l+1)\,h_l^{(1)}(|x|)\,j_l(0.7|x|)\,P_l(\cos\gamma)\,\right|
\]
for $L = |x| + 1$. We plot, as a function of $|x|$, $\epsilon_{\mathrm{true}}(x,y)$ as well as $\epsilon_{\mathrm{Rahola}}(x,y)$ (Rahola's Lemma 4.1 with $C = 1$) and $\epsilon_{\mathrm{ours}}(x,y)$ (Theorem 1). This figure confirms that there exists $C$ such that $\epsilon_{\mathrm{true}}(x,y) \le C\,\epsilon_{\mathrm{ours}}(x,y)$. However, $\epsilon_{\mathrm{true}}(x,y)/\epsilon_{\mathrm{Rahola}}(x,y)$ cannot be bounded by a constant independent of $|x|$.
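The differing quality of approximations (16) and (17) can be reproduced in a few lines. This sketch is illustrative: the parameter pairs are arbitrary, $j_l$ is computed by the downward (Miller) recurrence, $y_l$ by the stable upward recurrence, and the two approximations are taken exactly as displayed in (16) and (17).

```python
import math

def sph_j(nmax, x):
    # j_l by downward (Miller) recurrence, normalized with j_0(x) = sin(x)/x
    big = nmax + int(x) + 25
    out = [0.0] * (big + 1)
    out[big], jp1 = 1e-300, 0.0
    for k in range(big, 0, -1):
        out[k - 1] = (2 * k + 1) / x * out[k] - jp1
        jp1 = out[k]
    s = (math.sin(x) / x) / out[0]
    return [t * s for t in out[: nmax + 1]]

def sph_y(nmax, x):
    # y_l by upward recurrence (stable for y)
    out = [-math.cos(x) / x, -math.cos(x) / x ** 2 - math.sin(x) / x]
    for n in range(1, nmax):
        out.append((2 * n + 1) / x * out[n] - out[n - 1])
    return out[: nmax + 1]

def dfact(m):  # double factorial m!! for odd m
    r = 1.0
    for i in range(1, m + 1, 2):
        r *= i
    return r

# Approximation (16) for j_l: good once v^2/(4l) << 1, poor otherwise.
for l, v in ((20, 2.0), (5, 6.0)):
    approx = v ** l / dfact(2 * l + 1) * math.exp(-v * v / (4 * l))
    exact = sph_j(l, v)[l]
    print(f"j_{l}({v}): v^2/4l={v * v / (4 * l):.2f}  rel. error={abs(approx - exact) / abs(exact):.4f}")

# Approximation (17) for y_l: poor for l near u.
u = 10.0
y = sph_y(45, u)
for l in (12, 40):
    approx = -dfact(2 * l - 1) / u ** (l + 1) * math.exp(u * u / (4 * l))
    print(f"y_{l}({u}): rel. error={abs(approx - y[l]) / abs(y[l]):.4f}")
```

With these values the $j_l$ approximation is accurate to a fraction of a percent when $v^2/4l = 0.05$ but off by tens of percent when $v^2/4l \approx 2$, while the $y_l$ approximation is off by tens of percent for $l$ near $u$ and becomes acceptable only for $l$ substantially larger than $u$, in line with the discussion above.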
Fig. 6. Behavior of $h_l^{(1)}(u)$ for $l \sim |u|$. [Log-scale plot of the modulus of $h_{l+5}^{(1)}(l)$ for $l = 0, \ldots, 25$, compared with the Koc et al. estimate.]
Fig. 7. Behavior of $h_l^{(1)}(u)$ for $u = 30$. [Log-scale plot of the modulus of $h_l^{(1)}(30)$ for $l = 30, \ldots, 65$, compared with the Koc et al. estimate.]
Similar remarks can be made regarding Koc, Song, and Chew in [18], as is illustrated by Figure 6. They use the following bounds on $j_l(v)$ and $h_l^{(1)}(u)$:

(18)
\[
|j_l(v)| \le \frac{e}{2}\,\frac{(e|v|)^l}{(2l+1)^{l+1}}, \qquad
|h_l^{(1)}(u)| \le \frac{2}{e|u|}\left(\frac{2l+1}{e|u|}\right)^{l}.
\]
While the first bound is correct and can be found in [1, Equation (9.1.62)], the second one is not valid for all $l$, and certainly not for $l \sim |u|$. As a matter of fact, the opposite seems to be true, as can be seen in Figure 7. Estimate (18) is used to bound the remaining terms in the series from $l \sim |u|$ to $+\infty$; however, this estimate is valid (up to a constant) only for $l \gg |u|$. This error is illustrated by Figure 6, where we take the order equal to $|u| + 5$, plot $|h_{l+5}^{(1)}(l)|$, and compare with the Koc, Song, and Chew estimate.

8. Numerical results. An article will be written about the implementation and performance of the FMM. Here we give only a limited presentation of a few numerical results.
Fig. 8. CETAF mesh.
The benchmark chosen, called "CETAF," was provided by Dassault (see the mesh in Figure 8). It is a real-life application and presents some numerical difficulties, such as the presence of a slit through the mesh and sharp edges. Reflection from one side of the slit to the other can be observed, and triangles on either side interact strongly. All these phenomena result in a bad condition number and slower convergence. The iterative scheme used is GMRES and the preconditioner is SPAI. We used the combined field integral equation (CFIE), which has nicer convergence properties than the EFIE or MFIE. This leads to a nonsymmetric matrix, which is why we chose GMRES as our iterative scheme. The reader interested in SPAI may consult [9], [5], [4], [16], and [2]. SPAI was tested on a sphere: a remarkable improvement (factor of 4-5) was obtained for the EFIE formulation, but the improvement with CFIE was marginal (20%-30%). As far as the author knows, no preconditioner is known to be more efficient on a general configuration for electrodynamics problems. The code was run on an HP PA 8000 200 MHz computer, on a single processor; the peak rate is 800 MFlops (two addition/multiplication units). A comparison with a brute-force resolution (= exact solution) showed that the relative error for the FMM (Euclidean norm) is less than 0.01. The "exact" solution was found by assembling the full matrix and inverting it with the Gauss method, or by using an iterative method with the full matrix. However, the largest test case for which the full matrix can be assembled corresponds to 5000 degrees of freedom. We found that for problems of size less than 5000 degrees of freedom the error is less than
CETAF                   10λ          11.7λ        25λ
FMM CPU time            1.44 h       1.84 h       12.81 h
FMM CPU / iter          1.55 min     2.35 min     13.03 min
FMM memory              22.3 Megs    37.0 Megs    173 Megs
# levels                5            6            7
Degrees of freedom      21564        45960        183840
Total CPU               1.71 h       2.42 h       21.7 h
SPAI (construction)     3.15 min     14.16 min    37.29 min
Resolution              1.57 h       1.99 h       20.63 h (7.16 h)
Total memory            80 Megs      210 Megs     550 Megs
# GMRES iterations      56           47           59

Fig. 9. CETAF test case.
Fig. 10. CPU / memory requirements.
0.01, but it was not possible to check the accuracy for the larger test cases shown here. The figure in parentheses ("20.63 h (7.16 h)," Figure 9) indicates that the close interactions were not stored in core memory but were recomputed at each iteration. Thus for the 25λ case, 7.16 h were spent recomputing the close interactions. To visualize the asymptotic growth of the complexity and memory requirements, we plot the CPU time and memory for the three frequencies and compare them with the predicted n log²(n) complexity for the CPU and n log(n) complexity for the memory (Figure 10). The intensity of the current on both sides of the CETAF is shown in Figures 11 and 12. The real part of the complex current (physical current) is shown in Figure 13.

9. Conclusion. In this article we have presented a theoretical tool to study the spherical Bessel functions and to bound sequences and series involving these Bessel functions. From that we obtained an optimal and sharp criterion to truncate the
Fig. 11. Intensity of the current.
Fig. 12. Shadow region.
Gegenbauer series and thus correctly define the number of terms in the transfer functions. This gives us control of the error due to the FMM. Moreover, we deduced from those results the asymptotic complexity of the FMM. All the results (truncation of the Gegenbauer series, convergence of the FMM, complexity of the algorithm) were verified and measured on a complete Maxwell code based on the FMM. We were able to develop this Maxwell code with an optimal, i.e., minimal, complexity and reach
Fig. 13. Real part of the complex current.
a million unknowns on a parallel computer. Those numerical results will be the object of a forthcoming paper.

REFERENCES

[1] M. Abramowitz and I. Stegun, Handbook of Mathematical Functions, National Bureau of Standards, Applied Mathematics Series 55, Washington, D.C., 1964.
[2] G. Alléon, M. Benzi, and L. Giraud, Sparse Approximate Inverse Preconditioning for Dense Linear Systems Arising in Computational Electromagnetics, Technical Report TR/PA/97/05, CERFACS, Toulouse, France, 1997.
[3] B. Alpert and R. Jakob-Chien, A fast spherical filter with uniform resolution, J. Comput. Phys., 136 (1997), pp. 580–584.
[4] S. T. Barnard, L. M. Bernardo, and H. D. Simon, An MPI Implementation of the SPAI Preconditioner on the T3E, Technical Report LBNL-40794, Lawrence Berkeley National Laboratory, Berkeley, CA, 1997.
[5] M. Benzi and M. Tůma, A sparse approximate inverse preconditioner for nonsymmetric linear systems, SIAM J. Sci. Comput., 19 (1998), pp. 968–994.
[6] R. Bharadwaj, A. Windemuth, S. Sridharan, B. Honig, and A. Nicholls, The fast multipole boundary element method for molecular electrostatics: An optimal approach for large systems, J. Comput. Chem., 16 (1995), pp. 898–913.
[7] O. Bokanowski and M. Lemou, Fast multipole method for multidimensional integrals, C. R. Acad. Sci. Paris Sér. I Math., 1 (1998), pp. 105–110.
[8] M. Challacombe and E. Schwegler, Linear scaling computation of the Fock matrix, J. Chem. Phys., 106 (1997), pp. 5526–5536.
[9] E. Chow and Y. Saad, Approximate inverse preconditioners via sparse-sparse iterations, SIAM J. Sci. Comput., 19 (1998), pp. 995–1023.
[10] R. Coifman, V. Rokhlin, and S. Wandzura, The fast multipole method for the wave equation: A pedestrian prescription, IEEE Trans. Antennas and Propagation, 35 (1993), pp. 7–12.
[11] A. Dutt, M. Gu, and V. Rokhlin, Fast algorithms for polynomial interpolation, integration and differentiation, SIAM J. Numer. Anal., 33 (1996), pp. 1689–1711.
[12] M. A. Epton and B. Dembart, Multipole translation theory for the three-dimensional Laplace and Helmholtz equations, SIAM J. Sci. Comput., 16 (1995), pp. 865–897.
[13] L. Greengard, J. Huang, V. Rokhlin, and S. Wandzura, Accelerating fast multipole methods for the Helmholtz equation at low frequencies, IEEE Comput. Sci. Engrg., 5 (1998), pp. 32–38.
[14] L. Greengard and V. Rokhlin, Rapid evaluation of potential fields in three dimensions, in Vortex Methods, Lecture Notes in Math. 1360, Springer-Verlag, New York, 1988, pp. 121–141.
[15] L. Greengard and V. Rokhlin, A New Version of the Fast Multipole Method for the Laplace Equation in Three Dimensions, Acta Numer. 6, Cambridge University Press, Cambridge, UK, 1997, pp. 226–269.
[16] M. J. Grote and T. Huckle, Parallel preconditioning with sparse approximate inverses, SIAM J. Sci. Comput., 18 (1997), pp. 838–853.
[17] K. C. Hill, M. I. Sancer, and S. Bindiganavale, Assessment and development of fast CEM solvers, in 29th Plasmadynamics and Lasers Conference, AIAA-98-2474, Albuquerque, NM, 1998.
[18] S. Koc, J. M. Song, and W. C. Chew, Error Analysis for the Numerical Evaluation of the Diagonal Forms of the Scalar Spherical Addition Theorem, Research Report CCEM-32-97, University of Illinois, Urbana, IL, 1997.
[19] C. C. Lu and W. C. Chew, A multilevel algorithm for solving a boundary integral equation of wave scattering, Micro. Opt. Tech. Lett., 7 (1994), pp. 466–470.
[20] J. Rahola, Diagonal forms of the translation operators in the fast multipole algorithm for scattering problems, BIT, 36 (1996), pp. 333–358.
[21] V. Rokhlin, Diagonal Forms of Translation Operators for the Helmholtz Equation in Three Dimensions, Research Report YALEU/DCS/RR-894, Yale University, New Haven, CT, 1992.
[22] V. Rokhlin, Analysis-based fast numerical algorithms of applied mathematics, in Proceedings of the International Congress of Mathematicians, Zürich, Switzerland, 1994.
[23] J. M. Song and W. C. Chew, Multilevel fast-multipole algorithm for solving combined field integral equations of electromagnetic scattering, Micro. Opt. Tech. Lett., 10 (1995), pp. 14–19.
[24] J. M. Song, C. C. Lu, and W. C. Chew, Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects, IEEE Trans. Antennas and Propagation, 45 (1997), pp. 1488–1493.
[25] N. Yarvin and V. Rokhlin, An improved fast multipole algorithm for potential fields on the line, SIAM J. Numer. Anal., 36 (1999), pp. 629–666.
[26] N. Yarvin and V. Rokhlin, A generalized one-dimensional fast multipole method with application to filtering of spherical harmonics, J. Comput. Phys., 147 (1998), pp. 594–609.
Chew, Error Analysis for the Numerical Evaluation of the Diagonal Forms of the Scalar Spherical Addition Theorem, Research report CCEM-32-97, University of Illinois, Urbana, IL 61801-2991, 1997. [19] C.C. Lu and W.C. Chew, A multilevel algorithm for solving a boundary integral equation of wave scattering, Micro. Opt. Tech. Letters, 7 (1994), pp. 466–470. [20] J. Rahola, Diagonal forms of the translation operators in the fast multipole algorithm for scattering problems, BIT, 36 (1996), pp. 333–358. [21] V. Rokhlin, Diagonal forms of translation operators for the Helmholtz equation in three dimensions, Research Report YALEU/DCS/RR-894, Yale University Department of Computer Science, New Haven, CT, 1992. [22] V. Rokhlin, Analysis-based fast numerical algorithms of applied mathematics, in Proceedings of the International Congress of Mathematicians, Z¨ urich, Switzerland, 1994. [23] J.M. Song and W.C. Chew, Multilevel fast-multipole algorithm for solving combined field integral equations of electromagnetic scattering, Micro. Opt. Tech. Letters, 10 (1995), pp. 14– 19. [24] J.M. Song, C.C. Lu, and W.C. Chew, Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects, IEEE Trans. Antennas and Propagation, 45 (1997), pp. 1488–1493. [25] N. Yarvin and V. Rokhlin, An improved fast multipole algorithm for potential fields on the line, SIAM J. Numer. Anal., 36 (1999), pp. 629–666. [26] N. Yarvin and V.Rokhlin, A generalized one-dimensional fast multipole method with application to filtering of spherical harmonics, J. Comp. Phys., 147 (1998), pp. 594–609.