RADIO SCIENCE, VOL. 47, RS1011, doi:10.1029/2011RS004891, 2012

A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm

Xiao-Min Pan,1,2 Jian-Gong Wei,2 Zhen Peng,2 and Xin-Qing Sheng1

Received 14 October 2011; revised 10 December 2011; accepted 15 December 2011; published 18 February 2012.

[1] The interpolative decomposition (ID) is combined with the multilevel fast multipole algorithm (MLFMA), denoted by ID-MLFMA, to handle multiscale problems. The ID-MLFMA first generates ID levels by recursively dividing the boxes at the finest MLFMA level into smaller boxes. It is specifically shown that near-field interactions with respect to the MLFMA, in the form of the matrix vector multiplication (MVM), are efficiently approximated at the ID levels. Meanwhile, computations on far-field interactions at the MLFMA levels remain unchanged. Only a small portion of matrix entries are required to approximate coupling among well-separated boxes at the ID levels, and these submatrices can be filled without computing the complete original coupling matrix. It follows that the matrix filling in the ID-MLFMA becomes much less expensive. The memory consumed is thus greatly reduced and the MVM is accelerated as well. Several factors that may influence the accuracy, efficiency and reliability of the proposed ID-MLFMA are investigated by numerical experiments. Complex targets are calculated to demonstrate the capability of the ID-MLFMA algorithm. Citation: Pan, X.-M., J.-G. Wei, Z. Peng, and X.-Q. Sheng (2012), A fast algorithm for multiscale electromagnetic problems using interpolative decomposition and multilevel fast multipole algorithm, Radio Sci., 47, RS1011, doi:10.1029/2011RS004891.

1 Center for Electromagnetic Simulation, School of Information and Electronics, Beijing Institute of Technology, Beijing, China.
2 ElectroScience Laboratory, Ohio State University, Columbus, Ohio, USA.

Copyright 2012 by the American Geophysical Union. 0048-6604/12/2011RS004891

1. Introduction

[2] Efficient and accurate solutions of electromagnetic (EM) scattering and radiation problems have attracted considerable interest for decades. Typical applications include radar cross section (RCS) estimation, antenna analysis and design, electromagnetic compatibility (EMC), electromagnetic interference (EMI), radiation hazards (EMR), remote sensing, etc. Among many full-wave numerical methods, the algorithms based on the method of moments (MoM) [Peterson et al., 1998] have been widely used due to their high fidelity and superior capability to handle arbitrarily shaped targets. A typical MoM solution procedure begins with properly meshing the target of interest and selecting basis functions to model the equivalent electric and magnetic currents. After modeling a target with a set of N expansion functions and performing the traditional Galerkin testing for the integral equation, an $N \times N$ dense impedance matrix is generated with a memory requirement of $O(N^2)$. The resultant matrix system can be solved by direct or iterative solvers. In terms of CPU time, the computational complexity of MoM is $O(N^3)$ for a conventional direct solver, such as LU, and $O(N^2)$ per iteration for an iterative algorithm, such as CG or GMRES. The RWG [Rao et al., 1982] basis functions are the typical basis functions selected for discretizing the integral equations. To achieve accurate solutions, the average size of each element is generally on the order of 1/10 wavelength (λ). Consequently, the size of the associated MoM matrix grows very rapidly as the object size becomes larger with respect to λ; this challenges the MoM for a variety of applications. To make matters worse, there are so-called multiscale applications. In these cases, targets are over-meshed to conduct wide-band calculations, or partly over-meshed to capture tiny geometrical structures. The discretization size is virtually independent of λ, and N can therefore be very large even for electrically small targets.

[3] In the MoM, both the CPU time and the memory space are a great burden even for a moderate N on modern computers. To mitigate this difficulty, MoM matrix equations are typically solved with iterative solvers along with techniques to accelerate the matrix vector multiplication (MVM). These accelerations are performed either by adopting sets of directional basis and testing functions which radiate narrow beams (giving rise to quasi-sparse impedance matrices) or by approximating the MVM through the physical or mathematical properties of the MoM matrix. Examples of the former case include the impedance matrix localization (IML) [Canning, 1995], complex multipole beam approach (CMBA) [Boag and Mittra, 1994], and wavelet expansion [Steinberg and Leviatan, 1993]. The latter includes the fast multipole method (FMM) [Coifman et al., 1993], its multilevel version


PAN ET AL.: ID-MLFMA

[Chew et al., 2001; Velamparambil and Chew, 2005; Pan and Sheng, 2006, 2008; Ergul and Gurel, 2009; Taboada et al., 2010], adaptive integral method (AIM) [Bleszynski et al., 1996], precorrected fast Fourier transform (pFFT) method [Phillips and White, 1997], multilevel matrix decomposition algorithm (MLMDA) [Michielssen and Boag, 1996; Rius et al., 2008], IES3 [Kapur and Long, 1998], QR-based or SVD-based methods [Tsang and Li, 2004; Breuer et al., 2003; Gope and Jandhyala, 2005; Seo and Lee, 2004; Burkholder and Lee, 2004], and adaptive cross-approximation (ACA) method [Kurz et al., 2002; Zhao et al., 2005; Shaeffer, 2008]. In some of these formulations the memory requirements and CPU time are reduced from $O(N^2)$ to $O(N^{1.5})$ for single level implementations and to $O(N \log^{\alpha} N)$ $(1 \le \alpha \le 2)$ for multilevel ones. The pFFT, FMM and its multilevel version are based on analytic properties of the Green’s function, while the MLMDA, IES3, ACA and the other QR-based methods are based on the rank deficiency of the coupling matrices between well-separated mesh partitions. The approximating methods mentioned above differ in implementation and performance despite being similar in essence. Among these methods, MLFMA seems to be the most appealing one because of its fidelity, efficiency and generality. Although MLFMA has well documented success in solving large-scale MoM-based problems, its application to scenarios involving over-meshing still presents challenges. In particular, it is well known that the FMM and MLFMA suffer from sub-wavelength breakdown when targets are over-meshed [Chew et al., 2001]. This results in expensive operations associated with near-field coupling submatrices, which could consist of millions of entries. Computing and storing these is often impractical.
[4] One solution for this difficulty is to combine the traditional MLFMA with its low-frequency versions developed through analytic approaches [Hu et al., 2001; Darve and Have, 2004; Cheng et al., 2006; Jiang and Chew, 2004; Daniela and Bunger, 2009; Vikram et al., 2009]. The efficiency issues associated with these approaches are discussed in Section 5.4. Another possibility is to adopt algebraic techniques, such as ACA, QR or SVD-based methods, to approximate the near-field interactions in the MLFMA. However, the QR or SVD-based methods require all entries of the near-field matrix to be computed. This prevents them from efficiently treating multiscale problems because evaluating and storing the near-field matrix would be too expensive. Furthermore, the complexity of the QR or SVD-based algorithms is $O(N^3)$, where N is the dimension of the objective matrix. Recently, Rodriguez et al. [2008] efficiently approximated near-field interactions of the FMM according to the corresponding data sparse representation of far-field interactions, which was obtained by applying the SVD to the aggregation matrix. But the error arising from this approximation is hard to analyze. On the other hand, the ACA can avoid the large time/memory requirement of computing all the matrix entries, while its error control scheme is still an ongoing area of research. In this paper, the interpolative decomposition (ID) [Liberty et al., 2007] is combined with the conventional MLFMA for multiscale problems. Particularly, the ID is employed here to efficiently approximate near-field interactions (NFIs) with respect to MLFMA (MLFMA-NFIs). Furthermore, a specific mechanism for the


ID approximation is developed to avoid the expensive evaluation of all the MLFMA-NFI matrix elements.

[5] The rest of the paper is organized as follows. Section 2 begins with a brief outline of the conventional MLFMA and then discusses the main framework of the proposed ID-MLFMA. Section 3 gives the basic idea of the ID algorithm and its application to matrix approximation. Some details about computations at the ID levels are discussed in Section 4, including the use of the artificial sphere. Section 5 presents some illustrative numerical results, and finally, a summary and some conclusions are given in Section 6.

2. Outline of MLFMA and ID-MLFMA

2.1. MLFMA

[6] For perfectly electric conducting (PEC) objects, discretization and testing of surface integral equations yields an $N \times N$ dense matrix equation in the form of

$$ Z \cdot I = V \tag{1} $$

where Z is the impedance matrix, N is the number of unknowns, I is the vector of equivalent current coefficients, and V corresponds to the incident wave. The matrix equation, equation (1), can be solved iteratively, and the required MVM can be accelerated by the FMM or MLFMA [Coifman et al., 1993; Chew et al., 2001]. The FMM/MLFMA decomposes the MVM into two parts: NFIs and far-field interactions (FFIs). The former is computed directly, while the latter is accelerated by FMM/MLFMA. The matrix equation in the context of FMM has the form

$$ Z \cdot I = \sum_{s \in B_o} Z_{o,s} \cdot I_s + D_o \cdot \sum_{s \notin B_o} T_{o,s} \cdot A_s \cdot I_s \tag{2} $$

where $Z_{o,s}$ is the impedance matrix corresponding to the observation box o and the source box s; $I_s$ contains the coefficients of the RWG basis functions in the box s; $B_o$ denotes the near neighbors of the box o; $T_{o,s}$ is the translator; $D_o$ and $A_s$ are the disaggregation and aggregation matrices. The first term in equation (2) accounts for the contribution from the self-coupling of box o and its near neighbors, while the second one collects the contribution from the remaining boxes.

[7] To conduct far-field interactions by MLFMA, a hierarchical tree structure is always constructed by recursively subdividing the spatial domain. The computational domain is first enclosed in a box; subsequently the box is divided into eight equal children, where each child is then recursively divided into smaller boxes. The recursive division does not stop until the size of the leaf box is less than a given size; in our case, we use 0.3λ. In the end, a tree structure is established. In general, all computations in the FMM/MLFMA are organized by boxes from this tree structure. In the MLFMA, interpolation/anterpolation combined with center-shifting operations is required to transfer far-field patterns from a child box to its parent box and vice versa. A detailed explanation of the MLFMA can be found in the work by Chew et al. [2001].

2.2. ID-MLFMA

[8] In order to compute the FFIs, the MLFMA maps basis/testing functions to plane waves using far-field expansion

functions. The interactions between the plane wave functions are efficiently handled through the three steps of aggregation, translation and disaggregation; furthermore, such interactions are independent of the supporting mesh. In contrast, the MLFMA-NFI matrix is heavily dependent on the properties of the mesh; as the mesh density grows, so does the computational complexity of evaluating the coupling in the near field. Consequently, the efficiency of MLFMA degrades rapidly for targets which have fine meshing.

[9] To circumvent this problem, we propose a new approach by combining the interpolative decomposition (ID) [Liberty et al., 2007] with the conventional MLFMA, denoted by ID-MLFMA. The finest level boxes in terms of the MLFMA, which are determined based on a size criterion, are decomposed further in the ID-MLFMA; the condition for discontinuing the decomposition is based on some predetermined number of basis elements (e.g., 50). Figure 1 shows such a tree structure for the ID-MLFMA in 2D cases. The levels in the tree are classified into three categories: the MLFMA levels, the transition level and the ID levels. The transition level is identical to the finest MLFMA level. It is worth pointing out that the box division procedure can be carried out independently for each box; namely, further division is conducted only on boxes with more than 50 unknowns. The computations on MLFMA-FFIs are conducted by the aggregation, translation and disaggregation, while those on MLFMA-NFIs are conducted through the skeleton approximation at the ID levels. The MLFMA-NFI matrix consists of block matrices that represent coupling among the finest MLFMA boxes. These block matrices generally permit no approximation because of their high rank. Submatrices with deficient rank should be extracted from them in advance. To this end, the ID-MLFMA further classifies MLFMA-NFIs into ID-NFIs and ID-FFIs at the ID levels; the distinction between the two is based upon the so-called “one-buffer-box” criterion [Chew et al., 2001]. Based on the ID [Liberty et al., 2007], an efficient data sparse representation is found to approximate the low rank ID-FFI submatrices. Because the high rank submatrices corresponding to the ID-NFIs are separated from the approximation, the resultant ID-FFI submatrices in the form of data sparse representation are very sparse. The proposed ID-MLFMA has the following virtues.

1. The error of the ID-MLFMA is controllable since both the MLFMA and the ID are error controllable.
2. The data sparse representation of ID-FFI submatrices is obtained without computing and storing all the original coupling matrix elements.
3. The integration of the ID into the MLFMA is straightforward.

Figure 1. Tree structure of the ID-MLFMA (2D case).

[10] The implementation details of the ID-MLFMA will be given in Section 4 after the discussion of the ID and its application to skeletonization in Section 3.

3. ID and Matrix Approximation

[11] Skeletons and data sparse representations are efficiently computed for the ID-FFIs by the ID. In the following, the mathematical background of the ID and the procedures for constructing skeletons are elucidated.

3.1. ID

[12] We first give a brief introduction to the ID proposed by Liberty et al. [2007]. Suppose C is a complex $m \times n$ matrix of rank k with $k \le m$ and $k \le n$. There exist a complex $k \times n$ matrix P and a complex $m \times k$ matrix B whose columns consist of a subset of the columns of C such that

1. some subset of the columns of P makes up the $k \times k$ identity matrix;
2. no element of P has an absolute value greater than 1;
3. $\| P_{k \times n} \|_2 \le \sqrt{k(n-k)+1}$;
4. the least (that is, the k-th greatest) singular value of P is at least 1; and
5. when $k < m$ and $k < n$, $\| C - B_{m \times k} \cdot P_{k \times n} \|_2 \le \sqrt{k(n-k)+1}\, \sigma_{k+1}$, where $\sigma_{k+1}$ is the (k + 1)-st greatest singular value of C.

[13] Based on these statements, an approximation can be derived as

$$ B_{m \times k} \cdot P_{k \times n} \approx C_{m \times n}, \tag{3} $$

when the exact rank of $C_{m \times n}$ is greater than k, but the (k + 1)-st greatest singular value of $C_{m \times n}$ is small.

[14] The ID employs randomness to reach the decomposition described in equation (3). It begins with generating a random vector w with Gaussian distribution and forming the product $y = w^H C$, where the superscript H denotes the adjoint operation. Vector y can be regarded as a random sample


from the range of C. Repeating this sampling process l (l > k) times gives

$$ y^{(i)} = (w^{(i)})^H \cdot C, \quad i = 1, 2, \cdots, l. \tag{4} $$

Owing to the randomness, the set $\{ w^{(i)} : i = 1, 2, \cdots, l \}$ of random vectors forms a linearly independent set and no linear combination falls in the null-space of C. Therefore, to produce an orthonormal basis of the range of C, we just need to orthonormalize the sample vectors by rewriting equation (4) into the compact form

$$ Y_{l \times n} = W_{l \times m} \cdot C_{m \times n}. \tag{5} $$

Employing some stable method for performing the orthonormalization, such as the pivoted QR factorization, a $k \times n$ matrix P whose columns form an orthonormal basis for the range of Y can be obtained, such that

$$ Y_{l \times n} = L_{l \times k} \cdot P_{k \times n}, \tag{6} $$

where the columns of $L_{l \times k}$ constitute a subset of the columns of Y. That is to say, there exists a set of integers $i_1, i_2, \cdots, i_k$ such that, for any $j = 1, 2, \cdots, k$, the j-th column of L is the $i_j$-th column of Y. Collect the corresponding columns of C into a complex $m \times k$ matrix B, so that, for any $j = 1, 2, \cdots, k$, the j-th column of B is the $i_j$-th column of C.

[15] The ID algorithm typically requires [Liberty et al., 2007]

$$ \mathrm{Cost} = l \cdot C_H + O(k \cdot m + k \cdot l \cdot n) \tag{7} $$

floating-point operations, where $C_H$ is the cost of applying $C^H$ to a vector.

[16] As shown by Liberty et al. [2007], l = k + 5 or l = k + 10 is sufficient. In practice, the rank k is rarely known in advance. The ID is usually implemented in an adaptive fashion where the number of samples is increased until the error satisfies the desired threshold ɛID, as discussed in Section 5.2. This will at most double the cost [Liberty et al., 2007]. Because it relies on randomness, the ID can fail; however, the probability of failure is very small [Liberty et al., 2007]. In short, compared with the classical pivoted QR factorization, the cost is greatly reduced since only the small matrix Y needs to be factorized.

[17] In some cases, it is more efficient to construct the matrix $W_{l \times m}$ in such a manner that the resultant matrix consists of uniformly randomly selected rows of the product of the discrete Fourier transform matrix and a random diagonal matrix [Liberty et al., 2007].

Figure 2. Construction of skeletons, (a) before and (b) after skeletonization.

3.2. Approximating Matrix by ID

[18] Suppose the boxes o and s are a pair of well separated boxes, as shown in Figure 2. There are $N_s$ basis functions in the source box s, with $I_s^{N_s}$ as the coefficients, and $N_o$ testing functions in the observation box o. Basis functions are denoted by lines with arrows, while testing functions are denoted by lines with double arrows. Skeletons to be determined are indicated by lines with solid arrows. Suppose $Z_{o,s}^{N_o \times N_s}$ is the rank deficient coupling matrix for these two boxes. Applying the ID of equation (3) to the coupling matrix yields

$$ Z_{o,s}^{N_o \times N_s} \approx B_{o,s}^{N_o \times k_s} \cdot R_s^{k_s \times N_s} \tag{8} $$


where $k_s$ ($k_s \le N_s$) is the number of skeletons for the source group s. $B_{o,s}^{N_o \times k_s}$ is the compressed representation of the matrix $Z_{o,s}^{N_o \times N_s}$, consisting of $k_s$ columns of the original matrix $Z_{o,s}^{N_o \times N_s}$. Furthermore, employing the ID to conduct the row approximation on $B_{o,s}^{N_o \times k_s}$ results in

$$ Z_{o,s}^{N_o \times N_s} \approx L_o^{N_o \times k_o} \cdot S_{o,s}^{k_o \times k_s} \cdot R_s^{k_s \times N_s} \tag{9} $$

where $k_o$ ($k_o \le N_o$) is the number of skeletons for the observation group o. $S_{o,s}^{k_o \times k_s}$ is the sampling matrix consisting of $k_o$ rows of $B_{o,s}^{N_o \times k_s}$.

[19] The matrix vector multiplication $Z_{o,s}^{N_o \times N_s} \cdot I_s^{N_s}$ evaluates the fields $E_o^{N_o}$ at the $N_o$ observation points generated by the source $I_s^{N_s}$. According to equations (8) and (9), the MVM can be written as

$$ Z_{o,s}^{N_o \times N_s} \cdot I_s^{N_s} \approx L_o^{N_o \times k_o} \cdot S_{o,s}^{k_o \times k_s} \cdot R_s^{k_s \times N_s} \cdot I_s^{N_s} \tag{10} $$

The matrices $R_s^{k_s \times N_s}$ and $L_o^{N_o \times k_o}$ represent projection matrices. The former selects the dominant $k_s$ radiating elements which sufficiently approximate the outgoing fields radiating from box s. The latter projects dominant field components onto each testing function located in box o. The sampling matrix $S_{o,s}^{k_o \times k_s}$ acts as a translation operator; it converts the outgoing skeletonized representations into incoming ones. It can essentially be viewed as a compressed version of the coupling matrix $Z_{o,s}^{N_o \times N_s}$. If the matrix dimensions satisfy $k_o \ll N_o$ and $k_s \ll N_s$, the compression can be significant.

Figure 3. The division of boxes, showing (a) $l_{trans}$-th and (b) ($l_{trans}$ + 1)-th levels.
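The sampling procedure of equations (4) to (6) and the two-sided factorization of equations (8) to (10) can be sketched in a few lines of NumPy. This is only an illustrative sketch under stated assumptions: the function names (`select_columns`, `randomized_id`, `two_sided_skeleton`) are hypothetical, a simple greedy pivoted Gram-Schmidt stands in for a production pivoted QR, the rank k is assumed known rather than found adaptively, and a synthetic rank-6 matrix stands in for a MoM coupling block.

```python
import numpy as np

def select_columns(Y, k):
    """Greedy column pivoting (pivoted Gram-Schmidt); returns a column permutation."""
    R = np.array(Y, dtype=complex)
    piv = np.arange(R.shape[1])
    for j in range(k):
        p = j + int(np.argmax(np.linalg.norm(R[:, j:], axis=0)))
        R[:, [j, p]] = R[:, [p, j]]          # bring the largest remaining column to slot j
        piv[[j, p]] = piv[[p, j]]
        q = R[:, j] / np.linalg.norm(R[:, j])
        R[:, j + 1:] -= np.outer(q, q.conj() @ R[:, j + 1:])  # deflate remaining columns
    return piv

def randomized_id(C, k, oversample=8, seed=0):
    """Column ID of C (m x n): returns (cols, P) with C ~= C[:, cols] @ P."""
    rng = np.random.default_rng(seed)
    m, n = C.shape
    l = k + oversample                                   # l > k samples, eq. (4)
    W = rng.standard_normal((l, m)) + 1j * rng.standard_normal((l, m))
    Y = W @ C                                            # compact form, eq. (5)
    piv = select_columns(Y, k)
    cols, rest = piv[:k], piv[k:]
    P = np.zeros((k, n), dtype=complex)
    P[:, cols] = np.eye(k)                               # identity on the skeleton columns
    P[:, rest] = np.linalg.lstsq(Y[:, cols], Y[:, rest], rcond=None)[0]
    return cols, P

def two_sided_skeleton(Z, k_o, k_s):
    """Z ~= L @ S @ R as in eq. (9); S holds k_o x k_s original entries of Z."""
    cols, R = randomized_id(Z, k_s)                      # eq. (8): Z ~= B @ R
    B = Z[:, cols]
    rows, P_o = randomized_id(B.conj().T, k_o)           # row ID of B via its adjoint
    return P_o.conj().T, B[rows, :], R

# Synthetic rank-6 block (an assumption standing in for a MoM submatrix)
rng = np.random.default_rng(1)
Z = (rng.standard_normal((60, 6)) + 1j * rng.standard_normal((60, 6))) @ \
    (rng.standard_normal((6, 80)) + 1j * rng.standard_normal((6, 80)))
I_s = rng.standard_normal(80) + 1j * rng.standard_normal(80)

L, S, R = two_sided_skeleton(Z, k_o=6, k_s=6)
E_fast = L @ (S @ (R @ I_s))                             # eq. (10), applied right to left
rel_err = np.linalg.norm(Z @ I_s - E_fast) / np.linalg.norm(Z @ I_s)
print(S.shape, rel_err)
```

Applying the three factors from right to left costs roughly $k_s N_s + k_o k_s + N_o k_o$ multiplications instead of $N_o N_s$, which is where the saving comes from when $k_o \ll N_o$ and $k_s \ll N_s$.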

4. Implementation of ID-MLFMA

[20] In the ID-MLFMA, interactions at levels above the transition level are carried out by aggregation, translation and disaggregation, which are well documented by Coifman et al. [1993] and Chew et al. [2001]. Computations below the transition level (i.e., the ID levels) are conducted by the skeleton approximation. The implementation of this approximation is discussed in detail in this section.

4.1. Extracting and Approximating Low Rank Submatrices

[21] The MLFMA-NFI matrix consists of block matrices corresponding to coupling among boxes at the transition level. These matrices are generally not rank deficient and are therefore not amenable to low rank approximation. However, the rank deficiency can be exploited at the ID levels. Suppose the $l_{trans}$-th level is the transition level, and box b1 is a box at this level which has three near neighbors: b2, b3 and b4, as shown in Figure 3a for a 2D case. It is clear that each matrix associated with interactions among these 4 boxes (including the self-coupling matrix) is an almost full-rank matrix, so the skeleton approximation cannot be efficiently applied to them. After dividing each of the 4 boxes into 4 sub-boxes, b12, b13 and b14 are near neighbors of the box b11, while all children of b2, b3 and b4, the gray boxes in Figure 3b, become second near neighbors of b11. Their coupling belongs to the ID-FFIs and the corresponding submatrices are low rank matrices. In this manner, all the low rank matrices can be extracted at the ($l_{trans}$ + 1)-th level.

[22] Applying the ID to the low rank matrices, the first term in equation (2) can be written as

$$ \sum_{s \in B_o} Z_{o,s} \cdot I_s = \sum_{p \in B_q} Z_{q,p} \cdot I_p + \sum_{p \notin B_q} L_q^{N_q \times k_q} \cdot S_{q,p}^{k_q \times k_p} \cdot R_p^{k_p \times N_p} \cdot I_p \tag{11} $$

where q and p, residing at the ($l_{trans}$ + 1)-th level, are children of boxes o and s, respectively. The notations in equation (11) are defined as in equation (10). Suppose $k_q = \frac{1}{C_q} \cdot N_q$ $(C_q \ge 1)$ and $k_p = \frac{1}{C_p} \cdot N_p$ $(C_p \ge 1)$; then the size of the low rank matrix $Z_{q,p}$ can be reduced by a factor of $C_q \cdot C_p$.

[23] The first term in equation (11) can be recursively approximated by skeletons at the ($l_{trans}$ + 2)-th level, the ($l_{trans}$ + 3)-th level, and so on. It is worth pointing out that the skeletonization can also be used to efficiently approximate the aggregation and disaggregation matrices.

4.2. Constructing Projection Matrices Efficiently

[24] A simple way to obtain the skeletons of the box q is to concatenate all the submatrices $Z_{q,p}$ and $Z_{p,q}^H$, where $p \notin B_q$, into a matrix as

$$ Z_q = \left( Z_{q,1}, \cdots, Z_{q,p}, \cdots, (Z_{1,q})^H, \cdots, (Z_{p,q})^H, \cdots \right) \tag{12} $$

which results in an $N_q \times N_{tot}$ matrix with $N_{tot} = 2 \times \sum_{p \notin B_q} N_p$. The ID can be utilized on the matrix $Z_q$ to conduct the row approximation as

$$ Z_q = L_q^{N_q \times k_q} \cdot B_q^{k_q \times N_{tot}} \tag{13} $$

where $B_q^{k_q \times N_{tot}}$ consists of $k_q$ rows of $Z_q$, and $L_q^{N_q \times k_q}$ is the incoming projection matrix of box q. Thus, for any submatrix $Z_{q,p}$ ($\forall p \notin B_q$), we have

$$ Z_{q,p} = L_q^{N_q \times k_q} \cdot B_{q,p}^{k_q \times N_p} \tag{14} $$

where $B_{q,p}^{k_q \times N_p}$ consists of $k_q$ rows of $Z_{q,p}$. According to Martinsson and Rokhlin [2005, 2007] and Greengard et al. [2009], the outgoing projection matrix of any box can be taken as the adjoint of its incoming projection matrix. As a result, one arrives at

$$ Z_{q,p} = L_q^{N_q \times k_q} \cdot S_{q,p}^{k_q \times k_p} \cdot (L_p^{N_p \times k_p})^H, \quad \forall p \notin B_q. \tag{15} $$

Although the above procedure seems simple enough, constructing the projection matrices R/L of $Z_q$ may be time consuming. The reason is that the computations in equation (13) require all the elements of $Z_{q,p}$ ($\{p : p \notin B_q\}$) to be available. Since these matrices are dense, computing and storing them is very expensive; additionally, although $N_q$ may be moderate, $N_{tot}$ can be considerably large, leading to a large $Z_q$ matrix.

[25] Conceptually, finding the skeletons of a box is a procedure of selecting basis/testing functions in this box according to their ability to radiate or receive. It is realized by calculating and sorting the singular values of the corresponding coupling matrix. Due to the fast decay of the Green’s function with respect to the distance between the source and observation points, Martinsson and Rokhlin [2005, 2007] and Greengard et al. [2009] proposed an approach to accelerate this step by introducing the “supercell” of box q. In the supercell approach, $Z_q$ includes all $Z_{q,p}$ ($p \ne q$). By contrast, in the ID-MLFMA, $Z_q$ consists of $Z_{q,p}$ ($p \notin B_q$) and excludes the ID-NFI block matrices. Consequently, we can substitute an artificial sphere for all the boxes p ($\{p : p \notin B_q\}$), as shown in Figure 4. The radius of the artificial sphere is $r_{sph} = 2.5 r_{box}$, where $r_{box}$ is the size of the box of interest. Because each second near neighbor (box p, $\{p : p \notin B_q\}$) resides outside the artificial sphere, the ability of the elements in the box q to radiate or receive can be well measured. Based on this, we take into account only the coupling between the box q and the corresponding artificial sphere to construct the skeletons of box q. In particular, we compute the MoM submatrices $Z_{q,a}$ and $Z_{a,q}$ for the mutual coupling between the box q and the artificial sphere a according to the standard MoM discretization procedure. The matrix used to construct the skeletons of box q is then written as $Z_q = (Z_{q,a}, (Z_{a,q})^H)$. After applying the ID to $Z_q$ as done in equation (13), we get $L_q$ and $R_q$ by

$$ Z_q^{N_q \times 2N_a} \approx L_q^{N_q \times k_q} \cdot B_q^{k_q \times 2N_a}, \qquad R_q^{k_q \times N_q} = (L_q^{N_q \times k_q})^H \tag{16} $$

where $N_a$ is the number of unknowns required to discretize the artificial sphere a. $N_a$ is determined by a user-specified accuracy ɛID for the ID approximation, as shown in Sections 5.1 and 5.2. Since the number of unknowns for the artificial sphere is always much less than $N_{tot} = 2 \times \sum_{p \notin B_q} N_p$, a lot of CPU time is saved in applying the ID to $Z_q$.

Figure 4. The artificial sphere for skeleton construction ($r_{sph} = 2.5 r_{box}$).

[26] The use of the artificial sphere provides a mechanism to efficiently construct projection matrices without the MLFMA-NFI matrix, which is usually expensive to evaluate in multiscale problems. After skeletonization, the matrices $S_{q,p}$ are evaluated directly according to the computed skeletons. Thus, only $k_q \times k_p$ matrix elements need to be computed and stored, instead of the $N_q \times N_p$ elements of the original $Z_{q,p}$. This may be one of the most attractive virtues of the ID-MLFMA. Furthermore, integrating the ID into an existing MLFMA is straightforward because all skeletonization operations are carried out on the MLFMA-NFI matrix. Also, the skeletonization procedure is inherently parallel.
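The artificial-sphere construction can be illustrated with a small numerical experiment. The sketch below is not the paper's MoM procedure: as a loudly stated assumption, point sources coupled by the scalar free-space kernel $e^{-jk_0 r}/(4\pi r)$ stand in for RWG functions and CFIE matrix entries, `r_box` is read as the box edge length, the sizes `N_q` and `N_a` are arbitrary, and a singular-value count stands in for the ID when estimating how many skeletons the threshold would retain.

```python
import numpy as np

def green(obs, src, k0):
    """Scalar free-space kernel exp(-j k0 r) / (4 pi r) between two point sets."""
    r = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=2)
    return np.exp(-1j * k0 * r) / (4.0 * np.pi * r)

def sphere_points(n, radius):
    """Roughly uniform points on a sphere (Fibonacci lattice)."""
    i = np.arange(n) + 0.5
    polar = np.arccos(1.0 - 2.0 * i / n)
    azim = np.pi * (1.0 + 5.0 ** 0.5) * i
    return radius * np.column_stack((np.sin(polar) * np.cos(azim),
                                     np.sin(polar) * np.sin(azim),
                                     np.cos(polar)))

wavelength = 1.0
k0 = 2.0 * np.pi / wavelength
r_box = 0.1 * wavelength            # box edge length (assumed reading of "size")
N_q, N_a, eps_id = 300, 400, 1e-3   # illustrative values only

rng = np.random.default_rng(0)
box_pts = (rng.random((N_q, 3)) - 0.5) * r_box   # stand-ins for the unknowns in box q
sph_pts = sphere_points(N_a, 2.5 * r_box)        # artificial sphere, r_sph = 2.5 r_box

Z_qa = green(box_pts, sph_pts, k0)               # N_q x N_a
Z_aq = green(sph_pts, box_pts, k0)               # N_a x N_q
Z_q = np.hstack((Z_qa, Z_aq.conj().T))           # Z_q = (Z_qa, Z_aq^H), N_q x 2 N_a

s = np.linalg.svd(Z_q, compute_uv=False)
k_q = int(np.sum(s > eps_id * s[0]))             # numerical rank at threshold eps_id
print(Z_q.shape, k_q)
```

The point of the sketch is the matrix size: the skeletons of box q are found from an $N_q \times 2N_a$ matrix whose column count depends only on the sphere discretization, not on how many unknowns sit in the far boxes.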

5. Numerical Results

[27] All the computations are carried out on a Dell Optiplex 980 personal computer configured with an i7 870 CPU and 16 GB of memory. RWG functions are chosen as basis and testing functions to discretize the CFIE with a combination coefficient of 0.2. The GMRES iteration process is terminated when the L2-norm of the residual vector is reduced to $10^{-3}$. To simplify the implementation of the ID-MLFMA, skeletons at the child level are not used to construct skeletons at the parent level. (See Notation for definitions of notation used in this section.)

[28] In the following, the compression ratio is computed by

$$ \eta_{Mem} = \frac{M_{ID}}{M_{\text{MLFMA-NFI}}} \tag{17} $$

$$ M_{ID} = M_{\text{ID-NFI}} + M_{proj} + M_{samp} \tag{18} $$

Table 1. Impacts of the Mesh Size of the Artificial Sphere on the ID-MLFMA

Average Size of Mesh (λ)      0.15     0.10     0.06
Memory (MB)   Mproj             98       98       98
              Msamp           1238     1234     1230
Time (s)      Tproj            338      478      843
              Tsamp            768      765      763
δJ (%)                        0.35     0.32     0.31
δRCS (%)                      0.30     0.30     0.30

Errors on the electric current, δJ, and on the radar cross section (RCS), δRCS, are computed via

$$ \delta_J \ \text{or} \ \delta_{RCS} = \frac{\| f_{ref} - f_{\text{ID-MLFMA}} \|}{\| f_{ref} \|} \tag{19} $$

where $f_{\text{ID-MLFMA}}$ is the result obtained by the ID-MLFMA, $f_{ref}$ is the data computed by an analytical approach or the MLFMA, and $\| \cdot \|$ is the Euclidean norm.

5.1. Mesh Size of the Artificial Sphere

[29] In this subsection, experiments are carried out to study how the mesh size of the artificial sphere impacts the efficiency and accuracy of the skeletonization by calculating the scattering from a PEC sphere, denoted by Sph-12, at the frequency of 60 MHz. Sph-12 has a diameter of 12 meters and is modeled by 164,268 unknowns with an average mesh size of 0.1 m. In the experiment, ɛID is 0.001. The average mesh size of the artificial sphere is set to be 0.15λ, 0.10λ and 0.06λ. In the ID-MLFMA a total of 6 levels are required (the 0th and 1st levels involve no computations). The 3rd level is the transition level between MLFMA and ID. The last 2 levels (4th and 5th) are the ID levels. Skeletonization is performed independently on boxes at the ID levels. No skeleton is constructed for boxes with fewer than 50 basis functions.

[30] Table 1 shows that the accuracy of the ID-MLFMA is quite insensitive to the mesh size of the artificial sphere. The RCS errors (taking the Mie series as the reference) for these three cases are almost identical. To understand this phenomenon, it is beneficial to recall the essence of the skeleton construction. The selection of basis and testing functions, which is carried out by the projection matrices R and L, is based on the elements which contribute significantly to the radiation and receiving capabilities of a given box. Such projection matrices are obtained by calculating and sorting the singular values of the original coupling matrix through some rank revealing technique (e.g., the ID). In this procedure, the ID cares more about the relative magnitudes of the singular values than about their accurate values. That is to say, the accuracy of the singular values does not matter much as long as the singular values can be sorted correctly.
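The error measure of equation (19), which produces the δJ and δRCS entries of Table 1, can be sketched as follows. The function name `relative_error` and the sample vectors are hypothetical and serve only to show the arithmetic; they are not data from the paper.

```python
import numpy as np

def relative_error(f_ref, f_test):
    """Relative Euclidean-norm error of eq. (19), returned as a fraction (not percent)."""
    f_ref = np.asarray(f_ref, dtype=complex)
    f_test = np.asarray(f_test, dtype=complex)
    return np.linalg.norm(f_ref - f_test) / np.linalg.norm(f_ref)

# Hypothetical sample values, for illustration only
f_ref = np.array([3.0, 4.0])
f_id = np.array([3.0, 4.05])
print(100.0 * relative_error(f_ref, f_id))  # multiply by 100 to report a percentage
```

The same function applies to current coefficients and to sampled RCS values, since equation (19) only requires two vectors of equal length.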
[31] To support this analysis, the normalized singular values of the coupling matrix for a selected box (a box containing 305 unknowns at the 4th level) are calculated under different meshings of the artificial sphere. As shown in Figure 5, the singular values are almost identical in these three cases because near neighbor coupling is excluded from $Z_q$ and $r_{sph}$ is large in comparison with the box size. Since the ID employs randomness in finding skeletons, the skeletons obtained in these three cases are not exactly the same. However, 95% of the skeletons are identical. In other words, the skeletons are insensitive to the mesh size of the artificial sphere. It should be noted that the quadrature rule order does affect the accuracy of the matrix elements and thus the accuracy of very small singular values (i.e., the ones having normalized magnitudes less than $10^{-5}$) [Rius et al., 2008]. Since a threshold of 0.001 always yields a sufficiently accurate ID approximation, as will be shown in Section 5.2, the basis/testing functions associated with small singular values contribute little to the accuracy of the ID. Additionally, high order quadrature rules are unnecessary to some extent because the Sph-12 is already overly meshed. In short, the quadrature rule order does not play an important role in constructing skeletons.

5.2. Threshold for Skeleton Constructing

[32] A threshold ɛID, closely related to the singular values of the objective matrix, should be prescribed in advance for the ID to construct skeletons. The following experiments show the impact of ɛID on the efficiency and accuracy of the ID-MLFMA. In these experiments, the ID-MLFMA computations are conducted on the Sph-12 at 60 MHz by setting ɛID to be 0.01, 0.001 and 0.0001, respectively. The average mesh size of the artificial sphere is set to be 0.15λ according to the results in Section 5.1. Table 2 shows that the accuracy of the ID-MLFMA is acceptable for the Sph-12 even when ɛID = 0.01, and that the accuracy can be further improved by decreasing the threshold.
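How a relative threshold on singular values trades rank against accuracy can be seen in a small experiment. The sketch below deliberately swaps the ID for a truncated SVD, because the SVD makes the link between the threshold and the 2-norm error exact; the scalar kernel, the two unit-cube point clusters and the helper name `truncate` are assumptions chosen for illustration, not the paper's setup.

```python
import numpy as np

def truncate(Z, eps):
    """Keep singular values above eps * s_max; returns (rank, approximation)."""
    U, s, Vh = np.linalg.svd(Z, full_matrices=False)
    k = int(np.sum(s > eps * s[0]))
    return k, (U[:, :k] * s[:k]) @ Vh[:k, :]

# Two well-separated point clusters coupled by a scalar kernel (assumed stand-in),
# with lengths measured in wavelengths so that k0 = 2 pi
rng = np.random.default_rng(0)
src = rng.random((120, 3))
obs = rng.random((100, 3)) + np.array([3.0, 0.0, 0.0])
r = np.linalg.norm(obs[:, None, :] - src[None, :, :], axis=2)
Z = np.exp(-2j * np.pi * r) / (4.0 * np.pi * r)

for eps in (1e-2, 1e-3, 1e-4):
    k, Z_k = truncate(Z, eps)
    err = np.linalg.norm(Z - Z_k, 2) / np.linalg.norm(Z, 2)
    print(eps, k, err)   # retained rank and relative 2-norm error for this threshold
```

For the truncated SVD the relative 2-norm error never exceeds the threshold, and the retained rank can only grow as the threshold is tightened; the ID satisfies a weaker bound of the form given in item 5 of Section 3.1.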
In summary, we conclude that the error of the ID-MLFMA can be well controlled since both the ID and the MLFMA are error controllable.

[33] In the following computations, the mesh size of the artificial sphere is set to 0.15λ, and a threshold of 0.001 is used in the ID-MLFMA computations, according to the investigations in Sections 5.1 and 5.2.

Figure 5. The normalized singular values of a selected box.

Table 2. Impacts of the ε_ID on the ID-MLFMA

ε_ID              0.01    0.001   0.0001
Memory (MB)
  M_proj            91       98      162
  M_samp           965     1238     2436
Time (s)
  T_proj           337      338      339
  T_samp           598      768     1343
δJ (%)            1.43     0.32     0.05
δRCS (%)          1.22     0.30     0.05

5.3. Comparison Between ID and Pivoted QR Factorization

[34] The outgoing and incoming projection matrices can also be computed by the pivoted QR factorization [Martinsson and Rokhlin, 2005, 2007; Greengard et al., 2009]. However, Liberty et al. [2007] showed that the ID is consistently more efficient than the pivoted QR, especially when the objective matrix is considerably rank deficient. For a 4096 × 4096 matrix with an effective rank of 248, the ID is 11 times faster than the pivoted QR, as shown by Liberty et al. [2007]. In the following experiments, we compare the efficiency of the ID and the pivoted QR by computing the skeletons of Sph-12 at 60 MHz. Regarding the size of the matrix Z_q, N_q is no more than 300 and N_tot is 10800. The statistics in Table 3 for computations at the 4th level, where 1160 non-empty boxes reside, support the statement by Liberty et al. [2007].

Table 3. CPU Time (s) Used by the ID and the Pivoted QR

Average size of mesh (λ)    0.15    0.10    0.06
ID                            63     117     225
Pivoted QR                   329     850    1607

5.4. Performance of ID-MLFMA

[35] To investigate the performance of the ID-MLFMA, we calculate the scattering from Sph-12 at three different frequencies: 90 MHz, 60 MHz and 30 MHz. The RCS is also calculated by the traditional MLFMA for comparison. Tables 4 and 5 list the resources required by the traditional MLFMA and by the ID-MLFMA. The ID-MLFMA shows superior performance compared with the traditional MLFMA. For example, in the 30 MHz case the ID-MLFMA needs less than 2.7 GB of memory, while the traditional MLFMA is estimated to require 47.9 GB for the NFI matrix. The computations at 90 MHz and 60 MHz likewise show that the ID-MLFMA saves memory and accelerates the calculations.

Table 4. Computational Statistics of the MLFMA on the Sph-12

Frequency (MHz)                   90      60      30
Average size of mesh (λ)        0.15    0.10    0.06
Finest MLFMA level               3rd     3rd     2nd
Finest MLFMA box size (λ)       0.45    0.30     0.3
M_MLFMA-NFI (MB)               10795   10795   47905
T_MLFMA-NFI (s)                 3755    5247      –a
T_MVM (s)                       14.3    14.3      –a
T_tot (s)                       4163    5663      –a

a Computations cannot be completed because of limited memory.

Table 5. Computational Statistics of the ID-MLFMA on the Sph-12

Frequency (MHz)        90      60      30
Finest ID level       4th     5th     5th
Memory (MB)
  M_ID-NFI           2717     675     675
  M_proj               35      98     221
  M_samp              215    1238    1746
  η_Mem               3.6     5.4    18.1
Time (s)
  T_ID-NFI           2068    2823    2913
  T_proj              187     338    1090
  T_samp               46     768    4916
  T_MVM               4.1     2.8     3.0
  T_tot              2421    4019    9069
δJ (%)               0.11    0.32    0.44
δRCS (%)             0.11    0.30    0.41

[36] Wide-band computations are conducted on the NASA almond to further investigate the performance of the ID-MLFMA. The frequency is swept from 1.4 GHz to 14.0 GHz in steps of 200 MHz. The almond is modeled by 37,122 unknowns with an average mesh size of about 0.1λ at 14.0 GHz and 0.015λ at 1.4 GHz. A 6-level oct-tree is constructed for all the ID-MLFMA computations. Since the computations are costly, only 8 sampling points (1.4 GHz, 2.0 GHz, 4.0 GHz, 6.0 GHz, 8.0 GHz, 10.0 GHz, 12.0 GHz and 14.0 GHz) are computed by the MLFMA. In the MLFMA computations, the division of boxes stops when the leaf box size reaches about 0.29λ. As a result, a 6-level MLFMA is used in the 14.0 GHz case, and a 3-level one in the 1.4 GHz case. Figure 6 presents the monostatic RCS results in the direction (90°, 30°) at different frequencies. It shows that the RCS results computed by the ID-MLFMA agree very well with those of the conventional MLFMA. Figure 7 plots M_ID against M_MLFMA-NFI at different frequencies. In all ID-MLFMA computations, M_ID-NFI = 41 MB. Figure 7 indicates that the memory consumption is significantly reduced when the frequency is low. For example, the MLFMA needs 7326 MB of memory to store the corresponding NFI matrix in the 1.4 GHz case. This is reduced to 491 MB, a factor of about 15, when the ID is employed. Figure 8 lists the statistics on T_MLFMA-NFI and T_ID, where T_ID = T_ID-NFI + T_proj + T_samp. Figure 9 gives statistics on T_MVM. As shown in Figures 8 and 9, the ID-MLFMA reduces the total solution time in two ways. On the one hand, CPU time is reduced in the ID-MLFMA because only a small portion of the MLFMA-NFI matrix elements need to be evaluated. In the 1.4 GHz case, the ID-MLFMA cuts the matrix filling time from 5840 s to 2593 s, a factor of over 2.0 compared with the MLFMA. On the other hand, the MVM is accelerated greatly by the approximation. As shown in Figure 9, the time for one MVM (the time for FFI is not included in the statistics) is reduced from 9.8 s to 0.9 s in the 1.4 GHz case, a factor of roughly 11.

Figure 6. Monostatic RCS by the NASA almond under different frequencies.

Figure 7. Memory for the MoM matrix under different frequencies for the NASA almond.

Figure 8. Time filling the MoM matrix under different frequencies for the NASA almond.

Figure 9. Time for one MVM under different frequencies for the NASA almond.

[37] The RCS of a ship model is calculated to demonstrate the capability of the ID-MLFMA. The incident plane wave illuminates the ship from the direction (90°, 90°) at a frequency of 200 MHz. The ship, 100 meters long in its largest dimension, is simulated using 94,008 unknowns. Because there are many fine geometrical details, the resulting mesh is, as expected, drastically non-uniform, as shown in Figure 10. The longest edge is about 0.1λ, and the shortest one is less than λ/100. This is a typical multiscale application. The mesh is generated in such a manner that only the tiny structures are overly meshed to capture their geometrical shapes while minimizing the total number of unknowns. We employ this ship model to exhibit the capability of the ID-MLFMA in solving multiscale problems. Whether the mesh is the most efficient one for conducting the simulation at a given accuracy is beyond the scope of the current study.

Figure 10. The ship model.

[38] A 5-level tree is used for the pure MLFMA, while a 7-level one is constructed for the ID-MLFMA. Namely, the 4th level is the finest MLFMA level, and it is the transition level in the ID-MLFMA computation. The statistics on the computational resources are listed in Table 6. The computed RCS is presented in Figure 11. As shown in Table 6, the relative RCS error against the results from the MLFMA is about 0.6%. The compression ratio of memory for the MoM matrix is over 6.0. The CPU time to fill this matrix is cut by a factor of 2.4. The MVM is accelerated by a factor of 6.5. As a result, the total solution time is reduced by a factor of over 3.0. The acceleration of the total solution time can reach 6.5 for monostatic RCS calculations, where the cost of the iterations dominates that of the whole computation.

[39] In particular, our experiments reveal that the proposed ID-MLFMA can significantly reduce the memory requirement as well as the total solution time compared with the MLFMA. Time is saved not only in the NFI matrix filling but also in the MVM for the iterations. This does not always hold true for other low-frequency fast algorithms. It was reported by Vikram et al. [2009] that the CPU time for every MVM was increased by the MLFMA based on accelerated Cartesian expansion (ACE), because of the additional computational cost at the ACE levels in comparison to the MLFMA. As indicated by Hu et al. [2001], the memory requirement and the CPU time for one MVM with respect to the number of unknowns in the multilevel FIPWA were almost the same as those of the traditional MLFMA.

[40] As with other fast algorithms [Hu et al., 2001; Jiang and Chew, 2004], the loop-tree method can be integrated with the ID-MLFMA to solve the low-frequency MoM accuracy problem [Zhao and Chew, 2000] when the frequency becomes lower.
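The factors quoted in Section 5.4 can be re-derived from the tabulated values. The sketch below is a small arithmetic check under two stated assumptions. First, it assumes that η_Mem in Table 5 is the ratio of M_MLFMA-NFI to the sum M_ID-NFI + M_proj + M_samp; this definition is not restated in this section and is inferred because it reproduces the printed values. Second, it uses a toy model in which the total time is the fill time plus the number of MVMs times T_MVM, ignoring the far-field (FFI) time and all other overheads. The function names `memory_ratio` and `speedup` are hypothetical.

```python
# Values transcribed from Tables 4 and 5 (Sph-12), all in MB:
# frequency (MHz) -> (M_MLFMA-NFI, M_ID-NFI, M_proj, M_samp, eta_Mem as printed)
sph12 = {
    90: (10795, 2717, 35, 215, 3.6),
    60: (10795, 675, 98, 1238, 5.4),
    30: (47905, 675, 221, 1746, 18.1),
}


def memory_ratio(m_mlfma, m_id_nfi, m_proj, m_samp):
    """Assumed definition: eta_Mem = M_MLFMA-NFI / (M_ID-NFI + M_proj + M_samp)."""
    return m_mlfma / (m_id_nfi + m_proj + m_samp)


for freq, (m_ref, m_nfi, m_proj, m_samp, eta_printed) in sph12.items():
    eta = memory_ratio(m_ref, m_nfi, m_proj, m_samp)
    print(f"{freq} MHz: eta_Mem = {eta:.2f} (Table 5 lists {eta_printed})")

# Ship model (Table 6): matrix fill time and time for one MVM, in seconds.
fill_mlfma, fill_id = 4247.0, 1753.0
mvm_mlfma, mvm_id = 16.2, 2.5


def speedup(n_mvm):
    """Toy model: total time = fill time + n_mvm * (time for one MVM)."""
    return (fill_mlfma + n_mvm * mvm_mlfma) / (fill_id + n_mvm * mvm_id)


for n_mvm in (100, 1_000, 100_000):
    print(f"{n_mvm:>7d} MVMs: speed-up = {speedup(n_mvm):.2f}")
```

In this simplified model the speed-up grows with the number of MVMs and approaches the ratio of the two T_MVM values, 16.2/2.5 ≈ 6.5, which is consistent with the limit stated for monostatic RCS calculations. Any far-field time common to both solvers would lower the ratio, so the sketch should be read as an idealized estimate.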

Table 6. Computational Statistics on the Ship Model

                                  MLFMA    ID-MLFMA
M_MLFMA-NFI / M_ID (MB)           12085        1950
T_MLFMA-NFI / T_ID (s)             4247        1753
T_MVM (s)                          16.2         2.5
T_tot (s)                          7882        2504
δJ (%)                                –        0.61
δRCS (%)                              –        0.60

Figure 11. Bi-static RCS from the ship model.

6. Conclusions

[41] The ID-MLFMA proposed here combines the interpolative decomposition (ID) with the conventional MLFMA for multiscale problems. Through the ID, the effective unknowns, referred to as skeletons, are obtained to approximate the coupling among well-separated boxes at the ID levels. With an artificial sphere, skeletons are constructed without evaluating entries of the MLFMA-NFI matrix. In particular, the memory consumed by the skeletonized coupling submatrices at the ID levels can be greatly reduced. Furthermore, the matrix filling becomes less expensive and the matrix-vector multiplication is accelerated. Numerical experiments also show that the ID-MLFMA is error controllable and that its accuracy is quite insensitive to the mesh employed for the artificial sphere. The results indicate that the ID-MLFMA is much more efficient than the conventional MLFMA in terms of memory consumption and total solution time for multiscale problems. Future work can be extended to handle radiation problems, where nonuniform meshes are generally required to accurately model the details in the neighborhood of the feed point. Additionally, some techniques, such as the loop-tree basis, can be incorporated into the ID-MLFMA to solve the low-frequency MoM accuracy problem.

Notation

M_MLFMA-NFI   The memory for all MLFMA-NFI block matrices (Z_o,s, s ∈ B_o)
M_ID-NFI      The memory for all ID-NFI block matrices (Z_q,p, p ∈ B_q) at the finest ID level
M_proj        The memory for projection matrices R/L at all ID levels
M_samp        The memory for sampling matrices S at all ID levels
T_MLFMA-NFI   The CPU time filling the MLFMA-NFI matrix
T_ID-NFI      The CPU time filling the NFI matrix at the finest ID level
T_proj        The CPU time constructing projection matrices R/L at all ID levels
T_samp        The CPU time filling sampling matrices S at all ID levels
T_MVM         The CPU time for one MVM excluding the time for the corresponding MLFMA-FFI
T_tot         The total solution time

[42] Acknowledgments. This work was supported by the NSFC under grant 10832002, by the NSFC under grant 60901005, by the excellent young scholars research fund of Beijing Institute of Technology under grant 2010YS0502, and by the basic research fund of Beijing Institute of Technology under grant 20090542001.

References

Bleszynski, E., M. Bleszynski, and T. Jaroszewicz (1996), AIM: Adaptive integral method for solving large-scale electromagnetic scattering and radiation problems, Radio Sci., 31, 1225–1251.
Boag, A., and R. Mittra (1994), Complex multipole beam approach to electromagnetic scattering problems, IEEE Trans. Antennas Propag., 42, 366–372.
Breuer, A., P. Borderies, and J. L. Poirier (2003), A multilevel implementation of the QR compression for method of moments, IEEE Trans. Antennas Propag., 51, 2520–2522.
Burkholder, R. J., and J. F. Lee (2004), Fast dual-MGS block-factorization algorithm for dense MoM matrices, IEEE Trans. Antennas Propag., 52, 1693–1699.
Canning, F. X. (1995), Solution of impedance matrix localization form of moment method problems in five iterations, Radio Sci., 30, 1371–1384, doi:10.1029/95RS01457.
Cheng, H., W. Y. Crutchfield, Z. Gimbutas, L. Greengard, J. F. Ethridge, J. Huang, V. Rokhlin, N. Yarvin, and J. Zhao (2006), A wideband fast multipole method for the Helmholtz equation in three dimensions, J. Comp. Phys., 216, 300–325.
Chew, W. C., J. M. Jin, E. Michielssen, and J. Song (2001), Fast Efficient Algorithms in Computational Electromagnetics, Artech House, Boston, Mass.
Coifman, R., V. Rokhlin, and S. Wandzura (1993), The fast multipole method for the wave equation: A pedestrian prescription, IEEE Trans. Antennas Propag., 35, 7–12.
Daniela, W., and R. Bunger (2009), An efficient implementation of the combined wideband MLFMA/LF-FIPWA, IEEE Trans. Antennas Propag., 57, 467–474.
Darve, E., and P. Have (2004), Efficient fast multipole method for low-frequency scattering, J. Comp. Phys., 197, 341–363.
Ergul, O., and L. Gurel (2009), A hierarchical partitioning strategy for an efficient parallelization of the multilevel fast multipole algorithm, IEEE Trans. Antennas Propag., 57, 1740–1750.
Gope, D., and V. Jandhyala (2005), Efficient solution of EFIE via low-rank compression of multilevel predetermined interactions, IEEE Trans. Antennas Propag., 53, 3324–3333.
Greengard, L., D. Gueyffier, P. G. Martinsson, and V. Rokhlin (2009), Fast direct solvers for integral equations in complex three-dimensional domains, Acta Numer., 18, 243–275.
Hu, B., W. C. Chew, and S. Velamparambil (2001), Fast inhomogeneous plane wave algorithm for the analysis of electromagnetic scattering, Radio Sci., 36, 1327–1340, doi:10.1029/2000RS002329.
Jiang, L. J., and W. C. Chew (2004), Low-frequency fast inhomogeneous plane-wave algorithm (LF-FIPWA), Microwave Opt. Technol. Lett., 40, 117–122.
Kapur, S., and D. E. Long (1998), IES3: Efficient electrostatic and electromagnetic solution, IEEE Comput. Sci. Eng., 5, 60–67.
Kurz, S., O. Rain, and S. Rjasanow (2002), The adaptive cross-approximation technique for the 3-D boundary-element method, IEEE Trans. Magn., 38, 421–424.
Liberty, E., F. Woolfe, P. G. Martinsson, V. Rokhlin, and M. Tygert (2007), Randomized algorithms for the low-rank approximation of matrices, Proc. Natl. Acad. Sci. U. S. A., 104, 20,167–20,172.
Martinsson, P. G., and V. Rokhlin (2005), A fast direct solver for boundary integral equations in two dimensions, J. Comp. Phys., 205, 1–23.
Martinsson, P. G., and V. Rokhlin (2007), An accelerated kernel-independent fast multipole method in one dimension, SIAM J. Sci. Comput., 29, 1160–1178.
Michielssen, E., and A. Boag (1996), A multilevel matrix decomposition algorithm for analyzing scattering from large structures, IEEE Trans. Antennas Propag., 44, 1086–1093.
Pan, X. M., and X. Q. Sheng (2006), A highly efficient parallel approach of multilevel fast multipole algorithm, J. Electromagn. Waves Appl., 20, 1081–1092.
Pan, X. M., and X. Q. Sheng (2008), A sophisticated parallel MLFMA for scattering by extremely large targets, IEEE Antennas Propag. Mag., 50, 129–138.
Peterson, A. F., S. L. Ray, and R. Mittra (1998), Computational Methods for Electromagnetics, IEEE Press, Piscataway, N. J.
Phillips, J. R., and J. White (1997), A precorrected-FFT method for electrostatic analysis of complicated 3-D structures, IEEE Trans. Comput. Aided Des. Integr. Circuits Syst., 16, 1059–1072.
Rao, S. M., D. R. Wilton, and A. W. Glisson (1982), Electromagnetic scattering by surfaces of arbitrary shape, IEEE Trans. Antennas Propag., 30, 409–418.
Rius, J. M., J. Parron, A. Heldring, J. M. Tamayo, and E. Ubeda (2008), Fast iterative solution of integral equations with method of moments and matrix decomposition algorithm – Singular value decomposition, IEEE Trans. Antennas Propag., 56, 2314–2324.
Rodriguez, J. L., J. M. Taboada, M. G. Araujo, F. O. Basteiro, L. Landesa, and I. Garcia-Tunon (2008), On the use of the singular value decomposition in the fast multipole method, IEEE Trans. Antennas Propag., 56, 2325–2334.
Seo, S. M., and J. F. Lee (2004), A single-level low rank IE-QR algorithm for PEC scattering problems using EFIE formulation, IEEE Trans. Antennas Propag., 52, 2141–2146.
Shaeffer, J. (2008), Direct solve of electrically large integral equations for problem sizes to 1 M unknowns, IEEE Trans. Antennas Propag., 56, 2306–2313.
Steinberg, B. Z., and Y. Leviatan (1993), On the use of wavelet expansions in the method of moments, IEEE Trans. Antennas Propag., 41, 610–619.
Taboada, J. M., M. Araujo, J. M. Bertolo, L. Landesa, F. Obelleiro, and J. L. Rodriguez (2010), MLFMA-FFT parallel algorithm for the solution of large-scale problems in electromagnetics (Invited Paper), Prog. Electromagn. Res., 105, 15–30.
Tsang, L., and Q. Li (2004), Wave scattering with UV multilevel partitioning method for volume scattering by discrete scatters, Microwave Opt. Technol. Lett., 41, 354–361.
Velamparambil, S., and W. C. Chew (2005), Analysis and performance of a distributed memory multilevel fast multipole algorithm, IEEE Trans. Antennas Propag., 53, 2719–2727.
Vikram, M., H. Huang, B. Shanker, and T. Van (2009), A novel wideband FMM for fast integral equation solution of multiscale problems in electromagnetics, IEEE Trans. Antennas Propag., 57, 2094–2104.
Zhao, J. S., and W. C. Chew (2000), Integral Equation Solution of Maxwell’s Equations from Zero Frequency to Microwave Frequencies, IEEE Trans. Antennas Propag., 48, 1635–1645.
Zhao, K., M. N. Vouvakis, and J. F. Lee (2005), The adaptive cross approximation algorithm for accelerated method of moments computations of EMC problems, IEEE Trans. Electromagn. Compat., 47, 763–773.

X.-M. Pan, Z. Peng, and J.-G. Wei, ElectroScience Laboratory, Ohio State University, 1320 Kinnear Rd., Columbus, OH 43212, USA. ([email protected])
X.-Q. Sheng, Center for Electromagnetic Simulation, School of Information and Electronics, Beijing Institute of Technology, Beijing 100081, China.
