IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION, VOL. 52, NO. 7, JULY 2004
1693
Fast Dual-MGS Block-Factorization Algorithm for Dense MoM Matrices
Robert J. Burkholder, Senior Member, IEEE, and Jin-Fa Lee, Member, IEEE
Abstract—A robust method is introduced for efficiently compressing dense method of moments (MoM) matrices using a dual modified Gram–Schmidt (MGS) block-QR-factorization algorithm based on low-rank singular value decomposition. The compression is achieved without generating the full matrix or even full subblocks of the matrix. The compressed matrix may then be used in the iterative solution of the MoM problem. The method is very robust because it uses a reduced set of the original matrix entries to perform the compression. Furthermore, it does not depend on the analytic form of the Green's function, so it may be applied to arbitrarily complex media.
Index Terms—Fast solvers, integral equations, iterative methods, matrix decomposition, method of moments (MoM), singular value decomposition (SVD).
I. INTRODUCTION
THE SOLUTION of large-scale electromagnetic (EM) integral equation problems by the method of moments (MoM) is burdened by the creation and storage of a dense system impedance matrix [1]. For a general MoM problem using N subsectional basis functions, the computational complexity is O(N^2) for filling the matrix, storing it, and solving the system via an iterative algorithm [2]. This paper presents a new algorithm for efficiently compressing the MoM matrix to reduce the memory requirement, matrix fill time, and the time of the iterative solution relative to a full-matrix approach. A modified block-QR-factorization approach is developed based on low-rank singular value decomposition (SVD). The unique features of this approach are that the compression is achieved without assembling the entire matrix, and without an analytic representation of the Green's function.
Significant progress has been made in accelerating the solution of integral equations encountered in EM radiation and scattering problems by deriving reduced representations for the matrix interactions. The single-level fast multipole method (FMM) has been shown to be very useful by reducing the computational complexity from O(N^2) to O(N^1.5) [3], [4]. This is achieved by spatially decomposing the geometry into equally sized groups with M basis functions per group. The matrix blocks corresponding to self-groups and nearest-neighbor groups are filled using the traditional MoM
Manuscript received November 4, 2002; revised April 1, 2003. This work was supported by Temasek Laboratories, National University of Singapore, under Contract TL/EM/2002/0014. The authors are with the ElectroScience Laboratory, Department of Electrical Engineering, The Ohio State University, Columbus, OH 43212 USA (email:
[email protected]). Digital Object Identifier 10.1109/TAP.2004.831333
approach. For separated groups, the free-space Green's function is expanded using the addition theorem for Hankel functions combined with a plane-wave spectral integral. This results in the aggregation-translation-disaggregation of a set of plane waves to compactly represent the matrix interactions between separated groups. Multilevel versions of the FMM have achieved O(N log N) complexity by recursively subdividing the nearest-neighbor groups and reapplying the multipole expansion at each level [5]. The accuracy of the FMM is controllable by varying the number of multipoles and plane waves in the expansion, of course with a tradeoff in efficiency. One of the drawbacks of the FMM is its dependence on the analytic form of the Green's function. This makes it cumbersome to apply to problems other than an infinite homogeneous medium. Another limitation of the FMM is that all the groups at a given level must be the same size (although it is not necessary that groups exist on all levels at all locations). For highly nonuniform meshes this causes large disparities in the number of basis functions in each group, and hence the FMM grouping algorithm will not be optimal. Lastly, the FMM may break down at very low frequencies because of the singular behavior of the spherical Hankel function for small arguments.
Another approach, which has not attracted as much attention yet, is the block-SVD method used in the integral equation solver IES³ ("ice-cube") [6]. This approach takes advantage of the smoothness of the integral equation kernel over local regions of space, and hence the redundancies in the dense impedance matrix. Similar to the FMM, the geometry is spatially decomposed into groups. Each block of the impedance matrix corresponding to the interactions between a pair of groups is compressed using SVD.
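To illustrate the rank deficiency this kind of method exploits, the following short sketch (ours, not from the paper) builds a toy interaction block between two well-separated point clusters using a scalar exp(-jkR)/R kernel as a stand-in for the actual MoM entries, and measures its numerical rank with a dense SVD. The cluster sizes, spacing, and the 0.01 threshold are illustrative assumptions.

```python
import numpy as np

# Toy "impedance block" between two well-separated groups of points,
# using the scalar free-space kernel exp(-jkR)/R.
rng = np.random.default_rng(0)
k = 2 * np.pi                         # wavenumber for a 1-m wavelength
m, n = 200, 200
rx = rng.uniform(0, 0.25, size=(m, 3))                 # receiver group
tx = rng.uniform(0, 0.25, size=(n, 3)) + [2.0, 0, 0]   # source group, 2 m away
R = np.linalg.norm(rx[:, None, :] - tx[None, :, :], axis=-1)
B = np.exp(-1j * k * R) / R

# Numerical rank at a 0.01 relative singular-value threshold
s = np.linalg.svd(B, compute_uv=False)
rank = int(np.sum(s > 0.01 * s[0]))
print(rank, min(m, n))   # rank is far below 200 for separated groups
```

The rank grows only weakly with the number of points per group once the group size and separation are fixed, which is the behavior Figs. 2 and 3 of the paper quantify for actual RWG discretizations.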
The rank of the block (i.e., the number of linearly independent column vectors) is generally much less than the total number of columns in the block because of the strong linear dependence between columns. By removing the linear dependencies in the impedance matrix, a much more compact representation is possible. The method is very robust because it does not depend on the explicit form of the Green's function kernel, nor does it require that all the groups be the same size. A proof of rank reduction as a function of group separation distance is presented in [7] for electrostatic problems. The same principle applies to time-harmonic problems, as the numerical results of this paper will show.
A direct SVD of all the blocks of the impedance matrix would be very inefficient in the IES³ method. Therefore, a block-QR-factorization method is used in [6] which relies on the modified Gram–Schmidt (MGS) orthogonalization procedure [8]. The Q matrix is filled with an orthonormal set of column
0018-926X/04$20.00 © 2004 IEEE
vectors that approximately span the original matrix block B. Partial pivoting is used in the MGS so that the dominant column vectors of the original block B are incorporated in the order of their importance. Hence, an error threshold may be defined to halt the MGS algorithm when no significant new information is being added. In this way the accuracy of the method is controllable. A similar method is presented in [9] which uses Lanczos bidiagonalization to perform the block-factorization.
Even with the MGS and Lanczos methods, a direct QR-factorization approach still has a computational complexity of O(N^2) because the full matrix block must be assembled prior to the factorization algorithm. This hurdle is overcome in [6] by using a reduced set of auxiliary point sources and receivers to obtain the Q and R matrices via interpolation, but the method loses some of its natural robustness because the points are not uniquely defined and the implementation details are omitted in [6]. Furthermore, some of the built-in error control of the MGS algorithm is necessarily lost. The approach introduced in this paper is based on the block-QR-factorization approach of [6], but overcomes the efficiency problem without sacrificing robustness. Rows and columns of the original block matrix B are chosen selectively in a dual-MGS algorithm to be described in Section III. The rows and columns are both used to define sets of orthonormal row and column bases. The trick is to select the dominant rows and columns without a priori knowledge of all the rows and columns of B. This is achieved by choosing the largest entry in the most recently added row or column to select the next column or row, respectively. The Dual-MGS algorithm is halted when the orthonormal column basis (i.e., the columns of Q) approximately spans the space of the original matrix B. Next, the R matrix is found by co-location using only the rows and columns of B that have been generated within the Dual-MGS algorithm.
If the rank of B is r, then only about r rows and r columns of B are used to generate the QR-factorization. It will be demonstrated that the computational cost of assembling and storing the compressed impedance matrix, as well as the cost of the matrix-vector product operation in the iterative solver, is reduced to O(N^1.5) using a simple-minded grouping scheme. It is expected that O(N log N) complexity can be achieved with an adaptive multilevel grouping approach as done in [6], but that is left for a future submission.
Section II first presents the conventional brute-force MGS algorithm with partial pivoting for QR-factorizing a rank-deficient matrix, and Section III then describes the modifications leading to the new Dual-MGS algorithm. Section IV derives an estimate of the associated operational count (numerical complexity) and Section V presents some illustrative numerical results.

II. CONVENTIONAL MGS QR-FACTORIZATION ALGORITHM WITH PARTIAL PIVOTING

The MoM gives rise to a dense system impedance matrix whose entries describe the interactions between sources (basis functions) radiating to receivers (test functions). The geometry may be divided up into groups, and the basis and testing functions within each group may be ordered such that the interactions between any two groups comprise a rectangular block B of the matrix. The matrix block B describes the interactions between the sources and receivers in any two groups, as shown in Fig. 1.

Fig. 1. Two separated groups of basis functions associated by a matrix block B.

It is desired to obtain the factorization of an m x n impedance matrix block B corresponding to two separated groups as

B = Q R    (1)

where Q is m x r, R is r x n, and r is the rank. This corresponds to the m receivers and n sources shown in Fig. 1. B is expected to be low-rank, i.e., r << min(m, n), if the groups are well-separated [7]. The Gram-Schmidt algorithm extracts a set of orthonormal column vectors that approximately span the same space spanned by the columns of B. The MGS algorithm with partial pivoting selects the orthonormal vectors in order of their importance, as done in SVD [8]. The algorithm can then be halted when a new vector adds a negligibly small contribution. The orthonormal column vectors are the columns of the matrix Q. R is then easily found from

R = Q^H B    (2)

where ^H denotes the Hermitian (conjugate transpose). The columns and rows of B may be represented as

B = [b_1 b_2 ... b_n] = [a_1; a_2; ...; a_m]    (3)

where b_j is the jth column vector (corresponding to transmitter j) and a_i is the ith row vector (corresponding to receiver i). (Note that the rows of B are not used in the conventional MGS algorithm, but will be used in the Dual-MGS of Section III.) Likewise, define q_k as the orthonormal column vectors of the MGS expansion such that Q = [q_1 q_2 ... q_r]. The conventional MGS algorithm with partial pivoting is given as follows:
1) Find j_1 such that ||b_{j_1}|| >= ||b_j|| for all j = 1, ..., n.
2) q_1 = b_{j_1}/||b_{j_1}||.
3) k = 1.
4) While ||b_{j_k}|| > eps ||b_{j_1}||:
   a) k = k + 1;
   b) add j_{k-1} to the set of used column indexes;
   c) initialize the search maximum to zero;
   d) initialize j_k;
   e) For each unused column j: b_j <- b_j - <q_{k-1}, b_j> q_{k-1};
   f) Find the unused index j_k such that ||b_{j_k}|| >= ||b_j|| for all unused j;
   g) q_k = b_{j_k}/||b_{j_k}||.
where <.,.> denotes the Hermitian inner product and eps is the prescribed error threshold relative to the largest column vector of B. The salient feature of this algorithm is that the unused column vectors are made orthogonal to the current set of q-vectors in step 4e). The largest of these then becomes the next q-vector in the set. Note that instead of actually pivoting (exchanging or reordering) the column vectors as they are used, we keep track of the original column indexes in step 4b) so that those columns are excluded from the operations of steps 4e) and 4f) in the current and subsequent iterations. This substitute for pivoting is illustrated here because actual pivoting cannot be used in the Dual-MGS algorithm of the next section.
The algorithm has a computational complexity of O(mnr). The bottlenecks are in steps 1), 4e), and 4f), which require all the columns of B. The computational complexity of finding R via (2) is also O(mnr) in the conventional approach. This results in an overall O(N^2) complexity for factorizing all the blocks in the original matrix.
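A minimal sketch of this pivoted-MGS procedure, written as an illustrative reimplementation (not the authors' code), is:

```python
import numpy as np

def mgs_qr_pivoted(B, eps=0.01):
    """Rank-revealing B ~= Q @ R via modified Gram-Schmidt with the
    index-tracking substitute for column pivoting described above."""
    B0 = np.asarray(B, dtype=complex)
    W = B0.copy()                      # working copy, deflated in place
    m, n = W.shape
    used = np.zeros(n, dtype=bool)
    norms = np.linalg.norm(W, axis=0)
    j = int(np.argmax(norms))          # step 1: largest column first
    thresh = eps * norms[j]
    Q = []
    while norms[j] > thresh and len(Q) < min(m, n):
        q = W[:, j] / norms[j]
        Q.append(q)
        used[j] = True                 # step 4b: track used indexes
        free = ~used
        # step 4e: orthogonalize the unused columns against the new q
        W[:, free] -= np.outer(q, q.conj() @ W[:, free])
        norms = np.where(free, np.linalg.norm(W, axis=0), 0.0)
        j = int(np.argmax(norms))      # step 4f: next dominant column
    Q = np.stack(Q, axis=1)            # m x r
    R = Q.conj().T @ B0                # Eq. (2): R = Q^H B
    return Q, R

# synthetic exactly rank-8 block: the loop stops after about 8 columns
rng = np.random.default_rng(0)
B = rng.standard_normal((120, 8)) @ rng.standard_normal((8, 90))
Q, R = mgs_qr_pivoted(B, eps=1e-8)
rel_err = np.linalg.norm(B - Q @ R) / np.linalg.norm(B)
```

Note that this sketch requires the full block B up front, which is exactly the O(mnr) bottleneck the Dual-MGS algorithm of the next section avoids.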
III. FAST QR-FACTORIZATION WITH DUAL-MGS ALGORITHM
In view of the computational complexity of the conventional MGS algorithm, it is highly desirable to obtain the factorization without assembling the full matrix. A dual row-column MGS approach is presented in this section that uses only about r rows and r columns of the original matrix. It follows the same general procedure as the conventional MGS with pivoting, but with a more approximate row or column selection criterion that does not rely on the full matrix. As in Section II, let b_j and a_i represent the columns and rows of B, respectively, for j = 1, ..., n and i = 1, ..., m. Also, again let q_k represent the orthonormal column vectors of Q for k = 1, ..., r that span the space of the columns of B. Likewise, let p_k represent an orthonormal row vector basis set for B derived from an MGS expansion of the rows of B. In the following, a_i(j) denotes the jth element of row a_i, and b_j(i) denotes the ith element of column b_j. The Dual-MGS algorithm is as follows:
1) Generate row a_{i_1} of B (e.g., i_1 = 1) and mark i_1 as used.
2) Find j_1 such that |a_{i_1}(j_1)| >= |a_{i_1}(j)| for all j.
3) Generate column b_{j_1} of B and mark j_1 as used.
4) q_1 = b_{j_1}/||b_{j_1}|| and p_1 = a_{i_1}/||a_{i_1}||.
5) k = 1 and v_1 = ||b_{j_1}||.
6) While v_k > eps v_1:
   a) Find the unused row index i_{k+1} such that |q_k(i_{k+1})| >= |q_k(i)| for all unused i;
   b) k = k + 1;
   c) mark i_k as used;
   d) generate row a_{i_k} and orthogonalize it against p_1, ..., p_{k-1};
   e) p_k = the orthogonalized row normalized to unit length;
   f) Find the unused column index j_k such that |p_k(j_k)| >= |p_k(j)| for all unused j;
   g) mark j_k as used;
   h) generate column b_{j_k}, orthogonalize it against q_1, ..., q_{k-1}, and set v_k equal to the norm of the result; if v_k > eps v_1, then q_k = (orthogonalized column)/v_k.
7) r = the number of q-vectors generated.
In summary, each new column or row candidate vector is chosen based on the largest element of the previous row or column vector of the Dual-MGS expansion, respectively. The rows a_i and columns b_j of B are generated as they are needed in steps 1), 3), 6d), and 6h). Again, instead of actual pivoting, steps 1), 3), 6c), and 6g) keep track of which columns and rows of the original matrix B have been used so they are not selected in steps 6a) and 6f) of the current and subsequent iterations. The row indexes are also used later to find R. This approach is a substitute for pivoting because the full matrix is not available, and therefore the rows and columns cannot be interchanged arbitrarily. The computational complexity of the algorithm is O((m + n)r^2). The most time-consuming operations are in steps 6d) and 6h).
As in the conventional MGS QR-factorization algorithm, the Dual-MGS algorithm halts when an error threshold is reached. However, since the dominant rows and columns are selected approximately, without a priori knowledge of the full matrix B, it is possible for the Dual-MGS algorithm to halt prematurely. While this is theoretically possible, in practice it has been found that the algorithm reliably finds the dominant linearly independent columns of the matrix. It is noted that the same possibility of premature termination exists for other approaches that use iterative searches for orthogonal vector bases (e.g., conventional MGS and Lanczos methods) because breakdown can occur due to limited numerical precision. To help avoid this in the new Dual-MGS method, the algorithm may be allowed to continue for a small number of additional iterations. The algorithm halts when no new significantly linearly independent columns are found within those few extra iterations.
The matrix R is found with a similar computational complexity using a co-location approximation involving only the previously generated rows a_{i_l} of B and the columns of Q. Using (1) and (3), the elements of the rows a_{i_l}, l = 1, ..., r, are given by

a_{i_l}(j) = sum_{k=1}^{r} Q_{i_l k} R_{kj},  j = 1, ..., n    (4)

Equation (4) describes a linear system of r equations for solving for column j of R. The r x r system matrix is formed from the elements of Q and LU-factorized. Then (4) is solved n times for all the columns of R. The computational complexity of this step is O(r^3 + nr^2).
Assuming the average number of basis functions in each group is M, and the average rank rbar of all block pairs is approximately constant (not including the self-blocks), the total computational complexity for compressing the entire matrix is proportional to G^2 M rbar^2 = N^2 rbar^2 / M, where G is the total number of groups. Choosing M proportional to sqrt(N) yields a complexity of O(N^1.5) for the compression. The next section explains this choice of M.
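The Dual-MGS selection rules and the co-location solve for R can be sketched as follows. This is an illustrative reimplementation under our own conventions, not the authors' code; the hypothetical get_row/get_col callbacks stand in for generating single rows and columns of the MoM block on demand.

```python
import numpy as np

def dual_mgs(get_row, get_col, m, n, eps=0.01):
    """Sketch of Dual-MGS: build Q from selectively generated columns,
    picking each new row/column index from the largest unused entry of
    the latest column/row basis vector, then solve the co-location
    system of Eq. (4) for R. Returns Q (m x r) and R (r x n)."""
    rows_used, cols_used, row_vals = [], [], []
    P, Q = [], []                      # orthonormal row / column bases
    i, ref = 0, None                   # start from row 0 (arbitrary)
    while len(Q) < min(m, n):
        a = np.asarray(get_row(i), dtype=complex)
        rows_used.append(i); row_vals.append(a)
        p = a.copy()
        for pl in P:                   # orthogonalize the new row
            p -= (pl.conj() @ p) * pl
        if np.linalg.norm(p) < 1e-10 * np.linalg.norm(a):
            break                      # row adds nothing new
        p /= np.linalg.norm(p); P.append(p)
        mask = np.ones(n, dtype=bool); mask[cols_used] = False
        j = int(np.argmax(np.abs(p) * mask))   # dominant unused column
        cols_used.append(j)
        b = np.asarray(get_col(j), dtype=complex)
        for qk in Q:                   # orthogonalize the new column
            b -= (qk.conj() @ b) * qk
        nb = np.linalg.norm(b)
        if ref is None:
            ref = nb
        if nb <= eps * ref:
            break                      # no significant new information
        Q.append(b / nb)
        mask = np.ones(m, dtype=bool); mask[rows_used] = False
        i = int(np.argmax(np.abs(Q[-1]) * mask))  # dominant unused row
    r = len(Q)
    Qm = np.stack(Q, axis=1)
    A = np.stack(row_vals[:r])         # the r generated rows of B
    R = np.linalg.solve(Qm[rows_used[:r], :], A)  # co-location, Eq. (4)
    return Qm, R

# exactly low-rank test block, accessed only row- and column-wise
rng = np.random.default_rng(1)
B = (rng.standard_normal((80, 6)) + 1j * rng.standard_normal((80, 6))) \
    @ (rng.standard_normal((6, 60)) + 1j * rng.standard_normal((6, 60)))
Q, R = dual_mgs(lambda i: B[i], lambda j: B[:, j], 80, 60, eps=1e-10)
rel_err = np.linalg.norm(B - Q @ R) / np.linalg.norm(B)
```

Only about r rows and r columns of B are ever generated, which is the source of the cost reduction over the conventional MGS of Section II.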
IV. OPERATIONAL COUNT OF THE MATRIX-VECTOR PRODUCT

The QR-factorization of each block of the impedance matrix may be used to perform the matrix-vector product (MVP) for that block. The operational count for computing Q(Rx), where x is a column vector of order n, is r(m + n). For a full matrix block the MVP count is mn, so the QR representation is used only if r(m + n) < mn. Likewise, the storage requirement for each Q and R pair is r(m + n) complex variables, whereas for B it is mn. If the rank r is much smaller than m and n, we expect to get significant compression of the original matrix. Again, assuming the average number of basis functions in each group is M, and the average rank is rbar, the total operational count of the MVP including all blocks of the impedance matrix is estimated by
C_MVP = G M^2 + 2 M rbar G(G - 1)    (5)

where G is the number of groups. (The operational count is relative to a full matrix MVP, which has count N^2.) The first term in (5) is the operational count of the self-interaction blocks of the matrix. We do not expect the QR-factorization to be able to appreciably compress the self-blocks, so they are filled in the conventional manner.
In general, it is difficult to choose an optimum M to minimize the count because of the complex dependence of rbar on the group size and the number of well-separated groups. However, for electrically large problems we expect the average rank to approach a lower limit. If we assume rbar is constant, the count is minimized by choosing M proportional to sqrt(N). With this choice the computational complexity is O(N^1.5); likewise, the computational complexity of the matrix fill stage may also be shown to be O(N^1.5), albeit with a larger scale factor.

V. NUMERICAL RESULTS

In the following results, the combined-source electric field integral equation [10] is solved using Rao–Wilton–Glisson (RWG) triangular rooftop basis functions [11] to expand the surface current. The impedance matrix entries are obtained by Galerkin testing of the integral equation [1]. The geometry is a 1-m perfect electric conductor (PEC) cube.
First, to demonstrate the rank deficiency of matrix blocks corresponding to separated groups, and to show that the average rank is relatively independent of the number of basis functions per group, Fig. 2 shows the 1-m cube with the surface divided up into 56 equally sized cube-shaped groups of size 0.25 m. The rank of each impedance matrix block corresponding to each pair of groups is computed using the Dual-MGS algorithm, with the exception of the self-blocks. The average rank is plotted in Fig. 2 as a function of the number of RWG basis functions per group, with the physical size of the groups held constant at 0.25-m cubed. An error threshold of 0.01 is used in the Dual-MGS algorithm.
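The operational-count bookkeeping of Section IV, including the choice of M proportional to sqrt(N), can be checked with a few lines; the block sizes, N, and rank values below are illustrative assumptions, not figures from the paper's runs.

```python
# Per-block MVP cost (Section IV): y = Q @ (R @ x) costs about r*(m+n)
# operations versus m*n for the uncompressed block.
m, n, r = 300, 300, 12
factored, full = r * (m + n), m * n
assert factored < full   # compression pays off when r(m+n) < mn

# Blockwise total of Eq. (5): G groups of M unknowns, dense self-blocks,
# rank-rbar off-diagonal blocks.
def mvp_count(N, M, rbar):
    G = N // M
    return G * M**2 + 2 * M * rbar * G * (G - 1)

N, rbar = 10_000, 12
M = int(N ** 0.5)        # the paper's choice: M proportional to sqrt(N)
print(mvp_count(N, M, rbar), N ** 2)   # well below the full-matrix count
```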
The number of basis functions per group is varied in two ways: 1) by increasing the mesh density from 192 to 3072 RWGs per square wavelength for a fixed frequency of 300 MHz and 2) by increasing the frequency from 300 to 1200 MHz while keeping the mesh density constant at
Fig. 2. Average rank for nonself-blocks of 1-m PEC cube impedance matrix. The fixed frequency plot is at 300 MHz as the mesh density varies from 192 to 3072 RWGs per square wavelength. The varying frequency result uses a fixed mesh density of 192 RWG basis functions per square wavelength as frequency varies from 300 to 1200 MHz. Error threshold in the dual-MGS algorithm is 0.01.
192 RWGs per square wavelength. The average rank stays less than 12 as the number of RWGs per group increases from 21 to 329. The varying-frequency result shows the average rank has some small frequency dependence.
Fig. 3 plots the average rank as a function of the number of groups for the 1-m cube. The mesh density is fixed at 192 RWGs per square wavelength and the frequency is 600 MHz. As the number of groups increases, a larger number of groups will become more widely separated relative to the group size, and hence the average rank is expected to decrease. However, a lower limit will eventually be reached because a certain minimum rank is necessary to adequately represent the vector field over the receiving group. As Fig. 3 shows, the rank curve tends to flatten out for large numbers of groups.
Fig. 4 plots the CPU times and memory requirements for generating the matrices of the 1-m PEC cube on a 450-MHz Pentium II processor. The number of basis functions per group is chosen to be approximately sqrt(N) for the Dual-MGS results. The full matrix results are also plotted for comparison. The Dual-MGS results demonstrate O(N^1.5) scaling in both fill time and memory, as compared with the full matrix scaling of O(N^2). It is noted that even though the QR-factorization complexity is O(N^1.5), it has not been perfectly optimized. The
Fig. 3. Average rank for nonself-blocks of 1-m PEC cube impedance matrix for a fixed frequency of 600 MHz and a fixed mesh density of 192 RWG basis functions per square wavelength. N = 4608 and the group size varies from 21 to 576 RWGs per group. Error threshold in the dual-MGS algorithm is 0.01.
Fig. 5. Radar backscatter patterns of 1-m PEC cube computed with GMRES iterative solver. Frequency is 600 MHz and mesh density is 192 RWG basis functions per square wavelength with N = 4608. Pattern is in the horizontal plane. (a) Vertical polarization and (b) horizontal polarization.
Fig. 4. Fill times and memory for the 1-m PEC cube impedance matrix with a fixed mesh density of 192 RWG basis functions per square wavelength. Frequency varies from 300 to 1200 MHz. Error threshold in the dual-MGS algorithm is 0.01. (a) Matrix fill times and (b) matrix storage requirements.
results would be even better with a more optimum choice of M, or better yet with an adaptive grouping procedure.
Fig. 5 shows the radar backscatter patterns of the 1-m PEC cube computed using the full matrix and the Dual-MGS QR-factorized matrix with the generalized minimal residual (GMRES) iterative solver [12]. The residual error threshold in the GMRES algorithm is 0.01 in all results. Error thresholds of 0.01 and 0.1 are used in the Dual-MGS results. The agreement is excellent for the threshold of 0.01, but only fair for 0.1. It was found that the Dual-MGS algorithm could not give good compression for a smaller threshold than 0.01 due to the approximate nature of the column selection approach. This ultimately limits the accuracy of the new method, although very good results have been obtained for a threshold of 0.01. Table I summarizes the CPU requirements for this problem.
It is also important to consider the error introduced by the Dual-MGS QR approximation for computing matrix-vector products in the GMRES iterative solver. Fig. 6 shows the error in the solution vector as a function of the number of iterations. The error is defined relative to the solution vector generated by the full matrix. The important thing to observe is that the error stabilizes as the number of iterations increases. For a threshold of 0.01 in the Dual-MGS algorithm, the error in the final solution vector is about 0.06, and for a threshold of 0.1 the
TABLE I
CPU REQUIREMENTS FOR 1-m CUBE BACKSCATTER RESULTS. FREQUENCY IS 600 MHz AND N = 4608 RWG BASIS FUNCTIONS
error is 0.3. The error observed in the backscatter patterns of Fig. 5 is at most only about 2 dB, even for the 0.1 threshold case.

Fig. 6. Error introduced by the Dual-MGS QR approximation in the solution vector of the GMRES iterative solver. Target is a 1-m PEC cube. Frequency is 600 MHz and mesh density is 192 RWG basis functions per square wavelength with N = 4608. Azimuth and elevation angles are zero (broadside incidence). (a) Vertical polarization and (b) horizontal polarization.

VI. CONCLUSION

It has been demonstrated that the compressed matrix fill time and memory requirement of the new Dual-MGS block-QR-factorization method are O(N^1.5) using a simple-minded grouping strategy. It is expected that O(N log N) complexity can be attained with an adaptive multilevel grouping approach. Because the algorithm uses only the original matrix entries, it is straightforward to implement into existing MoM software. The new approach is also robust in that the compression algorithm does not depend on the analytic form of the Green's function. Therefore, it can be applied to integral equation problems set in arbitrary background environments. Examples include stratified or inhomogeneous media, and complex environments characterized by a general ray-optical Green's function. While good accuracy has been demonstrated for a MoM radar scattering problem, further work is needed to allow the Dual-MGS algorithm to attain good compression for threshold levels less than 0.01. It is also of interest to quantify the relationship between the error threshold in the Dual-MGS algorithm and the error observed in the final solution. Lastly, it remains to investigate more optimal grouping strategies that can minimize the size of the compressed matrix for a prescribed accuracy.

REFERENCES
[1] R. F. Harrington, Field Computation by Moment Methods. New York: IEEE Press, 1993.
[2] A. F. Peterson, S. L. Ray, and R. Mittra, Computational Methods for Electromagnetics. New York: IEEE Press, 1998.
[3] N. Engheta, W. D. Murphy, V. Rokhlin, and M. S. Vassiliou, "The fast multipole method (FMM) for electromagnetic scattering problems," IEEE Trans. Antennas Propag., vol. 40, pp. 634–641, June 1992.
[4] R. Coifman, V. Rokhlin, and S. Wandzura, "The fast multipole method for the wave equation: A pedestrian prescription," IEEE Antennas Propag. Mag., vol. 35, pp. 7–12, June 1993.
[5] J. Song, C.-C. Lu, and W. C. Chew, "Multilevel fast multipole algorithm for electromagnetic scattering by large complex objects," IEEE Trans. Antennas Propag., vol. 45, pp. 1488–1493, Oct. 1997.
[6] S. Kapur and D. E. Long, "IES³: Efficient electrostatic and electromagnetic simulation," IEEE Comput. Sci. Eng., vol. 5, pp. 60–67, Oct.–Dec. 1998.
[7] X. Sun and N. P. Pitsianis, "A matrix version of the fast multipole method," SIAM Rev., vol. 43, pp. 289–300, 2001.
[8] G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd ed. Baltimore, MD: The Johns Hopkins Univ. Press, 1989.
[9] F. X. Canning and K. Rogovin, "Fast direct solution of standard moment-method matrices," IEEE Antennas Propag. Mag., vol. 40, pp. 15–26, June 1998.
[10] J. R. Mautz and R. F. Harrington, "A combined-source formulation for radiation and scattering from a perfectly conducting body," IEEE Trans. Antennas Propag., vol. AP-27, pp. 445–454, July 1979.
[11] S. M. Rao, D. R. Wilton, and A. W. Glisson, "Electromagnetic scattering by surfaces of arbitrary shape," IEEE Trans. Antennas Propag., vol. AP-30, pp. 409–418, May 1982.
[12] Y. Saad and M. Schultz, "GMRES: A generalized minimal residual algorithm for solving nonsymmetric linear systems," SIAM J. Sci. Statist. Comput., vol. 7, pp. 856–869, 1986.
Robert J. Burkholder (S’85–M’89–SM’97) received the B.S., M.S., and Ph.D. degrees in electrical engineering from The Ohio State University, Columbus, in 1984, 1985, and 1989, respectively. From 1989 to the present, he has been with ElectroScience Laboratory, Department of Electrical Engineering, The Ohio State University, where he currently is a Research Scientist and an Adjunct Associate Professor. He has contributed extensively to the EM analysis of large cavities, such as jet inlets/exhausts, and is currently working on the more general problem of EM radiation, propagation, and scattering in realistically complex environments. He has published 25 peer-reviewed journal papers and one book chapter. His research specialties are high-frequency asymptotic techniques and their hybrid combination with numerical techniques for solving large-scale electromagnetic radiation and scattering problems. Dr. Burkholder is a Full Member of the International Scientific Radio Union (URSI) Commission B, and a Member of the Applied Computational Electromagnetics Society (ACES). He has served as an Associate Editor for IEEE TRANSACTIONS ON ANTENNAS AND PROPAGATION and is currently an Associate Editor for IEEE ANTENNAS AND WIRELESS PROPAGATION LETTERS.
Jin-Fa Lee (M’01) received the B.S. degree from National Taiwan University, Taiwan, R.O.C., in 1982 and the M.S. and Ph.D. degrees from Carnegie-Mellon University, Pittsburgh, PA, in 1986 and 1989, respectively, all in electrical engineering. From 1988 to 1990, he was with ANSOFT Corporation, Pittsburgh, where he developed several CAD/CAE finite-element programs for modeling three-dimensional microwave and millimeter-wave circuits. His Ph.D. studies resulted in the first commercial three-dimensional FEM package for modeling RF/Microwave components, HFSS. From 1990 to 1991, he was a Postdoctoral Fellow at the University of Illinois at Urbana-Champaign. From 1991 to 2000, he was with the Department of Electrical and Computer Engineering, Worcester Polytechnic Institute, Worcester, MA. Currently, he is an Associate Professor in the ElectroScience Laboratory, Department of Electrical Engineering, The Ohio State University, Columbus. His current research interests include analysis of numerical methods, fast finite-element methods, integral equation methods, hybrid methods, three-dimensional mesh generation, and nonlinear optic fiber modeling.