Sparse Transformations and Preconditioners for 3-D Capacitance ...

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS VOL. XX, NO. Y, MONTH 200X

1

Sparse Transformations and Preconditioners for 3-D Capacitance Extraction Shu Yan, Vivek Sarin, and Weiping Shi

Abstract— Three-dimensional capacitance extraction algorithms are important due to their high accuracy. However, the current 3D algorithms are slow and thus their application is limited. In this paper, we present a novel method to significantly speed up capacitance extraction algorithms based on boundary element methods, under uniform and multiple dielectrics. coefficient matrix in the boundary element method is The dense, even when approximated with the fast multipole method, where is the number of panels needed to discretize the conductor surfaces and dielectric interfaces. As a result, effective preconditioners are hard to obtain and iterative solvers converge slowly. In this paper, we introduce a linear transformation to dense coefficient matrix into a sparse matrix convert the with nonzero entries, and then use incomplete factorization bus to produce a very effective preconditioner. For the crossing benchmark, our method requires at most 4 iterations, whereas previous best methods such as FastCap and HiCap require 10-20 iterations. As a result, our algorithm is up to 70 times faster than FastCap and up to 2 times faster than HiCap on these benchmarks. Additional experiments illustrate that our method consistently outperforms previous best methods by a large magnitude on complex industrial problems with multiple dielectrics.

Index Terms— Parasitic extraction, capacitance extraction, boundary element method, iterative method, preconditioning.

I. I NTRODUCTION APACITANCE extraction is important for timing verification and signal integrity analysis of VLSI circuits, multi-chip modules, printed circuit boards and packages. Most existing methods fall into two categories: library look-up where the layout is divided into sections and matched against a pre-characterized library to derive the capacitance value, or field solver where the electromagnetic field is solved to derive the capacitance. The library methods are faster, while the field methods are more accurate. As the technology shrinks, the demand for fast and accurate tools is increasing. In this paper, we try to meet this demand by proposing a novel technique to significantly speed-up fast multipole accelerated [1] boundary element method (BEM), which is used by many field solvers such as FastCap [2], HiCap [3], the multi-scale algorithm [4], and hybrid algorithms [5]. The linear system arising from BEM is often solved by iterative methods. However, the linear system is dense, even when approximated with the fast multipole method. As a result, effective preconditioners are hard to obtain and the

C

This research was supported by the NSF grants CCR-0098329, CCR0113668, EIA-0223785, and ATP grant 512-0266-2001. Shu Yan, Department of Electrical Engineering, Texas A&M University, College Station, TX 77843, [email protected] Vivek Sarin, Department of Computer Science, Texas A&M University, College Station, TX 77843, [email protected] Weiping Shi, Department of Electrical Engineering, Texas A&M University, College Station, TX 77843, [email protected]

iterative solvers converge slowly. In this paper, we propose PHiCap, a Preconditioned Hierarchical algorithm for Capacitance extraction. PHiCap uses a linear transformation to convert the dense linear system obtained from the hierarchical algorithm [3] to an equivalent sparse system. The sparse structure is exploited to construct preconditioners based on incomplete LU or incomplete Cholesky factorizations. The transformed linear system is solved by preconditioned GMRES or CG iterative methods (see, e.g., [6]). The rate of convergence of the iterative methods increases dramatically by using these preconditioners. For benchmark examples, PHiCap uses fewer iterations and runs significantly faster than previous methods such as FastCap [2] and HiCap [3]. The number of iterations used by PHiCap is also less than the multi-scale method [4]. In addition to fast multipole accelerated BEMs, there are other fast capacitance extraction algorithms, such as the pFFT algorithm [7], the singular value decomposition (SVD) algorithm [8] and the geometric independent method [9]. We do not know if our method can be applied to speed-up these algorithms. The paper is organized as follows: In Section II, we review the integral equation approach for capacitance extraction for uniform and multiple dielectrics. In Section III, we introduce the new algorithm. We present experimental results in Section IV and conclusions in Section V. II. P RELIMINARIES To compute the self and coupling capacitances, we need to compute the conductor surface charges, given certain conductor potentials. In general, the surface charges satisfy the integral equation

!""$#&%' (1) denotes the known conductor surface potential, where (*) is the the conductor (+ is the dielectric-dielectric surfaces, (*) and (*+ , !" is interfaces, is the charge densities on , # % the Green’s function, the incremental conductor surface / has the .- #,% . The isGreen’s area, and function form

where .

7 8; 7

!" 0 2435'67 91 8: 7 denotes the Euclidean distance between

and

In addition, the interface condition

5 =&> - *( + . Here, 5 < must be satisfied at any point the permittivities of the two adjacent dielectrics C

(2)

5 ? are and < and D , > is

2


the normal to the dielectric-dielectric interface at pointing < 0"E =&> < and = ? "E =&> < are the normal into C , and = component of electric field at in C and D , respectively. Here, the equivalent charge approach [10], [11] is used to deal with multiple dielectric. To solve (1) and (2) numerically, the standard Galerkin scheme is used. In this approach, the conductor surfaces and dielectric-dielectric are divided into > small panels, FG FH JIKIJI$ FML , andinterfaces a dense linear system is formed:

NPOQO S TR O

O

NPOQR S TR R

U

R

UR

O

VWO

t w~} Fig. 1.

\^]_ T` a[bJ` 1 F ] `&aQbK` 1 F _ /c d c e ] " _ #&% _ #,% ] I X S R&R is defined as The -th diagonal entry of f ]] g 5 be the number of leaf panels and be the q - IR~@ be total number of leaf and non-leaf panels. Let the matrix where each nonzero entry represents a link between L the panels in the hierarchical data structure. Let m - corresponding IR~ be the matrix representing the tree structure. Each m row of corresponds to a panel, either leaf or non-leaf,

XYZ* ofandm each column corresponds to a leaf panel. Entry X Z is 1 if panel contains the leaf panel , and 0 otherwise. N m$o0qrm . Here According to [3], we have the q N is a dense matrix with > factorization: block entries and is a sparse

matrix with > nonzero entries.

III. T HE PH I C AP A LGORITHM B. Transforming the Linear System (Step 2) In this section, we present PHiCap, the preconditioned hierarchical capacitance extraction algorithm. Reformulate linear system (3) as follows.

N U

V I

1) Overview: Our transformation is based on the factorizaN mpoqrm . Since aQ`& 'm0 > , we can always construct tion - IR* (described later in an orthonormal transformation this section), such that

(4)

We will show how to transform dense linear system (4) to a sparse system, which is then solved by a preconditioned iterative method. The algorithm is outlined below. The PHiCap Algorithm

where

N

-

L L

IR

N nmpoqrm

1) Construct the factorization N V . to equivalent 2) Transform the dense system U N V. sparse system s s s U 3) Compute an incomplete factorization preconditioner for Ns . N V by preconditioned CG or GMRES 4) Solve s s s U method. 5) Compute capacitance.

where

q o

m:

W

. Thus,

m o r q m m o o

m o o

' q o m q o m W q o

can be represented as

q o

Ns

I

W

YAN, SARIN AND SHI: SPARSE TRANSFORMATIONS AND PRECONDITIONERS FOR 3-D CAPACITANCE EXTRACTION

N

> matrix (we show this property Here, s is a sparse > N later), and ’s denote submatrices that do not contribute to . Since N o N s , the dense linear system (4) is transformed to the sparse system (5) s Vs U o where s U U and V s V . 2) Constructing : We first introduce a basic transformation that is used to construct . Consider the matrix mp Q1 A 1 A Q where is a constant that depends on the height of a node in the tree. There exists matrix )Y¥ )Y¥ H an orthonormal ¡ H£¢¤)Y)Y¥"¥ ¦¨§ HQ© ¡ H£¢¤)Y¥"G ¦§pH[© ¡ H£¢ª)Y¥G ¦¨§pH[© ¡ )Y¥ ¦ §pH 8 ¡ )YG ¥ ¦ § H 8 ¡ )Y¥ G ¦ §pH A 8 «H «H Ns

such that

m where

3

Q §G

G 1. and

h j H : j

§¬G §G A A 8 f f fk ¡ [ j

A ´´ ´ µ µ µ B · ¶ · E¶ · ¶ · ¶ C

D

F

G

C D F G A 1 1 1 1 B 1 1 C 1 1 D 1 1 E 1 F 1 G

À

(a) Transformation with

A ´´ ´ µ µ µ B · ¶ · E¶ · ¶ · ¶ C

D

F

G

Á xÀ

¸

Á x

C D F G A 1 1 1 1 B C D E F G

¼½¼½ Â^Ã ÄÂ Ã

Á x

Á

¼½¼½ ¸ ÂÃ ÂÃ

(b) Transformation with

¹º¹»¹º¹ T¼ ½@¼T½ ¾^¿ ¾¿ ¼½¼½ ¾¿ ¾¿ Á xÀ

C D F G A B C 0 0 D E 0 0 F G

¼ÆÅ*¼ÆÅ¼ÆÅ¼4GÅ

C D F A B 0 0 0 C D E F G

0

ÂÄÃ ÄÂ Ã ¾ ½ ¾ ½ ¾ ½ -¾ ½ Â Ã -Â Ã Á Á xÀ

Á

(6)

.® 1

To simplify the discussion, we define an element tree as a tree with one root and two children. Given a hierarchical data m structure and the corresponding matrix , the transformation is done by a depth-first traversal of the corresponding tree, propagating the transformation upward to the root. Fig. 2 1 , as shown illustrates the procedure. Starting from height in Fig. 2(a), for each element tree rooted at height 1, i.e., trees mp ¯ (B, C, D) and (E, F, G), we can identify the corresponding mp¯ m ¯ blocks ¯ mp in¯ . We construct that transforms all m blocks to blocks without changing anything else in . Next, as illustrated in Fig. 2(b), we identify the element tree at j , i.e., tree (A, B, E), and the corresponding mp° height ¯ m . Note that the rows of A, B, and E have two block in m ° block in columns C and F and columns instances of the ° that transforms the D G, respectively. We construct ° m ° m ° and blocks to blocks without changing anything else in ¯ m . In this way, the transformation is propagated to the root. Finally, as shown in Fig. 2(c), we move the nonzero rows to S the top of the matrix using a permutation matrix . The overall ° ¯ S transformation is given as is easy to see that m correspond to the. Itroot the nonzero rows of node and nodes that are right children of other nodes. In other words, zero m correspond to nodes that are left children of other rows in nodes. For a tree of height ± , the transformation is

S ³²,³² ¯ Jl lKl ° ¯ is constructed according to the element trees at where ¯ ° KIJIJI ³² , and S are orthonormal, the height . Since transformation matrix is orthonormal.

A ´´ ´ µ µ µ B · ¶ · E¶ · ¶ · ¶ C

D

F

G

¼ CÅ ¼ DÅ ¼ FÅ ¼ GÅ

A B C D E F G

Ç ¾^¿ -¾^¿ ¾ ½ ¾ ½ - ¾ ½ -¾ ½ ¸ ¾ ¿ -¾ ¿ Á Á xÀ

A E D G B C F

¼ CÅ ¼ DÅ ¼ FÅ ¼ GÅ ¾,½¾,½ -¾,½ -¾,½ ¾^¿ -¾^¿ ¾^¿ -¾^¿ Ç Á Á xÀ

(c) Permuting rows with E Fig. 2.

Construction of transformation

È

É

for a tree of height .

q o : The matrix q is transformed into q 3) Computing o by applying the transformations as shown below qÊ § ¯ &qÊ o 1 j JIKIJI$ ± qÊ¯q , and then by applying the permutation matrix where S : q o S q ²§ ¯S o I In the hierarchical data structure, this is done by a depth-first traversal of the tree, propagating the transformation upward, in a manner similar to the process of constructing the matrix . q o , we are concerned only In the transformed matrix N with the submatrix s that contains the links among root nodes N and right child nodes. The matrix s can be treated as a sparse matrix with the number of nonzeros that are comparable to N the number of block entries in (see Fig. 3). m 4) Computing V s : The rows of the transformed matrix are orthogonal. It follows o that the rows of are mutually Ë H orthogonal, and that is a diagonal matrix with values j §G , where is the height of the corresponding node in

4


aãKbd äæcå e

÷ø@fù gú@hûKiü! j!ý^kþ$ lnÿmopq r s

t uvwx yz{

| } ~ ! " $ #& %( '& ) * +- ,/ .1 0( 2 35 4 68 7 9 :

^àÎ_Y á`â =; ?< A> 1@ DB F C E

ZÝÎ[] Þ/\ß

VÚÎWY ÛXÜ

GÌÎHJ ÍÐIÏ

KÑÎLN ÒMÓ

çBèKéëê"ìëí;îÄï|ð.ñÎò*óQô|õÄö

OÔÎP ÕQÖ

R×ÎSU Ø.TÙ

Fig. 3. The number of nonzero entries in and the number of block entries are comparable, for various problem sizes. in

the tree. This property is exploited when computing V s :

¯ Vs o V o V I

Furthermore, the sum of entries in each row of is zero for all nodes except the root. The sum of entries in rows j §G . As corresponding to root nodes at height G is §¬G in the locationsa result, V s has nonzero entries of value corresponding to the roots of the conductor surfaces at unit potential. The preceding description is for a balanced tree in which all leaf nodes are at height 1. An unbalanced tree can be embedded into a balanced tree by adding additional dummy nodes, and the above procedure can be followed. An efficient implementation can be developed by avoiding the actual construction of dummy nodes. C. Solving the Transformed System (Steps 3-4) For problems in uniform medium, the sparse linear system (5) is symmetric. We use the incomplete Cholesky factorization with no fill [6] to compute the preconditioner. Preconditioned Conjugate Gradients method is used to solve the system. For problems with multiple dielectrics, the sparse linear system is nonsymmetric. The preconditioner is computed from an incomplete LU factorization with no fill [6]. We use right preconditioned GMRES method to solve the system. D. Computing Capacitance (Step 5) Capacitance can be computed from s directly without , and that U the rows of computing . Recall that s U U U corresponding to root nodes at height have identical nonzero [ §G . Thus, a root node entry in s is entries with value §G times the sum of all the leaf panel charges in that U tree. Capacitance can be computed by adding the root node entries of each conductor in U s after scaling them by corresponding §G G factors .

Fig. 4.

4x4 bus with two layers of dielectrics (section view).

E. Complexity Analysis

N

The complexity of constructing the factorization of in

Step 1 is > [3]. The transformation of the linear system in ¢¡¤>£ ± time, where ± is the height of the Step 2 usually takes > . Since the number of nonzeros in tree. Normally, ± N s is > , the incomplete factorization can be done in > N time. Each iteration requires a matrix-vector product with s , and a solution of the lower and upper triangular systems of the

preconditioner. Thus, Steps 3 and 4 take > time when the number of iterations is small. Capacitance can be computed in constant time. The overall complexity of this algorithm

¢¡¤£ > h¦¥ > , where ¥ is the number of is normally > conductors. IV. E XPERIMENTAL R ESULTS We compare PHiCap with the following algorithms: FastCap with expansion order 2, FastCap with expansion order 1, and HiCap. Other methods, such as SVD [8] and pFFT [7], exhibit performance that is similar to FastCap. No benchmark experiments were reported for the geometric independent method [9]. In [3], HiCap algorithm can only solve problems in uniform media. To make the comparison complete, we extend HiCap to the multiple dielectrics case. The algorithms are executed on a Sun UltraSPARC Enterprise 4000. Unless otherwise noted, the iterations are terminated when the relative H norm of the preconditioned system is reduced below residual 1A . The first set of benchmarks are bus crossing structures ¥ ¥ j h ¥ 1 from [2]. Each bus is scaled to 1 1 . The ¥ distance between the adjacent buses in the same layer is 1 ¥ j and the distance between the two bus layers is . For the uniform dielectric cases, the permittivity is assumed to be 5 6 . For the multiple dielectric cases, see Fig. 4, the medium I ¨ 5'6 surrounding the upper layer conductors has permittivity § and the medium surrounding the lower layer conductors has I ª 5'6 . The shaded box represents the interface of permittivity © the two layers.

YAN, SARIN AND SHI: SPARSE TRANSFORMATIONS AND PRECONDITIONERS FOR 3-D CAPACITANCE EXTRACTION

Tables I and II compare the four algorithms. PHiCap is the fastest and uses much less memory compared to FastCap. The current implementation of PHiCap uses more memory compared to HiCap because of the additional storage needed N for the transformed system s . The storage requirement can be N reduced by q computing s directly, since it is not necessary to construct . In HiCap and PHiCap algorithms, the discretizaA*I A&A8 . This ensures that the tion threshold «n¬ is chosen to be relative error in the capacitance matrix computed by PHiCap is below 3%, which is acceptable in practice. The relative error in the capacitance matrix ® , which is computed by HiCap or 7 8 ® 7-¯ E 7 ® 7-¯ , where PHiCap algorithms, is defined as ® 7 l 7-¯ denotes the Frobenius norm. As per standard practice, ® is computed by FastCap with expansion order 2. Table III shows the first and second rows of the capacitance matrix computed by PHiCap and FastCap. It is easy to see that for the self capacitances and significant coupling capacitances, where a coupling capacitance is considered significant if it is greater than 10% of the self capacitance, PHiCap’s error is mostly within 3%, with respect to FastCap with expansion order 2. The error for the small coupling capacitances is sometimes large, which is acceptable since the small coupling capacitances have minor influence on the circuit performance. Fig. 5 shows the error distribution of the self capacitances and the significant coupling capacitances for the six benchmark examples in Table I and II. C OMPARISON FOR UNIFORM DIELECTRIC . T IME IS CPU SECONDS , MB,

ITERATION IS AVERAGE FOR SOLVING ONE CONDUCTOR , MEMORY IS

Time Iteration Memory Error Panel Time Iteration Memory Error Panel Time Iteration Memory Error Panel

TABLE II C OMPARISON FOR MULTIPLE DIELECTRICS . T IME IS CPU SECONDS , ITERATION IS AVERAGE FOR SOLVING ONE CONDUCTOR , MEMORY IS AND ERROR IS WITH RESPECT TO

Time Iteration Memory Error Panel Time Iteration Memory Error Panel Time Iteration Memory Error Panel

FastCap (order=2) 4x4 Bus 63 13 68 — 3456 6x6 Bus 162 17.1 92 — 5448 8x8 Bus 324 18 133 — 7968

MB,

FAST C AP ( ORDER =2).

FastCap HiCap PHiCap (order=1) with Two Layer Dielectrics 36 1.7 2 14 9 3 39 3.0 3.9 0.6% 1.0% 1.0% 3456 2120 2120 with Two Layer Dielectrics 104 10.4 5.8 17 11.3 3 61 6.3 8.3 0.5% 1.0% 1.2% 5448 4120 4120 with Two Layer Dielectrics 197 32 15 18 12.8 3 86.9 11.5 15.3 0.0% 1.4% 1.4% 7968 6784 6784

120 FastCap(order=1) HiCap PHiCap

100 80 60

TABLE I

AND ERROR IS WITH RESPECT TO

5

FAST C AP ( ORDER =2).

FastCap FastCap HiCap PHiCap (order=2) (order=1) 4x4 Bus with Uniform Dielectric 18.6 19 0.5 0.4 8 14.9 9 3 25.7 16.7 0.7 1.0 — 0.05% 2.1% 2.0% 2736 2736 1088 1088 6x6 Bus with Uniform Dielectric 113.9 68.5 2.4 1.5 14.4 14.5 11.9 3.2 62.5 40.3 1.9 2.9 — 1.1% 2.1% 2.2% 5832 5832 3168 3168 8x8 Bus with Uniform Dielectric 206 204 7.3 3.4 12 21.9 13.0 3.9 112 67 3.1 4.9 — 1.0% 3.1% 3.0% 10080 10080 4224 4224

The second set of benchmarks are complex industrial circuits containing 8 layers of dielectrics and 48, 68 and 116 conductors, respectively. The smallest case of 48 conductors is shown in Fig. 6. The results are in Table IV. FastCap cannot solve these examples because of prohibitive time and memory requirements. PHiCap displays near-optimal preconditioning on these experiments. Fig. 7 shows that the residual norm decreases rapidly for PHiCap. In contrast, the decrease is slower for HiCap. As a result, PHiCap requires much less time to solve the problem.

40 20 0 −4

−2

0

2 Error (%)

4

6

8

Fig. 5. Error distribution of self capacitance and significant coupling capacitance for the 6 examples in Table I and II.

ÆÈÇ¢É ÃÅÄ

á µâ

Á¸Â

ã µä

³µ´ ¾À¿

å µæ ç µè

¼¸½ ¹ º »

εÛ =

Ø ÙÚ

ñ ò µ ó ô µ

ùDú û µ ü

é µê ë µì

õ¤ö ÷ ø µ

¶¸· í µî °²± ï µð Ê¤Ë8Ì ÍÏÎÑÐÓÒÕÔ×Ö

εà =

Ü Ý Þß

Fig. 6. Example with 48 metal conductors and 8 dielectric layers. Metal wires are shaded. Relative permittivity of M1 is 3.9, M2 through M6 is 2.5, and M7 and M8 is 7.0. Layers M2 through M5 have 10 conductors each whereas layers M7 and M8 have 4 conductors each.

6


TABLE III F IRST TWO ROWS OF CAPACITANCE MATRIX COMPUTED BY PH I C AP AND FAST C AP ( ORDER =2) FOR 4 X 4 B US WITH UNIFORM DIELECTRIC .

ý

x"x

ý

x

xzy

ý

ý

x|{

ý

x

ý

x

"

y

{

-48.40 -48.05 ý

'

-40.26 -39.77 ý

}

-40.17 -39.84 ý

-48.48 -46.60 ý

FastCap PHiCap

-137.54 -139.32

468.23 468.27

-132.66 -129.49

-11.89 -18.30

-40.15 -39.96

-32.59 -31.49

-32.54 -31.21

-40.20 -41.04

F IG . 6. T IME IS CPU SECONDS ,

48 conductors HiCap PHiCap 533 122 18.7 2.8 43 59 19840 19840


!"$# %


AttI

Sparse Transformations and Preconditioners for 3-D Capacitance ...

Sparse Transformations and Preconditioners for 3-D Capacitance ...

Suggest Documents

Sparse Transformations and Preconditioners for ... - CiteSeerX

Block-ADI Preconditioners for Solving Sparse ...

Parallel Multilevel Sparse Approximate Inverse Preconditioners in ...

3D TRANSFORMATIONS

PhD Thesis Sparse preconditioners for dense linear ... - CiteSeerX

3D affine coordinate transformations

A Divide-and-Conquer Algorithm for 3D Capacitance Extraction - cs.York

3D Transformations - 3D Scanning, Digital Modelling

SPECTRAL PRECONDITIONERS FOR ... - ECMWF

SUBSTRUCTURING PRECONDITIONERS FOR ... - Semantic Scholar

Parallel Hierarchical Solvers and Preconditioners for ... - CiteSeerX

Fast and Precise 3D Computation of Capacitance of Parallel Narrow

DOMAIN DECOMPOSITION PRECONDITIONERS FOR LINEAR

Circulant Preconditioners for Stochastic Automata

Efficient Preconditioners for Krylov Subspace

Sparse-then-dense alignment-based 3D map

Interface Preconditioners and Multilevel Extension

Krylov Subspace Solvers and Preconditioners

Inpainting with 3D sparse transforms - I-Revues

VIDEO DENOISING BY SPARSE 3D TRANSFORM DOMAIN ...

Trigonometric Preconditioners for Block Toeplitz Systems - CiteSeerX

Support-Theoretic Subgraph Preconditioners for Large ... - CiteSeerX

preconditioners for linearized discrete compressible euler ... - CiteSeerX

LDB: Localization with Directional Beacons for Sparse 3D Underwater ...