Efficient Implementation of a Planar Clock Routing with the Treatment of Obstacles*

Haksu Kim and Dian Zhou
Department of Electrical and Computer Engineering
University of North Carolina at Charlotte
9201 University City Blvd., Charlotte, NC 28223-0001

Abstract: In this paper, we present an automatic clock tree design (ACTD) system for high speed VLSI designs. The ACTD is designed to extend the capabilities of the existing computer aided design tools and provides a convenient environment to CAD users. We have developed new theoretical analyses and heuristics. Specifically, the following issues are considered: (i) a planar clock routing, (ii) a solution for avoiding obstacles, (iii) a strategy of buffer insertion, and (iv) a complete system for clock routing. To achieve a planar clock routing, we first present a cutting-line embedding routing algorithm which constructs a planar clock tree topology. Then, we employ a heuristic technique called planar obstacle-avoiding routing, which resolves obstacle crossings in the clock net. This paper therefore introduces two novel algorithms for developing a planar clock routing system with the treatment of obstacles. Both the cutting-line embedding algorithm and the planar obstacle-avoiding routing algorithm improve ease of use and performance.

Index Terms - Clock distribution, clock tree, planar clock routing, bounded-skew routing, zero skew clock routing.

1 Introduction

As digital Integrated Circuits (ICs) are driven at higher and higher clock frequencies, the need for a better clock net routing scheme has become essential. Just as the gate delays have become less significant than the routing delays in interconnects, the skew in clock pulse delivery also becomes ever more significant to system timing. Digital Equipment Corp. was already shipping 500 MHz microprocessors in 1996. According to the National Technology Roadmap for Semiconductors (NTRS), high-performance microprocessor frequencies will exceed 1.5 GHz by 2006 [3]. There is less room for error when the clock period becomes small, so great care must be taken to deliver the clock pulse to all points. Therefore, for complex ASICs (or VLSI), circuit designers ensure proper timing by carefully planning and implementing the distribution of clocks throughout the circuit. This part of the design process is critical because poor clock distribution can cause a circuit to malfunction, especially because of problems caused by skew and latency. To minimize skew and latency, circuit designers create clock trees that balance delays and loads in the clock buffers.

A planar clock tree may be implemented on a single metal layer. Single-layer clock routing reduces the delay and attenuation through vias, as well as the sensitivity to process variation. When more metal layers become available, the top layer may be exclusively used for clock and power/ground distribution. Further, with the flip-chip assembly technology, the global clock tree can even be put on a single layer of single-chip packages or on the substrates in multichip modules [5]. Since connections between the various layers are made by plated-through holes and vias, it is not simple to achieve uniform electrical parameters on multiple layers. A single-layer clock routing has traces in either the horizontal or vertical direction for ease of design and manufacture.
Furthermore, it is easier to adjust a planar clock tree for zero skew and minimal phase delay. We have developed a CAD tool called automatic clock tree design (ACTD) for practical clock routing. The ACTD system is designed to extend the capabilities of the existing computer aided design tools and provides a convenient environment to CAD users. The ACTD system is divided into general routing, detailed routing, and buffer insertion/re-routing. In this paper, we consider general routing and detailed routing. The detailed buffer insertion/re-routing algorithm will be presented in a separate paper [1]. This paper is organized as follows: The remainder of this section summarizes previous work on clock routing and the main contribution of our work. Section 2 introduces a cutting-line embedding (CLE) clock routing algorithm for constructing a planar clock tree. Section 3 describes a planar obstacle-avoiding (POA) routing algorithm for detouring around obstacle crossings. Section 4 gives the results of simulation. Concluding remarks are given in Section 5.

*This research was supported by a research grant from Airforce F49620-96-1-0341.

1.1 Research trends

Many heuristics for clock routing have been proposed in the past. H-tree and X-tree structures are the most widely used, especially in systolic array designs. A further improvement is obtained by bottom-up pairwise connections, which construct a perfectly length-balanced tree. Recursive geometric matching (RGM) is a bottom-up clock routing approach that constructs a binary tree by recursive geometric matching. In general, the matching operation pairs up the clock entry points (i.e., roots) of all the trees in the current forest. Unlike the geometric matching algorithm, the method of means and medians (MMM) works top down. The MMM algorithm recursively partitions a circuit into two equal parts, and then connects the center of mass of the whole circuit to the centers of mass of the two sub-circuits. However, all these heuristics focus only on wire length balancing, which is not the real objective of balancing clock delay, because delay is not simply a linear function of wire length. These approaches are not effective enough for the tight skew optimization required in many high-performance designs today, so a gap remains between these methods and the goal. There has been active research in the area of high-performance and low-power clock routing. An improved algorithm [19] considers delay balance instead of length balance but adopts a bottom-up process similar to that described in [21]. The purpose of a delay-balanced clock tree is to deliver the clock pulse from the root to all terminals in the circuit at exactly the same time. This goal is known as zero skew clock routing. The zero skew tree (ZST) clock routing algorithm achieves delay balance by elongating wires that have smaller delays [19]. The major problem with these algorithms is that they create many overlaps in the clock networks. Multiple routing layers must be used to implement such a non-planar network.
The performance and routability of the clock network are sacrificed [20]. A clock net on the metal layer with the smallest RC delay is preferable since it avoids the use of vias in the clock net and makes the layout more tolerant of process variations. This motivates the following papers on planar clock routing. These papers assume Euclidean planarity, i.e., no edges in the tree cross when each edge is represented by a straight line segment (instead of rectilinear line segments for the Manhattan geometry) on a Euclidean plane. Nevertheless, the cost of an edge is measured in the Manhattan distance metric. Still, it is not difficult to see that, given a routing solution with Euclidean planarity, we can always embed a straight Euclidean segment by a rectilinear staircase to get a planar rectilinear routing solution [7]. The planar clock routing problem was first studied by Zhu and Dai [20]. They proposed the Max-Min algorithm, which assumes a given source location. The two key components of the Max-Min algorithm are governed by the Max-rule and the Min-rule, respectively. The Planar-DME algorithm proposed by Kahng and Tsao [12] shows that a single top-down pass can produce the same output as the two-pass DME algorithm, at the expense of computation time, under the pathlength delay model. The Max-Min and Planar-DME algorithms achieve planarity at the price of higher routing costs measured by the total wire length. As mentioned, X-tree tends to be more costly than H-tree. The Planar-DME algorithms incur only an average penalty of 9.9% additional routing cost to achieve planarity, while the planar clock trees generated by the Max-Min algorithm have an average of 35% higher routing cost when compared to the best (non-planar) zero-skew solutions in [18], [7]. Recently, it has been pointed out that it is almost impossible to achieve exact zero skew in real designs [8]. In fact, it is neither necessary nor desirable to achieve zero skew [23].
For low power designs, the bounded-skew tree (BST), rather than the ZST, has been proposed to reduce the clock power [9]. The BST algorithms assume a fixed nonzero skew bound; hence, the actual design requirement is for a bounded-skew routing tree. For timing optimization in lower level designs of VLSI, buffer insertion (or fanout optimization), interconnect topology optimization, and wire sizing play important roles, and a number of algorithms have been proposed for these problems. On fanout optimization problems, most of the previous work focused on buffer tree construction in logic synthesis. Academic clock routing research results have often had limited impact on industry practice, since such practical considerations as hierarchical buffering, rise-time and overshoot constraints, obstacle checking, varying layer parasitics and congestion, and even the underlying design flow are often ignored. Recent works have also given practical methods for BST (P-BST), which is a non-planar routing algorithm [4]. The timing optimization for the ACTD system is considered extensively in [1]. The best-known academic clock routing research results are summarized in Table 1. More detailed surveys of clock tree synthesis can be found in [7], [11].


Table 1: Classifying clock routing algorithms

Classification           | Routing algorithm | Year | Delay model             | Key feature
-------------------------|-------------------|------|-------------------------|------------------------------------------------------------------
Topology                 | MMM               | 1990 | Linear (or path length) | Reasonable skew on average O(1/√n)
                         | RGM               | 1991 |                         | Less wirelength than the MMM
                         | ZST               | 1992 | Elmore                  | Zero-skew routing
Topology & Embedding     | DME               | 1992 | Linear/Elmore           | Zero-skew routing with minimizing the total wirelength
Bounded-skew             | BST               | 1994 | Linear                  | Assume a fixed non-zero skew bound
Bounded-skew & Embedding | BST/DME           | 1995 | Linear/Elmore           | DME with non-zero skew bound
                         | P-BST             | 1997 | Linear                  | Obstacle-avoiding BST routing
Single-layer planar      | MIN-MAX           | 1992 | Linear                  | Planar routing with minimum path using a simple Max-Min algorithm
                         | Planar-DME        | 1996 | Linear                  | Easy to extend to Elmore delay model

1.2 Definitions

First, we define the basic conventions and terminology used throughout the paper. A tree is a nonempty collection of vertices (v) and edges (e) that satisfies certain requirements. A vertex is a simple object that can have a name and can carry other associated information; an edge is a connection between two vertices. A path between two vertices in a tree is a sequence of distinct edges connecting them. Nodes with no children are called leaves (or terminals, or sinks). Figure 1 shows a simple example of a clock distribution with 4 terminals on the XY coordinates and its topological representation. The nodes in a tree divide themselves into levels; the level of a node is the number of nodes on the path from it to the root. The height of a tree is the maximum level among all nodes in the tree. The common distance function used to measure routing length is the rectilinear or Manhattan distance. For two connections located at positions (ax, ay) and (bx, by), the distance is d(a,b) = |ax - bx| + |ay - by|. The cost of the edge ev is simply its wirelength, denoted by |ev|; this is always at least as large as the Manhattan distance between the endpoints of the edge, i.e., |ev| ≥ d(l(p), l(v)), where p is the parent of v. The cost of T, denoted by cost(T), is the total wirelength of the edges in T. We denote the set of sink locations in a clock routing instance as S = {s1, s2, ..., sn}.

[Figure 1: (a) 4 terminals on the XY coordinates; (b) Topological representation]
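The distance and cost definitions above can be illustrated with a minimal Python sketch. The function names (`manhattan`, `tree_cost`, `edge_is_feasible`) and the dictionary layout keyed by child node are illustrative assumptions, not part of the ACTD system.

```python
from typing import Dict, Tuple

Point = Tuple[float, float]


def manhattan(a: Point, b: Point) -> float:
    """Rectilinear distance d(a, b) = |ax - bx| + |ay - by|."""
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def tree_cost(edge_len: Dict[str, float]) -> float:
    """cost(T): total wirelength, with each edge keyed by its child node."""
    return sum(edge_len.values())


def edge_is_feasible(parent_loc: Point, child_loc: Point, wirelength: float) -> bool:
    """A wire may detour, so |e_v| can exceed d(l(p), l(v)) but never fall below it."""
    return wirelength >= manhattan(parent_loc, child_loc)
```

An edge whose wirelength is strictly larger than the Manhattan distance between its endpoints corresponds to a detoured (elongated) wire.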

A routing topology G is a rooted binary tree with n terminal nodes corresponding to the sinks in S. A clock tree T(G, S) is an embedding of the connection topology in the Manhattan plane, i.e., each internal node v ∈ G is mapped to a location l(v) in the Manhattan plane. The root of the clock tree is the source, denoted by s0. Any edge between a parent node p and its child q may be identified with the child node, i.e., we denote this edge as eq. A tree is called planar if it can be drawn in the plane without two edges crossing. If di denotes the signal delay from clock source s0 to sink si, then the skew of clock tree T is given by $\mathrm{skew}(T) = \max_{s_i, s_j \in S} |d_i - d_j|$. The pathlength (or linear) and the Elmore delay models are used to compute and optimize the clock tree. The tapping points where the wires come together are very important to good clock net design. These tapping points must be carefully chosen to reduce clock skew. The exact computation of the delay of a clock tree is quite difficult, but many algorithms for clock routing use well-defined approximation methods. The first method for determining an initial tapping point location is to ensure that the distance from the root to each of the leaves in the whole tree is the same. The distance balanced method is implemented by computing the total path distance from tapping point to tapping point to end point, and choosing a tapping point such that the distance remains the same. Every end point is at the end of a unique path beginning at the root of the clock routing net. This method for determining the initial tapping point seeks to balance the distances of the various unique paths. Under the pathlength delay model, the delay from u to v is the sum of the edge lengths on the unique u-v path, i.e.,

$$t(u, v) = \sum_{e_w \in \mathrm{Path}(u, v)} |e_w| \qquad (1)$$
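As a rough illustration of equation (1) and the skew definition, the following Python sketch computes pathlength delays from the source and the resulting skew. The parent-pointer representation, the node names, and the edge lengths are hypothetical example values chosen for this sketch.

```python
from typing import Dict, List, Optional


def pathlength_delay(node: str,
                     parent: Dict[str, Optional[str]],
                     edge_len: Dict[str, float]) -> float:
    """Equation (1) with u = root: sum |e_w| over the edges from the root to node.

    Each edge is identified with its child node, as in the definitions.
    """
    total = 0.0
    while parent[node] is not None:
        total += edge_len[node]
        node = parent[node]
    return total


def skew(sinks: List[str],
         parent: Dict[str, Optional[str]],
         edge_len: Dict[str, float]) -> float:
    """max |d_i - d_j| over all sink pairs equals max delay minus min delay."""
    delays = [pathlength_delay(s, parent, edge_len) for s in sinks]
    return max(delays) - min(delays)


# Hypothetical 4-sink tree: source s0, internal nodes vl and vr.
parent = {"s0": None, "vl": "s0", "vr": "s0",
          "s1": "vl", "s3": "vl", "s2": "vr", "s4": "vr"}
edge_len = {"vl": 2.0, "vr": 2.0, "s1": 3.0, "s3": 3.0, "s2": 3.0, "s4": 4.0}
example_skew = skew(["s1", "s2", "s3", "s4"], parent, edge_len)  # 6.0 - 5.0 = 1.0
```

In this example only the path to s4 is longer than the others, so a distance balanced method would move a tapping point (or elongate the other wires) to remove the 1.0 difference.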

A second way to find a tapping point is to use the loading weight of each branch to determine the relative distance from the tapping point to each routing point. This algorithm is called the weight balanced method, or more generally the Elmore delay model. For instance, if node p is more heavily loaded than node q, the tapping point should be located closer to node p. The Elmore delay model is defined as follows: let r and c denote the unit length wire resistance and capacitance, respectively. Then, the wire resistance and capacitance of edge ev are |ev|·r and |ev|·c, respectively. For node v in G, we use Tv to denote the subtree of T that is rooted at v, and we use Cap(v) to denote the total capacitance of Tv [19]. Then, under the Elmore model the signal delay t(u,v) is defined recursively as

$$t(u, v) = \sum_{e_w \in \mathrm{Path}(u, v)} r\,|e_w| \left( \frac{c\,|e_w|}{2} + \mathrm{Cap}(w) \right) \qquad (2)$$
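Equation (2) can be evaluated with a short recursive routine, sketched below in Python. The sketch assumes that Cap(w) is the sink load at w plus the wire and load capacitance of everything below w (excluding the edge e_w itself); the data layout, function names, and numeric values are hypothetical.

```python
from typing import Dict, List


def subtree_cap(v: str,
                children: Dict[str, List[str]],
                edge_len: Dict[str, float],
                sink_cap: Dict[str, float],
                c: float) -> float:
    """Cap(v): load at v plus wire and load capacitance of the subtree below v."""
    total = sink_cap.get(v, 0.0)
    for child in children.get(v, []):
        total += c * edge_len[child] + subtree_cap(child, children, edge_len, sink_cap, c)
    return total


def elmore_delay(path: List[str],
                 children: Dict[str, List[str]],
                 edge_len: Dict[str, float],
                 sink_cap: Dict[str, float],
                 r: float,
                 c: float) -> float:
    """Equation (2): sum of r|e_w| (c|e_w|/2 + Cap(w)) over edges e_w on the path.

    `path` lists the child end w of every edge from u down to v.
    """
    return sum(
        r * edge_len[w] * (c * edge_len[w] / 2 + subtree_cap(w, children, edge_len, sink_cap, c))
        for w in path
    )


# Hypothetical symmetric tree: s0 -> a -> {s1, s2}, unit r and c, unit sink loads.
children = {"s0": ["a"], "a": ["s1", "s2"]}
edge_len = {"a": 2.0, "s1": 1.0, "s2": 1.0}
sink_cap = {"s1": 1.0, "s2": 1.0}
d1 = elmore_delay(["a", "s1"], children, edge_len, sink_cap, r=1.0, c=1.0)
d2 = elmore_delay(["a", "s2"], children, edge_len, sink_cap, r=1.0, c=1.0)
```

Because the two branches carry identical loads and lengths, d1 and d2 coincide and the skew is zero; making one sink load heavier would shift the balanced tapping point toward that sink.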

1.3 Works of this paper

The clock routing problem on a VLSI chip requires an extremely large computational program with a very large number of rows and columns, too large to be solved even by column-generating techniques [25]. Based on the distribution of nets, the key idea of the clock routing is to recursively cut the area of the chip into smaller and smaller regions, until the routing problem within a region can be handled by the local region routing method. Then, the adjacent regions are successively connected to obtain the clock routing of the whole chip. In the simplest terms, the clock routing problem is to place signal lines on a chip in a nonoverlapping manner, and then connect the sinks on the signal line by mutually noninterfering wires according to a given wiring list. The chip can be thought of as a grid-graph with horizontal and vertical arcs, where the nodes are the potential positions of the pins and the arcs are the places for wires. In this paper, since there are thousands of nodes in the grid-graph, and equally many nets, the routing problem is further divided into a general routing and a detailed routing. We have developed a CAD tool for practical clock routing. The ACTD system consists of a cutting-line embedding (CLE) routing as a general clock routing, a planar obstacle-avoiding (POA) routing as a detailed clock routing, and buffer insertion/re-routing, as shown in Figure 2. We present a detailed method for developing a planar clock net routing system with the goal of satisfying a bounded skew in clock delivery. First, the clock terminals are partitioned into a set of clusters using a top-down partition scheme. The local on-chip clock trees of the clusters are then connected with each other using the CLE routing algorithm. There are three phases in the algorithm: partitioning, terminal-routing & embedding, and avoiding overlappings.
Next, in the detailed routing step, we have developed new theoretical analyses and heuristics to avoid obstacles; this is the POA routing algorithm, which uses a changing tapping point scheme. This scheme gives a minimum cost solution for building an obstacle-avoiding clock routing. At this stage, the clock tree has almost equal path lengths when the terminals are randomly and well distributed. After that, we insert buffers into the clock tree. Therefore, the main issues for these algorithms are to avoid obstacles and to maintain planar routing in a single layer. Clearly, a clock net produced by this algorithm may not have the minimum pathlength if obstacles exist. On the other hand, our approaches heuristically minimize the cost of the planar clock tree. In practice, circuits still operate correctly within some non-zero skew bounds, and so the actual design requirement is for a bounded-skew routing tree. To maintain a bounded skew, buffer insertion is applied to this unbalanced clock tree. The clock signal is then distributed from the main clock driver to each of the local buffers by means of a global clock tree. Buffers are adjusted toward the goal of zero skew or bounded skew specified by the user.

[Figure 2: Flow of the ACTD system. General routing (cutting-line embedding routing to achieve a planar clock tree): partitioning; terminal-routing & embedding; avoiding wire overlappings. Detailed routing (planar obstacle-avoiding routing to avoid obstacles): classifying obstacle-crossing; decision on whether tapping points exist; change tapping point; sweep-up wires. Buffer insertion/re-routing (to achieve an almost zero skew or bounded skew clock tree): initial buffer insertion; simulation; decision on whether skew and delay are satisfied; re-routing and buffer re-location.]

Key features of our planar clock tree algorithm are as follows: (i) it always constructs a planar single layer clock tree, (ii) it solves the obstacle-crossing problems, (iii) the path lengths from the clock source to the terminals are almost the same, and (iv) the simulation run time is short, since both the CLE and POA routing algorithms trace the linked tree and find detour wires. The ACTD system is a CAD tool which processes digital circuit node lists described in a design language. Note that physical circuits include various types of obstacles and nodes. In particular, a procedure that interprets node lists from a node description file (NDF) is a major preprocessing step of the ACTD system. An NDF may include the location information of sinks, a source, and obstacles. It may also include the capacitances of sinks and other digital parameters. The design data in a LEF/DEF can be converted to the NDF format. The interpreter algorithm produces useful information on the basis of the data structures.

2 The Cutting-Line Embedding (CLE) Routing Algorithm

The basic idea of the cutting-line embedding (CLE) routing algorithm is to find the cutting-lines (or partition boundaries) of a clock net, to route along the cutting-line, and to find the embedding of internal nodes in a given topology. Hu and Shing proposed a cut-and-paste technique which partitions the area of the chip into global cells such that a hierarchical solution to the global routing problem can be obtained [HS85]. The CLE routing algorithm is inspired by this technique. Note that a planar clock tree intuitively means a tree that can be drawn in a plane without the edges crossing. Specifically, we first define two cases of design rule violation for a clock net in order to achieve a planar clock tree. Overlapping and crossing are defined as follows:
• Overlapping: when the distance between two (or more) wires is less than (or equal to) a predefined distance.
• Crossing: when two or more wires have at least one common point.

Algorithm 1: CLE routing algorithm

Input: A set of sinks S and a connection topology G
Procedure Partitioning
    Apply a top-down recursive partitioning;
=> Output: A partitioned tree, Tp

Input: A partitioned tree Tp
Procedure Terminal-routing&embedding (v, Tp)
{   /* Visiting the tree Tp (bottom-up built) */
    If (v = terminal)   /* Terminal routing within a region */
        conn(v) = determine one pattern from an HV pattern set;
        Connect two (or one) points;
        mid(v) = decide middle point of conn(v);
    else if (v = cutting-line)   /* Embedding */
        Find two middle points: v->left and v->right;
        conn(v) = determine one pattern from an HV pattern set;
        Connect two points;
        mid(v) = decide middle point of conn(v);
}
If (v->left != NULL) continue Terminal-routing&embedding (v->left, Tp);
If (v->right != NULL) continue Terminal-routing&embedding (v->right, Tp);
=> Output: A clock tree with terminal-routing & embedding, Tr

Input: A clock tree Tr
Procedure Sweep-up   /* Avoiding overlappings */
    Apply Sweep-up scheme for shifting wires;
=> Output: A planar clock tree with avoiding overlappings, To

The CLE routing algorithm is described in Algorithm 1, which includes three procedures. Let To be a planar clock tree. Given a set of sinks S and a topology G, the CLE routing constructs To via: (i) a partitioning phase that divides the whole set of endpoints into a hierarchy of smaller groups; as a result of partitioning, Tp includes both a set of sink locations and a set of partitioned lines; (ii) a terminal-routing & embedding phase that connects sinks by a connection pattern, which will be defined in Section 2.2, and determines locations for the internal nodes in Tp, also by a connection pattern; the resulting Tr describes the order for connecting a given set of sinks and internal nodes; and (iii) a sweep-up phase that reroutes wires to avoid overlappings. Finally, a planar clock tree, connecting V(t) = Sn ∪ Vt with a set of edges E(t), can be expressed as To = (V(t), E(t)), where Sn = {s1,s2,...,sn} is a set of nodes, and Vt = {t1,t2,...,tn} is a set of tapping points (or branches).

2.1 Top-down partitioning

A typical clock net routing problem has many endpoints; practical clock net routing problems include as many as one thousand endpoints. The whole set of endpoints cannot be handled at once. They must be divided into a hierarchy of smaller groups. Therefore, the purpose of partitioning is to divide the large problem into many smaller, more manageable ones. The whole design area is first considered a single partition containing all points. It is then divided into two areas, each containing approximately half the original number of points. This process continues until each of the lowest level partitions contains no more than two points. In general, the partitioning algorithm is a binary search for a horizontal or a vertical cut line, such that the numbers of endpoints in the two sub-partitions are as close to equal as possible. If the number of endpoints in each sub-partition is still not balanced, then further partitioning is performed. Given a set of sinks S={s1,s2,...sn} to be partitioned, the cutting-line in X (or Y) is placed at the center of mass of S, denoted xcl(S) (or ycl(S)), which is computed as the mean of the x- (or y-) coordinates of the sinks in S:

$$x_{cl} = \frac{\sum_{i=1}^{n} x_i}{n}, \qquad y_{cl} = \frac{\sum_{i=1}^{n} y_i}{n} \qquad (3)$$

The set of sinks is then ordered by x- and y-coordinates. If S is to be partitioned in the X (or Y) direction, then sinks in the first half of the ordered sink set are grouped in the Sleft (or Sbottom) partition and the rest of the sinks belong to the Sright (or Stop) partition. Then, the subsets Sleft and Sright (or Sbottom and Stop) can be divided recursively until a partition has only one or two sinks [22]. Similarly, in the CLE routing, the area of a chip is partitioned into subregions until a subregion contains one or two nodes. Two nodes are connected by an arc in G if the corresponding subregions are adjacent. We shall introduce a way of partitioning the area of a chip into subregions such that we can use the CLE routing algorithm to solve the planar clock tree design. If the area of the chip is already partitioned into a large number of subregions, we can use the same technique to solve the global routing problem. Consider a vertical line in the chip which partitions the area of the chip into left and right regions. We say that the nets crossing the vertical cutting-line are separated by the vertical line. In this case, the nets can be included in either the left or the right region. For the left region, we find a horizontal line. This will cut the left region into a left-top and a left-bottom region. In the same manner, we can cut the right region into a right-top and a right-bottom region. Each region will be cut into smaller regions by vertical and horizontal lines, alternately, until the number of sinks of most nets is less than two in the region. Let us illustrate the above approach by the example shown in Figure 3. Two subregions (top and bottom) exist in the left region of the circuit (Sleft). The top subregion includes s1 and s2, and the bottom subregion includes s3 and s4. Denote by cl0 the cutting-line at the center of mass of the two subregions, obtained from equation (3).
When the cutting-line is a Y axis cutting-line, it should be parallel to the X axis to prevent overlapping of the connections from the root to each middle point. The same rule can be applied to the X axis cutting-line. ycl is calculated by adding the four y-coordinates and dividing by 4, as follows: cl0 = (y1+y2+y3+y4)/4. As the result of partitioning, all pins in the clock nets are separated by one or more partitioned lines. Finally, we create a partitioning tree, denoted by Tp, which includes both a set of sink locations and a set of partitioned lines. Figure 3(a) illustrates the result of a top-down partitioning in the CLE routing. We can represent the successive partitioning as a binary tree, where the root of the binary tree corresponds to the source. The two successors of the root (v7) are the left (v3) node and the right (v6) node; the two successors of the left node (v3) are the left-top (v0) node and the left-bottom (v1) node, and so on. The small subregions at the end of the partition are called atomic regions, and they correspond to the leaves of the binary tree. For example, the chip shown in Figure 3(a) is partitioned into four regions, and its corresponding binary tree is shown in Figure 3(b). We shall call this binary tree the partitioned tree (Tp) of the chip. If the region contains a sink, then the successors of the node are empty.

[Figure 3: (a) A top-down partitioning of sinks s1 to s8 and source s0 into left-top, left-bottom, right-top, and right-bottom regions by cutting-lines cl0, cl1, and cl2; (b) A partitioned tree, Tp, with nodes v0 to v7]
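The top-down partitioning described in this section can be sketched as a short recursive routine. This is a minimal illustration under stated assumptions: the `PNode` class, the `partition` function, the choice of starting with a cut in X, and the sample sink coordinates are all hypothetical and are not taken from the ACTD implementation.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]


@dataclass
class PNode:
    """One node of the partitioned tree Tp."""
    sinks: List[Point]
    axis: Optional[int] = None    # 0: cut in X (vertical line), 1: cut in Y
    cut: Optional[float] = None   # cutting-line coordinate, the mean of equation (3)
    left: Optional["PNode"] = None
    right: Optional["PNode"] = None


def partition(sinks: List[Point], axis: int = 0) -> PNode:
    """Split sinks into two halves, alternating cut directions, until <= 2 remain."""
    node = PNode(sinks=list(sinks))
    if len(sinks) <= 2:
        return node  # atomic region: a leaf of Tp
    ordered = sorted(sinks, key=lambda p: p[axis])
    half = len(ordered) // 2
    node.axis = axis
    node.cut = sum(p[axis] for p in ordered) / len(ordered)
    node.left = partition(ordered[:half], 1 - axis)    # S_left or S_bottom
    node.right = partition(ordered[half:], 1 - axis)   # S_right or S_top
    return node


def leaves(node: PNode) -> List[PNode]:
    """Collect the atomic regions of the partitioned tree."""
    if node.left is None or node.right is None:
        return [node]
    return leaves(node.left) + leaves(node.right)


# Hypothetical 8-sink instance.
sample = [(1, 8), (2, 6), (1, 2), (3, 1), (6, 1), (7, 3), (7, 6), (8, 8)]
tp = partition(sample)
```

For the sample instance, the root is cut by a vertical line and each half is then cut by a horizontal line, which yields four atomic regions of two sinks each, mirroring the four-region example of Figure 3.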


2.2 Terminal-routing & embedding scheme

Terminal-routing & embedding is a recursive bottom-up algorithm for interconnecting two subtrees, and a practical routing algorithm that solves crossing problems using only vertical and horizontal wires. Repeating the process in a bottom-up fashion constructs a complete clock tree. Given the partitioning tree, Tp, the bottom-up routing chooses one connection pattern to connect one or two pins in the atomic regions, and decides the exact embeddings of the internal nodes. We use Tr to denote the clock tree constructed by the terminal-routing & embedding phase, which describes the order for connecting the given set of sinks and internal nodes. A formal recursive definition of conn(v) follows. If v is a sink si, then conn(v) = si. If v is a cutting-line, then conn(v) is one pattern of the HV pattern set, which merges the left subtree of v and the right subtree of v.

A. Terminal-routing within a region

Given the locations of the two sinks (or the one sink) placed in the same atomic region, the connection pattern is classified into seven different types. This collection of connection types is called an HV pattern set because it uses only vertical and horizontal wires, as shown in Figure 4. All sinks are represented by squares, and sinks are connected by using vertical and horizontal wires. By using any one of the seven patterns, the tapping point can be detected easily, which will be discussed in Section 2.3. The HV pattern set considered in this paper, for connecting two points si(xi,yi) and sj(xj,yj) by Manhattan distance, is defined as follows:

Pattern_set = {PS, PH, PV, PDTH, PDTV, PTDH, PTDV}

where
• PS is a Stand-alone connection with xi = xj, yi = yj.
• PH is a Horizontal connection with xi ≠ xj, yi = yj.
• PV is a Vertical connection with xi = xj, yi ≠ yj.
• PDTH is a DownTop with a Horizontal line, which is a connection in x and y (xi < xj, yi < yj) having a horizontal wire.
• PDTV is a DownTop with a Vertical line, which is a connection in x and y (xi < xj, yi < yj) having a vertical wire. The specific usage of these two cases is explained in more detail in later sections.
• PTDH is a TopDown with a Horizontal line, which is a connection in x and y (xi > xj, yi > yj) having a horizontal wire.
• PTDV is a TopDown with a Vertical line, which is a connection in x and y (xi > xj, yi > yj) having a vertical wire.

[Figure 4: The HV pattern set: Stand-alone (PS), Horizontal (PH), Vertical (PV), DownTop with a Horizontal line (PDTH), DownTop with a Vertical line (PDTV), TopDown with a Horizontal line (PTDH), TopDown with a Vertical line (PTDV)]
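Selecting a pattern from the HV pattern set can be sketched as a small classification function. The sketch below rests on two assumptions that the text does not state explicitly: the two points are first ordered so that the first one is the left point, and TopDown is taken to mean that the left point lies above the right point. The function name `classify` and the `horizontal_boundary` flag are likewise illustrative.

```python
from typing import Tuple

Point = Tuple[float, float]


def classify(si: Point, sj: Point, horizontal_boundary: bool = True) -> str:
    """Return the name of one of the seven HV patterns for connecting si and sj.

    horizontal_boundary selects the H variant (horizontal partition boundary
    between the points) or the V variant of DownTop and TopDown.
    """
    (xi, yi), (xj, yj) = si, sj
    if xi == xj and yi == yj:
        return "PS"    # stand-alone
    if yi == yj:
        return "PH"    # horizontal
    if xi == xj:
        return "PV"    # vertical
    if xi > xj:
        # Assumption: order the pair so that (xi, yi) is the left point.
        (xi, yi), (xj, yj) = (xj, yj), (xi, yi)
    base = "PDT" if yi < yj else "PTD"   # DownTop rises to the right, TopDown falls
    return base + ("H" if horizontal_boundary else "V")
```

Because only seven shapes can occur, the location and shape of any overlap between neighbouring connections is easy to predict, which is what the sweep-up step of Section 2.3 relies on.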

B. Embedding with two subtrees

Now, suppose the sinks in each region have been connected by the HV pattern set. We shall paste all the subtrees of the same net in the adjacent regions together, starting at the bottom of the partition tree and working towards the root. We determine the middle point of each terminal connection. These two middle points can be used to decide one pattern of the HV pattern set for connecting the adjacent regions. If these two middle points are sinks, the terminal-routing algorithm can be applied. DownTop (or TopDown) includes two different shapes of connections, depending on the partition boundary in between the two middle points to be connected. For example, DownTop with a Horizontal line will be used for the connection of two middle points with a horizontal partition boundary. To interconnect two subtrees with a wire so that the merged tree has almost the same path lengths, the problem to be solved is to decide where on the wire the new root of the merged tree will be, such that the delay time from this new root to all leaf nodes is almost equal, i.e., such that it will be easy to achieve zero skew (or bounded skew) in the stage of buffer insertion and adjustment of the ACTD system. Even though the simulator, in the stage of buffer insertion/re-routing, will be used to finally determine these locations, the initial placement is still important. Poor initial placement of the tapping points could cause more difficulty later in the process. The simplest way to choose a tapping point is to use the midpoint between the two routing points to be connected. This method will work well if the two routing points have the same weight. The midpoint is calculated by generating the equation of the line passing through the two routing points. A tapping point is chosen along that line such that the distance between the tapping point and each routing point is the same. The next step of the CLE routing algorithm is to connect two tapping points, t1~2 for s1 and s2, and t3~4 for s3 and s4. Denote by t1~2 the middle point of the two nodes in the top region, and by t3~4 the middle point of the two nodes in the bottom region. Each middle point is calculated by averaging the coordinates of its two nodes: t1~2 = ((x1+x2)/2, (y1+y2)/2) and t3~4 = ((x3+x4)/2, (y3+y4)/2). The root tapping point, t1~4, is always located along the line ycl obtained above. By the method used in partitioning one domain into two subdomains, the lengths between each middle point and the cutting-line are almost the same, and the root tapping point is always located on the cutting-line. Now, to find the exact root tapping point, each of the two tapping points is connected to the line ycl by a wire parallel to the Y axis. The root tapping point is then the middle point between the two points at which the upper and lower middle points are connected to ycl. By using this formula, a connection pattern can be determined. In this case, a DownTop with a Horizontal line (PDTH) is needed. Figure 5 shows the initial routing using this algorithm. Let t1~2 and t3~4 be the children of node t1~4, and let Tt1~2 and Tt3~4 denote the subtrees of tapping points rooted at t1~2 and t3~4.
The construction of the connection from t1~2 to t3~4 depends on the locations of t1~2, t3~4 and the cutting-line between them. We seek placements of this connection that allow Tt1~2 and Tt3~4 to be connected with minimum length while keeping the routing planar. The x- and y-coordinates and the cutting-line are computed and stored for use in determining the connection type from the HV pattern set. Finally, this clock tree is constructed by using the HV pattern set of {PDTH, PH, PDTH, PTDH, PV, PTDH, PTDV}.

[Figure 5: Initial routing of sinks s1 to s8 from source s0, with tapping points t1~2, t3~4, t5~6, t7~8, t1~4, t5~8 and t1~8. Connections by using an HV pattern set: PDTH for s1 and s2; PH for s3 and s4; PDTH for t1~2 and t3~4; PTDH for s5 and s6; PV for s7 and s8; PTDH for t5~6 and t7~8; PTDV for t1~4 and t5~8.]
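The tapping-point construction described above can be sketched in a few lines of Python. This is a minimal illustration, assuming a horizontal cutting-line y = y_cl and routing points of equal weight; the function names and the sink coordinates are hypothetical and do not come from the ACTD implementation.

```python
def midpoint(p, q):
    """Midpoint of two routing points of equal weight."""
    return ((p[0] + q[0]) / 2, (p[1] + q[1]) / 2)


def root_tapping_point(t_top, t_bottom, y_cl):
    """Root tapping point on a horizontal cutting-line y = y_cl.

    Each tapping point is connected to the cutting-line by a wire parallel
    to the Y axis; the root is the midpoint of the two landing points.
    """
    top_landing = (t_top[0], y_cl)
    bottom_landing = (t_bottom[0], y_cl)
    return midpoint(top_landing, bottom_landing)


# Hypothetical sink coordinates, chosen only for illustration.
s1, s2 = (2, 10), (6, 8)    # top region
s3, s4 = (4, 2), (10, 4)    # bottom region
y_cl = 6                    # assumed cutting-line between the two regions

t12 = midpoint(s1, s2)
t34 = midpoint(s3, s4)
t14 = root_tapping_point(t12, t34, y_cl)
print(t12, t34, t14)
```

With these coordinates the two tapping points land on the cutting-line at x = 4 and x = 7, so the root tapping point sits halfway between them on y = 6.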

2.3 Sweep-up method to avoid overlapping

In the general routing described previously, the CLE routing achieves almost equal path lengths in a circuit of randomly generated sinks. In addition, it guarantees that the clock net has no crossing, i.e., that the routing is planar. However, to simplify routing using an HV pattern set, we do not consider overlapping among wires during the initial routing. As described before, an overlapping occurs when two or more wires are separated by less than a predefined distance. The overlapped wires should be rerouted to remove the overlapping. We can easily predict the location and shape of an overlapping because only the seven pattern shapes shown in Figure 4 are used to construct the tree topology. Overlapping is resolved with a simple sweep-up method consisting of four operations, a move-up, a move-down, a move-left, and a move-right, as shown in Figure 6. In other words, the sweep-up method finds the minimum number of bends for a rectilinear edge from point p to q. Let bi (xi, yi) be a bend point where a bend exists in the rectilinear edge. The coordinates xi and yi are grid indexes in the X (horizontal) and Y (vertical) directions, respectively. For example, an edge from point p to q is connected with a subtree constructed by DownTop with a Horizontal line, as shown in Figure 6(a), and overlaps with its root connection at node q. Overlapped wires are shown as solid lines. A move-up operation is used since the edge from p to q is placed on the left side of the DownTop with a Horizontal line. Figure 7 gives an example of a planar clock tree constructed without overlapping by the CLE routing algorithm for the topology G shown in Figure 3(b). In this example, four sweep-up operations are required: a move-down shift for t1~4, a move-right shift for t7~8, a move-down shift for t5~8, and a move-left shift for t1~8. Figure 8 compares the CLE routing with previous works. Even though MMM and RGM construct clock trees with zero skew, these trees cannot handle the overlapping problem. The CLE routing algorithm achieves a planar clock tree using only vertical and horizontal wires, while the planar clock tree generated by Min-Max is constructed using diagonal wires.

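[Figure 6: Sweep-up operations on a rectilinear edge from p to q with bend points b1, b2 and b3: (a) a move-up, (b) a move-right, (c) a move-down, (d) a move-left.]

[Figure 7: Planar clock tree for sinks s1 to s8 and source s0 after the sweep-up operations.]

[Figure 8: Comparison of routings from source s0: (a) top-down routing by MMM, Cost = 20, Skew = 0; (b) bottom-up routing by RGM, Cost = 20, Skew = 0; (c) planar routing by Min-Max, Cost = 21, Skew = 0; (d) planar routing by the CLE, Cost = 17, Skew = 3.]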
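To make the overlap definition and the four sweep-up operations more concrete, the following Python sketch tests two parallel wires for overlapping and shifts one of them by a grid step. It is an illustration under stated assumptions: wires are axis-parallel segments on an integer grid, `min_gap` stands for the predefined distance, and the reroute uses two bends joined to the fixed endpoints, which is simpler than the three-bend shapes of Figure 6. All names are hypothetical.

```python
MOVES = {
    "move-up": (0, 1),
    "move-down": (0, -1),
    "move-left": (-1, 0),
    "move-right": (1, 0),
}


def is_horizontal(seg):
    (_, y1), (_, y2) = seg
    return y1 == y2


def overlaps(a, b, min_gap=1):
    """True if two parallel wires run closer than min_gap over a shared stretch."""
    if is_horizontal(a) != is_horizontal(b):
        return False  # perpendicular wires are a crossing issue, not an overlap
    axis, perp = (0, 1) if is_horizontal(a) else (1, 0)
    if abs(a[0][perp] - b[0][perp]) >= min_gap:
        return False
    a_lo, a_hi = sorted((a[0][axis], a[1][axis]))
    b_lo, b_hi = sorted((b[0][axis], b[1][axis]))
    return min(a_hi, b_hi) > max(a_lo, b_lo)


def sweep(seg, move, step=1):
    """Shift a wire by `step` grid lines; endpoints p and q stay in place
    and two new bends b1, b2 connect them to the shifted wire."""
    dx, dy = MOVES[move]
    p, q = seg
    b1 = (p[0] + dx * step, p[1] + dy * step)
    b2 = (q[0] + dx * step, q[1] + dy * step)
    return [p, b1, b2, q]


root_wire = ((0, 5), (10, 5))   # hypothetical existing wire
edge = ((4, 5), (8, 5))         # hypothetical wire laid on top of it
path = sweep(edge, "move-up")
shifted = (path[1], path[2])
print(overlaps(root_wire, edge), path, overlaps(root_wire, shifted))
```

In this example the edge initially overlaps the root wire, and after a one-grid move-up the shifted portion keeps the assumed minimum gap.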

The CLE routing algorithm always returns a planar clock tree. Its heuristics are simple and quick, resolve overlapping, and prevent crossings in the clock net. As a result of the CLE routing algorithm, the clock tree exists as a collection of four different types of objects linked in a tree list structure. The linked list structure, once assembled, should resemble the actual clock routing net. The four basic types of elements are load, wire, branch (or tapping point), and buffer elements, which represent sinks, edges, tapping points, and inserted buffers, respectively. The data structures of this planar clock tree contain all the information necessary to perform the detailed routing described in Section 3. They are composed of a node structure, a partition structure, and a connection structure. The CLE creates a hierarchical tree structure determined by the partition-boundaries (or cutting-lines). These trees may be assembled manually with additional source code, or externally from a text input file.
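One possible in-memory form of the four element types is sketched below in Python. The actual ACTD structures (written in ANSI C) are not given in the paper, so every field and class name here is an assumption made for illustration only.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional, Tuple


class Kind(Enum):
    LOAD = "load"      # a sink
    WIRE = "wire"      # an edge, possibly with bends
    BRANCH = "branch"  # a tapping point with two children
    BUFFER = "buffer"  # an inserted buffer


@dataclass
class Element:
    kind: Kind
    pos: Tuple[float, float]
    left: Optional["Element"] = None
    right: Optional["Element"] = None
    bends: List[Tuple[float, float]] = field(default_factory=list)


def count_kinds(node: Optional[Element],
                counts: Optional[Dict[Kind, int]] = None) -> Dict[Kind, int]:
    """Walk the linked tree and count the elements of each type."""
    counts = {} if counts is None else counts
    if node is not None:
        counts[node.kind] = counts.get(node.kind, 0) + 1
        count_kinds(node.left, counts)
        count_kinds(node.right, counts)
    return counts


# A branch feeding two sinks through one wire each (hypothetical coordinates).
s1 = Element(Kind.LOAD, (2, 10))
s2 = Element(Kind.LOAD, (6, 8))
w1 = Element(Kind.WIRE, (4, 9), left=s1, bends=[(2, 9)])
w2 = Element(Kind.WIRE, (4, 9), left=s2, bends=[(6, 9)])
t12 = Element(Kind.BRANCH, (4, 9), left=w1, right=w2)
counts = count_kinds(t12)
print(counts)
```

Keeping wires as explicit elements, rather than as attributes of nodes, mirrors the description above in which edges are one of the four linked object types.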

3 The Planar Obstacle-Avoiding (POA) Routing Algorithm

This section proposes wire reconstruction rules for the case in which obstacles exist in the routing plane. The planar obstacle-avoiding (POA) routing constructs detour wires in the presence of obstacles, employing heuristic techniques to avoid them. Wires must be relocated because the CLE routing does not consider obstacles, so the wires it generates may pass over obstacles. These obstacle-crossing wires should be relocated outside the obstacles in order to achieve the goal of a planar routing. Note that we assume that all obstacles are rectangular and do not overlap one another. First, we define the problem of maintaining a planar clock tree with obstacles. The obstacle-crossing is defined as follows:

• Obstacle-crossing: when any portion of a wire ep connecting any two nodes, or a tapping point vq, is placed over an obstacle region, we call it an obstacle-crossing.

Here, a set of obstacle locations is defined as O = {o1, o2, o3, ..., on}. An obstacle oj ∈ O is represented by its location, oj = {xi ≤ x ≤ xk, yl ≤ y ≤ ym}, where xi > 0, xk > 0, yl > 0, ym > 0, xi ≠ xk and yl ≠ ym. Before the clean-up process, the type of obstacle-crossing should be determined. The obstacle-crossing problem is classified into two categories: (i) no tapping point on the obstacle, and (ii) tapping point(s) on the obstacle. As shown in Figure 1, the POA routing algorithm examines whether tapping points exist on obstacles. If there are one or more tapping points on an obstacle, the changing tapping point scheme is applied. If there is a wire over an obstacle with no tapping point, the sweep-up scheme is applied. Algorithm 2, described below, is for planar clock tree design with obstacle avoidance. A complete planar clock tree, connecting V(t) = Sn ∪ Vt′ with a set of edges E(t), can be redefined as Tc = (V(t), E(t)), where Sn = {s1, s2, ..., sn} is a set of nodes, and Vt′ = {v1′, v2′, ..., vn′} is a set of new tapping points after detouring.

POA routing algorithm

Input: A linked list for obstacle O and a planar clock tree To
Output: A complete planar clock tree with treatment of avoiding obstacles, Tc

object = s0; /* root node */

Procedure POA(object)
{   /* object = {sink; sj, tapping point; vj, edge; ej, buffer; bj} */
    If (object = tapping point on obstacle, vj)
        CTP(vj, Oj); /* call procedure changing tapping point */
    Else if (object = wire on obstacle, ej) {
        If (object has bend)
            Find new bends around obstacle;
        Else if (object has no bend)
            Apply Sweep-up scheme;
    }
    If (object->left != NULL) continue POA(object->left, Tp);
    If (object->right != NULL) continue POA(object->right, Tp);
}

Procedure CTP(vj, Oj) /* Procedure changing tapping point */
{
    num = number of tapping points on the same obstacle, Oj;
    v = vj;
    for (i = 0; i < num; i++) { /* num >= 2 */
        v′ = find new tapping point of v;
        Find detour_wire(v, v′, Oj);
        if (i = num) return;
        else v = adjacent tapping point of v on the same obstacle;
    }
}

Procedure detour_wire(old-root, new-root, Oj)
{
    new-left = the point in between the boundary of Oj and old-left;
    new-right = the point in between the boundary of Oj and old-right;
    find bend points to connect two points from new-root to new-left;
    find bend points to connect two points from new-root to new-right;
    connect all the bend points;
}
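The classification step of the POA procedure can be sketched as a preorder walk that only reports which scheme each violation would need, without performing any rerouting. The sketch assumes straight axis-parallel wires between a node and its children, and obstacles given as (x_lo, y_lo, x_hi, y_hi) rectangles; the `Node` class and the function names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

Point = Tuple[float, float]
Rect = Tuple[float, float, float, float]  # (x_lo, y_lo, x_hi, y_hi)


def point_on_obstacle(p: Point, o: Rect) -> bool:
    return o[0] <= p[0] <= o[2] and o[1] <= p[1] <= o[3]


def wire_on_obstacle(p: Point, q: Point, o: Rect) -> bool:
    """True if the axis-parallel segment p-q touches rectangle o."""
    x_lo, x_hi = sorted((p[0], q[0]))
    y_lo, y_hi = sorted((p[1], q[1]))
    return x_lo <= o[2] and x_hi >= o[0] and y_lo <= o[3] and y_hi >= o[1]


@dataclass
class Node:
    name: str
    pos: Point
    left: Optional["Node"] = None
    right: Optional["Node"] = None


def poa_classify(node: Optional[Node], obstacles: List[Rect], found=None):
    """Record (object, scheme) pairs in the visiting order of Procedure POA."""
    found = [] if found is None else found
    if node is None:
        return found
    is_tap = node.left is not None and node.right is not None
    if is_tap and any(point_on_obstacle(node.pos, o) for o in obstacles):
        found.append((node.name, "CTP"))
    for child in (node.left, node.right):
        if child is None:
            continue
        for o in obstacles:
            # Wires ending on the obstacle are left to the CTP scheme.
            ends_on = point_on_obstacle(node.pos, o) or point_on_obstacle(child.pos, o)
            if wire_on_obstacle(node.pos, child.pos, o) and not ends_on:
                found.append((f"{node.name}-{child.name}", "sweep-up"))
                break
    poa_classify(node.left, obstacles, found)
    poa_classify(node.right, obstacles, found)
    return found


a, b = Node("a", (2, 5)), Node("b", (8, 5))
v = Node("v", (5, 5), a, b)
s0 = Node("s0", (5, 0), v)
obstacles = [(4, 4, 6, 6), (6.5, 4.5, 7.5, 5.5)]
result = poa_classify(s0, obstacles)
print(result)
```

Here the tapping point v lies on the first obstacle, so it is assigned to the changing tapping point scheme, while the wire from v to b passes fully over the second obstacle and is assigned to the sweep-up scheme.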

3.1 Sweep-up scheme for crossed wires without tapping point on the obstacle

The sweep-up scheme finds detour wires, whose placement is determined by the distances from the crossing wire to the two parallel sides of the obstacle. The detouring wire is placed on the side with the shorter of these two distances. After the location for detouring has been found, the crossing wire is rerouted along the outside of the obstacle using one of the move-right, move-left, move-up, or move-down operations used in Section 2.3. Note that, while detouring, a wire may already exist at exactly the location where the new wire is to be placed. In this case, the POA routing tries alternate regions to avoid overlapping with the existing wire. If, after trying every alternate region, the POA routing cannot find any detouring direction, it relocates the existing neighboring wire to a new position. However, if we need to maintain zero path length skew, wires are elongated via snaking as necessary. Alternatively, buffer insertion can also be applied to a clock tree unbalanced by the relocation of wires.

Type I: Full crossing on an obstacle

One of the cases of obstacle-crossing without a tapping point is a full crossing on an obstacle, as shown in Figure 9. The simple sweep-up scheme is applied. Four bend points, b1, b2, b3, and b4, are needed for rerouting in this case. A move-up operation is applied instead of a move-down because the detour must have minimum distance.

[Figure 9: Type I, a full crossing on an obstacle rerouted through bend points b1, b2, b3 and b4.]
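The Type I rule can be illustrated with a small Python sketch for a horizontal wire that fully crosses a rectangular obstacle. The clearance of one grid line, the tie-break in favor of move-up, and the left-to-right numbering of the bends are assumptions of this sketch, not details stated in the paper.

```python
def detour_full_crossing(y_wire, obstacle, clearance=1):
    """Reroute a horizontal wire at height y_wire around an obstacle.

    obstacle is (x_lo, y_lo, x_hi, y_hi). The wire is moved to the nearer
    of the two sides parallel to it, and four bends are returned.
    """
    x_lo, y_lo, x_hi, y_hi = obstacle
    dist_up = y_hi - y_wire
    dist_down = y_wire - y_lo
    if dist_up <= dist_down:
        operation, y_new = "move-up", y_hi + clearance
    else:
        operation, y_new = "move-down", y_lo - clearance
    b1 = (x_lo - clearance, y_wire)   # leave the original track
    b2 = (x_lo - clearance, y_new)    # reach the detour track
    b3 = (x_hi + clearance, y_new)    # pass the obstacle
    b4 = (x_hi + clearance, y_wire)   # rejoin the original track
    return operation, [b1, b2, b3, b4]


operation, bends = detour_full_crossing(17, (10, 10, 30, 20))
print(operation, bends)
```

For this hypothetical obstacle the wire is 3 grid lines from the top side and 7 from the bottom side, so the sketch selects a move-up.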

Type II: Bend point on an obstacle

Another type of obstacle-crossing without a tapping point is a bend point on an obstacle, as shown in Figure 10. The simple sweep-up scheme can also be used after a new bend point has been found. Once all bend points around the obstacle have been found, the detour wire should be the connection from b1 to b2, since it is much shorter than the connection from b1 to b3 to b4, and the number of bends is also reduced.

[Figure 10: Type II, a bend point on an obstacle, with candidate bend points b1, b2, b3 and b4.]

3.2 Changing tapping point scheme for tapping points on the obstacle

The clock tree generated by the CLE routing may contain another type of design-rule violation. The changing tapping point (CTP) algorithm is applied to an obstacle having one or more tapping points. If one tapping point exists on the obstacle, the tapping point is moved outside the obstacle along the root wire by a certain distance. Note that as the size of the obstacle becomes larger, the number of tapping points and crossings on the obstacle may be more than two. The CTP approach, summarized below, constructs a planar clock tree in four phases.

Phase I) Given a connection topology, it searches all the nodes in the tree. If a tapping point (vj) is found on an obstacle, it traces the rest of the tree to find out how many tapping points exist on the same obstacle. If there is exactly one tapping point, we call this case a Type III. A Type IV is the case in which the number of tapping points is more than one.

Phase II) The tapping point vj should be relocated outside the obstacle. Four candidate regions, the left, right, up and down sides of the obstacle, are considered. However, Phase II tries to relocate vj into the same region where the root node of vj is located, since this avoids a long detouring wire and overlapping of the clock net.

Phase III) All the wires affected by the relocation of the tapping point in Phase II should also be relocated. First, the connection between the new tapping point (vj′) and the left node of vj is rerouted with minimum distance and no crossing over obstacles. Next, the connection between the new tapping point (vj′) and the right node of vj is rerouted in the same way. This process produces more than two bend points for the detouring wires.

Phase IV) The new tapping point should be adjusted when it is placed on a bend. A bend cannot be a tapping point since a tapping point must have two child nodes (vr, vl). If a new tapping point is not a branch point, all the connections of the bends are searched to find the actual new tapping point.
Another case of adjustment is needed when a detouring wire overlaps an existing wire.

Type III: Obstacle with one tapping point

For example, in Figure 11, a tapping point vj has a root node (vR), a left child node (vl), and a right child node (vr). The clock tree should be reconstructed to avoid this obstacle-crossing problem, since the CLE routing has connected these three points on an obstacle. Figure 11 shows the steps of the CTP algorithm for detouring one tapping point. In the one tapping point case, the obstacle-avoiding routing moves the tapping point outside the obstacle following the rule of the simple sweep-up, until there is no crossing on the obstacle. First, the new-root node (vR′), new-left node (vl′), and new-right node (vr′) are determined by searching for points in between the boundary of the obstacle and each of the nodes vR, vl and vr. Next, new bends are determined for the detouring wires to avoid the obstacle-crossing. Finally, new-root (vR′) becomes the new tapping point (vj′). This process creates two bends (b1 and b2) for the connection between the new tapping point (vj′) and new-right (vr′), and two bends (b3 and b4) for the connection between the new tapping point (vj′) and new-left (vl′).

[Figure 11: Changing tapping point scheme for one tapping point vj with root vR and children vl and vr: (a) Phase I: find tapping point on an obstacle; (b) Phase II: relocate tapping point to vj′ (with vR′, vl′ and vr′); (c) Phase III: find detour wires through bends b1, b2, b3 and b4.]
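Phase II and the first step of Phase III for a single tapping point can be sketched as follows. The sketch assumes that the root and both children lie outside the obstacle, that each is joined to vj by a straight axis-parallel wire, and that a clearance of one grid line is sufficient; `push_outside` and the coordinates are hypothetical, and the detour bends themselves are not computed here.

```python
def push_outside(p, toward, obstacle, clearance=1):
    """Slide point p along its axis-parallel wire towards `toward`
    until it sits just outside the obstacle (x_lo, y_lo, x_hi, y_hi)."""
    x_lo, y_lo, x_hi, y_hi = obstacle
    x, y = p
    if toward[0] == x:  # vertical wire
        return (x, y_hi + clearance) if toward[1] > y else (x, y_lo - clearance)
    if toward[1] == y:  # horizontal wire
        return (x_hi + clearance, y) if toward[0] > x else (x_lo - clearance, y)
    raise ValueError("wire between p and `toward` must be axis-parallel")


obstacle = (2, 2, 8, 8)
vj = (5, 5)                               # tapping point on the obstacle
vR, vl, vr = (12, 5), (5, 0), (5, 10)     # root, left child, right child

vj_new = push_outside(vj, vR, obstacle)   # relocated towards the root side
vl_new = push_outside(vj, vl, obstacle)   # where the left wire leaves the obstacle
vr_new = push_outside(vj, vr, obstacle)   # where the right wire leaves the obstacle
print(vj_new, vl_new, vr_new)
```

Relocating towards the root side follows the preference stated in Phase II; the remaining work of Phase III is to connect vj_new to vl_new and vr_new around the obstacle corners.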

Note that Phase IV is necessary only when the new tapping point (vj′) is not a branch point. Figure 12(a) shows a subtree having a tapping point (vj). It has the same shape as the example in Figure 11(a), but in this case vR is at the top of the obstacle and vl is on the left side of the obstacle. Now, we apply the changing tapping point scheme. First, it finds new-root (vR′), new-left (vl′), and new-right (vr′), as shown in Figure 12(b). It then chooses the bends around the obstacle and connects them (see Figure 12(c)).

However, if the distance from vl′ to vr′ is shorter than the distance from vR′ to vr′, or if a wire is already routed along the detouring wire (here, the detouring wire between vR′ and vr′), then an alternate detouring wire can be used. The detouring wire between vl′ and vr′ can reduce the path length and decrease the clock skew. If we take this bend for the new tapping point, we notice that the new tapping point (vR′) is placed at a bend point, which is not a branch point, as shown in Figure 12(d). Adjusting the tapping point finds the right place for the new tapping point, which is determined as one point on the connections of bends. In this example, vl′ becomes the new tapping point, vj′.

[Figure 12: Changing tapping point scheme with adjustment, for tapping point vj with root vR and children vl and vr: (a) Phase I: find tapping point on an obstacle; (b) Phase II: relocate tapping point; (c) Phase III: find detour wires; (d) Phase IV: adjust tapping point.]

Type IV: Obstacle with more than one tapping point

More than one tapping point may exist on an obstacle, and obstacle-crossings occur more often as the size of the obstacle becomes larger. The procedure is more complicated because all tapping points need to be relocated. Using the changing tapping point algorithm for one tapping point described previously, problems with two or more interconnected tapping points can be resolved by applying that algorithm recursively, as shown in Figure 13. First, it finds all new tapping points: vi′, vj′ and vk′. Each tapping point is relocated by the rule of the one tapping point case until there is no tapping point on the obstacle. Next, it finds all new-left and new-right nodes for all the tapping points: vil′ and vir′ for vi′, vjl′ and vjr′ for vj′, and vkl′ and vkr′ for vk′. Note that vkl′, which is the left node of the new tapping point vk′, and vj′, which is the new tapping point of vj, are placed at the same point, and that vjl′ and vk′ also indicate the same point. All the bends for detouring are found using the sweep-up scheme. The detouring wire from vj′ to vk′ is shared, as shown in Figure 13(c).

[Figure 13: Changing tapping point scheme for tapping points vi, vj and vk on one obstacle: (a) Phase I: find tapping points on an obstacle; (b) Phase II: relocate tapping points; (c) Phase III: find detour wires.]

Another case of adjustment is needed when a detouring wire overlaps an existing wire. Figure 14(a) has a wire, ej (the connection between vj′ and vjr′), over an obstacle, which is an obstacle-crossing. However, it has nowhere to go. Three possible rerouting paths can be considered, as shown in Figure 14. One way to solve this problem is to route the wire in between the detouring wires and the obstacle boundary. Figure 14(b) shows the wire located on the left side of the obstacle boundary. This works without relocating other wires if the default rerouting distance is large enough. Thus, we assume that the design rule allows this relocation process if there is enough room for rerouting. However, if a certain distance must be kept between detouring wires, existing wires can be shifted. In this case of keeping the distance, which is one grid line in this example, the existing detouring wires should be relocated again. If the obstacle-crossing ej has been placed on the left side of the obstacle, then the connection from vir′ to vil′ and the tapping point vi′ should be rerouted to the left, as shown in Figure 14(c). Figure 14(d) shows the result of detouring when the obstacle-crossing ej has been placed on the right side of the obstacle.

[Figure 14: Rerouting of an obstacle-crossing wire ej between vj′ and vjr′: (a) obstacle-crossing ej; (b) no shifting; (c) shifting to left; (d) shifting to right.]

The POA routing algorithm returns a planar clock tree in which the obstacles have been treated. Note that neither the CLE routing nor the POA routing can guarantee optimal tree cost under the path delay model or the Elmore delay model. For a given planar clock tree, the buffer insertion/re-routing algorithm inserts the same number of buffers along every source-to-sink path in the tree, such that both skew and path delay are minimized. The delay calculation is based on the Elmore delay model. Buffers are adjusted towards the goal of minimum skew or bounded skew specified by the user; this helps to reduce the sensitivity of signal delay and skew to process variations [1].
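As a reminder of how the Elmore delay of an unbuffered RC tree is evaluated, the Python sketch below accumulates, for every wire on a source-to-sink path, the wire resistance times half of the wire capacitance plus all downstream capacitance. It is a generic textbook formulation, not the ACTD simulator; the `Node` class, the unit values r = c = 1 and the small tree are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Node:
    name: str
    wire_len: float = 0.0   # length of the wire from the parent to this node
    load: float = 0.0       # sink capacitance (0 for tapping points)
    children: List["Node"] = field(default_factory=list)


def subtree_cap(n: Node, c: float) -> float:
    """Capacitance at and below n, excluding the wire that feeds n."""
    return n.load + sum(c * ch.wire_len + subtree_cap(ch, c) for ch in n.children)


def elmore_delays(n: Node, r: float, c: float, upstream: float = 0.0,
                  out: Optional[Dict[str, float]] = None) -> Dict[str, float]:
    """Elmore delay from the root to every sink, with per-unit r and c."""
    out = {} if out is None else out
    delay = upstream + r * n.wire_len * (c * n.wire_len / 2 + subtree_cap(n, c))
    if not n.children:
        out[n.name] = delay
    for ch in n.children:
        elmore_delays(ch, r, c, delay, out)
    return out


s1 = Node("s1", wire_len=1, load=1)
s2 = Node("s2", wire_len=3, load=1)
tap = Node("tap", wire_len=2, children=[s1, s2])
root = Node("s0", children=[tap])

delays = elmore_delays(root, r=1.0, c=1.0)
skew = max(delays.values()) - min(delays.values())
print(delays, skew)
```

The unequal wire lengths below the tapping point produce a nonzero skew, which is the kind of imbalance that buffer insertion and adjustment are meant to reduce.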

4 Experimental Results

The CLE routing algorithm and the POA routing algorithm have been implemented in ANSI C on a SUN (Sparc 20). Both algorithms were tested on various example circuits. Table 2 shows the characteristics of the ten example circuits tested by the proposed algorithms. The first five circuits (ckt-1 to ckt-5) include 64, 100, 580, 1270 and 2016 sinks with randomly distributed locations in a 2000 by 2000 layout region; these five examples have 110, 100, 100, 215 and 230 randomly generated obstacles, respectively. The next five circuits (ckt-6 to ckt-10) include fewer sinks but more and bigger obstacles; these five circuits include 64, 256, 290, 512 and 1024 sinks with 55, 88, 200, 156 and 75 obstacles. The fourth column of Table 2 indicates the percentage of obstacle area. For instance, an obstacle area of 25% means that 1/4 of the total die size is occupied by obstacles. Note that extremely large obstacles and a large percentage of obstacle area may lead to many tapping points on obstacles. In this case, the POA routing may not find detour wires. We define this case as a Type V. For example, in Figure 14(a), if there is nowhere to place the obstacle-crossing wire ej, Phase IV will fail to adjust this detouring wire. In our experience, simulation results are promising when the percentage of obstacle area is below 35% and there are no extremely big obstacles. As shown in the fourth column of Table 2, the obstacle occupation ranges from 9.436% up to 32.939%. Table 3 shows the simulation results in terms of total wirelength and run time, for the planar obstacle-avoiding routing algorithm, for various instances and obstacle sizes. All sinks have an identical 0.5 pF loading capacitance, and the per-unit wire resistance and wire capacitance are 16.6 mΩ and 0.027 fF, as used in [2]. We assume that a certain region around obstacles is available for detour wires.
Figure 15 illustrates the execution of our algorithms, both the CLE routing and the POA routing, for ckt-2. Ckt-2 includes 100 sinks (marked with asterisks at terminal points of the tree) and 100 rectangular obstacles, as shown in Figure 15(a). Figure 15(b) shows that the entire routing region is recursively divided into subregions (partition-boundaries of subregions are indicated by thick dashed lines) until only one or two sinks lie within each subregion. The tree of terminal-routing and embedding is drawn with straight lines. The root of the clock tree, s0, is denoted by a triangle at the center-bottom of the die. Applying the first algorithm, the CLE routing algorithm, produces a non-planar tree with many overlaps, as shown in Figure 15(c). A non-planar clock tree must employ two or more routing layers to finish physical embedding; however, the skew, source-sink path delay, and routability of the clock net then become worse. Therefore, as described above, the POA routing algorithm is applied to construct the planar clock tree. Given a clock net generated by the CLE routing algorithm, the POA routing generates a planar clock tree using both the changing tapping point technique and the sweep-up scheme recursively. Figure 15(d) shows the planar clock tree constructed by the POA routing for ckt-2.
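The stopping rule of the recursive partitioning (one or two sinks per subregion) can be illustrated with the Python sketch below. The cutting-line selection rule of the CLE routing is defined earlier in the paper and is not reproduced here; this sketch swaps in a plain median split along the wider axis, which is purely an assumption, and the sink coordinates are made up.

```python
def partition(sinks, max_sinks=2):
    """Recursively split a sink set until each region holds at most max_sinks."""
    if len(sinks) <= max_sinks:
        return [sinks]
    xs = [p[0] for p in sinks]
    ys = [p[1] for p in sinks]
    # Assumed rule: cut across the wider extent, at the median sink.
    axis = 0 if max(xs) - min(xs) >= max(ys) - min(ys) else 1
    ordered = sorted(sinks, key=lambda p: p[axis])
    mid = len(ordered) // 2
    return partition(ordered[:mid], max_sinks) + partition(ordered[mid:], max_sinks)


sinks = [(1, 9), (3, 7), (2, 2), (4, 1), (8, 8), (9, 6), (7, 3), (9, 1)]
regions = partition(sinks)
print(regions)
```

Each leaf region produced this way would then be handed to terminal routing, and adjacent regions merged bottom-up as described in Section 2.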

Table 2: Ten circuits tested by the proposed algorithms

Circuits  # of sinks  # of obstacles  % of obstacle area  # of type I  # of type II  # of type III  # of type IV  # of type V
ckt-1     64          110             17.034              23           7             8              0             0
ckt-2     100         100             18.999              29           30            13             0             0
ckt-3     580         100             11.577              29           43            27             4             0
ckt-4     1270        215             16.531              78           86            67             6             1
ckt-5     2016        230             9.439               53           110           81             10            0
ckt-6     64          55              32.939              12           16            11             2             0
ckt-7     256         88              27.841              21           40            26             6             0
ckt-8     290         200             21.343              54           68            32             6             0
ckt-9     512         156             21.046              44           44            45             5             0
ckt-10    640         75              22.998              14           50            28             10            0

Table 3: Total wirelength and run time for planar obstacle-avoiding routing algorithm, for various instances and obstacle sizes (0.2cm x 0.2cm die size).

Circuits  Total wire length (cm)  Longest path length (cm)  Path length skew (cm)  Run time (sec)
ckt-1     2.431                   0.174                     0.061                  1.335
ckt-2     3.337                   0.208                     0.074                  2.433
ckt-3     8.020                   0.249                     0.093                  11.951
ckt-4     12.080                  0.179                     0.072                  111.666
ckt-5     15.457                  0.196                     0.068                  116.150
ckt-6     2.452                   0.202                     0.096                  0.566
ckt-7     5.122                   0.232                     0.070                  3.417
ckt-8     5.766                   0.244                     0.123                  17.350
ckt-9     7.421                   0.223                     0.106                  20.500
ckt-10    8.235                   0.312                     0.105                  6.900

[Figure 15: Execution of the CLE routing and POA routing for ckt-2: (a) ckt-2 with 100 sinks and 100 obstacles; (b) the result of partitioning; (c) the CLE routing solution; (d) the POA routing solution.]

5 Conclusion

We have presented a complete planar clock routing system called ACTD, which is implemented by employing heuristic techniques. The ACTD system is an interactive design tool that extends the capabilities of existing VLSI physical design tools for practical usage. Our first algorithm, called CLE routing, is a new algorithm for constructing a planar clock tree; it resolves overlapping in the clock net routing and prevents crossings in the clock net. The second algorithm, called POA routing, reconstructs the clock tree using heuristics to avoid obstacles, including the scheme of changing tapping points. The main contribution of this work is a novel algorithm to construct a planar clock tree, which can be implemented on a single metal layer. We have validated our new techniques experimentally using test circuits. We expect this clock routing algorithm to be useful for performance enhancement of synchronous VLSI digital systems. Even though the simulator in the buffer insertion/re-routing stage of the ACTD will insert and adjust buffers to reduce the skew and delay, the locations of the tapping points are still important. Poor initial locations of the tapping points, and the relocation of these tapping points, could cause considerably more difficulty later in the process because of the unbalanced clock tree. Our current implementation constructs a planar clock routing in the presence of obstacles while allowing path length skew. Future work therefore includes extending the CLE routing and POA routing approach to produce well-balanced clock trees; extending our present results to the minimum skew goal is an intriguing direction.

References

[1] W. Li and D. Zhou, "An Effective Buffer Insertion Algorithm for High-Speed Clock Network," to be submitted to IEEE International Conference on Computer-Aided Design, 1999.
[2] J. Cong, Z. Pan, L. He, C. Koh, and K. Khoo, "Interconnect Design for Deep Submicron ICs," Proceedings International Conference on Computer-Aided Design, pp. 478-485, 1997.
[3] Semiconductor Industry Association, National Technology Roadmap for Semiconductors, 1997.
[4] A. Kahng and C. Tsao, "More Practical Bounded-Skew Clock Routing," Proceedings of 34th ACM/IEEE Design Automation Conference, pp. 594-599, June 1997.
[5] Q. Zhu and W. Dai, "Planar Clock Routing for High Performance Chip and Package Co-Design," IEEE Transactions on Very Large Scale Integration Systems, Vol. 4, No. 2, June 1996.
[6] J. Xi and W. Dai, "Useful-Skew Clock Routing With Gate Sizing for Low Power Design," Proceedings of 33rd ACM/IEEE Design Automation Conference, pp. 383-388, June 1996.
[7] J. Cong, L. He, C. Koh, and P. Madden, "Performance optimization of VLSI interconnect layout," INTEGRATION, the VLSI Journal 21, August 1996.
[8] J. Xi and W. Dai, "Buffer insertion and sizing under process variations for low power clock distribution," Proceedings of 32nd ACM/IEEE Design Automation Conference, June 1995.
[9] D. Huang, A. Kahng, and C. Tsao, "On the bounded-skew clock and steiner routing problems," Proceedings of 32nd Design Automation Conference, pp. 508-513, 1995.
[10] C. J. Alpert and A. B. Kahng, "Multiway Partitioning Via Geometric Embeddings, Orderings, and Dynamic Programming," IEEE Transactions on Computer-Aided Design, vol. 14, pp. 1342-1357, Nov. 1995.
[11] A. Kahng and G. Robins, On Optimal Interconnections for VLSI, Kluwer Academic Publishers, 1995.
[12] A. Kahng and C. Tsao, "Low-cost single-layer clock trees with exact zero Elmore delay skew," Proceedings IEEE International Conference on Computer-Aided Design, pp. 213-218, 1994.
[13] D. Zhou, S. Su, F. Tsui, D. Gao and J. Cong, "A simplified synthesis of transmission lines with a tree structure," International Journal Analog Integrated Circuits Signal Processing, pp. 19-30.
[14] D. Zhou, S. Su, F. Tsui, D. Gao and J. Cong, "A two-pole circuit model for VLSI high-speed interconnection," Proceedings of IEEE International Symposium on Circuits and Systems, pp. 2129-2132, 1993.
[15] D. Gao and D. Zhou, "Propagation delay in RLC interconnection networks," Proceedings of IEEE International Symposium on Circuits and Systems, pp. 2125-2128, 1993.
[16] D. Zhou, F. Tsui and D. Gao, "High performance multichip interconnection design," Proceedings of 4th ACM/SIGDA Physical Design Workshop, pp. 32-43, 1993.
[17] M. Edahiro, "A Clustering-Based Optimization Algorithm in Zero-Skew Routings," Proceedings of 30th ACM/IEEE Design Automation Conference, pp. 612-616, 1993.
[18] R. Tsay, "Exact zero skew," IEEE Transactions on Computer-Aided Design, vol. 12, no. 2, pp. 242-249, Feb. 1993.
[19] Q. Zhu and W. Dai, "Perfect-balance Planar Clock Routing with Minimal Path-length," IEEE International Conference on Computer Design, pp. 473-476, 1992.
[20] A. Kahng, J. Cong, and G. Robins, "High-Performance Clock Routing Based on Recursive Geometric Matching," Proceedings of 28th ACM/IEEE Design Automation Conference, pp. 322-327, June 1991.
[21] M. Jackson, A. Srinivasan, and E. Kuh, "Clock Routing for High-Performance ICs," Proceedings of 27th ACM/IEEE Design Automation Conference, pp. 573-579, June 1990.
[22] J. Fishburn, "Clock skew optimization," IEEE Transactions on Computers, 39(7), pp. 945-951, 1990.
[23] J. L. Burns and A. R. Newton, "Efficient Constraint Generation for Hierarchical Compaction," IEEE International Conference on Computer Design, pp. 197-200, 1987.
[24] T. Hu and M. Shing, "A Decomposition Algorithm for Circuit Routing," VLSI Circuit Layout: Theory and Design, pp. 144-152, IEEE Press, NY, 1985.
[25] S. Dhar, M. Franklin, and D. Wang, "Reduction of Clock Delays in VLSI Structures," Proceedings of IEEE International Conference on Computer Design, pp. 778-783, October 1984.
[26] B. Krishnamurthy, "An improved min-cut algorithm for partitioning VLSI networks," IEEE Trans. Computer, vol. C-33, pp. 438-446, May 1984.
[27] S. B. Akers, "On the Use of the Linear Assignment Algorithm in Module Placement," Proceedings of 18th ACM/IEEE Design Automation Conference, pp. 137-144, June 1981.
