TWO-DIMENSIONAL DATAPATH REGULARITY EXTRACTION
Raymond X.T. Nijssen and Jochen A.G. Jess
Design Automation Section/ES Eindhoven University of Technology, The Netherlands
[email protected]
ABSTRACT
This paper presents a new method to automatically extract regular structures from logic netlists containing datapath circuitry. The goal of datapath extraction is to exploit structural regularity to efficiently obtain regular placements, which are typically more compact. Datapaths constitute increasingly sizeable parts of ever more and larger circuits; hence flexible, technology-independent layout tools, unlike state-of-the-art datapath compilers, will become critical in the design flow. Our method transforms a circuit's existing functional hierarchy, if any, into a 2-dimensional hierarchy that is more suitable for subsequent cell placement, thereby also automating the currently mostly handcrafted task of selective partial hierarchy flattening. Once the two-dimensional structure is known, the remaining placement task is reduced to arranging just one row and one column of the discovered matrix-like structure, allowing much larger circuits to be placed in one go. Experiments show superior extraction results compared to existing approaches.
1 INTRODUCTION
Bitwise parallelism has become the predominant technique in the design of datapaths in high performance data processing circuits. Due to the repetition of per-bit operators across the width of the data representation, both the interconnect structure and the component geometries of datapath circuitry are inherently regular. These effects can be exploited to obtain high density layouts as reported in [5] and in other publications. However, the two current mainstream placement methods, Gordian/Domino [9] and TimberWolfSC [13], are fundamentally unable to take advantage of this structural regularity because these methodologies are based on optimizing objective functions in which regularity cannot be expressed. Several fully tailored datapath synthesis environments, called datapath compilers, using specialized standard cell libraries like [4] have been developed to answer this need.
These systems explicitly put in regularity at the logical level and deal with this information in a very explicit manner throughout the entire design flow down to the layout phase, using specialized tools and dedicated cells. An important drawback is that such separate, technology-dependent design flows for certain parts of the circuit not only add cost and integration overhead, but also seriously decrease generality and flexibility. Furthermore, while such dedicated tools yield dense layouts of fully regular circuitry, they rapidly perform worse as the circuit becomes less regular, causing considerable area waste due to their limited flexibility [12]. This leaves a large class of circuits which would benefit from a regular placement with the same cell library as in the rest of the circuit, but for which neither dedicated systems nor general layout systems produce satisfying solutions.
Among the first to address this open field, Odawara [11] proposed a methodology which was later refined [8][2][3]. This method is based on improving placement by searching logical designs for a structural feature typical of datapaths, namely bit-latches repeated over all bits and connected via the same terminal type at each latch to one common net. A cell cluster, called a location macro, is then grown around each such group so as to serve as a placement initializer for subsequent conventional standard cell layout generation. A similar method [14] uses primary outputs instead of latch chains, attempting to find strongly connected subcircuits called cones in which all cells have a path to the same primary output. While both approaches yield some gain in terms of density and run-time over general placement, they are fundamentally unable to fully extract datapath regularity because they disregard the essentially two-dimensional nature of this feature. Consequently, the resulting placements are still not nearly as regular as those produced by dedicated datapath synthesis tools, hence the potential benefits of datapath regularity are only partially exploited. So far, no method is known to us from literature that is capable of extracting this two-dimensional structure, which is needed for generating truly regular placements similar to those made by dedicated design flows.
This research was supported by ESPRIT BRA 6855 LINK.
2 CONTRIBUTIONS OF THIS PAPER
We have developed a fast and efficient technique to automatically extract the two-dimensional datapath regularity from circuit netlists, enabling explicit placement of the extracted circuitry that is as regular as that produced by fully dedicated systems. While performing the extraction, the netlist is decomposed to form a new hierarchy that matches placement criteria much more closely than the usual functional hierarchy-implied locality, if available, that is used by most other placement methods. The target hierarchy is based on both the interconnect structure and the physical geometry of the cells, namely blocks with discovered regularity, a glue logic part and any number of large hard macros. Figure 1 illustrates the effect of this transformation on the floorplan.
Figure 1. Circuit hierarchy transformation (block labels: CONTROLLER, REGISTER, ALU, MULTIPLIER)
Note that this transformation, selective hierarchy flattening so as to obtain macros containing a suitable number of more or less related cells, is still mostly carried out by hand. The method described in this paper automates this task. Unlike the other approaches mentioned, the regular parts of circuitry generated by conventional non-dedicated synthesis tools are placed regularly, hence densely. Moreover, the transformed hierarchy reduces the solution space of the placement, allowing much larger circuits to be placed in one go than without extraction. In addition, regular placement of regular structures is known to facilitate accurate clock and data skew control. Note that our approach is not technology dependent like dedicated datapath compilers, which enables seamless integration of regular and non-regular circuitry, thus helping to prevent wasted area. Furthermore, we believe that datapath placement using standard cells, as opposed to specialized cells, will pay off even more as more routing layers become available.
As depicted in figure 2, the proposed regularity extraction and hierarchy transformation method is an add-on which is plugged into a conventional design flow after the logical netlist generation and before the layout phase. The regularity extraction effectively performs a multi-decomposition of the circuit, yielding a restructured netlist as well as the discovered 2-dimensional structure, if any.
The remainder of this paper is organized as follows: The next section provides some necessary terms and preliminaries on datapath regularity. Section 4 formulates the extraction problem. The modeling of datapath regularity we use is presented in section 5.
Section 6 introduces a metric quantifying the extent of regularity of the circuit surrounding a partially reconstructed datapath. This metric guides the search-wave used by the regularity extraction algorithm presented in section 7 to expand into the most regular extension. Experimental results are presented in section 8. Finally, conclusions and remarks are given in section 9.
Figure 2. Extended Design Flow (block labels: EDIF Portable Netlist, EXTRACT & TRANSFORM, EDIF Portable Netlist, 2-D alignment, Horizontal LIN. ARR, Vertical LIN. ARR, SCRIPT Cell + Conn Placement Seeds, Place & Route)
3 PRELIMINARIES
Datapath cell-placement essentially maps the structural regularity onto topological regularity by cell alignment in 2 directions. Figure 3 shows a part of a fully regular 4-bit wide datapath. Cells associated with the same bit-slice are lined up horizontally. At the same time, the highly similar if not identical bit slices of the datapath are stacked alongside each other. Perpendicular to the slices, cells of the same type occurring at similar places in all slices form a datapath stage. The circuit is thus fitted onto a matrix of rectangular buckets containing the cells, where each slice coincides with a row, and each stage coincides with a column. The fact that all cells in a stage have the same type, and hence the same form, guarantees zero cell width variation per column. At the same time, as standard cells occupy only one row of transistors, the height variation within the rows is also negligible. Together, both properties establish a high degree of geometrical regularity, yielding maximum density cell placement. Note that the above properties are also found in many other popular layout styles.
Crucial to this approach, in addition to geometrical regularity, the interconnect regularity of datapaths has the following property: (almost) all nets running through the matrix are fully contained either within one slice or within one stage. This is caused by the perpendicularity of data and control flows.
Figure 3. An ideally aligned datapath (axis labels: slices, stages, rowheight, columnwidth; flow labels: DATAFLOW, CONTROL; cell types t1 to t4)
Because of this orthogonality of the interconnect structure, the composition of each column is not affected by swapping rows in the matrix, and likewise for swapping columns. Under this interconnect orthogonality condition, ordering optimizations of rows and columns are mutually independent tasks that can be carried out in separate steps. The more nets violate this condition, the less valid it becomes to treat column and row orderings independently.
Considering that ordering only one row and one column directly yields the complete relative placement of the entire matrix, the complexity of placement of datapath circuitry is reduced very significantly: from one general 2-dimensional placement problem involving all datapath cells at the same time, to two independent and much smaller linear arrangement problems of just one single row and one single column. While the problem of regular placement remains NP-complete, the problem size is reduced by many orders of magnitude compared to the general placement task that would otherwise have to be carried out. Even for the small circuit in figure 3, the reduction ratio amounts to [...]. This vast reduction of the placement problem size clearly allows much larger circuits to be placed in one go. Simplifying the task even further, the typical interconnect between the slices is such that ordering the slices with respect to each other is mostly straightforward. The linear arrangement problem under various constraints is a well known problem from literature, e.g. [7][10], [1] or [6], and is therefore not elaborated on in this paper.
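As a rough illustration of the size reduction described above, the following Python sketch compares the number of orderings of all cells of a slices-by-stages matrix treated as one flat problem with the number of orderings when one row and one column are arranged independently. Counting plain permutations is a simplification chosen here for illustration only; the function names and the 4-by-8 example dimensions are assumptions and are not figures taken from this paper.

```python
from math import factorial


def general_arrangements(slices: int, stages: int) -> int:
    """Orderings of all slices*stages cells when placed as one flat problem."""
    return factorial(slices * stages)


def regular_arrangements(slices: int, stages: int) -> int:
    """Orderings when one column (slice order) and one row (stage order)
    are arranged independently of each other."""
    return factorial(slices) * factorial(stages)


def reduction_ratio(slices: int, stages: int) -> int:
    """Ratio between the two search-space sizes (always an integer)."""
    return general_arrangements(slices, stages) // regular_arrangements(slices, stages)


if __name__ == "__main__":
    # Hypothetical example: a 4-bit wide datapath with 8 stages.
    print(reduction_ratio(4, 8))
```

Even under this crude count, the ratio grows factorially with the matrix size, which is consistent with the claim that the orthogonality condition shrinks the placement problem by many orders of magnitude.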
4 PROBLEM FORMULATION
To be able to perform datapath placement in the above way, the membership of each cell belonging to the datapath in both one slice and one stage must be known. This information is generally completely or largely unavailable, inconsistent or even, placement-wise, unreliable in most design flows. It must therefore be extracted automatically from the netlist before placement. This task results in a decomposition of the circuit into
- a number of datapath chunks containing regular circuitry,
- glue logic including insufficiently regular circuitry,
- large and/or prefabricated macros like memory blocks.
In addition, the extracted regularity is described by two decompositions of the datapath part, namely a set of stages and a set of slices. Figure 4 shows the result of this process. The main objective of this process is to maximally satisfy the orthogonality condition and regularity goals. This objective implies that all extracted bit slices will be identical or highly similar.
Figure 4. Regularity induced circuit hierarchy transformation (labels: unexpandable (leaf) cell, expandable module, stages, slices)
5 REGULARITY MODELING
A circuit is given as a set of modules $M$, nets $N$ and pins $P$. Each module $m \in M$ is an instantiation of a module-type $t$. Each pin $p = (m, n)$ instantiates a terminal-type $T(p) = \tau_k$ of the module-type of $m$. Importantly, any terminal-type uniquely belongs to one single function-type. For convenience, let $E = M \cup N$ be the set of circuit entities. The desired datapath regularity information of the circuit can be fully described by two separate decompositions of $M$ and $N$ into a number of stage sets $S_i$ of entities and a number of slice sets $B_j$. Any entity occurs in exactly one slice set and exactly one stage set at the same time.
Figure 5. Part of a datapath (slices b1 to b3; modules m1 to m4 and s1; nets n1 to n15; pins p1 to p14; module types t1 to t6; terminal types τ1 to τ3)
Consider figure 5, showing a part of an example datapath circuit and a reference stage $S_r = S_1$. The pairs of nets $\{n_1, n_2\}$, $\{n_3, n_4\}$ and $\{n_5, n_6\}$ are each in a distinct slice, $B_1$, $B_2$ and $B_3$, respectively. Together, they form stage $S_1 = \{\{n_1, n_2\}, \{n_3, n_4\}, \{n_5, n_6\}\}$, outlined in the figure. Note that a slice within a stage is actually a set of entities. The function $\sigma : E \to \mathbb{N}$ returns the unique stage index $i$ of the entity which is in stage $S_i$. All entities that are not (yet) part of any stage are in the complement stage set $S_0$. Recall that for alignment, all modules in the same stage $S_i$ have the same type, but note also that there may be many other entities with that type outside $S_i$. Likewise, entities in $B_0$ are not (yet) a member of any slice, and the function $\beta : E \to \mathbb{N}$ returns the slice index of an entity. All entities are initialized with an undefined stage/slice membership: $S_0 = B_0 = E$.
Our modeling is founded on the observation that datapath regularity in a circuit is an essentially relative notion, in the sense that it is expressed in terms of certain attributes of the interconnect structure between the entities in a current reference stage that is known to be regular, referred to as $S_r$, and entities outside $S_r$. Many different sorts of netlist attributes that to some extent characterize datapath regularity can be distinguished. The most obvious attribute is the terminal-type associated with a pin between two entities. In addition, the degree of the adjacencies, their use (e.g. signal flow direction), and possibly even explicit annotations concerning buses may be used if available. The set of attributes used for characterizing datapath regularity of a connection $p = (e_1, e_2)$ between entities $e_1$ and $e_2$ is generally called the regularity signature or RS of $p$, denoted $\mathrm{RS}(p)$. Using more characteristics may help to reduce certain ambiguities, thereby possibly increasing the amount of regularity found.
In practice, using only terminal-type attributes already provides sufficient information to extract almost all regularity present in a netlist, hence $\mathrm{RS}(p) = T(p)$. This is because some of the other attributes are partially implied by $T$. Nonetheless, the description in the next section is kept general so as to accommodate more comprehensive signatures in the model.
The extent of regularity between $S_r$ and its adjacent entities that are connected via incidences of the same RS is now determined by the nature of the statistical distribution of the frequencies of that RS over the slices of $S_r$. A uniform distribution corresponds to maximum regularity. For example, in figure 5, the incidences via the same terminal-type $\tau_2$ to modules adjacent to those in $S_r$ are clearly uniformly distributed over its slices $B_1$, $B_2$ and $B_3$. Hence, the modules $m_2$, $m_3$ and $m_4$, which are of the same type $t_4$ associated with $\tau_2$, have a regular interconnect structure towards $S_r$. The same obviously does not apply to the module of type $t_5$.
Entity types like $t_4$ are called multi-slice types. These essentially form datapaths if the associated entities $m_2$, $m_3$ and $m_4$ are repeated over the width of the datapath. Alternatively, a regular distribution may be due to a single multi-slice entity that is incident to multiple slices, like net $n_7$ or a multi-slice module. In that case, it may be a non-expandable block such as a hard macro, so its slice membership will be left undetermined, $e \in B_0$, while it does belong to some stage. Otherwise, the multi-slice entity may even contain another datapath block, which may also be considered by expanding it into the current circuit level. Finally, the two decompositions $\sigma$ and $\beta$ of $E$ can thus be inferred by repeatedly composing new reference stages if they are sufficiently regular.
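The entities and membership functions introduced in this section can be mirrored in a small data structure. The following Python sketch is illustrative only: the class names `Pin` and `Circuit`, the use of index 0 for an undefined stage or slice, and the module names `u1` and `u2` are assumptions made for this example, and the regularity signature is taken to be the terminal-type alone.

```python
from dataclasses import dataclass, field

UNDEFINED = 0  # assumed index of the complement sets (no stage / no slice yet)


@dataclass(frozen=True)
class Pin:
    name: str
    module: str          # the module this pin belongs to
    net: str             # the net this pin is attached to
    terminal_type: str   # the terminal-type instantiated by the pin


@dataclass
class Circuit:
    pins: list
    stage_index: dict = field(default_factory=dict)  # entity -> stage index
    slice_index: dict = field(default_factory=dict)  # entity -> slice index

    def entities(self):
        """All circuit entities: the union of modules and nets."""
        return {p.module for p in self.pins} | {p.net for p in self.pins}

    def stage_of(self, entity):
        return self.stage_index.get(entity, UNDEFINED)

    def slice_of(self, entity):
        return self.slice_index.get(entity, UNDEFINED)

    def is_known(self, entity):
        """Known means that a stage or a slice has been assigned."""
        return self.stage_of(entity) != UNDEFINED or self.slice_of(entity) != UNDEFINED

    def signature(self, pin):
        """Regularity signature, here simply the terminal-type."""
        return pin.terminal_type


# Hypothetical fragment: two pins with terminal-type tau1 on nets n1 and n2.
example = Circuit(pins=[
    Pin("p1", "u1", "n1", "tau1"),
    Pin("p2", "u2", "n2", "tau1"),
])
```

Every entity starts out undefined, matching the initialization described above; assigning `example.stage_index["n1"] = 1` and `example.slice_index["n1"] = 1` would record net `n1` as a member of the first stage and the first slice.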
6 LOCAL REGULARITY METRIC
In order to quantify the extent of regularity between the bitslices in the current reference stage $S_r$ and the RSes of the connections to the neighborhoods of all members of $S_r$, a numerical relation between the occurrence of each RS in the neighborhood and each bitslice in this reference stage is formulated. Only entities in these neighborhoods that are not yet known are considered. An entity $e$ is said to be known if at least its membership of a slice or of a stage is known, so $e \notin S_0 \lor e \notin B_0$.
The analysis proceeds as follows: Suppose entity $e$ is known to be in stage $S_r$. Let $P(e)$ denote the set of pins of $e$; then the set of RSes adjacent to $S_r$ is given by
\[ \Sigma(S_r) = \bigcup_{e \in S_r} \; \bigcup_{p \in P(e)} \mathrm{RS}(p) \]
Next, we use the function
\[ \Gamma(S_r) : Z(S_r) \times \Sigma(S_r) \to 2^{P} \]
to express the connectivity structure between the slices in $S_r$ and the RSes, where
\[ Z(S_r) = \bigcup_{e \in S_r} \{ \beta(e) \} \]
is the set of slices that are found in the reference stage. $\Gamma(S_r, B_j, s)$ is formally defined as
\[ \Gamma(S_r, B_j, s) = \{\, p \in P(e) \mid e \in S_r \land e \in B_j \land \mathrm{RS}(p) = s \,\} \]
Thus $\Gamma(S_r, B_j, s)$ returns the set of pins attached to entities in slice $B_j$ of stage $S_r$ that have RS $s$. The complete bipartite graph in figure 6 depicts $\Gamma(S_1)$. The pinsets implementing the adjacency of the elements in the respective classes are shown on the edges. For example, $\Gamma(S_1, B_2, \tau_1) = \{p_6, p_7\}$ and $\Gamma(S_1, B_2, \tau_2) = \{p_8, p_9\}$.
Figure 6. Signature to bit-relation graph (signatures τ1, τ2, τ3 with module types t1, t4, t5; slices b1 = {n1, n2}, b2 = {n3, n4}, b3 = {n5, n6}; edge pinsets {p1, p2}, {p6, p7}, {p11, p12} for τ1; {p3, p4}, {p8, p9}, {p13, p14} for τ2; {p5}, {p10}, {} for τ3)
We can now quantify the extent of regularity of the neighborhood of $S_r$ with regard to RS $s$ by interpreting the distribution of the number of pins over the individual slices of $S_r$. We therefore construct a score vector $v(S_r, s)$ in which entry $v_i$ holds the connection count between bitslice $B_i$ of $S_r$ and signature class $s$, hence
\[ v_i(S_r, s) = \lvert \Gamma(S_r, B_i, s) \rvert \]
These counts may vary from many times in all slices to zero in all slices except one. A uniform distribution of all elements of $v(S_r, s)$ corresponds to maximum datapath regularity at stage $S_r$ with respect to the adjacent entities in $s$. For example, the perfect regularity between stage $S_1$ and the modules connected via $\tau_1$ in the circuit of figure 5 is reflected by
\[ v(S_1, \tau_1) = \big( \lvert\{p_1, p_2\}\rvert,\ \lvert\{p_6, p_7\}\rvert,\ \lvert\{p_{11}, p_{12}\}\rvert \big) = (2, 2, 2) \]
Conversely, regularity deviations are manifested by non-uniform distributions of $v$, like the incidences with $s = \tau_3$, since the three entries of $v(S_1, \tau_3)$ are 1, 1 and 0 (pinsets $\{p_5\}$, $\{p_{10}\}$ and $\{\}$ in figure 6).
In general, the uniformity of the distribution of $v$ decreases as the corresponding regularity decreases. The simplest uniformity measure of the number series $v$ is its range, defined as
\[ \mathrm{range}(v) = \max_{1 \le i \le n} v_i - \min_{1 \le i \le n} v_i \]
where $n$ is the number of entries of $v$. Considering only the range would already suffice if only fully regular relations needed to be detected. For instance, $\mathrm{range}(v(S_1, \tau_1)) = 0$ implies complete regularity, while $\mathrm{range}(v(S_1, \tau_3)) = 1$ indicates non-regularity. The range does not, however, provide much information about the extent of irregularity: two score vectors can have the same range while one of them is clearly preferable over the other. To be able to distinguish between large and smaller irregularities, we also consider the number $z$ of zero-entries in $v$ and the average of the vector, in a sum of weighted terms. We propose the following local relative regularity metric $\mu(S_r) : \Sigma(S_r) \to \mathbb{R}$ between stage $S_r$ and signature $s$, defined as follows:
\[ \mu(S_r, s) = w_z \, z\big(v(S_r, s)\big) + w_r \, \mathrm{range}\big(v(S_r, s)\big) + w_a \, \mathrm{avg}\big(v(S_r, s)\big) \]
where $w_z$, $w_r$ and $w_a$ are weight factors, chosen such that aliasing between the terms cannot occur. In our experiments, we used 10000, 100 and 1, respectively. The metric has the following important properties:
- $\mu(S_r, s)$ is inversely related to the extent of regularity from $S_r$ with respect to its adjacent cells with signature $s$.
- The value of $\mu(S_r, s)$ increases monotonically as the regularity decreases, so that maximally regular relations receive the lowest values and maximally irregular relations the highest.
These properties enable comparisons of the extent of regularity between different stages of the datapath and between different signatures, making the metric suitable for the regularity extraction algorithm described in the next section.
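To make the metric concrete, the following Python sketch computes a score vector from per-slice pinsets and combines the number of zero entries, the range and the average with the weights 10000, 100 and 1 reported in this section. It is a minimal sketch under stated assumptions: the three terms are assumed to be added, the function names are hypothetical, and the slice order chosen for the τ3 pinsets is an assumption, since only the pinsets themselves are given in figure 6.

```python
def score_vector(pinsets):
    """pinsets[i] is the set of pins between slice i of the reference
    stage and one signature; the score is the pin count per slice."""
    return [len(pins) for pins in pinsets]


def value_range(v):
    """Simplest uniformity measure: largest minus smallest entry."""
    return max(v) - min(v)


def regularity_metric(v, w_zero=10000, w_range=100, w_avg=1):
    """Weighted sum of zero count, range and average (lower is more regular).
    Adding the three terms is an assumption of this sketch."""
    zeros = sum(1 for count in v if count == 0)
    return w_zero * zeros + w_range * value_range(v) + w_avg * sum(v) / len(v)


# Pinsets for terminal-type tau1 as shown in figure 6.
v_tau1 = score_vector([{"p1", "p2"}, {"p6", "p7"}, {"p11", "p12"}])
# Pinsets for tau3; the assignment of pinsets to slices is assumed here.
v_tau3 = score_vector([{"p10"}, set(), {"p5"}])

if __name__ == "__main__":
    print(v_tau1, regularity_metric(v_tau1))
    print(v_tau3, regularity_metric(v_tau3))
```

With these weights a single empty slice dominates any difference in range, and any difference in range dominates the average as long as pin counts stay small, which is one way to read the requirement that the terms must not alias.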
7 REGULARITY EXTRACTION ALGORITHM
Our regularity extraction algorithm works by expanding search-waves through the network, stage by stage. It uses the relative regularity metric introduced in the previous section to determine how to expand the wave such that every expansion is as regular as possible, while remaining able to deal with a certain amount of non-regularity. Suitable initializing reference stages can be found using another characteristic of datapath circuitry present in a subset of candidate stages, namely the occurrence of many nets that satisfy the following condition: the net is connected once to one terminal-type, and multiple times (typically 4 or more) to one other terminal-type. For example, see net $n_7$ in figure 5. This datapath property is due to the fact that datapaths are formed by repeating bitwise operators that are operated in parallel. Such nets typically carry the control signals that apply to all bits of the datapath, like clock lines, enable lines, multiplexor address selectors, etc. Any such net with a high number of pins may induce a suitable first reference stage. Alternatively, the user may explicitly specify initial reference stages.
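The seed condition above can be checked with a simple count of terminal-types per net. The following Python sketch is an illustration under assumptions: nets are represented as lists of terminal-type names, the function names and the example net and terminal-type names are hypothetical, and the default minimum of 4 repetitions follows the "typically 4 or more" remark.

```python
from collections import Counter


def is_seed_net(terminal_types, min_repeat=4):
    """True if the net touches exactly two terminal-types: one of them
    once, and the other at least min_repeat times."""
    counts = sorted(Counter(terminal_types).values())
    return len(counts) == 2 and counts[0] == 1 and counts[1] >= min_repeat


def rank_seed_nets(nets, min_repeat=4):
    """Return the names of seed nets, those with the most pins first.
    nets maps a net name to the list of terminal-types of its pins."""
    seeds = [name for name, types in nets.items() if is_seed_net(types, min_repeat)]
    return sorted(seeds, key=lambda name: len(nets[name]), reverse=True)


# Hypothetical nets: a clock-like net, a wider enable net and two non-seeds.
example_nets = {
    "clk": ["buf.out"] + ["ff.clk"] * 4,
    "en": ["ctl.out"] + ["mux.sel"] * 8,
    "carry": ["add.cout", "add.cin"],
    "mixed": ["a.out", "b.in", "c.in", "c.in", "c.in", "c.in"],
}
```

Here `rank_seed_nets(example_nets)` lists `en` before `clk`, reflecting the preference for seeds that span many slices; `carry` fails the repetition threshold and `mixed` touches three terminal-types.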
Some particular non-regularities occur in most circuits, such as some bits, notably the MSB and LSB, that differ at some spots. The extraction algorithm will usually still be able to expand the wave, and since the wave often encloses irregular spots from several directions, entities in non-regular parts of the datapath will be fitted into a suitable position later. The algorithm is outlined below, where $\Gamma(S_r)$ is the pin relation of the reference stage $S_r$, $\mu$ is the regularity metric, and $Q$ is a priority queue:
1. Find as many seed-stages as possible.
2. If there are no (more) seeds, exit.
3. The seed-stage with the highest number of slices is selected as the first $S_r$ to start a search-wave.
4. Build $\Gamma(S_r)$.
5. Compute $\mu$ for every RS in $\Gamma(S_r)$.
6. If the threshold is not satisfied, goto 8.
7. Enter the pinset returned by $\Gamma(S_r)$ in queue $Q$, keyed by $\mu$.
8. If $Q$ is empty, goto 2.
9. Extract the pinset with the lowest $\mu$ from $Q$.
10. Create a new stage.
11. For each pin $p$ in the pinset: add the entity connected to $S_r$ via $p$ to the new stage, in the slice inherited via $p$ from the reference stage.
12. The new stage becomes the reference stage $S_r$.
13. Goto 4.
A number of threshold values can be used to control the expansion process. The algorithm does not work well if there are too few slices. A minimum datapath width of 4 slices seems sufficient. Waves that die before a configurable minimum number of stages is reached are discarded, etc. Also, if the extent of irregularity exceeds a tunable value, the current candidate stage will be dropped. The thresholds and weight factors can be set such that the algorithm will only find fully identical slices, if present.
A post-processing phase resolves undefined stage and slice tags for entities for which, a posteriori, a clear choice can now be made. This means that if, for a module, a slice and stage tag is induced by a majority of its environment, and if the conditions for it being part of a datapath are met, it will still be added to the datapath. Otherwise, it is identified as being part of insufficiently regular logic.
The run-time complexity of our algorithm is only $O(|P|)$, since every pin is analyzed at most once from a net and once from a module. Actual run times may even be significantly smaller, since the search-wave can never expand into non-regular areas of the circuit, hence no time is wasted in non-regular circuitry. The space complexity is basically determined by the number of candidate extensions in the wavefront, which takes just a very small amount of storage.
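Steps 4 to 13 of the outline above can be sketched as a best-first expansion driven by a priority queue. The following Python code is a simplified illustration and not the C++ implementation used for the experiments: connections are reduced to (entity, neighbour, signature) triples listed in both directions, the metric adds its three weighted terms, the threshold of 100 (which only admits candidates without empty slices and with zero range) is an assumed value, and an entity reached from several slices is given an undefined slice, written as `None`.

```python
import heapq
from collections import defaultdict
from itertools import count


def metric(v, w_zero=10000, w_range=100, w_avg=1):
    """Assumed form of the regularity metric: lower means more regular."""
    return w_zero * v.count(0) + w_range * (max(v) - min(v)) + w_avg * sum(v) / len(v)


def extract(pins, seed, threshold=100):
    """pins: (entity, neighbour, signature) triples, listed in both directions.
    seed: dict mapping each entity of the first reference stage to its slice.
    Returns the list of stages found, each a dict entity -> slice (or None)."""
    adjacency = defaultdict(list)
    for entity, neighbour, signature in pins:
        adjacency[entity].append((neighbour, signature))
    slices = sorted(set(seed.values()))
    known = dict(seed)                 # entities that already have a stage
    stages = [dict(seed)]
    queue, tie = [], count()           # tie-breaker keeps heap entries comparable
    reference = seed
    while True:
        # Steps 4-7: group unknown neighbours by signature and slice, score them.
        gamma = defaultdict(lambda: defaultdict(list))
        for entity, slice_id in reference.items():
            if slice_id is None:
                continue               # no slice to inherit from this entity
            for neighbour, signature in adjacency[entity]:
                if neighbour not in known:
                    gamma[signature][slice_id].append(neighbour)
        for per_slice in gamma.values():
            score = metric([len(per_slice[s]) for s in slices])
            if score <= threshold:
                heapq.heappush(queue, (score, next(tie), dict(per_slice)))
        # Steps 8-11: pop the most regular candidate that still adds entities.
        new_stage = {}
        while queue and not new_stage:
            _, _, per_slice = heapq.heappop(queue)
            reached_from = defaultdict(set)
            for slice_id, neighbours in per_slice.items():
                for neighbour in neighbours:
                    if neighbour not in known:
                        reached_from[neighbour].add(slice_id)
            new_stage = {e: (next(iter(s)) if len(s) == 1 else None)
                         for e, s in reached_from.items()}
        if not new_stage:
            return stages              # queue exhausted: this wave has died
        # Steps 12-13: the new stage becomes the reference stage.
        known.update(new_stage)
        stages.append(new_stage)
        reference = new_stage
```

For a three-slice chain in which each seed net `a0`, `a1`, `a2` drives a module `b0`, `b1`, `b2`, which in turn drives `c0`, `c1`, `c2`, the sketch returns three stages, while an extra module attached to `a0` alone is rejected by the threshold and stays outside the datapath. Seed selection, minimum-width checks and the post-processing phase are deliberately left out.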
8 RESULTS
We implemented our algorithm in C++. The regularity extraction and hierarchy transformation tool is part of a larger framework under development, aimed at performing fully automatic placement and routing in an environment where the designs are automatically generated from an abstract specification and where human intervention in the synthesis backend will no longer be a viable option. Input to the program is an EDIF netlist file, which may or may not be hierarchical. Figure 2 shows the relevant part of the system. The outputs are the restructured netlist in EDIF and a file describing the regularity found. The latter file can then be read by programs that perform both linear orderings. The partial placement is then supplied to a standard placement and routing backend to complete the placement. Since the emphasis of this paper is on regularity extraction, we do not elaborate on the two subsequent linear orderings.
Lacking a general way to quantify the success of regularity extraction for circuits that are not completely regular, we used indirect metrics indicating the usefulness of the results, namely the percentage of regular circuitry. Table 1 presents some results of the extraction algorithm on a number of examples. The times were measured on an HP9000/735 workstation. In all examples, we used the default values of all tunable parameters. The first 3 circuits were automatically generated by an HLS system. TTA and 8048 are microprocessor cores. Circuit CDFilter is a signal processor for CD audio. The remaining circuits come from the standard cell benchmark set. The percentage of the total number of cells assigned to the datapath by the extraction process should be interpreted with care, since some circuits include controllers and other non-regular circuitry. In most benchmark netlists, the identifiers are stripped, so that they do not provide information regarding the circuit's structure which could have served as a reference.
One notable advantage of a nonflattened circuit description is that it contains more multi-slice modules, which can be opened automatically and selectively to keep memory requirements low. Note that not all extracted slices have to be identical.

Circuit name            total cells  # regular cells found  regular %  # DP chunks  max width  time (sec)
wave digital filter            9180                   8041        87%            5         32         6.7
diffeq8                        1273                    855        67%            1          8         0.5
elliptic8                      1857                   1370        74%            2          8         1.1
register file                   730                    694        96%            2          8         0.4
i8048 w/ ctrl w/o mem           948                    333        35%            2          8         0.7
TTA-16                         6720                   5418        80%            9         16         5.6
CDFilter                       7218                   5088        70%           12         42         4.7
struct                         1888                   1879        99%            2         16         1.3
fract                           125                     72        58%            1          9         0.1
biomed                         6417                   5458        85%            1         20         1.9
avq-large                     25114                  18928        75%            1         16         8.0
avq-small                     21854                  16451        75%            1         16         7.2
Table 1. Datapath extraction results

9 CONCLUSIONS AND REMARKS
This paper presents a very fast new technique for datapath extraction. It is based on a new metric quantifying the extent of regularity between a known regular part of a datapath and its neighborhood. A search-wave guided by this metric constructs the rest of the datapath, regaining its 2-dimensional regular structure and allowing fast, dense and technology-independent placement of the datapath's cells. The algorithm can deal with large circuits that need not be fully regular.
Note that minor irregularities in datapath circuits, such as carry-lookahead logic per 4 bits in a 16-bit wide datapath, have only a minor effect on the usefulness of the results regarding subsequent placement, for two main reasons. The first reason is that the post-processing takes care of the larger part of these cases. Secondly, the remaining cells which might qualify for being placed in the datapath will, if sufficiently strongly connected to other cells which were already explicitly preplaced regularly, be pulled into the regularly placed area because of the wire-length reduction performed by the placement tool.
Finally, in the simplified layout model we used, subsequent placement is assumed to be row-based with only one single row of standard cells per bit slice. Clearly, in case of a very large number of datapath stages compared to the number of slices, the aspect ratio of the datapath matrix may become unfavorable with respect to the global floorplan of the chip. Allowing 2 or 3 rows per slice can greatly alleviate this effect while hardly affecting the advantages of regular placement generation. Alternatively, the datapath can be folded.
REFERENCES
[1] T. Asano. An optimum gate placement algorithm for MOS one-dimensional arrays. Journal of Design Systems, 6(1):1–27, 1982.
[2] H. Cai, S. Note, P. Six, and H. De Man. A data path layout assembler for high performance DSP circuits. In Proceedings of the Design Automation Conference, pages 306–311. ACM/IEEE, 1990. Paper 18.1.
[3] C.E. Cheng and C.-Y. Ho. Sefop: A novel approach to data path module placement. In Proceedings of the International Conference on Computer Aided Design, pages 178–181. IEEE, Nov 1993.
[4] Compass Design Automation. Compass Datapath Compiler, v8r3 edition, 1991.
[5] Marshburn et al. Datapath: a CMOS datapath silicon assembler. In Proceedings of the Design Automation Conference, pages 722–12. IEEE, 1986.
[6] C.M. Fiduccia and R.M. Mattheyses. A linear-time heuristic for improving network partitions. In Proceedings of the Design Automation Conference, pages 175–181, 1982.
[7] S. Goto, I. Cederbaum, and B.S. Ting. Suboptimum solution of the back-board ordering with channel capacity constraint. IEEE Transactions on Circuits and Systems, 24(11):645–652, Nov 1977.
[8] M. Hirsch and D. Siewiorek. Automatically extracting structure from a logical design. In Proceedings of the International Conference on Computer Aided Design, pages 456–459. IEEE, 1988.
[9] J.M. Kleinhans, G. Sigl, F.M. Johannes, and K.J. Antreich. Gordian: VLSI placement by quadratic programming and slicing optimization. IEEE Transactions on Computer Aided Design, 10(3):356–365, 1991.
[10] H. Nakao, O. Kitada, M. Hayashikoshi, K. Okazaki, and Y. Tsujihashi. A high density datapath layout generation method under path delay constraints. In Proceedings of the Custom Integrated Circuits Conference, pages 9.5.1–9.5.5. IEEE, 1993.
[11] G. Odawara, T. Hiraide, and O. Nishina. Partitioning and placement technique for CMOS gate arrays. IEEE Transactions on Computer Aided Design, CAD-6(3):355–363, May 1987.
[12] R. Leveugle and C. Safinia. Generation of optimized datapaths: bit-slice versus standard cells. IFIP Transactions A, A22:153–66, Sept. 1992.
[13] C. Sechen and K.W. Lee. An improved simulated annealing algorithm for row-based placement. In Proceedings of the International Conference on Computer Aided Design, pages 478–481, 1987.
[14] Y.-W. Tsay and Y.-L. Lin. A row-based cell placement method that utilizes circuit structural properties. IEEE Transactions on Computer Aided Design, 14(3):393–397, Mar 1995.