Chu ZF, Xia YS, Wang LY. Cell mapping for nanohybrid circuit architecture using genetic algorithm. JOURNAL OF COMPUTER SCIENCE AND TECHNOLOGY 27(1): 113–120 Jan. 2012. DOI 10.1007/s11390-012-1210-7
Cell Mapping for Nanohybrid Circuit Architecture Using Genetic Algorithm

Zhu-Fei Chu (储著飞), Student Member, IEEE, Yin-Shui Xia∗ (夏银水), and Lun-Yao Wang (王伦耀)

School of Information Science and Engineering, Ningbo University, Ningbo 315211, China

E-mail: [email protected]; {xiayinshui, wanglunyao}@nbu.edu.cn

Received January 25, 2010; revised September 26, 2011.

Abstract Nanoelectronics constructed by nanoscale devices seems promising for the advanced development of integrated circuits (ICs). However, the lack of computer aided design (CAD) tools seriously hinders its development and applications. To investigate the cell mapping task in CAD flow, we present a genetic algorithm (GA) based method for Cmos/nanowire/MOLecular hybrid (CMOL), which is a nanohybrid circuit architecture. By designing several crossover operators and analyzing their performance, an efficient crossover operator is proposed. Combining a mutation operator, a GA based algorithm is presented and tested on the International Symposium on Circuits and Systems (ISCAS) benchmarks. The results show that the proposed method not only can obtain better area utilization and smaller delay, but also can handle larger benchmarks with CPU time improvement compared with the published methods.

Keywords
nanohybrid circuit, cell mapping, genetic algorithm, optimization

1 Introduction
With the rapid development of nanotechnology, nanoelectronics has received much attention in recent years because of its potential for very high density and low fabrication cost[1-2]. A typical nanohybrid circuit architecture, developed by Likharev's group and named Cmos/nanowire/MOLecular hybrid (CMOL)[3], is well suited to memories[4-5], field programmable gate arrays (FPGAs)[6-7] and neuromorphic CrossNets[8]. However, the development of the corresponding computer aided design (CAD) tools lags far behind demand, and CMOL cell mapping is one of the key tasks in such tools.

When Strukov and Likharev first presented the CMOL FPGA, they performed the cell mapping manually for several simple, regularly structured Boolean circuits[7]. They also presented a reconfigurable architecture for CMOL FPGA that groups CMOL cells (e.g., 4 × 4 cells) into tiles[9]; the tiles can be treated like the clusters of a traditional FPGA, so the algorithms of existing cluster-based FPGA CAD tools[10] can be reused. However, that work did not solve the CMOL cell mapping problem for general cases either. Because it was restricted to the tile abstraction, it relied on a sufficient cell connectivity radius and the insertion of additional inverters for routing purposes. In [11], the authors solved the CMOL cell mapping problem via satisfiability and extended the method into a reconfiguration tool for various CMOL defects, but satisfiability generally does not scale well with the size of the problem. Kim et al. solved the problem by combining a force-directed placement algorithm with a Munkres gate assignment algorithm in a loop[12]; because the first step leaves many overlaps, a gate assignment step must follow. In our previous work[13], we adopted a simulated annealing based dynamic interchange method for cell mapping, which avoids overlaps but yields poor area and delay. Hence, research in this area is far from satisfactory.

In this paper, we revisit the problem as an optimization problem and use a genetic algorithm (GA) to minimize the area within reasonable CPU time. A preliminary conference version of this work presented a basic GA framework[14]. Here we extend that work by introducing several crossover operators to guide the optimization and by evaluating the proposed scheme more comprehensively. Experimental results show that the two-dimensional (2-D) crossover operator obtains better results than the other proposed operators. Correspondingly, the GA approach presented in this paper is more efficient and optimizes better than published results in terms of circuit size and CPU time.

The rest of the paper is organized as follows. Section 2 introduces the background of CMOL and the cell mapping formulation. Section 3 describes the framework in detail. Section 4 reports the experiments, and Section 5 concludes the paper.

Short Paper. This work is supported by the National Natural Science Foundation of China under Grant Nos. 61131001, 60871022, 61041001, Natural Science Foundation of the Zhejiang Province of China under Grant Nos. Z1090622, Y1080654 and K.C.Wong Magna Fund from Ningbo University. ∗Corresponding Author. ©2012 Springer Science + Business Media, LLC & Science Press, China

2 CMOL Principle and Cell Mapping

2.1 CMOL

The generic CMOL circuit cross-section view is shown in Fig.1. It consists of nanowires, nanodevices, interface pins and CMOS stacks. The nanodevices in CMOL can be any two-terminal nanodevices, e.g., a binary “latching switch” based on molecules with two metastable internal states. The interface pins connect the CMOS upper-level metals and the nanowires. The nanodevices are sandwiched between the two levels of perpendicular nano-imprinted nanowires. For example, there is a nanodevice at the intersection of nanowire α and the perpendicular nanowire β. This unique structure solves the problem of addressing denser nanodevices with sparser CMOS components. Each nanodevice is accessed through the two perpendicular nanowires connected to it. The nanowires are, in turn, connected to the CMOS circuits by interface pins of different heights. With N nanowires and pins, we can address $O(N^2)$ nanodevices.

Fig.1. Generic CMOL circuit cross-section view.

Strukov and Likharev proposed the CMOL FPGA idea to fully explore the regularity of the CMOL architecture[7]. Because the nanodevices are nonvolatile switches, they can be programmed to route signals from CMOS to the nanowires and nanodevices, and back to CMOS again. Logic functions are created by a combination of CMOS inverters and diode-like nanodevices. To further explore architectural regularity, they proposed a cell-like CMOS structure[9]. Fig.2 shows a CMOL FPGA array of size 3×4. Each cell contains a CMOS inverter. Nanowires aligned in one direction receive signals from the outputs of the CMOS inverters. Those nanowires are or'ed together with nanowires aligned in the other (orthogonal) direction according to the nanodevice configurations. The or'ed signal goes to the inverter's input, which is on the CMOS level. This or-not logic is the preferred implementation of the CMOL FPGA. For example, in Fig.2, A, B and F are three signals connected to the output pins of three cells. With the illustrated nanowire connections (nw1, nw2, nw3) and “ON” nanodevices (d1, d2), the logic expression is $F = \overline{A + B}$.

Fig.2. CMOL FPGA example configuration.
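As a small illustration of the or-not logic described above, the following Python sketch models one CMOL cell as a wired OR of the signals that reach it through “ON” nanodevices, followed by the cell's CMOS inverter. The function name `cmol_cell_output` and the boolean encoding are assumptions made for this example only.

```python
from itertools import product
from typing import Iterable


def cmol_cell_output(inputs: Iterable[bool]) -> bool:
    """Wired OR of all connected input signals, then inversion by the CMOS inverter."""
    return not any(inputs)


# Truth table for two connected inputs A and B, i.e. F = NOT(A OR B).
truth_table = {
    (a, b): cmol_cell_output([a, b])
    for a, b in product([False, True], repeat=2)
}
for (a, b), f in truth_table.items():
    print(int(a), int(b), int(f))
```

Only the row with A = 0 and B = 0 yields F = 1, which is the nor behavior that the cell mapping formulation relies on.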
According to [7], there are periodic breaks in the nanowire fabric, so each input/output nanowire has a fixed length determined by these breaks. Hence each CMOL cell can only be connected to a limited number of neighboring CMOL cells. The set of CMOL cells that can be connected to the input of a particular cell X is called the input connectivity domain of X. Similarly, the output connectivity domain of X is the set of cells that can be connected to the output of X.

CMOL FPGA has several variants, also called “CMOL cousins”. For ease of fabrication, Snider et al. introduced field-programmable nanowire interconnect (FPNI), an easier-to-implement version with larger interface pads[15]. Xia et al. recently reported the first experimental demonstration of FPNI[16]. Tu et al. introduced 3-D (three-dimensional) CMOL to double the crossbar density[17-18]. In this paper, we focus only on an architecture similar to that in [7], although the proposed method can be extended to CMOL FPGA variants such as FPNI.

2.2 CMOL Cell Mapping
Given any Boolean combinational circuit, we can easily transform the circuit into a netlist of nor gates, using any library-based technology mapping from any logic synthesis tool or the technique outlined in [11].
Mathematically, we can view the nor gate netlist as a directed acyclic graph (DAG) G = (V, E), where V is a set of vertices (each vertex corresponds to a nor gate in the netlist), and $E \subseteq V \times V$ is a set of directed edges. Given a collection of CMOL cells Ψ, we need to find a mapping from the vertices (nor gates) to the CMOL cells, $p: V \to \Psi$, such that

$$\forall i, j \in V:\ (i \neq j) \Rightarrow (p(i) \neq p(j)),$$

$$\forall (i, j) \in E:\ \mathrm{dist}(p(i), p(j)) \leq R,$$

where dist is the Manhattan distance between two CMOL cells and R is the radius of the connectivity domain.

The first constraint states that each gate must be mapped to one CMOL cell and that one CMOL cell can be occupied by at most one gate; we call this the one-and-only-one constraint. The second constraint reflects that each CMOL cell can only connect to a limited set of other CMOL cells, so each connected gate pair in the netlist should be mapped within each other's connectivity domain. Although inverter pairs (buffers) can be inserted to enlarge the connectivity domain[19], too many such buffers worsen the timing of the circuit, so this situation should be avoided as far as possible.

3 Procedure of GA Based CMOL Cell Mapping

3.1 Algorithm Overview

We present a GA[20-21] based CMOL cell mapping framework in this section. The algorithm is as follows.
Step 1: Initialize the population;
Step 2: Evaluate the fitness score of the population;
Step 3: Update the population by the crossover and mutation operators;
Step 4: Select the best individuals and keep the population size constant;
Step 5: If the stopping rule is not met, go to Step 2; otherwise continue;
Step 6: Check all connections in the netlist and apply buffer insertion to accomplish routing for gates that are mapped beyond the connectivity domain.
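The steps above can be sketched in Python as follows. This is only an illustrative skeleton and not the authors' C implementation: the function name `run_ga`, the callables `init`, `fitness`, `crossover` and `mutate`, the `patience` parameter and the toy permutation problem at the bottom are all assumptions introduced for the example. Step 6 (buffer insertion) is omitted, mutation is simplified to a per-offspring probability, and the fitness is assumed to be positive.

```python
import random
from typing import Callable, List

Chromosome = List[int]


def run_ga(init: Callable[[random.Random], Chromosome],
           fitness: Callable[[Chromosome], float],
           crossover: Callable[[Chromosome, Chromosome, random.Random], Chromosome],
           mutate: Callable[[Chromosome, random.Random], Chromosome],
           psize: int = 24, rc: float = 0.33, rm: float = 0.01,
           max_gen: int = 1000, patience: int = 20, seed: int = 0) -> Chromosome:
    rng = random.Random(seed)
    population = [init(rng) for _ in range(psize)]           # Step 1
    best_score, stale = 0.0, 0
    for _ in range(max_gen):                                  # Step 5: generation limit
        scores = [fitness(c) for c in population]             # Step 2
        if max(scores) > best_score:
            best_score, stale = max(scores), 0
        else:
            stale += 1
        if stale >= patience:                                 # Step 5: no recent improvement
            break
        offspring = []
        for _ in range(max(1, round(rc * psize))):            # Step 3
            # Roulette-wheel selection: probability proportional to fitness.
            parent_a, parent_b = rng.choices(population, weights=scores, k=2)
            child = crossover(parent_a, parent_b, rng)
            if rng.random() < rm:
                child = mutate(child, rng)
            offspring.append(child)
        # Step 4: keep the psize fittest individuals.
        population = sorted(population + offspring, key=fitness, reverse=True)[:psize]
    return max(population, key=fitness)


# Toy usage (hypothetical problem): move every gene g of a permutation to index g.
def toy_init(rng: random.Random) -> Chromosome:
    genes = list(range(8))
    rng.shuffle(genes)
    return genes


def toy_fitness(c: Chromosome) -> float:
    return 1.0 / (1 + sum(abs(g - k) for k, g in enumerate(c)))


def toy_crossover(a: Chromosome, b: Chromosome, rng: random.Random) -> Chromosome:
    # Copy one gene position from b into a by a swap, so the result stays a permutation.
    child = list(a)
    k = rng.randrange(len(child))
    src = child.index(b[k])
    child[k], child[src] = child[src], child[k]
    return child


def toy_mutate(c: Chromosome, rng: random.Random) -> Chromosome:
    mutant = list(c)
    i, j = rng.sample(range(len(mutant)), 2)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant


best = run_ga(toy_init, toy_fitness, toy_crossover, toy_mutate, rm=0.2)
print(best, toy_fitness(best))
```

Passing the operators as callables keeps the loop independent of the chromosome encoding, so a cell mapping fitness function and crossover operator could be plugged in without changing the skeleton.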
In the proposed framework, GA is the core. GAs are inspired by Darwin's theory of evolution, the survival of the fittest. The algorithm starts with a set of solutions (represented by chromosomes) called the population. The population, of size psize, is randomly generated as a group of possible solutions (individuals). Each individual is then evaluated by the fitness function and given a score based on how well it performs the given task. Two individuals are then selected based on their fitness, with fitter individuals having a higher chance of being selected. These individuals “reproduce” one or more offspring through the crossover operator with a rate rc, after which the offspring are mutated randomly with a rate rm. The algorithm terminates when the stopping rule is met, i.e., when the number of generations reaches the upper limit max_gen or the fitness score has not been updated for 20 consecutive generations. Finally, for gates mapped beyond the connectivity domain, buffers are inserted to accomplish routing.

3.2 Chromosome
A chromosome in our CMOL cell mapping problem is a solution that represents a mapping result. The chromosome structure is $(C_1, C_2, C_3, \ldots, C_N)$, where $N = |\Psi|$ is the number of cells in the CMOL cell array. Each $C_i$ is either the index of a nor gate or −1. The nor gate index lies in the range [0, M − 1], where M is the total number of nor gates in the circuit. The value −1 indicates that the cell is not occupied by any nor gate. Each position 1, ..., N in the chromosome corresponds to a CMOL cell in the array.

Take Fig.3 as an example. A circuit that contains 16 nor gates is mapped to a 5 × 5 CMOL cell array. Hence, $N = |\Psi| = 5 \times 5 = 25$ and M = 16. In the chromosome code, each entry is the index of a nor gate (or −1), and its position in the chromosome identifies the cell in which that gate is located. For instance, nor gates 12 and 14 are mapped to the CMOL cells at positions (0, 1) and (4, 4), respectively. The code starts from the cell at position (0, 0) and ends at position (4, 4), following the sequence shown by the dashed line in a bottom-up and left-right order. We call a chromosome constructed by this kind of coding a vertical chromosome. The one-and-only-one constraint is satisfied by this coding scheme.
Fig.3. Example of CMOL cell mapping and the corresponding vertical chromosome.
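To make the encoding concrete, the sketch below decodes a vertical chromosome into cell coordinates and lists the netlist edges that violate the connectivity radius. It is a minimal illustration under stated assumptions: (x, y) is taken to mean column x and row y counted from the bottom-left cell, chromosome index k is taken to map to (k // rows, k % rows), and the names `decode_vertical` and `violated_edges` as well as the 3 × 3 example are hypothetical.

```python
from typing import Dict, List, Tuple

EMPTY = -1  # marks a CMOL cell that holds no nor gate

Cell = Tuple[int, int]


def decode_vertical(chromosome: List[int], rows: int) -> Dict[int, Cell]:
    """Return {gate index: (x, y)} for a column-by-column, bottom-up chromosome."""
    placement = {}
    for k, gate in enumerate(chromosome):
        if gate != EMPTY:
            placement[gate] = (k // rows, k % rows)
    return placement


def violated_edges(placement: Dict[int, Cell],
                   edges: List[Tuple[int, int]],
                   radius: int) -> List[Tuple[int, int]]:
    """Edges whose end points are farther apart than the connectivity radius."""
    bad = []
    for i, j in edges:
        (xi, yi), (xj, yj) = placement[i], placement[j]
        if abs(xi - xj) + abs(yi - yj) > radius:  # Manhattan distance
            bad.append((i, j))
    return bad


# Four gates on a 3 x 3 array (assumed example, not taken from Fig.3).
chromosome = [2, 0, EMPTY, EMPTY, 1, EMPTY, EMPTY, EMPTY, 3]
placement = decode_vertical(chromosome, rows=3)
edges = [(0, 1), (1, 3), (2, 3)]
violations = violated_edges(placement, edges, radius=2)
print(placement)
print(violations)
```

Because every gate index appears in exactly one chromosome position, the one-and-only-one constraint never has to be checked explicitly; only the radius constraint can be violated.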
3.3 Fitness Function
In GA, the fitness function is used to evaluate the adaptive ability of individuals. It provides the measure by which the best individuals are chosen for the next generation. Unlike standard-cell placement for FPGAs, the CMOL logic structure requires cascaded gates in the circuit to lie within each other's connectivity domain. In other words, the wire length between linked gates should not exceed the constant R, the radius of the connectivity domain. Because of this constraint, a penalty function is added to the fitness function so as to find a near-optimal solution.
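A penalized fitness of this kind can be sketched in Python as follows. The sketch takes the fitness to be the reciprocal of the summed weighted Manhattan wire length, plus a penalty term floor(L / R) · (L − R) for every edge whose length L exceeds the radius R, which is how we read the fitness definition of this subsection. The function names, the dictionary-based weights and the small example are assumptions for illustration; the netlist is assumed to have at least one edge.

```python
from typing import Dict, List, Tuple

Cell = Tuple[int, int]
Edge = Tuple[int, int]


def manhattan(a: Cell, b: Cell) -> int:
    return abs(a[0] - b[0]) + abs(a[1] - b[1])


def fitness(placement: Dict[int, Cell], edges: List[Edge],
            weights: Dict[Edge, float], radius: int) -> float:
    """Reciprocal of weighted wire length plus penalties for over-long edges."""
    total = 0.0
    for i, j in edges:
        length = manhattan(placement[i], placement[j])
        # Penalty coefficient: 0 inside the connectivity domain, floor(L / R) outside.
        u = 0 if length <= radius else length // radius
        total += weights[(i, j)] * length + u * (length - radius)
    return 1.0 / total


# Assumed example: four gates, unit weights, radius 2.
placement = {0: (0, 1), 1: (1, 1), 2: (0, 0), 3: (2, 2)}
edges = [(0, 1), (1, 3), (2, 3)]
weights = {edge: 1.0 for edge in edges}
score = fitness(placement, edges, weights, radius=2)
print(score)
```

In this example the edge lengths are 1, 2 and 4; only the last one exceeds the radius and adds a penalty of 2 · (4 − 2) = 4, so the denominator is 1 + 2 + 4 + 4 = 11.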
$$f = \frac{1}{\sum_{(i,j)\in E}\left[ w_{i,j} L_{i,j} + u_{i,j}\,(L_{i,j} - R) \right]}.$$

In the above equation, f is the fitness function, and $w_{i,j}$ and $L_{i,j}$ are the weight (measured by the fanin of the nor gate) and the Manhattan distance between gates i and j, respectively. We use a simple method to penalize infeasible solutions: a penalty is applied to every solution that violates feasibility in any way. The penalty coefficient $u_{i,j}$ is defined as

$$u_{i,j} = \begin{cases} 0, & \text{if } dist_{i,j} \leq R, \\ \left\lfloor \dfrac{dist_{i,j}}{R} \right\rfloor, & \text{otherwise.} \end{cases}$$

The above expression shows that for each (i, j) ∈ E, if $dist_{i,j}$ is larger than the radius R, there is a penalty that reduces the fitness score.

3.4 Genetic Operators

3.4.1 Crossover

Crossover is one of the main genetic operators in GA. It takes two parent solutions and reproduces their children. After the selection process, the population is enriched with better individuals. Crossover can simply proceed by selecting cutting points randomly and generating the offspring by combining the cut segments of the two parents. Because better children are obtained by combining good parents, we adopt the roulette algorithm in the selection process to choose better parents for crossover based on their fitness and survival probability. However, the digits in the solution string cannot be duplicated after crossover, except for the empty cell −1; otherwise, the one-and-only-one constraint is violated. In this work, partially matched crossover (PMX) is applied to solve the duplication problem[22]. PMX is implemented by position-based pairwise exchanges: the genes within the cut segments are exchanged by searching for the corresponding gene positions.

As depicted in Fig.4, gene 7 in Parent a aligns with gene 8 in Parent b, so genes 7 and 8 in Parent a are swapped. Similarly, genes 10 and 5, genes 6 and 12, and genes 11 and 4 in Parent a are swapped, respectively. This maps Parent b onto Parent a and generates Child a. Next, by mapping Parent a onto Parent b, genes 8 and 7, genes 5 and 10, genes 12 and 6, and genes 4 and 11 in Parent b are swapped. Thus, after PMX, the two children are produced.

Fig.4. PMX crossover.
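The pairwise exchange can be sketched in Python as below. This is an illustrative reading of the swap-based PMX described above, not the authors' code: the function name `pmx_child`, the half-open cut interval and the rule of taking the first free empty cell when Parent b holds −1 at a position are assumptions. Both parents are assumed to contain the same gates and the same number of empty cells.

```python
from typing import List

EMPTY = -1


def pmx_child(parent_a: List[int], parent_b: List[int],
              cut1: int, cut2: int) -> List[int]:
    """Inside [cut1, cut2), bring Parent b's genes into a copy of Parent a by swaps."""
    child = list(parent_a)
    fixed = set()  # segment positions that already match Parent b
    for k in range(cut1, cut2):
        wanted = parent_b[k]
        if child[k] != wanted:
            # Locate the wanted gene (or a free empty cell) elsewhere in the child.
            src = next(p for p, gene in enumerate(child)
                       if gene == wanted and p not in fixed and p != k)
            child[k], child[src] = child[src], child[k]
        fixed.add(k)
    return child


# Assumed example: five gates and one empty cell, cut segment covering positions 1..3.
parent_a = [3, 0, 1, EMPTY, 2, 4]
parent_b = [1, 4, 0, 2, 3, EMPTY]
child_a = pmx_child(parent_a, parent_b, 1, 4)
child_b = pmx_child(parent_b, parent_a, 1, 4)
print(child_a)
print(child_b)
```

Because the child is only ever modified by swaps, it always contains exactly the genes of its base parent, so no gate is duplicated or lost.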
To find the most effective crossover, several distinct crossover operators are designed and compared. In contrast to the vertical chromosome described in Subsection 3.2, a horizontal chromosome is introduced first. As can be seen in Fig.5, the code starts from the cell at position (0, 4) and ends at position (4, 0), following the sequence shown by the dashed line in a top-down and left-right order. Since two individuals are selected as parents, a hybrid crossover HV is performed with one parent in vertical coding (V) and the other in horizontal coding (H). Similarly, crossover HH is performed with both parents in horizontal coding, and VV with both parents in vertical coding. These three crossover operators are constructed and run to compare the efficiency of the chromosome coding schemes.
Fig.5. Example of CMOL cell mapping and the corresponding horizontal chromosome.
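The relation between the two codings can be illustrated with a short Python sketch. It assumes that (x, y) denotes column x and row y counted from the bottom-left cell, that the vertical coding visits cells column by column from the bottom up, and that the horizontal coding visits them row by row from the top row down; the function names and the 2 × 3 example are hypothetical.

```python
from typing import List


def vertical_index(x: int, y: int, rows: int) -> int:
    """Chromosome position of cell (x, y) in vertical (bottom-up, left-right) coding."""
    return x * rows + y


def horizontal_index(x: int, y: int, rows: int, cols: int) -> int:
    """Chromosome position of cell (x, y) in horizontal (top-down, left-right) coding."""
    return (rows - 1 - y) * cols + x


def vertical_to_horizontal(chromosome: List[int], rows: int, cols: int) -> List[int]:
    """Re-encode the same placement from vertical to horizontal coding."""
    recoded = [0] * len(chromosome)
    for x in range(cols):
        for y in range(rows):
            recoded[horizontal_index(x, y, rows, cols)] = chromosome[vertical_index(x, y, rows)]
    return recoded


# Assumed example: 2 rows x 3 columns, gate g stored at vertical position g.
vertical = [0, 1, 2, 3, 4, 5]
horizontal = vertical_to_horizontal(vertical, rows=2, cols=3)
print(horizontal)
```

Under these assumptions, an HV crossover would apply the string-based exchange to one parent in each coding, so a contiguous cut segment covers part of a column in one parent and part of a row in the other.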
The experimental results show that the HV crossover operator is slightly more efficient than HH and VV, but without significant performance improvement. A possible reason is that some already well mapped sub-circuits are split during crossover. Take the sub-circuit built from nor gates 0, 6, 5, 1 shown in Fig.5 as an example. If the sub-circuit is well mapped, i.e., its gates lie within each other's connectivity domain, the above three crossover operators may split it during the crossover operation. Hence, the algorithm may not be efficient.

In order to reflect the mapping feature in the process of GA, we develop a novel 2-D crossover operator. It operates as follows: two parents are chosen randomly, a corresponding object block is selected randomly in both, and the components in the block undergo the same pairwise exchange as in PMX. As shown in Fig.6, among the cut segments, gene 15 in Parent a aligns with gene 14 in Parent b, so genes 15 and 14 in Parent a are swapped. Similarly, genes 19 and 13, genes 17 and 12, and gene 18 and a random −1 are swapped. Then the two children are produced.
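A minimal Python sketch of such a block-based exchange is given below. It is written under several assumptions that are not stated in the text: the block is an axis-aligned rectangle, chromosomes use the vertical coding with index x · rows + y, and when Parent b holds −1 at a block position a random free empty cell of the child is used as the swap partner. The names `block_positions` and `crossover_2d` are hypothetical.

```python
import random
from typing import List

EMPTY = -1


def block_positions(x0: int, y0: int, width: int, height: int, rows: int) -> List[int]:
    """Chromosome indices (vertical coding) of the cells inside a rectangular block."""
    return [x * rows + y
            for x in range(x0, x0 + width)
            for y in range(y0, y0 + height)]


def crossover_2d(parent_a: List[int], parent_b: List[int],
                 positions: List[int], rng: random.Random) -> List[int]:
    """Bring Parent b's block content into a copy of Parent a by pairwise swaps."""
    child = list(parent_a)
    fixed = set()  # block positions that already match Parent b
    for k in positions:
        wanted = parent_b[k]
        if child[k] != wanted:
            candidates = [p for p, gene in enumerate(child)
                          if gene == wanted and p not in fixed and p != k]
            src = rng.choice(candidates)  # unique for a gate, random among empties
            child[k], child[src] = child[src], child[k]
        fixed.add(k)
    return child


# Assumed example: six gates on a 3 x 3 array, 2 x 2 block in the upper-right corner.
rng = random.Random(0)
parent_a = [0, 1, 2, 3, 4, 5, EMPTY, EMPTY, EMPTY]
parent_b = [5, EMPTY, 4, EMPTY, 3, 2, 1, 0, EMPTY]
positions = block_positions(x0=1, y0=1, width=2, height=2, rows=3)
child = crossover_2d(parent_a, parent_b, positions, rng)
print(positions)
print(child)
```

Since the exchanged positions form a compact block of neighboring cells rather than a run along one column or row, gates that sit close together in a parent are moved as a group.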
Fig.6. Example of 2-D crossover.

It can be seen that the 2-D crossover operator can protect a well mapped sub-circuit from being split during crossover. Therefore, the algorithm may achieve better solutions.

Given the crossover rate rc = 0.33 in GA, we compare the performance of the four crossover operators in terms of timing, the number of buffers and the CPU time. We apply them to six benchmarks {s344, s400, s510, s526, s641, s838} of different sizes. The algorithm is run a fixed number of times and the median value is recorded as a fair result. The results are shown in Table 1, in which the indices a, b, c, d denote the HH, VV, HV and 2-D crossover operators, respectively. From the results, the performance of the HH, VV and HV crossover operators appears almost the same. However, the 2-D crossover operator needs fewer buffers for routing and gives better timing, at the cost of relatively more CPU time. Under the same stopping rule, the 2-D crossover operator searches the solution space more comprehensively. Therefore, the 2-D crossover operator is adopted in the proposed algorithm.

3.4.2 Mutation

Mutation is viewed as a background operator that maintains genetic diversity in the population. It produces spontaneous random changes in various individuals. There are many different forms of mutation for the different kinds of representations. The mechanism applied here is pairwise interchange: two genes of the string are randomly chosen and the indices corresponding to those genes are interchanged, as shown in Fig.7.

Fig.7. Mutation.
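The pairwise interchange can be sketched in a few lines of Python. The function name `pairwise_interchange` and the example chromosome are assumptions made for illustration.

```python
import random
from typing import List


def pairwise_interchange(chromosome: List[int], rng: random.Random) -> List[int]:
    """Swap the genes at two distinct, randomly chosen positions."""
    mutant = list(chromosome)
    i, j = rng.sample(range(len(mutant)), 2)
    mutant[i], mutant[j] = mutant[j], mutant[i]
    return mutant


rng = random.Random(1)
parent = [2, 0, -1, -1, 1, -1, -1, -1, 3]
mutant = pairwise_interchange(parent, rng)
changed = [k for k in range(len(parent)) if parent[k] != mutant[k]]
print(mutant, changed)
```

A swap either moves a gate to an empty cell, exchanges two gates, or exchanges two empty cells (which leaves the mapping unchanged), so the mutant always satisfies the one-and-only-one constraint.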
Mutation follows crossover and is controlled by the mutation rate rm. If there are n cells in a CMOL array and the population size is psize, the total number of genes is n × psize, so n × psize × rm pairwise interchanges are performed in each generation. The mutation rate should be selected properly. If it is too small, useful genes may never get the chance to be selected by the algorithm. If it is too large, the algorithm may consume much more memory and running time, and the offspring begin to lose the good characteristics inherited from their parents.

Table 1. Comparison Results of the Four Crossover Operators

           Timing               Buffers              CPU Time (s)
Circuit    a    b    c    d     a    b    c    d     a        b        c        d
s344       18   18   18   18    0    0    0    0     0.58     0.55     0.56     0.57
s400       11   11   11   11    1    1    2    1     1.66     1.65     1.36     2.12
s510       18   18   18   18    5    5    3    2     14.23    14.20    14.80    16.56
s526       11   11   11   11    4    3    6    5     9.03     11.30    9.32     9.75
s641       24   24   23   23    26   25   22   15    65.60    62.65    64.69    82.66
s838       30   30   30   28    69   66   69   50    159.92   152.33   168.09   201.37

4 Experimental Results

We implement the CMOL cell assignment GA in C
language and conduct experiments on the International Symposium on Circuits and Systems (ISCAS) benchmarks under Linux on a 1.73 GHz Intel Pentium M processor with 512 MB memory. For each design, we first carve out the combinational logic, i.e., convert all inputs/outputs of latches to primary inputs/outputs respectively. Then we employ a logic synthesis tool (such as SIS[22]) to generate an input BLIF file that contains only nor gates, by technology mapping with a specified library. After that, we select an appropriate CMOL array size which guarantees that all primary I/Os are mapped around the periphery of the CMOL cell array and the nor gates are mapped to the remaining cells without overlapping. Finally, we run the GA described in Section 3 to place the given circuit in a 2-D CMOL array. We set the radius of the connectivity domain R = 12, the population size psize = 24, the crossover rate rc = 0.33, the mutation rate rm = 0.01, and the upper limit of generation numbers max_gen = 1000.

As a comparison, we run the above benchmarks using the “CMOL FPGA CAD Tool 1.0” software developed by Strukov and Likharev[23]. The default radius R of the CMOL connectivity domain in that tool is 40. For a fair comparison, we set the radius to 12, as in our work. The test results are shown in Table 2. The first column is the name of the tested circuit. The next two main columns show the results from “CMOL FPGA CAD Tool 1.0” and from our proposed method. Under each main column, a number of sub-columns show the measured parameters. Column “Cells” is the number of CMOL cells used for nor gates in both logic and routing. Column “Area” shows the size of the CMOL array territory, where “Tiles” in parentheses is generated automatically based on circuit details; in terms of area, one tile equals 16 CMOL cells, and hence 2×2 tiles equal 64 CMOL cells. Column “AU%” stands for area utilization and is the percentage of CMOL cells used in the CMOL array territory. Column “Delay” shows the maximum number of levels (including logic and routing buffers) of the circuit. Column “CPU Time” gives the time taken by the algorithm to solve the problem.

Table 2. Comparison with Tile Approach[23]

CMOL FPGA CAD Tool 1.0 (r = 12)
Circuit   Cells   Area (Tiles)    AU%     Delay   CPU Time (s)
s27       12      64 (2 × 2)      18.75   9       1
s208      123     256 (4 × 4)     48.05   18      3
s298      125     256 (4 × 4)     48.83   13      7
s344      174     400 (5 × 5)     43.50   20      8
s349      186     400 (5 × 5)     46.50   20      7
s382      175     400 (5 × 5)     43.25   13      7
s386      219     400 (5 × 5)     54.75   16      11
s400      189     400 (5 × 5)     47.25   15      8
s420      300     400 (5 × 5)     75.00   20      8
s444      210     400 (5 × 5)     52.50   17      9
s510      –       –               –       –       –
s526      329     576 (6 × 6)     57.12   16      13
s641      289     576 (6 × 6)     50.17   25      8
s713      –       –               –       –       –
s820      –       –               –       –       –
s832      –       –               –       –       –
s838      –       –               –       –       –
s1196     –       –               –       –       –
s1238     –       –               –       –       –

Our Work (r = 12)
Circuit   Cells   Area (Row×Col)    AU%     Delay   CPU Time (s)   Buffers
s27       8       25 (5 × 5)        32.00   7       0.01           0
s208      109     169 (13 × 13)     64.50   16      1.12           0
s298      85      144 (12 × 12)     59.03   11      0.17           0
s344      130     196 (14 × 14)     66.33   18      0.57           0
s349      134     196 (14 × 14)     68.37   18      0.49           0
s382      124     196 (14 × 14)     63.27   11      1.60           0
s386      138     196 (14 × 14)     70.41   10      1.05           0
s400      137     196 (14 × 14)     69.90   11      2.12           1
s420      248     361 (19 × 19)     68.70   16      8.50           1
s444      136     196 (14 × 14)     69.39   11      1.86           2
s510      266     361 (19 × 19)     73.68   18      16.56          2
s526      222     324 (18 × 18)     68.52   11      9.75           5
s641      206     676 (26 × 26)     30.47   23      82.66          15
s713      225     676 (26 × 26)     33.28   24      52.84          34
s820      400     529 (23 × 23)     75.61   15      77.52          41
s832      407     529 (23 × 23)     76.94   16      69.27          54
s838      507     676 (26 × 26)     75.00   28      201.37         50
s1196     613     729 (27 × 27)     84.09   30      234.88         84
s1238     662     784 (28 × 28)     84.44   37      268.92         121

From Table 2, it can be seen that the proposed approach runs much faster and has better delay than the tile approach, while using fewer CMOL cells and hence achieving better area utilization for most circuits. Circuits s510, s713, s820, s832, s838, s1196 and s1238 cannot be mapped by “CMOL FPGA CAD Tool 1.0” when we use its “-size” argument to optimize the area until the resource utilization is over 100%. In terms of area utilization, the results of the proposed approach for circuits s420, s641 and s713 are worse than those of “CMOL FPGA CAD Tool 1.0”, because carving out the combinational logic converts all inputs/outputs of latches to primary inputs and outputs (PIOs). For some circuits, this transformation leads to a large number of PIOs. For example, circuit s641 has 96 PIOs. In our approach, all the PIOs are randomly mapped around the periphery of the CMOL cell array. Therefore, we have to generate an array large enough to make sure the mapping is successful. In this case, many CMOL cells are redundant, which results in low area utilization.

Table 3 compares our results against the SAT approach in [11]. The SAT approach has comparable delay to our approach, but our GA runs much faster than the SAT approach. In fact, for large benchmarks, both the tile[23] and SAT[11] approaches blow up. Our GA seems to scale much better as the
problem size increases.

The promising numerical results may be attributed to the following aspects. First, the non-duplicating permutation encoding guarantees that each cell is occupied by at most one nor gate and that each nor gate is assigned to at most one cell, so no other treatment of the one-and-only-one constraint is needed in the iteration step of the GA. Second, the novel 2-D crossover operator explores the solution space efficiently and, to some extent, protects already well mapped sub-circuits from being split by traditional genetic operators. Third, the tile approach, formed by grouping CMOL cells, relies on a sufficient cell connectivity radius and the insertion of additional routing inverters, which cause poor delay; therefore, its cell utilization is not efficient. In contrast, the proposed approach works on individual cells, each of which is 1/16 of a tile. Hence, during the mapping process, the proposed approach is more flexible and compact than the tile approach.

Table 3. Comparison with SAT Approach[11]

          SAT Approach (r = 9)               Our Work (r = 9)
Circuit   Cells   Delay   CPU Time (s)       Cells   Delay   CPU Time (s)
s27       8       7       0.07               8       7       0.01
s208      109     16      509.84             109     16      0.86
s298      85      11      370.30             85      11      2.12
s344      130     18      6.18               130     18      3.01
s349      134     18      7.60               134     18      3.18
s382      124     11      12.88              124     11      3.11
s386      138     10      10.30              138     10      3.56
s400      137     11      7.52               137     11      7.27
s420      –       –       –                  248     16      8.50
s444      136     11      7.59               136     11      7.45
s510      266     18      213.27             266     22      25.13
s526      –       –       –                  222     14      13.97
s641      –       –       –                  206     30      109.14
s713      –       –       –                  225     24      96.99
s820      –       –       –                  400     18      117.57
s832      –       –       –                  407     20      101.29
s838      –       –       –                  507     31      240.12
s1196     –       –       –                  613     40      322.94
s1238     –       –       –                  662     38      457.86

5 Conclusions

Previous methods of CMOL cell mapping can only solve small circuits, with long CPU runtime and low area usage. In this paper, we develop a GA based approach to address these issues. By encoding the mapping solution on the CMOL cell array as a chromosome and developing a 2-D crossover operator, we present an efficient GA based approach and test it on a set of ISCAS benchmarks. Experimental results show that our approach can solve larger circuits with less CPU time, smaller area and shorter delay than published results.

Acknowledgement The authors would like to thank Dr. Hung W N N from Synopsys Inc. and Prof.
Xiao-Yu Song from Portland State University for providing valuable advice and benchmarks.

References
[1] Haselman M, Hauck S. The future of integrated circuits: A survey of nanoelectronics. Proceedings of the IEEE, 2010, 98(1): 11-38.
[2] Wang W, Liu M, Hsu A. Hybrid nanoelectronics: Future of computer technology. Journal of Computer Science and Technology, 2006, 21(6): 871-886.
[3] Likharev K K. Hybrid CMOS/nanoelectronic circuits: Opportunities and challenges. Journal of Nanoelectronics and Optoelectronics, 2008, 3(3): 203-230.
[4] Abid Z, Barua M, Alma’aitah A. Design of a transmission gate based CMOL memory array. IET Micro & Nano Letters, 2008, 3(3): 70-76.
[5] Strukov D B, Likharev K K. Defect-tolerant architectures for nanoelectronic crossbar memories. Journal of Nanoscience and Nanotechnology, 2007, 7(1): 151-167.
[6] Strukov D, Mishchenko A. Monolithically stackable hybrid FPGA. In Proc. DATE, Dresden, Germany, Mar. 8-12, 2010, pp.661-666.
[7] Strukov D B, Likharev K K. CMOL FPGA: A reconfigurable architecture for hybrid digital circuits with two-terminal nanodevices. Nanotechnology, 2005, 16(6): 888-900.
[8] Türel Ö, Lee J H, Ma X, Likharev K K. Architectures for nanoelectronic implementation of artificial neural networks: New results. Neurocomputing, 2005, 64: 271-283.
[9] Strukov D B, Likharev K K. CMOL FPGA circuits. In Proc. Int. Conf. Computer Design, Las Vegas, USA, June 26-29, 2006, pp.213-219.
[10] Betz V, Rose J. VPR: A new packing, placement and routing tool for FPGA research. In Proc. the 7th International Workshop on Field-Programmable Logic and Applications, London, UK, Sep. 1-3, 1997, pp.213-222.
[11] Hung W N N, Gao C, Song X, Hammerstrom D. Defect-tolerant CMOL cell assignment via satisfiability. IEEE Sensors Journal, 2008, 8(6): 823-830.
[12] Kim K, Karri R, Orailoglu A. Design automation for hybrid CMOS-nanoelectronics crossbars. In Proc. IEEE International Symposium on Nanoscale Architectures, Santa Clara, USA, Oct. 21-22, 2007, pp.27-32.
[13] Chu Z, Xia Y, Wang L, Hu M. CMOL cell assignment based on dynamic interchange. In Proc. the 8th ASICON, Changsha, China, Oct. 20-23, 2009, pp.921-924.
[14] Xia Y, Chu Z, Hung W N N, Wang L, Song X. CMOL cell assignment by genetic algorithm. In Proc. the 8th NEWCAS, Montreal, Canada, June 20-23, 2010, pp.25-28.
[15] Snider G S, Williams R S. Nano/CMOS architectures using a field-programmable nanowire interconnect. Nanotechnology, 2007, 18(3), Article No. 035204.
[16] Xia Q, Robinett W, Cumbie M W, Banerjee N, Cardinali T J, Yang J J, Wu W, Li X, Tong W M, Strukov D B, Snider G S, Medeiros-Ribeiro G, Williams R S. Memristor-CMOS hybrid integrated circuits for reconfigurable logic. Nano Letters, 2009, 9(10): 3640-3645.
[17] Tu D, Liu M, Wang W, Haruehanroengra S. 3D CMOS/Molecular hybrid circuits. Journal of Nanoscience and Nanotechnology, 2009, 9(2): 1015-1018.
[18] Tu D, Liu M, Wang W, Haruehanroengra S. Three-dimensional CMOL: Three-dimensional integration of CMOS/nanomaterial hybrid digital circuits. IET Micro & Nano Letters, 2007, 2(2): 40-45.
[19] Chen G, Song X, Hu P. A theoretical investigation on CMOL FPGA cell assignment problem. IEEE Transactions on Nanotechnology, 2009, 8(3): 322-329.
[20] Xia Y, Almaini A E A. Genetic algorithm based state assignment for power and area optimisation. IEE Proceedings - Computers and Digital Techniques, 2002, 149(4): 128-133.
[21] Sivanandam S N, Deepa S N. Introduction to Genetic Algorithms. Heidelberg: Springer, 2008.
[22] Sentovich E M, Singh K J, Lavagno L et al. SIS: A system for sequential circuit synthesis. Technical Report UCB/ERL M92/41, University of California, Berkeley, 1992.
[23] Strukov D, Likharev K. A reconfigurable architecture for hybrid CMOS/nanodevice circuits. In Proc. the 14th ACM/SIGDA Int. Symp. Field Programmable Gate Arrays, Monterey, USA, Feb. 22-24, 2006, pp.131-140.
Zhu-Fei Chu received the B.S. degree in electronic information science and technology from Shandong University, Weihai, China, in 2008 and the M.S. degree in communication and information system from Ningbo University, Ningbo, China, in 2011, where he is currently working toward the Ph.D. degree in the same subject. His research interests include the design automation, low power SoC physical design. He is a student member of IEEE.
Yin-Shui Xia received the B.S. degree in physics and the M.S. degree in electronic engineering from Hangzhou University, Zhejiang, China in 1984 and 1991, respectively, and the Ph.D. degree in electronic engineering from Edinburgh Napier University, Edinburgh, UK in 2003. He was a visiting scholar at King’s College London in 1999 and then joined Edinburgh Napier University as a research assistant and enterprise fellow from 2000 to 2005. He is currently a professor at Ningbo University, Ningbo, China. His research interests include low-power digital circuit design, logic synthesis and optimization, and SoC design.

Lun-Yao Wang received the B.S. degree in physics education from Ningbo University, Ningbo, China, in 1995 and the M.S. degree in circuits and systems from Zhejiang University, Hangzhou, China, in 2003, where he is currently working toward the Ph.D. degree in the same subject. He is currently an associate professor at the School of Information Science and Engineering at Ningbo University. His research interests include low-power digital circuits design, logic synthesis and optimization.