Designing Function Configuration Decoders for the PAnDA ...

Designing Function Configuration Decoders for the PAnDA architecture using Multi-objective Cartesian Genetic Programming

James Alfred Walker, Martin A. Trefzer and Andy M. Tyrrell
Intelligent Systems Group, Department of Electronics, University of York, Heslington, York, YO10 5DD, UK
Email: [email protected], [email protected], [email protected]

Abstract: The Programmable Analogue and Digital Array (PAnDA) is a novel reconfigurable architecture, which allows variability aware design and rapid prototyping of digital systems. Exploiting the configuration options of the architecture allows the post-fabrication correction and optimisation of circuits directly in hardware using bio-inspired techniques. In order to reduce the overhead of extra configuration memory and area consumption, a portion of the configuration memory required to configure the logic functionality of the Configurable Analogue Blocks (CABs) in the PAnDA architecture is replaced by Function Configuration Decoders (FCDs). In the past, bio-inspired approaches based on Cartesian Genetic Programming (CGP) have been demonstrated as a suitable method for designing such circuit topologies. As the area of the FCDs is a primary concern, in addition to performance, a form of CGP which utilises a multi-objective strategy (MO-CGP) is used to evolve FCD designs for the two types of CAB present in the PAnDA architecture. The results show that MO-CGP is capable of evolving and optimising FCDs that are optimal for area and performance for both CABs. A PAnDA prototype chip containing FCDs is currently being fabricated. Also, when compared with designs produced by a commercial synthesis tool, the MO-CGP designs are smaller, faster, and more power efficient.

I. INTRODUCTION

Current FPGA architectures have been continuously improved over the past 20 years. They have directly benefited from the advancements in process technology, which made it possible to significantly increase logic density as a direct result of reducing feature sizes without the need for creating conceptually new FPGA architectures. As a consequence, programmable logic elements of all models and vendors generally consist of LUTs, MUXes and Flip-Flops that are arranged in similar topologies. However, there may be severe limitations to this strategy when faced with the challenges of electronic design when shrinking device sizes to the atomistic scale, even when moving to new device technologies such as SOI and FinFETs, given the likely effect of variability on latches in the future [1]. It is anticipated that this will have a direct impact on FPGA architectures, which comprise a large number of Flip-Flops, of which latches are an essential part. In addition, SRAM cells may become unstable even in the static case, which would randomly change the configuration of an FPGA and the effective logic function.

PAnDA proposes an approach to introduce mechanisms to overcome stochastic variability by automatically optimising designs post-fabrication. In order to achieve this, reconfiguration options are included in the design of the device that allow the characteristics of the fabricated devices and components to be altered during operation. This provides an access point for optimisation algorithms to find configurations that may improve the circuit's performance and bring it back into specification.

Recent work suggests that optimising the widths of transistors in standard cells can improve their variability tolerance, in addition to performance [2]–[4]. It has also been shown that it is possible to design and optimise analogue CMOS circuits in hardware using field programmable transistor arrays (FPTAs) [5], [6]. Therefore, if FPTA based mechanisms to alter device sizes according to [2] could be incorporated in hardware, it would be possible to optimise designs post-fabrication in a dynamic fashion. This would not only have the advantage of being able to enhance variability tolerance and performance for a specific design, but could also account for variations between different devices. In addition, such a multifaceted reconfigurable device will provide significant resources to deal with both pre- and post-fabrication faults.

PAnDA (Programmable Analogue and Digital Array) [7], [8] is a novel reconfigurable, adaptive, evolvable architecture, which allows variability aware design and rapid prototyping by exploiting the configuration options of the architecture to allow the post-fabrication correction and optimisation of circuits directly in hardware using bio-inspired techniques. Similar to FPGAs, the PAnDA architecture provides configuration at the digital level for circuit design. Accessibility to additional configuration options of the underlying analogue layer enables continuous adjustment of circuit characteristics at runtime, which enables dynamic optimisation of the mapped design's performance. Moreover, the yield of devices can be improved post-fabrication via reconfiguration at the analogue level, which can overcome faults induced by process variability. Since optimisation goals are generic (i.e., not restricted to simply reducing variability, power consumption or increasing speed), the same mechanisms have great potential to enhance the device's fault tolerant abilities in the case of component degradation and failures during its life-time or when exposed to hazardous environments.

Examples of previous work where post-fabrication optimisation has been shown to be beneficial in terms of yield, fault tolerance and/or performance can be found in [6], [9]–[12].

c 978-1-4673-5869-9/13/$31.00 2013 IEEE

Although introducing (additional) configuration options into a design generates area overhead, there will be an overall benefit from continuing to use parts of the device that would otherwise have to be disabled because they do not work according to specification or, even worse, from avoiding the loss of the whole device. In order to reduce this overhead, a portion of the configuration memory required to configure the logic functionality of the Configurable Analogue Blocks (CABs) in the PAnDA architecture will be replaced by Function Configuration Decoders (FCDs).

Fig. 1. The hierarchical design structure of the PAnDA architecture. Cell, interconnect and CLB level are similar to current commercial FPGAs, whereas the Configurable Transistor Level and CAB are unique to PAnDA. While conventional FPGAs can only be reconfigured on the CLB and interconnect level, PAnDA offers additional configuration options on the analogue level.

Cartesian Genetic Programming (CGP), originally developed by Miller and Thomson, is a design technique which has been used to evolve novel logic-circuit topologies and has demonstrated efficiency in computation time and resources over other biologically-inspired methods such as Genetic Programming [13], [14]. CGP differs from conventional Genetic Programming in its representation of a program, which is a directed graph as opposed to a tree. A key benefit of this representation is the implicit re-use of nodes, whereby a node can be connected to the output of any previous node within the graph. Vasicek and Sekanina have also shown that by using a SAT-based fitness function, it is possible to scale up the size of circuit that CGP can evolve, and that the resultant circuits were also smaller in terms of the number of gates than those produced by conventional synthesis methods [15].

However, in conventional logic design one of the primary goals is to simultaneously minimise the area, power consumption and delay of a circuit: fewer large circuits can be fabricated on a single wafer, which results in increased cost; longer delays result in a decrease in the maximum operating frequency of the device; and high power consumption generally leads to shorter battery life in mobile devices. Recently, the conventional CGP algorithm has been augmented with a multi-objective optimisation stage, which is based on a modified Non-dominated Sorting Genetic Algorithm II (NSGA-II) [16] and is known as Multi-objective CGP (MO-CGP) [17], [18].

Initially, MO-CGP uses a conventional boolean-error score based on the binary Hamming distance between the observed output and the target truth-table to find a functionally correct circuit, which is then further optimised for performance over a number of different criteria (such as gate/transistor count and longest gate/transistor path) using the multi-objective strategy. MO-CGP has been shown to produce smaller, faster designs than those produced by conventional CGP over a number of circuit benchmarks, such as adders and multipliers [17], [18].

In this paper, the MO-CGP algorithm is used to produce optimised FCD designs for two different types of CAB found in the PAnDA architecture. A comparison is also made between the FCD designs produced by MO-CGP and a commercial logic synthesis tool. A prototype chip of the PAnDA architecture, including the evolved FCD designs, is currently being fabricated in a 40nm process and is expected back for testing in early 2013.

The next section of the paper provides an overview of the PAnDA architecture. Section III describes the MO-CGP algorithm. Section IV discusses the results of MO-CGP when designing the FCDs for the two CABs. Section V compares the results from the MO-CGP experiments with a commercial logic synthesis tool. Finally, the conclusions and ways in which the MO-CGP algorithm may be improved in the future are discussed in Section VI.

II. THE PAnDA ARCHITECTURE

With PAnDA we propose a novel FPGA architecture, which provides a radically different approach to reconfiguration than current off-the-shelf devices in order to make them more reliable when shrinking transistor sizes to the nano-scale [7], [8]. The PAnDA architecture features configurable transistors (CTs), the effective device sizes of which can be altered over a wide range by configuring them in various ways. The hierarchical design of PAnDA is illustrated in Figure 1. As can be seen from the figure, the most important difference to a conventional FPGA is that the PAnDA architecture extends to a finer granularity of building blocks. As a result, additional options are created that provide configuration access to lower design levels (i.e., the characteristics of transistors that define the functional behaviour at higher levels, such as the digital level, can be altered).

At the next higher hierarchy level to the CTs, the PAnDA architecture comprises configurable analogue blocks (CABs), configurable logic blocks (CLBs), logic cells and interconnect. CLBs, logic cells and interconnect are also present (and have a similar structure) in current commercial FPGA architectures, whereas CTs and CABs are unique to the PAnDA architecture and are described in the following sections.

2013 IEEE International Conference on Evolvable Systems (ICES)

A. Configurable Transistor (CT)

The CT is the smallest reconfigurable element of the PAnDA architecture. As can be seen from Figure 1, the CT is a device formed from 7 PMOS or 7 NMOS transistors (M0...M6), depending on whether it is a PMOS or NMOS CT, that are connected in parallel. Each of the 7 transistors can be individually turned on or off by closing or opening a switch (S0...S6) that connects its gate to a common gate connection (CG). The states of the switches are controlled via configuration bits stored in a configuration static random access memory (SRAM).

This design exploits the fact that a number of CMOS transistors of the same gate length (L) can be connected in parallel in order to form a device that is equivalent to a single transistor of the same length and with the sum of the individual widths (W). For example, connecting two transistors in parallel, where the size of one is $\frac{W}{L} = \frac{120\,\text{nm}}{40\,\text{nm}}$ and the size of the other is $\frac{W}{L} = \frac{180\,\text{nm}}{40\,\text{nm}}$, would result in a device that is equivalent to a transistor with $\frac{W}{L} = \frac{300\,\text{nm}}{40\,\text{nm}}$.

The transistor sizes used in the PAnDA CT are $L_{0 \ldots 6} = 40$ nm and $W_{0 \ldots 6} = [120, 120, 140, 160, 180, 200, 220]$ nm. Minimum width and length are constrained by the smallest device sizes that are allowed according to the design rules of the 40 nm technology in which the PAnDA prototype will be fabricated. The increment of 20 nm is chosen as half the minimum feature size of the technology used, which will provide a suitable value range when optimising for effects of stochastic variability.
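The width arithmetic for the CT can be checked by brute-force enumeration. The following is a minimal sketch, assuming only the seven widths listed above and that the effective width is the plain sum of the enabled transistors; the function names are illustrative and not part of the PAnDA tool flow.

```python
from itertools import product

# Widths (nm) of the seven parallel transistors M0..M6, as listed in the text.
WIDTHS_NM = [120, 120, 140, 160, 180, 200, 220]

def effective_width(switches):
    """Sum the widths of the transistors whose switch bit is 1."""
    return sum(w for w, s in zip(WIDTHS_NM, switches) if s)

def width_table():
    """Map each reachable non-zero width to the switch settings producing it."""
    table = {}
    for switches in product((0, 1), repeat=len(WIDTHS_NM)):
        width = effective_width(switches)
        if width > 0:  # skip the all-off configuration
            table.setdefault(width, []).append(switches)
    return table

table = width_table()
print(len(table), min(table), max(table))  # distinct widths, smallest, largest
print(len(table[300]))                     # redundant settings giving 300 nm
```

Under these assumptions the enumeration yields 47 distinct non-zero widths between 120 nm and 1140 nm, and three different switch settings that all give the 300 nm device of the example (either 120 nm transistor with the 180 nm one, or 140 nm with 160 nm).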
The 128 different configuration options for the CT allow it to be configured with 47 unique widths between 120 nm and 1140 nm, and there is generally more than one possible combination of transistors to achieve a certain width of the CT. This redundancy helps when optimising for stochastic variability, since it is expected that different combinations of devices that form the same width of the CT will exhibit different behaviour caused by stochastic variability. In addition, this kind of redundancy should prove invaluable when considering dynamic reconfiguration at the device level, for instance, for fault recovery.

B. Configurable Analogue Blocks (CABs)

Configurable Analogue Blocks (CABs) represent the next higher level of entities to the CTs, and are denoted as the CAB level in Figure 1. At the CAB level, PAnDA is a heterogeneous architecture, as it consists of two types of CAB: Combinatorial CABs (CCABs) and Sequential CABs (SCABs). The purpose of a CCAB within the PAnDA architecture is to represent basic logic blocks that can be implemented on the FPGA fabric, whereas the purpose of the SCAB is to represent pass transistor logic blocks and basic tri-state logic blocks that can be combined to create sequential logic components found within an FPGA fabric.

With conventional CMOS design in mind, the PAnDA CABs consist of an equal number of PMOS and NMOS CTs arranged in a CMOS-like structure where all source and drain terminals are directly connected and are non-configurable. This decision was made to remove any parasitic effects that are introduced by inserting switches in source-drain paths between MOSFETs, as seen in previous architectures [6], [19], [20]. The CABs also contain a configurable interconnect block that routes signals from the CAB inputs to the gate terminals of the CTs. The CCAB and SCAB feature 3 and 4 analogue inputs respectively, and 2 analogue outputs (one of which is an inversion of the other output). The configurable interconnect block routes the CAB inputs to the gate terminals of the respective PMOS and NMOS CTs forming the configured logic function. Any CTs that do not contribute to the configured logic function are either disabled or made transparent (i.e., form a path to Vdd or Vss).

The configurable interconnect blocks for both CABs currently have 8 different configuration options, providing 16 different 1-, 2- and 3-input logic functions for the CCAB and 16 different 1-, 2-, 3- and 4-input logic functions for the SCAB. The configurable interconnect block is controlled via configuration bits stored in a configuration static random access memory (SRAM). In order to minimise the amount of configuration memory required to configure the CABs, and also to provide a method to easily disable all CTs in a CAB when not in use, it was decided that an FCD would be implemented for both the CCAB and SCAB. The FCD takes its inputs from the configuration SRAM and its outputs control the routing of signals in the configurable interconnect block, which in turn determines the logic function of the CAB. For both CABs, the FCD has one enable bit and three configuration bits (to encode 8 configurations) as inputs. For the FCD in the CCAB, 12 outputs are required to control the configurable interconnect block, whereas in the SCAB only 10 outputs are required from the FCD.

For the use of FCDs to be warranted, the area overhead of an FCD should not be excessively higher than the area of using configuration SRAM alone, so an emphasis will be placed on reducing the area of the FCDs rather than performance when selecting the evolved designs from the population in Section IV.

III. MULTI-OBJECTIVE CARTESIAN GENETIC PROGRAMMING

The representation used in MO-CGP is identical to that found in conventional CGP. The genotype is a fixed-length list of integers encoding both the node functions and their connections within the directed graph. Each node within the directed graph represents a particular function, such as a logic gate, and is encoded by a number of genes; one gene encodes the functionality and the remaining genes encode the inputs to the function. The nodes take feed-forward inputs from either previous nodes in the graph or a terminal input. A typical CGP genotype and its corresponding decoded phenotype and logic circuit representation are illustrated in Figure 2. In this example the genotype decodes to represent a functionally correct 2-bit multiplier circuit. The first element of each node represents the node's functionality (e.g., 0 = logical AND gate), and the remaining two elements of each node represent the inputs to the logic block, with the first n values representing the circuit inputs and the other values representing existing decoded logic blocks. The final k elements of the genotype (4 in the given example) represent the circuit output connections, each having a single input. The genotype-phenotype mapping does not mandate that all nodes are connected to each other, resulting in a bounded variable-length phenotype, with genes that are entirely inactive and have a neutral effect on the resulting fitness. This neutrality has been shown to be very beneficial to the evolutionary process [21]. The complete genotype-phenotype decoding procedure is covered in more detail in [22], [23].

Fig. 2. An example of a CGP genotype and corresponding phenotype for a 2-bit multiplier digital circuit (four inputs, four outputs). The function genes in each node of the genotype are underlined and decode to the following functions: 0 - AND, 2 - XOR, 3 - OR. The number underneath each node in the genotype denotes the node number. The inactive areas of the genotype and phenotype are shown in dashed lines.

Whilst various mutation and recombination strategies are possible with CGP, a typical algorithm as discussed in this paper will be entirely mutation-based without any form of crossover [24]. A point-mutation operator is generally used in which a given number of genes from anywhere within the genotype are changed to a different value; the new value is chosen at random but will fall within the valid range of values, which depends on the gene's position within the genotype. For example, if the gene selected for mutation is one which determines the node's functionality, the new value will be within the range of the function set. However, if the gene selected for mutation is an input gene, the new value will fall within a range covering the program inputs and the output label of any previous node.

Most previously published work based on the CGP algorithm for evolving digital circuits has made use of a (1 + λ) selection process and thus a small population size [25], [26]. For effective use of a multi-objective strategy in CGP, it is necessary to use a significantly larger population in order to arrange the resultant circuits into Pareto fronts and extract a range of circuits from the primary front [16]. For this reason the population size for MO-CGP was set to 100 for the experiments conducted in this paper, as this has previously been shown to be a suitable value [17].

During the initial functional verification stage of MO-CGP, where the aim is to find a functionally correct design (based on simulations in which its boolean outputs are calculated for every possible input combination), a (20 + 80) selection strategy is used (a population size of 100), which preserves neutrality in the same manner as conventional CGP [21].

TABLE I. MO-CGP PARAMETERS FOR EVOLVING THE CCAB AND SCAB FUNCTION CONFIGURATION DECODERS.

Parameter                                Value
Population size (parents + offspring)    20 + 80
Genotype length (nodes / genes)          600 / 1800
Mutation rate (%)                        2
Run length (generations)                 30,000,000
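The genotype decoding, boolean-error scoring and point mutation described in this section can be illustrated with a small sketch. It is not the authors' implementation: the genotype layout (one function gene followed by two input genes per node), the half-adder target and all function names are assumptions made for illustration. Function codes 0, 2 and 3 follow the Fig. 2 caption, whereas code 1 (NAND) is an assumption.

```python
import random
from itertools import product

# Codes 0, 2 and 3 follow the Fig. 2 caption; code 1 is an assumed value.
FUNCS = {
    0: lambda a, b: a & b,        # AND
    1: lambda a, b: 1 - (a & b),  # NAND (assumed code)
    2: lambda a, b: a ^ b,        # XOR
    3: lambda a, b: a | b,        # OR
}

def active_nodes(nodes, outputs, n_in):
    """Walk back from the output genes and collect the node labels in use."""
    active, stack = set(), [o for o in outputs if o >= n_in]
    while stack:
        label = stack.pop()
        if label not in active:
            active.add(label)
            _, a, b = nodes[label - n_in]
            stack.extend(x for x in (a, b) if x >= n_in)
    return active

def evaluate(nodes, outputs, inputs):
    """Evaluate every node in feed-forward order (inactive ones included)."""
    values = list(inputs)
    for func, a, b in nodes:
        values.append(FUNCS[func](values[a], values[b]))
    return [values[o] for o in outputs]

def boolean_error(nodes, outputs, n_in, target):
    """Hamming distance between the observed and target truth tables."""
    error = 0
    for inputs in product((0, 1), repeat=n_in):
        got = evaluate(nodes, outputs, inputs)
        error += sum(g != t for g, t in zip(got, target(*inputs)))
    return error

def point_mutate(nodes, outputs, n_in, rng):
    """Change one randomly chosen gene to a different valid value."""
    nodes, outputs = [list(n) for n in nodes], list(outputs)
    gene = rng.randrange(3 * len(nodes) + len(outputs))
    if gene < 3 * len(nodes):
        i, pos = divmod(gene, 3)
        # Function gene: any function code; input gene: inputs or earlier nodes.
        choices = list(FUNCS) if pos == 0 else list(range(n_in + i))
        old = nodes[i][pos]
        nodes[i][pos] = rng.choice([c for c in choices if c != old])
    else:
        j = gene - 3 * len(nodes)
        choices = range(n_in + len(nodes))
        outputs[j] = rng.choice([c for c in choices if c != outputs[j]])
    return [tuple(n) for n in nodes], outputs

# Hypothetical half adder: inputs are labelled 0 and 1, nodes 2, 3 and 4.
nodes = [(2, 0, 1), (0, 0, 1), (3, 2, 3)]   # XOR, AND, and an inactive OR
outputs = [2, 3]                            # sum, carry

def half_adder(a, b):
    return (a ^ b, a & b)

print(boolean_error(nodes, outputs, 2, half_adder))  # 0 for a correct circuit
print(sorted(active_nodes(nodes, outputs, 2)))       # node 4 is inactive
```

In this toy genotype the third node is never reached from an output gene, so mutating its genes leaves the error score unchanged, which is the neutrality discussed above. The mutation sketch assumes at least two circuit inputs, so that every input gene has an alternative value to move to.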

Once a functionally correct design has been found, MO-CGP adopts a multi-objective strategy based on a modified form of the popular NSGA-II (Non-dominated Sorting Genetic Algorithm II) [16], in which neutrality is preserved by allowing offspring to dominate parents when they are tied for fitness. This selection algorithm arranges all the individuals within the population into a series of non-dominated Pareto fronts and allows an unbiased trade-off between the objectives [16]. One characteristic of the selection algorithm is that it promotes a wide spread of results on the primary fronts and avoids filling the population with closely clustered individuals, thus maintaining a diverse set of solutions. The choice of NSGA-II for the multi-objective algorithm was made due to its observed performance in both external literature and previous work by the authors on the evolution of variability tolerant designs [17], [27].

As in the initial functional verification stage, each circuit is simulated in software by calculating its boolean outputs for every possible input combination. For each circuit evaluated, the primary fitness measure is the circuit functionality. Circuits which are not functionally correct are not evaluated further and are not included in the Pareto fronts used by NSGA-II. For circuits which pass the functionality test, three fitness scores are calculated, which are equally weighted within the fitness calculation.

The first of these objectives is the number of logic gates used, which is a common fitness criterion for optimising circuit designs using CGP [15], [17], [25]. Minimising the number of logic gates will generally result in the most compact circuit schematics. Whilst having a minimum number of gates is a clear goal in terms of creating compact, space-efficient schematics, and is an interesting optimisation target, there are a number of reasons why it is not necessarily the best target for an optimised circuit.
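The arrangement of a population into successive non-dominated fronts can be sketched as below. This is a plain, quadratic-time illustration of Pareto ranking only; it omits the crowding-distance measure of NSGA-II and the neutrality-preserving modification used in MO-CGP. Each tuple is (gate count, gate path length, transistor count), all to be minimised. Four of the vectors are CCAB values reported in this paper, and one is hypothetical, as marked in the comments.

```python
def dominates(a, b):
    """True if a is no worse than b in every objective and better in one."""
    no_worse = all(x <= y for x, y in zip(a, b))
    better = any(x < y for x, y in zip(a, b))
    return no_worse and better

def non_dominated_fronts(points):
    """Split objective vectors (all minimised) into successive Pareto fronts."""
    remaining = list(points)
    fronts = []
    while remaining:
        front = [p for p in remaining
                 if not any(dominates(q, p) for q in remaining)]
        fronts.append(front)
        remaining = [p for p in remaining if p not in front]
    return fronts

population = [
    (24, 6, 92),      # selected CCAB design (Section IV)
    (23, 4, 110),     # alternative CCAB design from the final population (Section V)
    (30, 5, 120),     # RC-synthesised CCAB design (Table III)
    (25, 6, 100),     # hypothetical design, for illustration only
    (111, 13, 610),   # first functionally correct CCAB design (Table II)
]

for rank, front in enumerate(non_dominated_fronts(population), start=1):
    print(rank, front)
```

The first two designs trade transistor count against path length and gate count, so neither dominates the other and both sit on the primary front, which is why a multi-objective run returns a set of candidate designs rather than a single winner.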
It must be remembered that not all gates are equal; exclusive gates contain more transistors than AND/OR gates, which themselves contain more transistors than NAND/NOR gates. The number of transistors impacts on the die area, power consumption and maximum operating speed in a fabricated design. For these reasons, the second objective is the number of transistors used, based on the logic-gate arrangements found within the standard cell library of the 40nm process design kit (PDK) that was used to fabricate the PAnDA architecture. Each inverter is considered to be 2 transistors, each NAND and NOR function is 4 transistors, each AND and OR gate is 6 transistors and the XNOR and XOR gates are 12 transistors each. Whilst there will be a degree of correlation between gate count and transistor count, there are important differences which may affect which design is considered more useful: a minimised number of gates will generally result in a more compact gate-level schematic and thus might be preferable if the design is to appear in print; however, a minimised number of transistors will result in a more compact fabricated design and thus may be more efficient in actual fabrication.

The final objective is the longest gate-level path length between input and output, which will give some measure of the delay found in the critical path of the design. For this objective, all gates are equally weighted, with the resultant score being the number of gates in the longest path between an output and any of its inputs.

IV. EVOLVING THE FUNCTION CONFIGURATION DECODERS

TABLE II. MO-CGP RESULTS FOR A FUNCTIONALLY CORRECT AND OPTIMISED FUNCTION CONFIGURATION DECODER FOR THE CCAB AND SCAB OF THE PAnDA ARCHITECTURE.

                     CCAB                                        SCAB
                     Functional  Optimised                       Functional  Optimised
                                 Best    Average  Worst                      Best    Average  Worst
Generation           31,725      30M     30M      30M            15,929      30M     30M      30M
Gate Count           111         21      22.25    24             111         21      21.35    22
Gate Path Length     13          4       5.30     6              16          4       4.65     6
Transistor Count     610         92      101.55   110            605         86      89.59    92
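The objective scores reported in Table II follow the definitions given in Section III. A minimal sketch of how the three scores could be computed from a gate-level netlist is shown below. The transistor costs are those stated in the text; the netlist format (a dictionary from signal name to gate type and input signals), the function name and the two example circuits are assumptions for illustration.

```python
# Transistor cost per gate type, as stated in Section III.
TRANSISTORS = {"INV": 2, "NAND": 4, "NOR": 4, "AND": 6, "OR": 6,
               "XOR": 12, "XNOR": 12}

def objectives(netlist, outputs):
    """Return (gate count, longest gate path, transistor count).

    netlist maps a signal name to (gate type, list of input signals);
    any signal that is not a key of netlist is treated as a primary input.
    """
    depth = {}

    def gate_depth(signal):
        if signal not in netlist:
            return 0  # primary input: no gates on the path yet
        if signal not in depth:
            _, inputs = netlist[signal]
            depth[signal] = 1 + max(gate_depth(i) for i in inputs)
        return depth[signal]

    longest = max(gate_depth(o) for o in outputs)
    # Only gates reachable from an output were visited, so only they count.
    gates = len(depth)
    transistors = sum(TRANSISTORS[netlist[s][0]] for s in depth)
    return gates, longest, transistors

# Two hypothetical, functionally equivalent circuits for y = (a.b) + (c.d).
and_or = {"p": ("AND", ["a", "b"]), "q": ("AND", ["c", "d"]),
          "y": ("OR", ["p", "q"])}
nand_nand = {"p": ("NAND", ["a", "b"]), "q": ("NAND", ["c", "d"]),
             "y": ("NAND", ["p", "q"])}

print(objectives(and_or, ["y"]))     # (3, 2, 18)
print(objectives(nand_nand, ["y"]))  # (3, 2, 12)
```

Both example circuits use three gates with a longest path of two gates, yet the NAND-only version needs 12 transistors instead of 18, which illustrates why gate count alone is a weak proxy for area.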

The MO-CGP algorithm was set to evolve the FCDs for each of the two CABs in the PAnDA architecture. Both FCDs have 4 inputs (a 1-bit enable and 3 bits for configuring the 8 functions of each CAB) and 12 or 10 outputs for the CCAB or SCAB respectively (these configure the overall CAB function by routing signals from the CAB inputs to the CTs via the configurable interconnect block). The circuits were allowed to be synthesised from the 2-input logic gates NAND, NOR, AND, OR, XNOR and XOR, and a 1-input inverter. For each FCD, a single run was executed for a predetermined number of generations as described in Table I. At the end of each run, the 20 best individuals were saved for analysis and the results are summarised in Table II. Also recorded were the generation and objective scores of the first circuit in each run which was functionally correct, as this result is where a conventional CGP run would terminate.

As can be seen from Table II, MO-CGP is capable of finding functionally correct designs for both the CCAB and SCAB after a relatively small number of generations (approximately 32,000 and 16,000 respectively), and after the multi-objective optimisation stage of the algorithm the functionally correct designs can be drastically optimised for gate count, gate path length and transistor count. For both the CCAB and SCAB, there is little variation in the gate count and gate path length objectives, whereas there is a much larger spread in the transistor count objective. This could be attributed to the fact that the function set contains gates with different numbers of transistors, so two circuits with equivalent gate counts may have vastly different transistor counts. It is also possible that designs with a higher than average gate count may have a lower than average transistor count.

For this reason, transistor count was the main objective on which a design was selected from the final population for the FCD for the CCAB and SCAB, as this should have the strongest correlation with the physical area of the design. This is extremely important for the configuration circuitry of a reconfigurable architecture, where area overhead is a major concern because the circuitry is repeated numerous times on the chip. Speed and dynamic power usage for configuration circuitry are much less important, as the majority of the time the FCDs will be in static operation, so it is area and leakage power that are the main concern. With this in mind, the designs with minimal transistor count were selected from the final population for the FCDs of the CCAB and SCAB, which had the objective scores of 24, 6, 92 and 21, 6, 86 respectively for the gate count, gate path length and transistor count. The schematics for these designs are shown in Figure 3.

V. COMPARISON WITH COMMERCIAL SYNTHESIS TOOLS

In order to verify that the chosen designs evolved by MO-CGP for the CCAB and SCAB FCDs were correct, and also to gather more information about the designs (such as timing, area and power metrics), the two chosen designs were converted to Verilog netlists and analysed using Cadence Encounter RTL Compiler (RC), an industry-standard commercial synthesis tool used for logic design, and the standard cell library from the 40nm PDK, from which the PAnDA chip is currently being fabricated.

In order to compare the chosen MO-CGP designs, VHDL descriptions of the CCAB and SCAB FCDs, based on their truth tables, were also synthesised using RC and the standard cell library from the 40nm PDK, which is the non-evolutionary industry-standard method of producing the logic designs for the CCAB and SCAB decoders. In order for the RC synthesised designs for the CCAB and SCAB to be compared with the MO-CGP designs, the standard cells available for use by RC were restricted to those with the same functionality as the MO-CGP function set (i.e., AND, NAND, NOR, OR, XOR, XNOR, INV). The RC synthesised designs for the CCAB and SCAB are shown in Figure 4.

Fig. 3. Evolved function configuration decoders for the CCAB (a) and SCAB (b) with the best transistor count.

Fig. 4. Function configuration decoders for the CCAB (a) and SCAB (b) produced by a commercial synthesis tool.

TABLE III. COMPARISON BETWEEN THE CCAB AND SCAB FUNCTION CONFIGURATION DECODERS PRODUCED BY MO-CGP AND A COMMERCIAL SYNTHESIS TOOL.

                       CCAB                            SCAB
                       MO-CGP   RC      Comparison     MO-CGP   RC      Comparison
Gate Count             24       30      -20%           21       26      -19%
Gate Path Length       6        5       +20%           6        6       0%
Transistor Count       92       120     -23%           86       110     -22%
Area (μm²)             17.91    23.13   -23%           17.24    19.96   -14%
Delay (ps)             403      346     +16%           346      454     -24%
Leakage Power (nW)     7.62     9.41    -19%           7.03     9.02    -22%
Dynamic Power (μW)     1.08     1.31    -18%           1.02     1.27    -20%

Table III compares the CCAB and SCAB FCD designs produced by MO-CGP and RC. The top section of the table compares the designs based on the objectives used by MO-CGP, whereas the second half of the table compares the designs based on performance analysis metrics in RC, which are calculated using data from the standard cells in the 40nm PDK. Comparing the MO-CGP and RC designs based on the MO-CGP objectives clearly shows that MO-CGP is capable of producing much smaller designs for the CCAB and SCAB FCDs based on gate count (-20% and -19% respectively) and transistor count (-23% and -22% respectively). In terms of gate path length, both MO-CGP and RC performed equally well for the SCAB but RC produced a slightly shorter critical path for the CCAB FCD. However, if the final population of designs was considered for MO-CGP instead of only the design with the smallest transistor count, then MO-CGP would have produced designs that had a shorter critical path than RC in addition to a smaller gate and transistor count. For example, one design in the final MO-CGP population had the scores 23, 4 and 110 for the gate count, gate path length and transistor count respectively, which would be an improvement over the RC design of 23%, 20% and 8% respectively.
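The Comparison columns of Table III are relative differences of the MO-CGP value with respect to the RC value. The short sketch below recomputes them for the CCAB from the figures in the table; the function and variable names are illustrative only.

```python
def percent_change(mocgp, rc):
    """Relative difference of the MO-CGP value with respect to RC, in percent."""
    return 100.0 * (mocgp - rc) / rc

# (MO-CGP, RC) pairs for the CCAB, taken from Table III.
ccab = {
    "Gate Count": (24, 30),
    "Gate Path Length": (6, 5),
    "Transistor Count": (92, 120),
    "Area (um^2)": (17.91, 23.13),
    "Delay (ps)": (403, 346),
    "Leakage Power (nW)": (7.62, 9.41),
    "Dynamic Power (uW)": (1.08, 1.31),
}

for name, (mocgp, rc) in ccab.items():
    print(f"{name}: {percent_change(mocgp, rc):+.0f}%")
```

The same function applied to the alternative CCAB design (23, 4, 110) against the RC design (30, 5, 120) gives roughly -23%, -20% and -8%, matching the improvements quoted in the text.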

successfully evolved functionally correct designs for FCDs for both the CCAB and SCAB and was also capable of optimising the designs for multiple objectives such as gate count, gate path length and transistor count. Comparing the optimised designs with designs for the FCDs produced by a commercial synthesis tool using a commercial 40nm standard cell library showed that the designs produced by MO-CGP were superior when using either the metrics used by MO-CGP during optimisation or those from the commercial synthesis tool (i.e., timing, area, and power). However, the trade-off is that the runtime of MOCGP is much slower than that of the commercial synthesis tool. Although in certain circumstances, such as reconfigurable architecture design, a more optimal design may warrant the extra run-time required.

Interestingly, comparing the MO-CGP and RC designs based on the performance metrics from RC shows a similar trend to that using the objectives from MO-CGP. There especially appears to be close correlation between gate and transistor count and area, leakage and dynamic power. It is also worth noting that when a design has a longer gate path length, the delay metric will also be higher. However, if the two designs have an equal gate path length, then the delay value can differ significantly (24% in the case of the SCAB). This could be attributed to the different gates present in each of those critical paths and also the fan out and drive strength of each gate of the critical path, as this could drastically effect the overall timing.

A prototype version of the PAnDA architecture, including the evolved FCDs for the CCAB and SCAB, is currently being fabricated in a 40nm process and the packaged chips are expected back by early 2013. In future work, the FCDs will be tested and verified in hardware in order to substantiate the results of this paper. It is also intended to take this work further by investigating the use of a full standard cell library as the function set for MO-CGP, and the use of “real-world” performance metrics such as area, timing and power as the objectives during the optimisation stage of MO-CGP.

Although MO-CGP produces more efficient designs than RC, it is worth noting that RC produced its designs in a matter of seconds, whereas MO-CGP required hours to reach the optimisation limit of 30 million generations, although the optimised designs may well have been found much earlier in the optimisation cycle. However, when designing components for a reconfigurable architecture (such as PAnDA) that are repeatedly used throughout the chip, the extra run-time required by MO-CGP is worthwhile, as a small saving in area per component can add up to a huge area saving over the entire chip die. The die area that has been saved could then be used to implement extra reconfigurable logic resources (such as CABs), which will increase the overall functionality of the chip.

VI. CONCLUSIONS & FUTURE WORK

This paper has introduced the PAnDA architecture and its main components, such as CTs and CABs, which make it unique compared with commercial FPGA architectures. One of the aims when designing the PAnDA architecture was to minimise the ratio between transistors that process user signals and those that perform configuration. One place where this can be achieved is the configuration memory required for the CABs, which can be reduced. To achieve this, FCDs are required to control the functional behaviour of the two types of CAB (CCAB and SCAB) in the PAnDA architecture. This also has the benefit of reducing the possibility of invalid configurations that may damage the chip by producing short-circuits at the transistor level. In order to design FCDs for the CCAB and SCAB, a technique known as MO-CGP was applied, using a truth table description of the decoders to assess functionality. MO-CGP


ACKNOWLEDGMENT

This work is part of the PAnDA project that is funded by EPSRC (EP/I005838/1) and is the subject of a UK patent application (GB1119099.8).

REFERENCES

[1] A. Asenov, “Statistical Nano CMOS Variability and Its Impact on SRAM,” in Extreme Statistics in Nanoscale Memory Design. Springer Berlin Heidelberg, 2010, pp. 17–50.
[2] J. A. Walker, J. A. Hilder, D. Reid, A. Asenov, S. Roy, C. Millar, and A. M. Tyrrell, “The evolution of standard cell libraries for future technology nodes,” Genetic Programming and Evolvable Machines, vol. 12, no. 3, pp. 235–256, Apr. 2011. [Online]. Available: http://www.springerlink.com/content/e2906714n5qj9j82/
[3] J. A. Walker, R. O. Sinnott, G. Stewart, J. A. Hilder, and A. M. Tyrrell, “Optimising Electronic Standard Cell Libraries for Variability Tolerance Through the Nano-CMOS Grid,” Philosophical Transactions of the Royal Society A, 2010.
[4] J. A. Hilder, J. A. Walker, and A. M. Tyrrell, “Optimising variability tolerant standard cell libraries,” in 2009 IEEE Congress on Evolutionary Computation. IEEE, May 2009, pp. 2273–2280. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4983223
[5] J. Langeheine, M. A. Trefzer, J. Schemmel, and K. Meier, “Intrinsic Evolution of Analog Electronic Circuits Using a CMOS FPTA Chip,” in Fifth Conference on Evolutionary Methods for Design, Optimization and Control with Applications to Industrial and Societal Problems (EUROGEN), 2003.
[6] A. Stoica, R. S. Zebulum, D. Keymeulen, R. Tawel, T. Daud, and A. Thakoor, “Reconfigurable VLSI Architectures for Evolvable Hardware: From Experimental Field Programmable Transistor Arrays to Evolution-Oriented Chips,” IEEE Transactions on Very Large Scale Integration (VLSI) Systems, vol. 9, no. 1, pp. 227–232, Feb. 2001.
[7] M. A. Trefzer, J. A. Walker, and A. M. Tyrrell, “A Programmable Analog and Digital Array for Bio-inspired Electronic Design Optimization at Nano-scale Silicon Technology Nodes,” in IEEE Asilomar Conference on Signals, Systems, and Computers, Asilomar, CA, Nov. 2011.
[8] J. A. Walker, M. A. Trefzer, and A. M. Tyrrell, “A Reconfigurable Architecture for Current and Future Challenges in Electronic Design and Technology,” in Proceedings of 2012 Workshop on Variability Modelling and Mitigation Techniques in Current and Future Technologies (VAMM 2012), Dresden, Germany, 2012.
[9] E. Takahashi, Y. Kasai, M. Murakawa, and T. Higuchi, “A post-silicon clock timing adjustment using genetic algorithms,” in 2003 Symposium on VLSI Circuits. Digest of Technical Papers (IEEE Cat. No.03CH37408). Japan Soc. Appl. Phys, pp. 13–16. [Online]. Available: http://ieeexplore.ieee.org/xpl/articleDetails.jsp?arnumber=1221149
[10] M. Murakawa, T. Adachi, Y. Niino, Y. Kasai, E. Takahashi, K. Takasuka, and T. Higuchi, “An AI-calibrated IF filter: a yield enhancement method with area and power dissipation reductions,” IEEE Journal of Solid-State Circuits, vol. 38, no. 3, pp. 495–502, Mar. 2003. [Online]. Available: http://ieeexplore.ieee.org/xpls/abs_all.jsp?arnumber=1183858
[11] E. Takahashi and Others, “Power dissipation reductions with genetic algorithms,” in Proceedings of NASA/DOD Evolvable Hardware, 2005.
[12] J. Langeheine, M. Trefzer, J. Schemmel, and K. Meier, “Intrinsic Evolution of Digital-To-Analog Converters Using a CMOS FPTA Chip,” in Proc. of the NASA/DoD Conf. on Evolvable Hardware. Seattle, WA, USA: IEEE Press, Jun. 2004, pp. 18–25.
[13] J. F. Miller and P. Thomson, “Cartesian Genetic Programming,” in Proceedings of the 3rd European Conference on Genetic Programming, ser. Lecture Notes in Computer Science, vol. 1802. Springer-Verlag, 2000, pp. 121–132.
[14] J. R. Koza, Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, 1992.
[15] Zdenek Vasicek and Lukas Sekanina, “A Global Postsynthesis Optimization Method for Combinational Circuits,” in DATE ’11 Proceedings of the Conference on Design, Automation and Test in Europe, 2011, pp. 1525–1528.
[16] K. Deb, A. Pratap, S. Agarwal, and T. Meyarivan, “A fast and elitist multi-objective genetic algorithm: NSGA-II,” IEEE Transactions on Evolutionary Computation, vol. 6, no. 2, pp. 181–197, 2002.
[17] J. A. Hilder, J. A. Walker, and A. M. Tyrrell, “Use of a multiobjective fitness function to improve cartesian genetic programming circuits,” in 2010 NASA/ESA Conference on Adaptive Hardware and Systems. IEEE, Jun. 2010, pp. 179–185. [Online]. Available: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=5546262
[18] L. Sekanina, J. A. Walker, P. Kaufmann, M. Platzner, and J. F. Miller, “Evolution of Electronic Circuits,” in Cartesian Genetic Programming, ser. Natural Computing Series, J. F. Miller, Ed. Berlin, Heidelberg: Springer Berlin Heidelberg, 2011, pp. 125–179. [Online]. Available: http://www.springerlink.com/content/u62487882777w4g4/
[19] A. Stoica, Lazaro Carlos-Salazar, D. Keymeulen, and K. Hayworth, “Evolution of CMOS Circuits in Simulations and Directly in Hardware on a Programmable Chip,” in Proc. of the Genetic and Evolutionary Computation Conference (GECCO), W. Banzhaf, J. Daida, A. E. Eiben, M. H. Garzon, V. Honavar, M. Jakiela, and R. E. Smith, Eds. Orlando, FL, USA: Morgan Kaufmann, Jul. 1999, pp. 1198–1203. [Online]. Available: http://ehw.jpl.nasa.gov/publications.htm
[20] J. Langeheine*, M. Trefzer*, D. Brüderle, K. Meier, and J. Schemmel, “On the evolution of analog electronic circuits using building blocks on a CMOS FPTA,” in Proceedings of the Genetic and Evolutionary Computation Conference (GECCO 2004), Part I, ser. LNCS 3102, K. Deb and Others, Eds., vol. 1. Seattle, WA, USA: Springer-Verlag, Jun. 2004, pp. 1316–1327.
[21] T. Yu and J. F. Miller, “Neutrality and the evolvability of Boolean function landscape,” in Proceedings of the 4th European Conference on Genetic Programming, ser. Lecture Notes in Computer Science, vol. 2038. Springer-Verlag, 2001, pp. 204–217.
[22] J. A. Walker and J. F. Miller, “Evolution and Acquisition of Modules in Cartesian Genetic Programming,” in Genetic Programming, ser. Lecture Notes in Computer Science, M. Keijzer, U.-M. O'Reilly, S. Lucas, E. Costa, and T. Soule, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2004, vol. 3003, pp. 187–197. [Online]. Available: http://www.springerlink.com/content/btnej5c0ubyqv23b/
[23] J. A. Walker and J. F. Miller, “The Automatic Acquisition, Evolution and Reuse of Modules in Cartesian Genetic Programming,” IEEE Transactions on Evolutionary Computation, vol. 12, no. 4, pp. 397–417, Aug. 2008. [Online]. Available: http://ieeexplore.ieee.org/xpl/freeabs_all.jsp?arnumber=4358780
[24] J. Clegg, J. A. Walker, and J. F. Miller, “A new crossover technique for Cartesian genetic programming,” in Proceedings of the 9th annual conference on Genetic and evolutionary computation - GECCO ’07. New York, New York, USA: ACM Press, Jul. 2007, p. 1580. [Online]. Available: http://dl.acm.org/citation.cfm?id=1276958.1277276
[25] J. F. Miller, D. Job, and V. K. Vassilev, “Principles in the Evolutionary Design of Digital Circuits - Part I,” Genetic Programming and Evolvable Machines, vol. 1, pp. 8–35, 2000.
[26] J. A. Walker, J. A. Hilder, and A. M. Tyrrell, “Evolving Variability-Tolerant CMOS Designs,” in Evolvable Systems: From Biology to Hardware, ser. Lecture Notes in Computer Science, G. S. Hornby, L. Sekanina, and P. C. Haddow, Eds. Berlin, Heidelberg: Springer Berlin Heidelberg, 2008, vol. 5216, pp. 308–319. [Online]. Available: http://www.springerlink.com/content/d04902888742712l/
[27] J. A. Walker, J. A. Hilder, and A. M. Tyrrell, “Towards evolving industry-feasible intrinsic variability tolerant CMOS designs,” in 2009 IEEE Congress on Evolutionary Computation. IEEE, May 2009, pp. 1591–1598. [Online]. Available: http://ieeexplore.ieee.org/lpdocs/epic03/wrapper.htm?arnumber=4983132

