An Incremental Approach for Test Scheduling and Synthesis using Genetic Algorithms

Haidar Harmanani and Aouni Hajar
Department of Computer Science
Lebanese American University
Byblos, 1401 2010, Lebanon
Abstract— This paper presents a new and efficient method for concurrent BIST synthesis and test scheduling. The method maximizes concurrent testing of modules while performing the allocation of functional units, test registers, and multiplexers. The method is based on a genetic algorithm that efficiently explores the testable design space. The method was implemented using C++ on a Linux workstation. Several benchmark examples have been implemented and favorable results are reported.
I. INTRODUCTION

VLSI chip designers have long recognized the importance and advantages of incorporating testability early in the design process. High-level synthesis is concerned with the design and implementation of digital circuits from a behavioral description subject to a set of goals and constraints. The two main tasks in high-level synthesis are operation scheduling and resource allocation. The design is usually subject to constraints that may limit its total area, throughput, delay, or power consumption.

Built-In Self-Test (BIST) is a “test-per-clock” DFT technique that allows a circuit to test itself by embedding the features of an external tester into the chip [5]. The main idea of BIST is to reconfigure the datapath, during test mode, into a number of self-testable Combinational Logic Blocks (CLBs). During self-test, the CLBs must be configured into kernels. A kernel is a combinational block that is fed, directly or indirectly, by a pseudorandom pattern generator and whose outputs each feed, directly or indirectly, a signature analyzer. In order to reduce test time, CLBs are organized into test sessions. A test session groups the tests of compatible modules, where compatibility is checked with respect to the modules' test resource sharing needs.

Genetic Algorithms (GAs) are stochastic combinatorial optimization techniques based on the evolutionary improvement of a population. GAs operate on a population of chromosomes, where each chromosome consists of a number of genes and each gene represents one of the parameters to be optimized. A number of GA control parameters govern how the chromosomes within the population are combined and mated.

A. Related Work

There have been various deterministic approaches to the test datapath synthesis problem [12]. Some of the earliest work in test synthesis was presented by Harmanani et al. [8] and Avra [3], while Craig et al. [6] proposed one of the most classical works in test scheduling through optimal and sub-optimal procedures. More recently, Kim et al. [9] proposed an ILP-based BIST datapath synthesis approach that performs testable synthesis and test scheduling. Nicolici et al. [10] proposed a method that reduces test application time by grouping same-type modules into test compatibility classes. Parulkar et al. [1] proposed a method that exploits the sharing of test registers in order to reduce BIST area overhead, while Harris and Orailoglu [2] developed techniques to synthesize BISTed datapaths such that the time required for testing is minimized. Dhodhi et al. [7] proposed one of the earliest genetic-algorithm-based methods for datapath synthesis, in which the problem is altered using a genetic algorithm and then transformed into the solution space by means of a heuristic.

B. Problem Description

This paper presents a method for high-level synthesis with testability. Given a behavioral description of a digital circuit and a set of design constraints, the method uses a genetic algorithm to generate a self-testable RTL datapath that: 1) implements the original behavior; 2) minimizes the overhead of test registers in the datapath; and 3) minimizes the test time by minimizing the number of test sessions.

The main features of the proposed method are:
• A model for the testable synthesis of RTL datapath structures from behavioral descriptions based on the BIST methodology. The method selectively incorporates all kinds of test registers, including BILBOs and CBILBOs.
• Rapid exploration of the complex design space using an efficient genetic algorithm.
• Sharing and minimization of the BIST registers required to test different modules, based on user constraints.
• A trade-off scheme that provides the capability to trade off design area, test, and number of test sessions.

The rest of the paper is organized as follows.
Section II presents the evolutionary datapath synthesis formulation along with the chromosomal representation, the genetic operators, and the cost function. Section III presents the algorithm, and experimental results are discussed in Section IV.

II. PROPOSED GA FORMULATION

A. Background

The smallest test unit, or kernel, that can be tested independently consists of one test register that can be configured as
Fig. 1. Simple Circuit
a multiple input signature register (MISR), the combinational block connected to the inputs of this register, and a set of test registers that generate test patterns for the inputs of the block. For example, in Fig. 1, registers T1 and T3 must be configured as TPGRs and T2 as a BILBO. Since a BILBO register cannot generate pseudorandom patterns and compact test responses simultaneously, some circuits cannot be tested concurrently, especially when a test register feeds itself through combinational logic. Test application must therefore be scheduled such that these conflicts are avoided. Alternatively, some researchers proposed using three latches for each bit of a test register so that it can generate patterns and compact test responses independently. These registers are called CBILBOs [13]. In Fig. 1, register T4 must be configured as a CBILBO.

B. Formulation

Consider a DFG node associated with the variable instance V. It corresponds to an operation O(V) and to a value that must be assigned to a register for the duration of its life span L(V). Finally, data transfers are assigned to some path of connections. The objective of our method is to map DFG nodes into test kernels and to incrementally reduce the datapath cost by merging compatible test kernels. Two test kernels can be successfully merged when the following three conditions are met:
1) There is no conflict in the use of the kernels' functional units; that is, they are assigned to different clock cycles.
2) The merger results in a feasible functional unit.
3) Self-adjacent registers are allowed as a trade-off among area, delay, testability, and test time.

Condition 3 is accomplished through the use of CBILBO registers, subject to two requirements: 1) the non-testability of the circuit due to self-adjacency, and 2) the reduction of test sessions. Formally, two nodes that can be merged under the above conditions are called compatible.

C. Chromosomal Representation

A chromosome in the population encodes all the information that needs to be optimized. Each chromosome represents a candidate datapath that implements the original DFG. The chromosomal representation is a vector whose length equals the number of test kernels in the DFG, and each test kernel is encoded as a gene in the chromosome. Each kernel input and output is modeled as a vector whose size is a function of the total number of clock cycles and which contains references to other test kernels. Thus, if n is the total number of clock cycles, each kernel is connected to an array of size n at each port (Fig. 2). Whenever two nodes are merged and assigned to the same hardware resource, the gene indexes are updated as follows. Assume that kernel Ki is scheduled before Kj. Then, if Ki is merged with Kj, the inputs of Kj are appended to Ki's inputs at an index equal to Kj's clock cycle. Therefore, all indexes in Ki's arrays that are smaller than Ki's clock cycle are not used.

Fig. 2. General chromosome representation

D. Chromosomal Implementation

In order to increase the efficiency of the computation, the algorithm uses dual representations for a chromosome. Merged kernels in a chromosome are represented by a parent/child relationship using a doubly linked list. The list maintains references to the next child, to the parent, and to the absolute parent. Furthermore, the algorithm maintains a hash table in order to reduce the complexity caused by this parent/child representation. Finally, registers are represented using a binary tree structure. The advantage of this representation is that kernel search, insertion, merge, and split are done in constant time (O(1)).

E. Test Register Representation

A test register is assigned an attribute that can be isControllable, isObservable, or isConcurrent. The final implementation of a register is the union of the underlying register attributes. For example, if a register is a TPGR for one module and an MISR for another module in the same test session, then its final implementation attribute is a CBILBO. This representation can indicate the hardware overhead before and after the merger of data flow nodes by computing the difference in cost between two test plans. The advantage is that the algorithm can assign one attribute at a time and evaluate the trade-off before it finally commits to a solution.
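The attribute union can be sketched in a few lines of Python. This is a minimal illustration, not the authors' C++ implementation. The class RegisterAttrs and the function implementation are hypothetical names. The mapping they encode is our assumption: controllable only gives a TPGR, observable only gives an MISR, both roles in different sessions give a BILBO, and both roles in the same session give a CBILBO. This mapping is consistent with the TPGR/MISR example above and with the register types reported in Table I.

```python
from dataclasses import dataclass, field


@dataclass
class RegisterAttrs:
    """Hypothetical per-register test attributes (names are illustrative)."""
    # Sessions in which the register must generate test patterns (isControllable).
    tpg_sessions: set = field(default_factory=set)
    # Sessions in which the register must compact test responses (isObservable).
    sa_sessions: set = field(default_factory=set)

    def add_tpg(self, session: int) -> None:
        self.tpg_sessions.add(session)

    def add_sa(self, session: int) -> None:
        self.sa_sessions.add(session)

    @property
    def is_controllable(self) -> bool:
        return bool(self.tpg_sessions)

    @property
    def is_observable(self) -> bool:
        return bool(self.sa_sessions)

    @property
    def is_concurrent(self) -> bool:
        # Both roles are required within at least one common test session.
        return bool(self.tpg_sessions & self.sa_sessions)


def implementation(attrs: RegisterAttrs) -> str:
    """Map the union of attributes to a register type (assumed mapping)."""
    if attrs.is_concurrent:
        return "CBILBO"
    if attrs.is_controllable and attrs.is_observable:
        return "BILBO"
    if attrs.is_controllable:
        return "TPGR"
    if attrs.is_observable:
        return "MISR"
    return "Normal"


# TPGR for one module and MISR for another module in the same session:
r = RegisterAttrs()
r.add_tpg(1)
r.add_sa(1)
print(implementation(r))  # CBILBO
```

The register type is recomputed from its attributes. A tentative attribute assignment can therefore be costed by comparing the resulting type before and after the attribute is added. This is the kind of check that lets the algorithm evaluate a trade-off before committing.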
The test session attribute assignment is determined by the isConcurrent attribute and the test session.

F. Initial Population

At the beginning of each run, the DFG is analyzed to determine the compatibility relations among DFG nodes as well as the register life spans. Compatibility relations are stored in a compatibility graph, Gcomp(V, E), whose vertices V denote operations and whose edges E denote the compatibility relations among DFG nodes.
Fig. 3. Kernel Evolution Process (flowchart steps: select nodes for merger; register merge operation; multiplexer merge operation; reassign test attributes for merged output registers, connected output registers, self-loops, and merged input registers after merger; test session schedule after merger)
Nodes that are connected by an edge in the graph correspond to a partial binding. The initial population is then generated from problem-specific data in two steps. First, an initial chromosome is generated in which each DFG node is mapped to a test kernel. Second, further chromosomes are generated through random enumeration of feasible partial bindings derived directly from the compatibility graph Gcomp(V, E).
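The two-step construction of the initial population can be illustrated with the following Python sketch. It relies on several simplifying assumptions:

- Nodes carry only an operation type and a clock cycle.
- Two nodes are treated as compatible when they are scheduled in different clock cycles and use the same functional-unit type. This is a reduced form of merge conditions 1 and 2 in Section II-B.
- A kernel is kept feasible by requiring all of its nodes to be pairwise compatible.

Node, build_compat_graph, random_chromosome, and initial_population are hypothetical names.

```python
import random
from dataclasses import dataclass
from itertools import combinations


@dataclass(frozen=True)
class Node:
    name: str
    op: str     # operation type, e.g. "*" or "+"
    cycle: int  # clock cycle assigned by scheduling


def compatible(a: Node, b: Node) -> bool:
    # Assumed simplification of merge conditions 1 and 2:
    # different clock cycles and the same functional-unit type.
    return a.cycle != b.cycle and a.op == b.op


def build_compat_graph(nodes):
    """Edges of Gcomp(V, E) as a set of 2-element frozensets of node names."""
    return {frozenset((a.name, b.name))
            for a, b in combinations(nodes, 2) if compatible(a, b)}


def random_chromosome(nodes, edges, rng, merges):
    """Start from one kernel per node, then apply random feasible mergers."""
    kernels = [{n.name} for n in nodes]
    edge_list = sorted(tuple(sorted(e)) for e in edges)  # deterministic order
    for _ in range(merges):
        if not edge_list:
            break
        u, v = rng.choice(edge_list)
        ku = next(k for k in kernels if u in k)
        kv = next(k for k in kernels if v in k)
        # Merge only if every cross pair is compatible, so each kernel stays
        # a clique of Gcomp, i.e. a feasible partial binding.
        if ku is not kv and all(frozenset((x, y)) in edges
                                for x in ku for y in kv):
            kernels.remove(kv)
            ku |= kv
    return kernels


def initial_population(nodes, size, seed=0):
    rng = random.Random(seed)
    edges = build_compat_graph(nodes)
    first = [{n.name} for n in nodes]            # step 1: one kernel per node
    rest = [random_chromosome(nodes, edges, rng, merges=len(nodes))
            for _ in range(size - 1)]             # step 2: random bindings
    return [first] + rest


dfg = [Node("m1", "*", 1), Node("m2", "*", 2), Node("a1", "+", 2), Node("a2", "+", 3)]
population = initial_population(dfg, size=4)
print(len(population))  # 4
```

Keeping every kernel a clique of the compatibility graph is one simple way to guarantee that each randomly enumerated binding is feasible. The paper does not state how feasibility is enforced during enumeration.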
For each register in the hash table
    If register is a BILBO
        tc ← select a not-used test session
        If tc is found then
            kernel's test session ← tc
            reset isConcurrent attribute in kernel
        Else
            tc ← select the test session that has the minimum number of CBILBOs
            set isConcurrent for registers whose kernels' test session equals tc
            (change all BILBOs to CBILBOs)
Fig. 4. Test session reassignment algorithm after kernel merge
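One possible executable reading of the Fig. 4 pseudocode is sketched below in Python. It follows the figure literally and adds some assumptions of its own:

- Each register belongs to a single kernel.
- A "not-used" test session is a session index to which no kernel is currently assigned.
- The number of available sessions is bounded by a hypothetical parameter, max_sessions.

The names Register, cbilbo_count, and reassign_sessions are illustrative only.

```python
from dataclasses import dataclass


@dataclass
class Register:
    name: str
    kind: str    # "TPGR", "MISR", "BILBO", "CBILBO" or "Normal"
    kernel: str  # owning kernel (simplifying assumption: one kernel per register)


def cbilbo_count(registers, kernel_session, session):
    """Number of CBILBOs whose kernel is scheduled in the given session."""
    return sum(1 for r in registers
               if r.kind == "CBILBO" and kernel_session[r.kernel] == session)


def reassign_sessions(registers, kernel_session, max_sessions):
    """Literal reading of Fig. 4; returns the kernels with isConcurrent set.

    kernel_session maps each kernel to a session in range(max_sessions)
    and is updated in place.
    """
    concurrent = set()
    for reg in registers:  # "for each register in the hash table"
        if reg.kind != "BILBO":
            continue
        used = set(kernel_session.values())
        free = [s for s in range(max_sessions) if s not in used]
        if free:
            tc = free[0]
            kernel_session[reg.kernel] = tc  # kernel's test session <- tc
            concurrent.discard(reg.kernel)   # reset isConcurrent in kernel
        else:
            tc = min(range(max_sessions),
                     key=lambda s: cbilbo_count(registers, kernel_session, s))
            for r in registers:
                if kernel_session[r.kernel] == tc:
                    concurrent.add(r.kernel)  # set isConcurrent
                    if r.kind == "BILBO":
                        r.kind = "CBILBO"     # change BILBO to CBILBO
    return concurrent


regs = [Register("R1", "BILBO", "K1"), Register("R2", "MISR", "K2")]
sessions = {"K1": 0, "K2": 0}
print(sorted(reassign_sessions(regs, sessions, max_sessions=1)))  # ['K1', 'K2']
print(regs[0].kind)  # CBILBO
```

When a free session exists, the BILBO's kernel is simply moved and no extra hardware is added. Upgrading BILBOs to CBILBOs is the fallback, which mirrors the area-versus-test-time trade-off described in Section II-B.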
G. Cost Function

The objective function measures the fitness of each chromosome in the population and is crucial for the transmission of gene information to the next generation. The datapath cost is a weighted sum of the costs of the individual components plus the number of test sessions, and is given by

$$F = \alpha\Big(\sum_i A_{fu}(i)\,N_{fu}(i) + A_r N_r + \sum_j A_{tr}(j)\,N_{tr}(j) + \sum_k M_{mux}(k)\,M_k\Big) + \beta\,(\text{number of test sessions})$$

where N_fu(i) is the number of functional units of type i and A_fu(i) is the area of a functional unit of type i; N_tr(j) is the number of test registers of type j and A_tr(j) is the area of a test register of type j; N_r is the number of registers and A_r is the area of a register. Finally, M_mux(k) is the number of multiplexers of type k and M_k is the corresponding area.

H. Selection and Reproduction

We adopt a reproduction process that maintains a mixture of chromosomes based on fitness. The best chromosomes are kept in order to search for a local optimum, while bad chromosomes are kept to disturb the state of the solution space. The system selects chromosomes based on the following reproduction procedure:
1) Select 20% of the best chromosomes and 10% of the worst chromosomes.
2) Select 25% of the chromosomes whose fitness is between 90% and 100% of the best chromosomes.
3) Select 40% of the chromosomes whose fitness is between 25% and 75% of the best chromosomes.
4) Select 20% of the chromosomes whose fitness is between 17% and 30% of the best chromosomes.
5) Randomly select 5% of the remaining chromosomes.

Under this selection process, some “healthy” individuals may be selected more than once, while weaker individuals have a very small chance of being selected.

I. Genetic Operators

In order to explore the design space, we use two genetic operators: mutation and crossover. The genetic operators are applied iteratively, taking turns according to their corresponding probabilities.

1) Mutation: Mutation is used to find new points in the search space. We use a novel mutation operator based on a split/merge mechanism: either a kernel is split into two, or two kernels are merged into one. It should be noted that the algorithm favors merging over splitting, since the split operator is destructive and has more of a hill-climbing effect. The algorithm merges 80% of the time and splits 20% of the time, subject to the mutation probability Pm.

2) Crossover: In order to improve the quality of the solutions found, one needs to overcome the information loss that occurs when the GA converges to a solution. This is done through a uniform crossover operator that mates two parent chromosomes and produces two child chromosomes. The algorithm randomly selects two chromosomes and then randomly selects two compatible kernels based on the compatibility graph. The operator checks the status of the edge between the selected nodes: if there is an edge, a merge operation is applied; otherwise, a split is accepted. If the probability is lower than the threshold, nothing is applied.

III. ALGORITHM

Every chromosome represents an intermediate datapath with a different number of registers, multiplexer inputs, and functional units, and a different controller cost. During every generation, chromosomes are selected for reproduction, resulting in new datapaths. This is accomplished by merging compatible nodes within each chromosome.

An initial population is first selected. All input registers are assigned the isTPGR attribute, all output registers the isMISR attribute, and all remaining registers the isConcurrent attribute. The system then loops n·m times, where n is the total number of generations and m is the number of incremental steps, each of which reproduces and
TABLE I
BENCHMARK RESULTS

Design                          Cycles  Sessions  ALUs                 Mux Inputs  TPGR  MISR  BILBO  CBILBO  Normal  OH (%)
3rd Order IIR Filter            8       1         2(*), 1(+)           21          1     1     0      2       0       17.48
                                8       2         2(*), 1(+)           19          2     1     2      0       0       9.58
                                8       3         2(*), 1(+)           20          2     2     1      0       0       8.26
6th Order FIR Filter            7       1         2(*), 1(+)           14          2     1     0      2       0       19.04
                                7       2         2(*), 1(+)           14          2     1     2      0       0       10.36
                                7       3         2(*), 1(+)           14          2     1     2      0       0       10.36
6-Tap Wavelet Filter            7       1         1(*), 1(+), 1(-)     14          3     2     0      1       4       15.51
                                7       2         1(*), 1(+), 1(-)     14          3     2     1      0       4       10.27
                                7       3         1(*), 1(+), 1(-)     14          3     2     1      0       4       10.27
4-Point DCT                     6       1         2(*), 1(+), 1(-)     22          3     1     0      3       4       20.36
                                6       2         2(*), 1(+), 1(-)     23          1     1     3      0       6       10.20
                                6       3         2(*), 1(+), 1(-)     22          1     0     4      0       4       11.73
5th Order Elliptic Wave Filter  17      1         2(*), 3(+)           28          11    2     4      0       1       12.10
                                17      2         2(*), 3(+)           36          10    1     3      2       0       9.53
                                19      1         2(*), 2(+)           32          8     1     2      0       2       15.40
                                19      2         2(*), 2(+)           33          8     1     2      2       0       8.97
                                21      1         1(*), 2(+)           30          7     1     1      0       2       18.19
                                21      2         1(*), 2(+)           34          8     2     1      2       0       9.24
evaluates a new population. The incremental step invokes the kernel evolution process and monitors solution feasibility by checking a set of rules. In each incremental step, the genetic optimizer selects one operator, either mutation or crossover, and applies it to randomly selected chromosomes and genes. Six operations are required every time two kernels are merged or split (Fig. 3). If a violation occurs during the process, the whole process is aborted; thus, the kernel evolution process commits one action at a time. The operations are node merge, register merge, and MUX merge. Furthermore, after a merge or split the algorithm reassigns all test attributes in order to resolve self-loop conditions and to update output and input attributes. It also reassigns test points in order to minimize the number of test sessions. Similar operations are applied in the case of split operations. The test session reassignment pseudocode after a kernel merge is shown in Fig. 4.

IV. EXPERIMENTAL RESULTS

We implemented the described allocation and trade-off scheme on a 1.4 GHz Pentium 4 Linux workstation. We measured the performance of our BIST system using five DFGs that are widely adopted for benchmarking in high-level synthesis: the fifth-order elliptic wave filter, the 6th-order FIR (finite impulse response) filter, the 3rd-order IIR (infinite impulse response) filter, the 4-point DCT (Discrete Cosine Transform) circuit, and the 6-tap wavelet filter. The implementation details produced by our synthesis system are shown in Table I. For every example, we show the circuit design details, number of test sessions, and
test overhead based on the transistor counts provided by [9]. As shown in Table I, our system successfully synthesized datapaths of different styles based on the number of test sessions, the number of clock cycles, and the design constraints. All results were obtained in at most 1 CPU second.

REFERENCES

[1] I. Parulkar, S. Gupta, and M. Breuer, “Scheduling and Module Assignment for Reducing BIST Resources,” in Proc. DATE, 1998.
[2] I. Harris and A. Orailoglu, “SYNCBIST: Synthesis for Concurrent Built-In Self-Testability,” in Proc. European Design & Test Conf., pp. 101-104, 1994.
[3] L. Avra, “Allocation and Assignment in High-Level Synthesis for Self-Testable Data Paths,” in Proc. ITC, pp. 463-472, 1991.
[4] P. Bukovjan, L. Ducerf-Bourbon, and M. Marzouki, “Cost/Quality Trade-Off in Synthesis for BIST,” JETTA, Vol. 17, pp. 109-119, 2001.
[5] M. L. Bushnell and V. D. Agrawal, Essentials of Electronic Testing for Digital, Memory, and Mixed-Signal VLSI Circuits, Kluwer Academic Publishers, Boston, 2000.
[6] G. Craig, C. Kime, and K. Saluja, “Test Scheduling and Control for VLSI Built-In Self-Test,” IEEE Trans. on Computers, Vol. C-37, pp. 1099-1109, 1988.
[7] M. Dhodhi, F. Hielscher, R. Storer, and J. Bhasker, “Datapath Synthesis Using a Problem-Space Genetic Algorithm,” IEEE Trans. on CAD, Vol. 14, pp. 934-944, Aug. 1995.
[8] H. Harmanani and C. Papachristou, “An Improved Method for RTL Synthesis with Testability Trade-Offs,” in Proc. ICCAD, Nov. 1993.
[9] H. Kim, D. Ha, T. Takahashi, and T. Yamaguchi, “A New Approach to Built-In Self-Testable Datapath Synthesis Based on ILP,” IEEE Trans. on VLSI, Vol. 8, pp. 594-605, Oct. 2000.
[10] N. Nicolici, B. Al-Hashimi, and A. Brown, “BIST Hardware Synthesis for RTL Datapaths Based on Test Compatibility Classes,” IEEE Trans. on CAD, Vol. 19, pp. 1375-1385, Nov. 2000.
[11] C. E. Stroud, A Designer's Guide to Built-In Self-Test, Kluwer Academic Publishers, Boston, 2002.
[12] K. Wagner and S. Dey, “High-Level Synthesis for Testability: A Survey and Perspective,” in Proc. DAC, 1996.
[13] L. T. Wang and E. J. McCluskey, “Concurrent Built-In Logic Block Observer,” in Proc. ISCAS'86, pp. 1054-1057.