Artificial Intelligence and Knowledge Engineering

GENERATION OF COMBINATIONAL CIRCUIT METAPROGRAMS FROM TRUTH TABLE SPECIFICATIONS USING EVOLUTIONARY DESIGN TECHNIQUES

Robertas Damasevicius
Kaunas University of Technology, Software Engineering Department, Studentu 50, LT-51368, Kaunas, Lithuania, [email protected]

Abstract. The design productivity crisis and the power crisis in electronic design stimulate the search for new design methods that automate design space exploration and circuit design. Evolutionary hardware design has the ability to discover new circuit architectures satisfying a variety of design constraints (area, delay, power). In this paper, evolutionary hardware design techniques (genetic programming) are used to derive circuits from truth table specifications satisfying different target design criteria. For better variability management, the derived circuits are generated as metaprograms (generative components), allowing an end-user to select a particular design criterion as a parameter for circuit instance generation in VHDL.

Keywords: evolutionary hardware design, circuit optimization, metaprogram generation, multi-stage metaprogramming.

1 Introduction

The main aim of design is to develop a target system that implements the prescribed functionality, constrained by a given set of requirements, in time and with a minimal amount of allocated resources. Designing a circuit that implements the desired functionality consists of deriving a high-level hardware description from its behaviour specification (e.g., truth table, functional specification, formal model). The hardware description must be operational (i.e., produce all expected outputs from given inputs) and synthesizable (mappable to a specific technology library). Today's major problems in electronics design [1] are:

1) The increasing gap between technology capability and designer productivity: the design productivity crisis. The problem is to design efficient circuits that implement a given input/output behaviour without excessive design effort and time, while at the same time using all capabilities provided by the physical technology. There has been a great deal of success in electronic design automation, including technology mapping and architecture-specific optimization, physical, RTL and behaviour-level synthesis, routing and placement, and power optimization. However, the traditional design process is essentially based on designer knowledge and creativity, and these human capabilities are hard to automate.

2) The increasing power consumption: the power crisis. Traditionally, there are three basic circuit design efficiency characteristics: chip area, propagation delay (latency), and energy (power) consumption. As transistor sizes continue to shrink, size reduction allows frequency to be increased. Consequently, the dynamic power due to the logic transitions of circuit nodes also increases. Energy consumption is becoming the most important design characteristic for a wide range of mobile computing and battery-powered electronic systems, though the other characteristics remain as important as ever.
These two problems are hard to solve because the design space in the hardware domain is huge: it contains a large number of components, which can be wired in a vast number of ways. In order to find useful circuits, human designers usually reduce this design space by working in lower-dimensional subspaces, e.g., considering only a limited subset of gates, or keeping only a single design characteristic in mind. Exploring a larger solution space can allow more efficient circuit designs to be found. However, even when working at the highest level of abstraction, it is practically impossible for a designer to evaluate exhaustively all circuit configurations satisfying the same functional requirements. Finding efficient solutions for poorly specified problems (e.g., those specified using truth tables) with little or no knowledge of the nature of the design problem is especially difficult.

Traditional logic synthesis techniques such as Karnaugh maps or the Quine-McCluskey method are suited for finding minimal Sum-of-Products solutions. A combinational circuit based on this minimal form offers the shortest delay, but not necessarily the smallest size or power consumption; therefore the application of separate power optimization techniques is required [2]. Furthermore, these techniques produce circuits that use only AND, OR and NOT gates and ignore all other gate types. Therefore, further optimization of the circuits yielded by these methods is required in order to introduce other, more efficiently implemented kinds of gates such as NAND, NOR and XOR.

In this paper, genetic programming [3, 4] techniques are used for design criteria-directed exploration of the combinational circuit design space. The results of a design space exploration are represented at a higher level of abstraction as metaprograms, leaving the selection and generation of a particular circuit instance to the end-user.

The aims of this paper are as follows: 1) multi-objective, design characteristic-directed design space exploration using evolutionary design techniques, and the derivation of area-, delay- and power-efficient combinational circuits in VHDL from truth table specifications; 2) generation of design criteria-parameterized circuit metaprograms in Open PROMOL [5].

The outline of the paper is as follows. Section 2 overviews the related works. Section 3 introduces the basic principles of evolutionary hardware design and describes the implemented metaprogram generation system. Section 4 presents a case study. Finally, Section 5 discusses the results and presents conclusions.

2 Related Works

There are two main directions in current research in the field of parameterized hardware system design: 1) definition of methods for exploring the solution space for circuits with minimal cost function values; 2) development of system-level simulation and estimation tools to evaluate, as quickly and accurately as possible, circuit design characteristics. The second direction was discussed in [6] and is not considered here.

The main problem in design exploration techniques is that the space of possible solutions is so vast that an exhaustive approach is unfeasible. Therefore a number of directed optimization techniques such as genetic algorithms are used. Genetic programming has already been used for combinational circuit design by several authors [7-10]. Koza et al. [11] used genetic programming to design combinational circuits using a small set of gates (AND, OR, NOT), but the authors aimed at generating functionally correct circuits rather than efficient circuits. Ascia et al. [12] use genetic algorithms to minimize a function depending on the area, power, and delay, but in this case only a mono-objective search is used, leading to a single solution. In [13] the solution is extended to finding a Pareto-optimal set of system configurations; however, these system configurations are not generalized in any way. A similar problem is also considered in [14], where Pareto-optimal digital circuits are evolved with respect to multi-objective criteria. The obtained Pareto-optimal configurations represent a trade-off that the designer can consider as a set of starting solutions from which he can, at a high level of abstraction, select instances of a system that meet the given design constraints. Our novelty is that a set of design solutions, optimal with respect to various circuit design objectives and discovered using evolutionary design techniques, is encapsulated at a higher level of abstraction in a metaprogram.
The metaprogram can generate optimal solutions on demand. The developed system is thus a metaprogram generator, and represents an application of multi-stage metaprogramming in the hardware design domain. Our contribution is as follows: 1) our genetic algorithm uses more efficiently implemented gates (NAND, NOR, XOR), which enlarges the design space and allows more efficient target circuit designs to be discovered; 2) we consider not only the area and performance of the target circuit but also its power consumption, which allows the designer to meet more design constraints; 3) we generate a metaprogram that encapsulates three implementations of a circuit (area-, delay- and power-efficient), thus allowing an end-user to select the generation of a specific circuit instance meeting the desired design criterion at a late design stage.

3 Evolutionary Multi-Objective Derivation of Circuit Architectures

3.1 Basic Principles of Evolutionary Hardware Design

Evolutionary hardware design [15, 16] is an alternative to conventional electronic circuit design. It offers a mechanism based on the principles of biological evolution, natural selection, and survival of the fittest, and allows circuits to be created automatically from the input/output behaviour of the expected circuit, without an explicit specification of how to do it. The evolution is implemented using a genetic algorithm [17]. Genetic algorithms are directed search algorithms that search the parameter set of some problem domain for a combination of distinct values corresponding to a solution within that domain. The parameters define the types of available circuit components (gates) and their interconnection. Therefore, each set of distinct parameter values (genotype) corresponds to an individual electronic circuit (phenotype).

Individual circuits are encoded as binary trees, where a node is a logical operation and its children are operands. The tree captures the hierarchical nature of circuit topology. Such a representation is similar to the abstract syntax trees used by hardware description languages. For example, in the VHDL behavioural description of a circuit, all logical functions except NOT have two operands. For further manipulation, each binary tree is encoded as a sequence of integers corresponding to the numbers of logical functions and circuit input variables.

The evolution is controlled by two processes: 1) selection reduces variation in the population by selecting the fittest individuals, and 2) genetic operators (crossover, mutation) increase variation in the population by creating new individuals. The evolution starts from a randomly generated set of circuits (the initial population) composed of the allowed logic functions and terminals (input variables). The creation of this initial random population is, in fact, a blind random search in the circuit design space.
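The tree representation and its integer encoding described above can be sketched as follows. This is a minimal Python illustration; the data structures, the grow probability, and the negative gate identifiers are our assumptions, not the authors' implementation:

```python
import random

# Illustrative gate set (name -> arity): two-input gates plus unary NOT.
GATES = {"NAND": 2, "NOR": 2, "XOR": 2, "XNOR": 2, "NOT": 1}

def random_tree(n_inputs, depth):
    """Grow a random circuit tree: internal nodes are gates, leaves are inputs."""
    if depth == 0 or random.random() < 0.3:
        return ("IN", random.randrange(n_inputs))   # terminal (input variable)
    gate = random.choice(list(GATES))
    kids = [random_tree(n_inputs, depth - 1) for _ in range(GATES[gate])]
    return (gate, *kids)

def encode(tree, code=None):
    """Flatten a tree into a sequence of integers (pre-order), one number
    per gate or input variable, as the paper describes."""
    if code is None:
        code = []
    if tree[0] == "IN":
        code.append(tree[1])                          # input variable number
    else:
        code.append(-1 - list(GATES).index(tree[0]))  # negative ids for gates
        for child in tree[1:]:
            encode(child, code)
    return code
```

The pre-order integer sequence is one simple way to realize the paper's "sequence of integers" genotype; any reversible flattening would do.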

Each circuit is assigned a fitness value, which expresses how well the circuit adheres to the imposed design objectives. In our case, there are four kinds of objectives: 1) the outputs of the circuit should match its imposed output behaviour specification in tabular form (truth table), 2) the circuit should have a smaller chip area, 3) the circuit should have a shorter propagation delay, and 4) the circuit should consume less power. Each circuit in the population is run over a number of different fitness cases, so that its fitness is measured as a sum of correct output cases. The computation of the fitness value is detailed in the computational fitness model (Section 3.3).

The algorithm then selects fitter individuals from the population, recombines them through crossover or mutation, and produces the next generation of the circuit population. The circuits that better obey the constraints are more often used in the reproduction process through tournament selection, in which a few individuals are chosen at random from the population and the one with the best fitness score is used. This ensures that individuals with high fitness scores are more likely to be selected than those with low scores. Reproduction copies the selected individual from the current population to the new one. Crossover mimics sexual reproduction by recombining two selected circuits (parents) to create two new circuits (offspring) using single-point crossover: two branches of the parents are randomly selected and transposed. One of the parents is the winner of tournament selection, while the other is randomly selected from the population. The offspring can be subjected to mutation, which yields a new individual by changing a single randomly chosen logical function in the selected circuit. After the genetic operations are performed on the current population, the new population of offspring replaces the old population of parents.
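The selection and variation operators described above might be sketched like this (an illustrative Python rendering over the tuple-tree representation; all names and the path-based subtree addressing are our assumptions):

```python
import random

GATES = ["NAND", "NOR", "XOR", "XNOR"]   # two-input gates (NOT left unmutated to preserve arity)

def tournament(pop, fitness, k=3):
    """Tournament selection: pick k random individuals, keep the fittest
    (lower fitness is better, as with the cost functions in Section 3.3)."""
    return min(random.sample(pop, k), key=fitness)

def nodes(tree, path=()):
    """Enumerate (path, subtree) pairs for every node of a tuple-tree."""
    yield path, tree
    if tree[0] != "IN":
        for i, child in enumerate(tree[1:], start=1):
            yield from nodes(child, path + (i,))

def get_at(tree, path):
    for i in path:
        tree = tree[i]
    return tree

def replace_at(tree, path, subtree):
    if not path:
        return subtree
    i = path[0]
    return tree[:i] + (replace_at(tree[i], path[1:], subtree),) + tree[i + 1:]

def crossover(a, b):
    """Single-point subtree crossover: transpose two random branches."""
    pa, sa = random.choice(list(nodes(a)))
    pb, sb = random.choice(list(nodes(b)))
    return replace_at(a, pa, sb), replace_at(b, pb, sa)

def mutate(tree):
    """Change one randomly chosen logical function in the circuit."""
    gate_paths = [p for p, s in nodes(tree) if s[0] in GATES]
    if not gate_paths:
        return tree
    p = random.choice(gate_paths)
    sub = get_at(tree, p)
    new_gate = random.choice([g for g in GATES if g != sub[0]])
    return replace_at(tree, p, (new_gate,) + sub[1:])
```

Because the trees are immutable tuples, crossover and mutation return new individuals and never corrupt their parents.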
Each individual in the new population is then measured for fitness, and the best individual is designated as the result produced by the circuit population in this generation. This process is iterated until a circuit design that obeys all prescribed design objectives is encountered within the current population or a given number of generations is reached.

3.2 Definition of Solution Space

In applying a genetic algorithm to a problem, there are five major steps (according to [17]): 1) definition of the problem; 2) definition of the set of terminals and the set of primitive functions; 3) definition of the computational fitness model; 4) setting of the parameter values for controlling the evolution process; and 5) definition of the criterion for terminating the evolution process. The problem of circuit evolution is defined as follows: given a truth table of a Boolean function f and a set of target technology components (gates), evolve an optimized combinational logic circuit that performs the function f subject to a set of optimization criteria (area, delay, power).
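The problem statement can be illustrated by evaluating a candidate gate composition against a truth table column. This is a hedged Python sketch (the gate set, function names and error count mirror the paper's setting but are our own construction):

```python
from itertools import product

# Boolean semantics of the gate set over 0/1 values.
OPS = {
    "NAND": lambda a, b: 1 - (a & b),
    "NOR":  lambda a, b: 1 - (a | b),
    "XOR":  lambda a, b: a ^ b,
    "XNOR": lambda a, b: 1 - (a ^ b),
}

def evaluate(tree, inputs):
    """Evaluate a tuple-tree circuit on one input row of the truth table."""
    if tree[0] == "IN":
        return inputs[tree[1]]
    if tree[0] == "NOT":
        return 1 - evaluate(tree[1], inputs)
    return OPS[tree[0]](evaluate(tree[1], inputs), evaluate(tree[2], inputs))

def error_fitness(tree, truth_column, n):
    """Count mismatches of the circuit against one output column over all
    2^n input rows (the error fitness of Section 3.3)."""
    return sum(abs(evaluate(tree, row) - y)
               for row, y in zip(product((0, 1), repeat=n), truth_column))
```

A fully functional circuit is one with `error_fitness(...) == 0` for its output column; note that each output of a multi-output truth table is evolved as a separate circuit, as stated below.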

Each truth table is a 2^n × (n + m) matrix representing a set of input values X and a set of output values Y, where n is the number of inputs and m is the number of outputs of a circuit. If a truth table has more than one output, a separate circuit is evolved for each output. Each circuit to be evolved is a composition of logic gates from the gate set G and terminals from the terminal set T, which together compose the solution space. The terminals can be viewed as the inputs of the combinational circuit; their values are taken from the given truth table. The set of gates representing the logic functions used to generate circuits is defined based on the available technology library gates and the functional completeness of the gate set. A set of Boolean functions is functionally complete if all other Boolean functions can be constructed from it. Minimal functionally complete sets are {NAND}, {NOR}, {AND, NOT}, {OR, NOT}; other functionally complete sets can be constructed from these minimal sets. We use the functionally complete gate set {NAND, NOR, XOR, XNOR, NOT}. The selected logic function set and terminal set should have the closure property, i.e., each function in the function set should be able to accept, as its arguments, any value and data type that may possibly be returned by any function in the function set or assumed by any terminal in the terminal set.

3.3 Computational Fitness Model

The goal of the evolution process is to produce a fully functional circuit (i.e., one that produces the expected behaviour stated by its truth table) that minimizes the amount of resources (time, space, energy) used. Therefore, a fitness function that works in two stages is introduced. At the beginning of the evolution, only compliance with the truth table is taken into account, and the evolutionary approach basically explores the search space. The error fitness function f_E is defined as follows:

    f_E = Σ_{j=1..2^n} | f(x_1, ..., x_n) − y_j | ,    (1)

where x_i ∈ X are inputs, f is a logical function (composition of gates) describing a circuit, and y_j ∈ Y are the output values of the circuit. Once the first functional solution has been discovered by the algorithm, we switch to a new fitness function in which fully functional circuits that use fewer resources are rewarded. We have defined three main circuit design criteria: chip area, propagation delay, and power consumption, and we define one fitness function for each design criterion. The delay fitness function f_D is defined as the delay of a circuit (critical path delay), i.e., the longest time between the supply of input signals to the circuit and the receipt of output signals:

    f_D = d_i + max(d^l_{i+1}, d^r_{i+1}),    (2)

where d_i is the delay at the i-th level of the tree representing a circuit, and d^l_{i+1} and d^r_{i+1} are the delays at the left and right branches of the tree node. The area fitness function f_A is defined as the final circuit area occupied by the implemented system on a chip, i.e., the sum of the areas of all gates in the evolved circuit:

    f_A = Σ_{t_j ∈ T} a_j,    (3)

where a_j is the area of the gate at the j-th node of the tree T representing the evolved circuit.

Power consumption defines the amount of power consumed by a component during the execution of its functions. Power consumption in integrated circuits has three main components [18]: 1) power consumed during the switching of CMOS gates, when complementary parts are open simultaneously; 2) power consumed by leakage currents during the non-conducting state of gates (parameters influencing leakage are the supply voltage, the transistor threshold voltage, transistor dimensions such as width and length, temperature, IR drop effects, manufacturing tolerances and the state of the gate); and 3) dynamic power consumed during data-dependent switching of transistors and the connections between them. Dynamic power consumption accounts for the largest portion of the total power consumption of digital circuits [18]. It depends upon the switching activity and the size of the switched capacitances:

    P = (1/2) C V_dd^2 E_sw f_clk = P_max E_sw f_clk,    (4)

where P_max is the maximal energy dissipation of a gate per clock cycle, C is the physical capacitance at the gate output, V_dd is the supply voltage, E_sw is the switching probability at the output of a gate, and f_clk is the clock frequency. The switching probability is computed differently for each type of gate:

    E_sw^NOT  = 1 − E_sw(i), i ∈ I,
    E_sw^AND  = Π_{i∈I} E_sw(i),
    E_sw^NAND = 1 − Π_{i∈I} E_sw(i),
    E_sw^OR   = 1 − Π_{i∈I} (1 − E_sw(i)),                                   (5)
    E_sw^NOR  = Π_{i∈I} (1 − E_sw(i)),
    E_sw^XOR  = 1 − (1 − Π_{i∈I} E_sw(i)) (1 − Π_{i∈I} (1 − E_sw(i))),

where I is the set of inputs of a gate, and E_sw(i) is the switching probability at the i-th input of the gate. Considering this, the power fitness function f_P is calculated recursively as follows:

    f_P(g_i) = P_max(g_i) E_sw(g_i) + f_P(g^l_{i+1}) + f_P(g^r_{i+1}),    (6)

where g_i is the gate at the i-th level of the tree representing a circuit, and g^l_{i+1} and g^r_{i+1} are the gates at the left and right branches of the tree node corresponding to g_i.
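Equations (2)-(6) lend themselves to simple recursive computations over the circuit tree. The sketch below uses Table-1-style per-gate constants and equiprobable primary inputs; both are illustrative assumptions, and the XNOR switching probability is taken as the complement of the XOR case, which Eq. (5) does not list:

```python
# Illustrative per-gate constants (delay ns, area um2, max power uW/MHz),
# in the style of Table 1 of the case study.
LIB = {"NOT": (0.06, 36, 0.06), "NAND": (0.10, 55, 0.18),
       "NOR": (0.09, 55, 0.22), "XOR": (0.25, 127, 0.35),
       "XNOR": (0.25, 109, 0.24)}

def delay(tree):                          # Eq. (2): critical-path delay
    if tree[0] == "IN":
        return 0.0
    return LIB[tree[0]][0] + max(delay(c) for c in tree[1:])

def area(tree):                           # Eq. (3): sum of gate areas
    if tree[0] == "IN":
        return 0.0
    return LIB[tree[0]][1] + sum(area(c) for c in tree[1:])

def esw(tree, p_in=0.5):
    """Output switching probability per Eq. (5); primary inputs are
    assumed equiprobable (p_in = 0.5)."""
    if tree[0] == "IN":
        return p_in
    ps = [esw(c, p_in) for c in tree[1:]]
    prod = comp = 1.0
    for p in ps:
        prod *= p
        comp *= (1.0 - p)
    return {"NOT": 1 - ps[0], "NAND": 1 - prod, "NOR": comp,
            "XOR": 1 - (1 - prod) * (1 - comp),
            "XNOR": (1 - prod) * (1 - comp)}[tree[0]]   # XNOR: our assumption

def power_fitness(tree):                  # Eq. (6): recursive power fitness
    if tree[0] == "IN":
        return 0.0
    return (LIB[tree[0]][2] * esw(tree)
            + sum(power_fitness(c) for c in tree[1:]))
```

Recomputing `esw` inside `power_fitness` is quadratic in tree size; a production version would cache the per-node probabilities, but the sketch keeps the recursion in the shape of Eq. (6).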

The circuit fitness f_C is calculated depending upon the evolution criterion as follows:

    f_C = PEN · f_E + f_A, if target = area,
    f_C = PEN · f_E + f_D, if target = delay,    (7)
    f_C = PEN · f_E + f_P, if target = power,

where PEN is a penalty constant.

3.4 Metaprogram Model

The result of the described evolutionary circuit design framework is a metaprogram encapsulating the circuits derived with different design criteria in mind. We consider metaprograms as special cases of the Ontology-Based Generative Components introduced in [19]. The role of a metaprogram is as follows:
1. Metaprograms are members of reuse libraries to support large-scale reuse and design knowledge sharing.
2. Metaprograms are a representation of a family of generative components. A configuration of the family is specified by the design goal and design context.
3. A metaprogram encodes domain ontology (domain concepts and their relationships).
4. The model of a metaprogram has two parts: the meta-interface (for expressing communication with the environment and initialization of generative aspects) and the meta-body (for expressing functionality and implementing generative aspects).
5. At the core of the meta-interface model is the meta-parameter concept. An ontology-based meta-parameter has a name, an abstract value and semantics. The abstract value is a bridge connecting the knowledge represented in the meta-interface with the implementation knowledge hidden in the meta-body.
The role of the meta-interface is: 1) metadata for describing meta-parameter purpose, usage context and constraints; and 2) explicit variability management in a domain. Here we use metaprograms described in Open PROMOL [5], a domain-language-independent metalanguage based on heterogeneous metaprogramming [20] techniques. The evolutionary hardware design and metaprogram generation framework is summarized in Figure 1.

[Figure 1 contains two flowcharts: a) the circuit evolution loop (create initial population → evaluate fitness → check termination criterion → either return best individual or perform tournament selection, crossover and mutation and add to the next generation); b) metaprogram construction (per criterion, evolve circuits using the error, area, delay and power fitness functions → generate HDL program code → generate metaprogram).]

Figure 1. Evolutionary design framework: a) circuit evolution, and b) metaprogram construction


4 Case Study

For a case study we analyze the design criteria-directed evolution of simple digital circuits such as full adders, multipliers, the majority function, encoders and comparators. Circuit evaluation is based on their possible implementation using CMOS technology logic gates [21] (see Table 1).

Table 1. Library elements (logic gates)

Element | Delay, ns | Area, um2 | Power, uW/MHz
NOT     | 0.06      | 36        | 0.06
NAND    | 0.10      | 55        | 0.18
NOR     | 0.09      | 55        | 0.22
XOR     | 0.25      | 127       | 0.35
XNOR    | 0.25      | 109       | 0.24

The dynamics of the evolution process for the 5-majority circuit is shown in Figure 2, a) and b), which plot the relationship between a) average program size and b) average fitness and the number of generations, for the different evolution criteria defined by the error, area, delay and power fitness functions. The size of a program is calculated as the total number of nodes in the binary tree representing the circuit.

Figure 2. Relationship between a) average program size and b) average fitness vs. the number of generations for different evolution criteria

The results of the evolution are presented in Table 2. Note that only the results of the best evolved individuals with a specific evolution criterion as the optimization target are given.

Table 2. Circuit evolution results

Circuit              | Area, um2 (target: area) | Delay, ns (target: delay) | Power, uW/MHz (target: power)
1-bit full adder     | 474                      | 0.50                      | 0.602
2-bit full adder     | 2260                     | 1.01                      | 3.369
2-bit multiplier     | 684                      | 0.35                      | 1.284
3-majority           | 274                      | 0.34                      | 0.332
5-majority           | 914                      | 0.65                      | 1.681
4-2 encoder          | 510                      | 0.34                      | 0.682
4-2 priority encoder | 273                      | 0.26                      | 0.522
2-bit comparator     | 893                      | 0.54                      | 1.321

Finally, the tool generates metaprograms that encapsulate VHDL circuits evolved with different design aims in mind. The metaprograms are described in the Open PROMOL metalanguage [22]. The automatically generated metaprogram for the 1-bit full adder is presented in Figure 3. The metaprogram encapsulates three VHDL programs representing three circuits evolved with different design aims. The meta-interface of the metaprogram contains (between '$' symbols) context information on the design characteristics and allows an end-user the convenient selection of a specific VHDL program to generate. The specific VHDL component is assembled from VHDL code fragments using the @case function.

$ "Select design criterion:
   1 - area  (area=474 um2; delay=0.50 ns; power=0.688 uW/MHz)
   2 - delay (area=601 um2; delay=0.50 ns; power=0.919 uW/MHz)
   3 - power (area=546 um2; delay=0.59 ns; power=0.602 uW/MHz)"
  {1,2,3} sel:=1;
$
ENTITY system_@case[sel,{area},{power},{delay}] IS
  PORT(X0,X1,X2: in bit; Y0,Y1: out bit);
END system_@case[sel,{area},{power},{delay}];
ARCHITECTURE behave OF system_@case[sel,{area},{power},{delay}] IS
BEGIN
@case[sel,{ Y0

Using n > 1 increases the likelihood of detecting untargeted faults and defects. Generation of an n-detection test requires repeated applications of a test generation process to target faults that are not yet detected n times. Each time a fault is targeted, a different test pattern must be generated for it, which increases the complexity of test generation. A procedure for forming n-detection tests without applying a test generation procedure to target faults is described in [19]. The proposed procedure accepts a one-detection test, extracts test cubes for target faults from it, and then merges the cubes in different ways to obtain an n-detection test. Merging of cubes does not require test generation; fault simulation is required for extracting the test cubes for target faults [19].


N-detection may lead to large tests in which many test patterns do not help increase the defect coverage [20]. The problem of controlling the growth of test size is considered in [20-22]. The authors of [20] introduced variable n-detection tests, in which different target faults are targeted a different number of times: only selected faults are targeted n times, while the other faults are targeted between 1 and n-1 times. The motivation for introducing variable n-detection tests was to control the size of the test pattern set as n was increased. The number of times each fault is targeted is determined by a parameter that measures the usefulness of multiple test patterns for the fault in detecting defects; this parameter is based on the number of paths through the fault site [20].

The use of a variable number of fault detections while transforming the pin pair test into a functional delay test is suggested in [21]. The performed experiments show the effectiveness of this proposal: the restriction of the number of fault detections allowed the test size to be shortened almost twice [21]. In [22], three parameters of an n-detection test were defined to measure the saturation of the test generation process: 1) the fraction of faults detected n times or less by the test; 2) the fraction of faults detected fewer than n times by the test; and 3) the test set size relative to the size of a one-detection test. Based on these parameters and the rationale for computing n-detection tests, the authors defined saturation to occur at the value of n where one of the three parameters reaches a threshold specified for it; the thresholds were selected based on experimental results [22].

The main drawback of all the reviewed techniques is the growth of test size. Another disadvantage of almost all the mentioned approaches lies in the use of test generation for the enrichment of test pattern sets.

3 Procedure of enrichment of a functional test

In this section we give a detailed description of the proposed procedure. We consider the enrichment of test sets that are generated for the detection of functional delay faults. A test for a functional delay fault is a pair of input patterns that propagates a transition from a primary input to a primary output of a circuit [23]. Under the functional delay fault model proposed in [23], a fault is a tuple (I, O, tI, tO), where I is an input of the circuit under test (CUT), O is a CUT output, tI is a rising or falling transition at I, and tO is a rising or falling transition at O. Thus, four functional delay faults are related to every input/output (I/O) pair, and the total number of faults is 4*n*m, where n is the number of inputs of the CUT and m is the number of outputs of the CUT.

The pseudocode of the procedure for Enrichment of Functional Delay Test (procedure EoFDT) is shown in Figure 1. In the rest of the paper the following notations are used:
• T – the test pattern pair set;
• Tk – a test pattern pair (Tk ∈ T);
• t1k,i – the signal value (1 or 0) of the first pattern of the test pattern pair Tk on circuit input i;
• t2k,i – the signal value (1 or 0) of the second pattern of the test pattern pair Tk on circuit input i;
• F – the set of functional delay faults of a particular circuit;
• Fk – the set of functional delay faults that are detected by test pattern pair Tk;
• F*k – the set of functional delay faults that are detected by the modified test pattern pair T*k;
• n – the number of circuit inputs.

The procedure EoFDT modifies each test pattern pair Tk of the set T in such a way that the modified test pattern pair T*k detects all functional delay faults detectable by the initial pattern pair Tk and, possibly, some additional functional delay faults not detectable by Tk.
Therefore, the enriched test pattern set TE may detect some functional delay faults that are not detectable by the test pattern set T or, at least, increase the number of detections of some functional delay faults.

A complementary clarification is needed for lines 6-9 of procedure EoFDT. There are four possibilities for modifying a test pattern pair Tk. Namely, if the signal values of the two test patterns of Tk are not equal on circuit input i (t1k,i ≠ t2k,i), we can change the signal value either of the first test pattern or of the second. In the case of equal signal values (t1k,i = t2k,i) we have the same choice as well. Therefore, the four operation modes of procedure EoFDT are:
1. if t1k,i ≠ t2k,i then t2k,i = t1k,i, and if t1k,i = t2k,i then t1k,i = NOT(t1k,i); let us name this mode M_1_1;
2. if t1k,i ≠ t2k,i then t2k,i = t1k,i, and if t1k,i = t2k,i then t2k,i = NOT(t2k,i) (mode M_1_2);
3. if t1k,i ≠ t2k,i then t1k,i = t2k,i, and if t1k,i = t2k,i then t1k,i = NOT(t1k,i) (mode M_2_1);
4. if t1k,i ≠ t2k,i then t1k,i = t2k,i, and if t1k,i = t2k,i then t2k,i = NOT(t2k,i) (mode M_2_2).

The most prominent features of the proposed functional test enrichment procedure are: 1) procedure EoFDT does not expand the initial test pattern set T, i.e., there is no increase in test size; 2) procedure EoFDT does not require test generation. The described approach enriches the test patterns using functional delay fault simulation; thus, the computing time of the procedure EoFDT depends linearly on the test size.
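The four modes can be condensed into a single parameterized modification step. This is an illustrative Python sketch (the list-of-bits pattern representation and the function name are our assumptions):

```python
# One modification step on input position i of a test pattern pair (t1, t2),
# where t1 and t2 are lists of 0/1 signal values.
def modify(t1, t2, i, mode):
    """Return modified copies of the pair according to mode M_x_y:
    x chooses which pattern survives when the values differ,
    y chooses which pattern is complemented when the values are equal."""
    t1, t2 = t1[:], t2[:]
    keep_first, flip_first = {"M_1_1": (True, True), "M_1_2": (True, False),
                              "M_2_1": (False, True), "M_2_2": (False, False)}[mode]
    if t1[i] != t2[i]:
        # equalize the values, copying from the chosen pattern
        if keep_first:
            t2[i] = t1[i]
        else:
            t1[i] = t2[i]
    else:
        # introduce a transition by complementing one of the equal values
        if flip_first:
            t1[i] = 1 - t1[i]
        else:
            t2[i] = 1 - t2[i]
    return t1, t2
```

Returning fresh copies makes it easy to discard a modification when the fault-simulation check of lines 11-14 rejects it.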


procedure EoFDT
INPUT: circuit C, corresponding functional delay fault set F, test pattern pair set T, number of iterations It
OUTPUT: enriched test pattern pair set TE
1.  l=1
2.  repeat
3.    for each Tk ∈ T do
4.      determine Fk
5.      for i=1 to n do
6.        if (t1k,i ≠ t2k,i) then
7.          set the same signal value: t1k,i = t2k,i (t2k,i = t1k,i)
8.        else
9.          set signal value transition: t1k,i = NOT(t1k,i) (t2k,i = NOT(t2k,i))
10.       determine F*k
11.       if Fk ⊆ F*k then
12.         Fk = F*k
13.       else
14.         restore signal values t1k,i and t2k,i
15.     end
16.   end
17.   l = l+1
18. until l > It
end procedure

Figure 1. The pseudocode of procedure EoFDT
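Procedure EoFDT can be rendered in Python roughly as follows (mode M_1_1). The fault simulator is abstracted as a callable `detected(t1, t2)` returning the set of faults the pair detects; that interface, and the list-of-bits pattern representation, are assumptions for illustration:

```python
# A minimal Python rendering of procedure EoFDT (Figure 1).
def eofdt(test_pairs, detected, iterations):
    """Enrich test_pairs in place; a modification is kept only if the
    modified pair still detects every fault the original pair detected."""
    for _ in range(iterations):
        for t1, t2 in test_pairs:
            fk = detected(t1, t2)
            for i in range(len(t1)):
                old1, old2 = t1[i], t2[i]
                if t1[i] != t2[i]:
                    t2[i] = t1[i]            # lines 6-7: equalize the values
                else:
                    t1[i] = 1 - t1[i]        # lines 8-9: introduce a transition
                fk_new = detected(t1, t2)
                if fk <= fk_new:             # line 11: no detected fault lost
                    fk = fk_new              # keep the modification
                else:
                    t1[i], t2[i] = old1, old2   # line 14: restore
    return test_pairs
```

Because the acceptance test is a superset check, the enriched set never loses coverage, matching the claim that TE detects everything T does; only the number of detections can grow.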

4 Experimental results

In this section we present the results of applying the proposed test enrichment procedure to the ISCAS'85 benchmark circuits. The test pattern sets for functional delay faults were generated for the black box model of the circuits [24]. Recall that the black box model represents a system by defining the behaviour of its outputs according to the values applied to its inputs, without knowledge of its internal organization. Black box models written in the programming language C were used in the test generation for functional delay faults.

The results of test pattern set enrichment are reported in Table 1. The initial test pattern set T achieves 100% coverage of the targeted functional delay faults. In Table 1, after the circuit name we show the number of detectable functional delay faults (DFDF) and the average number of detections under T, computed as the sum of the detections of all faults divided by the number of detectable functional delay faults. Next, for each of the four test enrichment modes, we show the average number of detections under the enriched test pattern set TE, the improvement of the average number of detections, expressed in percent, and the number of iterations. The best results are shown in bold.

Table 1. Results of test pattern set enrichment

Circuit  DFDF    Av. T   Mode M_1_1           Mode M_1_2           Mode M_2_1           Mode M_2_2
                         Av. TE  Imp. %  It.  Av. TE  Imp. %  It.  Av. TE  Imp. %  It.  Av. TE  Imp. %  It.
c432     540     4.7     8.4     79.3    4    6.4     37.3    3    7.6     61.8    3    6.8     44.6    4
c499     5184    10.1    10.3    2.4     4    10.2    0.7     1    10.3    2.0     4    10.2    0.9     2
c880     1326    11.8    26.9    127.2   4    25.3    114.1   3    22.3    88.3    3    25.5    115.8   4
c1355    5184    9.5     9.6     1.3     4    9.5     0.4     1    9.6     1.2     4    9.6     0.6     2
c1908    3004    15.6    30.6    96.8    3    18.6    19.3    2    29.4    89.1    3    18.6    19.8    3
c2670    3320    11.6    29.4    153.1   3    24.7    113.0   3    26.5    128.1   3    24.6    111.6   3
c3540    2588    10.6    18.6    75.6    4    15.5    46.5    3    19.0    79.0    3    16.6    57.1    3
c5315    10540   26.3    55.2    109.7   4    49.2    86.6    4    52.8    100.4   4    51.2    94.5    3
c6288    3068    17.5    18.1    3.3     2    17.9    2.1     1    18.5    5.5     2    18.3    4.3     1
c7552    12188   45.5    65.7    44.2    3    57.5    26.2    4    59.9    31.7    3    56.8    24.9    4
Aver.    4694    16.3    27.3    69.3    3.5  23.5    44.6    2.5  25.6    58.7    3.2  23.8    47.4    2.9

The following points can be seen from Table 1. The procedure EoFDT was able to enrich the functional delay fault test pattern set of all circuits in all modes. The average number of fault detections was increased by 55% on average. The best result is displayed by Mode M_1_1 (69.3%), the worst improvement by Mode M_1_2 (44.6%). Considering separate circuits, the improvement of the average number of detections ranges from 0.4 percent (circuit c1355, Mode M_1_2) to 153.1 percent (circuit c2670, Mode M_1_1). The experiment shows that procedure EoFDT converges after at most four iterations. Four iterations are needed in test enrichment Mode M_1_1; for the other modes, three iterations are enough in almost all cases. Many authors state that n-detection tests are effective in detecting untargeted faults and defects [14]-[18]. To examine the influence of the improved average number of detections on the detection of untargeted faults, we simulated transition faults under the test pattern sets T and TE. Note that the tests for functional delay faults are generated at the functional level and then applied for the detection of structural-level faults. Table 2. Results of transition fault simulation

Circuit   T       TE M_1_1   TE M_1_2   TE M_2_1   TE M_2_2
c432      96.08   97.82      97.24      96.73      97.46
c499      93.00   91.22      93.47      88.98      93.18
c880      99.67   99.29      99.83      99.46      99.83
c1355     95.01   92.18      95.64      90.69      95.13
c1908     94.58   95.21      95.11      95.05      95.21
c2670     98.21   98.87      99.19      98.62      99.29
c3540     94.21   96.48      94.69      96.41      94.44
c5315     99.91   99.96      99.98      99.96      99.98
c6288     99.88   99.89      99.89      99.89      99.89
c7552     99.17   99.15      99.45      99.17      99.35
Average   96.97   97.01      97.45      96.50      97.38

The results of fault simulation are reported in Table 2. In Table 2, after the circuit name we show the transition fault coverage of the initial test pattern set T and of the enriched test pattern sets TEM_1_1, TEM_1_2, TEM_2_1 and TEM_2_2. The fault coverage is expressed in percent. The best transition fault coverages are shown in bold, and the worst results, which are equal to or below the coverages of the initial test sets, are in italic. From Table 2 it can be seen that the functional delay fault tests expose moderate quality with regard to detecting untargeted transition faults. The test enrichment accomplished using the proposed procedure EoFDT improved test quality in three of the four modes in terms of average fault coverage. The improvement of transition fault coverage ranges from 0.04% (Mode M_1_1) to 0.48% (Mode M_1_2). However, test enrichment Mode M_2_1 produces test sets whose transition fault coverage is on average 0.47% lower than that of the initial test sets, despite the fact that the average number of functional delay fault detections was increased by 58.7%. Therefore, we can conclude that an increase in the detections of targeted faults does not always bring an increase in the detection of untargeted faults and defects. The reason for this phenomenon lies in the changed conditions of signal propagation from circuit input to circuit output. However, the statement that n-detection tests are effective in detecting untargeted faults and defects is true when the initial test sets stay unchanged and additional test patterns are generated in order to increase the number of detections of targeted faults. In our case, the procedure EoFDT modifies the patterns of the initial test set, and there is no growth of test size.
If we examine the tests for separate circuits, we can see that the test enrichment improved the fault coverage of untargeted faults in 32 of 40 cases, while in 8 of 40 cases there was no improvement or even a reduction of the fault coverage of untargeted faults. These 8 cases belong to test set enrichment Modes M_1_1 and M_2_1. In contrast, application of procedure EoFDT in Modes M_1_2 and M_2_2 allowed us to improve the transition fault coverage for all circuits. Therefore, if the primary goal of increasing the number of functional delay fault detections is the detection of untargeted faults and defects, procedure EoFDT should be used in Modes M_1_2 and M_2_2.

5 Concluding remarks

We described an approach for functional test enrichment. The proposed postprocessing procedure modifies each test pattern of the test in such a way that the modified test pattern detects all functional delay faults detectable by the initial test pattern plus some additional functional delay faults. The test enrichment procedure does not increase the test size, and it is fast because it does not require test generation. The described approach enriches the test patterns using functional delay fault simulation. The performed experiments demonstrated that the proposed postprocessing procedure for functional delay test enrichment is an efficient and simple way to enhance the quality of an initial test pattern set.

References
[1] I. Pomeranz and S. M. Reddy. On Testing Delay Faults in Macro-based Combinational Circuits. Proceedings of Int. Conf. Computer-Aided Design, San Jose, CA, 1994, pp. 332-339.
[2] F. Ferrandi, F. Fummi, G. Pravadelli, D. Sciuto. Identification of Design Errors Through Functional Testing. IEEE Transactions on Reliability, Vol. 52, No. 4, December 2003, pp. 400-412.
[3] H. Kim and J.P. Hayes. Realization-Independent ATPG for Designs with Unimplemented Blocks. IEEE Trans. on CAD, Vol. 20, No. 2, pp. 290-306, 2001.
[4] F. Ferrandi, F. Fummi, D. Sciuto. Implicit Test Generation for Behavioral VHDL Models. Proceedings of International Test Conference, 18-23 October 1998, pp. 587-596.
[5] J. Zeng, M. Abadir, G. Vandling, L. Wang, A. Kolhatkar, J. Abraham. On Correlating Structural Tests for Speed Binning of High Performance Design. Proceedings of the International Test Conference ITC'04, pp. 31-37, 2004.
[6] T.N. Pham, F. Clugherty, G. Salem, J.M. Crafts, J. Tetzloff, P. Moczygemba, T.M. Skergan. Functional Test and Speed/Power Sorting of the IBM POWER6 and Z10 Processors. Proceedings of the International Test Conference ITC'08, pp. 1-7, 2008.
[7] J. Yi and J.P. Hayes. The Coupling Model for Function and Delay Faults. Journal of Electronic Testing: Theory and Applications, No. 21, pp. 631-649, 2005.
[8] K.M. Butler and M.R. Mercer. Assessing Fault Model and Test Quality. Kluwer Academic, 1992.
[9] N. Neophytou, M.K. Michael and S. Tragoudas. Functions for Quality Transition-Fault Tests and Their Applications in Test-Set Enhancement. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 25, No. 12, pp. 3026-3035, 2006.
[10] I. Pomeranz and S. M. Reddy. Tuple Detection for Path Delay Faults: A Method for Improving Test Set Quality. Proceedings of the 18th International Conference on VLSI Design held jointly with 4th International Conference on Embedded Systems Design, pp. 41-46, 2005.
[11] I. Pomeranz and S. M. Reddy. Pattern Sensitivity: A Property to Guide Test Generation for Combinational Circuits. Proceedings of 8th Asian Test Symposium, 1999, pp. 75-80.
[12] E. Bareiša, V. Jusas, K. Motiejūnas, Š. Packevičius, R. Šeinauskas. The Improvement of Test Independence from Circuit Realization. Information Technology and Control, Kaunas, Technologija, 2004, No. 4(33), pp. 45-52.
[13] E. Bareiša, V. Jusas, K. Motiejūnas, R. Šeinauskas. Functional Test Generation Based on Combined Random and Deterministic Search Methods. Informatica, 2007, Vol. 18, No. 1, pp. 3-26.
[14] B. Benware, C. Schuermyer, N. Tamarapalli, K.-H. Tsai, S. Ranganathan, R. Madge, J. Rajski, and P. Krishnamurthy. Impact of Multiple-Detect Test Patterns on Product Quality. Proceedings of International Test Conference, Sep. 2003, pp. 1031-1040.
[15] S. Venkataraman, S. Sivaraj, E. Amyeen, S. Lee, A. Ojha, and R. Guo. An Experimental Study of n-Detect Scan ATPG Patterns on a Processor. Proceedings of VLSI Test Symposium, April 2004, pp. 23-28.
[16] I. Pomeranz and S. M. Reddy. Worst-Case and Average-Case Analysis of n-Detection Test Sets. Proceedings of Design Automation Test European Conference, Mar. 2005, pp. 444-449.
[17] S. Neophytou, M.K. Michael. On the Relaxation of n-detect Test Sets. Proceedings of the 26th IEEE VLSI Test Symposium, pp. 187-192, 2008.
[18] J. Geuzebroek, E. J. Marinissen, A. Majhi, A. Glowatz and F. Hapke. Embedded Multi-Detect ATPG and Its Effect on the Detection of Unmodeled Defects. Proceedings of the IEEE International Test Conference ITC 2007, pp. 1-10, 2007.
[19] I. Pomeranz and S. M. Reddy. Forming n-Detection Test Sets Without Test Generation. ACM Transactions on Design Automation of Electronic Systems, Vol. 12, No. 2, 2007.
[20] I. Pomeranz and S. M. Reddy. On n-Detection Test Sets and Variable n-Detection Test Sets for Transition Faults. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 19, No. 3, pp. 372-383, 2000.
[21] E. Bareiša, V. Jusas, K. Motiejūnas, R. Šeinauskas. Properties of Variable n-Detection Functional Delay Fault Tests. Information Technology and Control, Kaunas, Technologija, 2008, Vol. 37, No. 2, pp. 95-100.
[22] I. Pomeranz and S. M. Reddy. On the Saturation of n-Detection Test Generation by Different Definitions with Increased n. IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, Vol. 27, No. 5, pp. 946-957, 2008.
[23] B. Underwood, W.O. Law, S. Kang, H. Konuk. Fastpath: A Path-Delay Test Generator for Standard Scan Designs. Proceedings of 1994 International Test Conference, pp. 154-163, 1994.
[24] E. Bareiša, V. Jusas, K. Motiejūnas, R. Šeinauskas. The Realization-Independent Testing Based on the Black Box Models. Informatica, Vilnius, Institute of Mathematics and Informatics, Vol. 16, No. 1, pp. 19-36, 2005.


ALTERNATIVE LMS ALGORITHM FOR EMBEDDED VOIP UA
Egidijus Kazanavicius, Mindaugas Vidmantas, Juozas Dovydaitis
Kaunas University of Technology, Computer Engineering Department, Studentų g. 50, Kaunas, Lithuania, [email protected], [email protected], [email protected]
Abstract. This paper presents an echo canceller based on an alternative implementation of the LMS adaptive filter algorithm. We focus on battery-powered embedded devices and the need to reduce their CPU usage. The proposed low-complexity echo canceller can partially help to solve the CPU usage problem. Experimental results are based on the proposed alternative LMS filter integrated with a SIP-based Voice over IP (VoIP) User Agent (UA) running on an ARM-architecture embedded device. Keywords: LMS, VoIP, Embedded System.

1 Introduction

Speech is a stochastic signal that is highly nonstationary, but over short enough time intervals, say 30 milliseconds, speech appears stationary because we cannot move our mouth and tongue that fast [7]. The goal of the designers of selective-tap algorithms is to find ways to reduce the number of coefficients updated per iteration in a manner that degrades algorithm performance as little as possible. However, new pressures on product design have emerged: the increase in user mobility imposes a requirement of low power consumption on portable battery-powered equipment [2]. A SIP-based VoIP (Voice over IP) user agent (UA), also known as a hands-free telephone or full-duplex intercom system, has a feedback or echo problem because the output from the loudspeaker feeds into the microphone. Several methods can be used to reduce or eliminate the problem: 1) Reduce the overall amplification, which often leads to poor volume. 2) Use acoustic echo suppression, which reduces the full-duplex telephone to half-duplex. This switching technique can even "switch away" the beginnings of words. 3) Use acoustic echo cancellation, which is realized with an adaptive (learning) filter. First the filter learns the acoustics from given microphone and speaker signals; then the filter can calculate an estimated microphone signal from the loudspeaker signal.

Figure 1. VoIP UA Acoustic Environment

Figure 1 shows how the echo problem arises when the output from the loudspeaker feeds into the microphone. The least-mean-square (LMS) algorithm is a member of the family of stochastic gradient algorithms and, because of its robustness and low computational complexity, it has been used in a wide spectrum of applications [3].


Figure 2. VoIP UA application of adaptive filtering to system identification

Figure 2 shows the general scheme of the VoIP UA echo cancelling subsystem for unknown time-variant system identification. The input signal, {x(n)}, to the unknown system is the same as the one entering the adaptive filter; n is the discrete-time index and k is the tap delay. The output of the unknown system is the desired signal, {d(n)}. The output of an adaptive FIR filter with N coefficients wk is given by

y(n) = Σ_{k=0}^{N−1} wk x(n − k).   (1)

For these two systems, d(n) and y(n), to be equal, the difference

e(n) = d(n) − y(n)   (2)

must be equal to zero.
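Equations (1) and (2) translate directly into code. A minimal Python sketch (pure lists, no signal-processing library assumed; function names are ours):

```python
def fir_output(w, x, n):
    """y(n) = sum_{k=0}^{N-1} w_k * x(n-k), equation (1).
    w -- list of N tap weights; x -- list of input samples.
    Samples with negative index are taken as zero."""
    return sum(w[k] * x[n - k] for k in range(len(w)) if n - k >= 0)

def error(d, y):
    """e(n) = d(n) - y(n), equation (2): the adaptation drives this to zero."""
    return d - y
```

With w = [1, 2] and a constant input x = [1, 1, 1], the filter output at n = 1 is 1·1 + 2·1 = 3.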

2 Alternative LMS Algorithm Description

In previous research works [4], [5], [6], a few techniques to improve LMS performance were discussed separately from each other. In this paper, the integration of these techniques into a single algorithm is presented. The goal of the proposed LMS model is to increase the rate of convergence and decrease the computational complexity. In the conventional adaptive LMS algorithm [1], the characteristics of a transversal FIR filter can be expressed as a vector of values known as tap weights or weight vector coefficients w(n), which determine the performance of the filter. The tap weights are expressed in column vector form and are updated according to the LMS algorithm by the equation

w(n + 1) = w(n) + 2μe(n)x(n),   (3)

where w(n) is the coefficient vector at time n, μ is the convergence or step-size parameter (a small positive constant), and e(n) and x(n) are the error signal and the input vector signal, respectively. The proposed alternative LMS algorithm consists of a few improvements put together into the LMS coefficient update equation:
• robust variable-step-size μ calculation;
• partial updates of the filter coefficients;
• replacement of the multiply operation by an XOR operation.
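For reference, the conventional full-update LMS recursion of equation (3) looks as follows in Python (a sketch; variable names follow the text):

```python
def lms_update(w, mu, e, x):
    """w(n+1) = w(n) + 2*mu*e(n)*x(n), equation (3).
    w  -- current tap-weight vector,
    x  -- current input vector (most recent N samples),
    e  -- scalar error e(n), mu -- fixed step size."""
    return [wi + 2 * mu * e * xi for wi, xi in zip(w, x)]
```

Every coefficient is touched on every iteration; the three improvements listed above each reduce the cost or improve the convergence of exactly this step.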

2.1 Robust variable-step-size LMS algorithm
The use of a variable step size μ(n) serves two objectives: first, fast convergence, since the energy p(n) of the output signal is used to control the adaptation process, with a large p(n) when the solution is far from the optimum; second, rejection of the effect of an uncorrelated noise sequence on the step-size update to ensure low misadjustment, with p(n) decreasing as the optimum is approached, even in the presence of noise [4]. The robust variable step size for the LMS algorithm is described by the equation

μ(n + 1) = αμ(n) + γp²(n)g²(n),   (4)

where μ(n) is the variable step-size parameter, which varies over time, p(n) is the energy of the output signal, g(n) is the time-varying estimate of the autocorrelation of the echo signal, and α, γ are positive constants in the range between 0 and 1. The detailed explanation and derivation of p(n) and g(n) can be found in [4].

2.2 Employing partial updates of filter coefficients
The basic idea behind selective partial updating is to update only a small number of the adaptive filter coefficients at each iteration, as identified by a selection criterion. In this way, the overall complexity of the adaptive system is less than that of the least-mean-square (LMS) adaptive filter [6].

wi,k+1 = wi,k + 2μ ek xk−i+1,  if (k − i + 1) mod N = 0,
         wi,k,                 otherwise.                  (5)
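Equation (5) updates only the coefficients with (k − i + 1) mod N = 0, i.e. roughly 1/N of the taps per iteration. A hedged Python sketch (index conventions adjusted for 0-based lists; the function name is ours):

```python
def partial_lms_update(w, mu, e_k, x, k, N):
    """Selective partial update, equation (5): at iteration k, tap i is
    updated only when (k - i + 1) mod N == 0, so about 1/N of the taps
    change per iteration.
    x -- list of input samples, assumed long enough to index x[k - i + 1]."""
    return [wi + 2 * mu * e_k * x[k - i + 1] if (k - i + 1) % N == 0 else wi
            for i, wi in enumerate(w)]
```

With N = 2 the even and odd halves of the weight vector alternate between iterations, halving the per-iteration multiply count.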

The reduced-complexity selective-partial-update LMS algorithm not only performs almost as well as the full-update LMS algorithm in general, but in some cases it is also capable of outperforming the full-update algorithm.

2.3 eXclusive-OR usage in coefficient update
Computing the coefficient update formula with an eXclusive-OR (XOR) operation on both the error and the input to the respective tap enhances calculation performance. The multiplications required in the alternative LMS algorithm reduce to a single XOR operation, which results in significant power and area savings. This constituent part of the proposed alternative LMS algorithm, together with the other parts, is implemented in the VoIP UA application only.

2.4 Implementation of the Alternative LMS Algorithm
Each iteration of the LMS algorithm requires three distinct steps, in this order:
1. The output of the FIR filter, y(n), is calculated using the equation

y(n) = Σ_{k=0}^{N−1} wk x(n − k) = wᵀ(n)x(n).   (6)

2. The value of the error estimate is calculated using the equation

e(n) = d(n) − y(n).   (7)

3. The tap weights of the FIR vector are updated in preparation for the next iteration by the equation

wi,k+1 = wi,k + 2μk ek xk−i+1,  if (k − i + 1) mod N = 0,
         wi,k,                  otherwise,                 (8)

where

μk+1 = αμk + γp²k g²k.
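The three steps can be combined into one iteration. The following Python sketch is ours: p and g (the output-energy and echo-autocorrelation estimates of [4]) are passed in precomputed, the XOR replacement of the multiply is omitted for clarity, and x_vec[i] stands for the sample x(n − i) aligned with tap i:

```python
def alt_lms_iteration(w, x_vec, d, mu, p, g, alpha, gamma, k, N):
    """One iteration of the alternative LMS algorithm, equations (6)-(8),
    with the variable step-size recursion of equation (4).
    Returns (updated weights, error, next step size)."""
    # Step 1: filter output, equation (6)
    y = sum(wi * xi for wi, xi in zip(w, x_vec))
    # Step 2: error estimate, equation (7)
    e = d - y
    # Step 3: partial coefficient update with variable step size, equation (8)
    w_next = [wi + 2 * mu * e * x_vec[i] if (k - i + 1) % N == 0 else wi
              for i, wi in enumerate(w)]
    # Variable step-size recursion, equation (4)
    mu_next = alpha * mu + gamma * p**2 * g**2
    return w_next, e, mu_next
```

Setting N = 1 recovers a full-update LMS with variable step size; larger N trades convergence speed for fewer multiplies per iteration.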

In the primary LMS, the step size is fixed at each iteration, but we use a variable step size. The proper value for the step size is calculated by formula (4). Table 1. Complexity of the LMS and the proposed alternative LMS.

Algorithm                  Multiplies                Adds                           Data Memory
LMS                        2L + 1 + [Lh]             2L + [Lh − 1]                  2L + [Lh]
Proposed alternative LMS   (1 + 1/N)L + 4 + ⌈Lh/N⌉   (1 + 1/N)L + ⌈(Lh − 1)/N⌉ + 1  2L + ⌈Lh/N⌉

Table 1 shows the complexity of the LMS and the proposed alternative LMS, where Lh is the length of the modeling filter and L is the FIR echo canceller coefficient count.

2.5 MATLAB Simulation
The selection of a suitable value for μ is imperative to the performance of the LMS algorithm: if the value is too small, the time the adaptive filter takes to converge on the optimal solution will be too long; if the step size μ is too large, the adaptive filter becomes unstable and its output diverges.

In the MATLAB simulation, we applied the standard LMS and the proposed alternative LMS to an echoed signal (mono, 8 kHz, 12.5 s length) over 10000 iterations. For the standard LMS, the step size μ = 0.05 was chosen; for the alternative LMS, it was calculated using formula (4).

Figure 3. Learning curve of the adaptation

Figure 3 shows the adaptation curves of the two algorithms for the same echoed signal. It is easy to notice that the proposed alternative LMS has a smaller mean-square error, which means it outperforms the standard LMS.

3 Conclusions and future works

Our analysis of the algorithms indicates that the proposed alternative LMS theoretically (in MATLAB simulation) has the potential to outperform the standard LMS. However, its convergence speed is reduced approximately in proportion to the number of coefficients updated per iteration divided by the filter length. The proposed algorithm could be useful for real-time applications of adaptive filters. Experimentally, an open-source SIP-based VoIP UA was tested with a frame packet size of 10 ms. We have to admit that the real-world application performance results were 10-20% worse than the MATLAB simulation results, possibly because of additional noise in the large test room. Contrary to the expectations raised by the MATLAB simulation, no significant performance enhancement was observed in the real-world VoIP UA application. Future work will focus on other algorithms with the same optimizations, which could result in significant power and area savings. We hope to optimize the source code for better performance on embedded devices; the source code implementation of the VoIP UA echo cancelling subsystem needs to be investigated in more detail.

References
[1] Poularikas A.D., Ramadan Z.M. Adaptive Filtering Primer with MATLAB. CRC Press, 1st Edition, 2006.
[2] Naylor P.A., Khong A.W.H. Selective-Tap Adaptive Algorithms for Echo Cancellation. Springer Berlin Heidelberg, 2006.
[3] Proakis J., Manolakis D. Digital Signal Processing. Prentice Hall, 4th Edition, April 2006.
[4] Tingchan W., Chutchavong V., Benjangkaprasert C. Performance of a Robust Variable Step-Size LMS Adaptive Algorithm for Multiple Echo Cancellation in Telephone Network. SICE-ICASE International Joint Conference 2006, Bexco, Busan, Korea, 18-21 October 2006.
[5] Cetin E., Kale I., Morling R.C.S. On Various Low-Hardware-Complexity LMS Algorithms for Adaptive I/Q Correction in Quadrature Receivers. International Symposium on Circuits and Systems (ISCAS 2004), Vancouver, Canada, 23-26 May 2004.
[6] Douglas S.C. Adaptive Filters Employing Partial Updates. IEEE Trans. on Circuits and Systems II, Vol. 44, No. 3, pp. 209-216, March 1997.
[7] Stein J.Y. Digital Signal Processing: A Computer Science Perspective. Wiley-Interscience, 1st Edition, October 2000.

FUNCTIONAL DELAY TEST GENERATION APPROACH BASED ON EXTRACTING INFORMATION FROM THE SOFTWARE PROTOTYPE
Vacius Jusas¹, Mantas Smilingis², Rimantas Seinauskas²
¹ Kaunas University of Technology, Software Engineering Department, Studentų 50-404, LT-51368 Kaunas, Lithuania, [email protected]
² Kaunas University of Technology, Information Technology Development Institute, Studentų 48A, Kaunas, Lithuania, [email protected], [email protected]
Abstract. The suggested approach for test generation is based on extracting information about device behavior from the software prototype. This approach presents a new attitude towards the problem of test generation, and it enables generating high-quality tests in the initial design stages, when just the software prototype of the device under design is available. The constructed functional test detects nearly all transition faults at the gate level of the examined benchmarks. The tests for the device can be generated in parallel with the design activities, and deterministic test generation methods can be used before the logical synthesis of the device is completed. When the synthesis of the device is completed, the functional delay test can be minimized, adjusting it to the particular structural implementation. The best test construction approach was selected among the considered ones, and further activities for test quality improvement were considered. Keywords: test generation, functional delay, software prototype.

1 Introduction

Generally, test generation is performed after the logical synthesis of the device is completed. In this case, test generation affects the overall duration of the design cycle of the device. Therefore, efforts are made to compile tests in the initial stages of device design, reducing the costs of test preparation after logical synthesis is completed. A software prototype of a device is composed in the initial design stages; this prototype is used for verifying and analyzing device behavior. Such a software prototype enables estimating the output responses for given input stimuli using simulation. We will discuss the use of the device's software prototype for test generation. The device under design has primary inputs and primary outputs, while the software prototype operates with variables consisting of several bits. Therefore, the bits of input variables are linked to the primary inputs of the device, and the bits of output variables are linked to the primary outputs. We will assume that the association between the bits of input/output variables and the primary inputs/outputs of the device is known. The development of methods of test generation at the behavioral level is closely related to the following two activities: 1) the development of new functional fault models and methods of test generation based on the software prototype; 2) the extraction of information from the software prototype for deterministic test generation using simulation. The second activity has not been clearly defined until now, and it is more promising, as one can use results gathered while developing deterministic test generation methods over many years. It has to be mentioned that the information obtained from the software prototype using simulation can be incomplete; however, it can be used as the basis for test generation using deterministic methods.
The investigated problem allows estimating the device behavior on the basis of the software prototype. The device behaviour can be partly described by input stimuli with labeled values of the active inputs, which are named activity vectors. An input is active if the change of its value into the opposite one changes the output value. An activity vector is considered essential if the values of its active inputs differ from the values of the active inputs of the other vectors. The essential activity vectors, which are obtained using random search methods and simulation, may describe the device behavior incompletely; however, deterministic tests can be generated on their basis. The deterministic methods of test generation can be modified in such a way that the test generation may use incomplete information about the device behavior. The functional test is generated for all possible implementations of the device; therefore, its length is much larger in comparison with the length of a test for a particular implementation. When the synthesis of the device is completed, the functional test redundancy can be eliminated by removing tests that do not detect new faults of the particular implementation. Therefore, the length of the functional test is not a critical parameter. However, it is purposeful to have several functional tests of increasing completeness and increasing length at the same time. A more complete test uses more input stimuli that define device functioning.

The least complete test is analyzed first. If it does not detect all the faults of the particular implementation, a more complete test is analyzed next. During the analysis of device faults, the output subcircuits are estimated with regard to the undetected faults, and the more complete test is analyzed only for such output subcircuits. Tests with different levels of completeness can be generated easily on the basis of the activity vectors.
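The notion of an activity vector can be made concrete in code. The following Python sketch probes a black-box prototype; the interface `prototype(inputs) -> outputs` and the bit-list encoding are our assumptions, not from the paper:

```python
def activity_vector(prototype, stimulus):
    """Mark the active inputs of a stimulus: input i is active if
    inverting its value changes at least one output of the black-box
    prototype. Returns a list of 0/1 activity flags, one per input."""
    baseline = prototype(stimulus)
    active = []
    for i in range(len(stimulus)):
        flipped = stimulus[:]
        flipped[i] = 1 - flipped[i]        # invert input i only
        active.append(1 if prototype(flipped) != baseline else 0)
    return active
```

For a two-input AND prototype, the stimulus (1, 0) has only the second input active (flipping the first cannot change the 0 output), while (1, 1) has both inputs active; the two stimuli thus yield distinct essential activity vectors.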

2 Related work

Models of physical faults are needed at higher levels of abstraction in order to be able to develop test patterns from a functional or behavioral description. Researchers have found that the stuck-at fault model works quite well at the logic level. Many efforts have been devoted to the problem of finding a behavioral-level fault model, but no universally accepted fault model has been discovered at the behavioral or higher level. Behavior-level fault models can be broadly classified into two main categories: 1) fault models related to the description code [4]-[8], [12]; 2) black-box fault models related to input stimuli and output responses [1], [9], [15], [16]. Testing at a higher level of abstraction has a lot in common with software testing. Therefore, the pattern generation methods based on fault models related to the description code can be further classified into code-oriented methods and fault-oriented methods. The code-oriented methods exploit the most widely used metrics developed for automated software testing: statement coverage [7], branch coverage [6] and path coverage [12]. Although there are similarities, there are also important differences due to the different sources of errors/faults and models in the two cases. The purpose of software validation is to detect design errors, whereas the purpose of testing is to detect physical defects and fabrication faults. The fault-oriented methods use the single bit stuck-at fault model [8], which was first introduced in [5], and the variable bit stuck-at fault model [4]. The variable stuck-at fault model means that the variable is stuck at a particular value. Multiple bit stuck-at faults, where all bits have a stuck-at fault, are equivalent to variable stuck-at faults. Together with bit stuck-at fault models, a condition stuck-at fault, which means that a condition is either stuck-at true or stuck-at false, is used [8].
These models have been derived from the logic-level stuck-at fault model, but they do not give adequate coverage of physical faults. Faults inside elements that implement operators cannot be modeled in this way. To resolve this problem, the fault-oriented methods [4] use the operator mutation fault model. This fault model implies that the operator will make a miscalculation for a subset of operand values. It is obvious that for an operator with a large number of inputs, it is practically impossible to enumerate all possible operator mutations and then generate test patterns to test them. Black-box fault models are more universal, as they do not depend on the description code; however, such black-box fault models are still of little use. Several black-box fault models have been suggested that do not examine the description code and are based on the input stimuli and the output responses [1], [9], [15], [16]. The most universal is the single coupling fault model proposed in [15] and extended in [16], which is defined in terms of a single input/output pair and considers the influence of an input value change on the output value change. The definition of the coupling fault is realization-independent [16]. The set of all test vectors for a coupling fault is called the coupling test set. The average size of the coupling test set approaches 2n − 1, where n denotes the number of inputs of the module [16]. Only the elementary n-input (gate) functions, i.e. AND, OR, NAND, and NOR, require n + 1 coupling tests. Therefore, the coupling test sets are very large even for small modules. Yi and Hayes [17] extend high-level delay fault models to large modular logic circuits by applying a hierarchical approach to delay test generation for modular circuits. The proposed new fault model, which is based on the coupling delay fault model [16], imposes the requirements for robust delay testing on the module implementation and on the input pattern pair. The proposed fault model has several drawbacks.
Each circuit is manually partitioned into multiple modules such that every module output has at most 12 inputs, so that the coupling delay test set for each coupling delay fault is kept reasonably small. Although the coupling delay test set for a function z detects all robust path delay faults in any gate-level realization of z [16], the module path delay test set (MPDTS) [17] for a modular realization of z may not detect all such faults. Since complete coverage of robust path delay faults is not guaranteed by MPDTSs, no strong conclusions can be drawn from the proposed model. The functional fault models [1] named pin pair (PP) and pin triplet (PT) enable developing a functional test on the basis of the software prototype in the early stages of the design process, while a synthesized description of the device is not yet available. Very promising results have been achieved in functional test generation for detecting stuck-at faults when the generation is based on the PP fault model [1]. The average percentage of undetected faults did not exceed 0.5% for the ISCAS'85 benchmark circuits. The test sets for PP faults are only 6 times larger on average than the test sets generated at the gate level by TetraMAX. The functional delay fault is denoted as follows [11], [14]: (I, O, tI, tO), where tI is a rising (r) or falling (f) transition on input I and tO is a rising (r) or falling (f) transition on output O. If we compare the functional delay and PP fault models, we see that both models have almost the same meaning, with one distinction: the

functional delay model is intended for detection of malfunctions in the dynamic behavior of module and the PP fault model – for detection of malfunctions in the static state of module. Based on this observation, we can define how to extend the PP fault test to the functional delay fault test. Every input pattern that detects PP faults is transformed only into k input pattern pairs in such a way that the single signal value transition occurs on every input that is associated with PP fault detection on the considered test pattern [2]. There is another way described in [3] to obtain functional delay fault tests from PP tests. By applying the approach from [3] every input pattern that detects PP faults is transformed only into one input pattern pair in such a way that the signal value transition occurs on every input that is associated with PP fault detection on the considered test pattern. Consequently, if the test for PP faults consists of p input patterns the constructed functional delay fault test has p input pattern pairs, as well. Thus, the obtained test is much shorter than by applying single transition test. The experiments on ISCAS’85 benchmark circuits demonstrated the test shortening of 3.8 times on average [3]. However, the test pattern pairs constructed by applying the approach from [3] possess the change of signal value on more than one input. Therefore, these pattern pairs are multi-input transition (MIT) tests [10] and some of functional delay faults that are functional robustly detectable on single-input transition (SIT) test may become functional nonrobustly detectable [10] or even worse not detectable on considered test pattern pair, because some activation conditions needed for signal transition propagation from particular input to particular output may be corrupted. It is desirable that the generated MIT test would guarantee the function-robust propagation (FRP) property. 
According to the definition from [10], a transition t_I on input I is function-robustly propagated as a transition t_O to output O when the signal value on O changes if and only if the signal value on I changes. Our experience with functional delay fault testing revealed that not all delay faults of a circuit can be detected by SIT tests; some delay faults require MIT tests. An MIT test launches several signal transitions on the inputs of the circuit at the same time. Some of these signal transitions can overlap or partly block each other; therefore, some delay fault effects are propagated as signal glitches, which can be observed and measured by the test equipment. Such propagation of the fault effect is called weak non-robust propagation [13]. A conventional transition fault test uses weak non-robust propagation [13]; therefore, it is meaningful to devise rules that allow generating a functional delay test which enables weak non-robust propagation of the fault effect. When the fault effect propagates as a signal glitch, there is no signal transition on the output if we observe it in the static mode. The combination of several functional delay tests obtained in different ways yields a delay test that detects more than 99% of the transition faults [3]. In this paper, we show that the software prototype used for test generation can be replaced by activity vectors obtained from the software prototype; the activity vectors retain the information on the functioning of the device. We introduce the activity vectors in the next section. We then present a test generation approach using the activity vectors, report the results of the experiment in Section 4, and finish with conclusions in Section 5.

3 Activity vectors

Let’s say, the software prototype model has n inputs and m outputs. We denote the input stimulus by P=, where pi= {0, 1}, i=1, 2, …, n. The activity vector Pj= is associated with output j. A component of the activity vector can take on one of the following values: 0, 1, N, V. The value V shows that the complement of the value 1 on the input i changes the value to the opposite on the output j. The value N shows that the complement of the value 0 on the input i changes the value to the opposite on the output j. The activity vectors P1j set the value 1 on the output j, meanwhile the activity vectors P0j set the value 0 on the output j. The values V and N are the active values. The activity vector summarizes n + 1 input stimuli that differ only by single value. Let’s say, we assign the following input stimulus = for the benchmark circuit C17 (Figure 1). This input stimulus sets the value 1 on the output y1. We complement every value of this stimulus one by one and we derive the following activity vector: . The activity vector summarizes the following input stimuli: , , , , , . These input stimuli set the value 1 on the output, except the third one and the fourth one. x1 x3

e10

e22

y1 x4 x2

e1 e16

y2 x5

e19 Figure 1. Benchmark circuit C17 - 151 -

e23

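As an illustration, the bit-flip derivation of an activity vector can be sketched in a few lines of Python. This is our sketch, not code from the paper; the C17 gate structure is assumed from the standard ISCAS'85 netlist, and the helper names are ours.

```python
# Hypothetical sketch: deriving an activity vector by complementing each
# input of a stimulus in turn (C17 netlist assumed from ISCAS'85).

def c17(x):
    """The C17 benchmark: 5 inputs, 2 outputs, six NAND gates."""
    x1, x2, x3, x4, x5 = x
    g2 = 1 - (x3 & x4)                   # NAND(x3, x4)
    g3 = 1 - (x2 & g2)                   # NAND(x2, G2)
    y1 = 1 - ((1 - (x1 & x3)) & g3)      # NAND(NAND(x1, x3), G3)
    y2 = 1 - (g3 & (1 - (g2 & x5)))      # NAND(G3, NAND(G2, x5))
    return y1, y2

def activity_vector(stimulus, out):
    """Complement each input value in turn; an input is active (V if its
    value is 1, N if it is 0) when the flip also flips output `out`."""
    base = c17(stimulus)[out]
    vec = []
    for i, v in enumerate(stimulus):
        flipped = list(stimulus)
        flipped[i] ^= 1
        if c17(tuple(flipped))[out] != base:
            vec.append('V' if v == 1 else 'N')
        else:
            vec.append(str(v))
    return ''.join(vec), base

# The stimulus <01110> of Table 2 yields P0y1 = <N1VV0>.
print(activity_vector((0, 1, 1, 1, 0), 0))   # prints ('N1VV0', 0)
```

The inactive positions of the resulting vector simply repeat the stimulus values, so each activity vector uniquely encodes the stimulus it was derived from.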
For every input stimulus P, m activity vectors P1j or P0j can be derived, where m denotes the number of outputs. An activity vector Pa can cover an activity vector Pb, which we denote Pa > Pb: the vector Pb has active values only on the same inputs as the vector Pa, and the active values of Pb are equal to the active values of Pa on those inputs. If the active values of the vectors Pa and Pb are the same, the activity vectors Pa and Pb are equal. The prerequisites for covering the vector Pb by the vector Pa are presented in Table 1.

Table 1. The prerequisites of covering

Pa:  V  N  V  V  N  N
Pb:  V  N  1  0  0  1

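The covering check can be transcribed directly. The sketch below follows our reading of Table 1 (equal vectors do not cover each other, and inactive positions of the covered vector are unconstrained); the function name is ours.

```python
ACTIVE = {'V', 'N'}

def covers(pa, pb):
    """True if activity vector pa strictly covers pb: pb is active only
    where pa is active with the same symbol, and pa is active on at least
    one input where pb is not (the last four columns of Table 1)."""
    if len(pa) != len(pb):
        return False
    # pb may be active only where pa carries the same active value
    if any(b in ACTIVE and a != b for a, b in zip(pa, pb)):
        return False
    # pa must be active somewhere pb is not (strictness)
    return any(a in ACTIVE and b not in ACTIVE for a, b in zip(pa, pb))
```

Under this reading, the vector <N1VV0> of the C17 example covers <N0110>, which is why the latter is not essential.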
The activity vector Pa covers the activity vector Pb if at least one of the conditions in the last four columns of Table 1 is satisfied. The vectors that are not covered by other vectors are essential. After analysis of the input stimuli, the sets of essential vectors A1j and A0j are formed for every output j. The vectors of the set A1j set the value 1 on the output j, while the vectors of the set A0j set the value 0 on the output j. Let's consider the benchmark C17 presented in Figure 1. The input stimulus P = <01110> sets the output values <y1, y2> = <00>. The results of complementing every input value one by one are presented in Table 2. We obtain the following activity vectors according to this table: P0y1 = <N1VV0>, P0y2 = <01VV0>. In the same way, for the input stimulus P = <00110> we obtain the following activity vectors according to Table 3: P0y1 = <N0110>, P0y2 = <00110>. These vectors are not essential, because they are covered by the previous vectors.

Table 2. The complement of input values of stimulus <01110>

p1:  0 1 0 0 0 0
p2:  1 1 0 1 1 1
p3:  1 1 1 0 1 1
p4:  1 1 1 1 0 1
p5:  0 0 0 0 0 1
y1:  0 1 0 1 1 0
y2:  0 0 0 1 1 0

Table 3. The complement of input values of stimulus <00110>

p1:  0 1 0 0 0 0
p2:  0 0 1 0 0 0
p3:  1 1 1 0 1 1
p4:  1 1 1 1 0 1
p5:  0 0 0 0 0 1
y1:  0 1 0 0 0 0
y2:  0 0 0 0 0 0

After analysis of all the possible input stimuli, we obtain the following essential activity vectors:

A0y1 = {<N1VV0>, …, …}
A1y1 = {<V0V10>, …, …}
A0y2 = {<1N01N>, …}
A1y2 = {<101NV>, …, …, …}

We notice that the active values of the essential activity vectors correspond to the variables of the terms of the direct and inverse logical functions. We then obtain the following logical functions:

y1 = x1·x3 + x2·¬x4 + x2·¬x3
¬y1 = ¬x1·x3·x4 + ¬x2·¬x3 + ¬x1·¬x2
y2 = ¬x4·x5 + ¬x3·x5 + x2·¬x4 + x2·¬x3
¬y2 = ¬x2·¬x5 + x3·x4

However, complete correspondence does not always exist between the values of the essential activity vectors and the variables of the terms of the logical function. It is possible to construct an example in which the active values of the activity vectors form a subset of the variables of the terms of the logical function. Let's consider the logical function TTF of three logical variables:

Y = x1·x2·x3 + ¬x1·¬x2·¬x3

¬Y = x1·¬x2 + x1·¬x3 + ¬x1·x2 + x2·¬x3 + ¬x1·x3 + ¬x2·x3

After analysis of all the possible input stimuli, we obtain the following essential activity vectors: A1 = {<VVV>, <NNN>} and A0 = {<V00>, <0V0>, <00V>, <N11>, <1N1>, <11N>}. That corresponds to the following terms of the logical functions:

Y = x1·x2·x3 + ¬x1·¬x2·¬x3
¬Y = x1 + x2 + x3 + ¬x1 + ¬x2 + ¬x3

We notice that the active values of the activity vectors of the set A0 form incomplete terms of the inverse logical function. The active values of the activity vectors can also produce incomplete terms when not all possible input stimuli are considered; such a situation arises for large circuits, but it is not the case for the example of the logical function TTF.

Conjecture. The active values of the essential activity vectors of the sets A1j and A0j of the output j correspond to complete or incomplete terms of the direct and inverse logical functions.

We cannot prove this conjecture but, on the other hand, we were unable to find an example that would contradict it. The investigation is difficult because the logical function of an output can be expressed in many different ways, and the logical function is not necessarily minimal in all cases. Because the essential activity vectors correspond to complete or incomplete terms of the logical functions of the outputs, it is possible to check whether the output responses of the synthesized circuit do not contradict the existence of a term of the logical function. Let's consider how to determine the membership of a term in the logical function of an output. A term consists of input logical variables; a variable of the term can be in complemented or uncomplemented form. A term is completely defined by an input stimulus that assigns the value 1 to its uncomplemented variables and the value 0 to its complemented variables. The term x1·¬x2·x3 is completely defined if the value 1 is assigned to the variables x1 and x3 and the value 0 to the variable x2. An input stimulus that completely defines a term of the direct (inverse) logical function sets the value 1 (0) on the output.
If the term x1·¬x2·x3 belongs to the direct logical function, then any input stimulus that completely defines this term sets the value 1 on the output; if it belongs to the inverse logical function, then any such input stimulus sets the value 0 on the output. In general, a term of the logical function determines input stimuli that set the same value on the output.

Condition 1. All the input stimuli that completely define the term set the same value on the output.

This condition is necessary but not sufficient: any two terms of the logical function whose variables do not contradict each other will satisfy Condition 1. Condition 2 defines the prerequisites for a single term.

Condition 2. Every variable of the term has a corresponding input stimulus for which the change of the value assigned to that variable invokes a value change on the output.

Let's assume that satisfying both Condition 1 and Condition 2 confirms the existence of the term in the logical function. A single activity vector does not support both conditions completely. Firstly, only n − k of the input stimuli that completely define the term are considered, where n is the number of inputs and k is the number of the active inputs. Additionally, Condition 2 is verified for a single input stimulus only. For example, Condition 2 for the term x1·¬x2 of the logical function TTF of three variables is satisfied by two input stimuli, 100 and 101. Therefore, when we consider a single input stimulus, we cannot obtain a minterm, but only the term x1·¬x2. Based on this observation, we introduce Rule 1, specifying how to construct minterms from the terms.

Rule 1.
Two terms defined by activity vectors can be combined into a single term if the combined terms have different variables and the values of the inactive inputs are the same as the values of the active inputs. The constructed term has to satisfy Condition 1 and Condition 2. For the logical function TTF we have the following essential activity vectors: A0 = {<V00>, <0V0>, <00V>, <N11>, <1N1>, <11N>}. Based on Rule 1, the combination of <V00> and <1N1> allows obtaining the term x1·¬x2, and the combination of <V00> and <11N> allows obtaining the term x1·¬x3. In such a way, we can obtain all the terms of the inverse function: ¬Y = x1·¬x2 + x1·¬x3 + ¬x1·x2 + x2·¬x3 + ¬x1·x3 + ¬x2·x3. This logical function is not minimized. If we combined each activity vector with a single other activity vector only, we could obtain the minimal logical function: ¬Y = x1·¬x2 + x2·¬x3 + ¬x1·x3.
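Conditions 1 and 2 can be checked mechanically against a software prototype. The brute-force sketch below is ours (the term encoding and function names are assumptions, and the C17 gate structure is taken from the standard ISCAS'85 netlist).

```python
from itertools import product

def c17(x):
    """ISCAS'85 C17 (gate structure assumed): 5 inputs, 2 outputs."""
    x1, x2, x3, x4, x5 = x
    g2 = 1 - (x3 & x4)
    g3 = 1 - (x2 & g2)
    return 1 - ((1 - (x1 & x3)) & g3), 1 - (g3 & (1 - (g2 & x5)))

def term_exists(f, n, out, term, value):
    """Check Conditions 1 and 2 for `term` (a dict: input index -> 1 for an
    uncomplemented variable, 0 for a complemented one) of the direct
    (value=1) or inverse (value=0) function of output `out` of f."""
    defining = [s for s in product((0, 1), repeat=n)
                if all(s[i] == v for i, v in term.items())]
    # Condition 1: every stimulus that completely defines the term
    # sets `value` on the output.
    if any(f(s)[out] != value for s in defining):
        return False
    # Condition 2: each variable of the term has a witness stimulus
    # where flipping that variable flips the output.
    for i in term:
        witnessed = False
        for s in defining:
            t = list(s)
            t[i] ^= 1
            if f(tuple(t))[out] != value:
                witnessed = True
                break
        if not witnessed:
            return False
    return True

# x1*x3 is a term of the direct function of y1; x1 alone is not.
print(term_exists(c17, 5, 0, {0: 1, 2: 1}, 1),
      term_exists(c17, 5, 0, {0: 1}, 1))   # prints: True False
```

The exhaustive enumeration is only feasible for small n; the point of the activity vectors is precisely to avoid it for large circuits.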

The size of the sets of activity vectors A1j and A0j directly depends on the size of the set of input stimuli considered. The number of activity vectors is directly proportional to the number of terms of the logical function of the output; therefore, after the appropriate number of activity vectors has been found, this number does not increase any more. We can use this feature for test generation: input stimuli are generated randomly, and only those that increase the number of essential activity vectors of the set A1j or A0j are selected. The input stimuli selected according to this rule can be used as a test for the device. The generation of random input stimuli becomes ineffective when the process no longer leads to the selection of new input stimuli; if the generation does not increase the sets A1j or A0j during a predefined time limit, it is stopped. Based on this simple algorithm, delay tests were generated for the ISCAS'85 benchmark circuits; the results are presented in Section 4. Random generation is not the most effective way to find the activity vectors. We noticed that the process of finding activity vectors is more effective if we generate stimuli adjacent to the selected ones [1], [2]. An adjacent stimulus differs from the selected one by a single value only. Changing an active value allows obtaining an activity vector that sets the opposite value on the output, whereas changing an inactive value allows obtaining a new activity vector in some cases. The probability that randomly generated stimuli differ by a single value is small; therefore, generating stimuli adjacent to the selected ones enriches the random search and speeds up the process of finding new activity vectors [1], [2].
Additionally, the active values of the activity vectors can be considered as terms of the logical function of the output and can be used for the generation of new input stimuli. In order to obtain an activity vector of the set A1j (A0j) that has the largest possible number of active values, one needs to define as many values as possible of an activity vector of the opposite set A0j (A1j) that has a single complemented active value only. This feature allows creating different deterministic input stimuli generation methods that enrich the ineffective random search.
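The random-selection loop described above can be sketched end to end. This is our reading of the algorithm, not the authors' code: the paper's time limit is replaced by an idle-draw counter, the helper names are ours, and the C17 gate structure is assumed from the standard ISCAS'85 netlist.

```python
import random

def c17(x):
    """ISCAS'85 C17 (gate structure assumed): 5 inputs, 2 outputs."""
    x1, x2, x3, x4, x5 = x
    g2 = 1 - (x3 & x4)
    g3 = 1 - (x2 & g2)
    return 1 - ((1 - (x1 & x3)) & g3), 1 - (g3 & (1 - (g2 & x5)))

def activity_vector(f, stimulus, out):
    """Activity vector of `stimulus` for output `out` of black-box model f."""
    base = f(stimulus)[out]
    vec = []
    for i, v in enumerate(stimulus):
        flipped = list(stimulus)
        flipped[i] ^= 1
        changed = f(tuple(flipped))[out] != base
        vec.append(('V' if v else 'N') if changed else str(v))
    return ''.join(vec), base

def covers(pa, pb):
    """Strict covering of activity vectors (our reading of Table 1)."""
    act = set('VN')
    return (not any(b in act and a != b for a, b in zip(pa, pb))
            and any(a in act and b not in act for a, b in zip(pa, pb)))

def add_essential(vectors, v):
    """Insert v unless it equals or is covered by a stored vector;
    drop stored vectors that v covers. Returns True if the set grew."""
    if any(u == v or covers(u, v) for u in vectors):
        return False
    vectors[:] = [u for u in vectors if not covers(v, u)]
    vectors.append(v)
    return True

def generate_test(f, n_in, n_out, max_idle=2000, seed=1):
    """Keep a random stimulus only if it adds an essential activity
    vector; stop after max_idle fruitless draws (our stand-in for the
    paper's time limit)."""
    rng = random.Random(seed)
    essentials, test, idle = {}, [], 0
    while idle < max_idle:
        stim = tuple(rng.randint(0, 1) for _ in range(n_in))
        grew = False
        for out in range(n_out):
            av, val = activity_vector(f, stim, out)
            if any(c in 'VN' for c in av):
                grew |= add_essential(essentials.setdefault((out, val), []), av)
        if grew:
            test.append(stim)
            idle = 0
        else:
            idle += 1
    return test, essentials

test, essentials = generate_test(c17, 5, 2)
```

For a model as small as C17 the loop stops adding stimuli almost immediately; for large circuits, generating stimuli adjacent to the selected ones, as described above, would replace the pure random draw.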

4 Test generation approach

The number of activity vectors selected for complex devices can be large; therefore, it has to be limited. Any imposed restriction leads to a loss of the information retained in the activity vectors; nevertheless, practical considerations of the resources used compel applying restrictions. Therefore, the minimum amount minij = Sj/Si of the values V and N was calculated for the activity vectors of the sets A0j and A1j, where Sj is the number of inputs related to the output j and Si is the number of outputs related to the input i. The obtained amount was rounded up to the nearest integer larger than Sj/Si, and this ratio has to be kept larger than 1/3. A generated activity vector is not included in the set A0j or A1j if it does not increase the number of active values V or N on any of the inputs whose ratio of active values is less than minij. In this way, the number of activity vectors in the sets A0j and A1j is controlled.

The number of active values V and N in the activity vectors can be minimized: for every activity vector Pj from the set A0j (A1j), the duplicated active values and the opposite active values are removed as long as the remaining active values do not contradict the values of the activity vector Pj.

The simplest way to generate a delay test pattern pair is to use the activity vector as the second pattern of the pair and to obtain the first pattern by changing a single active value to the opposite one. For every activity vector, the number of test pattern pairs formed equals the number of active values of the vector. The constructed test is a SIT test, but such a test is quite long; therefore, minimization has to be applied.

Table 4. SIT test

           Before minimization           After minimization
Circuit    No AV        L     FC%       No AV        L     FC%
C432        1073     9288    99.78        800      2987    95.40
C499        4174    30903    94.40       1444     10197    94.40
C880        2370    22016    99.83       1479      5670    99.33
C1355       4147    30462    97.13       1997     10113    97.13
C1908       7721   115132    96.47       5178     43117    98.23
C2670       5195    40928   100          4453     29564    99.98
C3540       4095    46090    99.98       2711      8263    96.86
C5315      43909   508672    99.90      39144    190919    99.88
C6288       1408    20800   100          9593     10592   100
C7552      69881   932026    99.45      17565    253576    99.29
Average    14398   175632    98.70       7576     56500    98.30
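The SIT pair construction just described reduces to a few lines of string manipulation. A sketch (the function name is ours):

```python
def sit_pairs(av):
    """Build SIT test pattern pairs from an activity vector: the second
    pattern is the vector with V -> 1 and N -> 0; each first pattern
    flips exactly one active value to the opposite one."""
    second = av.replace('V', '1').replace('N', '0')
    pairs = []
    for i, c in enumerate(av):
        if c in 'VN':
            first = list(second)
            first[i] = '0' if c == 'V' else '1'   # launch one transition
            pairs.append((''.join(first), second))
    return pairs

# P0y1 = <N1VV0> of the C17 example yields three pairs.
print(sit_pairs('N1VV0'))
```

Each pair launches a single input transition, which is exactly why the resulting test is long: one pair per active value of every activity vector.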


The results of the experiment on the ISCAS'85 benchmark circuits are presented in Table 4. The table has two parts: before minimization and after minimization. The minimization was based on minimizing the active values V and N. The following attributes are presented: the number of activity vectors (No AV), the length of the SIT test (L), and the transition fault coverage (FC%). During minimization, some activity vectors lose all their active values; such vectors are removed from further consideration. We see that after minimization the number of activity vectors decreased by half and the length of the SIT test diminished more than 3 times, but the transition fault coverage also degraded by 0.4 per cent on average.

We know from our experience that complete coverage of the transition faults for some circuits can be obtained only by using MIT tests. The simplest way to obtain an MIT test is to change all the active values to the opposite ones in the first pattern of the pair; we call this method AVI. Let's consider other possible ways of MIT test construction. A pair of test patterns formed from an activity vector of the set A0j (A1j) and an activity vector of the set A1j (A0j), after changing the active values V and N to the values 1 and 0 respectively, detects transition faults. A pair formed from activity vectors of the same set A0j or A1j also detects transition faults, but only a signal glitch can be observed. Four ways of combining the activity vectors can be considered:

V1. For every activity vector from the set A0j (A1j), the first pattern of the pair is selected from the set A1j (A0j) as the vector that has the most values opposite to the active values of the activity vector;
V2. For every activity vector from the set A0j (A1j), the first pattern of the pair is selected from the set A0j (A1j) as the vector that has the most values opposite to the active values of the activity vector;
V3. For every activity vector from the set A0j (A1j), the first pattern of the pair is selected from the set A1j (A0j) as the vector that has the fewest values opposite to the active values of the activity vector, but not fewer than 1;
V4. For every activity vector from the set A0j (A1j), the first pattern of the pair is selected from the set A0j (A1j) as the vector that has the fewest values opposite to the active values of the activity vector, but not fewer than 1.

Table 5. MIT test generation modes for C17

V1            V2            V3            V4
10110 01110   01010 01110   10010 01110   00100 01110
01101 10010   10110 10010   01110 10010   01110 10010
01010 00100   10110 00100   10010 00100   01110 00100
01110 10110   10010 10110   01101 10110   01010 10110
10010 01101   01110 01101   10110 01101   01010 01101
00100 01010   01110 01010   10110 01010   10110 01010
10101 10010   10011 10010   11111 10010   11111 10010
01010 11111   10101 11111   10010 11111   10010 11111
10010 10101   11111 10101   01010 10101   10011 10101
11111 10011   10010 10011   11100 10011   10101 10011
10010 11100   11111 11100   10011 11100   10101 11100
11111 01010   10010 01010   10101 01010   10011 01010
96%           92%           94%           80%
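One possible reading of the first-pattern selection in modes V1–V4 can be put in code. The sketch below is ours: we score candidates over the active positions of the activity vector only, which is our interpretation of "values opposite to the active values".

```python
def pattern(av):
    """Second pattern of a pair: active values mapped to logic values."""
    return av.replace('V', '1').replace('N', '0')

def pick_first(av, candidates, most=True):
    """Select the first pattern of the pair for activity vector `av` from
    `candidates` (vectors of the chosen set): the one whose pattern
    differs from av's pattern on the most (V1/V2) or the fewest, but at
    least one (V3/V4), of av's active positions."""
    active = [i for i, c in enumerate(av) if c in 'VN']
    second = pattern(av)
    scored = []
    for v in candidates:
        p = pattern(v)
        s = sum(p[i] != second[i] for i in active)
        if s >= 1:                      # at least one launched transition
            scored.append((s, p))
    if not scored:
        return None
    _, first = max(scored) if most else min(scored)
    return first, second

# With A1y1's vector <V0V10> as the only known candidate, <N1VV0> yields
# the first Table 5 pair of mode V1.
print(pick_first('N1VV0', ['V0V10']))   # prints ('10110', '01110')
```

Ties are broken arbitrarily here; the paper does not specify a tie-breaking rule.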

Let’s consider the circuit C17 in Figure 1. After consideration of all the possible input stimuli, we obtain the following sets of the activity vectors for the output y1: A0y1 = {< N1VV0>,, }, and A1y1 = {< V0V10>,, }; and for the output y2: - 155 -

A0y2 = {< 1N01N>,}, and A1y2 = {< 101NV>,, , }. The test pattern pairs constructed in all four modes for the circuit C17 are shown in Table 5. The second pattern of all the pairs corresponds to the above enumerated activity vectors, when the active values V and N are changed to 1 and 0, respectively. The names of the columns show the mode of the construction of the first pattern of the pair. The second pattern of the pair is taken from the set A0y1 in all the construction modes. The last line of the table shows the transition fault coverage for every mode of the construction. We see that the MIT test does not cover the transition faults completely, but it could be used in order to augment the SIT test. The introduced modes of the delay test construction were applied to the benchmark circuits ISCAS’85. The results are presented in Table 6. The activity vectors were not minimized, because the experimental investigation revealed that the minimization of the active values has no positive influence to the final results. The analysis of the results in Table 6 reveals that the test length is equal to the number of the activity vectors, and the length on average is less about 4 times than the length of minimized SIT test (Table 4). The transition fault coverage on average of the mode V1 is higher almost 1 per cent than the transition fault coverage of the SIT test. The transition fault coverage on average for all the other modes is almost the same but less than in the mode V1. Table 6. MIT test generation modes for the ISCAS’85

Circuit    No of pairs   V1 FC%   V2 FC%   V3 FC%   V4 FC%   AVI FC%
C432            1073      99.93    99.35    99.35    99.20     96.86
C499            4174      99.83    98.13    93.53    93.47     92.54
C880            2370     100      100       99.96    99.42     99.83
C1355           4147      99.91    95.19    94.39    92.33     93.31
C1908           7721      98.45    98.00    97.96    97.79     97.79
C2670           5195      99.89    96.48    99.98    96.42     96.71
C3540           4095      99.00    99.50    98.69    99.41     99.08
C5315          43909     100      100      100      100        99.95
C6288           1408     100      100      100      100       100
C7552          69881      99.14    99.13    98.73    98.77     96.99
Average        14398      99.60    98.58    98.25    97.68     97.31

The constructed test sets were combined in order to achieve higher transition fault coverage (Table 7). The names of the columns indicate the combined test sets. The combination of two test sets doubles the length of the final test set, but the obtained fault coverage is only 0.18 per cent higher than in mode V1. This result leads to the conclusion that the combination of two test sets is not reasonable; in order to obtain higher transition fault coverage, the restrictions on the number of generated activity vectors should be relaxed.

Table 7. Combinations of MIT tests

Circuit    No of pairs   V1&V2 FC%   V1&V3 FC%   V1&V4 FC%   V1&AVI FC%   V2&V3 FC%
C432            2146       100          99.93      100          99.93        99.93
C499            8348       100         100         100          99.83        99.59
C880            4740       100         100         100         100          100
C1355           8294       100          99.94      100          99.94        97.85
C1908          15442        98.89       98.70       98.66       98.74        98.58
C2670          10390        99.95      100         100         100           99.98
C3540           8190        99.77       99.83       99.91       99.90        99.94
C5315          87818       100         100         100         100          100
C6288           2816       100         100         100         100          100
C7552         139762        99.18       99.23       99.24       99.26        99.24
Average        28795        99.78       99.76       99.78       99.75        99.51

5 Conclusion

The activity vectors obtained using the software prototype can be successfully used for the construction of a functional delay test. The presented experimental research revealed that the functional delay test can cover the transition faults almost completely.


Several test construction modes using the activity vectors were investigated. The best construction mode is the one in which the first pattern of the pair is taken from the opposite set to the second one and has the largest number of values opposite to the active values of the second pattern. Further improvement of the quality of the functional test can be obtained by relaxing the restrictions on the number of generated activity vectors.

References
[1] Bareisa E., Jusas V., Motiejunas K., Seinauskas R. The Realization-Independent Testing Based on the Black Box Models. Informatica, Vilnius, Institute of Mathematics and Informatics, Vol. 16, No. 1, pp. 19-36, 2005.
[2] Bareiša E., Jusas V., Motiejūnas K., Šeinauskas R. Functional Digital Systems Testing. ISBN 9955-25-008-9, Kaunas, Technologija, 2006, p. 281.
[3] Bareiša E., Jusas V., Motiejūnas K., Šeinauskas R. Functional Delay Test Construction Approaches. Elektronika ir elektrotechnika = Electronics and Electrical Engineering, ISSN 1392-1215, Kaunas, Technologija, No. 2(74), pp. 49-54, 2007.
[4] Buonanno G., Ferrandi F., Ferrandi L., Fummi F., Sciuto D. How an "Evolving" Fault Model Improves the Behavioral Test Generation. Proceedings of the Great Lakes Symposium on VLSI, pp. 124-130, 1997.
[5] Cho C. H., Armstrong J. R. B-algorithm: a Behavioral Test Generation Algorithm. Proceedings of the International Test Conference, pp. 968-979, October 1994.
[6] Chiusano S., Corno F., Prinetto P. A Test Pattern Generation Algorithm Exploiting Behavioral Information. Proceedings of the Seventh Asian Test Symposium (ATS'98), Singapore, pp. 480-485, December 1998.
[7] Corno F., Prinetto P., Sonza Reorda M. Testability Analysis and ATPG on Behavioral RT-level VHDL. Proceedings of the IEEE International Test Conference, pp. 753-759, October 1997.
[8] Ferrandi F., Fummi F., Sciuto D. Test Generation and Testability Alternatives Exploration of Critical Algorithms for Embedded Applications. IEEE Transactions on Computers, Vol. 51, No. 2, pp. 200-215, 2002.
[9] Kim H., Hayes J. P. Realization-Independent ATPG for Designs with Unimplemented Blocks. IEEE Transactions on CAD, Vol. 20, No. 2, pp. 290-306, 2001.
[10] Michael M., Tragoudas S. ATPG Tools for Delay Faults at the Functional Level. ACM Transactions on Design Automation of Electronic Systems, Vol. 7, No. 1, pp. 33-57, January 2002.
[11] Pomeranz I., Reddy S. M. On Testing Delay Faults in Macro-based Combinational Circuits. Proceedings of the International Conference on Computer-Aided Design, San Jose, CA, pp. 332-339, 1994.
[12] Rudnick E. M., Vietti R., Ellis A., Corno F., Prinetto P., Sonza Reorda M. Fast Sequential Circuit Test Generation Using High-Level and Gate-Level Techniques. Proceedings of IEEE Design, Automation and Test in Europe, pp. 570-576, February 1998.
[13] Shao Y., Pomeranz I., Reddy S. M. On Generating High Quality Tests for Transition Faults. Proceedings of the 11th Asian Test Symposium (ATS'02), pp. 1-8, 2002.
[14] Underwood B., Law W. O., Kang S., Konuk H. Fastpath: A Path-Delay Test Generator for Standard Scan Designs. Proceedings of the International Test Conference, pp. 154-163, 1994.
[15] Yi J., Hayes J. P. A Fault Model for Function and Delay Testing. Proceedings of the IEEE European Test Workshop (ETW'01), pp. 27-34, 2001.
[16] Yi J., Hayes J. P. The Coupling Model for Function and Delay Faults. Journal of Electronic Testing: Theory and Applications, Vol. 21, No. 6, pp. 631-649, 2005.
[17] Yi J., Hayes J. P. High-Level Delay Test Generation for Modular Circuits. IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 25, No. 3, pp. 576-590, March 2006.


VARIABLE RESPONSE ZONE ROUTING FOR AD HOC NETWORKS Rimantas Plestys, Rokas Zakarevicius Kaunas University of Technology, Department of Computer Networks, Studentu str. 50-416, Kaunas, Lithuania, [email protected], [email protected] Abstract. A mobile Ad Hoc network consists of mobile nodes connected to each other via wireless links. The nodes function both as routers and as host devices. They can move often and form different network topologies, so proper routing protocols have to be used in order to ensure sufficient quality of network services. We focus on location-based on-demand routing protocols that operate by using the location information of network nodes. We propose a limited response zone routing algorithm (LRZR), which operates by changing the size of the response zone in separate steps of the route search process. The simulations of LRZR and other location-based routing algorithms have been performed and the results are presented in this paper. Keywords: Ad Hoc networks, location-based routing, variable response zone.

1 Introduction

A mobile Ad Hoc network is made of mobile nodes capable of communicating with each other via wireless links. The nodes can move freely in a limited area, and therefore the network may take different topologies. Changes of the topology break the established routes of the network; packet transmission routes have to be re-established dynamically, and Ad Hoc network routing protocols are responsible for that. Ad Hoc network nodes function simultaneously as routers, performing packet routing functions, and as hosts, sending and receiving data packets. Any route re-establishment protocol generates an additional stream of control packets, the overhead, which temporarily reduces the network bandwidth. It is desirable to minimize the amount of control packets and at the same time to get routes re-established as soon as possible. The routing protocols can be classified into location-unaware and location-based routing protocols. Location-unaware protocols do not use node location information for routing. Ad Hoc On-Demand Distance Vector (AODV) [1] is a reactive routing protocol for Ad Hoc networks, operating on an on-demand basis [1, 2]: it requests a route only when needed and does not require nodes to maintain routes to destinations that are not currently communicating. The active routes are temporarily cached in the routing table. The AODV protocol operates by broadcasting route request (Rreq) packets in all directions over the network. Each intermediate node forwards request packets to other nodes, so that the whole network is flooded by route search packets. If the route to the destination node D is found, it is made available by unicasting a route reply (Rrep) packet back to the source along that temporary path. The intermediate nodes insert forward route records into their routing tables. In this way, the source S and all the route nodes M1, M2, …, Mn have the route to the destination D. The Dynamic Source Routing (DSR) [5] protocol is based on source routing.
Information about the route to the destination node D is stored in data packet headers, i.e. the source S knows all the intermediate nodes of the route, while these nodes have information only about the directly reachable nodes. As with the AODV protocol, a network node temporarily keeps active routes in the routing table. Two main notions, the request zone and the response zone, should be emphasized when discussing routing protocols for Ad Hoc networks. The request zone is the space within the wireless signal transmission range of the node S; all the neighbor nodes inside this space receive all the packets sent. The response zone is the space in which nodes send response packets or forward request packets further into the network, i.e. the nodes react to the request packets they receive. Location-based routing protocols use node location information for routing, under the assumption that each node knows the current locations of all other network nodes [6, 7]. The following on-demand routing protocols are described below. The Location-Aided Routing (LAR) [6] protocol contains two algorithms: LAR-1 and LAR-2. The LAR-1 algorithm uses a fixed rectangular response zone to limit the number of nodes participating in the route search process. In the LAR-2 algorithm, the response zone contains only the nodes that are closer to the destination node D than the node from which they received the route request packet; they forward this Rreq packet further into the network. Each node in the request zone has to calculate the distance to the destination D to detect whether it is inside the response zone. A new response zone is formed in every next step of the route search process, and the number of such zones varies depending on the number of nodes responding to the request packets. Another on-demand location-based routing protocol, NB-GEDIR [7], also operates by calculating distances to the destination D. The nodes in the request zone respond by sending their coordinates to the requesting node.
After receiving the location reply packets, the source (or intermediate) node determines its next-hop node as the one with the minimum distance to the destination D.

Route search overhead can be reduced by using node location information to exploit the geometric relationships among the network nodes [3]. The LAR-1, LAR-2 and NB-GEDIR protocol response zones are not variable. In this paper, our goal is to vary the size of the response zone in order to reduce the network overhead.

2

Variable response zone routing

2.1

Modified Location-Aided Routing (MLAR-1) algorithm

The LAR-1 algorithm operates by flooding the response zone with route search packets. The response zone is originally the smallest rectangle SADB containing the locations of the source node S and the destination node D [6] (Figure 1). The node Mi is inside the response zone, therefore it forwards the Rreq packet to other nodes. The node Mj ignores the Rreq packet, because it is outside the response zone. The weakness of the LAR-1 algorithm is that the response zone SADB is set initially by the source node S and stays fixed during the whole routing process. We propose a modified LAR-1 algorithm (MLAR-1), in which the response zone is varied at every step of the routing process: each node includes its own response zone data in the Rreq packet. These data are the coordinates of the smallest rectangle containing the current node Mi and the destination node D (Figure 1). When the node Mg sends the Rreq packet with its response zone coordinates inside, Mk receives the packet, as it is in the request zone of the node Mg. However, the node Mk does not send the Rreq packet further into the network, as it is outside the response zone of the node Mg. The advantage of our modification over the original LAR-1 algorithm is that the response zone size decreases as nodes closer to the destination D send the Rreq packets.

Figure 1. MLAR-1 algorithm with variable response zones

We see that the MLAR-1 algorithm operates similarly to the flooding-based protocols (e.g. AODV or DSR), but the response zone limits the extent of the network flooding.
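The MLAR-1 forwarding test described above can be sketched as follows. This is a minimal illustration under our own naming conventions (the function name and the `(x, y)` tuple representation are not taken from the paper's simulator):

```python
def mlar1_should_forward(node, sender, dest):
    """A node forwards an Rreq only if it lies inside the smallest
    axis-aligned rectangle containing the Rreq sender and the destination."""
    (xn, yn), (xs, ys), (xd, yd) = node, sender, dest
    return (min(xs, xd) <= xn <= max(xs, xd)
            and min(ys, yd) <= yn <= max(ys, yd))
```

Each forwarding node recomputes the rectangle with itself as `sender`, which is exactly why the response zone shrinks as the Rreq approaches the destination D.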

Figure 2. LAR-2 algorithm – the response zone is shown as the shaded area.

In the case of the LAR-2 algorithm, the response zone is not explicitly specified in a route request message. Here the source S calculates the distance d(S, D) to the destination D from the known location information (XS, YS) and (XD, YD), and broadcasts a route request packet to the network; the distance d(S, D) and the coordinates (XD, YD) are placed into the request packet (Figure 2). When a network node Mi receives the route request, it calculates its distance d(Mi, D) to the destination D. If the distance d(Mi, D) is smaller than the distance d(S, D), i.e. d(Mi, D) < d(S, D), the node Mi forwards the Rreq packet further into the network; otherwise it ignores the Rreq packet [6]. In Figure 2, the node Mj ignores the Rreq packet, because d(Mj, D) > d(S, D). Rs is the radius of the request zone, and ρ1, ρ2, ρ3 are the distances from the nodes to the destination D, i.e. the radii of the arcs that limit the response zone. The closer the nodes are to the destination D, the smaller the response zone, as can be seen in Figure 2.

2.2

No-Beacon GEDIR (NB-GEDIR) protocol

In the case of the No-Beacon GEDIR (NB-GEDIR) [7] routing protocol, the source node S or an intermediate node requests location information from its neighbor nodes. Since this is done on demand, network bandwidth is saved, whereas the original GEDIR [4] protocol periodically sends beacon messages with location information into the network. The NB-GEDIR protocol (Figure 3) operates under the assumption that the source S knows the geographic location of the destination node D, and each network node knows its own location.

Figure 3. NB-GEDIR algorithm – the response zone is shown as the shaded area.

Routing operates according to the steps below [7]:
a) The source S broadcasts a location request (Lreq) packet to all neighbor nodes within the wireless signal transmission range (the request zone). The request contains the location information of both the source S and the destination D.
b) On receipt of the Lreq packet, all the nodes M in the request zone calculate their distances to the destination D and compare them with the distance of the Lreq packet sender node (in this case the source node S). If d(M, D) < d(S, D), the node transmits its location information to the source S in a location reply (Lrep) packet.
c) After the receipt of the Lrep packets, the source S chooses the node Mi that is closest to the destination D, i.e. d(Mi, D) < d(S, D) and d(Mi, D) = min d(M, D), and sends a route request (Rreq) packet to the node Mi. The node Mi then becomes the next-hop node and the route search process is repeated. If no nodes send reply packets, or none of them meets the condition d(M, D) < d(S, D), the route search process is aborted and the path is considered not found.
d) When the Rreq packet is received by the destination D, the path is considered found. As the Rreq packets have travelled all the way through a number of intermediate nodes, this chain of nodes is considered the shortest route from the source S to the destination D.
In the case of the NB-GEDIR protocol, all the nodes that are inside the circular zone with radius Rs and meet the condition d(M, D) < d(S, D) react to the route request packets – they send their location information to the requesting node. In Figure 3, ρ1, ρ2, ρ3 are the distances from the nodes to the destination D, i.e. the radii of the

arcs that limit the response zone. The closer the nodes are to the destination D, the smaller the response zone. Because the node closest to the destination D is chosen as the next hop, many other nodes in the response zone that are far from the destination may never be chosen, even though they have sent Lrep packets. Therefore, the weakness of the NB-GEDIR routing protocol is that the response zone size depends only on the radius Rs and the distance from the current node to the destination node D. If the radius Rs is large or the network density is high, many Lrep packets are created and transmitted through the network.
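Steps b) and c) above amount to a filtered minimum-distance selection. A short sketch (an illustrative reconstruction; the function names and the `(x, y)` tuple coordinates are our own assumptions, not the authors' code):

```python
import math

def dist(a, b):
    # Euclidean distance between two (x, y) points
    return math.hypot(a[0] - b[0], a[1] - b[1])

def nb_gedir_next_hop(source, dest, neighbors, Rs):
    """Among the neighbors inside the request zone of radius Rs that are
    strictly closer to the destination than the requester, pick the one
    closest to the destination; None means the route search is aborted."""
    candidates = [m for m in neighbors
                  if dist(source, m) <= Rs and dist(m, dest) < dist(source, dest)]
    if not candidates:
        return None  # no node meets d(M, D) < d(S, D): path considered not found
    return min(candidates, key=lambda m: dist(m, dest))
```

In a full simulation this selection would be repeated with the chosen node as the new requester until the destination D is reached.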

3

Limited Response Zone Routing (LRZR) algorithm

The number of Lrep messages in the NB-GEDIR request zone with radius Rs can be reduced by applying a response zone limit with radius rs (Figure 4).

Figure 4. LRZR-1 algorithm – the response zone is shown as the shaded area.

We propose that only the nodes for which rs ≤ d(M, S) ≤ Rs can send Lrep messages. It is important that at least one node M exists in the response zone. In Figure 4, the node Mj will not send its location information, since it is outside the response zone, i.e. d(Mj, S) < rs. Since some nodes inside the response zone can be farther from the destination D than the source S, they will never be chosen as next-hop nodes. Therefore, the response zone can be further reduced by applying the constraint d(M, D) < d(S, D). This is presented as the LRZR-2 algorithm in Figure 5. The condition was already proposed for the NB-GEDIR protocol [7]. ρ1, ρ2, ρ3 are the distances from the nodes to the destination D, i.e. the radii of the arcs that limit the response zone. The closer the nodes are to the destination D, the smaller the response zone. The node Mj will not send its location information, since it is outside the zone, i.e. d(Mj, S) < rs. The nodes Ma and Mb will also ignore the location request message, because d(Ma, D) > d(S, D) and d(Mb, D) > d(S, D).

Figure 5. LRZR-2 algorithm – the response zone is shown as the shaded area.
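The LRZR-2 reply condition combines the annulus test with the NB-GEDIR distance constraint, which can be sketched as follows (a minimal illustration under our own naming; not the authors' simulator code):

```python
import math

def dist(a, b):
    # Euclidean distance between two (x, y) points
    return math.hypot(a[0] - b[0], a[1] - b[1])

def lrzr2_should_reply(node, source, dest, rs, Rs):
    """A node answers an Lreq only if it lies in the annulus
    rs <= d(M, S) <= Rs and is closer to the destination than the requester."""
    return (rs <= dist(node, source) <= Rs
            and dist(node, dest) < dist(source, dest))
```

Dropping the second conjunct yields the LRZR-1 variant, which replies from the whole annulus regardless of the direction toward D.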


In real networks nodes are scattered, i.e. the distances among them differ. Therefore the response zone width Rs − rs is increased if there are no nodes with rs ≤ d(M, S) ≤ Rs. As Rs is the maximum distance, only rs has to be reduced. When a number of route search steps are successful, the response zone can again be reduced in order to minimize the number of control packets generated while searching for the route. Since the algorithm is based on dynamically limiting the response zone, it was named LRZR (Limited Response Zone Routing). The algorithm must be flexible in changing the response zone size, so that the route is found while generating the smallest amount of control packets. We propose to use the LRZR-2 algorithm with the constraint d(M, D) < d(S, D), changing the values of rs. Initially, rs = r0 is used in the route search process. If some nodes answer with Lrep packets, the further routing steps operate with the same value rs = r0. If no nodes are found in some route search step, rs is reduced and set to some value rs = rn, n = 1, ..., N, where rn < rn−1. If still no nodes respond with Lrep packets, the algorithm finally transforms into NB-GEDIR, where rs = 0. Both the NB-GEDIR and LRZR algorithms search for the route in the direction with the shortest distance to the destination D. There are cases when routes will not be found by these algorithms, even though they exist in the network. This happens when there is no node M with d(M, D) < d(S, D), although some nodes exist in the opposite direction that could be used for route creation. In such cases, the algorithms mentioned abort the route search process and consider the route not found, even when it exists in the network. Here the flooding-based protocols (AODV or DSR) should be used, because they will always find the route if it exists in the network.
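The adaptive schedule r0 > r1 > ... > rN = 0 described above can be sketched as a simple retry loop. This is illustrative only; `neighbors_reply` is a hypothetical callback standing for one Lreq/Lrep round with inner radius rs:

```python
def lrzr_route_step(neighbors_reply, r_schedule):
    """Try successively smaller inner radii until some node replies.
    With rs = 0 the algorithm degenerates to NB-GEDIR."""
    for rs in r_schedule:              # e.g. [7/8 * Rs, 3/4 * Rs, Rs / 2, 0]
        replies = neighbors_reply(rs)  # one Lreq round with inner radius rs
        if replies:
            return rs, replies
    return None, []                    # abort; fall back to flooding (AODV/DSR)
```

Each widening of the annulus costs one extra Lreq round, which is the source of the additional route search delay discussed in Section 4.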

4

Network overhead simulations

To model the operation of Ad Hoc network routing protocols, a simple network structure – a rectangular grid – has been used. The simulation programs have been written in MATLAB to implement the operation of the flooding-based (AODV, DSR) and location-based (LAR and NB-GEDIR) protocols as described by their authors [1, 5, 6, 7]. The operation of the newly proposed LRZR (Limited Response Zone Routing) algorithm was also implemented in the simulator. The purpose of the research was to perform routing protocol simulations with different data in order to analyze the operation in the separate steps of the routing process. The simulations were made on a square network model with dimensions [20 x 20], i.e. 400 network nodes in total. The radius of the request zone is Rs = 100 m. The distances between adjacent nodes in perpendicular directions are equal to dist = 35 m. The number of nodes in the request zone is 24. The coordinates of the source node S are (10; 11), and of the destination D – (1; 20). The simulation results for the different routing protocols are presented in Figure 6. The diagrams show the cumulative number of packets generated during the routing process. Here the LRZR algorithm uses the constraints rs = 7/8·Rs and d(M, D) < d(S, D).
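The figure of 24 nodes per request zone follows directly from the grid geometry and can be checked with a short sketch (our own illustration, not the paper's MATLAB code):

```python
import math

def nodes_in_request_zone(spacing=35.0, Rs=100.0):
    """Count grid neighbors within transmission range Rs of a node placed
    on a square grid with the given spacing (grid boundary ignored)."""
    reach = int(Rs // spacing) + 1
    return sum(1
               for dx in range(-reach, reach + 1)
               for dy in range(-reach, reach + 1)
               if (dx, dy) != (0, 0)
               and math.hypot(dx * spacing, dy * spacing) <= Rs)
```

With a 35 m spacing and Rs = 100 m this yields the 24 in-zone nodes quoted above: all 24 non-center points of the 5 x 5 neighborhood lie within 100 m (the farthest, at (±2, ±2), is 35·√8 ≈ 99 m away).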

[Plot: cumulative number of control packets versus steps of the routing process for DSR/AODV, LAR-2, LAR-1, NB-GEDIR, and LRZR (rs = 7/8·Rs)]

Figure 6. Simulation results of different on-demand location-based routing protocols for Ad Hoc networks.

The performance of the LAR-1 and LAR-2 algorithms in all cases gave significantly better results than the flooding-based protocols AODV and DSR. This can be explained by the fact that even though the LAR protocol performs flooding, it uses location information to reduce the request zone in order to avoid global flooding and to speed up the route search process. This can be clearly seen in the presented simulation results (Figure 6), because the source node S was in the middle of the rectangular network (coordinates (10; 11)), and the destination node D was

at the corner (1; 20). Therefore, only a part of the network nodes were sending and receiving route request packets. The NB-GEDIR protocol generates a significantly smaller control packet burst, so the degradation of the network quality of service is smaller than with the protocols mentioned above. However, the route search process takes much longer, because location request and response packets, as well as best node election packets, are transferred in separate route search steps. The LRZR algorithm case with rs = 7/8·Rs and d(M, D) < d(S, D) performed best in this simulation, as its response zone was the smallest compared with the other algorithms. Different LRZR algorithm cases were simulated separately to analyze the reduction of routing overhead with various response zones. The simulation results are presented in Figure 7.

[Plot: cumulative number of control packets versus steps of the routing process for LRZR-1, NB-GEDIR, and LRZR with rs = Rs/2, rs = 3/4·Rs, and rs = 7/8·Rs]

Figure 7. Simulation results of LRZR and NB-GEDIR routing protocols.

The newly proposed LRZR algorithm in most cases generates an even smaller control packet burst, so the effect on the network quality of service is minimal. Only the LRZR-1 algorithm case with rs = Rs/2 generates more control packets than the NB-GEDIR protocol. This happens because the NB-GEDIR protocol response zone is constrained by d(M, D) < d(S, D); therefore, only the nodes closer to the destination D than the source S respond with Lrep packets. LRZR-1 is not constrained in this way. The other LRZR algorithm cases with rs = Rs/2, rs = 3/4·Rs and rs = 7/8·Rs apply the constraint mentioned above, and therefore their performance is improved. As mentioned before, the simulations were performed on a well-ordered network structure, so the mentioned LRZR algorithm cases were simulated separately. In the complete LRZR algorithm, if the response zone in some route search step is not suitable, i.e. no nodes are found, the size of the response zone is changed. Route requests then have to be re-sent, which increases the route search delay.

5

Conclusions

The routing control packet stream is generated in the network during the route search process. In mobile Ad Hoc networks the topology changes as nodes move, so the number of route search requests can be high. This overhead reduces the network bandwidth and the quality of the network services provided. As the nodes function both as routers and as host devices, it is important to use routing protocols that minimize the routing overhead. The flooding-based routing protocols (AODV and DSR) generate a large network overhead, especially when the network is more dynamic. Route search overhead can be reduced by using node location information. The LAR and NB-GEDIR location-based routing protocols have been analyzed in this paper. We propose to use the notion of the response zone when discussing location-based routing protocols. The route search overhead can be further reduced by changing the size of the response zone. A modification of the LAR-1 algorithm (MLAR-1) was proposed, in which each node includes its own response zone data in the route request packet (the coordinates of the smallest rectangle containing the current node Mi and the destination node D). We propose a limited response zone routing algorithm (LRZR), which is based on the NB-GEDIR protocol. LRZR operates by changing the size of the response zone (the width Rs − rs) in separate

steps of the route search process. The simulations were made with different response zone size values of the LRZR algorithm. They indicate that using this algorithm results in lower routing overhead compared with the other location-based routing protocols. If no nodes are found in some route search step, the size of the response zone has to be increased. The route search delay then increases, because route requests are re-sent in such a case. If the route cannot be found using the location-based routing protocols, the flooding-based protocols (AODV or DSR) should be used. They will always find the route if it exists in the network, although flooding the network with a large number of route request packets can reduce the quality of some network services.

References

[1] Perkins C.E., Royer E.M. Ad-Hoc On-Demand Distance Vector Routing. Proceedings of WMCSA '99, Second IEEE Workshop on Mobile Computing Systems and Applications, Feb. 1999, pp. 90–100.
[2] Royer E.M., Toh C.K. A Review of Current Routing Protocols for Ad Hoc Mobile Wireless Networks. IEEE Personal Communications Magazine, 1999, pp. 46–55.
[3] Tseng Y.-C., Wu S.-L., Liao W.-H., Chao C.-M. Location awareness in ad hoc wireless mobile networks. Computer, June 2001, pp. 46–52.
[4] Stojmenovic I., Lin X. Loop-free hybrid single-path/flooding routing algorithms with guaranteed delivery for wireless networks. IEEE Transactions on Parallel and Distributed Systems, vol. 12, no. 10, 2001, pp. 1023–1032.
[5] Johnson D.B., Maltz D.B. Dynamic Source Routing in Ad Hoc Wireless Networks. In: Mobile Computing, T. Imielinski and H. Korth (Eds.), Kluwer Academic Publishers, 1996, ch. 5, pp. 153–181.
[6] Ko Y.-B., Vaidya N.H. Location-Aided Routing (LAR) in Mobile Ad Hoc Networks. Proceedings of MOBICOM '98, 1998, pp. 66–75.
[7] Watanabe M., Higaki H. No-Beacon GEDIR: Location-Based Ad Hoc Routing with Less Communication Overhead. ITNG '07, Fourth International Conference on Information Technology: New Generations, 2007, pp. 48–55.
