This is the author's version of an article that has been published in this journal. Changes were made to this version by the publisher prior to publication. The final version of record is available at http://dx.doi.org/10.1109/TC.2014.2329687

A Scalable Formal Debugging Approach with Auto-Correction Capability based on Static Slicing and Dynamic Ranking for RTL Datapath Designs

Bijan Alizadeh, Senior Member IEEE, Payman Behnam, and Somyeh Sadeghi-kohan, Student Member IEEE

Abstract— With the increasing complexity of digital systems, verification and debugging of such systems have become a major technical and economic issue. Although many Computer Aided Design (CAD) solutions have been suggested to enhance the efficiency of existing debugging approaches, they still fail to provide a small set of potential error locations and automatic correction mechanisms. On the other hand, the ever-growing usage of Digital Signal Processing (DSP), computer graphics and embedded systems applications, whose datapath designs can be modeled as polynomial computations, necessitates an effective method for their verification, debugging and correction. In this paper, we introduce a formal debugging approach based on static slicing and dynamic ranking methods to derive a reduced ordered set of potential error locations. In addition, to speed up finding true errors in the presence of multiple design errors, error candidates are sorted in decreasing order of their probability of being an error. After that, a mutation-based technique is employed to automatically correct bugs, even in the case of multiple bugs. In order to evaluate the effectiveness of our approach, we have applied it to several industrial designs. The experimental results show that the proposed technique enables us to locate and correct even multiple bugs with high confidence in a short run time, even for complex designs of up to several thousand lines of RTL code.

Index Terms— Formal verification, equivalence checking, debugging and correction, decision diagram, RTL datapath designs, HED


1 Introduction

Verification is the process of checking whether there is a discrepancy between a given design and its specification. Debugging aims to find the location of the error(s) observed in the verification phase, while repairing means removing the bugs or rectifying the design so that it functions in the way desired by the specification. Although diagnosis and debugging objectives have some overlap, the problems are fundamentally different. In the case of diagnosis, the patterns that cause failures as well as the golden responses are known, and the task is to identify the failing portion of the design. By contrast, in debugging we have a complete specification, and all that is known is that an error occurs at some point during runs lasting from seconds to hours. In that sense, what we propose is a debugging technique and not a diagnosis one. While improvements in verification allow engineers to find a larger fraction of design errors more efficiently, little effort has been devoted to fixing such errors. As a result, debugging remains an expensive and challenging task. With the continued growth in complexity of digital system designs and advances in design techniques, automated debugging with auto-correction capability, especially in the case of multiple errors, becomes increasingly significant. To cope with the large sizes of real-world digital designs, reducing run times in verification, debugging and correction is a key point. Despite the advances in debugging techniques, the processes of finding reduced potential error locations, finding true design errors, distinguishing false positive errors and then repairing the errors remain hard

————————————————

• Bijan Alizadeh, Payman Behnam and Somayeh Sadeghi-kohan are with School of Electrical and Computer Engineering, College of Engineering, University of Tehran, North Kargar Ave., Tehran 14399-515, Iran. E-mail: {b.alizadeh, payman.behnam, sm.sadeghi79}@ut.ac.ir.

tasks that require a large processing time and ad-hoc manual effort [1]. There is a large amount of literature on debugging and repairing. However, most of it focuses on the gate level [7-10]. The authors of [7] proposed a technique that encodes the debugging problem as a Conjunctive Normal Form (CNF) formula and adds several constraints such as input test vectors, golden responses and error cardinality constraints. Then, the augmented CNF is passed to a SAT solver to find satisfying assignments that correspond to potential bug locations. This method can be applied to both fault and logic diagnosis problems. The authors of [8] use CNF to extract a set of unsatisfiable cores to accelerate the diagnosis problem, especially in the case of multiple errors in a design. Quantified Boolean Formula (QBF) based techniques [9] have been suggested as alternatives to iterative logic array techniques to efficiently represent sequential circuits. In [10] a debugging platform based on Maximum Satisfiability (MaxSAT) solvers has been suggested, which works in two stages: 1) finding coarse solutions based on approximate MaxSAT, and 2) obtaining the final result using an exact SAT debugger. Most hardware debugging tools, however, are based on bit-level methods like BDD, SAT or MaxSAT, which suffer from space and time explosion problems when dealing with multiple errors in large industrial designs. Furthermore, almost all SAT-based diagnosis approaches cannot provide any suggestion to fix bugs. Due to faster design changes, design complexity and required simulation speed, designers tend to move up in the level of design abstraction from the gate level to the Register Transfer Level (RTL) or Electronic System Level (ESL). On the other hand, although there have been many works on RTL verification in the literature [11][14][30][35], the lack of scalable and powerful RTL error debugging and correction tools significantly increases time to market and consequently reduces designers' productivity. To address this problem, some techniques have been proposed that work directly at the RTL. The techniques presented in [11][30] employ software analysis approaches that implicitly use multiplexers (MUXes) to identify which statements in the RTL code would be error candidates. However, these techniques return a large set of potential error sites. One way to alleviate this problem is to explicitly insert MUXes into the Hardware Description Language (HDL) code [2]. This approach makes use of hardware analysis techniques and greatly improves the accuracy of error diagnosis. The authors of [14] presented a debugging technique that formally analyzes an HDL description and failed properties to identify buggy statements. This technique can only be used in a formal property checking framework and cannot be applied in the formal equivalence verification flow common in industry today. The authors of [18] make use of the extra observability of assertions to extract a smaller potential error space. There have also been approaches based on Satisfiability Modulo Theory (SMT) solvers to debug RTL designs. These methods depend on the existence of erroneous traces and their corresponding correct results [17][29]. In [17] a diagnosis method is presented which extends SAT-based diagnosis to RTL designs. The design description as well as the error candidate signals are specified at the word level, and therefore word-level MUXes are added to the error signals. Finally, to solve the resulting formula, a word-level SAT solver is used. The method presented in [28] works for ESL designs and can find bug locations based on dynamic slicing and correct them based on a mutation technique. However, [28] may fail when new counterexamples, which were not considered during correction, are taken into consideration. Hence we cannot be sure whether the corrected design corresponds to the specification or not.

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].
In [15] a method called rank ordering of error candidates has been proposed to accelerate finding bugs in the extracted reduced set of potential error locations. In this work, a new probabilistic confidence score (PCS) has been suggested that takes the error-masking situation into consideration in order to provide a more reliable and accurate debugging priority and shorten the error-searching process in the derived potential error set. However, this method is based on a test set generated by simulation. If the generated test set does not have enough coverage, the design bug cannot be detected and hence debugging efficiency is significantly reduced. After localizing errors, the next step is correction. Design error correction for combinational circuits has been thoroughly studied for decades. Several approaches, including error-matching [28] and re-synthesis [6], have been proposed in recent years. Although the work in [17] is able to diagnose and correct RTL design errors automatically, it relies on state-transition analysis and therefore is not scalable to large industrial circuits. Although a re-synthesized correction can be represented as a partial truth table based on the design stimuli under verification, using re-synthesis for RTL or high-level design error correction has some drawbacks. First, the correction is not readable and therefore cannot be checked with the designer. Second, the correction is based on a limited set of counterexamples and therefore we cannot ensure that the corrected design corresponds to the specification. Hence, it may fail when new counterexamples, which were not considered during the correction, need to be taken into account. A symbolic execution method based on an SMT solver for imperative software programs described in C was proposed in [11]. The proposed correction method is based on templates, a technique borrowed from the field of loop invariant synthesis, which ensures that the repairs are readable. However, it works only at a high level of design abstraction. The ever-growing usage of DSP and multimedia applications necessitates an efficient method to handle their debugging problem. For such applications, often a "golden model" is written in a high-level language like C or Matlab and is then converted to an implementation in an HDL [33]. Since there are many differences between hardware and software, e.g. parallelism and clock-based synchronization, a correct conversion from software to hardware is a difficult task. Therefore, the process of generating an RTL design (automatically or manually) from higher-level programs in C or Matlab needs to be checked for correctness. Moreover, it has been reported that up to 80% of the overall design costs are due to verification, and that more than 60% of the verification effort is somehow related to debugging [37]. Hence, efficient debugging of datapath RTL designs is desirable, which requires a scalable representation model. In recent years, a strong and scalable high-level decision diagram called the Horner Expansion Diagram (HED) has been proposed [32]. In order to verify polynomial datapath designs over bit-vectors, the authors of [5] enhanced the HED to manipulate modular arithmetic circuits and called it the Modular-HED (M-HED). This decision diagram is compact and canonical, and is also close to high-level descriptions of a design. In this paper, we propose a scalable and powerful debugging and correction system for datapath-intensive applications with multiple design errors at the RTL. Fig. 1 shows the proposed debugging and correction system.
Unlike debugging methods that depend on the existence of erroneous traces (or test stimuli) and their corresponding correct responses, our work is based on a high-level specification as a golden model whose equivalence to the RTL implementation is to be checked. One advantage of our correction system is that fixing new bugs never re-introduces old bugs, thanks to equivalence checking with a golden model, whereas correction mechanisms that rely on test stimuli have such a problem. Another advantage is that it enables us to quickly focus on a small region of the design by automatically finding a reduced ordered set of potential bugs, ranked with the dynamic ranking mechanism introduced in this paper; finally, the bugs are automatically corrected. The proposed debugging and correction system only requires a buggy implementation at the RTL and an algorithmic-level specification given as a system of multivariate polynomial functions, making it more practical than existing formal analytical solutions. Since the specification is a high-level model of what should be implemented, we assume that the specification is bug free. As mentioned before, such a specification is usually available for datapath-intensive designs. First of all, the equivalence between the implementation and the specification is checked. If they are not equivalent, debugging and then repairing are started, which include four phases: 1) looking for buggy outputs using the Modular Horner Expansion Diagram (M-HED) [5], 2) finding a reduced ordered set of error candidates by static slicing and backward and forward path tracing, 3) ranking the error candidates through a priority criterion, and 4) repairing along with dynamic ranking. Although path tracing is a powerful, well-established technique for debugging, we use it in combination with a correlation graph and dynamic ranking in order to quickly localize and repair multiple bugs. To repair the implementation, the main idea is to introduce changes into the implementation, which is called mutation. The buggy implementation is changed by considering different mutants. Then we observe whether the implementation and specification are equivalent or not. If not, we select the candidate that can correct the maximum number of buggy outputs. After changing the implementation based on the selected candidate and its mutations, the priority criterion is recomputed based on the new set of buggy and bug-free outputs (i.e., dynamic ranking) and the process is repeated. The algorithm finishes when the design is corrected or all error candidates have been taken into account. Note that the specification is a C code and the implementation is often modeled with fixed-size datapath architectures, so that polynomial computations are carried out over n-bit integers where the size of the entire datapath is fixed by way of signal truncation. Hence, equivalence verification of the two polynomial functions extracted from the high-level specification and the implementation must be carried out over a finite word length. For this purpose, M-HED, a canonical representation of polynomial functions over finite word lengths, is used as shown in Fig. 1. The main contributions of this paper are as follows:
• Automatically deriving a reduced set of potential error locations and ranking them based on their probability of being a true error, using high-level decision diagrams and correlation graphs.
• Presenting an automatic formal debugging and incremental repairing mechanism based on equivalence checking between the implementation and the specification.
• Guaranteeing that the corrected design functionally matches the specification, because the corrections are not limited to counterexamples and are checked against the specification.

Fig. 1. Four phases of our debugging and repairing/correction system for RTL datapath designs (square boxes show the actions to be taken and round boxes show how to perform them. For example, Modular Equivalence Checking is performed by means of M-HED).
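The greedy mutation-and-re-rank loop described in the text above (apply the mutant that corrects the most buggy outputs, then recompute the ranking and repeat) can be illustrated with a minimal sketch. This is a toy stand-in, not the paper's implementation: a "design" is a dict of output functions, mutants are hand-written alternatives, and equivalence is approximated by comparison on a small sample of inputs; all names are illustrative.

```python
import itertools

# Toy stand-in for the mutation-based repair loop (Phase 4).
# A "design" maps each output to a function of the inputs; the
# specification plays the role of the golden model.

SAMPLES = list(itertools.product(range(4), repeat=2))  # small input space

def buggy_outputs(impl, spec):
    """Outputs whose implementation disagrees with the specification."""
    return {o for o in spec
            if any(impl[o](x, y) != spec[o](x, y) for x, y in SAMPLES)}

def repair(impl, spec, mutants_of):
    """Greedy repair: repeatedly apply the mutant that fixes the most
    buggy outputs, re-ranking after every accepted change."""
    while True:
        bad = buggy_outputs(impl, spec)
        if not bad:
            return impl, True                      # design corrected
        best = None
        for cand in list(impl):                    # error candidates
            for mut in mutants_of(cand):
                trial = dict(impl); trial[cand] = mut
                fixed = bad - buggy_outputs(trial, spec)
                if best is None or len(fixed) > len(best[2]):
                    best = (cand, mut, fixed)
        if best is None or not best[2]:
            return impl, False                     # no mutant helps
        impl = dict(impl); impl[best[0]] = best[1]

spec = {"out0": lambda x, y: x - y, "out1": lambda x, y: x * y}
impl = {"out0": lambda x, y: x + y, "out1": lambda x, y: x * y}  # '+' bug

def mutants_of(cand):
    # Hypothetical mutants: replace the candidate's operator.
    return [lambda x, y: x - y, lambda x, y: x * y]

fixed_impl, ok = repair(impl, spec, mutants_of)
print(ok)  # True: the '+' bug in out0 was replaced by '-'
```

The sketch repairs one bug per iteration; with multiple injected bugs the same loop keeps iterating, which mirrors the dynamic re-ranking idea: each accepted correction changes which outputs are still buggy.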

The rest of the paper is organized as follows. In Section 2 we briefly describe a canonical graph-based representation of polynomial functions over bit-vectors, i.e., the M-HED [5], which is used for equivalence verification purposes. Section 3 describes the program slicing technique used in our debugging approach. Section 4 presents the proposed debugging method used when the hardware implementation and the algorithmic-level specification are not equivalent. In Section 5, our auto-correction mechanism along with dynamic ranking is explained. In Section 6, we discuss experimental results, and finally a brief conclusion and future work are presented in Section 7.

2 POLYNOMIAL EQUIVALENCE CHECKING

In this section, we briefly present our graph-based representation called the Modular Horner Expansion Diagram (M-HED) [5] for functions with a mixed Boolean and integer domain and an integer range, which represents arithmetic operations at a high level of abstraction, while other proposed Word Level Decision Diagrams (WLDDs) [20] are graph-based representations for functions with a Boolean domain and an integer range. Furthermore, the M-HED is able to deal with modular arithmetic computations over finite rings, as discussed in the following subsections.

2.1 Horner Expansion Diagram (HED)

The HED is a binary graph-based representation which is able to represent a polynomial function by factorizing variables recursively as shown in (1), where const is a term which is independent of variable y, while linear is another term which serves as the coefficient of variable y [5].

F(y, …) = F(y=0, …) + y × [F′(y=0, …) + …] = const + y × linear    (1)
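The decomposition in (1) can be sketched in a few lines of Python. This is an illustrative sketch, not the paper's data structure: polynomials are represented as dicts mapping exponent tuples over (x, y, z) to integer coefficients, and one split step separates the monomials free of the chosen variable (const) from the rest with one factor of that variable divided out (linear).

```python
from collections import defaultdict

def horner_split(poly, var):
    """Split poly into (const, linear) w.r.t. variable index `var`,
    so that poly = const + var * linear, as in equation (1)."""
    const, linear = defaultdict(int), defaultdict(int)
    for exps, coeff in poly.items():
        if exps[var] == 0:
            const[exps] += coeff          # monomials free of the variable
        else:
            e = list(exps); e[var] -= 1   # divide one factor out
            linear[tuple(e)] += coeff
    return dict(const), dict(linear)

# f(x, y, z) = 24 - 8z + 12y + 12yz - 6x - 6x^2*z (the polynomial of
# Example 1); exponent tuples are ordered (x, y, z)
f = {(0, 0, 0): 24, (0, 0, 1): -8, (0, 1, 0): 12, (0, 1, 1): 12,
     (1, 0, 0): -6, (2, 0, 1): -6}

const, linear = horner_split(f, 0)        # decompose w.r.t. x
print(const)   # the const part:  24 - 8z + 12y + 12yz
print(linear)  # the linear part: -6 - 6xz (split w.r.t. x again for x^2)
```

Applying `horner_split` recursively with the ordering x > y > z reproduces the decomposition steps of Fig. 3.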

Definition 1 (HED Representation): HED is a directed acyclic graph G = (VR, ED) with vertex set VR and edge set ED. While the vertex set VR consists of two types of vertices, Constant (C) and Variable (V), the edge set carries integer values as a weight attribute. A Constant node v has as its attribute a value val(v) ∈ Z. A Variable node v has as attributes an integer variable var(v) and two children const(v) and linear(v) ∈ {V, C}. According to the above definition, Fig. 2(a) depicts a vertex v in the HED that denotes an integer function fv defined recursively as follows:
• If v ∈ C (is a Constant node), then fv = val(v).
• If v ∈ V (is a Variable node), then fv = const(v) + var(v) × linear(v).
In order to further reduce the size of the HED by looking for isomorphic graphs, a normalization phase (ConstWeight and LinearWeight shown in Fig. 2(b)) is applied, in which any common factor between the weights assigned to the edges of the const and linear portions is extracted by taking the greatest common divisor (gcd) of the weights.
Example 1: Fig. 3 illustrates how f(x, y, z) = 24-8z+12y+12yz-6x-6x²z is represented by the HED. Let the ordering of variables be x > y > z. First the decomposition w.r.t. the variable x is taken into account. As shown in Fig. 3(a), after rewriting f(x, y, z) = (24-8z+12y+12yz) + x(-6-6xz) based on (1), the const and linear parts will be 24-8z+12y+12yz and -6-6xz, respectively. The linear part is decomposed w.r.t. variable x again due to the x² sub-monomial. After that, the decomposition is performed w.r.t. variable y as shown in Fig. 3(b). Finally, decomposition w.r.t. variable z is performed as shown in Fig. 3(c). In order to reduce the size of the HED representation and to make it canonical, redundant nodes are removed and isomorphic sub-graphs are merged based on the



normalization phase as explained before. In order to normalize the weights, first of all the values of the Constant nodes (the terminal nodes in Fig. 3(c)) are moved to the related edges so that only the terminal node 1 remains (see node 1 in Fig. 3(d)). Then, gcd(24,-8) = 8, gcd(12,12) = 12, gcd(8,12) = 4 and gcd(-6,-6) = -6 are taken to extract common factors. Finally, Fig. 3(d) shows the normalized graph, where gcd(4,-6) = 2 is taken to extract the common factor between the out-going edges of the x node. In this representation, dashed and solid lines indicate the const and linear parts, respectively. Note that, in order to have a simpler graph, paths to the 0-terminal have not been drawn in Fig. 3(c) and Fig. 3(d).

In order to show how a piece of HDL behavioral code related to a datapath component can be represented by means of the HED, let us consider the behavioral code in Fig. 4(a), where O is the primary output:

if (c1) D = X × Y; else D = X + Y;
if (c2) O = D - Z; else O = D × Z;

After applying symbolic simulation and replacing D from (2), the primary output O is represented by the HED as shown in Fig. 4(b).

D = X×Y×c1 + (X+Y)×(1-c1)    (2)
O = (D-Z)×c2 + (D×Z)×(1-c2)    (3)

Fig. 2. Integer function representation in HED (a) before normalization and (b) after normalization.

Fig. 3. HED representation of 24-8z+12y+12yz-6x-6x²z: (a) decomposition w.r.t. x, (b) decomposition w.r.t. x and y, (c) decomposition w.r.t. x, y and z before normalization, and (d) after normalization.

Fig. 4. (a) A simple behavioral code, and (b) its HED representation.
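The mux encoding used in (2) and (3) — a guard c ∈ {0, 1} selecting between the two branches as c×then + (1−c)×else — can be checked numerically. The sketch below is only a sanity check of that encoding over integers, not a symbolic simulator:

```python
import itertools

def behavioral(X, Y, Z, c1, c2):
    # The if-else semantics of the behavioral code in Fig. 4(a).
    D = X * Y if c1 else X + Y
    return D - Z if c2 else D * Z

def polynomial(X, Y, Z, c1, c2):
    # The mux polynomials of equations (2) and (3).
    D = X*Y*c1 + (X + Y)*(1 - c1)          # equation (2)
    return (D - Z)*c2 + (D*Z)*(1 - c2)     # equation (3)

# Exhaustively compare both forms on a small range of integer inputs
# and all guard values.
for X, Y, Z in itertools.product(range(-2, 3), repeat=3):
    for c1, c2 in itertools.product((0, 1), repeat=2):
        assert behavioral(X, Y, Z, c1, c2) == polynomial(X, Y, Z, c1, c2)
print("mux encoding matches the if-else semantics")
```

This is exactly the substitution symbolic simulation performs: each conditional assignment becomes a polynomial in the guard, so the primary output O ends up as a single polynomial suitable for HED representation.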

2.2 Modular Horner Expansion Diagram (M-HED)

In order to verify polynomial datapaths over bit-vectors, we have extended the HED to manipulate modular arithmetic [5]. To make this paper self-contained, we briefly review the main idea behind the M-HED. Although equivalence verification over Zm is known to be NP-hard when m ≥ 2, analyzing polynomials over arbitrary finite integer rings and their properties is useful for dealing with the equivalence checking problem. The theory of univariate

vanishing polynomials over Zm, m ∈ N, m > 1, i.e. those polynomials f such that f(x) mod m ≡ 0, has been presented in [23]. Let us consider a simple example that defines the functions f1[3:0] = 15(Y[3:0])³ - 5(Y[3:0])² + 19Y[3:0] + 6 and f2[3:0] = 7(Y[3:0])³ + 3(Y[3:0])² + 3Y[3:0] + 6. While f1 is not equivalent to f2 as a polynomial function over Z, they are equivalent over Z_2^4, i.e. f1 mod 2⁴ ≡ f2 mod 2⁴. Computing their difference over Z_2^4 results in f1[3:0] - f2[3:0] = 8(Y[3:0])³ - 8(Y[3:0])² + 16Y[3:0]. While the result is a non-zero polynomial, (8Y³ - 8Y² + 16Y) mod 16 = 0, ∀ Y ∈ {0, 1, …, 15}, and we say 8Y³ - 8Y² + 16Y vanishes in Z_2^4. In general, it is not straightforward at all to see whether given polynomials are vanishing ones or not, and a naive approach that expands everything into the Boolean domain would not be efficient. To reduce polynomials over Z_2^n, a useful function in number theory called the Smarandache function S(m) is utilized, which is defined for a given positive integer m as the smallest positive integer such that its factorial S(m)! is divisible by m. Generally, in the ring of interest Z_2^n, let S(2^n) = k, such that k is the smallest number satisfying 2^n | k!. For example, S(8 = 2³) = 4, as 8 divides 4! = 4×3×2×1 = 2³×3. Obviously, 100×101×102×103 mod 2³ = 0, and therefore we can say that the product of 4 consecutive numbers, i.e., (x+1)(x+2)(x+3)(x+4), is divisible by 2³. Hence, (x+1)(x+2)(x+3)(x+4) is reduced to 0 over Z_2^3. The following theorem can be used to derive reduced polynomials from the original one.

Theorem 1: The polynomial V(x) = ∏_{i=1}^{S(m)} (x + i) is equivalent to 0 in Zm and is called a vanishing polynomial. Here S(m) denotes the Smarandache function. The proof is available in [25]. ■

This theorem says that if we can factorize a polynomial function V(x) into a product of S(m) consecutive numbers, then V(x) can be reduced to 0 in Zm. In [5], based on the properties of polynomial functions over finite integer rings [22][23], we have proved that if the polynomial cannot be described as such a consecutive expression, it may still be reducible in terms of the degrees or the coefficients of its terms. In other words, to reduce a given polynomial, its terms are evaluated one by one to see whether they are degree reducible or coefficient reducible. Then a vanishing polynomial is subtracted from the term to reduce its degree or coefficient. Although the authors of [34] have also tried to optimize polynomial datapaths, they used symbolic computer algebra tools like MAPLE to do so. In [36] we have shown that HED-based polynomial optimization gives better performance in terms of memory usage and run time in comparison with computer algebra tools.
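The Smarandache function and the vanishing-polynomial property of Theorem 1 are easy to check by brute force for small moduli. The sketch below is a numeric illustration of the definitions above, not the reduction algorithm of [5]:

```python
from math import prod

def smarandache(m):
    """S(m): the smallest positive k such that m divides k!."""
    k, fact = 1, 1
    while fact % m:
        k += 1
        fact *= k
    return k

def vanishes(m):
    """Check Theorem 1: (x+1)(x+2)...(x+S(m)) is 0 mod m for all x."""
    s = smarandache(m)
    return all(prod(x + i for i in range(1, s + 1)) % m == 0
               for x in range(m))

assert smarandache(8) == 4          # 8 divides 4! = 24
assert vanishes(8) and vanishes(16)

# The pair from the example above: f1 != f2 over Z, but f1 = f2 over
# Z_2^4, because their difference 8Y^3 - 8Y^2 + 16Y vanishes mod 16.
f1 = lambda y: 15*y**3 - 5*y**2 + 19*y + 6
f2 = lambda y:  7*y**3 + 3*y**2 +  3*y + 6
assert all((f1(y) - f2(y)) % 16 == 0 for y in range(16))
print("f1 is equivalent to f2 over Z_2^4")
```

Brute force is of course exponential in the bit-width; the point of the M-HED machinery is to perform these reductions symbolically instead.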

3 PROGRAM SLICING

Program slicing [26] is a software engineering technique for extracting the parts of a program which have an impact on a selected set of variables. Portions of the program that cannot affect these variables are discarded, and hence a reduced set is obtained. This reduced set is called a slice. Program slicing can help reduce the size of the debugging problem significantly, and can be categorized into static slicing [26] and dynamic slicing [27]. A static slice includes all statements that affect the value of a variable v for all possible inputs at the point of interest, e.g. at a statement x, in the program, while a dynamic slice includes those statements that actually affect the value of a variable v for a particular set of inputs to the program. In order to formally represent RTL designs, we make use of a correlation graph. In this section, we briefly define several terminologies used in the rest of the paper.

Definition 2 (Digraph): A digraph is a structure (N, E), where N is a set of nodes and E ⊆ N×N is a set of edges. If (n, m) ∈ E then n is an immediate predecessor of m, and m is an immediate successor of n.

Definition 3 (Flowgraph): A flowgraph is a structure (N, E, n0), where N is a set of nodes, E is a set of edges, (N, E) is a digraph and n0 is a member of N such that there is a path from n0 to all other nodes in N.

Definition 4 (Path and Distance): A path in a digraph from a node n1 to a node n2 is a list of nodes p0, p1, ..., pk such that p0 = n1, pk = n2, and for all i, 1 ≤ i ≤ k-1, (pi, pi+1) ∈ E. The minimum number of edges between n1 and n2 determines the distance between n1 and n2.

Definition 5 (Hammock Graph): A hammock graph, as a special case of a flowgraph, is a structure (N, E, n0, ne) with the property that (N, E⁻¹, ne) and (N, E, n0) are both flowgraphs, where n0 is the initial node and ne is the end node. Note that, as usual, E⁻¹ = {(a, b) | (b, a) ∈ E} [14]. There is a path from n0 to all other nodes in N, and from all nodes of N, excluding ne, there is a path to ne.

Definition 6 (Correlation Graph): A correlation graph is a structure (N, E, Z), where N is a set of nodes, E is a set of edges, and Z ⊆ N is a set of output variables. An edge from node A to node B shows that B depends on A. Unlike a hammock graph, which is a single-entry single-exit graph, a correlation graph can have several inputs/outputs.

Definition 7 (Impaction): Node A has an impact on node B in a hammock graph or correlation graph if there is a path from node A to node B.

Definition 8 (Independent Node): Node A is independent of node B in a hammock graph or correlation graph if there is no path from node B to node A.

The procedure of constructing a hammock graph from an HDL code is similar to that from a software code, which can be found in [26, 28]. To construct a correlation graph, first of all a symbolic simulation is carried out to extract a sequence of statements from the RTL code. Then static slicing is started from those statements that assign symbolic values to the primary outputs. During this phase, the correlation graph is constructed.

Example 2: Fig. 5 shows a simple Verilog code in its HDL code column. Suppose that tmr, tmi, len, c, s and windex are inputs and out is an output. Moreover, suppose that the value of the windex input is zero in the case of dynamic slicing, while the other inputs have specific values. The second and third columns represent static and dynamic slicing for the primary output out, while the last two columns show the hammock graph and correlation graph of the given HDL code. Black circles in the second and third columns show which statements influence out, while white circles show statements that do not have any effect on out. It can be seen in the figure that node n3 (the len1 = len statement in the HDL code) does not affect out, so it is excluded from both the static and the dynamic slice, as indicated by white circles. Because windex is zero, the if part of the conditional statement is executed and hence nodes n4 and n5 are excluded from the dynamic slice. Please note that multiple occurrences of the same variable are represented with a single node in the correlation graph in order to reduce the size of the graph as much as possible. For example, consider variable aar in Fig. 5. We can see in the HDL code column of Fig. 5 that two different RTL lines (lines 3 and 8) are responsible for identifying the value of aar, while we have just one node aar in the correlation graph, so that its function is determined as tmr×(1-windex)+(tmr×c-tmi×s)×windex, where windex is a Boolean variable. In this way, we not only reduce memory usage but also can trace the paths to find error candidates as quickly as possible. It is clear that with static slicing rather than dynamic slicing more statements are involved, because static slicing keeps the behavior of the original program for all possible input values. However, works that make use of dynamic slicing cannot ensure whether a correction is still valid when new counterexamples are applied to the design under verification.
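On a correlation graph, computing a static slice for an output amounts to backward reachability: the slice keeps every node that has an impact on the output in the sense of Definition 7. The sketch below illustrates this with a tiny graph loosely modeled on Example 2; the node names are illustrative:

```python
def static_slice(edges, target):
    """Return every node with a path to `target` (plus the target itself),
    i.e. the backward-reachable set over the correlation graph."""
    preds = {}
    for a, b in edges:                 # invert the edge list once
        preds.setdefault(b, set()).add(a)
    slice_, work = {target}, [target]
    while work:                        # standard worklist traversal
        n = work.pop()
        for p in preds.get(n, ()):
            if p not in slice_:
                slice_.add(p)
                work.append(p)
    return slice_

# Edges roughly mimicking the correlation graph of Example 2: the
# inputs feed aar, aar feeds out, and len only feeds len1.
edges = [("tmr", "aar"), ("tmi", "aar"), ("c", "aar"), ("s", "aar"),
         ("windex", "aar"), ("aar", "out"), ("len", "len1")]

print(static_slice(edges, "out"))
# len and len1 have no path to out, so they are excluded from the slice
```

A dynamic slice would further prune this set using the concrete guard values of one run (e.g. windex = 0), which is why it is smaller but only valid for that particular input.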

4 PROPOSED DEBUGGING APPROACH

Fig. 5. Static slicing, dynamic slicing (when windex is zero), Hammock Graph and Correlation Graph of a given HDL code.

In our framework the error-debug-correction problem is represented by: 1) an RTL code containing one or more bugs, composed of variables and operations on them, and 2) a high level description of what should be implemented, as a golden model. The objective is to identify a minimal number of variables in the RTL code that are responsible for the design's erroneous behavior. Furthermore, although errors can be corrected by modifying the statements related to these variables, in this paper we propose an auto-correction mechanism as well. Fig. 6 shows the proposed RTL debugging and correction system, which consists of four main phases: 1) equivalence checking and finding buggy outputs, 2) deriving a reduced set of error candidates, 3) ranking the error candidates, and 4) mutation based correction along with dynamic ranking. The first three phases are explained in more detail in the following subsections, while the last phase is explained in Section 5.

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

4.1 Equivalence Checking and Finding Buggy Outputs (Phase 1)

The first phase of our verification and debugging method is equivalence checking between the specification and the implementation of the design [4]. The pseudo code shown in Fig. 7 demonstrates the steps of this phase. Assume a specification (MS) and an implementation (MI) are given. Note that the specification is a C-code and the implementation is often modeled with fixed-size datapath architectures, so that polynomial computations are carried out over n-bit integers, where the size of the entire datapath is fixed by way of signal truncation.
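As a hedged illustration of this fixed-size datapath semantics (the bit width and the sample values are made up for the example), truncating a signal to n bits amounts to reducing it mod 2^n:

```python
# Sketch: fixed-size datapath arithmetic, where every result is
# truncated to n bits, i.e., taken mod 2**n.
def trunc(v, n):
    return v & ((1 << n) - 1)   # keep only the low n bits

n = 8
a, b = 200, 100
print(trunc(a + b, n))  # 300 mod 256 = 44
```

This is why the polynomial comparison must be carried out over n-bit integers rather than unbounded ones: two expressions can differ over the integers yet agree after truncation, and vice versa.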

Example 3: Suppose a correct implementation of a 4-point FFT (FFT4) is given. Fig. 8 depicts its correlation graph (CGI). As can be seen in the figure, the design has ten inputs (inp0, …, inp9) and eight outputs (out0, …, out7). We deliberately injected three bugs in the implementation, which directly affect n9, n15, and n21, as indicated in Fig. 8. These errors can be any form of design error in the RTL code. After checking for equivalent outputs using the M-HED, the subtraction results of the corresponding output polynomials are obtained as follows:

subr0 = out0spc - out0imp = inp2×((-2)×inp9)
subr1 = out1spc - out1imp = inp5×(2×inp8) + inp2×((-2)×inp6 + 2×inp7)
subr2 = out2spc - out2imp = inp2×(2×inp9)
subr3 = out3spc - out3imp = inp5×(2×inp9)
subr4 = out4spc - out4imp = 0
subr5 = out5spc - out5imp = 2×inp8 + 2×inp9
subr6 = out6spc - out6imp = 0
subr7 = out7spc - out7imp = 0

As mentioned before, those primary outputs that are not equivalent are taken into account as the buggy outputs set. Therefore, the buggy outputs set BOs = {out0, out1, out2, out3, out5} is determined, as highlighted in Fig. 8.

Fig. 6. Proposed debugging and correction system for RTL datapath designs. Note that square boxes show the actions to be taken and round boxes show how to perform them. For example, Modular Equivalence Checking is performed by means of M-HED.

First of all, the HEDs of the specification (HS) and the implementation (HI) are constructed (lines 1-2 of Fig. 7). To do so, a symbolic simulation is performed, which leads to a list of assignments for the specification and the implementation. This list completely describes the behavior of the specification and the implementation. Then, using M-HED, we check whether HI and HS are equivalent. If so, the implementation is bug free and the algorithm finishes (lines 4-5). Otherwise, they are not equivalent and we should debug the implementation to find potential error locations. To do so, we extract the Outputs Polynomial in the specification (OPs) and the Outputs Polynomial in the implementation (OPi) (lines 7-8) and subtract the corresponding polynomials for each output (OPsi - OPii). These outputs describe the functionality of the design in terms of polynomials. If the error effects manifest themselves on some primary outputs, their subtraction results are non-zero, and therefore those outputs are taken into account as members of the Buggy Outputs set (BOs) (line 9). Note that since buggy outputs depend on symbolic values of the primary inputs rather than on concrete values specified by test stimuli, the constructive and destructive interferences that depend on test stimuli do not occur in our work. Hence, buggy outputs are good indicators for correcting the design.

EquivalenceChecking (MI, MS)
1  HS = HED of MS;
2  HI = HED of MI;
3  Equivalence checking of HS and HI using M-HED;
4  IF (HI and HS are equivalent)
5    RETURN bug free implementation;
6  ELSE
7    OPs = outputs polynomial in the specification;
8    OPi = outputs polynomial in the implementation;
9    BOs = buggy outputs set whose members are the non-zero results of (OPsi - OPii);

Fig. 7. Phase 1: equivalence checking and finding buggy outputs.
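A minimal sketch of this buggy-output computation, assuming each output polynomial is encoded as a map from monomials (sorted tuples of input names) to integer coefficients (the encoding and the sample polynomials are illustrative, not the paper's M-HED data structure):

```python
# Sketch of Phase 1: subtract corresponding output polynomials and
# collect the outputs whose difference is non-zero as BOs.
def sub_poly(p, q):
    """Coefficient-wise subtraction of two polynomials (dict encoding)."""
    r = dict(p)
    for m, c in q.items():
        r[m] = r.get(m, 0) - c
        if r[m] == 0:
            del r[m]          # drop cancelled monomials
    return r

def buggy_outputs(spec, impl):
    """Outputs with a non-zero spec/impl difference form BOs."""
    return {o for o in spec if sub_poly(spec[o], impl[o])}

# Toy data in the spirit of Example 3 (coefficients are made up).
spec = {"out0": {("inp2", "inp9"): 2}, "out4": {("inp0",): 1}}
impl = {"out0": {("inp2", "inp9"): 4}, "out4": {("inp0",): 1}}
print(sorted(buggy_outputs(spec, impl)))  # ['out0']
```

out4 cancels exactly and is therefore equivalent, while out0 leaves a non-zero residual polynomial and joins BOs.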

Fig. 8. Correlation Graph (CGI) of a 4-point FFT.

4.2 Deriving a Reduced Set of Error Candidates (Phase 2)

In the second phase, we try to derive a set of error candidates that is then reduced as much as possible. Fig. 9 illustrates the algorithm to obtain a reduced set of error candidates. In order to determine a set of error candidates, the basic idea is to find those intermediate nodes that manifest themselves on the buggy outputs. To do so, for each BOsi ∈ BOs obtained in Phase 1, the correlation graph is backward traversed to find those intermediate nodes (EINBOsi) that affect it (line 3 of Fig. 9). The union of these nodes is called the Potential Error Locations (PELs), as described in line 5 of Fig. 9. As mentioned in Section 3, in the worst case, the size of the static slice equals that of the original program. In this case, PELs may still contain many error candidates which are not true


bug locations of the observed errors. To extract a Reduced Potential Error Locations set, RPELs, the difference between the implementation and the specification is determined by subtracting the polynomials of each element in BOs and their corresponding outputs in the specification, as performed in the previous phase, to find the Effective INPuts that affect each buggy output (EINPBOsi) (line 4). Since the HED, like other decision diagrams, is able to describe the primary outputs in terms of the primary inputs, this set indicates an effective set of primary inputs that have an impact on the buggy outputs. Then the correlation graph is forward traversed from these primary inputs to find those Intermediate Nodes that are affected by them (INEinpi) (lines 6-7). Their union gives those Intermediate Nodes that are Affected by the primary Inputs obtained from the Subtraction Results of Phase 1 (INAISRs) (line 8). In other words, this set indicates those intermediate nodes that are indirectly affected by buggy outputs. The intersection of EINBOsi and INEinpi gives the Reduced Potential Error Locations set for each buggy output (RPELsboi) (line 9). Furthermore, the intersection of PELs and INAISRs indicates the reduced potential error locations set (RPELs) (line 10).

DerivingRPELSET (CGI, BOs)
1  INSRSi = an empty set;
2  FOR (each BOsi ∈ BOs)
3    EINBOsi = Intermediate nodes that affect BOsi;  //backward path tracing
4    EINPBOsi = Primary inputs that affect BOsi;
5  PELs = ∪ EINBOsi;
6  FOR (each EINPBOsi)
7    INEinpi = Intermediate nodes affected by EINPBOsi;  //forward path tracing
8  INAISRs = ∪ INEinpi;
9  RPELsboi = EINBOsi ∩ INEinpi;
10 RPELs = PELs ∩ INAISRs;
11 RETURN RPELsboi and RPELs;

Fig. 9. Phase 2: deriving reduced potential error candidates.
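The core of DerivingRPELSET can be sketched as two reachability passes and an intersection; the graph encoding below is an assumption made for illustration (forward edges as a dict), not the paper's correlation-graph data structure:

```python
# Sketch of Phase 2: backward reachability from each buggy output gives
# EINBOsi; forward reachability from its effective inputs gives INEinpi;
# intersecting the unions yields RPELs.
def reach(edges, starts):
    """Nodes reachable from `starts` along `edges` (excluding starts)."""
    seen, stack = set(), list(starts)
    while stack:
        for nxt in edges.get(stack.pop(), ()):
            if nxt not in seen:
                seen.add(nxt)
                stack.append(nxt)
    return seen

def deriving_rpels(fwd, bwd, buggy, eff_inputs):
    """fwd/bwd: forward and backward edge maps of the correlation graph;
    buggy: buggy outputs; eff_inputs[o]: effective inputs of output o."""
    pels = set().union(*(reach(bwd, [o]) for o in buggy))
    inaisrs = set().union(*(reach(fwd, eff_inputs[o]) for o in buggy))
    return pels & inaisrs

# Tiny example: n1 reaches out0 but is not driven by out0's effective
# input, so it is pruned from RPELs.
fwd = {"inpA": ["n2"], "n1": ["out0"], "n2": ["out0"]}
bwd = {"out0": ["n1", "n2"], "n2": ["inpA"]}
print(sorted(deriving_rpels(fwd, bwd, ["out0"], {"out0": ["inpA"]})))
```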

Example 4: Let us consider the correlation graph (CGI) in Fig. 8 again. By traversing it, those intermediate nodes (EINBOsi) that affect the buggy outputs are obtained as follows:
For out0 : EINBOs0 = {n2, n6, n7, n8, n9, n10, n14, n22}
For out1 : EINBOs1 = {n1, n5, n7, n8, n9, n10, n13, n21}
For out2 : EINBOs2 = {n2, n6, n7, n8, n9, n10, n14}
For out3 : EINBOs3 = {n1, n5, n7, n8, n9, n10, n13}
For out5 : EINBOs5 = {n3, n11, n15}

Hence, PELs = ∪ EINBOsi = {n1, n2, n3, n5, n6, n7, n8, n9, n10, n11, n13, n14, n15, n21, n22}. It is obvious that a 16% (= 3/18) reduction is obtained, because the number of initial error candidates is 18 while |PELs| is 15. On the other hand, based on the subri computed in Phase 1 (see Example 3), the effective primary inputs on each buggy output (EINPBOsi) are determined as follows:

For out0 : EINPBOs0 = {inp2, inp9}
For out1 : EINPBOs1 = {inp2, inp5, inp6, inp7, inp8}
For out2 : EINPBOs2 = {inp2, inp9}
For out3 : EINPBOs3 = {inp5, inp9}
For out5 : EINPBOs5 = {inp8, inp9}

According to such information, we are able to indicate those intermediate nodes (INEinpi) that are affected by the primary inputs determined in the previous step, as follows:

From EINPBOs0 = {inp2, inp9} ⇒ INEinp2,9 = {n8, n9, n11, n13, n14, n15, n17, n19, n21, n22}
From EINPBOs1 = {inp2, inp5, inp6, inp7, inp8} ⇒ INEinp2,5,6,7,8 = {n7, n8, n9, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n21, n22}
From EINPBOs2 = {inp2, inp9} ⇒ INEinp2,9 = {n8, n9, n11, n13, n14, n15, n17, n19, n21, n22}
From EINPBOs3 = {inp5, inp9} ⇒ INEinp5,9 = {n7, n9, n11, n13, n14, n15, n17, n19, n21, n22}
From EINPBOs5 = {inp8, inp9} ⇒ INEinp8,9 = {n9, n11, n13, n14, n15, n17, n19, n21, n22}

Hence, INAISRs = ∪ INEinpi = {n7, n8, n9, n10, n11, n12, n13, n14, n15, n16, n17, n18, n19, n21, n22}, and we will also have:

For out0 ⇒ RPELsbo0 = EINBOs0 ∩ INEinp2,9 = {n8, n9, n14, n22}
For out1 ⇒ RPELsbo1 = EINBOs1 ∩ INEinp2,5,6,7,8 = {n7, n8, n9, n10, n13, n21}
For out2 ⇒ RPELsbo2 = EINBOs2 ∩ INEinp2,9 = {n8, n9, n14}
For out3 ⇒ RPELsbo3 = EINBOs3 ∩ INEinp5,9 = {n7, n9, n13}
For out5 ⇒ RPELsbo5 = EINBOs5 ∩ INEinp8,9 = {n11, n15}

This way, we are able to figure out which intermediate nodes affect which buggy outputs, as listed below:
n7 ⇒ {out1, out3}; n8 ⇒ {out0, out1, out2}; n9 ⇒ {out0, out1, out2, out3}; n10 ⇒ {out1}; n11 ⇒ {out5}; n13 ⇒ {out1, out3}; n14 ⇒ {out0, out2}; n15 ⇒ {out5}; n21 ⇒ {out1}; n22 ⇒ {out0}

Therefore RPELs = PELs ∩ INAISRs = {n7, n8, n9, n10, n11, n13, n14, n15, n21, n22}. Obviously, a 44% (= 8/18) reduction is obtained, because |RPELs| is 10, as shown in Fig. 10.

4.3 Ranking Error Candidates through Priority Criterion (Phase 3)

Although Phase 2 greatly reduces the number of error candidates, there may still be a large number of potential error locations, especially in the case of large industrial designs. Identifying the true design errors by examining all error candidates one by one requires a huge amount of run time and effort. To alleviate this problem, we introduce a priority criterion to rank the error candidates. Let TNBO and NBOAi be the total number of buggy outputs and the number of buggy outputs affected by a specific error candidate RPELsi ∈ RPELs, respectively. Also let IDSNBOj be the inverse of the distance between the suspected node and the jth buggy output. Equation (4) defines a weighted function in terms of the Buggy outputs affected by the Suspected Node (BSN) and the average of IDSNBOj (AIDSNBO). Please note that BSN is easily obtained from the RPELsboi set computed in Phase 2, and AIDSNBO is obtained from the RPELsboi set and the correlation graph.

PriorityRPELsi = W1 × BSN + W2 × AIDSNBO    (4)

where BSN = NBOAi / TNBO and AIDSNBO = (Σ j=1..TNBO IDSNBOj) / NBOAi.

The basic idea behind defining these terms is the fact that if an intermediate node affects more buggy outputs, it has a greater chance of being a true error location and hence should have a relatively higher rank. The second term indicates that if a suspected node is closer to the buggy outputs, it has a greater chance of being a true error, because it has less interaction with other nodes on its path to the outputs. In order to weigh the importance of each term, i.e., BSN or AIDSNBO in (4), the weights W1 and W2 have been defined, which help us find a tradeoff between the two terms. Furthermore, nodes with the same priority are grouped in the same class of priority, as illustrated in Fig. 11. They are named the First, Second, Third, and Other Classes of Priority (FCPs, SCPs, TCPs, OCPs).

Fig. 10. Reduced potential error locations – Example 4.

Fig. 11. Ranking error candidates (nodes with same priority have been circled).

Example 5: Let us consider the RPELs set computed in Phase 2 (see Example 4). The priorities are as follows when W1 = W2 = 1:

For n7 : PriorityRPELs7 = 1 × (2/5) + 1 × ((1/3+1/2)/2) = 0.8
For n8 : PriorityRPELs8 = 1 × (3/5) + 1 × ((1/3+1/2+1/3)/3) = 1.0
For n9 : PriorityRPELs9 = 1 × (4/5) + 1 × ((1/3+1/2+1/3+1/2)/4) = 1.2
For n10 : PriorityRPELs10 = 1 × (1/5) + 1 × (1/3) = 0.5
For n11 : PriorityRPELs11 = 1 × (1/5) + 1 × (1/2) = 0.7
For n13 : PriorityRPELs13 = 1 × (2/5) + 1 × ((1+1/2)/2) = 1.1
For n14 : PriorityRPELs14 = 1 × (2/5) + 1 × ((1+1/2)/2) = 1.1
For n15 : PriorityRPELs15 = 1 × (1/5) + 1 × (1) = 1.2
For n21 : PriorityRPELs21 = 1 × (1/5) + 1 × (1) = 1.2
For n22 : PriorityRPELs22 = 1 × (1/5) + 1 × (1) = 1.2

Obviously, we will have four classes of priorities: FCPs = {n9, n15, n21, n22}, SCPs = {n13, n14}, TCPs = {n8} and OCPs = {n7, n10, n11}. Note that the actual bugs belong to FCPs. Moreover, one of the nodes with the highest priority (say n9) is chosen as a suspected node for correction mechanism.
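The priority values in Example 5 can be reproduced with a small sketch of (4); the data-structure names (`affected`, `dist`) are assumptions made for illustration, not the paper's API:

```python
# Sketch of the priority criterion in (4): affected[n] lists the buggy
# outputs a candidate n reaches, dist[(n, o)] is the path distance from
# n to buggy output o (IDSNBO is its inverse; it is 0 for unaffected
# outputs, so summing over the affected outputs is equivalent).
def priority(n, affected, dist, buggy, w1=1.0, w2=1.0):
    nboa = len(affected[n])                      # NBOA_i
    bsn = nboa / len(buggy)                      # BSN = NBOA_i / TNBO
    aidsnbo = sum(1.0 / dist[(n, o)] for o in affected[n]) / nboa
    return w1 * bsn + w2 * aidsnbo

# n9 from Example 5: affects 4 of 5 buggy outputs at distances 3,2,3,2.
buggy = ["out0", "out1", "out2", "out3", "out5"]
affected = {"n9": ["out0", "out1", "out2", "out3"]}
dist = {("n9", "out0"): 3, ("n9", "out1"): 2,
        ("n9", "out2"): 3, ("n9", "out3"): 2}
print(round(priority("n9", affected, dist, buggy), 1))  # 1.2
```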

5 PROPOSED AUTO-CORRECTION MECHANISM ALONG WITH DYNAMIC RANKING (PHASE 4)

After identifying error candidates, the next phase is to localize and then correct bugs. In practice, engineers often find error diagnosis more difficult than error correction. It is common for engineers to spend days or weeks finding the cause of a bug; however, once the bug is identified, fixing it may only take a few hours. It should be noted that in our framework, we keep the correspondence between nodes in the correlation graph and variables in the statements of the RTL code, so that when a node in the correlation graph is selected as an error candidate, the related statement in the RTL code is easily determined. As shown in Fig. 6, the highest ranked candidate is selected and then removed from RPELs. Afterward, all mutations of this candidate are computed and the mutant that can correct a maximal number of buggy outputs is chosen. According to the selected mutant, the implementation is changed; since this mutation can correct some buggy outputs, the priority criterion for ranking is dynamically changed and therefore needs to be recomputed. After

that, the node with the highest priority is selected and the above-mentioned process is repeated until the design becomes corrected. Please note that if all candidates are processed, or the run time is over the limit but the implementation is still not corrected, we say that our method is not able to correct the design.

Definition 9: We say that the implementation is correct if and only if it is equivalent to the given specification; otherwise, we say that the implementation is still buggy.

Mutation testing [19][24] was originally developed for constructing a set of tests which distinguish between a given software program and any nonequivalent program, and for measuring the quality of test cases. The nonequivalent program is generated from the given one by applying a mutation transformation. In this paper, we make use of the idea of mutation testing to automatically correct datapath intensive designs. We follow an error-matching based correction mechanism, which is capable of modeling design errors. In the following, we define polynomial mutation formally and then give an auto-correction mechanism. A valid mutation transformation (E1, E1’) maps an expression E1 to a different expression E1’ ≠ E1. We distinguish two types of mutations: 1) replacement of an arithmetic, logical or assignment operator by another operator of the same class, 2) eliminating/adding an arithmetic, logical or assignment operator. The set of Verilog mutations used in this work consists of: 1) arithmetic operator replacement (+, -, *), 2) relational operator replacement (==, !=, >, >=, <, <=). Given a sequence of valid mutation transformations (E1, E1’), …, (En, En’), we write MI’ = (MI, < (E1, E1’), …, (En, En’) >) to represent the mutant MI’. (MI, < (E1, E1’), …, (En, En’) >) denotes a sequence of n mutations of MI where first (E1, E1’) is applied, then (E2, E2’) on the resulting mutant, and so on.

Definition 11 (Mutation Diagnosis): Let MS be a specification and MI be an implementation of MS.
A mutant MI’ = (MI, < (E1, E1’), …, (En, En’) >) is a mutation diagnosis iff (if and only if) MI’ is functionally equivalent to MS. Based on the above definition, it is obvious that one efficient way to correct MI is to compute a mutation diagnosis of MI. The following subsections explain how to compute all possible mutants for the reduced error candidates and how to automatically correct the buggy implementation.

5.1 Mutation Computation

In order to compute all possible mutations of a given implementation, the basic mutation functions are applied to its statements, each application leading to a new implementation. To compute the mutants of each statement, the AllMutations algorithm illustrated in Fig. 12 is used, which requires only the buggy implementation MI. Mutations for assignments include changing the target variable (lines 3-4 of Fig. 12) and mutating the expression (line 5). To compute mutants for a given expression E, the algorithm replaces variables and constants by other variables, constants or even expressions (lines 5-9 of Fig. 12).


Operators are changed to other operators (lines 13-14), and expressions with an operator are reduced to one argument (lines 11-12). In addition, the algorithm allows for adding an additional operator to an existing one (lines 8-9). Moreover, unary operators are changed to other unary operators (lines 15-19 of Fig. 12).

AllMutations (MI)
1  M = an empty set;
2  FOR (all assignment statements S ∈ MI of the form X = E)
3    FOR (all variables Y ≠ X)
4      Add (MI, < (X,Y) >) to M;
5    IF (E is a variable or constant)
6      FOR (all variables Y ≠ E)
7        Add (MI, < (E,Y) >) to M;
8      FOR (all variables X and operators ⊕)
9        Add (MI, < (E, E ⊕ X) >) to M;
10   ELSE IF (E is of the form X ⊕ Y)
11     Add (MI, < (E,X) >) to M;
12     Add (MI, < (E,Y) >) to M;
13     FOR (all operators ⊕’ ≠ ⊕)
14       Add (MI, < (E, X ⊕’ Y) >) to M;
15   ELSE IF (E is of the form X ⊕ or ⊕ X)
16     Add (MI, < (E, X ⊕’) >) to M;
17     Add (MI, < (E, ⊕’ X) >) to M;
18     Add (MI, < (E, Y ⊕’) >) to M;
19     Add (MI, < (E, ⊕’ Y) >) to M;
20 RETURN M;

Fig. 12. AllMutations procedure to compute all possible mutants.

5.2 Mutation-based Correction Technique

As mentioned before, mutation diagnoses are mutants making the implementation (MI) equal to the specification (MS). In other words, finding such mutants provides insightful suggestions for correcting diagnosed errors, and therefore bug localization and bug correction are performed simultaneously. The MutationDiagnosis algorithm depicted in Fig. 13 is used to compute such diagnoses. It takes as inputs the implementation (MI), the specification (MS) and the reduced ordered potential error locations computed in Phase 3 (ROPELs). Since the number of existing bugs is unknown, the algorithm starts finding errors one by one and then tries to correct them until the implementation becomes functionally equivalent to the specification. The algorithm starts by computing all possible mutants of the highest ranked member of ROPELs (lines 3-4 of Fig. 13). Then, for each mutant AMi, the modified implementation (MI’ in line 6) is examined against MS by calling EquivalenceChecking(MI’, MS), explained in Fig. 7, and it is checked how many buggy outputs would be corrected (lines 7-8). The mutant that can correct the maximum number of buggy outputs is selected and the implementation is modified accordingly (lines 10-12). Then the DerivingRPELSET algorithm is called, by passing the correlation graph of the modified implementation (CGSelectedMI) and the corresponding buggy outputs set (SelBOs), to obtain new RPELs and to compute the priority criterion for the remaining members of ROPELs (lines 16-17). This process is repeated until the specification and the implementation become functionally equivalent (lines 13-14). In this case, the errors are not only localized but also corrected according to the modification defined by AMi. If, however, the algorithm fails to find a bug free implementation based on Ai, the process is repeated with the next highest ranked element of ROPELs (lines 2-17). If the run time is over the limit and no correction is found, we have to say that our debugging and correction technique is not able to localize such a bug (line 18 of Fig. 13).

MutationDiagnosis (MI, MS, ROPELs)
1  maxCi = 0;
2  WHILE (ROPELs ≠ ∅ and Time is not over the limit)
3    Ai = Highest ranked element of ROPELs;
4    AM = AllMutations(Ai);
5    FOR (all mutants AMi ∈ AM)
6      MI’ = (MI, AMi);
7      BO = EquivalenceChecking(MI’, MS);
8      Ci = Number of all outputs - |BO|;
9      IF (Ci > maxCi)
10       maxCi = Ci;
11       SelectedMI = MI’;
12       SelBOs = BO;
13   IF (|SelBOs| = 0)
14     Bugs have been fixed! RETURN SelectedMI;
15   ELSE
16     TRPELs = DerivingRPELSET(CGSelectedMI, SelBOs);
17     ROPELs = Rank TRPELs;  //dynamic ranking
18 RETURN Bugs cannot be fixed!

Fig. 13. MutationDiagnosis algorithm to automatically localize and correct bugs.
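The greedy selection loop of Fig. 13 can be sketched as follows; `mutants_of` and `buggy_outputs_of` stand in for AllMutations and EquivalenceChecking, and the toy string-based "implementation" at the end is purely illustrative:

```python
# Skeleton of MutationDiagnosis: for the highest ranked candidate, try
# every mutant and keep the one that fixes the most buggy outputs; stop
# when no buggy output remains.
def mutation_diagnosis(impl, candidates, mutants_of, buggy_outputs_of):
    while candidates:
        cand = candidates.pop(0)           # highest ranked first
        best, best_fixed = None, -1
        for m in mutants_of(cand, impl):
            remaining = buggy_outputs_of(m)
            fixed = len(buggy_outputs_of(impl)) - len(remaining)
            if fixed > best_fixed:
                best, best_fixed = m, fixed
        if best is not None and not buggy_outputs_of(best):
            return best                    # mutation diagnosis found
        impl = best if best is not None else impl
    return None                            # bugs cannot be fixed

# Toy use: the "implementation" is an expression string, the only mutant
# flips one operator, and the design is buggy while it contains '-'.
mutants = lambda cand, e: [e.replace("-", "+", 1)]
buggy = lambda e: {"out0"} if "-" in e else set()
print(mutation_diagnosis("a - b", ["n9"], mutants, buggy))  # prints "a + b"
```

In the paper's flow, the ranking of the remaining candidates would also be recomputed after each accepted mutant (the dynamic-ranking step), which this sketch omits.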

In order to deal with multiple bugs, the MutationDiagnosis algorithm is extended into the MutationDiagnosisN algorithm. This algorithm acts similarly to MutationDiagnosis, except that in line 4 it makes use of AllMutationsN(Ai) instead of AllMutations(Ai), and in each iteration of the outer while loop it increments the value of N (the number of simultaneous mutations or bugs). Please note that AllMutationsN is similar to AllMutations except that it takes into account N suspicious nodes simultaneously (N = 2, 3, …) and computes all their combinations.

Example 6: Let us consider the previous example again, where the highest priority node is n9. By changing the variables which have an effect on n9, no change in ROPELs is observed. However, by changing the operator between the nodes in the implementation (e.g., changing - to +), it can be seen that out0, out2, and out3 are omitted from BOs. After that, DerivingRPELSET is called to compute the new ROPELs, which are dynamically ranked. In this case, ROPELs becomes {n10, n15, n21} and their priorities are as follows:
For n10 : PriorityRPELs10 = 1 × (1/2) + 1 × (0) = 0.5
For n15 : PriorityRPELs15 = 1 × (1/2) + 1 × (1) = 1.5
For n21 : PriorityRPELs21 = 1 × (1/2) + 1 × (1) = 1.5

Obviously, we would have two classes of priorities: FCPs = {n15, n21}, and SCPs = {n10}. Note that the actual bugs belong to FCPs. Moreover, one of the nodes with the highest priority (say n15) is chosen as a suspected node for correction mechanism.


Finally, the new ROPELs = {n21} is obtained, which is easily repaired by utilizing MutationDiagnosis. It is obvious that if we had exhaustively checked mutants of all statements in the RTL code, the number of mutants would increase exponentially, especially in the case of multiple errors. By using RPELs and ranking its elements, however, the number of error candidates to be checked is significantly reduced. Furthermore, the ordered RPELs enables us to apply multiple mutants simultaneously and find the corrected design quickly. Moreover, because we make use of a formal equivalence checking platform instead of prohibitively time-consuming simulation-based equivalence checking to find out whether the implementation is correct or not, we are able to automatically and quickly check the correctness of the implementation.

6 EXPERIMENTAL SETUP AND RESULTS

In order to demonstrate the effectiveness of the proposed debugging technique with the auto-correction mechanism, several industrial designs have been employed. These are AlphaBlending, an image processing algorithm which blends a foreground picture with a background one; Differential Equation (DIFFEQ); Finite Impulse Response (FIR), the most common digital filter; Sobel, a convolution algorithm which is the core of many image processing algorithms; Fast Fourier Transform with three sizes (FFT32, FFT256, FFT512); and (Inverse) Discrete Cosine Transform (DCT and IDCT). These designs come from a variety of problem domains such as mathematics, digital signal processing, digital image processing, and multimedia. The M-HED and the fully automated proposed approach have been implemented in C++ and run on an Intel 2.8 GHz Core i5 with 3 GB main memory running Linux, with Qt Creator as a cross-platform C++ integrated development environment (IDE); the time out is set to 1000 seconds. We randomly injected typical RTL design errors such as false state transitions, incorrect assignments, and incorrect operators, which change the functionality of the design. Note that such design errors are the most common error categories in digital circuits specified at the RTL [38]. Moreover, because datapath RTL designs are considered in this work, most of the errors are somehow related to the arithmetic parts. Hence, our focus for design error correction is on arithmetic operations and the mutants are selected accordingly. The input of our algorithm is a list of assignments obtained from the implementation as well as the specification. To obtain such assignments, we have translated the execution of the specification and the implementation into several assignments by using symbolic simulation. In symbolic simulation, symbols rather than integer or binary values are used as input vectors to simulate a specification or RTL implementation.
Symbolic expressions, rather than actual values, are then propagated from inputs to outputs. In symbolic simulation, the loops are unrolled, controlling statements are removed and the array indexes are adjusted so that multiple assignments to a single variable do not happen while data dependencies are preserved. The result of symbolic simulation is a list of assignments which exactly mimics the behavior of the specification and implementation. To perform symbolic simulation, the RTL implementation is modeled by a Finite State Machine with Datapath (FSMD) [4][31][39]. The FSMD extends the classic FSM with a datapath, including variables and the operations on them, and is represented as a transition table, where we assume each transition is executed in a single clock cycle. Operations associated with each transition of this model are executed in a sequential form. Each controller transition is defined by the current state, the condition to be satisfied and a set of operations or actions. The condition that evaluates to true determines the transition to be taken and thus the actions to be executed. As a result, the hierarchy is flattened and loops are unrolled. The list of obtained assignments can be used as an alternative design for equivalence checking.

All results are averaged over ten runs of each experiment with W1 = W2 = 1 (i.e., the BSN and AIDSNBO terms in (4) have the same importance). Table 1 tabulates the results for different numbers of errors. The Benchmark and #LS columns show the benchmark's name and the number of lines obtained after symbolic simulation, respectively. Please note that the #LS column also represents the initial search space for error candidates. The major columns Single Error and Multiple Errors indicate the results for a single error and multiple errors, respectively. The minor columns ReducedPEL and ReducedRPEL exhibit the amount of reduction obtained by using the potential error locations set (PELs) and the reduced potential error locations set (RPELs), respectively. The minor column Time states the run time in seconds for equivalence checking, debugging and ranking. The minor column SAT time indicates the run time in seconds for debugging when the SAT-based debugging technique in [7] is employed. The results show an average speedup of 7.1× in comparison with the SAT-based debugging technique [7]. The sub column #FEC represents the number of final error candidates (the number of lines in the RTL code) after mapping the list of assignments onto the RTL code. Moreover, the GenHED+EC column indicates the time needed to construct the HED and perform equivalence checking. The sub column #Errors shows the number of errors injected in the case of multiple errors.

As can be seen in the table, on average, by finding buggy outputs, using static slicing and backward path tracing and obtaining PELs, 64% and 34% reductions in the number of error candidates are obtained for single and multiple errors, respectively. Furthermore, RPELs, obtained by forward path tracing and intersecting the result set with PELs, reduces the search space for single and multiple errors by 85% and 74%, respectively. The Time column shows that, on average, it takes 147.5 and 220.9 seconds to debug the circuits with a single error and with multiple errors, respectively. In other words, we are able to debug large industrial circuits in a short time. The worst case results in terms of the number of error candidates, i.e., #FEC, are related to the DCT and IDCT benchmarks. This is because the structure of these designs consists of functions with successive dependent arithmetic operations in the assignments.

Fig. 14 demonstrates the effectiveness of our error candidate ranking mechanism for single and multiple errors. In these experiments, we have obtained some candidates with the same priorities. One can infer from Fig. 14(a) that our proposed ranking mechanism for a single error works well, because in the best cases (AlphaBlending, DIFFEQ, Sobel, FFT32 and FFT256 benchmarks) true errors are found in the first class of priorities (FCPs), while in the worst cases (FIR, DCT, IDCT and FFT512 benchmarks) on average only 4.5% of the errors are found in the second class of priorities (SCPs). For multiple errors, shown in Fig. 14(b), we have observed that in the best cases


(AlphaBlending and DIFFEQ), all errors occur in FCPs, and in the worst case (FFT512) 68% of the errors occur in FCPs, 20% of them in SCPs, 9% in TCPs and 3% in OCPs.

TABLE 1
EXPERIMENTAL RESULTS OF OUR DEBUGGING APPROACH FOR SEVERAL BENCHMARKS (TIME IS GIVEN IN SECONDS; TIME OUT (TO) IS SET TO 1000 SECONDS).

              |        | Single Error                                                         | Multiple Errors
Benchmark     | #LS    | ReducedPEL | ReducedRPEL | #FEC | GenHED+EC | Time  | SAT time | #Errors | ReducedPEL | ReducedRPEL | #FEC | Time  | SAT time
Alphablending | 100    | 82%        | 95%         | 5    | 0.4       | 48.1  | 146.5    | 2       | 81%        | 91%         | 5    | 64.3  | 533.6
DIFFEQ        | 111    | 79%        | 88%         | 4    | 0.4       | 49.6  | 180.9    | 2       | 75%        | 85%         | 6    | 68.2  | 733.4
Sobel         | 587    | 52%        | 74%         | 11   | 1.8       | 85.5  | 445.6    | 2       | 27%        | 72%         | 15   | 232.3 | TO
FIR           | 1560   | 47%        | 71%         | 8    | 5.1       | 155.8 | 786.2    | 4       | 19%        | 62%         | 13   | 137.8 | TO
DCT           | 1620   | 55%        | 86%         | 14   | 5.2       | 171.1 | TO       | 5       | 22%        | 72%         | 23   | 238.8 | TO
IDCT          | 592    | 54%        | 84%         | 15   | 1.9       | 88.1  | 882.0    | 5       | 15%        | 65%         | 21   | 183.6 | TO
FFT32         | 533    | 72%        | 93%         | 10   | 1.7       | 75.5  | 530.6    | 3       | 24%        | 81%         | 12   | 123.9 | TO
FFT256        | 6638   | 71%        | 84%         | 13   | 13.7      | 225.5 | TO       | 3       | 19%        | 75%         | 15   | 328.8 | TO
FFT512        | 30170  | 69%        | 85%         | 14   | 58.4      | 428.7 | TO       | 4       | 21%        | 64%         | 21   | 610.4 | TO
Average       | 6436.6 | 64%        | 85%         | 10.6 | 9.8       | 147.5 | 1052.4   | 3.2     | 34%        | 74%         | 14.5 | 220.9 | 1596.3

Fig. 14. Ranking results for: (a) single error, (b) multiple errors. [Bar charts showing, for each benchmark, the percentage of injected errors found in each priority class (FCPs, SCPs, TCPs, OCPs).]

In order to demonstrate the effectiveness of the proposed correction mechanism, in another experiment we have applied it to several industrial benchmarks. Table 2 presents the results for single errors. The columns #Mutants and Time present the number of mutants to be checked and the processing time for correction, respectively. Please note that the HED construction time is also taken into account in the results, i.e., in the Time column. In addition, the Success Rate column shows in how many of the different runs the design could be successfully corrected. For example, 90 indicates that in 9/10 cases we could automatically correct the circuit. Unsuccessful attempts are due to time out (TO = 1000 seconds) or to the proposed technique not being able to correct the circuit after taking into account all error candidates. As can be seen in the table, for a single error our proposed method can correct the circuit in all cases. The last row in Table 2 shows that on average 6.6 mutations need to be checked for single errors.

One may ask whether the proposed correction mechanism is practical if a new HED must be constructed, and equivalence checking re-run, for every mutant; is the number of equivalence checks not prohibitively large? The answer is no, because each time the design is modified according to a mutant we do not build a new HED from scratch; instead, we update the previous one so that the HED representations of all assignments unaffected by the mutant, as well as equivalent nodes, are reused. One advantage of M-HED is that, thanks to its canonical form, the data cached while constructing nodes can be used to detect isomorphic sub-graphs: whenever a node is about to be created, M-HED checks whether it already exists, and if so the existing node is shared rather than regenerated. To measure the cost of M-HED-based equivalence checking, we ran the benchmarks several times with different changes in their RTL code; in each run, exactly one statement of the original Verilog code was changed and the corresponding M-HED constructed. For the FFT512 benchmark, for example, constructing the M-HED takes more than 26 seconds the first time, while the 10th run takes less than 20 seconds. Only the HED of the assignments affected by the mutant needs to be reconstructed, and only the newly constructed nodes need to be checked for equivalence. The average time needed to reconstruct the HED and perform equivalence checking per mutant is therefore Time / #Mutants.

In the case of multiple errors, we have considered three categories of error distributions: 1) multiple errors on independent nodes, 2) multiple errors on one or two lines of the RTL code, and 3) multiple errors on dependent nodes.
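The node sharing described above is the classic hash-consing scheme used by canonical decision diagrams. The sketch below is a generic unique table over HED-style nodes of the form f = const_part + var * linear_part; it illustrates the idea only and is not the actual M-HED code.

```python
class UniqueTable:
    """Hash-consing for decision-diagram nodes: before creating a node,
    look it up by (variable, children); if an isomorphic node already
    exists it is shared, so sub-graphs untouched by a mutant are reused."""

    def __init__(self):
        self._table = {}
        self.created = 0  # counts only genuinely new nodes

    def make_node(self, var, const_part, linear_part):
        key = (var, id(const_part), id(linear_part))
        node = self._table.get(key)
        if node is None:
            node = (var, const_part, linear_part)
            self._table[key] = node
            self.created += 1
        return node

mgr = UniqueTable()
zero, one = ("const", 0), ("const", 1)
# Build the diagram for f = x (i.e. 0 + x*1) twice: the second request
# is answered from the cache instead of being regenerated.
n1 = mgr.make_node("x", zero, one)
n2 = mgr.make_node("x", zero, one)
```

Because the second call returns the very same node object, equality of shared sub-graphs reduces to pointer comparison, which is what makes re-checking each mutant cheap.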
The results are reported in Tables 3, 4 and 5. In these tables, the two major columns Proposed Method with Incremental Ranking and Proposed Method without Incremental Ranking give the results when the ranking of potential errors is dynamically updated across the iterations of the correction phase, and when it is computed statically once and reused throughout the correction phase, respectively.

Copyright (c) 2014 IEEE. Personal use is permitted. For any other purposes, permission must be obtained from the IEEE by emailing [email protected].

TABLE 2. EXPERIMENTAL RESULTS OF THE PROPOSED CORRECTION APPROACH (SINGLE ERROR).

| Benchmark | #Mutants | Time (sec) | Success Rate (%) |
|---|---|---|---|
| Alphablending | 3 | 6.1 | 100 |
| DIFFEQ | 4 | 7.6 | 100 |
| Sobel | 6 | 22.5 | 100 |
| FIR | 7 | 24.3 | 100 |
| DCT | 9 | 51.5 | 100 |
| IDCT | 11 | 57.1 | 100 |
| FFT32 | 4 | 18.7 | 100 |
| FFT256 | 7 | 34.8 | 100 |
| FFT512 | 9 | 51.1 | 100 |
| Average | 6.6 | 30.4 | 100 |

Case 1: multiple errors on independent nodes. In this case, neither debugging nor correcting one error affects another error, and our method shows its efficiency: correcting the highest-ranked error candidates fixes some buggy outputs, the remaining candidates are re-ranked, and their priorities most likely increase. Continuing in this way corrects all buggy outputs. Table 3 shows the results, where the #Errors column gives the number of injected errors. Our method corrects the circuits in 71.9 seconds on average in all cases, whereas without the dynamic ranking mechanism it takes 209.4 seconds on average; in other words, dynamic ranking improves the correction time by 66%. Note that for the IDCT, FFT256 and FFT512 benchmarks the Success Rate without dynamic ranking is below 100, meaning that without it we are not able to correct all bugs in these benchmarks.

Case 2: multiple errors on one or two lines of RTL code. When multiple errors occur on one or two RTL lines (e.g. five errors injected in two lines), the proposed correction approach with dynamic ranking again shows its capability. Table 4 gives the results: with dynamic ranking the design is corrected in 143.1 seconds, versus 212.7 seconds without it, a 33% reduction in correction time on average.

Case 3: multiple errors on dependent nodes. Table 5 shows the results for the different benchmarks. In this case the dynamic ranking mechanism yields no improvement in correction time. As before, the average time per mutant for HED reconstruction and equivalence checking is Time / #Mutants.

Overall, our method significantly reduces the number of error candidates and ranks them in a short time. It is considerably faster and more scalable than gate-level diagnosis and correction techniques such as those proposed in [3][39], because errors are modeled at a higher level of design abstraction and much more complicated polynomials can be processed quickly. Note that a single error in the RTL design can translate into several errors at the gate level, so gate-level debugging is difficult and limited to certain specific errors [2]. In general, our experiments with industrial designs demonstrate that the debugging technique is efficient and scalable: designs of up to 30,000 unrolled lines of code, the size an engineer actively works on, can be diagnosed and then corrected within minutes with high accuracy. These results show that our technique can be applied to complex designs.

The only case in which the proposed correction mechanism cannot carry out the correction is when whole statements (not operators) are added to or removed from the implementation. However, as reported in [38], such design errors rarely occur in real RTL designs, which is why we have not considered them in this work.
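The incremental-ranking behaviour exploited in the independent-errors case (fix the best candidate, then re-rank the rest against the outputs that still fail) can be sketched as below. The slice-overlap score and the `try_fix` callback are made-up stand-ins for the paper's dynamic ranking and mutation-based repair.

```python
def rank(candidates, failing):
    """Hypothetical score: candidates whose static slice covers more
    still-failing outputs are more likely to be true error sites."""
    return sorted(candidates, key=lambda c: len(c["slice"] & failing), reverse=True)

def correct_all(candidates, failing, try_fix):
    """Repair one error at a time, re-ranking after every successful fix,
    because fixing one bug changes which outputs remain buggy."""
    fixes = []
    candidates = list(candidates)
    failing = set(failing)
    while failing:
        for cand in rank(candidates, failing):
            repaired = try_fix(cand)  # outputs this fix makes pass (may be empty)
            if repaired:
                fixes.append(cand["loc"])
                failing -= repaired
                candidates.remove(cand)
                break
        else:
            return None  # no remaining candidate repairs anything
    return fixes

# Two independent bugs: s1 lies in the slices of outputs o1 and o2, s2 in o3's.
cands = [{"loc": "s1", "slice": {"o1", "o2"}}, {"loc": "s2", "slice": {"o3"}}]
order = correct_all(cands, {"o1", "o2", "o3"}, try_fix=lambda c: set(c["slice"]))
```

After s1 is repaired, only o3 still fails, so re-ranking promotes s2; statically computed ranks would keep exploring candidates in the original order instead.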

TABLE 3. EXPERIMENTAL RESULTS OF THE PROPOSED CORRECTION APPROACH IN CASE 1: MULTIPLE ERRORS ON INDEPENDENT NODES (UNSUCCESSFUL RUNS ARE DUE TO A TIMEOUT OF 1000 SECONDS OR TO THE TECHNIQUE FAILING AFTER CONSIDERING ALL ERROR CANDIDATES; TIMES ARE GIVEN IN SECONDS). The #Mutants, Time (s) and Success Rate (%) columns are given first for the Proposed Method with Incremental Ranking and then for the Proposed Method without Incremental Ranking; the last column is the run-time improvement with incremental ranking.

| Benchmark | #Errors | #Mutants | Time (s) | Success Rate (%) | #Mutants | Time (s) | Success Rate (%) | Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| Alphablending | 2 | 4 | 8.3 | 100 | 11 | 19.5 | 100 | 57 |
| DIFFEQ | 2 | 5 | 8.9 | 100 | 11 | 20.5 | 100 | 57 |
| Sobel | 2 | 7 | 27.6 | 100 | 20 | 80.1 | 100 | 66 |
| FIR | 4 | 9 | 33.9 | 100 | 33 | 113.3 | 100 | 70 |
| DCT | 5 | 24 | 147.8 | 100 | 61 | 409.3 | 100 | 64 |
| IDCT | 5 | 27 | 163.2 | 100 | 85 | 489.6 | 90 | 67 |
| FFT32 | 3 | 6 | 29.6 | 100 | 28 | 129.5 | 100 | 77 |
| FFT256 | 3 | 18 | 81.6 | 100 | 41 | 211.3 | 90 | 61 |
| FFT512 | 4 | 28 | 146.2 | 100 | 79 | 411.9 | 80 | 65 |
| Average | 3.3 | 14.2 | 71.9 | 100 | 41 | 209.4 | 95.5 | 66 |


TABLE 4. EXPERIMENTAL RESULTS OF THE PROPOSED CORRECTION APPROACH IN CASE 2: MULTIPLE ERRORS ON ONE OR TWO LINES OF RTL CODE. The #Mutants, Time (s) and Success Rate (%) columns are given first for the Proposed Method with Incremental Ranking and then for the Proposed Method without Incremental Ranking; the last column is the run-time improvement with incremental ranking.

| Benchmark | #Errors | #Mutants | Time (s) | Success Rate (%) | #Mutants | Time (s) | Success Rate (%) | Improvement (%) |
|---|---|---|---|---|---|---|---|---|
| Alphablending | 2 | 7 | 16.6 | 100 | 7 | 16.5 | 100 | -0.01 |
| DIFFEQ | 2 | 9 | 17.9 | 100 | 10 | 18.7 | 100 | 0.1 |
| Sobel | 2 | 18 | 68.3 | 100 | 19 | 77.9 | 100 | 12 |
| FIR | 4 | 19 | 73.8 | 100 | 34 | 149.1 | 100 | 51 |
| DCT | 5 | 39 | 258.7 | 100 | 66 | 421.8 | 100 | 39 |
| IDCT | 5 | 55 | 310.6 | 100 | 87 | 428.9 | 100 | 28 |
| FFT32 | 3 | 13 | 55.4 | 100 | 31 | 139.5 | 100 | 60 |
| FFT256 | 3 | 31 | 168.9 | 100 | 46 | 232.7 | 90 | 27 |
| FFT512 | 4 | 59 | 317.9 | 100 | 83 | 429.2 | 70 | 26 |
| Average | 3.3 | 27.7 | 143.1 | 100 | 39.6 | 212.7 | 95.6 | 33 |

TABLE 5. EXPERIMENTAL RESULTS OF THE PROPOSED CORRECTION APPROACH IN CASE 3: MULTIPLE ERRORS ON DEPENDENT NODES. The #Mutants, Time (s) and Success Rate (%) columns are given first for the Proposed Method with Incremental Ranking and then for the Proposed Method without Incremental Ranking.

| Benchmark | #Errors | #Mutants | Time (s) | Success Rate (%) | #Mutants | Time (s) | Success Rate (%) |
|---|---|---|---|---|---|---|---|
| Alphablending | 2 | 7 | 17.1 | 100 | 7 | 16.9 | 100 |
| DIFFEQ | 2 | 10 | 19.1 | 100 | 10 | 18.5 | 100 |
| Sobel | 2 | 20 | 83.4 | 100 | 20 | 86.1 | 100 |
| FIR | 4 | 34 | 162.3 | 100 | 34 | 141.9 | 100 |
| DCT | 5 | 69 | 453.2 | 100 | 69 | 441.9 | 100 |
| IDCT | 5 | 77 | 414.9 | 100 | 77 | 388.3 | 100 |
| FFT32 | 3 | 25 | 106.3 | 100 | 25 | 122.7 | 100 |
| FFT256 | 3 | 40 | 211.9 | 100 | 40 | 207.1 | 100 |
| FFT512 | 4 | 75 | 411.5 | 80 | 75 | 399.1 | 80 |
| Average | 3.3 | 39.7 | 208.9 | 98 | 39.7 | 202.5 | 98 |

7 CONCLUSION AND FUTURE WORK

In this paper we have proposed a novel formal debugging technique with an auto-correction mechanism for datapath-intensive designs. To deal with the problem of multiple design errors, we have proposed a technique that significantly reduces the number of error candidates and then, based on a dynamic ranking technique, obtains a reduced ordered set of error candidates. The proposed techniques are based on a canonical decision diagram named M-HED, which supports modular polynomial computations over the ring Z2^n. In this way, M-HED not only supports equivalence verification of polynomials with multiple bit-width operands, but also helps localize probable bugs when two given polynomials are not equivalent. In addition, we have proposed a mutation-based correction mechanism that enables us to automatically and efficiently correct the circuits. One possible avenue for future work is to apply such a debugging technique to control-intensive designs.
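As a small worked instance of the modular arithmetic mentioned above: over Z2^n, syntactically different polynomials can be equivalent because some polynomials vanish modulo 2^n. For n = 3, x(x+1) is always even, so 4x(x+1) is always a multiple of 8. The brute-force check below (an illustration of the phenomenon only, not the M-HED procedure) confirms the resulting equivalence.

```python
# Over Z_8, f(x) = x^2 and g(x) = x^2 + 4x(x+1) agree for every x,
# even though they differ as integer polynomials, because 4x(x+1)
# is a vanishing polynomial modulo 2^3.
n = 3
mod = 1 << n
f = lambda x: (x * x) % mod
g = lambda x: (x * x + 4 * x * (x + 1)) % mod
equivalent = all(f(x) == g(x) for x in range(mod))
```

This is why equivalence checkers for bit-vector datapaths must reason modulo 2^n rather than over the integers: integer-level inequality does not imply inequality of the implemented circuits.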

REFERENCES
[1] P. Rashinkar, P. Paterson, and L. Singh, System-on-a-Chip Verification: Methodology and Techniques. Boston, MA: Kluwer, 2000.
[2] K. Chang, I. Wagner, V. Bertacco, and I. L. Markov, "Automatic Error Diagnosis and Correction for RTL Designs," in Proc. of High Level Design, Verification and Test (HLDVT'07), pp. 65-72, 2007.
[3] B. Alizadeh, "A Formal Approach to Debug Polynomial Datapath Designs," in Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC'12), pp. 683-688, 2012.
[4] B. Alizadeh and M. Fujita, "Automatic Merge-Point Detection for Sequential Equivalence Checking of System-level and RTL Descriptions," in Proc. of International Symposium on Automated Technology for Verification and Analysis (ATVA'07), pp. 129-144, 2007.
[5] B. Alizadeh and M. Fujita, "Modular Datapath Optimization and Verification based on Modular-HED," in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD'10), vol. 29, no. 9, pp. 1422-1435, 2010.
[6] K. Chang, I. Markov, and V. Bertacco, "Fixing Design Errors With Counterexamples and Resynthesis," in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD'08), vol. 27, no. 1, pp. 184-188, 2008.
[7] A. Smith, A. Veneris, M. F. Ali, and A. Viglas, "Fault Diagnosis and Logic Debugging Using Boolean Satisfiability," in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD'05), vol. 24, no. 10, pp. 1606-1621, 2005.
[8] A. Sülflow, G. Fey, R. Bloem, and R. Drechsler, "Using Unsatisfiable Cores to Debug Multiple Design Errors," in Proc. of Great Lakes Symposium on VLSI (GLSVLSI'08), pp. 77-82, 2008.
[9] H. Mangassarian, A. Veneris, S. Safarpour, M. Benedetti, and D. Smith, "A Performance-Driven QBF-Based Iterative Logic Array Representation with Applications to Verification, Debug and Test," in Proc. of International Conference on Computer Aided Design (ICCAD'07), pp. 240-245, 2007.
[10] S. Safarpour, H. Mangassarian, A. Veneris, M. H. Liffiton, and K. A. Sakallah, "Improved Design Debugging Using Maximum Satisfiability," in Proc. of Formal Methods in Computer Aided Design (FMCAD'07), pp. 13-19, 2007.
[11] R. Könighofer and R. Bloem, "Automated Error Localization and Correction for Imperative Programs," in Proc. of Formal Methods in Computer Aided Design (FMCAD'11), pp. 91-100, 2011.


[12] C. H. Shi and J. Y. Jou, "An Efficient Approach for Error Diagnosis in HDL Design," in Proc. of International Symposium on Circuits and Systems (ISCAS'03), pp. 732-735, 2003.
[13] G. Fey, S. Staber, R. Bloem, and R. Drechsler, "Automatic Fault Localization for Property Checking," in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD'08), vol. 27, no. 6, pp. 1138-1149, 2008.
[14] R. Bloem and F. Wotawa, "Verification and Fault Localization for VHDL Programs," in Journal of the Telematics Engineering Society (TIV'02), vol. 2, pp. 30-33, 2002.
[15] T. Jiang, C. Liu, and J. Jou, "Accurate Rank Ordering of Error Candidates for Efficient HDL Design Debugging," in IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems (TCAD'09), vol. 28, no. 2, pp. 272-284, 2009.
[16] S. Staber, B. Jobstmann, and R. Bloem, "Finding and Fixing Bugs," Springer-Verlag LNCS 3725, pp. 35-49, 2005.
[17] S. Mirzaeian, F. Zheng, and K.-T. Cheng, "RTL Error Diagnosis Using a Word-Level SAT-Solver," in Proc. of International Test Conference (ITC'08), pp. 1-8, 2008.
[18] B. R. Huang, T. J. Tsai, and C. N. Liu, "On Debugging Assistance in Assertion-based Verification," in Proc. of Workshop on Synthesis and System Integration of Mixed Information Technologies (SASIMI'04), pp. 290-295, 2004.
[19] V. Debroy and W. E. Wong, "Using Mutation to Automatically Suggest Fixes for Faulty Programs," in Proc. of International Conference on Software Testing, Verification and Validation (ICST'10), pp. 65-74, 2010.
[20] R. Drechsler, B. Becker, and S. Ruppertz, "The K*BMD: A Verification Data Structure," in IEEE Design and Test of Computers, vol. 2, issue 2, pp. 51-59, 1997.
[21] B. Becker, R. Drechsler, and R. Enders, "On the Representational Power of Bit-level and Word-level Decision Diagrams," in Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC'97), pp. 461-467, 1997.
[22] N. Hungerbühler and E. Specker, "A Generalization of the Smarandache Function to Several Variables," in Electronic Journal of Combinatorial Number Theory (INTEGERS'06), vol. 6, A23, 2006.
[23] D. Singmaster, "On Polynomial Functions (mod m)," vol. 6, pp. 345-352, 1974.
[24] A. S. Namin, J. H. Andrews, and Y. Labiche, "Using Mutation Analysis for Assessing and Comparing Testing Coverage Criteria," in IEEE Transactions on Software Engineering, vol. 32, no. 8, pp. 608-624, 2006.
[25] L. Halbeisen, N. Hungerbühler, and H. Läuchli, "Powers and Polynomials in Zm," in Elem. Math., vol. 54, pp. 118-129, 1999.
[26] M. Weiser, "Program Slicing," in IEEE Transactions on Software Engineering, vol. 10, no. 4, pp. 352-357, 1984.
[27] B. Korel and J. Laski, "Dynamic Program Slicing," in Information Processing Letters, vol. 29, no. 3, pp. 155-163, 1988.
[28] U. Repinski, H. Hantson, M. Jenihhin, J. Raik, R. Ubar, G. Di Guglielmo, G. Pravadelli, and F. Fummi, "Combining Dynamic Slicing and Mutation Operators for ESL Correction," in Proc. of European Test Symposium (ETS'12), pp. 1-6, 2012.
[29] A. Veneris, B. Keng, and S. Safarpour, "From RTL to Silicon: The Case for Automated Debug," in Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC'11), pp. 306-310, 2011.
[30] T. Y. Jiang, C. N. J. Liu, and J. Y. Jou, "Estimating Likelihood of Correctness for Error Candidates to Assist Debugging Faulty HDL Designs," in Proc. of International Symposium on Circuits and Systems (ISCAS'05), pp. 5682-5685, 2005.
[31] B. Alizadeh and M. Fujita, "A Functional Test Generation Technique for RTL Datapaths," in Proc. of International High Level Design Validation and Test Workshop (HLDVT'12), pp. 64-70, 2012.
[32] B. Alizadeh and M. Fujita, "A Canonical and Compact Hybrid Word-Boolean Representation as a Formal Model for Hardware/Software Co-Designs," in Proc. of Constraints in Formal Verification (CFV'07), pp. 15-29, 2007.
[33] A. Koelbl, R. Jacoby, H. Jain, and C. Pixley, "Solver Technology for System-level to RTL Equivalence Checking," in Proc. of Design Automation and Test in Europe (DATE'09), pp. 196-201, 2009.
[34] S. Gopalakrishnan and P. Kalla, "Optimization of Polynomial Datapaths Using Finite Ring Algebra," in ACM Transactions on Design Automation of Electronic Systems (TODAES'07), vol. 12, no. 4, pp. 49:1-49:30, 2007.
[35] R. Brinkmann and R. Drechsler, "RTL Datapath Verification Using Integer Linear Programming," in Proc. of Asia and South Pacific Design Automation Conference (ASP-DAC'02), pp. 741-746, 2002.
[36] B. Alizadeh and M. Fujita, "Modular Equivalence Verification of Polynomial Datapaths with Multiple Word-length Operands," in Proc. of High Level Design Validation and Test Workshop (HLDVT'11), pp. 9-16, 2011.
[37] H. Foster, "Assertion-based Verification: Industry Myths to Realities (invited tutorial)," in Proc. of International Conference on Computer Aided Verification (CAV'08), pp. 5-10, 2008.
[38] D. V. Campenhout, J. P. Hayes, and T. Mudge, "Collection and Analysis of Microprocessor Design Errors," in IEEE Design and Test of Computers, pp. 51-60, 2000.
[39] B. Alizadeh and P. Behnam, "Formal Equivalence Verification and Debugging Techniques with Auto-Correction Mechanism for RTL Designs," in Journal of Microprocessors and Microsystems (MICPRO'13), vol. 37, no. 8, pp. 1108-1121, 2013.

Bijan Alizadeh received the B.Sc., M.Sc. and Ph.D. degrees in Electrical and Computer Engineering from the University of Tehran, Iran, in 1995, 1998 and 2004, respectively. In 2005, he was with Sharif University of Technology, Tehran, Iran. From 2006 to 2007 he was a postdoctoral researcher, and from 2008 to 2010 a research associate, with the VLSI Design and Education Center (VDEC), University of Tokyo, Japan. In May 2011, he became an assistant professor with the School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran. His current research interests include verification, test and synthesis of high-level and system-level designs, post-silicon debugging, reconfigurable computing, and embedded system design methodologies.

Payman Behnam received the B.Sc. degree in computer science and engineering from the University of Shiraz, Shiraz, Iran, in 2009. He is currently an M.Sc. student in Computer Hardware Engineering at the University of Tehran, Tehran, Iran. He is interested in pre- and post-silicon debugging of digital systems, computer-aided design, variability and reliability management, and low-power and energy-efficient VLSI design in embedded systems. During his M.Sc. study, he has been working on debugging digital designs at different abstraction levels using SAT and SMT solvers as well as high-level decision diagrams.

Somayeh Sadeghi-Kohan received the B.Sc. degree in Computer Engineering from Shahid Bahonar University, Kerman, Iran, in 2005, and the M.Sc. degree in computer architecture from the University of Tehran, Tehran, Iran, in 2011. She is currently pursuing the Ph.D. degree in digital electronic systems at the University of Tehran. During her M.Sc. study, she worked on online test methods for on-chip interconnect testing, high-level design methodology based on TLM communications, and mixed-level design simulation. Her current research interests include the development of high-level design tools with emphasis on debugging and design for testability.
