Mechanizing Verification of Arithmetic Circuits: SRT Division *
Deepak Kapur 1 and M. Subramaniam 2 **
1 Computer Science Department, State University of New York, Albany, NY 12222
[email protected]
2 Functional Verification Group, Silicon Graphics Inc., Mountain View, CA 94040
[email protected]
Abstract. The use of a rewrite-based theorem prover for verifying properties of arithmetic circuits is discussed. A prover such as Rewrite Rule Laboratory (RRL) can be used effectively for establishing number-theoretic properties of adders, multipliers and dividers. Since verification of adders and multipliers has been discussed in earlier papers, the focus in this paper is on a divider circuit. An SRT division circuit similar to the one used in the Intel Pentium processor is mechanically verified using RRL. The number-theoretic correctness of the division circuit is established from its equational specification. The proof is generated automatically, and follows easily using the inference procedures for contextual rewriting and a decision procedure for the quantifier-free theory of numbers (Presburger arithmetic) already implemented in RRL. Additional enhancements to rewrite-based provers such as RRL that would further facilitate verifying properties of circuits with structure similar to that of the SRT division circuit are discussed.
1 Introduction

There has been considerable interest recently in using automated reasoning techniques to aid in enhancing confidence in hardware designs. A number of researchers have been exploring the use of BDD-based software, model checkers, theorem provers and verification systems for verifying properties of arithmetic circuits, cache-coherence protocols, and different kinds of processors including pipelined and scalable processors, as well as a commercial processor. Papers on these attempts have appeared in recent conferences such as CAV and FMCAD.

Intrigued by these attempts and results, we decided to try our theorem prover Rewrite Rule Laboratory (RRL) [11] for hardware verification, with the main objective of exploring circuits and their properties that can be verified automatically in a push-button mode. We have also been interested in identifying extensions and enhancements to RRL which would make it better suited for this application. In [8] and [7], we discussed how RRL had been used for verifying ripple-carry, carry-lookahead and carry-save adders, as well as a family of multipliers including Wallace-tree and Dadda multipliers.

* Partially supported by the National Science Foundation Grant no. CCR-9712366.
** This work was done while the author was at State University of New York, Albany.
Our experience in using RRL has been very encouraging. RRL can be used effectively, essentially in the push-button style, for proving number-theoretic properties of these circuits without having to fix their widths. Parametric circuits can be verified; descriptions common to a family of related circuits can be given and reasoned about. Proofs of components can be reused while attempting proofs of larger circuits; as an example, while reasoning about multipliers, adders used in them can be treated as black boxes insofar as they satisfy their specifications.

In this paper, we discuss how RRL can be used for reasoning about SRT division circuits. After reading [2] and [18], we first suspected that considerable user interaction with and guidance to RRL may be needed to verify the main properties of the circuit. The reported use of Mathematica and Maple in [2, 4] for reasoning about inequalities and real numbers, as well as the use of dependent types, table data structure, and other higher-order features in [18], initially discouraged us from attempting a mechanical verification of the division circuit using RRL. We subsequently discovered to our pleasant surprise that the proof reported in [2] could be easily found using RRL without any user guidance; a brief sketch of that proof is given in [5]. In fact, the mechanization of that proof was the easiest to do in RRL in contrast to the proofs of adders and multipliers in [8, 7].

We have recently found a much simpler and easier proof of the SRT division circuit by explicitly representing the quotient selection table. (It is widely believed that the bug in the Intel Pentium processor was in the quotient selection table.) In this paper, we discuss this new proof. Later, we contrast this proof with our earlier proof attempt as well as the proofs in [2, 18].

Four major features seem to have contributed to RRL being effective in mechanization attempts for hardware verification.

1. Fast contextual rewriting and reasoning about equality [23].
2. Decision procedures for numbers and freely constructed recursive data structures such as lists and sequences, and most importantly, their effective integration with contextual rewriting [6].
3. Cover set method for mechanization of proofs by induction [24], and its integration with contextual rewriting and decision procedures.
4. Intermediate lemma speculation heuristics.

In the next section, the SRT division algorithm and circuit are informally explained, with a special focus on the radix 4 SRT circuit. The interaction between the appropriate choice of radix, redundancy in quotient digits, quotient selection and remainder computations is briefly reviewed. The third section is a brief overview of the theorem prover RRL. Section 4 is an equational formalization of the SRT division circuit description in RRL. Section 5 is a brief sketch of how the proof of the two invariant properties of the circuit was done using RRL. Section 6 is a discussion of related work, and our experience in using RRL for the SRT division circuit. Section 7 concludes with some remarks on possible enhancements to RRL to make it better suited for verifying circuits using preprogrammed read-only memory (ROM).
2 SRT Division Algorithm and Circuit

The basic principles underlying the SRT division algorithm are reviewed. The SRT division algorithm proposed by Sweeney, Robertson [17] and Tocher [19] has been frequently used in commercial microprocessors due to its efficiency and ease of hardware implementation [20, 22]. Several expositions of the design of hardware divider circuits based on this algorithm appear in the literature [20, 15, 16, 3].

The SRT algorithm takes as input two normalized fractions, the dividend and the positive divisor, and outputs the quotient and the remainder. The focus in this paper is on this part of the division circuit as in [4, 2, 18]. It is assumed that a normalization circuit for handling signs and exponents is correct.

Much like the paper-and-pencil grade school division method, the SRT division algorithm is iterative: the quotient is computed digit by digit by repeatedly subtracting multiples of the divisor from the dividend. In each iteration, the algorithm selects a quotient digit, multiplies it with the divisor, and subtracts the result from the partial remainder computed so far. The result of the subtraction is the partial remainder for the next step. The partial remainder is initialized to be the dividend divided by $r$. The algorithm terminates once all the quotient digits have been computed. The algorithm can be formalized in terms of the following recurrences.

\begin{align*}
P_0 &:= \mathit{dividend}/r; \qquad Q_0 := 0; \\
P_{j+1} &:= r P_j - q_{j+1} \cdot \mathit{divisor}, \quad \text{for } j = 0, \ldots, n-1; \\
Q_{j+1} &:= r Q_j + q_{j+1}, \quad \text{for } j = 0, \ldots, n-1;
\end{align*}
where $P_j$ is the partial remainder at the beginning of the $j$-th iteration, and $0 \leq P_j < \mathit{divisor}$ for all $j$; $Q_j$ is the quotient at the beginning of iteration $j$; $q_j$ is the quotient digit at iteration $j$; $n$ is the number of digits in the quotient; and $r$ is the radix used for representing numbers. The alignment of the partial remainders and the multiples of the divisor being subtracted is achieved by left shifting the partial remainder at each step (i.e., by multiplying $P_j$ with the radix $r$). The correct positional placement of the quotient digit is similarly ensured by left shifting the partial quotient. And, the invariant $0 \leq P_j < \mathit{divisor}$ ensures that at each step, the highest multiple of the divisor less than the partial remainder is subtracted.

SRT dividers used in practice incorporate several performance-enhancing techniques while realizing the above recurrence. An important issue in implementing such an algorithm in hardware is the selection of the correct quotient digit at each step. A brute-force strategy of enumerating the multiples of the divisor until the subtraction leads to a number that is less than the divisor could be prohibitively expensive. SRT dividers instead use quotient digit selection functions in the form of look-up tables for guessing a quotient digit at each step of division based on the partial remainder and the divisor.

Two other major aspects resulting in the increased performance of SRT dividers are the choice of the radix in representing the quotient, and the choice of a signed-digit representation for the quotient digits. The former reduces the number of iterations required to get the quotient, and the latter reduces the time taken in each iteration by speeding up the partial remainder computation. In [20], tradeoffs between speed, radix choice, and redundancy of quotient digits are discussed.
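The recurrences above can be exercised with exact rational arithmetic. The following is a minimal sketch, not the circuit verified in this paper: the function name `radix_divide` is hypothetical, and the quotient digit is chosen by the brute-force rule floor(r * P_j / divisor) with non-redundant digits rather than by a selection table. It checks the bound 0 <= P_j < divisor and the relation r^j * P_0 = Q_j * divisor + P_j, which follows from the two recurrences by induction on j.

```python
from fractions import Fraction


def radix_divide(dividend, divisor, r=4, n=8):
    """Iterate the division recurrences with exact rationals.

    Assumption (illustration only): the quotient digit is the exact
    brute-force choice floor(r * P_j / divisor), not a table lookup.
    """
    dividend, divisor = Fraction(dividend), Fraction(divisor)
    p0 = dividend / r            # P_0 := dividend / r
    p, q_total = p0, 0           # Q_0 := 0
    for j in range(n):
        # Bound on the partial remainder at the start of iteration j.
        assert 0 <= p < divisor
        # Relation implied by the recurrences: r^j * P_0 = Q_j * divisor + P_j.
        assert r**j * p0 == q_total * divisor + p
        digit = (r * p) // divisor          # q_{j+1}, an int in 0 .. r-1
        p = r * p - digit * divisor         # P_{j+1}
        q_total = r * q_total + digit       # Q_{j+1}
    return q_total, p


quotient, remainder = radix_divide(Fraction(3, 4), Fraction(5, 8), r=4, n=3)
```

For the dividend 3/4 and divisor 5/8, three radix 4 iterations produce the digits 1, 0, 3, so the accumulated quotient is 19 and the final partial remainder is 1/8; indeed 4^3 * (3/16) = 12 = 19 * (5/8) + 1/8.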
2.1 Choosing Quotient Radix

In an SRT divider using the radix 2, each iteration produces one quotient bit, and $n$ iterations are required to produce a quotient of $n$ bit accuracy. The number of iterations can be reduced by choosing a higher radix. For example, with the radix 4, only $n/2$ iterations are needed; at each step, two quotient bits can be generated. The choice of a higher radix, however, entails more time in each iteration since the selection of the quotient digit and the generation of divisor multiples become more complicated. Typically, radix 4 is used in practice, since it seems to provide a reasonable trade-off between the number of iterations and the time spent in each iteration [20]. Multiplication by quotient digits 0, 1, 2, and 3 can be performed by shifting and adding/subtracting. The SRT divider specified and verified in this paper uses the radix 4.
2.2 Redundant Quotient Digit Representation

SRT dividers reduce the latency of each iteration by using a redundant signed-digit representation for the quotient digits. Typically, the digit values of a quotient represented with a radix $r$ can range from 0 through $r - 1$. In contrast, in a redundant signed-digit representation, the digit values of a quotient with radix $r$ are a consecutive set of integers $[-a, a]$ where $a$ is at least $\lceil r/2 \rceil$. Depending upon $a$, this allows for some redundancy. For example, a redundant signed-digit representation for a quotient with radix 4 would be the quotient digit set $\{-2, -1, 0, 1, 2\}$; this is in contrast to the 4 quotient digits commonly used for radix 4: $\{0, 1, 2, 3\}$. The value of a quotient with signed digits is interpreted by subtracting the binary weights of the negative digits from the non-negative ones. Due to the redundancy in the representation, more than one quotient can map onto the same number. For example, the quotients $10(-2)$ and $1(-1)2$ in radix 4 both have the value $1 \cdot 4^2 - 2 \cdot 1 = 14 = 1 \cdot 4^2 - (1 \cdot 4) + 2$.

An advantage of using the above quotient digit set is that divisor multiples are generated simply by shifting. This is in contrast to the unsigned quotient digit set representation for radix 4, for which it is necessary to implement a shift followed by an add/subtract to generate 3 times the divisor. More importantly, redundancy among quotient digits allows the quotient digits to be selected based on only a few significant bits of the partial remainder and the divisor. This reduces the complexity of the quotient selection table, and also allows the multiplication and subtraction stage of an iteration to be overlapped with the quotient selection stage of the successive iteration. The radix 4 SRT divider in this paper uses the redundant signed-digit representation $[-2, 2]$.
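The redundancy example above can be checked mechanically. The following minimal sketch evaluates a signed-digit string, most significant digit first, by Horner's rule; the helper name `signed_digit_value` is hypothetical and introduced only for illustration.

```python
def signed_digit_value(digits, radix=4):
    """Value of a signed-digit string, most significant digit first."""
    value = 0
    for d in digits:
        # Shift the value accumulated so far by one digit position, then add d.
        value = radix * value + d
    return value


# The two radix 4 quotients from the text, 10(-2) and 1(-1)2.
first = signed_digit_value([1, 0, -2])
second = signed_digit_value([1, -1, 2])
```

Both calls evaluate to 14, so the two distinct digit strings denote the same number.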
[Figure 1: P-D plot. Vertical axis: shifted partial remainder (0 to 16/3); horizontal axis: divisor (8/8 to 15/8). The lines 1/3D, 2/3D, 4/3D, 5/3D and 8/3D bound the regions labeled qj = 0, qj = (0, 1), qj = 1, qj = (1, 2) and qj = 2.]

Fig. 1. P-D Plot for Radix 4
2.3 Quotient Selection Function
The SRT division algorithm with redundant signed-digit quotient representation allows the selected quotient digits to be inexact within certain bounds; the partial remainder generated in a step could be negative. The bound on the successive partial remainders using a redundant signed-digit representation $[-a, a]$ for radix $r$ is

$$-D \cdot a/(r - 1) \leq P_j \leq D \cdot a/(r - 1),$$

where $D$ is the divisor. By substituting the recurrence for the successive partial remainders, the range of shifted partial remainders that allow a quotient digit $k$ to be chosen is:

$$[(k - a/(r - 1)) \cdot D,\; (k + a/(r - 1)) \cdot D].$$

The correlation between the shifted partial remainder range $P$ and divisor $D$ in the SRT division algorithm is diagrammatically plotted as a P-D plot. The shifted partial remainder and the divisor form the axes of the plot, which illustrates the shifted partial remainder ranges in which a quotient digit can be selected without violating the bounds on the next partial remainder. The P-D plot for a radix 4 quotient with redundant digit set $[-2, 2]$ is given in Figure 1. As the reader would notice, when the partial remainder is in the range $[5/3\,D, 8/3\,D]$, the quotient digit 2 is selected. The shaded regions represent quotient digit overlaps where more than one quotient digit selection is feasible. So if the partial remainder is in the range $[4/3\,D, 5/3\,D]$, either 2 or 1 can be used.
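The selection ranges above can be enumerated directly. The following minimal sketch, with the hypothetical helper name `feasible_digits`, lists every digit k in [-a, a] whose range [(k - a/(r-1)) D, (k + a/(r-1)) D] contains a given shifted partial remainder. It works on exact rationals and is only an illustration of the bounds, not the truncated table lookup used in the circuit.

```python
from fractions import Fraction


def feasible_digits(shifted_p, divisor, r=4, a=2):
    """Digits k in [-a, a] whose selection range contains shifted_p = r * P_j."""
    rho = Fraction(a, r - 1)                 # redundancy factor a / (r - 1)
    shifted_p, divisor = Fraction(shifted_p), Fraction(divisor)
    return [
        k
        for k in range(-a, a + 1)
        if (k - rho) * divisor <= shifted_p <= (k + rho) * divisor
    ]


# With D = 1: 3/2 lies in the overlap [4/3 D, 5/3 D]; 2 lies only in the range of digit 2.
overlap = feasible_digits(Fraction(3, 2), 1)
unique = feasible_digits(2, 1)
```

For every digit k returned, the next partial remainder shifted_p - k * D again satisfies the bound |P| <= D * a/(r - 1), which is exactly the condition the ranges were derived from.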
parrem                                  Divisor f1.f2f3f4
g7g6g5g4.g3g2g1  1.000  1.001  1.010  1.011  1.100  1.101  1.110  1.111
1010.0               -      -      -      -      -      -      -      -
1010.1               -      -      -      -      -      -     -2     -2
1011.0               -      -      -      -      -     -2     -2     -2
1011.1               -      -      -     -2     -2     -2     -2     -2
1100.0               -      -     -2     -2     -2     -2     -2     -2
1100.1              -2     -2     -2     -2     -2     -2     -2     -2
1101.0              -2     -2     -2     -2     -2     -2      B     -1
1101.1              -2     -2     -2      B     -1     -1     -1     -1
1110.0               A      B     -1     -1     -1     -1     -1     -1
1110.1              -1     -1     -1     -1     -1     -1     -1     -1
1111.0              -1     -1      D      D      0      0      0      0
1111.1               0      0      0      0      0      0      0      0
0000.0               0      0      0      0      0      0      0      0
0000.1               1      1      1      1      E      0      0      0
0001.0               1      1      1      1      1      1      1      1
0001.1               2      C      1      1      1      1      1      1
0010.0               2      2      2      2      C      1      1      1
0010.1               2      2      2      2      2      2      2      1
0011.0               -      2      2      2      2      2      2      2
0011.1               -      -      2      2      2      2      2      2
0100.0               -      -      -      -      2      2      2      2
0100.1               -      -      -      -      -      2      2      2
0101.0               -      -      -      -      -      -      -      2
0101.1               -      -      -      -      -      -      -      -

Table 1. Quotient Digit Selection Table
For selecting an appropriate quotient digit, it is not necessary to know the exact value of the shifted partial remainder $P$ or the divisor $D$. It suffices to know the region in which the ratio $P/D$ lies in Figure 1. Due to the overlap between the lower bound for the $P/D$ ratio for quotient digit $k$ and the upper bound for the quotient digit $k - 1$, the $P/D$ ratio can be approximated in choosing quotient digits. For instance, for a radix 4 SRT divider with partial remainders and divisor of width $n$, $n > 8$, it suffices to consider partial remainders up to 7 bits of accuracy and a divisor up to 4 bits of accuracy [20].

The quotient selection table implementing the P-D plot for radix 4 is reproduced above from [20]. Rows are indexed by the shifted truncated partial remainder $g_7g_6g_5g_4.g_3g_2g_1$; columns are indexed by the truncated divisor $f_1.f_2f_3f_4$; table entries are the quotient digits. The table is compressed by considering only row indices up to 5 bits, since only a few entries in the table depend upon the 2 least significant bits $g_2g_1$ of the shifted partial remainder. For those cases, the table entries are symbolic values $A$, $B$, $C$, $D$, $E$, defined as:

$$A = -(2 - g_2 g_1); \quad B = -(2 - g_2); \quad C = 1 + g_2; \quad D = -1 + g_2; \quad E = g_2.$$

These entries as well as other aspects of the selection table are further discussed in subsection 4.1, where we show how the table is input to RRL. The - entries in the table are for the cases of the shifted truncated partial remainder and truncated divisor pairs which are not supposed to arise during the computations.
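The symbolic table entries can be resolved to concrete quotient digits once the two low-order bits of the shifted partial remainder are known. The following is a minimal sketch with the hypothetical helper name `symbolic_entry`; it assumes that the juxtaposition of g2 and g1 in the definition of A denotes the product of the two bits, which is an interpretation for illustration rather than something stated in the text.

```python
def symbolic_entry(symbol, g2, g1):
    """Resolve a symbolic table entry to a quotient digit from bits g2 and g1."""
    entries = {
        "A": -(2 - g2 * g1),   # assumption: "g2 g1" read as the product of the bits
        "B": -(2 - g2),
        "C": 1 + g2,
        "D": -1 + g2,
        "E": g2,
    }
    return entries[symbol]


# Every symbolic entry resolves to a digit of the redundant set [-2, 2].
resolved = {
    (s, g2, g1): symbolic_entry(s, g2, g1)
    for s in "ABCDE"
    for g2 in (0, 1)
    for g1 in (0, 1)
}
```

Under this reading, each symbolic entry switches between two adjacent quotient digits (for example, B is -2 when g2 = 0 and -1 when g2 = 1), which matches its position on a boundary between two digit regions of the table.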
2.4 Divider Circuit
A radix 4 SRT divider circuit using the signed-digit representation $[-2, 2]$ is given in Figure 2. The registers divisor and remainder in the circuit hold the value of the divisor and the successive partial remainders, respectively. The register
rout A+B