Comparing Techniques for Out-of-Order Processor ... - Semantic Scholar

Comparing Techniques for Out-of-Order Processor Verification in UCLID

Shuvendu K. Lahiri
Electrical and Computer Engineering Department
Carnegie Mellon University, Pittsburgh, PA 15213
[email protected]

Sanjit A. Seshia, Randal E. Bryant
Computer Science Department
Carnegie Mellon University, Pittsburgh, PA 15213
fsanjit.seshia,[email protected]

Abstract

In this paper, we show the verification of out-of-order processors in a tool called UCLID. The processor is modeled using the Logic of Counter Arithmetic with Lambda Expressions and Uninterpreted Functions (CLU), where data words are abstracted with unbounded integers, functional units including ALUs are modeled with uninterpreted functions, and memories and queues are modeled with restricted lambda expressions. We look at three different verification techniques for the processor which offer different degrees of automation and guarantee: bounded property checking, Burch-Dill correspondence checking, and deductive verification. We show the strengths and weaknesses of each approach and suggest a place for each of the techniques in an overall verification flow.

1 UCLID: The overall system

The tool UCLID [3, 5] uses the logic of CLU (described in Fig 1) to model and verify systems with unbounded resources. CLU is a fragment of quantifier-free first-order logic extended with increment (succ), decrement (pred), equality and inequality operations over terms (integer expressions). ITE denotes the “if-then-else” constructor to choose between two terms depending on a Boolean control. Expressions in CLU describe means of computing four different types of values. Boolean expressions, also termed formulas, yield true or false. Integer expressions, also referred to as terms, yield integer values. Predicate expressions denote functions from integers to Boolean values. Function expressions, on the other hand, denote functions from integers to integers.

1.1 System Modeling

CLU can express a wide variety of data structures and system types. Uninterpreted functions (UIFs) provide a natural means for abstracting data and data operations. Thus, complex datapath operations, including the ALU, can be replaced with generic uninterpreted functions. UIFs can also be used to specify arbitrary initial values for memories and arrays. The introduction of lambda notation allows us to model the effect of a sequence of read and write operations on a memory. At any point of system operation, a memory is represented by a function expression $M$ denoting a mapping from addresses to values. The effect of a write operation with integer expressions $A$ and $D$ denoting the address and data values yields a function expression $M'$:

\[ M' = \lambda\, \mathit{addr} .\ \mathrm{ITE}(\mathit{addr} = A,\ D,\ M(\mathit{addr})) \]

Modeling of other forms of memory, including Content Addressable Memory (CAM) and parallel-update memories (where an arbitrary number of entries can be modified in a single step), is also possible using lambda notation [3, 5]. Counter arithmetic provides us the ability to express counters and some forms of pointers. Combining lambda expressions with counters allows us to model arbitrarily large queues and stacks. Counters are used to model the head and tail pointers of the queue, and a lambda expression denotes the contents of the queue, mapping an index in the queue to the value present at that index. For example, consider a queue $Q = \langle Q.\mathit{contents},\ Q.\mathit{head},\ Q.\mathit{tail} \rangle$. Pushing data item $X$ into $Q$ returns a new queue $Q'$ where

\[ Q'.\mathit{tail} = \mathrm{succ}(Q.\mathit{tail}) \]
\[ Q'.\mathit{contents} = \lambda\, i .\ \mathrm{ITE}(i = Q.\mathit{tail},\ X,\ Q.\mathit{contents}(i)) \]
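The memory write and the queue push above can be mimicked with ordinary closures. The following is a minimal Python sketch, not UCLID input: the names `write`, `push`, `Queue`, and `initial_memory` are hypothetical, and leaving `head` unchanged on a push is an assumption that the text does not state explicitly.

```python
from dataclasses import dataclass
from typing import Callable

# A memory state is a function from addresses to values.
Memory = Callable[[int], int]


def write(mem: Memory, a: int, d: int) -> Memory:
    """Return M' = lambda addr. ITE(addr = A, D, M(addr))."""
    return lambda addr: d if addr == a else mem(addr)


@dataclass(frozen=True)
class Queue:
    contents: Memory  # maps a queue index to the value stored there
    head: int         # counter modeling the head pointer
    tail: int         # counter modeling the tail pointer


def push(q: Queue, x: int) -> Queue:
    """Return Q' with Q'.tail = succ(Q.tail) and updated contents."""
    new_contents = lambda i: x if i == q.tail else q.contents(i)
    # Assumption: the head pointer is left unchanged by a push.
    return Queue(contents=new_contents, head=q.head, tail=q.tail + 1)


def initial_memory(addr: int) -> int:
    """Stand-in for an uninterpreted function giving arbitrary initial values."""
    return 1000 + addr


m1 = write(initial_memory, 5, 42)
q0 = Queue(contents=initial_memory, head=0, tail=0)
q1 = push(push(q0, 7), 9)
```

Here `m1(5)` yields the written value while every other address still reads through to `initial_memory`, which is the point of the ITE-based update: the new state is defined in terms of the old one without enumerating the address space.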

\[
\begin{array}{rcl}
\textit{bool-expr} & ::= & \mathbf{true} \mid \mathbf{false} \mid \neg\,\textit{bool-expr} \mid (\textit{bool-expr} \land \textit{bool-expr}) \\
 & \mid & (\textit{int-expr} = \textit{int-expr}) \mid (\textit{int-expr} < \textit{int-expr}) \\
 & \mid & \textit{predicate-expr}(\textit{int-expr}, \ldots, \textit{int-expr}) \\
\textit{int-expr} & ::= & \textit{int-var} \mid \mathrm{ITE}(\textit{bool-expr}, \textit{int-expr}, \textit{int-expr}) \\
 & \mid & \mathrm{succ}(\textit{int-expr}) \mid \mathrm{pred}(\textit{int-expr}) \\
 & \mid & \textit{function-expr}(\textit{int-expr}, \ldots, \textit{int-expr}) \\
\textit{predicate-expr} & ::= & \textit{predicate-symbol} \mid \lambda\, \textit{int-var}, \ldots, \textit{int-var} .\ \textit{bool-expr} \\
\textit{function-expr} & ::= & \textit{function-symbol} \mid \lambda\, \textit{int-var}, \ldots, \textit{int-var} .\ \textit{int-expr}
\end{array}
\]

Figure 1. CLU Syntax.

1.2 Verification with UCLID

The UCLID verification engine comprises a symbolic simulator that can be “configured” for different kinds of verification tasks, and a decision procedure for CLU [3]. There is a facility for generating counterexamples for verification failures: a counterexample gives a valuation of the state variables (including function and predicate state variables) that leads to the failure. We can perform three different verification tasks for the systems described in UCLID:

1. Bounded property checking, where a safety property is checked for all states reachable within a fixed number of steps. The system is simulated for a fixed number of steps starting from the reset state. At each step, the decision procedure is invoked to check the validity of some safety property. If the property fails, then we can generate a counterexample trace from the reset state.

2. Burch-Dill correspondence checking, to verify that the implementation machine is simulated by the specification machine, based on the Instruction Set Architecture (ISA) model. An abstraction function relates a state of the implementation (the out-of-order processor) with a state of the ISA machine. This is usually the state of the processor obtained by completing all the partially executed instructions in the processor.

3. Deductive verification, where the system is started from the most general state which satisfies a set of user-specified invariants and then simulated for one step. The invariants are checked at the next step to ensure that the state transition preserves them. If the invariants hold for the reset state, and the invariants are preserved by the transition function, then the invariants hold for any reachable state of the model. We can express an interesting class of invariants with universal quantifiers and can use automatic quantifier instantiation strategies to decide the proof obligations with a high degree of automation.

2 Out-of-Order Processor Description

OOO (depicted in Figure 2) is a simple, non-speculative processor with unbounded resources, out-of-order instruction execution, in-order retirement, and support for register renaming. The only instructions permitted are arithmetic and logical (ALU) instructions with two source operands and one destination operand.

[Figure 2: block diagram not reproduced. Labels recoverable from the original: PROGRAM MEMORY, PC, DECODE, opcode, src1, src2, dest, REGISTER FILE, REORDER BUFFER, HEAD, TAIL, VALID?, OPCODE, SRC1 VAL, SRC1 TAG, SRC1 VALID?, SRC2 VAL, SRC2 TAG, SRC2 VALID?, DEST REG ID, RESULT, Execution Units, RESULT BUS, dispatch, execute, retire.]

Figure 2. OOO: An out-of-order execution unit.

As shown in Figure 2, an instruction is read from program memory, decoded, and dispatched to the end of the reorder buffer, which is modeled as an infinite queue. An arbitrary subset of instructions with ready operands can execute out-of-order. Finally, an instruction is retired (the program state updated) once it is at the head of the reorder buffer. The operations dispatch, execute and retire happen concurrently at each step.

The register file is modeled as an infinite memory indexed by register ID. Each entry of the register file has a bit, rf.v, a value rf.vl and a tag rf.t. If the rf.v bit is true, then rf.vl contains a valid value; otherwise, rf.t holds the tag of the most recent instruction that will write to this register.

The reorder buffer has two pointers: rb.hd, which points to the oldest instruction in the reorder buffer, and rb.tl, where a newly dispatched instruction is added. The index of an entry in the reorder buffer serves as its tag. Each entry in the reorder buffer has a valid bit rb.v indicating if the instruction has finished execution. It has fields for the two operands, rb.s1vl and rb.s2vl. The bit rb.s1v indicates if the first operand is ready. If the first operand does not have valid data, rb.s1t holds the tag of the instruction which would produce the operand data. There are similar fields for the second operand. Finally, each entry contains the destination register identifier rb.d and the result of the instruction, rb.vl, to be written back.

When an instruction is dispatched, if a source register is marked valid in the register file, the contents of that register are filled into the corresponding operand field for the instruction in the reorder buffer and it is marked valid. If the instruction which would write to the source register has finished execution, then the corresponding operand field copies the result of that instruction and the operand is marked valid. Otherwise, the operand copies the tag present with the source register into its tag field and the operand is marked invalid.

When an instruction executes, it updates its result and broadcasts the result on the result bus so that all other instructions in the reorder buffer that are waiting on it can update their operand fields. Finally, when a completed instruction reaches the head of the reorder buffer, it is retired. If the tag of the retiring instruction matches the rf.t for the destination register, the result of the instruction is written back into the destination register, and that register is marked valid. Otherwise, the register file remains unchanged.

3 Verification of the Processor

In this section, we describe the analysis of the processor using the three verification techniques.

3.1 Bounded Property Checking

The presence of an efficient decision procedure in UCLID enables automatic exploration of the state space up to a reasonable depth. The generated counterexamples greatly facilitated debugging and helped remove most errors in the design. Fig 3 shows the results of performing bounded property checking for two different properties, tag-consistency and rf-rob. The first property (tag-consistency) ensures that two distinct registers in use cannot have the same instruction modifying them. The second property (rf-rob) states that if an instruction modifies a register r, then the destination register for the instruction in the ROB should be r. The results are compared with another decision procedure, SVC [1], which can also decide the formulas. The results demonstrate that we can explore a much larger portion of the state space using UCLID than with comparable tools.
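The idea behind the bounded property checking of Section 3.1 can be illustrated on a toy, control-only abstraction of OOO. The sketch below is an explicit-state analogue written in Python, not the symbolic simulation that UCLID performs. It assumes a hypothetical two-register machine, ignores data values and execution status, and takes one action (dispatch or retire) per step. The `buggy` flag is an invented fault, used only to show a counterexample trace.

```python
from collections import deque
from typing import NamedTuple, Optional

NUM_REGS = 2  # hypothetical tiny register file so states can be enumerated


class State(NamedTuple):
    rf_v: tuple   # rf.v: register holds a valid value
    rf_t: tuple   # rf.t: tag of the most recent pending writer
    rob_d: tuple  # rb.d for tags 0..tl-1 (retired entries kept for simplicity)
    hd: int       # rb.hd
    tl: int       # rb.tl


RESET = State((True,) * NUM_REGS, (0,) * NUM_REGS, (), 0, 0)


def dispatch(s: State, d: int, buggy: bool = False) -> State:
    """Add an instruction with destination d at the tail of the ROB."""
    rf_v, rf_t = list(s.rf_v), list(s.rf_t)
    rf_v[d] = False
    if not buggy:  # the invented bug forgets to record the new tag
        rf_t[d] = s.tl
    return State(tuple(rf_v), tuple(rf_t), s.rob_d + (d,), s.hd, s.tl + 1)


def retire(s: State) -> State:
    """Retire the head entry; write back only if its tag matches rf.t."""
    if s.hd == s.tl:
        return s  # empty ROB, nothing to retire
    d = s.rob_d[s.hd]
    rf_v = list(s.rf_v)
    if not s.rf_v[d] and s.rf_t[d] == s.hd:
        rf_v[d] = True
    return s._replace(rf_v=tuple(rf_v), hd=s.hd + 1)


def violated(s: State) -> Optional[str]:
    """Return the name of the first violated property, or None."""
    in_use = [r for r in range(NUM_REGS) if not s.rf_v[r]]
    tags = [s.rf_t[r] for r in in_use]
    if len(set(tags)) != len(tags):
        return "tag-consistency"
    for r in in_use:
        t = s.rf_t[r]
        # The range test guards the lookup and mirrors rb.hd <= tag < rb.tl.
        if not (s.hd <= t < s.tl and s.rob_d[t] == r):
            return "rf-rob"
    return None


def bounded_check(depth: int, buggy: bool = False):
    """Breadth-first search of all states reachable within `depth` steps."""
    frontier = deque([(RESET, [])])
    seen = {RESET}
    while frontier:
        s, trace = frontier.popleft()
        bad = violated(s)
        if bad:
            return bad, trace  # counterexample trace from the reset state
        if len(trace) == depth:
            continue
        succs = [(f"dispatch r{d}", dispatch(s, d, buggy)) for d in range(NUM_REGS)]
        succs.append(("retire", retire(s)))
        for name, nxt in succs:
            if nxt not in seen:
                seen.add(nxt)
                frontier.append((nxt, trace + [name]))
    return None
```

Calling `bounded_check(6)` explores every action sequence of length at most six from the reset state, while `bounded_check(6, buggy=True)` stops at the first violating state and returns the property name together with the trace that reaches it. As in the paper, a passing result says nothing about behavior beyond the chosen depth.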

Property          #steps   F_clu size   UCLID time   SVC time
tag-consistency   6        346          0.87         0.22
                  10       2566         10.80        233.18
                  14       7480         76.55        > 5 hrs
                  20       19921        1679.12      > 1 day
rf-rob            10       2308         10.31        160.84
                  14       7392         71.29        > 8 hrs
                  20       19791        777.12       > 1 day

Figure 3. Experimental results for Bounded Property Checking with OOO. Here “steps” indicates the number of steps of symbolic simulation, “F_clu” denotes the CLU formula obtained after the symbolic simulation, and the “size” of a formula denotes the number of distinct nodes in the Directed Acyclic Graph (DAG) representing the formula. “SVC time” is the time taken by SVC 1.1 to decide the CLU formula.

3.2 Burch-Dill Correspondence Checking

To perform Burch-Dill verification, we must restrict the number of entries in the ROB to a finite value so that flushing takes a bounded number of steps. The size of the register file can still be unbounded.

Unlike the verification of in-order pipelined processors [2], the verification of the out-of-order processor requires us to restrict the start state. The main differences are the presence of ordering between the instructions in the ROB, the interaction between the register-renaming unit and the ROB, and the out-of-order nature of execution. We required the following invariants to restrict the start state of the verification:

1. For an entry t in the ROB, if an operand does not contain a value, then there is an earlier non-executed instruction (rb.s1t(t)) in the ROB on which this instruction depends. For the first operand, this means:

\[ \tilde{\forall} t .\ [\, \neg \mathit{rb.s1v}(t) \Rightarrow (\neg \mathit{rb.v}(\mathit{rb.s1t}(t)) \land (\mathit{rb.hd} \le \mathit{rb.s1t}(t) < t)) \,] \qquad (\mathrm{B1}) \]

A similar invariant exists for the second operand.

2. If a register r is being modified, then the instruction modifying it is present in the ROB, and its destination field contains r.

\[ \forall r .\ [\, \neg \mathit{rf.v}(r) \Rightarrow ((\mathit{rb.d}(\mathit{rf.t}(r)) = r) \land (\mathit{rb.hd} \le \mathit{rf.t}(r) < \mathit{rb.tl})) \,] \qquad (\mathrm{D}) \]

3. For any entry t in the ROB, if r = rb.d(t) is the destination register, then r is being modified by either t or a later instruction.

\[ \tilde{\forall} t .\ [\, (t \le \mathit{rf.t}(\mathit{rb.d}(t)) < \mathit{rb.tl}) \land \neg \mathit{rf.v}(\mathit{rb.d}(t)) \,] \qquad (\mathrm{C}) \]

where $\tilde{\forall} t .\ \phi(t)$ is an abbreviation for $\forall t .\ ((\mathit{rb.hd} \le t < \mathit{rb.tl}) \Rightarrow \phi(t))$. Since there are a finite number of entries in the ROB, the quantifier can also be replaced by a conjunction over the indices in the ROB.

To reduce the complexity of flushing, we use the accelerated flushing approach proposed by Burch [4]. This involves safely modifying the design to insert transitions which do not affect the normal mode of operation. We introduce an external signal flush to drain the processor. If flush is high, then at each step, the instruction at rb.hd is executed if it has not been executed yet. The dependent instructions get the result as usual. At the same time, the result at rb.hd (either already present or the result of the execution) is written to the destination register and rb.hd is incremented by 1. Thus the buffer gets “flushed” in k steps for a k-deep buffer.

#-buffers   F_clu size   # of terms   UCLID time
2           398          63           6.83
3           618          83           30.23
4           886          103          157.41
6           1534         143          3051.79
8           2342         183          *

Figure 4. Results with Burch-Dill technique. F_clu is the CLU formula. “terms” are the integer symbolic constants in F_clu after eliminating function applications; “UCLID time” is the time taken by our decision procedure.

Fig 4 shows the result of performing the experiments with increasing buffer sizes. There is an exponential growth in the time with increasing buffer width. We could not compare the results against SVC, since SVC uses a rational interpretation of numeric values, which gives spurious counterexamples in this case.

3.3 Deductive Verification

We verify the OOO processor by proving a refinement map between OOO and a sequential Instruction Set Architecture (ISA) model. The ISA contains a program counter Isa.PC and a register file Isa.rf. The program counter Isa.PC is synchronized with the program counter for OOO. Isa.rf maintains the state of the register file when all the instructions in the reorder buffer (ROB) have retired and the ROB is empty. Every time an instruction I = (r1, r2, d, op) is decoded and put into the ROB, the result of the instruction is computed and written to the destination register d in the ISA register file as follows:

\[ \mathit{Isa.rf}(d) \leftarrow \mathit{Alu}(op,\ \mathit{Isa.rf}(r1),\ \mathit{Isa.rf}(r2)) \]

where Alu is an uninterpreted function to abstract the actual computation of the execution unit.

To state the invariants for the OOO processor, we maintain some auxiliary state elements in addition to the state variables of the OOO unit. These structures are very similar to the auxiliary structures used by McMillan [6] for verifying the correctness of out-of-order processors. We maintain a shadow reorder buffer, sh.rob, where each entry contains the correct values of the operands and the result. This structure is used to reason about the correctness of values in the ROB entries. sh.rob is a triple (sh.vl, sh.s1vl, sh.s2vl). sh.vl(t) contains the correct value of rb.vl(t) in the ROB. Similarly, the other fields in sh.rob contain the correct values for the two data operands. When an instruction I = (r1, r2, d, op) is decoded, the sh.rob structure at rb.tl is updated as follows:

\[ \mathit{sh.vl}(\mathit{rb.tl}) \leftarrow \mathit{Alu}(op,\ \mathit{Isa.rf}(r1),\ \mathit{Isa.rf}(r2)) \]
\[ \mathit{sh.s1vl}(\mathit{rb.tl}) \leftarrow \mathit{Isa.rf}(r1) \]
\[ \mathit{sh.s2vl}(\mathit{rb.tl}) \leftarrow \mathit{Isa.rf}(r2) \]
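The ISA and shadow-buffer updates performed at decode time can be sketched in Python as follows. This is only an illustration: the uninterpreted function Alu is modeled by building an unevaluated term, the class name `IsaAndShadow` and the `init_<register>` symbols are hypothetical, and the sketch assumes that both source registers are read before the destination is written and that the tail counter advances on every decode.

```python
from typing import Dict, Tuple, Union

# A symbolic value: an initial register symbol or an unevaluated Alu term.
Term = Union[str, Tuple]


def alu(op: str, a: Term, b: Term) -> Term:
    """Uninterpreted function: build the term Alu(op, a, b) without evaluating it."""
    return ("Alu", op, a, b)


class IsaAndShadow:
    """ISA register file plus the shadow reorder buffer (sh.vl, sh.s1vl, sh.s2vl)."""

    def __init__(self, regs):
        # Arbitrary initial register contents, one fresh symbol per register.
        self.isa_rf: Dict[str, Term] = {r: f"init_{r}" for r in regs}
        self.sh_vl: Dict[int, Term] = {}
        self.sh_s1vl: Dict[int, Term] = {}
        self.sh_s2vl: Dict[int, Term] = {}
        self.rb_tl = 0  # tag that the next decoded instruction receives

    def decode(self, r1: str, r2: str, d: str, op: str) -> int:
        """Process instruction I = (r1, r2, d, op) and return its tag."""
        # Assumption: operands are read before the destination is written.
        a, b = self.isa_rf[r1], self.isa_rf[r2]
        result = alu(op, a, b)
        t = self.rb_tl
        self.sh_vl[t] = result    # sh.vl(rb.tl)   <- Alu(op, Isa.rf(r1), Isa.rf(r2))
        self.sh_s1vl[t] = a       # sh.s1vl(rb.tl) <- Isa.rf(r1)
        self.sh_s2vl[t] = b       # sh.s2vl(rb.tl) <- Isa.rf(r2)
        self.isa_rf[d] = result   # Isa.rf(d)      <- Alu(op, Isa.rf(r1), Isa.rf(r2))
        self.rb_tl = t + 1
        return t


model = IsaAndShadow(["r0", "r1"])
t0 = model.decode("r0", "r1", "r0", "add")
t1 = model.decode("r0", "r1", "r1", "sub")
```

After the two decodes, the first shadow operand of the second instruction is exactly the shadow result of the first one, since both were read from the ISA register file in program order. Representing Alu as a term constructor keeps the sketch faithful to the abstraction: two results are equal only if they were built from the same opcode and operands.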

Correctness criteria. The correctness is established by proving the following refinement map between the register file of the OOO unit and the ISA register file.

\[ \forall r .\ [\, \mathit{rf.v}(r) \Rightarrow (\mathit{Isa.rf}(r) = \mathit{rf.vl}(r)) \,] \qquad (\mathrm{Ha}) \]

The lemma states that if a register is not the destination of any of the instructions in the ROB, then the values in the OOO model and the ISA model are the same. Below are the added invariants for this purpose; the invariants for the control path are already described in the section on Burch-Dill verification. Invariant E1 states the relationship between sh.s1vl and rb.s1vl. There is a similar invariant for the second operand.

\[ \tilde{\forall} t .\ [\, \mathit{rb.s1v}(t) \Rightarrow (\mathit{sh.s1vl}(t) = \mathit{rb.s1vl}(t)) \,] \qquad (\mathrm{E1}) \]

A similar invariant is also stated for rb.vl and sh.vl.

\[ \tilde{\forall} t .\ [\, \mathit{rb.v}(t) \Rightarrow (\mathit{sh.vl}(t) = \mathit{rb.vl}(t)) \,] \qquad (\mathrm{Ga}) \]

To relate the result of execution to the correct value for any entry, we state:

\[ \tilde{\forall} t .\ [\, \mathit{sh.vl}(t) = \mathit{Alu}(\mathit{rb.op}(t),\ \mathit{sh.s1vl}(t),\ \mathit{sh.s2vl}(t)) \,] \qquad (\mathrm{Gb}) \]

The following invariant asserts that the correct value of a data operand which is not ready is the result of the instruction which would produce the data.

\[ \tilde{\forall} t .\ [\, \neg \mathit{rb.s1v}(t) \Rightarrow \mathit{sh.s1vl}(t) = \mathit{sh.vl}(\mathit{rb.s1t}(t)) \,] \qquad (\mathrm{K1}) \]

Finally, the invariant H relates the value of a register r in the shadow register file with the result of the instruction which would write back to the register.

\[ \forall r .\ [\, \neg \mathit{rf.v}(r) \Rightarrow \mathit{Isa.rf}(r) = \mathit{sh.vl}(\mathit{rf.t}(r)) \,] \qquad (\mathrm{H}) \]

Auxiliary Invariants. In addition, we need a set of auxiliary invariants to strengthen the invariants to make them inductive. This is often the most tedious part of the verification and requires careful user guidance. In UCLID, the presence of counterexamples for verification failures helps the user identify the reasons for failure and perform appropriate strengthening. Hb states that if the latest instruction modifying the register r has completed execution, then the value of r in the shadow register file is the result of the instruction.

\[ \forall r .\ [\, \neg \mathit{rf.v}(r) \land \mathit{rb.v}(\mathit{rf.t}(r)) \Rightarrow \mathit{Isa.rf}(r) = \mathit{rb.vl}(\mathit{rf.t}(r)) \,] \qquad (\mathrm{Hb}) \]

The proof obligations in the deductive verification were discharged automatically by using a sound decision procedure for the quantified formulas. This considerably reduces the user guidance needed to discharge the proofs.

3.4 Comparing the techniques

Fig 5 illustrates the relative strengths and weaknesses of each approach. Bounded property checking requires the least user intervention and can prove properties for systems with unbounded resources, but only for a fixed depth. Burch-Dill verification requires bounding the size of the ROB, but provides verification for unbounded depth. Since the Burch-Dill technique does not require the presence of auxiliary fields, the number of invariants is far smaller than in the full deductive verification approach. Deductive verification offers rigorous proof of the system at the cost of significant user guidance.

Method       Automation                       Resources   Verification (Depth)   Auxiliary Structures
BPC          Fully Automatic                  Unbounded   Finite                 None
Burch-Dill   Control path Invariants          Fixed       Unbounded              None
Deductive    Control + Data path Invariants   Unbounded   Unbounded              Significant

Figure 5. Comparing different techniques. “BPC” refers to bounded property checking.

Hence, we propose the following verification flow to best exploit the effectiveness of the three approaches:

1. Use BPC to debug the system. The counterexample information can be used to locate errors in the system with very little manual guidance. This can be done as long as we can increase the depth of verification.

2. Next, restrict the sizes of the various buffers in the system and perform Burch-Dill verification. This part requires the user to specify control invariants. Keep adding invariants until the verification succeeds.

3. Finally, perform deductive verification by using the control invariants from part (2). Add the auxiliary structures to prove the correctness of the data path and also add the required auxiliary invariants.

References

[1] C. Barrett, D. Dill, and J. Levitt. Validity checking for combinations of theories with equality. In Formal Methods in Computer-Aided Design (FMCAD ’96), LNCS 1166, November 1996.
[2] R. E. Bryant, S. German, and M. N. Velev. Exploiting positive equality in a logic of equality with uninterpreted functions. In N. Halbwachs and D. Peled, editors, Computer-Aided Verification (CAV ’99), LNCS 1633, pages 470–482. Springer-Verlag, July 1999.
[3] R. E. Bryant, S. K. Lahiri, and S. A. Seshia. Modeling and verifying systems using a logic of counter arithmetic with lambda expressions and uninterpreted functions. In Proc. Computer-Aided Verification (CAV ’02), July 2002.
[4] J. R. Burch. Techniques for verifying superscalar microprocessors. In 33rd Design Automation Conference (DAC ’96), pages 552–557, June 1996.
[5] S. K. Lahiri, S. A. Seshia, and R. E. Bryant. Modeling and verification of out-of-order microprocessors in UCLID. In Formal Methods in Computer-Aided Design (FMCAD ’02), LNCS 2517, pages 142–159. Springer-Verlag, November 2002.
[6] K. McMillan. Verification of an implementation of Tomasulo’s algorithm by compositional model checking. In Computer-Aided Verification (CAV ’98), June 1998.
