Early Error Detection in Industrial Strength Cache Coherence Protocols Using SQL

Mahadevan Subramaniam
Computer Science Department
University of Nebraska at Omaha
Omaha, NE 68182
[email protected]

Abstract

A table-driven approach for designing industrial strength cache coherence protocols based on relational database technology is described. Protocols are specified using several interacting multi-input, multi-output controller state machines represented as database tables. Protocol scenarios specified using SQL constraints are solved to automatically generate database tables, and to statically check protocol properties including absence of deadlocks and other protocol invariants. The debugged tables are mapped to hardware using SQL operations while preserving protocol properties. The approach is deployed at Fujitsu System Technology Division in the design of their next-generation multiprocessor and has discovered several errors early in the design cycle.

1. Introduction

Cache coherence protocols are an integral part of shared memory systems. Designing robust cache coherence protocols has been a challenge due to their inherent distributed nature and the subtle interactions among their components. There has been a lot of interest in the research community in the verification of these protocols and a number of research articles have been published in conferences and journals [4, 2, 9, 8, 10]. With rapidly increasing performance requirements, it is becoming common to implement these protocols in hardware, often on the same chip as the processors. Consequently, the time-to-market pressures for the design and implementation of these protocols are enormous due to the high level of integration and the prevalent commodity status of the processors. This makes it imperative to design these protocols fast and to detect errors early to avoid costly, time-consuming re-design efforts.

In this paper, we describe a novel, automatic approach based on relational database technology and the query language SQL for rapid development and early debugging of industrial strength cache coherence protocols.

The approach is motivated by our experiences in the design and implementation of two generations of multiprocessor systems at Fujitsu Systems Technology Division, Fujitsu Inc.1 The proposed approach was deployed in the development of the cache coherence protocol of their next generation multiprocessor system product (ASURA) and has been highly successful in detecting several protocol design errors early, substantially reducing the development time.

A cache coherence protocol is a set of rules that provides a coherent view of the shared memory to all the processors in a multiprocessor system. These rules are specified by defining the actions of the different system components (controllers) for memory and I/O read/write operations at any given state. Each read and write operation is achieved by an exchange of a sequence of messages among the different controllers and constitutes a protocol transaction.

In practice, a typical development cycle of a cache coherence protocol starts with a high-level architecture specification of the protocol based on product marketing requirements. Then, a hardware implementation that conforms to the architecture specification and meets the product engineering requirements is produced. Finally, the implementation is tested and certified correct using simulation by running specific as well as random tests. As is evident from the above, protocol testing does not begin until very late in the development cycle and this often leads to costly re-design efforts. A straightforward solution to this problem is to detect protocol errors early by debugging the architecture specification before an implementation begins. However, in many cases this is not feasible since the architecture specifications are informal and often do not include sufficient protocol details. A typical specification is an English document and specifies only a few commonly occurring individual protocol transactions and the messages exchanged by the controllers.

1 This work was done while the author was working at Fujitsu Inc.


Core aspects of the protocol, such as the internal states through which a controller transitions as a transaction progresses through the system and the interleavings of different transactions at a controller, are absent. Without such information, debugging an architecture specification is not very helpful.

To facilitate early error detection, our approach adds protocol details to the architecture specification by completely describing the behavior of all participating system controllers over all transactions. Each controller is specified using a multi-input, multi-output state machine enumerating all possible actions of the controller for all legal combinations of its input messages and states. Each controller state machine is represented as a table (controller table) with each row describing one controller state transition. An enhanced architecture specification with multiple such controller tables is produced. Several protocol properties such as the absence of deadlocks and other protocol invariants are established by analyses of these tables. A hardware implementation is then produced using the debugged tables. However, for this early error detection approach to be used in an industrial strength protocol development, the overheads of producing a detailed architecture specification and of early debugging must be minimal. Further, the debugged tables must be mapped to an implementation while preserving all the properties established by static analyses.

The key idea underlying our approach to detect errors early in industrial strength protocol development is the use of relational database technology along with the query language SQL. Controller tables are modeled as database tables in a central database. The table entries are automatically generated from a compact set of SQL constraints using the built-in constraint solver available in the relational database system. This allows an enhanced architecture specification to be generated with minimal overheads. SQL-supported table operations and constraints allow several protocol properties to be declaratively specified, and these are automatically checked by the relational database system without the need to develop additional checkers.

Further, to ensure that the debugged tables are faithfully mapped to an implementation, our approach restricts the implementation mappings to be defined using SQL constraints and table operations. Implementation details are added to a given debugged table using SQL constraints to automatically generate an extended table. The extended table is then modified using table operations to produce one or more implementation tables. Each SQL table operation that modifies an extended table must specify the corresponding SQL table operations to reconstruct the original table from the resulting tables. Based on this specification, an extended table is regenerated from the implementation tables and it is checked using SQL constraints that the resulting table contains the original debugged table.

The approach is used in a push-button manner by creating a database input comprised of three components: i) a database table schema describing the individual controller table columns and their legal values, ii) SQL constraints specifying the behavior of the controllers, and iii) protocol static checks in terms of SQL constraints and table operations. The output from the relational database system is a set of debugged tables that are included in the enhanced architecture specification. Errors found by the static analyses are analyzed, the specification is modified, and the process is repeated until no errors are found, leading to debugged tables. A similar database input comprised of table schemas, constraints and static checks is used to automatically generate the implementation tables from the debugged tables. The static checks in the database input for the implementation tables ensure that the mapping preserves the debugged tables.

Even though tabular specifications of cache coherence protocols have been proposed earlier [2, 10], to the best of our knowledge, ours is the first use of relational database technology and SQL in this manner to support the design of an industrial strength cache coherence protocol. The approach can be easily applied to other cache coherence protocols such as those described in [2, 10]. Several other hardware-based I/O protocols are also naturally described using interacting state machines represented as tables. The proposed approach can be used for early error detection in these protocols as well.

The rest of this paper is organized as follows. A brief overview of the ASURA directory-based coherence protocol and the directory controller is given in Section 2. The key contributions of the approach are illustrated in the remaining sections using the directory controller table. In Section 3, we discuss how SQL constraints defined over the columns of the directory controller table can be used to automatically generate the table entries. Section 4 illustrates the use of SQL constraints to statically check the protocol. Section 5 describes how the debugged controller tables can be mapped to an implementation using SQL.

2. ASURA cache coherence protocol

ASURA is a distributed shared-memory multiprocessor system scalable to 64 64-bit processors. The system comprises a group of quads, each quad containing 4 nodes, with support for up to a maximum of 4 quads. Each node contains 2-4 processors. System memory is physically distributed among the nodes. Coherence is maintained in the system by four protocol engines, one per quad, using a directory-based cache coherence protocol (similar to the one implemented in the Stanford DASH system [5]) implemented in hardware. The quads are fully interconnected using high-speed, proprietary links.


The system uses the well known 4-state (MESI) cache coherence protocol [7], where a cache line can be in one of the four cache states M (modified), E (exclusive), S (shared) or I (invalid). The sharing status of the cache lines belonging to the memory in a quad is tracked by the protocol engine in that quad using a structure called the directory. The directory contains an entry for each line that is cached by some node in the system. The directory entry tracks the sharing status of a line using a pair of values - the directory state and the directory presence vector. The directory state indicates the state of the line in the caches and can be one of I (invalid), SI (shared or invalid) and MESI (modified, exclusive, shared or invalid). The directory presence vector is a 16-bit vector that indicates the nodes whose caches contain the line. The i-th node has the line in one of its caches whenever the i-th bit in the vector is set to 1.

The overall cache coherence protocol is realized in the system through several controllers, including the directory, node, remote access cache, cache, and memory controllers, that are distributed and replicated throughout the system. These controllers exchange messages classified as requests and responses to execute protocol transactions. The protocol supports several types of memory and I/O read and write transactions along with special transactions that are used to communicate state information among the controllers. Around 50 different types of messages are used in the protocol. Some of the protocol messages used in this paper are given in Figure 1. The rest of this section describes the directory controller using a Read Exclusive transaction.
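As an illustrative sketch only (the table name, key and column types below are assumptions, not the deployed structure), a directory entry of the kind described above could be modeled as a relational table:

Create Table directory (
  line_addr integer,     -- address of the cached line (assumed key)
  dirst     varchar(8),  -- directory state: 'I', 'SI' or 'MESI'
  dirpv     integer      -- 16-bit presence vector: bit i is 1 iff node i caches the line
);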

Figure 1. Some protocol messages

2.1. Directory controller

A typical Read Exclusive protocol transaction at the directory controller is described in Figure 2. The vertices in the figure denote the system nodes and controllers. The arcs denote the messages exchanged among the nodes and the controllers. The numbers on the arcs give the relative ordering of messages in a transaction. The local, home and remote nodes in Figure 2 represent the node initiating a request, the memory and directory controller for the requested line, and the nodes that potentially have these lines in their caches, respectively.

Figure 2. Read Exclusive Transaction at D

The local node initiates the read exclusive transaction by sending a readex request to the directory controller at home. Based on a directory lookup it is determined that the line is cached in SI state by the remote node. The controller simultaneously sends a sinv request to the remote node to invalidate the line and an mread request to the memory controller in the home node to get the data. The directory controller then enters a Busy state awaiting responses from the remote and home nodes. On receipt of both these responses, the directory state is updated with the value MESI and the directory presence vector is updated with the id of the local node to indicate a transfer of ownership. The transaction is completed by sending compl and data responses to the local node.

The directory controller uses different types of Busy states to indicate the type of pending transaction and also to indicate the progress of a transaction. The controller may go through a sequence of these states for a single transaction. In Figure 2, the controller starts at the Busy-sd state indicating pending snoop and data responses. It transitions into Busy-s (Busy-d) on receiving the data (idone) response.

The directory controller actions for the read exclusive transaction can be specified by a table with 3 input columns - incoming message (inmsg), directory state including Busy states (dirst), and directory presence vector (dirpv) - and 5 output columns - message to the local node (locmsg), message to the remote node (remmsg), message to the memory (memmsg), the next directory state (nxtdirst), and the next directory presence vector (nxtdirpv). Each row of this table specifies the controller action on an incoming message. The rows of the table for the read exclusive transaction are given in Figure 3. The dirpv column values in this table are an encoding of the presence vector in the current state, with the values zero, one, gone denoting zero, one, and more than one sharer respectively. The nxtdirpv column values specify the operations increment (inc), decrement (dec), replace (repl) or decrement-and-replace-if-zero (drepl) that must be performed on the presence vector in the current state to transition to the next state.


Figure 3. Table for readex transaction

When there are one or more sharers, the presence vector must be zero to ensure that all sharers have invalidated before the transaction is completed. The columns inmsg, locmsg, memmsg and remmsg are called message columns.

The protocol has a separate structure called the busy directory to track Busy states. A busy directory entry with the appropriate Busy state is allocated when sending a request to a remote node; it is updated as responses from these nodes are processed and is de-allocated when the transaction completes. To specify the state information about the busy directory, the directory controller table is extended with 2 new input columns, bdirst and bdirpv, and 2 new output columns, nxtbdirst and nxtbdirpv. These columns have the same meanings as the corresponding ones in Figure 3.

The final protocol directory controller table (D) has several columns in addition to those described above. For instance, there are 3 additional columns for each of the 4 message columns denoting the source, the destination and the resource used by these messages. There are also additional columns indicating the result of a lookup in the directory as well as in the busy directory.
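For concreteness, the readex fragment of the directory controller table in Figure 3 could be declared roughly as follows. This DDL is an illustrative sketch; the table name and column types are assumptions and do not come from the paper.

Create Table D_readex (
  inmsg    varchar(16),  -- incoming message, e.g. 'readex', 'data', 'idone'
  dirst    varchar(16),  -- directory state, including Busy states such as 'Busy-sd'
  dirpv    varchar(8),   -- encoded presence vector: 'zero', 'one' or 'gone'
  locmsg   varchar(16),  -- message to the local node
  remmsg   varchar(16),  -- message to the remote node
  memmsg   varchar(16),  -- message to the memory controller
  nxtdirst varchar(16),  -- next directory state
  nxtdirpv varchar(8)    -- presence vector operation: 'inc', 'dec', 'repl' or 'drepl'
);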

3. Generating controller tables using SQL

We now describe how all the protocol controller tables such as D are automatically generated using SQL constraints. For each controller table, the specified constraints are input to the built-in SQL constraint solver of the relational database system and the output is the controller table. This allowed us to generate a complete, detailed specification of the protocol in a short amount of time without tedious and error-prone data entry work. The use of constraints also considerably reduces the time needed to update the controller tables.

To generate a controller table such as D, its columns are identified and classified as input and output columns. Each column is specified as a database table called the column table. For example, 30 column tables are created in the database for the table D. The entries in a column table define the values that are legal in that column. In addition to the values defined by the protocol, each column table contains the special value NULL, which denotes a don't-care value for an input column and a noop value for an output column.
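A column table is simply a single-column table of legal values; as an illustrative sketch (the table name is assumed), the column table for dirpv could be populated as follows, with NULL included as the don't-care/noop value.

Create Table dirpv_vals (val varchar(8));
Insert into dirpv_vals values ('zero');
Insert into dirpv_vals values ('one');
Insert into dirpv_vals values ('gone');
Insert into dirpv_vals values (NULL);  -- don't-care for inputs, noop for outputs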

The NULL value allows a controller table entry to be specified using only the relevant values and helps in optimal mapping of tables to hardware.

An SQL constraint called a column constraint is then specified for each column of the controller table. A column constraint is a boolean expression and defines the relationship between this column and the other column values in the controller table. The column constraint for an unconstrained column is true. For other columns it is specified using a ternary boolean expression of the form condition ? true-expr : false-expr, where condition is an SQL boolean expression; true-expr and false-expr are SQL expressions and could both themselves be ternary boolean expressions. An SQL expression is built from column names, literals and sets over these literals by applying the relational operators =, ≠, ∈ (in) and the boolean operators and, or and not. For example, the constraint for the input column dirpv for the table in Figure 3 is

inmsg = "data" and dirst = "Busy-d" ? dirpv = zero : dirpv = one.

and the constraint for the output column remmsg is

inmsg = "readex" and dirst = "SI" ? remmsg = sinv : remmsg = NULL.

Inputting the conjunction of such column constraints to the SQL query solver generates the table D. D is a cross product of the column tables from which rows that do not satisfy the constraints are pruned out by the SQL query solver. D comprises all satisfying assignments for the input conjunction of constraints and each assignment forms a row in D. There is a unique table for a given set of column constraints, and an inconsistent set of column constraints results in D having zero rows.

In practice, the column constraints provided a compact representation of the table D. This is because the table D is typically specified only for the legal input combinations and as a result is quite sparse. A single column constraint covers multiple protocol transactions. The number of columns in the table is an order of magnitude smaller than the number of rows, and specifying the constraints for each column keeps the number of constraints defining a table small. Column constraints also allowed us to generate the table incrementally and efficiently by adding one column at a time. Initially, the constraints corresponding to the inputs of D were solved to generate a table containing all the legal input combinations to D. This table is then extended by adding the output column constraints one at a time to produce D. Incremental table generation produces the final table within a few minutes on a SUN Sparc 10, whereas it takes around 6 hours to solve the conjunction of all the column constraints for D. The table D generated by solving column constraints provides a detailed specification of the directory controller for the protocol.
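A rough plain-SQL sketch of this incremental generation is shown below, under assumed names: inmsg_vals, dirst_vals, dirpv_vals and remmsg_vals stand for the column tables, and the two example column constraints above are rewritten as ordinary predicates of the form (condition and true-expr) or (not condition and false-expr). This is illustrative only; the deployed system used the database's constraint solver rather than hand-written queries.

-- Step 1: enumerate the legal input combinations of D from the input column tables.
Create Table D_inputs as
Select i.val as inmsg, s.val as dirst, p.val as dirpv
from   inmsg_vals i, dirst_vals s, dirpv_vals p
where  (i.val = 'data' and s.val = 'Busy-d' and p.val = 'zero')
   or  (not (i.val = 'data' and s.val = 'Busy-d') and p.val = 'one');

-- Step 2: extend the table one output column at a time (here: remmsg).
Create Table D_step2 as
Select d.*, r.val as remmsg
from   D_inputs d, remmsg_vals r
where  (d.inmsg = 'readex' and d.dirst = 'SI' and r.val = 'sinv')
   or  (not (d.inmsg = 'readex' and d.dirst = 'SI') and r.val is null);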


The behavior of this controller is described for all legal combinations of input message, directory state and busy directory state. The table has 30 columns and 500 rows, includes around 40 Busy states, and covers all transaction interleavings allowed in the protocol. The other protocol controller tables are comparable to D in size and are generated similarly.

4. Detecting protocol errors early using SQL

This section describes how protocol properties including absence of deadlocks and other protocol invariants can be statically checked using SQL for early error detection.

4.1. Checking for absence of deadlocks

Deadlocks may arise in a multiprocessor system such as ASURA due to cyclic dependencies between the finite channel resources used by the requests and responses in the protocol [1, 5]. One common solution to break such cyclic dependencies is to logically partition a physical channel between two quads/nodes into one or more virtual channels [1]. The deadlock avoidance scheme in ASURA uses a similar approach based on virtual channels. The physical channels among the quads are split into a finite number of virtual channels and a virtual channel assignment V, denoting the channel used by each protocol message between a source and a destination, is created.

The SQL-based method for detecting deadlocks takes V and the controller tables as inputs and produces a directed virtual channel dependency graph VCG by automatically considering all of the protocol scenarios specified in the controller tables. VCG, itself a database table, is then analyzed for cycles and the cycles found are reported. The cycles that lead to deadlocks are resolved by modifying V and/or by adding more virtual channels. The process is repeated until no deadlocks are found.

V is a database table with 4 columns - m, s, d, v - where m is a message from source s to destination d sent over virtual channel v. The source and destination in V are one of local, remote, home. The vertices of VCG are virtual channels. A directed edge (vc1, vc2) means that the virtual channel vc1 depends on the virtual channel vc2. A virtual channel vc1 depends on vc2 if and only if there exist two entries V1 = (m1, s1, d1, vc1), V2 = (m2, d1, d2, vc2) in V such that there is a row in some controller table in node d1 with incoming message m1 from source s1 and an outgoing message m2 from source d1. V1 is called the input assignment and V2 is called the output assignment. SQL is used to construct VCG from V and the controller tables as described below.
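As a concrete sketch, the channel assignment V can be declared directly as a database table; the column types below are assumptions.

Create Table V (
  m varchar(16),  -- protocol message, e.g. 'readex', 'sinv', 'wb'
  s varchar(8),   -- source node: 'local', 'home' or 'remote'
  d varchar(8),   -- destination node: 'local', 'home' or 'remote'
  v varchar(8)    -- virtual channel assigned to (m, s, d), e.g. 'VC0' .. 'VC4'
);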

After creating the table V in the database, individual controller dependency tables are created. A controller dependency table specifies the channel dependencies induced by the processing of messages at the controller. This table has 8 columns representing the input assignment followed by the output assignment. Multiple outgoing messages for an incoming message lead to multiple entries being added to the table; one entry is added for each outgoing message. To create an individual controller dependency table, the corresponding controller table is extended by adding a new virtual channel column corresponding to each (message, source, destination) column triple in the controller table. For example, the virtual channel column inmsgvc is added to D corresponding to the (inmsg, inmsgsrc, inmsgdest) column triple in D. Each new virtual channel column along with its column triple represents a virtual channel assignment. The value of this new column is the corresponding assignment in V. Projecting the input columns corresponding to the channel assignments from the extended controller table produces the input columns of the individual controller dependency table. Output columns corresponding to assignments are considered one at a time and one controller dependency table is generated for each output assignment. The final individual controller dependency table is a union of these tables.

To determine the additional dependencies induced by the message exchanges in each transaction and by transaction interleavings, the individual controller dependency tables are composed. This is done by composing these tables pairwise to create pairwise dependency tables. The union of all the pairwise dependency tables and controller dependency tables gives the overall protocol dependency table. To create a pairwise dependency table from two tables T1 and T2, T1 is composed with T2 and vice versa.2 To understand how pairwise dependency tables are created, consider any two individual controller dependency tables T1 and T2. Let R = (R1, R2) be any row in T1 and S = (S3, S4) be any row in T2. Let R1 = (m1, s1, d1, v1), S3 = (m3, s3, d3, v3) denote the input assignments, and R2 = (m2, s2, d2, v2), S4 = (m4, s4, d4, v4) denote the output assignments. The row R can be composed with S to infer the additional dependency (R1, S4) if R2 = S3, i.e., m2 = m3, s2 = s3, d2 = d3 and v2 = v3; the row (R1, S4) is then added to the pairwise dependency table. By symmetry, row S is composed with R to infer the dependency (S3, R2) if S4 = R1 and is similarly added.

2 To ensure that the protocol dependency table includes all the dependencies, it is necessary to repeatedly compose pairwise dependency tables until no new dependencies are added. However, in practice this was not needed, as no new dependencies were found by repeated composition. Our first attempt at computing the protocol dependency table was to do a transitive closure, but we abandoned this due to the excessive number of spurious cycles.


The composition requirement that an output assignment exactly match the input assignment does not capture the dependencies induced by the sharing of virtual channels by different nodes placed in the same quad. For example, let m2 = m3, v2 = v3, s2 = home, d2 = remote, s3 = remote, d3 = home in the assignments R2 and S3. In this case, since s2 ≠ s3 and d2 ≠ d3, R2 cannot be composed with S3 and no additional dependencies are added. However, if the remote and home nodes share the same quad, then they both share the same virtual channel v2 and hence a dependency must be inferred. To infer such dependencies, the exact match requirement is relaxed. Two rows with different source and destination values in the input and output assignments are composed if there is a specified quad placement that can make these values equal. The allowed quad placements are specified by the five possible relations between the local (L), home (H) and remote (R) nodes, namely, L=H=R (all on the same quad), L=H≠R (local and home in the same quad but not remote), L≠H=R (home and remote in the same quad but not local), L=R≠H (local and remote in the same quad but not home) and L≠H≠R (all in distinct quads). The individual controller dependency tables produced from the controller tables and V based on the exact match requirement correspond to the quad placement given by the relation L≠H≠R. Four more sets of individual controller dependency tables are generated from this set, corresponding to the remaining four quad placement relations, by modifying the individual controller tables. Pairwise dependency tables are generated for each set and participate in the union to create the protocol dependency table. To infer channel dependencies due to transaction interleavings, the composition requirement is further relaxed to ignore the messages while matching input and output assignments.

The protocol dependency table represents the graph VCG in tabular form. There is a directed edge (vc1, vc2) in VCG for each row (m1, s1, d1, vc1, m2, d1, s2, vc2) in the protocol dependency table. An absence of cycles in this table indicates absence of deadlocks. Cycles in this table indicate potential deadlocks and need to be analyzed.
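A sketch of the exact-match composition step as an SQL join over two controller dependency tables is given below. The table and column names (dep_T1, dep_T2, in_m ... out_v) are assumptions standing for the 8 columns described above; the relaxed compositions (shared quads, ignored messages) would simply drop some of the equalities in the where clause.

-- Compose T1 with T2: for rows (R1, R2) in T1 and (S3, S4) in T2 with R2 = S3,
-- add the inferred dependency (R1, S4) to the pairwise dependency table.
Create Table pairwise_T1_T2 as
Select t1.in_m, t1.in_s, t1.in_d, t1.in_v,      -- R1: input assignment
       t2.out_m, t2.out_s, t2.out_d, t2.out_v   -- S4: output assignment
from   dep_T1 t1, dep_T2 t2
where  t1.out_m = t2.in_m and t1.out_s = t2.in_s
  and  t1.out_d = t2.in_d and t1.out_v = t2.in_v;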

4.2. An example of a deadlock discovered

One of the nontrivial deadlocks found in our system using SQL is given in Figure 4. The method was initially applied to a channel assignment with four virtual channels VC0-VC3 and several cycles leading to deadlocks were found. Most of these deadlocks involved the directory controller and the memory controller at the home node. To resolve these deadlocks a new virtual channel VC4 was added to carry the messages between these two controllers and V was modified by reassigning messages. Application of the method to this new assignment discovered this deadlock, which was resolved by adding a dedicated hardware path from the directory controller to the home memory controller for mread requests. Our design team informed us that adding such a path is a major revision and could have proven costly had it been found later.

The virtual channels VC0-VC4 were assigned based on the source and the destination and the classification of messages as requests vs. responses. VC0 carries requests from local to home, VC1 carries requests from home to remote, VC2 carries responses from remote to home, VC3 carries responses from home to local and VC4 carries requests from the home directory to the home memory.

Figure 4. Deadlock Example

Consider the deadlock scenario given in Figure 4. Figure 4 represents two interleaved transactions for cache lines A and B involving two quads. Both A and B belong to the home memory at quad 2 and are tracked by the directory controller (D2) in that quad. The local node at quad 1 has a modified copy of B; A is modified at the remote node in quad 2. Since the memory and the remote node are both at quad 2 with the local node at quad 1, the quad placement relation is L≠H=R. Initially, the local node concurrently issues wb(B) and readex(A) requests on VC0 to D2, to write back its modified copy of B to memory and to get exclusive ownership of A. The wb(B) request reaches D2 first and is forwarded on VC4 to the home memory. Then, readex(A) reaches D2, which generates a sinv(A) request on VC1 to the remote node. The sinv(A) from D2 reaches the remote node and is processed prior to the receipt of the wb(B) request. Further, the remote node writes back its modified line A to memory before receiving sinv(A). The remote node responds to the sinv(A) request on VC1 and generates an idone(A) response on VC2 to D2. The home memory then receives wb(B) on VC4. Now, D2 can process the idone(A) response on VC2 only if it can send mread(A) on VC4. However, VC4 is occupied by the wb(B) request. The home memory can process this request only if it can send a compl(B) response on VC2, which is occupied by the idone(A) response from the remote node. Therefore, there is a cyclic dependency involving channels VC2 and VC4 and the system is deadlocked.

This deadlock is detected by SQL using the memory and the directory controller tables. The row specifying the processing of a wb request in the home memory controller table produces the row R1: (wb, home, home, VC4, compl, home, home, VC2) in the memory controller dependency table. Similarly, the row specifying the processing of idone in the directory controller table produces the row R2: (idone, remote, home, VC2, mread, home, home, VC4) in the directory controller dependency table.


The quad placement relation L≠H=R is used to modify R2 to R2': (idone, home, home, VC2, mread, home, home, VC4). The row R1 can be composed with R2' by ignoring messages, and the row R3: (wb, home, home, VC4, mread, home, home, VC4) is added to the pairwise dependency table and hence to the protocol dependency table. Thus VCG contains a cycle involving virtual channel VC4. Similarly, by composing R2' with R1, a cycle involving VC2 is added to the protocol dependency table. Manual analysis of these cycles yields the deadlock scenario of Figure 4.

As explained above, several conditions are necessary for the above deadlock to occur. Devising a test for these conditions a priori seems difficult. Model checkers [6, 3] based on formal approaches have a lot of reasoning power and can detect such deadlocks. However, to use these tools, the controller tables need to be extensively abstracted to avoid the state explosion problem [4]. Based on our experience, the proposed SQL-based approach for statically analyzing finite resource dependencies is surprisingly effective in detecting deadlocks in cache coherence protocols.

4.3. Checking protocol invariants using SQL

In addition to deadlocks, several protocol invariants are identified and checked before implementation using SQL. Some of the invariants that were checked using SQL for the table D are described below. The first invariant ensures that the directory state and the presence vector are consistent by checking that the presence vector has exactly one sharer whenever the directory state is MESI, one or more sharers when the state is SI, and no sharers when the state is I. It is specified in SQL as

[Select dirst, dirpv from D where (dirst = "MESI" and not dirpv = "one") or (dirst = "SI" and not dirpv = "gone") or (dirst = "I" and not dirpv = "zero")] = empty.

The next two invariants involve the busy directory. The first of these checks the mutual exclusion between the busy directory and the directory. It states that a line can be either in the busy directory or in the directory but not in both. The corresponding SQL is

[Select dirst, bdirst from D where not dirst = "I" and not bdirst = "I"] = empty.

The second invariant establishes an important property of D. It ensures that D serializes requests to the same address. To establish this we check that a request is issued a retry response whenever a line is in the busy directory, and that a busy directory entry is de-allocated only when a transaction completes. These two together guarantee that requests to the same address are handled one at a time by D. To identify the completion of a transaction, we use the property of the ASURA protocol that any transaction that is allocated a busy directory entry must complete with either D receiving a compl response or with D sending such a response to the requestor (in fact, this is specified as an invariant as well). The SQL for the second invariant is

[Select inmsg, bdirst, locmsg from D where isrequest(inmsg) and not bdirst = "I" and not locmsg = "retry"] = empty and
[Select inmsg, bdirst, nxtbdirst, locmsg from D where not inmsg = "compl" and not locmsg = "compl" and not bdirst = "I" and nxtbdirst = "I"] = empty.

The invariants involving other controllers and the interactions among controllers are written in SQL in a similar manner. These invariants can be checked efficiently due to the many query optimization techniques built into relational database systems. All of the protocol invariants (around 50) are checked on a SUN Sparc 10 within 5 minutes.
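The "[Select ...] = empty" notation above is shorthand; one way such a check can be run on a standard SQL engine is to count the violating rows and require the count to be zero. The query below shows this for the mutual exclusion invariant above; the same pattern applies to the others.

Select count(*) as violations
from   D
where  not dirst = 'I' and not bdirst = 'I';
-- The invariant holds iff violations = 0.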

5. Mapping tables onto hardware using SQL

We now describe how the debugged table D can be mapped into hardware using SQL. A block diagram depicting a hardware implementation of the directory controller is given in Figure 5.


Figure 5. A hardware implementation of D

To implement D, a number of finite hardware resources (queues) are introduced in Figure 5. There are 3 queues - locmsg, remmsg and memmsg - corresponding to the output messages from D, 2 queues - lookup and upd - for reading from and writing to the directory, and 2 queues - request and response - that provide input to D. D is modified to manage these queues. Further, the implementation splits D into two controllers processing requests and responses in parallel and adds a feedback path from the output of the response controller to the input of the request controller. To add the hardware implementation details, an extended table ED is created from D by adding 2 new input columns, Qstatus and Dqstatus, and a new output column, Fdback.


The column tables for the input columns, with values Full and NotFull, and that for the output column, with value Dfdback, are created. The inmsg table is extended to the table Impinmsg to include the implementation-defined request Dfdback. Qstatus has the value Full if any of the locmsg, remmsg, memmsg, Updmsg queues or the busy directory is full; it is NotFull otherwise. The implementation of D de-queues a request from the input queue after allocating an entry in the locmsg queue to send a retry response, if needed. Then, the request is processed based on the value of Qstatus. If Qstatus = Full then a retry response is generated as the locmsg. If Qstatus = NotFull then one entry is allocated in each of the output queues and the request is processed based on the controller state. The new input Dqstatus indicates whether the update queue is full. On a response, if the directory controller needs to update the directory and Dqstatus = Full, then the controller generates the Dfdback request, which is fed back to the request controller as a new request through the feedback path. Dqstatus is not consulted for requests, which are handled based on Qstatus.

To generate ED, column constraints for Qstatus, Dqstatus and Fdback are specified and the column constraints for D are modified to specify their behavior for the request Dfdback. The constraint for the column locmsg in D is modified to generate a retry response if Qstatus = Full. ED is the cross product Impinmsg X Qstatus X Dqstatus X Fdback X (D - inmsg) whose entries satisfy the column constraints.

Nine implementation tables are generated for D by partitioning ED using SQL. There is one implementation table for each output generated by the request and the response controllers. For example, the SQL for generating the table for the output remmsg of the request controller is

Create Table Request_remmsg as
Select distinct ED.Inputs, remmsg
from ED
Where (isrequest(ED.Inputs.inmsg))

To ensure that no errors are introduced in mapping D to the implementation tables, it was also explicitly checked that D could be reconstructed from these nine implementation tables using SQL table operations. The other controller tables are similarly mapped to a set of implementation tables using SQL. Code is automatically generated from these tables using SQL report generation. Using SQL allowed us to systematically map the entire protocol to a hardware implementation using simple and reliable table operations. Otherwise, tools to do this mapping and additional checkers with capabilities comparable to SQL would have had to be developed, which is a lot of work.
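As an illustration of such a reconstruction check, the sketch below joins hypothetical request-side implementation tables (named by analogy with Request_remmsg above) on their shared input columns and verifies that the corresponding projection of the debugged table D is contained in the result. The table names, join columns and the use of EXCEPT (MINUS in Oracle) are assumptions made for the sketch, not the operations used in the deployed system.

-- Rebuild the request-side outputs from the implementation tables (illustrative).
Create View D_rebuilt as
Select a.inmsg, a.dirst, a.dirpv, a.locmsg, b.remmsg, c.memmsg
from   Request_locmsg a, Request_remmsg b, Request_memmsg c
where  a.inmsg = b.inmsg and a.dirst = b.dirst and a.dirpv = b.dirpv
  and  a.inmsg = c.inmsg and a.dirst = c.dirst and a.dirpv = c.dirpv;

-- The mapping is accepted only if this difference is empty.
Select inmsg, dirst, dirpv, locmsg, remmsg, memmsg from D
except
Select inmsg, dirst, dirpv, locmsg, remmsg, memmsg from D_rebuilt;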

6. Conclusion

This paper presents a novel methodology based on relational database technology and SQL for early detection of errors in industrial strength cache coherence protocols. The proposed approach has been deployed at Fujitsu in the development of a directory-based protocol used in their next-generation multiprocessor system. The approach was well received by architects, designers and testing teams alike. A total of 8 controller database tables were automatically generated, updated and maintained throughout the development cycle. Three architects generated the initial controller database tables in 2 months, and these went through several revisions subsequently. Five logic designers concurrently extended the debugged tables with implementation details. The hardware controller descriptions produced using SQL have met the engineering constraints, including both timing and area. The approach is implemented using ORACLE8 on a network of SUN Sparc 10 machines.

Acknowledgements: Thanks to Akira Hattori, Pat Conway, Nakagawa, Jung Rung, Takeshi, Shieh and Hitoshi Oi at Fujitsu System Technology Division for their support and contributions. Thanks to the anonymous referees for their comments that helped improve the quality of this paper.

References

[1] W. Dally and C. Seitz. Deadlock-free message routing in multiprocessor interconnection networks. IEEE Trans. on Computers, 36, 1987.
[2] A. T. Eiríksson and K. McMillan. Using formal verification/analysis methods on the critical path in system design: A case study. Int. Symp. on Computer-Aided Verification, 1995.
[3] G. Holzmann. The model checker SPIN. IEEE Trans. on Software Engineering, 23(5), 1997.
[4] A. Hu, M. Fujita, and C. Wilson. Formal verification of the HAL S1 system cache coherence protocol. International Conference on Computer Design, 1997.
[5] D. Lenoski, J. Laudon, K. Gharachorloo, W. Weber, A. Gupta, J. Hennessy, M. Horowitz, and M. Lam. The Stanford DASH multiprocessor. IEEE Computer, 1992.
[6] K. McMillan. Symbolic Model Checking. Kluwer Academic Publishers, 1993.
[7] M. Papamarcos and J. Patel. A low overhead coherence solution for multiprocessors with private cache memories. Int. Symp. on Computer Architecture, 1984.
[8] S. Park and D. Dill. Verification of cache coherence protocols by aggregation of distributed transactions. Theory of Computing Systems, 31(4), July 1998.
[9] F. Pong and M. Dubois. Verification techniques for cache coherence protocols. ACM Computing Surveys, 29(1), 1997.
[10] D. Sorin, M. Plakal, A. Condon, M. Hill, M. Martin, and D. Wood. Specifying and verifying a broadcast and a multicast snooping cache coherence protocol. IEEE Trans. on Parallel and Distributed Systems, 13(6), 2002.

