Automatic Synthesis of Extended Burst-Mode Circuits: Part II (Automatic Synthesis)

Kenneth Y. Yun, Member, IEEE, and David L. Dill, Member, IEEE

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

Abstract: We introduce a new design style called extended burst-mode. The extended burst-mode design style covers a wide spectrum of sequential circuits ranging from delay-insensitive to synchronous. We can synthesize multiple-input change asynchronous finite state machines, and many circuits that fall in the gray area (hard to classify as synchronous or asynchronous) that are difficult or impossible to synthesize automatically using existing methods. Our implementation of extended burst-mode machines uses standard CMOS logic, generates low-latency outputs, and guarantees freedom from hazards at the gate level. In Part II, we present a complete set of automated sequential synthesis algorithms: hazard-free state assignment, hazard-free state minimization, and critical-race-free state encoding. Experimental data from a large set of examples are presented and compared to competing methods, whenever possible.

Keywords: Asynchronous controller, Extended burst-mode, Automatic synthesis

I. INTRODUCTION

Asynchronous circuits are finding a niche in industrial applications that require ultra high performance, low power/EM radiation, or multiple timing domains. One of the most significant reasons for the resurrection of asynchronous circuits in the 1980s and 1990s was the advent of automatic synthesis [1], [2], [3], [4], [5], [6], [7], which meant that tedious and complex tasks, such as hazard-free state assignment and logic minimization, could be carried out by computer.

Parts I and II of this paper present a new design style for asynchronous control circuits, called extended burst-mode, and an automated synthesis procedure for it. In Part II, we prove that a hazard-free solution exists for every legal extended burst-mode specification. The basic paradigm used to prove the existence of a hazard-free solution is similar to an earlier work on burst-mode synthesis by Nowick and Dill [2], [8]. We show that a hazard-free implementation exists by proving that, for any legal specification, a next state table constructed with no state minimization is function-hazard-free and a logic-hazard-free realization from it is feasible. We then constrain the state minimization and encoding to ensure that the resulting implementation remains hazard-free.

Part II also describes a complete set of automated sequential synthesis algorithms (hazard-free state assignment, hazard-free state minimization, and critical-race-free state encoding) and experimental data from a large set of examples. The automated synthesis tool (called 3D) uses a combination of exact algorithms and, when appropriate, heuristics that find near-optimal solutions in polynomial time.

The rest of Part II is organized as follows. Section II provides a brief review of Part I in an attempt to make Part II as self-contained as possible. Section III describes the hazard-free state assignment algorithm and proves the existence of a hazard-free implementation for every legal extended burst-mode specification. It describes the notion of state compatibility based on dynamic hazard freedom, presents a state minimization algorithm, and describes a critical-race-free state encoding algorithm. Finally, it describes how the 3D synthesis interfaces to Nowick and Dill's hazard-free combinational logic synthesis back-end. Section IV reports the experimental results and compares our results to competing methods, when possible.

This work was supported in part by a gift from Intel Corporation and by a National Science Foundation CAREER Award MIP-9625034. K. Yun is with ECE Dept., University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093-0407. D. Dill is with CS Dept., Stanford University, Stanford, CA 94305.

II. REVIEW

A. Extended Burst-Mode Specification

Figure 1 describes an extended burst-mode state machine (biu-fifo2dma) having 4 inputs (ok, cntgt1, fain, dackn) and 2 outputs (frout, dreq). Signals not enclosed in angle brackets, such as ok, fain, and dackn, are edge signals. Edge signals ending with + or − are terminating signals; the ones ending with ∗ are directed don't cares. If a state transition is labeled with a directed don't care a∗, then the following state transition must be labeled with a∗ or a+ or a−. A terminating signal a+ denotes a 0 → 1 transition of a if a was initially 0, and no transition at all if a was initially 1. A sequence of state transitions labeled with a∗ and terminated with a+ represents a single 0 → 1 transition of a at any point in the sequence. A terminating signal not immediately preceded by a directed don't care represents a compulsory transition. Signals enclosed in angle brackets, such as cntgt1, represent conditional or level signals. ⟨cntgt1+⟩ and ⟨cntgt1−⟩ denote the conditional clauses "if cntgt1 is high" and "if cntgt1 is low."

Fig. 1. Biu-fifo2dma specification. [State graph with states 0 to 5 not reproduced; its transition labels include fain+ / dreq+ frout−, fain− dackn+ / frout+, and fain∗ dackn− / dreq−.]

An input burst is a non-empty set of input edges (terminating or directed don't care), at least one of which must be a compulsory transition. An output burst consists of a possibly empty set of output edges. If a state transition is not labeled with a level signal, the signal may change freely during the transition. However, if an edge signal is not mentioned in a transition, it is not allowed to change.

The extended burst-mode specification is subject to a restriction, called the distinguishability constraint, that prevents ambiguity among multiple input bursts emanating from a single state: for every pair of input bursts i and j from the same state, either the conditions are mutually exclusive, or the set of compulsory edges in i is not a subset of the set of all possible edges in j. In addition, we require that the unique entry condition be satisfied. That is, the set of possible entry points into a state (input and output values entering a state) from every predecessor state must be identical.

In a given state, when all the specified conditional signals have correct values and when all the specified terminating signals in the input burst have changed, the machine generates the corresponding output burst and moves to a new state. Specified edges in the input burst may appear in arbitrary temporal order. However, the conditional signals must stabilize to correct levels before any compulsory edge in the input burst appears and must hold their values until after all of the terminating edges appear. Outputs may be generated in any order, but the next set of compulsory edges from the next input burst may not appear until the machine has stabilized.

B. 3D Implementation

A 3D asynchronous finite state machine is a 4-tuple (X, Y, Z, δ), where X is a non-empty set of primary input symbols, Y a non-empty set of primary output symbols, Z a possibly empty set of internal state variable symbols, and δ : X × Y × Z → Y × Z is a next-state function. The hardware implementation of a 3D state machine is a next-state logic network, which implements the next-state function, with the outputs of the network fed back as inputs to the network.
A 3D implementation of an extended burst-mode specification is obtained from the next-state table, a 3-dimensional tabular representation of δ, as described in Section III. The next state of every reachable state must be specified in the next-state table; the remaining entries are don't cares.

A Type I machine cycle consists of an input burst followed by a concurrent output and state burst. Initially or after completion of the previous output and state burst, the machine waits for an input burst to arrive. When the machine detects that all of the terminating edges of the input burst have appeared, it generates a concurrent output/state burst, which may be empty. A Type II machine cycle consists of an input burst followed by an output burst followed by a state burst. In a Type III machine cycle, a state burst precedes an output burst.

In the 3D implementation of Type I extended burst-mode circuits, no fed-back output or state variable change arrives at the network input until all of the specified edges in the output and state burst have appeared at the network output. These conditions are met by inserting delays in the feedback paths as necessary. A 3D machine can then be viewed as a combinational network alternately excited by a set of input edges (during an input burst) and by a set of fed-back output and state variable edges (during an output/state burst). Thus each burst is a generalized transition of inputs to the next-state network, as described below.


Generalized transition. A generalized transition is a triple (T, A, B), where T is a mapping from a set of inputs to a set of input types, A is a start-cube, and B is an end-cube. There are three types of inputs: rising edge, falling edge, and level signals. Edge inputs can only change monotonically. Level inputs must remain constant or undefined (don't care), which implies that each level input must hold the same value in both A and B or be undefined in both A and B. Level inputs, if they are undefined, may change non-monotonically.

A generalized transition cube [A, B] is the smallest cube that contains the start- and end-cubes A and B. It represents the set of all minterms that can be reached during a legal transition from a point in start-cube A to a point in end-cube B, assuming that the inputs can change in arbitrary order. Open generalized transition cubes [A, B), (A, B], and (A, B) denote [A, B] − B, [A, B] − A, and [A, B) − A, respectively. Note that [A, B) = ∅ if A = B.

The start-subcube A′ is a maximal subcube of A such that the value of every rising edge input i in A′ is 0, if it is ∗ in A, and the value of every falling edge input j in A′ is 1, if it is ∗ in A. The end-subcube B′ is a maximal subcube of B such that the value of every rising edge input i in B′ is 1, if it is ∗ in B, and the value of every falling edge input j in B′ is 0, if it is ∗ in B. Intuitively, the longest transitions, disregarding non-monotonic signals, are those that lead from A′ to B′.

A generalized transition (T, A, B) is a static transition for f iff f(A) = f(B); it is a dynamic transition for f iff f(A) ≠ f(B). No change in level inputs can enable output changes directly; that is, at least one edge input must change from 0 to 1 or from 1 to 0 in a generalized dynamic transition. During a generalized transition (T, A, B), each output signal is assumed to change its value at most once. If not, a function hazard is said to be present.
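The cube operations defined above can be sketched in a few lines of Python. This is a minimal illustration, not part of the 3D tool: cubes are written as strings over 0, 1, and x (for ∗), and the type string with r (rising), f (falling), and l (level or held constant) is a hypothetical encoding of the mapping T.

```python
from itertools import product

DC = "x"  # don't-care symbol (the text writes it as * or x)


def supercube(a: str, b: str) -> str:
    """Smallest cube containing cubes a and b, i.e. the transition cube [A, B]."""
    return "".join(p if p == q else DC for p, q in zip(a, b))


def start_subcube(cube: str, types: str) -> str:
    """A': don't-care rising edges are forced to 0, falling edges to 1."""
    fill = {"r": "0", "f": "1"}
    return "".join(fill.get(t, c) if c == DC else c for c, t in zip(cube, types))


def end_subcube(cube: str, types: str) -> str:
    """B': don't-care rising edges are forced to 1, falling edges to 0."""
    fill = {"r": "1", "f": "0"}
    return "".join(fill.get(t, c) if c == DC else c for c, t in zip(cube, types))


def minterms(cube: str) -> list[str]:
    """Enumerate every minterm contained in a cube."""
    choices = ["01" if c == DC else c for c in cube]
    return ["".join(bits) for bits in product(*choices)]


# Variables abcXY; a and b are rising edges, the rest are held constant here.
A, B, TYPES = "00000", "1x000", "rrlll"
transition_cube = supercube(A, B)
```

With these assumed inputs, the transition cube is xx000 and the end-subcube of B is 11000, the point reached when both a and b have risen.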
Below is the new definition of function hazard adapted for generalized transitions:

Definition 1. A combinational function f contains a function hazard in (T, A, B) iff
1. there exists a pair of minterms X, Y in A such that f(X) ≠ f(Y), or
2. there exists a pair of minterms X, Y in B such that f(X) ≠ f(Y), or
3. there exists a pair X, Y in (A, B) such that Y ∈ [X, B′) (or, equivalently, X ∈ (A′, Y]) and f(A) ≠ f(X) and f(Y) ≠ f(B).

An extended burst-mode transition is a generalized transition with the following requirements:
1. For every pair of minterms X and Y in [A, B), f(X) = f(Y).
2. For every pair of minterms X and Y in B, f(X) = f(Y).

We showed in Part I that every extended burst-mode transition is function-hazard-free. An edge signal that changes from 0 or ∗ to 1 or from 1 or ∗ to 0 during an extended burst-mode transition from A to B is a terminating signal in [A, B]. An edge signal whose value is ∗ in B is a directed don't care in [A, B]. A level signal whose value is ∗ in [A, B] is an undirected don't care. In a dynamic extended burst-mode transition, the output is enabled to change only after all of the terminating edges appear.
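The two requirements for an extended burst-mode transition can be checked by brute-force enumeration. The sketch below assumes cubes are strings over 0, 1, and x, and that f is any Python callable from a minterm string to 0 or 1; the function names and the example functions are hypothetical and chosen only for illustration.

```python
from itertools import product

DC = "x"


def minterms(cube):
    """Enumerate every minterm contained in a cube."""
    choices = ["01" if c == DC else c for c in cube]
    return ["".join(bits) for bits in product(*choices)]


def supercube(a, b):
    """Smallest cube containing a and b: the transition cube [A, B]."""
    return "".join(p if p == q else DC for p, q in zip(a, b))


def is_extended_burst_mode_transition(f, a, b):
    """Requirement 1: f is constant on [A, B). Requirement 2: f is constant on B."""
    end = set(minterms(b))
    before_end = [m for m in minterms(supercube(a, b)) if m not in end]
    constant_before = len({f(m) for m in before_end}) <= 1
    constant_in_end = len({f(m) for m in end}) == 1
    return constant_before and constant_in_end


def is_dynamic(f, a, b):
    """A transition is dynamic iff f(A) != f(B); meaningful once the check above holds."""
    return f(minterms(a)[0]) != f(minterms(b)[0])


# Hypothetical functions over abcXY for an input burst a+ b* (00000 -> 1x000).
follows_a = lambda m: int(m[0] == "1")             # rises as soon as a is 1
needs_a_and_b = lambda m: int(m[:2] == "11")       # not constant on the end cube

ok = is_extended_burst_mode_transition(follows_a, "00000", "1x000")
bad = is_extended_burst_mode_transition(needs_a_and_b, "00000", "1x000")
```

The second function fails because b is a directed don't care in the end cube, so an output that also waits for b takes two different values there.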


III. AUTOMATIC SYNTHESIS PROCEDURE

The synthesis procedure consists of the following steps:
1. Next state assignment: A primitive next-state table with each specification state assigned to a unique layer is constructed from the extended burst-mode specification.
2. Layer minimization: Layer minimization is performed by merging compatible layers.
3. Layer encoding: A layer diagram, which represents connectivities and encoding restrictions among the layers, is generated, and then a critical-race-free layer encoding is performed.
4. Next-state logic synthesis: Next-state logic synthesis is carried out.

Both the proof of correctness and the basic synthesis strategy follow the paradigm used in an earlier work on burst-mode synthesis by Nowick and Dill [2], [8]. That is: (1) we show that a hazard-free implementation exists by proving that, for any legal specification, a next state table constructed with no state minimization is function-hazard-free and a logic-hazard-free realization from it is feasible; (2) we constrain the state minimization to ensure that the resulting implementation remains hazard-free.

A. Next State Assignment

We describe an algorithm for assigning next states in a primitive next-state table. In a primitive next-state table, a state is a combination of the values of primary inputs and outputs and the current specification state of a 3D machine; a next state is a combination of the next output values and the next specification state of a state. A layer of a primitive next-state table contains 2^(l+m+n) entries, each of which represents the next state of a state, where l, m, and n are the total numbers of conditional inputs, edge inputs, and outputs, respectively. A primitive next-state table consists of a set of layers; there is a one-to-one correspondence between the set of layers in a primitive next-state table and the set of specification states of the 3D machine that the table represents.

A.1 Function-Hazard-Free Next State Assignment

This algorithm assigns a next state for every reachable entry in the table, according to the extended burst-mode semantics.
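As a rough picture of the data structure, a primitive next-state table can be held as one dictionary per specification state, each with 2^(l+m+n) entries that start out unspecified. The sketch below is only an illustration; the names `empty_layer` and `primitive_table` and the use of `None` for an unassigned entry are assumptions, not the representation used by the 3D tool.

```python
from itertools import product


def empty_layer(l: int, m: int, n: int) -> dict:
    """One layer: every (conditional, edge-input, output) bit vector, next state unassigned."""
    return {"".join(bits): None for bits in product("01", repeat=l + m + n)}


def primitive_table(states, l: int, m: int, n: int) -> dict:
    """One layer per specification state, matching the one-to-one correspondence in the text."""
    return {s: empty_layer(l, m, n) for s in states}


# Hypothetical machine: no conditional inputs, 3 edge inputs, 2 outputs, 2 states.
table = primitive_table(["S0", "S1"], l=0, m=3, n=2)
entries_per_layer = len(table["S0"])
```

Entries left at `None` after the assignment correspond to the don't-care entries of the table.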

Fig. 2. Next state assignment. [Karnaugh maps of the next output X over abc and XY not reproduced; they mark cubes A, B, C, and D and the bursts x−, a+ b∗, and x+ y+.]

A Type I machine cycle that requires no conditional signal to stabilize has transitions corresponding to an input burst and a concurrent output/state burst. A Type II (III) machine cycle that requires no conditional signal to stabilize has transitions corresponding to an input burst and an output burst followed by a state burst (a state burst followed by an output burst).

Fig. 2 illustrates the next state assignments for the layer that corresponds to S0 of a fragment of an example. The Karnaugh maps represent the next output function, X, of output x during the output/state burst from S3 to S0 and the input and output/state bursts from S0 to S1. The first transition, which is a static extended burst-mode transition, corresponds to the output burst x−. The start cube A of this transition is abcXY : 00010, and the end cube B is abcXY : 00000. The second transition corresponds to the input burst a+ b∗. The start cube B of this transition is abcXY : 00000, and the end cube C is abcXY : 1x000. f([B, C)) = f(B) = 0 and f(C) = 1; thus it is a dynamic extended burst-mode transition. The third transition, which is a static extended burst-mode transition, corresponds to the output burst x+ y+. The start cube C of this transition is abcXY : 1x000, and the end cube D is abcXY : 1x011.

Fig. 3. Conditional input setup transition. [Karnaugh map of X over cab not reproduced; it marks cubes A, B, C, and D and the edges c+, c−, a+, b+, and x+.]

A 3D machine cycle that requires conditional signals to stabilize has an additional transition for setting up conditional signals. Consider an input burst ⟨c+⟩ a∗ b+ (see Fig. 3). Initially, c = ∗ and a = b = 0. Conditional signals must stabilize before any compulsory edge appears. The conditional signal c may change non-monotonically until some setup time before b+ appears. Since a is a directed don't care, it may rise before c stabilizes to 1. Therefore, the start cube A of this conditional input setup transition is cabX : x000, and the end cube B is cabX : xx00. All conditional input setup transitions are static, because outputs cannot be enabled to change until the setup is complete. The next transition corresponds to the input burst a∗ b+. The start cube C of this transition is cabX : 1x00, and the end cube D is cabX : 1x10. f([C, D)) = f(C) = 0 and f(D) = 1; thus it is a dynamic extended burst-mode transition.

Given an extended burst-mode specification G = (V, E, C, I, O, v0, cond, in, out), with C = {c_1, ..., c_l}, I = {x_1, ..., x_m}, and O = {y_1, ..., y_n}, let W be the set of conditional input bit vectors {(c_1, ..., c_l) | c_i ∈ {0, 1}, i ∈ 1, ..., l}, let X be the set of edge-input bit vectors {(x_1, ..., x_m) | x_j ∈ {0, 1}, j ∈ 1, ..., m}, and let Y be the set of output bit vectors {(y_1, ..., y_n) | y_k ∈ {0, 1}, k ∈ 1, ..., n}. A primitive next-state table is defined as T = (V, W, X, Y, δ, λ), where V is the set of specification states, and δ : V × W × X × Y → V ∪ {∗} and λ : V × W × X × Y → {0, 1, ∗}^n define the next specification state function and the next output function, respectively.

Below, we describe the next state assignment for Type I (an input burst followed by a concurrent output/state burst) and Type II (an input burst followed by an output burst followed by a state burst) machine cycles. cond(u, v), in(u), and out(u) denote the values of conditional inputs from u to v, edge-inputs in u, and outputs in u, respectively. L(u) is a symbolic code assigned to specification state u. [cond(u, v), in(u), out(u), L(u)] denotes a cube in {0, 1, ∗}^(l+m+n+r), where r is the length of state variable bit vectors. Of course, the actual value of r will not be determined until the layer encoding is done. ∗ in the place of conditional inputs is a shorthand notation, meaning that all conditional inputs are ∗. in′ is defined as: for all j ∈ 1, ..., m,

    in′_j(u) = in_j(u)   if there exists a state transition (u, v) such that in_j(v) ≠ ∗,
    in′_j(u) = ∗         otherwise.

in′_j(u) ≠ ∗ iff there exists a state transition (u, v) in which j is compulsory or a constant. Thus, in′(u) represents the allowed values of edge-inputs immediately prior to the first compulsory input edge during a state transition from u. ∗ means that both 0 and 1 are possible.

The next state assignment for an input burst is as follows. For all k ∈ 1, ..., n and for every state transition (u, v):
• Conditional input burst:
  Conditional input setup: λ_k(M) = out_k(u) and δ(M) = u, for every minterm M in [A, B], where A = [∗, in(u), out(u), L(u)]; B = [∗, in′(u), out(u), L(u)].
  Input burst: λ_k(M) = out_k(u) and δ(M) = u, for every minterm M in [A, B), where A = [cond(u, v), in′(u), out(u), L(u)]; B = [cond(u, v), in(v), out(u), L(u)].
• Unconditional input burst: λ_k(M) = out_k(u) and δ(M) = u, for every minterm M in [A, B), where A = [∗, in(u), out(u), L(u)]; B = [∗, in(v), out(u), L(u)].

Note that [A, B] = B during conditional input setup transitions because A ⊆ B. The only edge-inputs that may change during conditional input setup transitions are non-compulsory signals.

The next state assignment for output and state bursts is as follows. For all k ∈ 1, ..., n and for every state transition (u, v):
• Type I machine cycle:
  Output/state burst: λ_k(M) = out_k(v) and δ(M) = v, for every minterm M in [A, B], where A = [cond(u, v), in(v), out(u), L(u)]; B = [cond(u, v), in(v), out(v), L(v)].
• Type II machine cycle:
  Output burst: λ_k(M) = out_k(v) and δ(M) = u, for every minterm M in [A, B), where A = [cond(u, v), in(v), out(u), L(u)]; B = [cond(u, v), in(v), out(v), L(u)].
  State burst: λ_k(M) = out_k(v) and δ(M) = v, for every minterm M in [A, B], where A = [cond(u, v), in(v), out(v), L(u)]; B = [cond(u, v), in(v), out(v), L(v)].

Finally, for all the remaining entries, λ_k(M) = ∗ and δ(M) = ∗.

We prove that a function-hazard-free next state assignment exists for every output and state variable if each specification state is assigned to a unique layer of the next-state table and if the layers can be encoded so that every state burst is function-hazard-free. In Section III-C, we will show that such layer encoding is always possible as well.

Theorem 1. The above next state assignments are free of function hazards, if each specification state is assigned to a unique layer and if the layers can be encoded so that every transition crossing the layer boundary is function-hazard-free.

Proof: We prove this for Type I machine cycles. Consider a layer which corresponds to specification state u. According to the next state assignment algorithm for Type I machine cycles, next states must be assigned for the following transitions: output/state burst transitions into u, conditional input setup transitions from u, and input burst and output/state burst transitions from u. Since all these transitions are extended burst-mode transitions and only the input bursts can be dynamic transitions, each output must have the same next output throughout all the output/state burst transitions into u, the conditional input setup transition, and all the input burst transitions excluding the end cubes of the transitions. The next output values may change in the end cubes of the input burst transitions but remain at those values during the corresponding output/state transitions from u.
Therefore, to show that the next state assignment for layer u is free of function hazards, it suffices to show that no output/state burst transition from u intersects (1) an output/state burst transition into u, (2) the conditional input setup transitions from u, or (3) the other output/state burst transitions from u or the input burst transitions that enable those output/state bursts. Without loss of generality, consider an output/state burst transition from u to v.
1. Since every input burst must contain a compulsory edge, there exists j ∈ 1, ..., m such that in_j(u) ≠ in_j(v) and in_j(u) ≠ ∗ and in_j(v) ≠ ∗. Therefore, the generalized transition cube for the output/state burst from u to v, [..., in(v), ...], does not intersect any generalized transition cube for the output/state bursts into u, [..., in(u), ...].
2. Since every input burst must contain a compulsory edge, there exists j ∈ 1, ..., m such that in_j(u) ≠ in_j(v) and in_j(u) ≠ ∗ and in_j(v) ≠ ∗. Moreover, in_j(v) ≠ ∗ implies in′_j(u) = in_j(u), which means that in′_j(u) ≠ in_j(v) and in′_j(u) ≠ ∗ and in_j(v) ≠ ∗. Therefore, the generalized transition cube for the output/state burst of (u, v), [..., in(v), ...], does not intersect the conditional input setup transition from u, [..., in′(u), ...].
3. The distinguishability constraint requires that, for every pair of state transitions (u, v) and (u, w), either the conditions are mutually exclusive, or the set of compulsory edges in the input burst of (u, v) is not a subset of the set of all possible edges in the input burst of (u, w). Therefore, either there exists i ∈ 1, ..., l such that cond_i(u, v) ≠ cond_i(u, w) and cond_i(u, v) ≠ ∗ and cond_i(u, w) ≠ ∗, or there exists j ∈ 1, ..., m such that j is compulsory in (u, v) but a constant in (u, w), that is, in_j(v) ≠ in_j(u), in_j(v) ≠ ∗, in_j(u) ≠ ∗, and in_j(w) = in_j(u). In the first case, neither the input burst transition nor the output/state burst transition of (u, w) can intersect the output/state burst transition of (u, v), because the generalized transition cubes of both the input and output/state bursts of (u, w) are of the form [cond(u, w), ...] but the generalized transition cube of the output/state burst of (u, v) is of the form [cond(u, v), ...]. The same is true of the second case, because the generalized transition cube of the output/state burst of (u, v) is of the form [..., in_j(v), ...] but the generalized transition cubes of the input and output/state bursts of (u, w) are of the form [..., in_j(u), ...].
Therefore, the only possible intersections are between the input burst transitions and the corresponding output/state burst transitions. Since the end cube of an input burst transition is the same as the start cube of the corresponding output burst transition and the next states are specified the same, there is no function hazard.
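The assignment rules for an unconditional Type I machine cycle can be sketched as follows. This is a simplified illustration under stated assumptions: layer codes stay symbolic, so the output/state burst cube is taken on layers u and v only (any intermediate layers introduced by a later encoding are ignored), conditional inputs are omitted, and the function name `assign_type1_unconditional` and the table layout are hypothetical.

```python
from itertools import product

DC = "x"


def minterms(cube):
    """Enumerate every minterm contained in a cube."""
    choices = ["01" if c == DC else c for c in cube]
    return ["".join(bits) for bits in product(*choices)]


def supercube(a, b):
    """Smallest cube containing a and b: the transition cube [A, B]."""
    return "".join(p if p == q else DC for p, q in zip(a, b))


def assign_type1_unconditional(table, u, v, in_u, out_u, in_v, out_v):
    """Fill entries for one transition (u, v).

    table maps (layer, minterm over edge inputs + outputs) -> (next outputs, next state).
    """
    # Input burst: every minterm in [A, B) keeps the outputs and state of u.
    start, end = in_u + out_u, in_v + out_u
    end_points = set(minterms(end))
    for m in minterms(supercube(start, end)):
        if m not in end_points:
            table[(u, m)] = (out_u, u)
    # Output/state burst: every minterm in [A, B] takes the outputs and state of v.
    # Assumption: the cube is expanded over the two symbolic layers u and v only.
    for m in minterms(supercube(in_v + out_u, in_v + out_v)):
        for layer in (u, v):
            table[(layer, m)] = (out_v, v)
    return table


# Fragment of Fig. 2: S0 --(a+ b* / x+ y+)--> S1 over abc and XY.
table = assign_type1_unconditional({}, "S0", "S1", "000", "00", "1x0", "11")
```

In this fragment the input burst contributes the two entries with a = 0, and the output/state burst contributes the eight minterms of 1x0xx on each of the two layers; every other entry remains a don't care.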
A.2 Checking Freedom from Logic Hazards

Now we must examine whether it is possible to find a hazard-free logic implementation from the next-state table constructed using the above next state assignment algorithm. In Part I, we derived hazard-free covering requirements for both two-level AND-OR and gC, as summarized below.
1. Output/state variable y in two-level AND-OR:
(a) For every 1 → 1 transition of y, there exists a product term of y that contains the corresponding transition cube.
(b) For every 1 → 0 transition of y, all product terms of y that intersect the corresponding transition cube must also contain its start-subcube.
(c) For every 0 → 1 transition of y, all product terms of y that intersect the corresponding transition cube must also contain its end-subcube.
2. Output/state variable y in gC:
(a) For every 0 → 1 transition of y, there exists a product term of y_set that contains its end-subcube. Furthermore, all product terms of y_set that intersect the corresponding transition cube must also contain its end-subcube.
(b) For every 1 → 0 transition of y, there exists a product term of y_reset that contains its start-subcube. Furthermore, all product terms of y_reset that intersect the corresponding transition cube must also contain its start-subcube.
(c) In addition, to remove feedback delay requirements,
i. a product term of y_set must contain a transition cube associated with an output/state burst enabled by every burst that sets y;
ii. a product term of y_reset must contain a transition cube associated with an output/state burst enabled by every burst that resets y.

Transition cube [A, B] is a privileged cube for f, if f changes from 0 to 1 and B contains more than one minterm or if f changes from 1 to 0 and [A, B) contains more than one minterm. Transition cube [A, B] is a privileged cube for f_set, if f changes from 0 to 1 and B contains more than one minterm, or a privileged cube for f_reset, if f changes from 1 to 0 and A contains more than one minterm. Each maximal subcube of [A, B] needed to satisfy the covering requirements is a required cube of [A, B].

These covering requirements can be satisfied trivially for each transition individually. It is not obvious, however, whether it is possible to satisfy the requirements for every transition simultaneously. First, we will show that it is always possible to satisfy the covering requirements for every transition without violating a requirement for another transition, provided that every input burst is unconditional, that is, no level input exists in the specification. Second, we will show how level signals, in certain situations, cause dynamic hazards and then show how to handle level signals by further constraining the next state assignment.

Assume that no level input exists (and each specification state is assigned to a unique layer). Consider specification state u. In Type I machine cycles, only the input bursts can be dynamic transitions; hence, only the input burst transition cubes can be privileged cubes. By the unique entry condition imposed on the extended burst-mode specification, the end cube of every output/state burst transition into u is the same as the start cube of the input bursts. Therefore, the required cubes of outputs/state variables associated with state transitions into u, which intersect input burst transition cubes, contain the start cubes of the input bursts in u. Furthermore, the required cubes of outputs/state variables associated with state transitions out of u contain the end cubes of the input bursts. Thus, no required cubes illegally intersect privileged cubes. Similarly, we can show that no required cubes illegally intersect privileged cubes in Type II machine cycles as well.

So far, we sketched a proof that there exists a hazard-free logic implementation for an extended burst-mode specification without level signals, by constructing a function-hazard-free next-state table with each specification state assigned to a unique layer and showing that all covering requirements for logic hazard freedom can be satisfied. Below, we examine the effects of undirected don't cares on the next state assignment.

Effects of undirected don't cares on next state assignment. We examine the effects of allowing undirected don't cares, that is, how conditional signals can cause dynamic hazards, in two-level AND-OR. We then show how making state variable changes before output changes can avoid these hazards. For two-level AND-OR implementations, our synthesis method automatically selects Type III machine cycles (instead of Type I or II) for certain state transitions susceptible to dynamic hazards induced by undirected don't cares. Finally, we show that gC circuits are unaffected by the condition that causes dynamic hazards in two-level AND-OR.

We begin by analyzing an example. Fig. 4 shows the specification of an example circuit described below. Fig. 5 depicts one possible synchronous implementation and its timing. Fig. 6 shows a next-state table and the K-map for next x.

Specification: If mode bit d sampled at the rising edge of clock φ is 1, output x follows the clock for that cycle and output y remains 0. Otherwise, y follows the clock and x remains 0.

Fig. 4. Example II (specification). [State graph with states 0, 1, and 2 not reproduced; its transition labels are ⟨d+⟩ φ+ / x+, φ− / x−, ⟨d−⟩ φ+ / y+, and φ− / y−.]

Fig. 5. Example II (synchronous implementation). [Schematic and timing diagram with signals d, φ, x, and y not reproduced.]

dφ XY 00 01 11 00  00 01 10 01 00 01  01 11 10 00 10 10  xy

10 00  00 00

dφ XY 00 00 0 01 0 11 10 0

(a)

01 11 10 0 1 0 0 0 0 1 1 x

c a

d= 0

d=1

p+ X+ d− d+ φ− φ−

Fig. 8. Example II (solution represented in state graph).

φ+

c

X+ x α

1 1 Y X

φ+

φ+

Consider the input burst φ− in S1 , which causes output x to fall. The covering requirement states that no cube may intersect the transition cube [A, B] (A = dφXY : x110 and B = dφXY : x010) unless it also contains A0 , which is the same as A here. However, cube c, required to cover the output burst x+ in S0 , shown in Fig. 6, intersects cube a (part of the transition cube [A, B]), but it cannot be expanded to cover A0 — there is a dynamic hazard.

a

d= 1

3ULYLOHJHGFXEH

Fig. 6. Example II. (a) next-state table; (b) K-map of x.

d φ

Since we allow conditional signals to vary freely, d may fall, taking the machine to point α (product terms that correspond to cubes c and a are 0 and 1 respectively at α). If the next set of events is a concurrent input change d+ φ−, the machine may step through point β en route to point γ. During this change, product term a falls, and product term c glitches (0 − 1 − 0), which may propagate to the output causing a 1 − 0 − 1 − 0 glitch at the output (dynamic hazard). Note that this phenomenon can occur long after the hold time constraint is satisfied, because, according to the hold time requirement, d only needs to be stable until x becomes 1, and the cause of this glitch, a concurrent input change d+ φ−, can occur long after x+. This phenomenon would not occur if the conditional signal change is monotonic. This is a general problem that occurs when an output burst transition enabled by a conditional input burst intersects an unconditional input burst transition that enables an output to fall. Consider an output/state burst transition (To , E, F ), from z to u, enabled by a conditional input burst, where E = [cond(z, u), in(u), out(z), L(z)] and F = [cond(z, u), in(u), out(u), L(u)], and an unconditional input burst transition (Ti , A, B) from u, where A = [∗, in(u), out(u), L(u)] and B = [∗, in(v), out(u), L(u)]. Note that there are no setup transitions preceding unconditional input bursts. Assume that output x undergoes 1 → 1 and 1 → 0 transitions in [E, F ] and [A, B] respectively. If there exists i such that condi (z, u) 6= ∗, then A0 6⊆ F because the level signal i is a don’t care in A0 . If F cannot be expanded to contain A0 , then the covering requirement for the 1 → 0 transition is violated, which induces a dynamic hazard.

Fig. 7. Example II (problem).

To see how the hazard can actually occur, we examine a fragment of the state graph in Fig. 7. After the output burst x+, the machine is at point β waiting for the next input change φ−.

Solution. Our solution to avoid this dynamic hazard is to add a new layer and to move to it via a state burst before enabling outputs to change if the next input burst is unconditional and enables an output to fall. Intuitively, the trick is to “store” the conditions in the state variables. Once the conditions are stored in the state variables, the conditional signals can change freely. To eliminate hazards on the state variables, the state variables are “latched” by a “strobe” which turns on when the sampling edge arrives. The conditional signals must remain stable only until the strobe is turned off by the output changes.

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS

Fig. 9. Example II (solution). (a) Partial next-state table; (b) Partial K-maps for x and p (the privileged cube and the start cube are marked).

Figs. 8 and 9 illustrate our solution. In Fig. 8, if d = 1 when φ rises, the machine transitions to a new layer via the state burst p+ before raising x. Thus the next x entry for dφXY = 1100 in the P = 0 part of the table in Fig. 9 is specified to be 0. Now we need to specify the next x for the output burst transition x+. The trick here is that we expand the generalized transition cube as if d were allowed to change, although the environment is not allowed to change d until x is stable because of the hold time constraint. This is possible because the generalized transition cube for this output burst is on a new layer, the P = 1 part of the table. When output x stabilizes to 1, the machine is in S1. The next x for dφXY = x110, in the P = 1 part of the table, is specified to be 1 so that output x remains unchanged until the compulsory edge φ− appears. The start cube of the output burst x+ is dφPXY = x1100, and the end cube is dφPXY = x1110. The required cube (dφPXY = x11x0) for this output burst now contains the start cube of the next input burst φ−; hence, there is no violation of covering requirements.

Now examine the required cubes for the state variable p we added (see the rightmost table in Fig. 9). We require one cube (dφPXY = 11x00) to cover the state burst p+ enabled by the conditional input burst ⟨d+⟩φ+ and another cube (dφPXY = x11x0) to cover the output burst x+. Since the required cube (dφPXY = 11x00) does not intersect the start cube of the next input burst φ−, no covering requirement is violated.
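The covering check used in this example can be sketched in a few lines of Python. This is only an illustration, not the 3D tool's implementation: cubes are written as strings over 0, 1, and x, the function names are hypothetical, and the pre-fix cube c = 11x0 and privileged cube xx10 are assumptions inferred from the description of Fig. 6 rather than values quoted in the text. The post-fix cubes are the ones quoted above.

```python
# Cubes are strings over {'0', '1', 'x'}; 'x' marks a don't-care position.

def intersects(a: str, b: str) -> bool:
    """True if cubes a and b share at least one minterm."""
    return all(p == q or p == "x" or q == "x" for p, q in zip(a, b))

def contains(a: str, b: str) -> bool:
    """True if every minterm of cube b lies inside cube a."""
    return all(p == "x" or p == q for p, q in zip(a, b))

def illegal_intersection(cube: str, privileged: str, start: str) -> bool:
    """A cube violates the covering requirement if it touches the
    privileged cube without containing that cube's start cube."""
    return intersects(cube, privileged) and not contains(cube, start)

# Before the fix, variables (d, phi, X, Y). The cube c = 11x0 and the
# privileged cube xx10 (spanning A = x110 and B = x010) are assumptions.
before = illegal_intersection("11x0", "xx10", "x110")

# After the fix, variables (d, phi, P, X, Y), cubes as quoted in the text.
after_x = illegal_intersection("x11x0", "xx110", "x1110")
after_p = illegal_intersection("11x00", "xx110", "x1110")

print(before, after_x, after_p)  # True False False
```

Under these assumptions, the required cube of the original table is flagged, while both required cubes on the added layer pass the check.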

Fig. 10. Example II (circuit and timing).

We can also understand this in the physical circuit (see Fig. 10). The sampling edge φ+ generates a "strobe" (d φ Q X = 1) to latch p; once the output x rises, the strobe signal is turned off, blocking out the effects of changing d. No glitch occurs if d remains stable until x rises. Note that dφPXY = xx110 is a privileged cube (see Fig. 9b) because its start cube (A = x1110) contains two minterms. Furthermore, since A = A′ (d is an undirected don't care), no cube that does not contain A can intersect this privileged cube. In our synthesis path, this can be guaranteed by constraining the logic minimization, i.e., flagging the logic minimizer that dφPXY = xx110 is a privileged cube with A′ = x1110. This constraint results in the inclusion of literal X in the "strobe" signal d φ Q X, which is consistent with the explanation above.

Next state assignment for Type III machine cycle. Formally, the next state assignments for the state and output burst portions of a Type III machine cycle (an input burst followed by a state burst followed by an output burst) are as follows. For all k ∈ {1, . . . , n} and for every edge (u, v):
• State burst: λ_k(M) = out_k(u) and δ(M) = v, for every minterm M in [A, B), where A = [cond(u, v), in(v), out(u), L(u)] and B = [cond(u, v), in(v), out(u), L(v)].
• Output burst: λ_k(M) = out_k(v) and δ(M) = v, for every minterm M in [A, B], where A = [∗, in(v), out(u), L(v)] and B = [∗, in(v), out(v), L(v)].

Note that the output burst in a Type III machine cycle must not be empty. If the output burst is empty in the specification, a dummy output edge is added.

Next state assignment for gC implementation. As discussed above, the next state assignment for two-level SOP needs special treatment when a high output at the end of a conditional input burst is enabled to fall by an unconditional input burst in the next specification state. This problem is caused by simultaneously having to satisfy the covering requirements for a 1 → 1 transition (enabled by the output burst) and for a 1 → 0 transition (enabled by the unconditional input burst). However, this scenario never occurs in gC circuits because there are no associated required cubes for 1 → 0 transitions of f for f_set (and no required cubes for 0 → 1 transitions of f for f_reset). Typically, we use the Type I machine cycle for gC implementations, because the gC implementation is generally used in applications requiring very low latency control circuits, although using Type II or III is certainly feasible for gC implementations.

Summary. In this subsection, we showed that it is always possible to construct a function-hazard-free next-state table, with each specification state assigned to a unique layer, from every legal extended burst-mode specification. We then showed that a hazard-free logic implementation exists for every specification without level signals, by showing that all covering requirements can be satisfied. Finally, we demonstrated inherent dynamic logic hazard problems in two-level AND-OR, when undirected don't cares are introduced, and presented a solution, namely, the Type III machine cycle. We also noted that gC implementations do not suffer from the same problem and hence can be designed for the Type I machine cycle.

B. Layer Minimization

In the previous subsection, we showed that a hazard-free implementation (two-level AND-OR or gC circuit) exists if each specification state is assigned to a unique layer and if the layers can be encoded so that every transition crossing the layer boundary is function-hazard-free. In this section, we review a classical algorithm for hazard-free layer minimization [9] used in our tool. The goal of hazard-free layer minimization is to reduce the number of layers required for the next state table while insuring that a hazard-free implementation can be found for every output and state variable. This is done by merging compatible specification states, as defined below, into a common layer.

B.1 Definitions

A partially encoded total state (u, s), where u ∈ V, s ∈ W × X × Y, is a member of the set V × W × X × Y. (u, s) and (v, s) are output-compatible iff δ(u, s) = δ(v, s) or δ(u, s) = ∗ or δ(v, s) = ∗. u and v are dhf-compatible (dynamic-hazard-free compatible) if no dynamic hazard results from specifying the next states of u and v on a single layer of the next-state table. The notion of dhf-compatibility was first introduced by Nowick in [10]. u and v are compatible (u ∼ v) iff u and v are dhf-compatible and, for every s in W × X × Y,
1. (u, s) and (v, s) are output-compatible and
2. δ(u, s) = ∗ or δ(v, s) = ∗ or δ(u, s) ∼ δ(v, s).
∼ is reflexive and symmetric but not transitive. Thus ∼ is not an equivalence relation. We use the notation u ∼ v to mean that u and v are compatible and u ≁ v to mean that u and v are incompatible.

A compatible, C, is a set of specification states in which every pair of specification states u and v is compatible. A maximal compatible is a compatible which is not a proper subset of any other compatible. A set of compatibles is said to cover the extended burst-mode specification if every specification state is included in at least one compatible. An irredundant cover is a cover in which each specification state appears in exactly one compatible. (u′, v′) belongs to the implied set of (u, v) if u ∼ v implies u′ ∼ v′. A compatible C is closed if, for every pair u and v in C, every member of the implied set of (u, v) is a subset of some compatible C′ in the cover. A closed cover is a cover of which every compatible is closed. An irredundant closed cover is a closed cover in which each specification state appears in exactly one compatible.

B.2 Avoiding Dynamic Logic Hazards

Below, we examine the possibilities of dynamic hazard (1) produced by merging two states and (2) induced by state (or state/output) bursts.


Dynamic hazard produced by merging two states. We consider the first scenario, as shown in Fig. 11. If states Si and Sj are merged, layers i and j, represented as Karnaugh maps, would be superimposed, resulting in a violation of a hazard-free covering requirement for two-level SOP circuits: a required cube for output x, abXY : 11x1, illegally intersects a privileged cube, abXY : xx11.

Fig. 11. Si and Sj are not sop-dhf-compatible because placing Si and Sj on a single layer causes a required cube, abY, to intersect a privileged cube, XY, illegally.

The transition cube of every input burst for Type I machine cycles is treated as a privileged cube; since we do not yet have an encoding of layers, we must assume that any input burst may enable a state variable to toggle. Likewise, the transition cubes for output bursts for Type II and input bursts for Type III are treated as privileged cubes. Note that two specification states bridged by a Type III machine cycle are always incompatible, because the next outputs of the minterms in the start cube of the state burst are different from those in the end cube. That is, outputs are enabled to change in the end cube of the state burst in Type III machine cycles.

Dynamic hazard induced by state transition. We now consider the second scenario, namely, how a dynamic hazard, not present when every specification state is assigned to a unique layer, comes into being when specification states are merged. Fig. 12 shows an example of this scenario. S4 and S7 are output-compatible, but S0 and S7 are not output-compatible and neither are S0 and S4. S0 is assigned to layer A, and S4 and S7 to layer B. If a Type III machine cycle is used for S0 → S4, then there is no dynamic hazard problem. However, if a Type I machine cycle is used, then the required cube for the 1 → 1 transition of a1 illegally intersects the privileged cube (the transition cube for the input burst from S7 to S4). This scenario can occur only when a conditional state transition intersects an unconditional state transition (the unique entry requirement guarantees that merging of two unconditional state transitions does not induce dynamic hazard). Furthermore, this problem disappears if S4 is assigned to another layer (and S7 does not obstruct the transition from S0 to S4, as described below). Note that two-level SOP implementations employing only Type III machine cycles for conditional state transitions and gC implementations not needing 1 → 1 transition cubes as required cubes do not have this problem. However, the synthesis of Type I gC implementations with the robust covering requirement (no feedback delay) may encounter this scenario.

Fig. 12. Dynamic logic hazard in state transition. (a) Two state transitions with the same destination; (b) No covering requirement violation for Type III; (c) Covering requirement violation exists if the transition cube for 1 → 1 transition is a required cube.

Definition 2. u is dh-susceptible in (u, v) if there exists (w, v) ∈ E such that a required cube for a state/output burst from w to v can illegally intersect a privileged cube in u.

In the example shown in Fig. 12, S7 is dh-susceptible in S7 → S4, because the required cube for a1 during the state/output burst from S0 to S4 can illegally intersect a privileged cube in S7.

Definition 3. u and v are sop-dhf-compatible (sum-of-products dynamic-hazard-free compatible) if, for all y ∈ O,
1. no required cube of y in u illegally intersects a privileged cube of y in v and vice versa, when the next states of u and v are specified on a single layer of the next-state table;
2. if (u, v) ∈ E, u is not dh-susceptible in (u, v); and
3. if (v, u) ∈ E, v is not dh-susceptible in (v, u).

Definition 4. u and v are gc-dhf-compatible (generalized C-element dynamic-hazard-free compatible) if, for all y ∈ O, all three conditions above are satisfied for both y_set and y_reset, when the next states of u and v are specified on a single layer of the next-state table.

Condition 2 states that, if (u, v) ∈ E, for u and v to be compatible, no required cubes for state/output bursts into v can illegally intersect privileged cubes in u. An obvious solution is to insure that no transitions into v intersect u. We accomplish this in two steps: (1) declare that u and v are dhf-incompatible; (2) constrain the layer encoding so that no layer transitions into v cross u. Similarly, condition 3 applies to the case (v, u) ∈ E.


B.3 Layer Minimization Algorithm

The general state minimization problem is to find an irredundant closed cover of minimum cost; e.g., the cost is the cardinality of the cover if the objective is to minimize the number of states. Our layer minimization algorithm¹ is similar to a classical technique [9] used for state minimization and proceeds as follows.

1. For every pair of specification states u and v, the algorithm determines the compatibility of u and v and the implied set of (u, v).
2. Find a set of maximal compatibles.
3. Find an irredundant closed cover using a variation of Petrick's method [11]. Our heuristic proceeds as follows: Sort the maximal compatibles into an ordered list. Initialize the cover as a null set. Repeat
(a) Of the specification-states not yet covered by any compatibles in the cover under construction, find the first one covered by no more than one compatible. Select this compatible. If no such specification-state exists, then select the first compatible in the list. Add the selected compatible to the cover and delete it from the list.
(b) Remove the specification-states covered by the compatible selected in step (a) from the remaining compatibles in the list, so that no specification-state is covered by more than one compatible in the final cover.
(c) Sort the compatibles in the list.
(d) Delete from the list the compatibles that are subsets of other compatibles.
until all specification-states are covered.

The outcome of the layer minimization is a reduced next-state table, T′ = (V′, W, X, Y, δ′, λ′), where V′ is the set of layers, and δ′ : V′ × W × X × Y → V′ ∪ {∗} and λ′ : V′ × W × X × Y → {0, 1, ∗}^n define the next layer function and the next output function respectively.

Example: ISEND Type II. The example shown in Fig. 13a is used to describe the layer minimization algorithm. For every pair of specification states, u and v, in the extended burst-mode specification, the compatibility of u and v is decided (see Fig. 13b). Note that in this example no pair of compatible specification states has a corresponding implied set. The following is the list of maximal compatibles computed:

A (0, 1, 2, 8)    B (0, 1, 3, 4)    C (0, 1, 4, 8)
D (0, 2, 7, 8)    E (0, 4, 5)       F (0, 4, 6)
G (0, 4, 7, 8)

To find an irredundant closed cover, the maximal compatibles are first sorted into a list: {A, B, C, D, E, F, G}. The first compatible removed from the list and added to the cover is (0, 1, 3, 4), because state 3 is the first specification-state covered by a single compatible, (0, 1, 3, 4). After removing redundancies and re-sorting, the list becomes: {(2, 7, 8), (5), (6)}. The completed irredundant closed cover is {(0, 1, 3, 4), (2, 7, 8), (5), (6)}; the final symbolic layer assignment is:

B (0, 1, 3, 4)    D (2, 7, 8)
E (5)             F (6)

Fig. 13. (a) Layer assignment of ISEND controller; (b) Compatibility table.

¹ The layer minimization algorithm used in the original version of the 3D tool was a heuristic. We replaced it with a classical algorithm to accommodate a more complex compatibility check.
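Steps (a) through (d) of the cover-selection heuristic can be sketched as follows, using the maximal compatibles of the ISEND example as input. This is a minimal sketch and not the 3D tool's code: the function name `select_cover` is hypothetical, the sort order (larger compatibles first, ties broken lexicographically) is an assumption because the text does not state the ordering criterion, and closure is not checked, which is harmless here only because no pair in this example has an implied set.

```python
def select_cover(max_compatibles, states):
    """Greedy cover selection following steps (a)-(d) of the heuristic.
    Assumes every state lies in at least one compatible; closure of the
    resulting cover is NOT checked in this sketch."""
    def order(comps):
        # Assumed ordering: larger compatibles first, ties lexicographic.
        return sorted(comps, key=lambda c: (-len(c), sorted(c)))

    remaining = order(frozenset(c) for c in max_compatibles)
    cover, covered = [], set()
    while covered != set(states):
        chosen = None
        # (a) first uncovered state that lies in exactly one compatible
        for s in sorted(set(states) - covered):
            holders = [c for c in remaining if s in c]
            if len(holders) == 1:
                chosen = holders[0]
                break
        if chosen is None:
            chosen = remaining[0]
        cover.append(chosen)
        covered |= chosen
        # (b) strip the newly covered states from the other compatibles
        stripped = {c - chosen for c in remaining}
        stripped.discard(frozenset())
        # (c) re-sort and (d) drop compatibles that are subsets of others
        remaining = order(c for c in stripped
                          if not any(c < d for d in stripped))
    return [tuple(sorted(c)) for c in cover]

isend = [(0, 1, 2, 8), (0, 1, 3, 4), (0, 1, 4, 8), (0, 2, 7, 8),
         (0, 4, 5), (0, 4, 6), (0, 4, 7, 8)]
print(select_cover(isend, range(9)))
# [(0, 1, 3, 4), (2, 7, 8), (5), (6)] up to tuple formatting
```

With the assumed ordering, the sketch selects (0, 1, 3, 4) first because state 3 is the first state held by a single compatible, and then arrives at the same four-layer cover as the example.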


C. Layer Encoding

We describe a simple algorithm,² derived from [13], for critical-race-free encoding of layers. In the previous section, we presented an algorithm to construct the layers of the next-state table. At the end of the layer construction, we have a symbolic layer code for each state. We can then extract the constraints on the layer encoding in the form of a layer diagram (see Fig. 15). We use this diagram to perform the critical-race-free layer encoding. The objective of the layer encoding is to generate a critical-race-free layer assignment with few state bits.

C.1 Layer Diagram

A layer diagram is an undirected graph. Each edge in a layer diagram is labeled with a (possibly empty) set of pairs of vertices. The vertices represent layers of the next-state table, the edges represent transitions between the layers, and the labels on an edge correspond to potential conflicts during transitions between the layers. A conflict is a hazard induced by a layer transition. If there is a transition from layer C to layer D, an undirected edge is drawn between them. The next-state table entries of all states with the same combination of primary inputs and outputs as the initial state (c) and the final state (d) of the layer transition from C to D are checked for possible conflicts. If the next state of e, a state with the same xy-position as c and d, has already been specified, then the layer E containing e is considered a potential conflict layer in assigning codes to C and D, unless the next layer of e is D and E is not dh-susceptible during the transition from E to D (see Section III-B.2). The edge between C and D is then labeled with E. Furthermore, if the next state of e is in another layer, say F, then layer E, layer F, and all the transient layers between E and F become potential conflict layers. The edge between C and D is labeled with [E, F] in this case.

Given a reduced next-state table, T′ = (V′, W, X, Y, δ′, λ′), where V′ is the set of layers, and δ′ and λ′ define the next layer function and the next output function respectively, assume that C, D, E, F ∈ V′ are unique layers, s is a member of W × X × Y, and δ′(C, s) is D. Formally, a layer transition from E to F is said to be a potential conflict transition for the transition from (C, s) to D iff δ′(E, s) = F. Furthermore, layer E is said to be a potential conflict layer for the transition from (C, s) to D if δ′(E, s) = E and λ′(E, s) ≠ ∗.

So far, we have assumed that C, D, E, and F are unique. Now consider the degenerate cases: C = E and D = F. In the first case, since δ′(C, s) cannot be both D and F, if D ≠ F, it is impossible for [C, F] to be a potential conflict transition for the transition from C to D. In the second case, the transition from (E, s) to D is not a potential conflict transition for the transition from (C, s) to D, because they merge at (D, s), unless E is dh-susceptible. E is a potential conflict layer for the transition from (C, s) to D if δ′(E, s) = D, λ′(E, s) ≠ ∗, and E is dh-susceptible.

Fig. 14. Layer encoding example (two code assignments: one with no obstruction, one with obstruction).

Let the codes assigned to layers C and D be c_C and c_D. A potential conflict layer E, if assigned the code c_E, is said to obstruct the transition from C to D (or from D to C) iff c_E differs from c_C and c_D only in bit positions whose values change during the transition from C to D, that is, c_C + (c_C ⊕ c_D) = c_E + (c_C ⊕ c_D), where + and ⊕ denote bitwise OR and XOR. For example, the potential conflict layer E obstructs the transition from C (c_C = 001) to D (c_D = 010) if c_E is 000 or 011, but does not if c_E is 100.

² Fuhrer et al. [12] showed that it is possible to perform state (layer) encoding so that hazard-free logic minimization can be carried out over all possible legal encodings of states (layers) optimally. We are currently investigating the insertion of this optimal state encoding algorithm into the synthesis procedure.

We can generalize the definition of obstruction to the case in which the transition between C and D has a potential conflict with a transition [E, F]. The potential conflict transition [E, F] is said to obstruct the transition from C to D (or from D to C) iff E and F are assigned the codes c_E and c_F, and c_E and c_F differ from c_C and c_D only in bit positions whose values change during the transition from C to D or during the transition from E to F, that is, c_C + c_X = c_E + c_X, where c_X = (c_C ⊕ c_D) + (c_E ⊕ c_F). Note that if [E, F] is labeled on the edge between C and D, then [C, D] is labeled on the edge between E and F ([C, D] and [E, F] are said to form a dichotomy in [13]). In Fig. 14, [E, F] does not obstruct [C, D] and vice versa, as long as c_C and c_D share the same value and c_E and c_F share the value opposite to c_C's in at least one bit position (the MSB in this example). The potential conflict transition [E, F] obstructs the transition from C (c_C = 0010) to D (c_D = 0100) if c_E and c_F are 0001 and 1000 respectively, but does not if c_E and c_F are 1001 and 1100.

Fig. 15. Layer diagram. Labeled edges: A–E: H; A–I: H; D–E: [G, H], [J, I]; G–H: [D, E], [J, I]; I–J: [D, E], [G, H].
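The two obstruction conditions translate directly into bitwise operations. The sketch below, with hypothetical function names and layer codes represented as Python integers, evaluates both conditions on the numeric examples given in the text.

```python
def layer_obstructs(c_c: int, c_d: int, c_e: int) -> bool:
    """Layer E obstructs [C, D] iff c_E agrees with c_C on every bit
    that does not change during the transition from C to D."""
    changing = c_c ^ c_d
    return (c_c | changing) == (c_e | changing)

def transition_obstructs(c_c: int, c_d: int, c_e: int, c_f: int) -> bool:
    """Transition [E, F] obstructs [C, D] iff c_E agrees with c_C on
    every bit that changes in neither transition."""
    c_x = (c_c ^ c_d) | (c_e ^ c_f)
    return (c_c | c_x) == (c_e | c_x)

# Layer example: c_C = 001, c_D = 010.
print([layer_obstructs(0b001, 0b010, e) for e in (0b000, 0b011, 0b100)])
# [True, True, False]

# Transition example: c_C = 0010, c_D = 0100.
print(transition_obstructs(0b0010, 0b0100, 0b0001, 0b1000))  # True
print(transition_obstructs(0b0010, 0b0100, 0b1001, 0b1100))  # False
```

In the non-obstructing assignment, the most significant bit is 0 for C and D and 1 for E and F, which is exactly the separating bit position described above.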

Suppose that layer transition [C, D] is labeled [E, F] and layer transition [E, F] is labeled [C, D]. If C ≠ E, C ≠ F, D ≠ E, and D ≠ F, we can find codes with an arbitrary Hamming distance between C and D and between E and F. We simply select a bit position and assign 0 in that bit position for C and D and 1 for E and F; the remaining bit positions are used to make the codes for C, D, E, and F unique. Therefore, it is always possible to encode layers so that every state burst is critical-race-free.

C.2 Layer Encoding Algorithm

Our goal is to encode the layers so that no layer or transition between layers obstructs the transitions (edges) on which it is labeled as a potential conflict. The layer encoding algorithm begins by resolving all potential conflicts. Initially, a code bit is reserved for each labeled layer transition so that the potential conflict layers are assigned a value different from the source and destination layers of the layer transition in that bit position. For example, 5 bits are allotted to encode the layers of the layer diagram in Fig. 15 because there are 5 labeled transitions, i.e., 5 pairs of dichotomies. All redundant dichotomies are then removed and symbolic code values are assigned, as shown in Table I. Each column in Table I represents a dichotomy (a transition and its potential obstructions) and each row represents a layer to be encoded. v_i is 0 or 1, v̄_i is the complement of v_i, and − is a don't care.

TABLE I
DICHOTOMY TABLE.

Transition      [A, E]   [A, I]   [D, E]           [G, H]
Obstructions    H        H        [G, H], [I, J]   [I, J]
A               v1       v2       −                −
B               −        −        −                −
C               −        −        −                −
D               −        −        v3               −
E               v1       −        v3               −
F               −        −        −                −
G               −        −        v̄3               v4
H               v̄1       v̄2       v̄3               v4
I               −        v2       v̄3               v̄4
J               −        −        v̄3               v̄4

Compatible columns of the dichotomy table are merged to minimize the number of state bits. Two columns i and j are compatible iff one of the following conditions is true:
• in every row, the bit positions i and j have the same value or at least one is a don't care;
• in every row, the bit positions i and j have the opposite value or at least one is a don't care.

A set of columns of a dichotomy table that can be merged is said to be a compatible. A maximal compatible is a compatible which is not a proper subset of any other compatible. For example, columns 1, 2, and 4 of Table I constitute a maximal compatible. Minimizing the number of columns is then posed as a covering problem (as in the state minimization case): cover all the dichotomies with as few maximal compatibles as possible. A dichotomy table with the minimum number of columns is said to be a reduced dichotomy table. Then the symbolic codes are replaced with binary values (0 for v_i and 1 for v̄_i). Table II shows the resulting reduced dichotomy table.

TABLE II
REDUCED DICHOTOMY TABLE.

Layer   Column 1   Column 2
A       0          −
B       −          −
C       −          −
D       −          0
E       0          0
F       −          −
G       1          1
H       1          1
I       0          1
J       0          1
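The column-compatibility test and the merge step can be sketched as below. This is an illustrative sketch with hypothetical names, not the tool's code. It assumes the symbolic entries of Table I are encoded with 0 for v_i, 1 for its complement, and None for a don't care; the placement of complements follows the dichotomies stated in the text (obstructing layers take the complemented value).

```python
N = None  # don't care

# Rows are layers A..J; 0 stands for v_i, 1 for its complement.
col1 = [0, N, N, N, 0, N, N, 1, N, N]  # [A, E] against H
col2 = [0, N, N, N, N, N, N, 1, 0, N]  # [A, I] against H
col3 = [N, N, N, 0, 0, N, 1, 1, 1, 1]  # [D, E] against [G, H], [I, J]
col4 = [N, N, N, N, N, N, 0, 0, 1, 1]  # [G, H] against [I, J]

def relation(a, b):
    """+1 if the columns agree on every row where both are specified,
    -1 if they disagree on every such row, None if neither holds."""
    shared = [(x, y) for x, y in zip(a, b) if x is not None and y is not None]
    if all(x == y for x, y in shared):
        return 1
    if all(x != y for x, y in shared):
        return -1
    return None

def merge(a, b):
    """Merge two compatible columns, complementing b when it is
    opposite-compatible with a."""
    rel = relation(a, b)
    if rel is None:
        raise ValueError("columns are not compatible")
    if rel == -1:
        b = [None if y is None else 1 - y for y in b]
    return [x if x is not None else y for x, y in zip(a, b)]

merged = merge(merge(col1, col2), col4)
print(merged)                 # [0, None, None, None, 0, None, 1, 1, 0, 0]
print(relation(col2, col3))   # None: columns 2 and 3 cannot be merged
```

Merging columns 1, 2, and 4 reproduces the first column of Table II, and column 3 is carried over unchanged as the second column.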

Before completing the code assignment, the algorithm determines the total number of state bits required to uniquely encode each layer, which may exceed the number of columns in the reduced dichotomy table. A subcode of a code c_A is any code that has the same value as c_A in every bit position in which c_A is not a don't care. For example, c_E (00) is a subcode of c_A (0−) but c_D (−0) is not. In order to differentiate c_A from all of its subcodes in the table (c_E, c_I, and c_J), at least 2 bits (⌈log2(3 + 1)⌉) are needed; hence we need at least 3 bits to encode c_A (2 bits to differentiate c_A from c_E, c_I, and c_J, and 1 preassigned bit). The minimal code length required for a partially encoded layer (a layer with don't care bit positions) A with the code assignment c_A in the reduced dichotomy table is then

max(n_c, n_c − n_d + ⌈log2(n_s + 1)⌉),

where n_c is the number of columns of the reduced dichotomy table, n_d is the number of don't care bit positions in c_A, and n_s is the number of subcodes of c_A in the reduced dichotomy table. n_c − n_d is the number of preassigned bits, and ⌈log2(n_s + 1)⌉ additional bits are needed to differentiate c_A from all of its subcodes.

If the longest of the minimal code lengths exceeds the number of columns of the reduced dichotomy table, the algorithm pads the table with columns of don't cares until the number of columns equals the longest of the minimal code lengths (4 in our example), which is the lower bound on the number of columns that needs to be added. Note that the upper bound is no greater than ⌈log2 n_l⌉, where n_l is the total number of layers. In all of the examples we synthesized, the lower bound was sufficient.

The number of available codes for a partially encoded layer A is 2^(n_d) − n_s, where n_s is the number of subcodes of c_A in the padded reduced dichotomy table and n_d is the number of don't care bit positions in c_A. The remaining task of the algorithm is to assign an available code to each partially encoded layer while keeping the number of available codes for the other partially encoded layers greater than 0. If no code can be found that keeps the number of available codes for other layers greater than 0, then the algorithm pads another column. The final code assignment for the example is shown in Table III.

TABLE III
LAYER CODE ASSIGNMENT.

Layer   Code
A       0001
B       1000
C       0011
D       0010
E       0000
F       0110
G       1100
H       1101
I       0100
J       0101
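The minimal code length formula can be checked against the reduced dichotomy table with a short sketch. The names below are hypothetical, codes are strings over 0, 1, and - (don't care), and the table contents are those of Table II. The sketch relies on the identity that, for a non-negative integer n, ⌈log2(n + 1)⌉ equals `n.bit_length()`, which avoids floating-point logarithms.

```python
# Reduced dichotomy table (Table II): one two-bit partial code per layer.
REDUCED = {"A": "0-", "B": "--", "C": "--", "D": "-0", "E": "00",
           "F": "--", "G": "11", "H": "11", "I": "01", "J": "01"}

def is_subcode(code: str, of: str) -> bool:
    """True if `code` matches `of` in every position where `of` is specified."""
    return all(o == "-" or c == o for c, o in zip(code, of))

def min_code_length(layer: str, table: dict) -> int:
    """max(n_c, n_c - n_d + ceil(log2(n_s + 1))) for the given layer."""
    code = table[layer]
    n_c = len(code)                      # columns in the reduced table
    n_d = code.count("-")                # don't-care positions in the code
    n_s = sum(1 for other, oc in table.items()
              if other != layer and is_subcode(oc, code))
    return max(n_c, n_c - n_d + n_s.bit_length())

print(min_code_length("A", REDUCED))                     # 3
print(max(min_code_length(l, REDUCED) for l in REDUCED))  # 4
```

Layer A needs 3 bits, as in the text, and the longest requirement (4 bits) comes from the fully unspecified layers B, C, and F, for which every other layer's code counts as a subcode.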

D. Combinational Logic Synthesis

The 3D synthesis tool generates required cubes, off-set minterms, and privileged pairs (privileged cubes and corresponding start/end cubes) for each primary output and internal state variable function. For gC implementations, f_set and f_reset are treated as two separate functions. Off-set minterms of f_set are off-set minterms of f, and off-set minterms of f_reset are on-set minterms of f.

Logic minimization is performed using exact algorithms for hazard-free logic, implemented in an automated logic minimizer originally developed by Nowick and Dill [10] and further enhanced by Fuhrer et al. [12]. This hazard-free logic minimizer, using a variation of the Quine-McCluskey algorithm [14], finds an optimal hazard-free cover of required cubes using dynamic-hazard-free implicants, i.e., implicants that do not illegally intersect privileged cubes.

IV. EXPERIMENTAL RESULTS

The synthesis procedure is completely automated, other than adding feedback delay when necessary.³ The 3D synthesis tool transforms a textual extended burst-mode specification into a next-state table with symbolic layer assignment, derives a layer diagram, performs a critical-race-free layer encoding, and generates required cubes, off-set minterms, and privileged pair sets for outputs and state variables. The logic minimization is performed by the combinational synthesis tool discussed in Section III-D. Numerous experiments have shown that the 3D synthesis tool produces results that have smaller area and are much faster than competing asynchronous synthesis methods.

There are two measures that characterize the performance of the 3D machines: latency and cycle time. The latency is the delay from the last terminating input edge of an input burst to the last edge of the resulting output burst. The cycle time is the delay required to avoid circuit malfunction from the last terminating input edge of an input burst to the first compulsory input edge of the next input burst.

A. Examples Using Two-Level AND-OR

Experimental results are shown in Table IV. All the examples in this table are designed for the Type II machine cycle. The latencies and the cycle times were evaluated using a 0.8 µm CMOS standard cell library, developed for the Verilog simulator by the Torch group at Stanford University [15]. The library cells were characterized using the SPICE simulator under military worst-case conditions (4.5 V power supply, 125 °C) and derated for the nominal case (5 V, 25 °C). The two-level logic equations produced by the 3D tool were mapped to the standard cell library either manually or using a hazard-nonincreasing technology mapper [16].

The 3D design methodology makes timing assumptions (feedback delay requirements, MIC fundamental-mode constraints, and setup/hold time requirements) during synthesis and thus requires a post-synthesis calculation of what these requirements are. Recently, the 3D timing analyzer [17] has been developed by Chakraborty et al. to compute these requirements in a "manufacturer's databook" fashion. However, the latency and cycle time entries in Table IV were obtained by simulation runs using Mentor QuickSim. Each parameter represents the worst-case scenario. For example, for chu-ad-opt, both the worst-case latency and the worst-case cycle time are 1.2 ns. Note that this circuit still requires fundamental-mode timing constraints on its environment, because not all outputs take the worst-case delay to switch.

³ Chakraborty's timing analyzer computes required feedback delays automatically for static XBM circuits. We expect to extend this tool for gC circuits as well.


The largest example we have evaluated is a pipelined SCSI bus controller, which implements an asynchronous data transfer protocol. The extended burst-mode specification of the asynchronous data transfer protocol of the pipelined SCSI bus controller has 45 states and 62 state transitions, 10 primary inputs, and 5 primary outputs. The 3D synthesis tool added 5 internal state variables, and the implementation required 108 product terms and 378 literals. The Verilog simulation results for the output latency and the cycle time were 3.3ns and 6.1ns. Comparison to locally-clocked and UCLOCK methods. In Table V, we compare the 3D and two competing methods (locally-clocked method [2] and UCLOCK method [18]) for 6 controller implementations4 including two large published examples, dramc [19] and cache-ctrl [20]. It is interesting to compare to these methods, because (1) both locally-clocked and UCLOCK methods use the burst-mode as the user-level specification formalism, which means that every machine synthesized using these methods can be re-implemented in 3D, and (2) the locally-clocked method has been used for many practical large scale controller designs. The 3D results are based on two-level AND-OR implementations; all are designed for Type I machine cycle. The area reduction is mainly due to the lack of a local clock (when compared to locally-clocked machines) and fewer state variables (when compared to UCLOCK machines). We did not list delay comparisons because it is difficult to compare the delay estimates without simulation results.5 The output latencies of 3D machines appear to be significantly shorter than those of the corresponding locally-clocked machines for all cases shown in Table V (although the precise amount cannot be deduced without simulations). This reduction is a result of greater degrees of freedom in logic minimization due to the greater number of inputs to logic functions, with the addition of primary outputs as inputs to logic functions. 
The output latencies of 3D machines are somewhat shorter than or on par with those of UCLOCK machines. Finally, although 3D machines using the Type II or III machine cycle require longer cycle times than UCLOCK machines, Type I machines do not suffer from the same problem in most cases.

Literal and product counts for locally-clocked machines do not include the latch overhead. We also ignore area overhead due to feedback delays, which we assume is negligible. In a typical 1µm gate array implementation, we estimate an additional saving of about 2ns in output latency over the locally-clocked implementation due to the lack of latches. If the locally-clocked implementations use dynamic latches, then the saving would be about 0.7ns. The cost of latch removal in the 3D machines is an increase in state variable logic and greater constraints on state encoding, since the encodings must be critical-race-free.

Footnote 4: No extended burst-mode machine implementations are compared because both the locally-clocked synthesis and UCLOCK synthesis only work for original burst-mode machines.
Footnote 5: Only the literal counts are available in the literature for locally-clocked and UCLOCK methods.

B. Examples Using gC

Initially, we synthesized the gC circuits without the robust covering requirement. Of the 20 gC examples synthesized, 14 required no decomposition, i.e., every output and state variable can be mapped to a single generalized C-element with no series stack of more than 4 transistors. Of the 6 examples that required decomposition (the ones with * next to their names in Table VI), 3 had an output that required 5 or 6 transistors in series but with no more than 3 trigger signals. For those circuits, the performance of the implementations without decomposition would not be significantly different from that of the corresponding decomposed ones. In other words, only the 3 remaining examples truly required decomposition.

The total literal counts were considerably lower than those of the two-level solutions. The literal counts in Table VI actually represent the total number required for both N and P stacks; thus the actual transistor counts (excluding sustainers and reset logic) are equal to the literal counts, whereas for two-level solutions the transistor counts are more than double the literal counts shown in the table.

We then modified the tool to incorporate the robust covering requirement and re-synthesized the same 20 circuits. The results are shown in the columns with the heading gC*. Out of the 20 examples, 14 produced results with the same number of literals and 2 with one more literal than the original circuit. Of these 16, 7 produced more robust results, i.e., the original circuit required feedback delays but the new one did not. Of the remaining 4, 3 examples produced results with significantly more literals (up to 35% more) than the original circuit (without requiring an additional state variable). Finally, the selmerge circuit required an additional state variable (with 28 literals) but one of the outputs required 10 fewer literals as a result of adding a state variable. Thus the output latency actually improved somewhat with little effect on cycle time. We feel that, in practice, it would be desirable to have the tool compute both solutions and let the designers choose the solution they prefer.
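The literal-count claims above can be checked directly against Table VI. The following sketch tallies how many examples keep the same literal count under the robust covering requirement, how many gain exactly one literal, and the relative overhead of the rest. The counts are transcribed from the gC and gC* literal columns of Table VI as printed; the names `GC_LITERALS` and `overhead_percent` are hypothetical, and the grouping is only an illustration, not part of the 3D tool.

```python
# (gC literals, gC* literals) per example, transcribed from Table VI as printed.
GC_LITERALS = {
    "iccad93ex": (8, 8), "edac93ex": (13, 13), "condtest": (18, 19),
    "dff": (16, 16), "q42": (15, 15), "select2ph": (24, 24),
    "selmerge2ph": (52, 70), "ring-counter": (160, 160),
    "binary-counter": (56, 56), "pe-send-ifc": (57, 57),
    "pe-rcv-ifc": (54, 54), "dramc": (38, 38), "stetson-p3": (8, 8),
    "fifocellctrl": (10, 11), "scsi-targ-send": (47, 53),
    "scsi-init-send": (46, 62), "diffeq-ALU1": (38, 38),
    "diffeq-ALU2": (89, 102), "diffeq-MUL1": (32, 32),
    "diffeq-MUL2": (13, 13),
}


def overhead_percent(gc, gc_star):
    """Relative literal increase of the robust (gC*) circuit over the original gC circuit."""
    return 100.0 * (gc_star - gc) / gc


# Examples whose literal count is unchanged by the robust covering requirement.
same = [n for n, (gc, star) in GC_LITERALS.items() if star == gc]
# Examples that need exactly one additional literal.
plus_one = [n for n, (gc, star) in GC_LITERALS.items() if star == gc + 1]
# Remaining examples, with their relative overhead in percent.
larger = {n: overhead_percent(gc, star)
          for n, (gc, star) in GC_LITERALS.items() if star > gc + 1}

print(f"unchanged: {len(same)}, one more literal: {len(plus_one)}")
for name, pct in sorted(larger.items(), key=lambda kv: kv[1]):
    print(f"{name:15s} +{pct:.1f}%")
```

Note that the last group includes selmerge2ph, which the text treats separately because its increase comes from an added state variable rather than from the covering requirement alone.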
If a robust solution exists without adding significant feedback delays, then it should be selected. Otherwise, feedback delays can be added, after determining the safe bounds using Chakraborty’s 3D timing analyzer. The worst-case performance of the gC circuit can be approximated by an equivalent OR gate (taking the output inverter of the gC into account) with the number of fanins equal to the number of transistors in the longest series stack. According to a series of post-layout SPICE simulations [21], gC circuits (for diffeq examples) are at least 30% faster than equivalent two-level circuits.

TABLE IV. EXPERIMENTAL RESULTS FOR TWO-LEVEL SOP CIRCUITS.

                 Specification             Type II Implementation
                                  Primary  State  Prod.                   Cycle
Name             States  Trans.   In  Out   Vars  Terms  Lits  Latency     Time
chu-ad-opt            4       4    3    3      0      4    11    1.2ns    1.2ns
vanbek-ad-opt         3       3    3    3      0      4     9    1.3ns    1.3ns
dme                   8      10    3    3      2     11    29    2.0ns    3.1ns
dme-fast              8      10    3    3      2     12    29    1.7ns    2.9ns
alloc-outbound        8       9    4    3      2     12    27    1.8ns    3.0ns
mp-forward-pkt        4       4    3    4      0      6    14    1.4ns    1.4ns
nak-pa                6       6    4    5      1     10    17    1.7ns    2.5ns
pe-send-ifc          11      14    5    3      2     21    60    2.3ns    3.7ns
pe-rcv-ifc           12      15    4    4      2     26    72    2.1ns    3.8ns
ram-read-sbuf         8       8    5    5      0     13    22    1.7ns    1.7ns
rcv-setup             6       8    3    2      0      3     8    1.4ns    1.4ns
sbuf-ram-write        6       6    5    5      1     18    41    1.9ns    3.2ns
sbuf-read-ctl         7       8    3    3      1      8    17    1.5ns    2.6ns
sbuf-send-ctl         8       9    3    3      2     14    32    2.1ns    3.3ns
sbuf-send-pkt2        7      10    4    2      2     11    30    2.2ns    3.5ns
sendr-done            3       3    2    1      1      4     8    1.0ns    2.4ns
sic-example           6      12    2    1      1      6    13    1.5ns    2.5ns
dram-controller      12      14    7    6      1     20    46    2.2ns    2.2ns
scsi-tsend-bm        11      13    5    4      2     27    58    2.3ns    3.8ns
scsi-trcv-bm         10      12    5    4      2     24    55    2.3ns    3.4ns
scsi-isend-bm        10      12    5    4      2     25    62    2.5ns    3.9ns
scsi-tsend-csm       10      11    5    4      2     24    44    2.2ns    3.3ns
scsi-trcv-csm         8       9    5    4      2     23    42    2.3ns    3.6ns
scsi-isend-csm        8       9    5    4      2     24    42    1.9ns    3.4ns
pscsi-isend           9      11    4    3      3     28    80    2.9ns    4.4ns
pscsi-ircv            6       7    4    3      2     14    31    1.7ns    3.2ns
pscsi-tsend          10      12    4    3      3     26    70    2.2ns    4.3ns
pscsi-trcv            6       7    4    3      1     14    25    2.2ns    2.6ns
pscsi-tsend-bm       10      12    4    4      3     23    60    2.0ns    3.7ns
pscsi-trcv-bm         7       9    4    4      2     21    47    2.0ns    3.8ns
pscsi                45      62   10    5      5    108   378    3.3ns    6.1ns

TABLE V. COMPARISONS TO LOCALLY-CLOCKED AND UCLOCK METHODS.

Locally-clocked (LC) versus 3D:

                   Literals                Product terms             Area
                   Output      Total       Output      Total    Reduction
Name               LC    3D    LC    3D    LC    3D    LC    3D  Estimate
pe-send-ifc        47    42    79    65    15    14    25    24       18%
dramc              51    40    64    50    20    17    23    21       22%
cache-ctrl        720   494   886   532   215   161   245   172       40%

UCLOCK (UC) versus 3D:

                   Literals                Product terms             Area
                   Output      Total       Output      Total    Reduction
Name               UC    3D    UC    3D    UC    3D    UC    3D  Estimate
pe-send-ifc        54    42   103    65    16    14    34    24       37%
dme                12    14    20    25     4     6     9    11      −25%
dme-fast           20    19    34    30     8     7    16    13       12%
sbuf-read-ctl      13    12    20    15     5     5     9     7       25%
chu-ad-opt         11    11    14    11     4     4     6     4       21%

TABLE VI. COMPARISON OF TWO-LEVEL SOP AND gC CIRCUITS.

                   Spec      State vars        Literals
Name               S   I/O   2L   gC  gC*    2L   gC  gC*   gC* more robust?
iccad93ex          3   2/2    2    0    0    20    8    8   No change
edac93ex           4   3/2    2    1    1    32   13   13   No change
condtest           4   3/2    2    1    1    30   18   19   Yes
dff                4   2/2    2    0    0    28   16   16   No change
q42                4   2/2    1    1    1    27   15   15   Yes
select2ph          4   2/2    2    0    0    42   24   24   No change
selmerge2ph        8   3/2    2    1    2    89   52   70   Yes
ring-counter*      8   1/2    1    1    1    45  160  160   No change
binary-counter    32   1/4    3    3    3    94   56   56   No change
pe-send-ifc*      11   5/3    2    2    2    90   57   57   No change
pe-rcv-ifc*       12   4/4    3    2    2    84   54   54   No change
dramc*            12   7/6    1    0    0    71   38   38   No change
stetson-p3         8   4/2    1    0    0    16    8    8   No change
fifocellctrl       3   2/2    1    1    1    11   10   11   Yes
scsi-targ-send*    9   5/3    3    3    3    97   47   53   Yes
scsi-init-send     9   5/3    3    3    3    83   46   62   -
diffeq-ALU1        7   3/5    2    2    2    43   38   38   No change
diffeq-ALU2*      14   5/7    3    2    2   141   89  102   -
diffeq-MUL1        4   3/3    1    1    1    42   32   32   No change
diffeq-MUL2        3   3/3    0    0    0    15   13   13   No change

V. CONCLUSION

In Parts I and II, we discussed all aspects of an asynchronous controller design method: user-level specification formalism, hazard-free logic synthesis theory and its application to sequential synthesis, and automated sequential synthesis algorithms. A summary of the results is as follows:

Extended-burst-mode design style: We introduced a new specification formalism called extended-burst-mode. We showed that a wide range of practical circuits, both asynchronous and synchronous, can be specified in extended-burst-mode and synthesized using a single synthesis tool. This design style is appropriate for specifying many circuits that fall in the gray area, hard to classify as synchronous or asynchronous, which are difficult or impossible to synthesize automatically using competing methods.

Hazard-free combinational synthesis requirements: We described two different hazard-free next-state logic synthesis methods: two-level sums-of-products and generalized C-elements implementation. We extended the existing theories on hazard-free combinational synthesis to handle non-monotonic input changes and developed a set of requirements for freedom from logic hazards for each next-state logic synthesis method.

3D automatic synthesis algorithm: We presented a complete set of automated sequential synthesis algorithms: hazard-free state assignment, hazard-free state minimization, and critical-race-free state encoding. We observed that eliminating dynamic hazards was the key factor that set the overall synthesis direction. In fact, the functional synthesis step looks ahead for any possibility of dynamic hazards and takes measures to prevent them. We observed that the heuristics that find near-optimal solutions in polynomial time do not significantly degrade the quality of the final implementations.

Efficient timing analysis and verification, which are not described in this paper, are crucial components in the 3D design methodology. The 3D design flow makes timing assumptions during synthesis. In particular, it assumes that feedback delays, fundamental-mode timing constraints, and conditional signal setup and hold time requirements are sufficiently large during synthesis. These requirements need to be “tightened,” especially for systems employing multiple communicating (extended) burst-mode controllers. There have been several chips designed and implemented, which incorporate communicating (extended) burst-mode controllers: a high-performance differential equation solver chip [21], a high-performance SCSI controller [22], and a low power infrared communication chip [23].

In these designs, we found that the environment of each controller is typically so slow (relative to the controller settling time) that reasonable timing assumptions can be satisfied with large safety margins. Finally, we used the 3D synthesis tool described in this paper to design a significant portion of the control circuitry for Intel’s Asynchronous Instruction Length Decoder chip, as well as the chips mentioned above. All the 3D controllers in the fabricated chips worked correctly in first silicon.

REFERENCES

[1] K. van Berkel, J. Kessels, M. Roncken, R. Saeijs, and F. Schalij, “The VLSI-programming language Tangram and its translation into handshake circuits,” in Proc. European Conference on Design Automation (EDAC), 1991, pp. 384–389.
[2] S. M. Nowick and D. L. Dill, “Automatic synthesis of locally-clocked asynchronous state machines,” in Proc. International Conf. Computer-Aided Design (ICCAD), Nov. 1991, pp. 318–321.
[3] K. Y. Yun and D. L. Dill, “Automatic synthesis of 3D asynchronous state machines,” in Proc. International Conf. Computer-Aided Design (ICCAD), Nov. 1992, pp. 576–580.
[4] L. Lavagno, K. Keutzer, and A. Sangiovanni-Vincentelli, “Synthesis of hazard-free asynchronous circuits with bounded wire delays,” IEEE Transactions on Computer-Aided Design, vol. 14, no. 1, pp. 61–86, Jan. 1995.
[5] C. J. Myers and T. H.-Y. Meng, “Synthesis of timed asynchronous circuits,” IEEE Transactions on VLSI Systems, vol. 1, no. 2, pp. 106–119, June 1993.
[6] C. Ykman-Couvreur, B. Lin, and H. de Man, “Assassin: A synthesis system for asynchronous control circuits,” Tech. Rep., IMEC, Sept. 1994, User and Tutorial manual.
[7] J. Cortadella, M. Kishinevsky, L. Lavagno, and A. Yakovlev, “Petrify: a tool for manipulating concurrent specifications and synthesis of asynchronous controllers,” IEICE Transactions on Information and Systems, vol. E80-D, no. 3, pp. 315–325, Mar. 1997.
[8] S. M. Nowick, Automatic Synthesis of Burst-Mode Asynchronous Controllers, Ph.D. thesis, Stanford University, Department of Computer Science, 1993.
[9] S. H. Unger, Asynchronous Sequential Switching Circuits, Wiley-Interscience, John Wiley & Sons, Inc., New York, 1969.
[10] S. M. Nowick and D. L. Dill, “Exact two-level minimization of hazard-free logic with multiple-input changes,” IEEE Transactions on Computer-Aided Design, vol. 14, no. 8, pp. 986–997, Aug. 1995.
[11] S. R. Petrick, “A direct determination of the irredundant forms of a boolean function from the set of prime implicants,” AFCRC-TR-56-110, Air Force Cambridge Research Center, Apr. 1956.
[12] R. M. Fuhrer, B. Lin, and S. M. Nowick, “Symbolic hazard-free minimization and encoding of asynchronous finite state machines,” in Proc. International Conf. Computer-Aided Design (ICCAD), 1995, pp. 604–611.
[13] J. H. Tracey, “Internal state assignments for asynchronous sequential machines,” IEEE Transactions on Electronic Computers, vol. EC-15, pp. 551–560, Aug. 1966.
[14] E. J. McCluskey, Logic Design Principles With Emphasis on Testable Semicustom Circuits, Prentice-Hall, 1986.
[15] J. Maneatis and D. Ramsey, “Torch standard cell library,” 1992, Private communication.
[16] P. S. K. Siegel, Automatic Technology Mapping for Asynchronous Designs, Ph.D. thesis, Stanford University, Feb. 1995.
[17] S. Chakraborty, D. L. Dill, K. Y. Yun, and K. Chang, “Timing analysis for extended burst-mode circuits,” in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, Apr. 1997, pp. 101–111.
[18] S. M. Nowick and B. Coates, “UCLOCK: Automated design of high-performance asynchronous state machines,” in Proc. International Conf. Computer Design (ICCD), Oct. 1994, pp. 434–441.
[19] S. M. Nowick, K. Y. Yun, and D. L. Dill, “Practical asynchronous controller design,” in Proc. International Conf. Computer Design (ICCD), Oct. 1992, pp. 341–345.
[20] S. M. Nowick, M. E. Dean, D. L. Dill, and M. Horowitz, “The design of a high-performance cache controller: a case study in asynchronous synthesis,” Integration, the VLSI journal, vol. 15, no. 3, pp. 241–262, Oct. 1993.
[21] K. Y. Yun, P. A. Beerel, V. Vakilotojar, A. E. Dooply, and J. Arceo, “The design and verification of a high-performance low-control-overhead asynchronous differential equation solver,” in Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, Apr. 1997, pp. 140–153.
[22] K. Y. Yun and D. L. Dill, “A high-performance asynchronous SCSI controller,” in Proc. International Conf. Computer Design (ICCD), 1995, pp. 44–49.
[23] A. Marshall, B. Coates, and P. Siegel, “Designing an asynchronous communications chip,” IEEE Design & Test of Computers, vol. 11, no. 2, pp. 8–21, 1994.

Kenneth Y. Yun is currently an assistant professor in the Dept. of Electrical and Computer Engineering at the University of California, San Diego. He has a Ph.D. in Electrical Engineering from Stanford University and an S.M. in Electrical Engineering and Computer Science from MIT. He held design engineering positions at TRW and Hitachi for 6 years. His current research interests include the design, synthesis, analysis, and verification of mixed-timed VLSI circuits and systems: in particular, interface design methodologies and tools to facilitate ultra-high-speed communications between synchronous/asynchronous modules. He has been working with Intel Corp. as a primary consultant on the Asynchronous Instruction Decoder Project. He organized ASYNC’98 as a program co-chair. Dr. Yun is the recipient of a National Science Foundation CAREER award and a Hellman Faculty Fellowship. He has received the Charles E. Molnar award for a paper that best bridges theory and practice of asynchronous circuits and systems at ASYNC’97 and a best paper award at ICCD’98.


David L. Dill is Associate Professor of Computer Science and, by courtesy, Electrical Engineering at Stanford University. He has been on the faculty at Stanford since 1987. He has an S.B. in Electrical Engineering and Computer Science from the Massachusetts Institute of Technology (1979), and an M.S. and Ph.D. from Carnegie-Mellon University (1982 and 1987). His primary research interests relate to the theory and application of formal verification techniques to system designs, including hardware, protocols, and software. Prof. Dill’s Ph.D. thesis, “Trace Theory for Automatic Hierarchical Verification of Speed Independent Circuits,” was named a Distinguished Dissertation by ACM and published as such by M.I.T. Press in 1988. He was the recipient of a Presidential Young Investigator award from the National Science Foundation in 1988, and a Young Investigator award from the Office of Naval Research in 1991. He has received Best Paper awards at the International Conference on Computer Design in 1991 and the Design Automation Conference in 1993 and 1998. From July 1996 to September 1997 he was Chief Scientist of 0-In Design Automation.
