
IEICE TRANS. ??, VOL.Exx–??, NO.xx XXXX 200x


PAPER

A Low Latency Asynchronous FIFO Combining a Wave Pipeline with a Handshake Scheme

Jeong-Gun LEE†, Suk-Jin KIM†, Student Members, Jeong-A LEE††, Nonmember, and Kiseon KIM†, Regular Member

SUMMARY This paper presents a new asynchronous FIFO design that reduces forward latency in a linear structure. The operation mode of each cell is reconfigured dynamically as one of two schemes, wave pipelining or handshaking, according to the data flow in the FIFO. Adopting wave pipelining in a conventional self-timed FIFO reduces the overhead of handshaking as well as of latching control in each stage. Initial pre-layout simulations indicate about a twofold improvement in latency over a state-of-the-art asynchronous FIFO, while retaining its throughput.

key words: asynchronous FIFO, wave pipeline, linear structure, forward latency, throughput

1. Introduction

A FIFO is commonly used as a buffer to smooth out bursty traffic between a data producer and a consumer. In addition, its independent input and output interfaces provide a way to bridge clock domains. In an asynchronous system, most FIFOs are implemented using a ripple or linear structure that exploits local communication between cells, whereas FIFOs in a clocked system are usually implemented with a ring structure using read and write pointers with a global clock [1]. It is noteworthy that asynchronous linear FIFOs are useful in many asynchronous as well as clocked systems due to their simple and modular design. The recent Sun UltraSPARC IIIi processor includes many asynchronous linear FIFOs to transfer data from synchronous RAM into the processor clock domain [1].

Two key performance measures of an asynchronous FIFO are forward latency and cycle time. Forward latency is the time for a data item to pass an empty cell, while the cycle time is the time interval between two adjacent data items at maximum speed. The linear FIFO can provide steady throughput; however, it increases forward latency because a data item must move sequentially through every cell of the FIFO. Therefore, the long latency of the FIFO may have a critical impact on the overall performance of systems.

Alternative approaches to building low latency asynchronous FIFOs have been proposed [2]–[5]. These FIFOs reduce the number of data movements by modifying the structure from a standard linear one (for instance, to a parallel, a tree, a square, or a folded structure). Although the control circuitry of these FIFOs brings only a small increase in area, the overhead of the overall FIFO can be significant when a wide datapath is applied.

The proposed asynchronous FIFO targets low latency while retaining the high throughput and the small area of the linear structure. To reduce the forward latency of the FIFO, we apply the well-known wave pipelining technique [6] to a conventional self-timed FIFO. The FIFO is then configured dynamically in wave pipelining mode for a latency-critical situation (e.g., all cells in the FIFO are empty), and in handshake pipelining mode for a throughput-critical situation (most of the cells are full). Since neither handshaking nor latching control is required during wave pipelining, the proposed FIFO reduces forward latency in the wave pipelining mode. Furthermore, in contrast to synchronous wave pipelining, balancing the data path delays is relatively easy since the FIFO contains no combinational logic. Moreover, thanks to the linear structure, the area overhead of the FIFO is caused only by the added control circuitry.

The rest of the paper is organized as follows. Section 2 introduces the basic idea and behavior of the proposed FIFO. The detailed implementation issues are discussed in Section 3 and the simulation results are shown in Section 4. Finally, we draw conclusions in Section 5.

Manuscript received 0, 2004.
† The authors are with the Department of Information and Communications, Gwangju Institute of Science and Technology, Gwangju, 500-712, Korea. E-Mail: {eulia,guesswho,kskim}@gist.ac.kr
†† Corresponding author. The author is with the Department of Computer Engineering, Chosun University, Gwangju, Korea. E-Mail:[email protected]

2. The basic idea for improving latency performance

The basic idea of the proposed FIFO is “to adopt wave pipelining in cells where a data item can advance forward without handshaking”. An asynchronous FIFO basically stores the full or empty state of each cell in a flip-flop for handshaking with adjacent cells. In the proposed FIFO, a data item entering the FIFO propagates through empty cells together with its corresponding request, without handshaking or latching, just like a wave, until it meets a cell whose state is full. The data item is then saved in the empty cell just before the full cell, and that cell becomes full as well. Once wave propagation has stopped, the data movement is controlled by a handshake protocol, as in a conventional asynchronous FIFO. Since neither handshaking nor latching control is required during wave pipelining, the latency of the proposed FIFO improves. Note that a wave is only generated at the beginning of the FIFO, and it does not change the state of the cells it passes during wave propagation.

Fig. 1 Behavior of the wave-handshake hybrid FIFO (three snapshots with signals Rin, Rout, Datain and Dataout: output blocked with the whole FIFO in the WPR, output blocked with a WPR and an HPR, and output released; the cycle time of waves is marked)

Fig. 1 shows a simple example of the behavior of the proposed FIFO. A dot, a shadowed box, and a blank box in the figure denote a data item, a wave operating cell, and a handshake operating cell, respectively. Initially, all cells are empty and the output of the FIFO is blocked by a slow environment. The first data item entering the FIFO flows to the last cell by wave pipelining, while successive data items are entering the FIFO. Since the output is blocked, the first data item is saved in the last cell and, at the same time, its corresponding request signal makes the cell's state full. As the subsequent waving items meet full cells, they pile up one by one and the operation mode of these cells changes to the handshake scheme. When the output of the FIFO is released (the last cell receives an acknowledgement from the environment), the last cell generates a request signal (Rout) and delivers the data item to the environment.

We call the cluster of shadowed cells a “wave pipelining region” (WPR). The cells in the WPR just pass data items with waves while keeping their states empty. Therefore, subsequent data items can also propagate as fast as the wave, up to the end of the WPR. If the cell next to the WPR changes its state to empty, the WPR expands in the forward direction. Otherwise, the WPR shrinks in the backward direction as a new data item arrives at the end of the WPR. Note that the time interval between adjacent data items must be long enough to guarantee that the operation mode can switch from wave pipelining to handshaking before the following wave overruns the currently blocked one.

The series of blank cells is called a “handshake pipelining region” (HPR), where all the cells transfer data items with the handshake scheme. In contrast to the WPR, each cell in the HPR can be either full or empty depending on the data flow. If the output of the FIFO is blocked by a slow environment, the HPR expands in the backward direction as new waving items arrive in this region. However, if the output environment is fast enough to consume the waving data items, the HPR shrinks in the forward direction until, finally, all the cells in the FIFO belong to the WPR. The proposed FIFO thus improves forward latency through fast wave propagation in the WPR, a chain of empty cells in the front part of the FIFO.
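The region behavior described in this section can be illustrated with a small behavioural model. The following Python sketch is not the authors' circuit: it ignores timing and gate-level detail and only tracks which cells are full. The class `HybridFifo` and its method names are hypothetical, and the model assumes that stored items settle toward the output after each acknowledgement, so that the WPR is simply the run of empty cells at the input end.

```python
class HybridFifo:
    """Behavioural model of the wave-handshake FIFO (no timing, no circuits)."""

    def __init__(self, n_cells):
        # None marks an empty cell; cell 0 is the input end of the FIFO.
        # (Assumption of this sketch: None is never used as a data value.)
        self.cells = [None] * n_cells

    def wpr_length(self):
        """Number of leading empty cells, i.e. the size of the WPR."""
        length = 0
        for item in self.cells:
            if item is not None:
                break
            length += 1
        return length

    def push(self, item):
        """Wave an item through the WPR and store it in the last empty cell."""
        stop = self.wpr_length() - 1
        if stop < 0:
            raise RuntimeError("first cell is full: input is blocked")
        self.cells[stop] = item
        return stop  # index of the cell that stored the item

    def pop(self):
        """Output acknowledged: deliver the last item, ripple the rest forward."""
        item = self.cells[-1]
        if item is None:
            raise RuntimeError("FIFO output is empty")
        # Handshaking moves every stored item one cell toward the output;
        # only the settled state after the ripple is modelled here.
        self.cells = [None] + self.cells[:-1]
        return item


fifo = HybridFifo(4)
stored_at = [fifo.push(x) for x in "abcd"]  # output blocked: items pile up
print(stored_at, fifo.wpr_length())
```

With a blocked output, the four pushes store the items in the fourth, third, second and first cells in turn, after which the WPR is empty and the whole FIFO is an HPR, mirroring the pile-up described for Fig. 1.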

3. Waving a self-timed FIFO

The design of the proposed wave-handshake FIFO is based on the well-known GasP circuits, a state-of-the-art asynchronous linear FIFO developed by Sun Microsystems [7]. In order to handle a data item in the wave operation mode, we add a wave controller and a latch controller to each cell of the self-timed handshaking FIFO.

3.1 Introduction of the GasP circuits

The GasP circuits form an asynchronous linear FIFO that uses single-track handshaking. Instead of storing the state of a cell in a flip-flop, GasP stores each state on a single wire, called a state conductor [7]. Fig. 2 shows GasP circuits consisting of three PLACEs and two PATH logics. Each PLACE holds a data item in data latches and its state on a state conductor with a keeper, a pair of small inverters, to retain the state for an indefinite period. Note that the state is encoded as HI = empty and LO = full. A PATH logic located between two PLACEs contains a control circuit and an N-type pass transistor for a data item. A special NAND stack [b x] detects the condition LO-HI on the predecessor and the successor state conductors, meaning full-empty PLACEs. When it detects this condition, it generates a short positive pulse for the pass transistor. Inverter [c] and N-type transistor [d] drive the successor state conductor LO (full state), while P-type transistor [y] makes the predecessor PLACE empty (HI). After a short delay [r s t], the NAND gate is reset. As for the performance of the GasP FIFO, the forward latency per cell is four gate delays [a b c d] and the reverse latency (to move emptiness backwards) is two gate delays [x y].

Fig. 2 A GasP based FIFO and its cycle time

In general, a cycle time can be defined as “the time difference between successive request transitions (falling transitions at the input of a keeper) when the FIFO is operating at maximum speed”. Fig. 2 shows the path that determines the cycle time of the GasP circuits. At the point of keeper 2 marked by the symbol ‘s’ in Fig. 2, a falling transition (LO) indicating a full state goes to the inverter [a], and its output signal turns on N-type transistor [b]. Then, P-type transistor [y] is turned on and changes the state of keeper 2 to HI (empty state). The HI signal switches on N-type transistor [x’], and the input of the inverter [c’] is connected directly to ground. When the FIFO works at maximum speed, the transistor [b’] is turned on before the transistor [x’] turns on. The inverter [c’] produces a rising transition that drives N-type transistor [d’]. Finally, keeper 2 becomes LO, indicating a full state. These signal transitions occur in the order given by the circled numbers in Fig. 2, and the sequence of the corresponding transistors, [a b y x’ c’ d’], is the critical path that determines the cycle time. Therefore, the cycle time of the GasP FIFO is six gate delays, the sum of the forward and reverse latencies, which is equivalent to the cycle time of a minimum (three-inverter) loop.

3.2 A wave controller for the GasP circuits

In order to handle a wave in the GasP FIFO, we add a wave controller (WC) to the PATH logic, as shown in Fig. 3. When a new wave request (WRin) arrives in the form of a positive pulse, the proposed WC performs one of the following functions according to the state of the successor PLACE (Sin):

• empty state: passing the wave to the successor PLACE by generating a WRout signal
• full state: making the predecessor PLACE full by driving the Sout signal LO

Fig. 3 Wave-handshake GasP circuits

Then, the above functions can be implemented by the following signalling rules:

    WRin ∧ Sin → WRout ↑      (1)
    ¬WRin → WRout ↓           (2)
    WRin ∧ ¬Sin → Sout ↓      (3)
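As an illustration, rules (1) to (3) can be written as a single update function. This is a minimal logic-level sketch in Python that assumes ideal, glitch-free inputs; it does not model pulse timing or what happens when both inputs change at the same instant. The function name `wave_controller` and the Boolean encoding (True = HI) are assumptions made for this example.

```python
def wave_controller(wr_in, s_in, wr_out, s_out):
    """Apply signalling rules (1)-(3) once and return the new (wr_out, s_out).

    State conductors follow the encoding of the text:
    True = HI = empty, False = LO = full.
    """
    if wr_in and s_in:        # rule (1): successor empty, pass the wave on
        wr_out = True
    if not wr_in:             # rule (2): wave pulse is over, lower WRout
        wr_out = False
    if wr_in and not s_in:    # rule (3): successor full, mark predecessor full
        s_out = False
    return wr_out, s_out


# A wave arriving at an empty successor is forwarded ...
print(wave_controller(wr_in=True, s_in=True, wr_out=False, s_out=True))
# ... while a wave arriving at a full successor fills the predecessor.
print(wave_controller(wr_in=True, s_in=False, wr_out=False, s_out=True))
```

The rules never raise Sout; in this sketch, as in the text, returning a PLACE to the empty state is left to the handshake logic.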

In general, the input and output environments of a FIFO are independent and work asynchronously. The WRin signal propagates in the forward direction and is controlled by the input environment, while the Sin signal propagates in the backward direction and is controlled by the output environment. Consequently, the two input signals (WRin and Sin) of a WC can change asynchronously and simultaneously. Assume that keeper 1 has a HI value (empty state) and keeper 2 has a LO value (full state) in Fig. 4. In this configuration, a rising transition on WRin can be triggered by the WC of the previous cell (the input environment of the current cell) while keeper 2 (the Sin signal) is being changed by the GasP control circuit of the next cell (the output environment of the current cell). In Fig. 4, the two paths marked by ① and ② show simultaneous and uncorrelated rising transitions on the WRin and Sin signals, respectively. Simultaneous transitions on the inputs (WRin and Sin) may cause unexpected glitches on the WRout and Sout outputs. Furthermore, a metastable situation may occur on a state conductor (Sout), which retains its value by a keeper, a memory-like element.

Fig. 4 A wave controller (WC)

Fig. 5 An input controller (the path delays of ① and ② are denoted by TAck and tReq, respectively; timing constraint: TAck < tReq)

To resolve these problems, the WC is designed using a mutual exclusion (ME) element, which forces a temporal separation of the rising edges of the two inputs on a first-come, first-served basis. Even if both inputs arrive at the same time, the ME arbitrarily selects one to pass through. The circuit in the dashed box of Fig. 4 shows the schematic of the proposed WC. Initially the WRin signal is LO. If the successor PLACE is also empty (Sin = HI), the output bg of the ME becomes HI. Once the wave arrives (WRin comes in the form of a positive pulse), the WC generates WRout immediately. The circuit for WRout in the shadowed region implements the two signalling rules (1) and (2). On the other hand, the successor PLACE is full (Sin = LO) at the end of the WPR. Then bg of the ME becomes LO, which disables the pull-down path of the circuit for WRout. When the WRin signal arrives at the WC, ag becomes HI and the Sout signal is discharged to LO (the state conductor of the predecessor PLACE enters the full state).

It is noteworthy that the ME element is not on the critical path of propagating a wave signal from WRin to WRout, since all the Sin signals in the WPR are stable and stay HI even while wave signals pass. For the proposed WC, therefore, the forward latency per cell is two gate delays [a d] in the WPR.

A data item that arrives at the end of the WPR must change the operation mode before the next waving data item overruns it. Furthermore, the mode change should not affect the cycle time of the GasP circuits, so that high throughput is retained. Therefore, when WRin meets a full cell (Sin = LO), bg as well as the Sin signal of the predecessor WC must become LO before the next WRin arrives. The critical path for this takes five gate delays [p’ u’ s q v] (p’ and u’ are transistor gates in the ME circuit of the predecessor WC). Consequently, it does not exceed six gate delays, the cycle time of the GasP FIFO.

A cycle time of at least six gate delays is guaranteed by an input data controller, which is implemented by a single-cell GasP circuit. The circuit in the shadowed region of Fig. 5 is the input controller. First, data items cannot enter the proposed FIFO faster than one every six gate delays, since the GasP-based control circuit has a cycle time of six gate delays at its maximum speed. Notice also that further input data items are blocked by the input controller when the first cell of the proposed FIFO stores a data item in its latch. The solid arrow marked by ① in Fig. 5 shows the signal transition sequence that occurs when the first FIFO cell becomes full. The dashed arrow marked by ② shows a signal transition path interacting with the input environment. When the path delays of ① and ② are denoted by TAck and tReq, respectively, the timing constraint ‘TAck < tReq’ must be satisfied to block new data items from coming in.
After a transition forks at the output of the NAND gate, transition path ① takes four gate delays to arrive at one input of the NAND gate, while transition path ② takes five gate delays to arrive at the other input of the NAND gate. Therefore, the timing constraint is satisfied by the input controller, and the acknowledge transition arrives at the input of the NAND gate earlier than the new request transition. This guarantees that the acknowledge transition on path ① can turn off one of the N-type transistors in the NAND gate before the new request transition propagates through the NAND gate. Consequently, the next request cannot propagate into the proposed FIFO.

Table 1 Performance comparison (in gate delays)

    Performance        Proposed (WPR)   Proposed (HPR)   GasP (HPR)
    forward latency    2                4                4
    reverse latency    -                2                2
    cycle time         6                6                6

3.3 Latch control for the waving GasP

In GasP circuits, the output of the special NAND gate controls the N-type pass gate for the data item. Since the proposed FIFO handles both operation modes in each cell, however, we replace the control signal of the pass gate with the Sin signal, the state conductor of the successor PLACE, as shown in Fig. 3. When the Sin signal is HI (the successor PLACE is empty, but the operation mode is unknown), the pass gate is opened in advance for the fast waving data or the next handshaking data. Otherwise, when the successor PLACE is full (Sin = LO), the pass gate is closed to prevent new data from overwriting the stored item, whichever the operation mode. Therefore, the Sin signal can control the pass transistor for data items, but we use a buffered version of it for wide data items. The critical delay to the pass gate is six gate delays, the sum of the forward latency of the original GasP circuits [a b c d] and the two gate delays of the buffer [e] in Fig. 3 (the buffer is implemented with two inverters, so its delay is two gate delays); hence it does not affect the cycle time of the GasP FIFO.

In addition to the control of the pass gate, the data latches of the proposed FIFO are modified from those of the GasP circuits, because the propagation speed of the data item should be the same as that of the wave. To align the data item with the fast wave signal (two gate delays), we remove the output inverter on the data path of each PLACE. Therefore, the data in an odd cell of the FIFO are negated, and the number of FIFO cells should be even to deliver the correct value.
Combinational logic can be embedded into the proposed FIFO so that cells perform some functions on incoming data items, as pipelines do. However, we focus primarily on the design of a FIFO working as a buffer. The negated values in the odd cells do not cause any problem if we use an even number of cells in this FIFO buffer application.
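The parity argument of this subsection can be checked with a few lines of Python. This is a minimal sketch, assuming that every PLACE without its output inverter negates all bits of the word it passes; the function name `through_fifo` and the default 4-bit width are choices made for the example only.

```python
def through_fifo(word, n_cells, width=4):
    """Pass a `width`-bit word through `n_cells` inverting PLACEs."""
    mask = (1 << width) - 1
    for _ in range(n_cells):
        # Each cell lacks an output inverter, so it negates the data word.
        word = ~word & mask
    return word


data = 0b1010
print(format(through_fifo(data, 4), "04b"))  # even number of cells
print(format(through_fifo(data, 3), "04b"))  # odd number of cells
```

An even number of cells returns the original word, while an odd number returns its bitwise complement, which is why the buffer is built with an even number of cells.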

3.4 Performance comparison

Table 1 summarizes the performance of the proposed FIFO and the GasP FIFO in terms of latency and cycle time. Compared to the state-of-the-art asynchronous FIFO (GasP circuits, which use the handshake mode only), the proposed FIFO reduces forward latency to two gate delays in the WPR, while retaining the same cycle time. The four-gate-delay forward latency in HPR mode corresponds to transition path [a b c d] in the SR-NAND of Fig. 3. On the other hand, the two-gate-delay forward latency in WPR mode corresponds to transition path [b d] for rising WRin or [a d] for falling WRin in the WC of Fig. 4. In WPR mode, the cycle time of the proposed FIFO is determined by the input controller. Since the input controller, based on GasP circuits, handshakes with the input environment with a cycle time of six gate delays, the cycle time of the proposed FIFO is six gate delays in WPR mode, as summarized in Table 1.

3.5 Design issues on reliability

The robustness of the proposed FIFO mainly depends on the robustness of its wave control and datapath circuits. It is well known that transistor delay variation is inevitable and must be accounted for in order to establish the minimum separation between waves [6]. The maximum difference between the fastest and slowest wave paths of the proposed FIFO becomes a factor limiting its cycle time performance for safe and robust operation [6]. To analyze the effect of delay variation in the pessimistic case, let α denote the maximum variation rate of the transistor switching delay. Then, the maximum difference for an n-cell FIFO is approximately 4 × n × α × D, where D denotes the maximum switching delay of a transistor. If the cycle time is less than the maximum difference, collisions between successive waves are possible. Consequently, the proposed FIFO is robust when the condition

    4 × n × α × D < cycle time, i.e., α < cycle time / (4 × n × D)

is satisfied. For example, if we use six gate delays (approximately 6 × D) as the cycle time of a ten-cell FIFO (n = 10) implemented in the proposed scheme, the FIFO is safe under the constraint α < 6/(4 × 10) = 0.15. This implies that the ten-cell FIFO works correctly under less than approximately 15% transistor delay variation. To tolerate more than 15% variation of the transistor delay for a more robust design, we need to increase the cycle time or reduce the number of cells in the FIFO. It is noteworthy that robustness is in a trade-off relation with the cycle time performance and the number of FIFO cells. The inequality α < cycle time / (4 × n × D) is derived from a simple approximate analysis under strict path delay balancing of the wave control and data paths.


Fig. 6 The 4 cells of the proposed FIFO (signals Req_In, Ack_out, Req_out, St_Cond1 to St_Cond4, and Wrout1 to Wrout3; the Sin input of the last WC is tied LO)

A more thorough analysis of the design trade-offs involved in the robustness of the proposed FIFO will be covered in future work.
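The robustness condition of Section 3.5 is easy to tabulate. The following Python sketch only restates the inequality 4 × n × α × D < cycle time with the cycle time expressed in units of D, so that D cancels; the function names are hypothetical and the default of six gate delays is the cycle time used in the text.

```python
def max_delay_variation(n_cells, cycle_gate_delays=6.0):
    """Upper bound on alpha from 4*n*alpha*D < cycle time = k*D."""
    if n_cells <= 0:
        raise ValueError("n_cells must be positive")
    # D cancels because the cycle time is given in multiples of D.
    return cycle_gate_delays / (4.0 * n_cells)


def is_robust(alpha, n_cells, cycle_gate_delays=6.0):
    """True when the worst-case wave-path spread stays below the cycle time."""
    return 4.0 * n_cells * alpha < cycle_gate_delays


for n in (4, 10, 20):
    print(n, round(max_delay_variation(n), 4))
```

For n = 10 the bound is 0.15, the 15% figure quoted in the text; halving the number of cells or doubling the cycle time doubles the tolerable variation, which is the trade-off the section describes.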

4. Preliminary simulation results

The proposed FIFO has been designed in a 0.25 µm, 2.5 V CMOS process technology, and initial pre-layout HSPICE simulations have been run to verify the correctness and evaluate the performance of the FIFO. Fig. 6 shows the wave-handshake hybrid FIFO used for the simulation, which consists of four cells (4 PLACEs and 3 PATH logics) with 4-bit wide data items. The Sin input of the last WC is set to LO (meaning the next cell is always full) so that the FIFO communicates with the output environment by the handshaking scheme only. In the following, we show three examples of FIFO operation: handshaking only, wave pipelining only, and a mix of both modes.

4.1 The handshake operating FIFO in a throughput critical case

Fig. 7 shows the waveforms of the control signals in the proposed FIFO. We assume that initially the FIFO is full (St_Cond* = LO) and its output is blocked by the environment (Ack_out = HI). In this case all the cells in the FIFO belong to the HPR and throughput is more important than latency. A little while later, the output environment acknowledges by driving the Ack_out signal LO. Then the state of the last cell changes to empty (the St_Cond4 signal becomes HI) and the other cells start transferring data items by handshaking with each other as the conventional GasP FIFO does (see the state conductor signals toggling). As the output environment repeats the acknowledgement with a cycle time of about 0.3 ns, the FIFO delivers all data items to the environment. Finally, it becomes empty (i.e., all the cells of the FIFO belong to the WPR). In the handshake mode, a forward latency of 180 ps per cell and a throughput of 3.3 GHz are measured for the proposed FIFO. The signal transition path that corresponds to the 180 ps forward latency is [a b c d] in the SR-NAND of Fig. 3.

Fig. 7 The handshake operating FIFO

4.2 The wave pipeline operating FIFO in a latency critical case

Once the FIFO becomes empty, we put four data items into the FIFO by triggering the Req_In signal with a 0.3 ns cycle time (a 3.3 GHz rate) while the output is still blocked. Fig. 8 shows the waveforms of the signals in this situation (see the four positive pulses on the Req_In signal; the Ack_out signal is not shown, but it stays HI). In this case latency is critical for the performance of the FIFO. Since all the cells belong to the WPR, the first wave and its data item propagate to the end of the FIFO (see the first positive pulses on Wrout1, Wrout2, and Wrout3) and change the last cell to full (St_Cond4 = LO), because the Sin of the last WC is LO. Note that the first item does not change the state of the empty cells in the WPR. Subsequently, the second and third data items stop wave propagation at the third and second cells, respectively, after they meet the HPR (the cell next to the WPR is full). The last data item is simply saved in the first cell and changes it to the full state (St_Cond1 = LO). Finally, the FIFO becomes full and all the cells belong to the HPR again. From the simulation, the wave propagation delay per cell is measured to be about 90 ps, and the proposed FIFO works well in the wave pipelining mode with the same cycle time as the GasP circuits. The signal transition path that corresponds to the 90 ps forward latency is [b d] for rising WRin or [a d] for falling WRin in Fig. 4. Since the input controller is implemented with GasP circuits and the incoming data rate is controlled by this controller, the cycle time of the proposed FIFO is the same as that of GasP.

Fig. 8 The wave pipeline operating FIFO

4.3 The mixed case of both wave pipeline and handshake operating modes

Fig. 9 shows a case in which some of the cells operate in wave pipelining mode and others in handshaking mode. After the FIFO becomes full, the output environment acknowledges four data items, one every 0.3 ns (see the four negative pulses on Ack_out). The FIFO then issues four data items to the environment, as already seen in Fig. 7 (see the St_Cond4 (= Req_out) signal toggling up and down). However, we put two more data items into the FIFO after three data items have been acknowledged (see the two positive pulses on the Req_In signal). The first wave propagates to the second cell (see the first pulses on Wrout1 and Wrout2) and its corresponding data item is saved in the third cell (St_Cond3 goes LO), since the first three cells belong to the WPR at this time (see that St_Cond1, St_Cond2, and St_Cond3 are HI when the wave arrives). The second data item passes the first cell along with the wave, but it is blocked by the full state of the third cell at the second PATH logic. It is therefore saved in the second cell (St_Cond2 becomes LO), and the data movement is controlled by the handshaking mode after that. Finally, these two items are piled up in the last two cells by the handshaking scheme.

In the simulation, we can observe that the WPR and the HPR expand or shrink depending on the data flow of the FIFO. In the beginning, the entire FIFO belongs to the HPR. However, as it receives acknowledgements from the environment, the WPR expands (or the HPR shrinks) to the third cell (St_Cond3). Then, the HPR expands (or the WPR shrinks) again after the new waving items meet the HPR. Finally, half of the cells belong to the WPR and the others to the HPR.

Fig. 9 The wave-handshake hybrid FIFO
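The measured figures of this section can be combined into a rough latency estimate. The Python sketch below assumes a simple additive model in which the forward latency of an item is the number of cells it crosses in each mode multiplied by the measured per-cell delay; this model, the constant names and the function names are assumptions of the example, not results from the simulations.

```python
WAVE_PS = 90         # measured per-cell forward latency in the WPR
HANDSHAKE_PS = 180   # measured per-cell forward latency in the HPR
CYCLE_NS = 0.3       # cycle time used in the simulations


def forward_latency_ps(n_wave_cells, n_handshake_cells):
    """Additive estimate of forward latency across a mix of cells."""
    return n_wave_cells * WAVE_PS + n_handshake_cells * HANDSHAKE_PS


def throughput_ghz(cycle_ns=CYCLE_NS):
    """Throughput in GHz for a given cycle time in nanoseconds."""
    return 1.0 / cycle_ns


print(forward_latency_ps(4, 0), forward_latency_ps(0, 4))
print(round(throughput_ghz(), 2))
```

Under this additive assumption, crossing four empty cells costs 4 × 90 = 360 ps in wave mode against 4 × 180 = 720 ps in handshake mode, and a 0.3 ns cycle time corresponds to the quoted throughput of about 3.3 GHz.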

5. Conclusion

In this paper, we propose a new asynchronous FIFO design that improves latency while retaining the benefits of the simple linear structure. The proposed FIFO adopts the well-known wave pipelining technique in a conventional self-timed FIFO. Since neither handshaking nor latching control is required during wave pipelining, the proposed FIFO reduces the forward latency of the cells in the wave pipelining region, which is a chain of empty cells at the beginning of the FIFO. Depending on the data flow, the FIFO operates in wave pipelining mode in a latency-critical situation, whereas it transfers data items by the handshaking scheme in a throughput-critical one. Initial pre-layout HSPICE simulations show that the proposed FIFO has a forward latency per cell of 90 ps in the wave mode and 180 ps in the handshake mode, while keeping a 3.3 GHz throughput, in a 0.25 µm CMOS technology. Compared to a state-of-the-art asynchronous FIFO, the GasP circuits, it improves latency by a factor of two in the WPR. Furthermore, thanks to the linear structure, the proposed scheme brings only a small area increase, confined to the control circuitry, even with wide data.

Acknowledgement

This work was supported by the Center for Distributed Sensor Network at GIST.

References

[1] W. Coates and R. Drost, “Congestion and Starvation Detection in Ripple FIFOs,” In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 33-13, May 2003.
[2] J. T. Yantchev, C. G. Huang, M. B. Josephs, and I. M. Nedelchev, “Low-latency asynchronous FIFO buffers,” In Asynchronous Design Methodologies, pages 24-31. IEEE Computer Society Press, May 1995.
[3] E. Brunvand, “Low latency self-timed flow-through FIFOs,” In Proceedings of Advanced Research in VLSI, pages 76-90, 1995.
[4] T. Chelcea and S. M. Nowick, “Low-latency asynchronous FIFO’s using token rings,” In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 210-220, April 2000.
[5] J. Ebergen, “Squaring the FIFO in GasP,” In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 194-205, March 2001.
[6] W. P. Burleson, M. Ciesielski, F. Klass, and W. Liu, “Wave-Pipelining: A Tutorial and Research Survey,” IEEE Transactions on VLSI Systems, 6(3):464-474, September 1998.
[7] S. Fairbanks and I. Sutherland, “GasP: A minimal FIFO control,” In Proc. International Symposium on Advanced Research in Asynchronous Circuits and Systems, pages 46-53, March 2001.

Jeong-Gun Lee received the B.S. degree in computer science from Hallym University in 1996, and the M.S. degree in information and communication from GIST in 1998. Currently he is a Ph.D. student in the Department of Information and Communications at GIST. His research interests include Petri-net theory and its application to concurrent systems, asynchronous circuit design, and CAD.

Suk-Jin Kim received the B.S. degree in electronics engineering from Kyunghee University in 1998, and the M.S. degree in information and communication from GIST in 2000. Currently he is a Ph.D. student in the Department of Information and Communications at GIST. His research interests include synchronization and power saving in Globally Asynchronous Locally Synchronous (GALS) systems.

Jeong-A Lee received the B.S. degree in computer engineering with honors from Seoul National University in 1982, the M.S. degree in computer science from Indiana University, Bloomington, in 1985, and the Ph.D. degree in computer science from the University of California, Los Angeles, in 1990. From 1990 to 1995, she was an Assistant Professor at the Department of Electrical Engineering, University of Houston. She joined Chosun University in 1995, where she is presently a Professor. Her research interests include computer architecture, fast digital arithmetic, application specific VLSI architectures for digital signal processing, and configurable computing.

Kiseon Kim received the B.Eng. and M.Eng. degrees from Seoul National University, both in electronics engineering, in 1978 and 1980, and the Ph.D. degree in electrical engineering systems from the University of Southern California, Los Angeles, in 1987. From 1988 to 1991, he was with Schlumberger in Texas as a senior development engineer, where he was involved in the development of telemetry systems. From 1991 to 1994, he was a computer communications specialist at the Superconducting Super Collider Lab. in Texas. He joined GIST in 1994, where he is presently a Professor. His research interests include wideband digital communications system design, analysis, and implementation.
