Application of Credit-Based Flow Control to RSFQ Micropipelines

Dmitry Y. Zinoviev (SUNY, Stony Brook, NY 11794, USA) and M. Maezawa (ETL, Tsukuba, Ibaraki, Japan)


Manuscript received September 15, 1998. This work was supported in part by DoD's University Research Initiative via AFOSR and by DARPA/NSA/NASA via the JPL-led HTMT project.


The operation of a traditional micropipeline (Fig. 1) is based on a simple handshaking mechanism: when the (N+1)th stage of the pipeline is ready to accept new data, it indicates this by sending a credit message to the Nth stage, and the Nth stage responds with the data followed by an indication message (Fig. 3A). This is similar to the well-known request/acknowledgment protocol, except that the request (indication) and the acknowledgment (credit) occur in reverse order. Data can only be forwarded to

We propose an RSFQ version of the credit-based flow control mechanism first applied by Kung et al. [5] to ATM networks. This mechanism can to some extent hide the round-trip latency and significantly improve the throughput of a micropipeline. The idea behind credit-based flow control (Fig. 2) is to provide enough additional buffers at the receiving side (stage N+1) and an up-down counter, or credit pool, at the sending side (stage N) that reflects the availability of free storage at the receiving side. Knowing there


II. Traditional Micropipelines and Their Drawbacks

III. Credit-based Flow Control


Micropipelines [1] are used for the asynchronous delivery of data and control signals. They do not require global clocking, and therefore are not influenced by clock skew. This advantage is of key importance for the emerging very-high-speed digital circuits, including those belonging to the superconductor rapid single flux quantum (RSFQ) logic/memory family [2]. Recent studies of the possibility of petaflops-scale computations ([3], [4]) raise the issue of reliable distributed asynchronous on-chip and chip-to-chip communication media, and micropipelines may be good candidates to occupy that niche.


I. Introduction

the next stage if both the indication and the credit are present. This condition is enforced by using so-called Muller "C" elements (denoted as "C" in Fig. 1). An RSFQ "C" element [2] produces an output SFQ pulse and returns to its initial state only after SFQ pulses arrive at both of its inputs (in an arbitrary sequence). The "precharged" version of the "C" element (denoted as a "C" element with a dot at one of the inputs) behaves as if the corresponding input signal had already arrived. Traditional RSFQ micropipelines are simple and robust, but their throughput is limited by the round-trip flight time between two consecutive micropipeline stages: after sending out an indication, the Nth stage must wait for the next credit from the (N+1)th stage. This time can be long, especially in the case of across-chip or chip-to-chip transfers. For example, if the two stages are placed in opposite corners of a 2 cm × 2 cm chip, the maximum available throughput in the pipeline will be 3.5 GHz, which is ten times less than the performance that typical existing RSFQ circuits can achieve. Note also that the aggregate throughput of a micropipeline is determined by the throughput of its slowest stage.
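The behavior of the "C" element described above can be illustrated with a small behavioral sketch in Python. This is only a logical model under stated assumptions, not a circuit model: the class name `CElement`, the port labels "a" and "b", and the choice to ignore a repeated pulse on a port that has already been marked are all hypothetical. It is also assumed that precharging applies only to the initial state, so that after firing the element returns to an empty state.

```python
class CElement:
    """Behavioral sketch of a Muller C element driven by discrete pulses.

    Assumptions (not taken from the paper): ports are named "a" and "b",
    a repeated pulse on an already-marked port is ignored, and precharging
    affects only the initial state.
    """

    def __init__(self, precharged_b=False):
        self.seen_a = False
        # A precharged element starts as if input "b" had already arrived.
        self.seen_b = precharged_b

    def pulse(self, port):
        """Register a pulse on a port; return True if an output pulse fires."""
        if port == "a":
            self.seen_a = True
        elif port == "b":
            self.seen_b = True
        else:
            raise ValueError("port must be 'a' or 'b'")
        if self.seen_a and self.seen_b:
            # Both inputs have arrived: emit a pulse and return to the empty state.
            self.seen_a = False
            self.seen_b = False
            return True
        return False


if __name__ == "__main__":
    c = CElement()
    print(c.pulse("a"))   # indication alone does not fire
    print(c.pulse("b"))   # the credit completes the pair
    p = CElement(precharged_b=True)
    print(p.pulse("a"))   # a precharged element fires on the first indication
```

In this model the indication and the credit may arrive in either order, which mirrors the "arbitrary sequence" property stated in the text.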


Abstract: Traditional micropipelines based on handshaking mechanisms are simple and reliable, but their throughput is limited by the round-trip flight time between two consecutive micropipeline stages. We propose an RSFQ implementation of a micropipeline with simple credit-based flow control that can hide the round-trip latency and significantly improve the throughput. In this paper, we present numerically calculated and experimentally measured throughput for several types of RSFQ credit-controlled micropipelines (including the special case of a micropipeline with only one credit), and their critical comparison.

Fig. 1. A traditional micropipeline.


Fig. 2. A micropipeline with credit-based flow control.

are M buffers available at the (N+1)th stage, it is possible to send at most M data messages without waiting for additional credits (i.e., to overlap several credits and indications).

Let us consider the operation of a credit-controlled micropipeline with two credits (Fig. 3B). An indication from the Nth stage is sent not only to the next stage, but to the local credit pool C2 as well. The number of credits in the pool is decremented (if non-zero), and a credit is sent back to the Nth synchronization element C1. The original indication and the accompanying data propagate to the remote synchronization elements C3 or C4 and the corresponding data latches, depending on space availability. When the data in the head latch D4 of the (N+1)th stage is advanced to the (N+2)th stage, credits are sent back to the synchronization element C3 of the previous latch of the same stage and to the credit pool C2 of the previous stage.

We assume that it takes time $T_1$ to deliver a signal in either direction between two consecutive stages of the micropipeline, time $T_2$ to obtain a credit from the local credit pool, and time $T_3$ to propagate an indication between two consecutive latches within one micropipeline stage. To accommodate at most $N_c$ messages in the described loop, the minimum time $T_c$ between any two consecutive requests (reciprocal to the data rate) must be greater than or equal to a single request round-trip time $T_{rt} = 2T_1 + (T_2 + T_3)(N_c - 1)$ divided by the total number of requests in flight: $T_c \ge T_{rt}/N_c$. The minimum number of credits necessary to hide this round-trip latency can be calculated using the following equation:

$$N_c = \left\lceil \frac{2T_1 - T_2 - T_3}{T_c - T_2 - T_3} \right\rceil. \qquad (1)$$
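As a rough numerical illustration of equation (1), the following Python sketch evaluates the round-trip time, the minimum cycle time for a given number of credits, and the minimum number of credits for a target cycle time. The function names and the sample delays (140 ps, 10 ps, 10 ps, and a 50 ps target cycle) are hypothetical and are not taken from the measurements. Clamping the result to at least one credit is also an assumption of this sketch.

```python
import math


def round_trip_time(t1, t2, t3, n_credits):
    """T_rt = 2*T1 + (T2 + T3)*(Nc - 1), in the same units as the inputs."""
    return 2.0 * t1 + (t2 + t3) * (n_credits - 1)


def min_cycle_time(t1, t2, t3, n_credits):
    """Smallest cycle time allowed by the condition T_c >= T_rt / N_c."""
    return round_trip_time(t1, t2, t3, n_credits) / n_credits


def min_credits(t1, t2, t3, t_cycle):
    """Equation (1): smallest integer N_c that sustains the cycle time t_cycle."""
    if t_cycle <= t2 + t3:
        # The cycle time cannot reach T2 + T3 with a finite number of credits.
        raise ValueError("t_cycle must exceed t2 + t3")
    n = math.ceil((2.0 * t1 - t2 - t3) / (t_cycle - t2 - t3))
    return max(1, n)  # assumption: at least one credit is always needed


if __name__ == "__main__":
    # Hypothetical delays in picoseconds.
    t1, t2, t3 = 140.0, 10.0, 10.0
    n = min_credits(t1, t2, t3, t_cycle=50.0)
    print(n, min_cycle_time(t1, t2, t3, n))
```

With one credit the minimum cycle time reduces to 2*T1, which is the traditional handshaking case.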

In the case of an "ideal" micropipeline with instant credit processing and propagation ($T_2 = T_3 = 0$), $N_c = \lceil 2T_1/T_c \rceil$. The ultimate data rate in a micropipeline with an infinite number of credits ($N_c = \infty$) is $1/(T_2 + T_3)$. It takes time $T_a = T_1 + T_3(N_c - 1)$, which grows with $N_c$, to move data across one micropipeline stage. Therefore, the number of credits should be limited to the minimum that is necessary.

IV. Experimental Setup

A setup for the experimental measurement of the micropipelines' throughput has been designed, successfully fabricated in 3.5 μm niobium-trilayer technology [6], and tested using the Octopux superconductor digital tester [7]. Fig. 4 shows the schematics of the setup. It consists of: 1) the micropipeline itself, split into two pieces separated by two long segments of Josephson transmission lines (JTLs) with an independent dc power supply (these JTLs emulate long interconnecting wires); 2) a segment of a JTL with an overbiased Josephson junction that generates one SFQ pulse and stalls (emulating an eager sender); and 3) a segment of a JTL connecting the indication output and the credit input of the last stage of the micropipeline and operating as an eager receiver. The eager sender generates an indication every time it receives a credit from the micropipeline. The eager receiver generates a credit every time it receives an indication. Therefore, the environment shown in Fig. 4 allows the micropipeline to be tested at the maximal speed. It is also possible to simulate interconnects of different lengths by changing the dc current I(jtl). The throughput R of the micropipeline can be measured indirectly with great accuracy using the relationship between the generation frequency of a Josephson junction and the average voltage across the junction:

$$f = V/\Phi_0. \qquad (2)$$
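To give a feel for the scale of relation (2), the sketch below converts between average junction voltage and generation frequency. The flux quantum is computed as h/(2e) from the exact SI values of the Planck constant and the elementary charge. The function names and the 10 GHz sample rate are hypothetical illustrations, not values from the experiment.

```python
# Exact SI defining constants (2019 redefinition).
PLANCK_H = 6.62607015e-34            # J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # C

# Magnetic flux quantum, Phi_0 = h / (2e), in webers (V*s).
PHI_0 = PLANCK_H / (2.0 * ELEMENTARY_CHARGE)


def josephson_frequency(voltage_v):
    """Relation (2): generation frequency in Hz for an average voltage in V."""
    return voltage_v / PHI_0


def voltage_for_frequency(freq_hz):
    """Inverse of relation (2): average voltage in V for a frequency in Hz."""
    return freq_hz * PHI_0


if __name__ == "__main__":
    # Hypothetical example: the voltage corresponding to a 10 GHz pulse rate.
    v = voltage_for_frequency(10e9)
    print(f"{v * 1e6:.2f} uV")
    # Frequency step corresponding to a 1 uV voltage step, in MHz.
    print(f"{josephson_frequency(1e-6) / 1e6:.0f} MHz")
```

A rate of 10 GHz corresponds to roughly 20.7 μV, so microvolt-level voltage resolution translates into sub-gigahertz frequency resolution.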


Fig. 3. Flow control in micropipelines: (A) traditional one-credit, and (B) credit-based with two credits.


Fig. 4. Experimental setup to measure the throughput of the micropipelines.

Fig. 5. Micropipeline with one credit. (Plot of data rate, GHz, versus I(In), mA; one curve per value of I(jtl).)

Fig. 6. Micropipeline with two credits. (Plot of data rate, GHz, versus I(In), mA; one curve per value of I(jtl).)

In a steady state, the throughput is equal to the indication rate produced by the eager sender and to the credit rate seen by it, which in turn can be obtained by measuring the dc voltage V(In) on any of the Josephson junctions within the DC/SFQ converter:

$$R = V(\mathrm{In})/\Phi_0. \qquad (3)$$

The accuracy of a single-shot measurement is inherently limited to the LSB of the voltmeter. In our experiment, we used a Keithley 2001 digital multimeter with an LSB equal to 0.1 μV. However, the actual accuracy was lower (0.7 μV) due to internal noise in other parts of the setup.

V. Experimental Results

Figures 5 and 6 present families of I-V curves of the micropipelines with one and two credits, respectively. The dependence of the input voltage V(In) (converted into the data rate R) on the dc current I(In) applied to the converter has been plotted for various values of the dc bias current I(jtl) applied to the internal JTLs, from 1.7 mA to 4.8 mA. (The aggregate critical current of the JTL segments is 5 mA.) V(In) is proportional to the throughput of a micropipeline, and I(In) controls the delay in the internal loop. Both figures are to the same scale.

If I(In) is less than the effective critical current of the DC/SFQ converter, no indications are generated, and therefore there is no traffic in the micropipelines. Increasing I(In) initiates the Josephson generation in the converter, which is, however, constrained by the micropipeline: the flat segments of the I-V curves correspond to the maximum-throughput operation of the micropipelines. If I(In) is large enough, the converter can generate more than one indication before getting a credit. This behavior violates the micropipeline discipline (see the right parts of both figures).

Fig. 7. Maximal throughput (data rate) in one- and two-credit micropipelines as a function of I(jtl).

Fig. 7 shows the dependence of the maximal throughput on the dc bias current I(jtl) (and therefore on the delay in the long loop) extracted from Figs. 5 and 6. At I(jtl) > 2.8 mA, a $1/\sqrt{LC}$ resonance takes place, which distorts the I-V curve of the JTL and invalidates the results. Using data from Fig. 7, it is possible to extract some timing parameters of the micropipelines. For a one-credit micropipeline with maximum throughput $R_1$ (from (1) with $N_c = 1$):

$$T_1 = (2R_1)^{-1}. \qquad (4)$$
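Equation (4) is a one-line conversion from the measured one-credit throughput to the one-way delay. A minimal Python sketch follows. The function name and the 5 GHz sample throughput are hypothetical and are not read from Fig. 7.

```python
def one_way_delay(r1_hz):
    """Equation (4): T1 = 1 / (2 * R1), with R1 in Hz and T1 in seconds."""
    if r1_hz <= 0:
        raise ValueError("throughput must be positive")
    return 1.0 / (2.0 * r1_hz)


if __name__ == "__main__":
    # Hypothetical one-credit throughput of 5 GHz.
    t1 = one_way_delay(5e9)
    print(f"T1 = {t1 * 1e12:.1f} ps")
```

The factor of two reflects that, with a single credit, each transfer costs a full round trip of 2*T1.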

The dependence of T1 on I(jtl) is shown in Fig. 8. This dependence is very typical for RSFQ circuits [8]. On the other hand, in a two-credit micropipeline with


Fig. 8. Delay in the long loop as a function of the dc bias current.

maximum throughput $R_2$, the following is true:

$$R_2^{-1} - T_l = T_1 \qquad (5)$$

(from (1) with $N_c = 2$), where $T_l = (T_2 + T_3)/2$ is the delay in the credit control logic. Assuming that $T_1$ is by design the same in both cases, the control delay can be calculated:

$$T_l = R_2^{-1} - (2R_1)^{-1}. \qquad (6)$$

The experimental results, as well as the result of the PSCAN simulation, are shown in Fig. 9. For 2.3 mA
