Application of Credit-Based Flow Control to RSFQ ... - CiteSeerX

Application of Credit-Based Flow Control to RSFQ Micropipelines

Dmitry Y. Zinoviev (SUNY, Stony Brook, NY 11794, USA) and M. Maezawa (ETL, Tsukuba, Ibaraki, Japan)


Manuscript received September 15, 1998. This work was supported in part by DoD's University Research Initiative via AFOSR and by DARPA/NSA/NASA via the JPL-led HTMT project.

Abstract: Traditional micropipelines based on handshaking mechanisms are simple and reliable, but their throughput is limited by the round-trip flight time between two consecutive micropipeline stages. We propose an RSFQ implementation of a micropipeline with simple credit-based flow control that can hide the round-trip latency and significantly improve the throughput. In this paper, we present numerically calculated and experimentally measured throughput for several types of RSFQ credit-controlled micropipelines (including the special case of a micropipeline with only one credit), and their critical comparison.

I. Introduction

Micropipelines [1] are used for the asynchronous delivery of data and control signals. They do not require global clocking, and therefore are not influenced by clock skew. This advantage is of key importance for the emerging very-high-speed digital circuits, including those belonging to the superconductor rapid single flux quantum (RSFQ) logic/memory family [2]. Recent studies of the possibility of petaflops-scale computations ([3], [4]) raise the issue of reliable distributed asynchronous on-chip and chip-to-chip communication media, and micropipelines may be good candidates to occupy that niche.

II. Traditional Micropipelines and Their Drawbacks

The operation of a traditional micropipeline (Fig. 1) is based on a simple handshaking mechanism: when the (N + 1)'th stage of the pipeline is ready to accept new data, it indicates this by sending a credit message to the N'th stage, and the N'th stage responds with the data followed by an indication message (Fig. 3A). This is similar to the well-known request/acknowledgment protocol, only the request (indication) and the acknowledgment (credit) occur in reverse order. Data can only be forwarded to the next stage if both the indication and the credit are present. This condition is enforced by using so-called Muller "C" elements (denoted as "C" in Fig. 1). An RSFQ "C" element [2] produces an output SFQ pulse and returns to its initial state only after SFQ pulses arrive at both of its inputs (in an arbitrary sequence). The "precharged" version of the "C" element (denoted as a "C" element with a dot at one of the inputs) behaves as if the corresponding input signal had already arrived.

Traditional RSFQ micropipelines are simple and robust, but their throughput is limited by the round-trip flight time between two consecutive micropipeline stages: after sending out an indication, the N'th stage must wait for the next credit from the (N + 1)'th stage. This time can be long, especially in the case of across-chip or chip-to-chip transfers. For example, if the two stages are placed in the opposite corners of a 2 cm × 2 cm chip, the maximum available throughput in the pipeline will be 3.5 GHz, which is ten times lower than the performance that typical existing RSFQ circuits can achieve. Note also that the aggregate throughput of a micropipeline is determined by the throughput of its slowest stage.

III. Credit-based Flow Control

We propose an RSFQ version of the credit-based flow control mechanism first applied by Kung et al. [5] to ATM networks. This mechanism can to some extent hide the round-trip latency and significantly improve the throughput of a micropipeline. The idea behind the credit-based flow control (Fig. 2) is to provide enough additional buffers at the receiving side (stage N + 1) and an up-down counter, or credit pool, at the sending side (stage N) that reflects the availability of the free storage at the receiving side. Knowing there

Fig. 1. A traditional micropipeline.


Fig. 2. A micropipeline with credit-based flow control.

are M buffers available at the (N + 1)'st stage, it is possible to send at most M data messages without waiting for additional credits (i.e., to overlap several credits and indications).

Let us consider the operation of a credit-controlled micropipeline with two credits (Fig. 3B). An indication from the N'th stage is sent not only to the next stage, but to the local credit pool C2 as well. The number of credits in the pool is decremented (if non-zero), and a credit is sent back to the N'th synchronization element C1. The original indication and the accompanying data propagate to the remote synchronization elements C3 or C4 and the corresponding data latches, depending on space availability. When the data in the head latch D4 of the (N + 1)'th stage is advanced to the (N + 2)'th stage, credits are sent back to the synchronization element C3 of the previous latch of the same stage and to the credit pool C2 of the previous stage.

We assume that it takes time $T_1$ to deliver a signal in either direction between two consecutive stages of the micropipeline, time $T_2$ to obtain a credit from the local credit pool, and time $T_3$ to propagate an indication between two consecutive latches within one micropipeline stage. To accommodate at most $N_c$ messages in the described loop, the minimum time $T_c$ between any two consecutive requests (the reciprocal of the data rate) must be greater than or equal to a single request round-trip time $T_{rt} = 2T_1 + (T_2 + T_3)(N_c - 1)$ divided by the total number of requests in flight: $T_c \ge T_{rt}/N_c$. The minimum number of credits necessary to hide this round-trip latency can be calculated using the following equation:

$$N_c = \frac{2T_1 - T_2 - T_3}{T_c - T_2 - T_3}. \qquad (1)$$
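As a rough numerical illustration of the timing relations above, the following Python sketch evaluates the round-trip time $T_{rt} = 2T_1 + (T_2 + T_3)(N_c - 1)$, the smallest cycle time $T_{rt}/N_c$, and the credit count from (1). The function names and the sample delays are hypothetical, and rounding the result of (1) up to a whole number is an assumption made here because a credit count must be an integer.

```python
import math


def round_trip_time(t1, t2, t3, nc):
    """Round-trip time of one request: Trt = 2*T1 + (T2 + T3)*(Nc - 1)."""
    return 2 * t1 + (t2 + t3) * (nc - 1)


def min_cycle_time(t1, t2, t3, nc):
    """Smallest time between consecutive requests: Tc = Trt / Nc."""
    return round_trip_time(t1, t2, t3, nc) / nc


def min_credits(t1, t2, t3, tc):
    """Credits needed to sustain cycle time tc, following equation (1).

    Rounding up (and using at least one credit) is an assumption of this
    sketch; it is not spelled out in equation (1) itself.
    """
    if tc <= t2 + t3:
        # The data rate cannot exceed 1 / (T2 + T3), however many credits exist.
        raise ValueError("tc must exceed T2 + T3")
    return max(1, math.ceil((2 * t1 - t2 - t3) / (tc - t2 - t3)))


if __name__ == "__main__":
    # Made-up delays in picoseconds, for illustration only.
    t1, t2, t3 = 100, 10, 10
    for nc in (1, 2, 4):
        print(nc, min_cycle_time(t1, t2, t3, nc))
    print(min_credits(t1, t2, t3, tc=60))
```

With these made-up delays, each added credit shortens the achievable cycle time, which approaches but never drops below $T_2 + T_3$.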

In the case of an "ideal" micropipeline with instant credit processing and propagation ($T_2 = T_3 = 0$), $N_c = \lceil 2T_1/T_c \rceil$. The ultimate data rate in a micropipeline with an infinite number of credits ($N_c = \infty$) is $1/(T_2 + T_3)$. It takes time $T_a = T_1 + T_3 (N_c - 1)$, which grows with $N_c$, to move data across one micropipeline stage. Therefore, the number of credits should be limited to the minimum that is necessary.

IV. Experimental Setup

A setup for the experimental measurement of the micropipelines' throughput has been designed, successfully fabricated in 3.5 µm niobium-trilayer technology [6], and tested using the Octopux superconductor digital tester [7]. Fig. 4 shows the schematics of the setup. It consists of: 1) the micropipeline itself, split into two pieces separated by two long segments of Josephson transmission lines (JTLs) with an independent dc power supply (these JTLs emulate long interconnecting wires); 2) a segment of a JTL with an overbiased Josephson junction that generates one SFQ pulse and stalls (emulating an eager sender); and 3) a segment of a JTL connecting the indication output and the credit input of the last stage of the micropipeline and operating as an eager receiver.

The eager sender generates an indication every time it receives a credit from the micropipeline. The eager receiver generates a credit every time it receives an indication. Therefore, the environment shown in Fig. 4 allows the micropipeline to be tested at the maximal speed. It is also possible to simulate different-length interconnects by changing the dc current I(jtl).

The throughput R of the micropipeline can be measured indirectly with great accuracy using the relationship between the generation frequency of a Josephson junction and the average voltage across the junction:

$$f = V/\Phi_0. \qquad (2)$$
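To make the use of (2) concrete, the sketch below converts between an average junction voltage and a pulse rate. It assumes that $\Phi_0$ is the magnetic flux quantum $h/2e$, computed here from the SI-defined values of $h$ and $e$; the function and constant names are hypothetical.

```python
# Exact SI values (2019 redefinition) of the Planck constant and the
# elementary charge.
PLANCK_H = 6.62607015e-34            # J*s
ELEMENTARY_CHARGE = 1.602176634e-19  # C

# Magnetic flux quantum, Phi_0 = h / (2e), about 2.07e-15 Wb.
PHI_0 = PLANCK_H / (2 * ELEMENTARY_CHARGE)


def voltage_to_rate(volts):
    """Average junction voltage (V) to pulse rate (Hz): f = V / Phi_0."""
    return volts / PHI_0


def rate_to_voltage(hertz):
    """Pulse rate (Hz) to average junction voltage (V): V = f * Phi_0."""
    return hertz * PHI_0
```

Under this assumption, a rate of 10 GHz corresponds to an average voltage of roughly 20.7 µV, so the voltages involved are in the tens of microvolts.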


Fig. 3. Flow control in micropipelines: (A) traditional one-credit, and (B) credit-based with two credits.


Fig. 4. Experimental setup to measure the throughput of the micropipelines.

Fig. 5. Micropipeline with one credit. [Plot: data rate (GHz) versus I(In) (mA), one curve per value of I(jtl).]

Fig. 6. Micropipeline with two credits. [Plot: data rate (GHz) versus I(In) (mA), one curve per value of I(jtl).]

In a steady state, the throughput is equal to the indication rate produced by the eager sender and to the credit rate seen by it, which in turn can be obtained by measuring the dc voltage V(In) on any of the Josephson junctions within the DC/SFQ converter:

$$R = V(\mathrm{In})/\Phi_0. \qquad (3)$$

The accuracy of a single-shot measurement is inherently limited to the LSB of the voltmeter. In our experiment, we used a Keithley 2001 digital multimeter with an LSB equal to 0.1 µV. However, the actual accuracy was lower (0.7 µV) due to internal noise in other parts of the setup.

V. Experimental Results

Figures 5 and 6 present families of I-V curves of the micropipelines with one and two credits, respectively. The dependence of the input voltage V(In) (converted into the data rate R) on the dc current I(In) applied to the converter has been plotted for various values of the dc bias current I(jtl) applied to the internal JTLs, from 1.7 mA to 4.8 mA. (The aggregate critical current of the JTL segments is 5 mA.) V(In) is proportional to the throughput of a micropipeline, and I(In) controls the delay in the internal loop. Both figures are to the same scale.

If I(In) is less than the effective critical current of the DC/SFQ converter, no indications are generated, and therefore there is no traffic in the micropipelines. Increasing I(In) initiates the Josephson generation in the converter, which is, however, constrained by the micropipeline: the flat segments of the I-V curves correspond to the maximum-throughput operation of the micropipelines. If I(In) is large enough, the converter can generate more than one indication before getting a credit. This behavior violates the micropipeline discipline (see the right parts of both figures).

Fig. 7. Maximal throughput (data rate) in one- and two-credit micropipelines as a function of I(jtl). [Plot: data rate (GHz) versus I(jtl) (mA), with curves for 1 credit and 2 credits.]

Fig. 7 shows the dependence of the maximal throughput on the dc bias current I(jtl) (and therefore on the delay in the long loop) extracted from Figs. 5 and 6. At I(jtl) > 2.8 mA, a $1/\sqrt{LC}$ resonance takes place, which distorts the I-V curve of the JTL and invalidates the results.

Using data from Fig. 7, it is possible to extract some timing parameters of the micropipelines. For a one-credit micropipeline with maximum throughput $R_1$ (from (1) with $N_c = 1$):

$$T_1 = (2R_1)^{-1}. \qquad (4)$$
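Equation (4), and the control-delay relation $T_l = R_2^{-1} - (2R_1)^{-1}$ that the paper obtains for the two-credit case, are simple enough to evaluate directly. The sketch below does so; the function names and the sample throughputs are hypothetical and are not measured values from the paper.

```python
def long_loop_delay(r1):
    """T1 from the one-credit maximum throughput R1: T1 = 1 / (2 * R1)."""
    return 1.0 / (2.0 * r1)


def control_delay(r1, r2):
    """Tl from the one- and two-credit throughputs: Tl = 1/R2 - 1/(2*R1).

    Assumes, as the paper does, that T1 is the same in both micropipelines.
    """
    return 1.0 / r2 - long_loop_delay(r1)


if __name__ == "__main__":
    # Made-up throughputs in Hz, for illustration only.
    r1, r2 = 5e9, 8e9
    print(long_loop_delay(r1), control_delay(r1, r2))
```

For these made-up rates the sketch gives a long-loop delay of 100 ps and a control delay of 25 ps.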

The dependence of T1 on I(jtl) is shown in Fig. 8. This dependence is very typical for RSFQ circuits [8]. On the other hand, in a two-credit micropipeline with

Fig. 8. Delay in the long loop as a function of the dc bias current. [Plot: T1 (ps) versus I(jtl) (mA).]

maximum throughput $R_2$, the following is true (from (1) with $N_c = 2$):

$$R_2^{-1} - T_l = T_1, \qquad (5)$$

where $T_l = (T_2 + T_3)/2$ is the delay in the credit control logic. Assuming that $T_1$ is by design the same in both cases, the control delay can be calculated:

$$T_l = R_2^{-1} - (2R_1)^{-1}. \qquad (6)$$

The experimental results, as well as the result of the PSCAN simulation, are shown in Fig. 9. For 2.3 mA
