

A New Pipelined Datapath for RISC Processor with Multiple-Valued Logic Signed-Digit Adder

SungWon Chung and Ju Sung Park

Pusan National University, Division of Electronics and Computer Engineering

Abstract: This paper presents a new pipelined datapath for a general RISC processor with a current-mode multiple-valued logic signed-digit adder. The datapath balances the latency in the critical path of each pipeline stage by dividing signed-digit addition into two processes that are executed in different pipeline stages. For evaluation, we designed the proposed datapath for the MIPS R2000 at the algorithm, architecture, and circuit levels; it is faster than the conventional MIPS R2000 datapath that uses a carry look-ahead adder. The core parts of the critical path were simulated with HSPICE using the model parameters of the 0.25µm MOSIS process. The simulated latency of a 64-bit addition in the datapath is 440ps. The estimated peak speed of the datapath is 1.8GHz with a 1.8V Vdd.

I

Introduction

Future computing will require higher performance for real-time and multimedia functions. The basic characteristics of media-centric applications that a processor needs to support are real-time response, continuous-media data-type operability, fine-grained parallelism, coarse-grained parallelism, high instruction-reference locality, high memory bandwidth, and high network bandwidth[1]. To satisfy these needs, improving individual instruction execution time is as important as improving instruction throughput. There are various approaches for the future computing domain at each level of the computer-system hierarchy, as shown in Fig. 1.

System/Algorithm: Clustering, ..., Complexity, Concurrency, Locality, Computation, ...
Architecture: Parallelism, Reconfigurable, ...
Circuit/Logic: Logic optimization, Low-swing logic, ...
Technology: Threshold reduction, Multi-thresholds, ...

Fig. 1. Approaches for future computing domain in each hierarchy of computer system

To improve instruction execution time in a conventional architecture, each pipeline stage should be optimized. However, in a 64-bit or wider high-performance microprocessor, addition latency is often in the critical path[14]. We propose a new pipelined datapath design with reduced latency in the critical path. The basic idea is to use, instead of carry look-ahead addition, signed-digit addition, whose carry propagation is limited to one digit position. The basic problem in realizing this idea is to integrate a signed-digit adder into a binary microprocessor with the smallest possible overhead. To solve this problem, we propose a hybrid approach that reduces the latency in the critical path of the pipelined datapath at the algorithm, architecture, and circuit levels. For low addition latency in long-word operations, the designed datapath uses a multiple-valued logic (MVL) signed-digit adder instead of a binary logic adder. For an efficient implementation of the MVL circuit, current-mode logic[6] is used. In radix-2 signed-digit integer addition, the latency of translating the resulting sum to a binary number is generally a large overhead that prevents its embedded application in binary systems. The proposed RISC datapath separates the signed-digit number translation from the execution pipeline stage and defers it to the memory access pipeline stage. The performance gain is therefore obtained by reducing the latency in the critical path of the execution pipeline stage. As a result, this datapath can achieve up to a two-fold speed-up compared with a carry look-ahead binary adder. The limiting factor is the latency of branch address calculation, for reasons described in Section III.

This paper describes the design and evaluation of a pipelined datapath for a RISC processor with an MVL signed-digit adder based ALU. Section II contains the fundamental design decisions about the datapath using a signed-digit adder. Section III describes a pipelined MIPS R2000 datapath designed with the proposed strategy. Section IV presents the implementation of a prototype to evaluate performance. Section V discusses the simulation results. Section VI provides a summary.

II

Fundamental Design Decisions

This section begins with a brief description of signed-digit number representation and addition. We then discuss which number system is interoperable for an efficient integration of a binary logic processor and a multiple-valued logic signed-digit adder, and which multiple-valued logic circuit technology is well suited for our purpose. The section ends with a performance evaluation of the signed-digit adder, to estimate the word length above which a signed-digit adder gains performance over a carry look-ahead adder.

1

Signed-Digit Number Addition

This subsection reviews the signed-digit number system and signed-digit addition.

For a detailed treatment of the principle, see [7],[6],[15]. The radix-r signed-digit number representation using a symmetric digit set is defined as follows:

$$X = (X_{n-1} \ldots X_1 X_0) = \sum_{i=0}^{n-1} X_i r^i, \qquad X_i \in \{-(r-1), \ldots, 0, \ldots, r-1\}$$

Since the parallelism of signed-digit addition is independent of the radix r, r = 2 is preferable for efficient conversion between binary and signed-digit numbers. Fig. 2 illustrates the addition of two signed-digit numbers, $X = (X_{n-1} \ldots X_1 X_0)$ and $Y = (Y_{n-1} \ldots Y_1 Y_0)$.

[Fig. 2: two adjacent digit positions i and i−1. At each position, x_i and y_i are summed to z_i; an SDFA block produces a carry c_i and an interim sum w_i; the final digit S_i is the sum of w_i and the incoming carry c_{i−1}.]

Fig. 2. Block diagram of radix-2 signed-digit full adder

Each signed-digit full adder (SDFA) block generates two signals, $c_i$ and $w_i$, such that

$$z_i = x_i + y_i, \qquad 2c_i + w_i = z_i$$

The final sum digit $S_i$ is generated by adding $w_i$ and $c_{i-1}$. Therefore, when $c_{i-1}$ equals −1, $w_i$ must not equal −1 if $S_i$ is to stay within the set {−1, 0, 1}. Similarly, when $c_{i-1}$ equals 1, $w_i$ must not equal 1. Thus, the following relations hold[2]:

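$$(c_i, w_i) = \begin{cases}
(1, 0) & \text{if } z_i = 2 \\
(1, -1) & \text{if } z_i = 1 \text{ and } z_{i-1} \ge 1 \\
(0, 1) & \text{if } z_i = 1 \text{ and } z_{i-1} < 1 \\
(0, 0) & \text{if } z_i = 0 \\
(0, -1) & \text{if } z_i = -1 \text{ and } z_{i-1} \ge 1 \\
(-1, 1) & \text{if } z_i = -1 \text{ and } z_{i-1} < 1 \\
(-1, 0) & \text{if } z_i = -2
\end{cases}$$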
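The selection rule above can be checked with a short behavioral model. The sketch below is plain Python, not the MVCM circuit: digit vectors are lists of integers in {−1, 0, 1} stored least-significant digit first, and the names `sdfa_split` and `sd_add` are hypothetical. Each position chooses $(c_i, w_i)$ from $z_i$ and $z_{i-1}$ alone, so no carry travels further than one digit. Modeling a carry input $c_{-1} \in \{-1, 0, 1\}$ by giving position 0 a stand-in value for $z_{-1}$ is an assumption of this sketch.

```python
def sd_value(digits):
    """Integer value of a radix-2 signed-digit vector (least-significant digit first)."""
    return sum(d * (1 << i) for i, d in enumerate(digits))


def sdfa_split(z, z_prev):
    """Split a position sum z in {-2..2} into (c, w) with 2*c + w == z,
    following the selection table (z_prev is the neighbouring sum z_{i-1})."""
    if z == 2:
        return 1, 0
    if z == 1:
        return (1, -1) if z_prev >= 1 else (0, 1)
    if z == 0:
        return 0, 0
    if z == -1:
        return (0, -1) if z_prev >= 1 else (-1, 1)
    if z == -2:
        return -1, 0
    raise ValueError(f"position sum out of range: {z}")


def sd_add(x, y, c_in=0):
    """Add two equal-length radix-2 SD vectors; returns n + 1 digits in {-1, 0, 1}."""
    if len(x) != len(y):
        raise ValueError("operands must have the same number of digits")
    if c_in not in (-1, 0, 1):
        raise ValueError("carry input must be -1, 0 or 1")
    n = len(x)
    z = [x[i] + y[i] for i in range(n)]
    # Hypothetical stand-in for z_{-1}: steers w_0 away from the value
    # that the carry input would push outside {-1, 0, 1}.
    z_low = 1 if c_in == 1 else 0
    # Each position depends only on z[i] and z[i-1]: there is no carry chain.
    cw = [sdfa_split(z[i], z[i - 1] if i > 0 else z_low) for i in range(n)]
    c = [ci for ci, _ in cw]
    w = [wi for _, wi in cw]
    carry_into = [c_in] + c[:-1]
    return [w[i] + carry_into[i] for i in range(n)] + [c[-1]]
```

For example, `sd_add([1, 1, 1, 1], [1, 0, 0, 0])` (15 + 1) returns `[0, 0, 0, 0, 1]`, i.e. 16; in a binary adder the same operands would ripple a carry through all four positions.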

This redundancy scheme limits the carry propagation chain to one digit to the left and allows fully parallel arithmetic operations, so the addition delay is independent of the word length. Consequently, in long-word addition a signed-digit adder outperforms an ordinary binary adder such as a carry look-ahead adder. To use a signed-digit adder as the core of the ALU in a binary logic microprocessor, the most fundamental issue is translating the signed-digit representation to conventional binary, and vice versa, without giving back the performance advantages of the embedded signed-digit arithmetic circuit[4]. With a radix-2 signed-digit adder, binary-to-signed-digit translation is not necessary, because a binary magnitude is already a valid radix-2 signed-digit number. In a binary system, however, we still need to translate the final signed-digit sum of the adder to a binary number. This signed-digit-to-binary translation can be done by one binary subtraction. For example, the radix-2 signed-digit number

$$X_{sd} = 1\,0\,\bar{1}\,1\,0\,0\,\bar{1}\,1 = 10010001 + 00\bar{1}000\bar{1}0$$

can be translated to the binary number

$$X_{bin} = 10010001 - 00100010$$

(that is, 145 − 34 = 111). For long-word addition, however, the latency of one binary subtraction is longer than that of one signed-digit addition, and this overhead would make an embedded signed-digit adder useless in a binary logic microprocessor. We therefore used the more efficient translation method described in [10]. Its delay grows logarithmically with the word length: a 64-bit translation from signed-digit to signed-magnitude needs 8 gate delays, and a 128-bit translation needs 10 gate delays. This method is simpler than a carry look-ahead adder, has a high degree of regularity, and its implementation requires less fan-in. Our simulation showed that this method is more than two times faster than translation using a carry look-ahead adder. Even so, this latency can still be large compared with the constant latency of the signed-digit adder. In Section III, we describe a new pipelined MIPS datapath that avoids this translation overhead by dividing signed-digit addition into summation and translation processes executed in different pipeline stages.

2

Interoperable Number System for Binary Processor and Multiple-Valued Adder

Since a radix-2 signed-digit adder is used as the core of the ALU while the actual inputs and outputs of the ALU are binary numbers, the interoperability of the common two's complement number system with the radix-2 signed-digit number system must be reconsidered. When two's complement numbers are used with a signed-digit adder, the ALU operands must be modified. First, when one operand of the signed-digit adder is the two's complement representation of a negative integer, each digit '1' should be converted to '¯1', and an additional '¯1' has to be added. This latter addition of '¯1' can be replaced by a carry input of '¯1' to the signed-digit adder, so that no additional adder is needed. Second, when the ALU performs subtraction, the subtrahend must be negated, which can be done by replacing each digit '1' with '¯1' and each '¯1' with '1'. The difficulty arises when a negative number is added to another negative number. Avoiding an additional signed-digit adder would then require a carry input of '¯2', which a single signed-digit adder cannot accept while keeping the ALU output a minimally redundant symmetric signed-digit number[11]. To avoid this problem, we decided to use signed-magnitude representation as the binary number system. Before describing the other reason for choosing signed-magnitude representation, we describe, as background, the operation of signed-magnitude addition, quoted from [16]:

Generally, the signed-magnitude adder circuit must examine the signs of the addends to determine what to do with the magnitudes. If the signs are the same, it must add the magnitudes and give the result the same sign. If the signs are different, it must compare the magnitudes, subtract the smaller from the larger, and give the result the sign of the larger.

The situation is different in a pipelined MIPS datapath, which contains three kinds of adders: one for program counter increment, one for branch address calculation, and one for the ALU core. Both operands of the first adder (for the PC) are positive. The first operand of the second adder (for branch address calculation) is positive and larger than the second operand. Only the last adder, used in the ALU, handles the general case. A signed-magnitude number converts simply into a radix-2 signed-digit number: the magnitude bits are taken as digits and, for a negative number, each nonzero digit is negated; the reverse direction uses the translation described in Subsection 1. So, in the MIPS datapath, there is no overhead from a signed-magnitude adder circuit. The advantage of signed-magnitude representation in the pipelined MIPS datapath is that the binary-to-signed-digit translation of a negative number requires no additional digit and no carry input. Thus, we decided to use signed-magnitude representation in the binary part of the proposed datapath.

3

Efficient Multiple-Valued Logic to Implement Signed-Digit Adder

The signed-digit number system used in the radix-2 signed-digit adder is not a binary number system but an inherently multiple-valued one. We therefore discuss various implementation schemes for multiple-valued logic circuits. Table I compares general CMOS binary logic circuits[12], multiple-valued current-mode (MVCM) logic circuits[6], multiple-valued neuron-MOS (µ-MOS) voltage-mode logic circuits[8], and negative differential-resistance (NDR) device based multiple-valued logic circuits[7].

Characteristic        bin   mvcm   µ-mos   ndr
Complexity             −     +      ◦       ◦
Reliability            +     ◦      −       −
Chip Fabrication       +     +      −       +
Low Voltage Op.        ◦     +      −       −
Speed                  ◦     +      −       −
Power Dissipation      +     ◦      +       +

TABLE I Comparison of implementation schemes of multiple-valued logic circuit for signed-digit adder (+: better, ◦: normal, −: worse)

In complexity, binary logic needs more than one wire to represent a single multiple-valued variable. More generally, a digit of a radix-r signed-digit number takes one of $2r - 1$ values, so binary logic needs $\lceil \log_2 (2r - 1) \rceil$ wires to represent it, whereas MVCM, µ-MOS, and NDR MVL circuits need only one wire. Thus, as the function to be implemented becomes more complex, binary logic circuits become more complex than the other circuits. In reliability, because an MVL circuit represents each logic level as a specified range of voltage or current, it is more sensitive to noise than a binary logic circuit. Recently, however, a self-checking configuration with low-power characteristics has been developed for MVCM logic circuits[3].

In chip fabrication, µ-MOS MVL requires a double-polysilicon CMOS process, whereas the other circuits place no restriction on the CMOS process technology. In low-voltage operation, voltage-mode MVL circuits such as µ-MOS MVL and NDR MVL have a weakness, because a low supply voltage significantly reduces driving force and noise margin. In operating speed, voltage-mode MVL circuits are slow due to their relatively weak driving force. An MVCM logic circuit, however, uses differential pairs, which are controlled by a small input voltage difference and quickly steer the output current. Thus, as long as current is controlled by voltage, an MVCM logic circuit operates much faster than voltage-mode MVL circuits and CMOS binary logic circuits. The key component determining the operating speed of an MVCM logic circuit is the comparator, whose output voltage is controlled by its input current. In addition, MVCM logic has the advantage that summation can be implemented simply by wiring together the signals to be added. Previous work has also shown that MVCM-based signed-digit adders and multipliers run faster than those based on CMOS binary logic circuits[9],[2]. In power dissipation, MVCM circuits have large static power dissipation, whereas voltage-mode MVL circuits and CMOS binary logic circuits have ideally zero static power dissipation. In high-speed operation, however, dynamic power dissipation largely determines the total power dissipation, and since the power dissipation of an MVCM logic circuit can be controlled by its current source[2], the penalty of large static power dissipation is not so critical in high-speed operation. Based on the above discussion, we decided to use MVCM logic circuits to implement the signed-digit adder.

4

Performance Estimation of Signed-Digit Adder

This subsection compares the speed of a 64-bit carry look-ahead binary adder with that of a 64-bit multiple-valued signed-digit logic adder. The purpose of this comparison is to estimate the word length above which a signed-digit adder gains performance over a carry look-ahead adder.

4.1

Basic Components of MVCM Logic Circuit

To aid understanding of the MVCM signed-digit adder operation, the five basic components of MVCM logic circuits are shown in Fig. 3; for a detailed description, see [6],[9].

[Fig. 3: symbols and circuits of the wired-sum circuit, comparator (input current x, threshold T, output y, reference voltage vref; y = 1 when x > T), differential-pair circuit, current mirror, and current source.]

Fig. 3. Basic components of MVCM logic circuit

The wired-sum circuit performs summation without an active device. The comparator compares an input current $I_x$ with a threshold current $I_T$, which is set by transistor size, and generates an output voltage $V_y$. The differential-pair circuit (DPC) generates a multiple-valued differential-pair output (s, 0) or (0, s) in accordance with a binary differential-pair input. The current mirror produces several replicas of an input current. The current source can be designed with an enhancement-mode nMOS or pMOS transistor whose gate voltage is set to $V_{ref}$, and its output current level is adjusted by the transistor size:

$$I_{x(+)} = \frac{1}{2}\,\mu_n C_{ox} \left(\frac{W_n}{L_n}\right) (V_{ref} - |V_{tn}|)^2$$

$$I_{x(-)} = \frac{1}{2}\,\mu_p C_{ox} \left(\frac{W_p}{L_p}\right) (V_{dd} - V_{ref} - |V_{tp}|)^2$$
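The square-law expressions above imply that, at a fixed $V_{ref}$, the source current scales linearly with $W/L$, which is how integer multiples of a unit current $I_0$ (such as the current levels of Table II) can be obtained by transistor sizing. The sketch below assumes the simple saturation model written in the equations (no channel-length modulation or body effect). The function names are hypothetical, and the numbers used in the check are placeholders, not the 0.25µm MOSIS parameters used in the paper.

```python
def nmos_source_current(w_over_l, mu_cox, v_ref, v_tn):
    """I_x(+) = 0.5 * mu_n*Cox * (W/L) * (V_ref - |V_tn|)^2 (square-law saturation)."""
    v_ov = v_ref - abs(v_tn)
    if v_ov <= 0:
        return 0.0  # transistor is off in this simple model
    return 0.5 * mu_cox * w_over_l * v_ov ** 2


def pmos_source_current(w_over_l, mu_cox, v_dd, v_ref, v_tp):
    """I_x(-) = 0.5 * mu_p*Cox * (W/L) * (V_dd - V_ref - |V_tp|)^2."""
    v_ov = v_dd - v_ref - abs(v_tp)
    if v_ov <= 0:
        return 0.0
    return 0.5 * mu_cox * w_over_l * v_ov ** 2


def nmos_w_over_l_for(i_target, mu_cox, v_ref, v_tn):
    """Invert the nMOS equation: aspect ratio needed for a target current."""
    v_ov = v_ref - abs(v_tn)
    if v_ov <= 0:
        raise ValueError("V_ref must exceed |V_tn|")
    return 2.0 * i_target / (mu_cox * v_ov ** 2)
```

With the same $V_{ref}$, the aspect ratios returned for $I_0$, $2I_0$, $3I_0$, and $4I_0$ are in the ratio 1:2:3:4, so one bias voltage can serve every current level.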

In the MVCM circuit, each radix-2 signed-digit logic level is represented as a current level, as shown in Table II[2]. Each logic level is detected by a dual-rail configuration of comparators whose threshold current levels are set to $0.5I_0$, $1.5I_0$, $2.5I_0$, and $3.5I_0$.

Logic level    −2      −1      0       1       2
z              0       I0      2I0     3I0     4I0
z̄              4I0     3I0     2I0     I0      0

TABLE II Logic levels represented as current levels in MVCM circuit
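The level coding of Table II and the four comparator thresholds can be modeled behaviorally as below. This is a sketch under stated assumptions: currents are expressed in units of $I_0$, each comparator is ideal, and the consistency check between the two rails is an illustrative choice rather than something the paper specifies. The function names are hypothetical.

```python
THRESHOLDS = (0.5, 1.5, 2.5, 3.5)  # comparator thresholds in units of I0


def level_to_currents(level):
    """Dual-rail currents (z, z_bar) in units of I0 for a logic level in {-2..2} (Table II)."""
    if level not in (-2, -1, 0, 1, 2):
        raise ValueError("logic level must be in -2..2")
    z = level + 2
    return z, 4 - z


def detect_level(i_z, i_zbar):
    """Recover the logic level by counting the thresholds each rail exceeds."""
    up = sum(i_z > t for t in THRESHOLDS)       # 0..4 from the z rail
    down = sum(i_zbar > t for t in THRESHOLDS)  # 0..4 from the z_bar rail
    if up + down != 4:
        raise ValueError("rails disagree: current outside a valid band")
    return up - 2
```

Because the thresholds sit halfway between adjacent levels, any current error smaller than $0.5I_0$ on a rail still decodes to the correct level, which is the noise margin the dual-rail comparators provide in this idealized model.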

4.2

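Multiple-Valued Current-Mode 64-bit Signed-Digit Adder

[Fig. 4: two adjacent MVCM SDFA cells (positions i and i−1) with dual-rail inputs x_i, x'_i, y_i, y'_i, internal sums z_i, z'_i, carries c_i, c'_i, interim sums w_i, w'_i, outputs s_i, s'_i, and neighbour signals e_i, e'_i passed to the next-higher position; e_i = 1 when z_i ≥ 1, e_i = 0 when z_i < 1.]

Fig. 4. Block diagram of 64-bit MVCM signed-digit adder

In the signed-digit number system, addition is performed in parallel. Thus, a 64-bit MVCM adder can be composed simply of duplicated MVCM signed-digit full adders (SDFAs)[6], as shown in Fig. 4. The signal $e_{i-1}$ tells position $i$ whether $z_{i-1} \ge 1$, which is the only information from the neighbouring position that the selection rule for $(c_i, w_i)$ needs. The internal block diagram of the MVCM-SDFA[6] is shown in Fig. 5.

4.3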

Latency Comparison with CMOS 64-bit Carry Look-Ahead Binary Adder

We simulated the operation of the comparator and the differential pair of the MVCM-SDFA with HSPICE in the 0.25µm MOSIS process. The latency of the differential pair is less than 1.5ps, but the latency of the comparator is relatively large, as shown in Fig. 6, with a 1.8V Vdd ($t_{PHL}$ = 166ps, $t_{PLH}$ = 168ps). The latency of the 64-bit MVCM-SDFA is therefore estimated at about 170ps. In contrast, the latency of the 64-bit parallel carry look-ahead binary adder implemented in a 1GHz research prototype of a 64-bit PowerPC microprocessor in the IBM CMOS 5S 0.25µm process[18] was 550ps. From these figures, we can roughly estimate the performance gain of the MVCM-SDFA, as shown in Fig. 7. The latency of the carry look-ahead binary adder was modeled as follows:

$$\text{latency of CLA} = \kappa \log_2(\text{word length}) \tag{1}$$

$$\kappa = \frac{550\,\text{ps}}{\log_2 64} \approx 91.7\,\text{ps} \tag{2}$$

[Fig. 5: MVCM-SDFA internals built from differential-pair circuits (DPCs) and comparators with thresholds 0.5, 1.5, 2.5, and 3.5; signals x, x', y, y', p, e, e', w, w', c, c'.]

Fig. 5. Internal block diagram of MVCM signed-digit full adder

We can see that above a word length of 4 bits the SDFA outperforms the carry look-ahead binary adder, and that at a word length of 16 bits the latency of the SDFA is less than half that of the carry look-ahead binary adder. In the proposed datapath, signed-digit-to-binary translation is separated from the execution pipeline stage, but signal conversion between multiple-valued current and binary voltage is still needed. When the latency of this signal conversion is included, the actual performance gain is therefore lower than the result above. The most time-consuming component of the SDFA circuit is its single comparator stage, and conversion from multiple-valued current to binary voltage is also performed by one comparator stage, so including the conversion roughly doubles the SDFA latency. We therefore estimate that the word length above which the SDFA clearly outperforms the carry look-ahead binary adder is not 16 bits but 64 bits.
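Equations (1) and (2) and the comparison in Fig. 7 can be reproduced with a few lines of arithmetic. The sketch below uses only numbers stated in the text (550ps for the 64-bit CLA and about 170ps per SDFA comparator stage); treating the signal conversion as exactly one extra comparator stage is the simplifying assumption of the preceding paragraph, and the helper names are hypothetical.

```python
import math

CLA_64_PS = 550.0                     # 64-bit CLA latency quoted from [18]
SDA_PS = 170.0                        # estimated MVCM-SDFA latency (one comparator stage)
KAPPA = CLA_64_PS / math.log2(64)     # Eq. (2): about 91.7 ps per doubling of word length


def cla_latency(word_length):
    """Eq. (1): CLA latency grows with log2 of the word length."""
    return KAPPA * math.log2(word_length)


def crossover_word_length(sd_latency_ps):
    """Word length at which the CLA model equals a constant signed-digit latency."""
    return 2 ** (sd_latency_ps / KAPPA)


for bits in (4, 16, 64, 256):
    print(f"{bits:4d}-bit CLA: {cla_latency(bits):6.1f} ps")
print(f"break-even, SDFA alone:            {crossover_word_length(SDA_PS):5.1f} bits")
print(f"break-even, SDFA + one conversion: {crossover_word_length(2 * SDA_PS):5.1f} bits")
```

The printed CLA latencies (about 183, 367, 550, and 733ps) match the latency ticks of Fig. 7.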

[Fig. 6: HSPICE transient waveforms v(vout1) and v(vout21) of the comparator; voltage axis 0.8V to 1.75V, time axis 10ns to 16ns.]

Fig. 6. Latency of comparator in MVCM-SDFA ($t_{PHL}$ = 166ps, $t_{PLH}$ = 168ps)

[Fig. 7: log(latency) versus log(word length) for the carry look-ahead adder and the signed-digit adder at word lengths of 4, 16, 64, and 256 bits; latency ticks at 183ps, 367ps, 550ps, and 733ps.]

Fig. 7. Latency comparison of multiple-valued current-mode signed-digit adder and carry look-ahead binary adder

III

Datapath Architecture

This section describes the architecture of the proposed MIPS datapath, the structure of its AU (arithmetic unit) using a multiple-valued logic signed-digit adder, and its overall operation.

1

Background of Pipelined MIPS Datapath

The conventional pipelined MIPS R2000 datapath[13] is shown in Fig. 8. In this figure, the hazard detection unit, forwarding unit, exception handling unit, and control signal unit are not drawn for simplicity.

[Fig. 8: PC, instruction memory, PC + 4 adder, shift-left-2 and branch-target adder, register file, sign extension, ALU with flags, data memory, multiplexers, and the IF/ID, ID/EX, EX/MEM, and MEM/WB pipeline registers.]

Fig. 8. Conventional pipelined MIPS R2000 datapath (Hazard detection unit, exception handling unit, and control signal unit are not drawn for simplicity)

The five pipeline stages are instruction fetch (IF), instruction decode and register file read (ID), execution or address calculation (EX), data memory access (MEM), and write back (WB). We make the following assumptions:

[Fig. 9: pipeline diagram of three overlapped instructions plotted against time in program execution order; each instruction passes through instruction fetch, Reg, ALU, data access, and Reg.]

Fig. 9. Assumed critical path of MIPS pipeline stage

Assumption 1: When addition is performed by a carry look-ahead binary adder, the latency of addition in the ALU is the critical path of the pipeline, as shown in Fig. 9.

Assumption 2: The latency of an addition is reduced when the same addition is performed by a signed-digit adder instead of a carry look-ahead adder. In the previous section, we estimated that this assumption holds for word lengths roughly above 64 bits.

So, the following relationship holds, where α is the ratio of the CLA latency to the signed-digit adder (SDA) latency:

$$\text{latency of CLA} = \kappa \log_2(\text{word length}) = \alpha \times \text{latency of SDA}$$

$$\text{latency of SDA} = \frac{\kappa \log_2(\text{word length})}{\alpha}$$

From these equations we obtain the partial word length for which the carry look-ahead binary adder (CLA) can produce its result within the time the signed-digit adder (SDA) takes for the full word length:

$$\text{partial word length} = 2^{\,\text{latency of SDA}/\kappa} \tag{3}$$

$$= 2^{\frac{1}{\alpha}\log_2(\text{word length})} = (\text{word length})^{1/\alpha} \tag{4}$$
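As a numerical illustration of Eqs. (3) and (4), and of the split-word PC increment that the IF-stage description applies them to, the sketch below assumes α = 2 (the two-fold speed-up mentioned in the introduction), a 64-bit PC, and an increment of 4; these values and all function names are hypothetical. The incrementer is a generic conditional-increment (carry-select style) scheme, shown to illustrate the idea rather than the paper's circuit: only the lowest partial word really adds, and each higher partial word either keeps its value or adds one, depending on whether the lowest partial word carries out and every partial word in between is all ones.

```python
import math


def partial_word_length(word_length, alpha):
    """Eqs. (3)-(4): (word length) ** (1 / alpha), rounded down to whole bits."""
    return math.floor(2 ** (math.log2(word_length) / alpha))


def split_increment(pc, width=64, chunk=8, step=4):
    """PC + step (mod 2**width) computed with chunk-wide operations.

    Chunk 0 performs the real addition; chunk k > 0 is incremented by one only
    when chunk 0 carries out and chunks 1..k-1 are all ones. The per-chunk
    "+1" values and the all-ones tests do not depend on each other, so hardware
    could evaluate them in parallel and then select.
    """
    if width % chunk or not 0 <= step < (1 << chunk):
        raise ValueError("unsupported width/chunk/step combination")
    mask = (1 << chunk) - 1
    parts = [(pc >> (k * chunk)) & mask for k in range(width // chunk)]
    low = parts[0] + step
    result = low & mask
    propagate = (low >> chunk) == 1      # carry out of chunk 0
    for k in range(1, len(parts)):
        # carry into chunk k: chunk 0 overflowed and every chunk in between is all ones
        selected = (parts[k] + 1) & mask if propagate else parts[k]
        result |= selected << (k * chunk)
        propagate = propagate and parts[k] == mask
    return result
```

With α = 2, `partial_word_length(64, 2)` gives 8, so a 64-bit PC splits into eight 8-bit partial words; `split_increment(0xFC)` returns `0x100`.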

2

Pipeline Stages

[Fig. 10: the datapath of Fig. 8 with an encoder in the ID stage, an arithmetic unit and a logic unit in place of the ALU, a separate adder for memory address calculation, an SD-BIN translator, and a three-input write-back multiplexer; pipeline registers IF/ID, ID/EX, EX/MEM, MEM/WR.]

Fig. 10. Proposed pipelined MIPS R2000 datapath for signed-digit adder (Hazard detection unit, exception handling unit, and control signal unit are not drawn for simplicity)

The proposed datapath, shown in Fig. 10, is similar to the original one shown in Fig. 8. It is a datapath for a hypothetical 64-bit version of the MIPS R2000.

We now describe the differences from the conventional datapath. As described in the previous section, for efficient translation to and from signed-digit numbers, the first major difference is that the proposed datapath uses signed-magnitude representation for binary numbers instead of the common two's complement representation.

2.1

Instruction Fetch Stage (IF)

In the instruction fetch (IF) stage, according to Assumption 1, the latency of the addition used to increment the program counter (PC) is the critical path. Since this addition always counts upward by a constant, the total word length can be split into partial words of the length given by Eqs. (3) and (4). Because a carry leaves a partial word only when all of its bits are 1, the carry into each partial word is predictable from the bit patterns of the lower-order partial words, so the latency of the addition can be reduced to that of the signed-digit adder by performing the additions of the split words in parallel.

2.2

Instruction Decode and Register File Read Stage (ID)

In the instruction decode and register file read (ID) stage, the data transferred from the register file to the ALU is translated by the encoder from signed-magnitude binary numbers to signed-digit numbers, which serve as inputs of the signed-digit adder based ALU in the next stage. According to Assumption 1, the adder calculating the branch address cannot finish within one pipeline time slot. Because its result is unpredictable, it cannot be optimized in the same way as the adder that increments the program counter. Because a memory address is a binary number rather than a signed-digit number, the branch address adder is not suitable for replacement with a signed-digit adder. Thus, considering the translation latency, the adder begins its addition in the ID stage, and the branch address calculation must complete within two pipeline time slots. For this operation, two carry look-ahead binary adders are used; they receive operands from the IF stage alternately, and each result is sent to the EX/MEM pipeline register after two pipeline time slots. The disadvantage of this scheme is that branch delay optimizations, which reduce the cost of a taken branch, cannot be applied because of the location of the branch address adder. In this paper, we therefore implement a prototype datapath and evaluate the effect of the unreduced branch delay.

2.3

Execution Stage (EX)

According to Assumption 2, an AU with a signed-digit adder, instead of an ALU with a carry look-ahead adder, performs addition and subtraction.

Bitwise logical operations on binary numbers represented as signed-digit numbers are very difficult. Therefore, in the execution stage, we divided the ALU into a multiple-valued signed-digit adder based AU (arithmetic unit) and a binary LU (logic unit). The output of the AU is represented as a signed-digit number and is translated to a signed-magnitude number in the next pipeline stage. Since this translation can take up a whole pipeline time slot, the memory address should not be computed as a signed-digit number, so we used a separate adder for memory address calculation. In the MIPS instruction set architecture, the address displacement of a memory reference is within 16 bits[19], which ensures that the carry look-ahead adder used to calculate the memory reference address finishes within the EX stage.

2.4

Data Memory Access Stage (MEM)

The signed-digit result of an arithmetic instruction is translated in the data memory access (MEM) stage. As described in the previous section, this translation latency is smaller than that of a carry look-ahead addition. This is possible because the MIPS instruction set architecture is designed so that arithmetic instructions do not use the MEM stage.

2.5

Write Back Stage (WB)

The write-back data is selected by a three-to-one multiplexer. To balance the latency of the critical path, the selection between the logical result and the arithmetic result is also made here.

3

Forwarding

[Fig. 11: pipeline diagram (IM, Reg, ALU, DM, Reg) over time, in program execution order, for the instruction sequence sub $2, $1, $3; and $12, $2, $5; sw $15, 100($2).]

Fig. 11. Different forwarding data types from MEM stage to EX stage: logical operation operand (first case), memory reference address reuse (second case), arithmetic operation operand reuse (not drawn)

In the proposed datapath, there is one additional forwarding rule, which decides what kind of data is forwarded from the MEM stage to the EX stage. As shown in Fig. 11, forwarding from the MEM stage to the EX stage occurs when the arithmetic result calculated in the EX stage is used as a logical operation operand, as a memory reference address, or as an arithmetic operation operand. For example, these cases correspond to the following C code segments:

    /* logical operation operand */
    a = (b1 + b2) & mask;

    /* memory reference address */
    pivot = array[i + j];

    /* arithmetic operation operand */
    sum += sub1;
    sum += sub2;

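In the first and second cases, the data forwarded from the MEM stage is the value after translation into a signed-magnitude number. In the third case, however, the AU in the EX stage cannot wait for the translation and so cannot receive a signed-magnitude operand; the data forwarded from the MEM stage to the EX stage is therefore the value before translation. The timing of the forwarding path that carries translated data is characterized by the following two conditions:

$$\text{translation latency} + \text{16b CLA} \le \text{64b SDA} \tag{5}$$

$$\text{translation latency} + \text{16b CLA} \le \text{64b CLA} \tag{6}$$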

If Eq. (5) is satisfied, this path is not in the critical path. If only Eq. (6) is satisfied, this path is in the critical path of the EX stage, but its latency is still smaller than that of a datapath using a carry look-ahead adder. Our simulation showed that both Eq. (5) and Eq. (6) are generally satisfied (160ps + 275ps ≤ 440ps and 160ps + 275ps ≤ 550ps).

4

AU using Signed-Digit Adder

The suggested internal structure of the AU using a multiple-valued current-mode (MVCM) signed-digit adder is shown in Fig. 12.

Logic    −1    0    1
z1        1    0    0
z0        0    0    1

TABLE III Encoding scheme of signed-digit number used in AU
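A behavioral sketch of the Table III encoding, together with an ID-stage encoder that turns a signed-magnitude operand into encoded signed digits, is given below. It assumes that the encoder takes the magnitude bits as digits and negates them for a negative operand (the signed-magnitude to signed-digit conversion discussed in Section II); the bit pairs `(z1, z0)` stand for the two wires, and all names are hypothetical.

```python
# Table III: signed digit -> (z1, z0)
ENCODE = {-1: (1, 0), 0: (0, 0), 1: (0, 1)}
DECODE = {bits: digit for digit, bits in ENCODE.items()}


def encode_signed_magnitude(sign, magnitude, width=64):
    """ID-stage encoder sketch: signed-magnitude operand -> list of (z1, z0),
    least-significant digit first. A negative sign turns every '1' into '-1'."""
    if not 0 <= magnitude < (1 << width):
        raise ValueError("magnitude does not fit in the word")
    digit_sign = -1 if sign else 1
    return [ENCODE[digit_sign * ((magnitude >> i) & 1)] for i in range(width)]


def decode_value(encoded):
    """Integer value of an encoded signed-digit word (for checking only)."""
    return sum(DECODE[bits] * (1 << i) for i, bits in enumerate(encoded))
```

One consequence of Table III is visible in the sketch: for a signed-magnitude operand, z0 carries the magnitude bit when the sign is positive and z1 carries it when the sign is negative, so in this model the encoder reduces to gating each magnitude bit with the sign bit.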

The input and output of the AU are radix-2 signed-digit numbers. In choosing the encoding scheme for signed-digit numbers, we considered three transition cases: in the encoder, in the current-to-voltage converter, and in the SD-to-BIN translator.

[Fig. 12: AU block diagram divided into a voltage signal region, a current signal region, and a second voltage signal region. 64-bit operands R1 and R0 from the ID/EX pipeline register pass through voltage-to-current converters into the SDFA; current-to-voltage converters produce the 64-bit outputs s1 and s0 for the EX/MEM pipeline register, alongside an overflow detector (ovfl), a sign detector (sign), and a zero detector (zero).]

Fig. 12. AU (Arithmetic Unit) block diagram

Through heuristic exploration, we found the encoding scheme shown in Table III to be the most efficient. Since the signed-digit full adder (SDFA) is designed so that its result is restricted to {−1, 0, 1}, we need not define encodings for −2 and 2. This scheme coincides with the encoding used in [10].

4.1

Signal Conversion

In the ID pipeline stage, the AU operands are translated by the encoder from signed-magnitude numbers to encoded signed-digit numbers. In the EX pipeline stage, these encoded operands are supplied to the AU by the ID/EX pipeline register. The voltage-to-current converter in the AU converts the voltage logic levels of the encoded signed-digit operands to current logic levels according to Table II, and the resulting current signals representing signed-digit numbers become the inputs of the MVCM-SDFA. The output of the MVCM-SDFA is also a current signal representing a signed-digit number. Thus, the current-to-voltage converter in the AU converts the multiple-valued current logic levels of the AU output to binary voltage logic levels representing the encoded signed-digit number. This encoded signed-digit number then becomes the input of the EX/MEM pipeline register. In the MEM pipeline stage, it is converted to a signed-magnitude binary number by the SD-to-BIN translator.

4.2

Sign and Zero Flag Detection

The sign flag and the zero flag are calculated from a partial result of the signed-digit-to-binary translation, as described in [10]. For 64 bits, this translation method can be viewed as two stages. A VHDL-style description of these stages is as follows.

    -- First Stage: Dependent on word length (for 64b: 4 gate delay)
    S_00
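The two-stage method of [10] is not reproduced here. As a reference model for checking such a circuit, the sketch below relies on two generic facts about radix-2 signed-digit numbers: the value is zero exactly when every digit is zero, and otherwise its sign equals the sign of the most significant nonzero digit, because the digits below position $k$ contribute at most $2^k - 1$ in magnitude. The magnitude is formed with the single subtraction of the positive and negative digit vectors from the worked example in Section II, which is the slow method the datapath keeps out of the EX stage. Digits are listed most-significant first, and the function names are hypothetical.

```python
def sd_flags(digits_msb_first):
    """Reference model: (sign, zero) flags of a radix-2 SD word, MSB first.

    sign = 1 for a negative value; zero = 1 when the value is 0.
    """
    for d in digits_msb_first:
        if d not in (-1, 0, 1):
            raise ValueError("digits must be -1, 0 or 1")
        if d != 0:
            return (1 if d < 0 else 0), 0   # leading nonzero digit decides the sign
    return 0, 1


def sd_to_signed_magnitude(digits_msb_first):
    """Slow reference translation: X = P - N computed with one subtraction."""
    p = n = 0
    for d in digits_msb_first:
        p = (p << 1) | (d == 1)    # positive digit vector
        n = (n << 1) | (d == -1)   # negative digit vector
    value = p - n
    return (1 if value < 0 else 0), abs(value)
```

For the example word $1\,0\,\bar{1}\,1\,0\,0\,\bar{1}\,1$, `sd_to_signed_magnitude([1, 0, -1, 1, 0, 0, -1, 1])` returns `(0, 111)` and `sd_flags` returns `(0, 0)`.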
