REDUCING MEMORY REQUIREMENTS IN CSA-BASED SCALABLE MONTGOMERY MODULAR MULTIPLIERS
Tao Wu
Shanghai Fudan Microelectronics Group Company Ltd., Guotai Rd. 127, Shanghai, 200433, China
Abstract
A scalable Montgomery modular multiplier can perform multiprecision Montgomery modular multiplications on limited hardware, but it requires memory units to store pipelined intermediate results. These memory units form a FIFO, whose size grows with the operand length. In this paper, two techniques are proposed that reduce the FIFO memory in CSA-based scalable Montgomery modular multipliers by about 50%. They are validated by synthesis results for an example design.

INTRODUCTION
Solid-state and integrated circuits (ICs) play a central role around the world today. Bulk-storage memory devices and various kinds of smart cards are typical applications. Among the families of ICs, security IC cards perform the arithmetic computations underlying cryptographic protocols. For public-key cryptography, the scalable Montgomery modular multiplier is an important hardware architecture [1]. It uses a limited number of datapath units together with adequate memory to carry out multiprecision modular multiplications. In other words, it trades arithmetic logic for memory to gain circuit optimization or flexibility. High-performance scalable Montgomery modular multipliers [2]–[4] often use carry-save adders (CSAs) to shorten the critical path, which doubles the memory needed for intermediate results, because each result is kept as a separate sum word and carry word.

In this paper, two techniques are proposed to reduce the memory units in CSA-based scalable Montgomery modular multipliers. The first technique resets the logic that clears the addresses, and the second replaces the CSA results with nonredundant values within the first-in-first-out registers (FIFOs). Both require few changes to the original scalable Montgomery modular multiplier architectures. The rest of this paper is organized as follows: first, scalable Montgomery modular multipliers are briefly introduced; then the proposed ideas are presented; finally, a synthesis example employing the ideas is given.

978-1-4799-3282-5/14/$31.00 ©2014 IEEE

SCALABLE MONTGOMERY MODULAR MULTIPLIER
The Montgomery algorithm [5] replaces reduction modulo an irregular integer with division by a power of 2. Usually, the algorithm is carried out by interleaving multiplication and reduction over the multiplier digits, so that the quotient is computed digit by digit rather than in one step. Scalable Montgomery modular multiplication [1] further divides the multiplicand into words, so that the whole Montgomery modular multiplication can be processed iteratively by a few parallel processing elements. In this paper, the algorithm in [4] is adopted. Its hardware architecture consists of a chain of processing elements, control logic, and the FIFO. The processing element is shown in Fig. 1, which removes a redundant register from Fig. 3 in [4].
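The interleaved, word-serial computation described above can be illustrated with a minimal Python sketch. It is a plain radix-2 model, not the Booth-encoded algorithm of [4]; the function name, the word width `w`, and the one-bit headroom in the word count are assumptions made for illustration. Each outer step consumes one multiplier bit and derives one quotient digit, and each inner step handles one multiplicand word, which corresponds to the data that a hardware design passes between processing elements or parks in the FIFO.

```python
def scalable_montgomery(a: int, b: int, m: int, n_bits: int, w: int) -> int:
    """Radix-2, word-serial Montgomery product: a * b * 2**(-n_bits) mod m.

    Assumes m is odd and 0 <= a, b < m < 2**n_bits. The word width w and
    all names here are illustrative, not taken from [4].
    """
    if m % 2 == 0:
        raise ValueError("modulus must be odd")
    e = -(-(n_bits + 1) // w)          # word count, with one bit of headroom
    mask = (1 << w) - 1
    B = [(b >> (w * j)) & mask for j in range(e)]   # multiplicand words
    M = [(m >> (w * j)) & mask for j in range(e)]   # modulus words
    S = [0] * e                        # running result, kept below 2*m

    for i in range(n_bits):            # one multiplier bit per outer step
        a_i = (a >> i) & 1
        q_i = (S[0] + a_i * B[0]) & 1  # quotient digit from the lowest word
        carry = 0
        for j in range(e):             # one multiplicand word per inner step
            t = S[j] + a_i * B[j] + q_i * M[j] + carry
            carry, word = t >> w, t & mask
            if j > 0:
                # lowest bit of this word becomes the top bit of the word below
                S[j - 1] |= (word & 1) << (w - 1)
            S[j] = word >> 1           # divide by 2, word by word
        S[e - 1] |= carry << (w - 1)   # final carry-out re-enters after the shift

    s = sum(word << (w * j) for j, word in enumerate(S))
    return s - m if s >= m else s      # single conditional subtraction
```

A result `r` can be checked without a modular inverse: `(r << n_bits) % m` should equal `(a * b) % m`. For example, `scalable_montgomery(3, 5, 7, 3, 2)` returns 1, since 2**3 is congruent to 1 modulo 7 and 15 mod 7 is 1.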
Fig. 1: Processing element in [4] (block diagram: two carry-save adders, CSA 1 and CSA 2, Booth encoding of the quotient digits, and a feed-forward register).
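To make the memory cost of the carry-save form concrete, the following Python sketch models a bitwise 3:2 carry-save adder and the conversion of its two output words into a single nonredundant value. The function names and the 8-bit example values are assumptions for illustration; how the conversion is performed in hardware is not specified in this part of the paper.

```python
from typing import Tuple


def carry_save_add(x: int, y: int, z: int) -> Tuple[int, int]:
    """3:2 compressor: return (sum_word, carry_word) with
    sum_word + carry_word == x + y + z. No carry propagates between bits."""
    sum_word = x ^ y ^ z
    carry_word = ((x & y) | (x & z) | (y & z)) << 1
    return sum_word, carry_word


def to_nonredundant(sum_word: int, carry_word: int) -> int:
    """Resolve the redundant pair with one carry-propagate addition."""
    return sum_word + carry_word


# Redundant form: two words must be stored for every intermediate result.
s, c = carry_save_add(0b10110011, 0b01101100, 0b11010101)
redundant_words = 2
# Nonredundant form: a single value represents the same result.
value = to_nonredundant(s, c)
nonredundant_words = 1
```

Storing `value` instead of the pair `(s, c)` is what lets a FIFO hold one word per result rather than two, which matches the scale of the roughly 50% saving stated in the abstract. The price is a carry-propagate addition, which a design would have to keep off its critical path.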
ICSICT2014, Guilin, China
FIFO stands for first-in-first-out register; it can be implemented as a register file or as read-write memory. As shown in Fig. 2, the topology of a FIFO is a dual-port RAM. For a very small FIFO depth, it can also be implemented as a shift register.
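The dual-port RAM view of a FIFO can be modelled in a few lines of Python. This is a behavioural sketch under assumed names (`RamFifo`, `push`, `pop`), not the circuit of Fig. 2: one port writes at a write address, the other reads at a read address, and both addresses wrap around the memory depth.

```python
class RamFifo:
    """Behavioural model of a FIFO built on a dual-port RAM (illustrative)."""

    def __init__(self, depth: int) -> None:
        self.depth = depth
        self.mem = [0] * depth   # the RAM array
        self.wr_addr = 0         # write-port address
        self.rd_addr = 0         # read-port address
        self.count = 0           # number of words currently stored

    def push(self, word: int) -> None:
        """Write one word through the write port."""
        if self.count == self.depth:
            raise OverflowError("FIFO full")
        self.mem[self.wr_addr] = word
        self.wr_addr = (self.wr_addr + 1) % self.depth
        self.count += 1

    def pop(self) -> int:
        """Read the oldest word through the read port."""
        if self.count == 0:
            raise IndexError("FIFO empty")
        word = self.mem[self.rd_addr]
        self.rd_addr = (self.rd_addr + 1) % self.depth
        self.count -= 1
        return word
```

Words come out in the order they were written, even after the addresses wrap. A shift-register FIFO would instead move every stored word by one position per step, which is only economical for small depths.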
) begin