A FAST ALGORITHM FOR LARGE NUMBER DIVISION

Marco Bucci, Adina Di Porto
Fondazione Ugo Bordoni, Roma, Italy

Abstract

This work presents a division procedure suitable for several hardware implementations. The procedure can be used not only for computing divisions and remainders, but also for performing modular multiplication. The number of iterations of the proposed algorithms is given, and the cost of the elementary operations is evaluated as a function of the constraints. In particular, in order to minimize the cost of the elementary operations, all possible truncations of the current operands are discussed. The algorithms shown are easily parallelizable, and their hardware implementations can be very simple and cheap.

1. Introduction

Several practical problems involve arithmetic operations between large numbers. In particular, the public-key cryptographic systems proposed so far [1,2] involve operations over numbers several hundred bits long. The algorithms operating on such numbers have some peculiarities which depend on whether they are designed for hardware or software implementation.

Hardware solutions are undoubtedly the most efficient, even though their cost is comparatively high. They make it possible, where the algorithm permits, to parallelize the computation and to work with the precision required by the length of the operands. On the other hand, hardware solutions are not flexible and require elementary and repetitive algorithms. Such repetitiveness is mainly due to the necessity of avoiding compare-and-decide operations, which are usually slow and expensive. Special attention must be paid to the precision (i.e., to the length of the operands and results in every elementary operation), keeping in mind that long operands result in high time and hardware cost. The problem of data transfer is fundamental too, due to the length of the data themselves, which can be transferred only bit by bit or in very small bursts of bits. This fact considerably constrains the hardware architecture, because format, synchronisation and input-output data need to be matched.

From these considerations it is evident that an optimal algorithm does not exist. There exists a variety of solutions determined by either the available hardware or the required speed of calculation.

This work presents a division procedure suitable for several hardware implementations. The procedure can be used not only for computing divisions and remainders, but also for performing modular multiplication. The method can be implemented with a hardware cost proportional to the needed speed. This paper deals in particular with the truncation that can be performed on operands and results in order to make the operations as cheap as possible and, at the same time, to guarantee convergence. Section 3 presents division using a precomputed approximation of the divisor inverse. Section 4 presents the algorithm when the partial quotients are computed by direct division. Section 5 illustrates modular multiplication by interleaving products and divisions (fig. 2).

Work carried out in the framework of the agreement between the Italian PT Administration and the Fondazione "Ugo Bordoni".

SPRC '93 - Rome, Italy - February 15-16, 1993

2. General structure of the algorithm

As can be seen in fig. 1, the method explained here is a division (or reduction) procedure based on the computation of successive partial quotients. Such quotients can be computed either by direct division of suitable approximations of the operands, or by precomputing an approximation of the divisor inverse.

[Fig. 1: flowchart of the division procedure. Legible steps: preset $Q_i$; reduction loop FOR i := 0 TO L - 1 (compute $Q_i$; $D_{i+1} := D_i - Q_i C$); correction $Q := Q + 1$, $D_L := D_L - C$; quotient accumulation $Q := 0$, FOR i := 0 TO L - 1, $Q := Q + Q_i$.]

[Fig. 2: flowchart of modular multiplication by interleaving products and divisions. Legible steps: the reduction loop, correction and quotient accumulation of Fig. 1, plus a product accumulation step inside the loop that adds a term in $A$ and $B$ to the current dividend.]


The precision of the partial quotients determines the reduction step of the reduction loop, denoted by $k$ in the sequel. In every hardware or software implementation, the reduction step should be constant, since this makes operand addressing and shifting easier. Moreover, in this case the number $L$ of iterations required for the division is

$$L = \lceil (m - n)/k \rceil$$

where $\lceil \cdot \rceil$ is the ceiling function, $m$ is the maximum length of the dividend $D$ and $n$ is the actual divisor length, so that

$$D \le 2^m - 1 \quad \text{and} \quad 2^{n-1} \le C \le 2^n - 1 .$$
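As a rough illustration of the loop structure described so far, the following Python sketch computes $L = \lceil (m - n)/k \rceil$ and runs $L$ reduction steps of the form $D_{i+1} := D_i - Q_i C$, followed by a final correction and the accumulation of the partial quotients. It is only a sketch under stated assumptions: the names `iteration_count` and `partial_quotient_divide` are hypothetical, and each partial quotient is obtained here by an exact integer division by the shifted divisor, not by the truncated operands or the precomputed inverse that the paper develops in later sections.

```python
def iteration_count(m, n, k):
    """Return L = ceil((m - n) / k) using integer arithmetic only."""
    return max(0, -(-(m - n) // k))


def partial_quotient_divide(D, C, k, m=None):
    """Divide D by C through L partial quotients with a constant step k.

    Illustrative only: Q_i is computed exactly here (an assumption),
    whereas a hardware design would use truncated operands.
    Returns (quotient, remainder).
    """
    if C <= 0 or D < 0 or k <= 0:
        raise ValueError("need D >= 0, C > 0 and k > 0")
    n = C.bit_length()                 # actual divisor length
    if m is None:
        m = max(D.bit_length(), n)     # assumed maximum dividend length
    if D >= (1 << m):
        raise ValueError("D does not fit in m bits")

    L = iteration_count(m, n, k)
    partial_quotients = []
    for i in range(L):
        shift = (L - 1 - i) * k        # weight of the i-th partial quotient
        Q_i = (D // (C << shift)) << shift
        D -= Q_i * C                   # D_{i+1} := D_i - Q_i * C
        partial_quotients.append(Q_i)

    Q = sum(partial_quotients)         # Q := Q + Q_i for every i
    while D >= C:                      # final correction step
        Q += 1
        D -= C
    return Q, D
```

With exact partial quotients the correction loop runs only when $L = 0$; it is kept because approximate quotients can leave a residual multiple of $C$ in $D_L$.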

In order to simplify the description of the algorithm, we assume that the length of $C$ is a multiple of $k$. As will be shown, this allows maximum precision in computing the partial quotients. Obviously, in a real implementation, whatever $n$ is, the same condition can be obtained by a suitable "justification" of the operands. In the following we will denote by $\lfloor x \rfloor_s$ the quantity obtained by replacing with zeroes all bits to the right of the $s$th bit of $x$, that is

$$\lfloor x \rfloor_s = \lfloor x / 2^s \rfloor \, 2^s \tag{1}$$

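where $\lfloor \cdot \rfloor$ is the integral part function. Analogously, we will denote by ${}_{s}\lfloor x \rfloor$ the quantity obtained by disregarding all bits to the left of the $s$th bit of $x$, that is

$${}_{s}\lfloor x \rfloor = x - \lfloor x \rfloor_{s+1} = x - \lfloor x / 2^{s+1} \rfloor \, 2^{s+1}$$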
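The two truncation operators defined above map directly onto shift operations. The Python sketch below is one possible rendering, assuming non-negative integers and bits numbered from 0 at the least significant position; the function names `clear_low_bits` and `keep_low_bits` are hypothetical and do not appear in the paper.

```python
def clear_low_bits(x, s):
    """floor_s(x) = floor(x / 2**s) * 2**s: zero the bits right of bit s."""
    return (x >> s) << s


def keep_low_bits(x, s):
    """_s floor(x) = x - floor(x / 2**(s+1)) * 2**(s+1): drop the bits left of bit s."""
    return x - ((x >> (s + 1)) << (s + 1))


x = 0b10110110                # 182
high = clear_low_bits(x, 4)   # 0b10110000 = 176
low = keep_low_bits(x, 4)     # 0b10110 = 22
```

For non-negative `x`, `keep_low_bits(x, s)` gives the same value as masking with `(1 << (s + 1)) - 1`, which is usually how such a truncation would be written in software.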

Let us observe that the inequality, 00

x -LxJs
