CHAPTER 1 BACKGROUND 1.1 INTRODUCTION

0 downloads 0 Views 4MB Size Report
... digits. The implementation of PL/I for the VAX-11 uses this data type to ... Page 1-2 is the number of decimal digits in the corresponding packed decimal string.
CHAPTER 1 BACKGROUND

1.1

INTRODUCTION

This document describes a multiple-precision division algorithm which uses single-precision division operations on integers. The algorithm is especially useful when the single-precision division and single-precision multiplication operations are performed on relatively large numbers such as 2**16 or larger. The algorithm has been successfully implemented as a run-time routine that supports fixed-point decimal division when multiple-precision division is required for the PL/I language on Digital's VAX-11 processors. This paper begins by describing the basic implementation of PL/I's fixed decimal numbers on VAX machines. Next it describes scaled division and clarifies the circumstances under which multiple-precision division is required. Chapter 2 describes the basic algorithm and concludes with a discussion of some implementation considerations. Chapter 3 contains the actual run-time routine used in the implementation of VAX-11 PL/I. The appendix contain the proofs of theorems used to support the validity of the algorithm.

1.2

PACKED DECIMAL STRINGS AND FIXED DECIMAL NUMBERS

The VAX-11 architecture defines a packed decimal string data type that conceptually consists of a sign and up to 31 decimal digits. All operations on packed decimal strings allow packed decimal operands with up to 31 digits. The result of each of these operations is an "operand" of the instruction and may also have a value of up to 31 decimal digits. The implementation of PL/I for the VAX-11 uses this data type to represent the PL/I fixed-point decimal data type. VAX-11 packed decimal string instructions require that the number of decimal digits for each operand be specified. In PL/I, a fixed-point decimal number is usually implemented in such a way that its precision

BACKGROUND

Page 1-2

is the number of decimal digits in the corresponding packed decimal string. Thus, the hardware uses this precision in all operations on the corresponding packed decimal string and can use the precision's value to trap overflow conditions. A PLII fixed-point decimal number has a prec1s1on and a scale, both of which are fixed throughout the life of the variable. We will symbolically represent the value of the prec1s1on of variable Z by prec(Z) and the scale of Z by scale(Z). We note that the ANSI subset G definition of the PLII language requires that scale(Z) be no larger than prec(Z) and that the scale always be nonnegative. The scale of a fixed decimal number is the number of low-order decimal digits of the packed decimal string to the right of the decimal point. The scale of a fixed decimal number has no analogue on the hardware instruction level. The value must be retained by the compiler and used to create ancillary instructions whenever the packed decimal string is to be interpreted as a fixed decimal number. As an example, suppose we declare X and Y to be and assign them values:

fixed

decimal

numbers

declare X fixed decimal(3,0); I* precision = 3, scale = 0 *I declare Y fixed decimal ( 3, 1); I* precision = 3, scale = 1 *I X=2; Y=1.5; The internal representation for X is the packed decimal string +002 and the internal representation for Y is the packed decimal string +015 In order to perform the operations implied by y

represents greater than

11.

>= represents greater than or equal

12.

= Y then we perform the division using single precision division, obtaining at least one nonzero digit of the quotient. The Euclidean division algorithm guarantees that the remainder is less than the divisor so we proceed with the remainder as the new dividend. Thereafter, successive iterations will always satisfy the requirement that the dividend be less than the divisor.

Assumption 2: 0 < X < Y Discussion: We make this true by using Assumption 1 and specifically testing for the case where X = 0 and for the cases where X and Y have negative signs. We thus perform the division on abs(X) and abs(Y) and if necessary, adjust the sign of the quotient after the division has taken place.

Assumption 3: Let y1 be the high order part of Y and y2 be the low order part. Then: Y

= B*y1

+ y2

and we assume: 0 < y1. Discussion: If y1 is 0, successive divisions by y2. of the quotient.

then we can calculate the quotient by Each such division produces log10(B) digits

FINDING Q: THE ALGORITHM 2.2

Page 2-3

INTUITIVE DISCUSSION

preci ~e

Before giving a description of the algorithm, we present a brief intuitive discussion of the basic idea behind the algorithm. Since B*(y1+1) then L Let

=

>Y

[X/(y1+1)] is the smallest possible value for Q.

R(L) = B*X - Y*L

(the highest possible remainder)

A consideration of the division algorithm might lead us to expect that Q

=L

+ R(L)/Y

and since Q is an integer, we must actually be calculating Q

=L

+ [R(L)/Y]

Such a means for determining Q is useful only if R(L) < B*B since otherwise we would not be avoiding the use of multiple-precision division. In fact, the following example shows that this is not always the case: 100 1600 y = 1704 The remainder here is: R(L) B =

X

=

=

10048 which is greater than B*B.

Fortunately, step 4 of the algorithm as this doesn't happen.

described

below

assures

that

FINDING Q: THE ALGORITHM 2.3

Page 2-4

BASIC ALGORITHM

The basic algorithm for multiple-precision division consists of the following key steps. Each time these steps are repeated, we calculate log10(B) digits of the quotient.

1.

Calculate

H

=

[X/y1]

No value for Q can be higher than H. 2.

Calculate R(H)

= B*X

- Y*min(B-1 ,H)

Min(B-1,H) is the largest possible value for Q. R(H) is the smallest possible remainder.

3.

If R(H)

>= 0 then we are done

If the remainder is value for Q. 4.

>=

0,

then

min(B-1,H) is the proper

If -Y = 0 t1(31) = 31 high order digits of x(46) y*z2 t2(15) = 15 low order digits of x(46) - y*z2 t4 = 10**15-t2 t2(15) = t4(15) t4(30) = y1*z2 t1 = t4 shifted right 1 t1(31) = t1(31)+t3(16) t4 = x-t1 t 1 = t4-1 (borrow 1) Goto 700$

3 4.

630$: ;no borrow, t2 = 0 ;calculate R(L) = t1(31) = x- y1*z2 t1(30) = y1*z2 t4 = t1 shifted right 1 t4(31) = t4(31)+t3(16) t1 = x-t4

FINDING Q: THE ALGORITHM

Page 2-12

·*******************************************************

.'' * ;* .' *

calculate Z2

=L

+ R(L)/Y

* * *

·******************************************************* '

35.

2.5

700$: t4 = t1 = z2 = Goto

t4(31) t4+t2 t4/y z2+t1 320$

= t1(31)

shifted right 15

IMPLEMENTATION CONSIDERATIONS

This section describes further details of interest to anyone implementing the previously described division algorithm.

actually

We saw in the preceding section that we have two basic algorithms, one called Long, the other called Short. Short can perform the division whenever the precision of the divisor is less than 31. Long can perform the division whenever the precision of the divisor is greater than 16 and the high order 15 digits are not all zeros. Actually, we could have considered the number of significant digits for each specific divisor but it was felt that such an approach was not likely to be worthwhile. Instead, the choice of the routine to be called is based on the precision of the divisor; this information is known at compile time since precision and scale are fixed for each variable. In practice, the routine Long is modified to contain code similiar to Short, to take care of the special case where the high order 15 digits of the divisor are all zeros. The question then arises, "precisely when should the Long routine be called instead of the Short routine?" Tests that were run (tstspd.mar) on a VAX-11/750 determined that the Long routine was faster than the Short routine when the precision of the divisor was 30 decimal digits. For these tests, the actual value of the divisor was uniformly distributed across all possible values for the divisor. Thus the Long run-time routine (called pli$div pk long) is calied by the compiler whenever the precision of the divisor is 30 or 31 and multiple-precision arithmetic is required.

CHAPTER 3 RUN-TIME SOURCE LISTING

The following is a source listing of an implementation of this algorithm as part of a pl/i run-time routine to perform 31 digit scaled division:

.title .ident

pli$divpklong /v001/

' ·**************************************************************************** .' * *

.' * .'' * .' * .' * .* .' * .' * .' * .' * .'' * .* .'' * ;* .' * .' * .' *

Copyright (c) 1980 by DIGITAL Equipment Corporation, Maynard, Mass.

This software is furnished under a license and may be used and copied only in accordance with the terms of such license and with the inclusion of the above copyright notice. This software or any other copies thereof ~ay not be provided or otherwise made available to any other person. No title to and ownership of the software is hereby transferred. The information in this software is subject to change without notice and should not be construed as a commitment by DIGITAL Equipment Corporation. DIGITAL assumes no responsibility for the use or reliability software on equipment which is not supplied by- DIGITAL .

of

its

* * * * * * * * * * * * * * * *

.* * ' ·**************************************************************************** ' 'j++ routine: PLI$DIVPKLONG facility: VAX/VMS PL1 runtime library.

RUN-TIME SOURCE LISTING

Page 3-2

abstract: Runtime routine performs fixed decimal (packed decimal) division. The routine is called when precision and scale requirements for the quotient imply multiple prec1s1on division. The routine is only called when such multiple precision division is required and when the divisor has a precision of 30 or 31 decimal digits. (Call pli$divpkshort if multiple prec1s1on division is required and the divisor has precision less than 30 decimal digits). author: Peter Baum

30-jun-1980

modifications:

documentation file: [pl1.doc.codegen]THEORY.MEM functional description: This routine calculates: z

=

X

let a b c d

I

y

= = = =

scale(z) + scale(y) scale(x) - 31 + prec(x) scale(z) + scale(y) - scale(x) + prec(x) 31 prec(x) 31 - prec(y)

this routine is called if b > 31 and d < 2 Prior to the call: if c not 0 then shift x left by c. Thus x is a 31 digit packed decimal.

-.

input: O(ap) 4(ap) 8(ap) 12(ap) 16(ap) 20(ap) 24(ap)

II of arguments

address of dividend (shifted left by c) address of divisor precision of divisor (high order bytes zeroed) address of quotient precision of quotient (high order bytes zeroed) a as defined above(high order bytes zeroed)

output: quotient returned at address specified by 16(ap)

RUN-TIME SOURCE LISTING

Page 3-3

optimization notes: 1) Optimized for speed, not space. 2) Optimized for y > 0. 3) Assumes speed for register to register operations are the same for byte operations and longword operations. 4) Many packed instruction sequences were timed. Do not change unless actual tests are made to determine relative speed. Tests were made on 11/780 and Comet.

possible optimizations: 1) currently we always calculate the next 15 digits each iteration and then truncate the last iteration as part of the final step. We might be able to go through making calculations with fewer digits on this last pass. 2) the classical trade-off involving flow paths which could be merged (sometimes resulting in things like adding 0) or merged with switches or left alone (resulting in space-speed tradeoff in favor of space). Since we are running on a virtual machine, even the speed factor is not obvious. This came up when remainder was 0, when borrow was not needed, when less than 15 digits where required for the last part of the quotient, and when the new algorithm had found precisely what the next 15 digits were but did not have a value yet for the remainder (the other algorithms always have a remainder when the quotient is known). 3) New algorithm

special case y2=0

4) New algorithm sign of R(H).

optimize branches according to probable

variable use:

variable

size in digits

y1 y2

15

X

31

z z2

py(ap) 31

t1

31

16

use High order digits of divisor. Low order digits of divisor. Initially dividend, thereafter remainders of successive divide operations. Quotient. Temporarily holds trial low order digits of quotient. High order digits of the remainder.

RUN-TIME SOURCE LISTING t2

31

t3

16

t4

31

Page 3-4 · Holds the 15 low order digits 46 digit remainder. 31 digits possible later changes. Holds the low order digits of remainder. Temporary used because packed not overlap their operands.

of the for the instructions can

register usage: register r6 r7 r8 r9 r10 r 11

use a = additional digits of precision required stky(sp) which holds a copy of divisor py(ap) = precision of y r = number of additional digits of the quotient that are to be found for next step z(ap) pz(ap) = precision of quotient

stack offsets for work area $offset 0, ,

' ;x, 31 digits ;y1 15 digits ;y2 16 digits ;z2 31 digits ;t1 31 digits ;t2 31 digits (15 digits used) ;y 31 digits ;t3 16 digits ;t4 31 digits ;sign of quotient, 2 bits ;length of work area

parameter offsets $offset 4, ,

' = dividend by reference ;x ;y = divisor by reference ;prec(y) by value ;z = quotient by reference ;prec(z) by value ;a as defined above

RUN-TIME SOURCE LISTING

Page 3-5

constant data area (plidata.mar not used because they cause longword displacements to be used) share one: .packed +1 zero: . packed +0 ' ;**warning** the following two data definitions must be contiguous nines: . byte 9 ;10**16- 1 (must be followed by 10**15-1) nine15: .packed +999999999999999 ;10**15- 1 bignine:.packed +0000000000000009999999999999999 ;10**16 - 1 ten 15: . packed +1000000000000000 ;10**15 neg9: . packed -9999999999999999 ;-(10**16- 1)

.

;' local symbol definitions

' bytestosign=15

;bytes to sign for fixed decimal 31

run time routine pli$divpklong

;x =

.entry movab movl movab movl movl movl clrb movp bgtr blss

pli$divpklong,M -stklen(sp) ,sp ;make room for temporaries consta(ap) ,r6 ;get value of a stky(sp) ,r7 ;address of copy of divisor py(ap) ,r8 ;precision of divisor z(ap) ,r10 ;save address of quotient pz(ap) ,r11 ;precision of quotient stksign(sp) ;clear sign flag #31,@x(ap),(sp) ;move x, set cond. code 50$ ;branch if x > 0 ;branch if x < 0 40$

cmpp4 beql ashp ret divp ret

;divisor zero? #O,zero,r8,@y(ap) ;branch if divide by 0 30$ #O,#O,zero,#O,r11,(r10) ;z = 0

0

30$:

.

#O,zero,#O,zero,r11,(r10) ;cause divide by zero

' not 0 ;x

' 40$: incb decb

' stksign(sp) ;set low order bit bytestosign(sp) ;x < 0 so make it positive

' ;determine sign of y ;y may be 0 at this point ;optimized for y>O

' 50$: movp bgeq incb

r8,@y(ap) ,(r7) 60$ stksign(sp)

' ;move y into temporary ;branch if so ;set neg indicator

RUN-TIME SOURCE LISTING

, ; get ,

subp6

r8,@y(ap) ,#O,zero,r8,(r7) ;convert to positive

y1 and y2

,

60$:

ashp ashp subp6

.,

Page 3-6

#-16,r8,(r7),#0,#15,stky1(sp) ;high order 15 digits of y #16,#15,stky1(sp),#0,#31,stkt1(sp) ;y with 16 low order zeroed #31,stkt1(sp),r8,(r7),#16,stky2(sp) ;16 low order digits y

;prec(y) is large enough for y1 to possibly not be 0 cmpp4 beql brw

,

#O,zero,#15,stky1(sp) 80$ 200$

;y1:0? ;branch if y1 is zero ;branch if y1 is not zero

y2(16) holds all of y

80$:

95$: 100$: 11 0$:

,

cmpp4 blss divp mulp brb ashp brb mulp subp4 beql

.,;determine

115$:

130$:

140$:

150$:

155$: 160$:

r, the number of the next low order digits to obtain

movl cmpl bgeq movl ashp movp ashp movp divp addp4 subl2 bneq blbs ret

, ;remainder ,

,

#31,(sp),#16,stky2(sp) ;x10**16-1? 310$ ;branch if not #15,nine15,stkz2(sp) ;move in base-1 320$ ;skip ashp, you have 15 digits

' ;calculate y2*z2 t3 = high order 16 digits of y2*z2 t2 = low order 15 digits of y2*z2

' 310$: 320$:

ashp mulp ashp ashp

.

subp6 beql

#-1 ,1131 ,stkt1(sp) ,IIO,tl15,stkz2(sp) ;we only get 15 digits #15,stkz2(sp) ,#16,stky2(sp) ,1131 ,stkt1 (sp) ;t1=y2*z2 #-15,#31 ,stkt1 (sp) ,IIO,tl16,stkt3(sp) ;t3(16)=t1 ;shifted right 15 #15,#16,stkt3(sp),#0,#31,stkt4(sp) ;t4(31) = t3 ; shifted left 15 #31 ,stkt4(sp) ,1131 ,stkt1 (sp) ,#15,stkt2(sp) ;t2(15)=t1-t4 330$ ;branch if no borrow required

';borrow is -1, t2 not 0

;calculate R(H) = t1(31) = 31 high order digits of x(46) - y*z2

RUN-TIME SOURCE LISTING t2(15) = 15

325$:

mulp ashp addp4 subp6 bleq subp6 subp6 movp brb movp brw

Page 3-8

low order digits of x(46) - y*z2

#15,stkz2(sp) ,#15,stky1 (sp) ,#30,stkt4(sp) ;t4(30)=y1*z2 #1,#30,stkt4(sp),#O,tf31,stkt1(sp);31 high order of 46 # 1 6 ' s t kt 3 ( s p) 'If 31 ' s t kt 1 ( s p) ; t 1 ( 31 ) =t 1 ( 31 ) + t 3 ( 1 6) #31 ,stkt1 (sp) ,#31, (sp) ,#31 ,stkt4(sp) ;t4=x-t1 325$ ;branch if R(H) negative (t4 < = 0) #1 ,one ,#31 ,stkt4(sp) ,#31 ,stkt1 (sp) ;t1=t4-1 (borrow 1) #15,stkt2(sp) ,#16,ten15,#15,stkt4(sp) ;t4=10**15-t2 #15,stkt4(sp) ,stkt2(sp);copy back into t2 370$ ; #31 ,stkt4(sp) ,stkt1 (sp) ;make t1 hold high order R(H) 500$ ;R(H) < 0 (can not be zero)

' ;no borrow, t2 = 0 ;calculate R(H) = t1(31) = 31 high order digits of x(46) - y*z2 t2(15) = 0 = 15 low order digits of x(46) - y*z2 ' 330$:

.

mulp ashp addp4 subp6 bneq

#15,stkz2(sp) ,#15,stky1(sp) ,#30,stkt1(sp) ;t1(30)=z2(15)*y1(15) #1,#30,stkt1(sp),#O,tf31,stkt4(sp) ;31 high order 46 #16,stkt3(sp) ,#31 ,stkt4(sp) ;t4=t3(16)+t4 #31 ,stkt4(sp) ,#31, (sp) ,#31 ,stkt1 (sp) ;t1=x-t4 370$ ;branch if remainder not zero

' ·************************************************************************

' ·* ' R(H) = 0 , i.e. ;*

*

;*remainder is zero ;*z2(15) holds last non-zero digits of the quotient ;*quotient has not been shifted yet to recieve z2

.' *

*

* * *

*

;* this is a local routine with no exit other than ret * ·************************************************************************

'

' 340$:

350$: 370$:

ashp movp subl2 ashp addp4 tstl beql ashp blbs movp ret blss

r9,r11,(r10),#0,r11,stkt4(sp) ;t4=z shifted left by r9=r r11,stkt4(sp),(r10) ;copy back quotient #15,r9 ;shift needed to leave r9 digits r9,#15,stkz2(sp) ,#0,#15,stkt1(sp);low order digits of z # 1 5 , s t k t 1 ( s p ) , r 1 1 , ( r 1 0 ) ; z = z +z 2 _. r6 ;a = 0 ? 395$ ;branch if a=O (last iteration) r6,r11,(r10),#0,r11,stkt1(sp);adjust for scale stksign(sp) ,410$ ;branch if quotient < 0 r11,stkt1(sp),(r10) ;back into quotient 500$

;branch if R(H) 0 , ;* t1 > 0

.' *

i.e.

*

*

*

·************************************************************************

'

RUN-TIME SOURCE LISTING ashp movp tstl beql

380$:

Page 3-9

r9,r11,(r10),#0,r11,stkt4(sp) ;t4=z shifted left by r9=r r11,stkt4(sp),(r10) ;copy back quotient r6 ;a = 0 ? 390$ ;branch if a=O (last iteration)

; a not 0

'; a

addp4 ashp addp6 brw

#15,stkz2(sp) ,r11, (r10) ;z=z+z2 #15,#31 ,stkt1 (sp) ,#0,1131 ,stkt4(sp) ;t4=t1 shifted left 15 #31, stkt4 ( sp), 1115, stkt2( sp), 1131, ( sp) ; x=t4+t2 230$

subl2 ashp addp4 blbs ret

1115,r9 ;shift needed to leave r9 digits r9,1115,stkz2(sp) ,IIO,II15,stkt1(sp);low order digits of z 1115,stkt1(sp) ,r11,(r10) ;z=z+z2 stksign(sp) ,400$ ;branch if quotient