... digits. The implementation of PL/I for the VAX-11 uses this data type to ... Page 1-2 is the number of decimal digits in the corresponding packed decimal string.
CHAPTER 1 BACKGROUND
1.1
INTRODUCTION
This document describes a multiple-precision division algorithm which uses single-precision division operations on integers. The algorithm is especially useful when the single-precision division and single-precision multiplication operations are performed on relatively large numbers such as 2**16 or larger. The algorithm has been successfully implemented as a run-time routine that supports fixed-point decimal division when multiple-precision division is required for the PL/I language on Digital's VAX-11 processors. This paper begins by describing the basic implementation of PL/I's fixed decimal numbers on VAX machines. Next it describes scaled division and clarifies the circumstances under which multiple-precision division is required. Chapter 2 describes the basic algorithm and concludes with a discussion of some implementation considerations. Chapter 3 contains the actual run-time routine used in the implementation of VAX-11 PL/I. The appendix contain the proofs of theorems used to support the validity of the algorithm.
1.2
PACKED DECIMAL STRINGS AND FIXED DECIMAL NUMBERS
The VAX-11 architecture defines a packed decimal string data type that conceptually consists of a sign and up to 31 decimal digits. All operations on packed decimal strings allow packed decimal operands with up to 31 digits. The result of each of these operations is an "operand" of the instruction and may also have a value of up to 31 decimal digits. The implementation of PL/I for the VAX-11 uses this data type to represent the PL/I fixed-point decimal data type. VAX-11 packed decimal string instructions require that the number of decimal digits for each operand be specified. In PL/I, a fixed-point decimal number is usually implemented in such a way that its precision
BACKGROUND
Page 1-2
is the number of decimal digits in the corresponding packed decimal string. Thus, the hardware uses this precision in all operations on the corresponding packed decimal string and can use the precision's value to trap overflow conditions. A PLII fixed-point decimal number has a prec1s1on and a scale, both of which are fixed throughout the life of the variable. We will symbolically represent the value of the prec1s1on of variable Z by prec(Z) and the scale of Z by scale(Z). We note that the ANSI subset G definition of the PLII language requires that scale(Z) be no larger than prec(Z) and that the scale always be nonnegative. The scale of a fixed decimal number is the number of low-order decimal digits of the packed decimal string to the right of the decimal point. The scale of a fixed decimal number has no analogue on the hardware instruction level. The value must be retained by the compiler and used to create ancillary instructions whenever the packed decimal string is to be interpreted as a fixed decimal number. As an example, suppose we declare X and Y to be and assign them values:
fixed
decimal
numbers
declare X fixed decimal(3,0); I* precision = 3, scale = 0 *I declare Y fixed decimal ( 3, 1); I* precision = 3, scale = 1 *I X=2; Y=1.5; The internal representation for X is the packed decimal string +002 and the internal representation for Y is the packed decimal string +015 In order to perform the operations implied by y
represents greater than
11.
>= represents greater than or equal
12.
= Y then we perform the division using single precision division, obtaining at least one nonzero digit of the quotient. The Euclidean division algorithm guarantees that the remainder is less than the divisor so we proceed with the remainder as the new dividend. Thereafter, successive iterations will always satisfy the requirement that the dividend be less than the divisor.
Assumption 2: 0 < X < Y Discussion: We make this true by using Assumption 1 and specifically testing for the case where X = 0 and for the cases where X and Y have negative signs. We thus perform the division on abs(X) and abs(Y) and if necessary, adjust the sign of the quotient after the division has taken place.
Assumption 3: Let y1 be the high order part of Y and y2 be the low order part. Then: Y
= B*y1
+ y2
and we assume: 0 < y1. Discussion: If y1 is 0, successive divisions by y2. of the quotient.
then we can calculate the quotient by Each such division produces log10(B) digits
FINDING Q: THE ALGORITHM 2.2
Page 2-3
INTUITIVE DISCUSSION
preci ~e
Before giving a description of the algorithm, we present a brief intuitive discussion of the basic idea behind the algorithm. Since B*(y1+1) then L Let
=
>Y
[X/(y1+1)] is the smallest possible value for Q.
R(L) = B*X - Y*L
(the highest possible remainder)
A consideration of the division algorithm might lead us to expect that Q
=L
+ R(L)/Y
and since Q is an integer, we must actually be calculating Q
=L
+ [R(L)/Y]
Such a means for determining Q is useful only if R(L) < B*B since otherwise we would not be avoiding the use of multiple-precision division. In fact, the following example shows that this is not always the case: 100 1600 y = 1704 The remainder here is: R(L) B =
X
=
=
10048 which is greater than B*B.
Fortunately, step 4 of the algorithm as this doesn't happen.
described
below
assures
that
FINDING Q: THE ALGORITHM 2.3
Page 2-4
BASIC ALGORITHM
The basic algorithm for multiple-precision division consists of the following key steps. Each time these steps are repeated, we calculate log10(B) digits of the quotient.
1.
Calculate
H
=
[X/y1]
No value for Q can be higher than H. 2.
Calculate R(H)
= B*X
- Y*min(B-1 ,H)
Min(B-1,H) is the largest possible value for Q. R(H) is the smallest possible remainder.
3.
If R(H)
>= 0 then we are done
If the remainder is value for Q. 4.
>=
0,
then
min(B-1,H) is the proper
If -Y = 0 t1(31) = 31 high order digits of x(46) y*z2 t2(15) = 15 low order digits of x(46) - y*z2 t4 = 10**15-t2 t2(15) = t4(15) t4(30) = y1*z2 t1 = t4 shifted right 1 t1(31) = t1(31)+t3(16) t4 = x-t1 t 1 = t4-1 (borrow 1) Goto 700$
3 4.
630$: ;no borrow, t2 = 0 ;calculate R(L) = t1(31) = x- y1*z2 t1(30) = y1*z2 t4 = t1 shifted right 1 t4(31) = t4(31)+t3(16) t1 = x-t4
FINDING Q: THE ALGORITHM
Page 2-12
·*******************************************************
.'' * ;* .' *
calculate Z2
=L
+ R(L)/Y
* * *
·******************************************************* '
35.
2.5
700$: t4 = t1 = z2 = Goto
t4(31) t4+t2 t4/y z2+t1 320$
= t1(31)
shifted right 15
IMPLEMENTATION CONSIDERATIONS
This section describes further details of interest to anyone implementing the previously described division algorithm.
actually
We saw in the preceding section that we have two basic algorithms, one called Long, the other called Short. Short can perform the division whenever the precision of the divisor is less than 31. Long can perform the division whenever the precision of the divisor is greater than 16 and the high order 15 digits are not all zeros. Actually, we could have considered the number of significant digits for each specific divisor but it was felt that such an approach was not likely to be worthwhile. Instead, the choice of the routine to be called is based on the precision of the divisor; this information is known at compile time since precision and scale are fixed for each variable. In practice, the routine Long is modified to contain code similiar to Short, to take care of the special case where the high order 15 digits of the divisor are all zeros. The question then arises, "precisely when should the Long routine be called instead of the Short routine?" Tests that were run (tstspd.mar) on a VAX-11/750 determined that the Long routine was faster than the Short routine when the precision of the divisor was 30 decimal digits. For these tests, the actual value of the divisor was uniformly distributed across all possible values for the divisor. Thus the Long run-time routine (called pli$div pk long) is calied by the compiler whenever the precision of the divisor is 30 or 31 and multiple-precision arithmetic is required.
CHAPTER 3 RUN-TIME SOURCE LISTING
The following is a source listing of an implementation of this algorithm as part of a pl/i run-time routine to perform 31 digit scaled division:
.title .ident
pli$divpklong /v001/
' ·**************************************************************************** .' * *
.' * .'' * .' * .' * .* .' * .' * .' * .' * .'' * .* .'' * ;* .' * .' * .' *
Copyright (c) 1980 by DIGITAL Equipment Corporation, Maynard, Mass.
This software is furnished under a license and may be used and copied only in accordance with the terms of such license and with the inclusion of the above copyright notice. This software or any other copies thereof ~ay not be provided or otherwise made available to any other person. No title to and ownership of the software is hereby transferred. The information in this software is subject to change without notice and should not be construed as a commitment by DIGITAL Equipment Corporation. DIGITAL assumes no responsibility for the use or reliability software on equipment which is not supplied by- DIGITAL .
of
its
* * * * * * * * * * * * * * * *
.* * ' ·**************************************************************************** ' 'j++ routine: PLI$DIVPKLONG facility: VAX/VMS PL1 runtime library.
RUN-TIME SOURCE LISTING
Page 3-2
abstract: Runtime routine performs fixed decimal (packed decimal) division. The routine is called when precision and scale requirements for the quotient imply multiple prec1s1on division. The routine is only called when such multiple precision division is required and when the divisor has a precision of 30 or 31 decimal digits. (Call pli$divpkshort if multiple prec1s1on division is required and the divisor has precision less than 30 decimal digits). author: Peter Baum
30-jun-1980
modifications:
documentation file: [pl1.doc.codegen]THEORY.MEM functional description: This routine calculates: z
=
X
let a b c d
I
y
= = = =
scale(z) + scale(y) scale(x) - 31 + prec(x) scale(z) + scale(y) - scale(x) + prec(x) 31 prec(x) 31 - prec(y)
this routine is called if b > 31 and d < 2 Prior to the call: if c not 0 then shift x left by c. Thus x is a 31 digit packed decimal.
-.
input: O(ap) 4(ap) 8(ap) 12(ap) 16(ap) 20(ap) 24(ap)
II of arguments
address of dividend (shifted left by c) address of divisor precision of divisor (high order bytes zeroed) address of quotient precision of quotient (high order bytes zeroed) a as defined above(high order bytes zeroed)
output: quotient returned at address specified by 16(ap)
RUN-TIME SOURCE LISTING
Page 3-3
optimization notes: 1) Optimized for speed, not space. 2) Optimized for y > 0. 3) Assumes speed for register to register operations are the same for byte operations and longword operations. 4) Many packed instruction sequences were timed. Do not change unless actual tests are made to determine relative speed. Tests were made on 11/780 and Comet.
possible optimizations: 1) currently we always calculate the next 15 digits each iteration and then truncate the last iteration as part of the final step. We might be able to go through making calculations with fewer digits on this last pass. 2) the classical trade-off involving flow paths which could be merged (sometimes resulting in things like adding 0) or merged with switches or left alone (resulting in space-speed tradeoff in favor of space). Since we are running on a virtual machine, even the speed factor is not obvious. This came up when remainder was 0, when borrow was not needed, when less than 15 digits where required for the last part of the quotient, and when the new algorithm had found precisely what the next 15 digits were but did not have a value yet for the remainder (the other algorithms always have a remainder when the quotient is known). 3) New algorithm
special case y2=0
4) New algorithm sign of R(H).
optimize branches according to probable
variable use:
variable
size in digits
y1 y2
15
X
31
z z2
py(ap) 31
t1
31
16
use High order digits of divisor. Low order digits of divisor. Initially dividend, thereafter remainders of successive divide operations. Quotient. Temporarily holds trial low order digits of quotient. High order digits of the remainder.
RUN-TIME SOURCE LISTING t2
31
t3
16
t4
31
Page 3-4 · Holds the 15 low order digits 46 digit remainder. 31 digits possible later changes. Holds the low order digits of remainder. Temporary used because packed not overlap their operands.
of the for the instructions can
register usage: register r6 r7 r8 r9 r10 r 11
use a = additional digits of precision required stky(sp) which holds a copy of divisor py(ap) = precision of y r = number of additional digits of the quotient that are to be found for next step z(ap) pz(ap) = precision of quotient
stack offsets for work area $offset 0, ,
' ;x, 31 digits ;y1 15 digits ;y2 16 digits ;z2 31 digits ;t1 31 digits ;t2 31 digits (15 digits used) ;y 31 digits ;t3 16 digits ;t4 31 digits ;sign of quotient, 2 bits ;length of work area
parameter offsets $offset 4, ,
' = dividend by reference ;x ;y = divisor by reference ;prec(y) by value ;z = quotient by reference ;prec(z) by value ;a as defined above
RUN-TIME SOURCE LISTING
Page 3-5
constant data area (plidata.mar not used because they cause longword displacements to be used) share one: .packed +1 zero: . packed +0 ' ;**warning** the following two data definitions must be contiguous nines: . byte 9 ;10**16- 1 (must be followed by 10**15-1) nine15: .packed +999999999999999 ;10**15- 1 bignine:.packed +0000000000000009999999999999999 ;10**16 - 1 ten 15: . packed +1000000000000000 ;10**15 neg9: . packed -9999999999999999 ;-(10**16- 1)
.
;' local symbol definitions
' bytestosign=15
;bytes to sign for fixed decimal 31
run time routine pli$divpklong
;x =
.entry movab movl movab movl movl movl clrb movp bgtr blss
pli$divpklong,M -stklen(sp) ,sp ;make room for temporaries consta(ap) ,r6 ;get value of a stky(sp) ,r7 ;address of copy of divisor py(ap) ,r8 ;precision of divisor z(ap) ,r10 ;save address of quotient pz(ap) ,r11 ;precision of quotient stksign(sp) ;clear sign flag #31,@x(ap),(sp) ;move x, set cond. code 50$ ;branch if x > 0 ;branch if x < 0 40$
cmpp4 beql ashp ret divp ret
;divisor zero? #O,zero,r8,@y(ap) ;branch if divide by 0 30$ #O,#O,zero,#O,r11,(r10) ;z = 0
0
30$:
.
#O,zero,#O,zero,r11,(r10) ;cause divide by zero
' not 0 ;x
' 40$: incb decb
' stksign(sp) ;set low order bit bytestosign(sp) ;x < 0 so make it positive
' ;determine sign of y ;y may be 0 at this point ;optimized for y>O
' 50$: movp bgeq incb
r8,@y(ap) ,(r7) 60$ stksign(sp)
' ;move y into temporary ;branch if so ;set neg indicator
RUN-TIME SOURCE LISTING
, ; get ,
subp6
r8,@y(ap) ,#O,zero,r8,(r7) ;convert to positive
y1 and y2
,
60$:
ashp ashp subp6
.,
Page 3-6
#-16,r8,(r7),#0,#15,stky1(sp) ;high order 15 digits of y #16,#15,stky1(sp),#0,#31,stkt1(sp) ;y with 16 low order zeroed #31,stkt1(sp),r8,(r7),#16,stky2(sp) ;16 low order digits y
;prec(y) is large enough for y1 to possibly not be 0 cmpp4 beql brw
,
#O,zero,#15,stky1(sp) 80$ 200$
;y1:0? ;branch if y1 is zero ;branch if y1 is not zero
y2(16) holds all of y
80$:
95$: 100$: 11 0$:
,
cmpp4 blss divp mulp brb ashp brb mulp subp4 beql
.,;determine
115$:
130$:
140$:
150$:
155$: 160$:
r, the number of the next low order digits to obtain
movl cmpl bgeq movl ashp movp ashp movp divp addp4 subl2 bneq blbs ret
, ;remainder ,
,
#31,(sp),#16,stky2(sp) ;x10**16-1? 310$ ;branch if not #15,nine15,stkz2(sp) ;move in base-1 320$ ;skip ashp, you have 15 digits
' ;calculate y2*z2 t3 = high order 16 digits of y2*z2 t2 = low order 15 digits of y2*z2
' 310$: 320$:
ashp mulp ashp ashp
.
subp6 beql
#-1 ,1131 ,stkt1(sp) ,IIO,tl15,stkz2(sp) ;we only get 15 digits #15,stkz2(sp) ,#16,stky2(sp) ,1131 ,stkt1 (sp) ;t1=y2*z2 #-15,#31 ,stkt1 (sp) ,IIO,tl16,stkt3(sp) ;t3(16)=t1 ;shifted right 15 #15,#16,stkt3(sp),#0,#31,stkt4(sp) ;t4(31) = t3 ; shifted left 15 #31 ,stkt4(sp) ,1131 ,stkt1 (sp) ,#15,stkt2(sp) ;t2(15)=t1-t4 330$ ;branch if no borrow required
';borrow is -1, t2 not 0
;calculate R(H) = t1(31) = 31 high order digits of x(46) - y*z2
RUN-TIME SOURCE LISTING t2(15) = 15
325$:
mulp ashp addp4 subp6 bleq subp6 subp6 movp brb movp brw
Page 3-8
low order digits of x(46) - y*z2
#15,stkz2(sp) ,#15,stky1 (sp) ,#30,stkt4(sp) ;t4(30)=y1*z2 #1,#30,stkt4(sp),#O,tf31,stkt1(sp);31 high order of 46 # 1 6 ' s t kt 3 ( s p) 'If 31 ' s t kt 1 ( s p) ; t 1 ( 31 ) =t 1 ( 31 ) + t 3 ( 1 6) #31 ,stkt1 (sp) ,#31, (sp) ,#31 ,stkt4(sp) ;t4=x-t1 325$ ;branch if R(H) negative (t4 < = 0) #1 ,one ,#31 ,stkt4(sp) ,#31 ,stkt1 (sp) ;t1=t4-1 (borrow 1) #15,stkt2(sp) ,#16,ten15,#15,stkt4(sp) ;t4=10**15-t2 #15,stkt4(sp) ,stkt2(sp);copy back into t2 370$ ; #31 ,stkt4(sp) ,stkt1 (sp) ;make t1 hold high order R(H) 500$ ;R(H) < 0 (can not be zero)
' ;no borrow, t2 = 0 ;calculate R(H) = t1(31) = 31 high order digits of x(46) - y*z2 t2(15) = 0 = 15 low order digits of x(46) - y*z2 ' 330$:
.
mulp ashp addp4 subp6 bneq
#15,stkz2(sp) ,#15,stky1(sp) ,#30,stkt1(sp) ;t1(30)=z2(15)*y1(15) #1,#30,stkt1(sp),#O,tf31,stkt4(sp) ;31 high order 46 #16,stkt3(sp) ,#31 ,stkt4(sp) ;t4=t3(16)+t4 #31 ,stkt4(sp) ,#31, (sp) ,#31 ,stkt1 (sp) ;t1=x-t4 370$ ;branch if remainder not zero
' ·************************************************************************
' ·* ' R(H) = 0 , i.e. ;*
*
;*remainder is zero ;*z2(15) holds last non-zero digits of the quotient ;*quotient has not been shifted yet to recieve z2
.' *
*
* * *
*
;* this is a local routine with no exit other than ret * ·************************************************************************
'
' 340$:
350$: 370$:
ashp movp subl2 ashp addp4 tstl beql ashp blbs movp ret blss
r9,r11,(r10),#0,r11,stkt4(sp) ;t4=z shifted left by r9=r r11,stkt4(sp),(r10) ;copy back quotient #15,r9 ;shift needed to leave r9 digits r9,#15,stkz2(sp) ,#0,#15,stkt1(sp);low order digits of z # 1 5 , s t k t 1 ( s p ) , r 1 1 , ( r 1 0 ) ; z = z +z 2 _. r6 ;a = 0 ? 395$ ;branch if a=O (last iteration) r6,r11,(r10),#0,r11,stkt1(sp);adjust for scale stksign(sp) ,410$ ;branch if quotient < 0 r11,stkt1(sp),(r10) ;back into quotient 500$
;branch if R(H) 0 , ;* t1 > 0
.' *
i.e.
*
*
*
·************************************************************************
'
RUN-TIME SOURCE LISTING ashp movp tstl beql
380$:
Page 3-9
r9,r11,(r10),#0,r11,stkt4(sp) ;t4=z shifted left by r9=r r11,stkt4(sp),(r10) ;copy back quotient r6 ;a = 0 ? 390$ ;branch if a=O (last iteration)
; a not 0
'; a
addp4 ashp addp6 brw
#15,stkz2(sp) ,r11, (r10) ;z=z+z2 #15,#31 ,stkt1 (sp) ,#0,1131 ,stkt4(sp) ;t4=t1 shifted left 15 #31, stkt4 ( sp), 1115, stkt2( sp), 1131, ( sp) ; x=t4+t2 230$
subl2 ashp addp4 blbs ret
1115,r9 ;shift needed to leave r9 digits r9,1115,stkz2(sp) ,IIO,II15,stkt1(sp);low order digits of z 1115,stkt1(sp) ,r11,(r10) ;z=z+z2 stksign(sp) ,400$ ;branch if quotient