2013 IEEE INTERNATIONAL CONFERENCE ON CIRCUITS AND SYSTEMS
Hybrid Logarithmic Number System Arithmetic Unit: A Review
R. C. Ismail, M. K. Zakaria, S. A. Z. Murad
School of Microelectronic Engineering, Universiti Malaysia Perlis, 01000 Kangar, Perlis, Malaysia.
[email protected]

Abstract— Logarithmic number system (LNS) arithmetic has the advantages of high performance and high precision in complex function computation. However, the large hardware requirement of LNS addition/subtraction has made large word-length LNS arithmetic impractical to implement. This paper reviews the concept of merging the LNS and floating-point (FLP) operations into a single arithmetic logic unit (ALU) that can execute addition/subtraction and division/multiplication faster, more precisely and with less complexity. The advantages of the hybrid system are highlighted while FLP and LNS are compared and explained.

Keywords—floating point, arithmetic logic unit, hybrid logarithmic number system

I. INTRODUCTION
In many engineering systems, the floating-point arithmetic unit is a vital component, for example in 3D computer graphics and visual simulation, where the main execution units are the fused multiply-add FLP (FLP MAF) unit and the FLP divide/square root unit [1]. Since their introduction, FLP units have offered sufficient advantages to be significantly developed and widely adopted, and their performance has been continuously improved.

However, compared to fixed-point arithmetic, FLP operations are more complex and require more stages. The increase in integration density has permitted the development of LNS processors as an alternative [2], but in these the main difficulty is implementing the addition and subtraction operations.

Avoiding these disadvantages while keeping the qualities of both FLP and LNS can be achieved through the design of a hybrid unit which combines the attributes of the FLP processor with logarithmic arithmetic. Very interesting and attractive solutions in this direction were proposed by Lai in [3], [4] and [5], where addition and subtraction were performed in FLP and multiplication, division, square root and all the other operations in LNS.

II. FLOATING POINT
In computing, floating point describes a method of representing real numbers in a way that can support a wide range of values. Numbers are, in general, represented approximately to a fixed number of significant digits and scaled using an exponent. When a large dynamic range and high precision are required, a floating-point representation is often adopted.

The main advantages of FLP are its addition and subtraction operations. However, compared to fixed point, floating-point arithmetic operations are slow and complex. A FP number F has the value [14]

F = (-1)^S × 1.f × 2^E    (2.1)

where S is the sign, f is the unsigned fraction, and E is the exponent of the number. The mantissa is made up of the leading "1" and the fraction, where the leading "1" is implied in hardware. This means that for computations that produce a leading "0", the fraction must be shifted. The only exception to the leading one is gradual underflow (denormalized number support in the FP library we use is disabled for these tests [15]). The exponent is usually kept in a biased format, where the stored value of E is

E = E_true + bias    (2.2)

The most common value of the bias is

bias = 2^(e-1) − 1    (2.3)

where e is the number of bits in the exponent. This is done to make comparisons of FP numbers easier. FP numbers are kept in the format shown in Figure 1.

Figure 1: Binary storage format of the FP number.

The IEEE 754 standard defines two formats for FP numbers: single and double precision. For single precision, e is 8 bits, m is 23 bits and S is one bit, for a total of 32 bits. The extreme values of the exponent (0 and 255) are reserved for special cases (see below), so single precision has a range of ±(1.0 × 2^−126) to (1.11... × 2^127), approximately ±1.2 × 10^−38 to 3.4 × 10^38, and a resolution of 10^−7. For double precision, where e is 11 bits and m is 52 bits, the range is ±(1.0 × 2^−1022) to (1.11... × 2^1023), approximately ±2.2 × 10^−308 to 1.8 × 10^308, with a resolution of 10^−15. Finally, there are a few special values that are reserved for exceptions. These are shown in Table I.

Table I. FP exceptions (f is the fraction of M).
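For illustration, the short Python sketch below unpacks a single-precision number into the sign, biased exponent and fraction fields described above; the field widths (1, 8 and 23 bits), the bias of 127 from equation (2.3) and the implied leading one follow the single-precision layout of Figure 1. The function name and example value are illustrative only and are not taken from the reviewed designs.

import struct

def decode_float32(value):
    """Unpack an IEEE 754 single-precision number into its S, E_true and 1.f fields."""
    bits = struct.unpack('>I', struct.pack('>f', value))[0]
    sign = bits >> 31                    # S: 1 bit
    exponent = (bits >> 23) & 0xFF       # E: 8 bits, stored with bias 127
    fraction = bits & 0x7FFFFF           # f: 23 bits, hidden leading 1 not stored
    e_true = exponent - 127              # E_true = E - bias  (eqs. 2.2, 2.3)
    mantissa = 1.0 + fraction / 2**23    # 1.f, assuming a normalized number
    return sign, e_true, mantissa

# Example: -6.5 = (-1)^1 x 1.625 x 2^2
print(decode_float32(-6.5))              # (1, 2, 1.625)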
III. LOGARITHMIC NUMBER SYSTEM
The LNS has been proposed as an alternative to FLP because of its simpler multiplication and division. A major advantage of the LNS is that multiplication and division in the linear domain are simply replaced by addition or subtraction in the log domain. LNS arithmetic also has the advantages of high precision and high performance in complex function computation.

However, this advantage comes at the cost of complicated, inexact addition and subtraction, as well as the possible need to convert between the formats. The large hardware requirement of LNS addition/subtraction has made large word-length LNS arithmetic impractical to implement. Addition and subtraction in LNS arithmetic require the computation of non-linear functions, which is usually performed by table lookup.

A problem in the development of large word-length LNS arithmetic is the exponential increase of this table size. In order to reduce the hardware cost of computing these two functions, many approaches have been proposed, either to reduce the size of the tables [6][13], to compute the functions directly [7], or to avoid the computation altogether [8]. However, we can expect that the hardware cost of these computational methods will still increase dramatically as the word length increases. Another problem of LNS arithmetic is that high precision in LNS subtraction is very difficult to obtain [8].

The LNS represents the value of the real number X using a sign bit Sx (0: positive, 1: negative) and the base-b logarithm, x = log_b |X|, where x is a fixed-point word consisting of a sign bit, K integer bits and F fractional bits. The value of K controls the dynamic range of the representation (the largest and smallest representable magnitudes), while F controls the precision (the quantization of the representable values between two consecutive powers of the base b). The logarithmic numbers are kept in the format shown in Figure 2.

Figure 2: Binary storage format of the LNS number.

The represented number X can be expressed as

X = (-1)^Sx × b^x    (3.1)

With this representation, for real numbers X and Y, the arithmetic operations are computed as:

Multiply:      log_b |X·Y| = x + y    (3.2)
Divide:        log_b |X/Y| = x − y    (3.3)
Square:        log_b (X^2) = 2x       (3.4)
Square root:   log_b √|X| = x/2       (3.5)

This representation allows multiplications and divisions to be performed with just a single fixed-point adder/subtracter [3]. Squares and square roots can be implemented with a one-bit left or right shift, respectively. LNS is less effective for additions and subtractions. To perform these operations, Leonelli's algorithm [4] is used. The functions sb(z) and db(z) are defined as:

s_b(z) = log_b (1 + b^z)    (3.6)
d_b(z) = log_b |1 − b^z|    (3.7)

Depending on the signs Sx and Sy, and substituting with equations (3.6) and (3.7), the addition of X and Y can be computed using

log_b |X + Y| = x + s_b(y − x)    (3.8)

for Sx = Sy, or

log_b |X + Y| = x + d_b(y − x)    (3.9)

for Sx ≠ Sy.
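As a software illustration of equations (3.2)-(3.9) (not a hardware description), the Python sketch below operates on base-2 logarithms of the operand magnitudes. Here sb and db are evaluated directly with math.log2; an actual LNS unit would obtain them from the look-up tables discussed below, and operand signs and zero are ignored for brevity.

import math

def sb(z):
    """s_b(z) = log2(1 + 2^z), used for effective addition (eq. 3.6)."""
    return math.log2(1.0 + 2.0 ** z)

def db(z):
    """d_b(z) = log2|1 - 2^z|, used for effective subtraction (eq. 3.7)."""
    return math.log2(abs(1.0 - 2.0 ** z))

# Operands X = 12.0 and Y = 3.0 held as base-2 logarithms x, y.
x, y = math.log2(12.0), math.log2(3.0)

mul  = x + y          # log2(X*Y)      (eq. 3.2)
div  = x - y          # log2(X/Y)      (eq. 3.3)
sqr  = 2 * x          # log2(X^2)      (eq. 3.4) -- a 1-bit left shift in hardware
sqrt = x / 2          # log2(sqrt(X))  (eq. 3.5) -- a 1-bit right shift in hardware
add  = x + sb(y - x)  # log2(X+Y)      (eq. 3.8)
sub  = x + db(y - x)  # log2(X-Y)      (eq. 3.9)

print([2.0 ** v for v in (mul, div, sqr, sqrt, add, sub)])
# approximately [36.0, 4.0, 144.0, 3.46, 15.0, 9.0]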
Subtraction of X and Y is computed using equation (3.8) for Sx ≠ Sy and equation (3.9) for Sx = Sy. The functions sb(z) and db(z) are usually implemented using a look-up table (LUT), analogous to the log books formerly used in manual calculations, where the accuracy of the approximation is a function of the LUT address space. A number of algorithms have been developed to minimize the cost of this LUT, which becomes prohibitively large when more than 16 bits of accuracy are required [6]. A further problem, often overlooked in descriptions of LNS processors, is the cost of converting numbers to and from the log domain. Although several algorithms have been proposed, they represent another limitation on the achievable accuracy and performance of LNS systems, often requiring large LUTs [7] or time-consuming iterative algorithms [8].

As a result of these issues, two distinct types of architecture are commonly used when implementing LNS processors designed to perform numerical calculations with floating-point accuracy [9]. The first performs all mathematical operations in the log domain and uses a LUT for addition and subtraction [3], [4], while the second type, called the Hybrid-LNS processor in this paper, performs multiplication and division in the log domain and addition and subtraction in the linear domain. Although the Hybrid-LNS processor does not need a LUT for addition and subtraction, it does need to convert frequently between the linear and log domains. Both architectures are shown in Figure 3, where they are being used to calculate an inner product.

Figure 3: (a) Hybrid-LNS and (b) LNS architecture.
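To make the difference between the two architectures of Figure 3 concrete, the following Python sketch computes the same inner product both ways: the LNS path keeps the accumulation in the log domain through repeated use of sb, while the Hybrid-LNS path multiplies in the log domain and converts each product back to the linear domain for a conventional accumulation. The conversions here are the exact log2 and power functions rather than the LUT- or interpolation-based converters a real processor would use, so this only illustrates the dataflow, not the cost or accuracy trade-off.

import math

def sb(z):                       # s_b(z) = log2(1 + 2^z)
    return math.log2(1.0 + 2.0 ** z)

def lns_add(x, y):               # log-domain addition of two positive values
    lo, hi = sorted((x, y))
    return hi + sb(lo - hi)

a = [1.5, 2.0, 4.0]
b = [3.0, 0.5, 2.0]
la = [math.log2(v) for v in a]   # linear-to-log conversion of the inputs
lb = [math.log2(v) for v in b]

# Pure LNS: multiply (add) and accumulate (sb-based) entirely in the log domain.
acc = la[0] + lb[0]
for x, y in zip(la[1:], lb[1:]):
    acc = lns_add(acc, x + y)
pure_lns = 2.0 ** acc

# Hybrid-LNS: multiply in the log domain, convert, then accumulate linearly.
hybrid = sum(2.0 ** (x + y) for x, y in zip(la, lb))

print(pure_lns, hybrid)          # both print 13.5 (up to rounding)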
IV. HYBRID SYSTEM
A combination of two different data formats, taking elements from both the LNS and FLP systems, has been exploited in a new class of processors known as hybrid number system processors. These allow multiply and divide operations to be rapidly computed in the LNS format, whilst addition and subtraction are processed efficiently in FLP representation. The first hybrid processor design was presented by Taylor [10], named the (FU)², which offered a 12-bit FLP data path and whose overall performance compared favorably with that of a conventional FLP system.

With an extension to 32-bit operands, Lai and Wu [11] proposed a hybrid system architecture that executed multiplication, division, square root and square quickly using LNS, while the FLP number system was applied to the input, output, addition and subtraction functions. Because of the time-consuming overhead of the FLP-to-LNS and LNS-to-FLP conversions, look-up tables and linear interpolation algorithms were introduced, whereupon the performance of this processor compared favorably with a 32-bit FLP DSP device.

Since the main obstacle in this hybrid processor was the overhead of converting between number systems, Stouraitis [12] proposed a hybrid technique using a combination of signed-digit (SD) number representation and LNS, called an SD/LNS arithmetic unit. Addition/subtraction was now accomplished even faster than in the classical LNS processor, because the SD adder/subtractor is largely free from serial carry propagation. Figure 4 shows the principal concepts of the hybrid number system processor.
Figure 4: Concept of the hybrid number system processor.

Lee [16] demonstrated that Hybrid-LNS arithmetic can be used to perform the Discrete Cosine Transform. The algorithm performs robustly over a wide variation in the precision of the linear-to-log and log-to-linear conversions. The reported results could be improved further by applying Lin2Log and Log2Lin conversion algorithms with a mean error of zero. The cost of the LUT component in such transforms is just 2K bits for an 8-bit LUT address (256 entries of 8 bits) and reduces to just 128 bits for a 4-bit LUT address. The former is still within the limits of the Block RAM macros available on most modern FPGA architectures, while the latter can be accommodated using a modicum of distributed RAM resources. This makes the proposed architecture particularly suitable for implementation on mid-range FPGAs, where the conversion LUTs can be implemented using the limited memory available on-chip.

The first advantage of this hybrid ALU compared to conventional FLP and LNS units is that the FLP and LNS number representations can be designed in a uniform and compatible manner. Furthermore, the hardware for performing the FLP-to-LNS and LNS-to-FLP conversions is embedded within the hybrid ALU, so no extra software or hardware effort is needed for the two conversions. Secondly, the hybrid ALU is functionally versatile: it can perform FLP addition/subtraction, LNS division/multiplication, and FLP-to-LNS and LNS-to-FLP conversions. Thirdly, and most importantly, this approach allows the operation units to be shared. This advantage can effectively solve the large hardware problem of LNS arithmetic and results in a cost-effective hybrid FLP/LNS ALU.
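A behavioural Python sketch of such a hybrid ALU is given below: add/subtract are executed directly in the FLP domain, multiply/divide are dispatched to the LNS datapath, and the FLP-to-LNS and LNS-to-FLP conversions are embedded in the unit. The function names are illustrative, the conversions are idealized (exact log2 rather than the shared LUT/interpolation hardware described above), and zero operands are not handled.

import math

def flp_to_lns(f):
    """Idealized FLP-to-LNS conversion: sign bit plus base-2 log of the magnitude."""
    return (0 if f >= 0 else 1), math.log2(abs(f))

def lns_to_flp(sign, x):
    """Idealized LNS-to-FLP conversion."""
    return (-1.0 if sign else 1.0) * (2.0 ** x)

def hybrid_alu(op, a, b):
    """Add/sub stay in the FLP datapath; mul/div run in the LNS datapath."""
    if op == 'add':
        return a + b                          # FLP adder
    if op == 'sub':
        return a - b                          # FLP subtracter
    sa, xa = flp_to_lns(a)                    # conversions embedded in the ALU
    sb, xb = flp_to_lns(b)
    x = xa + xb if op == 'mul' else xa - xb   # one fixed-point add/subtract
    return lns_to_flp(sa ^ sb, x)

print(hybrid_alu('mul', -3.0, 4.0))           # -12.0
print(hybrid_alu('add', -3.0, 4.0))           # 1.0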
V. CONCLUSION
This paper has reviewed a versatile and cost-effective hybrid FLP/LNS arithmetic processor. It is versatile because it can execute the FLP-to-LNS and LNS-to-FLP conversions and choose where each operation is performed, with addition/subtraction done in FLP and division/multiplication in LNS, in a single datapath with a uniform data representation format. Combining the two arithmetics in a single hybrid system draws on the advantages of both and results in a higher-performance ALU. It is also a cost-effective solution because it allows the FLP hardware to be shared by the LNS computation. It is planned to build an optimized implementation of the algorithm in programmable hardware. It is concluded that the practical design of very large word-length LNS arithmetic processors is possible by using the hybrid FLP/LNS approach.

REFERENCES
[1] K.-I. Kum, J. Kang, and W. Sung, "AutoScaler for C: An Optimizing Floating-Point to Integer C Program Converter for Fixed-Point Digital Signal Processing," IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, vol. 47, no. 9, pp. 840-848, Sept. 2000.
[2] D. M. Lewis, "114 MFLOPS LNS Arithmetic Unit for DSP Applications," IEEE Journal of Solid-State Circuits, vol. 30, no. 12, pp. 1547-1553, Dec. 1995.
[3] F. Lai, "A 10-ns Hybrid Number System Data Execution Unit for Digital Signal Processing Systems," IEEE Journal of Solid-State Circuits, vol. 26, no. 4, pp. 590-599, Apr. 1991.
[4] F. Lai and C. F. E. Wu, "A Hybrid Number System Processor with Geometric and Complex Arithmetic Capabilities," IEEE Transactions on Computers, vol. 40, no. 8, pp. 952-961, Aug. 1991.
[5] F. Lai, "The Efficient Implementation and Analysis of a Hybrid Number System Processor," IEEE Transactions on Circuits and Systems, vol. 46, no. 6, pp. 382-392, June 1993.
[6] M. L. Frey and F. J. Taylor, "A Table Reduction Technique for Logarithmically Architected Digital Filters," IEEE Transactions on Acoustics, Speech, and Signal Processing, vol. ASSP-33, no. 3, pp. 718-719, June 1985.
[7] J. N. Coleman, E. I. Chester, C. I. Softley, and J. Kadlec, "Arithmetic on the European Logarithmic Microprocessor," IEEE Transactions on Computers, vol. 49, no. 7, pp. 702-715, July 2000.
[8] M. G. Arnold, T. A. Bailey, J. R. Cowles, and J. J. Cupal, "Redundant Logarithmic Arithmetic," IEEE Transactions on Computers, vol. 39, no. 8, pp. 1077-1086, Aug. 1990.
[9] C. Chen, R.-L. Chen, and C.-H. Yang, "Pipelined Computation of Very Large Word-Length LNS Addition/Subtraction with Polynomial Hardware Cost," IEEE Transactions on Computers, vol. 49, no. 7, pp. 716-726, July 2000.
[10] F. Taylor, "A Hybrid Floating-Point Logarithmic Number System Processor," IEEE Transactions on Circuits and Systems, vol. 32, pp. 92-95, 1985.
[11] F. S. Lai and C. F. E. Wu, "A Hybrid Number System Processor with Geometric and Complex Arithmetic Capabilities," IEEE Transactions on Computers, vol. 40, pp. 952-962, 1991.
[12] T. Stouraitis, "A Hybrid Floating-Point/Logarithmic Number System Digital Signal Processor," in Proc. IEEE International Conference on Acoustics, Speech, and Signal Processing, pp. 1079-1082, 1989.
[13] R. C. Ismail and J. N. Coleman, "ROM-less LNS," in Proc. 20th IEEE Symposium on Computer Arithmetic (ARITH), pp. 43-51, 2011.
[14] I. Koren, Computer Arithmetic Algorithms, 2nd ed., A. K. Peters, Natick, MA, 2002.
[15] K. Underwood, "FPGAs vs. CPUs: Trends in Peak Floating-Point Performance," in Proc. ACM/SIGDA International Symposium on Field-Programmable Gate Arrays (FPGA '04), 2004.
[16] P. Lee, "An Evaluation of a Hybrid-Logarithmic Number System DCT/IDCT Algorithm," in Proc. IEEE International Symposium on Circuits and Systems, vol. 5, pp. 4863-4866, May 2005.