Redundant logarithmic arithmetic for MPEG decoding

Mark G. Arnold*, Lehigh University, 19 Memorial Drive West, Bethlehem, PA 18015 USA

ABSTRACT

Previous research shows the Signed Logarithmic Number System (SLNS) offers lower power consumption than the fixed-point number system for MPEG decoding. SLNS represents a value with the logarithm of its absolute value and a sign bit. Subtraction is harder in SLNS than the other basic operations. This paper examines a variant, Dual-Redundant LNS (DRLNS), in which addition and subtraction are equally easy, but DRLNS-by-DRLNS multiplication is not. DRLNS represents a value as the difference of two terms, both of which are represented logarithmically. DRLNS is appropriate for the Inverse Discrete Cosine Transform (IDCT) used in MPEG decoding because a novel accumulator register can hold the sum in DRLNS while the products are fed to this accumulator in non-redundant SLNS format. Since DRLNS doubles the word size, the accumulator needs to be converted back into SLNS. This paper considers two such methods. One computes the difference of the two parts using LNS. The other converts the two parts separately to fixed point and then computes the logarithm of their difference. A novel factoring of a common term out of the two parts reduces the bus widths. Mitchell's low-cost logarithm/antilogarithm approximation is shown to produce acceptable visual results in this conversion.

Keywords: Logarithmic Number System (LNS), redundant representation, MPEG, low-power Arithmetic Logic Unit (ALU), multimedia hardware, Inverse Discrete Cosine Transform (IDCT), Dual Redundant Logarithmic Number System (DRLNS), logarithmic arithmetic

1. INTRODUCTION

Conventional Arithmetic Logic Units (ALUs), such as floating-point (FP) and fixed-point (FX), are costly for portable multimedia devices, such as MPEG players.5,11,21 A Logarithmic Number System (LNS) uses logarithms to represent real values, which offers significant advantages for operations like multiplication. The most common approach is the Signed Logarithmic Number System (SLNS), which uses the logarithm of the absolute value together with a sign bit to indicate whether that real value is positive or negative. The conventional SLNS approach has been used to reduce power consumption of MPEG decoding.2,3 This paper proposes an even more unusual LNS, Dual Redundant LNS (DRLNS), to further improve the performance of an application-specific ALU for MPEG decoding.

DRLNS is appropriate for MPEG decoding because, like many multimedia applications, its computational kernel is a sum of products. In the case of MPEG decoding,11 the sum of products computes the Inverse Discrete Cosine Transform10 (IDCT). The IDCT takes an 8 × 8 matrix, F(u,v), which was encoded from original image data, f(x,y). A quantized IDCT reproduces an approximation, fapprox(x,y) ≈ f(x,y). The smaller the word size, the lower the power consumption,3 but the greater the errors. DRLNS works for such applications because a novel accumulator register can contain the sum in DRLNS format, whilst the products being summed can be fed to the accumulator in non-redundant SLNS format. Although the DRLNS representation is almost twice the size of the SLNS representation, the DRLNS switching activity is nearly the same as SLNS and, most importantly, the difficult operation of computing LNS differences (referred to here as db( )) is avoided in the inner loop. Instead, DRLNS allows all additions and subtractions to be reduced to LNS sums (referred to here as sb( )).

One of the advantages of conventional SLNS is that switching activity is reduced not only in the ALU, but also on the memory bus. In a typical multimedia system, the memory bus may offer more opportunity for power savings than the ALU. Since DRLNS doubles the word size, the value in the accumulator needs to be converted back into SLNS after the summation but prior to storage into memory to capitalize on such savings. This paper considers two alternative methods to accomplish this DRLNS-to-SLNS conversion.

*[email protected]; +1 610 758 3285; fax +1 610 758 4096; http://www.cse.lehigh.edu/~caar

[Figure 1: error histograms for topgun3d.mpg decoded with fixed point (FX) and with conventional signed LNS, for F = 5 through F = 12; vertical axes are logarithmic counts ranging from 1 to 100,000,000.]

Figure 1. Comparison of Fixed Point (FX) and conventional signed LNS using topgun3d.mpg.

One approach is to compute the difference of the two parts of the DRLNS representation using db( ) at the end of the summation. With this DRLNS approach for summing n numbers, sb( ) is computed (n − 1)/n of the time, and db( ) is computed 1/n of the time. This translates into much less switching activity for the larger and more power-hungry db( ) unit. For MPEG, n = 8, so this unit is used only 12% of the time. Another approach for DRLNS-to-SLNS conversion is to convert the two numbers separately to fixed point (i.e., take their antilogarithms), subtract them, and then find the logarithm of the fixed-point difference. A novel factoring of a common term out of the two parts reduces the bus widths involved. Also, because MPEG tolerates small LNS word sizes, a novel application of a low-cost approximation for the logarithm is shown here to produce acceptable visual results.

[Figure 2: error histograms for topgun3d.mpg decoded with DRLNS and with conventional signed LNS, for F = 5 through F = 12; vertical axes are logarithmic counts ranging from 1 to 100,000,000.]

Figure 2. Comparison of Dual Redundant LNS (DRLNS) and conventional signed LNS using topgun3d.mpg.

2. CONVENTIONAL SIGNED LNS

Before discussing the unconventional LNS arithmetic proposed for use in this paper, let's review the conventional form of LNS, known as the Signed Logarithmic Number System (SLNS). Base-two SLNS2-4,7,12-13,18 is quite analogous to IEEE-754 floating-point9: for the same word size and precision, the sign bit and integer exponents are identical in SLNS and IEEE-754. The distinction is that the mantissa is linear in IEEE-754 versus logarithmic in SLNS. In both SLNS and IEEE-754, when an algorithm calls for X + Y, the hardware produces a sum when the sign bits are the same, and a difference when the sign bits are different. However, the non-linear mantissa in SLNS makes the computation of sums and differences more difficult than the easy SLNS operations, like multiply and divide.

In SLNS, a non-zero real value, X, is converted to its SLNS representation, x = Q(logb|X|, K, F) and xS = sign(X). Positive x represents |X| > 1; negative x represents |X| < 1. In this paper, an upper-case variable denotes a real value and the corresponding lower-case variables (x and xS) are its internal LNS representation. Q( ) indicates some form of quantization involving K + 1 bits for dynamic range and F bits for precision. In other words, values in the range b^(−2^K) ≤ |X| < b^(2^K) are represented with a relative error of about one part in 2^F. Values outside this range overflow. For the kind of applications considered here, zero can be approximated by b^(−2^K). The exact value represented by the SLNS representation, (−1)^xS · b^x, is only an approximation of the original X because of such quantization effects.

Previous research has shown that SLNS offers lower power consumption than the fixed-point number system (FX) for multimedia applications, like MPEG video decoding,2,3 because of reduced switching activity and smaller circuit and word sizes. Paliouras18 showed that SLNS ALUs consume less power than a comparable fixed-point unit since high-order bits of SLNS words typically have lower switching activity. Smaller word sizes are possible with SLNS because people tolerate the nearly imperceptible video artifacts of SLNS but find the fixed-point artifacts unacceptable. One requirement for successful MPEG decoding is that the representation cover a large enough dynamic range so that overflow is avoided. SLNS has a big advantage in dynamic range: each additional bit in SLNS squares the dynamic range, whereas in fixed point each additional bit only doubles the dynamic range. For example, it has been shown previously2,3 that a 10-bit SLNS (with 4 bits of precision) is visually superior to a 16-bit fixed point (also with 4 bits of precision) that covers the same dynamic range.

The main advantage of SLNS is that multiplication can be performed using an inexpensive fixed-point adder and an XOR gate. The sum of the logarithms of two values is the logarithm of their product: logb|X| + logb|Y| = logb|XY| and sign(X) ⊕ sign(Y) = sign(XY). It has been shown that such simplicity reduces power consumption compared to conventional arithmetic.18

Other operations besides multiplication are required. Addition and especially subtraction are much more difficult in SLNS. A possible approach is to convert to SLNS only for multiplication, and convert back to fixed point for the summation.20 This has several disadvantages: three conversions are required for each product, and the fixed-point representation often requires more bits and higher power consumption. The approach more commonly used for SLNS2-4,7,12-13,18 is to convert to a fixed-point representation only at the end of a complete calculation. All numbers remain in SLNS until the calculation is done. Given (xS, x) and (yS, y), which are the SLNS representations for the values X and Y, respectively, the SLNS representation of the sum, T = X + Y, can be computed as:

a. zS = yS ⊕ xS.
b. z = y − x.
c. If zS = 0, w = sb(z); otherwise w = db(z).
d. t = x + w.
e. If z < 0, tS = xS; otherwise tS = yS.

The computation of z and zS in the first two steps is equivalent to logb|Z| = logb|Y/X| and zS = sign(Y/X). What is looked up as w in the third step depends on the sign bit zS. When zS = 0, the system looks up

w = sb(z) = logb(1 + b^z).   (1)

When zS = 1, the system looks up

w = db(z) = logb|1 − b^z|.   (2)

The function sb(z) is used for computing sums when the signs of X and Y are the same. The function db(z) is used for computing differences when the signs of X and Y are different. In either case, the word looked up for w corresponds to logb|1 + Z| = logb|1 + Y/X|; the final LNS representation corresponds to

t = logb|X · (1 + Y/X)| = logb|X + Y|,   (3)

and the sign of this result, tS, is the sign of the operand whose absolute value is larger. Since db(z) has a singularity, it is often more expensive13 to realize in hardware than sb(z).
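To make steps a-e concrete, here is a minimal Python sketch (the function and variable names are illustrative, not from the paper); it evaluates sb and db exactly in floating point, whereas real SLNS hardware approximates them with tables or interpolators:

import math

B = 2.0  # base

def sb(z):
    # sb(z) = log_b(1 + b**z), used when the operand signs agree
    return math.log(1.0 + B ** z, B)

def db(z):
    # db(z) = log_b|1 - b**z|, used when the operand signs differ (singular at z = 0)
    return math.log(abs(1.0 - B ** z), B)

def slns_mul(x, xs, y, ys):
    # SLNS multiply: add the logarithms, XOR the sign bits
    return x + y, xs ^ ys

def slns_add(x, xs, y, ys):
    # Steps a-e of the SLNS sum T = X + Y
    zs = ys ^ xs                        # a
    z = y - x                           # b
    w = sb(z) if zs == 0 else db(z)     # c
    t = x + w                           # d
    ts = xs if z < 0 else ys            # e
    return t, ts

# Example: X = 3, Y = -5
x, xs = math.log(3, B), 0
y, ys = math.log(5, B), 1
t, ts = slns_add(x, xs, y, ys)
print((-1) ** ts * B ** t)              # approximately -2.0
p, ps = slns_mul(x, xs, y, ys)
print((-1) ** ps * B ** p)              # approximately -15.0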

3. DUAL REDUNDANT LNS

The term redundant means more bits are used to represent a value than is needed in theory. A particular value might be represented by one of several bit patterns because of this redundancy. Aside from the non-redundant SLNS, several unconventional variants of LNS have been proposed that use some form of redundancy: Signed-Digit LNS19 (SDLNS), Multi-Dimensional LNS17 (MDLNS), Dual-Redundant LNS1 (DRLNS) and Quad-Redundant LNS1 (QRLNS). SDLNS uses the redundant integer signed-digit number system in what is otherwise a conventional SLNS. MDLNS represents a value as a product of terms, each of which is encoded as a logarithm of a different base. DRLNS represents a value as the difference of two terms, both of which are encoded with a logarithm of the same base. QRLNS represents a value as the quotient of two terms, both of which are encoded in DRLNS. Like SLNS, SDLNS and MDLNS suffer from the difficulty of computing differences. QRLNS is quite expensive. Therefore, this paper explores the most appropriate and economical redundant logarithmic representation, DRLNS, in the novel context of MPEG decoding.

DRLNS uses a representation composed of a pair: the positive part, x+, and the negative part, x−. The bit patterns for x+ and x− may be, but need not be, redundant in the sense of SDLNS;19 however, this will not be considered here. The exact value represented by the DRLNS representation is b^(x+) − b^(x−). To have the same dynamic range and precision as SLNS, x+ and x− each have the same layout as an SLNS word (a sign bit of the logarithm, K bits before the binary point and F bits after the binary point), except there is no sign bit for the value being represented. Instead, the sign of a DRLNS number can be determined by whether x+ > x−. The purpose of DRLNS is to avoid the need for subtraction for as long as possible, and thereby to avoid the need to approximate db(z). For example, a possible K = 5, F = 4, b = 2 DRLNS representation for X = −√2 is x+ = 0.5 and x− = 1.5, i.e., b^(x+) − b^(x−) = √2 − 2√2, which could be represented by the pair of binary numbers (00000.1000₂, 00001.1000₂). Many other DRLNS representations are possible, in contrast to SLNS, whose only possible representation is xS = 1, x = 0.5 (1, 00000.1000₂). The best DRLNS representation uses the SLNS representation x as x+ if X is positive or as x− if X is negative. The other part of the best DRLNS representation uses the smallest possible representation (b^(−2^K) ≈ b^(−∞) = 0). In this example, the best DRLNS representation is (10000.0000₂, 00000.1000₂).

Let (t+, t−) be the DRLNS representation in the accumulator, T, and (y, yS) be the SLNS representation for a value, Y, that is to be added to T. The new DRLNS representation for the sum in the accumulator can be computed as:

a. If yS = 0, z = y − t+; otherwise z = y − t−.
b. w = sb(z).
c. If yS = 0, t+ = t+ + w; otherwise t− = t− + w.

Unlike SLNS, other DRLNS operations (such as DRLNS-by-DRLNS multiplication, division and square root) are hard, but for many applications, such as the ones required in MPEG decoding, the above operations will suffice.
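A minimal Python sketch of steps a-c above (illustrative names; sb is again computed exactly, where hardware would use an approximation):

import math

B = 2.0
LOG_ZERO = -16.0   # stands in for b**(-2**K), the smallest representable logarithm (K = 4)

def sb(z):
    # sb(z) = log_b(1 + b**z)
    return math.log(1.0 + B ** z, B)

def drlns_accumulate(t_plus, t_minus, y, ys):
    """Add one SLNS product (y, ys) into the DRLNS accumulator (t_plus, t_minus).
    Only one of the two halves is updated, so only sb is ever needed."""
    if ys == 0:
        z = y - t_plus                 # a
        t_plus = t_plus + sb(z)        # b, c  (= log_b(b**t_plus + b**y))
    else:
        z = y - t_minus
        t_minus = t_minus + sb(z)
    return t_plus, t_minus

# Accumulate 3, -5 and 4; the expected total is 2
t_plus, t_minus = LOG_ZERO, LOG_ZERO
for value in (3.0, -5.0, 4.0):
    y, ys = math.log(abs(value), B), (0 if value > 0 else 1)
    t_plus, t_minus = drlns_accumulate(t_plus, t_minus, y, ys)
print(B ** t_plus - B ** t_minus)      # approximately 2.0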

4. MPEG DECODING

The numerically intensive portion of MPEG decoding is the Two-Dimensional Inverse Discrete Cosine Transform (2D-IDCT):

f(x,y) = Σ(u=0..7) Σ(v=0..7) F(u,v) · (Cu/2) · (Cv/2) · cos[(2x + 1)uπ/16] · cos[(2y + 1)vπ/16],   (4)

where f(x,y) is the reconstructed output matrix after the IDCT and F(u,v) is an input matrix. Although there are many algorithms to compute the IDCT,14 the direct-form IDCT is appropriate for the DRLNS implementation because it is a simple sum-of-products form. Other forms might be more appropriate for a conventional SLNS implementation.

The IDCT algorithm accepts an 8 × 8 matrix of data. The result of the IDCT is an 8 × 8 matrix of integers that are either chrominance or luminance values. Although the system could produce the IDCT with as much accuracy as one wishes, in the MPEG context6 the input integers range from −2,048 to +2,047, and the output integers range from −256 to +255. For conventional SLNS, there is a sign bit (xS) for the value, a sign bit (msb of x) for the logarithm, K = 4 bits (2^(2^4) > 2,048 > 2^(2^3)) for the integer part of the logarithm, and F bits for the fractional part of the logarithm. The total word size of the SLNS representation is thus W = F + 6. For DRLNS, there is no value sign bit, but there are two parts of the representation, x+ and x−, thus W = 2F + 10.

The Berkeley MPEG-1 tools6 were modified to use fixed point (FX), conventional SLNS arithmetic (as reported in an earlier paper2) as well as the novel DRLNS arithmetic discussed here. As described in the earlier paper,2 the output of this modified player was directed to a .ppm file (a simple RGB format used in Linux), which was converted by "Electric Eyes"8 to JPEG format for inclusion into an MS Word document for reproduction in this paper. Several MPEG-1 files were used as input to this SLNS/DRLNS-based MPEG player.3 The file discussed in this paper (topgun3d.mpg)15 has several scenes of pilots and aircraft. This file (whose error characteristics are described by the histograms in Figures 1 and 2) is 330,407 bytes long. It consists of 305 frames, of which 21 are I-type, 82 are P-type and 202 are B-type. Each I-type picture is 160 by 128 pixels (80 macroblocks), for a total of 1,680 macroblocks. This file requires 40,467 IDCTs. This video clip shows eight scenes:

• two planes silhouetted against the sky
• cockpit of the first plane (this includes an explosion in frame 37)
• head shot
• cockpit
• cockpit of the second plane
• hand on the stick
• head shot
• target in cross hairs.

As such, it is typical of many entertainment streams, as well as the jittery head-and-shoulders shots of video phones.

The MPEG standards11 relate to the IEEE-1180 standard10 which, in turn, specifies that the IDCT should ideally be carried out using IEEE-754 double-precision floating-point arithmetic.9 The floating-point IDCT output is then converted to an integer. IEEE-754 has much more precision than a person can perceive. Most MPEG implementations capitalize on this, and choose a lower-precision arithmetic such as F-bit fixed point. Figure 1 compares the errors observed in F-bit fixed point (FX) against those observed in F-bit SLNS. The histograms show how many errors of different magnitudes occur in the integer outputs. It is clear SLNS is much more accurate than fixed point for small F.
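For reference, a straightforward double-precision Python sketch of the direct-form 2D-IDCT in (4) follows (the function name, rounding and clipping details are illustrative; full IEEE-1180 compliance involves additional test procedures not shown):

import math

def idct_2d_direct(F):
    """Direct-form 8x8 2D-IDCT of (4): a plain sum of 64 products per output pixel."""
    C = [1.0 / math.sqrt(2.0)] + [1.0] * 7          # C_0 = 1/sqrt(2), C_u = 1 otherwise
    f = [[0.0] * 8 for _ in range(8)]
    for x in range(8):
        for y in range(8):
            total = 0.0
            for u in range(8):
                for v in range(8):
                    total += (F[u][v] * (C[u] / 2.0) * (C[v] / 2.0)
                              * math.cos((2 * x + 1) * u * math.pi / 16.0)
                              * math.cos((2 * y + 1) * v * math.pi / 16.0))
            # MPEG outputs are clipped to the -256..+255 range quoted above
            f[x][y] = max(-256, min(255, int(round(total))))
    return f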

5. DUAL REDUNDANT LNS IDCT

Implementing the signed LNS used in the previous MPEG experiments requires hardware to approximate both sb and db, but db is harder to compute than sb. Because of this difficulty, the Dual Redundant LNS (DRLNS) was proposed.1 The basic idea is to keep separate totals for positive (X+) and negative (X−) values. DRLNS arithmetic is carried out on the representation (x+, x−). DRLNS is usable for addition and subtraction, but multiplication of two DRLNS values is problematic. Instead, SLNS multiplication with DRLNS summation is preferred.

For the 2D-IDCT used in MPEG, DRLNS is useful for the summations that occur along each dimension. This discussion will assume that the data is processed in rows and columns with the direct IDCT algorithm. This is done instead of the zigzag order3 because the zigzag order requires a double-sized memory needing twice the bandwidth (each partial DRLNS sum needs a positive and negative part). Figure 2 gives an example comparing the histograms of topgun3d.mpg for DRLNS and SLNS. Generally the DRLNS and SLNS histograms appear almost identical except for F = 5 and F = 6, where the DRLNS histograms actually have fewer bars, indicative of slightly less DRLNS error.
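A minimal Python sketch of this row-wise dataflow, under the assumptions that sb is evaluated exactly and that the DRLNS-to-SLNS conversion between dimensions (the subject of Section 5.1) is also done exactly; all names are illustrative:

import math

B, LOG_ZERO = 2.0, -16.0   # LOG_ZERO stands in for b**(-2**K) with K = 4

def sb(z):
    return math.log(1.0 + B ** z, B)

def to_slns(v):
    # (log-magnitude, sign) pair; zero is approximated by the smallest logarithm
    return (math.log(max(abs(v), B ** LOG_ZERO), B), 0 if v >= 0 else 1)

def idct_1d_row(row_slns, cos_slns):
    """One 8-point pass: SLNS multiplies (add logs, XOR signs) feed a DRLNS
    accumulator; only one of (t_plus, t_minus) is touched per product."""
    out = []
    for x in range(8):
        t_plus, t_minus = LOG_ZERO, LOG_ZERO
        for u in range(8):
            f, fs = row_slns[u]          # SLNS coefficient from the RL decoder
            c, cs = cos_slns[u][x]       # SLNS cosine constant (including Cu/2)
            y, ys = f + c, fs ^ cs       # SLNS multiply
            if ys == 0:
                t_plus += sb(y - t_plus)
            else:
                t_minus += sb(y - t_minus)
        # convert the DRLNS sum back to SLNS before the column pass (Section 5.1)
        out.append(to_slns(B ** t_plus - B ** t_minus))
    return out

# Example: a DC-only row F = [16, 0, ..., 0] should give eight outputs of about 5.66
cos_tab = [[to_slns(((1 / math.sqrt(2)) if u == 0 else 1.0) / 2
                    * math.cos((2 * x + 1) * u * math.pi / 16))
            for x in range(8)] for u in range(8)]
row = [to_slns(v) for v in (16.0, 0, 0, 0, 0, 0, 0, 0)]
print([round((-1) ** s * B ** m, 2) for (m, s) in idct_1d_row(row, cos_tab)])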


Figure 3. Effect of ideal DRLNS versus ideal SLNS when decoding ki1.mpg with: (a) F = 3 DRLNS, (b) F = 4 DRLNS, (c) F = 5 DRLNS, (d) F = 3 SLNS, (e) F = 4 SLNS and (f) F = 5 SLNS.

Figure 3 shows the result of decoding a file (ki1.mpg, consisting of a single I-frame) using normal MPEG oddification, 3 ≤ F ≤ 5 DRLNS arithmetic, and ideal round-to-nearest sb, db, input and output. Figure 3 also shows the results of similar-precision SLNS arithmetic. The artifacts introduced by DRLNS are no worse than those of SLNS at a given precision; therefore, if the cost of the DRLNS-to-SLNS conversion described below can be made affordable, DRLNS provides an interesting alternative to SLNS that avoids the difficulty of subtraction.

5.1 DRLNS to SLNS conversion

At the beginning of the IDCT computation, the data processed along the first dimension from the Run Length (RL) decoder as well as the cosine constants are easily available in SLNS format using the same kind of input-conversion hardware described elsewhere.2,3 The multiplications can then proceed in SLNS using only an adder and an XOR for the sign, making it possible to work within the restriction limiting DRLNS-by-DRLNS products. The summation of these SLNS terms along the first dimension can proceed using a DRLNS accumulator that has two registers, t+ and t−. In each cycle, only one of these two registers will be updated. Thus, switching activity in the flip-flops will be roughly the same as for signed LNS.

When the summation along the first dimension is complete, it is necessary to convert this DRLNS result back into SLNS. This has to be done for each of the 64 results before beginning to process them along the second dimension. If these results were left in DRLNS, the ALU that processes the second dimension would have to multiply these DRLNS elements by SLNS cosines and then add the resulting DRLNS terms to the DRLNS accumulator. Although possible, DRLNS-to-DRLNS addition is twice as expensive as SLNS-to-DRLNS addition because there are both positive and negative terms that must be added in parallel, requiring two sb hardware units. Instead, the approach taken here is to convert the DRLNS results from the first dimension to SLNS prior to processing the second dimension. Such conversion allows the same hardware that processes the first dimension to process the second dimension. There are two implementation choices considered here for converting such a DRLNS representation (x+, x−) to the equivalent non-redundant SLNS representation (x): either using Leonelli's algorithm,

x = x− + db(x+ − x−),   (5)

or using two LNS-to-fixed-point (output) conversions followed by a fixed-point-to-LNS (input) conversion,

x = logb(b^(x+) − b^(x−)).   (6)

Equations (5) and (6) are mathematically equivalent if carried out with infinite precision. Assuming the moderate-to-high precision (F ≈ 10) required for IEEE-1180 compliance (indicated by the presence of only two bars in the histograms), the three hardware units (two output anti-log and one input log) needed for (6) would seem to be more expensive than the one hardware unit (db) needed for (5). On the other hand, at the low precisions that previous subjective visual experiments have indicated MPEG tolerates,2,3 it might be possible to use Mitchell's method16 for the input and output, in which case some variation of (6) may be more economical.

5.2 DRLNS to SLNS using Mitchell's method

Mitchell's method16 to compute the base-two antilog uses a simple shifter to produce

2^(x+) ≈ 2^int(x+) · (frac(x+) + 1)   (7)

with accuracy of about 4 bits of precision. The "+1" occurs at no cost by concatenation. A similar method for computing the base-two logarithm offers equivalent accuracy. The most obvious way to use Mitchell's method for converting from DRLNS to SLNS involves converting x+ and x− separately to fixed point and subtracting them:

X̃ = 2^int(x+) · (frac(x+) + 1) − 2^int(x−) · (frac(x−) + 1).   (8)

The temporary fixed-point result, X̃ ≈ X, can then be converted back to LNS using Mitchell's logarithm approximation,

log2|2^(x+) − 2^(x−)| ≈ log2|X̃| ≈ int(log2|X̃|) + 2^(−int(log2|X̃|)) · |X̃| − 1,   (9)

where int(log2|X̃|) is a shift amount computed by a priority encoder. Figure 5 shows the hardware for (8) and (9). The sign of the SLNS result is determined by whether x+ is less than x−. It is worth noting that after processing in the second IDCT dimension, it is not necessary to apply (9) a second time; X̃ is instead the desired integer output.

5.3 Improved DRLNS to SLNS conversion

Assume that the result is positive (i.e., x+ > x−). The other case is symmetrical, and can be handled by hardware already present in the DRLNS ALU. Determine

xi = int(x−) − int(x+),   (10)

which can be computed by a 5-bit integer subtractor. The assumption that x+ > x− means xi ≤ 0, thus (8) can be rewritten as:

X̃ = 2^int(x+) · ((frac(x+) + 1) − 2^xi · (frac(x−) + 1)).   (11)

Instead of two shifters for (8), each of which outputs F + 12 bits, (11) only needs one shifter, which outputs F + 1 bits. For the case F = 4, this amounts to 5 bits for (11) versus 32 bits for (8). The subtraction in (11) requires F + 1 full adders, in addition to the 5 required to compute xi, for a total of F + 6, compared to F + 12 for (8). Combining (11) with (9) creates what is equivalent to a crude F-bit floating-point subtractor:

log2|2^(x+) − 2^(x−)| = log2|X̃| ≈ int(x+) − r + 2^r · ((frac(x+) + 1) − 2^xi · (frac(x−) + 1)) − 1,   (12)

where r = −int(log2((frac(x+) + 1) − 2^xi · (frac(x−) + 1))) is an int(log2(F))-bit-wide shift amount computed by a priority encoder. Figure 6 shows the hardware for (12). No one previously has noted that such a circuit can serve as a low-precision DRLNS-to-SLNS conversion unit, although some LNS researchers7 have noted that factoring out 2^xi is helpful. The total number of full adders for Figure 5 is F + 12 compared to F + 11 for Figure 6. The main advantage is that the shifters and priority encoder are smaller, and the internal fixed-point representation is shorter, implying lower switching activity.

To consider the effects of (12) versus (5) in a realistic, affordable hardware context, Figure 4 shows similar data to Figure 3 (3 ≤ F ≤ 5), except the Low Precision Very Insignificant Power (LPVIP) ALU3 also based on Mitchell's method is used and oddification4 is omitted. Figures 4(a)-4(c) use Mitchell's method only on output and use (5) with the LPVIP ALU for DRLNS-to-SLNS conversion.


Figure 4. Effects of DRLNS when decoding ki1.mpg with no oddification and LPVIP ALU using (5) for DRLNS-to-SLNS conversion: (a) F = 3, (b) F = 4, (c) F = 5 and the same using (12) for DRLNS-to-SLNS conversion: (d) F = 3, (e) F = 4 and (f) F = 5.

Figures 4(d) - 4(f) use Mitchell's method for input, output and DRLNS-to-SLNS conversion with (12). At a given precision, there is little visual difference between these two alternatives (using or not using the proposed novel DRLNS-to-SLNS conversion); thus, the designer is free to choose the one which costs less in a particular implementation technology.
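As a purely numerical illustration (assumed function names, not the paper's hardware), the following Python sketch compares the exact conversion of (6) with the low-precision conversion built from (10)-(12), assuming x+ > x−:

import math

def drlns_to_slns_exact(xp, xm, b=2.0):
    # Exact conversion per (6): two antilogs, a subtraction and a logarithm
    value = b ** xp - b ** xm
    return (0 if value >= 0 else 1), math.log(abs(value), b)

def drlns_to_slns_mitchell(xp, xm):
    """Low-precision base-two conversion per (10)-(12), assuming xp > xm (positive result)."""
    int_p, frac_p = math.floor(xp), xp - math.floor(xp)
    int_m, frac_m = math.floor(xm), xm - math.floor(xm)
    xi = int_m - int_p                               # (10), xi <= 0
    m = (frac_p + 1) - 2.0 ** xi * (frac_m + 1)      # common factor of (11)
    r = -math.floor(math.log2(m))                    # priority-encoder shift amount
    return 0, int_p - r + (2.0 ** r) * m - 1         # (12)

# Example: X = 10 - 3 = 7, represented as the DRLNS pair (log2 10, log2 3)
xp, xm = math.log2(10.0), math.log2(3.0)
print(drlns_to_slns_exact(xp, xm))      # (0, 2.807...), i.e., log2 7
print(drlns_to_slns_mitchell(xp, xm))   # (0, 2.851...), close to log2 7 within Mitchell's ~4-bit accuracy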


Figure 5. Straightforward F = 4 DRLNS-to-SLNS converter using Mitchell’s method with (8) and (9).


Figure 6. Novel F = 4 DRLNS-to-SLNS converter using Mitchell’s method with (12).

6. CONCLUSIONS

This paper has shown that DRLNS is a viable option for a portion of the arithmetic needed in MPEG decoding. Because DRLNS is a specialized number system, two approaches (that produce visual quality similar to a pure SLNS implementation) are proposed to convert DRLNS back to SLNS. By converting at the end of each IDCT dimension, the proposed approach combines the advantages of SLNS (reduced switching activity and low-cost multiplication) and DRLNS (subtraction avoidance).

REFERENCES

1. M. G. Arnold, T. A. Bailey, J. R. Cowles and J. J. Cupal, "Redundant Logarithmic Arithmetic," IEEE Transactions on Computers, 39, pp. 1077-1086, 1990.
2. M. G. Arnold, "LNS for low-power MPEG decoding," Advanced Signal Processing Algorithms, Architectures and Implementations XII, SPIE, Seattle, Washington, 11 July 2002.
3. M. G. Arnold, Logarithmic Number Systems for MPEG and Multimedia Applications, PhD Thesis, University of Manchester Institute of Science and Technology, 2002.
4. M. G. Arnold, "Avoiding Oddification to Simplify MPEG-1 Decoding with LNS," IEEE 2002 International Workshop on Multimedia Signal Processing, St. Thomas, Virgin Islands, Dec. 9-11, 2002.
5. D. Banks and L. A. Rowe, "Analysis Tools for MPEG-1 Video Streams," http://bmrc.berkeley.edu/research/publications/1997/139/dbanks.html
6. Berkeley MPEG Tools, bmrc.berkeley.edu/research/mpeg/
7. C. Chen and C. G. Yang, "Pipelined Computation of Very Large Word-Length LNS Addition/Subtraction with Polynomial Hardware Cost," IEEE Transactions on Computers, 49, pp. 716-726, 2000.
8. C. Haiztler, Electric Eyes, 1997, http://cvs.gnome.org/lxr/source/ee/
9. IEEE Standard for Binary Floating-Point Arithmetic, ANSI/IEEE Std. 754-1985, IEEE, 1985.
10. IEEE Standard Specifications for the Implementations of the 8 × 8 Inverse Discrete Cosine Transform, IEEE Standard 1180-1990, March 1991.
11. International Organisation for Standardization, MPEG-4 Overview V.21 - Jeju Version: Coding of Moving Pictures and Audio, ISO/IEC JTC1/SC29/WG11 N4668, March 2002.
12. N. G. Kingsbury and P. J. W. Rayner, "Digital Filtering Using Logarithmic Arithmetic," Electronics Letters, 7, no. 2, pp. 56-58, 28 January 1971.
13. D. M. Lewis, "Interleaved Memory Function Interpolators with Application to an Accurate LNS Arithmetic Unit," IEEE Transactions on Computers, 43, pp. 974-982, August 1994.
14. C. Loeffler, A. Ligtenberg and G. Moschytz, "Practical Fast 1-D DCT Algorithms with 11 Multiplications," Proceedings of the IEEE International Conference on Acoustics, Speech, and Signal Processing, vol. 89, pp. 988-991, 1989.
15. "Top Gun," Paramount Pictures, 1986, http://www.mpegtv.com/topgun3d.mpg
16. J. N. Mitchell, "Computer Multiplication and Division using Binary Logarithms," IEEE Transactions on Computers, EC-11, pp. 512-517, Aug. 1962.
17. R. Muscedere, V. S. Dimitrov, G. A. Jullien and W. C. Miller, "Efficient Conversion from Binary to Multi-Digit Multi-Dimensional Logarithmic Number Systems using Arrays of Range Addressable Look-Up Tables," Application-Specific Systems, Architectures and Processors, IEEE, San Jose, pp. 130-138, 17-19 July 2002.
18. V. Paliouras and T. Stouraitis, "Low Power Properties of the Logarithmic Number System," Proceedings of the 15th IEEE Symposium on Computer Arithmetic, Vail, Colorado, pp. 229-236, 11-13 June 2001.
19. T. Stouraitis and C. Chen, "Hybrid Signed Digit Logarithmic Number System Processor," IEE Proceedings-E, 140, pp. 205-210, July 1993.
20. S. Pan, et al., "A 32b 64-Matrix Parallel CMOS Processor," 1999 IEEE International Solid-State Circuits Conference, San Francisco, California, pp. 15-17, February 1999.
21. J. Watkinson, The MPEG Handbook, Focal Press, Oxford, 2001.