video compression using integer dct

VIDEO COMPRESSION USING INTEGER DCT†

Ying-Jui Chen, Soontorn Oraintara, Truong Nguyen

ECE Dept, Boston University, 8 St. Mary's St., Boston, MA 02215, USA Phone: +1-(617) 353-1040, Fax: +1-(617) 353-1282

fyrchen,oraintar,[email protected]

ABSTRACT

This paper describes the implementation of the integer discrete cosine transform (IntDCT) using the Walsh-Hadamard transform and the lifting scheme. The implementation takes the form of shifts and adds, and all internal nodes have finite precision. A general-purpose 8-point IntDCT scheme with a complexity of 45 adds and 18 shifts is proposed, which gives performance comparable to the floating-point DCT (FloatDCT). For this particular scheme with 8-bit input, perfect reconstruction (PR) is preserved even when all the internal nodes are limited to 16-bit words, making Pentium MMX optimization possible. The proposed IntDCT has been incorporated into the H.263+ coder, and the resulting system performs as well as the original. Further extension to the MPEG coder is straightforward. The proposed IntDCT is reversible, has low power consumption, and is well suited to source coding and communication in a mobile environment.

1. INTRODUCTION AND REVIEW

The DCT [1] is used as the transform in video and image coding standards. It finds applications mostly in block transforms of signals, as in JPEG still image compression [2], MPEG high bit rate video encoding [3], and H.263+ low bit rate video coding [4]. The DCT qualifies as a good transform because it is (sub)optimum in terms of variance distribution, Wiener filtering, and rate distortion. It has fast algorithms and is asymptotically equivalent to the optimum KLT for natural images, which can be well modelled by AR(1) random processes.

Given an image with integer intensity values, the conventional DCT maps integers to floating-point numbers, and the computation cost and the power consumption (when implemented in hardware) cannot be neglected, especially the cost of floating-point multiplication. To deal with this, there have been algorithms that approximate the floating-point DCT using integer arithmetic by scaling up fractional coefficients to large integers (see [5] for example). Neither this approach nor the floating-point DCT guarantees reversibility. W. Philips proposed the Lossless DCT (LDCT), which decomposes the DCT kernel into some heuristic permutations and a product of upper- and lower-triangular matrices with all 1's on the diagonals [6]. The internal data are rounded twice to give an integer output, but it is not clear in [6] how the reversibility is guaranteed. On the other hand, T. Tran has recently proposed BinDCT to approximate the DCT [7]. BinDCT is based on the integer lifting scheme [8], and is capable of a low-cost and reversible integer-to-integer transform. However, there does not seem to be a systematic way to come up with the lifting coefficients in BinDCT.

As an alternative to [7], we propose an implementation of the integer discrete cosine transform (IntDCT) based on the Walsh-Hadamard transform (WHT) and the integer lifting scheme. The major differences between IntDCT and BinDCT are the use of the WHT and the systematic way to lift the transform. IntDCT uses simple integer arithmetic (bit shifts and adds) and is reversible. All the coefficients required can be made simple integers. Therefore, it has the potential for low power consumption, and all the internal nodes can be limited to 16-bit words for 8-bit grayscale 8 × 8 input blocks. This low power consumption is important for mobile devices.

2. THE PROPOSED INTEGER DCT

To achieve efficient power consumption and also reversibility of the transform, we propose a novel algorithm for the DCT which is based on the Walsh-Hadamard transform [1] and the lifting scheme [8, 9]. Only simple integer arithmetic is required, in the form of a few shifts and adds, which keeps power consumption low. Reversibility is made possible by the use of the lifting scheme.

2.1. DCT And Walsh-Hadamard Transform

An interesting connection between the DCT and the WHT is that the DCT kernel is block-diagonal in the WHT domain [1]. Specifically, denoting the input data of length $N$ by $x$, the DCT of $x$ by $X_C$, the WHT by $H_w$, and bit reversal by $B$, we have the following:

† Contact author. This work is supported by Boston University.

$$ X_C = \sqrt{1/N}\; B\,T\,B\,H_w\,x, \tag{1} $$

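where $T$ is the block-diagonal matrix as follows (for $N = 8$), along with $H_w$:

$$ T_8 = \begin{bmatrix} [1] & & & 0 \\ & [1] & & \\ & & [U_{22}] & \\ 0 & & & [U_{44}] \end{bmatrix} \tag{2} $$

and

$$ H_w = \begin{bmatrix}
1 & 1 & 1 & 1 & 1 & 1 & 1 & 1 \\
1 & 1 & 1 & 1 & -1 & -1 & -1 & -1 \\
1 & 1 & -1 & -1 & -1 & -1 & 1 & 1 \\
1 & 1 & -1 & -1 & 1 & 1 & -1 & -1 \\
1 & -1 & -1 & 1 & 1 & -1 & -1 & 1 \\
1 & -1 & -1 & 1 & -1 & 1 & 1 & -1 \\
1 & -1 & 1 & -1 & -1 & 1 & -1 & 1 \\
1 & -1 & 1 & -1 & 1 & -1 & 1 & -1
\end{bmatrix}. \tag{3} $$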
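The block-diagonal structure claimed in (1) and (2) can be checked numerically. The sketch below is not the authors' code; it assumes an orthonormal DCT-II, a sequency-ordered $H_w$ whose first column is all +1, and a 3-bit bit-reversal permutation for $B$. Under those assumptions it solves (1) for $T$ and prints the result. The helper names (`dct_matrix`, `walsh_matrix`, `wht_domain_kernel`) are hypothetical.

```python
import math

N = 8  # transform length used throughout the paper

def dct_matrix(n):
    """Orthonormal DCT-II matrix (each row is one basis function)."""
    rows = []
    for k in range(n):
        scale = math.sqrt((1.0 if k == 0 else 2.0) / n)
        rows.append([scale * math.cos((2 * j + 1) * k * math.pi / (2 * n))
                     for j in range(n)])
    return rows

def walsh_matrix(n):
    """+/-1 Walsh-Hadamard matrix, rows sorted by number of sign changes.

    Assumption: this sequency ordering is what the paper calls Hw.
    """
    h = [[1]]
    while len(h) < n:  # Sylvester construction; n must be a power of two
        h = [r + r for r in h] + [r + [-v for v in r] for r in h]
    return sorted(h, key=lambda r: sum(a != b for a, b in zip(r, r[1:])))

def bit_reverse(i, bits):
    """Reverse the lowest `bits` bits of i."""
    return int(format(i, "0{}b".format(bits))[::-1], 2)

def wht_domain_kernel(n):
    """Solve (1) for T: T = (1/sqrt(n)) * B C Hw^T B, with C the DCT matrix."""
    bits = n.bit_length() - 1
    c, hw = dct_matrix(n), walsh_matrix(n)
    br = [bit_reverse(i, bits) for i in range(n)]
    return [[sum(c[br[i]][m] * hw[br[j]][m] for m in range(n)) / math.sqrt(n)
             for j in range(n)]
            for i in range(n)]

T8 = wht_domain_kernel(N)
for row in T8:
    print(" ".join("{:6.3f}".format(v) for v in row))
```

With these conventions the printed matrix should show two 1 × 1 blocks equal to 1, a 2 × 2 rotation block, and a 4 × 4 block, with zeros (up to rounding) elsewhere.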

$U_{22} = R_{(-\pi/8)}$ is 2 × 2, and $U_{44}$ is a 4 × 4 matrix which can be further factorized using only 4 angles (instead of $C^4_2 = 6$). See Figure 1 for more detail. This structure can be generalized to DCTs of size $2^k$, where $k$ is an integer.

2.2. Integer Lifting Scheme

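The next step toward IntDCT is to express $T$ in terms of 5 Givens rotation angles $\theta_i$. Each rotation in turn can be factorized into three lifting steps [8, 9]:

$$ R_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix} = \begin{bmatrix} 1 & \frac{\cos\theta - 1}{\sin\theta} \\ 0 & 1 \end{bmatrix} \begin{bmatrix} 1 & 0 \\ \sin\theta & 1 \end{bmatrix} \begin{bmatrix} 1 & \frac{\cos\theta - 1}{\sin\theta} \\ 0 & 1 \end{bmatrix}, \tag{4} $$

where $\sin\theta$ and $\frac{\cos\theta - 1}{\sin\theta}$ in general are floating-point numbers, and are called lifting multipliers. A FLOOR operation is introduced in each lifting step to render an integer-to-integer mapping.

2.3. Low-Cost, Multiplierless IntDCT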

To avoid floating-point multiplication, one may quantize the floating-point lifting multipliers using a given number of bits, so that the quantized multipliers take the form $k/2^b$ ($k, b \in \mathbb{N}$). These lifting steps can then be implemented using only shifts and adds, which enables fast computation and reduces power consumption. As described in (4), two distinct integer multipliers $\alpha_{b_1}$ and $\beta_{b_2}$ are used to achieve integer lifting by

$$ \frac{\alpha_{b_1}}{2^{b_1}} \approx \frac{\cos\theta - 1}{\sin\theta} \quad \text{and} \quad \frac{\beta_{b_2}}{2^{b_2}} \approx \sin\theta. $$

Figure 2 shows how a Givens rotation is integer lifted, and Table 1 gives $\alpha_b$ and $\beta_b$ for the four distinct rotations for $b = 1$ to $8$. Note that the structure in Figure 2 may not be optimum given the input statistics, the main reason being that two of the three lifting multipliers are constrained to be the same. In [12], we have provided a detailed analysis of the optimum integer lifting scheme given the input statistics. A particular choice of lifting multipliers, which results in 45 adds and 18 shifts for the 8-point 1D DCT, has been proposed by the authors for general purposes. Figure 3 shows the resulting structure. For this scheme with 8-bit input data, PR is preserved even for the separable 2D DCT with 16-bit internal nodes. The resulting IntDCT is used in the following example.
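The integer lifting of a single Givens rotation can be sketched as follows. This is an illustration under stated assumptions rather than the authors' implementation: the multipliers are quantized by truncation toward zero (which reproduces the Table 1 entries checked here), the FLOOR in each lifting step is realized by an arithmetic right shift, and the function names are hypothetical. The products `alpha * x1` and `beta * x0` stand in for the shift-and-add networks that a multiplierless implementation would use.

```python
import math

def lifting_multipliers(theta, b):
    """Integers (alpha, beta) with alpha/2^b ~ (cos(theta)-1)/sin(theta)
    and beta/2^b ~ sin(theta).

    Assumption: quantization truncates toward zero.
    """
    scale = 1 << b
    alpha = math.trunc(scale * (math.cos(theta) - 1.0) / math.sin(theta))
    beta = math.trunc(scale * math.sin(theta))
    return alpha, beta

def lift_forward(x0, x1, alpha, beta, b1, b2):
    """Three lifting steps of (4); the right-most matrix is applied first.

    Python's >> on ints is an arithmetic shift, i.e. floor division by 2^b.
    """
    x0 += (alpha * x1) >> b1
    x1 += (beta * x0) >> b2
    x0 += (alpha * x1) >> b1
    return x0, x1

def lift_inverse(y0, y1, alpha, beta, b1, b2):
    """Undo the three steps in reverse order; exact for any integer input."""
    y0 -= (alpha * y1) >> b1
    y1 -= (beta * y0) >> b2
    y0 -= (alpha * y1) >> b1
    return y0, y1

alpha, beta = lifting_multipliers(3 * math.pi / 8, 8)
print(alpha, beta)                       # -171 236
y = lift_forward(100, 0, alpha, beta, 8, 8)
print(y, lift_inverse(y[0], y[1], alpha, beta, 8, 8))  # (38, 92) (100, 0)
```

The forward output (38, 92) approximates the exact rotation of (100, 0) by 3π/8, which is about (38.3, 92.4). The inverse recovers the input exactly because each step adds a quantity that depends only on the other, unchanged, coordinate.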

3. VIDEO COMPRESSION USING INTDCT

The DCT routines in a given video coder can be replaced by the proposed IntDCT to obtain a low-complexity coder.

3.1. Simulation

We have incorporated the proposed IntDCT into a public-domain H.263+ coder [10]. The performance is comparable to that of the original coder based on FloatDCT. See Table 2 for a comparison in terms of compression ratio and PSNR. Table 3 is a comparison of visual quality. It compares the 30th frame of the reconstructed QCIF sequence suzie for the four different combinations of FloatDCT and IntDCT at the encoder and the decoder. The proposed multiplierless IntDCT performs well. More on-line demos are available in [11].

4. CONCLUSION

We have presented a novel construction of IntDCT which features low power consumption (low complexity) and reversibility. It is incorporated into the H.263+ video coder, and the performance is comparable to that of the original coder. Currently we are building a prototype hardware system by porting the algorithm to FPGA using VHDL. The proposed IntDCT is very promising for mobile computing and low-cost consumer electronics. A more detailed analysis of the proposed IntDCT can be found in [12].

Also of interest is the mismatch between the encoder and the decoder when FloatDCT and IntDCT follow each other. Specifically for video compression, if the encoder is of low complexity (using forward and inverse IntDCT) and the decoder uses FloatDCT, the encoder cannot reconstruct exactly what will be available at the decoder. Analysis of the corresponding mismatch is important and will provide insight into how to reduce it. This is left as future work.

5. REFERENCES

[1] K. R. Rao and P. Yip, "Discrete Cosine Transform: Algorithms, Advantages, Applications," Academic Press, 1990.
[2] William B. Pennebaker and Joan L. Mitchell, "JPEG Still Image Data Compression Standard," Van Nostrand Reinhold, 1993.
[3] Joan L. Mitchell, William B. Pennebaker, and Chad E. Fogg, "MPEG Video: Compression Standard (Digital Multimedia Standards Series)," Chapman & Hall, Oct. 1996.
[4] G. Côté, B. Erol, M. Gallant, and F. Kossentini, "H.263+: Video Coding at Low Bit Rates," IEEE Transactions on Circuits and Systems for Video Technology, vol. 8, no. 7, Nov. 1998.
[5] ftp://ftp.uu.net/graphics/jpeg/.
[6] Wilfried Philips, "The lossless DCT for combined lossy/lossless image coding," Proceedings of ICIP 98 (Chicago), vol. 3, p. 871, Oct. 1998.
[7] Trac D. Tran, "Fast Multiplierless Approximation of the DCT," 33rd Annual Conference on Information Sciences and Systems, Baltimore, MD, pp. 933-938, Mar. 1999. Available from http://thanglong.ece.jhu.edu/Tran/Pub/.
[8] Ingrid Daubechies and Wim Sweldens, "Factoring Wavelet Transforms into Lifting Steps," Nov. 1997.
[9] W. Sweldens, "The lifting scheme: A custom-design construction of biorthogonal wavelets," Appl. Comput. Harmon. Anal., vol. 3, no. 2, 1996.
[10] http://spmg.ece.ubc.ca/h263plus/h263plus.html.
[11] http://multirate.bu.edu/~yrchen/Research/intH.263/.
[12] Ying-Jui Chen, Soontorn Oraintara, and Truong Nguyen, "Integer Discrete Cosine Transform (IntDCT)," submitted to IEEE Trans. on Signal Processing, Feb. 2000. Also available at http://multirate.bu.edu/~yrchen/Research/ieee-sp-intdct.pdf.

Figure 2: The three integer lifting steps implementing a particular Givens rotation angle. $\alpha_b$ and $\beta_b$ are the at-most-$b$-bit lifting multipliers used, which are tabulated in Table 1.

Figure 3: A general-purpose scheme of IntDCT which requires 45 adds and 18 shifts for 1D length-8 inputs. 16-bit internal nodes guarantee perfect reconstruction for 2D 256-grayscale 8 × 8 blocks.

Figure 1: 8-pt DCT via WHT. $T_8$ can be viewed as the DCT in the WHT domain. Note that $U_{22}$ is a pure rotation, and that $U_{44}$ is factorized using 4 rotation angles represented by butterflies. This structure can be generalized to DCTs of size $2^k$, where $k$ is an integer.

Angle −π/8
b (bits)     1     2     3     4     5     6     7     8
α_b          0     0     1     3     6    12    25    50
β_b          0    -1    -3    -6   -12   -24   -48   -97

Angle 3π/8
b (bits)     1     2     3     4     5     6     7     8
α_b         -1    -2    -5   -10   -21   -42   -85  -171
β_b          1     3     7    14    29    59   118   236

Angle 7π/16
b (bits)     1     2     3     4     5     6     7     8
α_b         -1    -3    -6   -13   -26   -52  -105  -210
β_b          1     3     7    15    31    62   125   251

Angle 3π/16
b (bits)     1     2     3     4     5     6     7     8
α_b          0     1     2     4     9    19    38    77
β_b         -1    -2    -4    -8   -17   -35   -71  -142

Table 1: The multipliers $\alpha_b$ and $\beta_b$ for $b = 1$ to $8$ and various rotation angles.

carphone compressed at 60 kbps (target bitrate)

Encoded by   Bit stream size (bytes)   Compression ratio   Decoded by FloatDCT (dB)   Decoded by IntDCT (dB)
FloatDCT     95,465                    152.1               23.32 / 35.24 / 36.34      23.13 / 35.20 / 36.33
IntDCT       95,499                    152.1               22.78 / 35.08 / 36.14      22.97 / 35.11 / 36.17

suzie compressed at 60 kbps (target bitrate)

Encoded by   Bit stream size (bytes)   Compression ratio   Decoded by FloatDCT (dB)   Decoded by IntDCT (dB)
FloatDCT     37,526                    152                 26.05 / 41.02 / 40.18      26.01 / 41.01 / 40.17
IntDCT       37,527                    152                 25.91 / 46.16 / 40.11      26.05 / 41.16 / 40.14

Table 2: Examples of intH.263+ (with deblocking filter enabled) using QCIF sequences carphone and suzie. In the performance matrix, the 3-tuples give the PSNR in Y, Cr, Cb for the reconstructed sequences. On-line demos are available [11], which also cover results for other sequences.
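The paper does not spell out how the PSNR values in Table 2 are computed. A common definition for 8-bit video, given here as an assumption rather than as the authors' stated formula, is computed per colour component over a frame of width $W$ and height $H$ (symbols introduced only for this illustration), with $x$ the original and $\hat{x}$ the reconstructed samples:

$$ \mathrm{MSE} = \frac{1}{WH} \sum_{i=1}^{H} \sum_{j=1}^{W} \left( x_{ij} - \hat{x}_{ij} \right)^2, \qquad \mathrm{PSNR} = 10 \log_{10} \frac{255^2}{\mathrm{MSE}} \ \text{dB}. $$

Under this definition, a difference of about 0.2 dB between two entries corresponds to a change of roughly 5% in mean squared error.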

FloatDCT → FloatDCT      IntDCT → FloatDCT
FloatDCT → IntDCT        IntDCT → IntDCT

Table 3: Example: the 30th frame of reconstructed suzie for the four different schemes indicated. "X → Y" indicates X at the encoder and Y at the decoder.
