FAST SOFTWARE MPEG-2 VIDEO TRANSCODER WITH OPTIMIZATION OF REQUANTIZATION ERROR COMPENSATION

Junji Tajime, Yuzo Senda, Yoshihiro Miyamoto
Multimedia Research Laboratories, NEC Corporation
4-1-1 Miyazaki, Miyamae-ku, Kawasaki 216-8555, Japan
E-mail [email protected]

ABSTRACT

A low-complexity software video transcoder with motion compensation (MC) of requantization errors has been developed. Two novel methods are proposed to reduce its processing time. One is a sparse block transcoding method that bypasses parts of the transcoding process according to the distribution of transform coefficients in an input block. This method applies to about half of all blocks of P pictures. The other is a fast MC method that alternates the rounding direction of MC per picture to decrease conditional judgments. It reduces the processing time of MC to three fifths. The transcoder with the proposed methods runs in real time on PCs.

1. INTRODUCTION

The MPEG-2 standard [1] is used in many video distribution services such as digital television (DTV), cable television (CATV), and video on demand (VoD). There is also a growing demand for recording distributed bitstreams on digital storage devices of limited capacity, or for transmitting the bitstreams through networks of different capacity and characteristics. To use these services or functions on PCs, software MPEG-2 bitrate transcoders, which convert incoming bitstreams into bitstreams with a lower bitrate, are required.

Several transcoders have been presented in [2]-[6]. From the standpoints of image quality and processing cost, the requantization error MC transcoders (REMCT) in [5] and [6] are suited for practical use. These transcoders are still complex because they need MC, discrete cosine transform (DCT), and inverse DCT (IDCT) to motion-compensate the requantization errors. A transcoder with frequency-domain MC is presented in [7] to eliminate the processing time of DCT and IDCT. However, the transcoder with frequency-domain MC consumes more memory access bandwidth than a transcoder with pixel-domain MC, and memory bandwidth is a bottleneck in software transcoders, as described in [8]. Therefore, a low-complexity software MPEG-2 video transcoder with pixel-domain MC has been developed.

This paper proposes two novel methods to reduce the processing time of the developed transcoder. One is a sparse block transcoding method that eliminates or simplifies each process. The other is a fast MC method that decreases conditional judgments.

0-7803-7622-6/02/$17.00 ©2002 IEEE

2. SOFTWARE VIDEO TRANSCODER

The developed transcoder is based on REMCT in [5]. A block diagram of the transcoder is shown in Figure 1. The transcoding process depends on the picture types in the input bitstream. The requantization errors of I pictures are stored in memory as reference pictures. When B pictures are transcoded, MC and DCT are eliminated to reduce the processing time of REMCT, as described in [7]. The transcoding process for P pictures is more complicated than the processes for I or B pictures and needs more processing time. To reduce the time for P pictures, two methods are proposed in the following.

[Figure 1 (block diagram). Blocks: VLD, IQ1, RQ, VLC, IQ2, IDCT, MEM, MC, DCT. Signals: input MPEG-2 bitstream, motion vector, output MPEG-2 bitstream. Labelled points: W, X, Y, Z.]

DCT : Discrete cosine transform
IDCT : Inverse DCT
IQ1, IQ2 : Inverse quantization
RQ : Requantization
MC : Motion compensation
MEM : Memory
VLC : Variable length coding
VLD : Variable length decoding

Figure 1: A transcoder block diagram.

IEEE ICIP 2002

2.1. Transcoding sparse blocks

Many of the decoded DCT coefficients are zero because of quantization. For decoders, a method that selects an IDCT algorithm according to the distribution of the coefficients is proposed in [9]. This adaptive method can be applied to the IDCT in REMCT because each 8 by 8 input block of the IDCT is the difference between the decoded DCT coefficients at point Y and those at point Z in Figure 1. The input blocks are located at point X in Figure 1.

The blocks at point X are classified into four block types: DCT_ZERO, DCT_DC, DCT_4X4, and DCT_8X8. DCT_ZERO denotes that all coefficients in the block are zero. DCT_DC denotes that the only nonzero coefficient is the DC component. DCT_4X4 denotes that all nonzero coefficients lie within the 4 by 4 low-frequency components. We call blocks of these three types "sparse blocks". All other blocks are classified as DCT_8X8.

Software simulation was performed to take statistics of each block type. The input videos for the input MPEG-2 bitstreams were the ITU-R-601 formatted sequences "Cheer Leaders", "Flower Garden", and "Mobile & Calendar", each 150 frames long. The videos were encoded at 6 or 8 megabits per second (Mbps) by Test Model 5 (TM5) [10]. Their group of pictures (GOP) structures (M, N) were (1, 15) or (3, 15). REMCT transcoded these input bitstreams to bitstreams with half of the input bitrate. The average percentage of each block type for P pictures is shown in Figure 2. Intra coded blocks of P pictures are excluded. In this simulation, the percentage of sparse blocks is between 43.4% and 65.5%. Additionally, whereas the decoder in [9] decides the block type by using the end of block (EOB) code, the developed transcoder decides the type more precisely according to the actual distribution of the DCT coefficients.
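The four-way classification described above can be sketched as follows. This is only an illustrative sketch: the names `BlockType`, `classify_block`, and `is_sparse` are hypothetical, and the block is assumed to be given as 8 rows of 8 integer coefficients with the DC component at row 0, column 0.

```python
from enum import Enum


class BlockType(Enum):
    DCT_ZERO = 0  # every coefficient is zero
    DCT_DC = 1    # only the DC component is nonzero
    DCT_4X4 = 2   # nonzero coefficients confined to the 4x4 low-frequency corner
    DCT_8X8 = 3   # anything else


def classify_block(block):
    """Classify an 8x8 block (8 rows of 8 ints) by where its nonzero coefficients lie."""
    nonzero = [(r, c) for r in range(8) for c in range(8) if block[r][c] != 0]
    if not nonzero:
        return BlockType.DCT_ZERO
    if nonzero == [(0, 0)]:
        return BlockType.DCT_DC
    if all(r < 4 and c < 4 for r, c in nonzero):
        return BlockType.DCT_4X4
    return BlockType.DCT_8X8


def is_sparse(block_type):
    """Sparse blocks are all types except DCT_8X8."""
    return block_type is not BlockType.DCT_8X8
```

Because the test inspects every coefficient, it does not depend on the scan position of the last coded coefficient, which is the difference from an EOB-based decision noted in the text.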


Requantization errors in the frequency domain at point W in Figure 1 are generally smaller than the decoded DCT coefficients at point Y. Therefore, when only these errors are requantized, almost all of them become zero at point Z. In other words, when all coefficients at point Y are zero, all coefficients at point X also become zero. Using this property, the transcoder bypasses the transcoding processes when the block type at point Y is DCT_ZERO. A block diagram of the simplified transcoder is shown in Figure 3. The DCT, inverse quantization (IQ2), requantization (RQ), two subtractions, and one addition of Figure 1 are unnecessary, so their processing time is eliminated. The transcoder stores the motion-compensated requantization errors in memory and encodes the DCT_ZERO block.

Software simulation was performed to take statistics of DCT_ZERO. The 6 Mbps bitstreams described above were transcoded to 3 Mbps bitstreams. The average percentage of DCT_ZERO for P pictures is shown in Figure 4. The percentage of DCT_ZERO at point Y is 45.9% in the bitstreams whose GOP structure is (1, 15) and 32.8% in the bitstreams whose structure is (3, 15). Almost all of these blocks are again DCT_ZERO at point X.
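The bypass described above can be sketched as follows. This is a sketch under stated assumptions, not the authors' implementation: the function name `transcode_p_block` and its parameters are hypothetical, the four transforms are passed in as callables so that the control flow stays self-contained, and the sign convention (stored error = output reconstruction minus input reconstruction) is one assumption that is consistent with the "two subtractions and one addition" mentioned in the text.

```python
def transcode_p_block(y, mc_error, dct, idct, requantize, dequantize):
    """Transcode one non-intra block of a P picture.

    y          : 64 decoded DCT coefficients (point Y)
    mc_error   : 64 motion-compensated requantization errors (pixel domain)
    dct, idct, requantize, dequantize : callables on 64-element lists
    Returns (levels, new_error): levels handed to VLC and the errors to store.
    """
    if not any(y):
        # DCT_ZERO at point Y: the output block is assumed to be all zero,
        # so DCT, RQ, IQ2, and IDCT are skipped and the motion-compensated
        # errors are carried over to memory unchanged.
        return [0] * 64, list(mc_error)

    w = dct(mc_error)                                    # point W
    compensated = [a - b for a, b in zip(y, w)]          # first subtraction
    levels = requantize(compensated)                     # RQ
    z = dequantize(levels)                               # IQ2, point Z
    x = [a - b for a, b in zip(z, y)]                    # second subtraction, point X
    new_error = [a + b for a, b in zip(idct(x), mc_error)]  # addition
    return levels, new_error
```

On the bypass path none of the transform callables is invoked, which is where the saving comes from; the only remaining work is MC and the memory store.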

[Figure 3 (block diagram). Blocks: VLD, MC, MEM, VLC. Signals: input MPEG-2 bitstream, motion vector, motion-compensated requantization errors, DCT_ZERO block, output MPEG-2 bitstream.]

Figure 3: A simplified transcoder block diagram.

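[Figure 2 (stacked bar chart, values in %):]

Bitstream                        DCT_ZERO   DCT_DC   DCT_4X4   DCT_8X8
(M=1, N=15) 6Mbps to 3Mbps          2.3      13.0      50.2      34.5
(M=1, N=15) 8Mbps to 4Mbps          2.4      12.7      40.4      44.5
(M=3, N=15) 6Mbps to 3Mbps          1.9      13.1      37.1      47.9
(M=3, N=15) 8Mbps to 4Mbps          1.9      11.3      30.2      56.6

Figure 2: An average percentage of each block type.

[Figure 4 (bar chart, 0% to 100%, DCT_ZERO versus the others). Recoverable labels: (M=1, N=15) point Y 45.9 / 54.1, point X 44.9, 97.9; (M=3, N=15) point Y 32.8 / 67.2, point X 32.2, 98.4.]

Figure 4: An average percentage of DCT_ZERO.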

2.2. Motion compensation with rounding control

REMCT reduces the complexity of the cascaded transcoder in [2] by optimizing its processes. This reduction presupposes that the MC operation is linear. However, the MPEG-2 MC operation is not linear. In the MC, motion vectors are specified to an accuracy of one half sample. The relationship between half-samples and actual samples is shown in Figure 5. For example, half-sample b is interpolated from actual samples A and B as follows:

$$ b = (A + B + 1) / 2 \tag{1} $$

The arithmetic operator "/" denotes integer division with truncation of the result toward minus infinity. The half-samples are rounded to the nearest integer, and a half integer is rounded away from zero. Accordingly, rounding errors occur when the requantization errors are motion-compensated with half-sample accuracy. This is not a serious problem in REMCT as long as the rounding errors in interpolation do not accumulate. Commonly, half-sample b of the requantization errors is interpolated as follows:

$$ b = \begin{cases} (A + B + 1) / 2, & A + B \ge 0 \\ (A + B) / 2, & \text{otherwise} \end{cases} \tag{2} $$

These samples are rounded symmetrically with respect to zero because the requantization errors are signed integers. In this interpolation, the expectation of the rounding errors is zero if positive and negative sums are equally probable, so the rounding errors do not accumulate. However, the interpolation has to test the sign of A + B for every pixel, and these judgments are computationally expensive. Therefore, in the developed transcoder, half-sample b is interpolated as follows:

$$ b = \begin{cases} (A + B + 1) / 2, & P^{+}\ \text{picture} \\ (A + B) / 2, & P^{-}\ \text{picture} \end{cases} \tag{3} $$

where $P^{+}$ picture and $P^{-}$ picture denote the odd-numbered and even-numbered P pictures counted from the last decoded I picture. This interpolation is also used in the MPEG-4 standard [11]. Its effect on the rounding errors is similar to that of equation (2), and it reduces the conditional judgment from one per pixel to one per picture.
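The three interpolation rules of equations (1) to (3) can be compared with a short sketch. The function names and the `p_index` parameter (the 1-based count of P pictures since the last decoded I picture) are hypothetical; Python's `//` operator is used because it truncates toward minus infinity, matching the definition of "/" given above.

```python
def half_sample_mpeg2(a, b):
    """Equation (1): always add 1 before the floor division."""
    return (a + b + 1) // 2


def half_sample_symmetric(a, b):
    """Equation (2): round half integers away from zero, one sign test per pixel."""
    s = a + b
    return (s + 1) // 2 if s >= 0 else s // 2


def half_sample_alternating(a, b, p_index):
    """Equation (3): the rounding direction depends only on the picture.

    p_index is the 1-based count of P pictures since the last decoded
    I picture; odd pictures round up, even pictures round down.
    """
    if p_index % 2 == 1:
        return (a + b + 1) // 2
    return (a + b) // 2
```

For a pair such as A = 1, B = 2 (exact average 1.5), the alternating rule returns 2 on odd P pictures and 1 on even P pictures, so the rounding errors of two consecutive P pictures cancel. In a real loop the parity test would sit outside the per-pixel loop, which is the point of the method.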













[Figure 5 (diagram). Actual samples A, B, C, D; half-samples (A+B+1)/2, (A+C+1)/2, and (A+B+C+D+2)/4.]

Figure 5: Relationship between half-samples and actual samples.

2.3. Implementation of a software transcoder

In addition to the algorithmic improvements above, fast implementation techniques are introduced into the developed transcoder. Many coding parameters are reused in the transcoding processes for I and B pictures, since these processes have no feedback of the requantization errors [7]. For example, the code words for dct_dc_size and dct_dc_differential are simply copied without decoding the DC component, and the copied component is not requantized. The code words for coded_block_pattern, macroblock_address_increment, and so on, are used for terminating the RQ process of a DCT block or a macroblock.

The transcoder employs concurrent variable length decoding and single-instruction multiple-data (SIMD) optimization, as does the transcoder presented in [12]. For example, four DCT coefficients are processed in parallel in DCT, IDCT, and RQ by using MMX instructions [13].

A variable bitrate (VBR) coding method is effective for video recording applications. To improve image quality, the transcoder employs the VBR coding method presented in [14] in addition to a constant bitrate (CBR) coding method. The method uses the quantization scales of the input bitstream as minimum quantization scales to avoid unnecessary bit generation. The coding parameters of the input bitstream, such as the quantization scales and the number of bits, are used for VBR control.

3. PERFORMANCE EVALUATION

The performance improvement of the developed transcoder was evaluated on a Pentium® III. The CBR 6 Mbps bitstreams were transcoded to CBR 3 Mbps. The average percentage of processing time for the bitstreams is shown in Figure 6, where Proposed indicates the transcoder with the proposed methods and Conventional indicates the transcoder without them. To confirm the algorithmic improvement of the methods, both transcoders were implemented without the SIMD optimization.

With the fast MC method, the share of processing time spent in MC is reduced from 14.1% to 8.5% in the bitstreams whose GOP structure is (1, 15) and from 9.6% to 5.6% in the bitstreams whose structure is (3, 15). With the sparse block transcoding method, the share of the other processes is reduced from 85.9% to 61.1% and from 90.4% to 71.9%, respectively. The total processing time is reduced from 100% to 69.6% and from 100% to 77.5%. Additionally, with the SIMD optimization, the total time is reduced to 24.5% and to 37.3%.

Average luminance PSNRs over the bitstreams are shown in Table 1. The average PSNRs of Proposed are equal to or slightly higher than those of Conventional. In subjective viewing tests, no difference in image quality was perceived.
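The reported shares can be cross-checked with a few lines of arithmetic. This is only a worked check of the figures quoted above; the dictionary layout and the names `reported` and `reduction_ratios` are hypothetical.

```python
# Shares of the Conventional total (in %), as quoted in the text:
# (before, after) for MC and for all other processes, plus the Proposed total.
reported = {
    "(1, 15)": {"mc": (14.1, 8.5), "other": (85.9, 61.1), "total": 69.6},
    "(3, 15)": {"mc": (9.6, 5.6), "other": (90.4, 71.9), "total": 77.5},
}


def reduction_ratios(entry):
    """Return (MC ratio, other-process ratio, recomputed total) for one GOP structure."""
    mc_before, mc_after = entry["mc"]
    other_before, other_after = entry["other"]
    return mc_after / mc_before, other_after / other_before, mc_after + other_after


for gop, entry in reported.items():
    mc_ratio, other_ratio, total = reduction_ratios(entry)
    print(f"{gop}: MC x{mc_ratio:.2f}, others x{other_ratio:.2f}, total {total:.1f}%")
```

The MC ratios come out close to three fifths for both GOP structures, and the two remaining shares add up to the quoted totals.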

The transcoder was also run on a 1.0 GHz Pentium® III to verify its long-term behavior. It transcoded input CBR 6 Mbps bitstreams to VBR 3 Mbps, and it was confirmed to run in real time for a full day.

[Figure 6 (stacked bar chart, 0% to 100%, axis from Fast to Slow). Segments: VLD+VLC, DCT, IDCT, RQ+IQ, MC, Others. (M=1, N=15): Conventional (100%) = 13.6, 27.5, 13.0, 13.6, 14.1, 18.2 in segment order; Proposed total 69.6%, of which MC is 8.5. (M=3, N=15): Conventional (100%), of which MC is 9.6; Proposed total 77.5%, of which MC is 5.6.]

Figure 6: An average percentage of processing time in bitstreams (Algorithmic improvement without a SIMD optimization).

Table 1: Average PSNRs (dB) for luminance through bitstreams.

GOP structure   Transcoder     Cheer   Flower   Mobile
M=1, N=15       Proposed       27.72   27.43    25.00
M=1, N=15       Conventional   27.72   27.40    24.96
M=3, N=15       Proposed       27.29   28.05    26.58
M=3, N=15       Conventional   27.29   28.05    26.57

4. CONCLUSION

For implementing a fast software MPEG-2 video transcoder, a sparse block transcoding method and a fast MC method have been proposed. To confirm the validity of these methods, the performance improvement was evaluated on a Pentium® III. The sparse block transcoding method bypassed the process for about half of all blocks of P pictures in the bitstreams whose GOP structure was (1, 15). The fast MC method reduced the processing time of MC to three fifths. The sparse block transcoding method reduced the processing time of the other processes to seven tenths. With the SIMD optimization, the total processing time of the transcoder was reduced to a quarter without image quality degradation. Behavior verification confirmed that the transcoder runs in real time.

5. REFERENCES

[1] ISO/IEC 13818-2, "Information technology - Generic coding of moving pictures and associated audio information - Part 2: Video", 1995.
[2] H. Sun, W. Kwok, and J. W. Zdepski, "Architectures for MPEG compressed bitstream scaling", IEEE Trans. Circuits Syst. Video Technol., vol. 6, pp. 191-199, Apr. 1996.
[3] Y. Nakajima, H. Hori, and T. Kanoh, "Rate conversion of MPEG coded video by re-quantization process", ICIP'95, vol. 3, pp. 408-411, 1995.
[4] R. Klein, W.H.A. Bruls, B. Hunneman, and M. Holl, "A low-complexity MPEG-2 bit-rate transcoding algorithm", ICCE'2000, THAM 15.5, pp. 316-317, 2000.
[5] P. A. A. Assunção and M. Ghanbari, "Post-processing of MPEG2 coded video for transmission at lower bit rates", ICASSP'96, vol. 4, pp. 1998-2001, 1996.
[6] G. Keesman, R. Hellinghuizen, F. Hoeksema, and G. Heideman, "Transcoding of MPEG bit-streams", Signal Processing: Image Communication, vol. 8, pp. 481-500, 1996.
[7] P. A. A. Assunção and M. Ghanbari, "A frequency-domain video transcoder for dynamic bit-rate reduction of MPEG-2 bit streams", IEEE Trans. Circuits Syst. Video Technol., vol. 8, pp. 953-967, Dec. 1998.
[8] D. F. Zucker, M. J. Flynn, and R. B. Lee, "Improving performance for software MPEG players", COMPCON'96, pp. 327-332, 1996.
[9] E. Murata, M. Ikekawa, and I. Kuroda, "Fast 2D IDCT implementation with multimedia instructions for a software MPEG2 decoder", ICASSP'98, vol. 5, pp. 3105-3108, 1998.
[10] Test Model 5, ISO/IEC JTC1/SC29/WG11/N0400, MPEG93/457, 1993.
[11] ISO/IEC 14496-2, "Information technology - Coding of audio-visual objects - Part 2: Visual", 1999.
[12] Y. Senda and H. Harasaki, "A real-time software MPEG transcoder using a novel motion vector reuse and a SIMD optimization techniques", ICASSP'99, vol. 4, pp. 2359-2362, 1999.
[13] Intel, "Intel architecture MMX™ technology, Programmer's reference manual", Order No. 243007-001, 1996.
[14] Y. Yokoyama and Y. Ooi, "A Scene-Adaptive One-Pass Variable Bit Rate Video Coding Method for Storage Media", ICIP'99, vol. 3, pp. 827-831, 1999.