H.264/AVC to MPEG-2 Video Transcoding Architecture

Sandro Moiron (1), Sérgio Faria (1,2), Pedro Assunção (1,2), Vitor Silva (1,3), António Navarro (1,4)

(1) Instituto de Telecomunicações, Portugal; (2) ESTG, Instituto Politécnico de Leiria, Portugal; (3) DEEC, Univ. de Coimbra, Portugal; (4) DET, Univ. de Aveiro, Portugal.
{sandro.moiron, sergio.faria, amado, vitor}@co.it.pt, [email protected]

Abstract – Video transcoding is assuming an important role in modern video communication systems, as it allows interoperability between different equipment and standards. So far, most effort has been spent on transcoding from older standards to the most recent ones. However, as new standards reach higher levels of compression efficiency, the diversity of bitstream formats also increases. In this paper we propose a new transcoding architecture for H.264/AVC (Advanced Video Coding) to MPEG-2 Video. Since both standards are based on the same coding paradigms, interframe block-based motion estimation and intraframe transform coding, the proposed architecture exploits their similarities in order to achieve an efficient format conversion. The proposed transcoder significantly reduces the computational complexity of the MPEG-2 Video motion estimation by using the information included in the H.264/AVC bitstream. The results obtained with our architecture show a computation reduction of as much as 30%, with a small reduction in objective quality.

I. INTRODUCTION

The MPEG-2 Video standard [1] is currently the most widely used scheme for digital video compression. Its simplicity and efficiency have led to its global adoption for home and professional use and to the widespread availability of players and recorders, namely for digital television and DVD. The newer coding standard H.264/AVC [2], developed jointly by VCEG (Video Coding Experts Group) of the ITU (International Telecommunication Union) and MPEG (Moving Picture Experts Group) of ISO/IEC, achieves a much better compression performance than MPEG-2 Video [3]. For this reason H.264/AVC is becoming more popular than its counterparts, namely on the internet, in HDTV (High Definition Television) and in mobile TV, and it is gradually taking the place of MPEG-2 Video. The co-existence of these two standards brings interoperability problems. Therefore, a system to convert H.264/AVC content to MPEG-2 Video format, i.e. a transcoder, is necessary to maintain backward compatibility.

In the recent past, much effort has been put into developing efficient transcoders to convert MPEG-2 Video into H.264/AVC, in order to obtain a more efficient content format [4]. To the authors' knowledge, H.264/AVC to MPEG-2 Video conversion has only been addressed in [5], where only P frames with a single reference are used in the H.264/AVC stream.

A simple video transcoder may be implemented as a cascade of an H.264/AVC decoder and an MPEG-2 Video encoder. However, this scheme completely discards the computational effort already spent by the H.264/AVC encoder, which is recorded in the bitstream as the decisions taken to encode each type of block. Additionally, the decoded H.264/AVC images have to be fully encoded as if no previous coding information existed. In order to avoid this complexity, our transcoder reuses the information contained in the H.264/AVC bitstream to simplify the MPEG-2 Video encoding process. Thus, this paper proposes a simplified transcoding architecture that achieves a significant computation reduction with minimal objective quality reduction, when compared with a full decoding and re-encoding architecture.

Section II presents the proposed architecture and the assumptions made in its design. Section III describes the incompatibility issues, as well as the methods developed to convert between the macroblock types of both standards. Section IV presents the simulation results for the proposed architecture. Finally, section V draws some conclusions and points out issues for further improvement of the transcoder performance.

II. ARCHITECTURE

The type of transcoding is very important, as it influences both the decoded image quality and the associated computational complexity. The simplest transcoder architecture consists of a decoder that fully decodes the original video, followed by an encoder that produces the desired format. Because it executes the full motion estimation procedure, this architecture achieves better quality than any other. However, this performance comes at a high computational cost, which is not desirable. An alternative method is therefore preferable in order to reduce the computational complexity and, consequently, the implementation cost.
Since this cascaded architecture achieves the best quality, it is used as the reference transcoder for comparison with the proposed one. By exploiting the information embedded in the bitstream it is possible to develop a faster architecture, since the coded data contains several parameters that were previously computed by the H.264/AVC encoder and can therefore be used to minimise the computational complexity at the MPEG-2 Video encoder side. The transcoding architecture proposed in this paper is based on a cascade transcoder where the decoder and the encoder are, respectively, the H.264/AVC JM10.2 [6] and the MPEG-2 Video v1.2 [7].

Figure 1 - Transcoding Architecture.

Figure 1 presents the structure of the proposed cascaded architecture, in which the transcoder is composed of a decoder block and an encoder block. In addition to this decoder-encoder model, the transcoder includes a functional block between the H.264/AVC Decoder and the MPEG-2 Video Encoder. This new module is what distinguishes the proposed architecture from the classic cascade: it runs in parallel with the complete decode and encode steps and interacts directly with the encoder, modifying its original operation so that the extracted parameters shortcut the motion estimation routines. Motion estimation is one of the most time-consuming tasks at the encoder, which makes it a good starting point for improvement [8]. Since both standards share the same theoretical principles in temporal prediction, i.e., a block-based approach, the motion estimation process in the transcoder can use most of the original motion vectors as good candidates for MPEG-2 Video encoding, saving a great deal of computation.

When a video sequence is encoded with H.264/AVC, its characteristics are analysed in detail and each macroblock is encoded at the best rate-distortion (R-D) point. Therefore, the optimal R-D coding modes may be reused at the MPEG-2 Video encoder side. This is done by the Conversion Module, which extracts several parameters, such as macroblock information, picture type and motion information, from the H.264/AVC decoder. These parameters are then processed and converted to the MPEG-2 Video encoder format. In the MPEG-2 Video encoder this information is used to modify the normal motion estimation procedure, avoiding a large number of operations. Furthermore, this parameter extraction procedure can also be applied to intraframe coding, by converting the Integer Transform coefficients to Discrete Cosine Transform coefficients [9].

A. H.264/AVC Decoder

Among the three modules shown in Figure 1, the H.264/AVC Decoder plays an important role, as it carries most of the preprocessed information that will be used by the MPEG-2 Video Encoder. The H.264/AVC Decoder is fully implemented, since the bitstream must be completely decoded in order to generate the images that may have been used as references during the encoding process. These reference images are also made available to the MPEG-2 Video Encoder. The extracted information shared with the Conversion Module consists of parameters such as macroblock and partition type, motion vectors, reference images and prediction modes.

B. Conversion Module

The Conversion Module is responsible for interfacing between the decoder and the encoder by converting a set of H.264/AVC parameters into MPEG-2 Video format. In both interframe and intraframe coding these parameters are analysed and converted when they are useful for reducing computational complexity. The module is divided into three independent submodules: parameter extraction and conversion, IT/DCT coefficient conversion, and flow control management. The first one handles the inter prediction modes, the second one converts the H.264/AVC transform coefficients into their MPEG-2 Video equivalents, and the third one manages the video data flow between the decoder and the encoder. The interframe prediction mode conversion is explained in section III, and the transform coefficient conversion was already addressed in [9].

C. MPEG-2 Video Encoder

This MPEG-2 Video encoder has the additional capability of receiving a set of parameters from the Conversion Module, which also provides several control signals in order to modify some MPEG-2 Video functions, namely the motion estimation. The extracted parameters, after being processed by the Parameter Converter, are delivered to the Motion Estimation block in order to modify the motion vector search algorithm and substitute the motion vectors with the parameters converted from the H.264/AVC decoder. This procedure avoids carrying out the motion estimation function, thus resulting in a significant reduction in computational complexity. In the Motion Estimation module, the function checks whether the current macroblock can be represented with the converted H.264/AVC information. If the macroblock can be efficiently encoded based on the extracted parameters, the encoder takes the converted motion vector as the best prediction, without performing motion estimation. As described in the next section, there are several differences between the two standards which limit the conversion efficiency.
For non-compatible motion compensated prediction modes, the transcoder resorts to classical motion estimation at the expense of higher computational complexity.
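The encoder-side decision described above can be sketched roughly as follows. This is an illustrative Python sketch under stated assumptions, not the authors' implementation: the `MacroblockInfo` fields, the `is_convertible` rule (restricted here to 16×16 macroblocks of P slices, the only case converted in this paper) and the `full_search` callback are hypothetical names introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple

MotionVector = Tuple[int, int]  # (x, y) in half-pel units, as used by MPEG-2 Video


@dataclass
class MacroblockInfo:
    """Hypothetical container for parameters handed over by the Conversion Module."""
    slice_type: str                        # "I", "P" or "B"
    partition: str                         # e.g. "16x16", "16x8", "8x8"
    converted_mv: Optional[MotionVector]   # None when no usable vector was produced


def is_convertible(mb: MacroblockInfo) -> bool:
    """Assumed rule: reuse only 16x16 macroblocks of P slices that carry a converted vector."""
    return (
        mb.slice_type == "P"
        and mb.partition == "16x16"
        and mb.converted_mv is not None
    )


def select_motion_vector(
    mb: MacroblockInfo,
    full_search: Callable[[MacroblockInfo], MotionVector],
) -> Tuple[MotionVector, bool]:
    """Return (motion vector, reused flag).

    The converted vector is taken as the best prediction when the macroblock
    is compatible; otherwise the encoder falls back to its classical search.
    """
    if is_convertible(mb):
        return mb.converted_mv, True
    return full_search(mb), False
```

Passing the search routine as a callback keeps the sketch independent of any particular MPEG-2 encoder; in a real encoder the fallback would be its existing motion estimation function.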

III. MODE CONVERSION

As stated before, the new set of tools provided by H.264/AVC produces new macroblock modes that have no direct correspondence with the MPEG-2 Video tools. Mode conversion techniques are therefore essential to convert the parameters extracted from the H.264/AVC video. In this paper, we only describe the mode conversion for 16×16 pixel blocks of P slices. The remaining modes are not implemented yet.

A. 16×16 SKIP

The conversion of H.264/AVC SKIP macroblocks to MPEG-2 Video is limited by several constraints, which reduce the number of macroblocks whose coding modes can be converted quickly. In P slices, an H.264/AVC SKIP macroblock usually represents an image area with either constant motion or no motion, while an MPEG-2 Video SKIP macroblock represents only a static area. In other words, while H.264/AVC encodes a macroblock as SKIP when the differential motion vector is zero, MPEG-2 Video does so only when the absolute motion vector is zero. In both cases, the reference image is the last image marked for reference. When a SKIP macroblock is decoded in H.264/AVC, its motion vector is inferred from previously decoded neighbouring macroblocks rather than transmitted, meaning that the differential motion vector is null. According to the MPEG-2 Video standard, a SKIP macroblock can be neither the first nor the last macroblock of a slice; therefore, as each slice must begin and end in the same row, the macroblocks at the left and right picture boundaries cannot be encoded as SKIP. This restriction is more relevant when encoding frames with small spatial resolution, as the boundary macroblocks then represent a larger fraction of the image area.

B. 16×16 Predicted

The 16×16 macroblocks that were not coded as SKIP by the H.264/AVC encoder are divided into two main groups, according to their distance to the reference image used for prediction.
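A minimal sketch of the SKIP test of subsection A, and of the reference-distance handling that subsection B goes on to describe, is given here for illustration. Several details are assumptions not stated in the paper: the function names are hypothetical, each slice is assumed to span one full macroblock row, ties in the quarter-pel to half-pel rounding are broken away from zero, and a vector exceeding the configured range is simply reported as not convertible (the paper instead mentions additional processing for such cases).

```python
from fractions import Fraction
from typing import Optional, Tuple

QuarterPelMV = Tuple[int, int]  # H.264/AVC vector in quarter-pel units
HalfPelMV = Tuple[int, int]     # MPEG-2 Video vector in half-pel units


def skip_is_compatible(mv_qpel: QuarterPelMV, mb_col: int, mbs_per_row: int) -> bool:
    """An H.264/AVC SKIP maps to an MPEG-2 SKIP only if the absolute vector is
    zero and the macroblock is neither first nor last in its row
    (assumption: one slice per macroblock row)."""
    is_static = mv_qpel == (0, 0)
    is_row_boundary = mb_col == 0 or mb_col == mbs_per_row - 1
    return is_static and not is_row_boundary


def _round_half_away(value: Fraction) -> int:
    """Round to the nearest integer; the tie rule (away from zero) is an assumption."""
    magnitude = int(abs(value) + Fraction(1, 2))
    return magnitude if value >= 0 else -magnitude


def convert_mv(
    mv_qpel: QuarterPelMV,
    ref_distance: int,
    target_distance: int = 1,
    max_abs_hpel: Optional[int] = None,
) -> Optional[HalfPelMV]:
    """Rescale a vector proportionally to the reference distance and round it
    from quarter-pel to half-pel accuracy. Returns None when the result
    exceeds the (user-defined) MPEG-2 vector range."""
    if ref_distance < 1 or target_distance < 1:
        raise ValueError("distances must be positive")
    scale = Fraction(target_distance, ref_distance)
    # Dividing by 2 converts quarter-pel units into half-pel units.
    converted = tuple(_round_half_away(Fraction(c) * scale / 2) for c in mv_qpel)
    if max_abs_hpel is not None and any(abs(c) > max_abs_hpel for c in converted):
        return None
    return converted
```

For example, a quarter-pel vector (8, -4) that points two reference images back becomes (2, -1) in half-pel units when rescaled to unit distance. Exact rational arithmetic is used so that the rounding step is not affected by floating-point error.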
The first group includes those at unit distance from their reference image, while all the others belong to the second group. This distinction arises because in MPEG-2 Video the reference image can only be the closest previously encoded reference image, while H.264/AVC can use up to 16 reference images. The first group is compatible with MPEG-2 Video, while the motion vectors of macroblocks in the second group need to be converted to the MPEG-2 Video type. The motion vector conversion assumes that the motion vector length is related to the reference image distance [10]. Therefore, the motion vectors are rescaled proportionally, as in figure 2, according to the distance between the reference image used and the last encoded image in the MPEG-2 Video sequence. Since the motion vector accuracy differs between the standards, i.e., 1/4 pixel in H.264/AVC and 1/2 pixel in MPEG-2 Video, all 1/4 pixel motion vectors are converted by rounding to the nearest 1/2 pixel in order to minimise the mismatch. If H.264/AVC motion estimation is performed across the picture boundaries, a further incompatibility with MPEG-2 Video is introduced. These macroblocks therefore undergo additional processing in order to generate a new set of motion vectors compatible with the MPEG-2 Video format. Note that in MPEG-2 Video the motion vector range, which is a user-defined parameter, limits the range of the re-calculated motion vectors.

Figure 2 - Motion vector scaling.

C. Sub-Partitions

The H.264/AVC standard allows additional partitioning of macroblocks into several blocks (16×8, 8×16, 8×8, 8×4, 4×8 and 4×4). Since MPEG-2 Video, in progressive coding, only allows 16×16 pixel macroblocks, H.264/AVC macroblocks with several sub-partitions need to be converted into MPEG-2 Video 16×16 pixel macroblocks, which is a considerable challenge. In the previous mode conversions there was only one reference image for the whole macroblock. Now, as the macroblock is segmented into several partitions, each partition can have its own reference frame. Additionally, each partition can be divided into sub-partitions (smaller than 8×8 pixels); however, all sub-partitions of a partition share the same reference frame. The conversion of a set of small blocks into a larger block has been discussed before in [11], but in a different context, namely temporal and spatial scaling of images. This mode conversion has therefore not been implemented yet, and these macroblocks follow the classical full search, like all the other macroblocks not addressed here.

IV. RESULTS

The following results were obtained using a personal computer with a 3 GHz processor and 1.5 GB of RAM. All tests were performed using 720×576 video sequences encoded at 5 Mbit/s CBR with R-D optimisation on and a GOP size of 12 following an 'IBPBPBP' structure.
The sequences used for this test have different intrinsic properties. The first sequence, Mobile Calendar, includes a vertical panning and objects with horizontal and vertical motion. The second sequence, Stockholm, consists of a continuous horizontal panning with small objects moving randomly. The third sequence, Shields, consists of a horizontal panning that slows down, followed by a zoom in.

Table I - Proposed transcoder achieved image quality at 5 Mbit/s (PSNR in dB).

               Mobile Calendar   Stockholm   Shields
H.264/AVC      41.159            38.986      38.849
MPEG-2         36.571            36.240      34.605
TranscRefer    36.425            36.127      34.559
TranscModif    36.291            36.082      34.499

Table I shows the objective quality (Peak Signal-to-Noise Ratio, PSNR) comparison between the proposed architecture and the classical cascade architecture for the various sequences. In Table I, the entry H.264/AVC corresponds to the PSNR of the original sequence used for transcoding, while the entry MPEG-2 is the PSNR obtained from single-pass MPEG-2 Video encoding at the same bitrate. It was found that the transcoder performance is consistent for other sequences with different characteristics. As mentioned before, the modified transcoder (TranscModif) implements macroblock conversion techniques only for P-type macroblocks of size 16×16. As can be seen in figure 3 for the Stockholm sequence, the objective quality of the modified transcoder is very close to that of the reference transcoder (TranscRefer), which includes the complete decoding of H.264/AVC and the subsequent encoding with MPEG-2 Video. It is also very close to that of the sequence encoded in a single pass with MPEG-2 Video, which is the present scenario of digital television transmission. The execution time of both transcoders was also measured, and it was found that, for a very similar objective image quality, the proposed approach leads to a strong reduction in computational complexity, i.e., up to 30% in comparison with the reference transcoder. Note that this gain was achieved by applying the proposed approach to only 36% of the P-slice macroblocks.
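For reference, PSNR values such as those in Table I follow the usual definition, 10·log10(peak² / MSE). The sketch below computes it for two equally sized sample sequences. It assumes 8-bit samples (peak value 255) and says nothing about which colour components or how many frames were averaged in the reported figures, since the paper does not state this; the function name `psnr` is chosen for the example.

```python
import math
from typing import Sequence


def psnr(reference: Sequence[float], decoded: Sequence[float], peak: float = 255.0) -> float:
    """Peak Signal-to-Noise Ratio in dB between two equally sized sample sequences.

    peak=255 assumes 8-bit samples. Identical inputs give infinity.
    """
    if len(reference) != len(decoded) or len(reference) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    # Mean squared error over all samples.
    mse = sum((r - d) ** 2 for r, d in zip(reference, decoded)) / len(reference)
    if mse == 0:
        return math.inf
    return 10.0 * math.log10(peak * peak / mse)
```

A frame would typically be passed as a flattened list of luminance samples; a sequence-level figure can then be obtained by averaging the per-frame values.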


Figure 3 - PSNR comparison for Stockholm sequence.

V. CONCLUSIONS

The proposed scheme focuses on reducing the computational complexity of a video transcoder between the H.264/AVC and MPEG-2 Video standards. In particular, it exploits the similar tools in the interframe coding of both standards to re-encode 16×16 pixel macroblocks using information included in the H.264/AVC bitstream. Since these macroblocks have already been analysed by the H.264/AVC encoder, it is possible to reduce the computational complexity by as much as 30% (only for P-type macroblocks) in comparison with the reference architecture, for similar objective quality.

As future work, in order to further improve the computational efficiency, the remaining macroblock types and partition sizes will also be converted by fast transcoding. It is also possible to improve the image quality by performing a motion estimation refinement, in a small search window, for those macroblocks whose motion vectors are not very representative of their motion. However, this additional complexity must be carefully weighed against the gain in objective quality. This refinement is expected to increase the quality for those macroblocks composed of smaller blocks, by exploiting the information retrieved from the H.264/AVC bitstream.

ACKNOWLEDGEMENT

This work was sponsored by the IT/LA H2M project from Instituto de Telecomunicações.

REFERENCES

[1] ITU, Information technology - Generic coding of moving pictures and associated audio information: Video, ITU-T Recommendation H.262, Feb. 2000.

[2] Joint Video Team (JVT) of ISO/IEC MPEG and ITU-T VCEG, “Draft ITU-T Recommendation and Final Draft International Standard of Joint Video Specification (ITU-T Rec. H.264/ISO/IEC 14496-10 AVC)”.

[3] Ajay K. Luthra, Gary J. Sullivan, and Thomas Wiegand, “Introduction to the special issue on the H.264/AVC video coding standard”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, pp. 557–559, July 2003.

[4] Hari Kalva, “Issues in H.264/MPEG-2 video transcoding”, Computer Science and Engineering, 2004.

[5] L. Yang, X. Song, C. Hou, and J. Dai, “H.264 MPEG-2 transcoding based on personal video recorder platform”, in Proc. of the Ninth International Symposium on Consumer Electronics, June 2005, pp. 438–440.

[6] Suehring, JM H.264, http://iphome.hhi.de/suehring/tml/index.htm.

[7] MPEG2 v1.2, http://www.mpeg.org/MSSG/.

[8] Ishfaq Ahmad, Wei Xiaohui, Yu Sun, and Ya-Qin Zhang, “Video transcoding: an overview of various techniques and research issues”, IEEE Transactions on Multimedia, vol. 7, pp. 793–803, Oct. 2005.

[9] Ricardo Marques, Sérgio Faria, Pedro Assunção, Vitor Silva, and António Navarro, “Fast conversion of H.264/AVC integer transform coefficients into DCT coefficients”, SIGMAP, pp. 5–8, Aug. 2006.

[10] Mei-Juan Chen, Ming-Chung Chu, and Chih-Wei Pan, “Efficient motion-estimation algorithm for reduced frame-rate video transcoder”, IEEE Transactions on Circuits and Systems for Video Technology, vol. 12, pp. 269–275, Apr. 2002.

[11] Anthony Vetro, Charilaos Christopoulos, and Huifang Sun, “Video transcoding architectures and techniques: An overview”, IEEE Signal Processing Magazine, vol. 20, pp. 18–29, Mar. 2003.