mm09-52
Arbitrary spatial downsizing in H.264 to MPEG4 simple profile transcoder
Sourya Bhattacharyya*, Subarna Tripathi*, and Emiliano Mario Piccinelli#
Advanced System Technology, STMicroelectronics Pvt. Ltd.
*Greater Noida, U.P. - 201308, India; #20041 Agrate Brianza (MI), Italy
Email: {sourya.bhattacharyya, subarna.tripathi, emiliano.piccinelli}@st.com
Abstract: We present a novel arbitrary spatial downsizing method for an H.264 to MPEG4 simple profile transcoder. Using median filtering and scaling of the input H.264 4x4 block level motion vectors, the downsizing module generates 4x4 block level motion vectors compatible with the target resolution. These are fed to the MPEG4 encoder, which generates the final MPEG4 compatible 16x16 or 8x8 block level motion vectors. The downsizing and encoding modules are kept separate for better portability. Our approach is approximately 6-8 times faster than a full encoding system and achieves almost the same compression.
Index Terms: MPEG4 Simple Profile, Multimedia processing, Video coding and processing.
I. INTRODUCTION
The digital video codec standard H.264/AVC [1] combines very high compression efficiency with error recovery and a transmission unit based implementation; these features have made it widely popular, mainly for compressing high resolution streams. MPEG4 [2], on the other hand, is widely used in low power mobile devices, where computational complexity, resolution and bandwidth requirements must be low. Compatibility between these two standards is ensured by our pixel domain transcoding architecture (Fig. 1), which is integrated in the ST proprietary Dynamic Bitstream Shaper (DBS) [3] transcoding library.
Our transcoder is composed of a modified full decoder and a modified encoder. Reusable decoded information is stored in a shared memory buffer (SMB). The encoder then uses it to create the optimized output stream, with the system synchronized at frame level. The main information shared between the decoding and encoding sides consists of the fully decoded YUV frames (stored in a shared frame buffer, SFB, together with their native picture types), macroblock (MB) information (such as MB type), motion vectors (MV) and reference frame indices. The encoding part completely avoids motion estimation, which accounts for most of the reduction in computational complexity.
Between the decoding and encoding parts, we added a fully arbitrary downsizing module to satisfy transmission bandwidth constraints or specific user requirements, such as a low resolution device display. The key steps of downsizing are: analyze the input to output downsizing ratio (integer or fractional, in the X or Y direction); determine the input image macroblocks needed to produce one macroblock of the output image, together with their MV values, MB partitions and MB coding types; and adapt this information to a format compatible with the target standard.
Fig. 1: Our proposed transcoder architecture (SFB: shared frame buffer, carrying pixel data from decoder to encoder; SMB: shared MB info buffer, carrying macroblock information)
The following sections describe related work, our proposed downsizing methodology, the corresponding encoding strategy, and the final transcoding results compared with standalone encoders.
II. PRIOR ART
Many transcoding solutions with downsizing are available in the literature. For example, [4] describes an H.264 baseline to MPEG4 simple profile transcoder with a 2:1 downsizing ratio in both the horizontal and vertical directions, but it does not support arbitrary or fractional downsizing. The work in [5] is also limited to integer ratio downsizing; moreover, it is based on 16x16 MB partitions and the corresponding MV processing. The work in [6] supports arbitrary downsizing: it takes input 16x16 MV information and produces H.264 compatible output MVs (for 16x16 to 8x8 partitions). We use and extend its arbitrary downsizing approach, adapting H.264 input 4x4 level MVs to MPEG4 output 16x16 or 8x8 MVs. The work in [7], on H.264 to H.264 transcoding, processes 4x4 block level input MV information, but it does not avoid motion re-estimation completely. Moreover, mode refinement for H.264 is computationally far more complex than our proposed simple algorithm. Finally, the work in [8] deals with video transcoding with variable block-size motion estimation, but its top-down approach for determining the partitions' motion vectors differs from our algorithm.
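(Fig. 2 stages, in order: H.264 decoder; input resolution compatible 4x4 block level MV and MB type information; arbitrary downsizing with MV refinement; output resolution compatible 4x4 block level MV and MB information; MPEG4-SP encoder based adaptation; MPEG4-SP specific 16x16 or 8x8 block based MV; encoding in the MPEG4-SP encoder.)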
III. PROPOSED DOWNSIZING TECHNIQUE
In most of the prior art mentioned above, output motion vectors are derived from the input MVs, the downsizing factor, and the desired output MB partition type (which is highly standard dependent). In our approach, by contrast, the output MB partition type is derived bottom-up, i.e., from the computed MV of the smallest downsized block (4x4). This approach generalizes well and is portable as a black-box technique to any standard that supports variable size motion compensation blocks.
A schematic diagram of our proposed downsizing module is shown in Fig. 2. The downsizing module processes the H.264 input 4x4 block level quarter pixel (QPEL) resolution MVs and MB information (partition, coding type) to generate 4x4 block level MVs at the target resolution. This information is fed into the MPEG4 encoder.
Fig. 3 shows the 4x4 motion vector mapping technique used in our downsizing module. In Fig. 3, 2 full 4x4 blocks and 10 partial 4x4 blocks contribute to the formation of one output 4x4 block. For this target 4x4 block, we compute the overlapped area of each participating 4x4 input block from the corresponding pixel boundaries: left (xl), right (xr), top (yt) and bottom (yb). If the overlapping area of a participating input 4x4 block is greater than one quarter of the 4x4 block size, we insert that block's MV into the list of candidate MVs. Once the candidate 4x4 MV list is complete, we take the median of the candidates and scale the resulting MV in both the X and Y directions according to the corresponding downscaling ratios. This scaled MV is the MV for the current output 4x4 block.
Fig. 2: Arbitrary Downsizing Module Schematic Diagram
Fig. 3: 4x4 block based MV map in downsizing (figure labels: corner coordinates (xl, yt), (xr, yt), (xr, yb), (xl, yb); block P; block dimensions 4 x 4)
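The candidate selection, median and scaling steps of Section III can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function name `downsized_mv`, the component-wise lower median, the round-to-nearest scaling and the zero-MV fallback are all assumptions, since the text does not specify them.

```python
import math
from fractions import Fraction
from statistics import median_low

BLOCK = 4  # edge of a 4x4 block, in pixels


def downsized_mv(in_mvs, ox, oy, ratio_x, ratio_y):
    """Derive the MV of output 4x4 block (ox, oy) from input 4x4 block MVs.

    in_mvs[by][bx] holds the (mvx, mvy) pair of input block (bx, by).
    ratio_x and ratio_y are input/output size ratios (>= 1), e.g. Fraction(3, 2).
    """
    # Footprint of the output block, expressed in input pixel coordinates.
    xl, xr = ox * BLOCK * ratio_x, (ox + 1) * BLOCK * ratio_x
    yt, yb = oy * BLOCK * ratio_y, (oy + 1) * BLOCK * ratio_y

    candidates = []
    for by in range(int(yt // BLOCK), math.ceil(yb / BLOCK)):
        for bx in range(int(xl // BLOCK), math.ceil(xr / BLOCK)):
            if by >= len(in_mvs) or bx >= len(in_mvs[by]):
                continue  # footprint extends past the frame edge
            w = min(xr, (bx + 1) * BLOCK) - max(xl, bx * BLOCK)
            h = min(yb, (by + 1) * BLOCK) - max(yt, by * BLOCK)
            # Keep the block only if it covers more than 1/4 of a 4x4 block.
            if w * h * 4 > BLOCK * BLOCK:
                candidates.append(in_mvs[by][bx])

    if not candidates:
        return (0, 0)  # assumption: fall back to a zero MV

    # Assumption: component-wise median, taking the lower middle value.
    med_x = median_low([mv[0] for mv in candidates])
    med_y = median_low([mv[1] for mv in candidates])
    # Assumption: scale by the downsizing ratio and round to the nearest integer.
    return (round(Fraction(med_x) / ratio_x), round(Fraction(med_y) / ratio_y))
```

With a 3:2 ratio in both directions, output block (1, 0) covers input pixels 6 to 12 horizontally and 0 to 6 vertically; the input block that overlaps it by exactly 2x2 pixels is rejected because its area does not exceed one quarter of a 4x4 block.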
For each generated output MB, we track the reference types (INTRA or forward) of the participating input MBs. If all participating MBs are of INTRA type, the generated MB type is INTRA. Otherwise, it is a forward reference MB (MPEG4 simple profile does not allow backward prediction).
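The MB type rule above can be written as a short Python sketch. The function name `output_mb_type` and the string labels "INTRA" and "FORWARD" are hypothetical and chosen only for illustration; the list of participating input MB types is assumed to be non-empty.

```python
def output_mb_type(input_mb_types):
    """Return the type of an output MB from its participating input MB types."""
    # INTRA only when every participating input MB is INTRA;
    # any other mix is coded as a forward reference MB.
    if all(t == "INTRA" for t in input_mb_types):
        return "INTRA"
    return "FORWARD"
```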
IV. ENCODING WITH DERIVED BLOCK LEVEL MV
We use a very simple encoding technique, quite similar to the H.264 to MPEG4 block based coding technique described in [4]. From the previously calculated 4x4 MVs at the output resolution, we derive the output MVs for the 16x16 and 8x8 MB partitions (as supported by MPEG4 simple profile) by the following procedure.
1) An input INTRA 16x16 or 4x4 MB is coded in the INTRA 16x16 MB partition. We can skip an INTRA MB if its associated coded block pattern (CBP) is zero.
2) For an input 16x16 inter or skip MB, we first convert the input QPEL resolution 16x16 MV to the half pixel resolution 16x16 MV supported by MPEG4. We then code the current MB in 16x16 partition mode if its associated CBP is nonzero; otherwise we skip it.
3) For an inter MB with any other input partition, we derive the average 8x8 block level half pixel output MV from the input 4x4 quarter pixel MV information by the following formula:
$$a\_mv_{i,j} = \left[ \left\{ \sum_{k=2i}^{2i+1} \sum_{l=2j}^{2j+1} mv_{k,l} + 2 \right\} \gg 2 \right] / 2 \qquad (1)$$
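As an illustration of formula (1), the Python sketch below averages one MV component over the four 4x4 sub-blocks of each 8x8 block and converts it from quarter pixel to half pixel units. The function name `avg_8x8_mvs` is hypothetical. Two assumptions are made: the right shift behaves as an arithmetic shift, and the final division by 2 truncates toward zero as integer division does in C; the text does not state either.

```python
def avg_8x8_mvs(mv):
    """Apply formula (1) to one MV component of a macroblock.

    mv[k][l] is the QPEL component of the 4x4 sub-block in row k, column l
    (k, l in 0..3). Returns a 2x2 list of half pixel values a_mv[i][j].
    """
    a_mv = [[0, 0], [0, 0]]
    for i in range(2):
        for j in range(2):
            # Sum the four 4x4 MVs that make up 8x8 block (i, j).
            total = sum(mv[k][l]
                        for k in (2 * i, 2 * i + 1)
                        for l in (2 * j, 2 * j + 1))
            # Rounded average in QPEL units: (sum + 2) >> 2.
            avg_qpel = (total + 2) >> 2
            # QPEL to half pixel; assumption: truncate toward zero.
            a_mv[i][j] = int(avg_qpel / 2)
    return a_mv
```

The same function would be called once for the horizontal component and once for the vertical component of the sixteen input MVs.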
Subscripts i, j denote the vertical and horizontal indices of the average MVs of the four output 8×8 blocks. Subscripts k, l denote the vertical and horizontal motion vector indices of the sixteen 4×4 sub-blocks in each input H.264 MB. A block conversion process is then applied: if the differences among the a_mv_{i,j} vectors are all less than 9 and all a_mv_{i,j} vectors have the same direction, the Inter16×16 mode is selected; otherwise the Inter8×8 mode is selected.
With this procedure, the 8x8 coding mode is sometimes selected excessively, even when its residual is the same as or larger than that of the 16x16 mode, which costs 3 extra MVs of overhead per MB. This happens because successive approximation of the MVs during transcoding introduces MV value differences across 4x4 blocks. To prevent excessive use of the 8x8 MB partition (and thus to avoid losing compression in a constant QP environment), we compare the sum of absolute differences (SAD) for both 16x16 coding (SAD16x16) and 8x8 coding (SAD8x8) of those MBs originally decided to be coded in 8x8 partition mode:
if (SAD8x8 < (SAD16x16 - SADthreshold_difference)) then employ 8x8 coding, else employ 16x16 coding.
That is, we code the current MB in 8x8 partition mode only if 8x8 coding yields a significantly better SAD than the 16x16 MB partition coding mode. SADthreshold_difference is a dynamic threshold, a linear function of the quantization parameter (QP) (MPEG4 QP range 1-31): SADthreshold_difference = (QP