Motion Estimation for Frame-Rate Reduction in H.264 Transcoding

Il-hong Shin, Yung-Lyul Lee* and HyunWook Park
Dept. of Electrical Engineering, Korea Advanced Institute of Science and Technology, 373-1 Guseong-dong, Yuseong-gu, Daejeon, 305-701, Korea
[email protected]
Department of Internet Engineering, Sejong University, Seoul, Korea

Abstract
This paper proposes a transcoding method for frame-rate reduction in the H.264 video coding standard. H.264 adopts various block types and multiple reference frames for motion compensation. When frames are skipped to reduce the frame rate in a transcoder, it is not easy to estimate the optimum motion vectors and block types in H.264. A simple and effective block-adaptive motion vector resampling (BAMVR) method is proposed to estimate the motion vectors for motion compensation. In order to improve coding efficiency and visual quality, the rate-distortion optimization (RDO) algorithm is also combined with the BAMVR method in the transcoder. In the experiments, the rate-distortion performance and computational complexity of the proposed transcoder are analyzed for various video sequences. The proposed BAMVR with RDO method requires about 38 % of the computation of the full-search motion estimation (ME) with RDO method.

1. Introduction
Ubiquitous networks require situation-aware control of application services such as video and audio transmission. In a home-network server, transcoding is required to allow spontaneous adaptation of the bitrate, which depends on the screen size or processing power of the mobile device. Transcoding can be an efficient method to achieve spontaneous bitrate adaptation and good rate-distortion performance [1]-[3]. A simple transcoding method is to convert original bitstreams into lower-bitrate bitstreams to meet the required channel bandwidth. Various transcoder structures have been proposed for the H.263 [4], MPEG-2 [5], and MPEG-4 [6] codecs.

The H.264 [7] codec, which is a joint video compression standard of ITU-T and ISO/IEC (MPEG-4 Part 10 AVC, Advanced Video Coding), improves video coding performance. Although it has the same building blocks of video coding as the previous MPEG-4 standard [6], it has some improved features such as the 4×4 integer transform, multiple reference frames, variable block types for motion compensation (MC), quarter-pixel MC, universal variable length coding (UVLC) or context-based adaptive binary arithmetic coding (CABAC), and a non-normative rate-distortion optimization (RDO) tool that is used to decide the optimal block type. With these features, H.264 is expected to improve coding efficiency by more than 50 % compared to existing coding standards such as H.263, MPEG-2, and MPEG-4. H.264 can be used in many applications such as video on demand (VOD), teleconferencing, and distance learning. Therefore, transcoding from previous codecs to H.264, or from H.264 to H.264, is required.

This paper proposes an H.264-to-H.264 transcoder that reduces the frame rate. We also propose a block-adaptive motion vector resampling (BAMVR) method to estimate the optimum motion vectors for MC. An advanced method, called the BAMVR with RDO method, is also proposed to improve the rate-distortion performance compared to the BAMVR only method. The proposed H.264 transcoder structure is described in Section 2. The BAMVR only method and the BAMVR with RDO method are introduced in Section 3. Experimental results and analysis of the proposed methods are given in Section 4, and our concluding remarks are given in Section 5.
2. The proposed transcoder
The number of reference frames for tracing motion vectors in the transcoder is $S = N_{FrameSkip} + 1$, where $N_{FrameSkip}$ is the number of frames skipped in the transcoder.
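As a hedged illustration of this bookkeeping, the Python sketch below lists which input frames survive the skipping and evaluates S. The function names are hypothetical, and the 30 fps input rate is only an example value.

```python
def kept_frames(num_frames, n_frame_skip):
    """Indices of input frames retained when n_frame_skip frames
    are dropped after every retained frame."""
    return list(range(0, num_frames, n_frame_skip + 1))


def tracing_span(n_frame_skip):
    """S = N_FrameSkip + 1, the number of reference frames for tracing."""
    return n_frame_skip + 1


def output_frame_rate(input_fps, n_frame_skip):
    """Frame rate after skipping n_frame_skip frames per retained frame."""
    return input_fps / (n_frame_skip + 1)


if __name__ == "__main__":
    print(kept_frames(10, 3))        # [0, 4, 8]
    print(tracing_span(3))           # 4
    print(output_frame_rate(30, 3))  # 7.5
```

With three skipped frames, every fourth frame is retained, so a 30 fps input becomes a 7.5 fps output.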
The block diagram of the proposed transcoder is shown in Fig. 1; it has a straightforward cascaded architecture of a decoder and an encoder. Motion estimation usually takes most of the computation in the encoder. The proposed BAMVR method is used to select the optimum motion vectors and reference frames. The loop filter improves visual quality by reducing blocking artifacts. The non-linearity of the loop filter generates drift and mismatch artifacts in transcoding [1]. The proposed cascade-type transcoder for H.264 is drift-free and mismatch-free thanks to the feedback loop and the 4×4 integer transform, respectively.
In H.264, multiple reference frames are used in the motion estimation process in order to obtain better motion compensation performance and to help make the H.264 bitstream error resilient [8]. In this paper, multiple reference frames are not used, for simplicity.
[Figure 2 labels: MV data, $S = (N_{FrameSkip} + 1) \times N_{REF}$; case 1, case 2, case 3; $mv = mv_t + mv_{t-1}$]
Figure 2. An example of video frames that are retained or skipped for motion vector estimation in the transcoder (three frames are skipped)

However, the current block may not be aligned with the blocks in the reference frame. In addition, each block can have a different block type. For example, the reference block of the current 8×8 block in Fig. 3 overlaps 4×4 (upper-left), 4×8 (upper-right), 8×4 (lower-left), and 8×8 (lower-right) blocks in the reference frame. The motion vectors of the four overlapped blocks should then be traced, since block types vary widely within a frame in H.264. Therefore, tracing motion vectors is difficult because of the various overlapping regions when frames are skipped in the transcoder.
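The misalignment described above can be made concrete with a small Python sketch. It assumes, for simplicity, that the reference frame is partitioned on a uniform 4×4 grid, that motion vectors are in integer pixels, and that no clipping at the frame boundary is needed; H.264 partitions actually vary in size, and the function name and argument layout are hypothetical.

```python
def overlaps_with_grid(x, y, size, mv, grid=4):
    """Return (cell_col, cell_row, overlap_w, overlap_h) for every grid
    cell touched by a size x size block at (x, y) displaced by mv."""
    rx, ry = x + mv[0], y + mv[1]  # top-left corner of the reference block
    result = []
    for cy in range(ry // grid, (ry + size - 1) // grid + 1):
        # vertical extent shared by the reference block and this cell row
        oh = min(ry + size, (cy + 1) * grid) - max(ry, cy * grid)
        for cx in range(rx // grid, (rx + size - 1) // grid + 1):
            # horizontal extent shared with this cell column
            ow = min(rx + size, (cx + 1) * grid) - max(rx, cx * grid)
            result.append((cx, cy, ow, oh))
    return result


if __name__ == "__main__":
    # A 4x4 subblock at (8, 8) displaced by (-3, -1) straddles four cells.
    for cell in overlaps_with_grid(8, 8, 4, (-3, -1)):
        print(cell)
```

An aligned displacement touches a single cell, while a general displacement of a 4×4 subblock touches up to four cells whose overlap areas sum to 16 pixels.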
Figure 1. Block diagram of the proposed transcoder structure
3. Block-adaptive motion vector resampling (BAMVR) method
The optimized motion vector can be obtained by re-estimating a new motion vector in the transcoder. However, motion estimation (ME) requires high computational complexity unless it utilizes the motion vectors of the input video stream. Many studies [2]-[3] exploit the incoming motion vectors available in the decoder of the transcoder to estimate new motion vectors. Fig. 2 shows an example of frame skipping in the transcoder. $N_{REF}$, which is set to 1 here (the maximum number of reference frames in the current H.264 standard is 5), denotes the number of multiple reference frames (MR) in the original H.264 encoder, and $N_{FrameSkip}$ is the number of frames to be skipped in the transcoder.

[Figure 3 labels: 4×4 subblocks SB00, SB01, SB10, and SB11 of the 8×8 block in the current frame; overlapped blocks OB0 to OB3 with motion vectors MV0 to MV3, reference blocks Ref0 to Ref3, and overlap sizes h0 to h3 and w0 to w3 in the reference frame (skipped in the transcoder)]
Figure 3. An example of motion vector resampling of the decomposed 4×4 blocks
To solve this problem, an M × N block is divided into 4×4 subblocks to trace the motion vectors, where M × N can be 16×16, 16×8, 8×16, 8×8, 8×4, 4×8, or 4×4 according to the block type. Tracing the motion vector of each 4×4 subblock provides an efficient composition method. The motion vector of the kj-th 4×4 subblock (SBkj) of Fig. 3 in the M × N block can be traced as follows:
$$MV_{kj} = \frac{\sum_{i=0}^{3} A(OB_i) \times h_i \times w_i \times MV_i}{\sum_{i=0}^{3} A(OB_i) \times h_i \times w_i} \qquad (1)$$

where the subscripts k and j denote the horizontal and vertical indices of the 4×4 subblocks in the M × N block, and $A(OB_i)$ denotes the area of the i-th overlapped block in the skipped reference frame. In eq. (1), $MV_i$ is the motion vector of the i-th overlapped block in Fig. 3, and $h_i$ and $w_i$ are the horizontal and vertical sizes of the overlapping region between the 4×4 subblock and the i-th overlapped block, respectively. After obtaining the motion vector $MV_{kj}$ of the kj-th 4×4 subblock, the motion vector of the current M × N block is estimated as follows:

$$MV = \left\lfloor \frac{\sum_{k=0}^{M/4-1} \sum_{j=0}^{N/4-1} \alpha_{kj} \times MV_{kj}}{\sum_{k=0}^{M/4-1} \sum_{j=0}^{N/4-1} \alpha_{kj}} \right\rfloor \qquad (2)$$

where MV is the motion vector of the current M × N block in the frame-skipping situation, $\lfloor x \rfloor$ denotes the nearest integer less than or equal to x, and the weighting factor $\alpha_{kj}$ is set to one in the paper. Using eqs. (1) and (2), new motion vectors are adaptively estimated for the current M × N block from the incoming block types and motion vectors when the first reference frame is skipped in the transcoder.

The pseudo code of the BAMVR method in the frame-skipping situation is shown in Fig. 4, where % denotes the modulo operator and MVt is the final motion vector of the current block obtained by the BAMVR method. The while loop in Fig. 4 guarantees the validity of the reference frame number: the adaptive estimation is repeated until meet_condition becomes 0.

    meet_condition = 1
    n = reference frame number of current block
    mv = motion vector of current block
    while (meet_condition)
    {
        adaptive estimation of current motion vector (MVtemp) using eqs. (1)-(2)
        n = n + 1;
        mv = mv + MVtemp;
        if ((n + 1) % (N_FrameSkip + 1) == 0) meet_condition = 0
    }
    MVt = mv

Figure 4. Pseudo code of the proposed BAMVR method in the transcoder

The BAMVR method that reuses the incoming block type as the current block type is called the BAMVR only method; that is, the transcoded block type is defined to be the same as the incoming block type. When the QP value in the transcoder differs from that of the original compressed bitstream, the incoming block type may not be optimal in the transcoder, because the rate and distortion characteristics vary with QP. Direct use of the incoming block type also degrades visual quality.

In order to improve the rate-distortion performance and visual quality in the transcoder, the BAMVR only method is combined with RDO [7], which selects the block type that minimizes the rate-distortion function from {INTRA4×4, INTRA16×16} for INTRA frames, and from {INTRA4×4, INTRA16×16, SKIP, 16×16, 16×8, 8×16, P8×8} for INTER frames, where P8×8 means 8×8, 8×4, 4×8, and 4×4 for each 8×8 block. The improved method, which is called the BAMVR with RDO method, is illustrated in Fig. 5. The arrow in each block indicates the direction and magnitude of a motion vector. At first, each 8×8 block in the macroblock (MB) is divided into four 4×4 blocks, and the motion vector of the 8×8 block is copied to the four 4×4 blocks. For each 4×4 block, $MV_{kj}$ is obtained by using eq. (1). After obtaining the motion vector and the reference frame number of each 4×4 block, the motion vectors of the M × N block type are obtained by using eq. (2). Therefore, seven sets of motion vectors are obtained for the seven block types. RDO is applied to these seven sets of MVs for the seven block types and to the two intra block types in order to select the best block type in terms of rate-distortion performance. In addition to the BAMVR method, motion vector refinement can be executed to improve the rate-distortion performance. This paper adopts a motion vector refinement process with a search range of ±2 in the experiments.

The Foreman and Silent sequences have large, complex local motion and small motion, respectively. The full-search ME with RDO method uses a motion search range of ±16, and the proposed transcoder employs a search window of ±2 pixels in integer units for motion vector refinement. In this paper, fast motion estimation methods [9] are not applied to the transcoder. If they were applied, the computational complexity of the proposed transcoder would be reduced further.
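Equations (1) and (2) and the loop of Fig. 4 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names, the tuple layout of `overlaps`, and the `estimate_mv` callback are hypothetical, and motion vectors are plain (x, y) pairs in whatever unit the bitstream uses.

```python
import math


def resample_subblock(overlaps):
    """Eq. (1): weighted mean of the overlapped blocks' motion vectors.

    overlaps: list of (area, h, w, (mvx, mvy)), one entry per overlapped
    block OB_i; the weight of each entry is area * h * w.
    """
    total = sum(a * h * w for a, h, w, _ in overlaps)
    mvx = sum(a * h * w * mv[0] for a, h, w, mv in overlaps) / total
    mvy = sum(a * h * w * mv[1] for a, h, w, mv in overlaps) / total
    return (mvx, mvy)


def compose_block(sub_mvs, alphas=None):
    """Eq. (2): floor of the alpha-weighted mean of the subblock vectors."""
    if alphas is None:
        alphas = [1.0] * len(sub_mvs)  # alpha_kj = 1, as set in the paper
    total = sum(alphas)
    mvx = sum(a * mv[0] for a, mv in zip(alphas, sub_mvs)) / total
    mvy = sum(a * mv[1] for a, mv in zip(alphas, sub_mvs)) / total
    return (math.floor(mvx), math.floor(mvy))


def trace_motion_vector(mv, n, n_frame_skip, estimate_mv):
    """Loop of Fig. 4: accumulate vectors until the stop test is met.

    estimate_mv(mv, n) is a caller-supplied callback (an assumption of
    this sketch) that returns MVtemp, e.g. by applying eqs. (1)-(2).
    """
    while True:
        mv_temp = estimate_mv(mv, n)
        n += 1
        mv = (mv[0] + mv_temp[0], mv[1] + mv_temp[1])
        if (n + 1) % (n_frame_skip + 1) == 0:
            break
    return mv, n


if __name__ == "__main__":
    sub = resample_subblock([(16, 4, 4, (2, -2)), (16, 4, 4, (4, 0))])
    print(sub)                                    # (3.0, -1.0)
    print(compose_block([(1.5, -0.5), (2.5, -1.5)]))  # (2, -1)
    # Three skipped frames, constant motion of (1, 0) per frame.
    print(trace_motion_vector((1, 1), 0, 3, lambda mv, n: (1, 0)))
```

With three skipped frames and n starting at 0, the loop runs three times, so four vectors in total are summed before the traced vector reaches a retained frame.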
[Figure 5 labels: 16×16 block; decompose into 16 4×4 blocks; estimate $MV_{kj}$ for every 4×4 block (eq. (1)); MV of each 8×8, 8×4, 4×8, 16×8, and 8×16 block; MV of 16×16 block by using eq. (2); RDO in P8×8 for each 8×8 block; determined P8×8 type; two intra block types; RDO in MB; final block type]
Figure 5. An example of motion vector composition of an 8×8 block using the BAMVR with RDO method
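The block-type decision of the BAMVR with RDO method can be sketched as a minimum search over candidate modes. The paper only states that RDO selects the block type minimizing the rate-distortion function; the Lagrangian form J = D + λR used below is the common formulation and is an assumption here, as are the function name and the distortion and rate numbers, which are made up for illustration.

```python
def select_block_type(candidates, lam):
    """Return (mode, cost) with the smallest J = D + lam * R.

    candidates: dict mapping a block-type name to (distortion, rate_bits).
    lam: Lagrange multiplier (assumed; normally derived from QP).
    """
    def cost(mode):
        distortion, rate = candidates[mode]
        return distortion + lam * rate

    best = min(candidates, key=cost)
    return best, cost(best)


if __name__ == "__main__":
    # Hypothetical per-macroblock measurements, not taken from the paper.
    candidates = {
        "16x16": (1000, 40),
        "16x8": (900, 60),
        "P8x8": (700, 120),
        "INTRA4x4": (600, 300),
    }
    print(select_block_type(candidates, 4))   # ('16x8', 1140)
    print(select_block_type(candidates, 20))  # ('16x16', 1800)
```

A larger multiplier penalizes rate more heavily, which pushes the choice toward coarser block types that need fewer bits.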
4. EXPERIMENTAL RESULTS
The proposed transcoder was implemented using the H.264 JM4.2 video codec [7] with CABAC [7], variable block-based ME/MC, a motion vector search range of ±16, quarter-pixel MC, the 4×4 integer DCT, one reference frame, and rate-distortion optimization. In the experiments, the original compressed bitstreams have a QP value of 10 at 30 frames per second (fps) to provide a large variation of PSNR (peak signal-to-noise ratio) and bitrate in the transcoder. The first frame was coded as an INTRA frame, and all the others as INTER frames. Bi-directionally predicted frames (B-frames) of H.264 were not considered in the paper. PSNR, bitrate, and computational complexity were analyzed for two video sequences, "Foreman" and "Silent", in the quarter common intermediate format (QCIF).
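For readers who want to reproduce the quality metric, a minimal PSNR computation for 8-bit samples is sketched below. The paper does not say which planes or averaging were used, so treating a frame as a flat list of samples with a peak value of 255 is an assumption of this sketch, and the function name is hypothetical.

```python
import math


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equally sized sequences of samples."""
    if len(original) != len(decoded) or not original:
        raise ValueError("inputs must be non-empty and of equal length")
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    if mse == 0:
        return math.inf  # identical inputs
    return 10 * math.log10(peak * peak / mse)


if __name__ == "__main__":
    # Every sample is off by one, so the mean squared error is 1.
    print(round(psnr([100] * 4, [101] * 4), 2))
```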
[Figure 6 panels: (a) Foreman, 7.5 Hz; (b) Silent, 10 Hz. Axes: PSNR [dB] versus rate [kbits]. Curves: BAMVR only, BAMVR with RDO, full-search ME with RDO]
Figure 6. Rate-distortion (PSNR vs. bitrate) plots of the three methods for (a) Foreman and (b) Silent

Fig. 6 shows the rate-PSNR curves of the three methods: the proposed BAMVR only method, where the incoming block types are reused, the BAMVR with RDO method, and the full-search ME with RDO method. The PSNR of the proposed BAMVR with RDO method is degraded by approximately 0.2 dB from that of the full-search ME with RDO method for Silent, as shown in Fig. 6(b). However, a video sequence with fast and complex motion, such as Foreman, has a more severe PSNR degradation of 0.5 dB, as shown in Fig. 6(a). In order to reduce the PSNR degradation, a larger search window can be applied for motion vector refinement in fast-motion sequences. Note, however, that the BAMVR with RDO method shows a PSNR improvement of 1 to 3 dB over the BAMVR only method.

[Figure 7 bar chart: the computational complexity, shown as the relative complexity ratio (0 to 1) of BAMVR only, BAMVR with RDO, and full-search ME with RDO]
Figure 7. Average computation amount

Fig. 7 shows the average computation amounts over the two video sequences for the BAMVR only method, the BAMVR with RDO method, and the full-search ME with RDO method. For the comparison, we used a Pentium IV 1.7 GHz with 512 MB of memory. The relative computational complexity of the proposed BAMVR with RDO method is about 38 % of that of the full-search ME with RDO method. Although the BAMVR only method requires the lowest computation amount, about 11 % of that of the full-search ME with RDO method, it shows poor rate-distortion performance and visual quality.

5. CONCLUSION
We proposed an efficient transcoder for frame-rate reduction in H.264. The BAMVR method was proposed to reduce the computational complexity of estimating optimum motion vectors for the various block types. In addition, the BAMVR with RDO method was suggested to improve rate-distortion performance and visual quality. The experimental results show that the BAMVR with RDO method stays within about 0.2 to 0.5 dB of the PSNR of the full-search ME with RDO method while requiring about 38 % of its computation. We expect that the proposed method will be applied to real-time transcoding of H.264 bitstreams in the near future.

Acknowledgement
This work has been supported by CUCN (National Center of Excellence in Ubiquitous Computing and Networking).

6. REFERENCES
[1] G. Keesman et al., "Transcoding of MPEG bitstreams", Signal Processing: Image Commun., vol. 8, pp. 481-500, 1996.
[2] J. Youn, M.-T. Sun, and C.-W. Lin, "Motion vector refinement for high performance transcoders", IEEE Trans. Multimedia, vol. 1, pp. 30-40, Mar. 1999.
[3] B. Shen, I. K. Sethi, and B. Vasudev, "Adaptive motion vector resampling for compressed video downscaling", IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 929-936, Sept. 1999.
[4] Video Coding for Low Bitrate Communication, ITU-T Draft Recommendation H.263, May 1996.
[5] ISO/IEC 13818-2 (MPEG-2 Video), Information Technology – Coding of Moving Pictures and Associated Audio for Digital Storage Media at up to about 1.5 Mbit/s: Video, 1993.
[6] ISO/IEC JTC1/SC29/WG11, MPEG-4 Video Verification Model 8.0, MPEG97/N1796, July 1997.
[7] T. Wiegand, Joint Final Committee Draft (JFCD) of Joint Video Specification (ITU-T Rec. H.264 | ISO/IEC 14496-10 AVC), JVT-D157, Aug. 2002.
[8] T. Wiegand, X. Zhang, and B. Girod, "Long-term memory motion-compensated prediction", IEEE Trans. Circuits Syst. Video Technol., vol. 9, pp. 70-84, Feb. 1999.
[9] S. Zhu and K.-K. Ma, "A new diamond search algorithm for fast block matching motion estimation", IEEE Trans. Image Processing, vol. 9, pp. 287-290, Feb. 2000.