EXTENDED DIRECT MODE FOR HIERARCHICAL B PICTURE CODING
Jiali Zheng, Xiangyang Ji, Guangnan Ni, Wen Gao
Institute of Computing Technology, Chinese Academy of Sciences, Beijing, China
{jlzheng, xyji, wgao}@jdl.ac.cn; [email protected]
Feng Wu
Microsoft Research Asia, Microsoft, Beijing, China
[email protected]
Abstract—In 3D subband coding, the Motion-Compensated Temporal Filtering (MCTF) technique, especially with the adaptive 5/3 wavelet kernel, efficiently exploits the temporal correlation among neighboring pictures. The same effect can be achieved by hierarchical B picture coding, which uses the same decomposition as the temporal 5/3 lifting structure but omits the update step. This technique is supported by H.264/AVC [1] through its stored B picture technique, and hierarchical B picture coding clearly outperforms the conventional IBBP… GOP structure. Direct mode, which derives motion vectors from the co-located block of the backward reference and spends no bits on coding motion vectors, is a very efficient bi-prediction technique for B picture coding. However, the forward motion vector in the backward reference is not available when the co-located block has only a backward motion vector. In this paper, we propose an extended direct mode for hierarchical B picture coding that further improves the accuracy of the derived motion vectors. It uses the forward and backward motion vectors of the co-located blocks in the references, with proper scaling.
MPEG-4 [3], H.263 [4] and H.264/AVC have all adopted temporal direct mode. Temporal direct mode takes advantage of bi-directional prediction and does not need to transmit motion vectors, because it derives them from the motion vector of the co-located block in the backward reference picture [5]. As shown in Fig.2, the forward motion vector MV0 and backward motion vector MV1 of the direct mode block are calculated as follows:
$$MV_0 = \frac{TR_b}{TR_d} \times MV_C, \tag{1}$$
$$MV_1 = \frac{TR_b - TR_d}{TR_d} \times MV_C, \tag{2}$$
where TRb is the temporal distance between the current B picture and the forward reference picture, TRd is the temporal distance between the forward reference picture and the backward reference picture, and MVC is the motion vector of the co-located block in the backward reference picture.
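The scaling in (1) and (2) can be sketched in a few lines of Python. This is a minimal illustration rather than the codec's implementation: the function name `temporal_direct` is hypothetical, motion vectors are plain (x, y) pairs, and exact fractions stand in for the integer arithmetic and rounding that a real encoder would use.

```python
from fractions import Fraction


def temporal_direct(mvc, tr_b, tr_d):
    """Derive (MV0, MV1) from the co-located vector MVC per (1) and (2).

    mvc  : (x, y) motion vector of the co-located block
    tr_b : distance from the current B picture to the forward reference
    tr_d : distance from the forward reference to the backward reference
    """
    if not 0 < tr_b < tr_d:
        raise ValueError("expected 0 < tr_b < tr_d")
    mv0 = tuple(Fraction(tr_b, tr_d) * c for c in mvc)         # equation (1)
    mv1 = tuple(Fraction(tr_b - tr_d, tr_d) * c for c in mvc)  # equation (2)
    return mv0, mv1


# Hypothetical example: the B picture lies midway between its references.
mv0, mv1 = temporal_direct(mvc=(-8, 4), tr_b=1, tr_d=2)
# mv0 equals (-4, 2) and mv1 equals (4, -2): half of MVC in each direction.
```

Because TRb is smaller than TRd, the factor in (2) is negative, so MV1 always points in the direction opposite to MV0.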
Keywords-hierarchical B picture; extended direct mode; forward reference
I. INTRODUCTION
In the current MPEG Scalable Video Coding (SVC) scheme, temporal scalability is enabled by Motion-Compensated Temporal Filtering (MCTF), which efficiently exploits the temporal correlation among neighboring pictures through prediction and update steps. Hierarchical B picture coding, which has the same decomposition structure as a lifting-based MCTF layer without the update step, is also implemented in H.264/AVC [2]. As illustrated in Fig.1, the first picture is independently coded as an IDR picture, and all remaining pictures are coded in “B...B I/P” groups of pictures using the concept of hierarchical B pictures. The B pictures of the first level (labeled B1) use only the surrounding I/P references for motion-compensated prediction. The B pictures Bi of level i > 1 can use the surrounding I/P references as well as the B pictures Bj of level j < i located in the same group of pictures. The coding order of the pictures in a GOP is “I I/P B1 B2 B2 B3 B3 B3 B3…”.
[Fig.1 diagram, pictures in display order: I, B3, B2, B3, B1, B3, B2, B3, I/P]
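The level assignment and coding order described for Fig.1 can be reproduced with a short Python sketch. The helper names are hypothetical, the GOP size is assumed to be a power of two, and the choice of the two nearest lower-level pictures as references is an assumption made for illustration; the text only states which pictures a B picture is allowed to use.

```python
def b_levels(gop_size):
    """Map each B position 1..gop_size-1 (display order) to its level."""
    if gop_size < 2 or gop_size & (gop_size - 1):
        raise ValueError("gop_size must be a power of two >= 2")
    depth = gop_size.bit_length() - 1
    levels = {}
    for pos in range(1, gop_size):
        trailing_zeros = (pos & -pos).bit_length() - 1
        levels[pos] = depth - trailing_zeros
    return levels


def coding_order(gop_size):
    """Key picture (I/P) first, then B pictures by increasing level."""
    levels = b_levels(gop_size)
    return [gop_size] + sorted(levels, key=lambda p: (levels[p], p))


def nearest_references(pos):
    """Assumed references: closest pictures of a lower level on each side."""
    step = pos & -pos
    return pos - step, pos + step


levels = b_levels(8)
display = ["B%d" % levels[p] for p in range(1, 8)]
# display is B3 B2 B3 B1 B3 B2 B3, matching the labels of Fig.1.
order = coding_order(8)
# order is [8, 4, 2, 6, 1, 3, 5, 7]: I/P, B1, B2, B2, B3, B3, B3, B3.
```

Under this assumed reference choice, the B2 picture at position 6 is predicted from the B1 picture at position 4 and the I/P picture at position 8, which is the kind of B-picture reference the rest of the paper is concerned with.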
[Fig.2 diagram labels: previous picture, current B picture (direct mode block), backward reference (co-located block); motion vectors MV0, MV1, MVC; temporal distances TRb, TRd along the time axis]
Fig.1: The structure of hierarchical B picture
Fig.2: The motion vectors of the direct mode block are derived from the motion vector of the co-located block.
0-7803-9134-9/05/$20.00 ©2005 IEEE
In addition, when the co-located block in the backward reference picture is coded in intra mode, as happens for example at uncovered boundaries, with overlapped blocks, or under luminance changes between adjacent pictures, its motion vector is set to zero [6]. This degrades the motion vectors derived in direct mode. The problem is more serious in hierarchical B picture coding, where reconstructed B pictures are also used as references: the co-located block in a backward B-picture reference is often coded with backward prediction only, so it has no forward motion vector to scale. In other words, the temporal direct mode is not efficient for hierarchical B picture coding. To tackle this problem, we propose an extended direct mode for hierarchical B picture coding that further improves the accuracy of the motion vectors derived in direct mode. With this technique, the forward and backward motion vectors of the co-located blocks in the B-picture references are used, with proper scaling, to efficiently exploit the temporal correlation among neighboring pictures.
The rest of this paper is organized as follows. Section II describes the extended direct mode for hierarchical B picture coding in three different cases. Section III presents simulation results that evaluate the rate-distortion performance of the proposed technique. Section IV gives conclusions and future work.
II. EXTENDED DIRECT MODE FOR HIERARCHICAL B PICTURE CODING
As discussed above, the co-located block in the backward reference picture may own only a backward motion vector or may even be coded in intra mode. Three cases are handled as follows.
A. When the co-located block of the backward reference picture is coded in intra mode, and the co-located block of the forward B-picture reference owns a backward motion vector pointing to the backward reference picture, we can still use this motion vector with proper scaling to derive the forward and backward motion vectors of the current direct mode block. As shown in Fig.3, the forward motion vector MV0 and backward motion vector MV1 of the current direct mode block are derived as follows:
$$MV_0 = \frac{TR_c - TR_d}{TR_d} \times MV_C, \tag{3}$$
$$MV_1 = \frac{TR_c}{TR_d} \times MV_C, \tag{4}$$
where TRc is the temporal distance between the current B picture and the backward reference picture, TRd is the temporal distance between the forward B-picture reference and the backward reference picture, and MVC is the backward motion vector of the co-located block in the forward B-picture reference. In this case, the backward motion vectors of co-located blocks coded in bi-prediction mode can also be used to obtain the motion vectors in direct mode. That is to say, in hierarchical B picture coding the proposed direct mode can derive valid motion vectors from the co-located blocks in more situations than the traditional direct mode.
[Fig.3 diagram labels: forward B reference (co-located block), current B picture (direct mode block), backward reference; motion vectors MV0, MV1, MVC; temporal distances TRc, TRd along the time axis]
Fig.3: The forward and backward motion vectors of the direct mode block are derived from the backward motion vector of the co-located block in the forward B-picture reference.
B. When the backward reference picture is a B picture and the co-located block in it owns only a backward motion vector, we derive the forward motion vector pointing to the forward reference picture from the backward motion vector of the co-located block, according to the temporal distance between the forward reference picture and the backward B-picture reference. As shown in Fig.4, the forward motion vector MVC of the co-located block is calculated as follows:
$$MV_C = \frac{TR_d}{TR_{d'}} \times MV_{C'}, \tag{5}$$
where TRd is the temporal distance between the forward reference picture and the backward B-picture reference, TRd' is the temporal distance between the backward B-picture reference and the subsequent backward reference picture, and MVC' is the backward motion vector of the co-located block in the backward B-picture reference. Once the forward motion vector MVC of the co-located block has been derived, the forward motion vector MV0 and backward motion vector MV1 of the direct mode block are calculated according to (1) and (2). Similarly, when the co-located block in the forward B-picture reference owns only a forward motion vector, we can derive its backward motion vector according to the temporal distance between the forward B-picture reference and the backward reference picture, and then follow case A to calculate the forward and backward motion vectors of the current direct mode block according to (3) and (4).
[Fig.4 diagram labels: forward reference, current B picture (direct mode block), backward B reference (co-located block), subsequent backward reference, subsequent picture; motion vectors MV0, MV1, MVC, MVC'; temporal distances TRd, TRd' along the time axis]
Fig.4: Forward motion vector of the co-located block is derived from backward motion vector.
C. When the co-located blocks in the forward and backward reference pictures are both coded in intra mode, we search previously coded backward pictures for a co-located block whose forward motion vector points to the forward reference picture of the current B picture. As illustrated in Fig.5, we derive the forward motion vector MV0 and backward motion vector MV1 of the current direct mode block by the following equations:
$$MV_0 = \frac{TR_b}{TR_{d'}} \times MV_C, \tag{6}$$
$$MV_1 = \frac{TR_b - TR_d}{TR_{d'}} \times MV_C, \tag{7}$$
where TRb is the temporal distance between the current B picture and the forward reference picture, TRd is the temporal distance between the forward reference picture and the backward reference picture, TRd' is the temporal distance between the forward reference picture and the subsequent picture whose co-located block is selected, and MVC is the motion vector of the selected co-located block in the subsequent picture. The proposed direct mode in case C is also suitable for non-hierarchical B picture coding with multiple references. In some cases, such as non-linear motion and scene changes, this method derives a more accurate prediction than the traditional direct mode.
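The three cases can be summarized in a single Python sketch. It is an illustration under stated assumptions, not the authors' implementation: the `Block` record, the function names and the order in which the cases are tried are hypothetical, and pictures are identified by their display time. The sketch uses signed time differences, so the reversal of direction in case B appears as a negative ratio, whereas (5) is written with unsigned distances; treating (5) this way is an assumption about the intended sign convention. The symmetric variant of case B (a forward-only vector in the forward B-picture reference) is omitted for brevity.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import Optional, Sequence, Tuple

MV = Tuple[Fraction, Fraction]


@dataclass
class Block:
    """Co-located block: picture time plus optional forward/backward MVs."""
    time: int
    fwd_mv: Optional[MV] = None    # None when the block has no forward MV
    fwd_ref: Optional[int] = None  # time of the picture fwd_mv points to
    bwd_mv: Optional[MV] = None    # None when the block has no backward MV
    bwd_ref: Optional[int] = None  # time of the picture bwd_mv points to


def rescale(mv, src_span: int, dst_span: int) -> MV:
    """Linear motion: a vector covering src_span covers dst_span after
    multiplication by dst_span / src_span (both spans are signed)."""
    ratio = Fraction(dst_span, src_span)
    return (mv[0] * ratio, mv[1] * ratio)


def extended_direct(t_cur: int, t_fwd: int, t_bwd: int,
                    col_fwd: Block, col_bwd: Block,
                    later: Sequence[Block] = ()) -> Optional[Tuple[MV, MV]]:
    """Return (MV0, MV1) for the direct mode block, or None if no case fits."""

    def derive(mv, span):
        # Scale one source vector to both prediction directions.
        return rescale(mv, span, t_fwd - t_cur), rescale(mv, span, t_bwd - t_cur)

    # Traditional direct mode, equations (1) and (2).
    if col_bwd.fwd_mv is not None and col_bwd.fwd_ref == t_fwd:
        return derive(col_bwd.fwd_mv, t_fwd - col_bwd.time)
    # Case B: backward-only MV in the backward B reference; (5), then (1)-(2).
    if col_bwd.bwd_mv is not None:
        mvc = rescale(col_bwd.bwd_mv, col_bwd.bwd_ref - col_bwd.time,
                      t_fwd - col_bwd.time)
        return derive(mvc, t_fwd - col_bwd.time)
    # Case A: backward MV of the forward B reference; (3) and (4).
    if col_fwd.bwd_mv is not None and col_fwd.bwd_ref == t_bwd:
        return derive(col_fwd.bwd_mv, t_bwd - col_fwd.time)
    # Case C: search later pictures for a usable forward MV; (6) and (7).
    for blk in later:
        if blk.fwd_mv is not None and blk.fwd_ref == t_fwd:
            return derive(blk.fwd_mv, t_fwd - blk.time)
    return None


def intra(t: int) -> Block:
    """Intra-coded co-located block: no motion vectors at all."""
    return Block(time=t)


# Case A: current picture 6 between B reference 4 and reference 8.
case_a = extended_direct(6, 4, 8, Block(4, bwd_mv=(8, -4), bwd_ref=8), intra(8))
# Case B: current picture 3 between reference 2 and B reference 4.
case_b = extended_direct(3, 2, 4, intra(2), Block(4, bwd_mv=(8, -4), bwd_ref=8))
# Case C: current picture 2 between references 0 and 4, candidate in picture 8.
case_c = extended_direct(2, 0, 4, intra(0), intra(4),
                         later=[Block(8, fwd_mv=(-16, 8), fwd_ref=0)])
```

The three examples model an object moving by (2, -1) per picture interval in an eight-picture GOP. In case A, TRc = 2 and TRd = 4, so (3) and (4) give MV0 = (-4, 2) and MV1 = (4, -2), which is what the sketch returns.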
[Fig.5 diagram labels: forward reference, current B picture (direct mode block), backward reference, subsequent picture (co-located block); motion vectors MVC, MV0, MV1; temporal distances TRb, TRd, TRd' along the time axis]
Fig.5: Forward and backward motion vectors of the direct mode block are derived from the motion vector of the co-located block in the subsequent picture.
III. SIMULATED RESULTS
To evaluate the general performance of the proposed extended direct mode against the traditional direct mode, we integrated this technique into the H.264/AVC reference software JM92 [7]. The test sequences, in QCIF format at 30fps, are Mobile and Tempete. The main test conditions are shown in Table.1. In the test, we enable the hierarchical B picture structure and adopt the IBBBBBBBPBBBBBBBP… GOP structure. Fig.6 shows clearly that, with the proposed direct mode, the number of blocks coded in direct mode in each B picture is much larger than with the traditional direct mode, which shows that the proposed direct mode does improve the accuracy of the derived direct mode motion vectors in hierarchical B picture coding. As described above, the proposed direct mode has the following virtues. Firstly, it uses the correlation of the co-located blocks in both future and past pictures. Secondly, the derived motion vectors are more accurate and robust in the presence of uncovered boundaries, overlapped blocks and luminance changes. Therefore, more blocks are coded in direct mode and better coding performance is achieved. Fig.7 shows that the average gain of the proposed direct mode for B pictures is up to 0.436dB for the Mobile sequence and 0.274dB for the Tempete sequence.
Table.1 Test conditions
Search Range             ±16
Restrict Search Range    no restrictions
MV resolution            1/4 pel
Reference Frames         1
Hadamard                 ON
RD optimization          ON
Symbol Mode              CAVLC
QP                       27, 30, 35, 40
IV. CONCLUSION
In hierarchical B picture coding, the traditional temporal direct mode does not deal efficiently with co-located blocks in B-picture references. This paper proposed an extended direct mode that further improves the accuracy of the motion vectors derived in direct mode by efficiently exploiting the motion information within the B-picture references. In addition, because the motion vectors of the co-located blocks in both the forward and backward references can be used by the extended direct mode, how to combine these motion vectors to obtain a better trade-off between the bit-rate saving for motion vector coding and the prediction accuracy should be studied further.
ACKNOWLEDGMENT
This work is partially supported by the National Science Foundation of China under contract No.60333020, the National Hi-Tech Research Program of China under contract No.2002AA119010 and the National Fundamental Research and Development Program of China under contract No.2001CCA03300.
REFERENCES
[1] ITU-T and ISO/IEC JTC1, “Advanced Video Coding for Generic Audiovisual Services,” ITU-T Recommendation H.264 – ISO/IEC 14496-10 AVC, 2003.
[2] Julien Reichel, Mathias Wien, Heiko Schwarz, “Scalable Video Model 3.0,” ISO/IEC JTC 1/SC 29/WG 11 N6716, October 2004, Palma de Mallorca, ES.
[3] ISO/IEC JTC1, “Coding of audio-visual objects – Part 2: Visual,” ISO/IEC 14496-2 (MPEG-4 Visual), Version 1: April 1999, Amendment 1 (Version 2), Feb. 2000.
[4] ITU-T, “Video Coding for Low Bitrate Communication,” ITU-T Recommendation H.263, Version 1: Nov. 1995, Version 2: Jan. 1998.
[5] M. Flierl and B. Girod, “Generalized B Pictures and the Draft H.264/AVC Video Compression Standard,” IEEE Transactions on Circuits and Systems for Video Technology, vol. 13, no. 7, pp. 587-597, July 2003.
[6] Xiangyang Ji, Debin Zhao, Wen Gao, Qingming Huang, Siwei Ma, Yan Lu, “New Bi-prediction Technique for B picture coding,” The 2004 IEEE International Conference on Multimedia and Expo (ICME’2004), Taibei, Taiwan, Jun. 27-30, 2004.
[7] Joint Video Team (JVT) Reference software, Version 92, http://bs.hhi.de/~suehring/tml/download/
[Fig.6 plots: direct mode number versus B picture index, reference and proposed curves; (a) mobile (qcif 30fps), (b) tempete (qcif 30fps)]
Fig.6: Number of blocks (8x8 block) coded with direct mode in each B picture
[Fig.7 plots: psnr (dB) versus bitrate (kbits/s), reference and proposed curves; (a) mobile (qcif 30fps), (b) tempete (qcif 30fps)]
Fig.7: Rate-distortion curves of traditional direct mode and extended direct mode for hierarchical B picture.