OBJECT BOUNDARY BASED MOTION PARTITION FOR VIDEO CODING
Jianle Chen, SangRae Lee, Kyo-Hyuk Lee and Woo-Jin Han
Digital Media R&D Center, Samsung Electronics Co., Ltd.
{jianle.chen, srlee73, kyohyuk.lee and wjhan.han}@samsung.com
ABSTRACT
In the H.264/MPEG-4 AVC video coding standard, motion compensation can be performed by partitioning macroblocks into square or rectangular regions to improve inter prediction efficiency. However, the current H.264 MB partition set is not optimal, because rigid square or rectangular divisions cannot match the boundary shape of a moving object well. In this paper, we analyze this problem in detail and propose an object-boundary-based motion partition scheme to overcome it. The proposed scheme generates the motion partition map for the current macroblock by segmenting the corresponding area in the reference picture. Because object boundary shapes are continuous across a video sequence, this technique allows the shapes of the partition map to better match the boundaries of moving objects. Experimental results show that the proposed motion partition method improves the coding efficiency of inter P pictures in an H.264 coding system, with an average bit saving of 6.79%. For test sequences with distinct moving objects, the bit saving reaches 10.63%.
Index Terms: motion compensation, macroblock partition, video coding
1. INTRODUCTION
High compression gains for video sequences can be achieved by removing temporal redundancy between images (frames). The current image is predicted by motion-compensated prediction from already encoded reference images. Only the prediction error and the motion information obtained by motion estimation are coded and transmitted. Because of their low implementation cost and relatively high efficiency, block-based motion estimation and compensation methods are widely used to generate the prediction picture. In early international video compression standards such as MPEG-1, H.262/MPEG-2 and H.263, fixed block-size (16x16) motion estimation and compensation is used to generate the prediction picture. The newer video coding standard, H.264/MPEG-4 AVC, extends the block-based motion compensation technique by introducing tree-structured variable block-size (T-VBS) motion compensation [1]. Macroblocks can be split into multiple square or rectangular blocks as small as 4x4 pixels. This method improves prediction performance, especially when a macroblock contains regions with different motions. However, since the shape of a macroblock partition is restricted to squares or rectangles, it does not normally correspond to the outline of the moving object. Furthermore, a large number of bits is required to encode the macroblock partition type and motion vector information when a small block-size mode is selected.
To solve this problem, Kondo proposed a motion compensation method in which macroblocks or sub-macroblocks are divided into two regions (sliced blocks) by an arbitrary straight line, and motion compensation is performed for each region [2]. With the arbitrary straight line, the shapes of the sliced blocks can be matched as closely as possible to the outline of moving objects, and motion compensation performance can be improved. However, this method still has strong limitations. One drawback is that a straight line cannot exactly represent a rounded object boundary. Another is that a considerable number of bits is needed to encode the line information, which reduces coding efficiency. In [2], the arbitrary line is expressed by the positions of the two border points that the line passes through. A similar partition method is described by Hung et al. [3] and Divorra et al. [4], in which the arbitrary line is expressed by an orientation angle θ and a distance ρ from the block’s center.
This paper proposes a method that generates an object-boundary-based motion partition for motion estimation and motion compensation to improve the performance of inter prediction. Instead of transmitting the motion partition information directly, the proposed method derives the partition information from the prediction signal in the reference picture.
Because object boundary shapes are continuous across a video sequence, the proposed method can accurately express an arbitrary object boundary in the currently encoded region with small overhead.
2. OBJECT-BOUNDARY-BASED MOTION PARTITION
2.1. Analysis of motion partition method
The advantages of object-boundary-based motion partition were briefly outlined above. This section addresses several further questions about motion partition for video coding. How does motion partition benefit video coding? What is the most advantageous motion partition? We analyze these questions by comparing several existing partition methods using pictorial examples.
Fig.1 shows an example of inter prediction in video coding. We consider a 16×16 pixel region that consists of two differently predictable regions (assume they are a foreground part F and a background part B). That is, the two parts have different motion and texture. Fig.2 compares several motion partition strategies. In the fixed block-size method (Fig.2 (a)), many pixels in the current region cannot be correctly predicted, so coding efficiency is strongly limited. Fig.2 (b) divides the current region into two 8×16 blocks; as a result, the number of wrongly predicted pixels is reduced. Fig.2 (c) shows the tree-structured variable block-size (T-VBS) partition method of H.264. This method further reduces the number of wrongly predicted pixels, but, as mentioned above, the penalty is a large number of bits to transmit the mode and motion vectors.
For both good prediction performance and small MV overhead, the current region should ideally be divided into two partitions along the object boundary. As shown in Fig.1 and Fig.2 (e), F should be predicted from F' in the reference picture with MV1, and B should be predicted from B' with MV2. This partition is only an ideal case, and it would not be worthwhile if the accurate partition map had to be encoded explicitly. To keep the overhead of transmitting the boundary from outweighing the benefit of motion partition, the sliced block method (Fig.2 (d)) roughly approximates the partition boundary with an arbitrary straight line. This method reduces the number of erroneously predicted pixels with a moderate amount of partition overhead, and coding efficiency is improved when an appropriate rate-distortion optimization method is used.
Fig.1 An example of boundary-based motion partition (regions F' and B' in the reference frame predict regions F and B in the current frame through MV1 and MV2)
Fig.2 Comparison of motion partition methods: (a) fixed block size, (b) two 8×16 blocks, (c) T-VBS, (d) sliced block, (e) object boundary
In this paper we look for a way to obtain the optimal partition map without expressing it through explicit side information. In most cases, we can assume that the boundary shape of a moving object changes only slightly between two consecutive frames of a video sequence. Under this assumption, a boundary similar in shape to each object boundary in the current frame can be found in the reference frame.
2.2. Partition map generation and prediction
The proposed method divides a macroblock into two partitions according to the boundary of the moving object. Instead of expressing the object boundary shape directly, we derive a similar boundary shape from the reference picture. A motion vector is required to locate a region in the reference picture that has a boundary similar to that of the current macroblock. The motion partition map of the current macroblock is then generated by segmenting the corresponding reference region. In the simulations of this paper, a simple threshold-based segmentation method based on the illumination value is employed [5]. To obtain two contiguous partitions, isolated regions are merged after the threshold-based segmentation. Once the partition map is obtained, motion estimation is performed independently for each partition, generating two motion vectors for the current macroblock. As mentioned in [4], the prediction of the current macroblock can be stated as:
$$P(x, y) = R(\vec{x} + \mathrm{MV1}) \times \mathrm{MASK}_{P1}(x, y) + R(\vec{x} + \mathrm{MV2}) \times \mathrm{MASK}_{P2}(x, y)$$
where P represents the current prediction, R represents the reference picture, and $\vec{x} = (x, y)$ is the pixel position. MV1 and MV2 are the motion vectors of partitions P1 and P2, and MASK_P1 and MASK_P2 are the corresponding masks of the partition map. Each mask value is either 1 or 0.
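The partition map generation and the prediction formula above can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`crop`, `threshold_partition`, `predict_block`) are hypothetical, the block mean is assumed as the segmentation threshold, MASK_P2 is assumed to be the complement of MASK_P1, motion vectors are restricted to integer pixels, and the merging of isolated regions is omitted.

```python
def crop(frame, x0, y0, size):
    """Return the size x size block of `frame` whose top-left pixel is (x0, y0)."""
    return [row[x0:x0 + size] for row in frame[y0:y0 + size]]


def threshold_partition(block):
    """Binary partition map: 1 where luminance exceeds the block mean, else 0.

    Using the block mean as the threshold is an assumption of this sketch.
    """
    n = sum(len(row) for row in block)
    mean = sum(sum(row) for row in block) / n
    return [[1 if v > mean else 0 for v in row] for row in block]


def predict_block(ref, x0, y0, size, mv1, mv2, mask_p1):
    """P(x, y) = R(x + MV1) * MASK_P1(x, y) + R(x + MV2) * MASK_P2(x, y).

    MASK_P2 is taken as 1 - MASK_P1 (assumption); MVs are integer (dx, dy) pairs.
    """
    pred = []
    for y in range(size):
        row = []
        for x in range(size):
            m1 = mask_p1[y][x]
            m2 = 1 - m1
            r1 = ref[y0 + y + mv1[1]][x0 + x + mv1[0]]
            r2 = ref[y0 + y + mv2[1]][x0 + x + mv2[0]]
            row.append(r1 * m1 + r2 * m2)
        pred.append(row)
    return pred


# Toy reference frame: bright object for x < 6, dark background elsewhere.
ref = [[200 if x < 6 else 50 for x in range(12)] for _ in range(8)]
mv3 = (0, 0)  # locates the reference region used to derive the partition map
mask = threshold_partition(crop(ref, 4 + mv3[0], 2 + mv3[1], 4))
pred = predict_block(ref, 4, 2, 4, mv1=(-1, 0), mv2=(1, 0), mask_p1=mask)
```

In this toy case the mask splits the 4×4 block into a bright left half and a dark right half, and each half is fetched from the reference with its own motion vector.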
In the proposed method, an additional motion vector (named MV3), which locates the corresponding region in the reference picture, needs to be encoded and transmitted to the decoder. In physical terms, MV3 represents the motion of the object boundary. Since one of MV1 and MV2 is the motion vector of the object in the same region, MV1 or MV2 can be employed as the predictor of MV3 (in the simulation, we use the average of MV1 and MV2 as the predictor). To reduce the overhead further, MV3 can be set equal to either MV1 or MV2, with a one-bit flag indicating which one is used.
2.3. Modes coding and selection
The proposed method performs encoding in macroblock units, so it is easy to integrate into the H.264 coding system. As shown in Tab.1, we insert INTER_OMP16×8 and INTER_OMP8×16 into the MB mode table of the current H.264 P picture. They can be regarded as additional modes alongside INTER_16×8 and INTER_8×16: INTER_OMP16×8 uses the same MV prediction method as INTER_16×8, and INTER_OMP8×16 uses the same MV prediction method as INTER_8×16.
Tab.1 MB modes with proposed method
Code Number    Mode Name
0              SKIP/DIRECT
1              INTER_16×16
2              INTER_16×8
3              INTER_8×16
4              INTER_OMP16×8
5              INTER_OMP8×16
6              INTER_8×8Sub
7              INTRA
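The two ways of signalling MV3 described in Section 2.2 can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, motion vectors are integer (dx, dy) pairs in arbitrary units, and rounding the average with floor division is an assumption, since the text does not specify the rounding rule.

```python
def mv3_predictor(mv1, mv2):
    """Predictor for MV3: average of MV1 and MV2 (floor rounding is assumed)."""
    return ((mv1[0] + mv2[0]) // 2, (mv1[1] + mv2[1]) // 2)


def encode_mv3(mv1, mv2, mv3):
    """'with MV3' variant: transmit MV3 as a difference from its predictor."""
    px, py = mv3_predictor(mv1, mv2)
    return (mv3[0] - px, mv3[1] - py)


def decode_mv3(mv1, mv2, residual):
    """Decoder side of the 'with MV3' variant: predictor plus residual."""
    px, py = mv3_predictor(mv1, mv2)
    return (px + residual[0], py + residual[1])


def mv3_from_flag(mv1, mv2, flag):
    """'w/o MV3' variant: MV3 copies MV1 (flag 0) or MV2 (flag 1)."""
    return mv2 if flag else mv1


mv1, mv2, mv3 = (4, -2), (8, 2), (7, 1)
residual = encode_mv3(mv1, mv2, mv3)
```

The flag variant spends one bit instead of a vector difference, at the cost of restricting the reference region to the two positions given by MV1 and MV2.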
To select the best coding mode at the encoder, the conventional mode selection framework based on rate-distortion optimization can be adopted without modification. The best additional MV3 for the newly proposed modes can also be chosen by minimum RD cost. The search for MV3 can be performed either in the motion search stage or in the rate-distortion selection stage. In the simulation, a two-circle search method is employed to obtain the best MV3. For each MV3 candidate, the corresponding reference MB is located and the partition map is generated. MV1 and MV2 of the current MB are estimated with the obtained partition map. Then the current MB is encoded and its RD cost is calculated. For each MB, the RD costs of all candidate MV3 values are calculated, and the candidate with the smallest RD cost is chosen as the best MV3.
3. EXPERIMENTAL RESULT
The proposed technique was implemented on the basis of the H.264 reference software JSVM5.9 [6]. For the purpose of comparison, the sliced MB method using an arbitrary line is
also implemented and evaluated [2]. For the proposed technique, two variants are evaluated: “w/o MV3” (MV3 is set to MV1 or MV2 and indicated with a flag) and “with MV3” (MV3 is transmitted to the decoder explicitly). For a fair comparison, the selection of both the line parameters and MV3 is performed at the rate-distortion stage. The bit saving ratio of each method against the reference (JSVM 5.9) is used to evaluate performance. A positive bit saving ratio means that the evaluated method reduces the bit rate by that percentage compared to the reference software at the same PSNR [7]. We employed six MPEG test sequences (Bus, Football, Foreman, Ice, Mobile and Stefan) with relatively high motion activity to confirm the effectiveness of the proposed technique. All test sequences have CIF resolution (352x288) and a 30 Hz frame rate. The first 16 frames of each sequence were used in the experiment, and only the performance of P frames is evaluated. A summary of the experimental conditions is shown in Tab.2.
Tab.2 Experimental conditions
GOP structure      IPPP…
MV search range    ±32 pixels
MV accuracy        ¼ pixel
Entropy coding     CABAC
8×8 Transform      On
Deblock filter     On
QPs                30, 36
RDO                On
Tab.3 shows the average bit rate saving of the arbitrary line motion partition and the proposed technique for all test sequences. The results show that all the partition methods achieve an observable gain. The average bit savings of the arbitrary line method and the proposed method without MV3 are 5.74% and 4.74% respectively. The proposed method with MV3 shows the highest gain (6.79% bit saving).
Tab.3 Average bit rate saving against JSVM5.9 (bit rate saving, %)
Sequence    Arbitrary line    Proposed w/o MV3    Proposed with MV3
Bus         5.14              4.16                5.98
Football    3.37              1.09                2.82
Foreman     7.44              8.81                9.21
Ice         8.46              6.82                10.63
Mobile      3.75              2.99                6.10
Stefan      6.33              4.62                5.93
Average     5.74              4.74                6.79
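As a quick arithmetic check, the averages in the last row of Tab.3 can be recomputed from the per-sequence entries. The sketch below simply copies the table values; the variable and function names are illustrative.

```python
# Per-sequence bit rate savings (%) copied from Tab.3:
# (arbitrary line, proposed w/o MV3, proposed with MV3)
savings = {
    "Bus": (5.14, 4.16, 5.98),
    "Football": (3.37, 1.09, 2.82),
    "Foreman": (7.44, 8.81, 9.21),
    "Ice": (8.46, 6.82, 10.63),
    "Mobile": (3.75, 2.99, 6.10),
    "Stefan": (6.33, 4.62, 5.93),
}
reported = (5.74, 4.74, 6.79)  # "Average" row of Tab.3


def column_means(table):
    """Mean of each column over all sequences."""
    columns = list(zip(*table.values()))
    return tuple(sum(col) / len(col) for col in columns)


means = column_means(savings)
for name, mean, rep in zip(("arbitrary line", "w/o MV3", "with MV3"), means, reported):
    print(f"{name}: recomputed {mean:.2f}, reported {rep:.2f}")
```

The recomputed means agree with the reported averages to within about 0.01 percentage points; the small difference may come from rounding of the per-sequence entries.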
As mentioned in [4], all new motion partition methods aim to improve the prediction efficiency of picture content containing multiple objects or regions with different motion. Because it relies on image segmentation, the proposed method is better suited to sequences with distinct and stable object boundaries. Tab.3 shows that the experimental results match this assumption quite well. Both methods achieve much higher gain for “Foreman” and “Ice” than for the other sequences. Compared with the arbitrary line method, the proposed method performs much better on the “Foreman” and “Ice” sequences, which have stable object boundaries, and slightly worse on “Football” and “Stefan”, which have quickly changing boundary shapes and complex background texture.
Fig.3 shows an example of how macroblock partition modes are selected in H.264, the arbitrary line method and the proposed method. The yellow partition represents INTER_OMP16×8 mode, and the red partition represents INTER_OMP8×16 mode. Both the arbitrary line method and the proposed method dramatically reduce the number of INTER_8×8Sub modes compared with H.264, resulting in low motion vector overhead. Furthermore, the macroblock partition map of the proposed method matches the real object boundary better than that of the arbitrary line method. In the arbitrary line method, many macroblock partition maps do not follow the outline of the moving object; they are simply the result of coding mode selection with rate-distortion optimization. This means that the optimal motion partition map does not always coincide with the object boundary in some regions. It is also the reason why the proposed method with an additional motion vector to locate the reference signal achieves higher gain than the variant without the additional motion vector.
Fig.3 Motion partition map of “Foreman” (2nd frame, QP = 30): (a) H.264, (b) Arbitrary line method, (c) Proposed method with MV3
4. CONCLUSION
This paper analyzes macroblock partition for motion compensation in detail and proposes an object-boundary-based motion partition scheme to improve the performance of inter prediction. The proposed scheme generates the motion partition map for the current macroblock by segmenting a corresponding area in the reference frame. Experimental results show that this technique allows the shapes of the partition map to better match the boundaries of moving objects and improves the coding efficiency of inter P pictures in H.264, with an average bit saving of 6.79%.
5. REFERENCES
[1] Information technology - Coding of audio-visual objects - Part 10: Advanced video coding, ISO/IEC 14496-10, Dec. 2003.
[2] S. Kondo, H. Sasai, “A Motion Compensation Technique Using Sliced Blocks In Hybrid Video Coding,” IEEE International Conference on Image Processing, Genova, Italy, pp.305-308, Sept. 2005.
[3] E. M. Hung, R. L. De Queiroz, D. Mukherjee, “On Macroblock Partition for Motion Compensation,” IEEE International Conference on Image Processing, Atlanta, USA, pp.1697-1700, Oct. 2006.
[4] D. Oscar Divorra, Y. Peng, C. Gomila, “Geometry-adaptive Block Partitioning,” ITU-T Q.6/SG16 VCEG, VCEG-AF10, San Jose, USA, April 2007.
[5] K. R. Castleman, Digital Image Processing, Prentice-Hall, 1996.
[6] JSVM 5.9 software, available from CVS repository :pserver: [email protected]:/cvs/jvt
[7] G. Bjontegaard, “Calculation of average PSNR differences between RD-curves,” ITU-T Q.6/SG16 VCEG, VCEG-M33, Apr. 2001.