A New Motion Vector Composition Algorithm for Fast-forward Video Playback in H.264
Tsz-Kwan Lee, Chang-Hong Fu, Yui-Lam Chan, and Wan-Chi Siu
Centre for Signal Processing, Department of Electronic and Information Engineering, The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong
Abstract— With the rapid growth of streaming digital video, it is desirable to locate video segments of interest by searching through the video content faster than normal playback. Fast-forward playback is the key function that enables quick browsing of videos. It can be realized by a frame-skipping transcoder, which transcodes only the frames required for playback at the desired speed. Various motion vector (MV) composition algorithms aim to reduce the computational complexity of the transcoder, but they perform acceptably only when few frames are skipped. In this paper, a new vector selection algorithm is proposed that composes a new MV from a set of candidate MVs to minimize the prediction errors caused by a larger frame-skipping factor. Experimental results show that the proposed algorithm delivers a clear improvement in rate-distortion performance over other algorithms.
I. INTRODUCTION
Current video compression standards were developed primarily for efficient video storage and transmission, not for browsing. This limitation is due to the use of motion-compensated prediction for compression. As a result, recent video playback devices offer only a limited number of controls for video browsing, because motion compensation relies heavily on frame dependency and severely complicates the fast-forward/backward operations, which are the major functions for video browsing [1], [2]. Thus, a fast-forward playback function requires more network bandwidth to send all the related frames in addition to the frames actually requested [3]–[5]. It also requires high computational capability in the clients' decoders to decode all these extra frames. One simple approach to implementing fast-forward/backward playback is to send and decode only the I-frames. However, if the video application involves an encoded bitstream with a very large GOP size or requires high precision in video-frame access, sending only I-frames may not be an acceptable solution.
Some approaches have been proposed for the realization of fast playback. In [1], [3], [4], various dual-bitstream schemes were proposed that store forward- and backward-encoded bitstreams in the server for frame selection. However, this approximately doubles the storage requirement of the server. To tackle this, a frame-skipping transcoding approach for fast-forward playback was proposed in [2], [5]. Transcoding begins with full video decoding in the server, selects and re-encodes the frames required for fast playback at the desired speed, and sends the transcoded bitstream to the clients for decoding and display. The re-encoding process induces high computational complexity. Hence, some MV composition methods
978-1-4244-5309-2/10/$26.00 ©2010 IEEE
are used to expedite the re-encoding process. The forward dominant vector selection (FDVS) [6] is suggested to be the best in fast-forward frame-skipping transcoding [5]. However, FDVS does not work well for a large speed-up factor in fast video playback, since the composed MVs may no longer represent the current macroblock (MB). In this case, the quality of the transcoded videos deteriorates. In this paper, we propose a more reliable algorithm to compose new MVs with a large speed-up factor. The MV composition is based on the relevant area of the current MB. It can also track several possible candidates related to the current MB and select the best candidate in transcoding.
The organization of this paper is as follows. In Section II, we discuss the impact of a large speed-up factor on the performance of FDVS. Section III describes our proposed algorithm for frame-skipping transcoding. Simulation results are presented in Section IV. Finally, some concluding remarks are provided in Section V.
II. PROBLEMS OF USING FDVS
Fig. 1(a) illustrates an example of fast-forward transcoding using FDVS. Only four MBs are shown in each frame. We assume that $MB_n^k$ represents the $k$th MB in Frame $n$, with the MV $mv^k_{n\to n-1}$ that uses Frame $n-1$ as the reference, as in Fig. 1(a). When the fast-forward speed-up factor is 3, Frame $n-1$ and Frame $n-2$ are skipped for fast playback, and $mv^k_{n\to n-1}$ is no longer valid. It is then necessary to find the new MV of $MB_n^k$ with Frame $n-3$ as the reference, i.e. $mv^k_{n\to n-3}$.
Fig. 1. A working example of FDVS. Panels (a) and (b) each show four MBs per frame across Frame n-3, Frame n-2, Frame n-1, and Frame n.
The new MV $mv^k_{n\to n-3}$ is shown as the dotted arrow in Fig. 1(a). For each MB, FDVS selects one dominant MV carried by a dominant MB, which is the MB that has the largest overlapping segment with the motion-compensated MB of $MB_n^k$ in the previous frame. Consider the motion-compensated MB of $MB_n^1$, which overlaps with four MBs, $MB^1_{n-1}$, $MB^2_{n-1}$, $MB^3_{n-1}$ and $MB^4_{n-1}$, in Frame $n-1$ of Fig. 1(a). Here $MB^2_{n-1}$ is selected as the dominant MB, and its MV $mv^2_{n-1\to n-2}$ is the dominant MV. This dominant vector selection process is repeated until the non-skipped frame is reached, i.e. Frame $n-3$ in this example. Therefore, $mv^1_{n\to n-3}$ is composed by summing up the selected dominant MVs and can be written as
$$mv^1_{n\to n-3} = mv^1_{n\to n-1} + mv^2_{n-1\to n-2} + mv^2_{n-2\to n-3} \tag{1}$$
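As an illustration only, the FDVS composition in (1) can be sketched in Python as follows. The sketch assumes 16×16 MBs on a regular grid, MVs stored per MB as (dx, dy) displacements toward the reference frame, and blocks that stay inside the frame. The names `overlap`, `dominant_mb`, and `fdvs` are hypothetical and are not taken from [6].

```python
MB = 16  # macroblock size in pixels (assumed)


def overlap(ax, ay, bx, by, size=MB):
    """Overlap area of two size x size squares given their top-left corners."""
    w = max(0, min(ax, bx) + size - max(ax, bx))
    h = max(0, min(ay, by) + size - max(ay, by))
    return w * h


def dominant_mb(x, y, size=MB):
    """Top-left corner of the grid-aligned MB that overlaps most with the block at (x, y)."""
    x0, y0 = (x // size) * size, (y // size) * size
    candidates = [(x0, y0), (x0 + size, y0), (x0, y0 + size), (x0 + size, y0 + size)]
    return max(candidates, key=lambda c: overlap(x, y, c[0], c[1], size))


def fdvs(pos, mv, skipped_fields):
    """Compose an MV across skipped frames by summing dominant MVs.

    pos: top-left corner of the target MB in Frame n.
    mv: its MV to Frame n-1.
    skipped_fields: one dict {mb_corner: (dx, dy)} per skipped frame,
        ordered Frame n-1, Frame n-2, ...
    """
    total = mv
    x, y = pos[0] + mv[0], pos[1] + mv[1]  # motion-compensated MB in Frame n-1
    for field in skipped_fields:
        dom = dominant_mb(x, y)
        dx, dy = field[dom]
        total = (total[0] + dx, total[1] + dy)
        # FDVS displaces the whole dominant MB, not only the overlapping part.
        x, y = dom[0] + dx, dom[1] + dy
    return total
```

With two skipped frames, the returned vector is the sum of three MVs, matching the three terms of (1).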
slash lines, which is the region relevant to the target $MB_n^1$, is used to decide the next dominant MB in Frame $n-2$. Note that $MB^4_{n-2}$ is selected, in contrast to the original FDVS, where $MB^2_{n-2}$ is picked. Again, if Frame $n-2$ is also dropped, only the cross-hatch shaded area in Frame $n-2$ is used to determine the next dominant area in Frame $n-3$. This mechanism ensures that only the relevant area of $MB_n^1$ is employed in the MV composition. From Fig. 2, the resultant MV $mv^1_{n\to n-3}$ is different from the result obtained by using FDVS in (1), and can be formed as in (2).
FDVS can provide promising results for MV composition in transcoding and has become the most popular technique among the existing algorithms that aim to reduce the computational complexity of motion estimation in video encoders [5], [6]. However, FDVS does not work well when a large number of consecutive frames are dropped, which is very common in fast-forward playback. This phenomenon can be explained with Fig. 1(b), which is redrawn from Fig. 1(a). Here, $MB^2_{n-1}$ is selected as the dominant MB, and the corresponding $mv^2_{n-1\to n-2}$ is used to determine the dominant MB in Frame $n-2$. It is observed that only the shaded area of $MB^2_{n-1}$ is actually relevant to the target $MB_n^1$. However, FDVS also uses the irrelevant non-shaded area in $MB^2_{n-1}$ to compute the dominant MB in Frame $n-2$. The relevant area of $MB_n^1$ diminishes further as more frames are skipped. The cross-hatch shaded area occupies only a very minor portion of $MB^2_{n-2}$, as shown in Fig. 1(b). This seriously affects the accuracy of the composed MVs, since a large area irrelevant to the target $MB_n^1$ is used to decide the dominant MB in Frame $n-3$.
1) Merging Process: The objective of the second criterion is to maximize the relevant area used for dominant MB selection. Other non-dominant areas in the skipped frames that are relevant to $MB_n^1$ can also be utilized to enhance the usage of the relevant area of $MB_n^1$. The reason is that the largest overlapping segment with the motion-compensated MB of $MB_n^1$ may sometimes not be dominant enough compared with the second largest one, as shown in Frame $n-1$ of Fig. 3(a). As a result, the relevant area may diminish after MV composition. In the example shown in Fig. 3(a), the cross-hatch shaded area in Frame $n-2$ for selecting the next dominant MB becomes very small, which decreases the reliability of the resultant MV. To fully utilize the relevant area of $MB_n^1$, the proposed algorithm also considers the homogeneity of MVs, which is essential to enlarge the relevant area for MV composition. We reuse the example in Fig. 3(a), but $mv^2_{n-1\to n-2}$ is now equal to $mv^4_{n-1\to n-2}$, as shown in Fig. 3(b). In this case, the shaded areas overlapping with $MB^2_{n-1}$ and $MB^4_{n-1}$ can be combined, and this merged area is used for deciding the next dominant MB in Frame $n-2$. The selected MB in Frame $n-2$ is $MB^4_{n-2}$, where the area relevant to $MB_n^1$ is larger and is more reliable for determining the dominant MB in Frame $n-3$, compared with the case of Fig. 3(a).
2) Multiple-candidates: The merging process is appropriate for areas with homogeneous motion and it is particularly true
III. PROPOSED VECTOR SELECTION ALGORITHM
Based on the above observations of the FDVS process in Section II, two criteria are set for our proposed algorithm in transcoding. First, only the area relevant to the target MB should contribute to dominant MB selection. Second, the area relevant to the target MB should be kept as large as possible during MV composition.
$$mv^1_{n\to n-3} = mv^1_{n\to n-1} + mv^2_{n-1\to n-2} + mv^4_{n-2\to n-3} \tag{2}$$
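The relevant-area rule behind (2) can be sketched in Python under stated assumptions. The relevant region is modelled as an axis-aligned rectangle that is cut along a 16×16 MB grid at every skipped frame, and only the largest piece is carried forward. The helper names (`split_by_grid`, `compose_relevant`) and the (dx, dy) MV convention are hypothetical, and the sketch is one possible reading of the mechanism in Fig. 2 rather than the authors' implementation.

```python
MB = 16  # macroblock size in pixels (assumed)


def split_by_grid(rect, size=MB):
    """Cut rectangle (x, y, w, h) along the MB grid; return {mb_corner: sub_rect}."""
    x, y, w, h = rect
    pieces = {}
    gx = (x // size) * size
    while gx < x + w:
        gy = (y // size) * size
        while gy < y + h:
            px, py = max(x, gx), max(y, gy)
            pw = min(x + w, gx + size) - px
            ph = min(y + h, gy + size) - py
            if pw > 0 and ph > 0:
                pieces[(gx, gy)] = (px, py, pw, ph)
            gy += size
        gx += size
    return pieces


def compose_relevant(pos, mv, skipped_fields):
    """Compose an MV while tracking only the area relevant to the target MB.

    skipped_fields: one dict {mb_corner: (dx, dy)} per skipped frame,
        ordered Frame n-1, Frame n-2, ...
    """
    total = mv
    rect = (pos[0] + mv[0], pos[1] + mv[1], MB, MB)  # relevant area in Frame n-1
    for field in skipped_fields:
        pieces = split_by_grid(rect)
        # Dominant MB: the one holding the largest part of the relevant area.
        dom, (px, py, pw, ph) = max(pieces.items(), key=lambda kv: kv[1][2] * kv[1][3])
        dx, dy = field[dom]
        total = (total[0] + dx, total[1] + dy)
        # Only the relevant piece is displaced, not the whole dominant MB.
        rect = (px + dx, py + dy, pw, ph)
    return total
```

Because only the relevant piece of the dominant MB is displaced, the dominant MB chosen in the second skipped frame can differ from the one plain FDVS would pick, which is the difference between (1) and (2).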
B. Maximizing Relevant Area
A. Only Using Relevant Area for Vector Selection
According to the first criterion, Fig. 2 shows an improved mechanism for FDVS. When $MB^2_{n-1}$ is chosen as the dominant MB in the first step of FDVS, only the shaded area with
Fig. 2. Improved mechanism for FDVS using relevant area.
Fig. 3. Merging where the largest overlapping segment is not dominating. Panels (a) and (b).
Fig. 5. Total generated bits in the “Coastguard” sequence transcoded by different MV composition algorithms for fast-forward playback at up to 9 times the normal speed. (Plot of transcoded bitstream size against speed-up factor, 2 to 9, for FullSearch, FDVS, E-FDVS, MCVS-2, and MCVS-4.)
Simulations have been performed to evaluate the overall efficiency of various MV composition algorithms in fast-forward playback. The H.264 reference codec (JM9.2) was employed to pre-encode the test sequences in CIF format (352×288 pixels) with 200 frames, including “Salesman”, “Foreman”, “Mobile”, “Coastguard”, and “Tempete”, at 30 frames/s with a fixed quantization parameter. The first frame of each sequence was encoded as an I-frame, while the others were P-frames, for which a full-search motion estimation algorithm with a search window of 31×31 pixels was used to determine the MVs in the pre-encoded videos. The pre-encoded videos were then transcoded to fast-forward videos with speed-up factors of 3, 5, 7, and 9. All picture types and the quantization parameter were preserved during transcoding. For comparison, full-search motion estimation (FS), the forward dominant vector selection algorithm (FDVS) [6], the extended version of FDVS (E-FDVS) [7], and the proposed multiple-candidate vector selection (MCVS-C) were used to obtain the MVs of the transcoded videos. In MCVS-C, C represents the number of candidate MBs selected for each skipped frame. In our simulations, C was set to 2 and 4, represented by MCVS-2
IV. SIMULATION RESULTS
candidates is necessary. Note that the number of candidates can be selected by users according to the number of skipped frames and the desired video quality.
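When several candidate paths are tracked, each one yields a composed MV, and one of them has to be chosen for the target MB. Section IV states that this final selection relies on a SAD calculation. A minimal Python sketch of such a selection is given below, assuming frames are 2-D lists of luma samples, MVs are (dx, dy) displacements into the reference frame, and all accessed blocks lie inside the frame. The function names are hypothetical.

```python
def sad(cur, ref, pos, mv, size=16):
    """Sum of absolute differences between the MB at pos in cur and its match in ref."""
    x, y = pos
    rx, ry = x + mv[0], y + mv[1]
    return sum(
        abs(cur[y + j][x + i] - ref[ry + j][rx + i])
        for j in range(size)
        for i in range(size)
    )


def select_best_mv(cur, ref, pos, candidate_mvs, size=16):
    """Return the candidate MV with the smallest SAD for the MB at pos."""
    return min(candidate_mvs, key=lambda mv: sad(cur, ref, pos, mv, size))
```

The cost is one SAD evaluation per candidate, whereas a full search evaluates one SAD per position in the search window.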
for MBs in the background and inside the moving objects. At the boundary of a video object, we suggest using more than one candidate MB in order to expand the area relevant to the target MB in the MV composition. Assume that $C^i_{n-1}$ is the $i$th candidate in Frame $n-1$ sorted by the area of the overlapping segment. In Fig. 4, two candidate MBs are used to compose the MV at each step. In Frame $n-1$, $C^1_{n-1}$ and $C^2_{n-1}$ are the largest and second largest overlapping segments with the motion-compensated MB of $MB_n^1$, respectively. Therefore, both $MB^2_{n-1}$ and $MB^4_{n-1}$ are used to determine the next dominant MBs in Frame $n-2$, because the shaded areas in both $MB^2_{n-1}$ and $MB^4_{n-1}$ are relevant to $MB_n^1$.
From the top diagram of Fig. 4, four candidates ($C^2_{n-2}$, $C^3_{n-2}$, $C^4_{n-2}$, and $C^5_{n-2}$) due to the motion-compensated segment of $C^1_{n-1}$ are considered for the next step. In addition, $C^1_{n-2}$, contributed by the motion-compensated segment of $C^2_{n-1}$, is also regarded as one of the possible candidates, as depicted in the bottom diagram of Fig. 4. Since two candidates are used at each step, $C^1_{n-2}$ and $C^2_{n-2}$ are chosen as the largest and second largest overlapping segments with their corresponding MBs, $MB^4_{n-1}$ and $MB^2_{n-1}$, respectively. The top diagram of Fig. 4 shows the same MV composition procedure as illustrated in Fig. 3(a). The bottom diagram gives an alternative path to compose the new MV, which uses the second largest candidate MB in Frame $n-1$. From Fig. 4, we observe that the cross-hatch shaded area in Frame $n-2$ of the bottom diagram, which is relevant to $MB_n^1$ and is used to decide the dominant MB in Frame $n-3$, is larger than that of the top diagram. In other words, even though $C^1_{n-1}$ represents the largest overlapping segment in the first skipped frame, Frame $n-1$, there is no guarantee that it is still the largest overlapping segment in the next skipped frame, Frame $n-2$.
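The candidate tracking described above can be sketched in Python as follows. This is an interpretation under stated assumptions, not the authors' code: each path carries a rectangular relevant area, every path is cut along a 16×16 MB grid at each skipped frame, and the `num_candidates` largest pieces over all paths are kept. The names `split_by_grid`, `track_candidates`, and `num_candidates` are hypothetical, and the merging process of Section III-B.1 is not modelled.

```python
MB = 16  # macroblock size in pixels (assumed)


def split_by_grid(rect, size=MB):
    """Cut rectangle (x, y, w, h) along the MB grid; return {mb_corner: sub_rect}."""
    x, y, w, h = rect
    pieces = {}
    gx = (x // size) * size
    while gx < x + w:
        gy = (y // size) * size
        while gy < y + h:
            px, py = max(x, gx), max(y, gy)
            pw = min(x + w, gx + size) - px
            ph = min(y + h, gy + size) - py
            if pw > 0 and ph > 0:
                pieces[(gx, gy)] = (px, py, pw, ph)
            gy += size
        gx += size
    return pieces


def track_candidates(pos, mv, skipped_fields, num_candidates=2):
    """Return the composed MVs of the surviving candidate paths, largest area first.

    skipped_fields: one dict {mb_corner: (dx, dy)} per skipped frame,
        ordered Frame n-1, Frame n-2, ...
    """
    paths = [(mv, (pos[0] + mv[0], pos[1] + mv[1], MB, MB))]
    for field in skipped_fields:
        expanded = []
        for total, rect in paths:
            for mb, (px, py, pw, ph) in split_by_grid(rect).items():
                dx, dy = field[mb]
                new_total = (total[0] + dx, total[1] + dy)
                new_rect = (px + dx, py + dy, pw, ph)
                expanded.append((pw * ph, new_total, new_rect))
        # Keep the candidates with the largest relevant area (assumed ranking rule).
        expanded.sort(key=lambda e: e[0], reverse=True)
        paths = [(t, r) for _, t, r in expanded[:num_candidates]]
    return [t for t, _ in paths]
```

Setting `num_candidates=1` reduces the sketch to tracking a single dominant relevant area, while larger values keep alternative paths such as the bottom diagram of Fig. 4 alive.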
The use of multiple candidate MBs for each skipped frame increases the chance of keeping the MBs with a large area relevant to the target MB during the MV composition. Since only two frames are skipped in this working example, two candidates are sufficient for each skipped frame. When more frames are skipped, a larger number of possible
Fig. 4. Multiple-candidate MB selection at object boundary.
Fig. 6. Average PSNR results in the “Coastguard” sequence transcoded by different MV composition algorithms for fast-forward playback at up to 9 times the normal speed. (Plot of average PSNR in dB against speed-up factor, 2 to 9.)
TABLE I
PERFORMANCE OF DIFFERENT ALGORITHMS AT A SPEED-UP FACTOR OF 5 FOR VARIOUS SEQUENCES.
Sequence    Full Search  FDVS                             E-FDVS                           MCVS-2                           MCVS-4
            Bits  PSNR   Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)
Coastguard  2019  33.87  2253 (+11.59%)   33.61 (-0.26)   2231 (+10.51%)   35.62 (-0.25)   2148 (+6.39%)    33.72 (-0.15)   2110 (+4.54%)    33.79 (-0.08)
Foreman     1234  36.52  1525 (+23.57%)   35.80 (-0.72)   1330 (+7.80%)    35.91 (-0.61)   1253 (+1.54%)    36.15 (-0.37)   1200 (-2.79%)    36.35 (-0.17)
Mobile      2465  32.48  2835 (+14.90%)   32.35 (-0.13)   2808 (+13.90%)   32.37 (-0.11)   2648 (+7.41%)    32.42 (-0.06)   2550 (+3.45%)    32.45 (-0.03)
Salesman    468   38.30  501 (+7.00%)     38.01 (-0.29)   496 (+6.00%)     38.02 (-0.28)   489 (+4.40%)     38.12 (-0.18)   483 (+3.10%)     38.18 (-0.12)
Tempete     2207  33.49  2472 (+12.00%)   33.28 (-0.21)   2411 (+9.24%)    33.33 (-0.16)   2326 (+5.40%)    33.42 (-0.07)   2244 (+1.70%)    33.47 (-0.02)
TABLE II
PERFORMANCE OF DIFFERENT ALGORITHMS WITH VARIOUS SPEED-UP FACTORS FOR TRANSCODING THE “COASTGUARD” SEQUENCE. (PSNR: dB and bits: kbits)
Speed-up factor  Full Search  FDVS                             E-FDVS                           MCVS-2                           MCVS-4
                 Bits  PSNR   Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)      Bits (Δkbits)    PSNR (ΔdB)
3                2780  34.02  2898 (+4.25%)    33.87 (-0.15)   2896 (+4.16%)    33.87 (-0.15)   2831 (+1.81%)    33.94 (-0.08)   2805 (+0.89%)    33.97 (-0.05)
5                2019  33.87  2253 (+11.59%)   33.61 (-0.26)   2231 (+10.51%)   35.62 (-0.25)   2148 (+6.39%)    33.72 (-0.15)   2110 (+4.54%)    33.79 (-0.08)
7                1635  33.84  1913 (+17.02%)   33.51 (-0.33)   1857 (+13.50%)   33.53 (-0.31)   1784 (+9.13%)    33.63 (-0.21)   1743 (+6.57%)    33.72 (-0.12)
9                1387  33.87  1703 (+22.78%)   33.51 (-0.36)   1607 (+15.84%)   33.54 (-0.33)   1550 (+11.73%)   33.66 (-0.21)   1509 (+8.76%)    33.72 (-0.15)
and MCVS-4, respectively.
V. CONCLUSION
Table I lists the transcoding results with a speed-up factor of 5 on different test sequences. The PSNR of each transcoded frame is computed by comparing it with its original, uncompressed frame. In the table, ΔdB and Δkbits represent the PSNR change and the percentage change in total bits, respectively, compared with FS. Positive values mean increments, whereas negative values mean decrements. It is observed that the proposed algorithm generates considerably fewer bits than FDVS and E-FDVS, especially in the case of MCVS-4. This is because our proposed algorithm uses only the relevant area of the target MB in the MV composition, and multiple-candidate selection plays an important role in keeping the relevant area as large as possible across the skipped frames. Regarding the computational complexity of the proposed MCVS-C, the only major overhead is the SAD calculation required for the final selection among the MVs composed from the different candidates. These extra computations are negligible compared with FS. Table II shows the results of different algorithms for transcoding the “Coastguard” sequence with speed-up factors of 3, 5, 7, and 9. It is observed that the performance of FDVS and E-FDVS gets worse as the speed-up factor increases. MCVS-2 outperforms E-FDVS and FDVS in terms of both PSNR and total generated bits for the cases with large speed-up factors. MCVS-4 provides a further reduction in generated bits, as shown in Table II. For fast-forward playback at 9 times the normal speed, Δkbits of MCVS-4 is reduced to 8.76%, while it is 15.84% for E-FDVS and 22.78% for FDVS. The total bits generated and the average PSNR obtained by different MV selection algorithms for the “Coastguard” sequence with speed-up factors ranging from 2 to 9 are also plotted in Fig. 5 and Fig. 6, respectively. It is notable that the gaps in both PSNR and generated bits between FS and MCVS-4 become narrower.
From these statistics, we can conclude that the proposed MCVS provides outstanding results, especially in the case of a large speed-up factor.
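For readers who wish to recompute the Δkbits and ΔdB columns, the following Python sketch shows the arithmetic implied by the definitions above. The per-frame PSNR helper assumes 8-bit samples (peak value 255) and frames flattened into equal-length sequences, which the text does not state explicitly. All function names are hypothetical. Because the tabulated bit counts are rounded to whole kbits, recomputed percentages may differ from the tables in the last digit.

```python
import math


def delta_bits_percent(bits, fs_bits):
    """Percentage change in total bits relative to full search (FS)."""
    return 100.0 * (bits - fs_bits) / fs_bits


def delta_psnr(psnr_db, fs_psnr_db):
    """PSNR change in dB relative to full search (FS)."""
    return psnr_db - fs_psnr_db


def psnr(original, decoded, peak=255):
    """PSNR in dB between two equal-length sample sequences (8-bit peak assumed)."""
    mse = sum((a - b) ** 2 for a, b in zip(original, decoded)) / len(original)
    return float("inf") if mse == 0 else 10 * math.log10(peak ** 2 / mse)


# FDVS on "Coastguard" at a speed-up factor of 5 (values from Table I).
print(round(delta_bits_percent(2253, 2019), 2))  # 11.59
print(round(delta_psnr(33.61, 33.87), 2))        # -0.26
```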
In this paper, we have proposed a novel MV composition algorithm for realizing fast-forward playback of a pre-encoded video by frame-skipping transcoding. Our proposed multiple-candidate vector selection (MCVS) algorithm makes full use of the areas relevant to the target MB, which is beneficial for fast-forward playback with a large speed-up factor. Its performance, verified experimentally in terms of both quality and bits, is substantially better than that of FDVS and E-FDVS. Besides, the proposed MCVS is adaptive in nature, and the number of candidate MBs can be adjusted according to the speed-up factor.
ACKNOWLEDGMENT
The work described in this paper is partially supported by the Centre for Signal Processing, Department of EIE, PolyU and a grant from the Research Grants Council of the HKSAR, China (PolyU 5120/07E). Tsz-Kwan Lee acknowledges the research studentships provided by the University.
REFERENCES
[1] C.-H. Fu, Y.-L. Chan, T.-P. Ip, and W.-C. Siu, “New architecture for MPEG video streaming system with backward playback support,” IEEE Trans. Image Processing, no. 9, pp. 2169–2183, Sept. 2007.
[2] Y.-P. Tan, Y. Liang, and J. Yu, “Video transcoding for fast forward/reverse video playback,” in Proceedings of International Conference on Image Processing (ICIP 2002), September 2002, pp. 713–716.
[3] S.-Y. Huang, “Improved techniques for dual-bitstream MPEG video streaming with VCR functionalities,” IEEE Trans. Consumer Electronics, vol. 49, no. 4, pp. 1153–1160, November 2003.
[4] T.-P. Ip, Y.-L. Chan, and W.-C. Siu, “Redundancy reduction technique for dual-bitstream MPEG video streaming with VCR functionalities,” IEEE Trans. Broadcasting, vol. 54, no. 3, pp. 412–418, September 2008.
[5] Y.-P. Tan and Y. Liang, “A unified transcoding approach to fast forward and reverse playback of compressed video,” IEEE Trans. Consumer Electronics, vol. 49, no. 4, pp. 1098–1105, November 2003.
[6] J. Youn, M. T. Sun, and C. W. Lin, “Motion vector refinement for high-performance transcoding,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30–40, March 1999.
[7] S. Yang, D. Kim, Y. Jeon, and J. Jeong, “An efficient motion re-estimation algorithm for frame-skipping video transcoding,” in Proceedings of International Conference on Image Processing (ICIP 2005), September 2005, pp. 668–671.