
A New Motion Vector Composition Algorithm for Fast-forward Video Playback in H.264

Tsz-Kwan Lee, Chang-Hong Fu, Yui-Lam Chan, and Wan-Chi Siu

Centre for Signal Processing, Department of Electronic and Information Engineering
The Hong Kong Polytechnic University, Hung Hom, Kowloon, Hong Kong

Abstract: With the rapid growth of streaming digital video, it is desirable to locate video segments of interest by searching through the video content at a speed faster than normal playback. Fast-forward playback is the key function that enables quick browsing of videos. It can be realized by a frame-skipping transcoder, which transcodes only the frames required for playback at the desired speed. Various motion vector (MV) composition algorithms aim to reduce the computational complexity of the transcoder, but they perform adequately only when few frames are skipped. In this paper, a new vector selection algorithm is proposed that composes a new MV from a set of candidate MVs so as to minimize the prediction errors caused by a larger frame-skipping factor. Experimental results show that the proposed algorithm delivers a clear improvement in rate-distortion performance over other algorithms.

978-1-4244-5309-2/10/$26.00 ©2010 IEEE

I. INTRODUCTION

Current video compression standards were developed primarily for efficient video storage and transmission, not for browsing. This limitation stems from the use of motion-compensated prediction for compression. As a result, recent video playback devices offer only a limited number of controls for video browsing, because motion compensation relies heavily on inter-frame dependency and severely complicates the fast-forward/backward operations, which are the major functions for video browsing [1], [2]. Fast-forward playback therefore requires more network bandwidth, since all the related frames must be sent in addition to the frames actually requested [3]–[5]. It also requires high computational capability in the clients’ decoders to decode all these extra frames. One simple way to implement fast-forward/backward playback is to send and decode only the I-frames. However, if the encoded bitstream has a very large GOP size, or if the application requires high precision in video-frame access, sending only I-frames may not be an acceptable solution.

Some approaches have been proposed for the realization of fast playback. In [1], [3], [4], various dual-bitstream schemes were proposed that store forward- and backward-encoded bitstreams in the server for frame selection. However, this approximately doubles the storage requirement of the server. To tackle this, a frame-skipping transcoding approach for fast-forward playback was proposed in [2], [5]. Transcoding begins with full video decoding in the server, selects and re-encodes the frames required for fast playback at the desired speed, and sends the transcoded bitstream to the clients for decoding and display. The re-encoding process has high computational complexity. Hence, some MV composition methods are used to expedite it. Forward dominant vector selection (FDVS) [6] is reported to be the best in fast-forward frame-skipping transcoding [5]. However, FDVS does not work well for a large speed-up factor in fast video playback, since the composed MVs may no longer represent the current macroblock (MB). In this case, the quality of the transcoded video deteriorates.

In this paper, we propose a more reliable algorithm for composing new MVs with a large speed-up factor. The MV composition is based on the relevant area of the current MB. It can also track several possible candidates related to the current MB and select the best candidate in transcoding. The paper is organized as follows. In Section II, we discuss the impact of a large speed-up factor on the performance of FDVS. Section III describes our proposed algorithm for frame-skipping transcoding. Simulation results are presented in Section IV. Finally, some concluding remarks are provided in Section V.

II. PROBLEMS OF USING FDVS

Fig. 1(a) illustrates an example of fast-forward transcoding using FDVS. Only four MBs are shown in each frame. We assume that $MB_n^k$ represents the $k$th MB in Frame $n$, with the MV $mv^k_{n\to n-1}$, which uses Frame $n-1$ as the reference in Fig. 1(a). The fast-forward speed-up factor is 3, so Frame $n-1$ and Frame $n-2$ are skipped for fast playback and $mv^k_{n\to n-1}$ becomes invalid. It is therefore necessary to find the new MV of $MB_n^k$ with Frame $n-3$ as the reference.
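The frame arithmetic in this example can be sketched in a few lines of Python. This is only an illustration: the function names `retained_frames` and `skipped_references` are hypothetical, and zero-based frame numbering with Frame 0 always retained is an assumption rather than something the paper specifies.

```python
def retained_frames(num_frames: int, speedup: int) -> list[int]:
    """Frames kept for playback at `speedup` times normal speed.

    Assumption: frames are numbered from 0 and Frame 0 is always kept.
    """
    if speedup < 1:
        raise ValueError("speedup must be at least 1")
    return list(range(0, num_frames, speedup))


def skipped_references(n: int, speedup: int) -> list[int]:
    """Skipped frames between retained Frame n and its new reference, Frame n - speedup."""
    return [n - i for i in range(1, speedup)]


if __name__ == "__main__":
    # With a speed-up factor of 3, Frame 9 must be re-predicted from Frame 6,
    # so its original MVs, which point into Frame 8, can no longer be used directly.
    print(retained_frames(10, 3))
    print(skipped_references(9, 3))
```

With a speed-up factor of 3, each retained Frame $n$ has two skipped frames between it and its new reference, which is why its original MVs have to be composed across those frames.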

Fig. 1. A working example of FDVS. Panels (a) and (b) each show four MBs in Frames $n-3$, $n-2$, $n-1$, and $n$.

This new MV, $mv^k_{n\to n-3}$, is shown as the dotted arrow in Fig. 1(a). For each MB, FDVS selects one dominant MV, carried by the dominant MB, i.e., the MB that has the largest overlapping segment with the motion-compensated MB of $MB_n^k$ in the previous frame. In Frame $n-1$ of Fig. 1(a), the motion-compensated MB of $MB_n^1$ overlaps with four MBs, $MB^1_{n-1}$, $MB^2_{n-1}$, $MB^3_{n-1}$, and $MB^4_{n-1}$. $MB^2_{n-1}$ is selected as the dominant MB, and its MV $mv^2_{n-1\to n-2}$ is the dominant MV. This dominant vector selection process is repeated until the non-skipped frame is reached, i.e., Frame $n-3$ in this example. Therefore, $mv^1_{n\to n-3}$ is composed by summing the selected dominant MVs and can be written as

$$mv^1_{n\to n-3} = mv^1_{n\to n-1} + mv^2_{n-1\to n-2} + mv^2_{n-2\to n-3} \qquad (1)$$
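The selection-and-summation loop behind (1) can be sketched as follows. This is a simplified illustration, not the transcoder used in the paper: it assumes fixed 16×16 MBs, integer-pel MVs, and an MV field stored as a dictionary from MB grid index to MV, and the names `overlap_areas` and `fdvs` are hypothetical. It also assumes every motion-compensated block stays inside the frame.

```python
MB = 16  # macroblock size in pixels (assumption: fixed 16x16 MBs, integer-pel MVs)

Vec = tuple[int, int]
Field = dict[Vec, Vec]  # MB grid index (col, row) -> MV (dx, dy) into the previous frame


def overlap_areas(x: int, y: int) -> dict[Vec, int]:
    """Overlap, in pixels, between the MB-sized block at pixel (x, y) and each grid MB it covers."""
    areas = {}
    col0, row0 = x // MB, y // MB
    for col in (col0, col0 + 1):
        for row in (row0, row0 + 1):
            w = min(x + MB, (col + 1) * MB) - max(x, col * MB)
            h = min(y + MB, (row + 1) * MB) - max(y, row * MB)
            if w > 0 and h > 0:
                areas[(col, row)] = w * h
    return areas


def fdvs(fields: list[Field], block: Vec) -> Vec:
    """Compose the MV of `block` across skipped frames by dominant vector selection.

    fields[0] holds the MVs of the current frame (Frame n -> n-1),
    fields[1] those of Frame n-1 (-> n-2), and so on.
    """
    dx, dy = fields[0][block]
    total_x, total_y = dx, dy
    # Position of the motion-compensated MB in Frame n-1
    x, y = block[0] * MB + dx, block[1] * MB + dy
    for field in fields[1:]:
        areas = {mb: a for mb, a in overlap_areas(x, y).items() if mb in field}
        dominant = max(areas, key=areas.get)  # MB with the largest overlapping segment
        dx, dy = field[dominant]              # its MV is the dominant MV
        total_x, total_y = total_x + dx, total_y + dy
        # FDVS follows the whole dominant MB, not only the part relevant to the target
        x, y = dominant[0] * MB + dx, dominant[1] * MB + dy
    return total_x, total_y


# Toy data: a 2x2 grid of MBs per frame, two skipped frames (speed-up factor 3)
zero = {(c, r): (0, 0) for c in range(2) for r in range(2)}
frame_n = {**zero, (0, 0): (10, 6)}
frame_n1 = {**zero, (1, 0): (-4, 5)}
frame_n2 = {**zero, (1, 0): (2, -1)}
print(fdvs([frame_n, frame_n1, frame_n2], (0, 0)))
```

In the toy data, MB (1, 0) has the largest overlap at both steps, so the composed MV is (10, 6) + (−4, 5) + (2, −1) = (8, 10). The last line of the loop is where the whole dominant MB is carried forward, whichever part of it actually overlaps the target.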

FDVS can provide promising results for MV composition in transcoding and has become the most popular technique among the existing algorithms that aim to reduce the computational complexity of motion estimation in video encoders [5], [6]. However, FDVS does not work well when a large number of frames are dropped consecutively, which is very common in fast-forward playback. This phenomenon is depicted in Fig. 1(b), which is redrawn from Fig. 1(a). Here, $MB^2_{n-1}$ is selected as the dominant MB, and the corresponding $mv^2_{n-1\to n-2}$ is used to determine the dominant MB in Frame $n-2$. Only the shaded area of $MB^2_{n-1}$ is actually relevant to the target $MB^1_n$. However, FDVS also uses the irrelevant non-shaded area in $MB^2_{n-1}$ to compute the dominant MB in Frame $n-2$. The relevant area of $MB^1_n$ diminishes further as more frames are skipped: the cross-hatch shaded area occupies only a very minor portion of $MB^2_{n-2}$, as shown in Fig. 1(b). This seriously affects the accuracy of the composed MVs, since a large area irrelevant to the target $MB^1_n$ is used to decide the dominant MB in Frame $n-3$.

III. PROPOSED VECTOR SELECTION ALGORITHM

Based on the above observations of the FDVS process in Section II, two criteria are set for our proposed algorithm in transcoding. First, only the area relevant to the target MB should contribute to dominant MB selection. Second, the area relevant to the target MB should be kept as large as possible during MV composition.

A. Only Using Relevant Area for Vector Selection

According to the first criterion, Fig. 2 shows an improved mechanism for FDVS. When $MB^2_{n-1}$ is chosen as the dominant MB in the first step of FDVS, only the shaded area with slash lines, which is the region relevant to the target $MB^1_n$, is used to decide the next dominant MB in Frame $n-2$. Note that $MB^4_{n-2}$ is selected, in contrast to the original FDVS, where $MB^2_{n-2}$ is picked. Again, if Frame $n-2$ is also dropped, only the cross-hatch shaded area in Frame $n-2$ is used to determine the next dominant area in Frame $n-3$. This mechanism ensures that only the relevant area of $MB^1_n$ is employed in the MV composition. From Fig. 2, the resultant MV $mv^1_{n\to n-3}$ is different from the result obtained by using FDVS in (1), and can be formed as

$$mv^1_{n\to n-3} = mv^1_{n\to n-1} + mv^2_{n-1\to n-2} + mv^4_{n-2\to n-3} \qquad (2)$$

B. Maximizing Relevant Area

1) Merging Process: The objective of the second criterion is to maximize the relevant area used for dominant MB selection. Other non-dominant areas in the skipped frames that are relevant to $MB^1_n$ can also be utilized to enhance the usage of the relevant area of $MB^1_n$. The reason is that the largest overlapping segment with the motion-compensated MB of $MB^1_n$ may sometimes not be dominant enough compared with the second largest one, as shown in Frame $n-1$ of Fig. 3(a). As a result, the relevant area may diminish after MV composition. In the example shown in Fig. 3(a), the cross-hatch shaded area in Frame $n-2$ for selecting the next dominant MB becomes very small, which decreases the reliability of the resultant MV. To fully utilize the relevant area of $MB^1_n$, the proposed algorithm also considers the homogeneity of MVs, which is essential for enlarging the relevant area for MV composition. We reuse the example in Fig. 3(a), but $mv^2_{n-1\to n-2}$ is now equal to $mv^4_{n-1\to n-2}$, as shown in Fig. 3(b). In this case, the shaded areas overlapping $MB^2_{n-1}$ and $MB^4_{n-1}$ can be combined, and this merged area is used to decide the next dominant MB in Frame $n-2$. The selected MB in Frame $n-2$ is $MB^4_{n-2}$, where the area relevant to $MB^1_n$ is larger and is more reliable for determining the dominant MB in Frame $n-3$, compared with the case of Fig. 3(a).

2) Multiple-candidates: The merging process is appropriate for areas with homogeneous motion and it is particularly true

for MBs in the background and inside the moving objects. At the boundary of a video object, we suggest using more than one candidate MB in order to expand the area relevant to the target MB in the MV composition. Assume that $C^i_{n-1}$ is the $i$th candidate in Frame $n-1$, sorted by the area of the overlapping segment. In Fig. 4, two candidate MBs are used to compose the MV at each step. In Frame $n-1$, $C^1_{n-1}$ and $C^2_{n-1}$ are the largest and second largest overlapping segments with the motion-compensated MB of $MB^1_n$, respectively. Therefore, both $MB^2_{n-1}$ and $MB^4_{n-1}$ are used to determine the next dominant MBs in Frame $n-2$, because both shaded areas in $MB^2_{n-1}$ and $MB^4_{n-1}$ are relevant to $MB^1_n$.

From the top diagram of Fig. 4, four candidates ($C^2_{n-2}$, $C^3_{n-2}$, $C^4_{n-2}$, and $C^5_{n-2}$) due to the motion-compensated segment of $C^1_{n-1}$ are considered for the next step. In addition, $C^1_{n-2}$, contributed by the motion-compensated segment of $C^2_{n-1}$, is also regarded as one of the possible candidates, as depicted in the bottom diagram of Fig. 4. Since two candidates are used for each step, $C^1_{n-2}$ and $C^2_{n-2}$ are chosen as the largest and second largest overlapping segments with their corresponding MBs, $MB^4_{n-1}$ and $MB^2_{n-1}$, respectively. The top diagram of Fig. 4 shows the same MV composition procedure as illustrated in Fig. 3(a). The bottom diagram gives an alternative path for composing the new MV, which uses the second largest candidate MB in Frame $n-1$. From Fig. 4, we observe that the cross-hatch shaded area in Frame $n-2$ of the bottom diagram, which is relevant to $MB^1_n$ and is used to decide the dominant MB in Frame $n-3$, is larger than that of the top diagram. In other words, even though $C^1_{n-1}$ represents the largest overlapping segment in the first skipped frame, Frame $n-1$, there is no guarantee that it is still the largest overlapping segment in the next skipped frame, Frame $n-2$.

The use of multiple candidate MBs for each skipped frame increases the likelihood of keeping the MBs with a large area relevant to the target MB during the MV composition. Since only two frames are skipped in this working example, two candidates are sufficient for each skipped frame. When more frames are skipped, a larger number of possible candidates is necessary. Note that the number of candidates can be selected by users according to the number of skipped frames and the desired video quality.

Fig. 2. Improved mechanism for FDVS using relevant area.

Fig. 3. Merging where the largest overlapping segment is not dominating. Panels (a) and (b).

Fig. 4. Multiple-candidate MB selection at object boundary.

IV. SIMULATION RESULTS

Simulations have been performed to evaluate the overall efficiency of various MV composition algorithms in fast-forward playback. The H.264 reference codec (JM9.2) was employed to pre-encode the test sequences in CIF format (352×288 pixels) with 200 frames, including “Salesman”, “Foreman”, “Mobile”, “Coastguard”, and “Tempete”, at 30 frames/s with a fixed quantization parameter. The first frame was encoded as an I-frame, while the others were P-frames, for which a full-search motion estimation algorithm with a search window of 31×31 pixels was used to determine the MVs in the pre-encoded videos. The pre-encoded videos were then transcoded to various fast-forward videos with speed-up factors of 3, 5, 7, and 9. All picture types and the quantization parameter were preserved during transcoding. For comparison, full-search motion estimation (FS), the forward dominant vector selection algorithm (FDVS) [6], the extended version of FDVS (E-FDVS) [7], and the proposed multiple-candidate vector selection (MCVS-C) were used to obtain the MVs of the transcoded videos. In MCVS-C, C represents the number of candidate MBs selected for each skipped frame. In our simulations, C was set to 2 and 4, represented by MCVS-2 and MCVS-4, respectively.

Fig. 5. Total generated bits in the “Coastguard” sequence transcoded by different MV composition algorithms for fast-forward playback at up to 9 times the normal speed. (x-axis: speed-up factor, 2 to 9; y-axis: size of the transcoded bitstream in kbits; curves: FullSearch, FDVS, E-FDVS, MCVS-2, MCVS-4.)

Fig. 6. Average PSNR results in the “Coastguard” sequence transcoded by different MV composition algorithms for fast-forward playback at up to 9 times the normal speed. (x-axis: speed-up factor, 2 to 9; y-axis: PSNR in dB; curves: FullSearch, FDVS, E-FDVS, MCVS-2, MCVS-4.)

TABLE I
PERFORMANCE OF DIFFERENT ALGORITHMS AT A SPEED-UP FACTOR OF 5 FOR VARIOUS SEQUENCES. (PSNR: dB and bits: kbits)

| Sequence | Full Search Bits | Full Search PSNR | FDVS Bits (Δkbits) | FDVS PSNR (ΔdB) | E-FDVS Bits (Δkbits) | E-FDVS PSNR (ΔdB) | MCVS-2 Bits (Δkbits) | MCVS-2 PSNR (ΔdB) | MCVS-4 Bits (Δkbits) | MCVS-4 PSNR (ΔdB) |
|---|---|---|---|---|---|---|---|---|---|---|
| Coastguard | 2019 | 33.87 | 2253 (+11.59%) | 33.61 (-0.26) | 2231 (+10.51%) | 35.62 (-0.25) | 2148 (+6.39%) | 33.72 (-0.15) | 2110 (+4.54%) | 33.79 (-0.08) |
| Foreman | 1234 | 36.52 | 1525 (+23.57%) | 35.80 (-0.72) | 1330 (+7.80%) | 35.91 (-0.61) | 1253 (+1.54%) | 36.15 (-0.37) | 1200 (-2.79%) | 36.35 (-0.17) |
| Mobile | 2465 | 32.48 | 2835 (+14.90%) | 32.35 (-0.13) | 2808 (+13.90%) | 32.37 (-0.11) | 2648 (+7.41%) | 32.42 (-0.06) | 2550 (+3.45%) | 32.45 (-0.03) |
| Salesman | 468 | 38.30 | 501 (+7.00%) | 38.01 (-0.29) | 496 (+6.00%) | 38.02 (-0.28) | 489 (+4.40%) | 38.12 (-0.18) | 483 (+3.10%) | 38.18 (-0.12) |
| Tempete | 2207 | 33.49 | 2472 (+12.00%) | 33.28 (-0.21) | 2411 (+9.24%) | 33.33 (-0.16) | 2326 (+5.40%) | 33.42 (-0.07) | 2244 (+1.70%) | 33.47 (-0.02) |

TABLE II
PERFORMANCE OF DIFFERENT ALGORITHMS WITH VARIOUS SPEED-UP FACTORS FOR TRANSCODING THE “COASTGUARD” SEQUENCE. (PSNR: dB and bits: kbits)

| Speed-up factor | Full Search Bits | Full Search PSNR | FDVS Bits (Δkbits) | FDVS PSNR (ΔdB) | E-FDVS Bits (Δkbits) | E-FDVS PSNR (ΔdB) | MCVS-2 Bits (Δkbits) | MCVS-2 PSNR (ΔdB) | MCVS-4 Bits (Δkbits) | MCVS-4 PSNR (ΔdB) |
|---|---|---|---|---|---|---|---|---|---|---|
| 3 | 2780 | 34.02 | 2898 (+4.25%) | 33.87 (-0.15) | 2896 (+4.16%) | 33.87 (-0.15) | 2831 (+1.81%) | 33.94 (-0.08) | 2805 (+0.89%) | 33.97 (-0.05) |
| 5 | 2019 | 33.87 | 2253 (+11.59%) | 33.61 (-0.26) | 2231 (+10.51%) | 35.62 (-0.25) | 2148 (+6.39%) | 33.72 (-0.15) | 2110 (+4.54%) | 33.79 (-0.08) |
| 7 | 1635 | 33.84 | 1913 (+17.02%) | 33.51 (-0.33) | 1857 (+13.50%) | 33.53 (-0.31) | 1784 (+9.13%) | 33.63 (-0.21) | 1743 (+6.57%) | 33.72 (-0.12) |
| 9 | 1387 | 33.87 | 1703 (+22.78%) | 33.51 (-0.36) | 1607 (+15.84%) | 33.54 (-0.33) | 1550 (+11.73%) | 33.66 (-0.21) | 1509 (+8.76%) | 33.72 (-0.15) |
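The Δ columns in Tables I and II follow from the Full Search baseline by simple arithmetic. As a minimal sketch, the Python below recomputes the FDVS and MCVS-4 entries of Table II at a speed-up factor of 9; the helper names `pct_change` and `psnr_delta` are hypothetical. Because the tables list bit counts rounded to whole kbits, recomputed percentages can differ from the printed ones in the second decimal place.

```python
def pct_change(baseline_kbits: float, kbits: float) -> float:
    """Percentage change in total bits relative to the Full Search baseline."""
    return (kbits - baseline_kbits) / baseline_kbits * 100.0


def psnr_delta(baseline_db: float, db: float) -> float:
    """PSNR change in dB relative to the Full Search baseline, rounded to 2 decimals."""
    return round(db - baseline_db, 2)


# Table II, "Coastguard", speed-up factor 9: (bits in kbits, PSNR in dB)
full_search = (1387, 33.87)
results = {"FDVS": (1703, 33.51), "MCVS-4": (1509, 33.72)}

for name, (bits, psnr) in results.items():
    print(f"{name}: {pct_change(full_search[0], bits):+.2f}% bits, "
          f"{psnr_delta(full_search[1], psnr):+.2f} dB")
```

The same check applied to the E-FDVS row at a speed-up factor of 5 shows that the printed PSNR of 35.62 dB does not agree with its ΔdB of -0.25 relative to 33.87 dB.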

Table I lists the transcoding results with a speed-up factor of 5 on different test sequences. The PSNR of each transcoded frame is computed by comparing it with its original, uncompressed frame. In the table, ΔdB and Δkbits represent the PSNR change and the percentage change in total bits, respectively, compared with FS. Positive values mean increments, whereas negative values mean decrements. It is observed that the proposed algorithm generates considerably fewer bits than FDVS and E-FDVS, especially with MCVS-4. This is because our proposed algorithm utilizes only the relevant area of the target MB in the MV composition, and multiple-candidate selection plays an important role in keeping the relevant area as large as possible across the skipped frames. Regarding the computational complexity of the proposed MCVS-C, the only major overhead is the SAD calculation required for the final selection among the MVs resulting from the different candidates. These extra computations are negligible compared with FS.

Table II shows the results of the different algorithms for transcoding the “Coastguard” sequence with speed-up factors of 3, 5, 7, and 9. It is observed that the performance of FDVS and E-FDVS gets worse as the speed-up factor increases. MCVS-2 outperforms E-FDVS and FDVS in terms of both PSNR and total generated bits for the cases with large speed-up factors. MCVS-4 provides a further reduction in generated bits, as shown in Table II. For fast-forward playback at 9 times the normal speed, Δkbits of MCVS-4 is reduced to 8.76%, while it is 15.84% for E-FDVS and 22.8% for FDVS. The total bits generated and the average PSNR obtained by the different MV selection algorithms for the “Coastguard” sequence with speed-up factors ranging from 2 to 9 are also plotted in Fig. 5 and Fig. 6, respectively. It is worth noting that MCVS-4 narrows the gaps to FS in both PSNR and generated bits. From these statistics, we can conclude that the proposed MCVS provides outstanding results, especially in the case of a large speed-up factor.

V. CONCLUSION

In this paper, we have proposed a novel MV composition algorithm for realizing fast-forward playback of pre-encoded video by frame-skipping transcoding. Our proposed multiple-candidate vector selection (MCVS) algorithm makes full use of the areas relevant to the target MB, which is beneficial for fast-forward playback with a large speed-up factor. Its performance, verified experimentally in terms of both quality and bits, is substantially better than that of FDVS and E-FDVS. In addition, the proposed MCVS is adaptive in nature, and the number of candidate MBs can be adjusted according to the speed-up factor.

ACKNOWLEDGMENT

The work described in this paper is partially supported by the Centre for Signal Processing, Department of EIE, PolyU and a grant from the Research Grants Council of the HKSAR, China (PolyU 5120/07E). Tsz-Kwan Lee acknowledges the research studentships provided by the University.

REFERENCES

[1] C.-H. Fu, Y.-L. Chan, T.-P. Ip, and W.-C. Siu, “New architecture for MPEG video streaming system with backward playback support,” IEEE Trans. Image Processing, no. 9, pp. 2169–2183, September 2007.
[2] Y.-P. Tan, Y. Liang, and J. Yu, “Video transcoding for fast forward/reverse video playback,” in Proceedings of International Conference on Image Processing (ICIP 2002), September 2002, pp. 713–716.
[3] S.-Y. Huang, “Improved techniques for dual-bitstream MPEG video streaming with VCR functionalities,” IEEE Trans. Consumer Electronics, vol. 49, no. 4, pp. 1153–1160, November 2003.
[4] T.-P. Ip, Y.-L. Chan, and W.-C. Siu, “Redundancy reduction technique for dual-bitstream MPEG video streaming with VCR functionalities,” IEEE Trans. Broadcasting, vol. 54, no. 3, pp. 412–418, September 2008.
[5] Y.-P. Tan and Y. Liang, “A unified transcoding approach to fast forward and reverse playback of compressed video,” IEEE Trans. Consumer Electronics, vol. 49, no. 4, pp. 1098–1105, November 2003.
[6] J. Youn, M. T. Sun, and C. W. Lin, “Motion vector refinement for high-performance transcoding,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 30–40, March 1999.
[7] S. Yang, D. Kim, Y. Jeon, and J. Jeong, “An efficient motion re-estimation algorithm for frame-skipping video transcoding,” in Proceedings of International Conference on Image Processing (ICIP 2005), September 2005, pp. 668–671.
