Non-linear Motion-compensated Interpolation for Low Bit Rate Video

Shan Liu, JongWon Kim and C.-C. Jay Kuo
Integrated Media Systems Center and Department of Electrical Engineering-Systems
University of Southern California, Los Angeles, CA 90089-2564
Email: {shanl, jongwon, [email protected]

ABSTRACT

By augmenting the ITU-T H.263 standard bit stream with supplementary motion vectors for to-be-interpolated frames, a new deformable block-based fast motion-compensated frame interpolation (DB-FMCI) scheme is presented. Unlike other motion-compensated interpolation methods, which assume a constant motion velocity between two reference P frames, the proposed scheme takes into account the non-linearity of motion to achieve a better interpolation result. The supplementary motion information for the so-called M frame (motion frame) is defined, which consists of compressed residuals between linear and non-linear motion vectors. The non-linear motion vectors of skipped frames are used at the decoder to drive the 6-parameter affine-based DB-FMCI. Experimental results show that the proposed non-linear enhancement scheme achieves a higher PSNR value and better visual quality than traditional methods based only on the linear motion assumption.

Keywords: Frame rate up-conversion, motion-compensated interpolation, non-linear motion vector, deformable-block motion-compensated interpolation, H.263.

1 Introduction

Frame interpolation has been widely adopted in low bit rate video applications such as frame rate up-conversion and error concealment. Due to limited bandwidth, some low bit rate video applications apply temporal subsampling (i.e., frame skipping) to meet the desired compression ratio. Under this scenario, frames in the image sequence are periodically skipped from encoding and transmission, so the decoder can reconstruct only part of the sequence instead of the complete video content. As a result, temporal artifacts such as motion jerkiness are introduced, and the visual quality of the decoded video is significantly degraded. Frame interpolation is used at the decoder to regenerate skipped frames and reduce these temporal artifacts. Furthermore, an encoded frame may be lost or corrupted during transmission. If packet loss happens in a key frame (such as an I or P frame in both ISO MPEG-4 and ITU-T H.263), the corruption introduced by the loss propagates to subsequent frames. Both intra- and inter-frame interpolation can be used to conceal the damaged frame in playback, depending on the dependency relation between the lost frame and other frames [1].

Several types of frame interpolation schemes have been proposed. The simple frame repetition scheme [2] results in motion jerkiness. The frame averaging method [2] reduces jerky motion but often produces ghost effects. Linear interpolation [3] generates blurred moving areas, since pixels of different objects are improperly mixed. To deal with these problems, motion-compensated frame interpolation (MCI) was studied by several researchers [4], [5]. Two types of motion fields, pixel-based and block-based, have been used to provide the motion trajectory between the current and previous frames. The pixel-based motion field, which yields more accurate interpolation, demands high computational complexity.
In contrast, the block-based field requires only one motion vector per block and has been adopted by most video compression standards such as H.263/H.263+ (where the block size is fixed to 16 × 16 except in the advanced prediction mode). It provides acceptable visual quality with a carefully designed algorithm. Employing a block-based motion field, Kuo et al. [5] proposed an MCI scheme, called deformable block-based fast MCI (DB-FMCI), which demonstrates a good performance-complexity tradeoff. However, since the algorithm was used as an encoder-independent post-processing unit, the interpolation performance is inherently constrained by the decoded block motion vectors provided by the encoder. Under this framework, one has to face the uncertainty of compression-optimized block motion vectors and settle for a compromised solution for the selected type of video. For example, techniques attempting to obtain "true" motion (not by block matching) [6] can enhance the effectiveness of DB-FMCI. Also, the quality of the reference frames used for interpolation is another performance constraint, which depends mostly on the available bandwidth.

Furthermore, similar to the work in [6-9], DB-FMCI assumes linear object motion between two adjacent reference frames. Rigorously speaking, this assumption is not accurate, since object motions are not always linear over a group of frames. The assumed linear motion and the actual non-linear object motion are compared in Fig. 1, where arrows indicate non-linear motions from the current P frame to the skipped frames and the previous P frame, respectively, while the straight line with dots represents the corresponding linear motions under the constant motion velocity assumption. The error is already evident in the 2-D plane, and it becomes even worse when the video clip contains 3-D motions such as objects moving back and forth, camera zooming and panning, etc. Generally speaking, when the number of skipped frames between two reference frames is small, the error between linear and non-linear motion is small, since the non-linear motion can be approximated well by a linear one. However, as the frame skip increases, the error can grow significantly when the object motion is non-linear and fast.

Figure 1: (a) Linear vs. non-linear motion between two adjacent P frames; (b) the stream syntax of the proposed system.

An intuitive solution is to adaptively skip frames according to different object motions [10]. When the object motion is slower and more linear-like, we may skip more frames between two reference frames. Otherwise, the number of skipped frames should be kept small. This adaptive frame skip method reduces the linear motion error by adding more reference frames, so the bit rate may go higher when a video clip contains mostly non-linear motion. Also, global motion compensation schemes [11, 12] have been proposed to compensate background motions (mostly due to camera movement) that are in general non-linear. By using global motion estimation and compensation, background artifacts are reduced without sacrificing coding bit rate. However, the foreground non-linear motion problem remains unsolved.

To compensate non-linear motions of the entire frame (both foreground and background), we propose a non-linear motion-compensated interpolation scheme in this work. The proposed coder, aided by non-linear motion information, is an augmented version of the ITU-T H.263 standard. Modifications are made in both the encoder and the decoder, and the bit stream structure is changed accordingly, so the encoder and the decoder must be used as a pair. Besides I/P frames, a new type of frame called the M frame is defined to convey compressed residuals between linear and non-linear motion vectors (MVs). I, P and M frames together form the bit stream. Each M frame corresponds to a frame skipped at the encoder and is used at the decoder to reconstruct non-linear MVs from the calculated linear MVs. The obtained non-linear MVs are then applied to motion-compensated frame interpolation to generate the skipped frames, where the 6-parameter affine-based DB-FMCI scheme [5] is adopted.
The use of M frames provides an option for more accurate motion compensation at the expense of a small number of extra bits. Since linear MVs calculated under the constant motion velocity assumption between two reference frames give a reliable and reasonable initialization, we can estimate non-linear MVs within a local range using a block matching algorithm. This implies that the motion estimation complexity is reduced and the residuals between linear and non-linear MVs are bounded. As a result, the extra number of bits required is not significant, while the quality of the decoded video is enhanced. Moreover, if necessary, some M frames can be omitted from encoding and transmission, which leads to a very flexible rate-distortion performance.

The paper is organized as follows. An overview of the proposed system is given in Section 2. Section 3 describes motion residual coding in detail. Section 4 discusses motion-compensated frame interpolation aided by non-linear MVs. Experimental results are presented in Section 5. Conclusion and future work are given in Section 6.

2 System Overview and Design Issues

2.1 System Overview

The proposed system consists of two main modules, the encoder and the decoder, as shown in Figs. 2 and 3. To avoid the interpolation error introduced by the linear motion velocity assumption, non-linear motion vectors are estimated and encoded by the encoder. They are then reconstructed by the decoder for the purpose of frame interpolation.

Figure 2: The functional flow diagram of the encoder of the proposed motion-compensated frame interpolation scheme.

As shown in Fig. 2, the encoder first encodes I and P frames, similarly to a conventional encoder. Each P frame contains block-based (typically 16 × 16) MVs with respect to the reconstructed previous P frame. With a pair of current and previous frames (denoted by Pcurr and Pprev, respectively), MVs from the current P frame to each skipped frame are calculated under the linear motion assumption. When objects within a video clip exhibit consistent movement, linear MVs often provide a good estimate of object motion for skipped frames, especially when the number of skipped frames is small. However, as the skip number becomes larger, the desired MVs deviate more and more from the linear MVs. Thus, a localized block matching method is employed to search for the best matching pairs to obtain non-linear MVs. To reduce the overhead, motion residuals are computed and

entropy encoded for M frames. All encoded I, P and M frames form the bit stream transmitted to the decoder.

Figure 3: The functional flow diagram of the decoder of the proposed motion-compensated frame interpolation scheme.

The decoder works in the reverse procedure, as shown in Fig. 3. After receiving the bit stream, it first decodes the bits for I, P, and M frames, respectively. I/P frames are reconstructed using the residual and motion information, while M frames are decoded for motion residuals only. Linear MVs are first calculated under the linear velocity assumption and then modified via the motion residuals to reconstruct non-linear MVs. These MVs are used in the context of the 6-parameter affine-based DB-FMCI for frame interpolation. Thus, the skipped frames are regenerated and the whole image sequence can be played back.

2.2 Design Considerations

It is worthwhile to point out that the adoption of MCI for intermediate frames is the key difference between the proposed M frame and the popular B frame. A key issue in this research is to figure out how to take full advantage of the adopted MCI (e.g., the affine-based DB-FMCI scheme). This is nevertheless a very complicated task, which will be pursued in a separate paper. Here, we highlight several design issues that deserve special attention.

The supplemental information for M frames can be MVs only or MVs plus residuals. With both MVs and residuals, the M frame is coded and reconstructed similarly to the B frame, except for the following differences. Currently, only one-directional (i.e., backward) MVs are supported for the M frame. Furthermore, the M frame does not rely as much on the accompanying residuals as the B frame does. For M frames, the MVs play the key role in boosting the performance of the adopted MCI, with a relatively small overhead compared to the coded residuals. Thus, the M frame approach enables a different rate-distortion tradeoff. For example, it may be more efficient to spend bits on multiple M frames instead of one B frame. The separation of MVs and optional residuals may also be helpful in layered video coding, since it allows a reasonable reconstruction from the MV part alone. A more systematic rate-distortion investigation is required to quantify the best operating modes of the M frame approach, which will be one main focus of our research in the near future.

The type of MV used for the M frame and the scheme used to search for it can affect the coding performance. A straightforward scheme is to obtain the MVs of the M frame with standard block matching. This can be done by a local search around the MV obtained from the linear motion assumption. Alternatively, we may consider the effect of motion-compensated interpolation and determine the best MV accordingly, which can lead to a higher computational complexity. In this work, the former approach (i.e., simple backward motion estimation) is adopted to obtain non-linear MVs. Motion vectors between reference frames are backward, i.e., from Pcurr to Pprev. Therefore, linear MVs are also calculated in the backward direction, from Pcurr to each skipped frame. We take the grid points of the current reference frame Pcurr as the starting points for motion estimation. Then, the non-linear reference points (corresponding to the grid points of Pcurr) for each skipped frame can be calculated in a straightforward manner.

Coding of the non-linear MVs is another design issue. Differential coding is the natural choice, since it yields a higher coding gain, and it is adopted in this paper to encode the motion residuals. However, when packet loss damages part of a reference frame (but not the whole frame), differential coding tends to be vulnerable to the loss. Thus, in an error-prone environment, coding the MV values directly may be preferable, to increase the value of the M frame for temporal concealment. Finally, in terms of overall system complexity, the proposed scheme is in general more complex than the use of I/P/B frames, since the interpolation process demands more operations than simple compensation. The performance-complexity tradeoff has to be analyzed carefully.

3 M Frames and Non-linear Motion Vectors

Most MCI algorithms assume a constant motion velocity between two adjacent P frames. This type of motion is called linear motion [5-9]. Under this assumption, if the 2-D block-based motion vector from Pcurr to Pprev is denoted by (MVx_i, MVy_i) and the number of skipped frames is N, the linear MV from Pcurr to the kth skipped frame (counted from Pcurr to Pprev) can be expressed as (MVx_i · k/(N+1), MVy_i · k/(N+1)). Thus, linear MVs can be computed easily with low complexity. Linear MVs provide a reasonable estimate of the motion in skipped frames for talking-head sequences, where the object of interest does not move much. However, as discussed in the previous section, the linearly interpolated MVs can differ noticeably from the object's real MVs, which are non-linear in general. Therefore, motion residuals between linear and non-linear motions are encoded at the encoder and transmitted to the decoder to aid the interpolation task, which efficiently reduces the interpolation error caused by the linear motion assumption. A local backward block matching scheme is adopted to search for the best matching block pairs between the current P frame and the skipped frame to generate non-linear MVs. By taking the linear motion as the center of the local search window, a low complexity search can be performed, as depicted in Fig. 4. To summarize, the following three steps are taken to generate non-linear MVs.

• Step 1: Calculate the linear MV (mvx_i, mvy_i) for each block from the current P frame to the kth skipped frame (M frame) via

    mvx_i = MVx_i · k/(N+1),   mvy_i = MVy_i · k/(N+1).

• Step 2: Take the grid point (i.e., the center point of the ith block in the current P frame) as (x_i, y_i) and find the corresponding point (x'_i, y'_i) in the kth M frame under the linear motion relation:

    x'_i = x_i + mvx_i,   y'_i = y_i + mvy_i.

Figure 4: Local motion search for determining the non-linear motion vector.

• Step 3: Select a local search range (Δx, Δy) and find the best matching block to the ith block of Pcurr within (x'_i ± Δx, y'_i ± Δy). Set the center point of this block as (x''_i, y''_i) and calculate the non-linear MV (mvx'_i, mvy'_i) via

    mvx'_i = x''_i - x_i,   mvy'_i = y''_i - y_i.
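The three steps above can be sketched in a few lines of Python. This is an illustrative reproduction under assumptions, not the authors' implementation: the SAD matching criterion, the function name and the exhaustive ±`search` window are our own choices.

```python
import numpy as np

def estimate_nonlinear_mv(p_curr, f_k, grid_x, grid_y, mv_lin, search=4, bs=16):
    """Local backward block matching around the linearly scaled MV.

    p_curr, f_k : 2-D arrays (luma of the current P frame and the kth skipped frame).
    (grid_x, grid_y) : top-left corner of the ith block in p_curr.
    mv_lin : (mvx_i, mvy_i), the linear MV from Step 1 (search-window center).
    Returns the non-linear MV (mvx'_i, mvy'_i).
    """
    h, w = f_k.shape
    block = p_curr[grid_y:grid_y + bs, grid_x:grid_x + bs]
    best_sad, best_mv = None, mv_lin
    for dy in range(-search, search + 1):
        for dx in range(-search, search + 1):
            x = grid_x + mv_lin[0] + dx
            y = grid_y + mv_lin[1] + dy
            if x < 0 or y < 0 or x + bs > w or y + bs > h:
                continue  # candidate block falls outside the frame
            cand = f_k[y:y + bs, x:x + bs]
            sad = np.abs(block.astype(int) - cand.astype(int)).sum()
            if best_sad is None or sad < best_sad:
                best_sad = sad
                best_mv = (mv_lin[0] + dx, mv_lin[1] + dy)
    return best_mv
```

Because the search is anchored at the linear MV, the window can stay small, which is what bounds both the estimation complexity and the residual magnitudes mentioned in Section 2.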

At the bit rate range applicable to the proposed MCI scheme, the motion vectors consume roughly 1/4 of the bits used for the residuals (at quantization level 13). This means that if we transmitted the non-linear MVs of skipped frames directly, around five M frames would occupy the same number of bits as one P frame. Thus, to save bits, differential encoding is adopted for the coding of MVs in M frames. That is, we encode

    Δmvx_i = mvx'_i - mvx_i,   Δmvy_i = mvy'_i - mvy_i,

where (mvx'_i, mvy'_i) and (mvx_i, mvy_i) denote the non-linear and linear MV of the ith block, respectively. Unlike the non-linear MVs, which take values in a wider range, the motion residuals are confined to a smaller range. For example, for the QCIF format (176 × 144) with a block size of 16 × 16, there are 2 × 11 × 9 = 198 motion residuals per frame. Bi-directional uniform quantization and entropy coding (using an arithmetic coder) reduce the total amount to around 20%. As a result, only up to 40 bytes are required for the coding of each M frame in typical applications.
4 Motion-compensated Interpolation with Non-linear MVs

After decoding the I/P frames, M frames are used for the interpolation of the skipped frames between two reference frames. Buffers are needed to store the MVs decoded from a set of M frames; they are released after the current P frame is decoded, waiting for the next set of M frames. The procedure for reconstructing one group of frames (the previous P frame, the interpolated frames and the current P frame) is explained below.

• Step 1: Decode and reconstruct the previous P frame, i.e., Pprev.
• Step 2: Decode the set of M frames that lie between Pprev and Pcurr, reconstruct the non-linear MVs and save them in buffers.
• Step 3: Decode and reconstruct the current P frame, i.e., Pcurr.
• Step 4: Interpolate the skipped frames between Pprev and Pcurr from the pixels of Pprev and Pcurr (bi-directional interpolation) using the non-linear MVs stored in the buffers of Step 2. Release the buffers for the next set of non-linear MVs.

Linear MVs are calculated in the same way as in the encoder. Let the block-based (backward) motion vector from Pcurr to Pprev be (MVx_i, MVy_i). Mathematically, the linear MV from Pcurr to the kth interpolated frame among the total of N interpolated frames can be written as

    mvx_i = MVx_i · k/(N+1),   mvy_i = MVy_i · k/(N+1).

Next, with the corresponding motion residual (Δmvx_i, Δmvy_i) decoded from the M frame, the reconstructed non-linear MV can be expressed as

    mvx'_i = mvx_i + Δmvx_i,   mvy'_i = mvy_i + Δmvy_i.

Thus, the coordinates in Pprev and in the interpolated frame that correspond to the grid points of Pcurr can be calculated via

    x'_i = x_i + MVx_i,   y'_i = y_i + MVy_i,
    x''_i = x_i + mvx'_i,   y''_i = y_i + mvy'_i,

where (x_i, y_i), (x'_i, y'_i) and (x''_i, y''_i) stand for the grid point in Pcurr and its corresponding points in Pprev and the interpolated frame, respectively. This is shown in Fig. 5.
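The decoder-side reconstruction amounts to a few additions per block. A minimal sketch (the helper name is ours, not the paper's):

```python
def reconstruct_block_points(grid_pt, ref_mv, lin_mv, residual):
    """From a grid point (xi, yi) in Pcurr, compute the corresponding point in
    Pprev (via the reference MV) and in the interpolated frame (via the
    residual-corrected non-linear MV)."""
    xi, yi = grid_pt
    # Non-linear MV = linear estimate + decoded M-frame residual.
    mvx_nl = lin_mv[0] + residual[0]
    mvy_nl = lin_mv[1] + residual[1]
    pt_prev = (xi + ref_mv[0], yi + ref_mv[1])    # (x'_i, y'_i) in Pprev
    pt_interp = (xi + mvx_nl, yi + mvy_nl)        # (x''_i, y''_i) in the M frame
    return pt_prev, pt_interp
```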

Figure 5: The block in Pcurr and the corresponding quadrilaterals in Pprev and the interpolated M frame based on non-linear MVs.

Foreground pixels in a block often have their own motion, different from that of the background and of other foreground blocks. Thus, foreground motion models should be localized, i.e., the motion parameters may differ from block to block. Here, we adopt the 6-parameter affine motion model to represent the motion relationship from the interpolated frame to both the previous and the current P frames. The 6-parameter affine model is defined as

    x'_i = a0 · x_i + a1 · y_i + a2,   y'_i = a3 · x_i + a4 · y_i + a5,

where (x_i, y_i) are the 2-D coordinates of the interpolated pixel and (x'_i, y'_i) is its corresponding pixel in the reference frame (i.e., Pprev or Pcurr). The interpolation is bi-directional, so both the previous and current reference frames contribute to the interpolated frame. Each block (quadrilateral) is split into two triangles by its shorter diagonal. The three vertices of each triangle provide sufficient equations to solve for the six parameters of the affine model over the corresponding triangular region. Thus, the backward or forward estimated value of an interpolated pixel is calculated using the local motion parameters associated with the triangular patch it belongs to. For more details, we refer to [5].
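Solving the six affine parameters from a triangle's three vertex correspondences is a small linear system; the following sketch (our own numpy illustration, not the authors' implementation) shows one way to do it:

```python
import numpy as np

def affine_from_triangle(src, dst):
    """Solve x' = a0*x + a1*y + a2, y' = a3*x + a4*y + a5 from three vertex
    correspondences of one triangular patch.

    src : three (x, y) vertices in the interpolated frame.
    dst : the corresponding (x', y') vertices in the reference frame.
    Returns (a0, a1, a2, a3, a4, a5).
    """
    A = np.array([[x, y, 1.0] for x, y in src])
    ax = np.linalg.solve(A, np.array([p[0] for p in dst]))  # a0, a1, a2
    ay = np.linalg.solve(A, np.array([p[1] for p in dst]))  # a3, a4, a5
    return np.concatenate([ax, ay])

def apply_affine(params, pt):
    """Map an interpolated-frame pixel to its reference-frame location."""
    a0, a1, a2, a3, a4, a5 = params
    x, y = pt
    return (a0 * x + a1 * y + a2, a3 * x + a4 * y + a5)
```

Note that the system is solvable exactly when the three vertices are not collinear, which is guaranteed here since they come from splitting a quadrilateral along its diagonal.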

5 Experimental Results

We applied the proposed system to the Miss America test sequence in the QCIF format for performance evaluation. Both objective evaluations, such as PSNR and R-D curves, and subjective visual assessments were performed, and the results were compared with those obtained using the frame repetition and FMCI [5] techniques. The results are given in Figs. 6, 7 and 8, respectively.

Figure 6: The PSNR performance comparison among frame repetition, FMCI and the proposed scheme.

For the Miss America clip in QCIF format, the frame rate of the input sequence at the encoder is 30 frames per second (fps). The basic mode is selected (i.e., no optional mode is activated) and the default quantization level of 13 is used. In Fig. 6, the number of frames skipped between two P frames is fixed at 10. The solid, dash-dot and dashed lines show the PSNR curves of the proposed, FMCI and frame repetition methods, respectively. Frame repetition gives the worst PSNR, especially when the number of skipped frames is not small. The FMCI method achieves a much higher PSNR because it applies motion compensation in the frame interpolation. The best result is obtained by the proposed system: since the linear motions of skipped frames are adjusted by the supplemental non-linear motion information, more accurate frame interpolation is achieved at the decoder.

The cost of the PSNR enhancement achieved by the proposed system is the extra bits required to transmit M frames. In fact, not all M frames have to be transmitted from the encoder to the decoder, since an M frame does not provide much information when the motion is almost linear. Therefore, M frames can be constructed adaptively and transmitted selectively, which reduces the bit usage significantly. Fig. 7 shows the average PSNR vs. the bit usage (in bytes) for the proposed, FMCI and frame repetition methods; the solid, dash-dot and dashed lines indicate the average-PSNR-vs.-bytes curves of the proposed, FMCI and frame repetition approaches, respectively. Different frame skip numbers (e.g., 4, 6, 8, 10, etc.) were applied to meet the different bit budgets. Frame repetition gives the worst average PSNR when the bit rate is very low, i.e., when the number of skipped frames is high. However, its performance improves very quickly, and its results are as good as those of the other two methods when the bit rate is high, i.e., when the number of skipped frames is small (e.g., one or two).

Figure 7: The PSNR-rate performance comparison among the proposed, frame repetition and FMCI schemes.

FMCI and the proposed scheme achieve a better PSNR at low bit rates, but the rate of improvement slows as the bit rate goes higher. The average PSNR curves of the proposed and FMCI methods meet when the frame skip number is small, since motion between two adjacent P frames can be considered linear when only a few frames are skipped. The visual comparison of the interpolated 85th frame produced by the proposed, FMCI and frame repetition methods is shown in Fig. 8. With the aid of the vertical reference lines, we can clearly see that the head position and the facial expression of Miss America interpolated by the proposed method are very similar to those of the original frame. There is a noticeable difference between the FMCI result and the original in the head area, due to the constant motion velocity assumption from the 78th frame to the 89th frame, while the motion is in fact non-linear. The frame repetition method gives the worst result, since it simply copies the 78th frame and does not consider the 89th frame at all.

6 Conclusion and Future Work

An enhancement of the motion-compensated frame interpolation scheme with non-linear MVs was demonstrated for frame rate up-conversion. We introduced the M frame, which consists of supplementary motion vectors in the form of residuals between linear and non-linear MVs, so that non-linear affine motion-compensated frame interpolation can be performed at the decoder. Experimental results show that the proposed non-linear enhancement scheme achieves a higher overall PSNR (i.e., a better rate-distortion performance, especially at low bit rates) and better visual quality in comparison with FMCI and simple frame repetition. The introduction of M frames provides a new option for exploiting motion-compensated coding in addition to the traditional I/P/B frame framework. However, achieving the best rate-distortion tradeoff via interpolation still requires further investigation. For dynamic and error-prone Internet video transmission, the provision of good interpolation/extrapolation tools is essential for error concealment and quality enhancement.

7 REFERENCES

[1] P. Salama, N. B. Shroff and E. J. Delp, "Error concealment in encoded video streams," in Signal Recovery Techniques for Image and Video Compression and Transmission, Kluwer Academic Publishers, 1998, pp. 199-233.
[2] A. M. Tekalp, Digital Video Processing. Prentice Hall, NJ, 1995.

Figure 8: The visual quality comparison of interpolated frames with frame repetition, FMCI and the proposed schemes, where (a), (b), (c) are the 78th, the 85th and the 89th frames of the original sequence, respectively, and (d) is the 85th frame interpolated by the proposed coder, (e) is the 85th frame interpolated by FMCI and (f) is the 85th frame interpolated by the frame repetition method.

[3] A. N. Netravali and J. D. Robbins, "Motion-adaptive interpolation of television frames," in Proc. Picture Coding Symposium, June 1981.
[4] H. G. Musmann, P. Pirsch, and H. J. Grallert, "Advances in picture coding," Proc. IEEE, vol. 73, pp. 523-548, Apr. 1985.
[5] T. Kuo and C.-C. J. Kuo, "Motion-compensated interpolation for low-bit-rate video quality enhancement," in Proc. SPIE Applications of Digital Image Processing, vol. 3460, pp. 277-288, 1998.
[6] Y.-K. Chen, A. Vetro, H. Sun, and S. Y. Kung, "Frame-rate up-conversion using transmitted true motion vectors," in Proc. IEEE Second Workshop on Multimedia Signal Processing, pp. 622-627, Dec. 1998.
[7] R. Castagno, P. Haavisto, and G. Ramponi, "A method for motion adaptive frame rate up-conversion," IEEE Trans. on Circuits and Systems for Video Technology, 1996.
[8] S.-C. Han and J. W. Woods, "Frame-rate up-conversion using transmitted motion and segmentation fields for very low bit-rate video coding," in Proc. IEEE International Conference on Image Processing, vol. 1, pp. 747-750, Oct. 1997.
[9] K. Kawaguchi and S. K. Mitra, "Frame rate up-conversion considering multiple motion," in Proc. IEEE International Conference on Image Processing, vol. 1, pp. 727-730, Oct. 1997.
[10] T.-Y. Kuo, J. Kim and C.-C. J. Kuo, "Motion-compensated frame interpolation scheme for H.263 codec," in Proc. IEEE International Symposium on Circuits and Systems, vol. 4, pp. 491-494, June 1999.
[11] C.-T. Chu, D. Anastassiou, and S.-F. Chang, "Hierarchical global motion estimation/compensation in low bitrate video coding," in Proc. IEEE International Symposium on Circuits and Systems, June 1997.
[12] S. Liu, Z. Yan, J. Kim and C.-C. J. Kuo, "Global/local motion-compensated frame interpolation for low bitrate video," in Proc. SPIE Image and Visual Communications Processing, Jan. 2000.
