A New Error Resilient Coding Scheme for H. 263 Video Transmission

0 downloads 0 Views 259KB Size Report
Department of Computer Science and Information Engineering, ... each macroblock are extracted and embedded into another macroblock within ..... cealed video frames, the error-free and concealed frames (the Y component) by the six.
A New Error Resilient Coding Scheme for H.263 Video Transmission* Li-Wei Kang and Jin-Jang Leou** Department of Computer Science and Information Engineering, National Chung Cheng University, Chiayi, Taiwan 621, Republic of China {lwkang, jjleou}@cs.ccu.edu.tw Abstract. For entropy-coded H.263 video frames, a transmission error in a codeword will not only affect the underlying codeword but also may affect subsequent codewords, resulting in a great degradation of the received video frames. In this study, a new error resilient coding scheme for H.263 video transmission is proposed. At the encoder, for an I frame, the important data of each macroblock are extracted and embedded into another macroblock within the I frame by the proposed odd-even data embedding scheme for I frames. For a P frame, a rate-distortion optimized coding mode selection approach is employed. The important data for each GOB (group of blocks) are extracted and embedded into the next frame by using the proposed macroblock-interleaving GOB-based data embedding scheme. At the decoder, after all the corrupted macroblocks within a video frame are detected and located, if the important data of a corrupted macroblock can be extracted correctly, the extracted important data will facilitate the employed error concealment scheme to conceal the corrupted macroblock; otherwise, the employed error concealment scheme is simply used to conceal the corrupted macroblock. Based on the simulation results, the proposed scheme can recover high-quality H.263 video frames from the corresponding corrupted video frames up to video packet loss rate = 30%.

1 Introduction For entropy-coded H.263 video frames [1]-[2], a transmission error in a codeword will not only affect the underlying codeword but also may affect subsequent codewords, resulting in a great degradation of the received video frames. To cope with the synchronization problem, each of the two top layers of the H.263 hierarchical structure, namely, picture and group of blocks (GOB), is ahead with a fixed-length start code. After the decoder receives any start code, the decoder will be resynchronized regardless of the preceding slippage. Although the propagation effect of a transmission error can be terminated when any start code is correctly received, a transmission error may affect the underlying codeword and its subsequent codewords within the corrupted GOB. Moreover, because of the use of motion-compensated interframe coding, the effect of a transmission error may be propagated to the subsequent video frames.

*

This work was supported in part by National Science Council, Republic of China under Grants NSC 90-2213-E-194-044 and NSC 90-2213-E-194-039. ** Author to whom all correspondence should be addressed. Y.-C. Chen, L.-W. Chang, and C.-T. Hsu (Eds.): PCM 2002, LNCS 2532, pp. 814-822, 2002. © Springer-Verlag Berlin Heidelberg 2002

A New Error Resilient Coding Scheme for H.263 Video Transmission

815

In general, error resilient approaches include three categories [3], namely: (1) the error resilient encoding approach [4]-[5], (2) the error concealment approach [6]-[9], and (3) the encoder-decoder interactive error control approach [10]. The foregoing error resilient approaches [3]-[10] concentrate on limiting error propagation and using the correctly received information to estimate corrupted video data. If the information of neighboring blocks are not available or the video contents between a corrupted block and its neighboring blocks are very different, the concealed results of the foregoing error resilient approaches are not good enough. Recently, several error resilient coding approaches based on data embedding are proposed [11]-[13], in which some important data useful for error concealment at the decoder can be embedded into video frames, when they are encoded at the encoder. At the decoder, the embedded data for the corrupted blocks are extracted and used to facilitate error concealment performed at the decoder. Yin, Liu, and Yu [11] embedded the block type and edge direction index of a block within an image into the DCT coefficients of another block of the image by using the odd-even embedding scheme. Song and Liu [12] proposed a data embedding scheme for error-prone channels, in which some redundant information for protecting motion vectors and coding modes of macroblocks in one frame is embedded into the motion vectors in the next frame. However, if either the next frame of a corrupted frame is also corrupted or the total number of corrupted GOBs in a corrupted frame is larger than 2, the performance of the system will degenerate greatly. In this study, a new error resilient coding scheme for H.263 video transmission is proposed. This paper is organized as follows. Error concealment for H.263 video transmission is given in Section 2. The proposed error resilient coding scheme for H.263 video transmission is addressed in Section 3. Simulation results are included in Section 4, followed by concluding remarks.

2 Error Concealment for H.263 Video Transmission In this study, both the best neighborhood matching (BNM) algorithm [6] and the spatial error concealment algorithm in TML-9 [7] are employed together to conceal H.263 intra-coded I frames. In the BNM algorithm, each corrupted block of size N×N is extracted from the video frame together with its neighborhood as a range block of size (N + m)×(N + m). Within a range block, all the pixels in the corrupted region belong to the lost part, and the others belong to the good part. After a range block is extracted, an H×L searching range block centralized with the range block within the video frame is generated. Each (N + m)×(N + m) block in the searching range block with no lost pixels may be a candidate domain block for recovery of the lost part of the range block, i.e., the corrupted block. For each candidate domain block, the mean square error (MSE) between the good part of the range block and the corresponding good part of the candidate domain block is evaluated. The candidate domain block with the minimum MSE will be determined as the best domain block, which is used to conceal the lost part of the range block by copying its corresponding central part to the lost part of the range block. On the other hand, for the spatial error concealment algorithm in TML-9 [7], each pixel value in a corrupted macroblock is formed as a

816

L.-W. Kang and J.-J. Leou

weighted sum of the closest boundary pixels of the selected four-connected neighboring macroblocks. Here, a corrupted macroblock within an H.263 I frame will be concealed by the spatial error concealment algorithm in TML-9 if (1) the immediately bottom macroblock of the current corrupted macroblock is correctly received, and (2) mT − m B > Q × σ T and mT − m B > Q × σ B , where mT, mB, σ T , and σ B are the means and the standard deviations of the pixel values in the top and bottom macroblocks, respectively, of the corrupted macroblock and Q is a prespecified constant. If any of the two conditions is not satisfied, the corrupted macroblock within an H.263 I frame will be concealed by using the BNM algorithm. On the other hand, the BNM algorithm [6] is originally developed for still images and in this study, the “motion-compensated” BNM algorithm proposed in [9] is used to conceal corrupted macroblocks in each H.263 P frame.

3 Proposed Error Resilient Scheme for H.263 Video Transmission 3.1 Error Resilient Coding for H.263 I Frames

Because the human eyes are more sensitive to the luminance component than the chrominance component, in this study, the four quantized DC values for the Y component of each macroblock in an H.263 I frame are extracted as important data, which are identically quantized by a quantization parameter QDC. QDC is set to 64 and 5 bits are required to represent each DC value, i.e., 20 bits are required to represent the four corresponding DC values. Here, the extracted important data for a macroblock within an H.263 I frame will be embedded into the DCT coefficients of another macroblock, called the masking macroblock, in the same I frame. A macroblock and its masking macroblock should be as far as possible so that both the two corresponding macroblocks will be seldom corrupted at the same time. Here, a macroblock and its masking macroblock should not be in the same GOB and the masking macroblocks of the macroblocks of a GOB should not be in the same GOB. For a macroblock, MB(i, j), 0 ≤ i ≤ 10, 0 ≤ j ≤ 8, in an H.263 QCIF I frame, its masking macroblock MB(p,q) within the same I frame is determined as:

(i ,(i + j + 2)mod 9 ) (i ,(i + j − 4 ) mod 9 )

( p ,q ) = 

if 0 ≤ i ≤ 5 , if 6 ≤ i ≤ 10.

(1)

To perform data embedding in H.263 I frames, the odd-even embedding scheme [11] operates on the quantized DCT coefficients. If the data bit to be embedded is “0,” the selected quantized DCT coefficient will be forced to be an even number. If the data bit to be embedded is “1,” the selected quantized DCT coefficient will be forced to be an odd number. Additionally, only the quantized DCT coefficients larger than a prespecified threshold, TI, are used to embed data bits. If the data bit to be embedded is bj, and the selected quantized DCT coefficient is Ci, the odd-even data embedding scheme operates as:

A New Error Resilient Coding Scheme for H.263 Video Transmission

Ci + 1 if |Ci| > TI, Ci mod 2 ≠ bj, and Ci > 0,  Ci = Ci − 1 if |Ci| > TI, Ci mod 2 ≠ bj, and Ci < 0,  C  i otherwise.

817

(2)

Note that for a macroblock within an H.263 I frame, if the extracted important data of its macroblock cannot be embedded completely into its masking macroblock, the “remaining” important data of the macroblock can be embedded into the corresponding macroblock in the next frame, with the threshold TI being replaced by another threshold TP. At the decoder, for each corrupted macroblock within an H.263 I frame, its masking macroblock is determined accordingly. If the masking macroblock is correctly received, the embedded data for the corrupted macroblock can be extracted. Then, each corrupted macroblock is firstly concealed by the employed error concealment scheme. The Y component of the firstly “concealed” macroblock is transformed to four sets of DCT coefficients by the 8×8 discrete cosine transform (DCT), and the four firstly “concealed” DC values are replaced by the four corresponding extracted DC values from the corresponding masking macroblock. The resulted four sets of DCT coefficients are transformed back to pixels by the 8×8 inverse DCT to obtain the secondly-concealed macroblock. The resulted secondly concealed macroblocks are processed by a blocking artifact reduction scheme proposed in [14]. Note that if the masking macroblock of a corrupted macroblock is also corrupted, the corrupted macroblock is concealed only by the employed error concealment scheme. 3.2 Error Resilient Coding for H.263 P Frames

For H.263 inter-coded P frames, similar to [4], a rate-distortion (RD) optimized macroblock coding mode selection approach is employed, which takes into account the network condition, including the video packet loss rate, the quantization parameter used in the encoder, the error concealment scheme used at the decoder, and the data embedding scheme used in the encoder. The Lagrangian function for macroblock encoding mode selection is given by: J = (1 – p)Dqw + pDc + λR,

(3)

where p denotes the probability of corruption of a macroblock, Dqw denotes the distortion induced by quantization and data embedding, Dc denotes the distortion induced by transmission errors and error concealment for a P frame, R denotes the bit rate used

2 to encode the macroblock, λ = 0.85 × (Q 2 ) , and Q denotes the quantization step size used for the macroblock, which has been shown to provide good RD tradeoffs [4]. Here the mode (intra, inter, or skip) minimizing Eq. (3) is selected. Additionally, a maximum intra-coded refresh period, Tmax, is imposed [5]. Inserting intra-coded macroblocks with a maximum refresh period can not only limit the temporal error propagation, but also provide a larger capacity (larger DCT coefficients) for embedding important data. In this study, the error rate of the network is defined as the video packet loss rate and a video packet is equivalently one complete GOB [4]. Hence, the

818

L.-W. Kang and J.-J. Leou

probability of corruption of a macroblock used in Eq. (3) is equivalently the video packet loss rate. For H.263 inter-coded P frames, two bits are used to represent the coding mode of a macroblock. Because the motion vector for each macroblock includes two components, i.e., MVx and MVy, if the search range for motion vectors is ±15 with half-pixel accuracy, six bits are needed for each motion vector component. Hence, the important data for each macroblock in a P frame will range from two to at most fourteen bits, depending on the mode of the macroblock is intra-coded, inter-coded, or skipped. To reduce the size of the important data required to be embedded, based on the experimental results obtained in this study, for an inter-coded macroblock, if either (1) both components of the motion vector are identically zero, or (2) the MSE (mean square error) between the concealed macroblocks using the actual motion vector and the estimated motion vector is smaller than a threshold TM, the motion vector of the intercoded macroblock should not be embedded in its masking macroblock. Here the oddeven data embedding scheme is employed, in which the thresholds for intra-coded and inter-coded macroblocks are T′I and TP, respectively. The important data extracted from the current P frame will be embedded into the next frame. Because the smallest synchronization unit in H.263 video frames is a GOB, by adopting the even-odd block-interleaving technique proposed in [15], a macroblockinterleaving GOB-based data embedding scheme is proposed for H.263 P frames, which is described as follows. Assume that two non-adjacent GOBs in frame k are denoted by GOB A and GOB B, respectively. The important data for the two GOBs are extracted. The data extracted from the even-number macroblocks of GOB A and the data extracted from the odd-number macroblocks of GOB B are interleaved by the even-odd order and concatenated to a mixed bitstream and then the bitstream is embedded into its masking GOB in the next frame (frame k + 1) by using the odd-even data embedding scheme. On the other hand, the data extracted from the odd-number macroblocks of GOB A and the data extracted from the even-number macroblocks of GOB B are also interleaved by the even-odd order and concatenated to another bitstream, and the bitstream is embedded into another masking GOB in the next frame. The distance between two masking GOBs in the next frame of the two interleaved GOBs (GOB A and GOB B) in the current frame should be as far as possible so that two or more successive corrupted video packets in the next frame will not induce two corrupted masking GOBs in the next frame. At the decoder, for each corrupted macroblock in a corrupted GOB, its corresponding pair of masking GOBs are found first, and the important data of the corrupted macroblock is extracted from the corresponding pair of masking GOBs if the pair of masking GOBs is correctly received. Then (1) if the coding mode of the corrupted macroblock is “skip,” the macroblock is concealed by copying the corresponding macroblock in the previous reconstructed frame; (2) if the coding mode of the corrupted macroblock is “inter” and its motion vector information is recovered completely from the corresponding pair of masking

A New Error Resilient Coding Scheme for H.263 Video Transmission

819

GOBs, the macroblock is concealed by copying the motion compensated macroblock in the previous reconstructed frame; (3) if the coding mode of the corrupted macroblock is “inter” and its motion vector (i) is not embedded, (ii) can not be embedded completely at the encoder, or (iii) cannot be recovered completely, the macroblock is concealed by using the employed error concealment scheme for H.263 P frames; and (4) if the coding mode of the corrupted macroblock is “intra,” the employed error concealment scheme for H.263 P frames is also employed. Traditionally, the order of concealing consecutive corrupted macroblocks is in a raster scan manner. If all the eight neighboring macroblocks of a corrupted macroblock are received correctly or well-concealed, the concealed results of the corrupted macroblock will be better. Thus, before concealing a corrupted macroblock, its 8-connected neighboring macroblocks will be checked first. If some of its 8-connected neighboring macroblocks of the corrupted macroblock are also corrupted, and these corrupted neighboring macroblocks can be concealed only with important embedded data extracted from its masking macroblock(s), these corrupted neighboring macroblocks will be concealed first. Finally, the corrupted macroblock can be concealed by using the employed error concealment scheme for H.263 P frames with more neighboring macroblock information. In this study, for a corrupted GOB, if only one of its masking GOB is corrupted, the even (or odd) macroblocks of the corrupted GOB can be concealed using the important data extracted from the “good” masking GOB first. Then the odd (or even) macroblocks can be concealed by the employed error concealment scheme with more neighboring macroblock information. Because the corresponding two masking GOBs are seldom corrupted simultaneously, the concealed results of the proposed macroblock-interleaving GOB-based data embedding scheme will be better than that of the conventional approaches.

4 Simulation Results Four QCIF test video sequences “Carphone,” “Coastguard,” “Foreman,” and “Salesman” with different video packet loss rates, denoted by VPLR, are used to evaluate the performance of the proposed scheme (denoted by proposed). Here a video packet is equivalently one complete GOB [4]. All video sequences are coded at frame rate 10 frames/second (fps) with bit rates 48kbps and 64kbps using the H.263 test model TMN-11 rate control method [2] and the peak signal to noise ratio (PSNR) is employed in this study as the objective performance measure. In this study, the constants Q in Eq. (1) and TI in Eq. (3) are set to 0.4 and 55, respectively. In the BNM algorithm and the employed error concealment scheme for H.263 P frames, N, m, H, and L are set to 16, 2, 30, and 30, respectively, for the Y component, and 8, 2, 15, and 15, respectively, for the CB and CR components. In the employed macroblock coding mode selection scheme, Tmax is set to 10 and in the proposed data embedding scheme for H.263 P frames, T′I, TP, and TM are set to 15, 1, and 10, respectively. To evaluate the performance of the proposed scheme, six existing error resilient coding and error concealment approaches for comparison [2], [7]-[8], [12] are imple-

820

L.-W. Kang and J.-J. Leou

mented in this study. They are: (1) zero-substitution, which simply replaces all pixels in a corrupted macroblock by zeros (denoted by Zero-S); (2) the zero motion vector technique, which copies the corresponding macroblock in the previous reconstructed frame (denoted by Zero-MV); (3) the error concealment method described in the H.263 test model TMN-11 (denoted by TMN-11) [2]; (4) concealment by selection, in which a neighbor matching criterion is employed to select a motion vector from a set of motion vector candidates to conceal a corrupted macroblock (denoted by Selection)

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Fig. 1. The error-free and concealed H.263 video frames (the Y component) of an I-frame (the first frame) within the “Carphone” sequence at frame rate 10 fps with the video packet loss rate = 10% and bit rate = 48kbps: (a) the error-free frame; (b)-(h) the concealed frames by Zero-S, Zero-MV, TMN-11, Selection, IBR, DEVCS, and the proposed scheme, respectively.

(a)

(b)

(c)

(d)

(e)

(f)

(g)

(h)

Fig. 2. The error-free and concealed H.263 video frames (the Y component) of a P-frame (the fourteenth frame) within the “Foreman” sequence at frame rate = 10 fps with the video packet loss rate = 30% and bit rate = 48kbps: (a) the error-free frame; (b)-(h) the concealed frames by Zero-S, Zero-MV, TMN-11, Selection, IBR, DEVCS, and the proposed, respectively.

[8]; (5) the intra block refresh technique in the encoder supported by TMN-11, in which an intra refresh rate is set to 10, and the error concealment method described in TMN-11 is used (denoted by IBR) [2]. (6) the data embedded video coding scheme in [12] (denoted by DEVCS). For simplicity, the error concealment method for I frames supported by TML-9 [7] is employed in the approaches (2)-(6) for comparison. In terms of the average PSNR of a video sequence, denoted by PSNRseq in dB, the simulation results for the “Foreman” sequence with different video packet loss rates of the six existing approaches for comparison and the proposed scheme (denoted by proposed) are listed in Table 1. As a subjective measure of the quality of the concealed video frames, the error-free and concealed frames (the Y component) by the six existing approaches for comparison and the proposed schemes are shown in Figs. 1-2. Note that the average degradation of the proposed scheme with data embedding (compared with the original H.263 algorithm) is about 0.5 dB, which is comparable with that in [12].

A New Error Resilient Coding Scheme for H.263 Video Transmission

821

Table 1. The simulation results, PSNRseq (dB), for the “Foreman” sequence at frame rate = 10fps with bit rate = 48kbps, and different video packet loss rates of the six existing error resilient coding and error concealment approaches for comparison and the proposed scheme. Without data embedding VPLR Zero- Zero- TMNSelection IBR S MV 11 0% 10% 20% 30%

28.32 28.32 6.88 23.39 5.20 21.56 5.03 20.98

28.32 23.42 21.58 21.02

28.32 23.55 22.46 21.13

28.13 25.80 24.25 23.70

With data embedding Only employed Proposed DEVCS error concealment 27.35 27.91 27.91 23.46 25.28 26.27 21.56 24.19 26.02 21.01 23.66 25.23

5 Concluding Remarks Based on the simulation results obtained in this study, several observations can be found. (1) Based on the simulation results shown in Table 1 and Figs. 1-2, the concealment results of the proposed scheme are better than that of the six existing approaches for comparison. (2) Based on the simulation results shown in Table 1, the relative performance gains of the proposed scheme over the existing approaches for comparison increase as the video packet loss rate is increased, i.e., the performance of the proposed scheme is “slightly” better than that of the six existing approaches for comparison when video packet loss rate is relatively low, whereas the performance of the proposed scheme is “much” better than that of the six existing approaches for comparison when video packet loss rate is relatively high. Compared with the six existing approaches for comparison, the proposed scheme is more robust for noisy channels with burst video packet loss. That is because for low video packet loss rate cases, burst video packet loss will seldom occur and the important data, such as the motion vector of each corrupted macroblock, can be well estimated. On the other hand, for high video packet loss rate cases, burst video packet loss will frequently occur, and the important data cannot be estimated accurately. Within the proposed scheme using the macroblock-interleaving GOB-based data embedding technique, the important data are usually available. Additionally, using macroblockinterleaving, on the average, each corrupted macroblock will have more neighboring information, resulting in better concealed video frames. The proposed scheme is simple, very efficient, and can be easily adopted in various network environments and applicable to many other block-based video compression standards, such as MPEG-2, with some necessary modification.

References 1. 2.

ITU-T. Recommendation H.263: Video coding for low bit rate communication (1998) ITU-T/SG16 Video Coding Experts Group, “Video codec test model near-term, version 11 (TMN11),” Document Q15-G16 (1999)

822 3. 4. 5.

6.

7. 8.

9. 10. 11. 12. 13.

14. 15.

L.-W. Kang and J.-J. Leou Wang, Y., Wenger, S., Wen, J., Katsaggelos, A. K.: Error resilient video coding techniques. IEEE Signal Processing Magazine. 17(4) (2000) 61-82 Cote, G., Kossentini, F.: Optimal intra coding of blocks for robust video communication over the Internet. Signal Processing: Image Communication. 15(1999) 25-34 Frossard, P., Verscheure, O.: AMISP: A complete content-based MPEG-2 error-resilient scheme. IEEE Trans. on Circuits and Systems for Video Technology. 11(9) (2001) 989998 Wang, Z., Yu, Y., Zhang, D.: Best neighborhood matching: an information loss restoration technique for block-based image coding systems. IEEE Trans. on Image Processing. 7(7) (1998) 1056-1061 ITU VCEG, H.26L Test Model Long-Term Number 9 (TML-9) draft 0 (2001) Valente, S., Dufour, C., Groliere, F., Snook, D.: An efficient error concealment implementation for MPEG-4 video streams. IEEE Trans. on Consumer Electronics. 47(3) (2001) 568-578 Kang, L. W., Leou, J. J.: A new hybrid error concealment scheme for MPEG-2 video transmission. (submitted for publication) Girod, B., Farber, N.: Feedback-based error control for mobile video transmission. Proceedings of the IEEE. 87(10) (1999) 1707-1723 Yin, P., Liu, B., Yu, H. H.: Error concealment using data hiding. Proc. of IEEE Int. Conf. on Acoustics, Speech, and Signal Processing. 3(2001) 1453-1456 Song, J., Liu, K. J. R.: A data embedded video coding scheme for error-prone channels. IEEE Trans. on Multimedia. 3(4) (2001) 415-423 Bartolini, F., Manetti, A., Piva, A., Barni, M.: A data hiding approach for correcting errors in H.263 video transmitted over a noisy channel. Proc. of IEEE Fourth Int. Workshop on Multimedia Signal Processing. Florence, Italy (2001) 65-70 Chuah, C. S., Leou, J. J.: An adaptive image interpolation algorithm for image/video processing. Pattern Recognition. 34(12) (2001) 2383-2393 Zhu, Q. F., Wang, Y., Shaw, L.: Coding and cell-loss recovery in DCT-based packet video. IEEE Trans. on Circuits and Systems for Video Technology 3(3) (1993) 248-258

Suggest Documents