
2012 IEEE International Conference on Multimedia and Expo Workshops

IMPROVING THE RATE-DISTORTION PERFORMANCE OF THE TRANSFORM DOMAIN REFINEMENT CODEC BY THE USE OF DECODER-DRIVEN ADAPTIVE MODES

Vijay Kumar, Somnath Sengupta
Electronics & Electrical Communication Engineering
Indian Institute of Technology Kharagpur, India-721302
Email: [email protected], [email protected]

978-0-7695-4729-9/12 $26.00 © 2012 IEEE
DOI 10.1109/ICMEW.2012.11

Abstract: Distributed video coding (DVC) is an emerging coding paradigm that aims at low-complexity encoders. This paper proposes a new scheme for the transform domain Wyner-Ziv (WZ) video codec, in which the key and WZ frames are encoded in multiple layers. The layers of each frame are generated by sub-sampling the 4x4 blocks in the spatial domain. After each layer is decoded, a side information (SI) refinement process is employed to improve the quality of the SI used to decode the next WZ layer. The codec performance is further improved by decoder-driven adaptive skip/WZ mode control, estimated using the refined SI and the correlation noise. The rate-distortion performance of the proposed scheme is tested on the QCIF foreman and coastguard sequences, and gains over the DISCOVER codec are reported.

Keywords: Distributed video coding, Wyner-Ziv coding, Transform domain Wyner-Ziv codecs.

I. INTRODUCTION

Distributed video coding (DVC) is a video coding paradigm in which the most computationally intensive component, the motion estimation, is shifted from the encoder to the decoder. It is based on a fundamental result of information theory from Slepian and Wolf [1], later extended by Wyner and Ziv (WZ) [2]. This shift in complexity enables new applications, including visual sensor networks, wireless video cameras and wireless video surveillance.

Most distributed video coding schemes follow the frame based approach [3], in which the sequence is split into key frames and WZ frames. Key frames are intra coded using conventional schemes and the non-key frames are WZ coded. At the decoder, the WZ frame is estimated by creating the side information (SI) from the previously decoded frames, and the errors in the SI are corrected using the parity bits sent by the encoder. The rate-distortion (RD) performance of a DVC scheme depends mainly on the quality of the SI and on the estimation of the correlation structure between the SI generated at the decoder and the original WZ frame at the encoder. The assumption of linear motion between the reference frames in SI generation leads to SI errors that require significant correction in high-motion sequences. To improve the RD performance of the DVC codec, recent schemes use the refinement approach, in which the partially decoded frame is used to improve the quality of the SI and the correlation estimate for decoding the remainder of the frame.

II. RELATED WORK

To increase the performance of the pixel domain WZ video codec, a bit plane level refinement step was proposed by Ascenso et al. [4]. In the transform domain WZ (TDWZ) video codec, band level SI refinement is carried out using the previously decoded frames as the reference in [5] and using the initial SI as the reference in [6]. Pixel domain multi-resolution refinement (MRR) was proposed by Fan et al. [7], where the refinement is carried out at the pixel level. The MRR scheme shows good RD performance compared to the bit plane based refinement scheme.

As the temporal and spatial correlation varies within a video sequence, the use of coding modes improves the RD performance compared to encoding the WZ frame uniformly in WZ mode. Estimation of the coding mode depends on the quality of the estimated SI and the correlation. In DVC, the original WZ frame and the SI frame are available at opposite ends, so the coding mode can be estimated either at the encoder or at the decoder, depending on the encoder and decoder constraints. Different architectures using different coding modes are reported in the literature. Coding mode decision schemes in DVC are mainly based on classification, since calculating the optimal mode decision is difficult when either the original WZ frame or the SI is unavailable at each end (the encoder or the decoder).

Some of the techniques are purely encoder-driven mode decision schemes, in which a reference frame is created at the encoder in order to estimate the mode. In such schemes, the RD performance depends on the quality of the SI at the encoder, which implies a trade-off between RD performance and encoder complexity. Mys et al. [8] used a block-based skip mode in the residual TDWZ codec. Liu et al. [9] proposed an iterative method to decide between intra and WZ mode on a block-by-block basis. Ascenso et al. [10] used fast motion estimation techniques to generate SI, and proposed a rate-distortion based mode decision between the intra and the WZ mode. The hybrid


Figure 1. Architecture of the proposed scheme

codec of Benierbah et al. [11] did not use key frames. Instead, the key information is used at the block level in every WZ frame: one 4x4 sub-block of every 8x8 block is intra coded and the remaining three sub-blocks are encoded in multiple layers in WZ mode. The partially decoded block is used for refining the SI at the decoder. In the above schemes, the block level hybrid encoding pattern is fixed at the encoder and the modes used are not optimal.

As an alternative to the encoder mode decision techniques, decoder-driven coding mode schemes that use the estimated SI and the correlation are reported in the literature. Generally, decoder-driven modes provide better RD performance than encoder-driven modes, as the SI is generated at the decoder. Since the original WZ frame is not available at the decoder, modes are estimated based on the information available at the decoder. In the pixel domain, an RD based WZ/skip mode decision at the bit plane level is proposed in [12]. In the transform domain WZ scheme, a decoder-driven skip mode decision based on the estimated distortion is proposed in [13]; in [14], skip mode is introduced at the coefficient level and bit plane level, based on the estimated RD cost. Decoder-driven coding modes at the block level are used in a recent paper [15], where three modes (skip, intra and WZ) are used. In [16], a decoder-driven adaptive mode decision scheme is employed for each layer of the WZ frame in the pixel domain WZ scheme, estimated using the previously decoded layers and the refined SI.

This paper proposes a TDWZ codec in which layering is applied to both the key and the WZ frames, and different coding schemes are employed for the key and the WZ frames. The layers of a frame are generated by sub-sampling the 4x4 blocks in the spatial domain, as in Fig. 2. For the key frames, layers are coded in intra and WZ modes, which are entirely encoder-driven. The RD performance of the key frame depends on the partition of the layers into intra and WZ mode and on the GOP size used. Our experiments show the RD performance of the key frame for varying GOP size and number of layers. For the WZ frames, the blocks in each layer are encoded using the 4x4 block level decoder-driven adaptive coding mode.

The rest of the paper is organized as follows. Section III introduces the refinement codec, and Section IV describes the coding scheme of the Wyner-Ziv frames with the use of adaptive coding modes. Experimental results are presented in Section V, and Section VI concludes the paper.

III. BLOCK BASED REFINEMENT CODEC IN THE TRANSFORM DOMAIN

This section presents the TDWZ coding scheme, in which the key and WZ frames are encoded in multiple layers. The encoding structures considered are KWKWKWK... and KWWWKWWWK..., corresponding to group of pictures (GOP) sizes GOP-2 and GOP-4 respectively, where K indicates a key frame and W indicates a WZ frame. The architecture of the proposed scheme is shown in Fig. 1, with the highlighted block representing the adaptive coding mode decision scheme. At the encoder, the first key frame is intra coded using the H.264 codec. The remaining key and WZ frames are encoded by the layered approach. Each frame is divided into 4x4 sub-blocks to form the layers, as in Fig. 2. The sub-blocks marked 1 are grouped together to form a sub-frame, and similarly for the sub-blocks marked 2, 3 and 4. For the key frame, one or more of these sub-frames are intra coded. The remaining sub-frames of the key frame and all the sub-frames of the WZ frame are independently encoded, layer by layer, in WZ mode. Each layer is split into 4x4 sub-blocks and the DCT is applied to each sub-block. The DCT bands of all the blocks are grouped according to their location, and each band is uniformly quantized [3]. The bit planes of each band are independently encoded using the LDPCA codes [17] with a bit plane length of 396 bits.
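The layer formation described above can be illustrated with a short sketch. It assumes that every 8x8 block contains four 4x4 sub-blocks labelled 1, 2, 3, 4 in raster order; the actual labelling is the one drawn in Fig. 2, and the function name `split_into_layers` is hypothetical.

```python
import numpy as np

def split_into_layers(frame, block=4):
    """Split a frame into four sub-frames (layers) made of 4x4 sub-blocks.

    Assumption: the four 4x4 sub-blocks of each 8x8 block are labelled
    1 2 / 3 4 in raster order. The paper's Fig. 2 defines the real layout.
    """
    h, w = frame.shape
    if h % (2 * block) or w % (2 * block):
        raise ValueError("frame dimensions must be multiples of 8")
    # tiles[i, r, j, c] == frame[block * i + r, block * j + c]
    tiles = frame.reshape(h // block, block, w // block, block)
    layers = []
    for dy in (0, 1):        # sub-block row inside the 8x8 block
        for dx in (0, 1):    # sub-block column inside the 8x8 block
            sub = tiles[dy::2, :, dx::2, :]
            layers.append(sub.reshape(h // 2, w // 2))
    return layers

# QCIF luminance frame (176x144 pixels) filled with a running index.
frame = np.arange(144 * 176).reshape(144, 176)
layers = split_into_layers(frame)
blocks_per_layer = (layers[0].shape[0] // 4) * (layers[0].shape[1] // 4)
```

Under this assumption a QCIF frame gives four 88x72 sub-frames of 396 sub-blocks each, so every DCT band of a layer holds 396 coefficients, which is consistent with the stated bit plane length of 396 bits.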


Figure 2. Block based splitting in the TDWZ refinement scheme

Figure 3. Coding structure (a) GOP-2 (b) GOP-4

The coding structure of the proposed scheme is shown in Fig. 3(a) and Fig. 3(b) for GOP-2 and GOP-4 respectively. At the decoder, the intra mode layers of the key frame are decoded first, and the partially decoded key frame is formed by placing the intra decoded blocks in their respective locations. To generate the temporal SI, motion vector refinement is carried out for each 8x8 block in the original resolution frame, using the partially decoded 8x8 block and the previously decoded key frame as the reference. For each 4x4 sub-block in the next decoding layer, spatial SI is generated by bilinear interpolation from the neighboring decoded 4x4 sub-blocks. The quality of the spatial SI is assessed by the variance of the decoded neighboring sub-blocks. The quality of the temporal SI is assessed by the mean square error (MSE) between the partially decoded block and the temporal SI. If the variance is less than the MSE and the MSE is greater than a threshold ($\theta_1$), the spatial SI is used; otherwise the temporal SI is used for decoding the 4x4 sub-block in the next layer. After the 4x4 sub-blocks are decoded, they are placed in their respective locations in the key frame to update the partially decoded key frame. Next, the temporal and spatial SI are updated using the updated partially decoded key frame and employed for decoding the next layer. The process of SI generation and decoding is repeated until all the layers are decoded.

After the key frame is decoded, the initial SI for the Wyner-Ziv frame is generated as in the scheme of [3]. After the layer-1 sub-frame is decoded, the process used for decoding the WZ layers of the key frame is employed to decode all the layers of the WZ frame. The temporal SI for the WZ frames is generated from the bidirectional reference frames. The same process is employed to decode all the WZ frames in the GOP.

A. Quality improvement of key frame

In the above key frame coding scheme, sub-blocks in WZ mode are decoded with the SI estimated using the previous key frame as the reference. The quality of the key frame can be improved after decoding the W frames that lie between successive key frames. After the W frames are decoded, the SI is refined using the previous W frames in addition to the key frame, since the W frames are temporally closer to the key frame being refined than the previous key frame is, as shown in Fig. 4. This step improves the quality of the decoded key frames and limits the accumulation of error drift.

Figure 4. Quality improvement in the key frames (a) GOP-2 (b) GOP-4
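The choice between spatial and temporal SI for a 4x4 sub-block, as described in Section III, can be sketched as follows. This is only an illustration under stated assumptions: the variance is taken over the pooled pixels of the decoded neighboring sub-blocks, the MSE is measured only on the already decoded pixels of the 8x8 block (marked by a hypothetical `known_mask`), and the function name `choose_si` is not from the paper.

```python
import numpy as np

def choose_si(neighbor_blocks, partial_block, temporal_si_block, known_mask, theta1):
    """Return "spatial" or "temporal" for one 4x4 sub-block of the next layer.

    neighbor_blocks   : list of decoded neighboring 4x4 sub-blocks
    partial_block     : partially decoded 8x8 block
    temporal_si_block : co-located 8x8 block of the temporal SI
    known_mask        : boolean 8x8 mask of already decoded pixels (assumption)
    theta1            : MSE threshold
    """
    # Spatial SI quality: variance of the decoded neighbors (pooled, assumption).
    variance = float(np.var(np.stack(neighbor_blocks).astype(np.float64)))
    # Temporal SI quality: MSE against the decoded part of the block.
    diff = (partial_block.astype(np.float64)[known_mask]
            - temporal_si_block.astype(np.float64)[known_mask])
    mse = float(np.mean(diff ** 2))
    if variance < mse and mse > theta1:
        return "spatial"
    return "temporal"

# Smooth neighborhood and a poor temporal prediction favor the spatial SI.
mask = np.zeros((8, 8), dtype=bool)
mask[:4, :4] = True
flat_neighbors = [np.full((4, 4), 100.0), np.full((4, 4), 100.0)]
decoded = np.full((8, 8), 100.0)
poor_temporal = np.full((8, 8), 110.0)   # MSE of 100 on the decoded pixels
choice = choose_si(flat_neighbors, decoded, poor_temporal, mask, theta1=50.0)
```

The rule prefers interpolation only where the neighborhood is smooth (low variance) and the motion compensated prediction is demonstrably poor (high MSE); in all other cases the temporal SI is kept.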

IV. CODING SCHEME OF WYNER-ZIV FRAME USING ADAPTIVE MODES

A. Modes in the Wyner-Ziv frame

This section presents the decoder-driven adaptive coding mode decision scheme at the block level for the WZ frame in the refinement codec. First, the skip or WZ coding mode is tested at the 8x8 block level. If the skip condition is satisfied, the entire block is skipped without coding; otherwise, part of the 8x8 block is coded in WZ mode. Let (p, q) denote the location of the 8x8 block in the WZ frame. Let $MAD_{pq}$ be the mean absolute difference of the motion compensated temporal interpolation residue (MCTIR) of the 8x8 block, and let $[MVX_{pq}, MVY_{pq}]$ be the initial motion vectors generated in the frame interpolation. The coding mode decision at the block level is made as follows:

If ($MAD_{pq} < \theta_2$ and $(|MVX_{pq}| + |MVY_{pq}|) < \theta_3$)
    8x8 block at (p, q) is skipped without encoding
Else
    layer-1 4x4 sub-block located at (p, q) is WZ coded
End

After the layer-1 sub-blocks are decoded using the corresponding mode, the temporal refined SI is generated and the mode decision is carried out for the sub-blocks in layer-2. The mode decision for the sub-blocks in layer-2 depends on the updated SI and on the information from the decoded neighboring sub-blocks belonging to the first layer. The RD calculations for the 4x4 sub-block in the skip and WZ modes are given below.

Let the refined SI coefficient of the decoded band $b$ of the 4x4 sub-block be $Y_b$. In the skip mode, the reconstruction of the band is the refined SI coefficient $Y_b$ itself. The average distortion between the original coefficient $X_b$ and the refined SI coefficient $Y_b$ is

$$D_s = \frac{1}{N_b} \sum_{b=1}^{N_b} \frac{\int_{L_b}^{U_b} \mathrm{abs}(X_b - Y_b)\, p(X_b \mid Y_b)\, dX_b}{\int_{L_b}^{U_b} p(X_b \mid Y_b)\, dX_b} \tag{1}$$

where $N_b$ is the number of decoded bands having non-zero quantization levels. If the decoding is performed using the WZ mode, the coefficient is reconstructed using the minimum mean square error (MMSE) criterion. The average distortion between the reconstructed coefficient and the original coefficient is given by

$$D_{wz} = \frac{1}{N_b} \sum_{b=1}^{N_b} \frac{\int_{L_b}^{U_b} \mathrm{abs}(X_b - Rec(Y_b))\, p(X_b \mid Y_b)\, dX_b}{\int_{L_b}^{U_b} p(X_b \mid Y_b)\, dX_b} \tag{2}$$

where $[L_b, U_b]$ is the range of the decoded bin of the $b$-th band. The conditional probability density function $p(X_b \mid Y_b)$ is assumed to be Laplacian, centered on the SI coefficient $Y_b$. Using the decoded and the refined SI coefficients, the Laplacian parameter $\alpha$ is estimated for each decoded band. Given the refined SI coefficient and $\alpha$, the rate $R_{wz}$ in the WZ mode is calculated as in [14]. The estimated rate of the sub-block is calculated as the average rate of all the decoded coefficients in the sub-block. Given the rate and the distortion, the RD costs are calculated as follows:

$$C_{wz} = R_{wz} + \lambda D_{wz} \tag{3}$$

$$C_s = \lambda D_s \tag{4}$$

where $\lambda$ is the Lagrangian multiplier. The RD costs of the WZ and skip modes are estimated for the layer-1 sub-blocks using the above equations. The RD cost of each mode (skip and WZ) for the current sub-block in layer-2 is the average of the corresponding costs of all the neighboring layer-1 sub-blocks. The mode with the minimum RD cost is the estimated mode for the current sub-block, and the mode map is communicated to the encoder. The sub-blocks in layer-2 are encoded using the corresponding mode. A similar mode decision scheme is employed for the sub-blocks in layer-3 and layer-4, using the decoded neighboring sub-blocks in the RD calculation for mode estimation.

Transmitting the mode map for each layer adds a marginal overhead to the feedback channel bit rate. As the decoding progresses, the incremental improvement in SI quality obtained from refinement becomes smaller. Therefore, the probability of a change in the mode decision between consecutive layers is low. This correlation between inter-layer mode maps is exploited to reduce the bit rate of the feedback channel.

V. EXPERIMENTAL RESULTS

We present experimental results to demonstrate the performance of the proposed scheme. The RD performance of the proposed scheme is tested with the standard QCIF sequences foreman and coastguard at 15 frames per second with GOP-2 and GOP-4. The 4x4 block quantization parameters are taken from the band level predefined quantization matrices [3].
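The mode decision of Section IV-A (the 8x8 skip test, the distortion estimates of Eqs. (1) and (2), and the RD costs of Eqs. (3) and (4)) can be sketched numerically. This is an illustrative sketch under assumptions that are not from the paper: Rec(.) is taken to be the conditional mean of the coefficient within the decoded bin, the integrals are evaluated by the midpoint rule, the rate $R_{wz}$ is supplied by the caller because its computation follows [14], ties are resolved in favor of skip, and all function names are hypothetical.

```python
import math

def skip_8x8(mad, mvx, mvy, theta2, theta3):
    """Block-level test: skip when both the residue and the motion are small."""
    return mad < theta2 and (abs(mvx) + abs(mvy)) < theta3

def laplacian(x, y, alpha):
    """Laplacian density p(x | y) centered on the SI coefficient y."""
    return 0.5 * alpha * math.exp(-alpha * abs(x - y))

def _integrate(f, lo, hi, steps=2000):
    """Midpoint-rule integral of f over [lo, hi]."""
    dx = (hi - lo) / steps
    return sum(f(lo + (i + 0.5) * dx) for i in range(steps)) * dx

def mmse_reconstruction(y, lo, hi, alpha):
    """Conditional mean of X inside the bin; used here as Rec(.) (assumption)."""
    mass = _integrate(lambda x: laplacian(x, y, alpha), lo, hi)
    return _integrate(lambda x: x * laplacian(x, y, alpha), lo, hi) / mass

def bin_distortion(point, y, lo, hi, alpha):
    """One summand of Eq. (1)/(2): expected abs(X - point) inside the bin."""
    mass = _integrate(lambda x: laplacian(x, y, alpha), lo, hi)
    return _integrate(lambda x: abs(x - point) * laplacian(x, y, alpha), lo, hi) / mass

def average_distortion(bands, use_rec):
    """bands: list of (y, lo, hi, alpha). Eq. (2) if use_rec else Eq. (1)."""
    total = 0.0
    for y, lo, hi, alpha in bands:
        point = mmse_reconstruction(y, lo, hi, alpha) if use_rec else y
        total += bin_distortion(point, y, lo, hi, alpha)
    return total / len(bands)

def rd_costs(rate_wz, d_wz, d_skip, lam):
    """Return (C_wz, C_s) as in Eqs. (3) and (4)."""
    return rate_wz + lam * d_wz, lam * d_skip

def choose_mode(neighbor_costs):
    """Average the (C_wz, C_s) pairs of the decoded neighbors, pick the cheaper mode."""
    c_wz = sum(c[0] for c in neighbor_costs) / len(neighbor_costs)
    c_s = sum(c[1] for c in neighbor_costs) / len(neighbor_costs)
    return "skip" if c_s <= c_wz else "wz"

# One band whose SI coefficient lies outside the decoded bin [0, 16].
bands = [(-4.0, 0.0, 16.0, 0.2)]
d_s = average_distortion(bands, use_rec=False)
d_wz = average_distortion(bands, use_rec=True)
mode = choose_mode([rd_costs(rate_wz=2.0, d_wz=d_wz, d_skip=d_s, lam=1.0)])
```

In this toy case the SI coefficient is far from the decoded bin, so the skip distortion is much larger than the WZ distortion; whether that justifies the extra rate depends on the chosen $\lambda$.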

The layered scheme is analyzed individually for the key and the WZ frames. The layered coding of the key frame is compared with the key frame encoded in H.264 intra mode. The RD performance comparison is plotted for the foreman and coastguard sequences in Fig. 5 and Fig. 6, with L=1 and L=2 layers of the key frame intra coded, for GOP-2 and GOP-4; all the layers of the WZ frame are encoded in WZ mode. The rate and distortion in the plots are calculated only for the key frame.

For the foreman sequence with GOP-2 and L=1, a gain of about 0.5 dB is obtained, whereas with L=2 the intra and the hybrid coding schemes are almost the same, with a gain of about 0.2 dB. In GOP-4, partly intra coding and partly WZ coding the key frame results in a loss of about 0.7 dB with L=1 and 1 dB with L=2, compared to full intra coding of the key frame. This loss is due to the high cost of coding the sub-parts of the key frame and the larger distance between the key frames. For the coastguard sequence with GOP-2 and L=2, a gain of about 0.5 dB is obtained, whereas with L=1 the RD curve follows that of H.264 intra. In GOP-4, the scheme results in a loss of about 0.3 dB with L=1 and 0.9 dB with L=2.

Figure 5. Rate-Distortion performance comparison of the key frame, foreman sequence

Figure 6. Rate-Distortion performance comparison of the key frame, coastguard sequence

Figure 7. Rate-Distortion performance comparison in GOP-2, foreman sequence

Figure 8. Rate-Distortion performance comparison in GOP-4, foreman sequence

Figure 9. Rate-Distortion performance comparison in GOP-2, coastguard sequence

Figure 10. Rate-Distortion performance comparison in GOP-4, coastguard sequence

To evaluate the performance of the layered coding of the WZ frames with decoder-driven adaptive coding modes, the RD performance is obtained for the following codecs:

1. TDWZ Refinement codec: Key frames are encoded using the H.264 intra coding scheme. WZ frames are split into layers by sub-sampling the 4x4 blocks as given in this paper and decoded layer by layer as described in Section III.
2. TDWZ Refinement codec with adaptive coding of the WZ frames: Key frames are encoded using the H.264 intra coding scheme. WZ frames are encoded by the decoder-driven coding scheme explained in Section IV.
3. DISCOVER codec [3].
4. Refinement based scheme [6].

The RD performance comparison for the foreman sequence is shown in Fig. 7 and Fig. 8 for GOP-2 and GOP-4. The spatial splitting TDWZ Refinement scheme without coding modes gives a gain of about 1.3 dB over the DISCOVER codec for GOP-2 and 2.2 dB for GOP-4. The adaptive coding modes used in the WZ frame give a further gain of 1.2 dB over the refinement codec for GOP-2 and 0.6 dB for GOP-4. The RD performance plots for the coastguard sequence with GOP-2 and GOP-4 are shown in Fig. 9 and Fig. 10. Since a better initial SI is generated for this sequence, the gain due to the refinement approach is smaller: about 0.5 dB and 1 dB are obtained over the DISCOVER codec. With the use of the adaptive modes, gains of 1.2 dB and 2 dB are obtained over the DISCOVER codec for GOP-2 and GOP-4.

Although the scheme looks promising, it comes at the cost of increased decoder complexity, because the layered structure with refinement is used for both the key and the WZ layers, whereas the layered approach is used only for the WZ frames in [6, 7]. Since the feedback channel is already used for requesting the parity bits for each layer, the additional overhead of transmitting the mode map for each layer is small. The bit rate overhead due to the transmission of the mode map is about 3-7 kbps. Compared to the actual rate spent on coding the frames, the feedback channel rate for the mode decision is very small.

VI. CONCLUSION

In this paper, we presented a layered TDWZ refinement codec for coding the key and WZ frames. The layers are generated by sub-sampling the 4x4 blocks in the spatial domain. An adaptive coding mode decision scheme for coding the WZ frame is introduced in the refinement codec. In our future work, we intend to combine the band level [14] and block level mode decision schemes for coding the WZ frames, in order to improve the RD performance, and to improve the coding scheme of the key frame.

REFERENCES

[1] D. Slepian and J. Wolf, "Noiseless coding of correlated information sources," IEEE Trans. Inf. Theory, vol. 19, no. 4, pp. 471-480, July 1973.

[2] A. Wyner and J. Ziv, "The rate distortion function for source coding with side information at the decoder," IEEE Trans. Inf. Theory, vol. 22, no. 1, pp. 1-10, 1976.

[3] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: architecture, techniques and evaluation," in Proc. of PCS 2007, Lisbon, Portugal, November 2007.

[4] J. Ascenso, C. Brites and F. Pereira, "Motion Compensated Refinement for Low Complexity Pixel Based Distributed Video Coding," International Conf. on Advanced Video and Signal Based Surveillance, Sep. 2005.

[5] S. Ye, M. Ouaret, F. Dufaux and T. Ebrahimi, "Improved Side Information Generation for Distributed Video Coding by Exploiting Spatial and Temporal Correlations," EURASIP Journal on Image and Video Processing, vol. 2009, 15 pages, 2009.

[6] R. Martins, C. Brites, J. Ascenso, and F. Pereira, "Refining Side Information for Improved Transform Domain Wyner-Ziv Video Coding," IEEE Transactions on Circuits and Systems for Video Technology, vol. 19, no. 9, Sep. 2009.

[7] X. Fan, O. C. Au, N. M. Cheung, and J. Zhou, "Successive refinement based Wyner-Ziv video compression," Signal Processing: Image Communication, vol. 25, no. 1, pp. 47-63, 2010.

[8] S. Mys, J. Slowack, J. Skorupa, P. Lambert and R. Van de Walle, "Introducing skip mode in distributed video coding," Signal Processing: Image Communication, vol. 24, no. 3, pp. 200-213, 2009.

[9] L. Liu, D. K. He, A. Jagmohan, L. Lu and E. J. Delp, "A low-complexity iterative mode selection algorithm for Wyner-Ziv video compression," IEEE International Conference on Image Processing (ICIP), pp. 1136-1139, Oct. 2008.

[10] J. Ascenso and F. Pereira, "Low complexity intra mode selection for efficient distributed video coding," International Conference on Multimedia and Expo (ICME), pp. 101-104, July 2009.

[11] S. Benierbah and M. Khamadja, "Hybrid Wyner-Ziv and intra video coding with partial matching motion estimation at the decoder," IEEE International Conference on Image Processing (ICIP), pp. 2925-2928, 7-10 Nov. 2009.

[12] W. J. Chien and L. J. Karam, "BLAST-DVC: Bitplane selective distributed video coding," Multimedia Tools and Applications, July 2009, DOI: 10.1007/s11042-009-0314-8.

[13] J. Slowack, J. Skorupa, S. Mys, P. Lambert, C. Grecos and R. Van de Walle, "Distributed video coding with decoder-driven skip," Proceedings of the Mobimedia, Sep. 2009.

[14] J. Slowack, S. Mys, J. Skorupa, N. Deligiannis, P. Lambert, A. Munteanu and R. Van de Walle, "Rate-distortion driven decoder-side bit plane mode decision for distributed video coding," Signal Processing: Image Communication, vol. 25, no. 9, pp. 660-673, 2010.

[15] S. Mys, J. Slowack, J. Skorupa, N. Deligiannis, P. Lambert, A. Munteanu and R. Van de Walle, "Decoder-driven mode decision in a block-based distributed video codec," Multimedia Tools and Applications, 2011, DOI: 10.1007/s11042-010-0718-5.

[16] V. Kumar and S. Sengupta, "Decoder driven multi resolution side information refinements and mode decisions for improved rate-distortion performance in distributed video coding," in IEEE International Conference on Multimedia and Expo, Spain, 2011.

[17] D. Varodayan, A. Aaron, and B. Girod, "Rate-Adaptive Codes for Distributed Source Coding," EURASIP Signal Processing, vol. 86, no. 11, pp. 3123-3130, Nov. 2006.
