2011 International Conference on Instrumentation, Communication, Information Technology and Biomedical Engineering 8-9 November 2011, Bandung, Indonesia

Wyner-Ziv Video Coding With Improved Motion Field Using Bicubic Interpolation

I Made Oka Widyantara1, Wirawan2 and Gamantyo Hendrantoro3

1 Department of Electrical Engineering, Universitas Udayana, Denpasar, Bali
1,2,3 Department of Electrical Engineering, Institut Teknologi Sepuluh Nopember (ITS), Surabaya, Indonesia
e-mail: 1 [email protected], 2 [email protected], 3 [email protected]

Abstract* - Wyner-Ziv video coding (WZVC) is a new video coding paradigm based on the Slepian-Wolf and Wyner-Ziv theorems. Unlike a standard video codec, a Wyner-Ziv video codec estimates motion at the decoder in order to simplify the encoder. Among the motion estimation methods applied so far, the Expectation-Maximization (EM) algorithm is the most effective. This algorithm is block based: all pixels within a block share a common motion probability distribution, which leads to less effective side information generation. This paper proposes a method that refines the motion field to pixel precision using Bicubic interpolation. Compared with the existing WZVC codec, the proposed method reduces decoding complexity (by up to 9.49% in total decoding time in our experiments) while giving similar RD performance.

Keywords: Wyner-Ziv video coding, Bicubic interpolation, Expectation-Maximization, motion estimation

I. INTRODUCTION

Wyner-Ziv video coding (WZVC) is a new paradigm in video coding [1], based on the information-theoretic results of Slepian and Wolf [2] and of Wyner and Ziv [3] from the 1970s. These results show that separate encoding with joint decoding (distributed compression) can achieve performance similar to joint encoding and joint decoding, as long as correlated side information (SI) is available at the decoder. SI quality strongly influences the compression efficiency of WZVC: in general, better SI leads to better rate-distortion (RD) performance. However, because SI is generated at the decoder with minimal information about the source, accurate SI estimation is a difficult task. Reference [4] confirms that the main task in improving WZVC performance is to obtain the best possible SI at the decoder.

Many practical methods to generate SI have been proposed, for example motion-compensated interpolation (MCI) and motion-compensated extrapolation (MCE), which were adopted in the classical pixel-domain WZVC [1]. The key idea of these methods is to predict the current frame by motion estimation on previously decoded frames. However, decoded frames carry only limited information. Moreover, since blind motion estimation does not use any information about the current frame to be encoded, the SI is generally not accurate.

* ACKNOWLEDGMENT: This work is supported by the 2011 Doctoral Dissertation Research Grant from the Indonesian Ministry of National Education under contract No. 636/IT2.6/KU/2011. The doctoral study of Mr. I M.O. Widyantara is supported by a BPPS scholarship from the same sponsor.

978-1-4577-1166-4/11/$26.00 ©2011 IEEE

The DISCOVER codec follows this WZVC architecture and improves SI generation to increase codec performance [5]. In this codec, the motion field within MCI is estimated bidirectionally (backward and forward motion estimation), followed by spatial motion smoothing to reduce blocking effects. More recently, a more accurate SI generation technique, based on motion vector learning with the Expectation-Maximization (EM) algorithm, has been proposed for the WZVC codec [6]. With this method, the decoder learns the forward motion vectors iteratively, using the bit stream of the Wyner-Ziv (WZ) frame and one previously reconstructed frame as reference. The decoder keeps updating the motion field to generate accurate SI, in order to lower the number of syndrome bits required by the LDPC decoder.

In the EM algorithm, the estimate of the motion field M (more precisely, of its probability distribution) is block based rather than frame based, to avoid the high computational cost caused by the large number of possible values of M. If a frame is divided into k x k blocks, all pixels within a block share the same motion field probability distribution (Papp{Mu,v}). Using a single Papp{Mu,v} for all pixels within a k x k block is equivalent to Nearest Neighbor (NN) interpolation of a block-by-block motion field onto a pixel-by-pixel motion field. Reference [7] showed that using NN interpolation to raise the motion field resolution from block based to pixel based produces unnatural step-like transitions in the motion field profile. As a result, motion compensation yields less accurate SI. To improve the motion field profile, an interpolation technique that smooths the block transitions is needed.

The existing WZVC codec in [6], which uses Bilinear interpolation, produces better RD performance than the MCI method. The authors have analyzed the RD performance and decoding complexity of NN and Bilinear interpolation in the existing WZVC codec [8]. The results showed that Bilinear interpolation is better for video sequences with high and complex motion content. An alternative improvement of the motion field probability estimation, which refines the search mechanism using a diamond search together with bilinear interpolation, was proposed in [9]. This method reduces EM complexity with RD performance close to that of the existing WZVC codec.
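The step-like transitions attributed to NN interpolation can be illustrated in one dimension. The following is a minimal sketch, not the codec's implementation: the function names, the block values and the pixel-centre mapping from pixel to block coordinates are all assumptions made for illustration.

```python
def upsample_nearest(block_values, k):
    """Give every pixel of a block the value of that block (NN interpolation)."""
    return [v for v in block_values for _ in range(k)]


def upsample_linear(block_values, k):
    """Linearly interpolate between block centres; border pixels are clamped."""
    n = len(block_values)
    out = []
    for x_d in range(n * k):
        # Assumed convention: map the pixel centre onto the block grid.
        x_s = (x_d + 0.5) / k - 0.5
        x_s = min(max(x_s, 0.0), n - 1.0)
        i0 = min(int(x_s), n - 2) if n > 1 else 0
        i1 = min(i0 + 1, n - 1)
        frac = x_s - i0
        out.append((1.0 - frac) * block_values[i0] + frac * block_values[i1])
    return out


def largest_jump(values):
    """Largest difference between two neighbouring pixels."""
    return max(abs(b - a) for a, b in zip(values, values[1:]))


# Hypothetical probabilities that three neighbouring blocks assign to one motion vector.
blocks = [0.1, 0.5, 0.2]
k = 4
nn = upsample_nearest(blocks, k)
lin = upsample_linear(blocks, k)
print("NN     largest jump:", largest_jump(nn))
print("linear largest jump:", largest_jump(lin))
```

With NN interpolation the whole change between two blocks happens at a single pixel boundary, whereas the interpolated profile spreads it over k pixels.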


To improve the performance of the existing WZVC codec, this paper proposes the Bicubic interpolation technique to refine the motion field probability distribution to pixel precision, after the block-based motion field probability estimation. The method is analyzed with respect to two performance parameters of the codec: RD performance and decoder complexity. The goal is to obtain an efficient interpolation technique that enhances the performance of the existing EM-based WZVC codec.

This paper is organized as follows. Section II reviews unsupervised forward motion vector learning based on the EM algorithm for the WZVC codec. Section III reviews the Bicubic interpolation technique used to interpolate a block-by-block motion field onto a pixel-by-pixel motion field in the EM algorithm. Section IV evaluates the method within the existing transform-domain WZVC codec, and Section V presents the conclusions.

Fig. 1. Motion field interpolation in EM-based unsupervised forward motion vector learning.

II. WYNER-ZIV CODING OF VIDEO WITH UNSUPERVISED FORWARD MOTION VECTOR LEARNING VIA EM

As shown in Fig. 1, X (the current Wyner-Ziv frame) is related to Ŷ (the reference frame) through a forward motion field M. The WZVC decoder has to compute the probability distribution of the motion field, P{M}, with only partial information about X or none at all. The EM algorithm is applied to compute P{M} because it can estimate parameters without prior information or complete observation data. In addition, the EM algorithm iteratively increases the likelihood function, so that the estimate converges to a (locally) optimal value. The EM algorithm has also been applied to estimate the disparity probability distribution in stereo image coding [10].

At the encoder, X is encoded into syndrome bits (S) by an LDPC encoder. These bits are sent incrementally to the decoder on request. At the decoder, Ŷ is available as the previously reconstructed frame. An iterative learning algorithm based on EM learns the motion field from Ŷ and S. Each iteration consists of an E-step and an M-step. In the E-step, the motion estimator updates the estimated block-based motion field probability distribution (Papp{Mu,v}). At iteration t, before normalization, the update is given by:

$$P_{app}^{(t)}\{M_{u,v}\} := P_{app}^{(t-1)}\{M_{u,v}\}\, P\{\hat{Y}_{(u,v)+M_{u,v}} \mid M_{u,v};\, \theta_{u,v}^{(t-1)}\} \tag{1}$$

where Ŷ(u,v)+Mu,v is the k-by-k block of Ŷ with its top-left pixel at ((u,v)+Mu,v). Note that P{Ŷ(u,v)+Mu,v|Mu,v;θu,v(t-1)} is the probability of observing Ŷ(u,v)+Mu,v given that it was generated through vector Mu,v from Xu,v as parameterized by θu,v(t-1). This procedure, shown on the left of Fig. 2, takes place in the block-based motion estimator.

Fig. 2. E-step block-based motion estimator (left) and probability model (right)

Next, the motion field interpolator raises the resolution of the motion field probability distribution from block based to pixel based. The existing WZVC codec uses Bilinear interpolation to smooth the motion field profile, using 4 (four) motion field probability distributions (Papp{Mu,v}) for each pixel within a k x k block. The probability model iteratively updates the soft SI distribution (ψ) by blending information from the pixels in Ŷ according to the updated pixel-based motion field distribution, as shown on the right of Fig. 2. More generally, the probability that the blended SI has value ω at pixel (i, j) is

$$\psi^{(t)}(i,j,\omega) = \sum_{m} P_{app}^{(t)}\{M=m\}\, P\{X(i,j)=\omega \mid M=m, \hat{Y}\} = \sum_{m} P_{app}^{(t)}\{M=m\}\, p_Z\big(\omega - \hat{Y}_m(i,j)\big) \tag{2}$$

where pZ(z) is the probability mass function of the independent additive noise Z, and Ŷm is the previously reconstructed frame compensated through motion configuration m. In the M-step, the LDPC decoder computes the soft estimate θ(t) from the soft SI (ψ(t)) generated in the E-step:

$$\theta^{(t)}(i,j,\omega) := \psi^{(t)}(i,j,\omega) \prod_{g=1}^{d} \left(\alpha_g^{(t)}\right)^{1[\omega_g = 1]} \left(1 - \alpha_g^{(t)}\right)^{1[\omega_g = 0]} \tag{3}$$

where ωg denotes the gth bit in Gray mapping of luminance value ω and 1[.] denotes the indicator function.
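One pass through equations (1) to (3) can be sketched for a single block and a single pixel as follows. This is an illustrative sketch under stated assumptions, not the codec of [6]: the block likelihoods are supplied as plain numbers, α_g is read as the decoder's estimate that bit g equals 1, the noise weight is an unnormalized discrete Laplacian, each distribution is normalized explicitly, and all names and numbers are hypothetical.

```python
import math


def gray_bits(value, d):
    """Bits of the Gray code of `value`, most significant bit first."""
    gray = value ^ (value >> 1)
    return [(gray >> (d - 1 - g)) & 1 for g in range(d)]


def normalize(weights):
    total = sum(weights)
    return [w / total for w in weights]


def e_step_update(prior, likelihood):
    """Eq. (1): previous motion pmf times block likelihood, then normalization."""
    return normalize([p * l for p, l in zip(prior, likelihood)])


def laplacian_weight(z, scale):
    """Unnormalized Laplacian weight of noise value z (assumed noise model)."""
    return math.exp(-abs(z) / scale)


def soft_side_info(p_app, compensated, levels, scale):
    """Eq. (2): blend the motion-compensated pixel values, weighted by the motion pmf."""
    psi = [sum(p * laplacian_weight(w - y, scale) for p, y in zip(p_app, compensated))
           for w in range(levels)]
    return normalize(psi)


def m_step_fuse(psi, alpha, d):
    """Eq. (3): weight psi by the bit estimates, alpha[g] = estimate of P(bit g = 1)."""
    theta = []
    for w, p in enumerate(psi):
        for g, bit in enumerate(gray_bits(w, d)):
            p *= alpha[g] if bit == 1 else 1.0 - alpha[g]
        theta.append(p)
    return normalize(theta)


# Hypothetical example: three motion candidates, 2-bit luminance values (0..3).
prior = [0.5, 0.25, 0.25]        # Papp at iteration t-1
likelihood = [0.1, 0.8, 0.1]     # P{Y_block | M; theta} per candidate
compensated = [0, 2, 3]          # value of Y_m(i, j) under each candidate

p_app = e_step_update(prior, likelihood)
psi = soft_side_info(p_app, compensated, levels=4, scale=0.5)
theta = m_step_fuse(psi, alpha=[0.9, 0.9], d=2)
print("Papp :", [round(p, 3) for p in p_app])
print("psi  :", [round(p, 3) for p in psi])
print("theta:", [round(p, 3) for p in theta])
```

In this toy setting the second candidate dominates after the E-step, so the soft SI concentrates on the value it predicts, and the bit estimates sharpen that preference further.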


In this process, the LDPC decoder applies joint bitplane LDPC decoding to make the best use of both the soft SI (ψ(t)) and the syndrome (S). For the details of the joint bitplane decoding algorithm, please refer to [6]. The M-step also generates a hard estimate X̂ by taking, for every pixel, the most probable value according to the soft estimate (θ(t)):

$$\hat{X}(i,j) = \arg\max_{\omega}\, \theta^{(t)}(i,j,\omega) \tag{4}$$

The decoder iterates through the E-step and the M-step; if the estimates do not converge, the LDPC decoder requests more syndrome bits. The algorithm terminates when the hard estimate X̂ yields syndromes identical to the received syndrome (S).
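The hard decision and the termination test can be sketched as below. This is a toy illustration only: the three-row parity structure, the 2-bit values and all function names are hypothetical stand-ins, whereas the codec itself relies on the joint bitplane LDPC decoder of [6].

```python
def gray_bits(value, d):
    """Bits of the Gray code of `value`, most significant bit first."""
    gray = value ^ (value >> 1)
    return [(gray >> (d - 1 - g)) & 1 for g in range(d)]


def hard_estimate(theta_pixel):
    """Most probable value of one pixel given its soft estimate."""
    return max(range(len(theta_pixel)), key=lambda w: theta_pixel[w])


def syndrome(bits, parity_rows):
    """Parity (mod 2) of the bit positions listed in each row."""
    return [sum(bits[i] for i in row) % 2 for row in parity_rows]


def check_termination(theta, parity_rows, received, d):
    """Return the hard estimate and whether its syndrome matches the received one."""
    x_hat = [hard_estimate(t) for t in theta]
    bits = [b for value in x_hat for b in gray_bits(value, d)]
    return x_hat, syndrome(bits, parity_rows) == received


# Hypothetical toy code over 4 bits (two pixels, 2 bits each).
PARITY_ROWS = [[0, 1], [1, 2], [2, 3]]
received = [0, 1, 0]   # syndrome of the true pixel values (2, 0)

good_theta = [[0.1, 0.2, 0.6, 0.1], [0.7, 0.1, 0.1, 0.1]]
poor_theta = [[0.1, 0.6, 0.2, 0.1], [0.7, 0.1, 0.1, 0.1]]

x_good, done_good = check_termination(good_theta, PARITY_ROWS, received, d=2)
x_poor, done_poor = check_termination(poor_theta, PARITY_ROWS, received, d=2)
print(x_good, done_good)   # matching syndrome: decoding can stop
print(x_poor, done_poor)   # mismatch: keep iterating or request more syndrome bits
```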

Fig. 3. Bicubic interpolation scheme

Fig. 4. Architecture of transform-domain WZVC codec

III. THE PROPOSED ALGORITHM

To improve the accuracy of the soft SI (ψ(t)), this paper proposes Bicubic interpolation as a method to raise the motion field resolution while avoiding boundary effects and smoothing the transitions between blocks. As shown in Fig. 1, once the block-based motion field probability estimation has finished, the motion field resolution is increased by interpolating the motion field probability distribution from block to pixel. We adopt the Bicubic interpolation technique described in the documentation of the Intel IPP library version 6 [11] and call the corresponding library function from our source code.

Let Mu,v(xS,yS) denote the block-by-block motion field and Mi,j(xD,yD) the pixel-by-pixel motion field. The Bicubic interpolation algorithm uses the 16 probability distributions Papp{Mu,v(xS,yS)} closest to position (xS,yS) in the block-based motion field, i.e.:

$$\begin{aligned} x_{S0} &= \mathrm{int}(x_S) - 1, & x_{S1} &= x_{S0} + 1, & x_{S2} &= x_{S0} + 2, & x_{S3} &= x_{S0} + 3, \\ y_{S0} &= \mathrm{int}(y_S) - 1, & y_{S1} &= y_{S0} + 1, & y_{S2} &= y_{S0} + 2, & y_{S3} &= y_{S0} + 3 \end{aligned} \tag{5}$$

where
• (xD,yD) are the (integer) coordinates in the pixel-based motion field,
• (xS,yS) are the coordinates of the position in the block-based motion field that maps exactly to (xD,yD),
• Papp{Mu,v(x,y)} is the block-based motion field probability distribution,
• Papp{Mi,j(x,y)} is the pixel-based motion field probability distribution.

First, for each ySk, the algorithm defines 4 (four) cubic polynomials F0(x), F1(x), F2(x) and F3(x):

$$F_k(x) = a_k x^3 + b_k x^2 + c_k x + d_k, \qquad 0 \le k \le 3 \tag{6}$$

such that

$$\begin{aligned} F_k(x_{S0}) &= P_{app}\{M_{u,v}(x_{S0}, y_{Sk})\}, & F_k(x_{S1}) &= P_{app}\{M_{u,v}(x_{S1}, y_{Sk})\}, \\ F_k(x_{S2}) &= P_{app}\{M_{u,v}(x_{S2}, y_{Sk})\}, & F_k(x_{S3}) &= P_{app}\{M_{u,v}(x_{S3}, y_{Sk})\} \end{aligned} \tag{7}$$

In Fig. 3, these polynomials are shown by solid curves. Next, the algorithm defines one cubic polynomial Fy(y) such that:

$$F_y(y_{S0}) = F_0(x_S), \quad F_y(y_{S1}) = F_1(x_S), \quad F_y(y_{S2}) = F_2(x_S), \quad F_y(y_{S3}) = F_3(x_S) \tag{8}$$
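The two-pass construction of equations (5) to (8), evaluated at (xS, yS), can be sketched in plain Python as follows. This is an illustration under stated assumptions rather than the Intel IPP routine used by the authors: `field[y][x]` stands for the probability that block (x, y) assigns to one fixed motion candidate, samples outside the field are taken from the nearest border block, and the pixel-centre mapping in `pixel_to_block` is an assumed convention. All names are hypothetical.

```python
def cubic_through(nodes, values, x):
    """Evaluate at x the unique cubic through four points (Lagrange form), cf. eq. (6)-(7)."""
    total = 0.0
    for i in range(4):
        weight = 1.0
        for j in range(4):
            if j != i:
                weight *= (x - nodes[j]) / (nodes[i] - nodes[j])
        total += weight * values[i]
    return total


def pixel_to_block(coord_d, k):
    """Assumed mapping of a pixel coordinate onto the block grid (pixel centres)."""
    return (coord_d + 0.5) / k - 0.5


def bicubic(field, x_s, y_s):
    """Interpolate field[y][x] at the block-grid position (x_s, y_s)."""
    rows, cols = len(field), len(field[0])
    # Eq. (5): the 4 x 4 neighbourhood around (x_s, y_s).
    x_nodes = [int(x_s) - 1 + n for n in range(4)]
    y_nodes = [int(y_s) - 1 + n for n in range(4)]

    def sample(x, y):
        # Assumption: replicate the border blocks outside the field.
        return field[min(max(y, 0), rows - 1)][min(max(x, 0), cols - 1)]

    # F_k(x_S) for each of the four rows y_Sk.
    row_values = [cubic_through(x_nodes, [sample(x, y) for x in x_nodes], x_s)
                  for y in y_nodes]
    # Eq. (8): cubic through the four row results, evaluated at y_S.
    return cubic_through(y_nodes, row_values, y_s)


# A plane is reproduced exactly by cubic interpolation in the interior.
plane = [[2 * x + 3 * y for x in range(6)] for y in range(6)]
value_on_plane = bicubic(plane, 1.5, 2.25)

# A ridge of ones between zeros: the interpolated value exceeds 1.
ridge = [[0.0, 1.0, 1.0, 0.0] for _ in range(4)]
overshoot = bicubic(ridge, 1.5, 1.5)
print(value_on_plane, overshoot)
```

Because two of the four cubic weights can be negative, the interpolated value in the second example lies slightly above 1. An implementation that treats the result as a probability may therefore need to clip and renormalize; the source does not state whether such a step is applied.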

The polynomial Fy(y) is represented by the dashed curve in Fig. 3. Finally, the value of the probability distribution Papp{Mi,j(xD,yD)} is set to Fy(yS).

IV. EXPERIMENTS AND RESULTS

The procedure of motion field interpolation in EM-based unsupervised forward motion vector learning is shown in Fig. 1. We implement it in the existing transform-domain WZVC codec of [6], whose architecture is shown in Fig. 4. The codec divides the video sequence into groups of pictures (GOP) of fixed size. The first frame of each GOP is coded as a key frame using the JPEG standard and decoded without any reference to SI. The subsequent frames of the GOP (called WZ frames) are coded


according to Fig. 4, using the previously reconstructed frame as decoder reference. We use two video sequences, Foreman and Carphone, listed in order of decreasing motion complexity. Both sequences are in QCIF format at a frame rate of 15 Hz, as in previous research [6]. Each QCIF-size WZ frame is divided into 4 (four) quadrants, and each quadrant is encoded separately using the corresponding quadrant of the previously reconstructed frame as decoder reference.

Each quadrant of a WZ frame is transformed using a block-based k-by-k DCT to exploit the spatial redundancy within the quadrant. The transform coefficients are then quantized uniformly using JPEG quantization with scaling factors Qf = 0.5, 1, 2 and 4 [12]. These scaling factors correspond to JPEG quality indices Qi = 75, 50, 25 and 13, respectively, so that the greater the scaling factor, the coarser the quantization. The quantization indices are input to the LDPC encoder to generate the syndrome (S).

As for the DCT, we use a block size of 8 x 8 pixels for the block-based motion estimator, the motion field interpolation, and the probability model. To estimate the block-based motion field in the motion estimator and the probability model, we use a motion search range of ± 10 pixels horizontally and vertically. Motion field estimation generates a block-based probability distribution with a resolution of 11 x 9 samples. Bicubic interpolation of this block-by-block motion field onto a pixel-by-pixel motion field then generates the pixel-based motion field probability distribution (Papp{Mi,j}) at the full 88 x 72 pixel resolution.

Rate control is implemented using rate-adaptive regular degree 3 LDPC accumulate codes of length 50688 bits as a platform for the joint bitplane systems. In these experiments, the EM algorithm at the decoder is initialized with a good value for the variance of the Laplacian noise Z and with experimentally chosen distributions for the motion vectors (Mu,v), the same as those used in [6], that is:

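$$P_{app}^{(t)}\{M_{u,v}\} = \begin{cases} \left(\tfrac{3}{4}\right)^2, & \text{if } M_{u,v} = (0,0) \\ \tfrac{3}{4} \cdot \tfrac{1}{80}, & \text{if } M_{u,v} = (0,*),\ (*,0) \\ \left(\tfrac{1}{80}\right)^2, & \text{otherwise} \end{cases} \tag{9}$$

Fig. 5. RD curves for GOP size 2: (a) Foreman, (b) Carphone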
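The numbers in this setup can be cross-checked with a few lines of exact arithmetic. The sketch below confirms that the distribution of equation (9) sums to one over the ± 10 pixel search range, and that an 88 x 72 quadrant of a QCIF frame (176 x 144 pixels) contains 11 x 9 blocks of 8 x 8 pixels. The variable names are hypothetical, and the reading of 50688 as 8 bits per coefficient is an observation about the arithmetic, not a statement from the source.

```python
from fractions import Fraction

SEARCH_RANGE = 10   # +/- 10 pixels horizontally and vertically
BLOCK = 8           # block size in pixels
QCIF = (176, 144)   # QCIF frame size in pixels (width, height)


def initial_prior(mx, my):
    """Initial motion vector probability of eq. (9)."""
    if mx == 0 and my == 0:
        return Fraction(3, 4) ** 2
    if mx == 0 or my == 0:
        return Fraction(3, 4) * Fraction(1, 80)
    return Fraction(1, 80) ** 2


candidates = [(mx, my)
              for mx in range(-SEARCH_RANGE, SEARCH_RANGE + 1)
              for my in range(-SEARCH_RANGE, SEARCH_RANGE + 1)]
total = sum(initial_prior(mx, my) for mx, my in candidates)

quadrant = (QCIF[0] // 2, QCIF[1] // 2)                  # four quadrants per frame
block_grid = (quadrant[0] // BLOCK, quadrant[1] // BLOCK)
# Observation only: 50688 equals the number of coefficients times 8.
bits_if_8_per_coefficient = quadrant[0] * quadrant[1] * 8

print(len(candidates), total)                            # 441 candidates, total probability 1
print(quadrant, block_grid, bits_if_8_per_coefficient)   # (88, 72) (11, 9) 50688
```

The prior factorizes per component: each of the 20 non-zero displacements along one axis receives 1/80 and the zero displacement receives 3/4, which is why the total is exactly one.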

After 50 EM decoding iterations, if the reconstructed X̂ still does not satisfy the syndrome condition, the decoder requests an additional incremental transmission from the encoder via a feedback channel.

A. Analysis of RD Performance

This section presents the RD performance of the proposed WZVC codec in comparison with the existing WZVC codec, for GOP size 2. Fig. 5 indicates that the WZVC codec with Bicubic interpolation produces RD performance almost identical to that of the existing WZVC codec with Bilinear interpolation, for both the Foreman and Carphone sequences. At a fixed rate, both interpolation techniques produce the same PSNR, and this holds across all quantization scaling factors, Qf = 0.5, 1, 2 and 4.

In the EM algorithm, the block-based motion field probability distribution consists of probabilities that lie in the interval [0,1]. Interpolation thus maps probabilities to probabilities. Reference [9] states that the convex linear combination of probabilities generated by the interpolation technique must also remain in the interval [0,1]. The nearly identical RD performance of the two methods indicates that Bicubic interpolation is also able to produce a convex linear combination of probabilities on the interval [0,1].

B. Analysis of Decoding Complexity

Decoding complexity is evaluated by measuring the average decoding time per quadrant, i.e. the EM iteration time the decoder needs to satisfy the syndrome condition, expressed in seconds. For the Foreman video sequence, the test was performed on a PC with an Intel Core Quad processor @ 2.33 GHz, 3 GB of memory, and the Windows Vista operating system. The test for the Carphone video sequence was


performed on a PC with a Core 2 Quad processor @ 2.66 GHz, 4 GB of memory, and the Windows XP operating system. The codec is written in C++ and built with the Visual Studio C++ compiler. The Bilinear and Bicubic interpolation functions are taken from the Intel IPP library version 6 (IPPI_INTER_LINEAR and IPPI_INTER_CUBIC), which is intended to allow a fair comparison.

Fig. 6 compares the total time required to decode 96 frames at each quantization scaling factor, using GOP size 2. In general, for both video sequences, Bicubic interpolation reduces the complexity of the WZVC decoder. The largest reduction occurs at scaling factor Qf = 0.5: up to 9.49% for Foreman and up to 7.33% for Carphone. This indicates that the WZVC codec with Bicubic interpolation is suitable for encoding video sequences with high and complex motion content.

Fig. 6. Decoding complexity for GOP size 2: (a) Foreman, (b) Carphone

V. CONCLUSIONS

In this paper, a transform-domain WZVC codec based on motion learning has been improved. The new WZVC codec refines the motion field probability distribution to pixel precision using Bicubic interpolation. Experimental results show that Bicubic interpolation reduces decoder complexity, by up to 9.49% in total decoding time, with RD quality almost equal to that of the previous learning-based WZVC codec that uses Bilinear interpolation.

ACKNOWLEDGMENT

The authors would like to thank David Varodayan and David Chen of the Information Systems Lab., Department of Electrical Engineering, Stanford University, for sharing the code of the WZVC codec.

REFERENCES

[1] B. Girod, A. Aaron, S. Rane, and D. Rebollo-Monedero, "Distributed video coding," Proceedings of the IEEE, Vol. 93, No. 1, pp. 71-83, January 2005.
[2] D. Slepian and J.K. Wolf, "Noiseless coding of correlated information sources," IEEE Transactions on Information Theory, Vol. 19, No. 4, pp. 471-480, July 1973.
[3] A.D. Wyner and J. Ziv, "The rate-distortion function for source coding with side information at the decoder," IEEE Transactions on Information Theory, Vol. 22, No. 1, pp. 1-10, January 1976.
[4] C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, "Distributed monoview and multiview video coding," IEEE Signal Processing Magazine, Vol. 24, No. 5, pp. 67-76, September 2007.
[5] X. Artigas, J. Ascenso, M. Dalai, S. Klomp, D. Kubasov, and M. Ouaret, "The DISCOVER codec: architecture, techniques and evaluation," in Proceedings of Picture Coding Symposium, Lisbon, Portugal, 2007.
[6] D. Varodayan, D. Chen, M. Flierl, and B. Girod, "Wyner-Ziv coding of video with unsupervised motion vector learning," EURASIP Signal Processing: Image Communication Journal, Special Issue on Distributed Video Coding, Vol. 23, No. 5, pp. 369-378, 2008.
[7] I M.O. Widyantara, Wirawan, and G. Hendrantoro, "Efficient motion field interpolation method for Wyner-Ziv video coding," Telkomnika, Vol. 9, No. 1, pp. 191-200, April 2011.
[8] W. Haifang and W. Anhong, "An improved unsupervised learning of motion estimation based on diamond searching for distributed video coding," in International Conference on Computational Aspects of Social Networks, 2010.
[9] D. Chen, D. Varodayan, M. Flierl, and B. Girod, "Distributed stereo image coding with improved disparity and noise estimation," in Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing, Las Vegas, Nevada, 2008.
[10] D. Varodayan, Y.C. Lin, A. Mavlankar, M. Flierl, and B. Girod, "Wyner-Ziv coding of stereo image with unsupervised learning of disparity," in Proceedings of Picture Coding Symposium, Lisbon, Portugal, 2007.
[11] Intel® Integrated Performance Primitives for Intel® Architecture, Reference Manual, Vol. 2: Image and Video Processing, March 2009.
[12] ITU. ISO/IEC 10918-1. ITU-T Recommendation T.81. Information Technology - Digital Compression and Coding of Continuous-tone Still Images - Requirements and Guidelines. 1993.