Rate control algorithm for pixel-domain Wyner-Ziv video coding

Antoni Roca^a, Marleen Morbée^b, Josep Prades-Nebot^a and Edward J. Delp^c

^a DCOM, Universidad Politécnica de Valencia, 46022 Valencia, Spain
^b TELIN-IPI-IBBT, Ghent University, 9000 Ghent, Belgium
^c VIPER, Purdue University, West Lafayette, IN 47907-1285, USA

ABSTRACT
Wyner-Ziv video coders perform simple intra-frame encoding and complex inter-frame decoding. This feature makes this type of coder suitable for applications that require low-complexity encoders. Video coding algorithms provide coding modes and parameters so that encoders can fulfill rate constraints and improve the coding efficiency. However, in most Wyner-Ziv video coders, no algorithm is used to optimally choose the coding modes and parameters. In this paper, we present a rate control algorithm for pixel-domain Wyner-Ziv video coders. Our algorithm predicts the rate and distortion of each video frame as a function of the coding mode and the quantization parameter. In this way, our algorithm can properly select the best mode and quantization for each video frame. We show experimentally that, even though the rate and distortion cannot be accurately predicted in Wyner-Ziv video encoders, rate constraints are approximately fulfilled and good coding efficiency is obtained by using our algorithm.

Keywords: Wyner-Ziv video coding, distributed video coding, rate control
1. INTRODUCTION

In conventional motion-compensated video coders, motion estimation is performed at the encoder in order to exploit the redundancy existing in the video frames. Due to the high complexity of motion estimation algorithms, motion-compensated encoders are much more complex than their corresponding decoders. Consequently, this coding strategy is appropriate for applications where video is encoded once and decoded many times, as occurs in broadcasting or video-on-demand.

However, some video applications (e.g., mobile video telephony, wireless video surveillance and disposable video cameras) require low-complexity encoders. This can be achieved by using Wyner-Ziv (wz) video coders, which perform simple intra-frame encoding and complex inter-frame decoding [1–4].

Wyner-Ziv video coders can be classified into pixel-domain and transform-domain wz video coders. Transform-domain wz video coders have greater coding efficiency than pixel-domain ones. However, pixel-domain Wyner-Ziv (pdwz) video encoders are less complex than their transform-domain counterparts. In this paper, we present a rate control algorithm for pdwz video coders.

Video coding standards provide coding modes (e.g., skip, intra, inter) and parameters so that encoders can fulfill rate/distortion constraints and improve their coding efficiency. To optimally choose coding modes and parameter values, video encoders analyze the features of the input video [5]. This analysis is difficult in wz video coding because wz video coders need to keep their complexity low and because not all the data necessary to make an accurate analysis is available at the encoder. For instance, a wz video encoder cannot know the coding distortion of a frame since the frame that is used to conditionally decode it is only available at the decoder. Thus, many wz video coding algorithms in the literature do not perform any optimization and are unable to fulfill coding constraints in real time.
This can considerably limit the number of applications where these coders can be used.

Send correspondence to Antoni Roca (E-mail: [email protected], Address: ETSI de Telecomunicación, Camino de Vera, s/n, 46022, Valencia). This work was supported by the Spanish Ministry of Education and Science and the European Commission (FEDER) under grant TEC2005-07751-C02-01.

In this paper, we present a rate control algorithm for pdwz video coders. By using a rate-distortion model, our algorithm allows pdwz video coders to select the coding mode and the quantization parameter of each frame in order to provide good coding efficiency and to fulfill rate constraints. We show experimentally that, even though the rate and distortion cannot be accurately predicted in pdwz video encoders, rate constraints are approximately fulfilled and good coding efficiency is obtained by using our algorithm. The rest of the paper is organized as follows. In Section 2, we study the basics of pdwz video coders. In Section 3, we introduce a rate and a distortion model for the encoding of wz-frames. In Section 4, we propose a rate control algorithm for pdwz video coders, and in Section 5, we experimentally test the efficiency of our algorithm and compare it to other rate control strategies. Finally, the conclusions are presented in Section 6.
2. PIXEL-DOMAIN WYNER-ZIV VIDEO CODING

In this section, we review the basics of pdwz video coding. Figure 1 shows the block diagram of a pdwz video coder. In pdwz coders, the frames are organized into key frames (k-frames) and Wyner-Ziv frames (wz-frames). The k-frames are coded using a conventional intra-frame coder. The wz-frames are intra-frame encoded but are conditionally decoded using side information (si). For each wz-frame $X$, an approximation $Y$ is obtained at the decoder by extrapolating or interpolating previously decoded k-frames. The frame $Y$ constitutes part of the si used in the decoding of $X$ [1, 6–9].

Let $M$ be the number of bits used to represent the amplitude values of frame pixels. To encode a wz-frame $X$, first, the $m$ most significant bitplanes (bps) $\{X_1, \ldots, X_m\}$ are extracted. Parameter $m$ can be fixed for the whole video sequence [6–9] or can be adaptively determined by an algorithm that depends on the coding constraints (e.g., see Section 4). Then, each bp $X_p$ ($1 \le p \le m$) is independently encoded using a Slepian-Wolf (sw) coder [6–9]. The sw coder is implemented using a rate-adaptive channel coder [10] that yields the parity bits of $X_p$, which are saved in a buffer. To determine the number of parity bits of a bp to be transmitted, a feedback channel is used [1]. For each bp, the encoder first transmits a subset of $P$ parity bits. Then, the bp is decoded and, if the decoder estimates that the bit error probability $P_e$ is above a threshold $t$, it requests $P$ additional parity bits through the feedback channel. This decode-request process is repeated until $P_e \le t$. The encoding and decoding of bps is done in order of significance (the most significant bps are transmitted and decoded first).

On the decoder side, the sw decoder obtains each bp $X_p$ from the transmitted parity bits, the corresponding bp $Y_p$ extracted from $Y$, and the previously decoded bps $\{X_1, \ldots, X_{p-1}\}$*. Once the sw decoder has decoded the $m$ most significant bps of $X$, the decoder obtains a reconstruction $\hat{X}$ of $X$ by using $\{X_1, \ldots, X_m\}$ and $Y$ [7].
[Figure 1 diagram. Blocks: pdwz video encoder (bitplane extraction, Slepian-Wolf channel encoder, buffer), feedback channel, pdwz video decoder (channel decoder, reconstruction, bitplane extraction of Y, interpolation/extrapolation), and intra-frame encoder/decoder for the k-frames.]

Figure 1. Block diagram of a pdwz video coder.
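
The decode-request loop over the feedback channel can be illustrated with a minimal Python sketch. It is not the coder used in this paper: the name `parity_chunks_needed` and the callback `error_prob_after`, which stands in for the decoder's estimate of $P_e$ after a given number of $P$-bit parity chunks, are hypothetical and introduced only for illustration.

```python
def parity_chunks_needed(error_prob_after, max_chunks, threshold=1e-3):
    """Count the P-bit parity chunks sent before the decoder stops asking.

    error_prob_after(n) is a hypothetical callback returning the decoder's
    estimate of the bit error probability Pe after n chunks were received.
    """
    sent = 0
    while sent < max_chunks:
        sent += 1  # the encoder transmits P more parity bits from its buffer
        if error_prob_after(sent) <= threshold:
            break  # Pe <= t: the decoder does not request more parity bits
    return sent


if __name__ == "__main__":
    # Toy decoder model (an assumption): Pe halves with every chunk received.
    print(parity_chunks_needed(lambda n: 0.5 ** n, max_chunks=32))
```

The loop stops either when the threshold is met or when the buffer is exhausted, which mirrors the fact that the encoder can never send more parity bits than the channel coder produced.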
Note that $\{X_1, \ldots, X_m\}$ constitutes a quantized version of $X$ using a uniform quantizer of $m$ bits with step size $\Delta = 2^{M-m} - 1$. The larger the $m$, the smaller the $\Delta$ and, therefore, the larger the rate $R$ and the lower the distortion $D$ of the decoded video. Once the $m$ most significant bps of a wz-frame $X$ are encoded by the sw coder, $m + 1$ different decodable bitstreams $\{BS_0, \ldots, BS_m\}$ can be generated for $X$, where $BS_k$ ($k = 0, \ldots, m$) contains the parity bits of the $k$ most significant bps of $X$. When $m = 0$, no bp is transmitted ($BS_0$ does not contain any parity bits) and each decoded frame $\hat{X}$ is equal to $Y$. As each bitstream $BS_k$ has a rate and a distortion, with the scalable coder shown in Figure 1, $m + 1$ different rate-distortion (R,D) points are possible for each wz-frame. On the other hand, in non-scalable pdwz video coders [1], each wz-frame is quantized using a uniform quantizer and the sw coder directly encodes the quantization indexes. In these non-scalable coders, once $\Delta$ has been set, only one rate-distortion (R,D) point is possible.

* In practical pdwz video coding, sw decoders are allowed to introduce a certain small number of errors.
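
As an illustration of the bitplane extraction step, the following Python sketch splits 8-bit pixel values into their most significant bitplanes and shows that these bitplanes are the binary digits of a uniform quantization index. The function names and the flat list of pixels are assumptions made for the example, not part of the coder described here.

```python
def extract_bitplanes(pixels, m, depth=8):
    """Return the m most significant bitplanes of the pixels, MSB first."""
    return [[(v >> (depth - p)) & 1 for v in pixels] for p in range(1, m + 1)]


def quantization_indices(pixels, m, depth=8):
    """Index of the uniform m-bit quantizer implied by keeping m bitplanes."""
    return [v >> (depth - m) for v in pixels]


if __name__ == "__main__":
    frame = [0, 127, 128, 255]          # toy "frame" of four pixels
    planes = extract_bitplanes(frame, m=2)
    print(planes)                        # bitplanes X_1 and X_2
    print(quantization_indices(frame, m=2))
```

With `m = 0` the list of bitplanes is empty, which corresponds to the bitstream $BS_0$ that carries no parity bits.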
3. PREDICTION OF THE RATE AND THE DISTORTION

In this section, we introduce a model to predict the rate and the distortion of a pdwz video coder. Let $X$ and $Y$ be continuous random variables representing a pixel of a wz-frame and the corresponding pixel of its interpolated/extrapolated frame at the decoder, respectively. Let $Z$ be the correlation noise, i.e., $Z = Y - X$, with $Z$ and $X$ being independent. Let $x$, $y$ and $z$ be outcomes of $X$, $Y$ and $Z$, respectively. We assume $X$ is distributed in an interval $[x_{min}, x_{max}]$.

Consider the quantization of $X$ with a set of $M$ embedded uniform quantizers $\{Q_p\}_{p=1}^{M}$. Quantizer $Q_p$ has a step size $\Delta_p = (x_{max} - x_{min})/2^p$ and the corresponding quantization indices are represented by $p$ bits $\{X_1, X_2, \ldots, X_p\}$, where $X_1$ is the most significant bit and $X_p$ is the least significant bit. In the following, we assume that, for the quantization of $X$ with $Q_m$, $X$ is quantized with $Q_M$ but, of the $M$ bits $\{X_1, \ldots, X_M\}$, only the $m$ most significant bits $\{X_1, \ldots, X_m\}$ are transmitted. Hence, the implicit quantization step size of the transmitted bits is $\Delta_m$. Moreover, we assume $Z$ follows a Laplacian distribution with probability density function (pdf)

$$f_Z(z) = \frac{1}{\sqrt{2}\,\sigma} \exp\left(-\frac{\sqrt{2}\,|z|}{\sigma}\right)$$

where $\sigma$ is the standard deviation of the correlation noise. In contrast to the reconstruction used in [7, 11, 12], in this work the reconstruction $\hat{x}$ is the minimum-mean-squared-error (mmse) estimate of $x$ given $y$ and the quantization interval that $x$ belongs to.

In the following, we estimate the rate and the distortion when $\{X_1, X_2, \ldots, X_m\}$ are transmitted. Let $f_{Y|X}(y|x)$ be the conditional pdf of $Y$ given $X$, and let $f_X(x)$ be the pdf of $X$. The average quadratic distortion introduced in the encoding of a certain value $x$ of $X$ when its $m$ most significant bps are transmitted and the standard deviation of the correlation noise is $\sigma$ is

$$D_{WZ}(x, m, \sigma) = \int_{-\infty}^{\infty} (x - \hat{x})^2 f_{Y|X}(y|x)\, dy. \qquad (1)$$

As $Z$ is additive noise, $f_{Y|X}(y|x) = f_Z(y - x)$, and as $Z$ follows a Laplacian distribution,

$$D_{WZ}(x, m, \sigma) = \frac{1}{\sqrt{2}\,\sigma} \int_{-\infty}^{\infty} (x - \hat{x})^2 e^{-\frac{\sqrt{2}}{\sigma}|x - y|}\, dy. \qquad (2)$$

Finally, the average distortion when the source $X$ has a pdf $f_X(x)$ is

$$D_{WZ}(m, \sigma) = \frac{1}{\sqrt{2}\,\sigma} \int_{-\infty}^{\infty} f_X(x) \int_{-\infty}^{\infty} (x - \hat{x})^2 e^{-\frac{\sqrt{2}}{\sigma}|x - y|}\, dy\, dx. \qquad (3)$$
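
Expression (3) can be approximated numerically. The Python sketch below is one possible Monte Carlo estimate, not the simulation code used for this paper. It assumes integer pixels uniformly distributed in 0..255, Laplacian correlation noise with standard deviation `sigma`, and an mmse reconstruction computed by weighting every candidate pixel value inside the decoded quantization interval with the Laplacian likelihood. The function name, the sample count and the seed are illustrative choices.

```python
import math
import random


def simulate_wz_distortion(m, sigma, n_samples=2000, seed=0):
    """Monte Carlo estimate of D_WZ(m, sigma) for 8-bit pixels."""
    rng = random.Random(seed)
    b = sigma / math.sqrt(2.0)   # Laplacian scale: variance is 2 * b**2
    width = 256 >> m             # pixel values per quantization interval
    total = 0.0
    for _ in range(n_samples):
        x = rng.randrange(256)   # uniform source, as assumed in the text
        # The difference of two unit exponentials is a unit-scale Laplacian.
        z = b * (rng.expovariate(1.0) - rng.expovariate(1.0))
        y = x + z                # side information
        lo = (x // width) * width
        num = den = 0.0
        for c in range(lo, lo + width):   # candidates in the decoded interval
            w = math.exp(-abs(y - c) / b)  # proportional to f_Z(y - c)
            num += c * w
            den += w
        total += (x - num / den) ** 2      # squared error of the mmse estimate
    return total / n_samples


if __name__ == "__main__":
    for m in (0, 2, 4, 8):
        print(m, simulate_wz_distortion(m, sigma=10.0))
```

Because the same seed produces the same samples for every `m`, the estimates for different numbers of bitplanes are directly comparable; a larger `n_samples` reduces the variance at the cost of run time.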
Obtaining a closed expression for (3) is difficult. Instead, we have computed (3) via simulation for different values of $\sigma$ and $m$ by assuming that $f_X(x)$ is a uniform distribution. In practical video, however, pixels have discrete-amplitude values, with $x_{min} = 0$, $x_{max} = 255$ and, therefore, $\Delta_m = 2^{8-m} - 1$.

In the following, we estimate the rate to encode each bp provided that the more significant bps have already been decoded. Let $X_p$ be the discrete random variable that represents the $p$-th most significant bit of the quantization indices of $X$. The minimum rate to encode $X_p$, given $Y$ and the previously decoded bits $\{X_1, X_2, \ldots, X_{p-1}\}$, is $R_{WZ}(p, \sigma) = H(X_p | Y, X_1, \ldots, X_{p-1})$ and, using the chain rule, we obtain

$$R_{WZ}(p, \sigma) = H(X_1, \ldots, X_p | Y) - \sum_{i=0}^{p-1} R_{WZ}(i, \sigma) \qquad (4)$$

where $R_{WZ}(0, \sigma) = 0$. Then, starting with $p = 1$ and using (4), we can iteratively compute the minimum rate to transmit $X_p$ ($p = 1, \ldots, m$). According to the model $Y = Z + X$, $Y$ is a continuous random variable and, hence, the term $H(X_1, \ldots, X_p | Y)$ in (4) can be computed through

$$H(X_1, \ldots, X_p | Y) = \int_{-\infty}^{\infty} f_Y(y)\, H(X_1, \ldots, X_p | Y = y)\, dy. \qquad (5)$$
In digital video, however, the interpolated frames are quantized video frames, so pixel amplitude values are integers between 0 and 255. Consequently, to compute $H(X_1, \ldots, X_p | Y)$ we use

$$H(X_1, \ldots, X_p | Y) = \sum_{y=0}^{255} p_Y(y)\, H(X_1, \ldots, X_p | Y = y) \qquad (6)$$
where $p_Y(y)$ is the probability mass function of $Y$. Note, however, that frame $Y$ is only available at the decoder side and that, consequently, the encoder cannot estimate $p_Y(y)$. To solve this problem, we assume that both $X$ and $Y$ have the same probability mass functions and hence

$$H(X_1, \ldots, X_p | Y) \approx \sum_{y=0}^{255} p_X(y)\, H(X_1, \ldots, X_p | Y = y) \qquad (7)$$

$$= -\sum_{y=0}^{255} p_X(y) \sum_{x_1=0}^{1} \cdots \sum_{x_p=0}^{1} p(x_1, \cdots, x_p | y) \log p(x_1, \cdots, x_p | y) \qquad (8)$$
where $p_X(x)$ is estimated through the histogram of the wz-frame $X$, and $p(x_1, \cdots, x_p | y)$ is computed by integrating the conditional pdf of $X$ given $Y = y$ over the interval determined by the values of the bits $x_1, \cdots, x_p$†.

To take into account the suboptimality of the channel coder used to implement the sw coder, we add a rate margin to the estimated $H(X_p | Y, X_1, \ldots, X_{p-1})$ value. More specifically, the rate necessary to transmit $X_p$ is estimated through

$$\tilde{R}_{WZ}(p, \sigma) = \left( \left\lceil \frac{R_{WZ}(p, \sigma)}{R_P} \right\rceil + 1 \right) R_P \qquad (9)$$

where $R_{WZ}(p, \sigma)$ is computed using (4) and (8), $R_P$ is the rate that corresponds to the transmission of $P$ bits (puncturing period of the rate-adaptive channel coder) [10], and $\lceil u \rceil$ denotes the function that returns the smallest integer not less than $u$.

The errors introduced in the estimation of the distortion and the rate of wz-frames through (3), (4) and (8) are due to the approximation in (7), the fact that the correlation noise is not exactly Laplacian, and the errors in estimating the value of $\sigma$. This last factor is the most important source of error since the difference between the estimated and the real value of $\sigma$ can be very large.
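
The rate model of (4), (7), (8) and (9) can be sketched in Python as follows. This is an illustrative reading of the equations rather than the authors' implementation. Two choices are assumptions: the probabilities $p(x_1, \cdots, x_p | y)$ are obtained from the Laplacian cdf over integer-aligned intervals and renormalized so that they sum to one over the 0..255 range, and the parity increment `chunk_bits` used for the margin of (9) is a hypothetical parameter expressed in bits.

```python
import math


def laplace_cdf(t, sigma):
    """CDF of a zero-mean Laplacian with standard deviation sigma."""
    b = sigma / math.sqrt(2.0)
    if t < 0:
        return 0.5 * math.exp(t / b)
    return 1.0 - 0.5 * math.exp(-t / b)


def joint_entropy_given_y(p, hist, sigma):
    """H(X_1..X_p | Y) in bits per pixel, following (7) and (8).

    hist is the 256-bin histogram of the wz-frame, used in place of p_Y.
    """
    total = float(sum(hist))
    width = 256 >> p                       # pixel values per p-bit interval
    entropy = 0.0
    for y, count in enumerate(hist):
        if count == 0:
            continue
        probs = []
        for k in range(1 << p):            # one interval per bit pattern
            lo = k * width - 0.5 - y
            hi = (k + 1) * width - 0.5 - y
            probs.append(laplace_cdf(hi, sigma) - laplace_cdf(lo, sigma))
        norm = sum(probs)                  # renormalize over 0..255 (assumption)
        h_y = -sum(q / norm * math.log2(q / norm) for q in probs if q > 0.0)
        entropy += count / total * h_y
    return entropy


def bitplane_rates(hist, sigma, m):
    """R_WZ(p, sigma) for p = 1..m via the recursion (4), in bits per pixel."""
    rates = [0.0]                          # R_WZ(0, sigma) = 0
    for p in range(1, m + 1):
        rates.append(joint_entropy_given_y(p, hist, sigma) - sum(rates))
    return rates[1:]


def rate_with_margin(bits, chunk_bits):
    """Margin of (9): round up to whole parity chunks and add one more."""
    return (math.ceil(bits / chunk_bits) + 1) * chunk_bits


if __name__ == "__main__":
    flat_hist = [1] * 256                  # toy histogram (uniform frame)
    print(bitplane_rates(flat_hist, sigma=4.0, m=4))
    print(rate_with_margin(100, 32))
```

Multiplying each per-pixel rate by the number of pixels in the frame gives the bit count to which the margin would be applied.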
† As $Z$ is additive noise, $f_{X|Y}(x|y) = f_Z(x - y)$.

4. RATE CONTROL ALGORITHM

Video coding algorithms provide coding modes and parameters that allow coders to obtain a high coding efficiency and to fulfill the coding constraints (in quality, rate, and delay). Due to the highly non-stationary nature of the video signal, the input video must be continuously analyzed in order to select the coding mode and the parameter values of each coding mode in an optimum way. In this section, we present a rate control algorithm for pdwz video coders that selects the coding mode and the quantization step size of each frame in such a way that a rate constraint can be met and good coding efficiency is obtained.

In those frames of a video sequence where the correlation noise has a large variance, conventional intra-coding can be more efficient than wz coding. For these frames, we allow wz-frames to be encoded in intra mode. Additionally, in portions of the video with very little motion, the distortion of the interpolated/extrapolated frames at the decoder can be very close to the distortion of their k-frames. Therefore, it could be efficient not to send any parity bits in these frames (skip mode). Finally, when a wz-frame is coded using neither the intra mode nor the skip mode, parity bits of one or more bps must be transmitted. We refer to this coding mode as wz mode. Then, the rate control must select the mode (skip, wz or intra) of each wz-frame and the quantization step size of all the frames.

As in [1, 6–9], we assume k-frames are the odd frames and wz-frames are the even frames of the video sequence, and that before encoding a wz-frame, its preceding and succeeding k-frames have already been transmitted and decoded. The decoder obtains an approximation $Y$ of each wz-frame by interpolating its two closest decoded k-frames. Frame $Y$ will be used in the decoding of each wz-frame if the rate control algorithm decides that the wz-frame must be encoded using the skip or wz modes.

Our rate control algorithm divides the video sequence into groups of frames (gofs). The first gof has only one frame, which is intra coded. The rest of the gofs have two frames: the first one is a wz-frame and the second is a k-frame. To properly select the coding mode of the wz-frame and the quantization parameter of each frame in a gof, the rate control algorithm must collect the rate-distortion data of each frame in the gof.

To obtain the rate-distortion data of a frame when it is encoded using the skip or wz mode, our algorithm uses the results obtained in Section 3. Hence, the algorithm must estimate the standard deviation of the correlation noise and obtain the histogram of the frame to encode.

To obtain the rate-distortion data of a frame when it is encoded in intra mode, our algorithm maintains a table with a row for each potential QP value of the intra coding algorithm. Each time a frame is intra coded, its PSNR is computed and saved in the table row that corresponds to the QP value used in the encoding.
Thus, if we need to estimate the PSNR for a certain QP value, we read the value from the table if its corresponding row is not empty. If the row is empty (i.e., no frame has yet been encoded with the desired QP value), we perform linear interpolation using the PSNR of the two QP values that are closest to the desired one. Although the data in this table corresponds to encodings of different frames, its content can be approximately valid in the absence of fast global motion or scene changes. To estimate the number of bits $B$ produced by the intra-frame algorithm when a QP value has been used in the encoding of a frame, we use the bit production model of MPEG-2 Test Model 5 (TM5)

$$B = \frac{C}{QP} \qquad (10)$$

where $C$ is the complexity of frames that are intra coded [13]. Parameter $C$ is updated each time a frame is intra coded.

Once all the necessary rate-distortion data of a gof has been collected, our rate control algorithm selects the optimum mode and quantization parameters so that the rate constraint is fulfilled and the quality of the decoded frames is maximized. The rate constraint imposes that the number of bits spent on the encoding of these frames is approximately the bit budget for that gof. The errors in estimating the rate are taken into account in order to compute the bit budget for the next gof so that the average rate is close to the target rate. With respect to the maximization of the quality of the decoded frames in a gof, we have considered two different criteria: to maximize the average PSNR in the gof (maxave strategy) and to maximize the minimum PSNR in the gof (maxmin strategy) [14]. The maxave strategy provides a better average PSNR than the maxmin strategy, but at the expense of introducing large quality fluctuations between consecutive k- and wz-frames.

In the following, we describe the main steps of the rate control algorithm. The target number of bits to encode the two frames of a gof ($B_t$) is computed through

$$B_t = nR/f + \Delta B \qquad (11)$$

where $n$ is the number of frames in the gof, $R$ is the rate (in bits/s), $f$ is the frame rate, and $\Delta B$ is the difference between the target number of bits and the real number of bits spent on the encoding of the previous gof ($\Delta B$ is set to 0 in the first gof). Let $X_W$ and $X_K$ be, respectively, the wz- and the k-frame of a gof to be encoded. Table 1 lists the parameters used by the algorithm to obtain the coding mode and the quantization parameters of the gof.
Table 1. Variables used in the rate control algorithm

| Symbol | Description |
|---|---|
| $\hat{\sigma}$ | Estimated standard deviation of the correlation noise |
| $X_W$ | wz-frame (first frame) in the gof |
| $X_K$ | k-frame (second frame) in the gof |
| $B_t$ | Bits available to encode the gof |
| $PSNR_W$ | Distortion of the wz-frame |
| $PSNR_K$ | Distortion of the k-frame |
| $B_K$ | Number of bits of the k-frame |
| $B_W$ | Number of bits of the wz-frame |
| $QP_K$ | Quantization parameter of the k-frame |
| $QP_W$ | Quantization parameter of the wz-frame when it is encoded in intra mode |
| $C$ | Complexity of the frames [13] |
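
The intra-mode bookkeeping described in this section (the PSNR table with linear interpolation, the bit production model (10) and the bit budget (11)) can be sketched in Python. The class and function names are hypothetical. The update `C = B * QP` follows the TM5 complexity measure and is an assumption about how $C$ is refreshed, since the text only states that $C$ is updated after each intra-coded frame. The QP range 1..31 is also an assumption chosen to avoid a division by zero.

```python
class IntraModel:
    """Rate-distortion bookkeeping for intra-coded frames (illustrative)."""

    def __init__(self, complexity):
        self.complexity = complexity   # C in (10)
        self.psnr_by_qp = {}           # table: QP -> PSNR of the last frame coded

    def bits(self, qp):
        """Predicted number of bits B = C / QP, model (10)."""
        return self.complexity / qp

    def qp_for_bits(self, bits, qp_min=1, qp_max=31):
        """QP whose predicted bit count is closest to the given budget."""
        return max(qp_min, min(qp_max, int(self.complexity / bits + 0.5)))

    def psnr(self, qp):
        """Table lookup, or a line through the two closest known QP values."""
        if qp in self.psnr_by_qp:
            return self.psnr_by_qp[qp]
        known = sorted(self.psnr_by_qp, key=lambda q: abs(q - qp))
        if not known:
            raise ValueError("no intra frame has been coded yet")
        if len(known) == 1:
            return self.psnr_by_qp[known[0]]
        q0, q1 = known[0], known[1]
        p0, p1 = self.psnr_by_qp[q0], self.psnr_by_qp[q1]
        return p0 + (p1 - p0) * (qp - q0) / (q1 - q0)

    def update(self, qp, bits_spent, psnr):
        """Record the outcome of an intra-coded frame."""
        self.psnr_by_qp[qp] = psnr
        self.complexity = bits_spent * qp   # TM5-style update (assumption)


def gof_bit_budget(n_frames, rate_bps, frame_rate, carry_over=0.0):
    """B_t = n R / f + Delta_B, equation (11)."""
    return n_frames * rate_bps / frame_rate + carry_over


if __name__ == "__main__":
    model = IntraModel(complexity=300000)
    model.update(qp=10, bits_spent=28000, psnr=35.0)
    model.update(qp=20, bits_spent=15000, psnr=31.0)
    print(model.psnr(15), gof_bit_budget(2, 300000, 15))
```

When the two closest known QP values lie on the same side of the requested one, the line through them extrapolates instead of interpolating, so the estimate should be treated with more caution in that case.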
Assuming we have obtained updated values of $C$ and $B_t$ after the encoding of the last gof, the rate control algorithm performs the following steps:

1. Estimate the correlation noise standard deviation in $X_W$
2. Estimate $PSNR_W$ and $PSNR_K$ when $X_W$ is encoded in the wz mode or the skip mode. For $0 \le p \le M$:
   • Estimate $B_W$ and $PSNR_W$ when $p$ bps are transmitted
   • Compute $B_K = B_t - B_W$
   • Compute $QP_K = \mathrm{round}(C/B_K)$
   • Estimate $PSNR_K$
   • Save ($PSNR_K$, $PSNR_W$)
3. Obtain $PSNR_W$ and $PSNR_K$ when $X_W$ is encoded in intra mode. For $0 \le QP_K \le 31$ and $0 \le QP_W \le 31$:
   • Compute $B_K = \mathrm{round}(C/QP_K)$ and $B_W = \mathrm{round}(C/QP_W)$
   • If ($B_K + B_W \le B_t$)
      – Estimate $PSNR_K$ and $PSNR_W$
      – Save ($PSNR_K$, $PSNR_W$)
4. Select the optimum ($PSNR_K$, $PSNR_W$) according to the criterion chosen
5. Encode $X_W$ and $X_K$ according to the optimum mode and parameter values
6. Update $C$ and $B_t$

In the following, we describe each step of the rate control algorithm in more detail. In step 1, we obtain an estimate ($\hat{\sigma}$) of the standard deviation of the correlation noise of $X_W$ by computing the root mean squared difference between $X_W$ and an average of its preceding and succeeding k-frames.

In step 2, we estimate the PSNR and the number of bits of $X_W$ and $X_K$ when $X_W$ is encoded in skip mode or wz mode. In the skip mode, $B_W$ is set to 0 and $PSNR_W$ is set to $10 \log_{10}(255^2/\hat{\sigma}^2)$. In the wz mode, $B_W$ and $PSNR_W$ are estimated by using the results obtained in Section 3. In both modes, $B_K$ is set so that the rate constraint is fulfilled ($B_K = B_t - B_W$). Then, the $QP_K$ value that provides a number of bits closest to $B_K$ is computed using (10). Finally, $PSNR_K$ is estimated from $QP_K$ and the list of ($PSNR_K$, $QP_K$) pairs. The $M + 1$ pairs ($PSNR_W$, $PSNR_K$) obtained, together with their corresponding coding mode and quantization parameters, are saved.

In step 3, the same pair of PSNR values is computed when both frames in the gof are intra coded (intra mode). First, for each pair of potential quantization parameters ($QP_W$, $QP_K$), their corresponding numbers of bits ($B_W$ and $B_K$) are estimated using (10). Then, the estimated ($PSNR_W$, $PSNR_K$) values for those ($QP_W$, $QP_K$) pairs that fulfil the rate constraint ($B_K + B_W \le B_t$), together with their corresponding coding mode and quantization parameters, are saved. In practice, many of the potential ($QP_W$, $QP_K$) pairs can be discarded without computing their PSNR values.

In step 4, the optimum coding mode and quantization are selected. If the maxave criterion is used, then the coding mode and the quantization parameters whose pair ($PSNR_W$, $PSNR_K$) provides the largest $PSNR_K + PSNR_W$ value are selected. If the maxmin criterion is used instead, then we select the coding mode and the quantization parameters of the pair ($PSNR_W$, $PSNR_K$) such that the minimum between $PSNR_W$ and $PSNR_K$ has the maximum value. Once the optimum has been chosen, we encode $X_W$ and $X_K$ using the optimum coding mode and quantization. The number of bits spent is used to update parameters $C$ and $B_t$ for the encoding of the next gof.
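
Steps 1 to 4 can be summarized by the following Python sketch, which is a simplified illustration under stated assumptions and not the implementation evaluated in this paper. The per-bitplane estimates `wz_points` (pairs of $B_W$ and $PSNR_W$ for $p = 0, \ldots, M$) and the callable `psnr_intra` are assumed to come from the models of Section 3 and from the intra PSNR table. All function names are hypothetical, QP values are restricted to 1..31 to avoid a division by zero, and bit counts are left unrounded.

```python
import math


def estimate_sigma(wz, prev_k, next_k):
    """Step 1: RMS difference between X_W and the mean of its two k-frames."""
    sq = sum((x - 0.5 * (a + b)) ** 2 for x, a, b in zip(wz, prev_k, next_k))
    return math.sqrt(sq / len(wz))


def skip_psnr(sigma):
    """PSNR_W in skip mode: 10 log10(255^2 / sigma^2)."""
    return 10.0 * math.log10(255.0 ** 2 / sigma ** 2)


def clamp_qp(value, qp_min=1, qp_max=31):
    """Round to the nearest integer QP inside the allowed range."""
    return max(qp_min, min(qp_max, int(value + 0.5)))


def wz_candidates(wz_points, budget, complexity, psnr_intra):
    """Step 2: wz_points[p] = (B_W, PSNR_W) for p bitplanes; p = 0 is skip."""
    out = []
    for p, (b_w, psnr_w) in enumerate(wz_points):
        b_k = budget - b_w
        if b_k <= 0:
            continue                       # no bits left for the k-frame
        qp_k = clamp_qp(complexity / b_k)  # inverse of model (10)
        out.append({"mode": "skip" if p == 0 else "wz", "bitplanes": p,
                    "qp_k": qp_k, "psnr_w": psnr_w,
                    "psnr_k": psnr_intra(qp_k)})
    return out


def intra_candidates(budget, complexity, psnr_intra, qps=range(1, 32)):
    """Step 3: every (QP_W, QP_K) pair whose predicted bits fit the budget."""
    out = []
    for qp_w in qps:
        for qp_k in qps:
            if complexity / qp_w + complexity / qp_k <= budget:
                out.append({"mode": "intra", "qp_w": qp_w, "qp_k": qp_k,
                            "psnr_w": psnr_intra(qp_w),
                            "psnr_k": psnr_intra(qp_k)})
    return out


def select(candidates, criterion="maxave"):
    """Step 4: pick the candidate that is best under the chosen criterion."""
    if criterion == "maxave":
        return max(candidates, key=lambda c: c["psnr_w"] + c["psnr_k"])
    if criterion == "maxmin":
        return max(candidates, key=lambda c: min(c["psnr_w"], c["psnr_k"]))
    raise ValueError(criterion)


if __name__ == "__main__":
    toy_psnr = lambda qp: 45.0 - 0.5 * qp          # toy intra PSNR model
    points = [(0, 28.0), (8000, 33.0), (16000, 37.0)]
    cands = (wz_candidates(points, 40000, 300000, toy_psnr)
             + intra_candidates(40000, 300000, toy_psnr))
    print(select(cands, "maxave"))
    print(select(cands, "maxmin"))
```

In this toy setting the two criteria disagree: maxave prefers the wz mode with two bitplanes, while maxmin prefers intra coding of both frames, which illustrates why the choice of criterion affects the quality fluctuation inside a gof.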
5. EXPERIMENTAL RESULTS

We experimentally tested the efficiency of our rate control algorithm using a pdwz coder. In our coder, k-frames are encoded as intra-frames using a standard H.263+ intra-coder [8, 15]. As in [15], a pre-interleaving of each bp is done before the turbo coding in order to avoid burst errors. The sw coder uses a rate-compatible turbo coder [10] with a puncturing period of 32. The turbo coder is composed of two identical constituent convolutional encoders of rate 1/2 with generator polynomials (1, 33/23) in octal form. For each wz-frame $X_W$, the decoder obtains an interpolated frame $Y$ by averaging the two decoded k-frames closest to $X_W$. In each transmitted bp, the wz decoder asks for parity bits through the feedback channel until $P_e < 10^{-3}$. Once all the transmitted bps have been decoded, $X_W$ is reconstructed using mmse estimation.

We encoded 101 frames of two qcif sequences (Foreman and Carphone) at 15 frames/second using the algorithm of Section 4 with the maxave criterion at four target rates: 200, 300, 400 and 500 kbps. In all the encodings, only the luminance component was considered. The mean rate (in kbps) and PSNR (in dB) values of each encoding using our algorithm are shown in Figures 2 and 3.

We also coded the same sequences using a fixed quantization strategy that, in each encoding, uses the same number of bps $m$ in all the wz-frames and the same QP value in all the k-frames. This strategy is used by most pdwz video coders [1, 6–9, 15]. Then, for each encoding, we computed the rate and the distortion for each ($m$, QP) pair ($0 \le m \le 4$, $2 \le QP \le 31$). The operational distortion-rate function of this strategy for each video sequence was obtained by selecting the ($m$, QP) points that belong to the convex hull. This function is displayed in Figures 2 (Carphone) and 3 (Foreman). Finally, we encoded all the frames of each sequence using an H.263+ intra-coder with a fixed QP value for all the frames.
Figures 2 and 3 show the PSNR and rate for different QP values.

Note that the gain in coding efficiency of the Wyner-Ziv strategies with respect to the intra-frame coder is smaller than in previous works [1, 6–8, 15]. There are several reasons for this. First, the frame rate of the video sequences used in this work is 15 frames/s while the frame rate of the sequences used in [1, 6–8, 15] is 30 frames/s. The higher the frame rate, the greater the benefits of inter-frame decoding and, consequently, the higher the gain of wz coding with respect to intra coding. Second, the rate values provided in [1, 6, 7, 15] only consider the rate of the wz-frames. In this work, however, the rate values include both the k- and wz-frames. Third, the interpolation tools used in some of the results presented in [1, 6–8, 15] are more sophisticated than the simple frame averaging used in this work. Finally, in [1, 6, 7, 15], the k-frames are losslessly transmitted while, in this work, they are encoded using an H.263+ intra-coder.

Figures 2 and 3 show that the average rates achieved by using our algorithm are close to the target rates (200, 300, 400 and 500 kbps). These figures also show that the coding efficiency of our algorithm is higher than the one provided by the other two algorithms (wz coding with fixed quantization and intra-frame coding). The optimum coding mode and quantization depend on the rate and the correlation noise variance. Thus, the higher the rate, the less efficient the wz mode is with respect to the intra mode. When the wz mode is used, the lower the rate, the lower the number of bps that must be transmitted. As a result, the skip mode is optimum at low rates, the wz mode is optimum at medium rates, and the intra mode is optimum at high rates. The rate margin where a coding mode remains more efficient than the others depends on the value of $\sigma$. Thus, the higher the value of $\sigma$, the lower the rate where the wz mode becomes more efficient than the skip mode and the lower the rate where the intra mode becomes more efficient than the wz mode. As our algorithm is able to change the coding mode and the quantization depending on the available rate and the $\sigma$ of each wz-frame, our algorithm is more efficient than either of the other two strategies. At very low and high rates, however, most frames must be encoded in skip mode and intra mode, respectively. Therefore, the efficiency of our algorithm is very close to the fixed quantization strategy at very low rates and to the intra-frame strategy at very high rates.
Figure 2. Coding efficiency using different strategies in the encoding of the sequence Carphone.
Figure 3. Coding efficiency using different strategies in the encoding of the sequence Foreman
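
The operational distortion-rate function of the fixed quantization strategy is obtained from the convex hull of the measured ($m$, QP) points. A minimal Python sketch of such a selection is given below. It is an illustration and not the procedure used to produce Figures 2 and 3: it assumes each point is a (rate, PSNR) pair, keeps the upper convex hull, and discards the part of the hull where PSNR no longer increases with rate. The function names are hypothetical.

```python
def _cross(o, a, b):
    """Z-component of (a - o) x (b - o); negative for a clockwise turn."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def operational_rd_curve(points):
    """Upper convex hull of (rate, psnr) points, up to the best PSNR."""
    best = {}
    for rate, psnr in points:              # keep the best PSNR for each rate
        if rate not in best or psnr > best[rate]:
            best[rate] = psnr
    hull = []
    for p in sorted(best.items()):         # scan by increasing rate
        # Drop earlier points that would lie on or below the new hull edge.
        while len(hull) >= 2 and _cross(hull[-2], hull[-1], p) >= 0:
            hull.pop()
        hull.append(p)
    if not hull:
        return []
    peak = max(range(len(hull)), key=lambda i: hull[i][1])
    return hull[:peak + 1]                 # spending more bits never hurts


if __name__ == "__main__":
    measured = [(100, 30), (200, 34), (300, 35), (250, 33), (400, 35.5), (150, 31)]
    print(operational_rd_curve(measured))
```

Points such as (150, 31) and (250, 33) in the toy data are dropped because a combination of neighboring operating points reaches a better PSNR at the same rate.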
To compare the maxave and maxmin criteria in the selection of the optimum coding mode and quantization, we repeated the encodings of Figures 2 and 3 using the maxmin criterion. The average PSNR of all the frames and the average of the minimum PSNR values in each gof are shown in Table 2 (Carphone) and Table 3 (Foreman) for each criterion and rate. These tables also include the average of the absolute differences (aad) in PSNR of each gof. Note that the maxmin criterion provides lower aad values and higher values of the average minimum PSNR than the maxave criterion, especially at low rates. As a result, maxmin provides encodings with smaller quality fluctuations than maxave. Surprisingly, maxave does not provide a better average PSNR than maxmin in all the encodings. This is mainly due to the errors in estimating $\sigma$, which translate into errors in the prediction of the PSNR and the rate of wz-frames and, hence, into wrong selections of the coding mode or the quantization. Improving the estimation of $\sigma$ is a hard problem in wz video coding since it would require increasing the complexity of the encoder or transmitting information from the decoder to the encoder [16].
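
The three quality measures reported in Tables 2 and 3 can be computed from the per-gof PSNR pairs as in the short Python sketch below. The function name and the input layout (one ($PSNR_W$, $PSNR_K$) pair per two-frame gof, ignoring the initial single-frame gof) are assumptions made for the example.

```python
def gof_quality_metrics(psnr_pairs):
    """Return (aad, average PSNR, average min PSNR) over two-frame gofs.

    psnr_pairs holds one (PSNR_W, PSNR_K) tuple per gof, in dB.
    """
    n = len(psnr_pairs)
    aad = sum(abs(w - k) for w, k in psnr_pairs) / n
    average = sum(w + k for w, k in psnr_pairs) / (2 * n)
    average_min = sum(min(w, k) for w, k in psnr_pairs) / n
    return aad, average, average_min


if __name__ == "__main__":
    print(gof_quality_metrics([(30.0, 34.0), (33.0, 35.0)]))
```

A low aad together with an average minimum PSNR close to the average PSNR indicates small quality fluctuations between the wz-frame and the k-frame of each gof.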
Table 2. AAD, average PSNR and average min PSNR values for Carphone with two optimization criteria

| Rate [kbps] | MAXAVE: AAD [dB] | MAXAVE: Average PSNR [dB] | MAXAVE: Average min PSNR [dB] | MAXMIN: AAD [dB] | MAXMIN: Average PSNR [dB] | MAXMIN: Average min PSNR [dB] |
|---|---|---|---|---|---|---|
| 200 | 7.77 | 33.58 | 29.73 | 0.88 | 32.81 | 32.40 |
| 300 | 4.06 | 35.77 | 33.76 | 0.71 | 35.52 | 35.19 |
| 400 | 4.35 | 37.55 | 35.40 | 0.68 | 37.56 | 37.25 |
| 500 | 1.98 | 39.25 | 38.30 | 0.84 | 39.27 | 38.89 |
Table 3. AAD, mean PSNR and mean min PSNR values for Foreman with two optimization criteria.

| Rate [kbps] | MAXAVE: AAD [dB] | MAXAVE: Average PSNR [dB] | MAXAVE: Average min PSNR [dB] | MAXMIN: AAD [dB] | MAXMIN: Average PSNR [dB] | MAXMIN: Average min PSNR [dB] |
|---|---|---|---|---|---|---|
| 200 | 6.97 | 30.68 | 27.19 | 0.91 | 30.27 | 29.84 |
| 300 | 6.21 | 32.57 | 29.44 | 0.45 | 32.61 | 32.36 |
| 400 | 3.14 | 34.32 | 32.73 | 0.84 | 34.39 | 33.95 |
| 500 | 3.50 | 35.75 | 33.99 | 0.53 | 35.96 | 35.69 |
As the use of the wz mode only provides benefits over other coding modes within a certain range of values of $R$ and $\sigma$, pdwz video coders should use the wz mode only when it provides a higher efficiency than the other available strategies (e.g., skip or intra). The pdwz encoder must analyze the video signal to collect all the rate-distortion data necessary to optimally select the coding modes and parameters. In contrast to standard video coding algorithms, the rate-distortion analysis in pdwz video coders must be simple in order to keep the complexity of the encoder low. Additionally, the rate-distortion analysis performed by a pdwz encoder is usually inaccurate since not all the data necessary to perform an accurate analysis is available at the encoder side. As these restrictions in complexity and accuracy do not exist in conventional video coding, the benefits provided by the use of a rate control algorithm are more limited in Wyner-Ziv video coders than in conventional video coders.
6. CONCLUSIONS

We have presented a rate control algorithm for pdwz video coders. Our algorithm first predicts the rate and the distortion of each frame for each coding mode and quantization parameter. Then, it selects the proper mode/quantization so that a rate constraint is fulfilled and a quality criterion is maximized. Experimental results show that the rate constraint is approximately fulfilled and that good coding efficiency is obtained by using our algorithm. The results also show that the use of Wyner-Ziv coding only provides better efficiency than other coding strategies (e.g., intra coding) under certain conditions. Consequently, pdwz video coders should consider Wyner-Ziv coding as a possible strategy that should only be used when it provides a higher efficiency than other coding strategies.
REFERENCES

1. A. Aaron, R. Zhang, and B. Girod, “Wyner-Ziv coding of motion video,” in Proc. Asilomar Conference on Signals and Systems, Pacific Grove, California, USA, November 2002.
2. R. Puri and K. Ramchandran, “PRISM: A new robust video coding architecture based on distributed compression principles,” in Proc. Allerton Conference on Communication, Control, and Computing, Allerton, IL, USA, Oct. 2002.
3. B. Girod, A. M. Aaron, S. Rane, and D. Rebollo-Monedero, “Distributed video coding,” Proc. IEEE, vol. 93, no. 1, pp. 71–83, Jan. 2005.
4. C. Guillemot, F. Pereira, L. Torres, T. Ebrahimi, R. Leonardi, and J. Ostermann, “Distributed monoview and multiview video coding,” IEEE Signal Processing Mag., vol. 24, no. 5, pp. 67–76, Sept. 2007.
5. G. J. Sullivan and T. Wiegand, “Rate-distortion optimization for video compression,” IEEE Signal Processing Mag., vol. 15, no. 6, pp. 74–90, Nov. 1998.
6. J. Ascenso, C. Brites, and F. Pereira, “Improving frame interpolation with spatial motion smoothing for pixel domain distributed video coding,” in EURASIP EC-SIP-M, Slovak Republic, June 2005.
7. J. Ascenso, C. Brites, and F. Pereira, “Motion compensated refinement for low complexity pixel distributed video coding,” in IEEE International Conference on Advanced Video and Signal Based Surveillance, Como, Italy, September 2005.
8. C. Brites, J. Ascenso, and F. Pereira, “Modeling correlation noise statistics at decoder for pixel based Wyner-Ziv video coding,” in Proc. PCS, Beijing, China, April 2006.
9. M. Morbée, J. Prades-Nebot, A. Pižurica, and W. Philips, “Rate allocation algorithm for pixel-domain distributed video coding without feedback channel,” in Proc. ICASSP, Hawaii, USA, April 2007.
10. D. Rowitch and L. Milstein, “On the performance of hybrid FEC/ARQ systems using rate compatible punctured turbo codes,” IEEE Trans. Commun., vol. 48, no. 6, pp. 948–959, June 2000.
11. J. Sun and H. Li, “A Wyner-Ziv coding approach to transmission of interactive video over wireless channels,” in Proc. ICIP, Genoa, Italy, September 2005, pp. 686–689.
12. A. Roca, M. Morbée, J. Prades-Nebot, and E. J. Delp, “A distortion control algorithm for pixel-domain Wyner-Ziv video coding,” in Proc. PCS, Lisboa, Portugal, Nov. 2007.
13. ISO/IEC JTC1/SC29/WG11/93-225b Test Model Editing Committee, Test Model 5 (TM5), April 1993.
14. G. M. Schuster, G. Melnikov, and A. K. Katsaggelos, “A review of the minimum maximum criterion for optimal bit allocation among dependent quantizers,” IEEE Trans. Multimedia, vol. 1, no. 1, pp. 3–17, Mar. 1999.
15. M. Dalai, R. Leonardi, and F. Pereira, “Improving turbo codec integration in pixel-domain distributed video coding,” in Proc. ICASSP, Toulouse, France, May 2006, pp. 537–540.
16. C. Ngai-Man, H. Wang, and A. Ortega, “Correlation estimation for distributed source coding under information exchange constraints,” in Proc. IEEE International Conference on Image Processing, Genoa, Italy, Sept. 2005, vol. 2, pp. 682–685.