Drift-free Multiple Description Video Coding With Redundancy Rate-distortion Optimization Yilong Liu and Soontorn Oraintara Department of Electrical Engineering, University of Texas at Arlington, Arlington, TX 76019–0016, Email:
[email protected],
[email protected]
Abstract— Multiple description coding (MDC) has been shown to be robust for video transmission over error-prone channels. By applying the extra prediction loops to the multiple description transform video coder, the mismatching error (drift) incurred by the one-channel pairwise correlating transform reconstruction is decreased. However, the problem of effective redundancy allocation is introduced by the drift coding. In this paper, we propose an approach to optimize the drift-free multiple description video coding scheme. The optimization is in the redundancy rate-distortion sense using Lagrangian relaxation method for optimum allocation of redundancy amongst the correlating transform and the drift coding. The experimental results show that our algorithm achieves substantial gains over the conventional MDC coder.
I. I NTRODUCTION Due to predictive coding, any bit loss of video signal may incur severe error propagation and great quality degradation. Hence it is always a big challenge for video transmission over lossy network. Multiple description coding (MDC) is one of the approaches to solve this problem. It provides a very efficient framework to achieve robust communication over unreliable channels such as a lossy packet network [3], [8]. In this technique a single source of data is represented with several subsets of data called as descriptions by introducing correlations such that the source can be approximately estimated from any of these subsets. These descriptions are transmitted over different channels to the receiver. In the case of a channel erasure, the receiver is still able to retrieve some information about the lost piece of signals. Consequently, the multiple descriptions should be mutually refining and relatively independent. When all the descriptions are available at the receiver, the signal can be reconstructed with high fidelity. However, the quality of the reconstructed source is still acceptable if only one description is received. Vaishampayan proposed a simple and practical MDC scheme, named as Multiple Description Scalar Quantization (MDSQ) [6], [7], to generate two descriptions with two indices for each quantization level. Any description can use its coarse data to generate a basic video and both of them can be combined to reconstruct high quality signal. However, under this scheme, it is hard to estimate the value of a lost description from the one that remains. To achieve robustness against coefficient losses, Wang proposed applying a pairwise correlating transform (PCT) to each pair of uncorrelated variables obtained from the Karhunen-Loeve transform (KLT) [10], [11]. Goyal et al. further generalized Wang’s work to any number of
0-7803-8834-8/05/$20.00 ©2005 IEEE.
variables [1]. Reibman et al. adopted this multiple description transform coding (MDTC) method and proposed a multiple description video coder that uses motion-compensated predictions [5]. The challenge in developing an MDTC approach to video coding lies in the coding of prediction errors. If only one description is received, a mismatch exists between the encoding and decoding loops, which is known as drift. Mismatch errors will be further exacerbated because of error propagation, and are never corrected until the encoding and decoding loops are cleared by an intra-coded frame. To avoid such a mismatch, two independent prediction loops (side loop) are introduced based on each description reconstruction as described in Figure 2 so that the mismatching errors can be controlled. Hence the contribution of redundancy is addressed by two sources, the correlating transform and the coding accuracy of mismatch errors. Heuristic redundancy allocation is applied in [5]. The redundancy is allocated to the portion incurred by MDTC at first; then the rest portion of redundancy is allocated to code the mismatch errors by a fixed coarser quantizer compared to the central loop, when the correlation among the MDTC coefficients is getting higher. This method is simple and easy for implementation. However, it loses the property of optimal rate-distortion design. In this paper, we propose an efficient redundancy rate-distortion optimization algorithm based on the drift-free MDTC video coding system. By using Lagrangian relaxation for optimal allocation of redundancy amongst the central loop and side loops, a minimized average one-channel reconstruction distortion can be achieved under the given target redundancy.
II. D RIFT- FREE MD VIDEO CODING A. Multiple description transform coding MDTC uses a pairwise correlating transform to each pair of uncorrelated variables obtained from the KLT or discrete cosine transform (DCT) [10] as described in Figure 1. Assume A and B are two independent Gaussian variables 2 2 and σB respectively. They are paired and with variances σA transformed into C and D using cot(θ) tan(θ) C A A . (1) 2 =T = 2 tan(θ) B D B − cot(θ)
4034
2
2
C
C Q
A
C
A
Q B
B Q
Linear estim ator
-1
Channel1 Correlating Transform D T
Inverse Transform T -1
Channel2
Q D
-1
D
Aˆ Bˆ
Q
-1
A
Q
-1
B
Linear estim ator
reconstruction with description 1
Aˆ Bˆ
reconstruction with both descriptions
single-channel reconstruction, two additional side prediction loops are introduced as described in Figure 2. Besides the r1 and r2 generated from the MDTC, the mismatch errors, s1 and s2 , are predictively coded by using the side loop and transmitted as side information. In this way, a drift-free MDC video coding scheme is formed.
reconstruction with description 2
IDCT
IQ 1
DCT
Q2
IMDTC
S1 (Channel 1)
P1
Fig. 1.
IDCT
IQ 2
Diagram of the multiple description transform coding method. FM1 (Channel 1) r1
The transform rotation angle θ, θ ∈ (0, π2 ), controls the correlation between C and D, which changes the redundancy of the MDTC coder. Suppose the variances of C and D 2 2 and σD , respectively. If we parameterize are denoted as σC the correlation of C and D by the angle φ, defined as E{CD} = σC σD cosφ. Then the θ can be used to vary the A tan( φ2 ) [9]. The value of correlation angle φ by tan(θ) = σσB θ adjusts the projections on the bases of PCT, which actually distributes the source energy to the outputs. In order to capture most principal components of the source, the basis should be skewed toward the variable that has larger variance [1]. Since the transform is nonorthogonal except θ = π4 , quantizing C and D may lead to degradation. Hence the transform is performed on the quantized indices of A and B, denoted as A and B. An invertible integer transform is implemented to generate integer indices, thus Eq.(1) should be changed to [C, D]T = Tint [A, B]T , where Tint denotes the integer correlating transform following the method proposed in [12]. If only one description, e.g., C, is received as shown in the top block of Figure 1, inverse quantization is first applied to then an optimal linear predictor is used to estimate yield C, by minimizing the MSE. D from C 2 2 2 = − σA − σB tan (θ) · C D 2 + σ 2 tan2 (θ) σA B
(2)
is the estimate of D. Take inverse transform on C where D and D, the estimated A and B can be obtained as follows:
A B
=
=
tan(θ) 2 cot(θ) 2
√
− tan(θ) 2 cot(θ) 2
2 σA 2tan(θ) 2 2 (θ)σ 2 · C σA +tan√ B 2 σB tan(θ) 2tan(θ) ·C 2 +tan2 (θ)σ 2 σA B
C D
(3)
The reconstruction when only D is received follows from symmetry.
F
DCT
Q1
MDTC
IDCT
IQ 1
IMDTC
IDCT
IQ 1
IMDTC
DCT
Q2
P0
r2 (Channel 2)
FM0
-
S2 (Channel 2)
IDCT
P2
IQ 2
FM2
Fig. 2.
Framework for MDTC based video coding system.
However, it incurs new redundancy by transmitting the purely redundant side information. Thus the total redundancy ρ = ρ1 + ρ2 , where ρ1 denotes the portion of redundancy by using MDTC, and ρ2 is the portion of redundancy for coding the mismatching errors in the extra prediction loops. Hence, it is very important to smartly allocate the redundancy between ρ1 and ρ2 so that the optimal rate-distortion performance can be achieved. III. R EDUNDANCY RATE - DISTORTION OPTIMIZATION Since the redundancy and the distortion associated with the descriptions are incurred by the MDTC and the side information, they can be represented by the function of rotation angle for MDTC, θ, and the quantization parameter for the side loop, QP2 (Q2 = 2QP2 ). To achieve a sophisticated ratedistortion control, we construct the optimization problem at the slice level [2]. Given the target redundancy, the optimal rotation angles and side information quantization parameters for the current frame can be determined by solving the constrained optimization problem, D(θi , QP2,i ) (4) min i
Subject to
B. Drift-free MDTC based video coding To reduce the amount of mismatch between the two-channel reconstruction used for the central prediction loops and the
4035
i
ρ(θi , QP2,i ) ≤ ρ∗ , 0 ≤ ρ∗ ≤ 1
(5)
where, θi and QP2,i are the rotation angle and quantization parameter used for the ith slice, D(θi , QP2,i ) denotes the average one-channel reconstruction distortion, 1 D(θi , QP2,i ) = [D1,i (θi , QP2,i ) + D2,i (θi , QP2,i )], 2
θi∗
= arg min θi
D(θi , QP2,max ) + λi R1 (θi , QP2,max )
(6)
∗
+R2 (θi , QP2,max ) − (1 + ρ )Ri
∗
and ρ denotes the target redundancy. Dk,i (θi , QP2,i ), k = 1, 2 is the MSE between the original ith slice and the one reconstructed from the estimated lost description using Eq.(3) and the dequantized drift errors for description k. ρ(θi , QP2,i ) denotes the redundancy function for the ith slice, which is computed as: ρ(θi , QP2,i ) = [R1 (θi , QP2,i ) + R2 (θi , QP2,i ) − Ri ]/Ri Ri is the bit rate of the ith slice under single description coding (SDC) scheme, i.e. non-MDC video coding system, while Rk , k = 1, 2 represents the bit rate for description k of the ith slice. To solve this problem, we construct a cost function based on redundancy-rate-distortion (RRD) framework with MSE as distortion criterion [10]. Therefore, the optimal θi and QP2,i can be determined by:
min
D(θi , QP2,i ) + λ
i
+
R1 (θi , QP2,i )
i
R2 (θi , QP2,i ) − (1 + ρ∗ )
i
Ri
(7)
i
Since redundancy and distortion measures over the slices within a frame are additive, Eq.(7) can be converted to a simpler equivalent unconstrained problem as in Eq.(8).
(Θ, QP2 ) = arg min
Θ,QP2
D(θi , QP2,i )
i
∗ +λi R1 (θi , QP2,i ) + R2 (θi , QP2,i ) − (1 + ρ )Ri
(8)
Where, Θ = {θ1 , ..., θM } and QP2 = {QP2,1 , ..., QP2,M }, M is the number of slices within one frame. Although Eq.(8) relaxes the budget constraints, it is still burdensome to pursue the operating points on the convex hull of the RRD curve. Hence we propose a fast algorithm to approximate this procedure for deciding the best operating point for every slice. It is optimal to introduce more redundancy between the MDTC coefficients and then allocate the remained redundancy to code the mismatching errors incurred by linear prediction [9]. Therefore we first determine the optimal operating point by fixing the QP2,i as the maximum quantization parameter allowed. A set of candidate θ’s, Θ = {θj ∈ (0, π2 ); 1 ≤ j ≤ n}, are considered for generating the operating points. By using the bisection algorithm [4], the optimal value of rotation angle, θi∗ , can be determined by solving
(9)
Note that θi∗ is the rotation angle which achieves the optimal rate-distortion performance while still does not exceed the target redundancy ρ∗ . After we obtain θi∗ , a local ratedistortion curve can be built by varying QP2,i . ∗ QP2,i
= arg min
QP2,i
D(θi∗ , QP2,i ) + λi R1 (θi∗ , QP2,i )
+R2 (θi∗ , QP2,i )
∗
− (1 + ρ )Ri
(10)
∗ Similarly, we can determine the optimal QP2,i with bisection algorithm.
IV. S IMULATION RESULTS We examine the performance of our proposed rate-distortion optimized MDC algorithm (OMDC) with the non-ratedistortion optimized MDC method (Non-opt-MDC) proposed in [5], in which a heuristic method is used to allocate redundancy to the PCT coefficients until it reaches the point of degraded performance as the redundancy incurred by PCT is getting too large, and then it begins to allocate redundancy to the side loop mismatching errors. Two sequences, “Foreman” and “Hall monitor” (both in CIF,30Hz), are used in the simulations. All the comparisons are made by assuming that one entire description is lost. Distortion is measured as the average PSNR (dB) over the reconstructed frames using the survived description. The slice is simply formed by a row of macroblocks. Quantization parameters are maintained to be the same within one slice because of our optimization strategy. The descriptions generated by the OMDC are compliant to the syntax of H.263. Motion vectors and head information are duplicated into both descriptions. For simplicity, the number of the coefficients paired in Y, Cb and Cr are 16, 4 and 4, respectively [5]. The remained coefficients are alternatively split to one of the descriptions. We use Θ = { jπ 32 ; 1 ≤ j ≤ 15} as the candidate θi ’s for Eq.(9). In the experiments, a target bit rate for the SDC encoder is selected. The descriptions are optimally generated by using the proposed OMDC algorithm under a given target redundancy ρ∗ . Only the first frame is intra-coded, and all the remaining frames are inter-coded. The average luminance PSNR across time is recorded under certain redundancy which is expressed in terms of the percentage over the reference luminance bit rate, i.e. the bit rate of SDC. By varying ρ∗ , the RRD curve can be obtained. Figure 3 shows the RRD curves obtained for “Foreman” at the bit rates of 243 kbps and 408 kbps. It is obvious that the proposed ODMC consistently outperforms
4036
34
Non-opt-MDC by about 0.2 dB with the same redundancy at low target bit rate (243 kbps). When the target bit rate increases, the improvement is higher with the gain of 0.3 dB. Similar improvement can be found from “Hall monitor” as described in Figure 4, which has the gains of ODMC over Non-opt-MDC varying from 0.2 to 0.5 dB.
Non−opt MDC Opt MDC
Avg one−channel distortion, PNSR(dB)
33.5
V. C ONCLUSION In this paper we proposed an efficient redundancy ratedistortion optimization approach for improving the performance of the popular PCT based drift-free MDC scheme. The optimization is implemented on each slice within one frame, a fast sub-optimal operating points pursuit method is applied to simplify the optimization procedure. It is shown through simulations that our proposed method improves the conventional MDC scheme by 0.2-0.5 dB with the same redundancy over different bit rates.
31.5 30
40
50
60 70 Redundancy (%)
80
90
100
(a) 36
35.5
Avg one−channel distortion, PNSR(dB)
Avg one−channel distortion, PNSR(dB)
32.5
32
32.5
Non−opt MDC Opt MDC
32
33
Non−opt MDC Opt MDC
35
34.5
34
31.5 33.5
31 33 30
40
50
60 Redundancy (%)
70
80
90
(b)
30.5
Fig. 4. Rate-distortion performance comparison (Hall monitor, CIF, 30 Hz): (a) SDC target bit rate is 111 Kbps, two-channel reconstruction PSNR is 33.86 dB, (b) SDC target bit rate is 215 Kbps, two-channel reconstruction PSNR is 35.80 dB.
30
29.5 30
40
50
60 Redundancy (%)
70
80
90
(a) 34
Non−opt MDC Opt MDC
Avg one−channel distortion, PNSR(dB)
33.5
33
32.5
32
31.5
31 35
40
45
50
55
60 65 Redundancy (%)
70
75
80
85
(b) Fig. 3. Rate-distortion performance comparison (Foreman, CIF, 30 Hz): (a) SDC target bit rate is 243 Kbps, two-channel reconstruction PSNR is 32.76 dB, (b) SDC target bit rate is 408 Kbps, two-channel reconstruction PSNR is 34.48 dB.
R EFERENCES [1] V. K. Goyal and J. Kovacevic, “Generalized multiple description coding with correlating transforms,” IEEE Trans. Inform. Theory, vol. 47, pp. 2199–2224, Sep. 2001. [2] ITU-T, “Video coding for low bit-rate communication,” ITU-T Rec. H.263, Version 1: Nov. 1995, Version 2: Jan. 1998, Version 3: Nov. 2000, 1993.
[3] L. Ozarow, “On a source-coding problem with two channels and three receivers,” Bell Systems Technical Journal, vol. 59, no. 10, pp. 1909– 1921, Dec. 1980. [4] K. Ramchandran and M. Vetterili, “Best wavelet packet bases in ratedistortion sense,” IEEE Trans. Image Processing, vol. 2, pp. 160–175, Apr. 1993. [5] A. R. Reibman, H. Jafarkhani, Y. Wang, M. T. Orchard, and R. Puri, “Multiple-description video coding using motion-compensated temporal prediction,” IEEE Trans. on Circuit and Systems for Video Technology, vol. 12, no. 3, pp. 193–204, March 2002. [6] V. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Transactions on Information Theory, vol. 39, no. 3, pp. 821–834, May. 1993. [7] V. Vaishampayan and S. John, “Interframe balanced-multiple-description video compression,” Packet Video’99, New York, NY, USA, Apr. 1999. [8] V.K.Goyal, “Multiple description coding: Compression meets the network,” IEEE Signal Processing Magazine, vol. 18, pp. 74–93, Sept. 2001. [9] Y. Wang, M. T. Orchard, and A. R. Reibman, “Optimal pairwise correlating transforms for multiple description coding,” in Proc. IEEE Int. Conf. Image Processing, Chicago, IL, Oct. 1998. [10] Y. Wang, M. T. Orchard, V. A. Vaishampayan, and A. R. Reibman, “Multiple description coding using pairwise correlating transforms,” IEEE Trans. Image Processing, vol. 10, pp. 351–366, Mar. 2001. [11] Y. Wang, A. R. Reibman, M. T. Orchard, and H. Jafarkhani, “An improvement to multiple description transform coding,” IEEE Trans. Signal Processing, vol. 11, pp. 2843–2854, Nov. 2002. [12] X.Li, B.Tao, and M.T.Orchard, “On implementing transforms from integers to integers,” IEEE Int. Conf. Image Processing (ICIP’98), pp. 881–885, June 1998.
4037