coding. Multiple Description Coding (MDC), as a candidate .... O O. = and so on. 4. SIMULATION RESULTS. In this section, we examine the performance of our.
OPTIMAL MULTIPLE DESCRIPTION WAVELET VIDEO CODING Moyuresh Biswas, Michael Frater, John Arnold School of Information Technology & Electrical Engineering University College, University of New South Wales Australian Defence Force Academy, Australia ABSTRACT We propose an optimal Multiple Description (MD) video coding technique with 3D-Set Partitioning in Hierarchical Tree (3D-SPIHT) algorithm. Multiple Description Coding (MDC) technique generates multiple bitstreams of the video content and has useful applications in packet video networks and error-prone networks. We describe a twodescription MDC system that embraces rate-distortion optimization to generate two balanced, efficient descriptions. Experimental results on a variety of test sequences show that the proposed method achieves a significant improvement in compression performance compared to existing methods. Index Terms— Multiple description partitioning, tree structure, balanced.
coding,
set
1. INTRODUCTION Even with constant increase in bandwidth or channel capacity, the need for video compression still holds importance mainly because of the increasing demand for new features e.g. high resolution picture, higher frame rate etc [1]. At the same time, with advances in communication technology, multimedia transmission over different networks, such as packet video networks and wireless networks, has also become an important application. However, these networks are typically unreliable and traditionally compressed video data remains vulnerable to errors. The solution to the problem is error-resilient video coding. Multiple Description Coding (MDC), as a candidate for error-resilient video coding, has already shown promise [2]. MDC is a way of encoding video information into multiple independent descriptions or bitstreams. Each of these descriptions achieves acceptable video quality when decoded independently and the decoded quality increases when more descriptions are available. MDC introduces redundant information among the descriptions. The redundant data helps estimate the lost information. The initial work on MDC was presented by Vaishampayan [3] where the author proposes a scalar quantizer tailored to meet the Multiple Description (MD) requirements. Wang et al [4] proposed a method that uses a correlating transform namely Pairwise Correlating Transform (PCT). In [5], we proposed a MD video coding
technique with Set Partitioning in Hierarchical Tree (SPIHT) [6] algorithm where we used even-odd subsampling for description separation. In this paper, we extend our work with a rate-distortion optimization strategy for MDC. The paper is organized as follows. In Section 2, 3DSPIHT video coding is briefly described. In Section 3, our proposed optimized MDC is explained. Experimental results are presented in Section 4. Section 5 concludes the paper. 2. BACKGROUND OF 3D-SPIHT SPIHT is a wavelet based image coding algorithm which has, at its backbone, a set-partitioned sorting technique. The image is first decomposed with a pyramidal wavelet transform to compact signal energy. The transformed coefficients are then organized in a structure called a ‘spatial orientation tree’. The root of the tree always belongs to the lowest subband of the wavelet decomposition. For actual coding, SPIHT uses two passes: a sorting pass and a refinement pass. For a detailed description, the reader is referred to [6]. For video coding, 3D-SPIHT was initially proposed by Kim et al [7] by extending the idea of set partitioning to temporal dimension. Cho and Pearlman proposed an asymmetric tree structure (AT-SPIHT) in [8] which was found efficient for video coding. In [9], a Spatio-Temporal Tree Preserving SPIHT (STTP-SPIHT) was proposed where wavelet coefficients are divided into blocks according to their spatial and temporal relationships and each block is coded independently. 3. RATE-DISTORTION OPTIMIZED MDC The excellent performance of 3D-SPIHT in the single description case suggests its possible extension to MDC. In [10], Kim et al used STTP-SPIHT coder [9] to develop an unbalanced MDC system. We have proposed a modified tree structure for MDC in [5] and have shown its effectiveness in improving efficiency. However, in [5] we used an even-odd subsampling based approach to separate descriptions which is somewhat ad-hoc. In this work, we formulate a rate-distortion framework for a practical two description MD video coder. In particular, the goals of our optimization algorithm are: Maximize the quality for an individual
description. i.e. minimize the distortion when only one description is received. Generate two balanced descriptions. i.e. the descriptions should consume a nearly equal number of bits of the total bit budget and the difference in quality when each description is received on its own should also be as low as possible. 3.1 Optimal Redundancy Allocation Let = 1 , 2 ,....... be the set of trees in a Group of Frames (GOF). The number of trees () in a GOF is determined by the number of wavelet decomposition levels and the size of the picture. Let us now consider the following definitions: dt denotes the total number of bits to encode tree t for description d and dt denotes the
corresponding distortion. st denotes the total number of bits to encode
tree t by a Single Description Coder (SDC). where d 1, 2 and t 1, 2,....., . In the MDC paradigm, the difference 1 2 s is called the
redundancy. The concept of introducing redundancy in a description is to reduce the distortion when only one description is received. Therefore a straightforward way of achieving the first goal is to find the optimal allocation of redundancy among the trees so that overall distortion for a GOF is minimized. Therefore our optimization problem can be formulated as
min dt t St t 1
d 1, 2
t 1
In (1), St is the set of all admissible redundancy
values corresponding to tree t and dt t (as redundancy is common in both descriptions) is the distortion for tree t which is determined as the squared error between the original and reconstructed transform coefficients. With the Lagrangian optimization formulation, the above constrained optimization problem can be converted to an unconstrained problem:
min t t t 1
t
St
t 1
t
t
(3)
where is the Lagrangian multiplier and is the Lagrangian cost function. t t t is the Lagrangian (2)
min min t t t St t St t 1 t 1
min S t
(1)
subject to the constraint that
t GOF
Figure 1 : Different Tree types in a GOF
cost for tree t. Eqn (3) implies that the solution of the optimization problem can be obtained by performing minimization of the Lagrangian cost for each individual tree t
keeping constant and checking for the
satisfaction of constraint (2). 3.2 Balanced Separation of Descriptions To achieve the second goal of balanced description generation, let us consider a tree rooted at r . Let d be
the total bits (including the optimal redundancy bits for this tree as determined by the Redundancy Allocation algorithm) that can be spent to encode this tree. In the first stage, the redundancy allocation algorithm determines the lowest bitplane b bmax ,....., 2,1, 0 for the tree that can be encoded with the redundancy bits , where bmax is the highest bitplane. In the second stage of description separation, the coding process starts with next lower
TABLE I PATTERNS FOR BALANCED MDC O1
O2
O3
O4
Applies to
Pattern 1
Description 1
Description 2
Description 2
Description 1
Type 1 & 2
Pattern 2
Description 1
Description 1
Description 2
Description 2
Type 1 & 2
Pattern 3
Description 1
Description 2
Description 1
Description 2
Type 1 & 2
Pattern 4
Description 1
Description 2
Description 2
Description 2
Type 1
bitplane b 1 for each description and continues as long as allowed by the total bit budget for that tree. Now let c be the set of immediate descendants of
r . Our goal is to find two sets c1 c and c 2 c
with c1 c 2 so that the following condition is satisfied: If 1 is the distortion after encoding c1 with
bits and is the distortion as a result of encoding with bits; .
2
1
c2
2
1
2
As the cardinality of c is usually small, a simple brute-force search can be applied to find the two subsets satisfying the above constraint. However, instead of doing an exhaustive search, we can limit the search to certain patterns exploiting the tree structure as this can greatly reduce the complexity of the search. For example, Figure 1 shows a GOF of 4 frames (Figure 1a) with two different types of tree (Figure 1b). The whole GOF contains repetitions of these two different tree types. The coefficient A in Figure 1a is the root of a Type 1 tree; coefficients B, C, D are roots of a Type 2 tree. In both cases, the first level descendants are marked O1, O2, O3, O4. It is important to observe that For a Type 2 tree, the length of the tree branch starting from all the offsprings (O1,O2,O3,O4) is the same. For a Type 1 tree, the length of the tree branch starting from offsprings O2, O3 and O4 is the same; however, the length is larger for the tree branch from O1. We, therefore, define four patterns for balanced two description generation as described in Table I. Here, we apply an equal-divisioning approach to distribute the tree branch for our balanced Multiple Description technique. Pattern 1, 2 & 3 assign half of the available offsprings to each description and these patterns are applicable to both Type 1 and Type 2 trees. Pattern 4, however, is different. It accounts for the longer length of the tree branch from O1 and assigns it to one description and the rest of the offsprings to the other description. It is also applied to only Type 1 trees. If Pattern 1 is selected for a tree by the algorithm, then offsprings O1 and O4 and all of their descendants are coded in description 1 for that tree; while offsprings O2 and O3 and their descendants are coded in
Figure 2 : Rate-Distortion Performance of Foreman and Carphone QCIF Sequences
description 2. Therefore, in accordance with previous representation c1 {O1, O 4} and c2 {O2, O3} . Similarly if Pattern 2 is selected c1 {O1, O 2} ; c 2 {O3, O 4} and so on. 4. SIMULATION RESULTS In this section, we examine the performance of our proposed MDC method. We compare our method with MD implementation of AT-SPIHT as was done in [5]. This is termed AT-SPIHT-MDC. We also implemented the MDC technique proposed by Kim et al [10] where instead of unbalanced descriptions, we generated two balanced descriptions. This is termed STTP-SPIHT-MDC. We also reproduce here the results of the method proposed at [5]. The PSNR results for Foreman and Carphone sequences are presented in Figure 2 and those for Mother&Daughter sequence are in Figure 3. The PSNR values are the average over 296 frames of the sequences. The comparisons are made assuming that only one description is received in its entirety. All results are based on decoded bitstream. We see that the proposed method demonstrates 0.5~2.5 dB improvement in PSNR over different bitrates. We achieve significant coding gain at particularly low bitrates, where the performance of other methods is too coarse to be of any practical interest. In
Figure 4 : Subjective evaluation of Frame#21 of Football SIF sequence; Bitrate 200kbps Redundancy 30 % (a) Proposed method (PSNR 22.05 dB) (b) STTP-SPIHT-MDC (PSNR 21.06 dB) Figure 3 : Rate-Distortion Performance of Mother&Daughter CIF Sequence TABLE II PERFORMANCE COMPARISON OF TWO DECODED DESCRIPTIONS
Carphone QCIF
Rate (Redundancy) 100 kbps (20%)
Foreman CIF
200 kbps (40%)
Sequence
PSNR Description 1 Description 2 25.97 dB
25.91 dB
26.75 dB
26.80 dB
Table II, we compare the performance of two decoded descriptions of our MDC system for two sequences. As we see, the difference in reconstruction qualities of the two descriptions is very small; i.e. the proposed method generates truly balanced description. In Figure 4, we present extract from a frame of Football sequence decoded with one description for subjective quality evaluation. The absence of one description is evident from the frame decoded with STTPSPIHT-MDC whereas our method provides a smooth reconstruction. 5. CONCLUSION We have described an optimal Multiple Description video coding system with 3D-SPIHT algorithm. Experimental results show that our method significantly outperforms some of the existing MDC techniques. We are currently investigating the application of the proposed method under different network scenarios including error-prone channels. 6. REFERENCES [1]
G. J. Sullivan, J. R. Ohm, A. Ortega, E. Delp, A. Vetro, and M. Barni, "Future of Video Coding and Transmission," Signal Processing Magazine, IEEE, vol. 23, no. 6, pp. 76-82, 2006.
[2]
V.K.Goyal, "Multiple Description Coding: Compression Meets the network," IEEE Signal Proc. Magazine, vol. 18, no. 5, pp. 74-93, Sept. 2001. [3] V. A. Vaishampayan, "Design of Multiple Description Scalar Quantizers," IEEE Trans. Inform. Theory, vol. 39, pp. 821-834, May 1993. [4] Y. Wang, M. T. Orchard, V. A. Vaishampayan, and A. R. Reibman, "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Proc., vol. 10, no. 3, pp. 351-366, March 2001. [5] M. Biswas, M. Frater, and J. Arnold, "Multiple Description Video Coding with 3D-SPIHT Employing A New Tree Structure," accepted at ICIP, 2007. [6] A. Said and W. A. Pearlman, "A New Fast and Efficient Image Codec based on Set Partitioning in Hierarchical Trees," IEEE Trans. on Cir. & Sys. on video Tech., vol. 6, no. 3, pp. 243-250, June 1996. [7] B.-J. Kim, Z. Xiong, and W. A. Pearlman, "Low bit-rate scalable video coding with 3-D set partitioning in hierarchical trees (3D-SPIHT)," IEEE Trans. on Cir. & Sys. on video Tech., vol. 10, no. 8, pp. 1374-1387, Dec. 2000. [8] S. Cho and W. A. Pearlman, "Error Resilient Video Coding with improved 3-D SPIHT and Error Concealment," Proc. of SPIE/ID&T Int. Conf. on Electronic Imaging, Santa Clara, CA, 2003. [9] S. Cho and W. A. Pearlman, "A full-featured, error resilient, scalable wavelet video codec based on the Set Partitioning in Hierarchical Trees (SPIHT) algorithm," IEEE Trans. on Cir. & Sys. on video Tech., vol. 12, no. 3, pp. 157-171, March 2002. [10] J.Kim, R. M. Mersereau, and Y. Altunbasak, "Distributed Video Streaming using Multiple Description Coding and Unequal Error Protection," IEEE Trans. Image Proc., vol. 14, no. 7, pp. 849-861, July 2005.