IEEE TRANSACTIONS ON IMAGE PROCESSING, VOL. 16, NO. 3, MARCH 2007
Multiple Description Image Coding Based on Lagrangian Rate Allocation

Tammam Tillo, Member, IEEE, Marco Grangetto, Member, IEEE, and Gabriella Olmo, Senior Member, IEEE

Manuscript received March 8, 2006; revised October 3, 2006. This work was supported by the EU under the NEWCOM Network of Excellence. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Amir Said. The authors are with the Department of Electronics, Politecnico di Torino, 10129 Torino, Italy (e-mail: [email protected]; [email protected]; [email protected]). Digital Object Identifier 10.1109/TIP.2007.891152
Abstract—In this paper, a novel multiple description coding technique is proposed, based on optimal Lagrangian rate allocation. The method assumes that the coded data consist of independently coded blocks. Initially, all the blocks are coded at two different rates. Then, the blocks are split into two subsets with similar rate-distortion characteristics; two balanced descriptions are generated by combining code-blocks belonging to the two subsets encoded at opposite rates. A theoretical analysis of the approach is carried out, and the optimal rate-distortion conditions are worked out. The method is successfully applied to the JPEG 2000 standard, and simulation results show a noticeable performance improvement with respect to state-of-the-art algorithms. The proposed technique enables easy tuning of the required coding redundancy. Moreover, the generated streams are fully compatible with Part 1 of the standard.

Index Terms—Error resilience, JPEG 2000, multiple description coding, rate-distortion optimization.
I. INTRODUCTION

IMAGE and video transmission over a wide range of transmission media would not be possible without a high degree of compression. However, when compressed data are transmitted over noisy channels or networks subject to packet erasures, error recovery and concealment become very difficult. Channel coding is employed to recover from bit errors, whereas packet or channel loss can be addressed by multiple description coding (MDC). Multiple description coding is a technique whereby all the transmitted segments of data, or descriptions, can be independently decoded and are mutually refinable [1]. Descriptions are created so that, when transmitted over networks subject to independent packet erasures or over multiple channels subject to independent losses, the quality of the recovered signal is a function of the number of received descriptions, and not of the specific loss pattern. In case some descriptions are lost, they are estimated from the received ones. This implies a certain amount of redundancy among the descriptions, which in turn impairs the coding efficiency in the absence of losses. As a consequence, MDC is a good choice when multimedia data are delivered over nonprioritized networks subject to harsh packet losses [2], [3]. The theoretical problem of determining the achievable rate/redundancy/distortion regions for MDC with a given source
statistical model has been addressed by many authors [4]–[7]. As for actual implementations, plenty of methods have been proposed in the literature. Perhaps the most popular is the class of quantization-based approaches, stemming from the pioneering MD scalar quantization (MDSQ) proposed by Vaishampayan in 1993 [8]. In MDSQ, a memoryless source is assumed, and both the quantization levels and the index assignment are designed so that the best quality level (central performance) is maximized, subject to proper constraints on the lower quality level (side performance) and the rate. The general scheme employing MD quantization followed by entropy coding has been further developed in several papers, among which we can mention [9] and references therein. In [10] and [11], generalizations are provided for MD trellis and vector quantization, respectively. In the class of MDC methods based on correlating transforms, redundancy is introduced among the descriptions by operating pairwise correlating transforms or generalized transforms on the source symbols [12]–[14]. A general framework to obtain MDC using lapped orthogonal transforms is presented in [15], whereas the use of subband filter banks, properly designed to achieve optimal rate/redundancy/distortion performance, has been addressed in [16]; the use of windowed Fourier co-decoding is proposed in [17]. All the mentioned methods are conceived as stand-alone algorithms, and are not compatible with standard image/video codecs. This represents a limitation for the delivery of multimedia contents using MDC, due to the widespread use of standardized codecs such as JPEG 2000 [18]. In fact, generating descriptions that represent syntax-compliant JPEG 2000 streams can greatly help interoperability. In [19], the authors propose the use of oversampled filterbanks in order to allow for the reconstruction of the coded signal in the presence of coefficient loss. Moreover, they propose a low-complexity MD scheme for images based on odd and even row splitting for the generation of two descriptions, whereas the third description is obtained by lowpass filtering each column and subsampling by a factor of 2. However, the authors state that the objective was not to propose a complete coding scheme for a specific application. In [20], the authors suggest using a uniformly distributed random mask for each subband of the wavelet-transformed image, so as to generate different groups. These groups are then quantized at different rates and combined together so as to generate the descriptions, which are compressed with EBCOT [21]. Although EBCOT is the main engine of JPEG 2000, the implementation of this algorithm within JPEG 2000 requires the modification of the quantization stage, resulting in noncompliant streams. Moreover, the efficiency of EBCOT can be impaired due to the perturbation of the coefficient statistics. Methods based on the polyphase
decomposition followed by selective quantization (PDSQ) [22] are, indeed, compatible with those standards that allow one to adjust the quantization step size (e.g., JPEG). On the other hand, they cannot be straightforwardly applied to algorithms such as JPEG 2000, where the rate allocation is not based on quantization but, instead, on postcompression code-block truncation. One can think of applying the method in [22], using different selective quantization steps in order to create the descriptions; however, this may interfere with the rate allocation procedure, resulting in impaired encoding efficiency. The problem of optimizing the bit rate allocation for MD based on PDSQ is formalized in [23]. In [22], vector decomposition is also addressed, but only two heuristic approaches for the subsequence generation are described. For stationary processes, where all symbols have the same rate-distortion (RD) characteristics, the PDSQ method achieves a trade-off between central and side distortion that is determined by the quantization step sizes of each generated subsequence. In this case, the decomposition stage is trivial, as it simply splits the source symbols into a number of subsequences equal to the number of descriptions to be created. Of course, it can be expected that the optimization of this task would result in further performance improvements, especially for nonstationary processes, which is the case of most realistic signals. Moreover, in the case of nonstationary processes, the amount of introduced redundancy is difficult to control. The generation of MD-compliant JPEG 2000 streams has previously been addressed in [24]; however, no analytical framework was developed to ensure the optimality of the procedure. A similar approach for video applications has been proposed in [25], where a method based on interframe wavelet-based scalable video coding is proposed. Motion-compensated temporal filtering and 2-D spatial wavelet filtering are used to perform a motion-compensated 3-D wavelet transform, which creates a spatiotemporal subband cube. All spatiotemporal subbands are divided into nonoverlapping regions that are coded at different rates in the different descriptions, and the LL subband is duplicated in each description.

In this paper, we propose a novel method to generate two descriptions of images exploiting the Lagrangian rate allocation function present in many modern encoders, such as JPEG 2000. Besides the excellent RD performance, the main features of the technique are the simple processing at both the encoder and decoder side, and the full compatibility with the standard JPEG 2000 decoder in case of single description reception.

The rest of this paper is organized as follows. In Section II, the basic principles of RD-based MDC are introduced, along with the analytical study of the optimality conditions and the redundancy tuning. Section III presents a practical approach to generate two balanced descriptions, with the analysis of its performance. In Section IV, a practical rate allocation optimized MD scheme for JPEG 2000 is described and analyzed. In Section V, simulation results are presented, and in Section VI, conclusions are drawn.

II. RATE DISTORTION-BASED MDC

A. Conditions for Optimality

We address the problem of generating descriptions that are optimized in the RD sense. Throughout this paper, we will refer
to this method as rate-distortion-based MDC (RD-MDC). In the case of two descriptions, the performance can be evaluated in terms of the central distortion $D_0$, i.e., the distortion when both descriptions are received, and the side distortions $D_1$ and $D_2$, when either description is received, as functions of the rates $R_1$ and $R_2$ devoted to either description, corresponding to a total rate $R_T = R_1 + R_2$. The objective of the MD encoder is to find the optimal quintuple $(R_1, R_2, D_0, D_1, D_2)$ subject to proper constraints, as will be discussed in the following.

We must point out that typical image data are not stationary, which makes the design of an efficient MD co-decoder more complicated. However, a typical approach, which will be adopted also in this paper, is to consider blocks of data, with the block dimension selected so that data can be considered almost stationary within each block.

Let us consider a vector $X$ of a zero-mean random process, and let us write $X = \{x_1, \ldots, x_N\}$, where the $x_i$ are disjoint subsets of random variables. In the case of JPEG 2000 compressed images, the subsets can be identified with the so-called code-blocks (CBs); in the following, we will borrow the JPEG 2000 terminology. A distortion function $d_i$ is associated to each CB, where $\hat{x}_i$ is an approximation of $x_i$. Throughout this paper, we will adopt as the distortion measure the squared Euclidean norm $d_i = \|x_i - \hat{x}_i\|^2$. The approximation $\hat{x}_i$ is obtained by encoding $x_i$ at rate $r_i$ (measured in bits), according to a given RD function $d_i(r_i)$ associated to the $i$th CB. The collection of the RD functions is then associated to the whole vector $X$. Let us assume that variables belonging to different CBs are independent.

We want to generate two descriptions of vector $X$; the first (second) description $X^{(1)}$ ($X^{(2)}$) is obtained by encoding each CB $x_i$, $i = 1, \ldots, N$, at rate $r_i^{(1)}$ ($r_i^{(2)}$), yielding a distortion $d_i(r_i^{(k)})$, $k = 1, 2$. Since the CBs are independent of each other, the overall distortion of the $k$th description can be written as $D_k = \sum_{i=1}^{N} d_i(r_i^{(k)})$, $k = 1, 2$. Analogously, the rate devoted to either description is $R_k = \sum_{i=1}^{N} r_i^{(k)}$, $k = 1, 2$.

When the two descriptions are available at the decoder side, the central decoder selects the best representation of each CB, in order to get the best approximation of the original vector $X$. Therefore, at the central decoder, the two streams $X^{(1)}$ and $X^{(2)}$ are merged into a single one, $X^{(0)}$. Let us focus on description $X^{(1)}$. It will contain some CBs that are better represented in $X^{(1)}$ than in $X^{(2)}$ and, therefore, will be selected to generate $X^{(0)}$; on the other hand, other CBs will be discarded from $X^{(1)}$, as their representation is coarser than that in $X^{(2)}$. Then, we can define the index sets $S_1 = \{i : r_i^{(1)} > r_i^{(2)}\}$ and $S_2 = \{i : r_i^{(2)} > r_i^{(1)}\}$ of the CBs that are finely coded in the first and second description, respectively, and $S_0 = \{i : r_i^{(1)} = r_i^{(2)}\}$ of the CBs encoded at the same rate in the two descriptions. In Appendix A, we demonstrate that RD-optimized descriptions should satisfy the condition $S_0 = \emptyset$, meaning that it is not convenient to have a CB encoded at the same rate in the two descriptions. As a consequence, we can write $S_2 = \{1, \ldots, N\} \setminus S_1$, and the central distortion

$$D_0 = \sum_{i \in S_1} d_i(r_i^{(1)}) + \sum_{i \in S_2} d_i(r_i^{(2)}).$$

At this point, we can investigate the necessary conditions to obtain an optimized quintuple $(R_1, R_2, D_0, D_1, D_2)$. Different optimization scenarios can be addressed, but we will limit ourselves to the analysis of a case of particular interest. The goal of the optimization problem is the minimization of the expected distortion $E[D] = P_0 D_0 + P_1 D_1 + P_2 D_2$, where $P_0$, $P_1$, and $P_2$ are the probabilities of receiving both descriptions, only the first one, and only the second one, respectively. Moreover, we must satisfy the constraints $R_1 \le R_1^{\max}$ and $R_2 \le R_2^{\max}$, where $R_1^{\max}$ and $R_2^{\max}$ are the maximum allowed rates for the first and second description, respectively, as dictated by the network and the transmission scenario. This is typical of transmission on a packet network subject to a known packet loss probability; in such a case, the average distortion can be controlled. This problem can be solved using the Lagrange method, introducing the cost function

$$J' = P_0 D_0 + P_1 D_1 + P_2 D_2 + \lambda_1' R_1 + \lambda_2' R_2$$

where $\lambda_1'$ and $\lambda_2'$ represent positive Lagrange multipliers. With a simple normalization, and omitting the terms which are not relevant to the optimization task, we can rewrite the previous cost function as

$$J = D_0 + \alpha_1 D_1 + \alpha_2 D_2 + \lambda_1 R_1 + \lambda_2 R_2 \quad (1)$$

where $\alpha_k = P_k / P_0$ and $\lambda_k = \lambda_k' / P_0$ are positive multipliers. The first Karush–Kuhn–Tucker (KKT) necessary condition of optimality [26] requires that $\nabla J = 0$, meaning that $\partial J / \partial r_i^{(k)} = 0$, $k = 1, 2$, $i = 1, \ldots, N$. Taking into account that $S_0 = \emptyset$, we can write

$$J = \sum_{i \in S_1} d_i(r_i^{(1)}) + \sum_{i \in S_2} d_i(r_i^{(2)}) + \alpha_1 \sum_{i=1}^{N} d_i(r_i^{(1)}) + \alpha_2 \sum_{i=1}^{N} d_i(r_i^{(2)}) + \lambda_1 R_1 + \lambda_2 R_2. \quad (2)$$

By taking the derivative of (2), we obtain

$$(1 + \alpha_1) \frac{\partial d_i}{\partial r_i^{(1)}} + \lambda_1 = 0 \;\; \text{for } i \in S_1, \qquad \alpha_1 \frac{\partial d_i}{\partial r_i^{(1)}} + \lambda_1 = 0 \;\; \text{for } i \in S_2. \quad (3)$$

Consequently, the optimal solution requires that

$$\frac{\partial d_i}{\partial r_i^{(1)}} = -\frac{\lambda_1}{1 + \alpha_1} \;\; \text{for } i \in S_1, \qquad \frac{\partial d_i}{\partial r_i^{(1)}} = -\frac{\lambda_1}{\alpha_1} \;\; \text{for } i \in S_2 \quad (4)$$

and analogously for the second description, with the roles of $S_1$ and $S_2$ exchanged. This means that the optimal allocation for description $X^{(1)}$ is obtained when CBs are encoded so as to have two different slopes, depending on the set they belong to. In particular, the finely coded CBs have an RD slope equal to $-\lambda_1 / (1 + \alpha_1)$, which is larger than the slope $-\lambda_1 / \alpha_1$ used for the coarser ones.

The other KKT condition requires that $\lambda_k (R_k - R_k^{\max}) = 0$, $k = 1, 2$. This is verified when $R_k = R_k^{\max}$, which is the typical case when all the available rate is being used; as a consequence, there are no constraints on the multipliers $\lambda_k$. On the other hand, in the case $R_k < R_k^{\max}$, the same conditions imply $\lambda_k = 0$. This happens when $R_k^{\max}$ is larger than the lossless coding rate, corresponding to $\partial d_i / \partial r_i^{(k)} = 0$, $\forall i$; being quite unrealistic, this situation will not be further considered.

In practice, in order to solve the optimization task, it is necessary to identify the sets of indexes $S_1$ and $S_2$. Given $N$ CBs, there are $2^N$ possible allocations of the indexes to sets $S_1$ and $S_2$. For each allocation, one should find if there are certain Lagrangian multipliers that lead to a solution satisfying the necessary conditions (4) and the constraint conditions. Finally, the best allocation in terms of the Lagrangian cost should be selected as the optimal one. This exhaustive search method is clearly unfeasible for practical values of $N$ and will be addressed in the following only for benchmarking purposes.

B. Case of Independent Channels With Known Failure Probability

The results achieved in the previous section are now applied to the case when the two descriptions are transmitted over channels with known failure probability. In such a case, it is important to tune the inserted redundancy, in order to get the best performance taking into account the network characteristics. We will study the special case of two independent channels with failure probabilities $p_1$ and $p_2$, and maximum allowable transmission rates $R_1^{\max}$ and $R_2^{\max}$. If two descriptions are generated and transmitted over these two channels, the expected distortion at the receiver is

$$E[D] = (1 - p_1)(1 - p_2) D_0 + (1 - p_1) p_2 D_1 + p_1 (1 - p_2) D_2 + p_1 p_2 E \quad (5)$$

where $E$ is the signal energy. The Lagrangian function of this problem is given by (1), with $\alpha_1 = p_2 / (1 - p_2)$ and $\alpha_2 = p_1 / (1 - p_1)$. This means that the slopes defined in (4) can be rewritten as

$$\frac{\partial d_i}{\partial r_i^{(1)}} = -\lambda_1 (1 - p_2) \;\; \text{for } i \in S_1, \qquad \frac{\partial d_i}{\partial r_i^{(1)}} = -\lambda_1 \frac{1 - p_2}{p_2} \;\; \text{for } i \in S_2. \quad (6)$$

From (6), the following conditions can be worked out:

$$\frac{\partial d_i / \partial r_i^{(1)} \big|_{i \in S_1}}{\partial d_i / \partial r_i^{(1)} \big|_{i \in S_2}} = p_2, \qquad \frac{\partial d_i / \partial r_i^{(2)} \big|_{i \in S_2}}{\partial d_i / \partial r_i^{(2)} \big|_{i \in S_1}} = p_1. \quad (7)$$

This result is particularly significant, since it establishes a relationship between the failure probabilities and the optimal RD slopes. In particular, the ratio between the RD slopes of the finely and coarsely coded CBs of a given description should be equal to the probability of losing the complementary description. For example, considering the case $p_2 = 0$ for the first description, we can conclude that the slope of the coarsely coded CBs diverges, i.e., $r_i^{(1)} \to 0$ for $i \in S_2$. This means that all the available rate is allocated to the finer CBs. In fact, when $p_2 = 0$, the probability of receiving $X^{(2)}$ is extremely high; therefore, the first description should be built so that the redundant part is minimal. On the other hand, in the case of $p_2 = 1$, we can write $\partial d_i / \partial r_i^{(1)} \big|_{i \in S_1} = \partial d_i / \partial r_i^{(1)} \big|_{i \in S_2}$, i.e., all the CBs of the first description are encoded at the same slope. This means that the first description should be built so as to deliver the maximum amount of information when the other one is not received, since this is a very likely event when $p_2 = 1$.
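To make the slope conditions (4) and (7) concrete, the following minimal sketch (ours, not part of the paper) allocates the per-CB rates of one description under the classical exponential RD model $d_i(r) = \sigma_i^2 2^{-2r}$; the model, the variances, and all function names are illustrative assumptions.

    import math

    def rate_for_slope(sigma2: float, s: float) -> float:
        """Rate r at which the model curve d(r) = sigma2 * 2**(-2r)
        has slope d'(r) = -s (s > 0); clamped to r >= 0."""
        # d'(r) = -2*ln(2)*sigma2*2**(-2r) = -s  =>  r = 0.5*log2(2*ln(2)*sigma2/s)
        return max(0.5 * math.log2(2.0 * math.log(2.0) * sigma2 / s), 0.0)

    def allocate_description(sigmas2, S1, lam1, p2):
        """Rates of description 1 under the slope conditions:
        CBs in S1 (finely coded) work at slope lam1/(1+alpha1), the
        others at slope lam1/alpha1, with alpha1 = p2/(1-p2); the
        ratio of the two slopes is exactly p2, as required by (7)."""
        alpha1 = p2 / (1.0 - p2)
        s_fine = lam1 / (1.0 + alpha1)   # = lam1*(1-p2)
        s_coarse = lam1 / alpha1         # = lam1*(1-p2)/p2
        return [rate_for_slope(sig2, s_fine if i in S1 else s_coarse)
                for i, sig2 in enumerate(sigmas2)]

    # Toy usage: four CBs with different variances, CBs 0 and 2 finely coded
    rates = allocate_description([4.0, 1.0, 2.0, 0.5], S1={0, 2},
                                 lam1=0.05, p2=0.1)
    print([round(r, 2) for r in rates])

As $p_2$ shrinks, the coarse slope magnitude grows and the coarse rates are driven toward zero, matching the limit behavior discussed above.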
III. BALANCED DESCRIPTIONS

In the general MDC framework, descriptions can be created with different rates and side distortions, but in practice, balanced descriptions are mostly addressed. In the balanced scenario, it is assumed that $R_1 = R_2 = R_T/2$ and $D_1 = D_2$. Balanced descriptions should be addressed when the two channels are independent and have similar bandwidth and probability of failure, and/or when it is necessary to have the same delivered quality for both channels. In general, it is not easy to identify the index sets $S_1$ and $S_2$ that, combined with the slopes of (7), yield balanced descriptions. This is especially true for nonstationary processes, where the distortion-rate curves $d_i(r_i)$ are different.

In order to build the sets of indexes $S_1$ and $S_2$ that yield balanced descriptions, let us identify CBs $x_i$ and $x_j$ which have "similar" RD functions, $d_i(r) \simeq d_j(r)$; for example, the similarity can be measured using the L1-norm $\int |d_i(r) - d_j(r)|\,dr$. Assuming that $x_i$ is finely coded in the first description, whereas $x_j$ is finely represented in the second description, and assuming $p_1 = p_2 = p$, we can write, using (7),

$$\frac{\partial d_i / \partial r \,\big|_{r_i^{(1)}}}{\partial d_j / \partial r \,\big|_{r_j^{(1)}}} = \frac{\partial d_j / \partial r \,\big|_{r_j^{(2)}}}{\partial d_i / \partial r \,\big|_{r_i^{(2)}}} = p. \quad (8)$$

In general, this equation admits infinite solutions, given the functions $d_i$ and $d_j$. The strict convexity of the RD functions entails that, if we take any two operating points for $x_i$, there exist two points for $x_j$ satisfying (8). Consequently, the rate globally devoted to the two CBs $x_i$ and $x_j$ in the first description is equal to that in the second description: $r_i^{(1)} + r_j^{(1)} = r_i^{(2)} + r_j^{(2)}$. The same equation can be expressed in terms of CB distortions: $d_i(r_i^{(1)}) + d_j(r_j^{(1)}) = d_i(r_i^{(2)}) + d_j(r_j^{(2)})$. This results in balanced contributions of the CBs $x_i$ and $x_j$. Fig. 1 shows a pictorial representation of the rate and distortion contributions of the two CBs.

Fig. 1. Creating balanced descriptions: the rate and distortion contribution of the two CBs $x_i$ and $x_j$. Filled and nonfilled circles represent the finely and coarsely coded versions of the CB, respectively.

From the above considerations, we can conclude that if we build the index sets $S_1$ and $S_2$ so that

$$d_i(r) \simeq d_j(r), \quad \forall r, \text{ for each matched pair } (x_i, x_j), \; i \in S_1, \; j \in S_2 \quad (9)$$

we can easily build two balanced descriptions by using (8) to determine the rate and distortion points for each CB. In case a given CB cannot be associated to any other with a similar RD function, it is necessary to split it into two segments, thus obtaining two smaller and similar sets. This procedure is necessary when $N$ is odd, as well.

In fact, condition (9) only imposes that points which exhibit similar slopes on the RD curves of $x_i$ and $x_j$ result in the same rate and distortion contribution over the two descriptions. So, it is quite enough for the pair $(x_i, x_j)$ to have similar rate-distortion values in the points where (8) is verified. Taking into account the previous conclusion, we can summarize the less strict conditions in order to build balanced descriptions:

$$r_i^{(1)} + r_j^{(1)} = r_i^{(2)} + r_j^{(2)}, \qquad d_i(r_i^{(1)}) + d_j(r_j^{(1)}) = d_i(r_i^{(2)}) + d_j(r_j^{(2)}) \quad (10)$$

for each matched pair $(x_i, x_j)$.
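As a worked illustration (ours, under the common exponential model assumption $d_i(r) = \sigma_i^2 2^{-2r}$, with a common multiplier for both balanced descriptions so that the fine operating points of $x_i$ and $x_j$ share one slope and the coarse points another), the equal-slope conditions behind (8) can be solved in closed form:

$$d_i'(r) = -2\ln 2\,\sigma_i^2 2^{-2r}; \qquad d_i'(r_i^{(1)}) = d_j'(r_j^{(2)}) \;\Rightarrow\; r_i^{(1)} - r_j^{(2)} = \log_2\frac{\sigma_i}{\sigma_j}; \qquad d_j'(r_j^{(1)}) = d_i'(r_i^{(2)}) \;\Rightarrow\; r_j^{(1)} - r_i^{(2)} = \log_2\frac{\sigma_j}{\sigma_i}.$$

Adding the two relations gives $r_i^{(1)} + r_j^{(1)} = r_i^{(2)} + r_j^{(2)}$; moreover, for this model an equal slope is equivalent to an equal distortion (since $d(r) = -d'(r)/(2\ln 2)$), so $d_i(r_i^{(1)}) = d_j(r_j^{(2)})$ and $d_j(r_j^{(1)}) = d_i(r_i^{(2)})$, i.e., the balance conditions (10) hold exactly for this model.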
A. Balanced Description Performance

The performance of the MD approach should be compared with SDC, where only one representation of the original data is generated in order to be transmitted over a single channel. The RD function of the SDC scheme will be denoted by $D_{SDC}(R)$. For convex RD functions $d_i(r_i)$ of the CBs $x_i$, $i = 1, \ldots, N$, $D_{SDC}(R)$ is simply the collection of all the points where all CBs have equal slopes [27].

Let us define $R_F^{(k)}$ and $R_C^{(k)}$, $k$ = 1, 2, as the rates devoted, respectively, to the CBs that are finely and coarsely represented in description $k$. If balanced descriptions are built using the approach described in Section III with a total rate $R_T = R_1 + R_2$, it turns out that $R_F^{(1)} = R_F^{(2)} = R_F$ and $R_C^{(1)} = R_C^{(2)} = R_C$. It is clear that $R_F + R_C = R_T / 2$. Since the CBs are independent, we have

$$D_1 = D_2 = \frac{1}{2}\left[D_{SDC}(2R_F) + D_{SDC}(2R_C)\right]. \quad (11)$$

Moreover, the RD functions being convex, we can conclude that $D_1 = D_2 \ge D_{SDC}(R_F + R_C) = D_{SDC}(R_T / 2)$. The central distortion is obtained decoding the best represented CBs in the two descriptions; this means that

$$D_0 = D_{SDC}(2R_F). \quad (12)$$

In practical applications, where a standard encoder is used, (11) and (12) are of particular interest. In fact, they state that two balanced optimized descriptions can be obtained by coding the original data at the two rates $2R_F$ and $2R_C$ with the standard encoder, followed by a rearrangement stage, which combines the data of the two generated streams. The rearrangement stage has to take into account the conditions (10). It is clear that the distortion $D_{SDC}(R_T)$ of an SDC encoded at the total rate $R_T$ is less than the central distortion; the equality holds when $R_C = 0$.¹ Since $2R_F \le R_T$, it follows that $D_0 = D_{SDC}(2R_F) \ge D_{SDC}(R_T)$.

¹In practical implementations, this equality cannot be exactly verified due to replicated header information.

The impairment of the central quality with respect to SDC is introduced by the MDC scheme due to the extra rate $2R_C$. It is clear that increasing the redundancy results in better side performance and impaired central quality. In order to better exploit the available rate, it is important for any MD scheme to offer an easy redundancy insertion mechanism, which helps matching the network conditions. In the proposed approach, this can be accomplished by tuning $R_C$. We can write $D_{SDC}(0) = E$, where $E$ is the signal energy. Taking into account (11) and (12), we can conclude that $D_0 \le D_1 = D_2 \le E$. Combining all the previous inequalities, we can work out the following expression:

$$D_{SDC}(R_T) \le D_0 \le D_1 = D_2 \le E.$$
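The sketch below (ours; the SDC curve is a stand-in exponential model, and all names are illustrative assumptions) evaluates the side and central performance of balanced descriptions via (11) and (12), and tunes $R_C$ with a bisection in the spirit of the redundancy allocation procedure (Procedure 1) given right below, so that the slope ratio matches the failure probability $p$ as required by (7).

    E = 1.0  # signal energy, so that D_SDC(0) = E

    def d_sdc(R: float) -> float:
        """Stand-in SDC curve D_SDC(R) = E * 2**(-2R). (An assumed model;
        the paper estimates the real curve from the JPEG 2000 encoder.)"""
        return E * 2.0 ** (-2.0 * R)

    def slope(R: float, eps: float = 1e-6) -> float:
        """Numerical slope of the SDC curve (negative, since decreasing)."""
        return (d_sdc(R + eps) - d_sdc(R - eps)) / (2.0 * eps)

    def md_performance(R_T: float, R_C: float):
        """Side and central distortion of balanced descriptions, (11)-(12)."""
        R_F = R_T / 2.0 - R_C
        D_side = 0.5 * (d_sdc(2.0 * R_F) + d_sdc(2.0 * R_C))  # eq. (11)
        D_central = d_sdc(2.0 * R_F)                           # eq. (12)
        return D_side, D_central

    def tune_redundancy(R_T: float, p: float, tol: float = 1e-6) -> float:
        """Bisect R_C in (0, R_T/4) until slope(2R_F)/slope(2R_C) = p;
        converges to R_C ~ 0 if even the no-redundancy ratio exceeds p."""
        lo, hi = 0.0, R_T / 4.0
        while hi - lo > tol:
            R_C = 0.5 * (lo + hi)
            R_F = R_T / 2.0 - R_C
            ratio = slope(2.0 * R_F) / slope(2.0 * R_C)
            if ratio < p:   # too little redundancy: raise R_C
                lo = R_C
            else:           # too much redundancy: lower R_C
                hi = R_C
        return 0.5 * (lo + hi)

    R_C = tune_redundancy(R_T=1.0, p=0.5)
    print(round(R_C, 3), md_performance(1.0, R_C))  # R_C = 0.125 here

For a convex decreasing curve, increasing $R_C$ flattens the coarse point and steepens the fine one, so the slope ratio grows monotonically and the bisection is well posed.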
As already discussed, it is important to tune the inserted redundancy, in order to get the best performance over the used network. This objective can be achieved using (7). Since the RD functions of the CBs are convex, the global RD function $D_{SDC}(R)$ is also convex. So, the slope at the RD point $(2R_F, D_{SDC}(2R_F))$ is $\lambda_F = D'_{SDC}(2R_F)$; analogously, for the point $(2R_C, D_{SDC}(2R_C))$, the slope is $\lambda_C = D'_{SDC}(2R_C)$. Consequently, (7) can be rewritten as

$$\frac{\lambda_F}{\lambda_C} = \frac{D'_{SDC}(2R_F)}{D'_{SDC}(2R_C)} = p.$$

This equation can be used to minimize the expected distortion, given that $R_T$ and $p$ are known, following this procedure.

Procedure 1: The redundancy allocation procedure

Given $R_T$, $p$, $\epsilon$ (numerical tolerance)
repeat
    compute $\lambda_F = D'_{SDC}(2R_F)$ and $\lambda_C = D'_{SDC}(2R_C)$, where $R_F = R_T/2 - R_C$
    if $\lambda_F / \lambda_C < p$ then
        increase $R_C$
    else
        decrease $R_C$
    end if
until $|\lambda_F / \lambda_C - p| < \epsilon$

IV. PRACTICAL RATE ALLOCATION OPTIMIZED MD SCHEME

In the following, the theoretical results obtained in the previous sections are experimentally validated using JPEG 2000 as the main compression engine. In JPEG 2000, the image is first DWT transformed. The generated coefficients are quantized by a high-rate quantizer; then, the coefficients of each subband are divided into nonoverlapping rectangular areas, or CBs, which are bit-plane encoded in the Tier-1 module. The bit stream is organized by the rate allocator (Tier-2 module) into a sequence of layers, each layer containing contributions from each CB. The block truncation points associated with each layer are optimized in the rate-distortion sense [28].

The rate allocation-based MDC is well suited to such a rate allocation procedure. Assuming that the RD curves of the CBs are strictly convex, we can adopt the criterion (10) in order to create the optimized balanced descriptions. To this end, two JPEG 2000 streams are generated, encoded at rates $2R_F$ and $2R_C$, respectively, the overall rate being $R_T = 2(R_F + R_C)$. Let us denote by $(r_i^F, d_i^F)$ and $(r_i^C, d_i^C)$ the RD points of the $i$th CB belonging to the stream encoded at rate $2R_F$ and $2R_C$, respectively. The CBs generated after the DWT are grouped into two sets having similar RD performance. This objective can be achieved by the following procedure, based on CB classification:

Procedure 2: Balanced description creation

Given $\Phi = \{1, \ldots, N\}$, $S_1 = S_2 = \emptyset$
while $\Phi \neq \emptyset$ do
    identify a CB $x_i$, where $i \in \Phi$
    search for the CB $x_j$, $j \in \Phi \setminus \{i\}$, such that:
        $j = \arg\min_{l \in \Phi \setminus \{i\}} \left( |r_i^F - r_l^F| + \gamma\,|d_i^F - d_l^F| \right)$
    identify the couple of CBs $(x_i, x_j)$ as "similar" CBs
    $S_1 \leftarrow S_1 \cup \{i\}$, $S_2 \leftarrow S_2 \cup \{j\}$, $\Phi \leftarrow \Phi \setminus \{i, j\}$
end while
where the multiplier $\gamma$ determines the relative importance of achieving balanced rate or balanced distortion; e.g., a small $\gamma$ emphasizes balanced rate over balanced distortion. At this point, we have identified the two groups of CBs $S_1$ and $S_2$ having similar RD characteristics. Then, the first description is obtained by combining the $S_1$ CBs of the stream encoded at rate $2R_F$ with the $S_2$ CBs of the stream encoded at rate $2R_C$. Analogously, the second description is built by taking the $S_2$ CBs of the stream encoded at rate $2R_F$ with the $S_1$ CBs of the stream encoded at rate $2R_C$. The encoding process is depicted in Fig. 2. This procedure yields balanced descriptions, each description being encoded at rate $R_F + R_C = R_T / 2$.
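A minimal rendering of Procedure 2 follows (ours; the greedy strategy is the paper's, but the function names and toy data are illustrative, and the decoder helper reflects the observation in footnote 2 below).

    def pair_similar_cbs(rF, dF, gamma=1.0):
        """Greedily pair CBs with similar RD points from the 2R_F stream;
        small gamma favors balanced rate, large gamma balanced distortion."""
        remaining = set(range(len(rF)))
        pairs = []
        while len(remaining) >= 2:
            i = remaining.pop()
            j = min(remaining,
                    key=lambda l: abs(rF[i] - rF[l]) + gamma * abs(dF[i] - dF[l]))
            remaining.remove(j)
            pairs.append((i, j))
        return pairs  # a leftover odd CB would be split in two, as in Sec. III

    def build_descriptions(pairs):
        """First CB of each pair goes to S1 (fine in description 1, coarse
        in description 2); the second CB goes to S2 (the opposite)."""
        S1 = {i for i, _ in pairs}
        S2 = {j for _, j in pairs}
        return S1, S2

    def central_pick(len1: int, len2: int) -> int:
        """At the central decoder, the best representation of a CB can be
        identified by simply comparing the CB lengths."""
        return 0 if len1 >= len2 else 1

    # Toy usage: four CBs; CBs 0/2 and 1/3 have similar RD points
    pairs = pair_similar_cbs(rF=[3.2, 1.1, 2.9, 1.0],
                             dF=[0.10, 0.40, 0.12, 0.45], gamma=0.5)
    S1, S2 = build_descriptions(pairs)

The greedy search costs $O(N^2)$ comparisons, which explains the large complexity gap with respect to the $2^N$ exhaustive search reported in Section V.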
Fig. 2. Generation of descriptions from two JPEG 2000 streams encoded at rates $2R_F$ and $2R_C$, $2R_C < 2R_F$.

At the decoder side, if both descriptions are received, these latter are preprocessed and merged into a single bit stream, where, for each CB, the best representation is selected.² The resulting stream is then JPEG 2000 decoded. On the other hand, if a description is lost, the received one is simply JPEG 2000 decoded, yielding inferior quality. This modality can also be adopted in case the receiver is not equipped with an enhanced decoder, able to cope with the necessary preprocessing for the case of two-description reception, as will be discussed later in this paper.

²The best CB representation can be identified by simply determining the CB length.

Assuming that the distortion contribution of the CBs is additive,³ we can quantify the performance of this approach in terms of the RD function of the reference SDC scheme $D_{SDC}(R)$, which is simply the RD curve of the JPEG 2000 encoded image at rate $R$, by using (11) and (12). So, after having estimated the RD function of the reference SDC scheme at the different candidate truncation points in the Tier-1 module (an estimation which is also important for the success of the RD optimization algorithm of JPEG 2000 itself [28]), the redundancy can be easily tuned in order to meet different trade-offs between central and side performance. This procedure mainly differs from polyphase PDSQ [22], [30] in that it does not require modifications of the JPEG 2000 embedded quantization block, and it is fully compatible with JPEG 2000, since it exploits its postcompression rate allocation procedure, which does not depend on the quantization stage.

³This assumption holds true if the wavelet transform is orthogonal, or if the quantization errors for individual coefficients are uncorrelated; in practice, neither condition is met, but the approximation is good enough [29].

V. EXPERIMENTAL RESULTS

A first set of simulations has been carried out employing the images Lena, Goldhill, Bridge, and Camera of dimension 256×256, and using the JPEG 2000 OpenJpeg image co-decoder [31]. Five levels of resolution have been used, with CBs of dimension 64×64, leading to $N$ CBs; consequently, $S_1$ and $S_2$ can be selected among $2^N$ possible combinations.

First of all, the performance of different classification approaches used to select $S_1$ and $S_2$ are compared. Table I reports, for the four test images coded at total output rates of 0.25, 0.50, 1.00, and 1.50 bpp with 20% redundancy, the relative rate deviation of the two descriptions, namely $|(R_1 - R_2)/R_T|$. As a benchmark, we used the exhaustive search method, which has been implemented in the Tier-2 module so as to obtain balanced descriptions. Another approach, based on the implementation of predefined rate allocation patterns, has also been tested in the Tier-2 module, namely, column-based, row-based, and pseudo-checkerboard arrangements of CBs. A pictorial representation of such patterns is reported in Fig. 3.

TABLE I
$|(R_1 - R_2)/R_T|$ FOR LENA, GOLDHILL, BRIDGE, AND CAMERA CODED AT 0.25, 0.50, 1.00, AND 1.50 bpp WITH 20% REDUNDANCY; EXHAUSTIVE SEARCH (EX); CLASSIFICATION-BASED (CL); PREDEFINED RATE ALLOCATION PATTERNS (A), (B), (C)

Fig. 3. Predefined rate allocation patterns tested in the JPEG 2000 Tier-2 module: a) column-based; b) row-based; c) pseudo-checkerboard.

Although the exhaustive search guarantees the best results, it has a computational complexity that grows exponentially with the number of CBs $N$, making it prohibitive for large images. On the other hand, the classification-based approach, with a maximum rate deviation of 2.5%, guarantees well-balanced descriptions with much less complexity. In fact, the average encoding time, evaluated on a Pentium 4-1.8 GHz for 256×256 images, is 1.43 and 6.8 s for the classification-based and exhaustive search methods, respectively. For seven levels of resolution, the encoding times are 1.57 and 424.0 s, respectively, with an evident exponential growth of the second. Due to its computational advantage, the classification-based approach will be adopted in the following.

Simulations have been carried out in order to verify the corollary which states that encoding a portion of data at the same quality in the two descriptions is not an optimal choice (see Appendix A). Employing the image Lena, two sets of simulations have been run. In the first one, the important data of the first resolution level (LL subband) are represented at the same quality in both descriptions. In the second one, both the first and second resolution levels are duplicated. In Table II, the performance impairment with respect to the proposed technique, which avoids duplications, is reported. The impairment is measured in terms of central PSNR loss with respect to the proposed algorithm, with redundancy 0.15 and 0.25 and three values of the total rate. The redundancy of the schemes with duplication has been tuned so that all the approaches yield about the same side performance.

TABLE II
CENTRAL PSNR IMPAIRMENT [dB] FOR IMAGE LENA WHEN (A) FIRST OR (B) FIRST-SECOND RESOLUTION LEVELS ARE DUPLICATED

It can be noticed that the proposed approach leads to slightly better results. Duplication of the first resolution level yields a 0.1–0.2 dB penalty. This gap gets larger with the amount of duplicated data. On the other hand, it decreases for higher amounts of redundancy, since the difference between the coarse and fine CBs vanishes. As in the proposed examples the number of duplicated coefficients is quite limited, the impact of duplication becomes less evident when increasing the total coding rate. As a consequence, the largest gap, which is 0.6 dB, appears when duplicating two resolution levels at redundancy 0.15. In any case, a performance improvement is always obtained by not representing any portion of data at the same quality in both descriptions.
It is very important to test the sensitiveness of the allocation procedure with respect to channel mismatch. In Fig. 4, the expected end-to-end PSNR, evaluated as the PSNR of the mean square error averaged over all single- and two-description reception cases, is reported versus the failure probability $p$ for the images Lena and Goldhill (512×512, with six levels of resolution). The total coding rate is 1.5 bpp. The redundancy has been allocated using Procedure 1, assuming design parameters $p$ = 0.05, 0.15, and 0.25, respectively. As expected, the allocation with the correct value of $p$ yields the best average performance over an interval centered around the value of probability which matches the actual one. When the failure probability is overestimated, the allocated redundancy is larger than the optimal one; as a consequence, the average PSNR is impaired, even if the performance loss is limited to less than 1 dB.

The performance of RD-MDC employing the Lena image is reported in Fig. 5 in terms of the side and central PSNR versus the overall rate. The theoretical expected performance evaluated by (11) and (12), the SDC curve, and the duplicated SDC curve are reported for comparison. It can be noticed that there is an excellent match between the actual and the expected performance of the proposed method.

In Fig. 6, the performance of RD-MDC is reported in terms of the side and central PSNR versus the overall rate, and compared with MD uniform scalar quantization (MDUSQ) and PDSQ [30]. These latter are MD methods proposed in the literature and based on JPEG 2000. The former is a modified version of MD scalar quantization [8], whereas the latter is a JPEG 2000-based version of [22], which was originally proposed for SPIHT. The redundancy is tuned so as to compare the different algorithms with the same central performance. This allows one to make fair comparisons in terms of the side distortion. For the proposed algorithm, this corresponds to a redundancy of about 25%. It can be noticed that the proposed method outperforms PDSQ by nearly 1 dB. This is due to the fact that the CBs forming the two descriptions are obtained from two optimized streams, whereas in the polyphase approach adopted in [30], the descriptions are obtained using different quantization steps; this impairs the performance of the subsequent CB truncation rate allocation stage. Moreover, the proposed method outperforms MDUSQ by several dBs in the medium-to-high rate region.

Fig. 7 shows the images obtained when decoding either description, at $R_T$ = 0.1 and 0.25 bpp and redundancy 25%. This allows one to validate the visual quality performance of the proposed algorithm. It is worth pointing out that, although the images are encoded at low bit rates, no blocking effect can be appreciated. In fact, the inverse wavelet transform smooths the representation error of the coarse CBs. Visual differences between the descriptions can be noticed only at very low rates; however, this does not correspond to a significant PSNR variation. In the higher coding rate region, it is hardly possible to discriminate between the two descriptions. This is the reason why such low rate values have been used.
Fig. 4. Expected end-to-end PSNR versus failure probability $p$ for image: a) Goldhill; b) Lena; design parameter $p$ = 0.05, 0.15, and 0.25.

Fig. 5. (a) Central and (b) side PSNR (dB) versus rate (bpp) for image Lena; SDC; duplicated SDC; RD-MDC algorithm; expected performance.

Fig. 6. (a) Central and (b) side PSNR (dB) versus rate (bpp) for image Lena; RD-MDC algorithm; PDSQ-JPEG 2000 and MDUSQ.

Fig. 7. Images obtained when decoding the first and second description, respectively; $R_T$ = 0.1 and 0.25 bpp, redundancy 25%.

Fig. 8. Central versus side PSNR (dB) for image Lena. The RD-MDC algorithm is compared with: its expected performance; PDSQ-JPEG 2000 and MDUSQ. (a) 0.5 bpp; (b) 1 bpp. The RD-MDC algorithm is compared with: PDSQ-SPIHT and [32]. (c) 0.5 bpp; (d) 1 bpp.
In Fig. 8, the central PSNR is reported versus the side PSNR for the Lena image. In Fig. 8(a) and (b), we report the actual performance of the proposed algorithm, its expected performance, and that of PDSQ (using JPEG 2000) and MDUSQ [30], at 0.5 and 1 bpp, respectively. Moreover, in order to enable comparisons with a non-JPEG 2000-based MD approach, in Fig. 8(c) and (d), we show the performance of PDSQ with SPIHT [22] and of the algorithm in [32], which is based on wavelet decomposition followed by optimized MD quantization and EBCOT (for this latter algorithm, results are available at rate 1 bpp only). From Fig. 8(a) and (b), we can notice that the proposed algorithm exhibits the best performance. Moreover, as already noticed in Fig. 5, there is a good match between the expected and the experimental performance. The model turns out to be slightly inaccurate only at low redundancy values, i.e., in the top-left corner of the plot. This behavior can be explained by noticing that, in this region, the rate devoted to the coarse CBs becomes so small that the header information becomes relevant, making (11) inaccurate. From Fig. 8(c) and (d), it can be observed that the proposed algorithm outperforms PDSQ-SPIHT [22]. The algorithm in [32] yields slightly better results; however, it is not compatible with JPEG 2000.

VI. CONCLUSION

In this paper, we analyze a novel method to generate rate-distortion-based MDC for still images. The necessary conditions to generate optimal descriptions are evaluated, along with the algorithm to allocate the amount of redundancy according to the network conditions. We apply the proposed approach to the JPEG 2000 standard. Simulation results show a noticeable performance improvement with respect to state-of-the-art algorithms. Moreover, there is a good match between the analytical expected performance and that obtained by simulation. The main features of the proposed MDC scheme are the simple processing to be performed at both the encoder and decoder side, and the easy redundancy tuning mechanism. This latter characteristic is of paramount importance, since it permits the selection of different trade-offs between central and side performance, depending on the actual network conditions. Moreover, the
proposed approach generates backward-compatible JPEG 2000 streams. Future research will be devoted to the extension of the proposed technique to video applications. This is a challenging topic, due to the interdependency among macroblocks and frames in standard hybrid motion-compensated codecs.
APPENDIX A

The objective of this section is to demonstrate that it is not convenient to have some CBs which are encoded at the same rate (or, equivalently, at the same quality) in both descriptions; in other words, we want to demonstrate that $S_0 = \emptyset$. If we assume that $S_0 \neq \emptyset$, we can write that $D_0 = \sum_{i \in S_1} d_i(r_i^{(1)}) + \sum_{i \in S_2} d_i(r_i^{(2)}) + \sum_{i \in S_0} d_i(r_i^{(1)})$, with $r_i^{(1)} = r_i^{(2)}$ for $i \in S_0$.

Corollary 1: In an optimal quintuple $(R_1, R_2, D_0, D_1, D_2)$ that minimizes the cost function $J$, it is impossible to have a CB which is encoded at the same rate in the two descriptions.

Proof: We will make the hypothesis that $S_0 = \{n\}$, meaning that only one CB $x_n$ is encoded at the same rate in $X^{(1)}$ and $X^{(2)}$; the generalization to more than one common CB is straightforward. By taking the derivative of (1), we can evaluate how the cost function varies when the rates around $x_n$ are perturbed. In the case $r_n^{(1)} \ge r_n^{(2)}$, the term $\partial D_0 / \partial r_n$ can be evaluated as $\partial d_n / \partial r_n^{(1)}$, whereas, if $r_n^{(1)} < r_n^{(2)}$, we have $\partial D_0 / \partial r_n = \partial d_n / \partial r_n^{(2)}$.

We assume that the distortion functions are convex decreasing functions. This means that $\partial d_i / \partial r_i^{(k)} < 0$ and $\partial^2 d_i / \partial (r_i^{(k)})^2 > 0$ for $k$ = 1, 2. Let us assume that a certain optimal operational point is obtained by the sets of rates $\{r_i^{(1)}\}$ and $\{r_i^{(2)}\}$. Moreover, let us suppose that a CB $x_n$ is such that $r_n^{(1)} = r_n^{(2)}$, as shown in Fig. 9. If we increase by $\delta$ the rate $r_m^{(1)}$ devoted to a finely coded CB $x_m$, $m \in S_1$, and correspondingly decrease $r_n^{(1)}$ by the same amount, the total rate is kept unchanged. According to Fig. 9, we assume that the RD curve of $x_m$ is locally steeper, i.e., $|\partial d_m / \partial r_m^{(1)}| \ge |\partial d_n / \partial r_n^{(1)}|$. As a consequence, the cost function changes by

$$\Delta J \approx \left[(1 + \alpha_1)\frac{\partial d_m}{\partial r_m^{(1)}} - \alpha_1 \frac{\partial d_n}{\partial r_n^{(1)}}\right]\delta < 0.$$
Fig. 9. Graphical demonstration of Corollary 1.

Having $\Delta J < 0$ means that the second rate allocation improves the performance. The same conclusion can also be drawn graphically from Fig. 9. In fact, we can notice that the distortion contribution of the $n$th CB to the central distortion remains unchanged ($r_n^{(2)}$ is kept constant), whereas the fine version of the $m$th CB improves ($r_m^{(1)}$ increases by $\delta$). As a consequence, the central distortion decreases, while keeping the overall rate and the second description distortion unchanged. Moreover, given that we assumed $|\partial d_m / \partial r_m^{(1)}| \ge |\partial d_n / \partial r_n^{(1)}|$, the distortion of the first description also decreases.

Using the same approach, one can enumerate all the possible combinations of rates and descriptions, perturbing either $r_n^{(1)}$ or $r_n^{(2)}$ together with the rate of a finely or coarsely coded partner CB. If no CB satisfies the slope condition with respect to $x_n$ in the first description, then there is a CB for which the symmetric perturbation in the second description applies; in every case, the cost function changes by a strictly negative amount.
The other parameters in these cases are not altered. At this point, we can conclude that, in an optimal quintuple which minimizes the cost function $J$, it is impossible to have a CB encoded at the same rate in the two descriptions; in other words,

$$S_0 = \emptyset. \quad (13)$$
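As a sanity check of Corollary 1 (a worked example of ours, assuming the exponential model $d(r) = \sigma^2 2^{-2r}$ and balanced multipliers $\alpha_1 = \alpha_2 = \alpha$), duplicate a single CB at rate $r$ in both descriptions and perturb the two copies in opposite directions by $\delta > 0$, keeping the total rate fixed:

$$J(\delta) = \underbrace{d(r+\delta)}_{D_0} + \alpha\,\underbrace{d(r+\delta)}_{D_1} + \alpha\,\underbrace{d(r-\delta)}_{D_2} + \lambda (R_1 + R_2), \qquad \frac{dJ}{d\delta}\bigg|_{\delta = 0^+} = (1+\alpha)\,d'(r) - \alpha\,d'(r) = d'(r) < 0.$$

The side terms cancel to first order while the central term strictly decreases, so the duplicated allocation cannot minimize $J$, in agreement with (13).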
REFERENCES

[1] V. K. Goyal, "Multiple description coding: Compression meets the network," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 74–93, Sep. 2001.
[2] M. Alasti, K. Sayrafian-Pour, A. Ephremides, and N. Farvardin, "Multiple description coding in networks with congestion problem," IEEE Trans. Inf. Theory, vol. 47, no. 3, pp. 891–902, Mar. 2001.
[3] C. Lee, J. Kim, Y. Altunbasak, and R. M. Mersereau, "Layered coded vs. multiple description coded video over error-prone networks," Signal Process.: Image Commun., May 2003.
[4] J. K. Wolf, A. D. Wyner, and J. Ziv, "Source coding for multiple description: A binary source," Bell Syst. Tech. J., vol. 59, pp. 1417–1426, Oct. 1980.
[5] A. El Gamal and T. Cover, "Achievable rates for multiple description," IEEE Trans. Inf. Theory, vol. 28, no. 6, pp. 851–857, Nov. 1982.
[6] R. Ahlswede, "The rate-distortion region for multiple descriptions without excess rate," IEEE Trans. Inf. Theory, vol. IT-31, no. 6, pp. 721–726, Nov. 1985.
[7] Z. Zhang and T. Berger, "New results in binary multiple descriptions," IEEE Trans. Inf. Theory, vol. 33, no. 4, pp. 502–521, Jul. 1987.
[8] V. Vaishampayan, "Design of multiple description scalar quantizers," IEEE Trans. Inf. Theory, vol. 39, no. 3, pp. 821–834, May 1993.
[9] S. D. Servetto, K. Ramchandran, V. A. Vaishampayan, and K. Nahrstedt, "Multiple description wavelet based image coding," IEEE Trans. Image Process., vol. 9, no. 5, pp. 813–826, May 2000.
[10] J. Hafarkhani and V. Tarokh, "Multiple description trellis-coded quantization," IEEE Trans. Commun., vol. 47, no. 6, pp. 799–803, Jun. 1999.
[11] V. Vaishampayan, N. J. A. Sloane, and S. D. Servetto, "Multiple-description vector quantization with lattice codebooks: Design and analysis," IEEE Trans. Inf. Theory, vol. 47, no. 5, pp. 1718–1734, Jul. 2001.
[12] J. C. Batllo and V. Vaishampayan, "Asymptotic performance of multiple description transform codes," IEEE Trans. Inf. Theory, vol. 43, no. 2, pp. 703–707, Mar. 1997.
[13] V. K. Goyal and J. Kovačević, "Generalized multiple description coding with correlating transforms," IEEE Trans. Inf. Theory, vol. 47, no. 6, pp. 2199–2224, Sep. 2001.
[14] Y. Wang, M. T. Orchard, V. Vaishampayan, and A. R. Reibman, "Multiple description coding using pairwise correlating transforms," IEEE Trans. Image Process., vol. 10, no. 3, pp. 351–367, Mar. 2001.
[15] D.-M. Chung and Y. Wang, "Multiple description image coding using signal decomposition and reconstruction based on lapped orthogonal transforms," IEEE Trans. Circuits Syst. Video Technol., vol. 9, no. 6, pp. 895–908, Sep. 1999.
[16] X. Yan and K. Ramchandran, "Optimal subband filter banks for multiple description coding," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2477–2490, Nov. 2000.
[17] R. Balan, I. Daubechies, and V. Vaishampayan, "The analysis and design of windowed Fourier frame based multiple description source coding schemes," IEEE Trans. Inf. Theory, vol. 46, no. 7, pp. 2491–2536, Nov. 2000.
[18] A. Skodras, C. Christopoulos, and T. Ebrahimi, "The JPEG2000 still image compression standard," IEEE Signal Process. Mag., vol. 18, no. 5, pp. 36–58, Sep. 2001.
[19] R. Bernardini and R. Rinaldo, "Efficient reconstruction from frame-based multiple descriptions," IEEE Trans. Signal Process., vol. 53, no. 8, pp. 3282–3296, Aug. 2005.
[20] K. P. Subbalakshmi and S. Somasundaram, "Multiple description image coding framework for EBCOT," in Proc. IEEE Int. Conf. Image Processing, Jun. 2002, vol. 3, pp. 541–544.
[21] D. Taubman, "High performance scalable image compression with EBCOT," IEEE Trans. Image Process., vol. 9, no. 7, pp. 1158–1170, Jul. 2000.
[22] W. Jiang and A. Ortega, "Multiple description coding via polyphase transform and selective quantization," in Proc. SPIE Int. Conf. Visual Communication Image Processing, Jan. 1999, pp. 998–1008.
[23] P. Sagetong and A. Ortega, "Optimal bit allocation for channel-adaptive multiple description coding," in Proc. Image Video Communication Processing, San Jose, CA, Jan. 2000, pp. 53–63.
[24] T. Tillo and G. Olmo, "A novel multiple description coding scheme compatible with the JPEG 2000 decoder," IEEE Signal Process. Lett., vol. 11, no. 11, pp. 908–911, Nov. 2004.
[25] E. Akyol, A. M. Tekalp, and M. R. Civanlar, "Scalable multiple description video coding with flexible number of descriptions," presented at the IEEE Int. Conf. Image Processing, 2005.
[26] B. D. Sivazlian and L. E. Stanfel, Optimization Techniques in Operations Research. Englewood Cliffs, NJ: Prentice-Hall, 1975.
[27] K. Sayood, Introduction to Data Compression. San Francisco, CA: Morgan Kaufmann, 1995.
[28] JPEG 2000 Part 1, ISO/IEC 15444-1:2004.
[29] M. Rabbani and D. Santa-Cruz, "JPEG2000 still-image compression standard," presented at the Int. Conf. Image Processing, Thessaloniki, Greece, Oct. 2001. [Online]. Available: jj2000.epfl.ch/jj_publications/
[30] T. Guionnet, C. Guillemot, and S. Pateux, "Embedded multiple description coding for progressive image transmission over unreliable channels," in Proc. Int. Conf. Image Processing, Oct. 2001, vol. 1, pp. 94–97.
[31] Multiple Description JPEG 2000 Codec. [Online]. Available: http://www.telematica.polito.it/sas-ipl/
[32] M. Pereira, M. Antonini, and M. Barlaud, "Channel adapted multiple description coding scheme using wavelet transform," in Proc. Int. Conf. Image Processing, Sep. 2002, vol. 2, pp. 197–200.

Tammam Tillo (S'02–M'06) was born in Damascus, Syria, in 1971. He received the degree in electrical engineering from the University of Damascus, Syria, in 1994, and the Ph.D. degree in electronics and communication engineering from the Politecnico di Torino, Torino, Italy, in 2005. From 1999 to 2002, he was with Souccar For Electronic Industries, Damascus. In 2004, he was a Visiting Researcher at the EPFL, Lausanne, Switzerland. He is currently a Postdoctoral Researcher at the Dipartimento di Elettronica, Politecnico di Torino. His research interests are in the areas of robust transmission, image and video compression, and hyperspectral image compression.

Marco Grangetto (S'99–M'03) received the summa cum laude degree in electrical engineering in 1999 and the Ph.D. degree in 2003 from the Politecnico di Torino, Torino, Italy. He is currently a Postdoctoral Researcher at the Image Processing Lab, Politecnico di Torino. His research interests are in the fields of multimedia signal processing and communications. In particular, his expertise includes wavelets, image and video coding, data compression, video error concealment, error-resilient video coding, unequal error protection, and joint source-channel coding. He has participated in the ISO standardization activities on Part 11 of the JPEG 2000 standard. Dr. Grangetto was awarded the Premio Optime by the Unione Industriale di Torino in September 2000 and a Fulbright Grant in 2001 for a research period with the Department of Electrical and Computer Engineering, University of California, San Diego. He has been a member of the Technical Program Committee for several international conferences, including IEEE ICME, ICIP, and ISCAS.

Gabriella Olmo (S'89–M'91–SM'06) received the Laurea degree (summa cum laude) and the Ph.D. degree in electronic engineering from the Politecnico di Torino, Torino, Italy. She is currently an Associate Professor at the Politecnico di Torino. She has coordinated several national and international research programs in the fields of wireless multimedia communications, under contracts with the European Community and the Italian Ministry of Education. She has coauthored more than 140 papers in international technical journals and conference proceedings and is part of the Editorial Board of the Springer journal Signal, Image, and Video Processing. Her main recent interests are in the fields of wavelets, remote sensing, image and video coding, resilient multimedia transmission, joint source-channel coding, and distributed source coding. Dr. Olmo is a member of the IEEE Communications Society and the IEEE Signal Processing Society. She has been a member of the technical program committees and a session chair for several international conferences.