Permuted Smoothed Descriptions and Refinement Coding for Images
Justin Ridge, Fred W. Ware, and Jerry D. Gibson
Abstract: We consider the problem of transmitting compressed still images over lossy channels. In particular, we examine the situation where the data stream is partitioned into two independent channels, as is often considered in the multiple descriptions approach to image compression. In the present work, we introduce a coder design called smoothed descriptions that matches a data partitioning method utilized at the encoder to the error concealment technique employed at the decoder. This approach has the advantage of inserting minimal overhead into the transmitted data streams, so that system performance is undiminished when there are no packet losses over the channel. We show that, by using a combination of DC averaging and maximal smoothing to conceal errors, performance comparable to or better than multiple descriptions can be achieved for packet loss rates up to 5%. By adding a feedback loop that requests retransmission of important data, we also demonstrate the need to exploit latency whenever possible.

Keywords: image compression, image partitioning, multiple description coding, robustness, error resiliency
I. Introduction
Justin Ridge and Jerry D. Gibson are with the Department of Electrical Engineering, Southern Methodist University, Dallas, TX. Fred W. Ware is with the Department of Electrical Engineering, Texas A&M University, College Station, TX. Please address correspondence to Justin Ridge, e-mail: [email protected].

While error resiliency is important for image and video coders irrespective of the channel over which they operate, particularly close attention must be paid to this facet of the coder when relatively lossy channels, such as in wireless networks, are involved. Methods for countering the effects of errors can be broadly categorized as (a) forward error concealment, (b) decoder postprocessing, and (c) interactive concealment involving both the encoder and decoder. The approaches are similar for both video and image coding, and have been thoroughly reviewed in [1]. Of particular interest are schemes involving encoder-end partitioning of the data stream into two or more `descriptions', to be transmitted independently over the network. Partitioning strategies are a type of forward error correction, placing the burden of error concealment on the encoder and leaving the decoder to perform a simple piecing together of the received information. They are well suited to applications where there exist a number of distinct paths between encoder and decoder; whether or not this independence is physical (such as wireless diversity reception) is addressed by the specific implementation. Multiple description coding (MDC) [2] is one partitioning approach which, unlike layered coding [3], does not require any particular description (or `stream') to act as a base layer; the coder is designed such that an acceptable
result is achieved irrespective of which description(s) is received. In this paper, MDC refers to coders which adhere to the information theory meaning of multiple descriptions [4], including those presented in [5], [6] and [7], rather than those which, although appearing under the MDC banner, employ statistical channel models [8]. The ability of multiple description coding to guarantee an absolute lower bound on quality, even under very lossy conditions, has led to its increasing popularity. However, in imposing such a lower bound on quality, redundancy is inevitably introduced into the bitstream [4], implying conservatism on the part of MDC under less stringent operating conditions. Furthermore, in order to achieve acceptable two-description quality for an arbitrary rate, many multiple description coders limit the amount of redundancy added to the bitstream, which in turn limits the single-description quality. As a result, not only might MDC be conservative in a rate-distortion sense for channels with a particular statistical profile, but it might also yield practically unacceptable results. MDC inherits its conservative nature from the fact that it is a model-based information theory approach to coder design. In this paper, we develop an approach called smoothed descriptions, which is a coder design that makes use of both forward error concealment principles and decoder postprocessing, but which does not require any interaction between encoder and decoder to facilitate the concealment. According to this scheme, the discrete cosine transform (DCT) coded data is partitioned prior to transmission in such a manner that no redundancy is introduced, and then processing is done at the decoder to conceal the error in the event that a block from one description is not received.
The decoder concealment is a two-stage process, where missing DC components are initially estimated, and then the optimum values for all missing coefficients are determined using a maximal smoothing approach. In channels with highly variable characteristics, it is shown that the new concealment approach can realize additional benefits if the encoder utilizes feedback from the decoder to perform a selective retransmission of missing information. Section II of this paper is devoted to examining recent work in multiple description coding, and its pros and cons in a practical environment. The new `smoothed descriptions' concealment algorithm and refinement strategy are discussed in depth in Section III, with comparative results provided in Section IV.
II. Multiple Description Coding
Multiple description coding (MDC) is a method of partitioning a coded bitstream into differing `descriptions' of the one source, such that receiving one or more of the source descriptions allows the source image to be reconstructed to some prescribed level of quality. The problem is one of weighted joint optimization, where the reconstructed image quality at a given data rate is optimized simultaneously with regard to the case where all descriptions are received, and also with regard to the case where a subset of descriptions is received. In order for optimum results to be achieved when only one description is received correctly, each description must carry with it sufficient information about the original source (in the case of DCT-coded images, this could be taken to mean low frequency information such as the DC component). However, having this correlation between descriptions means that some information will have been transmitted needlessly when all descriptions are received; that is to say, an overhead or redundancy has been introduced. Due to this dichotomy, a compromise between redundant code and single-description quality must be realized. The degree to which the importance of one of these is elevated over the other becomes the weighting in the joint optimization problem [6].

A. Correlation between MDC descriptions

Current multiple description coders [9], [10] approach the optimum by performing non-orthogonal transforms on the source data, so that the partitioned streams contain the required correlation. One of the better multiple description coders is the `paired transform coefficient MDC' introduced in [5] and enhanced in [6]. Instead of using a transform which decorrelates all coefficients, transform basis vectors are rotated by some angle θ so that correlation between pairs of coefficients remains. Practically, this can be accomplished by rotating the DCT coefficient pair (c_1, c_2) to give
    [ c̃_1 ]       [ c_1 ]
    [      ]  = R [      ]                                        (1)
    [ c̃_2 ]       [ c_2 ]

where

    R^{-1} = (1/√2) [ √(tan θ)   -√(tan θ) ]
                    [ √(cot θ)    √(cot θ) ]                      (2)

then allowing the pair (c̃_1, c̃_2) to be split, with each rotated coefficient being allocated to a different description. The parameter θ becomes the optimization weighting, controlling how much redundancy is incorporated into each description.

B. Synchronization issues

Channels carrying packetized data are often characterized by the probability of a packet error, p. However, this is really an expectation, better written as E(P) = p, and does not reveal the channel's stability, or the `burstiness' of the channel noise. The value of θ controls the amount of overhead in the MDC coder of [6], but there are significant questions concerning its selection. For instance, θ could be a fixed value based upon E(P), or could be adjusted dynamically in response to varying channel conditions. In channels with varying p, maintaining θ_opt is clearly desirable, dictating that dynamic adjustment should yield superior performance in bursty channels, but this raises the issue of sensing channel conditions, and synchronizing the values of θ at the encoding and decoding ends.

C. Conservatism of MDC

Because the MDC optimization does not take into account channel statistics (indeed, this is at the core of the MDC approach posed in [4]), the actual performance of the coder will vary according to the statistical probability of block error in the channel used; that is, the probability of losing data will not be zero (all descriptions received) or one (single description received), but rather somewhere in between. In the case of a two-description MDC and for a given image, it would be unlikely for errors to entirely corrupt one channel for the duration of transmission. Thus the actual MDC coder performance is a statistical combination of the decoder's performance when (a) both descriptions are received, and (b) one description is received. Additionally, since many MDC coders (including [5]) add redundancy on an intra-block basis, each block can be decoded independently of each other. The coder performance¹ at a given rate can thus be stated as the average performance in decoding an individual block, which for two descriptions and assuming independent, identical channels, is

    E(Q) = (1 - p)² E(Q_2) + 2p(1 - p) E(Q_1)                     (3)

where Q_i is the coder performance when i descriptions are received. Note that both descriptions are lost with probability p², rendering reconstruction impossible unless concealment algorithms are applied.

¹ Making the allowance that loss of a data packet will only affect the blocks within that packet, and will not propagate forward.

Since Q_2 will be below the corresponding quality for a single-description coder at the same rate by virtue of the MDC-induced overhead, it is clear that for some range of error probability p, it would be preferable not to use the multiple description approach. It is shown in Section IV that this range of p can in fact be significant, making MDC-based coders unattractive in some circumstances. Multiple descriptions' trade-off between single-description quality and rate overhead can yield a practically unacceptable result when the designed amount of redundancy does not correlate with actual channel characteristics. For the purposes of illustration, a channel may have an expected p of 0.1, with the MDC-based coder's θ selected accordingly. Now assume that one description is subjected to a severe error burst; indeed, assume the `worst case', that one description is entirely eliminated. Since the coder has been designed with a lower p in mind, the decoder output is
likely to be of poor quality, and may even be unacceptable without some form of enhancement. To counter this worst-case scenario, a modification called `refined descriptions', which would allow refinement of any description, is introduced in Section III.
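The effect of the correlating rotation in Section II-A can be illustrated numerically. The sketch below is hedged: it uses a plain 2×2 rotation as a stand-in for the exact transform R of the paired-coefficient MDC, and illustrative variances of our own choosing. Rotating a high-variance/low-variance coefficient pair leaves the two transmitted streams statistically correlated (the deliberate redundancy), while reception of both still permits exact inversion.

```python
import numpy as np

rng = np.random.default_rng(0)
theta = np.pi / 4                       # redundancy-controlling angle (illustrative)
R = np.array([[np.cos(theta),  np.sin(theta)],
              [-np.sin(theta), np.cos(theta)]])

# coefficient pairs: c1 high-variance (low frequency), c2 low-variance
c = np.stack([rng.normal(0, 10, 10000), rng.normal(0, 1, 10000)])
ct = R @ c                              # each row goes to one description

# both descriptions received: exact inversion (rotation => R^{-1} = R^T)
assert np.allclose(R.T @ ct, c)

# the rotation leaves the two streams correlated (added redundancy),
# so either one alone carries information about both original coefficients
corr = np.corrcoef(ct)[0, 1]
print(f"inter-description correlation: {corr:.2f}")
```

For this variance ratio the empirical correlation is close to -1; varying the angle trades this redundancy against single-description distortion, which is exactly the weighting discussed above.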
III. Smoothed Descriptions
The `smoothed descriptions' method is a hybrid approach borrowing concepts from multiple descriptions and decoder error concealment, and therefore does not fit neatly into any of the three concealment categories set out in [1]. For reasons of network overhead and delay, it may be undesirable to perform concealment by having the decoder feed back information to the encoder. Even considering only those coders which do not employ feedback, only a minority make use of techniques at both encoding and decoding ends to facilitate concealment, as is done in smoothed descriptions.

A. DCT coder

The smoothed descriptions image coder is based on a block transform such as the discrete cosine transform (DCT), with this paper specifically using an altered JPEG². A modification is introduced disabling DC prediction, removing any inter-block correlation and therefore helping to eliminate error propagation.

² Based on the PVRG JPEG code v.1.2, available from ftp://havefun.stanford.edu/pub/jpeg

B. Bitstream partitioning

Following the DCT and quantization operations, coefficients are partitioned into n descriptions; this paper describes the case where n = 2. The precise method of partitioning emerges as an important factor in the performance of smoothed description coding, and hence will receive detailed attention here. To establish notation, the DCT coder is taken to operate on N×N blocks of the image, with N = 8 most commonly used. Following the DCT, the coefficients from each block are also conventionally ordered in N×N blocks, the DC term located in the upper-left at location (1, 1) and the highest-frequency coefficient in the lower-right at (N, N). From the first coded block, every odd-indexed coefficient (i.e. the AC_1, AC_3, AC_5, ...) is allocated to the first description and every even-indexed coefficient is allocated to the second description. For every second coded block in the row, the strategy is reversed as shown in Figure 2(a). The algorithm is similarly applied to other rows of blocks, with every second row having the opposite allocation as shown in Figure 2(b). Expressed mathematically, the jth coefficient from block b of description d will be set to

    c̃^b_{d,j} = c^b_{2j+[(d+b+r) mod 2]},   r = ⌊b/M_x⌋           (4)

where c^b_j is the jth DCT-coded coefficient of block b, and the image measures M_x × M_y blocks.

Fig. 2. (a) Coefficient partitioning scheme for one row of 8×8 blocks. Each square represents a coefficient; shaded coefficients are allocated to description one, and unshaded coefficients to description two. Block boundaries are distinguished by thick lines, and DC coefficients are marked. (b) Coefficient selection for four rows of blocks in description one, indicating reversal of allocation for every row. Each square represents a block of coefficients; description one receives the odd-indexed coefficients from shaded blocks, and even-indexed coefficients from unshaded blocks.

Fig. 3. Resulting image using the smoothed descriptions partitioning approach, when one stream is received correctly and the other is lost entirely.

This partitioning scheme is symmetric, in that each description contains the same amount of information about the source; half of every coefficient subband is allocated to each description. Importantly, it is seen that for one description of a particular block, all four neighboring blocks on that same description have the opposite coefficient allocation, assisting in concealment at the receiver. In a multiple description sense, this is a bad partitioning method, as receipt of only one stream would result in half the DC coefficients being lost, yielding the checkerboard effect (Figure 3). But because it adds almost no overhead, dividing the bitstream into two equally important streams is helpful since, in the event of a block error, the pattern of missing coefficients is regular and easy to determine [11]. In a bursty packet loss environment, the above method may result in whole rows of blocks being lost at a time.
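As we read the partition rule of (4), it can be sketched in a few lines; the function name and the example image width are ours, not from the original implementation:

```python
N = 8                    # block size, so N*N = 64 coefficients per block
Mx = 4                   # image width in blocks (illustrative value)

def description_indices(d, b):
    """Coefficient indices sent to description d for block b, per the
    alternating rule: the odd/even offset flips with block column and row."""
    r = b // Mx                        # block row
    offset = (d + b + r) % 2           # 0 -> even-indexed, 1 -> odd-indexed
    return [2 * j + offset for j in range(N * N // 2)]

# the two descriptions of any block are disjoint and cover all coefficients
for b in range(Mx * 3):
    both = sorted(description_indices(0, b) + description_indices(1, b))
    assert both == list(range(N * N))

# horizontally and vertically adjacent blocks swap their allocation
assert description_indices(0, 0) != description_indices(0, 1)    # same row
assert description_indices(0, 0) != description_indices(0, Mx)   # same column
```

The last two assertions reproduce the checkerboard property claimed above: every neighbor of a block, on the same description, holds the opposite half of the coefficients (the vertical flip relies on an even M_x in this sketch).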
Fig. 1. Block diagram of smoothed description coder, with refinement feedback path shown as a dashed line.
This outcome is undesirable, since it is shown in Section III.C that the ability to reconstruct one errored block will depend upon proper receipt of neighboring blocks. A permutation, or re-ordering of blocks, provides one solution, and has been used previously in other image coders such as [12], [13]. A simple permutation code is the linear congruential generator (LCG), first proposed in [14] and also used in random number generation. The permutation indices are specified in a recursive fashion as

    I_{k+1} = (αI_k + β) mod M                                    (5)

with I_0 arbitrarily set to zero. M is the total number of blocks in the image, and therefore is also the maximum period of the LCG. Parameters α and β are open for user specification, but in order for the LCG period to achieve its maximum, prime numbers often prove to be good choices for α, and setting β = 1 is satisfactory. For a more thorough treatment of random sequence generation, see [15]. Other permutation mechanisms, such as spatial interleaving, could be substituted for the LCG, but such regular schemes may encounter difficulties with periodic errors. The process of permuting block order can be expressed as

    ĉ^b_j = c^{I_b}_j.                                            (6)

and in fact, it is observed that both (4) and (6) are one-dimensional permutations which can be combined to give

    c̃^b_{d,j} = c^{I_b}_{2j+[(d+b+r) mod 2]},   r = ⌊b/M_x⌋       (7)

Due to the permutation in (6), the need for alternating the partitioning scheme for every second row is obviated, allowing (7) to be simplified to

    c̃^b_{d,j} = c^{I_b}_{2j+[(d+b) mod 2]}                        (8)

Additional generalization is possible to the point where each coefficient employs a different permutation code. According to the scheme set forth in (8), at the receiver, it is possible for a given block to be received correctly on both streams; to be received only on one stream; or not to be received on either stream. Clearly in the first case, the block can be fully reconstructed and no concealment is necessary. In the other scenarios, which coefficients are missing and which are received can be readily determined from the location of that block within the image. The major task of the receiver is thus to estimate these missing coefficients; [16] and [17] develop algorithms for this based on coefficient interpolation and spatial smoothness respectively. Smoothed descriptions adopts the former technique.

C. Maximal smoothing

Maximal smoothing takes advantage of the spatial smoothness of naturally occurring images by minimizing luminosity differences between adjacent pixels both within a block and across block boundaries. Although [16] gives a complete treatment of the algorithm, the core elements are restated here. Formulating the problem, let f_{i,j} represent the sample at position (i, j), the ith row and jth column, in an N×N block. The transform and inverse transform can then be written as

    c_k = Σ_{i=0}^{N-1} Σ_{j=0}^{N-1} t_{i,j,k} f_{i,j},   k ∈ (0, N²-1)
    f_{i,j} = Σ_{k=0}^{N²-1} t_{i,j,k} c_k,   i, j ∈ (0, N-1)     (9)

where t_{i,j,k} is the appropriate basis function element³. Now define two sets, D_1 and D_2, specifying which coefficients were transmitted in each description. We could therefore have

    D_1 = (0, 2, 4, ..., N² - 2)
    D_2 = (1, 3, 5, ..., N² - 1)                                  (10)

or the inverse. Now assume that one description (D_2, without loss of generality) is not received correctly and that the other (D_1) arrives intact. Then coefficients c_k, k ∈ D_1 will be received, and the remainder will need to be estimated. Denoting estimated coefficients as ĉ_k, (9) can be rewritten

³ For a two-dimensional DCT,
    t_{i,j,k} = (1/4) T(u) T(v) cos[(2i+1)uπ/16] cos[(2j+1)vπ/16]
where T(x) = 1/√2 for x = 1, T(x) = 1 for x ≠ 1, and k = uN + v.
as

    f̂_{i,j} = Σ_{k∈D_1} t_{i,j,k} c_k + Σ_{k∈D_2} t_{i,j,k} ĉ_k,   i, j ∈ (0, N-1).   (11)

For convenience, these expressions can be rewritten in matrix form, so that (9) becomes

    c = T^T f,   f = Tc                                           (12)

where T = [t_0, t_1, ..., t_{N²-1}], t_k being formed from the kth basis vector of the transform, or t_{i,j,k}, i, j ∈ (0, N-1). Let c_r denote the subset of received coefficients, c_k, k ∈ D_1, and let T_r be formed from the corresponding basis vectors, t_k, k ∈ D_1. Conversely, let c_l be a vector containing the coefficients from the missing description and T_l hold the remaining basis vectors. Then (11) becomes

    f̂ = T_r c_r + T_l ĉ_l.                                        (13)

Having established notation, the smoothing measure is now discussed. If the sample values surrounding an improperly received block are known whilst intra-block samples must be estimated, it follows that each sample should be smoothed in the direction of the nearest block boundary (toward the sample offering greatest certainty), as depicted in Figure 4.

Fig. 4. Smoothing directions for (a) an 8×8, and (b) a 3×3 block.

Four directional smoothing matrices, S_w, S_e, S_n and S_s, each N²×N², are defined with the subscript (west, east, north, south) indicating the smoothing direction. Graphically, element (i, i) of each matrix will be one if there is an arrow pointing out of the ith sample of the block in the desired direction, or it is being smoothed `away from'. Element (i, j) of each matrix will be -1 if there is an arrow pointing from the ith element toward the jth element, or the sample is being smoothed `towards'. Remaining areas of the matrix are set to zero. By way of illustration, a 3×3 block (Figure 4(b)) would result in the west-facing smoothing matrix

          [ 1  0  0  0  0  0  0  0  0 ]
          [ 0  0  0  0  0  0  0  0  0 ]
          [ 0  0  0  0  0  0  0  0  0 ]
          [ 0  0  0  1  0  0  0  0  0 ]
    S_w = [ 0  0  0 -1  1  0  0  0  0 ] .
          [ 0  0  0  0  0  0  0  0  0 ]
          [ 0  0  0  0  0  0  1  0  0 ]
          [ 0  0  0  0  0  0  0  0  0 ]
          [ 0  0  0  0  0  0  0  0  0 ]

The `boundary conditions', or values of samples adjacent to the block edge, are also written in directional matrix form as b_w, b_e, b_n and b_s, all N²×1 in dimension. Each matrix is populated with zeros, except for those elements which form the directional boundary, in which case they are assigned the value of the adjacent sample in the neighboring block. Continuing the 3×3 example, elements k ∈ (0, 3, 6) of b_w would be non-zero, taking on the value of the sample immediately to the west of them. A weighted square difference between adjacent samples is now defined for the smoothing measure:

    Φ(ĉ_l) = (1/2) [ ‖S_w f̂ - b_w‖² + ‖S_e f̂ - b_e‖² + ‖S_n f̂ - b_n‖² + ‖S_s f̂ - b_s‖² ]   (14)
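To make the construction concrete, here is a small sketch of S_w and b_w for the 3×3 case. The particular arrow assignment (west-column samples smoothed into the boundary, the center sample smoothed toward its western neighbor) is our reading of Figure 4(b), and the function name is ours:

```python
import numpy as np

N = 3  # 3x3 block, samples indexed row-major 0..8

def west_smoothing(boundary_west):
    """Build S_w and b_w; boundary_west holds the three samples just west."""
    Sw = np.zeros((N * N, N * N))
    bw = np.zeros(N * N)
    for row in range(N):
        i = row * N                  # west-column sample
        Sw[i, i] = 1.0               # arrow out of sample i into the boundary
        bw[i] = boundary_west[row]   # ...so it is differenced against b_w
    c = N * N // 2                   # center sample, smoothed toward its west neighbor
    Sw[c, c], Sw[c, c - 1] = 1.0, -1.0
    return Sw, bw

f = np.arange(9.0)                   # a block of samples (row-major)
Sw, bw = west_smoothing(np.array([1.0, 4.0, 7.0]))

# ||S_w f - b_w||^2 is exactly the sum of west-facing squared differences
matrix_form = float(np.sum((Sw @ f - bw) ** 2))
direct = (f[0] - 1.0)**2 + (f[3] - 4.0)**2 + (f[6] - 7.0)**2 + (f[4] - f[3])**2
assert abs(matrix_form - direct) < 1e-12
```

The other three directional matrices follow the same pattern, and summing the four quadratic terms gives the measure (14).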
For maximum smoothness, the dierence between adjacent samples as indicated by (^cl ) should be minimized. Substituting (13) shows (14) to be a quadratic in ^cl , so optimization becomes a simple matter of dierentiation, @ = TT ST ^c + TT [ST c ? b] = 0 (15)
@^cl
l
l opt
l
r r
where
S b
= =
STw Sw + STe Se + STn Sn + STs Ss; STw bw + STe be + STn bn + STs bs:
Rearranging gives the nal result,
^copt = ?TTl STl?1 TTl (b ? STrcr ) = Ab + Bar (16) The matrix TTl STl is guaranteed to be nonsingular due to the formulation of S. Observe that the matrices A and B are functions of three parameters only: the smoothing matrix S and the transform matrices Tr and Tl . As the source partitioning
strategy is known by the receiver, and as the receiver can also determine which of the two descriptions has not been received, T_r and T_l can be known definitively. Furthermore, since it was decided in Section III.B to partition the data symmetrically, the basis matrices will have the same contents in all circumstances, with only the subscripts being swapped depending upon which description is lost. In the two-description case, there are only two possible values of each A and B, both of which can be computed before commencement of the smoothing operation. Computation of the optimal estimate of the missing coefficients therefore reduces to two matrix multiplications and an addition for each block.

D. DC averaging

Maximal smoothing acquits itself well when boundary conditions are known, i.e. when neighboring blocks are received correctly. At low probabilities of error, the possibility of losing a description from adjacent blocks barely impacts upon results, but in more lossy channels, or in
those subject to burst errors such as wireless channels, it becomes a real concern. In [16], the solution to this problem is to iterate the smoothing algorithm until the result is unchanged (within some threshold) from one iteration to the next. Unfortunately, this has the potential to greatly increase the amount of computation required by the algorithm, which in wireless devices could impact on areas such as cost and battery life. An alternative is to utilize the partitioning characteristics from Section III.B. First, note that the human eye has greatest sensitivity to low frequencies, and in particular the DC value [18]. As far as a block's boundary conditions are concerned, the absence of a DC component will have the most detrimental impact, whereas missing high-frequency components can often be overlooked with little negative consequence to the smoothing algorithm. Thus the problem of determining boundary conditions is approximately equivalent to determining estimates of all missing DC values. Secondly, careful selection of the permutation code will ensure that (for reasonable packet sizes) DC terms from neighboring blocks are not transmitted in the same packet. In addition, due to the assumption that each description channel is independent, the expected probability of at least one neighboring DC value being received will be (1 - p⁴). The inherent smoothness of naturally occurring images allows adjacent available DC values to be used in estimating those which are missing. Once all DC values are either known or estimated, the boundary conditions are determined with sufficient precision for the maximal smoothing algorithm to proceed as in Section III.C.

The DC estimation algorithm itself is extremely simple. Where all four neighbors are received, the pair of adjacent blocks (vertical or horizontal) with the most similar DC components are used to determine the missing DC component, which is arrived at via a simple average. Mathematically,

    dc_avg = (dc_w + dc_e)/2   if ‖dc_w - dc_e‖ ≤ ‖dc_n - dc_s‖
             (dc_n + dc_s)/2   otherwise                          (17)

where dc_w, dc_e, dc_n and dc_s are the DC component values in neighboring blocks to the west, east, north and south respectively (Figure 5(a)). At the image edges, the estimate is determined by averaging the available pair, and at the image corners, the two available DC components are used. More complex averaging algorithms have been considered which, for example, use a weighted average of available neighbors with the weightings based on that neighbor's distance from the damaged block (Figure 5(b)),

    dc_avg = (dc_w Δ_w^{-1} + dc_e Δ_e^{-1} + dc_s Δ_s^{-1} + dc_n Δ_n^{-1}) / (Δ_w^{-1} + Δ_e^{-1} + Δ_s^{-1} + Δ_n^{-1}).   (18)

These more complicated schemes can provide marginally better performance, but whether the complexity versus performance trade-off is worthwhile will generally depend on the individual problem.
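The estimator of (17), plus the distance-weighted variant (18), take only a few lines; the function names below are ours:

```python
def dc_average(dc_w, dc_e, dc_n, dc_s):
    """Equation (17): average whichever opposing pair is more similar."""
    if abs(dc_w - dc_e) <= abs(dc_n - dc_s):
        return (dc_w + dc_e) / 2.0
    return (dc_n + dc_s) / 2.0

def dc_average_weighted(dcs, dists):
    """Equation (18): inverse-distance-weighted average of the available
    neighboring DC values, dists holding each neighbor's block distance."""
    weights = [1.0 / d for d in dists]
    return sum(v * w for v, w in zip(dcs, weights)) / sum(weights)

# horizontal pair (100, 104) is more alike than the vertical pair (80, 130)
assert dc_average(100.0, 104.0, 80.0, 130.0) == 102.0
# identical neighbors give back the common value regardless of distances
assert abs(dc_average_weighted([50.0] * 4, [3.0, 2.0, 1.0, 1.0]) - 50.0) < 1e-9
```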
Fig. 5. Each square represents a block, with shaded blocks indicating absent DC terms. (a) When considering the marked block, neighboring DC components are labeled as shown. (b) If neighboring DC terms are also missing, the closest available neighbors can be used. Here, Δ_w = 3, Δ_n = 2 and Δ_e = Δ_s = 1.
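Abstracting away the block geometry, the optimization of (14)-(16) is ordinary linear least squares. The sketch below checks the shape of the closed-form solve against a generic solver; it uses a 1-D four-sample "block" and an orthonormal DCT purely to keep the algebra small, so everything except the form of (16) is a simplifying assumption of ours:

```python
import numpy as np

N = 4
n, k = np.meshgrid(np.arange(N), np.arange(N), indexing="ij")
T = np.cos(np.pi * (2 * n + 1) * k / (2 * N))    # columns t_k: f = T c
T[:, 0] *= np.sqrt(1.0 / N)                      # orthonormal DCT-II scaling
T[:, 1:] *= np.sqrt(2.0 / N)

f_true = np.array([10.0, 12.0, 13.0, 11.0])
c = T.T @ f_true
Tr, Tl = T[:, [0, 2]], T[:, [1, 3]]              # D1 = evens received, D2 lost
cr = c[[0, 2]]

# smoothing: difference each sample toward its nearest known neighbor
# (left boundary b_l, internal neighbors, right boundary b_r)
b_l, b_r = 9.0, 11.5
A = np.array([[ 1., 0., 0.,  0.],                # f0 - b_l
              [-1., 1., 0.,  0.],                # f1 - f0
              [ 0., 0., 1., -1.],                # f2 - f3
              [ 0., 0., 0.,  1.]])               # f3 - b_r
v = np.array([b_l, 0.0, 0.0, b_r])

# minimize ||A(Tr cr + Tl cl) - v||^2 over the lost coefficients cl
M = A @ Tl
rhs = v - A @ (Tr @ cr)
cl_normal = np.linalg.solve(M.T @ M, M.T @ rhs)  # closed form, cf. (16)
cl_lstsq, *_ = np.linalg.lstsq(M, rhs, rcond=None)
assert np.allclose(cl_normal, cl_lstsq)

f_hat = Tr @ cr + Tl @ cl_normal                 # concealed block
```

As in the two-description coder, M and the normal-equation inverse depend only on which half was lost, so they can be precomputed; per block, only the products involving c_r and the boundary values remain.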
E. Refinement

Section II.B highlighted how, due to the trade-off between single-description quality and rate overhead, multiple description coding may produce visually unacceptable results at high error rates. In select circumstances where low latency is not stressed, introducing a feedback loop from decoder to encoder may be of benefit. In packet-switched networks, numerous blocks will be transmitted together in a packet with some maximum size, S_max. When a packet is lost, all blocks within that packet are lost, so that for a given packet error rate p, the losses are no longer independent. We proceed under this realistic constraint. Although the introduction of feedback does raise many new issues in addition to latency, the basic principle is simple. For each description, the expected number of packets required to transmit an image of M blocks at rate R bits/pixel (bpp) can be represented as
    E(B) ≈ ⌈ RMN² / (2 S_max) ⌉.                                  (19)
Analysis now starts at the decoder. Packets from a given description will be lost with probability p, so that for the decoder to communicate this loss information back to the coder, a minimum of H(p) bits per transmitted packet per description will be required, where we know H(p) → 1 as p → 0.5. Assuming H(p) = 1, a worst-case scenario where either the probability of error is extremely high, or the channel characteristics are highly variable so as to prevent coder adaptation, then the number of feedback packets B_fb is upper-bounded by

    B_fb = ⌈ 2H(p) E(B) / S_max ⌉ ≤ ⌈ 2E(B) / S_max ⌉             (20)
If one of the two `downstream' channels is used to return the feedback to the transmitter, there is also a probability E(P_fb) = p of feedback packet loss. Practically, the decoder should be capable of analyzing both channels and sending the feedback data on the channel deemed most reliable at that instant. Alternatively, the feedback data is
likely to be sufficiently small that it could be transmitted on both channels without significant rate penalty, so that the probability of the encoder not receiving a feedback packet reduces to E(P_fb) = p². This smoothed descriptions coder takes the latter approach, with an uncoded one-bit flag (indicating successful reception or failure of each packet of each description) forming the feedback data. Should the encoder receive this feedback, its simplest option would be to retransmit those packets which were received incorrectly. Or, since the encoder can replicate precisely the error concealment operation, a refinement layer could be sent allowing the decoder's smoothed description representation to be improved. Both of these operations would again be subject to losses of E(P_ref | P_fb = 0) = p. Using a simple retransmission scheme, the probability of not receiving refinement data is
    E(P_ref) = Σ_{i=0}^{1} E(P_ref | P_fb = i) Pr(P_fb = i)       (21)
             = E(P_ref | P_fb = 1) Pr(P_fb = 1)
               + E(P_ref | P_fb = 0) Pr(P_fb = 0)                 (22)
             = p² + p(1 - p²)                                     (23)

and ultimately yielding

    E(P) = p²(1 + p - p²)                                         (24)

An estimate of quality for this feedback system can therefore be achieved by altering the rate-R non-feedback error expectation E(P) to the value specified by (24). Naturally, the effective bitrate must be adjusted to account for the extra transmissions incurred during the feedback and refinement operations. If the smoothed description data was originally transmitted at rate R, the expected `total rate'

    R_total ≈ R[1 + p - p³] + 2 B_fb S_max / (MN²)                (25)

should be used in all comparisons. Rather than use a verbatim retransmission strategy, the smoothed descriptions coder presented here takes a slightly different tack. Previous discussion has already determined that performance is degraded when neighboring blocks are both in error, and this can be extrapolated to say that the `importance' of a block is directly related to the number of errored blocks surrounding it, so that retransmission of certain blocks will have a greater positive impact on performance than would others. Several metrics can be used to measure the `importance' of a block, including the Hamming distance of neighbors in the feedback block, or the energy of missing neighbor coefficients. To implement the former approach, the number of missing neighbors of a damaged block are summed over all descriptions, yielding a priority value v, and then those blocks with the highest v values form the retransmission stream. The fraction of errored blocks which are retransmitted is controlled by a parameter γ, 0 ≤ γ ≤ 1, so that the retransmission scheme heretofore described has the property γ = 1.
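The loss probabilities of (21)-(24) compose as simple products, and are quick to sanity-check numerically (the example p is ours):

```python
p = 0.1                                # packet loss probability (example)

pr_fb_lost = p ** 2                    # feedback duplicated on both channels
# (21)-(23): refinement is missed if the feedback was lost, or if feedback
# arrived (prob 1 - p^2) but the retransmission itself was then lost
e_p_ref = pr_fb_lost * 1.0 + (1.0 - pr_fb_lost) * p
assert abs(e_p_ref - (p**2 + p * (1 - p**2))) < 1e-15

# (24): a block stays lost only if the original packet AND refinement fail
e_p = p * e_p_ref
assert abs(e_p - p**2 * (1 + p - p**2)) < 1e-15
```

At p = 0.1 the residual loss probability drops from 0.1 to about 0.011, which is the gain the feedback loop buys at the cost of latency and the extra rate in (25).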
Depending upon the channel characteristics, it may be desirable to provide extra error protection to these `important' retransmitted blocks. We achieve this by sending all retransmitted blocks over both channels, so that the total rate of (25) becomes

    R_total ≈ R[1 + 2p - 2p³] + 2 B_fb S_max / (MN²)              (26)

The ideal system will vary this error protection as well as γ in response to channel conditions. Once again, since this refinement strategy does introduce latency, it will only be suitable for a subset of problems.

IV. Simulation Results
The standard JPEG coder [19] is used as the basis of the simulations, with the necessary modifications made to generate multiple descriptions and perform any required error concealment and refinement as previously detailed. We select a packet size of S_max = 384 bits, corresponding to the data portion of an ATM cell. Additional error detection capabilities could be afforded by making use of header information within ATM cells, were that to be the network protocol of choice. In permuting the block order, parameters α = 17 and β = 1 were chosen for the LCG. A number of 512×512 test images were used, including lena, zelda and goldhill, but because [5] only provides the results for lena, this particular image received the majority of attention in the simulation. The coder was tested at rates between 0.4 and 0.9 bpp; a rather large range considering most wireless applications would reside in the lower end of this spectrum. The multiple description coder against which smoothed descriptions is compared has θ = π/4. Although θ can be varied according to channel conditions if feedback is allowed, since redundancy is added based upon previous transmissions, there will still be occasions when channel characteristics deviate from those expectations, and this simulation is designed to model those instances. Thus, the value of θ used is not of great relevance.

A. Error concealment

Raw results without refinement for one pattern of block losses at p = 0.25, R = 0.47 bpp are shown in Figure 6. This high error rate virtually destroys the image, with the SNR falling to 20.50 dB. By partitioning the image and applying one of the DC averaging or maximal smoothing procedures, the SNR can be improved by approximately 6 dB in each case. The DC smoothing algorithm leaves a `blotchy' image, since it only adjusts the DC component of damaged blocks and therefore luminosity gradients at block boundaries can be steep.
The maximal smoothing algorithm on its own improves the image to a `spotty' version of the original, since the edges of damaged blocks are effectively smoothed. Clearly the best results are obtained when DC averaging and then maximal smoothing are applied in sequence. In this case, the SNR is raised to 28.58 dB, giving quite reasonable perceptual quality. At such high error rates, distortions in the image can be easily noticed, but the multiple descriptions coder of [5] would be expected to perform better; in fact, according to equation (3), approximately 29.82 dB would be anticipated. If we now set p = 0.01 (still a high loss rate in most applications) the tables are turned, with the smoothed descriptions coder returning a better SNR (34.12 dB) compared to MDC (33.47 dB). Clearly the location of the errored blocks will affect the end result. To remedy this, the simulation was repeated 500 times using different random seeds for each of a range of bitrates and error probabilities. The outcomes are averaged, and presented both in Table I and in Figure 7. Alternatively, at a fixed bitrate, the performance can be plotted as a function of p and compared to that of MDC. From Figure 8, it is observed that the smoothed descriptions coder gains an advantage over MDC at lower error rates, on the order of 5% of blocks damaged at a bitrate of 0.47 bpp. Similar plots can be generated for other bitrates, which in general show that smoothed descriptions becomes a more favorable option as rate decreases. Results for the zelda and goldhill test images were also generated. Comparing the two, smoothed descriptions performs best on zelda, giving a 10 dB gain when p = 0.25 and a 7.5 dB gain for goldhill. The concealed SNR also approached that of the un-errored stream more quickly (as a function of p) for zelda than for goldhill. This is due almost entirely to the higher entropy of the goldhill image itself; losing a block of goldhill results in a greater loss of information than losing a block of zelda. The key observation is that, for both images, smoothed descriptions proves to be of benefit in concealing errors, and would be expected to perform better than MDC at sufficiently low error probabilities.

Fig. 6. Smoothed description results for lena image at 0.47 bpp with p = 0.25: (a) no error concealment, (b) DC averaging only, (c) maximal smoothing only, (d) both DC averaging and maximal smoothing.

Fig. 8. Comparison of smoothed descriptions and multiple description coders at rate 0.47 bpp. MDC begins to outperform smoothed descriptions at a block error probability of approximately p = 0.05.
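The `DC averaging' concealment step evaluated in this section can be sketched as follows. This is a minimal illustration under assumptions: a 4-neighbour average over a grid of per-block DC values, with the function name and data layout chosen here; the paper's exact pre-averaging rule is not reproduced.

```python
def conceal_dc(dc, lost):
    """Replace the DC value of each lost block with the mean DC of its
    received 4-neighbours; blocks with no received neighbour are left as-is.

    dc   : 2-D list of per-block DC values
    lost : set of (row, col) positions of blocks that were not received
    """
    rows, cols = len(dc), len(dc[0])
    out = [row[:] for row in dc]
    for r, c in lost:
        vals = [dc[rr][cc]
                for rr, cc in ((r - 1, c), (r + 1, c), (r, c - 1), (r, c + 1))
                if 0 <= rr < rows and 0 <= cc < cols and (rr, cc) not in lost]
        if vals:
            out[r][c] = sum(vals) / len(vals)
    return out

# Centre block lost in a 3x3 grid of block DC values:
grid = [[10, 20, 30], [40, 0, 60], [70, 80, 90]]
assert conceal_dc(grid, {(1, 1)})[1][1] == 50.0  # mean of 20, 40, 60, 80
```

Maximal smoothing [16] would then refine the full block contents; only the DC pre-estimate is shown, which is why on its own it leaves the `blotchy' result described above.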
B. Refinement
Importantly, the refinement described in Section III.E is driven by the receiver; the encoder only transmits overhead when requested by the decoder a posteriori, rather than incorporating that redundancy in advance and discarding it should it prove unnecessary. Therefore, in well-behaved channels, the decoder may judge the reconstruction quality to be acceptable and simply not emit a feedback packet, keeping the overhead compared to single description coding at precisely zero. Using the selective retransmission strategy previously discussed, a new set of rate-SNR pairs were generated and plotted in Figure 9(a) for a single round of retransmission. It is immediately apparent that the SNR gain arising from retransmission increases with p, partly due to the minimal size of the feedback packet, and also due to the fact that the effective probability is a function of p². Fixing the bitrate at 0.47 bpp and varying p (Figure 9(b)) illustrates this trend. Since the probability of losing the feedback packet is small, in concordance with its size, there is an SNR improvement over smoothed descriptions at all values of p. In general, performance decreased with the number of retransmission rounds, which could be due in part to the importance metric used to determine the priority of errored blocks. Because the encoder can judge the channel characteristics upon receipt of the feedback packet, it may be desirable to automatically control the amount of retransmission so that in error bursts some minimum image quality is obtained with a minimum latency overhead. Although retransmission is not suited for low-delay systems, most feedback data in this simulation occupied fewer than two ATM cells, and (depending upon the topology of the network) this could equate to an allowable latency overhead.
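The decoder-side feedback step might be sketched as below: lost-block indices are ordered by an importance metric and packed into S_max-bit packets so that the most important requests fit in the first ATM cell. The function name, the 12-bit index width (sufficient for 4096 blocks), and the dictionary-based importance metric are illustrative assumptions, not the paper's exact feedback format.

```python
def build_feedback(lost_blocks, importance, s_max=384, index_bits=12):
    """Order lost blocks by decreasing importance and pack their indices
    into feedback packets of at most s_max bits each (384 bits being one
    ATM cell payload). With 12-bit indices, 32 requests fit per packet."""
    ordered = sorted(lost_blocks, key=lambda b: importance.get(b, 0.0),
                     reverse=True)
    per_packet = s_max // index_bits
    return [ordered[i:i + per_packet]
            for i in range(0, len(ordered), per_packet)]

# 40 lost blocks -> two feedback packets (32 + 8 indices), i.e. the
# "fewer than two ATM cells" regime reported in the simulations.
lost = list(range(40))
weights = {b: float(b) for b in lost}
packets = build_feedback(lost, weights)
assert len(packets) == 2 and len(packets[0]) == 32 and len(packets[1]) == 8
```

Capping the number of packets emitted would give the automatic control over retransmission volume suggested above.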
TABLE I
PSNR (dB) of lena image using smoothed descriptions

                      Probability of block error, p
Rate (bpp)   0.001   0.005   0.01    0.02    0.05    0.1     0.25    0.4
0.47         34.40   34.27   34.12   33.80   32.93   31.66   28.36   25.09
0.52         34.81   34.66   34.49   34.15   33.20   31.87   28.45   25.13
0.65         35.74   35.56   35.35   34.92   33.82   32.30   28.61   25.17
0.83         36.85   36.62   36.35   35.81   34.49   32.74   28.81   25.25

All results averaged over 500 trials.
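The PSNR figures in Table I follow the standard definition; a minimal sketch is given below, assuming a peak value of 255 for 8-bit images and a per-pixel comparison (the paper does not spell out its exact SNR formula, so treat this as illustrative).

```python
import math

def psnr_db(orig, recon, peak=255.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel
    sequences; `peak` is the maximum pixel value (255 for 8-bit images)."""
    mse = sum((a - b) ** 2 for a, b in zip(orig, recon)) / len(orig)
    return float("inf") if mse == 0 else 10.0 * math.log10(peak ** 2 / mse)

# A uniform error of one grey level gives 10*log10(255^2) dB, about 48.13.
print(round(psnr_db([10, 20, 30], [11, 21, 31]), 2))
```

Averaging such per-trial values over the 500 random seeds yields the tabulated entries.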
Fig. 7. Smoothed descriptions results with lena image for a range of error probabilities. (a) p ∈ [0.02, 0.4], (b) p ∈ [0.001, 0.01].
Fig. 9. (a) Smoothed descriptions results for lena image with retransmission. Non-refined results shown for comparison. (b) Comparison of algorithms at fixed rate of 0.47 bpp. Retransmission allows smoothed descriptions to outperform MDC if latency is ignored.
V. Conclusion

The algorithm presented in this paper provided for error concealment of improperly received images, despite these images being partitioned sub-optimally in the multiple description sense. Since a coded bitstream is symmetrically partitioned, each description contains the same amount of information about the source, and this regularity makes application of the maximal smoothing algorithm simple. To further reduce complexity, a `pre-averaging' was introduced prior to
smoothing in order to obtain an estimate of missing DC coefficients. Simulations showed that smoothed descriptions performed well for target bitrates of 0.4-0.7 bpp. In fact, it was able to outperform a comparable multiple descriptions coder for packet loss probabilities up to approximately 0.05, depending on the exact bitrate. Additional work demonstrated that conditionally sending refinement data based on feedback from the decoder increases performance for a range of error probabilities. The major conclusion to be drawn from this paper is that, in cases when channel characteristics either do not match MDC's θ, or when θ cannot be adapted accurately in response to instability in the channel, alternative approaches can exhibit superior performance by either (a) offering identical quality to MDC but with less bitstream redundancy, or (b) adapting to instances of channel degradation through methods such as retransmission. It is hoped that further attention will be given to algorithms such as smoothed descriptions, as a viable solution for image transmission problems covering a considerable range of channel operating points. Further improved results may be yielded by investigating metrics such as the `importance measure' of missing blocks, and by exploring more generalized permutations of image data.

References

[1] Y. Wang and Q.-F. Zhu, "Error control and concealment for video communication: A review," Proceedings of the IEEE, vol. 86, no. 5, pp. 974-997, May 1998.
[2] J. K. Wolf, A. Wyner, and J. Ziv, "Source coding for multiple descriptions," Bell System Technical Journal, vol. 59, no. ?, pp. 1417-1426, October 1980.
[3] M. Vetterli and K. M. Uz, "Multiresolution coding techniques for digital television: A review," CU/CTR/TR 223-91-04, Center for Telecommunications Research, Columbia University, New York, NY, 1991.
[4] A. A. El-Gamal and T. M. Cover, "Achievable rates for multiple descriptions," IEEE Transactions on Information Theory, vol. 28, no. ?, pp. 851-857, November 1982.
[5] Y. Wang, M. T. Orchard, and A. R. Reibman, "Multiple description image coding for noisy channels by pairing transform coefficients," in Proc. IEEE Signal Processing Society 1997 Workshop on Multimedia Signal Processing, Princeton, NJ, June 1997.
[6] Y. Wang, M. T. Orchard, and A. R. Reibman, "Optimal pairwise correlating transforms for multiple description coding," in Proc. IEEE Conference on Image Processing, Chicago, IL, October 1998, pp. 679-683.
[7] S. D. Servetto, V. A. Vaishampayan, and N. J. A. Sloane, "Multiple description lattice vector quantization," in Proc. Data Compression Conference, Snowbird, UT, March 1999.
[8] M. Effros, "Universal lossless source coding with the Burrows-Wheeler transform," in Proc. Data Compression Conference, Snowbird, UT, March 1999.
[9] S. D. Servetto, K. Ramchandran, V. Vaishampayan, and K. Nahrstedt, "Multiple description wavelet based image coding," in Proc. IEEE Conference on Image Processing, Chicago, IL, October 1998, pp. 659-663.
[10] V. K. Goyal, J. Kovacevic, R. Arean, and M. Vetterli, "Multiple description transform coding of images," in Proc. IEEE Conference on Image Processing, Chicago, IL, October 1998, pp. 674-678.
[11] M. Nomura, T. Fujii, and N. Ohta, "Layered packet-loss protection for variable rate video coding using DCT," in International Workshop on Packet Video, Torino, Italy, September 1988.
[12] P. G. Sherwood and K. Zeger, "Macroscopic image compression for robust transmission over noisy channels," SPIE Visual Communication and Image Processing, vol. 3653, pp. 73-83, January 1999.
[13] J. Rogers and P. Cosman, "Robust wavelet zerotree image compression with fixed-length packetization," in Proc. Data Compression Conference, March 1998, pp. 418-427.
[14] D. H. Lehmer, "Mathematical methods in large-scale computing units," in Proc. 2nd Symposium on Large-Scale Digital Calculating Machinery, Cambridge, MA, 1949, pp. 141-146.
[15] S. K. Park and K. W. Miller, "Random number generators: good ones are hard to find," Communications of the ACM, vol. 31, no. 10, pp. 1192-1201, 1988.
[16] Y. Wang, Q.-F. Zhu, and L. Shaw, "Maximally smooth image recovery in transform coding," IEEE Transactions on Communications, vol. 41, no. 10, pp. 1544-1551, October 1993.
[17] S. S. Hemami and R. M. Gray, "Image reconstruction using vector quantized linear interpolation," in Proc. IEEE Conference on Acoustics, Speech and Signal Processing, Adelaide, SA, Australia, April 1994, vol. 5, pp. 629-632.
[18] J. L. Mitchell, W. B. Pennebaker, C. E. Fogg, and D. J. LeGall, MPEG Video Compression Standard, pp. 51-55, Chapman & Hall, New York, 1997.
[19] W. B. Pennebaker and J. L. Mitchell, JPEG Still Image Data Compression Standard, Van Nostrand Reinhold, New York, 1993.
Justin Ridge received the B.E. (Hons) degree from the University of Tasmania, Hobart, Australia, in 1994 and the M.Eng. degree from Texas A&M University, College Station, Texas, in 1997. He is currently a Ph.D. candidate at Southern Methodist University in Dallas, Texas. Mr. Ridge is a student member of the IEEE, and a graduate member of the Institute of Engineers (Australia). His research interests include image and video compression, multimedia transmission over networks, and information theory.
Fred W. Ware received the B.S.E.E. from Howard University, Washington, D.C., in 1991, the M.S.E.E. from the University of California, Los Angeles, in 1993, and the Ph.D. in electrical engineering from Texas A&M University, College Station, Texas, in 1999. He is currently with Nokia Mobile Phones, Inc., Irving, Texas, as a Research Engineer, where he conducts multimedia communications research. His interests include wireless communications, image and video coding, and telecommunications networks.
Jerry D. Gibson currently serves as Chairman of the Department of Electrical Engineering at Southern Methodist University in Dallas, Texas. From 1987 to 1997, he held the J. W. Runyon, Jr. Professorship in the Department of Electrical Engineering at Texas A&M University. He also has held positions at General Dynamics-Fort Worth, the University of Notre Dame, and the University of Nebraska-Lincoln, and during the fall of 1991, Dr. Gibson was on sabbatical with the Information Systems Laboratory and the Telecommunications Program in the Department of Electrical Engineering at Stanford University. He is co-author of the books Digital Compression for Multimedia (Morgan Kaufmann, 1998) and Introduction to Nonparametric Detection with Applications (Academic Press, 1975 and IEEE Press, 1995) and author of the textbook Principles of Digital and Analog Communications (Prentice-Hall, 2nd ed., 1993). He is Editor-in-Chief of The Mobile Communications Handbook (CRC Press, 2nd ed., 1999) and Editor-in-Chief of The Communications Handbook (CRC Press, 1996).
Dr. Gibson was Associate Editor for Speech Processing for the IEEE Transactions on Communications from 1981 to 1985 and Associate Editor for Communications for the IEEE Transactions on Information Theory from 1988 to 1991. He has served as a member of the Speech Technical Committee of the IEEE Signal Processing Society (1992-1995), as a member of the IEEE Information Theory Society Board of Governors (1990-1998), and as a member of the Editorial Board for the Proceedings of the IEEE. He was President of the IEEE Information Theory Society in 1996. Dr. Gibson served as Technical Program Chair of the 1999 IEEE Wireless Communications and Networking Conference, Technical Program Chair of the 1997 Asilomar Conference on Signals, Systems, and Computers, Finance Chair of the 1994 IEEE International Conference on Image Processing, and General Co-Chair of the 1993 IEEE Information Theory Symposium. In 1990, Dr. Gibson received The Fredrick Emmons Terman Award from the American Society for Engineering Education, and in 1992, was elected Fellow of the IEEE "for contributions to the theory and practice of adaptive prediction and speech waveform coding." He was co-recipient of the 1993 IEEE Signal Processing Society Senior Paper Award for the Speech Processing area. His research interests include data, speech, image, and video compression, multimedia over networks, wireless communications, information theory, and digital signal processing.