Multiple Description Coding using Exact Discrete Radon Transform

B. Parrein, N. Normand, JP. Guédon
bparrein, nnormand, [email protected]
IRCCyN UMR 6597, Image Vidéo Communication team
EPUN, rue Christian Pauc, La Chantrerie, BP 50609, 44306 Nantes Cedex 3

Abstract

We propose in this paper a multiple description coding system based on computing projections of information elements. With a complete correspondence between projections and packets, it can be used over existing erasure channels. Every received projection contributes equally to the reconstruction of the original information. Moreover, the system can allocate a priority level to each source layer, adding multi-level redundancy. Thanks to the topology of the support on which scalable data is multiplexed, our Priority Encoding Transmission system gives a near-optimal alternative to the MDS-based solution proposed in [1], with a lower complexity and a simpler implementation.
1 Introduction

Multimedia applications transmit ever larger volumes of information over packet-switched data networks such as those running the familiar Internet Protocol. At the wide area network level, a daily Internet weather report [2] shows a high percentage of packet loss on several links. Several kinds of umbrellas are proposed today to protect Quality of Service (QoS) parameters such as integrity and delay. Integrity mechanisms are grouped into erasure-resilient codes coming from the source, the channel, or both. A mechanism in which source and channel work jointly seems more relevant. Roughly speaking, a high compression ratio is of no interest when the network is not safe from erasures, since destinations will ask the source to send the missing information again. A best effort method, like the Internet Protocol, adds no specific mechanism in the network for regulating flows (no packet hierarchy, no integrity packet and no real-time checking). In this paper, we describe a new way of generating multi-level redundancy, applied to the packet contents without changing the best effort network protocol that carries these packets. The resulting representation is used to protect unequally the different semantic levels of a source, e.g. the metaflow, the low frequencies and the high frequencies of an encoded image.

Multiple description coding of a source provides graceful degradation across the different reception scenarios, according to source scalability. For instance, when two channels are used:
- destinations receive one description, encoded at R1. A version of the original information is thus obtained with side distortion D1.
- destinations receive one description, encoded at R2. Another version of the original information is reconstructed with side distortion D2.
- both descriptions are received at R0 and the source is reconstructed with a central distortion D0 such that D0 ≤ min(D1, D2) (corresponding to the quantization noise).

In section 2, some implementations of multiple description coding with support scalability are presented. Our mechanism, based on the projection of scalable information, is described in section 3 and analysed in section 4. Section 5 presents the protection of a progressive JPEG data set. Finally, section 6 compares the proposed approach with MDS codes, which are optimal when only the overweight factor is considered.
2 Robustness and multiple description

As already presented in the introduction, solutions to the multiple description problem appeared several years ago. In [3], the generic solution is decomposed into three fundamental steps. First, an orthogonal transform (DCT) is applied to the initial signal to roughly decorrelate the samples. The resulting coefficients are then partially recorrelated with a known unitary transform to add statistical redundancy (with constant weight). Finally, entropy coding is applied to each new description. The known transform can be a rotation of angle θ as proposed in [4]. In the two-channel case, each element of a description results from pairing decorrelated and quantized coefficients. To extend this result to the N-channel case, [3] uses a cascade of several transforms. Vaishampayan [5] constructs descriptions from the indices of coefficients located in a band matrix. One example of index assignment is shown in figure 1. If both descriptions reach the destination, the initial quantization is reconstructed. Otherwise, if one description is erased, the available information is restricted to either a row or a column of the matrix. For scalability management and priority weighting between layers (image subbands), several matrices are used in [6].
[Figure 1 image: index assignment matrix over indices i and j, with panels "Reception 1", "Reception 2" and "Decoding 1 and 2".]
Fig.1. Indexation matrix with 3 possible reception scenarios

When all the traffic gets the same priority, the transmission quality depends only on the network load. Indeed, in this case network applications (medical, financial, games) run concurrently without any service guarantee. The emergence of DiffServ in the IPv4 or the IPv6 protocol will add new QoS functionalities. By using the Type Of Service field in today's Internet (IPv4), the DiffServ Code Point (DSCP) service classes will appear on the network. Stamped packets will receive distinct end-to-end behaviors according to their priorities. The DiffServ architecture is service-based but can be strengthened by implementing a stream-based architecture. For Real Time Protocol streams (working over IPv4), Rosenberg and Schulzrinne [7] add a separate parity check stream built with Forward Error Correcting (FEC) codes. A systematic form is used to decrease complexity in lossless conditions, and it still allows classical destinations (without this extension) to work. If a message is split into m packets and the FEC coder generates n − m control packets, the destination is able to decode the original message from any set of r packets (media packets or FEC packets). The parameters (m, n, r) must be adjusted for a multi-level system.

For a single priority level, Blömer et al. [8] propose a coding scheme based on MDS (Maximum Distance Separable) codes where r = m. As above, this code is systematic. Let C be an $(n − m) \times m$ matrix over $GF(2^L)$. The matrix $(I_m | C)$ generates an MDS(m, n, r) code with packet size L iff every square sub-matrix of C is invertible. Cauchy matrices have this property over Galois fields. On the decoding side, one sub-matrix of C is simply computed, with the indices of the received redundant packets as row indices and the indices of the missing packets as column indices. This scheme is used in the Priority Encoding Transmission (PET) system developed by Albanese et al. [1]. A priority encoding function gives, for each message segment, the ratio of the packets needed for decoding to the packets emitted. In this scheme, the complexity order increases with the number of packet erasures. We propose in the next section another PET system, based on the Mojette transform, which leads to a constant complexity.
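The systematic MDS construction described above can be illustrated with a minimal sketch. It is not the XOR-based scheme of [8]: to keep the arithmetic short it works over the prime field GF(257) instead of $GF(2^L)$, treats each packet as a single symbol, and decodes by plain Gaussian elimination. The names `cauchy`, `encode` and `decode` are hypothetical.

```python
P = 257  # prime modulus, an assumed stand-in for GF(2^L)

def cauchy(rows, cols):
    # Entry (i, j) is 1 / (x_i - y_j) with x_i = i and y_j = rows + j.
    # All x_i and y_j are distinct, so every square sub-matrix is invertible.
    return [[pow((i - (rows + j)) % P, -1, P) for j in range(cols)]
            for i in range(rows)]

def encode(msg, n):
    """Systematic code: the m message symbols followed by n - m parity symbols."""
    m = len(msg)
    parity = [sum(c * s for c, s in zip(row, msg)) % P
              for row in cauchy(n - m, m)]
    return list(msg) + parity

def decode(received, m, n):
    """received maps packet index -> symbol; any m entries are enough (r = m)."""
    identity = [[int(i == j) for j in range(m)] for i in range(m)]
    generator = identity + cauchy(n - m, m)          # n rows, m columns
    idx = sorted(received)[:m]
    a = [generator[i][:] + [received[i]] for i in idx]  # augmented system
    for col in range(m):                             # Gauss-Jordan modulo P
        piv = next(r for r in range(col, m) if a[r][col])
        a[col], a[piv] = a[piv], a[col]
        inv = pow(a[col][col], -1, P)
        a[col] = [x * inv % P for x in a[col]]
        for r in range(m):
            if r != col and a[r][col]:
                f = a[r][col]
                a[r] = [(x - f * y) % P for x, y in zip(a[r], a[col])]
    return [a[r][m] for r in range(m)]
```

With m = 3 and n = 5, for example, `decode` recovers the message from any three of the five symbols produced by `encode`. When no packet is lost, the systematic form makes the elimination trivial, which is the lossless-case saving mentioned in the text.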
3 Priority encoding transmission by Mojette transform

3.1 Mojette transform

The Mojette transform is a discrete and exact Radon transform which computes a finite number of projections that are able to reconstruct the initial valued support. An image $f[k,l]$ is represented by a finite set of projections $proj_{p,q}[m]$. The projection angle θ is defined only by two integers (p, q), with GCD(p, q) = 1 and $\tan(\theta) = \frac{p}{q}$. Each projection element is called a bin and sums the pixels located on the line $m = -qk + pl$. These pixels are called the antecedents of the bin. The Mojette transform is defined by:

$$M_{p,q} f[k,l] = \sum_{k} \sum_{l} f[k,l] \, \Delta[m + kq - pl]$$

where Δ is the Kronecker function: $\Delta[m] = 1$ if $m = 0$, and $\Delta[m] = 0$ otherwise. $Mf$ represents a set of I projections $M_{p,q} f[k,l]$ such that

$$Mf = \{proj_{p_i,q_i}, \; i = 1, \dots, I\}.$$

Figure 2 presents the transformation of a simple image of 4x4 pixels with $proj_{-1,0}[m]$ and $proj_{1,1}[m]$.

[Figure 2 image: 4x4 magic square with rows (16, 2, 3, 13), (5, 11, 10, 8), (9, 7, 6, 12) and (4, 14, 15, 1); projection (-1,0) has four bins equal to 34; projection (1,1) has bins 16, 7, 23, 34, 28, 27 and 1; axes p and q.]

Fig.2. Mojette transform of 4x4-magic square by projections (-1,0) and (1,1)

For a (P, Q) rectangular image and a set of I projections, the reconstructibility condition is (Katz [9]):

$$P \le P_I = \sum_{i=1}^{I} |p_i| \quad \text{or} \quad Q \le Q_I = \sum_{i=1}^{I} |q_i|.$$

Reconstructibility conditions were extended to (p, q)-convex regions using mathematical morphology tools. Each projection vector is considered as a two-pixel structuring element (2PSE) {(0, 0), (p, q)}. Let the image $f[k,l]$ be defined over a convex region G, and let R be the region obtained by a series of dilations from the set of 2PSE $\{(0, 0), (p_i, q_i)\}, \forall i = 1, \dots, I$. Then f is reconstructible from the projection set $\{proj_{p_i,q_i}\}$ iff R is not included in G [10].

The Mojette transform represents an original source in a redundant manner. The following index Red measures the redundancy rate for a region and a given projection set:

$$Red = \frac{\#bins}{\#pixels} - 1.$$

In the following, a complete correspondence between projections and packets is assumed.
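As an illustration of the definition, the sketch below computes the two projections of Figure 2 directly from $m = -qk + pl$, together with the Red index. It assumes that k runs from left to right and l from bottom to top of the displayed image, an orientation chosen here because it reproduces the bin values of the figure. The function names are hypothetical.

```python
from math import gcd

def mojette_projection(rows, p, q):
    """Bins of proj_{p,q}, ordered by increasing m.

    rows is the image as displayed (top row first). Assumed orientation:
    k is the column index, l is counted from the bottom row upwards.
    """
    assert gcd(abs(p), abs(q)) == 1, "p and q must be coprime"
    height = len(rows)
    bins = {}
    for r, row in enumerate(rows):
        l = height - 1 - r
        for k, value in enumerate(row):
            m = -q * k + p * l              # line on which the pixel lies
            bins[m] = bins.get(m, 0) + value
    return [bins[m] for m in sorted(bins)]

def redundancy(rows, directions):
    """Red = #bins / #pixels - 1 for a given projection set."""
    n_bins = sum(len(mojette_projection(rows, p, q)) for p, q in directions)
    n_pixels = sum(len(row) for row in rows)
    return n_bins / n_pixels - 1

magic = [[16, 2, 3, 13],
         [5, 11, 10, 8],
         [9, 7, 6, 12],
         [4, 14, 15, 1]]

print(mojette_projection(magic, -1, 0))   # [34, 34, 34, 34]
print(mojette_projection(magic, 1, 1))    # [1, 27, 28, 34, 23, 7, 16]
print(redundancy(magic, [(-1, 0), (1, 1)]))  # -0.3125
```

The negative Red value reflects that these two projections carry only 11 bins for 16 pixels, so they cannot reconstruct the square on their own; this agrees with the Katz condition, which is not met by this projection set.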
3.2 One protection level coding

Here, a non-scalable information volume is considered. The resulting coding function is simply the equal-priority protection of the source flow. The protection mechanism consists in allocating each information element on a 2D support where every pixel is decoded from the same number of projections. In addition, to keep the descriptions equivalent, the projection size should be constant. For a support G reconstructible from M projections among I, each projection bin of Mf must have at most M antecedents. This condition is fulfilled when the series of M 2PSE dilations taken in the set Mf cannot be included in the shape G. For constant-size projections, two points M0(k0, 0) and M1(k1, 0) are chosen to be projected onto the first and last bin respectively, for any direction. In this case, the number of bins is given by:

$$\#bins_i = |-q_i \Delta k| + 1 \quad \forall i = 1, \dots, I$$

where $\Delta k = k_1 - k_0$ is the distance between M0 and M1. So, for a constant number of bins and for a given R, $q_i$ is constant for all i = 1, ..., I. The addition of a pixel column generates q bins in each projection of Mf. Each value $p_i$ must belong to an integer range such that M0 and M1 correspond to the first and last bin respectively, and such that GCD($p_i$, q) = 1. The next figure shows such a support, which can be exactly decoded from any two projections taken in the set $Mf = \{proj_{-5,13}, proj_{-4,13}, proj_{4,13}, proj_{5,13}\}$.
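The constant-size constraint can be checked with a few lines of arithmetic. The sketch below evaluates the bin count formula and the coprimality condition for the projection set quoted above; the value used for Δk is purely illustrative, and the helper names are hypothetical.

```python
from math import gcd

def bins_per_projection(q, delta_k):
    """Number of bins |-q * delta_k| + 1; it does not depend on p."""
    return abs(-q * delta_k) + 1

def valid_directions(ps, q):
    """Keep only the directions (p, q) with GCD(p, q) = 1."""
    return [(p, q) for p in ps if gcd(abs(p), q) == 1]

directions = valid_directions([-5, -4, 4, 5], 13)
delta_k = 10  # assumed distance between M0 and M1, not taken from the paper

for p, q in directions:
    print((p, q), bins_per_projection(q, delta_k))
```

Since the count depends only on q and Δk, all four projections have the same size, and widening the support by one pixel column adds exactly q bins to each of them.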
Fig.3. Example of support with one protection level.

3.3 Multi-level redundancy coding
To preserve the scalability of a source against channel erasures, a multi-level description can be applied to each layer of the initial information. The encoded version must carry the hierarchy of the source transparently for the channel, which is still assumed to act in best effort mode (no hierarchy between packets). In this case, any description can be erased and the degradation remains graceful. Figure 4 shows the mapping of two protection levels, i.e. parts decoded from any 2 projections (small red portion) and from any 3 projections (big blue portion) taken in the set $Mf = \{proj_{-5,13}, proj_{-4,13}, proj_{4,13}, proj_{5,13}\}$. This representation of scalable information can be extended to as many protection levels as the layered source needs. So, the Mojette transform and its protected support can be used to transmit, for example, I, P and B pictures in an MPEG context, or the different scans of progressive JPEG image coding as presented in section 5. The next section analyses the effective cost of the coding in terms of the symbols used for transmission and of the complexity order of the relevant algorithms.
Fig.4. Example of support with two protection levels.
4 Performances

4.1 Symbols used in decoding

The capacity of the information support is given by Pick's theorem, which establishes a relation between the surface of a polygon A in $R^2$ with lattice vertices and the number of pixels in $Z^2$:

$$Card(A) = S(A) + \frac{Card(\partial A)}{2} + 1.$$

In this expression, Card(A) represents the number of points of the $Z^2$ lattice inside the shape A (boundary included), Card(∂A) counts the edge points of A, and S(A) is the surface of A in lattice units. When A is a simple parallelogram defined by two vectors $(p_1, q_1)$ and $(p_2, q_2)$, we have:

$$S(A) = \left| \begin{vmatrix} p_1 & q_1 \\ p_2 & q_2 \end{vmatrix} \right| = |p_1 q_2 - p_2 q_1|.$$

Using this formula, the numbers of pixels on the supports of figures 3 and 4 were measured. Figure 5 shows the variations of the redundancy rate Red (symbols used in the decoding over support capacity) according to the support size, when using a single protection level and a reconstruction from 2 projections taken in the set $Mf = \{proj_{-2,q}, proj_{-1,q}, proj_{1,q}, proj_{2,q}\}$ with $q \in \{5, 7, 13\}$. As expected, the relative excess of bins used in decoding decreases when the support size increases. Calling m the number of pixels and r the number of bins used to decode the message leads to r = (1 + ε)m. Table 1 below shows some ε values for several projection sizes and several values of q (support thickness).
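The capacity formula can be cross-checked numerically. The sketch below evaluates Pick's theorem for a parallelogram spanned by two projection vectors and compares it with a brute-force count of lattice points. The choice of vectors is illustrative and is not claimed to be the support of figure 3; the function names are hypothetical.

```python
from math import gcd

def area(v1, v2):
    """S(A) = |p1*q2 - p2*q1| for the parallelogram spanned by v1 and v2."""
    (p1, q1), (p2, q2) = v1, v2
    return abs(p1 * q2 - p2 * q1)

def boundary_points(v1, v2):
    """Card(dA): each of the 4 edges contributes gcd(|p|, |q|) lattice points."""
    return 2 * (gcd(abs(v1[0]), abs(v1[1])) + gcd(abs(v2[0]), abs(v2[1])))

def capacity_pick(v1, v2):
    """Card(A) = S(A) + Card(dA)/2 + 1 (the boundary count is always even)."""
    return area(v1, v2) + boundary_points(v1, v2) // 2 + 1

def capacity_brute_force(v1, v2):
    """Count lattice points (x, y) = a*v1 + b*v2 with 0 <= a, b <= 1."""
    (p1, q1), (p2, q2) = v1, v2
    det = p1 * q2 - p2 * q1
    xs = [0, p1, p2, p1 + p2]
    ys = [0, q1, q2, q1 + q2]
    count = 0
    for x in range(min(xs), max(xs) + 1):
        for y in range(min(ys), max(ys) + 1):
            a = x * q2 - p2 * y      # a * det, by Cramer's rule
            b = p1 * y - q1 * x      # b * det
            if det < 0:
                a, b = -a, -b
            if 0 <= a <= abs(det) and 0 <= b <= abs(det):
                count += 1
    return count

v1, v2 = (-5, 13), (5, 13)
print(capacity_pick(v1, v2), capacity_brute_force(v1, v2))  # 133 133
```

Because both vectors have coprime components, only the four corners lie on the boundary, and the capacity reduces to the determinant plus three.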
[Figure 5 image: plot of Red (0 to 0.7) against # pixels (0 to 3000), with one curve each for q=5, q=7 and q=13.]

Fig.5. Evolution of the redundancy according to the support capacity

Table 1. ε values for several projection sizes (#bins) and several values of q

    q=5               q=7               q=13
    #bins   ε         #bins   ε         #bins   ε
    302     0.0709    301     0.0995    314     0.1985
    502     0.0415    506     0.0586    522     0.1106
    1002    0.0204    1010    0.0285    1016    0.0539
    1502    0.0135    1510    0.0190    1510    0.0357

4.2 Complexity order
The complexity order for decoding in Blömer et al. [8] is $O(m(n − m)L^2)$, where m is the number of packets representing the original message, n the number of sent packets and L the number of words per packet. A systematic form carries no decoding cost in the lossless case: decoding simply consists in copying the message packets. In the erasure case, the complexity is estimated at $O(mkL^2)$, where k is the number of redundant packets. In other words, the overall complexity of this method increases with the number of packets. The algorithms of the direct and inverse Mojette transforms are not presented in this paper; they are detailed in [10]. To compute a projection, each pixel is added once into that projection. The complexity is therefore O(IN), where I is the number of projections and N is the number of pixels. The same complexity has been demonstrated for the inverse transform. However, bin indices and antecedents have to be stored in a table to get this result for the inverse transform. For the streaming case, where scalability can produce temporal differences, partitions are simply updated. The algorithms of the Mojette transform, built with constant channel coding parameters (support shape, projection angles), result in a constant complexity whatever the reception scenario.
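To make the role of the bin table concrete, here is a minimal back-substitution sketch for a rectangular image. It is not the algorithm of [10]: it simply keeps, for every bin, its remaining value, its number of unknown antecedents and the sum of their pixel identifiers, and resolves a pixel whenever a bin is left with a single antecedent. The indexing `img[k][l]` and the names `mojette` and `inverse_mojette` are assumptions made for this illustration.

```python
from collections import deque

def mojette(img, directions):
    """Direct transform: {(p, q): {m: bin value}} with m = -q*k + p*l."""
    projs = {}
    for p, q in directions:
        bins = {}
        for k, column in enumerate(img):
            for l, value in enumerate(column):
                m = -q * k + p * l
                bins[m] = bins.get(m, 0) + value   # each pixel added once
        projs[(p, q)] = bins
    return projs

def inverse_mojette(projs, width, height):
    """Pixels that cannot be resolved are left as None."""
    # Table entry per bin: [remaining value, unknown count, sum of unknown ids]
    table = {(p, q, m): [v, 0, 0]
             for (p, q), bins in projs.items() for m, v in bins.items()}
    for k in range(width):
        for l in range(height):
            for p, q in projs:
                entry = table[(p, q, -q * k + p * l)]
                entry[1] += 1
                entry[2] += k * height + l
    img = [[None] * height for _ in range(width)]
    ready = deque(key for key, e in table.items() if e[1] == 1)
    while ready:
        value, count, id_sum = table[ready.popleft()]
        if count != 1:
            continue                 # pixel already resolved via another bin
        k, l = divmod(id_sum, height)
        img[k][l] = value
        for p, q in projs:           # remove the pixel from all its bins
            key = (p, q, -q * k + p * l)
            entry = table[key]
            entry[0] -= value
            entry[1] -= 1
            entry[2] -= k * height + l
            if entry[1] == 1:
                ready.append(key)
    return img
```

For a 3x3 image, the projection set (0,1), (1,1), (-1,1) satisfies the Katz condition on q and the sketch returns the original pixels, whereas with only (0,1) and (1,1) most pixels stay unresolved. Each resolved pixel touches one bin per projection, which is in line with the O(IN) order stated above.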
5 Application to images

In this section, an application of the priority encoding transmission system is presented, using the Mojette transform on scalable streams produced by progressive JPEG. The JPEG coder comes from the Independent JPEG Group [11]. The default script of the progressive option is presented in figure 6, together with the support used for multiplexing. Our system protects stream 1 by allowing reconstruction from any 3 projections among 6 (up to 50% packet erasure is tolerated), whereas streams 2 and 3 can be reconstructed from 4 and 5 projections respectively among the 6 sent. Since these projections have some extreme bins without information (no pixel correspondences), their sizes are approximately constant, around 1580 bins. The total initial information volume was 5645 bytes whereas the 6 projections represent 9480 bytes, corresponding to an overweight of 35% without any additional compression being applied. The obtained redundancy is roughly the rate of [3] for multiple description coding using frame techniques.

The visual results of simulations for an erasure channel are given in figure 7. The left reconstruction is obtained with 50% packet loss (thus reconstructing only the red portion of the support), corresponding to 0.45 bpp (bits per pixel of the final image, compared to 1.12 bits of received stream per pixel). The center image uses four projections among six, which reconstruct two of the three sub-supports, for a rate of 0.79 bpp (compared to 1.49 bits of received stream per pixel). The right image corresponds to a complete reconstruction at 1.33 bpp (compared to 1.89 bits of received stream per pixel).
Fig.6. Multiplexing scalable streams defined by scans of progressive JPEG over a support with 3 protection levels.

The volumes of the initial image streams are 1910, 1413 and 2322 bytes respectively. With an optimal algorithm such as the MDS codes of the PET system [1], one finds 637 (=1910/3, rounded up), 354 (=1413/4, rounded up) and 465 (=2322/5, rounded up) elements in each code word for streams 1, 2 and 3 to get the equivalent priorities. This leads to a total code length of 1456 elements. A support of this length can be encoded with the Mojette transform but cannot be decoded. So, when comparing identical reconstructed images, our code length for such a support and projection set is around 8% longer than the MDS code. On the other hand, the Mojette transform offers a lower complexity and a simpler implementation.
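The comparison with the MDS-based PET system is plain arithmetic and can be reproduced as follows. The stream volumes and thresholds are those quoted in the text; the Mojette projection size of 1580 bins is the approximate figure given above, so the resulting percentage is indicative only. The function name is hypothetical.

```python
from math import ceil

stream_bytes = [1910, 1413, 2322]  # volumes of streams 1, 2 and 3
needed = [3, 4, 5]                 # projections needed out of the 6 sent

def mds_code_length(volumes, thresholds):
    """Each stream is spread over as many elements per code word as
    volume / threshold, rounded up; the code length is their sum."""
    return sum(ceil(v / t) for v, t in zip(volumes, thresholds))

mds_len = mds_code_length(stream_bytes, needed)
mojette_len = 1580                 # approximate projection size from the text
extra = mojette_len / mds_len - 1

print(mds_len)                     # 1456
print(round(100 * extra, 1))       # 8.5
```

The result, about 8.5%, is consistent with the "around 8%" extra length reported for this support and projection set.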
6 Conclusion

In this paper, we have introduced a new multiple description system based on an exact discrete Radon transform called the Mojette transform. It produces partial descriptions (projections) which have equal importance, yet contain unequally protected parts of the initial data. In the application presented in section 5, a progressive JPEG encoder feeds data to the Mojette transformer, which behaves as a priority encoding system. An interesting feature of this scheme is that it lets the application choose reconstruction and distortion levels for each reception scenario. In our case, two situations led to a partial reconstruction and the last one to a full reconstruction of the original JPEG image.
Fig.7. Results with Mojette transform for an original progressive JPEG picture at 1.33 bpp. From left to right, packet loss is 3, 2 and 1 among 6 for rates of 0.45 bpp, 0.79 bpp and 1.33 bpp.

Better redundancy results could have been achieved with an MDS-based PET system. In the example, we obtained an extra 8% overhead compared to the optimal MDS solution. However, MDS codes require much more computational power than the linear-cost Mojette transform (both in direct and inverse algorithms). Furthermore, the Mojette cost depends only on the number of received packets and not on their contents. Thus, the Mojette decoder immediately launches a reconstruction process upon reception of each packet, whereas an MDS decoder could wait a little longer for initial data when parity packets come first, in order to lower the decoding cost. Moreover, the global Mojette redundancy decreases as the support length increases. For instance, by appropriately growing the support shown in figure 6 in order to allocate bits instead of bytes, we obtained 1% overhead compared to the corresponding MDS system. Lower redundancy rates can also be achieved by using a 3D support (and the corresponding Mojette transform) and smaller packets, which are best suited for real-time applications. The proposed application could use any industry coder and is therefore compatible with the JPEG standard without recoding. Other media standards like JPEG2000 and MPEG can be encoded with the same approach, using a near-MDS coded multiple description system based on the Mojette transform.
References

[1] A. Albanese, J. Blömer, J. Edmonds, and M. S. M. Luby, “Priority encoding transmission,” IEEE Trans. on Information Theory, vol. 42, pp. 1737–1744, Nov. 1996.
[2] “Internet weather report.” at http://www.noc.ucla.edu/networking/weather.html.
[3] V. K. Goyal, J. Kovacevic, R. Arean, and M. Vetterli, “Multiple description transform of coding images,” in Proc. IEEE Int. Conf. Image Processing, Chicago, IL, 1998.
[4] M. T. Orchard, Y. Wang, V. Vaishampayan, and A. R. Reibman, “Redundancy rate-distortion analysis of multiple description coding using pairwise correlating transforms,” in Proc. IEEE Int. Conf. Image Processing, Santa Barbara, CA, 1997.
[5] V. A. Vaishampayan, “Design of multiple description scalar quantizers,” IEEE Trans. on Information Theory, vol. 39, pp. 821–834, May 1993.
[6] S. D. Servetto, K. Ramchandran, V. A. Vaishampayan, and K. Nahrstedt, “Multiple description wavelet based image coding,” IEEE Trans. on Image Processing, vol. 9, pp. 813–825, May 2000.
[7] J. Rosenberg and H. Schulzrinne, “An RTP payload format for generic forward error correction,” Tech. Rep. RFC 2733, Internet Society, Dec. 1999.
[8] J. Blömer, M. Kalfane, R. Karp, M. Karpinski, M. Luby, and D. Zuckerman, “An XOR-based erasure-resilient coding scheme,” Tech. Rep. TR-95-048, International Computer Science Institute, 1995.
[9] M. Katz, “Questions of uniqueness and resolution in reconstruction from projections,” Lecture Notes in Biomathematics, vol. 26, 1979.
[10] N. Normand, Représentation d’Images et Distances Discrètes Basées sur les Éléments Structurants à Deux Pixels. PhD thesis, Sciences pour l’Ingénieur de Nantes, Jan. 1997.
[11] “Version 6b of cjpeg.” Independent JPEG Group at ftp://ftp.uu.net/graphics/jpeg/.